
Q: Efficient string obfuscation in Python

Stack Overflow user

Asked 2011-09-21 01:11:13

Answers: 3 | Views: 13.2K | Followers: 0 | Votes: 11

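I need to obfuscate lines of Unicode text to slow down anyone who wants to extract them. Ideally this would be done with a built-in Python module or a small add-on library; the obfuscated string should be the same length as the original or shorter; and "unobfuscation" should be as fast as possible.

I have tried various character-swap and XOR routines, but they are all slow. Base64 and hex encoding increase the size considerably. So far the most efficient method I have found is compressing with zlib at its lowest setting (1). Is there a better way?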
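
For reference, the zlib approach mentioned in the question can be sketched roughly as follows. The helper names `pack` and `unpack` are hypothetical, and UTF-8 is an assumed encoding; the question does not show its actual code.

```python
import zlib

def pack(text: str) -> bytes:
    # Level 1 is zlib's fastest setting, trading compression ratio for speed.
    return zlib.compress(text.encode('utf-8'), 1)

def unpack(blob: bytes) -> str:
    # Decompression does not need to know the level used to compress.
    return zlib.decompress(blob).decode('utf-8')

line = 'The quick brown fox jumps over the lazy dog. ' * 4
blob = pack(line)
restored = unpack(blob)
print(len(line.encode('utf-8')), len(blob))
```

Note that zlib adds a small header and checksum, so very short lines can come out longer than the input; the size benefit only appears on text with enough repetition.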

python

string

unicode

3 Answers

Stack Overflow user

Answered 2011-09-21 02:09:23

This uses a simple, fast encryption scheme on bytes objects.

# For Python 3 - strings are Unicode, print is a function

def obfuscate(byt):
    # Use same function in both directions.  Input and output are bytes
    # objects.
    mask = b'keyword'
    lmask = len(mask)
    return bytes(c ^ mask[i % lmask] for i, c in enumerate(byt))

def test(s):
    data = obfuscate(s.encode())
    print(len(s), len(data), data)
    newdata = obfuscate(data).decode()
    print(newdata == s)


simple_string = 'Just plain ASCII'
unicode_string = ('sensei = \N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER N}'
                  '\N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER I}')

test(simple_string)
test(unicode_string)
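
One point worth making explicit about the output of `test` above: the XOR scheme preserves the length in bytes, not in characters, so for non-ASCII text the obfuscated data is longer than `len(s)`. The sketch below illustrates this, assuming UTF-8 encoding; the `mask` parameter is added here for illustration and is not part of the answer's function.

```python
def obfuscate(data: bytes, mask: bytes = b'keyword') -> bytes:
    # XOR each byte with the repeating mask; applying it twice restores the input.
    lmask = len(mask)
    return bytes(c ^ mask[i % lmask] for i, c in enumerate(data))

text = ('sensei = \N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER N}'
        '\N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER I}')
raw = text.encode('utf-8')      # each hiragana character takes 3 bytes in UTF-8
hidden = obfuscate(raw)

n_chars, n_bytes, n_hidden = len(text), len(raw), len(hidden)
restored = obfuscate(hidden).decode('utf-8')
print(n_chars, n_bytes, n_hidden, restored == text)
```

Here the text has 13 characters but 21 UTF-8 bytes, and the obfuscated output is also 21 bytes.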

Python 2 version:

# For Python 2

mask = 'keyword'
nmask = [ord(c) for c in mask]
lmask = len(mask)

def obfuscate(s):
    # Use same function in both directions.  Input and output are
    # Python 2 strings, ASCII only.
    return ''.join([chr(ord(c) ^ nmask[i % lmask])
                    for i, c in enumerate(s)])

def test(s):
    data = obfuscate(s.encode('utf-8'))
    print len(s), len(data), repr(data)
    newdata = obfuscate(data).decode('utf-8')
    print newdata == s


simple_string = u'Just plain ASCII'
unicode_string = (u'sensei = \N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER N}'
                  u'\N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER I}')

test(simple_string)
test(unicode_string)

12 votes

Stack Overflow user

Answered 2011-09-21 02:26:23

How about the old ROT13 trick?

Python 3:

>>> import codecs
>>> x = 'some string'
>>> y = codecs.encode(x, 'rot13')
>>> y
'fbzr fgevat'
>>> codecs.decode(y, 'rot13')
'some string'

Python 2:

>>> x = 'some string'
>>> y = x.encode('rot13')
>>> y
'fbzr fgevat'
>>> y.decode('rot13')
u'some string'

For Unicode strings (Python 2):

>>> x = u'國碼'
>>> print x
國碼
>>> y = x.encode('unicode-escape').encode('rot13')
>>> print y
\h570o\h78op
>>> print y.decode('rot13').decode('unicode-escape')
國碼
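
The Unicode example above relies on Python 2 string methods. A rough Python 3 equivalent might look like the following; the function names `rot13_obfuscate` and `rot13_restore` are hypothetical, and this is only a sketch of the same escape-then-rotate idea.

```python
import codecs

def rot13_obfuscate(text: str) -> str:
    # Escape non-ASCII characters to \uXXXX form, then rotate the ASCII letters.
    escaped = text.encode('unicode-escape').decode('ascii')
    return codecs.encode(escaped, 'rot13')

def rot13_restore(hidden: str) -> str:
    # Undo the rotation, then turn the escape sequences back into characters.
    escaped = codecs.decode(hidden, 'rot13')
    return escaped.encode('ascii').decode('unicode-escape')

hidden = rot13_obfuscate('\u570b\u78bc')
restored = rot13_restore(hidden)
print(hidden, restored)
```

As in the Python 2 version, escaping makes the result considerably longer than the original for non-ASCII text, which works against the size requirement in the question.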

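11 votes

Stack Overflow user

Answered 2021-02-10 17:14:42

It depends on the size of the input. For inputs over 1K, numpy is about 60 times faster (it runs in under 2% of the time of the naive Python code).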

import time
import numpy as np

mask = b'We are the knights who say "Ni"!'
mask_length = len(mask)

def mask_python(val: bytes) -> bytes:
    return bytes(c ^ mask[i % mask_length] for i, c in enumerate(val))

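def mask_numpy(val: bytes) -> bytes:
    # View the input as an array of unsigned bytes without copying it.
    arr = np.frombuffer(val, dtype=np.uint8)
    length = len(val)
    # Repeat the mask enough times to cover the input, then trim to length.
    repeats = -(-length // mask_length)  # ceiling division
    np_mask = np.tile(np.frombuffer(mask, dtype=np.uint8), repeats)[:length]
    masked = arr ^ np_mask
    return masked.tobytes()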


value = b'0123456789'
for i in range(9):
    start_py = time.perf_counter()
    masked_py = mask_python(value)
    end_py = time.perf_counter()

    start_np = time.perf_counter()
    masked_np = mask_numpy(value)
    end_np = time.perf_counter()

    assert masked_py == masked_np
    print(f"{i+1} {len(value)} {end_py-start_py} {end_np-start_np}")
    value = value * 10
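
If adding numpy as a dependency is not an option, one standard-library alternative is to XOR the whole buffer as a single large integer. This is a different technique from the one in this answer, the name `mask_int` is hypothetical, and no timings are claimed for it; it would need to be measured against the two functions above.

```python
def mask_int(val: bytes, mask: bytes) -> bytes:
    n = len(val)
    if n == 0:
        return b''
    # Repeat the mask to cover the input (ceiling division), then trim.
    reps = -(-n // len(mask))
    key = (mask * reps)[:n]
    # XOR both buffers as big integers, then convert back to n bytes.
    x = int.from_bytes(val, 'big') ^ int.from_bytes(key, 'big')
    return x.to_bytes(n, 'big')

def mask_naive(val: bytes, mask: bytes) -> bytes:
    # Byte-by-byte reference implementation, used only for comparison.
    return bytes(c ^ mask[i % len(mask)] for i, c in enumerate(val))

key = b'We are the knights who say "Ni"!'
sample = b'0123456789' * 10
masked = mask_int(sample, key)
print(masked == mask_naive(sample, key), mask_int(masked, key) == sample)
```

Like the other XOR variants, applying the function a second time with the same mask restores the original bytes.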

Note: I am new to numpy, so if anyone has feedback on my code, I would be happy to hear it in the comments.

1 vote

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/7488995
