我用Python (纯粹为了教育目的)做了一个散列程序,但是它真的很慢(对于一个4个字符串,大约120秒)。我怎么能加快速度?
目前的优化和解释:
CharSet.get_advance中的闭包:这些比属性查找更快。iter in PasswordCracker.crack:这将循环移动到C。CharSet.next作为一个array.array:比dict快。今后可能进行的优化:
advance有点慢,但我不知道如何加快速度。代码:
import hashlib
from string import printable
from time import time
import itertools
from array import array
ENCODING = "ascii" # utf-8 for unicode support
class CharSet():
def __init__(self, chars):
chars = to_bytes(chars)
self.chars = set(chars)
self.first = chars[0]
self.last = chars[-1]
self.next = array("B", [0] * 256)
for char, next_char in zip(chars, chars[1:]):
self.next[char] = next_char
def update_chars(self, new_chars):
new_chars = to_bytes(new_chars)
new_chars = set(new_chars) - self.chars
if new_chars: # if theres anything new
self.chars |= new_chars
new_chars = list(new_chars)
self.next[self.last] = new_chars[0]
self.last = new_chars[-1]
for char, next_char in zip(new_chars, new_chars[1:]):
self.next[char] = next_char
def get_advance(self, arr, hash_):
first = self.first
last = self.last
next_ = self.next
def advance():
for ind, byte in enumerate(arr):
if byte == last:
arr[ind] = first
else:
arr[ind] = next_[byte]
return hash_(arr)
arr.append(first)
return hash_(arr)
return advance
class PasswordCracker():
def __init__(self, hash_, chars=None):
self.hash = hash_
if chars is None:
chars = printable
self.char_set = CharSet(chars)
def update_chars(self, string):
self.char_set.update_chars(string)
def crack(self, hashed):
arr = bytearray()
advance = self.char_set.get_advance(arr, self.hash)
for _ in iter(advance, hashed):
pass
return arr
def to_bytes(string):
if isinstance(string, str):
return bytearray(string, ENCODING)
elif isinstance(string, (bytes, bytearray)):
return string
else:
raise TypeError(f"Cannot convert {string} to bytes")
def get_hasher(hash_):
def hasher(bytes):
return hash_(bytes).digest()
return hasher
md5 = get_hasher(hashlib.md5)
cracker = PasswordCracker(md5)
password = input("Enter password: ")
cracker.update_chars(password)
password = md5(to_bytes(password))
start = time()
cracked = cracker.crack(password)
end = time()
print(f"Password cracked: {cracked.decode(ENCODING)}")
print(f"Time: {end - start} seconds.")分析结果(密码为"pww"):
1333313 function calls in 1.500 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.500 1.500 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 main.py:31(get_advance)
333326 0.394 0.000 1.376 0.000 main.py:35(advance)
1 0.124 0.124 1.500 1.500 main.py:58(crack)
333326 0.311 0.000 0.982 0.000 main.py:74(hasher)
333326 0.265 0.000 0.265 0.000 {built-in method _hashlib.openssl_md5}
1 0.000 0.000 1.500 1.500 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.iter}
3 0.000 0.000 0.000 0.000 {method 'append' of 'bytearray' objects}
333326 0.405 0.000 0.405 0.000 {method 'digest' of '_hashlib.HASH' objects}分析结果(密码为"pwww",附加"w"):
133333314 function calls in 190.800 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 190.799 190.799 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 main.py:31(get_advance)
33333326 65.652 0.000 169.782 0.000 main.py:35(advance)
1 21.017 21.017 190.799 190.799 main.py:58(crack)
33333326 40.640 0.000 104.130 0.000 main.py:74(hasher)
33333326 27.957 0.000 27.957 0.000 {built-in method _hashlib.openssl_md5}
1 0.000 0.000 190.800 190.800 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.iter}
4 0.000 0.000 0.000 0.000 {method 'append' of 'bytearray' objects}
33333326 35.533 0.000 35.533 0.000 {method 'digest' of '_hashlib.HASH' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}发布于 2018-03-04 15:47:24
工具
这是一个容易的问题,但很难用电脑解决。最好使用低级语言(如c )。
您不需要手动创建密码,最好使用迭代工具库。
from hashlib import md5
from time import time
from string import printable
from itertools import product, count
def passwords(encoding):
chars = [c.encode(encoding) for c in printable]
for length in count(start=1):
for pwd in product(chars, repeat=length):
yield b''.join(pwd)
def crack(search_hash, encoding):
for pwd in passwords(encoding):
if md5(pwd).digest() == search_hash:
return pwd.decode(encoding)
if __name__ == "__main__":
encoding = 'ascii' # utf-8 for unicode support
password = 'pwww'
password_hash = md5(password.encode(encoding)).digest()
start = time()
cracked = crack(password_hash, encoding)
end = time()
print(f"Password cracked: {cracked}")
print(f"Time: {end - start} seconds.")通常最好的选择是from x import y,但是这里可以减少缓存
# import hashlib # usually bad one
# from hashlib import md5 # usually best one
from _md5 import md5 # ugly hack but speed up发布于 2018-03-03 17:17:17
我知道这是为了学习的目的,您对这个具体实现的性能很感兴趣。否则,我会告诉您,每次计算散列可能比存储它们稍微慢一点。
首先生成可能的密码列表不是更快吗?并行性和过度设计可能会使这个部分变慢,我确定99.9%,但是它会为其他部分设置一些很好的并行性。
from itertools import product
passwords = product(printable, repeat = 4)对我来说。对于范围(0,255),而不是可打印的,需要1.5秒。
然后,您可以在multiprocessing.dummy中使用multiprocessing.dummy来获取->生成+检查散列的其他方式。(参考参见https://stackoverflow.com/a/28463266/8695782|,我认为哈希生成和检查部件并行可能会有所帮助)。例如,我倾向于使用查找类型的结构,在生成/存储+重新加载之后,我希望O(1)在检索上进行。
我能理解为什么学习的目的可能会让你不想存储“彩虹表”并限制内存的使用,但请记住,在性能方面,总是会有速度与空间的权衡。至于空格,为什么所有255个字符,至少排除了一些控制字符。可以对代码和https://gizmodo.com/over-560-million-passwords-discovered-by-security-resea-1795254560进行基准测试。
https://codereview.stackexchange.com/questions/187830
复制相似问题