文章/答案/技术大牛

发布

社区首页 >问答首页 >强力哈希破碎机

问强力哈希破碎机
EN

Code Review用户

提问于 2018-02-18 23:34:39

回答 2查看 7.2K关注 0票数 9

我用Python (纯粹为了教育目的)做了一个散列程序，但是它真的很慢(对于一个4个字符串，大约120秒)。我怎么能加快速度？

目前的优化和解释：

CharSet.get_advance中的闭包:这些比属性查找更快。
iter in PasswordCracker.crack：这将循环移动到C。
CharSet.next作为一个array.array：比dict快。

今后可能进行的优化：

advance有点慢，但我不知道如何加快速度。

代码：

import hashlib
from string import printable
from time import time
import itertools
from array import array

ENCODING = "ascii" # utf-8 for unicode support

class CharSet():
  def __init__(self, chars):
    chars = to_bytes(chars)
    self.chars = set(chars)
    self.first = chars[0]
    self.last = chars[-1]
    self.next = array("B", [0] * 256)
    for char, next_char in zip(chars, chars[1:]):
      self.next[char] = next_char

  def update_chars(self, new_chars):
    new_chars = to_bytes(new_chars)
    new_chars = set(new_chars) - self.chars
    if new_chars: # if theres anything new
      self.chars |= new_chars
      new_chars = list(new_chars)
      self.next[self.last] = new_chars[0]
      self.last = new_chars[-1]
      for char, next_char in zip(new_chars, new_chars[1:]):
        self.next[char] = next_char

  def get_advance(self, arr, hash_):
    first = self.first
    last = self.last
    next_ = self.next
    def advance():
      for ind, byte in enumerate(arr):
        if byte == last:
          arr[ind] = first
        else:
          arr[ind] = next_[byte]
          return hash_(arr)

      arr.append(first)
      return hash_(arr)

    return advance

class PasswordCracker():
  def __init__(self, hash_, chars=None):
    self.hash = hash_
    if chars is None:
      chars = printable
    self.char_set = CharSet(chars)

  def update_chars(self, string):
    self.char_set.update_chars(string)

  def crack(self, hashed):
    arr = bytearray()
    advance = self.char_set.get_advance(arr, self.hash)
    for _ in iter(advance, hashed):
      pass
    return arr

def to_bytes(string):
  if isinstance(string, str):
    return bytearray(string, ENCODING)
  elif isinstance(string, (bytes, bytearray)):
    return string
  else:
    raise TypeError(f"Cannot convert {string} to bytes")

def get_hasher(hash_):
  def hasher(bytes):
    return hash_(bytes).digest()

  return hasher

md5 = get_hasher(hashlib.md5)

cracker = PasswordCracker(md5)

password = input("Enter password: ")

cracker.update_chars(password)
password = md5(to_bytes(password))

start = time()
cracked = cracker.crack(password)
end = time()
print(f"Password cracked: {cracked.decode(ENCODING)}")
print(f"Time: {end - start} seconds.")

分析结果(密码为"pww")：

      1333313 function calls in 1.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.500    1.500 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 main.py:31(get_advance)
   333326    0.394    0.000    1.376    0.000 main.py:35(advance)
        1    0.124    0.124    1.500    1.500 main.py:58(crack)
   333326    0.311    0.000    0.982    0.000 main.py:74(hasher)
   333326    0.265    0.000    0.265    0.000 {built-in method _hashlib.openssl_md5}
        1    0.000    0.000    1.500    1.500 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
        3    0.000    0.000    0.000    0.000 {method 'append' of 'bytearray' objects}
   333326    0.405    0.000    0.405    0.000 {method 'digest' of '_hashlib.HASH' objects}

分析结果(密码为"pwww"，附加"w")：

         133333314 function calls in 190.800 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  190.799  190.799 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 main.py:31(get_advance)
 33333326   65.652    0.000  169.782    0.000 main.py:35(advance)
        1   21.017   21.017  190.799  190.799 main.py:58(crack)
 33333326   40.640    0.000  104.130    0.000 main.py:74(hasher)
 33333326   27.957    0.000   27.957    0.000 {built-in method _hashlib.openssl_md5}
        1    0.000    0.000  190.800  190.800 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
        4    0.000    0.000    0.000    0.000 {method 'append' of 'bytearray' objects}
 33333326   35.533    0.000   35.533    0.000 {method 'digest' of '_hashlib.HASH' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

python

performance

python-3.x

回答 2

Code Review用户

发布于 2018-03-04 15:47:24

使用正确的作业

工具

这是一个容易的问题，但很难用电脑解决。最好使用低级语言(如c )。

生成可能的密码

您不需要手动创建密码，最好使用迭代工具库。

from hashlib import md5
from time import time
from string import printable
from itertools import product, count


def passwords(encoding):
    chars = [c.encode(encoding) for c in printable]
    for length in count(start=1):
        for pwd in product(chars, repeat=length):
            yield b''.join(pwd)


def crack(search_hash, encoding):
    for pwd in passwords(encoding):
        if md5(pwd).digest() == search_hash:
            return pwd.decode(encoding)


if __name__ == "__main__":
    encoding = 'ascii'  # utf-8 for unicode support
    password = 'pwww'
    password_hash = md5(password.encode(encoding)).digest()

    start = time()
    cracked = crack(password_hash, encoding)
    end = time()
    print(f"Password cracked: {cracked}")
    print(f"Time: {end - start} seconds.")

进口

通常最好的选择是from x import y，但是这里可以减少缓存

# import hashlib # usually bad one
# from hashlib import md5 # usually best one
from _md5 import md5  # ugly hack but speed up

票数 3

Code Review用户

发布于 2018-03-03 17:17:17

我知道这是为了学习的目的，您对这个具体实现的性能很感兴趣。否则，我会告诉您，每次计算散列可能比存储它们稍微慢一点。

首先生成可能的密码列表不是更快吗？并行性和过度设计可能会使这个部分变慢，我确定99.9%，但是它会为其他部分设置一些很好的并行性。

from itertools import product
passwords = product(printable, repeat = 4)

对我来说。对于范围(0,255)，而不是可打印的，需要1.5秒。

然后，您可以在multiprocessing.dummy中使用multiprocessing.dummy来获取->生成+检查散列的其他方式。(参考参见https://stackoverflow.com/a/28463266/8695782|，我认为哈希生成和检查部件并行可能会有所帮助)。例如，我倾向于使用查找类型的结构，在生成/存储+重新加载之后，我希望O(1)在检索上进行。

我能理解为什么学习的目的可能会让你不想存储“彩虹表”并限制内存的使用，但请记住，在性能方面，总是会有速度与空间的权衡。至于空格，为什么所有255个字符，至少排除了一些控制字符。可以对代码和https://gizmodo.com/over-560-million-passwords-discovered-by-security-resea-1795254560进行基准测试。

票数 1

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/187830

复制

相似问题

问强力哈希破碎机
EN

回答 2

Code Review用户

使用正确的作业

生成可能的密码

进口

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问强力哈希破碎机EN

回答 2

Code Review用户

使用正确的作业

生成可能的密码

进口

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问强力哈希破碎机
EN