Q: Python: hashing a PDF file vs. a downloaded object

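Stack Overflow user

Asked 2017-07-14 00:08:39

2 answers · 2.6K views · 0 followers · 2 votes

I want to check whether the content of a PDF file on a web server is identical to the content of a PDF file on my computer. I tried the following, but it did not work: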

>>> import requests, hashlib
>>> pdf = requests.get('<http link to pdf file>')
>>> type(pdf.content)
<class 'bytes'>
>>> type(repr(open('file.pdf','rb')).encode('utf-8'))
<class 'bytes'>
>>> hashlib.sha256(repr(open('file.pdf','rb')).encode('utf-8')) == hashlib.sha256(repr(pdf.content).encode('utf-8')).hexdigest()
False
>>> hashlib.sha256(repr(open('file.pdf','rb')).encode('utf-8')) == hashlib.sha256(pdf.content).hexdigest()
False

python

pdf

hash

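2 Answers

Stack Overflow user

Accepted answer

Answered 2017-07-14 00:11:49

You are hashing the UTF-8-encoded repr of the file object, not the contents of the file. In any case, there is no reason to use repr; just hash the contents directly.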

>>> with open('file.pdf', 'rb') as f:
...     h1 = hashlib.sha256(f.read()).digest()
>>> h2 = hashlib.sha256(pdf.content).digest()
>>> h1 == h2
True
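
For larger files, the same comparison can be done without reading the whole file into memory at once. The following is a minimal sketch, not part of the original answer: the helper names `sha256_of_file`, `sha256_of_bytes`, and `same_content` and the 64 KiB chunk size are assumptions chosen for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_of_bytes(data):
    """Return the SHA-256 hex digest of an in-memory bytes object."""
    return hashlib.sha256(data).hexdigest()


def same_content(path, data):
    """True if the file at `path` holds exactly the bytes in `data`."""
    return sha256_of_file(path) == sha256_of_bytes(data)
```

With the objects from the question, this would be called as `same_content('file.pdf', pdf.content)`. Both sides are compared as hex strings, so the comparison is between values of the same type.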

Votes: 4

Stack Overflow user

Answered 2017-07-14 00:13:29

The first hash is the hash of the representation of the file object (not of its contents):

repr(open('file.pdf','rb'))  
    # "<_io.BufferedReader name='file.pdf'>"
repr(open('file.pdf','rb')).encode('utf-8')  
    # b"<_io.BufferedReader name='file.pdf'>"

Your first hash is computed over the bytes b"<_io.BufferedReader name='file.pdf'>".
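
The point can be checked with a small sketch. The helper names `digest_of_repr` and `digest_of_content` are hypothetical and exist only for this illustration. Hashing the repr depends on the file name alone, so the digest stays the same even when the file's bytes change, whereas hashing the content tracks the bytes.

```python
import hashlib


def digest_of_repr(path):
    """Hash the repr of the file object, as the question's code did."""
    with open(path, "rb") as f:
        # repr(f) looks like "<_io.BufferedReader name='...'>"; no file bytes are read.
        return hashlib.sha256(repr(f).encode("utf-8")).hexdigest()


def digest_of_content(path):
    """Hash the actual bytes stored in the file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```

The question's comparisons also put a hash object on the left and the string returned by `hexdigest()` on the right. Those are different types, so that comparison is `False` regardless of what was hashed; call `hexdigest()` (or `digest()`) on both sides.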

Votes: 3

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/45085998
