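I'm trying to lazily load items from a compressed file hosted on Zenodo. My goal is to yield the items one at a time without saving the file to my computer. The problem is that an EOFError is raised right after the first non-empty line is read. How can I get past this?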
Here is my code:
```python
import requests as req
import json
from bz2 import BZ2Decompressor

def lazy_load(file_url):
    dec = BZ2Decompressor()
    with req.get(file_url, stream=True) as res:
        for chunk in res.iter_content(chunk_size=1024):
            data = dec.decompress(chunk).decode('utf-8')
            # do something with 'data'
            yield data

if __name__ == "__main__":
    with open('credentials.json') as f:
        creds = json.load(f)
    url = 'https://zenodo.org/api/records/'
    record_id = '4617285'
    filename = '10.Papers.nt.bz2'
    res = req.get(f'{url}{record_id}', params={'access_token': creds['zenodo_token']})
    for file in res.json()['files']:
        if file['key'] == filename:
            for item in lazy_load(file['links']['self']):
                # do something with 'item'
                pass
```

The error I get is as follows:
```
Traceback (most recent call last):
  File ".\mag_loader.py", line 51, in <module>
    for item in lazy_load(file['links']['self']):
  File ".\mag_loader.py", line 18, in lazy_load
    data = dec.decompress(chunk)
EOFError: End of stream already reached
```

To run the code you need a Zenodo access token, which requires an account. Once logged in, you can create a token here: https://zenodo.org/account/settings/applications/tokens/new/
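The same EOFError can be reproduced offline, without a Zenodo token, if one assumes the file is multi-stream (as the answer below explains). This sketch builds a two-stream payload in memory and feeds it in small chunks to a single `BZ2Decompressor`, the way the loop above does. The payload, the 16-byte chunk size, and the helper names `make_multistream` and `decompress_with_one_decompressor` are made up for illustration.

```python
import bz2


def make_multistream(parts):
    """Hypothetical helper: concatenate independently compressed bz2 streams."""
    return b"".join(bz2.compress(p) for p in parts)


def decompress_with_one_decompressor(payload, chunk_size=16):
    """Mimic the question's loop: one BZ2Decompressor for the whole input."""
    dec = bz2.BZ2Decompressor()
    out = []
    for i in range(0, len(payload), chunk_size):
        chunk = payload[i:i + chunk_size]
        # Once the first stream has ended, the next call raises EOFError.
        out.append(dec.decompress(chunk))
    return b"".join(out)


payload = make_multistream([b"line 1\n", b"line 2\n"])
try:
    decompress_with_one_decompressor(payload)
except EOFError as exc:
    print(f"EOFError: {exc}")
```

The first stream decodes normally; the error appears on the first chunk fed after that stream's end marker, which matches "fails after the first non-empty line".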
Posted on 2022-05-08 05:24:03
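I ran into a similar problem. It turned out that .bz2 files are often "multi-stream": a single .bz2 file is just a concatenation of independently compressed bz2 streams (parallel compressors such as pbzip2 produce files like this).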
However, the documentation for Python's bz2.BZ2Decompressor says:
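Note: This class does not transparently handle inputs containing multiple compressed streams, unlike decompress() and BZ2File. If you need to decompress a multi-stream input with BZ2Decompressor, you must use a new decompressor for each stream.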
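The difference described in that note can be checked directly. In this sketch (the two-stream payload is made up for illustration), `BZ2Decompressor` stops at the end of the first stream and exposes the remaining bytes through its `eof` and `unused_data` attributes, while the one-shot `bz2.decompress()` and `bz2.open()` (which wraps `BZ2File`) read through every stream.

```python
import bz2
import io

# Two independently compressed streams glued together (a "multi-stream" file).
payload = bz2.compress(b"first\n") + bz2.compress(b"second\n")

dec = bz2.BZ2Decompressor()
first = dec.decompress(payload)
print(first)                # only the first stream is decoded
print(dec.eof)              # True: this decompressor cannot be fed any more
print(dec.unused_data[:4])  # b'BZh9': header of the next stream

# The one-shot function and BZ2File handle all streams transparently.
print(bz2.decompress(payload))
with bz2.open(io.BytesIO(payload), "rt", encoding="utf-8") as fh:
    print(fh.readlines())
```

Because `bz2.open` accepts any readable binary file object, passing the `raw` attribute of a `requests` response opened with `stream=True` may be a simpler alternative to manual chunk handling; that route is an untested assumption here, not something verified against Zenodo.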
So I had to change the code like this:
```python
import requests as req
from bz2 import BZ2Decompressor

def lazy_load(file_url):
    dec = BZ2Decompressor()
    with req.get(file_url, stream=True) as res:
        for chunk in res.iter_content(chunk_size=1024):
            data = dec.decompress(chunk).decode('utf-8')
            # do something with 'data'
            yield data
            # ===== new code here =====
            # 'while' rather than 'if': one chunk may contain the end of a
            # stream followed by the start (or even all) of the next one.
            while dec.eof:
                leftover = dec.unused_data
                # you should see that 'leftover' is the start of a new stream,
                # beginning with b"BZh9..."
                print(f"EOF! {leftover=}")
                # a finished decompressor cannot be reused, so start a new one
                dec = BZ2Decompressor()
                data = dec.decompress(leftover).decode('utf-8')
                # do something with 'data' here too
                yield data
```

https://stackoverflow.com/questions/67974852
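Since the target file is N-Triples (one triple per line), it may be more convenient to yield complete lines than raw decompressed fragments. The sketch below is one way to do that, assuming the input is any iterable of compressed byte chunks (for example `res.iter_content(...)`) and the text is UTF-8; the function name `iter_bz2_lines` is hypothetical. It restarts the decompressor at each stream boundary as in the answer above, and it swaps the plain `.decode('utf-8')` for an incremental decoder so that a multi-byte character split between two decompressed outputs does not raise `UnicodeDecodeError`.

```python
import bz2
import codecs
from typing import Iterable, Iterator


def iter_bz2_lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text lines from a (possibly multi-stream) bz2 byte stream."""
    dec = bz2.BZ2Decompressor()
    text = codecs.getincrementaldecoder("utf-8")()
    pending = ""  # partial line carried over between chunks

    for chunk in chunks:
        while chunk:
            pending += text.decode(dec.decompress(chunk))
            if dec.eof:
                # Stream finished: the remaining bytes belong to the next stream.
                chunk = dec.unused_data
                dec = bz2.BZ2Decompressor()
            else:
                chunk = b""
        # Emit every complete line; keep the trailing fragment for later.
        *lines, pending = pending.split("\n")
        yield from lines

    # Flush the decoder and emit a final line that has no trailing newline.
    pending += text.decode(b"", final=True)
    if pending:
        yield pending
```

Inside the `with req.get(file_url, stream=True) as res:` block, it would be used as `for line in iter_bz2_lines(res.iter_content(chunk_size=65536)): ...`. Note that this sketch does not detect a file truncated in the middle of its last stream.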