
Python bz2 raises EOFError before reading the whole file

Stack Overflow user

Asked 2021-06-14 17:38:10

1 answer, 99 views, 0 followers, 0 votes

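I am trying to lazily load items from a compressed file hosted on Zenodo. My goal is to yield the items one at a time without storing the file on my computer. The problem is that an EOFError is raised right after the first non-empty line has been read. How can I get past this?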

Here is my code:

import requests as req
import json
from bz2 import BZ2Decompressor


def lazy_load(file_url):
    dec = BZ2Decompressor()
    with req.get(file_url, stream=True) as res:
        for chunk in res.iter_content(chunk_size=1024):
            data = dec.decompress(chunk).decode('utf-8')
            # do something with 'data', e.g. hand it to the caller
            yield data


if __name__ == "__main__":
    with open('credentials.json') as f:
        creds = json.load(f)
    url = 'https://zenodo.org/api/records/'
    record_id = '4617285'
    filename = '10.Papers.nt.bz2'
    res = req.get(f'{url}{record_id}', params={'access_token': creds['zenodo_token']})
    for file in res.json()['files']:
        if file['key'] == filename:
            for item in lazy_load(file['links']['self']):
                # do something with 'item'
                pass

The error I get is:

Traceback (most recent call last):
  File ".\mag_loader.py", line 51, in <module>
    for item in lazy_load(file['links']['self']):
  File ".\mag_loader.py", line 18, in lazy_load
    data = dec.decompress(chunk)
EOFError: End of stream already reached

To run the code you need a Zenodo access token, which requires an account. Once logged in, you can create a token here: https://zenodo.org/account/settings/applications/tokens/new/

Tags: python, python-requests, stream, lazy-loading, bz2

1 answer

Stack Overflow user

Accepted answer

Answered 2022-05-08 05:24:03

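I ran into a similar problem. It turned out the cause was that a .bz2 file can be "multi-stream": the file is simply several independently compressed bz2 streams written one after another. This happens, for example, when .bz2 files are concatenated or when a parallel compressor such as pbzip2 is used.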
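A minimal offline reproduction of the question's error, using only the standard library and no Zenodo token: two streams are concatenated in memory and fed to a single decompressor in small pieces, roughly the way res.iter_content() delivers a response body. The helper name chunks and the 16-byte piece size are illustrative assumptions, not part of the original code.

```python
import bz2


def chunks(data, size):
    """Hypothetical helper: split bytes into fixed-size pieces, standing in for iter_content()."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


# Two independently compressed streams back to back: a minimal multi-stream payload.
multi = bz2.compress(b"first line\n") + bz2.compress(b"second line\n")

dec = bz2.BZ2Decompressor()
output = b""
error = None
try:
    for piece in chunks(multi, 16):
        # Once the first stream has ended, the next call raises EOFError.
        output += dec.decompress(piece)
except EOFError as exc:
    error = exc

print(output)  # b'first line\n'  (only the first stream was decoded)
print(error)   # End of stream already reached
```

The first stream decodes normally; the failure only appears on the call after its end-of-stream marker, which matches the question's symptom of the error showing up after the first line.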

The documentation for Python's bz2.BZ2Decompressor says:

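Note: This class does not transparently handle inputs containing multiple compressed streams, unlike decompress() and BZ2File. If you need to decompress a multi-stream input with BZ2Decompressor, you must use a new decompressor for each stream.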
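A short sketch of what that note means in practice, built on an in-memory two-stream payload (the sample strings are arbitrary): the one-shot bz2.decompress() reads every stream, while a single BZ2Decompressor stops after the first one, sets eof, and keeps the remaining bytes in unused_data. Those bytes begin with the "BZh" header of the next stream and can be handed to a fresh decompressor.

```python
import bz2

# Two independent compressed streams concatenated into one payload.
multi = bz2.compress(b"stream one\n") + bz2.compress(b"stream two\n")

# The one-shot function handles multi-stream input transparently.
print(bz2.decompress(multi))  # b'stream one\nstream two\n'

# A single decompressor stops at the end of the first stream.
dec = bz2.BZ2Decompressor()
first = dec.decompress(multi)
print(first)                  # b'stream one\n'
print(dec.eof)                # True
print(dec.unused_data[:3])    # b'BZh' (header of the next stream)

# As the documentation says: use a new decompressor for the next stream.
dec2 = bz2.BZ2Decompressor()
second = dec2.decompress(dec.unused_data)
print(second)                 # b'stream two\n'
```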

So I had to change the code like this:

import requests as req
from bz2 import BZ2Decompressor


def lazy_load(file_url):
    dec = BZ2Decompressor()
    with req.get(file_url, stream=True) as res:
        for chunk in res.iter_content(chunk_size=1024):
            data = dec.decompress(chunk).decode('utf-8')
            # do something with 'data'

            # ===== new code here =====
            # 'while' rather than 'if': the leftover bytes could themselves
            # contain a complete stream followed by the start of another one
            while dec.eof:
                leftover = dec.unused_data
                # you should see that 'leftover' is the start of a new stream
                # beginning with "BZh9..." (or empty, if the stream ended
                # exactly at the chunk boundary)
                print(f"EOF! {leftover=}")
                # we have to start a new decompressor
                dec = BZ2Decompressor()
                data = dec.decompress(leftover).decode('utf-8')
                # do something with 'data' here too
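Putting the pieces together, here is a minimal sketch of a generator that does what the question asks (yield items without saving the file), under some stated assumptions. The name iter_multistream_lines is hypothetical. It takes any iterable of byte chunks instead of a URL, so it can be tested without network access. It assumes the payload is newline-delimited UTF-8 text (N-Triples files such as 10.Papers.nt.bz2 hold one triple per line) and yields one line at a time. It also swaps the per-piece .decode('utf-8') used in the code above for an incremental decoder, because a multi-byte character can be split between two decompressed pieces, and decoding each piece on its own would then fail.

```python
import bz2
import codecs
from typing import Iterable, Iterator


def iter_multistream_lines(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Lazily decompress a (possibly multi-stream) .bz2 byte stream and yield text lines.

    'chunks' is any iterable of bytes, e.g. res.iter_content(chunk_size=1024)
    from a requests response opened with stream=True.
    """
    dec = bz2.BZ2Decompressor()
    # Incremental decoder: keeps a partial multi-byte character until the rest arrives.
    text = codecs.getincrementaldecoder(encoding)()
    pending = ""  # incomplete last line carried over between chunks

    for chunk in chunks:
        while chunk:
            if dec.eof:
                # The previous stream has ended: one new decompressor per stream.
                dec = bz2.BZ2Decompressor()
            pending += text.decode(dec.decompress(chunk))
            # Bytes after the end of a stream belong to the next stream.
            chunk = dec.unused_data if dec.eof else b""
            *lines, pending = pending.split("\n")
            yield from lines

    # Flush the decoder and emit a final line that had no trailing newline.
    pending += text.decode(b"", final=True)
    if pending:
        yield pending
```

With requests it could be called as iter_multistream_lines(res.iter_content(chunk_size=1024)) inside a "with req.get(file_url, stream=True) as res:" block. Keeping the HTTP part outside the generator is a design choice that makes the decompression logic easy to test. The sketch does not detect a truncated download; checking that the last decompressor reached eof after the loop would be one way to add that.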

Votes: 0

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67974852
