
Q: Unpacking binary files with struct.unpack vs np.frombuffer vs np.ndarray vs np.fromfile

Stack Overflow user

Asked 2019-02-13 21:46:17

1 answer · 2.5K views · 0 followers · 3 votes

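I am unpacking large binary files (~1 GB) that contain many different data types. I am in the early stages of writing the loop that converts each byte. I have been using struct.unpack, but I recently thought it would run faster if I used numpy. However, switching to numpy has slowed my program down. I have tried: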

struct.unpack
np.fromfile
np.frombuffer
np.ndarray

Note: in the np.fromfile approach I keep the file open and seek through it, rather than loading it into memory.

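1) struct.unpack

import struct

data = {}
with open(file="file_loc", mode="rb") as file:
    RAW = file.read()
byte = 0
total = len(RAW)  # renamed so the built-in len() is not shadowed
while byte < total:
    # 10-byte big-endian header: uint16, uint16, uint32, uint16
    header = struct.unpack(">HHIH", RAW[byte:byte + 10])
    size = header[1]  # assumed to be the record size in bytes, header included
    loc = str(header[3])
    # payload: the size - 10 bytes that follow the header, one uint8 each
    data[loc] = struct.unpack(f">{size - 10}B", RAW[byte + 10:byte + size])
    byte += size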

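2) np.fromfile

import os
import numpy as np

dt = np.dtype(">u2,>u2,>u4,>u2")
with open(file="file_loc", mode="rb") as RAW:
    byte, end = 0, os.path.getsize("file_loc")
    while byte < end:  # same loop as above
        # np.fromfile takes the open file and reads at its current position
        header = np.fromfile(RAW, dtype=dt, count=1)[0]
        size = int(header[1])
        data = np.fromfile(RAW, dtype=">u1", count=size - 10)
        byte += size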

3) np.ndarray

import numpy as np

dt = np.dtype(">u2,>u2,>u4,>u2")
with open(file="file_loc", mode="rb") as file:
    RAW = file.read()
byte = 0
while byte < len(RAW):  # same loop as above
    header = np.ndarray(buffer=RAW[byte:byte + 10], dtype=dt, shape=1)[0]
    size = int(header[1])
    data = np.ndarray(buffer=RAW[byte + 10:byte + size], dtype=">u1", shape=size - 10)
    byte += size

4) np.frombuffer: pretty much the same as the np.ndarray version above, except using np.frombuffer().
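A minimal sketch of variant 4, assuming each record is a 10-byte big-endian header whose second field is the total record size (header included), followed by uint8 payload bytes. The helper names `make_record` and `parse_records` and the sample values are hypothetical; they exist only to build a small in-memory stream. np.frombuffer accepts `count` and `offset`, so the buffer does not have to be sliced:

```python
import struct
import numpy as np

HEADER_DT = np.dtype(">u2,>u2,>u4,>u2")  # same layout as ">HHIH", 10 bytes


def make_record(loc, payload):
    """Build one synthetic record (hypothetical layout, for illustration only)."""
    size = 10 + len(payload)  # assumption: size counts the header too
    return struct.pack(">HHIH", 0, size, 0, loc) + bytes(payload)


def parse_records(raw):
    """Walk the buffer record by record with np.frombuffer."""
    data = {}
    byte = 0
    while byte < len(raw):
        # Read one header at the current offset, without slicing raw
        header = np.frombuffer(raw, dtype=HEADER_DT, count=1, offset=byte)[0]
        size = int(header[1])
        loc = str(header[3])
        # The payload starts right after the 10-byte header
        data[loc] = np.frombuffer(raw, dtype=">u1", count=size - 10, offset=byte + 10)
        byte += size
    return data


raw = make_record(7, [1, 2, 3]) + make_record(9, [4, 5])
parsed = parse_records(raw)
```

The returned arrays are read-only views into `raw`, so no payload bytes are copied.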

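All of the numpy implementations run at about half the speed of the struct.unpack approach, which is not what I expected.

Please let me know if there is anything I can do to improve performance.

Also, I just typed this from memory, so it may contain some errors.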

python

numpy

struct

1 Answer

Stack Overflow user

Answered 2019-02-14 00:16:04

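I haven't used struct much, but between your code and the docs I got it working on a buffer that stores an array of integers.

Create a byte array/string from a numpy array: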

In [81]: arr = np.arange(1000)
In [82]: barr = arr.tobytes()
In [83]: type(barr)
Out[83]: bytes
In [84]: len(barr)
Out[84]: 8000

The reverse of tobytes is np.frombuffer:

In [85]: x = np.frombuffer(barr, dtype=int)
In [86]: x[:10]
Out[86]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [87]: np.allclose(x,arr)
Out[87]: True

np.ndarray also works, though direct use of this constructor is usually discouraged:

In [88]: x = np.ndarray(buffer=barr, dtype=int, shape=(1000,))
In [89]: np.allclose(x,arr)
Out[89]: True

To use struct, I need to create a format that includes the length, "1000 long":

In [90]: tup = struct.unpack('1000l', barr)
In [91]: len(tup)
Out[91]: 1000
In [92]: tup[:10]
Out[92]: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
In [93]: np.allclose(np.array(tup),arr)
Out[93]: True
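The '1000l' format works here because native-mode `l` is 8 bytes on this machine, which matches the 8000-byte buffer in Out[84]. That size is platform dependent (a C long is 4 bytes on Windows, for example), so a sketch that pins the item size and byte order on both sides may be more portable. The choice of little-endian int64 below is an assumption for illustration:

```python
import struct
import numpy as np

arr = np.arange(1000, dtype="<i8")  # explicit little-endian 64-bit integers
barr = arr.tobytes()

# "<" selects standard sizes with no padding; "q" is always 8 bytes there
tup = struct.unpack("<1000q", barr)
x = np.frombuffer(barr, dtype="<i8")

native_long = struct.calcsize("l")  # 4 or 8, depending on the platform
fixed_q = struct.calcsize("<q")     # 8 on every platform
```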

Now that we've established equivalent ways of reading the buffer, let's do some timings:

In [94]: timeit x = np.frombuffer(barr, dtype=int)
617 ns ± 0.806 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [95]: timeit x = np.ndarray(buffer=barr, dtype=int, shape=(1000,))
1.11 µs ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [96]: timeit tup = struct.unpack('1000l', barr)
19 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [97]: timeit tup = np.array(struct.unpack('1000l', barr))
87.5 µs ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

frombuffer looks pretty good.
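The timings above come from IPython's timeit magic. Outside IPython, a rough equivalent with the standard timeit module might look like the sketch below. The `time_call` helper and the repeat count are arbitrary choices, and the absolute numbers will differ by machine and library version:

```python
import struct
import timeit
import numpy as np

arr = np.arange(1000, dtype=np.int64)
barr = arr.tobytes()
fmt = "=1000q"  # standard-size 8-byte integers, native byte order


def time_call(func, number=2000):
    """Return the mean seconds per call over `number` calls."""
    return timeit.timeit(func, number=number) / number


timings = {
    "np.frombuffer": time_call(lambda: np.frombuffer(barr, dtype=np.int64)),
    "np.ndarray": time_call(
        lambda: np.ndarray(buffer=barr, dtype=np.int64, shape=(1000,))
    ),
    "struct.unpack": time_call(lambda: struct.unpack(fmt, barr)),
    "np.array(struct.unpack)": time_call(
        lambda: np.array(struct.unpack(fmt, barr))
    ),
}

for name, seconds in timings.items():
    print(f"{name:>24}: {seconds * 1e6:8.2f} µs per call")
```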

Your struct.unpack loop confuses me. I don't think it is doing the same thing as frombuffer. But as I said at the start, I haven't used struct much.
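One possible reading of the difference: the timings above decode 1000 values in a single call, while the question's loop makes two small calls per record, so fixed per-call overhead may dominate there. A hedged sketch for measuring that on a single 10-byte header follows. The sample header values are made up, and struct.Struct with unpack_from is a swapped-in variant of the question's struct.unpack call that avoids re-parsing the format string and slicing the buffer:

```python
import struct
import timeit
import numpy as np

HEADER = struct.Struct(">HHIH")          # precompiled format, 10 bytes
HEADER_DT = np.dtype(">u2,>u2,>u4,>u2")  # numpy equivalent of the same layout

raw = HEADER.pack(1, 13, 0, 7) + bytes([1, 2, 3])  # made-up record


def header_struct(offset=0):
    """Decode one header with struct, reading in place at `offset`."""
    return HEADER.unpack_from(raw, offset)


def header_numpy(offset=0):
    """Decode one header with numpy, reading in place at `offset`."""
    return np.frombuffer(raw, dtype=HEADER_DT, count=1, offset=offset)[0]


# Both approaches decode the same four fields
same = header_struct() == header_numpy().item()

n = 20_000
t_struct = timeit.timeit(header_struct, number=n) / n
t_numpy = timeit.timeit(header_numpy, number=n) / n
print(f"struct: {t_struct * 1e9:.0f} ns, numpy: {t_numpy * 1e9:.0f} ns per header")
```

Which of the two comes out ahead is something to measure on the actual data rather than assume.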

1 vote

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/54679949
