Someone recently asked how to do a file slurp in Python, and the accepted answer suggested something like:

with open('x.txt') as x: f = x.read()

How would I go about doing this to read the file in and convert the endian representation of the data?
For example, I have a 1 GB binary file that is just a bunch of single-precision floats packed big-endian, and I want to convert it to little-endian and dump it into a numpy array. Below is the function I wrote to accomplish this, along with some real code that calls it. I use struct.unpack for the endian conversion and tried to speed everything up by using mmap.
My question, then: am I using the slurp correctly with mmap and struct.unpack? Is there a cleaner, faster way to do this? Right now what I have works, but I would really like to learn how to do it better.
Thanks in advance!
#!/usr/bin/python
from struct import unpack
import mmap
import numpy as np

def mmapChannel(arrayName, fileName, channelNo, line_count, sample_count):
    """
    We need to read in the asf internal file and convert it into a numpy array.
    It is stored as a single row, and is binary. The number of lines (rows), samples (columns),
    and channels all come from the .meta text file.
    Also, internal format files are packed big endian, but most systems use little endian, so we need
    to make that conversion as well.
    Memory mapping seemed to improve the ingestion speed a bit.
    """
    # memory-map the file; size 0 means the whole file
    # length = line_count * sample_count * arrayName.itemsize
    print "\tMemory Mapping..."
    with open(fileName, "rb") as f:
        map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        map.seek(channelNo * line_count * sample_count * arrayName.itemsize)
        for i in xrange(line_count * sample_count):
            arrayName[0, i] = unpack('>f', map.read(arrayName.itemsize))[0]
        # Same method as above, just more verbose for the maintenance programmer.
        # for i in xrange(line_count * sample_count):  # row
        #     be_float = map.read(arrayName.itemsize)  # arrayName.itemsize should be 4 for float32
        #     le_float = unpack('>f', be_float)[0]     # > for big endian, < for little endian
        #     arrayName[0, i] = le_float
        map.close()
    return arrayName
print "Initializing the Amp HH HV, and Phase HH HV arrays..."
HHamp = np.ones((1, line_count*sample_count), dtype='float32')
HHphase = np.ones((1, line_count*sample_count), dtype='float32')
HVamp = np.ones((1, line_count*sample_count), dtype='float32')
HVphase = np.ones((1, line_count*sample_count), dtype='float32')
print "Ingesting HH_Amp..."
HHamp = mmapChannel(HHamp, 'ALPSRP042301700-P1.1__A.img', 0, line_count, sample_count)
print "Ingesting HH_phase..."
HHphase = mmapChannel(HHphase, 'ALPSRP042301700-P1.1__A.img', 1, line_count, sample_count)
print "Ingesting HV_AMP..."
HVamp = mmapChannel(HVamp, 'ALPSRP042301700-P1.1__A.img', 2, line_count, sample_count)
print "Ingesting HV_phase..."
HVphase = mmapChannel(HVphase, 'ALPSRP042301700-P1.1__A.img', 3, line_count, sample_count)
print "Reshaping...."
HHamp_orig = HHamp.reshape(line_count, -1)
HHphase_orig = HHphase.reshape(line_count, -1)
HVamp_orig = HVamp.reshape(line_count, -1)
HVphase_orig = HVphase.reshape(line_count, -1)

Posted on 2009-10-28 04:19:50
with open(fileName, "rb") as f:
    arrayName = numpy.fromfile(f, numpy.float32)
arrayName.byteswap(True)

Pretty hard to beat for speed and conciseness ;-). For byteswap, see here (the True argument means "do it in place"); for fromfile, see here.
This works on little-endian machines (the data is big-endian, so the byteswap is needed). You can test whether that is the case and do the byteswap conditionally, changing the last line from an unconditional call into, for example:
if struct.pack('=f', 2.3) == struct.pack('<f', 2.3):
    arrayName.byteswap(True)

i.e., a call to byteswap conditional on a test for little-endianness.
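Putting the pieces of this answer together, here is a minimal runnable sketch (Python 3 syntax; the tiny file and its contents are made up to stand in for the real 1 GB .img file):

```python
import struct
import numpy as np

# Write a few big-endian float32 values to stand in for the real data file.
values = [1.0, 2.5, -3.25]
with open('sample_be.img', 'wb') as f:
    f.write(struct.pack('>%df' % len(values), *values))

# Slurp the whole file into a float32 array in one call.
with open('sample_be.img', 'rb') as f:
    arr = np.fromfile(f, np.float32)

# Swap bytes in place only on little-endian machines, i.e. when
# native ('=') and little-endian ('<') packing agree.
if struct.pack('=f', 2.3) == struct.pack('<f', 2.3):
    arr.byteswap(True)

print(arr.tolist())
```

Either way, the array ends up holding the values in native byte order.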
Posted on 2009-10-28 04:42:18
A slight modification of @Alex Martelli's answer:
arr = numpy.fromfile(filename, numpy.dtype('>f4'))
# no byteswap is needed regardless of the endianness of the machine

Posted on 2009-10-28 02:30:38
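The big-endian dtype variant above can be sketched as follows (the file name and contents are synthetic, for illustration). Declaring the on-disk byte order in the dtype lets numpy handle the conversion, so the same line works on any machine:

```python
import struct
import numpy as np

# A synthetic big-endian float32 file standing in for the real data.
with open('sample_be.img', 'wb') as f:
    f.write(struct.pack('>3f', 1.0, 2.5, -3.25))

# '>f4' tells numpy the data on disk is big-endian 4-byte floats,
# so no explicit byteswap is needed regardless of machine endianness.
arr = np.fromfile('sample_be.img', np.dtype('>f4'))

print(arr.tolist())
```

If downstream code wants a native-endian array, arr.astype(np.float32) makes a converted copy.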
You could put together an ASM-based solution using CorePy. I wonder, though, whether you might be able to gain enough performance from some other part of your algorithm. I/O and manipulation of 1 GB chunks of data are going to take a while no matter how you slice it.
One other thing you might find helpful is to switch to C once you have prototyped the algorithm in Python. I did this once for an operation on a whole-world DEM (elevation) dataset. The whole thing was much more tolerable once I got away from the interpreted script.
https://stackoverflow.com/questions/1632673