
Q: PyTables dtype alignment issue

Stack Overflow user

Asked 2014-02-21 05:45:34

Answers: 1  Views: 158  Followers: 0  Votes: 1

Consider the following code:

import os
import numpy as np
import tables as tb

# Pass the field-names and their respective datatypes as
# a description to the table
dt = np.dtype([('doc_id', 'u4'), ('word', 'u4'), 
    ('tfidf', 'f4')], align=True)

# Open a h5 file and create a table
f = tb.openFile('corpus.h5', 'w')
t = f.createTable(f.root, 'table', dt, 'train set',
    filters=tb.Filters(5, 'blosc'))

# Fill the table with 20 rows of test data
r = t.row
for i in xrange(20):
    r['doc_id'] = i
    r['word'] = np.random.randint(1000000)
    r['tfidf'] = np.random.rand()
    r.append()
t.flush()

# structured array from table
sa = t[:]

f.close()
os.remove('corpus.h5')

I passed in an aligned dtype object, but when I inspect sa I get the following:

print dt
print "aligned?", dt.isalignedstruct
print
print sa.dtype
print "aligned?", sa.dtype.isalignedstruct

>>> 

    {'names':['doc_id','word','tfidf'], 'formats':['<u4','<u4','<f4'], 'offsets':[0,4,8], 'itemsize':12, 'aligned':True}
    aligned? True

    [('doc_id', '<u4'), ('word', '<u4'), ('tfidf', '<f4')]
    aligned? False

The structured array is not aligned. Is there currently no way to enforce alignment in PyTables, or am I doing something wrong?
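A minimal NumPy-only sketch of the symptom. It assumes (this is not confirmed by the PyTables documentation) that the dtype coming back from the table behaves like one rebuilt from the plain field list without align=True; the helper offsets is hypothetical. It shows that isalignedstruct is a flag on the dtype object, while the byte layout can still be identical:

```python
import numpy as np

# The aligned dtype from the question
aligned_dt = np.dtype([('doc_id', 'u4'), ('word', 'u4'), ('tfidf', 'f4')], align=True)

# Assumption: the table returns a dtype equivalent to one rebuilt from the
# plain (name, format) list, i.e. without align=True.
rebuilt_dt = np.dtype(aligned_dt.descr)

def offsets(dt):
    """Hypothetical helper: byte offset of every field, in declaration order."""
    return [dt.fields[name][1] for name in dt.names]

print(aligned_dt.isalignedstruct, rebuilt_dt.isalignedstruct)  # flag differs
print(offsets(aligned_dt), offsets(rebuilt_dt))                # same offsets
print(aligned_dt.itemsize, rebuilt_dt.itemsize)                # same size
```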

Edit: I noticed that my question is similar to this one, but I copied and tried the answer given there, and it didn't work either.

Edit 2: (see Joel's answer below)

I copied Joel's answer and tested whether the data actually unpacks correctly through Cython. It turns out it does:

In [1]: %load_ext cythonmagic

In [2]: %%cython -f -c=-O3
   ...: import numpy as np
   ...: cimport numpy as np
   ...: import tables as tb
   ...: f = tb.openFile("corpus.h5", "r")
   ...: t = f.root.table
   ...: cdef struct Word: # notice how this is not packed
   ...:     np.uint32_t doc_id, word
   ...:     np.float32_t tfidf
   ...: def main(): # <-- np arrays in Cython have to be locally declared, so put array in a function
   ...:     cdef np.ndarray[Word] sa = t[:3]
   ...:     print sa
   ...:     print "aligned?", sa.dtype.isalignedstruct
   ...: main()
   ...: f.close()
   ...: 
[(0L, 232880L, 0.2658001184463501) (1L, 605285L, 0.9921777248382568) (2L, 86609L, 0.5266860723495483)]
aligned? False
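One plausible reason the typed buffer above accepted t[:3] even though isalignedstruct is False: the unpacked C struct Word has the same field offsets and total size as the table's dtype. A sketch that checks this with ctypes; the ctypes class is a hypothetical mirror of the Cython struct, not something PyTables or Cython generates:

```python
import ctypes
import numpy as np

class Word(ctypes.Structure):
    # Same field order and types as the Cython `cdef struct Word`; no _pack_,
    # so the platform C compiler's natural alignment rules apply.
    _fields_ = [('doc_id', ctypes.c_uint32),
                ('word', ctypes.c_uint32),
                ('tfidf', ctypes.c_float)]

# The dtype the table returned (aligned flag not set), written out by hand
table_dt = np.dtype([('doc_id', '<u4'), ('word', '<u4'), ('tfidf', '<f4')])

# Byte offset of each field on the C side and on the NumPy side
c_offsets = [getattr(Word, name).offset for name, _ in Word._fields_]
np_offsets = [table_dt.fields[name][1] for name in table_dt.names]

print(ctypes.sizeof(Word), table_dt.itemsize)
print(c_offsets, np_offsets)
```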

Tags: python, numpy, memory-alignment, pytables

1 answer

Stack Overflow user

Accepted answer

Answered 2014-02-21 14:01:19

Currently, there is no way to align data in PyTables :(

In practice, I do one of two things to work around this:

1. Add an extra step: np.require(sa, dtype=dt, requirements='ACO'), or
2. Order the fields in my dtype description so that they are naturally aligned.
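A sketch of the first workaround. The stand-in array sa below is hypothetical (built directly with NumPy rather than read from the table). Note that the 'A' requirement refers to the array's ALIGNED flag, i.e. whether the data buffer is suitably aligned, which is a different thing from dtype.isalignedstruct. The final view(aligned_dt) step is an alternative not given in the answer; it only works because both dtypes have the same itemsize and field offsets:

```python
import numpy as np

aligned_dt = np.dtype([('doc_id', 'u4'), ('word', 'u4'), ('tfidf', 'f4')], align=True)

# Hypothetical stand-in for `sa = t[:]`: same fields, but the dtype lacks
# the aligned flag.
sa = np.zeros(3, dtype=np.dtype(aligned_dt.descr))
sa['doc_id'] = [0, 1, 2]
sa['word'] = [232880, 605285, 86609]
sa['tfidf'] = [0.25, 0.5, 0.75]

# Option 1 from the answer: 'A' = ALIGNED, 'C' = C_CONTIGUOUS, 'O' = OWNDATA
req = np.require(sa, dtype=aligned_dt, requirements='ACO')
print(req.flags.aligned, req.flags.c_contiguous, req.flags.owndata)

# Alternative (not from the answer): since the byte layouts are identical,
# a zero-copy view reattaches the aligned dtype object itself.
viewed = sa.view(aligned_dt)
print(viewed.dtype.isalignedstruct, viewed['word'].tolist())
```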

As an example of the second option, suppose I have the following dtype:

dt = np.dtype([('f1', np.bool_), ('f2', '<i4'), ('f3', '<f8')], align=True)

If you print dt.descr, you will see that padding has been inserted to align the data:

dt.descr >>> [('f1', '|b1'), ('', '|V3'), ('f2', '<i4'), ('f3', '<f8')]
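A quick check of the padding NumPy inserts for this dtype. It is written with np.bool_, since the plain np.bool alias is unavailable in some NumPy releases (it was removed in 1.24):

```python
import numpy as np

dt = np.dtype([('f1', np.bool_), ('f2', '<i4'), ('f3', '<f8')], align=True)

print(dt.descr)                                   # padding shows up as ('', '|V3')
print([dt.fields[name][1] for name in dt.names])  # byte offset of each field
print(dt.itemsize)                                # total record size in bytes
```

The 3 bytes after f1 push f2 to offset 4, which is a multiple of its 4-byte size.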

However, if I order my dtype like this (from the largest field to the smallest):

dt = np.dtype([('f3', '<f8'), ('f2', '<i4'), ('f1', np.bool_)])

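the data is now aligned whether I specify align=True or align=False. (Strictly, this holds for the field offsets within each record, which are 0, 8 and 12 either way; without align=True the itemsize is 13 instead of 16, so the trailing padding that keeps f3 aligned in the second and later records of an array is not added.)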
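A sketch comparing the reordered dtype with and without align=True. The helper functions are hypothetical: one checks whether each field sits at a multiple of its own alignment inside a single record, the other whether the record size keeps the fields of later records aligned as well:

```python
import numpy as np

fields = [('f3', '<f8'), ('f2', '<i4'), ('f1', np.bool_)]
packed = np.dtype(fields)               # align=False (the default)
aligned = np.dtype(fields, align=True)

def offsets(dt):
    """Byte offset of each field, in declaration order."""
    return [dt.fields[n][1] for n in dt.names]

def record_fields_aligned(dt):
    """True if every field offset is a multiple of that field's alignment."""
    return all(dt.fields[n][1] % dt.fields[n][0].alignment == 0 for n in dt.names)

def stride_keeps_alignment(dt):
    """True if the record size is a multiple of the largest field alignment,
    so fields in the 2nd, 3rd, ... records of an array stay aligned too."""
    return dt.itemsize % max(dt.fields[n][0].alignment for n in dt.names) == 0

for dt in (packed, aligned):
    print(offsets(dt), dt.itemsize, record_fields_aligned(dt), stride_keeps_alignment(dt))
```

Within one record the offsets are the same either way; the difference is the 3 bytes of trailing padding, which only matters once you have an array of records.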

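Someone please correct me if I'm wrong, but even when dt.isalignedstruct is False, a dtype whose fields are ordered as shown above is technically aligned. This has worked for me in applications where I need to pass aligned data to C.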
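For the C use case, one way to sanity-check a layout is to compare it against a ctypes.Structure with the same fields, since ctypes follows the platform C compiler's alignment rules. The struct name Record is hypothetical:

```python
import ctypes
import numpy as np

class Record(ctypes.Structure):
    # Hypothetical C struct mirroring the reordered dtype; no _pack_,
    # so natural alignment (and trailing padding) applies.
    _fields_ = [('f3', ctypes.c_double),
                ('f2', ctypes.c_int32),
                ('f1', ctypes.c_bool)]

fields = [('f3', '<f8'), ('f2', '<i4'), ('f1', np.bool_)]
packed = np.dtype(fields)
aligned = np.dtype(fields, align=True)

c_offsets = [getattr(Record, name).offset for name, _ in Record._fields_]
print(c_offsets, [aligned.fields[n][1] for n in aligned.names])
print(ctypes.sizeof(Record), aligned.itemsize, packed.itemsize)
```

If the C side expects an array of such structs, the packed dtype's 13-byte stride would not match sizeof(Record), whereas the align=True dtype does.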

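In the example you provided, sa.dtype.isalignedstruct is False, but given that

dt.descr = [('doc_id', '<u4'), ('word', '<u4'), ('tfidf', '<f4')] and

sa.dtype.descr = [('doc_id', '<u4'), ('word', '<u4'), ('tfidf', '<f4')]

the sa array is aligned: all three fields are 4 bytes wide, so align=True has no padding to add, and the two descr lists are identical.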
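The "no padding in descr" argument can be turned into an explicit check. The helper below is a hypothetical sketch: it rebuilds the same named fields with align=True and compares offsets and itemsize with the given dtype:

```python
import numpy as np

def has_natural_layout(dt):
    """Hypothetical helper: True if `dt` already has exactly the offsets and
    itemsize NumPy would choose with align=True for the same fields."""
    natural = np.dtype([(n, dt.fields[n][0]) for n in dt.names], align=True)
    same_offsets = all(dt.fields[n][1] == natural.fields[n][1] for n in dt.names)
    return same_offsets and dt.itemsize == natural.itemsize

# sa.dtype from the question: aligned flag unset, but layout already natural
sa_dtype = np.dtype([('doc_id', '<u4'), ('word', '<u4'), ('tfidf', '<f4')])
print(sa_dtype.isalignedstruct, has_natural_layout(sa_dtype))

# Packed bool/int/double in the original order is not naturally laid out
print(has_natural_layout(np.dtype([('f1', np.bool_), ('f2', '<i4'), ('f3', '<f8')])))
```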

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/21926238
