Dask array to zarr with unknown shape

Stack Overflow user
Asked 2019-07-23 19:06:15
Answers: 1 · Views: 547 · Followers: 0 · Votes: 2

I am trying to store a dask array in a zarr file.

I have managed to do this when the dask array has a defined shape.

import dask
import dask.array as da
import numpy as np
from tempfile import TemporaryDirectory
import zarr


np_array = np.random.randint(1, 10, size=1000)
array = da.from_array(np_array)

with TemporaryDirectory() as tmpdir:
    delayed = da.to_zarr(array, url=tmpdir,
                         compute=False, component='/data')
    dask.compute(delayed)

    z_object = zarr.open_group(tmpdir, mode='r')

    assert np.all(np_array == z_object.data[:])

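However, once I apply a data-dependent operation such as boolean filtering to the dask array, its shape is lost and zarr complains about NaNs in the shape.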

# this will fail

np_array = np.random.randint(1, 10, size=1000)
array = da.from_array(np_array)

array = array[array > 5]

with TemporaryDirectory() as tmpdir:
    delayed = da.to_zarr(array, url=tmpdir,
                         compute=False, component='/data')
    dask.compute(delayed)

    z_object = zarr.open_group(tmpdir, mode='r')

    assert np.all(np_array[np_array > 5] == z_object.data[:])

This is the error that is raised:

Traceback (most recent call last):
  File "/home/peio/devel/variation/variation6/variation6/tests/test_zarr.py", line 38, in <module>
    without_shape()
  File "/home/peio/devel/variation/variation6/variation6/tests/test_zarr.py", line 29, in without_shape
    compute=False, component='/data')
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/dask/array/core.py", line 2808, in to_zarr
    **kwargs
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/creation.py", line 120, in create
    chunk_store=chunk_store, filters=filters, object_codec=object_codec)
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/storage.py", line 323, in init_array
    object_codec=object_codec)
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/storage.py", line 343, in _init_array_metadata
    shape = normalize_shape(shape) + dtype.shape
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/util.py", line 58, in normalize_shape
    shape = tuple(int(s) for s in shape)
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/util.py", line 58, in <genexpr>
    shape = tuple(int(s) for s in shape)
ValueError: cannot convert float NaN to integer

Is there a way to store a dask array of unknown shape in a zarr file?

Thanks in advance!


1 Answer

Stack Overflow user

Answered 2019-07-24 01:19:40

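Zarr expects chunks to have a uniform shape that is known beforehand. Currently, Dask facilitates this by rechunking arrays so that they are uniform. However, array[array > 5] creates a Dask array with unknown chunk shapes, so it cannot be rechunked into uniform chunks up front because the needed information is not available. That said, we could explain this better.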
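To make the missing information concrete, here is a minimal sketch of what Dask reports after boolean filtering. The array contents and the chunk size of 5 are assumptions chosen for illustration; they are not from the question.

```python
import math

import dask.array as da
import numpy as np

# Assumed toy data: the values 1..10, split into two chunks of 5.
np_array = np.arange(1, 11)
array = da.from_array(np_array, chunks=5)

filtered = array[array > 5]

# How many elements survive the mask depends on the data, so Dask cannot
# know the chunk lengths (or the overall shape) without computing.
print(filtered.shape)   # (nan,)
print(filtered.chunks)  # one NaN length per chunk

shape_is_unknown = math.isnan(filtered.shape[0])
```

These NaN lengths are what reach zarr's normalize_shape in the traceback, where int(s) fails on NaN.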

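This can be solved by using a Dask operation that returns known chunk shapes (as David suggested). Alternatively, the chunk shapes can be determined before storing (at the cost of computing). We could also discuss extending Zarr to handle this case, but that is a longer-term solution.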

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/57162752