
Round-tripping Zarr data from xarray

Stack Overflow user
Asked 2018-04-10 23:01:12

In xarray, I write a dataset to S3 with ds.to_zarr() and then read it back with xr.open_zarr() to check that I get the same dataset.

My dataset in xarray looks like this:

<xarray.Dataset>
Dimensions:                     (nv: 2, reference_time: 11, time: 11, x: 4608, y: 3840)
Coordinates:
  * reference_time              (reference_time) datetime64[ns] 2018-04-01T18:00:00 ...
  * x                           (x) float64 -2.304e+06 -2.303e+06 -2.302e+06 ...
  * y                           (y) float64 -1.92e+06 -1.919e+06 -1.918e+06 ...
  * time                        (time) datetime64[ns] 2018-04-01T19:00:00 ...
Dimensions without coordinates: nv
Data variables:
    time_bounds                 (time, nv) datetime64[ns] dask.array<shape=(11, 2), chunksize=(1, 2)>
    ProjectionCoordinateSystem  (time) |S64 b'' b'' b'' b'' b'' b'' b'' b'' ...
    T2D                         (time, y, x) float64 dask.array<shape=(11, 3840, 4608), chunksize=(1, 3840, 4608)>

I write it to Zarr with:

fs = s3fs.S3FileSystem(anon=False)
d = s3fs.S3Map(f_zarr, s3=fs)
ds.to_zarr(store=d, mode='w')

When I try to read it back with:

ds2 = xr.open_zarr(d)

I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-8198db1c8578> in <module>()
----> 1 ds2 = xr.open_zarr(d)

/opt/conda/lib/python3.6/site-packages/xarray/backends/zarr.py in open_zarr(store, group, synchronizer, auto_chunk, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables)
    476 
    477         variables = OrderedDict([(k, maybe_chunk(k, v))
--> 478                                  for k, v in ds.variables.items()])
    479         return ds._replace_vars_and_dims(variables)
    480     else:

/opt/conda/lib/python3.6/site-packages/xarray/backends/zarr.py in <listcomp>(.0)
    476 
    477         variables = OrderedDict([(k, maybe_chunk(k, v))
--> 478                                  for k, v in ds.variables.items()])
    479         return ds._replace_vars_and_dims(variables)
    480     else:

/opt/conda/lib/python3.6/site-packages/xarray/backends/zarr.py in maybe_chunk(name, var)
    471                 token2 = tokenize(name, var._data)
    472                 name2 = 'zarr-%s' % token2
--> 473                 return var.chunk(chunks, name=name2, lock=None)
    474             else:
    475                 return var

/opt/conda/lib/python3.6/site-packages/xarray/core/variable.py in chunk(self, chunks, name, lock)
    820             data = indexing.ImplicitToExplicitIndexingAdapter(
    821                 data, indexing.OuterIndexer)
--> 822             data = da.from_array(data, chunks, name=name, lock=lock)
    823 
    824         return type(self)(self.dims, data, self._attrs, self._encoding,

/opt/conda/lib/python3.6/site-packages/dask/array/core.py in from_array(x, chunks, name, lock, asarray, fancy, getitem)
   1977     >>> a = da.from_array(x, chunks=(1000, 1000), lock=True)  # doctest: +SKIP
   1978     """
-> 1979     chunks = normalize_chunks(chunks, x.shape)
   1980     if name in (None, True):
   1981         token = tokenize(x, chunks)

/opt/conda/lib/python3.6/site-packages/dask/array/core.py in normalize_chunks(chunks, shape)
   1907             raise ValueError(
   1908                 "Chunks and shape must be of the same length/dimension. "
-> 1909                 "Got chunks=%s, shape=%s" % (chunks, shape))
   1910 
   1911     if shape is not None:

ValueError: Chunks and shape must be of the same length/dimension. Got chunks=(11, 64), shape=(11,)
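
The error message itself shows the mismatch: the chunk tuple has two entries while the variable's shape has one. The following is a simplified stand-in for that length check, written for illustration only. It is not dask's actual `normalize_chunks` implementation, and the function name `check_chunks_match_shape` is hypothetical. One plausible reading, offered as an assumption, is that the extra `64` in the chunk tuple comes from the 64-byte width of the `|S64` variable.

```python
def check_chunks_match_shape(chunks, shape):
    """Simplified stand-in for the dimensionality check that raised the error."""
    if len(chunks) != len(shape):
        raise ValueError(
            "Chunks and shape must be of the same length/dimension. "
            "Got chunks=%s, shape=%s" % (chunks, shape)
        )
    return True


# Values taken from the traceback: two chunk sizes for a one-dimensional variable.
try:
    check_chunks_match_shape((11, 64), (11,))
except ValueError as err:
    message = str(err)

# A 3-D variable such as T2D passes, since chunks and shape both have three entries.
ok = check_chunks_match_shape((1, 3840, 4608), (11, 3840, 4608))
```

Only a variable whose stored chunking has a different number of dimensions than its decoded shape would trip this check, which is consistent with the failure being tied to a single variable rather than to the whole dataset.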

If I set auto_chunk=False, the dataset can be read:

ds2 = xr.open_zarr(d, auto_chunk=False)
ds2 

Result:

<xarray.Dataset>
Dimensions:                     (nv: 2, reference_time: 11, time: 11, x: 4608, y: 3840)
Coordinates:
  * reference_time              (reference_time) datetime64[ns] 2018-04-01T18:00:00 ...
  * time                        (time) datetime64[ns] 2018-04-01T19:00:00 ...
  * x                           (x) float64 -2.304e+06 -2.303e+06 -2.302e+06 ...
  * y                           (y) float64 -1.92e+06 -1.919e+06 -1.918e+06 ...
Dimensions without coordinates: nv
Data variables:
    LWDOWN                      (time, y, x) float64 ...

But am I still missing something about chunking and how xarray, dask, and zarr are supposed to work together?

What do I need to do to get auto_chunk=True working?

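1 Answer

Stack Overflow user

Answered 2018-04-14 19:59:43

As @mdurant suggested, the variable ProjectionCoordinateSystem, which has dtype=S64, was causing the problem. Since I don't need this non-standard variable, simply dropping it with

ds = ds.drop(['ProjectionCoordinateSystem'])  # drop returns a new Dataset, so reassign

before calling ds.to_zarr() fixed the problem, and xr.open_zarr() then works well with the default auto_chunk=True.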
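
To find such variables without naming them by hand, one could scan the dataset for fixed-width byte-string dtypes before writing. This is a minimal sketch, not part of the original answer: the helper name `find_bytes_variables` is hypothetical, and it assumes only that the dataset exposes a `data_vars` mapping whose values carry a NumPy-style `dtype.kind`. The demo uses stand-in objects rather than a real xarray Dataset so that it runs without any third-party packages.

```python
from types import SimpleNamespace


def find_bytes_variables(ds):
    """Return names of data variables whose dtype is a fixed-width byte string."""
    return [name for name, var in ds.data_vars.items() if var.dtype.kind == "S"]


# Stand-in object mimicking only the attributes used above (hypothetical, for illustration).
fake_ds = SimpleNamespace(
    data_vars={
        "time_bounds": SimpleNamespace(dtype=SimpleNamespace(kind="M")),
        "ProjectionCoordinateSystem": SimpleNamespace(dtype=SimpleNamespace(kind="S")),
        "T2D": SimpleNamespace(dtype=SimpleNamespace(kind="f")),
    }
)

to_drop = find_bytes_variables(fake_ds)
# With a real dataset one might then write: ds = ds.drop(to_drop)
```

Dropping is only appropriate when the variables are not needed, as was the case here.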

The full notebook is here: https://gist.github.com/rsignell-usgs/4a54ea152d4e10a14deff516bf597015

Original page content provided by Stack Overflow.
Source: https://stackoverflow.com/questions/49756981
