
NetCDF big data

Stack Overflow user

Asked 2016-05-24 10:27:32

1 answer · 763 views · 0 followers · 1 vote

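I need to read large (+15 to ) NetCDF files into a program. They contain a 3D variable, with time as the record dimension and the data laid out as latitude by longitude.

I'm processing the data in a triple-nested loop, checking whether each cell of the NetCDF variable passes some condition. For example: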

from netCDF4 import Dataset
import numpy as np

Somethreshold = 10.0  # placeholder threshold (hypothetical value)

File = Dataset('Somebigfile.nc', 'r')
Data = File.variables['Wind'][:]  # [:] reads the whole variable into memory

# Dimensions are (time, lat, lon)
Time, Latdim, Longdim = np.shape(Data)

for t in range(Time):
    for i in range(Latdim):
        for j in range(Longdim):
            if Data[t, i, j] > Somethreshold:
                pass  # Do something

File.close()

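Is there any way I can read the NetCDF file one record at a time? That would greatly reduce memory usage. Any help is much appreciated.

I know about the NCO operators, but I'd rather not use them to split the file up before running my script.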
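For reference, one way to do what the question asks with netCDF4 alone is to index the variable inside the loop instead of reading all of it with `[:]` up front: a netCDF4 `Variable` reads from disk only the slice you index, so `File.variables['Wind'][t, :, :]` loads a single record. The sketch below assumes a (time, lat, lon) variable; the function names `count_exceedances_by_record` and `count_exceedances_in_file`, the counting step standing in for "Do something", and the default threshold are all hypothetical.

```python
import numpy as np


def count_exceedances_by_record(var, threshold):
    """Count cells above `threshold` in each time record of a (time, lat, lon) variable.

    `var` may be a netCDF4 Variable (indexing it reads only the requested
    slice from disk) or any array-like with .shape and 3-D indexing, such
    as a NumPy or masked array. The counting is a hypothetical stand-in
    for "Do something".
    """
    n_time = var.shape[0]
    counts = np.zeros(n_time, dtype=int)
    for t in range(n_time):
        record = var[t, :, :]  # one 2-D (lat, lon) record
        # netCDF4 can return masked arrays when fill values are present;
        # treat masked (missing) cells as "not above the threshold".
        above = np.ma.filled(record > threshold, False)
        counts[t] = np.count_nonzero(above)
    return counts


def count_exceedances_in_file(path, varname='Wind', threshold=10.0):
    """Open a NetCDF file and process `varname` one record at a time."""
    from netCDF4 import Dataset  # imported here so the helper above needs only NumPy

    with Dataset(path, 'r') as nc:
        return count_exceedances_by_record(nc.variables[varname], threshold)
```

Only one (lat, lon) record is held in memory at a time, while the comparison within each record is still done by NumPy rather than by two inner Python loops.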

netcdf

bigdata

python

1 answer

Stack Overflow user

Accepted answer

Posted 2016-05-24 17:43:21

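It sounds like you've already settled on a solution, but I'll throw out a more elegant, more vectorized (and possibly faster) solution using xarray and dask. Nested for loops are going to be very inefficient. By combining xarray and dask, you can process the data in your file incrementally, in a semi-vectorized manner.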
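To illustrate why the nested loops are the bottleneck, the sketch below (hypothetical function names, plain NumPy arrays assumed) shows that the question's triple loop and a single boolean mask find exactly the same cells; the vectorized version just does the comparison in compiled NumPy code instead of one Python-level step per cell.

```python
import numpy as np


def exceedances_loop(data, threshold):
    """Question's approach: visit every (t, i, j) cell with nested Python loops."""
    hits = []
    n_time, n_lat, n_lon = data.shape
    for t in range(n_time):
        for i in range(n_lat):
            for j in range(n_lon):
                if data[t, i, j] > threshold:
                    hits.append((t, i, j))
    return hits


def exceedances_vectorized(data, threshold):
    """Same result from one boolean mask evaluated by NumPy.

    np.argwhere returns an (N, 3) array of indices in row-major (C) order,
    which matches the t -> i -> j order of the nested loops above.
    Assumes a plain ndarray (no masked values).
    """
    return [tuple(idx) for idx in np.argwhere(data > threshold).tolist()]
```

The same idea applies per record or per dask chunk: build the mask for a whole block at once and only loop in Python over the (usually few) cells that pass.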

Since your "Do something" step isn't very specific, you'll have to extrapolate from my example.

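import xarray as xr

Somethreshold = 10.0  # placeholder threshold (hypothetical value)


def do_something(da):
    # Hypothetical placeholder for your own processing step: here it just
    # counts the values that passed the threshold. Reading .values makes
    # dask compute the result chunk by chunk.
    return int(da.count().values)


# xarray opens the file lazily and doesn't load any data until you ask for it.
# Passing `chunks` backs the variables with dask arrays; dask handles the
# chunking and memory management for you. The chunk size can be tuned for
# your dataset, and 'time' must match the dimension name in your file.
ds = xr.open_dataset('Somebigfile.nc', chunks={'time': 100})

# Mask out values at or below the threshold (they become NaN); this is lazy
da_thresh = ds['Wind'].where(ds['Wind'] > Somethreshold)

# Now just operate on the values greater than your threshold
do_something(da_thresh)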
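The note about tuning the chunk size can be made concrete with a little arithmetic: a chunk of a (time, lat, lon) variable holds about n_time × n_lat × n_lon × itemsize bytes. The helpers below are hypothetical, assume float32 data (4 bytes per value) and chunking along time only, and pick the largest time chunk that fits a memory budget. This is the size of a single chunk; dask may hold several chunks in memory at once, so leave headroom.

```python
def chunk_nbytes(n_time, n_lat, n_lon, itemsize=4):
    """Bytes in one (time, lat, lon) chunk; itemsize=4 assumes float32 data."""
    return n_time * n_lat * n_lon * itemsize


def max_time_chunk(n_lat, n_lon, budget_bytes, itemsize=4):
    """Largest number of time steps per chunk that fits in `budget_bytes`.

    Returns at least 1, since a chunk cannot be smaller than one record
    when only the time dimension is chunked.
    """
    per_record = chunk_nbytes(1, n_lat, n_lon, itemsize)
    return max(1, budget_bytes // per_record)


if __name__ == "__main__":
    # Example grid size (hypothetical): 721 latitudes x 1440 longitudes.
    print(chunk_nbytes(100, 721, 1440) / 2**20)    # 100-step chunk size in MiB
    print(max_time_chunk(721, 1440, 512 * 2**20))  # time steps per 512 MiB chunk
```

For that example grid, one record is 4,152,960 bytes, so `chunks={'time': 100}` gives chunks of roughly 396 MiB, and a 512 MiB budget allows 129 time steps per chunk.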

Xarray/Dask documentation: http://xarray.pydata.org/en/stable/dask.html

Votes 6

Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/37410900
