I'm trying to apply Gaussian smoothing to a large GIS dataset (a 10000 x 10000 array). My current approach is to load the whole array into memory, smooth it, and write it back out. It looks like this:
big_array = band_on_disk.ReadAsArray()
scipy.ndimage.gaussian_filter(big_array, sigma, output=smoothed_array)
output_band.WriteArray(smoothed_array)
For larger rasters I get a MemoryError, so I would like to load sub-blocks of the array instead, but I'm not sure how to handle the Gaussian smoothing in the regions that affect neighbouring sub-blocks.
Any suggestions on how to fix the algorithm above so that it works with a smaller memory footprint while still smoothing the entire array correctly?
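(For reference, the sub-block idea mentioned above can be made exact by giving each tile a halo equal to the filter's kernel radius, filtering the padded tile, and keeping only its interior. The sketch below illustrates this; read_block and write_block are hypothetical stand-ins for whatever windowed raster I/O is available, e.g. GDAL's ReadAsArray/WriteArray with offsets, and the halo assumes scipy's default truncation at 4 sigma.)
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_in_tiles(read_block, write_block, shape, sigma, tile=2048):
    # scipy truncates the Gaussian kernel at 4*sigma by default, so a halo of
    # that radius makes each tile's interior match the full-array result
    halo = int(4 * sigma + 0.5)
    ny, nx = shape
    for y in range(0, ny, tile):
        for x in range(0, nx, tile):
            # expand the tile window by the halo, clipped to the array edges
            y0, y1 = max(y - halo, 0), min(y + tile + halo, ny)
            x0, x1 = max(x - halo, 0), min(x + tile + halo, nx)
            block = gaussian_filter(read_block(y0, y1, x0, x1), sigma)
            # keep only the interior of the padded tile and write it back
            iy, ix = y - y0, x - x0
            out = block[iy:iy + min(tile, ny - y), ix:ix + min(tile, nx - x)]
            write_block(y, y + out.shape[0], x, x + out.shape[1], out)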
Posted on 2012-10-12 02:29:25
Try using memory-mapped files.
Moderate memory usage and acceptable speed
If you can afford to have one of the arrays in memory, this is reasonably fast:
import numpy as np
from scipy.ndimage import gaussian_filter
# create some fake data, save it to disk, and free up its memory
shape = (10000,10000)
orig = np.random.random_sample(shape)
orig.tofile('orig.dat')
print('saved original')
del orig
# allocate memory for the smoothed data
smoothed = np.zeros((10000,10000))
# memory-map the original data, so it isn't read into memory all at once
orig = np.memmap('orig.dat', np.float64, 'r', shape=shape)
print('memmapped')
sigma = 10 # I have no idea what a reasonable value is here
gaussian_filter(orig, sigma, output = smoothed)
# save the smoothed data to disk
smoothed.tofile('smoothed.dat')
Low memory usage, very slow
If you cannot afford to have either array in memory, you can memory-map both the original and the smoothed arrays. Memory usage stays very low, but it is also very slow, at least on my machine.
You should ignore the first part of the listing below, because it cheats by creating the original array in memory all at once and then saving it to disk. You can replace that part with code that loads data which has been built up incrementally on disk, for example by copying the raster band in strips as sketched right after this paragraph.
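(A rough sketch of that replacement, assuming the band_on_disk object from the question and float64 data; both are assumptions, and the strip height is arbitrary.)
import numpy as np
# copy the GDAL band into 'orig.dat' one strip at a time, so the full
# raster never has to fit in memory
shape = (band_on_disk.YSize, band_on_disk.XSize)
orig = np.memmap('orig.dat', np.float64, 'w+', shape=shape)
strip = 1000
for y in range(0, shape[0], strip):
    rows = min(strip, shape[0] - y)
    orig[y:y + rows] = band_on_disk.ReadAsArray(0, y, shape[1], rows)
orig.flush()
del orig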
import numpy as np
from scipy.ndimage import gaussian_filter
# create some fake data, save it to disk, and free up its memory
shape = (10000,10000)
orig = np.random.random_sample(shape)
orig.tofile('orig.dat')
print('saved original')
del orig
# memory-map the original data, so it isn't read into memory all at once
orig = np.memmap('orig.dat', np.float64, 'r', shape=shape)
# create a memory mapped array for the smoothed data
smoothed = np.memmap('smoothed.dat', np.float64, 'w+', shape = shape)
print('memmapped')
sigma = 10 # I have no idea what a reasonable value is here
gaussian_filter(orig, sigma, output = smoothed)
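(Since smoothed is opened in 'w+' mode, the result is already backed by smoothed.dat; an explicit flush just makes sure any cached pages are written out before the script exits.)
smoothed.flush()
del smoothed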
https://stackoverflow.com/questions/12845904