
Q: Reading random elements from an .h5 file without loading the whole matrix

Stack Overflow user

Asked 2019-03-11 18:58:57

Answers: 1, Views: 369, Followers: 0, Votes: 0

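I have a huge training dataset that does not fit in RAM. I am trying to load random batches of images from the stack without loading the whole .h5 file. My approach is to create a list of indices and shuffle those, rather than shuffling the whole .h5 file. Let's say: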

import h5py
import numpy as np

a = np.arange(2000*2000*2000).reshape(2000, 2000, 2000)
# random indices, so that I only need to reshuffle idx at the end of each epoch
idx = np.random.randint(2000, size=800)

# create this huge dataset (32 GB, more than my RAM)
with h5py.File('./tmp.h5', 'w') as f:
    tmp = f.create_dataset('a', (2000, 2000, 2000))
    tmp[:] = a

# read it back
with h5py.File('./tmp.h5', 'r') as f:
    # without [:] this raises an error; with [:] it loads the whole file, which I don't want
    tensor = f['a'][:][idx]

Does anyone have a solution?

tensorflow

neural-network

bigdata

h5py

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-04-03 22:37:40

Thanks to @max9111, here is how I solved it:

import numpy as np

batch_size = 100
idx = np.arange(2000)
# shuffle in place; np.random.shuffle returns None, so idx must not be reassigned
np.random.shuffle(idx)
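As a side note on the shuffle step, `np.random.shuffle` works in place and returns `None`, so a fresh permutation is often obtained from a `Generator` instead. The following is only a sketch of that idea: the names `rng`, `num_samples` and `batches`, and the fixed seed, are assumptions for illustration and do not come from the original answer.

```python
import numpy as np

# Seed chosen only to make the sketch reproducible (an assumption).
rng = np.random.default_rng(seed=0)
num_samples = 2000   # hypothetical dataset length
batch_size = 100

# permutation() returns a new shuffled array; each index appears exactly once.
idx = rng.permutation(num_samples)

# Keep only full batches (drop any remainder) and view them as rows.
num_batches = num_samples // batch_size
batches = idx[:num_batches * batch_size].reshape(num_batches, batch_size)
```

Calling `rng.permutation(num_samples)` again at the start of each epoch would give a new order without touching the file.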

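h5py imposes the following constraint:

Selection coordinates must be given in increasing order.

So you should sort the indices before reading: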

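import h5py
import numpy as np

def read_batches(path, idx, batch_size):
    epoch_len = len(idx)
    with h5py.File(path, 'r') as f:
        # integer division drops the remainder: the last partial batch is skipped
        for step in range(epoch_len // batch_size):
            # slice out one batch of indices and sort it, as h5py requires
            batch_idx = np.sort(idx[step * batch_size:(step + 1) * batch_size])
            yield f['img'][batch_idx], f['label'][batch_idx]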
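Sorting the indices means each batch comes back in ascending row order rather than in its shuffled order. If the shuffled order (or repeated indices, as `np.random.randint` can produce) matters, one possible approach is to read the sorted unique rows once and reorder them in memory. This is a sketch, not part of the original answer: `read_rows` is a hypothetical helper, and a small NumPy array stands in for an open h5py dataset such as `f['img']` so the example runs without a file.

```python
import numpy as np

def read_rows(dataset, rows):
    """Return dataset[rows] for `rows` in any order, duplicates allowed.

    `dataset` is assumed to support indexing with a sorted index array,
    as an open h5py dataset does.
    """
    rows = np.asarray(rows)
    # Sorted, duplicate-free coordinates plus the map back to the request.
    sorted_rows, inverse = np.unique(rows, return_inverse=True)
    # One read that touches only the selected rows.
    block = dataset[sorted_rows]
    # Reorder (and re-duplicate) in memory to match the requested order.
    return block[inverse]

# Stand-in for an on-disk dataset (assumption made for this demo only).
fake_dataset = np.arange(20).reshape(10, 2)
batch = read_rows(fake_dataset, [7, 2, 9, 2])
```

Here `batch` holds rows 7, 2, 9 and 2 in that order. Using the same `rows` for the image and label datasets keeps the two aligned.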

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the 腾讯云小微 IT-domain engine.

Original link: https://stackoverflow.com/questions/55108611
