Q: Reading data from a storage bucket in Google ml-engine (tensorflow)

Stack Overflow user

Asked 2017-09-20 04:21:17

2 answers, 2K views, 0 followers, 4 votes

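I am having trouble reading data from a Google-hosted storage bucket. I have a bucket containing roughly 1000 files that I need to access, stored at (for example) gs://my-bucket/data

Using gsutil from the command line, or other Google Python API clients, I can access the data in the bucket, but importing these APIs is not supported by default on google-cloud-ml-engine.

I need a way to access both the data and the file names, either with a default Python library (i.e. os) or with TensorFlow. I know TensorFlow has this functionality built in somewhere, but I have had a hard time finding it.

Ideally, I am looking for a replacement for a command such as os.listdir(), and for one other command, so that something like this would work: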

train_data = [read_training_data(filename) for filename in os.listdir('gs://my-bucket/data/')]

where read_training_data uses a TensorFlow reader object.

Thanks for any help! (P.S. my data is binary.)

python

tensorflow

google-cloud-ml-engine

2 Answers

Stack Overflow user

Answered 2017-09-20 09:58:12

If you just want to read the data into memory, then this answer has the details you need, namely using the file_io module.
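As a rough illustration of the file_io approach, the sketch below reads every file matching a pattern into memory as raw bytes. It assumes file_io is importable from tensorflow.python.lib.io and that the runtime can reach gs:// paths; the helper name load_binary_files and the pattern in the usage comment are hypothetical, not taken from the linked answer.

```python
from tensorflow.python.lib.io import file_io


def load_binary_files(pattern):
    """Return a dict mapping each matching path to its raw bytes.

    `pattern` may be a local glob or a gs:// glob; file_io handles both.
    """
    data = {}
    for path in file_io.get_matching_files(pattern):
        # binary_mode=True returns bytes instead of decoding to text
        data[path] = file_io.read_file_to_string(path, binary_mode=True)
    return data


# Hypothetical usage with the bucket from the question:
# train_bytes = load_binary_files('gs://my-bucket/data/*')
```

Because the dictionary keys are the full paths, both the file names and the contents are available afterwards, which is what the question asks for.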

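That said, you may want to consider using TensorFlow's built-in reading mechanisms, since they can be more performant.

Information on reading data can be found here. The latest and greatest (but not yet part of official "core" TensorFlow) is the Dataset API (more info here).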
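For a sense of what the Dataset API route can look like, here is a minimal sketch. It assumes a newer TensorFlow than the one this answer was written against: one where the API lives in core as tf.data and eager execution is enabled (TensorFlow 2.x style). The function name make_file_dataset is hypothetical.

```python
import tensorflow as tf


def make_file_dataset(pattern):
    """Build a dataset of (filename, raw_bytes) pairs for files matching pattern."""
    # shuffle=False keeps the file order deterministic
    filenames = tf.data.Dataset.list_files(pattern, shuffle=False)
    # tf.io.read_file returns the whole file content as a scalar string tensor
    return filenames.map(lambda name: (name, tf.io.read_file(name)))


# Hypothetical usage with the bucket from the question (eager execution):
# for name, raw in make_file_dataset('gs://my-bucket/data/*'):
#     print(name.numpy(), len(raw.numpy()))
```

Decoding the raw bytes into training examples would go into a further map step, so that reading and parsing both run inside the input pipeline rather than in Python.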

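Some things to keep in mind:

Are you using a format that TensorFlow can read? Can it be converted to that format?
Is the overhead of "feeding" high enough to affect training performance?
Is the training set too large to fit in memory?

If the answer to one or more of these questions is yes, especially the latter two, consider using readers.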

Votes: 3

Stack Overflow user

Answered 2018-07-10 00:08:39

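For what it's worth, I also had problems reading files, specifically binary files from Google Cloud Storage inside a Datalab notebook. The first way I managed to do it was to copy the files to my local filesystem with gsutil and then read them normally with TensorFlow. That is demonstrated here, after the file copy was done.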
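The copy step itself is not shown in this answer. A minimal sketch of how it might be scripted from Python is given here, assuming gsutil is installed and authenticated on the machine; the helper names gsutil_cp_command and copy_from_bucket are hypothetical.

```python
import subprocess


def gsutil_cp_command(src, dst, recursive=False):
    """Build the argument list for a `gsutil cp` invocation."""
    cmd = ['gsutil', 'cp']
    if recursive:
        cmd.append('-r')  # copy a whole directory tree
    return cmd + [src, dst]


def copy_from_bucket(src, dst, recursive=False):
    """Run the copy; raises CalledProcessError if gsutil exits non-zero."""
    subprocess.check_call(gsutil_cp_command(src, dst, recursive))


# Hypothetical usage: fetch one file into the current directory
# copy_from_bucket('gs://my-bucket/data/100852.mp3', '.')
```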

Here is my setup cell:

import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

Here is a cell that reads the file locally, as a sanity check.

# this works for reading a local file
audio_binary_local = tf.read_file("100852.mp3")
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary_local, file_format='mp3',
    samples_per_second=44100, channel_count=2)
# this will show that it has two channels of data
with tf.Session() as sess:
    result = sess.run(waveform)
    print(result)

And here is reading the file directly from gs:// as a binary file.

# this works for remote files in gs://
gsfilename = 'gs://proj-getting-started/UrbanSound/data/air_conditioner/100852.mp3'
# python 2
# audio_binary_remote = tf.gfile.Open(gsfilename).read()
# python 3: open in binary mode so that read() returns bytes
with tf.gfile.Open(gsfilename, 'rb') as f:
    audio_binary_remote = f.read()
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary_remote, file_format='mp3',
    samples_per_second=44100, channel_count=2)
# this will show that it has two channels of data
with tf.Session() as sess:
    result = sess.run(waveform)
    print(result)
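The same idea can be extended to stand in for os.listdir() and open() over a whole directory, as the question asks. The sketch below assumes a TensorFlow release where the tf.gfile module used above is exposed as tf.io.gfile (1.13 and later); the helper names list_files and read_bytes are hypothetical.

```python
import tensorflow as tf


def list_files(directory):
    """Like os.listdir, but returns full paths and skips subdirectories."""
    base = directory.rstrip('/')
    paths = [base + '/' + name.rstrip('/')
             for name in tf.io.gfile.listdir(directory)]
    return [p for p in paths if not tf.io.gfile.isdir(p)]


def read_bytes(path):
    """Like open(path, 'rb').read(), for local or gs:// paths."""
    with tf.io.gfile.GFile(path, 'rb') as f:
        return f.read()


# Hypothetical usage with the bucket from the question:
# train_data = [read_bytes(p) for p in list_files('gs://my-bucket/data/')]
```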

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/46309161
