
Error retraining Inception with the Google Cloud sample code

Stack Overflow user
Asked 2017-02-16 13:49:17
1 answer · 223 views · 0 followers · 0 votes

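An error occurs when I run, in the cloud, the sample code that Google's @SlavenBilac posted for training and classifying images with Google Cloud services.

The job gets stuck at global_step/sec: 0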

INFO    2017-02-16 06:28:36 -0600       master-replica-0                Start master session 538be2b71d17c4dc with config: 
ERROR   2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:ps"
ERROR   2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:master/task:0"
INFO    2017-02-16 06:28:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:30:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:32:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:34:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:36:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:38:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:40:39 -0600       master-replica-0                global_step/sec: 0
<keeps repeating until I kill the job>

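Following the answer that Google's @JoshGC gave to a similar question, yesterday I created a brand-new Google account (with a new billing account, a new project, and so on), ran the Cloud Shell setup script and the other steps to set up the environment, and then ran the sample code against the sample flowers data. The error still occurs (as shown below), so I don't think the cause is related to the data or the account configuration.

How can I modify the files from GoogleCloudPlatform/cloudml-samples/flowers to avoid this error?

Excerpts:

Running the sample code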

cfinley3@wordthree-wordfour-7654321:~/google-cloud-ml/samples/flowers$ ./sample.sh

Your active configuration is: [cloudshell-18758]
Using job id:  flowers_cfinley3_20170216_045347

Preprocessing seems to be fine

python trainer/preprocess.py \
  --input_dict "$DICT_FILE" \
  --input_path "gs://cloud-ml-data/img/flower_photos/train_set.csv" \
  --output_path "${GCS_PATH}/preprocess/train" \
  --cloud

Training starts

gcloud beta ml jobs submit training "$JOB_ID" \
  --module-name trainer.task \
  --package-path trainer \
  --staging-bucket "$BUCKET" \
  --region us-central1 \
  -- \
  --output_path "${GCS_PATH}/training" \
  --eval_data_paths "${GCS_PATH}/preproc/eval*" \
  --train_data_paths "${GCS_PATH}/preproc/train*"
Job [flowers_cfinley3_20170216_045347] submitted successfully.

Training is stuck at global_step/sec: 0

INFO    2017-02-16 06:24:48 -0600       unknown_task            Validating job requirements...
INFO    2017-02-16 06:24:48 -0600       unknown_task            Job creation request has been successfully validated.
INFO    2017-02-16 06:24:48 -0600       unknown_task            Job flowers_cfinley3_20170216_045347 is queued.
INFO    2017-02-16 06:24:55 -0600       unknown_task            Waiting for job to be provisioned.
INFO    2017-02-16 06:24:55 -0600       unknown_task            Waiting for TensorFlow to start.
INFO    2017-02-16 06:28:27 -0600       master-replica-0                Running task with arguments: --cluster={"master": ["master-9a431abe8e-0:2222"]} --task={"type": "master", "index": 0} --job={
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "package_uris": ["gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz"],
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "python_module": "trainer.task",
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "args": ["--output_path", "gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/training", "--eval_data_paths", "gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/eval*", "--train_data_paths", "gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/train*"],
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "region": "us-central1"
INFO    2017-02-16 06:28:27 -0600       master-replica-0                } --beta
INFO    2017-02-16 06:28:28 -0600       master-replica-0                Running module trainer.task.
INFO    2017-02-16 06:28:28 -0600       master-replica-0                Running command: gsutil -q cp gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz trainer-0.1.tar.gz
INFO    2017-02-16 06:28:29 -0600       master-replica-0                Installing the package: gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz
INFO    2017-02-16 06:28:29 -0600       master-replica-0                Running command: pip install --user --upgrade --force-reinstall trainer-0.1.tar.gz
INFO    2017-02-16 06:28:29 -0600       master-replica-0                Processing ./trainer-0.1.tar.gz
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Building wheels for collected packages: trainer
INFO    2017-02-16 06:28:30 -0600       master-replica-0                  Running setup.py bdist_wheel for trainer: started
INFO    2017-02-16 06:28:30 -0600       master-replica-0                creating '/tmp/tmpn9HeiIpip-wheel-/trainer-0.1-cp27-none-any.whl' and adding '.' to it
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/model.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/__init__.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/util.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/preprocess.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/DESCRIPTION.rst'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/metadata.json'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/top_level.txt'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/METADATA'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/RECORD'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                  Running setup.py bdist_wheel for trainer: finished with status 'done'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                  Stored in directory: /root/.cache/pip/wheels/e8/0c/c7/b77d64796dbbac82503870c4881d606fa27e63942e07c75f0e
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Successfully built trainer
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Installing collected packages: trainer
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Successfully installed trainer-0.1
INFO    2017-02-16 06:28:31 -0600       master-replica-0                Running command: python -m trainer.task --output_path gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/training --eval_data_paths gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/eval* --train_data_paths gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/train*
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Original job data: {u'package_uris': [u'gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz'], u'args': [u'--output_path', u'gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/training', u'--eval_data_paths', u'gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/eval*', u'--train_data_paths', u'gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/train*'], u'python_module': u'trainer.task', u'region': u'us-central1'}
INFO    2017-02-16 06:28:34 -0600       master-replica-0                setting eval batch size to 100
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Starting master/0
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Initialize GrpcChannelCache for job master -> {0 -> localhost:2222}
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Started server with target: grpc://localhost:2222
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                From /root/.local/lib/python2.7/site-packages/trainer/task.py:211 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Instructions for updating:
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Please switch to tf.summary.merge_all.
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/logging_ops.py:270 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Instructions for updating:
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Please switch to tf.summary.merge.
INFO    2017-02-16 06:28:36 -0600       master-replica-0                Start master session 538be2b71d17c4dc with config: 
ERROR   2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:ps"
ERROR   2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:master/task:0"
INFO    2017-02-16 06:28:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:30:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:32:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:34:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:36:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:38:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:40:39 -0600       master-replica-0                global_step/sec: 0

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-02-22 14:24:37

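See the similar question. Check your input data files and make sure they are not empty. Empty data files cause this behavior, because TensorFlow waits forever for data that never arrives.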
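One way to run that check is to list the training inputs with `gsutil ls -l` and look at the reported sizes. Note also that in the excerpts above the preprocessing step writes to `${GCS_PATH}/preprocess/train` while the training job reads `${GCS_PATH}/preproc/train*`; if that difference is real and not a transcription slip, the training pattern may match no files at all, which would presumably stall the job in the same way. The following is a minimal sketch, not part of the sample: the helper names `parse_listing` and `check_inputs` are hypothetical, and it assumes the usual `gsutil ls -l` layout of `<size> <timestamp> <gs://url>` per object followed by a `TOTAL:` line.

```python
def parse_listing(listing_text):
    """Parse the text output of `gsutil ls -l` into (url, size_in_bytes) pairs.

    Assumes each object line looks like: <size> <timestamp> <gs://url>.
    Other lines (for example the trailing TOTAL: summary) are skipped.
    """
    files = []
    for line in listing_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[2].startswith("gs://"):
            files.append((parts[2].strip(), int(parts[0])))
    return files


def check_inputs(listing_text):
    """Return a short verdict on whether the listed input files are usable."""
    files = parse_listing(listing_text)
    if not files:
        return "no files matched the pattern"
    empty = [url for url, size in files if size == 0]
    if empty:
        return "empty files: " + ", ".join(empty)
    return "ok: %d non-empty files" % len(files)
```

The listing text could come from running `gsutil ls -l "${GCS_PATH}/preproc/train*"` and `gsutil ls -l "${GCS_PATH}/preproc/eval*"` in Cloud Shell and passing the captured output to `check_inputs`. A "no files matched" or "empty files" verdict points at the preprocessing output, not at the trainer.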

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/42275808
