
`capture_tpu_profile` cannot access TPU

Stack Overflow user
Asked 2020-02-01 02:27:25
1 answer, 162 views, 0 followers, 0 votes

I have the following TPU:

$ gcloud compute tpus list 
NAME         ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINTS  NETWORK  RANGE          STATUS
daniels-tpu  us-central1-a  v3-8              10.240.1.10:8470   default  10.240.1.8/29  READY

But I cannot access it through `capture_tpu_profile`:

$  capture_tpu_profile --tpu=daniels-tpu 
TensorFlow version 1.15.2 detected
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0131 10:15:16.553251 4571966912 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 966, in send
    self.connect()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 938, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/capture_tpu_profile", line 8, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.7/site-packages/cloud_tpu_profiler/main.py", line 85, in run_main
    tf.compat.v1.app.run(main)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.7/site-packages/cloud_tpu_profiler/main.py", line 105, in main
    [FLAGS.tpu], zone=FLAGS.tpu_zone, project=FLAGS.gcp_project))
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/distribute/cluster_resolver/tpu_cluster_resolver.py", line 330, in __init__
    self._request_compute_metadata('project/project-id'))
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/distribute/cluster_resolver/tpu_cluster_resolver.py", line 124, in _request_compute_metadata
    resp = urlopen(req)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1347, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1321, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>

Even with the `--tpu_zone=us-central1-a` flag, the error does not go away.
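
The last frames of the traceback show `TPUClusterResolver.__init__` calling `_request_compute_metadata('project/project-id')` and then failing inside `getaddrinfo`. That suggests the hostname of the Compute Engine metadata server could not be resolved from this machine. Below is a minimal sketch for checking this. It assumes the metadata server is reachable as `metadata.google.internal` (that hostname does not appear in the output above), and `can_resolve` is a hypothetical helper, not part of any library:

```python
import socket

# Assumption: hostname of the Compute Engine metadata server.
# It is not shown anywhere in the traceback above.
METADATA_HOST = "metadata.google.internal"


def can_resolve(host, port=80, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a DNS lookup failure.

    `resolver` is injectable so the check can be exercised offline.
    """
    try:
        # Same call shape as the one that fails in the traceback.
        resolver(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return True
```

If `can_resolve(METADATA_HOST)` returns `False`, the project ID probably cannot be looked up automatically from this machine. The traceback also shows the resolver being constructed with `project=FLAGS.gcp_project`, so passing the project explicitly may avoid the lookup. That is an inference from the traceback, not something confirmed in this thread.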

1 Answer

Stack Overflow user

Answered 2020-02-01 04:14:25

When you capture a profile with `capture_tpu_profile`, a `.tracetable` file is saved to your Google Cloud Storage bucket. If you do not provide that location, the file has nowhere to go.

Could you try adding the path to your bucket with the following command:

capture_tpu_profile --tpu=tpu-name --logdir=${MODEL_DIR}

`--logdir=${MODEL_DIR}`: the Cloud Storage location where your model and checkpoints are stored.

All the details are covered in the documentation.
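
Putting the flags from this thread together, the invocation could be assembled as below. This is only a sketch: `build_profile_command` is a hypothetical helper, the project ID and bucket path are placeholders, and `--gcp_project` is inferred from `FLAGS.gcp_project` in the question's traceback rather than taken from this answer.

```python
import shlex


def build_profile_command(tpu, logdir, zone=None, project=None):
    """Assemble a capture_tpu_profile argument list.

    --tpu and --logdir come from the answer and --tpu_zone from the question.
    --gcp_project is an assumption inferred from FLAGS.gcp_project in the
    traceback.
    """
    if not tpu or not logdir:
        # The answer states that the profile needs a logdir to be written to.
        raise ValueError("both tpu and logdir are required")
    cmd = ["capture_tpu_profile", f"--tpu={tpu}", f"--logdir={logdir}"]
    if zone:
        cmd.append(f"--tpu_zone={zone}")
    if project:
        cmd.append(f"--gcp_project={project}")
    return cmd


# Placeholder values: "my-project" and "gs://my-bucket/model" are hypothetical.
cmd = build_profile_command(
    "daniels-tpu",
    "gs://my-bucket/model",
    zone="us-central1-a",
    project="my-project",
)
print(shlex.join(cmd))  # a shell-ready string; needs Python 3.8+
```

Returning a list rather than a string keeps the arguments safe to hand to `subprocess.run` without shell quoting.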

0 votes

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/60009383
