(ray==1.12.0)
I followed the instructions for starting a new Ray cluster with ray up, but it raised an exception:
2022-04-28 08:19:46,218 ERROR services.py:1481 -- Failed to start the dashboard: Failed to start the dashboard, return code 1
The last 10 lines of /tmp/ray/session_2022-04-28_08-19-43_178339_1674/logs/dashboard.log:
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/dashboard/modules/state/state_head.py", line 11, in <module>
from ray.dashboard.state_aggregator import StateAPIManager
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/dashboard/state_aggregator.py", line 21, in <module>
from ray.experimental.state.state_manager import StateDataSourceClient
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/experimental/state/state_manager.py", line 67, in <module>
class StateDataSourceClient:
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/experimental/state/state_manager.py", line 80, in StateDataSourceClient
def __init__(self, gcs_channel: grpc.aio.Channel):
AttributeError: module 'grpc' has no attribute 'aio'
2022-04-28 08:19:46,218 ERROR services.py:1482 -- Failed to start the dashboard, return code 1
The last 10 lines of /tmp/ray/session_2022-04-28_08-19-43_178339_1674/logs/dashboard.log:
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/dashboard/modules/state/state_head.py", line 11, in <module>
from ray.dashboard.state_aggregator import StateAPIManager
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/dashboard/state_aggregator.py", line 21, in <module>
from ray.experimental.state.state_manager import StateDataSourceClient
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/experimental/state/state_manager.py", line 67, in <module>
class StateDataSourceClient:
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/experimental/state/state_manager.py", line 80, in StateDataSourceClient
def __init__(self, gcs_channel: grpc.aio.Channel):
AttributeError: module 'grpc' has no attribute 'aio'
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/_private/services.py", line 1458, in start_dashboard
raise Exception(err_msg + last_log_str)
Exception: Failed to start the dashboard, return code 1
The last 10 lines of /tmp/ray/session_2022-04-28_08-19-43_178339_1674/logs/dashboard.log:
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/dashboard/modules/state/state_head.py", line 11, in <module>
from ray.dashboard.state_aggregator import StateAPIManager
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/dashboard/state_aggregator.py", line 21, in <module>
from ray.experimental.state.state_manager import StateDataSourceClient
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/experimental/state/state_manager.py", line 67, in <module>
class StateDataSourceClient:
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/experimental/state/state_manager.py", line 80, in StateDataSourceClient
def __init__(self, gcs_channel: grpc.aio.Channel):
AttributeError: module 'grpc' has no attribute 'aio'
Then the next step, running ray submit config.yaml script.py, crashes with:
ConnectionError: Could not find any running Ray instance. Please specify the one to connect to by setting `--address` flag or `RAY_ADDRESS` environment variable.
When I tried the same steps on a fresh cluster, I got a different error:
[2022-04-28 08:44:54,724 E 2087 2087] core_worker.cc:137: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
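It looks like this is either a bug in Ray, documentation that is out of date, or something that is simply not beginner-friendly (I have never used Ray before). What is going on here? Is there perhaps a more beginner-friendly tutorial on how to use Ray on AWS?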
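The traceback above stops at `grpc.aio.Channel`, so a first diagnostic step could be to check whether the `grpc` module in the active environment exposes an `aio` attribute at all. The sketch below does only that. The helper name `grpc_aio_status` is made up for illustration. The thread does not confirm that an outdated grpcio package is the cause; that is an assumption.

```python
import importlib


def grpc_aio_status(module_name="grpc"):
    """Return (version, has_aio) for the named module.

    Returns (None, False) when the module cannot be imported.
    `grpc_aio_status` is a hypothetical helper, not part of Ray or gRPC.
    """
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None, False
    # Fall back to "unknown" if the module does not expose __version__.
    version = getattr(mod, "__version__", "unknown")
    return version, hasattr(mod, "aio")


if __name__ == "__main__":
    version, has_aio = grpc_aio_status()
    print(f"grpc version: {version}, has grpc.aio: {has_aio}")
```

If this reports that `grpc.aio` is missing, the installed grpcio in that conda environment may be older than what ray==1.12.0 expects, and upgrading it would be one thing to try.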
Posted on 2022-07-26 08:29:58
I ran into the same error on an HPC cluster that uses Slurm. For me, specifying a port other than the default one worked.
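The answer does not say which port was changed or how. As a rough sketch, assuming it refers to the head node port passed to `ray start --head --port=...` (and optionally `--dashboard-port=...`), one could ask the OS for a free port and build the command from it. The helper names `find_free_port` and `build_ray_start_command` are made up for illustration.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def build_ray_start_command(port, dashboard_port=None):
    """Build an argument list for `ray start --head` with explicit ports.

    Assumption: the non-default port from the answer maps to --port
    and/or --dashboard-port; the thread does not confirm which one.
    """
    cmd = ["ray", "start", "--head", f"--port={port}"]
    if dashboard_port is not None:
        cmd.append(f"--dashboard-port={dashboard_port}")
    return cmd


if __name__ == "__main__":
    # Prints the command instead of running it, so nothing is started here.
    print(" ".join(build_ray_start_command(find_free_port())))
```

The port is only free at the moment it is probed, so on a shared Slurm node another job could still take it before Ray binds to it.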
https://stackoverflow.com/questions/72040411