Distributed TensorFlow TensorForest

Stack Overflow user
Asked 2018-05-25 22:44:58
1 answer · 187 views · 0 followers · 0 votes

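I am new to distributed processing and would like to know: can a TensorForest model be trained in a distributed fashion? I know how this is done for neural networks, but I am not sure about TensorForest, which is a random forest implementation built on the TensorFlow framework.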


1 Answer

Stack Overflow user

Answered 2019-04-04 09:38:36

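I recently dug into this topic. Since TensorForestEstimator derives from tf.contrib.learn.Estimator, it should be usable in a distributed training setup.

The problem I ran into is how to configure device assignment correctly. The TensorForestEstimator constructor accepts a device_assigner keyword argument: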

device_assigner: An object instance that controls how trees get assigned to devices. If None, will use tensor_forest.RandomForestDeviceAssigner.

The documentation is inaccurate. The default is actually an instance of tf.contrib.framework.VariableDeviceChooser.

https://github.com/tensorflow/tensorflow/blob/v1.12.0/tensorflow/contrib/tensor_forest/python/tensor_forest.py#L380
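
For intuition on why that default matters, here is a minimal pure-Python sketch of round-robin placement. `SketchDeviceChooser` is a hypothetical stand-in written for illustration, not TensorFlow's class. It assumes that a chooser given a task count cycles variables across `/job:ps/task:N`, and that a chooser given zero tasks names no job at all.

```python
class SketchDeviceChooser:
    """Hypothetical stand-in for a variable device chooser (not TensorFlow code)."""

    def __init__(self, num_tasks=0, job_name="ps", device_type="CPU", device_index=0):
        self._num_tasks = num_tasks
        self._job_name = job_name
        self._device = "/device:{}:{}".format(device_type, device_index)
        self._next_task = 0

    def __call__(self):
        # With zero tasks no job is named, so placement stays on the local machine.
        if self._num_tasks <= 0:
            return self._device
        # Otherwise hand out parameter-server tasks in round-robin order.
        task = self._next_task
        self._next_task = (task + 1) % self._num_tasks
        return "/job:{}/task:{}{}".format(self._job_name, task, self._device)


local = SketchDeviceChooser()
distributed = SketchDeviceChooser(num_tasks=2)
local_devices = [local() for _ in range(3)]
distributed_devices = [distributed() for _ in range(3)]
print(local_devices)
print(distributed_devices)
```

Under these assumptions, the argument-free chooser never emits a `/job:ps` prefix, which is why it only suits a single machine.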

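The linked code instantiates VariableDeviceChooser with no arguments, which means it is expected to run without parameter servers. That is fine on a single machine, but not in a distributed environment. I tried passing in a VariableDeviceChooser instance constructed with the number of parameter servers inferred from the data in TF_CONFIG.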
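
A minimal sketch of inferring that count from `TF_CONFIG`, assuming the usual JSON layout with a `cluster` mapping from job names to host lists. The helper name `count_parameter_servers` and the example hosts are made up for illustration and are not taken from the original setup.

```python
import json
import os


def count_parameter_servers(environ=None):
    """Return the number of 'ps' hosts listed in TF_CONFIG, or 0 if absent."""
    environ = os.environ if environ is None else environ
    tf_config = json.loads(environ.get("TF_CONFIG") or "{}")
    cluster = tf_config.get("cluster") or {}
    return len(cluster.get("ps") or [])


# Placeholder cluster definition; host names and ports are hypothetical.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "master": ["host0:2222"],
            "ps": ["host1:2222", "host2:2222"],
            "worker": ["host3:2222"],
        },
        "task": {"type": "worker", "index": 0},
    })
}
num_ps = count_parameter_servers(example_env)
print(num_ps)
```

The resulting number would be the value handed to the chooser as its task count. Returning 0 when `TF_CONFIG` is missing keeps the single-machine case on the local-only path.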

This is the error message I observed when the session was started during the training operation.

  File "/home/ubuntu/.pyenv/versions/cmle-1_12-py-3_5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/ubuntu/.pyenv/versions/cmle-1_12-py-3_5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
    self._extend_graph()
  File "/home/ubuntu/.pyenv/versions/cmle-1_12-py-3_5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation device_dummy_0/Initializer/random_uniform/RandomUniform: Could not satisfy explicit device specification '' because the node {{colocation_node device_dummy_0/Initializer/random_uniform/RandomUniform}} was colocated with a group of nodes that required incompatible device '/job:ps/task:0/device:CPU:0'
Colocation Debug Info:
Colocation group had the following types and devices: 
IsVariableInitialized: CPU 
Assign: CPU 
Identity: CPU XLA_CPU 
VariableV2: CPU  
Mul: CPU XLA_CPU 
Add: CPU XLA_CPU 
Sub: CPU XLA_CPU 
RandomUniform: CPU XLA_CPU 
Const: CPU XLA_CPU 

Colocation members and user-requested devices:
  device_dummy_0/Initializer/random_uniform/shape (Const) 
  device_dummy_0/Initializer/random_uniform/min (Const) 
  device_dummy_0/Initializer/random_uniform/max (Const) 
  device_dummy_0/Initializer/random_uniform/RandomUniform (RandomUniform) 
  device_dummy_0/Initializer/random_uniform/sub (Sub) 
  device_dummy_0/Initializer/random_uniform/mul (Mul) 
  device_dummy_0/Initializer/random_uniform (Add) 
  device_dummy_0 (VariableV2) /job:ps/task:0/device:CPU:0   
  device_dummy_0/Assign (Assign) /job:ps/task:0/device:CPU:0
  device_dummy_0/read (Identity) /job:ps/task:0/device:CPU:0
  report_uninitialized_variables/IsVariableInitialized_1 (IsVariableInitialized) /job:ps/task:0/device:CPU:0  
  report_uninitialized_variables_1/IsVariableInitialized_1 (IsVariableInitialized) /job:ps/task:0/device:CPU:0
  save/Assign_1 (Assign) /job:ps/task:0/device:CPU:0

     [[{{node device_dummy_0/Initializer/random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _class=["loc:@device_dummy_0"], dtype=DT_FLOAT, seed=0, seed2=0](device_dummy_0/Initializer/random_uniform/shape)]]
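
To see the mismatch at a glance, a small helper can split the colocation members of a log like this one into those that carry a requested device and those that do not. This is only a text-processing sketch: `split_by_requested_device` is a hypothetical name, and the line format is assumed from this single trace.

```python
import re

# Matches lines such as "  name (OpType) /job:ps/task:0/device:CPU:0".
MEMBER_RE = re.compile(r"^\s+(\S+) \((\w+)\)\s*(\S*)\s*$")


def split_by_requested_device(log_text):
    """Return (pinned, unpinned) op names from the colocation members section."""
    pinned, unpinned = [], []
    in_members = False
    for line in log_text.splitlines():
        if line.startswith("Colocation members"):
            in_members = True
            continue
        if not in_members:
            continue
        match = MEMBER_RE.match(line)
        if match is None:
            if line.strip():
                break  # a non-member line ends the section
            continue  # skip blank lines
        name, _op_type, device = match.groups()
        (pinned if device else unpinned).append(name)
    return pinned, unpinned


# Short excerpt of the trace, used here as sample input.
sample = """Colocation members and user-requested devices:
  device_dummy_0/Initializer/random_uniform/shape (Const)
  device_dummy_0 (VariableV2) /job:ps/task:0/device:CPU:0
  device_dummy_0/Assign (Assign) /job:ps/task:0/device:CPU:0
"""
pinned, unpinned = split_by_requested_device(sample)
print("pinned:", pinned)
print("unpinned:", unpinned)
```

In the full trace, the variable and its Assign, read, IsVariableInitialized, and save ops request /job:ps/task:0/device:CPU:0, while the seven initializer ops carry no requested device.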
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/50531801
