
Object detection with Turi Create on an AWS SageMaker notebook using CUDA 8.0
Stack Overflow user
Asked on 2019-07-01 14:51:57 · 1 answer · 389 views · score 0

As the title says, I am trying to use Turi Create on an AWS SageMaker notebook instance with Python 3.6 (the SageMaker environment). Although CUDA 10.0 is installed by default, CUDA 8.0 is also preinstalled and can be selected with the following commands in the notebook:

!sudo rm /usr/local/cuda
!sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda

I have verified this installation with nvcc --version and by building and running the deviceQuery sample:

$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery

Next, in my notebook, I installed Turi Create and the matching mxnet build for CUDA 8.0:

!pip install turicreate==5.4
!pip uninstall -y mxnet
!pip install mxnet-cu80==1.1.0

Then I prepared my images and tried to create a model:

import turicreate as tc

tc.config.set_num_gpus(-1)
images = tc.image_analysis.load_images('images', ignore_failure=True)
data = images.join(annotations_)
train_data, test_data = data.random_split(0.8)
model = tc.object_detector.create(train_data, max_iterations=50)

which produces the following output while tc.object_detector.create is running:

Using 'image' as feature column
Using 'annotaion' as annotations column
Downloading https://docs-assets.developer.apple.com/turicreate/models/darknet.params
Download completed: /var/tmp/model_cache/darknet.params
Setting 'batch_size' to 32
Using GPUs to create model (Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80)
Using default 16 lambda workers.
To maximize the degree of parallelism, add the following code to the beginning of the program:
"turicreate.config.set_runtime_config('TURI_DEFAULT_NUM_PYLAMBDA_WORKERS', 32)"
Note that increasing the degree of parallelism also increases the memory footprint.
---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
_ctypes/callbacks.c in 'calling callback function'()

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/kvstore.py in updater_handle(key, lhs_handle, rhs_handle, _)
     81         lhs = _ndarray_cls(NDArrayHandle(lhs_handle))
     82         rhs = _ndarray_cls(NDArrayHandle(rhs_handle))
---> 83         updater(key, lhs, rhs)
     84     return updater_handle
     85 

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in __call__(self, index, grad, weight)
   1528                 self.sync_state_context(self.states[index], weight.context)
   1529             self.states_synced[index] = True
-> 1530         self.optimizer.update_multi_precision(index, weight, grad, self.states[index])
   1531 
   1532     def sync_state_context(self, state, context):

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in update_multi_precision(self, index, weight, grad, state)
    553         use_multi_precision = self.multi_precision and weight.dtype == numpy.float16
    554         self._update_impl(index, weight, grad, state,
--> 555                           multi_precision=use_multi_precision)
    556 
    557 @register

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in _update_impl(self, index, weight, grad, state, multi_precision)
    535             if state is not None:
    536                 sgd_mom_update(weight, grad, state, out=weight,
--> 537                                lazy_update=self.lazy_update, lr=lr, wd=wd, **kwargs)
    538             else:
    539                 sgd_update(weight, grad, out=weight, lazy_update=self.lazy_update,

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/register.py in sgd_mom_update(weight, grad, mom, lr, momentum, wd, rescale_grad, clip_gradient, out, name, **kwargs)

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out)
     90         c_str_array(keys),
     91         c_str_array([str(s) for s in vals]),
---> 92         ctypes.byref(out_stypes)))
     93 
     94     if original_output is not None:

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
    144     """
    145     if ret != 0:
--> 146         raise MXNetError(py_str(_LIB.MXGetLastError()))
    147 
    148 

MXNetError: Cannot find argument 'lazy_update', Possible Arguments:
----------------
lr : float, required
    Learning rate
momentum : float, optional, default=0
    The decay rate of momentum estimates at each epoch.
wd : float, optional, default=0
    Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
rescale_grad : float, optional, default=1
    Rescale gradient to grad = rescale_grad*grad.
clip_gradient : float, optional, default=-1
    Clip gradient to the range of [-clip_gradient, clip_gradient] If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).
, in operator sgd_mom_update(name="", wd="0.0005", momentum="0.9", clip_gradient="0.025", rescale_grad="1.0", lr="0.001", lazy_update="True")

Interestingly, if I instead use Turi Create 5.6 with CUDA 10.0:

!pip install turicreate==5.6
!pip uninstall -y mxnet
!pip install mxnet-cu100==1.4.0.post0

the notebook still fails, but if I immediately uninstall turicreate and mxnet-cu100 and retry the CUDA 8.0 steps above, it works without any problem.

Last time, after restarting the instance, I tried pip freeze > requirements.txt followed by pip install -r requirements.txt, but I still got the same error as above (unless I first went through the CUDA 10.0 attempt). What is going on here? Any suggestions would be appreciated.
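After a few of these install/uninstall cycles it is easy to lose track of which wheel is actually active, since mxnet, mxnet-cu80, and mxnet-cu100 all provide the same import name. A small standard-library sketch (the function names here are illustrative, not part of any of these packages) to inspect the kernel's state:

```python
# Hedged sketch: list which mxnet wheels are installed in the current
# environment. Several distributions (mxnet, mxnet-cu80, mxnet-cu100)
# all provide the same 'mxnet' import name, so a plain 'pip freeze'
# can hide which one actually wins at import time.
import importlib.util
from importlib import metadata


def installed_mxnet_wheels():
    """Names and versions of installed distributions starting with 'mxnet'."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith("mxnet"):
            found.append(f"{name}=={dist.version}")
    return sorted(found)


def mxnet_importable():
    """True if 'import mxnet' would resolve, without actually importing it."""
    return importlib.util.find_spec("mxnet") is not None


print(installed_mxnet_wheels(), mxnet_importable())
```

Seeing more than one entry in the first list, or an importable module with no matching wheel, would point at exactly the kind of mixed state described above.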


1 answer

Stack Overflow user

Answered on 2019-12-09 19:32:30

Updating mxnet from 1.1.0 to 1.4.0 is the right fix. The error appears to be unrelated to the CUDA version; it comes from MXNet itself.

The mxnet 1.1.0 source code at https://github.com/apache/incubator-mxnet has no lazy_update argument on the sgd_mom_update function.

You can see this by comparing the sgd_mom_update call in the optimizer code of the mxnet 1.4.0 release tag:

https://github.com/apache/incubator-mxnet/blob/a03d59ed867ba334d78d61246a1090cd1868f5da/python/mxnet/optimizer/optimizer.py#L536

with the optimizer code of the mxnet 1.1.0 release tag:

https://github.com/apache/incubator-mxnet/blob/07a83a0325a3d782513a04f47d711710972cb144/python/mxnet/optimizer.py#L517

This change landed in mxnet >= 1.3.0, which is why your test with mxnet-cu100==1.4.0.post0 succeeded.
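To make the mismatch concrete, here is a minimal pure-Python stand-in (an illustration, not mxnet's actual implementation) of an operator registry that only accepts the keyword arguments registered for it, which is essentially what produces the "Cannot find argument 'lazy_update'" error in your traceback:

```python
# Hedged stand-in (NOT real mxnet code): the native operator registry
# rejects keyword arguments it does not know. mxnet < 1.3.0 registers
# sgd_mom_update without 'lazy_update'; the newer Python-side optimizer
# passes it anyway, and the old native side fails at call time.
ARGS_MXNET_1_1_0 = {"lr", "momentum", "wd", "rescale_grad", "clip_gradient"}
ARGS_MXNET_1_4_0 = ARGS_MXNET_1_1_0 | {"lazy_update"}


def invoke_sgd_mom_update(known_args, **kwargs):
    """Accept only keywords the registered operator knows about."""
    unknown = set(kwargs) - known_args
    if unknown:
        raise ValueError(
            "Cannot find argument '%s', possible arguments: %s"
            % (sorted(unknown)[0], ", ".join(sorted(known_args))))
    return "ok"


# The 1.4.0-style registry accepts lazy_update; the 1.1.0-style one
# raises, mirroring the MXNetError in the question.
invoke_sgd_mom_update(ARGS_MXNET_1_4_0, lr=0.001, momentum=0.9,
                      lazy_update=True)
```

This is also why the environment matters more than the CUDA toolkit: the failure only needs a newer Python caller paired with an older registered operator.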

Score 0
Original content from Stack Overflow.
Original link: https://stackoverflow.com/questions/56837823
