Q: Unclear why Deeplearning4j with CUDA and cuDNN fails with OutOfMemory

Stack Overflow user

Asked 2019-01-12 13:10:19

1 answer · 1.4K views · 0 followers · 1 vote

Environment: Windows 7, GeForce GTX 750, CUDA 10.0, cuDNN 7.4

Maven dependencies:

    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-10.0</artifactId>
        <version>1.0.0-beta3</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-10.0</artifactId>
        <version>1.0.0-beta3</version>
    </dependency>

I check the test results every 10 iterations. I used to call net.evaluate(), but that gave me this error:

Exception in thread "AMDSI prefetch thread" java.lang.RuntimeException: java.lang.RuntimeException: Failed to allocate 637074016 bytes from DEVICE [0] memory
    at org.deeplearning4j.datasets.iterator.AsyncMultiDataSetIterator$AsyncPrefetchThread.run(AsyncMultiDataSetIterator.java:396)
Caused by: java.lang.RuntimeException: Failed to allocate 637074016 bytes from DEVICE [0] memory
    at org.nd4j.jita.memory.CudaMemoryManager.allocate(CudaMemoryManager.java:76)
    at org.nd4j.jita.workspace.CudaWorkspace.init(CudaWorkspace.java:88)
    at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.initializeWorkspace(Nd4jWorkspace.java:508)
    at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.close(Nd4jWorkspace.java:651)
    at org.deeplearning4j.datasets.iterator.AsyncMultiDataSetIterator$AsyncPrefetchThread.run(AsyncMultiDataSetIterator.java:372)

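I then switched from net.evaluate() to net.output() with train = false, and reduced the test set size from 100 to only 20. When I tried increasing the number of records to 30, it showed this warning but still worked: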

2019-01-12 14:47:44 WARN  org.deeplearning4j.nn.layers.BaseCudnnHelper Cannot allocate 300000000 bytes of device memory (CUDA error = 2), proceeding with host memory

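I understand that the video card does not have enough memory (the GeForce GTX 750 specification lists 1 GB), but since it can fall back to host memory, I increased the test set size to 100. That now fails every time with this error: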

2019-01-12 14:59:29 WARN  org.deeplearning4j.nn.layers.BaseCudnnHelper Cannot allocate 1000000000 bytes of device memory (CUDA error = 2), proceeding with host memory
Exception in thread "main" 2019-01-12 14:59:39 ERROR org.deeplearning4j.util.CrashReportingUtil >>> Out of Memory Exception Detected. Memory crash dump written to: C:\DATA\Projects\dl4j-language-model\dl4j-memory-crash-dump-1547294372940_1.txt
java.lang.OutOfMemoryError: Failed to allocate memory within limits: totalBytes (470M + 7629M) > maxBytes (7851M)
2019-01-12 14:59:39 WARN  org.deeplearning4j.util.CrashReportingUtil Memory crash dump reporting can be disabled with CrashUtil.crashDumpsEnabled(false) or using system property -Dorg.deeplearning4j.crash.reporting.enabled=false
    at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:580)
        at org.deeplearning4j.nn.layers.BaseCudnnHelper$DataCache.<init>(BaseCudnnHelper.java:119)
2019-01-12 14:59:39 WARN  org.deeplearning4j.util.CrashReportingUtil Memory crash dump reporting output location can be set with CrashUtil.crashDumpOutputDirectory(File) or using system property -Dorg.deeplearning4j.crash.reporting.directory=<path>
        at org.deeplearning4j.nn.layers.recurrent.CudnnLSTMHelper.activate(CudnnLSTMHelper.java:509)
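
The two cuDNN warnings suggest that the requested device buffer grows linearly with the test set size: 300000000 bytes for 30 records and 1000000000 bytes for 100 records both work out to 10000000 bytes per record. The sketch below only checks that arithmetic. The class and method names are hypothetical, and the linear model is an assumption drawn from these two data points, not from the DL4J source.

```java
public class CudnnRequestScaling {

    /** Bytes requested per record, given a logged request size and the record count. */
    static long bytesPerRecord(long requestedBytes, int records) {
        return requestedBytes / records;
    }

    /** Extrapolated request size for a record count, assuming linear growth (an assumption). */
    static long estimateRequest(long perRecord, int records) {
        return perRecord * records;
    }

    public static void main(String[] args) {
        // Request sizes copied from the two BaseCudnnHelper warnings.
        long perRecordAt30 = bytesPerRecord(300_000_000L, 30);
        long perRecordAt100 = bytesPerRecord(1_000_000_000L, 100);
        System.out.println("per record (30 records):  " + perRecordAt30);
        System.out.println("per record (100 records): " + perRecordAt100);

        // Extrapolation to the test set size of 20 that ran without a warning.
        System.out.println("estimated request for 20 records: "
                + estimateRequest(perRecordAt30, 20));
    }
}
```

If the linear model holds, a test set of 20 records would ask for about 200000000 bytes, which fits with that size running without the warning.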

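Now, I assume maxBytes (7851M) refers to the heap size (the JVM runs with -Xmx8G -Xms8G). However, I also printed Runtime freeMemory() and totalMemory(), and just before the crash they showed the following, which is plenty of free memory: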

2019-01-12 15:29:20 INFO  Free memory: 7722607976/8232370176

So my question is: if the JVM has free memory available, where do the numbers in totalBytes (470M + 7629M) come from, and why can't the required 1 GB be allocated?
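
As a starting point, the comparison in the error message can be reproduced from figures in the crash report. The sketch below assumes that the first term of totalBytes is the memory already tracked (it matches "JavaCPP Memory: Current Bytes" at 470.26 MB) and that the second term is the size of the failing request. That reading is an assumption based on the numbers, and the class and method names are hypothetical.

```java
public class JavaCppLimitCheck {

    static final long MIB = 1024L * 1024L;

    /** Mirrors the comparison shown in the error message: current + requested > max. */
    static boolean exceedsLimit(long currentBytes, long requestedBytes, long maxBytes) {
        return currentBytes + requestedBytes > maxBytes;
    }

    /** Whole mebibytes, rounded down, as in the "470M" style of the message. */
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        long current = 493_106_209L;    // "JavaCPP Memory: Current Bytes" from the crash report
        long max = 8_232_370_176L;      // "JavaCPP Memory: Max Bytes" from the crash report
        long requested = 7629L * MIB;   // assumption: second term of totalBytes in the message

        System.out.println(toMiB(current) + "M + " + toMiB(requested) + "M > "
                + toMiB(max) + "M: " + exceedsLimit(current, requested, max));

        // A request of 1000000000 bytes, the size named in the cuDNN warning,
        // would stay under the same limit.
        System.out.println("1000000000-byte request exceeds limit: "
                + exceedsLimit(current, 1_000_000_000L, max));
    }
}
```

Under this reading, the failing request is about 7629 MiB, roughly 8 × 10^9 bytes, which is far more than the 1000000000 bytes named in the warning. The logs do not say why the request is that large.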

Here is the memory crash report:

Deeplearning4j OOM Exception Encountered for ComputationGraph
Timestamp:                              2019-01-12 14:59:32.940
Thread ID                               1
Thread Name                             main


Stack Trace:
java.lang.OutOfMemoryError: Failed to allocate memory within limits: totalBytes (470M + 7629M) > maxBytes (7851M)
    at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:580)
    at org.deeplearning4j.nn.layers.BaseCudnnHelper$DataCache.<init>(BaseCudnnHelper.java:119)
    at org.deeplearning4j.nn.layers.recurrent.CudnnLSTMHelper.activate(CudnnLSTMHelper.java:509)
    at org.deeplearning4j.nn.layers.recurrent.LSTMHelpers.activateHelper(LSTMHelpers.java:205)
    at org.deeplearning4j.nn.layers.recurrent.LSTM.activateHelper(LSTM.java:163)
    at org.deeplearning4j.nn.layers.recurrent.LSTM.activate(LSTM.java:140)
    at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:110)
    at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2316)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1727)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1686)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1672)
    at org.lungen.deeplearning.net.predictor.CharacterSequenceValuePredictorNet.testOutputAndScore(CharacterSequenceValuePredictorNet.java:195)
    at org.lungen.deeplearning.net.predictor.CharacterSequenceValuePredictorNet.train(CharacterSequenceValuePredictorNet.java:166)
    at org.lungen.deeplearning.net.predictor.CharacterSequenceValuePredictorNet.main(CharacterSequenceValuePredictorNet.java:283)


========== Memory Information ==========
----- Version Information -----
Deeplearning4j Version                  1.0.0-beta3
Deeplearning4j CUDA                     deeplearning4j-cuda-10.0

----- System Information -----
Operating System                        Microsoft Windows 7 SP1
CPU                                     Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
CPU Cores - Physical                    4
CPU Cores - Logical                     8
Total System Memory                       15.97 GB (17144102912)
Number of GPUs Detected                 1
  Name                           CC                Total Memory              Used Memory              Free Memory
  GeForce GTX 750                5.0          2 GB (2147483648)     1.67 GB (1795002368)    336.15 MB (352481280)

----- ND4J Environment Information -----
Data Type                               FLOAT
backend                                 CUDA
blas.vendor                             CUBLAS
os                                      Windows 7

----- Memory Configuration -----
JVM Memory: XMX                            7.67 GB (8232370176)
JVM Memory: current                        7.67 GB (8232370176)
JavaCPP Memory: Max Bytes                  7.67 GB (8232370176)
JavaCPP Memory: Max Physical              15.33 GB (16464740352)
JavaCPP Memory: Current Bytes            470.26 MB (493106209)
JavaCPP Memory: Current Physical           3.35 GB (3601498112)
Periodic GC Enabled                     true
Periodic GC Frequency                   100 ms

----- Workspace Information -----
Workspaces: # for current thread        4
Current thread workspaces:
  Name                      State       Size                          # Cycles            
  WS_LAYER_WORKING_MEM      CLOSED       117.40 MB (123100000)        6802                
  WS_ALL_LAYERS_ACT         CLOSED        19.41 MB (20349840)         2400                
  WS_LAYER_ACT_0            CLOSED         6.23 MB (6528000)          1601                
  WS_LAYER_ACT_1            CLOSED       381.47 MB (400000000)        1601                
Workspaces total size                    524.50 MB (549977840)
Helper Workspaces
  CUDNN_WORKSPACE                            7.06 MB (7408000)

----- Network Information -----
Network # Parameters                    1432106
Parameter Memory                           5.46 MB (5728424)
Parameter Gradients Memory                 5.46 MB (5728424)
Updater Number of Elements              2862812
Updater Memory                            10.92 MB (11451248)
Updater Classes:
  org.nd4j.linalg.learning.AdamUpdater
  org.nd4j.linalg.learning.NoOpUpdater
Params + Gradient + Updater Memory        16.38 MB (17179672)
Iteration Count                         400
Epoch Count                             0
Backprop Type                           TruncatedBPTT
TBPTT Length                            50/50
Workspace Mode: Training                ENABLED
Workspace Mode: Inference               ENABLED
Number of Layers                        7
Layer Counts
  BatchNormalization                      2
  DenseLayer                              1
  LSTM                                    3
  OutputLayer                             1
Layer Parameter Breakdown
  Idx Name                 Layer Type           Layer # Parameters   Layer Parameter Memory
  1   lstm-1               LSTM                 403000                  1.54 MB (1612000)
  2   lstm-2               LSTM                 501000                  1.91 MB (2004000)
  3   lstm-3               LSTM                 501000                  1.91 MB (2004000)
  5   norm-1               BatchNormalization   1000                    3.91 KB (4000)   
  6   dense-1              DenseLayer           25100                  98.05 KB (100400) 
  7   norm-2               BatchNormalization   400                     1.56 KB (1600)   
  8   output               OutputLayer          606                     2.37 KB (2424)   

----- Layer Helpers - Memory Use -----
#   Layer Name           Layer Class               Helper Class                   Total Memory Memory Breakdown
5   norm-1               BatchNormalization        CudnnBatchNormalizationHelper     1.95 KB (2000) {meanCache=1000, varCache=1000}
7   norm-2               BatchNormalization        CudnnBatchNormalizationHelper       800 B   {meanCache=400, varCache=400}
Total Helper Count                      2
Helper Count w/ Memory                  2
Total Helper Persistent Memory Use         2.73 KB (2800)

----- Network Activations: Inferred Activation Shapes -----
Current Minibatch Size                  100
Current Input Shape (Input 0)           [100, 152, 2000]
Idx Name                 Layer Type           Activations Type                           Activations Shape    # Elements   Memory      
0   recurrentInput       InputVertex          InputTypeRecurrent(152,timeSeriesLength=2000) [100, 152, 2000]     30400000      115.97 MB (121600000)
1   lstm-1               LSTM                 InputTypeRecurrent(250,timeSeriesLength=2000) [100, 250, 2000]     50000000      190.73 MB (200000000)
2   lstm-2               LSTM                 InputTypeRecurrent(250,timeSeriesLength=2000) [100, 250, 2000]     50000000      190.73 MB (200000000)
3   lstm-3               LSTM                 InputTypeRecurrent(250,timeSeriesLength=2000) [100, 250, 2000]     50000000      190.73 MB (200000000)
4   thoughtVector        LastTimeStepVertex   InputTypeFeedForward(250)                  [100, 250]           25000          97.66 KB (100000)
5   norm-1               BatchNormalization   InputTypeFeedForward(250)                  [100, 250]           25000          97.66 KB (100000)
6   dense-1              DenseLayer           InputTypeFeedForward(100)                  [100, 100]           10000          39.06 KB (40000)
7   norm-2               BatchNormalization   InputTypeFeedForward(100)                  [100, 100]           10000          39.06 KB (40000)
8   output               OutputLayer          InputTypeFeedForward(6)                    [100, 6]             600             2.34 KB (2400)
Total Activations Memory                 688.44 MB (721882400)
Total Activation Gradient Memory         688.44 MB (721880000)

----- Network Training Listeners -----
Number of Listeners                     3
Listener 0                              org.x.deeplearning.listener.ScorePrintListener@7b78ed6a
Listener 1                              ScoreIterationListener(10)
Listener 2                              org.x.deeplearning.listener.UIStatsListener@6fca5907
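
The per-layer activation sizes in the report follow directly from the activation shapes and the FLOAT data type (4 bytes per element). The following minimal sketch recomputes two of them and shows how the same layer would scale with a minibatch of 20. The class and method names are hypothetical, and the minibatch-of-20 figure is a calculation, not something taken from the report.

```java
public class ActivationMemory {

    static final int BYTES_PER_FLOAT = 4; // the report lists "Data Type FLOAT"

    /** Memory for one recurrent activation array of shape [minibatch, size, timeSteps]. */
    static long activationBytes(long minibatch, long size, long timeSteps) {
        return minibatch * size * timeSteps * BYTES_PER_FLOAT;
    }

    public static void main(String[] args) {
        // Shapes copied from the "Inferred Activation Shapes" table.
        System.out.println("recurrentInput [100, 152, 2000]: " + activationBytes(100, 152, 2000));
        System.out.println("lstm-1         [100, 250, 2000]: " + activationBytes(100, 250, 2000));

        // Same LSTM layer with a minibatch of 20 (calculated, not from the report).
        System.out.println("lstm-1         [20, 250, 2000]:  " + activationBytes(20, 250, 2000));
    }
}
```

Activation memory is proportional to both the minibatch size and the time series length of 2000, so reducing either one shrinks every LSTM activation by the same factor.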

gpu

deeplearning4j

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-01-26 16:20:37

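So, a short explanation to close this question. ND4J uses off-heap memory, which is essentially mapped to GPU memory. So, as @Samuel pointed out, 7629M refers to off-heap memory, which clearly does not fit into the GPU memory of my GTX 750.

A final note from the DL4J documentation:

Note that if your GPU has less than 2 GB of RAM, it is probably not usable for deep learning, and you should consider using your CPU instead. Typical deep learning workloads should have at least 4 GB of RAM, and even that is small. 8 GB of RAM on a GPU is recommended for deep learning workloads.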
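
On a card this small, one workaround consistent with the question (a test set of 20 records ran without warnings) is to score the test set in several small batches instead of one large one. The sketch below only computes the batch boundaries. The class and method names are hypothetical, and the call into the network is left as a comment because it depends on how the test data is stored.

```java
import java.util.ArrayList;
import java.util.List;

public class TestSetBatches {

    /** Splits [0, total) into consecutive [start, end) ranges of at most batchSize records. */
    static List<int[]> batchRanges(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            ranges.add(new int[] {start, Math.min(start + batchSize, total)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 100 test records, 20 per batch (the size that worked in the question).
        for (int[] range : batchRanges(100, 20)) {
            // Here one would run the network on records range[0] .. range[1] - 1
            // and accumulate the results across batches.
            System.out.println("records " + range[0] + " to " + (range[1] - 1));
        }
    }
}
```

This keeps each forward pass at a size that is known to fit, at the cost of more passes per check.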

3 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54159900
