Debugging: Keras + TensorFlow code that builds a Sequential model crashes on Google Colab with the message "Your session crashed for an unknown reason"

Stack Overflow user
Asked 2021-01-12 11:53:54
1 answer · 137 views · 0 followers · 0 votes

When I run the following code on Google Colab with the GPU runtime (one of my custom layers uses the GPU to execute tensorflow.fft), my session crashes:

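from tensorflow.keras import layers, models, regularizers

# conv2d_layer and complex_conv_transpose_layer are the asker's custom layers
# (definitions not shown in the question).
fc2_shape = 32*32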

model = models.Sequential()
model.add(layers.Flatten(input_shape=(32, 32, 2)))
model.add(layers.Dense(fc2_shape, activation='tanh'))
model.add(layers.Dense(fc2_shape, activation='tanh'))
model.add(layers.Reshape((32, 32, 1)))
model.add(conv2d_layer(num_features=32, kernel_size=5, type_conv="complex"))
model.add(layers.Activation('relu'))
model.add(conv2d_layer(num_features=32, kernel_size=5, type_conv="complex", kernel_regularizer=regularizers.l1(0.0001)))
model.add(layers.Activation('relu'))
model.add(complex_conv_transpose_layer(num_features=1, kernel_size=9, strides=1))

model.summary()

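It crashes and shows the messages "Your session crashed. Automatically restarting.", "Your session restarted after a crash. Debug.", and "Your session crashed for an unknown reason. View runtime logs."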

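The runtime log is shared below. Could I get some help understanding what might be causing the crash? The log contains no error messages at all, only warnings. I tried many of the fixes suggested for those warnings, but none of them seemed to work. I need to find the exact cause. Thanks.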

Jan 12, 2021, 9:02:25 AM    WARNING WARNING:root:kernel 96640b1f-78c4-4aee-8612-299bbd2a4d8d restarted
Jan 12, 2021, 9:02:25 AM    INFO    KernelRestarter: restarting kernel (1/5), keep random ports
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.430989: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.256096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13960 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.256008: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.255209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.254277: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.253280: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.247860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.247846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
Jan 12, 2021, 9:02:20 AM    WARNING 2021-01-12 03:32:20.247794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.208463: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.205616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.204824: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203940: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203862: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203843: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203825: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203806: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203787: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203768: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203743: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203699: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Jan 12, 2021, 9:02:16 AM    WARNING coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
Jan 12, 2021, 9:02:16 AM    WARNING pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.203627: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.202836: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.202313: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.201374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.197656: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.196404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:16 AM    WARNING 2021-01-12 03:32:16.196182: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
Jan 12, 2021, 9:02:15 AM    WARNING 2021-01-12 03:32:15.708027: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
Jan 12, 2021, 9:02:15 AM    WARNING 2021-01-12 03:32:15.691252: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
Jan 12, 2021, 9:02:15 AM    WARNING 2021-01-12 03:32:15.439674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
Jan 12, 2021, 9:02:15 AM    WARNING 2021-01-12 03:32:15.390009: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
Jan 12, 2021, 9:02:15 AM    WARNING 2021-01-12 03:32:15.274378: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
Jan 12, 2021, 9:02:15 AM    WARNING 2021-01-12 03:32:15.274194: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
Jan 12, 2021, 9:02:14 AM    WARNING 2021-01-12 03:32:14.935005: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Jan 12, 2021, 9:02:14 AM    WARNING coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
Jan 12, 2021, 9:02:14 AM    WARNING pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
Jan 12, 2021, 9:02:14 AM    WARNING 2021-01-12 03:32:14.934938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
Jan 12, 2021, 9:02:14 AM    WARNING 2021-01-12 03:32:14.933915: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jan 12, 2021, 9:02:14 AM    WARNING 2021-01-12 03:32:14.866772: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
Jan 12, 2021, 9:02:14 AM    WARNING 2021-01-12 03:32:14.865367: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
Jan 12, 2021, 9:02:14 AM    WARNING tcmalloc: large alloc 1228800000 bytes == 0x14210000 @ 0x7f536abaa1e7 0x7f53620a841e 0x7f53620f8c2b 0x7f53620f8cc8 0x7f53621b4d19 0x7f53621b7dec 0x7f53622d6ddf 0x7f53622dcf15 0x7f53622ded9d 0x7f53622e0476 0x5a48ec 0x5a4fb8 0x7f53621bf438 0x59c9f0 0x50ea2d 0x507be4 0x5161c5 0x50a12f 0x50beb4 0x507be4 0x509900 0x50a2fd 0x50beb4 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x508ec2 0x594a01 0x59fd0e
Jan 12, 2021, 9:02:06 AM    WARNING 2021-01-12 03:32:06.977597: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Jan 12, 2021, 9:01:54 AM    INFO    Adapting to protocol v5.1 for kernel 96640b1f-78c4-4aee-8612-299bbd2a4d8d
Jan 12, 2021, 9:01:52 AM    INFO    Kernel started: 96640b1f-78c4-4aee-8612-299bbd2a4d8d
Jan 12, 2021, 9:00:04 AM    INFO    Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Jan 12, 2021, 9:00:04 AM    INFO    http://172.28.0.2:9000/
Jan 12, 2021, 9:00:04 AM    INFO    The Jupyter Notebook is running at:
Jan 12, 2021, 9:00:04 AM    INFO    0 active kernels
Jan 12, 2021, 9:00:04 AM    INFO    Serving notebooks from local directory: /
Jan 12, 2021, 9:00:04 AM    INFO    google.colab serverextension initialized.
Jan 12, 2021, 9:00:04 AM    INFO    Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
Jan 12, 2021, 9:00:04 AM    WARNING Config option `delete_to_trash` not recognized by `ColabFileContentsManager`.
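
Most of the log above is routine library and device loading. The one line that relates to host memory is the tcmalloc "large alloc" notice. As a minimal sketch (the helper name `large_allocs_gib` and the regular expression are illustrative assumptions, not part of Colab or TensorFlow), such lines can be pulled out of a pasted log and converted to GiB:

```python
import re

# Matches tcmalloc's large-allocation notice in the form it takes in the log above.
TCMALLOC_RE = re.compile(r"tcmalloc: large alloc (\d+) bytes")


def large_allocs_gib(log_text):
    """Return the size in GiB of every tcmalloc large allocation found in log_text."""
    return [int(m.group(1)) / 2**30 for m in TCMALLOC_RE.finditer(log_text)]


# The relevant line, copied from the log (stack addresses shortened).
sample = (
    "Jan 12, 2021, 9:02:14 AM    WARNING tcmalloc: large alloc "
    "1228800000 bytes == 0x14210000 @ 0x7f536abaa1e7"
)

for size in large_allocs_gib(sample):
    print(f"large alloc: {size:.2f} GiB")  # prints "large alloc: 1.14 GiB"
```

A single allocation of about 1.14 GiB does not by itself prove that the session ran out of RAM, but it is the only memory-related signal the log contains.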

1 Answer

Stack Overflow user

Answered 2021-01-12 13:20:18

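This usually happens when you use up all of the RAM available on Google Colab, which cannot handle very large datasets. You can try upgrading the RAM or using a different service.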
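
One way to test the RAM explanation against the question's model is to estimate how much memory its standard layers need. The sketch below assumes Keras's default float32 weights and counts only the two Dense layers. The custom layers are not shown in the question, so their cost is unknown here.

```python
def dense_params(n_in, n_out):
    """Number of trainable parameters in a Dense layer: weights plus biases."""
    return n_in * n_out + n_out


flat = 32 * 32 * 2      # size of the Flatten output for input_shape=(32, 32, 2)
fc2_shape = 32 * 32     # width of both Dense layers in the question

p1 = dense_params(flat, fc2_shape)       # first Dense layer
p2 = dense_params(fc2_shape, fc2_shape)  # second Dense layer

BYTES_PER_FLOAT32 = 4  # assumption: default float32 dtype
total_mib = (p1 + p2) * BYTES_PER_FLOAT32 / 2**20

print(p1, p2)                 # prints "2098176 1049600"
print(f"{total_mib:.2f} MiB")  # prints "12.01 MiB"
```

At roughly 12 MiB, the Dense weights are far smaller than the 1228800000-byte allocation reported by tcmalloc in the log. If memory is the cause, the pressure more likely comes from the custom layers or from data loaded earlier in the notebook, neither of which appears in the question.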

Try Microsoft Azure, AWS, or Google Cloud Services.

They all have good machine learning offerings, but all of them are paid. Another option is to run it locally in Jupyter.
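
Before moving to a paid service, it may be worth checking how much host RAM is actually free just before the cell that crashes. This is a minimal, Linux-only sketch that reads `/proc/meminfo` using only the standard library. The helper names `parse_meminfo` and `available_gib` are illustrative, and the sketch assumes the runtime exposes `/proc/meminfo`, as Linux hosts normally do.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to its integer value."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gib(path="/proc/meminfo"):
    """Return MemAvailable in GiB, or None if it cannot be determined."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        info = parse_meminfo(f.read())
    kib = info.get("MemAvailable")  # reported in kB (KiB) by the kernel
    return None if kib is None else kib / 2**20


print("available RAM (GiB):", available_gib())
```

Calling `available_gib()` before and after building the model gives a rough picture of whether host memory is being exhausted. It does not measure GPU memory.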

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/65677815
