Does this mean I need more VRAM? I'm currently training the model on a GTX 1050 with 2 GB.
2021-02-12 22:51:38.033037: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 84.38MiB (rounded to 88473600). Current allocation summary follows.
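The numbers in this log can be cross-checked with a few lines of plain Python. This is a sketch whose only inputs are values printed in the log itself (the shape[21600,1024] from the final OOM line and the Limit/InUse figures from the allocation summary); the helper name `tensor_bytes` is made up for illustration and assumes float32, i.e. 4 bytes per element.

```python
# Sanity-check the figures printed in the TensorFlow OOM log.
# All constants below are copied from the log; nothing is measured here.
MIB = 1024 ** 2
GIB = 1024 ** 3
FLOAT32_BYTES = 4


def tensor_bytes(shape, itemsize=FLOAT32_BYTES):
    """Hypothetical helper: bytes needed for a dense tensor of this shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


requested = tensor_bytes((21600, 1024))  # shape from the final OOM line
limit = 1406277838                       # "Limit:" from the allocation summary
in_use = 1388972288                      # "InUse:" from the allocation summary
free = limit - in_use

print(f"requested: {requested} bytes = {requested / MIB:.2f} MiB")
print(f"limit:     {limit / GIB:.2f} GiB")
print(f"free:      {free / MIB:.2f} MiB")
```

Run as is, it shows that 88,473,600 bytes is exactly one float32 matrix of shape [21600, 1024] (the 84.38 MiB in the first warning), while only about 16.5 MiB of the roughly 1.31 GiB that TensorFlow was allowed to use on the 2 GB card was still free.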
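When I run the script, it fetches the data, builds the images and splits off the training data, then starts and finishes training and gives about 20 results before crashing.
This is the model I use for training: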
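from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D

img_width, img_height = 150, 150
# Enter the number of samples, training + validation
# (x1, y2 and x2 are computed earlier in the script, which is not shown here)
nb_train_samples = x1 + y2
nb_validation_samples = x2 + y2
nb_filter1 = 16
nb_filter2 = 16
nb_filter3 = 32
conv1_size = 3
conv2_size = 2
conv3_size = 5
pool_size = 2
# We have 2 classes
classes_num = 2
batch_size = 10
lr = 0.001
chanDim = 3  # not used by the layers below

model = Sequential()
# tf.keras form Conv2D(filters, kernel_size, padding=...) replaces the Keras 1
# form Convolution2D(nb_filter, nb_row, nb_col, border_mode=...)
model.add(Conv2D(nb_filter1, (conv1_size, conv1_size), padding='same', input_shape=(img_height, img_width, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
model.add(Conv2D(nb_filter2, (conv2_size, conv2_size), padding='same'))
model.add(Activation('relu'))
# The original code passed dim_ordering='th' (channels-first) to this pooling layer
# and the last one. On this channels-last input that pools over width and channels
# instead of height and width, which is why Flatten produced 21600 features
# (the shape[21600,1024] in the log) rather than 18*18*32 = 10368.
model.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
model.add(Conv2D(nb_filter3, (conv3_size, conv3_size), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
model.add(Flatten())
# The Dense kernel holds (Flatten size x 1024) float32 weights; the failed allocation
# in the log is a tensor of exactly this kernel's shape
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(classes_num, activation='softmax'))

Is there any way to fix this?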
Limit: 1406277838
InUse: 1388972288
MaxInUse: 1388972288
NumAllocs: 667963
MaxAllocSize: 176941568
2021-02-12 22:51:38.055777: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *******x*******xx********_**********************************************************************xxxx
2021-02-12 22:51:38.055870: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[21600,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Answer (posted 2021-02-13 04:09:41):
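Given the model architecture and the size of the data, there is no way around running out of VRAM. You can try reducing the number of filters, especially in the first layer. Technically you could train on the CPU instead, but it may be very slow.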
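The advice about filter counts can be made concrete with a rough, TensorFlow-free walk-through of the question's layer stack. This is a sketch under stated assumptions, not a profiler: `walk_shapes` is a hypothetical helper, it assumes 'same' convolutions and 2x2 max-pooling with stride 2 and 'valid' padding as in the question, `th_pooling=True` imitates the original `dim_ordering='th'` on the second and third pooling layers, and it counts only the Dense kernel and the conv outputs (no gradients, optimizer state, or cuDNN workspaces).

```python
MIB = 1024 ** 2
FLOAT32_BYTES = 4


def walk_shapes(filters=(16, 16, 32), input_hw=(150, 150), dense_units=1024,
                batch_size=10, th_pooling=False):
    """Hypothetical helper: trace the conv -> relu -> 2x2 max-pool stack.

    Returns (flatten_size, dense_kernel_mib, largest_conv_output_mib).
    Assumes float32 throughout and ignores gradients and optimizer slots,
    so real GPU usage is higher than these figures.
    """
    h, w, c = input_hw[0], input_hw[1], 3  # channels-last RGB input
    conv_outputs = []  # elements per sample of each conv layer's output
    for i, f in enumerate(filters):
        c = f  # 'same' conv keeps H and W and sets channels to f
        conv_outputs.append(h * w * c)
        if th_pooling and i > 0:
            # Mimics dim_ordering='th' on the 2nd and 3rd pooling layers: the
            # channels-last tensor is read as channels-first, so W and C are halved.
            w, c = w // 2, c // 2
        else:
            # Ordinary channels-last pooling halves H and W.
            h, w = h // 2, w // 2
    flatten_size = h * w * c
    dense_kernel_mib = flatten_size * dense_units * FLOAT32_BYTES / MIB
    largest_conv_output_mib = max(conv_outputs) * batch_size * FLOAT32_BYTES / MIB
    return flatten_size, dense_kernel_mib, largest_conv_output_mib


for label, kwargs in [
    ("question's model ('th' pooling)", dict(th_pooling=True)),
    ("channels-last pooling", dict()),
    ("channels-last, nb_filter1 = 8", dict(filters=(8, 16, 32))),
    ("channels-last, nb_filter3 = 16", dict(filters=(16, 16, 16))),
]:
    flat, kernel, act = walk_shapes(**kwargs)
    print(f"{label}: Flatten={flat}, Dense kernel={kernel:.1f} MiB, "
          f"largest conv output per batch={act:.1f} MiB")
```

Under these assumptions the Dense kernel depends only on the last conv layer's filter count and the pooled spatial size, so lowering nb_filter1 leaves it at 40.5 MiB but halves the largest activation (the 150x150 output of the first conv layer), while lowering nb_filter3 halves the kernel. The same trace suggests that dropping the dim_ordering='th' arguments would by itself shrink the Dense kernel from about 84.4 MiB to 40.5 MiB. For the CPU route, a common approach is to set the environment variable CUDA_VISIBLE_DEVICES to -1 before TensorFlow is imported, so that no GPU is visible to it.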
https://stackoverflow.com/questions/66178368