我的目标是使用Tensorflow在MNIST上训练一个非常简单的CNN,将其转换为TensorRT,并使用TensorRT在MNIST测试集上执行推断,所有这些都是在Jetson Nano上进行的,但我收到了几个错误和警告,包括“OutOfMemory Error in GpuMemory: 0”。为了尝试并减少内存占用,我还尝试创建了一个脚本,在该脚本中,我只是加载TensorRT模型(该模型已经被转换并保存在前面的脚本中),并使用它对MNIST测试集的一小部分(100个浮点值)执行推断,但我仍然得到相同的内存不足错误。包含TensorRT模型的整个目录只有488KB,100个测试点不可能占用太多内存,所以我搞不懂为什么GPU内存会用完。这可能是什么原因,我如何解决它?
另一件看起来可疑的事情是,一些Tensorflow日志信息消息被多次打印,EG“成功打开动态库libcudart”,“成功打开动态库libcublas”,“ARM64不支持NUMA -返回NUMA节点零”。这可能是什么原因(EG动态库被一次又一次地打开),这是否与GPU内存不断耗尽有关?
下面显示的是两个Python脚本;每个脚本的控制台输出都太长,不能发布到Stack Overflow上,但可以看到它们附加在这个Gist上:https://gist.github.com/jakelevi1996/8a86f2c2257001afc939343891ee5de7
"""
Example script which trains a simple CNN for 1 epoch on a subset of MNIST, and
converts the model to TensorRT format, for enhanced performance which fully
utilises the NVIDIA GPU, and then performs inference.
Useful resources:
- https://stackoverflow.com/questions/58846828/how-to-convert-tensorflow-2-0-savedmodel-to-tensorrt
- https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel
- https://www.tensorflow.org/api_docs/python/tf/experimental/tensorrt/Converter
- https://github.com/tensorflow/tensorflow/issues/34339
- https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py
Tested on the NVIDIA Jetson Nano, Python 3.6.9, tensorflow 2.1.0+nv20.4, numpy
1.16.1
"""
import os
from time import perf_counter
import numpy as np
t0 = perf_counter()
import tensorflow as tf
from tensorflow.keras import datasets, layers, models, Input
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.framework import convert_to_constants
tf.compat.v1.enable_eager_execution() # see github issue above
# Get training and test data
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1) / 255.0
x_test = np.expand_dims(x_test, -1) / 255.0
# Create model
model = models.Sequential()
# model.add(Input(shape=x_train.shape[1:], batch_size=batch_size))
model.add(layers.Conv2D(10, (5, 5), activation='relu', padding="same"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(10))
# Compile and train model
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.fit(
x_train[:10000], y_train[:10000], validation_data=(x_test, y_test),
batch_size=100, epochs=1,
)
# Save model
print("Saving model...")
current_dir = os.path.dirname(os.path.abspath(__file__))
model_dir = os.path.join(current_dir, "CNN_MNIST")
if not os.path.isdir(model_dir): os.makedirs(model_dir)
# model.save(model_dir)
tf.saved_model.save(model, model_dir)
# Convert to TRT format
trt_model_dir = os.path.join(current_dir, "CNN_MNIST_TRT")
converter = trt.TrtGraphConverterV2(input_saved_model_dir=model_dir)
converter.convert()
converter.save(trt_model_dir)
t1 = perf_counter()
print("Finished TRT conversion; time taken = {:.3f} s".format(t1 - t0))
# Make predictions using saved model, and print the results (NB using an alias
# for tf.saved_model.load, because the normal way of calling this function
# throws an error because for some reason it is expecting a sess)
saved_model_loaded = tf.compat.v1.saved_model.load_v2(
export_dir=trt_model_dir, tags=[tag_constants.SERVING])
graph_func = saved_model_loaded.signatures[
signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
graph_func = convert_to_constants.convert_variables_to_constants_v2(graph_func)
x_test_tensor = tf.convert_to_tensor(x_test, dtype=tf.float32)
preds = graph_func(x_test_tensor)[0].numpy()
print(preds.shape, y_test.shape)
accuracy = list(preds.argmax(axis=1) == y_test).count(True) / y_test.size
print("Accuracy of predictions = {:.2f} %".format(accuracy * 100))"""
Example script which trains a simple CNN for 1 epoch on a subset of MNIST, and
converts the model to TensorRT format, for enhanced performance which fully
utilises the NVIDIA GPU.
Useful resources:
- https://stackoverflow.com/questions/58846828/how-to-convert-tensorflow-2-0-savedmodel-to-tensorrt
- https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel
- https://www.tensorflow.org/api_docs/python/tf/experimental/tensorrt/Converter
- https://github.com/tensorflow/tensorflow/issues/34339
- https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py
Tested on the NVIDIA Jetson Nano, Python 3.6.9, tensorflow 2.1.0+nv20.4, numpy
1.16.1
"""
import os
from time import perf_counter
import numpy as np
t0 = perf_counter()
import tensorflow as tf
from tensorflow.keras import datasets
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.framework import convert_to_constants
tf.compat.v1.enable_eager_execution() # see github issue above
# Get training and test data
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1) / 255.0
x_test = np.expand_dims(x_test, -1) / 255.0
# TEMPORARY: just use 100 test points to minimise GPU memory
num_points = 100
x_test, y_test = x_test[:num_points], y_test[:num_points]
current_dir = os.path.dirname(os.path.abspath(__file__))
trt_model_dir = os.path.join(current_dir, "CNN_MNIST_TRT")
# Make predictions using saved model, and print the results (NB using an alias
# for tf.saved_model.load, because the normal way of calling this function
# throws an error because for some reason it is expecting a sess)
saved_model_loaded = tf.compat.v1.saved_model.load_v2(
export_dir=trt_model_dir, tags=[tag_constants.SERVING])
graph_func = saved_model_loaded.signatures[
signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
graph_func = convert_to_constants.convert_variables_to_constants_v2(graph_func)
x_test_tensor = tf.convert_to_tensor(x_test, dtype=tf.float32)
preds = graph_func(x_test_tensor)[0].numpy()
print(preds.shape, y_test.shape)
accuracy = list(preds.argmax(axis=1) == y_test).count(True) / y_test.size
print("Accuracy of predictions = {:.2f} %".format(accuracy * 100))
t1 = perf_counter()
print("Finished inference; time taken = {:.3f} s".format(t1 - t0))发布于 2020-07-02 12:15:06
我在Jetson Tx2上也遇到了同样的错误。我认为它来自GPU和CPU之间的共享内存,tensorflow不允许足够的内存或操作系统限制了分配。
要解决此问题,您可以允许内存增长:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
# Currently, memory growth needs to be the same across GPUs
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Memory growth must be set before GPUs have been initialized
print(e)或者,您可以强制tensorflow分配足够的内存:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
# Restrict TensorFlow to only allocate 1GB of memory on the first GPU
try:
tf.config.experimental.set_virtual_device_configuration(
gpus[0],
[tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Virtual devices must be set before GPUs have been initialized
print(e)发布于 2020-06-27 07:43:00
我在日志中看到它创建了600 Mb的GPU设备:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 638 MB memory)然后它尝试分配1 1Gb:
Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.00GiB (rounded to 1073742336).这一点也很清楚。该GPU设备的内存超过600Mb。在日志中可以看到:
2020-06-23 23:06:36.463934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s那么也许你的GPU正在运行其他的计算呢?
https://stackoverflow.com/questions/62600335
复制相似问题