
TensorFlow object detection inference is slow on CPU

Stack Overflow user

Asked on 2017-09-19 13:30:33

Answers: 1 · Views: 4.4K · Followers: 0 · Votes: 4

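System information

What is the top-level directory of the model you are using: object_detection/ssd_inception_v2
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version: 1.2.1
Bazel version (if compiling from source): no
CUDA/cuDNN version: CUDA 8.0
GPU model and memory: Quadro M6000 24GB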

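After training an ssd_inception_v2 model on my custom dataset, I want to use it for inference. Since inference will later have to run on a device without a GPU, I run inference on the CPU only. I adapted object_detection_tutorial.ipynb to measure the inference time and ran the code below on a sequence of video frames: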
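The question does not show how TensorFlow was restricted to the CPU. A minimal sketch of one common approach, assuming a TensorFlow 1.x setup like the one above, is to hide the GPU from CUDA before TensorFlow initializes it; the commented ConfigProto variant is an alternative, not something the question is known to use.

```python
import os

# Hide every CUDA device from this process so TensorFlow falls back to the CPU.
# Set this before TensorFlow initializes CUDA; the safest place is before
# "import tensorflow".
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Alternative in TensorFlow 1.x (not executed here): request zero GPUs per session.
#   config = tf.ConfigProto(device_count={"GPU": 0})
#   sess = tf.Session(graph=detection_graph, config=config)
```

Either way, the timings then reflect CPU-only execution, which is what the target device will run.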

import datetime

import cv2
import numpy as np
import tensorflow as tf
from object_detection.utils import visualization_utils as vis_util

# detection_graph, category_index and vidcap (a cv2.VideoCapture) are set up
# beforehand, as in object_detection_tutorial.ipynb.
with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Look up the input and output tensors once, outside the frame loop.
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    boxes_t = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for the corresponding object.
    # The score is shown on the result image, together with the class label.
    scores_t = detection_graph.get_tensor_by_name('detection_scores:0')
    classes_t = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections_t = detection_graph.get_tensor_by_name('num_detections:0')
    count = 0
    while True:
      # Read the next frame and stop when the video has no more frames.
      success, image = vidcap.read()
      if not success:
        break
      # Resize the frame, then crop it to 680 x 400 (columns 11..690).
      image = cv2.resize(image, (711, 400))
      image = image[:, 11:691]
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image, axis=0)
      before = datetime.datetime.now()
      # Actual detection (only this call is timed).
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes_t, scores_t, classes_t, num_detections_t],
          feed_dict={image_tensor: image_np_expanded})
      print("This took : " + str(datetime.datetime.now() - before))
      vis_util.visualize_boxes_and_labels_on_image_array(
          image,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      # cv2.imwrite("converted/frame%d.jpg" % count, image)  # save frame as JPEG file
      count += 1

This produces the following output:

This took : 0:00:04.289925
This took : 0:00:00.909071
This took : 0:00:00.917636
This took : 0:00:00.908391
This took : 0:00:00.896601
This took : 0:00:00.908698
This took : 0:00:00.890018
This took : 0:00:00.896373
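The first call is much slower than the rest, most likely because it also pays one-time setup costs, so it should be left out when estimating throughput. A minimal timing helper along those lines is sketched below. The name benchmark is hypothetical, infer is any callable (for example a wrapper around the sess.run(...) call above), and time.perf_counter() is used because it is a monotonic clock meant for measuring intervals, unlike datetime.now().

```python
import time
from statistics import mean


def benchmark(infer, inputs, warmup=1):
    """Call infer on each input and time it.

    Returns (warmup_times, steady_times) in seconds; the first `warmup`
    calls are reported separately because they include one-time setup.
    """
    warmup_times, steady_times = [], []
    for i, x in enumerate(inputs):
        start = time.perf_counter()
        infer(x)
        elapsed = time.perf_counter() - start
        (warmup_times if i < warmup else steady_times).append(elapsed)
    return warmup_times, steady_times


# Usage with a dummy workload standing in for the detector.
warm, steady = benchmark(lambda n: sum(range(n)), [100_000] * 5)
print(f"warm-up: {warm[0] * 1000:.2f} ms, steady: {mean(steady) * 1000:.2f} ms per call")
```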

Of course, 900 ms per image is not fast enough for video processing. After reading many threads, I found two possible ways to improve this:

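Graph Transform Tool: to get a faster frozen inference graph. (I am hesitant about this because, as far as I know, I would have to build TF from source, and I am generally happy with my current installation.)
Replacing the feeding: feed_dict={image_tensor: image_np_expanded} does not seem to be a good way to feed data into a TF graph. QueueRunner objects could help here.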
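The idea behind replacing feed_dict with queues is to keep the model busy by preparing the next frame while the current one is being processed. QueueRunner does this inside the TensorFlow graph; the sketch below shows the same producer/consumer idea with a plain Python thread and queue.Queue instead, so it is not QueueRunner itself. preprocess, run_pipeline and infer are hypothetical stand-ins for the resize/crop code and the sess.run(...) call. Note that this overlap can only hide the time spent outside sess.run; since the timed sess.run call alone takes about 900 ms per frame, pipelining by itself cannot reach 10-20 fps.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the frame stream


def preprocess(frame):
    # Hypothetical stand-in for cv2.resize, cropping and np.expand_dims.
    return frame


def _producer(frames, q):
    # Runs on a background thread: prepare frames ahead of the consumer.
    for frame in frames:
        q.put(preprocess(frame))  # blocks while the queue is full
    q.put(_END)


def run_pipeline(frames, infer, maxsize=4):
    """Overlap frame preparation (background thread) with inference (this thread)."""
    q = queue.Queue(maxsize=maxsize)
    worker = threading.Thread(target=_producer, args=(frames, q), daemon=True)
    worker.start()
    results = []
    while True:
        item = q.get()
        if item is _END:
            break
        results.append(infer(item))
    worker.join()
    return results


print(run_pipeline(range(5), lambda x: x * 2))  # [0, 2, 4, 6, 8]
```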

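So my question is: could either of these two improvements speed up inference enough for real-time use (10-20 fps), or am I on the wrong track here and should try something else? Any suggestions are welcome.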
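To put the 10-20 fps target in numbers, the sketch below takes the timings printed above, drops the first (warm-up) run, and compares the steady-state latency with the per-frame budget that 10 and 20 fps allow. The helper name to_seconds is illustrative.

```python
from datetime import timedelta

# Per-frame times printed above (H:MM:SS.ffffff).
measured = [
    "0:00:04.289925", "0:00:00.909071", "0:00:00.917636", "0:00:00.908391",
    "0:00:00.896601", "0:00:00.908698", "0:00:00.890018", "0:00:00.896373",
]


def to_seconds(s):
    h, m, sec = s.split(":")
    return timedelta(hours=int(h), minutes=int(m), seconds=float(sec)).total_seconds()


steady = [to_seconds(s) for s in measured[1:]]  # skip the warm-up run
latency = sum(steady) / len(steady)             # about 0.904 s per frame
fps = 1.0 / latency                             # about 1.1 fps

for target_fps in (10, 20):
    budget = 1.0 / target_fps                   # seconds available per frame
    print(f"{target_fps} fps: {budget * 1000:.0f} ms/frame, "
          f"needs about {latency / budget:.0f}x speedup")
```

This prints a budget of 100 ms/frame (about 9x speedup needed) for 10 fps and 50 ms/frame (about 18x) for 20 fps. Because the timer in the question wraps only sess.run, almost all of that time is spent running the model, so changes that only affect how data is fed are unlikely to close that gap on their own.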

Tags: performance, tensorflow, cpu, object-detection

1 Answer

Stack Overflow user

Answered on 2022-05-26 16:13:28

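Another option is to use a different toolkit for inference, for example OpenVINO. OpenVINO is designed for Intel hardware, although it should work with any CPU. It optimizes the model by converting it to Intermediate Representation (IR), pruning the graph, and fusing some operations into others while preserving accuracy. Then, at runtime, it uses vectorization.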
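As an illustration of what fusing operations means, here is one common kind of fusion: folding an inference-time batch-normalization step into the preceding linear (or convolution) layer's weights and bias, so that one operation runs instead of two. This is a generic plain-Python sketch of the technique with made-up numbers, not OpenVINO's implementation; which fusions Model Optimizer actually applies depends on the model.

```python
import math


def linear(W, b, x):
    # y[o] = sum_i W[o][i] * x[i] + b[o]
    return [sum(w * xi for w, xi in zip(row, x)) + bo for row, bo in zip(W, b)]


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    # Inference-time batch norm with fixed (learned) statistics, per channel.
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]


def fold_batch_norm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W_f, b_f) so that linear(W_f, b_f, x) == batch_norm(linear(W, b, x))."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_f = [[s * w for w in row] for s, row in zip(scale, W)]
    b_f = [s * (bo - m) + bt for s, bo, m, bt in zip(scale, b, mean, beta)]
    return W_f, b_f


W, b = [[1.0, 2.0], [-0.5, 3.0]], [0.1, -0.2]
bn = dict(gamma=[1.5, 0.8], beta=[0.0, 0.3], mean=[0.2, -0.1], var=[0.9, 1.1])
x = [0.7, -1.2]

two_ops = batch_norm(linear(W, b, x), **bn)   # original: two operations
W_f, b_f = fold_batch_norm(W, b, **bn)
one_op = linear(W_f, b_f, x)                  # fused: one operation
print(all(math.isclose(p, q) for p, q in zip(two_ops, one_op)))  # True
```

The fold is done once, ahead of time, so at inference the batch-norm arithmetic disappears entirely.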

Converting a TensorFlow model to OpenVINO is quite simple unless you have fancy custom layers. There is a full tutorial on how to do it; some snippets are below.

Install OpenVINO

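The easiest way is to use PIP. Alternatively, OpenVINO provides a selector tool to help you find the best installation method for your case.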

pip install openvino-dev[tensorflow2]

Convert the SavedModel with Model Optimizer

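Model Optimizer is a command-line tool from the OpenVINO Development Package. It converts a TensorFlow model to IR, which is OpenVINO's default format. You can also try FP16 precision, which can give you better performance without a significant accuracy drop (just change data_type). The input shape below is only an example; set it to your model's actual input shape. Run in the command line: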

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
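To see why FP16 usually costs little accuracy, the sketch below rounds a few example weight values to IEEE 754 half precision with Python's struct module (format "e") and measures the relative rounding error. With 11 bits of precision, round-to-nearest keeps that error within 2**-11 (about 0.05%) for values in FP16's normal range, while each value takes 2 bytes instead of 4. The weights are made-up examples; the effect on your model's detection accuracy, and whether FP16 is actually faster, still depends on the model and the device, so it is worth checking on validation data.

```python
import struct

weights = [0.1, -1.0 / 3.0, 3.14159, 0.0123, -42.5]  # made-up example values


def to_fp16(x):
    # Round a Python float to the nearest IEEE 754 half-precision value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


rel_errors = [abs(to_fp16(w) - w) / abs(w) for w in weights]
print([f"{e:.1e}" for e in rel_errors])
print("max relative error:", max(rel_errors), "<= 2**-11:", max(rel_errors) <= 2**-11)
print("bytes per value: FP32 =", struct.calcsize("<f"), ", FP16 =", struct.calcsize("<e"))
```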

Run inference

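The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the GPU integrated into your CPU, such as Intel HD Graphics). If you don't know which option is best for you, just use AUTO.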

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
# (input_image: a NumPy array matching the model's input shape)
result = compiled_model_ir([input_image])[output_layer_ir]

Disclaimer: I work on OpenVINO.

Votes: 1

Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/46301822
