
Batch inference is as slow as single-image inference in TensorFlow C++

Stack Overflow user
Asked on 2019-08-12 19:59:14
1 answer · 906 views · 0 followers · 0 votes

OS: Ubuntu 16.04

Version: TensorFlow C++ 2.0-beta1 (compiled with all optimization flags: AVX FMA SSE4.1 SSE4.2 XLA)

IDE: Eclipse

Using CUDA: no (CPU only for my predictions)

I measured the single-image inference time of the TensorFlow C++ API at about 0.02 s, which is too slow. I could hardly believe it, since I had compiled the TensorFlow C++ shared library with all the optimizations (AVX/AVX2/FMA/SSE4.1/SSE4.2). Still, I need to find a way to reduce the prediction cost. Someone told me that the time could be cut dramatically by using batch inference instead of single-image inference. Unfortunately, with a batch size of 32, batch inference takes 0.7 s. In other words, 0.7/32 ≈ 0.02 s, exactly as slow as single-image inference.

To improve the inference performance of the TensorFlow C++ API (that is, reduce the prediction time), I also tried MKL-DNN in TensorFlow, but it did not reduce the time cost.
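For reference, MKL-DNN throughput is sensitive to its OpenMP threading environment; a commonly suggested starting point (assuming an MKL-enabled build; the thread count 4 below is a placeholder for the number of physical cores on the machine) is:

```shell
# Pin MKL/OpenMP threading to the physical cores (4 is a placeholder value).
export OMP_NUM_THREADS=4
# Let idle OpenMP workers sleep immediately instead of spin-waiting.
export KMP_BLOCKTIME=0
# Bind threads to cores to avoid migration between sockets.
export KMP_AFFINITY=granularity=fine,compact,1,0
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_BLOCKTIME=$KMP_BLOCKTIME"
```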

At first I compiled the TensorFlow C++ shared library without the optimization flags (AVX/AVX2/SSE4.1/SSE4.2/FMA/XLA), so six optimization warnings were printed every time I ran a prediction. I expected the time cost to drop after rebuilding TensorFlow with those six flags. However, the inference time with the optimization flags is almost identical to the time without them.
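One knob the question does not mention is the session thread configuration; on CPU this often matters more than the instruction-set flags. A sketch of the relevant settings (these are the stock `SessionOptions`/`ConfigProto` fields; the values 4 and 2 are placeholders, and the fragment needs the TensorFlow C++ headers and libraries to build):

```cpp
// Sketch only: requires the TensorFlow C++ headers/libraries.
#include "tensorflow/core/public/session.h"

tensorflow::Session* MakeTunedSession() {
    tensorflow::SessionOptions options;
    // Threads available to a single op (e.g. one conv); placeholder value.
    options.config.set_intra_op_parallelism_threads(4);
    // Ops that may run concurrently; placeholder value.
    options.config.set_inter_op_parallelism_threads(2);
    tensorflow::Session* session = nullptr;
    TF_CHECK_OK(tensorflow::NewSession(options, &session));
    return session;
}
```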

The following code is the single-image inference.

Session* session;
  Status status = NewSession(SessionOptions(), &session);

  const std::string graph_fn = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model.meta";
  MetaGraphDef graphdef;
  Status status_load = ReadBinaryProto(Env::Default(), graph_fn, &graphdef); // read the graph definition from the .meta file
  if (!status_load.ok()) {
        std::cout << "ERROR: Loading model failed..." << graph_fn << std::endl;
        std::cout << status_load.ToString() << "\n";
        return -1;
  }

  Status status_create = session->Create(graphdef.graph_def()); // import the graph into the session
  if (!status_create.ok()) {
        std::cout << "ERROR: Creating graph in session failed..." << status_create.ToString() << std::endl;
        return -1;
  }
//    cout << "Session successfully created.Load model successfully!"<< endl;

  // load the pretrained model weights
  const std::string checkpointPath = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model";
  Tensor checkpointPathTensor(DT_STRING, TensorShape());
  checkpointPathTensor.scalar<std::string>()() = checkpointPath;
  status = session->Run(
          {{ graphdef.saver_def().filename_tensor_name(), checkpointPathTensor },},
          {},{graphdef.saver_def().restore_op_name()},nullptr);
  if (!status.ok())
  {
      throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
  }
//    cout << "Load weights successfully!"<< endl;


  //read image for prediction...
  char srcfile[200];
  double alltime=0.0;
  for(int numingroup=0;numingroup<1326;numingroup++)
  {
      sprintf(srcfile, "/media/root/Ubuntu311/projects/Ecology_projects/copy/cnn-imgs96224/%d.JPG",numingroup);
      cv::Mat srcimg=cv::imread(srcfile,0);
      if(!srcimg.data)
      {
          continue;
      }

      Tensor resized_tensor(DT_FLOAT, TensorShape({1,96,224,1}));
      float *imgdata = resized_tensor.flat<float>().data();
      cv::Mat cameraImg(96, 224, CV_32FC1, imgdata);
      srcimg.convertTo(cameraImg, CV_32FC1);
      // preprocess the image (scale pixel values to [0,1])
      cameraImg=cameraImg/255;
//        std::cout <<"Read image successfully: "<< resized_tensor.DebugString()<<endl;

       vector<std::pair<string, Tensor> > inputs;
       std::string Input1Name = "input";
       inputs.push_back(std::make_pair(Input1Name, resized_tensor));
       Tensor is_training_val(DT_BOOL,TensorShape());
       is_training_val.scalar<bool>()()=false;
       std::string Input2Name = "is_training";
       inputs.push_back(std::make_pair(Input2Name, is_training_val));

       vector<tensorflow::Tensor> outputs;
       string output="output";

       cv::TickMeter timer;
       timer.start();
       Status status_run = session->Run(inputs, {output}, {}, &outputs);
       if (!status_run.ok()) {
           std::cout << "ERROR: RUN failed..."  << std::endl;
           std::cout << status_run.ToString() << "\n";
           return -1;
       }

       timer.stop();
       cout<<"single image inference time is: "<<timer.getTimeSec()<<" s."<<endl;
       alltime+=(timer.getTimeSec());
       timer.reset();

      Tensor t = outputs[0];
      int ndim2 = t.shape().dims();
      auto tmap = t.tensor<float, 2>();  // Tensor Shape: [batch_size, target_class_num]
      int output_dim = t.shape().dim_size(1);
      std::vector<double> tout;

      // Argmax: Get Final Prediction Label and Probability
      int output_class_id = -1;
      double output_prob = 0.0;
      for (int j = 0; j < output_dim; j++)
      {
            std::cout << "Class " << j << " prob:" << tmap(0, j) << "," << std::endl;
            if (tmap(0, j) >= output_prob) {
                    output_class_id = j;
                    output_prob = tmap(0, j);
            }
      }
//        std::cout << "Final class id: " << output_class_id << std::endl;
//        std::cout << "Final class prob: " << output_prob << std::endl;
  }

  cout<<"all image have been predicted and time is: "<<alltime<<endl;

The following is its output.

root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# ./tensorflowtest 
2019-08-12 17:44:40.362149: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407969999 Hz
2019-08-12 17:44:40.362455: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2af1b90 executing computations on platform Host. Devices:
2019-08-12 17:44:40.362469: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
single image inference time is: 0.941759 s.
single image inference time is: 0.0218276 s.
single image inference time is: 0.0230476 s.
single image inference time is: 0.0221443 s.
single image inference time is: 0.0222238 s.
single image inference time is: 0.021393 s.
single image inference time is: 0.0223495 s.
single image inference time is: 0.0227179 s.
single image inference time is: 0.021407 s.
single image inference time is: 0.02372 s.
single image inference time is: 0.0220384 s.
single image inference time is: 0.0225262 s.
single image inference time is: 0.0217821 s.
single image inference time is: 0.0230875 s.
single image inference time is: 0.0228805 s.
single image inference time is: 0.0217929 s.
single image inference time is: 0.0220751 s.
single image inference time is: 0.0281811 s.
single image inference time is: 0.0257438 s.
single image inference time is: 0.0259228 s.
single image inference time is: 0.0264548 s.
single image inference time is: 0.0242932 s.
single image inference time is: 0.025251 s.
single image inference time is: 0.0258176 s.
single image inference time is: 0.025607 s.
single image inference time is: 0.0265529 s.
single image inference time is: 0.0252388 s.
single image inference time is: 0.0229052 s.
single image inference time is: 0.0234532 s.
single image inference time is: 0.0219921 s.
single image inference time is: 0.0222037 s.
single image inference time is: 0.0228582 s.
single image inference time is: 0.0231251 s.
single image inference time is: 0.0211131 s.
single image inference time is: 0.0234812 s.
single image inference time is: 0.0227733 s.
single image inference time is: 0.02183 s.
single image inference time is: 0.0215002 s.
single image inference time is: 0.0222 s.
single image inference time is: 0.022995 s.
single image inference time is: 0.0217708 s.
single image inference time is: 0.0226695 s.
single image inference time is: 0.0234447 s.
...
single image inference time is: 0.0226969 s.
single image inference time is: 0.0216993 s.
single image inference time is: 0.0220073 s.
single image inference time is: 0.0224785 s.
single image inference time is: 0.0219879 s.
single image inference time is: 0.0233075 s.
single image inference time is: 0.0229301 s.
single image inference time is: 0.0215029 s.
single image inference time is: 0.0230741 s.
single image inference time is: 0.0224437 s.
single image inference time is: 0.0220314 s.
single image inference time is: 0.0212338 s.
single image inference time is: 0.0226974 s.
all image have been predicted and time is: 31.1918
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug#   

The following code is the batch inference.

Session* session;
  Status status = NewSession(SessionOptions(), &session);

  const std::string graph_fn = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model.meta";
  MetaGraphDef graphdef;
  Status status_load = ReadBinaryProto(Env::Default(), graph_fn, &graphdef); // read the graph definition from the .meta file
  if (!status_load.ok()) {
        std::cout << "ERROR: Loading model failed..." << graph_fn << std::endl;
        std::cout << status_load.ToString() << "\n";
        return -1;
  }

  Status status_create = session->Create(graphdef.graph_def()); // import the graph into the session
  if (!status_create.ok()) {
        std::cout << "ERROR: Creating graph in session failed..." << status_create.ToString() << std::endl;
        return -1;
  }
//    cout << "Session successfully created.Load model successfully!"<< endl;

  // load the pretrained model weights
  const std::string checkpointPath = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model";
  Tensor checkpointPathTensor(DT_STRING, TensorShape());
  checkpointPathTensor.scalar<std::string>()() = checkpointPath;
  status = session->Run(
          {{ graphdef.saver_def().filename_tensor_name(), checkpointPathTensor },},
          {},{graphdef.saver_def().restore_op_name()},nullptr);
  if (!status.ok())
  {
      throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
  }
//    cout << "Load weights successfully!"<< endl;


  int cnnrows=96;
  int cnncols=224;
  //read image for prediction...
  char srcfile[200];
  const int imgnum=1326;

  const int batch=32;
  double alltime=0.0;
  //all image inference...
  for(int imgind=0;imgind<imgnum/batch;imgind++)
  {
      //a batch inference...
      tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({ batch, cnnrows, cnncols, 1 }));
      auto input_tensor_mapped = input_tensor.tensor<float, 4>();

      int batchind=0;
      int imgrealind=imgind*batch;
      for(;batchind!=batch;batchind++)
      {
          sprintf(srcfile, "/media/root/Ubuntu311/projects/Ecology_projects/copy/cnn-imgs96224/%d.JPG",imgrealind);
          cv::Mat srcimg=cv::imread(srcfile,0);
          if(!srcimg.data)
          {
              // skip the missing file but retry this batch slot with the next
              // index; a bare continue would leave the slot uninitialized and
              // re-read the same missing path for every remaining slot
              imgrealind++;
              batchind--;
              continue;
          }
          cv::Mat cameraImg(96, 224, CV_32FC1);
          srcimg.convertTo(cameraImg, CV_32FC1);
          cameraImg=cameraImg/255;

          //convert batch cv image to tensor
          for (int y = 0; y < cnnrows; ++y)
          {
              const float* source_row = (float*)cameraImg.data + (y * cnncols);
              for (int x = 0; x < cnncols; ++x)
              {
                    const float* source_pixel = source_row + x;
                    input_tensor_mapped(batchind, y, x, 0) = *source_pixel;
              }
          }
          imgrealind++;
      //a batch image transfer done...
      }

      vector<std::pair<string, Tensor> > inputs;
      std::string Input1Name = "input";
      inputs.push_back(std::make_pair(Input1Name, input_tensor));
      Tensor is_training_val(DT_BOOL,TensorShape());
      is_training_val.scalar<bool>()()=false;
      std::string Input2Name = "is_training";
      inputs.push_back(std::make_pair(Input2Name, is_training_val));

      vector<tensorflow::Tensor> outputs;
      string output="output";
      cv::TickMeter timer;
      timer.start();
      Status status_run = session->Run(inputs, {output}, {}, &outputs);
      if (!status_run.ok()) {
       std::cout << "ERROR: RUN failed..."  << std::endl;
       std::cout << status_run.ToString() << "\n";
       return -1;
      }

      timer.stop();
      cout<<"time of this batch inference is: "<<timer.getTimeSec()<<" s."<<endl;
      alltime+=(timer.getTimeSec());
      timer.reset();

      auto finalOutputTensor  = outputs[0].tensor<float, 2>();
      int output_dim = outputs[0].shape().dim_size(1);
      for(int b=0; b<batch;b++)
      {
          for(int i=0; i<output_dim; i++)
          {
//                cout << b << "the output for class "<<i<<" is "<< finalOutputTensor(b, i) <<endl;
          }
      }
  //all images inference done...
  }

  cout<<"all image have been predicted and time is: "<<alltime<<endl;

The following is its output:

root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# ./tensorflowtest 
2019-08-12 17:47:26.517909: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407969999 Hz
2019-08-12 17:47:26.518092: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1481b90 executing computations on platform Host. Devices:
2019-08-12 17:47:26.518106: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
time of this batch inference is: 1.73786 s.
time of this batch inference is: 0.735492 s.
time of this batch inference is: 0.735382 s.
time of this batch inference is: 0.714616 s.
time of this batch inference is: 0.753576 s.
time of this batch inference is: 0.734335 s.
time of this batch inference is: 0.738822 s.
time of this batch inference is: 0.727782 s.
time of this batch inference is: 0.726601 s.
time of this batch inference is: 0.724234 s.
time of this batch inference is: 0.737588 s.
time of this batch inference is: 0.743579 s.
time of this batch inference is: 0.737886 s.
time of this batch inference is: 0.729694 s.
time of this batch inference is: 0.72652 s.
time of this batch inference is: 0.724418 s.
time of this batch inference is: 0.728979 s.
time of this batch inference is: 0.720166 s.
time of this batch inference is: 0.727582 s.
time of this batch inference is: 0.732912 s.
time of this batch inference is: 0.734843 s.
time of this batch inference is: 0.732175 s.
time of this batch inference is: 0.724297 s.
time of this batch inference is: 0.724738 s.
time of this batch inference is: 0.736695 s.
time of this batch inference is: 0.736627 s.
time of this batch inference is: 0.726824 s.
time of this batch inference is: 0.731248 s.
time of this batch inference is: 0.72861 s.
time of this batch inference is: 0.752497 s.
time of this batch inference is: 0.737133 s.
time of this batch inference is: 0.742782 s.
time of this batch inference is: 0.730087 s.
time of this batch inference is: 0.732464 s.
time of this batch inference is: 0.737972 s.
time of this batch inference is: 0.738182 s.
time of this batch inference is: 0.738349 s.
time of this batch inference is: 0.72544 s.
time of this batch inference is: 0.741428 s.
time of this batch inference is: 0.733115 s.
time of this batch inference is: 0.743221 s.
all image have been predicted and time is: 31.0668
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# 

Any help would be greatly appreciated.


1 Answer

Stack Overflow user

Answered on 2020-06-29 21:05:17

For CPU inference, larger batches cannot help much, because the CPU computes serially.

Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57460782
