
Batch inference is as slow as single-image inference in TensorFlow C++

Stack Overflow user
Asked on 2019-08-12 19:59:14
1 answer · 906 views · 0 followers · 0 votes

OS: Ubuntu 16.04

Version: TensorFlow C++ 2.0-beta1 (compiled with all optimization flags: AVX FMA SSE4.1 SSE4.2 XLA)

IDE: Eclipse

Using CUDA: no (CPU only for my predictions)

I measured the single-image inference time of the TensorFlow C++ API at about 0.02 s, which is too slow. I could hardly believe it, since I had compiled the TensorFlow C++ shared library with all the optimizations (AVX/AVX2/FMA/SSE4.1/SSE4.2). Still, I need to find a way to reduce the prediction cost. Someone told me that the time could be cut dramatically by using batch inference instead of single-image inference. Unfortunately, with a batch size of 32, batch inference takes 0.7 s. In other words, 0.7/32 ≈ 0.02 s, exactly as slow as single-image inference.

To improve the inference performance of the TensorFlow C++ API (that is, reduce the prediction time), I also tried MKL-DNN in TensorFlow, but it did not reduce the time cost.
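For reference, MKL-DNN throughput is sensitive to its OpenMP threading environment; a commonly suggested starting point (assuming an MKL-enabled build; the thread count 4 below is a placeholder for the number of physical cores on the machine) is:

```shell
# Pin MKL/OpenMP threading to the physical cores (4 is a placeholder value).
export OMP_NUM_THREADS=4
# Let idle OpenMP workers sleep immediately instead of spin-waiting.
export KMP_BLOCKTIME=0
# Bind threads to cores to avoid migration between sockets.
export KMP_AFFINITY=granularity=fine,compact,1,0
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_BLOCKTIME=$KMP_BLOCKTIME"
```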

At first I compiled the TensorFlow C++ shared library without the optimization flags (AVX/AVX2/SSE4.1/SSE4.2/FMA/XLA), so six optimization warnings were printed every time I ran a prediction. I expected the time cost to drop after rebuilding TensorFlow with those six flags. However, the inference time with the optimization flags is almost identical to the time without them.
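One knob the question does not mention is the session thread configuration; on CPU this often matters more than the instruction-set flags. A sketch of the relevant settings (these are the stock `SessionOptions`/`ConfigProto` fields; the values 4 and 2 are placeholders, and the fragment needs the TensorFlow C++ headers and libraries to build):

```cpp
// Sketch only: requires the TensorFlow C++ headers/libraries.
#include "tensorflow/core/public/session.h"

tensorflow::Session* MakeTunedSession() {
    tensorflow::SessionOptions options;
    // Threads available to a single op (e.g. one conv); placeholder value.
    options.config.set_intra_op_parallelism_threads(4);
    // Ops that may run concurrently; placeholder value.
    options.config.set_inter_op_parallelism_threads(2);
    tensorflow::Session* session = nullptr;
    TF_CHECK_OK(tensorflow::NewSession(options, &session));
    return session;
}
```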

The following code is the single-image inference.

Session* session;
  Status status = NewSession(SessionOptions(), &session);

  const std::string graph_fn = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model.meta";
  MetaGraphDef graphdef;
  Status status_load = ReadBinaryProto(Env::Default(), graph_fn, &graphdef); // read the graph definition from the .meta file
  if (!status_load.ok()) {
        std::cout << "ERROR: Loading model failed..." << graph_fn << std::endl;
        std::cout << status_load.ToString() << "\n";
        return -1;
  }

  Status status_create = session->Create(graphdef.graph_def()); // import the graph into the session
  if (!status_create.ok()) {
        std::cout << "ERROR: Creating graph in session failed..." << status_create.ToString() << std::endl;
        return -1;
  }
//    cout << "Session successfully created.Load model successfully!"<< endl;

  // load the pretrained model weights
  const std::string checkpointPath = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model";
  Tensor checkpointPathTensor(DT_STRING, TensorShape());
  checkpointPathTensor.scalar<std::string>()() = checkpointPath;
  status = session->Run(
          {{ graphdef.saver_def().filename_tensor_name(), checkpointPathTensor },},
          {},{graphdef.saver_def().restore_op_name()},nullptr);
  if (!status.ok())
  {
      throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
  }
//    cout << "Load weights successfully!"<< endl;


  //read image for prediction...
  char srcfile[200];
  double alltime=0.0;
  for(int numingroup=0;numingroup<1326;numingroup++)
  {
      sprintf(srcfile, "/media/root/Ubuntu311/projects/Ecology_projects/copy/cnn-imgs96224/%d.JPG",numingroup);
      cv::Mat srcimg=cv::imread(srcfile,0);
      if(!srcimg.data)
      {
          continue;
      }

      Tensor resized_tensor(DT_FLOAT, TensorShape({1,96,224,1}));
      float *imgdata = resized_tensor.flat<float>().data();
      cv::Mat cameraImg(96, 224, CV_32FC1, imgdata);
      srcimg.convertTo(cameraImg, CV_32FC1);
      // preprocess the image (scale pixel values to [0,1])
      cameraImg=cameraImg/255;
//        std::cout <<"Read image successfully: "<< resized_tensor.DebugString()<<endl;

       vector<std::pair<string, Tensor> > inputs;
       std::string Input1Name = "input";
       inputs.push_back(std::make_pair(Input1Name, resized_tensor));
       Tensor is_training_val(DT_BOOL,TensorShape());
       is_training_val.scalar<bool>()()=false;
       std::string Input2Name = "is_training";
       inputs.push_back(std::make_pair(Input2Name, is_training_val));

       vector<tensorflow::Tensor> outputs;
       string output="output";

       cv::TickMeter timer;
       timer.start();
       Status status_run = session->Run(inputs, {output}, {}, &outputs);
       if (!status_run.ok()) {
           std::cout << "ERROR: RUN failed..."  << std::endl;
           std::cout << status_run.ToString() << "\n";
           return -1;
       }

       timer.stop();
       cout<<"single image inference time is: "<<timer.getTimeSec()<<" s."<<endl;
       alltime+=(timer.getTimeSec());
       timer.reset();

      Tensor t = outputs[0];
      int ndim2 = t.shape().dims();
      auto tmap = t.tensor<float, 2>();  // Tensor Shape: [batch_size, target_class_num]
      int output_dim = t.shape().dim_size(1);
      std::vector<double> tout;

      // Argmax: Get Final Prediction Label and Probability
      int output_class_id = -1;
      double output_prob = 0.0;
      for (int j = 0; j < output_dim; j++)
      {
            std::cout << "Class " << j << " prob:" << tmap(0, j) << "," << std::endl;
            if (tmap(0, j) >= output_prob) {
                    output_class_id = j;
                    output_prob = tmap(0, j);
            }
      }
//        std::cout << "Final class id: " << output_class_id << std::endl;
//        std::cout << "Final class prob: " << output_prob << std::endl;
  }

  cout<<"all image have been predicted and time is: "<<alltime<<endl;

The following is its output.

root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# ./tensorflowtest 
2019-08-12 17:44:40.362149: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407969999 Hz
2019-08-12 17:44:40.362455: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2af1b90 executing computations on platform Host. Devices:
2019-08-12 17:44:40.362469: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
single image inference time is: 0.941759 s.
single image inference time is: 0.0218276 s.
single image inference time is: 0.0230476 s.
single image inference time is: 0.0221443 s.
single image inference time is: 0.0222238 s.
single image inference time is: 0.021393 s.
single image inference time is: 0.0223495 s.
single image inference time is: 0.0227179 s.
single image inference time is: 0.021407 s.
single image inference time is: 0.02372 s.
single image inference time is: 0.0220384 s.
single image inference time is: 0.0225262 s.
single image inference time is: 0.0217821 s.
single image inference time is: 0.0230875 s.
single image inference time is: 0.0228805 s.
single image inference time is: 0.0217929 s.
single image inference time is: 0.0220751 s.
single image inference time is: 0.0281811 s.
single image inference time is: 0.0257438 s.
single image inference time is: 0.0259228 s.
single image inference time is: 0.0264548 s.
single image inference time is: 0.0242932 s.
single image inference time is: 0.025251 s.
single image inference time is: 0.0258176 s.
single image inference time is: 0.025607 s.
single image inference time is: 0.0265529 s.
single image inference time is: 0.0252388 s.
single image inference time is: 0.0229052 s.
single image inference time is: 0.0234532 s.
single image inference time is: 0.0219921 s.
single image inference time is: 0.0222037 s.
single image inference time is: 0.0228582 s.
single image inference time is: 0.0231251 s.
single image inference time is: 0.0211131 s.
single image inference time is: 0.0234812 s.
single image inference time is: 0.0227733 s.
single image inference time is: 0.02183 s.
single image inference time is: 0.0215002 s.
single image inference time is: 0.0222 s.
single image inference time is: 0.022995 s.
single image inference time is: 0.0217708 s.
single image inference time is: 0.0226695 s.
single image inference time is: 0.0234447 s.
...
single image inference time is: 0.0226969 s.
single image inference time is: 0.0216993 s.
single image inference time is: 0.0220073 s.
single image inference time is: 0.0224785 s.
single image inference time is: 0.0219879 s.
single image inference time is: 0.0233075 s.
single image inference time is: 0.0229301 s.
single image inference time is: 0.0215029 s.
single image inference time is: 0.0230741 s.
single image inference time is: 0.0224437 s.
single image inference time is: 0.0220314 s.
single image inference time is: 0.0212338 s.
single image inference time is: 0.0226974 s.
all image have been predicted and time is: 31.1918
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug#   

The following code is the batch inference.

Session* session;
  Status status = NewSession(SessionOptions(), &session);

  const std::string graph_fn = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model.meta";
  MetaGraphDef graphdef;
  Status status_load = ReadBinaryProto(Env::Default(), graph_fn, &graphdef); // read the graph definition from the .meta file
  if (!status_load.ok()) {
        std::cout << "ERROR: Loading model failed..." << graph_fn << std::endl;
        std::cout << status_load.ToString() << "\n";
        return -1;
  }

  Status status_create = session->Create(graphdef.graph_def()); // import the graph into the session
  if (!status_create.ok()) {
        std::cout << "ERROR: Creating graph in session failed..." << status_create.ToString() << std::endl;
        return -1;
  }
//    cout << "Session successfully created.Load model successfully!"<< endl;

  // load the pretrained model weights
  const std::string checkpointPath = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model";
  Tensor checkpointPathTensor(DT_STRING, TensorShape());
  checkpointPathTensor.scalar<std::string>()() = checkpointPath;
  status = session->Run(
          {{ graphdef.saver_def().filename_tensor_name(), checkpointPathTensor },},
          {},{graphdef.saver_def().restore_op_name()},nullptr);
  if (!status.ok())
  {
      throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
  }
//    cout << "Load weights successfully!"<< endl;


  int cnnrows=96;
  int cnncols=224;
  //read image for prediction...
  char srcfile[200];
  const int imgnum=1326;

  const int batch=32;
  double alltime=0.0;
  //all image inference...
  for(int imgind=0;imgind<imgnum/batch;imgind++)
  {
      //a batch inference...
      tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({ batch, cnnrows, cnncols, 1 }));
      auto input_tensor_mapped = input_tensor.tensor<float, 4>();

      int batchind=0;
      int imgrealind=imgind*batch;
      for(;batchind!=batch;batchind++)
      {
          sprintf(srcfile, "/media/root/Ubuntu311/projects/Ecology_projects/copy/cnn-imgs96224/%d.JPG",imgrealind);
          cv::Mat srcimg=cv::imread(srcfile,0);
          if(!srcimg.data)
          {
              // skip the missing file but retry this batch slot with the next
              // index; a bare continue would leave the slot uninitialized and
              // re-read the same missing path for every remaining slot
              imgrealind++;
              batchind--;
              continue;
          }
          cv::Mat cameraImg(96, 224, CV_32FC1);
          srcimg.convertTo(cameraImg, CV_32FC1);
          cameraImg=cameraImg/255;

          //convert batch cv image to tensor
          for (int y = 0; y < cnnrows; ++y)
          {
              const float* source_row = (float*)cameraImg.data + (y * cnncols);
              for (int x = 0; x < cnncols; ++x)
              {
                    const float* source_pixel = source_row + x;
                    input_tensor_mapped(batchind, y, x, 0) = *source_pixel;
              }
          }
          imgrealind++;
      //a batch image transfer done...
      }

      vector<std::pair<string, Tensor> > inputs;
      std::string Input1Name = "input";
      inputs.push_back(std::make_pair(Input1Name, input_tensor));
      Tensor is_training_val(DT_BOOL,TensorShape());
      is_training_val.scalar<bool>()()=false;
      std::string Input2Name = "is_training";
      inputs.push_back(std::make_pair(Input2Name, is_training_val));

      vector<tensorflow::Tensor> outputs;
      string output="output";
      cv::TickMeter timer;
      timer.start();
      Status status_run = session->Run(inputs, {output}, {}, &outputs);
      if (!status_run.ok()) {
       std::cout << "ERROR: RUN failed..."  << std::endl;
       std::cout << status_run.ToString() << "\n";
       return -1;
      }

      timer.stop();
      cout<<"time of this batch inference is: "<<timer.getTimeSec()<<" s."<<endl;
      alltime+=(timer.getTimeSec());
      timer.reset();

      auto finalOutputTensor  = outputs[0].tensor<float, 2>();
      int output_dim = outputs[0].shape().dim_size(1);
      for(int b=0; b<batch;b++)
      {
          for(int i=0; i<output_dim; i++)
          {
//                cout << b << "the output for class "<<i<<" is "<< finalOutputTensor(b, i) <<endl;
          }
      }
  //all images inference done...
  }

  cout<<"all image have been predicted and time is: "<<alltime<<endl;

The following is its output:

root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# ./tensorflowtest 
2019-08-12 17:47:26.517909: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407969999 Hz
2019-08-12 17:47:26.518092: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1481b90 executing computations on platform Host. Devices:
2019-08-12 17:47:26.518106: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
time of this batch inference is: 1.73786 s.
time of this batch inference is: 0.735492 s.
time of this batch inference is: 0.735382 s.
time of this batch inference is: 0.714616 s.
time of this batch inference is: 0.753576 s.
time of this batch inference is: 0.734335 s.
time of this batch inference is: 0.738822 s.
time of this batch inference is: 0.727782 s.
time of this batch inference is: 0.726601 s.
time of this batch inference is: 0.724234 s.
time of this batch inference is: 0.737588 s.
time of this batch inference is: 0.743579 s.
time of this batch inference is: 0.737886 s.
time of this batch inference is: 0.729694 s.
time of this batch inference is: 0.72652 s.
time of this batch inference is: 0.724418 s.
time of this batch inference is: 0.728979 s.
time of this batch inference is: 0.720166 s.
time of this batch inference is: 0.727582 s.
time of this batch inference is: 0.732912 s.
time of this batch inference is: 0.734843 s.
time of this batch inference is: 0.732175 s.
time of this batch inference is: 0.724297 s.
time of this batch inference is: 0.724738 s.
time of this batch inference is: 0.736695 s.
time of this batch inference is: 0.736627 s.
time of this batch inference is: 0.726824 s.
time of this batch inference is: 0.731248 s.
time of this batch inference is: 0.72861 s.
time of this batch inference is: 0.752497 s.
time of this batch inference is: 0.737133 s.
time of this batch inference is: 0.742782 s.
time of this batch inference is: 0.730087 s.
time of this batch inference is: 0.732464 s.
time of this batch inference is: 0.737972 s.
time of this batch inference is: 0.738182 s.
time of this batch inference is: 0.738349 s.
time of this batch inference is: 0.72544 s.
time of this batch inference is: 0.741428 s.
time of this batch inference is: 0.733115 s.
time of this batch inference is: 0.743221 s.
all image have been predicted and time is: 31.0668
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# 

Any help would be greatly appreciated.


1 Answer

Stack Overflow user

Answered on 2020-06-29 21:05:17

For CPU inference, larger batches cannot help much, because the CPU computes serially.

Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57460782
