OS: Ubuntu 16.04
Version: TensorFlow C++ 2.0-beta1 (compiled with all optimization flags: AVX, FMA, SSE4.1, SSE4.2, XLA)
IDE: Eclipse
CUDA: not used (prediction runs on the CPU only)
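I measured single-image inference with the TensorFlow C++ API at about 0.02 s per image. That seems far too slow to me, given that I compiled the TensorFlow C++ shared library with every optimization flag (AVX, AVX2, FMA, SSE4.1, SSE4.2). I still need a way to reduce the prediction time. Someone told me that batch inference would be much faster than single-image inference. Unfortunately, with a batch size of 32, one batch takes about 0.7 s. In other words, 0.7 / 32 ≈ 0.02 s per image, which is just as slow as single-image inference.
To improve the inference performance of the TensorFlow C++ API (that is, to reduce prediction time), I also tried TensorFlow with MKL-DNN, but it did not reduce the time.
Initially I compiled the TensorFlow C++ shared library without the optimization flags (AVX, AVX2, SSE4.1, SSE4.2, FMA, XLA), so every prediction run printed six optimization warnings. I expected that recompiling TensorFlow with those six flags would reduce the time. However, inference with the flags takes almost the same time as inference without them.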
The following code performs single-image inference.
// Fragment from the body of main(); assumes the TensorFlow C++ and OpenCV headers,
// plus `using namespace std;` and `using namespace tensorflow;`.
Session* session;
Status status = NewSession(SessionOptions(), &session);
const std::string graph_fn = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model.meta";
MetaGraphDef graphdef;
Status status_load = ReadBinaryProto(Env::Default(), graph_fn, &graphdef); // read the graph definition from the .meta file
if (!status_load.ok()) {
    std::cout << "ERROR: Loading model failed..." << graph_fn << std::endl;
    std::cout << status_load.ToString() << "\n";
    return -1;
}
Status status_create = session->Create(graphdef.graph_def()); // import the graph into the session
if (!status_create.ok()) {
    std::cout << "ERROR: Creating graph in session failed..." << status_create.ToString() << std::endl;
    return -1;
}
// cout << "Session successfully created.Load model successfully!"<< endl;
// load the weights of the pre-trained model
const std::string checkpointPath = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model";
Tensor checkpointPathTensor(DT_STRING, TensorShape());
checkpointPathTensor.scalar<std::string>()() = checkpointPath;
status = session->Run(
    {{ graphdef.saver_def().filename_tensor_name(), checkpointPathTensor },},
    {}, {graphdef.saver_def().restore_op_name()}, nullptr);
if (!status.ok())
{
    throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
}
// cout << "Load weights successfully!"<< endl;
// read image for prediction...
char srcfile[200];
double alltime = 0.0;
for (int numingroup = 0; numingroup < 1326; numingroup++)
{
    snprintf(srcfile, sizeof(srcfile), "/media/root/Ubuntu311/projects/Ecology_projects/copy/cnn-imgs96224/%d.JPG", numingroup);
    cv::Mat srcimg = cv::imread(srcfile, 0);
    if (!srcimg.data)
    {
        continue;
    }
    Tensor resized_tensor(DT_FLOAT, TensorShape({1, 96, 224, 1}));
    float *imgdata = resized_tensor.flat<float>().data();
    // cameraImg wraps the tensor's buffer, so convertTo writes straight into the tensor
    // (this relies on the source image already being 96x224)
    cv::Mat cameraImg(96, 224, CV_32FC1, imgdata);
    srcimg.convertTo(cameraImg, CV_32FC1);
    // preprocess the image: scale pixel values to [0, 1]
    cameraImg = cameraImg / 255;
    // std::cout <<"Read image successfully: "<< resized_tensor.DebugString()<<endl;
    vector<std::pair<string, Tensor> > inputs;
    std::string Input1Name = "input";
    inputs.push_back(std::make_pair(Input1Name, resized_tensor));
    Tensor is_training_val(DT_BOOL, TensorShape());
    is_training_val.scalar<bool>()() = false;
    std::string Input2Name = "is_training";
    inputs.push_back(std::make_pair(Input2Name, is_training_val));
    vector<tensorflow::Tensor> outputs;
    string output = "output";
    cv::TickMeter timer;
    timer.start();
    Status status_run = session->Run(inputs, {output}, {}, &outputs);
    if (!status_run.ok()) {
        std::cout << "ERROR: RUN failed..." << std::endl;
        std::cout << status_run.ToString() << "\n";
        return -1;
    }
    timer.stop();
    cout << "single image inference time is: " << timer.getTimeSec() << " s." << endl;
    alltime += (timer.getTimeSec());
    timer.reset();
    Tensor t = outputs[0];
    int ndim2 = t.shape().dims();
    auto tmap = t.tensor<float, 2>(); // Tensor Shape: [batch_size, target_class_num]
    int output_dim = t.shape().dim_size(1);
    std::vector<double> tout;
    // Argmax: Get Final Prediction Label and Probability
    int output_class_id = -1;
    double output_prob = 0.0;
    for (int j = 0; j < output_dim; j++)
    {
        std::cout << "Class " << j << " prob:" << tmap(0, j) << "," << std::endl;
        if (tmap(0, j) >= output_prob) {
            output_class_id = j;
            output_prob = tmap(0, j);
        }
    }
    // std::cout << "Final class id: " << output_class_id << std::endl;
    // std::cout << "Final class prob: " << output_prob << std::endl;
}
cout << "all image have been predicted and time is: " << alltime << endl;
The output is shown below.
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# ./tensorflowtest
2019-08-12 17:44:40.362149: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407969999 Hz
2019-08-12 17:44:40.362455: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2af1b90 executing computations on platform Host. Devices:
2019-08-12 17:44:40.362469: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
single image inference time is: 0.941759 s.
single image inference time is: 0.0218276 s.
single image inference time is: 0.0230476 s.
single image inference time is: 0.0221443 s.
single image inference time is: 0.0222238 s.
single image inference time is: 0.021393 s.
single image inference time is: 0.0223495 s.
single image inference time is: 0.0227179 s.
single image inference time is: 0.021407 s.
single image inference time is: 0.02372 s.
single image inference time is: 0.0220384 s.
single image inference time is: 0.0225262 s.
single image inference time is: 0.0217821 s.
single image inference time is: 0.0230875 s.
single image inference time is: 0.0228805 s.
single image inference time is: 0.0217929 s.
single image inference time is: 0.0220751 s.
single image inference time is: 0.0281811 s.
single image inference time is: 0.0257438 s.
single image inference time is: 0.0259228 s.
single image inference time is: 0.0264548 s.
single image inference time is: 0.0242932 s.
single image inference time is: 0.025251 s.
single image inference time is: 0.0258176 s.
single image inference time is: 0.025607 s.
single image inference time is: 0.0265529 s.
single image inference time is: 0.0252388 s.
single image inference time is: 0.0229052 s.
single image inference time is: 0.0234532 s.
single image inference time is: 0.0219921 s.
single image inference time is: 0.0222037 s.
single image inference time is: 0.0228582 s.
single image inference time is: 0.0231251 s.
single image inference time is: 0.0211131 s.
single image inference time is: 0.0234812 s.
single image inference time is: 0.0227733 s.
single image inference time is: 0.02183 s.
single image inference time is: 0.0215002 s.
single image inference time is: 0.0222 s.
single image inference time is: 0.022995 s.
single image inference time is: 0.0217708 s.
single image inference time is: 0.0226695 s.
single image inference time is: 0.0234447 s.
...
single image inference time is: 0.0226969 s.
single image inference time is: 0.0216993 s.
single image inference time is: 0.0220073 s.
single image inference time is: 0.0224785 s.
single image inference time is: 0.0219879 s.
single image inference time is: 0.0233075 s.
single image inference time is: 0.0229301 s.
single image inference time is: 0.0215029 s.
single image inference time is: 0.0230741 s.
single image inference time is: 0.0224437 s.
single image inference time is: 0.0220314 s.
single image inference time is: 0.0212338 s.
single image inference time is: 0.0226974 s.
all image have been predicted and time is: 31.1918
The following code performs batch inference.
// Fragment from the body of main(); assumes the TensorFlow C++ and OpenCV headers,
// plus `using namespace std;` and `using namespace tensorflow;`.
Session* session;
Status status = NewSession(SessionOptions(), &session);
const std::string graph_fn = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model.meta";
MetaGraphDef graphdef;
Status status_load = ReadBinaryProto(Env::Default(), graph_fn, &graphdef); // read the graph definition from the .meta file
if (!status_load.ok()) {
    std::cout << "ERROR: Loading model failed..." << graph_fn << std::endl;
    std::cout << status_load.ToString() << "\n";
    return -1;
}
Status status_create = session->Create(graphdef.graph_def()); // import the graph into the session
if (!status_create.ok()) {
    std::cout << "ERROR: Creating graph in session failed..." << status_create.ToString() << std::endl;
    return -1;
}
// cout << "Session successfully created.Load model successfully!"<< endl;
// load the weights of the pre-trained model
const std::string checkpointPath = "/media/root/Ubuntu311/projects/Ecology_projects/JPMVCNN_AlgaeAnalysisMathTestDemo/model-0723/model";
Tensor checkpointPathTensor(DT_STRING, TensorShape());
checkpointPathTensor.scalar<std::string>()() = checkpointPath;
status = session->Run(
    {{ graphdef.saver_def().filename_tensor_name(), checkpointPathTensor },},
    {}, {graphdef.saver_def().restore_op_name()}, nullptr);
if (!status.ok())
{
    throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
}
// cout << "Load weights successfully!"<< endl;
int cnnrows = 96;
int cnncols = 224;
// read image for prediction...
char srcfile[200];
const int imgnum = 1326;
const int batch = 32;
double alltime = 0.0;
// all image inference...
for (int imgind = 0; imgind < imgnum / batch; imgind++)
{
    // a batch inference...
    tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({ batch, cnnrows, cnncols, 1 }));
    auto input_tensor_mapped = input_tensor.tensor<float, 4>();
    int batchind = 0;
    int imgrealind = imgind * batch;
    for (; batchind != batch; batchind++)
    {
        snprintf(srcfile, sizeof(srcfile), "/media/root/Ubuntu311/projects/Ecology_projects/copy/cnn-imgs96224/%d.JPG", imgrealind);
        cv::Mat srcimg = cv::imread(srcfile, 0);
        if (!srcimg.data)
        {
            // advance to the next file; otherwise the same missing file is retried
            // for every remaining slot. Slot `batchind` is left unfilled.
            imgrealind++;
            continue;
        }
        cv::Mat cameraImg(96, 224, CV_32FC1);
        srcimg.convertTo(cameraImg, CV_32FC1);
        cameraImg = cameraImg / 255;
        // convert batch cv image to tensor
        for (int y = 0; y < cnnrows; ++y)
        {
            const float* source_row = (float*)cameraImg.data + (y * cnncols);
            for (int x = 0; x < cnncols; ++x)
            {
                const float* source_pixel = source_row + x;
                input_tensor_mapped(batchind, y, x, 0) = *source_pixel;
            }
        }
        imgrealind++;
        // a batch image transfer done...
    }
    vector<std::pair<string, Tensor> > inputs;
    std::string Input1Name = "input";
    inputs.push_back(std::make_pair(Input1Name, input_tensor));
    Tensor is_training_val(DT_BOOL, TensorShape());
    is_training_val.scalar<bool>()() = false;
    std::string Input2Name = "is_training";
    inputs.push_back(std::make_pair(Input2Name, is_training_val));
    vector<tensorflow::Tensor> outputs;
    string output = "output";
    cv::TickMeter timer;
    timer.start();
    Status status_run = session->Run(inputs, {output}, {}, &outputs);
    if (!status_run.ok()) {
        std::cout << "ERROR: RUN failed..." << std::endl;
        std::cout << status_run.ToString() << "\n";
        return -1;
    }
    timer.stop();
    cout << "time of this batch inference is: " << timer.getTimeSec() << " s." << endl;
    alltime += (timer.getTimeSec());
    timer.reset();
    auto finalOutputTensor = outputs[0].tensor<float, 2>();
    int output_dim = outputs[0].shape().dim_size(1);
    for (int b = 0; b < batch; b++)
    {
        for (int i = 0; i < output_dim; i++)
        {
            // cout << b << "the output for class "<<i<<" is "<< finalOutputTensor(b, i) <<endl;
        }
    }
    // all images inference done...
}
cout << "all image have been predicted and time is: " << alltime << endl;
The output is shown below:
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/tensorflowtest/Debug# ./tensorflowtest
2019-08-12 17:47:26.517909: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407969999 Hz
2019-08-12 17:47:26.518092: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1481b90 executing computations on platform Host. Devices:
2019-08-12 17:47:26.518106: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
time of this batch inference is: 1.73786 s.
time of this batch inference is: 0.735492 s.
time of this batch inference is: 0.735382 s.
time of this batch inference is: 0.714616 s.
time of this batch inference is: 0.753576 s.
time of this batch inference is: 0.734335 s.
time of this batch inference is: 0.738822 s.
time of this batch inference is: 0.727782 s.
time of this batch inference is: 0.726601 s.
time of this batch inference is: 0.724234 s.
time of this batch inference is: 0.737588 s.
time of this batch inference is: 0.743579 s.
time of this batch inference is: 0.737886 s.
time of this batch inference is: 0.729694 s.
time of this batch inference is: 0.72652 s.
time of this batch inference is: 0.724418 s.
time of this batch inference is: 0.728979 s.
time of this batch inference is: 0.720166 s.
time of this batch inference is: 0.727582 s.
time of this batch inference is: 0.732912 s.
time of this batch inference is: 0.734843 s.
time of this batch inference is: 0.732175 s.
time of this batch inference is: 0.724297 s.
time of this batch inference is: 0.724738 s.
time of this batch inference is: 0.736695 s.
time of this batch inference is: 0.736627 s.
time of this batch inference is: 0.726824 s.
time of this batch inference is: 0.731248 s.
time of this batch inference is: 0.72861 s.
time of this batch inference is: 0.752497 s.
time of this batch inference is: 0.737133 s.
time of this batch inference is: 0.742782 s.
time of this batch inference is: 0.730087 s.
time of this batch inference is: 0.732464 s.
time of this batch inference is: 0.737972 s.
time of this batch inference is: 0.738182 s.
time of this batch inference is: 0.738349 s.
time of this batch inference is: 0.72544 s.
time of this batch inference is: 0.741428 s.
time of this batch inference is: 0.733115 s.
time of this batch inference is: 0.743221 s.
all image have been predicted and time is: 31.0668
Any help would be greatly appreciated.
Posted on 2020-06-29 21:05:17
For CPU inference, a larger batch does not help, because the CPU computes serially.
https://stackoverflow.com/questions/57460782