I built TensorFlow from source and I am using its C API. So far everything works well, and I am also using AVX / AVX2. My source build of TensorFlow was also built with XLA support. I would now like to activate XLA (Accelerated Linear Algebra) as well, since I hope it will further improve performance/speed during inference.
If I start my run now, I get this message:
2019-06-17 16:09:06.753737: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1541] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
On the official XLA page (https://www.tensorflow.org/xla/jit) I found the following information on how to turn on JIT at the session level:
# Config to turn on JIT compilation
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
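sess = tf.Session(config=config)
This issue (https://github.com/tensorflow/tensorflow/issues/13853) explains how to set the config from C with TF_SetConfig. Previously, I was able to restrict the session to one core by using the output of the following Python code: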
config1 = tf.ConfigProto(device_count={'CPU':1})
serialized1 = config1.SerializeToString()
print(list(map(hex, serialized1)))
I implemented it as follows:
uint8_t intra_op_parallelism_threads = maxCores; // for operations that can be parallelized internally, such as matrix multiplication
uint8_t inter_op_parallelism_threads = maxCores; // for operations that are independent in your TensorFlow graph because there is no directed path between them in the dataflow graph
uint8_t config[]={0x10,intra_op_parallelism_threads,0x28,inter_op_parallelism_threads};
TF_SetConfig(sess_opts,config,sizeof(config),status);
So I thought the same approach would help to activate XLA:
config= tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
output = config.SerializeToString()
print(list(map(hex, output)))
This time the implementation is:
uint8_t config[]={0x52,0x4,0x1a,0x2,0x28,0x1};
TF_SetConfig(sess_opts,config,sizeof(config),status);
However, XLA still seems to be deactivated. Can someone help me with this? Or, looking at the warning once more:
2019-06-17 16:09:06.753737: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1541] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
Does this mean I have to set XLA_FLAGS during the build?
Thanks in advance!
Posted on 2019-06-19 16:42:13
OK, I figured out how to use XLA JIT: it is only available through the c_api_experimental.h header. Just include this header and then use:
TF_EnableXLACompilation(sess_opts,true);
Posted on 2019-07-18 15:12:04
@tre95 I tried
#include "c_api_experimental.h"
TF_SessionOptions* options = TF_NewSessionOptions();
TF_EnableXLACompilation(options,true);
but the build fails with the linker error collect2: error: ld returned 1 exit status. However, if I leave this call out, it compiles and runs successfully.
https://stackoverflow.com/questions/56633372