How to enable automatic mixed precision for evaluation in the GNMT TF code using the Estimator API

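I am trying to run the GNMT TF code on a bare-metal system, with the CUDA stack and tensorflow-gpu v1.15 already set up. There were some API changes in TensorFlow between 1.14 and 1.15, but after resolving those I was able to run the code for both training and evaluation.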

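However, after comparing against the logs from the NGC container, I found that this bare-metal run does not use AMP. I went through NVIDIA's documentation and found how to enable it for training here.
I added the following line before here: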

opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

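However, I do not see automatic mixed precision being used for evaluation, since the optimizer is only invoked during backprop. So I tried to modify the eval function by adding the mixed-precision rewrite to eval_fn() through the graph config in estimator.py: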

def eval_fn(hparams, ckpt=None, only_translate=False):
  model_fn = make_model_fn(hparams)
  sess_config = tf.ConfigProto(allow_soft_placement=True)
  # Ask grappler to run the auto_mixed_precision rewrite (1 == RewriterConfig.ON).
  sess_config.graph_options.rewrite_options.auto_mixed_precision = 1
  config = tf.estimator.RunConfig(
      log_step_count_steps=hparams.log_step_count_steps,
      session_config=sess_config)
  pred_estimator = tf.estimator.Estimator(
      model_fn=model_fn, model_dir=hparams.output_dir, config=config)
  return get_metrics(
      hparams, model_fn, pred_estimator, ckpt, only_translate=only_translate)

I also commented out this call.
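
To make the magic number in the session config explicit, here is a minimal sketch of the same toggle as a helper. The names enable_amp_rewrite and fake_config are hypothetical and not part of the GNMT code; the integer values are the ones I understand RewriterConfig.Toggle to use in TensorFlow's rewriter_config.proto. The stand-in object only mimics the attribute layout of tf.ConfigProto so that the snippet runs without TensorFlow installed.

```python
from types import SimpleNamespace

# Values of the RewriterConfig.Toggle enum (rewriter_config.proto).
TOGGLE_DEFAULT, TOGGLE_ON, TOGGLE_OFF = 0, 1, 2


def enable_amp_rewrite(session_config, toggle=TOGGLE_ON):
    """Set the grappler auto_mixed_precision toggle on a ConfigProto-like object."""
    session_config.graph_options.rewrite_options.auto_mixed_precision = toggle
    return session_config


# Stand-in with the same attribute layout as tf.ConfigProto (assumption:
# used here only so the sketch can run without TensorFlow).
fake_config = SimpleNamespace(
    graph_options=SimpleNamespace(
        rewrite_options=SimpleNamespace(auto_mixed_precision=TOGGLE_DEFAULT)))

enable_amp_rewrite(fake_config)
print(fake_config.graph_options.rewrite_options.auto_mixed_precision)
```

With tensorflow-gpu 1.15 installed, the same helper would presumably be called as enable_amp_rewrite(tf.ConfigProto(allow_soft_placement=True)), and RewriterConfig.ON from tensorflow.core.protobuf.rewriter_config_pb2 could replace the literal 1.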

However, this results in a runtime error:

Colocation members, user-requested devices, and framework assigned devices, if any:
  tower_0/v0/index_to_string/hash_table (HashTableV2) /device:GPU:0
  tower_0/v0/index_to_string/table_init/InitializeTableFromTextFileV2 (InitializeTableFromTextFileV2) /device:GPU:0
  tower_0/v0/hash_table_Lookup/LookupTableFindV2 (LookupTableFindV2) /device:GPU:0

2019-11-07 07:51:24.124179: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-11-07 07:51:24.124776: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.803817: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-11-07 07:51:24.804442: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
I1107 07:51:24.825255 140735364352992 session_manager.py:500] Running local_init_op.
2019-11-07 07:51:24.846707: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-11-07 07:51:24.846978: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.870466: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file results/vocab.bpe.32000.en is already initialized.
I1107 07:51:24.872127 140735364352992 session_manager.py:502] Done running local_init_op.
2019-11-07 07:51:24.902816: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-11-07 07:51:24.903393: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.950724: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-11-07 07:51:24.951080: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.958353: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.960220: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.961727: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.963636: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.965878: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:24.967928: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-11-07 07:51:25.309130: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-11-07 07:51:25.319260: W tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1775] auto_mixed_precision graph optimizer FAILED: Failed precondition: Expected exactly 1 output from port tower_0/v0/dynamic_seq2seq/decoder/decoder/while/NextIteration_22:0, got 2
2019-11-07 07:51:25.319653: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] auto_mixed_precision failed: Failed precondition: Expected exactly 1 output from port tower_0/v0/dynamic_seq2seq/decoder/decoder/while/NextIteration_22:0, got 2
2019-11-07 07:51:25.497377: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
I1107 07:53:57.598690 140735364352992 estimator.py:748] Writing to file results/newstest2014_out_4000.tok.de
W1107 07:53:57.614538 140735364352992 deprecation_wrapper.py:119] From /home/mayroy13/Mayank/Mayank/test/nvidia_tf_examples/gnmt_v2/estimator.py:758: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

W1107 07:53:57.615267 140735364352992 deprecation_wrapper.py:119] From /home/mayroy13/Mayank/Mayank/test/nvidia_tf_examples/gnmt_v2/estimator.py:685: The name tf.gfile.Remove is deprecated. Please use tf.io.gfile.remove instead.

W1107 07:53:57.615499 140735364352992 deprecation_wrapper.py:119] From /home/mayroy13/Mayank/Mayank/test/nvidia_tf_examples/gnmt_v2/estimator.py:686: The name tf.gfile.Copy is deprecated. Please use tf.io.gfile.copy instead.

Warning: No built-in rules for language de.
Detokenizer Version $Revision: 4134 $
Language: de
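
For comparing runs, a small helper along these lines could tally what the auto_mixed_precision pass reported. This is only a sketch: the function name summarize_amp_log is hypothetical, and the matched strings are copied from the log output in this post, so they may differ in other TensorFlow versions.

```python
import re

# Message fragments taken from the log output quoted in this post.
RUNNING = "Running auto_mixed_precision graph optimizer"
NOTHING = "No whitelist ops found"
FAILED = re.compile(r"auto_mixed_precision graph optimizer FAILED: (.*)")


def summarize_amp_log(lines):
    """Count how often the auto_mixed_precision pass ran, skipped, or failed."""
    summary = {"runs": 0, "no_whitelist_ops": 0, "failures": []}
    for line in lines:
        if RUNNING in line:
            summary["runs"] += 1
        elif NOTHING in line:
            summary["no_whitelist_ops"] += 1
        else:
            match = FAILED.search(line)
            if match:
                # Keep the reason text that follows "FAILED:".
                summary["failures"].append(match.group(1).strip())
    return summary
```

Calling summarize_amp_log(open("eval.log")) on a saved log (the file name is a placeholder) returns the three tallies. On the log in this post, the pass reports no whitelist ops for the earlier graphs and then fails once, at the NextIteration node of the decoder while loop.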

Any leads on enabling automatic mixed precision for evaluation would be helpful. Thanks :)

I have also opened an issue on the NVIDIA repository here.
