Invalid argument: Nan in summary histogram after editing the number of labels

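I have reduced the number of labels from 19 to 10. My goal is to change the dataset so that the decoder has to relearn its weights, as a rehearsal for later increasing the number of output classes of the decoder.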

The network I am using is DeepLab, and training started out fine. About 500 steps ran before the error occurred.

(The log below does not start from the first line printed after training began.)

I1111 16:19:23.461441 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.82067
Total loss is :[6.42209053]
INFO:tensorflow:global_step/sec: 1.84064
I1111 16:19:28.894436 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.84064
Total loss is :[6.23576546]
INFO:tensorflow:global_step/sec: 1.84368
I1111 16:19:34.318257 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.84368
Total loss is :[6.09628582]
INFO:tensorflow:global_step/sec: 1.83645
I1111 16:19:39.763585 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.83645
Total loss is :[6.20008707]
INFO:tensorflow:global_step/sec: 1.84192
I1111 16:19:45.192930 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.84192
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1356,in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1341,in _run_fn
    options,feed_dict,fetch_list,target_list,run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1429,in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
     [[{{node image_pooling/BatchNorm/moving_variance_1}}]]
     [[Mean_225/_10177]]
  (1) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
     [[{{node image_pooling/BatchNorm/moving_variance_1}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception,another exception occurred:

Traceback (most recent call last):
  File "/home/zwang/workspace//models-master/research/deeplab/train.py",line 521,in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py",line 40,in run
    _run(main=main,argv=argv,flags_parser=_parse_flags_tolerate_undef)
  File "/home/zwang/.local/lib/python3.6/site-packages/absl/app.py",line 299,in run
    _run_main(main,args)
  File "/home/zwang/.local/lib/python3.6/site-packages/absl/app.py",line 250,in _run_main
    sys.exit(main(argv))
  File "/home/zwang/workspace//models-master/research/deeplab/train.py",line 515,in main
    sess.run([train_tensor])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 754,in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 1252,line 1353,in run
    raise six.reraise(*original_exc_info)
  File "/home/zwang/.local/lib/python3.6/site-packages/six.py",line 693,in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 1338,in run
    return self._sess.run(*args,**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 1411,line 1169,**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 950,in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1173,in _run
    feed_dict_tensor,options,line 1350,in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1370,in _do_call
    raise type(e)(node_def,op,message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
     [[node image_pooling/BatchNorm/moving_variance_1 (defined at home/zwang/workspace//models-master/research/deeplab/train.py:328) ]]
     [[Mean_225/_10177]]
  (1) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
     [[node image_pooling/BatchNorm/moving_variance_1 (defined at home/zwang/workspace//models-master/research/deeplab/train.py:328) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node image_pooling/BatchNorm/moving_variance_1:
 image_pooling/BatchNorm/moving_variance/read (defined at home/zwang/workspace/models-master/research/deeplab/model.py:478)

Input Source operations connected to node image_pooling/BatchNorm/moving_variance_1:
 image_pooling/BatchNorm/moving_variance/read (defined at home/zwang/workspace/models-master/research/deeplab/model.py:478)

Original stack trace for 'image_pooling/BatchNorm/moving_variance_1':
  File "home/zwang/workspace//models-master/research/deeplab/train.py",in <module>
    tf.app.run()
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py",flags_parser=_parse_flags_tolerate_undef)
  File "home/zwang/.local/lib/python3.6/site-packages/absl/app.py",args)
  File "home/zwang/.local/lib/python3.6/site-packages/absl/app.py",in _run_main
    sys.exit(main(argv))
  File "home/zwang/workspace//models-master/research/deeplab/train.py",line 472,in main
    dataset.ignore_label)
  File "home/zwang/workspace//models-master/research/deeplab/train.py",line 379,in _train_deeplab_model
    reuse_variable=(i != 0))
  File "home/zwang/workspace//models-master/research/deeplab/train.py",line 275,in _tower_loss
    _build_deeplab(iterator,{common.OUTPUT_TYPE: num_of_classes},ignore_label)
  File "home/zwang/workspace//models-master/research/deeplab/train.py",line 257,in _build_deeplab
    output_type_dict[model.MERGED_LOGITS_SCOPE])
  File "home/zwang/workspace//models-master/research/deeplab/train.py",line 328,in _log_summaries
    tf.summary.histogram(model_var.op.name,model_var)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/summary/summary.py",line 179,in histogram
    tag=tag,values=values,name=scope)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_logging_ops.py",line 329,in histogram_summary
    "HistogramSummary",tag=tag,name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py",line 788,in _apply_op_helper
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py",line 507,in new_func
    return func(*args,**kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py",line 3616,in create_op
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py",line 2005,in __init__
    self._traceback = tf_stack.extract_stack()

I think the error

  (0) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1

looks like a TensorBoard error. Is there any way to avoid it?

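The first 500 of the 30000 training steps completed without any problem. So I hope training will run normally if I either drop part of the function (for example the TensorBoard histograms) or change the number of labels somewhere else as well (perhaps another parameter besides num_of_classes needs to be edited).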

Could you give me some advice on this error or on my general approach? Thanks.

Best regards

Z
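
A note on the question above: the message names image_pooling/BatchNorm/moving_variance_1, which suggests that the variable being summarized already contains NaN and that the histogram op is only the first place where it gets reported. Under that reading, removing the tf.summary.histogram call would silence the message without fixing the divergence. Below is a minimal pure-Python sketch of such a check. The helper name find_non_finite and the numbers in the snapshot are made up for illustration; they are not part of DeepLab or TensorFlow and are not taken from the log.

```python
import math


def find_non_finite(named_values):
    """Return the names whose values contain NaN or +/-inf (hypothetical helper)."""
    bad = []
    for name, values in named_values.items():
        if any(not math.isfinite(v) for v in values):
            bad.append(name)
    return bad


# Illustrative snapshot only: invented numbers standing in for variable
# values fetched from a session at one training step.
snapshot = {
    "image_pooling/BatchNorm/moving_mean": [0.12, -0.03, 0.40],
    "image_pooling/BatchNorm/moving_variance": [0.98, float("nan"), 1.05],
}

bad_names = find_non_finite(snapshot)
print(bad_names)
```

Running a check like this every few steps, on values fetched from the real model, would show at which step and in which variable the first non-finite value appears.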

wang_hecheng answered: Invalid argument: Nan in summary histogram after editing the number of labels

The problem was solved by adjusting the training hyperparameters, for example by lowering the learning rate to stabilize training.
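
To show why a lower learning rate can help, here is a toy sketch in plain Python. It is not DeepLab code: the loss w**2, the function name final_weight and the two learning rates are assumptions chosen only to make the effect visible. With a step size that is too large, every update overshoots, the weight grows until it overflows to infinity, and the next update produces NaN. With a small step size, the same loop converges.

```python
import math


def final_weight(learning_rate, steps=2000, start=1.0):
    """Run plain gradient descent on loss(w) = w**2 and return the last w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w                # derivative of w**2
        w = w - learning_rate * grad  # standard gradient descent update
    return w


diverged = final_weight(1.5)  # each step multiplies w by -2, so |w| blows up
stable = final_weight(0.1)    # each step multiplies w by 0.8, so w shrinks

print(math.isnan(diverged))
print(abs(stable) < 1e-6)
```

Both lines print True: the large step size ends in NaN and the small one settles near zero. A real network is far more complex, but the same mechanism is one plausible reason why values such as a batch-norm moving variance can turn into NaN after a few hundred steps.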

