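I have reduced the number of labels from 19 to 10. My goal is to change the dataset so that the decoder has to relearn its weights, as a warm-up exercise for increasing the number of decoder output classes.
The network I am using is DeepLab, and the initial training went fine. About 500 steps ran before the error occurred.
(The log below does not start at the first line of the training output.)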
I1111 16:19:23.461441 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.82067
Total loss is :[6.42209053]
INFO:tensorflow:global_step/sec: 1.84064
I1111 16:19:28.894436 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.84064
Total loss is :[6.23576546]
INFO:tensorflow:global_step/sec: 1.84368
I1111 16:19:34.318257 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.84368
Total loss is :[6.09628582]
INFO:tensorflow:global_step/sec: 1.83645
I1111 16:19:39.763585 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.83645
Total loss is :[6.20008707]
INFO:tensorflow:global_step/sec: 1.84192
I1111 16:19:45.192930 140502638323520 basic_session_run_hooks.py:692] global_step/sec: 1.84192
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1356,in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1341,in _run_fn
options,feed_dict,fetch_list,target_list,run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1429,in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
[[{{node image_pooling/BatchNorm/moving_variance_1}}]]
[[Mean_225/_10177]]
(1) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
[[{{node image_pooling/BatchNorm/moving_variance_1}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception,another exception occurred:
Traceback (most recent call last):
File "/home/zwang/workspace//models-master/research/deeplab/train.py",line 521,in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py",line 40,in run
_run(main=main,argv=argv,flags_parser=_parse_flags_tolerate_undef)
File "/home/zwang/.local/lib/python3.6/site-packages/absl/app.py",line 299,in run
_run_main(main,args)
File "/home/zwang/.local/lib/python3.6/site-packages/absl/app.py",line 250,in _run_main
sys.exit(main(argv))
File "/home/zwang/workspace//models-master/research/deeplab/train.py",line 515,in main
sess.run([train_tensor])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 754,in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 1252,line 1353,in run
raise six.reraise(*original_exc_info)
File "/home/zwang/.local/lib/python3.6/site-packages/six.py",line 693,in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 1338,in run
return self._sess.run(*args,**kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py",line 1411,line 1169,**kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 950,in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1173,in _run
feed_dict_tensor,options,line 1350,in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py",line 1370,in _do_call
raise type(e)(node_def,op,message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
[[node image_pooling/BatchNorm/moving_variance_1 (defined at home/zwang/workspace//models-master/research/deeplab/train.py:328) ]]
[[Mean_225/_10177]]
(1) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
[[node image_pooling/BatchNorm/moving_variance_1 (defined at home/zwang/workspace//models-master/research/deeplab/train.py:328) ]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node image_pooling/BatchNorm/moving_variance_1:
image_pooling/BatchNorm/moving_variance/read (defined at home/zwang/workspace/models-master/research/deeplab/model.py:478)
Input Source operations connected to node image_pooling/BatchNorm/moving_variance_1:
image_pooling/BatchNorm/moving_variance/read (defined at home/zwang/workspace/models-master/research/deeplab/model.py:478)
Original stack trace for 'image_pooling/BatchNorm/moving_variance_1':
File "home/zwang/workspace//models-master/research/deeplab/train.py",in <module>
tf.app.run()
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py",flags_parser=_parse_flags_tolerate_undef)
File "home/zwang/.local/lib/python3.6/site-packages/absl/app.py",args)
File "home/zwang/.local/lib/python3.6/site-packages/absl/app.py",in _run_main
sys.exit(main(argv))
File "home/zwang/workspace//models-master/research/deeplab/train.py",line 472,in main
dataset.ignore_label)
File "home/zwang/workspace//models-master/research/deeplab/train.py",line 379,in _train_deeplab_model
reuse_variable=(i != 0))
File "home/zwang/workspace//models-master/research/deeplab/train.py",line 275,in _tower_loss
_build_deeplab(iterator,{common.OUTPUT_TYPE: num_of_classes},ignore_label)
File "home/zwang/workspace//models-master/research/deeplab/train.py",line 257,in _build_deeplab
output_type_dict[model.MERGED_LOGITS_SCOPE])
File "home/zwang/workspace//models-master/research/deeplab/train.py",line 328,in _log_summaries
tf.summary.histogram(model_var.op.name,model_var)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/summary/summary.py",line 179,in histogram
tag=tag,values=values,name=scope)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_logging_ops.py",line 329,in histogram_summary
"HistogramSummary",tag=tag,name=name)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py",line 788,in _apply_op_helper
op_def=op_def)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py",line 507,in new_func
return func(*args,**kwargs)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py",line 3616,in create_op
op_def=op_def)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py",line 2005,in __init__
self._traceback = tf_stack.extract_stack()
I think the error
(0) Invalid argument: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
looks like a TensorBoard error. Is there any way to avoid it?
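Before disabling the histogram, it may be worth checking whether the loss itself stayed finite up to the failure. The message is raised when the summarized tensor already contains NaN, so the summary op may only be reporting a problem that started in the model. Below is a minimal sketch that pulls the "Total loss is" values out of a saved log, assuming the log is available as a string. The helper names `extract_losses` and `first_non_finite` are hypothetical and not part of DeepLab or TensorFlow.

```python
import math
import re

# Matches lines such as "Total loss is :[6.42209053]" from the training log.
LOSS_PATTERN = re.compile(r"Total loss is\s*:\s*\[([^\]]+)\]")


def extract_losses(log_text):
    """Return the loss values printed in the log, in order of appearance."""
    return [float(match.group(1)) for match in LOSS_PATTERN.finditer(log_text)]


def first_non_finite(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for index, value in enumerate(losses):
        if not math.isfinite(value):
            return index
    return None


# A few lines copied from the log above; in practice, read the whole log file.
sample_log = """\
Total loss is :[6.42209053]
INFO:tensorflow:global_step/sec: 1.84064
Total loss is :[6.23576546]
Total loss is :[6.09628582]
Total loss is :[6.20008707]
"""

losses = extract_losses(sample_log)
print(losses)
print(first_non_finite(losses))
```

If every printed loss is finite right up to the crash, the NaN would seem to be confined to the batch-norm statistics rather than the loss, which narrows down where to look.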
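Since my training completed 500 of the 30000 steps without any problem, I am hoping that training will run normally either with some part of the function removed (for example the TensorBoard histograms) or with num_of_labels edited somewhere else (perhaps a parameter other than num_of_classes also needs to change).
Could you offer some suggestions about this error or about my general approach? Thank you.
Best regards,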
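One thing that can be ruled out on the dataset side is whether the remapped annotations still contain ids outside the new class range. The following is a minimal sketch, assuming labels are plain integers, the 10 kept classes are numbered 0 to 9, and the ignore label is 255. That value is an assumption and should match `dataset.ignore_label` in the actual setup. The function name `find_invalid_labels` is hypothetical.

```python
def find_invalid_labels(label_values, num_classes, ignore_label=255):
    """Return the sorted label ids that are neither valid classes nor ignore_label."""
    valid = set(range(num_classes))
    valid.add(ignore_label)
    return sorted(set(label_values) - valid)


# Hypothetical flattened annotation: ids 0..9 are the kept classes, 255 is ignored,
# and 12 and 18 stand in for leftovers from the old 19-class labelling.
example_pixels = [0, 3, 9, 255, 12, 18, 3, 0]

print(find_invalid_labels(example_pixels, num_classes=10))  # [12, 18]
```

An empty result for every annotation file would suggest the label remapping is consistent with num_of_classes, so the cause is more likely elsewhere.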
Z