Error when training model


(karthikeyan k) #1

Hi, I tried your method of training the model, but I am getting the error below.
Can you please help me with this?
Thank you.


(Lissyx) #2

Please avoid screenshots, they're not readable. And please avoid hijacking others' threads; it adds noise and doesn't help.


(Gr8nishan) #3

@karthikeyank I think your TensorFlow version does not match the decoder version. Check the TensorFlow version in requirements.txt and install the matching one.


(karthikeyan k) #4

Yeah, thank you @gr8nishan. I have now built a completely new project from the DeepSpeech 0.3.0 release, with the packages from its requirements.txt and the native_client.amd64.cpu.linux.tar.xz file from the releases (TensorFlow version 1.11.0). I am getting this issue now:

('Preprocessing', ['/home/userk/DeepSpeechPro/datasets/train/train.csv'])
Traceback (most recent call last):
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "DeepSpeech.py", line 1944, in main
train()
File "DeepSpeech.py", line 1468, in train
hdf5_cache_path=FLAGS.train_cached_features_path)
File "/home/userk/DeepSpeechPro/DeepSpeech2/DeepSpeech-0.3.0/util/preprocess.py", line 68, in preprocess
out_data = pmap(step_fn, source_data.iterrows())
File "/home/userk/DeepSpeechPro/DeepSpeech2/DeepSpeech-0.3.0/util/preprocess.py", line 13, in pmap
results = pool.map(fun, iterable)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
AttributeError: 'Series' object has no attribute 'transcript'

(Gr8nishan) #5

@karthikeyank Does your CSV have all three columns: wav_filename, wav_filesize, transcript? It looks like the transcript column is missing from your CSV. Also make sure these three are present as headers in your CSV.
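A quick way to verify the header is to read just the first line and compare it against the three columns DeepSpeech expects. This is a minimal sketch using Python's standard csv module; the column names are the ones from the DeepSpeech CSV format mentioned above:

```python
import csv

# The three header columns a DeepSpeech training CSV must have.
REQUIRED = {"wav_filename", "wav_filesize", "transcript"}

def missing_columns(csv_path):
    """Return the set of required columns absent from the CSV header."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return REQUIRED - {h.strip() for h in header}
```

If this returns a non-empty set (e.g. `{'transcript'}`), the preprocessing step will fail exactly as in the `AttributeError: 'Series' object has no attribute 'transcript'` traceback above.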


(karthikeyan k) #6

Yes, all three columns are present.
Here is a snap.


(Lissyx) #7

Again, no screenshots; share the file. We might be missing a lot of information. And again, stop asking the same question everywhere.


(karthikeyan k) #8

Okay. The CSV file looks like:

wav_filename,wav_filesize,transcript
/home/userk/DeepSpeechPro/datasets/train/chunk1.wav,15788,how can i help

And actually three people are helping me out, including you, so I have to update my results to all of them, right? That's why I keep posting updates. Sorry.


(Lissyx) #9

No, share the file, not its content: there might be some non-printable characters messing things up.
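One way to check for the kind of non-printable characters meant here (NUL padding, BOMs, stray carriage returns) is to scan the file and report anything outside printable ASCII. A minimal sketch; it assumes the file is UTF-8 and that transcripts are plain ASCII, so adjust the range check if your data legitimately contains non-ASCII text:

```python
def non_printable_positions(csv_path):
    """Report (line, column, codepoint) for characters outside printable
    ASCII. Newlines are expected line terminators and are skipped."""
    hits = []
    with open(csv_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            for col, ch in enumerate(line.rstrip("\n"), 1):
                if not (32 <= ord(ch) < 127):
                    hits.append((lineno, col, hex(ord(ch))))
    return hits
```

An empty list means the file contains only printable ASCII; entries like `(2, 2, '0x0')` point at NUL bytes, and `'0xd'` at the end of lines indicates Windows-style line endings.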


(karthikeyan k) #10

Okay, I will check for it and report back.


(karthikeyan k) #12

@lissyx Yes, you were correct: the CSV file was corrupt due to fixed-length padding. I have rebuilt everything and executed it, and now it's training. Thanks for your support throughout.


(karthikeyan k) #13

@lissyx, I am getting this error after 3 hours of training; can you please have a look at it?

E Labels length is zero in batch 0
E [[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]
E
E Caused by op ‘tower_0/CTCLoss’, defined at:
E File "DeepSpeech.py", line 1988, in <module>
E tf.app.run(main)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
E _sys.exit(main(argv))
E File “DeepSpeech.py”, line 1944, in main
E train()
E File “DeepSpeech.py”, line 1520, in train
E results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
E File “DeepSpeech.py”, line 634, in get_tower_results
E calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates, reuse=i>0)
E File “DeepSpeech.py”, line 521, in calculate_mean_edit_distance_and_loss
E total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py”, line 158, in ctc_loss
E ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py”, line 286, in ctc_loss
E name=name)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
E op_def=op_def)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 488, in new_func
E return func(*args, **kwargs)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 3272, in create_op
E op_def=op_def)
E File "/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
E self._traceback = tf_stack.extract_stack()
E
E InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
E [[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]
33% (745 of 2223) |################################ | Elapsed Time: 3:55:09 ETA: 13:19:16Traceback (most recent call last):
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1292, in _do_call
return fn(*args)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “DeepSpeech.py”, line 1729, in train
_, current_step, batch_loss, batch_report, step_summary = session.run([train_op, global_step, loss, report_params, step_summaries_op], **extra_params)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 671, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1148, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1239, in run
raise six.reraise(*original_exc_info)
File “/usr/lib/python3/dist-packages/six.py”, line 686, in reraise
raise value
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1224, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1296, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1076, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 887, in run
run_metadata_ptr)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1110, in _run
feed_dict_tensor, options, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1286, in _do_run
run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

Caused by op ‘tower_0/CTCLoss’, defined at:
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
_sys.exit(main(argv))
File “DeepSpeech.py”, line 1944, in main
train()
File “DeepSpeech.py”, line 1520, in train
results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
File “DeepSpeech.py”, line 634, in get_tower_results
calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates, reuse=i>0)
File “DeepSpeech.py”, line 521, in calculate_mean_edit_distance_and_loss
total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py”, line 158, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py”, line 286, in ctc_loss
name=name)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
op_def=op_def)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 488, in new_func
return func(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 3272, in create_op
op_def=op_def)
File "/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

Traceback (most recent call last):
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1292, in _do_call
return fn(*args)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “DeepSpeech.py”, line 1729, in train
_, current_step, batch_loss, batch_report, step_summary = session.run([train_op, global_step, loss, report_params, step_summaries_op], **extra_params)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 671, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1148, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1239, in run
raise six.reraise(*original_exc_info)
File “/usr/lib/python3/dist-packages/six.py”, line 686, in reraise
raise value
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1224, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1296, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1076, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 887, in run
run_metadata_ptr)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1110, in _run
feed_dict_tensor, options, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1286, in _do_run
run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

Caused by op ‘tower_0/CTCLoss’, defined at:
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
_sys.exit(main(argv))
File “DeepSpeech.py”, line 1944, in main
train()
File “DeepSpeech.py”, line 1520, in train
results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
File “DeepSpeech.py”, line 634, in get_tower_results
calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates, reuse=i>0)
File “DeepSpeech.py”, line 521, in calculate_mean_edit_distance_and_loss
total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py”, line 158, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py”, line 286, in ctc_loss
name=name)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
op_def=op_def)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 488, in new_func
return func(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 3272, in create_op
op_def=op_def)
File "/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
_sys.exit(main(argv))
File “DeepSpeech.py”, line 1944, in main
train()
File “DeepSpeech.py”, line 1768, in train
hook.end(session)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/basic_session_run_hooks.py”, line 587, in end
self._save(session, last_step)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/basic_session_run_hooks.py”, line 598, in _save
self._get_saver().save(session, self._save_path, global_step=step)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 1421, in save
raise TypeError("'sess' must be a Session; %s" % sess)
TypeError: 'sess' must be a Session; <tensorflow.python.training.monitored_session.MonitoredSession object at 0x7f8c8406fe10>

Thank you.


(Murugan R) #14

@karthikeyank sir,
Open DeepSpeech.py, check line 517, and add this parameter:
ignore_longer_outputs_than_inputs=True

total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

Sir, now start training. I think it will work fine. :slightly_smiling_face:

I was facing this issue previously.


(karthikeyan k) #15

@muruganrajenthirean, okay, please help me with this too.
If I train the model for 3 hours and want to pause and resume training tomorrow from where it stopped, how can I achieve this?


(Murugan R) #16

I have one idea: see how many epochs fit into your 3 hours, train the model for that many epochs, and create a checkpoint.

Then tomorrow, fine-tune your model from yesterday's checkpoints. That's it.

Do you know about continued training (transfer learning)? Training from your checkpoints will continue through the remaining epochs.

Do you understand, sir? :slightly_smiling_face:


(karthikeyan k) #17

@muruganrajenthirean, yes, I understand. But here I'm training on a CPU, which took 3 hours to get through 33% of the train set.

And

this line, ignore_longer_outputs_than_inputs=True, was already added when I got the "file too short for transcription" error.
Note: while training, the system actually went to sleep for 30 minutes; could that be the cause of this issue?
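The "file too short for transcription" error can be screened for before training by comparing each clip's estimated frame count against its transcript length. The sketch below is an approximation, not DeepSpeech's actual check: it assumes 16 kHz, 16-bit mono WAV files and a 20 ms feature window step (DeepSpeech's defaults), and it ignores the WAV header bytes included in wav_filesize:

```python
import csv

# Assumed audio format: 16 kHz sample rate, 16-bit (2-byte) mono samples.
BYTES_PER_SECOND = 16000 * 2
STEP_SECONDS = 0.02  # assumed 20 ms window step between feature frames

def too_short_rows(csv_path):
    """Yield wav_filenames whose audio likely yields fewer feature frames
    than the transcript has characters, which CTC loss cannot align."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            frames = (int(row["wav_filesize"]) / BYTES_PER_SECOND) / STEP_SECONDS
            if frames < len(row["transcript"]):
                yield row["wav_filename"]
```

Rows flagged here are the ones ignore_longer_outputs_than_inputs=True silently skips; removing or re-recording them is usually cleaner than relying on that flag.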


(Lissyx) #18

Do you have some empty transcription somewhere?


(Murugan R) #19

here I’m training on a CPU which took 3hrs to train 33% of train set

At that rate it's never completing even one epoch; move to a GPU. Otherwise I don't know of other possibilities, sir.

this line ignore_longer_outputs_than_inputs=True is already added when i got the file too short for transcription error

I think something in your hyperparameters is creating the issue. Fine-tune your hyperparameters, sir. :slightly_smiling_face:


(karthikeyan k) #20

@lissyx, I checked the CSV files and there is no empty transcription; at least two characters are present in each. Only the rows at the end are empty.
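Those trailing empty rows are exactly what produces "Labels length is zero in batch 0": a row with no transcript gives CTC a zero-length label. A minimal sketch to filter such rows out of a DeepSpeech-style CSV (standard csv module; column names assumed to be the usual three):

```python
import csv

def clean_csv(in_path, out_path):
    """Copy a training CSV, dropping rows that are entirely blank or have
    an empty transcript. Returns the wav_filenames of dropped rows."""
    dropped = []
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if not any((v or "").strip() for v in row.values()):
                continue  # fully blank row, e.g. trailing empty lines
            if not (row["transcript"] or "").strip():
                dropped.append(row["wav_filename"])
                continue  # audio with no transcript -> zero-length CTC label
            writer.writerow(row)
    return dropped
```

Running this over train/dev/test CSVs before training removes the zero-length labels without touching valid rows.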


(Lissyx) #21

That would fit the description of an "empty transcription".