Error when training model


(karthikeyan k) #1

Hi, I tried your method of training the model, but I am getting the error below.
Can you please help me with this?
Thank you.


(Lissyx) #2

Please avoid screenshots, they're not readable. And please avoid hijacking others' threads; it adds noise and doesn't help.


(Gr8nishan) #3

@karthikeyank I think your TensorFlow version does not match the decoder version. Check the TensorFlow version in requirements.txt and install the matching one.


(karthikeyan k) #4

Yeah, thank you @gr8nishan. I have now built a completely new project from the DeepSpeech 0.3.0 release, with the packages from its requirements.txt and the native_client.amd64.cpu.linux.tar.xz file from the releases (TensorFlow version 1.11.0). I am getting this issue now:

('Preprocessing', ['/home/userk/DeepSpeechPro/datasets/train/train.csv'])
Traceback (most recent call last):
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "DeepSpeech.py", line 1944, in main
train()
File "DeepSpeech.py", line 1468, in train
hdf5_cache_path=FLAGS.train_cached_features_path)
File "/home/userk/DeepSpeechPro/DeepSpeech2/DeepSpeech-0.3.0/util/preprocess.py", line 68, in preprocess
out_data = pmap(step_fn, source_data.iterrows())
File "/home/userk/DeepSpeechPro/DeepSpeech2/DeepSpeech-0.3.0/util/preprocess.py", line 13, in pmap
results = pool.map(fun, iterable)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
AttributeError: 'Series' object has no attribute 'transcript'

(Gr8nishan) #5

@karthikeyank Does your CSV have all three columns: wav_filename, wav_filesize, transcript? It looks like the transcript column is missing from your CSV. Also make sure these three are present as headers in your CSV.
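A quick way to verify the header is to read just the first line and compare it against the three columns DeepSpeech expects. This is a minimal sketch using Python's standard csv module; the column names are the ones from the DeepSpeech CSV format mentioned above:

```python
import csv

# The three header columns a DeepSpeech training CSV must have.
REQUIRED = {"wav_filename", "wav_filesize", "transcript"}

def missing_columns(csv_path):
    """Return the set of required columns absent from the CSV header."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return REQUIRED - {h.strip() for h in header}
```

If this returns a non-empty set (e.g. `{'transcript'}`), the preprocessing step will fail exactly as in the `AttributeError: 'Series' object has no attribute 'transcript'` traceback above.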


(karthikeyan k) #6

Yes, all three columns are present.
Here is a snap.


(Lissyx) #7

Again, no screenshots; share the file. We might be missing a lot of information. And again, stop asking the same question everywhere.


(karthikeyan k) #8

Okay. The CSV file looks like:

wav_filename,wav_filesize,transcript
/home/userk/DeepSpeechPro/datasets/train/chunk1.wav,15788,how can i help

And actually three people are helping me out, including you, so I have to update my results to all of them, right? That's why I keep posting updates. Sorry.


(Lissyx) #9

No, share the file, not its content: there might be some non-printable characters messing things up.
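One way to check for the kind of non-printable characters meant here (NUL padding, BOMs, stray carriage returns) is to scan the file and report anything outside printable ASCII. A minimal sketch; it assumes the file is UTF-8 and that transcripts are plain ASCII, so adjust the range check if your data legitimately contains non-ASCII text:

```python
def non_printable_positions(csv_path):
    """Report (line, column, codepoint) for characters outside printable
    ASCII. Newlines are expected line terminators and are skipped."""
    hits = []
    with open(csv_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            for col, ch in enumerate(line.rstrip("\n"), 1):
                if not (32 <= ord(ch) < 127):
                    hits.append((lineno, col, hex(ord(ch))))
    return hits
```

An empty list means the file contains only printable ASCII; entries like `(2, 2, '0x0')` point at NUL bytes, and `'0xd'` at the end of lines indicates Windows-style line endings.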


(karthikeyan k) #10

Okay, I will check for it and report back.


(karthikeyan k) #12

@lissyx Yes, you were correct: the CSV file was corrupt due to fixed-length padding. I have rebuilt everything and executed it, and now it's training. Thanks for your support throughout.


(karthikeyan k) #13

@lissyx, I am getting this error after 3 hours of training; can you please have a look at it?

E Labels length is zero in batch 0
E [[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]
E
E Caused by op ‘tower_0/CTCLoss’, defined at:
E File "DeepSpeech.py", line 1988, in <module>
E tf.app.run(main)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
E _sys.exit(main(argv))
E File “DeepSpeech.py”, line 1944, in main
E train()
E File “DeepSpeech.py”, line 1520, in train
E results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
E File “DeepSpeech.py”, line 634, in get_tower_results
E calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates, reuse=i>0)
E File “DeepSpeech.py”, line 521, in calculate_mean_edit_distance_and_loss
E total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py”, line 158, in ctc_loss
E ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py”, line 286, in ctc_loss
E name=name)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
E op_def=op_def)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 488, in new_func
E return func(*args, **kwargs)
E File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 3272, in create_op
E op_def=op_def)
E File "/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
E self._traceback = tf_stack.extract_stack()
E
E InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
E [[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]
33% (745 of 2223) |################################ | Elapsed Time: 3:55:09 ETA: 13:19:16Traceback (most recent call last):
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1292, in _do_call
return fn(*args)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “DeepSpeech.py”, line 1729, in train
_, current_step, batch_loss, batch_report, step_summary = session.run([train_op, global_step, loss, report_params, step_summaries_op], **extra_params)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 671, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1148, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1239, in run
raise six.reraise(*original_exc_info)
File “/usr/lib/python3/dist-packages/six.py”, line 686, in reraise
raise value
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1224, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1296, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1076, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 887, in run
run_metadata_ptr)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1110, in _run
feed_dict_tensor, options, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1286, in _do_run
run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

Caused by op ‘tower_0/CTCLoss’, defined at:
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
_sys.exit(main(argv))
File “DeepSpeech.py”, line 1944, in main
train()
File “DeepSpeech.py”, line 1520, in train
results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
File “DeepSpeech.py”, line 634, in get_tower_results
calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates, reuse=i>0)
File “DeepSpeech.py”, line 521, in calculate_mean_edit_distance_and_loss
total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py”, line 158, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py”, line 286, in ctc_loss
name=name)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
op_def=op_def)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 488, in new_func
return func(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 3272, in create_op
op_def=op_def)
File "/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

Traceback (most recent call last):
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1292, in _do_call
return fn(*args)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “DeepSpeech.py”, line 1729, in train
_, current_step, batch_loss, batch_report, step_summary = session.run([train_op, global_step, loss, report_params, step_summaries_op], **extra_params)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 671, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1148, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1239, in run
raise six.reraise(*original_exc_info)
File “/usr/lib/python3/dist-packages/six.py”, line 686, in reraise
raise value
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1224, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1296, in run
run_metadata=run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1076, in run
return self._sess.run(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 887, in run
run_metadata_ptr)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1110, in _run
feed_dict_tensor, options, run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1286, in _do_run
run_metadata)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

Caused by op ‘tower_0/CTCLoss’, defined at:
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
_sys.exit(main(argv))
File “DeepSpeech.py”, line 1944, in main
train()
File “DeepSpeech.py”, line 1520, in train
results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
File “DeepSpeech.py”, line 634, in get_tower_results
calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates, reuse=i>0)
File “DeepSpeech.py”, line 521, in calculate_mean_edit_distance_and_loss
total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py”, line 158, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py”, line 286, in ctc_loss
name=name)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
op_def=op_def)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 488, in new_func
return func(*args, **kwargs)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 3272, in create_op
op_def=op_def)
File "/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
[[{{node tower_0/CTCLoss}} = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/raw_logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "DeepSpeech.py", line 1988, in <module>
tf.app.run(main)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run
_sys.exit(main(argv))
File “DeepSpeech.py”, line 1944, in main
train()
File “DeepSpeech.py”, line 1768, in train
hook.end(session)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/basic_session_run_hooks.py”, line 587, in end
self._save(session, last_step)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/basic_session_run_hooks.py”, line 598, in _save
self._get_saver().save(session, self._save_path, global_step=step)
File “/home/userk/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 1421, in save
raise TypeError("'sess' must be a Session; %s" % sess)
TypeError: 'sess' must be a Session; <tensorflow.python.training.monitored_session.MonitoredSession object at 0x7f8c8406fe10>

Thank you.


(Murugan R) #14

@karthikeyank sir,
Open DeepSpeech.py, check line 517, and add this parameter:
ignore_longer_outputs_than_inputs=True

total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

Sir, now start training. I think it will work fine. :slightly_smiling_face:

I was facing this issue previously.


(karthikeyan k) #15

@muruganrajenthirean, okay, please help me with this too.
If I train the model for 3 hours and want to pause and resume training tomorrow from where it stopped, how can I achieve this?


(Murugan R) #16

I have one idea: see how many epochs fit into your 3 hours, train the model for that many epochs, and create a checkpoint.

Then tomorrow, fine-tune your model from yesterday's checkpoints. That's it.

Do you know about continued training (transfer learning)? Training from your checkpoints will continue through the remaining epochs.

Do you understand, sir? :slightly_smiling_face:


(karthikeyan k) #17

@muruganrajenthirean, yes, I understand. But here I'm training on a CPU, which took 3 hours to get through 33% of the train set.

And

this line, ignore_longer_outputs_than_inputs=True, was already added when I got the "file too short for transcription" error.
Note: while training, the system actually went to sleep for 30 minutes; could that be the cause of this issue?
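The "file too short for transcription" error can be screened for before training by comparing each clip's estimated frame count against its transcript length. The sketch below is an approximation, not DeepSpeech's actual check: it assumes 16 kHz, 16-bit mono WAV files and a 20 ms feature window step (DeepSpeech's defaults), and it ignores the WAV header bytes included in wav_filesize:

```python
import csv

# Assumed audio format: 16 kHz sample rate, 16-bit (2-byte) mono samples.
BYTES_PER_SECOND = 16000 * 2
STEP_SECONDS = 0.02  # assumed 20 ms window step between feature frames

def too_short_rows(csv_path):
    """Yield wav_filenames whose audio likely yields fewer feature frames
    than the transcript has characters, which CTC loss cannot align."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            frames = (int(row["wav_filesize"]) / BYTES_PER_SECOND) / STEP_SECONDS
            if frames < len(row["transcript"]):
                yield row["wav_filename"]
```

Rows flagged here are the ones ignore_longer_outputs_than_inputs=True silently skips; removing or re-recording them is usually cleaner than relying on that flag.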


(Lissyx) #18

Do you have some empty transcription somewhere?


(Murugan R) #19

here I’m training on a CPU which took 3hrs to train 33% of train set

At that rate it's never completing even one epoch; move to a GPU. Otherwise I don't know of other possibilities, sir.

this line ignore_longer_outputs_than_inputs=True is already added when i got the file too short for transcription error

I think something in your hyperparameters is creating the issue. Fine-tune your hyperparameters, sir. :slightly_smiling_face:


(karthikeyan k) #20

@lissyx, I checked the CSV files and there is no empty transcription; at least two characters are present in each. Only the rows at the end are empty.
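Those trailing empty rows are exactly what produces "Labels length is zero in batch 0": a row with no transcript gives CTC a zero-length label. A minimal sketch to filter such rows out of a DeepSpeech-style CSV (standard csv module; column names assumed to be the usual three):

```python
import csv

def clean_csv(in_path, out_path):
    """Copy a training CSV, dropping rows that are entirely blank or have
    an empty transcript. Returns the wav_filenames of dropped rows."""
    dropped = []
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if not any((v or "").strip() for v in row.values()):
                continue  # fully blank row, e.g. trailing empty lines
            if not (row["transcript"] or "").strip():
                dropped.append(row["wav_filename"])
                continue  # audio with no transcript -> zero-length CTC label
            writer.writerow(row)
    return dropped
```

Running this over train/dev/test CSVs before training removes the zero-length labels without touching valid rows.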


(Lissyx) #21

That would fit the description of an "empty transcription".