Hi! I'm trying to train DeepSpeech 0.9.3 on Hindi data.
I followed the playbook at https://mozilla.github.io/deepspeech-playbook/ to format the data, generate the scorer file, and set up the rest of the configuration.
Then I ran this command:
python3 /DeepSpeech/DeepSpeech.py \
  --train_files /DeepSpeech/deepspeech-data/cv-corpus-6.1-2020-12-11/hi/clips/train.csv \
  --dev_files /DeepSpeech/deepspeech-data/cv-corpus-6.1-2020-12-11/hi/clips/dev.csv \
  --test_files /DeepSpeech/deepspeech-data/cv-corpus-6.1-2020-12-11/hi/clips/test.csv \
  --alphabet_config_path=/DeepSpeech/data/alphabet.txt \
  --checkpoint_dir /DeepSpeech/deepspeech-data/checkpoints
and got the following errors:
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[{{node tower_0/IteratorGetNext}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/DeepSpeech/training/deepspeech_training/train.py", line 572, in run_set
    feed_dict=feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[node tower_0/IteratorGetNext (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Original stack trace for 'tower_0/IteratorGetNext':
  File "DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "DeepSpeech/training/deepspeech_training/train.py", line 484, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "DeepSpeech/training/deepspeech_training/train.py", line 317, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "DeepSpeech/training/deepspeech_training/train.py", line 236, in calculate_mean_edit_distance_and_loss
    batch_filenames, (batch_x, batch_seq_len), batch_y = iterator.get_next()
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
    name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 607, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 575, in run_set
    exception_box.raise_if_set()
  File "/DeepSpeech/training/deepspeech_training/util/helpers.py", line 149, in raise_if_set
    raise exception # pylint: disable = raising-bad-type
  File "/DeepSpeech/training/deepspeech_training/util/helpers.py", line 157, in do_iterate
    yield from iterable()
  File "/DeepSpeech/training/deepspeech_training/util/feeding.py", line 118, in generate_values
    transcript = text_to_char_array(sample.transcript, Config.alphabet, context=sample.sample_id)
  File "/DeepSpeech/training/deepspeech_training/util/text.py", line 18, in text_to_char_array
    .format(transcript, context, list(ch for ch in transcript if not alphabet.CanEncodeSingle(ch))))
ValueError: Alphabet cannot encode transcript "बड़ी रात थी।" while processing sample "/DeepSpeech/deepspeech-data/cv-corpus-6.1-2020-12-11/hi/clips/common_voice_hi_23849359.wav", check that your alphabet contains all characters in the training corpus. Missing characters are: ['ब', 'ड', '़', 'ी', 'र', 'ा', 'त', 'थ', 'ी', '।'].
However, as far as I can tell, all of these characters are already in my alphabet.txt file.
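For reference, the alphabet can be checked against the transcripts mechanically. The following is a minimal sketch, not part of DeepSpeech: the helper names `load_alphabet`, `missing_characters`, and `describe` are hypothetical, and it assumes that the alphabet file is UTF-8 with one character per line, that lines starting with `#` are comments, that `\#` stands for a literal hash, and that the CSV files have a `transcript` column. It reports every transcript character that is absent from the alphabet file, together with its Unicode code point, which also helps spot look-alike characters or differences in Unicode normalization.

```python
import csv
import unicodedata


def load_alphabet(path):
    """Read one label per line from a UTF-8 alphabet file.

    Assumed format: '#' lines are comments, '\\#' is a literal hash,
    and a line holding a single space is the space character.
    """
    labels = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")  # keep a lone space intact
            if line == "\\#":
                labels.add("#")
            elif line == "" or line.startswith("#"):
                continue  # blank line or comment
            else:
                labels.add(line)
    return labels


def missing_characters(csv_path, alphabet, column="transcript"):
    """Count transcript characters that are not in the alphabet."""
    missing = {}
    with open(csv_path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            for ch in row[column]:
                if ch not in alphabet:
                    missing[ch] = missing.get(ch, 0) + 1
    return missing


def describe(ch):
    """Return the code point and Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
```

To use it, call `load_alphabet` on the exact path passed to `--alphabet_config_path` (here `/DeepSpeech/data/alphabet.txt`), then call `missing_characters` on each of train.csv, dev.csv, and test.csv, and print `describe(ch)` for every key in the result. If even basic letters such as 'ब' are reported, the file at that path probably does not contain the Hindi characters at all, or was not saved as UTF-8.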
Can anyone please help me figure out why I am getting these errors?