Hi everyone!
I am currently using DeepSpeech code v0.8.0, running training inside a Docker container created from the template provided by the Makefile available in the DeepSpeech repo.
I am now facing this error when trying to train on the Common Voice Spanish data corpus:
root@32b0785706d5:/DeepSpeech# ./bin/run-ES-ds.sh
+ [ ! -f DeepSpeech.py ]
+ export CUDA_VISIBLE_DEVICES=0
+ python -u DeepSpeech.py --train_files /data/cv_es/train.csv --test_files /data/cv_es/test.csv --dev_files /data/cv_es/dev.csv --train_batch_size 100 --dev_batch_size 100 --test_batch_size 1 --n_hidden 100 --epochs 1 --checkpoint_dir /checkpoints
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Not enough time for target transition sequence (required: 20, available: 19). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[{{node tower_0/CTCLoss}}]]
These are my training parameters:
python -u DeepSpeech.py \
--train_files /data/cv_es/train.csv \
--test_files /data/cv_es/test.csv \
--dev_files /data/cv_es/dev.csv \
--train_batch_size 100 \
--dev_batch_size 100 \
--test_batch_size 1 \
--n_hidden 100 \
--epochs 1 \
--checkpoint_dir /checkpoints \
"$@"
These are just test values; I will adjust them later for proper training.
From what I have been reading in other people's posts, audios that are too short can yield this error.
I have read that setting the flag ignore_longer_outputs_than_inputs to True on the CTC loss function can suppress it, but also that this is only a workaround and that the data should really be cleaned more thoroughly.
What I don't know is what kind of cleaning I should perform. Maybe removing audios shorter than some duration threshold? After setting that flag, I listened to some of the audios that trigger the error and they sound normal to me. They are roughly 2 seconds long or less, and I am afraid this could be the problem.
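In case it helps anyone reading this, my understanding of the error is that CTC needs at least one feature frame per target character, plus an extra frame (for the mandatory blank) between every pair of repeated characters, so a clip fails when its transcript is longer than the number of frames the audio produces. Below is a minimal filtering sketch I put together under those assumptions; it assumes the standard DeepSpeech CSV columns (wav_filename, wav_filesize, transcript) and the default 20 ms feature window step (--feature_win_step), so the exact frame count is only an approximation of what the model computes internally:

```python
import csv
import wave

# DeepSpeech default --feature_win_step, in milliseconds (assumption).
FEATURE_WIN_STEP_MS = 20

def min_frames_needed(transcript):
    """Minimum CTC time steps: one per label, plus one extra frame
    (for the blank) between each pair of repeated characters."""
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return len(transcript) + repeats

def audio_frames(wav_path):
    """Approximate number of feature frames the model will see."""
    with wave.open(wav_path) as w:
        duration_ms = 1000.0 * w.getnframes() / w.getframerate()
    return int(duration_ms // FEATURE_WIN_STEP_MS)

def filter_csv(in_path, out_path):
    """Copy in_path to out_path, dropping rows whose transcript is
    too long for the clip's duration."""
    with open(in_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if audio_frames(row['wav_filename']) >= min_frames_needed(row['transcript']):
                writer.writerow(row)
```

If this reasoning is right, a 2-second clip gives roughly 100 frames, which should comfortably fit a short transcript, so clips failing with "available: 19" would be well under half a second of actual audio; maybe worth checking whether some files are truncated rather than just short.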
Could anyone suggest a solution or give some hint on the problem?
Thanks in advance