Creating an Indian accent model with ~115k files

Hey,

I am trying to train an Indian-accent speech model using around 80,000 audio files.

This is the error I am getting while trying to train. Can you help me figure out whether something is wrong with the parameters I am passing, or whether it is something else?

$ sh run_file.sh

'[' '!' -f DeepSpeech.py ']'

python -u DeepSpeech.py --train_files /Users/naveen/Downloads/all_datasets/DeepSpeech/TRAIN/train.csv --dev_files /Users/naveen/Downloads/all_datasets/DeepSpeech/DEV/dev.csv --test_files /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/test.csv --train_batch_size 80 --dev_batch_size 80 --test_batch_size 40 --n_hidden 375 --epoch 33 --validation_step 1 --early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 --dropout_rate 0.22 --learning_rate 0.00095 --report_count 100 --use_seq_length False --export_dir /Users/naveen/Downloads/all_datasets/DeepSpeech/results/model_export/ --checkpoint_dir /Users/naveen/Downloads/all_datasets/DeepSpeech/results/checkout/ --decoder_library_path /Users/naveen/Downloads/DeepSpeech/DeepSpeech/libctc_decoder_with_kenlm.so --alphabet_config_path /Users/naveen/Downloads/all_datasets/DeepSpeech/alphabet.txt --lm_binary_path /Users/naveen/Downloads/all_datasets/DeepSpeech/lm.binary --lm_trie_path /Users/naveen/Downloads/all_datasets/DeepSpeech/trie
/Users/naveen/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/audio.py", line 7, in <module>
    from deepspeech.utils import audioToInputVector
ModuleNotFoundError: No module named 'deepspeech'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 24, in <module>
    from util.audio import audiofile_to_input_vector
  File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/audio.py", line 10, in <module>
    from python_speech_features import mfcc
ModuleNotFoundError: No module named 'python_speech_features'
Deepak’s MacBook Pro:DeepSpeech naveen$ pip install deepspeech
Collecting deepspeech
Using cached https://files.pythonhosted.org/packages/14/c9/e969fbdaac6b2ce7a0fc4c24f0bc96ab4aaaac0e5c0be85f0dceb90c6fb9/deepspeech-0.1.1-cp36-cp36m-macosx_10_10_x86_64.whl
Requirement already satisfied: scipy in /Users/naveen/anaconda3/lib/python3.6/site-packages (from deepspeech)
Requirement already satisfied: numpy in /Users/naveen/anaconda3/lib/python3.6/site-packages (from deepspeech)
Installing collected packages: deepspeech
Successfully installed deepspeech-0.1.1
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Deepak’s MacBook Pro:DeepSpeech naveen$ sh run_file.sh

'[' '!' -f DeepSpeech.py ']'
python -u DeepSpeech.py --train_files /Users/naveen/Downloads/all_datasets/DeepSpeech/TRAIN/train.csv --dev_files /Users/naveen/Downloads/all_datasets/DeepSpeech/DEV/dev.csv --test_files /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/test.csv --train_batch_size 80 --dev_batch_size 80 --test_batch_size 40 --n_hidden 375 --epoch 33 --validation_step 1 --early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 --dropout_rate 0.22 --learning_rate 0.00095 --report_count 100 --use_seq_length False --export_dir /Users/naveen/Downloads/all_datasets/DeepSpeech/results/model_export/ --checkpoint_dir /Users/naveen/Downloads/all_datasets/DeepSpeech/results/checkout/ --decoder_library_path /Users/naveen/Downloads/DeepSpeech/DeepSpeech/libctc_decoder_with_kenlm.so --alphabet_config_path /Users/naveen/Downloads/all_datasets/DeepSpeech/alphabet.txt --lm_binary_path /Users/naveen/Downloads/all_datasets/DeepSpeech/lm.binary --lm_trie_path /Users/naveen/Downloads/all_datasets/DeepSpeech/trie
/Users/naveen/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "DeepSpeech.py", line 29, in <module>
    from xdg import BaseDirectory as xdg
ModuleNotFoundError: No module named 'xdg'
Deepak’s MacBook Pro:DeepSpeech naveen$ pip install pyxdg
Collecting pyxdg
Using cached https://files.pythonhosted.org/packages/39/03/12eb9062f43adb94e30f366743cb5c83fd15fef026500cd4de42c7c12280/pyxdg-0.26-py2.py3-none-any.whl
Installing collected packages: pyxdg
Successfully installed pyxdg-0.26
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Deepak’s MacBook Pro:DeepSpeech naveen$ sh run_file.sh
'[' '!' -f DeepSpeech.py ']'
python -u DeepSpeech.py --train_files /Users/naveen/Downloads/all_datasets/DeepSpeech/TRAIN/train.csv --dev_files /Users/naveen/Downloads/all_datasets/DeepSpeech/DEV/dev.csv --test_files /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/test.csv --train_batch_size 80 --dev_batch_size 80 --test_batch_size 40 --n_hidden 375 --epoch 33 --validation_step 1 --early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 --dropout_rate 0.22 --learning_rate 0.00095 --report_count 100 --use_seq_length False --export_dir /Users/naveen/Downloads/all_datasets/DeepSpeech/results/model_export/ --checkpoint_dir /Users/naveen/Downloads/all_datasets/DeepSpeech/results/checkout/ --decoder_library_path /Users/naveen/Downloads/DeepSpeech/DeepSpeech/libctc_decoder_with_kenlm.so --alphabet_config_path /Users/naveen/Downloads/all_datasets/DeepSpeech/alphabet.txt --lm_binary_path /Users/naveen/Downloads/all_datasets/DeepSpeech/lm.binary --lm_trie_path /Users/naveen/Downloads/all_datasets/DeepSpeech/trie
/Users/naveen/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
E Labels length is zero in batch 0
E [[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]
E
E Caused by op 'tower_0/CTCLoss', defined at:
E   File "DeepSpeech.py", line 1838, in <module>
E     tf.app.run()
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
E     _sys.exit(main(argv))
E   File "DeepSpeech.py", line 1795, in main
E     train()
E   File "DeepSpeech.py", line 1501, in train
E     results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
E   File "DeepSpeech.py", line 640, in get_tower_results
E     calculate_mean_edit_distance_and_loss(model_feeder, i, no_dropout if optimizer is None else dropout_rates)
E   File "DeepSpeech.py", line 527, in calculate_mean_edit_distance_and_loss
E     total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/ctc_ops.py", line 158, in ctc_loss
E     ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 285, in ctc_loss
E     name=name)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
E     op_def=op_def)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
E     op_def=op_def)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
E     self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
E
E InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
E [[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]
E
Traceback (most recent call last):
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 1595, in train
    step = session.run(global_step, feed_dict=feed_dict)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 567, in run
    run_metadata=run_metadata)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1043, in run
    run_metadata=run_metadata)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1134, in run
    raise six.reraise(*original_exc_info)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1119, in run
    return self._sess.run(*args, **kwargs)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1191, in run
    run_metadata=run_metadata)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 971, in run
    return self._sess.run(*args, **kwargs)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 0
[[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

Caused by op 'tower_0/CTCLoss', defined at:
  File "DeepSpeech.py", line 1838, in <module>
    tf.app.run()
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 1795, in main
    train()
  File "DeepSpeech.py", line 1501, in train
    results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
  File "DeepSpeech.py", line 640, in get_tower_results
    calculate_mean_edit_distance_and_loss(model_feeder, i, no_dropout if optimizer is None else dropout_rates)
  File "DeepSpeech.py", line 527, in calculate_mean_edit_distance_and_loss
    total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/ctc_ops.py", line 158, in ctc_loss
    ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 285, in ctc_loss
    name=name)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Labels length is zero in batch 0
[[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/logits, tower_0/ToInt64, tower_0/GatherV2, tower_0/GatherV2_DequeueMany:1)]]

E You must feed a value for placeholder tensor 'Queue_Selector' with dtype int32
E [[Node: Queue_Selector = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E
E Caused by op 'Queue_Selector', defined at:
E   File "DeepSpeech.py", line 1838, in <module>
E     tf.app.run()
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
E     _sys.exit(main(argv))
E   File "DeepSpeech.py", line 1795, in main
E     train()
E   File "DeepSpeech.py", line 1489, in train
E     tower_feeder_count=len(available_devices))
E   File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/feeding.py", line 43, in __init__
E     self.ph_queue_selector = tf.placeholder(tf.int32, name='Queue_Selector')
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1808, in placeholder
E     return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4848, in placeholder
E     "Placeholder", dtype=dtype, shape=shape, name=name)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
E     op_def=op_def)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
E     op_def=op_def)
E   File "/Users/naveen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
E     self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
E
E InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Queue_Selector' with dtype int32
E [[Node: Queue_Selector = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E
E The checkpoint in /Users/naveen/Downloads/all_datasets/DeepSpeech/results/checkout/ does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /Users/naveen/Downloads/all_datasets/DeepSpeech/results/checkout/.

Please make sure you use proper code formatting; the current output is not readable.

Thanks, that helps a lot.

Also, I am training on these 80,000 files with the parameters listed below, taken from this tutorial (TUTORIAL : How I trained a specific french model to control my robot):

--train_batch_size 80
--dev_batch_size 80
--test_batch_size 40
--n_hidden 375
--epoch 33
--validation_step 1
--early_stop True
--earlystop_nsteps 6
--estop_mean_thresh 0.1
--estop_std_thresh 0.1
--dropout_rate 0.22
--learning_rate 0.00095
--report_count 100
--use_seq_length False

My audio files are around 5 seconds long each. Could you please let me know which parameters I need to research and tweak?

So, until you fix your current paste to make it readable, I’m not able to help you: the content is painful to read, and some of the console output is being interpreted as markdown formatting.

Okay,
is this helpful? Or do I need to make more changes?

That seems to be your root cause: some of your data has no label. Please check your CSV and importer.
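One way to check the CSV for missing labels is sketched below. It assumes the usual DeepSpeech CSV layout with a header row of `wav_filename,wav_filesize,transcript`; the function name `find_unlabeled_rows` and the sample rows are made up for illustration.

```python
import csv
import io


def find_unlabeled_rows(csv_file, transcript_field="transcript"):
    """Return (line_number, wav_filename) for rows with an empty transcript."""
    reader = csv.DictReader(csv_file)
    bad = []
    for row in reader:
        # Treat a missing column, an empty string, and whitespace-only text alike.
        transcript = (row.get(transcript_field) or "").strip()
        if not transcript:
            bad.append((reader.line_num, row.get("wav_filename", "")))
    return bad


# Tiny in-memory example; for a real file use open(path, newline="") instead.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,33964,hello world\n"
    "b.wav,12000,\n"
    "c.wav,15000,   \n"
)
unlabeled = find_unlabeled_rows(sample)
print(unlabeled)  # [(3, 'b.wav'), (4, 'c.wav')]
```

Running the same function over the train, dev and test CSVs should list every row that would trigger "Labels length is zero" because of an empty transcript.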

Okay, I corrected the label problem in the data. There was a problem with my CSV file.

I am going to start the training run again. I have two main questions about that:

  1. I have CUDA support, but the model may still run for a week. How do I check its progress as a percentage if I am running the script in a terminal?

  2. Also, I am training on these ~115K files with the parameters listed below, taken from this tutorial (TUTORIAL : How I trained a specific french model to control my robot):

--train_batch_size 80
--dev_batch_size 80
--test_batch_size 40
--n_hidden 375
--epoch 33
--validation_step 1
--early_stop True
--earlystop_nsteps 6
--estop_mean_thresh 0.1
--estop_std_thresh 0.1
--dropout_rate 0.22
--learning_rate 0.00095
--report_count 100
--use_seq_length False

My audio files are around 5 seconds long each. Could you please let me know which parameters I need to research and tweak?

As far as I can see from your previous console dumps, you are running the training on macOS. TensorFlow dropped support for CUDA on this platform several releases ago, so any training done there is going to be CPU-only. You should really aim for a Linux GPU-powered system. Progress reporting is controlled by --display_step; check the documentation for that. The display step runs a WER computation, which consumes a lot of resources, so use it with care.

Please search the forum; there is already a lot of information about that. I’m not sure how much more I can help you, except to say that the parameters Vincent used for his robot are tailored to a specific, small dataset. You will likely have to increase n_hidden. Check the current GitHub issues related to benchmarks in the v0.2.0 project; we have run some tests to estimate better sizes. We also document other good parameter values in the v0.1.1 release notes. Please check those.

115k files of 5 seconds each on average gets you around 160 hours of audio. That is likely a decent starting point, but you should not expect too much from it.
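The 160-hour estimate is plain arithmetic on the two figures quoted in the thread (about 115,000 files at about 5 seconds each). A small sketch, with `total_hours` as an illustrative helper name:

```python
def total_hours(n_files, mean_seconds):
    """Total audio duration in hours for n_files clips of mean_seconds each."""
    return n_files * mean_seconds / 3600.0


hours = total_hours(115_000, 5.0)
print(f"{hours:.1f} hours")  # 159.7 hours
```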

Actually, I do have a CUDA-enabled, GPU-powered Linux system. Sorry, I forgot to mention that.

So, by tweaking --display_step, can I check how far training has progressed, as a percentage, when I train the model in a terminal?

Also, can you answer the second part of my previous question? Which of these parameters should change, given that I have a total of ~115k audio files with an average length of 5 seconds (80k in train, 23k in dev, 12k in test)?

I took these parameters directly from this tutorial (TUTORIAL : How I trained a specific french model to control my robot), but in that previous run I had only 5,000 files in total.

Now that I have such a large number of files, I am hesitant to use the same values for these parameters.

Apologies, I had not read your second answer before posting my previous comment.

Please ignore the part where I ask you to answer the parameter-setting question.

We don’t have a « percentage ». But you set a number of epochs, and the display step will tell you which epoch you are at and the current WER.
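A rough way to turn the epoch count into a progress estimate is sketched below. It uses the figures from this thread (80k training files, batch size 80, 33 epochs) and assumes one batch per step on a single device and that early stopping does not end the run sooner. The helper names are illustrative and not part of DeepSpeech.

```python
import math


def batches_per_epoch(n_train_files, batch_size):
    """Number of batches needed to see every training file once."""
    return math.ceil(n_train_files / batch_size)


def fraction_done(epochs_finished, total_epochs):
    """Share of the planned epochs already completed."""
    return epochs_finished / total_epochs


per_epoch = batches_per_epoch(80_000, 80)
total_batches = per_epoch * 33
print(per_epoch, total_batches)        # 1000 33000
print(f"{fraction_done(10, 33):.0%}")  # 30%
```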

Hi,
While trying to train the model, I am getting errors like these:

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/feeding.py", line 151, in _populate_batch_queue
    raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

ValueError: Error: Audio file /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/g0907_e_tam_f_output.wav is too short for transcription.

Should I remove the file altogether, along with its corresponding entry in the CSV files, or is there another solution?

Hard to tell without more context :frowning:

Okay. As discussed earlier in this thread, I am trying to create a model with 160 hours of Indian-accent audio, but while running the training code I get this error for many files:

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/feeding.py", line 151, in _populate_batch_queue
    raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

ValueError: Error: Audio file /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/g0907_e_tam_f_output.wav is too short for transcription.

No, the context I meant is: what is this file? If the file is too short, what’s wrong with just removing it and its transcription?

There are multiple files; initially there were 3. I removed each of those files and its transcription.

In fact, one of the files had a good-length transcription too. Example:

“eng_text_90-2_e_man_m_output.wav, 33964, tenaliraman approached thimmana and appeased him with his expertise in spontaneous poetry”

But the problem is that, after rerunning the code, I get the same error for more files.

If a single run reported all the files that trigger this error, I could remove them all at once. Instead, I get the error for a few files and then there is no further output. When I rerun, I get the same error for new files.

So I am just removing the files and their corresponding transcriptions and rerunning the code.

What would help here is documenting both the transcription and the audio length for each failing file. That way you might be able to search more broadly.

What is the minimum audio length I should feed in when training the model?

Have a look at the source code that generates the error; you’ll get the answer. The stack trace tells you it is at util/feeding.py:151.
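To list every suspicious file in one pass instead of one failed run at a time, a scan like the following may help. It is only a sketch: it assumes the check at util/feeding.py:151 compares the number of feature frames with the number of transcript characters, and that each frame covers about 20 ms of audio (the `frame_step_s` default is a guess that should be confirmed against your checkout). `find_too_short` and `wav_duration_seconds` are hypothetical helper names.

```python
import csv
import os
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def find_too_short(csv_path, audio_dir=None, frame_step_s=0.02):
    """Return (path, seconds, approx_frames, transcript_chars) for suspect rows.

    frame_step_s is an ASSUMED time per feature frame, not a value taken
    from the DeepSpeech source.
    """
    suspects = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        # skipinitialspace tolerates rows written as "file.wav, 33964, text".
        for row in csv.DictReader(f, skipinitialspace=True):
            path = row["wav_filename"]
            if audio_dir is not None and not os.path.isabs(path):
                path = os.path.join(audio_dir, path)
            transcript = row["transcript"].strip()
            duration = wav_duration_seconds(path)
            approx_frames = int(duration / frame_step_s)
            # Flag clips with fewer frames than characters to be predicted.
            if approx_frames < len(transcript):
                suspects.append(
                    (path, round(duration, 2), approx_frames, len(transcript))
                )
    return suspects


# Example call (paths are placeholders):
# for item in find_too_short("test.csv"):
#     print(item)
```

The output records both the audio length and the transcript length for each flagged file, which is the information asked for above, and the flagged rows can then be removed from the CSV in a single edit.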