Noise injection training experiment

Thank you for the quick response!!

Okay, I will try that: I will download the dataset to my computer instead of Google Colab, then upload only the new, reduced dataset.

A question about this: on Google Colab I would download it with !wget. I don’t know if it’s possible, but if I could cap the download and fetch only, say, 15 GB of the en.tar.gz file, would I be able to use it for training? Or would it definitely be missing something important?

We really can’t help with third-party tooling: we don’t use it, and we have no experience with it.

From what I understand, you would just get a file that cannot be extracted.
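A small illustration of why a cut-off download is a problem, using Python's standard gzip module. The payload is made-up stand-in data, and this only shows that decompressing a truncated gzip stream in full fails; it says nothing about the real en.tar.gz archive layout.

```python
import gzip
import zlib

# Stand-in payload for the real archive (hypothetical data).
payload = b"speech sample bytes " * 1000
full = gzip.compress(payload)

# Simulate a download that was cut off halfway through the file.
truncated = full[: len(full) // 2]


def try_decompress(blob):
    """Return 'ok', or the name of the exception raised while decompressing."""
    try:
        gzip.decompress(blob)
        return "ok"
    except (EOFError, OSError, zlib.error) as exc:
        return type(exc).__name__


result_full = try_decompress(full)
result_truncated = try_decompress(truncated)
print(result_full, result_truncated)
```

The complete stream decompresses, while the truncated one raises an error because the end-of-stream marker never arrives.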

I see, thank you! I will try splitting the dataset then and see how it goes.

Oh, also: if I were to do the experiment with a different language, would that lead to more inconsistent data because of how the language itself is structured? If not, could I just run the experiment with any language, since I am only interested in how noisy-speech recognition correlates with the amount of noise added to the training data?

Why don’t you use the LJ Speech dataset instead? It is clean to start with, and you can add noise as you please.
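As a rough sketch of what "adding noise as you please" to clean audio could look like, the snippet below mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio. The function name, the synthetic 440 Hz tone and the 10 dB target are illustrative assumptions, not anything taken from DeepSpeech or from this thread.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Return `signal` plus Gaussian noise scaled to the requested SNR in dB."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


rng = np.random.default_rng(0)
# One second of a 440 Hz tone at 16 kHz stands in for a clean utterance.
t = np.arange(16000) / 16000.0
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)

noisy = add_white_noise(clean, snr_db=10.0, rng=rng)
# Measure the SNR that was actually achieved.
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured_snr, 1))
```

Generating one copy of the training set per SNR value would give a set of models whose only difference is the noise level.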

Oh nice, thank you. Can I simply use the !bin/ on this, or do I have to process the data myself?

Hm, check the other import scripts as it is used for testing. @lissyx do you know which import script might be suitable for LJ Speech?

Sorry, but I don’t think we have an importer. @arpi.aszalos, if you want to write one, don’t hesitate: it’s not super complicated, and you could send a PR for it.
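For anyone wondering what such an importer roughly involves, here is a minimal sketch. It assumes the LJ Speech layout (a pipe-delimited metadata.csv plus a wavs/ folder) and the wav_filename, wav_filesize, transcript CSV columns used by DeepSpeech training. The function names and the a-z/apostrophe/space alphabet are assumptions made for illustration. It is not an official importer, and it leaves out resampling: LJ Speech is distributed at 22.05 kHz, while DeepSpeech models are normally trained on 16 kHz audio.

```python
import csv
import os
import re
import tempfile


def normalize(text):
    """Lowercase and keep only a-z, apostrophes and spaces (an assumed alphabet)."""
    text = re.sub(r"[^a-z' ]", " ", text.lower())
    return " ".join(text.split())


def ljspeech_to_csv(dataset_dir, out_csv):
    """Convert LJ Speech's pipe-delimited metadata.csv into a DeepSpeech-style CSV."""
    rows = []
    with open(os.path.join(dataset_dir, "metadata.csv"), encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            if len(parts) < 2:
                continue  # skip malformed lines
            # Prefer the normalized transcript (third field) when present.
            text = parts[2] if len(parts) > 2 and parts[2] else parts[1]
            wav_path = os.path.join(dataset_dir, "wavs", parts[0] + ".wav")
            if not os.path.isfile(wav_path):
                continue  # skip entries whose audio file is missing
            rows.append((os.path.abspath(wav_path), os.path.getsize(wav_path), normalize(text)))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        writer.writerows(rows)
    return len(rows)


# Tiny stand-in dataset so the sketch runs without downloading anything.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "wavs"))
    with open(os.path.join(tmp, "wavs", "LJ001-0001.wav"), "wb") as f:
        f.write(b"\0" * 44)  # placeholder bytes, not real audio
    with open(os.path.join(tmp, "metadata.csv"), "w", encoding="utf-8") as f:
        f.write("LJ001-0001|Printing, in 1473|Printing, in fourteen seventy-three\n")
    out_path = os.path.join(tmp, "ljspeech.csv")
    count = ljspeech_to_csv(tmp, out_path)
    with open(out_path, newline="", encoding="utf-8") as f:
        written = list(csv.reader(f))
```

A real importer would also split the result into train/dev/test files and drop clips whose transcript is empty after normalization.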

Thanks for answering! I did have a look at the import scripts, but I don’t think I can write an importer: I am still quite new to programming, and most of the code doesn’t make much sense to me.

You might be interested in this pull request: It let you mix noise and speech online into your test set, but it is somewhat older (around version 0.7) and no longer maintained.

I have already run some noise tests; you can find my setup steps here: and the results in the tables below.

Yes, thank you @dan.bmh, I started reading. However, I wouldn’t use the background noise augmentation. Maybe I’m misunderstanding, but does the Add augmentation add random noise to the training data in the form of numbers added to the spectrograms?

I will check the pull request and your setup. Thanks for reaching out.

Do you think I could get data showing some correlation if I train for only 3-4 hours (using a GPU) on the English VoxForge dataset?

Both the current DeepSpeech master and the pull request have flags to augment with noise audio (using the standard CSV format); in master it’s called overlay augmentation.

I ran my tests with the German VoxForge dataset (~32 h), and I saw some improvement on noisy tests when I also trained with noise (WER 0.43 -> 0.37).
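For reference, word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a generic single-sentence implementation, not DeepSpeech's own evaluation code, and the example sentences are made up. It also works out the relative size of the 0.43 -> 0.37 improvement quoted above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Relative improvement of the WER values quoted in the post (0.43 -> 0.37).
relative_gain = (0.43 - 0.37) / 0.43
print(round(wer, 3), round(relative_gain, 3))
```

In relative terms, going from 0.43 to 0.37 is a reduction of roughly 14%.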

@dan.bmh did you also run an inference test with a different noisy German dataset, or is this just from the test.csv file after training finished? Sorry for asking so many questions, but do you by chance know how much data is in the English VoxForge dataset? I can’t seem to find it anywhere. Thanks.

I used the VoxForge test set, and I also split my voice dataset into train/dev/test.
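A train/dev/test split like the one mentioned here can be sketched as follows. The 10% fractions, the seed and the function name are illustrative assumptions, not the settings used in the experiments above.

```python
import random


def split_rows(rows, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows reproducibly and cut them into train/dev/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]
    return train, dev, test


# Integers stand in for CSV rows here.
train, dev, test = split_rows(range(100))
print(len(train), len(dev), len(test))
```

Fixing the seed keeps the split identical across runs, which matters when several models are meant to be compared on the same test set.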

No idea. If you use the preparation steps from my repository, it will print the length (during dataset cleaning).

Hey @lissyx @othiele, I have decided to use the VoxForge dataset through the importer in the bin folder.

I have 12 GB of RAM and a Tesla K80 GPU with 12 GB of memory. Would the process be faster with the same GPU and 24 GB of RAM?

As I have limited resources but would still like to get some meaningful data without training for days, I was wondering if you could give me some advice on which parameters to use for training.

My main aim is not an extremely low WER, only to show some correlation between the test results of the different models. I am looking to train them for around 6 hours at most, as I have limited time.

What value should I use with


I’m not sure how the dropout rate works; should I change that as well?

Also, about the Add augmentation from the documentation:
--augment add[p=,stddev=,domain=]

If I specify domain='spectrogram', then, if I understand correctly, random values will be added to the numeric representation of the audio?

Do you by chance have detailed documentation on how it works? If not, I will try looking in the code. Would I find the details in the script?
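To make the question concrete, here is one plausible reading of an additive augmentation in the spectrogram domain: with probability p, Gaussian noise with standard deviation stddev is added to every value of the spectrogram. This is only a sketch of that interpretation, with made-up names and shapes. It is not DeepSpeech's actual implementation, which is best checked in the code.

```python
import numpy as np


def add_augmentation(spectrogram, p, stddev, rng):
    """With probability p, add zero-mean Gaussian noise to each spectrogram value."""
    spectrogram = np.asarray(spectrogram, dtype=np.float32)
    if rng.random() >= p:
        return spectrogram  # this sample is left unaugmented
    noise = rng.normal(0.0, stddev, size=spectrogram.shape).astype(np.float32)
    return spectrogram + noise


rng = np.random.default_rng(1)
# 100 time frames x 257 frequency bins of zeros as a stand-in spectrogram.
spec = np.zeros((100, 257), dtype=np.float32)

augmented = add_augmentation(spec, p=1.0, stddev=0.5, rng=rng)
untouched = add_augmentation(spec, p=0.0, stddev=0.5, rng=rng)
print(float(augmented.std()))
```

Under this reading, p controls how many training samples get noise at all, and stddev controls how strong that noise is.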

Thanks in advance!

Probably not. 12 GB should be fine for a K80; you should have more than 5 CPUs if possible.

6 hours is not much; you should aim for 10-15 epochs per model for somewhat OK results.

The higher the batch size the better, e.g. 8, 16, …

Default n_hidden

Use the same dropout (e.g. 0.3 or 0.4) and learning rate (default e-03) for all models, as changing these alters results dramatically.
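One way to follow this advice is to generate every training command from a single set of shared flags, so that only the augmentation differs between models. In the sketch below, the script name TRAINING_SCRIPT.py is a placeholder, and the flag values and noise levels are assumed for illustration rather than recommended settings.

```python
# Flags kept identical for every model (values are illustrative assumptions).
SHARED_FLAGS = {
    "epochs": 15,
    "train_batch_size": 64,
    "dropout_rate": 0.3,
    "learning_rate": 0.001,
}

# None means "no augmentation"; the other noise levels are hypothetical.
NOISE_STDDEVS = [None, 0.25, 0.5]


def build_command(stddev, script="TRAINING_SCRIPT.py"):
    """Build an argument list that differs between models only in --augment."""
    args = ["python3", script]
    for name, value in SHARED_FLAGS.items():
        args += ["--" + name, str(value)]
    if stddev is not None:
        args += ["--augment", "add[p=0.5,stddev={},domain=spectrogram]".format(stddev)]
    return args


commands = [build_command(stddev) for stddev in NOISE_STDDEVS]
for command in commands:
    print(" ".join(command))
```

Keeping the shared flags in one place makes it harder to change dropout or learning rate for one model by accident.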

I don’t know about augmentation. I wouldn’t count on the documentation; read the code.

@othiele Thank you, the training worked. I ended up with a 94% WER, though, but it finished optimising in only 3 hours, so I can still increase the epochs from 10 to 20 in the next run. I will also try to increase the batch size from 64 to 76 or 88, as it was using only 4 GB of memory. This time I didn’t use augmentation yet; I just wanted to see whether it finished properly. I will report back once I am done with the models or if I run into problems.

Also, could I train on another language, or would that introduce some interference into the data I would get from the 3 models because of the language’s complexity?

Would the smaller dataset of another language help get better results in less training time?

Also, is there a chance that my model is underfitting, or are the validation and training losses fine like this? These are the figures after 5 hours of training.

Thanks again.

I0903 17:28:48.316457 140227250132864] NumExpr defaulting to 2 threads.
I Loading best validating checkpoint from /content/checks/best_dev-13240
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:18:26 | Steps: 1324 | Loss: 81.477223
Epoch 0 | Validation | Elapsed Time: 0:00:05 | Steps: 10 | Loss: 81.157199 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 81.157199 to: /content/checks/best_dev-14564

Epoch 1 | Training | Elapsed Time: 0:17:59 | Steps: 1324 | Loss: 80.177082
Epoch 1 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 80.640202 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 80.640202 to: /content/checks/best_dev-15888

Epoch 2 | Training | Elapsed Time: 0:17:58 | Steps: 1324 | Loss: 79.045931
Epoch 2 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 79.040100 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 79.040100 to: /content/checks/best_dev-17212

Epoch 3 | Training | Elapsed Time: 0:17:53 | Steps: 1324 | Loss: 77.994055
Epoch 3 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 79.000660 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 79.000660 to: /content/checks/best_dev-18536

Epoch 4 | Training | Elapsed Time: 0:17:44 | Steps: 1324 | Loss: 77.084805
Epoch 4 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 78.440584 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 78.440584 to: /content/checks/best_dev-19860

Epoch 5 | Training | Elapsed Time: 0:17:45 | Steps: 1324 | Loss: 76.281919
Epoch 5 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 78.114109 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 78.114109 to: /content/checks/best_dev-21184

Epoch 6 | Training | Elapsed Time: 0:17:39 | Steps: 1324 | Loss: 75.526123
Epoch 6 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 77.518653 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 77.518653 to: /content/checks/best_dev-22508

Epoch 7 | Training | Elapsed Time: 0:17:49 | Steps: 1324 | Loss: 74.905342
Epoch 7 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 77.188713 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 77.188713 to: /content/checks/best_dev-23832

Epoch 8 | Training | Elapsed Time: 0:17:54 | Steps: 1324 | Loss: 74.230232
Epoch 8 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 76.931261 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 76.931261 to: /content/checks/best_dev-25156
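One way to look at the underfitting question is to pull the training and validation losses out of the log and compare them per epoch. The sketch below parses lines in the format shown above (only epochs 0 and 8 are copied in, to keep it short). The regular expression and helper name are my own and not part of DeepSpeech.

```python
import re

# Two epochs copied from the log above.
LOG = """\
Epoch 0 | Training | Elapsed Time: 0:18:26 | Steps: 1324 | Loss: 81.477223
Epoch 0 | Validation | Elapsed Time: 0:00:05 | Steps: 10 | Loss: 81.157199 | Dataset: /content/voxforge/voxforge-dev.csv
Epoch 8 | Training | Elapsed Time: 0:17:54 | Steps: 1324 | Loss: 74.230232
Epoch 8 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 76.931261 | Dataset: /content/voxforge/voxforge-dev.csv
"""

PATTERN = re.compile(r"Epoch (\d+) \|\s+(Training|Validation) \|.*?Loss: ([\d.]+)")


def parse_losses(text):
    """Map each epoch number to its {'Training': loss, 'Validation': loss}."""
    losses = {}
    for epoch, phase, loss in PATTERN.findall(text):
        losses.setdefault(int(epoch), {})[phase] = float(loss)
    return losses


losses = parse_losses(LOG)
# Validation minus training loss, per epoch.
gaps = {
    epoch: round(values["Validation"] - values["Training"], 6)
    for epoch, values in sorted(losses.items())
}
print(gaps)
```

In this log both losses are still falling at epoch 8, and the validation loss has moved from slightly below the training loss to about 2.7 above it.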

Hey @lissyx @othiele, I tried training with the same parameters but with augmentation, and I got the following error:

!python3 --train_files /content/voxforge/voxforge-train.csv --test_files /content/voxforge/voxforge-test.csv --dev_files /content/voxforge/voxforge-dev.csv --epochs 15 --dev_batch_size 64 --train_batch_size 64 --test_batch_size 64 --log_dir /content/loggs --export_dir /content/models/ --train_cudnn True --checkpoint_dir /content/checks/ --alphabet_config_path /content/voxforge/alphabet.txt --export_model_name 'sept4,0.5,1,spec,noise1' --summary_dir /content/tensorsumm/ --augment add[p=0.5,stddev=0.5,domain='spectrogram']

 I0904 09:25:33.918955 139703338915712] NumExpr defaulting to 2 
 I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:33:28 | Steps: 1313 | Loss: 148.881187
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/", line 1443, in _call_tf_sessionrun
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
  (1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 12, in <module>
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 961, in run_script
  File "/usr/local/lib/python3.6/dist-packages/absl/", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/", line 250, in _run_main
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 933, in main
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 601, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 566, in run_set
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/", line 956, in run
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/", line 1359, in _do_run
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ ]]
  (1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3':
  File "", line 12, in <module>
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 961, in run_script
  File "/usr/local/lib/python3.6/dist-packages/absl/", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/", line 250, in _run_main
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 933, in main
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 479, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 321, in get_tower_results
    gradients = optimizer.compute_gradients(avg_loss)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/", line 512, in compute_gradients
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/", line 158, in gradients
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/", line 679, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/", line 350, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/", line 679, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/", line 104, in _cudnn_rnn_backwardv3
    direction=op.get_attr("direction")) + (None,)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/", line 749, in cudnn_rnn_backprop_v3
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/", line 794, in _apply_op_helper
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/", line 3426, in _create_op_internal
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'tower_0/cudnn_lstm/CudnnRNNV3', defined at:
  File "", line 12, in <module>
[elided 4 identical lines from previous traceback]
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 479, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 312, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 239, in calculate_mean_edit_distance_and_loss
    logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 190, in create_model
    output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/", line 128, in rnn_impl_cudnn_rnn
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/", line 548, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/", line 854, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/", line 234, in wrapper
    return converted_call(f, options, args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/", line 439, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/", line 440, in call
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/", line 518, in _forward
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cudnn_rnn/python/ops/", line 1132, in _cudnn_rnn
    outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/", line 2051, in cudnn_rnnv3
    time_major=time_major, name=name)

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/", line 262, in _run_finalizers
Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/", line 258, in _bootstrap
  File "/usr/lib/python3.6/multiprocessing/", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.6/multiprocessing/", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/", line 571, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/lib/python3.6/multiprocessing/", line 556, in _help_stuff_finish
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/", line 258, in _bootstrap
  File "/usr/lib/python3.6/multiprocessing/", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/", line 335, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/", line 379, in _recv
    chunk = read(handle, remaining)

As you can see from my post above, I ran a training successfully without --augment and with --epochs 10. I don’t know why I got this error. What can I change?

Thanks in advance!