Noise injection training experiment

You might be interested in this pull request: https://github.com/mozilla/DeepSpeech/pull/2622. It let you mix noise/speech into your test set on the fly, but it's somewhat older (around version 0.7) and is no longer maintained.

I already ran some noise tests earlier; you can find my setup steps here: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot#download-and-prepare-noise-data and the results in the tables below.


Yes, thank you @dan.bmh, I've started reading. However, I wouldn't use the background noise augmentation. Maybe I'm misunderstanding, but does the Add augmentation add random noise to the training data in the form of numbers added to the spectrograms?

I will check the pull request and your setup, thanks for reaching out.

Do you think I could get data showing some correlation if I train for only 3-4 hours (using a GPU)?
This would be for the English VoxForge dataset.

Both the current DeepSpeech master and the pull request have flags to augment with noise audio (using the standard CSV format); it's called overlay augmentation in master.
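Conceptually, overlaying noise audio onto speech looks something like this (just an illustrative numpy sketch of SNR-based mixing, not the actual DeepSpeech implementation):

import numpy as np

def mix_at_snr(speech, noise, snr_db):
    # Scale the noise so that the speech-to-noise power ratio matches the
    # requested SNR (in dB), then add it to the speech signal.
    # Both inputs are float arrays of the same length.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-10
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise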

I ran my tests with the German VoxForge dataset (~32h), and I could see some improvement on noisy test sets if I also trained with noise (0.43 -> 0.37 WER).

@dan.bmh did you also run an inference test with a different noisy German dataset, or is this just from the test.csv file after training is done? Sorry for asking so many questions, but do you know by chance how much data is in the English VoxForge dataset? I can't seem to find it anywhere. Thanks.

I used the VoxForge test set, and I also split my voice dataset into train/dev/test.

No idea. If you use the preparation steps from my repository, it will print the length during dataset cleaning.

Hey @lissyx @othiele, I have decided to use the VoxForge dataset through the importer in the bin folder.

I have 12 GB of RAM and a Tesla K80 GPU with 12 GB; would the process be faster with the same GPU and 24 GB of RAM?

As I have limited resources but would still like to get some meaningful data without training for days, could you give me some advice on what parameters to use for training?

My main aim is not an extremely low WER, only that some correlation shows up between the different models' test results. I am looking to train each for around 6 hours max, as I have limited time.

What values should I use for the following flags?

--epochs
--train_batch_size
--dev_batch_size
--test_batch_size
--n_hidden
--learning_rate

I'm not sure how the dropout rate works; should I change that as well?

Also, about the Add augmentation from the documentation: https://deepspeech.readthedocs.io/en/v0.8.2/TRAINING.html#augmentation
--augment add[p=<float>,stddev=<float>,domain=<domain>]

If I specify domain=‘spectrogram’, then, if I understand correctly, random values will be added to the numeric representation of the audio?

Do you by chance have detailed documentation on how it works? If not, I will try looking in the code. Would I find the details in the DeepSpeech.py script?
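My rough mental picture, which is purely my own assumption and not taken from the DeepSpeech source, is something like this:

import numpy as np

def add_gaussian_noise(spectrogram, stddev, p=0.5, rng=None):
    # With probability p, add zero-mean Gaussian noise with the given
    # standard deviation to every value of the spectrogram
    # (a 2-D array of shape [time_steps, frequency_bins]).
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return spectrogram
    return spectrogram + rng.normal(0.0, stddev, size=spectrogram.shape)

Is that roughly what add[domain=‘spectrogram’] does?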

Thanks in advance!

Probably not; 12 GB should be fine with a K80. You should have more than 5 CPUs if possible.

6 hours is not much; you should aim for 10-15 epochs per model for somewhat OK results.

The higher the batch size the better, e.g. 8, 16, …

Use the default n_hidden.

Use the same dropout (e.g. 0.3 or 0.4) and learning rate (default 1e-3) for all models, as changing these alters results dramatically.
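Back of the envelope, with placeholder numbers (read the per-epoch time off your own log after the first epoch):

budget_hours = 6
minutes_per_epoch = 30          # placeholder, take it from your own training log
epochs_in_budget = int(budget_hours * 60 // minutes_per_epoch)
print(epochs_in_budget)         # 12, i.e. roughly the 10-15 range above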

I don't know about augmentation; I wouldn't count on the documentation, read the code.

@othiele Thank you, the training worked. I ended up with a 94% WER though, but it finished optimising in only 3 hours, so I can still increase the epochs from 10 to 20 in the next run. I will also try to increase the batch size from 64 to 76 or 88, as it was using only 4 GB of memory. I didn't do augmentation this time, I just wanted to see whether it finished properly. I will report back once I am done with the models or if I run into problems.

Also, could I train on another language, or would that introduce some interference into the data I would be getting from the 3 models because of language complexity?

Would the smaller dataset of another language help get better results with less training time?

Also, is there a chance that my model is underfitting, or are the validation and training losses fine like this? These are the figures after 5 hours of training.

Thanks again.

I0903 17:28:48.316457 140227250132864 utils.py:141] NumExpr defaulting to 2 threads.
I Loading best validating checkpoint from /content/checks/best_dev-13240
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:18:26 | Steps: 1324 | Loss: 81.477223
Epoch 0 | Validation | Elapsed Time: 0:00:05 | Steps: 10 | Loss: 81.157199 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 81.157199 to: /content/checks/best_dev-14564

Epoch 1 | Training | Elapsed Time: 0:17:59 | Steps: 1324 | Loss: 80.177082
Epoch 1 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 80.640202 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 80.640202 to: /content/checks/best_dev-15888

Epoch 2 | Training | Elapsed Time: 0:17:58 | Steps: 1324 | Loss: 79.045931
Epoch 2 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 79.040100 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 79.040100 to: /content/checks/best_dev-17212

Epoch 3 | Training | Elapsed Time: 0:17:53 | Steps: 1324 | Loss: 77.994055
Epoch 3 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 79.000660 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 79.000660 to: /content/checks/best_dev-18536

Epoch 4 | Training | Elapsed Time: 0:17:44 | Steps: 1324 | Loss: 77.084805
Epoch 4 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 78.440584 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 78.440584 to: /content/checks/best_dev-19860

Epoch 5 | Training | Elapsed Time: 0:17:45 | Steps: 1324 | Loss: 76.281919
Epoch 5 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 78.114109 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 78.114109 to: /content/checks/best_dev-21184

Epoch 6 | Training | Elapsed Time: 0:17:39 | Steps: 1324 | Loss: 75.526123
Epoch 6 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 77.518653 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 77.518653 to: /content/checks/best_dev-22508

Epoch 7 | Training | Elapsed Time: 0:17:49 | Steps: 1324 | Loss: 74.905342
Epoch 7 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 77.188713 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 77.188713 to: /content/checks/best_dev-23832

Epoch 8 | Training | Elapsed Time: 0:17:54 | Steps: 1324 | Loss: 74.230232
Epoch 8 | Validation | Elapsed Time: 0:00:04 | Steps: 10 | Loss: 76.931261 | Dataset: /content/voxforge/voxforge-dev.csv
I Saved new best validating model with loss 76.931261 to: /content/checks/best_dev-25156

Hey @lissyx @othiele, I tried training with the same training parameters but with augmentation, and I got the following error:

!python3 DeepSpeech.py --train_files /content/voxforge/voxforge-train.csv --test_files /content/voxforge/voxforge-test.csv --dev_files /content/voxforge/voxforge-dev.csv --epochs 15 --dev_batch_size 64 --train_batch_size 64 --test_batch_size 64 --log_dir /content/loggs --export_dir /content/models/ --train_cudnn True --checkpoint_dir /content/checks/ --alphabet_config_path /content/voxforge/alphabet.txt --export_model_name 'sept4,0.5,1,spec,noise1' --summary_dir /content/tensorsumm/ --augment add[p=0.5,stddev=0.5,domain='spectrogram']

I0904 09:25:33.918955 139703338915712 utils.py:141] NumExpr defaulting to 2 threads.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:33:28 | Steps: 1313 | Loss: 148.881187
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
	 [[tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3/_69]]
  (1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 933, in main
    train()
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 601, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 566, in run_set
    feed_dict=feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3/_69]]
  (1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 518, 64, 2048] 
	 [[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3':
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 933, in main
    train()
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 479, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 321, in get_tower_results
    gradients = optimizer.compute_gradients(avg_loss)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/optimizer.py", line 512, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py", line 350, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/cudnn_rnn_grad.py", line 104, in _cudnn_rnn_backwardv3
    direction=op.get_attr("direction")) + (None,)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 749, in cudnn_rnn_backprop_v3
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'tower_0/cudnn_lstm/CudnnRNNV3', defined at:
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
[elided 4 identical lines from previous traceback]
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 479, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 312, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 239, in calculate_mean_edit_distance_and_loss
    logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 190, in create_model
    output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_training/train.py", line 128, in rnn_impl_cudnn_rnn
    sequence_lengths=seq_length)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
    return converted_call(f, options, args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 440, in call
    training)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 518, in _forward
    seed=self._seed)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1132, in _cudnn_rnn
    outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 2051, in cudnn_rnnv3
    time_major=time_major, name=name)

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
  File "/usr/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 571, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 556, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

As you can see from the post above, I previously ran a training successfully with no --augment and --epochs 10. I don't know why I got this error; what can I change?

Thanks in advance!

Please search the GitHub issues; this is an upstream TensorFlow issue.

Hey @othiele @lissyx (or anyone reading). It turned out to be a GPU RAM issue; when I decreased the batch size, the training went through. I also need to be careful with the augmentation probability: given the size of the VoxForge dataset, I cannot handle half of the audio being augmented, so I decided to decrease the probability to around 0.03, 0.05, or 0.1.

My two trainings didn't work at all: at test time, one of them only returned the two letters ‘e a’, and the other only spaces. I assume it's because I made the ‘stddev’ of the added noise too big for the model to learn anything.

On this note, I am having a hard time figuring out a decent value for ‘stddev’ in the --augment add[] argument. Maybe you could help me understand how the standard deviation of the normal distribution works here, i.e. roughly what values would correspond to a little noise and to a lot of noise?
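To get a feeling for the scale myself, I was planning to compare the noise stddev against the spread of the actual feature values, roughly like this (my own sketch; loading the features is just a placeholder, and the ‘mild’/‘heavy’ thresholds are only my guess):

import numpy as np

def relative_noise_level(features, stddev):
    # Ratio between the injected noise's standard deviation and the spread
    # of the real feature values; as a rough intuition only, ~0.05 would be
    # mild noise and ~0.5 quite heavy.
    return stddev / (np.std(features) + 1e-10)

# features = ...  # one sample's spectrogram / feature matrix goes here
# for s in (0.05, 0.1, 0.5):
#     print(s, relative_noise_level(features, s))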

Furthermore, I have now realised that training from scratch takes a lot of resources and is quite painful to experiment with time-wise, and time is something I don't have given my deadline for the experiment. Would it be wise to use the pre-trained model for further training? Would that give me visible results faster, hypothetically speaking?

Thanks in advance, you are helping me immensely!

1. You should get a regular training without augmentation going first, so you have a baseline.

2. If you just get one-letter output, this could mean too little data or too few epochs.

3. Transfer might be a good idea, as we said before; you'll need to experiment a bit, and this might take some time.

@othiele Hey, I trained from the release v0.8.2 checkpoints for 10 epochs with reduce-learning-rate-on-plateau, got my loss down to around 7.345, and it has a WER of 26%. Is that enough to start my new training from those checkpoints, or is there something else I need to import so the training starts from my model's previous training and validation loss?

Thanks in advance.

I am not sure I understand what you are doing.

What material do you use for fine-tuning, what for testing, and what is this step for? A WER of 0.26 sounds high for the release.

Sorry, I will try to provide more info.

So I take the 0.8.2 checkpoints and start my training from there using the VoxForge dataset, imported with the bin/import_voxforge.py util. I initiate the training with the argument that reduces the learning rate if the loss plateaus; I do this because I found that the loss increases if I keep the initial learning rate. With this I ran the training for 10 epochs on the VoxForge dataset with a batch size of 64. After the 10 epochs I end up with a loss of around 7.45, and with this model the test epoch yields a 0.26 WER.

When I try to continue from the checkpoints saved from this model, which I have fine-tuned for 10 epochs, I find my training loss starts from 156.00. Is this normal? Shouldn't it start from the loss where it left off after fine-tuning?

If you need any other information I will try my best to provide it.

Learning rate, dropout?

Search for fine-tuning and transfer learning in this forum for numbers, which you obviously didn't …

High loss is fine, but 10 epochs might not be enough

Thanks, will do! I will get back if I have other problems, thank you!

Hey @othiele, @lissyx, I have a question about the test epoch. Rather than doing inference testing with the exported model, would running training from the same checkpoints for 0 epochs count as testing? I tried it and it immediately goes to the testing phase. Is that a valid way to test models?

Thanks in advance

Please search before you post. Let us know what you found out while searching and we’ll happily add to that.