- Mozilla STT version: DeepSpeech 0.9.3
- OS: Google Colab
- Python: 3.7.10
- TensorFlow: 1.15.2
- GPU: CUDA 10.0
I tried fine-tuning the released model: I downloaded the checkpoints and the scorer for v0.9.3 and ran training with the parameters below. I initially split the data 8:1:1 (train/dev/test), but later used the same file for both dev and test, so the split is effectively 8:2.
!python3 DeepSpeech.py --train_cudnn True --early_stop True --es_epochs 3 –es_steps 5 --n_hidden 2048 --epochs 20 \
  --export_dir /content/models/ --checkpoint_dir /content/model_checkpoints/ \
  --train_files /content/train.csv --dev_files /content/intermediate.csv --test_files /content/intermediate.csv \
  --learning_rate 0.0001 --train_batch_size 24 --test_batch_size 48 --dev_batch_size 48 --export_file_name 'ft_model' \
  --augment reverb[p=0.2,delay=50.0~30.0,decay=10.0:2.0~1.0] \
  --augment volume[p=0.2,dbfs=-10:-40] \
  --augment pitch[p=0.2,pitch=1~0.2] \
  --augment tempo[p=0.2,factor=1~0.5]
This gave me the following results:
> I0405 02:04:46.870591 140206192527232 utils.py:157] NumExpr defaulting to 2 threads.
> I Could not find best validating checkpoint.
> I Could not find most recent checkpoint.
> I Initializing all variables.
> I STARTING Optimization
> Epoch 0 | Training | Elapsed Time: 0:00:02 | Steps: 1 | Loss: 645.115417
> Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 195.178772 | Dataset: /content/intermediate.csv
> I Saved new best validating model with loss 195.178772 to: /content/model_checkpoints/best_dev-1
> --------------------------------------------------------------------------------
> Epoch 1 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 208.354904
> Epoch 1 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 124.489716 | Dataset: /content/intermediate.csv
> I Saved new best validating model with loss 124.489716 to: /content/model_checkpoints/best_dev-2
> --------------------------------------------------------------------------------
> Epoch 2 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 151.153809
> Epoch 2 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 149.881775 | Dataset: /content/intermediate.csv
> --------------------------------------------------------------------------------
> Epoch 3 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 184.461914
> Epoch 3 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 138.397873 | Dataset: /content/intermediate.csv
> --------------------------------------------------------------------------------
> Epoch 4 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 169.295624
> Epoch 4 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 113.872894 | Dataset: /content/intermediate.csv
> I Saved new best validating model with loss 113.872894 to: /content/model_checkpoints/best_dev-5
> --------------------------------------------------------------------------------
> Epoch 5 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 141.762100
> Epoch 5 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 111.355629 | Dataset: /content/intermediate.csv
> I Saved new best validating model with loss 111.355629 to: /content/model_checkpoints/best_dev-6
> --------------------------------------------------------------------------------
> Epoch 6 | Training | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 124.472694
> Epoch 6 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 134.303055 | Dataset: /content/intermediate.csv
> --------------------------------------------------------------------------------
> Epoch 7 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 144.363953
> Epoch 7 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 124.479218 | Dataset: /content/intermediate.csv
> --------------------------------------------------------------------------------
> Epoch 8 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 131.589767
> Epoch 8 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 109.516235 | Dataset: /content/intermediate.csv
> I Saved new best validating model with loss 109.516235 to: /content/model_checkpoints/best_dev-9
> --------------------------------------------------------------------------------
> Epoch 9 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 122.574684
> Epoch 9 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 104.244781 | Dataset: /content/intermediate.csv
> I Saved new best validating model with loss 104.244781 to: /content/model_checkpoints/best_dev-10
> --------------------------------------------------------------------------------
> Epoch 10 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 120.229431
> Epoch 10 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 105.151703 | Dataset: /content/intermediate.csv
> --------------------------------------------------------------------------------
> Epoch 11 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 124.279762
> Epoch 11 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 106.004166 | Dataset: /content/intermediate.csv
> --------------------------------------------------------------------------------
> Epoch 12 | Training | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 123.945976
> Epoch 12 | Validation | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 105.062477 | Dataset: /content/intermediate.csv
> I Early stop triggered as the loss did not improve the last 3 epochs
> I FINISHED optimization in 0:05:06.340667
> I Loading best validating checkpoint from /content/model_checkpoints/best_dev-10
> I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
> I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
> I Loading variable from checkpoint: global_step
> I Loading variable from checkpoint: layer_1/bias
> I Loading variable from checkpoint: layer_1/weights
> I Loading variable from checkpoint: layer_2/bias
> I Loading variable from checkpoint: layer_2/weights
> I Loading variable from checkpoint: layer_3/bias
> I Loading variable from checkpoint: layer_3/weights
> I Loading variable from checkpoint: layer_5/bias
> I Loading variable from checkpoint: layer_5/weights
> I Loading variable from checkpoint: layer_6/bias
> I Loading variable from checkpoint: layer_6/weights
> Testing model on /content/intermediate.csv
> Test epoch | Steps: 1 | Elapsed Time: 0:00:12
> Test on /content/intermediate.csv - WER: 1.000000, CER: 0.880259, loss: 104.244766
> --------------------------------------------------------------------------------
> Best WER:
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.884615, loss: 154.878494
> - wav: file:///content/drive/MyDrive/audiowavfiles/07.wav
> - src: "brain is the highest coordinating centre in the body"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.860000, loss: 153.049408
> - wav: file:///content/drive/MyDrive/audiowavfiles/09.wav
> - src: "energy required by an organism comes from the food"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.869565, loss: 138.342255
> - wav: file:///content/drive/MyDrive/audiowavfiles/25.wav
> - src: "enables the creation of cross platform program"
> - res: "e e "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.875000, loss: 120.579895
> - wav: file:///content/drive/MyDrive/audiowavfiles/04.wav
> - src: "upgrade changes in core system resources"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.916667, loss: 107.980042
> - wav: file:///content/drive/MyDrive/audiowavfiles/06.wav
> - src: "dealloaction is completely automatic"
> - res: " "
> --------------------------------------------------------------------------------
> Median WER:
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.869565, loss: 138.342255
> - wav: file:///content/drive/MyDrive/audiowavfiles/25.wav
> - src: "enables the creation of cross platform program"
> - res: "e e "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.875000, loss: 120.579895
> - wav: file:///content/drive/MyDrive/audiowavfiles/04.wav
> - src: "upgrade changes in core system resources"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.916667, loss: 107.980042
> - wav: file:///content/drive/MyDrive/audiowavfiles/06.wav
> - src: "dealloaction is completely automatic"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.880000, loss: 76.253258
> - wav: file:///content/drive/MyDrive/audiowavfiles/08.wav
> - src: "liver secretes bile juice"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.869565, loss: 68.422134
> - wav: file:///content/drive/MyDrive/audiowavfiles/f521a5fd-3081-4c34-9c13-ed1e840925ea.wav
> - src: "pass me the salt bottle"
> - res: " "
> --------------------------------------------------------------------------------
> Worst WER:
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.916667, loss: 107.980042
> - wav: file:///content/drive/MyDrive/audiowavfiles/06.wav
> - src: "dealloaction is completely automatic"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.880000, loss: 76.253258
> - wav: file:///content/drive/MyDrive/audiowavfiles/08.wav
> - src: "liver secretes bile juice"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.869565, loss: 68.422134
> - wav: file:///content/drive/MyDrive/audiowavfiles/f521a5fd-3081-4c34-9c13-ed1e840925ea.wav
> - src: "pass me the salt bottle"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.894737, loss: 61.059723
> - wav: file:///content/drive/MyDrive/audiowavfiles/8178df6f-a65d-4745-862a-c6eb5b655d6d.wav
> - src: "it's time for lunch"
> - res: " "
> --------------------------------------------------------------------------------
> WER: 1.000000, CER: 0.888889, loss: 57.637714
> - wav: file:///content/drive/MyDrive/audiowavfiles/7f7a8b09-01d8-451a-a425-4ca4da3322dc.wav
> - src: "she plays football"
> - res: " "
> --------------------------------------------------------------------------------
My CSV file has 34 audio recordings, about 5 seconds each on average, all from a single female speaker with an Indian accent. The WER is 1.000 and the loss is very high. I cannot figure out where I am going wrong.
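
To make sure I am reading the metric correctly: WER is the word-level edit distance divided by the number of reference words, so an empty transcript always scores 1.0. Below is a minimal sketch of that standard definition. The helper `word_error_rate` is my own and is not DeepSpeech's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two samples taken from the test report above.
print(word_error_rate("it's time for lunch", " "))  # 1.0
print(word_error_rate("enables the creation of cross platform program", "e e "))  # 1.0
```

Both samples come out as 1.0, which matches the report: every reference word is either missing or wrong.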
I have tried:
- Using different dev and test sets, but there was no major difference in the results.
- Fine-tuning with and without the scorer, but the WER remained the same.
- Different combinations of hyperparameters as suggested in the official docs, but the results were almost the same.
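
For reference, here is a minimal sketch of how an 8:1:1 split like the one I described can be produced. The function name `split_csv`, the fixed seed, and the file names in the usage note are placeholders of mine, not something DeepSpeech provides. It only assumes the CSV has a header row.

```python
import csv
import random


def split_csv(src_path, train_path, dev_path, test_path,
              ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the rows of src_path and write train/dev/test CSVs.

    Returns a dict mapping each output path to its row count.
    """
    with open(src_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # keep the header row for every output file
        rows = list(reader)

    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)

    n_train = int(len(rows) * ratios[0])
    n_dev = int(len(rows) * ratios[1])
    parts = {
        train_path: rows[:n_train],
        dev_path: rows[n_train:n_train + n_dev],
        test_path: rows[n_train + n_dev:],  # whatever is left over
    }

    for path, part in parts.items():
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)

    return {path: len(part) for path, part in parts.items()}
```

Called as `split_csv("all.csv", "train.csv", "dev.csv", "test.csv")` on 34 rows, this gives 27, 3 and 4 rows for train, dev and test.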
The questions I have:
- Is it because I am using a small amount of data?
- I have a list of 100-200 commands that need to be recognized exactly. How do I fine-tune for that?
- Which hyperparameters (epochs, steps, etc.) are most suitable for this amount of data?
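
One more thing I noticed: the log starts with "Could not find best validating checkpoint", "Could not find most recent checkpoint" and "Initializing all variables", which suggests the downloaded checkpoint may not have been picked up from `--checkpoint_dir`. Below is a minimal sketch for inspecting that directory before training. The helper name `describe_checkpoint_dir` is hypothetical, and the state-file names `best_dev_checkpoint` and `checkpoint` are my assumption about what the training script looks for.

```python
import os

# Assumed names of the small state files that point at the saved weights.
STATE_FILES = ("best_dev_checkpoint", "checkpoint")


def describe_checkpoint_dir(checkpoint_dir):
    """Report which checkpoint state files and .index files exist in a directory."""
    report = {}
    for name in STATE_FILES:
        report[name] = os.path.isfile(os.path.join(checkpoint_dir, name))

    # Each saved checkpoint normally comes with a "<prefix>.index" file.
    entries = os.listdir(checkpoint_dir) if os.path.isdir(checkpoint_dir) else []
    report["index_files"] = sorted(e for e in entries if e.endswith(".index"))
    return report
```

Running `describe_checkpoint_dir("/content/model_checkpoints/")` before starting training should show whether the directory is empty or whether the released checkpoint files are really there.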