Trained model always infers one letter with WER of 1.0

Hello,
I am training the English model with the sample data from Common Voice. Here are the environment details:
TensorFlow: v2.2.0-24-g1c1b2b9
DeepSpeech: v0.8.1-0-gfa883eb
Python: 3.6
Ubuntu: 18.04
GPUs: 4 Tesla M60
I am training on 20K Common Voice samples, with 2K dev and 2K test samples.
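For reference, the *-trunc.csv files are just small subsets of the full Common Voice CSVs. Here is a minimal sketch of one way such subsets can be cut, assuming the full train.csv / dev.csv / test.csv produced by bin/import_cv2.py already sit in the clips directory (those source file names are an assumption; only the -trunc names match the command below):

$ cd /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips
$ head -n 20001 train.csv > train-trunc.csv   # header + 20,000 rows
$ head -n 2001 dev.csv > dev-trunc.csv        # header + 2,000 rows
$ head -n 2001 test.csv > test-trunc.csv      # header + 2,000 rows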
Here is the command I am using for training:

python3 -u DeepSpeech.py --noshow_progressbar \
  --early_stop True \
  --automatic_mixed_precision True \
  --train_files /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/train-trunc.csv \
  --train_batch_size 16 \
  --dev_files /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv \
  --dev_batch_size 16 \
  --test_files /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/test-trunc.csv \
  --test_batch_size 8 \
  --epochs 50 \
  --max_to_keep 5 \
  --checkpoint_dir /home/ubuntu/deepspeech/checkpt \
  --learning_rate 0.00095 \
  --dropout_rate 0.01 \
  --train_cudnn True \
  --use_allow_growth True \
  --export_dir /home/ubuntu/deepspeech/models

Following are some of the epoch results…

I Finished training epoch 0 - loss: 210.038538
I Validating epoch 0 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 0 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 161.661049
I Saved new best validating model with loss 161.661049 to: /home/ubuntu/deepspeech/checkpt/best_dev-312
--------------------------------------------------------------------------------
I Training epoch 1...
I Finished training epoch 1 - loss: 176.056927
I Validating epoch 1 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 1 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 161.454853
I Saved new best validating model with loss 161.454853 to: /home/ubuntu/deepspeech/checkpt/best_dev-624
--------------------------------------------------------------------------------
I Training epoch 2...

I Finished training epoch 2 - loss: 175.998201
I Validating epoch 2 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 2 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 160.673306
I Saved new best validating model with loss 160.673306 to: /home/ubuntu/deepspeech/checkpt/best_dev-936
--------------------------------------------------------------------------------
I Training epoch 3...
I Finished training epoch 3 - loss: 175.912230
I Validating epoch 3 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 3 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 160.469619
I Saved new best validating model with loss 160.469619 to: /home/ubuntu/deepspeech/checkpt/best_dev-1248
--------------------------------------------------------------------------------
I Training epoch 4...
I Finished training epoch 4 - loss: 175.855402
I Validating epoch 4 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 4 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 160.392101
I Saved new best validating model with loss 160.392101 to: /home/ubuntu/deepspeech/checkpt/best_dev-1560
--------------------------------------------------------------------------------
............

--------------------------------------------------------------------------------
I Training epoch 44...
I Finished training epoch 44 - loss: 175.437583
I Validating epoch 44 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 44 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 160.548939
--------------------------------------------------------------------------------
I Training epoch 45...
I Finished training epoch 45 - loss: 175.429157
I Validating epoch 45 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 45 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 160.334952
--------------------------------------------------------------------------------
I Training epoch 46...
I Finished training epoch 46 - loss: 175.445124
I Validating epoch 46 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 46 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 160.499846
--------------------------------------------------------------------------------
I Training epoch 47...
I Finished training epoch 47 - loss: 175.427625
I Validating epoch 47 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv...
I Finished validating epoch 47 on /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv - loss: 160.415010
I Early stop triggered as the loss did not improve the last 25 epochs
I FINISHED optimization in 4:59:18.492089

The model gets exported, but when I test it using native_client/python/client.py, it outputs only one letter every time.

$ python3 native_client/python/client.py --model …/models/output_graph.pb --audio …/cv-corpus-5.1-2020-06-22/en/clips/common_voice_en_19740376.wav
Loading model from file …/models/output_graph.pb
TensorFlow: v2.2.0-24-g1c1b2b9
DeepSpeech: v0.8.1-0-gfa883eb
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
Loaded model in 3.35s.
Running inference.
t
Inference took 9.225s for 7.560s audio file.
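
That single letter also explains the WER of 1.0 in the title: word error rate is (substitutions + deletions + insertions) divided by the number of reference words, so a one-word hypothesis scores 1.0 against any longer transcript. Worked through for a hypothetical 7-word reference and the hypothesis "t":

WER = (S + D + I) / N = (1 + 6 + 0) / 7 = 1.0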

I have tried changing the learning rate, the number of epochs, etc., but nothing seems to work.
Can someone please help?
@lissyx

That is very little data, so don't expect much, especially on noisy Common Voice. Use LibriSpeech instead.

Use something like 0.2-0.4 for the dropout rate.
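
Applied to the command above, that means rerunning training with only the dropout flag raised; 0.3 is just one illustrative value in that range, and the fresh checkpoint/export directories below are placeholders so the run does not resume from the old, plateaued checkpoints (DeepSpeech.py picks up whatever is in --checkpoint_dir):

$ python3 -u DeepSpeech.py --noshow_progressbar \
  --early_stop True \
  --automatic_mixed_precision True \
  --train_files /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/train-trunc.csv \
  --train_batch_size 16 \
  --dev_files /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/dev-trunc.csv \
  --dev_batch_size 16 \
  --test_files /home/ubuntu/deepspeech/cv-corpus-5.1-2020-06-22/en/clips/test-trunc.csv \
  --test_batch_size 8 \
  --epochs 50 \
  --max_to_keep 5 \
  --checkpoint_dir /home/ubuntu/deepspeech/checkpt-dropout03 \
  --learning_rate 0.00095 \
  --dropout_rate 0.3 \
  --train_cudnn True \
  --use_allow_growth True \
  --export_dir /home/ubuntu/deepspeech/models-dropout03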

Thanks Olaf. I thought the released model from DeepSpeech was created from Common Voice samples. With the parameters I have used, can I train on the entire Common Voice dataset (about 140K samples)?
Meanwhile, I will try LibriSpeech.

Read the release notes in detail: much more data is used, together with LibriSpeech as the dev/gold standard. That should get you better results.
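
For reference, a rough sketch of what switching to LibriSpeech could look like. The importer script bin/import_librivox.py in the DeepSpeech repo downloads and converts the corpus; the target directory and the exact CSV file names below are from memory, so check what the script actually writes before pointing the flags at them. Unspecified flags fall back to the DeepSpeech.py defaults.

$ python3 bin/import_librivox.py /home/ubuntu/deepspeech/librispeech
$ python3 -u DeepSpeech.py \
  --train_files /home/ubuntu/deepspeech/librispeech/librivox-train-clean-100.csv \
  --dev_files /home/ubuntu/deepspeech/librispeech/librivox-dev-clean.csv \
  --test_files /home/ubuntu/deepspeech/librispeech/librivox-test-clean.csv \
  --train_cudnn True \
  --dropout_rate 0.3 \
  --checkpoint_dir /home/ubuntu/deepspeech/checkpt-librispeech \
  --export_dir /home/ubuntu/deepspeech/models-librispeech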