Training Model with Custom Data (Not Learning) (DeepSpeech)

I have a dataset of more than 20 hours of audio and am trying to train a model for Roman Urdu.

  • Command Used to Train:
    !python3 /content/DeepSpeech/DeepSpeech.py --n_hidden 1024 --early_stop True --test_batch_size 30 --dev_batch_size 20 --train_batch_size 30 --use_allow_growth True --learning_rate 0.0001 --drop_source_layers 2 --train_cudnn 1 --load_checkpoint_dir /root/.local/share/deepspeech/checkpoints --epochs 10 --alphabet_config_path=/content/DeepSpeech/data/alphabet.txt --scorer /content/DeepSpeech/data/lm_model/kenlm-urdu.scorer --export_dir /content/DeepSpeech/model --train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv
  • Training Output:
    W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
    I0615 19:07:16.263570 140080426973056 utils.py:157] NumExpr defaulting to 2 threads.
    I Could not find best validating checkpoint.
    I Could not find most recent checkpoint.
    I Initializing all variables.
    I STARTING Optimization
    Epoch 0 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 76.401279
    Epoch 0 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 73.786954 | Dataset: /content/dev.csv
    I Saved new best validating model with loss 73.786954 to: /root/.local/share/deepspeech/checkpoints/best_dev-72

Epoch 1 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 68.511852
Epoch 1 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 72.548277 | Dataset: /content/dev.csv
I Saved new best validating model with loss 72.548277 to: /root/.local/share/deepspeech/checkpoints/best_dev-144

Epoch 2 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 68.221344
Epoch 2 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 72.418481 | Dataset: /content/dev.csv
I Saved new best validating model with loss 72.418481 to: /root/.local/share/deepspeech/checkpoints/best_dev-216

Epoch 3 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.996752
Epoch 3 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 72.094603 | Dataset: /content/dev.csv
I Saved new best validating model with loss 72.094603 to: /root/.local/share/deepspeech/checkpoints/best_dev-288

Epoch 4 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.702504
Epoch 4 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 71.911608 | Dataset: /content/dev.csv
I Saved new best validating model with loss 71.911608 to: /root/.local/share/deepspeech/checkpoints/best_dev-360

Epoch 5 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.358164
Epoch 5 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 71.578543 | Dataset: /content/dev.csv
I Saved new best validating model with loss 71.578543 to: /root/.local/share/deepspeech/checkpoints/best_dev-432

Epoch 6 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.456234
Epoch 6 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 71.063190 | Dataset: /content/dev.csv
I Saved new best validating model with loss 71.063190 to: /root/.local/share/deepspeech/checkpoints/best_dev-504

Epoch 7 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.348347
Epoch 7 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 70.383539 | Dataset: /content/dev.csv
I Saved new best validating model with loss 70.383539 to: /root/.local/share/deepspeech/checkpoints/best_dev-576

Epoch 8 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 67.022106
Epoch 8 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 69.945858 | Dataset: /content/dev.csv
I Saved new best validating model with loss 69.945858 to: /root/.local/share/deepspeech/checkpoints/best_dev-648

Epoch 9 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 66.372585
Epoch 9 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 69.482766 | Dataset: /content/dev.csv
I Saved new best validating model with loss 69.482766 to: /root/.local/share/deepspeech/checkpoints/best_dev-720

I FINISHED optimization in 0:27:55.360663

  • Testing Output:
    I0615 19:47:34.184036 140205381154688 utils.py:157] NumExpr defaulting to 2 threads.
    I Loading best validating checkpoint from /root/.local/share/deepspeech/checkpoints/best_dev-720
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
    I Loading variable from checkpoint: global_step
    I Loading variable from checkpoint: layer_1/bias
    I Loading variable from checkpoint: layer_1/weights
    I Loading variable from checkpoint: layer_2/bias
    I Loading variable from checkpoint: layer_2/weights
    I Loading variable from checkpoint: layer_3/bias
    I Loading variable from checkpoint: layer_3/weights
    I Loading variable from checkpoint: layer_5/bias
    I Loading variable from checkpoint: layer_5/weights
    I Loading variable from checkpoint: layer_6/bias
    I Loading variable from checkpoint: layer_6/weights
    Testing model on /content/test.csv
    Test epoch | Steps: 290 | Elapsed Time: 0:04:58
    Test on /content/test.csv - WER: 0.999268, CER: 0.978865, loss: 71.347733

Best WER:

WER: 0.750000, CER: 0.833333, loss: 70.279411

  • wav: file:///content/drive/MyDrive/Temp_Data/2474.wav
  • src: “او جناب حافظ بھائی”
  • res: "او "

WER: 1.000000, CER: 1.000000, loss: 144.464981

  • wav: file:///content/drive/MyDrive/Temp_Data/2404.wav
  • src: “جی شکر ہے بھائی آپ سناؤ صحت طبیعت”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 139.195999

  • wav: file:///content/drive/MyDrive/Temp_Data/19352.wav
  • src: “میری جونٹ باڑہ میں لگی بلوچستان تو”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 133.489365

  • wav: file:///content/drive/MyDrive/Temp_Data/2003.wav
  • src: “لیکن بابے سخت ہیں میرے بہت سخت ہیں”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 132.580078

  • wav: file:///content/drive/MyDrive/Temp_Data/19478.wav
  • src: “نیٹ ورک ہی نہیں آ رہے آپ کے ادھر”
  • res: “”

Median WER:

WER: 1.000000, CER: 0.888889, loss: 69.960648

  • wav: file:///content/drive/MyDrive/Temp_Data/243_ST.wav
  • src: “چاند کریں لنگے کے.”
  • res: "او "

WER: 1.000000, CER: 1.000000, loss: 69.491913

  • wav: file:///content/drive/MyDrive/Temp_Data/1937.wav
  • src: “آپریشن ٹھیک ہو گیا ہے.”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 69.144783

  • wav: file:///content/drive/MyDrive/Temp_Data/2040.wav
  • src: “آپ نے کھانا کھایا ہے”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 68.759491

  • wav: file:///content/drive/MyDrive/Temp_Data/2077.wav
  • src: “میرا موبائل لے کر آؤ.”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 68.722771

  • wav: file:///content/drive/MyDrive/Temp_Data/2038.wav
  • src: “جی ہاں کھانا کھایا ہے”
  • res: “”

Worst WER:

WER: 1.000000, CER: 1.000000, loss: 26.710321

  • wav: file:///content/drive/MyDrive/Temp_Data/2369.wav
  • src: “نہیں ہوا”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 22.763023

  • wav: file:///content/drive/MyDrive/Temp_Data/2554.wav
  • src: “ٹھیک ہے”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 21.538822

  • wav: file:///content/drive/MyDrive/Temp_Data/2471.wav
  • src: “او یار”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 20.852743

  • wav: file:///content/drive/MyDrive/Temp_Data/2312.wav
  • src: “بس بہت”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 18.352442

  • wav: file:///content/drive/MyDrive/Temp_Data/2139.wav
  • src: “آمین.”
  • res: “”

  • Sample Dataset:
wav_filename,wav_filesize,transcript
/content/drive/MyDrive/Temp_Data/0.wav,274430,السلام علیکم کیسے ہو?
/content/drive/MyDrive/Temp_Data/0_TT.wav,1209470,وہ گودا رینج زیادہ آئے ہوتے ہیں.
/content/drive/MyDrive/Temp_Data/1000.wav,236030,میڈم رپلائے تو دے دیں
/content/drive/MyDrive/Temp_Data/10008.wav,1096190,ہاں ہاں گزارا نہیں.
/content/drive/MyDrive/Temp_Data/1001.wav,236030,کہ شاید وہاں نیٹ رکنی آ رہے ہیں
/content/drive/MyDrive/Temp_Data/10010.wav,1457150,بالکل یار. پروگرام کروں گا
/content/drive/MyDrive/Temp_Data/10013.wav,742910,ہائے ملک رال کے چھڑ سو.
/content/drive/MyDrive/Temp_Data/10014.wav,623870,قربان تھی ملک.
/content/drive/MyDrive/Temp_Data/10015.wav,909950,اور اللہ نے بہت کودہ کھڑا ہے. ماما
/content/drive/MyDrive/Temp_Data/10019.wav,381950,میری جان بس نہ پوچھ نہ پوچھ
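
In case it helps anyone check the "more than 20 hours" figure per split, here is a minimal sketch that estimates clip count and total duration from the wav_filesize column. It assumes the files are 16 kHz, 16-bit, mono PCM WAV with a 44-byte header; that is not stated anywhere in this post, so treat the constants as placeholders. The names estimate_seconds and summarize are made up for this example and are not part of DeepSpeech.

```python
import csv
import io

# Assumptions (not confirmed by the post): 16 kHz, 16-bit, mono PCM WAV
# files with a plain 44-byte header. Adjust these if the audio differs.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
WAV_HEADER_BYTES = 44


def estimate_seconds(wav_filesize):
    """Estimate a clip's duration in seconds from its size in bytes."""
    payload = max(int(wav_filesize) - WAV_HEADER_BYTES, 0)
    return payload / (SAMPLE_RATE * BYTES_PER_SAMPLE)


def summarize(csv_file):
    """Return (clip_count, total_hours) for an open DeepSpeech-style CSV."""
    reader = csv.DictReader(csv_file)
    durations = [estimate_seconds(row["wav_filesize"]) for row in reader]
    return len(durations), sum(durations) / 3600.0


if __name__ == "__main__":
    # Two rows from the sample above, written in comma-separated form.
    sample = io.StringIO(
        "wav_filename,wav_filesize,transcript\n"
        "/content/drive/MyDrive/Temp_Data/1000.wav,236030,میڈم رپلائے تو دے دیں\n"
        "/content/drive/MyDrive/Temp_Data/10019.wav,381950,میری جان بس نہ پوچھ نہ پوچھ\n"
    )
    clips, hours = summarize(sample)
    print(f"{clips} clips, about {hours * 3600:.1f} seconds of audio")
```

To run it on a real split, something like `summarize(open("/content/train.csv", encoding="utf-8", newline=""))` should work, provided the file has the header row shown in the sample.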

Questions:

  1. I already have a language model and now want to build an acoustic model. When I test the trained model, it returns an empty res for almost every clip. The goal is to train a model that transcribes Roman Urdu from audio.
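
To put a number on how slowly the loss is moving in the training output above, a small parser along these lines may be useful. It is only a sketch: the regular expression is written against the "Epoch N | Training | ... | Loss: X" lines exactly as they appear in this post, and parse_losses and relative_drop are hypothetical helper names, not DeepSpeech functions.

```python
import re

# Matches the per-epoch summary lines as they appear in the log in this post.
EPOCH_LINE = re.compile(
    r"Epoch (\d+) \| (Training|Validation) \|.*?\| Loss: ([0-9.]+)"
)


def parse_losses(log_text):
    """Return {'Training': {epoch: loss}, 'Validation': {epoch: loss}}."""
    losses = {"Training": {}, "Validation": {}}
    for match in EPOCH_LINE.finditer(log_text):
        epoch, phase, loss = match.groups()
        losses[phase][int(epoch)] = float(loss)
    return losses


def relative_drop(per_epoch):
    """Fractional decrease in loss from the first to the last recorded epoch."""
    first = per_epoch[min(per_epoch)]
    last = per_epoch[max(per_epoch)]
    return (first - last) / first


if __name__ == "__main__":
    # First and last epochs copied from the training output above.
    log = """
Epoch 0 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 76.401279
Epoch 0 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 73.786954 | Dataset: /content/dev.csv
Epoch 9 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 66.372585
Epoch 9 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 69.482766 | Dataset: /content/dev.csv
"""
    losses = parse_losses(log)
    for phase, per_epoch in losses.items():
        print(f"{phase}: {relative_drop(per_epoch):.1%} lower after the last epoch")
```

Pasting the full log into the string shows whether the curve is still falling or has flattened out by the final epoch.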

Any help with this would be much appreciated.

Can someone provide their input on it?

Several people have successfully trained with Urdu. What training data are you using? You can also join us on Mozilla’s Matrix to get real-time help.