I have a dataset of more than 20 hours of audio and am trying to train a DeepSpeech model for Roman Urdu.
- Command used to train:
!python3 /content/DeepSpeech/DeepSpeech.py --n_hidden 1024 --early_stop True --test_batch_size 30 --dev_batch_size 20 --train_batch_size 30 --use_allow_growth True --learning_rate 0.0001 --drop_source_layers 2 --train_cudnn 1 --load_checkpoint_dir /root/.local/share/deepspeech/checkpoints --epochs 10 --alphabet_config_path=/content/DeepSpeech/data/alphabet.txt --scorer /content/DeepSpeech/data/lm_model/kenlm-urdu.scorer --export_dir /content/DeepSpeech/model --train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv
- Training Output:
W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
I0615 19:07:16.263570 140080426973056 utils.py:157] NumExpr defaulting to 2 threads.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 76.401279
Epoch 0 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 73.786954 | Dataset: /content/dev.csv
I Saved new best validating model with loss 73.786954 to: /root/.local/share/deepspeech/checkpoints/best_dev-72
Epoch 1 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 68.511852
Epoch 1 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 72.548277 | Dataset: /content/dev.csv
I Saved new best validating model with loss 72.548277 to: /root/.local/share/deepspeech/checkpoints/best_dev-144
Epoch 2 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 68.221344
Epoch 2 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 72.418481 | Dataset: /content/dev.csv
I Saved new best validating model with loss 72.418481 to: /root/.local/share/deepspeech/checkpoints/best_dev-216
Epoch 3 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.996752
Epoch 3 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 72.094603 | Dataset: /content/dev.csv
I Saved new best validating model with loss 72.094603 to: /root/.local/share/deepspeech/checkpoints/best_dev-288
Epoch 4 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.702504
Epoch 4 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 71.911608 | Dataset: /content/dev.csv
I Saved new best validating model with loss 71.911608 to: /root/.local/share/deepspeech/checkpoints/best_dev-360
Epoch 5 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.358164
Epoch 5 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 71.578543 | Dataset: /content/dev.csv
I Saved new best validating model with loss 71.578543 to: /root/.local/share/deepspeech/checkpoints/best_dev-432
Epoch 6 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.456234
Epoch 6 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 71.063190 | Dataset: /content/dev.csv
I Saved new best validating model with loss 71.063190 to: /root/.local/share/deepspeech/checkpoints/best_dev-504
Epoch 7 | Training | Elapsed Time: 0:02:32 | Steps: 72 | Loss: 67.348347
Epoch 7 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 70.383539 | Dataset: /content/dev.csv
I Saved new best validating model with loss 70.383539 to: /root/.local/share/deepspeech/checkpoints/best_dev-576
Epoch 8 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 67.022106
Epoch 8 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 69.945858 | Dataset: /content/dev.csv
I Saved new best validating model with loss 69.945858 to: /root/.local/share/deepspeech/checkpoints/best_dev-648
Epoch 9 | Training | Elapsed Time: 0:02:33 | Steps: 72 | Loss: 66.372585
Epoch 9 | Validation | Elapsed Time: 0:00:13 | Steps: 22 | Loss: 69.482766 | Dataset: /content/dev.csv
I Saved new best validating model with loss 69.482766 to: /root/.local/share/deepspeech/checkpoints/best_dev-720
I FINISHED optimization in 0:27:55.360663
- Testing Output:
I0615 19:47:34.184036 140205381154688 utils.py:157] NumExpr defaulting to 2 threads.
I Loading best validating checkpoint from /root/.local/share/deepspeech/checkpoints/best_dev-720
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on /content/test.csv
Test epoch | Steps: 290 | Elapsed Time: 0:04:58
Test on /content/test.csv - WER: 0.999268, CER: 0.978865, loss: 71.347733
Best WER:
WER: 0.750000, CER: 0.833333, loss: 70.279411
- wav: file:///content/drive/MyDrive/Temp_Data/2474.wav
- src: “او جناب حافظ بھائی”
- res: "او "
WER: 1.000000, CER: 1.000000, loss: 144.464981
- wav: file:///content/drive/MyDrive/Temp_Data/2404.wav
- src: “جی شکر ہے بھائی آپ سناؤ صحت طبیعت”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 139.195999
- wav: file:///content/drive/MyDrive/Temp_Data/19352.wav
- src: “میری جونٹ باڑہ میں لگی بلوچستان تو”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 133.489365
- wav: file:///content/drive/MyDrive/Temp_Data/2003.wav
- src: “لیکن بابے سخت ہیں میرے بہت سخت ہیں”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 132.580078
- wav: file:///content/drive/MyDrive/Temp_Data/19478.wav
- src: “نیٹ ورک ہی نہیں آ رہے آپ کے ادھر”
- res: “”
Median WER:
WER: 1.000000, CER: 0.888889, loss: 69.960648
- wav: file:///content/drive/MyDrive/Temp_Data/243_ST.wav
- src: “چاند کریں لنگے کے.”
- res: "او "
WER: 1.000000, CER: 1.000000, loss: 69.491913
- wav: file:///content/drive/MyDrive/Temp_Data/1937.wav
- src: “آپریشن ٹھیک ہو گیا ہے.”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 69.144783
- wav: file:///content/drive/MyDrive/Temp_Data/2040.wav
- src: “آپ نے کھانا کھایا ہے”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 68.759491
- wav: file:///content/drive/MyDrive/Temp_Data/2077.wav
- src: “میرا موبائل لے کر آؤ.”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 68.722771
- wav: file:///content/drive/MyDrive/Temp_Data/2038.wav
- src: “جی ہاں کھانا کھایا ہے”
- res: “”
Worst WER:
WER: 1.000000, CER: 1.000000, loss: 26.710321
- wav: file:///content/drive/MyDrive/Temp_Data/2369.wav
- src: “نہیں ہوا”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 22.763023
- wav: file:///content/drive/MyDrive/Temp_Data/2554.wav
- src: “ٹھیک ہے”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 21.538822
- wav: file:///content/drive/MyDrive/Temp_Data/2471.wav
- src: “او یار”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 20.852743
- wav: file:///content/drive/MyDrive/Temp_Data/2312.wav
- src: “بس بہت”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 18.352442
- wav: file:///content/drive/MyDrive/Temp_Data/2139.wav
- src: “آمین.”
- res: “”
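To make the WER and CER figures in the report easier to read, here is a minimal sketch of how such scores can be computed from a src/res pair. It assumes both metrics are a plain Levenshtein distance divided by the reference length (words for WER, characters for CER); the helper names `edit_distance`, `wer`, and `cer` are my own and are not taken from the DeepSpeech code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(src, res):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = src.split()
    return edit_distance(ref_words, res.split()) / len(ref_words)


def cer(src, res):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(src, res) / len(src)


# First entry under "Best WER": four reference words, one recognised word.
src = "او جناب حافظ بھائی"
res = "او "
print(wer(src, res))  # 3 missing words out of 4 -> 0.75
print(cer(src, res))
```

Under these assumptions the first "Best WER" entry works out to 3 deleted words out of 4, which agrees with the reported WER of 0.750000. An empty res always scores 1.0 on both metrics, which is why nearly every other entry shows WER and CER of 1.000000.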
- Sample Dataset:
wav_filename | wav_filesize | transcript |
---|---|---|
/content/drive/MyDrive/Temp_Data/0.wav | 274430 | السلام علیکم کیسے ہو? |
/content/drive/MyDrive/Temp_Data/0_TT.wav | 1209470 | وہ گودا رینج زیادہ آئے ہوتے ہیں. |
/content/drive/MyDrive/Temp_Data/1000.wav | 236030 | میڈم رپلائے تو دے دیں |
/content/drive/MyDrive/Temp_Data/10008.wav | 1096190 | ہاں ہاں گزارا نہیں. |
/content/drive/MyDrive/Temp_Data/1001.wav | 236030 | کہ شاید وہاں نیٹ رکنی آ رہے ہیں |
/content/drive/MyDrive/Temp_Data/10010.wav | 1457150 | بالکل یار. پروگرام کروں گا |
/content/drive/MyDrive/Temp_Data/10013.wav | 742910 | ہائے ملک رال کے چھڑ سو. |
/content/drive/MyDrive/Temp_Data/10014.wav | 623870 | قربان تھی ملک. |
/content/drive/MyDrive/Temp_Data/10015.wav | 909950 | اور اللہ نے بہت کودہ کھڑا ہے. ماما |
/content/drive/MyDrive/Temp_Data/10019.wav | 381950 | میری جان بس نہ پوچھ نہ پوچھ |
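
Since the transcripts contain punctuation such as "?" and ".", one check I can run is whether every character in the transcript column is listed in the alphabet file. Below is a minimal sketch of that check. It assumes the CSV is comma-separated with the header shown above, and that alphabet.txt has one label per line, with lines starting with `#` treated as comments and `\#` standing for a literal `#`. The function names are hypothetical, and the demo data is made up for illustration.

```python
import csv
import io


def load_alphabet(lines):
    """Collect labels from an alphabet file, one label per line."""
    chars = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("\\#"):
            chars.add("#")        # escaped literal '#'
        elif line.startswith("#"):
            continue              # comment line
        elif line:
            chars.add(line)       # a line holding a single space is the space label
    return chars


def transcript_chars(csv_file):
    """Collect every character used in the 'transcript' column."""
    chars = set()
    for row in csv.DictReader(csv_file):
        chars.update(row["transcript"])
    return chars


# Made-up demo data: the alphabet has no '?', but a transcript uses one.
demo_alphabet = ["# comment\n", " \n", "a\n", "b\n"]
demo_csv = io.StringIO("wav_filename,wav_filesize,transcript\nx.wav,100,ab a?\n")

missing = transcript_chars(demo_csv) - load_alphabet(demo_alphabet)
print(sorted(missing))  # characters used in transcripts but absent from the alphabet
```

For the real files, the same functions can be given `open("/content/train.csv", encoding="utf-8", newline="")` and `open("/content/DeepSpeech/data/alphabet.txt", encoding="utf-8")`, the paths from the training command.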
Questions:
- I already have a language model; what I want to build is the acoustic model. When I test the trained model, it returns an empty res for almost every sample. The goal is to train a model that transcribes Roman Urdu from audio.
Any help with this would be much appreciated.