Transcribe.py wrong transcription

Hello!
I am new to all this. I am trying to train a model for the Slovenian language. My first question is: after how many steps can I expect transcription to start working a bit (getting a few words right, etc.)? I am currently at step 1100 and transcription doesn't work. For every audio recording I give it, transcribe.py returns this:

{"start": 0, "end": 2940, "transcript": "i "}

If I export the model to a .pbmm file and try it in a C# application, I get similar results. For the same recording, I get "a a a a " or something like that…

I use:

  • TensorFlow 1.15.2 for CPU.
  • WSL Ubuntu 18.04.4 LTS
  • Python 3.6.9
  • DeepSpeech 0.7.4

Thank you for your help!

Please read "What and how to report if you need support" and share meaningful training context.


Hi!
I have now run the test phase of the training process, and these are the results:
I FINISHED optimization in 0:02:32.277167
I Could not find best validating checkpoint.
I Loading most recent checkpoint from /mnt/d/DeepSpeech/checkpoint/train-1629
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on /mnt/d/sl/clips/test.csv
Test epoch | Steps: 338 | Elapsed Time: 0:23:28
Test on /mnt/d/sl/clips/test.csv - WER: 1.000000, CER: 0.940109, loss: 130.935013

Best WER:

WER: 1.000000, CER: 0.888889, loss: 434.096863

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17435123.wav
  • src: “oprostite”
  • res: "i "

WER: 1.000000, CER: 0.969697, loss: 277.048157

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17399349.wav
  • src: “podkupnine ne dosežejo svojega namena če tam delajo iskreni ljudje”
  • res: "i "

WER: 1.000000, CER: 0.965517, loss: 239.865509

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17419740.wav
  • src: “življenje je kot šahovska igra spreminja se z vsako potezo”
  • res: "i "

WER: 1.000000, CER: 0.962264, loss: 222.684586

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17419622.wav
  • src: “če hočeš prijatelja izgubiti z njim o politiki govori”
  • res: "i "

WER: 1.000000, CER: 0.957447, loss: 220.735397

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17419624.wav
  • src: “moški razmišljajo načrtujejo in včasih delujejo”
  • res: "i "

Median WER:

WER: 1.000000, CER: 0.962963, loss: 127.144165

  • wav: file:///mnt/d/sl/clips/common_voice_sl_18133783.wav
  • src: “kakšno je njegovo drugo ime”
  • res: "i "

WER: 1.000000, CER: 0.933333, loss: 126.714897

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17399481.wav
  • src: “človeška narava ima svoje meje”
  • res: "i "

WER: 1.000000, CER: 0.925926, loss: 126.667732

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17419659.wav
  • src: “nuja je mati iznajdljivosti”
  • res: "i "

WER: 1.000000, CER: 0.941176, loss: 126.427841

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17419751.wav
  • src: “smrt se ga boji ker ima levje srce”
  • res: "i "

WER: 1.000000, CER: 0.965517, loss: 126.229973

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17419640.wav
  • src: “kakšno je podnebje na jamajki”
  • res: "i "

Worst WER:

WER: 1.000000, CER: 0.818182, loss: 44.926929

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17794339.wav
  • src: “ali ni tako”
  • res: "i "

WER: 1.000000, CER: 0.888889, loss: 43.481487

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17383582.wav
  • src: “v redu je”
  • res: "i "

WER: 1.000000, CER: 0.857143, loss: 42.304008

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17399395.wav
  • src: “tu imaš”
  • res: "i "

WER: 1.000000, CER: 0.833333, loss: 36.833286

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17399294.wav
  • src: “ne vem”
  • res: "i "

WER: 1.111111, CER: 0.807692, loss: 238.193420

  • wav: file:///mnt/d/sl/clips/common_voice_sl_17368788.wav
  • src: “zmago je divje odletel z helikopterjem visoko v zrak”
  • res: "a a a a a a a a a a "
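A note on the last entry, since a WER above 1.0 can look like a bug: WER is the word-level edit distance divided by the number of words in the reference, so a hypothesis with more words than the reference can push it past 1. Here the reference has 9 words and the result has 10, none of which match, giving 10/9, or about 1.111. The sketch below is a plain word-level Levenshtein distance written for illustration; it is not DeepSpeech's own evaluation code, and the function names are my own.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (each edit costs 1)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # delete a reference word
                            curr[j - 1] + 1,       # insert a hypothesis word
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


# Two pairs copied from the test report above.
print(wer("ne vem", "i "))
print(wer("zmago je divje odletel z helikopterjem visoko v zrak",
          "a a a a a a a a a a "))
```

The first pair needs one substitution and one deletion against a 2-word reference, which is why it reports exactly 1.0.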

Without more context (parameters, datasets, …) we can't help.

Hi!
As I said, I am using a dataset for the Slovenian language: the Common Voice dataset, which I downloaded from voice.mozilla.org. I was following these instructions: https://deepspeech.readthedocs.io/en/v0.7.4/TRAINING.html My command for training is this:

python3 DeepSpeech.py --train_files /mnt/d/sl/clips/train.csv --dev_files /mnt/d/sl/clips/dev.csv --test_files /mnt/d/sl/clips/test.csv --alphabet_config_path /mnt/d/DeepSpeech/data/alphabet.txt --checkpoint_dir /mnt/d/DeepSpeech/checkpoint/

How much data, in audio hours, is that?
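If you are not sure how to answer this, a rough figure can be estimated from the training CSV itself, since the DeepSpeech CSV format has a `wav_filesize` column. The sketch below assumes the clips are 16 kHz, 16-bit, mono PCM WAV files with a 44-byte header; if your files differ, the constants need adjusting. The function name is my own.

```python
import csv

BYTES_PER_SECOND = 16000 * 2  # assumption: 16 kHz sample rate, 16-bit mono PCM
WAV_HEADER_BYTES = 44         # assumption: canonical 44-byte WAV header


def estimate_hours(csv_path):
    """Estimate total audio duration in hours from the wav_filesize column."""
    total_seconds = 0.0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            audio_bytes = max(int(row["wav_filesize"]) - WAV_HEADER_BYTES, 0)
            total_seconds += audio_bytes / BYTES_PER_SECOND
    return total_seconds / 3600
```

For example, `estimate_hours("/mnt/d/sl/clips/train.csv")` would give the approximate size of the training split used in the command above.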

Well, you need to adapt the hyperparameters to your dataset.

How many epochs did you complete?

Hi!
The Slovenian dataset has 7 hours in total, of which 4 hours are validated.

Um, I did not see that in the manual. I followed it up to the section on exporting a model (here), and did exactly what it says.

Currently, I am on Epoch 0, Step: 2107.

That is very, very little data; don't expect to train anything usable with this amount. You might try transfer learning.
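As a rough sketch of what a transfer-learning run could look like: the 0.7 training docs describe starting from a released checkpoint and dropping the output layer so a new alphabet can be used, via `--drop_source_layers`, `--load_checkpoint_dir` and `--save_checkpoint_dir`. The helper below only assembles such a command line. The two checkpoint directories are hypothetical paths, the epoch count is a placeholder, and `--load_cudnn` is included on the assumption that the released checkpoint was trained with cuDNN and is being loaded on a CPU build. Whichever alphabet file you pass must cover every character in the Slovenian transcripts.

```python
import shlex


def build_transfer_learning_command(release_checkpoint_dir, output_checkpoint_dir,
                                    alphabet_path, train_csv, dev_csv, test_csv,
                                    epochs, drop_source_layers=1, load_cudnn=False):
    """Assemble a DeepSpeech.py argument list for fine-tuning from a release checkpoint."""
    cmd = [
        "python3", "DeepSpeech.py",
        "--drop_source_layers", str(drop_source_layers),  # re-initialise the last layer(s)
        "--alphabet_config_path", alphabet_path,
        "--load_checkpoint_dir", release_checkpoint_dir,
        "--save_checkpoint_dir", output_checkpoint_dir,
        "--train_files", train_csv,
        "--dev_files", dev_csv,
        "--test_files", test_csv,
        "--epochs", str(epochs),
    ]
    if load_cudnn:
        # Assumption: needed when a cuDNN-trained checkpoint is loaded on CPU.
        cmd.append("--load_cudnn")
    return cmd


cmd = build_transfer_learning_command(
    release_checkpoint_dir="/mnt/d/DeepSpeech/release-checkpoint",  # hypothetical path
    output_checkpoint_dir="/mnt/d/DeepSpeech/checkpoint-sl",        # hypothetical path
    alphabet_path="/mnt/d/DeepSpeech/data/alphabet.txt",
    train_csv="/mnt/d/sl/clips/train.csv",
    dev_csv="/mnt/d/sl/clips/dev.csv",
    test_csv="/mnt/d/sl/clips/test.csv",
    epochs=30,                                                      # placeholder value
    load_cudnn=True,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Saving to a separate checkpoint directory keeps the released checkpoint untouched, so you can restart the fine-tuning with different settings.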

You will also need many more epochs once you have enough data.
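To put the step counts in perspective: a step is one batch, and an epoch is one full pass over the training CSV, so "Epoch 0, Step 2107" means the model has not yet seen the whole training set once. The arithmetic is sketched below. The sample count and epoch count are made-up numbers for illustration, and a batch size of 1 is assumed because the training command above does not set `--train_batch_size` (check the default for your version).

```python
import math


def steps_per_epoch(num_samples, batch_size=1):
    """Number of training steps needed for one full pass over the data."""
    return math.ceil(num_samples / batch_size)


def total_steps(num_samples, epochs, batch_size=1):
    """Number of training steps for the whole run."""
    return epochs * steps_per_epoch(num_samples, batch_size)


# Hypothetical numbers, for illustration only.
num_samples = 2500
print(steps_per_epoch(num_samples))
print(total_steps(num_samples, epochs=30))
print(total_steps(num_samples, epochs=30, batch_size=16))
```

A larger batch size reduces the number of steps per epoch, so step counts from different runs are only comparable when the batch size is the same.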