Empty results in inference mode

Hi, I’m training DeepSpeech on the Common Voice zh-TW data.
I installed DeepSpeech version 0.6.1 along with native_client.tar.xz version 0.6.0.

I train the model with the following command line:

python3 DeepSpeech.py \
 --train_files mozilla_common_voice/clips/new_train.csv \
 --dev_files mozilla_common_voice/clips/new_dev.csv \
 --test_files mozilla_common_voice/clips/new_test.csv \
 --checkpoint_dir checkpoints \
 --export_dir checkpoints \
 --alphabet_config_path data/alphabet.txt \
 --lm_binary_path data/lm/lm.binary \
 --lm_trie_path ./trie \
 --train_batch_size 32 \
 --test_batch_size 32 \
 --dev_batch_size 32

About the data:
new_train.csv has 48966 .wavs (I used import_cv2.py to convert the mp3s to wav; a sketch of that step follows the sample below)
new_dev.csv has 5281 .wavs
new_test.csv has 2430 .wavs
A sample of the data looks like this:

wav_filename,wav_filesize,transcript
mozilla_common_voice/clips/common_voice_zh-TW_18500863.wav,107564,在 黑 暗 中 進 行
mozilla_common_voice/clips/common_voice_zh-TW_19673313.wav,148268,地 面 層 平 均 單 價 約 為 每 坪 四 十 萬 元
mozilla_common_voice/clips/common_voice_zh-TW_17850420.wav,157484,報 名 費 太 貴 了
mozilla_common_voice/clips/common_voice_zh-TW_19424053.wav,184364,突 然 想 到 一 個 念 頭
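
The import step was roughly the following (a sketch of what I ran; the directory layout is mine, and the exact flags should be checked against bin/import_cv2.py --help in your checkout):

 # convert the Common Voice zh-TW mp3s to 16 kHz wavs and write the CSVs
 python3 bin/import_cv2.py \
  --filter_alphabet data/alphabet.txt \
  mozilla_common_voice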

I used the space-separated transcripts from new_train.csv as the corpus to train the KenLM lm.binary.
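
For completeness, the LM build was along these lines (a sketch: the n-gram order is my choice, corpus.txt is just the transcript column, and generate_trie comes from the native_client package; double-check the exact generate_trie arguments against your version):

 # pull the space-separated transcripts out of the training CSV
 tail -n +2 mozilla_common_voice/clips/new_train.csv | cut -d',' -f3 > corpus.txt
 # train the ARPA model and convert it to the binary format with KenLM
 lmplz -o 5 --text corpus.txt --arpa lm.arpa
 build_binary lm.arpa data/lm/lm.binary
 # build the trie from the same alphabet used for training
 generate_trie data/alphabet.txt data/lm/lm.binary ./trie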
Training ran for 4 epochs and then stopped because of early stopping.
The evaluation results are as follows:

--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 15.238984
 - wav: file://mozilla_common_voice/clips/common_voice_zh-TW_19354170.wav
 - src: "於 是"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 16.088865
 - wav: file://mozilla_common_voice/clips/common_voice_zh-TW_19275254.wav
 - src: "沒 有"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 17.799671
 - wav: file://mozilla_common_voice/clips/common_voice_zh-TW_18072642.wav
 - src: "近 年"
 - res: ""
--------------------------------------------------------------------------------

Is my configuration wrong, or is some step of the installation incorrect?
Thank you very much.

You should have the same versions everywhere to avoid problems.

How much data is that in terms of hours of audio? The importer shows it at the end of the process.

Likely the default early-stopping parameters are not suitable in your case and you just have a model that has learnt nothing.

Hi @lissyx,
thanks for your quick reply. I’ve tried to do what you said:
I’ve pinned the version to v0.6.0 for both deepspeech and native_client.
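
Concretely, the pinning looked roughly like this (a sketch; check util/taskcluster.py --help for the exact options in your checkout):

 # training code and Python package on the same tag
 cd DeepSpeech && git checkout v0.6.0
 pip3 install deepspeech==0.6.0
 # matching native client binaries (generate_trie etc.)
 python3 util/taskcluster.py --branch v0.6.0 --target native_client/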

As for the audio length,
the training set is about 43:59:27
the test set is about 2:20:21
the dev set is about 2:13:32
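
(I estimated these from the wav_filesize column, assuming the 16 kHz, 16-bit mono wavs that import_cv2.py writes, i.e. roughly 32000 bytes per second plus a 44-byte header; a sketch:)

 awk -F',' 'NR > 1 { s += ($2 - 44) / 32000 } END { printf "%.2f hours\n", s / 3600 }' \
   mozilla_common_voice/clips/new_train.csv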

I also set es_steps to 15,
so now the results look like this:

Test on /home/nsml/workspace/nlp_data/mozilla_common_voice/clips/new_test.csv - WER: 0.985370, CER: 0.926809, loss: 66.073044
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.666667, loss: 14.619208
 - wav: file:///home/nsml/workspace/nlp_data/mozilla_common_voice/clips/common_voice_zh-TW_18500869.wav
 - src: "你 好"
 - res: "我 "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.666667, loss: 20.093563
 - wav: file:///home/nsml/workspace/nlp_data/mozilla_common_voice/clips/common_voice_zh-TW_17666033.wav
 - src: "站 住"
 - res: "我 "
--------------------------------------------------------------------------------

The overall WER did drop to 0.98x,
would you mind having a look?
In your experience, is this caused by the early-stop steps (I hope so), or by a data issue?
Thank you very much

I told you to use 0.6.1.

That’s a really small amount of data.

Early stopping analyzes the behavior of the loss; you gave me no log over time, so I can’t help you with that. In your case, I think it’s both.

Please take a look at the other early-stop flags, or just disable early stopping.
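
For example, something along these lines (the flag names should match util/flags.py in 0.6, and the boolean uses the absl-style --no prefix; verify against your checkout):

 # give early stopping more validation history before it can trigger
 python3 DeepSpeech.py ... --es_steps 25
 # or switch it off entirely
 python3 DeepSpeech.py ... --noearly_stop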