I tried to create a model for German keywords such as “yes” (ja) and “no” (nein). Can anyone tell me how many audio files I'll need? Currently I have only about 80 WAV files (16 kHz, mono). Each file contains up to 8 seconds of the words “ja” and “nein” in random order. The test results are pretty bad, with a WER between 0.5 and 1.0.
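Since the question comes down to how much audio there is, it may help to report the total duration rather than only the file count: 80 clips of at most 8 seconds each is at most 640 seconds (about 10.7 minutes). A minimal sketch using only Python's standard `wave` module that counts the clips, sums their duration and flags any file that is not 16 kHz mono. The 16-bit sample width check and the function name `summarize_wavs` are assumptions for illustration, not part of DeepSpeech.

```python
import wave
from pathlib import Path


def summarize_wavs(folder):
    """Hypothetical helper: count the .wav files in `folder`, sum their
    duration in seconds, and list files that are not 16 kHz / mono /
    16-bit PCM (the 16-bit requirement is an assumption)."""
    count, total_seconds, bad = 0, 0.0, []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
            if fmt != (16000, 1, 2):
                bad.append(path.name)
            # duration = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
        count += 1
    return count, total_seconds, bad


if __name__ == "__main__":
    import sys

    n, secs, bad = summarize_wavs(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{n} clips, {secs / 60:.1f} min total, {len(bad)} with unexpected format")
```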
I --------------------------------------------------------------------------------
I WER: 0.750000, loss: 55.009773, mean edit distance: 0.655172
I - src: "ja ja nein ja nein ja ja nein"
I - res: "nein nein nein ja ja ja ja ja ja ja "
I --------------------------------------------------------------------------------
I WER: 0.750000, loss: 55.009773, mean edit distance: 0.655172
I - src: "ja ja nein ja nein ja ja nein"
I - res: "nein nein nein ja ja ja ja ja ja ja "
I --------------------------------------------------------------------------------
I WER: 0.750000, loss: 55.009773, mean edit distance: 0.655172
I - src: "ja ja nein ja nein ja ja nein"
I - res: "nein nein nein ja ja ja ja ja ja ja "
I --------------------------------------------------------------------------------
I WER: 0.750000, loss: 55.009773, mean edit distance: 0.655172
I - src: "ja ja nein ja nein ja ja nein"
I - res: "nein nein nein ja ja ja ja ja ja ja "
I --------------------------------------------------------------------------------
I WER: 0.750000, loss: 55.009773, mean edit distance: 0.655172
I - src: "ja ja nein ja nein ja ja nein"
I - res: "nein nein nein ja ja ja ja ja ja ja "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 17.231991, mean edit distance: 0.900000
I - src: "ja ja nein"
I - res: "ja ja ja ja ja "
I --------------------------------------------------------------------------------
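For reference, WER is the word-level edit distance (substitutions + deletions + insertions) between the reference (`src`) and the decoded result (`res`), divided by the number of reference words. A plain-Python sketch, not DeepSpeech's own implementation, that reproduces the two WER values in the report: the first sample needs 6 edits against 8 reference words (0.75), the second 3 edits against 3 words (1.0). Because every extra "ja" in the output counts as an insertion, a model that emits more words than were spoken can reach a WER of 1.0 even when some words are right.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance over word lists; substitutions, insertions
    and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # delete r
                           cur[j - 1] + 1,            # insert h
                           prev[j - 1] + (r != h)))   # substitute, or match for free
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_edit_distance(ref, hyp) / len(ref)


print(wer("ja ja nein ja nein ja ja nein", "nein nein nein ja ja ja ja ja ja ja "))  # 0.75
print(wer("ja ja nein", "ja ja ja ja ja "))  # 1.0
```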
I used the configuration from the French guy's tutorial.
I'm also wondering about the number of steps per epoch. The progress bar shows only 1 step per epoch, but earlier I remember it showing something like 1 of 40. Could this be causing the bad results?
I Training of Epoch 0 - loss: 579.255272
100% (1 of 1) |#######################################################################################################################################################| Elapsed Time: 0:00:12 Time: 0:00:12
I'm also curious about how long training takes. When I run the .sh script from the tutorial I mentioned, each epoch has just 1 job and takes about 12 seconds. When I run DeepSpeech with the default settings (I only set the paths to the CSV files), each epoch has 51 jobs and takes about 1 hour.
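Assuming the progress bar counts one step per batch (the usual convention in minibatch training), the number of steps per epoch is the number of training clips divided by the batch size, rounded up. A minimal sketch with hypothetical numbers: if the training CSV holds 51 clips, a batch size of 1 gives 51 jobs per epoch, while any batch size of 51 or more, which the tutorial's configuration may be using, collapses the whole epoch into a single step. Neither batch size is taken from the actual configurations; both are assumptions.

```python
import math


def steps_per_epoch(num_train_clips, batch_size):
    """Optimizer steps needed to see every training clip once.
    The last batch may be smaller than batch_size, hence the ceiling."""
    return math.ceil(num_train_clips / batch_size)


# Hypothetical: 51 training clips, matching the 51 jobs of the default run.
print(steps_per_epoch(51, 1))    # 51
print(steps_per_epoch(51, 64))   # 1
```

Since each step is one weight update, an epoch of 1 step performs far fewer updates than an epoch of 51 steps, so the same number of epochs in the two setups does not correspond to the same amount of training.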