What do you think is a good value for the n_hidden parameter?
I tried 375, 1024 and 2048 (with early stopping enabled), but I'm getting very high validation and test losses, even though the training losses are low.
For example:
With n_hidden = 375: WER = 0.582319, CER = 36.162546, loss = 146.159454
With n_hidden = 1024: WER = 0.759299, CER = 27.491103, loss = 101.068916
The models don't produce anything close to the expected output when tested with test WAV files, but give near-perfect output on the training WAV files. It looks like the model has overfitted even though early stopping is enabled. Also, the training loss falls sharply into the ~20s while the validation loss stays high at ~100s.
Any suggestions on how to improve the test/validation loss?
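For reference, the training invocation looks roughly like this (paths are placeholders, and the flag names are from the 0.x releases, so verify them against `./DeepSpeech.py --help`); raising `--dropout_rate` is one common lever against a train/validation gap like this:

```
# Sketch of a DeepSpeech 0.x training run; paths and values are placeholders.
python3 DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --n_hidden 1024 \
  --dropout_rate 0.25 \
  --early_stop True \
  --export_dir ./export
```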
Hi, may I ask you a question? Which version of DeepSpeech do you use? I noticed you use the "--display_step" parameter, but there is no such parameter in my version (https://github.com/mozilla/DeepSpeech) when I run "./DeepSpeech.py --helpful".
80 hours of training material sounds quite small. Are you trying to build a general speech-to-text model that understands German in general, or are you focusing on a specific topic?
@lissyx: When training DeepSpeech with German, do we need to change the number of FEATURES in the code? I saw the number of features mentioned somewhere in the code as 26, which I took to correspond to the 26 English letters. Do we need to set it to 29 for German?
You can either convert the German umlauts (ä, ö, ü and ß) to ae, … or add them to the alphabet file. Either way has advantages and disadvantages. You don't change the number of features in the code.
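For the first option, a minimal transliteration sketch (the helper name `normalize_transcript` is made up for illustration; the mappings themselves are standard German practice):

```python
# Map German umlauts to ASCII digraphs so transcripts fit a 26-letter alphabet file.
UMLAUT_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def normalize_transcript(text: str) -> str:
    """Replace German umlauts before writing the training CSVs."""
    return text.translate(UMLAUT_MAP)

print(normalize_transcript("Grüße aus München"))  # -> "Gruesse aus Muenchen"
```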
@othiele: Sorry if this question appears naive: is the number of features the same (i.e. 26) for English, German, or any other language? How many MFCC features are extracted from the audio signal?
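To make the distinction concrete: the 26 in the code are MFCC coefficients computed per audio frame, not letters, so the count is language-independent. A small sketch, assuming the `python_speech_features` package that earlier DeepSpeech releases used for feature extraction (`sample.wav` is a placeholder):

```python
# Sketch only: the feature count is a property of the acoustic front end,
# not of the language. Earlier DeepSpeech releases computed 26 MFCCs per
# frame -- the 26 mentioned in the code.
from scipy.io import wavfile
from python_speech_features import mfcc

rate, signal = wavfile.read("sample.wav")            # 16 kHz mono audio
features = mfcc(signal, samplerate=rate, numcep=26)  # 26 coefficients per frame
print(features.shape)                                # (num_frames, 26), German or English alike
```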
Hello @agarwalaashish20, thanks for your work, very interesting. Now that DeepSpeech 0.6.1 and the new dataset are available, will you keep updating this repo, or is the project over?
Great news. I will test how well your system works with my voice (which wasn't part of the last release) and let you know about it on GitHub.
Another possible dataset could be Tatoeba. Unfortunately there is no German dataset with audio files ready to download; one would have to write a script to download all sentences with audio in German. You can download the sentence lists here: https://tatoeba.org/deu/downloads
The ID of a sentence is also the name of the audio file, so it should be easy to script.
If you do this, please share the script; it could be very useful for a lot of languages.
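A rough sketch of such a downloader. Everything here is an assumption to check against https://tatoeba.org/deu/downloads: that the exports are tab-separated, that `sentences.csv` is (id, lang, text), that `sentences_with_audio.csv` starts with the sentence ID, and that audio is served from `https://audio.tatoeba.org/sentences/<lang>/<id>.mp3`:

```python
# Hypothetical Tatoeba audio downloader -- verify file formats and the
# audio URL scheme against the current Tatoeba downloads page.
import csv
import os
import urllib.request

AUDIO_URL = "https://audio.tatoeba.org/sentences/deu/{sid}.mp3"  # assumed scheme

# Collect the IDs of all German sentences from the sentences export.
german_ids = set()
with open("sentences.csv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row[1] == "deu":           # assumed columns: id, lang, text
            german_ids.add(row[0])

os.makedirs("audio", exist_ok=True)
with open("sentences_with_audio.csv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        sid = row[0]                  # sentence ID doubles as the audio file name
        if sid in german_ids:
            urllib.request.urlretrieve(AUDIO_URL.format(sid=sid),
                                       f"audio/{sid}.mp3")
```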
@stergro: OK, but strangely I couldn't find any link to download the individual files. Could you point me to a link where we can download one?
I believe name attribution only applies when you actually distribute the sound files; using them to train a neural network won't transfer the attribution requirement from the training data to the finished system. If you store the files in a public dataset, you can add a simple CSV with the attributions.
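For instance, a tiny sketch of such an attributions file (column names and rows are invented for illustration; Tatoeba's `sentences_with_audio` export already carries the username and license per sentence):

```python
# Write a minimal attributions CSV alongside the published dataset.
import csv

attributions = [
    ("12345", "some_user", "CC BY 4.0"),  # placeholder row
]

with open("attributions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["sentence_id", "username", "license"])
    writer.writerows(attributions)
```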
Could you explain how you go about creating these datasets? Once you have the .csv file, how do you upload it? Just plain old commonvoice.mozilla.org/sentence-collector with an attribution to Tatoeba?