DeepSpeech for German Language

I am trying to train an acoustic model for German using an ~80 hr German dataset, with the following parameters:

python -u \
  --train_files $exp_path/data/train.csv \
  --dev_files $exp_path/data/dev.csv \
  --test_files $exp_path/data/test.csv \
  --train_batch_size 12 \
  --dev_batch_size 12 \
  --test_batch_size 12 \
  --n_hidden 375 \
  --epoch 50 \
  --display_step 0 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 10 \
  --use_seq_length False \
  --coord_port 8686 \
  --export_dir $exp_path/model_export/ \
  --checkpoint_dir $exp_path/checkpoints/ \
  --decoder_library_path native_client/ \
  --alphabet_config_path $alphabet_path \
  --lm_binary_path $exp_path/lm.binary \
  --lm_trie_path $exp_path/trie 

What do you think is a good value for the n_hidden parameter?

I tried 375, 1024 and 2048 (early stop enabled), but I'm getting very high validation and test losses, even though the training losses are low.
For example:
With n_hidden = 375: WER = 0.582319, CER = 36.162546, loss = 146.159454
With n_hidden = 1024: WER = 0.759299, CER = 27.491103, loss = 101.068916

The models don't give anything close to correct output when tested with test wav files, but give perfect output on training wav files. It looks like the model has overfitted, even though early stop is enabled. Also, the training loss falls sharply to the ~20s while the validation loss stays high at ~100s.

Any suggestions on how to improve the test/validation loss?
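For reference, the early-stop flags in the command above (earlystop_nsteps, estop_mean_thresh, estop_std_thresh) implement roughly the following idea — this is a simplified Python sketch of the concept, not DeepSpeech's actual code:

```python
import statistics

def should_early_stop(dev_losses, nsteps=6, mean_thresh=0.1, std_thresh=0.1):
    """Stop when the last `nsteps` validation losses have gone flat:
    the total improvement over the window is below mean_thresh and the
    losses barely vary (population std below std_thresh)."""
    if len(dev_losses) < nsteps:
        return False
    window = dev_losses[-nsteps:]
    improvement = window[0] - window[-1]  # positive means still improving
    return improvement < mean_thresh and statistics.pstdev(window) < std_thresh
```

Note that early stop only halts training once the validation loss plateaus; it doesn't cure the overfitting itself. More data, a higher dropout_rate, or a smaller n_hidden usually help more with the train/dev gap you describe.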

Hi, may I ask you a question? Which version of DeepSpeech do you use? I see you use the parameter “--display_step”, but there is no such parameter in my version (when I run “./ --helpful”).

I’ve got exactly the same issue. Did you find an answer or get better results by changing the hyperparameters?


80 hrs of training material sounds quite small. Are you trying to build a general speech-to-text model that understands German in general, or one focused on some topic …?

@lissyx: When training DeepSpeech on German, do we need to change the number of FEATURES in the code? I saw the number of features mentioned somewhere in the code as 26, which corresponds to the 26 English letters. Do we need to set it to 29 for German?

You can either change the German Umlaute (ä, ö, ü and ß) to ae, … or add them to the alphabet file. Either way has advantages and disadvantages. You don’t change the number of features in the code.
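For the first option, the mapping is small enough to sketch. The helper below is hypothetical, but the digraph convention (ä→ae, ö→oe, ü→ue, ß→ss) is the standard German one:

```python
# Standard German ASCII digraph substitutions
UMLAUT_MAP = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def transliterate(text):
    """Lowercase a transcript and replace umlauts/ß with ASCII digraphs,
    so the default 26-letter English alphabet file can be reused."""
    text = text.lower()
    for src, dst in UMLAUT_MAP.items():
        text = text.replace(src, dst)
    return text
```

For the second option you would instead add ä, ö, ü and ß as extra lines in your alphabet file and keep the original characters in the transcripts.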


@othiele: Sorry if this question appears naive: is the number of features the same (i.e. 26) for English, German, or any other language? How many MFCC features are extracted from the audio signal?

You don’t need to change the number of features.
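To see why: the 26 refers to cepstral coefficients computed from the audio signal, not to letters, so it is language-independent. A rough numpy-only sketch of the usual MFCC pipeline (frame → power spectrum → mel filterbank → log → DCT; simplified, not DeepSpeech's actual feature code):

```python
import numpy as np

def mfcc_features(signal, sample_rate=16000, numcep=26, nfilt=26,
                  frame_len=0.025, frame_step=0.010, nfft=512):
    """Toy MFCC extractor: returns (n_frames, numcep) coefficients.
    numcep is a property of the signal processing, not of any alphabet."""
    flen, fstep = int(frame_len * sample_rate), int(frame_step * sample_rate)
    nframes = 1 + max(0, (len(signal) - flen) // fstep)
    frames = np.stack([signal[i * fstep:i * fstep + flen] for i in range(nframes)])
    frames = frames * np.hamming(flen)
    power = (np.abs(np.fft.rfft(frames, nfft)) ** 2) / nfft

    # nfilt triangular mel filters between 0 Hz and Nyquist
    def hz2mel(hz): return 2595 * np.log10(1 + hz / 700.0)
    def mel2hz(mel): return 700 * (10 ** (mel / 2595.0) - 1)
    mels = np.linspace(hz2mel(0), hz2mel(sample_rate / 2), nfilt + 2)
    bins = np.floor((nfft + 1) * mel2hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for m in range(1, nfilt + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    energies = np.log(power @ fbank.T + 1e-10)

    # DCT-II over the filterbank axis, keeping numcep coefficients
    n = np.arange(nfilt)
    dct = np.cos(np.pi * np.outer(np.arange(numcep), (2 * n + 1)) / (2 * nfilt))
    return energies @ dct.T
```

Whether the alphabet has 26 or 29 entries, the feature vector per frame stays the same size.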

If you are still looking for DeepSpeech results for German, check the paper and repository. They might be useful.


Hello @agarwalaashish20, thanks for your work, very interesting. Now that DeepSpeech 0.6.1 and the new dataset are available, will you keep updating this repo, or is this project over?

@stergro: Yes, we will keep on updating the repository with new datasets and deepspeech releases.

In case you find any other public datasets apart from Voxforge, Tuda-De, MCV, Mailabs and SWC, kindly let us know.

It would be great if you can also post a comment on GitHub, so that development activities can be prioritized.


Great news. I will test how well your system works with my voice (which wasn’t part of the last release) and let you know about it on GitHub.

Another possible dataset could be Tatoeba. Unfortunately there is no German dataset with audio files ready to download; one would have to write a script to download all sentences with audio in German. You can download the sentence lists here:
The ID of a sentence is also the name of the audio file, so it should be easy to script.

If you do this please share the script, this could be very useful for a lot of languages.
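A rough sketch of such a script's core. Note the column layout and the audio URL pattern below are assumptions for illustration — check the actual Tatoeba exports before relying on them:

```python
import csv

# Assumed URL pattern -- verify against the Tatoeba downloads page before use.
AUDIO_URL = "https://audio.tatoeba.org/sentences/{lang}/{id}.mp3"

def german_audio_urls(sentence_list_path):
    """Yield (sentence_id, text, url) for German entries of a Tatoeba
    sentence export, assumed tab-separated as: id, language, text.
    Since the sentence ID names the audio file, no extra lookup is needed."""
    with open(sentence_list_path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 3 and row[1] == "deu":
                yield row[0], row[2], AUDIO_URL.format(lang="deu", id=row[0])
```

Each yielded URL can then be fetched with any HTTP client and paired with its transcript to build a train.csv.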

@stergro: Thank you for the link. We would definitely share the scripts. I have two questions:

  1. What should be the approximate size of the dataset?
  2. Is it publicly licensed?

It is licensed as CC BY 2.0; some sentences are CC0. You can have a look at the data here, where it says that German has 23,222 recorded sentences.

@stergro: OK. But strangely, I couldn’t find any link to download the individual files. Could you point me to a link where we can download a file?

Check Downloads here and look into the audiomate source to see how they get the “de” files from it. The license should be fine for research.

While the sentences are mainly CC0 and CC BY 2.0 and simply attributable to “Tatoeba”, the audio is mostly unfree (83% CC BY-NC-ND, 8% CC BY-NC, 2% CC BY, 7% only for Tatoeba) and has to be attributed to the individual users.


The English Tatoeba dataset is already prepared and available in the datasets section lower on this page:

I believe name attribution only applies when you actually offer the sound file; using it to train a neural network won’t transfer the need to attribute from the training data to the finished system. If you store the files in a public dataset you can add a simple CSV with the attributions.
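Writing such an attribution CSV is straightforward with the stdlib. The column names here are made up for illustration — pick whatever fits the dataset:

```python
import csv

def write_attributions(rows, path):
    """Write (audio_file, author, license) tuples to a CSV shipped
    alongside the dataset, so each clip's attribution is preserved."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "author", "license"])
        writer.writerows(rows)
```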

Could you explain how you go about creating these datasets? Once you have the .csv file, how do you upload them? Just simple old with an attribution to Tatoeba?

Recent experience attempting DeepSpeech German:

I am using version 0.10.0-alpha.3 NuGet [], deepspeech-0.9.3-models.pbmm, and arctic_a0024.wav for English without problems.

However, when I use output_graph_de.pbmm [from polygot], which I understand is for DeepSpeech version 0.7, I could not get it to work.

For people who are working to distribute DeepSpeech German…

  • Please provide a sample German .wav file that will definitely work.
  • Please distribute the 0.10 version in pbmm format.

Thank you
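While waiting for a known-good sample, you can at least check that your wav matches the input format the released DeepSpeech models expect (16 kHz, mono, 16-bit PCM) with a small stdlib check:

```python
import wave

def is_deepspeech_compatible(path):
    """True if the wav is 16 kHz, mono, 16-bit PCM -- the audio format
    the released DeepSpeech acoustic models were trained on."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)
```

A file that fails this check (e.g. 44.1 kHz stereo) will still run through inference, but the transcripts are usually garbage — worth ruling out before blaming the model.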