What do you think is a good value for the n_hidden parameter?
I tried 375, 1024 and 2048 (with early stopping enabled), but I'm getting very high validation and test losses, even though the training losses are low.
For example:
With n_hidden = 375: WER = 0.582319, CER = 36.162546, loss = 146.159454
With n_hidden = 1024: WER = 0.759299, CER = 27.491103, loss = 101.068916
The models don't produce anything close to the expected output when tested with test WAV files, but give near-perfect output on the training WAV files. It looks like the model has overfitted even though early stopping is enabled. Also, the training loss falls sharply into the ~20s while the validation loss stays high at ~100s.
Any suggestions on how to improve the test/validation loss?
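For reference, the training invocation looks roughly like this (paths are placeholders, and the flag names are from the 0.x releases, so verify them against `./DeepSpeech.py --help`); raising `--dropout_rate` is one common lever against a train/validation gap like this:

```
# Sketch of a DeepSpeech 0.x training run; paths and values are placeholders.
python3 DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --n_hidden 1024 \
  --dropout_rate 0.25 \
  --early_stop True \
  --export_dir ./export
```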
Hi, may I ask you a question? Which version of DeepSpeech do you use? I noticed you use the "--display_step" parameter, but there is no such parameter in my version (https://github.com/mozilla/DeepSpeech) when I run "./DeepSpeech.py --helpful".
80 hours of training material sounds quite small. Are you trying to build a general speech-to-text model that understands German in general, or are you focusing on a specific topic?
@lissyx: When training DeepSpeech with German, do we need to change the number of FEATURES in the code? I saw the number of features mentioned somewhere in the code as 26, which I took to correspond to the 26 English letters. Do we need to set it to 29 for German?
You can either convert the German umlauts (ä, ö, ü and ß) to ae, … or add them to the alphabet file. Either way has advantages and disadvantages. You don't change the number of features in the code.
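For the first option, a minimal transliteration sketch (the helper name `normalize_transcript` is made up for illustration; the mappings themselves are standard German practice):

```python
# Map German umlauts to ASCII digraphs so transcripts fit a 26-letter alphabet file.
UMLAUT_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def normalize_transcript(text: str) -> str:
    """Replace German umlauts before writing the training CSVs."""
    return text.translate(UMLAUT_MAP)

print(normalize_transcript("Grüße aus München"))  # -> "Gruesse aus Muenchen"
```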
@othiele: Sorry if this question appears naive: is the number of features the same (i.e. 26) for English, German, or any other language? How many MFCC features are extracted from the audio signal?
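To make the distinction concrete: the 26 in the code are MFCC coefficients computed per audio frame, not letters, so the count is language-independent. A small sketch, assuming the `python_speech_features` package that earlier DeepSpeech releases used for feature extraction (`sample.wav` is a placeholder):

```python
# Sketch only: the feature count is a property of the acoustic front end,
# not of the language. Earlier DeepSpeech releases computed 26 MFCCs per
# frame -- the 26 mentioned in the code.
from scipy.io import wavfile
from python_speech_features import mfcc

rate, signal = wavfile.read("sample.wav")            # 16 kHz mono audio
features = mfcc(signal, samplerate=rate, numcep=26)  # 26 coefficients per frame
print(features.shape)                                # (num_frames, 26), German or English alike
```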
Hello @agarwalaashish20, thanks for your work, very interesting. Now that DeepSpeech 0.6.1 and the new dataset are available, will you keep updating this repo, or is the project over?
Great news. I will test how well your system works with my voice (which wasn't part of the last release) and let you know about it on GitHub.
Another possible dataset could be Tatoeba. Unfortunately there is no German dataset with audio files ready to download; one would have to write a script to download all sentences with audio in German. You can download the sentence lists here: https://tatoeba.org/deu/downloads
The ID of a sentence is also the name of the audio file, so it should be easy to script.
If you do this, please share the script; it could be very useful for a lot of languages.
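A rough sketch of such a downloader. Everything here is an assumption to check against https://tatoeba.org/deu/downloads: that the exports are tab-separated, that `sentences.csv` is (id, lang, text), that `sentences_with_audio.csv` starts with the sentence ID, and that audio is served from `https://audio.tatoeba.org/sentences/<lang>/<id>.mp3`:

```python
# Hypothetical Tatoeba audio downloader -- verify file formats and the
# audio URL scheme against the current Tatoeba downloads page.
import csv
import os
import urllib.request

AUDIO_URL = "https://audio.tatoeba.org/sentences/deu/{sid}.mp3"  # assumed scheme

# Collect the IDs of all German sentences from the sentences export.
german_ids = set()
with open("sentences.csv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row[1] == "deu":           # assumed columns: id, lang, text
            german_ids.add(row[0])

os.makedirs("audio", exist_ok=True)
with open("sentences_with_audio.csv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        sid = row[0]                  # sentence ID doubles as the audio file name
        if sid in german_ids:
            urllib.request.urlretrieve(AUDIO_URL.format(sid=sid),
                                       f"audio/{sid}.mp3")
```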
@stergro: OK, but strangely I couldn't find any link to download the individual files. Could you point me to a link where we can download one?
I believe name attribution only applies when you actually distribute the sound files; using them to train a neural network won't transfer the attribution requirement from the training data to the finished system. If you store the files in a public dataset, you can add a simple CSV with the attributions.
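For instance, a tiny sketch of such an attributions file (column names and rows are invented for illustration; Tatoeba's `sentences_with_audio` export already carries the username and license per sentence):

```python
# Write a minimal attributions CSV alongside the published dataset.
import csv

attributions = [
    ("12345", "some_user", "CC BY 4.0"),  # placeholder row
]

with open("attributions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["sentence_id", "username", "license"])
    writer.writerows(attributions)
```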
Could you explain how you go about creating these datasets? Once you have the .csv file, how do you upload it? Just plain old commonvoice.mozilla.org/sentence-collector with an attribution to Tatoeba?