DeepSpeech for German Language

Hello,

80 hours of training material sounds quite small. Are you trying to build a general speech-to-text model that understands German in general, or to focus on some topic …?

@lissyx: When training DeepSpeech on German, do we need to change the number of features in the code? I saw the number of features set to 26 somewhere in the code, which I assumed corresponds to the 26 letters of the English alphabet. Do we need to set it to 29 for German?

You can either replace the German umlauts and eszett (ä, ö, ü and ß) with ae, … or add those characters to the alphabet file. Both approaches have advantages and disadvantages. Either way, you don’t change the number of features in the code.
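The two options above can be sketched roughly as follows. This is only an illustration, not code from DeepSpeech: the helper names `transliterate` and `alphabet_from` are hypothetical, and the oe/ue/ss replacements are the conventional German transliterations, assumed here to be what the "…" stands for.

```python
# Hypothetical helpers for preparing German transcripts; not part of DeepSpeech.

# Option 1: replace the special characters with ASCII digraphs.
UMLAUT_MAP = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def transliterate(text: str) -> str:
    """Lowercase the text and replace ä, ö, ü and ß with ASCII digraphs."""
    text = text.lower()
    for src, dst in UMLAUT_MAP.items():
        text = text.replace(src, dst)
    return text


# Option 2: keep the characters and collect them for the alphabet file.
def alphabet_from(transcripts):
    """Return the sorted set of characters that occur in the transcripts."""
    return sorted({ch for line in transcripts for ch in line.lower()})
```

With option 1 the alphabet stays the same as for English, but words such as "Masse" and "Maße" become indistinguishable. With option 2 the transcripts stay faithful to German spelling, at the cost of a slightly larger alphabet.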

@othiele: Sorry if this question seems naive, but is the number of features the same (26) for English, German, or any other language? How many MFCC features are extracted from the audio signal?

You don’t need to change the number of features.

If you are still looking for DeepSpeech results on German, check the paper and repository. They might be useful.

https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech

Hello @agarwalaashish20, thanks for your work, it is very interesting. Now that DeepSpeech 0.6.1 and the new dataset are available, will you keep updating this repo, or is this project over?

@stergro: Yes, we will keep on updating the repository with new datasets and deepspeech releases.

In case you find any other public datasets apart from Voxforge, Tuda-De, MCV, Mailabs and SWC, kindly let us know.

It would be great if you can also post a comment on GitHub, so that development activities can be prioritized.

Great news. I will test how well your system works with my voice (which wasn’t part of the last release) and let you know about it on GitHub.

Another possible dataset could be Tatoeba. Unfortunately, there is no German dataset with audio files ready to download, so one would have to write a script to download all German sentences that have audio. You can download the sentence lists here: https://tatoeba.org/deu/downloads
The ID of a sentence is also the name of its audio file, so this should be easy to script.

If you do this, please share the script. It could be very useful for a lot of languages.
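A script along these lines might start as in the sketch below. It rests on two assumptions that should be checked against the Tatoeba downloads page before use: that the sentence list is tab-separated with the columns ID, language code, and text, and that audio files follow the URL pattern in `AUDIO_URL`. The function name and the `ids_with_audio` parameter are hypothetical, and the actual download step is left out.

```python
import csv
import io

# Assumption: the audio URL pattern. Verify it against the Tatoeba site first.
AUDIO_URL = "https://audio.tatoeba.org/sentences/{lang}/{sid}.mp3"


def german_audio_urls(sentences_tsv, ids_with_audio, lang="deu"):
    """Yield (id, text, url) for sentences in `lang` that have audio.

    sentences_tsv:  iterable of lines, assumed format "id<TAB>lang<TAB>text".
    ids_with_audio: set of sentence IDs (as strings) known to have recordings.
    """
    reader = csv.reader(sentences_tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip malformed lines
        sid, row_lang, text = row[0], row[1], row[2]
        if row_lang == lang and sid in ids_with_audio:
            yield sid, text, AUDIO_URL.format(lang=lang, sid=sid)


# Usage with a tiny in-memory sample instead of the real export:
sample = io.StringIO("1\tdeu\tHallo Welt.\n2\teng\tHello world.\n")
for sid, text, url in german_audio_urls(sample, {"1", "2"}):
    print(sid, text, url)
```

Because the language code is a parameter, the same function would work for other languages with recordings on Tatoeba.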

@stergro: Thank you for the link. We will definitely share the scripts. I have two questions:

  1. What is the approximate size of the dataset?
  2. Is it publicly licensed?

It is licensed as CC BY 2.0, some sentences are CC0. You can have a look into the data here, where it says that German has 23 222 recorded sentences.

@stergro: OK. But strangely, I couldn’t find any link to download the individual files. Could you point me to a link where we can download a file?

Check the Downloads page here and look at the audiomate source to see how they get the “de” files from it. The license should be fine for research.

While the sentences are mainly CC0 and CC BY 2.0 and simply attributable to “Tatoeba”, the audio is mostly non-free (83% CC BY-NC-ND, 8% CC BY-NC, 2% CC BY, 7% only for Tatoeba; see http://downloads.tatoeba.org/exports/sentences_with_audio.tar.bz2) and has to be attributed to the individual users (https://en.wiki.tatoeba.org/articles/show/faq).

The English Tatoeba dataset is already prepared and available in the datasets section further down this page: https://voice.mozilla.org/en/datasets

I believe name attribution only applies when you actually distribute the sound file. If you use it to train a neural network, the need to attribute the name does not carry over from the training data to the finished system. If you store the files in a public dataset, you can add a simple CSV with the attributions.
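Such an attribution file could be produced with a few lines of Python. This is only a sketch: the column names in `FIELDS` and the example row are hypothetical, and which fields a given license actually requires is not settled by this thread.

```python
import csv
import io

# Hypothetical column names for the attribution file.
FIELDS = ["audio_file", "contributor", "license"]


def write_attributions(rows, out):
    """Write one attribution row per audio file to the open text stream `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage with an in-memory buffer; a real script would open a file instead.
buf = io.StringIO()
write_attributions(
    [{"audio_file": "123.mp3", "contributor": "example_user", "license": "CC BY 2.0"}],
    buf,
)
print(buf.getvalue())
```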

Could you explain how you go about creating these datasets? Once you have the .csv file, how do you upload it? Just through plain old commonvoice.mozilla.org/sentence-collector with an attribution to Tatoeba?

My recent experience attempting to use DeepSpeech for German:

I am using version 0.10.0-alpha.3 nuget [libdeepspeech.so], deepspeech-0.9.3-models.pbmm, and arctic_a0024.wav for English without problem.

However, when I use output_graph_de.pbmm [from polyglot], which I understand is for DeepSpeech version 0.7, I could not get it to work.

For people who are working to distribute DeepSpeech German…

  • Please provide a sample German.wav that will definitely work.
  • Please distribute the 0.10 version in pbmm

Thank you
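When a model works with one WAV file but not another, the audio format is a cheap first thing to rule out. The sketch below uses only Python's standard `wave` module. The released English DeepSpeech models expect 16 kHz, mono, 16-bit PCM audio; that a community German model expects the same is an assumption here, and the function name `wav_problems` is hypothetical.

```python
import wave


def wav_problems(path, expected_rate=16000):
    """Return a list of format mismatches for the WAV file at `path`.

    Assumption: the model expects mono, 16-bit PCM at `expected_rate` Hz.
    An empty list means the file matches that format.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz"
            )
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

Including the output of a check like this, together with the exact error message, makes a problem report much easier to act on.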

If you are reaching out for help with the German model produced by the community, please:

  • at least follow the discourse guidelines
  • share your error instead of saying “I could not get it to work”: that is not actionable, it leaves people unable to help you, and it’s inefficient for everybody
  • start a new thread instead of hijacking a year-old dormant thread
  • contact upstream and not downstream?

@lissyx: Sorry if I seemed disrespectful. That was not my intention.

I think that may be part of your problem, as I can confirm that the polyglot and ds-german models work with DeepSpeech v0.9.3.
