German dataset doesn't work for training

wagnrd · March 19, 2019, 3:51pm

I’m trying to train a German model with DeepSpeech. But when I downloaded the German dataset from the website, I noticed that the folder names and the whole structure of the TSV files differ from the English ones, which I had downloaded with the import_cv.py script (those were even CSV files).

I tried to unpack the German dataset with the import_cv.py script, hoping it would reformat the files and folders as it did with the English dataset. But it didn’t.

Is there some kind of converter, or do I have to write one myself? And why aren’t the files already in the right form so DeepSpeech.py can use them?

kdavis · March 19, 2019, 3:53pm

Did you try using import_cv2.py?

wagnrd · March 19, 2019, 6:59pm

I hadn’t, but with import_cv2.py it works.
Thank you.
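For anyone who hits the same problem: roughly speaking, the importer has to turn each Common Voice TSV (tab-separated, with columns such as `path` and `sentence`) into the CSV layout DeepSpeech.py reads (`wav_filename`, `wav_filesize`, `transcript`), after converting the mp3 clips to wav. The following is a rough, hypothetical sketch of the metadata step only. It assumes the clips have already been converted to .wav files with the same base names. It is not the code of import_cv2.py, and the function name `tsv_to_deepspeech_csv` is made up for illustration.

```python
import csv
import os

# Column names assumed from the Common Voice TSV files (e.g. train.tsv).
TSV_PATH_COL = "path"
TSV_TEXT_COL = "sentence"

# Header used by DeepSpeech's training CSVs.
DS_FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def tsv_to_deepspeech_csv(tsv_path, csv_path, wav_dir):
    """Hypothetical helper: rewrite a Common Voice TSV as a DeepSpeech CSV.

    Assumes every clip listed in the TSV has already been converted to a
    .wav file with the same base name inside wav_dir. Rows whose .wav file
    is missing are skipped. Returns the number of rows written.
    """
    written = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_NONE so quote characters inside sentences are kept literally.
        reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.DictWriter(dst, fieldnames=DS_FIELDS)
        writer.writeheader()
        for row in reader:
            # Drop the .mp3 extension (if any) and point at the converted .wav.
            base = os.path.splitext(row[TSV_PATH_COL])[0]
            wav_path = os.path.abspath(os.path.join(wav_dir, base + ".wav"))
            if not os.path.isfile(wav_path):
                continue
            writer.writerow({
                "wav_filename": wav_path,
                "wav_filesize": os.path.getsize(wav_path),
                "transcript": row[TSV_TEXT_COL].strip(),
            })
            written += 1
    return written
```

In practice, import_cv2.py also handles the audio conversion, so using it is the simpler route. The sketch just shows why the TSV files can't be fed to DeepSpeech.py directly.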

agarwalaashish20 · October 22, 2019, 8:58am

@wagnrd: If you are still looking for DeepSpeech results on German, check out this paper and its repository. They might be useful.

https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech