Thread: Discussion on how to manage repositories on GitHub for DeepSpeech work in other languages

Hi everyone, I am raising this thread for discussion around how to handle the GitHub repository (or repositories) for DeepSpeech work in other languages.

For example, @lissyx has done some excellent work in French (fr) using Common Voice data. This work has been the basis of other projects, such as the Kabyle effort.

The principles we are trying to satisfy here are:

  • Reduce re-work required to get DeepSpeech working on a new language (reduce developer time)
  • Reduce maintenance overhead
  • Reduce support needs on this channel
  • Reduce fragmentation of DeepSpeech work across GitHub

I can see the following options:

Option 1 - Fork the DeepSpeech work from commonvoice-fr to a new GitHub repo under the Mozilla organization

Example

https://github.com/mozilla/DeepSpeech-fr

How it would work

  • There would be a new repo for work in each language - i.e. DeepSpeech-kab for Kabyle, DeepSpeech-it for Italian, DeepSpeech-mri for Te Reo Māori, and so on. Each repo could be forked for work on a new language. For example, Te Reo Māori is close to Hawaiian and other Pasifika languages, and DeepSpeech-mri would be a natural starting point for DeepSpeech work in those languages.

Pros

  • Work on different languages is easily separated, and relevant NLP tools can be added for that language.
  • Each language community is likely to be reasonably small, so merging commits is likely to be easier.

Cons

  • Much harder to keep language-specific work up to date. Each time DeepSpeech is updated to a new version, every language-specific repo goes out of date, and keeping them all current requires a lot of maintenance.
  • Language-specific repos are likely to fall behind the current version of DeepSpeech, which will in turn incur a higher support load in this Discourse channel.

Option 2 - Push the Docker work from commonvoice-fr into the PlayBook

Example

The Dockerfile.train file in the commonvoice-fr repo has been heavily customized from the Dockerfile.train that ships with DeepSpeech, and uses a range of bash scripts for the French version. The customized file would likely be provided on the environment page of the PlayBook, which has a section on customizing Docker files.

Pros

  • Many of the people using the PlayBook will want to train on languages other than English, which is the default that DeepSpeech is set up for. Providing the customized Dockerfile.train will reduce their setup time.

Cons

  • The PlayBook uses the Dockerfile.train that ships with DeepSpeech. If the PlayBook instructs users to use the Dockerfile work from commonvoice-fr, it will add complexity to the PlayBook.
  • As people customize their DeepSpeech Dockerfiles, there isn’t really a way for that work to be shared back with the community. That is, the work is likely to end up fragmented across many repos.

Option 3 - Create a directory structure within the DeepSpeech repo that provides for multiple languages

Example

Within the GitHub repo for DeepSpeech, create directory structures that better accommodate different languages. For example, rename alphabet.txt to alphabet-EN.txt and so on. The Dockerfile would be changed to take a language as a parameter.
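As a rough illustration, a thin wrapper could resolve per-language files from a single parameter. This is a sketch, not existing DeepSpeech tooling: the alphabet-XX.txt naming follows the example above, the wrapper itself is hypothetical, and only the `--alphabet_config_path` training flag comes from DeepSpeech.

```python
#!/usr/bin/env python3
"""Hypothetical launcher: pick per-language files from one --language flag."""
import argparse
import pathlib
import subprocess

parser = argparse.ArgumentParser(description="Train DeepSpeech for one language")
parser.add_argument("--language", default="en", help="language code, e.g. en, fr, kab")
args = parser.parse_args()

# Resolve the per-language alphabet, e.g. data/alphabet-FR.txt for --language fr.
alphabet = pathlib.Path("data") / f"alphabet-{args.language.upper()}.txt"
if not alphabet.exists():
    raise SystemExit(f"No alphabet for '{args.language}' - add {alphabet} to support it")

# Delegate to the regular training entry point with the resolved file.
subprocess.run(
    ["python", "DeepSpeech.py", "--alphabet_config_path", str(alphabet)],
    check=True,
)
```

Inside Docker, the same idea could be a build argument (e.g. `docker build --build-arg LANGUAGE=fr ...`) that selects which files are copied into the image.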

Pros

  • Reduces maintenance overhead; if DeepSpeech is updated then all languages are updated at the same time.
  • Makes it a lot easier to start training in another language, which in turn reduces barriers for mobilizing language communities.
  • Makes it a lot easier to encourage contributions to DeepSpeech from language communities.

Cons

  • This will require significant effort to achieve.
  • May make the DeepSpeech package larger, which is something we want to avoid.
  • Significant effort would be required to merge in the various language work that has been done around DeepSpeech.

Discussion and comments warmly welcomed.

Hi @kreid,
In general, I think this is a great idea.

I just wanted to let you know that I have already done some work in this area with my DeepSpeech-Polyglot project:

Pros:

  • Currently supports 5 different languages (German, Spanish, French, Italian, Polish)
  • Covers the whole training process, with data preprocessing, language model building, training and exporting.
  • Adding support for new languages is also very easy: you just have to add a new alphabet_xx.txt file and extend the special-words and character-replacement file (langdicts.json) - see the sketch after this post.

Cons:

  • I will soon drop support for direct integration into DeepSpeech, because I’m trying to replace it with an improved network architecture (I’m not finished with it yet).
    I’m open to a full integration of the exported networks into DeepSpeech again, but this requires some effort, mainly in the native client code, and currently I don’t have the time for it (I already started a discussion about this here: Integration of DeepSpeech-Polyglot's new networks).
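As a sketch of what the replacement step above might look like: the file names alphabet_xx.txt and langdicts.json come from the post, but the JSON schema and the normalize() helper below are assumptions for illustration, not DeepSpeech-Polyglot's actual code.

```python
import json

# Assumed langdicts.json shape (illustrative only):
# {"de": {"replacements": {"ß": "ss", "ä": "ae"},
#         "special_words": {"%": "prozent", "€": "euro"}}}

def normalize(text: str, lang: str, dicts_path: str = "langdicts.json") -> str:
    """Apply per-language special-word and character replacements."""
    with open(dicts_path, encoding="utf-8") as f:
        entry = json.load(f)[lang]
    for token, spoken in entry["special_words"].items():
        text = text.replace(token, f" {spoken} ")   # "10%" -> "10 prozent "
    for char, repl in entry["replacements"].items():
        text = text.replace(char, repl)             # "Straße" -> "Strasse"
    return " ".join(text.split())                   # collapse extra whitespace

# e.g. normalize("Straße 10%", "de") -> "Strasse 10 prozent"
```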

I’m in favour of Option #1; if we start out thinking of 1000-language support, then Option #3 is not really viable. Better to come up with a method of automatically updating when a new DS version comes out. In other projects this is done by having a core module and then language-specific modules; the core module would include code that provides an interface to the language-specific modules. That way, updates could be made without breaking the end API.
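Purely as an illustration of that core-plus-language-modules split (none of these names exist in DeepSpeech; this is a sketch of the pattern, not a proposal for its API):

```python
"""Core module: exposes one stable lookup; languages plug themselves in."""
from dataclasses import dataclass
from typing import Callable, Dict

def _identity(text: str) -> str:
    return text

@dataclass
class LanguagePack:
    """Everything the core needs to know about one language."""
    code: str                                     # e.g. "fr" or "kab"
    alphabet: str                                 # path to the alphabet file
    normalize: Callable[[str], str] = _identity   # per-language text cleanup hook

_REGISTRY: Dict[str, LanguagePack] = {}

def register(pack: LanguagePack) -> None:
    """Each language module calls this once at import time."""
    _REGISTRY[pack.code] = pack

def get(code: str) -> LanguagePack:
    """The stable end API: adding a language never changes this signature."""
    return _REGISTRY[code]

# A language module is then just a registration call:
register(LanguagePack(code="fr", alphabet="languages/fr/alphabet.txt"))
register(LanguagePack(code="kab", alphabet="languages/kab/alphabet.txt"))
```

A new DeepSpeech release would then only need to keep get() stable, and language modules could update on their own schedule.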


What about a combination of Option #1 and Option #3: create one extra repository for all the language-specific material?
This would prevent fragmentation into 1000 different repos for 1000 languages, but I don’t think it would grow too big either, because for most languages you only need to add a new alphabet file, plus some README steps with dataset-specific instructions.


Thanks, this is what I’ve been advocating for.

We are really focusing on the training pipeline work I conducted here.


What are the next steps here?

I have been collecting alphabets, validation scripts, and so on for all the Common Voice languages here.
