Question about multi-language support

I am wondering whether there will be support for other languages such as German, French, Italian and so on. For English there are already many hours (>1000) of free speech data available, e.g. from the LibriSpeech project (http://www.openslr.org/12/) or from VoxForge (http://www.voxforge.org). However, for all other languages the data situation is much worse, so the need to collect data for them is much greater. For English, the best thing to do is probably to integrate an ASR system into Mozilla and collect real user data.

According to this post, multi-language support is supposed to be launched this week.

I guess a lot of people are really looking forward to contributing!

Hello both,

Indeed, multi-language support has launched, in that you can now localize the website into 30 languages (there is a drop-down in the upper-right corner of the page).

However, we cannot yet collect voice data in any language other than English. We are working right now to collect sentences for people to read in these new languages. Stay tuned for more info about that.

Thanks!
Michael

Where can we contribute non-English sentences?

Create files under server/data/${TWO_LETTER_LANGUAGE_CODE}/ and make a pull request?

@zeno For now, you can put them in a pastebin and send me the link. We are working on an official process for this right now.

400 German sentences you could say to your digital assistant.

https://pastebin.mozilla.org/9085019

Sorted, deduplicated, spell-checked.

I came up with them myself, and hereby put them into the public domain.
I set the pastebin to expire after 1 month; let me know if a permanent one would be better for you guys.
More to come …
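
For anyone preparing their own list, here is a minimal sketch of the sorting and deduplication step, assuming one sentence per line. The function name `prepare_sentences` is hypothetical and this is not the script used for the pastebin above; spell-checking is left out.

```python
def prepare_sentences(lines):
    """Normalize whitespace, drop blank lines, remove exact duplicates, and sort."""
    # Collapse runs of whitespace and trim both ends of every line.
    cleaned = (" ".join(line.split()) for line in lines)
    # A set removes exact duplicates; empty strings are skipped.
    unique = {sentence for sentence in cleaned if sentence}
    # Plain code-point ordering; not locale-aware collation.
    return sorted(unique)


if __name__ == "__main__":
    raw = [
        "Wie spät ist es?",
        "  Wie spät  ist es? ",
        "",
        "Mach das Licht an.",
    ]
    for sentence in prepare_sentences(raw):
        print(sentence)
```

Note that only exact duplicates (after whitespace normalization) are removed; sentences differing in case or punctuation are kept as separate entries.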

This book by Stefan Zweig on Project Gutenberg is out of copyright in the US and Germany:

http://www.mirrorservice.org/sites/ftp.ibiblio.org/pub/docs/books/gutenberg/2/4/1/7/24173/24173-8.txt

Only the mirror site is accessible from Germany as gutenberg.org blocks all German IPs (https://cand.pglaf.org/germany/index.html)

Thanks for the sentences @zeno! I’ll put them in our review pipeline, and hopefully we’ll get German up and running in the next week or so.

Good suggestion @flx, Project Gutenberg is indeed one of the places we ask people to look.