Have you thought about linking with LibriVox and Lingua Libre as sources? Both are free and community-driven. I have already recorded lots of words for Lingua Libre, for example.
For the Italian model we are evaluating https://github.com/MozillaItalia/DeepSpeech-Italian-Model/issues/25
Yes, for the French model I am experimenting with, we are using Lingua Libre, and are in touch with its developers for feedback.
There is this import script for Lingua Libre in the DeepSpeech repo:
I found an example of how to use it:
$ ./bin/import_lingua_libre 'path_to_download' --qId 21 --iso639-3 fra --english-name French --normalize (optional)
But I don't completely understand what it does, or what all of the parameters mean, especially the one that looks like a Wikidata ID (--qId 21).
Edit: It turns out the parameters are just the parts of the file name, which you can look up on Index of /datasets/.
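In case it helps anyone else, here is a minimal sketch of how I read that mapping from the three parameters to a file name. The `Q<qId>-<iso639-3>-<english-name>.zip` pattern and the helper name `lingua_libre_archive_name` are my own assumptions for illustration, not something taken from the importer itself, so please check the result against the actual listing.

```python
def lingua_libre_archive_name(q_id: int, iso639_3: str, english_name: str) -> str:
    """Combine the three importer parameters into one archive file name.

    Assumption: the listing uses the pattern "Q<qId>-<iso639-3>-<english-name>.zip".
    Verify this against the real file names before relying on it.
    """
    return f"Q{q_id}-{iso639_3}-{english_name}.zip"


# Same values as in the example command (--qId 21, --iso639-3 fra, --english-name French)
print(lingua_libre_archive_name(21, "fra", "French"))  # prints Q21-fra-French.zip
```

If that pattern holds, picking another language should just be a matter of reading the three parts off its file name in the listing and passing them to the script.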