I just wanted to share details of the scripts we’ve developed at Bangor University that bring together the various features of DeepSpeech, along with Common Voice data, and provide a complete solution for producing models and scorers for Welsh-language speech recognition. They may be of interest to other DeepSpeech users working with a similarly lesser-resourced language.
The scripts:
are based on DeepSpeech 0.7.4
make use of DeepSpeech’s Dockerfiles (so setup and installation are easier)
train with Common Voice data
utilize transfer learning (see the sketch after this list)
with some additional test sets and corpora, produce optimized scorers/language models for various applications
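For anyone curious what the core of the pipeline looks like, here is a rough sketch of the import and transfer-learning steps against a DeepSpeech 0.7.4 checkout. All paths, the checkpoint directories and the hyperparameters are placeholders rather than the exact values our scripts use:

```bash
# Import the Common Voice Welsh release into DeepSpeech's CSV format
# (bin/import_cv2.py ships with DeepSpeech; the corpus path is a placeholder).
python bin/import_cv2.py --filter_alphabet data/alphabet.txt /data/cv-corpus/cy

# Fine-tune from a released English checkpoint, dropping the output layer
# so it can be re-initialised for the Welsh alphabet (transfer learning).
python DeepSpeech.py \
  --train_files /data/cv-corpus/cy/clips/train.csv \
  --dev_files   /data/cv-corpus/cy/clips/dev.csv \
  --test_files  /data/cv-corpus/cy/clips/test.csv \
  --alphabet_config_path data/alphabet.txt \
  --load_checkpoint_dir /checkpoints/deepspeech-0.7.4-en \
  --save_checkpoint_dir /checkpoints/cy \
  --drop_source_layers 1 \
  --epochs 10 \
  --export_dir /models/cy
```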
At the moment these models are used in two prototype applications which the Welsh-speaking community can install and try: a Windows/C#-based transcriber and an Android/iOS voice assistant app called Macsen. Source code for these applications using DeepSpeech can also be found on GitHub.
We are immensely grateful to Mozilla for creating the Common Voice and DeepSpeech projects.
Thanks! I hadn’t seen that repo before. Is it linked to in the docs? I think we’re similarly trying to join up the dots into concrete and useful ‘pipelines’ that are probably not very language-specific, despite repo names with ‘-fr’ and ‘-cy’ and non-English descriptions.
I’ll take a closer look.
lissyx:
Nope, because it’s not “official” and still a work in progress. To the best of my knowledge, it’s used by community members for Spanish and Italian, and I helped a student working on a Kabyle model based on this repo.
Unifying efforts seems like a very good thing; your feedback from having grown a similar solution would be valuable.
That’s too bad. It would be good to have a section for user-contributed model training examples, similar to the current User contributed examples section of the DeepSpeech 0.7.4 documentation. Or, if there’s to be a repository of user-contributed models (guessing from the new metadata parameters in exporting(?)), then it would be useful to have a link to the training scripts.
I’ve studied your scripts a bit more, seen what’s common and different, and I’d like to borrow from them in order to improve our repo. The only feedback I could give you (for now) is that you could add a call to lm_optimizer.py in your build-lm.sh script(?) It’s worthwhile in our case: I’ve observed the WER for our transcribing model drop from 33% to 27%.
The documentation could explain this script a bit better, with an example usage such as the one I figured out from reading the source:
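Something along these lines, reconstructed from the 0.7.4 flags (paths are placeholders for your own checkpoint, test set and scorer):

```bash
# Search for the best lm_alpha/lm_beta for an existing scorer, scoring
# each trial against the test set with a trained acoustic model.
python lm_optimizer.py \
  --test_files data/test.csv \
  --checkpoint_dir /checkpoints/cy \
  --alphabet_config_path data/alphabet.txt \
  --scorer_path data/kenlm.scorer
```

The reported alpha and beta can then be baked into the scorer package via generate_package.py’s --default_alpha/--default_beta flags. There also appears to be an --n_trials flag controlling the number of trials, though I’d verify that against your checkout.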
(I’m not sure, however, why it attempts over 2,000 trials to find the best alpha and beta values; it has already found the optimal ones by around trial 200. At 2,000 trials it takes much longer to optimize the LM than it does to train the acoustic models with transfer learning. I’ve left it as is for now, since the test sets are small for the time being and almost everything else is using default settings.)
Hwyl! (Cheers!)
lissyx:
Indeed, but at the same time exposing it means supporting it. Until it’s solid enough, it would add more trouble for little gain.
This is something we are working on, yes.
I’ve separated building the LM from evaluating its params, which is done in evaluate_lm.sh.
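(For anyone following along, here is a minimal sketch of the “building” half using the stock 0.7.4 tooling; the vocabulary size, pruning and quantisation settings are illustrative, not necessarily what the repo uses. The tuning half is then the lm_optimizer.py call shown earlier in the thread.)

```bash
# Build a KenLM language model from a text corpus...
python data/lm/generate_lm.py \
  --input_txt corpus.txt --output_dir lm/ \
  --top_k 500000 --kenlm_bins /kenlm/build/bin \
  --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
  --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

# ...then package it, with default alpha/beta, as a .scorer file.
python data/lm/generate_package.py \
  --alphabet data/alphabet.txt \
  --lm lm/lm.binary --vocab lm/vocab-500000.txt \
  --package lm/kenlm.scorer \
  --default_alpha 0.93 --default_beta 1.18  # e.g. values from lm_optimizer.py
```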
Those are just the default values picked from DeepSpeech.
lissyx:
Yes, there is mostly no documentation, again because it’s too much of a WIP and I lack feedback on how useful it is to people.