Sharing our DeepSpeech training scripts and models for Welsh

I just wanted to share details of the scripts we’ve developed at Bangor University that bring together the various features of DeepSpeech, along with CommonVoice data, and provide a complete solution for producing models and scorers for Welsh-language speech recognition. They may be of interest to other users of DeepSpeech who are working with a language that, like Welsh, is under-resourced.

The scripts:

  • are based on DeepSpeech 0.7.4
  • make use of DeepSpeech’s Dockerfiles (so setup and installation is easier)
  • train with CommonVoice data
  • utilize transfer learning
  • with some additional test sets and corpora, produce optimized scorers/language models for various applications
  • export models with metadata (a rough sketch of how these pieces fit together follows below)
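
For anyone curious how those pieces fit together on the command line, here’s a simplified sketch of the kind of DeepSpeech.py invocation the scripts build up to. The paths, checkpoint locations and hyperparameter values here are illustrative rather than the exact settings from our scripts, so please check the repo for the real commands:

```bash
# Import the CommonVoice Welsh release into DeepSpeech-style CSVs
python bin/import_cv2.py --filter_alphabet /data/alphabet.txt /data/cv-corpus/cy

# Fine-tune from an English checkpoint (transfer learning), then export a
# model that carries metadata about its provenance.
python DeepSpeech.py \
  --train_files /data/cv-corpus/cy/clips/train.csv \
  --dev_files   /data/cv-corpus/cy/clips/dev.csv \
  --test_files  /data/cv-corpus/cy/clips/test.csv \
  --alphabet_config_path /data/alphabet.txt \
  --load_checkpoint_dir /checkpoints/deepspeech-0.7.4-en \
  --save_checkpoint_dir /checkpoints/deepspeech-cy \
  --drop_source_layers 2 \
  --epochs 10 \
  --export_dir /models/cy \
  --export_author_id techiaith \
  --export_model_name welsh-stt \
  --export_model_version 20.06 \
  --export_language cy
```

Scorer/language-model building is a separate step handled by the repo’s own scripts (more on that further down the thread).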

The initial README describes how to get started.
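
Since everything goes through DeepSpeech’s Dockerfiles, getting started boils down to roughly the following (the image tag and mount points here are simplified for illustration; the README has the exact commands):

```bash
# Build the training image and start a container with the data and
# checkpoint directories mounted in (tag and paths are illustrative only).
git clone https://github.com/techiaith/docker-deepspeech-cy.git
cd docker-deepspeech-cy
docker build -t techiaith/deepspeech-cy-train .
docker run --gpus all -it \
  -v $(pwd)/data:/data \
  -v $(pwd)/checkpoints:/checkpoints \
  techiaith/deepspeech-cy-train bash
```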

We’d also like to share the models produced by these scripts, which can be found at https://github.com/techiaith/docker-deepspeech-cy/releases/tag/20.06

At the moment these models are used in two prototype applications that the Welsh-speaking community can install and try: a Windows/C#-based transcriber and an Android/iOS voice assistant app called Macsen. Source code for these applications, which use DeepSpeech, can also be found on GitHub.

We are immensely grateful to Mozilla for creating the Common Voice and DeepSpeech projects.

Thanks for sharing, what a great project.

Why don’t you also add it under this thread so people can find it more easily in the future?

Helo!

I already did in January: Deep Speech in the Wild!

But added as suggested. Thanks!

Oh s****, should have checked, sorry.

I meant, of course, to post an update. Still, great work, thanks.

I’m just discovering your work, which is cool, and also similar to what I have in https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train

Thanks! I hadn’t seen that repo before. Is it linked to in the docs? I think we’re similarly trying to join up the dots into concrete and useful ‘pipelines’ that are probably not particularly language-specific, despite repo names with ‘-fr’ and ‘-cy’ and non-English descriptions.

I’ll take a closer look.

Nope, because it’s not “official” and still a work in progress. To the best of my knowledge, it’s been used by community members for Spanish and Italian, and I helped a student working on a Kabyle model based on this repo.

Unifying efforts seems like a very good thing; your feedback from having grown a similar solution would be valuable.

That’s too bad. It would be good to have a section for user-contributed model training examples, similar to the current contributed examples section at https://deepspeech.readthedocs.io/en/v0.7.4/Contributed-Examples.html. Or, if there’s to be a repository of user-contributed models (guessing from the new metadata parameters in exporting(?)), it’d be useful to have a link to the training scripts there.

I’ve studied your scripts a bit more and can see what’s common and what’s different, and I’d like to borrow some ideas in order to improve our repo. The only feedback I can give you (for now) is that you could add a call to lm_optimizer.py in your build-lm.sh script(?). It’s worthwhile in our case: I’ve observed the WER for our transcribing model drop from 33% to 27%.

The documentation could explain this script a bit better, with an example usage such as the one I figured out from reading the source:
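
Something along these lines; the flag names below are what I gleaned from the 0.7.x source rather than any documentation, and the paths are just placeholders, so double-check them against util/flags.py in your checkout:

```bash
# Search for the scorer alpha/beta values that minimise WER on a held-out
# test set, reusing the acoustic model checkpoints and an existing scorer.
python lm_optimizer.py \
  --test_files /data/test.csv \
  --checkpoint_dir /checkpoints/deepspeech-cy \
  --alphabet_config_path /data/alphabet.txt \
  --scorer_path /data/kenlm.scorer \
  --lm_alpha_max 5 \
  --lm_beta_max 5 \
  --n_trials 2400   # defaults to a few thousand trials; lower this to cut optimization time
```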

(I’m not sure, however, why it attempts over 2000 trials to find the best alpha and beta values; it had already found the optimal values by around trial 200. At 2000 trials it takes much longer to optimize the LM than it does to train the acoustic models with transfer learning. I’ve left it as it is for now, since the test sets are small for the time being and since mostly everything else is using default settings.)

Hwyl!

Indeed, but at the same time exposing it means supporting it. Until it’s solid enough, it would add more trouble for little gain.

This is something we are working on, yes

I’ve separated building the LM from evaluating its params, which is done in evaluate_lm.sh

Those are just the default values picked up from DeepSpeech.

Yes, there is mostly no documentation, again because it’s too much of a WIP and I lack feedback on how useful it is to people.