What else needs testing in Mozilla STT?


With the academic year under way, we have been given a project in our Software Testing class: choose any open-source software available on the Internet and test it, making a useful contribution to the chosen project.

My friend and I want to contribute to the Mozilla STT / DeepSpeech project by experimenting with different noise levels and checking how they affect its accuracy. We will let you know of our progress. This was our first idea, as it seems simple yet useful for developers using Mozilla STT.
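A minimal sketch of what we have in mind for the noise experiment: mix noise into a clean signal at a chosen signal-to-noise ratio, then score the resulting transcript with word error rate (WER). The helper names `mix_at_snr` and `word_error_rate` are our own placeholders, not part of DeepSpeech, and samples are assumed to be plain lists of floats of equal length.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB.

    Hypothetical helper: both inputs are equal-length lists of floats.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(speech)  # silent noise: nothing to add
    # SNR(dB) = 20 * log10(rms_speech / rms_noise), solved for the noise gain.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

The plan would be to run the same clips through the model at several SNR values (for example 20, 10, 5 and 0 dB) and report WER per level.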

However, we were advised to first ask you about any particular problems that urgently need testing. As beginner testers we can take on something that is fairly easy but requires more time and effort. If there is any Mozilla STT functionality you are unsure about, we are here to help by testing those cases.

We will also browse and keep an eye on this forum.


Help is always appreciated, thanks.

@lissyx and @reuben might have some great ideas to test the software itself.

The augmentation part you mention is something that could really use some testing and documentation of best practices. It is not software testing by the book, but it is definitely something that testing will evolve towards in the future.

Another, more classic, area that could use some love is the importers. Common Voice data is flawed, and so are other external sources. We could use some filters etc. to identify bad audio input before starting training. Maybe some integration of something like audiomate would make that easier.
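As a rough illustration of such a filter, here is a sketch that flags WAV clips that are too short, too long, at an unexpected sample rate, or nearly silent. It uses only the Python standard library; the function name `clip_problems` and every threshold are illustrative assumptions, not values taken from DeepSpeech or its importers, and 16-bit PCM input is assumed.

```python
import array
import math
import sys
import wave


def clip_problems(path, min_seconds=0.5, max_seconds=20.0,
                  expected_rate=16000, silence_rms=100.0):
    """Return a list of reasons why a WAV clip looks unsuitable for training.

    All thresholds are illustrative guesses. Assumes 16-bit PCM audio.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)

    if width != 2:
        return ["not 16-bit PCM"]

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate {rate} != {expected_rate}")

    duration = n_frames / rate
    if duration < min_seconds:
        problems.append("too short")
    elif duration > max_seconds:
        problems.append("too long")

    samples = array.array("h", raw)  # signed 16-bit samples
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if samples:
        level = math.sqrt(sum(s * s for s in samples) / len(samples))
        if level < silence_rms:
            problems.append("near silent")
    return problems
```

An importer could call this on every clip and skip, or at least log, any file for which the returned list is not empty.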

Neither is classic testing, so make sure that fits what you are allowed to do. Test coverage of the code itself is not very high, but it would be somewhat complex to implement :slight_smile:


Thanks @Clockworker for reaching out, I’ll look at that in detail and get back by Monday :slight_smile:

I’ve noticed that the new 0.9 version is in its alpha test phase. If there are new features that may affect accuracy, I think we can test them for you. Maybe the “hot-word” feature? That seems more interesting and fresh than testing augmentations.

I am just asking in case you had any tips or recommendations for newcomers in this project :smile:

I’ve already had great success setting up the environment (recognition + model training) for the latest release.


Definitely! That, as well as the timestamps fix that landed for r0.9: https://github.com/mozilla/DeepSpeech/pull/3279

Can you elaborate on that? What is the scope of the course you are taking? How much time is allocated for the project? How important is it in your overall curriculum?

It is an obligatory one-semester course in my Bachelor’s degree in CS, so we have until February. The whole practical component of the course is dedicated to this project. Our teacher is a chill guy who just wants us to make some positive contribution for a better resume. Sharing the problems we tested with the community is kind of necessary for a better grade, but not mandatory. Scope: I’d like it to remain simple but effective in results, so we can show off something to the class. I don’t mind it being time-consuming as long as it is understandable for an advanced programmer with no job experience.

If we are talking about hours spent on the testing process: there are two of us, and I would say we can spend about 60 hours in total working on it, for the two of us. This semester is pretty easy, so this estimate may vary (read: we can maybe spend more than 60 hours).

So, that’s 60 hours for each of you, or for both? Over a semester, that does qualify as a small project in my terminology, something you can commit ~2-3h per week to. I do fear the noise-level work might be a bit too much because of all the exploration involved, and within your timeframe that might be tight.

Hot Word boosting and the Timesteps fix could be good candidates. If you had a bit more time and/or GPUs, looking at the All bytes mode for languages like Mandarin could be a good idea as well, since @reuben is working on that to finalize 0.9.
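For Hot Word boosting, a black-box test could be sketched roughly as below: run the same clips with and without a boost and compare how often the target word shows up in the transcript. This is only a sketch under assumptions. `transcribe(audio, hot_words)` is a hypothetical callable that you would write around the model, and the boost values are arbitrary. In the 0.9 Python bindings the boost itself should be set with `Model.addHotWord(word, boost)`, but please check that against the release you actually test.

```python
def hot_word_hit_rate(transcribe, clips, hot_word, boost):
    """Fraction of clips whose transcript contains `hot_word`.

    `transcribe(audio, hot_words)` is a hypothetical callable wrapping the
    model; `hot_words` is a dict of word -> boost (empty means no boosting).
    """
    if not clips:
        raise ValueError("need at least one clip")
    hot_words = {hot_word: boost} if boost else {}
    hits = sum(hot_word in transcribe(audio, hot_words).split()
               for audio in clips)
    return hits / len(clips)


def compare_boosts(transcribe, clips, hot_word, boosts=(0.0, 5.0, 10.0)):
    """Hit rate per boost value; 0.0 serves as the unboosted baseline."""
    return {b: hot_word_hit_rate(transcribe, clips, hot_word, b)
            for b in boosts}


def fake_transcribe(audio, hot_words):
    """Stand-in for a real model: 'hears' the word only when boost >= 5."""
    if hot_words.get("firefox", 0) >= 5:
        return "open firefox"
    return "open fire fox"
```

With the stub, `compare_boosts(fake_transcribe, ["clip1", "clip2"], "firefox")` gives a hit rate of 0.0 for the baseline and 1.0 for both boosted runs. With a real model, it would also be worth checking that boosting does not hurt WER on clips that do not contain the word.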


60 for each of us.

Okay, we will look into that closely soon.

And into that too. As I was browsing, I saw a post or two about people having problems with it. If @reuben has something that needs testing for what he is doing right now, and it doesn’t require a high-end GPU (we both have mediocre laptops), we can also consider doing that. However, at this moment Hot Word boosting seems a reasonable thing to test, as it doesn’t depart far from black-box testing, our original idea.

Thanks a lot!

Also, does that mean you need to work out some test plans and stuff like that?

Yes, plans, test types etc., everything that testers do in their job. So maybe my estimate wasn’t good. I was prioritizing the practical work and not counting that paperwork in the 60-hour total.

Well, with 120h for the two of you, and taking into account that you are students, I guess Hot Words and the Timesteps fix are a perfect fit. Hot words should be easier; the timesteps fix would really welcome extensive testing like that.
