DeepSpeech PlayBook v0.1 Alpha is available for feedback

kreid · February 9, 2021, 11:24am

Hi everyone,

Firstly a huge thanks to everyone for sending through ideas on what should be in a DeepSpeech PlayBook - your feedback was invaluable.

We’re pleased to announce that the first alpha release of the DeepSpeech PlayBook is now available for feedback and testing.

https://mozilla.github.io/deepspeech-playbook/

The PlayBook is written in Markdown, and we welcome Issues and PRs to the GitHub repository.

In particular, we are seeking the following feedback:

- Please try these instructions, particularly for building a Docker image and running a Docker container, on multiple distributions of Linux so that we can identify corner cases.
- Please contribute your tacit knowledge, such as:
  - common errors encountered in data formatting, environment setup, training and validation
  - techniques or approaches for improving the scorer or alphabet file, or for reducing the Word Error Rate (WER) and Character Error Rate (CER)
  - case studies of the work you or your organisation have been doing, showing your approaches to data validation, training or evaluation
- Please identify errors in the text - with many eyes, bugs are shallow.

The PlayBook focuses on training models in DeepSpeech, and does not seek to replace the existing documentation, but instead provides initial guidance to overcome common hurdles experienced when first training models in DeepSpeech.

I’d like to give a huge shoutout here to @ftyers, @Joshua_Meyer and XXX for all their expertise, feedback and patience as we built this out.

othiele · February 9, 2021, 7:24pm

Do you know whether one could easily add a search function? Wanted to find “test splits” in a hurry.

kreid · February 9, 2021, 10:01pm

Great question @othiele. At the moment the only way to search would be searching the GitHub repo itself. I know that test splits are not covered, and that information about test splits should live in TESTING.md:

mozilla/deepspeech-playbook/blob/master/TESTING.md


# Testing and evaluating your trained model

## Contents

- [Testing and evaluating your trained model](#testing-and-evaluating-your-trained-model)
  * [Contents](#contents)
  * [Word Error Rate, Character Error Rate, loss and model performance](#word-error-rate--character-error-rate--loss-and-model-performance)
  * [Acoustic model and language model working together](#acoustic-model-and-language-model-working-together)
  * [Heuristics](#heuristics)
  * [Fine tuning and transfer learning](#fine-tuning-and-transfer-learning)

Let's say that you've already trained an acoustic model and a language model (a [scorer](SCORER.md)). Congratulations! But before you [deploy](DEPLOYMENT.md) your setup, you will need to evaluate how well it will work in practice - on your intended use case.

We're talking here about a _setup_ rather than a trained _model_ on purpose, because several factors influence how well an application performs in real life, and you need to keep all of them in mind. The acoustic model and language model work together to turn speech into text, and there are many ways (i.e. decoding hyperparameter settings) in which you can combine those two models.

## Word Error Rate, Character Error Rate, loss and model performance

During acoustic model [training](TRAINING.md) with TensorFlow, you hopefully saw the training and validation _loss_ go down over time. At the end of training, DeepSpeech will have printed two scores for your model: the _Word Error Rate (WER)_ and the _Character Error Rate (CER)_.
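As a rough illustration of what these two scores measure, here is a minimal sketch of the textbook definitions: the edit (Levenshtein) distance between a reference transcript and the model's output, divided by the reference length, counted in words for WER and in characters for CER. The function names `edit_distance`, `wer` and `cer` are hypothetical helpers written for this example, not DeepSpeech's own code, and DeepSpeech's reported figures may be aggregated differently across a test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Lower is better for both scores. Because a hypothesis can contain more insertions than the reference has words, either rate can exceed 1.0.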


Best, Kathy