Transcription works; need help with training

I have installed DeepSpeech and can successfully perform live transcription with mic_vad_streaming. Now I want to train on my own data, which consists of about 15 words (i.e., 15 commands). I am having the following difficulties:

  • I am using Windows. I can only find DeepSpeech.exe, not DeepSpeech.py. Running DeepSpeech.exe prints a message listing its options, and these do NOT include --train_files.
  • I obtained the DeepSpeech-0.9.3 archive separately. If I run DeepSpeech.py, I get the following error:

from deepspeech_training import train as ds_train
ModuleNotFoundError: No module named 'deepspeech_training'

  • Since my language contains only 15 words, do I need a GPU, or is a CPU sufficient?

  • The command to train DeepSpeech (python3 DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --dev_files …/data/CV/en/clips/dev.csv --test_files …/data/CV/en/clips/test.csv) comes from https://deepspeech.readthedocs.io/en/r0.9/TRAINING.html. The command does not mention ‘alphabet.txt’. Is alphabet.txt implicitly expected to be in the current directory?

  • Does the wav file size in the CSV (the wav_filesize column) represent the actual size of the file, i.e., the value reported by the ‘dir’ command?

  • Is there a simplified data set (say, about 10 words) that ILLUSTRATES training?
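On the file-size question: as far as I know, DeepSpeech's own importers fill the wav_filesize column with os.path.getsize(), i.e. the file size in bytes, which is the same number dir prints (not the "size on disk"). Below is a minimal sketch of writing such a CSV instead of typing it by hand. The helper names write_training_csv and demo, the (path, transcript) input format and the lowercasing of transcripts are my own assumptions, not part of DeepSpeech.

```python
import csv
import os
import tempfile

# Column names of a DeepSpeech 0.9 training CSV.
FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def write_training_csv(csv_path, samples):
    """Write (wav_path, transcript) pairs as a DeepSpeech-style CSV (hypothetical helper).

    wav_filesize is os.path.getsize(), i.e. the file size in bytes.
    Transcripts are lowercased on the assumption that the alphabet is lowercase.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        for wav_path, transcript in samples:
            writer.writerow([
                os.path.abspath(wav_path),   # absolute path avoids working-directory surprises
                os.path.getsize(wav_path),   # size in bytes, the figure `dir` shows
                transcript.strip().lower(),
            ])


def demo():
    """Write one placeholder file (NOT real audio) and return the resulting CSV rows."""
    with tempfile.TemporaryDirectory() as tmp:
        wav = os.path.join(tmp, "ram_01.wav")
        with open(wav, "wb") as f:
            f.write(b"\x00" * 1044)
        out = os.path.join(tmp, "train.csv")
        write_training_csv(out, [(wav, "Ram")])
        with open(out, newline="", encoding="utf-8") as f:
            return list(csv.reader(f))
```

With real recordings you would pass a list such as [("clips/ram_01.wav", "ram"), ("clips/robert_01.wav", "robert")] (hypothetical paths) and write separate train, dev and test CSVs.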

Thanks and Regards
S Srinivasan

My guess: you didn’t install the deepspeech_training module with pip.
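A quick way to check that guess, as a sketch: the script below prints which Python interpreter is running and whether deepspeech (the inference package from pip) and deepspeech_training can be imported by it. As far as I know, deepspeech_training is not installed by pip install deepspeech; if I remember the r0.9 training docs correctly, it comes from running pip3 install --upgrade -e . inside the DeepSpeech source checkout. The helper names has_module and report are my own.

```python
import importlib.util
import sys


def has_module(name):
    """True if the top-level module `name` is importable by this interpreter."""
    return importlib.util.find_spec(name) is not None


def report():
    # On Windows it matters which python.exe runs DeepSpeech.py: the module
    # must be installed into this same interpreter (or virtualenv).
    print("Interpreter:", sys.executable)
    for mod in ("deepspeech", "deepspeech_training"):
        print(f"{mod}: {'found' if has_module(mod) else 'NOT found'}")


if __name__ == "__main__":
    report()
```

If deepspeech_training shows as NOT found, the install step was skipped or went into a different interpreter than the one running DeepSpeech.py.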

I don’t understand why you are mixing the .exe and Python. Those are different environments: usually, when you run a .py file, you do everything in Python. (That said, I am a Python newbie on Linux.)

The DeepSpeech Playbook provides a step by step guide to model training, and considers aspects such as the alphabet.txt file, the environment needed for training a model (we recommend not using CPUs) among other considerations.
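For a small custom vocabulary, one way to get an alphabet.txt is to derive it from the transcripts themselves. This is a minimal sketch, assuming the usual alphabet format (one character per line, the space character on its own line, lines starting with # treated as comments); the helper names read_transcripts and alphabet_text are hypothetical.

```python
import csv


def read_transcripts(lines):
    """Collect the transcript column from CSV lines (a file object or a list of strings)."""
    return [row["transcript"] for row in csv.DictReader(lines)]


def alphabet_text(transcripts):
    """Build alphabet.txt content: one character per line, space always included.

    Lines starting with '#' are comments in the alphabet format.
    """
    chars = sorted(set("".join(transcripts)) | {" "})
    lines = ["# Alphabet generated from the training transcripts"]
    lines.extend(chars)
    return "\n".join(lines) + "\n"  # the stock alphabet.txt asks for a trailing newline
```

For the words ram, robert and rahim this yields a space plus the nine letters a, b, e, h, i, m, o, r, t. You would write the result to a file and pass it to training with --alphabet_config_path. If I read the r0.9 flags correctly, when that flag is omitted DeepSpeech.py falls back to data/alphabet.txt, the English alphabet shipped with the source, which would explain why training without your own alphabet.txt still runs.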

First, my sincere thanks for the valuable response. I went through the instructions and successfully performed the training, which generated a .pb file. However, I still have a few issues:

My language: it has only three words, “ram”, “robert” and “rahim”. I recorded myself speaking each of these words multiple times, prepared the WAV files in Audacity and wrote the CSV files by hand. I would like help with the following:
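Since the WAV files come from Audacity, one thing worth ruling out is the audio format. As far as I know, DeepSpeech training defaults to 16 kHz (the --audio_sample_rate flag), and mic_vad_streaming feeds the model 16 kHz mono audio, while Audacity projects are often 44.1 kHz and may be stereo. The sketch below uses Python's standard wave module to report mismatches; the 16 kHz / mono / 16-bit target is an assumption based on those defaults, and make_silent_wav is a hypothetical helper that only exists to try the checker without real recordings.

```python
import io
import wave


def check_wav(source, expected_rate=16000):
    """Return a list of format problems for a WAV file (path or binary file object)."""
    problems = []
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
        if w.getframerate() != expected_rate:
            problems.append(f"{w.getframerate()} Hz, expected {expected_rate} Hz")
    return problems


def make_silent_wav(rate, channels, seconds=0.5):
    """Build an in-memory silent 16-bit WAV (illustration only, not speech)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf
```

Running check_wav on each path listed in the CSVs should return an empty list for every file; anything else means the file needs re-exporting (e.g. as 16 kHz mono, 16-bit PCM) before training.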

(1) I performed the training WITHOUT alphabet.txt. Does that mean it uses English by default? (I ask because the Playbook’s training example for the Indonesian data set does not include alphabet.txt at all. Also, I am writing in the English script.)
(2) After training, I tried transcription with mic_vad_streaming. But whatever I say, it outputs only the single character “r”.

I would appreciate any comments, especially on (2) above.

Regards
S Srinivasan