I have installed DeepSpeech and can successfully perform live transcription with mic_vad_streaming. Now I would like to train on my own data, which consists of about 15 words (i.e., 15 commands). I have the following difficulties:
I am using Windows. I can only find DeepSpeech.exe, not DeepSpeech.py. Running DeepSpeech.exe prints a usage message whose options do NOT include --train_files.
I obtained the DeepSpeech-0.9.3 archive separately. When I run DeepSpeech.py, I get the following error:
from deepspeech_training import train as ds_train
ModuleNotFoundError: No module named 'deepspeech_training'
Since my language contains only 15 words, do I need a GPU, or does a CPU suffice?
The command to train DeepSpeech (python3 DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --dev_files …/data/CV/en/clips/dev.csv --test_files …/data/CV/en/clips/test.csv) is taken from https://deepspeech.readthedocs.io/en/r0.9/TRAINING.html. The command does not mention ‘alphabet.txt’. Does that imply that ‘alphabet.txt’ must exist in the current directory?
Does the WAV file size in the CSV represent the actual size of the file, i.e., the size reported by the ‘dir’ command?
Is there any simplified data set (say about 10 words) to ILLUSTRATE training?
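On the CSV question above: as far as I know, the training CSVs use three columns, wav_filename, wav_filesize and transcript, and wav_filesize is the plain file size in bytes, which is the same number 'dir' shows. A minimal sketch of writing such a file, assuming that layout; the file name ram_01.wav and its dummy contents are hypothetical placeholders, not real audio:

```python
import csv
import os
import tempfile


def write_manifest(samples, csv_path):
    """Write a DeepSpeech-style CSV from (wav_path, transcript) pairs."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in samples:
            # os.path.getsize returns the on-disk size in bytes.
            writer.writerow([wav_path, os.path.getsize(wav_path), transcript])


# Demo with a placeholder file (hypothetical name, dummy bytes instead of audio).
tmp_dir = tempfile.mkdtemp()
wav_path = os.path.join(tmp_dir, "ram_01.wav")
with open(wav_path, "wb") as f:
    f.write(b"\x00" * 1000)

csv_path = os.path.join(tmp_dir, "train.csv")
write_manifest([(wav_path, "ram")], csv_path)

with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Generating the size column this way avoids typing file sizes by hand, which is easy to get wrong when the recordings are re-exported.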
My guess: you didn’t install the deepspeech_training module with pip.
It doesn’t make sense to me why you are using both the .exe and Python. Those are different environments: when you run a .py file, you usually do everything in Python. I am a Python newbie on Linux, though.
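A quick way to test that guess is to ask the interpreter you run DeepSpeech.py with whether it can see the module at all. This is only a sketch using the standard library; the two module names are the ones that appear in this thread (the inference package and the training package from the error message):

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


for name in ("deepspeech", "deepspeech_training"):
    print(name, "found" if is_installed(name) else "NOT found")
```

If deepspeech is found but deepspeech_training is not, the inference package is installed and the training package is missing from that environment. As far as I recall from the 0.9 training docs, the training package is installed from the source checkout rather than from the inference wheel, so please check the docs for the exact install step.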
The DeepSpeech Playbook provides a step-by-step guide to model training. It covers the alphabet.txt file and the environment needed for training a model (we recommend not using CPUs), among other considerations.
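To illustrate the alphabet.txt point: every character that occurs in your transcripts has to be in the alphabet the model is trained with. As far as I recall, 0.9.x has an --alphabet_config_path flag whose default points to data/alphabet.txt in the source checkout, so a command line that omits it falls back to that English file; treat this as an assumption and check the flags of your version. The sketch below lists the characters used by some hypothetical command transcripts and compares them with what I assume the stock English alphabet contains (space, a to z, apostrophe):

```python
import string

# Assumption: the stock English alphabet is a space, a-z, and an apostrophe.
ASSUMED_ENGLISH_ALPHABET = set(" " + string.ascii_lowercase + "'")


def transcript_symbols(transcripts):
    """Return the sorted list of distinct characters used in the transcripts."""
    return sorted(set("".join(transcripts)))


def missing_symbols(transcripts, alphabet):
    """Return characters that occur in the transcripts but not in the alphabet."""
    return sorted(set("".join(transcripts)) - set(alphabet))


# Hypothetical command words, for illustration only.
transcripts = ["start", "stop", "turn left"]
print(transcript_symbols(transcripts))
print(missing_symbols(transcripts, ASSUMED_ENGLISH_ALPHABET))

# Uppercase letters and punctuation are not covered by the assumed alphabet.
print(missing_symbols(["Stop!"], ASSUMED_ENGLISH_ALPHABET))
```

An empty result from missing_symbols means the transcripts only use characters from the assumed alphabet; anything it reports would need to be normalized in the CSV or added to a custom alphabet file.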
First, my sincere thanks for the valuable response. I went through the instructions and was able to train successfully, which produced a .pb file. Still, I have a few issues:
My language: My language has only three words: “ram”, “robert” and “rahim”. I recorded myself speaking these words multiple times, prepared the WAV files with Audacity, and wrote the CSV files by hand. I seek help on the following:
(1) I performed the training WITHOUT alphabet.txt. Does that mean English is used by default? (I ask because the Playbook training example, for the Indonesian data set, does not include alphabet.txt at all. Also, I am using English script.)
(2) After training, I attempted transcription with mic_vad_streaming, but whatever I speak, it outputs only the single character “r”.
I would appreciate comments, especially on (2) above.
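I can't say for sure what causes (2), and a dataset of three words is very small to train on, but one thing that is cheap to rule out is the format of the recordings. My understanding is that DeepSpeech works with 16 kHz, mono, 16-bit PCM audio, while Audacity projects often use 44.1 kHz; that target format is an assumption here, so check it against your training flags. A sketch that inspects a WAV file with Python's standard wave module; the demo file is generated silence, not one of your recordings:

```python
import os
import tempfile
import wave


def wav_format(path):
    """Read basic format facts from a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate": w.getframerate(),
            "seconds": w.getnframes() / w.getframerate(),
        }


def format_problems(info, expected_rate=16000):
    """List mismatches against the assumed 16 kHz, mono, 16-bit target."""
    problems = []
    if info["channels"] != 1:
        problems.append("not mono")
    if info["sample_width_bytes"] != 2:
        problems.append("not 16-bit")
    if info["sample_rate"] != expected_rate:
        problems.append(
            "sample rate is %d, expected %d" % (info["sample_rate"], expected_rate)
        )
    return problems


# Demo: one second of silence written as 44.1 kHz, stereo, 16-bit.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00" * (44100 * 2 * 2))  # frames * channels * bytes per sample

info = wav_format(demo_path)
print(info)
print(format_problems(info))
```

Running wav_format over every file listed in the CSV shows whether the training clips match the format the microphone script feeds to the model. Note that the wave module only reads PCM files, so a 32-bit float export would raise wave.Error instead of returning a result.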