I am new to DeepSpeech and I want to train a model on my Free Spoken Digit Dataset. I found this tutorial on training with your own data, "TUTORIAL : How I trained a specific french model to control my robot", but I still have the following questions:
- Where do I place my dataset? Should it go under the deepspeech/data folder, or somewhere else? You can find my dataset at this GitHub link (https://github.com/Jakobovski/free-spoken-digit-dataset)
- What should the vocabulary.txt file look like?
- If we split the whole dataset into test, train, and dev sets, where should I put the vocabulary.txt file?
- What is an ARPA file, and why do we need it to build the language model (LM)?
- I have DeepSpeech installed inside a Linux virtual machine on my PC, and my device has no GPU support. Will DeepSpeech training work for my small dataset?
I have many more questions like these.
Basically, like the one for Kaldi, I need a “DeepSpeech for Dummies” tutorial.