Need a DeepSpeech for dummies tutorial

Hi all,
I am new to DeepSpeech and I want to train a model on my free spoken digits dataset. I found this tutorial, “TUTORIAL : How I trained a specific french model to control my robot”, for training with your own data, but I have the following questions:

  1. Where do I place my dataset? Should it be placed under the deepspeech/data folder, or anywhere else? You can find my dataset at this GitHub link (https://github.com/Jakobovski/free-spoken-digit-dataset)
  2. What should the vocabulary.txt file look like?
  3. If we split the whole dataset into train, dev and test, where should I put the vocabulary.txt file?
  4. What is an ARPA file and why do we need it to build the LM?
  5. I have DeepSpeech installed inside a Linux virtual machine on my PC and I do not have GPU support on my device. Will DeepSpeech training work for my small dataset?
    I have many questions like these.
    Basically, like in Kaldi, I need a “DeepSpeech for Dummies” tutorial.

Wherever you want, since you can pass --train_files and the other path arguments.
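For your digits case, the files you pass are just CSV manifests with three columns: wav_filename, wav_filesize, transcript. Here is a minimal sketch of how one could be generated from the free-spoken-digit-dataset recordings (the paths and the output name are placeholders; the repo’s files are named like 7_jackson_12.wav, with the spoken digit first):

```python
import csv
from pathlib import Path

# Assumed locations -- adjust to wherever you cloned the dataset.
RECORDINGS = Path("free-spoken-digit-dataset/recordings")
OUTPUT = Path("train.csv")

WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

with OUTPUT.open("w", newline="") as out:
    writer = csv.writer(out)
    # DeepSpeech training manifests use exactly these three column names.
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    for wav in sorted(RECORDINGS.glob("*.wav")):
        digit = int(wav.name.split("_")[0])  # leading character is the spoken digit
        writer.writerow([str(wav.resolve()), wav.stat().st_size, WORDS[digit]])
```

Split the rows into separate train/dev/test CSVs however you like and pass them with --train_files, --dev_files and --test_files. Also note that those recordings are 8 kHz while DeepSpeech expects 16 kHz audio by default, so you will probably need to resample them (or adjust the training sample-rate setting).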

Please look at the content of data/lm.
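For a digits-only model, the text corpus behind the language model can be tiny. Just as a sketch (the filename is whatever you then feed to KenLM):

```python
from pathlib import Path

# One "sentence" per line; for a digits task each sentence is a single word.
words = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
Path("vocabulary.txt").write_text("\n".join(words) + "\n")
```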

This question makes no sense to me; there’s no intersection between the dataset and the vocabulary file.

This is at the KenLM level; you just have to build it as part of the process, but you won’t need it afterwards.
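As a rough sketch of that step, assuming you have built KenLM and that lmplz and build_binary are on your PATH (the order and flags here are only illustrative for such a tiny corpus):

```python
import subprocess

# Estimate an ARPA-format n-gram model from the text corpus.
# --discount_fallback helps lmplz cope with a corpus this small.
with open("vocabulary.txt") as text, open("lm.arpa", "w") as arpa:
    subprocess.run(["lmplz", "-o", "3", "--discount_fallback"],
                   stdin=text, stdout=arpa, check=True)

# Convert the text ARPA file into KenLM's binary format.
subprocess.run(["build_binary", "lm.arpa", "lm.binary"], check=True)
```

The .arpa file is just the intermediate text format; once you have the binary model (or the scorer package in newer releases) you can throw it away, which is what I meant by not needing it afterwards.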

Not sure I get the point here:

  • do you want help getting the GPU working in the VM?
  • do you want help getting it working on your base system, where the GPU is available?

Define your hardware, define your dataset. We can’t tell you without more context …

Hard to write when you don’t know who the “dummies” might be. Training a model is non-trivial. Which dummies do you target? People who know nothing about machine learning? People who are keen on machine learning but just new to DeepSpeech?

As for the second kind, I’m experimenting with this: https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/ so it’s easily forkable, hackable and reproducible for people who want to ease the pain.

I agree with raghupathyv4 here; I also use Linux in VMs only. I’m guessing he’s asking for the same reason as I am: since we don’t have dedicated Linux computers and have a limited budget, we use virtual machines instead.

So I guess both his question and mine is: how do we make it work in a Linux VM, where there is no GPU?

Also, what does “KenLM-level” mean?

I 100% agree that this is the worst-documented tech I’ve seen in many years, and I’m also trying to:

  • make it work.
  • create my own recognizer in a different language; I can easily make voice files from different voices.
  • include new words (local street names, etc.).

Thanks for taking the time to share how you feel. Writing documentation is hard, especially when the topic is complex; however, when we get actionable feedback on what to improve, we can do something about it.

There’s now a Playbook available: https://mozilla.github.io/deepspeech-playbook/

Unfortunately, if you need to do actual training, you will need a GPU.
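If you want to check quickly from inside the VM whether TensorFlow sees a GPU at all, something like this will tell you (a sketch against the TF 1.x API the training code uses):

```python
import tensorflow as tf

# Returns False when only the CPU is visible, which is the usual
# situation inside a plain Linux VM without GPU passthrough.
print(tf.test.is_gpu_available())
```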

This is documented. If all you can say is that it is poorly documented, I’m afraid we can’t help you.

This is documented and also covered in the playbook.

This is documented and also covered in the playbook.

Hi lissyx, thank you very much for the info and the link to the Playbook, it will be studied right away :grinning: