Create a domain model with DeepSpeech

gabrielenones · December 3, 2020, 10:16am

I am trying to create a domain template, I have about 700 audio files, they are all: mono / 16kHz / 16bit.
I created 3 csv files in which I inserted file_path, file_size, transcript, dividing the audio files into 70% train, 20% dev, 10% test. I created an alphabet that contains all the characters present in the transcriptions.
I trained the model using this command:

  python3 DeepSpeech.py  \
--train_files /test/csvFile/train.csv \
--dev_files /test/csvFile/dev.csv \
--test_files /test/csvFile/test.csv \
--export_dir /test/model/ \
--alphabet_config_path /test/alphabet.txt

During the training phase there are no errors or problems, after finishing the training the model only returns empty strings like this

+

What should I do, has anyone already had this problem?

othiele · December 3, 2020, 10:33am

Please no images, just text.
700 files of 10 secs is way too few to do anything. Maybe transfer, but even then.
The output is totally normal for this few data.
If you have more data, use dropout of 0.3 or higher.

Hard to say, but you should have thousands of samples to get ok results.

What do you mean by domain template?

lissyx · December 3, 2020, 10:46am

it looks like you are working on italian, maybe you should work from italian’s trained model checkpoint? @Mte90

Mte90 · December 3, 2020, 10:53am

The italian model is at https://github.com/MozillaItalia/DeepSpeech-Italian-Model but you can reach in the community on telegram with @mozitabot in the developers group.

Topic		Replies	Views
Need to create Arabic models DeepSpeech	3	586	November 24, 2020
Problem when training my own model DeepSpeech	6	1362	November 2, 2020
Creating an Indian accent model with ~115k files DeepSpeech	40	5153	July 12, 2020
Noob questions on training my own model DeepSpeech	8	895	February 8, 2021
Issue regarding dataset format DeepSpeech	1	506	April 7, 2020

Create a domain model with DeepSpeech

Related topics