Too many steps?

First of all, thanks for the reply in the thread “The same speed with cpu and with gpu”; it helped me review a couple of topics.

DeepSpeech version 0.7.0
Server with GPU NVIDIA Tesla K80

I’m trying transfer learning (new alphabet)
and I run:

CUDA_VISIBLE_DEVICES=0 python3 ./…/…/DeepSpeech.py --drop_source_layers 1 --alphabet_config_path alphabet.txt --save_checkpoint_dir . --load_checkpoint_dir . --train_files ./…/clips/train.csv --dev_files ./…/clips/dev.csv --test_files ./…/clips/test.csv --scorer_path ./…/deepspeech-0.7.0-models.scorer --load_cudnn --train_cudnn

with alphabet.txt being the Spanish one, and the scorer being the one I downloaded from https://github.com/mozilla/DeepSpeech/releases/tag/v0.7.0

I don’t know if I’m doing something wrong, but why is it that when I run:

python3 ./…/DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --test_batch_size 64 \
  --dev_batch_size 64 \
  --train_batch_size 64 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --epochs 125 \
  --dropout_rate 0.40 \
  --alphabet_config_path ./…/alphabet.txt \
  --export_dir export070 \
  --checkpoint_dir export070 \
  --summary_dir export070 \
  --scorer_path deepspeech-0.7.0-models.scorer \
  --export_language es

my last step in epoch 0 is
Epoch 0 | Training | Elapsed Time: 0:21:52 | Steps: 366 | Loss: 133.388470
Epoch 0 | Training | Elapsed Time: 0:21:59 | Steps: 367 | Loss: 133.466653
Epoch 0 | Training | Elapsed Time: 0:21:59 | Steps: 367 | Loss: 133.466653

and with
CUDA_VISIBLE_DEVICES=0 python3 ./…/…/DeepSpeech.py --drop_source_layers 1 --alphabet_config_path alphabet.txt --save_checkpoint_dir . --load_checkpoint_dir . --train_files ./…/clips/train.csv --dev_files ./…/clips/dev.csv --test_files ./…/clips/test.csv --scorer_path ./…/deepspeech-0.7.0-models.scorer --load_cudnn --train_cudnn

Epoch 0 | Training | Elapsed Time: 6:40:11 | Steps: 23524 | Loss: 100.323532
Epoch 0 | Training | Elapsed Time: 6:40:13 | Steps: 23525 | Loss: 100.326249
Epoch 0 | Training | Elapsed Time: 6:40:15 | Steps: 23526 | Loss: 100.329100
Epoch 0 | Training | Elapsed Time: 6:40:15 | Steps: 23526 | Loss: 100.329100

With these numbers it would take about 20 days to train…

Do I need to pass any other parameters?

I have verified that it uses the GPU.

Another question: I want to understand better how this technology works. Can you recommend a course that has worked for you?

Thank you very much!!!

  1. Use a train/dev batch size of 32 or 64; that should speed things up a lot (see the quick arithmetic sketch after this list).

  2. Udacity is currently free for 30 days; its NLP Nanodegree has an ASR part.
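To see why the batch size matters so much here: --train_batch_size defaults to 1 in DeepSpeech, so each step processes a single sample. A quick sketch of the arithmetic, taking the sample count from the logs above:

    import math

    num_train_samples = 23526  # steps per epoch observed in the batch-size-1 run above
    batch_size = 64            # the suggested --train_batch_size

    steps_per_epoch = math.ceil(num_train_samples / batch_size)
    print(steps_per_epoch)     # 368, consistent with the ~367 steps in the batch-size-64 run

Fewer, larger steps also keep the GPU far better utilized, which is where most of the speedup should come from.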


And maybe a dropout of 0.4 is a bit high; try 0.3 as well. And depending on your data, 15 epochs may already give you good results. 125 epochs is for thousands of hours :slight_smile:

I executed

CUDA_VISIBLE_DEVICES=0 nohup python3 ./…/…/DeepSpeech.py --drop_source_layers 1 --alphabet_config_path alphabet.txt --save_checkpoint_dir . --load_checkpoint_dir . --train_files ./…/clips/train.csv --dev_files ./…/clips/dev.csv --test_files ./…/clips/test.csv --scorer_path ./…/deepspeech-0.7.0-models.scorer --load_cudnn --test_batch_size 64 --dev_batch_size 64 --train_batch_size 64 --epochs 15 --dropout_rate 0.3 --n_hidden 2048 --learning_rate 0.0001

but the result is very bad. I’m trying transfer learning (new alphabet).

The training files are from https://voice.mozilla.org/es/datasets (Spanish), 5 GB (167 hours).

What do you recommend I try?
Change some parameters?
Change the steps?
Get more data?


Best WER:

WER: 0.500000, CER: 0.428571, loss: 15.113347

  • wav: file://…/clips/common_voice_es_18319898.wav
  • src: “no es como un”
  • res: “no es cum”

WER: 0.600000, CER: 0.535714, loss: 34.693447

  • wav: file://…/clips/common_voice_es_19734178.wav
  • src: “y la casa permanece derruida”
  • res: “la casa enlaced”

WER: 0.666667, CER: 0.764706, loss: 50.104347

  • wav: file://…/clips/common_voice_es_19749325.wav
  • src: “no obstante yahoo”
  • res: “no eden”

Median WER:

WER: 1.000000, CER: 0.836735, loss: 87.737968

  • wav: file://…/clips/common_voice_es_19695121.wav
  • src: “la presidencia llegara en junio del siguiente año”
  • res: “a cecilian”

WER: 1.000000, CER: 0.818182, loss: 87.728905

  • wav: file://…/clips/common_voice_es_19668502.wav
  • src: “se casaron en una ceremonia civil en londres”
  • res: “academician”

WER: 1.000000, CER: 0.857143, loss: 87.712791

  • wav: file://…/clips/common_voice_es_19693027.wav
  • src: “en ella caben destacar a autores muy prestigiosos”
  • res: “decadence”

WER: 1.000000, CER: 0.807692, loss: 87.665970

  • wav: file://…/clips/common_voice_es_19956591.wav
  • src: “avenida del sur une las poblaciones de la huerta sur”
  • res: “as needlebeam”

WER: 1.000000, CER: 0.878049, loss: 87.652550

  • wav: file://…/clips/common_voice_es_19270413.wav
  • src: “nos hizo lo imposible para acomodar deuce”
  • res: “icelandic”

Worst WER:

WER: 1.333333, CER: 0.583333, loss: 23.569023

  • wav: file://…/clips/common_voice_es_19698843.wav
  • src: “la pistol md”
  • res: “ma is to mine”

WER: 1.500000, CER: 0.750000, loss: 47.286495

  • wav: file://…/clips/common_voice_es_19744650.wav
  • src: “recientemente mr”
  • res: “decadence is the”

WER: 1.500000, CER: 0.500000, loss: 22.669468

  • wav: file://…/clips/common_voice_es_19751269.wav
  • src: “enlaces internos”
  • res: “ella as tars”

WER: 1.500000, CER: 0.666667, loss: 12.712641

  • wav: file://…/clips/common_voice_es_20018351.wav
  • src: “loja ecuador”
  • res: “no i call”

WER: 1.666667, CER: 0.733333, loss: 51.519295

  • wav: file://…/clips/common_voice_es_19971827.wav
  • src: “en apokolips mr”
  • res: “no o o i mite”

Post the WER and CER of the whole dataset, not just the best and worst.

Depending on that, it might be hard to transfer from English to Spanish.
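One thing that might be worth experimenting with, offered as a hedged suggestion rather than a known fix: --drop_source_layers controls how many of the source model’s layers are re-initialized, and for a target language far from English it can make sense to drop more than just the output layer, e.g.

    CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py --drop_source_layers 2 ...

keeping the rest of the flags as in the command above. Whether that helps on 167 hours of Spanish is something you would have to test.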

OK, I had assumed you could transfer from any language to another. I will try another strategy. Thanks!!! @othiele

Here they all are:


Best WER:

WER: 0.500000, CER: 0.428571, loss: 15.113347

  • wav: file://…/clips/common_voice_es_18319898.wav
  • src: “no es como un”
  • res: “no es cum”

WER: 0.600000, CER: 0.535714, loss: 34.693447

  • wav: file://…/clips/common_voice_es_19734178.wav
  • src: “y la casa permanece derruida”
  • res: “la casa enlaced”

WER: 0.666667, CER: 0.764706, loss: 50.104347

  • wav: file://…/clips/common_voice_es_19749325.wav
  • src: “no obstante yahoo”
  • res: “no eden”

WER: 0.666667, CER: 0.818182, loss: 34.066921

  • wav: file://…/clips/common_voice_es_19043194.wav
  • src: “no me rasco”
  • res: “no”

WER: 0.666667, CER: 0.625000, loss: 27.268988

  • wav: file://…/clips/common_voice_es_19736149.wav
  • src: “no ser arrogante”
  • res: “no seamed”

Median WER:

WER: 1.000000, CER: 0.836735, loss: 87.737968

  • wav: file://…/clips/common_voice_es_19695121.wav
  • src: “la presidencia llegara en junio del siguiente año”
  • res: “a cecilian”

WER: 1.000000, CER: 0.818182, loss: 87.728905

  • wav: file://…/clips/common_voice_es_19668502.wav
  • src: “se casaron en una ceremonia civil en londres”
  • res: “academician”

WER: 1.000000, CER: 0.857143, loss: 87.712791

  • wav: file://…/clips/common_voice_es_19693027.wav
  • src: “en ella caben destacar a autores muy prestigiosos”
  • res: “decadence”

WER: 1.000000, CER: 0.807692, loss: 87.665970

  • wav: file://…/clips/common_voice_es_19956591.wav
  • src: “avenida del sur une las poblaciones de la huerta sur”
  • res: “as needlebeam”

WER: 1.000000, CER: 0.878049, loss: 87.652550

  • wav: file://…/clips/common_voice_es_19270413.wav
  • src: “nos hizo lo imposible para acomodar deuce”
  • res: “icelandic”

Worst WER:

WER: 1.333333, CER: 0.583333, loss: 23.569023

  • wav: file://…/clips/common_voice_es_19698843.wav
  • src: “la pistol md”
  • res: “ma is to mine”

WER: 1.500000, CER: 0.750000, loss: 47.286495

  • wav: file://…/clips/common_voice_es_19744650.wav
  • src: “recientemente mr”
  • res: “decadence is the”

WER: 1.500000, CER: 0.500000, loss: 22.669468

  • wav: file://…/clips/common_voice_es_19751269.wav
  • src: “enlaces internos”
  • res: “ella as tars”

WER: 1.500000, CER: 0.666667, loss: 12.712641

  • wav: file://…/clips/common_voice_es_20018351.wav
  • src: “loja ecuador”
  • res: “no i call”

WER: 1.666667, CER: 0.733333, loss: 51.519295

  • wav: file://…/clips/common_voice_es_19971827.wav
  • src: “en apokolips mr”
  • res: “no o o i mite”

Almost the first line of the output is the WER and CER for the whole test set, not for individual files. That is important.
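For reference, WER and CER are Levenshtein (edit) distances normalized by the reference length, computed over words and characters respectively. A minimal sketch of the idea (not DeepSpeech’s actual implementation):

    def edit_distance(ref, hyp):
        # Dynamic-programming Levenshtein distance between two sequences.
        dp = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev, dp[0] = dp[0], i
            for j, h in enumerate(hyp, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                         dp[j - 1] + 1,    # insertion
                                         prev + (r != h))  # substitution
        return dp[-1]

    def wer(ref, hyp):
        return edit_distance(ref.split(), hyp.split()) / len(ref.split())

    def cer(ref, hyp):
        return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

    print(wer("no es como un", "no es cum"))  # 0.5, as in the per-file report above

Because insertions count, WER can exceed 1.0, which is why the “Worst WER” list shows values like 1.5.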

Sorry

Test on ./…/clips/test.csv - WER: 0.996058, CER: 0.829890, loss: 90.574287

Did you build a Spanish scorer, or are you using the English one? The words look English. If you want to recognize Spanish, you need a Spanish scorer :slight_smile:
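In case it helps: a Spanish scorer can be built from a Spanish text corpus with the tooling in the repo. A rough sketch, assuming the v0.7 layout (the corpus file, KenLM path, and the alpha/beta values below are placeholders; alpha/beta should be tuned with lm_optimizer.py):

    python3 data/lm/generate_lm.py --input_txt es_corpus.txt.gz --output_dir . \
      --top_k 500000 --kenlm_bins /path/to/kenlm/build/bin/ \
      --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
      --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

    generate_scorer_package --alphabet alphabet.txt --lm lm.binary \
      --vocab vocab-500000.txt --package kenlm-es.scorer \
      --default_alpha 0.93 --default_beta 1.18

(Depending on the exact 0.7.x release, the packaging step is either the generate_scorer_package native tool or data/lm/generate_package.py.)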
