Too many steps?

First of all, thanks for the reply in the thread “The same speed with cpu and with gpu”; it helped me review a couple of topics.

DeepSpeech version 0.7.0
Server with GPU NVIDIA Tesla K80

I’m trying transfer learning (new alphabet)
and I run:

CUDA_VISIBLE_DEVICES=0 python3 ./…/…/DeepSpeech.py --drop_source_layers 1 --alphabet_config_path alphabet.txt --save_checkpoint_dir . --load_checkpoint_dir . --train_files ./…/clips/train.csv --dev_files ./…/clips/dev.csv --test_files ./…/clips/test.csv --scorer_path ./…/deepspeech-0.7.0-models.scorer --load_cudnn --train_cudnn

with alphabet.txt being the Spanish one, and the scorer being the one I downloaded from https://github.com/mozilla/DeepSpeech/releases/tag/v0.7.0

I don’t know if I’m doing something wrong, but why is it that when I run:

python3 ./…/DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --test_batch_size 64 \
  --dev_batch_size 64 \
  --train_batch_size 64 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --epochs 125 \
  --dropout_rate 0.40 \
  --alphabet_config_path ./…/alphabet.txt \
  --export_dir export070 \
  --checkpoint_dir export070 \
  --summary_dir export070 \
  --scorer_path deepspeech-0.7.0-models.scorer \
  --export_language es

my last step in epoch 0 is
Epoch 0 | Training | Elapsed Time: 0:21:52 | Steps: 366 | Loss: 133.388470
Epoch 0 | Training | Elapsed Time: 0:21:59 | Steps: 367 | Loss: 133.466653
Epoch 0 | Training | Elapsed Time: 0:21:59 | Steps: 367 | Loss: 133.466653

and with
CUDA_VISIBLE_DEVICES=0 python3 ./…/…/DeepSpeech.py --drop_source_layers 1 --alphabet_config_path alphabet.txt --save_checkpoint_dir . --load_checkpoint_dir . --train_files ./…/clips/train.csv --dev_files ./…/clips/dev.csv --test_files ./…/clips/test.csv --scorer_path ./…/deepspeech-0.7.0-models.scorer --load_cudnn --train_cudnn

Epoch 0 | Training | Elapsed Time: 6:40:11 | Steps: 23524 | Loss: 100.323532
Epoch 0 | Training | Elapsed Time: 6:40:13 | Steps: 23525 | Loss: 100.326249
Epoch 0 | Training | Elapsed Time: 6:40:15 | Steps: 23526 | Loss: 100.329100
Epoch 0 | Training | Elapsed Time: 6:40:15 | Steps: 23526 | Loss: 100.329100

With these numbers it would take about 20 days to train…

Do I need to pass any other parameters?

I have verified that it uses the GPU.

Another question: I want to understand better how this technology works. Can you recommend a course that has worked for you?

Thank you very much!!!

  1. Use a train/dev batch size of 32 or 64; that should speed things up a lot (see the quick arithmetic sketch after this list).

  2. Udacity is currently free for 30 days; its NLP Nanodegree has an ASR part.
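To see why the batch size matters so much here: --train_batch_size defaults to 1 in DeepSpeech, so each step processes a single sample. A quick sketch of the arithmetic, taking the sample count from the logs above:

    import math

    num_train_samples = 23526  # steps per epoch observed in the batch-size-1 run above
    batch_size = 64            # the suggested --train_batch_size

    steps_per_epoch = math.ceil(num_train_samples / batch_size)
    print(steps_per_epoch)     # 368, consistent with the ~367 steps in the batch-size-64 run

Fewer, larger steps also keep the GPU far better utilized, which is where most of the speedup should come from.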


And maybe a dropout of 0.4 is a bit high; try 0.3 as well. And depending on your data, 15 epochs may already give you good results. 125 epochs is for thousands of hours :slight_smile:

I executed

CUDA_VISIBLE_DEVICES=0 nohup python3 ./…/…/DeepSpeech.py --drop_source_layers 1 --alphabet_config_path alphabet.txt --save_checkpoint_dir . --load_checkpoint_dir . --train_files ./…/clips/train.csv --dev_files ./…/clips/dev.csv --test_files ./…/clips/test.csv --scorer_path ./…/deepspeech-0.7.0-models.scorer --load_cudnn --test_batch_size 64 --dev_batch_size 64 --train_batch_size 64 --epochs 15 --dropout_rate 0.3 --n_hidden 2048 --learning_rate 0.0001

but the result is very bad. I’m trying transfer learning (new alphabet).

The training files are from https://voice.mozilla.org/es/datasets (Spanish), 5 GB (167 hours).

What do you recommend I try?
Change some parameters?
Change the steps?
Get more data?


Best WER:

WER: 0.500000, CER: 0.428571, loss: 15.113347

  • wav: file://…/clips/common_voice_es_18319898.wav
  • src: “no es como un”
  • res: “no es cum”

WER: 0.600000, CER: 0.535714, loss: 34.693447

  • wav: file://…/clips/common_voice_es_19734178.wav
  • src: “y la casa permanece derruida”
  • res: “la casa enlaced”

WER: 0.666667, CER: 0.764706, loss: 50.104347

  • wav: file://…/clips/common_voice_es_19749325.wav
  • src: “no obstante yahoo”
  • res: “no eden”

Median WER:

WER: 1.000000, CER: 0.836735, loss: 87.737968

  • wav: file://…/clips/common_voice_es_19695121.wav
  • src: “la presidencia llegara en junio del siguiente año”
  • res: “a cecilian”

WER: 1.000000, CER: 0.818182, loss: 87.728905

  • wav: file://…/clips/common_voice_es_19668502.wav
  • src: “se casaron en una ceremonia civil en londres”
  • res: “academician”

WER: 1.000000, CER: 0.857143, loss: 87.712791

  • wav: file://…/clips/common_voice_es_19693027.wav
  • src: “en ella caben destacar a autores muy prestigiosos”
  • res: “decadence”

WER: 1.000000, CER: 0.807692, loss: 87.665970

  • wav: file://…/clips/common_voice_es_19956591.wav
  • src: “avenida del sur une las poblaciones de la huerta sur”
  • res: “as needlebeam”

WER: 1.000000, CER: 0.878049, loss: 87.652550

  • wav: file://…/clips/common_voice_es_19270413.wav
  • src: “nos hizo lo imposible para acomodar deuce”
  • res: “icelandic”

Worst WER:

WER: 1.333333, CER: 0.583333, loss: 23.569023

  • wav: file://…/clips/common_voice_es_19698843.wav
  • src: “la pistol md”
  • res: “ma is to mine”

WER: 1.500000, CER: 0.750000, loss: 47.286495

  • wav: file://…/clips/common_voice_es_19744650.wav
  • src: “recientemente mr”
  • res: “decadence is the”

WER: 1.500000, CER: 0.500000, loss: 22.669468

  • wav: file://…/clips/common_voice_es_19751269.wav
  • src: “enlaces internos”
  • res: “ella as tars”

WER: 1.500000, CER: 0.666667, loss: 12.712641

  • wav: file://…/clips/common_voice_es_20018351.wav
  • src: “loja ecuador”
  • res: “no i call”

WER: 1.666667, CER: 0.733333, loss: 51.519295

  • wav: file://…/clips/common_voice_es_19971827.wav
  • src: “en apokolips mr”
  • res: “no o o i mite”

Post the WER and CER of the whole dataset, not just the best and worst.

Depending on that, it might be hard to transfer from English to Spanish.
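One thing that might be worth experimenting with, offered as a hedged suggestion rather than a known fix: --drop_source_layers controls how many of the source model’s layers are re-initialized, and for a target language far from English it can make sense to drop more than just the output layer, e.g.

    CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py --drop_source_layers 2 ...

keeping the rest of the flags as in the command above. Whether that helps on 167 hours of Spanish is something you would have to test.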

OK, I had assumed you could transfer from any language to another. I will try another strategy. Thanks!!! @othiele

Here they all are:


Best WER:

WER: 0.500000, CER: 0.428571, loss: 15.113347

  • wav: file://…/clips/common_voice_es_18319898.wav
  • src: “no es como un”
  • res: “no es cum”

WER: 0.600000, CER: 0.535714, loss: 34.693447

  • wav: file://…/clips/common_voice_es_19734178.wav
  • src: “y la casa permanece derruida”
  • res: “la casa enlaced”

WER: 0.666667, CER: 0.764706, loss: 50.104347

  • wav: file://…/clips/common_voice_es_19749325.wav
  • src: “no obstante yahoo”
  • res: “no eden”

WER: 0.666667, CER: 0.818182, loss: 34.066921

  • wav: file://…/clips/common_voice_es_19043194.wav
  • src: “no me rasco”
  • res: “no”

WER: 0.666667, CER: 0.625000, loss: 27.268988

  • wav: file://…/clips/common_voice_es_19736149.wav
  • src: “no ser arrogante”
  • res: “no seamed”

Median WER:

WER: 1.000000, CER: 0.836735, loss: 87.737968

  • wav: file://…/clips/common_voice_es_19695121.wav
  • src: “la presidencia llegara en junio del siguiente año”
  • res: “a cecilian”

WER: 1.000000, CER: 0.818182, loss: 87.728905

  • wav: file://…/clips/common_voice_es_19668502.wav
  • src: “se casaron en una ceremonia civil en londres”
  • res: “academician”

WER: 1.000000, CER: 0.857143, loss: 87.712791

  • wav: file://…/clips/common_voice_es_19693027.wav
  • src: “en ella caben destacar a autores muy prestigiosos”
  • res: “decadence”

WER: 1.000000, CER: 0.807692, loss: 87.665970

  • wav: file://…/clips/common_voice_es_19956591.wav
  • src: “avenida del sur une las poblaciones de la huerta sur”
  • res: “as needlebeam”

WER: 1.000000, CER: 0.878049, loss: 87.652550

  • wav: file://…/clips/common_voice_es_19270413.wav
  • src: “nos hizo lo imposible para acomodar deuce”
  • res: “icelandic”

Worst WER:

WER: 1.333333, CER: 0.583333, loss: 23.569023

  • wav: file://…/clips/common_voice_es_19698843.wav
  • src: “la pistol md”
  • res: “ma is to mine”

WER: 1.500000, CER: 0.750000, loss: 47.286495

  • wav: file://…/clips/common_voice_es_19744650.wav
  • src: “recientemente mr”
  • res: “decadence is the”

WER: 1.500000, CER: 0.500000, loss: 22.669468

  • wav: file://…/clips/common_voice_es_19751269.wav
  • src: “enlaces internos”
  • res: “ella as tars”

WER: 1.500000, CER: 0.666667, loss: 12.712641

  • wav: file://…/clips/common_voice_es_20018351.wav
  • src: “loja ecuador”
  • res: “no i call”

WER: 1.666667, CER: 0.733333, loss: 51.519295

  • wav: file://…/clips/common_voice_es_19971827.wav
  • src: “en apokolips mr”
  • res: “no o o i mite”

Almost the first line of the output is the WER and CER for the whole test set, not for individual files. That is important.
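For reference, WER and CER are Levenshtein (edit) distances normalized by the reference length, computed over words and characters respectively. A minimal sketch of the idea (not DeepSpeech’s actual implementation):

    def edit_distance(ref, hyp):
        # Dynamic-programming Levenshtein distance between two sequences.
        dp = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev, dp[0] = dp[0], i
            for j, h in enumerate(hyp, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                         dp[j - 1] + 1,    # insertion
                                         prev + (r != h))  # substitution
        return dp[-1]

    def wer(ref, hyp):
        return edit_distance(ref.split(), hyp.split()) / len(ref.split())

    def cer(ref, hyp):
        return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

    print(wer("no es como un", "no es cum"))  # 0.5, as in the per-file report above

Because insertions count, WER can exceed 1.0, which is why the “Worst WER” list shows values like 1.5.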

Sorry

Test on ./…/clips/test.csv - WER: 0.996058, CER: 0.829890, loss: 90.574287

Did you build a Spanish scorer, or are you using the English one? The words look English. If you want to recognize Spanish, you need a Spanish scorer :slight_smile:
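In case it helps: a Spanish scorer can be built from a Spanish text corpus with the tooling in the repo. A rough sketch, assuming the v0.7 layout (the corpus file, KenLM path, and the alpha/beta values below are placeholders; alpha/beta should be tuned with lm_optimizer.py):

    python3 data/lm/generate_lm.py --input_txt es_corpus.txt.gz --output_dir . \
      --top_k 500000 --kenlm_bins /path/to/kenlm/build/bin/ \
      --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
      --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

    generate_scorer_package --alphabet alphabet.txt --lm lm.binary \
      --vocab vocab-500000.txt --package kenlm-es.scorer \
      --default_alpha 0.93 --default_beta 1.18

(Depending on the exact 0.7.x release, the packaging step is either the generate_scorer_package native tool or data/lm/generate_package.py.)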
