Training a Mexican Spanish DeepSpeech Model

This topic started from issue #2306, where I posted some initial experiments to train a DS model for Mexican Spanish.

So far, my progress is:

  1. Gathering some initial data from the CIEMPIESS corpus.

  2. Writing tools to format CSV files for DS (see the CSV-generation sketch after this list):

    cat ds_out/train.csv

    wav_filename,wav_filesize,transcript
    data/speech/male/M30ABR1342/CHMC_M_75_30ABR1342_0000.wav,130448,veamos e parece como lo de la invasión española
    data/speech/female/F30ABR1528/CHMC_F_75_30ABR1528_0000.wav,42398,hay fuego
  3. My first run:
python -u DeepSpeech.py \
  --train_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_half_train.csv \
  --test_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv \
  --alphabet_config_path data/mex_alphabet.txt \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 200 \
  --checkpoint_dir "$checkpoint_dir" \
  "$@"

@carlfm01

Hello @alemol, I see you are using CIEMPIESS data. In my experience the CIEMPIESS data is not clean enough; wrong transcriptions lead to inf loss, as your log shows.
See the ground-truth transcriptions “tiones” and “ciertas ac”, which look wrong.

Your train_batch_size is too low; try increasing it to 20. Are you training on a GPU?

Did you train a new LM? I see you didn’t use the LM parameters; maybe it is falling back to the English one?

Try using data from http://www.openslr.org/resources.php; the crowdsourced sets work for me.

I added my own language model:

  python -u DeepSpeech.py \
  --train_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_half_train.csv \
  --test_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv \
  --alphabet_config_path data/mex_alphabet.txt \
  --lm_binary_path data/mexlm/transcrip_efinfo_noloc_2017-2018_probing.binary \
  --train_batch_size 2 \
  --test_batch_size 1 \
  --n_hidden 124 \
  --epochs 30 \
  --checkpoint_dir "$checkpoint_dir" \
  "$@"

The results got better:

Epoch 29 |   Training | Elapsed Time: 0:07:03 | Steps: 9297 | Loss: 66.886114                                                                                                                        
I FINISHED optimization in 3:32:14.989515
WARNING:tensorflow:From /home/amolina/deepvenv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Restored variables from most recent checkpoint at data/CIEMhalf_checkpoint/train-278910, step 278910
Testing model on /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv
Computing acoustic model predictions | Steps: 6974 | Elapsed Time: 0:01:56                                                                                                                           
Decoding predictions | 100% (6974 of 6974) |###################################################################################################################| Elapsed Time: 0:23:50 Time:  0:23:50
Test on /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv - WER: 0.911197, CER: 0.754889, loss: 154.287247
--------------------------------------------------------------------------------
WER: 2.500000, CER: 12.000000, loss: 51.920681
 - src: "después evolucionó"
 - res: "de que con con lo"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 7.394387
 - src: "pintando"
 - res: "en tanto"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 5.000000, loss: 9.625606
 - src: "talando"
 - res: "a la"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 10.330667
 - src: "esclavos"
 - res: "es la"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 11.659359
 - src: "entonces"
 - res: "en donde"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 5.000000, loss: 12.737014
 - src: "soldados"
 - res: "son las"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 5.000000, loss: 15.502726
 - src: "tiones"
 - res: "yo me"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 15.514644
 - src: "okey"
 - res: "o que"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 6.000000, loss: 17.840826
 - src: "círculo"
 - res: "si no"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 22.000000, loss: 56.092628
 - src: "concientización magníficamente desentendimiento"
 - res: "con sentido la mexicana sentimiento"
--------------------------------------------------------------------------------
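
A note on the per-sample numbers: a WER above 1.0 just means the word-level edit distance exceeds the reference length. For “después evolucionó” → “de que con con lo”, the reference has 2 words and the edit distance is 5 (2 substitutions plus 3 insertions), giving WER 5/2 = 2.5. A minimal sketch of that computation:

    def edit_distance(ref, hyp):
        # Word-level Levenshtein distance via dynamic programming.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost) # substitution
        return d[len(ref)][len(hyp)]

    ref = "después evolucionó".split()
    hyp = "de que con con lo".split()
    print(edit_distance(ref, hyp) / len(ref))  # 2.5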

--lm_trie_path is missing. Did you train the LM with accents (á, é, ó)? I think it is only using a-z without accents from the English trie.
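
For an accented Spanish model, the alphabet file must include the accented characters, and the trie has to be generated against that same alphabet. A sketch (the exact generate_trie arguments vary between DS releases, so check the tool shipped with the native_client matching your checkout):

  # mex_alphabet.txt: one character per line, first entry a space;
  # it should include á é í ó ú ü ñ in addition to a-z.

  ./generate_trie data/mex_alphabet.txt \
      data/mexlm/transcrip_efinfo_noloc_2017-2018_probing.binary \
      data/mexlm/trie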

Are you just learning how to use DS or trying to build a usable model?


I am doing both: learning how to use DS first, and then trying to build a usable model. I just want to be sure that I have everything necessary before training seriously.

I will add --lm_trie_path, but it would be easier if you tell me what else is missing. Thanks.

Please read:

Here’s one of my old commands:

./DeepSpeech.py \
 --beam_width 1024 \
 --train_files /yourpath/train.csv \
 --dev_files /yourpath/dev.csv \
 --test_files /yourpath/test.csv \
 --train_batch_size 20 \
 --dev_batch_size 48 \
 --test_batch_size 48 \
 --n_hidden 2048 \
 --epochs 15 \
 --report_count 900000 \
 --earlystop_nsteps 1 \
 --dropout_rate 0.11 \
 --early_stop True \
 --learning_rate 0.0001 \
 --lm_alpha 0.75 \
 --lm_beta 2.2 \
 --export_dir /yourpath/result/models-tl \
 --checkpoint_dir /yourpath/result/ckpts-tl2 \
 --alphabet_config_path /yourpath/langmodel/alphabet.txt \
 --lm_binary_path /yourpath/langmodel/lm.binary \
 --lm_trie_path /yourpath/langmodel/trie
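
Once training exports a model to --export_dir, you can sanity-check it with the deepspeech client. A sketch assuming a 0.5-era Python client (flag names changed in later releases, and the audio file here is illustrative):

 deepspeech --model /yourpath/result/models-tl/output_graph.pb \
  --alphabet /yourpath/langmodel/alphabet.txt \
  --lm /yourpath/langmodel/lm.binary \
  --trie /yourpath/langmodel/trie \
  --audio test.wav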

And sorry, but if you need to build a production model, you will need a good set of GPUs.
