alemol
(Alejandro Molina)
September 3, 2019, 8:30pm
1
This topic started from issue #2306, where I posted some initial experiments to train a DS model for Mexican Spanish.
So far my progress has been:
to gather some initial data from the ciempiess corpus
to generate tools to format CSV files for DS
cat ds_out/train.csv
wav_filename,wav_filesize,transcript
data/speech/male/M30ABR1342/CHMC_M_75_30ABR1342_0000.wav,130448,veamos e parece como lo de la invasión española
data/speech/female/F30ABR1528/CHMC_F_75_30ABR1528_0000.wav,42398,hay fuego
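For reference, a minimal sketch of a script that produces this CSV format (the helper below and its inputs are hypothetical, not the actual ciem2ds tooling):

```python
import csv
import os

def write_ds_csv(entries, out_path):
    """Write (wav_path, transcript) pairs in the DeepSpeech CSV format:
    wav_filename,wav_filesize,transcript."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in entries:
            # wav_filesize is the size of the audio file in bytes on disk
            writer.writerow([wav_path, os.path.getsize(wav_path), transcript])
```

DeepSpeech uses the wav_filesize column to sort samples by length for batching, so it should match the file actually on disk.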
My first run
python -u DeepSpeech.py \
--train_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_half_train.csv \
--test_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv \
--alphabet_config_path data/mex_alphabet.txt \
--train_batch_size 1 \
--test_batch_size 1 \
--n_hidden 100 \
--epochs 200 \
--checkpoint_dir "$checkpoint_dir" \
"$@"
carlfm01
September 3, 2019, 8:56pm
2
Hello @alemol, I see that you are using ciempiess data. From my experience with ciempiess, the data is not clean enough; wrong transcriptions lead to inf losses, as your log shows.
See the ground-truth transcriptions “tiones” and “ciertas ac”, which look wrong.
Your train_batch_size is too low; try increasing it to 20. Are you training on GPU?
Did you train a new LM? I see you didn’t use the lm param; maybe it is falling back to the English one?
Try using data from http://www.openslr.org/resources.php, the crowdsourced set works for me.
alemol
(Alejandro Molina)
September 5, 2019, 5:20pm
3
I added my own language model:
python -u DeepSpeech.py \
--train_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_half_train.csv \
--test_files /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv \
--alphabet_config_path data/mex_alphabet.txt \
--lm_binary_path data/mexlm/transcrip_efinfo_noloc_2017-2018_probing.binary \
--train_batch_size 2 \
--test_batch_size 1 \
--n_hidden 124 \
--epochs 30 \
--checkpoint_dir "$checkpoint_dir" \
"$@"
Got better:
Epoch 29 | Training | Elapsed Time: 0:07:03 | Steps: 9297 | Loss: 66.886114
I FINISHED optimization in 3:32:14.989515
WARNING:tensorflow:From /home/amolina/deepvenv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Restored variables from most recent checkpoint at data/CIEMhalf_checkpoint/train-278910, step 278910
Testing model on /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv
Computing acoustic model predictions | Steps: 6974 | Elapsed Time: 0:01:56
Decoding predictions | 100% (6974 of 6974) |###################################################################################################################| Elapsed Time: 0:23:50 Time: 0:23:50
Test on /home/amolina/repo/ciem2ds/ciempiess_ds/sortlen_all_test.csv - WER: 0.911197, CER: 0.754889, loss: 154.287247
--------------------------------------------------------------------------------
WER: 2.500000, CER: 12.000000, loss: 51.920681
- src: "después evolucionó"
- res: "de que con con lo"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 7.394387
- src: "pintando"
- res: "en tanto"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 5.000000, loss: 9.625606
- src: "talando"
- res: "a la"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 10.330667
- src: "esclavos"
- res: "es la"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 11.659359
- src: "entonces"
- res: "en donde"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 5.000000, loss: 12.737014
- src: "soldados"
- res: "son las"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 5.000000, loss: 15.502726
- src: "tiones"
- res: "yo me"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.000000, loss: 15.514644
- src: "okey"
- res: "o que"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 6.000000, loss: 17.840826
- src: "círculo"
- res: "si no"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 22.000000, loss: 56.092628
- src: "concientización magníficamente desentendimiento"
- res: "con sentido la mexicana sentimiento"
--------------------------------------------------------------------------------
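As a side note on reading these numbers: the per-sample WER reported here is the word-level edit distance divided by the number of words in the reference, which is why short references can score well above 1.0. A minimal sketch of that computation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds dp[i-1][j-1]; dp[j] still holds dp[i-1][j]
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: word edit distance over reference word count."""
    r, h = ref.split(), hyp.split()
    return edit_distance(r, h) / len(r)

# wer("después evolucionó", "de que con con lo") -> 2.5, as in the report above
```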
carlfm01
September 5, 2019
4
--lm_trie_path is missing. Did you train the LM with the accents á, é, ó? I think it is only using a-z without accents from the English trie.
alemol:
--train_batch_size 2
Are you just learning how to use DS or trying to build a usable model?
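One quick way to catch an alphabet/accent mismatch like this is to check that every character appearing in the transcripts is covered by the alphabet. A minimal sketch (file reading is left out; pass the alphabet characters and transcript lines directly):

```python
def missing_chars(alphabet_chars, transcripts):
    """Return the set of transcript characters the alphabet does not cover."""
    allowed = set(alphabet_chars)
    seen = set()
    for line in transcripts:
        seen.update(line)
    return seen - allowed

# An accent-less a-z alphabet flags the accented vowels in Spanish text:
# missing_chars("abcdefghijklmnopqrstuvwxyz ", ["después evolucionó"])
# -> {'é', 'ó'}
```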
alemol
(Alejandro Molina)
September 5, 2019, 7:00pm
5
Are you just learning how to use DS or trying to build a usable model?
I am both learning how to use DS and then trying to build a usable model. I just want to be sure that I have everything necessary before training seriously.
I will add the --lm_trie_path, but it is easier if you tell me what else is missing. Thanks
carlfm01
6
Please read:
from __future__ import absolute_import, division, print_function
import os
import absl.flags
FLAGS = absl.flags.FLAGS
def create_flags():
    # Importer
    # ========
    f = absl.flags

    f.DEFINE_string('train_files', '', 'comma separated list of files specifying the dataset used for training. Multiple files will get merged. If empty, training will not be run.')
    f.DEFINE_string('dev_files', '', 'comma separated list of files specifying the dataset used for validation. Multiple files will get merged. If empty, validation will not be run.')
    f.DEFINE_string('test_files', '', 'comma separated list of files specifying the dataset used for testing. Multiple files will get merged. If empty, the model will not be tested.')
    f.DEFINE_string('feature_cache', '', 'path where cached features extracted from --train_files will be saved. If empty, caching will be done in memory and no files will be written.')
    f.DEFINE_integer('feature_win_len', 32, 'feature extraction audio window length in milliseconds')
This file has been truncated.
Here’s one of my old commands
./DeepSpeech.py \
--beam_width 1024 \
--train_files /yourpath/train.csv \
--dev_files /yourpath/dev.csv \
--test_files /yourpath/test.csv \
--train_batch_size 20 \
--dev_batch_size 48 \
--test_batch_size 48 \
--n_hidden 2048 \
--epochs 15 \
--report_count 900000 \
--earlystop_nsteps 1 \
--dropout_rate 0.11 \
--early_stop True \
--learning_rate 0.0001 \
--lm_alpha 0.75 \
--lm_beta 2.2 \
--export_dir /yourpath/result/models-tl \
--checkpoint_dir /yourpath/result/ckpts-tl2 \
--alphabet_config_path /yourpath/langmodel/alphabet.txt \
--lm_binary_path /yourpath/langmodel/lm.binary \
--lm_trie_path /yourpath/langmodel/trie
And sorry, if you need to build a production model you will need a good set of GPUs.