My goal is to take an existing Italian-language model and, using transfer learning, continue its training with other datasets of my choice, obtaining a new model as output. The starting model is the one published in the latest release of DeepSpeech-Italian-Model on GitHub; I refer to the file 'transfer_model_tensorflow_it.tar.xz'. To continue the training, I understand it is necessary to use the checkpoint files, which are released alongside it; I refer to the file 'transfer_checkpoint_it.tar.xz'. The hyper-parameters declared for training the model are as follows:
- batch_size=64
- n_hidden=2048
- epochs=30
- learning_rate=0.0001
- dropout=0.4
- lm_alpha=0
- lm_beta=0
- es_epochs=10
- early_stop=1
- amp=0
- drop_source_layer=1
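For completeness, this is how I unpacked the two release archives; the destination directories are my own choice, so treat the paths as an example:

# unpack the released model files (alphabet, scorer, graph) and the matching checkpoints
tar -xJf transfer_model_tensorflow_it.tar.xz -C /home/pablo/deep-speech/
tar -xJf transfer_checkpoint_it.tar.xz -C /mnt/checkpoints/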
Assuming that I need to continue training the starting model (using only the CPU, hence the '--load_cudnn False' flag) with a small test dataset called "cv-tiny", I launched the following command. The first group of flags sets the alphabet, checkpoint, data and scorer paths; the middle group re-declares the hyper-parameters of the starting model; the last two flags control the export:
python3 DeepSpeech.py \
--load_cudnn False \
--alphabet_config_path /alphabet.txt \
--checkpoint_dir /transfer_checkpoint_it \
--train_files cv-tiny/train.csv \
--dev_files cv-tiny/dev.csv \
--test_files cv-tiny/test.csv \
--scorer_path /scorer \
--train_batch_size 64 \
--dev_batch_size 64 \
--test_batch_size 64 \
--n_hidden 2048 \
--epochs 30 \
--learning_rate 0.0001 \
--dropout_rate 0.4 \
--es_epochs 10 \
--early_stop 1 \
--drop_source_layers 1 \
--export_dir /ckpt/ \
--export_file_name 'output_graph'
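For reference, the cv-tiny CSV files follow the standard DeepSpeech layout (wav_filename, wav_filesize, transcript); the wav_filesize value below is only illustrative:

wav_filename,wav_filesize,transcript
/home/pablo/deep-speech/cv-tiny/common_voice_it_19997999.wav,123456,alan clarke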
Is it correct to use the same hyper-parameters as the starting model, with the exception of '--lm_alpha 0' and '--lm_beta 0'? In the transfer_flag.txt file shipped with the release, the values are '--lm_alpha 0.931289039105002' and '--lm_beta 1.1834137581510284'.
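If those released values are the ones to reuse, I assume they would simply be appended to the training command above in place of the zeros, e.g.:

--lm_alpha 0.931289039105002 \
--lm_beta 1.1834137581510284 \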
Is it correct to load the 'scorer' file of the starting model in this way, '--scorer_path /scorer', or is it not necessary?
Do I still have to pass '--drop_source_layers 1'?
Why do I get such bad results during model testing ('src' is the original sentence, 'res' the final transcription)? For example:
WER: 1.000000, CER: 1.484848, loss: 1267.066528
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_19973815.wav
- src: "vi furono internati ebrei e profughi slavi provenienti dai balcani"
- res: "manderebbe nerobianconerobianconerobianconerobianconerobianco neurofibromatosi pulitissimo neolaureata beauxbatons e sbirbimababu"
The model has already been trained on a sufficient number of hours of audio and should perform much better at test time. I report the whole process:
(env) root@pablo-G5-5590:/home/pablo/deep-speech/DeepSpeech-r0.9# python3 DeepSpeech.py \
> --load_cudnn False \
> --alphabet_config_path /home/pablo/deep-speech/transfer_model_tensorflow_it/alphabet.txt \
> --checkpoint_dir /mnt/checkpoints \
> --train_files /home/pablo/deep-speech/cv-tiny/train.csv \
> --dev_files /home/pablo/deep-speech/cv-tiny/dev.csv \
> --test_files /home/pablo/deep-speech/cv-tiny/test.csv \
> --scorer_path /home/pablo/deep-speech/transfer_model_tensorflow_it/scorer \
> --train_batch_size 64 \
> --dev_batch_size 64 \
> --test_batch_size 64 \
> --n_hidden 2048 \
> --epochs 30 \
> --learning_rate 0.0001 \
> --dropout_rate 0.4 \
> --es_epochs 10 \
> --early_stop 1 \
> --drop_source_layers 1 \
> --export_dir /home/pablo/deep-speech/ckpt/ \
> --export_file_name 'ft_model'
I Loading best validating checkpoint from /mnt/checkpoints/best_dev-754152
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I Initializing variable: layer_6/bias
I Initializing variable: layer_6/bias/Adam
I Initializing variable: layer_6/bias/Adam_1
I Initializing variable: layer_6/weights
I Initializing variable: layer_6/weights/Adam
I Initializing variable: layer_6/weights/Adam_1
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
I Saved new best validating model with loss 852.037170 to: /mnt/checkpoints/best_dev-754152
--------------------------------------------------------------------------------
Epoch 1 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 1 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 2 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 2 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 3 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 3 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 4 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 4 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 5 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 5 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 6 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 6 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 7 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 7 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 8 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 8 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 9 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 9 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
--------------------------------------------------------------------------------
Epoch 10 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 10 | Validation | Elapsed Time: 0:00:06 | Steps: 1 | Loss: 852.037170 | Dataset: /home/pablo/deep-speech/cv-tiny/dev.csv
I Early stop triggered as the loss did not improve the last 10 epochs
I FINISHED optimization in 0:01:21.120896
I Loading best validating checkpoint from /mnt/checkpoints/best_dev-754152
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on /home/pablo/deep-speech/cv-tiny/test.csv
Test epoch | Steps: 1 | Elapsed Time: 0:00:09
Test on /home/pablo/deep-speech/cv-tiny/test.csv - WER: 1.000000, CER: 1.000000, loss: 897.548157
--------------------------------------------------------------------------------
Best WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.246753, loss: 1364.445801
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20001185.wav
- src: "in seguito kygo e shear hanno proposto di continuare a lavorare sulla canzone"
- res: "mnmnmnmnmnmnm mnmnmnmnmnmnm novantanove neurodegenerative e unoperazione buonaparte furstenfeldbruck bisbisbisbisbisbis"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.484848, loss: 1267.066528
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_19973815.wav
- src: "vi furono internati ebrei e profughi slavi provenienti dai balcani"
- res: "manderebbe nerobianconerobianconerobianconerobianconerobianco neurofibromatosi pulitissimo neolaureata beauxbatons e sbirbimababu"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.951220, loss: 1070.170776
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20045040.wav
- src: "fin dall'inizio la sede episcopale è stata immediatamente soggetta alla santa sede"
- res: "non mnmnmnmnmnmnm nerobianconerobianconerobianconerobianconerobianco effettuerebbero bubububù"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.184211, loss: 902.796448
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20059124.wav
- src: "la parte superiore della facciata comprende una finestra rettangolare murata"
- res: "mnmnmnmnmnmnm bambinimiracolosidilahiri biancobiancobiancobiancobianco ugualmente aufnahmeausschusssitzung membri"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.060241, loss: 891.478088
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20042813.wav
- src: "dopo alcuni anni egli decise di tornare in india per raccogliere altri insegnamenti"
- res: "mnmnmnmnmnmnm dinosauro buongustaio separatamente autocensurerebbero fermerebbe perfettamente bisbisbisbisbisbis"
--------------------------------------------------------------------------------
Median WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.951220, loss: 1070.170776
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20045040.wav
- src: "fin dall'inizio la sede episcopale è stata immediatamente soggetta alla santa sede"
- res: "non mnmnmnmnmnmnm nerobianconerobianconerobianconerobianconerobianco effettuerebbero bubububù"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.184211, loss: 902.796448
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20059124.wav
- src: "la parte superiore della facciata comprende una finestra rettangolare murata"
- res: "mnmnmnmnmnmnm bambinimiracolosidilahiri biancobiancobiancobiancobianco ugualmente aufnahmeausschusssitzung membri"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.060241, loss: 891.478088
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20042813.wav
- src: "dopo alcuni anni egli decise di tornare in india per raccogliere altri insegnamenti"
- res: "mnmnmnmnmnmnm dinosauro buongustaio separatamente autocensurerebbero fermerebbe perfettamente bisbisbisbisbisbis"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.759259, loss: 882.784058
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20033266.wav
- src: "particolare riguardo è riservato alla produzione da agricoltura biologica sempre più diffusa nella provincia"
- res: "mandarinadorme bustamontesecondo piupericoloso biofluorescente furtivamente furuhatauna bibibibi"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.860000, loss: 869.030579
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20060953.wav
- src: "è anche supportata una cifratura utente end to end"
- res: "mnmnmnmnmnmnm nabucodonosor uòresce fuhrerhauptquartiere un'esagerazione unopinione unafalciatrice bumburubumbububum"
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.060241, loss: 891.478088
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20042813.wav
- src: "dopo alcuni anni egli decise di tornare in india per raccogliere altri insegnamenti"
- res: "mnmnmnmnmnmnm dinosauro buongustaio separatamente autocensurerebbero fermerebbe perfettamente bisbisbisbisbisbis"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.759259, loss: 882.784058
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20033266.wav
- src: "particolare riguardo è riservato alla produzione da agricoltura biologica sempre più diffusa nella provincia"
- res: "mandarinadorme bustamontesecondo piupericoloso biofluorescente furtivamente furuhatauna bibibibi"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.860000, loss: 869.030579
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_20060953.wav
- src: "è anche supportata una cifratura utente end to end"
- res: "mnmnmnmnmnmnm nabucodonosor uòresce fuhrerhauptquartiere un'esagerazione unopinione unafalciatrice bumburubumbububum"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 2.000000, loss: 462.202454
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_17544185.wav
- src: "il vuoto assoluto"
- res: "mnmnmnmnmnmnm incensurato finanzierebbe "
--------------------------------------------------------------------------------
WER: 1.500000, CER: 3.363636, loss: 367.958771
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_19997999.wav
- src: "alan clarke"
- res: "mnmnmnmnmnmnm uguaglianza bumburubumbububum"
--------------------------------------------------------------------------------
I Exporting the model...
I Loading best validating checkpoint from /mnt/checkpoints/best_dev-754152
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
I Models exported at /home/pablo/deep-speech/ckpt/
I Model metadata file saved to /home/pablo/deep-speech/ckpt/author_model_0.0.1.md. Before submitting the exported model for publishing make sure all information in the metadata file is correct, and complete the URL fields.
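Once training produces a usable model, my plan is to sanity-check the export with the standard deepspeech client (assuming the export really produced ft_model.pb in the export directory):

deepspeech \
  --model /home/pablo/deep-speech/ckpt/ft_model.pb \
  --scorer /home/pablo/deep-speech/transfer_model_tensorflow_it/scorer \
  --audio /home/pablo/deep-speech/cv-tiny/common_voice_it_19997999.wav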