Essentially I would like to continue training a pre-trained model. The pre-trained model is the one provided by the latest release of DeepSpeech Italia (from there I took the files alphabet.txt, checkpoint_it, and the scorer); it was built with transfer learning and with a different alphabet. I downloaded the zip of the DeepSpeech v0.9 repository, ran python3 setup.py install, and created the environment like this:
python3 -m pip install --no-cache-dir --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
python3 -m pip install numpy==1.18.5
python3 -m pip install pandas==1.1.5
python3 -m pip install scipy==1.5.1
python3 -m pip install tensorflow==1.15.4
python3 -m pip install tensorflow-gpu==1.15.4
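(For reproducibility, the same pins can also be collected in a requirements file and installed in one step; this is just the versions listed above, nothing added:)

```text
pip==20.0.2
wheel==0.34.2
setuptools==46.1.3
numpy==1.18.5
pandas==1.1.5
scipy==1.5.1
tensorflow==1.15.4
tensorflow-gpu==1.15.4
```

installed with python3 -m pip install -r requirements.txt.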
Then I ran the following command to test training on a small dataset called cv-tiny (consisting of only 50 clips):
python3 DeepSpeech.py \
--load_cudnn False \
--alphabet_config_path /alphabet.txt \
--checkpoint_dir /transfer_checkpoint_it \
--train_files cv-tiny/train.csv \
--dev_files cv-tiny/dev.csv \
--test_files cv-tiny/test.csv \
--scorer_path /scorer \
--train_batch_size 64 \
--dev_batch_size 64 \
--test_batch_size 64 \
--n_hidden 2048 \
--epochs 30 \
--learning_rate 0.0001 \
--dropout_rate 0.4 \
--es_epochs 10 \
--early_stop 1 \
--drop_source_layers 1 \
--export_dir /ckpt/ \
--export_file_name 'output_graph'
The problem comes now: after training, while testing the model I get suspiciously bad results across the board, like this:
WER: 1.000000, CER: 2.000000, loss: 462.202454
- wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_17544185.wav
- src: "il vuoto assoluto"
- res: "mnmnmnmnmnmnm incensurato finanzierebbe "
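(As a side note on reading these numbers: a CER of 2.0 is possible. Below is a minimal sketch, not DeepSpeech's actual evaluation code, of how WER and CER are conventionally computed: edit distance divided by reference length, so a hypothesis much longer than the reference pushes CER above 1.0.)

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming.
    Works on strings (characters) or lists of words alike."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

ref = "il vuoto assoluto"
hyp = "mnmnmnmnmnmnm incensurato finanzierebbe"

# WER: edit distance over word sequences, divided by reference word count.
wer = edit_distance(ref.split(), hyp.split()) / len(ref.split())
# CER: edit distance over characters, divided by reference character count.
cer = edit_distance(ref, hyp) / len(ref)
print(f"WER: {wer:.2f}, CER: {cer:.2f}")
```

Since all three reference words are wrong, WER is 1.0, and the hypothesis is more than twice as long as the reference, so CER exceeds 1.0.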
If I use the pre-trained model I can correctly transcribe the clip "il vuoto assoluto", but the newly trained model gets it completely wrong. The result shouldn't be like that: in theory I'm adding knowledge to the pre-trained model, which should then transcribe the clip correctly.
Did I forget any steps or flags?
P.S. I am training on the CPU.