Essentially I would like to continue training a pre-trained model. The pre-trained model is the one provided by the latest release of DeepSpeech Italia (from there I took the files alphabet.txt, checkpoint_it and scorer); it was built with transfer learning and with a different alphabet. I downloaded the zip file of the DeepSpeech v0.9 repository, ran
python3 setup.py install
and created the environment like this:
python3 -m pip install --no-cache-dir --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
python3 -m pip install numpy==1.18.5
python3 -m pip install pandas==1.1.5
python3 -m pip install scipy==1.5.1
python3 -m pip install tensorflow==1.15.4
python3 -m pip install tensorflow-gpu==1.15.4
I ran the following command to test the training with a small dataset called cv-tiny (consisting of only 50 clips).
python3 DeepSpeech.py \
  --load_cudnn False \
  --alphabet_config_path /alphabet.txt \
  --checkpoint_dir /transfer_checkpoint_it \
  --train_files cv-tiny/train.csv \
  --dev_files cv-tiny/dev.csv \
  --test_files cv-tiny/test.csv \
  --scorer_path /scorer \
  --train_batch_size 64 \
  --dev_batch_size 64 \
  --test_batch_size 64 \
  --n_hidden 2048 \
  --epochs 30 \
  --learning_rate 0.0001 \
  --dropout_rate 0.4 \
  --es_epochs 10 \
  --early_stop 1 \
  --drop_source_layers 1 \
  --export_dir /ckpt/ \
  --export_file_name 'output_graph'
The problem comes now: after training, while testing the model, I get suspiciously bad results like this:
WER: 1.000000, CER: 2.000000, loss: 462.202454 - wav: file:///home/pablo/deep-speech/cv-tiny/common_voice_it_17544185.wav - src: "il vuoto assoluto" - res: "mnmnmnmnmnmnm incensurato finanzierebbe "
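(For reference on the numbers above: as far as I can tell, the reported CER is the character-level Levenshtein distance divided by the reference length, so a CER of 2.0 just means the hypothesis is far longer than the reference. A minimal sketch, with a hand-rolled levenshtein helper:)

```python
# CER as edit distance over reference length: it can exceed 1.0 when the
# hypothesis is much longer than the reference, as in the result above.
def levenshtein(a, b):
    # classic two-row dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    return levenshtein(ref, hyp) / len(ref)

src = "il vuoto assoluto"
res = "mnmnmnmnmnmnm incensurato finanzierebbe "
print(cer(src, res))  # above 1.0, matching the suspicious score
```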
If I use the pre-trained model I can correctly transcribe the clip "il vuoto assoluto", but during the test of the new model the transcription is completely wrong. The result shouldn't be like that: in theory I'm adding knowledge to the pre-trained model, which should then still transcribe the clip correctly.
Did I forget any steps or flags?
PS: I am training on the CPU.