Hello @lissyx and @othiele,
I have installed all the dependencies and successfully fine-tuned on 34 hours of Indian-accent audio extracted from YouTube. There are 16,500 audio files in total, split into 13k / 2k / 1.5k (train / dev / test). However, the accuracy is not good: my WER is 0.29. This is the training command I used:
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir checkpoints/vlsi2 --epochs 40 --train_files …/deepspeech-0.6.1-models/audio/vlsi/train.csv --dev_files …/deepspeech-0.6.1-models/audio/vlsi/validate.csv --test_files …/deepspeech-0.6.1-models/audio/vlsi/test.csv --learning_rate 0.000001 --use_cudnn_rnn true --use_allow_growth true --lm_binary_path …/deepspeech-0.6.1-models/lm.binary --lm_trie_path …/deepspeech-0.6.1-models/trie --noearly_stop --export_dir exported_model/vlsi2 --train_batch_size 128 --dev_batch_size 128 --test_batch_size 128
The fine-tuned model gives a WER of 0.29 on the test set, and it even gets some words wrong that the pre-trained model (without fine-tuning) predicts correctly.
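To be clear about what I mean by WER: as far as I understand it, it is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the prediction, divided by the number of reference words. Below is a minimal sketch of that definition. It is my own illustration, not the code DeepSpeech runs in its test phase, and the function name `word_error_rate` is made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

By this definition, a WER of 0.29 means roughly 29 word errors per 100 reference words. Scoring the pre-trained and the fine-tuned model on the same test transcripts with a function like this would show where the fine-tuned one gets worse.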
Am I doing something wrong?
If you need any other information, please let me know.
EDIT: I am using DeepSpeech 0.6.1