Hi,
I have been trying to train DeepSpeech (0.7.0-alpha.2) on a custom dataset of 69,000+ audio files. I set up the KenLM scorer after creating a custom lm.binary with the following commands:
./bin/lmplz --text …/…/tts/vocabulary.txt --arpa words.arpa --o 5 --discount_fallback
./bin/build_binary -T -s -v words.arpa lm.binary
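As a sanity check, I scored a few in-domain sentences against the binary with KenLM's query tool (which, as far as I know, is built alongside lmplz); a sentence that comes back mostly as OOV here would point at a vocabulary problem:
echo "đăng ký vay mua xe" | ./bin/query lm.binary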
Then I changed into DeepSpeech/data/lm and ran:
python3 generate_package.py --alphabet ~/April14/alphabet.txt --lm ~/April14/lm.binary --vocab ~/April14/vocabulary.txt --default_alpha 0.75 --default_beta 1.85 --package ~/April14/kenlm.scorer
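I kept the default alpha/beta values. If I understand the docs correctly, lm_optimizer.py in the repository root can search for better ones against a trained checkpoint, roughly like this (I am assuming the flag names match the training flags):
python3 lm_optimizer.py \
  --test_files /home/ubuntu/April14/dev/dev.csv \
  --checkpoint_dir /home/ubuntu/April14/checkout/ \
  --alphabet_config_path /home/ubuntu/April14/viet_alpha.txt \
  --scorer_path /home/ubuntu/April14/kenlm.scorer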
I then ran the training with the following configuration flags:
python3 DeepSpeech.py \
  --train_files /home/ubuntu/April14/train/train.csv \
  --dev_files /home/ubuntu/April14/dev/dev.csv \
  --test_files /home/ubuntu/April14/test/test.csv \
  --test_batch_size 5 \
  --dev_batch_size 5 \
  --train_batch_size 32 \
  --learning_rate 0.000055 \
  --epochs 200 \
  --early_stop True \
  --augmentation_freq_and_time_masking True \
  --augmentation_pitch_and_tempo_scaling True \
  --augmentation_spec_dropout_keeprate 0.8 \
  --automatic_mixed_precision True \
  --train_cudnn True \
  --alphabet_config_path /home/ubuntu/April14/viet_alpha.txt \
  --export_dir /home/ubuntu/April14/results/model_export/ \
  --checkpoint_dir /home/ubuntu/April14/checkout/ \
  --summary_dir /home/ubuntu/April14/summary/ \
  --scorer_path /home/ubuntu/April14/kenlm.scorer \
  --export_language Vietnamese \
  --es_epochs 30
The result of the training was:
Epoch 79 | Training | Elapsed Time: 0:28:28 | Steps: 2099 | Loss: 0.034167
Epoch 79 | Validation | Elapsed Time: 0:00:15 | Steps: 137 | Loss: 0.300532 | Dataset: /home/ubuntu/April14/dev/dev.csv
I Early stop triggered as the loss did not improve the last 30 epochs
I FINISHED optimization in 1 day, 14:28:35.242939
The result of the testing was:
Test on /home/ubuntu/April14/test/test.csv - WER: 0.001845, CER: 0.003121, loss: 0.295476
So on paper it did really well; however, the results are not good at all when I actually run inference.
For example:
src: tôi có phải đóng tiền cho nhân viên khi đăng ký vay mua xe không (Vietnamese for roughly "do I have to pay the staff a fee when I register for a car loan?")
res: hàng bô đo tquái gàn đaơng vgh loa lưng lôi th lản h vay lua gha la a đ
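For reference, this is roughly how I produce those inferences (a minimal sketch with the deepspeech command-line client; output_graph.pb is the file the exporter wrote, and test.wav stands in for my audio):
deepspeech --model /home/ubuntu/April14/results/model_export/output_graph.pb \
  --scorer /home/ubuntu/April14/kenlm.scorer \
  --audio test.wav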
How can I use the parameters below to get better augmentation of the data? (My guess at an invocation follows the list.)
- data_aug_features_additive: standard deviation of the Gaussian additive noise
- data_aug_features_multiplicative: standard deviation of the normal distribution around 1 for multiplicative noise
- augmentation_speed_up_std: standard deviation for speeding up tempo; if it is 0, this augmentation is not performed
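My guess is that they are passed like any other training flag, appended to the command above with values along these lines (the numbers are placeholders I have not validated, not recommendations):
python3 DeepSpeech.py [same flags as above] \
  --data_aug_features_additive 0.2 \
  --data_aug_features_multiplicative 0.2 \
  --augmentation_speed_up_std 0.1
Is that the right way to enable them, and are there sensible starting values?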
PS: if you have any other suggestions, please let me know.