I am trying to train my own model on a small dataset using Colab (TensorFlow 1.15, CUDA 11.2); this is the notebook.
First I created vocab-500.txt and lm.binary:
python3 generate_lm.py --input_txt vocabulary.txt --output_dir . \
  --top_k 500 --kenlm_bins ~/DeepSpeech/kenlm/build/bin/ \
  --arpa_order 3 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
  --binary_a_bits 255 --binary_q_bits 8 --binary_type trie \
  --discount_fallback
Second, I created the scorer with alpha = 0.75 and beta = 1.85:
./generate_scorer_package --alphabet alphabet.txt \
  --lm lm.binary \
  --vocab vocab-500.txt \
  --package kenlm.scorer \
  --default_alpha 0.75 \
  --default_beta 1.85 \
  --force_bytes_output_mode True
Third, I trained the model:
!python3 /content/DeepSpeech/DeepSpeech.py --n_hidden 2048 \
  --early_stop True \
  --es_epochs 30 \
  --test_batch_size 1 \
  --dev_batch_size 10 \
  --train_batch_size 16 \
  --feature_win_step 10 \
  --train_cudnn True \
  --checkpoint_dir /content/ \
  --epochs 100 \
  --train_files /content/vivos/train.csv \
  --dev_files /content/vivos/dev.csv \
  --test_files /content/vivos/test.csv \
  --learning_rate 0.000095 \
  --export_tflite \
  --export_dir /content/DeepSpeech/output_models/ \
  --automatic_mixed_precision True \
  --dropout_rate 0.05 \
  --alphabet_config_path /content/vivos/alphabet.txt \
  --scorer_path /content/vivos/kenlm.scorer
And I got these results:
Epoch x | Training | Elapsed Time: 0:05:50 | Steps: 728 | Loss: 0.669057
Epoch x | Validation | Elapsed Time: 0:00:08 | Steps: 52 | Loss: 96.212590 | Dataset: /content/vivos/dev.csv
and the test output contains no words at all:
Best WER:
WER: 1.000000, CER: 1.000000, loss: 318.883636
- wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R154.wav
- src: “BỌN BUÔN DỰ ÁN CHẠY CHỌT BÀY RA CÁC DỰ ÁN ĐỂ KIẾM CHÁC”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 221.555115
- wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R132.wav
- src: “ĐIỆN THOẠI RENG NHƯNG TA CŨNG PHẢI NHẤC MÁY ĐÚNG KHÔNG”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 204.180481
- wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_T022.wav
- src: “THÁNG TRƯỚC CHỒNG VỀ MẤY NGÀY MÌNH NẰM NGỦ THẤY ẤM KINH KHỦNG”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 198.045197
- wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R071.wav
- src: “MỖI KHI XE RÁC ĐI NGANG QUA ĐÁNH KẺNG MỚI ĐEM RA ĐỔ”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 195.038269
- wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R045.wav
- src: “CÔ CỨ NHÈ NGAY CÁI NỌNG CÁ TRÊ”
- res: “”
Fourth, I searched for alpha and beta with this model:
!python3 /content/DeepSpeech/lm_optimizer.py \
  --test_files /content/vivos/test.csv \
  --alphabet_config_path /content/vivos/alphabet.txt \
  --checkpoint_dir /content/fine_tuning_checkpoints \
  --n_hidden 2048
Results: [I 2021-04-22 09:34:16,592] Trial 0 finished with value: 0.9127290260366442 and parameters: {'lm_alpha': 2.6143084230162623, 'lm_beta': 1.998640895456838}. Best is trial 0 with value: 0.9127290260366442.
Even in the worst cases, the output still contains words:
Worst WER:
WER: 1.000000, CER: 0.666667, loss: 48.078129
- wav: file:///content/vivos/test/waves/VIVOSDEV04/VIVOSDEV04_R049.wav
- src: “VÌ TỐI ĐÓ”
- res: “VIÌ TÍ MỐC”
WER: 1.000000, CER: 0.625000, loss: 46.857883
- wav: file:///content/vivos/test/waves/VIVOSDEV05/VIVOSDEV05_091.wav
- src: “TRÁN GIÔ”
- res: “TẢN DIỘC”
WER: 1.000000, CER: 0.333333, loss: 35.851494
- wav: file:///content/vivos/test/waves/VIVOSDEV06/VIVOSDEV06_222.wav
- src: “NĂM MƯƠI NĂM MƯƠI MỐT”
- res: “ĐĂM MƯỜI ĐĂM MÙY MÓT”
WER: 1.000000, CER: 0.500000, loss: 29.178698
- wav: file:///content/vivos/test/waves/VIVOSDEV04/VIVOSDEV04_R100.wav
- src: “BÀ NÓI”
- res: “BẢ ÓC”
WER: 1.000000, CER: 0.263158, loss: 25.467527
- wav: file:///content/vivos/test/waves/VIVOSDEV06/VIVOSDEV06_212.wav
- src: “BA MƯƠI BA MƯƠI MỐT”
- res: “BAN MƯỜI BAN MƯAI MÓT”
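To make the WER and CER figures in these reports easier to read, here is a minimal sketch of how such numbers can be computed from a src/res pair. It assumes both metrics are a plain Levenshtein distance divided by the length of the reference (in words for WER, in characters for CER); the function names are my own and DeepSpeech's evaluation code may differ in details.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text):
    # Assumption: compare composed characters so that one Vietnamese letter
    # with its diacritics counts as a single character.
    return unicodedata.normalize("NFC", text)


def wer(src, res):
    """Word error rate: word-level edit distance over the reference word count."""
    words = _nfc(src).split()
    return edit_distance(words, _nfc(res).split()) / len(words)


def cer(src, res):
    """Character error rate: character-level edit distance over the reference length."""
    src = _nfc(src)
    return edit_distance(src, _nfc(res)) / len(src)


if __name__ == "__main__":
    src, res = "BA MƯƠI BA MƯƠI MỐT", "BAN MƯỜI BAN MƯAI MÓT"
    print(f"WER: {wer(src, res):.6f}, CER: {cer(src, res):.6f}")
```

Under this definition a WER of 1.0 with a CER well below 1.0 means every word has at least one wrong character while most characters are still right, which is what the samples above look like.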
Then I created a new scorer with these alpha and beta values:
./generate_scorer_package --alphabet alphabet.txt \
  --lm lm.binary \
  --vocab vocab-500.txt \
  --package kenlm.scorer \
  --default_alpha 2.6143084230162623 \
  --default_beta 1.998640895456838 \
  --force_bytes_output_mode True
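The alpha and beta values here were copied by hand from the lm_optimizer.py trial line quoted earlier. A small sketch of pulling them out of such a line automatically is below. It assumes the log keeps the "Trial N finished with value: ... and parameters: {...}" shape shown in my output; `parse_trial` is a hypothetical helper name, not part of DeepSpeech or Optuna.

```python
import ast
import re

# Sample line copied from the lm_optimizer.py output (quotes normalized to ASCII).
LOG_LINE = (
    "[I 2021-04-22 09:34:16,592] Trial 0 finished with value: 0.9127290260366442 "
    "and parameters: {'lm_alpha': 2.6143084230162623, 'lm_beta': 1.998640895456838}. "
    "Best is trial 0 with value: 0.9127290260366442."
)

TRIAL_RE = re.compile(r"finished with value: ([-+0-9.eE]+) and parameters: (\{.*?\})")


def parse_trial(line):
    """Return (value, params) from a 'Trial N finished' log line, or None if it does not match."""
    match = TRIAL_RE.search(line)
    if match is None:
        return None
    # literal_eval only parses Python literals, so the dict text is not executed.
    return float(match.group(1)), ast.literal_eval(match.group(2))


value, params = parse_trial(LOG_LINE)
# Print the two flags in the form expected by generate_scorer_package.
print(f"--default_alpha {params['lm_alpha']} --default_beta {params['lm_beta']}")
```

With several trials in the log, the same helper could be applied to every line and the entry with the lowest value kept.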
I use these tflite and scorer files in the android_mic_streaming app, but I don’t get any words in the output. Please help me.