Why doesn't tflite return any words?

I am trying to create my own model with a small amount of data, using Colab to train (TensorFlow 1.15, CUDA 11.2); this is my notebook.
First I created vocab-500.txt and lm.binary:
python3 generate_lm.py --input_txt vocabulary.txt --output_dir .
--top_k 500 --kenlm_bins ~/DeepSpeech/kenlm/build/bin/
--arpa_order 3 --max_arpa_memory "85%" --arpa_prune "0|0|1"
--binary_a_bits 255 --binary_q_bits 8 --binary_type trie
--discount_fallback
Second, I created the scorer with alpha = 0.75 and beta = 1.85:
./generate_scorer_package --alphabet alphabet.txt
--lm lm.binary
--vocab vocab-500.txt
--package kenlm.scorer
--default_alpha 0.75
--default_beta 1.85
--force_bytes_output_mode True
Third, I trained the model:
! python3 /content/DeepSpeech/DeepSpeech.py --n_hidden 2048
--early_stop True
--es_epochs 30
--test_batch_size 1
--dev_batch_size 10
--train_batch_size 16
--feature_win_step 10
--train_cudnn True
--checkpoint_dir /content/
--epochs 100
--train_files /content/vivos/train.csv
--dev_files /content/vivos/dev.csv
--test_files /content/vivos/test.csv
--learning_rate 0.000095
--export_tflite
--export_dir /content/DeepSpeech/output_models/
--automatic_mixed_precision True
--dropout_rate 0.05
--alphabet_config_path /content/vivos/alphabet.txt
--scorer_path /content/vivos/kenlm.scorer
And got these results:
Epoch x | Training | Elapsed Time: 0:05:50 | Steps: 728 | Loss: 0.669057
Epoch x | Validation | Elapsed Time: 0:00:08 | Steps: 52 | Loss: 96.212590 | Dataset: /content/vivos/dev.csv
and the test produced no words at all:
Best WER:

WER: 1.000000, CER: 1.000000, loss: 318.883636

  • wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R154.wav
  • src: “BỌN BUÔN DỰ ÁN CHẠY CHỌT BÀY RA CÁC DỰ ÁN ĐỂ KIẾM CHÁC”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 221.555115

  • wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R132.wav
  • src: “ĐIỆN THOẠI RENG NHƯNG TA CŨNG PHẢI NHẤC MÁY ĐÚNG KHÔNG”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 204.180481

  • wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_T022.wav
  • src: “THÁNG TRƯỚC CHỒNG VỀ MẤY NGÀY MÌNH NẰM NGỦ THẤY ẤM KINH KHỦNG”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 198.045197

  • wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R071.wav
  • src: “MỖI KHI XE RÁC ĐI NGANG QUA ĐÁNH KẺNG MỚI ĐEM RA ĐỔ”
  • res: “”

WER: 1.000000, CER: 1.000000, loss: 195.038269

  • wav: file:///content/vivos/test/waves/VIVOSDEV02/VIVOSDEV02_R045.wav
  • src: “CÔ CỨ NHÈ NGAY CÁI NỌNG CÁ TRÊ”
  • res: “”

Fourth, I searched for alpha and beta with this model:
! python3 /content/DeepSpeech/lm_optimizer.py
--test_files /content/vivos/test.csv
--alphabet_config_path /content/vivos/alphabet.txt
--checkpoint_dir /content/fine_tuning_checkpoints
--n_hidden 2048
Results: [I 2021-04-22 09:34:16,592] Trial 0 finished with value: 0.9127290260366442 and parameters: {'lm_alpha': 2.6143084230162623, 'lm_beta': 1.998640895456838}. Best is trial 0 with value: 0.9127290260366442.
Even the worst cases still contain words:
Worst WER:

WER: 1.000000, CER: 0.666667, loss: 48.078129

  • wav: file:///content/vivos/test/waves/VIVOSDEV04/VIVOSDEV04_R049.wav
  • src: “VÌ TỐI ĐÓ”
  • res: “VIÌ TÍ MỐC”

WER: 1.000000, CER: 0.625000, loss: 46.857883

  • wav: file:///content/vivos/test/waves/VIVOSDEV05/VIVOSDEV05_091.wav
  • src: “TRÁN GIÔ”
  • res: “TẢN DIỘC”

WER: 1.000000, CER: 0.333333, loss: 35.851494

  • wav: file:///content/vivos/test/waves/VIVOSDEV06/VIVOSDEV06_222.wav
  • src: “NĂM MƯƠI NĂM MƯƠI MỐT”
  • res: “ĐĂM MƯỜI ĐĂM MÙY MÓT”

WER: 1.000000, CER: 0.500000, loss: 29.178698

  • wav: file:///content/vivos/test/waves/VIVOSDEV04/VIVOSDEV04_R100.wav
  • src: “BÀ NÓI”
  • res: “BẢ ÓC”

WER: 1.000000, CER: 0.263158, loss: 25.467527

  • wav: file:///content/vivos/test/waves/VIVOSDEV06/VIVOSDEV06_212.wav
  • src: “BA MƯƠI BA MƯƠI MỐT”
  • res: “BAN MƯỜI BAN MƯAI MÓT”

Then I created a new scorer with these alpha and beta values:
./generate_scorer_package --alphabet alphabet.txt
--lm lm.binary
--vocab vocab-500.txt
--package kenlm.scorer
--default_alpha 2.6143084230162623
--default_beta 1.998640895456838
--force_bytes_output_mode True

I use these tflite and scorer files in the android mic streaming example app, however I don't get any words at the output. Please help me.

Perhaps the problem is with my scorer model. I think I did the right things to create the scorer file; this is my alphabet.txt. Has anyone encountered a similar situation? Please help me!

This looks about right for the amount of data. You might try using transfer learning, and for Vietnamese you might try reducing the vocabulary size by using NFKD.
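
A minimal Python sketch of the NFKD idea, using two of the test transcripts above as sample text (just an illustration, not code from the linked model):

import unicodedata

transcripts = [
    "ĐIỆN THOẠI RENG NHƯNG TA CŨNG PHẢI NHẤC MÁY ĐÚNG KHÔNG",
    "BÀ NÓI",
]
text = "".join(transcripts)
decomposed = unicodedata.normalize("NFKD", text)

# Precomposed letters such as Ệ decompose into E plus combining marks, so the
# alphabet only needs the base letters and a handful of combining diacritics.
print(len(set(text)), "distinct symbols before NFKD")
print(len(set(decomposed)), "distinct symbols after NFKD")

The same normalization would then have to be applied consistently to the training transcripts and to the text fed to generate_lm.py, otherwise the acoustic model and the scorer will not agree on the symbols.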

Transfer learning means reusing the checkpoints from a previous training run, and I am doing that. What is strange is that when I find the right alpha and beta, the test results do contain words, but when I use them to build the scorer, no words are found.

I’m sorry, I didn’t understand. Your alphabet also has a space after every character. Could you join us on Matrix?
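
If the trailing spaces really are in the file, here is a rough sketch for stripping them, assuming the usual alphabet.txt layout of one label per line (including a line that is just a single space for the word separator); the file names are only examples:

cleaned = []
with open("alphabet.txt", encoding="utf-8") as f:
    for line in f.read().split("\n"):
        if line != "" and line.strip() == "":
            cleaned.append(" ")           # the word-separator entry: keep exactly one space
        else:
            cleaned.append(line.rstrip()) # drop any stray trailing space

with open("alphabet_clean.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(cleaned))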

So I just tried training with Vietnamese and the Common Voice data. You can find the model here.

The results without an LM, and the results with an LM.

I used (a rough command sketch follows the list):

  • drop_source_layers: 2
  • learning_rate: 0.00001
  • dropout: 0.2
  • no SpecAugment
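
Put together as a single command, those settings would look roughly like this; the checkpoint and data paths are placeholders, not the exact ones used, and passing no --augment options means no SpecAugment:

python3 DeepSpeech.py \
  --drop_source_layers 2 \
  --load_checkpoint_dir /path/to/deepspeech-0.9.3-checkpoint \
  --save_checkpoint_dir /path/to/vi-checkpoint \
  --alphabet_config_path /path/to/alphabet.txt \
  --train_files train.csv --dev_files dev.csv --test_files test.csv \
  --n_hidden 2048 \
  --learning_rate 0.00001 \
  --dropout_rate 0.2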

Can you share the code you used to create the scorer model? I think that is where my problem is.

Thank you for your enthusiasm, love you :sunny:

Sure, the code is here, it’s not very readable, I’m sorry -_-;; … I’m happy to explain specific parts though.

Thank you. I use force bytes output mode and maybe that's not right. I'm adding more data to the model and will try it as soon as I'm done. Thank you so much.
