Created an Arabic LM, but DeepSpeech is not learning (early stops)

I created an Arabic language model using:

awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++) chars[$(i)]=$(i)} END{for(c in chars) print c}' vocab.txt | sort -u > alphabets.txt
kenlm/build/bin/lmplz --text vocab.txt --arpa lm.arpa -o 4
kenlm/build/bin/build_binary -q 16 -b 7 -a 64 trie lm.arpa lm.binary
nat_client.0.1.1/generate_trie alphabets.txt lm.binary vocab.txt words.trie

Then I imported my WAVs (16 kHz, 16-bit, mono) and created the CSV files as expected. When I start DeepSpeech training, it runs for a few epochs (~2 CPU hours) and then triggers an early stop. Just to test the system, I figured that overfitting on a single WAV used in all of train/dev/test should show positive results, as it does with ldc93s1. But I still get early stops with WER = 1, loss > 100, and weird output after a short CPU time.
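For reference, the manifest format DeepSpeech's importers produce is a three-column CSV: wav_filename, wav_filesize, transcript. A minimal sketch of generating one (file names here are hypothetical, not from the thread):

```python
import csv
import os

def write_manifest(wav_paths, transcripts, out_csv):
    """Write a DeepSpeech-style manifest CSV: wav_filename, wav_filesize, transcript."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for path, text in zip(wav_paths, transcripts):
            # wav_filesize is the on-disk byte size, used by the importer for sorting
            writer.writerow([path, os.path.getsize(path), text])
```

For the single-file overfitting test, the same row (same WAV, same transcript) would go into all three CSVs.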

My runner has:

# --export_dir "$COMPUTE_DATA_DIR/exported.model"  (disabled for now; a # cannot
# comment out a single line inside a backslash-continued command)
python -u DeepSpeech.py \
  --train_files "$COMPUTE_DATA_DIR/arabic1.csv" \
  --dev_files "$COMPUTE_DATA_DIR/arabic1.csv" \
  --test_files "$COMPUTE_DATA_DIR/arabic1.csv" \
  --alphabet_config_path "$COMPUTE_DATA_DIR/alphabets.txt" \
  --lm_binary_path "$COMPUTE_DATA_DIR/lm.binary" \
  --lm_trie_path "$COMPUTE_DATA_DIR/words.trie" \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --epoch 100 \
  --display_step 1 \
  --validation_step 1 \
  --dropout_rate 0.10 \
  --n_hidden 1024 \
  --default_stddev 0.03125 \
  --learning_rate 0.00001 \
  --checkpoint_dir "$checkpoint_dir" \
  --checkpoint_secs 1800 \
  --summary_secs 1800

I tried n_hidden = 512, 1024, and 2048 (with stddev = sqrt(2/(2*n_hidden))), and learning rates of 0.0001 and 0.00001.

I manually checked the ARPA file and it looked as expected. It seems I'm missing a crucial point that's blocking any learning.
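Beyond eyeballing it, the ARPA header can be sanity-checked programmatically; with `-o 4` you'd expect counts for orders 1 through 4. A quick sketch (the path is hypothetical):

```python
def arpa_ngram_counts(path):
    """Parse the \\data\\ header of an ARPA file, returning {order: ngram_count}."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        in_header = False
        for line in f:
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if in_header:
                if line.startswith("ngram "):
                    # Header lines look like "ngram 1=12345"
                    order, count = line[len("ngram "):].split("=")
                    counts[int(order)] = int(count)
                elif line:
                    # First section marker (e.g. "\1-grams:") ends the header
                    break
    return counts
```

If any expected order is missing or has a count of zero, the vocab file (or the lmplz invocation) is the place to look.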

Update: increasing the learning rate fixed the issue.


Could you please provide your trained model? It would be very helpful.


What version of TensorFlow was the model in data/quran trained with? And will it only work with DeepSpeech 0.2.0?

@Parveez_Ali_Masood_Syed I was able to get it running on the utf8-ctc-v2 branch of DeepSpeech.

You can see the code we used here and preprocessing tools here.

This is bleeding edge, so expect many changes over the next few days as I debug things in this thread :slight_smile:


@tarekeldeeb In your README you mention that the accuracy is very high, but the perfect recording conditions don't match the average user (mobile phone?). Have you tried adding noise to the samples to re-create how a typical user would use this model?
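One simple way to try this is mixing white noise into each training WAV; a stdlib-only sketch for 16-bit mono files (function name and noise level are illustrative, not from the thread):

```python
import random
import struct
import wave

def add_white_noise(src, dst, noise_level=0.02):
    """Mix uniform white noise into a 16-bit mono WAV.

    noise_level is relative to full scale (0.02 ~= -34 dBFS peak noise).
    """
    with wave.open(src, "rb") as w:
        params = w.getparams()
        assert params.sampwidth == 2 and params.nchannels == 1
        frames = w.readframes(params.nframes)
    samples = struct.unpack("<%dh" % params.nframes, frames)
    amp = int(noise_level * 32767)
    # Add bounded random noise to each sample, clamping to the int16 range
    noisy = [max(-32768, min(32767, s + random.randint(-amp, amp)))
             for s in samples]
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(struct.pack("<%dh" % len(noisy), *noisy))
```

Real room/street noise recordings mixed in at varying levels would be closer to actual phone conditions than white noise, but this is a cheap first experiment.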

Great work btw, really looking forward to seeing how this turns out.

Do you have a link to a pre-trained model that you have created?

Hi … I used to work on 0.3; I have now migrated to 0.6.
You can reuse my model here:


I am working on a Korean speech dataset with a specific vocabulary only. Could you let me know how to create a language model for that specific Korean vocabulary?
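The KenLM recipe earlier in this thread is language-agnostic: put your target Korean sentences into a vocab.txt (one sentence per line), derive the alphabet file from it, then run lmplz / build_binary / generate_trie the same way. A sketch of the alphabet-extraction step in Python (file names are hypothetical):

```python
def build_alphabet(vocab_path, alphabet_path):
    """Collect every distinct character in the corpus (e.g. Hangul
    syllables, or jamo if you pre-decompose) into an alphabet file,
    one character per line."""
    chars = set()
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            # strip() drops the newline and surrounding whitespace;
            # for DeepSpeech you typically also add the space character
            # to the alphabet manually
            chars.update(line.strip())
    with open(alphabet_path, "w", encoding="utf-8") as f:
        for c in sorted(chars):
            f.write(c + "\n")
```

Whether to model whole Hangul syllable blocks or decomposed jamo is a design choice: syllables keep the alphabet large but transcripts short, jamo do the opposite.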