Created an Arabic LM, but DeepSpeech is not learning and stops early

tarekeldeeb · July 14, 2018, 1:57pm

Hello,
I created an Arabic language model using:

awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' vocab.txt | sort > alphabets.txt
kenlm/build/bin/lmplz --text vocab.txt --arpa words.arpa -o 4
kenlm/build/bin/build_binary trie -q 16 -b 7 -a 64 words.arpa lm.binary
nat_client.0.1.1/generate_trie alphabets.txt lm.binary vocab.txt words.trie
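
The alphabet step can also be written in Python, which may help when the awk variant at hand does not split multi-byte Arabic characters as expected with an empty `FS`. This is only a sketch of the same idea (collect every distinct character in `vocab.txt`, sorted); the function names `extract_alphabet` and `write_alphabet` are made up for illustration and are not part of DeepSpeech or KenLM.

```python
def extract_alphabet(lines):
    """Return the sorted list of distinct characters found in the text lines."""
    chars = set()
    for line in lines:
        # Drop only the line terminator; spaces inside the line are kept.
        chars.update(line.rstrip("\r\n"))
    return sorted(chars)


def write_alphabet(vocab_path, alphabet_path):
    """Write one character per line, read from a UTF-8 vocabulary file."""
    with open(vocab_path, encoding="utf-8") as src:
        alphabet = extract_alphabet(src)
    with open(alphabet_path, "w", encoding="utf-8") as dst:
        for ch in alphabet:
            dst.write(ch + "\n")
    return alphabet
```

Calling `write_alphabet("vocab.txt", "alphabets.txt")` would produce the same kind of file as the awk pipeline, sorted by Unicode code point.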

Then I imported my WAV files (16 kHz, 16-bit, mono) and created the CSV files as expected. When I start DeepSpeech training, it runs for some epochs (~2 CPU hours) and then triggers an early stop. To test the setup, I assumed that overfitting a single WAV file, used as the train, dev, and test set at once, should show positive results, as it does with ldc93s1. But I still get an early stop with WER=1, loss>100, and garbled output after a short CPU time.
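
A quick way to confirm that each recording really has the stated format (16 kHz, 16-bit, mono) is to read the WAV header with Python's standard `wave` module. This is a minimal sketch; the helper name `wav_format_problems` is hypothetical, and it checks only the header fields, not the audio content.

```python
import wave


def wav_format_problems(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return a list of human-readable mismatches; an empty list means the file is fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append("sample rate is %d Hz, expected %d" % (wav.getframerate(), rate))
        if wav.getsampwidth() != sample_width_bytes:
            problems.append("sample width is %d bytes, expected %d" % (wav.getsampwidth(), sample_width_bytes))
        if wav.getnchannels() != channels:
            problems.append("channel count is %d, expected %d" % (wav.getnchannels(), channels))
    return problems
```

Running it over every path listed in the CSV and printing any non-empty result would flag files that need resampling before training.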

My runner has:

# Disabled for now: --export_dir "$COMPUTE_DATA_DIR/exported.model"
python -u DeepSpeech.py \
  --train_files "$COMPUTE_DATA_DIR/arabic1.csv" \
  --dev_files "$COMPUTE_DATA_DIR/arabic1.csv" \
  --test_files "$COMPUTE_DATA_DIR/arabic1.csv" \
  --alphabet_config_path "$COMPUTE_DATA_DIR/alphabets.txt" \
  --lm_binary_path "$COMPUTE_DATA_DIR/lm.binary" \
  --lm_trie_path "$COMPUTE_DATA_DIR/words.trie" \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --epoch 100 \
  --display_step 1 \
  --validation_step 1 \
  --dropout_rate 0.10 \
  --n_hidden 1024 \
  --default_stddev 0.03125 \
  --learning_rate 0.00001 \
  --checkpoint_dir "$checkpoint_dir" \
  --checkpoint_secs 1800 \
  --summary_secs 1800 \
  "$@"

I tried n_hidden = 2048, 512, and 1024 (with default_stddev = sqrt(2/(2*n_hidden))).
I tried learning rates of 0.0001 and 0.00001.
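
For reference, the rule sqrt(2/(2*n_hidden)) can be evaluated for the three layer sizes mentioned. The sketch below only does that arithmetic; the function name `default_stddev` mirrors the command-line flag but is otherwise an illustrative choice.

```python
import math


def default_stddev(n_hidden):
    """Standard deviation from the rule sqrt(2 / (2 * n_hidden))."""
    return math.sqrt(2.0 / (2.0 * n_hidden))


for n_hidden in (512, 1024, 2048):
    print(n_hidden, default_stddev(n_hidden))
```

For n_hidden = 1024 this gives exactly 0.03125, the value passed to --default_stddev in the runner.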

I manually checked the ARPA file, and it was as expected. It seems that I’m missing a crucial point that’s blocking any learning.
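
Part of that manual check can be automated. An ARPA file starts with a `\data\` header that lists how many n-grams of each order it contains, so a 4-gram model should report counts for orders 1 through 4. The sketch below reads only that header; `arpa_ngram_counts` is a hypothetical helper name, not a KenLM tool.

```python
def arpa_ngram_counts(lines):
    """Parse the \\data\\ header of an ARPA file into {order: count}."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header and line.startswith("ngram "):
            order, count = line[len("ngram "):].split("=")
            counts[int(order)] = int(count)
        elif in_header and line.startswith("\\"):
            # The first n-gram section (for example \1-grams:) ends the header.
            break
    return counts
```

It could be used as `arpa_ngram_counts(open("words.arpa", encoding="utf-8"))`; a missing order or a zero count would point to a problem in the lmplz step.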

tarekeldeeb · July 17, 2018, 4:26pm

Increasing the learning rate fixed the issue.
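
One plausible reading of this fix is that the earlier rates were too small for the loss to move noticeably within the allotted epochs, which an early-stopping check can mistake for a stalled model. The toy below is not DeepSpeech's training loop or its early-stopping logic; it is plain gradient descent on a one-parameter quadratic, with made-up values, just to show how strongly the step size affects progress in a fixed number of steps.

```python
def final_loss(learning_rate, steps=100, start=0.0, target=3.0):
    """Run gradient descent on loss(w) = (w - target)**2 and return the final loss."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - target)
        w -= learning_rate * gradient
    return (w - target) ** 2


# The loss starts at 9.0; compare how far each rate gets in 100 steps.
for lr in (0.00001, 0.0001, 0.01):
    print(lr, final_loss(lr))
```

With the two smallest rates the loss stays close to its starting value, while the largest rate brings it near zero in the same number of steps.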

ashutosh.pednekar · April 20, 2019, 6:43am

Could you please provide your trained model? It would be very helpful.

tarekeldeeb · December 19, 2019, 2:03pm

Parveez_Ali_Masood_Syed · December 22, 2019, 6:26pm

What version of TensorFlow was the model in data/quran trained with? And will it only work with DeepSpeech 0.2.0?

anas9011 · December 23, 2019, 8:46am

@Parveez_Ali_Masood_Syed I was able to get it running on the utf8-ctc-v2 branch of DeepSpeech.

You can see the code we used here and preprocessing tools here.

This is bleeding edge, so expect many changes over the next few days as I debug.

chags · December 23, 2019, 4:01pm

@tarekeldeeb In your README you mention: "The accuracy is very high, but the perfect recording conditions does not match the average user (mobile phone?)". Have you tried adding noise to the samples to re-create how a user would actually use this model?

Great work btw, really looking forward to seeing how this turns out.
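
The noise idea raised above could be tried with something as simple as mixing white Gaussian noise into the clean samples at a chosen signal-to-noise ratio. The sketch below is one way to do that on a list of 16-bit PCM sample values using only the standard library; `add_noise` and its parameters are illustrative names, and real phone recordings contain far more than white noise (reverberation, codec artifacts, background speech).

```python
import math
import random


def add_noise(samples, snr_db, rng=None):
    """Return a copy of 16-bit PCM samples with white Gaussian noise at snr_db."""
    if not samples:
        return []
    rng = rng or random.Random()
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR in dB is 10 * log10(signal_power / noise_power).
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    noisy = []
    for s in samples:
        value = int(round(s + rng.gauss(0.0, sigma)))
        # Clamp to the signed 16-bit range so the result can be written back to a WAV file.
        noisy.append(max(-32768, min(32767, value)))
    return noisy
```

Lower `snr_db` values give noisier audio; passing a seeded `random.Random` makes the augmentation reproducible.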

Parveez_Ali_Masood_Syed · December 25, 2019, 2:06pm

Do you have a link to a pre-trained model that you have created?

tarekeldeeb · December 30, 2019, 12:44pm

Hi … I used to work on 0.3 … now I have migrated to 0.6.
You can reuse my model here:

deep_learning · January 10, 2020, 3:09am

Hello,
I am working on a Korean speech dataset with a specific vocabulary only. Please let me know how I can create a language model for that specific Korean vocabulary.