Training on a Persian dataset does not converge

Did you use transfer learning from Latin-script languages to achieve this WER on the test data?

I think that may help, because many people have used transfer learning from English to Spanish or German and got better results. But I am not sure this can help for Persian too :thinking:

Hi Mohammad,
Would you please share the details of your training, like your parameter values and your loss logs?
I'm doing the same as you did and can't get appropriate results.
Thanks in advance.

And what is the size of the Persian text file used to train the scorer, and what kind of sentences do you use?

I think transfer learning is proper when the alphabets are the same (I mean the Latin alphabet). But for Persian, because the alphabet is different, you can't do transfer learning.


Hey, can you share your steps to make your own scorer file?
I'm unable to get the scorer when I run the command below:

```
!./generate_scorer_package --alphabet /gdrive/My\ Drive/dataset/UrduAlphabet_newscrawl.txt \
  --lm /gdrive/My\ Drive/urdu_lm/lm.binary \
  --package /content/gdrive/My\ Drive/dataset/kenlm.scorer \
  --default_alpha 0.931289039105002 --default_beta 1.1834137581510284 \
  --vocab /content/gdrive/My\ Drive/urdu_lm/vocab-500
```

I got the error below when running it:

```
500000 unique words read from vocabulary file.
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false
Invalid label 0
```
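For what it's worth, "Invalid label" errors from `generate_scorer_package` often come from a mismatch between the alphabet file and the characters actually used in the vocabulary. Here is a minimal sketch for checking coverage before packaging; the function names and the inline example data are mine, not part of DeepSpeech:

```python
# Sketch: verify that every character appearing in the scorer vocabulary is
# listed in the alphabet file. A missing character is one common cause of
# "Invalid label" errors from generate_scorer_package.

def load_alphabet(path):
    """Read one label per line from an alphabet file, skipping '#' comment lines."""
    labels = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            labels.add(line.rstrip("\n"))
    return labels

def missing_characters(alphabet, vocab_lines):
    """Return the set of characters used in vocab_lines but absent from alphabet."""
    used = set()
    for line in vocab_lines:
        for word in line.split():
            used.update(word)
    return used - alphabet

# Example with inline data instead of real files:
alphabet = {"a", "b", "c", " "}
vocab = ["ab ba", "cab", "abd"]   # 'd' is not in the alphabet
print(missing_characters(alphabet, vocab))  # -> {'d'}
```

If this prints a non-empty set for your real files, either add the missing characters to the alphabet or clean them out of the vocabulary text before building the LM.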

Really? Reuben gave you the answer two days ago. Please don't hijack older threads; read what we post:

Dear Mohammad,
Can you share the changes you made to the original DeepSpeech repo to make it work with Arabic/Farsi script?

If there are fixes for training Arabic/Farsi, please send a PR …