Arabic model stuck

lissyx · March 2, 2021, 12:10pm

Can you please share that as plain text here instead of a Google Doc link?

hhiyassat · March 2, 2021, 12:48pm

sure

Epoch	Training loss	Validation loss
0	70.788821	54.815269
1	43.40599	45.615384
2	36.944965	40.852923
3	33.187017	37.502073
4	30.515962	34.640956
5	28.921748	32.462484
6	27.693192	30.43777
7	26.646094	26.959204
8	25.71389	24.63013
9	25.01252	23.623141
10	24.456176	22.395472
11	23.689649	21.885186
12	23.181059	21.200026
13	22.769138	20.290973
14	22.604338	19.716581
15	22.366818	19.167822
16	22.205989	21.117617
17	21.955193	18.876346
18	21.86773	18.313352

lissyx · March 2, 2021, 12:51pm

I’m wondering if your training might not have been stopped a bit too early, looking at the figures.

hhiyassat · March 2, 2021, 1:31pm

I removed the no-early-stop option this time.

hhiyassat · March 2, 2021, 7:19pm

The training finished, but the result is worse than the training with early stopping.
Any suggestions are highly appreciated.

lissyx · March 2, 2021, 7:28pm

Unfortunately, as I stated earlier, it’s likely expected given the data you have.

dan.bmh · March 3, 2021, 8:08am

I would also say that the training above stopped too early.

In my trainings I always used early stopping with es_epochs=7 and es_min_delta=0.1, as well as plateau reduction with epochs=3, I think.

Set the training epochs to your estimated maximum epoch number, something like 50, but not too high, because of the augmentation ranges.
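To make the two early-stopping parameters concrete, here is a minimal sketch of a patience-style check. It is an illustration only, not DeepSpeech's own code: the function name `early_stop_epoch` is hypothetical, and it assumes that an epoch counts as an improvement only if the validation loss drops by more than `es_min_delta` below the best value seen so far. The input is the validation-loss column posted above.

```python
from typing import Optional, Sequence


def early_stop_epoch(val_losses: Sequence[float],
                     es_epochs: int,
                     es_min_delta: float) -> Optional[int]:
    """Return the epoch at which training would stop, or None if it never stops.

    Hypothetical helper: assumes an epoch only counts as an improvement when
    the loss is more than es_min_delta below the best loss seen so far.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without a sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - es_min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= es_epochs:
                return epoch
    return None


# Validation losses for epochs 0-18, copied from the table in this thread.
posted = [
    54.815269, 45.615384, 40.852923, 37.502073, 34.640956, 32.462484,
    30.43777, 26.959204, 24.63013, 23.623141, 22.395472, 21.885186,
    21.200026, 20.290973, 19.716581, 19.167822, 21.117617, 18.876346,
    18.313352,
]

print(early_stop_epoch(posted, es_epochs=7, es_min_delta=0.1))  # prints None
```

Under these assumptions the posted run would not have hit the stopping criterion within 19 epochs: the only epoch without an improvement is epoch 16, and the loss recovers right after it.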

hhiyassat · March 4, 2021, 9:16am

What is your training corpus size in hours?
Mine is about 1200 hours, I guess.

dan.bmh · March 4, 2021, 9:45am

I used about 1500h for German and 700h for Spanish.
So 1200h should be something you can work with, especially if you can later reduce the domain size with a specialized language model.
I would estimate you can reach a WER somewhere between 15% and 20%, depending on how complex understanding Arabic is.

hhiyassat · March 4, 2021, 12:24pm

Thank you, Daniel.
Can you share your training script variables:
number of epochs,
learning rate, etc.?
Thank you.

dan.bmh · March 4, 2021, 3:00pm

You can find everything here:

The exact variables are included in the published checkpoint files (links are at the bottom of the readme)

hhiyassat · March 6, 2021, 8:37pm

Now, if I want to go to a higher number of epochs, say 30,
can I continue training from where it stopped,
or should I start a fresh training with the higher number of epochs?
Someone told me that the Adam optimizer changes as the number of epochs increases.

dan.bmh · March 7, 2021, 9:11am

You can continue it, because the optimizer state should be saved in the checkpoints.
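As a rough illustration of why the saved optimizer state matters, here is a toy scalar version of Adam in plain Python. It is a sketch, not the TensorFlow optimizer that DeepSpeech uses: `adam_step`, `train`, the quadratic toy loss, and the hyperparameter values are all made up for the example. Adam keeps a step counter and two running averages; if those are restored along with the weights, resuming gives the same result as an uninterrupted run.

```python
import math


def adam_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter; returns (new_param, new_state)."""
    t = state["t"] + 1
    m = beta1 * state["m"] + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * state["v"] + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                        # bias correction
    v_hat = v / (1 - beta2 ** t)
    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_param, {"t": t, "m": m, "v": v}


def train(param, state, steps):
    """Minimise the toy loss param**2 for a number of steps."""
    for _ in range(steps):
        grad = 2.0 * param
        param, state = adam_step(param, grad, state)
    return param, state


FRESH = {"t": 0, "m": 0.0, "v": 0.0}

full, _ = train(1.0, dict(FRESH), 10)       # uninterrupted run
half, saved = train(1.0, dict(FRESH), 5)    # "checkpoint" after 5 steps
resumed, _ = train(half, saved, 5)          # weights and optimizer state restored
reset, _ = train(half, dict(FRESH), 5)      # weights restored, optimizer state lost

print(resumed == full)  # True: resuming with the saved state matches the full run
print(reset == full)    # False: a reset optimizer takes different steps
```

So the optimizer does change over time, but that is exactly the part a checkpoint is supposed to carry over.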

hhiyassat · March 8, 2021, 11:39am

Thank you.
I will set the max number of epochs to 30 and see what happens.

Pak · March 8, 2021, 2:11pm

Hello,
Can you share the changes you made to DS to work with Arabic? I am working in Urdu.
Thanks.

hhiyassat · March 8, 2021, 8:03pm

Nothing, it works directly: just generate a new alphabet.txt file for Arabic and that is it.
In your case, you can use transcripts in Urdu, regenerate alphabet.txt for Urdu, and point to its location.
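For anyone wondering what "generate a new alphabet.txt" can look like in practice, here is a minimal sketch that collects every distinct character from the transcripts and writes one per line. The helper names and the sample transcripts are hypothetical, and the one-character-per-line layout is an assumption you should check against the DeepSpeech documentation (for example how comment lines and a literal `#` are handled).

```python
from typing import Iterable, List


def build_alphabet(transcripts: Iterable[str]) -> List[str]:
    """Collect every distinct character used in the transcripts, sorted by code point."""
    chars = set()
    for line in transcripts:
        chars.update(line.rstrip("\r\n"))  # ignore line endings
    return sorted(chars)


def write_alphabet(chars: Iterable[str], path: str) -> None:
    """Write one character per line (assumed alphabet.txt layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")


# Hypothetical sample transcripts; in practice, read them from your training CSVs.
sample = build_alphabet(["مرحبا بك\n", "اهلا"])
print(len(sample))  # 9 distinct characters, including the space
```

The same approach works for Urdu or any other language: the alphabet is whatever characters actually occur in your transcripts, so it helps to normalise the text (diacritics, punctuation, digits) before building it.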

hhiyassat · March 21, 2021, 8:47am

Even when I set the number of epochs to 30
and remove the no early stop,
the problem still exists on DeepSpeech Android TFLite.
To refresh your memory, the problem is this:
We implemented this code:

On the English model it works fine.
When we port our own model, it recognizes the very first utterance and then stops recognizing anything.
When I debug on the machine, I notice that the decoder uses the first audio for recognition and does not use the latest audio spoken.

zhengMa021 · November 22, 2022, 11:43am

Hello, alex. Would it be possible to provide the Arabic DeepSpeech model you are using?
Is it a public model, or did you train it yourself?
If it is public, where can I download it? If it is not public, could you provide an alphabet.txt for Arabic?

I am Chinese, but my uncle is a Muslim, so he wants to extract the Arabic text from videos in which Arabic is spoken. I know how to train, but I am stuck at providing the Arabic alphabet.txt file to DeepSpeech.
Thanks.