Arabic model stuck

I use augmentation during training.
The language model (scorer) consists of the training text only, so it is a very narrow domain: about 50k unique words and 6,500 sentences in total.
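For reference, this is roughly how such a scorer can be built with the DeepSpeech 0.9.x scripts (just a sketch; the file names, the top_k value and the alpha/beta values below are placeholders, not my exact settings):

# build a KenLM language model from the training transcripts
python3 data/lm/generate_lm.py \
  --input_txt training_transcripts.txt \
  --output_dir lm_out/ \
  --top_k 50000 \
  --kenlm_bins /path/to/kenlm/build/bin/ \
  --arpa_order 5 \
  --max_arpa_memory "85%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trie \
  --discount_fallback

# package the LM and vocabulary into a .scorer file for decoding
./generate_scorer_package \
  --alphabet alphabet.txt \
  --lm lm_out/lm.binary \
  --vocab lm_out/vocab-50000.txt \
  --package kenlm.scorer \
  --default_alpha 0.93 \
  --default_beta 1.18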

What is the output without a scorer? This will tell you whether it is a scorer problem.

How much audio material do you use for training? Early stop after 13 epochs doesn’t sound too bad for some material.

The output without the scorer is a string of connected characters, not very accurate.
The training material is about 600 hours.
See this video for how it stops recognizing.

Thanks for the video. Try the same sentences and make longer pauses between words to check whether it has to do with that. General recognition looks ok, but I don’t speak Arabic.

And try to record a problematic sequence and feed it to DS on a desktop/server, and maybe play it via a speaker to the phone, to check whether it is something on-device.

Dear othiele,
thank you for your support. I just started a new training run yesterday, using transfer learning from this checkpoint:
/home/ubuntu/DeepSpeech_latest/EX-HD/deepspeech-0.9.3-checkpoint
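Roughly, the command for such a transfer-learning run looks like this (a sketch only; the CSV paths, alphabet, batch sizes, learning rate, dropout and epoch count below are placeholders, not my actual values):

# transfer learning from the English 0.9.3 checkpoint to Arabic;
# --drop_source_layers 1 re-initialises the output layer for the new alphabet
python3 DeepSpeech.py \
  --train_files data/ar/train.csv \
  --dev_files data/ar/dev.csv \
  --test_files data/ar/test.csv \
  --alphabet_config_path data/ar/alphabet.txt \
  --scorer_path kenlm.scorer \
  --load_checkpoint_dir /home/ubuntu/DeepSpeech_latest/EX-HD/deepspeech-0.9.3-checkpoint \
  --save_checkpoint_dir checkpoints/ar_transfer \
  --drop_source_layers 1 \
  --train_batch_size 32 --dev_batch_size 32 --test_batch_size 32 \
  --learning_rate 0.0001 \
  --dropout_rate 0.25 \
  --epochs 30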

Here is the log so far.
I will keep you updated.


While I'm no Arabic speaker, from this amount of data and your description it seems safe to conclude that you just don't have enough data.

Can you please share that as plain text here, instead of a Google Doc link?

Sure.

Epoch  Training loss  Validation loss
0 70.788821 54.815269
1 43.40599 45.615384
2 36.944965 40.852923
3 33.187017 37.502073
4 30.515962 34.640956
5 28.921748 32.462484
6 27.693192 30.43777
7 26.646094 26.959204
8 25.71389 24.63013
9 25.01252 23.623141
10 24.456176 22.395472
11 23.689649 21.885186
12 23.181059 21.200026
13 22.769138 20.290973
14 22.604338 19.716581
15 22.366818 19.167822
16 22.205989 21.117617
17 21.955193 18.876346
18 21.86773 18.313352

I’m wondering if your training might not have been stopped a bit too early, looking at the figures.

I removed early stopping this time, so there was no early stop.

The training finished, but the result is worse than the training with early stopping.
Any suggestions are highly appreciated.

Unfortunately, as I stated earlier, it’s likely expected given the data you have.

I also would say the above training did stop too early.

In my trainings I always used early stopping, with es_epochs=7 and es_min_delta=0.1, as well as plateau reduction with plateau_epochs=3, I think.

Set the training epochs to your estimated maximum epoch number, something like 50, but not too high, because of the augmentation ranges.
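With the DeepSpeech 0.9.x flags that would look roughly like this; just a sketch, the data and checkpoint paths below are only placeholders, and --plateau_reduction 0.1 is the default value:

# usual training command, plus explicit stopping criteria:
# stop after 7 epochs with less than 0.1 validation-loss improvement,
# and reduce the learning rate after 3 epochs on a plateau
python3 DeepSpeech.py \
  --train_files data/ar/train.csv --dev_files data/ar/dev.csv --test_files data/ar/test.csv \
  --alphabet_config_path data/ar/alphabet.txt \
  --load_checkpoint_dir checkpoints/ar_transfer --save_checkpoint_dir checkpoints/ar_transfer \
  --epochs 50 \
  --early_stop --es_epochs 7 --es_min_delta 0.1 \
  --reduce_lr_on_plateau --plateau_epochs 3 --plateau_reduction 0.1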


What is your training corpus size in terms of hours?
Mine is about 1,200 hours, I guess.

I used about 1500h for German and 700h for Spanish.
So 1200h should be something you can work with, especially if you can later reduce the domain size with a specialized language model.
I would estimate you can reach something between 15-20% WER, depending on how complex understanding Arabic is.


Thank you, daneial.
Can you share your training script variables: number of epochs, learning rate, etc.?
Thank you.

You can find everything here:

The exact variables are included in the published checkpoint files (links are at the bottom of the readme)


Now, if I want to go to a higher number of epochs, say 30,
can I continue training from where it stopped,
or should I start a fresh training with the higher number of epochs?
Someone told me that the Adam optimizer state changes as the number of epochs increases.

You can continue it, because the optimizer state should be saved in the checkpoints.
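Roughly, you just point the training at the same checkpoint directory and raise the epoch limit (the data and checkpoint paths below are placeholders again):

# --checkpoint_dir is used for both loading and saving, so the run resumes
# from the last saved checkpoint; the Adam moment estimates are stored in
# the checkpoint and restored along with the weights
python3 DeepSpeech.py \
  --train_files data/ar/train.csv --dev_files data/ar/dev.csv --test_files data/ar/test.csv \
  --alphabet_config_path data/ar/alphabet.txt \
  --checkpoint_dir checkpoints/ar_transfer \
  --epochs 30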


Thank you.
I will set the max number of epochs to 30 and see what happens.