Arabic model stuck

alexmay23 · December 6, 2020, 12:53pm

As for reproducibility, it is hard to say, since I have personally tested it only with live audio from a microphone. But for me, it failed every time, even with live audio. I guess we need to follow all your recommendations and check everything; once we finish, we will post some results here. Thank you very much for your help.

othiele · December 6, 2020, 12:58pm

Also, record some audio and test DS directly with it. Search this forum; there was some talk of microphone problems lately. Make sure this is a DS problem and not a recording problem, such as a microphone dropping frames.
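
A quick way to rule out a broken recording before blaming DS is to inspect the WAV file itself. The sketch below uses only the Python standard library. The function name `describe_wav` is hypothetical, and the expected format (16 kHz, mono, 16-bit PCM) is what the released English DeepSpeech models use; it is an assumption that a custom model was trained on the same format.

```python
import wave


def describe_wav(path, expected_rate=16000):
    """Return basic properties of a WAV file plus a list of likely problems.

    expected_rate is an assumption: adjust it to whatever the model was trained on.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        declared_frames = wav.getnframes()
        data = wav.readframes(declared_frames)

    # Compare the frame count in the header with what was actually readable.
    actual_frames = len(data) // (channels * sample_width)

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if sample_width != 2:
        problems.append(f"{8 * sample_width}-bit samples, expected 16-bit")
    if actual_frames != declared_frames:
        problems.append(
            f"header declares {declared_frames} frames, only {actual_frames} readable"
        )

    return {
        "rate": rate,
        "channels": channels,
        "sample_width": sample_width,
        "duration_s": actual_frames / rate,
        "problems": problems,
    }
```

If `problems` is empty and the duration matches what was spoken, the recording path is probably fine and the next suspect is the model or the scorer.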

hhiyassat · February 28, 2021, 6:54pm

Hi othiele,
I have run the training with a dropout of 0.35, and the training and evaluation went in a similar way.
The training stopped at epoch 13 (because the early stop flag is not used).
Then I exported the TFLite model and ported it to the Android example.

When using the Android app with the English pretrained model, the app was running smoothly and the recognition was excellent.
But when I use my trained Arabic model, the app recognizes the first utterance and half of the second one perfectly, and then nothing more is shown in the app.
Please note that we used the same app and only switched the model between English and Arabic,
which means that the error comes from our Arabic model, not from the VAD or the app itself.

othiele · February 28, 2021, 7:08pm

What about the language model and alphabet?
Your material might not be that good. And maybe without augmentation?

Pak · March 1, 2021, 6:11am

Which corpus do you use? Thank you.

hhiyassat · March 1, 2021, 8:24am

I use augmentation in training.
The language model (scorer) consists only of the training text itself. It is a very narrow domain: about 50k unique words and 6,500 sentences in total.

othiele · March 1, 2021, 8:36am

What is the output without a scorer? This will tell you whether it is a scorer problem.
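
A minimal sketch of that comparison with the `deepspeech` Python package (0.9.x) could look like the following. `Model`, `stt` and `enableExternalScorer` are part of that package; the function names `compare_scorer_effect` and `main` and all file paths are hypothetical. The sketch assumes the model object is freshly loaded, so no scorer is attached for the first pass.

```python
def compare_scorer_effect(model, audio, scorer_path):
    """Transcribe the same audio without and then with an external scorer.

    `model` is expected to behave like deepspeech.Model and to have no
    scorer attached yet; `audio` is a 16-bit PCM buffer (numpy int16 array).
    """
    raw = model.stt(audio)  # acoustic model only
    model.enableExternalScorer(scorer_path)
    scored = model.stt(audio)  # acoustic model plus scorer
    return raw, scored


def main(model_path, wav_path, scorer_path):
    # Imports are local so the helper above can be used without these packages.
    import wave

    import numpy as np
    from deepspeech import Model

    ds = Model(model_path)
    with wave.open(wav_path, "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    raw, scored = compare_scorer_effect(ds, audio, scorer_path)
    print("no scorer:  ", raw)
    print("with scorer:", scored)
```

If the output without the scorer keeps going where the scored output stops, the scorer is the likely culprit; if both stop at the same point, the acoustic model or the audio path is more suspect.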

How much audio material do you use for training? Early stop after 13 epochs doesn’t sound too bad for some material.

hhiyassat · March 1, 2021, 9:38am

The output without a scorer is a run of connected characters, not very accurate.
The training material is about 600 hours.
See this video for how it stops recognizing.

othiele · March 1, 2021, 9:49am

Thanks for the video. Try the same sentences and make longer pauses between words to check whether it has to do with that. General recognition looks ok, but I don’t speak Arabic.

And try to record a problematic sequence and feed it to DS on a desktop/server, and maybe play it via a speaker to the phone, to check whether it is something on the device.

hhiyassat · March 2, 2021, 12:03pm

Dear othiele,
thank you for your support.
I just started a new training yesterday, using transfer learning from this model:
/home/ubuntu/DeepSpeech_latest/EX-HD/deepspeech-0.9.3-checkpoint

Here is the log so far.
I will keep you updated.

lissyx · March 2, 2021, 12:10pm

With this amount of data, and while I’m no Arabic speaker, your description seems to be enough to conclude that you just don’t have enough data.

lissyx · March 2, 2021, 12:10pm

Can you please share that as plain text here, instead of a Google Doc link?

hhiyassat · March 2, 2021, 12:48pm

sure

Epoch	Training loss	Validation loss
0	70.788821	54.815269
1	43.40599	45.615384
2	36.944965	40.852923
3	33.187017	37.502073
4	30.515962	34.640956
5	28.921748	32.462484
6	27.693192	30.43777
7	26.646094	26.959204
8	25.71389	24.63013
9	25.01252	23.623141
10	24.456176	22.395472
11	23.689649	21.885186
12	23.181059	21.200026
13	22.769138	20.290973
14	22.604338	19.716581
15	22.366818	19.167822
16	22.205989	21.117617
17	21.955193	18.876346
18	21.86773	18.313352

lissyx · March 2, 2021, 12:51pm

I’m wondering if your training might not have been stopped a bit too early, looking at the figures.
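
One rough way to read the figures is to check where the best validation loss sits and how fast it was still falling at the end of the log. The sketch below uses the validation losses posted above; the helper name `recent_improvement` and the five-epoch window are arbitrary choices for illustration, not anything DeepSpeech computes.

```python
# Validation loss per epoch (epochs 0..18), copied from the log in this thread.
val_loss = [
    54.815269, 45.615384, 40.852923, 37.502073, 34.640956,
    32.462484, 30.43777, 26.959204, 24.63013, 23.623141,
    22.395472, 21.885186, 21.200026, 20.290973, 19.716581,
    19.167822, 21.117617, 18.876346, 18.313352,
]


def recent_improvement(losses, window=5):
    """Average per-epoch drop in loss over the last `window` epochs."""
    if len(losses) <= window:
        raise ValueError("need more epochs than the window size")
    return (losses[-window - 1] - losses[-1]) / window


best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
print("best epoch:", best_epoch)
print("avg drop per epoch over last 5:", round(recent_improvement(val_loss), 3))
```

In this log the lowest validation loss is the last epoch, and the loss was still dropping by roughly 0.4 per epoch over the final five epochs, which is consistent with the suspicion that the run had not converged yet.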

hhiyassat · March 2, 2021, 1:31pm

I removed the "no early stop" setting this time.

hhiyassat · March 2, 2021, 7:19pm

The training finished, but the result is worse than the training with early stop.
Any suggestions are highly appreciated.

lissyx · March 2, 2021, 7:28pm

Unfortunately, as I stated earlier, it’s likely expected given the data you have.

dan.bmh · March 3, 2021, 8:08am

I would also say the above training stopped too early.

In my trainings I always used early stopping, with es_epochs=7 and es_min_delta=0.1, as well as plateau reduction with epochs=3, I think.

Set training epochs to your estimated maximal epoch number, something like 50, but not too high, because of the augmentation ranges.
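
To get a feel for what those two values do, here is a generic patience-based early-stopping rule applied to the last validation losses posted earlier in the thread. This is a sketch of the general technique, not DeepSpeech's own implementation; mapping `patience` to es_epochs and `min_delta` to es_min_delta is an assumption, and `early_stop_epoch` is a hypothetical name.

```python
def early_stop_epoch(val_losses, patience=7, min_delta=0.1, first_epoch=0):
    """Return the epoch at which training would stop, or None if it never does.

    An epoch counts as an improvement only if its loss beats the best loss
    seen so far by more than min_delta. Training stops after `patience`
    consecutive epochs without such an improvement.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=first_epoch):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Validation losses for epochs 13..18 from the log in this thread.
tail = [20.290973, 19.716581, 19.167822, 21.117617, 18.876346, 18.313352]

print(early_stop_epoch(tail, patience=7, min_delta=0.1, first_epoch=13))
print(early_stop_epoch(tail, patience=1, min_delta=0.1, first_epoch=13))
```

With a patience of 7 the single bad epoch (16) does not end the run, while a patience of 1 would stop right there, just before the loss drops again in epochs 17 and 18.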

hhiyassat · March 4, 2021, 9:16am

What is your training corpus size in hours?
Mine is about 1200 hours, I guess.

dan.bmh · March 4, 2021, 9:45am

I used about 1500h for German and 700h for Spanish.
So 1200h should be something you can work with, especially if you can later reduce the domain size with a specialized language model.
I would estimate you can reach a WER somewhere between 15% and 20%, depending on how complex understanding Arabic is.
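
For reference, WER (word error rate) is the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. A minimal sketch of the standard definition, with a hypothetical function name `wer` and whitespace tokenization as a simplifying assumption:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 15-20% then means roughly one word in five to seven is wrong, measured on a held-out test set rather than on the training text.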
