Training model loss

lissyx · March 13, 2020, 12:56pm

Please read up on machine learning.

Akmal_Nodirov · March 13, 2020, 12:58pm

What do you mean by dataset? My audio and text files? I will give you any dataset, but I don’t get what you mean by datasets.

lissyx · March 13, 2020, 12:57pm

Yes. Also, why don’t you provide a dev set? You can’t train seriously without one.

Akmal_Nodirov · March 13, 2020, 1:04pm

https://drive.google.com/open?id=1BDWNbsqSnrYw1i3MqfO4r0Is312DgfWu Here is the dataset CSV
https://drive.google.com/open?id=1X5wHUui0BPDRaxnOOAMtOTn5t3UFfUBv Here are my audio files

Akmal_Nodirov · March 13, 2020, 1:17pm

What parameters are needed, and in what proportions? epochs, n_hidden, learning_rate, drop_out_rate? Where can I find information about these parameters?

lissyx · March 13, 2020, 1:42pm

You misunderstood me. I don’t have the time to examine your dataset. Please explain it.

Have you read the documentation? These parameters are documented there. You also really need to get some experience with machine learning.

lissyx · March 13, 2020, 1:43pm

@Akmal_Nodirov This is really getting tiring now, though. You have not explained why you don’t have a dev set in your training.

Akmal_Nodirov · March 13, 2020, 2:18pm

Brother, I don’t really understand you or how to explain my datasets, but I’ll try:
I have 93 phrases, not single words. Phrases like “Hello this is me”, “My name is someone” and so on. I gave 10 of them to the test set and another 10 to the dev set, leaving 73 for training, like:
--train_batch_size 73
--test_batch_size 10
--dev_batch_size 10
--n_hidden 100
--epochs 200

And I have a .wav audio file for every phrase.

Is this what you’re asking about the datasets?
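
For illustration only, a 73/10/10 split like the one described above could be produced from a single CSV with a short script. This is a minimal sketch, not something posted in the thread: the function names `split_rows` and `write_csv` and the fixed seed are hypothetical, and the example rows assume the DeepSpeech CSV columns `wav_filename`, `wav_filesize` and `transcript`. In DeepSpeech the split itself is defined by the CSVs passed to `--train_files`, `--dev_files` and `--test_files`; the `*_batch_size` flags only set how many samples are processed per step.

```python
import csv
import random


def split_rows(rows, n_dev, n_test, seed=0):
    """Shuffle rows and return (train, dev, test) lists with no overlap."""
    rows = list(rows)
    if n_dev + n_test >= len(rows):
        raise ValueError("dev + test must leave at least one row for training")
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    test = rows[:n_test]
    dev = rows[n_test:n_test + n_dev]
    train = rows[n_test + n_dev:]
    return train, dev, test


def write_csv(path, rows, fieldnames):
    """Write one split to its own CSV file, header included."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Stand-in for the 93 rows of the real CSV (file names and text are made up).
example_rows = [
    {"wav_filename": f"clip_{i:03d}.wav", "wav_filesize": 0, "transcript": "..."}
    for i in range(93)
]
train, dev, test = split_rows(example_rows, n_dev=10, n_test=10)
print(len(train), len(dev), len(test))  # 73 10 10
```

Each list could then be written with `write_csv` to its own file, for example `train.csv`, `dev.csv` and `test.csv` (hypothetical names), and those files passed to the three `--*_files` flags.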

lissyx · March 13, 2020, 2:18pm

Ok, so very very small, hand-crafted dataset. Now we progress. How much audio does that make, in time? One hour?
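
One way to answer that question is to sum the length of every WAV file in the dataset. The following is a minimal sketch using only the Python standard library; the function names and the directory name in the usage note are hypothetical, and it assumes uncompressed PCM WAV files, which is what the `wave` module reads.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_seconds(directory):
    """Sum the length of every *.wav file directly inside `directory`."""
    return sum(
        wav_duration_seconds(path)
        for path in sorted(Path(directory).glob("*.wav"))
    )
```

Calling `total_duration_seconds("audios") / 60` would give the total in minutes, where `audios` stands in for wherever the clips are stored.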

lissyx · March 13, 2020, 2:18pm

What kind of use-case do you target ?

Akmal_Nodirov · March 13, 2020, 2:21pm

the audios are 10 minutes long. All audios are 10 minutes

lissyx · March 13, 2020, 2:21pm

Ok, so you have roughly 15 hours ? 10*93 makes 930 mins.

Akmal_Nodirov · March 13, 2020, 2:22pm

No, the 93 phrases last 10 minutes in total)

lissyx · March 13, 2020, 2:23pm

You can’t expect to train anything with just 10 minutes. Again, what use-case do you target ?

Akmal_Nodirov · March 13, 2020, 2:26pm

I’m Uzbek, and there isn’t any good Uzbek STT, so I’m going to develop one) I mean my company and my city can use it once it is developed. No commercial users for now.

lissyx · March 13, 2020, 2:25pm

You need more than 10 minutes for generic-purpose STT. Are you contributing to Common Voice for Uzbek? That’s the best course of action at the moment.

Once you get a few dozen hours, you can try to start building something with transfer learning.

Akmal_Nodirov · March 13, 2020, 2:28pm

Ok thank you brother, i will try

Akmal_Nodirov · March 13, 2020, 2:35pm

One question: can I repeat phrases? For example, the 93 phrases above are only 13 different phrases. I have just repeated them. For example, I have this phrase:
“Plastik yoqolsa bankka borib ochtiring”. This phrase is repeated 7 times, with 7 different audio recordings at different tempos.

lissyx · March 13, 2020, 2:56pm

We don’t have enough feedback yet on the behavior of DeepSpeech on that; it might depend on your dataset as well as your language. Repeated sentences (i.e. the same sentence spoken by different people, so close to your case) seem to improve the model a bit. But we are far from having a definitive answer on that, so I would suggest checking that cautiously.

othiele · March 13, 2020, 9:09pm

@Akmal_Nodirov A good way to start is to retrain something that is well known. Have you seen this video?

As for the number of samples: typically, you will need about 10,000 samples of 4-8 seconds each to get a reasonably good result for general language understanding.
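
As a rough sanity check on that figure, the sample count can be converted into hours of audio. This is only a back-of-the-envelope sketch; the helper name `total_hours` is made up for the example.

```python
def total_hours(n_samples, seconds_per_sample):
    """Convert a number of equally long clips into hours of audio."""
    return n_samples * seconds_per_sample / 3600


low = total_hours(10_000, 4)   # every clip at the short end of the range
high = total_hours(10_000, 8)  # every clip at the long end of the range
print(f"{low:.1f} to {high:.1f} hours")  # 11.1 to 22.2 hours
```

So 10,000 clips of 4-8 seconds come to roughly 11 to 22 hours, compared with the 10 minutes recorded so far.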

Maybe you can get them here?