I am using DeepSpeech version 0.7.4 and training the model by leveraging the existing checkpoint. I have gone through a couple of issues on Discourse which say that if we are training the model on top of an existing checkpoint, the dataset should be large.
But if I don't have a huge dataset in hand and I still want to make use of the existing checkpoint so that I get good results out of it, what are the options available to me?
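For context, this is roughly the kind of command I am running to continue from the released checkpoint; the paths, epochs and learning rate below are placeholders rather than my exact values:

```
# Sketch: continue training from the released 0.7.4 checkpoint
# (n_hidden has to match the release model geometry, i.e. 2048).
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir /path/to/deepspeech-0.7.4-checkpoint \
  --train_files my-train.csv \
  --dev_files my-dev.csv \
  --test_files my-test.csv \
  --epochs 3 \
  --learning_rate 0.0001
```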
Thank you in advance!
lissyx
No, fine-tuning / transfer-learning aims at helping low-resource languages. You need a large amount of data when training from scratch.
I know @lissyx, the amount of data is really very small for training, which should not be the case in a real setup.
But my query is: given the 0.7.4 checkpoint, let's say I train using 2 audio files, what will be the impact?
What will the model that I get after training with my custom data (which in this case is really very little) end up containing?
Will the new checkpoint combine what was already learned in the existing checkpoint with my new data, or
will it reflect just the new data?
Because the observation is: if I train with that small amount of data and test on the same set, the transcription comes out perfect, but when I test with other data that the default model was previously transcribing fine, it now gives wrong transcriptions.
lissyx
Sorry, but I can't answer; you have not confirmed how much audio that is.
Please, this is explained in the docs on fine-tuning and transfer-learning.
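Roughly, the transfer-learning part of those docs shows an invocation along these lines; the paths below are placeholders, and --drop_source_layers / --alphabet_config_path only matter if your alphabet differs from the release model:

```
# Transfer-learning sketch: reuse the released checkpoint but drop and
# re-initialise the last layer(s), loading from one directory and saving
# the new checkpoints into another.
python3 DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path my-alphabet.txt \
  --load_checkpoint_dir /path/to/released/checkpoint \
  --save_checkpoint_dir /path/to/new/checkpoint \
  --train_files my-train.csv \
  --dev_files my-dev.csv \
  --test_files my-test.csv
```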
@lissyx So here are the results that I have in hand, comparing the 0.7.4 model and my custom model (made by fine-tuning the 0.7.4 checkpoint with 2 audio files):
Actual transcript: warranty | Default model: what in de | Custom model: won ta
Actual transcript: scheduled and appointment | Default model: she daund pointment | Custom model: thank wove towne findmen
Actual transcript: how much is it | Default model: howet is it | Custom model: holde of
Actual transcript: what is the cost | Default model: he oti ticonst | Custom model: thyou foelnking work o sea colgs
Actual transcript: sync | Default model: sing | Custom model: thank for
My belief was that if I am fine-tuning, at least what the default model was already transcribing should not be overridden. But seeing these results, it seems to be the opposite.
Could you please help me understand ?
lissyx
No:
- you don't have enough data, for sure
- you don't share any of your training information, so I have no idea what you are doing
I’m sorry but I can’t keep asking and asking, I have actual work to do.
What information are you asking for, @lissyx? You asked for the amount of audio, and I replied saying 150 KB. Am I missing something you asked for?
Sorry about that, I understand, but I think I shared the information you asked for? Please let me know what other information you need.
Just to clarify, training is done with tens or hundreds of thousands of files of 5-10 seconds each. For fine-tuning you should use thousands or tens of thousands of files, not just 2-3.
Fine-tuning will do nothing with just a couple of seconds of audio; please collect material first, then train more.
I totally agree with your point @othiele, but my only question here is to understand: if fine-tuning with 2-3 files doesn't do anything, will it deteriorate the existing 0.7.4 checkpoint (if I use the same directory for the checkpoint_dir flag), or will it add something (maybe a little) to it?
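To make my setup concrete: as a precaution I could keep the load and save directories separate, roughly like the sketch below (paths, epochs and learning rate are placeholders), so that the released 0.7.4 checkpoint files are at least never written to; my question is whether pointing --checkpoint_dir at a single shared directory instead would spoil them.

```
# Sketch: load the released 0.7.4 checkpoint read-only and write the
# fine-tuned checkpoints into a separate directory (paths are placeholders).
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --load_checkpoint_dir /path/to/deepspeech-0.7.4-checkpoint \
  --save_checkpoint_dir /path/to/my-finetuned-checkpoint \
  --train_files my-train.csv \
  --dev_files my-dev.csv \
  --test_files my-test.csv \
  --epochs 3 \
  --learning_rate 0.0001
```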
lissyx
As I said, since it's not something useful, we don't have feedback on it, because we don't do that.