Train using an existing checkpoint, but with a smaller dataset


I am using DeepSpeech 0.7.4 and training the model on top of the existing checkpoint. I have read a couple of threads on Discourse saying that when training on top of an existing checkpoint, the dataset should be large.

But if I don't have a huge dataset at hand and still want to make use of the existing checkpoint to get good results, what options are available to me?

Thank you in advance!

No, fine-tuning / transfer-learning aims at helping for low-volume languages. You need large data when training from scratch.

@lissyx OK, understood. But we have seen results that are way off after fine-tuning with, say, 1 or 2 audio files on top of the existing checkpoint.

Also, I was reading threads on Discourse where it is mentioned that the epoch value should be negative when fine-tuning on top of an existing checkpoint.

That was for an older DeepSpeech version. Is that statement still valid for 0.7.4?

I'm not sure I get your point here: one or two audio files? How much content is that?

You are referring to old threads; this has not been valid for a long time.

I know, @lissyx, that is very little data for training, which would not be the case in a real setup.

But my question is: given the 0.7.4 checkpoint, let's say I train using 2 audio files. What will the impact be?

What effect will training with my custom data (which in this case is very little) have on the resulting model?

  1. The model will be a combination of the existing checkpoint and the new training data, or
  2. It will reflect only the new training data.

I ask because of this observation: if I train with a small amount of data and test on the same set, the transcription comes out perfect, but when I test on other data that was previously transcribed fine, it now gives wrong transcriptions.

Sorry, but I can’t answer, you have not confirmed the amount of audio that is.

Please, this is explained in the docs on fine-tuning and transfer-learning

The audio files are nearly 150 KB.
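Since the question was about the amount of audio rather than the file size, here is a rough sketch of how a size in bytes could be turned into a duration. It assumes uncompressed PCM WAV at 16 kHz, 16-bit, mono, with a 44-byte header, which is what the released English model expects but may not match the actual files. The function names are made up for illustration.

```python
import wave


def estimate_pcm_wav_seconds(size_bytes, sample_rate=16000,
                             sample_width_bytes=2, channels=1,
                             header_bytes=44):
    """Estimate duration from file size alone (assumes plain PCM WAV)."""
    payload = max(size_bytes - header_bytes, 0)
    return payload / (sample_rate * sample_width_bytes * channels)


def wav_seconds(path):
    """Read the real duration from the WAV header instead of guessing."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# 150 KB, read as 150,000 bytes (assumption: the post may mean 150 * 1024)
seconds = estimate_pcm_wav_seconds(150_000)
print(f"about {seconds:.1f} seconds of audio per file")
```

Under those assumptions a 150 KB file holds only a few seconds of speech, so `wav_seconds` on the real files is the more reliable number to report.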

@lissyx Here are the results I have in hand, comparing the 0.7.4 model and my custom model (made by fine-tuning the 0.7.4 checkpoint with 2 audio files):

Actual transcript: warranty
Default model: what in de
Custom model: won ta

Actual transcript: scheduled and appointment
Default model: she daund pointment
Custom model: thank wove towne findmen

Actual transcript: how much is it
Default model: howet is it
Custom model: holde of

Actual transcript: what is the cost
Default model: he oti ticonst
Custom model: thyou foelnking work o sea colgs

Actual transcript: sync
Default model: sing
Custom model: thank for
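A comparison like the one above can be put on a single scale with a word error rate (WER). The following is a minimal sketch, not DeepSpeech's own evaluation code: it computes a word-level edit distance over the five utterances listed and divides by the number of reference words. The helper names are made up for illustration.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words


references = ["warranty", "scheduled and appointment", "how much is it",
              "what is the cost", "sync"]
default_hyps = ["what in de", "she daund pointment", "howet is it",
                "he oti ticonst", "sing"]
custom_hyps = ["won ta", "thank wove towne findmen", "holde of",
               "thyou foelnking work o sea colgs", "thank for"]

default_wer = corpus_wer(list(zip(references, default_hyps)))
custom_wer = corpus_wer(list(zip(references, custom_hyps)))
print(f"default WER: {default_wer:.2f}, custom WER: {custom_wer:.2f}")
```

With only five short utterances the figure is noisy; running the same comparison on a larger held-out set that was not used for training would give a more meaningful picture.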

My belief is that if I am fine-tuning, at least what the default model produced should not be overridden. But from these results, the opposite seems to be true.

Could you please help me understand ?


  • you don’t have enough data for sure
  • you don’t share any of your training info, so I have no idea what you are doing

I’m sorry but I can’t keep asking and asking, I have actual work to do.

I understand that I have very little data.

What information are you asking for, @lissyx? You asked for the amount of audio, and I replied that it is 150 KB. Am I missing something you asked for?

Sorry about that. I understand, but I think I shared the information you asked for. Please let me know what other information you need.

The command used for training:

python --noshow_progressbar --noearly_stop --alphabet_config_path "./data/alphabet.txt" --load_train "best" --train_files <train_file> --train_batch_size 1 --dev_files <dev_file> --dev_batch_size 1 --test_files <test_file> --test_batch_size 1 --save_checkpoint_dir '/directory1' --load_checkpoint_dir '/directory1' --scorer_path '' --n_hidden 2048 --epochs 250

Below are the specifications:

  • Training or Inference - Training
  • DeepSpeech branch/version - 0.7.4
  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04) - Linux - Ubuntu 18.04
  • Python version - 3.6.9
  • TensorFlow version - tensorflow-gpu==1.15.2

Please let me know if any other information is needed; I will be happy to share!
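For comparison, a more conservative variant of that command might look like the sketch below. It is only an assumption about what a gentler setup could be, not a recommendation from this thread: it loads from one directory and saves to a different one so the released checkpoint files are left untouched, and it uses far fewer epochs and a lower learning rate than 250 epochs at the default rate. All paths are hypothetical, and the epoch and learning-rate values are illustrative only.

```python
import shlex


def build_finetune_command(load_dir, save_dir, train_csv, dev_csv, test_csv,
                           alphabet="./data/alphabet.txt",
                           epochs=3, learning_rate=0.0001):
    """Build an argument list for DeepSpeech.py (0.7.x style flags)."""
    if load_dir == save_dir:
        # Saving into the load directory would mix new checkpoints
        # with the released one.
        raise ValueError("use a separate save directory")
    return [
        "python", "DeepSpeech.py",
        "--n_hidden", "2048",
        "--alphabet_config_path", alphabet,
        "--load_checkpoint_dir", load_dir,
        "--save_checkpoint_dir", save_dir,
        "--load_train", "best",
        "--train_files", train_csv,
        "--dev_files", dev_csv,
        "--test_files", test_csv,
        "--epochs", str(epochs),
        "--learning_rate", str(learning_rate),
    ]


# Hypothetical paths, for illustration only
cmd = build_finetune_command(
    load_dir="/checkpoints/deepspeech-0.7.4",
    save_dir="/checkpoints/my-finetune",
    train_csv="train.csv", dev_csv="dev.csv", test_csv="test.csv",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the two directories apart makes it possible to go back to the original checkpoint and compare both models on the same test set.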

Just to clarify, training is done with tens or hundreds of thousands of files of 5-10 seconds. For fine tuning you should use thousands or tens of thousands, not just 2-3.

Fine tuning will do nothing for just a couple of seconds, please collect material first, then train more.

I totally agree with your point, @othiele, but my only question is this: if fine-tuning with 2-3 files doesn’t do anything useful, will it degrade the existing 0.7.4 checkpoint (if I use the same directory for the checkpoint_dir flags), or will it add something (maybe a little) to it?

As I said, since it’s not something useful, we don’t have feedback on that because we don’t do that.

Sure, I understand your point, @lissyx. Thank you so much for your help.