Training/fine-tuning DeepSpeech branch/version - 0.7.0 on Linux

@reuben as a question on this note:

When fine-tuning (and training with a custom training set), would it make sense to just use the LibriSpeech validation set, or would it be better to go with the validation set that was split from the custom dataset?

It only makes sense to use your own data; otherwise you'll be fine-tuning blindly, with no way to track how the model is evolving w.r.t. your data.

@othiele @reuben
I am working on training/fine-tuning DeepSpeech branch/version 0.7.0 on Linux Ubuntu 16.04 with Python 3.6.5, TensorFlow 1.15.2, and CUDA 10.0/cuDNN 7.6.5.
I have a 2070-hour dataset whose transcriptions are about 90-95% accurate (the ums, uhs, repeated words, false starts, and stutters are not accounted for, and occasionally there is less or more text in the transcript than in the wav file itself). I initially split this into 2000 hr train, 35 hr validation, and 35 hr test sets, but later re-split it into 2060 hr train, 5 hr validation, and 5 hr test sets after I manually fixed the transcriptions of the 5 hr validation set and made sure they were 99+% accurate.

My question is twofold:

  1. Are there any suggested ways to fix the missing transcriptions automatically? I saw DSAlign and other forced-alignment tools, but have not spent the time to get them to work. Is that the right direction here, or is it better to fix the data manually even if it is slow? What has worked best in your experience?

  2. What is the suggested train/validation/test split here? I am assuming my current 5-hour validation set is too small (I chose 5 hours to clean up manually mainly because the released model was trained on 3817 hours with a 5-hour clean validation set). Is the 5-hour pick too small, or is it sufficient? Or would it be better to use 2-3% each for the validation and test sets and the rest for training? I appreciate your thoughts/suggestions on this.
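On question 1, short of a full forced-alignment pass with DSAlign, one cheap first filter (an assumption of mine, not something the DeepSpeech docs prescribe) is to flag clips whose characters-per-second rate is implausible; this catches many cases where the transcript is much shorter or longer than the audio. A minimal sketch, where the 5-25 chars/sec bounds and the `(path, duration, transcript)` tuple layout are placeholder choices to tune per corpus:

```python
import wave

def wav_duration(path):
    """Length of a wav file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def flag_suspicious(samples, low=5.0, high=25.0):
    """Return paths whose speaking rate (transcript chars / second) looks implausible.

    samples: list of (path, duration_seconds, transcript) tuples.
    The 5-25 chars/sec bounds are rough guesses; tune them per corpus.
    """
    flagged = []
    for path, duration, text in samples:
        rate = len(text) / duration if duration > 0 else float("inf")
        if not (low <= rate <= high):
            flagged.append(path)
    return flagged
```

Clips this flags would still need a manual look (or a DSAlign pass), but it narrows the search to the worst mismatches instead of listening to all 2070 hours.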
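On question 2, a simple way to experiment with different split sizes is a seeded shuffle-and-slice over the sample list, so the split is reproducible while you compare fractions. A rough sketch; the 2.5% dev/test fractions are purely illustrative (roughly 50 h each on a 2070 h corpus), not a recommendation:

```python
import random

def split_dataset(samples, dev_frac=0.025, test_frac=0.025, seed=42):
    """Shuffle with a fixed seed and slice into train/dev/test.

    The fractions here are illustrative placeholders; the fixed seed
    keeps the split reproducible across runs.
    """
    rng = random.Random(seed)
    items = list(samples)
    rng.shuffle(items)
    n = len(items)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]
    return train, dev, test
```

Whatever fractions you settle on, the key point from earlier in the thread stands: the dev set should come from your own data, since that is what you are tracking the model against.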


  1. Please stop hijacking an older post about a rising loss to ask a question about the train/dev/test split.

  2. Search for your question before asking and you'll find great answers.

Hello, I have a question about datasets: I have 10 different words, spoken by 10 different people, almost 400 recordings of those 10 words in total. How can I properly separate the train, dev, and test files? And I don't know exactly whether I can use it like this, I mean the same words spoken by different people? Thank you.
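Since every speaker here records the same 10 words, one common approach for small keyword corpora (a general practice, not something specific to DeepSpeech) is to hold out whole speakers for dev and test, so the model is evaluated on voices it never saw in training. A hedged sketch, with the one-speaker dev/test counts as placeholder choices:

```python
import random
from collections import defaultdict

def split_by_speaker(samples, dev_speakers=1, test_speakers=1, seed=0):
    """Assign whole speakers to dev/test so no voice leaks into training.

    samples: list of (speaker_id, clip) pairs. The speaker counts are
    placeholder choices; with only 10 speakers, results will be noisy.
    """
    by_speaker = defaultdict(list)
    for spk, clip in samples:
        by_speaker[spk].append(clip)
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    dev_ids = speakers[:dev_speakers]
    test_ids = speakers[dev_speakers:dev_speakers + test_speakers]
    train_ids = speakers[dev_speakers + test_speakers:]
    dev = [c for s in dev_ids for c in by_speaker[s]]
    test = [c for s in test_ids for c in by_speaker[s]]
    train = [c for s in train_ids for c in by_speaker[s]]
    return train, dev, test
```

The alternative, splitting clips randomly regardless of speaker, measures something weaker: how well the model recognises words from voices it has already heard.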

@A_N, @Akmal_Nodirov, please learn how to post in forums. I just said "don't hijack old threads" and you do exactly that. Post it in a new thread with a good headline and we can discuss it. And do your research first:

Here, I didn't understand the part about test data. Please clarify your answer: can we get test data from our train data?

(1) search for test/dev split in this forum
(2) open a new thread/ticket/whatever you call it with a descriptive title
(3) write down what you have learned while searching
(4) ask for what is still unknown

Do not post further in this thread.

What do you think I am doing now?

I’ve split this into its own topic. Please do not hijack threads with unrelated questions.
