Forced alignment and train data quality

Hi,

First of all, big thanks to the DeepSpeech team. Amazing work.

I am working on creating a dataset of Polish speech. I am using the aeneas forced-alignment tool to get alignments from audiobooks (public domain, of course). I can't afford to manually check and fine-tune all of the produced alignments. From random sampling of my dataset, I estimate there are about 1% really bad alignments (the audio doesn't match the transcript at all) and 12% almost-perfect alignments (for example, the last word of the transcript is slightly cut off); the rest are good quality (the audio matches the transcript 100%).

My question is: can I just ignore the bad-quality utterances, feed them to DeepSpeech along with the rest of the dataset, and hope that the small proportion of bad samples won't disturb overall training convergence?
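One cheap sanity check I'm considering, instead of full manual review, is flagging utterances whose speaking rate is implausible, as a rough proxy for bad alignments. This is just a sketch; the function name and the characters-per-second thresholds are my own assumptions, not anything from aeneas or DeepSpeech:

```python
# Hypothetical alignment QC heuristic: an utterance whose transcript length
# doesn't roughly match its audio duration is likely a bad alignment.
# The cps thresholds below are guesses and would need tuning per language.

def plausible_rate(transcript: str, duration_s: float,
                   min_cps: float = 4.0, max_cps: float = 25.0) -> bool:
    """Return True if characters-per-second falls in a sane range."""
    if duration_s <= 0:
        return False
    cps = len(transcript) / duration_s
    return min_cps <= cps <= max_cps

# (transcript, audio duration in seconds)
samples = [
    ("dzien dobry wszystkim", 1.5),  # ~14 cps: plausible
    ("tak", 6.0),                    # ~0.5 cps: probably a bad alignment
]
kept = [s for s in samples if plausible_rate(*s)]
```

It obviously won't catch every mismatch (a wrong transcript of similar length passes), but it's nearly free to run over the whole corpus.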

I've experienced something like that. I'll explain what I did: in a Spanish training dataset of around 350 hours, 30 of them weren't perfect (in terms of alignment), so that's less than 9% bad-quality data. I then trained on that dataset (n_hidden = 2048, dropout 0.2) and got around 10% CER, and a few samples that were incorrectly labeled in the test set were correctly transcribed by my model!

So I think you should give it a shot, and while it's training, keep cleaning up your dataset. Just remember that there's no perfect dataset; it's impossible to check thousands of speech samples by hand.
