Problem in creating DeepSpeech training data

Hi,

I am trying to create a voice corpus for DeepSpeech training. I have downloaded YouTube videos and their subtitles. I noticed that for many videos, video frames and video subtitles are not 100% correctly aligned.
Is there any tool available that will align the frames with the subtitles? Please advise.

Is this content you are allowed to use? How were the subtitles made?

Can you elaborate on that?

There are several projects that allow you to perform forced alignment, but they require an existing, not-perfect-but-good-enough model for your language.

Please send me links to those projects that do forced alignment.

I noticed that for many videos, video frames and video subtitles are not 100% correctly aligned.

I mean that each subtitle frame has a start time and an end time. I split the entire audio file into multiple audio files based on the start and end times of each subtitle frame. I noticed that in many frames the start and end times are incorrect; the timestamps are wrongly annotated, and I do not know how the subtitles and their times were generated.
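For reference, here is a minimal sketch of the splitting step described above, assuming SRT-format subtitles and `ffmpeg` available on the PATH (`parse_srt_times` and `split_audio` are hypothetical helper names, not part of DeepSpeech):

```python
import re
import subprocess

def parse_srt_times(srt_text):
    """Extract (start, end) pairs in seconds from SRT subtitle text."""
    pattern = re.compile(
        r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*"
        r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
    )
    spans = []
    for m in pattern.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000.0
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000.0
        spans.append((start, end))
    return spans

def split_audio(wav_path, spans, out_prefix="clip"):
    """Cut one clip per subtitle span using ffmpeg."""
    for i, (start, end) in enumerate(spans):
        subprocess.run(
            ["ffmpeg", "-y", "-i", wav_path,
             "-ss", str(start), "-to", str(end),
             f"{out_prefix}_{i:04d}.wav"],
            check=True,
        )
```

Note that even if the splitting itself is correct, clips cut from mis-annotated timestamps will still contain misaligned speech, which is exactly the problem forced alignment is meant to fix.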

Here: https://www.google.com/search?q=forced+alignment

That’s not very descriptive. Is audio missing? Something else?

You need to research that, because if the subtitles were automatically generated, it is likely not a good idea to use them without proper review first.

You really need to check the license for your data as well; I doubt anything uploaded to YouTube allows you to do that.