To create more training data for DeepSpeech, we are trying to split the audio data we have into individual words and then rearrange them (with sufficient buffers in between).
For example, if we have an audio clip with the transcript:
“hey how are you feeling today”
and we cut it into small chunks with the following transcriptions:
“hey”
“how”
“are”
“you”
“feeling”
“today”
can we rearrange them like:
“hey today you feeling are how”
Does this kind of rearrangement affect the language model?
And what if we just remove some words from the original sequence, like:
“hey are feeling today”
Does the answer change given that we have a lot of data, and a similar, fully correct sentence is also captured somewhere in it?
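
The rearrangement and word removal described above could be sketched roughly as follows. This is only an illustration: the `build_clip` helper, the 16 kHz sample rate, the 200 ms gap, and the use of all-zero samples as the buffer are assumptions, and the word chunks are dummy sample lists standing in for real word-level audio.

```python
SAMPLE_RATE = 16000  # assumed sample rate in Hz


def build_clip(chunks, order, gap_ms=200, sample_rate=SAMPLE_RATE):
    """Concatenate word chunks in the given order with a buffer between them.

    chunks: list of (word, samples) pairs; samples is a list of PCM ints.
    order:  indices into chunks; it may reorder words or leave some out.
    Returns (samples, transcript) for the new clip.
    """
    # Assumption: the buffer is plain digital silence (all-zero samples).
    gap = [0] * (sample_rate * gap_ms // 1000)
    samples, words = [], []
    for position, index in enumerate(order):
        word, word_samples = chunks[index]
        if position > 0:
            samples.extend(gap)  # buffer only between words, not at the edges
        samples.extend(word_samples)
        words.append(word)
    return samples, " ".join(words)


# Dummy stand-ins for real word chunks: 0.1 s of a constant value per word.
words = ["hey", "how", "are", "you", "feeling", "today"]
chunks = [(w, [i + 1] * 1600) for i, w in enumerate(words)]

# Case 1: same words, shuffled order.
shuffled_audio, shuffled_text = build_clip(chunks, [0, 5, 3, 4, 2, 1])
# Case 2: original order, some words removed.
dropped_audio, dropped_text = build_clip(chunks, [0, 2, 4, 5])

print(shuffled_text)  # hey today you feeling are how
print(dropped_text)   # hey are feeling today
```

In both cases the generated transcript is built from the same index list as the audio, so the label always matches what is actually in the clip, even though the word order is no longer natural English.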