Hi,
I have a dataset of short sentences, each around 2 to 3 seconds long.
Training fails when the audio files fall below a certain size threshold.
What I’m thinking of doing is appending the small audio chunks together to create one bigger chunk and then using that for training.
I want to know whether this will affect the model’s performance.
For example, the combined clips might contain 5 nouns in a single sentence, which would hardly ever occur in real speech. So can I go ahead with this or not?
What do you want to use DeepSpeech for? The acoustic model and the language model are not linked, so the material you train the acoustic model on can differ from what you use for a custom language model.
I’m training DeepSpeech on general sentences in Hindi.
Yes, it’s true that the acoustic model and the LM are not linked.
Here I’m talking about the acoustic model.
If I train my model on sentences made of random words that make no sense (just to increase the chunk size), will it affect prediction?
Please understand how the inference process works. The acoustic and language models are not coupled, so how could the acoustic model “understand” random sentences? It doesn’t. So your approach should work. Ideally you train with the same sort of material that you want to recognize later on. If you don’t have that, improvise.
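
For anyone who wants to try the concatenation approach discussed above, here is a minimal sketch using only Python's standard `wave` module. The function name `merge_clips` and the file paths are hypothetical. It assumes every clip is an uncompressed WAV with the same sample rate, sample width, and channel count, and that joining the transcripts with a single space is acceptable for your data. It inserts no silence between clips, which you may or may not want.

```python
import wave


def merge_clips(clips, out_path):
    """Concatenate short WAV clips into one file and join their transcripts.

    `clips` is a list of (wav_path, transcript) pairs. This helper is
    hypothetical, not part of DeepSpeech. Returns the combined transcript.
    """
    if not clips:
        raise ValueError("no clips given")

    fmt = None          # (channels, sample width, frame rate) of the first clip
    audio_chunks = []   # raw PCM bytes of each clip, in order
    transcripts = []

    for wav_path, transcript in clips:
        with wave.open(wav_path, "rb") as w:
            p = w.getparams()
            this_fmt = (p.nchannels, p.sampwidth, p.framerate)
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                # Assumption: all clips share one format; refuse to mix formats.
                raise ValueError(f"{wav_path} has format {this_fmt}, expected {fmt}")
            audio_chunks.append(w.readframes(w.getnframes()))
        transcripts.append(transcript.strip())

    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for chunk in audio_chunks:
            out.writeframes(chunk)

    return " ".join(transcripts)
```

You would call it as `merge_clips([("a.wav", "first sentence"), ("b.wav", "second sentence")], "merged.wav")` and write the returned transcript next to `merged.wav` in your training CSV.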