Is it realistic to train TTS on 2,400 samples totaling 3 hours?

The title says it all: I can't get more samples, and that's kind of a pain :frowning:
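For scale, here is a quick sketch of the per-clip arithmetic. It assumes the 3 hours is the total audio duration across all 2,400 samples; the variable names are illustrative only.

```python
# Assumption: 3 hours is the combined duration of all clips.
TOTAL_HOURS = 3
NUM_SAMPLES = 2400

total_seconds = TOTAL_HOURS * 3600           # hours -> seconds
avg_seconds = total_seconds / NUM_SAMPLES    # mean clip length

print(f"{total_seconds} s total, {avg_seconds:.1f} s per sample on average")
```

Under that assumption the clips average 4.5 seconds each.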

You can try to fine-tune one of the pretrained models, but in general that amount of data is not enough for solid results.

Thanks. So even for fine-tuning, the sample count isn't enough?

It would be great if you gave it a go and reported back here. That would help everyone while also empowering you to answer your own question.

Were you able to get an answer on this? I am also trying to train on an Indic dataset with just 3 hours of recordings, and training with nvidia/tacotron2 results in overfitting.

In general, what types of algorithms are more suitable for small Indic datasets? The letter-to-phoneme mapping is always the same in most Indic languages, i.e. a letter always maps to the same phoneme regardless of the word it appears in.
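To illustrate what a context-free letter-to-phoneme mapping means in practice, here is a minimal sketch assuming the one-to-one property described above holds. The table is hypothetical and deliberately partial (a handful of Devanagari characters with IPA-style symbols), and it ignores real-world complications such as the inherent vowel carried by bare consonants. `LETTER_TO_PHONEME` and `g2p` are illustrative names, not part of any TTS library.

```python
# Hypothetical, partial table: each Devanagari character maps to one phoneme
# symbol. A real table would cover the whole script and would also need to
# handle the inherent vowel of bare consonants, which this sketch ignores.
LETTER_TO_PHONEME = {
    "क": "k",
    "ग": "g",
    "ल": "l",
    "म": "m",
    "न": "n",
    "ा": "aː",  # vowel sign AA
    "ि": "i",   # vowel sign I
    "ी": "iː",  # vowel sign II
}


def g2p(word: str) -> list[str]:
    """Convert a word to phonemes with a fixed per-character lookup.

    No surrounding context is consulted, so the same character always yields
    the same phoneme. Characters missing from the table raise KeyError so
    gaps in coverage are visible rather than silently skipped.
    """
    return [LETTER_TO_PHONEME[ch] for ch in word]


if __name__ == "__main__":
    # "काला" is stored as four code points: क, ा, ल, ा
    print(g2p("काला"))
```

If the mapping really is this regular, the text front end reduces to a table lookup like this one, with no per-word dictionary required.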