It’s easy to get data for that speaker (David Attenborough), though background noise might be a problem.
I’m not sure whether it would be better to train from scratch or fine-tune.
And is Mozilla TTS well suited for what I want to do? Or would you suggest something else?
I know ASR quite well but am new to TTS.
Also, the link in this sentence of the README is broken:
"If you are new, you can also find here a brief post about TTS architectures and their comparisons."