What to consider when using an ASR dataset for TTS

I am new to the NLP field, so please excuse me if I am asking about something obvious.

Are there any fundamental differences between the datasets used for training ASR and TTS models?

If it is possible to use an ASR dataset for TTS training, is there anything I should still pay attention to?

ASR datasets usually contain recordings of many different speakers with different dialects, intonations, etc., and may contain background noise.

For a TTS dataset you want clean recordings with consistent intonation from a single speaker.

See here: https://tts.readthedocs.io/en/latest/what_makes_a_good_dataset.html#what-makes-a-good-dataset
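If you do want to try an ASR dataset, one rough starting point is to cut it down to a single speaker. The sketch below is only an illustration: the `speaker`, `duration` and `path` fields and the 1 to 15 second limits are assumptions, not part of any particular dataset or library, so adapt them to whatever metadata your dataset ships with. It picks the speaker with the most usable audio and keeps only that speaker's clips.

```python
from collections import defaultdict


def select_single_speaker(clips, min_sec=1.0, max_sec=15.0):
    """Return (speaker_id, clips) for the speaker with the most usable audio.

    `clips` is a list of dicts with hypothetical keys "speaker",
    "duration" (seconds) and "path". The duration limits are
    illustrative defaults, not recommended values.
    """
    # Drop clips that are very short or very long.
    usable = [c for c in clips if min_sec <= c["duration"] <= max_sec]

    # Sum the usable audio per speaker.
    total = defaultdict(float)
    for c in usable:
        total[c["speaker"]] += c["duration"]

    if not total:
        return None, []

    # Keep only the speaker with the most usable audio.
    best = max(total, key=total.get)
    return best, [c for c in usable if c["speaker"] == best]


# Made-up example metadata.
clips = [
    {"speaker": "spk_a", "duration": 4.2, "path": "a1.wav"},
    {"speaker": "spk_b", "duration": 3.0, "path": "b1.wav"},
    {"speaker": "spk_a", "duration": 6.5, "path": "a2.wav"},
    {"speaker": "spk_b", "duration": 40.0, "path": "b2.wav"},
]
speaker, selected = select_single_speaker(clips)
print(speaker, [c["path"] for c in selected])
```

This only handles the single-speaker part. It does nothing about background noise or inconsistent intonation, so you would still need to listen to the selected clips and check them yourself.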
