Hi everyone. I have two questions about TTS datasets/corpora.
My first question is:
Is it better for a word to have a high frequency (occur many times) in the corpus than a low frequency when it comes to generating audio for a particular text?
Second, if 'curious' is a high-frequency word in my corpus (i.e. it has many recorded audio samples), how can the model generalize that word when it synthesizes speech?
The word is very likely to have been recorded with a variety of different intonations.
Frequency = the number of times a given word occurs in our corpus.
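To make the definition of frequency concrete, here is a minimal sketch of counting word frequencies over a corpus's transcripts. It assumes each transcript is a plain string and that lowercasing plus a simple letters-and-apostrophes regex is an acceptable tokenizer; the `word_frequencies` function and the sample sentences are hypothetical, not taken from any particular TTS toolkit.

```python
import re
from collections import Counter


def word_frequencies(transcripts):
    """Count how many times each word occurs across a list of transcript strings."""
    counts = Counter()
    for line in transcripts:
        # Simplifying assumption: lowercase, then treat runs of letters/apostrophes as words
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return counts


# Hypothetical transcripts; a real corpus would load these from its metadata file
transcripts = [
    "I am curious about this.",
    "Curious minds ask questions.",
    "She was not curious at all.",
]

freq = word_frequencies(transcripts)
print(freq["curious"])       # 3
print(freq.most_common(1))   # [('curious', 3)]
```

A word with a high count here simply has more recorded examples in the corpus; the count alone says nothing about how varied the intonation across those recordings is.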