Uniform pitch & tempo dataset

lclrk · July 16, 2020, 11:15am

I’m wondering what the effects would be of training on a dataset where the spoken phrases have uniform timing and pitch throughout. For example, if a reference tone were played at 110 beats per minute, every syllable would land on a beat and be voiced at that reference pitch. How would this affect training? Would it speed it up, or would the inherently robotic/monotone style, with so little variation, confuse the model?
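As a rough illustration of what "every syllable lands on a beat at a reference pitch" means numerically: at 110 BPM each beat lasts 60 / 110 ≈ 0.545 s. The sketch below is a minimal, hypothetical way to build that beat grid and to check how closely recorded clips follow it. The helper names are made up for illustration, the 220 Hz reference pitch is an assumed example, and the syllable onsets and F0 values would have to come from a separate aligner or pitch tracker that is not shown.

```python
import math


def beat_grid(bpm: float, n_beats: int) -> list[float]:
    """Onset time in seconds of each beat at a constant tempo (hypothetical helper)."""
    seconds_per_beat = 60.0 / bpm
    return [i * seconds_per_beat for i in range(n_beats)]


def max_onset_deviation(onsets: list[float], bpm: float) -> float:
    """Largest gap in seconds between a syllable onset and its nearest beat."""
    seconds_per_beat = 60.0 / bpm
    return max(
        abs(t - round(t / seconds_per_beat) * seconds_per_beat) for t in onsets
    )


def semitone_deviation(f0_hz: list[float], ref_hz: float) -> list[float]:
    """Distance of each voiced F0 estimate from the reference pitch, in semitones."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


if __name__ == "__main__":
    bpm = 110.0          # tempo from the example above
    ref_hz = 220.0       # assumed reference pitch, for illustration only
    grid = beat_grid(bpm, 4)
    print("beat onsets (s):", [round(t, 3) for t in grid])
    # Hypothetical syllable onsets from a forced aligner, slightly off the grid.
    print("max onset error (s):", max_onset_deviation([0.0, 0.56, 1.08], bpm))
    # Hypothetical F0 estimates from a pitch tracker.
    print("pitch error (semitones):", semitone_deviation([218.0, 225.0], ref_hz))
```

Small onset and semitone errors across the whole corpus would indicate the kind of near-zero prosodic variation the question describes.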

georroussos · July 16, 2020, 11:33am

It wouldn’t confuse it; it would probably just learn that style. If you then wanted to train with GST (Global Style Token) layers, the model might not be very versatile, but in general it shouldn’t be a problem.

lclrk · July 16, 2020, 11:55am

Good to hear, thanks. I’ll add more variation later, so I wanted to see whether anyone thought there might be issues with this at a basic level.
