Has anyone looked at the topics of punctuation and/or reading speed in TTS?
For punctuation, a couple of months back I tried adjusting the output from espeak so that it kept commas, which normally get stripped from the input text. The model trained on that output was responsive to commas, but speech quality was degraded. If there’s interest, I can write up the process, and maybe I’ll try again (as MelGAN has moved quality forward dramatically).
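To make the comma idea concrete, here is a rough sketch of one way it could be done. It is not necessarily the exact adjustment described above. It assumes the phonemizer drops commas, so the text is split on commas first, each chunk is phonemized on its own, and the commas are put back afterwards. The names `phonemize_keep_commas` and `espeak_ipa` are made up for illustration, and the `espeak` wrapper assumes the binary is on the PATH.

```python
import subprocess
from typing import Callable


def phonemize_keep_commas(text: str, phonemize: Callable[[str], str]) -> str:
    """Phonemize each comma-separated chunk and rejoin the results with commas."""
    chunks = [chunk.strip() for chunk in text.split(",")]
    # Skip empty chunks (e.g. from ",,") so no stray commas are emitted.
    phonemized = [phonemize(chunk) for chunk in chunks if chunk]
    return ", ".join(phonemized)


def espeak_ipa(chunk: str) -> str:
    """Hypothetical wrapper: ask espeak for IPA phonemes without playing audio.

    Assumes an `espeak` executable is installed; -q suppresses audio output
    and --ipa prints the phonemes to stdout.
    """
    result = subprocess.run(
        ["espeak", "-q", "--ipa", chunk],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Usage would be something like `phonemize_keep_commas("Well, that was unexpected", espeak_ipa)`. Whether the model then learns a sensible pause for the comma token depends on the dataset containing enough of them.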
I’m also interested in the speed of the output. It’s no doubt largely determined by my dataset, but it definitely seems to read a touch faster than expected. I might try adding a postprocessing step so I can adjust this outside the model. I’m wondering whether a GST approach might help there instead (not something I’ve looked at closely).
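For the postprocessing route, a minimal sketch could lean on ffmpeg’s `atempo` audio filter, which changes tempo without shifting pitch. This assumes `ffmpeg` is installed and that the synthesized audio has been written to a WAV file. The function names and the 0.5 to 2.0 range check are choices made for this sketch, not anything taken from a TTS library.

```python
import subprocess
from typing import List


def atempo_command(src: str, dst: str, tempo: float) -> List[str]:
    """Build an ffmpeg command that rescales playback tempo (1.0 = unchanged)."""
    # Conservative range for this sketch; values below 1.0 slow the speech down.
    if not 0.5 <= tempo <= 2.0:
        raise ValueError("tempo must be between 0.5 and 2.0")
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={tempo}", dst]


def change_speed(src: str, dst: str, tempo: float = 0.9) -> None:
    """Run ffmpeg to write a tempo-adjusted copy of src to dst."""
    subprocess.run(atempo_command(src, dst, tempo), check=True)
```

For example, `change_speed("tts_output.wav", "tts_output_slow.wav", 0.9)` would play the speech at 90% of its original tempo. Large adjustments tend to introduce audible artifacts, so this is probably only good for small corrections.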
Any suggestions on either of these points?