I am currently working on a Polish version of TTS, but my final goal is to obtain a Polish-speaking lector for films.
Of course, I can use a simple program like Sony Vegas Studio to merge the film with my .wav file, but my question is: how do I generate a .wav file that fits exactly into given time intervals?
For example, one person speaks from 00:00:02 to 00:08:01 (mm:ss:msms), and the next person speaks from 00:11:01 to 00:19:22.
My lector is a single voice, with no division into male/female voices etc.
Do you have any advice on how to do this?
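
To make the problem concrete, one possible approach is to start from a silent track as long as the film and copy each synthesized clip in at the sample offset of its interval. The following is only a minimal sketch using Python's standard `wave` and `array` modules. It assumes mono 16-bit PCM at 16 kHz, and that the last timestamp field is hundredths of a second. The names `parse_timestamp`, `place_clips`, `write_wav` and `fake_speech` are made up for illustration, and `fake_speech` is a tone standing in for real TTS output.

```python
import array
import math
import wave

SAMPLE_RATE = 16000  # assumption: the TTS engine outputs mono 16-bit PCM at 16 kHz


def parse_timestamp(text):
    """Convert 'mm:ss:cc' to seconds (assumes the last field is hundredths)."""
    minutes, seconds, hundredths = (int(part) for part in text.split(":"))
    return minutes * 60 + seconds + hundredths / 100.0


def place_clips(clips, total_seconds, sample_rate=SAMPLE_RATE):
    """Return a silent track with each (start, end, samples) clip copied in.

    A clip longer than its slot is simply cut off here; in practice the
    speech would more likely be sped up or the text shortened.
    """
    track = array.array("h", [0] * int(total_seconds * sample_rate))
    for start, end, samples in clips:
        first = round(start * sample_rate)
        last = min(round(end * sample_rate), len(track))
        room = max(0, last - first)
        piece = array.array("h", samples[:room])
        track[first:first + len(piece)] = piece
    return track


def write_wav(path, track, sample_rate=SAMPLE_RATE):
    """Write the track as a mono 16-bit .wav file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(track.tobytes())


def fake_speech(seconds, sample_rate=SAMPLE_RATE):
    """Hypothetical stand-in for TTS output: a 440 Hz tone."""
    count = int(seconds * sample_rate)
    return array.array(
        "h",
        (int(8000 * math.sin(2 * math.pi * 440 * i / sample_rate))
         for i in range(count)),
    )


if __name__ == "__main__":
    slots = [("00:00:02", "00:08:01"), ("00:11:01", "00:19:22")]
    clips = []
    for begin, finish in slots:
        start, end = parse_timestamp(begin), parse_timestamp(finish)
        # Pretend each line of speech is half a second shorter than its slot.
        clips.append((start, end, fake_speech(end - start - 0.5)))
    write_wav("lector.wav", place_clips(clips, total_seconds=20.0))
```

The resulting file has the same length as the film, so it can be dropped onto the timeline at position zero without any manual alignment. What this sketch does not solve is making the speech itself fit the slot, which is the harder part of the question.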