Hi, we have recorded two datasets (male and female) for the Urdu language. Both datasets are in LJSpeech format, each with a total length of 10 hours and a sample rate of 48 kHz. I previously trained a simple Mozilla Tacotron (Griffin-Lim) model using distributed training for up to 10k epochs and got good results for the male voice; however, the female voice didn't come out well. Maybe that's because of the recording style of the data, but I'm not sure.
Now I want to try my luck with the DoubleDecoderConsistency recipe (https://github.com/coqui-ai/TTS-recipes/tree/master/LJSpeech/DoubleDecoderConsistency). Is there a notebook I can use for voice synthesis once I have the trained models from this recipe? I am looking to improve the prosody and intonation of the voice.
Thank you.
Regards,
Zain