I am running the first phase of training with code that I cloned from the TTS recipes (here). So I’m running the first command in the snippet below.
# change the GPU id if needed
CUDA_VISIBLE_DEVICES="0" python TTS/train.py --config_path model_config.json
# train vocoder ...
CUDA_VISIBLE_DEVICES="0" python TTS/vocoder/train.py --config_path vocoder_config.json
Although I haven’t trained the vocoder yet, there are test audio files under the tts_model/test_audios folder (corresponding to the sample sentences I entered). How is this possible when I have not trained the vocoder? I am using a 3.5GB dataset, and training is now at 180K steps (single GPU). The audio still sounds robotic; will it get better once the vocoder is trained?
If anyone knows the answer to my question and can share it with me, I would be very happy.
Do I have to start the vocoder training after the decoder training has finished?
Normally the vocoder takes mel spectrograms from the decoder as input. So as far as I understand, we need to train the vocoder after the decoder. But the vocoder training code runs without a trained decoder; would that be a correct way to train it? So, does it make sense to start training the decoder on one computer and the vocoder on another computer?
Both are separate. You can train a vocoder now if you want to, or try one that erogol has already made public. It might not work well for anything other than English.
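To illustrate why the two trainings can be separate, here is a rough sketch (not the actual TTS code). As far as I understand, a vocoder's training pairs can be built from the recordings alone: the input features are computed from each audio file and the target is that same audio, so no trained decoder is involved. Everything here is a made-up stand-in: the sine wave replaces a real recording, `magnitude_frames`, `frame_len` and `hop` are hypothetical names, and plain DFT magnitudes stand in for the mel spectrograms a real pipeline would use.

```python
import cmath
import math


def magnitude_frames(wav, frame_len=64, hop=32):
    """Slice a waveform into overlapping frames and return DFT magnitudes per frame."""
    frames = []
    for start in range(0, len(wav) - frame_len + 1, hop):
        frame = wav[start:start + frame_len]
        # Naive DFT, keeping only the non-negative frequency bins.
        mags = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(mags)
    return frames


# Stand-in for one recording from the dataset: a 1 kHz sine at 8 kHz (hypothetical data).
sample_rate = 8000
wav = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

# One vocoder training pair: features computed from the recording, and the recording itself.
vocoder_input = magnitude_frames(wav)
vocoder_target = wav

print(len(vocoder_input), "frames x", len(vocoder_input[0]), "bins ->", len(vocoder_target), "samples")
```

Nothing in this pair depends on the decoder, which is why starting the vocoder training on a second machine while the decoder is still training should be fine.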