Hi @btomtom5 - welcome to the forum
I haven’t experienced the issue you mention, but in case you haven’t seen it, there’s some info on the wiki about finding parameters that work well for a particular dataset. It might help with the noise issue and, in turn, help the model produce results in a more normal range.
This section specifically:
CheckSpectrograms is for measuring the noise level of the clips and finding good audio processing parameters. Noise level can be observed by checking spectrograms. If the spectrograms look cluttered, especially in silent parts, the dataset might not be a good candidate for a TTS project. If your voice clips have too much background noise, it is harder for your model to learn the alignment, and the final result might sound different from the voice you provided. If the spectrograms look good, the next step is to find a good set of audio processing parameters, defined in
config.json . In the notebook, you can compare different sets of parameters and see the resynthesis results against the given ground truth. Pick the parameters that give the best possible synthesis quality.
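If it helps, here’s a rough sketch (not from the TTS repo, just scipy/numpy) of the idea behind the "cluttered silent parts" check: compare spectrogram power in a voiced region versus a silent region. The synthetic clip and the threshold are illustrative assumptions, not values from the wiki.

```python
import numpy as np
from scipy.signal import spectrogram

sr = 22050  # sample rate, a common TTS choice
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)

# First second: a "voiced" 220 Hz tone; second second: silence.
clip = np.where(t < 1.0, 0.5 * np.sin(2 * np.pi * 220 * t), 0.0)
# Add a faint noise floor across the whole clip.
clip = clip + 0.001 * np.random.default_rng(0).standard_normal(t.size)

f, times, S = spectrogram(clip, fs=sr, nperseg=1024)

# Mean power in the voiced half vs the silent half.
voiced_power = S[:, times < 1.0].mean()
silent_power = S[:, times >= 1.0].mean()
ratio = voiced_power / silent_power
print(f"voiced/silent power ratio: {ratio:.1f}")
```

A clean recording shows a large gap between the two; if the silent parts carry nearly as much energy as the speech, the spectrogram will look cluttered and the dataset is likely too noisy.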