I would like to open a discussion about the config.json file included in the master branch. While there may be no universal answer to what the “best” configuration is, a consistent exemplary configuration provides a solid foundation for individual experiments.
Towards this goal, I propose to remove or change some confusing elements in the current config file. Right now, it shows an exemplary configuration for training Tacotron2 on LJSpeech. This makes sense considering the “Collaborative Experimentation Guide” advertised in the README, which likewise advocates using LJSpeech for experiments. However, there seems to be a confusing inconsistency between some of the parameter values and their respective comments.
Consider for instance:
"do_trim_silence": true,// enable trimming of slience of audio as you load it. LJspeech (false), TWEB (false), Nancy (true)
If the comment proposes not to use do_trim_silence with LJSpeech, the parameter's value should be false.
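If I understand the intent of the comment correctly, the LJSpeech example would then read something like:
"do_trim_silence": false, // enable trimming of silence of audio as you load it. LJSpeech (false), TWEB (false), Nancy (true)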
A second example:
"attention_norm": "sigmoid", // softmax or sigmoid. Suggested to use softmax for Tacotron2 and sigmoid for Tacotron.
Set attention_norm to softmax, or indicate why sigmoid should be used if Tacotron2 is trained with LJSpeech.
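Again, assuming the comment reflects the intended recommendation, the line would presumably become something like:
"attention_norm": "softmax", // softmax or sigmoid. Suggested to use softmax for Tacotron2 and sigmoid for Tacotron.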
In my opinion, the problem is not only the mismatch between the parameter value and the comment, but also that it becomes unclear whether the values of the other parameters are already adapted to LJSpeech or still require changes. The inconsistency is obvious in the examples above, but similar, less obvious mismatches may exist for other parameters as well.
Independent of the specific training set, this comment does not reflect the source code:
"sample_rate": 22050, // DATASET-RELATED: wav sample-rate. If different than the original data, it is resampled.
Yet, the audio is not resampled, see #405. Instead, maybe recommend resampling before training, or remove that part of the comment?
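For illustration, a minimal sketch of such an offline resampling step could look like the following (using librosa and soundfile; the directory names and target rate are placeholders and would need to match the dataset and the "sample_rate" value in config.json):

import glob
import os

import librosa
import soundfile as sf

SRC_DIR = "LJSpeech-1.1/wavs"        # placeholder: original wavs
DST_DIR = "LJSpeech-1.1/wavs_22050"  # placeholder: resampled output
TARGET_SR = 22050                    # should match "sample_rate" in config.json

os.makedirs(DST_DIR, exist_ok=True)
for src_path in glob.glob(os.path.join(SRC_DIR, "*.wav")):
    # librosa.load resamples to TARGET_SR while loading the file
    audio, _ = librosa.load(src_path, sr=TARGET_SR)
    sf.write(os.path.join(DST_DIR, os.path.basename(src_path)), audio, TARGET_SR)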
I understand that, as TTS is under active development, knowledge and best practices change frequently, so keeping the config file consistent and up to date can be difficult. Alternatively, maybe there could be a section in the wiki devoted to lessons learned regarding model-dataset configurations and exemplary configurations?
I’m happy to make a PR addressing the parts identified above. However, maybe my understanding of the whole situation is not on point, so I’m happy to hear your thoughts.