Hi, I intend to use an open-source dataset of phrases. I know about the audio requirements, but I’m a bit unsure about the localization part.
Since Swedish has letters that English lacks, I suppose some crucial settings need to be changed in order to train it properly.
In config.json, check the “characters” section and add any missing special characters. If you do phoneme-based training, check that the list of phonemes is complete.
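A quick way to find missing letters is to compare every character in your transcripts against the configured symbol set. This is a minimal sketch, assuming the Mozilla TTS-style layout where the “characters” block in config.json holds “characters” and “punctuations” strings; the function name missing_characters and the sample config values are hypothetical.

```python
def missing_characters(config: dict, transcripts: list) -> set:
    """Return characters used in the transcripts that the config's symbol set lacks.

    Assumes config["characters"] contains "characters" (letters) and
    "punctuations" strings, as in Mozilla TTS-style config.json files.
    """
    chars = config["characters"]
    known = set(chars["characters"]) | set(chars["punctuations"])
    used = set()
    for line in transcripts:
        used.update(line)
    # The comparison is case-sensitive: if your text cleaner lowercases
    # the text, lowercase the transcripts before calling this.
    return used - known


if __name__ == "__main__":
    # In practice you would load this with json.load() from config.json.
    config = {
        "characters": {
            "characters": "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
            "punctuations": "!'(),-.:;? ",
        }
    }
    transcripts = ["Hej på dig!", "Öppna dörren, tack."]
    print(sorted(missing_characters(config, transcripts)))
```

Anything it reports (here the Swedish letters) is a candidate to add to the “characters” string.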
You’ll simply have to try whether character-based or phoneme-based training gives better results.
When setting “use_phonemes” to true, don’t forget to set “phoneme_language” to Swedish.
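To compare both approaches, one option is to derive two configs from the same base and train each. This is a sketch under assumptions: it only touches the “use_phonemes” and “phoneme_language” keys, the helper name make_variants is hypothetical, and the value "sv" assumes phonemes are produced by espeak/espeak-ng, whose Swedish voice code is "sv".

```python
import copy
import json


def make_variants(base: dict) -> dict:
    """Build character-based and phoneme-based copies of a training config."""
    char_cfg = copy.deepcopy(base)
    char_cfg["use_phonemes"] = False

    phon_cfg = copy.deepcopy(base)
    phon_cfg["use_phonemes"] = True
    # Assumption: espeak/espeak-ng phonemization, where Swedish is "sv".
    phon_cfg["phoneme_language"] = "sv"
    return {"chars": char_cfg, "phonemes": phon_cfg}


if __name__ == "__main__":
    # Hypothetical base config; normally loaded from config.json.
    base = {"run_name": "swedish-tts", "use_phonemes": False, "phoneme_language": "en-us"}
    for name, cfg in make_variants(base).items():
        cfg["run_name"] = f"{base['run_name']}-{name}"
        # ensure_ascii=False keeps å, ä and ö readable in the output.
        print(json.dumps(cfg, indent=4, ensure_ascii=False))
```

Deep copies keep the two variants independent, so editing one does not leak into the other.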
Also check “text_cleaner” and consider creating your own cleaner if the default one does not fit your use case.
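As I understand it, a cleaner in Mozilla TTS is a plain function that takes text and returns normalized text, picked by the name given in “text_cleaner”. The English cleaners transliterate to ASCII, which would turn “å” into “a”, so a Swedish cleaner should avoid that. This is a minimal sketch: the name swedish_cleaners and the abbreviation table are hypothetical examples, not part of the library.

```python
import re

# Hypothetical abbreviation table; extend it to match your own corpus.
# Patterns are lowercase because the text is lowercased first.
_ABBREVIATIONS = [
    (re.compile(r"\bt\.ex\."), "till exempel"),
    (re.compile(r"\bbl\.a\."), "bland annat"),
    (re.compile(r"\bosv\."), "och så vidare"),
]
_WHITESPACE = re.compile(r"\s+")


def swedish_cleaners(text: str) -> str:
    """Lowercase, expand a few abbreviations and collapse whitespace.

    Unlike ASCII-transliterating cleaners, this keeps å, ä and ö intact.
    """
    text = text.lower()
    for pattern, expansion in _ABBREVIATIONS:
        text = pattern.sub(expansion, text)
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(swedish_cleaners("Vi köpte t.ex.  äpplen osv."))
```

Number expansion is left out here; for Swedish you would likely want that too, but it needs its own rules.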
Thanks! That’s a small effort compared with the huge effort Mozilla made implementing this.
In my runs, the multi-band MelGAN vocoder has issues with breathing. I continued training the vocoder for 1M steps, but the metallic sound didn’t disappear. The pre-trained model available for download has the same issues on my data. WaveGAN sounds better overall on my data, but the MB-MelGAN vocoder sounds more realistic if you disregard the metallic sound when the model makes breathing pauses.
Thank you for the link! I’m working on Swedish speech-to-text and text-to-speech models for the Rhasspy voice assistant. If I can get these datasets downloaded, they will be very helpful.