The Zamia TTS project has a Karlsson model. The demo audio does sound “scratchy” and the speech flow is quite “robotic”, and I think the WaveRNN demo is based on another model/vocoder.
I already checked that, but unfortunately the demo quality is not that good, and on top of that they stick to Python 2.7, which is a no-go.
If the datasets used to generate Karlsson with WaveRNN are public, it would be nice to have pretrained models available. Karlsson sounds clear and natural to my ears.
Thank you for the hint. Contributions from the user Karlsson are in the public domain and can be found here: https://librivox.org/reader/5055
Are there any arguments against using this type of audiobook-based dataset in general? Looking at the number of recorded hours and the quality, I wonder why it is not used more widely.
You need to check the audio carefully, just like for every other dataset. The rules for a good dataset apply here as well.
Looking at the German M-AILABS datasets, I found that in some cases the text transcript does not perfectly match the spoken text. I also found that audio files are not properly edited, and the last syllable of a sentence is sometimes missing.
Depending on the speaker and the story, the speech and tonality may vary when the speaker mimics different voices or emotions.
Don’t know if this is a real problem, but the LibriVox books are mostly older books (where copyright has expired), so the vocabulary is somewhat outdated and a model might have problems pronouncing “modern words” like “computer” or “coronavirus”.
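To make the checks above a bit more concrete, here is a minimal sketch of two quick sanity checks, assuming you already have each clip's transcript and its duration in seconds. The function names, the `(clip_id, text, duration)` tuple layout, and the `tolerance` value are my own assumptions for illustration and are not part of M-AILABS or any TTS toolkit. The first check only catches gross transcript/audio mismatches; a single missing final syllable will usually slip through, so listening to samples is still necessary.

```python
import re
from statistics import median


def flag_rate_outliers(clips, tolerance=0.5):
    """Return ids of clips whose characters-per-second rate deviates from
    the median rate by more than `tolerance` (a fraction of the median).

    `clips` is an iterable of (clip_id, text, duration_seconds) tuples
    (hypothetical layout, adapt to your metadata file).
    """
    rates = {cid: len(text) / dur for cid, text, dur in clips if dur > 0}
    if not rates:
        return []
    mid = median(rates.values())
    return [cid for cid, rate in rates.items() if abs(rate - mid) > tolerance * mid]


def missing_words(transcripts, probe_words):
    """Return the probe words that never occur in the transcripts.

    Useful for spotting "modern" vocabulary the dataset does not cover.
    """
    vocab = set()
    for text in transcripts:
        vocab.update(re.findall(r"\w+", text.lower()))
    return [word for word in probe_words if word.lower() not in vocab]
```

Clips flagged by `flag_rate_outliers` are candidates for manual review rather than automatic removal, since a slow or fast passage can be perfectly valid.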
The spoken text is taken from the dataset itself, i.e. the very first section of “Das alte Haus”. Therefore it might be worth trying Zamia TTS / Karlsson, even though it’s based on Python 2.7, just to check the overall capability and to try some test sentences with vocabulary that came into use after 1950.
No luck with Zamia TTS: no install instructions, etc. Below are several samples from a three-year-old project using Karlsson’s “sister” Hokuspokus, plus some more in different languages: https://github.com/Kyubyong/css10
The Zamia guy was involved in a speech startup some time ago and has since moved on. But he is still quite interested in the subject. If you feel like it, just mail him and he’ll probably help.
@TheDayAfter check this new dataset by Facebook with 2k hours of speech in German. It should have Karlsson in it. Even though the mean clip length is 15 seconds, you might be able to split the clips or use them for longer sentences.
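On the splitting idea: a rough sketch of cutting a long clip at pauses, assuming the audio is already loaded as a list of float samples in the range [-1, 1]. The function name and the `threshold` and `min_silence_s` defaults are assumptions for illustration only. It thresholds raw sample amplitude, which is cruder than the frame-energy approach a real pipeline would use, and it does not split the transcript, so the text for each piece would still need to be aligned separately.

```python
def split_on_silence(samples, rate, threshold=0.01, min_silence_s=0.3):
    """Return (start, end) sample-index pairs for the non-silent segments.

    A segment ends when at least `min_silence_s` seconds of samples stay
    below `threshold` in absolute value. `end` is exclusive.
    """
    min_run = max(1, round(min_silence_s * rate))
    segments = []
    seg_start = None  # index where the current voiced segment began
    silent_run = 0    # number of consecutive quiet samples seen so far
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        else:
            silent_run += 1
            if seg_start is not None and silent_run >= min_run:
                # Close the segment at the first sample of the quiet run.
                segments.append((seg_start, i - silent_run + 1))
                seg_start = None
    if seg_start is not None:
        # Drop any short trailing quiet run from the final segment.
        segments.append((seg_start, len(samples) - silent_run))
    return segments
```

Each returned pair can be used to slice the sample list, e.g. `samples[start:end]`, before writing the pieces back out with whatever audio library you already use.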