German Karlsson Models and Notebook

TheDayAfter · December 1, 2020, 7:01pm

Does anyone know if there are any models and notebooks available for the German sample https://soundcloud.com/user-565970875/german-karlsson ?

The description links to https://github.com/erogol/WaveRNN but the Benchmark.ipynb there is not available anymore. On Mozilla's TTS repository https://github.com/mozilla/TTS/wiki/Released-Models it's mentioned that WaveRNN is deprecated.

dkreutz · December 2, 2020, 7:53am

The Zamia TTS project has a Karlsson model. Its demo audio sounds “scratchy” and the speech flow is quite “robotic”, so I think the WaveRNN demo is based on another model/vocoder.

TheDayAfter · December 2, 2020, 4:31pm

I already checked that, but unfortunately the demo quality is not that good, and on top of that they stick to Python 2.7, which is a no-go.

If the datasets used to generate Karlsson with WaveRNN are public, it would be nice to have pretrained models available. Karlsson sounds clear and natural to my ears.

dkreutz · December 2, 2020, 6:07pm

Karlsson is a male voice in the German M-AILABS dataset: https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/

TheDayAfter · December 3, 2020, 5:06pm

Thank you for the hint. Contributions from the user Karlsson are public domain and can be found here: https://librivox.org/reader/5055

Are there any arguments against using audiobook-based datasets like this as a base in general? Looking at the number of recorded hours and the quality, I wonder why they are not used more widely.

dkreutz · December 3, 2020, 7:40pm

You need to carefully check the audio, just like for every other dataset. The rules for a good dataset apply here as well.

Looking at the German M-AILABS datasets, I found that in some cases the text transcript does not perfectly match the spoken text. I also found that some audio files are not properly edited, and the last syllable of a sentence is sometimes missing.
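
A rough way to screen for such problems is to compare transcript length against audio duration and flag clips whose characters-per-second rate is far from the median. The sketch below is only an illustration: the function names, the `(clip_id, transcript, duration_seconds)` tuple layout and the `tolerance` value are assumptions, not part of M-AILABS or any TTS toolkit. It will only catch gross mismatches (a missing sentence, a badly cut file); a single dropped syllable still needs listening.

```python
import statistics
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def flag_suspicious_clips(clips, tolerance=0.5):
    """Flag clips whose characters-per-second rate is far from the median.

    clips: iterable of (clip_id, transcript, duration_seconds) tuples
           (a hypothetical layout chosen for this sketch).
    tolerance: allowed relative deviation from the median rate
               (an assumed value; tune it per speaker).
    """
    rates = {
        clip_id: len(text.strip()) / duration
        for clip_id, text, duration in clips
        if duration > 0
    }
    if not rates:
        return []
    median_rate = statistics.median(rates.values())
    # A rate far above the median hints at truncated audio or extra text;
    # a rate far below it hints at missing text or untrimmed audio.
    return sorted(
        clip_id
        for clip_id, rate in rates.items()
        if abs(rate - median_rate) > tolerance * median_rate
    )
```

The flagged clip IDs are candidates for manual review, not a verdict that the clips are bad.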

Depending on the speaker and the story, the speech and tonality of the reading may vary when the speaker mimics different voices or emotions.

I don't know if this is a real problem, but the LibriVox books are mostly older books (where copyright has expired), so the vocabulary is somewhat outdated and a model might have problems pronouncing “modern words” like “computer” or “coronavirus”.
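
To get a feel for how much of a gap there is, one could check which probe words never occur in the dataset's transcripts. This is a minimal sketch with hypothetical helper names, and it assumes the transcripts are already loaded as plain strings. Note that a missing word is only a coverage hint: models that work on characters or phonemes may still pronounce unseen words acceptably.

```python
import re

# Runs of letters only (no digits or underscores); covers umlauts and ß.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def build_vocabulary(transcripts):
    """Collect the lower-cased set of words seen in the transcripts."""
    vocabulary = set()
    for line in transcripts:
        vocabulary.update(word.lower() for word in WORD_PATTERN.findall(line))
    return vocabulary


def missing_words(transcripts, probe_words):
    """Return the probe words that never occur in the transcripts."""
    vocabulary = build_vocabulary(transcripts)
    return [word for word in probe_words if word.lower() not in vocabulary]
```

For example, `missing_words(transcripts, ["Computer", "Coronavirus"])` lists whichever of the two words the transcripts never contain.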

TheDayAfter · December 4, 2020, 3:14pm

The spoken text is taken from the dataset itself, i.e. the very first section of “Das alte Haus”. Therefore it might be worth trying Zamia TTS / Karlsson, even though it's based on Python 2.7, just to check its overall capability with some test sentences using words that only became common after 1950.

TheDayAfter · December 5, 2020, 8:49am

No luck with Zamia TTS: no install instructions etc. Below are several samples from a three-year-old project utilizing Karlsson's ‘sister’ Hokuspokus, plus some more in different languages: https://github.com/Kyubyong/css10

othiele · December 5, 2020, 3:36pm

The Zamia guy was involved in a speech startup some time ago and has since moved on. But he is still quite interested in the subject. If you feel like it, just mail him and he’ll probably help.

othiele · December 23, 2020, 10:42am

@TheDayAfter check this new dataset by Facebook with 2k hours of speech in German. It should have Karlsson in it. Even though the mean clip length is 15 seconds, you might be able to split the clips or use them for longer sentences.
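
One possible way to split long clips is to cut at pauses. The sketch below works on a plain list of 16-bit sample values and finds the midpoints of sufficiently long quiet stretches; the function names, the amplitude `threshold` and the minimum pause length are all assumptions for illustration. It only handles the audio side: the transcript would still have to be split at the matching positions, which usually needs some form of alignment.

```python
def find_silence_midpoints(samples, sample_rate, threshold=500, min_silence_s=0.3):
    """Return sample indices in the middle of long quiet stretches.

    samples: sequence of signed integer sample values (mono).
    threshold: amplitude below which a sample counts as silence (assumed).
    min_silence_s: minimum pause length in seconds to count as a cut point.
    """
    min_run = int(min_silence_s * sample_rate)
    midpoints = []
    run_start = None
    for index, value in enumerate(samples):
        if abs(value) < threshold:
            if run_start is None:
                run_start = index
        else:
            # Ignore leading silence (run_start == 0) and short pauses.
            if run_start is not None and run_start > 0 and index - run_start >= min_run:
                midpoints.append((run_start + index) // 2)
            run_start = None
    return midpoints


def split_at(samples, cut_points):
    """Split the sample sequence at the given indices."""
    pieces = []
    start = 0
    for cut in cut_points:
        pieces.append(samples[start:cut])
        start = cut
    pieces.append(samples[start:])
    return pieces
```

Trailing silence at the end of a clip is deliberately not treated as a cut point, so a clip without an internal pause comes back as a single piece.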

TheDayAfter · December 23, 2020, 1:09pm

@othiele Thank you for the link, Olaf. I would probably need a new hard disk to download, extract and edit the file. Might be worth a try in the future.