Trying to create my own Czech model

Hello,
I am trying to train my own TTS model, but I am already at step 95800 and I can’t understand a single word of the output. I am lost and don’t know what else to try. Any ideas what I did wrong?

Dataset: https://drive.google.com/drive/folders/13CSvJhH68C7BqepyOdRts9w0IY5hqILw?usp=sharing
Model: https://drive.google.com/drive/folders/1V9ILmK31SV8vnN-Et2y1BW6lb8njlvNS?usp=sharing

Any help is appreciated, thanks.

Can you share a Colab to try the model?

Do you plan to release the model open source?

Is this an open source dataset?

Maybe we can work together on that to make Czech available in TTS.

I don’t plan on making this one open source because I’m doing it for a friend, but I would definitely collaborate on an open source Czech model.

Do you know of any open dataset?

Yes, but it is under CC-0

But does it have a large enough single-speaker subset?

How much do you need?

It’s hard to give a number, because it primarily depends on good phoneme coverage. But most single-speaker datasets I know of provide at least 16 hours of voice recordings.
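If you want a quick sanity check of both points (hours of audio and coverage), here is a minimal sketch. It assumes your clips are uncompressed PCM WAV files in one folder and that you have the transcripts as plain strings. The function names are just made up for this example, and counting letters is only a rough proxy for real phoneme coverage, so treat the result as a hint, not a verdict.

```python
import wave
from collections import Counter
from pathlib import Path

# Lowercase Czech letters including diacritics. The digraph "ch" is not
# tracked separately here; q, w and x occur mostly in loanwords.
CZECH_ALPHABET = set("aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž")


def wav_seconds(path):
    """Duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_hours(wav_dir):
    """Total duration of all *.wav files directly inside wav_dir, in hours."""
    return sum(wav_seconds(p) for p in Path(wav_dir).glob("*.wav")) / 3600.0


def char_counts(transcripts):
    """Count how often each Czech letter occurs in the transcripts."""
    counts = Counter()
    for line in transcripts:
        counts.update(ch for ch in line.lower() if ch in CZECH_ALPHABET)
    return counts


def missing_chars(transcripts):
    """Czech letters that never occur in the transcripts, sorted."""
    return sorted(CZECH_ALPHABET - set(char_counts(transcripts)))
```

Call `total_hours("path/to/wavs")` on your clip folder and `missing_chars(lines)` on your transcript lines. Letters that are missing or very rare in `char_counts(lines)` point at sounds the model has barely heard.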

Additionally this might help you:

I feel like I’m bargaining :) but at least 5 hours is a good amount for fine-tuning a pre-trained model.

I have about 5 hours of audio of the same voice.

Sample: https://drive.google.com/file/d/1FeVXnKWIFHDepuXo44b1V1Lv-q8Vtb9J/view?usp=sharing

Let me check, and thanks for sharing.