Recently we have been training a Chinese speech synthesis model and came across an amazing open-source project, Coqui.
Part of its Chinese support was done by Kirian Guiller, and we would like to get in touch with this developer.
The problem we are currently facing is the dataset: in what format should it be organized? Different datasets use different formats, which is very confusing.
We hope Kirian Guiller can share how the Chinese dataset was organized and formatted in that project. This would be very helpful for researchers and engineers working on speech synthesis models for Chinese and other Asian languages.