Persian/Farsi TTS

I wanted to run Mozilla TTS on Persian text but was unable to do so. Getting it to work seems to require reading almost everything about it and digging deep into the code. I’m posting here so that if someone can help, for example by linking a Google Colab notebook, others can benefit too.
Please tell me where I can start in order to make Mozilla TTS work on Persian. (I believe there is enough data in Mozilla Common Voice, and there should be some audiobooks out there too.)

This reads like you are asking others to do the work for you. :face_with_raised_eyebrow:

2 Likes

You are right, and I don’t want people to do the work for me. All I’m asking for is hints and whatever knowledge people can share. Someone who knows the subject well could guide me and others on how to approach this problem, and point to any existing materials or code.

@sanjaesc is right: you will have to put a lot of time into it, as this is not ready-to-use software. If you want to do the work, this repo by @mrthorstenm has everything you need to train a German model.

1 Like

Hi @i3130002, I may be able to help :slight_smile:

I work on Rhasspy, a free & open source voice assistant that works offline. We’re always looking to add new languages, both for speech recognition and text to speech.

I have a fork of MozillaTTS that I’ve trained 6 voices with so far. If you’d like to collaborate, let me know.

We may be able to use existing speech data if there’s enough and it’s good quality. Another option (if you have a good microphone) is to have you or another native speaker record a set of phrases that are phonetically “rich”. I’ve built a tool to help find these phrases, though I’d need to add Persian/Farsi sounds.

2 Likes

That’s awesome, I emailed you so that we can start cooperating.

1 Like

@othiele How could this repo help me see the modifications that should be made to train TTS on another language?

1 Like

Usually one learns by copying what others are doing and then adapting it to one’s own needs. So I suggested you copy how we did it, and then you can change things for Persian. If you already did that, what exactly is your question? And as @sanjaesc said, please don’t ask us to do it all for you. Btw, @synesthesiam offered his excellent repo as well. Same there: study it and then ask detailed questions.

2 Likes

Thank you, @othiele. If you do come across people who aren’t able to do it themselves for under-served languages, please send them my way. I’m willing to do a lot of the work as long as I have a native speaker to consult with :slight_smile:

1 Like

@synesthesiam Great to hear that and thanks for all the work you put into new models. Will send people your way :slight_smile:

1 Like

Are your models open? Would you mind sharing them so they can be linked on https://github.com/mozilla/TTS/wiki/Released-Models?

I’m interested in Farsi TTS. What is the status of your work, @i3130002 @synesthesiam? Do you need help?

1 Like

I did nothing and was unable to communicate much with him. I had to put off this project because of some personal problems. Still, I imagine that with his GitHub repo you should be able to start working on Farsi/Persian, and if there is something I can do, just email me (on Gmail).
Wish you luck :grinning:

2 Likes

Sorry I haven’t been very responsive, @i3130002. I should have some more time now during the holidays.

I’ve made some progress, but I need help now :slight_smile: (see below)

I was able to find some Farsi speech data: I contacted the author of the MirasVoice corpus and got the full set. Unfortunately, it doesn’t have enough data from a single speaker. I might be able to use it in the future for Farsi speech to text in Rhasspy, but it’ll need a lot of pre-processing.

So I will need to collect recordings from a volunteer. But first, I need to develop a set of sentences that have good phoneme coverage. I’ve already added Farsi phonemes to my gruut-ipa library, and I’ve located a large corpus of sentences in OSCAR. But here’s where I’m stuck: numbers.

I use the num2words library to convert digits into words (1 -> one), and it doesn’t support Farsi yet. Would either of you (@i3130002 or @hkalbasi) be able to help me add support?
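A minimal sketch of the digit-to-word step described above. The function name `spell_out_numbers` and the `to_words` callable are made up for illustration. A tiny hypothetical lookup table stands in for a real converter so that the example runs without any third-party package.

```python
import re
from typing import Callable

# Matches runs of decimal digits. For str patterns in Python 3, \d also
# matches Persian digits (U+06F0..U+06F9) and Arabic-Indic digits.
_NUMBER = re.compile(r"\d+")


def spell_out_numbers(text: str, to_words: Callable[[int], str]) -> str:
    """Replace every run of digits in `text` with its spoken form."""
    # int() accepts any Unicode decimal digits, so Persian digits parse too.
    return _NUMBER.sub(lambda m: to_words(int(m.group())), text)


# Hypothetical stand-in for a real converter; it covers only three numbers.
_DEMO_WORDS = {1: "one", 2: "two", 12: "twelve"}


def demo_to_words(n: int) -> str:
    return _DEMO_WORDS[n]


print(spell_out_numbers("I have 2 books and 12 pens", demo_to_words))
```

With a real converter, the call might look like `spell_out_numbers(text, lambda n: num2words(n, lang="fa"))`, assuming the installed num2words version includes Farsi support.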

Once I can convert numbers to words, I can filter the OSCAR Farsi sentences and find a small set of sentences (usually < 2000) that will provide good phoneme pair examples. After that, we’ll need to find a volunteer with a good microphone and a lot of patience. With this dataset, I’ll be happy to train models for both MozillaTTS and my Larynx fork.
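One common way to pick a small, phonetically rich subset is a greedy selection over phoneme pairs. The sketch below is only an illustration of that idea, not necessarily how the tool mentioned in this thread works. The names `phoneme_pairs` and `select_sentences` are hypothetical, and the toy phoneme sequences are made up.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def phoneme_pairs(phonemes: List[str]) -> Set[Pair]:
    """Return the set of adjacent phoneme pairs in one sentence."""
    return set(zip(phonemes, phonemes[1:]))


def select_sentences(corpus: Dict[str, List[str]], limit: int) -> List[str]:
    """Greedily pick up to `limit` sentences that add the most unseen pairs."""
    remaining = {text: phoneme_pairs(ph) for text, ph in corpus.items()}
    covered: Set[Pair] = set()
    chosen: List[str] = []
    while remaining and len(chosen) < limit:
        # Pick the sentence contributing the most pairs not yet covered.
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds new coverage
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen


# Toy corpus: sentence id -> made-up phoneme sequence.
toy = {
    "a": ["k", "e", "t"],
    "b": ["k", "e", "t", "a", "b"],
    "c": ["m", "a", "n"],
}
print(select_sentences(toy, limit=2))
```

In the toy corpus, sentence "a" is never chosen because "b" already covers all of its pairs. That is the kind of redundancy such a filter is meant to remove.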

EDIT: Forgot to add one more step: filtering sentences. I usually start with a set of 2000-5000 phonetically rich sentences, and then ask volunteers to help filter out ones that don’t make sense, are offensive somehow, or are something that a real native speaker would never say. This can be done by multiple people in parallel at least, but it’s an important step :+1:

1 Like

I just made a pull request that adds Farsi to the num2words library.

Why don’t we use the Common Voice sentences? There are nearly 7000 sentences in Farsi, which have been reviewed and do not contain numbers.

My friend has volunteered to record the dataset, but I don’t think we have a good microphone. Is an iPhone’s microphone considered good enough? Can we compensate for this in software? Please message me on an instant messaging app for more details, so I can send you samples of my friend’s voice. (Matrix: @hkalbasi:mozilla.org , Telegram: @hkalbasi)

Another concern: in Farsi, the short vowels e, a, and o are written as optional marks. For example, کِتاب, which means book, is ketaab, and کَتاب, which is not a valid word, is kataab (notice that the small mark moves from below the letter to above it; I hope your browser shows that, but I can also send an image). Everyone writes کتاب without any a or e. This can perhaps be handled with a dictionary. But the problem becomes more difficult in the genitive case, which in Farsi is marked by a connecting e. For example, “my book” in Farsi is کِتابِ مَن = ketaabe man. But everyone writes it as کتاب من, and espeak reads it as ketaab man, which is wrong. How can we handle that so our machine can read simple texts without ـَ ـِ ـُ explicitly written?
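To make the ambiguity concrete, here is a small sketch showing that the two vocalized spellings collapse to the same everyday written form once the vowel marks are dropped. The helper name `strip_diacritics` is hypothetical. It removes every Unicode combining mark (category `Mn`), which is broader than just the three short-vowel marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (Unicode category Mn), e.g. fatha, kasra, damma."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


with_kasra = "\u06a9\u0650\u062a\u0627\u0628"  # ketaab, "book"
with_fatha = "\u06a9\u064e\u062a\u0627\u0628"  # kataab, not a valid word

# Both vocalized spellings reduce to the same unvocalized string.
print(strip_diacritics(with_kasra) == strip_diacritics(with_fatha))
```

Going the other way, from the unvocalized string back to the right vowels, is the hard part. The text itself no longer carries that information.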

2 Likes

(my reply got lost for some reason, so I’m re-typing it)

@hkalbasi, I’ve pulled your changes into my num2words fork. Thank you for such a quick response!

This is a great idea. I looked at the (now) 9000 validated Farsi sentences, and ran them through my phoneme coverage analysis. It looks like we could get excellent coverage even with half (~4000), but the more data the better.

How many sentences is your volunteer willing to read? I contacted you on Matrix; I’ll have to listen to some samples from the iPhone to see if it will be good enough.

I took some time this afternoon to look into this. In your example, is the “e” in “ketaabe man” pronounced? If so, none of the grapheme-to-phoneme systems I tried were able to produce the correct pronunciation.

A dictionary approach can help, but I may need to do some more research if this is a common problem.
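A rough sketch of what a per-word dictionary lookup could look like, assuming a lexicon that maps the everyday written form to a romanized pronunciation. The two-entry `LEXICON` and the function `lookup_words` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical two-entry lexicon: written form -> rough romanization.
LEXICON: Dict[str, str] = {
    "\u06a9\u062a\u0627\u0628": "ketaab",  # book
    "\u0645\u0646": "man",                 # I / my
}


def lookup_words(
    sentence: str, lexicon: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each whitespace-separated word with its lexicon entry, or None."""
    return [(word, lexicon.get(word)) for word in sentence.split()]


# The unvocalized phrase from the example above ("my book").
result = lookup_words("\u06a9\u062a\u0627\u0628 \u0645\u0646", LEXICON)
print([pron for _, pron in result])
```

The lookup yields "ketaab" and "man" separately, so the connecting e is still missing. A per-word dictionary fixes the vowels inside a word, but the genitive e depends on the relation between words and would need some context-aware step on top.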

3 Likes

Hi, is there any progress? I’m really looking forward to it.
I’d be glad to help with the progress of Persian TTS.
Cheers!

1 Like

Last I checked, @hkalbasi’s friend had recorded about 400 phrases out of the 2400.

If you’re interested in recording, please let me know :slight_smile:

1 Like

Of course, I’d like to help :slight_smile:
BTW, I’m also looking for offline Persian speech recognition in Python, and it would be awesome if someone could help :pray:

I have about 400 hours of Persian speech data, which would be a good start for training a Kaldi speech recognition model.

Unfortunately, some of the audio is not aligned with the corresponding text. I have an audio book, for example, whose chapters are PDFs. If you would be willing to help get this data split into (sentence audio, text) pairs, I will train the Kaldi model and add it to Rhasspy.
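For the text side of that task, a first step could be splitting each chapter into sentences. The sketch below assumes the chapter text has already been extracted from the PDF into a plain string. The function name `split_sentences` is hypothetical. Matching each sentence to its stretch of audio would need a separate alignment tool and is not shown.

```python
import re
from typing import List

# Whitespace that follows sentence-final punctuation:
# Latin . ! ? plus the Arabic question mark (U+061F).
_SENTENCE_END = re.compile(r"(?<=[.!?\u061f])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences on whitespace following end punctuation."""
    parts = _SENTENCE_END.split(text.strip())
    return [p.strip() for p in parts if p.strip()]


# Three short Persian sentences, written with escapes for portability.
chapter = "\u0633\u0644\u0627\u0645. \u062e\u0648\u0628\u06cc\u061f \u0628\u0644\u0647!"
print(len(split_sentences(chapter)))
```

A punctuation-based split like this will mis-handle abbreviations and decimal numbers, so the output would still need a manual pass.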

1 Like