Persian/Farsi TTS

i3130002 · November 13, 2020, 11:14am

I wanted to run Mozilla TTS on Persian text but was unable to do so. Getting it working seems to require reading almost everything about it and digging deep into the code. I’m posting here so that if someone can help, for example by linking a Google Colab notebook, others can benefit too.
Please tell me where I can start in order to make Mozilla TTS work on Persian. (I believe there is enough data in Mozilla Common Voice, and there should be some audiobooks out there too.)

sanjaesc · November 13, 2020, 11:18am

This reads like you are asking others to do the work for you.

i3130002 · November 13, 2020, 11:40am

You are right, and I don’t want people to do the work for me. All I’m asking for is hints and whatever knowledge people have. Someone who knows a lot could guide me and others on how to approach this problem and point to any existing materials or code.

othiele · November 13, 2020, 12:22pm

@sanjaesc is right, you will have to put a lot of time into it as this is not some ready to use software. If you want to do the work, this repo by @mrthorstenm has everything you need to train a German model.

synesthesiam · November 13, 2020, 10:14pm

Hi @i3130002, I may be able to help

I work on Rhasspy, a free & open source voice assistant that works offline. We’re always looking to add new languages, both for speech recognition and text to speech.

I have a fork of MozillaTTS that I’ve trained 6 voices with so far. If you’d like to collaborate, let me know.

We may be able to use existing speech data if there’s enough and it’s good quality. Another option (if you have a good microphone) is to have you or another native speaker record a set of phrases that are phonetically “rich”. I’ve built a tool to help find these phrases, though I’d need to add Persian/Farsi sounds.

i3130002 · November 14, 2020, 8:54am

That’s awesome, I emailed you so that we can start cooperating.

khalilrhouma · November 16, 2020, 10:10am

@othiele How could this repo help to see the modifications that should be made to train TTS on another language?

othiele · November 16, 2020, 10:23am

Usually one learns by copying what others are doing and then adapting it to one’s own needs. So I suggested you copy how we did it and then change things for Persian. If you already did that, what exactly is your question? And as @sanjaesc said, please don’t ask us to do it all for you. Btw, @synesthesiam offered his excellent repo as well. Same there: study it and then ask detailed questions.

synesthesiam · November 19, 2020, 1:06am

Thank you, @othiele. If you do come across people who aren’t able to do it themselves for under-served languages, please send them my way. I’m willing to do a lot of the work as long as I have a native speaker to consult with

othiele · November 19, 2020, 9:38am

@synesthesiam Great to hear that and thanks for all the work you put into new models. Will send people your way

erogol · November 20, 2020, 12:34pm

Are your models open? If so, would you mind sharing them so they can be linked on the Released Models wiki page (https://github.com/mozilla/TTS/wiki/Released-Models)?

hkalbasi · December 16, 2020, 8:03am

I’m interested in Farsi TTS. What is the status of your work, @i3130002 @synesthesiam? Do you need help?

i3130002 · December 16, 2020, 8:40am

I haven’t done anything and wasn’t able to communicate much with him. I had to put this project off because of some personal problems. Still, I imagine that using his GitHub you should be able to start working on Farsi/Persian, and if I can do something, just email me (on Gmail).
I wish you luck.

synesthesiam · December 16, 2020, 2:57pm

Sorry I haven’t been very responsive, @i3130002. I should have some more time now during the holidays.

I’ve made some progress, but I need help now (see below)

I was able to find some Farsi speech data: I contacted the author of the MirasVoice corpus and got the full set. Unfortunately, it doesn’t have enough data from a single speaker. I might be able to use it in the future for Farsi speech to text in Rhasspy, but it’ll need a lot of pre-processing.

So I will need to collect recordings from a volunteer. But first, I need to develop a set of sentences that have good phoneme coverage. I’ve already added Farsi phonemes to my gruut-ipa library, and I’ve located a large corpus of sentences in OSCAR. But here’s where I’m stuck: numbers.

I use the num2words library to convert digits into words (1 -> one), and it doesn’t support Farsi yet. Would either of you (@i3130002 or @hkalbasi) be able to help me add support?
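
As a rough illustration of the digits-to-words step described above, here is a minimal sketch. It is not the code used in the project; `replace_numbers` and its `to_words` callback are hypothetical names, and the callback stands in for whatever converter is available.

```python
import re


def replace_numbers(text, to_words):
    """Replace every run of decimal digits in text with to_words(int_value).

    to_words is a caller-supplied function (hypothetical here) that maps an
    integer to its spoken form.
    """
    # In Python 3, \d matches ASCII, Persian, and Arabic-Indic digits,
    # and int() parses all of them.
    return re.sub(r"\d+", lambda m: to_words(int(m.group())), text)


# Stand-in converter for demonstration only; a real one would return Farsi words.
print(replace_numbers("page 12", lambda n: "<%d>" % n))
```

With num2words, the callback could be something like `lambda n: num2words(n, lang="fa")`, assuming Farsi support is available under that language code.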

Once I can convert numbers to words, I can filter the OSCAR Farsi sentences and find a small set of sentences (usually < 2000) that will provide good phoneme pair examples. After that, we’ll need to find a volunteer with a good microphone and a lot of patience. With this dataset, I’ll be happy to train models for both MozillaTTS and my Larynx fork.
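
One common way to pick a small set of sentences with good phoneme pair coverage is a greedy selection. The sketch below is only an assumption about how such a filter could work, not the actual tool mentioned in this thread; `phoneme_pairs` and `select_sentences` are hypothetical names, and the input is assumed to be already phonemized.

```python
def phoneme_pairs(phonemes):
    """Return the set of adjacent phoneme pairs in one sentence."""
    return set(zip(phonemes, phonemes[1:]))


def select_sentences(phonemized, max_sentences):
    """Greedily pick sentences that add the most unseen phoneme pairs.

    phonemized: dict mapping sentence text -> list of phoneme strings.
    Returns (chosen sentences in pick order, set of covered pairs).
    """
    remaining = {s: phoneme_pairs(p) for s, p in phonemized.items()}
    covered, chosen = set(), []
    while remaining and len(chosen) < max_sentences:
        # Pick the sentence contributing the most pairs not yet covered.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # nothing left adds new coverage
        covered |= gain
        chosen.append(best)
    return chosen, covered
```

Stopping as soon as no sentence adds a new pair keeps the set small; a real pipeline would probably also weight rare pairs and limit sentence length.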

EDIT: Forgot to add one more step: filtering sentences. I usually start with a set of 2000-5000 phonetically rich sentences, and then ask volunteers to help filter out ones that don’t make sense, are offensive somehow, or are something that a real native speaker would never say. This can be done by multiple people in parallel at least, but it’s an important step

hkalbasi · December 16, 2020, 6:59pm

I just made a pull request for adding Farsi in num2words library.

Why don’t we use the Common Voice sentences? There are nearly 7000 sentences in Farsi, which have been reviewed and do not contain numbers.

My friend has volunteered to record the dataset, but I don’t think we have a good microphone. Is an iPhone’s microphone considered good? Can we fix this in software? Please message me in an instant messaging app for more details, so I can send you samples of my friend’s voice. (Matrix: @hkalbasi:mozilla.org, Telegram: @hkalbasi)

And another concern: in Farsi, vowel marks like e, a, o are optional in writing. For example, کِتاب, which means book, is ketaab, and کَتاب, which is an invalid word, is kataab (notice the small mark moves from below to above; I hope your browser shows that, but I can also send an image), and everyone writes کتاب without any a or e. This may be possible to handle with a dictionary. But the problem becomes more difficult in the genitive case, which is marked in Farsi by e. For example, “my book” in Farsi is کِتابِ مَن = ketaabe man. But everyone writes it کتاب من, and espeak reads it ketaab man, which is wrong. How can we handle that so our machine can read simple texts without ـَ ـِ ـُ explicitly written?

synesthesiam · December 17, 2020, 12:48am

(my reply got lost for some reason, so I’m re-typing it)

@hkalbasi, I’ve pulled your changes into my num2words fork. Thank you for such a quick response!

Using the Common Voice sentences is a great idea. I looked at the (now) 9000 validated Farsi sentences, and ran them through my phoneme coverage analysis. It looks like we could get excellent coverage even with half (~4000), but the more data the better.

How many sentences is your volunteer willing to read? I contacted you on Matrix; I’ll have to listen to some samples from the iPhone to see if it will be good enough.

I took some time this afternoon to look into the unwritten vowel problem. In your example, is the “e” in “ketaabe man” pronounced? If so, none of the grapheme-to-phoneme systems I tried were able to produce the correct pronunciation.

A dictionary approach can help, but I may need to do some more research if this is a common problem.
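
To make the dictionary idea concrete, here is a minimal sketch under stated assumptions: the two-entry `LEXICON` is hypothetical and uses the informal romanization from this thread, and `strip_diacritics` and `pronounce` are illustrative names, not part of any existing library. Words are written with Unicode escapes to avoid display issues.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (the optional vowel signs) from text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Hypothetical lexicon keyed by the spelling without vowel marks.
LEXICON = {
    "\u06a9\u062a\u0627\u0628": "ketaab",  # "book"
    "\u0645\u0646": "man",                  # "I / my"
}


def pronounce(text, lexicon=LEXICON):
    """Look up each whitespace-separated word; None marks unknown words."""
    return [lexicon.get(strip_diacritics(word)) for word in text.split()]
```

For the unvoweled phrase from the example above, this lookup gives `["ketaab", "man"]`, so it still lacks the genitive “e”. A per-word dictionary cannot add it on its own, because that depends on how the words relate to each other in the sentence.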

Muhammad_Mirab_Br · February 15, 2021, 11:32am

Hi, is there any progress? I’m really looking forward to it.
I’d be glad to help with the progress of Persian TTS.
Cheers!

synesthesiam · February 18, 2021, 3:18pm

Last I checked, @hkalbasi’s friend had recorded about 400 phrases out of the 2400.

If you’re interested in recording, please let me know

Muhammad_Mirab_Br · February 19, 2021, 11:33am

Of course, I’d like to help
BTW, I’m also looking for offline Persian speech recognition in Python, and it would be awesome if someone could help.

synesthesiam · February 19, 2021, 2:52pm

I have about 400 hours of Persian speech data, which would be a good start for training a Kaldi speech recognition model.

Unfortunately, some of the audio is not aligned with the corresponding text. I have an audio book, for example, whose chapters are PDFs. If you would be willing to help get this data split into (sentence, text) pairs, I will train the Kaldi model and add it to Rhasspy.
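
A small first step toward such pairs is splitting the extracted chapter text into sentences. The sketch below is a naive assumption about how that could be done; `split_sentences` is a hypothetical helper, and aligning the audio to each sentence is a separate step that is not shown.

```python
import re

# Sentence-final punctuation: ASCII . ! ? plus the Arabic/Persian
# question mark (U+061F). This rule is a simplification.
_SENTENCE_END = re.compile(r"(?<=[.!?\u061f])\s+")


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
```

Abbreviations and decimal numbers would break this simple rule, so the output would still need a manual check.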
