LibriTTS, phonemizer and punctuation

I'm trying to phonemize LibriTTS subsets, with various hacks to get rid of weird punctuation, but I still get text like:

She said, sadly, 'Yes!


I don’t even know how to read that and the phonemizer barfs:

from phonemizer import phonemize
import phonemizer.separator
phonemize(text="She said, sadly, 'Yes!",
          separator=phonemizer.separator.Separator(' |', '', '|'),
          strip=False, njobs=1, backend='espeak', language='en-us',
          punctuation_marks=';:,.!?¡¿—…"«»“”’',
          preserve_punctuation=True)
Traceback (most recent call last):
  File "", line 1, in
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/", line 172, in phonemize
    text, separator=separator, strip=strip, njobs=njobs)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/backend/", line 126, in phonemize
    text = self._punctuator.restore(text, punctuation_marks)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/", line 146, in restore
    return cls._restore_aux(str2list(text), marks, 0)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
IndexError: list index out of range

Hard for me to guess what the right approach here is. Has anyone already dealt with this in a clever way?

I hadn't realised that phonemizer implemented punctuation handling, and I see it only came in in late January this year:

I'd gone ahead and rolled my own when I tested the impact of keeping punctuation in the phonetic output and training a model on that. My attempts made the model responsive to commas, but at the cost of voice quality.
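For anyone else rolling their own, here is a minimal sketch of the general idea (not the code referred to above): split the text on punctuation, phonemize only the chunks between marks, and put the marks back in order. `phonemize_keep_punctuation` and `phonemize_fn` are hypothetical names; `phonemize_fn` stands in for whatever backend call you use (for instance a thin wrapper around phonemizer with espeak), and the set of kept marks is an assumption.

```python
import re
from typing import Callable

# Illustrative set of marks to keep in the phonetic output (an assumption).
# The capturing group makes re.split return the marks as separate items.
_PUNCT_SPLIT = re.compile("([;:,.!?()\u2014\u2026]+)")


def phonemize_keep_punctuation(text: str, phonemize_fn: Callable[[str], str]) -> str:
    """Phonemize the text between punctuation marks and re-insert the marks.

    phonemize_fn is a hypothetical stand-in for the real backend call.
    """
    pieces = []
    for part in _PUNCT_SPLIT.split(text):
        if not part:
            continue
        if _PUNCT_SPLIT.fullmatch(part):
            pieces.append(part)                       # keep the marks verbatim
        elif part.strip():
            pieces.append(phonemize_fn(part.strip()))  # phonemize word chunks only
    return " ".join(pieces)
```

With a dummy backend such as `str.upper`, `phonemize_keep_punctuation("She said, sadly, Yes!", str.upper)` returns `SHE SAID , SADLY , YES !`. Phonemizing chunk by chunk may give the backend slightly less context than phonemizing the whole sentence, which is one trade-off of this approach.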

There’s a little detail here:

I've yet to try Eren's suggestion of mapping several punctuation characters to one symbol, in case that is less disruptive to voice quality. My intuition is that dashes, commas, brackets, etc. all tend to cause a brief pause, so a shared symbol could still provide a useful prosody signal, even though we lose the specific meaning of the punctuation used.
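Eren's suggestion could be tried with a plain character translation before phonemizing. This sketch maps a few pause-like marks (commas, semicolons, colons, en and em dashes, brackets) onto one shared symbol; which marks to merge, the choice of `,` as the shared symbol, and the helper name `collapse_pause_punctuation` are all illustrative assumptions.

```python
# Hypothetical mapping: every "pause-like" mark becomes one shared symbol.
# The set of marks and the target symbol are illustrative assumptions.
# Plain hyphens are excluded because they usually join words ("well-known").
PAUSE_MARKS = ",;:\u2013\u2014()[]"
PAUSE_SYMBOL = ","

_PAUSE_TABLE = str.maketrans({mark: PAUSE_SYMBOL for mark in PAUSE_MARKS})


def collapse_pause_punctuation(text: str) -> str:
    """Replace each pause-like mark with PAUSE_SYMBOL, leaving other text as-is."""
    return text.translate(_PAUSE_TABLE)


print(collapse_pause_punctuation("Well (I think) it\u2014maybe; not."))
```

This prints `Well ,I think, it,maybe, not.`. Sentence-final marks such as `.`, `!` and `?` are left alone, so they could keep their own symbols if they carry a different prosodic signal.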

One other factor that may make this harder to tackle, although not insurmountable, is finding a dataset with plentiful and consistent punctuation (there are lots of cases where the text implies one should slow down but there's no comma, and some texts use commas more or less liberally than others).
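One rough way to compare candidate datasets is to measure how densely, and how evenly, their transcripts are punctuated. This sketch computes punctuation marks per word for each transcript and reports the mean and spread; the mark set, the function names, and the idea that a lower spread hints at more consistent usage are assumptions, and it says nothing about whether the commas fall where a reader would actually pause.

```python
import re
from statistics import mean, pstdev

# Illustrative set of marks to count; adjust to the dataset's conventions.
PUNCT_RE = re.compile("[,;:.!?()\u2013\u2014]")


def punctuation_per_word(transcript: str) -> float:
    """Number of counted punctuation marks divided by the number of words."""
    words = transcript.split()
    if not words:
        return 0.0
    return len(PUNCT_RE.findall(transcript)) / len(words)


def punctuation_stats(transcripts):
    """Mean and population standard deviation of punctuation density."""
    densities = [punctuation_per_word(t) for t in transcripts]
    return mean(densities), pstdev(densities)
```

For example, `punctuation_per_word("She said, sadly, 'Yes!")` counts two commas and one `!` over four words, giving 0.75.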

Thanks for the pointer. I've already made the necessary changes for this update. TTS previously handled punctuation with its own (somewhat weird) trick, so this phonemizer update should not change TTS results.

My hacky solution was to replace the matches of:

import re
re.compile("(?<=[:;.,!?()’]) (?=[:;.,!?()’])")

with an empty string (I had forgotten that you can use lookarounds in regex, smh).
It seems to phonemize LibriTTS OK, though I'm not sure how much gunk I've introduced. Just putting it here in case anyone else is in the middle of dealing with this.
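To see what that pattern does in isolation: it only matches a single space that has one of the listed marks directly before it (lookbehind) and directly after it (lookahead), so runs like `. . .` get glued together while spaces next to words are left alone. A small check, where `collapse_punct_spaces` is just an illustrative name:

```python
import re

# The pattern from the post: a space with a listed mark immediately before
# it (lookbehind) and immediately after it (lookahead).
PUNCT_SPACE = re.compile("(?<=[:;.,!?()’]) (?=[:;.,!?()’])")


def collapse_punct_spaces(text: str) -> str:
    """Delete spaces that sit between two punctuation marks."""
    return PUNCT_SPACE.sub("", text)


print(collapse_punct_spaces("Wait . . . what ?"))  # Wait ... what ?
print(collapse_punct_spaces("(yes) ."))            # (yes).
```

Note that the character class contains the curly `’` but not the straight `'`, so whether it touches quotes depends on which apostrophe the transcripts actually use.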