I'm trying to phonemize LibriTTS subsets, with various hacks to get rid of weird punctuation, but I still have lines like:
She said, sadly, 'Yes!
(3482_170453_000032_000003.original.txt)
I don’t even know how to read that and the phonemizer barfs:
phonemize(text="She said, sadly, 'Yes!", separator=phonemizer.separator.Separator(' |', '', '|'), strip=False, njobs=1, backend='espeak', language='en-us', punctuation_marks=';:,.!?¡¿—…"«»“”’', preserve_punctuation=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/phonemize.py", line 172, in phonemize
    text, separator=separator, strip=strip, njobs=njobs)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/backend/base.py", line 126, in phonemize
    text = self._punctuator.restore(text, punctuation_marks)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/punctuation.py", line 146, in restore
    return cls._restore_aux(str2list(text), marks, 0)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
  File "/home/users/myadav/.virtualenvs/sri_tts/lib/python3.6/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
IndexError: list index out of range
Hard for me to guess what the right approach here is. Has anyone already dealt with this in a clever way?
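For anyone who wants to experiment, here is a minimal sketch of one possible pre-cleaning step: normalise typographic quotes, then drop any quote character that is not a word-internal apostrophe, so that a dangling opener like the one in 'Yes! never reaches the phonemizer. The helper name clean_quotes is hypothetical (it is not part of phonemizer), and whether this actually avoids the IndexError above is an assumption, not something confirmed.

```python
import re

# Map typographic quotes to their ASCII equivalents.
_QUOTE_MAP = str.maketrans({"‘": "'", "’": "'", "“": '"', "”": '"'})


def clean_quotes(text: str) -> str:
    """Hypothetical helper: strip quotation marks but keep apostrophes.

    Assumption: quotation marks carry no information the phonemes need,
    so removing them is acceptable for TTS preprocessing.
    """
    text = text.translate(_QUOTE_MAP)
    # Remove single quotes unless they sit between two letters (don't, it's).
    # Note: this also drops trailing possessive apostrophes (dogs' -> dogs).
    text = re.sub(r"(?<![A-Za-z])'|'(?![A-Za-z])", "", text)
    # Remove double quotes outright.
    text = text.replace('"', "")
    # Collapse any whitespace runs left behind.
    return re.sub(r"\s+", " ", text).strip()


print(clean_quotes("She said, sadly, 'Yes!"))  # She said, sadly, Yes!
```

The cleaned string could then be passed as text= to the same phonemize call. The trade-off is that quote information is lost from the output entirely, which may or may not matter for a given TTS setup.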