The ESPnet project uses the g2p library for phoneme translation (available from pip as g2p_en, or from https://github.com/Kyubyong/g2p).
I did a quick test and g2p appears to be more accurate than phonemizer for my favorite semi-ambiguous sentences.
For example:
txt = "Who's read the book."
phonemize(txt, backend='espeak')
Out[56]: 'huːz ɹiːd ðə bʊk '
" ".join(g2p(txt))
Out[66]: 'HH UW1 EH1 S R EH1 D DH AH0 B UH1 K .'
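For anyone who wants to rerun this comparison as a script, here is a rough sketch. It assumes phonemizer is installed with a working espeak backend and that Kyubyong's package is importable as g2p_en. The helper names load_converters and compare are made up for illustration; neither library provides them.

```python
def load_converters():
    """Return {name: callable} for whichever of the two libraries import cleanly."""
    converters = {}
    try:
        from phonemizer import phonemize  # assumption: espeak is installed too
        converters["phonemizer"] = lambda text: phonemize(text, backend="espeak")
    except ImportError:
        pass
    try:
        from g2p_en import G2p  # assumption: pip package name is g2p_en
        g2p = G2p()
        # G2p returns a list of phoneme tokens; join them for easy comparison
        converters["g2p"] = lambda text: " ".join(g2p(text))
    except ImportError:
        pass
    return converters


def compare(text, converters):
    """Run every converter on the same text and collect the outputs by name."""
    return {name: convert(text) for name, convert in converters.items()}


if __name__ == "__main__":
    for name, output in compare("Who's read the book.", load_converters()).items():
        print(f"{name}: {output!r}")
```

Outputs may differ from the ones quoted above depending on library versions and the espeak build.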
Also, it's nearly 7x faster for a paragraph-long text (195 ms vs. 28.3 ms per call):
%timeit phonemize(txt, backend='espeak')
195 ms ± 7.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit g2p(txt)
28.3 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
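Outside IPython, a similar measurement can be sketched with the standard timeit module. The helpers time_call and speedup are hypothetical names. Note that this version reports the best run, whereas %timeit reports mean and standard deviation, so the numbers will not match exactly.

```python
import timeit


def time_call(func, text, repeat=7, number=10):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(text))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


# Ratio of the two figures reported above (195 ms vs. 28.3 ms per call)
print(f"{speedup(195e-3, 28.3e-3):.1f}x")  # prints "6.9x"
```

To time the real libraries, pass the callables in place of the stand-ins, for example time_call(lambda t: phonemize(t, backend='espeak'), txt) and time_call(g2p, txt).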