Does DeepSpeech use any phonetic sublayer while transforming speech to text? Is there a way to see that intermediate representation, i.e., the output before the final text, rather than only the result after applying DeepSpeech to speech?
If not, is there maybe some other neural network for this goal, i.e., converting speech to phonetics?
With DeepSpeech this doesn't work with the normal models. The lowest processing level you can access is the output without a scorer (= the result of the acoustic model alone), and the acoustic model already maps directly to orthography (characters), not phonemes. A special acoustic model trained on phoneme targets would be necessary (and it would be incompatible with the DeepSpeech scorer). As far as I know such a model doesn't exist - if I'm wrong -> I'd be interested too.
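For reference, you can inspect that lowest level by simply not loading a scorer; the transcript then comes straight from the acoustic model's character probabilities. A minimal sketch with the DeepSpeech Python API (the model and audio file names are placeholders, and it assumes a 16 kHz mono 16-bit WAV, the format the released models expect):

```python
import wave

import numpy as np
import deepspeech

# Load only the acoustic model; skipping enableExternalScorer()
# means no language-model rescoring is applied during decoding.
model = deepspeech.Model('deepspeech-0.9.3-models.pbmm')

# Read a 16 kHz mono 16-bit PCM WAV into an int16 buffer.
with wave.open('audio.wav', 'rb') as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# Decoding over the raw acoustic-model output - note the labels
# are still orthographic characters, not phonemes.
print(model.stt(audio))
```

Even at this level the output alphabet is letters, which illustrates the point above: there is no phoneme layer to expose.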