Labeling with phonemes rather than letters

(Nikita) #1

I’ve searched through this forum but haven’t found any documentation on why Deep Speech uses letters as labels rather than phonemes. Is there any practical reason to do so?
My team at the university and I are building our own spoken corpora, and we wonder whether there would be any gain from using phonemes instead of letters. Would it require a lot of pre/post-processing, or can Deep Speech be easily adapted to do so? Thank you in advance.

(kdavis) #2

You could use phonemes, but then you’d always have to have a phonetic spelling for whatever language you train on. This would make the STT hurdle for something like Hakha Chin higher.

(Nikita) #3

Thank you for the response. I don’t quite understand, though: isn’t using phoneme labels just a matter of changing alphabet.txt?

(Nikita) #4

I understand that using a phoneme-based alphabet would require a lot of labor from professional phoneticians, but how large do you think the gain would be? Phonemes better reflect the sounds people utter (the ‘o’ in ‘cow’ and the ‘o’ in ‘dog’ are very different sounds), and wouldn’t they let us handle out-of-vocabulary words better? I’ve searched the web but haven’t found any substantial discussion of this.

(kdavis) #5

Yes, changing alphabet.txt would work, assuming the data you train on also has phonetic spellings rather than ordinary transcripts.
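
A minimal sketch of what "phonetic spellings instead of transcripts" could look like in practice. It assumes that alphabet.txt lists one single-character label per line, which is worth checking against your Deep Speech version. The toy lexicon, the rough IPA-like symbols, the single-character stand-ins, and all function names are hypothetical and only for illustration; a real lexicon would have to come from a pronunciation dictionary or from phoneticians.

```python
# Toy pronunciation lexicon (hypothetical). Note that the letter "o"
# maps to different phonemes in "cow" and "dog".
LEXICON = {
    "cow": ["k", "aʊ"],
    "dog": ["d", "ɒ", "g"],
}


def build_label_map(lexicon):
    """Give every phoneme a single-character label.

    Assumption: each alphabet entry must be one character, so phonemes
    written with several characters get a stand-in from a spare pool.
    The pool must not collide with single-character phonemes (true for
    this toy lexicon, not guaranteed in general).
    """
    phonemes = sorted({p for pron in lexicon.values() for p in pron})
    spare = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    return {p: (p if len(p) == 1 else next(spare)) for p in phonemes}


def to_phonetic(transcript, lexicon, label_map):
    """Rewrite an orthographic transcript as a string of phoneme labels."""
    words = transcript.lower().split()
    # A KeyError here means the word is missing from the lexicon.
    return " ".join(
        "".join(label_map[p] for p in lexicon[w]) for w in words
    )


def alphabet_lines(label_map):
    """Lines for a phoneme-based alphabet file: space first, then labels."""
    return [" "] + sorted(label_map.values())


LABELS = build_label_map(LEXICON)
PHONETIC = to_phonetic("cow dog", LEXICON, LABELS)
ALPHABET = alphabet_lines(LABELS)
```

The training transcripts would then hold strings like `PHONETIC` instead of the original text, and `ALPHABET` would be written out one entry per line. Every word in the corpus needs a lexicon entry, which is where the extra labeling effort comes from.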

(Nikita) #6

Thanks. What do you think about the question above (the potential benefit of a phonetic alphabet)?

(kdavis) #7

I’d guess phonemes would work better, but we’re not choosing to use them for now.