While reviewing German sentences submitted by other contributors, I found a number of sentences whose authors had circumvented the acronym detection by inserting dashes (BKA → B-K-A). I don't think this is how we should deal with abbreviations. The How-To should be more explicit about this issue.
I must say, though, that I don't like the strict prohibition of acronyms. In many cases (such as BKA) there really is only one way to pronounce them. Maybe we should have a peer-reviewed whitelist of acronyms?
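A whitelist check could look roughly like the Python sketch below. The whitelist contents, the regular expressions and the function name `find_problems` are all hypothetical and are not taken from the actual acronym detection. It allows plain acronyms that are on the list and always flags dash-spelled forms such as "B-K-A", since those are the workaround described above.

```python
import re

# Hypothetical peer-reviewed whitelist: acronyms with one established pronunciation.
ACRONYM_WHITELIST = {"BKA"}

# Two or more consecutive uppercase letters (German umlauts included), e.g. "BKA".
ACRONYM_RE = re.compile(r"\b[A-ZÄÖÜ]{2,}\b")
# Single uppercase letters joined by dashes, e.g. "B-K-A".
DASHED_RE = re.compile(r"\b[A-ZÄÖÜ](?:-[A-ZÄÖÜ])+\b")


def find_problems(sentence: str) -> list[str]:
    """Return acronyms in `sentence` that should be rejected."""
    problems = []
    # Plain acronyms are only allowed if they are on the whitelist.
    for match in ACRONYM_RE.finditer(sentence):
        if match.group() not in ACRONYM_WHITELIST:
            problems.append(match.group())
    # Dash-spelled acronyms are always flagged, because they only exist
    # to get around the acronym check.
    for match in DASHED_RE.finditer(sentence):
        problems.append(match.group())
    return problems
```

With this sketch, "Das BKA ermittelt." would pass, while "Das B-K-A ermittelt." and "Die NASA startet." would each be flagged for review.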
@nukeador Is the acronym prohibition really necessary? The engine being developed is speech-to-text, so multiple pronunciations can be mapped to a single acronym without confusing the engine, right? For example:
eXeMeL → XML
iXeMeL → XML
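To make the many-to-one point concrete, here is a minimal Python sketch. The clip names and the (clip, pronunciation, transcript) layout are hypothetical; the point is that both pronunciations share the same target text, and the transcript is the only thing a speech-to-text model is trained to output.

```python
from collections import defaultdict

# Hypothetical recordings: (audio clip, pronunciation heard, transcript label).
# Only the transcript is used as the training target in speech-to-text.
recordings = [
    ("clip_001.mp3", "eXeMeL", "XML"),
    ("clip_002.mp3", "iXeMeL", "XML"),
]


def pronunciations_per_label(rows):
    """Group the different pronunciations that share one transcript label."""
    groups = defaultdict(set)
    for _clip, pronunciation, label in rows:
        groups[label].add(pronunciation)
    return dict(groups)


print(pronunciations_per_label(recordings))
```

Both variants end up under the single label "XML", which is why differing pronunciations need not be a problem for STT, although they would be for a TTS engine that has to choose one.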
We have a lot of recorded lectures that make extensive use of acronyms. How will the engine be able to recognize acronyms properly if it is never trained on them?
I think this is a good point, @rillke, as long as the dataset is used only for STT. Is anybody planning to build a TTS engine from it as well? If so, I'd still vote for the whitelist solution.
nukeador
(Rubén Martín [❌ taking a break from Mozilla])
I'll defer to @kdavis or @josh_meyer to answer this one, but my understanding is that the engine should know how individual letters are pronounced.
It's not as simple as "spelling them out": for example, "NASA" is read as a word, while "CIA" is spelled letter by letter. Furthermore, different people may say the same acronym differently, making the transcripts even less reliable.