I have recently started working with Mozilla DeepSpeech, and I was very pleased with its performance. However, the pre-trained model shared by the folks at Mozilla has been a bit of a mixed bag: it works pretty well with crystal-clear speech from professional speakers, but anything slightly different from that (e.g. conference recordings, 1:1 classes) produces a host of spurious words, often not even “real” ones (brands, names, rare loanwords from other languages).
I would be very grateful if anyone had a more accurate trained model they were willing to share (ideally under 50 GB). I was struck by the TED-LIUM corpus in particular, but any submission will be greatly appreciated! Thanks!
They are working on better models, e.g. through augmentation and more, better data. I don’t think you’ll find anybody sharing a better model for free for now.
Depending on your use case, try playing with the language model for better transcriptions. And since it looks like you have more real-life data, consider contributing it for future training runs.
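To make "playing with the language model" a bit more concrete, here is a rough sketch of one way to do it: sweep the two scorer weights (usually called alpha and beta) over a small grid and keep the pair with the lowest word error rate on a handful of clips you have transcribed by hand. The helper names (`word_error_rate`, `tune_scorer`, `make_deepspeech_transcriber`) are made up for illustration. The last function assumes the Python bindings of DeepSpeech 0.7 or later, where `Model.enableExternalScorer`, `Model.setScorerAlphaBeta` and `Model.stt` are available, and 16 kHz mono 16-bit audio passed as a NumPy int16 array; check this against the version you have installed.

```python
from itertools import product


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def tune_scorer(transcribe, clips, alphas, betas):
    """Grid search over (alpha, beta); returns (mean_wer, alpha, beta).

    transcribe(audio, alpha, beta) must return a string.
    clips is a list of (audio, reference_transcript) pairs.
    """
    if not clips:
        raise ValueError("need at least one (audio, reference) pair")
    best = None
    for alpha, beta in product(alphas, betas):
        mean_wer = sum(word_error_rate(ref, transcribe(audio, alpha, beta))
                       for audio, ref in clips) / len(clips)
        if best is None or mean_wer < best[0]:
            best = (mean_wer, alpha, beta)
    return best


def make_deepspeech_transcriber(model_path, scorer_path):
    """Assumption: DeepSpeech >= 0.7 Python package is installed."""
    from deepspeech import Model  # imported lazily so the rest runs without it

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    def transcribe(audio, alpha, beta):
        model.setScorerAlphaBeta(alpha, beta)
        return model.stt(audio)

    return transcribe
```

Usage would be something like `tune_scorer(make_deepspeech_transcriber(model_path, scorer_path), clips, [0.5, 0.75, 1.0], [1.0, 1.5, 2.0])`, with the paths and grid values as placeholders for your own. This only runs inference, so it should not need a GPU, though it will be slow on long recordings.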
Thank you very much for pitching in!
Unfortunately, I do not have a GPU powerful enough to train a model myself, which is why I was looking for a pre-trained model that performs better in real-life applications.
My use case is transcribing university recordings, and I am keen to find a suitable free and open source alternative that would reduce my reliance on online services (e.g. otter.ai and similar ones).
lissyx
Are you using specific software like eSup Pod’s or OpenCast?
I should have mentioned that I am a university student.
I would usually record classes (the recording part has since been superseded, as classes have moved online in my country: many are pre-recorded, and the live-streamed ones are easily captured) and use AI transcription tools to help me prepare revision notes based on what was taught in class.
lissyx
Still, you might have been working as an intern etc.
I was asking because people working on those platforms are looking into DeepSpeech (you can see their questions here and here on Discourse) for this exact need.
Thank you @lissyx, I will have a look at them.
If you have a specific relevant conversation in mind that you could point me to, I would be grateful!