If you are after a serverless solution (i.e. inference done on the device, without connecting to a server) and you are targeting low-end devices without a CUDA GPU, like smartphones, you need a small model and fast, CPU-only inference.
One solution might be Coqui STT, which does exactly that. It is the successor of Mozilla DeepSpeech. It is not as accurate as larger models, and its accuracy largely depends on the language model / vocabulary, but a serverless solution for mobile phones where everything gets transcribed correctly in real time does not exist, at least AFAIK…
Here is the repo: https://github.com/coqui-ai/STT
From there you can reach other resources, such as the chat area and the documentation.
I’m not very familiar with other models, so there might be other solutions people can suggest.
Thank you so much! I saw the DeepSpeech examples, but most of them are written in Python, and I know nothing about Python ): If you know anything about connecting this model (https://huggingface.co/Akashpb13/Central_kurdish_xlsr) to an Android app, please let me know.
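You don’t need Python on the phone: the STT repo ships Java/Android bindings, and inference from an app is only a few calls. A rough sketch below, with two big caveats: the class and method names are my assumptions based on the repo’s Android example (verify against the current docs), and the Hugging Face model you linked is a wav2vec2/XLSR checkpoint, which Coqui STT cannot load directly — you would need a Coqui-format `.tflite` acoustic model for Central Kurdish instead.

```java
// Sketch: transcribing a 16 kHz, 16-bit mono PCM buffer with the
// Coqui STT Android bindings. Package/class/method names are assumed
// from the repo's Android example (android_mic_streaming) and may
// differ in your version -- check the docs.
import ai.coqui.libstt.STTModel;

public class Transcriber {
    public static String transcribe(String modelPath, short[] pcm16k) {
        // modelPath points to a Coqui-format .tflite acoustic model
        // (NOT the HF wav2vec2 checkpoint linked above).
        STTModel model = new STTModel(modelPath);
        // Optional: an external scorer (language model) usually helps
        // accuracy a lot:
        // model.enableExternalScorer("/path/to/your.scorer");
        String text = model.stt(pcm16k, pcm16k.length);
        model.freeModel(); // release the native resources
        return text;
    }
}
```

In a real app you would fill `pcm16k` from `AudioRecord` (or decode a WAV file) and call this off the UI thread, since inference can take a while on low-end hardware.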
We already have a considerable amount of data and many skilled programmers; I do not understand why no one has made a decent offline voice-recognition application for Android, Windows, and Linux. It doesn’t make any sense.
@mary, it might be for security reasons? If you had that many apps able to post recordings through a public API, it would not be controllable.
Actually, CV has a working app for all operating systems: it is called the “browser”, although it does not work offline.
AFAIK, CV is also aware of the problems faced by people with low connection quality. Last year they ran a competition to convert the current codebase into a PWA (progressive web application), which by definition can work offline and would also handle low-bandwidth scenarios. But nobody stepped up for that implementation; it was a huge undertaking. You already know the existing Android app, though, right?