Hey @williamalex, welcome…
If you are after a serverless solution (i.e. inference done on the device, without connecting to a server) and you are working on low-end devices without a CUDA GPU, like smartphones, you need a small model size and fast, CPU-only inference.
One solution might be Coqui STT, which does exactly that. It is the successor of Mozilla DeepSpeech. It is not as accurate as larger models, and its accuracy largely depends on the language model / vocabulary, but a serverless solution for mobile phones where everything is correctly transcribed in real time does not exist, at least AFAIK…
Here is the repo: https://github.com/coqui-ai/STT
From there you can reach other resources, such as the chat area and the documentation.
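Just to give you an idea of how simple CPU-only inference is, here is a minimal sketch using the Python bindings (`pip install stt`). The model and scorer file names are placeholders — you would download the actual `.tflite` model and scorer from the Coqui releases:

```python
import wave

import numpy as np
from stt import Model

# Placeholder file names -- use the real model/scorer files from the Coqui release page.
model = Model("model.tflite")                          # small TFLite model, runs CPU-only
model.enableExternalScorer("my-vocabulary.scorer")     # optional language model / vocabulary

# 16 kHz, 16-bit mono WAV matching the model's expected sample rate
with wave.open("audio_16k_mono.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # batch transcription; a streaming API also exists for real-time use
```

On Android/iOS you would use the native bindings instead, but the API is essentially the same.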
I’m not that familiar with other models, so there might be other solutions people can suggest.