Hey @williamalex, welcome…
If you are after a serverless solution (i.e. inference done on the device, without connecting to a server) and you are working on low-end devices without a CUDA GPU, like smartphones, you need a small model size and fast, CPU-only inference.
One solution might be Coqui STT, which does exactly that. It is the successor of Mozilla DeepSpeech. It is not as accurate as larger models, and its accuracy largely depends on the language model / vocabulary, but a serverless solution for mobile phones where everything is correctly transcribed in real time does not exist, at least AFAIK…
Here is the repo: https://github.com/coqui-ai/STT
From there you can reach other resources, such as the chat area and the documentation.
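Just to give you an idea of how simple CPU-only inference is, here is a minimal sketch using the Python bindings (`pip install stt`). The model and scorer file names are placeholders — you would download the actual `.tflite` model and scorer from the Coqui releases:

```python
import wave

import numpy as np
from stt import Model

# Placeholder file names -- use the real model/scorer files from the Coqui release page.
model = Model("model.tflite")                          # small TFLite model, runs CPU-only
model.enableExternalScorer("my-vocabulary.scorer")     # optional language model / vocabulary

# 16 kHz, 16-bit mono WAV matching the model's expected sample rate
with wave.open("audio_16k_mono.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # batch transcription; a streaming API also exists for real-time use
```

On Android/iOS you would use the native bindings instead, but the API is essentially the same.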
I’m not that familiar with other models, so there might be other solutions people can suggest.