Offline speech recognition on mobile

Can DeepSpeech let me implement local, offline speech recognition on mobile?


Right now, you could do it on a high-end phone, but it would be slow. We haven’t yet created models optimized for inference on mobile devices, but that is on the roadmap.


I was just wondering how to use Mozilla DeepSpeech on Android instead of the Google Voice service. I guess it’s not possible yet? What’s the roadmap, roughly? And how can someone with little coding experience help?
Cheers :)

Any update on getting DeepSpeech onto an iOS device?

There’s been progress, in that the model is actually convertible to CoreML now: https://github.com/mozilla/DeepSpeech/issues/642 and https://github.com/tf-coreml/tf-coreml/issues/309

Next steps would be:

  1. Adding a class that implements the ModelState API using CoreML, similar to how we currently have TFModelState and TFLiteModelState implementations.
  2. Figuring out how to compute features, as I’ve had to remove the feature computation sub-graph to get the CoreML conversion to finish. I don’t think the AudioSpectrogram/MFCC ops are supported in CoreML. I’d start by simply vendoring TensorFlow’s kernels and building those into libdeepspeech.so. We could even use this work in all model types, to reduce overhead.
  3. Figuring out packaging for iOS. Nobody on our team has iOS experience so I don’t have any suggestions for this. Basically, make it possible to build a DeepSpeech package with the format used for iOS dependency management.
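
To make step 2 more concrete, here is a rough pure-Python sketch of the kind of MFCC feature computation that would have to live outside the converted graph. It is not a port of TensorFlow’s AudioSpectrogram/MFCC kernels and will not match their output exactly. All function names are hypothetical, and the default window, step, and filter counts are illustrative assumptions rather than values taken from the DeepSpeech code.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frames(samples, win_len, win_step):
    """Split samples into overlapping, Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (win_len - 1))
              for n in range(win_len)]
    return [[samples[start + n] * window[n] for n in range(win_len)]
            for start in range(0, len(samples) - win_len + 1, win_step)]


def power_spectrum(frame):
    """Naive O(N^2) DFT for clarity; a real implementation would use an FFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1)
            for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct_ii(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n_out)]


def mfcc(samples, sample_rate=16000, win_len=512, win_step=320,
         n_filters=26, n_ceps=26):
    """Return one list of n_ceps coefficients per frame (assumed defaults)."""
    bank = mel_filterbank(n_filters, win_len, sample_rate)
    features = []
    for frame in frames(samples, win_len, win_step):
        spectrum = power_spectrum(frame)
        # Floor the energies so the log never sees zero.
        energies = [max(sum(f * s for f, s in zip(filt, spectrum)), 1e-10)
                    for filt in bank]
        features.append(dct_ii([math.log(e) for e in energies], n_ceps))
    return features
```

Calling `mfcc(samples)` on a list of float samples returns one feature vector per frame. Whatever implementation ends up being used on-device would need to be checked numerically against the features the model was trained on.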

Step 3 would possibly also involve adding Swift bindings to the C API.


Sorry if I missed something, but does this mean that, for now, the best way to run streaming inference with a DeepSpeech model on iOS is to:

  • Recode the preprocessing part in Metal
  • Convert the CNN/RNN subgraph to CoreML
  • Convert the language model to a format that iOS can read
  • Recode the ds_decode part in Metal
  • Wrap all of this in a custom Swift pipeline

Thanks
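
For a sense of what the decode step in the list above involves: DeepSpeech’s decoder is a CTC beam search with a language model, and the simplest stand-in for it is greedy CTC decoding, which needs no language model at all. The sketch below is only that stand-in, not the actual ds_decode logic. The function name is hypothetical, and placing the blank label at the last index is an assumption.

```python
def greedy_ctc_decode(probs, alphabet, blank_index=None):
    """Greedy CTC decode: argmax per frame, collapse repeats, drop blanks.

    probs    -- list of per-frame probability lists, one entry per label
    alphabet -- string or list mapping label index to character
    blank_index -- index of the CTC blank (assumed to be the last label)
    """
    if blank_index is None:
        blank_index = len(alphabet)
    decoded = []
    previous = blank_index
    for frame in probs:
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when the label changes and is not the blank.
        if best != previous and best != blank_index:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)
```

Because it only looks at one frame at a time, this works on streaming output as frames arrive, but it will usually be less accurate than a beam search scored by a language model.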