The README in that repo now says pipsqueak is part of the DeepSpeech repo. Is it using the same codebase and model as the ‘server’ DeepSpeech, or is it part of the native client?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
4
I’m not sure what you mean; “pipsqueak” is basically the model exported as TFLite, and libdeepspeech.so includes the TFLite runtime on (some) platforms.
Thanks. I wondered whether pipsqueak was a model different from DeepSpeech, specifically tuned for low resource systems, with possible disadvantages, but it seems that’s not the case.
Pipsqueak Engine
Online STT technologies can have security and privacy vulnerabilities. Mozilla researchers aim to create a competitive offline STT engine called Pipsqueak that promotes security and privacy. This implementation of a deep learning STT engine can be run on a machine as small as a Raspberry Pi 3. Our goal is to disrupt the existing trend in STT that favors a few commercial companies, and to stay true to our mission of making safe, open, affordable technologies available to anyone who wants to use them.
Am interested in the current status of the former intent. Specifically the ability to use Pip Squeak Engine as the service for Web Speech API, to avoid the issue of sending audio to a remote web service w3c/speech-api#56.
How far (in estimated time) are we from being able to use Pipsqueak or DeepSpeech as the local service for STT?
I happened to search STT and found the same Mozilla link which mentioned Pipsqueak (and it’s exciting!). But I cannot find further information, and noticed that nothing about Pipsqueak can be found in the DeepSpeech repo. The information so far confuses me a lot. Does anyone know the status of Pipsqueak?
1 Like
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
9
If you read the documentation and the release notes, you can see we now run in real time on Android devices and the RPi4.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
10
Followed the instructions. deepspeech is not compiling. Not sure what else to state. Can step-by-step instructions be posted for 32-bit *nix, or can it be clearly stated that such a platform and architecture are not supported at all?
MediaStreamTrack is described in the Media Capture and Streams specification (W3C).
That is not an issue. Can communicate with the native file system using a variety of means, including Native Messaging and WebSocket.
Consider https://github.com/w3c/speech-api/issues/66. Am in the process of creating a proof of concept demonstrating passing a MediaStreamTrack (audio), data URI, ArrayBuffer, or Float32Array to a local function which executes an STT binary or series of binaries on the client, without any external resources involved.
Already created several proofs of concept for TTS using espeak and espeak-ng via Native Messaging, WebSocket, and Native File System.
The goal is to create JavaScript functions which execute local binaries to achieve TTS and STT, initiated by browser code, completely locally.
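The Native Messaging route mentioned above hinges on one small wire format: the browser exchanges messages with the local host process as a 32-bit length prefix (native byte order; little-endian on common hardware) followed by UTF-8 JSON. A minimal sketch of that framing, runnable in Node; the function names `encodeMessage`/`decodeMessage` are illustrative, not from any API:

```javascript
// Native Messaging wire format: 4-byte length prefix + UTF-8 JSON payload.
// A local host wrapping an STT binary would read/write stdin/stdout in
// exactly this framing.
function encodeMessage(obj) {
  const json = Buffer.from(JSON.stringify(obj), "utf8");
  const header = Buffer.alloc(4);
  header.writeUInt32LE(json.length, 0); // little-endian assumed here
  return Buffer.concat([header, json]);
}

function decodeMessage(buf) {
  const length = buf.readUInt32LE(0);
  return JSON.parse(buf.slice(4, 4 + length).toString("utf8"));
}
```

A host process built on this framing can then shell out to the local STT binary and stream the transcript back to the extension the same way.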
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
15
No, sorry. Most distributions are dropping 32-bit support, and upstream TensorFlow does not support it, so we cannot do it.
Ubuntu still releases 32-bit distributions. The “Chromium team” PPA supports 32-bit builds of the Chromium dev channel. Am running Nightly 32-bit right now.
From the perspective here, 32-bit programs should be supported until completely obsolete. (Do not burn books because the Web exists.)
The concept is to implement the code by “any means” at this point. As far as “spread” goes, a working “hack” is “better” than no code to use at all. AFAICT, do not have any “followers” in that sense. Work alone and publish own workarounds.
Again, it is unfortunate this project is not being concurrently implemented as code shipped with the browser, serving as the local service for the Web Speech API.
Will take some time to read the links that you have posted.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
17
TensorFlow does not, and since we rely on that, and we don’t have time to fix the world, we don’t.
This is a WIP hack. It is not meant for anything more than exercising the Web Speech API and improving both it and the DeepSpeech API. The current Firefox media code that it interacts with is expected to undergo a huge refactoring.
Nothing is “alone” here; it’s just not ready to be more broadly communicated, and that being ready is not under my responsibility.
Have you read my code and the bug? This is explicitly a WIP of a local Web Speech API implementation backed by DeepSpeech. It’s still a hack because a lot of other refactoring needs to be done prior to this work, and that refactoring has still not landed.
InvalidStateError: An attempt was made to use an object that is not, or is no longer, usable
at Nightly 71.0a1 (2019-10-05) (32-bit)
line 29: recognition.start();
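For what it’s worth, the Web Speech API specifies that calling `start()` on a recognizer that is already running throws exactly this `InvalidStateError`. One common workaround, regardless of the backend, is a small state guard; `RecognitionGuard` below is a hypothetical helper, not part of any API, and the stub-friendly shape means it has no browser dependency:

```javascript
// Sketch: avoid InvalidStateError by only calling start() when the
// recognizer is idle. Works with any object exposing start()/stop()/onend,
// such as a SpeechRecognition instance.
class RecognitionGuard {
  constructor(recognition) {
    this.recognition = recognition;
    this.running = false;
    // note: this replaces any onend handler already set on the recognizer
    recognition.onend = () => { this.running = false; };
  }
  start() {
    if (this.running) return false; // skip instead of throwing
    this.recognition.start();
    this.running = true;
    return true;
  }
  stop() {
    if (this.running) this.recognition.stop();
  }
}
```

In a page this would wrap the recognizer once (`const guard = new RecognitionGuard(recognition)`) and all call sites would use `guard.start()` instead of calling `recognition.start()` directly.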
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
20
I don’t think there’s anything actionable here where you can be of help, sadly. You need to use a build from the branch I gave earlier. Work to integrate DeepSpeech as a WebSpeech API backend is still a long way off. We are working on that; nothing is done behind closed doors, as you can see on the Bugzilla links I shared.