I’ve released the first version of Bumblebee:
This is my attempt at creating an Electron app for DeepSpeech, along with a websocket API for writing voice-controlled JavaScript apps.
The idea is that Bumblebee handles the troublesome parts of installing and setting up DeepSpeech, manages the microphone, and adds an “Alexa”-style hotword system (Porcupine) and a text-to-speech system (mespeak). Together, these systems can be treated as an always-available service that applications can use. There’s no need to start and stop DeepSpeech while you’re writing an application - Bumblebee stays running in the background, and your application communicates with it through a websocket API.
The API is very simple, as you can see in the hello world example:
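As a rough sketch of what a voice app built on a background service like this could look like - note that the message shapes, event names, and port used here are assumptions for illustration, not the actual Bumblebee API:

```javascript
// Hypothetical sketch of a tiny Bumblebee-style voice app.
// The "hotword" / "recognize" / "say" message shapes are assumptions,
// not the real protocol - see the Bumblebee repo for the actual API.

// Route an incoming websocket message (a JSON string) to app callbacks.
function handleMessage(raw, app) {
  const msg = JSON.parse(raw);
  if (msg.type === 'hotword') {
    app.onHotword(msg.hotword);   // the hotword was spoken
  } else if (msg.type === 'recognize') {
    app.onSpeech(msg.text);       // a speech-to-text transcription arrived
  }
}

// Build an outgoing text-to-speech request to send back to the service.
function say(text) {
  return JSON.stringify({ type: 'say', text });
}
```

In a real app these would be wired to a websocket connection (e.g. feeding `handleMessage` from the socket’s message event and sending `say(...)` back over it), while Bumblebee runs in the background handling the microphone, hotword detection, and DeepSpeech.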
I think the really nice thing about having a shared base system like this is that I can write a small voice app in a single JavaScript file, share just that file, and run it on another computer without installing a new instance of DeepSpeech for each app.
I don’t have any sophisticated NLP/intent parsing going on here, and I’m only using the pre-trained English models. This release was mainly about getting the base system installing and running correctly on Mac, Linux, and Windows.
I don’t have an exact roadmap for where this project goes from here, but I have a lot of ideas for applications I want to write. I’m primarily interested in voice-controlling pretty much everything: my computer, things around my home, and web applications.
I’d appreciate feedback: would other people find use in an API like this, one that handles all the STT/TTS/hotword plumbing and leaves you free to focus on what you want your “voice app” to do?