RFC: Picking the "living room microphone" as our only focus


(Michiel de Jong) #1

After the current tech explorations and user research phase, IMHO we will have to pick one product to focus all our efforts on. I don’t think we are good enough to successfully build:

  • a home automation hub that interacts with IoT devices, as well as
  • a Thinkerbell/RemoteWorker runtime device to which third parties can push background code/rules, as well as
  • an always-on microphone / voice browser for your living room.

We would be splitting our project into three focuses, and nobody could give a consistent answer anymore to the question ‘what is Project Link’.

Let me try to explain why I think the microphone product should be the one that should win our attention. This is very much “IMHO” and I definitely expect others to disagree with this, so maybe read this mostly as a strawman argument…

IMHO the home automation hub and the Thinkerbell/RemoteWorker runtime are not stand-alone products (by themselves they are hidden helper products that people would only buy to make a desirable third-party app work). So we will never sell these unless we can identify a third party whose product is exciting, and useless unless accompanied by our helper product. Sounds unlikely to happen before December 2016?

On the other hand, as the always-on living room microphone is emerging as a new internet client device (next to laptop, tablet, smartphone) through which people interact with hosted services, I see a clear role for Mozilla. For a moment, forget about home automation and pro-active/learning rules engines, and consider only the ‘voice browser’ as a very simplistic “thin client” product that does very little processing itself.

This is the always-on, 1984-style, living room microphone as an internet client. The Amazon Echo is leading the way, and it’s no secret that Amazon sells it at an attractive price so that users spend more on Amazon’s hosted services, and so that Amazon gets more intimate information with which to profile them. It seems likely that Google, Apple and Microsoft will each launch an Echo competitor. Oh wait, I just checked, and it seems that Google announced theirs a few hours ago! :slight_smile: http://www.billboard.com/articles/business/7377470/google-announces-home-competitor-amazon-echo

It’s obvious that these microphone products will come pre-configured to use the hosted voice-controlled internet services of their corresponding vendors. But what if we build one (just the microphone device, a very thin client) that allows the user to choose where to send their voice commands?

And what if the user can send different commands to different voice-controlled services - using multiple wake words to act like DuckDuckGo Bang-operators? It would break open this market (I guess since today’s Google announcement we can suddenly call “living room microphones” a market).

If we use an off-the-shelf microphone array for the hardware prototype, and only have to process the wake word locally (no other complexity, so a very “thin” client, really just a voice-controlled search browser), all we really have to do is implement abstractions over the different third-party voice APIs.
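To make the abstraction idea concrete, here is a minimal sketch (all class names, wake words, and return values are hypothetical placeholders, not real API calls) of a common interface with per-vendor adapters, plus the bang-operator-style routing by wake word mentioned above:

```python
from abc import ABC, abstractmethod

class VoiceService(ABC):
    """Common interface each third-party voice API adapter would implement."""
    @abstractmethod
    def query(self, audio: bytes) -> bytes:
        """Upload a voice snippet, return a playable audio answer."""

class AlexaAdapter(VoiceService):
    def query(self, audio: bytes) -> bytes:
        return b"alexa-answer"  # placeholder for a real AVS round trip

class GoogleAdapter(VoiceService):
    def query(self, audio: bytes) -> bytes:
        return b"google-answer"  # placeholder for a real API round trip

# Each wake word acts like a DuckDuckGo bang operator:
# it routes the following utterance to a different backend.
ROUTES = {
    "alexa": AlexaAdapter(),
    "ok google": GoogleAdapter(),
}

def route(wake_word: str, audio: bytes) -> bytes:
    """Pick a backend by wake word and forward the snippet to it."""
    return ROUTES[wake_word].query(audio)
```

Adding a new vendor would then just mean writing one more adapter and registering its wake word, which is roughly all the “browser” has to know.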

The pivot would hurt of course (team focus always does, I guess, as it limits team members in picking what they want to work on), but if we assume we want voice (which it seems we do), then consider the only alternative: we make voice an “added feature” of the product we’ve built so far. That would mean doing everything needed to build a simplistic voice browser device, while also splitting our attention across the IoT protocol adapters and the runtime device, all in parallel.

IMHO, if we decide we want to do voice then that’s enough to keep our small team busy, and we should temporarily focus on doing only that and temporarily park everything else.

Once we ship and see market adoption, we can add IoT connections and the scripts runtime back as features in a later OTA update of the base product, once they’re shippable, to further amaze the user. :slight_smile:

My 2ct.


(David Teller) #2

Have you looked at my proposal here: https://drive.google.com/file/d/0B7cLrZgzp2TaNnhLZ1Y3ZElfWHc/view?usp=sharing ?


(Michiel de Jong) #3

Ah, I hadn’t yet - great doc! But, are you suggesting we can build and ship all of that (‘Generation 1’) before December?


(Michiel de Jong) #4

I guess what I’m proposing is using your “foot in the door strategy” all the way, to launch just the minimal, thin, internet-voice-client-device first, and use Q3+Q4 to succeed at that, rather than disperse ourselves into understaffed mini-teams with multiple huge goals.

Then we’d launch it with the promise, of course, that we’ve already prototyped home automation functionality for it, which we’ll ship later when it’s ready (again spending at least one or two quarters, with the whole team, just on nailing the addition of home automation for people who have or want IoT devices at home). Smartphone integration could be the next update after that, maybe? Again, I could easily see our whole team spending two quarters just on getting integration with both the Android and iOS address books to work in a way we can be proud of.


(David Teller) #5

I hope that we can. But I’m even happier if you can find/refine something even smaller that we can build and ship faster, and that we can later upgrade (preferably over-the-air) into a Home User Agent.

It’s just not very clear to me which use cases would be served by your “living room microphone” proposal. Could you elaborate?


(Michiel de Jong) #6

Very similar to the use cases of a web browser: it gives access to online services by 1) uploading your request to the right server and 2) presenting the answer that server gives back. The only differences are that you speak instead of typing/clicking, you listen instead of viewing/scrolling (so use cases are restricted to queries that allow for a short answer), and that it’s always on in your living room instead of installed on your phone or laptop.

So initially, two use cases:

  • I want to send a voice command to Google’s API and get the answer streamed over the speaker.
  • I want to send a voice command to Amazon’s API and get the answer streamed over the speaker.

We could add Cortana/Bing Speech Services and Siri, although it’s not clear whether they have speech APIs we can use (still looking into that).

Also nice would be if we can get the device to stream internet radio over its own speaker, and possibly also integrate with Chromecast for video streaming. I mean, the answer to a voice command may be:

  • a TTS answer,
  • a sound media file that starts streaming,
  • something being displayed on your television,
  • something happening (confirmed by TTS answer when successful), like ordering a book from Amazon or adding something to your Google Calendar.

We’d have to look deeper into the various speech APIs to see what they can do once you send voice commands to them; whatever they support will define the use cases of the device.

All the device does by itself is act as a generic client for voice-controlled cloud services: upload the sound snippet from the microphone and stream the service’s audio answer back over its speaker.
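That whole round trip is small enough to sketch in a few lines. This is just an illustrative skeleton with injected stand-in functions (nothing here is a real device or vendor API); the point is how little logic the thin client itself needs:

```python
def handle_utterance(capture_audio, upload, play):
    """One round trip of the thin client: capture the snippet after the
    wake word fires, upload it to the chosen service, play the reply.

    capture_audio: () -> bytes, records until the user stops speaking
    upload:        bytes -> bytes, sends the snippet, returns playable audio
    play:          bytes -> None, streams the answer over the speaker
    """
    snippet = capture_audio()   # audio recorded after the wake word
    answer = upload(snippet)    # the cloud service does all the heavy lifting
    play(answer)                # stream the reply over the speaker
    return answer
```

Everything hard (STT, intent handling, answer generation) stays on the server side; the client is just plumbing between microphone, network, and speaker.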


(David Teller) #7

I’m a bit scared about the listening part. If I’m looking for a specific piece of information on Wikipedia, I don’t want to hear my speaker buzzing for 10 minutes. If I’m looking for a specific recipe, I don’t want to hear the navigation menu or the ads, etc.

Do you have ideas on how to solve these issues?


(Michiel de Jong) #8

The answers from the Alexa Voice Service already come optimized for that; they even arrive as an audio stream to play directly on the speaker. Google’s Speech API is different: it’s not a virtual assistant, only an STT engine. Google Home gets optimized answers from a virtual assistant server, but that API is not open yet.

Ideally, all voice-controlled virtual assistant services will eventually have APIs like Alexa’s. For the others, we’d have to see how we can get short enough answers out of them.
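In client terms, that difference boils down to two reply shapes the device would have to normalize. A rough sketch, with entirely hypothetical names and a stand-in local TTS step for the STT-only case:

```python
def normalize_reply(reply, synthesize):
    """Turn any backend reply into playable audio.

    reply:      ("audio", bytes) for AVS-style services that already
                return a playable stream, or ("text", str) for an
                STT/text-only backend whose answer still needs TTS.
    synthesize: str -> bytes, a local text-to-speech step (hypothetical).
    """
    kind, payload = reply
    if kind == "audio":
        return payload              # already playable, the Alexa-style case
    if kind == "text":
        return synthesize(payload)  # text-only backends need a TTS pass
    raise ValueError(f"unknown reply kind: {kind}")
```

The Alexa-style path is trivially thin; the text-only path is where the “how do we get short enough answers” question lives, since the text itself may be a whole page rather than a one-line reply.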

So enough work there just ‘doing voice’ without also doing IoT and smartphone-interaction in the first version. :slight_smile:


(David Teller) #9

In that case, indeed, this would probably be a valid foot-in-the-door product. I’m a bit skeptical of depending 100% upon Alexa but that might not be a blocker, in particular if we intend to follow up with additional features.

I most likely wouldn’t buy it for myself, though, as my sole application would be checking recipes while I’m cooking.