Create a subset of existing models

@mischmerz,
you’re right. This technology isn’t very useful for an RPi3.
For a small GPU board like mine (a TX2), it’s a better fit.

This was one of my research topics in the past: what is the best STT for a small, low-power, outdoor board?
I really think that, for a full model, the best approach is an online solution (Google/Bing…) accessed from a mobile device (phone).

But Lissyx and the whole team are working hard to reduce RAM and model-size needs. They depend on what TensorFlow makes possible.

Sure, in the future, CPU requirements will come down (and boards will keep getting more powerful. Thanks, Mr. Moore).

Now, I’m a long-time pocketsphinx user: I created my own French model, adapted from the LIUM one, and even with a lot of time spent improving it through adaptation, I never got results like this! (Never better than 82% accuracy; with the same vocabulary, I now reach 95%!)
Response (inference) time is nearly the same for pocketsphinx and DeepSpeech (Python).

But I agree, a small board like an RPi3 is too light!! (Though it’s good for arcade emulation with ‘recalbox’, LOL.)

See U

@elpimous_robot,

Here’s the thing: the voice-controlled environment runs primarily on small devices like the RPi3. And in order to be able to rely on speech-driven IoT, we must remove the cloud from the equation. Otherwise you accept that you won’t be able to turn your lights on when the Internet is down. If we can’t get rid of this dependence by reducing the complexity of the problem enough to run DeepSpeech on e.g. RPis, without network access, I have to ask myself what the ultimate target for this project is. Wakeword systems like kitt.ai are lightweight and fairly reliable. Sure, just one word. But why not have a DeepSpeech engine that understands 10 or 20 words on an RPi? Are you telling me that this is not possible?

Of course not!!
Everything could work nicely!
But you must understand that inference time grows with the number of words: the decoder has a larger language model to search as the vocabulary grows.
For a very small vocabulary it will work, but you won’t get real time.
That will be your compromise!
Hope this helps, Mischmerz

@elpimous_robot

Sphinx doesn’t run in real time on an RPi either. Yes, I am talking about 20 words or so. And you’ll have a delay with networked streaming services as well. But even with a delay of two seconds or so, it would still be a lot… lot better than all current alternatives.

@mischmerz’s idea is cool.
I don’t know what the structure of a trained model is, but it would be nice if we could extract a subset from a bigger model, or merge multiple small models into one bigger one.
Not many projects need a universal model, I guess.

Hi.
I just tried this, without success.

First, I trained a full French model (11 GB).
Next, I created a small LM and trie (roughly as sketched below).
Last, I tested inference mixing the BIG model with the small LM/trie… it worked, but…
it also took far too long…
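For reference, the LM/trie step, roughly (a sketch only: it assumes KenLM’s lmplz/build_binary and the generate_trie tool from DeepSpeech’s native_client are on the PATH, and all file names are placeholders; generate_trie’s argument list differs between releases, e.g. 0.1.x also expects the vocabulary file):

```python
# Sketch: build a tiny KenLM language model and a DeepSpeech trie for a
# restricted command vocabulary. Tool arguments vary by release.
import subprocess

# One command phrase per line; a few dozen realistic phrasings is plenty
# for a 10-20 word vocabulary.
commands = [
    'turn the lights on',
    'turn the lights off',
    'open the door',
    'close the door',
]
with open('commands.txt', 'w') as f:
    f.write('\n'.join(commands) + '\n')

# --discount_fallback is required here: Kneser-Ney discount estimation
# fails on a corpus this small.
subprocess.check_call(['lmplz', '--order', '3', '--discount_fallback',
                       '--text', 'commands.txt', '--arpa', 'small_lm.arpa'])
subprocess.check_call(['build_binary', 'small_lm.arpa', 'small_lm.binary'])
subprocess.check_call(['generate_trie', 'alphabet.txt',
                       'small_lm.binary', 'small_trie'])
```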

The idea is to build the fastest possible multi-speaker recognition on a small dictionary.
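For completeness, the inference I’m timing looks roughly like this (a sketch against the 0.2-to-0.5-era Python API; constants and method signatures differ in other releases, so mirror the client.py that ships with your version; paths are placeholders):

```python
# Sketch: decode with the full acoustic model but the small LM/trie.
import time
import wave
import numpy as np
from deepspeech import Model

N_FEATURES = 26    # MFCC features per frame (release default)
N_CONTEXT = 9      # context frames on each side (release default)
BEAM_WIDTH = 100   # release default is 500; a narrower beam trades accuracy for speed
LM_ALPHA = 0.75    # LM weight -- worth re-tuning for a tiny domain LM
LM_BETA = 1.85     # word insertion weight

ds = Model('output_graph.pb', N_FEATURES, N_CONTEXT, 'alphabet.txt', BEAM_WIDTH)
# Same big acoustic model, but the small LM/trie constrains the decoder:
ds.enableDecoderWithLM('alphabet.txt', 'small_lm.binary', 'small_trie',
                       LM_ALPHA, LM_BETA)

with wave.open('command.wav', 'rb') as w:  # expects 16 kHz mono 16-bit PCM
    fs = w.getframerate()
    audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)

start = time.time()
print(ds.stt(audio, fs))
print('inference took %.2f s' % (time.time() - start))
```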

Any ideas?