Developing ad-hoc STT and TTS systems

(Vincent BERMENT) #1

Dear all,

I would like to develop a dedicated STT system. More precisely, I want
to build a system that recognizes (and also generates) a “small” number
of sentences (~5,000), but perfectly. Would that be possible with DeepSpeech?

I was also wondering whether such systems would require a GPU server to
work or if they could run on small platforms such as a Raspberry Pi 3 B.

Thank you for your help!


(Vincent Foucault) #2

Hi, Vincent,

First, about running on the RPi 3: inference with the large model doesn’t seem to be possible, at least for now, because inference takes too long to hope for real time.
Lissyx is working on an AOT (optimized) model, so it might become possible by reducing the model!

You’d like a model limited to about 5,000 possible sentences, with the best possible accuracy:
Sure, with DeepSpeech, you’re in the right place.

Assuming your sentences contain standard words,
you create a vocab.txt containing all your ~5,000 sentences, then an LM and a trie file built from it.
You should get very good accuracy, noticeably better than anything I know of in phoneme-based STT.
But I must say that, at present, there is no perfect solution! Not yet…
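
The vocab.txt step can be sketched in Python roughly like this. It is only a minimal sketch under my own assumptions: the helper names (normalize, build_vocab) and the normalization rules (lowercase, keep only letters, apostrophes and spaces, drop duplicates) are illustrative and not part of DeepSpeech. It also returns the set of characters used, which is handy to check against the alphabet the model was trained with. The LM and trie files would then be built from this file with separate tools, not shown here.

```python
from pathlib import Path


def normalize(sentence):
    """Lowercase, keep only letters, apostrophes and spaces, collapse whitespace."""
    kept = "".join(
        c if (c.isalpha() or c in "' ") else " "
        for c in sentence.lower()
    )
    return " ".join(kept.split())


def build_vocab(sentences, vocab_path="vocab.txt"):
    """Write one normalized, de-duplicated sentence per line to vocab_path.

    Returns the list of lines written and the sorted set of characters used.
    """
    lines = []
    seen = set()
    for sentence in sentences:
        cleaned = normalize(sentence)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(cleaned)
    Path(vocab_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    alphabet = sorted(set("".join(lines)))
    return lines, alphabet
```

Calling build_vocab(my_sentences) with your ~5,000 sentences gives you the text file to feed to the language model tooling. If the returned character list contains anything the acoustic model does not know, those sentences would need to be rewritten first.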

Sorry, for now DeepSpeech is only an STT engine; it does not do TTS.

Hope this helps.

(Egolge) #3

What I can add here is that we are also starting our research on TTS.