Running TTS on constrained hardware (+ no GPU)

Yes, that sounds great. TF Lite gives really impressive performance for DeepSpeech on the RPi

I’ve fallen a little behind with the write-up, but it mainly shows how to get around a couple of non-obvious installation issues with a few packages. It’s a national holiday here tomorrow, so a long weekend, meaning it’ll definitely get done by Sunday :slightly_smiling_face:

Sorry for the bluntness, but “soon” as in when?

Awesome, thank you. Can’t wait

@nmstoker Any updates? I ran into the same problem. I’d like to do TTS on a Raspberry Pi using TF Lite or OpenCV, and have it read detections out loud.

I was/am considering:

Festival / Flite: a small, fast, portable speech synthesis system (Flite is the lightweight offshoot of Festival)

SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis (a lighter take on WaveGlow)

WaveNet
A TensorFlow implementation of DeepMind’s WaveNet paper

Tacotron
An implementation of Tacotron speech synthesis in TensorFlow.

WaveGlow
A Flow-based Generative Network for Speech Synthesis

Tacotron 2? Are you using Google’s or NVIDIA’s?

I was prioritizing light/fast inference over customization or handwriting recognition.
If anyone has experience, please share your thoughts/pros/cons.

Of those, Flite might work on a Pi for something close to real time. The rest would benefit greatly from having more resources.
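Flite is a standalone C program rather than a Python library, so on a Pi you’d typically shell out to it. A minimal sketch using Flite’s documented `-voice`/`-t`/`-o` flags (the helper function name is mine, not part of any library):

```python
import subprocess  # needed when actually running the command


def flite_say(text: str, wav_path: str, voice: str = "slt") -> list:
    # -voice picks one of the bundled voices, -t passes the text inline,
    # -o writes a WAV file instead of playing to the sound device.
    cmd = ["flite", "-voice", voice, "-t", text, "-o", wav_path]
    return cmd


cmd = flite_say("Hello from the Pi", "hello.wav")
# Run only on a machine where flite is installed (e.g. `sudo apt install flite`):
# subprocess.run(cmd, check=True)
```

Because Flite does its own synthesis entirely offline, this runs comfortably even on older Pi models, at the cost of a much more robotic voice than the neural models above.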


Got it! Do you know if Flite would integrate with TF Lite vs. full TF? I’m also using a Coral (Edge TPU).

Flite doesn’t use TensorFlow at all.

as in “don’t make any plans that depend on when I release”

So can I say TTS is 6x slower than real time on the RasPi?


Yes, that’s a fair estimate :slightly_smiling_face:

If that’s the case, I don’t think any other model, even FastSpeech, would run in real time on a RasPi, at least not without additional optimization. FastSpeech is computationally heavy even though it’s structurally feed-forward.
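The “6x slower than real-time” figure above is a real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced. A quick sketch of the arithmetic (the example numbers are illustrative, not measured):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of generated audio.

    RTF > 1 means slower than real time; RTF < 1 means faster.
    """
    return synthesis_seconds / audio_seconds


# e.g. a 5-second utterance that takes 30 s to synthesize gives the 6x figure:
print(real_time_factor(30.0, 5.0))  # 6.0
```

Framing benchmarks as RTF rather than raw seconds makes results comparable across sentences of different lengths.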

I’ve put detailed instructions on how to install it on an RPi4 here: https://medium.com/@nmstoker/installing-mozilla-tts-on-a-raspberry-pi-4-e6af16459ab9

I actually recorded the install end-to-end with asciinema, intending to post that too, but the file is over their size limit, and converting it to a video (which is itself kind of against the spirit of asciinema) also caused problems. Rather than hold up the write-up for that, I’ve posted it now.

There could be some refinements to the approach, but I know for sure that this works with the Feb 2020 Buster release on an RPi 4 (4 GB), having done a complete run-through to confirm my cut-down version was okay, and then again to record the terminal session!

Would be interested to hear how people get on if they give it a go :slightly_smiling_face:


Amazing work, thanks a lot @nmstoker :smiley: .


I added this to the project wiki under examples

Hey @_CA_A, have you tried SqueezeWave? They promise a lot of speed-up, and they have something to try on GitHub. However, I couldn’t install the requirements; I’ve tried with Python 3.6, Python 3.7 and Python 3.8.

I think that they might be the best hope for an ARM TTS right now…

I don’t have a one-to-one comparison, but MelGAN can also run on a RasPi and is probably easier to train.

Thanks to your great documentation, I’ve set up the TTS server on a Raspberry Pi 3 Model B Rev 1.2.
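A server set up this way exposes a simple HTTP GET endpoint you can hit from any machine on the network. A minimal sketch of building the request; the hostname, the default port 5002 and the `/api/tts` path are assumptions based on the demo server’s defaults, so adjust for your setup:

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # used only when the server is actually running

# Build the request URL for the demo server's /api/tts endpoint.
base = "http://raspberrypi.local:5002/api/tts"  # host/port are assumptions
url = base + "?" + urlencode({"text": "Hello, how are you?"})
print(url)

# With the server running, the response body is WAV audio:
# wav_bytes = urlopen(url).read()
# open("out.wav", "wb").write(wav_bytes)
```

`urlencode` takes care of escaping punctuation and spaces in the sentence, which matters for the longer test phrases below.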

@nmstoker What about some test sentences to have a performance comparison?

  • Hello, how are you?
  • This phrase could be spoken because of mozilla tts project and it’s great community.
  • free to use text to speech and speech to text by common voice is important for future
  • Hello Neil, how about a little raspberry performance battle?

Raspberry pi 3 model b rev 1.2:

  • Hello, how are you? —> 37 seconds
  • This phrase could be spoken because of mozilla tts project and it’s great community. —> 100 seconds
  • free to use text to speech and speech to text by common voice is important for future —> 100 seconds
  • Hello Neil, how about a little raspberry performance battle? —> 84 seconds
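Without knowing the duration of the generated audio we can’t compute a true real-time factor, but the posted numbers do support a rough characters-per-second rate; a sketch using the timings above verbatim:

```python
# Sentence -> synthesis time in seconds, as posted for the RPi 3 Model B Rev 1.2.
results = {
    "Hello, how are you?": 37,
    "This phrase could be spoken because of mozilla tts project and it’s great community.": 100,
    "free to use text to speech and speech to text by common voice is important for future": 100,
    "Hello Neil, how about a little raspberry performance battle?": 84,
}

for sentence, seconds in results.items():
    rate = len(sentence) / seconds  # characters synthesized per second
    print(f"{rate:.2f} chars/s  ({seconds}s)  {sentence[:40]}…")
```

The rates land well under one character per second, which is consistent with the “several-times-slower-than-real-time” picture discussed earlier in the thread.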

Very good!

I expect the TF version will give quite a speed boost on the RPi.

Also, this reminds me: I need to follow up on what’s needed to get llvmlite compiled into a wheel on piwheels (for the RPi), which in turn should smooth out installing librosa. Details here: https://github.com/piwheels/packages/issues/33
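Until a prebuilt llvmlite wheel lands, pip on the Pi can still pull other prebuilt ARM wheels from piwheels by adding its index. A config sketch for `/etc/pip.conf` (this is the index URL piwheels itself documents; Raspbian ships a similar file by default):

```ini
[global]
extra-index-url=https://www.piwheels.org/simple
```

With this in place, `pip install` checks piwheels first for ARM builds and falls back to PyPI, avoiding long source compiles for packages that already have wheels there.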

What model did you use for this?