Sorry for the bluntness, but soon as in when?
Awesome, thank you. Can’t wait
@nmstoker Any updates? I ran into the same problem. I would like to do TTS on a Raspberry Pi: run detection with TF Lite or OpenCV and have it read the results out loud as it detects them.
I was/am considering:
- Festival, Flite: A small fast portable speech synthesis system
- SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis (a lighter, WaveGlow-like vocoder)
- A TensorFlow implementation of DeepMind’s WaveNet paper
- An implementation of Tacotron speech synthesis in TensorFlow
- A Flow-based Generative Network for Speech Synthesis
Tacotron 2? Are you using Google’s or NVIDIA’s?
I was trying to prioritize keeping it light and fast over customization or handwriting recognition.
If anyone has experience with these, please let me know your thoughts, pros, and cons.
Of those, Flite might work on a Pi at something close to real time. The rest would benefit greatly from having more resources.
Got it! Do you know if Flite would integrate with TF Lite rather than full TF? I am also using a Coral (Edge TPU).
Flite does not use TensorFlow.
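Since Flite is a standalone program rather than a TensorFlow model, one way to combine it with a detector is to call its command-line tool from the detection script. The following is only a minimal sketch, assuming the `flite` binary is installed and on the PATH; the helper names `flite_command` and `speak_to_file` and the label "person detected" are made up for illustration.

```python
import shutil
import subprocess
from typing import List


def flite_command(text: str, wav_path: str) -> List[str]:
    """Build the argument list for: flite -t TEXT -o WAVFILE."""
    return ["flite", "-t", text, "-o", wav_path]


def speak_to_file(text: str, wav_path: str) -> bool:
    """Synthesize text to a WAV file; return False if flite is not installed."""
    if shutil.which("flite") is None:
        return False
    subprocess.run(flite_command(text, wav_path), check=True)
    return True


if __name__ == "__main__":
    # "person detected" stands in for a label coming from the detector.
    ok = speak_to_file("person detected", "detection.wav")
    print("wrote detection.wav" if ok else "flite not found on PATH")
```

The detector (TF Lite, OpenCV, or the Edge TPU) stays completely separate here; it only has to hand a string to the helper.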
As in “don’t make any plans that depend on when I release”.
So can I say TTS is 6x slower than real time on a Raspberry Pi?
Yes, that’s a fair estimate
If that is the case, I don’t think any other model, even FastSpeech, would run in real time on a Raspberry Pi, at least not without additional optimization. FastSpeech is very computationally heavy even though it is structurally feed-forward.
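For anyone wanting to check the "6x slower than real time" figure on their own board, the usual measure is the real-time factor: synthesis time divided by the duration of the audio produced. A minimal sketch follows; the function name and the 12 s / 2 s numbers are illustrative assumptions, not measurements from this thread.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 12 s of compute for a 2 s clip.
    rtf = real_time_factor(12.0, 2.0)
    print(f"RTF = {rtf:.1f} (values above 1.0 are slower than real time)")
```

A value of 6.0 corresponds to "6x slower than real time"; anything at or below 1.0 could keep up with playback.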
I’ve put detailed instructions on how to install it on an RPi4 here: https://medium.com/@nmstoker/installing-mozilla-tts-on-a-raspberry-pi-4-e6af16459ab9
I actually recorded the install end-to-end with asciinema, intending to post that too, but the file is over their size limit, and converting it to a video (which itself is kind of against the intent of asciinema) also caused problems. Rather than hold things up for that, I’ve posted this now.
There could be some refinements in the approach, but I know for sure that this works with Feb 2020 Buster on an RPi4 4GB (having had a complete run-through to confirm my cut-down version was okay, and then again to record the terminal session!!)
Would be interested to hear how people get on if they give it a go.
Amazing work, thanks a lot @nmstoker.
I added this to the project wiki under examples
Hey @_CA_A, have you tried SqueezeWave? They promise a big speed-up, and they also have something to try on GitHub. However, I couldn’t install the requirements. I’ve tried with Python 3.6, Python 3.7 and Python 3.8.
I think that they might be the best hope for an ARM TTS right now…
I don’t have a one-to-one comparison, but MelGAN can also run on a Raspberry Pi and is probably easier to train.
Thanks to your great documentation, I’ve set up a TTS server on a Raspberry Pi 3 Model B Rev 1.2.
@nmstoker What about some test sentences to have a performance comparison?
- Hello, how are you?
- This phrase could be spoken because of mozilla tts project and it’s great community.
- free to use text to speech and speech to text by common voice is important for future
- Hello Neil, how about a little raspberry performance battle?
Raspberry Pi 3 Model B Rev 1.2:
- Hello, how are you? --> 37 seconds
- This phrase could be spoken because of mozilla tts project and it’s great community. --> 100 seconds
- free to use text to speech and speech to text by common voice is important for future --> 100 seconds
- Hello Neil, how about a little raspberry performance battle? --> 84 seconds
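To make timings like these comparable across boards, the four sentences could be run through a small timing harness. This is just a sketch: `time_sentences` and `placeholder_synthesize` are hypothetical names, and the placeholder has to be swapped for whatever actually produces audio on your setup (for example a request to the TTS server).

```python
import time
from typing import Callable, Dict, List

# The four test sentences from this thread, copied verbatim.
SENTENCES: List[str] = [
    "Hello, how are you?",
    "This phrase could be spoken because of mozilla tts project and it’s great community.",
    "free to use text to speech and speech to text by common voice is important for future",
    "Hello Neil, how about a little raspberry performance battle?",
]


def time_sentences(synthesize: Callable[[str], object],
                   sentences: List[str]) -> Dict[str, float]:
    """Call synthesize() once per sentence and record wall-clock seconds."""
    timings: Dict[str, float] = {}
    for sentence in sentences:
        start = time.perf_counter()
        synthesize(sentence)
        timings[sentence] = time.perf_counter() - start
    return timings


def placeholder_synthesize(text: str) -> bytes:
    """Hypothetical stand-in; replace with a real call into your TTS setup."""
    return b""


if __name__ == "__main__":
    for sentence, seconds in time_sentences(placeholder_synthesize, SENTENCES).items():
        print(f"{seconds:8.2f} s  {sentence}")
```

Wall-clock time is used on purpose, since that is what a listener actually waits for; the first call may also include model loading, so it can be worth reporting a warm-up run separately.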
I expect using the TF version will see quite a speed boost on RPi.
Also, this reminds me, I need to follow up to see what needs to be done to get llvmlite compiled into a wheel on piwheels (for RPi) which in turn should smooth along installing librosa. Details here: https://github.com/piwheels/packages/issues/33
What model did you use for this?
I used the model described in the article by @nmstoker.
- TTS base: branch master, commit 2e2221f
- TTS model: https://github.com/reuben/TTS/releases/download/ljspeech-fwd-attn-pwgan/TTS-0.0.1+92aea2a-py3-none-any.whl