So... how does it actually work?

This is a n00b question, but:

I’ve managed to install this on a Raspberry Pi 4B with 4 GB of RAM, in the hope of using it as part of a smart home system. To be specific: I’d like to integrate it into the Mozilla WebThings Gateway.

Currently I use a simple binary (NanoTTS) which generates speech from a simple input string.

I’d like to upgrade that to a natural-sounding voice. If it takes a few seconds to generate sentences such as “That device is off” or “It is 4:30”, that is not really an issue.

Now that I have it installed… what do I do? How can I generate a binary or shell script that takes a string and turns it into a WAV file? What are the steps I should look into?

(Also, I will not be training the model myself)


TTS models are not deployment-ready yet.
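
For the "takes a string and turns it into a WAV" part of the question, the wrapper itself does not depend on which engine sits behind it. Below is a rough sketch in Python, not a finished integration. It assumes the engine can be driven from the command line; `my-tts` and its `--text` / `--out` flags are made-up placeholders, not a real program, and need to be replaced with whatever command the installed engine actually provides. The function names are likewise only illustrative.

```python
import subprocess
import sys
from pathlib import Path

# ASSUMPTION: "my-tts" and its flags are hypothetical placeholders.
# Swap in the real command line of the TTS engine you have installed.
DEFAULT_COMMAND = ["my-tts", "--text", "{text}", "--out", "{out}"]


def build_command(text, out_path, template=DEFAULT_COMMAND):
    """Return the argument list with the {text} and {out} slots filled in."""
    filled = []
    for arg in template:
        if arg == "{text}":
            filled.append(text)
        elif arg == "{out}":
            filled.append(str(out_path))
        else:
            filled.append(arg)
    return filled


def synthesize(text, out_path, template=DEFAULT_COMMAND, timeout=60):
    """Run the TTS command and return the path of the WAV file it wrote."""
    out_path = Path(out_path)
    # An argument list (no shell=True) means the sentence needs no quoting.
    subprocess.run(build_command(text, out_path, template),
                   check=True, timeout=timeout)
    if not out_path.is_file():
        raise RuntimeError(f"TTS command did not create {out_path}")
    return out_path


def main(argv):
    """Command-line entry point: TEXT OUTPUT.wav."""
    if len(argv) != 2:
        print("usage: say.py TEXT OUTPUT.wav", file=sys.stderr)
        return 2
    synthesize(argv[0], argv[1])
    return 0
```

Saved as `say.py` with the usual `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))` line at the bottom, this could be called as `python3 say.py "That device is off" /tmp/out.wav`. The generous timeout reflects the point above that a few seconds per sentence is acceptable. Keeping the command in a template means the current engine can stay in place and be swapped for a neural model later without touching the calling code.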