Multi-platform, multi-language Docker images

Hi, everyone!

I’ve recently updated my MozillaTTS Docker images with:

  • Support for ARM platforms (32-bit and 64-bit) as well as CPUs without AVX instructions (Celeron, etc.)
  • Pre-trained models for English, Spanish, French, and German

Try it out with:

docker run -it -p 5002:5002 synesthesiam/mozillatts:<LANGUAGE>

where <LANGUAGE> is en, es, fr, or de. If your CPU lacks AVX instructions, append -noavx to the end of the image tag.

Once the container is running, you should be able to visit http://localhost:5002 in your browser.

There’s a /api/tts endpoint (GET with ?text=... or POST) as well as a MaryTTS-compatible /process endpoint so you can use things like the Home Assistant MaryTTS integration (hint: use -p 59125:5002).
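For anyone who wants to call the endpoint from a script, here is a minimal sketch of the GET form using only the Python standard library. It assumes the container is running with the port mapping shown above and that the response body is the audio itself (I'm assuming WAV here). The function names and the output.wav file name are made up for illustration and are not part of the image.

```python
import urllib.parse
import urllib.request


def build_tts_url(text, host="localhost", port=5002):
    """Return the GET URL for /api/tts with the text URL-encoded."""
    query = urllib.parse.urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize(text, out_path="output.wav", host="localhost", port=5002):
    """Request speech for `text` and save the raw response bytes.

    Assumption: the server answers with audio bytes (WAV) in the body.
    """
    with urllib.request.urlopen(build_tts_url(text, host, port)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the container up, calling synthesize("Hello from the TTS server.") should leave the audio in output.wav. If you mapped a different host port (for example 59125 for the MaryTTS setup), pass it as port.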

Please let me know if you find any bugs or have other models to include :slight_smile:


Small update: the /api/tts endpoint now handles text with multiple lines. It will synthesize each line independently, and then combine the audio for the final output.
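To illustrate the idea (this is not the server's actual code, just a sketch of the same approach), the snippet below splits text into lines, synthesizes each one separately, and joins the resulting WAV data. The synthesize_line argument is a hypothetical callable that returns WAV bytes for one line, and the sketch assumes every chunk has the same sample rate, sample width, and channel count.

```python
import io
import wave


def combine_wav_chunks(chunks):
    """Concatenate WAV byte strings that share the same audio parameters."""
    if not chunks:
        raise ValueError("no audio chunks to combine")
    out_buffer = io.BytesIO()
    with wave.open(out_buffer, "wb") as out_wav:
        for index, chunk in enumerate(chunks):
            with wave.open(io.BytesIO(chunk), "rb") as in_wav:
                if index == 0:
                    # Copy channels, sample width, and rate from the first chunk.
                    out_wav.setparams(in_wav.getparams())
                out_wav.writeframes(in_wav.readframes(in_wav.getnframes()))
    return out_buffer.getvalue()


def synthesize_lines(text, synthesize_line):
    """Synthesize each non-empty line independently, then join the audio.

    `synthesize_line` is a hypothetical callable: str -> WAV bytes.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return combine_wav_chunks([synthesize_line(line) for line in lines])
```

Skipping blank lines is a choice made in this sketch; the real endpoint may treat them differently.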


I pulled the ARM64 image and it doesn’t work on a Raspberry Pi 5. It fails with a numpy version issue.

Has anyone successfully run this on a Raspberry Pi in 64-bit mode?

I’m a hardware guy, looking to self-contain several AI files inside the Raspberry Pi, so I can study how to convert AI execution to fully parallel FPGA logic and make a $20 AI chip.

I’ve made worse things before. A full hardware H.264 encoder for example. No CPUs in it anywhere.

AI ought to be a lot easier to do in hardware than H.264.