Has anyone looked at the practicalities of running this TTS inference on constrained hardware, such as a mobile phone / Raspberry Pi?
I haven’t got to the point of trying this myself yet, but it would be useful to hear if anyone has tried it and/or if it’s on the roadmap for the project.
I’m assuming the inference time would be noticeably longer, if it’s possible at all; maybe not having a GPU is simply a deal breaker?
Even if it were fairly slow it might still be reasonably usable for a number of scenarios, since it’s fairly easy to make the demo server cache results. That helps where the bulk of your responses come from a common set of spoken phrases, which would only need inference the first time they’re requested (rough sketch of what I mean below).
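Something like this is what I have in mind. It’s just a sketch: `synthesize_to_wav` is a hypothetical stand-in for whatever the real (slow) inference call ends up being, and the cache is keyed on a hash of the exact input text.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("tts_cache")
CACHE_DIR.mkdir(exist_ok=True)


def synthesize_to_wav(text: str) -> bytes:
    """Placeholder for the actual TTS inference call (hypothetical)."""
    raise NotImplementedError("wire this up to the real model")


def cached_tts(text: str) -> bytes:
    # Key the cache on a hash of the exact input text
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.wav"

    if cache_file.exists():
        # Common phrases hit the cache and skip inference entirely
        return cache_file.read_bytes()

    # Slow path: only runs the first time a given phrase is requested
    audio = synthesize_to_wav(text)
    cache_file.write_bytes(audio)
    return audio
```

So on a Pi the first request for each phrase would be slow, but anything repeated after that would come straight from disk.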