Hello, everyone. I am trying to make Mozilla TTS take in a paragraph and start reading it aloud while it is still running inference; i.e., play one sentence while processing the next ones instead of outputting everything in one batch. I am doing this:
import os
import shlex
import sys
import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = sys.stdin
data = fp.read()
sens = tokenizer.tokenize(data)
for sen in sens:
    # shlex.quote keeps quotes or $ inside a sentence from breaking the shell command
    text_arg = shlex.quote('text=' + sen)
    os.system('curl -G --output - --data-urlencode ' + text_arg + ' http://localhost:5002/api/tts | aplay')
Then I run that through a script:
#!/bin/bash
cat /dev/stdin | python /home/zoomerhimmer/bin/scripts/feed-speak.py - &
trap 'kill $!; exit 0' INT
wait
On the terminal:
$ echo "Say something. How about this? OK, that's good." | foliate-speak.sh
Do not be deceived! I am not a real developer; I got all this code from who-knows-where on the web. There are at least two issues with the above: 1) It doesn't run inference ahead; it plays a sentence and only then processes the next one. 2) For some reason it doesn't work with the Foliate e-reader.
I have been using just the server so far, but I'm wondering if it wouldn't be better to write a custom synthesis script. My hunch is that I'll need to dig deeper (maybe at the synthesizer.py level?) to be able to queue the audio and inference simultaneously. And I'll actually have to learn python for real this time.
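One way the queueing idea could look while still using just the server, as a rough sketch rather than a tested solution: a worker thread requests audio for upcoming sentences and puts it in a bounded queue, while the main thread plays whatever is ready. The names speak_pipelined, synthesize, play, and max_ahead are made up for illustration; the only things taken from the script above are the http://localhost:5002/api/tts endpoint with its text parameter and the use of aplay. It assumes the server returns WAV data that aplay can read from stdin, as the curl pipeline does.

```python
import queue
import subprocess
import threading
import urllib.parse
import urllib.request

TTS_URL = "http://localhost:5002/api/tts"  # same endpoint as the curl command


def synthesize(sentence):
    """Ask the TTS server for one sentence and return the raw audio bytes."""
    query = urllib.parse.urlencode({"text": sentence})
    with urllib.request.urlopen(TTS_URL + "?" + query) as response:
        return response.read()


def play(wav_bytes):
    """Pipe audio bytes to aplay and block until playback finishes."""
    subprocess.run(["aplay"], input=wav_bytes, check=False)


def speak_pipelined(sentences, synthesize=synthesize, play=play, max_ahead=2):
    """Play sentences in order while a worker thread synthesizes the next ones."""
    # The bounded queue limits how many sentences are synthesized ahead of playback.
    audio_queue = queue.Queue(maxsize=max_ahead)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for sentence in sentences:
                audio_queue.put(synthesize(sentence))
        finally:
            audio_queue.put(done)  # always unblock the consumer, even on error

    threading.Thread(target=producer, daemon=True).start()

    while True:
        audio = audio_queue.get()
        if audio is done:
            break
        play(audio)
```

In the script above, the for loop over sens would be replaced by a single call, speak_pipelined(sens). Because playback and synthesis run in separate threads, the second sentence can be requested while the first is still playing; whether that is fast enough to avoid gaps depends on how quickly the server answers.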
Anyway, I will keep working on it and let you guys know if I succeed. Besides, it's probably something very simple programmatically speaking. There's a DeepSpeech streaming script right here: https://github.com/mozilla/DeepSpeech-examples/blob/r0.9/mic_vad_streaming/mic_vad_streaming.py. So I don't think it should be impossible.
Of course, if anyone feels inclined to point me to a ready-made solution like DeepSpeech's, I would be more than happy to drop all my pride in workmanship and run gleefully through the easy gate. That way I can hold off on learning Python until I next find myself in a sticky wicket :)
Previously, something like this worked with foliate:
curl -G --output - --data-urlencode text="$(cat /dev/stdin)" 'http://localhost:5002/api/tts' | aplay - &
trap 'kill $!; exit 0' INT
wait
But it would take a long time to process a page, though it might be practicable with a GPU.