Okay, it looks like I am just missing some command line option that is poorly documented or buried somewhere in the documentation. https://tts.readthedocs.io/en/dev/inference.html is quite helpful!
When I write a custom Python script like the following, I get the same behaviour:
import torch
import numpy as np
from pydub import AudioSegment
from pydub.playback import play

from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

# Use the GPU when available, otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using GPU.")
else:
    device = torch.device("cpu")
    print("GPU not available. Using CPU.")

# Load Bark from the checkpoint directory that was downloaded earlier.
config = BarkConfig()
model = Bark.init_from_config(config).to(device)
model.load_checkpoint(
    config,
    checkpoint_dir="/home/ubuntu/.local/share/tts/tts_models--multilingual--multi-dataset--bark",
    eval=True,
)

while True:
    text = input("Enter a sentence (or 'exit' to quit): ")
    if text.lower() == "exit":
        break
    elif text != "":
        output_dict = model.synthesize(text, config, speaker_id="random", voice_dirs=None)
        audio_data = output_dict["wav"]
        # Bark returns floating-point samples in [-1, 1], but pydub expects
        # integer PCM, so convert to 16-bit before wrapping the buffer.
        if np.issubdtype(audio_data.dtype, np.floating):
            audio_data = (audio_data * 32767).astype(np.int16)
        audio_segment = AudioSegment(
            data=audio_data.tobytes(),
            sample_width=audio_data.dtype.itemsize,
            frame_rate=24000,  # Bark generates audio at 24 kHz, not 22,050 Hz
            channels=1,
        )
        play(audio_segment)
The server route, on the other hand, works fine “out of the box” if I run the command below and go to http://127.0.0.1:5002, but it doesn’t seem to be using my GPU:
tts-server --model_name "tts_models/en/jenny/jenny"
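It looks like the server also accepts a --use_cuda flag (worth confirming with tts-server --help on your version); if so, something like this should put it on the GPU:

tts-server --model_name "tts_models/en/jenny/jenny" --use_cuda true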
Now I just need to figure out what the actual voice names are and how to use a shorter path… the basic things that the landing page or intro docs should cover. Why ship demo things instead of things that are fully usable out of the box?
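For the shorter-path part, the high-level TTS.api wrapper looks like it resolves models by name and reuses whatever is already downloaded under ~/.local/share/tts, so the long checkpoint_dir shouldn't be needed. A rough sketch, with the caveat that the .to(device) call, the list_models() helper, and the speaker/voice_dir keyword arguments may differ between TTS versions:

import torch

from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Print the catalogue of model names the CLI/API accepts.
print(TTS().list_models())

# Load Bark by name; previously downloaded files are reused rather than re-fetched.
tts = TTS("tts_models/multilingual/multi-dataset/bark").to(device)

# With no speaker given, Bark picks a random voice; speaker=... / voice_dir=...
# keyword arguments can reportedly be passed through for a specific voice prompt.
tts.tts_to_file(text="Hello from Bark.", file_path="bark_out.wav")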