Hi,
I’m trying to run Mozilla TTS. I used the DDC_TTS_and_MultiBand_MelGAN_TF_Example notebook and it worked exactly as in the example, but it only shows the audio player in my Jupyter notebook. I would like to save the audio to a file, or at least play it from my terminal. I tried to do that like this:
    def tts(model, text, CONFIG, p):
        t_1 = time.time()
        waveform, alignment, mel_spec, mel_postnet_spec, stop_tokens, inputs = synthesis(
            model, text, CONFIG, use_cuda, ap, speaker_id, style_wav=None,
            truncated=False, enable_eos_bos_chars=CONFIG.enable_eos_bos_chars,
            backend='tf')
        waveform = vocoder_model.inference(torch.FloatTensor(mel_postnet_spec.T).unsqueeze(0))
        waveform = waveform.numpy()[0, 0]
        rtf = (time.time() - t_1) / (len(waveform) / ap.sample_rate)
        tps = (time.time() - t_1) / len(waveform)
        print(waveform.shape)
        print(" > Run-time: {}".format(time.time() - t_1))
        print(" > Real-time factor: {}".format(rtf))
        print(" > Time per step: {}".format(tps))
        IPython.display.display(IPython.display.Audio(waveform, rate=CONFIG.audio['sample_rate']))
        scipy.io.wavfile.write('test.wav', CONFIG.audio['sample_rate'], waveform)
        return alignment, mel_postnet_spec, stop_tokens, waveform

    sentence = "Holly molly, it works!"
    align, spec, stop_tokens, wav = tts(model, sentence, TTS_CONFIG, ap)
But the saved file is one minute long, and what I hear is a slow-motion voice. How should I do this?
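
To help narrow this down, here is a minimal sketch of the check I have in mind; it is not a confirmed fix. It writes a waveform as 16-bit PCM with an explicit sample rate and returns the duration the file should have, so that number can be compared with the length of what actually plays. `save_wav` is a hypothetical helper name of my own, and the sketch assumes the waveform is a 1-D float array roughly in the range [-1, 1]. Only `numpy` and `scipy.io.wavfile` are used.

```python
import numpy as np
from scipy.io import wavfile


def save_wav(path, waveform, sample_rate):
    """Write a float waveform as 16-bit PCM and return its duration in seconds.

    Hypothetical helper: assumes `waveform` is 1-D (after squeezing) and
    roughly in [-1, 1]. Louder signals are scaled down to avoid clipping.
    """
    samples = np.asarray(waveform, dtype=np.float32).squeeze()
    if samples.ndim != 1 or samples.size == 0:
        raise ValueError("expected a non-empty 1-D waveform")

    # Scale only if the signal exceeds full scale, then convert to int16.
    peak = max(1.0, float(np.max(np.abs(samples))))
    pcm = (samples / peak * 32767.0).astype(np.int16)

    # The rate passed here goes into the WAV header and sets the playback speed.
    wavfile.write(path, int(sample_rate), pcm)
    return len(pcm) / float(sample_rate)
```

In the notebook this would be called as `save_wav('test.wav', waveform, ap.sample_rate)`. If the returned duration matches the length of the spoken sentence but the file still plays slowly, the problem is more likely in how the file is played than in how it is written. If the returned duration is already about one minute, the waveform itself has more samples than expected for that rate.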