Failing to synthesize when using GravesAttention

baconator · May 5, 2020, 6:39pm

I’m using TTS cloned on 4/24/2020. The end-of-epoch test sentences fail to generate when I set attention_type to “graves”:

 !! Error creating Test Sentence - 0
Traceback (most recent call last):
  File "train.py", line 501, in evaluate
    do_trim_silence=False)
  File "/opt/voices/TTS/tts_namespace/TTS/utils/synthesis.py", line 129, in synthesis
    model, inputs, CONFIG, truncated, speaker_id, style_mel)
  File "/opt/voices/TTS/tts_namespace/TTS/utils/synthesis.py", line 43, in run_model
    inputs, speaker_ids=speaker_id)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/opt/voices/TTS/tts_namespace/TTS/models/tacotron.py", line 150, in inference
    encoder_outputs, self.speaker_embeddings_projected)
  File "/opt/voices/TTS/tts_namespace/TTS/layers/tacotron.py", line 457, in inference
    self.attention.init_win_idx()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'GravesAttention' object has no attribute 'init_win_idx'

I noticed another person mentioned they had to disable this call in layers/tacotron.py (Multispeaker development progress), but I did not see a resolution there. I also found nothing in the GitHub issues.

The GravesAttention class in layers/common_layers.py doesn’t define init_win_idx the way OriginalAttention does (https://github.com/mozilla/TTS/blob/2e2221f146f1ca301a3b2b547d8f26b9009676de/layers/common_layers.py#L230), which I’m guessing is the cause of the AttributeError. I’m not skilled enough to attempt a fix I’d consider reliable. Should I just comment out the call in layers/tacotron.py (https://github.com/mozilla/TTS/blob/master/layers/tacotron.py#L457) for now? I can provide more info about my setup if needed.

ETA…

For the time being I’ve commented the line out, and test wavs are working again.
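
A less invasive alternative to commenting the line out might be to guard the call, so attention types that define init_win_idx still get it. Below is a minimal sketch of that idea. It uses stand-in classes rather than the real TTS layers: OriginalAttentionStub, GravesAttentionStub and reset_window_if_supported are hypothetical names made up for illustration, and the stub's init_win_idx body is a placeholder, not a copy of the real method.

```python
# Stand-in classes, NOT the real TTS layers. They only mimic which
# attention type defines init_win_idx() and which does not.
class OriginalAttentionStub:
    def __init__(self):
        self.win_idx = None

    def init_win_idx(self):
        # Placeholder for the real windowing reset; just sets a marker value.
        self.win_idx = -1


class GravesAttentionStub:
    """Deliberately has no init_win_idx(), like the class in the traceback."""


def reset_window_if_supported(attention):
    """Call init_win_idx() only when the attention object defines it.

    Returns True if the method was called, False if it was skipped.
    """
    init = getattr(attention, "init_win_idx", None)
    if callable(init):
        init()
        return True
    return False


original = OriginalAttentionStub()
graves = GravesAttentionStub()

called_original = reset_window_if_supported(original)  # method exists, so it runs
called_graves = reset_window_if_supported(graves)      # method missing, so it is skipped
```

In layers/tacotron.py the same check would presumably wrap the bare self.attention.init_win_idx() call. Whether skipping that call is actually safe for GravesAttention is an assumption here, not something confirmed in this thread.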