Sounds good, but it's hard to tell whether it needs more training from just 3 words. Can you share a longer audio?
And the issue? How did you fix it?
Now I'm trying to adapt a new voice with just 3 h of data, fine-tuning the pretrained model of the two old voices (Tux and Epachuko); it's at 10k steps so far.
3h.zip (346.7 KB)
The model still needs more training; once it reaches at least about 25 thousand steps, I will start testing with longer sentences.
The noisy audio was generated by Tacotron during evaluation; those evaluation audios still come out the same.
I appreciate all your support; all the credit is yours.
Yes, you need to use Tacotron_model/inference/add as the output name.
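For reference, here is roughly where that output name comes into play when freezing the Tacotron graph. This is only a sketch, assuming a TF 1.x checkpoint; the checkpoint path is illustrative:

import tensorflow as tf
from tensorflow.python.framework import graph_util

# Illustrative path; point this at your own Tacotron checkpoint.
ckpt = "./checkpoints01/tacotron_model.ckpt-55000"

with tf.Session() as sess:
    # Rebuild the graph from the .meta file and restore the weights.
    saver = tf.train.import_meta_graph(ckpt + ".meta")
    saver.restore(sess, ckpt)
    # "Tacotron_model/inference/add" is the node holding the predicted
    # features, so it is named as the frozen graph's output.
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["Tacotron_model/inference/add"])
    tf.train.write_graph(frozen, ".", "tacotron_frozen.pb", as_text=False)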
Use the forum's DMs?
hi @carlfm01! I was trying to run synthesize.py from your tacotron-2 fork using your checkpoints, but it looks like the tacotron checkpoints are broken for me. Here is what I did: I downloaded them into the ./checkpoints01 folder. The checkpoints load correctly:
Loading checkpoint: ./checkpoints01/tacotron_model.ckpt-55000
INFO:tensorflow:Restoring parameters from ./checkpoints01/tacotron_model.ckpt-55000
But then I get some missing-variable errors:
NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Key Tacotron_model/inference/decoder/Location_Sensitive_Attention/attention_bias_1 not found in checkpoint
[[node save_5/RestoreV2 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Key Tacotron_model/inference/decoder/Location_Sensitive_Attention/attention_bias_1 not found in checkpoint
[[node save_5/RestoreV2 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[GroupCrossDeviceControlEdges_0/save_5/restore_all/_8]]
This is a full minimal repro notebook of what I am trying to do: https://colab.research.google.com/drive/1Ys6oWXIRUnGDYUVWYppiJXFOOUTHT-JN
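As a side note, one way to debug this kind of mismatch is to list the variables actually stored in the checkpoint and compare them with the keys the restore op complains about (a small TF 1.x snippet; the path is illustrative):

import tensorflow as tf

# Print every variable name (and shape) stored in the checkpoint so it
# can be compared against the missing keys from the error above.
for name, shape in tf.train.list_variables("./checkpoints01/tacotron_model.ckpt-55000"):
    print(name, shape)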
Hello @Solbiati_Alessandro, those checkpoints are old; please try with https://drive.google.com/file/d/1JSC0jbdnOi4igCYTnDBdMGXIsp2VeKj9/view and the newspanish branch. For the LPCNet, the old one will work.
Hello @carlfm01. Thank you very much for your detailed tutorial steps. However, I am not sure why it is necessary to copy the LPCNet-compressed wavs (the f32s) into the audio folder of the tacotron training data (steps 5 and 7 of your summary). Surely Tacotron only converts from text to MFCCs?
No, for LPCNet we need to train Tacotron on the real features extracted by the LPCNet extractor; that's why you need to put the extracted features into the audio directory.
Once Tacotron is trained, you can predict LPC features from text and feed them into LPCNet to generate the actual .wav.
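In case it helps to see what those files contain: a .f32 file is just a flat stream of 32-bit floats that you can reshape into frames. A small sketch; the frame width is fork-dependent (upstream LPCNet's dump_data writes 55 floats per frame), so treat NB_FEATURES and the path as assumptions:

import numpy as np

# Assumption: NB_FEATURES must match the LPCNet fork you extracted with
# (upstream LPCNet writes 55 float32 values per frame).
NB_FEATURES = 55

feats = np.fromfile("training_data/audio/sample.f32", dtype=np.float32)
feats = feats.reshape(-1, NB_FEATURES)
print(feats.shape)  # (num_frames, NB_FEATURES)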
Thank you.
What about training LPCNet? You suggest using the same training data as for Tacotron. However, a single audio file takes 10 min to process with dump_data and produces 4 GB of files…
Hello carlosfm, thanks for your contribution. I am trying to test in Google Colab and I get this error; how do I fix it?:
/tensorflow-1.15.0/python3.6/tensorflow_core/python/training/saving/saveable_object_util.py in op_list_to_dict(op_list, convert_variable_to_tensor)
    291   if name in names_to_saveables:
    292     raise ValueError("At least two variables have the same name: %s" %
--> 293                      name)
    294   names_to_saveables[name] = var
    295
ValueError: At least two variables have the same name: Tacotron_model/Tacotron_model/inference/decoder/Location_Sensitive_Attention/attention_bias/Adam
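The doubled Tacotron_model/Tacotron_model/... prefix usually means the graph was built twice in the same process, for example by re-running the synthesis cell in Colab. A common workaround, assuming TF 1.x, is to reset the default graph before constructing the model again (restarting the Colab runtime has the same effect):

import tensorflow as tf

# Clear the default graph so a re-run of the cell does not create a
# second copy of every Tacotron variable.
tf.reset_default_graph()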
Hello carlfm, thank you very much for sharing your work. I am new to this topic and I would like to know how to use your model (55k steps) with the new branch (new_spanish) to synthesize sentences in Spanish, because the old model (47.5k) returns audio with only noise. Thanks a lot.