Final results LPCNet + Tacotron2 (Spanish)

By the way, you can use erogol's notebook to perform a health check.

Thanks @carlfm01, I'll try. This notebook?

I could be wrong, but I think this notebook might be what @carlfm01 had in mind: https://github.com/mozilla/TTS/blob/master/dataset_analysis/AnalyzeDataset.ipynb

(it's linked from the wiki here: https://github.com/mozilla/TTS/wiki/Dataset)

It lets you look over the dataset and see things like audio length per character, which can highlight instances of bad audio.
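If it helps, here is a rough sketch (my own, not the notebook itself) of the kind of check it performs: flag clips whose seconds-per-character ratio is an outlier. The metadata path and LJSpeech-style `id|transcript` layout are assumptions, so adjust them to your dataset.

```python
import os
import librosa  # assumption: librosa is available; any audio loader works

METADATA = "metadata.csv"   # hypothetical "id|transcript" metadata file
WAV_DIR = "wavs"            # hypothetical folder holding the clips

ratios = []
with open(METADATA, encoding="utf-8") as f:
    for line in f:
        clip_id, text = line.strip().split("|")[:2]
        duration = librosa.get_duration(
            filename=os.path.join(WAV_DIR, clip_id + ".wav"))
        ratios.append((duration / max(len(text), 1), clip_id))

# Clips with unusually many seconds per character often have wrong
# transcripts, long silences, or broken audio.
for ratio, clip_id in sorted(ratios, reverse=True)[:20]:
    print(f"{clip_id}: {ratio:.3f} s/char")
```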

Thanks for answering @nmstoker.

Hello @carlfm01.

Indeed, they were faulty audio files. Thank you for your suggestion.

Now I receive the following error; I don't know if you have seen it before:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape [32,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform (defined in /home/manuel_garcia02/Tacotron-2/tacotron/models:13)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[node Tacotron_model/clip_by_global_norm/mul_38 (defined in /home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py:429)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
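In case it is useful, here is a minimal sketch of how that hint is usually applied in TF 1.x: pass a `RunOptions` with `report_tensor_allocations_upon_oom=True` to `sess.run()`. The op below is just a stand-in, not the actual Tacotron-2 training op.

```python
import tensorflow as tf  # TF 1.x, as used by this Tacotron-2 fork

# With this option set, TensorFlow dumps the live tensor allocations when
# the run hits an OOM, which helps identify the ops eating the memory.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf.random_uniform([32, 1024])  # stand-in op, not the real training op
with tf.Session() as sess:
    print(sess.run(x, options=run_options).shape)
```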

I already set tacotron_batch_size to 4, but I still get

Out of memory :confused:
What's your GPU model?

Do not change the batch_size; instead, sort the train.txt generated by preprocess.py and start removing the longest utterances from the file.

Remove a group, then try again, and keep going if it still fails with an OOM.
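Something along these lines should work (a sketch only; the train.txt location and the column holding the mel-frame count are assumptions, so check them against your own preprocess.py output):

```python
# Drop the longest utterances from a Tacotron-2 style train.txt.
FRAMES_COL = 4       # assumed index of the mel-frame count column
DROP_LONGEST = 50    # how many of the longest utterances to drop per attempt

with open("training_data/train.txt", encoding="utf-8") as f:
    lines = [l for l in f if l.strip()]

lines.sort(key=lambda l: int(l.split("|")[FRAMES_COL]))  # shortest first
kept = lines[:-DROP_LONGEST]

with open("training_data/train.txt", "w", encoding="utf-8") as f:
    f.writelines(kept)

print(f"kept {len(kept)} of {len(lines)} utterances")
```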

I have an NVIDIA Tesla V100. OK, I'll try that and let you know. Thank you.

You are correct, thanks for adding the link :slight_smile:

Thanks, the 16GB or 32GB version?

@carlfm01 16GB version

Hello @carlfm01.

Thank you for your answers; Tacotron training is now running normally. I have another question.

Is it normal for the audio generated during Tacotron training to sound like this?

http://www.mediafire.com/file/r7p3ggsbfqrqpud/step-2500-wave-from-mel.wav/file
(‘I’m still a new user’) :cry:

Thank you.

Please share your attention plot; it sounds like the attention is broken.

@carlfm01 this is my attention plot

Can you share a single file created by this script so I can check whether it is correct? Try using Mozilla Send.

What about the silence at the beginning and at the end? Long silences can hurt performance.

Did you change anything?

@carlfm01 here is an audio file.

I can't upload the file with Mozilla Send; I'm still a new user.
https://transfer.sh/5kb14/audio-archivo-156579483968273.npy
I didn't change anything.

audio-archivo-156579483968273.zip (164,8 KB)

I'm able to synthesize your training file, so your training format is correct. It could be a transcription/audio quality issue, I mean wrong transcriptions or empty audio, like the ones you removed before.
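A quick way to catch the empty-audio case is to flag clips whose RMS level is near zero, for example (the dataset layout and thresholds below are only illustrative and should be tuned per dataset):

```python
import glob
import numpy as np
from scipy.io import wavfile

for path in glob.glob("wavs/*.wav"):  # hypothetical dataset layout
    sr, data = wavfile.read(path)
    rms = np.sqrt(np.mean(data.astype(np.float64) ** 2)) if len(data) else 0.0
    # For int16 audio, an RMS near zero means the clip is essentially silent.
    if len(data) == 0 or rms < 1.0:
        print(f"possibly empty clip: {path} (samples={len(data)}, rms={rms:.2f})")
```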

The last things I removed were audio files with overly long phrases. I used erogol's notebook and removed those clips. The removed clips did contain audio, but they could not be processed; I don't know what the cause might be. Apart from that, I had already trained Tacotron without LPCNet, and this was the result.

Could it be that I resumed from a training checkpoint saved before I deleted the long sentences?

I think the problem is resuming training with different files. For now, I have started a new training run from scratch.

I'll discuss the attention plot later, when the two training runs reach the same number of steps.

Ok, let’s wait.

Yes, you need to delete the model trained with the wrong sentences.

Hello @carlfm01
Indeed, my attention plot doesn't look like it used to; this is the current one.

But the audio still has the same noise as before.

Can you share the generated features so I can test? It looks like it needs silence trimming at the end.
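For the trimming, something like this with librosa should do (a sketch only; the file name and top_db value are illustrative):

```python
import librosa
import soundfile as sf

# Trim leading and trailing silence from one clip; apply the same step to
# every clip before extracting features.
y, sr = librosa.load("clip.wav", sr=None)
trimmed, _ = librosa.effects.trim(y, top_db=30)
sf.write("clip_trimmed.wav", trimmed, sr)
```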

@carlfm01 Here is an audio sample synthesized with Tacotron and processed with LPCNet.

https://transfer.sh/HvMt2/test-out.wav