Final results LPCNet + Tacotron2 (Spanish)

By the way, you can use erogol's notebook to perform a health check.

Thanks @carlfm01, I'll try. This notebook?

I could be wrong, but I think this notebook might be what @carlfm01 had in mind: https://github.com/mozilla/TTS/blob/master/dataset_analysis/AnalyzeDataset.ipynb

(it's linked from the wiki here: https://github.com/mozilla/TTS/wiki/Dataset)

It lets you look over the dataset and see things like audio length per character, which can highlight instances of bad audio.
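If it helps, here is a rough sketch (my own, not the notebook itself) of the kind of check it performs: flag clips whose seconds-per-character ratio is an outlier. The metadata path and LJSpeech-style `id|transcript` layout are assumptions, so adjust them to your dataset.

```python
import os
import librosa  # assumption: librosa is available; any audio loader works

METADATA = "metadata.csv"   # hypothetical "id|transcript" metadata file
WAV_DIR = "wavs"            # hypothetical folder holding the clips

ratios = []
with open(METADATA, encoding="utf-8") as f:
    for line in f:
        clip_id, text = line.strip().split("|")[:2]
        duration = librosa.get_duration(
            filename=os.path.join(WAV_DIR, clip_id + ".wav"))
        ratios.append((duration / max(len(text), 1), clip_id))

# Clips with unusually many seconds per character often have wrong
# transcripts, long silences, or broken audio.
for ratio, clip_id in sorted(ratios, reverse=True)[:20]:
    print(f"{clip_id}: {ratio:.3f} s/char")
```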

Thanks for answering @nmstoker.

Hello @carlfm01.

Indeed, they were faulty audio files. Thank you for your suggestion.

Now I receive the following error; I don't know if you have seen it before:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape [32,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform (defined in /home/manuel_garcia02/Tacotron-2/tacotron/models:13)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[node Tacotron_model/clip_by_global_norm/mul_38 (defined in /home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py:429)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
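In case it is useful, here is a minimal sketch of how that hint is usually applied in TF 1.x: pass a `RunOptions` with `report_tensor_allocations_upon_oom=True` to `sess.run()`. The op below is just a stand-in, not the actual Tacotron-2 training op.

```python
import tensorflow as tf  # TF 1.x, as used by this Tacotron-2 fork

# With this option set, TensorFlow dumps the live tensor allocations when
# the run hits an OOM, which helps identify the ops eating the memory.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf.random_uniform([32, 1024])  # stand-in op, not the real training op
with tf.Session() as sess:
    print(sess.run(x, options=run_options).shape)
```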

I already set tacotron_batch_size to 4, but I still get

Out of memory :confused:
What's your GPU model?

Do not change the batch_size; instead, sort the train.txt generated by preprocess.py and start removing the longest utterances from the file.

Remove a group, then try again, and keep going if it still fails with an OOM.
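Something along these lines should work (a sketch only; the train.txt location and the column holding the mel-frame count are assumptions, so check them against your own preprocess.py output):

```python
# Drop the longest utterances from a Tacotron-2 style train.txt.
FRAMES_COL = 4       # assumed index of the mel-frame count column
DROP_LONGEST = 50    # how many of the longest utterances to drop per attempt

with open("training_data/train.txt", encoding="utf-8") as f:
    lines = [l for l in f if l.strip()]

lines.sort(key=lambda l: int(l.split("|")[FRAMES_COL]))  # shortest first
kept = lines[:-DROP_LONGEST]

with open("training_data/train.txt", "w", encoding="utf-8") as f:
    f.writelines(kept)

print(f"kept {len(kept)} of {len(lines)} utterances")
```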

I have an NVIDIA Tesla V100. OK, I'll try that and let you know. Thank you.

You are correct, thanks for adding the link :slight_smile:

Thanks, the 16GB or 32GB version?

@carlfm01 16GB version

Hello @carlfm01.

Thank you for your answers; Tacotron training is now running normally. I have another question.

Is it normal for the audio generated during Tacotron training to sound like this?

http://www.mediafire.com/file/r7p3ggsbfqrqpud/step-2500-wave-from-mel.wav/file
(‘I’m still a new user’) :cry:

Thank you.

Please share your attention plot; it sounds like the attention is broken.

@carlfm01 this is my attention plot

Can you share a single file created by this script so I can check whether it is correct? Try using Mozilla Send.

What about the silence at the beginning and at the end? Long silences can hurt performance.

Did you change anything?

@carlfm01 here is an audio file.

I can't upload the file with Mozilla Send; I'm still a new user.
https://transfer.sh/5kb14/audio-archivo-156579483968273.npy
I didn't change anything.

audio-archivo-156579483968273.zip (164,8 KB)

I'm able to synthesize your training file, so your training format is correct. It could be a transcription/audio quality issue, I mean wrong transcriptions or empty audio, like the ones you removed before.
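A quick way to catch the empty-audio case is to flag clips whose RMS level is near zero, for example (the dataset layout and thresholds below are only illustrative and should be tuned per dataset):

```python
import glob
import numpy as np
from scipy.io import wavfile

for path in glob.glob("wavs/*.wav"):  # hypothetical dataset layout
    sr, data = wavfile.read(path)
    rms = np.sqrt(np.mean(data.astype(np.float64) ** 2)) if len(data) else 0.0
    # For int16 audio, an RMS near zero means the clip is essentially silent.
    if len(data) == 0 or rms < 1.0:
        print(f"possibly empty clip: {path} (samples={len(data)}, rms={rms:.2f})")
```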

The last things I removed were audio files with overly long phrases. I used erogol's notebook and removed those clips. The removed clips did contain audio, but they could not be processed; I don't know what the cause might be. Apart from that, I had already trained Tacotron without LPCNet, and this was the result.

Could it be that I resumed from a training checkpoint saved before I deleted the long sentences?

I think the problem is resuming training with different files. For now, I have started a new training run from scratch.

I'll discuss the attention plot later, when the two training runs reach the same number of steps.

Ok, let’s wait.

Yes, you need to delete the model trained with the wrong sentences.

Hello @carlfm01
Indeed, my attention plot doesn't look like it used to; this is the current one.

But the audio still has the same noise as before.

Can you share the generated features so I can test? It looks like it needs silence trimming at the end.
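For the trimming, something like this with librosa should do (a sketch only; the file name and top_db value are illustrative):

```python
import librosa
import soundfile as sf

# Trim leading and trailing silence from one clip; apply the same step to
# every clip before extracting features.
y, sr = librosa.load("clip.wav", sr=None)
trimmed, _ = librosa.effects.trim(y, top_db=30)
sf.write("clip_trimmed.wav", trimmed, sr)
```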

@carlfm01 Here is an audio sample synthesized with Tacotron and processed with LPCNet.

https://transfer.sh/HvMt2/test-out.wav