Final results LPCNet + Tacotron2 (Spanish)

carlfm01 · September 20, 2019, 8:28am

With a good explanation yes, otherwise I think it may confuse people?

No. Instead of 80-dimensional mel spectrograms we only need 20-dimensional features, which in this case are the “cepstral features”. The LPC features are computed from the cepstral features we feed in.
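To make the 20-dimensional layout concrete, here is a minimal Python sketch that writes and reads feature frames of 20 values each. It assumes the features are stored as headerless 32-bit floats, one frame after another; that storage format, the constant `FEATURE_DIM`, and the helper names `write_frames` and `read_frames` are assumptions made for illustration, not something taken from the forks.

```python
from array import array

# Per the post: 20 values per frame instead of 80 mel bins.
FEATURE_DIM = 20


def write_frames(frames, path):
    """Flatten frames (each a sequence of FEATURE_DIM floats) into a
    headerless float32 file. The float32 layout is an assumption."""
    flat = array("f")
    for i, frame in enumerate(frames):
        if len(frame) != FEATURE_DIM:
            raise ValueError(
                f"frame {i} has {len(frame)} values, expected {FEATURE_DIM}"
            )
        flat.extend(float(v) for v in frame)
    with open(path, "wb") as f:
        flat.tofile(f)


def read_frames(path):
    """Read a headerless float32 file back into a list of 20-value frames."""
    flat = array("f")
    with open(path, "rb") as f:
        flat.frombytes(f.read())
    if len(flat) % FEATURE_DIM != 0:
        raise ValueError("file length is not a whole number of frames")
    return [
        list(flat[i:i + FEATURE_DIM])
        for i in range(0, len(flat), FEATURE_DIM)
    ]
```

The length check is the useful part: if the frame count does not come out as a whole number, the file was probably written with a different feature size.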

Quick summary:

1. We need to train LPCNet using raw PCM, 16 kHz, 16-bit, mono, without header, with all the audio files concatenated into a single file. There are scripts to do so in the fork.
2. We compile dump_data to extract the features from the audio.
3. Now we train LPCNet with the extracted features.
4. Now that training is complete, we compile dump_data again, this time with taco=1.
5. Extract the features we need for taco using the newly compiled dump_data, just like step 2.
6. Now, with the taco fork, we preprocess the dataset.
7. Now we replace the audio folder generated by the preprocess step with the extracted features.
8. If your feeder breaks, adapt it to match the names of the feature files.
9. Train tacotron as usual.
10. To predict, we reshape the features predicted by taco and save them to a file.
11. With the tool dump_lpcnet and the name of the trained model, we extract the network weights into 2 files, ‘nnet_data.c’ and ‘nnet_data.h’.
12. Move them into the src folder of LPCNet and run make test_lpcnet taco=1.
13. We run the compiled test_lpcnet with the name of the file predicted by tacotron and the output name where the raw PCM will be saved.
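The first step of the summary (one headerless 16 kHz, 16-bit, mono PCM file containing all the audio) can be sketched in Python using only the standard library. This is not the script from the fork; the function name `concat_wavs_to_raw_pcm` is hypothetical, and the sketch assumes the inputs are already WAV files in the target format (it refuses anything else rather than resampling).

```python
import wave

# Target format stated in the post: 16 kHz, 16-bit, mono.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)
CHANNELS = 1


def concat_wavs_to_raw_pcm(wav_paths, out_path):
    """Append the sample data of each WAV file to one headerless PCM file.

    Returns the total number of samples written.
    """
    total_samples = 0
    with open(out_path, "wb") as out:
        for path in wav_paths:
            with wave.open(path, "rb") as w:
                fmt = (w.getframerate(), w.getsampwidth(), w.getnchannels())
                if fmt != (SAMPLE_RATE, SAMPLE_WIDTH, CHANNELS):
                    raise ValueError(
                        f"{path}: expected 16 kHz/16-bit/mono, got {fmt}"
                    )
                # readframes returns only the samples, without the WAV header.
                data = w.readframes(w.getnframes())
            out.write(data)
            total_samples += len(data) // SAMPLE_WIDTH
    return total_samples
```

A call might look like `concat_wavs_to_raw_pcm(sorted(glob.glob("wavs/*.wav")), "all.pcm")`, where the folder and output names are placeholders.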

It looks hard, but once you get your hands on it, you will understand.

Forks used for my training runs:

The readme is easy to follow.

And for taco2 (spanish branch):

It needs cleanup.

The most important changes for taco are the hparams and: