TestFigures doesn't align

snake · April 1, 2020, 9:06am

I suspect this happens because the symbols in config.json and symbols.py differ.

config.json:

"characters": "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ",

symbols.py:

_characters = 'АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ!\'(),-.:;? '

sanjaesc · April 1, 2020, 9:25am

You can ignore symbols.py if you are using characters in config.json. If you inspect symbols.py, there is a make_symbols function that builds the symbols from config.json. And since you have

    "characters":"АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ",
    "punctuations":"!'(),-.:;? ",

in your config.json, it should work just fine.
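To illustrate the idea, here is a rough sketch of building the symbol table from those two config entries and checking a transcript against it. This is not the actual make_symbols code from the repo: build_symbols, uncovered_chars, and the "_" / "~" pad and end-of-sentence tokens are assumptions made up for this example.

```python
# Hypothetical sketch, NOT the real TTS make_symbols implementation.
# The pad ("_") and eos ("~") tokens are assumed for illustration only.
config = {
    "characters": "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ",
    "punctuations": "!'(),-.:;? ",
}


def build_symbols(characters, punctuations, pad="_", eos="~"):
    """Return the ordered symbol list: special tokens, characters, punctuation."""
    return [pad, eos] + list(characters) + list(punctuations)


def uncovered_chars(text, symbols):
    """Return the sorted characters of `text` that have no entry in `symbols`."""
    known = set(symbols)
    return sorted({ch for ch in text if ch not in known})


symbols = build_symbols(config["characters"], config["punctuations"])

# A Russian transcript is fully covered by the configured symbols.
print(uncovered_chars("Привет, мир!", symbols))  # []

# Latin text is not covered by a Cyrillic-only symbol set.
print(uncovered_chars("Hello", symbols))  # ['H', 'e', 'l', 'o']
```

Running a check like this over your metadata file should show quickly whether any transcript characters fall outside the configured set, whichever file the symbols end up coming from.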

I think the problem here is that you are using a multi-speaker dataset but have speaker embeddings disabled; see "use_speaker_embedding": false in your config.
Maybe you should try training on a single-speaker dataset first and see if that works out for you.

snake · April 1, 2020, 9:34am

I can’t ignore symbols.py because it’s used during dataset loading. If I leave symbols.py with English characters, training won’t start because the input will be empty…

sanjaesc · April 1, 2020, 9:48am

I mean you don’t need to modify symbols.py, because symbols.py uses the characters from config.json.

snake · April 1, 2020, 12:29pm

OK. I cloned the TTS repo again and restarted training with minor changes to the code. I haven’t changed symbols.py. We will see if it works out.

snake · April 2, 2020, 10:00am

Here are screenshots of the new training run.

Is it OK? Should I wait longer?

snake · April 3, 2020, 5:35am

After 100,000 steps.

As I understand it, the model trained well, but the synthesis code might have an issue generating the test snippets. I’ll look through that code.

snake · April 3, 2020, 5:41am

Logs during synthesizing

tts_1  |  | > Synthesizing test sentences
tts_1  |    | > Decoder stopped with 'max_decoder_steps
tts_1  |    | > Decoder stopped with 'max_decoder_steps
tts_1  |    | > Decoder stopped with 'max_decoder_steps
tts_1  |    | > Decoder stopped with 'max_decoder_steps

sanjaesc · April 3, 2020, 7:55am

Are you still training on the common_voice dataset? I listened to some of the audio and would say the quality is not good enough to train a TTS system because of the noise in it.

dkreutz · April 3, 2020, 8:25am

rnnoise removes a good amount of background noises (computer fan, electric hum, etc.).

snake · April 3, 2020, 8:59am

Are there any other options for Russian? I wouldn’t say the clips in Common Voice are noisy. At least not in the Russian dataset…

erogol · April 3, 2020, 10:18am

The training above looks broken. There might be a mismatch or a different problem at test time.

erogol · April 3, 2020, 10:19am

Have you tried rnnoise for LJSpeech?

snake · April 3, 2020, 10:37am

Here are some example clips from the dataset:
https://drive.google.com/open?id=1qlKYT6W4izLPzhw5UTFDZY1ftOxIsbOJ

dkreutz · April 3, 2020, 1:16pm

I have applied rnnoise to the German dataset spoken by @mrthorstenm.
In addition, I apply filtering with sox: highpass at 100 Hz and lowpass at 7000 Hz.

Listening to @snake’s examples, I think it wouldn’t hurt to apply rnnoise there.

mrthorstenm · April 3, 2020, 2:05pm

The optimization @dkreutz applied to my recordings really improved the quality. Many of my recordings had random noise or reverb problems before.

snake · April 3, 2020, 5:23pm

How can I use rnnoise? Running rnnoise_demo common_voice_ru_18849003.wav out.wav doesn’t seem to work for me.

>ffmpeg -i ./common_voice_ru_18849003.wav
Input #0, wav, from './common_voice_ru_18849003.wav':
  Metadata:
    encoder         : Lavf57.83.100
  Duration: 00:00:02.81, bitrate: 353 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s

> ffmpeg -i out.wav
out.wav: Invalid data found when processing input

dkreutz · April 4, 2020, 10:41pm

rnnoise_demo requires raw audio, not WAV. I have written a Python script that applies rnnoise and sox filtering as part of a preprocessing toolchain (not the nicest code, but it does its job):

from pathlib import Path
import subprocess

src = "/path/to/your/wav/files/"

rnn = "/path/to/rnnoise_demo"

# build the full list first so the temp files created below are never picked up by the glob
paths = sorted(Path(src).glob("*.wav"))

i = 0

for filepath in paths:
    i += 1
    print(str(filepath))
    subprocess.run(["sox", str(filepath), "48k.wav", "remix", "-", "rate", "48000"], check=True) # downmix to mono and resample to 48000Hz
    subprocess.run(["sox", "48k.wav", "-c", "1", "-r", "48000", "-b", "16", "-e", "signed-integer", "-t", "raw", "temp.raw"], check=True) # convert wav to headerless 16-bit PCM
    subprocess.run([rnn, "temp.raw", "rnn.raw"], check=True) # apply rnnoise
    subprocess.run(["sox", "-r", "48k", "-c", "1", "-b", "16", "-e", "signed-integer", "rnn.raw", "-t", "wav", "rnn.wav"], check=True) # convert raw back to wav
    # sox overwrites an existing output file by default, so no separate delete step is needed
    subprocess.run(["sox", "rnn.wav", str(filepath), "remix", "-", "highpass", "100", "lowpass", "7000", "rate", "22050"], check=True) # apply high/low pass filter, change sr to 22050Hz, overwrite the original
    #if i == 10: # for testing stop after ten files
    #    break

print("total: ", i)
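If you want to double-check the files afterwards without ffmpeg, something along these lines should do. It is only a sketch using Python's standard wave module; describe_wav and is_training_ready are names made up for this example, and the expected format (mono, 22050 Hz, 16-bit) is simply what the script above writes in its last sox step.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample) read from a WAV header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def is_training_ready(path, expected=(1, 22050, 16)):
    """True if the file matches the assumed target format: mono, 22050 Hz, 16-bit."""
    return describe_wav(path) == expected
```

A headerless file, such as raw output that was only given a .wav name, will typically make wave.open raise wave.Error instead of returning a description, which is a quick way to spot that kind of mix-up.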

baconator · April 5, 2020, 5:14am

I use pydub and https://github.com/Shb742/rnnoise_python
for doing rnnoise in bulk/scripts, but your way also translates easily to bash.

snake · April 6, 2020, 2:31pm

Thanks, man. Your script is great. It improved the alignment a bit, but there is still no diagonal line. I just wonder why there is a diagonal for val and train but not for test. Isn’t a diagonal on validation proof of a good model?