TestFigures doesn't align

I suspect this happens because the symbols in config.json and symbols.py differ.

config.json:

"characters": "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ",

symbols.py:

_characters = 'АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ!\'(),-.:;? '

You can ignore symbols.py if you are using the characters in config.json. If you inspect symbols.py, there is a make_symbols function, which builds the symbols from config.json. And since you have

    "characters":"АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ",
    "punctuations":"!'(),-.:;? ",

in your config.json, it should work just fine.
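To rule out a character mismatch, here is a minimal sketch that checks a transcript against the symbol set from the config quoted above. The helper name find_unsupported_chars is hypothetical and is not part of the TTS repo; it only does a set comparison on the characters and punctuations strings.

```python
# Values copied from the config.json quoted in this thread.
characters = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёЁ"
punctuations = "!'(),-.:;? "


def find_unsupported_chars(text, characters, punctuations):
    """Hypothetical helper: list the characters of `text` missing from the symbol set."""
    allowed = set(characters) | set(punctuations)
    return sorted(set(text) - allowed)


# Transcript line used purely as an illustration.
print(find_unsupported_chars("Щука плывёт.", characters, punctuations))
```

This prints ['Щ'], because the quoted characters string has no uppercase Щ (uppercase Ъ, Ы and Ь are absent as well). Whether that matters depends on whether your text is lowercased before it reaches the model.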

I think the problem here is that you are using a multi-speaker dataset but have speaker embeddings disabled; see "use_speaker_embedding": false in your config.
Maybe you should try training on a single-speaker dataset first and see if that works for you.

I can't ignore symbols.py because it's used during dataset loading. If I leave symbols.py with English characters, training won't start because the input will be empty…

I mean you don't need to modify symbols.py, because symbols.py uses the characters from config.json.

OK. I cloned the TTS repo again and restarted training with minor code changes. I haven't changed symbols.py. We will see if it works out.

Here are screenshots of the new training run.

Is it OK? Should I wait longer?

After 100,000 steps.


As I understand it, the model trained well, but the synthesis code might have an issue generating the test snippets. I'll look through this code.

Logs during synthesis:

tts_1  |  | > Synthesizing test sentences
tts_1  |    | > Decoder stopped with 'max_decoder_steps
tts_1  |    | > Decoder stopped with 'max_decoder_steps
tts_1  |    | > Decoder stopped with 'max_decoder_steps
tts_1  |    | > Decoder stopped with 'max_decoder_steps

Are you still training on the common_voice dataset? I listened to some of the audio, and I would say the quality is not good enough to train a TTS system because of the noise in it.


rnnoise removes a good amount of background noise (computer fan, electric hum, etc.).

Are there any other options for Russian? I wouldn't say the clips in Common Voice are noisy. At least not in the Russian dataset…

The training above looks broken. There might be a mismatch or some other problem at test time.

Have you tried rnnoise for LJSpeech?

Here is an example of the dataset clips:
https://drive.google.com/open?id=1qlKYT6W4izLPzhw5UTFDZY1ftOxIsbOJ

I have applied rnnoise to the German dataset spoken by @mrthorstenm.
In addition, I apply filtering with sox: a highpass at 100 Hz and a lowpass at 7000 Hz.

Listening to @snake's examples, I think it wouldn't hurt to apply rnnoise there.


The optimization @dkreutz applied to my recordings really improved the quality. Many of my recordings had random noise or reverb problems before.

How can I use rnnoise? Running rnnoise_demo common_voice_ru_18849003.wav out.wav doesn't seem to work for me.

>ffmpeg -i ./common_voice_ru_18849003.wav
Input #0, wav, from './common_voice_ru_18849003.wav':
  Metadata:
    encoder         : Lavf57.83.100
  Duration: 00:00:02.81, bitrate: 353 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s

> ffmpeg -i out.wav
out.wav: Invalid data found when processing input

rnnoise_demo requires raw audio, not wav. I have written a Python script that does rnnoise and sox filtering as part of a preprocessing toolchain (not the nicest code, but it does its job):

from pathlib import Path
import subprocess

src = "/path/to/your/wav/files/"
rnn = "/path/to/rnnoise_demo"

count = 0

for filepath in sorted(Path(src).glob("*.wav")):
    count += 1
    print(filepath)
    # Downmix to mono and upsample to 48000 Hz
    subprocess.run(["sox", str(filepath), "48k.wav", "remix", "-", "rate", "48000"], check=True)
    # Convert wav to headerless 16-bit signed raw PCM
    subprocess.run(["sox", "48k.wav", "-c", "1", "-r", "48000", "-b", "16", "-e", "signed-integer", "-t", "raw", "temp.raw"], check=True)
    # Apply rnnoise
    subprocess.run([rnn, "temp.raw", "rnn.raw"], check=True)
    # Convert raw back to wav (raw input has no header, so its format is given explicitly)
    subprocess.run(["sox", "-t", "raw", "-r", "48k", "-c", "1", "-b", "16", "-e", "signed-integer", "rnn.raw", "-t", "wav", "rnn.wav"], check=True)
    # Apply high/low pass filters, resample to 22050 Hz, and overwrite the original file
    # (sox replaces an existing output file, so no separate rm is needed)
    subprocess.run(["sox", "rnn.wav", str(filepath), "remix", "-", "highpass", "100", "lowpass", "7000", "rate", "22050"], check=True)
    # if count == 10:  # for testing, stop after ten files
    #     break

print("total:", count)
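If sox is not available, the wav-to-raw and raw-to-wav steps can also be sketched with Python's standard wave module. This is only an illustration of why the plain rnnoise_demo call on a .wav file produced an unreadable out.wav: the tool reads and writes headerless PCM. The function names wav_to_raw and raw_to_wav are my own, and the sketch does not resample, so the input would still have to be 48 kHz, 16-bit mono before going through rnnoise_demo.

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Write the bare PCM samples of a WAV file; return (channels, sample_width, frame_rate)."""
    with wave.open(str(wav_path), "rb") as wav:
        params = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        frames = wav.readframes(wav.getnframes())
    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return params


def raw_to_wav(raw_path, wav_path, channels, sample_width, frame_rate):
    """Wrap headerless PCM samples in a WAV header so players and ffmpeg can read them."""
    with open(raw_path, "rb") as raw:
        frames = raw.read()
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(frames)
```

Usage would be something like params = wav_to_raw("in.wav", "in.raw"), then running rnnoise_demo on in.raw, then raw_to_wav("out.raw", "out.wav", *params).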

I use pydub and https://github.com/Shb742/rnnoise_python
to run rnnoise in bulk from scripts, but your approach also translates easily to bash.


Thanks, man. Your script is great. It improved the alignment a bit, but there is still no diagonal line. I just wonder why there is a diagonal for val and train but not for test. Isn't a diagonal on validation proof of a good model?