Normalisation (sound_norm)

nmstoker · October 30, 2019, 2:42pm

@erogol I see in the recent PR you’ve added several new features from dev, including “sound_norm” in the AudioProcessor.

From what I can see this normalises the output audio when it’s loaded, but it isn’t (yet) set in config. This simply normalises the volume level, right? Whereas the spectrogram normalisation ensures the spectrogram is a consistent size, right?

Is sound_norm ready for people to have a go at experimenting with different values for it? Happy to give it a shot, but my GPU resources are limited so I thought I’d ask first.

alchemi5t · October 31, 2019, 4:09am

Do you mean input? I’ve been trying to normalize input audio volume to have a better training phase. Also, is normalising the spectrogram better than normalising the audio? I didn’t quite get that bit you stated about that.

nmstoker · October 31, 2019, 10:37am

Sorry, yes the input audio.

nmstoker · October 31, 2019, 1:38pm

I’m probably not the best person to explain the spectrogram normalisation, but I was looking at the code for functions like these, which end with a normalisation step (it appears to be applied to the spectrogram, but I could be reading that wrongly!)

github.com

mozilla/TTS/blob/50088cbf3ba60b139692fa1666bda9f116980cad/utils/audio.py#L150


    return scipy.signal.lfilter([1], [1, -self.preemphasis], x)

def spectrogram(self, y):
    if self.preemphasis != 0:
        D = self._stft(self.apply_preemphasis(y))
    else:
        D = self._stft(y)
    S = self._amp_to_db(np.abs(D)) - self.ref_level_db
    return self._normalize(S)

def melspectrogram(self, y):
    if self.preemphasis != 0:
        D = self._stft(self.apply_preemphasis(y))
    else:
        D = self._stft(y)
    S = self._amp_to_db(self._linear_to_mel(np.abs(D))) - self.ref_level_db
    return self._normalize(S)

def inv_spectrogram(self, spectrogram):
    """Converts spectrogram to waveform using librosa"""
    S = self._denormalize(spectrogram)
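
The preview above does not show the bodies of `_normalize` and `_denormalize`, so here is a rough sketch of what a normalisation step of this kind commonly looks like in Tacotron-style code: the dB-scaled spectrogram is mapped from an assumed floor up to 0 dB onto the range [0, 1]. The function names, the `MIN_LEVEL_DB` value and the clipping behaviour are assumptions for illustration, not the repo's actual implementation.

```python
import numpy as np

# Assumed dB floor for illustration; not taken from the repo's config.
MIN_LEVEL_DB = -100.0


def normalize(S_db, min_level_db=MIN_LEVEL_DB):
    """Map a dB-scaled spectrogram from [min_level_db, 0] onto [0, 1]."""
    return np.clip((S_db - min_level_db) / -min_level_db, 0.0, 1.0)


def denormalize(S_norm, min_level_db=MIN_LEVEL_DB):
    """Invert normalize() for values that were not clipped."""
    return np.clip(S_norm, 0.0, 1.0) * -min_level_db + min_level_db


# Toy "spectrogram" in dB, with values below the floor and above 0 dB.
S_db = np.array([[-120.0, -100.0, -50.0],
                 [-25.0, 0.0, 10.0]])
S_norm = normalize(S_db)
S_back = denormalize(S_norm)
```

The point is that this operates on spectrogram values, so every example fed to the model falls in the same numeric range regardless of how loud the original recording was in absolute dB terms.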

alchemi5t · October 31, 2019, 2:00pm

Ah, interesting! I’ll invest some time in it tomorrow. Also interested in knowing if the volume normalization is ready to be used.

erogol · October 31, 2019, 3:35pm

Yes, sound norm is ready to use, I just forgot to add it to config.json. And all your assumptions are correct.

‘sound_norm’ is useful especially for multi-speaker training.

PS. we need to implement a better config scheme. ‘config.json’ is getting cumbersome to use.
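
For readers following the thread, the volume normalisation discussed here can be sketched as simple peak normalisation applied to the waveform right after loading. This is a minimal illustration under assumptions: `peak_normalize` and the `target_peak` of 0.9 are hypothetical choices, not confirmed from the repo, and the two "speakers" are synthetic sine waves standing in for recordings made at different levels.

```python
import numpy as np


def peak_normalize(wav, target_peak=0.9):
    """Scale a waveform so its largest absolute sample equals target_peak."""
    peak = np.abs(wav).max()
    if peak == 0:
        return wav  # silent clip: nothing to scale
    return wav / peak * target_peak


# Two synthetic clips: same content, very different recording levels.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
quiet_speaker = 0.05 * np.sin(2 * np.pi * 220.0 * t)
loud_speaker = 0.80 * np.sin(2 * np.pi * 220.0 * t)

quiet_norm = peak_normalize(quiet_speaker)
loud_norm = peak_normalize(loud_speaker)
```

After this step both clips peak at the same level, which is presumably why it helps in the multi-speaker case: differences in recording volume between speakers are removed before the spectrograms are computed.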