Normalisation (sound_norm)

@erogol I see in the recent PR you’ve added several new features from dev, including “sound_norm” in the AudioProcessor.

From what I can see, this normalises the output audio when it’s loaded, but it isn’t (yet) set in the config. This simply normalises the volume level, whereas the spectrogram normalisation keeps the spectrogram values on a consistent scale, right?
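For illustration, here is a minimal sketch of what load-time volume normalisation could look like, assuming `sound_norm` rescales each clip by its peak absolute amplitude. The function name `peak_normalize` and the 0.9 headroom factor are assumptions for this example, not taken from the AudioProcessor code.

```python
import numpy as np


def peak_normalize(wav: np.ndarray, peak: float = 0.9) -> np.ndarray:
    """Hypothetical helper: rescale a waveform so its largest absolute sample equals `peak`."""
    max_abs = np.abs(wav).max()
    if max_abs == 0:
        # A silent clip has no peak to scale to; return it unchanged.
        return wav
    return wav / max_abs * peak


# The same utterance recorded at two different gains.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clip = np.sin(2 * np.pi * 220 * t)
quiet, loud = 0.1 * clip, 0.8 * clip

# After peak normalisation both versions match and peak at (roughly) 0.9.
print(np.allclose(peak_normalize(quiet), peak_normalize(loud)))
print(np.abs(peak_normalize(quiet)).max())
```

Peak scaling only removes an overall gain difference; it does not change the relative loudness within a clip.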

Is sound_norm ready for people to have a go at experimenting with different values? Happy to give it a shot, but my GPU resources are limited, so I thought I’d ask first 🙂

Do you mean input? I’ve been trying to normalize the input audio volume to get a better training phase. Also, is normalising the spectrogram better than normalising the audio? I didn’t quite follow what you said about that.

Sorry, yes the input audio.

I’m probably not the best person to explain the spectrogram normalisation, but I was going by the code for functions like this one, which ends with a normalisation step (it appears to be applied to the spectrogram, but I could be reading that wrongly!)
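To make the spectrogram side concrete, here is a rough sketch of one common scheme: convert magnitudes to decibels, then map a fixed dB window onto [0, 1]. The names `amp_to_db` and `normalize_spec`, and the values `min_level_db = -100` and `eps = 1e-5`, are assumptions, not confirmed from the code referred to above. The last line hints at why the two normalisations are complementary: a gain change on the waveform survives as a constant offset in the normalised spectrogram.

```python
import numpy as np


def amp_to_db(mag, eps=1e-5):
    # Convert linear magnitudes to decibels; eps floors silence at -100 dB.
    return 20.0 * np.log10(np.maximum(eps, mag))


def normalize_spec(spec_db, min_level_db=-100.0):
    # Map dB values in [min_level_db, 0] linearly onto [0, 1], clipping the rest.
    return np.clip((spec_db - min_level_db) / -min_level_db, 0.0, 1.0)


mag = np.array([[0.5, 0.1, 0.01]])             # toy magnitude "spectrogram"
base = normalize_spec(amp_to_db(mag))
quieter = normalize_spec(amp_to_db(0.5 * mag))  # same content, about 6 dB quieter
print(base)            # every value lies in [0, 1]
print(base - quieter)  # roughly 0.06 in every (unclipped) bin
```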


Ah, interesting! I’ll invest some time in it tomorrow. I’m also interested in knowing whether the volume normalization is ready to be used.

Yes, sound_norm is ready to use; I just forgot to put it in config.json. And all your assumptions are correct.

‘sound_norm’ is especially useful for multi-speaker training.
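A hedged sketch of how the flag might be wired from `config.json` into loading, and why it helps with multi-speaker data: clips from speakers recorded at different gains end up with the same peak. Placing `"sound_norm"` under an `"audio"` section, the `SketchAudioProcessor` class and its `prepare` method are all hypothetical stand-ins, not the real TTS config layout or API.

```python
import json

import numpy as np

# Hypothetical config fragment; the "audio" section is an assumption.
CONFIG_JSON = '{"audio": {"sound_norm": true}}'


class SketchAudioProcessor:
    """Illustrative stand-in for an audio processor; not the real TTS class."""

    def __init__(self, sound_norm=False):
        self.sound_norm = sound_norm

    def prepare(self, wav):
        # When the flag is on, peak-normalise each clip so speakers recorded
        # at different gains reach the same maximum amplitude (0.9 is assumed).
        if self.sound_norm:
            max_abs = np.abs(wav).max()
            if max_abs > 0:
                wav = wav / max_abs * 0.9
        return wav


config = json.loads(CONFIG_JSON)
ap = SketchAudioProcessor(**config["audio"])

t = np.linspace(0.0, 1.0, 22050, endpoint=False)
speaker_a = 0.05 * np.sin(2 * np.pi * 220 * t)  # quiet recording
speaker_b = 0.70 * np.sin(2 * np.pi * 150 * t)  # loud recording
peaks = [np.abs(ap.prepare(w)).max() for w in (speaker_a, speaker_b)]
print(peaks)  # both peaks are (roughly) 0.9
```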

PS: we need to implement a better config scheme; ‘config.json’ is getting cumbersome to use.
