Data Augmentation clarification

Hello everyone,
After enabling the augmentation flags, I noticed that my steps per epoch stayed the same.
Does that mean each augmented audio clip is merged into a batch together with the original one, or am I missing the point of augmenting the data?

The flags I used are:
--data_aug_features_additive=0.3
--data_aug_features_multiplicative=0.3
--augmentation_freq_and_time_masking=True
--augmentation_speed_up_std=0.3
--augmentation_pitch_and_tempo_scaling=True

Yes. The data augmentation flags transform some of your data without creating more data. They are usually set when you already have enough data and want the model to generalize better, so that you get better results at test time.

I understand, but my question was whether the sound files are duplicated and merged into a batch, or whether there is a different mechanism. What happens to the original file? Is the model trained on that too? How does that work? @lissyx @reuben

Thank you in advance

It augments the data at runtime, in memory, and does not save the augmented audio to disk.
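To illustrate why the steps per epoch stay the same, here is a minimal sketch of on-the-fly augmentation. It is not the DeepSpeech implementation: `add_noise` and `epoch_batches` are hypothetical helper names, and plain Gaussian noise stands in for whatever transform a given flag applies. Each sample is transformed in memory as it is batched, so the number of samples (and batches) does not change and the stored data is never modified.

```python
import random


def add_noise(features, stddev, rng):
    """Return a noisy copy of `features`; the input list is left untouched."""
    return [x + rng.gauss(0.0, stddev) for x in features]


def epoch_batches(dataset, batch_size, stddev, rng):
    """Yield the batches of one epoch, augmenting every sample in memory."""
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        # The augmented copy replaces the sample inside the batch only;
        # nothing is appended to the dataset or written to disk.
        yield [add_noise(sample, stddev, rng) for sample in batch]


# Hypothetical toy dataset: 10 samples with 4 feature values each.
dataset = [[float(i)] * 4 for i in range(10)]
rng = random.Random(0)

steps_plain = len(list(epoch_batches(dataset, 4, 0.0, rng)))
steps_augmented = len(list(epoch_batches(dataset, 4, 0.3, rng)))
print(steps_plain, steps_augmented)
```

Because the random transform is drawn again on every pass, the model sees a slightly different version of each sample in every epoch, which is where the regularizing effect would come from in a scheme like this.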

@kdavis: Is there a link where I can find the best hyperparameters for all the augmentation flags?
I tried augmentation with these flags but had little success.

AUG_AUDIO="--data_aug_features_additive 0.2 --data_aug_features_multiplicative 0.2 --augmentation_speed_up_std 0.2"
AUG_FREQ_TIME="--augmentation_freq_and_time_masking --augmentation_freq_and_time_masking_freq_mask_range 5 --augmentation_freq_and_time_masking_number_freq_masks 3 --augmentation_freq_and_time_masking_time_mask_range 2 --augmentation_freq_and_time_masking_number_time_masks 3"
AUG_PITCH_TEMPO="--augmentation_pitch_and_tempo_scaling --augmentation_pitch_and_tempo_scaling_min_pitch 0.95 --augmentation_pitch_and_tempo_scaling_max_pitch 1.2 --augmentation_pitch_and_tempo_scaling_max_tempo 1.2"
AUG_SPEC_DROP="--augmentation_spec_dropout_keeprate 0.9"

Augmentation hasn’t been widely tested and is still considered experimental. So you’ll have to experiment and see what works best for your dataset.

Coming back to this topic after a while.
Is there a way to know how much of my dataset will be augmented?
Is there a percentage?

@lissyx
I have a question about overlay augmentation. What is the format of the CSV file that I pass to this augmentation:

 --augment overlay[p=0.5,source=noise.csv,layers=1,snr=50:20~10]

I mean the source argument. Is it a CSV file with a wav_filename column containing relative paths to my noise files?
Or am I totally wrong?

Anyone? I get an error when I try to use this (overlay) augmentation.
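For anyone experimenting with the source file, here is a rough sketch of one way to generate a noise CSV. It rests on an assumption the thread does not confirm: that the overlay source uses the same column convention as the training CSVs (a `wav_filename` column plus a `wav_filesize` column, with no transcript). The helper name `build_noise_csv` is hypothetical, and the paths are written relative to the CSV's own directory as another assumption; check the augmentation documentation for your version before relying on it.

```python
import csv
import os


def build_noise_csv(noise_dir, csv_path):
    """Write one row per .wav file in noise_dir; return the number of rows."""
    csv_dir = os.path.dirname(os.path.abspath(csv_path))
    rows = []
    for name in sorted(os.listdir(noise_dir)):
        if name.lower().endswith(".wav"):
            full_path = os.path.abspath(os.path.join(noise_dir, name))
            rows.append({
                # Assumed columns, mirroring the training CSV layout.
                "wav_filename": os.path.relpath(full_path, csv_dir),
                "wav_filesize": os.path.getsize(full_path),
            })
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["wav_filename", "wav_filesize"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `build_noise_csv("noise", "noise.csv")` would then produce a file that could be passed as `source=noise.csv`, provided the assumed layout matches what the overlay augmentation expects.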