Noise injection training experiment

Hey, I’m planning an experiment with DeepSpeech v0.8.2, using three different magnitudes of the `add` augmentation during training, then comparing the three resulting models on three test sets that each contain a different amount of noise.

As I am using Google Colab, I have limited disk space and can’t use the 50 GB Common Voice dataset for training.

My question is: can I somehow split that dataset so that I’m only using, for example, 10 GB of the data in it?

Also, should the testing of the models be done with inference, or via the test_files while training?

Furthermore, for testing the models, should I search for datasets which already contain noise, or should I create noisy data from a clean dataset?

Thanks in advance!

Look at the raw data: you have a CSV file that links to all the audio files, so you can write a script that randomly selects files up to the size limit you need.
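A minimal sketch of such a script, assuming a DeepSpeech-style CSV with a `wav_filesize` column (the file names and the 10 GB limit below are made-up placeholders, adjust them to your actual files):

```python
import csv
import random

def select_subset(rows, max_bytes):
    """Randomly pick CSV rows until the summed wav_filesize reaches max_bytes."""
    shuffled = rows[:]
    random.shuffle(shuffled)
    picked, total = [], 0
    for row in shuffled:
        size = int(row["wav_filesize"])
        if total + size <= max_bytes:
            picked.append(row)
            total += size
    return picked, total

def shrink_csv(in_csv, out_csv, max_bytes):
    """Write a size-limited copy of a training CSV, keeping all columns."""
    with open(in_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    picked, total = select_subset(rows, max_bytes)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(picked)
    return total
```

Something like `shrink_csv("train.csv", "train_small.csv", 10 * 1024**3)` would then give you a roughly 10 GB subset; the unselected audio files can simply be deleted afterwards.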

Testing has (almost) nothing to do with training. You can train your models and test them afterwards or use classic inference with some custom testing method.

Good question, ideally you find enough tracks that are somewhat noisy. Or you create them artificially. @lissyx any ideas on that one?

nope, sorry, no idea

So I guess for testing you have to experiment a bit. My guess: take maybe a few different noise types and run some tests to find out what your model can handle.
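If you go the artificial route, a common approach is to mix a noise track into the clean speech at a chosen signal-to-noise ratio. A minimal NumPy sketch (the function name and the plain power-based SNR definition here are my own, not anything from DeepSpeech):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Return speech with noise mixed in at the given SNR (in dB).

    Both inputs are 1-D arrays of raw samples; the noise is tiled or
    trimmed to the speech length, then scaled so that the ratio of mean
    speech power to mean noise power equals 10 ** (snr_db / 10).
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Running the clean test set through this at a few SNRs (e.g. 20, 10, 0 dB) would give you graded noisy test sets from a single clean one.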

Thank you for the quick response!!

Okay, I will try that: download the dataset to my computer instead of Google Colab, then upload only the newly created, limited dataset.

Question about this: on Google Colab I would download it with `!wget`. I don’t know if it’s possible, but if I could limit the amount downloaded and get maybe only 15 GB of the en.tar.gz file, would I be able to use it for training? Or would it definitely be missing something important?

We really can’t help on third party tooling: we don’t use it, we have no experience on it.

From what I understand, you would just get a truncated archive that cannot be extracted.

I see, thank you! I will try splitting the dataset then and see how it goes.

Oh, also: if I were to do the experiment with a different language, would that lead to more inconsistent data because of how the language itself is structured? If not, could I just do the experiment with any language, as I am only interested in the correlation between noisy-speech recognition and the amount of noise added to the training data?

Why don’t you use the LJ Speech dataset instead? It is clean to start with and you can add noise as you please.

https://keithito.com/LJ-Speech-Dataset/

Oh nice, thank you! Can I simply use `!bin/import_cv2.py` on this, or do I have to process the data myself?

Hm, check the other import scripts, as that one is specific to Common Voice. @lissyx, do you know which import script might be suitable for LJ Speech?

Sorry, but I don’t think we have an importer. @arpi.aszalos, if you want to write one, don’t hesitate; it’s not super complicated, and you could send a PR for it.
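In case it helps, here is a very rough sketch of what such an importer needs to do, based on the public LJ Speech layout (pipe-delimited `metadata.csv` with `id|raw|normalized` columns and wavs in a `wavs/` folder). This is not an official DeepSpeech importer, just an illustration:

```python
import csv
import os

def convert_metadata(metadata_path, wav_dir, out_csv):
    """Convert LJ Speech metadata.csv (id|raw transcript|normalized transcript)
    into a DeepSpeech-style CSV (wav_filename,wav_filesize,transcript)."""
    with open(metadata_path, encoding="utf-8") as f_in, \
         open(out_csv, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for line in f_in:
            file_id, _raw, normalized = line.rstrip("\n").split("|")
            wav_path = os.path.join(wav_dir, file_id + ".wav")
            # Note: LJ Speech audio is 22050 Hz; DeepSpeech expects 16 kHz
            # mono wavs, so resample first (e.g. with sox) and point wav_dir
            # at the resampled copies. Transcripts must also be restricted
            # to the characters in your alphabet file.
            writer.writerow([wav_path, os.path.getsize(wav_path),
                             normalized.lower()])
```

The real importers under `bin/` do more validation (character filtering, skipping too-long/too-short clips), so they are worth reading even if this sketch is the core of it.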

Thanks for answering! I did have a look at the import scripts, but I don’t think I can write an importer, as I am still quite new to programming; most of the code doesn’t make much sense to me.

You might be interested in this pull request: https://github.com/mozilla/DeepSpeech/pull/2622. It allowed you to mix noise and speech online into your test set, but it’s somewhat older (around v0.7) and no longer maintained.

I already did some noise tests before, you can find my setup steps here: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot#download-and-prepare-noise-data and the results in the tables below.


Yes, thank you @dan.bmh, I started reading; however, I wouldn’t use the background-noise augmentation. Maybe I’m misunderstanding, but doesn’t the `add` augmentation add random noise to the training data in the form of values added to the spectrograms?

I will check the pull request and your setup, thanks for reaching out.

Do you think I could get data showing some correlation if I train for only 3–4 hours (using a GPU)? This would be on the English VoxForge dataset.

Both the current DeepSpeech master and the pull request have flags to augment with noise audio (using the standard CSV format); it’s called overlay augmentation in master.
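For reference, the training flags look roughly like this on current master (the values below are made up, and the exact syntax changed around 0.8, so check the training docs for your version):

```shell
# Hypothetical values -- tune p, snr and stddev for your experiment.
# overlay mixes in audio listed in noise.csv; add injects Gaussian noise
# into the feature vectors.
python3 DeepSpeech.py \
  --train_files train.csv --dev_files dev.csv --test_files test.csv \
  --augment "overlay[p=0.5,source=noise.csv,layers=1,snr=12]" \
  --augment "add[p=0.5,stddev=0.2]"
```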

I ran my tests with the German VoxForge dataset (~32 h) and saw some improvement on noisy tests when I also trained with noise (0.43 → 0.37 WER).

@dan.bmh, did you also do an inference test with a different noisy German dataset, or is this just from the test.csv file after training finished? Sorry for asking so many questions, but do you know by any chance how much data is in the English VoxForge dataset? I can’t seem to find it anywhere. Thanks.

I used the VoxForge test set, and I also split my own voice dataset into train/dev/test.
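A split like that takes only a few lines once you have the CSV rows in memory (the 80/10/10 fractions and the seed below are arbitrary choices):

```python
import random

def split_rows(rows, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows and split them into train/dev/test portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```

Each portion can then be written to its own CSV and passed to `--train_files`, `--dev_files`, and `--test_files`.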

No idea. If you use the preparation steps from my repository, it will print the length (during dataset cleaning).