Certainly, Nikhil (but please don’t judge my code too harshly)
The steps are:
1. Get the files down from AWS to somewhere local
2. Run the import script to convert them from .mp3 to .wav and generate the .csv files
3. Run the training script
For step 1, I used the AWS CLI: https://github.com/aws/aws-cli
You need to set up your credentials so that it stores them locally (for example with aws configure). Then you can just navigate to a download folder and run something equivalent to this:
aws s3 sync s3://your-voice-web-bucket .
You’ll see a whole load of your files download (very quickly, if your experience is anything like mine).
For step 2, I used a script I’d cobbled together, mainly from the other import scripts. The gist is here:
You run something equivalent to:
python import_s3_files.py ../your-local-voice-web-bucket-folder/ ./data
That walks your local bucket folder and goes through the paired-up Common Voice transcripts and .mp3 files, cleaning up the text of the former and converting the latter into .wav files in a data folder. It then creates a .csv file for each of training, dev and test (in that same data folder).
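To give a rough idea of the text clean-up and .csv side of that step, here is a minimal sketch. It is not the gist itself: the function names, the clean-up rules (lowercase, letters and apostrophes only) and the 80/10/10 split are my assumptions for illustration, and the .mp3 to .wav conversion is left out. The column names follow what the other DeepSpeech import scripts write.

```python
import csv
import re


def clean_transcript(text):
    """Lowercase the text and keep only a-z, apostrophes and single spaces.

    These rules are an assumption; adjust them to match your alphabet file.
    """
    text = text.lower().replace("’", "'")
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_rows(rows, dev_fraction=0.1, test_fraction=0.1):
    """Split rows into train/dev/test by position (no shuffling)."""
    n_dev = int(len(rows) * dev_fraction)
    n_test = int(len(rows) * test_fraction)
    n_train = len(rows) - n_dev - n_test
    return {
        "train": rows[:n_train],
        "dev": rows[n_train:n_train + n_dev],
        "test": rows[n_train + n_dev:],
    }


def write_csv(path, rows):
    """Write (wav_filename, wav_filesize, transcript) rows with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        writer.writerows(rows)
```

You would build one row per converted .wav file (path, size in bytes from os.path.getsize, cleaned transcript), pass the list through split_rows, and call write_csv once for each of the three sets.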
NB: one problem with my bucket is that a handful of transcript files have no corresponding .mp3 files. I should clean them up properly, but for now I just delete those transcripts after I sync.
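If you hit the same problem, something like the following could list the stray transcripts before you delete them. It is only a sketch: it assumes each transcript is a .txt file sitting next to an .mp3 with the same base name, which may not match how your bucket is laid out, and find_orphan_transcripts is a made-up name.

```python
from pathlib import Path


def find_orphan_transcripts(bucket_dir, transcript_ext=".txt", audio_ext=".mp3"):
    """Return transcript paths that have no audio file with the same base name.

    Assumes transcript and audio share a folder and differ only by extension.
    """
    root = Path(bucket_dir)
    orphans = []
    for transcript in sorted(root.rglob(f"*{transcript_ext}")):
        if not transcript.with_suffix(audio_ext).exists():
            orphans.append(transcript)
    return orphans


# Example usage (the folder name is a placeholder):
# for path in find_orphan_transcripts("../your-local-voice-web-bucket-folder/"):
#     print(path)      # check the list first
#     # path.unlink()  # then uncomment to delete
```

Printing the list first and only then deleting seems the safer order, since a wrong extension assumption would otherwise flag every transcript as an orphan.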
For step 3, I run this script, which is based on the other examples provided: https://gist.github.com/nmstoker/780bbf16a199007e3dff594f22e36d04
So far I’m getting fairly good results, but I need to create more Common Voice records (I’ve done about 1,800 or so), and I’ve no doubt got lots to learn about how best to tweak the DeepSpeech settings.
I hope that helps - it’s a start, but there’s a lot that could be improved (easily!) Big thanks to the Mozilla teams for making both Common Voice and DeepSpeech so awesome!!