Problem when training my own model

Hi all

I would like to train my own model based on open source data, more exactly “https://commonvoice.mozilla.org/data”.

When I launch bin/import_cv2.py --filter_alphabet path/to/some/alphabet.txt /path/to/extracted/language/archive, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'fr/clips/train-all.csv'

Is there a solution, please?

Please read the script to understand more; this is still a work in progress. And as usual, read the output. It looks in the “fr” directory, which should work for French data. But as you didn’t supply us with more info, I don’t know whether this should be fine:

Thank you for your reply. My problem is that the downloaded data does not contain the file train-all.csv in the clips folder, and I have no idea how to create this file (it is not mentioned in the doc).

Try audiomate, it takes several inputs and can produce valid DeepSpeech data.

Or read the import script; either way it is a bit of work. The train file is just a CSV listing all the files for training, but you also need a dev file.
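To make the train/dev idea concrete, here is a rough sketch of writing such CSVs by hand. It assumes the column layout wav_filename, wav_filesize, transcript, and that you already have (wav path, transcript) pairs. The helper names split_rows and write_csv are made up for illustration and are not part of import_cv2.py.

```python
import csv
import os
import random

# Assumed column layout for the training CSVs; check your DeepSpeech version.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]


def split_rows(rows, dev_fraction=0.1, seed=0):
    """Shuffle (wav_path, transcript) pairs deterministically; return (train, dev)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_dev = int(len(rows) * dev_fraction)
    return rows[n_dev:], rows[:n_dev]


def write_csv(path, rows):
    """Write one CSV row per (wav_path, transcript) pair, with the file size in bytes."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for wav_path, transcript in rows:
            writer.writerow({
                "wav_filename": wav_path,
                "wav_filesize": os.path.getsize(wav_path),
                "transcript": transcript,
            })
```

You would call split_rows once on all your pairs, then write_csv twice, once for train.csv and once for dev.csv. The audio itself still has to be converted to the format the training code expects, which this sketch does not do.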

Is that the exact command line you used?

Exactly, I used bin/import_cv2.py /path/to/extracted/language/archive,
with the path replaced by the location of the downloaded French data.

I have no idea about alphabet.txt. It does not exist in the downloaded data.

You obviously have not read the documentation. You need to generate it.
Please, it really looks like you should rely on https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train if you really want to train a French model, or directly use the pre-trained ones.
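For the "generate it" part, a minimal sketch of building an alphabet file from your transcripts could look like the following. It assumes the file is simply one character per line, taken from every transcript you plan to train on. The function names build_alphabet and write_alphabet are hypothetical, and the Dockerfile above may do this differently.

```python
def build_alphabet(transcripts):
    """Return the sorted list of distinct characters used across all transcripts."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    # Line breaks are separators in the output file, not alphabet entries.
    chars.discard("\n")
    chars.discard("\r")
    return sorted(chars)


def write_alphabet(path, chars):
    """Write one character per line (assumed alphabet.txt layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")
```

Note that the space character ends up as a line containing only a space, so do not strip trailing whitespace from the file afterwards.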