Error while training the dataset

Preetam_XD · February 24, 2021, 3:49pm

cv-invalid
cv-other-dev
cv-other-test
cv-other-train
cv-valid-dev
cv-valid-test
cv-valid-train
LICENSE.txt
README.txt
cv-invalid.csv
cv-other-dev.csv
cv-other-test.csv
cv-other-train.csv
cv-valid-dev.csv
cv-valid-test.csv
cv-valid-train.csv

My dataset, which I downloaded from Kaggle, is in this format.
Link to the dataset: https://www.kaggle.com/mozillaorg/common-voice

When I try to run the training command
!./DeepSpeech.py --train_batch_size 128 --dev_batch_size 128 --test_batch_size 128 --drop_source_layers 2 --show_progressbar True --alphabet_config_path /content/DeepSpeech/data/alphabet.txt --train_files /content/cv-valid-train.csv --dev_files /content/cv-valid-dev.csv --test_files /content/cv-valid-test.csv --epochs 1 --export_dir /content/drive/MyDrive/common_voice_eng/Output --checkpoint_dir /content/drive/MyDrive/common_voice_eng/checkpoint --load_cudnn

it fails with this error:

raise RuntimeError('No transcript data (missing CSV column)')
RuntimeError: No transcript data (missing CSV column)
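
The message points at the CSV header: the training code could not find a `transcript` column. As a quick check, here is a minimal sketch that compares a CSV header against the column names documented for DeepSpeech training CSVs (`wav_filename`, `wav_filesize`, `transcript`). The helper name `missing_columns` is made up for illustration, and the Kaggle header in the sample is an assumption, not something confirmed in this thread.

```python
import csv
import io

# Column names documented for DeepSpeech training CSVs.
EXPECTED_COLUMNS = ("wav_filename", "wav_filesize", "transcript")


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Assumed header of the Kaggle CSVs (hypothetical, check your own files).
kaggle_sample = io.StringIO(
    "filename,text,up_votes,down_votes,age,gender,accent,duration\n"
)
print(missing_columns(kaggle_sample))
# → ['wav_filename', 'wav_filesize', 'transcript']
```

To check a real file, open it with `open(path, newline="")` and pass the file object to `missing_columns`. An empty list means the header has the expected names; it says nothing about whether the audio itself is in a usable format.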

lissyx · February 24, 2021, 3:51pm

This is old content. Please use a current Common Voice release and import it as documented, using import_cv2.py.

lissyx · February 24, 2021, 4:00pm

We can’t help with that, sorry.

Preetam_XD · February 25, 2021, 7:09am

Will the Common Voice Corpus 1 dataset work with DeepSpeech version 0.9.3?

othiele · February 25, 2021, 8:42am

Yes, the CV format is the same, but there is less material. But if you can’t get hold of a server that can hold this amount of data, it will be hard for you to do anything useful. Look at the importers; maybe another dataset is better suited to you?

Preetam_XD · February 26, 2021, 6:54pm

I have a question: can I train on the Common Voice dataset from the Windows 10 command prompt (CMD)?
I tried running some of the commands from the DeepSpeech “train your own model” guide, but they aren’t working.
Is there any way I can train it on Windows?

othiele · February 27, 2021, 4:16pm

Training on Windows is really hard. Try Google Colab if you don’t have a Linux server.

Preetam_XD · February 27, 2021, 4:21pm

Yes, but how do I import the dataset into Google Colab? The Common Voice dataset is too big.

NanoNabla · February 27, 2021, 7:42pm

There are many Google hits for “working with big datasets in Google Colab”, e.g. upload it to your Google Drive.
If you have a specific problem running DeepSpeech you may find help here, but this is not regular Colab support.
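
One way the Google Drive suggestion might look in practice: keep the compressed archive on Drive and unpack it onto the Colab VM's local disk before importing. The sketch below uses only the Python standard library. The helper name `extract_archive` and both paths in the comments are hypothetical, and it assumes the dataset is a `.tar.gz` archive.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack a .tar.gz archive into target_dir; return the member count."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        archive.extractall(target)
    return len(members)


# In Colab, mount Drive first (this only works inside a Colab notebook):
#   from google.colab import drive
#   drive.mount("/content/drive")
# Then unpack from Drive to local disk (both paths are hypothetical):
#   extract_archive("/content/drive/MyDrive/en.tar.gz", "/content/cv")
```

Unpacking still needs enough local disk space for the extracted files, so this does not remove the size limits mentioned earlier in the thread.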