“bin/import_cv2.py --filter_alphabet data/alphabet.txt deepspeech-data/cv-corpus-12.0-delta-2022-12-07-en/en/”
When I run it, the importer generates only three CSV files: train-all.csv, other.csv and validated.csv.
It does not generate train.csv, dev.csv or test.csv. Is this a problem, or should I manually split train-all.csv into train, dev and test files?
TIA
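The DeepSpeech bin/import_cv2.py script imports and preprocesses a Common Voice release. It converts the MP3 clips to WAV and writes a DeepSpeech-formatted CSV for each subset it finds, but it does not create a train/dev/test split of its own. It writes train.csv, dev.csv and test.csv only when the release ships the matching train.tsv, dev.tsv and test.tsv files. The delta release you are importing (cv-corpus-12.0-delta-2022-12-07) evidently does not include them, which is why you only got train-all.csv, other.csv and validated.csv.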
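To confirm which split files a release actually contains, you can list the .tsv files in the corpus directory before running the importer. The sketch below is a hypothetical helper (the name missing_tsvs and the script name check_tsvs.py are made up for illustration). It assumes a full Common Voice release ships train.tsv, dev.tsv, test.tsv, validated.tsv and other.tsv, and that the importer writes train.csv, dev.csv and test.csv only for the ones it finds.

```python
import os
import sys

# Split files a full Common Voice release normally ships. Assumption: the
# importer writes <name>.csv only for the train/dev/test .tsv files it finds.
EXPECTED_TSVS = ["train.tsv", "dev.tsv", "test.tsv", "validated.tsv", "other.tsv"]


def missing_tsvs(filenames):
    """Return the expected .tsv names that do not appear in `filenames`."""
    present = set(filenames)
    return [name for name in EXPECTED_TSVS if name not in present]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python check_tsvs.py deepspeech-data/cv-corpus-12.0-delta-2022-12-07-en/en/
    for name in missing_tsvs(os.listdir(sys.argv[1])):
        print("missing:", name)
```

Pointed at the delta directory from the question, it should report train.tsv, dev.tsv and test.tsv as missing if that assumption holds.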
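validated.csv contains every clip that Common Voice contributors have voted valid, while other.csv contains clips that have not yet received enough votes to be either validated or rejected, so neither is a ready-made dev or test set. train-all.csv is built from the validated clips, leaving out any clip whose speaker or sentence also appears in the dev or test set; with no dev or test set in your release, it ends up holding the same clips as validated.csv. None of these files gives you a predefined split for training, development and testing.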
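So this is not a problem with the importer: if you want a training, development and test split, you need to split train-all.csv yourself. Typically this means shuffling the rows and holding out a small fraction each for development and testing, keeping the rest for training. Ideally no speaker should appear in more than one of the three sets; otherwise test results tend to look better than they will on unseen voices.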
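A minimal sketch of that manual split, using only the Python standard library. The script name, the helper names split_rows and split_csv, the 10%/10% dev/test fractions and the seed are illustrative assumptions, not part of DeepSpeech. It shuffles the rows of train-all.csv reproducibly and writes train.csv, dev.csv and test.csv, with the same header as the input, next to the input file. This is a plain row-level split: as far as I know train-all.csv only carries wav_filename, wav_filesize and transcript, so it cannot keep speakers apart. For a speaker-disjoint split you would have to group clips by the client_id column of validated.tsv instead.

```python
import csv
import os
import random
import sys


def split_rows(rows, dev_frac=0.1, test_frac=0.1, seed=1234):
    """Shuffle rows with a fixed seed and cut them into (train, dev, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed -> reproducible split
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]  # everything not held out
    return train, dev, test


def split_csv(in_path, out_dir=None):
    """Split a DeepSpeech CSV into train.csv, dev.csv and test.csv."""
    # Default to writing next to the input file (overwrites existing files).
    out_dir = out_dir or os.path.dirname(os.path.abspath(in_path))
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames  # keep the importer's header as-is
        rows = list(reader)
    for name, subset in zip(("train", "dev", "test"), split_rows(rows)):
        out_path = os.path.join(out_dir, name + ".csv")
        with open(out_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python split_train_all.py path/to/train-all.csv
    split_csv(sys.argv[1])
```

You would then pass the three resulting files to DeepSpeech.py with --train_files, --dev_files and --test_files.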