Error while training Common Voice Data

By running bin/import_cv2.py --filter_alphabet path/to/some/alphabet.txt /path/to/extracted/language/archive I got my respective .tsv files.
When I then run the command below, I get the error that follows:
./DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --dev_files …/data/CV/en/clips/dev.csv --test_files …/data/CV/en/clips/test.csv

I am using Python 3

Traceback (most recent call last):
  File "DeepSpeech.py", line 962, in <module>
    absl.app.run(main)
  File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 935, in main
    train()
  File "DeepSpeech.py", line 435, in train
    train_phase=True)
  File "/root/DeepSpeech/util/feeding.py", line 98, in create_dataset
    df = read_csvs(csvs)
  File "/root/DeepSpeech/util/feeding.py", line 24, in read_csvs
    file = pandas.read_csv(csv, encoding='utf-8', na_filter=False)
  File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 12, saw 3

You have a hint here. We trained on Common Voice English successfully, so can you check your setup?

This error suggests there is something the pandas parser is choking on.

Can you check line 12 of all of your CSV files?
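One way to hunt for the offending row without going through pandas is to count fields per row with Python's standard csv module. The sketch below is only an illustration: find_bad_rows and check_file are hypothetical helper names, not part of DeepSpeech, and the column names in the sample data are made up. It flags every row whose field count differs from the header's, which may be broader than what pandas itself complains about.

```python
import csv
import io


def find_bad_rows(lines, delimiter=","):
    """Return (line_number, field_count, row) for each row whose
    field count differs from the header's field count."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to check
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the 1-based line of the input just read
            bad.append((reader.line_num, len(row), row))
    return bad


def check_file(path, delimiter=","):
    """Hypothetical convenience wrapper: run the check on a file on disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return find_bad_rows(handle, delimiter)


# Made-up sample data: the last row has an unquoted comma in its text.
sample = io.StringIO(
    "wav_filename,transcript\n"
    "a.wav,hello\n"
    "b.wav,you wanna, take this\n"
)
for line_number, field_count, row in find_bad_rows(sample):
    print(line_number, field_count, row)
```

Running check_file on each of the train, dev and test CSV files would give a list of line numbers worth opening in a text editor.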

client_id: 0198410edb39cd3dff7176bf195952d144c2a5ee6d21907a8560c5631472f114681e5a8844c9de756dc56b66b017a83d239273614f8c82c50642c6abe2af4030
path: common_voice_en_572372.mp3
sentence: YOU WANNA TAKE THIS OUTSIDE?
up_votes: 2
down_votes: 0
age: (empty)
gender: (empty)
accent: (empty)

This is the 12th row of my test.tsv. Can you help me with this?
Thanks for replying, lissyx.

@saravananselvamohan This is unreadable. Please use code formatting. Also, you shared the content of the .tsv; it is the .csv we want.
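To tell the two formats apart, and to get raw text that can be pasted with code formatting, something like the following could help. It is a minimal sketch using only the standard library; describe_line is a hypothetical helper, the sample lines are made up, and it only looks at physical lines, so a quoted field spanning several lines would not be handled.

```python
import csv
from itertools import islice


def describe_line(lines, line_number):
    """Return the raw text of a 1-based line together with its field
    count under a comma delimiter and under a tab delimiter."""
    raw = next(islice(lines, line_number - 1, line_number), None)
    if raw is None:
        return None  # the input has fewer lines than requested
    raw = raw.rstrip("\r\n")
    return {
        "raw": raw,
        "comma_fields": len(next(csv.reader([raw], delimiter=","))),
        "tab_fields": len(next(csv.reader([raw], delimiter="\t"))),
    }


# Made-up sample: a tab-separated row whose sentence contains a comma.
sample_lines = [
    "client_id\tpath\tsentence\n",
    "abc\tcommon_voice_en_1.mp3\tHELLO, WORLD\n",
]
info = describe_line(sample_lines, 2)
print(repr(info["raw"]))
print(info["comma_fields"], info["tab_fields"])
```

If tab_fields is clearly larger than comma_fields for the header line, the file is most likely the tab-separated original rather than the converted .csv.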


@lissyx Are you able to view the data now?

The sentence field is in CAPS. That’s the only difference I can find

Thanks @lissyx. I understand the mistake I made. Thanks for your time.

Well, please explain it to others. Also, don’t use screenshots. Please share raw text output using code formatting …