Preprocessing data

I collected an Indian-accent dataset, put all the .mp3 files into one directory, and prepared CSV files with two columns [path, sentence], following the Common Voice dataset structure. But import_cv2.py did not work: it ended up creating just one train-all.csv file in the clips folder. I really need help with this.

Check the import python script, and you’ll find out. And please follow these guidelines for posting:

https://discourse.mozilla.org/t/what-and-how-to-report-if-you-need-support/62071/2

Thanks for the reply, but I am unable to understand where I can find that. Would you care to explain, sir?

You’ll need to check the import script yourself; it usually works. But use one from a release, not the latest master, as there may be some changes.

Okay sir, will do. Can you please confirm that the CSV file attributes are sufficient, i.e. whether any other column is necessary apart from ‘path’ and ‘sentence’?

Please follow the guide I provided. You are not posting the actual csv, you are posting images. I can only guess that you did not read the guidelines … sorry, can’t help

files.zip (212.6 KB)
I am sorry sir, the upload did not support .tsv files. I have attached a zip file that contains the current error and the TSV file structure. My dataset contains .wav files, and the TSV file contains 3 fields named client_id, path and sentence. Please have a look. Thanks for your kind attention.

DeepSpeech version 0.7.1
Ubuntu 18.04
Intel i5 8th gen
RAM 16GB
TensorFlow version 1.14.0

The format in general looks fine to me; you’ll have to debug the import script to find out what’s wrong with your script.
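
As a starting point for that debugging, it can help to check the TSV itself before running the importer. The sketch below is not part of DeepSpeech: `check_tsv` is a hypothetical helper, and it assumes a tab-separated file with the client_id, path and sentence columns described in this thread, plus a directory holding the audio files. It only reports missing columns, missing audio files and empty sentences.

```python
import csv
import os

# Columns named in this thread; treat this list as an assumption.
REQUIRED_COLUMNS = ("client_id", "path", "sentence")


def check_tsv(tsv_path, clips_dir):
    """Return a list of human-readable problems found in a Common Voice style TSV."""
    problems = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_COLUMNS if name not in header]
        if missing:
            # Without the expected header there is no point checking rows.
            return ["missing columns: " + ", ".join(missing)]
        # Data rows start on line 2 because line 1 is the header.
        for line_number, row in enumerate(reader, start=2):
            audio = os.path.join(clips_dir, row["path"] or "")
            if not row["path"] or not os.path.isfile(audio):
                problems.append(f"line {line_number}: audio file not found: {row['path']}")
            if not (row["sentence"] or "").strip():
                problems.append(f"line {line_number}: empty sentence")
    return problems
```

Calling `check_tsv("test.tsv", "clips")` (both paths are placeholders) returns an empty list when nothing is flagged; any entries point at the line of the TSV to look at first.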

So you re-used the script for something completely different that you built?

You don’t seem to have carefully read the documentation: training with 0.7 requires TensorFlow 1.15.
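
A mismatch like this can be caught before training starts. The sketch below is plain string comparison and does not import TensorFlow; `matches_required_series` is a hypothetical helper, and the 1.15 requirement is taken from the statement above. In a real environment the installed version string would come from `tensorflow.__version__`.

```python
def matches_required_series(installed, required="1.15"):
    """Return True when `installed` is in the `required` major.minor series."""
    installed_parts = installed.split(".")[:2]
    required_parts = required.split(".")[:2]
    return installed_parts == required_parts


# Versions reported in this thread: TensorFlow 1.14.0 installed, 1.15 required.
print(matches_required_series("1.14.0"))  # False
print(matches_required_series("1.15.2"))  # True
```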

You can’t expect us to be of any help if you don’t elaborate more on how you

This is not actionable:

  • screenshots are not helpful
  • you only share one test.tsv
  • your error in the screenshot is completely different from what you report in this thread, and it is your file that does not exist.