I created a speech dataset for training DeepSpeech by following this tutorial: Creating an open speech recognition dataset for (almost) any language | by Andreas Klintberg | Medium.
But I couldn't train on my dataset with DeepSpeech. Running the train command:
python DeepSpeech.py --train_files /mnt/c/wsl/teneke_out_bolum1/
gives this error:
pandas.errors.ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
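A quick way to check whether the CSV itself is the problem is to load it directly with pandas, outside DeepSpeech (a minimal sketch; train.csv is just my guess at the file name the tutorial produces, yours may differ):

import pandas as pd

# Parse the dataset CSV with pandas' default C engine; this should
# raise the same ParserError if the file itself is malformed.
df = pd.read_csv('/mnt/c/wsl/teneke_out_bolum1/train.csv', encoding='utf-8')
print(df.head())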
I created the dataset after aeneas forced alignment and fine-tuning with finetuneas:
Here is the code I used on Google Colab to train with DeepSpeech:
I found some suggested solutions on Google, such as:
data = pd.read_csv('file1.csv', error_bad_lines=False)
The error output itself also suggests that setting engine='python' might solve it.
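Before touching DeepSpeech itself, I could test both suggestions on the file directly (same sketch as above, with the same guessed file name):

import pandas as pd

csv_path = '/mnt/c/wsl/teneke_out_bolum1/train.csv'

# Suggestion 1: skip unparseable rows instead of raising an error.
# (On pandas >= 1.3 this option is spelled on_bad_lines='skip'.)
df_skip = pd.read_csv(csv_path, error_bad_lines=False)

# Suggestion 2: the slower but more tolerant pure-Python parser.
df_py = pd.read_csv(csv_path, engine='python')

print(len(df_skip), 'rows with bad lines skipped')
print(len(df_py), 'rows with the python engine')

Skipping bad lines would silently drop clips from the training data, so engine='python' seems like the safer option if it works.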
But I couldn't figure out where I should make this change in DeepSpeech.
So, where should I edit to fix this issue?
Thanks.