While training my model, after some time I receive the error “ValueError: File format b’\x1aE\xdf\xa3’… not understood.”
I understand this points to a WAV file formatting problem. I used FFmpeg to convert my files to 16 kHz, as I do every time, and it has always worked; this is the first time I am seeing this error. I also cannot find which particular WAV file is causing the problem. Please help!
It’s probably some additional characters in the CSV file that may not be visible when viewed as ASCII text.
An advanced text editor should have the ability to strip this out. I use TextWrangler on the Mac for this (the Zap Gremlins option); I’m not sure what the equivalent Windows or Linux app would be.
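If you would rather check for hidden characters from a script than from an editor, something like the following could work. It is only a rough sketch: the function name `find_hidden_chars` is made up for illustration, and it assumes the CSV can be read as UTF-8. It flags control and format characters (byte order marks, zero-width spaces and similar) but leaves ordinary accented letters in transcripts alone.

```python
import unicodedata


def find_hidden_chars(text):
    """Return (line_number, column, codepoint) for each invisible character.

    Hypothetical helper, not part of DeepSpeech. Flags Unicode categories
    Cc (control) and Cf (format), except for tab.
    """
    hits = []
    # Split on "\n" only, so stray control characters do not shift line numbers.
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        for col, ch in enumerate(line, start=1):
            if ch != "\t" and unicodedata.category(ch) in ("Cc", "Cf"):
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits
```

To use it, read the CSV with `open(path, encoding="utf-8", newline="").read()` and pass the result in; an empty list means no control or format characters were found.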
I have listened to every file and everything sounds fine. I then ran every file through Google Speech-to-Text to check for audio errors, but it did not report any. The suggestion above did not help when I tried it.
Also, the error does not say which row or file is causing it, so I am not able to fix it. Here is the screenshot
The header in the error message (1A 45 DF A3) means the file is a Matroska container file (.mkv, .mka, .webm, etc.). Check that all your training files are in the proper format using a tool like file or soxi. Training files for DeepSpeech should be WAVE audio, signed 16-bit PCM, mono, 16000 Hz.
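Since the error does not name the offending file, one way to find it is to walk the training CSV and inspect each file's first bytes. The sketch below uses only the Python standard library. The function names `check_wav` and `check_csv` are made up for illustration, and it assumes the CSV has a `wav_filename` column and that relative paths are relative to the CSV's own directory; adjust both if your setup differs.

```python
import csv
import os
import wave

# First four bytes of a Matroska/WebM file (the 1A 45 DF A3 from the error).
MATROSKA_MAGIC = b"\x1a\x45\xdf\xa3"


def check_wav(path):
    """Return a list of problems for one file; an empty list means it looks OK."""
    try:
        with open(path, "rb") as f:
            header = f.read(12)
    except OSError as e:
        return [f"cannot open: {e}"]
    if header[:4] == MATROSKA_MAGIC:
        return ["Matroska/WebM container, not WAV"]
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        return [f"not a RIFF/WAVE file (starts with {header[:4]!r})"]
    problems = []
    try:
        with wave.open(path, "rb") as w:
            if w.getnchannels() != 1:
                problems.append(f"{w.getnchannels()} channels, expected 1")
            if w.getsampwidth() != 2:
                problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16")
            if w.getframerate() != 16000:
                problems.append(f"{w.getframerate()} Hz, expected 16000")
    except (wave.Error, EOFError) as e:
        # The wave module rejects non-PCM encodings and truncated files.
        return [f"unreadable WAV: {e}"]
    return problems


def check_csv(csv_path, column="wav_filename"):
    """Map each problematic file listed in the CSV to its list of problems."""
    base = os.path.dirname(os.path.abspath(csv_path))
    bad = {}
    # utf-8-sig drops a leading byte order mark if one is present.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            path = os.path.join(base, row[column])
            problems = check_wav(path)
            if problems:
                bad[path] = problems
    return bad
```

Calling `check_csv("train.csv")` (with your own CSV path) and printing the returned dictionary should list every file that is not mono 16-bit 16000 Hz WAV, including any that are really Matroska files saved with a .wav extension.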