At that point, it means there’s something wrong with your specific data.
I suspect this may be the case after this troubleshooting. I am writing a script right now that runs through all the audio files and checks that they can be read by the Python wave module.
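For anyone following along, a check like that could look roughly like the sketch below. It is not the actual script from this thread; the function name `find_unreadable_wavs` and the directory layout are assumptions made for illustration. It opens every `.wav` file under a directory with the standard `wave` module and also compares the number of bytes actually read against what the header promises, since a truncated file can still open without raising.

```python
import wave
from pathlib import Path


def find_unreadable_wavs(directory):
    """Return (path, reason) pairs for .wav files that look broken.

    Hypothetical helper: it only tests what the standard wave module can
    tell us, not what the training pipeline does with the audio.
    """
    failures = []
    for path in sorted(Path(directory).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                declared = wav.getnframes()
                frame_size = wav.getsampwidth() * wav.getnchannels()
                data = wav.readframes(declared)
        except (wave.Error, EOFError, OSError) as exc:
            # Bad or missing RIFF/WAVE header, empty file, unreadable file.
            failures.append((str(path), repr(exc)))
            continue
        # The header can declare more audio than the file really contains.
        if len(data) != declared * frame_size:
            failures.append((str(path), "fewer frames than the header declares"))
    return failures
```

Calling `find_unreadable_wavs("/path/to/recordings")` returns an empty list when every file passes. A clean result only rules out files the `wave` module rejects, so it does not prove the data is fine for training.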
We don’t even have a proper stack here; we can hardly see where it comes from.
It’s the same stack trace I posted before, only failing somewhere else (probably because of threading):
I Initializing variables...
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 Fatal Python error: Segmentation fault
Thread 0x0000700007cf7000 (most recent call first):
File "/Users/allabana/.virtualenvs/test-ds-1/lib/python3.7/site-packages/numpy/core/numeric.py", line 501 in asarray
File "/Users/allabana/.virtualenvs/test-ds-1/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 178 in _convert
File "/Users/allabana/.virtualenvs/test-ds-1/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 217 in <listcomp>
File "/Users/allabana/.virtualenvs/test-ds-1/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209 in __call__
Thread 0x./bin/train_deepspeech.sh: line 36: 7505 Segmentation fault: 11
Can we please get more information on those files?
Like I said before, they are in the repo I previously shared.
Just to be sure, can you triple-check those path values? Are they relative or absolute?
They are absolute. I even check that they exist in my script with this function:
check_file() {
    if [[ -f $1 ]]; then
        echo "Found $1"
    fi
}
Also, how are those built?
With the generate_csv.py file in our repo.
What is inside?
Example:
wav_filename,wav_filesize,transcript
/Users/allabana/develop/deepspeech/data/tarteel/recordings/15_81_3568321073.wav,1114156,و ا ت ي ن ا ه م _ ا ي ا ت ن ا _ ف ك ا ن و ا _ ع ن ه ا _ م ع ر ض ي ن
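Given the three columns shown above, the CSV itself can be sanity-checked independently of training. The sketch below is an illustration only: `check_manifest` is a hypothetical helper, not part of DeepSpeech or of generate_csv.py, and it assumes the header row is exactly `wav_filename,wav_filesize,transcript` with one example per line.

```python
import csv
import os


def check_manifest(csv_path):
    """Return (line_number, problem) pairs for a wav_filename/wav_filesize/transcript CSV.

    Hypothetical helper; line numbers assume no field spans several lines.
    """
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Line 1 is the header, so data rows start at line 2.
        for line_number, row in enumerate(reader, start=2):
            path = row.get("wav_filename") or ""
            if not os.path.isabs(path):
                problems.append((line_number, "path is not absolute"))
            if not os.path.isfile(path):
                problems.append((line_number, "file not found"))
                continue
            declared_size = (row.get("wav_filesize") or "").strip()
            if declared_size != str(os.path.getsize(path)):
                problems.append((line_number, "wav_filesize does not match the file on disk"))
            if not (row.get("transcript") or "").strip():
                problems.append((line_number, "empty transcript"))
    return problems
```

An empty result means every row points at an existing file whose size matches the CSV and has a non-empty transcript. It says nothing about whether the audio content is valid.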
Can you make a single-example reduced test-case?
Yes, will get back to you on that.
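One way to build such a reduced test case is to copy the header plus a single row into a new CSV and point training at that file. This is only a sketch under assumptions: `write_single_example` and its parameters are made-up names, and the CSV layout is the one from the example above.

```python
import csv


def write_single_example(src_csv, dst_csv, row_index=0):
    """Copy the header and one data row from src_csv into dst_csv.

    Hypothetical helper for building a one-example CSV; returns the copied row.
    """
    with open(src_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)
    row = rows[row_index]  # raises IndexError if the row does not exist
    with open(dst_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow(row)
    return row
```

Stepping `row_index` through the file, one run per row, can help narrow down whether a specific recording triggers the crash or whether it happens with any input.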
Maybe you should also run python under gdb to get more context on the crash.
Good call.