- Training or just running inference - Training
- Mozilla STT branch/version - 0.8.2
- OS Platform and Distribution - Red Hat Enterprise Linux Server 7.8 (Maipo)
- Python version - 3.6.8
- TensorFlow version - 1.15.2
I’ve been working with the Russian Common Voice data, imported with import_cv2.py, and during validation DeepSpeech simply stopped: no errors, no new checkpoints, it just hung. After a lot of digging I found the cause: some of the wav files have 2 channels, which fails the assert check when converting PCM to a NumPy array in audio.py. The really odd part is that all of them were in the dev set, none in train or test. I wrote a bash script to identify the wav files with multiple channels, and I’m happy to provide it if you’d like.
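For anyone hitting the same hang, here is a minimal sketch of the check my script performs, done in pure Python with the stdlib `wave` module instead of bash (the directory path in the example is hypothetical):

```python
import wave
from pathlib import Path

def find_multichannel_wavs(root):
    """Yield paths of WAV files under `root` that have more than one channel."""
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            if w.getnchannels() > 1:
                yield path

# Example usage (path is hypothetical):
# for p in find_multichannel_wavs("ru/clips"):
#     print(p)
```

Running this over the imported clips directory before training should surface any offending files up front, rather than hanging mid-validation.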
Looking through import_cv2 I didn’t see any channel conversion, unless I missed it. So if it’s supposed to be downmixing 2-channel audio to mono, something is going wrong in there. The bigger issue, imo, is that it fails silently. I’ve worked it out for my own use, but I can see others running into this, especially since I’m using Common Voice data, so I thought you’d want to know.
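As a workaround on my side, I downmix the offending files to mono before training. This is just a sketch of that fix, not how import_cv2 does (or should do) it; it assumes 16-bit PCM, which is what the importer produces for me:

```python
import array
import wave

def to_mono(src, dst):
    """Downmix a stereo 16-bit PCM WAV to mono by averaging the two channels."""
    with wave.open(src, "rb") as w:
        params = w.getparams()
        frames = w.readframes(params.nframes)
    assert params.sampwidth == 2, "sketch assumes 16-bit samples"
    if params.nchannels == 1:
        return  # already mono, nothing to do
    samples = array.array("h", frames)  # interleaved L, R, L, R, ...
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(params.sampwidth)
        w.setframerate(params.framerate)
        w.writeframes(mono.tobytes())
```

Something equivalent with `sox in.wav -c 1 out.wav` would also work; I used Python here only to keep it in one place with the detection step.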
Let me know if you need any more info or if I can help in any other way.