Error while training with VCTK data

Thanks for the VCTK import script. I imported the VCTK data using the import_vctk script, and it produced data in a format DeepSpeech can train on. However, training failed with the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Bad audio format for WAV: Expected 1 (PCM), but got3

sox reports the following for one of the WAV files:

sox --i data/vctk/VCTK-Corpus/wav48/p297/p297_012.wav

Input File     : '/dev/data/vctk/VCTK-Corpus/wav48/p297/p297_012.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 24-bit
Duration       : 00:00:06.45 = 103154 samples ~ 483.534 CDDA sectors
File Size      : 413k
Bit Rate       : 512k
Sample Encoding: 32-bit Floating Point PCM

AFAIK, the precision should be 16-bit. Is there anything wrong with the import script?
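For what it's worth, the `3` in the error appears to be the WAV format tag: 1 is integer PCM and 3 is IEEE float, which would match the "32-bit Floating Point PCM" encoding sox reports. Below is a minimal sketch for reading that tag straight from the file header, assuming a plain RIFF/WAVE file. `wav_format_tag` is a hypothetical helper name, not part of DeepSpeech.

```python
import struct


def wav_format_tag(path):
    """Return the audio format code stored in the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to be a WAV file")
        riff, _, wave_id = struct.unpack("<4sI4s", header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                # The format tag is the first 16-bit field of the chunk.
                return struct.unpack("<H", f.read(2))[0]
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(size + (size & 1), 1)
```

A return value of 1 means integer PCM. Given the error message, I would expect 3 for the file above.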

I am using Version 0.5.1.

I don’t know the details of this import script, but bugs can happen. It looks like you ran into one.

We won’t be able to fix it in this version.

You should check the import script to make sure it performs the conversion properly. Feel free to send a patch if you write one.
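As a rough sketch of what a correct conversion step could look like, assuming sox is installed and on the PATH: the snippet below shells out to sox and asks for 16-bit signed-integer output. The function names are hypothetical and this is not what the import script currently does.

```python
import subprocess


def sox_pcm16_command(src, dst):
    """Build a sox command that rewrites `src` as 16-bit signed-integer PCM."""
    # -b 16 and -e signed-integer are output format options, so they go
    # after the input file and before the output file.
    return ["sox", src, "-b", "16", "-e", "signed-integer", dst]


def convert_to_pcm16(src, dst):
    """Run the conversion; raises CalledProcessError if sox exits non-zero."""
    subprocess.run(sox_pcm16_command(src, dst), check=True)
```

Running `sox --i` on the output afterwards is a quick way to confirm the precision and encoding.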

Thanks. I will look at it and send a patch.

I converted the files using sox and got training working. I also came across a handy script for checking whether WAV files are valid; it can be used to validate the files before training: https://gist.github.com/piraka9011/e333325fc630f92b2808a16777e538b8
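I have not reproduced the gist here, but the same idea can be sketched with Python's standard `wave` module, which refuses to open non-PCM files. `find_invalid_wavs` is a hypothetical name, and the defaults (16 kHz, mono, 16-bit) are my assumption about what training expects.

```python
import os
import wave


def find_invalid_wavs(root, rate=16000, channels=1, sample_width=2):
    """Return (path, reason) pairs for .wav files that are not PCM with the given layout."""
    problems = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with wave.open(path, "rb") as w:
                    actual = (w.getframerate(), w.getnchannels(), w.getsampwidth())
            except (wave.Error, EOFError) as exc:
                # wave.Error covers non-PCM encodings such as IEEE float.
                problems.append((path, str(exc)))
                continue
            if actual != (rate, channels, sample_width):
                problems.append((path, "rate/channels/width = %s/%s/%s" % actual))
    return problems
```

An empty list means every file under `root` opened as PCM with the expected rate, channel count, and sample width.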

Thanks. This should be dealt with at import time for a better experience. Could you propose a PR?