Possibility to train on different file formats?

Hi, one area where DeepSpeech could be used is phone conversations. However, those files do not have the format DeepSpeech requires (16kHz, signed PCM WAV, etc.).
Their format is:

  • Format: ADPCM
  • Format profile: A-Law
  • Sampling rate: 8kHz
  • Bit Depth: 8 bits

So I was wondering whether there is a way to train models on this kind of file, or is converting to signed PCM and then upsampling the only way?
Thanks in advance.

You can train on whatever format you want, though you might have to make changes to DeepSpeech.py and other files.

But please note that we only work and test with 16kHz signed PCM.

Do you think it’s possible to take the 16kHz model that you trained and fine-tune it on 8kHz data, or should I train a model from scratch? (I don’t know how much data is required, but I can gather quite a lot of samples.)

Different sample rates cannot be combined.

Likely the easiest solution is to upsample.

We upsample some of our training data that originated from phone conversations, and it works fine. See bin/import_fisher.py for how to upsample.
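
For anyone landing here with raw A-law data, below is a minimal sketch of the "convert to signed PCM, then upsample" route using only the Python standard library. It is not what bin/import_fisher.py does, so check that script for the project's own approach. The function names (alaw_to_linear, upsample_2x, alaw_8k_to_pcm_16k) are hypothetical, the input is assumed to be headerless mono 8kHz A-law bytes, and the linear interpolation is only a stand-in for a proper resampler such as sox.

```python
import struct
import wave


def alaw_to_linear(code):
    """Decode one 8-bit A-law byte to a signed 16-bit PCM sample (G.711)."""
    code ^= 0x55                        # undo the even-bit inversion
    magnitude = (code & 0x0F) << 4      # 4-bit mantissa
    segment = (code & 0x70) >> 4        # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # After the XOR, a set top bit means a positive sample.
    return magnitude if code & 0x80 else -magnitude


def upsample_2x(samples):
    """Double the sample rate by inserting the midpoint between neighbours.

    Assumption: plain linear interpolation is good enough for a sketch;
    a real resampler applies a proper low-pass filter.
    """
    out = []
    for i, sample in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out.append(sample)
        out.append((sample + nxt) // 2)
    return out


def alaw_8k_to_pcm_16k(alaw_bytes, out_path):
    """Write raw 8kHz A-law bytes as a 16kHz, 16-bit, mono PCM WAV file.

    out_path may be a file name or a writable, seekable file object.
    Returns the number of samples written.
    """
    pcm = upsample_2x([alaw_to_linear(b) for b in alaw_bytes])
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)             # 2 bytes = 16-bit signed samples
        wav.setframerate(16000)
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)
```

Usage would look like alaw_8k_to_pcm_16k(open("call.alaw", "rb").read(), "call_16k.wav"), where "call.alaw" is a placeholder file name. Note that upsampling does not restore anything above 4kHz, so the result is still telephone-bandwidth audio stored in a 16kHz container.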