Standard Method for Processing Long Audio Files with 0.3.0/0.4.0 Python Package?

Here’s a snippet of my processing code:

from deepspeech import Model
import scipy.io.wavfile as wav


# Model(graph_path, n_features, n_context, alphabet_path, beam_width)
ds = Model('models/output_graph.pb', 26, 9, 'models/alphabet.txt', 500)


def process(path, lecture_id):
    fs, audio = wav.read(path)  # expects 16-bit PCM, mono, 16 kHz audio
    processed_data = ds.stt(audio, fs)
    with open('lectures/' + lecture_id + '.txt', 'a') as f:
        f.write(processed_data)
    return processed_data

Sorry, this does not answer my question.

Whoops, I added my graph instantiation above.

Ok, so you are loading a .pb, not a .pbmm, which means huge memory usage and is likely not helping in your case. Please check the documentation about the mmap file format.
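For anyone following along, the .pb graph can be converted to the mmap-friendly .pbmm format with TensorFlow’s convert_graphdef_memmapped_format tool. This is a sketch; the tool’s location depends on your TensorFlow build, and the paths are placeholders:

```shell
# Convert the protobuf graph into a memory-mapped .pbmm
# (paths are placeholders; adjust to your setup).
convert_graphdef_memmapped_format \
  --in_graph=models/output_graph.pb \
  --out_graph=models/output_graph.pbmm
```

Then point the Model() constructor at output_graph.pbmm instead of output_graph.pb.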

Thanks! I’ll try that change and report back.

This worked! However, I got this result from my short audio file:

“een go at in e an e same is were er to o in tejest bused e bro o or litre an per andifolworesware o t al e o as aner reaerything is bid some min o so o o o e o la oro ah e or ee ingle is ateou ea to his head e ant oo the bot te o hii a se is bo i weo reb e be a arebut the a e o a o bo tha wy om back tothe oher”

Using the release 0.3.0 models and packages I’m getting gibberish. Though my input has noise and reverb, I’m surprised by the lack of English words. Is this to be expected?

It really depends on your audio sources, at some point. Can you make sure it’s 16-bit PCM, 16 kHz and mono, first? Could you share a sample?
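One quick way to verify those properties is with the standard library’s wave module; check_wav below is just an illustrative helper name, not part of DeepSpeech:

```python
import wave

def check_wav(path):
    """Return (sample_rate, channels, sample_width_bytes) and warn when the
    file is not the 16 kHz / mono / 16-bit PCM layout DeepSpeech expects."""
    with wave.open(path, 'rb') as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
    if (rate, channels, width) != (16000, 1, 2):
        print('warning: %s is %d Hz, %d channel(s), %d-bit'
              % (path, rate, channels, 8 * width))
    return rate, channels, width
```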

Here is the snippet: https://drive.google.com/file/d/1guFjgkmwJbi_e5nsWuZeMY5njj9klTGj/view?usp=drivesdk

I used Audacity to convert it to 16 kHz mono WAV (16-bit PCM).
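For reference, the same conversion can be done from the command line; this assumes ffmpeg is installed, and the file names are placeholders:

```shell
# Resample to 16 kHz, downmix to mono, and write 16-bit PCM WAV
# (input.m4a / output.wav are placeholder names).
ffmpeg -i input.m4a -ar 16000 -ac 1 -acodec pcm_s16le output.wav
```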

What was the source?

Source was a 44 kHz mp4.

Mono or stereo? I assume it was mp3. It’s possible your conversion introduced some bad artifacts, but given the output, can we be sure of your exact setup / versions? The full output should include that information.

Sorry, the source was a stereo .m4a. I’m running the latest non-alpha (0.3.0) versions of the Python package and models. As the Python code only returns a string with the transcription, I’m not sure what “full output” I can provide. If you mean the CLI interface, I don’t have access to that, as I’m only able to run the program in a specific environment.

libdeepspeech.so prints some TensorFlow/DeepSpeech version info on stderr. We need that.

I’ll look into that. Here is the link to the models I’m downloading in my Dockerfile: https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz

Except that the version you are running with is really important. …

Alright, so we’re talking about the output after running ./deepspeech-0.3.0-models/libdeepspeech.so?

Sorry, I haven’t been able to find that file. It seems to be related to the Python package in some way (I found mentions of Native Client in the git repo)?

We package that inside the Python wheel, but you cannot directly execute it.

@zaptrem Any call to DS_CreateModel() will print version information on stderr: https://github.com/mozilla/DeepSpeech/blob/master/native_client/deepspeech.cc#L356. So you should be able to get it, even with the snippet you posted earlier.
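If it helps, that stderr output can be captured from Python with subprocess. The inline "-c" program below only simulates the version line libdeepspeech.so would print; substitute your actual transcription script:

```python
import subprocess
import sys

# Run a child Python process and capture its stderr. The inline "-c"
# program stands in for your real script; anything the library writes
# to stderr (e.g. the TensorFlow/DeepSpeech version lines printed by
# DS_CreateModel()) ends up in result.stderr.
result = subprocess.run(
    [sys.executable, '-c',
     "import sys; sys.stderr.write('TensorFlow: v1.11.0\\n')"],
    stderr=subprocess.PIPE,
)
print(result.stderr.decode())
```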

Bit of a delay, but here you go:

TensorFlow: v1.11.0-9-g97d851f
DeepSpeech: v0.3.0-0-gef6b5bd