High Error Rate for my audio files

Hi all,

Working with DeepSpeech, I noticed that the overall recognition rate is not good.
This is not in line with what is claimed in the paper.

I am running on CPU and trying to transcribe my own audio files, but the error rate is very high.

I am using mono, 16 kHz, 16-bit audio files.

I would really appreciate guidance on this.
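
One way to put a number on "the error rate is very high" is word error rate (WER), the metric commonly reported for speech recognition. Below is a minimal sketch, assuming whitespace tokenization and lowercasing (my own simplifications, not necessarily what the paper's evaluation does). The function name `word_error_rate` is hypothetical and the example sentences in the comments are invented, not taken from the files discussed here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared case-insensitively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word and one dropped word out of four.
# word_error_rate("the profit was high", "the prophet was")
```

Reporting this value for a handful of files, together with what was said and what was transcribed, makes it easier for others to judge how far off the results are.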

Getting more context on what you are doing would help a bit …

I want to share my audio files. Where can I upload them?

Start by describing to us what they are, how they were recorded, and so on. The DeepSpeech version as well …

I am using Audacity to record the audio.
Here is the output of mediainfo:
Complete name : audio/profit_1.wav
Format : Wave
File size : 94.0 KiB
Duration : 3 s 7 ms
Overall bit rate mode : Constant
Overall bit rate : 256 kb/s

Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 3 s 7 ms
Bit rate mode : Constant
Bit rate : 256 kb/s
Channel(s) : 1 channel
Sampling rate : 16.0 kHz
Bit depth : 16 bits
Stream size : 94.0 KiB (100%)
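
For anyone who wants to check the same properties without mediainfo, here is a minimal sketch using Python's standard `wave` module. The helper names `describe_wav` and `matches_expected_format` are made up for illustration, and the defaults (1 channel, 16000 Hz, 16 bits) simply mirror the format described in this thread.

```python
import sys
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bits = wav.getsampwidth() * 8          # sample width is given in bytes
        duration = wav.getnframes() / float(sample_rate)
    return channels, sample_rate, bits, duration


def matches_expected_format(path, channels=1, sample_rate=16000, bits=16):
    """True if the file is mono / 16 kHz / 16-bit (or whatever values are passed in)."""
    return describe_wav(path)[:3] == (channels, sample_rate, bits)


if __name__ == "__main__":
    # Usage: python check_wav.py audio/profit_1.wav [more files ...]
    for wav_path in sys.argv[1:]:
        ch, rate, bits, secs = describe_wav(wav_path)
        status = "OK" if matches_expected_format(wav_path) else "MISMATCH"
        print(f"{wav_path}: {ch} ch, {rate} Hz, {bits} bit, {secs:.3f} s -> {status}")
```

A file that passes this check can still be transcribed poorly. The check only rules out a format mismatch as the cause.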

The DeepSpeech model I am using is 0.4.1.

Let me know if any more information is needed.

Language? Accent? Recording conditions?

I am speaking English with an Indian accent, with little or no background noise.

I have tried to eliminate the noise completely using Audacity, but even then it doesn’t improve much.
Will the accent be a problem?

Then that’s very likely the cause. Our current model was not trained on enough variety and only gives good results with American English.

Until now, yes. The best course of action is to get a wider variety of accents into Common Voice, which we use in the mix of datasets.

Oh okay, then I will try to test it on some English audio with an American accent.

Also, do you mean that I should retrain the model on a dataset containing a mix of Indian-accented and American-accented audio?

Please search the forum. You are not the first one to hit this issue, and a lot of help has already been given on this topic.

Sure, I will do that.
Are there any other resources I should refer to?

@lissyx This seems like a question that’s popping up a lot - maybe this should be in an FAQ somewhere? Or in the GitHub README.

Yeah, well, over-documentation is also not a good thing: it’s already mentioned in the release notes: https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.1