Pretrained Model cannot provide accurate English words

ziqianzhu727 · November 7, 2018, 8:31pm

I am using retrained Deepspeech 0.3 to do some inference from recoded call. However it always output me some results like

"we for we look on a spirsofalaea a i go to a wouldtilemottofolespurtthou oh we o o n to compare now or bocortoby i for both fought no one to go and be a tale to mocometothecousonanofalinbuta a "

I tried to adjust the model arguments LM_WEIGHT, VALID_WORD_COUNT_WEIGHT, and change the audio sampling rate and format, also chunked the length. None of them help.

Any advice on this? Thanks,

lissyx · November 7, 2018, 8:56pm

Can you give us context ? How did you retrained, source of the data, what version of the inference code do you use ?

ziqianzhu727 · November 7, 2018, 9:15pm

Hi lissyx, thanks for help. I haven’t retrained the model as no available labelled data. The source of data is recorded call from call centre. I am directly using python code to do the inference,

ds = Model(‘Test/output_graph.pb’, 26, 9, ‘Test/alphabet.txt’, 500)
ds.enableDecoderWithLM(str(‘Test/alphabet.txt’), str(‘Test/lm.binary’), str(‘models/trie’), 1.5, 2.1)
fs, audio = wav.read(‘short_test.wav’)
processed_data = ds.stt(audio, fs)

lissyx · November 7, 2018, 9:17pm

Any details ? Like format, sampling rate ? Are the people speaking with a native accent ?

lissyx · November 7, 2018, 9:20pm

The whole output would help as well, to check the version of libdeepspeech.so you are using.

ziqianzhu727 · November 7, 2018, 9:25pm

The accent is native but there are places talker try to correct himself and little bit background noise. Format: I tried 8 bit, 16bit and 32bit. Sampling rate: I tried 8000hz, 16000 hz and 32000 hz. I tried google api to do it, it worked very well which dropped the correction part and only leave words make sense there.

lissyx · November 7, 2018, 9:33pm

Can you give the source format? Conversions can add artifacts that messes up with the data.

ziqianzhu727 · November 7, 2018, 9:38pm

Source is 32000HZ with 32 bit.

lissyx · November 7, 2018, 9:39pm

And how many channels ?

ziqianzhu727 · November 7, 2018, 9:42pm

oh, forgot to mention that. it is Mono.

lissyx · November 7, 2018, 9:45pm

Ok. I’m still waiting on the exact version of libdeepspeech.so you are using …

lissyx · November 7, 2018, 9:47pm

Is it possible you might share some of them ?

ziqianzhu727 · November 7, 2018, 10:00pm

I cannot share them. I am not sure the version of libdeepspeech.so. The Deepspeech version is 0.3.0.

lissyx · November 7, 2018, 10:02pm

It should be printed on the output when you run it …

ziqianzhu727 · November 8, 2018, 12:10am

Hi Lissyx, thanks. I didn’t find the version in the output. But as you said , the sampling rate matters here. The way I changed the sampling rate was not right. Now, I applied pydub change the sampling rate which gives more reasonable output.

reuben · November 8, 2018, 1:25am

You should also try the new decoder in master, it should alleviate these problems. There’s instructions here: https://github.com/mozilla/DeepSpeech/issues/1156#issuecomment-434351398

lissyx · November 8, 2018, 5:01am

Why don’t you share the whole output? It should be printed, by that call DeepSpeech/native_client/deepspeech.cc at master · mozilla/DeepSpeech · GitHub

derekpankaew · November 8, 2018, 8:22am

How did you change the sampling rate? I’m experiencing a similar issue, using ffmpeg to resample. From what I can tell, pydub uses ffmpeg to do its resampling:

http://github.com/jiaaro/pydub/blob/master/API.markdown#audiosegmentexport

Can you share any more details about what was the wrong way and the right way to resample the audio? And how big a difference did it make? (Did it eliminate the issue completely?)

ziqianzhu727 · November 8, 2018, 5:05pm

Yes, the first method is I use audio software(Audacity) to adjust the sampling rate and later I installed the pydub with ffmpeg to do it. What I found out is for some long words, it works better. Not completely solve the issue as still something need to work on such as removing the noise.

Topic		Replies	Views
Terrible Accuracy? DeepSpeech	33	5869	November 2, 2019
Using Deep Speech DeepSpeech	34	12846	August 20, 2019
Performance for native English speaker with pre-trained model? DeepSpeech	15	4529	July 12, 2019
Inference output only rubbish? DeepSpeech	6	716	February 20, 2021
DeepSpeech model training DeepSpeech	65	7991	November 12, 2019

Pretrained Model cannot provide accurate English words

Related topics