Can anyone give an idea…
How can I get the DeepSpeech model's output text without the LM (language model), and then send that text to the LM separately?
@narasimman.saravana Write code that runs with and without LM ?
I think you did not understand my question.
How can I run the LM separately? For example, I have some text that I want to send to the LM. Is there a command line for running the LM on its own?
Well, when you just throw a question in a title, it’s not surprising people don’t understand what you want. So you want to do what exactly, because I don’t get you.
I’m really sorry for the badly worded question.
For example, I got the text ‘neaar branc’ from the DeepSpeech model without the LM, but the correct sentence is ‘nearest branch’. Now I want to give the ‘neaar branc’ text to the LM so that it returns ‘nearest branch’. I know that if I use DeepSpeech with the LM it will give the correct prediction, but I want to run the LM separately, because I want to process the raw text from DeepSpeech (without the LM) and then pass it to the LM to correct the sentence.
That’s an unusual usage. The problem is that the LM runs on the output of the network, the logits, during the decode phase, not on already decoded text. So you cannot achieve that through the API. Can you explain why you want to proceed this way?
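To illustrate why the 1-best text alone is not enough, here is a toy sketch of how a decoder can combine acoustic and LM scores across several hypotheses. The candidate list, every score, and the names `rescore` and `toy_lm` are made up for illustration. The `alpha`/`beta` weighting follows the general idea behind DeepSpeech's `lm_alpha` and `lm_beta` settings, but this is not DeepSpeech's decoder code and the values are arbitrary.

```python
def rescore(candidates, lm_logprob, alpha=0.75, beta=1.85, oov_logprob=-20.0):
    """Pick the hypothesis with the best combined acoustic + LM score.

    candidates: list of (text, acoustic_log_prob) pairs (hypothetical values).
    lm_logprob: dict mapping text to an LM log-probability (toy stand-in for a real LM).
    """
    def combined(item):
        text, acoustic = item
        lm = lm_logprob.get(text, oov_logprob)  # unseen text gets a heavy penalty
        word_bonus = beta * len(text.split())   # word insertion bonus
        return acoustic + alpha * lm + word_bonus

    return max(candidates, key=combined)[0]


# Hypotheses the decoder might hold internally, with made-up acoustic scores.
candidates = [
    ("neaar branc", -4.0),
    ("nearest branch", -5.5),
    ("near ranch", -6.0),
]
toy_lm = {"nearest branch": -3.0, "near ranch": -7.0}

# Acoustic score only: the misspelled hypothesis wins.
print(max(candidates, key=lambda c: c[1])[0])   # neaar branc
# Acoustic + LM score: the LM tips the balance toward real words.
print(rescore(candidates, toy_lm))              # nearest branch
```

Once only ‘neaar branc’ has been kept, the other hypotheses and their acoustic scores are gone, so a text-only LM pass has nothing left to choose from.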
For noisy audio I’m planning to include * in the transcript. For example, hello audio (0.6 sec) + noise audio (1 sec) gets the transcription: hello ** (one * for every 0.5 sec of noise). If the audio contains only noise, I put * in the transcript according to its duration in seconds. Here is why I am doing it this way: when voice + noise goes through DeepSpeech the output is somewhat OK, but if the audio contains only noise it predicts random words. So I want to train on noise-only audio files as well, but I do not want to include * in my LM.
You want your model to learn to output * when there is noise detected?
Yes, if I send noise audio it should return multiple * based on its duration in seconds.
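The labelling scheme described above (one * per 0.5 sec of noise) could be sketched like this when preparing training transcripts. The function names are hypothetical, and rounding to the nearest marker with a minimum of one is an assumption, since the thread only gives the 1 sec example. The * character would presumably also have to be present in the alphabet used for training.

```python
def noise_markers(noise_seconds, seconds_per_marker=0.5):
    """Return one '*' per `seconds_per_marker` of noise (assumed: rounded, at least one)."""
    if noise_seconds <= 0:
        return ""
    # int(x + 0.5) rounds half up, avoiding Python's round-half-to-even behaviour.
    count = max(1, int(noise_seconds / seconds_per_marker + 0.5))
    return "*" * count


def build_transcript(speech_text, noise_seconds):
    """Join the spoken text and the noise markers, skipping whichever is empty."""
    parts = (speech_text.strip(), noise_markers(noise_seconds))
    return " ".join(part for part in parts if part)


print(build_transcript("hello", 1.0))  # hello **
print(build_transcript("", 2.0))       # ****
```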
Do you have a use case for the *, or is it just a way to get rid of noise?
To detect noise.
We have trained with nearly 10,000 real-world noise samples, used to augment our own 200,000 clean audio files. The model predicts voice mixed with noise correctly, but on a real-time telephone line stream we can also get noise only. The problem is that, for noise-only audio, the model predicts random words.
So we are planning to train on noise-only audio files.
Have you tried VAD? From my experiments it was already good enough to avoid recognition when there is noise. Maybe you should add some noise-only audio to your training, with multiple * as the matching transcript?
Or even another tool in front of DeepSpeech that would detect voice / pure noise. I remember finding something like that on github made by French INA.
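Putting a detector in front of DeepSpeech, as suggested here, might look roughly like the following gate. Both callables are placeholders: `is_speech` stands for whatever voice / pure-noise classifier is used and `transcribe` for the DeepSpeech inference call; neither is a real API, and the stub data at the bottom only makes the sketch runnable.

```python
def transcribe_if_speech(segments, is_speech, transcribe):
    """Run `transcribe` only on segments the detector accepts; noise yields ''."""
    results = []
    for segment in segments:
        if is_speech(segment):
            results.append(transcribe(segment))
        else:
            results.append("")  # skip recognition entirely for pure noise
    return results


# Stub segments and stub callables, purely for demonstration.
segments = [
    {"kind": "speech", "text": "nearest branch"},
    {"kind": "noise", "text": None},
]
output = transcribe_if_speech(
    segments,
    is_speech=lambda seg: seg["kind"] == "speech",
    transcribe=lambda seg: seg["text"],
)
print(output)  # ['nearest branch', '']
```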
@lissyx we tried VAD, but it only detects silence.
Could you please share the github link… if possible.
I don’t have it here, I’ll try and find it later today
But I tried this one. It works when given the correct word, but for my input it reports OOV.
The image above shows this.
The actual phrase is “nearest branch”.
The given text is “ne branc”, but it does not return a match for it.
Please avoid sharing such content as a screenshot; it’s hard to read on mobile and not searchable.
What’s in your
Sorry for the attachment.
Please refer to the link for test.py.
No, I don’t think it is enough. Look at DeepSpeech.py and how the logits get passed to the CTC decoder.
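For reference, here is a minimal sketch of what a CTC decoder consumes: a matrix of per-frame probabilities over the alphabet plus a blank symbol, not text. This is plain greedy decoding with no LM, written for illustration; the tiny alphabet, the blank being the last index, and the probability values are assumptions, and DeepSpeech's own decoder is a beam search that is considerably more involved.

```python
ALPHABET = [" ", "a", "b", "c"]   # hypothetical tiny alphabet
BLANK = len(ALPHABET)             # assumed: CTC blank is the last class index


def greedy_ctc_decode(probs):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    chars = []
    previous = None
    for index in best_path:
        # Emit a character only when it differs from the previous frame's label.
        if index != previous and index != BLANK:
            chars.append(ALPHABET[index])
        previous = index
    return "".join(chars)


# Six frames of made-up probabilities; columns are " ", a, b, c, blank.
frames = [
    [0.0, 0.1, 0.1, 0.7, 0.1],  # c
    [0.0, 0.1, 0.1, 0.6, 0.2],  # c (repeat, collapsed)
    [0.0, 0.1, 0.1, 0.1, 0.7],  # blank
    [0.0, 0.8, 0.1, 0.0, 0.1],  # a
    [0.0, 0.1, 0.7, 0.1, 0.1],  # b
    [0.0, 0.1, 0.6, 0.1, 0.2],  # b (repeat, collapsed)
]
print(greedy_ctc_decode(frames))  # cab
```

An LM-aware decoder works on the same matrix but keeps several candidate paths alive and weighs them with the LM as it goes, which is why the LM step cannot simply be replayed on the final text.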