Inference output only rubbish?

hi there
I successfully installed DeepSpeech via Python on my Windows 10 machine. I tested the pretrained English models with the provided sample audio and it worked properly.
So I downloaded the pretrained German models (pbmm and scorer) from the jaco-assistant GitLab repo. I used the same command as in the original Mozilla
DeepSpeech GitHub repo, just with the pbmm and scorer swapped out, looking like this:

deepspeech --model output_graph_de.pbmm --scorer kenlm_de.scorer --audio random.wav
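
For anyone who prefers the Python bindings over the CLI: a minimal sketch of the same inference call, assuming the model and scorer file names from the command above and a 16 kHz, 16-bit, mono WAV, would look roughly like this:

import wave
import numpy as np
from deepspeech import Model

# Load the German acoustic model and attach the external scorer.
ds = Model("output_graph_de.pbmm")
ds.enableExternalScorer("kenlm_de.scorer")

# DeepSpeech expects 16 kHz, 16-bit, mono PCM samples as an int16 buffer.
with wave.open("random.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))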

The WAV file was a short 30-second audio clip from YouTube, from a German cartoon. I converted it to WAV with 16 bits and a 16 kHz sample rate. The inference output for the 30 seconds of spoken audio was only 5 random German words which had nothing to do with the audio file.

Did I do something wrong? I'm sorry if this is a noob question.

EDIT: Well, the audio was maybe a bit too long, since I read that audio files that are too long won't work properly. So I recorded my own voice (saying something like: hello my friends) and converted it to the right format. Now the inference is… blank. lol. Is there any step I needed to do before using different pretrained pbmm and scorer models?

mono or stereo?

I can’t comment on / support the quality of third-party models. I think @dan.bmh is the one hacking on those models?

What do you get if you use the German audio and transcribe it with the English model? And what if you use the English audio and the German model?

In both cases they should output something that sounds a little bit similar to the spoken text.



We use Daniel’s models for testing and comparison against ours and didn’t have any problems at all with them in the past. Check your setup; maybe do it all on Colab so you can share it.

An update on the situation: I followed all of your suggestions (thanks btw for trying to help me!). I tested the German pbmm and German scorer models with the English sample audio from mozilla/DeepSpeech (“experience proves this”).
The output was: experience profis.
So… it looks like I messed up my audio file, but I don’t know how and why it is messed up. It was successfully recognized by DeepSpeech, but the inference was strange (for the 30-second audio file) and blank (for the 2-second self-recorded file).

I encoded the WAV file with Shutter Encoder (a GUI for some main features of ffmpeg). The only options I chose were “choose function: WAV” and “type: 16 bits”, and I checked “change the sample rate to: 16k”.

So I did not change anything about mono/stereo. (The own recording was done with Audacity, so I guess it was stereo?) The 30-second audio was downloaded from YouTube with youtube-dl, so I guess it is also stereo then. Does it have to be mono?
(There are also the options “mix audio tracks to: either mono, stereo or 5.1” and “separate audio tracks to: mono or stereo”, but I didn’t touch these settings and they were unchecked, so nothing about mono/stereo was changed/modified.)
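
A quick way to check what a file actually is before feeding it to DeepSpeech is the wave module from the Python standard library; the file name below is just a placeholder:

import wave

with wave.open("random.wav", "rb") as w:
    print("channels:    ", w.getnchannels())  # should be 1 (mono)
    print("sample width:", w.getsampwidth())  # should be 2 (= 16-bit)
    print("sample rate: ", w.getframerate())  # should be 16000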

EDIT: And just btw… off-topic… I can see my own email next to my username on all of my posts. Can only I see it, or do all users see it? How do I hide it from other users? That kinda shocked me now lol.

EDIT 2: Well, my bad… I kinda overlooked that it has to be mono to work. I converted it to mono and now it works properly! Thanks guys. But it seems like audio files that are too long have a higher error rate, so I probably have to stick with shorter segments then.
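
For reference, since Shutter Encoder is just a front end for ffmpeg: a single ffmpeg call can do the whole conversion in one go. This is only a sketch with placeholder file names, run from Python via subprocess; -ac 1 forces mono, -ar 16000 sets the sample rate, and pcm_s16le gives 16-bit PCM:

import subprocess

subprocess.run(
    ["ffmpeg", "-y", "-i", "random.wav",  # input file (placeholder name)
     "-ac", "1",                          # downmix to mono
     "-ar", "16000",                      # resample to 16 kHz
     "-c:a", "pcm_s16le",                 # 16-bit PCM WAV
     "random_mono16k.wav"],               # output file (placeholder name)
    check=True,
)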

Export from Audacity: set 16000 Hz at the bottom left, delete one track, and then export as WAV.
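
If you would rather script that than do it by hand in Audacity each time, here is a rough sketch of the equivalent downmix using the standard library plus numpy; it assumes the recording is already 16-bit stereo at 16 kHz, and the file names are placeholders:

import wave
import numpy as np

with wave.open("recording_stereo.wav", "rb") as src:
    params = src.getparams()
    frames = np.frombuffer(src.readframes(params.nframes), dtype=np.int16)

# Interleaved stereo samples: average left and right into a single channel
# (deleting one track in Audacity simply keeps one of the two channels instead).
mono = frames.reshape(-1, 2).mean(axis=1).astype(np.int16)

with wave.open("recording_mono.wav", "wb") as dst:
    dst.setnchannels(1)
    dst.setsampwidth(2)                  # 16-bit
    dst.setframerate(params.framerate)   # keep the existing 16 kHz rate
    dst.writeframes(mono.tobytes())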

It has been some time, but I guess: profile in the upper right, and then change it there.

Thanks, it works now. The mistake was that I forgot to convert it to mono.

That’s so awkward. It seriously put my email in the “full name” field. No idea how this could have happened, but I changed it now. Well, lucky it was only my “email used for forums and stuff”.