I was trying to run inference with the pre-trained LibriSpeech model on some audio samples randomly collected from the web. The result is quite depressing: the model predicted every single character wrong.
Ground truth: “The Story of Arthur the Rat. Once upon a time there was a rat who couldn’t make up his”
Predicted : “HAM AUUEWIR CCHIUVHE C O HO AA UBBUSH”
Is there any way to solve this?
lissyx
Random audio samples? Can you share more information on their characteristics?
I downloaded the audio sample from here, then split it into 7-second segments.
Audio properties:
Duration: 7 s
Channels: 2
Sampling rate: 44.1 kHz
Bit rate: 112 kbps
If you need more info please feel free to ask.
lissyx
Okay, then I’d guess our automatic resampling is not good enough and likely destroys the data. The model expects mono, 16 kHz, 16-bit PCM. We do have code that performs that conversion, but obviously it is not good enough.
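To rule out the input format before blaming the model, it can help to check a file against the expected format (mono, 16 kHz, 16-bit PCM) before running inference. The following is only a minimal sketch using Python's standard `wave` module. The function name `format_mismatches` and the constants are hypothetical and not part of any DeepSpeech API, and the sketch assumes the sample has already been saved as an uncompressed PCM WAV file.

```python
import wave

# Format stated above: mono, 16 kHz, 16-bit PCM.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 2 bytes per sample = 16 bits


def format_mismatches(path):
    """Return a list describing how the WAV file at `path` differs
    from the expected format. An empty list means the format matches.

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"channels: {channels} (expected {EXPECTED_CHANNELS})"
        )
    if rate != EXPECTED_RATE_HZ:
        problems.append(
            f"sample rate: {rate} Hz (expected {EXPECTED_RATE_HZ} Hz)"
        )
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(
            f"sample width: {8 * width} bits (expected 16 bits)"
        )
    return problems
```

For a file like the one described in this thread (2 channels, 44.1 kHz), the function would report a channel mismatch and a sample rate mismatch. Note that `wave` only reads uncompressed PCM WAV files and raises `wave.Error` on anything else, so a compressed sample would have to be decoded to WAV with a separate audio tool first.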
lissyx
Would you have a direct link to one sample you could share? I’d like to see what happens after the transformation.