Pretrained Chinese Model Invalid Inference Output

(Versions and console output at end of post)

I’m attempting to use the Chinese pretrained model provided in the main git repo. I have gotten the English model/scorer to work perfectly.

But the output I get from the Chinese model/scorer, decoded as UTF-8, is just “���” repeated, and decoded as GBK (simplified Chinese) it’s “锟斤拷” repeated, regardless of the audio input.

I was able to find that “锟斤拷” is a classic artifact of mis-encoded Chinese text. I’m writing the output file with writeFileSync from the Node.js standard library, with the encoding set to utf-8; gbk isn’t a valid encoding to specify, so I haven’t been able to try that. I did try a charset-detection package, which told me it was 100% confident that token.text was ASCII, which I find hard to believe. I also tried an encoding converter to convert to/from a variety of encodings. No luck there either.
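Roughly what I’m doing, condensed into a sketch (the real code is longer; `metadata` is the object returned by `model.sttWithMetadata()`, and `transcript.txt` is just a placeholder path):

```js
const fs = require('fs');

// `metadata` is the result of model.sttWithMetadata(audioBuffer, 1).
function dumpTranscript(metadata) {
  // Join the per-character tokens of the best transcript into one string.
  const text = metadata.transcripts[0].tokens.map((t) => t.text).join('');

  // Dump the raw bytes so I can see whether the bindings hand back valid
  // UTF-8 or already-mangled data, before any file encoding is involved.
  console.log(Buffer.from(text, 'utf-8').toString('hex'));

  fs.writeFileSync('transcript.txt', text, { encoding: 'utf-8' });
}
```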

Any ideas why I might be getting badly encoded text back in token.text from the Chinese pretrained model? I’m not sure encoding is actually the issue.

I confirmed that the audio input I’m using is a .wav file, 16 kHz, single channel.
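For what it’s worth, this is roughly how I checked it (plain Node.js, no extra packages; it assumes the canonical 44-byte PCM WAV layout with the fmt chunk immediately after the RIFF header, and `input.wav` is a placeholder path):

```js
const fs = require('fs');

// Read the canonical PCM WAV header and print the fields DeepSpeech cares about.
const header = fs.readFileSync('input.wav').slice(0, 44);

console.log('channels:   ', header.readUInt16LE(22)); // expecting 1
console.log('sample rate:', header.readUInt32LE(24)); // expecting 16000
console.log('bit depth:  ', header.readUInt16LE(34)); // expecting 16
```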

I’m very new to this stuff. I have some standard Mandarin audio that I plan to run through this, and I also plan to collect user audio and cross-reference the two to give the user feedback on how good their pronunciation is. I don’t yet know if I can get everything I need here, or whether I’ll be able to use something like SCTK/SCLite. Still exploring! Thanks for the assistance.

Tried running with two methods:

  • Just running inference.
  • Using the ‘deepspeech’ npm package (roughly as in the sketch below)
    TensorFlow: v2.3.0-6-g23ad988
    DeepSpeech: v0.9.3-0-gf2e9c85

…and in two environments:

  • Windows 10.0.19042 / Python 3.9.0
  • WSL2 Debian 10 / Python 3.7.3
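For completeness, the npm-package path is roughly this (condensed; the model/scorer/audio filenames are placeholders for whatever I downloaded from the 0.9.3 release page):

```js
const DeepSpeech = require('deepspeech');
const fs = require('fs');

// Load the Chinese acoustic model and external scorer (placeholder paths).
const model = new DeepSpeech.Model('deepspeech-0.9.3-models-zh-CN.pbmm');
model.enableExternalScorer('deepspeech-0.9.3-models-zh-CN.scorer');

// stt() expects raw 16-bit, 16 kHz, mono PCM; slicing off the 44-byte header
// is a shortcut that works for canonical PCM WAV files.
const audio = fs.readFileSync('audio.wav').slice(44);

console.log(model.stt(audio));
```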

I haven’t been able to find anything anywhere, and I’ve looked hard. I can’t remember the last time I had to post a question. Thank you again for any assistance.

My console output is:
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2021-03-21 20:47:21.357944: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Try to repro with the pure C++ library; it’s not impossible that we have bugs in the bindings …

Mixing Windows / Linux might add complexity here; it’d be easier if you can repro in a pure Linux (VM) instead of Windows or WSL.

Thanks, @lissyx! Great advice. It worked using the pure library in WSL2. There must be an issue with the bindings, like you said. I can work around that. Perhaps I should file a bug, just so the team is aware? Thanks again!

I see you did file one, but it’s not really complete. BTW, the team includes me, but I don’t think we have time to investigate :confused:

No worries. I’ll look into updating it with the required details. I can work without the JavaScript bindings; I just figured it would be shitty not to at least report the issue. Thanks for the assistance.

Yes, at least we know it’s not the lib that is broken. I’ll continue on GitHub if required.