Hi,
I am testing DeepSpeech on Android:
- using the androidspeech demo app (latest version, i.e. commit 0ca061bdd8c5c849265937d3364eb79afcf47eef)
- on a Huawei P20 Lite (Kirin 659 SoC; octa-core: 4x 2.36 GHz Cortex-A53 + 4x 1.7 GHz Cortex-A53)
I haven't changed the demo app code in any way. Models were downloaded from here (v0.6.0). I've also trained my own models for Polish (n_hidden = 1700), with the same poor performance. The output transcript is not totally random gibberish, i.e. it contains real words, but it is obviously wrong. The same model (.pbmm, not .tflite) on a PC running Ubuntu yields very good results (at least in my opinion). As far as I know, models converted to .tflite are expected to perform somewhat worse, but in my case the results are not even comparable.
Anyway, I don't think this is a "model-specific" problem, so the details below apply to the pretrained DeepSpeech English models. Here are some examples:
- me saying: "one two three four", transcription: "what a to" (audio)
- me saying: "my name is Robert", transcription: "nature of the" (audio)
- me saying: "speak to me", transcription: "sixty" (audio)
- me saying: "i like to play chess", transcription: "i getta" (audio)
I am not a native English speaker, but I don't think my accent is bad enough to expect such results. Also, the linked audio files are not the raw files saved by the androidspeech demo app. I recorded them with the default recorder app on my phone to demonstrate the way I talk and the testing environment.

To save the audio buffer directly from the demo app, I used the keepClips option in the STTLocalClient.java file. Here is an example of such a clip converted to a regular WAV file (not raw PCM). In this recording I am saying: "i like to play chess". I don't know why the audio quality is so bad. Maybe there is an issue with dropped frames (my phone being too slow)? The P20 Lite isn't the newest or most performant phone, but it is comparable to a Raspberry Pi 3, which achieves faster-than-realtime inference.
Is there any way to prevent frame drops?
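One rough way to test the dropped-frames theory: compare the length of a saved keepClips clip with how long the recording was actually running. If audio chunks are being lost, the clip should be noticeably shorter. This is a minimal sketch, assuming the clip is headerless 16 kHz, mono, 16-bit little-endian PCM (the same format as the `-f s16le -ar 16k -ac 1` flags passed to ffmpeg). The script and helper names (`pcm_duration_seconds`, `missing_fraction`) are hypothetical, not part of DeepSpeech or the demo app.

```python
import sys
from pathlib import Path

# Assumed clip format (matches the ffmpeg flags -f s16le -ar 16k -ac 1)
SAMPLE_RATE = 16000   # samples per second
BYTES_PER_SAMPLE = 2  # 16-bit signed little-endian
CHANNELS = 1          # mono


def pcm_duration_seconds(num_bytes: int) -> float:
    """Hypothetical helper: duration in seconds of a raw PCM buffer of num_bytes."""
    return num_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


def missing_fraction(clip_seconds: float, expected_seconds: float) -> float:
    """Hypothetical helper: fraction of audio that appears to be missing,
    compared with the wall-clock time the recording was running."""
    if expected_seconds <= 0:
        raise ValueError("expected_seconds must be positive")
    return max(0.0, 1.0 - clip_seconds / expected_seconds)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_clip.py <raw_clip> <seconds_recording_was_running>
    path, expected = sys.argv[1], float(sys.argv[2])
    size = Path(path).stat().st_size
    clip = pcm_duration_seconds(size)
    print(f"{path}: {size} bytes = {clip:.2f} s of audio")
    print(f"apparently missing: {missing_fraction(clip, expected):.0%}")
```

For example, a 64000-byte clip is 2.0 s of audio under these assumptions; if the recording ran for about 4 s, roughly half of it would be missing, which would point to dropped buffers rather than accent or model quality.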
Edit: this is the command I used to convert the raw PCM:
ffmpeg -f s16le -ar 16k -ac 1 -i iliketoplaychess_DS.wav iliketoplaychess_DS_not_raw.wav
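For reference, the same conversion can be done without ffmpeg: the keepClips output (despite the .wav extension) is treated by the command above as headerless s16le PCM, so it only needs a WAV header with matching parameters. This is a minimal sketch using Python's standard `wave` module, assuming 16 kHz, mono, 16-bit samples; `raw_pcm_to_wav` is a hypothetical helper name, not part of the demo app.

```python
import wave
from pathlib import Path


def raw_pcm_to_wav(raw_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Hypothetical helper: wrap headerless little-endian PCM in a WAV container.

    Defaults assume 16 kHz, mono, 16-bit samples (same as -ar 16k -ac 1 -f s16le).
    WAV stores PCM little-endian, so the s16le bytes can be copied unchanged.
    """
    pcm = Path(raw_path).read_bytes()
    frame_size = channels * sample_width
    remainder = len(pcm) % frame_size
    if remainder:
        # Drop a trailing partial frame so the header's frame count stays consistent
        pcm = pcm[: len(pcm) - remainder]
    with wave.open(str(wav_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(sample_rate)
        out.writeframes(pcm)
```

Usage, mirroring the file names in the ffmpeg command: `raw_pcm_to_wav("iliketoplaychess_DS.wav", "iliketoplaychess_DS_not_raw.wav")`.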