Arm64 performance vs armv7l

Environment is a Raspberry Pi 4 with a 64-bit kernel, running a Debian Buster ARM64 nspawn container.

The arm64 packages are about half the speed of the armv7 packages running inside a 32-bit container.

I’ve tried both “pip3 install https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-cp37-cp37m-linux_aarch64.whl” and the native client from the release page.

armv7 is using TF Lite, arm64 isn’t. Is there a tflite arm64 build available?

I can’t really evaluate that; it’s not a setup we support.

Linux and Android ARM64 packages should both be using TFLite.

Ok, my memory was failing me: we switched only the RPi3 builds, and we are in the process of moving more builds to TFLite.

BTW, PyPI policy does not allow us to upload aarch64 wheels.

I’ll take care of adding Linux/ARM64 builds: https://github.com/mozilla/DeepSpeech/issues/2676

Great, and thank you!

On a tangent, is it expected that the x86_64 deepspeech-tflite package is significantly slower than the deepspeech package?

For a 35-second file, deepspeech completes in approximately 18 seconds, while deepspeech-tflite takes 53 seconds.
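For reference, here’s roughly the equivalent of what I’m timing, as a minimal Python sketch (assuming the 0.6.1 Python API; the WAV filename is a placeholder, and the beam width and LM weights are the 0.6.1 client defaults as I understand them):

```python
# Minimal timing sketch (assumes the deepspeech 0.6.1 Python package,
# the 0.6.1 model files, and a 16 kHz mono 16-bit WAV).
# "audio35s.wav" is a placeholder for the 35-second test file.
import time
import wave

import numpy as np
from deepspeech import Model

model = Model('deepspeech-0.6.1-models/output_graph.pbmm', 500)  # or output_graph.tflite
model.enableDecoderWithLM('deepspeech-0.6.1-models/lm.binary',
                          'deepspeech-0.6.1-models/trie', 0.75, 1.85)

with wave.open('audio35s.wav', 'rb') as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

start = time.perf_counter()
print(model.stt(audio))
print('inference took %.1fs' % (time.perf_counter() - start))
```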

That depends on your hardware, but that sounds bad and it’s really not what we see.

The above test was on a single-core VMware VM with 2 GB RAM.

Just ran the same test on my notebook (i5-7200, 2 cores/4 threads, 8 GB RAM): deepspeech was about the same at 19s; deepspeech-tflite was 38s, better than above but still significantly slower.

Fully updated Debian Buster/python3 on both.

Not sure if this is helpful or not. Happy to run any other tests if it helps.

Right, not surprising then. TFLite will eat one complete core.

Same 35-second file? That looks much better than the previous test and closer to what we expect, even though it’s a bit slow. We have not run extensive benchmarking, which is also why we have not yet switched the default runtime, so that kind of feedback is good.

Yes, same file.

Thanks.

Just FYI, a couple more (anecdotal) data points using the same file:

Raspbian Pi 4 32-bit (tflite): 28s

Raspberry Pi 4 64-bit (pbmm): 120s

The opposite of the x86_64 results.

The 64-bit environment is a Debian arm64 container on the Raspbian 64-bit kernel.

I’ll be happy to try an unreleased arm64 tflite build; if you gen one up, send me a link. Unfortunately, I don’t have the time right now to set up a build environment myself.

I had to fix a few extras in the patch but I should be able to share something with you soon.

Your perf numbers on the RPi4 seem consistent.

Great, but at your convenience. I’m not angling for any sort of custom build, just willing to help a little and report back if you want.

Thanks.

Don’t worry, this was on our radar for an upcoming release anyway, and your message just confirms that we should have done it earlier.

@jerrm This should be what you want: https://community-tc.services.mozilla.com/api/queue/v1/task/KZMAnYo2Qy2-icrTp5Ldqw/runs/0/artifacts/public/deepspeech-0.6.1-cp37-cp37m-linux_aarch64.whl

Great! That gets back to the 28s mark on the test file.

There are some minor differences in the output vs the 0.6.1 armv7 package. I assume these don’t matter (artifacts of a one-off test build, current tree vs release version, etc.), but just in case it’s something you want to address: the test build apparently has an older TensorFlow (1.12 vs 1.14) and is missing the “INFO: Initialized TensorFlow Lite runtime.” line:

Raspberry armv7 output:
Loading model from file deepspeech-0.6.1-models/output_graph.tflite
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
INFO: Initialized TensorFlow Lite runtime.
Loaded model in 0.00548s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.00236s.
Running inference.

aarch64 test build output:
Loading model from file deepspeech-0.6.1-models/output_graph.tflite
TensorFlow: v1.12.0-22283-g917d341
DeepSpeech: v0.6.1-alpha.0-80-g5a509f5
Loaded model in 0.00142s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.00042s.
Running inference.

Thanks again!

We haven’t seen it since TensorFlow r1.15; I have not verified, but I assume it’s just TensorFlow no longer printing it :slight_smile:

No, it is r1.15; it’s just that this is a temp build and I have not pushed tags to my branch, so the version string is stale.

No worries, just the type of test-build issue I assumed.

But in thinking my possible scenario through, it could be beneficial if “deepspeech --version” or similar returned some indication of TFLite status.

We’ve already been through that and decided it was not a good idea. Can you share your thoughts and how that would be useful?

A script could be used on multiple architectures (x86_64/armv7/arm64), and it needs to know which model to use. It’s not a problem if I’m in control of the environment, but that’s not always the case. I’d normally do basic sanity checks early on: what version are we, what model do we need, do we have the right model available, etc. I can make assumptions based on CPU, but I don’t like assumptions.

There may be a better way I don’t see yet; I’m only a few (partial) days into DeepSpeech.
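To illustrate the kind of check I mean, here’s a rough sketch (purely hypothetical; it assumes the 0.6.1 Python API and that a model the runtime can’t load raises an exception, since there’s currently no direct way to query TFLite status):

```python
# Hypothetical sketch: the package doesn't report whether it was built
# against TFLite, so probe by attempting to load each model format.
import os

from deepspeech import Model

BEAM_WIDTH = 500  # 0.6.1 client default

def load_model(model_dir):
    # TFLite builds want output_graph.tflite, TensorFlow builds want
    # output_graph.pbmm; keep whichever one this runtime accepts.
    for name in ('output_graph.tflite', 'output_graph.pbmm'):
        path = os.path.join(model_dir, name)
        if not os.path.exists(path):
            continue
        try:
            return Model(path, BEAM_WIDTH)
        except Exception:  # assumption: a failed load raises, not aborts
            continue
    raise RuntimeError('no loadable model found in ' + model_dir)

model = load_model('deepspeech-0.6.1-models')
```

It works, but something like a version flag would make the check direct instead of trial-and-error.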