Arm64 performance vs armv7l

Environment is a Raspberry Pi 4 with a 64-bit kernel, running a Debian Buster ARM64 nspawn container.

The arm64 packages are about half the speed of the armv7 packages running inside a 32-bit container.

I’ve tried both “pip3 install https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-cp37-cp37m-linux_aarch64.whl” and the native client from the release page.

armv7 is using TF Lite, arm64 isn’t. Is there a tflite arm64 build available?

I can’t really evaluate that; it’s not a setup we support.

Linux and Android ARM64 packages should both be using TFLite.

Ok, my memory was failing me: we switched only the RPi3 builds, and we are in the process of moving more builds to TFLite.

BTW, PyPI policy does not allow us to upload aarch64 wheels.

I’ll take care of adding Linux/ARM64 builds: https://github.com/mozilla/DeepSpeech/issues/2676

Great, and thank you!

On a tangent, is it expected that the x86_64 deepspeech-tflite package is significantly slower than the deepspeech package?

For a 35-second file, deepspeech completes in approximately 18 seconds, while deepspeech-tflite takes 53 seconds.
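For reference, here’s roughly the equivalent of what I’m timing, as a minimal Python sketch (assuming the 0.6.1 Python API; the WAV filename is a placeholder, and the beam width and LM weights are the 0.6.1 client defaults as I understand them):

```python
# Minimal timing sketch (assumes the deepspeech 0.6.1 Python package,
# the 0.6.1 model files, and a 16 kHz mono 16-bit WAV).
# "audio35s.wav" is a placeholder for the 35-second test file.
import time
import wave

import numpy as np
from deepspeech import Model

model = Model('deepspeech-0.6.1-models/output_graph.pbmm', 500)  # or output_graph.tflite
model.enableDecoderWithLM('deepspeech-0.6.1-models/lm.binary',
                          'deepspeech-0.6.1-models/trie', 0.75, 1.85)

with wave.open('audio35s.wav', 'rb') as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

start = time.perf_counter()
print(model.stt(audio))
print('inference took %.1fs' % (time.perf_counter() - start))
```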

That depends on your hardware, but that sounds bad and it’s really not what we see.

The above test was on a single-core VMware VM with 2 GB RAM.

Just ran the same test on my notebook (i5-7200, 2 cores/4 threads, 8 GB RAM): deepspeech was about the same at 19s; deepspeech-tflite was 38s, better than above but still significantly slower.

Fully updated Debian Buster/python3 on both.

Not sure if this is helpful or not. Happy to run any other tests if it helps.

Right, not surprising then. TFLite will eat one complete core.

Same 35-second file? That looks much better than the previous test and closer to what we expect, even though it’s a bit slow. We have not run extensive benchmarking, which is also why we have not yet switched the default runtime, so that kind of feedback is good.

Yes, same file.

Thanks.

Just FYI, a couple more (anecdotal) data points using the same file:

Raspbian Pi 4 32-bit (tflite): 28s

Raspberry Pi 4 64-bit (pbmm): 120s

The opposite of the x86_64 results.

The 64-bit environment is a Debian arm64 container on the Raspbian 64-bit kernel.

I’ll be happy to try an unreleased arm64 tflite build; if you gen one up, send me a link. Unfortunately, I don’t have the time right now to set up a build environment myself.

I had to fix a few extras in the patch but I should be able to share something with you soon.

Your perf numbers on the RPi4 seem consistent.

Great, but at your convenience. I’m not angling for any sort of custom build, just willing to help a little and report back if you want.

Thanks.

Don’t worry, this was on our radar for an upcoming release anyway, and your message just confirms that we should have done it earlier.

@jerrm This should be what you want: https://community-tc.services.mozilla.com/api/queue/v1/task/KZMAnYo2Qy2-icrTp5Ldqw/runs/0/artifacts/public/deepspeech-0.6.1-cp37-cp37m-linux_aarch64.whl

Great! That gets back to the 28s mark on the test file.

There are some minor differences in the output vs the 0.6.1 armv7 package. I assume these don’t matter (artifacts of a one-off test build, current tree vs release version, etc.), but just in case it’s something you want to address: the test build apparently has an older TensorFlow (1.12 vs 1.14) and is missing the “INFO: Initialized TensorFlow Lite runtime.” line:

Raspberry armv7 output:
Loading model from file deepspeech-0.6.1-models/output_graph.tflite
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
INFO: Initialized TensorFlow Lite runtime.
Loaded model in 0.00548s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.00236s.
Running inference.

aarch64 test build output:
Loading model from file deepspeech-0.6.1-models/output_graph.tflite
TensorFlow: v1.12.0-22283-g917d341
DeepSpeech: v0.6.1-alpha.0-80-g5a509f5
Loaded model in 0.00142s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.00042s.
Running inference.

Thanks again!

We haven’t seen it since TensorFlow r1.15; I have not verified, but I assume it’s just TensorFlow no longer printing it :slight_smile:

No, it is r1.15; it’s just that this is a temp build and I have not pushed tags to my branch, so the version string is stale.

No worries, just the type of test-build issue I assumed.

But in thinking my possible scenario through, it could be beneficial if “deepspeech --version” or similar returned some indication of TFLite status.

We’ve already been through that and decided it was not a good idea. Can you share your thoughts and how that would be useful?

A script could be used on multiple architectures (x86_64/armv7/arm64), and it needs to know which model to use. It’s not a problem if I’m in control of the environment, but that’s not always the case. I’d normally do basic sanity checks early on: what version are we, what model do we need, do we have the right model available, etc. I can make assumptions based on CPU, but I don’t like assumptions.

There may be a better way I don’t see yet; I’m only a few (partial) days into DeepSpeech.
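To illustrate the kind of check I mean, here’s a rough sketch (purely hypothetical; it assumes the 0.6.1 Python API and that a model the runtime can’t load raises an exception, since there’s currently no direct way to query TFLite status):

```python
# Hypothetical sketch: the package doesn't report whether it was built
# against TFLite, so probe by attempting to load each model format.
import os

from deepspeech import Model

BEAM_WIDTH = 500  # 0.6.1 client default

def load_model(model_dir):
    # TFLite builds want output_graph.tflite, TensorFlow builds want
    # output_graph.pbmm; keep whichever one this runtime accepts.
    for name in ('output_graph.tflite', 'output_graph.pbmm'):
        path = os.path.join(model_dir, name)
        if not os.path.exists(path):
            continue
        try:
            return Model(path, BEAM_WIDTH)
        except Exception:  # assumption: a failed load raises, not aborts
            continue
    raise RuntimeError('no loadable model found in ' + model_dir)

model = load_model('deepspeech-0.6.1-models')
```

It works, but something like a version flag would make the check direct instead of trial-and-error.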