Building ds-ctcdecoder on Raspberry Pi?

Hi,

is there a way to build the ctcdecoder package directly on a Raspberry Pi?

I tried executing the same steps I use to build it on my Linux PC, but this isn’t working:

# Build ctcdecoder package
RUN apt-get update && apt-get install -y swig sox
RUN git clone --depth 1 https://github.com/mozilla/DeepSpeech.git
# The next line is required for building with shallow git clone
RUN sed -i 's/git describe --long --tags/git describe --long --tags --always/g' /DeepSpeech/native_client/bazel_workspace_status_cmd.sh
RUN apt-get update && apt-get install -y libmagic-dev
RUN cd /DeepSpeech/native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
# RUN pip3 install --upgrade /DeepSpeech/native_client/ctcdecode/dist/*.whl

The script does some work and gives me some C++ compiler warnings, but eventually fails with:

/DeepSpeech/native_client/ds-swig/share/swig/4.0.2/typemaps/swigtype.swg:608: Error: Syntax error in input(1).
error: command 'swig' failed with exit status 1

I’ve seen that it downloads a file from TaskCluster built for the amd64 architecture:

https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.f0e5d1a0be7383abd98a29a75f47d5dc10a87ef2.0/artifacts/public/ds-swig.tar.gz

so I assume that I’m doing something that I shouldn’t do…


Do you know an easy way to fix this, or do I need to cross-compile the .whl file?

SWIG is a build/packaging tool. There is no prebuilt version available for the ARM/AArch64 architecture, so you need to build SWIG manually on your RPi.

Why do you want to do that?

There is no prebuilt package because training on ARMv7/AArch64 is not supported, and will not work given the complexity of the network.

@dan.bmh if you want to generate an on-device scorer, please use https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/native_client.rpi3.cpu.linux.tar.xz as it bundles generate_scorer_package

You should follow the cross-compilation docs anyway for this kind of use.

Installing swig via apt-get did work for me; swig -version gives me 3.0.12. Bazel (3.7.0) and TensorFlow (2.3) are also installed. I’m using raspbian-buster as the base image for the container.


For now I would like to test the performance of my new network on a RasPi 4.

But on-device customization would be an interesting feature to test. This might work with DeepSpeech as well; I’m already training the rasa-nlu networks directly on the device (takes a few minutes). So I think STT training with a few 10s or 100s of sentences might be possible.

You don’t need ctcdecoder to run on device. You need it only to train.

It is unclear what you are trying to achieve. You need swig to build the bindings, but please use the version we provide (for consistency; 3.0 might have some bugs that are fixed there, and our branch is required for recent NodeJS versions). You need TensorFlow to perform training, but this is done with r1.15, not 2.3 …

And if you are changing the network architecture and need to rebuild libdeepspeech.so please follow our cross-compilation docs: https://deepspeech.readthedocs.io/en/v0.9.3/BUILDING.html?highlight=cross%20compilation#cross-building

Can I use the deepspeech package for decoding only? I didn’t find it in the docs.

Currently I would like to test my new network. I was able to package the data pipeline into it, so the input is an array of shape [audio signal values] and the output is [timesteps, alphabet probabilities]. For decoding with a language model I’m directly importing ctc_beam_search_decoder from the ctc_decoder package. The script is here: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot/-/blob/dspol/extras/exporting/testing_pb.py
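
In essence the decoding step boils down to this (a simplified sketch, not the full script; the paths, alpha/beta values and beam size are placeholders, and it assumes the ds_ctcdecoder API from DeepSpeech v0.9):

import numpy as np
from ds_ctcdecoder import Alphabet, Scorer, ctc_beam_search_decoder

# Placeholder paths and hyperparameters
alphabet = Alphabet("alphabet.txt")
scorer = Scorer(alpha=0.93, beta=1.18, scorer_path="kenlm.scorer", alphabet=alphabet)

# probs: network output of shape [timesteps, alphabet probabilities]
probs = np.load("probs.npy")

# Returns a list of (score, transcription) tuples, best first
results = ctc_beam_search_decoder(probs, alphabet, beam_size=1024, scorer=scorer)
print(results[0][1])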

Everything is working on my PC, but now I would like to test it directly on a RasPi, so I need the ctc_decoder package there. I know this breaks compatibility with the DeepSpeech bindings, but I think it’s much easier to do a performance test this way first and work on the integration into libdeepspeech.so afterwards.

I’m not sure I understand what you want to do.

The problem is that you are not going to run the same code at all. libdeepspeech.so on the RPi (and others) uses TFLite for inference. If you run using the training code, you’re not running TFLite, so you can’t really compare speed in this context.

Yes, so this is using the full-blown TensorFlow runtime. On a Raspberry Pi device I don’t know how much you can expect, but it’s likely not going to reflect at all the speed and latency you get out of the TFLite runtime.
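
If you just want a rough idea of the TFLite side, the runtime can also be driven from Python (a minimal sketch only; the real DeepSpeech TFLite export has several input and state tensors, and the model path here is a placeholder):

import numpy as np
import tensorflow as tf  # or tflite_runtime.interpreter on the Pi

interpreter = tf.lite.Interpreter(model_path="output_graph.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy tensor matching the first input's shape/dtype and time the call
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
logits = interpreter.get_tensor(output_details[0]["index"])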

I don’t know either, but I’d like to try it out.

So do you think it’s faster to adjust the ctc-decoder build so that I can build it directly on the RasPi (I would prefer this way), or to set up a cross-compiling environment?

this is documented and supported

this is not supported, and there’s a reason. Also, again, by doing so you are going to test with the wrong runtime on your device.

I did successfully build a docker container from Dockerfile.build (using the master branch), but now I have a problem building the ctcdecoder again (the docker image build already runs a successful build of it).

If I execute cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings in the newly built container, I get the following error:

[...]
DISTUTILS_USE_SDK=1 PATH=/DeepSpeech/native_client/ds-swig/bin::$PATH SWIG_LIB="/DeepSpeech/native_client/ds-swig/share/swig/4.0.2/" AS=as CC=gcc CXX=c++ LD=ld LIBEXE= CFLAGS="   " LDFLAGS="-Wl,--no-as-needed"   python ./setup.py build_ext --num_processes 12 --plat-name manylinux1_x86_6
[...]

scorer.cpp: In member function 'int Scorer::load_lm(const string&)':
scorer.cpp:97:43: error: 'class lm::base::Model' has no member named 'GetEndOfSearchOffset'
   uint64_t trie_offset = language_model_->GetEndOfSearchOffset();
                                           ^~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "./setup.py", line 68, in <module>
    maybe_rebuild(CTC_DECODER_FILES, ctc_decoder_build, build_dir)
  File "./setup.py", line 52, in maybe_rebuild
    debug=debug)
  File "/DeepSpeech/native_client/ctcdecode/build_archive.py", line 95, in build_archive
    obj_files = list(pool.imap_unordered(build_one, srcs))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/DeepSpeech/native_client/ctcdecode/build_archive.py", line 91, in build_one
    subprocess.check_call(shlex.split(cmd))
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['c++', '-c', '-fPIC', '-DKENLM_MAX_ORDER=6', '-std=c++11', '-Wno-unused-local-typedefs', '-Wno-sign-compare', '-O3', '-DNDEBUG', '-I..', '-I../kenlm', '-Ithird_party/openfst-1.6.7/src/include', '-Ithird_party/ThreadPool', '-Ithird_party/object_pool', 'scorer.cpp', '-o', 'temp_build/temp_build/scorer.o']' returned non-zero exit status 1.
Makefile:47: recipe for target 'bindings' failed
make: *** [bindings] Error 1

Do you have an idea how to fix it?

Your setup is unclear, sorry. How do you achieve this: building in the docker container, or outside?

The Dockerfile.build does build the ds_ctcdecoder python wheel, and a few days ago this was green on TaskCluster, so I highly doubt this is broken.

I don’t get how building on amd64 inside a docker container will help evaluate speed and latency on an RPi4.

Please be super explicit, because this wording is confusing. The deepspeech python bindings package only performs inference, but you need to update the libdeepspeech.so implementation.
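
For reference, with an unmodified graph, inference through the deepspeech bindings is just this (a sketch assuming the standard v0.9 Python API; file names are placeholders, and with a changed network architecture it will not work until libdeepspeech.so is updated):

import wave
import numpy as np
from deepspeech import Model

ds = Model("deepspeech-0.9.3-models.tflite")
ds.enableExternalScorer("deepspeech-0.9.3-models.scorer")

# 16 kHz, 16-bit mono WAV expected
with wave.open("audio.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))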

The deepspeech_training python package has some ability to perform inference, via evaluate.py. But once again, this will not be useful in your case, because this code will run using the TensorFlow runtime, and not the TFLite runtime that we use on those architectures for libdeepspeech.so.

The more I see you trying things, the more I’m convinced you would lose much, much less time by just adjusting the libdeepspeech implementation.

Inputs on TFLite are defined here: https://github.com/mozilla/DeepSpeech/blob/962a117f7ed5720435904e3ac864cc8420256ca1/native_client/tflitemodelstate.cc#L194-L202
Inference on TFLite is quite simple: https://github.com/mozilla/DeepSpeech/blob/962a117f7ed5720435904e3ac864cc8420256ca1/native_client/tflitemodelstate.cc#L354-L390

TensorFlow inference is even simpler: https://github.com/mozilla/DeepSpeech/blob/962a117f7ed5720435904e3ac864cc8420256ca1/native_client/tfmodelstate.cc#L201-L245

Hack the TFLite version: build TFLite on your desktop, ensure it works, and then follow the cross-compilation steps. You don’t need bindings etc., just build the C++ client and libdeepspeech.so, and that’s it: you can evaluate performance, speed, and latency.