Running DeepSpeech inside a Docker Container

I am trying to run the DeepSpeech model inside a Docker container. I am successfully able to run the container, but I am stuck on passing the input to the model. Could anyone please point out how I can pass the input to the container?

It’s hard to help without more context on your exact setup, and it feels more like a Docker question than a DeepSpeech question.

Hi Lissyx,

Thank you for your reply.

I have taken the DeepSpeech model available on GitHub and want to run it as a Dockerised container. I am successfully able to run the DeepSpeech model inside the container, but I am stuck on how to pass the input, i.e. the WAV file.

I don’t see any documentation about it in the DeepSpeech README on GitHub. Please advise. Below is my Dockerfile for reference.


# Need the devel version because we need /usr/include/cudnn.h
# for compiling libctc_decoder_with_kenlm.so
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# START Install base software

# Get basic packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    wget \
    git \
    python \
    python-dev \
    python-pip \
    python-wheel \
    python-numpy \
    libcurl3-dev \
    ca-certificates \
    gcc \
    sox \
    libsox-fmt-mp3 \
    htop \
    nano \
    swig \
    cmake \
    libboost-all-dev \
    zlib1g-dev \
    libbz2-dev \
    liblzma-dev \
    locales \
    pkg-config \
    libsox-dev

# Install Bazel
RUN apt-get install -y openjdk-8-jdk

# Use bazel 0.11.1 because newer bazel fails to compile TensorFlow (https://github.com/tensorflow/tensorflow/issues/18450#issuecomment-381380000)
RUN apt-get install -y --no-install-recommends bash-completion g++ zlib1g-dev
RUN curl -LO "https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel_0.11.1-linux-x86_64.deb"
RUN dpkg -i bazel_*.deb

# Install CUDA CLI Tools
RUN apt-get install -y cuda-command-line-tools-9-0

# Install pip
RUN wget https://bootstrap.pypa.io/get-pip.py && \
    python get-pip.py && \
    rm get-pip.py

# END Install base software

# START Configure TensorFlow build

# Clone TensorFlow from the Mozilla repo
RUN git clone https://github.com/mozilla/tensorflow/
WORKDIR /tensorflow
RUN git checkout r1.6

# GPU environment setup
ENV TF_NEED_CUDA 1
ENV CUDA_TOOLKIT_PATH /usr/local/cuda
ENV CUDA_PKG_VERSION 9-0=9.0.176-1
ENV CUDA_VERSION 9.0.176
ENV TF_CUDA_VERSION 9.0
ENV TF_CUDNN_VERSION 7.1.4
ENV CUDNN_INSTALL_PATH /usr/lib/x86_64-linux-gnu/
ENV TF_CUDA_COMPUTE_CAPABILITIES 6.0

# Common environment setup
ENV TF_BUILD_CONTAINER_TYPE GPU
ENV TF_BUILD_OPTIONS OPT
ENV TF_BUILD_DISABLE_GCP 1
ENV TF_BUILD_ENABLE_XLA 0
ENV TF_BUILD_PYTHON_VERSION PYTHON2
ENV TF_BUILD_IS_OPT OPT
ENV TF_BUILD_IS_PIP PIP

# Other parameters
ENV CC_OPT_FLAGS -mavx -mavx2 -msse4.1 -msse4.2 -mfma
ENV TF_NEED_GCP 0
ENV TF_NEED_HDFS 0
ENV TF_NEED_JEMALLOC 1
ENV TF_NEED_OPENCL 0
ENV TF_CUDA_CLANG 0
ENV TF_NEED_MKL 0
ENV TF_ENABLE_XLA 0
ENV PYTHON_BIN_PATH /usr/bin/python2.7
ENV PYTHON_LIB_PATH /usr/lib/python2.7/dist-packages

# END Configure TensorFlow build

# START Configure Bazel

# Running bazel inside a docker build command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc

# Similarly, we need to work around sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" >>/etc/bazel.bazelrc

# Put the CUDA libraries where they are expected to be
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1
RUN cp /usr/include/cudnn.h /usr/local/cuda/include/cudnn.h

# Set library paths
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu/:/usr/local/cuda/lib64/stubs/

# END Configure Bazel

# Copy DeepSpeech repo contents to the container's /DeepSpeech
COPY . /DeepSpeech/

WORKDIR /DeepSpeech

RUN pip --no-cache-dir install -r requirements.txt

# Link DeepSpeech native_client libs into the tensorflow folder
RUN ln -s /DeepSpeech/native_client /tensorflow

# START Build and bind

WORKDIR /tensorflow

# Using CPU optimizations:
# -mtune=generic -march=x86-64 -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx.
# Adding the --config=cuda flag to build using CUDA.

# Passing LD_LIBRARY_PATH is required because Bazel doesn't pick it up from the environment

# Build LM prefix decoder, CPU only - no need for the CUDA flag
RUN bazel build -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx //native_client:libctc_decoder_with_kenlm.so --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}

# Build DeepSpeech
RUN bazel build --config=monolithic --config=cuda -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:deepspeech_utils //native_client:generate_trie --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}

# Build the TF pip package
RUN bazel build --config=opt --config=cuda --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx //tensorflow/tools/pip_package:build_pip_package --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}

# Fix for the "script not found" issue, see https://github.com/tensorflow/tensorflow/issues/471
RUN ./configure

# Build the wheel
RUN bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# Install TensorFlow from our custom wheel
RUN pip install /tmp/tensorflow_pkg/*.whl

# Copy built libs to /DeepSpeech/native_client
RUN cp /tensorflow/bazel-bin/native_client/libctc_decoder_with_kenlm.so /DeepSpeech/native_client/ \
    && cp /tensorflow/bazel-bin/native_client/generate_trie /DeepSpeech/native_client/ \
    && cp /tensorflow/bazel-bin/native_client/libdeepspeech.so /DeepSpeech/native_client/ \
    && cp /tensorflow/bazel-bin/native_client/libdeepspeech_utils.so /DeepSpeech/native_client/

# Make DeepSpeech and install the Python bindings
ENV TFDIR /tensorflow
WORKDIR /DeepSpeech/native_client
RUN make deepspeech
RUN make bindings
RUN pip install dist/deepspeech*

# END Build and bind

# Allow Python to print utf-8
ENV PYTHONIOENCODING UTF-8

# Build KenLM in the /DeepSpeech/native_client/kenlm folder
WORKDIR /DeepSpeech/native_client
RUN rm -rf kenlm \
    && git clone https://github.com/kpu/kenlm && cd kenlm \
    && mkdir -p build \
    && cd build \
    && cmake .. \
    && make -j 4

# Done
WORKDIR /DeepSpeech

CMD ["docker", "run", "--runtime=nvidia", "--rm", "nvidia/cuda", "nvidia-smi"]


I’m sorry, but it’s really out of scope right now; you need to check on the Docker side how to pass files into the container :/

Also, your Dockerfile is going to take a lot of time to build. If you don’t have any specific requirements, you should just reuse the prebuilt binaries we provide. As for how to pass the WAV file into Docker - no idea.

Hi Aashish,

Here’s how I’d approach it:

  1. First, verify that DeepSpeech is installed correctly and that you can run DS within your Docker container. You can do this from the command line. Start by booting into bash:

docker run -it --entrypoint /bin/bash <image>

Once at the command line inside Docker, try running DS:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav

  2. Assuming you can run DS successfully via the command line inside Docker, you then need to figure out how to interact with the container from outside it.

Probably the easiest way is to mount a volume into Docker which contains the file you want to transcribe. Within your Dockerfile, add a command to run DS against that file, and save the output to another file within the same mounted volume. You can then access the results later from outside the Docker environment.
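For example, assuming the models are baked into your image and your WAV files live in ./audio on the host (the image name and paths here are placeholders), the volume-mount approach might look something like:

    docker run --rm -v $(pwd)/audio:/data <image> \
        sh -c "deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio /data/my_audio_file.wav > /data/transcript.txt"

The transcript then ends up in ./audio/transcript.txt on the host, readable from outside Docker.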

You could alternatively write an API endpoint that listens inside Docker, and send the file from outside the container to that endpoint. That would be a more complex but more scalable way to do it.


Hi Derek!

Thanks for the response. Yes, I am able to run DeepSpeech inside the Docker container.

As you suggested, the next step is to create a volume and reference the input from outside the Docker container, i.e. from the host. For that I created a docker-compose file, shown below.

But the challenge is that every time there is a new input file, we need to run the DeepSpeech command again. As I understand it, only the environment is ready and the client is not waiting for input; whenever we have new input, we need to run the command again. I am not sure how we can achieve that. Please guide.

Note: whenever I run docker-compose up, I get the error below.

Creating deepspeech ... error

ERROR: for deepspeech Cannot start service deepspeech: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio ./data/smoke_test/LDC93S1.wav\": stat deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio ./data/smoke_test/LDC93S1.wav: no such file or directory": unknown

================docker-compose.yml=====================

version: "3"
services:
  deepspeech:
    container_name: deepspeech
    image: deepspeech
    command:
      - "deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio ./data/smoke_test/LDC93S1.wav >> text_file.txt"
    ports:
      - "80:80"
    restart: always
    tty: true
    volumes:
      - ./DeepSpeech/data/smoke_test:/DeepSpeech/data/smoke_test/
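Looking at the error, I suspect Compose is treating the entire quoted string as a single executable name; wrapping the command in a shell, e.g.

    command: sh -c "deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio ./data/smoke_test/LDC93S1.wav >> text_file.txt"

might at least get past that error, though it still leaves the one-run-per-input problem.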

This Dockerfile is mainly intended for rebuilding DeepSpeech from scratch - you don’t need that, as lissyx pointed out; you can just install the pre-built DeepSpeech from pip inside the Docker image.

Once you have built a Docker image with the pre-built DeepSpeech, it’s time to get your audio to the container created from that image.

One way of doing that is

  1. start container from your image
    docker run --name deepspeech_inference --rm -i -t <your_image_name> bash

  2. copy your audio to the container
    docker cp my_audio.wav deepspeech_inference:/my_audio.wav

  3. execute your inference inside the container
    docker exec -d deepspeech_inference deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio /my_audio.wav
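Note that -d runs the command detached, so the transcript won’t be printed back to you; dropping it shows the result directly on your terminal:

    docker exec -t deepspeech_inference deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio /my_audio.wav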

@yv001, Thanks for the response.

The intent here is to build a setup where the user can simply provide the input and the model returns the output. Passing the audio WAV can be solved with the Docker volumes concept, but the challenge is that every time there is new input, I need to run the deepspeech command again. Please advise.

Going inside the Docker container and running the command again is something I am trying to avoid.

Then you need server code wrapping the DeepSpeech inference; Python and Node.js APIs are available for that, but you’d need to write the server part yourself.

What exactly is your use case?

I am building a UI where the user can record their voice, which is then sent to the DeepSpeech model. The output of the DeepSpeech model is routed back to the UI.

I want to have the DeepSpeech model as a Docker container, so I can easily install it on any server and avoid installing dependencies again and again. The input from the UI can then just be routed to the Docker container.

As I mentioned before, I can map the input to DeepSpeech from outside the container using Docker volumes, but I am not sure how I can run the command every time there is new input from outside.

OK, in that case you’d best use DeepSpeech as a library in your code (e.g. Python or JavaScript) and wrap it with server code that receives data from your UI.

Check this https://discourse.mozilla.org/t/new-project-deepspeech-websocket-server-client/32554
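As a very rough sketch of the "library plus server" idea (this assumes the 0.4.x-era Python API and uses Flask as the server; the exact Model signature and the parameter values vary between DeepSpeech releases, so treat them as placeholders):

    import wave

    import numpy as np
    from flask import Flask, request
    from deepspeech import Model

    # Decoder settings from the docs of that era; adjust for your release.
    BEAM_WIDTH = 500
    N_FEATURES = 26
    N_CONTEXT = 9
    LM_ALPHA = 0.75
    LM_BETA = 1.85

    app = Flask(__name__)

    # Load the model once at startup, not per request.
    ds = Model('models/output_graph.pbmm', N_FEATURES, N_CONTEXT,
               'models/alphabet.txt', BEAM_WIDTH)
    ds.enableDecoderWithLM('models/alphabet.txt', 'models/lm.binary',
                           'models/trie', LM_ALPHA, LM_BETA)

    @app.route('/stt', methods=['POST'])
    def stt():
        # Expect a 16-bit mono WAV uploaded as multipart form field 'audio'.
        w = wave.open(request.files['audio'], 'rb')
        rate = w.getframerate()
        audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
        w.close()
        return ds.stt(audio, rate)

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=80)

Run that inside the container with port 80 published (as in your compose file) and the UI can POST recordings to it, e.g. curl -F audio=@my_audio.wav http://localhost/stt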

You must run a server inside the container that listens on a port for input (speech). The server can then pass that to DeepSpeech, get the output, and return it back out over the same port. I have an example client/server for ESPnet, if that helps you: https://www.youtube.com/watch?v=ooEIfR3aw44&t=1014s