Problem building Docker image from Dockerfile.build file

Hi there!

I am trying to create a backend API that can transcribe audio files using a model I trained myself on Spanish. I have successfully trained and exported the model, but I am running into problems when building a Docker image for inference from the Dockerfile.build file.

Currently using:

  • Ubuntu 18.04
  • DeepSpeech code v0.8.0
  • CUDA 10.0

It seems like there is a file that is no longer available.

Used command: docker build -t ds-gpu-inference-image .

Sending build context to Docker daemon  2.112MB
Step 1/78 : FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
 ---> b4879c167fc1
Step 2/78 : ENV DEEPSPEECH_REPO=https://github.com/mozilla/DeepSpeech.git
 ---> Using cache
 ---> 444156e926a9
Step 3/78 : ENV DEEPSPEECH_SHA=f56b07dab4542eecfb72e059079db6c2603cc0ee
 ---> Using cache
 ---> 384b8c501aea
Step 4/78 : RUN apt-get update && apt-get install -y --no-install-recommends     apt-utils     bash-completion     build-essential     ca-certificates     cmake     curl     g++     gcc     git     libbz2-dev     libboost-all-dev     libgsm1-dev     libltdl-dev     liblzma-dev     libmagic-dev     libpng-dev     libsox-fmt-mp3     libsox-dev     locales     openjdk-8-jdk     pkg-config     python3     python3-dev     python3-pip     python3-wheel     python3-numpy     sox     unzip     wget     zlib1g-dev
 ---> Using cache
 ---> efc6c3b960f5
Step 5/78 : RUN update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
 ---> Using cache
 ---> 5ba47c76f517
Step 6/78 : RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
 ---> Using cache
 ---> 644df0f52ab4
Step 7/78 : RUN curl -LO "https://github.com/bazelbuild/bazel/releases/download/2.0.0/bazel_2.0.0-linux-x86_64.deb"
 ---> Using cache
 ---> 0507b6601591
Step 8/78 : RUN dpkg -i bazel_*.deb
 ---> Using cache
 ---> 2b1af20cd1e8
Step 9/78 : RUN rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 4cacf777f3a7
Step 10/78 : ENV TF_NEED_ROCM 0
 ---> Using cache
 ---> 3f27c2e14ead
Step 11/78 : ENV TF_NEED_OPENCL_SYCL 0
 ---> Using cache
 ---> b4b2ee280043
Step 12/78 : ENV TF_NEED_OPENCL 0
 ---> Using cache
 ---> 22256f4c31a2
Step 13/78 : ENV TF_NEED_CUDA 1
 ---> Using cache
 ---> 087a9749ff65
Step 14/78 : ENV TF_CUDA_PATHS "/usr,/usr/local/cuda-10.1,/usr/lib/x86_64-linux-gnu/"
 ---> Using cache
 ---> efba441d0240
Step 15/78 : ENV TF_CUDA_VERSION 10.1
 ---> Using cache
 ---> 2b7766e5eae0
Step 16/78 : ENV TF_CUDNN_VERSION 7.6
 ---> Using cache
 ---> db6e969af19d
Step 17/78 : ENV TF_CUDA_COMPUTE_CAPABILITIES 6.0
 ---> Using cache
 ---> 6f2da0577550
Step 18/78 : ENV TF_NCCL_VERSION 2.4
 ---> Using cache
 ---> e83383f6370f
Step 19/78 : ENV TF_BUILD_CONTAINER_TYPE GPU
 ---> Using cache
 ---> 38400da19ffb
Step 20/78 : ENV TF_BUILD_OPTIONS OPT
 ---> Using cache
 ---> a4eefdf7f939
Step 21/78 : ENV TF_BUILD_DISABLE_GCP 1
 ---> Using cache
 ---> 1d3000fa789d
Step 22/78 : ENV TF_BUILD_ENABLE_XLA 0
 ---> Using cache
 ---> 1cdcdc3c900b
Step 23/78 : ENV TF_BUILD_PYTHON_VERSION PYTHON3
 ---> Using cache
 ---> c14e53f797a6
Step 24/78 : ENV TF_BUILD_IS_OPT OPT
 ---> Using cache
 ---> 49fe25a28bed
Step 25/78 : ENV TF_BUILD_IS_PIP PIP
 ---> Using cache
 ---> 529142550289
Step 26/78 : ENV CC_OPT_FLAGS -mavx -mavx2 -msse4.1 -msse4.2 -mfma
 ---> Using cache
 ---> 5dfe84271b8e
Step 27/78 : ENV TF_NEED_GCP 0
 ---> Using cache
 ---> bdca3e85d066
Step 28/78 : ENV TF_NEED_HDFS 0
 ---> Using cache
 ---> 55e4cd2b64a8
Step 29/78 : ENV TF_NEED_JEMALLOC 1
 ---> Using cache
 ---> 733cbf159b70
Step 30/78 : ENV TF_NEED_OPENCL 0
 ---> Using cache
 ---> 02baafd3ab56
Step 31/78 : ENV TF_CUDA_CLANG 0
 ---> Using cache
 ---> 6a38cdd39d12
Step 32/78 : ENV TF_NEED_MKL 0
 ---> Using cache
 ---> 6cda864189a3
Step 33/78 : ENV TF_ENABLE_XLA 0
 ---> Using cache
 ---> 9ab772a5589e
Step 34/78 : ENV TF_NEED_AWS 0
 ---> Using cache
 ---> 61efb8c69886
Step 35/78 : ENV TF_NEED_KAFKA 0
 ---> Using cache
 ---> 497d1e296270
Step 36/78 : ENV TF_NEED_NGRAPH 0
 ---> Using cache
 ---> 58b78a2e6207
Step 37/78 : ENV TF_DOWNLOAD_CLANG 0
 ---> Using cache
 ---> d5d0932e3951
Step 38/78 : ENV TF_NEED_TENSORRT 0
 ---> Using cache
 ---> 03c1b52e2f3c
Step 39/78 : ENV TF_NEED_GDR 0
 ---> Using cache
 ---> de520a921cd7
Step 40/78 : ENV TF_NEED_VERBS 0
 ---> Using cache
 ---> ba51095102bb
Step 41/78 : ENV TF_NEED_OPENCL_SYCL 0
 ---> Using cache
 ---> 13b82cb7bc44
Step 42/78 : ENV PYTHON_BIN_PATH /usr/bin/python3.6
 ---> Using cache
 ---> 7986e3530984
Step 43/78 : ENV PYTHON_LIB_PATH /usr/local/lib/python3.6/dist-packages
 ---> Using cache
 ---> b235a61c40a9
Step 44/78 : RUN echo "startup --batch" >>/etc/bazel.bazelrc
 ---> Using cache
 ---> 12a852e2387b
Step 45/78 : RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone"     >>/etc/bazel.bazelrc
 ---> Using cache
 ---> e9a7d6054fb0
Step 46/78 : WORKDIR /
 ---> Using cache
 ---> e0edc519b068
Step 47/78 : RUN git clone --recursive $DEEPSPEECH_REPO
 ---> Using cache
 ---> 1cd739787180
Step 48/78 : WORKDIR /DeepSpeech
 ---> Using cache
 ---> 6ad8a9936a60
Step 49/78 : RUN git checkout $DEEPSPEECH_SHA
 ---> Using cache
 ---> e66941d9666d
Step 50/78 : RUN git submodule sync tensorflow/
 ---> Using cache
 ---> 51f3cdf5dfdf
Step 51/78 : RUN git submodule update --init tensorflow/
 ---> Using cache
 ---> 51a4ab54deac
Step 52/78 : WORKDIR /DeepSpeech/tensorflow
 ---> Using cache
 ---> 75ec19c7a3d2
Step 53/78 : RUN ./configure
 ---> Using cache
 ---> 4ac002464f1f
Step 54/78 : RUN bazel build     --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh"     --config=monolithic     --config=cuda     -c opt     --copt=-O3     --copt="-D_GLIBCXX_USE_CXX11_ABI=0"     --copt=-mtune=generic     --copt=-march=x86-64     --copt=-msse     --copt=-msse2     --copt=-msse3     --copt=-msse4.1     --copt=-msse4.2     --copt=-mavx     --copt=-fvisibility=hidden     //native_client:libdeepspeech.so     --verbose_failures     --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
 ---> Using cache
 ---> 4e0cdffccee8
Step 55/78 : RUN cp bazel-bin/native_client/libdeepspeech.so /DeepSpeech/native_client/
 ---> Using cache
 ---> ca0ebf61759c
Step 56/78 : ENV TFDIR /DeepSpeech/tensorflow
 ---> Using cache
 ---> 3cc353bad69b
Step 57/78 : RUN nproc
 ---> Using cache
 ---> 1bb00577d8cc
Step 58/78 : WORKDIR /DeepSpeech/native_client
 ---> Using cache
 ---> 823fb6066949
Step 59/78 : RUN make NUM_PROCESSES=$(nproc) deepspeech
 ---> Using cache
 ---> 8d3a17d399b0
Step 60/78 : WORKDIR /DeepSpeech
 ---> Using cache
 ---> ef58c905daaa
Step 61/78 : RUN cd native_client/python && make NUM_PROCESSES=$(nproc) bindings
 ---> Running in 594bdd4e8e7e
mkdir -p /DeepSpeech/native_client/ds-swig
wget -O - ""https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118/artifacts/public/ds-swig.tar.gz"" | tar -C /DeepSpeech/native_client/ds-swig -zxf -
--2020-08-17 09:35:21--  https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118/artifacts/public/ds-swig.tar.gz
Resolving community-tc.services.mozilla.com (community-tc.services.mozilla.com)... 34.102.144.36
Connecting to community-tc.services.mozilla.com (community-tc.services.mozilla.com)|34.102.144.36|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-17 09:35:22 ERROR 404: Not Found.


gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
make: *** [/DeepSpeech/native_client/ds-swig/bin/swig] Error 2
../definitions.mk:226: recipe for target '/DeepSpeech/native_client/ds-swig/bin/swig' failed
The command '/bin/sh -c cd native_client/python && make NUM_PROCESSES=$(nproc) bindings' returned a non-zero code: 2

From the output I can read that the file https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118/artifacts/public/ds-swig.tar.gz is no longer available.

I tried opening the URL directly in Chrome but got the same 404 response:

{
  "code": "ResourceNotFound",
  "message": "Indexed task not found\n\n---\n\n* method:     findArtifactFromTask\n* errorCode:  ResourceNotFound\n* statusCode: 404\n* time:       2020-08-17T08:40:34.063Z",
  "requestInfo": {
    "method": "findArtifactFromTask",
    "params": {
      "0": "public/ds-swig.tar.gz",
      "indexPath": "project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118",
      "name": "public/ds-swig.tar.gz"
    },
    "payload": {},
    "time": "2020-08-17T08:40:34.063Z"
  }
}

I have modified the Dockerfile to meet my needs, but only by making additions. Here are the modified contents in case they help. I have omitted most of the file to keep this post short. The rest of the file is untouched:

# Build KenLM in /DeepSpeech/native_client/kenlm folder
WORKDIR /DeepSpeech/native_client
RUN rm -rf kenlm && \
    git clone https://github.com/kpu/kenlm && \
    cd kenlm && \
    git checkout 87e85e66c99ceff1fab2500a7c60c01da7315eec && \
    mkdir -p build && \
    cd build && \
    cmake .. && \
    make -j $(nproc)

# START >> My modifications
EXPOSE 8064
RUN mkdir /node_backend
WORKDIR /node_backend
COPY package.json ./
COPY app.js ./
COPY src/ ./
RUN curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
RUN apt-get install -y nodejs
RUN npm install
RUN node app.js
# END << My modifications

# Done
WORKDIR /DeepSpeech

Does anyone know what I might have done wrong or misconfigured? Any advice would be much appreciated.

Thanks in advance :smiley:

This URL is wrong; it should be https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118.1/artifacts/public/ds-swig.tar.gz

First of all, thanks for your reply, @lissyx. I can’t find where that link is defined. Is there a file I can edit to fix it, or do I have to run the remaining commands by hand?

On top of that, I am trying to use Node.js as the backend service. However, I see that the Dockerfile builds in native_client/python. Does this affect my use case in any way? Should I build the backend entirely in Python?

Thanks again.

First, why do you need to rebuild?
Second, this Dockerfile is provided for people who need to rebuild something.

If you just need to deploy, then use a prebuilt binary and a Dockerfile that fetches the official Node release; that should work.

I’m sorry if this sounds silly, but I don’t know what you mean. I think I misunderstood the purpose of the Dockerfile. What I am trying to do is build an API that receives audio files from a client, uses the model to transcribe them, and returns the resulting transcription.

I thought I’d need to build a Node project for that and integrate it into a container created from this Dockerfile. From what you said, I guess I was wrong. If that is the case, then I don’t know how to proceed toward my objective.

So you suggest creating my own Dockerfile rather than using the one you provide?

FROM node:12

RUN npm install deepspeech

That’s basically what you would need to run your project.
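
Fleshed out a little, a deployment Dockerfile along those lines might look like the sketch below. The file names (package.json, app.js, src/) and port 8064 come from the modifications posted above. Pinning deepspeech to 0.8.0 to match the training code version is an assumption, and so is the model/ directory, which is a hypothetical location for the exported model files.

```dockerfile
# Sketch only: official Node image plus the prebuilt deepspeech package,
# so nothing from Dockerfile.build has to be compiled.
FROM node:12

WORKDIR /node_backend

# Install dependencies first so this layer stays cached when sources change.
COPY package.json ./
# Assumption: 0.8.0 matches the DeepSpeech version the model was exported with.
RUN npm install && npm install deepspeech@0.8.0

# Application sources, laid out as in the original post.
COPY app.js ./
COPY src/ ./

# Hypothetical directory holding the exported model (and scorer, if any).
COPY model/ ./model/

EXPOSE 8064

# Start the server when the container runs, not while the image builds.
CMD ["node", "app.js"]
```

CMD sets the command executed when the container starts. A RUN step executes at build time, so launching a long-running server there would keep the build from finishing.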

Well… That sure seems easier.

Very much appreciated @lissyx

Anyway, this https://github.com/mozilla/STT/pull/3250 should fix the initial issue you hit
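
For a checkout that does not yet include that change, one possible local workaround is to patch the URL as suggested earlier in the thread. The fragment below is a sketch, not a tested recipe. It assumes the SWIG URL is defined in native_client/definitions.mk (the file named in the make error) and that the index path ending in .1 is still being served. It would go after the `git checkout $DEEPSPEECH_SHA` step and replace the existing bindings step:

```dockerfile
WORKDIR /DeepSpeech

# Assumption: the failing URL lives in native_client/definitions.mk.
# Append ".1" to the SWIG index path; the pattern only matches the
# unpatched form, so running it on an already patched file changes nothing.
RUN sed -i 's#\(b5fea54d39832d1d132d7dd921b69c0c2c9d5118\)/artifacts#\1.1/artifacts#' native_client/definitions.mk

# Unchanged from Dockerfile.build: build the Python bindings.
RUN cd native_client/python && make NUM_PROCESSES=$(nproc) bindings
```

Because the sed step changes a layer before the bindings build, Docker reruns everything from that point while keeping the cached TensorFlow build above it.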