Problem building Docker image from Dockerfile.build file

fincamd · August 17, 2020, 9:48am

Hi there!

I am trying to create a backend API that can transcribe audio files using a model I trained myself on Spanish. I have successfully trained and exported the model, but I am running into problems when building a Docker image for inference from the Dockerfile.build file.

Currently using:

Ubuntu 18.04
DeepSpeech code v0.8.0
CUDA 10.0

It seems like there is a file that is no longer available.

Used command: docker build -t ds-gpu-inference-image .

Sending build context to Docker daemon  2.112MB
Step 1/78 : FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
 ---> b4879c167fc1
Step 2/78 : ENV DEEPSPEECH_REPO=https://github.com/mozilla/DeepSpeech.git
 ---> Using cache
 ---> 444156e926a9
Step 3/78 : ENV DEEPSPEECH_SHA=f56b07dab4542eecfb72e059079db6c2603cc0ee
 ---> Using cache
 ---> 384b8c501aea
Step 4/78 : RUN apt-get update && apt-get install -y --no-install-recommends     apt-utils     bash-completion     build-essential     ca-certificates     cmake     curl     g++     gcc     git     libbz2-dev     libboost-all-dev     libgsm1-dev     libltdl-dev     liblzma-dev     libmagic-dev     libpng-dev     libsox-fmt-mp3     libsox-dev     locales     openjdk-8-jdk     pkg-config     python3     python3-dev     python3-pip     python3-wheel     python3-numpy     sox     unzip     wget     zlib1g-dev
 ---> Using cache
 ---> efc6c3b960f5
Step 5/78 : RUN update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
 ---> Using cache
 ---> 5ba47c76f517
Step 6/78 : RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
 ---> Using cache
 ---> 644df0f52ab4
Step 7/78 : RUN curl -LO "https://github.com/bazelbuild/bazel/releases/download/2.0.0/bazel_2.0.0-linux-x86_64.deb"
 ---> Using cache
 ---> 0507b6601591
Step 8/78 : RUN dpkg -i bazel_*.deb
 ---> Using cache
 ---> 2b1af20cd1e8
Step 9/78 : RUN rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 4cacf777f3a7
Step 10/78 : ENV TF_NEED_ROCM 0
 ---> Using cache
 ---> 3f27c2e14ead
Step 11/78 : ENV TF_NEED_OPENCL_SYCL 0
 ---> Using cache
 ---> b4b2ee280043
Step 12/78 : ENV TF_NEED_OPENCL 0
 ---> Using cache
 ---> 22256f4c31a2
Step 13/78 : ENV TF_NEED_CUDA 1
 ---> Using cache
 ---> 087a9749ff65
Step 14/78 : ENV TF_CUDA_PATHS "/usr,/usr/local/cuda-10.1,/usr/lib/x86_64-linux-gnu/"
 ---> Using cache
 ---> efba441d0240
Step 15/78 : ENV TF_CUDA_VERSION 10.1
 ---> Using cache
 ---> 2b7766e5eae0
Step 16/78 : ENV TF_CUDNN_VERSION 7.6
 ---> Using cache
 ---> db6e969af19d
Step 17/78 : ENV TF_CUDA_COMPUTE_CAPABILITIES 6.0
 ---> Using cache
 ---> 6f2da0577550
Step 18/78 : ENV TF_NCCL_VERSION 2.4
 ---> Using cache
 ---> e83383f6370f
Step 19/78 : ENV TF_BUILD_CONTAINER_TYPE GPU
 ---> Using cache
 ---> 38400da19ffb
Step 20/78 : ENV TF_BUILD_OPTIONS OPT
 ---> Using cache
 ---> a4eefdf7f939
Step 21/78 : ENV TF_BUILD_DISABLE_GCP 1
 ---> Using cache
 ---> 1d3000fa789d
Step 22/78 : ENV TF_BUILD_ENABLE_XLA 0
 ---> Using cache
 ---> 1cdcdc3c900b
Step 23/78 : ENV TF_BUILD_PYTHON_VERSION PYTHON3
 ---> Using cache
 ---> c14e53f797a6
Step 24/78 : ENV TF_BUILD_IS_OPT OPT
 ---> Using cache
 ---> 49fe25a28bed
Step 25/78 : ENV TF_BUILD_IS_PIP PIP
 ---> Using cache
 ---> 529142550289
Step 26/78 : ENV CC_OPT_FLAGS -mavx -mavx2 -msse4.1 -msse4.2 -mfma
 ---> Using cache
 ---> 5dfe84271b8e
Step 27/78 : ENV TF_NEED_GCP 0
 ---> Using cache
 ---> bdca3e85d066
Step 28/78 : ENV TF_NEED_HDFS 0
 ---> Using cache
 ---> 55e4cd2b64a8
Step 29/78 : ENV TF_NEED_JEMALLOC 1
 ---> Using cache
 ---> 733cbf159b70
Step 30/78 : ENV TF_NEED_OPENCL 0
 ---> Using cache
 ---> 02baafd3ab56
Step 31/78 : ENV TF_CUDA_CLANG 0
 ---> Using cache
 ---> 6a38cdd39d12
Step 32/78 : ENV TF_NEED_MKL 0
 ---> Using cache
 ---> 6cda864189a3
Step 33/78 : ENV TF_ENABLE_XLA 0
 ---> Using cache
 ---> 9ab772a5589e
Step 34/78 : ENV TF_NEED_AWS 0
 ---> Using cache
 ---> 61efb8c69886
Step 35/78 : ENV TF_NEED_KAFKA 0
 ---> Using cache
 ---> 497d1e296270
Step 36/78 : ENV TF_NEED_NGRAPH 0
 ---> Using cache
 ---> 58b78a2e6207
Step 37/78 : ENV TF_DOWNLOAD_CLANG 0
 ---> Using cache
 ---> d5d0932e3951
Step 38/78 : ENV TF_NEED_TENSORRT 0
 ---> Using cache
 ---> 03c1b52e2f3c
Step 39/78 : ENV TF_NEED_GDR 0
 ---> Using cache
 ---> de520a921cd7
Step 40/78 : ENV TF_NEED_VERBS 0
 ---> Using cache
 ---> ba51095102bb
Step 41/78 : ENV TF_NEED_OPENCL_SYCL 0
 ---> Using cache
 ---> 13b82cb7bc44
Step 42/78 : ENV PYTHON_BIN_PATH /usr/bin/python3.6
 ---> Using cache
 ---> 7986e3530984
Step 43/78 : ENV PYTHON_LIB_PATH /usr/local/lib/python3.6/dist-packages
 ---> Using cache
 ---> b235a61c40a9
Step 44/78 : RUN echo "startup --batch" >>/etc/bazel.bazelrc
 ---> Using cache
 ---> 12a852e2387b
Step 45/78 : RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone"     >>/etc/bazel.bazelrc
 ---> Using cache
 ---> e9a7d6054fb0
Step 46/78 : WORKDIR /
 ---> Using cache
 ---> e0edc519b068
Step 47/78 : RUN git clone --recursive $DEEPSPEECH_REPO
 ---> Using cache
 ---> 1cd739787180
Step 48/78 : WORKDIR /DeepSpeech
 ---> Using cache
 ---> 6ad8a9936a60
Step 49/78 : RUN git checkout $DEEPSPEECH_SHA
 ---> Using cache
 ---> e66941d9666d
Step 50/78 : RUN git submodule sync tensorflow/
 ---> Using cache
 ---> 51f3cdf5dfdf
Step 51/78 : RUN git submodule update --init tensorflow/
 ---> Using cache
 ---> 51a4ab54deac
Step 52/78 : WORKDIR /DeepSpeech/tensorflow
 ---> Using cache
 ---> 75ec19c7a3d2
Step 53/78 : RUN ./configure
 ---> Using cache
 ---> 4ac002464f1f
Step 54/78 : RUN bazel build     --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh"     --config=monolithic     --config=cuda     -c opt     --copt=-O3     --copt="-D_GLIBCXX_USE_CXX11_ABI=0"     --copt=-mtune=generic     --copt=-march=x86-64     --copt=-msse     --copt=-msse2     --copt=-msse3     --copt=-msse4.1     --copt=-msse4.2     --copt=-mavx     --copt=-fvisibility=hidden     //native_client:libdeepspeech.so     --verbose_failures     --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
 ---> Using cache
 ---> 4e0cdffccee8
Step 55/78 : RUN cp bazel-bin/native_client/libdeepspeech.so /DeepSpeech/native_client/
 ---> Using cache
 ---> ca0ebf61759c
Step 56/78 : ENV TFDIR /DeepSpeech/tensorflow
 ---> Using cache
 ---> 3cc353bad69b
Step 57/78 : RUN nproc
 ---> Using cache
 ---> 1bb00577d8cc
Step 58/78 : WORKDIR /DeepSpeech/native_client
 ---> Using cache
 ---> 823fb6066949
Step 59/78 : RUN make NUM_PROCESSES=$(nproc) deepspeech
 ---> Using cache
 ---> 8d3a17d399b0
Step 60/78 : WORKDIR /DeepSpeech
 ---> Using cache
 ---> ef58c905daaa
Step 61/78 : RUN cd native_client/python && make NUM_PROCESSES=$(nproc) bindings
 ---> Running in 594bdd4e8e7e
mkdir -p /DeepSpeech/native_client/ds-swig
wget -O - ""https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118/artifacts/public/ds-swig.tar.gz"" | tar -C /DeepSpeech/native_client/ds-swig -zxf -
--2020-08-17 09:35:21--  https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118/artifacts/public/ds-swig.tar.gz
Resolving community-tc.services.mozilla.com (community-tc.services.mozilla.com)... 34.102.144.36
Connecting to community-tc.services.mozilla.com (community-tc.services.mozilla.com)|34.102.144.36|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-17 09:35:22 ERROR 404: Not Found.


gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
make: *** [/DeepSpeech/native_client/ds-swig/bin/swig] Error 2
../definitions.mk:226: recipe for target '/DeepSpeech/native_client/ds-swig/bin/swig' failed
The command '/bin/sh -c cd native_client/python && make NUM_PROCESSES=$(nproc) bindings' returned a non-zero code: 2

From the output I can read that the file https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118/artifacts/public/ds-swig.tar.gz is no longer available.

I tried opening the URL directly in Chrome but got the same error:

{
  "code": "ResourceNotFound",
  "message": "Indexed task not found\n\n---\n\n* method:     findArtifactFromTask\n* errorCode:  ResourceNotFound\n* statusCode: 404\n* time:       2020-08-17T08:40:34.063Z",
  "requestInfo": {
    "method": "findArtifactFromTask",
    "params": {
      "0": "public/ds-swig.tar.gz",
      "indexPath": "project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118",
      "name": "public/ds-swig.tar.gz"
    },
    "payload": {},
    "time": "2020-08-17T08:40:34.063Z"
  }
}

I have modified the Dockerfile to meet my needs, but only by making additions. Here are the modified contents in case they help. I have omitted most of the file to avoid an overly long post. The rest of the file is left untouched:

# Build KenLM in /DeepSpeech/native_client/kenlm folder
WORKDIR /DeepSpeech/native_client
RUN rm -rf kenlm && \
    git clone https://github.com/kpu/kenlm && \
    cd kenlm && \
    git checkout 87e85e66c99ceff1fab2500a7c60c01da7315eec && \
    mkdir -p build && \
    cd build && \
    cmake .. && \
    make -j $(nproc)

# START >> My modifications
EXPOSE 8064
RUN mkdir /node_backend
WORKDIR /node_backend
COPY package.json ./
COPY app.js ./
COPY src/ ./
RUN curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
RUN apt-get install -y nodejs
RUN npm install
RUN node app.js
# END << My modifications

# Done
WORKDIR /DeepSpeech

Does anyone know what I might have done wrong or misconfigured? Any advice would be much appreciated.

Thanks in advance

lissyx · August 17, 2020, 9:50am

That URL is wrong; it should be https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.swig.linux.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118.1/artifacts/public/ds-swig.tar.gz (note the .1 suffix on the index path).

fincamd · August 17, 2020, 10:15am

First of all, thanks for your reply @lissyx. I can’t find where to replace that link. Is there a file I can edit to fix it? Or do I have to run the remaining commands by hand?

On top of that, I am trying to use Node.js as the backend service. However, I see that the Dockerfile builds the bindings in native_client/python. Does this affect what I am trying to do? Should I build the backend entirely in Python?

Thanks again.

lissyx · August 17, 2020, 10:22am

First, why do you need to rebuild?
Second, this Dockerfile is provided for people who need to rebuild something.

If you just need to deploy, then use a prebuilt binary and a Dockerfile that fetches an official Node release; that should work.

fincamd · August 17, 2020, 10:47am

I’m sorry if I sound silly, but I don’t know what you mean. I think I misunderstood the purpose of the Dockerfile. What I am trying to do is build an API that can receive audio files from a client, use the model to transcribe them, and then return the resulting transcription.

I thought I’d need to build a Node project for that and integrate it into a container created from this Dockerfile. From what you said, I guess I was wrong. If that is the case, then I don’t know how to proceed toward my goal.

So you suggest creating my own Dockerfile rather than using the one you provide?

lissyx · August 17, 2020, 10:47am

FROM node:12

RUN npm install deepspeech

That’s basically what you would need to run your project
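
For illustration, the two-line Dockerfile above could be fleshed out roughly as follows for the Node backend described in the question. This is only a sketch under stated assumptions: package.json, app.js, src/ and port 8064 are taken from the question, while the models/ directory is a hypothetical placeholder for wherever the exported model lives. It also assumes that deepspeech is listed as a dependency in package.json and that app.js loads its modules from ./src.

```dockerfile
# Sketch only. File names and the port come from the question;
# models/ is a hypothetical location for the exported model.
FROM node:12

WORKDIR /node_backend

# Install dependencies first so this layer stays cached when only code changes.
# Assumes "deepspeech" is listed in package.json; if it is not, use
# `RUN npm install deepspeech` as in the reply above.
COPY package.json ./
RUN npm install

# Copy the application code, keeping the src/ directory structure
# (assumption: app.js imports from ./src).
COPY app.js ./
COPY src/ ./src/

# Hypothetical: ship the exported model files inside the image.
COPY models/ ./models/

EXPOSE 8064

# CMD runs when a container starts; a RUN instruction would execute
# the server during `docker build` instead.
CMD ["node", "app.js"]
```

Note that the server is started with CMD rather than RUN, and that no sudo is needed because build steps already run as root by default. The deepspeech npm package is the CPU build; GPU inference uses the separate deepspeech-gpu package and needs CUDA libraries in the image, which node:12 does not ship.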

fincamd · August 17, 2020, 10:52am

Well… That sure seems easier.

Very much appreciated @lissyx

lissyx · August 17, 2020, 11:05am

Anyway, this https://github.com/mozilla/STT/pull/3250 should fix the initial issue you hit