Hello, this is a quick guide to deploying the NVIDIA Docker container and taking advantage of NVIDIA Tensor Cores without changing the code.
Before deploying the optimized container, you should read:
The requirements are almost the same as for a normal DeepSpeech training deployment, plus a few extra things:
Requirements:
- A GPU with Tensor Cores. Check whether your GPU has them before you start.
- Docker. Installation depends on your platform, so I’m not covering it in this guide.
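One rough way to check the GPU requirement is to look at its CUDA compute capability, since Tensor Cores first appeared with Volta (compute capability 7.0). The sketch below is only a hint, not proof: some cards at or above 7.0, such as the GTX 16 series, ship without Tensor Cores. The function name is my own, and the nvidia-smi query in the usage comment assumes a driver recent enough to support the compute_cap field.

```shell
# Succeed if a compute capability such as "7.0" is at least 7.0,
# the first generation (Volta) that shipped Tensor Cores.
# Hypothetical helper; a match is necessary but not sufficient.
may_have_tensor_cores() {
    major="${1%%.*}"   # keep only the part before the dot
    [ "$major" -ge 7 ]
}

# Usage (assumes your driver supports the compute_cap query field):
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   may_have_tensor_cores "$cap" && echo "Tensor Cores may be present"
```

If the query field is not available on your driver, looking up the GPU model in NVIDIA's specifications is the safer route.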
Installing the container:
Before installing the container, you only need to clone the DeepSpeech repo and remove the tensorflow requirement from requirements.txt.
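If you prefer not to edit the file by hand, something like the following could do it. This is a minimal sketch that assumes the dependency sits on its own line starting with "tensorflow"; the function name is my own, and a backup copy is kept next to the original.

```shell
# Delete every line that starts with "tensorflow" (either capitalisation)
# from a requirements file, keeping a backup as "<file>.bak".
remove_tensorflow_requirement() {
    # $1: path to the requirements file
    sed -i.bak '/^[Tt]ensor[Ff]low/d' "$1"
}

# Usage from the root of the cloned repo:
#   remove_tensorflow_requirement requirements.txt
```

Check the result before installing, since any other package whose name starts with "tensorflow" would be removed as well.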
Next, we pull the container by running:
docker pull nvcr.io/nvidia/tensorflow:19.04-py3
Now we set up our workspace inside the downloaded container. To start the container, run:
sudo nvidia-docker run -it --rm -v $HOME:$HOME nvcr.io/nvidia/tensorflow:19.04-py3
Notice that I mounted my current home directory, which lets me use my existing paths inside the container. If you don’t want to match home, you can map any other directory instead:
sudo nvidia-docker run -it --rm -v /user/home/deepspeech:/deepspeech nvcr.io/nvidia/tensorflow:19.04-py3
In my case I was using a cloud instance with an extra mounted disk. If you need to expose another path to the container, as I did, just add an extra -v.
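To illustrate the extra -v, here is a small sketch that only prints the resulting command, so you can review it before running it. The helper name and the /mnt/data mount point are hypothetical; each directory is mounted at the same path inside the container, and paths containing spaces are not handled.

```shell
# Print a run command that mounts every directory passed as an argument
# at the same path inside the container (dry run, nothing is started).
print_run_command() {
    cmd="sudo nvidia-docker run -it --rm"
    for dir in "$@"; do
        cmd="$cmd -v $dir:$dir"
    done
    echo "$cmd nvcr.io/nvidia/tensorflow:19.04-py3"
}

# /mnt/data stands in for wherever your extra disk is mounted.
print_run_command "$HOME" /mnt/data
```

Copy the printed line into your terminal once the mounts look right.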
Now we need to install the requirements:
Again, make sure you removed the TensorFlow dependency from requirements.txt. The container already ships an optimized version of TensorFlow that is fully compatible with DeepSpeech.
Inside the container, from the root of your cloned DeepSpeech repo, run:
pip3 install -r requirements.txt
We need the decoder too:
pip3 install $(python3 util/taskcluster.py --decoder)
You will probably hit an issue related to pandas and Python 3.5.
To fix it, run:
python3 -m pip install --upgrade pandas
Notice that we don’t need to use a virtual environment.
Finally, we need to enable automatic mixed precision by setting:
export TF_ENABLE_AUTO_MIXED_PRECISION=1
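Keep in mind that an export only lasts for the current shell, and a container started with --rm is discarded when you exit, so the variable has to be set again in every new session. A small guard like the sketch below (the function name is my own) can catch a forgotten export before a long training run starts.

```shell
# Succeed only if automatic mixed precision is enabled in this shell.
amp_enabled() {
    [ "${TF_ENABLE_AUTO_MIXED_PRECISION:-0}" = "1" ]
}

# Print a warning to stderr when the variable is missing or not 1.
amp_enabled || echo "TF_ENABLE_AUTO_MIXED_PRECISION is not set to 1" >&2

# Alternative: set it when the container starts, using docker's -e flag:
#   sudo nvidia-docker run -it --rm -e TF_ENABLE_AUTO_MIXED_PRECISION=1 \
#       -v $HOME:$HOME nvcr.io/nvidia/tensorflow:19.04-py3
```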
To check whether your GPU is using Tensor Cores, put nvprof in front of your command, something like:
nvprof python DeepSpeech.py --the rest of your params
You will then get a log of the GPU kernels that ran. To know whether the Tensor Cores were used, search it for kernel names containing: s884gemm_fp16
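Searching the log by eye gets tedious, so here is a minimal sketch that saves the nvprof output to a file and greps it. It assumes you use nvprof's --log-file option; the function name and the nvprof.log file name are my own. The pattern "s884gem" is matched as a substring of the kernel name.

```shell
# Succeed if an nvprof log mentions a Tensor Core GEMM kernel.
used_tensor_cores() {
    # $1: path to a log written with nvprof's --log-file option
    grep -q 's884gem' "$1"
}

# Usage:
#   nvprof --log-file nvprof.log python DeepSpeech.py --the rest of your params
#   used_tensor_cores nvprof.log && echo "Tensor Cores were used"
```

To see the matching kernels themselves, run grep without -q on the same file.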
My results from a small test with 300h and 1 V100 GPU:
Type | Time (h:mm:ss) | WER | Epochs |
---|---|---|---|
Normal training(fp32) | 2:27:54 | 0.084087 | 10 |
Auto Mixed precision training(fp16) | 1:39:03 | 0.091663 | 10 |
Unfortunately, I can’t run a larger test.
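To put the table in perspective, the speedup and the WER change can be worked out from the reported figures. The sketch below does the arithmetic with awk; it uses only the numbers from the table, and the helper name is my own.

```shell
# Convert a duration in h:mm:ss to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

fp32=$(to_seconds 2:27:54)   # 8874 seconds
fp16=$(to_seconds 1:39:03)   # 5943 seconds

awk -v a="$fp32" -v b="$fp16" 'BEGIN {
    printf "speedup: %.2fx\n", a / b
    printf "time saved: %.1f%%\n", (1 - b / a) * 100
    printf "relative WER increase: %.1f%%\n", (0.091663 - 0.084087) / 0.084087 * 100
}'
# prints:
#   speedup: 1.49x
#   time saved: 33.0%
#   relative WER increase: 9.0%
```

So in this single run, mixed precision took about a third less time over the same 10 epochs, with a WER roughly 9% higher in relative terms.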
This is a potential PR, please feel free to suggest any changes and share insights if you use the container.