Dockerfile build issue

Please, this thread was about a Dockerfile issue; that is not your case. Verify that you performed the appropriate installation steps. Also, I have no idea whether symlinks work correctly in Colab / Google Drive. Please make sure you are running from the correct directories as well.


After cloning it in Google Colab, I am able to run the shell script.
But after cloning and saving to Google Drive, and then accessing that shell script from Drive, it does not run on Google Colab.

This error appears when we clone a repo into Google Drive and run the Python file from Google Drive.

The best way to avoid this error is to clone the repo on Google Colab (or whatever platform executes the code) and use Google Drive only for accessing the data.

Do keep in mind: when accessing data from Google Drive, wrap the path in quotes ("…/…/…/.") to avoid problems with the space in the path. A minimal sketch of that setup is below.
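A minimal sketch of that layout (repo cloned into the Colab runtime, data read from the mounted Drive); the Drive folder names are the ones used later in this thread, so adjust them to your own:

from google.colab import drive
drive.mount('/content/gdrive')

# Clone the repo into the local Colab filesystem, not into Drive
!git clone https://github.com/mozilla/DeepSpeech.git
%cd /content/DeepSpeech

# Data stays on Drive; quote the path because of the space in "My Drive"
!head -n 2 "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-train-clean-100.csv"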

Now I am facing a new error while fine-tuning from the 0.6.1 checkpoint in Google Colab.

%cd /content/DeepSpeech/
#!mkdir fine_tuning_checkpoints
!CUDA_VISIBLE_DEVICES=2 python3 DeepSpeech.py \
--n_hidden 2048 \
--checkpoint_dir "/content/gdrive/My Drive/DeepSpeech/checkpoint_directory/deepspeech-0.6.1-checkpoint/" \
--epochs 1 \
--train_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-train-clean-100.csv" \
--dev_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-dev-clean.csv" \
--test_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-test-clean.csv" \
--learning_rate 0.0001 \
--use_allow_growth true \
--train_cudnn true

output:

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>
	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

If this is a CUDA problem, I have already updated it and worked on it.

I can run ./bin/run-ldc93s1.sh

and I am also able to train by removing the --checkpoint_dir and --train_cudnn true flags

Can you suggest platforms other than Google Colab where the GPU is free,

there is no CUDA problem, and the code executes properly?

This is a CuDNN problem. Check versions.

Spaces in path are a recipe for future issues. Please avoid.
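One sketch of a way to sidestep the spaces entirely: copy the data folder from Drive into a space-free path on the local runtime before training (assuming the audio referenced by the CSVs also lives under data_directory; the target directory name below is just an example):

# Copy the Drive data folder to a space-free local path
!mkdir -p /content/data
!cp -r "/content/gdrive/My Drive/DeepSpeech/data_directory/." /content/data/
!ls /content/data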

code:-
!cat /usr/local/cuda/version.txt
!nvcc --version
!cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

output:-
CUDA Version 10.2.89
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

What would you suggest I do after having a look at these versions?

And please reply about other platforms.

Read the doc and install the correct versions: CUDA 10.2 is no good for that TensorFlow version.
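A rough sketch of checking whether the runtime also ships an older toolkit and repointing /usr/local/cuda at it; this assumes a CUDA 10.0 install is actually present on the image, which is not guaranteed:

# List the CUDA toolkits installed on this runtime
!ls -d /usr/local/cuda-*

# If cuda-10.0 exists (assumption), repoint the default symlink at it
!ln -sfn /usr/local/cuda-10.0 /usr/local/cuda
!nvcc --version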

Sorry I don’t know any platform that provides free GPU.

@Rahul_Jain Next time, please make an effort and properly format your console / code.

code:-
!cat /usr/local/cuda/version.txt
!nvcc --version
!cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2

output:-
CUDA Version 10.0.130
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"

The TensorFlow version is 1.15.2.

All versions match the software requirements.

And I still face the same issue. Please help.

First, I've asked you to properly paste code and console content. This is hard and painful to read as is.

Second, you are hijacking an existing thread with unrelated issues; this adds noise and makes it very hard for everybody to track the status and extract useful information.
For example, how do I quickly know what version you are working on? I can't just go to the top of the thread; I have to scroll and find the message where you started.

Third, you are using the 0.6 branch, so please read the doc and install the proper versions: you need TensorFlow r1.14, and likely a different version of the CuDNN library.
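A minimal sketch of pinning that TensorFlow build in Colab (the exact 1.14.x patch release here is an assumption; check the 0.6 documentation for the one it lists), then restarting the runtime and verifying:

!pip install 'tensorflow-gpu==1.14.0'

# After restarting the runtime:
import tensorflow as tf
print(tf.__version__, tf.test.is_built_with_cuda())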

I'm sorry, I am a newbie, but now I get it.

I solved the issue by checking which GPU device is visible:
code:-
import tensorflow as tf
tf.test.gpu_device_name()

I got '/device:GPU:0'.
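That makes sense: Colab exposes a single GPU as device 0, so the earlier CUDA_VISIBLE_DEVICES=2 most likely hid it from TensorFlow (hence "Registered devices: [CPU, XLA_CPU]" in the error). A short sketch (TF 1.x API) to list every device the process can see:

# TF 1.x: list every device TensorFlow can see in this process
from tensorflow.python.client import device_lib
for d in device_lib.list_local_devices():
    print(d.name, d.device_type)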

and then this code worked:

!CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py \
--n_hidden 2048 \
--checkpoint_dir "/content/gdrive/My Drive/DeepSpeech/checkpoint_directory/deepspeech-0.6.1-checkpoint/" \
--epochs 1 \
--train_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-train-clean-100.csv" \
--dev_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-dev-clean.csv" \
--test_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-test-clean.csv" \
--learning_rate 0.0001 \
--use_allow_growth true \
--train_cudnn true


The issue was resolved last time, but it has come back.

Can you please specify which versions of TensorFlow, CUDA, and the other dependencies I have to use for the 0.6.1 checkpoint?

Can you please tell me where I can find those versions?

Read the documentation.

Please provide the link to the documentation.

Seriously? It's the first link on GitHub.