Dockerfile build issue

How did you solve the problem?

I am facing the same problem in Google Colab when fine-tuning from the 0.6.1 checkpoint.

code:-
!python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir /content/gdrive/My Drive/DeepSpeech/checkpoint_directory/ --epochs 3 --train_files librivox-train-clean-100.csv --dev_files librivox-dev-clean.csv --test_files librivox-test-clean.csv --learning_rate=0.0001

output:-
Traceback (most recent call last):
  File "DeepSpeech.py", line 7, in <module>
    from deepspeech_training import train as ds_train
  File "/content/gdrive/My Drive/DeepSpeech/training/deepspeech_training/train.py", line 30, in <module>
    from .evaluate import evaluate
  File "/content/gdrive/My Drive/DeepSpeech/training/deepspeech_training/evaluate.py", line 26, in <module>
    check_ctcdecoder_version()
  File "/content/gdrive/My Drive/DeepSpeech/training/deepspeech_training/util/helpers.py", line 57, in check_ctcdecoder_version
    rv = semver.compare(ds_version_s, decoder_version_s)
  File "/usr/local/lib/python3.6/dist-packages/semver.py", line 108, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/semver.py", line 787, in compare
    v1 = VersionInfo.parse(ver1)
  File "/usr/local/lib/python3.6/dist-packages/semver.py", line 657, in parse
    raise ValueError("%s is not valid SemVer string" % version)
ValueError: …/…/VERSION is not valid SemVer string

hashim wrote:
Is there a proper solution to

ValueError: …/…/VERSION is not valid SemVer string

@Rahul_Jain can you reproduce this when running ./bin/run-ldc93s1.sh?

Please, this thread was about a Dockerfile issue, which is not your case. Verify that you performed the appropriate installation steps. Also, I have no idea whether symlinks work correctly in Colab / Google Drive. Please make sure you are running from the correct directories as well.
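
For anyone who wants to test the symlink theory: a quick check is to read the VERSION file that the training code loads and see whether it holds a version or something path-like. The sketch below is only an illustration. The helper names `looks_like_semver` and `check_version_file` and the simplified pattern are assumptions made here, not part of DeepSpeech; the real check in the traceback goes through the semver package, which is stricter.

```python
import re

# Simplified SemVer pattern: MAJOR.MINOR.PATCH with optional pre-release/build.
# Assumption: good enough for a sanity check, not a full SemVer validator.
SEMVER_RE = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def looks_like_semver(text):
    """Return True if the stripped text looks like a SemVer version string."""
    return SEMVER_RE.match(text.strip()) is not None


def check_version_file(path):
    """Read a VERSION file; return (is_valid, stripped_content)."""
    with open(path, encoding="utf-8") as handle:
        content = handle.read().strip()
    return looks_like_semver(content), content
```

If `check_version_file` reports a path-like string rather than something like 0.6.1, one plausible cause is that the symlink was stored as a plain text file containing its target.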


After cloning the repo directly in Google Colab, I am able to run the shell script.
But after cloning it, saving it to Google Drive, and running the script from Drive, it does not run in Google Colab.

This error appears when you clone the repo into Google Drive and run the Python file from Google Drive.

The best way to avoid it is to clone the repo on Google Colab itself (or whichever platform executes the code) and to use Google Drive only for accessing data.

Keep in mind that when accessing data from Google Drive, you should write the path in quotes ("…/…/…/.") to avoid the problem with spaces.
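
As a small illustration of the quoting advice, the sketch below builds a command line in which every value is shell-quoted, so the space in "My Drive" no longer splits the argument. The helper `build_command` is a hypothetical name used only for this example; the flags and path are the ones from this thread.

```python
import shlex


def build_command(script, options):
    """Join a script name and (flag, value) pairs into one shell-safe string."""
    parts = ["python3", script]
    for flag, value in options:
        parts.append(flag)
        # shlex.quote wraps the value in single quotes only when needed.
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


cmd = build_command(
    "DeepSpeech.py",
    [
        ("--checkpoint_dir", "/content/gdrive/My Drive/DeepSpeech/checkpoint_directory/"),
        ("--epochs", 3),
    ],
)
print(cmd)
```

Without quoting, the shell passes "/content/gdrive/My" and "Drive/DeepSpeech/checkpoint_directory/" as two separate arguments.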

Now I am facing a new error while fine-tuning from the 0.6.1 checkpoint in Google Colab:

%cd /content/DeepSpeech/
#!mkdir fine_tuning_checkpoints
!CUDA_VISIBLE_DEVICES=2 python3 DeepSpeech.py \
--n_hidden 2048 \
--checkpoint_dir "/content/gdrive/My Drive/DeepSpeech/checkpoint_directory/deepspeech-0.6.1-checkpoint/" \
--epochs 1 \
--train_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-train-clean-100.csv" \
--dev_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-dev-clean.csv" \
--test_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-test-clean.csv" \
--learning_rate 0.0001 \
--use_allow_growth true \
--train_cudnn true

output:

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>
	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

If this is a CUDA problem, I have already updated CUDA and worked on it.

I can run ./bin/run-ldc93s1.sh

and I am also able to train after removing the --checkpoint_dir and --train_cudnn true flags.

Can you suggest platforms other than Google Colab that offer a free GPU, where there is no CUDA problem and the code executes properly?

This is a cuDNN problem. Check your versions.

Spaces in paths are a recipe for future issues. Please avoid them.

code:-
!cat /usr/local/cuda/version.txt
!nvcc --version
!cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

output:-
CUDA Version 10.2.89
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
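
For readers decoding the header output above, the CUDNN_VERSION macro simply combines the three numbers into one integer. A minimal sketch of the same arithmetic follows; the function name `cudnn_version` is made up for this example.

```python
def cudnn_version(major, minor, patchlevel):
    """Mirror the CUDNN_VERSION macro from cudnn.h."""
    return major * 1000 + minor * 100 + patchlevel


# Values reported in the output above: 7, 6, 5
print(cudnn_version(7, 6, 5))  # 7605, i.e. cuDNN 7.6.5
```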

What would you suggest I do after looking at these versions?

And please reply regarding other platforms.

Read the doc and install the correct versions: CUDA 10.2 is no good for that TensorFlow version.

Sorry, I do not know of any platform that provides free GPUs.

@Rahul_Jain Next time, please make an effort and properly format your console / code.

code:-
!cat /usr/local/cuda/version.txt
!nvcc --version
!cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2

output:-
CUDA Version 10.0.130
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"

The TensorFlow version is 1.15.2.

All versions match the software requirements,

and I still face the same issue. Please help.

First, I have asked you to properly paste code and console content. This is hard and painful to read as is.

Second, you are hijacking an existing thread with unrelated issues. This adds noise and makes it very hard for everybody to track the status and extract useful information.
For example, how do I quickly know what version you are working on? I cannot go to the top of the thread; I have to scroll and find the message where you started.

Third, you are using the 0.6 branch, so please read the doc and install the proper versions: you need TensorFlow r1.14, and likely a different version of the cuDNN library.
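
To make the third point concrete, here is a rough sketch of comparing an installed version against a required release line. The helper `same_release` is a hypothetical name for this example, and comparing only MAJOR.MINOR is an assumption about what "r1.14" means; the documentation for the branch remains the authority.

```python
def same_release(installed, required):
    """Compare MAJOR.MINOR of an installed version with a release like 'r1.14'."""
    installed_parts = installed.split(".")[:2]
    # Release names in this thread carry a leading "r", e.g. "r1.14".
    required_parts = required.lstrip("r").split(".")[:2]
    return installed_parts == required_parts


# 1.15.2 is the version reported above; r1.14 is what the 0.6 branch needs.
print(same_release("1.15.2", "r1.14"))  # False
print(same_release("1.14.0", "r1.14"))  # True
```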

I am sorry, I am a newbie, but now I get it.

I solved the issue by checking which GPU device TensorFlow sees.
code:-
import tensorflow as tf
tf.test.gpu_device_name()

I got '/device:GPU:0'

and then this command worked:
!CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py \
--n_hidden 2048 \
--checkpoint_dir "/content/gdrive/My Drive/DeepSpeech/checkpoint_directory/deepspeech-0.6.1-checkpoint/" \
--epochs 1 \
--train_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-train-clean-100.csv" \
--dev_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-dev-clean.csv" \
--test_files "/content/gdrive/My Drive/DeepSpeech/data_directory/librivox-test-clean.csv" \
--learning_rate 0.0001 \
--use_allow_growth true \
--train_cudnn true
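
One visible difference between the failing command earlier in the thread and the working one is CUDA_VISIBLE_DEVICES=2 versus CUDA_VISIBLE_DEVICES=0. Nobody confirmed this in the thread, but a likely explanation is that the session had a single GPU at index 0, so index 2 hid it and left only "Registered devices: [CPU, XLA_CPU]". The sketch below models how CUDA interprets integer indices in that variable; `visible_devices` is a made-up helper for illustration, and the single-GPU count is an assumption.

```python
def visible_devices(env_value, physical_gpu_count):
    """Return the physical GPU indices a CUDA process would see.

    env_value is the value of CUDA_VISIBLE_DEVICES, or None if unset.
    Only integer indices are modelled here; GPU UUIDs are out of scope.
    """
    if env_value is None:
        return list(range(physical_gpu_count))
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_gpu_count:
            break  # an invalid index hides itself and everything after it
        visible.append(int(token))
    return visible


# Assumption: one physical GPU, as on a typical single-GPU session.
print(visible_devices("2", 1))  # []
print(visible_devices("0", 1))  # [0]
```

With an empty result, TensorFlow has no GPU to place the CudnnRNN ops on, which would match the "No OpKernel was registered" error shown earlier.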


The issue was resolved last time, but it has come back.

Can you please specify the versions of TensorFlow, CUDA, and the other dependencies I have to use for the 0.6.1 checkpoint?

Can you please tell me where I can find the required versions?

Read the documentation.