Fatal Python error while transfer learning

Hi, I am trying to train DeepSpeech-0.8.1 on my dataset using transfer learning. I have 5000 training examples, 2000 dev examples and 1000 test examples. I am getting a fatal Python error. Can anyone help me with this?

I Loading best validating checkpoint from deepspeech-0.8.1-checkpoint/best_dev-732522
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Initializing variable: learning_rate
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:08:19 | Steps: 1377 | Loss: 138.792073  2020-08-24 10:34:08.297475: F tensorflow/stream_executor/cuda/cuda_dnn.cc:1403] Check failed: max_seq_length > 0 (0 vs. 0)
Fatal Python error: Aborted

Thread 0x00007fde9cffd700 (most recent call first):
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379 in _recv
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407 in _recv_bytes
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 250 in recv
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 463 in _handle_results
  File "/usr/lib/python3.6/threading.py", line 864 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fde9d7fe700 (most recent call first):
  File "/storage/Hassaan/DeepSpeech-0.8.1/training/deepspeech_training/util/helpers.py", line 97 in _limit
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 290 in _guarded_task_generation
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 419 in _handle_tasks
  File "/usr/lib/python3.6/threading.py", line 864 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fde9dfff700 (most recent call first):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 406 in _handle_workers
  File "/usr/lib/python3.6/threading.py", line 864 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fdefb35c700 (most recent call first):
  File "/usr/lib/python3.6/threading.py", line 295 in wait
  File "/usr/lib/python3.6/queue.py", line 164 in get
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fdefbb5d700 (most recent call first):
  File "/usr/lib/python3.6/threading.py", line 295 in wait
  File "/usr/lib/python3.6/queue.py", line 164 in get
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fdfe0ff9700 (most recent call first):
  File "/usr/lib/python3.6/threading.py", line 295 in wait
  File "/usr/lib/python3.6/queue.py", line 164 in get
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fe05eb21740 (most recent call first):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443 in _call_tf_sessionrun
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350 in _run_fn
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359 in _do_run
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180 in _run
  File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956 in run
  File "/storage/Hassaan/DeepSpeech-0.8.1/training/deepspeech_training/train.py", line 566 in run_set
  File "/storage/Hassaan/DeepSpeech-0.8.1/training/deepspeech_training/train.py", line 601 in train
  File "/storage/Hassaan/DeepSpeech-0.8.1/training/deepspeech_training/train.py", line 933 in main
  File "/home/ubuntu/.local/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
  File "/home/ubuntu/.local/lib/python3.6/site-packages/absl/app.py", line 299 in run
  File "/storage/Hassaan/DeepSpeech-0.8.1/training/deepspeech_training/train.py", line 961 in run_script
  File "Dpy.py", line 12 in <module>

I am using the following command to run DeepSpeech:

python DeepSpeech.py --n_hidden 2048 --checkpoint_dir deepspeech-0.8.1-checkpoint --epochs 3 --train_files ../Audios/Data/train/train.csv --dev_files ../Audios/Data/dev/dev.csv --test_files ../Audios/Data/test/test.csv --learning_rate 0.0001 --use_allow_growth --train_cudnn

Package                 Version
----------------------- -------------------
decorator               4.4.2
deepspeech-gpu          0.8.1
ds-ctcdecoder           0.9.0a5
future                  0.18.2
Keras-Applications      1.0.8
Keras-Preprocessing     1.1.2
numpy                   1.18.5
pandas                  1.1.0
pandocfilters           1.4.2
pip                     20.2
progressbar2            3.47.0
prompt-toolkit          3.0.5
pycrypto                2.6.1
pydub                   0.24.1
tensorflow-estimator    1.15.1
tensorflow-gpu          1.15.0

You are using CUDA 11; usually only 10.1 is supported. Training doesn’t start. Have you checked that training with about 100 files runs at all? That way you can check that your setup is working.
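
For a quick smoke test like this, a small script can cut a CSV down to its first rows. This is only a minimal sketch: the function name `write_subset` and the output file name are made up for illustration, and it assumes nothing about the CSV beyond a header row.

```python
import csv

def write_subset(src_csv, dst_csv, n_rows=100):
    """Copy the header and the first n_rows data rows of src_csv to dst_csv.

    Returns the number of data rows actually written.
    """
    with open(src_csv, newline="", encoding="utf-8") as src, \
         open(dst_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # header row
        written = 0
        for row in reader:
            if written >= n_rows:
                break
            writer.writerow(row)
            written += 1
    return written

# Hypothetical usage: write_subset("train.csv", "train_100.csv", n_rows=100)
```

The resulting file can then be passed to `--train_files` in place of the full CSV.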

Yes, I have trained it on a subset with 1000 training examples, 300 dev examples and 100 test examples, and it works fine. I only get this error when using the full dataset.

You have something broken in your data. It might just be a file that is too big.
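
One way to look for such a file is to rank the CSV entries by size and inspect both ends of the list. The sketch below assumes the usual DeepSpeech CSV columns `wav_filename`, `wav_filesize` and `transcript`, and it trusts the `wav_filesize` column rather than checking the files on disk. The function name `rank_by_size` is hypothetical.

```python
import csv

def rank_by_size(csv_path, top=5):
    """Return (smallest, largest) lists of (wav_filename, size_in_bytes).

    Sizes are read from the wav_filesize column, not from disk, so the
    result is only as accurate as the CSV itself.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: int(r["wav_filesize"]))
    smallest = [(r["wav_filename"], int(r["wav_filesize"])) for r in rows[:top]]
    # Take the last `top` rows and reverse them so the biggest comes first.
    largest = [(r["wav_filename"], int(r["wav_filesize"])) for r in rows[-top:][::-1]]
    return smallest, largest
```

Files at either extreme (near-empty or far larger than the rest) are the first ones worth listening to or removing.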

Please triple-check your CUDA (10.0) and cuDNN (7.6) versions.

I have reinstalled CUDA, but I am still getting the same error.

ubuntu@deepspeech:~$ dpkg -l | grep cuda
    ii  cuda-10-0                               10.0.130-1                                      amd64        CUDA 10.0 meta-package
    ii  cuda-command-line-tools-10-0            10.0.130-1                                      amd64        CUDA command-line tools
    ii  cuda-compiler-10-0                      10.0.130-1                                      amd64        CUDA compiler
    ii  cuda-cublas-10-0                        10.0.130-1                                      amd64        CUBLAS native runtime libraries
    ii  cuda-cublas-dev-10-0                    10.0.130-1                                      amd64        CUBLAS native dev links, headers
    ii  cuda-cudart-10-0                        10.0.130-1                                      amd64        CUDA Runtime native Libraries
    ii  cuda-cudart-dev-10-0                    10.0.130-1                                      amd64        CUDA Runtime native dev links, headers
    ii  cuda-cufft-10-0                         10.0.130-1                                      amd64        CUFFT native runtime libraries
    ii  cuda-cufft-dev-10-0                     10.0.130-1                                      amd64        CUFFT native dev links, headers
    ii  cuda-cuobjdump-10-0                     10.0.130-1                                      amd64        CUDA cuobjdump
    ii  cuda-cupti-10-0                         10.0.130-1                                      amd64        CUDA profiling tools interface.
    ii  cuda-curand-10-0                        10.0.130-1                                      amd64        CURAND native runtime libraries
    ii  cuda-curand-dev-10-0                    10.0.130-1                                      amd64        CURAND native dev links, headers
    ii  cuda-cusolver-10-0                      10.0.130-1                                      amd64        CUDA solver native runtime libraries
    ii  cuda-cusolver-dev-10-0                  10.0.130-1                                      amd64        CUDA solver native dev links, headers
    ii  cuda-cusparse-10-0                      10.0.130-1                                      amd64        CUSPARSE native runtime libraries
    ii  cuda-cusparse-dev-10-0                  10.0.130-1                                      amd64        CUSPARSE native dev links, headers
    ii  cuda-demo-suite-10-0                    10.0.130-1                                      amd64        Demo suite for CUDA
    ii  cuda-documentation-10-0                 10.0.130-1                                      amd64        CUDA documentation
    ii  cuda-driver-dev-10-0                    10.0.130-1                                      amd64        CUDA Driver native dev stub library
    ii  cuda-drivers                            450.51.06-1                                     amd64        CUDA Driver meta-package, branch-agnostic
    ii  cuda-drivers-450                        450.51.06-1                                     amd64        CUDA Driver meta-package, branch-specific
    ii  cuda-gdb-10-0                           10.0.130-1                                      amd64        CUDA-GDB
    ii  cuda-gpu-library-advisor-10-0           10.0.130-1                                      amd64        CUDA GPU Library Advisor.
    ii  cuda-libraries-10-0                     10.0.130-1                                      amd64        CUDA Libraries 10.0 meta-package
    ii  cuda-libraries-dev-10-0                 10.0.130-1                                      amd64        CUDA Libraries 10.0 development meta-package
    ii  cuda-license-10-0                       10.0.130-1                                      amd64        CUDA licenses
    ii  cuda-memcheck-10-0                      10.0.130-1                                      amd64        CUDA-MEMCHECK
    ii  cuda-misc-headers-10-0                  10.0.130-1                                      amd64        CUDA miscellaneous headers
    ii  cuda-npp-10-0                           10.0.130-1                                      amd64        NPP native runtime libraries
    ii  cuda-npp-dev-10-0                       10.0.130-1                                      amd64        NPP native dev links, headers
    ii  cuda-nsight-10-0                        10.0.130-1                                      amd64        CUDA nsight
    ii  cuda-nsight-compute-10-0                10.0.130-1                                      amd64        NVIDIA Nsight Compute
    ii  cuda-nvcc-10-0                          10.0.130-1                                      amd64        CUDA nvcc
    ii  cuda-nvdisasm-10-0                      10.0.130-1                                      amd64        CUDA disassembler
    ii  cuda-nvgraph-10-0                       10.0.130-1                                      amd64        NVGRAPH native runtime libraries
    ii  cuda-nvgraph-dev-10-0                   10.0.130-1                                      amd64        NVGRAPH native dev links, headers
    ii  cuda-nvjpeg-10-0                        10.0.130.1-1                                    amd64        NVJPEG native runtime libraries
    ii  cuda-nvjpeg-dev-10-0                    10.0.130.1-1                                    amd64        NVJPEG native dev links, headers
    ii  cuda-nvml-dev-10-0                      10.0.130-1                                      amd64        NVML native dev links, headers
    ii  cuda-nvprof-10-0                        10.0.130-1                                      amd64        CUDA Profiler tools
    ii  cuda-nvprune-10-0                       10.0.130-1                                      amd64        CUDA nvprune
    ii  cuda-nvrtc-10-0                         10.0.130-1                                      amd64        NVRTC native runtime libraries
    ii  cuda-nvrtc-dev-10-0                     10.0.130-1                                      amd64        NVRTC native dev links, headers
    ii  cuda-nvtx-10-0                          10.0.130-1                                      amd64        NVIDIA Tools Extension
    ii  cuda-nvvp-10-0                          10.0.130-1                                      amd64        CUDA nvvp
    ii  cuda-repo-ubuntu1804                    10.2.89-1                                       amd64        cuda repository configuration files
    ii  cuda-runtime-10-0                       10.0.130-1                                      amd64        CUDA Runtime 10.0 meta-package
    ii  cuda-samples-10-0                       10.0.130-1                                      amd64        CUDA example applications
    ii  cuda-toolkit-10-0                       10.0.130-1                                      amd64        CUDA Toolkit 10.0 meta-package
    ii  cuda-tools-10-0                         10.0.130-1                                      amd64        CUDA Tools meta-package
    ii  cuda-visual-tools-10-0                  10.0.130-1                                      amd64        CUDA visual tools
    ii  libcudnn7                               7.6.5.32-1+cuda10.2                             amd64        cuDNN runtime libraries
    ii  libcudnn7-dev                           7.6.5.32-1+cuda10.2                             amd64        cuDNN development libraries and headers
    ii  libnccl2                                2.7.8-1+cuda11.0                                amd64        NVIDIA Collectives Communication Library (NCCL) Runtime
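
In a listing this long it is easy to miss that some packages carry a `+cudaX.Y` build tag in their version string (here `libcudnn7` and `libnccl2`). The sketch below pulls those tags out of `dpkg -l` text so they can be compared against the installed toolkit version. Whether a differing tag matters for this particular crash is not established in this thread, and the function name `cuda_build_tags` is hypothetical.

```python
import re

def cuda_build_tags(dpkg_output):
    """Map package name -> CUDA version from a '+cudaX.Y' version suffix.

    Only installed packages (status 'ii') whose version string contains
    such a suffix are returned.
    """
    tags = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "ii":
            continue
        match = re.search(r"\+cuda(\d+\.\d+)", fields[2])
        if match:
            tags[fields[1]] = match.group(1)
    return tags
```

The input can be the text printed by `dpkg -l | grep cuda`, pasted into a string or read from a file.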

Have you read my message?

Yes, I have. The largest files in my dataset are 5 MB, and removing them solves the error. Thank you!
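
For anyone hitting the same problem, removing the large files can be done on the CSV instead of by hand. This is a rough sketch, not the exact steps used above: it assumes the standard `wav_filename`, `wav_filesize`, `transcript` columns, the function name `drop_large_files` is made up, and the cutoff in `max_bytes` has to be chosen for your own data.

```python
import csv

def drop_large_files(src_csv, dst_csv, max_bytes):
    """Write to dst_csv only the rows whose wav_filesize is <= max_bytes.

    Returns a (kept, dropped) tuple of row counts. The audio files
    themselves are left untouched; only the CSV is filtered.
    """
    kept = dropped = 0
    with open(src_csv, newline="", encoding="utf-8") as src, \
         open(dst_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if int(row["wav_filesize"]) > max_bytes:
                dropped += 1
            else:
                writer.writerow(row)
                kept += 1
    return kept, dropped
```

The filtered CSV is then what goes into `--train_files`, and the same can be done for the dev and test CSVs.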
