Float: zero division error while doing Transfer Learning on custom data

Hi Deepspeech,

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04) : Ubuntu 16.04 LTS
  • TensorFlow installed from (our builds, or upstream TensorFlow) : NO
  • TensorFlow version (use command below) : 1.13
  • Python version : Python 3.6.8

Here are the command and its output, including the error:

(deepspeech-venv) javi@thinkpad-e470:~/speech/tools/DeepSpeechWorking$ bin/run-alfred.sh

+ [ ! -f DeepSpeech.py ]
+ python -u DeepSpeech.py --train_files /home/javi/speech/tools/backup/train/train.csv --dev_files /home/javi/speech/tools/backup/train/dev.csv --test_files /home/javi/speech/tools/backup/train/test.csv --train_batch_size 80 --dev_batch_size 80 --test_batch_size 40 --n_hidden 375 --epoch 33 --early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 --dropout_rate 0.22 --learning_rate 0.00095 --report_count 100 --use_seq_length False --export_dir /home/javi/speech/tools/backup/export_modal/ --checkpoint_dir /home/javi/speech/tools/backup/checkout/ --alphabet_config_path /home/javi/speech/tools/backup/DeepSpeech/data/alphabet.txt --lm_binary_path /home/javi/speech/tools/backup/DeepSpeech/data/lm.binary
WARNING:tensorflow:From /home/javi/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.

WARNING:tensorflow:From /home/javi/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/javi/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/javi/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Restored variables from most recent checkpoint at /home/javi/speech/tools/backup/checkout/train-0, step 0
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /home/javi/speech/tools/backup/train/dev.csv
Traceback (most recent call last):
  File "DeepSpeech.py", line 836, in <module>
    tf.app.run(main)
  File "/home/javi/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 820, in main
    train()
  File "DeepSpeech.py", line 524, in train
    dev_loss = dev_loss / total_steps
ZeroDivisionError: float division by zero

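That's usually a hint of a problem in your dataset. Both the training and the validation lines in your log report Steps: 0, so no batch was ever processed and total_steps is still zero when the loss is averaged.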
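
One way to narrow this down is to count how many rows in each CSV are actually usable and compare that with the batch size. The sketch below is only an illustration, not part of DeepSpeech: it assumes the usual DeepSpeech CSV columns `wav_filename` and `transcript`, and it assumes that a split with fewer usable rows than its batch size may produce no batches at all (whether an incomplete batch is dropped depends on the DeepSpeech version). The helper names `count_usable_rows` and `check_split` are made up for this example.

```python
import csv
import os


def count_usable_rows(csv_path):
    """Count rows whose audio file exists and whose transcript is non-empty.

    Assumes the columns are named wav_filename and transcript.
    Relative wav paths are resolved against the current working directory.
    """
    usable = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            wav = (row.get("wav_filename") or "").strip()
            text = (row.get("transcript") or "").strip()
            if wav and text and os.path.isfile(wav):
                usable += 1
    return usable


def check_split(csv_path, batch_size):
    """Return (usable_rows, full_batches) for one CSV split."""
    usable = count_usable_rows(csv_path)
    return usable, usable // batch_size


if __name__ == "__main__":
    # Paths and batch sizes copied from the command in the question.
    base = "/home/javi/speech/tools/backup/train"
    splits = (("train.csv", 80), ("dev.csv", 80), ("test.csv", 40))
    for name, batch_size in splits:
        path = os.path.join(base, name)
        if not os.path.isfile(path):
            print(f"{name}: file not found at {path}")
            continue
        usable, full_batches = check_split(path, batch_size)
        print(f"{name}: {usable} usable rows, {full_batches} full batches of {batch_size}")
```

If any split reports zero full batches, lowering the corresponding `--train_batch_size`, `--dev_batch_size` or `--test_batch_size`, or fixing the paths in the CSV, would be the first thing to try.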
