I'm facing an issue while training a deepspeech-0.6 model

When I train a model from scratch, without a checkpoint and with the default arguments, it works fine and the epochs run.

This is the command I use for training:

python3.7 DeepSpeech.py \
			--lm_binary_path ./data/lm/lm.binary \
			--lm_trie_path ./data/lm/trie \
			--learning_rate 0.0001 \
			--train_batch_size 32 \
			--alphabet_config_path ./data/alphabet_old.txt \
			--remove_export true \
			--train_files ./data/testing_data/16_12_19_only_wav.csv \
			--export_dir ./export \
			--checkpoint_dir ./scratch_checkpoint \
			--epochs 10

After setting the augmentation flag augmentation_pitch_and_tempo_scaling to True, training fails with the error shown below.

When I traced the error, it led me to the augment_pitch_and_tempo function in spectrogram_augmentations.py, at the resize_bilinear call.

Specifically, it is this line: spectrogram_aug = tf.image.resize_bilinear(tf.expand_dims(spectrogram, -1), [new_height, new_width]). I think a negative value is produced while new_height and new_width are calculated, but I don't understand where it comes from, and I can't inspect the calculated values because they are tensors.
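To make the size calculation concrete, here is a minimal plain-Python sketch of the arithmetic that is probably involved. The formula (height multiplied by a random pitch factor, width divided by a random tempo factor of at least 1) and the helper name `scaled_size` are assumptions made for illustration; this is not code copied from spectrogram_augmentations.py.

```python
import random

def scaled_size(height, width, min_pitch=0.95, max_pitch=1.2, max_tempo=1.2, rng=random):
    """Hypothetical helper: mimic the assumed new_height/new_width arithmetic."""
    pitch = rng.uniform(min_pitch, max_pitch)   # random pitch factor
    tempo = rng.uniform(1.0, max_tempo)         # random tempo factor, at least 1
    new_height = int(height * pitch)            # truncates, like a float -> int cast
    new_width = int(width / tempo)
    return new_height, new_width

if __name__ == "__main__":
    random.seed(0)
    for h, w in [(1, 1), (26, 100), (257, 500)]:
        print((h, w), "->", scaled_size(h, w))
```

Under these assumptions, positive input sizes and positive scaling factors can never give a negative new_height or new_width, so the negative value may come from a different step of the augmentation pipeline.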

Please help me solve this issue. Please refer to the conversation above for the augmentation arguments.

My end goal is to apply augmentation to my dataset.

Thanks…


I'm running custom training with deepspeech-0.6, and I get an error after changing an augmentation argument. No checkpoint is specified.

# Data Augmentation
# ================

f.DEFINE_float('data_aug_features_additive', 0, 'std of the Gaussian additive noise')
f.DEFINE_float('data_aug_features_multiplicative', 0, 'std of normal distribution around 1 for multiplicative noise')

f.DEFINE_float('augmentation_spec_dropout_keeprate', 1, 'keep rate of dropout augmentation on spectrogram (if 1, no dropout will be performed on spectrogram)')

f.DEFINE_boolean('augmentation_freq_and_time_masking', True, 'whether to use frequency and time masking augmentation')
f.DEFINE_integer('augmentation_freq_and_time_masking_freq_mask_range', 5, 'max range of masks in the frequency domain when performing freqtime-mask augmentation')
f.DEFINE_integer('augmentation_freq_and_time_masking_number_freq_masks', 3, 'number of masks in the frequency domain when performing freqtime-mask augmentation')
f.DEFINE_integer('augmentation_freq_and_time_masking_time_mask_range', 2, 'max range of masks in the time domain when performing freqtime-mask augmentation')
f.DEFINE_integer('augmentation_freq_and_time_masking_number_time_masks', 3, 'number of masks in the time domain when performing freqtime-mask augmentation')

f.DEFINE_float('augmentation_speed_up_std', 1, 'std for speeding-up tempo. If std is 0, this augmentation is not performed')

f.DEFINE_boolean('augmentation_pitch_and_tempo_scaling', True, 'whether to use spectrogram speed and tempo scaling')
f.DEFINE_float('augmentation_pitch_and_tempo_scaling_min_pitch', 0.95, 'min value of pitch scaling')
f.DEFINE_float('augmentation_pitch_and_tempo_scaling_max_pitch', 1.2, 'max value of pitch scaling')
f.DEFINE_float('augmentation_pitch_and_tempo_scaling_max_tempo', 1.2, 'max value of tempo scaling')

Use tf.where in 2.0, which has the same broadcast rule as np.where
I Initializing variables...
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Traceback (most recent call last):
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Need minval < maxval, got 0 >= -3
	 [[{{node random_uniform_3}}]]
	 [[tower_0/IteratorGetNext]]
  (1) Invalid argument: Need minval < maxval, got 0 >= -3
	 [[{{node random_uniform_3}}]]
	 [[tower_0/IteratorGetNext]]
	 [[Mean_8/_85]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 969, in <module>
    absl.app.run(main)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 942, in main
    train()
  File "DeepSpeech.py", line 635, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "DeepSpeech.py", line 603, in run_set
    feed_dict=feed_dict)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/administrator/deepspeech_0.6.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Need minval < maxval, got 0 >= -3
	 [[{{node random_uniform_3}}]]
	 [[tower_0/IteratorGetNext]]
  (1) Invalid argument: Need minval < maxval, got 0 >= -3
	 [[{{node random_uniform_3}}]]
	 [[tower_0/IteratorGetNext]]
	 [[Mean_8/_85]]
0 successful operations.
0 derived errors ignored.
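
The message `Need minval < maxval, got 0 >= -3` indicates that a uniform random draw was requested over a range starting at 0 and ending at -3, which is empty. Below is a minimal sketch of that failure and one possible guard, written in plain Python rather than TensorFlow. The names `draw_offset`, `draw_offset_guarded`, `dim` and `mask` are hypothetical, and the idea that the upper bound is a tensor dimension minus a mask width is an assumption that the traceback does not confirm.

```python
import random

def draw_offset(dim, mask, rng=random):
    """Hypothetical helper: pick a start offset in [0, dim - mask)."""
    maxval = dim - mask
    if maxval <= 0:
        # Same condition the error reports: the range [0, maxval) is empty.
        raise ValueError(f"Need minval < maxval, got 0 >= {maxval}")
    return rng.randrange(0, maxval)

def draw_offset_guarded(dim, mask, rng=random):
    """Same draw, but clamp the upper bound so the range is never empty."""
    maxval = max(1, dim - mask)   # with maxval == 1 the only possible offset is 0
    return rng.randrange(0, maxval)

if __name__ == "__main__":
    try:
        draw_offset(dim=1, mask=4)
    except ValueError as err:
        print(err)
    print(draw_offset_guarded(dim=1, mask=4))
```

If the real cause is similar, checking the actual tensor shapes that reach the augmentation step (for example very short clips, or a dimension order different from what the code expects) would be a reasonable next step before applying any clamp.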