Trying to train model at 22050Hz, the training throws inf loss

Hi Team,
Here to bug you again! I simply changed the sampling rate to 22050Hz and since then the error being shown up is : Error converting shape to a TensorShape: Ambiguous dimension: 705.6.
I understand the it will change the window size length and window stride. But from the discourse I understood that we can change the sampling rate according to our need. I wonder where this might be causing an issue. Kindly throw some light!


@sumegha19 Could you please share the changes you made? Also, I suspect that the non-integer value of 705.6 might be what TensorFlow is unhappy about.

Thanks for responding so fast!!
Yeah, that number is the window length in samples: 22050 × 32 ms = 705.6
Can we do something about it?
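For anyone following along, here is a small sketch of the arithmetic behind that error (the helper name is mine, just to illustrate; the actual framework computes this internally):

```python
# The feature-window length in samples is sample_rate * window_ms / 1000.
# TensorFlow needs an integer dimension here; at 22050 Hz and 32 ms it isn't one.

def window_samples(sample_rate_hz, window_ms):
    """Window length in samples for a given sample rate and duration."""
    return sample_rate_hz * window_ms / 1000.0

print(window_samples(16000, 32))   # 512.0  -> integer-valued, fine at 16 kHz
print(window_samples(22050, 32))   # 705.6  -> fractional, hence the TensorShape error
```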
Also, I was trying to train on 100 hrs of Common Voice data, just to experiment with the window size and window stride parameters (25 ms window_sample_length and 10 ms window_stride_length). That too gave inf loss, and for the 2-3 epochs it ran the loss didn't change, meaning the model wasn't converging.

Apart from these queries, I find it weird that training runs for some epochs and then spits out an inf loss at some later epoch. If the sizes were inconsistent, the training should not start at all.

Please let me know where I am going wrong!

Change the window size so we have an integer value?
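To pick a compatible value, a quick sketch: enumerate window durations whose sample count comes out whole at 22050 Hz (illustrative only, not part of the DeepSpeech code):

```python
# Find window durations (in ms) that give an integral sample count at 22050 Hz.
SAMPLE_RATE = 22050

def is_integer_window(window_ms):
    # samples = SAMPLE_RATE * window_ms / 1000; integral iff the product
    # of rate and duration is divisible by 1000.
    return (SAMPLE_RATE * window_ms) % 1000 == 0

valid = [ms for ms in range(1, 51) if is_integer_window(ms)]
print(valid)  # [20, 40] -> 441 and 882 samples respectively
```

So at 22050 Hz only multiples of 20 ms give a whole number of samples, which is why 32 ms breaks.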

Without more details on what you use in terms of data and parameters, it's hard to tell.

I set the frequency to 22000 Hz, and the training is running right now.
Data: Common Voice, 100 hrs, sampled at 22050 Hz
- using the augmentation pipeline
Command given:

```
./DeepSpeech_augmented2.py --train_files cv_data/cv_train_22050.csv --dev_files cv_data/cv_valid_22050.csv --train_batch_size 52 --dev_batch_size 52 --dropout_rate 0.3 --learning_rate 0.0008 --es_steps 10 --es_mean_th 0.2 --es_std_th 0.2 --checkpoint_dir cv_22050_checkpoint/ --load "best" --lm_binary_path /home/sumegha/ds/content/datalab/data/deepspeech/DeepSpeech/data/lm/lm.binary --alphabet_config_path /home/sumegha/ds/content/datalab/data/deepspeech/DeepSpeech/data/alphabet.txt --lm_trie_path /home/sumegha/ds/content/datalab/data/deepspeech/DeepSpeech/data/lm/trie --export_dir cv_22050_model/ > outputs/cv_22050_train2
```

window_size_length = 32 ms
window_stride_length = 20 ms

Let me know what else I should share.

So that's not really our code?

Common Voice data is released at 48 kHz, and the DeepSpeech import code resamples to 16 kHz. Can you share your changes as well?

Likely your learning rate is inappropriate.

No no, it's totally your code, just the name changed.

I had a folder of Common Voice samples at 16 kHz, so I upsampled them to 22050 Hz.

I set this after the first training ran at learning_rate = 0.001 and early-stopped; to fine-tune it, I reduced the learning rate.

The weird thing is that training runs normally for several epochs and then, at one epoch, an inf loss suddenly appears.

Bad idea, you’re adding noise / artifacts.
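The noise/artifacts point comes down to the Nyquist limit: 16 kHz recordings carry no content above 8 kHz, so upsampling to 22050 Hz only interpolates and cannot add real signal. A back-of-the-envelope sketch (helper name is mine, for illustration):

```python
# The Nyquist frequency is half the sample rate: the highest frequency a
# recording at that rate can represent. Upsampling cannot restore content
# above the original Nyquist limit, so the 8-11.025 kHz band of the
# upsampled files stays empty (or picks up interpolation artifacts).

def nyquist_hz(sample_rate_hz):
    return sample_rate_hz / 2.0

original = nyquist_hz(16000)   # 8000.0 Hz of real content
upsampled = nyquist_hz(22050)  # 11025.0 Hz of representable range
print(f"Band with no real signal after upsampling: {original:.0f}-{upsampled:.0f} Hz")
```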

Can this be the reason for the inf loss?

Who knows? It's hard to be definitive here.

Ok! I'll try to figure it out!
Anyway, thanks for the help :slight_smile: