Warning and error when training the model

I am trying to fine-tune on a custom dataset with the following command:

python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir ~/checkpoint_retrain --epoch -30 \
  --train_files ~/iitm_data/hindi-female-train.csv \
  --dev_files ~/iitm_data/hindi-female-dev.csv \
  --test_files ~/iitm_data/hindi-female-test.csv \
  --learning_rate 0.0001 --train_batch_size 24 --dev_batch_size 48 --test_batch_size 48 \
  --display_step 0 --validation_step 1 --dropout_rate 0.2 --checkpoint_step 1 \
  --decoder_library_path binaries/libctc_decoder_with_kenlm.so \
  --export_dir ~/new_model

but I keep getting the following warning:

WARNING:root:frame length (1200) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid
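The numbers in this warning can be checked by hand. Assuming this DeepSpeech version computes MFCCs with python_speech_features using its default 25 ms window and 512-point FFT (an assumption, not confirmed in this thread), the frame length is the window length times the sample rate. 1200 samples then corresponds to 48 kHz audio, while 16 kHz audio gives 400 samples, which fits in 512. A minimal sketch of that arithmetic, plus a helper that reads a clip's sample rate with Python's standard `wave` module:

```python
import wave

# Assumed defaults (python_speech_features' mfcc): 25 ms window, 512-point FFT.
WINLEN_SECONDS = 0.025
NFFT = 512


def frame_length(sample_rate, winlen=WINLEN_SECONDS):
    """Number of samples in one analysis window at the given sample rate."""
    return int(round(winlen * sample_rate))


def wav_sample_rate(path):
    """Sample rate stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


# Compare the window size against the FFT size for a few common rates.
for rate in (8000, 16000, 48000):
    n = frame_length(rate)
    print(rate, "Hz ->", n, "samples", "(fits)" if n <= NFFT else "(truncated)")
```

If `wav_sample_rate()` reports 48000 for the training clips, resampling them to 16 kHz (the rate DeepSpeech's released models use) is a likely fix; that is an inference from the arithmetic, not something confirmed in this thread.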

In addition, I am also getting KeyError: '3', even though I have checked my transcripts and there is no 3 in them. Can someone please help?

Check again. It might be a tricky UTF-8 character that looks exactly the same.

@lissyx, could you explain a little more about how to fix it? And do you have any idea about the frame length warning?

@lissyx, I am generating my CSVs with a Python script, and I encoded them as UTF-8. Even that didn't help.

I already told you how to fix it: change the code that raises the KeyError so it prints the hex code of the character. I'm sure your "3" is not 0x33 but some other Unicode character, and that character is not in your alphabet. Hence the error.
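The same check can also be run as a standalone script, outside the training run: scan every transcript for characters missing from the alphabet and print each one's code point and Unicode name. This sketch assumes DeepSpeech-style CSVs with a `transcript` column and an alphabet file with one symbol per line, where lines starting with `#` are comments; the helper names (`load_alphabet`, `read_transcripts`, `find_unknown_chars`, `report`) are hypothetical.

```python
import csv
import unicodedata
from collections import Counter


def load_alphabet(path):
    """Read one symbol per line, skipping '#' comment lines (assumed format)."""
    symbols = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")  # keep a lone " " line: space is a symbol
            if line.startswith("#"):
                continue
            symbols.add(line)
    return symbols


def read_transcripts(csv_path, column="transcript"):
    """Return the transcript column of a CSV file as a list of strings."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def find_unknown_chars(transcripts, alphabet):
    """Count every character in the transcripts that is not in the alphabet."""
    unknown = Counter()
    for text in transcripts:
        for ch in text:
            if ch not in alphabet:
                unknown[ch] += 1
    return unknown


def report(unknown):
    """Print each unknown character with its hex code point and Unicode name."""
    for ch, count in unknown.most_common():
        name = unicodedata.name(ch, "<unnamed>")
        print(f"{ch!r} {hex(ord(ch))} {name}: {count} occurrence(s)")
```

Usage, with a placeholder alphabet path: `report(find_unknown_chars(read_transcripts("hindi-female-train.csv"), load_alphabet("alphabet.txt")))`. Any character it lists must either be added to the alphabet or normalized out of the transcripts.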

The error is raised in label_from_string() in text.py. I printed hex(int(string)), but that also gives the same value, '0x03'. I had to convert to int first because the input is a string and hex() doesn't accept strings. Next, I tried printing every string (actually a single character) passed to this function, but that didn't help either, because I started getting a Python threading error. I am not sure how to proceed now.

That's wrong. You need ord(string):

>>> hex(ord('3'))
'0x33'
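This also explains why hex(int(string)) printed 0x03 either way: Python's int() accepts any Unicode decimal digit, not only ASCII, so a look-alike converts to the same integer 3. ord() returns the code point itself, so it tells them apart. A minimal illustration, using U+FF13 FULLWIDTH DIGIT THREE as a hypothetical stand-in for whatever character is actually in the transcripts:

```python
ascii_three = "3"
fullwidth_three = "\uff13"  # hypothetical look-alike, for illustration only

# int() parses any Unicode decimal digit, so both become the integer 3 ...
print(int(ascii_three), int(fullwidth_three))
# ... but ord() returns the actual code point, which differs.
print(hex(ord(ascii_three)), hex(ord(fullwidth_three)))
```

The first line prints `3 3` and the second prints `0x33 0xff13`.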