Checkpoints and exported model giving totally different outputs

I trained the model on a collected domain-specific dataset with a custom scorer. The final WER (from evaluate.py) was really good. But when I used the exported model (via the deepspeech library) on one of the files from the same test dataset, the result was totally different. (Note: the file was picked from the same dataset the checkpoints were evaluated on.)

original transcript : “wrong interpretation of the values at intermediate positions so that is the basic problem of”

result from checkpoints : “wrong interpretation of the values at intermediate positions so that is the basic problem of”

result from exported model : “so”

training command :
python ./DeepSpeech-0.7.4/DeepSpeech.py --train_files ./domain_specific_data/domain_train.csv --save_checkpoint_dir ./domain_specific_data/checkpoints/scratch_19k/ --scorer ./domain_specific_data/scorers/domain_specific_default.scorer --alphabet_config_path ./DeepSpeech-0.7.4/data/alphabet.txt --n_hidden 2048 --train_batch_size 64 --learning_rate 0.0001 --dev_files ./atul_data/train_test_dev/validation_6k.csv --dev_batch_size 64 --use_allow_growth true --reduce_lr_on_plateau true --max_to_keep 10 --cache_for_epochs 3 --epochs 30 --export_dir ./model_pb/

DeepSpeech 0.7.4

Thanks in advance. :slightly_smiling_face:

If your data is at a different sample rate than 16 kHz, the training code will handle it automatically, but the inference code won’t; you have to specify the --audio_sample_rate flag at export time. I don’t remember if 0.7.4 will handle this properly, but that’s the first thing that comes to mind. You should be getting a warning about the sample rate mismatch on the client side though…

Adding to Reuben’s point: the .pb is basically the frozen checkpoint, so results should be the same.

If they are not, something else changed:

  • How you feed the audio to DS (e.g. sample rate)

  • Parameter settings (e.g. lm_alpha, though they shouldn’t cause results this bad)

  • Given scorer (are you really using the same?)

Why don’t you switch to DS 0.9.2? Models should be compatible and it is easier to help.

Thanks a lot for the quick response. But the sampling rate is 16 kHz, and the scorer files are all in place (double-checked that too).

Moving to 0.9.2 seems like a good idea, but as I work on a remote server without admin access, setting up the environment takes a little more time and effort. I used the following command to test inference on the exported model.

deepspeech --model model.pb --scorer custom_scorer.scorer --audio audio.wav

Is there anything else that could have gone wrong?

Without the test run from the training log highlighting the inference you compare with, it’s hard to be definitive.

Maybe the file is just broken? Can you share details?

I took a few random samples from the test files and tested them with the original DeepSpeech model and scorer. The issue was the same. I tried some different combinations and files outside the dataset. It seems there is some issue with the domain-specific audio files. I manually listened to some of them and the content was there. I checked the sampling rate and channels and found them to be 16 kHz and 1 respectively. What else could go wrong with the audio file? :expressionless:

Again, there is no difference between the checkpoint and the pb model, so you must have changed something else. Same server, same environment, same DS version? Somewhere along the road you changed something. Why don’t you set up a fresh environment and check versions, to make sure you actually have 0.7.4? pip install will give you 0.9.x if you don’t watch out.
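To verify which version a given environment actually has, a minimal stdlib check (assuming the package is named "deepspeech" as on PyPI):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Example: compare against the version you trained with.
# print(installed_version("deepspeech"))  # e.g. "0.7.4", or None if missing
```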