I am on version 0.7.0 and just testing on LibriSpeech test-clean. I picked the first 18 files from the csv and combined them into a single wav file using ffmpeg. I am just using the downloaded deepspeech-0.7.0-models.scorer together with either deepspeech-0.7.0-checkpoint or deepspeech-0.7.0-models.pbmm.
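For reference, the concatenation was done roughly like this (list.txt and the wav names here are illustrative, not my actual paths):

# list.txt holds one line per input file, in order: file 'xxxx-yyyy-0000.wav'
printf "file '%s'\n" *.wav > list.txt
# All LibriSpeech wavs share the same format, so a stream copy is enough
ffmpeg -f concat -safe 0 -i list.txt -c copy test.wav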
I am running the following three commands on the same input audio:
deepspeech --model models/deepspeech-0.7.0-models.pbmm --scorer models/deepspeech-0.7.0-models.scorer --audio bin/librispeech/LibriSpeech/test-clean-wav/test.wav
python transcribe.py -load_checkpoint_dir deepspeech-0.7.0-checkpoint --scorer models/deepspeech-0.7.0-models.scorer -src bin/librispeech/LibriSpeech/test-clean-wav/test.wav
python evaluate.py --n_hidden 2048 --checkpoint_dir deepspeech-0.7.0-checkpoint --test_files bin/librispeech/LibriSpeech/test-clean-wav/test.csv --test_output_file output.json
Why is there a discrepancy in the transcribed output from the 3 methods? I have attached a pdf with the output in each case and the ground truth. Shouldn't the results from evaluate.py and the DeepSpeech (0.7.0) Python binding at least be the same? I do see that transcribe.py does an additional split of the audio; I am not sure if that is causing any issues.
Also, I see that transcribe.py very often misplaces a block; for instance, the first 15 seconds of the audio ended up as the second-to-last line of the transcribed file.
Can someone please help me understand why there is so much difference between the outputs in the three cases?
Thanks!

comparison.pdf (105.6 KB)