yk98
February 25, 2020, 8:22am
1
I’m using evaluate.py to get a report on a number of test files that I’ve listed in Test.csv (wav_name, size, transcript).
However, when I run evaluate.py using a custom LM and trie file, the transcriptions are incorrect.
Whereas if I run inference on these files individually, I get an (almost) correct transcription…
Can you tell me what I must be doing wrong?
Command used to run evaluate.py:
python evaluate.py --test_files ../Test_Files/Test.csv --checkpoint_dir ../original_checkpoints/ --alphabet_config_path ./data/alphabet.txt --lm_binary_path ./working/language_models/lm.binary --trie ./working/language_models/trie --test_output_file ../Output/output.txt
Example:
Original transcript: i am good and what about you.
Individual inference: i am good and what about you
evaluate.py inference: do you
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
February 25, 2020, 8:54am
2
yk98:
However, when I run evaluate.py using a custom LM and trie file, the transcriptions are incorrect.
Whereas if I run inference on these files individually, I get an (almost) correct transcription…
Can you tell me what I must be doing wrong?
How do you run the individual inference?
yk98
February 25, 2020, 2:50pm
3
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm working/language_models/lm.binary --trie working/language_models/trie --audio ../Test_Files/Test1.wav
yk98
February 25, 2020, 3:25pm
4
The original_checkpoints directory contains the original checkpoint files downloaded from the 0.6.1 release page, i.e. checkpoint 233784.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
February 25, 2020, 3:39pm
5
Are you sure about your files? About the LM alpha and beta values?
yk98
February 25, 2020, 3:47pm
6
I haven’t given them any values, so I assumed the defaults would be used. I also tried passing them explicitly to evaluate.py and got the same results…
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
February 25, 2020, 4:15pm
7
Well, I’d suspect how you exported your model.
yk98
February 26, 2020, 9:42am
8
I loaded the relevant checkpoint and I thought that would be it.
evaluate.py doesn’t seem to take --model as an input parameter…
yk98
February 26, 2020, 9:51am
10
The same checkpoint in the deepspeech-0.6.1-checkpoint directory:
best_dev-233784.data,
best_dev-233784.index,
best_dev-233784.meta
Used the following steps to make the lm.binary and trie:
build/bin/lmplz --text …/cmlm/file/comcast_updated_3.txt --arpa …/cmlm/lm/words.arpa --order 5 --temp_prefix …/tmp --prune 0 0 0 1
build/bin/build_binary -T -s trie …/cmlm/lm/words.arpa …/cmlm/lm/lm.binary
./generate_trie data/alphabet.txt ./cmlm/lm/lm.binary ./cmlm/lm/trie
As I mentioned above, my custom lm.binary and trie file work perfectly fine with individual transcriptions…
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
February 26, 2020, 9:55am
11
Can you please share more logs from your evaluate.py?
yk98
February 26, 2020, 10:23am
12
Hey, so I found the problem:
I just had to convert the .wav files to 16 kHz.
This is usually not a problem when you transcribe files individually, and I should’ve read the logs more carefully!
Thank you for your help.
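For anyone hitting the same issue: before running evaluate.py, it can help to sanity-check that every file in the test set is 16 kHz mono, which is what the pretrained model expects. A minimal sketch using Python’s standard wave module (the find_bad_wavs helper name is my own, not part of DeepSpeech):

```python
import wave

def find_bad_wavs(paths, expected_rate=16000, expected_channels=1):
    """Return the subset of WAV paths whose sample rate or channel
    count does not match what the acoustic model expects."""
    bad = []
    for path in paths:
        with wave.open(path, "rb") as w:
            if (w.getframerate() != expected_rate
                    or w.getnchannels() != expected_channels):
                bad.append(path)
    return bad
```

Running this over the wav_name column of Test.csv will list any files that still need resampling (e.g. with sox) before evaluation.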