Hello team,
Is there a way to get the predicted transcripts for the entire TEST set, with WER? At present (v0.6.0) it shows the WER only for a few (~10) test transcripts.
Kindly guide.
The first line, before the samples are shown, is the WER/CER for the entire set.
Adding to what Reuben said, I usually run it with just the test_files param and set:
--test_output_file "/xxx/out.txt" \
--report_count 50 \
If you want to dig deeper, try the benchmarkstt repo on the out.txt file.
@reuben: I meant that I want the predicted transcripts for the entire test set, with the WER for each transcript. Is there a way to get that? Kindly guide.
@othiele: Thank you, will try it out.
Either set report_count to the size of your test set, or check the output file; it lists the results for all inputs.
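In case it helps, here is a minimal sketch of listing the per-transcript WER from that output file, assuming it is a JSON list of records with the fields shown in the sample further down (the /xxx/out.txt path is just a placeholder for your own --test_output_file):

import json

# Placeholder path; point this at whatever you passed to --test_output_file.
with open("/xxx/out.txt", encoding="utf-8") as f:
    results = json.load(f)

# Each record carries per-transcript metrics, so we can print the WER for every input.
for r in sorted(results, key=lambda r: r["wer"], reverse=True):
    print(f'{r["wer"]:.2f}  {r["wav_filename"]}')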
Thank you @othiele, it worked. But the file is in the default ASCII format. I tried
iconv -f ASCII -t UTF-8 out.txt > "out_utf8.txt"
but it didn't work. How did you handle it?
{
"char_distance": 35,
"word_length": 3,
"wer": 2.6666666666666665,
"char_length": 26,
"loss": 208.03085327148438,
"src": "beim vorliegenden gesch\u00e4ft",
"word_distance": 8,
"wav_filename": "/media/data/LTLab.lan/agarwal/german-speech-corpus/swiss_german/clips/35795.wav",
"res": "die kollegen und kolleginnen wie marie den elfte",
"cer": 1.3461538461538463
}
It’s this problem: https://stackoverflow.com/a/18337754/346048
I’ll make a PR for a fix. Alternatively you can load the file using Python and apply a fix locally so you don’t have to re-run the test epoch.
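For anyone hitting the same thing: iconv can't help here because the file is already valid ASCII; the umlauts are stored as JSON \uXXXX escapes, which is what the Stack Overflow answer above is about. A minimal local fix, assuming out.txt is a single JSON document like the sample above, is to round-trip it through Python's json module with ensure_ascii=False:

import json

# Placeholder paths; point these at your own report file.
IN_PATH = "/xxx/out.txt"
OUT_PATH = "/xxx/out_utf8.txt"

# json.load() decodes \u00e4-style escapes back into real characters.
with open(IN_PATH, encoding="utf-8") as f:
    results = json.load(f)

# Re-serialize without ASCII-escaping, so "geschäft" is written as real UTF-8.
with open(OUT_PATH, "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)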
In the test results, I see a problem: some resulting transcripts are very short (1-2 words) and some are very long (15-20 words) for source transcripts of 5-8 words.
I tried changing the lm_alpha and lm_beta parameters, but without much success. Do you recommend anything to solve this problem?
Note: I am working on German data and have trained the model with ~1000 hours of speech data.
@reuben, Kindly advise on it.
Others will be better qualified than I to comment, but I’d guess it’s worth looking at two areas:
1. Your dataset: what's the audio quality like? And how about the transcription quality? Bear in mind that 0.7.0 was trained on maybe six times as much audio as your 1,000 hours (just a back-of-envelope calculation based on the datasets mentioned on the release page under the Training Regimen section).
2. Your language model/scorer: how large a text corpus did you use to create it? Was it just the transcribed text from your audio dataset, or was it more comprehensive? Unless you're targeting a narrow-vocabulary scenario (and it doesn't sound like you are), you'll likely want the biggest corpus you can manage, so that the model makes sensible predictions about sentence probabilities.
@reuben, Kindly advise on it.
Given it wasn't that long after your earlier question, and people had already helped you, it might be worth a little patience. People are often happy to help, but they aren't sitting there just waiting for your next question…
Anyway, I hope you get to the bottom of your issues with the transcriptions.