Ok so I’m training on a Malay dataset that I got from my company. I’m not sure of the original source, but it can’t be found on the web. It consists of 15-minute audio clips of people talking over the phone, with transcripts in which the named entities are written in capital letters.
I’ve split the audio clips and transcripts into short clips of around 1 sentence each, and trained a DeepSpeech model on them for 25 epochs.
After hitting a WER of about 0.59, I used this model to run inference on the test file. I then used the following metric to calculate the recall score for the NER task.
Recall = (No. of NEs correctly decoded) / (Total no. of NEs in test transcripts)
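To make the metric concrete, here is a minimal Python sketch of how this recall could be computed. It rests on assumptions that are not stated above: that named entities are exactly the all-caps tokens in the reference transcripts, that an entity counts as "correctly decoded" if the same token appears in the decoded output for that clip (compared case-insensitively, since the decoded text may be lowercase), and that repeated entities are matched at most as many times as they are decoded. The function name `ne_recall` and the sample sentences are hypothetical.

```python
from collections import Counter


def ne_recall(references, hypotheses):
    """Recall of named entities over paired reference/decoded transcripts.

    Assumption: a named entity is any all-caps token in the reference.
    Assumption: a match is a case-insensitive token match within the same clip.
    """
    total = 0
    correct = 0
    for ref, hyp in zip(references, hypotheses):
        # Count each named entity token in the reference transcript.
        ref_entities = Counter(tok.lower() for tok in ref.split() if tok.isupper())
        # Count every token in the decoded transcript.
        hyp_tokens = Counter(tok.lower() for tok in hyp.split())
        total += sum(ref_entities.values())
        # Counter intersection keeps the minimum count per token, so an
        # entity is not credited more often than it was actually decoded.
        correct += sum((ref_entities & hyp_tokens).values())
    return correct / total if total else 0.0


# Hypothetical example: 3 entities in the references, 2 of them decoded.
refs = ["saya tinggal di KUALA LUMPUR", "nama saya AHMAD"]
hyps = ["saya tinggal di kuala lumpur", "nama saya ahmat"]
print(ne_recall(refs, hyps))
```

A token-level match like this is only one possible definition. Scoring multi-word entities as a single unit would give a stricter number.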
With a scorer built only from sentences in the training set, the recall score I got was around 0.18, which is pretty low. I then added an additional 1.5 million sentences to the scorer file, taken from Malay datasets that I got from the following sources:
After that, the recall score fell to 0.15.
That is what I did, and the results look pretty strange to me.