Tests on the DeepSpeech Python Package

Hi everyone,

I ran some tests with the DeepSpeech Python package. I used the downloadable pre-trained model and tested its accuracy on a small part of the VoxForge dataset. Here is the code I used:

import os

import scipy.io.wavfile as wav
from deepspeech import Model
from jiwer import wer

directory = 'Voxforge_dataset/test'
# Arguments: graph, number of features (26), context size (9), alphabet, beam width (500)
ds = Model('models/output_graph.pb', 26, 9, 'models/alphabet.txt', 500)
score = 0
num1 = 0

for foldername in os.listdir(directory):
    direc = os.path.join(directory, foldername)

    # Only process speaker folders that contain a 'wav' subfolder
    if 'wav' in os.listdir(direc):
        direc1 = os.path.join(direc, 'wav')
        direc2 = os.path.join(direc, 'etc', 'prompts-original')

        # Each prompt line is "<file name> <transcript>"
        with open(direc2, "r") as f:
            text = f.readlines()

        for line in text:
            num2 = line.find(' ')
            name = os.path.join(direc1, line[:num2] + '.wav')
            num1 = num1 + 1
            fs, audio = wav.read(name)
            processed_data = ds.stt(audio, fs)
            # Accumulate the per-utterance WER (trailing newline dropped from the reference)
            score = score + wer(line[num2+1:].strip(), processed_data)

# Mean of the per-utterance WERs
score = score / num1
print("Average WER is", score)

I obtained an average WER of approximately 35%. My questions are the following:

  1. Is this WER normal, considering I used the pre-trained model?
  2. I didn't specify any language model binary file or language model trie file. Which ones did my script use by default?
  3. How can I specify which LM binary file and trie file I want to use?

Thanks in advance.

I suspect scipy.io.wavfile does not produce proper features; maybe @reuben can confirm that.
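
One way to rule out an input format problem is to inspect each WAV header before decoding. The sketch below uses only the standard-library wave module. The helper name check_wav is hypothetical, and the expected values (16 kHz, mono, 16-bit PCM) are an assumption about what the pre-trained model wants, so adjust them if your model differs.

```python
import wave

# Assumed input format for the pre-trained model: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample


def check_wav(path):
    """Return a list of human-readable format problems for one WAV file."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != EXPECTED_RATE:
            problems.append("sample rate is %d Hz" % w.getframerate())
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append("%d channels" % w.getnchannels())
        if w.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append("%d-bit samples" % (8 * w.getsampwidth()))
    return problems
```

Calling check_wav(name) just before ds.stt(audio, fs) and printing any non-empty result would show whether some of the test files need resampling or downmixing first.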

Please read the documentation: to specify the LM binary and trie files, use enableDecoderWithLM().
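
As a rough sketch of what that looks like: in the releases that use the five-argument Model constructor shown in the question, the call takes the alphabet path, the LM binary path and the trie path, followed by decoder weights whose number and meaning changed between releases. The helper name enable_lm and the file paths in the usage comment are hypothetical, and the weights are deliberately left to the caller, so check the documentation for your installed version. As far as I understand, if this call is never made the decoder runs without any language model, which by itself could explain a higher WER.

```python
def enable_lm(ds, alphabet_path, lm_path, trie_path, *weights):
    """Attach a language model to an already constructed deepspeech Model.

    `weights` is passed through unchanged, because the number of decoder
    weights expected by enableDecoderWithLM() depends on the release.
    """
    ds.enableDecoderWithLM(alphabet_path, lm_path, trie_path, *weights)
    return ds


# Usage (hypothetical paths; take the weight values from the client
# script that matches your installed release):
# ds = Model('models/output_graph.pb', 26, 9, 'models/alphabet.txt', 500)
# enable_lm(ds, 'models/alphabet.txt', 'models/lm.binary', 'models/trie', *weights)
```

The call has to happen after the model is constructed and before the first ds.stt() call for the language model to affect the transcripts.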