What is the training set of the pre-trained model?

(jackhuang) #1

I used the TIMIT dataset to test the pre-trained model, and the WER is about 27%. I want to know which training set the pre-trained model used, so I can try to improve my strategy for selecting training data. Can anyone help me?

(kdavis) #2

The pre-trained model was trained on Fisher, Switchboard, and Librivox training data sets.

(jackhuang) #3

Were these three sets combined to form both the training set and the validation set?

(kdavis) #4


The training set was the Fisher, Switchboard, and Librivox training data sets.
The validation set was the clean Librivox validation data set.
The test set was the clean Librivox test data set.

(jackhuang) #5

Thank you, and is the language model the 4-gram language model with a 30,000 word vocabulary trained on the Fisher and Switchboard transcriptions as the paper says?

(kdavis) #6

No. We didn’t try to exactly reproduce the paper’s results.

We created a KenLM language model based on the Fisher, Switchboard, and Librivox training data sets as well as part of Wikipedia.

(Sanjay Rao) #7

Is it possible to train this model further using other voice datasets?

(Yv) #8

I don’t think that’s possible until TensorFlow checkpoints are also published - the frozen out_graph.pb cannot be used for further training, AFAIK.

(jackhuang) #9

Does the Common Voice data set include other data sets like Librivox, Switchboard, or Fisher?

(kdavis) #10

No. Common Voice, Librivox, Switchboard, and Fisher are separate, distinct data sets.

(jackhuang) #11

Thank you, and may I ask what the WER is on the clean Librivox test data set?

(jackhuang) #12

And did you use the 4-gram model to train the language model?

(kdavis) #13

The WER is 6.0 percent on the clean Librivox test data set.
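
For anyone comparing numbers across data sets: WER is generally the word-level edit distance (substitutions, insertions, and deletions) between the reference transcript and the recognized text, divided by the number of reference words. The sketch below only illustrates that definition; it is not the project's evaluation code, and `word_error_rate` is a name made up for this example. It scores a single utterance, whereas corpus-level figures are usually total edits divided by total reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```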

(kdavis) #14

Not sure what you’re asking here.

Do you mean “Is the language model a 4-gram language model?”
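
To make the term concrete: a 4-gram language model estimates each word from the three words before it, so it is built from counts of four-word windows over the training text. A rough sketch of extracting such windows follows; it is illustrative only, `ngrams` is a hypothetical helper, and it says nothing about how KenLM works internally.

```python
from collections import Counter


def ngrams(words, n=4):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    transcript = "the cat sat on the mat".split()
    counts = Counter(ngrams(transcript, n=4))
    for gram, count in counts.items():
        print(" ".join(gram), count)
```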

(jackhuang) #15

Yes, that’s what I meant.

(Panybj) #16

Where can I download the Fisher and Switchboard datasets?

(kdavis) #17

You have to purchase Fisher and Switchboard from the LDC.

(Srikar) #18

So, the Common Voice data is not factored into the pre-trained model?