Does the acoustic model's dataset have to be a subset of the language model's dataset?

Hi,
I have a question about the datasets used to train a model from scratch.

Does the acoustic model's dataset have to be a subset of the language model's dataset?

I think there are a few scenarios.

  1. The acoustic model's dataset is different from the language model's dataset.
  2. The acoustic model's dataset is a subset of the language model's dataset.
  3. The acoustic model's dataset is the same as the language model's dataset.

I mean the acoustic model's data could have a WAV audio file with the sentence “I love dogs.”
The language model's data could have “I love dogs. I love cats too.”

Which is recommended?

Thanks in advance.

No, it does not have to be a subset.

The 0.5.1 model is an example of your case 1, “Dataset of acoustic model is different from dataset of language model.”

Which is recommended depends on your use case.
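
If it helps to see which of the three cases your own data falls into, a quick check along these lines might do. This is only a sketch: `classify_overlap` is a hypothetical helper, not part of any library, and it assumes both datasets are available as plain lists of sentences and that comparing lowercased, whitespace-stripped sentences is good enough.

```python
def classify_overlap(acoustic_transcripts, lm_sentences):
    """Return 'same', 'subset', or 'different' for the two text collections.

    Hypothetical helper for illustration only. Sentences are compared
    after stripping surrounding whitespace and lowercasing (an assumption).
    """
    acoustic = {s.strip().lower() for s in acoustic_transcripts}
    lm = {s.strip().lower() for s in lm_sentences}

    if acoustic == lm:
        return "same"        # case 3
    if acoustic < lm:        # proper subset
        return "subset"      # case 2
    # Anything else (disjoint or only partly overlapping) counts as case 1.
    return "different"


# The example from the question: one transcript vs. a larger text corpus.
acoustic_transcripts = ["I love dogs."]
lm_sentences = ["I love dogs.", "I love cats too."]
print(classify_overlap(acoustic_transcripts, lm_sentences))  # subset
```

Note that "different" here also covers partial overlap, which is probably the most common situation in practice.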
