I want to classify speech commands from a single speaker. The speaker may or may not have an accent, but the speech will always be in English. The dataset would be extremely small (10 speech commands, 10 .wav recordings per command). Would Mozilla DeepSpeech work for this, perhaps by fine-tuning a model? How would I go about this?
If it is just 10 commands, simply try a custom language model. Ideally, the commands should sound different from one another. Search the forum for that and you'll find some ideas on what to do.
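Here is a rough sketch of how that could look. It assumes the DeepSpeech 0.9.x Python package and a custom scorer (`commands.scorer`) that you have already built from a text file containing only your command phrases, using DeepSpeech's scorer tooling. Snap the transcript to the nearest known command, since even a restricted scorer can produce a slightly wrong word. The `COMMANDS` list, the file names, and the `cutoff` value are hypothetical placeholders, not anything DeepSpeech provides.

```python
import difflib
import wave

# Hypothetical command set: replace with your own 10 commands.
COMMANDS = ["lights on", "lights off", "volume up", "volume down", "play music",
            "stop music", "open door", "close door", "next track", "previous track"]


def transcribe(wav_path, model_path, scorer_path):
    """Transcribe a 16-bit mono WAV file with the DeepSpeech 0.9.x Python API."""
    import numpy as np
    from deepspeech import Model  # requires `pip install deepspeech`

    model = Model(model_path)                 # e.g. the released .pbmm acoustic model
    model.enableExternalScorer(scorer_path)   # assumed custom scorer built from the command phrases
    with wave.open(wav_path, "rb") as w:
        # The released models expect audio at model.sampleRate() (16 kHz).
        if w.getframerate() != model.sampleRate():
            raise ValueError("resample the recording to %d Hz first" % model.sampleRate())
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return model.stt(audio)


def closest_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Map a (possibly misrecognized) transcript to the most similar command, or None."""
    matches = difflib.get_close_matches(transcript.lower().strip(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Usage would then be something like `closest_command(transcribe("rec.wav", "deepspeech-0.9.3-models.pbmm", "commands.scorer"))`. If the result is `None`, the utterance was not close enough to any command, so you can reject it instead of guessing.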