Training a Small Dataset on DeepSpeech

zaanind · March 1, 2023, 2:45pm

Hello everyone,

As a subtitle creator, I was looking for ways to make the process easier and more efficient. However, I couldn’t find any existing models for the language I was working with, so I decided to try out DeepSpeech.

I had no prior experience training AI models, but I wanted to see how well the model would perform with a small dataset.

I recently trained a DeepSpeech model on a small dataset of five examples. When I tested the model, I found that it produced some errors. For example, when I fed the model the word “eek”, it output “ek” instead. Similarly, when I fed it the word “abee”, it output “e”.
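
To put a number on mismatches like these, one option is the character error rate (CER): the edit distance between the expected text and the produced text, divided by the length of the expected text. The following is a minimal sketch in plain Python. The function names `edit_distance` and `cer` are illustrative and are not part of DeepSpeech.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp measured against the reference ref."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# The two examples from this post: (expected, produced)
for expected, produced in [("eek", "ek"), ("abee", "e")]:
    print(f"{expected!r} -> {produced!r}: CER = {cer(expected, produced):.2f}")
# 'eek' -> 'ek': CER = 0.33
# 'abee' -> 'e': CER = 0.75
```

Tracking a number like this across training runs makes it easier to tell whether a change (more data, a scorer) actually helps.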

I’m wondering if this is due to the small size of my dataset or if there’s something else I need to do to improve the model’s performance. Should I create a scorer to fix the word errors, or is there another approach I should take?

Thank you in advance for your help!

kathyreid · March 1, 2023, 9:54pm

The DeepSpeech Playbook will help you with these questions. In short, you will need more data.