I have a question about the length of recordings suitable for training. What I (think I) understood from some previous discussions:
- The recordings shouldn’t be too long (e.g. an hour) because it may be too demanding in terms of processing, and also the learning algorithm is not designed for such cases.
- However, whether the length is 5 seconds, 16 seconds or 30 seconds does not really matter.
- It should be possible to train DeepSpeech even on one-word utterances, which means it can be trained on recordings as short as, e.g., 0.5 seconds.
To conclude, my impression is that there are no hard limits on either side: the upper bound comes mostly from the fact that DeepSpeech works best on sentence-like recordings, and the lower bound just reflects the fact that the recordings should make some (linguistic) sense.
Am I right? Is there anything else I should consider when cutting my material?
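
To make the question concrete, here is a minimal sketch of how clips might be screened by length before training. It assumes PCM WAV files and uses only Python's standard `wave` module; the 0.5 s and 30 s bounds and the function names are placeholders for illustration, not limits taken from DeepSpeech.

```python
import wave
from pathlib import Path

# Hypothetical bounds for illustration only; not limits defined by DeepSpeech.
MIN_SECONDS = 0.5
MAX_SECONDS = 30.0


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def split_by_duration(paths, min_s=MIN_SECONDS, max_s=MAX_SECONDS):
    """Separate clips into those within [min_s, max_s] and those outside."""
    kept, rejected = [], []
    for path in paths:
        duration = clip_duration_seconds(path)
        entry = (Path(path).name, duration)
        if min_s <= duration <= max_s:
            kept.append(entry)
        else:
            rejected.append(entry)
    return kept, rejected
```

Calling `split_by_duration(Path("clips").glob("*.wav"))` on a (hypothetical) `clips` directory would return the file names and durations of the clips inside and outside the chosen range, so the rejected ones could be re-cut or dropped.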
Thank you very much.