Translation of sentences from other-language corpora

My guess is that it’s because of the memory requirements of larger samples. The way I would investigate this is to increase the parameter (possibly even passing it as a command-line argument to DeepSpeech.py) and see whether it causes memory failures.

An even better approach might be to identify the samples in the .tsv file for that language that are > 20 seconds long, and split each of them into two slices of data.
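A minimal sketch of what that could look like, assuming a Common Voice-style .tsv with a `path` column, a `clips/` directory of .mp3 files, and pydub (with ffmpeg) installed - the file names and the 20-second threshold are placeholders, not anything the importer actually does:

```python
# Minimal sketch: find clips longer than ~20 s listed in a Common Voice-style
# .tsv and split each one into two halves. Paths, column names and the
# threshold are assumptions for illustration only.
import csv
from pathlib import Path

from pydub import AudioSegment  # needs ffmpeg available for .mp3 support

TSV_FILE = Path("validated.tsv")   # hypothetical .tsv for the language
CLIPS_DIR = Path("clips")          # hypothetical directory of .mp3 clips
MAX_SECONDS = 20

with TSV_FILE.open(newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        clip_path = CLIPS_DIR / row["path"]
        audio = AudioSegment.from_mp3(clip_path)
        if audio.duration_seconds > MAX_SECONDS:
            midpoint = len(audio) // 2  # pydub slices are in milliseconds
            for i, half in enumerate((audio[:midpoint], audio[midpoint:]), start=1):
                out_path = clip_path.with_name(f"{clip_path.stem}_part{i}.mp3")
                half.export(out_path, format="mp3")
            print(f"Split {row['path']} ({audio.duration_seconds:.1f} s) into two halves")
```

Note that this only splits the audio; the matching transcript in the .tsv would still need to be split or re-aligned to the two halves, which is the fiddly part.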

I was curious about how many utterances in the CV dataset are > 10 seconds long. Without running a Python script over all the .mp3 files in a dataset (which I could do, but I don’t want to go down that rabbit hole), I took a look at the average utterance duration for all the languages - hence this visualisation:

Most of the languages have an average clip duration of well under 7 seconds.

There might be some outliers, but based on this data I don’t think we have many clips that are > 10 seconds long, or that could usefully be split into (say) 2 x 10-second or even 2 x 7-second chunks.
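For anyone who does want to go down that rabbit hole, a rough sketch of that per-clip duration check might look like the following. The directory layout, the mutagen dependency and the 10-second threshold are all my assumptions; mutagen only reads the MP3 headers, so it avoids decoding every file:

```python
# Rough sketch: count clips longer than 10 s and report the average clip
# duration per language, assuming the usual Common Voice layout of one
# <lang>/clips/ directory of .mp3 files per language.
from pathlib import Path

from mutagen.mp3 import MP3

CV_ROOT = Path("cv-corpus")   # hypothetical root of the extracted CV dataset
THRESHOLD = 10.0              # seconds

for lang_dir in sorted(p for p in CV_ROOT.iterdir() if p.is_dir()):
    durations = [MP3(str(f)).info.length for f in (lang_dir / "clips").glob("*.mp3")]
    if not durations:
        continue
    long_clips = sum(1 for d in durations if d > THRESHOLD)
    avg = sum(durations) / len(durations)
    print(f"{lang_dir.name}: {len(durations)} clips, "
          f"avg {avg:.1f} s, {long_clips} clips > {THRESHOLD:.0f} s")
```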
