Decoding time of a 30-second audio file

Hi,

I’m running DeepSpeech training on 1,000 hours of audio files.
How long will one epoch take on 4 NVIDIA Tesla P100 GPUs (16 GB each)? Also, how long will it take to decode a 30-second audio file on a GPU and on a CPU?

Thanks

It’s basically impossible for us to gauge the time required without knowing the distribution of snippet lengths in your data set. For example, a single really long sample can force a batch size of 1 and make training take a very long time.
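
If you want a quick sanity check on your own corpus, something like the sketch below will show whether you have outliers long enough to force a tiny batch size. The `corpus/` directory layout and the use of plain WAV files are assumptions on my part; adjust for however your data is stored:

```python
# Minimal sketch (assumes your corpus is WAV files under corpus/):
# summarize the clip-length distribution so you can spot outliers.
import glob
import wave

durations = []
for path in glob.glob("corpus/**/*.wav", recursive=True):
    with wave.open(path, "rb") as w:
        durations.append(w.getnframes() / w.getframerate())

durations.sort()
n = len(durations)
print(f"clips: {n}, total: {sum(durations) / 3600:.1f} h")
print(f"min/median/max: {durations[0]:.1f}s "
      f"/ {durations[n // 2]:.1f}s / {durations[-1]:.1f}s")
```

If the max is far above the median, consider filtering or splitting those clips before training.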

However, by way of comparison: when we train on the 1k hours of LibriSpeech using 8 Titan X Pascal GPUs, it takes several days to converge.

As to decoding time on a CPU and/or GPU, it depends on the CPU and/or GPU; the surest way to find out is to try. By way of comparison, we’ve gotten faster than real time on a 1070 for clips of approximately 5 seconds in length.
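
As a rough way to measure this on your own hardware, you can time a single decode and compute the real-time factor. A minimal sketch follows; note that the `Model` constructor and `stt` signatures differ between DeepSpeech releases, and the file names are placeholders, so treat the calls as illustrative and check the API docs for your version:

```python
# Time one decode and report the real-time factor (RTF).
# Assumes a 16 kHz mono 16-bit WAV; constructor args vary by release.
import time
import wave

import numpy as np
from deepspeech import Model

model = Model("output_graph.pb")  # placeholder path; adjust for your setup

with wave.open("clip.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
    clip_len = w.getnframes() / w.getframerate()

start = time.perf_counter()
text = model.stt(audio)
elapsed = time.perf_counter() - start

# RTF < 1.0 means faster than real time.
print(f"decoded {clip_len:.1f}s in {elapsed:.1f}s (RTF {elapsed / clip_len:.2f})")
print(text)
```

Run it a few times and ignore the first result, since model loading and warm-up can dominate the initial call.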

As suggested in the README, the architecture is currently geared towards shorter clips of about 5 seconds, so for a 30-second clip YMMV.

However, a streaming interface is currently in the works[1] and should lift this limitation.