Requirements for DeepSpeech

What are the minimum platform requirements, both hardware and software, to download and run the DeepSpeech code? For example: hardware, OS, Python, TensorFlow versions, etc.

This is documented in the README, in the very first section, though it’s a bit outdated now: the figures are valid for the older versions (the 0.1.1 model), and newer ones should require even less power: https://github.com/mozilla/DeepSpeech/blob/master/README.md

Thanks! But under “Table of Contents” and “Prerequisites” it only says Python and Git Large File Storage. It does not say anything about the other requirements.

Check just above that part; it gives figures for some hardware :slight_smile:

It says “please check runtime dependencies”. That link does not give OS, hardware, TensorFlow version, etc.

Giving CPU requirements is much more complicated, because even with the same CPU model we saw big variance depending on a lot of factors.

Thanks. If I use the CPU model and do not use GPUs, what hardware and OS do I need, for example?

As documented, if you use our prebuilt binaries, you need a CPU with at least the AVX instruction set. Also, as documented, we have binaries for Linux/AMD64, OSX/AMD64, and some ARM (strictly RPi3B) and ARM64 systems (they should run on any Debian Stretch aarch64 distro; tested on a Le Potato board).
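For example (just a quick sanity check, not official documentation), on Linux you can verify that your CPU exposes AVX by looking at /proc/cpuinfo:

# Should print avx (and possibly avx2) if the CPU supports it; empty output means the prebuilt binaries will not run
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u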

Thanks again. After I have the hardware, say Linux hardware, and after some installations and steps, if I do
git clone https://github.com/mozilla/DeepSpeech

to get the code and then run
run-ldc93s1.sh

it should work?

Does “prebuilt binaries” mean pre-trained binaries?

@csawkar1215 It would have been easier if you had stated what you want. If you are looking at training your own model, that’s not the same thing; you require some good GPUs to be able to achieve anything.

No, it means pre-built binaries, to run inference.

It’s all documented: https://github.com/mozilla/DeepSpeech/blob/master/README.md#training
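Roughly, the training flow documented there looks like the sketch below (paths and the requirements file may differ between releases, so treat this as an illustration rather than exact instructions):

git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech
pip install -r requirements.txt   # pulls in TensorFlow and the other Python dependencies
./bin/run-ldc93s1.sh              # trains a toy model on the single-sentence LDC93S1 sample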

Thank you for the details.

I’m sorry, but without more details on what you are trying to do, it’s hard to be more helpful. Full training of the previous 0.1.1 model on the whole set of data we have (several thousand hours of English audio) on something like 16x TITAN X GPUs would take around one week.

What does “inference with prebuilt binaries” mean?

Binaries that convert audio to text.

What do you feed to the binary, and what is the output?

Again, it’s all documented: 16-bit WAV in, text out.
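For example, with the prebuilt Python package the whole round trip is roughly the following (the exact flags and model file name depend on the release you download, so this is only a sketch):

pip install deepspeech                                   # prebuilt CPU inference package
deepspeech --model output_graph.pbmm --audio audio.wav   # feed a 16-bit WAV, the transcript is printed to stdout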