What are the minimum platform requirements, both hardware and software, to download and run the DeepSpeech code? For example: hardware, OS, Python, and TensorFlow versions.
This is documented in the README, in the very first section, though it’s outdated now and valid for the older versions (0.1.1 model); newer ones should require even less power: https://github.com/mozilla/DeepSpeech/blob/master/README.md
Thanks! But under Table of Contents and Prerequisites it only lists Python and Git Large File Storage. It does not say anything about the other requirements.
Check above that part; it gives some figures about hardware.
It says “please check runtime dependencies”. That link does not give the OS, hardware, TensorFlow version, etc.
Giving CPU requirements is much more complicated, because even with the same CPU model we saw big variance depending on a lot of factors.
Thanks. If I use the CPU model and don’t use GPUs, what hardware and OS do I need, for example?
As documented, if you use our prebuilt binaries, you need a CPU with at least the AVX instruction set. Also, as documented, we have binaries for Linux/AMD64, OSX/AMD64, some ARM (strictly RPi3B), and ARM64 systems (should run on any Debian Stretch aarch64 distro; tested on a Le Potato board).
Thanks again. Once I have the hardware, say Linux hardware, after some installations and steps, if I do
git clone https://github.com/mozilla/DeepSpeech
to get the code, and then run
run-ldc93s1.sh
it should work?
Does “prebuilt binaries” mean pre-trained binaries?
@csawkar1215 It would have been easier if you had stated what you want. If you are looking at training your own model, that’s not the same thing; you require some good GPUs to be able to achieve anything.
No, it means pre-built binaries, to run inference.
It’s all documented: https://github.com/mozilla/DeepSpeech/blob/master/README.md#training
thank you for the details.
I’m sorry, but without more details on what you are trying to do, it’s hard to be more helpful. Full training of the previous 0.1.1 model on the whole set of data we have (several thousand English audio samples) on something like 16x TITAN X GPUs would take around 1 week.
What does “inference” with the prebuilt binaries mean?
Binaries that convert audio to text.
What do you feed the binary, and what is the output?
Again, it’s all documented: 16-bit WAV in, text out.
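Before feeding a file to the binary, you can verify it really is 16-bit WAV with Python’s standard `wave` module. A sketch (the function name `check_wav` is mine; sample rate and channel expectations depend on the model you use, so they are only reported, not enforced):

```python
import wave

def check_wav(path):
    """Check that a WAV file is 16-bit PCM, as the inference binary expects.

    Returns (ok, details): ok is True when samples are 16 bits wide,
    details is a human-readable summary of the file's format.
    """
    with wave.open(path, "rb") as w:
        sample_width = w.getsampwidth()  # bytes per sample; 2 == 16 bits
        details = "%d-bit, %d Hz, %d channel(s)" % (
            sample_width * 8, w.getframerate(), w.getnchannels())
        return sample_width == 2, details
```

For example, `check_wav("my_audio.wav")` on a 16-bit, 16 kHz mono file would return `(True, "16-bit, 16000 Hz, 1 channel(s)")`.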