Alternatives to DeepSpeech?

I’ve found using DeepSpeech (for making and using ASR models) to be a painful experience. It’s fine when it works, but I’ve found it stressful to keep it working. I left my project in January, having got it working with Python 3.8 and TensorFlow 1.14, but now when I revisit it, TensorFlow 1.x has been deprecated and is no longer maintained, and DeepSpeech is only now starting to move to TensorFlow 2.0 (after insisting they’d never bother and dragging their heels).

In the meantime, on Ubuntu 20.04, I can’t even get tensorflow-gpu 1.15 working with Python 3.6 out of the box anymore. I ended up reinstalling Linux because trying broke my system, and I eventually fixed it by symlinking some libraries, a solution I couldn’t find through Google despite looking very hard, probably because 99% of TensorFlow users have moved on to 2.x (or PyTorch), except, for some reason, the DS team, who thought they knew better. It feels like something will always be broken given the strange choices the TensorFlow and DeepSpeech teams make; they seem uninterested in keeping pace with each other or with the rest of the Python and Linux ecosystem, which has moved on from Python 3.6.
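In case it helps anyone else, the symlink workaround amounted to pointing the CUDA 10.0 sonames that tensorflow-gpu 1.15 looks for at whatever CUDA libraries Ubuntu 20.04 actually ships. A rough sketch, not a tested recipe: the paths and version suffixes below are assumptions, so substitute whichever library TF reports as missing.

```bash
# tensorflow-gpu 1.15 dlopen()s the CUDA 10.0 sonames specifically, so if
# your system has CUDA 10.1/10.2, alias the old names to the newer libs.
# (Paths are assumptions; adjust to wherever your CUDA libraries live.)
cd /usr/local/cuda/lib64
sudo ln -s libcudart.so.10.1 libcudart.so.10.0
sudo ln -s libcublas.so.10 libcublas.so.10.0
# Repeat for each "Could not load dynamic library 'lib....so.10.0'" error,
# then make sure the directory is on the loader path:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
```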

We’ve known about 3.8 for a long time, but TF only recently added support for it, and only for 2.x; they have stated they will never add support beyond 3.7 for the TF 1.x line. This leaves DS in a bad place: they hitched their wagon tightly to 1.x, and now the TF team has left them at a dead end by abandoning it, having never said they would support 1.x long term in the first place.

I see there are some projects using PyTorch, but those seem pretty heavily tied to Kaldi, which is a very user-unfriendly project in its own right. The most promising one seems to be SpeechBrain, but they have not released anything yet. Have I missed any?


I know that it can be hard to get DeepSpeech to run, but I doubt you’ll find any other repo that offers code + models and support for free to the extent that DeepSpeech does. It is just hard to get all the little libs it needs to work together. But if you do find one, let us know and we might switch 🙂


This is unfortunate, but we can’t ask TensorFlow to just sync with us, and the reverse doesn’t make sense either: by definition, we’ll never be in sync, and that’s fine.

Moving training to the newer 2.0 APIs is a lot of work, and we can’t focus our energy on solving all problems at once. It’s unfortunate that r1.15 does not support the latest Python 3.8, but at the same time most of the users maintaining complex training setups are still on previous LTS versions of their distros.

I can run current DeepSpeech (non-GPU, my laptop does not have any) under Python 3.6 on Ubuntu 20.04, and there are several ways to get a suitable Python there, even if they’re not as user-friendly as we’d like (see the sketch after this list):

  • Docker
  • Build your own Python using pyenv
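For instance, the pyenv route might look like the following. A minimal sketch, assuming the pyenv-virtualenv plugin is installed (the pyenv.run installer includes it); version numbers are illustrative.

```bash
# Build a project-local Python 3.6 and install DeepSpeech into it.
curl https://pyenv.run | bash           # installs pyenv + common plugins
pyenv install 3.6.12                    # compile a 3.6 interpreter
pyenv virtualenv 3.6.12 deepspeech-env  # needs the pyenv-virtualenv plugin
pyenv activate deepspeech-env
pip install deepspeech                  # or deepspeech-gpu on a CUDA machine
```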

No, it’s just that it requires a complete rewrite of some of the training parts, and that is complicated work.

Nothing is broken on TensorFlow r1.15, and if people desperately need support for the r2.0 APIs, we’d welcome patches.

@tensorfoo maybe this:


Docker is your friend. When using it, DeepSpeech is really easy to work with. I have tested a few different ASR systems with my own datasets, including DeepSpeech, Wav2Letter and NeMo (formerly known as OpenSeq2Seq).
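For reference, my setup is roughly the following. A sketch from memory: the training Dockerfile’s exact name and generation step vary between DeepSpeech releases (check the docs for the tag you’re on), and the /data paths are my own layout, not anything DeepSpeech prescribes.

```bash
# Build a training image from the DeepSpeech repo and drop into a shell
# with GPU access (docker's --gpus flag needs the NVIDIA Container Toolkit).
git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech
docker build -f Dockerfile.train -t deepspeech-train .
docker run --gpus all -it \
  -v "$PWD":/DeepSpeech \
  -v /path/to/my/datasets:/data \
  deepspeech-train /bin/bash
```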

W2L has the lowest WER after training on my data, but you really have to do your research before using it. It has a number of different architectures, and when one of them works poorly, another may work better. Also, in my opinion, it’s difficult to use in a real-world deployment: they have Python bindings, but it’s just executing the decoder and reading from stdout. To be fair, they now have an online decoder, but that network is difficult to train; they used a couple of thousand hours of audio and a few hundred GPUs, so it’s not really realistic for a hobbyist like me.

When it comes to actually using the model you’ve spent hours and hours training, DeepSpeech is unmatched. Just look at all the provided examples. They even have mobile support!
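To make that concrete, inference with a trained (or released) model is basically one pip install and one command. The model and scorer filenames below are placeholders for whichever release you download; the flags are the standard DeepSpeech CLI ones.

```bash
# Transcribe a 16 kHz mono WAV file with the DeepSpeech CLI.
pip install deepspeech                  # or deepspeech-gpu
deepspeech --model deepspeech-models.pbmm \
           --scorer deepspeech-models.scorer \
           --audio my_audio.wav
```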

NVIDIA is supposed to release Jarvis, which should make NeMo easier to use, but at the time of writing it hasn’t shown up.


I know some people aren’t fans, but would you not have an easier time using Miniconda? With conda environments, you can switch Python versions easily on a per-project basis, install different versions of the CUDA toolkit, and usually get TensorFlow working fairly easily. I’ve used it with DeepSpeech, and you can either stick fairly closely to the regular installation process (i.e. sort out Python and the CUDA toolkit with conda first, then use pip, but within the conda environment) or use conda for certain libraries (e.g. if you want to try MKL numpy) and pip for the rest. The only caveat is that I’m on Arch, but I don’t think it’ll be significantly different on Ubuntu, given the isolation of conda environments.
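A minimal sketch of what I mean; the version pins are assumptions, so match the CUDA toolkit to whichever TensorFlow you actually need:

```bash
# conda supplies Python plus the CUDA runtime; pip supplies the wheels
# inside that same environment.
conda create -n ds python=3.6
conda activate ds
conda install cudatoolkit=10.0 cudnn    # versions are assumptions; match your TF
pip install 'tensorflow-gpu==1.15'
pip install deepspeech-gpu
```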
