Error after pip3 install --upgrade -e

SeaLiteral · January 18, 2021, 3:42pm

It seems, if I want to get DeepSpeech to transcribe anything that isn’t in English, I first have to train a model for that language. And right now I don’t have enough audio to do that from scratch for Danish, but fine-tuning the English model to better understand my accent seemed like a thing I could try in the meantime, so I read the documentation at https://deepspeech.readthedocs.io/en/v0.7.4/TRAINING.html and did all the stuff before “pip3 install --upgrade -e .” But that’s as far as I can get. When I run that, it first seems to install some stuff, then writes the following in red:
ERROR: Could not find a version that satisfies the requirement tensorflow==1.15.4 (from deepspeech-training==0.10.0a3) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.2.1, 2.2.2, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0, 2.3.1, 2.3.2, 2.4.0rc0, 2.4.0rc1, 2.4.0rc2, 2.4.0rc3, 2.4.0rc4, 2.4.0)

ERROR: No matching distribution found for tensorflow==1.15.4 (from deepspeech-training==0.10.0a3)

After re-reading the instructions several times, redoing some steps, and searching here, I found a few people who mentioned similar things. One of them seemed to have had the same problem and had solved it by following some instructions that someone linked to in the documentation. Unfortunately, that link leads to a 404 page now, and the only thing in the instructions that looks like it could be a problem is the CUDA thing (I don’t think my laptop has an NVIDIA GPU: it has a sticker that says something about AMD and graphics, so I’m guessing it doesn’t also have an NVIDIA GPU, and it’s not a gaming laptop).

Oh, and in case it matters, I’m using Ubuntu, seems to be 20.04.1 LTS, and sorry, but I don’t remember which day of the week it was when I installed it.

Am I missing something obvious?

reuben · January 18, 2021, 3:48pm

That version list starts with TensorFlow 2.2.0rc1, which is the first version to add support for Python 3.8, which makes me think you’re running Python 3.8. For now, our training code requires Python 3.6 or 3.7.
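
A quick way to confirm this is to ask the interpreter itself. The snippet below is only a minimal sketch: the helper name `is_supported` is made up for illustration, and the 3.6/3.7 range is simply the one stated above, not something read from the DeepSpeech package.

```python
import sys

# Assumption: the supported minor versions (3.6 and 3.7) come from the
# statement above, not from any DeepSpeech API.
SUPPORTED_MINORS = {6, 7}


def is_supported(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3.6 or 3.7."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if is_supported():
        print("Python " + current + " should work for the training code.")
    else:
        print("Python " + current + " is outside the 3.6/3.7 range.")
```

Run it with the same `python3` that `pip3` belongs to, since the version that matters is the one pip installs into.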

othiele · January 18, 2021, 4:07pm

I usually don’t advertise other engines, but have you tried DanSpeech? They even have a published model; it was once a DTU project. It is a bit dated and was trained with some Norwegian as well as data from Folketinget, but it may be OK for your use case. They might also move to DeepSpeech at some point; I think I read that somewhere.

Apart from that, set up a 3.7 environment or search here in the forum for 3.8 and you’ll find some resources. And please don’t use 0.7.4 docs, but the current ones.

SeaLiteral · January 18, 2021, 5:40pm

Can one message here be a reply to two messages?
@reuben Thanks! So that was what I was missing, the Python version.

@othiele Thanks! I guess that engine might be useful for Danish. But I’ll still have to figure out how to get better accuracy in English with my accent. Do I also need an older version of Python to be able to fine-tune the scorer file? If not, then I might want to try that.

I guess I might also wait and see if DeepSpeech gets to work “properly” with Python 3.8. I feel like to get reasonable accuracy in Danish, I might have to wait until there’s more audio to train with anyway.

And about the documentation, I wonder how I ended up reading an old version. Maybe I followed a link from one of the posts here on Discourse or something like that.

othiele · January 18, 2021, 5:43pm

Search this forum for accents. The English model works really well for American English, but not so well for other speakers. Think of a Copenhagener trying a model from Jutland.

SeaLiteral · January 19, 2021, 6:49pm

It seems I will have to fine-tune the English model to get it to recognize my accent. But I managed to use Python 3.7, so now I’m hopefully a bit closer to doing that. Hopefully I won’t need a GPU. I also managed to make a scorer package, which increased the accuracy a bit, but it only got better at recognizing words that were already somewhat common in English.

Also, how do I deal with speech in one language containing names from another country? Do I need to train it with audio of those, or can I just put them in the scorer’s language model training text? (Maybe I could misspell them on purpose to show how they would be spelled if they were in the language of the speech around them, and then post-process the output to replace the wrong spelling with the correct one.) I hope that’s not a dumb question; I probably know more about linguistics than about machine learning.

othiele · January 19, 2021, 7:30pm

Nope, words in a different language are a problem. You need enough of them in your training material (thousands) and then you can recognize them. You could try to trick the scorer and replace it afterwards, but that sounds a bit hacky …
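
For what it’s worth, the “replace it afterwards” step could look roughly like the sketch below. Everything in it is an assumption for illustration: the mapping `NAME_FIXES`, the function `restore_names`, and the example spellings are made up, and it only handles the text clean-up, not how the scorer gets built.

```python
import re

# Hypothetical mapping from the English-looking spelling that was put in the
# scorer text to the real spelling of the name. These entries are invented.
NAME_FIXES = {
    "sern": "Søren",
    "yergen": "Jørgen",
}


def restore_names(transcript, fixes=NAME_FIXES):
    """Replace whole-word occurrences of the trick spellings in a transcript."""
    if not fixes:
        return transcript
    # Try longer keys first so a short key cannot shadow a longer one.
    keys = sorted(fixes, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: fixes[match.group(1)], transcript)


if __name__ == "__main__":
    print(restore_names("i talked to sern yesterday"))
```

The word-boundary match keeps the replacement from touching ordinary words that merely contain one of the trick spellings, but a trick spelling that is itself a real English word would still get replaced, which is part of why this is hacky.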

If you let us know what you want to do, we can say more.