Fine-tuning DeepSpeech 0.9.1 with the same alphabet

You say it did not work, but you don't say explicitly whether you still have the Xorg and GNOME Shell processes. If they are still there, it's inconclusive.
If they are killed and you can still reproduce the problem, there's something wrong on your system and I have no idea what. Please try using the Docker image, maybe?
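For reference, a rough way to double-check this (just a sketch; the service name, gdm vs gdm3, and the process names can differ per distribution):

# stop the display manager so it releases the GPU
sudo systemctl stop gdm3.service

# see whether Xorg / gnome-shell processes are still around
pgrep -a Xorg
pgrep -a gnome-shell

# check what is still holding GPU memory
nvidia-smi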

When I run sudo systemctl stop gdm3.service and then systemctl status gdm3 I get:

● gdm.service - GNOME Display Manager
   Loaded: loaded (/lib/systemd/system/gdm.service; static; vendor preset: enabled)
   Active: inactive (dead) since Mon 2020-11-30 08:52:19 EST; 45s ago
  Process: 915 ExecStart=/usr/sbin/gdm3 (code=exited, status=0/SUCCESS)
  Process: 907 ExecStartPre=/usr/share/gdm/generate-config (code=exited, status=0/SUCCESS)
 Main PID: 915 (code=exited, status=0/SUCCESS)

Nov 30 08:39:58 ghada-Inspiron-3593 systemd[1]: Starting GNOME Display Manager...
Nov 30 08:39:58 ghada-Inspiron-3593 systemd[1]: Started GNOME Display Manager.
Nov 30 08:39:58 ghada-Inspiron-3593 gdm-launch-environment][955]: pam_unix(gdm-launch-environment:session): session opened for user gdm by (uid=0)
Nov 30 08:40:12 ghada-Inspiron-3593 gdm-password][1347]: pam_unix(gdm-password:session): session opened for user ghada by (uid=0)
Nov 30 08:52:19 ghada-Inspiron-3593 systemd[1]: Stopping GNOME Display Manager...
Nov 30 08:52:19 ghada-Inspiron-3593 gdm3[915]: GLib: g_hash_table_find: assertion 'version == hash_table->version' failed
Nov 30 08:52:19 ghada-Inspiron-3593 systemd[1]: Stopped GNOME Display Manager.

And I still have the Xorg and gnome-shell processes.
I also noticed that even with the flag --load_cudnn instead of --train_cudnn it gives the same error, but it's fixed once I set export CUDA_VISIBLE_DEVICES=-1

That makes you not use CUDA devices, so it's not what you want.

Then it could still be the cause.

Maybe it's gdm.service on your system. I'm not here to debug your setup, sorry. I explained what it could be, but I can't fix it for you.

Yes, I got it! Thanks a lot!

Problem solved after setting TF_FORCE_GPU_ALLOW_GROWTH = True
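For anyone else hitting this, setting it as an environment variable before launching training looks roughly like this (a sketch; the checkpoint path is a placeholder and the remaining flags are omitted):

# let TensorFlow allocate GPU memory on demand instead of grabbing it all up front
export TF_FORCE_GPU_ALLOW_GROWTH=true

# then start fine-tuning as usual (other flags omitted here)
python3 DeepSpeech.py --train_cudnn --checkpoint_dir /path/to/deepspeech-0.9.1-checkpoint ...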

Hi lissyx.

This also gave me a huge headache.
The official documentation at https://deepspeech.readthedocs.io/en/v0.9.1/TRAINING.html states that we need CUDA 10.1 for DeepSpeech.
I wasted 6 hours today trying to solve this, lol.
The official documentation should be updated.

This solved my problem, though:

I am running Ubuntu 18.04 (since that's the latest Ubuntu release with CUDA support).
Follow the GPU setup guide at https://www.tensorflow.org/install/gpu, but change every 10.1 to 10.0 and it will work!
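In package terms that means installing the CUDA 10.0 variants of what the TensorFlow guide lists, roughly like this (a sketch; exact package names and cuDNN pins depend on how the NVIDIA apt repositories are set up on your machine):

# CUDA 10.0 toolkit instead of 10.1 (assumes the NVIDIA apt repository is already configured)
sudo apt-get install cuda-10-0

# cuDNN 7.6 built against CUDA 10.0 (you may need to pin the +cuda10.0 build explicitly)
sudo apt-get install libcudnn7 libcudnn7-dev

# TensorFlow 1.15 for training, which is built against CUDA 10.0
pip3 install 'tensorflow-gpu==1.15.4'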

Also, maybe it's just me, but it took me an hour to solve another issue where DeepSpeech.py could not resolve a path with spaces.
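For anyone hitting the same thing: if it is just the shell splitting the argument, quoting the path is usually enough (a sketch, with made-up file locations):

# quote paths containing spaces so they reach DeepSpeech.py as single arguments
python3 DeepSpeech.py \
  --train_files "/home/me/my data/train.csv" \
  --dev_files "/home/me/my data/dev.csv" \
  --test_files "/home/me/my data/test.csv"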

It was fixed after the discrepancy was properly reported.

@soerengustenhoff, @Ghada_Mjanah thanks for pointing that problem out. I was still using my old setup and also didn’t realize we had a problem there.

@soerengustenhoff, as for the white spaces. If you can, make a PR and file it. If you don’t have the time, open a new thread here on Discourse, give some examples and we’ll fix that either in code or the docs for the next release.

Hi lissyx.

Where has it been fixed?
I had my problems yesterday, and the issue still persisted then.
Is it fixed in the new version?

And thank you for your very swift reply!

Olaf, I will try to write down any further issues that I run into.
Right now my model failed after running Common Voice English with a batch size of 50. Another 4 hours wasted due to running out of memory, I suppose?
I am not sure how it works though; I will have to look through the code :slightly_smiling_face:

Is a PR through GitHub?
I assume it is a public request, and I am happy to help.
I did, however, just think it was a simple python3 argument thing I ran into.

It is already fixed on GitHub; it might not be updated on readthedocs though, @lissyx?

Try multiples of 2 like 16, 32, etc. But it is a common problem to find the right batch size. If it persists, start a new post here on Discourse.
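A sketch of what that looks like on the DeepSpeech.py command line (the batch sizes and CSV paths here are just placeholders to tune against your GPU memory):

# start small and increase until you hit out-of-memory errors again
python3 DeepSpeech.py \
  --train_batch_size 16 \
  --dev_batch_size 16 \
  --test_batch_size 16 \
  --train_files clips/train.csv --dev_files clips/dev.csv --test_files clips/test.csv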

Yes, it would be great if you could generate a Pull Request on Github. If not, open a post for the problem here and we’ll see to it that we find a solution. If it is just a usage related problem, also open a post, describe what happened and how you fixed it. Most people search for the problem and will find your post.

I fixed the doc on master and r0.9 a few days ago.

But readthedocs doesn’t show the changes yet.

Because latest is v0.9.1 and we have had no time to push a 0.9.2 yet. Check r0.9 and master; they reflect it.

Hm, pretty sure I checked master as well, but yep, master shows the changes now. Thanks for pointing that out.

Please elaborate / file an issue / send a PR if you can.

Yes, I can see that it has now been changed under "Training your own model".
It is not changed in "CUDA dependency (inference)", but I don't know if that is by design.
Can you still run inference with the wrong CUDA version?

Yes, I will write a pull request and post an update on Discourse so that any other Windows newbies do not make the same mistake (I am running Ubuntu 18.04 for this project, just to clarify).

Thank you for your time on the matter, lissyx :slight_smile:

Edit: I won't have time to do a PR and create a Discourse post before I get home from work.

No, that was never the case.

What do you mean? I do see CUDA 10.1 and cuDNN v7.6 mentioned there.

Can you run inference with CUDA 10.1 when you need tensorflow-gpu 1.15, which can only run with CUDA 10.0?

Is this a question? The TensorFlow versions for training and inference are different, and their CUDA dependencies are as well.

Hope this helps and serves as future reference:

GPU - Inference

This already makes use of TF 2.3 and therefore needs CUDA 10.1. Think of inference as having already moved to TF 2.

GPU - Training

As much more code needs to be changed to move to TF 2, training is still on TF 1.15 and therefore needs CUDA 10.0.
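In package terms, that is roughly (a sketch; the version pins are simply the ones current at the time of this thread):

# inference: deepspeech-gpu 0.9.x is built on TF 2.3, so it wants CUDA 10.1
pip3 install deepspeech-gpu==0.9.1

# training: the training code still runs on TF 1.15, which wants CUDA 10.0
pip3 install 'tensorflow-gpu==1.15.4'

Both CUDA toolkits can be installed side by side, and keeping training and inference in separate virtual environments avoids mixing them up.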

Yes, this is not great, but hey, inference already makes use of better TF 2 performance :slight_smile: