Can anybody help with a 'Failed to create session.' (0x3006) error?

Hi y’all, wondering if anyone can shed some light on this.

Yes, we are running a custom model and (eek) custom native_client binaries (ever so slightly tweaked from ds_0.7.1)

I can run this trained output_graph.pbmm model in inference mode using DeepSpeech.py (pretty sure it’s the exact same file, though I might double-check this).

To narrow things down I made sure to use

  • dockerfile based on tensorflow/tensorflow:1.15.2-gpu-py3
  • python3.7
  • the standard client.py code, copied and pasted unchanged

But as you can see, I’m getting
CreateModel failed with ‘Failed to create session.’ (0x3006)

I’m sort of wondering whether the problem is the Tesla K80 not having the required CUDA compute capability? (This machine is different from the one where we trained the model, on which I confirmed DeepSpeech.py inference was working.)

Or, quite possibly, there’s just something wrong with my custom binary (I’ll roll back to the standard one to check, I guess?)

Or is the problem that we’re using a custom alphabet? There no longer seems to be anywhere in the API to specify one.

Or… probably I’m just missing something obvious. Any help much appreciated!

Ngā mihi / Thanks in advance

Here’s the log:

nvidia-docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/work -w /work/ docker.dragonfly.co.nz/reo-tuhituhi-gpu:8a020f7 bash

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

tf-docker /work > cd client
tf-docker /work/client > nvidia-smi
Tue Jun 16 12:28:08 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   66C    P0    61W / 149W |      3MiB / 11441MiB |     68%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
tf-docker /work/client > python3 --version
Python 3.7.7
tf-docker /work/client > python3 client.py --model ../model/20200610_ds.0.7.1_thm/output_graph.pbmm --audio ../146238.wav
Loading model from file ../model/20200610_ds.0.7.1_thm/output_graph.pbmm
TensorFlow: v1.14.0-0-g87989f6959
DeepSpeech: v0.7.1-4-g7ff16422
2020-06-16 12:28:16.544591: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Invalid argument: No OpKernel was registered to support Op 'Minimum' used by {{node Minimum}} with these attrs: [T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

         [[Minimum]]
Traceback (most recent call last):
  File "client.py", line 162, in <module>
    main()
  File "client.py", line 117, in main
    ds = Model(args.model)
  File "/usr/local/lib/python3.7/dist-packages/deepspeech/__init__.py", line 38, in __init__
    raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status))
RuntimeError: CreateModel failed with 'Failed to create session.' (0x3006)
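For what it’s worth, the message text and hex code come straight from the bindings’ format string shown in the traceback above. A stand-alone sketch of that formatting (the error-code table here is a one-entry stand-in I made up, not the real table from the bindings):

```python
# Stand-alone sketch of the RuntimeError formatting seen in the traceback.
# The error-code table is a one-entry stand-in, not the bindings' real one.
ERROR_MESSAGES = {0x3006: "Failed to create session."}

def create_model_error(status):
    message = ERROR_MESSAGES.get(status, "Unknown error")
    # Same format string as deepspeech/__init__.py line 38 in the traceback.
    return "CreateModel failed with '{}' (0x{:X})".format(message, status)

print(create_model_error(0x3006))
# CreateModel failed with 'Failed to create session.' (0x3006)
```

So the 0x3006 is just the raw status returned by the native CreateModel call; the Python layer only translates and re-raises it.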

Does this help? https://github.com/mozilla/DeepSpeech/wiki/Finding-TensorFlow-op-kernel-targets-when-you-add-new-operations-to-the-graph

Appreciate the suggestion, but no, I haven’t changed the underlying NN at all on this occasion, just trained it on different data. The binary is a little different, as I mentioned, but none of the actual TensorFlow parts changed (I only added confidences to the metadata).
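For context, the tweak amounts to this kind of post-processing (sketched in Python purely for illustration; the real change is in the native_client C++, and the names here are made up):

```python
# Illustrative sketch only: the actual change is in the native_client C++.
# The idea is just summing per-token log-probs the decoder already emits
# and exposing the total as an overall confidence in the metadata.
def overall_confidence(token_logprobs):
    # Sum of log-probs for the best transcript; no new graph ops involved.
    return sum(token_logprobs)
```

Nothing here creates or modifies TensorFlow ops, which is why I wouldn’t expect it to affect kernel registration.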

However, it does make me wonder whether there’s a version mismatch between the model and the binary I’m using (though both appear to be v0.7.1, basically)… is that the kind of direction I should be looking in?

Actually, this is weird, our standard graph has Minimum in it too, and so we include it in our dependencies. Are you sure you didn’t mess anything up when editing the native client? Check the “//tensorflow/core/kernels:deepspeech_cwise_ops” target in native_client/BUILD.
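If you want a quick sanity check that the exported model really references Minimum, you can grep the serialized graph for the op-name string without pulling in TensorFlow at all, since op names are stored as plain ASCII in the protobuf wire format. Rough sketch (the path is an example; note the .pbmm memory-mapped layout rearranges bytes, so treat a negative result on a .pbmm with suspicion):

```python
# Crude sanity check: op names are serialized as plain ASCII strings in a
# GraphDef, so a byte search is enough to confirm an op is referenced.
def graph_mentions_op(model_path, op_name):
    with open(model_path, "rb") as f:
        data = f.read()
    return op_name.encode("ascii") in data

# e.g. graph_mentions_op("output_graph.pb", "Minimum")
```

If that comes back True but the binary still says “no registered kernels”, the problem is on the binary side, not the export.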

And FWIW the metadata fields are TensorFlow stuff, and they can definitely change the required ops and kernels.

What is that “TensorFlow: v1.14.0” in your log? 0.7 should be using TensorFlow r1.15.

Pretty sure just reading the log-probs that are already there and passing them out wouldn’t have changed any of the underlying ops.

That said, if some of the ops are not registered, it seems likely that I did, as you suggested, “mess something up when editing the native client [build]”.

Ah, thanks @lissyx … not sure how that happened. Will make sure this is right also.