Can anybody help with a 'Failed to create session.' (0x3006) error?

Hi y’all, wondering if anyone can shed some light on this.

Yes, we are running a custom model and (eek) custom native_client binaries (ever so slightly tweaked from ds_0.7.1)

I can run this trained output_graph.pbmm model in inference mode using DeepSpeech.py (pretty sure it’s the exact same file, though I might double-check this).

To narrow things down I made sure to use

  • dockerfile based on tensorflow/tensorflow:1.15.2-gpu-py3
  • python3.7
  • the standard client.py code, copied and pasted unchanged

But as you can see, I’m getting
CreateModel failed with ‘Failed to create session.’ (0x3006)

I’m sort of wondering whether the problem is the Tesla K80 not having the required CUDA compute capability? (This machine is different from the one where we trained the model, on which I confirmed DeepSpeech.py inference was working.)

Or, quite possibly, there’s just something wrong with my custom binary (I’ll roll back to the standard one to check, I guess?)

Or is the problem that we’re using a custom alphabet? There no longer seems to be anywhere in the API to specify one.

Or… probably I’m just missing something obvious. Any help much appreciated!

Ngā mihi / Thanks in advance

Here’s the log:

nvidia-docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/work -w /work/ docker.dragonfly.co.nz/reo-tuhituhi-gpu:8a020f7 bash

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

tf-docker /work > cd client
tf-docker /work/client > nvidia-smi
Tue Jun 16 12:28:08 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   66C    P0    61W / 149W |      3MiB / 11441MiB |     68%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
tf-docker /work/client > python3 --version
Python 3.7.7
tf-docker /work/client > python3 client.py --model ../model/20200610_ds.0.7.1_thm/output_graph.pbmm --audio ../146238.wav
Loading model from file ../model/20200610_ds.0.7.1_thm/output_graph.pbmm
TensorFlow: v1.14.0-0-g87989f6959
DeepSpeech: v0.7.1-4-g7ff16422
2020-06-16 12:28:16.544591: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Invalid argument: No OpKernel was registered to support Op 'Minimum' used by {{node Minimum}} with these attrs: [T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

         [[Minimum]]
Traceback (most recent call last):
  File "client.py", line 162, in <module>
    main()
  File "client.py", line 117, in main
    ds = Model(args.model)
  File "/usr/local/lib/python3.7/dist-packages/deepspeech/__init__.py", line 38, in __init__
    raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status))
RuntimeError: CreateModel failed with 'Failed to create session.' (0x3006)
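For what it’s worth, the message text and hex code come straight from the bindings’ format string shown in the traceback above. A stand-alone sketch of that formatting (the error-code table here is a one-entry stand-in I made up, not the real table from the bindings):

```python
# Stand-alone sketch of the RuntimeError formatting seen in the traceback.
# The error-code table is a one-entry stand-in, not the bindings' real one.
ERROR_MESSAGES = {0x3006: "Failed to create session."}

def create_model_error(status):
    message = ERROR_MESSAGES.get(status, "Unknown error")
    # Same format string as deepspeech/__init__.py line 38 in the traceback.
    return "CreateModel failed with '{}' (0x{:X})".format(message, status)

print(create_model_error(0x3006))
# CreateModel failed with 'Failed to create session.' (0x3006)
```

So the 0x3006 is just the raw status returned by the native CreateModel call; the Python layer only translates and re-raises it.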

Does this help? https://github.com/mozilla/DeepSpeech/wiki/Finding-TensorFlow-op-kernel-targets-when-you-add-new-operations-to-the-graph

Appreciate the suggestion, but no, I haven’t changed the underlying NN at all on this occasion, just trained it on different data. The binary is a little different, as I mentioned, but none of the actual TensorFlow parts changed (I only added confidences to the metadata).
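For context, the tweak amounts to this kind of post-processing (sketched in Python purely for illustration; the real change is in the native_client C++, and the names here are made up):

```python
# Illustrative sketch only: the actual change is in the native_client C++.
# The idea is just summing per-token log-probs the decoder already emits
# and exposing the total as an overall confidence in the metadata.
def overall_confidence(token_logprobs):
    # Sum of log-probs for the best transcript; no new graph ops involved.
    return sum(token_logprobs)
```

Nothing here creates or modifies TensorFlow ops, which is why I wouldn’t expect it to affect kernel registration.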

However, it does make me wonder whether there’s a version mismatch between the model and the binary I’m using (though both appear to be v0.7.1, basically)… is that the kind of direction I should be looking in?

Actually, this is weird, our standard graph has Minimum in it too, and so we include it in our dependencies. Are you sure you didn’t mess anything up when editing the native client? Check the “//tensorflow/core/kernels:deepspeech_cwise_ops” target in native_client/BUILD.
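If you want a quick sanity check that the exported model really references Minimum, you can grep the serialized graph for the op-name string without pulling in TensorFlow at all, since op names are stored as plain ASCII in the protobuf wire format. Rough sketch (the path is an example; note the .pbmm memory-mapped layout rearranges bytes, so treat a negative result on a .pbmm with suspicion):

```python
# Crude sanity check: op names are serialized as plain ASCII strings in a
# GraphDef, so a byte search is enough to confirm an op is referenced.
def graph_mentions_op(model_path, op_name):
    with open(model_path, "rb") as f:
        data = f.read()
    return op_name.encode("ascii") in data

# e.g. graph_mentions_op("output_graph.pb", "Minimum")
```

If that comes back True but the binary still says “no registered kernels”, the problem is on the binary side, not the export.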

And FWIW the metadata fields are TensorFlow stuff, and they can definitely change the required ops and kernels.

What is that “TensorFlow: v1.14.0” in your log? 0.7 should be using TensorFlow r1.15.

Pretty sure just reading the log-probs that are already there and passing them out wouldn’t have changed any of the underlying ops.

That said, if some of the ops are not registered, it seems likely that I did, as you suggested, “mess something up when editing the native client [build]”.

Ah, thanks @lissyx … not sure how that happened. Will make sure this is right also.