[FIXED] Error with master/alpha8 (unknown op: UnwrapDatasetVariant & WrapDatasetVariant)

UPDATE:

Tried bin/run-ldc93s1.sh, which worked fine, so I exported a model from its checkpoint. That model hits the same error with the downloaded deepspeech command.

I guess the issue is in either the model export code or libdeepspeech.so.

And I do use both successfully here, so it’s not that either.

Can you clarify the status here?

Also @eggonlea, can you run ls -hal on your checkpoint directory and exported models?

I see that both your normal model and your TFLite model are using --nouse_seq_length, which is a configuration we only test for TFLite models. Can you try not using that option? Also, it’s best if you don’t reuse the same --export_dir for normal and TFLite exports.
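For reference, separating the two export configurations might look like the sketch below. The checkpoint and export paths are placeholders, and the --checkpoint_dir / --export_tflite flag names are my assumption from that era’s DeepSpeech.py; --nouse_seq_length and --export_dir are the flags discussed above.

```shell
# Normal (.pb) export: do NOT pass --nouse_seq_length.
python DeepSpeech.py \
    --checkpoint_dir ./checkpoints \
    --export_dir ./export/pb

# TFLite export: --nouse_seq_length is required, and it gets its own
# --export_dir so the two outputs never overwrite each other.
python DeepSpeech.py \
    --checkpoint_dir ./checkpoints \
    --export_dir ./export/tflite \
    --nouse_seq_length \
    --export_tflite
```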

Thanks @lissyx and @reuben! Please see my comments below.

  1. The script bin/run-ldc93s1.sh worked fine - I mean the script itself finished 200 epochs and successfully decoded LDC93S1.wav.

  2. But when I exported a model from its checkpoint and fed it to the native_client command-line tool deepspeech, I hit the same “unknown op” error.

  3. I removed --nouse_seq_length for the non-TFLite model. The same error was still there.

Please find the output of ls -hal below:

ldc93s1 checkpoint & models

drwxrwxr-x 2 li li 4.0K May 18 22:12 **.**
drwxrwxr-x 7 li li 4.0K May 18 22:13 **..**
-rw-rw-r-- 1 li li 481 May 17 18:19 checkpoint
-rw-rw-r-- 1 li li 647K May 18 22:10 output_graph.pb
-rw-rw-r-- 1 li li 647K May 18 22:12 output_graph.pbmm
-rw-rw-r-- 1 li li 205K May 18 22:10 output_graph.tflite
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-196.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-196.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-196.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-197.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-197.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-197.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-198.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-198.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-198.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-199.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-199.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-199.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-200.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-200.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-200.meta

Downloaded native_client
NOTE 1: I made a symbolic link for libsox, as Ubuntu 18.04 ships a newer version
NOTE 2: util/taskcluster.py could not download convert_graphdef_memmapped_format due to broken link

-rwxr-xr-x 1 li li 40K May 14 12:08 **deepspeech**
-r-xr-xr-x 1 li li 3.0M May 14 12:08 **generate_trie**
-r-xr-xr-x 1 li li 94M May 14 12:08 **libdeepspeech.so**
lrwxrwxrwx 1 li li 37 May 18 21:29 **libsox.so.2** -> /usr/lib/x86_64-linux-gnu/libsox.so.3
-rw-r--r-- 1 li li 17K May 14 12:02 LICENSE
-rw-rw-r-- 1 li li 9.0M May 18 21:28 **native_client.tar.xz**
-rw-r--r-- 1 li li 1.2K May 14 12:02 README.mozilla
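The libsox shim from NOTE 1 can be created with a plain symlink; this is a sketch assuming the Ubuntu 18.04 default library path shown in the listing above:

```shell
# Ubuntu 18.04 ships libsox.so.3, but the prebuilt deepspeech binary links
# against libsox.so.2; a symlink in the native_client directory bridges that.
ln -s /usr/lib/x86_64-linux-gnu/libsox.so.3 ./libsox.so.2
```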

Local native_client built from master branch

-rwxrwxr-x 1 li li 40K May 18 21:13 **deepspeech**
-rwxrwxr-x 1 li li 3.2M May 18 21:59 **generate_trie**
-r-xr-xr-x 1 li li 210M May 18 21:21 **libdeepspeech.so**
-r-xr-xr-x 1 li li 222M May 18 21:00 **convert_graphdef_memmapped_format**

That means the code is fine.

The actual error is that input_lengths:0 is not found. The “unknown op” messages are not errors.

Output files of that size are impossible with a model of n_hidden=2048. Please give more context and the full log of training / export.

The checkpoint and models above are from bin/run-ldc93s1.sh, which uses n_hidden=100 instead of 2048.

Well, we need that for the bogus model, obviously …

Also, I second @reuben’s suggestion of not using --nouse_seq_length, since it does remove input_lengths.

Sorry if I didn’t make it clear. The model I exported from bin/run-ldc93s1.sh has the same issue. As it’s much smaller, I now use it to debug the issue. So yes, this ldc93s1 model is the bogus one.

./download/deepspeech --model ./ldc93s1/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav

TensorFlow: v1.13.1-10-g3e0cc53

DeepSpeech: v0.5.0-alpha.8-2-gdf5bb31

2019-05-19 11:24:33.557761: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2019-05-19 11:24:33.564525: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant

2019-05-19 11:24:33.564547: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant

2019-05-19 11:24:33.564554: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant

2019-05-19 11:24:33.564604: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

(the line above is repeated ten times in the log)

Let me double check this, then. Thanks!

There’s something else then, because I was able to export several 0.5.0-based models without any issue …

@eggonlea Try to force --notrain --nodev when doing export as well.

To make it extra clear, currently, our clients only support:

TFLite client: You must use --nouse_seq_length.
Non-TFLite client: You must not use --nouse_seq_length.

I used those flags with 0.4.1, but I didn’t find them on the master branch, so I removed them.

I also noticed that --epoch has become --epochs and no longer supports negative values. Also, the validation and test feature caches have been removed.

That said, could you please provide the parameters you use for master branch?

On master, not passing --train_files disables the training phase.
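Putting the advice together, an export-only run on master might look like this sketch. Paths are placeholders, and --checkpoint_dir is my assumption for the checkpoint flag name; the behavior of omitting --train_files is as described above.

```shell
# Omitting --train_files skips the training phase entirely on master.
# This exports a normal (non-TFLite) model, so --nouse_seq_length is NOT passed.
python DeepSpeech.py \
    --checkpoint_dir ./ldc93s1_checkpoint \
    --export_dir ./ldc93s1_export
```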


Yes, confirmed this is the reason. Inference works, although the “unknown op” messages are still there.
