UPDATE:
Tried bin/run-ldc93s1.sh, which worked fine. So I exported a model from its checkpoint. It has the same error with the downloaded deepspeech command.
I guess the issue is in either model exporting code or libdeepspeech.so
And I do use both successfully here, so it's not that either.
Can you clarify the status here?
I see that both your normal model and your TFLite model are using `--nouse_seq_length`, which is a configuration we only test for TFLite models. Can you try not using that option? Also, it's best if you don't reuse the same `--export_dir` for normal and TFLite exports.
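For example, the two exports would each get their own directory, roughly like this (a sketch with placeholder paths; `--export_tflite` is assumed here for the TFLite case, and `--n_hidden 100` matches the ldc93s1 run):

```bash
# Normal (TensorFlow) export: no --nouse_seq_length, its own export directory
python DeepSpeech.py \
  --checkpoint_dir /path/to/checkpoints \
  --export_dir /path/to/export_tf \
  --n_hidden 100

# TFLite export: --nouse_seq_length plus --export_tflite, a different export directory
python DeepSpeech.py \
  --checkpoint_dir /path/to/checkpoints \
  --export_dir /path/to/export_tflite \
  --n_hidden 100 \
  --nouse_seq_length \
  --export_tflite
```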
Thanks @lissyx and @reuben! Please see my comments below.
The script `bin/run-ldc93s1.sh` worked fine: it finished 200 epochs and successfully decoded LDC93S1.wav.
But when I exported a model from its checkpoint and fed it to the native_client command-line tool `deepspeech`, I hit the same "unknown op" error.
I removed `--nouse_seq_length` for the non-TFLite model; the same error was still there.
Please find the output of `ls -hal` below:
**ldc93s1 checkpoint & models**
drwxrwxr-x 2 li li 4.0K May 18 22:12 **.**
drwxrwxr-x 7 li li 4.0K May 18 22:13 **..**
-rw-rw-r-- 1 li li 481 May 17 18:19 checkpoint
-rw-rw-r-- 1 li li 647K May 18 22:10 output_graph.pb
-rw-rw-r-- 1 li li 647K May 18 22:12 output_graph.pbmm
-rw-rw-r-- 1 li li 205K May 18 22:10 output_graph.tflite
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-196.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-196.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-196.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-197.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-197.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-197.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-198.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-198.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-198.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-199.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-199.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-199.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-200.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-200.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-200.meta
**Downloaded native_client**
NOTE 1: I made a symbolic link for libsox, as Ubuntu 18.04 ships a newer version (see the sketch after this listing).
NOTE 2: util/taskcluster.py could not download convert_graphdef_memmapped_format due to a broken link.
-rwxr-xr-x 1 li li 40K May 14 12:08 **deepspeech**
-r-xr-xr-x 1 li li 3.0M May 14 12:08 **generate_trie**
-r-xr-xr-x 1 li li 94M May 14 12:08 **libdeepspeech.so**
lrwxrwxrwx 1 li li 37 May 18 21:29 **libsox.so.2** -> /usr/lib/x86_64-linux-gnu/libsox.so.3
-rw-r--r-- 1 li li 17K May 14 12:02 LICENSE
-rw-rw-r-- 1 li li 9.0M May 18 21:28 **native_client.tar.xz**
-rw-r--r-- 1 li li 1.2K May 14 12:02 README.mozilla
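The libsox workaround from NOTE 1 was roughly the following (the directory path is a placeholder; this is just a compatibility shim, not an officially supported fix):

```bash
# The downloaded binaries link against libsox.so.2, while Ubuntu 18.04 ships
# libsox.so.3, so point the old soname at the system library.
cd /path/to/downloaded/native_client
ln -s /usr/lib/x86_64-linux-gnu/libsox.so.3 libsox.so.2

# Make sure this directory is on the loader path when running ./deepspeech
export LD_LIBRARY_PATH="$PWD:$LD_LIBRARY_PATH"
```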
**Local native_client built from master branch**
-rwxrwxr-x 1 li li 40K May 18 21:13 **deepspeech**
-rwxrwxr-x 1 li li 3.2M May 18 21:59 **generate_trie**
-r-xr-xr-x 1 li li 210M May 18 21:21 **libdeepspeech.so**
-r-xr-xr-x 1 li li 222M May 18 21:00 **convert_graphdef_memmapped_format**
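For reference, the output_graph.pbmm above was generated with this locally built tool, roughly like this (standard usage of the TensorFlow utility; exact paths may differ):

```bash
# Convert the frozen .pb graph into the memory-mapped format expected by --model *.pbmm
./convert_graphdef_memmapped_format \
  --in_graph=ldc93s1/output_graph.pb \
  --out_graph=ldc93s1/output_graph.pbmm
```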
That means the code is fine. The actual error is that `input_lengths:0` is not found; the "unknown op" messages are not errors. An output file size like that is impossible with a model using `n_hidden=2048`. Please give more context and the full log of training / export.
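One quick way to check whether the exported graph even contains an `input_lengths` node (a rough check that relies on node names being stored as plain strings in the protobuf, not part of the normal workflow):

```bash
# If this prints nothing, the input_lengths node was stripped from the export.
strings ldc93s1/output_graph.pb | grep input_lengths
```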
The checkpoint and models above are from bin/run-ldc93s1.sh, which uses `n_hidden=100` instead of 2048.
Well, we need that for the bogus model, obviously…
Also, I second @reuben's suggestion of not using `--nouse_seq_length`, since it does remove `input_lengths`…
Sorry if I didn't make it clear: the model I exported from bin/run-ldc93s1.sh has the same issue. Since it's much smaller, I'm now using it to debug the problem. So yes, this ldc93s1 model is the bogus one.
./download/deepspeech --model ./ldc93s1/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.0-alpha.8-2-gdf5bb31
2019-05-19 11:24:33.557761: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-19 11:24:33.564525: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-05-19 11:24:33.564547: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-05-19 11:24:33.564554: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-05-19 11:24:33.564604: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Let me double check this, then. Thanks!
There's something else going on then, because I was able to export several 0.5.0-based models without any issue…
To make it extra clear, currently our clients only support:
- TFLite client: you must use `--nouse_seq_length`.
- Non-TFLite client: you must not use `--nouse_seq_length`.
I used them for 0.4.1, but I didn't find them on the master branch, so I removed them.
I also noticed that `--epoch` has become `--epochs` and no longer supports negative values, and that the validation and test feature caches have been removed.
That said, could you please share the parameters you use on the master branch?
On master, not passing `--train_files` disables the training phase.
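So an export-only run against an existing checkpoint can look roughly like this (placeholder paths; same flags as discussed above):

```bash
# Export-only run on master: no --train_files, so the training phase is skipped
# and only the graph export to --export_dir happens.
python DeepSpeech.py \
  --checkpoint_dir /path/to/ldc93s1/checkpoints \
  --export_dir /path/to/ldc93s1/export \
  --n_hidden 100
```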
Yes, I confirmed this was the reason. Inference now works, although the "unknown op" messages are still printed.