[FIXED] Error with master/alpha8 (unknown op: UnwrapDatasetVariant & WrapDatasetVariant)

Hello,

Background: DS 0.4.1 works fine, and now I’m switching to master (aka alpha8). I built the Mozilla TensorFlow r1.13 and native_client successfully, and I can also train on LibriSpeech with the new code.

The problem is, I cannot run the new deepspeech command-line utility to decode a WAV file.

NOTE 1: I tried both output_graph.pbmm and output_graph.pb, so I can basically rule out convert_graphdef_memmapped_format.
NOTE 2: I tried both the local deepspeech binary (which I built) and the one downloaded from TaskCluster (via util/taskcluster.py), so I can basically rule out a deepspeech native_client build issue.

So the only suspicious thing is the exported model. Since the runtime parameters changed a lot from 0.4.1 to alpha8, I might have set something wrong.

Training script

EPOCH=${1:-1}

TRAIN_FILES=\
/srv/corpus/librivox/librivox-train-clean-100.csv,\
/srv/corpus/librivox/librivox-train-clean-360.csv,\
/srv/corpus/librivox/librivox-train-other-500.csv

DEV_FILES=\
/srv/corpus/librivox/librivox-dev-clean.csv

TEST_FILES=\
/srv/corpus/librivox/librivox-test-clean.csv

CACHE_PATH=\
~/ds/cache/

time python -u ./DeepSpeech.py \
        --checkpoint_dir ~/ds/checkpoint \
        --summary_dir ~/ds/summary \
        --train_files ${TRAIN_FILES} \
        --dev_files ${DEV_FILES} \
        --test_files ${TEST_FILES} \
        --feature_cache ${CACHE_PATH} \
        --epochs ${EPOCH} \
        --n_hidden 2048 \
        --learning_rate 0.0001 \
        --dropout_rate 0.2 \
        --train_batch_size 24 \
        --dev_batch_size 48 \
        --test_batch_size 48 \
        --display_step 0 \
        --validation_step 1 \
        --log_level 1 \
        --summary_secs 60

time python -u ./DeepSpeech.py --checkpoint_dir ~/ds/checkpoint --n_hidden 2048 --nouse_seq_length --export_dir ~/ds/models
time python -u ./DeepSpeech.py --checkpoint_dir ~/ds/checkpoint --n_hidden 2048 --nouse_seq_length --export_tflite --export_dir ~/ds/models
time ~/vobs/Mozilla/DeepSpeech/native_client/convert_graphdef_memmapped_format --in_graph=~/ds/models/output_graph.pb --out_graph=~/ds/models/output_graph.pbmm

With my local deepspeech

./native_client/deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav

TensorFlow: v1.13.1-10-g3e0cc5374d

DeepSpeech: v0.5.0-alpha.8-14-g033d0d6

2019-05-18 21:34:05.617736: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

2019-05-18 21:34:05.769112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2019-05-18 21:34:05.769891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705

pciBusID: 0000:02:00.0

totalMemory: 10.89GiB freeMemory: 10.44GiB

2019-05-18 21:34:05.769910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0

2019-05-18 21:34:06.099880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:

2019-05-18 21:34:06.099923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 

2019-05-18 21:34:06.099927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 

2019-05-18 21:34:06.100428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10104 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)

2019-05-18 21:34:06.106328: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant

2019-05-18 21:34:06.106345: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.106351: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.106496: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

(the same line is repeated 10 times in total)

With the downloaded deepspeech

./download/deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav

TensorFlow: v1.13.1-10-g3e0cc53

DeepSpeech: v0.5.0-alpha.8-2-gdf5bb31

2019-05-18 21:34:06.901687: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2019-05-18 21:34:06.909320: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant

2019-05-18 21:34:06.909346: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.909353: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.909413: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

(the same line is repeated 10 times in total)

UPDATE:

Tried bin/run-ldc93s1.sh, which worked fine. So I exported a model from its checkpoint; that model hits the same error with the downloaded deepspeech command.

I guess the issue is in either the model-export code or libdeepspeech.so

And I use both successfully here, so it’s not that either.

Can you clarify the status here?

Also @eggonlea, can you run ls -hal on your checkpoint directory and exported models?

I see that both your normal model and your TFLite model are using --nouse_seq_length, which is a configuration we only test for TFLite models. Can you try not using that option? Also, it’s best if you don’t reuse the same --export_dir for normal and TFLite exports.

Thanks @lissyx and @reuben! Please see my comments below.

  1. The script bin/run-ldc93s1.sh itself worked fine: it finished 200 epochs and successfully decoded LDC93S1.wav.

  2. But when I exported a model from its checkpoint and fed it to the native_client command-line tool deepspeech, I hit the same "unknown op" error.

  3. I removed --nouse_seq_length for the non-TFLite model. The same error was still there.

Please find the output of ls -hal below:

ldc93s1 checkpoint & models

drwxrwxr-x 2 li li 4.0K May 18 22:12 .
drwxrwxr-x 7 li li 4.0K May 18 22:13 ..
-rw-rw-r-- 1 li li 481 May 17 18:19 checkpoint
-rw-rw-r-- 1 li li 647K May 18 22:10 output_graph.pb
-rw-rw-r-- 1 li li 647K May 18 22:12 output_graph.pbmm
-rw-rw-r-- 1 li li 205K May 18 22:10 output_graph.tflite
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-196.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-196.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-196.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-197.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-197.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-197.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-198.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-198.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-198.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-199.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-199.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-199.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-200.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-200.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-200.meta

Downloaded native_client
NOTE 1: I made a symbolic link for libsox, as Ubuntu 18.04 ships a newer version
NOTE 2: util/taskcluster.py could not download convert_graphdef_memmapped_format due to a broken link

-rwxr-xr-x 1 li li 40K May 14 12:08 deepspeech
-r-xr-xr-x 1 li li 3.0M May 14 12:08 generate_trie
-r-xr-xr-x 1 li li 94M May 14 12:08 libdeepspeech.so
lrwxrwxrwx 1 li li 37 May 18 21:29 libsox.so.2 -> /usr/lib/x86_64-linux-gnu/libsox.so.3
-rw-r--r-- 1 li li 17K May 14 12:02 LICENSE
-rw-rw-r-- 1 li li 9.0M May 18 21:28 native_client.tar.xz
-rw-r--r-- 1 li li 1.2K May 14 12:02 README.mozilla

Local native_client built from master branch

-rwxrwxr-x 1 li li 40K May 18 21:13 deepspeech
-rwxrwxr-x 1 li li 3.2M May 18 21:59 generate_trie
-r-xr-xr-x 1 li li 210M May 18 21:21 libdeepspeech.so
-r-xr-xr-x 1 li li 222M May 18 21:00 convert_graphdef_memmapped_format

That means the code is fine.

The error that matters is input_lengths:0 not being found. The "unknown op" messages are not actual errors.

An output file of that size is impossible for a model with n_hidden=2048. Please give more context and the full log of training / export.

The checkpoint and models above are from bin/run-ldc93s1.sh, which uses n_hidden=100 instead of 2048.
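For reference, a back-of-the-envelope parameter count makes both statements concrete. The layer shapes below are an assumption based on the published DeepSpeech architecture (three dense layers, one LSTM, one more dense layer, then the output layer; n_input=494 assumed from 26 MFCC features × 19 context frames; biases and minor tensors ignored), so treat this as a rough sketch, not the exact export size:

```python
def rough_model_bytes(n_hidden, n_input=494, n_out=29, bytes_per_param=4):
    """Very rough lower bound on the exported model size in bytes.

    Assumed DeepSpeech-style layer shapes, for illustration only:
    h1 maps n_input features to n_hidden; h2/h3 are square dense
    layers; the LSTM has 4 gates over (input + state); h5 and the
    output layer follow. Biases are ignored.
    """
    dense = n_input * n_hidden + 2 * n_hidden * n_hidden   # h1..h3
    lstm = 4 * (2 * n_hidden) * n_hidden                   # 4 LSTM gates
    head = n_hidden * n_hidden + n_hidden * n_out          # h5 + output
    return (dense + lstm + head) * bytes_per_param

print(rough_model_bytes(100))   # → 649200 bytes: same ballpark as the 647K ldc93s1 file
print(rough_model_bytes(2048))  # → 188833792 bytes (~180 MB): nowhere near 647K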

Well we need that for the bogus model, obviously …

Also, I second @reuben’s suggestion of not using --nouse_seq_length, since it does remove input_lengths
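Since node names are stored as plain strings inside a serialized GraphDef, a cheap way to check whether an exported graph still contains the input_lengths node is a byte search on the .pb file, with no TensorFlow install needed. This is a hypothetical debugging helper, not part of DeepSpeech, and it can give false positives if the string occurs elsewhere in the file:

```python
def graph_has_node(pb_path, node_name):
    """Return True if `node_name` occurs in the serialized GraphDef.

    Node names appear as plain UTF-8 strings in a .pb protobuf, so a
    raw byte search is a quick sanity check (false positives possible
    if the string happens to occur elsewhere in the file).
    """
    with open(pb_path, "rb") as f:
        return node_name.encode("utf-8") in f.read()
```

For example, `graph_has_node("models/output_graph.pb", "input_lengths")` returning False would suggest the graph was exported without that node, e.g. with --nouse_seq_length.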

Sorry if I didn’t make it clear: the model I exported from bin/run-ldc93s1.sh has the same issue. Since it’s much smaller, I’m now using it to debug. So yes, this ldc93s1 model is the bogus one.

./download/deepspeech --model ./ldc93s1/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav

TensorFlow: v1.13.1-10-g3e0cc53

DeepSpeech: v0.5.0-alpha.8-2-gdf5bb31

2019-05-19 11:24:33.557761: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2019-05-19 11:24:33.564525: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant

2019-05-19 11:24:33.564547: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant

2019-05-19 11:24:33.564554: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant

2019-05-19 11:24:33.564604: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

(the same line is repeated 10 times in total)

Let me double check this, then. Thanks!

There’s something else then, because I was able to export several 0.5.0-based models without any issue …

@eggonlea Try to force --notrain --nodev when doing export as well.

To make it extra clear, currently, our clients only support:

TFLite client: You must use --nouse_seq_length.
Non-TFLite client: You must not use --nouse_seq_length.

I used them for 0.4.1, but I didn’t find them on the master branch, so I removed them.

I also noticed that --epoch became --epochs and no longer supports negative values. Also, the validation and test feature caches were removed.

That said, could you please provide the parameters you use on the master branch?

On master, not passing --train_files disables the training phase.
