[FIXED] Error with master/alpha8 (unknown op: UnwrapDatasetVariant & WrapDatasetVariant)

Hello,

Background: DS 0.4.1 works fine, and now I’m switching to master (aka alpha8). I built the Mozilla TensorFlow r1.13 and native_client successfully, and I can also train on LibriSpeech with the new code.

The problem is, I cannot run the new deepspeech command-line utility to decode a WAV file.

NOTE 1: I tried both output_graph.pbmm and output_graph.pb, so I can basically rule out convert_graphdef_memmapped_format.
NOTE 2: I tried both the local deepspeech binary (which I built) and the one downloaded from TaskCluster (via util/taskcluster.py), so I can basically rule out a deepspeech native_client build issue.

So the only suspicious thing is the exported model. Since the runtime parameters changed a lot from 0.4.1 to alpha8, I might have set something wrong.

Training script

EPOCH=${1:-1}

TRAIN_FILES=\
/srv/corpus/librivox/librivox-train-clean-100.csv,\
/srv/corpus/librivox/librivox-train-clean-360.csv,\
/srv/corpus/librivox/librivox-train-other-500.csv

DEV_FILES=\
/srv/corpus/librivox/librivox-dev-clean.csv

TEST_FILES=\
/srv/corpus/librivox/librivox-test-clean.csv

CACHE_PATH=\
~/ds/cache/

time python -u ./DeepSpeech.py \
        --checkpoint_dir ~/ds/checkpoint \
        --summary_dir ~/ds/summary \
        --train_files ${TRAIN_FILES} \
        --dev_files ${DEV_FILES} \
        --test_files ${TEST_FILES} \
        --feature_cache ${CACHE_PATH} \
        --epochs ${EPOCH} \
        --n_hidden 2048 \
        --learning_rate 0.0001 \
        --dropout_rate 0.2 \
        --train_batch_size 24 \
        --dev_batch_size 48 \
        --test_batch_size 48 \
        --display_step 0 \
        --validation_step 1 \
        --log_level 1 \
        --summary_secs 60

time python -u ./DeepSpeech.py --checkpoint_dir ~/ds/checkpoint --n_hidden 2048 --nouse_seq_length --export_dir ~/ds/models
time python -u ./DeepSpeech.py --checkpoint_dir ~/ds/checkpoint --n_hidden 2048 --nouse_seq_length --export_tflite --export_dir ~/ds/models
time ~/vobs/Mozilla/DeepSpeech/native_client/convert_graphdef_memmapped_format --in_graph=~/ds/models/output_graph.pb --out_graph=~/ds/models/output_graph.pbmm

With my local deepspeech

./native_client/deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav

TensorFlow: v1.13.1-10-g3e0cc5374d

DeepSpeech: v0.5.0-alpha.8-14-g033d0d6

2019-05-18 21:34:05.617736: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

2019-05-18 21:34:05.769112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2019-05-18 21:34:05.769891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705

pciBusID: 0000:02:00.0

totalMemory: 10.89GiB freeMemory: 10.44GiB

2019-05-18 21:34:05.769910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0

2019-05-18 21:34:06.099880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:

2019-05-18 21:34:06.099923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 

2019-05-18 21:34:06.099927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 

2019-05-18 21:34:06.100428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10104 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)

2019-05-18 21:34:06.106328: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant

2019-05-18 21:34:06.106345: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.106351: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.106496: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

(the same line is repeated 10 times in total)

With the downloaded deepspeech

./download/deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav

TensorFlow: v1.13.1-10-g3e0cc53

DeepSpeech: v0.5.0-alpha.8-2-gdf5bb31

2019-05-18 21:34:06.901687: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2019-05-18 21:34:06.909320: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant

2019-05-18 21:34:06.909346: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.909353: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant

2019-05-18 21:34:06.909413: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

(the same line is repeated 10 times in total)

UPDATE:

Tried bin/run-ldc93s1.sh, which worked fine. So I exported a model from its checkpoint; that model hits the same error with the downloaded deepspeech command.

I guess the issue is in either the model-export code or libdeepspeech.so

And I use both successfully here, so it’s not that either.

Can you clarify the status here?

Also @eggonlea, can you run ls -hal on your checkpoint directory and exported models?

I see that both your normal model and your TFLite model are using --nouse_seq_length, which is a configuration we only test for TFLite models. Can you try not using that option? Also, it’s best if you don’t reuse the same --export_dir for normal and TFLite exports.

Thanks @lissyx and @reuben! Please see my comments below.

  1. The script bin/run-ldc93s1.sh itself worked fine: it finished 200 epochs and successfully decoded LDC93S1.wav.

  2. But when I exported a model from its checkpoint and fed it to the native_client command-line tool deepspeech, I hit the same "unknown op" error.

  3. I removed --nouse_seq_length for the non-TFLite model. The same error was still there.

Please find the output of ls -hal below:

ldc93s1 checkpoint & models

drwxrwxr-x 2 li li 4.0K May 18 22:12 .
drwxrwxr-x 7 li li 4.0K May 18 22:13 ..
-rw-rw-r-- 1 li li 481 May 17 18:19 checkpoint
-rw-rw-r-- 1 li li 647K May 18 22:10 output_graph.pb
-rw-rw-r-- 1 li li 647K May 18 22:12 output_graph.pbmm
-rw-rw-r-- 1 li li 205K May 18 22:10 output_graph.tflite
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-196.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-196.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-196.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-197.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-197.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-197.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-198.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-198.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-198.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-199.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-199.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-199.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-200.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-200.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-200.meta

Downloaded native_client
NOTE 1: I made a symbolic link for libsox, as Ubuntu 18.04 ships a newer version
NOTE 2: util/taskcluster.py could not download convert_graphdef_memmapped_format due to a broken link

-rwxr-xr-x 1 li li 40K May 14 12:08 deepspeech
-r-xr-xr-x 1 li li 3.0M May 14 12:08 generate_trie
-r-xr-xr-x 1 li li 94M May 14 12:08 libdeepspeech.so
lrwxrwxrwx 1 li li 37 May 18 21:29 libsox.so.2 -> /usr/lib/x86_64-linux-gnu/libsox.so.3
-rw-r--r-- 1 li li 17K May 14 12:02 LICENSE
-rw-rw-r-- 1 li li 9.0M May 18 21:28 native_client.tar.xz
-rw-r--r-- 1 li li 1.2K May 14 12:02 README.mozilla

Local native_client built from master branch

-rwxrwxr-x 1 li li 40K May 18 21:13 deepspeech
-rwxrwxr-x 1 li li 3.2M May 18 21:59 generate_trie
-r-xr-xr-x 1 li li 210M May 18 21:21 libdeepspeech.so
-r-xr-xr-x 1 li li 222M May 18 21:00 convert_graphdef_memmapped_format

That means the code is fine.

The error that matters is input_lengths:0 not being found. The "unknown op" messages are not actual errors.

An output file of that size is impossible for a model with n_hidden=2048. Please give more context and the full log of training / export.

The checkpoint and models above are from bin/run-ldc93s1.sh, which uses n_hidden=100 instead of 2048.
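For reference, a back-of-the-envelope parameter count makes both statements concrete. The layer shapes below are an assumption based on the published DeepSpeech architecture (three dense layers, one LSTM, one more dense layer, then the output layer; n_input=494 assumed from 26 MFCC features × 19 context frames; biases and minor tensors ignored), so treat this as a rough sketch, not the exact export size:

```python
def rough_model_bytes(n_hidden, n_input=494, n_out=29, bytes_per_param=4):
    """Very rough lower bound on the exported model size in bytes.

    Assumed DeepSpeech-style layer shapes, for illustration only:
    h1 maps n_input features to n_hidden; h2/h3 are square dense
    layers; the LSTM has 4 gates over (input + state); h5 and the
    output layer follow. Biases are ignored.
    """
    dense = n_input * n_hidden + 2 * n_hidden * n_hidden   # h1..h3
    lstm = 4 * (2 * n_hidden) * n_hidden                   # 4 LSTM gates
    head = n_hidden * n_hidden + n_hidden * n_out          # h5 + output
    return (dense + lstm + head) * bytes_per_param

print(rough_model_bytes(100))   # → 649200 bytes: same ballpark as the 647K ldc93s1 file
print(rough_model_bytes(2048))  # → 188833792 bytes (~180 MB): nowhere near 647K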

Well we need that for the bogus model, obviously …

Also, I second @reuben’s suggestion of not using --nouse_seq_length, since it does remove input_lengths
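Since node names are stored as plain strings inside a serialized GraphDef, a cheap way to check whether an exported graph still contains the input_lengths node is a byte search on the .pb file, with no TensorFlow install needed. This is a hypothetical debugging helper, not part of DeepSpeech, and it can give false positives if the string occurs elsewhere in the file:

```python
def graph_has_node(pb_path, node_name):
    """Return True if `node_name` occurs in the serialized GraphDef.

    Node names appear as plain UTF-8 strings in a .pb protobuf, so a
    raw byte search is a quick sanity check (false positives possible
    if the string happens to occur elsewhere in the file).
    """
    with open(pb_path, "rb") as f:
        return node_name.encode("utf-8") in f.read()
```

For example, `graph_has_node("models/output_graph.pb", "input_lengths")` returning False would suggest the graph was exported without that node, e.g. with --nouse_seq_length.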

Sorry if I didn’t make it clear: the model I exported from bin/run-ldc93s1.sh has the same issue. Since it’s much smaller, I’m now using it to debug. So yes, this ldc93s1 model is the bogus one.

./download/deepspeech --model ./ldc93s1/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav

TensorFlow: v1.13.1-10-g3e0cc53

DeepSpeech: v0.5.0-alpha.8-2-gdf5bb31

2019-05-19 11:24:33.557761: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2019-05-19 11:24:33.564525: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant

2019-05-19 11:24:33.564547: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant

2019-05-19 11:24:33.564554: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant

2019-05-19 11:24:33.564604: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

(the same line is repeated 10 times in total)

Let me double check this, then. Thanks!

There’s something else then, because I was able to export several 0.5.0-based models without any issue …

@eggonlea Try to force --notrain --nodev when doing export as well.

To make it extra clear, currently, our clients only support:

TFLite client: You must use --nouse_seq_length.
Non-TFLite client: You must not use --nouse_seq_length.

I used them for 0.4.1, but I didn’t find them on the master branch, so I removed them.

I also noticed that --epoch became --epochs and no longer supports negative values. Also, the validation and test feature caches were removed.

That said, could you please provide the parameters you use on the master branch?

On master, not passing --train_files disables the training phase.
