UPDATE:
Tried bin/run-ldc93s1.sh, which worked fine. So I exported a model from its checkpoint. It has the same error with the downloaded deepspeech command.
I guess the issue is in either model exporting code or libdeepspeech.so
And I do use both successfully here, so it's not that either.
Can you clarify the status here?
I see that both your normal model and your TFLite model are using `--nouse_seq_length`, which is a configuration we only test for TFLite models. Can you try not using that option? Also, it's best if you don't reuse the same `--export_dir` for normal and TFLite exports.
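For example, the two exports would each get their own directory, roughly like this (a sketch with placeholder paths; `--export_tflite` is assumed here for the TFLite case, and `--n_hidden 100` matches the ldc93s1 run):

```bash
# Normal (TensorFlow) export: no --nouse_seq_length, its own export directory
python DeepSpeech.py \
  --checkpoint_dir /path/to/checkpoints \
  --export_dir /path/to/export_tf \
  --n_hidden 100

# TFLite export: --nouse_seq_length plus --export_tflite, a different export directory
python DeepSpeech.py \
  --checkpoint_dir /path/to/checkpoints \
  --export_dir /path/to/export_tflite \
  --n_hidden 100 \
  --nouse_seq_length \
  --export_tflite
```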
Thanks @lissyx and @reuben! Please see my comments below.
The script `bin/run-ldc93s1.sh` worked fine: it finished 200 epochs and successfully decoded LDC93S1.wav.
But when I exported a model from its checkpoint and fed it to the native_client command-line tool `deepspeech`, I hit the same "unknown op" error.
I removed `--nouse_seq_length` for the non-TFLite model; the same error was still there.
Please find the output of `ls -hal` below:
**ldc93s1 checkpoint & models**
drwxrwxr-x 2 li li 4.0K May 18 22:12 **.**
drwxrwxr-x 7 li li 4.0K May 18 22:13 **..**
-rw-rw-r-- 1 li li 481 May 17 18:19 checkpoint
-rw-rw-r-- 1 li li 647K May 18 22:10 output_graph.pb
-rw-rw-r-- 1 li li 647K May 18 22:12 output_graph.pbmm
-rw-rw-r-- 1 li li 205K May 18 22:10 output_graph.tflite
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-196.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-196.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-196.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-197.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-197.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-197.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-198.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-198.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-198.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-199.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-199.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-199.meta
-rw-rw-r-- 1 li li 1.9M May 17 18:19 train-200.data-00000-of-00001
-rw-rw-r-- 1 li li 1.3K May 17 18:19 train-200.index
-rw-rw-r-- 1 li li 1.2M May 17 18:19 train-200.meta
**Downloaded native_client**
NOTE 1: I made a symbolic link for libsox, as Ubuntu 18.04 ships a newer version (see the sketch after this listing).
NOTE 2: util/taskcluster.py could not download convert_graphdef_memmapped_format due to a broken link.
-rwxr-xr-x 1 li li 40K May 14 12:08 **deepspeech**
-r-xr-xr-x 1 li li 3.0M May 14 12:08 **generate_trie**
-r-xr-xr-x 1 li li 94M May 14 12:08 **libdeepspeech.so**
lrwxrwxrwx 1 li li 37 May 18 21:29 **libsox.so.2** -> /usr/lib/x86_64-linux-gnu/libsox.so.3
-rw-r--r-- 1 li li 17K May 14 12:02 LICENSE
-rw-rw-r-- 1 li li 9.0M May 18 21:28 **native_client.tar.xz**
-rw-r--r-- 1 li li 1.2K May 14 12:02 README.mozilla
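The libsox workaround from NOTE 1 was roughly the following (the directory path is a placeholder; this is just a compatibility shim, not an officially supported fix):

```bash
# The downloaded binaries link against libsox.so.2, while Ubuntu 18.04 ships
# libsox.so.3, so point the old soname at the system library.
cd /path/to/downloaded/native_client
ln -s /usr/lib/x86_64-linux-gnu/libsox.so.3 libsox.so.2

# Make sure this directory is on the loader path when running ./deepspeech
export LD_LIBRARY_PATH="$PWD:$LD_LIBRARY_PATH"
```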
**Local native_client built from master branch**
-rwxrwxr-x 1 li li 40K May 18 21:13 **deepspeech**
-rwxrwxr-x 1 li li 3.2M May 18 21:59 **generate_trie**
-r-xr-xr-x 1 li li 210M May 18 21:21 **libdeepspeech.so**
-r-xr-xr-x 1 li li 222M May 18 21:00 **convert_graphdef_memmapped_format**
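For reference, the output_graph.pbmm above was generated with this locally built tool, roughly like this (standard usage of the TensorFlow utility; exact paths may differ):

```bash
# Convert the frozen .pb graph into the memory-mapped format expected by --model *.pbmm
./convert_graphdef_memmapped_format \
  --in_graph=ldc93s1/output_graph.pb \
  --out_graph=ldc93s1/output_graph.pbmm
```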
That means the code is fine. The actual error is that `input_lengths:0` is not found; the "unknown op" messages are not errors. An output file size like that is impossible with a model using `n_hidden=2048`. Please give more context and the full log of training / export.
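One quick way to check whether the exported graph even contains an `input_lengths` node (a rough check that relies on node names being stored as plain strings in the protobuf, not part of the normal workflow):

```bash
# If this prints nothing, the input_lengths node was stripped from the export.
strings ldc93s1/output_graph.pb | grep input_lengths
```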
The checkpoint and models above are from bin/run-ldc93s1.sh, which uses `n_hidden=100` instead of 2048.
Well, we need that for the bogus model, obviously…
Also, I second @reuben's suggestion of not using `--nouse_seq_length`, since it does remove `input_lengths`…
Sorry if I didn't make it clear: the model I exported from bin/run-ldc93s1.sh has the same issue. Since it's much smaller, I'm now using it to debug the problem. So yes, this ldc93s1 model is the bogus one.
./download/deepspeech --model ./ldc93s1/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio LDC93S1.wav
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.0-alpha.8-2-gdf5bb31
2019-05-19 11:24:33.557761: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-19 11:24:33.564525: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-05-19 11:24:33.564547: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-05-19 11:24:33.564554: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-05-19 11:24:33.564604: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph
Let me double check this, then. Thanks!
There's something else going on then, because I was able to export several 0.5.0-based models without any issue…
To make it extra clear, currently our clients only support:
- TFLite client: you must use `--nouse_seq_length`.
- Non-TFLite client: you must not use `--nouse_seq_length`.
I used them for 0.4.1, but I didn't find them on the master branch, so I removed them.
I also noticed that `--epoch` has become `--epochs` and no longer supports negative values, and that the validation and test feature caches have been removed.
That said, could you please share the parameters you use on the master branch?
On master, not passing `--train_files` disables the training phase.
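So an export-only run against an existing checkpoint can look roughly like this (placeholder paths; same flags as discussed above):

```bash
# Export-only run on master: no --train_files, so the training phase is skipped
# and only the graph export to --export_dir happens.
python DeepSpeech.py \
  --checkpoint_dir /path/to/ldc93s1/checkpoints \
  --export_dir /path/to/ldc93s1/export \
  --n_hidden 100
```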
Yes, I confirmed this was the reason. Inference now works, although the "unknown op" messages are still printed.