Hello,
I have a DeepSpeech model trained on TensorFlow 1.6rc0 (see this GitHub issue: https://github.com/mozilla/DeepSpeech/issues/1223). I trained on the LibriVox, TED, VoxForge and CV datasets, and the model's WER is about 6% on the librivox-test-clean test set.
For inference, I downloaded the native_client binaries using the taskcluster method. With the deepspeech binary from that package, I can run inference on a WAV file using the graph exported from the training above.
Output from a sample inference session:
============= BEGIN ==============
./deepspeech …/export_train_all_corpora/output_graph.pb models/alphabet.txt models/lm.binary models/trie data/ted/TEDLIUM_release2/test/wav/BillGates_2010-940.56-953.42.wav
TensorFlow: v1.6.0-11-g7554dd8
DeepSpeech: v0.1.1-48-g31c01db
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-04-02 22:40:29.249149: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
things going on in poor countries still some agriculture hopefully we will have cleaned up forstfrecementso to get to that eighty percent the developed countries
============ END ================
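To make the version comparison concrete, here is a minimal sketch that pulls the TensorFlow and DeepSpeech versions out of the banner printed above. It assumes Python 3 and that the version strings follow the `git describe` layout (tag, commits since the tag, abbreviated commit hash), which is how they appear here; the function name `parse_banner` is hypothetical and not part of DeepSpeech.

```python
import re

# Matches git-describe style strings such as "v1.6.0-11-g7554dd8":
# release tag, number of commits past the tag, abbreviated commit hash.
_DESCRIBE = re.compile(r"v(\d+(?:\.\d+)*)(?:-(\d+)-g([0-9a-f]+))?$")


def parse_banner(output):
    """Return {component: (release_tuple, commits_past_tag, commit)}.

    Hypothetical helper: lines that are not "Name: vX.Y.Z..." are ignored.
    """
    versions = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        match = _DESCRIBE.match(value.strip())
        if match:
            release = tuple(int(part) for part in match.group(1).split("."))
            commits = int(match.group(2) or 0)
            versions[name.strip()] = (release, commits, match.group(3))
    return versions


# The two banner lines from the native client session above.
banner = """TensorFlow: v1.6.0-11-g7554dd8
DeepSpeech: v0.1.1-48-g31c01db
"""
print(parse_banner(banner))
```

The parsed tuples compare naturally, so a release such as `(1, 6, 0)` can be checked against whatever version another build reports.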
In the framework that my internal team has set up, I need to send inference requests to deepspeech-server (GitHub: https://github.com/MainRo/deepspeech-server), which uses the deepspeech Python library.
However, when I run pip install deepspeech and run inference on the same WAV file with the same exported graph, I get the following error:
============ BEGIN ===================
deepspeech …/export_train_all_corpora/output_graph.pb data/ted/TEDLIUM_release2/test/wav/BillGates_2010-940.56-953.42.wav models/alphabet.txt models/lm.binary models/trie
Loading model from file …/export_train_all_corpora/output_graph.pb
2018-04-03 10:17:18.091189: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.456s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 1.245s.
Running inference.
2018-04-03 10:17:23.056869: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0". (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
2018-04-03 10:17:23.056969: E tensorflow/core/common_runtime/executor.cc:643] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0". (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Error running session: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0". (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
None
Inference took 3.282s for 12.860s audio file.
(virenv_py2_ds)
==================== END ====================
For reference, the pip-installed deepspeech can run inference on the same WAV file with the output_graph.pb published on the DeepSpeech releases page (v0.1.1):
=============== BEGIN ===============
deepspeech models/output_graph.pb data/ted/TEDLIUM_release2/test/wav/BillGates_2010-940.56-953.42.wav models/alphabet.txt models/lm.binary models/trie
Loading model from file models/output_graph.pb
2018-04-03 10:16:21.939155: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.434s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 1.183s.
Running inference.
things going on in poor countries still some agriculture a hopefully we will have cleaned up a for free cement a so to get to that eighty per cent the developed countries
Inference took 25.305s for 12.860s audio file.
================ END ====================
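Is the deepspeech package on pip built against an older version of TensorFlow? The error above suggests that the binary interpreting the graph is older than the one that generated it.
Could you please share the wheel that corresponds to the binaries I downloaded via taskcluster?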
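As a rough way to see which wheel is actually installed, the sketch below queries the package metadata. It assumes Python 3.8 or later for `importlib.metadata` (the session above ran in a Python 2 virtualenv, where this module is not available), and the helper name `installed_version` is hypothetical. It only reports each distribution's own version string; it does not by itself reveal which TensorFlow the deepspeech wheel was built against.

```python
from importlib import metadata  # assumption: Python 3.8+


def installed_version(package):
    """Return the installed distribution's version string, or None if absent.

    Hypothetical helper for illustration; not part of DeepSpeech.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the wheel versions visible in the current environment.
for name in ("deepspeech", "tensorflow"):
    print(name, installed_version(name))
```

Comparing the reported deepspeech version with the `DeepSpeech:` line printed by the native client shows whether the two come from the same release.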
Thank you,
Regards,
Buvana