When is the model exported during training?


(Hnipun) #1

Hi,
When is the model exported (i.e. when is the .pb file generated) during training? I know we can configure the checkpoint time interval, but I couldn’t find any information about when the exported model is written. Is it only at the end of training?

Thanks in advance


(kdavis) #2

Yes, it is only at the end of training; for example, see the code flow of the main method [1].


(Megha) #3

Does this mean we only get to see the model file in the export directory we mention in the training command once training finishes? Or, as training proceeds, can we save the trained model after a particular number of epochs?

Below is my training command:

python -u DeepSpeech.py \
--checkpoint_dir checkpoint \
--checkpoint_step 1 \
--dropout_rate 0.2367 \
--default_stddev 0.046875 \
--epoch 13 \
--export_dir /my_exportdir/model.pb \
--initialize_from_frozen_model models/output_graph.pb \
--learning_rate 0.0001 \
--train_files CVD/cv-valid-train.csv,CVD/cv-other-train.csv \
--dev_files CVD/cv-valid-dev.csv \
--test_files CVD/cv-valid-test.csv \
--train_batch_size 12 \
--dev_batch_size 8 \
--test_batch_size 8 \
--display_step 0 \
--validation_step 1 \
--log_level 0 \
--summary_dir summary3  \
--summary_secs 60

I am on the 13th epoch, and so far I don’t see anything saved in my my_exportdir folder. Any comments on this?

Thanks 🙂


(Miles Thompson) #4

Does this mean we only get to see the model file in the export directory we mention in the training command once training finishes?

Yup, that’s my understanding. If you see checkpoints being created, you can always stop the training, export the model (with the --notrain flag set so no further training happens), and then restart training from the checkpoint. That’s what I do, though you may want to be careful about your learning rate parameters.
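For reference, the export-only step could look something like this (a sketch based on Megha’s command above; the paths and directory names are placeholders, and the only flag taken from this thread is --notrain):

```shell
# Stop the running training first (Ctrl+C); the checkpoints remain in checkpoint_dir.
# Then re-run with --notrain so training is skipped and the graph is exported
# from the latest checkpoint:
python -u DeepSpeech.py \
  --notrain \
  --checkpoint_dir checkpoint \
  --export_dir my_exportdir \
  --train_files CVD/cv-valid-train.csv \
  --dev_files CVD/cv-valid-dev.csv \
  --test_files CVD/cv-valid-test.csv

# Afterwards, restart the original training command; it resumes from checkpoint_dir.
```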


(Megha) #5

Thanks for your reply.

I have now exported my model to the “my_exportdir” folder, which contains model.pb; since I passed model.pb as the --export_dir, it is actually a directory, and inside it is output_graph.pb. This means I now have two output_graph.pb files (1. the one I exported; 2. the one from the pre-trained model that DeepSpeech provided).

When I run the command below, I get some output that is not accurate, but I believe it is using the model that DeepSpeech provided:

deepspeech models/output_graph.pb audio_Test/a.wav models/alphabet.txt models/lm.binary models/trie

If I instead use the model that I exported, like below,

deepspeech my_exportdir/model.pb/output_graph.pb audio_Test/a.wav models/alphabet.txt models/lm.binary models/trie

I get the error below:

Loading model from file my_exportdir/model.pb/output_graph.pb
2018-06-15 10:26:22.963854: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-15 10:26:23.075646: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-06-15 10:26:23.076005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:01:00.0
totalMemory: 5.92GiB freeMemory: 5.28GiB
2018-06-15 10:26:23.076017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Loaded model in 0.351s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 0.698s.
Running inference.
2018-06-15 10:26:25.713962: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:GPU:0"]. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.)
2018-06-15 10:26:25.714004: E tensorflow/core/common_runtime/executor.cc:643] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:GPU:0"]. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.)
[[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
Error running session: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:GPU:0"]. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.)
[[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
None
Inference took 1.698s for 174.103s audio file.

Any comments on this?
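The parenthetical at the end of the error suggests a version mismatch: the exported graph uses an op attribute that the TensorFlow bundled in the deepspeech client does not know about. One way to compare the two environments (these are standard pip/python commands, not something from this thread):

```shell
# TensorFlow version in the environment that trained and exported the graph:
python -c "import tensorflow as tf; print(tf.__version__)"

# deepspeech client version in the inference environment:
pip show deepspeech | grep -i ^version
```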
