WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased


(Megha ) #1

Hi,

I am at the end of training now (13th epoch). Below is the command, with the parameter settings I used for training:

python -u DeepSpeech.py \
--checkpoint_dir checkpoint \
--checkpoint_step 1 \
--dropout_rate 0.2367 \
--default_stddev 0.046875 \
--epoch 13 \
--export_dir /my_exportdir/model.pb \
--initialize_from_frozen_model models/output_graph.pb \
--learning_rate 0.0001 \
--train_files CVD/cv-valid-train.csv,CVD/cv-other-train.csv \
--dev_files CVD/cv-valid-dev.csv \
--test_files CVD/cv-valid-test.csv \
--train_batch_size 12 \
--dev_batch_size 8 \
--test_batch_size 8 \
--display_step 0 \
--validation_step 1 \
--log_level 0 \
--summary_dir summary3  \
--summary_secs 60

But I am getting the warning below frequently, and I am not sure I understand what it means. Should I tweak something to avoid it?

WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 364416 vs previous value: 364416. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.

Thanks :)


(Tilman Kamp) #2

Hi,

Some questions:
Is this a continued training, i.e. were there already any snapshot files before training started?
Has this warning appeared since the very beginning of training, or only during the last epoch?
When does the warning occur: during training (expected), validation (unexpected), or testing (unexpected)?
Does it occur on every epoch, or just at the beginning or end of training?
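One way to answer the last two questions from a log written with `--log_level 0` is to find out in which phase the global step stops advancing. The Python 3 sketch below is hypothetical (it is not part of DeepSpeech): it assumes the `Computing Job (..., set_name: ...)` and `Finished batch step N.` line formats that appear in the debug output later in this thread, and reports every batch whose step did not increase over the previous one.

```python
import re
import sys

# Line formats taken from the --log_level 0 output quoted in this thread.
JOB_RE = re.compile(r"Computing Job \(ID: \d+, worker: \d+, epoch: (\d+), set_name: (\w+)\)")
STEP_RE = re.compile(r"Finished batch step (\d+)\.")


def stalled_steps(lines):
    """Return (epoch, set_name, step) for every finished batch whose global
    step is not larger than the one reported by the previous batch."""
    stalled = []
    job = None            # (epoch, set_name) of the most recent job
    previous_step = None  # step printed by the previous "Finished batch" line
    for line in lines:
        match = JOB_RE.search(line)
        if match:
            job = (int(match.group(1)), match.group(2))
            continue
        match = STEP_RE.search(line)
        if match:
            step = int(match.group(1))
            if previous_step is not None and step <= previous_step and job:
                stalled.append((job[0], job[1], step))
            previous_step = step
    return stalled


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 stalled_steps.py saved_console_output.log
    with open(sys.argv[1]) as log:
        for epoch, set_name, step in stalled_steps(log):
            print(f"epoch {epoch}, {set_name}: step stayed at {step}")
```

The script and log file names are placeholders; the output groups stalled steps by epoch and `set_name`, which maps directly onto the "which phase" and "which epochs" questions.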


(Megha ) #3

I noticed this warning toward the end of my training (in the last few epochs). I believe training is now finished. Below is the output I got at the end of training:

D Sending Job (ID: 492, worker: 0, epoch: 13, set_name: test)...
D Computing Job (ID: 493, worker: 0, epoch: 13, set_name: test)...
D Starting batch...
D Finished batch step 364416.
D Sending Job (ID: 493, worker: 0, epoch: 13, set_name: test)...
D Computing Job (ID: 494, worker: 0, epoch: 13, set_name: test)...
D Starting batch...
D Finished batch step 364416.
D Sending Job (ID: 494, worker: 0, epoch: 13, set_name: test)...
I Test of Epoch 13 - WER: 0.361877, loss: 33.44531447021763, mean edit distance: 0.191006
I --------------------------------------------------------------------------------
I WER: 0.125000, loss: 0.200928, mean edit distance: 0.025641
I  - src: "i guess i'm not quite the football type"
I  - res: "i guess im not quite the football type"
I --------------------------------------------------------------------------------
I WER: 0.200000, loss: 0.058690, mean edit distance: 0.074074
I  - src: "that's true the boy thought"
I  - res: "thats true the boy thought "
I --------------------------------------------------------------------------------
I WER: 0.200000, loss: 0.117750, mean edit distance: 0.068966
I  - src: "he doesn't have anything else"
I  - res: "he doesnt have anything else "
I --------------------------------------------------------------------------------
I WER: 0.200000, loss: 0.129502, mean edit distance: 0.034483
I  - src: "he doesn't have anything else"
I  - res: "he doesnt have anything else"
I --------------------------------------------------------------------------------
I WER: 0.200000, loss: 0.136908, mean edit distance: 0.047619
I  - src: "then i don't get paid"
I  - res: "then i dont get paid"
I --------------------------------------------------------------------------------
I WER: 0.200000, loss: 0.185607, mean edit distance: 0.080000
I  - src: "we've got to do something"
I  - res: "weve got to do something "
I --------------------------------------------------------------------------------
I WER: 0.250000, loss: 0.010482, mean edit distance: 0.125000
I  - src: "i don't know you"
I  - res: "i dont know you "
I --------------------------------------------------------------------------------
I WER: 0.250000, loss: 0.011586, mean edit distance: 0.111111
I  - src: "i don't believe it"
I  - res: "i dont believe it "
I --------------------------------------------------------------------------------
I WER: 0.250000, loss: 0.026088, mean edit distance: 0.055556
I  - src: "it isn't the money"
I  - res: "it isnt the money"
I --------------------------------------------------------------------------------
I WER: 0.375000, loss: 0.199804, mean edit distance: 0.066667
I  - src: "i say we don't go out any more"
I  - res: "i say we dont go out anymore"
I --------------------------------------------------------------------------------
D Epochs - running: 0, done: 1
D Closing queues...
D Queues closed.
D Session closed.
D Done.
I Exporting the model...
2018-06-13 10:45:33.870094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-06-13 10:45:33.870228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 103 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "DeepSpeech.py", line 1838, in <module>
    tf.app.run()
  File "/home/megha/Alu_Meg/DeepSpeech_Alug_Meg/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 1829, in main
    export()
  File "DeepSpeech.py", line 1739, in export
    os.makedirs(FLAGS.export_dir)
  File "/home/megha/Alu_Meg/DeepSpeech_Alug_Meg/deepspeech-venv/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/home/megha/Alu_Meg/DeepSpeech_Alug_Meg/deepspeech-venv/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/my_exportdir'
(deepspeech-venv) megha@megha-medion:~/Alu_Meg/DeepSpeech_Alug_Meg/DeepSpeech$
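Each sample in the test report above prints a WER and a mean edit distance. Assuming both are Levenshtein distances normalized by the length of the reference (`src`), counted in words for WER and in characters for the edit distance, the logged values can be reproduced with a short Python sketch. This is an illustration of the metrics, not DeepSpeech's own implementation.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (a string or a list of words)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (item_a != item_b),  # substitution
            ))
        previous = current
    return previous[-1]


def wer(src, res):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = src.split()
    return levenshtein(ref_words, res.split()) / len(ref_words)


def normalized_edit_distance(src, res):
    """Character-level edit distance / number of reference characters."""
    return levenshtein(src, res) / len(src)


src = "i say we don't go out any more"
res = "i say we dont go out anymore"
print(round(wer(src, res), 6))                       # 0.375
print(round(normalized_edit_distance(src, res), 6))  # 0.066667
```

For this last sample, "don't" -> "dont" and "any more" -> "anymore" cost three word edits out of eight reference words, but only two character edits (the apostrophe and the space) out of thirty characters, which is why WER is much higher than the edit distance.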

This means that there is a problem when exporting the model: I get a "Permission denied" error. How can I give permission to the folder from within the DeepSpeech command, or should I handle it separately? If so, any hints would be appreciated.

Thanks :)


(Tilman Kamp) #4

“--export_dir /my_exportdir/model.pb” tells DeepSpeech to save the model into a folder directly under your system’s root directory, which your user is not allowed to create. I doubt that this is what you intended. If you want to save it into a sub-folder of your DeepSpeech checkout, create a directory named “my_exportdir” within your checkout and remove the leading slash from your path: “--export_dir /my_exportdir/model.pb” -> “--export_dir my_exportdir/model.pb”
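The traceback shows `os.makedirs` recursing up to `/my_exportdir` and failing there, because creating a folder directly under `/` needs root permissions. A minimal Python 3 sketch of the same reasoning, with hypothetical helper names, resolves an `--export_dir` value against the current directory and checks whether the deepest directory that already exists is writable:

```python
import os


def resolve_export_dir(export_dir, cwd=None):
    """Absolute path that --export_dir refers to; an absolute export_dir
    ignores cwd entirely because os.path.join restarts at a leading '/'."""
    return os.path.normpath(os.path.join(cwd or os.getcwd(), export_dir))


def first_existing_ancestor(path):
    """Walk up from path to the deepest directory that already exists,
    i.e. where os.makedirs would have to create its first new folder."""
    path = os.path.abspath(path)
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create(export_dir, cwd=None):
    """Rough pre-check: can the current user create folders inside the
    deepest existing ancestor of export_dir (write + search permission)?"""
    ancestor = first_existing_ancestor(resolve_export_dir(export_dir, cwd))
    return os.access(ancestor, os.W_OK | os.X_OK)


for candidate in ("/my_exportdir/model.pb", "my_exportdir/model.pb"):
    print(candidate, "->", resolve_export_dir(candidate), "creatable:", can_create(candidate))
```

Run from the DeepSpeech checkout, the relative path resolves inside it, while the absolute one always points at `/my_exportdir`. `os.access` is only an approximation of what `mkdir` will do (read-only mounts or unusual ACLs can still differ).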


(Megha ) #5

Okay, thanks :) I will try this and get back to you if I run into any other problems. One question, though: can I use the same command I used for training, with just this section changed? That is, should I now run code 1 below, or should I pass only the export parameter, without any other parameters, as in code 2?
code 1:

python -u DeepSpeech.py \
--checkpoint_dir checkpoint \
--checkpoint_step 1 \
--dropout_rate 0.2367 \
--default_stddev 0.046875 \
--epoch 13 \
--export_dir my_exportdir/model.pb \
--initialize_from_frozen_model models/output_graph.pb \
--learning_rate 0.0001 \
--train_files CVD/cv-valid-train.csv,CVD/cv-other-train.csv \
--dev_files CVD/cv-valid-dev.csv \
--test_files CVD/cv-valid-test.csv \
--train_batch_size 12 \
--dev_batch_size 8 \
--test_batch_size 8 \
--display_step 0 \
--validation_step 1 \
--log_level 0 \
--summary_dir summary3 \
--summary_secs 60

or
code 2:

python -u DeepSpeech.py \
  --export_dir my_exportdir/model.pb

(Tilman Kamp) #6

“code 1” plus --notrain --notest, and probably without --initialize_from_frozen_model.
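The answer above is a mechanical change to the "code 1" argument list: drop `--initialize_from_frozen_model` together with its value, then add flags that switch off training and testing. The Python sketch below is hypothetical; it assumes DeepSpeech's `train` and `test` options are TensorFlow boolean flags, which TensorFlow's flag parser turns off with the `--notrain` / `--notest` spelling.

```python
# Arguments of "code 1" (after the export_dir fix), split into flag/value pairs.
TRAIN_ARGS = [
    "--checkpoint_dir", "checkpoint",
    "--checkpoint_step", "1",
    "--dropout_rate", "0.2367",
    "--default_stddev", "0.046875",
    "--epoch", "13",
    "--export_dir", "my_exportdir/model.pb",
    "--initialize_from_frozen_model", "models/output_graph.pb",
    "--learning_rate", "0.0001",
    "--train_files", "CVD/cv-valid-train.csv,CVD/cv-other-train.csv",
    "--dev_files", "CVD/cv-valid-dev.csv",
    "--test_files", "CVD/cv-valid-test.csv",
    "--train_batch_size", "12",
    "--dev_batch_size", "8",
    "--test_batch_size", "8",
    "--display_step", "0",
    "--validation_step", "1",
    "--log_level", "0",
    "--summary_dir", "summary3",
    "--summary_secs", "60",
]


def export_only_args(train_args, drop=("--initialize_from_frozen_model",)):
    """Copy a training argument list, removing each flag in `drop` plus the
    value that follows it, and append flags that skip training and testing."""
    args = []
    skip_value = False
    for arg in train_args:
        if skip_value:
            skip_value = False  # this is the value of a dropped flag
            continue
        if arg in drop:
            skip_value = True
            continue
        args.append(arg)
    return args + ["--notrain", "--notest"]


EXPORT_ARGS = export_only_args(TRAIN_ARGS)
print(" ".join(["python", "-u", "DeepSpeech.py"] + EXPORT_ARGS))
```

The resulting list could also be passed to `subprocess.run(["python", "-u", "DeepSpeech.py"] + EXPORT_ARGS, check=True)` from the DeepSpeech checkout, so that the relative `my_exportdir/model.pb` path resolves there. Keeping `--checkpoint_dir` matters, since the export is built from the trained checkpoint.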


(Megha ) #7

This worked. Thank you :)


(Megha ) #8

@Tilman_Kamp

I am stuck with this problem. Any suggestions for getting past it?

Thanks :)