@lissyx Well, I had set up my virtual environment incorrectly, so I wasn't actually working with v0.2.0-alpha.8. Now that I'm training with v0.2.0-alpha.8, I get the following error:
(deepspeech2_env) $ CUDA_VISIBLE_DEVICES=0,1 ./DeepSpeech.py --train_files data/train.csv --dev_files data/dev.csv --test_files data/test.csv --decoder_library_path /models/language/libctc_decoder_with_kenlm.so --lm_binary_path models/language/5gram.klm --lm_trie_path /models/language/trie --alphabet_config_path models/language/alphabet.txt --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64 --n_hidden 2048 --epoch 20 --checkpoint_dir /models/session/ --summary_dir models/summary/ --summary_secs 1756 --export_dir models/modelo/ --validation_step 25
I STARTING Optimization
E OOM when allocating tensor with shape[6144,8192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
E [[Node: tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd_grad/tuple/control_dependency)]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E [[Node: tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1/_1283 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4918_tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E
E Caused by op 'tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1', defined at:
E File "./DeepSpeech.py", line 1870, in <module>
E tf.app.run()
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
E _sys.exit(main(argv))
E File "./DeepSpeech.py", line 1827, in main
E train()
E File "./DeepSpeech.py", line 1500, in train
E results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
E File "./DeepSpeech.py", line 653, in get_tower_results
E gradients = optimizer.compute_gradients(avg_loss)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 460, in compute_gradients
E colocate_gradients_with_ops=colocate_gradients_with_ops)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in gradients
E lambda: grad_fn(op, *out_grads))
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 377, in _MaybeCompile
E return grad_fn() # Exit early
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in <lambda>
E lambda: grad_fn(op, *out_grads))
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py", line 973, in _MatMulGrad
E grad_b = math_ops.matmul(a, grad, transpose_a=True)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 2064, in matmul
E a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2507, in _mat_mul
E name=name)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
E op_def=op_def)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
E op_def=op_def)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
E self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
E
E ...which was originally created as op 'tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul', defined at:
E File "./DeepSpeech.py", line 1870, in <module>
E tf.app.run()
E [elided 2 identical lines from previous traceback]
E File "./DeepSpeech.py", line 1500, in train
E results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
E File "./DeepSpeech.py", line 635, in get_tower_results
E calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates)
E File "./DeepSpeech.py", line 516, in calculate_mean_edit_distance_and_loss
E logits = BiRNN(batch_x, tf.to_int64(batch_seq_len), dropout)
E File "./DeepSpeech.py", line 453, in BiRNN
E sequence_length=seq_length)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 442, in bidirectional_dynamic_rnn
E time_major=time_major, scope=bw_scope)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 632, in dynamic_rnn
E dtype=dtype)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 829, in _dynamic_rnn_loop
E swap_memory=swap_memory)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3096, in while_loop
E result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2874, in BuildLoop
E pred, body, original_loop_vars, loop_vars, shape_invariants)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2814, in _BuildLoop
E body_result = body(*packed_vars_for_body)
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3075, in <lambda>
E body = lambda i, lv: (i + 1, orig_body(*lv))
E File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 798, in _time_step
E skip_conditionals=True)
E
E ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[6144,8192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
E [[Node: tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd_grad/tuple/control_dependency)]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E [[Node: tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1/_1283 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4918_tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E
Traceback (most recent call last):
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[6144,8192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd_grad/tuple/control_dependency)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1/_1283 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4918_tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./DeepSpeech.py", line 1666, in train
_, current_step, batch_loss, batch_report, step_summary = session.run([train_op, global_step, loss, report_params, step_summaries_op], **extra_params)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 546, in run
run_metadata=run_metadata)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1022, in run
run_metadata=run_metadata)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1113, in run
raise six.reraise(*original_exc_info)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1098, in run
return self._sess.run(*args, **kwargs)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1170, in run
run_metadata=run_metadata)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 950, in run
return self._sess.run(*args, **kwargs)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[6144,8192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd_grad/tuple/control_dependency)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1/_1283 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4918_tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1', defined at:
File "./DeepSpeech.py", line 1870, in <module>
tf.app.run()
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "./DeepSpeech.py", line 1827, in main
train()
File "./DeepSpeech.py", line 1500, in train
results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
File "./DeepSpeech.py", line 653, in get_tower_results
gradients = optimizer.compute_gradients(avg_loss)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 460, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in gradients
lambda: grad_fn(op, *out_grads))
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 377, in _MaybeCompile
return grad_fn() # Exit early
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py", line 973, in _MatMulGrad
grad_b = math_ops.matmul(a, grad, transpose_a=True)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 2064, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2507, in _mat_mul
name=name)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op 'tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul', defined at:
File "./DeepSpeech.py", line 1870, in <module>
tf.app.run()
[elided 2 identical lines from previous traceback]
File "./DeepSpeech.py", line 1500, in train
results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
File "./DeepSpeech.py", line 635, in get_tower_results
calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates)
File "./DeepSpeech.py", line 516, in calculate_mean_edit_distance_and_loss
logits = BiRNN(batch_x, tf.to_int64(batch_seq_len), dropout)
File "./DeepSpeech.py", line 453, in BiRNN
sequence_length=seq_length)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 442, in bidirectional_dynamic_rnn
time_major=time_major, scope=bw_scope)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 632, in dynamic_rnn
dtype=dtype)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 829, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3096, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2874, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2814, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3075, in <lambda>
body = lambda i, lv: (i + 1, orig_body(*lv))
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 798, in _time_step
skip_conditionals=True)
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[6144,8192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, tower_0/gradients/tower_0/bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd_grad/tuple/control_dependency)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1/_1283 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4918_tower_0/gradients/tower_0/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./DeepSpeech.py", line 1870, in <module>
tf.app.run()
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "./DeepSpeech.py", line 1827, in main
train()
File "./DeepSpeech.py", line 1698, in train
hook.end(session)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 463, in end
self._save(session, last_step)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 474, in _save
self._get_saver().save(session, self._save_path, global_step=step)
File "/home/user0/deepspeech2_env/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1646, in save
raise TypeError("'sess' must be a Session; %s" % sess)
TypeError: 'sess' must be a Session; <tensorflow.python.training.monitored_session.MonitoredSession object at 0x7fd90d07ef60>
[1]+ Terminated (killed)
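
For scale, here is a quick back-of-the-envelope check of the tensor named in the OOM message. It is only a sketch: it assumes that "type float" (DT_FLOAT) means 4 bytes per element, and the helper name `tensor_bytes` is made up for illustration.

```python
# Size of the tensor from the OOM message: shape [6144, 8192], DT_FLOAT.
BYTES_PER_FLOAT32 = 4  # assumption: "type float" is a 32-bit float
N_HIDDEN = 2048        # value passed as --n_hidden on the command line


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the bytes needed for a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


shape = [6144, 8192]
size = tensor_bytes(shape)
print(size, "bytes =", size / 2**20, "MiB")

# The second dimension equals 4 * n_hidden, which is what one would expect
# for an LSTM weight matrix holding four gates (an observation, not a diagnosis).
print(shape[1] == 4 * N_HIDDEN)
```

This single tensor comes to 192 MiB, so the failure is presumably about total memory already in use on GPU:0 at that point rather than about this one allocation. If so, a smaller `--train_batch_size` or `--n_hidden` would be the obvious things to try.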