Hi, I'm running Ubuntu 18.04 with an NVIDIA RTX 3080. I checked out the v0.8.2 tag on GitHub and only modified alphabet.txt to accommodate the German Common Voice dataset.
I’m getting a rather long error message when trying to run this command:
./DeepSpeech.py --train_files ./data/CV/de/clips/train.csv --dev_files ./data/CV/de/clips/dev.csv --test_files ./data/CV/de/clips/test.csv --log_level 0
I believe the error message boils down to this line:
2020-10-12 01:10:20.357132: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal : Value 'sm_86' is not defined for option 'gpu-name'
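To pull the rejected architecture out of a line like this, here is a minimal sketch in Python. The helper name `unsupported_arch` is my own and not part of TensorFlow or DeepSpeech; it only assumes the ptxas message format quoted above.

```python
import re

# Matches the ptxas complaint; tolerates straight or typographic quotes,
# since forum software sometimes converts them.
PTXAS_RE = re.compile(r"Value\s+['‘’]?(sm_(\d{2,}))['‘’]?\s+is not defined for option")


def unsupported_arch(log_line):
    """Return (arch, (major, minor)) from a ptxas 'not defined' message, or None."""
    match = PTXAS_RE.search(log_line)
    if match is None:
        return None
    arch, digits = match.group(1), match.group(2)
    # sm_86 corresponds to compute capability 8.6: the last digit is the minor version.
    return arch, (int(digits[:-1]), int(digits[-1]))


line = "ptxas fatal : Value 'sm_86' is not defined for option 'gpu-name'"
print(unsupported_arch(line))
```

For the line above this gives the architecture `sm_86`, i.e. compute capability 8.6, which matches what TensorFlow reports for the card further down in the log.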
The entire log looks like this:
2020-10-12 00:55:02.977870: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-12 00:55:02.999112: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3999980000 Hz
2020-10-12 00:55:02.999355: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5b8b4a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-12 00:55:02.999366: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-12 00:55:03.000739: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-10-12 00:55:03.075007: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.075414: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5c25230 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-12 00:55:03.075430: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3080, Compute Capability 8.6
2020-10-12 00:55:03.075539: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.075880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2020-10-12 00:55:03.076083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-12 00:55:03.076901: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-12 00:55:03.077578: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-10-12 00:55:03.077745: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-10-12 00:55:03.078661: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-12 00:55:03.079365: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-10-12 00:55:03.081532: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-12 00:55:03.081602: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.081934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.082203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-10-12 00:55:03.082227: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-12 00:55:03.082778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-12 00:55:03.082787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-10-12 00:55:03.082792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-10-12 00:55:03.082860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.083161: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.083446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 8801 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
2020-10-12 00:55:03.753431: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.753746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2020-10-12 00:55:03.753775: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-12 00:55:03.753783: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-12 00:55:03.753792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-10-12 00:55:03.753800: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-10-12 00:55:03.753808: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-12 00:55:03.753815: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-10-12 00:55:03.753822: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-12 00:55:03.753859: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.754153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:03.754419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
WARNING:tensorflow:From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
W1012 00:55:03.927856 139958901720896 deprecation.py:323] From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
WARNING:tensorflow:From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
W1012 00:55:03.928005 139958901720896 deprecation.py:323] From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
WARNING:tensorflow:From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
W1012 00:55:03.928088 139958901720896 deprecation.py:323] From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W1012 00:55:04.018131 139958901720896 lazy_loader.py:50] The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/rnn/python/ops/lstm_ops.py:597: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
W1012 00:55:04.019410 139958901720896 deprecation.py:323] From /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/rnn/python/ops/lstm_ops.py:597: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
WARNING:tensorflow:From /home/lukas/DeepSpeech/training/deepspeech_training/train.py:245: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1012 00:55:04.068966 139958901720896 deprecation.py:323] From /home/lukas/DeepSpeech/training/deepspeech_training/train.py:245: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-10-12 00:55:04.431726: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:04.432038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2020-10-12 00:55:04.432065: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-12 00:55:04.432074: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-12 00:55:04.432082: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-10-12 00:55:04.432090: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-10-12 00:55:04.432098: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-12 00:55:04.432106: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-10-12 00:55:04.432114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-12 00:55:04.432152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:04.432444: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:04.432710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-10-12 00:55:04.432726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-12 00:55:04.432732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-10-12 00:55:04.432735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-10-12 00:55:04.432781: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:04.433072: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-12 00:55:04.433344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8801 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
D Session opened.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
2020-10-12 00:55:15.773105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-12 00:56:27.844114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-12 01:10:20.357132: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal : Value 'sm_86' is not defined for option 'gpu-name'
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-10-12 01:10:21.064574: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(63, 494), b.shape=(494, 2048), m=63, n=2048, k=494
     [[{{node tower_0/MatMul}}]]
     [[concat/concat/_99]]
  (1) Internal: Blas GEMM launch failed : a.shape=(63, 494), b.shape=(494, 2048), m=63, n=2048, k=494
     [[{{node tower_0/MatMul}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "./DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 933, in main
    train()
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 601, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 566, in run_set
    feed_dict=feed_dict)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(63, 494), b.shape=(494, 2048), m=63, n=2048, k=494
     [[node tower_0/MatMul (defined at /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[concat/concat/_99]]
  (1) Internal: Blas GEMM launch failed : a.shape=(63, 494), b.shape=(494, 2048), m=63, n=2048, k=494
     [[node tower_0/MatMul (defined at /home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'tower_0/MatMul':
  File "./DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 933, in main
    train()
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 479, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 312, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 239, in calculate_mean_edit_distance_and_loss
    logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 180, in create_model
    layers['layer_1'] = layer_1 = dense('layer_1', batch_x, Config.n_hidden_1, dropout_rate=dropout[0])
  File "/home/lukas/DeepSpeech/training/deepspeech_training/train.py", line 82, in dense
    output = tf.nn.bias_add(tf.matmul(x, weights), bias)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 2754, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6136, in mat_mul
    name=name)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/lukas/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
apt list --installed | grep cuda
yields the following list:
cuda-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed]
cuda-command-line-tools-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-compiler-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cublas-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cublas-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cudart-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cudart-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cufft-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cufft-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cuobjdump-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cupti-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-curand-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-curand-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cusolver-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cusolver-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cusparse-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-cusparse-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-demo-suite-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-documentation-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-driver-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-drivers/unknown,unknown,now 455.23.05-1 amd64 [installed,automatic]
cuda-drivers-455/unknown,unknown,now 455.23.05-1 amd64 [installed,automatic]
cuda-gdb-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-gpu-library-advisor-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-libraries-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-libraries-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-license-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-memcheck-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-misc-headers-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-npp-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-npp-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nsight-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nsight-compute-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvcc-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvdisasm-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvgraph-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvgraph-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvjpeg-10-0/unknown,now 10.0.130.1-1 amd64 [installed,automatic]
cuda-nvjpeg-dev-10-0/unknown,now 10.0.130.1-1 amd64 [installed,automatic]
cuda-nvml-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvprof-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvprune-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvrtc-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvrtc-dev-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvtx-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-nvvp-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48/now 1.0-1 amd64 [installed,local]
cuda-runtime-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-samples-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-toolkit-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-tools-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
cuda-visual-tools-10-0/unknown,unknown,now 10.0.130-1 amd64 [installed,automatic]
libcudart9.1/bionic,now 9.1.85-3ubuntu1 amd64 [installed,automatic]
libcudnn7/unknown,now 7.6.4.38-1+cuda10.0 amd64 [installed,upgradable to: 7.6.5.32-1+cuda10.2]
nvidia-cuda-dev/bionic,now 9.1.85-3ubuntu1 amd64 [installed,automatic]
nvidia-cuda-doc/bionic,bionic,now 9.1.85-3ubuntu1 all [installed,automatic]
nvidia-cuda-gdb/bionic,now 9.1.85-3ubuntu1 amd64 [installed,automatic]
nvidia-cuda-toolkit/bionic,now 9.1.85-3ubuntu1 amd64 [installed]
nvidia-smi results in this:
Mon Oct 12 09:04:21 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    On   | 00000000:01:00.0  On |                  N/A |
|  0%   50C    P5    46W / 320W |    519MiB / 10014MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1247      G   /usr/lib/xorg/Xorg                 34MiB |
|    0   N/A  N/A      1303      G   /usr/bin/gnome-shell               78MiB |
|    0   N/A  N/A      1862      G   /usr/lib/xorg/Xorg                254MiB |
|    0   N/A  N/A      2010      G   /usr/bin/gnome-shell               48MiB |
|    0   N/A  N/A      2603      G   /usr/lib/firefox/firefox            4MiB |
|    0   N/A  N/A      2712      G   /usr/lib/firefox/firefox            4MiB |
|    0   N/A  N/A      2926      G   …token=7611235723361034942         21MiB |
|    0   N/A  N/A      3395      G   …/debug.log --shared-files         15MiB |
|    0   N/A  N/A      3702      G   /usr/lib/firefox/firefox            4MiB |
|    0   N/A  N/A      3742      G   /usr/lib/firefox/firefox            4MiB |
|    0   N/A  N/A     13105      G   gnome-control-center                4MiB |
|    0   N/A  N/A     20096      G   /usr/bin/vlc                       35MiB |
+-----------------------------------------------------------------------------+
GPU memory usage shoots up to about 9 GB and stays there for a few minutes, while GPU-Util stays below 10%. Training then crashes. I believe the cause is that the older CUDA toolkit (10.0) required by TensorFlow 1.15 cannot work with the compute capability of the RTX 3080 (8.6).
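My suspicion can be written down as a quick check. This is only a sketch: the minimum toolkit versions in the table are my understanding of NVIDIA's CUDA release notes (sm_75 since CUDA 10.0, sm_80 since 11.0, sm_86 since 11.1) and should be verified there. The table and the function name `toolkit_supports` are illustrative, not an official API.

```python
# First CUDA toolkit release whose ptxas accepts each architecture.
# Assumption: values as I understand NVIDIA's release notes; verify before relying on them.
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series)
}


def toolkit_supports(compute_capability, cuda_version):
    """Return True if the toolkit version can compile natively for the GPU."""
    required = MIN_CUDA_FOR_ARCH.get(compute_capability)
    if required is None:
        raise KeyError(f"unknown compute capability: {compute_capability}")
    # Tuples compare element by element, so (10, 0) < (11, 1).
    return cuda_version >= required


# Values from my log: the RTX 3080 reports 8.6, the loaded libraries are CUDA 10.0.
print(toolkit_supports((8, 6), (10, 0)))  # False
print(toolkit_supports((8, 6), (11, 1)))  # True
```

If that table is right, the CUDA 10.0 ptxas has no way to target sm_86, which would explain the fallback to driver-side PTX compilation and possibly the cuBLAS failure that follows.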