Hi all, I’ve been trying to train an acoustic model with release v0.8.2, but the training process gets killed partway through every run. The kernel logs show the OOM killer terminating it, so it looks like host RAM is being exhausted, not GPU memory.
The command I’m running to invoke training is:
python -u DeepSpeech.py --train_files ../speech-transcription/training/train_data.csv --train_batch_size 1000 --n_hidden 2048 --epochs 30 --verbosity 1
The same OOM kill occurs even when the batch size is set to 1.
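To see where the memory actually goes, I’ve been logging the training process’s resident set size from a second terminal. Here’s the minimal sketch I’m using, assuming psutil is installed (`pip install psutil`); the script name is just my own, and it takes the training process’s PID as its first argument:

```python
# watch_mem.py -- log host-RAM usage of the training process every few seconds
# (minimal sketch; assumes psutil is installed, pass the training PID as argv[1])
import sys
import time

import psutil

pid = int(sys.argv[1])
proc = psutil.Process(pid)

try:
    while proc.is_running():
        rss_gib = proc.memory_info().rss / 1024 ** 3   # resident set size of the trainer
        sys_used = psutil.virtual_memory().percent     # whole-system RAM usage
        print(f"train RSS: {rss_gib:6.2f} GiB | system RAM: {sys_used:5.1f}%")
        time.sleep(5)
except psutil.NoSuchProcess:
    print("training process exited (or was killed)")
```

Run as `python watch_mem.py <training-pid>`. In my case the RSS climbs steadily until the kernel kills the process.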
Hardware specs:
OS: Ubuntu 20.04.1 LTS (Focal Fossa)
MOBO: MSI X299 Pro
CPU: Intel i9-10900X 10-core 3.7 GHz
RAM: 64 GB Corsair Vengeance 3200 MHz
GPU: Nvidia RTX Titan 24 GB
I uploaded the relevant kernel logs to pastebin: https://pastebin.com/NgS94xPQ
Has anyone experienced a similar issue when training? I have a hunch it’s a hardware issue, but I’m not sure. I’m considering a training run on an AWS EC2 P3 instance to verify my training data is good, though it should be: it’s all 16 kHz wav files.
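Before spinning up the EC2 instance, I ran a quick sanity check over the data with the standard library. A sketch of what I used is below; it assumes the CSV uses the `wav_filename` column that DeepSpeech training CSVs expect:

```python
# check_wavs.py -- verify every clip in the training CSV is 16 kHz mono
# (standard library only; assumes the CSV has DeepSpeech's wav_filename column)
import csv
import wave

problems = []
with open("../speech-transcription/training/train_data.csv", newline="") as f:
    for row in csv.DictReader(f):
        path = row["wav_filename"]
        try:
            with wave.open(path) as w:
                if w.getframerate() != 16000 or w.getnchannels() != 1:
                    problems.append((path, w.getframerate(), w.getnchannels()))
        except (wave.Error, OSError) as err:
            problems.append((path, "unreadable", err))

for p in problems:
    print(*p)
print(f"{len(problems)} problem file(s)")
```

It reported no problem files here, which is why I suspect the hardware or my setup rather than the data.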