I’m trying to train a model on a custom dataset, but I get a CUDA out of memory error after the first epoch. Training on LJSpeech works fine. I’ve tried reducing the batch size from 32 to 16 to 8, all the way down to 1. I’m looking at running on a bigger GPU, but I’m wondering whether there’s anything else I can do, or whether the problem is with my new dataset. Below are some details from the new dataset:
> DataLoader initialization
| > Use phonemes: False
| > Number of instances : 3221
| > Max length sequence: 522
| > Min length sequence: 2
| > Avg length sequence: 59.317913691400186
| > Num. instances discarded by max-min (max=150, min=6) seq limits: 209
| > Batch group size: 0.
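To sanity-check these numbers, here's a minimal sketch of how the length stats could be recomputed directly from the metadata, assuming an LJSpeech-style pipe-separated metadata.csv with the transcript in the last column (the file name and column layout are assumptions on my part):

```python
# Minimal sketch: recompute the character-length stats the DataLoader reports,
# from a pipe-separated metadata file ("id|text|normalized text" is assumed).
lengths = []
with open("metadata.csv", encoding="utf-8") as f:
    for line in f:
        text = line.rstrip("\n").split("|")[-1]  # assume transcript is the last column
        lengths.append(len(text))

lengths.sort()
print(f"Number of instances: {len(lengths)}")
print(f"Max length sequence: {lengths[-1]}")
print(f"Min length sequence: {lengths[0]}")
print(f"Avg length sequence: {sum(lengths) / len(lengths)}")
# samples the (max=150, min=6) seq limits would discard
print(f"Discarded by max-min limits: {sum(1 for n in lengths if n < 6 or n > 150)}")
```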
Comparing the new dataset against the LJSpeech dataset, the Max length sequence is longer but the Avg length sequence is shorter.
> DataLoader initialization
| > Use phonemes: True
| > phoneme language: en-us
| > Number of instances : 12000
| > Max length sequence: 187
| > Min length sequence: 7
| > Avg length sequence: 98.32825
| > Num. instances discarded by max-min (max=150, min=6) seq limits: 584
| > Batch group size: 0.
Could it be because the Max length sequence is so much longer in the new dataset than in LJSpeech? 522 vs. 187? Any help is appreciated!
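In the meantime, one test I'm considering (just a sketch, not something I've run yet) is writing out a filtered copy of the metadata with the longest transcripts dropped, to see whether the 500+ character outliers are what triggers the OOM. The metadata.csv name and pipe-separated layout are the same assumptions as above:

```python
# Minimal sketch: copy metadata.csv, dropping transcripts longer than MAX_CHARS,
# so training can be retried without the very long outlier sequences.
MAX_CHARS = 150  # matches the max seq limit shown in the DataLoader log

kept, dropped = [], 0
with open("metadata.csv", encoding="utf-8") as src:
    for line in src:
        text = line.rstrip("\n").split("|")[-1]
        if len(text) <= MAX_CHARS:
            kept.append(line)
        else:
            dropped += 1

with open("metadata_filtered.csv", "w", encoding="utf-8") as dst:
    dst.writelines(kept)

print(f"Kept {len(kept)} samples, dropped {dropped} longer than {MAX_CHARS} chars")
```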