I’m trying to train a model on a custom dataset, but I get a CUDA out of memory error after the first epoch. Training on LJSpeech works fine. I’ve tried reducing the batch size from 32 to 16 to 8, all the way down to 1. I’m looking at running on a bigger GPU, but I’m wondering whether there’s anything else I can do, or whether the problem is with my new dataset. Below are some details from the new dataset:
> DataLoader initialization
| > Use phonemes: False
| > Number of instances : 3221
| > Max length sequence: 522
| > Min length sequence: 2
| > Avg length sequence: 59.317913691400186
| > Num. instances discarded by max-min (max=150, min=6) seq limits: 209
| > Batch group size: 0.
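To sanity-check these numbers, here's a minimal sketch of how the length stats could be recomputed directly from the metadata, assuming an LJSpeech-style pipe-separated metadata.csv with the transcript in the last column (the file name and column layout are assumptions on my part):

```python
# Minimal sketch: recompute the character-length stats the DataLoader reports,
# from a pipe-separated metadata file ("id|text|normalized text" is assumed).
lengths = []
with open("metadata.csv", encoding="utf-8") as f:
    for line in f:
        text = line.rstrip("\n").split("|")[-1]  # assume transcript is the last column
        lengths.append(len(text))

lengths.sort()
print(f"Number of instances: {len(lengths)}")
print(f"Max length sequence: {lengths[-1]}")
print(f"Min length sequence: {lengths[0]}")
print(f"Avg length sequence: {sum(lengths) / len(lengths)}")
# samples the (max=150, min=6) seq limits would discard
print(f"Discarded by max-min limits: {sum(1 for n in lengths if n < 6 or n > 150)}")
```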
Comparing the new dataset against the LJSpeech dataset, the Max length sequence is longer but the Avg length sequence is shorter.
> DataLoader initialization
| > Use phonemes: True
| > phoneme language: en-us
| > Number of instances : 12000
| > Max length sequence: 187
| > Min length sequence: 7
| > Avg length sequence: 98.32825
| > Num. instances discarded by max-min (max=150, min=6) seq limits: 584
| > Batch group size: 0.
Could it be because the Max length sequence is so much longer in the new dataset than in LJSpeech? 522 vs. 187? Any help is appreciated!
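In the meantime, one test I'm considering (just a sketch, not something I've run yet) is writing out a filtered copy of the metadata with the longest transcripts dropped, to see whether the 500+ character outliers are what triggers the OOM. The metadata.csv name and pipe-separated layout are the same assumptions as above:

```python
# Minimal sketch: copy metadata.csv, dropping transcripts longer than MAX_CHARS,
# so training can be retried without the very long outlier sequences.
MAX_CHARS = 150  # matches the max seq limit shown in the DataLoader log

kept, dropped = [], 0
with open("metadata.csv", encoding="utf-8") as src:
    for line in src:
        text = line.rstrip("\n").split("|")[-1]
        if len(text) <= MAX_CHARS:
            kept.append(line)
        else:
            dropped += 1

with open("metadata_filtered.csv", "w", encoding="utf-8") as dst:
    dst.writelines(kept)

print(f"Kept {len(kept)} samples, dropped {dropped} longer than {MAX_CHARS} chars")
```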