Buy new RAM!

When trying to train on CPU, I got:

Traceback (most recent call last):
  File "TTS/bin/train_glow_tts.py", line 647, in <module>
    main(args)
  File "TTS/bin/train_glow_tts.py", line 558, in main
    epoch)
  File "TTS/bin/train_glow_tts.py", line 190, in train
    text_input, text_lengths, mel_input, mel_lengths, attn_mask, g=speaker_c)
  File "C:\mozillatts\TTS\tts\models\glow_tts.py", line 161, in forward
    z, logdet = self.decoder(y, y_mask, g=g, reverse=False)
  File "C:\envs\project\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\mozillatts\TTS\tts\layers\glow_tts\decoder.py", line 122, in forward
    x, logdet = f(x, x_mask, g=g, reverse=reverse)
  File "C:\envs\project\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\mozillatts\TTS\tts\layers\glow_tts\glow.py", line 200, in forward
    x = self.wn(x, x_mask, g)
  File "C:\envs\project\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\mozillatts\TTS\tts\layers\generic\wavenet.py", line 105, in forward
    n_channels_tensor)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "C:\mozillatts\TTS\tts\layers\generic\wavenet.py", line 8, in <foward op>
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    n_channels_int = n_channels[0]
    in_act = input_a + input_b
             ~~~~~~~~~~~~~~~~~ <--- HERE
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. 
DefaultCPUAllocator: not enough memory: you tried to allocate 25411584 bytes. Buy new RAM!
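
For reference, this is roughly what that fused op in wavenet.py computes; the lines after the failure point are assumed from the standard WaveGlow-style implementation, not copied from my local file. The point is that input_a + input_b materialises a fresh float tensor of shape [batch, 2 * hidden_channels, time], and that ~25 MB allocation is what DefaultCPUAllocator refuses:

import torch

@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    # n_channels is a 1-element tensor holding the hidden channel count
    n_channels_int = n_channels[0]
    # this sum allocates a new [batch, 2 * n_channels, time] tensor -> where the OOM happens
    in_act = input_a + input_b
    # first half of the channels goes through tanh, second half through sigmoid
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
    # gated activation: elementwise product of the two halves
    return t_act * s_act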

This is the first time I have hit this, and it happens on CPU.

I only changed r to 6 and used the main Glow model (https://colab.research.google.com/drive/1NC4eQJFvVEqD8L4Rd8CVK25_Z-ypaBHD?usp=sharing).

Is it because of r or because of the Glow model?

Do you train the model on CPU?

I don’t know where https://colab.research.google.com/drive/1NC4eQJFvVEqD8L4Rd8CVK25_Z-ypaBHD?usp=sharing was trained; probably on GPU.
But I train on CPU because of the thread Training with a 2GB GPU - TTS (Text-to-Speech) - Mozilla Discourse.

On Colab, select a GPU runtime. Otherwise it has 2GB RAM, and training on CPU would be too slow and impractical.

I tried Colab, but they started restricting GPU access more. I haven’t been able to use the GPU for more than 6 days.

Also, “try Colab” is not solving my problem.

Where are you getting the 6-day restriction from? I am not able to find anything about it. The only restriction I know of is that Colab closes the session after 12 hours.

Edit: OK, I guess it’s this: “As a result, users who use Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage limits and have their access to GPUs and TPUs temporarily restricted.” https://research.google.com/colaboratory/faq.html#resource-limits

“Try Colab” is not solving my problem here.
Please address it in the other thread: Training with a 2GB GPU.

Not sure where I told you to use Colab. :man_shrugging:

Also the provided notebook is not available. Any more information on your setup?

You are talking about Colab, but this is about why the Glow model with r = 6 fails on my machine.

You are going offtopic.

What requirements do

  • the Glow model and
  • r = 6

place on a local machine?

There are no fixed requirements; it depends on several factors.

Did you check your process usage (RAM, CPU) when starting the training?
The error message looks like you are running out of available RAM to allocate.
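
A minimal sketch of how you could watch it, assuming psutil is installed (the commented decoder call at the end is just an illustration of where you might place the logging, not code from the repo):

import psutil

def log_memory(tag=""):
    # resident memory of the current (training) process
    rss_mb = psutil.Process().memory_info().rss / (1024 ** 2)
    # system-wide view, to see how much is left for new allocations
    vm = psutil.virtual_memory()
    print(f"[{tag}] process RSS: {rss_mb:.0f} MB | "
          f"system used: {vm.percent:.0f}% | available: {vm.available / (1024 ** 2):.0f} MB")

# illustrative placement around the call that crashes:
# log_memory("before decoder forward")
# z, logdet = model.decoder(y, y_mask, g=g, reverse=False)
# log_memory("after decoder forward")

If the available figure collapses right before the crash, the fix is on the memory side (smaller batch size, shorter max audio length, or more RAM) rather than anything specific to the Glow model itself.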