I am trying to train a vocoder on an RTX 3090 with the torch 1.8.0 nightly (because 1.7.0 doesn't support the 3090) and I get the error below. Does anyone know where the problem might be?
```
TRAINING (2020-11-30 16:55:21)
 ! Run is removed from /home/abai/Downloads/kazakh_synthes/Results/multiband-melgan-November-30-2020_04+55PM-f6c96b0
Traceback (most recent call last):
  File "bin/train_vocoder.py", line 647, in <module>
    main(args)
  File "bin/train_vocoder.py", line 551, in main
    epoch)
  File "bin/train_vocoder.py", line 149, in train
    feats_real, y_hat_sub, y_G_sub)
  File "/home/abai/MozillaTTS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/abai/TTS/TTS/vocoder/layers/losses.py", line 234, in forward
    stft_loss_mg, stft_loss_sc = self.stft_loss(y_hat.squeeze(1), y.squeeze(1))
  File "/home/abai/MozillaTTS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/abai/TTS/TTS/vocoder/layers/losses.py", line 71, in forward
    lm, lsc = f(y_hat, y)
  File "/home/abai/MozillaTTS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/abai/TTS/TTS/vocoder/layers/losses.py", line 47, in forward
    y_hat_M = self.stft(y_hat)
  File "/home/abai/TTS/TTS/vocoder/layers/losses.py", line 26, in __call__
    return_complex=True)
  File "/home/abai/MozillaTTS/lib/python3.7/site-packages/torch/functional.py", line 516, in stft
    normalized, onesided, return_complex)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
Opening this topic again, as the problem is still there. I tried training on a 2080 and a 3090, and neither of them works with torch 1.6 (since torch 1.6 doesn't support CUDA 11).
So this issue only happens on the torch 1.7 and 1.8 nightly builds. The solution is to initialize the parameters on the GPU when the matrices/tensors are created; I will search the docs and write the results here.
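To make that concrete, here is a rough sketch of what I mean (not the exact TTS code): create the window tensor on the GPU at creation time instead of leaving it on the CPU.

```python
import torch

win_length = 1024

# Default: the window is created on the CPU, which later clashes with CUDA inputs
window_cpu = torch.hann_window(win_length)

# Create it on the GPU instead (or move it right after creation)
window_gpu = torch.hann_window(win_length, device="cuda")

print(window_cpu.device)  # cpu
print(window_gpu.device)  # cuda:0
```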
@Dias How did you manage to get past this problem? I am having the exact same issue with my RTX 3060. When I try to start training the vocoder model it says "expected all tensors to be on the same device, but found at least two devices".
I tried downgrading torch and also tried the nightly version, but the issue remains. Can you please let me know how you fixed it? Thanks.
As far as I remember, I searched the docs: some tensors in PyTorch were being loaded on the GPU and some on the CPU, and that was the conflict. I don't know the current situation on the latest versions, as I was working with this half a year ago.
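A minimal, made-up example of that conflict (not the TTS code itself): torch.stft fails when the signal is on the GPU but the window is still on the CPU.

```python
import torch

signal = torch.randn(16000, device="cuda")  # training audio lives on cuda:0
window = torch.hann_window(1024)            # created on the CPU by default

# Raises: RuntimeError: Expected all tensors to be on the same device,
# but found at least two devices, cuda:0 and cpu!
spec = torch.stft(signal, n_fft=1024, hop_length=256, win_length=1024,
                  window=window, return_complex=True)
```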
You need to check from your traceback which file the issue comes from, and then move the tensors to the same device, either the CPU (.cpu()) or the GPU (.cuda()).
For me it was in /TTS/vocoder/layers/losses.py and I made the following change:
line 21: self.window = self.window.cuda()
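For context, here is a rough paraphrase of what that part of losses.py looks like after the change (class names and line numbers may differ between TTS versions). The important part is assigning the result back, since .cuda() returns a copy rather than moving the tensor in place.

```python
import torch


# Rough paraphrase of the STFT wrapper in TTS/vocoder/layers/losses.py
# after the change (names and line numbers may differ in your version).
class TorchSTFT:
    def __init__(self, n_fft=1024, hop_length=256, win_length=1024,
                 window="hann_window"):
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.win_length = win_length
        self.window = getattr(torch, window)(win_length)
        # the added line: move the window to the GPU so it matches the
        # CUDA audio tensors seen during training
        self.window = self.window.cuda()

    def __call__(self, x):
        return torch.stft(x, self.n_fft, hop_length=self.hop_length,
                          win_length=self.win_length, window=self.window,
                          return_complex=True)
```

Note that hard-coding .cuda() like this assumes you always train on a GPU; moving the window to the input's device inside __call__ (e.g. self.window.to(x.device)) would be more general.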
P.S. You will need to reinstall the package if you change the files in the repository. Alternatively, you can edit the files directly in your TTS installation under your environment's site-packages.