I’ve been trying to use multiple GPUs for training, but training hangs at initializing the process group (the process stays responsive, but sits there indefinitely with no output or error). Any insights on how to fix this?
I’ve tried PyTorch 0.4.1 and 1.x.
"distributed":{
"backend": "nccl",
"url": "tcp:\/\/localhost:23456"
},
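For reference, here is a minimal sketch of the kind of call that hangs, assuming the backend and url from the config above are passed straight to `torch.distributed.init_process_group`. The world size and rank values are placeholders, not my actual launch setup:

```python
import torch.distributed as dist

# Minimal sketch: initialize the process group with the NCCL backend and a
# TCP init_method matching the config above. world_size and rank are
# placeholder values -- in a real run each process passes its own rank.
dist.init_process_group(
    backend="nccl",
    init_method="tcp://localhost:23456",
    world_size=2,  # e.g. two GPUs on one machine
    rank=0,        # this process's rank; the hang happens on this call
)
```

With a TCP init_method, this call blocks until all `world_size` processes have connected, so if only one process is launched (or the others can't reach localhost:23456) it will wait forever with no error.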