Version 0.5 performance issues with GPUs at lower link speed

After installing version 0.5 I noticed a massive performance difference on the GPU that is not directly connected to the motherboard. I use a 1x riser extension to better cool the GPU, but devices connected through the PCIe 1x extension work significantly slower, and how much slower depends on the PCIe generation: PCIe gen 1 runs 6-8 times slower, PCIe gen 2 runs 3-4 times slower, and PCIe gen 3 runs 2-3 times slower.
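
To confirm what link the riser actually negotiates, here is a minimal sketch (assuming the pynvml bindings for the NVIDIA Management Library are installed) that prints the current PCIe generation and link width for each GPU:

```python
# Minimal sketch: print the negotiated PCIe link generation and width per GPU,
# to confirm what the riser actually runs at.
# Assumes the pynvml package (NVIDIA Management Library bindings) is installed.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
        width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
        print(f"GPU {i} ({name}): PCIe gen {gen}, x{width}")
finally:
    pynvml.nvmlShutdown()
```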

There was no issue with the PCIe link width before v0.5, so something must have changed. Any ideas about how this could be resolved, or at least what may be causing the problem?

It's absolutely not something we did. Maybe it is a change in behavior in TensorFlow or the CUDA runtime, but we have no control over that.

@Proper Could you share more details on your hardware setup, as well as how you evaluate that?

I use the kind of riser that miners use to pull the GPU out of the case for better cooling; it's just a 1x-to-16x PCIe extension.

Everything worked fine until v0.5, at which point things got rather slow.
At best, the GPU connected via the extension is 2x slower than the same GPU connected directly to the motherboard. Not that big of a deal, I just connected it directly, but this seems to be bandwidth related.
When connected via the 1x extension, setting a higher PCIe generation on the motherboard improves performance.
For example, when the port is set to Gen 1, processing time on a file is 3.4 seconds.
Gen 2 does the same file in 2.4 seconds.
Gen 3 does the same file in 1.4 seconds.
When the GPU is connected directly it always runs at 0.6 seconds, no matter what the PCIe configuration is.
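
The numbers above are just wall-clock time per file; a rough sketch of that kind of measurement (run_inference() here is a hypothetical placeholder, not the actual API):

```python
# Rough per-file timing sketch. run_inference() is a hypothetical placeholder;
# substitute the real inference call you want to benchmark.
import time

def run_inference(path):
    # Hypothetical stand-in for the actual model call on one audio file.
    time.sleep(0.1)

start = time.perf_counter()
run_inference("sample.wav")
print(f"processing time: {time.perf_counter() - start:.1f} s")
```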

I also suspect this is something that came from CUDA or the driver. I am just not sure what setting, if any, could be changed to remedy this.
And this affects all GPUs not connected directly to the board; I tested with a wide variety.
Will keep this thread updated if I find anything.

@Proper That's a lot of variables in your problem, and we cannot help/support with that. Still, having more details about your exact setup might help diagnose the issue and/or be useful for others.

@Proper While debugging dual-CPU-socket inference I thought of something. What's your GPU? We changed the CUDA compute capabilities we build for around that version (to save space), and I'm wondering if this is a side-effect.
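
If it helps rule that out, here is a minimal sketch (again assuming pynvml, and a driver recent enough to expose this NVML query) that prints each GPU's CUDA compute capability, which can then be compared against the capabilities the prebuilt binaries target:

```python
# Minimal sketch: print each GPU's CUDA compute capability so it can be
# compared with the capabilities the prebuilt package was compiled for.
# Assumes pynvml is installed and the driver exposes this NVML query.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        print(f"GPU {i}: compute capability {major}.{minor}")
finally:
    pynvml.nvmlShutdown()
```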

This affects any GPU. I tested about 8 different generations and models of GTX cards as well as multiple variants of RTX.
They all work fine in 0.4 and all suffer the performance loss in 0.5, but only when connected via the 1x extension.
This is very easy to replicate; the extension costs $10.

My tests were done with AMD Threadripper and EPYC CPUs. I will try an Intel CPU and let you know if the issue persists.

Yes, but all those models might be recent enough.

You are forgetting the delays in shipping, testing, and reproducing.

FTR, my development box is an AMD Threadripper with two RTX 2080 Ti cards (directly connected to the CPU, ASUS X399 PRIME), and there is no slowdown.

The slowdown only happens if the GPU is connected via an extension.

I have a system exactly like yours, an ASUS X399 Threadripper board with RTX cards.
If you ever want to reproduce this issue, just unplug one GPU from the board and connect it via a 16x-to-1x extension. After a reboot you will see the performance drop on that GPU.

I tested GTX 680, 780, 980, 1060, 1080, 1080 Ti, RTX 2060, 2080, 2080 Ti, and a few others as well. All had similar percentage losses running via the extension on 0.5, but ran normally when connected directly.
However, when using 0.4 or prior versions, all these GPUs run identically whether connected directly or via the extension.

I understand this is an odd thing and that you have limited time; I appreciate your input on this.