maxest

PCI Express Throughput



Here (https://en.wikipedia.org/wiki/PCI_Express) is a nice table outlining speeds for PCI Express. I have a GeForce GTX 660 on a motherboard with PCI Express 3.0 x16. I made a test by writing a simple D3D11 app that downloads a 1920x1080x32 (8 MB) image from the GPU to the CPU. The whole operation takes 8 ms. Per second this sums to around 1 GB of data, which corresponds exactly to PCI Express 3.0 x1. Is this how it is supposed to work? Is it that all CopyResource/Map data goes through just one of the 16 lanes?


I've never heard this described in detail, but I would imagine that while the interface may have 1, 2, 4, 8 or 16 lanes, the card and driver determine how the data is transmitted. I would assume that if the data can fit within a single lane, a single lane would be used.

Edited by MarkS


The question is what "if the data can fit" means. I would like to copy data back from the GPU to the CPU as fast as possible, and since no other data goes that way except for my one texture download, I would ideally like to utilize all 16 lanes. If that's possible, of course.


The question is what "if the data can fit" means. I would like to copy data back from the GPU to the CPU as fast as possible, and since no other data goes that way except for my one texture download, I would ideally like to utilize all 16 lanes. If that's possible, of course.

 

You are not streaming data to the monitor. You are telling the card, through the driver, how much data is to be transferred and the card and driver make the appropriate decisions as to how that happens.

 

You have to understand that you have absolutely no control over what the graphics card and driver do in this matter. I'm not 100% convinced that even the driver has control over this, and if not, the user never will.

 

Out of curiosity, why is this important to you? Have you found yourself bottle-necked by the number of lanes used, or are you looking at potential issues?

Edited by MarkS


I'm just looking at potential uses. I'm aware that GPU->CPU traffic should be avoided as much as possible, but for some tests I needed to do this, and to make those tests reliable I wanted to utilize the full transfer potential.

 

On a side note, uploading data (CPU -> GPU) takes 3-5 ms (roughly twice as fast as the download).


Is this how it is supposed to work? Is it that all CopyResource/Map data goes through just one of the 16 lanes?

No, that's not how it's supposed to work. If the link is set up for PCIe 3.0 x16, then all 16 lanes should be transferring at the same time. Maybe your video card isn't in the x16 slot, or maybe it's misconfigured.


In addition, just because your motherboard supports PCI-E 3.0 doesn't mean that your graphics card supports PCI-E 3.0, and even though the graphics card's specification states 3.0 support, I would still be wary. The GPU may fall back to a lower link speed if certain conditions are not met, so unless you have the full low-level specification for the GPU in question, all we are dealing with is the spec sheet.


I'm now testing my work computer, which is brand new, with a GeForce GTX 1080. See the detailed spec in this picture: https://postimg.org/image/hwhuntpn5/

 

Now my tests show upload (CPU->GPU) at 8 GB/s and download (GPU->CPU) at 3 GB/s.

PCI-E is bidirectional, and all sources I've found claim the transfer rate in both directions should be identical, which is not true in my case.


Nothing. It's something like this (download):

        uint64 bef = TickCount();

        // Queue a GPU-side copy into the CPU-readable staging texture.
        deviceContext->CopyResource(stagingCopy.texture, gbufferDiffuseRT.texture);

        // Map() blocks until the GPU has finished the copy, so the timed
        // interval includes draining any queued GPU work, not just the
        // PCIe transfer itself.
        D3D11_MAPPED_SUBRESOURCE mappedSubresource;
        deviceContext->Map(stagingCopy.texture, 0, D3D11_MAP_READ, 0, &mappedSubresource);
        // Note: this assumes the row pitch equals width * 4; in general the
        // copy should honor mappedSubresource.RowPitch row by row.
        memcpy(mydata, mappedSubresource.pData, sizeof(mydata));
        deviceContext->Unmap(stagingCopy.texture, 0);

        uint64 aft = TickCount();
        cout << aft - bef << endl;

 

As for my home GeForce GTX 660, I've just checked in the HWiNFO app that it's plugged into a PCI-E 2.0 slot, hence the slower speed than on my work computer.

Nevertheless, I presume the 8 GB/s and 3 GB/s figures should be higher. And identical.

