
GPU RAM access latency...


Is GPU RAM access latency over PCI Express generally lower than over AGP? Is this a direction we should be developing toward? Given the way contemporary shaders are developing, should we be pushing this kind of enhancement to the forefront of GPU hardware design? Or should I just hush up and be patient for the next bus, and the one after that, etc.? :)

Guest Anonymous Poster
The PCIe x16 bus runs at least twice as fast as AGP 8x. However, it is still slow compared to the card's local video RAM, and many SLI designs divide the bandwidth between two or sometimes four chips.

The ideal case would be to upload all scene geometry, all textures and all shaders to the card, and only send updates from the CPU. When the physics simulation is done on the card, even that data can be left in place, which speeds up rendering and lowers latency. System memory should only be used as a level-2 cache between the video card and the storage media (i.e. the disk). For a fully programmable GPU, it's conceivable that the CPU sends only the network and game-logic data to the rendering chip and the rest is calculated on the chip; the optional collision data is then packed and sent back to the game logic for processing and, optionally, for sending over the network.
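
A minimal sketch of that "upload once, only send updates" idea, assuming an OpenGL-style API; the function and buffer names here are purely illustrative, not from any particular engine:

[code]
// Hedged sketch: keep the mesh resident in video memory and send only a small
// per-frame update (a transform) across the bus. Assumes a valid OpenGL 2.x
// compatibility context; uploadOnce()/drawFrame() are made-up helper names.
#include <GL/glew.h>

GLuint meshVbo = 0;

void uploadOnce(const float* vertices, GLsizeiptr byteCount)
{
    // One-time upload: GL_STATIC_DRAW hints that the data lives on the card
    // and will not be rewritten every frame.
    glGenBuffers(1, &meshVbo);
    glBindBuffer(GL_ARRAY_BUFFER, meshVbo);
    glBufferData(GL_ARRAY_BUFFER, byteCount, vertices, GL_STATIC_DRAW);
}

void drawFrame(GLuint program, GLint mvpLocation, const float* mvp, GLsizei vertexCount)
{
    // Per-frame bus traffic is just a 4x4 matrix (64 bytes), not the geometry.
    glUseProgram(program);
    glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, mvp);

    // The geometry itself is already on the card; we only reference it.
    glBindBuffer(GL_ARRAY_BUFFER, meshVbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const void*)0);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    glDisableClientState(GL_VERTEX_ARRAY);
}
[/code]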

Viktor

P.S.: For the above to work, we need a GPU that can render scenes with changing models, in changing movement phases and at changing coordinates. All it needs is an input vector containing positions, model and skin selectors, and physics data. The first step is to determine the effect of the physical rules on the objects, then render them. This could be written with a vertex shader that has writeback support, so it can store the results in an output vector that is sent back to the CPU and used as the base vector for the next step. This way the CPU would only have to handle the network traffic, the high-level game logic and the optional AI logic.
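
A rough sketch of such a writeback step, using OpenGL transform feedback (the closest existing mechanism to the "vertex shader with writeback support" described above; D3D10 calls the equivalent stream output). The physics program is assumed to be already compiled and linked with its feedback varyings declared, and the particle layout and names are made up for illustration:

[code]
// Hedged sketch: one GPU "physics step with writeback" via transform feedback.
// One particle = position (vec3) + velocity (vec3), tightly packed (24 bytes).
#include <GL/glew.h>

void stepParticles(GLuint physicsProgram,
                   GLuint srcBuffer, GLuint dstBuffer,
                   GLsizei particleCount)
{
    glUseProgram(physicsProgram);

    // Read the current state as vertex input...
    glBindBuffer(GL_ARRAY_BUFFER, srcBuffer);
    glEnableVertexAttribArray(0);                                // position
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 24, (void*)0);
    glEnableVertexAttribArray(1);                                // velocity
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 24, (void*)12);

    // ...and capture the vertex shader's outputs into the destination buffer.
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, dstBuffer);
    glEnable(GL_RASTERIZER_DISCARD);           // no pixels, just the writeback
    glBeginTransformFeedback(GL_POINTS);
    glDrawArrays(GL_POINTS, 0, particleCount);
    glEndTransformFeedback();
    glDisable(GL_RASTERIZER_DISCARD);

    // Optional: pull a small amount of collision data back to the CPU, e.g.
    // glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0, bytes, cpuSidePtr);
    // Normally src and dst are simply swapped for the next step, so the state
    // never has to leave video memory at all.
}
[/code]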

I don't have any exact numbers on latency for either bus, but I wouldn't expect it to be much lower on PCI Express. It has twice the bandwidth, yes, but bandwidth doesn't affect latency, and the signals still have to travel the same physical distance, which is the main issue there.

So no, it's probably a bad idea to develop toward lower latency. And you probably won't get lower latency with the next bus, or the one after that either. Sorry. [wink]

Graphics and rendering are inherently very predictable, so latency is not a major issue (which is why our current architectures can be as heavily pipelined as they are). On the topic of bus latency: as long as the data stream is one-way, latency is not an issue at all, and that is the case in most operations today. As soon as you start querying the GPU for information, though, you run into trouble, and not only because of the latency of the bus, but because you're effectively forcing a synchronisation with a very deeply pipelined processor that may still be executing the very operations your query depends on.

To circumvent that, you'll need to embrace the asynchronous nature of the communication between the graphics processor and the host system, and never rely on any low-latency feedback.
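
For example, instead of blocking on a result, you can poll for it and keep working until the GPU says it's ready. A minimal sketch with an occlusion query, assuming an OpenGL context with GLEW; the helper names are illustrative:

[code]
// Hedged sketch of the "embrace asynchrony" advice: issue a query, carry on
// with other work, and only fetch the result once it is available, rather
// than forcing the deep GPU pipeline to drain.
#include <GL/glew.h>

GLuint query = 0;

void issueOcclusionTest(GLsizei vertexCount)
{
    if (query == 0)
        glGenQueries(1, &query);

    glBeginQuery(GL_SAMPLES_PASSED, query);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);   // e.g. a bounding-box proxy
    glEndQuery(GL_SAMPLES_PASSED);
}

// Called on a later frame; returns true only once the result has arrived, so
// the CPU never stalls waiting on the bus or the pipeline.
bool tryGetVisibleSamples(GLuint* samplesOut)
{
    GLint ready = GL_FALSE;
    glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &ready);
    if (ready != GL_TRUE)
        return false;                             // not yet: do other work

    glGetQueryObjectuiv(query, GL_QUERY_RESULT, samplesOut);
    return true;
}
[/code]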

As for processing everything on the graphics card / general-purpose vector processing unit: this might be the ideal way of crunching game data in the future. With D3D10 and the new Vista driver model for graphics cards, the possibilities of such a setup are enhanced for a few reasons (virtualisation, geometry instancing and streaming, etc.). The virtualisation of the hardware will reduce the need for large amounts of on-card memory by automatically loading data from main memory. Of course, at this point we can start worrying about latency again [smile]. Although the operation is nowhere near as slow as a page fault on the CPU, the potential penalty is far worse than a CPU cache miss (unless memory controllers are redesigned to give the graphics hardware equally direct access, in which case the penalty is very much the same). I assume the graphics card architects will find ways of exploiting the predictable nature of rendering, as they have been doing for years, to hide this latency. More general-purpose algorithms, however, will have a harder time adapting to this environment - although, over time, the environment will adapt to the algorithms.

Right, I'm probably just rambling on outside the main topic here, so I'll stop for now.
