How much data can I copy to video memory per frame?

Started by
3 comments, last by 21st Century Moose 12 years ago
Hi

As the title says, how much data can I copy to video memory per frame (30/60/120 Hz)?

Usually the game loads and everything is neatly put in managed memory, but what if I want to update a texture? How much is too much when I copy data to video/managed memory?

Also, is it possible to copy data while the main loop is working? Say I have a function that copies a lot of data over to the managed pool: can I copy on another thread while I do other API calls on the main thread? (I will not use the data while I copy it over.)
It depends on your PCI-E bus speed, which for modern cards is about 4GB/s in either direction. If you are running at 60 frames per second, that means you can upload at most roughly 4/60 ≈ 0.07GB = 70MB of data to video memory every frame. In practice it is less because of the necessary communication overhead (and you aren't the only one using the graphics card on the system), so 30MB/frame is a nice figure. You can do the math for other refresh rates. Note that new PCI-E interfaces are just around the corner and should double these figures, but in general transfer speed isn't the bottleneck in games.
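
To make the arithmetic above easy to redo for other refresh rates, here is a minimal Python sketch. The 4GB/s figure is the one quoted above; the function name and the `usable_fraction` parameter are made up for illustration, and 0.45 is simply the fraction that reproduces the 30MB/frame rule of thumb at 60 Hz, not a measured value.

```python
def per_frame_budget_mb(bus_gb_per_s: float, fps: float,
                        usable_fraction: float = 1.0) -> float:
    """Upload budget per frame in megabytes (using 1 GB = 1000 MB)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return bus_gb_per_s * 1000.0 * usable_fraction / fps


BUS_GB_PER_S = 4.0       # bus speed assumed in the post above
USABLE_FRACTION = 0.45   # assumption: matches 30 MB/frame at 60 Hz

for fps in (30, 60, 120):
    peak = per_frame_budget_mb(BUS_GB_PER_S, fps)
    practical = per_frame_budget_mb(BUS_GB_PER_S, fps, USABLE_FRACTION)
    print(f"{fps:>3} Hz: peak {peak:6.1f} MB/frame, "
          f"practical about {practical:5.1f} MB/frame")
```

Treat the "practical" column as a starting point only; the real headroom depends on the driver, the bus generation and whatever else is using the card.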

It is possible to copy data to memory asynchronously from a different thread. GPUs are excellent at asynchronous operations; just be careful with threads (graphics APIs sometimes need some setup before they can be called from multiple threads).
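
As a rough illustration of the threading pattern only, here is a Python sketch in which a worker thread fills a CPU-side buffer while the main thread carries on. Nothing here calls a graphics API: `fill_staging` and the `staging` buffer are hypothetical stand-ins, and in a real renderer the hand-off to the API would still have to follow that API's threading rules.

```python
import threading


def fill_staging(staging: bytearray, source: bytes) -> None:
    """Worker: copy the source data into the CPU-side staging buffer."""
    staging[:len(source)] = source


source = bytes(range(256)) * 4096   # 1 MiB of stand-in texture data
staging = bytearray(len(source))    # hypothetical staging buffer

worker = threading.Thread(target=fill_staging, args=(staging, source))
worker.start()

# The main thread keeps doing unrelated work while the copy runs.
frames_simulated = sum(1 for _ in range(1000))

# Nobody touches the buffer until the worker has finished.
worker.join()

# Only at this point would the main thread hand `staging` to the API.
print("copy complete:", bytes(staging) == source)
```

The important part is the `join()` before the buffer is used, which mirrors the "I will not use the data while I copy it" condition from the question.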

Note that this is different from the actual "video memory speed" advertised on graphics cards, which tells you how fast memory operations local to the GPU are (say, adding two textures together on the GPU). That can be as high as 200GB/s (that's right, two hundred gigabytes per second), so if you can stay on the graphics card without transferring anything, that's a huge speed boost. For instance, if you need to fill a texture with random data for some reason, instead of generating it on the CPU (slow) and copying it to the GPU (double slow), you should generate the numbers on the GPU (ultra fast), and then you don't need to copy anything (instantaneous!).

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Any CPU-to-GPU operation is hellishly slow. Move your texture to the GPU in the loading phase, then use shaders.
I would be surprised if you managed to stream a single 1080p texture to the GPU at 60Hz.

I recently wrote a render-to-texture method. It was only a 512*512 resolution texture, and it still killed the CPU.
Now I only do partial updates.
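
For anyone wondering what a partial update looks like in practice, here is a minimal Python sketch of copying only a dirty rectangle, row by row, between two pixel buffers. It assumes 4 bytes per pixel and a pitch equal to the row size in bytes; `copy_dirty_rect` is a made-up helper and not part of any graphics API.

```python
def copy_dirty_rect(dst: bytearray, dst_pitch: int,
                    src: bytes, src_pitch: int,
                    x: int, y: int, width: int, height: int,
                    bytes_per_pixel: int = 4) -> int:
    """Copy a width x height pixel rectangle at (x, y); return bytes copied."""
    row_bytes = width * bytes_per_pixel
    for row in range(y, y + height):
        s = row * src_pitch + x * bytes_per_pixel
        d = row * dst_pitch + x * bytes_per_pixel
        # Equal-length slice assignment, so dst never changes size.
        dst[d:d + row_bytes] = src[s:s + row_bytes]
    return row_bytes * height


SIZE = 512
PITCH = SIZE * 4                        # assumed: tightly packed RGBA rows
src = bytes([255]) * (PITCH * SIZE)     # stand-in for the updated image
dst = bytearray(PITCH * SIZE)           # stand-in for the mapped texture

copied = copy_dirty_rect(dst, PITCH, src, PITCH, x=100, y=200,
                         width=64, height=64)
print(f"copied {copied} of {len(dst)} bytes")
```

A 64*64 region of a 512*512 RGBA texture is 16384 bytes against 1048576 for the full surface, which is why touching only the changed area cuts the CPU-side cost so much.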
I'm uploading roughly 2-3 megabytes of data to GPU memory per frame (instancing data, texture updates etc.) and I can easily reach over 100 fps, or 200 fps with Crossfire. I am also pretty sure that I haven't saturated the bus and that the data transfer isn't the bottleneck. I haven't even tried using staging buffers yet to improve the situation further. I use mostly map/discard and occasionally UpdateSubresource, which isn't enforced by the MSDN.

As a side note, I was already able to do that 5 years ago.

You'll have to be careful about exactly how much stress updating the texture puts on your CPU. It is more likely that you'll end up with the CPU as the bottleneck than the actual memory transfer. Copying things around isn't that expensive (unless you do too much of it), but the actual CPU-side update of the texture may be.

Best regards!
Data transfer is very unlikely to be a bottleneck on any reasonably modern hardware, unless you're doing something utterly crazy. What will kill performance for you is if any such transfer needs to stall the pipeline. What this means is that if a resource is in use for drawing at the time you're updating it (and remember that a GPU is a parallel processor - your draw calls don't actually draw stuff, all they do is tell the GPU to draw it at some later time) the entire pipeline needs to stall and drain before the resource can be updated. Do this many times per frame on many different resources and you may even drop back to single-digit framerates.

This is more of an issue on modern hardware than it was back in the old days because modern hardware has much deeper pipelines with many concurrent operations going on in each pipeline stage. A pipeline stall/drain is an absolute killer.

To resolve this you need to use dynamic resources with the proper lock flags specified when updating. These serve as hints to the driver either that it's OK not to stall because the portion you're updating is not currently being used for drawing, or that it should continue working with its current chunk of GPU memory but give you a new chunk for the purposes of updates.
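
To show how those two hints usually get combined, here is a small Python model of the bookkeeping: append with a "no overwrite" hint while there is room, and fall back to a "discard" hint when the buffer is full. It only tracks offsets and makes no API calls. `LockHint` and `DynamicBufferCursor` are illustrative names; in Direct3D 9 the corresponding flags are `D3DLOCK_NOOVERWRITE` and `D3DLOCK_DISCARD`.

```python
from enum import Enum
from typing import Tuple


class LockHint(Enum):
    NO_OVERWRITE = "no_overwrite"  # writing to space the GPU isn't reading
    DISCARD = "discard"            # ask for a fresh chunk, start at offset 0


class DynamicBufferCursor:
    """Tracks where the next write goes in a fixed-size dynamic buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.offset = 0

    def reserve(self, size: int) -> Tuple[int, LockHint]:
        """Return (start offset, hint to pass when locking) for `size` bytes."""
        if size > self.capacity:
            raise ValueError("request larger than the buffer")
        if self.offset + size > self.capacity:
            # No room left: discard so the driver can hand out new memory
            # instead of waiting for draws that still use the old chunk.
            start, hint = 0, LockHint.DISCARD
        else:
            start, hint = self.offset, LockHint.NO_OVERWRITE
        self.offset = start + size
        return start, hint


cursor = DynamicBufferCursor(capacity=1024)
results = [cursor.reserve(400) for _ in range(4)]
for start, hint in results:
    print(start, hint.name)
```

With a 1024-byte buffer and 400-byte requests, the third request no longer fits and wraps back to offset 0 with the discard hint, so no update ever touches a region that an earlier draw call may still be reading.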

Have a look in your DXSDK help file - there are sections (under the "Performance Optimizations" heading) on using dynamic textures and using dynamic vertex/index buffers that give useful tips and sample code.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

This topic is closed to new replies.
