thecheeselover

Send more data to the GPU vs more operations


Recommended Posts

Hi there,

I noticed that I was sending a lot of data to my GPU, and I was wondering whether sending that much data (into a vertex buffer) could be very slow. For each vertex, I send 48 bytes (position, normal, texture coordinates and color). I also wanted to send one more byte to the GPU, but because HLSL doesn't support bytes, I'm obliged to send a short (2 bytes) instead. I found a way to store my normal, my color and my new byte in one float (they all share the same variable) so that they would only use 4 bytes instead of 18. Is it worth doing bitwise operations on both the CPU and the GPU to compress/decompress my normal, my color and my new byte into one float, or is it better to send more data? Which choice is best for my game's performance?

Thank you,
Thecheeselover
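[Editor's note: a minimal CPU-side sketch of this kind of packing, assuming one common scheme (8 bits per normal component, signed-normalized, plus the extra byte) rather than the poster's exact encoding. The function names are illustrative. Note that with this layout there is no room left in the 32-bit word for the color:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Pack a unit normal (8 bits per component, signed-normalized) plus one
// extra byte into a single 32-bit word that can be sent as one vertex
// attribute. Colour does not fit in this 4-byte budget.
uint32_t packNormalAndByte(float nx, float ny, float nz, uint8_t extra) {
    auto toSnorm8 = [](float v) -> uint32_t {
        // Map [-1, 1] to [-127, 127], stored in one unsigned byte.
        return static_cast<uint32_t>(
            static_cast<int32_t>(std::lround(v * 127.0f)) & 0xFF);
    };
    return toSnorm8(nx) | (toSnorm8(ny) << 8) | (toSnorm8(nz) << 16)
         | (static_cast<uint32_t>(extra) << 24);
}

// CPU-side reference for the decode the shader would have to do.
void unpackNormalAndByte(uint32_t packed, float n[3], uint8_t& extra) {
    auto fromSnorm8 = [](uint32_t bits) -> float {
        int8_t s = static_cast<int8_t>(bits & 0xFF);  // sign-extend
        return static_cast<float>(s) / 127.0f;
    };
    n[0] = fromSnorm8(packed);
    n[1] = fromSnorm8(packed >> 8);
    n[2] = fromSnorm8(packed >> 16);
    extra = static_cast<uint8_t>(packed >> 24);
}
```

The round trip loses precision (each component snaps to steps of roughly 1/127), which is exactly the concern raised in the first reply below.]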

In general with GPUs you want to prefer math over memory access. Dedicated GPUs usually have a disproportionately large ALU count relative to their bandwidth and memory hardware. Of course, in reality it depends on the hardware and the workload, so be careful not to jump to conclusions without profiling.

However, my bigger concern would be precision. How are you going to store a normal, a color, and another byte in just 4 bytes? Typically you'll want at least 16 bits per component for normals.
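[Editor's note: to put rough numbers on the precision point, an n-bit signed-normalized component can only represent values in steps of 2/(2^n - 1): about 0.008 per component at 8 bits versus about 0.00003 at 16 bits. A tiny sketch (the helper name is mine):

```cpp
#include <cassert>
#include <cmath>

// Quantization step of a signed-normalized (snorm) component stored in
// `bits` bits: the range [-1, 1] is divided into 2^bits - 1 levels.
double snormStep(int bits) {
    return 2.0 / ((1u << bits) - 1);
}
// snormStep(8)  ~= 0.0078  (visible banding in normals/lighting)
// snormStep(16) ~= 0.00003 (usually indistinguishable from full float)
```
]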

In general the memory transactions will happen in the background. They're also very fast, don't pollute the CPU cache, don't cause cache-line contention, and can happen while your 3D card is busy doing something else, just as the CPU can be.

The CPU is pretty fast at tight operation loops, but there will still be plenty of branches, cache misses, bus traffic and other assorted friction, and it can't parallelise the work the way the GPU can. Mapping more of the GPU memory, doing simple writes into it, and then leaving both devices to get on with other work while the specialised, faster parts of the hardware handle shifting the memory around is definitely the way to go on desktop systems.

If some of the data is constant or rarely updated, consider using two buffers -- one frequently changing and one infrequently changing -- and use two reads in the shaders to combine the data. This reduces the amount of memory that has to be moved between the devices across the memory buses[1]. It's usually less of a problem on the shader side, because the shader cores each have memory controllers (often one per buffer), and since you're still processing the vertices in linear order you'll still get good cache read-ahead on them.



[1] An example would be a skinned model. The texture coordinates and colour at each vertex typically don't change; it's just the vertex position that gets updated, so by using two buffers the per-frame memory transfer can easily be reduced by around 2/3.
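[Editor's note: a minimal sketch of the two-stream split applied to the 48-byte vertex from the question. The particular split is an assumption (here both position and normal are treated as per-frame data, as with CPU skinning); the names are illustrative:

```cpp
#include <cassert>
#include <cstddef>

// Stream 1: re-uploaded every frame.
struct DynamicVertex {
    float position[3];   // 12 bytes
    float normal[3];     // 12 bytes
};

// Stream 2: uploaded once at load time.
struct StaticVertex {
    float uv[2];         //  8 bytes
    float color[4];      // 16 bytes
};

// Per-frame upload size for n vertices, with and without the split.
size_t perFrameBytesSplit(size_t n)  { return n * sizeof(DynamicVertex); }
size_t perFrameBytesSingle(size_t n) { return n * (sizeof(DynamicVertex) + sizeof(StaticVertex)); }
```

With this split, the per-frame upload drops from 48 to 24 bytes per vertex; pushing the normal into the static stream (when it doesn't change) would save more still.]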
