
#4969335 Memory coalescing with structured buffers

Posted on 14 August 2012 - 12:53 AM

I did some testing and found that I could get up to a 2x speedup by arranging the data in an AAAABBBB fashion (for a structure of 2 float4s) instead of interleaving it as ABABABAB. The reasoning: a float4 is 16 bytes, so a full 128-byte memory segment holds 8 of them. If your structure is a single float4, one memory transaction can serve at most 8 float4 accesses. If your structure is 2 float4s and neighboring threads read the same field, only 4 of the 8 float4s fetched are useful. With 3 float4s (48 bytes), only 2 whole structures fit in a segment, and because 48 does not divide 128 evenly, structures also straddle the segment boundaries. Many advanced shaders are limited by memory bandwidth, and for those, doubling the usable memory bandwidth will most likely come close to doubling your framerate. If you can split your structures so that values of the same type (max size of 16 bytes) sit next to each other (at least 128 bytes' worth of data), you will most likely see a significant increase in frame rate. If someone else would like to try this on their shader and report back, I'd like to see the results.
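To make the arithmetic above concrete, here is a rough sketch in Python that counts how many 128-byte segments get touched when a group of threads each reads one float4 field. It is only a model of the addressing, not a measurement. The group size of 32 threads, the assumption that the buffer starts on a segment boundary, and all of the function names are my own assumptions for illustration.

```python
SEGMENT_BYTES = 128  # size of one memory transaction, as in the post
FLOAT4_BYTES = 16    # four 32-bit floats
THREADS = 32         # ASSUMPTION: number of threads issuing reads together


def segments_touched(stride_bytes, threads=THREADS):
    """Count distinct 128-byte segments hit when thread i reads one float4
    at byte offset i * stride_bytes.

    ASSUMPTION: the buffer starts on a segment boundary.
    """
    segments = set()
    for i in range(threads):
        first = i * stride_bytes
        last = first + FLOAT4_BYTES - 1
        # Record every segment that this 16-byte read overlaps.
        segments.update(range(first // SEGMENT_BYTES, last // SEGMENT_BYTES + 1))
    return len(segments)


def interleaved(float4s_per_struct):
    # ABAB... layout: the same field of neighboring elements is one
    # whole structure apart.
    return segments_touched(float4s_per_struct * FLOAT4_BYTES)


def separated():
    # AAAA...BBBB... layout: the same field of neighboring elements
    # is contiguous in memory.
    return segments_touched(FLOAT4_BYTES)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} float4 per struct: "
              f"interleaved={interleaved(n)} segments, "
              f"separated={separated()} segments")
```

Under these assumptions, reading one field of a 2-float4 structure touches 8 segments when interleaved and 4 when separated, which lines up with the roughly 2x difference I measured. Real hardware has caches and other effects this model ignores, so treat it as a back-of-the-envelope check only.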