
Quick questions about DirectX shader memory vs. efficiency tradeoffs.

This topic is 3297 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.



Hello,

I am an experienced C++/DirectX programmer but am new to programming DirectX HLSL shaders. I am trying to figure out how to weigh memory usage against efficiency when writing shaders. My concern is that these functions (shaders) are going to be called very often because of the number of vertices running through them, so it seems that they, of all functions, should be optimized.

In my current vertex shader, each vertex needs three pieces of information in addition to the per-vertex data. There are many ways to supply this information to the vertices, but since I am new I do not have the experience to know which way is the proper and most effective one.

For the first variable I could:
A) Supply the value in the vertex. This uses 256 verts * 150 batches * 4 bytes = 153,600 bytes of memory. The value could be read directly from the vertex.
B) Set four floats in the global domain of the shader prior to each batch (150 batches) and require 4 additional conditional statements per vertex.

For the second variable I could:
A) Supply the value in the vertex. This uses 256 * 150 * 4 bytes = 153,600 bytes of memory. The value could be read directly from the vertex.
B) Set 2 floats at initialization in the global domain of the vertex shader, plus set four floats in the global domain of the shader prior to each batch (150 batches), and require 2 additional conditionals, 2 additional - operators, 2 additional * operators and 2 additional % operators per vertex.

For the third variable I could:
A) Supply the value in the vertex. This uses 256 * 150 * 4 bytes = 153,600 bytes of memory. The value could be read directly from the vertex.
B) Set 28 floats in the global domain of the shader prior to each batch and require 2 additional conditionals, 2 additional - operators, 2 additional / operators, 2 additional * operators and 2 additional % operators per vertex.

Could someone please give some input on these kinds of memory vs. efficiency tradeoffs? Which option(s) would you recommend to a newbie shader programmer?

Thank you for your help.
Jeremy
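The arithmetic behind the two kinds of option can be written down as a small C++ sketch. The vertex, batch, and byte counts come from the post; the function names are hypothetical, and the constant-upload count assumes every batch is drawn once per frame, which the post does not state.

```cpp
#include <cstddef>

// Figures taken from the post: 256 vertices per batch, 150 batches,
// one 4-byte value per vertex.
constexpr std::size_t kVertsPerBatch = 256;
constexpr std::size_t kBatchCount = 150;
constexpr std::size_t kBytesPerValue = 4;

// Option A: extra vertex-buffer bytes needed to store
// `valuesPerVertex` additional 4-byte values in every vertex.
constexpr std::size_t perVertexBytes(std::size_t valuesPerVertex) {
    return kVertsPerBatch * kBatchCount * kBytesPerValue * valuesPerVertex;
}

// Option B: number of floats uploaded as shader constants per frame,
// given how many floats are set before each batch.
// Assumption: each of the 150 batches is drawn once per frame.
constexpr std::size_t constantFloatsPerFrame(std::size_t floatsPerBatch) {
    return kBatchCount * floatsPerBatch;
}

// Matches the 153,600 bytes quoted in the post for a single value.
static_assert(perVertexBytes(1) == 153600, "one value per vertex");
```

Under these assumptions, storing all three values in the vertex costs `perVertexBytes(3)` bytes of memory once, while the 28-float option uploads `constantFloatsPerFrame(28)` floats every frame.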

If the vertices aren't going to change a lot then I'd store data in the vertices. The extra memory shouldn't be a serious issue, and it'll save both the CPU overhead of setting the constants and the extra work in the vertex shader.

If the data is changing every few frames then it might be worth using vertex shader constants.

Also don't forget that you can store an index in the vertex and use that to look up a value from an array of constants in the shader.
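As a rough illustration of the index idea, here is a CPU-side C++ emulation of the lookup. It is not shader code; the `Vertex` layout, the table size of 16, and the function name are all hypothetical choices made for this sketch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical vertex layout: a position plus a one-byte index into a
// table of constants, instead of storing the 4-byte value itself.
struct Vertex {
    float x, y, z;
    std::uint8_t constantIndex;
};

// Stand-in for an array of shader constants set once per batch.
// The size of 16 is an arbitrary choice for this sketch.
using ConstantTable = std::array<float, 16>;

// CPU-side emulation of what the vertex shader would do:
// read the index from the vertex and fetch the matching constant.
// The modulo keeps an out-of-range index inside the table.
inline float lookupConstant(const Vertex& v, const ConstantTable& table) {
    return table[v.constantIndex % table.size()];
}
```

The design trades a full value per vertex for a small index, at the cost of one indexed constant read per vertex in the shader.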

Adam_42's suggestion is good.

Some points worth noting:

A vertex size of 16 or 32 bytes caches more easily, which leads to better performance. So if this data brings your vertex from 20 bytes to 32, that's good, but if it brings it from 16 to 28, that's less optimal.
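The size rule can be checked at compile time on the C++ side. This is only a sketch: both vertex layouts are hypothetical (the original question does not give one), and it assumes a 4-byte `float`.

```cpp
#include <cstddef>

// Hypothetical 20-byte vertex: position (12 bytes) + texture
// coordinates (8 bytes).
struct BaseVertex {
    float position[3];
    float uv[2];
};

// Adding three 4-byte values, as in the original question, brings the
// vertex to 32 bytes.
struct ExtendedVertex {
    float position[3];
    float uv[2];
    float extra[3];
};

static_assert(sizeof(BaseVertex) == 20, "assumes 4-byte float, no padding");
static_assert(sizeof(ExtendedVertex) == 32, "assumes 4-byte float, no padding");

// True when a vertex stride is one of the two sizes named above.
constexpr bool isPreferredStride(std::size_t bytes) {
    return bytes == 16 || bytes == 32;
}
```

With these layouts, `isPreferredStride(sizeof(ExtendedVertex))` holds while `isPreferredStride(sizeof(BaseVertex))` does not, which is the 20-to-32 case described above.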

Anything that is changed many times per frame carries an overhead cost. I assume you're using D3D9, where this overhead is more pronounced than in D3D10 (but it exists in 10, too). So setting shader constants adds a little cost. It shouldn't be huge, though.

Modern graphics cards have a lot of math processing power, so adding a few more math operations is unlikely to be a serious problem. ATI and NVIDIA both have utilities which tell you how much time a shader will take on various cards. Check out ATI's shader analyser, for example.

Quote:
Original post by Adam_42
Also don't forget that you can store an index in the vertex and use that to look up a value from an array of constants in the shader.


Indeed, but keep in mind that this can lead to constant waterfalling: multiple units executing the same instructions stall because they need to fetch data from different constant registers.
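One rough way to gauge the risk offline is to count how many distinct constant indices fall into each group of consecutive vertices. The C++ sketch below does that on the CPU; the function name and the idea of a fixed `groupSize` are assumptions for illustration, since how vertices are actually grouped depends on the hardware.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Counts the distinct constant indices in each group of `groupSize`
// consecutive vertices. A count of 1 means every vertex in the group
// reads the same constant; higher counts suggest more divergent reads.
// `groupSize` is a hypothetical stand-in for however many vertices the
// hardware processes together.
std::vector<std::size_t> distinctIndicesPerGroup(
    const std::vector<std::uint8_t>& indices, std::size_t groupSize) {
    std::vector<std::size_t> counts;
    if (groupSize == 0) {
        return counts;  // no sensible grouping, report nothing
    }
    for (std::size_t start = 0; start < indices.size(); start += groupSize) {
        const std::size_t end = std::min(start + groupSize, indices.size());
        std::set<std::uint8_t> seen(indices.begin() + start,
                                    indices.begin() + end);
        counts.push_back(seen.size());
    }
    return counts;
}
```

Sorting or batching vertices so that neighbours share an index keeps these counts low, which is the layout least likely to trigger the stall described above.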
