Shader's registers: Technical question


Aloha!

How are the registers in the GPU managed? What I'm aiming at is: do external variables use the very same registers that the ALUs calculate with? The reason I ask is that I now have a lot of external variables (14, mostly float3), and fear that this might leave the compiler little room for optimization. Is that the case?

If so, is there another way to pass information to the shader? I know textures are a possible choice, but I guess accessing them is a good tad more costly, and some seemingly random coordinates are hardly descriptive, making the code look like a wild mess :-/

Please tell me that my concerns are unfounded!

Edit: I might add that this regards SM3 on DX9 in this particular case, but general info is just as welcome.
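For concreteness, the kind of externals in question might look like this in SM3 HLSL (every name here is invented for illustration, and the explicit register bindings are optional):

```hlsl
// Each uniform occupies one or more float4 constant registers
// (c0..c223 in ps_3_0), regardless of how many components it uses.
float3 LightPos    : register(c0);
float3 LightColor  : register(c1);
float  LightRadius : register(c2); // a lone scalar still burns a full register

// Packing related values into a float4 reclaims the wasted w components:
float4 LightPosAndRadius : register(c3); // xyz = position, w = radius
```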

There's no public data available on the topic, but in general the hardware manufacturer is free to implement features in their own way as long as the result conforms to API specifications.

That said: I'm able to make an educated guess that each ALU module will get a copy of the working set it needs - if this is the case (it very likely is), the computations are performed on internal temp registers that are not exposed to the application programmer directly.

Thank you for the (educated-guess) clarification. So that basically means I'm doing nothing wrong or "harmful" in terms of execution speed if I just use whatever number of externals/consts/etc. until something crashes or simply won't work?
Don't get me wrong, I don't intend to actually put that to the test in something I want to release :) It's just that I'd like to avoid a redesign after hitting a wall.

Quite an interesting read, although I haven't understood everything (yet). But yes, it contains a line saying that math with constants should be minimized, though no reason is given.

This however raises the question of what to do instead. Textures are hardly a good choice, as I suspected. But since for some things you simply need constants, I guess the only possible optimization is to cut down on references to them - for instance by introducing a new variable for a subexpression that depends only on constants, if it is needed several times.
Much like:

float4 color;
color.a = 1.0f;
color.r = 0.3f + C1 * 2.0f;
color.g = 0.4f + C1 * 2.0f;
color.b = 0.1f + C1 * 2.0f;

// change to:
float c2 = C1 * 2.0f;
color.a = 1.0f;
color.r = 0.3f + c2;
color.g = 0.4f + c2;
color.b = 0.1f + c2;


Would that be something that could have a positive effect with regard to this tip?
(Sorry for asking a lot and quite naively. I'm still very new to GPU stuff.)

I know there are lots and lots of places to look at when optimizing, but since my shader relies quite heavily on calcs with externals, this really caught my attention =)

Oh, and thanks! Although difficult for a newbie in places, still very informative stuff.

Well I wouldn't try anything too drastic (like textures) just to avoid using constants...you'll definitely want to minimize their usage but within reason. Personally I wouldn't even bother trying to optimize that unless I had a shader where I knew I was consistently math-bound.

As for your code snippet, I would think that the compiler would do that for you already whenever it's reasonable to do so (especially if it's a general-case performance improvement). Even if it's not, I doubt that it's worth coding stuff in shader assembly.

Well, my shader is a volumetric raytracer, and as such heavy on math. I pass camera and light data via constants, and don't use any conventional geometry. By its nature, I juggle these values around a lot: for every ray, for every step the ray takes, and then for every loop iteration that determines how to handle the current point (scalar fields are a pain :)). All of these use externals inside loops.
The thing is: as it stands, it's far from usable for real-time rendering. But under certain conditions it gets quite close, and if I could somehow tweak it enough to reach a bearable framerate... well, that would be awesome =)

As for the second part of your answer: well, the compiler had better do this. Compiling this particular shader already takes ~10-20s; I'd expect some proper results from that ;)
No, seriously: it's good to know that the optimizer is likely to handle things like this well. The less need for handcrafted assembly, the better.

Quote:
Original post by Medium9
Aloha!

How are the registers in the GPU managed? What I'm aiming at is: do external variables use the very same registers that the ALUs calculate with?
The reason I ask is that I now have a lot of external variables (14, mostly float3), and fear that this might leave the compiler little room for optimization. Is that the case?

If so, is there another way to pass information to the shader? I know textures are a possible choice, but I guess accessing them is a good tad more costly, and some seemingly random coordinates are hardly descriptive, making the code look like a wild mess :-/


Please tell me that my concerns are unfounded!

Edit: I might add that this regards SM3 on DX9 in this particular case, but general info is just as welcome.



You should generally assume that every plausible answer to this question is realized in at least one real hardware device out there.

- All hardware runs shaders in parallel in some fashion; how many and in what pattern is up to the vendor.
- The hardware might handle a fixed or variable number of threads.
- The hardware might have a register pool or a fixed number of registers per thread. If it's from a pool, this probably affects the number of threads it can run at once.
- Shader constants might share with this pool, they might not, or they might even be burned into the compiled shader opcodes.
- The vertex and pixel shaders might or might not be implemented in the same way. They might share some, none, or all of the resources.
- Texture caches might or might not be shared across texture units.
- If the hardware can dynamically adjust the number of threads of pixel vs vertex shaders, expect it to do this poorly for some of your draw calls, from time to time. They have to guess what the load will be like.
- If you are worried about space, always pack your data into float4s; obviously you want to tuck scalars into the unused w component of a float3 wherever possible.
- This is nearly mandatory for interpolators. Use as few as possible, and always pack everything into float3/4 as much as possible.
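The packing advice above, sketched with invented names: the same vertex-to-pixel data carried in four interpolators versus two.

```hlsl
// Unpacked: four interpolators.
struct PS_INPUT_UNPACKED
{
    float3 WorldPos : TEXCOORD0;
    float3 Normal   : TEXCOORD1;
    float  Fog      : TEXCOORD2;
    float  Depth    : TEXCOORD3;
};

// Packed: two interpolators carrying the same data.
struct PS_INPUT_PACKED
{
    float4 WorldPosAndFog : TEXCOORD0; // xyz = world position, w = fog
    float4 NormalAndDepth : TEXCOORD1; // xyz = normal, w = depth
};
```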

Texture fetching is one specialized thing the hardware does really well. Fetching bone data for skeletal animation from a texture can beat shader constants, due to a thing called shader constant waterfalling. However, I wouldn't worry about it a whole lot unless your skeletal meshes regularly push large numbers of vertices through them.
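A sketch of the bone-fetch idea, assuming SM3 vertex texture fetch and a float4 texture laid out as three texels per 3x4 bone matrix; the layout, names, and texture width here are assumptions, not a fixed convention (and note that DX9-era hardware supports vertex textures only in a few formats, such as A32B32G32R32F):

```hlsl
sampler2D BoneTex : register(s0);         // assumed float4 bone-matrix texture
static const float BoneTexWidth = 256.0f; // assumed texture width

float3x4 FetchBone(float boneIndex)
{
    // tex2Dlod is required in a vertex shader (no gradients available).
    float du = 1.0f / BoneTexWidth;
    float u  = (boneIndex * 3.0f + 0.5f) * du;
    float4 row0 = tex2Dlod(BoneTex, float4(u,             0.5f, 0, 0));
    float4 row1 = tex2Dlod(BoneTex, float4(u + du,        0.5f, 0, 0));
    float4 row2 = tex2Dlod(BoneTex, float4(u + 2.0f * du, 0.5f, 0, 0));
    return float3x4(row0, row1, row2);
}
```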

Bonus fun:

If you have some embarrassingly parallel code using hardcoded constants, it frequently pays off to change the constants to parameters set at runtime. Why? Because the shader compiler can't optimize the code as well when you do that. Why can this be good?

I can explain with a real world example!

I have a pixel shader that implements a Sobel filter. It's a 3x3 filter, though the center value isn't involved in most of the math.

A normal Sobel filter has 4 coefficients, and negating them gives you all 8:

Horizontal:

Example values:

1 0 -1
2 0 -2
1 0 -1

Vertical:

1 2 1
0 0 0
-1 -2 -1


Naive shader code (H = horizontal, V = vertical; DepthsDiag holds the four diagonal samples, DepthsAxis the four axis samples):

half4 HFilterDiagCoeff = half4(HCornerCoeff, -HCornerCoeff, HCornerCoeff, -HCornerCoeff);
half4 HFilterAxisCoeff = half4(HAxisCoeff, -HAxisCoeff, 0, 0);
half4 VFilterDiagCoeff = half4(VCornerCoeff, VCornerCoeff, -VCornerCoeff, -VCornerCoeff);
half4 VFilterAxisCoeff = half4(0, 0, VAxisCoeff, -VAxisCoeff);

half4 SobelH = DepthsDiag * HFilterDiagCoeff + DepthsAxis * HFilterAxisCoeff;
half4 SobelV = DepthsDiag * VFilterDiagCoeff + DepthsAxis * VFilterAxisCoeff;

The compiler will generally not generate float4 vectorized math for this, because of the negated components in the coefficient vectors. Instead, the coefficient vectors should be shader constants set entirely from the app. Yes, it technically doubles the space they take, but it cuts the amount of math down by about half. This particular chunk of code compiles to 4 instructions on just about all hardware; the broken version is usually 2 times that or worse. Computing the UV offsets for a filter like this has the same problem, with the same fix.
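The fixed version under those assumptions might look like this (the register assignments are illustrative); because the coefficient vectors are now opaque to the compiler, the two lines of filter math can compile to vectorized mads:

```hlsl
// Filled in by the application, e.g. (1, -1, 1, -1), (2, -2, 0, 0), etc.
float4 HFilterDiagCoeff : register(c0);
float4 HFilterAxisCoeff : register(c1);
float4 VFilterDiagCoeff : register(c2);
float4 VFilterAxisCoeff : register(c3);

half4 SobelH = DepthsDiag * HFilterDiagCoeff + DepthsAxis * HFilterAxisCoeff;
half4 SobelV = DepthsDiag * VFilterDiagCoeff + DepthsAxis * VFilterAxisCoeff;
```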

Thanks a lot! I didn't realize until now how little standardization has happened behind the familiar APIs. That probably won't help me much with optimizing in general, but it's good to know that my attempts may only yield an improvement on the particular kind of GPU I use.

Packing is one thing I only recently thought about, and it already eliminated 5 float variables. It didn't do anything for the framerate, but it's nice to know it was basically a good idea :)

The last tip is a good-to-know as well!

Thank you for chiming in with such an interesting and, most importantly, understandable reply.
