Tips wanted for writing Shaders (HLSL)

Writing shaders, though the syntax is very similar to any C program, requires a completely different methodology and mindset. This is perhaps because shaders are very limited in what they can be made to do, e.g. branching, ALU instruction slots, etc. A few days back I was trying to implement a kind of ray-tracer for volume rendering and ran into tons of problems, mostly related to overshooting the ALU instruction slot limit, branching (if-else) and loops. I know these limitations largely depend on the shader model, but you may still agree with me that writing shaders requires a completely different mindset. I am just wondering if there is any good resource (book, ebook or article) out there which discusses "writing good shader code" in general. I don't mean anything like "how to implement this/that particular algorithm."
Z
Quote:Original post by browny
Writing shaders, though the syntax is very similar to any C program

Then you probably mean HLSL, the High Level Shader Language.

Yes, I meant HLSL, sorry for not mentioning that.
Z
I've been writing shaders for a while now, and the best approach I have is to think about what you are actually programming on: the hardware of the graphics card, and where each shader, vertex or pixel, fits into the rendering pipeline as a whole. I know it sounds obvious, but try to remember that you are writing a piece of code that will be executed either for every single vertex in your visible scene or for every pixel in clip space, so it's critical that your shaders are fast, efficient and as short as possible.

When I approach designing a shader, I find it much easier to think about the hardware right down to the register level: what am I using each register for, what vertex format do I need, and what am I going to pass between the vertex and pixel shader? When it comes to implementing actual code, I also try to visualise what I am trying to accomplish in terms of vectors, texture coordinates, normals, etc. I sit down and think (or draw out) how I want these things to interact, i.e. I have the light direction here in world space and the face normal there in object space; what am I going to do with them to get them into screen space, or how am I going to get to my final colour value in the pixel shader?

I have read a few good books (ShaderX, Programming Vertex & Pixel Shaders, Shaders for Game Programmers & Artists, The Complete Effect & HLSL Guide), and I agree that most of them talk about how to implement existing effects, but that is a good way of getting to grips with the approach taken by others and possibly learning a few new ideas which you can then go on to improve upon.
I don't know of any books or anything off-hand, but a few tips I've learned either from here, from papers and presentations, or from trial and error:

-If a calculation can be linearly interpolated, do it in the vertex shader. This is usually the best way to reduce instructions in a pixel shader. However you need to be careful in balancing work per-vertex and work per-pixel depending on how many vertices and pixels you're rendering.
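As a rough sketch of the idea (all of the names here are just placeholders), the light vector below is built once per vertex and handed to the pixel shader through the interpolators, so only the parts that don't interpolate linearly (the normalize and the dot) stay per-pixel:

// Hypothetical sketch: compute the (unnormalized) light vector per vertex and
// let the rasterizer interpolate it, instead of rebuilding it for every pixel.
float4x4 gWorldViewProj;
float4x4 gWorld;        // assumed to contain no non-uniform scale
float3   gLightPosWS;   // light position in world space

struct VS_OUT
{
    float4 posCS    : POSITION;   // clip-space position
    float3 normalWS : TEXCOORD0;  // world-space normal
    float3 lightWS  : TEXCOORD1;  // world-space vector towards the light
};

VS_OUT SimpleVS(float4 posOS : POSITION, float3 normalOS : NORMAL)
{
    VS_OUT o;
    o.posCS      = mul(posOS, gWorldViewProj);
    float3 posWS = mul(posOS, gWorld).xyz;
    o.normalWS   = mul(float4(normalOS, 0), gWorld).xyz;
    o.lightWS    = gLightPosWS - posWS;   // done once per vertex
    return o;
}

float4 SimplePS(float3 normalWS : TEXCOORD0,
                float3 lightWS  : TEXCOORD1) : COLOR
{
    // Only the non-linear work stays per pixel.
    float ndotl = saturate(dot(normalize(normalWS), normalize(lightWS)));
    return float4(ndotl, ndotl, ndotl, 1);
}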

-In general, try to limit your texture sampling. GPUs tend to have plenty of ALU power to spare for math, but bandwidth and latency issues can hit you hard. Be aware that the HLSL compiler will usually try to issue a texture fetch, then throw in some math (if you have any), and only then use the fetched texel, in order to "hide" the latency.
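As a hypothetical illustration of trading a fetch for ALU work (the names are made up), here is the same attenuation term either read from a 1D lookup texture or computed directly:

// Hypothetical sketch: the same attenuation value fetched from a 1D lookup
// texture versus computed with a few ALU instructions.
sampler1D gAttenLookup;   // texture assumed to encode 1 / (1 + d*d)
float     gLightRange;

float AttenFromTexture(float dist)
{
    // Costs a texture fetch (bandwidth + latency).
    return tex1D(gAttenLookup, dist / gLightRange).r;
}

float AttenFromMath(float dist)
{
    // Costs a handful of ALU instructions, which the GPU usually has to spare.
    float d = dist / gLightRange;
    return 1.0 / (1.0 + d * d);
}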

-Don't modify depth in the pixel shader unless you REALLY need to. On modern GPUs, writing depth in the pixel shader disables early-z optimizations, which let the GPU skip running the shader for a pixel entirely once it determines that the pixel will fail the z-test.
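For reference, this is the sort of output that triggers it; a minimal made-up SM 3.0-style example, where the mere presence of the DEPTH output is enough to turn early-z off:

// Hypothetical sketch of what to avoid: a pixel shader that writes its own
// depth. The DEPTH output means the GPU can no longer trust the rasterizer's
// depth value, so early-z rejection is disabled.
void DepthWritingPS(float2 uv        : TEXCOORD0,
                    float  linearZ   : TEXCOORD1,
                    out float4 color : COLOR0,
                    out float  depth : DEPTH)
{
    color = float4(uv, 0, 1);
    depth = saturate(linearZ);   // overrides the interpolated depth
}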

-Be aware that in most cases, using if-statements will cause the HLSL compiler to use predication rather than dynamic branching. That means if you have something like this...

if ( someCondition )
{
    SampleTexture( someTexture );
    DoFancyMath();
}
else
{
    SampleTexture( someOtherTexture );
    DoOtherFancyMath();
}


...unless you deliberately take steps to avoid it, the compiler will generate assembly that follows both branches, samples both textures, does the math on both of them, and then "chooses" the "right" result.
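One deliberate step you can take on SM 3.0 is the [branch] attribute, which asks the compiler for real flow control instead of predication. A rough sketch with made-up samplers and a made-up condition:

// Sketch only: [branch] requests real flow control, [flatten] requests
// predication. tex2Dlod is used because gradient-based fetches like tex2D
// aren't allowed inside divergent flow control.
sampler2D gTextureA;
sampler2D gTextureB;

float4 BranchingPS(float2 uv : TEXCOORD0, float blend : TEXCOORD1) : COLOR
{
    float4 result;

    [branch]
    if (blend > 0.5)
    {
        result = tex2Dlod(gTextureA, float4(uv, 0, 0)) * 2.0;
    }
    else
    {
        result = tex2Dlod(gTextureB, float4(uv, 0, 0)) * 0.5;
    }
    return result;
}

The branch itself still has a cost, so it only pays off if you're skipping a decent amount of work (see the next point about coherency).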

-If you use dynamic branching, only use it where the results will be coherent. In other words, only use it where you can be reasonably sure that large groups of adjacent pixels will take the same branch. This is because GPUs work on groups of pixels simultaneously (usually quads at the lowest level), and all pixels in a quad must execute the same instructions. So if two pixels in a quad take different branches, all 4 must execute instructions from both branches. Also be aware that Nvidia's older SM3.0 GPUs (GeForce 6 and 7) were pretty bad at dynamic branching, performance-wise.

-This isn't a performance tip but more of a sanity tip: keep track of what coordinate space a vector is in. Much of your code, especially in a vertex shader, will be transforming vectors from one coordinate space to another. It's very easy to have a shader that needs to deal with 4 different coordinate spaces, and it can get confusing very quickly. Personally I like to add a suffix to any position or direction vector that denotes the coordinate space: OS for object space, VS for view space, TS for tangent space, etc.
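A quick sketch of what that convention looks like in practice (all of the matrix and variable names are just placeholders):

// Sketch of the suffix convention: OS = object space, WS = world space,
// VS = view space, CS = clip space.
float4x4 gWorld;
float4x4 gView;
float4x4 gProj;

struct VS_OUTPUT
{
    float4 posCS    : POSITION;
    float3 normalWS : TEXCOORD0;
    float3 posVS    : TEXCOORD1;
};

VS_OUTPUT TransformVS(float4 posOS : POSITION, float3 normalOS : NORMAL)
{
    VS_OUTPUT o;
    float4 posWS = mul(posOS, gWorld);
    float4 posVS = mul(posWS, gView);
    o.posCS      = mul(posVS, gProj);
    o.posVS      = posVS.xyz;
    o.normalWS   = mul(float4(normalOS, 0), gWorld).xyz;  // assumes uniform scale
    return o;
}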


I guess that's about it for now...I'll edit this post if I think of anything else.
Yes, I fully understand that everything comes with experience and time; I just wanted to know which way would take me there in the shortest possible time. The problem is, writing shaders in HLSL is not only about dealing with a problem algorithmically, but also about thinking a lot about the low-level hardware and keeping in mind that "loops will be unrolled", "dynamic branching, and hence if-else, is really not free", etc. For example, after 2 days of struggling with my ray-tracer shader code I finally realized that it may not be a good idea to write such a ray-tracer in the pixel shader; instead, rendering the volume as several passes of quads (a slicing technique) and blending each slice in the pixel shader might be a better idea.
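Something like this minimal sketch is what I have in mind (the names are just placeholders): each slice quad samples the volume once at its own depth, and ordinary alpha blending accumulates the result, so the pixel shader stays tiny:

// Hypothetical sketch of the slice-and-blend approach: one cheap sample per
// slice, with alpha blending (set up on the CPU side) doing the accumulation.
sampler3D gVolume;
float     gSliceDepth;   // set per draw call, one value per slice

float4 SlicePS(float2 uv : TEXCOORD0) : COLOR
{
    float density = tex3D(gVolume, float3(uv, gSliceDepth)).r;
    return float4(density, density, density, density);
}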
Z
@MJP
Great answers. Thank you, these are the kind of things I am really looking for, i.e. knowing the pits before I step into them :). Also, where do you really find those "inside stories" of shaders, like the bit about SM 3.0 on older Nvidia cards being really bad, or that neighbouring-pixels thing? By the way, can you please explain the neighbouring-pixels thing one more time? I couldn't follow it properly.

But thanks a lot again.

I believe this thread might be really helpful for a lot of other people.
Z
Quote:Original post by browny
@MJP
Great answers. Thank you, these are the kind of things I am really looking for, i.e. knowing the pits before I step into them :). Also, where do you really find those "inside stories" of shaders, like the bit about SM 3.0 on older Nvidia cards being really bad, or that neighbouring-pixels thing? By the way, can you please explain the neighbouring-pixels thing one more time? I couldn't follow it properly.

But thanks a lot again.

I believe this thread might be really helpful for a lot of other people.


Glad I could help. [smile]

That particular tip about the Nvidia GPUs, as well as plenty of other useful things, probably came from spending a lot of time lurking on the beyond3d forums. The forums are brimming with highly knowledgeable amateurs and professionals involved in all areas of 3D graphics, who will routinely mention this kind of "inside" knowledge (while also regularly discussing all sorts of topics that will probably melt your brain). Nvidia also used to release a programming guide for their GPUs that had this sort of "do this, don't do that" info, but the last one they put out was for the 6/7 series, with a whole lot of FX-series stuff still in there. ATI also usually has some whitepapers you can read in their SDK. Another good place to get info is the presentations typically given at conferences like GDC and Gamefest. For instance, there are probably a few presentations from Gamefest 2007 that you'll find useful. Usually someone here or at beyond3d will make a thread posting links to these presentations when they become available, so be sure to check them out.

Now for the bit about dynamic branching and coherency... when you talk about pixel shaders (and shaders in general, for the most part) you have to keep in mind that things are usually happening on a massively parallel scale. Your shader code isn't running on a single monolithic CPU core with full access to resources that will optimize your code for you; instead it's running on a streamlined execution unit that's designed to crank out math fairly quickly while sharing resources with other units. This is what makes shader programming the way it is: why math is usually better than texture sampling, and why you don't usually want fancy branching code.

Now getting to my point... typically pixel shaders are grouped into quads, with a quad being 4 pixel shader executions that work simultaneously on a 2x2 "box" of pixels. Part of what helps the shaders in a quad work efficiently side by side is that they all typically execute the same shader code. So for instance if one shader is adding two vectors, the other 3 are also adding vectors. Dynamic branching, however, throws a wrench into all this. With DB any pixel could come up with a different result for the condition, and could therefore take a different branch. And different branches mean that the shaders within a quad are no longer executing the same code at the same time. The worst case comes with a shader like this:

if ( x > 0 )
{
    return float4(0, 0, 0, 0);
}
else
{
    SampleLotsOfTextures();
    DoLotsOfMath();
}


With that shader you have two branches: the first has almost no instructions, while the second results in a large number of instructions. This means that if 3 pixels in a quad take the first branch but one takes the second, the first 3 will have to wait around for the 4th, meaning you will have saved no time with dynamic branching (in fact you will probably take more time, since DB tends to have a penalty involved just for using it). This is why I said you want coherency: you only want to use DB if you can be pretty sure that in most of your quads all 4 pixels will take the same branch.
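By contrast, branching on something that is guaranteed to be coherent, like a per-object shader constant, is the easy win: every pixel in every quad takes the same path. A rough sketch with made-up names:

// Sketch only: a branch on a shader constant is perfectly coherent, so the
// expensive path is either skipped by all pixels or taken by all pixels.
bool      gReceivesShadows;   // set per draw call
sampler2D gShadowMap;

float ShadowTerm(float4 shadowPos)
{
    [branch]
    if (gReceivesShadows)
    {
        float2 uv          = shadowPos.xy / shadowPos.w;
        float  sceneDepth  = shadowPos.z  / shadowPos.w;
        float  storedDepth = tex2Dlod(gShadowMap, float4(uv, 0, 0)).r;
        return (sceneDepth > storedDepth + 0.001) ? 0.5 : 1.0;
    }
    return 1.0;
}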


Quote:Original post by browny
Yes, I meant HLSL, sorry for not mentioning that.

That's ok.

