
Uber (Compute) Shader construction


9 replies to this topic

#1 spek   Prime Members   -  Reputation: 993


Posted 03 August 2013 - 12:16 PM

Hello there,

 

I never actually used the UDK development tools, but when looking at their material definitions -a bunch of nodes with texture or math functions connected to each other- I get the impression that the shader code is constructed by pasting small pieces of node-code together. Though less awesome, I'm doing something similar; specific shaders get constructed by enabling options and using IFDEFs in an ubershader.
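
For readers who haven't seen the technique, a minimal sketch of what such an ubershader fragment could look like in GLSL. The flag and uniform names (USE_SPECULAR, diffuseMap, specularTint) are invented for illustration; the engine would prepend the #define lines it wants before compiling each variant.

#version 330 core

// invented names, just to show the idea
uniform sampler2D diffuseMap;
#ifdef USE_SPECULAR
uniform sampler2D specularMap;
uniform vec3 specularTint;
#endif

in vec2 uv;
out vec4 fragColor;

void main()
{
    vec3 color = texture(diffuseMap, uv).rgb;
#ifdef USE_SPECULAR
    // only compiled into the program when the material enables the option
    color += texture(specularMap, uv).rgb * specularTint;
#endif
    fragColor = vec4(color, 1.0);
}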

 

Nothing new so far. But how about (numeric) parameters? For example, let's say we want to multiply a specular texture by a fixed RGB value of {64,64,75}. In my engine, this RGB color would be passed as a uniform float3 parameter when applying the shader. But it seems to me that UDK actually hardcodes these (constant) values as well into the shader.
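
To make the two options concrete, a small GLSL sketch (identifier names are mine, and the /255.0 normalization is an assumption about how the RGB value would be used):

#version 330 core
uniform sampler2D specularMap;
in vec2 uv;
out vec4 fragColor;

// Variant A: the tint arrives as a uniform, so one compiled program serves many materials.
uniform vec3 specularTint;                        // engine would set e.g. vec3(64.0, 64.0, 75.0) / 255.0

// Variant B: the tint is pasted into the generated source as a compile-time constant.
const vec3 kSpecularTint = vec3(64.0, 64.0, 75.0) / 255.0;

void main()
{
    vec3 spec = texture(specularMap, uv).rgb;
    fragColor = vec4(spec * specularTint, 1.0);   // swap in kSpecularTint for the hard-coded variant
}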

 

 

I could be wrong of course, but it would make some sense. Using constants is more efficient than passing parameters. Then again, the disadvantage is that you'll have to rebuild the shader whenever a value changes, and maybe more importantly, you will get a massive amount of slightly different shaders.

 

 

Now to the actual question: what would be wiser to use -mainly performance-wise- in the following situation? I'm making a new Particle editor, and there are tons of modes and parameters regarding motion, colorizing, size, spawning positions, timing, opacity control, physics, collisions, and so on. Instead of passing a big bunch of vectors, colors and numbers, I could just hardcode them into the (OpenCL Compute) shader. Sounds like a plan, Murdock, but I'm a bit worried that I will end up with a big bunch of shaders. If there are 1000 different particle spawners, there will be 1000 different programs as well, which also rules out any sorting or grouping by shader.
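
One way to picture the "hardcode everything" route (a sketch only, not how UDK or any particular engine does it): the editor emits a small block of #defines per effect and prepends it to one shared update source before compiling. Every name and value below is invented for illustration.

// ---- generated per effect by the editor (hypothetical values) ----
#define PARTICLE_LIFE  2.5
#define GRAVITY        vec3(0.0, -9.81, 0.0)
#define START_COLOR    (vec3(64.0, 64.0, 75.0) / 255.0)

// ---- shared update code, identical for every effect ----
void updateParticle(inout vec3 pos, inout vec3 vel, inout float age, float dt)
{
    vel += GRAVITY * dt;          // the compiler sees plain literals here and can fold them
    pos += vel * dt;
    age += dt / PARTICLE_LIFE;    // normalized age, 1.0 means the particle dies
}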

 

Greets,

Rick




#2 Vilem Otte   Crossbones+   -  Reputation: 1374


Posted 03 August 2013 - 07:46 PM

I've also never worked with UDK, but from what I've tried, having tons of shaders doesn't work that well either - switching shaders is not a free operation, so using 1k different shaders can destroy your performance. Also, the loading time for 1k different shaders would be ... long (incl. compilation, linking ... and if you use some pre-processing, you can easily end up spending a few minutes just on this).


My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com


#3 spek   Prime Members   -  Reputation: 993


Posted 04 August 2013 - 01:22 AM

Hey Vilem, doing well?

 

More shader switches and loading times are a problem indeed. Then again, in this particular situation (particles) there won't be 1000 different types of particles active at the same time of course. Imagine a couple of dust clouds, electro sparks, and maybe some blood, (gun)fire + flying debris if things get tense. 5 to 50 different particle effects sound more likely.

 

A second argument would be that the particle shaders are rarely the same anyway, even when using variable parameters instead of hardcoded constants. In the example above, sparks / blood / dust / debris all have different characteristics and physics. So you'll still end up with almost the same number of different shaders, unless we use a variable for literally everything, plus some "if else" in the code. That is how I currently do it by the way (using vertex shaders + transform feedback to evaluate particle points). But either you'll have to send a big amount of parameters for each particle effect, or you'll end up with very stiff shaders that can't do many different things. Especially collision detection or advanced motion (standard gravity, gases, tornadoes, falling feathers, streaming liquids, ...) are hard to combine in one super shader that does it all.
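
Roughly what that "one shader with a variable for everything" looks like, as a sketch with invented names: a mode value plus a pile of tuning uniforms, and if/else chains that every particle carries even when its effect only ever uses one branch.

uniform int   motionMode;          // 0 = ballistic debris, 1 = drifting dust, 2 = swarming flies
uniform vec3  gravity;
uniform float drag;

vec3 integrate(vec3 pos, vec3 vel, float dt)
{
    if (motionMode == 0) {
        vel += gravity * dt;                    // falls like a rock
    } else if (motionMode == 1) {
        vel *= 1.0 - drag * dt;                 // slowly drifting dust cloud
    } else {
        vel += sin(pos.yzx * 13.0) * dt;        // cheap pseudo-random wiggle for the flies
    }
    return pos + vel * dt;
}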

 

Maybe it's best to do both; hardcode the stuff that would otherwise produce a lot of if/else or impossible code, and keep using parameters for smaller controls in order to reduce the number of different shaders. Doing this properly sounds a bit difficult though, and programmer instinct tells me UDK does it in a more flexible way hehe.

 

 

 

As for loading, I suspect quite a lot of particle effects are bound to special situations only. A waterfall or some dancing plasma balls don't appear all the time, so I could load the pre-compiled code in a background thread when approaching the scene. Though I'm not sure if OpenCL allows creating the actual shader in a background thread as well... never tried it.

 

Cheers



#4 Vilem Otte   Crossbones+   -  Reputation: 1374


Posted 05 August 2013 - 05:04 PM

Yup doing fine...

 

You could just determine the shaders needed at level load and dynamically load and compile the ones that you *will* need on the fly. E.g. the player starts at one end of the map, so you don't need the shaders from the other end yet. Load the ones needed on the fly and remove the older ones. That could work; the question is - is it worth implementing such a complicated system for the Tower22 game?

 

I can't tell you whether it is or isn't. I do have such a complicated system in the game engine for my long-term project - the whole loading system for any resource runs on another thread and lets the main thread know when it's done. It is quite a complicated construction though, and I probably wouldn't do it for non-open-world games. On the other hand, it gives your engine the ability to work with large-scale worlds.

 

OpenGL *allows* you to create or upload a shader/texture/whatever in a background thread - but I don't use it that way. I use background threads just for loading, and the OpenGL calls are made on the main thread. I know this isn't the best solution ever, but I wanted to keep the loading threads free of any dependency on OpenGL, because in the future I'd like to allow the user to switch to an OpenCL renderer (using a ray tracer).


Edited by Vilem Otte, 05 August 2013 - 05:05 PM.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com


#5 spek   Prime Members   -  Reputation: 993


Posted 05 August 2013 - 05:43 PM

It gets slightly more complicated, as there is no "level start". The whole world roams, so if I'm not going to load all possible shaders up front, they need to be loaded when they become relevant. Most of the texture and world data is streamed on the fly already. I'm using OpenCL instead of GL for the particles btw, dunno about the options there to stream things smoothly.

 

It's hard to tell how many different particle effects will really be generated. A thousand is probably a bit of an extreme example, and a minute more or less of loading doesn't hurt that much, since a lot has to be loaded initially anyway. That still doesn't mean I should follow the dumb & easy path, but for making decisions I'm more interested in the performance benefits (or lack of them) and the flexibility. If each different particle effect gets its own shader anyway, I can hardcode all parameters and allow adding custom chunks of code for advanced motion, rather than trying to do everything with a limited set of parameters.

 

 

 

For example, consider physics. A particle might just fall like a rock, or like a feather or a tree leaf. Maybe it doesn't fall at all because it's a dust cloud, or a swarm of stinky flies moving randomly. With a variable gravity vector you can get quite far, but more advanced movements such as the flies require more options/variables, or a different base shader to start with. Combine that with the many other features (color, size, opacity, collisions, wind, spawn locations, ...), and the list of combinations becomes infinite. In my current "solution", I'm just using one shader with a big bunch of variables. But three things bother me:

- The shader isn't very optimized (a lot of features are rarely used)

- No shader switching, but I need to set quite a lot of parameters nevertheless (how much does that actually hurt, setting params?)

- Works well for basic effects, but too stiff for more advanced stuff (cigarette smoke, A-bombs, whatever)

 

 

 

However... probably 50% of the particles still use a simple program. A bit of smoke here, a bit of dripping dirty stuff there... Impact debris doesn't vary a lot, except for the texture, density and gravity influences. So probably I should try to make the shaders reusable in such cases anyway (so I can also sort on them, saving a few switches maybe), by offering a small list of commonly used parameters / features. More advanced effects should have the flexibility to create their own complete programs, without being restricted. But advanced effects aren't the majority of the particles, so it shouldn't produce many hundreds of shaders.

 

 

 

UDK

Still curious, does anyone know if a variable (a gloss factor for example) is stored as a hard-coded constant in the shader, or does the engine pass the variable so the same program can be re-used for multiple materials with different gloss values? If not, don't they end up with thousands of shaders as well?

 

The most mind-boggling part of this whole thing is picking a path that is efficient, but also clear and somewhat easy to use, understand and implement. Highly optimized strategies often aren't. That's what I like about the UDK node approach, it seems very consistent and understandable for non-programmers. No idea what kind of dirty performance tricks are done under the hood though...



#6 nonoptimalrobot   Members   -  Reputation: 408


Posted 12 August 2013 - 09:39 PM

When you hard code a constant into HLSL or Cg code the end result is the same as having that constant specified in a constant buffer and bound at runtime.  The internal mechanics of whatever API you happen to be using make sure this happens transparently.  I'm not exactly sure how the DirectX 11 and OpenCL / OpenGL architectures implement this, but in the DirectX 9 days constant literals would be implicitly stored in the constant registers.  When you compiled a shader, all the registers not explicitly declared by the code would get reserved for storing compile-time literals.  When you bound a shader to the GPU, those constants would get bound to the appropriate registers by the driver.

 

There isn't actually a runtime difference between the two options but there are practical consequences.  If you restrict yourself to using only compile-time constants the API / driver ends up managing and binding constant buffers for you; however, you lose flexibility.  You can't change them without recompiling your shaders and you will inevitably end up with a lot of shaders to account for all the permutations of compile-time literals.  Don't do this.

 

You might be wondering why compile-time literals are even supported at all.  It's mostly a convenience thing.  There are a lot of formulas that use constants that will never ever change and it's convenient to type these directly into the code.  A classic example is when you multiply the color sampled from a texture by 2 and subtract 1 to turn it into a vector for normal mapping.  It would be super annoying if you had to declare 2 and 1 as constant buffer values and bind and set them yourself at runtime.
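
In GLSL that example is a single line; the 2.0 and 1.0 are exactly the kind of literals nobody would want to route through a constant buffer (normalMap and uv are assumed to be declared elsewhere):

vec3 n = normalize(texture(normalMap, uv).rgb * 2.0 - 1.0);   // unpack [0,1] texel data into a [-1,1] normal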

 

The other wrinkle in this is that drivers will occasionally try to catch various render-state, constant-buffer and shader combinations that are common (by noting the GPU state at the time of DrawPrimitive calls) and re-implement these things behind the scenes (sometimes actually recompiling shader code) as a way of achieving driver-specific optimizations.  This is what is happening when Nvidia or ATI release new drivers that are said to increase performance in specific games.  It can cause a lot of stuttering, and commercial games get around this by looping through all the material permutations used, before gameplay starts, to trigger the driver to do all of its behind-the-scenes trickery behind a black screen.  If you hard code all the tweakable constants for a particular shading algorithm and thereby generate a huge number of shaders, you make it harder for the driver to do these types of optimizations.



#7 MJP   Moderators   -  Reputation: 10659


Posted 13 August 2013 - 03:05 PM

When you hard code a constant into HLSL or Cg code the end result is the same as having that constant specified in a constant buffer and bound at runtime. 

Well that of course isn't true. Any constants that are hard-coded into the shader can be folded into the instructions and used to optimize the resulting assembly/bytecode. In materials a lot of parameters end up having a value of 0 or 1. If those parameters are hard-coded then any multiplication with those values can be optimized away completely. Or in the case of 0, all operations involved in producing the value to be multiplied with that parameter can be stripped away. With values in a constant buffer the compiler can't make these assumptions, and must issue wasted instructions and possibly consume additional registers. There are also cases where the hardware supports modifiers that allow multiplications by certain values to be folded into a previous instruction. For instance AMD's GCN ISA supports modifiers that allow for a free multiplication by 0.25, 0.5, 2, or 4.
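
A small GLSL illustration of the difference (names invented): with the literal zero the compiler can delete the multiply and the texture fetch feeding it, while with the uniform it has to keep both, because the value is unknown at compile time.

uniform sampler2D emissiveMap;
uniform float     emissiveStrength;        // runtime value: nothing can be stripped

const float kEmissiveStrength = 0.0;       // compile-time literal: everything it touches folds away

vec3 shadeEmissive(vec2 uv)
{
    vec3 withUniform = texture(emissiveMap, uv).rgb * emissiveStrength;   // fetch + multiply survive
    vec3 withLiteral = texture(emissiveMap, uv).rgb * kEmissiveStrength;  // folds to vec3(0.0)
    return withUniform + withLiteral;
}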



#8 nonoptimalrobot   Members   -  Reputation: 408


Posted 13 August 2013 - 09:03 PM

 

When you hard code a constant into HLSL or Cg code the end result is the same as having that constant specified in a constant buffer and bound at runtime. 

Well that of course isn't true. Any constants that are hard-coded into the shader can be folded into the instructions and used to optimize the resulting assembly/bytecode. In materials a lot of parameters end up having a value of 0 or 1. If those parameters are hard-coded then any multiplication with those values can be optimized away completely. Or in the case of 0, all operations involved in producing the value to be multiplied with that parameter can be stripped away. With values in a constant buffer the compiler can't make these assumptions, and must issue wasted instructions and possibly consume additional registers. There are also cases where the hardware supports modifiers that allow multiplications by certain values to be folded into a previous instruction. For instance AMD's GCN ISA supports modifiers that allow for a free multiplication by 0.25, 0.5, 2, or 4.

 

 

 

Yeah, definitely.  Optimizations happen based on known compile time constants.  In general terms this happens in one of two ways:

 

1) Algebraic simplification where operations are collapsed and constants are combined such that the mathematics are the same.  If this is as trivial as skipping computations that are simply multiplied by zero then you'll likely get a warning that you are doing something unnecessary.  This is common if you are compiling code that was pieced together by a script or whatever else.

 

2) Some architectures support instructions with implicit constant factors, such as AMD's GCN.  These are limited in variety and restricted to trivial values such as {1/4, 1/2, 2, 4}.  I don't know of any that exploit pi or e or other natural constants but I would expect these to show up at some point.



#9 spek   Prime Members   -  Reputation: 993


Posted 14 August 2013 - 03:41 AM

Thank you both for these insights!

 

Having to change the parameters won't be required in this case, as the effects are created in an editor and will never change afterwards. But yeah, ending up with a billion shaders is just a no-go; even though the expected number of variations isn't that extremely high, you usually end up with more than planned.

 

 

I made a combination of both worlds. Variables like colors or force vectors won't be hard-coded. These are stored in (OpenGL) UBOs, so that saves some CPU / GPU traffic (btw, is a UBO the same as a constant buffer??). Bigger options, such as whether a particle can bounce on the ground or not, which would typically generate if-else or switch chunks of code, are hardcoded.
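
As an illustration of that split (block layout and names are mine, not spek's actual code): the tweakable numbers live in a std140 uniform block backed by a UBO, while a structural option like ground bouncing is a compile-time #define.

layout(std140) uniform EffectParams        // backed by a UBO, filled once per effect
{
    vec4  startColor;
    vec4  endColor;
    vec3  gravity;
    float particleLife;
};

void applyCollision(inout vec3 pos, inout vec3 vel)
{
#ifdef BOUNCE_ON_GROUND
    if (pos.y < 0.0) {                     // baked-in option: effects that never collide pay nothing
        pos.y  = 0.0;
        vel.y *= -0.5;
    }
#endif
}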

 

Then last, the artist is able to add custom code for the very special cases of particles that don't fit the standard features very well. Earlier I ended up with (too) many variables and options, trying to support everything and bloating the code. I stripped the shaders down to the more common basic features, and allowed custom code for the (relatively few) special cases. These custom shaders can have all their variables hard-coded, as they are unique anyway.

 

 

 

Any idea where UDK draws a line between constant and variable data? Just looking at the node-editor, a shader can end up with quite a lot of colors, vectors and numerics. Of course the shader can be reused for other objects, but there will always be slight differences in reflectivity, gloss or other object/material specific attributes.



#10 nonoptimalrobot   Members   -  Reputation: 408


Posted 14 August 2013 - 11:42 AM

I made a combination of both worlds. Variables like colors or force vectors won't be hard-coded. These are stored in (OpenGL) UBOs, so that saves some CPU / GPU traffic (btw, is a UBO the same as a constant buffer??). Bigger options, such as whether a particle can bounce on the ground or not, which would typically generate if-else or switch chunks of code, are hardcoded.

 

Yeah, UBOs in OpenGL are the same concept as constant buffers in DirectX.

 

Any idea where UDK draws a line between constant and variable data? Just looking at the node-editor, a shader can end up with quite a lot of colors, vectors and numerics. Of course the shader can be reused for other objects, but there will always be slight differences in reflectivity, gloss or other object/material specific attributes.

 

It's intentionally left up to the developer to accommodate a range of development styles and game types.





