L. Spiro

OpenGL Shader-Based Engine


The basic question I'm asking is about the general structure behind a shader-based engine.

But let me explain.
I am making a generalized commercial game engine.

My first test was just for performance.
I made a single shader that replicated nearly all of the fixed-function pipeline; since it was my first shader I wanted a gentle introduction, and I learned a lot about performance from it.
The shader itself caused no FPS hit at all; the cost was entirely in updating the shader's data.
Nearly every batched draw needed some change, so SetIntArray() or some such was called frequently.
This accounted for about 80% of the slowdown relative to the fixed-function pipeline.

In my next attempt I did what I thought was more normal.
Instead of branching when lighting was enabled (for example), I had a second permutation for it, and activated different shaders based on which settings were active among color-write, lighting, tex-gen, and something else I forget at the moment.
This way I could avoid all of my calls to SetBoolean(), and also skip setting data that wasn't being used. I also avoided using shaders when nothing special was needed; that is, my shader provided a replica of GL_OBJECT_LINEAR, and when that was not needed I used the fixed-function pipeline.

But then I found that after setting shader 1, then shader 2, and then returning to shader 1, its data had been flushed: I had to re-apply its global-variable data with more calls to SetValue() (and friends), even for values that had not actually changed since the last time that shader was used. If I could set shader 1, set shader 2, then go back to shader 1 and still have its previous data there, I could mitigate this cost, but sadly that seems not to be the case.

Both experiments gave me horrible results.
My output looked exactly like the fixed-function pipeline, which is important because I am emulating it for a generic commercial engine.
But the same scene rendered all three ways performs as follows (in FPS):
FFP: ~200
Method #1 (single do-everything shader): ~150
Method #2 (permutations): ~100

Both of my experiments were worse than straight FFP.

So finally, what is the theory behind a shader-based engine? I originally needed to replicate the OpenGL fixed-function pipeline, but now we are switching to shaders for all of our target platforms (Nintendo Wii, OpenGL, OpenGL ES, PlayStation 3, Xbox 360, PC, Macintosh, and iPhone).
For compatibility reasons, we were trying to stay fixed-function on OpenGL systems, but we no longer have that restriction.

So with the performance results I got before, I don't see what I can do.
I wanted a generic foundation of shaders that could be used with any model, etc., but all of my results suggest that this is just impossible.
Or did I do something wrong that caused such a performance hit?
Instead of making generic shaders, are most engines generating special shaders inside their model libraries that are designed just to draw one part of the model (then sharing these as much as possible)?
Even in that case I can't get around the performance issue I had before with switching between shaders and having to re-update their global states (which is the single largest bottleneck in the history of bottlenecks I discovered).

How are shader-based engines getting all of the performance they are getting?
Am I completely off base with my theories, or what?

Thank you,
Yogurt Emperor

From those performance figures it sounds like you could be CPU bound. How many draw calls per frame are you making?

Well, I rev it up to test performance, but it should average about 100 per frame: one call for each terrain chunk, then a few more for each part of the car that has a different texture/material.

But the same number of calls does not slow down the fixed-function pipeline like that; the bottleneck has definitely been narrowed down to the Set*Value() family of functions for updating shader globals.
And my objects are sorted to reduce redundant state changes, on top of last-set-value checks that ensure no redundant calls for any shader value.
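A last-set-value check like the one described can be sketched as a thin cache sitting in front of the Set*Value() calls. The class and method names below are hypothetical, not any real shader API; the point is only that the expensive driver call is skipped when the bytes haven't changed:

```cpp
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical uniform cache: remembers the last value pushed for each
// named shader global and skips the expensive call when it is unchanged.
class UniformCache {
public:
    // Returns true only if the value actually had to be sent to the driver.
    bool SetValue(const std::string& name, const void* data, size_t size) {
        std::vector<unsigned char>& last = m_lastValues[name];
        if (last.size() == size && std::memcmp(last.data(), data, size) == 0) {
            return false;  // Redundant update; skip the driver call entirely.
        }
        last.assign(static_cast<const unsigned char*>(data),
                    static_cast<const unsigned char*>(data) + size);
        // A real engine would issue the actual SetValue()/glUniform*() here.
        ++m_driverCalls;
        return true;
    }
    size_t DriverCalls() const { return m_driverCalls; }

private:
    std::unordered_map<std::string, std::vector<unsigned char>> m_lastValues;
    size_t m_driverCalls = 0;
};
```

Note this only filters repeats of the same value through the same wrapper; it does not by itself solve the re-upload-after-shader-switch problem described above.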

Yogurt Emperor

Here are some tips:

1. Sort batches by shader first (VBO / texture / etc. second). This ensures that states persist between draw calls.

2. Sort states into three categories: per-frame, per-batch and per-instance

3. Only update states at the frequency they need (frame/batch/instance)

4. Instead of using separate SetFloat()-calls, bunch states into structs (frame/batch/instance) and use SetValue(). Push entire structs instead of separate states.

5. If you're not using instancing, the inner-most loop should basically only push a new transform, commit, and issue a draw call. (However, for better CPU utilization you should definitely use some form of instancing, and since you're using shaders on all platforms you could easily implement shader instancing, which works even on VS1.1 hardware.)
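Tips 2–5 can be sketched roughly as below. All struct, field, and function names are illustrative, not a real API; each Push* stands in for one struct-sized SetValue()-style upload, with only the per-instance transform touched in the inner-most loop:

```cpp
#include <cstddef>

// States grouped by update frequency into structs (tips 2-4), each pushed
// with one bulk upload instead of many separate SetFloat()-style calls.
struct PerFrameParams    { float viewProj[16]; float time; };
struct PerBatchParams    { float diffuseColor[4]; int textureSlot; };
struct PerInstanceParams { float world[16]; };

struct FrameStats { int frameUploads = 0, batchUploads = 0, instanceUploads = 0; };

// Stand-ins for one SetValue(handle, &params, sizeof params) call each.
void PushFrame(const PerFrameParams&, FrameStats& s)       { ++s.frameUploads; }
void PushBatch(const PerBatchParams&, FrameStats& s)       { ++s.batchUploads; }
void PushInstance(const PerInstanceParams&, FrameStats& s) { ++s.instanceUploads; }

// Render-loop skeleton: per-frame data once, per-batch data once per batch,
// and only the transform inside the inner-most loop (tip 5).
FrameStats RenderFrame(int batches, int instancesPerBatch) {
    FrameStats stats;
    PushFrame(PerFrameParams{}, stats);
    for (int b = 0; b < batches; ++b) {
        PushBatch(PerBatchParams{}, stats);
        for (int i = 0; i < instancesPerBatch; ++i) {
            PushInstance(PerInstanceParams{}, stats);
            // Draw() would go here.
        }
    }
    return stats;
}
```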

The fixed-function pipeline is implemented at a lower level than you are working on, so keep in mind that you have one additional API level to go through that the fixed-function pipeline doesn't. If you implement an "emulation" through shaders, it's bound to be slower, since you will have more overhead (you're working in the application layer, not the runtime layer).

In DirectX, for example, it's implemented in the DX runtime, which communicates directly with the graphics driver, which in turn has optimized shaders for the FF pipeline. The same applies to OpenGL.

This isn't usually a problem, though, because most game engines utilize the other performance benefits of shaders: instancing, deferred rendering techniques, über-shaders, custom vertex formats such as half-floats, texture atlasing to reduce draw calls, and lots of other techniques. It is in these other techniques that the real power of shader programming lies, and where you will get the most performance benefit.

Hope this helps and happy coding! :)


Thank you for all of the good advice.

I am already doing everything you mentioned except instancing, because I only just started with shaders. I am sorting by shader only implicitly, in that identical render states result in the same shader being used over and over, but I will sort by shader explicitly before the end of my trials.
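Explicit shader-major sorting can be as simple as packing ids into a composite key, with the shader id in the high bits so batches sharing a shader end up adjacent. The field widths below are arbitrary placeholders; a real engine would budget bits per state:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Shader-major sort key: shader id in the high bits, texture and VBO ids
// as tie-breakers, so equal shaders (then equal textures) stay adjacent.
struct Batch {
    uint32_t shaderId, textureId, vboId;
    uint64_t SortKey() const {
        return (uint64_t(shaderId)  << 40) |
               (uint64_t(textureId) << 20) |
                uint64_t(vboId);
    }
};

void SortBatches(std::vector<Batch>& batches) {
    std::sort(batches.begin(), batches.end(),
              [](const Batch& a, const Batch& b) { return a.SortKey() < b.SortKey(); });
}
```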

What would be considered a better approach considering this is a commercial game engine?

Option #1: use a set of generalized shaders that the engine applies for all of its standard rendering, with the ability to add your own custom shaders for more effects.
* To get interesting new effects you only have to combine some render states instead of writing a whole new shader.
* Performance would apparently be lower.

Option #2: build the whole framework around shaders. The graphics library does not know how to render anything; instead, the model and terrain libraries create their own shaders for everything they want to do.
* Should be faster, in theory.
* Every new object that needs to be rendered must supply its own rendering shader. Generating the required shaders is automatically part of the model library and is done for every model you import from Maya/3D Studio Max/FBX, so the engine is still load-and-go, but changing how these models render, or adding completely new effects, can no longer be done just by tweaking render states.
* Less flexibility. With each object hardcoding its own shader I can no longer make simple or global changes; things would be designed to draw one way and one way only.

I can’t decide on the best way to go here.

Yogurt Emperor

I would opt for option #2. It's not as bad as it sounds, because you can create your own library/utility functions that your shaders reuse, so when you create a new effect you just #include the needed files (as you would when coding regular C/C++).

You could write utility files for shadow mapping, environment mapping, lighting models and skinning (just examples, there's lots more you could write of course). You can now combine these in different ways to get new effects, and if you need custom stuff you just build upon these functions. And if a new awesome shadow mapping technique comes out, you just rewrite your shadow mapping routines and it will be applied in every effect that includes the shadow mapping utility code.
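Since many shader compilers have no #include mechanism of their own, the engine can splice the utility files into the source itself before compiling. A minimal sketch of such a resolver (the function name and the idea of keeping library files in an in-memory map are assumptions for illustration):

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical shader "#include" resolver: replaces each #include "name"
// line with the named library file's contents before compilation.
std::string ResolveIncludes(const std::string& src,
                            const std::map<std::string, std::string>& libs) {
    std::istringstream in(src);
    std::ostringstream out;
    std::string line;
    const std::string kw = "#include \"";
    while (std::getline(in, line)) {
        size_t pos = line.find(kw);
        if (pos != std::string::npos) {
            size_t start = pos + kw.size();
            size_t end = line.find('"', start);
            std::string name = line.substr(start, end - start);
            auto it = libs.find(name);
            // Recurse so library files may themselves include other files.
            out << (it != libs.end() ? ResolveIncludes(it->second, libs) : "") << '\n';
        } else {
            out << line << '\n';
        }
    }
    return out.str();
}
```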

A simple way to let the engine easily interface with different shader code is to specify pre-defined names for a standard set of parameters that most of your shaders use: for example, world/view/projection transforms and diffuse/normal-map/shadow-map samplers. As long as a shader uses a variable name you pre-defined (for example, g_mWorld for the world transform), your shader framework can automatically get a handle to it and the engine can automatically update it.
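In sketch form, binding-by-convention is just a reserved-name table checked against whatever uniforms a compiled shader declares. Apart from g_mWorld, which follows the naming scheme suggested above, every name and type here is illustrative:

```cpp
#include <set>
#include <string>
#include <vector>

// Parameters split into those the engine knows how to update automatically
// (because their names follow the convention) and custom, material-driven ones.
struct AutoBindings {
    std::vector<std::string> engineUpdated;
    std::vector<std::string> custom;
};

AutoBindings BindByName(const std::vector<std::string>& declaredUniforms) {
    // Reserved names the engine recognizes; a real table would be larger.
    static const std::set<std::string> reserved{
        "g_mWorld", "g_mView", "g_mProjection",
        "g_tDiffuse", "g_tNormalMap", "g_tShadowMap",
    };
    AutoBindings out;
    for (const std::string& name : declaredUniforms) {
        (reserved.count(name) ? out.engineUpdated : out.custom).push_back(name);
    }
    return out;
}
```

After this split, the engine grabs handles for the engineUpdated names once at load time and refreshes them each draw without the shader author doing anything.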

(Sidenote: some might argue that this is exactly the kind of thing annotations are for, and to them I respond that not all platforms that support shaders necessarily support effects, and simply using standardized names is an easy way to save yourself a lot of headaches :))

Some of the more advanced commercial engines do this automatically via material editors. Many use a node-based system where each node represents a certain effect (in actuality, a small code fragment). An artist can then combine different nodes to get new effects, just by connecting them and specifying inputs (textures etc.) and outputs. The framework then generates the real shader binaries from this graph automatically, with all the combinations needed for different types/numbers of lights etc.
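In miniature, such a node system boils down to code fragments stitched together in connection order. This toy sketch (not any particular engine's material editor; all names and the GLSL-flavored fragments are illustrative) just concatenates each connected node's fragment into a shader body:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy node-based material generator: each node carries a small shader code
// fragment, and the generator emits the fragments of the connected nodes,
// in order, as the body of the final shader.
struct MaterialNode {
    std::string name;
    std::string fragment;  // The code snippet this node represents.
};

std::string GenerateShaderBody(const std::vector<MaterialNode>& connectedNodes) {
    std::ostringstream body;
    body << "void main() {\n";
    for (const MaterialNode& node : connectedNodes) {
        body << "    // node: " << node.name << '\n';
        body << "    "  << node.fragment << '\n';
    }
    body << "}\n";
    return body.str();
}
```

A real system would additionally topologically sort the graph, track each node's inputs/outputs as typed variables, and emit one permutation per light-count combination, but the generate-code-from-fragments core is the same.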

If this is your first time writing shaders, a node-based system should probably not be the first thing you try. Write a straightforward code base first, and then expand it into something more general (and automated) in the future.
