Choosing template policies at run-time?

Started by
10 comments, last by Gage64 16 years, 3 months ago
I apologize in advance for the long post [smile]. I'm starting work on a software renderer. So far I have only implemented wireframe rendering, but I'm starting to plan ahead a little bit, and I'm thinking about how I should handle render states. For now let's only consider backface culling and blending. Let's say that the options for backface culling are: None, CW, CCW, and for blending: Copy, ColorKey (maybe blending is not the right term here but that's beside the point). The obvious way to handle this is to simply use switch statements, but that would mean using a conditional for every triangle (for backface culling) and for every pixel (for blending), which will incur a significant performance overhead (especially as more render states are added). For the same reason, I don't want to use function pointers or virtual functions (something like the strategy pattern). I want any such conditional code to be done at the object level. I own Tricks of the 3D Game Programming Gurus and there it is (partially) accomplished by writing dozens of different functions that look almost identical (to avoid the conditionals), and using a huge block of if statements to decide which function to call. Needless to say, this isn't pretty, and as the number of render states and rendering options increases, this becomes completely unmanageable. This article uses a policy-based design to solve this problem, and I thought it would be perfect for this, but I then realized that I have no idea how I would choose the policy at run-time. Basically what I want to do is this:

if (cullMode == CW)
    CullPolicyCW::cull(...);
if (blendMode == Copy)
    BlendPolicyCopy::blend(...);
but I want this to happen at the object level, that is, I want to have a drawTriMesh() function that uses policies to render a triangle mesh, and "plug" the correct polices based on the current render state, like this (think of this as pseudo-code):

if (cullMode == CW)
    cullPolicy = CullPolicyCW;
if (blendMode == Copy)
    blendPolicy = BlendPolicyCW;
//...

Renderer<cullPolicy, blendPolicy>::drawTriMesh(...);
And now comes my problem - I don't know how to implement this in a good way. The above is "pseudo-code" because I can't have a variable that stores a type (I think). I could implement this with virtual functions and the strategy pattern, but the whole point is to avoid this for performance reasons. I had one "neat" idea, but because this post is long enough I will just list the source code for it, and answer any questions that come up. Note that this is just prototype code quickly throw together to test the idea:

#include <iostream>
#include <map>
#include <cassert>
using namespace std;

enum CullMode {
	None,
	CW,
	CCW,
};

enum BlendMode {
	Copy,
	ColorKey,
};

class Renderer {
public:
	virtual ~Renderer() {}

	virtual void render() = 0;
};

struct CullPolicyNone {
	static void cull() { cout << "None" << endl; }
};

struct CullPolicyCW {
	static void cull() { cout << "CW" << endl; }
};
struct CullPolicyCCW {
	static void cull() { cout << "CCW" << endl; }
};

struct BlendPolicyCopy {
	static void blend() { cout << "Copy" << endl; }
};

struct BlendPolicyColorKey {
	static void blend() { cout << "ColorKey" << endl; }
};

template <class C, class B>
class RendererImpl : public Renderer {
public:
	void render() {
		C::cull();
		B::blend();
	}
};

struct RenderState {
	explicit RenderState(CullMode c, BlendMode b) {
		cullMode = c;
		blendMode = b;
	}

	friend bool operator<(const RenderState &l, const RenderState &r) {
		int d = l.cullMode - r.cullMode;
		if (d == 0)
			return (l.blendMode - r.blendMode) < 0;
		return d < 0;
	}

	CullMode cullMode;
	BlendMode blendMode;
};

class Device {
public:
	Device():rs(CullMode::CW, BlendMode::Copy) {
		renderMap[RenderState(None, Copy)]		= new RendererImpl<CullPolicyNone,	BlendPolicyCopy>();
		renderMap[RenderState(CW,	Copy)]		= new RendererImpl<CullPolicyCW,	BlendPolicyCopy>();
		renderMap[RenderState(CCW,	Copy)]		= new RendererImpl<CullPolicyCCW,	BlendPolicyCopy>();
		renderMap[RenderState(None, ColorKey)]	        = new RendererImpl<CullPolicyNone,	BlendPolicyColorKey>();
		renderMap[RenderState(CW,	ColorKey)]	= new RendererImpl<CullPolicyCW,	BlendPolicyColorKey>();
		renderMap[RenderState(CCW,	ColorKey)]	= new RendererImpl<CullPolicyCCW,	BlendPolicyColorKey>();
	}

	~Device() {
		for (RenderMap::iterator it = renderMap.begin(); it != renderMap.end(); ++it)
			delete it->second;
	}

	void render() {
		Renderer *r = renderMap[rs];
		r->render();
	}

public:
	typedef map<RenderState, Renderer *> RenderMap;
	RenderMap renderMap;
	RenderState rs;
};

void testRenderer() {
	Device *r = new Device();

	r->render();
	r->rs.cullMode = None;
	r->rs.blendMode = ColorKey;
	r->render();

	delete r;
}

int main() {
	testRenderer();
}


Using this approach, I don't have to write many functions or use a huge block of if statements, but as you can see, the renderMap initialization code will quickly grow as more render states are added. I don't really know much about advanced template techniques so maybe I'm missing something obvious, and maybe there's another solution altogether (how did people who wrote software renderers in pure C handle this?). Any help is greatly appreciated.
Advertisement
Well, it's common for a software renderer to generate code on the fly for the different renderstates. For example, if you set a cull mode, and a blend mode, the software will generate the rendering code for that particular combination on the fly.

If you're going to have a lot of different states, this is probably the only way to do it. Even if templates reduce the amount of C++ code, it won't reduce the amount of ASM code. Combinatorial explosion will occur if you have a lot of states, and your executable will be massive.
NextWar: The Quest for Earth available now for Windows Phone 7.
Quote:
Original post by Sc4Freak
Well, it's common for a software renderer to generate code on the fly for the different renderstates. For example, if you set a cull mode, and a blend mode, the software will generate the rendering code for that particular combination on the fly.


Do you mean self-modifying code? I've heard of this but I don't know how it works and I wanted to see if there's a solution that doesn't involve complicated assembly (which I assume self-mod code does), but I'll look into it further.

Quote:Even if templates reduce the amount of C++ code, it won't reduce the amount of ASM code. Combinatorial explosion will occur if you have a lot of states, and your executable will be massive.


How "massive" can they be? I've haven't thought about this (like I said, I have never done anything interesting with templates), but if you mean something like 5MB, then I guess it's no big deal?

BTW, I've seen some people that use scripts (usually python scripts) that generate template code. This will also increase the executable, but maybe the script itself will be manageable code-wise. I'm not sure what I think about this approach, but I'd like to hear any comments you might have on it.

Thanks for your help so far.
Quote:Original post by Gage64
Quote:
Original post by Sc4Freak
Well, it's common for a software renderer to generate code on the fly for the different renderstates. For example, if you set a cull mode, and a blend mode, the software will generate the rendering code for that particular combination on the fly.


Do you mean self-modifying code? I've heard of this but I don't know how it works and I wanted to see if there's a solution that doesn't involve complicated assembly (which I assume self-mod code does), but I'll look into it further.

Quote:Even if templates reduce the amount of C++ code, it won't reduce the amount of ASM code. Combinatorial explosion will occur if you have a lot of states, and your executable will be massive.


How "massive" can they be? I've haven't thought about this (like I said, I have never done anything interesting with templates), but if you mean something like 5MB, then I guess it's no big deal?

BTW, I've seen some people that use scripts (usually python scripts) that generate template code. This will also increase the executable, but maybe the script itself will be manageable code-wise. I'm not sure what I think about this approach, but I'd like to hear any comments you might have on it.

Thanks for your help so far.

It depends on how complex you want your renderer to be. An advanced renderer would have, say, the capabilities of a DirectX7 graphics card. I'm sure that even DirectX7 allowed trillions upon trillions of different input possibilities. Even if you severely limit the different states your renderer could have, you'd still have at a minimum a few hundred thousand combinations. Which isn't pretty.
NextWar: The Quest for Earth available now for Windows Phone 7.
I don't have any really helpful advice on the core problem, but I will say this: If you want generated code to cover all the possibilities, using a scripting language to auto-generate the code is a MUCH better plan than doing it on the fly (self-modifying code).

Reasons: The auto-generated code will actually be debuggable when you're done. On modern systems, self-modifying code is practically impossible. Many CPUs/OSs are now designed so that generating code on the fly is hard or impossible. Generating the code as source earlier is pretty much guaranteed to work anywhere.
Clearly, any solution involving templates / traditional policy-based design involves making compile-time decisions regarding code inclusion.

If you want run-time policy decisions, I think virtual functions and the strategy pattern are your best choice. IMHO, the performance costs would be trivial if decently implemented.



Quote:
Original post by CactusPenguin
If you want run-time policy decisions, I think virtual functions and the strategy pattern are your best choice. IMHO, the performance costs would be trivial if decently implemented.


That would have to be one hell of a decent implementation. Think of a blending strategy, for example. That would involve calling a virtual function per-pixel, which will completely kill performance. That's exactly why I said that the strategy pattern isn't an option here. Unless you meant something else?
You don't need to overcomplicate things so much.
There maybe a trazilion possible rendering combinations, but that doesn't mean they all have to be in your code.
Do you know why they call it a rendering Pipeline? Because you do one step after the other.
So in every step of the pipeline you may have some options on how that step will manipulate your data.(for a software rasterizer the data would be a pixel.)
Some pixels may be discarded by clipping or culling, so there the processing stops.

I would suggest to design something like that (pipeline-ish).
And if you nicely separate each step, it will be maintainable.

By the way, there is no such thing as self modifying code in a software rasterizer, also there isn't any in the drivers of a hardware rasterizer.
You might find it in virusses and some wicked AI algorithms.

And about the speed of the thingy, everybody knows software rasterizers are slow.
Someone who uses a, euhm..., delta!?
Quote:
Original post by delta user
if you nicely separate each step, it will be maintainable.


Well that's just the thing - right now I don't know how to do this separation. Maybe after I implement some more functionality (right now I only have a wireframe renderer with backface culling) I'll gain some further insight.

Quote:there is no such thing as self modifying code in a software rasterizer


Why not? It sounds like it can be applied just fine.

Quote:And about the speed of the thingy, everybody knows software rasterizers are slow.


You don't really expect me to take this statement seriously, do you?
Quote:there is no such thing as self modifying code in a software rasterizer

er.. yes there is. maybe not actually self-*modifying* code, but code that builds/compiles asm codepaths on the fly, yes, definitely.
(and there definitely was real self-modifying code in old software rasterizers, maybe you can still find some today, although memory write/exectution protections would almost certainly mess things up for naive self-modifying code)

about generated asm codepaths, it isn't necessary, of course, but there are a few software rasterizers that do this. (for example, softwire)

OP> something you could do that wouldn't totally kill perfs or executable sizes might be to use a jump table (pretty much like you did up there), but only specialize the most common paths, or the blend modes the most likely to be used most of the time, and for all the other unlikely ones, use a default jump to a generic, 'if'-based codepath.

or generate rendering scripts on the fly, but then you'd probably be better off with 'if' statements from a performance point of view, I guess...

This topic is closed to new replies.

Advertisement