Sign in to follow this  
Gage64

Choosing template policies at run-time?

Recommended Posts

I apologize in advance for the long post [smile]. I'm starting work on a software renderer. So far I have only implemented wireframe rendering, but I'm starting to plan ahead a little bit, and I'm thinking about how I should handle render states. For now let's only consider backface culling and blending. Let's say that the options for backface culling are: None, CW, CCW, and for blending: Copy, ColorKey (maybe blending is not the right term here but that's beside the point). The obvious way to handle this is to simply use switch statements, but that would mean using a conditional for every triangle (for backface culling) and for every pixel (for blending), which will incur a significant performance overhead (especially as more render states are added). For the same reason, I don't want to use function pointers or virtual functions (something like the strategy pattern). I want any such conditional code to be done at the object level. I own Tricks of the 3D Game Programming Gurus and there it is (partially) accomplished by writing dozens of different functions that look almost identical (to avoid the conditionals), and using a huge block of if statements to decide which function to call. Needless to say, this isn't pretty, and as the number of render states and rendering options increases, this becomes completely unmanageable. This article uses a policy-based design to solve this problem, and I thought it would be perfect for this, but I then realized that I have no idea how I would choose the policy at run-time. Basically what I want to do is this:
if (cullMode == CW)
    CullPolicyCW::cull(...);
if (blendMode == Copy)
    BlendPolicyCopy::blend(...);
but I want this to happen at the object level, that is, I want to have a drawTriMesh() function that uses policies to render a triangle mesh, and "plug" the correct polices based on the current render state, like this (think of this as pseudo-code):
if (cullMode == CW)
    cullPolicy = CullPolicyCW;
if (blendMode == Copy)
    blendPolicy = BlendPolicyCW;
//...

Renderer<cullPolicy, blendPolicy>::drawTriMesh(...);
And now comes my problem - I don't know how to implement this in a good way. The above is "pseudo-code" because I can't have a variable that stores a type (I think). I could implement this with virtual functions and the strategy pattern, but the whole point is to avoid this for performance reasons. I had one "neat" idea, but because this post is long enough I will just list the source code for it, and answer any questions that come up. Note that this is just prototype code quickly throw together to test the idea:
#include <iostream>
#include <map>
#include <cassert>
using namespace std;

enum CullMode {
	None,
	CW,
	CCW,
};

enum BlendMode {
	Copy,
	ColorKey,
};

class Renderer {
public:
	virtual ~Renderer() {}

	virtual void render() = 0;
};

struct CullPolicyNone {
	static void cull() { cout << "None" << endl; }
};

struct CullPolicyCW {
	static void cull() { cout << "CW" << endl; }
};
struct CullPolicyCCW {
	static void cull() { cout << "CCW" << endl; }
};

struct BlendPolicyCopy {
	static void blend() { cout << "Copy" << endl; }
};

struct BlendPolicyColorKey {
	static void blend() { cout << "ColorKey" << endl; }
};

template <class C, class B>
class RendererImpl : public Renderer {
public:
	void render() {
		C::cull();
		B::blend();
	}
};

struct RenderState {
	explicit RenderState(CullMode c, BlendMode b) {
		cullMode = c;
		blendMode = b;
	}

	friend bool operator<(const RenderState &l, const RenderState &r) {
		int d = l.cullMode - r.cullMode;
		if (d == 0)
			return (l.blendMode - r.blendMode) < 0;
		return d < 0;
	}

	CullMode cullMode;
	BlendMode blendMode;
};

class Device {
public:
	Device():rs(CullMode::CW, BlendMode::Copy) {
		renderMap[RenderState(None, Copy)]		= new RendererImpl<CullPolicyNone,	BlendPolicyCopy>();
		renderMap[RenderState(CW,	Copy)]		= new RendererImpl<CullPolicyCW,	BlendPolicyCopy>();
		renderMap[RenderState(CCW,	Copy)]		= new RendererImpl<CullPolicyCCW,	BlendPolicyCopy>();
		renderMap[RenderState(None, ColorKey)]	        = new RendererImpl<CullPolicyNone,	BlendPolicyColorKey>();
		renderMap[RenderState(CW,	ColorKey)]	= new RendererImpl<CullPolicyCW,	BlendPolicyColorKey>();
		renderMap[RenderState(CCW,	ColorKey)]	= new RendererImpl<CullPolicyCCW,	BlendPolicyColorKey>();
	}

	~Device() {
		for (RenderMap::iterator it = renderMap.begin(); it != renderMap.end(); ++it)
			delete it->second;
	}

	void render() {
		Renderer *r = renderMap[rs];
		r->render();
	}

public:
	typedef map<RenderState, Renderer *> RenderMap;
	RenderMap renderMap;
	RenderState rs;
};

void testRenderer() {
	Device *r = new Device();

	r->render();
	r->rs.cullMode = None;
	r->rs.blendMode = ColorKey;
	r->render();

	delete r;
}

int main() {
	testRenderer();
}


Using this approach, I don't have to write many functions or use a huge block of if statements, but as you can see, the renderMap initialization code will quickly grow as more render states are added. I don't really know much about advanced template techniques so maybe I'm missing something obvious, and maybe there's another solution altogether (how did people who wrote software renderers in pure C handle this?). Any help is greatly appreciated.

Share this post


Link to post
Share on other sites
Well, it's common for a software renderer to generate code on the fly for the different renderstates. For example, if you set a cull mode, and a blend mode, the software will generate the rendering code for that particular combination on the fly.

If you're going to have a lot of different states, this is probably the only way to do it. Even if templates reduce the amount of C++ code, it won't reduce the amount of ASM code. Combinatorial explosion will occur if you have a lot of states, and your executable will be massive.

Share this post


Link to post
Share on other sites
Quote:

Original post by Sc4Freak
Well, it's common for a software renderer to generate code on the fly for the different renderstates. For example, if you set a cull mode, and a blend mode, the software will generate the rendering code for that particular combination on the fly.


Do you mean self-modifying code? I've heard of this but I don't know how it works and I wanted to see if there's a solution that doesn't involve complicated assembly (which I assume self-mod code does), but I'll look into it further.

Quote:
Even if templates reduce the amount of C++ code, it won't reduce the amount of ASM code. Combinatorial explosion will occur if you have a lot of states, and your executable will be massive.


How "massive" can they be? I've haven't thought about this (like I said, I have never done anything interesting with templates), but if you mean something like 5MB, then I guess it's no big deal?

BTW, I've seen some people that use scripts (usually python scripts) that generate template code. This will also increase the executable, but maybe the script itself will be manageable code-wise. I'm not sure what I think about this approach, but I'd like to hear any comments you might have on it.

Thanks for your help so far.

Share this post


Link to post
Share on other sites
Quote:
Original post by Gage64
Quote:

Original post by Sc4Freak
Well, it's common for a software renderer to generate code on the fly for the different renderstates. For example, if you set a cull mode, and a blend mode, the software will generate the rendering code for that particular combination on the fly.


Do you mean self-modifying code? I've heard of this but I don't know how it works and I wanted to see if there's a solution that doesn't involve complicated assembly (which I assume self-mod code does), but I'll look into it further.

Quote:
Even if templates reduce the amount of C++ code, it won't reduce the amount of ASM code. Combinatorial explosion will occur if you have a lot of states, and your executable will be massive.


How "massive" can they be? I've haven't thought about this (like I said, I have never done anything interesting with templates), but if you mean something like 5MB, then I guess it's no big deal?

BTW, I've seen some people that use scripts (usually python scripts) that generate template code. This will also increase the executable, but maybe the script itself will be manageable code-wise. I'm not sure what I think about this approach, but I'd like to hear any comments you might have on it.

Thanks for your help so far.

It depends on how complex you want your renderer to be. An advanced renderer would have, say, the capabilities of a DirectX7 graphics card. I'm sure that even DirectX7 allowed trillions upon trillions of different input possibilities. Even if you severely limit the different states your renderer could have, you'd still have at a minimum a few hundred thousand combinations. Which isn't pretty.

Share this post


Link to post
Share on other sites
I don't have any really helpful advice on the core problem, but I will say this: If you want generated code to cover all the possibilities, using a scripting language to auto-generate the code is a MUCH better plan than doing it on the fly (self-modifying code).

Reasons: The auto-generated code will actually be debuggable when you're done. On modern systems, self-modifying code is practically impossible. Many CPUs/OSs are now designed so that generating code on the fly is hard or impossible. Generating the code as source earlier is pretty much guaranteed to work anywhere.

Share this post


Link to post
Share on other sites
Clearly, any solution involving templates / traditional policy-based design involves making compile-time decisions regarding code inclusion.

If you want run-time policy decisions, I think virtual functions and the strategy pattern are your best choice. IMHO, the performance costs would be trivial if decently implemented.



Share this post


Link to post
Share on other sites
Quote:

Original post by CactusPenguin
If you want run-time policy decisions, I think virtual functions and the strategy pattern are your best choice. IMHO, the performance costs would be trivial if decently implemented.


That would have to be one hell of a decent implementation. Think of a blending strategy, for example. That would involve calling a virtual function per-pixel, which will completely kill performance. That's exactly why I said that the strategy pattern isn't an option here. Unless you meant something else?

Share this post


Link to post
Share on other sites
You don't need to overcomplicate things so much.
There maybe a trazilion possible rendering combinations, but that doesn't mean they all have to be in your code.
Do you know why they call it a rendering Pipeline? Because you do one step after the other.
So in every step of the pipeline you may have some options on how that step will manipulate your data.(for a software rasterizer the data would be a pixel.)
Some pixels may be discarded by clipping or culling, so there the processing stops.

I would suggest to design something like that (pipeline-ish).
And if you nicely separate each step, it will be maintainable.

By the way, there is no such thing as self modifying code in a software rasterizer, also there isn't any in the drivers of a hardware rasterizer.
You might find it in virusses and some wicked AI algorithms.

And about the speed of the thingy, everybody knows software rasterizers are slow.

Share this post


Link to post
Share on other sites
Quote:

Original post by delta user
if you nicely separate each step, it will be maintainable.


Well that's just the thing - right now I don't know how to do this separation. Maybe after I implement some more functionality (right now I only have a wireframe renderer with backface culling) I'll gain some further insight.

Quote:
there is no such thing as self modifying code in a software rasterizer


Why not? It sounds like it can be applied just fine.

Quote:
And about the speed of the thingy, everybody knows software rasterizers are slow.


You don't really expect me to take this statement seriously, do you?

Share this post


Link to post
Share on other sites
Quote:
there is no such thing as self modifying code in a software rasterizer

er.. yes there is. maybe not actually self-*modifying* code, but code that builds/compiles asm codepaths on the fly, yes, definitely.
(and there definitely was real self-modifying code in old software rasterizers, maybe you can still find some today, although memory write/exectution protections would almost certainly mess things up for naive self-modifying code)

about generated asm codepaths, it isn't necessary, of course, but there are a few software rasterizers that do this. (for example, softwire)

OP> something you could do that wouldn't totally kill perfs or executable sizes might be to use a jump table (pretty much like you did up there), but only specialize the most common paths, or the blend modes the most likely to be used most of the time, and for all the other unlikely ones, use a default jump to a generic, 'if'-based codepath.

or generate rendering scripts on the fly, but then you'd probably be better off with 'if' statements from a performance point of view, I guess...

Share this post


Link to post
Share on other sites
Quote:
Original post by Gage64
Quote:

Original post by delta user
if you nicely separate each step, it will be maintainable.


Well that's just the thing - right now I don't know how to do this separation. Maybe after I implement some more functionality (right now I only have a wireframe renderer with backface culling) I'll gain some further insight.

Quote:
there is no such thing as self modifying code in a software rasterizer


Why not? It sounds like it can be applied just fine.

Quote:
And about the speed of the thingy, everybody knows software rasterizers are slow.


You don't really expect me to take this statement seriously, do you?


Okay, self modifying code:

Yes, it could be made in such a way. But did you read the whole article? If you did, you would have noticed that your rasterizer won't work very good on many OSes. Most modern OSes are in some way protected against self modifying code because of the dangers of it.


The speed:

What I meant was, 'not as fast as hardware accelerated rasterizers'.
Someone can make a software rasterizer 'relatively fast' by optimising every bit of it.

Now maybe some help:(depends on whether you find it helpful)

I never made one myself, but I will give some ways I would make it.

All the steps in your pipeline are classes, which can be configured by the hints you give it. (ex.: backface culling or front face culling).

*managed approach:
Create a manager class which performs all the steps of the pipeline one by one and each time giving the output to the next step.

*chain approach:
Link the steps to the next steps and make them call that next step when it is done with it's own job.

This shouldn't be that hard if you make a base class for the steps, and then inherit from it, implementing the right functionality.


Gotta go, I hope it was a bit helpful.

Share this post


Link to post
Share on other sites
Quote:

Original post by momotte
something you could do that wouldn't totally kill perfs or executable sizes might be to use a jump table (pretty much like you did up there), but only specialize the most common paths, or the blend modes the most likely to be used most of the time, and for all the other unlikely ones, use a default jump to a generic, 'if'-based codepath.


That sounds like a good idea. Thanks for suggesting it.

Quote:
or generate rendering scripts on the fly, but then you'd probably be better off with 'if' statements from a performance point of view, I guess...


I'm not sure what you mean. Could you please explain this a bit (esp. the note on performance).

Quote:

Original post by delta user
*managed approach:
Create a manager class which performs all the steps of the pipeline one by one and each time giving the output to the next step.

*chain approach:
Link the steps to the next steps and make them call that next step when it is done with it's own job.

This shouldn't be that hard if you make a base class for the steps, and then inherit from it, implementing the right functionality.


Both those approaches sound the same to me...

I thought about something like this (if I understand you correctly), but that would mean I have to use multiple loops instead of one, which will affect memory coherency (I'm basing the renderer on this article, although I'm not really familiar with topics like prediction, cache coherency, pipeline stalls, cache misses, etc. and other topics that are critical to getting good performance on a modern CPU, but those are the things that motivate the architecture in that article, so I want to try and follow along).

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this