Optimizing C++ with code generators

This topic is 3597 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi. I'm writing a software renderer in C++ (for evil reasons, don't ask), and I have come to believe that I could use dynamic code generation to optimize some critical parts. For example, I suppose it can be used to avoid the need to check per pixel whether alpha blending is enabled, or z-writing or z-testing is enabled, or to check the pixel format of the render target, vertex component assembly, interpolating only when necessary... etc. But how would I go about writing/using a dynamic code generator? What do I need to know to make/use them? What are the typical advantages and disadvantages? Any pointers? Thanks in advance.

Instead of writing a code generator, you could just use the one that's already built into the C++ language (i.e. Templates).
The downside of this is that most of your code will have to become infused with templates, which can complicate the way in which you organise your code - and if done badly can greatly increase compilation times.

In particular, look up template specialization (both full and partial) for implementing different behaviours for the same class.

Here's a taste - moving the 'is alpha blending enabled' check from per-pixel to per-triangle:
// Primary template: alpha blending disabled, so just overwrite the destination.
template< bool alphaBlend >
struct PixelFiller
{
    void operator()( RGBA& out, const RGBA& in ) const
    {
        out = in;
    }
};
// Full specialization for alphaBlend == true: blend the source over the destination.
template<>
struct PixelFiller<true>
{
    void operator()( RGBA& out, const RGBA& in ) const
    {
        out = in*in.a + out*(1-in.a);
    }
};
...
template<class PixelWriteFunc>
struct TriangleFiller
{
    void operator()( const Triangle&, const PixelWriteFunc& write )
    {
        // pseudocode: visit every pixel covered by the triangle
        for each pixel
            write( backBufPixel, newPixel );
    }
};
...
// The state check now runs once per triangle and picks an instantiation.
if( alphaBlendingIsEnabled )
{
    PixelFiller<true> pixFunc;
    TriangleFiller< PixelFiller<true> > triFunc;
    triFunc( triangle, pixFunc );
}
else
{
    PixelFiller<false> pixFunc;
    TriangleFiller< PixelFiller<false> > triFunc;
    triFunc( triangle, pixFunc );
}





[Edited by - Hodgman on February 7, 2008 7:50:08 PM]

Thanks for the reply. I thought about that, but I guess I'm talking about a deeper level of optimization which is also more dynamic (and thus flexible). I think a similar (performance-wise) approach to what you described would be using function pointers, but they can only go so far. There are cases where function pointers and templates just don't cut it. For example, optimum code for fetching various vertex components from different places (this is of course assuming we're looking at using flexible vertex formats, kinda like in Direct3D) and determining which of these components needs interpolation... etc etc. Or maybe even determining the source of alpha when finally blending pixels (from texture, from vertex color, a constant.. etc). This, I think, can only be achieved by dynamic code generation.

Quote:
Original post by hikikomori-san
I think a similar (performance-wise) approach to what you described would be using function pointers, but they can only go so far.

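A call through a function pointer is resolved at run-time, so the compiler usually can't inline it. The template code I showed you is resolved at compile time, which lets the per-pixel functor be inlined into the loop... So the performance won't be similar.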
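To make the difference concrete, here is a minimal sketch of the same span fill written both ways. The names (`Span`, `averagePixel`, `AverageOp`, `fillSpanPtr`, `fillSpanTpl`) are made up for illustration and are not from the code above; how much the template version gains in practice depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Span = std::vector<std::uint8_t>;  // hypothetical single-channel span

// Hypothetical per-pixel operation: average destination and source.
inline std::uint8_t averagePixel(std::uint8_t dst, std::uint8_t src) {
    return static_cast<std::uint8_t>((dst + src) / 2);
}

using PixelOp = std::uint8_t (*)(std::uint8_t, std::uint8_t);

// Run-time selection: the callee is only known through the pointer, so unless
// the compiler can prove its value, each pixel goes through an indirect call.
void fillSpanPtr(Span& dst, const Span& src, PixelOp op) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] = op(dst[i], src[i]);
}

// The same operation as a functor type.
struct AverageOp {
    std::uint8_t operator()(std::uint8_t dst, std::uint8_t src) const {
        return static_cast<std::uint8_t>((dst + src) / 2);
    }
};

// Compile-time selection: Op is part of the function's type, so the call
// target is known when this loop is compiled and can be inlined into it.
template <class Op>
void fillSpanTpl(Span& dst, const Span& src, Op op = Op()) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] = op(dst[i], src[i]);
}
```

Both versions compute the same pixels; `fillSpanPtr(a, s, &averagePixel)` and `fillSpanTpl<AverageOp>(b, s)` differ only in when the operation is bound.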
Quote:
There are cases where function pointers and templates just don't cut it.
...
This, I think, can only be achieved by dynamic code generation.

Templates *are* dynamic code generation!
Quote:
optimum code for fetching various vertex components from different places and determining which of these components needs interpolation... etc etc. Or maybe even determining the source of alpha when finally blending pixels (from texture, from vertex color, a constant.. etc).
All of those challenges are well suited to C++ template specialization.
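As a rough sketch of what that could look like for the alpha-source case, assuming a hypothetical `BlendInputs` struct and `AlphaSource` enum (neither is from the thread), each source gets its own specialization and the choice is made once per span rather than once per pixel:

```cpp
#include <cassert>

struct Pixel { float r, g, b, a; };

// Hypothetical inputs available to the blend stage (names are illustrative).
struct BlendInputs {
    Pixel texel;          // colour sampled from the texture
    Pixel vertexColor;    // interpolated vertex colour
    float constantAlpha;  // render-state constant
};

enum class AlphaSource { Texture, VertexColor, Constant };

// Primary template is left undefined; each specialization reads one source.
template <AlphaSource S> struct AlphaFetch;
template <> struct AlphaFetch<AlphaSource::Texture> {
    static float get(const BlendInputs& in) { return in.texel.a; }
};
template <> struct AlphaFetch<AlphaSource::VertexColor> {
    static float get(const BlendInputs& in) { return in.vertexColor.a; }
};
template <> struct AlphaFetch<AlphaSource::Constant> {
    static float get(const BlendInputs& in) { return in.constantAlpha; }
};

// Inner loop: the alpha source is fixed at compile time, so no test per pixel.
// Only the colour channels are blended here; destination alpha is untouched.
template <AlphaSource S>
void blendSpan(Pixel* dst, const BlendInputs* in, int count) {
    assert(count == 0 || (dst && in));
    for (int i = 0; i < count; ++i) {
        const float a = AlphaFetch<S>::get(in[i]);
        dst[i].r = in[i].texel.r * a + dst[i].r * (1.0f - a);
        dst[i].g = in[i].texel.g * a + dst[i].g * (1.0f - a);
        dst[i].b = in[i].texel.b * a + dst[i].b * (1.0f - a);
    }
}

// The run-time choice happens once per span and picks an instantiation.
void blendSpanDispatch(AlphaSource s, Pixel* dst, const BlendInputs* in, int count) {
    switch (s) {
        case AlphaSource::Texture:     blendSpan<AlphaSource::Texture>(dst, in, count); break;
        case AlphaSource::VertexColor: blendSpan<AlphaSource::VertexColor>(dst, in, count); break;
        case AlphaSource::Constant:    blendSpan<AlphaSource::Constant>(dst, in, count); break;
    }
}
```

The cost is one instantiation of the loop per source, which is where the compile-time and code-size growth mentioned earlier comes from.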


I'm only pointing this out because this functionality is already available in the language. So you can either become proficient in the existing industry standard (teaching you an industry-valued skill-set, making you more employable), or become proficient in your own method (that no employer finds useful).

Quote:
For example, optimum code for fetching various vertex components from different places


I'm sure the CS research community would be interested in the optimality proofs you're using to determine that.

It is quite unlikely that you'll be able to outperform MSVC template code optimization for modern processor architectures. Writing an optimizing compiler that takes into consideration pipelining and cache characteristics is a monumental task.

How familiar are you with assembly and processor architectures at all?

I already know how and what it means to use templates. I'm not talking about industry standards or the industrial appeal here, I'm concerned about performance and performance only. As for the code you posted, it does indeed work well for this example (alpha blending), and it is not the same as function pointers (which I failed to notice the first time around). As for templates being dynamic code generation, I thought it was already clear by context that by dynamic I meant run-time dynamic, so I guess this is my mistake as well.

Actually the reason I'm looking into dynamic code generation is because I'm planning on allowing different shaders (vertex, pixel, and even geometry shaders) to have different input and output semantics, and the most efficient way to do the pipelining between them, I think, is this method.

[EDIT]: @Antheus: Actually I'm not familiar with assembly (not to a useful degree at least). That's actually the point. I just wanted to ask what it would take. Apparently I'd better stick with more conventional methods.

Quote:
Original post by hikikomori-san
Actually the reason I'm looking into dynamic code generation is because I'm planning on allowing different shaders (vertex, pixel, and even geometry shaders) to have different input and output semantics, and the most efficient way to do the pipelining between them, I think, is this method.


There are many attempts at *shader code* generation. Whole different concept. You have several options there, and the code is somewhat simpler. That is obviously viable.

But it's in direct conflict with: "I'm writing a software renderer in C++".

Quote:
But humor me here - let's just say that I want to learn about dynamic code generation for no purpose. Where should I start looking?


Write a compiler.
If you want to look into various optimization techniques, BCEL is an interesting choice for Java bytecode generation.
Graph theory comes in handy. Many optimization techniques are based on it.
Read through all the processor manufacturer manuals. They describe common techniques.
Get acquainted with Herb Sutter's work and publications. Invaluable information there. Well presented for non-insiders as well.

Quote:
Actually I'm not familiar with assembly (not to a useful degree at least)


I would consider this to be a considerable road-block. This would make it the first step then - get proficient in assembly programming.

Quote:
Original post by hikikomori-san
For example, optimum code for fetching various vertex components from different places (this is of course assuming we're looking at using flexible vertex formats, kinda like in Direct3D) and determining which of these components needs interpolation... etc etc. Or maybe even determining the source of alpha when finally blending pixels (from texture, from vertex color, a constant.. etc). This, I think, can only be achieved by dynamic code generation.


I see fetching vertex components as a dependent read. You can hide the latency of the first read by scheduling it sufficiently early, so I don't think it's worth the trouble just for that.

If I was going to do this, as a first attempt I would write a rasterizer that tests every render state per pixel. Then at runtime I would alter the branches that correspond to the tests to either explicitly jump or not jump based on the render state, resulting in a conditional-branch-free path through your inner loop. However you're probably only doing a bit better than a good branch predictor with this.
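Patching jumps at runtime isn't something that can be shown portably, but a compile-time approximation of the same idea can be sketched: write the inner loop once with every render-state test in it, make the tests template parameters so the compiler drops the dead branches, and pick an instantiation once per draw call. This is a swapped-in technique, not the branch patching described above, and `RenderState`, `Fragment`, `writeSpan`, and `selectSpanFn` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct RenderState { bool zTest; bool zWrite; };  // hypothetical subset of the state
struct Fragment { float depth; unsigned color; };

// Inner loop written once with every test in place. ZTest and ZWrite are
// compile-time constants, so each instantiation keeps only the branches it needs.
template <bool ZTest, bool ZWrite>
void writeSpan(unsigned* color, float* depth, const Fragment* frags, std::size_t n) {
    assert(n == 0 || (color && depth && frags));
    for (std::size_t i = 0; i < n; ++i) {
        if (ZTest && !(frags[i].depth < depth[i])) continue;  // depth test failed
        if (ZWrite) depth[i] = frags[i].depth;
        color[i] = frags[i].color;
    }
}

using SpanFn = void (*)(unsigned*, float*, const Fragment*, std::size_t);

// One table lookup per draw call replaces the per-pixel state tests.
// Index layout: bit 1 = zTest, bit 0 = zWrite.
inline SpanFn selectSpanFn(const RenderState& rs) {
    static const SpanFn table[4] = {
        &writeSpan<false, false>, &writeSpan<false, true>,
        &writeSpan<true, false>,  &writeSpan<true, true>,
    };
    return table[(rs.zTest ? 2 : 0) + (rs.zWrite ? 1 : 0)];
}
```

The table doubles in size with every boolean state added, so this only stays practical for a handful of flags; beyond that you are back to stitching or generating the loop at runtime.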

Off the top of my head I don't think i-caches follow predicted or unconditional branches, so it isn't the same as having a straight line inner loop that corresponds to your render states, but to get that you either need to stitch together the precompiled sequences or generate the appropriate inner loop at runtime from some higher level representation, which is a whole 'nother story.

If you're going to stitch together sequences then you have to have some control over register usage to make sure data is passed along correctly through any permutation of your inner loop, so it's not exactly trivial.

If you're willing to ship a C++ compiler with your application, you can look at how something like TaskGraph generates code at runtime. Otherwise, you've got to learn assembly or do some really, really evil, nonportable runtime code hacks that probably won't work on modern processors with Data Execution Prevention (DEP) enabled.

Shader code generation? That's not what I meant. The shaders are to be written by the user (the client application using the SW renderer). What I want is not to end up executing something like this for every vertex:


if( activeVS->ReadsNormals() )
{
    if( vertexFormat->HasNormals() )
    {
        VSInputRegisters.normal = ...//inspect vertex format to know where to get normal and copy it from there.
        // Insert normal interpolation stuff here.
    }
    else
        VSInputRegisters.normal = defaultNormalValue;
}
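For comparison, one way to get those tests out of the per-vertex path without generating any code is to resolve them once per draw call into a small table of copies. This is only a sketch of an alternative, not something proposed in the thread: `VertexFormat`, `FetchOp`, `FetchPlan`, `buildFetchPlan`, and `fetchVertex` are hypothetical, every component is assumed to be three floats, and a component the shader does not read is simply left at its default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the vertex format and register block above.
struct Vec3 { float x, y, z; };
struct VSInputRegisters { Vec3 position; Vec3 normal; };

struct VertexFormat {
    std::size_t stride;          // bytes between consecutive vertices
    bool hasNormals;
    std::size_t positionOffset;  // byte offset inside one vertex
    std::size_t normalOffset;    // only meaningful when hasNormals is true
};

// One precomputed copy of sizeof(Vec3) bytes from the vertex to the registers.
struct FetchOp { std::size_t srcOffset; std::size_t dstOffset; };

struct FetchPlan {
    std::size_t stride;
    std::vector<FetchOp> ops;
    VSInputRegisters defaults;  // values for components that are not fetched
};

// Runs once per draw call: all format and shader tests happen here.
FetchPlan buildFetchPlan(const VertexFormat& fmt, bool shaderReadsNormals, Vec3 defaultNormal) {
    FetchPlan plan;
    plan.stride = fmt.stride;
    plan.defaults = VSInputRegisters{};
    plan.defaults.normal = defaultNormal;
    plan.ops.push_back({fmt.positionOffset, offsetof(VSInputRegisters, position)});
    if (shaderReadsNormals && fmt.hasNormals)
        plan.ops.push_back({fmt.normalOffset, offsetof(VSInputRegisters, normal)});
    return plan;
}

// Runs per vertex: no format tests, only the copies listed in the plan.
VSInputRegisters fetchVertex(const FetchPlan& plan, const unsigned char* vertices, std::size_t index) {
    assert(vertices != nullptr);
    VSInputRegisters regs = plan.defaults;
    const unsigned char* v = vertices + index * plan.stride;
    for (const FetchOp& op : plan.ops)
        std::memcpy(reinterpret_cast<unsigned char*>(&regs) + op.dstOffset,
                    v + op.srcOffset, sizeof(Vec3));
    return regs;
}
```

This trades the nested branches for a short data-driven loop, so it is not the straight-line code a real generator would emit, but it needs no assembly and no executable memory.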




Well hopefully you get the idea. I was aiming for something like outRider talked about.. stitching code together or modifying jmp instructions. But obviously this is not as trivial as I'd hoped it would be. Thanks very much, and look forward to the first screenshot of my super cool advanced next-gen cross-platform vista-ready multi-threaded assembly-optimized next-gen (again) SW renderer that will make you all abandon D3D and OpenGL for good!

[EDIT]: Actually I was planning on using TCC since I don't know assembly, and maybe I will and see if that works out, even though I know I have to be careful since it's not really much of an _optimizing_ compiler.

