Fast triangle rasterizer

I think it would be possible to do something like this:

DWORD* addr_v0 = &frameBuffer[y0 * nPitch + x0];
DWORD* addr_v1 = &frameBuffer[y1 * nPitch + x1];
DWORD* addr_v2 = &frameBuffer[y2 * nPitch + x2];

HUGE_FLOAT addr_ch0 = (addr_v1 - addr_v0) / (y1 - y0);
HUGE_FLOAT addr_ch1 = (addr_v2 - addr_v0) / (y2 - y0);
HUGE_FLOAT addr_ch2 = (addr_v2 - addr_v1) / (y2 - y1);

// we start at the first vertex (smallest y value)
HUGE_FLOAT hLine_start = addr_v0;
HUGE_FLOAT hLine_end = addr_v0;

while(hLine_start <= addr_v1)
{
    DWORD* pixel = hLine_start;

    // rasterize the hLine
    while(pixel <= hLine_end)
    {
        *pixel = color; // set this pixel to color of triangle
        pixel++; // get to the next pixel
    }

    hLine_start += addr_ch0; // trace edge 0
    hLine_end += addr_ch1; // trace edge 1
}

Notice how incredibly fast this code would be... Also notice how incredibly nonexistent the HUGE_FLOAT is.

As you should know, the HUGE_FLOAT needs to be 32 bits for the pointer part PLUS at least ??? bits for the fraction of a pointer part... What can I do?

Assume near pointer in the video memory and go with a float? I don't think so...
Use 64 bit fixed point math? Does a 64 bit data type exist? I don't know, please answer that if you do...
Use a double? Is it big enough?...

I'd rather go with the 64 bit fixed point strategy and save some time in the conversion float->int (address), 'cause I've heard it's quite time consuming...
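For what it's worth, a 64-bit integer type does exist: std::int64_t from <cstdint> in current C++. Below is a minimal sketch of the 64-bit fixed-point variant of the idea, not a finished rasterizer. It assumes 32.32 fixed point, frame-buffer offsets instead of raw pointers, and a flat-bottom triangle; the names Fixed, kFracBits and fillFlatBottom are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Fixed = std::int64_t;                  // 32.32 fixed point (assumed layout)
constexpr int kFracBits = 32;
constexpr Fixed kOne = Fixed(1) << kFracBits;

// Fills a flat-bottom triangle: apex (x0, y0), bottom edge from (xl, y1) to (xr, y1).
// Both edges are traced by stepping a fractional frame-buffer offset per scanline.
void fillFlatBottom(std::vector<std::uint32_t>& fb, int pitch,
                    int x0, int y0, int xl, int xr, int y1,
                    std::uint32_t color)
{
    assert(y1 > y0 && xl <= xr);

    const Fixed apex  = Fixed(y0) * pitch + x0;   // offset of the top vertex
    const Fixed left  = Fixed(y1) * pitch + xl;   // offset of the bottom-left vertex
    const Fixed right = Fixed(y1) * pitch + xr;   // offset of the bottom-right vertex

    // Offset change per scanline: one pitch plus the edge slope, in 32.32.
    const Fixed stepL = (left  - apex) * kOne / (y1 - y0);
    const Fixed stepR = (right - apex) * kOne / (y1 - y0);

    // Start half a pixel in so that the shift below rounds x to nearest.
    Fixed start = apex * kOne + kOne / 2;
    Fixed end   = start;

    for (int y = y0; y <= y1; ++y) {
        const Fixed last = end >> kFracBits;      // integer part of the span end
        for (Fixed i = start >> kFracBits; i <= last; ++i)
            fb[static_cast<std::size_t>(i)] = color;
        start += stepL;                           // trace the left edge
        end   += stepR;                           // trace the right edge
    }
}
```

The trick only holds while every span stays inside its own row, because the integer part of the offset silently carries the row change. Clipping and the second half of a general triangle are left out.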

Here is my inner loop of my software rasterizing engine...

Note: This does texture mapping as well, so this does a bit more interpolation than your method:


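for (y = y1; y < y2; y++)
{
    // Have to modify it to check this out earlier, but for now it's fine.
    if (y >= 0 && y < 200) // Check to see if it's on screen..
    {
        start = (mx1 >> 7) + x1; // accumulators carry 7 fractional bits
        fin = (mx2 >> 7) + x1;
        if (fin < start) // keep the span running left to right
            hline(off, fin, start, (mc2 >> 7) + tx1, (mc1 >> 7) + tx1, (yc2 >> 7) + ty1, (yc1 >> 7) + ty1, z);
        else
            hline(off, start, fin, (mc1 >> 7) + tx1, (mc2 >> 7) + tx1, (yc1 >> 7) + ty1, (yc2 >> 7) + ty1, z);
    }
    else if (y == 0) off = 0; // never taken as written: y == 0 already passes the test above

    // Step both edges and the interpolated texture coordinates to the next scanline.
    mx1 += ma1;
    mx2 += ma2;
    mc1 += mac1;
    mc2 += mac2;
    yc1 += yac1;
    yc2 += yac2;
    off += 320; // Stores the offset of the screen address (320x200 as of now)
}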



--- Edit ---
Formatting of code came out horrible with a copy/paste! Anyways, this uses fixed point math and was originally written on a p200 machine (also used on 486 without a math co-processor).

Also wanted to mention, this could be optimized further, and it was written many years ago by myself.
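The setup of the per-scanline increments (ma1, ma2 and friends) is not shown above. As a rough sketch of how such a step could be derived with 7 fractional bits, matching the >>7 in the loop: edgeStep and walkEdge are hypothetical helpers, not part of the original engine.

```cpp
#include <cassert>
#include <vector>

constexpr int kShift = 7;  // fractional bits, as in the >>7 of the loop above

// Hypothetical helper: x increment per scanline, in 1/128ths of a pixel.
int edgeStep(int xTop, int yTop, int xBottom, int yBottom)
{
    assert(yBottom > yTop);
    return ((xBottom - xTop) * (1 << kShift)) / (yBottom - yTop);
}

// Returns the integer x of the edge on each scanline from yTop up to,
// but not including, yBottom.
std::vector<int> walkEdge(int xTop, int yTop, int xBottom, int yBottom)
{
    const int step = edgeStep(xTop, yTop, xBottom, yBottom);
    std::vector<int> xs;
    int acc = 0;  // plays the role of mx1: offset from xTop in fixed point
    for (int y = yTop; y < yBottom; ++y) {
        xs.push_back((acc >> kShift) + xTop);  // drop the fraction, add the base
        acc += step;                           // one integer add per scanline
    }
    return xs;
}
```

For an edge from (10, 0) to (20, 4) the step is 320 (2.5 pixels per line) and walkEdge yields 10, 12, 15, 17. Seven fractional bits keep everything inside 32-bit integers at 320x200, at the cost of a visible rounding error on long, shallow edges.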

[edited by - Ready4Dis on November 8, 2003 11:07:08 AM]

You are attempting to optimize prematurely... in other words, you're wasting your time! First finish your application, locate the real bottlenecks and optimize those.

quote:
Original post by C0D1F1ED
You are attempting to optimize prematurely...


How could you possibly know that ???
My application IS finished... (lying?)
Besides I think a triangle rasterizer is too elementary to be "optimized prematurely", don't you?
I mean all it is supposed to do is rasterize a triangle, it's the end of the friggin' pipeline

[edited by - Lejno on November 9, 2003 3:10:02 PM]

quote:
Original post by lejno
How could you possibly know that ???
My application IS finished... (lying?)
Besides I think a triangle rasterizer is too elementary to be "optimized prematurely", don't you?
I mean all it is supposed to do is rasterize a triangle, it's the end of the friggin' pipeline

Your application might be finished for now, but are you sure you will never reuse this code for a new project which goes further than flat color filling? Personally I've lost a lot of time thinking my application was 'finished' and optimizing it over many long nights. But then when I wanted a new feature I had to either start all over again or roll back to a version that was not optimized. Are you sure you'll never need Gouraud shading or textures? A rasterizer is not elementary at all, especially if you need multiple textures, mipmapping, diffuse and specular lighting. It might be the end of your friggin' pipeline today, but tomorrow you might want new features that are not compatible with your optimization.

Besides, like I already said, you have dozens of clock cycles per pixel. So I don't think a tiny optimization like this would really make a difference. Plus, you can't be sure that this is actually an optimization and not slowing things down, because modern CPUs behave in a very complex manner. It's quite probable that the mix of floating-point and integer calculations is not beneficial. And the code you think you save with this might actually come for free in assembly thanks to complex addressing modes.

Anyway, if you really still want to try it, double precision floating-point numbers should suffice...

C0D1F1ED, I see, thanks for the tip...

But as a flat shading rasterizer it really SEEMS to be "as fast as it can get", granted I'm no Pentium guru...

If I wanted a rasterizer that did Gouraud shading then I'd just write another one...

Any tips on optimizations that take into account the complex ways of today's processors?

Is a double float 64 bits on all computers?

Why is float to int conversion so time consuming?

[edited by - Lejno on November 10, 2003 4:10:42 PM]

quote:
Original post by C0D1F1ED
You are attempting to optimize prematurely... in other words, you're wasting your time! First finish your application, locate the real bottlenecks and optimize those.


OMFG man, these people complaining about premature optimization all the time. I'm glad that just good enough is good for you guys, but if he's going for all out speed, and the function is 8 lines long right now... how is optimizing it, KNOWING it's the main loop in the entire rendering engine, wasting your time? So you want to add new features, great, add them, doesn't mean you wasted your time making this specific section faster. I've spent many days optimizing my software graphics engine's inner loops, and I've gotten some great gains by doing so. Something like this is NOT premature, because if this section isn't as fast as it can be, and it gets called to draw everything in the entire game, it will be a bottleneck, so what's the point of writing it inefficiently and then rewriting it again? Just do it right the first time. Yes, there are times when you shouldn't worry about optimizing, but if you know it's going to have to be done sooner or later, why not just write it once as efficiently as you can, and be done with it, rather than rewrite it multiple times because you didn't want to optimize when you started.

float->int conversion tends to call a helper function on the PC that futzes with the rounding mode to adhere to the C standard.

Using floats for your address arithmetic is interesting, if a bit flawed. You'll probably get shafted on the float format. A 64-bit float still only has a 53-bit mantissa (52 bits stored plus an implicit leading bit).
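To put numbers on that: with 53 bits of precision, a 32-bit address leaves about 21 bits for the fraction. A quick check, assuming IEEE 754 doubles (which the static_asserts verify); survivesDouble is just an illustrative name.

```cpp
#include <cstdint>
#include <limits>

// Assumption: the platform uses IEEE 754 binary64 for double.
static_assert(std::numeric_limits<double>::is_iec559, "expects IEEE 754 doubles");
static_assert(std::numeric_limits<double>::digits == 53, "expects 53 bits of precision");

// True if the integer can be stored in a double and read back unchanged.
bool survivesDouble(std::uint64_t v)
{
    return static_cast<std::uint64_t>(static_cast<double>(v)) == v;
}
```

Every 32-bit value survives the round trip, and so does 2^53, but 2^53 + 1 does not. A 32-bit address is therefore safe in a double, with roughly 21 fractional bits to spare for the edge slope.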

quote:
Original post by lejno
But as a flat shading rasterizer it really SEEMS to be "as fast as it can get", granted I'm no Pentium guru...

The only way to make sure is to test it. ;-)
quote:
If I wanted a rasterizer that did Gouraud shading then I'd just write another one...

Sure. And if you want one with one texture you'd write a third one. And seven more if you want up to eight textures. And a dozen more if you want to combine that with lighting. And let's not forget the thousands of other possibilities with blending modes. Or the millions you need for every possible render state with fog, stenciling, sampling modes, filtering modes, mipmapping and anti-aliasing.

Well, I don't know if you have any future plans with software rendering, but that's how things were going with my own renderer. I started out with a spinning cube and for a while I thought this was all I ever wanted, but I got hooked and wanted more. But the real problem was that optimizations had to be done for all variants. If I changed anything, they all had to be adjusted, and this was so unmanageable that in the end only three rasterizers actually worked. The only solution was to generate the rasterizers at run-time. For this I wrote my own assembler: SoftWire. I used it to create swShader, which can render anything a modern graphics card can, and more...

So, I might have a totally different point of view, but I think you're wasting much of your time. Not all, because experimentation is good, but for some things there are 'standard' methods that are really standard for a reason. So I think it might be better to spend your time on expanding your knowledge about these methods. And for really low-level optimizations some knowledge about assembly wouldn't hurt either. I really don't want to stop you from trying new things, your idea is very nice! But sooner or later you're going to realize it doesn't make a difference.

Anyway, if you're interested in expanding the possibilities of your rasterizer, here's an excellent standard starting point: Chris Hecker's Perspective Texture Mapping.
quote:
Any tips on optimizations that take into account the complex ways of today's processors?

Yes, profile. Profiling means testing performance in practice. Like I already said, the exact behaviour of modern processors is hard to predict, -especially- for very short code and small 'optimizations'. So the only way to make sure if anything is faster or not is to try it. This takes a lot of time, and most of this time is wasted, so that's what I'm trying to warn you about. A profiling application, like the one built into Visual Studio, or Intel's VTune, can tell you exactly which functions or even instructions are slowing down your program. Focusing only on the real bottlenecks will save you a lot of time and the results can be really surprising.
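Short of a full profiler, even a crude timing loop can settle whether a change helps. A minimal sketch using std::chrono from current C++; averageMicroseconds is a made-up helper name, and the result is only meaningful in an optimized build with enough runs to average out noise.

```cpp
#include <chrono>

// Runs `work` `runs` times and returns the average wall-clock time per run
// in microseconds. A rough measurement, not a substitute for a profiler.
template <typename Work>
double averageMicroseconds(Work&& work, int runs)
{
    using clock = std::chrono::steady_clock;   // monotonic, suited to intervals
    const auto begin = clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - begin;
    return elapsed.count() / runs;
}
```

Typical use would be to time the same triangle list through two rasterizer variants and compare the averages. Make sure the work has a visible side effect, such as writing to the frame buffer, or the compiler may remove it entirely.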
quote:
Is a double float 64 bits on all computers?

I don't know if it's a hard standard but I've never seen different. Even if it's defined differently, I'm quite sure you can accurately represent a 32-bit integer with any 'double'.
quote:
Why is float to int conversion so time consuming?

It isn't. But C++ and the Pentium use different standard rounding modes (C++ truncates toward zero, while the FPU defaults to round-to-nearest). So to keep consistency, the compiler has to replace every conversion with a function that sets the correct rounding mode, does the rounding, and then resets the rounding mode. It's changing the rounding mode that is slow. But if you set the rounding mode at the start of your rendering function and then do all conversions directly in assembly, you can avoid all the state changes. Good optimizing compilers do this automatically, but not all.
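The difference between the two modes is easy to see without any assembly. The sketch below uses std::lrint from <cmath>, which arrived with C++11 and so was not an option for compilers of that era; it converts using whatever rounding mode is currently set, so no mode switch is needed. It assumes the default round-to-nearest mode is active, and the function names are illustrative.

```cpp
#include <cmath>

// The cast always truncates toward zero, as the language requires.
int truncating(double v)
{
    return static_cast<int>(v);
}

// std::lrint honours the current rounding mode (round-to-nearest-even by default).
long currentMode(double v)
{
    return std::lrint(v);
}
```

With the default mode, 2.7 truncates to 2 but rounds to 3, and -2.7 truncates to -2 but rounds to -3. Whether std::lrint is actually faster depends on the compiler and target; with SSE2 code generation the plain cast is usually a single instruction, so measure before switching.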

Besides this, there are several floating-point instructions that require integer circuitry and vice-versa. So they can block each other's execution, but not that drastically...

Good luck!

C0D1F1ED,
I took a look at your projects... very impressive!!

Was that demo with the teapot and the golden ring really software rasterizing?

So you do all your rasterizing using pixel and vertex shaders JIT-compiled with your SoftWire project???

I have to say that sounds pretty advanced, like Direct3D but using software rasterizers huh?

Yes, it's completely software rendered. Just use an old 2D card if you want. Here's another demo: Real Virtuality.

The shaders are JIT-compiled but also recompiled for every render state. This removes all conditional code so only the necessary instructions are executed. A cache system is used to prevent re-compiles. You can read the details about it in my article in the ShaderX 2 Tips & Tricks book.
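The caching idea can be sketched without a real JIT. The code below is not swShader's or SoftWire's API; RenderState, PixelFn and RoutineCache are hypothetical, and a closure chosen once per state stands in for generated machine code. The point it illustrates is that the per-pixel routine contains no state checks, and each state is built only once.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

struct RenderState { bool textured; bool fog; };   // hypothetical, two flags only

using PixelFn = std::function<std::uint32_t(std::uint32_t color, std::uint32_t texel)>;

class RoutineCache {
public:
    // Returns the routine specialized for this state, building it on first use.
    const PixelFn& get(const RenderState& s)
    {
        const std::uint32_t key = (s.textured ? 1u : 0u) | (s.fog ? 2u : 0u);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++builds_;                              // counts the "compiles"
            it = cache_.emplace(key, build(s)).first;
        }
        return it->second;
    }

    int builds() const { return builds_; }

private:
    // Stand-in for code generation: all branching on state happens here,
    // once, and the returned routine has none left.
    static PixelFn build(const RenderState& s)
    {
        const auto halve = [](std::uint32_t c) { return (c >> 1) & 0x7F7F7F7Fu; };
        if (s.textured && s.fog)
            return [halve](std::uint32_t c, std::uint32_t t) { return halve(c & t); };
        if (s.textured)
            return [](std::uint32_t c, std::uint32_t t) { return c & t; };
        if (s.fog)
            return [halve](std::uint32_t c, std::uint32_t) { return halve(c); };
        return [](std::uint32_t c, std::uint32_t) { return c; };
    }

    std::unordered_map<std::uint32_t, PixelFn> cache_;
    int builds_ = 0;
};
```

The bitwise AND and the channel halving are placeholders for texture modulation and fog. A real run-time generator would emit machine code instead of picking a closure, which also removes the indirect call overhead of std::function.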

I use Direct3D specifications and interfaces as a reference, but swShader is actually unlimited. For example the number of instructions in the shaders is unlimited. I once tested it with 1000 texture reads and it was still kind of interactive...
