My Software Triangle Rasterizer

Quote:Original post by Winograd
Have you looked at this paper:
Triangle Scan Conversion using 2D Homogeneous Coordinates

It nicely dodges many problems related to projecting 3D triangles to 2D, such as the w component being 0. Basically, you don't need frustum clipping if you use a homogeneous rasterizer. I definitely recommend at least reading the paper.

Also, one idea worth investigating is using a hierarchical z-buffer, which may improve performance, especially for scenes with high depth complexity.


Hi Winograd!

I know of the paper you mentioned and took a quick look at it, but didn't read it all. I didn't understand everything, and I also didn't get how to actually draw the triangle to the 2D screen at the end. Besides, a perspective divide for each pixel is too expensive.

Regarding the hierarchical z-buffer: I already thought about this and provided a way for the shader to do it. Basically, the shader can remember the minimum z of each 8x8 block and then let the rasterizer reject the whole block if it determines that the minimum expected z value for the block is larger.
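In code the idea could look roughly like this (just a sketch with invented names, not my actual interface; I'm also assuming a depth convention where smaller z is closer, so the per-block value kept as the conservative bound is the farthest z in that block; with the opposite convention it would be the minimum):

// Sketch of the block-rejection idea (all names are made up).
#include <algorithm>
#include <cstddef>
#include <vector>

struct CoarseZBuffer {
    int blocksX;
    std::vector<float> blockFarZ;   // one conservative depth per 8x8 block

    CoarseZBuffer(int width, int height)
        : blocksX((width + 7) / 8),
          blockFarZ(std::size_t(blocksX) * std::size_t((height + 7) / 8), 1.0f) {}

    // Rasterizer side: skip the whole block if the triangle's nearest
    // possible z over this block is behind everything already stored.
    bool rejectBlock(int bx, int by, float triMinZ) const {
        return triMinZ > blockFarZ[std::size_t(by) * blocksX + bx];
    }

    // Shader side: after depth-writing a fully covered block, shrink the
    // bound to the largest depth actually written. A partially covered
    // block must leave the bound alone (or recompute it from the fine
    // z-buffer), otherwise later geometry could be rejected wrongly.
    void updateBlock(int bx, int by, float writtenMaxZ) {
        std::size_t i = std::size_t(by) * blocksX + bx;
        blockFarZ[i] = std::min(blockFarZ[i], writtenMaxZ);
    }
};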

Quote:Original post by Trenki
There should be only one virtual function call per fragment and I don't believe this virtual function call is a heavy penalty. I could be using a function pointer instead but do you really believe that would be faster?
I do, yes. It's small, but when you're handling things per fragment, that's a lot of calls. (About 29 mil at 800 x 600 at 60fps, and that's with zero overdraw.) Plus I have a few low level ideas I want to play with...
Quote:
Quote:along with a few other niceties.

Could you please elaborate on this and point out exactly what you mean?
Sorry, it's strictly experimental R&D right now and I would prefer not to say too much until I know for sure what works and what doesn't. Here's a teaser though.
VERTEX_SHADER( Shader )
{
    VS_INPUTS (
        ((Float4, Position, Position))
        ((Float4, Diffuse, Color))
    );

    VS_OUTPUTS (
        ((Float4, Position, Position))
        ((Float4, Diffuse, Color))
    );

    VS_Output SimpleVS( const VS_Input& in )
    {
        VS_Output out;

        out.Position = in.Position;
        out.Diffuse = in.Diffuse;

        return out;
    }
} END_VS( Shader );
That's pure C++ code. It should generate vectorized SSE when I'm done.

Quote:Original post by Trenki
There should be only one virtual function call per fragment and I don't believe this virtual function call is a heavy penalty. I could be using a function pointer instead but do you really believe that would be faster?

Virtual functions are just an abstraction of function pointers, so it wouldn't be faster to use a function pointer explicitly. However, argument passing is quite expensive. You might be able to speed things up a little by making the arguments class members and sharing them with the shader routine. Beware of turning things into spaghetti code, though...
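Something along these lines, as a rough sketch (the names and the fixed varying count are invented, not taken from your renderer):

// Rough sketch: the rasterizer fills shared members instead of passing
// arguments, so the per-fragment virtual call takes no parameters.
struct FragmentShader {
    float varying[8];   // interpolated attributes for the current fragment
    int   x, y;         // current pixel position

    virtual ~FragmentShader() {}
    virtual unsigned int shade() = 0;   // reads the members above
};

struct FlatColorShader : public FragmentShader {
    virtual unsigned int shade() {
        // varying[0..2] assumed to hold r, g, b in [0, 1]
        unsigned int r = (unsigned int)(varying[0] * 255.0f);
        unsigned int g = (unsigned int)(varying[1] * 255.0f);
        unsigned int b = (unsigned int)(varying[2] * 255.0f);
        return (r << 16) | (g << 8) | b;
    }
};

// In the inner loop the rasterizer would step fs.varying incrementally,
// update the pixel position, and then make an argument-free call:
inline void shadePixel(FragmentShader& fs, unsigned int* row, int x, int y) {
    fs.x = x;
    fs.y = y;
    row[x] = fs.shade();    // no argument passing per fragment
}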
Quote:Original post by Winograd
Basically, you don't need frustum clipping if you use a homogeneous rasterizer.

That's a nice property for a hardware implementation, but in software clipping is quite simple and fast. The homogeneous rasterizer needs extra work per pixel, which makes it less attractive for software.
Quote:Also, one idea worth investigating is using a hierarchical z-buffer, which may improve performance, especially for scenes with high depth complexity.

Yeah, my implementation with 8x8 blocks can be used directly with a hierarchical z-buffer. In my experience it's not worth it when working with low resolutions though (typically the case for a software renderer), but your mileage may vary.
Quote:Original post by Promit
Sorry, it's strictly experimental R&D right now and I would prefer not to say too much until I know for sure what works and what doesn't. Here's a teaser though. *** Source Snippet Removed *** That's pure C++ code. It should generate vectorized SSE when I'm done.

Have you looked at Sh yet? It's capable of writing out C code that can be compiled at run-time with GCC or ICC.

What back-end are you using to generate SSE code?
Question to C0D1F1ED: As you pointed out, I could get an undesired wrapping effect in the texture coordinates at the edges of the triangles. I have given it some thought but could not come up with a satisfactory solution. What would you suggest that avoids the expensive perspective correction and retains the overall design?
Quote:Original post by Trenki
As you pointed out, I could get an undesired wrapping effect in the texture coordinates at the edges of the triangles. I have given it some thought but could not come up with a satisfactory solution. What would you suggest that avoids the expensive perspective correction and retains the overall design?

For fully covered tiles, keep using linear interpolation. For partially covered tiles, use per-pixel perspective correction.
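In code it could look something like this (only a sketch; the setup structure and names are invented, and the fast path is left as a comment):

// Sketch of the per-tile decision. TileSetup holds u/w, v/w and 1/w at the
// tile's top-left corner plus their per-pixel screen-space steps.
struct TileSetup {
    float uOverW, vOverW, oneOverW;
    float dudx, dvdx, dwdx;   // x steps of u/w, v/w, 1/w
    float dudy, dvdy, dwdy;   // y steps of u/w, v/w, 1/w
};

void shadeTile(const TileSetup& t, bool fullyCovered)
{
    if (fullyCovered) {
        // Fully covered tile: a few perspective divides at the corners,
        // then plain linear interpolation of u and v across the 8x8 block.
        // No per-pixel divide, and no coverage test needed.
        // ... affine inner loop ...
    } else {
        // Partially covered tile: perspective-correct per pixel, which also
        // avoids the texture coordinates wrapping at the triangle edges.
        for (int y = 0; y < 8; ++y) {
            float uw = t.uOverW + y * t.dudy;
            float vw = t.vOverW + y * t.dvdy;
            float ow = t.oneOverW + y * t.dwdy;
            for (int x = 0; x < 8; ++x) {
                float w = 1.0f / ow;             // one divide per pixel
                float u = uw * w, v = vw * w;    // correct texture coords
                (void)u; (void)v;                // coverage test, fetch, write ...
                uw += t.dudx; vw += t.dvdx; ow += t.dwdx;
            }
        }
    }
}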

To avoid even more perspective correction, you can detect which triangles are either small enough or 'flat' enough not to require perspective correction at all...

But beware of the law of premature optimization! This is going to complicate your design and could make the code harder to maintain. So only do it if you really need it and everything else is functional.

The wrapping is not that terrible, so depending on the needs of your project you might not need to solve it at all.
Quote:Original post by C0D1F1ED
Quote:Original post by Winograd
Basically, you don't need frustum clipping if you use a homogeneous rasterizer.

That's a nice property for a hardware implementation, but in software clipping is quite simple and fast. The homogeneous rasterizer needs extra work per pixel, which makes it less attractive for software.


Clipping is quite fast in hardware too ;) Well, I guess your point is that in hardware there is no extra per-pixel cost except in the number of gates. I haven't implemented such a rasterizer, but at a quick glance it seems one could apply a "DDA"-like algorithm, which would amount to basically 4 additions per pixel, one of which is used for the perspective division.
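Conceptually the inner loop would be something like this (a rough sketch; all names and the setup are invented):

// The three edge functions and 1/w are all linear in screen space, so one
// step in x costs four additions; each covered pixel then needs a single
// divide (or reciprocal) for the perspective correction.
void scanSpan(int xStart, int xEnd,
              float e0, float e1, float e2,            // edge values at xStart
              float de0dx, float de1dx, float de2dx,   // their per-pixel x steps
              float oneOverW, float dOneOverWdx,
              unsigned int* row)
{
    for (int x = xStart; x < xEnd; ++x) {
        if (e0 >= 0.0f && e1 >= 0.0f && e2 >= 0.0f) {  // inside all three edges
            float w = 1.0f / oneOverW;                 // the one divide per pixel
            // ... recover attributes as (attr/w) * w, shade, write ...
            (void)w;
            row[x] = 0xFFFFFFFFu;
        }
        e0 += de0dx;               // four additions per pixel:
        e1 += de1dx;               // three edge functions ...
        e2 += de2dx;
        oneOverW += dOneOverWdx;   // ... plus 1/w for the division
    }
}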
I've now coded up a vertex transformation and clipping pipeline and tested the whole thing with a cube. It runs fine for five seconds and then I get an integer division by zero in the rasterizer.

I was able to trace it to this line:
int w10 = fixdiv<16>(1 << 16, solve_plane(xx1, y0, wPlane));

Apparently solve_plane returns 0 here. I checked with my calculator using the values Visual Studio gave me, and if I did the calculations correctly it should really be returning 1. Still, this is a problem, since such things should not happen.

The problem occurs with the triangles at the top of the cube when it is viewed at a shallow angle (y coordinates of the corners: 243, 244, 246).
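For now I can at least guard the division so it doesn't crash while I investigate (just a stopgap sketch; it hides the symptom, it doesn't fix the precision problem):

// Temporary guard around the failing division: solve_plane() should never
// yield 0 here, so if it does, clamp to the smallest 16.16 value instead
// of letting fixdiv divide by zero.
int denom = solve_plane(xx1, y0, wPlane);
if (denom == 0)
    denom = 1;
int w10 = fixdiv<16>(1 << 16, denom);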

Does anyone have suggestions on how to counter this problem?
Quote:Original post by Trenki
I checked with my calculator using the values Visual Studio gave me, and if I did the calculations correctly it should really be returning 1.

Did you enable break on exceptions in the Debug > Exceptions... menu? When the exception occurs, break, and then place the 'yellow arrow' at the start of the calculation (you might need to first step out of the function, then move the arrow). This way you can interactively follow the calculations. Pressing Alt+F8 will show the disassembly so you can follow one instruction at a time. Or you can split your C++ calculation up into elemental computations so you can follow line by line.

That should quickly reveal the cause of the error...
