My Software Triangle Rasterizer

Started by
24 comments, last by Trenki 16 years, 11 months ago
The division by zero exception occurred and Visual Studio pointed me right at the location where it happened. The value solve_plane returned really was zero, which caused the subsequent division by zero.

The thing is that I compute 1/w for the corners of each 8x8 block using the plane equation calculated earlier. From this I compute W by taking the reciprocal. Since the corners of a block can lie outside the triangle, I can also get values that I would never get if I stayed inside the triangle all the time.

I fixed it by simply setting the value to 1 if it was 0. Now the demo program, which animates a simple cube, runs.
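
The workaround described above could look roughly like the following sketch. It is only an illustration: the 16.16 fixed-point format, the helper name reciprocal_w and the clamping of the result are assumptions, not the rasterizer's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (name and 16.16 format are assumptions).
// Takes an interpolated 1/w in 16.16 fixed point and returns w in 16.16.
// Block corners outside the triangle can extrapolate 1/w to exactly zero,
// so zero is replaced by 1 (the smallest positive raw value) before dividing.
inline std::int32_t reciprocal_w(std::int32_t inv_w)
{
    if (inv_w == 0)
        inv_w = 1;
    assert(inv_w != 0); // invariant established by the guard above

    // 1.0 * 1.0 in 16.16 is 2^32; dividing it by a 16.16 value yields 16.16.
    const std::int64_t one_squared = std::int64_t(1) << 32;
    std::int64_t w = one_squared / inv_w;

    // Very small |1/w| gives a result that does not fit in 32 bits, so clamp.
    const std::int64_t max32 = std::numeric_limits<std::int32_t>::max();
    const std::int64_t min32 = std::numeric_limits<std::int32_t>::min();
    if (w > max32) w = max32;
    if (w < min32) w = min32;
    return static_cast<std::int32_t>(w);
}
```

Values produced this way for corners outside the triangle are meaningless, which should be harmless as long as only pixels inside the triangle are actually shaded.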

Do you know where the profiler in Visual Studio is? Currently I use Orcas Beta 1.
Hi!

I have finished my vertex and clipping pipeline, so I could do some speed tests with my rasterizer. I get 600fps at 320x240 and ~60fps at 1024x768 on my AMD Athlon64 3500+ (2.2 GHz) for a scene with a single Gouraud shaded cube. The cube constantly fills ~1/9 of the screen area. As I don't have any numbers to compare this to, I don't know how good or bad this is.

Thinking back to the DOS days, when I played Fatal Racing at 640x480 on my Pentium 166 at ~20-30fps with the whole screen filled and texture mapping on top, my rasterizer seems slow. What do you guys think?
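
As a rough yardstick, the frame rates above can be converted into CPU clocks per covered pixel. The sketch below is only a back-of-the-envelope estimate: the function name clocks_per_covered_pixel is made up for illustration, and the result lumps everything done per frame (clearing, vertex processing, copying to the screen) into the per-pixel figure.

```cpp
#include <cassert>

// Hypothetical helper (not part of the rasterizer): estimates the CPU clocks
// spent per covered pixel from the resolution, the fraction of the screen
// covered by geometry, the frame rate and the CPU clock frequency.
inline double clocks_per_covered_pixel(int width, int height,
                                       double coverage, double fps,
                                       double clock_hz)
{
    const double pixels_per_second = width * height * coverage * fps;
    assert(pixels_per_second > 0.0);
    return clock_hz / pixels_per_second;
}
```

With the figures from the post (coverage 1/9, 2.2 GHz), both 320x240 at 600fps and 1024x768 at 60fps come out at somewhere around 420 to 430 clocks per covered pixel. That is an upper bound, since all per-frame overhead is included.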

I also did some profiling with gprof:
  %   cumulative   self
 time   seconds   seconds     calls  name
72.20      4.26      4.26     31899  Rasterizer::draw_triangle
14.07      5.09      0.83            __divdi3
12.54      5.83      0.74  47792689  TestFragmentShader::shade

Even though shade is called 47 million times, it requires less time than draw_triangle, probably because the interpolation of the varyings happens in draw_triangle. I don't know how much the virtual function call per pixel affects the performance, or whether the time spent on the call is added to the draw_triangle total or to the shade time. I will try to remove the parameters from the shade function and find another way to let the shader know about the required values. Maybe that speeds things up a little.

[Edited by - Trenki on May 29, 2007 12:56:49 PM]
Quote:Original post by C0D1F1ED
Quote:Original post by Eitsch
for me the devmaster link doesn't work. could you tell us what article you mean?
thanks

Advanced Rasterization by Nicolas Capens, also known as c0d1f1ed. [wink]


You know this guy? Seems to be cool... [smile]
Quote:Original post by Trenki
I have finished my vertex and clipping pipeline, so I could do some speed tests with my rasterizer.

I recommend running some simple sweep tests. For example, keep the total number of rasterized pixels constant by covering a fixed rectangular area with a strip of triangles. Vary the number of triangles per frame linearly, starting from 1 or 2 and going up. Notice that you will first be fill limited, so increasing the triangle count does not have much effect. When the graph starts to climb linearly you are geometry limited, and the slope gives you your peak triangle rate.

In the second test, keep the triangle count constant, but vary the number of rasterized pixels. The slope will give you the peak pixel rate.

You can then calculate the approximate number of clocks used per pixel and the overhead caused by geometry processing. Ideally, geometry would be processed in parallel with rasterization.
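
One way to evaluate such a sweep is to fit a straight line through the measured frame times. The sketch below assumes the samples are taken from the linear part of the graph; the names Sample, LinearFit, fit_sweep and clocks_per_item are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One measurement of a sweep: the varied quantity (pixels or triangles per
// frame) and the measured frame time in seconds.
struct Sample { double count; double seconds; };

// Result of fitting seconds = overhead + cost_per_item * count.
struct LinearFit { double overhead; double cost_per_item; };

// Ordinary least-squares line fit over the sweep samples.
inline LinearFit fit_sweep(const std::vector<Sample>& samples)
{
    assert(samples.size() >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (const Sample& s : samples) {
        sx  += s.count;
        sy  += s.seconds;
        sxx += s.count * s.count;
        sxy += s.count * s.seconds;
    }
    const double n = static_cast<double>(samples.size());
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0); // the counts must not all be identical
    const double slope = (n * sxy - sx * sy) / denom;
    const double intercept = (sy - slope * sx) / n;
    return LinearFit{intercept, slope};
}

// Converts the fitted per-item time into CPU clocks at the given frequency.
inline double clocks_per_item(const LinearFit& fit, double clock_hz)
{
    return fit.cost_per_item * clock_hz;
}
```

For the pixel sweep, cost_per_item is the time per pixel and overhead approximates the fixed per-frame cost, including geometry processing. For the triangle sweep, the same fit gives the time per triangle.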
Quote:Original post by Trenki
I get 600fps at 320x240 and ~60fps at 1024x768 on my AMD Athlon64 3500+ (2.2 GHz) for a scene with a single Gouraud shaded cube. The cube constantly fills ~1/9 of the screen area. As I don't have any numbers to compare this to, I don't know how good or bad this is.

What you really should be asking is: what are your own goals? Increasing performance at this point is without a doubt going to make your software harder to maintain. So if you want to add texture mapping and things like that, add them first. As long as things are interactive enough to test, performance is really fine. Once all the functionality you need is implemented, you can concentrate on the real bottlenecks. It's very likely that texturing will be a major new bottleneck, so much of the work you'd currently do on the Gouraud shading would be largely pointless.

Also, if you really target the GP2X then you should only look at how it performs on that. At the resolution of 320x240 it might not even be worth it to use 8x8 pixel tiles. Different architectures have different needs. Maybe it's bandwidth limited and you should really concentrate on addressing that first...

So the best advice I can give you is to stop programming and start developing. Write down your goals so you have something to concentrate on. That way we also know what advice to give you, instead of sending you in the wrong direction. Trust me, pinpointing your goals is an extremely crucial step towards a successful project.
I thought I'd keep you updated on my progress.

I have now been able to speed things up considerably. Profiling on the PC also revealed that the division was a major bottleneck. There the __divdi3 function consumed 12%-16% of the run time. Now it's only 0.6%. The bottleneck was the 64-bit (int64) division performed by the solve_plane and fixdiv<16> functions. W is now computed by inverting 1/w using only a 32-bit unsigned division, which is faster. For the other parameters I compute step values from the plane equation once and reuse them.
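
The idea of computing step values once per triangle and reusing them can be sketched as follows. Floating point is used here purely for readability, whereas the rasterizer discussed in this thread works in fixed point; the names Vertex, Gradients, compute_gradients and interpolate_span are hypothetical.

```cpp
#include <cassert>

// A screen-space vertex carrying a single varying (e.g. one colour channel).
struct Vertex { float x, y, value; };

// Per-triangle step values: how much the varying changes per pixel step
// in x and in y, derived from the plane through the three vertices.
struct Gradients { float ddx, ddy; };

inline Gradients compute_gradients(const Vertex& a, const Vertex& b,
                                   const Vertex& c)
{
    // Twice the signed triangle area; zero means a degenerate triangle.
    const float area2 = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    assert(area2 != 0.0f);
    const float inv_area2 = 1.0f / area2; // the only division per triangle

    Gradients g;
    g.ddx = ((b.value - a.value) * (c.y - a.y) -
             (c.value - a.value) * (b.y - a.y)) * inv_area2;
    g.ddy = ((c.value - a.value) * (b.x - a.x) -
             (b.value - a.value) * (c.x - a.x)) * inv_area2;
    return g;
}

// Fills `count` consecutive pixels of one row using additions only.
inline void interpolate_span(float start, float ddx, int count, float* out)
{
    float v = start;
    for (int i = 0; i < count; ++i) {
        out[i] = v;
        v += ddx;
    }
}
```

The per-pixel work is then a single addition per varying, and no plane equation has to be solved again inside the block loops.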

Additionally, manually unrolling the loop which steps the varyings and the loop which steps over the render targets sped things up quite a bit.

All these changes made the rasterizer ~40% faster and it still uses one virtual function call per pixel which now is the bottleneck.

In the quest of removing the virtual function overhead I used C++ templates and member function pointers. Now the fragment shaders are ordinary functions which are passed as a template parameter to the triangle rasterization function which now can inline the shader. This gave an additional 30% speed improvement.
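
A minimal sketch of this technique, assuming a free function as the shader (the post also mentions member function pointers, which are not shown here). The names ShadeFn, pack_rgb and fill_span are invented for illustration and do not come from the actual rasterizer.

```cpp
#include <cassert>
#include <cstdint>

// Signature of a hypothetical fragment shader: takes interpolated colour
// channels and returns a packed 0x00RRGGBB pixel.
using ShadeFn = std::uint32_t (*)(std::uint8_t r, std::uint8_t g,
                                  std::uint8_t b);

// An ordinary function acting as the shader.
inline std::uint32_t pack_rgb(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (std::uint32_t(r) << 16) | (std::uint32_t(g) << 8) | std::uint32_t(b);
}

// The shader is a non-type template parameter, so the call target is known
// at compile time and the compiler is free to inline it into the pixel loop,
// unlike a call through a virtual function.
template <ShadeFn Shade>
void fill_span(std::uint32_t* dst, int count,
               std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    assert(count >= 0);
    for (int i = 0; i < count; ++i)
        dst[i] = Shade(r, g, b);
}
```

A call such as fill_span<pack_rgb>(row, width, r, g, b) instantiates one copy of the loop per shader, trading some code size for the removal of the indirect call.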

Now the rasterizer is nearly twice as fast and runs my test demo at 65-75fps on the GP2X (this time I turned vsync off).
