Average pixels per triangle?

11 comments, last by Charles B 19 years, 9 months ago
Hi, I am trying to find out what the average number of pixels per triangle is for benchmarking purposes. I realize that this is a rather arbitrary question as it depends on the scene, resolution, etc. I guess what I am seeking is what has been considered an average pixel count when others have benchmarked their performance, so I have something concrete to compare to. I've googled for this information for about an hour, both on gamedev and web-wide. Can anyone give me an answer or reasonable estimate? My software renderer currently draws 88k flat-shaded triangles per second in 16-bit color with vertices (10, 10)-(128, 12)-(12, 128). I will be implementing a new edge-walking algorithm which should push it over 100k, possibly as high as 120k.

throw table_exception("(╯ °□°)╯ ︵ ┻━┻");

Depending on what you want to measure (setup or fill rate) there are different things that you can do. Back in the early CAD days (SGI Indigo anyone?) 25-pixel triangles were pretty much the norm (10x5 triangle). To be really confident about your rate, you need to rotate the triangle through 360 degrees to be sure you aren't only measuring the fast cases such as long scanlines.

I do remember back in the day my perspective-correct texture mapping routine got about 5MP/s on a Pentium-90. Heh, that was fast for the time (I believe the first 3dfx Voodoo card was ~80MP/s and didn't have much CPU overhead, whereas my 5MP/s used all the CPU :).
The problem is pretty simple. Just add up the area in pixels of each rendered triangle (see the well-known Graphics FAQ for info on that), count how many triangles are rendered, and then take the ratio between the two.
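As a rough sketch of that bookkeeping in C: the geometric area of each triangle comes from the cross product of two edge vectors, and the average is the summed area divided by the triangle count. The names `Vec2i`, `TriStats`, `stats_add` and `stats_avg_pixels` are made up for this example, and the geometric area only approximates the number of pixels a rasterizer actually touches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 2D vertex in integer screen coordinates. */
typedef struct { int x, y; } Vec2i;

/* Twice the triangle area, from the cross product of edges a->b and a->c.
   Kept doubled so the arithmetic stays in integers. */
long tri_area2(Vec2i a, Vec2i b, Vec2i c)
{
    long cross = (long)(b.x - a.x) * (c.y - a.y)
               - (long)(c.x - a.x) * (b.y - a.y);
    return labs(cross);
}

/* Running totals collected while rendering. */
typedef struct { long area2_sum; long tri_count; } TriStats;

/* Call once per rendered triangle. */
void stats_add(TriStats *s, Vec2i a, Vec2i b, Vec2i c)
{
    s->area2_sum += tri_area2(a, b, c);
    s->tri_count += 1;
}

/* Average area per triangle, in pixels. */
double stats_avg_pixels(const TriStats *s)
{
    assert(s->tri_count > 0);
    return 0.5 * (double)s->area2_sum / (double)s->tri_count;
}
```

For the test triangle from the first post, (10, 10)-(128, 12)-(12, 128), this gives a doubled area of 13920, i.e. about 6960 pixels.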
Well, for games it's a normal scene, which probably means quite small polys. You could tile the screen with 5x5 triangles in a grid of quads and, while it's running, gradually increase the poly size, storing poly size and fps data each frame for later. To take overdraw into account, maybe draw the grid 2 or 3 times with each poly randomly placed at one of 3 z locations?
Hi Ravyne, long time.

For a soft renderer, surely 25-50 pixels is a decent norm. Certainly you should run 3 tests:
small triangles (10 pix?)
=> highlights transforms + edge walking.
medium triangles (50 pix ?)
=> edge walking
large triangles (1000 pix ?)
=> pixel rate

The quickest edge walking exploits the fact that there are only two different kinds of increment when you follow a Bresenham line down (y++).

For instance, if dx/dy = 3.26,
then the sequence is something like this:
x+=3; y++;
x+=3; y++;
x+=4; y++;
x+=3; y++;

Whether it's 3 or 4 (==3+1) depends on a carry bit from the fractional part into the integer part of your x coordinate.

This becomes very powerful for quickly updating the values of the linear gradients (for instance the screen pointer, the rgb or the homogeneous texture coords).

This ends up with something like this :
carry = (xnew^xold)>>31L;
ptr += ptr_step[carry];
r += r_step[carry];
g += g_step[carry];
b += b_step[carry];

This saves the multiplications you might have in :
r = rx*x + ry*y + r00;
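A minimal C sketch of that carry trick, under a few assumptions: the fractional part of x is kept in its own 32-bit word, the carry is detected with an unsigned comparison rather than the XOR/shift shown above, and only one gradient (`r`) is tracked. The `Edge` struct and the `edge_init`/`edge_step` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t  x;          /* integer pixel column */
    uint32_t frac;       /* fractional part of x, 0.32 fixed point */
    uint32_t frac_step;  /* fractional part of dx/dy */
    int32_t  x_step[2];  /* floor(dx/dy) and floor(dx/dy) + 1 */
    int32_t  r;          /* one interpolated value, e.g. red */
    int32_t  r_step[2];  /* per-scanline delta of r for each carry case */
} Edge;

/* Set up an edge starting at column x0 with slope dx/dy (dy > 0).
   r0 is the value of r at the start, rx and ry its x and y gradients. */
void edge_init(Edge *e, int32_t x0, int32_t dx, int32_t dy,
               int32_t r0, int32_t rx, int32_t ry)
{
    assert(dy > 0);
    int32_t q = dx / dy, rem = dx % dy;
    if (rem < 0) { q -= 1; rem += dy; }      /* floor division for negative slopes */
    e->x = x0;
    e->frac = 0;
    e->frac_step = (uint32_t)(((uint64_t)rem << 32) / (uint64_t)dy);
    e->x_step[0] = q;
    e->x_step[1] = q + 1;
    e->r = r0;
    e->r_step[0] = ry + rx * q;              /* y++ and x += q     */
    e->r_step[1] = e->r_step[0] + rx;        /* y++ and x += q + 1 */
}

/* Advance one scanline (y++): two table lookups, no multiplication. */
void edge_step(Edge *e)
{
    uint32_t old = e->frac;
    e->frac += e->frac_step;                 /* wraps modulo 2^32 */
    unsigned carry = e->frac < old;          /* 1 when the fraction overflowed */
    e->x += e->x_step[carry];
    e->r += e->r_step[carry];
}
```

With dx/dy = 326/100 and a zero starting fraction, the first four steps advance x by 3, 3, 3 and 4, and `r` stays equal to rx*x + ry*y + r00 without any multiplication in the loop.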

"Coding math tricks in asm is more fun than Java"
Yes Charles, it has been a while, hasn't it? I've seen you make a few posts around gamedev and you're always on target. I've also been keeping an eye on the SBGFRC (that was the acronym, wasn't it?) group. I'm very impressed by what I hear of your math lib, as well as the other members' contributions.

You are on target once again, Charles, about using Bresenham's to walk the edges (currently I am using float math, just to get something up quick and dirty ;) ). Because my span-line filler takes three parameters (a pointer to the beginning of the line segment in the framebuffer, a color value, and a pixel count), I intend to calculate the pointer and count while walking the left edge, and only the count while walking the right edge. This, of course, leaves me with exactly the information I need, while eliminating floats and making the algo purely incremental.
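One way that plan could look in C, for the simple case of a flat-bottomed triangle. This is only a sketch: `fill_span` mimics the three-parameter span filler described above, `fill_flat_bottom` and its parameters are invented for the example, edges are stepped in 16.16 fixed point, coordinates are assumed non-negative and on-screen, and no fill convention for shared edges is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Span filler with the interface from the post: start pointer, color, count. */
void fill_span(uint32_t *dst, uint32_t color, int count)
{
    while (count-- > 0)
        *dst++ = color;
}

/* Fill a triangle with apex (ax, ay) and a horizontal base running from
   (bl, by) to (br, by), where ay < by and bl <= br. pitch is the row
   length in pixels. Returns the number of pixels written. */
long fill_flat_bottom(uint32_t *fb, int pitch,
                      int ax, int ay, int bl, int br, int by,
                      uint32_t color)
{
    assert(by > ay && bl <= br);
    assert(ax >= 0 && ay >= 0 && bl >= 0);

    int32_t xl = (int32_t)ax << 16;                    /* left edge x, 16.16  */
    int32_t xr = (int32_t)ax << 16;                    /* right edge x, 16.16 */
    int32_t dl = ((bl - ax) * 65536) / (by - ay);      /* left slope per row  */
    int32_t dr = ((br - ax) * 65536) / (by - ay);      /* right slope per row */
    uint32_t *row = fb + (size_t)ay * (size_t)pitch;
    long pixels = 0;

    for (int y = ay; y < by; y++) {
        int left  = xl >> 16;                          /* pointer comes from the left edge */
        int count = (xr >> 16) - left;                 /* count comes from the right edge  */
        fill_span(row + left, color, count);
        pixels += count;
        xl += dl;                                      /* purely incremental stepping */
        xr += dr;
        row += pitch;
    }
    return pixels;
}
```

Because each row is filled from the left edge up to, but not including, the right edge, the apex row is empty and the pixel count comes out slightly below the geometric area.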

I should note that this triangle filler is part of a 2D software rendering system; perhaps rasterization would have been a better choice of words. I do hope to apply a modified version in my 3D software graphics lib, where of course I will then have to track texels, normals, and lighting among other things. If you have any additional advice I would be glad to hear it.

Thanks everyone for your help, I appreciate it all very much.

throw table_exception("(╯ °□°)╯ ︵ ┻━┻");

...I intend to calculate the pointer and count while walking the left edge, and only the count while walking the right edge.
Right, that's the way to do it. Just beware of precision loss on long spans, or the shared edges between two triangles will reveal cracks or distortions in texture continuity or lighting.

If you have any additional advice I would be glad to hear it.

Maybe this. I remember the implementation details that made it more efficient in asm. Since you already know the math trick you probably know this too. Anyway, it was something like this:

; use memory for constants (read only)
; => more read/write registers free
add eax, Dx_lo
; or this for better scheduling
;mov edx, Dx_lo
;add eax, edx
sbb ecx, ecx ; -carry
mov edi, prev_ptr
add edi, inc_ptr[4+4*ecx]
; etc...

Otherwise, I knew a multitude of tricks combining floating point, integers or MMX. But that's kind of outdated... just like software rendering is :)

For instance, true bilinear filtering + magnification in less than 10 cycles per pixel using floating point only. It worked on a Pentium 90 (non-MMX). But that was very tricky and a real headache to code: the kind of 100-line asm routine that takes many days of chess playing to produce. My sincere advice here is don't try to do that again. It was only useful because the very first 3D cards were so pitiful ;)

Now there could be more interesting challenges, for fun. With SIMD capabilities and high clock frequencies, I think a more than decent 3D renderer could be done. I suppose one could do surprisingly good things in pure software compared to average-quality code written for 3D hardware.
"Coding math tricks in asm is more fun than Java"
Quote:Original post by Charles B
For a soft renderer, surely 25-50 pixels is a decent norm. Certainly you should run 3 tests:
small triangles (10 pix?)
=> highlights transforms + edge walking.
medium triangles (50 pix ?)
=> edge walking
large triangles (1000 pix ?)
=> pixel rate


I wanted to post again for comparison purposes before I implement the new algo. (Note these measurements are as close as I can easily manage, but are not dead-accurate.)

Small Triangle (10 pix) = 2.35 million triangles per second.
Medium Triangle (50 pix) = 1.36 million
Large Triangle (1000 pix) = 386 thousand

throw table_exception("(╯ °□°)╯ ︵ ┻━┻");

What would be most interesting for you now is to see where you can improve things:
- transforms + projection
- clipping
- rasterizing (edge walking)
- filling

1000 pix triangles give you mostly the fill rate:
386 million pixels per second. If your machine is 2GHz, then it's 2000/386 = 5 cycles per pixel. Pretty decent, though you can certainly improve on it (write 8 aligned bytes at once with MMX, for instance).

Now the numbers for smaller triangles reveal the contributions of the earlier stages of the rendering pipeline.

50 pix : 68 M pix/sec (29 cycles/pix)
10 pix : 23 M pix/sec (86 cycles/pix)

860 cycles per 10pix triangle looks quite big. It means that filling a 1-megapixel offscreen buffer with such triangles would get you 23 FPS. I think it can be improved so that your software renderer can offer an alternative to 3D cards. I know it's probably a learning exercise; still, 100 FPS could open up more opportunities for this software.
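The arithmetic behind those figures can be written down as a few small C helpers. This is just a sketch to make the conversions explicit; the function names are invented here, and the 2 GHz clock is the assumption made above, passed in as a parameter so another clock speed can be substituted.

```c
#include <assert.h>

/* Fill throughput implied by a triangle rate and a triangle size. */
double pixels_per_second(double tris_per_sec, double pixels_per_tri)
{
    return tris_per_sec * pixels_per_tri;
}

/* Average clock cycles spent per pixel at the given clock frequency. */
double cycles_per_pixel(double clock_hz, double tris_per_sec, double pixels_per_tri)
{
    double pps = pixels_per_second(tris_per_sec, pixels_per_tri);
    assert(pps > 0.0);
    return clock_hz / pps;
}

/* Frame rate when every frame is covered once by such triangles. */
double frames_per_second(double tris_per_sec, double pixels_per_tri,
                         double pixels_per_frame)
{
    assert(pixels_per_frame > 0.0);
    return pixels_per_second(tris_per_sec, pixels_per_tri) / pixels_per_frame;
}
```

For example, `cycles_per_pixel(2.0e9, 386.0e3, 1000.0)` is about 5.2, and `frames_per_second(2.35e6, 10.0, 1.0e6)` is about 23.5, in line with the rounded numbers quoted above.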

- The three projections should cost around 50-75 cycles. Less with SIMD. I'm not counting strips or indexed arrays, which let only one vertex be transformed per triangle on average.

- Then maybe vectorial clipping is a bottleneck. It could be disabled by the user when the bounding volume of a mesh is known to be fully visible.

- Getting the projected edge slopes and the gradients (color, texture, etc...) is where I had to focus most of my energy for small triangles.

- Edge walking can also be time consuming. I remember that I focused on that point nearly as much as on the inner loop. (*)

- The inner loop becomes a bit more complicated when you unroll the loops or treat 4 pixels at once.

(*) Very small triangles could be drawn very fast with some SIMD features. Basically, imagine you draw a rectangle whose width is a multiple of 4 or 8 pixels and apply a triangle mask to it. Compute 4 aligned columns at once and use write masks to take the edges into account. Doing this instead of classical edge walking removes the time-consuming overhead of the inner loops. It also helps pack the pixels for a 4X or 8X fill rate, even on the edges.
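To illustrate the masked-rectangle idea without committing to a particular SIMD instruction set, here is a scalar C sketch that walks the bounding box in aligned groups of 4 pixels, builds a 4-bit coverage mask from three edge functions, and writes only the covered pixels. It is an assumption-laden stand-in: a real SIMD version would evaluate the 4 lanes in parallel and use a masked store, `fill_masked` and its helpers are invented names, coordinates are assumed non-negative with room in the framebuffer for the aligned groups, and edges are treated as inclusive, so shared edges get drawn twice.

```c
#include <assert.h>
#include <stdint.h>

/* Edge function: its sign tells which side of edge a->b the point lies on. */
int32_t edge_fn(int ax, int ay, int bx, int by, int px, int py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

int min3(int a, int b, int c) { int m = a < b ? a : b; return m < c ? m : c; }
int max3(int a, int b, int c) { int m = a > b ? a : b; return m > c ? m : c; }

/* Fill a triangle by masking its bounding rectangle. Returns pixels written. */
long fill_masked(uint32_t *fb, int pitch,
                 int x0, int y0, int x1, int y1, int x2, int y2,
                 uint32_t color)
{
    assert(x0 >= 0 && x1 >= 0 && x2 >= 0 && y0 >= 0 && y1 >= 0 && y2 >= 0);

    /* Force one winding so that "inside" means all edge functions >= 0. */
    if (edge_fn(x0, y0, x1, y1, x2, y2) < 0) {
        int tx = x1, ty = y1;
        x1 = x2; y1 = y2;
        x2 = tx; y2 = ty;
    }

    int minx = min3(x0, x1, x2) & ~3;      /* align the left side to a group of 4 */
    int maxx = max3(x0, x1, x2);
    int miny = min3(y0, y1, y2);
    int maxy = max3(y0, y1, y2);
    long pixels = 0;

    for (int y = miny; y <= maxy; y++) {
        for (int x = minx; x <= maxx; x += 4) {
            unsigned mask = 0;
            for (int i = 0; i < 4; i++) {  /* the 4 lanes a SIMD unit would do at once */
                int px = x + i;
                unsigned inside = edge_fn(x0, y0, x1, y1, px, y) >= 0
                               && edge_fn(x1, y1, x2, y2, px, y) >= 0
                               && edge_fn(x2, y2, x0, y0, px, y) >= 0;
                mask |= inside << i;
            }
            for (int i = 0; i < 4; i++) {  /* stands in for a masked store */
                if ((mask >> i) & 1u) {
                    fb[y * pitch + x + i] = color;
                    pixels++;
                }
            }
        }
    }
    return pixels;
}
```

Note that there is no per-edge stepping at all here: the cost is proportional to the bounding rectangle, which is why the approach only pays off for very small triangles.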

My intuition is that the current machines should allow a peak of 50-100 Mega 10pix non textured triangles per second. But that would certainly not be a piece of cake to code. Your results are already very decent.
"Coding math tricks in asm is more fun than Java"
UPDATE:

I have implemented the new algorithm, and while there are a few cases not working currently, my initial results have been very promising. In tests of a static triangle rendered many times, I achieved the following performance in 32-bit color at 640x480:

9 pixel - 5.40m (320, 32) - (318, 35) - (323, 35)
25 pixel - 4.40m (320, 32) - (315, 37) - (325, 37)
100 pixel - 2.50m (320, 32) - (310, 42) - (330, 42)
400 pixel - 1.25m (320, 32) - (300, 52) - (340, 52)
900 pixel - 740k (320, 32) - (290, 62) - (350, 62)

Again, these are 2D solid-shaded triangles, so there's nothing fancy going on. The performance gain was much greater than expected. I'm sure performance will drop when the algo works for every case, but not by much if my understanding of the failure is accurate. I would also like to note that I currently only use asm in the span-filler, and while I am aiming for a minimum of MMX support I haven't yet implemented any MMX techniques.

For those interested my relevant specs are listed below:
Intel 875P mobo (Dell 400sc)
Pentium4 800fsb 3.0ghz
Dual-Channel DDR400 1gig

throw table_exception("(╯ °□°)╯ ︵ ┻━┻");

