Average pixels per triangle?

11 comments, last by Charles B 19 years, 9 months ago
Yes, a 2X speedup from just one of the 8 elements: maybe that can significantly improve perfs. ;) You see, in the end 2^8 = 256. There is surely still a good margin. 6 mega triangles already looks promising. You are in the area of a Dreamcast, not bad at all. I suppose that if your edge walking was in assembly too, with the span rendering inlined (a trapezoid routine replaces the span filler), you would gain another 2X for small triangles. Function call overhead costs a lot on small loops.

Can you explain concisely what the buggy case is? Is it related to clipping? Or to the way you compute the starting x? That is a typical source of graphical bugs (missing pixels on edges, etc.). You must be really precise about how you take the fractional part into account, how you jump to the exact next line under the vertex, how you deal with fractional precision, how you change the xstart when a new left edge is treated, etc.
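To make the point about fractional parts concrete, here is a minimal sketch of sub-pixel edge setup. It is not the code discussed in this thread. It assumes 16.16 fixed point, pixel centres at integer + 0.5, non-negative (already clipped) coordinates, and an arithmetic right shift on signed values; the names `fixed`, `Edge`, `edge_setup` and `first_centre` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed;                 /* 16.16 fixed point (assumed format) */
#define FIX_ONE  ((fixed)1 << 16)
#define FIX_HALF ((fixed)1 << 15)

typedef struct {
    fixed x;      /* x where the edge crosses the centre of scanline y */
    fixed dxdy;   /* x step per scanline */
    int   y;      /* first scanline whose centre lies on or below the top vertex */
    int   y_end;  /* one past the last scanline covered by the edge */
} Edge;

/* 16.16 multiply; relies on arithmetic right shift for negative products. */
fixed fix_mul(fixed a, fixed b) { return (fixed)(((int64_t)a * b) >> 16); }

/* 16.16 divide. */
fixed fix_div(fixed a, fixed b) { return (fixed)(((int64_t)a * FIX_ONE) / b); }

/* Index of the first pixel (or scanline) whose centre is at or after v.
   Equivalent to ceil(v - 0.5) for v >= 0. */
int first_centre(fixed v)
{
    assert(v >= 0);
    return (int)((v + FIX_HALF - 1) >> 16);
}

/* Set up the edge from (x0, y0) down to (x1, y1), with y0 < y1. */
Edge edge_setup(fixed x0, fixed y0, fixed x1, fixed y1)
{
    Edge e;
    fixed prestep;

    assert(x0 >= 0 && x1 >= 0 && y0 >= 0 && y1 > y0);
    e.dxdy  = fix_div(x1 - x0, y1 - y0);
    e.y     = first_centre(y0);
    e.y_end = first_centre(y1);
    /* Prestep: distance from the vertex down to the first scanline centre.
       Skipping this step is a classic cause of gaps and wobbling edges. */
    prestep = (fixed)e.y * FIX_ONE + FIX_HALF - y0;
    e.x     = x0 + fix_mul(prestep, e.dxdy);
    return e;
}
```

With a left and a right edge set up this way, a span on a scanline would run from `first_centre(left.x)` up to but not including `first_centre(right.x)`, so two triangles sharing an edge draw each pixel exactly once.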

But be sure it's worth the pain for you to keep improving your code, because reaching the 'perfect code' for triangle rendering is an almost bottomless job.
"Coding math tricks in asm is more fun than Java"
Yes, I'm sure asm would benefit the edge-walking, although I don't know if I want to go quite that far. I'll be able to recycle this code for next year's CG class, but they discourage using lots of asm. I suppose I'll look into it for my own benefit, though. The span-filler is small and inlined, so there shouldn't be a function call. I suppose I should examine the compiler's output just to be sure.

The failing cases seem to have to do with the order of the vertices. It should be a matter of ensuring order at the top of the function or reworking the algo to accept unordered vertices. I have a feeling that the second option will be faster and eliminate some conditional jumps. In either case, the inner loop would remain the same.
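For reference, the "ensure order at the top of the function" option could look like the sketch below. It assumes that "order" here means top-to-bottom order by y, which is what a scanline edge walker usually needs; the post does not say so explicitly. `Vertex2` and `sort_by_y` are hypothetical names.

```c
#include <assert.h>

typedef struct { int x, y; } Vertex2;   /* hypothetical projected vertex */

static void swap_vertex(Vertex2 *a, Vertex2 *b)
{
    Vertex2 t = *a;
    *a = *b;
    *b = t;
}

/* Sort three vertices so that v[0].y <= v[1].y <= v[2].y.
   Three compare-and-swap steps are enough for three elements. */
void sort_by_y(Vertex2 v[3])
{
    assert(v != 0);
    if (v[1].y < v[0].y) swap_vertex(&v[0], &v[1]);
    if (v[2].y < v[1].y) swap_vertex(&v[1], &v[2]);
    if (v[1].y < v[0].y) swap_vertex(&v[0], &v[1]);
}
```

Note that sorting by y can change the winding of the triangle, so any facing test has to run on the original order before this step.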

I have also confirmed my suspicions about why 16-bit color is slower than 32-bit in triangle rendering. It was indeed an alignment issue. Other functions did not exhibit this behavior as they were always aligned. Aligning to 32- or 64-bit boundaries should increase that performance by around 2-3 times. That would put 16-bit triangles in the neighborhood of 12-18M. In any case, I plan to switch to an MMX span-filler in the near future.

As for whether this continued improvement is worth it, I believe it is for my own education. Unfortunately, software rendering is not a very viable skill these days. With all the new engines coming out with (sometimes requiring) pixel-shader support, software becomes unusable even as a fall-back renderer. UT2k4 contains a software mode written by Michael Abrash, which actually pieces together hand-coded ASM fragments as features are turned on or off, and even he states that PS support could not be done to satisfactory quality and speed. Maybe this could change when multi-core CPUs/SMP systems become common: one core (or CPU) could simply do rasterization, leaving the other to perform geometry calculations (building a display list for the other core/CPU), gameplay, etc. Anyhow, I'm wandering off topic.


I believe it will be worthwhile when I one day get an interview and the man behind the desk says "We were very impressed with your use of Direct3D in your demos," to which I can reply "Actually, that's all my own software rendering routines." Hopefully sealing the deal :D I suppose they'll know when they look through the source, but I'm allowed to have occasional delusions of grandeur, right?

Thanks again for your input Charles.

throw table_exception("(? ???)? ? ???");

Quote:Original post by Ravyne
The failing cases seem to have to do with the order of the vertices. It should be a matter of ensuring order at the top of the function or reworking the algo to accept unordered vertices. I have a feeling that the second option will be faster and eliminate some conditional jumps. In either case, the inner loop would remain the same.

I'd start by handling:
- backface culling: always
- ccw: always
Then you can start making your API compatible with the OpenGL philosophy and let the user choose each option.

So now this leaves you with a cross product (if the verts are already projected) and a sign evaluation. If the sign is positive, your rasterizer won't have to cope with inversely ordered triangles. Even so, spans of negative length should render nothing: for(x=a; x<b; x++) renders nothing when a>=b. I don't remember any difficulty with this problem, unless you want to accept any kind of winding order. Then you are probably better off swapping vertices 2 and 3 at the top of the function after the backfacing test.
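A minimal sketch of that cross product and sign test on projected vertices follows. The names `Vec2i`, `signed_area2` and `prepare_triangle` are invented for illustration, and treating a positive sign as front-facing is an assumption: which winding gives a positive value depends on whether your screen y axis points up or down.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x, y; } Vec2i;   /* hypothetical projected vertex */

/* z component of (b - a) x (c - a), i.e. twice the signed area of a, b, c. */
int64_t signed_area2(Vec2i a, Vec2i b, Vec2i c)
{
    return (int64_t)(b.x - a.x) * (c.y - a.y)
         - (int64_t)(b.y - a.y) * (c.x - a.x);
}

/* Returns 1 if the triangle should be rasterized, 0 if it is rejected.
   Degenerate (zero-area) triangles are always rejected.
   With culling enabled, negatively wound triangles are rejected.
   With culling disabled, they are rewound by swapping vertices 2 and 3,
   so the rasterizer only ever sees one winding order. */
int prepare_triangle(Vec2i *a, Vec2i *b, Vec2i *c, int cull_backfaces)
{
    int64_t area2;

    assert(a && b && c);
    area2 = signed_area2(*a, *b, *c);
    if (area2 == 0)
        return 0;
    if (area2 < 0) {
        Vec2i t;
        if (cull_backfaces)
            return 0;
        t = *b;
        *b = *c;
        *c = t;
    }
    return 1;
}
```

Doing the test once per triangle keeps all winding-related branches out of the edge walker and the span loop.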


Quote:
It was indeed an alignment issue.

Sure, I forgot to answer this question. Misalignment and even 16-bit accesses are well-known perf killers. I remember it was often faster to write 2 bytes on the first Pentiums. The best is to group 16-bit pixels two by two (or, shift) and write 32 bits of aligned data at once. Now do the same thing with 64-bit data if you use MMX. This means you have to deal with some nasty loop preambles and postambles. Then the idea of the masks comes back again. The loop is complete ;)
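The preamble / aligned body / postamble structure can be sketched in plain C as below, for a flat-colored 16-bit span. This is only an illustration under assumptions, not the span filler from this thread: `fill_span16` is a made-up name, and `memcpy` is used for the 32-bit store to stay within C's aliasing rules (compilers typically reduce it to a single store).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill count 16-bit pixels starting at dst with color. */
void fill_span16(uint16_t *dst, size_t count, uint16_t color)
{
    /* Two copies of the pixel packed with shift and or. Both halves are
       identical, so byte order does not matter here. */
    const uint32_t pair = ((uint32_t)color << 16) | color;

    assert(((uintptr_t)dst & 1u) == 0);   /* pixels must be 2-byte aligned */

    /* Preamble: one 16-bit write if the span starts on an odd pixel,
       so that the body starts on a 4-byte boundary. */
    if (count > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = color;
        count--;
    }

    /* Body: aligned 32-bit writes, two pixels at a time. */
    while (count >= 2) {
        memcpy(dst, &pair, sizeof pair);
        dst += 2;
        count -= 2;
    }

    /* Postamble: one leftover pixel, if any. */
    if (count > 0)
        *dst = color;
}
```

The same shape extends to 64-bit writes: the preamble then handles up to three pixels and the postamble up to three, which is where the masks mentioned above become attractive.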


Quote:
Anyhow I'm wandering off topic.

Software rendering is fundamental to understanding 3D hardware in far more depth. Having tackled the perf issues in software rendering gives you an intuitive understanding of how the undocumented parts of the hardware may work, and how to help them speed up. I also consider that, after the dull years of the first hardware, the development of shaders tends to let coders reappropriate the lower levels of rendering code. So it's not as obsolete a concern as it seems. Maybe in the longer term, GPUs will enable even more layers of coding, possibly letting a coder define a completely customized rendering pipeline, for instance rendering complex shapes without tessellation. Example: direct NURBS rasterization. The architecture of the PS2, for instance, was in this direction.


Quote:
I suppose they'll know when they look through the source, but I'm allowed to have occasional delusions of grandeur, right?

Sure, it's one way to go. The other is to work hard and produce things. It's all a question of balance, with some clear objectives in mind.
"Coding math tricks in asm is more fun than Java"

