Quote:Original post by JoeJ
Stop thinking about too hardware specific stuff - improve the algorithm first! :)
Stop attempting to improve the algorithm - implement the standard graphics pipeline first! ;-)
Seriously, hundreds of researchers and engineers have shaped the graphics pipeline as we know it today. The z-buffer has proven to be very valuable. Not only does it perform well, both in theory and in practice, it's accurate in any situation.
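To make it concrete, here's a minimal sketch of the z-buffer idea (the buffer size and names are just illustrative): each pixel stores the depth of the nearest surface drawn so far, and a new fragment is only written if it's closer.

```cpp
#include <array>
#include <cstdint>
#include <limits>

// Hypothetical tiny 4x4 framebuffer, just for illustration.
constexpr int W = 4, H = 4;
std::array<float, W * H>    zbuf;   // depth of nearest surface per pixel
std::array<uint32_t, W * H> color;  // color of that surface

void clearBuffers() {
    zbuf.fill(std::numeric_limits<float>::infinity());  // "nothing drawn yet"
    color.fill(0);
}

// Write a pixel only if it is closer than what is already stored.
// Returns false when the fragment is hidden (shading can be skipped).
bool plot(int x, int y, float z, uint32_t c) {
    int i = y * W + x;
    if (z < zbuf[i]) {   // depth test: smaller z means closer to the eye
        zbuf[i] = z;
        color[i] = c;
        return true;
    }
    return false;
}
```

Note that this works regardless of the order in which polygons are drawn, which is exactly why it's accurate in any situation.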
Quote:This is still the best VSD algo I know, nothing I've ever read seems better. Or am I wrong?
It totally depends on the situation. If you're rendering a static low-polygon scene and you have access to the BSP, it can be very efficient. But under other conditions this approach either simply doesn't work or exhibits bad worst-case behavior. Furthermore, if I'm not mistaken, this doesn't work with intersecting polygons.
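For reference, the BSP approach boils down to a back-to-front traversal (the painter's algorithm): at each node, recurse into the half-space the eye is *not* in first, so nearer polygons get drawn last. A rough sketch, with made-up node and plane types for illustration:

```cpp
#include <functional>
#include <vector>

// Hypothetical BSP node: a splitting plane, the polygons lying on it,
// and the two child subtrees.
struct Plane { float a, b, c, d; };   // plane equation ax + by + cz + d = 0
struct Vec3  { float x, y, z; };
struct BspNode {
    Plane plane;
    std::vector<int> polys;           // indices of polygons on this plane
    BspNode* front = nullptr;
    BspNode* back  = nullptr;
};

// Positive result: the eye is in the front half-space of the plane.
float side(const Plane& p, const Vec3& eye) {
    return p.a * eye.x + p.b * eye.y + p.c * eye.z + p.d;
}

// Painter's algorithm: draw the far subtree, then the node's polygons,
// then the near subtree, so closer polygons overwrite farther ones.
void drawBackToFront(const BspNode* n, const Vec3& eye,
                     const std::function<void(int)>& drawPoly) {
    if (!n) return;
    if (side(n->plane, eye) >= 0.0f) {          // eye in front half-space
        drawBackToFront(n->back, eye, drawPoly);
        for (int p : n->polys) drawPoly(p);
        drawBackToFront(n->front, eye, drawPoly);
    } else {                                    // eye in back half-space
        drawBackToFront(n->front, eye, drawPoly);
        for (int p : n->polys) drawPoly(p);
        drawBackToFront(n->back, eye, drawPoly);
    }
}
```

This only yields a correct ordering because the BSP build step splits polygons across the planes; for a dynamic scene you'd have to rebuild or merge the tree, which is where the worst-case behavior comes from.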
Quote:Hierarchical Z-Buffers looks like a very inefficient solution to me, acceptable only for hardware rendering.
It's not inefficient at all in its context. It reduces bandwidth significantly and avoids rendering whole tiles of pixels.
So if hierarchical z-buffers are so perfect for hardware rendering, why don't they apply to software rendering?

First of all, there's a totally different balance between processing power and bandwidth. On a CPU, for every 32 bits of bandwidth you only have about 1.5 clock cycles to do your processing. So any attempt at reducing bandwidth is rather futile; you're processing-limited, so just use the damn bandwidth. On a GPU, you're sharing the bandwidth with tens of processing units. So if you don't limit bandwidth usage, many of those units won't receive any data and your expensive hardware can use only a fraction of its processing power.

Secondly, on a CPU you do have the ability to pull pixels 'out of the pipeline'. By doing the z-test early you can skip a pixel if it's hidden, and start processing the next pixel right away. In hardware, pixel processing is 'all or nothing'. Once pixels enter the pipeline, you can't pull them out again. You can only make decisions on batches (tiles) of pixels. So it's natural to use at least one level of hierarchical z-buffer to perform the z-test for a whole tile before it enters the pixel pipeline.
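The tile-level test is easy to sketch. In this assumed one-level version (a real hierarchical z-buffer has more levels), each tile caches the *farthest* depth among its pixels: if an incoming primitive's *nearest* z is farther than that, every pixel in the tile would fail the depth test, so the whole tile can be rejected without ever touching the fine buffer.

```cpp
#include <algorithm>
#include <array>
#include <limits>

// One-level hierarchical z-buffer sketch: 2x2 tiles of 4x4 pixels.
constexpr int TILE = 4;
constexpr int W = 8, H = 8;
constexpr int TW = W / TILE, TH = H / TILE;

std::array<float, W * H>   zbuf;      // fine per-pixel z-buffer
std::array<float, TW * TH> tileFarZ;  // coarse level: max z per tile

void clearHzb() {
    zbuf.fill(std::numeric_limits<float>::infinity());
    tileFarZ.fill(std::numeric_limits<float>::infinity());
}

// Coarse test: could anything at depth >= nearZ still be visible here?
// If not, the whole tile is skipped and its pixels never enter the pipeline.
bool tileMightBeVisible(int tx, int ty, float nearZ) {
    return nearZ < tileFarZ[ty * TW + tx];
}

// After pixels in a tile are written, refresh its cached farthest z.
void updateTile(int tx, int ty) {
    float farZ = 0.0f;
    for (int y = 0; y < TILE; ++y)
        for (int x = 0; x < TILE; ++x)
            farZ = std::max(farZ, zbuf[(ty * TILE + y) * W + tx * TILE + x]);
    tileFarZ[ty * TW + tx] = farZ;
}
```

One coarse comparison stands in for sixteen per-pixel tests and sixteen z-buffer reads, which is exactly the bandwidth saving that matters on a GPU but buys little on a processing-limited CPU.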
So should we "stop thinking about too hardware specific stuff"? I don't think so. The choice of algorithm heavily depends on the architecture. On the other hand, "premature optimization is the root of all evil". So even if you think you know what's faster on the hardware you're working with, it's safer to implement the straightforward approach and get it fully functional first, then profile for the real bottlenecks, and only then start worrying about how to improve performance (if necessary). It's so situation-dependent that any guess made without prior experience is almost always going to be wrong.
So, applying that to this thread's topic, I really advise clapton to first try the standard z-buffer. It's simple, quite efficient in most situations, and produces perfect results. After every other aspect of his application is finished, he can profile performance and determine whether the z-buffer is a bottleneck or not. If not, fantastic. If it is, see you all again in a few months... ;-)