An efficient method of clearing my Z buffer?

8 comments, last by CPPNick 14 years, 1 month ago
Hi, I'm writing a scanline-based software rasterizer and have just reached the stage of having Z-buffered, perspective-correct textured objects. After some profiling, it turns out that the most significant amount of time in my program is spent in my clear Z buffer routine, which literally loops through and reinitialises the array. Could anyone please suggest a better way to do this?

I did consider having two Z buffers, with a separate thread to clear the one not currently in use, but I'm sure there must be a better way, as I'm using SDL and its function to clear each frame doesn't show up anywhere near as high in my "time spent".

    void Rasterizer::clear_z_buffer()
    {
        for(int i = 0; i < m_screen_width; ++i)
        {
            for(int j = 0; j < m_screen_height; ++j)
            {
                m_z_buffer[i][j] = 1.0f;
            }
        }
    }

Thank you!
http://stowelly.co.uk/
For starters I wouldn't use a 2D array; I'd use a 1D array sized as a 2D array.

After that it's just a matter of filling the buffer quickly with the data; std::fill() or std::fill_n() would do the job.
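A minimal sketch of that suggestion, assuming a hypothetical `ZBuffer` class standing in for the depth-buffer part of the original `Rasterizer` (the class name, `at()` helper and default far value of 1.0f are illustrative, not from the thread):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the depth buffer inside the poster's Rasterizer.
class ZBuffer {
public:
    ZBuffer(int width, int height)
        : m_width(width), m_height(height),
          m_depth(static_cast<std::size_t>(width) * height, 1.0f) {}

    // One linear pass over contiguous memory.
    void clear(float far_value = 1.0f) {
        std::fill(m_depth.begin(), m_depth.end(), far_value);
    }

    // 2D access mapped onto the 1D storage, row-major: y selects the row.
    float& at(int x, int y) {
        assert(x >= 0 && x < m_width && y >= 0 && y < m_height);
        return m_depth[static_cast<std::size_t>(y) * m_width + x];
    }

private:
    int m_width;
    int m_height;
    std::vector<float> m_depth;
};
```

Because the storage is one contiguous block, the clear touches memory strictly in order, whichever way the scanline code indexes it afterwards.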
You should be able to use an SDL_Surface for it, since SDL sometimes has hardware-accelerated drawing. Otherwise, looping through manually is faster than memset.
There is a trick you can pull if you are guaranteed to be redrawing over the whole screen every frame. Store the z values signed. Then on odd frames flip the z values and change the z compare mode to "larger". This will work as long as you are guaranteed to write over every pixel every frame.
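One way to make the trick above concrete, as a sketch only. It assumes (this is not stated in the thread) that depth is stored as a "nearness" value n = 1 - z with z in [0, 1), so the far plane maps to 0 and a value from the previous frame, carrying the opposite sign, can never win the comparison. `FlippingZBuffer`, `begin_frame` and `test_and_write` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a sign-flipping depth buffer that is never cleared.
// Assumption: stored value is sign * (1 - z), z in [0, 1), nearer = larger magnitude.
class FlippingZBuffer {
public:
    explicit FlippingZBuffer(std::size_t pixel_count)
        : m_stored(pixel_count, 0.0f), m_sign(1.0f) {}

    // Call once per frame instead of clearing.
    void begin_frame() { m_sign = -m_sign; }

    // Returns true and records the depth if the fragment is visible.
    bool test_and_write(std::size_t index, float z) {
        assert(index < m_stored.size() && z >= 0.0f && z < 1.0f);
        const float nearness = 1.0f - z;
        // Values written this frame compare by nearness. Values left over from
        // the previous frame have the opposite sign, so m_sign * stored <= 0
        // and they always lose.
        if (nearness > m_sign * m_stored[index]) {
            m_stored[index] = m_sign * nearness;
            return true;
        }
        return false;
    }

private:
    std::vector<float> m_stored;
    float m_sign;
};
```

A pixel that is skipped for a whole frame keeps a value with the current frame's sign two frames later, and that stale depth can wrongly reject new fragments. That is why the trick needs every pixel to be written every frame.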
Modern graphics cards, as well as having the actual array of depth values, also have a quad-tree hierarchy on top of it. Each node in the hierarchy stores the min/max depth value in the cells below it and also has a flag specifying whether there are any values stored below it at all.
Using this system, clearing the buffer only involves setting a flag on the root node ;) however, your ZReads and ZWrites obviously become more complex.
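For a software rasterizer, a much-reduced version of that idea can be sketched with a single level of tiles instead of a full quad-tree. This is an illustration under assumptions, not a description of how any particular GPU works: `TiledZBuffer`, `kTile` and the 8-pixel tile size are made up for the example, and the min/max depth per node is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kTile = 8;  // tile edge in pixels; an arbitrary choice for this sketch

// One-level simplification: each tile has a "cleared" flag, clear() only sets
// the flags, and a tile's depth values are reset lazily on its first write.
class TiledZBuffer {
public:
    TiledZBuffer(int width, int height)
        : m_width(width), m_height(height),
          m_tiles_x((width + kTile - 1) / kTile),
          m_depth(static_cast<std::size_t>(width) * height, 1.0f),
          m_cleared(static_cast<std::size_t>(m_tiles_x) * ((height + kTile - 1) / kTile), 1) {}

    // Cost is proportional to the number of tiles, not the number of pixels.
    void clear() { std::fill(m_cleared.begin(), m_cleared.end(), 1); }

    float read(int x, int y) const {
        return m_cleared[tile_of(x, y)] ? 1.0f : m_depth[pixel_of(x, y)];
    }

    void write(int x, int y, float z) {
        const std::size_t t = tile_of(x, y);
        if (m_cleared[t]) {  // first write since clear(): reset just this tile
            const int x0 = (x / kTile) * kTile, x1 = std::min(x0 + kTile, m_width);
            const int y0 = (y / kTile) * kTile, y1 = std::min(y0 + kTile, m_height);
            for (int row = y0; row < y1; ++row)
                std::fill(m_depth.begin() + pixel_of(x0, row),
                          m_depth.begin() + pixel_of(x0, row) + (x1 - x0), 1.0f);
            m_cleared[t] = 0;
        }
        m_depth[pixel_of(x, y)] = z;
    }

private:
    std::size_t pixel_of(int x, int y) const {
        assert(x >= 0 && x < m_width && y >= 0 && y < m_height);
        return static_cast<std::size_t>(y) * m_width + x;
    }
    std::size_t tile_of(int x, int y) const {
        return static_cast<std::size_t>(y / kTile) * m_tiles_x + x / kTile;
    }

    int m_width, m_height, m_tiles_x;
    std::vector<float> m_depth;
    std::vector<unsigned char> m_cleared;
};
```

The extra flag check on every read and write is the "more complex ZReads and ZWrites" cost mentioned above; whether it pays off depends on how much of the screen is actually touched each frame.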


Quote:Original post by phantom
For starters I wouldn't use a 2D array; I'd use a 1D array sized as a 2D array.

After that it's just a matter of filling the buffer quickly with the data; std::fill() or std::fill_n() would do the job.


Thanks. What does std::fill do differently from just iterating and assigning values to the elements? I'm trying to avoid any use of the STL in my app.


Quote:Original post by zacaj
You should be able to use an SDL_Surface for it, since SDL sometimes has hardware-accelerated drawing. Otherwise, looping through manually is faster than memset.


I'm using software-only SDL for my app, but it does seem that an SDL fill on a surface is much quicker than how I am doing it. I may try that and benchmark it... I'm assuming this will require me to use fixed-point values for my Z buffer, though.


Quote:Original post by ibebrett
There is a trick you can pull if you are guaranteed to be redrawing over the whole screen every frame. Store the z values signed. Then on odd frames flip the z values and change the z compare mode to "larger". This will work as long as you are guaranteed to write over every pixel every frame.



Ah, I do like this idea. I'm not drawing over every pixel, but I might be able to figure out a way of having multiple buffers and utilise it as part of my swap chain: as the back buffer gets cleared, I could swap in the Z buffer for this and render to that next frame. Hmm, will have a think, as it might be possible to get it almost for free this way.


Quote:Original post by Hodgman
Modern graphics cards, as well as having the actual array of depth values, also have a quad-tree hierarchy on top of it. Each node in the hierarchy stores the min/max depth value in the cells below it and also has a flag specifying whether there are any values stored below it at all.
Using this system, clearing the buffer only involves setting a flag on the root node ;) however, your ZReads and ZWrites obviously become more complex.


This does sound a lot more complex than I'm willing to do for this, but if I can't gain any significant performance with other methods I might look into it.


Thanks a lot for your help, guys!
Quote:Original post by Stowelly
Quote:Original post by phantom
For starters I wouldn't use a 2D array; I'd use a 1D array sized as a 2D array.

After that it's just a matter of filling the buffer quickly with the data; std::fill() or std::fill_n() would do the job.


Thanks. What does std::fill do differently from just iterating and assigning values to the elements? I'm trying to avoid any use of the STL in my app.


Technically, probably very little; it'll correctly handle non-POD types and use the best system for setting the memory for POD types. The compiler might be able to do more aggressive inlining as well in some situations.

The bigger question however is why you want to avoid using the Standard C++ Library in your app?
You have a lot of cache misses in your clear_z_buffer implementation.

    void Rasterizer::clear_z_buffer()
    {
        for(int i = 0; i < m_screen_width; ++i)
        {
            for(int j = 0; j < m_screen_height; ++j)
            {
                m_z_buffer[i][j] = 1.0f;
            }
        }
    }


You should declare your buffer as m_z_buffer[height][width], because of the way C/C++ stores its multidimensional arrays. I'll try to explain this by example.

When you have these declarations:

float buffer[1024][768]; // It should be float buffer[768][1024] for it to be efficient
float *p = (float*)buffer;

These statements would be true:

&buffer[1][0] == &p[1*768 + 0];
&buffer[0][1] == &p[0*768 + 1];

So if you're copying and incrementing the first number in the [][] pair, you're jumping around by 768 elements, which isn't particularly cache friendly; it's far more efficient when the accesses are done linearly. I remember this being explained in some book (I don't remember which one, though; it could have been Game Coding Complete by Mike McShaffry). It also did a small benchmark. The performance difference between correct and incorrect use of the [][] was huge.
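The layout claim above can be checked directly. The following is a small sketch using the same 1024 x 768 shape; `g_buffer` and `element_offset` are names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same shape as the example above: 1024 rows of 768 floats each.
static float g_buffer[1024][768];

// Distance, in floats, from the start of the array to g_buffer[first][second].
std::size_t element_offset(int first, int second) {
    assert(first >= 0 && first < 1024 && second >= 0 && second < 768);
    const auto base = reinterpret_cast<std::uintptr_t>(&g_buffer[0][0]);
    const auto elem = reinterpret_cast<std::uintptr_t>(&g_buffer[first][second]);
    return (elem - base) / sizeof(float);
}
```

Stepping the second index moves one float at a time, while stepping the first index moves a whole row of 768 floats, so the inner loop should run over the second index.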


And then your function should look like this:

    void Rasterizer::clear_z_buffer()
    {
        for(int j = 0; j < m_screen_height; ++j)
        {
            for(int i = 0; i < m_screen_width; ++i)
            {
                m_z_buffer[j][i] = 1.0f;
            }
        }
    }


Or this:

    void Rasterizer::clear_z_buffer()
    {
        float *pData = (float*)m_z_buffer;
        float *pEnd = pData + (m_screen_width * m_screen_height);
        while (pData != pEnd) *pData++ = 1.0f;
    }


However, the compiler will most likely optimize the first variation to something similar to the second one.

Steve McConnell confirmed this optimization to be applied by the compiler in his book Code Complete, 2nd Ed.

[Edited by - Giedrius on February 17, 2010 10:21:39 AM]
Quote:Original post by ibebrett
There is a trick you can pull if you are guaranteed to be redrawing over the whole screen every frame. Store the z values signed. Then on odd frames flip the z values and change the z compare mode to "larger". This will work as long as you are guaranteed to write over every pixel every frame.
FYI, I successfully use this technique in my own software 3D engine.
I also have implemented an E-Buffer and S-Buffer in addition to a flipping and non-flipping-Z-buffer, so I've tried the lot really.
The flipping Z-buffer noted here works well, but I actually get a little more speed from the S-Buffer (with a slight loss of accuracy for intersecting polygons).
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Giedrius.
