An efficient method of clearing my Z buffer?
Hi, I'm writing a scanline-based software rasterizer and have just reached the stage of having Z-buffered, perspective-correct textured objects.
After some profiling, it turns out that the most significant amount of time in my program is spent in my clear-Z-buffer routine, which literally loops through the array and reinitialises it.
Could anyone please suggest a better way to do this?
I did consider having two Z buffers, with a separate thread clearing the one not currently in use, but I'm sure there must be a better way: I'm using SDL, and its function to clear the screen each frame doesn't show up anywhere near as high in my "time spent".
[code]
void Rasterizer::clear_z_buffer()
{
    for(int i = 0; i < m_screen_width; ++i)
    {
        for(int j = 0; j < m_screen_height; ++j)
        {
            m_z_buffer[i][j] = 1.0f;
        }
    }
}
[/code]
Thank you!
For starters I wouldn't use a 2D array; I'd use a 1D array sized as a 2D array.
After that it's just a matter of filling the buffer quickly with the data; std::fill() or std::fill_n() would do the job.
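As a rough illustration of the suggestion above, here is a minimal sketch of a flat depth buffer cleared with std::fill. The class name DepthBuffer and its members are made up for the example (they are not from the original Rasterizer), and std::vector is used only for brevity; the same clear works on a raw float array with std::fill_n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the depth-buffer part of the poster's Rasterizer.
class DepthBuffer {
public:
    DepthBuffer(int width, int height)
        : m_width(width),
          m_height(height),
          m_data(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 1.0f) {}

    // One linear pass over contiguous memory; 1.0f is the far-plane depth.
    void clear(float far_depth = 1.0f) {
        std::fill(m_data.begin(), m_data.end(), far_depth);
    }

    // Pixel (x, y) lives at y * width + x, so each scanline is contiguous.
    float& at(int x, int y) {
        assert(x >= 0 && x < m_width && y >= 0 && y < m_height);
        return m_data[static_cast<std::size_t>(y) * m_width + x];
    }

    std::size_t size() const { return m_data.size(); }

private:
    int m_width;
    int m_height;
    std::vector<float> m_data;
};
```

Usage would be along the lines of `DepthBuffer zb(640, 480); zb.clear();` once per frame, with `zb.at(x, y)` in the scanline loop.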
You should be able to use an SDL_Surface for it, since SDL sometimes has hardware-accelerated drawing. Otherwise, looping through manually is faster than memset.
There is a trick you can pull if you are guaranteed to be redrawing over the whole screen every frame: store the z values signed, then on odd frames flip the sign of the z values and change the z compare mode to "larger". This works only as long as you are guaranteed to write over every pixel every frame.
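One possible way to sketch this trick, under assumptions that are mine and not the poster's: depths are normalised to [0, 1), even frames store z - 1 (always negative) with a "less" compare, and odd frames store 1 - z (always positive) with a "greater" compare. Whatever the previous frame left behind then has the opposite sign and always loses the depth test, so no clear is needed. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a sign-flipping depth buffer that is never cleared.
class FlippingDepthBuffer {
public:
    FlippingDepthBuffer(int width, int height)
        : m_width(width),
          // Start positive, as if an odd frame had just been drawn.
          m_data(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 1.0f),
          m_odd(true) {}

    // Call once at the start of every frame instead of clearing.
    void begin_frame() { m_odd = !m_odd; }

    // z must be in [0, 1). Returns true (and stores z) if the fragment is
    // nearer than anything drawn at this pixel so far in the current frame.
    bool test_and_write(int x, int y, float z) {
        assert(z >= 0.0f && z < 1.0f);
        float& stored = m_data[static_cast<std::size_t>(y) * m_width + x];
        // Even frame: values in [-1, 0), smaller is nearer.
        // Odd frame:  values in (0, 1],  larger is nearer.
        const float encoded = m_odd ? (1.0f - z) : (z - 1.0f);
        const bool nearer = m_odd ? (encoded > stored) : (encoded < stored);
        if (nearer) stored = encoded;
        return nearer;
    }

private:
    int m_width;
    std::vector<float> m_data;
    bool m_odd;
};
```

The catch is the one already stated: a pixel that is not written during a frame keeps a value from two frames back, which has the same sign as the current frame and can wrongly reject fragments. This encoding also puts the nearest depths at magnitudes close to 1, where floats are coarser than near 0.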
Modern graphics cards, as well as having the actual array of depth values, also have a quad-tree hierarchy on top of it. Each node in the hierarchy stores the min/max depth value in the cells below it and also has a flag specifying whether there are any values stored below it at all.
Using this system, clearing the buffer only involves setting a flag on the root node ;) however, your ZReads and ZWrites obviously become more complex.
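A much simplified software take on this idea is sketched below. It is not how any particular GPU does it: it uses a single level of tiles instead of a quad-tree and keeps no min/max values, only a per-tile "stale" flag. Clearing sets one flag per tile, and a tile is only filled with the far value the first time something is written into it. The tile size, class name and method names are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-level "lazy clear": one flag per tile instead of a quad-tree.
class TiledDepthBuffer {
public:
    static constexpr int kTile = 8;  // assumed tile edge, in pixels

    TiledDepthBuffer(int width, int height)
        : m_width(width),
          m_height(height),
          m_tiles_x((width + kTile - 1) / kTile),
          m_depth(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 1.0f),
          m_stale(static_cast<std::size_t>(m_tiles_x) *
                      static_cast<std::size_t>((height + kTile - 1) / kTile),
                  0) {}

    // "Clear" touches one byte per tile rather than one float per pixel.
    void clear() {
        std::fill(m_stale.begin(), m_stale.end(), static_cast<unsigned char>(1));
    }

    // Reads from a stale tile report the far value without touching the pixels.
    float read(int x, int y) const {
        return m_stale[tile_of(x, y)] ? 1.0f : m_depth[pixel_of(x, y)];
    }

    // Standard "less" depth test; the tile is really cleared on first write.
    bool test_and_write(int x, int y, float z) {
        materialize(x, y);
        float& stored = m_depth[pixel_of(x, y)];
        if (z < stored) {
            stored = z;
            return true;
        }
        return false;
    }

private:
    std::size_t pixel_of(int x, int y) const {
        assert(x >= 0 && x < m_width && y >= 0 && y < m_height);
        return static_cast<std::size_t>(y) * m_width + x;
    }

    std::size_t tile_of(int x, int y) const {
        return static_cast<std::size_t>(y / kTile) * m_tiles_x + (x / kTile);
    }

    // Fill the tile containing (x, y) with the far value if it is still stale.
    void materialize(int x, int y) {
        unsigned char& stale = m_stale[tile_of(x, y)];
        if (!stale) return;
        const int x0 = (x / kTile) * kTile;
        const int y0 = (y / kTile) * kTile;
        const int x1 = std::min(x0 + kTile, m_width);
        const int y1 = std::min(y0 + kTile, m_height);
        for (int row = y0; row < y1; ++row) {
            std::fill_n(m_depth.begin() + static_cast<std::ptrdiff_t>(pixel_of(x0, row)),
                        x1 - x0, 1.0f);
        }
        stale = 0;
    }

    int m_width;
    int m_height;
    int m_tiles_x;
    std::vector<float> m_depth;
    std::vector<unsigned char> m_stale;
};
```

Note that this only defers the work: if every tile gets drawn into each frame, the total number of writes is about the same, and every depth test pays for an extra flag check. It would only be expected to help when large parts of the screen stay empty.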
Quote:Original post by phantom
For starters I wouldn't use a 2D array; I'd use a 1D array sized as a 2D array.
After that it's just a matter of filling the buffer quickly with the data; std::fill() or std::fill_n() would do the job.
Thanks. What does std::fill do differently from just iterating and assigning values to the elements? I'm trying to avoid any use of the STL in my app.
Quote:Original post by zacaj
You should be able to use an SDL_Surface for it, since SDL sometimes has hardware-accelerated drawing. Otherwise, looping through manually is faster than memset.
I'm using software-only SDL for my app, but it does seem that SDL's fill on a surface is much quicker than how I am doing it. I may try that and benchmark it... I'm assuming this will require me to use fixed-point values for my Z buffer, though.
Quote:Original post by ibebrett
There is a trick you can pull if you are guaranteed to be redrawing over the whole screen every frame: store the z values signed, then on odd frames flip the sign of the z values and change the z compare mode to "larger". This works only as long as you are guaranteed to write over every pixel every frame.
Ah, I do like this idea. I'm not drawing over every pixel, but I might be able to figure out a way of keeping multiple buffers and using this as part of my swap chain: as the back buffer gets cleared, I could swap in the Z buffer for it and render to that next frame. Hmm, I'll have a think, as it might be possible to get it almost for free this way.
Quote:Original post by Hodgman
Modern graphics cards, as well as having the actual array of depth values, also have a quad-tree hierarchy on top of it. Each node in the hierarchy stores the min/max depth value in the cells below it and also has a flag specifying whether there are any values stored below it at all.
Using this system, clearing the buffer only involves setting a flag on the root node ;) however, your ZReads and ZWrites obviously become more complex.
This does sound a lot more complex than I'm willing to go for this, but if I can't gain any significant performance with the other methods, I might look into it.
Thanks a lot for your help, guys!
Quote:Original post by Stowelly
Quote:Original post by phantom
For starters I wouldn't use a 2D array; I'd use a 1D array sized as a 2D array.
After that it's just a matter of filling the buffer quickly with the data; std::fill() or std::fill_n() would do the job.
Thanks. What does std::fill do differently from just iterating and assigning values to the elements? I'm trying to avoid any use of the STL in my app.
Technically, probably very little; it'll correctly handle non-POD types and use the best system for setting the memory for POD types. The compiler might be able to do more aggressive inlining as well in some situations.
The bigger question however is why you want to avoid using the Standard C++ Library in your app?
You have a lot of cache misses in your clear_z_buffer implementation.
void Rasterizer::clear_z_buffer()
{
    for(int i = 0; i < m_screen_width; ++i)
    {
        for(int j = 0; j < m_screen_height; ++j)
        {
            m_z_buffer[i][j] = 1.0f;
        }
    }
}
You should declare your buffer as m_z_buffer[height][width], because of the way C/C++ stores its multidimensional arrays. I'll try to explain this by example.
When you have these declarations:
float buffer[1024][768]; // It should be float buffer[768][1024] for it to be efficient
float *p = (float*)buffer;
These statements would be true:
&buffer[1][0] == &p[1*768 + 0];
&buffer[0][1] == &p[0*768 + 1];
So if you iterate by incrementing the first index of the [][] pair in the inner loop, you jump around by 768 elements per step, which isn't particularly cache friendly; it's far more efficient when the accesses are linear. I remember this being explained in some book (I don't remember which one, though; it could have been Game Coding Complete by Mike McShaffry). It also did a small benchmark, and the performance difference between correct and incorrect use of the [][] was huge.
And then your function should look like this:
void Rasterizer::clear_z_buffer()
{
    // Rows in the outer loop, so the inner loop walks memory linearly.
    for(int j = 0; j < m_screen_height; ++j)
    {
        for(int i = 0; i < m_screen_width; ++i)
        {
            m_z_buffer[j][i] = 1.0f;
        }
    }
}
Or this:
void Rasterizer::clear_z_buffer()
{
    // Treat the 2D array as one contiguous run of width * height floats.
    float *pData = (float*)m_z_buffer;
    float *pEnd = pData + (m_screen_width * m_screen_height);
    while (pData != pEnd)
        *pData++ = 1.0f;
}
However, the compiler will most likely optimize the first variation to something similar to the second one.
Steve McConnell confirmed that compilers apply this optimization in his book Code Complete, 2nd Ed.
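If you want to measure the effect yourself rather than take it on faith, something like the following could serve as a starting point. It is only a sketch: the dimensions, function names and the time_ms helper are made up for the example, it uses a flat row-major array instead of the member array above, and no particular numbers are claimed, since an optimizing compiler may vectorize or even interchange the loops.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical buffer dimensions for the experiment.
constexpr int kWidth = 1024;
constexpr int kHeight = 768;

// Row-major storage: element (x, y) lives at y * kWidth + x.
// Rows in the outer loop: the inner loop touches consecutive addresses.
void clear_rows_outer(std::vector<float>& z) {
    for (int y = 0; y < kHeight; ++y)
        for (int x = 0; x < kWidth; ++x)
            z[static_cast<std::size_t>(y) * kWidth + x] = 1.0f;
}

// Columns in the outer loop: the inner loop jumps kWidth floats per step.
void clear_columns_outer(std::vector<float>& z) {
    for (int x = 0; x < kWidth; ++x)
        for (int y = 0; y < kHeight; ++y)
            z[static_cast<std::size_t>(y) * kWidth + x] = 1.0f;
}

// Average wall-clock time of one call to fn(z), in milliseconds.
template <typename Fn>
double time_ms(Fn fn, std::vector<float>& z, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) fn(z);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

You would allocate `std::vector<float> z(kWidth * kHeight)` and compare `time_ms(clear_rows_outer, z, 100)` against `time_ms(clear_columns_outer, z, 100)`, ideally in an optimized build and with the results printed so the work is not discarded.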
[Edited by - Giedrius on February 17, 2010 10:21:39 AM]
Quote:Original post by ibebrett
There is a trick you can pull if you are guaranteed to be redrawing over the whole screen every frame: store the z values signed, then on odd frames flip the sign of the z values and change the z compare mode to "larger". This works only as long as you are guaranteed to write over every pixel every frame.
FYI, I successfully use this technique in my own software 3D engine.
I also have implemented an E-Buffer and S-Buffer in addition to a flipping and non-flipping-Z-buffer, so I've tried the lot really.
The flipping Z-buffer noted here works well, but I actually get a little more speed from the S-Buffer (with a slight loss of accuracy for intersecting polygons).