float or double depth buffer?

Started by
30 comments, last by fir 9 years, 10 months ago

I'm mending and rewriting my software rasterizer (the one from the screenshot I presented

before:

[attachment=22171:tie3.png]

) and I wonder whether I'd better use float or double for the values in the depth buffer. I've got no idea; could someone hint at something?


try both, then make an informed decision

Madhed is right. It's easy enough to just try both and compare output quality vs execution time to make your own informed decision based on the specific domain/application you're working with.

You might also consider fixed-point and integral buffers, not entirely uncommon in hardware. Also, a lot of tuning can go into figuring out _what_ you store in your depth buffer (normalized z? w? etc.) rather than just _how_ it's stored.

Sean Middleditch – Game Systems Engineer – Join my team!

In the case of a software rasterizer, you should use the native floating-point format in the first place. On many systems that is the 64-bit float, not the 32-bit one. Falling back to 32-bit on systems where 64-bit is native might even decrease performance (I once set a 16-bit depth buffer on a GPU that preferred D24S8, and the frame rate halved!).

How to find out what the hardware likes and is optimized for is another tale, though. D3D offers things such as a depth-format check, if I remember well, but for a software rasterizer you will need something else.


Alright, you're probably right, I can try them both.

I would also like some hints on general optimization. Right now I'm just using a straightforward scanline approach (walking down the triangle edges from top to bottom and drawing each scanline with depth).

Probably some more elaborate techniques are available, but I'm getting lost in this (when I read some threads, people were using more optimized approaches) and I didn't understand what I should try first.


I'm using the MinGW compiler and an old Core 2 Duo processor right now (on 32-bit XP). Curiously, I compiled the previous version of this rasterizer with an old Borland compiler (I was very accustomed to it, so I used it for a couple of years), so the generated code was weak (bcc32 is a compiler from around 2000), and as far as I remember the model above gave about 10 fps there (compiled by bcc32 and run on the Core 2 Duo). Now I want to rewrite it for MinGW and optimize it as far as I can. I could touch assembly (which, sadly, I know only weakly) if I knew what I really should do, but I've got no idea.

You may set up a 16-bit depth buffer, but the arithmetic itself will still be performed on a 32-bit (or 64-bit) float, requiring you to convert to and from the two bytes. This 16-bit reduction can still be reasonable if you want to save cache bandwidth, or compute with short integers, but the cache layout of 2D arrays (render targets or texture samplers) is usually optimized for 4-byte or 8-byte storage atoms.

Cache coherency in enormous 2D arrays is mainly achieved by smartly subdividing the areas, and, after a fetch or write, resuming all halted threads that demanded the sought memory operation, then continuing with threads whose memory operations are close to the current cache burst. So the atom size matters far less than good management of memory and of the threads that ask for it.

You are writing a software rasterizer, but you should still construct it to account for many (many, many) cores, as if it were to run on a GPU. Cache coherency is a crucial thing; if you achieve good cache flow, you will see an impossible speed boost.


But I've got no idea at all how to obtain that cache coherency.

Right now I've just got an array of triangles which I loop over and transform -> project -> rasterize with the scanline approach into the frame buffer in a very straightforward way. It doesn't even work right now, as I'm rebuilding the previous half-spaghetti into a somewhat tidier system, but in two or three days I will mend it, profile it more, and try to think about how to rebuild my straightforward approach into something a bit quicker, if possible.


For example, if you read some outer pointer's memory (not cached on the stack) on every cycle (pixel) while reading or writing a big (2D) array of pixels, you have just reduced your speed several times over. It can be 10 or 100 times.


Here, as input, I've got a raw array of triangles (sadly only 9 floats each, as I've got only white, uncolorized geometry).

As output, I draw scanline triangles into the frame and depth buffers.

Each triangle in general jumps around on those buffers, though they are probably somewhat (if not strongly) coherent in the model file.

In the middle there is the 3D transformation of the input triangle into eye space, then 3D plane clipping, projection, then 2D clipping. That would be all, if I remember correctly.

All of this is somewhat coherent as far as RAM access goes, though not 100% coherent in access to the frame and depth buffers. Still, I worry that this middle calculation has a bigger impact than the RAM flow,

but I'm not sure.

(When I mend it today or tomorrow I will try to profile it a bit.)

