Z performance & alpha blending compat

3 comments, last by HateWork 13 years, 8 months ago
Hi there, I've already searched the forums for this and it isn't there, so I'm posting the situation here.
I've made an illustration to make the problem easy to understand. It's quite simple and direct, and it won't take you much time to read.

I implemented the z-buffer very easily. It sorts all of my objects as I want. BUT performance dropped through the floor: on a very low-end device I was getting 204 fps before enabling Z, and now with Z enabled I get 114, which is a huge difference. I'm not using all of those fps (I only need 45-55), but the remaining headroom is reserved for the application, so the more fps the better, right?
Now, the SDK says that when the z-buffer is in use, performance can be gained by drawing objects from front to back, which sounds reasonable. OK, I did that, and went from 114 fps to 128; at least that's something.

The real problem is that I'm drawing textured quads (fixed-pipeline functions), and when a texture contains an alpha channel with transparent/semi-transparent areas I get the following:



The scenario is:
On the left, we have two textured quads: a soccer ball with transparent and semi-transparent areas, and a second quad which is an opaque square. The ball is always drawn first. The alpha blending works fine: as we can see, the ball is perfectly merged with the light blue background.
On the right, we draw from front to back and successfully put the ball in front of the square, but note the corners and the border surrounding the ball: there should be red pixels from the square there.
This is totally normal and I understand why it happens. The z-test and alpha blending are working as they should. It happens because the ball is drawn first and blends against whatever is behind it, which in this case is the blue background; then, when the square is drawn at the same coordinates as the ball, the pixels sharing the same space as the ball's rect are not drawn, because the z-test is doing its job.

The question is quite obvious: how can I draw the square behind the ball and keep the missing red pixels visible?

An obvious answer would be: easy! In this case, draw the square first and the ball last. Well, that's not possible, because in my application manual sorting is not possible. The square and ball are just an example; in practice, many objects are drawn in a batch.
If I draw all of the objects in reverse order it works fine, but then what would be the point of this post?

Another question: if I forget about this front-to-back business to get rid of the alpha issue, and draw normally from back to front, is there another way to get more performance with the z-buffer enabled? Any tips?

Thanks in advance, and I hope you liked my picture ;).
For opaques, the ideal way to render is to draw front to back. If you don't do it this way you sacrifice performance, but that's about it.

For transparents you have to render back to front. If you don't do it this way you will get incorrect results with alpha-blending. You'll also want to disable z-writes (but leave z-testing on).

If it's possible, I would suggest putting opaques and transparents into separate "bins" when they're submitted to your renderer. Then you can sort them separately.
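The two-bin idea can be sketched like this (the `DrawItem` record and the "larger depth = farther away" convention are made up for illustration; a real renderer would also carry textures, buffers, and so on):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical item record; "depth" is distance from the camera
// (larger = farther away).
struct DrawItem {
    float depth;
    bool  hasAlpha;  // true if the texture has transparent/semi-transparent texels
};

// Split submitted items into two bins, then sort each bin as described:
// opaques front-to-back, transparents back-to-front.
void sortForRendering(const std::vector<DrawItem>& submitted,
                      std::vector<DrawItem>& opaques,
                      std::vector<DrawItem>& transparents) {
    for (const DrawItem& it : submitted)
        (it.hasAlpha ? transparents : opaques).push_back(it);

    // Opaques: nearest first, so the z-test rejects hidden pixels early.
    std::sort(opaques.begin(), opaques.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.depth < b.depth; });

    // Transparents: farthest first, so alpha blending composites correctly.
    std::sort(transparents.begin(), transparents.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.depth > b.depth; });
}
```

The renderer then draws the opaque bin first (z-writes on), followed by the transparent bin (z-test on, z-writes off).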
Thanks MJP for the reply.
If I disable z-writes, the z-buffer doesn't seem to work at all. How can I leave only the test enabled (just in case)? I'm using D3D9, clearing the z-buffer and setting it to 0, and the z-func is greater-or-equal.
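For reference, in D3D9 the depth test and depth writes are controlled by separate render states, so the test can stay on while writes are off (a sketch; `device` stands for your `IDirect3DDevice9*`):

```cpp
// Keep the depth TEST enabled but stop WRITING depth for the transparent pass.
device->SetRenderState(D3DRS_ZENABLE, D3DZB_TRUE);   // z-test on
device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);   // z-writes off
// ... draw transparents back to front ...
device->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);    // restore for the opaque pass
```

With z-writes off, already-drawn opaque geometry still occludes the transparents via the test; the transparents just don't update the buffer themselves.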

I'm taking your advice and will rewrite the drawing/sorting algorithm to something like this (I'll describe it here in case someone needs something similar too; I have already implemented something quite close, and it gives me great performance for 2D content with the z-buffer off). So, here's my secret:

First I'll try with Z off (because I get more performance when drawing 2D). Then I'll try enabling it and changing the draw order.
- One big VB per FVF (one lock per frame). No additional SetStreamSource calls.
- Smaller fixed IBs per geometry. One lifetime write. Fewer additional SetIndices calls.
- I'll code a drawing timeline. Items are added to this line in the order they were created. If an item has the same (or similar) geometry as an item earlier in the line, AND it does not occlude (partially or totally) other items in the line, then those objects are batched together. If an item entirely occludes other item(s) and has no alpha, then the occluded items are deleted from the line (this is, in effect, a cheap CPU-side z-occlusion test; it works on whole regions rather than per pixel, as the GPU does). If an item has the same geometry as other batchable items but occludes them partially, or occludes them totally but has alpha, then it cannot be batched and has to stay further forward in the line.

That's the main idea. I hope this helps someone writing a D3D-based library/program from scratch to draw 2D. It doesn't sacrifice performance, and it doesn't interfere with the z-buffer, scissor test or stenciling.
My current implementation gives me at most 235-248 fps at a medium-high content load (2D only), with many texture changes, on older, motherboard-integrated hardware.
I'll try this new implementation and post the results.

In the meantime, any z-buffer optimizations would be appreciated.
Since you are not actually using semi-transparent objects, and are just using alpha as a (fully opaque vs. fully transparent) mask, you should be using alpha TESTING, not alpha BLENDING, to get transparency; then you can draw from front to back. This way, the pixels in your soccer ball that fail the alpha test will not write Z values, so the corresponding pixels in the red quad will no longer be depth-killed.
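To illustrate why this works, here is a small software simulation of front-to-back drawing with alpha testing (the 3x3 "framebuffer" and the masks are invented for the example): a pixel that fails the alpha test writes neither color nor depth, so a quad drawn later, behind it, still shows through the hole.

```cpp
#include <array>

// A 3x3 toy framebuffer: color 'B' = background, 'W' = ball, 'R' = red quad.
// depthWritten marks pixels where a closer draw has already written Z.
struct ToyFramebuffer {
    std::array<char, 9> color{};
    std::array<bool, 9> depthWritten{};
    ToyFramebuffer() { color.fill('B'); }

    // Draw a full-screen layer, front to back. alphaMask[i] == false means
    // the texel is fully transparent and FAILS the alpha test: no color, no Z.
    void draw(char c, const std::array<bool, 9>& alphaMask) {
        for (int i = 0; i < 9; ++i) {
            if (!alphaMask[i]) continue;     // killed by the alpha test
            if (depthWritten[i]) continue;   // killed by the z-test
            color[i] = c;
            depthWritten[i] = true;
        }
    }
};
```

Drawing the ball first (front) with transparent corners and then the opaque red quad (behind) leaves red, not background, in the corners, which is exactly the missing-pixel problem solved.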
I'm back to post my results. Basically, what I did: I created my own z-test/z-buffer that works on regions instead of pixels, and it runs on older hardware and older DX versions. It runs amazingly well in 2D scenarios, and I also added 3D support (although I haven't tested that yet). I used the steps I explained above, with special effort put into grouping objects with similar attributes.

RESULTS: It works great!!! Even better than the hardware z-test performed by the GPU. I'm back to my 235-248 fps, with z-depth implemented. I have no performance loss, and I even gain fps when an object totally occludes other objects (about 2-5 fps per object, though). Again, I put special effort into grouping objects for batching; this allows fewer texture and index buffer changes.
My graphics adapter is old and very basic, the most basic in its series (there can't exist a lesser model): an integrated, low-power mobile NVIDIA GeForce 7000M. With this adapter I get about 2350 fps with a blank screen; 850 fps with the graphics blank but the rest of the engine (input, sound, network, etc.) and the application logic running; and finally my 235-248+ when throwing everything at it. Please tell me right now if these numbers seem wrong; I haven't had the chance to compare against other implementations.
I wonder what I'll be able to accomplish when running my engine on newer top-end hardware. I'm excited about this.

It looks like z-testing and z-buffers in hardware are really expensive features, even on newer hardware. I recommend them (not really) if you're using them for 3D content (meshes) in a very deep 3D space and don't want to mess with complex code. But for 2D, and especially quads, per-pixel z-testing is not needed, at least not in the way the hardware performs it.
A well-crafted custom Z implementation seems to outperform it in all scenarios.

This thread turned out to be more about Z performance than alpha blending techniques. Take my post above as a big piece of advice.

Cheers.

