Archived

This topic is now archived and is closed to further replies.

Renus

6th generation PVS outdoor

Recommended Posts

6th generation video cards want us to sort our surfaces to minimize texture- and state change and to reuse vertex buffer data. This leads to an engine architecture that keeps some kind of surface pipelines (e.g. solid, transparent etc) that must be sorted on each frame. To minimize this sorting effort, we use the fact that appr. 90% of all surfaces remain the same for the next frame and insert only the surfaces which came in sight and remove those from the pipelines that went out of sight since the last frame. For this task of looking for the "PDS" (potential difference set ) the traditional tree structures (octree, BSP) seem not to be the ideal choice. They relate to "is-contained-in" information rather than "is-adjacent-to". Did you find alternatives? To me, this seems to be the essential of a "modern" engine. - thomas

Share this post


Link to post
Share on other sites
Noone? *hmpf*

Here are my thoughts so far:
Octrees that operate on objects in the scene have an ideal structure in that they refine data resolution only in high-detailed regions. Moreover, they easily allow to add or reject further child cubelets on runtime.

The worse thing is the traversal. It works fine if you start with a scene covering cubelet rendering recursively all childs in view. But for the reasons mentioned in the initial post it is a Bad Thing to touch all cubelets when 90% remain in view and we just need the ones that came insight / went out of sight.

For this task the "traditional" traversion algorithms fail.
I first thought of just keeping a list of all fathers where not all of the 8 childs are in view / out of view. Instead of traversing the tree one just looks at those fathers which cover all cubelets on the boundary of the viewing frustum. Unfortunetaly the data structure of an octree does not help much on telling what cubelets lies adjacent. A kind of cubelet-linked list could work better, still thinking on such a "linked-tree" structure.


- thomas

Share this post


Link to post
Share on other sites
If anything some sort of spatial subdivision/culling scheme will still need to be used. BSP has some downsides (its static nature, precompile time etc) that will probably eventually lead to alternate solutions becoming dominant.

I envision the best choice with the 6th gen cards to be octrees which are rendered using a shade-tree. Shade-trees bring down state changes to an absolute minimum. You can find more info on them in this terrific flipCode thread.





The quickest way to double your money is to fold it in half and put it back in your pocket.

David Hof
Destiny3D engine programmer
mail - icq - homepage

Share this post


Link to post
Share on other sites
Check out the SurRender (and Renderware) dPVS stuff...

The dPVS reference manual here goes into some very interesting detail about the algorithms they chose for a variety of things including spatial structures (Octree, BSP etc), Occlusion, PVS, temporal coherence (what you mention about surfaces staying the same over short times) etc:

http://www.surrender3d.com/umbra/download.html


BTW: You can avoid sorting explicitly per frame with a bit of careful thought - think linked lists per texture and trees etc - you get perfecly sorted textures and states for the price of appending a node to a list

--
Simon O''''Connor
Creative Asylum Ltd
www.creative-asylum.com

Share this post


Link to post
Share on other sites
Thanks for your hints!

I also saw the Renderware solution and think thats (in general) the way to go: pack visibility information into world sectors, how ever they are built. The immense memory amount will be available soon. I will give it a try, although many heuristic parameters seem to pop up with this solution and the work package takes some time.

For sorting my rendering pipelines I chosed another way than shader trees & Co. I managed to assign each surface a precompiled _int64 number where all state change information is bit-wise coded, including vertex data reuse. During the rendering, I sort this single number on the fly which takes less efforts and can be done while the GPU renders the last shot.

Thanks again,

- thomas

Share this post


Link to post
Share on other sites
Renus, although I agree its always good to compress stuff I think you kind of miss the point why you would use a tree instead of a array.

If I have three primitives (A,B,C), A and B have the exact same first texturestage but have a different texturestage at level 2, C has nothing the same as A and B.

Then the tree looks like this:


- A, set stage 1, 2
- B, set stage 2 (stage 1 stays the same)
- C, set stage 1,2,...


In other words, using a shade tree instead of a ''shade array'' prevented setting a whole texture stage.

Also, fuck the CPU overhead involved in the sorting, its nothing compared to setting a different texture/vertexbuffer.



The quickest way to double your money is to fold it in half and put it back in your pocket.

David Hof
Destiny3D engine programmer
mail - icq - homepage

Share this post


Link to post
Share on other sites
Countach,

I don''t make an array of them, sorry for my short lines.
I represent this tree in a 64-bit value which is not stored in an array, it is just use to sort the surface in place. So all I do during rendering is to sort new surfaces which came in sight into the rendering pipeline by comparing this _int64 with the actual surface being drawn - is it <=, then I insert the new surface pointer before the previous one. I "flatten" the tree to this _int64 to have as less work with it during rendertime as possible. The drawback is the need to put all state/vertex data/texture/pixel&vertex shader combinations into one _int64 value. This is only possible if I restrict myself on e.g. 32 different pixel shader (5 bits) etc.

Your example concerning the texture stages will lead to the same results as with the tree, but the rendering order is then
- A, set stage 1, 2
- C, set stage 1,2,...
- B, set stage 2 (stage 1 stays the same)
since A and C have same texture stage settings. When B comes up, only the difference (in this case stage 2) is set.

I agree that the tree solution is more general and versatile. Since I actually have the foxus on many topics which wait to be resolved, I take the design&implementation effort into account and am happy to have the whole stuff running. Over the last years the technology changed faster than I could follow with my refinements of the different algorithms used, and currently I am not sure where to set the balance between forthcoming hardware frustum culling (and occlusion culling? don''t know) and some pretty complicated algorithms that work NOW best in software.
Thats the problem if you have not a product deadline and hence a concrete target system..

Thanks,




- thomas

Share this post


Link to post
Share on other sites
Uups, Countach,

I missed your example. Since your primitive C has different stages than A and B the rendering order is as you wrote it, sorry.


- thomas

Share this post


Link to post
Share on other sites
Thanks, that clears stuff up a bit. Though ingenious I wouldn''t want to implement it that way because you sacrifice the option of easily adding new types of renderstates using inheritance.

On the topic of hardware implementations of culling, I wouldn''t count on any card featuring hardware culling even to become available for this year (the ATI Radeon2 won''t have it at least). So just keep everything in software until the hardware arrives. If you have some sort of plugin dll driver system you can just write a new driver. Also, don''t worry about optimizing anything that runs just on the CPU. Even if you would bubblesort the shadetree for instance the difference in fps with your highly optimized routine would be around 0.0000000000000001.





The quickest way to double your money is to fold it in half and put it back in your pocket.

David Hof
Destiny3D engine programmer
mail - icq - homepage

Share this post


Link to post
Share on other sites
Yes, I first need a software-only solution, thats true. I''d like to make the algorithm choice based on the hardware culling direction which should be already designed, but I have no informations on it. If someone has a link around that goes into such details I would be very grateful.

Since today Germany''s temperatures reach marks where hi-tec does not run anymore, I''ve plenty of time

- thomas

Share this post


Link to post
Share on other sites