
Is my frustum culling slow?


45 replies to this topic

#41 lipsryme   Members   -  Reputation: 1005


Posted 01 April 2013 - 05:18 AM

Oh my god, I've finally done it :) ... using the btAlignedArray from the Bullet Physics SDK.

And it is blazingly fast. The culling alone takes about 0.03 ms for 10k AABBs.

50k AABBs take around 0.53 ms, and 100k AABBs take 1.25 ms.

 

Big thanks to everyone in here!


Edited by lipsryme, 01 April 2013 - 05:24 AM.



#42 galop1n   Members   -  Reputation: 226


Posted 01 April 2013 - 05:33 AM

Remember, even the fastest frustum culling loop will be no match for hierarchical culling.

Most of a world is made of static geometry, so clustering it and merging the clusters' AABBs into some kind of hierarchical structure is only a preprocessing step. An octree can be a good choice: each node splits into 8 sub-nodes (neatly, that is two passes of a 4-box SIMD culling test). You then have useful information for the traversal: a node is fully visible, invisible, or partially culled. You can then, for example, push the primitives of a fully visible node without culling them individually.
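A minimal sketch of the three-way test described above, assuming a center/half-extent AABB and six inward-facing planes. The names `Plane`, `AABB`, `Node`, `classify` and `cull` are hypothetical and not taken from anyone's code in this thread. Fully visible nodes are pushed without further tests; partially culled nodes here accept their own primitives conservatively and recurse into their children.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical types for illustration only.
struct Plane { float nx, ny, nz, d; };            // inside when n.p + d >= 0
struct AABB  { float cx, cy, cz, ex, ey, ez; };   // center and half-extents

enum class Visibility { Outside, Intersecting, Inside };

// Classify a box against six inward-facing frustum planes.
Visibility classify(const AABB& b, const std::array<Plane, 6>& frustum) {
    bool fullyInside = true;
    for (const Plane& p : frustum) {
        // Signed distance of the box center to the plane.
        float dist = p.nx * b.cx + p.ny * b.cy + p.nz * b.cz + p.d;
        // Half-size of the box projected onto the plane normal.
        float radius = std::fabs(p.nx) * b.ex + std::fabs(p.ny) * b.ey
                     + std::fabs(p.nz) * b.ez;
        if (dist < -radius) return Visibility::Outside;   // entirely behind one plane
        if (dist < radius) fullyInside = false;           // straddles this plane
    }
    return fullyInside ? Visibility::Inside : Visibility::Intersecting;
}

struct Node {
    AABB bounds;
    std::vector<int> primitives;   // primitive ids stored at this node
    std::vector<Node> children;    // up to 8 for an octree
};

// Append every primitive below a node without testing anything.
void collectAll(const Node& n, std::vector<int>& out) {
    out.insert(out.end(), n.primitives.begin(), n.primitives.end());
    for (const Node& c : n.children) collectAll(c, out);
}

void cull(const Node& n, const std::array<Plane, 6>& frustum, std::vector<int>& out) {
    switch (classify(n.bounds, frustum)) {
    case Visibility::Outside:
        return;                       // skip the whole subtree
    case Visibility::Inside:
        collectAll(n, out);           // no further tests needed
        return;
    case Visibility::Intersecting:
        // Conservative: primitives of a straddling node are accepted as-is.
        out.insert(out.end(), n.primitives.begin(), n.primitives.end());
        for (const Node& c : n.children) cull(c, frustum, out);
        return;
    }
}
```

A real engine would likely test the primitives of a straddling leaf individually (for example with the 4-box SIMD test); that step is left out here to keep the sketch short.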

 

And you still seem to misunderstand the difference between type size and memory alignment, so study them again and again. Also add a bunch of asserts wherever you do an aligned load or store, to catch bugs as early as possible.
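To illustrate the size versus alignment point, here is a small sketch, assuming 4-byte floats and a 16-byte requirement such as the one SSE aligned loads (`_mm_load_ps`) have. `Vec4`, `AlignedVec4`, `isAligned` and `checkedAlignedPtr` are hypothetical names. Both structs have the same size, but only one of them is guaranteed to sit on a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same size, different alignment guarantees.
struct Vec4 { float x, y, z, w; };                     // aligned like a float
struct alignas(16) AlignedVec4 { float x, y, z, w; };  // aligned to 16 bytes

static_assert(sizeof(Vec4) == sizeof(AlignedVec4), "size is identical");
static_assert(alignof(Vec4) == alignof(float), "only float alignment is guaranteed");
static_assert(alignof(AlignedVec4) == 16, "16-byte alignment is guaranteed");

// True when the address is a multiple of 'alignment'.
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical wrapper: call this right before an aligned SIMD load or store
// so a misaligned pointer fails loudly in debug builds.
inline const float* checkedAlignedPtr(const float* p) {
    assert(isAligned(p, 16) && "aligned load from a misaligned address");
    return p;
}
```

A container that only knows `sizeof(Vec4)` may hand back storage that is 4-byte aligned, which is why an aligned allocator or array type is needed for the SIMD path.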



#43 kalle_h   Members   -  Reputation: 1345


Posted 01 April 2013 - 01:31 PM

galop1n, on 01 April 2013 - 05:33 AM, said:
Remember, even the fastest frustum culling loop will be no match for hierarchical culling.
Most of a world is made of static geometry, so clustering it and merging the clusters' AABBs into some kind of hierarchical structure is only a preprocessing step. An octree can be a good choice: each node splits into 8 sub-nodes (neatly, that is two passes of a 4-box SIMD culling test). You then have useful information for the traversal: a node is fully visible, invisible, or partially culled. You can then, for example, push the primitives of a fully visible node without culling them individually.
 
And you still seem to misunderstand the difference between type size and memory alignment, so study them again and again. Also add a bunch of asserts wherever you do an aligned load or store, to catch bugs as early as possible.

Hierarchical culling can be overkill in many cases, and the added overhead and complexity can then be a net loss.
In the Battlefield 3 culling paper, their parallel brute-force algorithm was 3 times faster than the hierarchical one, the code size was 80% smaller, and because of this simplicity further optimizations were easier.
http://dice.se/publications/culling-the-battlefield-data-oriented-design-in-practice/
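For comparison, a flat brute-force loop can be sketched as below. This is not the DICE implementation, only an illustration of the idea under assumptions: bounding spheres stored as a structure of arrays, six inward-facing planes, and a scalar inner loop that a compiler may auto-vectorize. `FrustumPlane`, `SphereSoA` and `cullSpheres` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct FrustumPlane { float nx, ny, nz, d; };   // inside when n.p + d >= 0

// Structure of arrays: each component is contiguous in memory.
struct SphereSoA {
    std::vector<float> x, y, z, r;
};

// Returns one flag per sphere: 1 if it may be visible, 0 if culled.
std::vector<std::uint8_t> cullSpheres(const SphereSoA& s,
                                      const std::array<FrustumPlane, 6>& frustum) {
    const std::size_t n = s.x.size();
    std::vector<std::uint8_t> visible(n, 1);
    for (const FrustumPlane& p : frustum) {
        // Branch-free inner loop over all spheres for one plane.
        for (std::size_t i = 0; i < n; ++i) {
            float dist = p.nx * s.x[i] + p.ny * s.y[i] + p.nz * s.z[i] + p.d;
            visible[i] &= static_cast<std::uint8_t>(dist >= -s.r[i]);
        }
    }
    return visible;
}
```

Because there is no tree and no per-object branching, the array can be split into equal chunks and handed to worker threads without any shared state.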

#44 galop1n   Members   -  Reputation: 226


Posted 01 April 2013 - 03:59 PM

I know that talk perfectly well, but the 360/PS3 way is no longer the way. Their in-order RISC processors were a pain in the ass for code size and branching. Things that frustum culling needs constantly, like the float compare instruction, were really costly too. Add the need for a simple data layout so as not to saturate DMA communication with the SPUs, and the Battlefield choice may be legitimate in their context.

 

But today's and tomorrow's engines will again scale up the number of primitives to manage and draw by a good amount. Hierarchical layouts will easily strike back on more modern hardware, which handles branching more efficiently.

 

And being hierarchical does not mean we have to go down to single-primitive leaves; everything is a balance between the overhead of the hierarchy and the raw test. Maybe instead of storing hundreds of primitives in each leaf, we will store thousands and dispatch each leaf to a separate thread. In the end, only profiling sessions and testing will give the answer, and that answer depends on the context of each game.

 

And strangely, on a previous game I worked on, with a lot of instances to manage, one optimisation I made was to stop the hierarchical split of the culling once nodes reached too small a size, because that was more efficient. So no, I am not pro-hierarchical, I am just pro-performance :)
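The idea of stopping the split at a minimum leaf size can be sketched as follows. This is an assumption-laden illustration, not the code from that game: it uses a binary median split on a cycling axis instead of an octree, works on points rather than boxes, and the names `Point`, `BuildNode`, `build` and `countLeaves` are hypothetical. The `minLeafSize` parameter is the knob to tune by profiling.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
struct Point { float x, y, z; };

struct BuildNode {
    std::size_t first, count;          // range into the reordered point array
    std::vector<BuildNode> children;   // empty for a leaf
};

inline float coord(const Point& p, int axis) {
    return axis == 0 ? p.x : (axis == 1 ? p.y : p.z);
}

// Split on the median until a range holds minLeafSize points or fewer.
BuildNode build(std::vector<Point>& pts, std::size_t first, std::size_t count,
                std::size_t minLeafSize, int axis = 0) {
    BuildNode node{first, count, {}};
    if (count <= minLeafSize || count < 2) return node;   // small enough: stay a leaf

    auto begin = pts.begin() + static_cast<std::ptrdiff_t>(first);
    auto mid   = begin + static_cast<std::ptrdiff_t>(count / 2);
    auto end   = begin + static_cast<std::ptrdiff_t>(count);
    // Partition so the lower half along 'axis' comes first.
    std::nth_element(begin, mid, end, [axis](const Point& a, const Point& b) {
        return coord(a, axis) < coord(b, axis);
    });

    const int next = (axis + 1) % 3;
    node.children.push_back(build(pts, first, count / 2, minLeafSize, next));
    node.children.push_back(build(pts, first + count / 2, count - count / 2,
                                  minLeafSize, next));
    return node;
}

std::size_t countLeaves(const BuildNode& n) {
    if (n.children.empty()) return 1;
    std::size_t total = 0;
    for (const BuildNode& c : n.children) total += countLeaves(c);
    return total;
}
```

Each leaf then covers a contiguous range of the array, so a flat culling loop can run over it directly.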


Edited by galop1n, 01 April 2013 - 04:06 PM.


#45 Sock5   Members   -  Reputation: 162


Posted 01 April 2013 - 07:20 PM

@OP - Could you share what hardware you are running this on? Those are some awesome results.


>removed<


#46 lipsryme   Members   -  Reputation: 1005


Posted 01 April 2013 - 07:24 PM

Core i5 2.8 GHz quad core (I think it's a Nehalem).
Mind you, the frustum cull was basically the only thing inside my render loop, and the results might not have been measured 100% accurately.
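On the measurement caveat: one common way to reduce noise is to repeat the work several times and keep the best run. The helper below is a generic sketch using `std::chrono::steady_clock`; `bestMilliseconds` is a hypothetical name and this is not how the numbers above were taken.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs 'work' 'repeats' times and returns the fastest run in milliseconds.
template <typename F>
double bestMilliseconds(F&& work, int repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

The culling call would be passed in as a lambda; its output should be consumed afterwards so the compiler cannot optimize the loop away.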






