IOTD
One Billion Polys by Krypt0n
Time Spent:
some months, mostly optimization
Date Added: Jun 13 2011 05:08 PM
ok ok, quite a catchy name. what you actually see is my culling lib, supporting
-frustum culling
-occlusion culling
you see a cube consisting of 135k objects, slightly over one billion polys made of 3 mesh types
-utah teapot, about 500poly, mostly contributing to drawcalls
-stanford bunny, 15k poly, simulating today's game models
-tetrahedron, highly tessellated, simulating micro polygons (which are quite challenging for software rasterizers)
some benchmarks, without/with the lib
- 135k objects, 1Billion polys -> 0.3fps/5-6fps
- 13k objects, 100MPoly -> 3fps/30fps
- 1.3k objects, 10MPoly -> 30fps/65fps
-130 objects, 1MPoly -> 111fps/105fps (about 3-5objects get culled, kinda useless lib in this case, creating just overhead)
there are just a few occlusion libs, I wanted to create an alternative that meets some quality goals:
- handling high object counts, that's usually the limiting factor in PC games using D3D nowadays
- drop in solution, handling all the culling you need, without much coding or level setup. supporting ogl/d3d (you can just call glget.. to get matrices and pass them to the lib)
- handling low poly (boxes) to million poly meshes, with a win if culling is possible, but without a performance penalty if it's impossible.
- simple c interface, few functions, supporting "scenes" and "cameras" with shared meshes, but instanced objects. so you can make one cam, one scene, thousands of objects games, but also split screen games with shared cams, you can use the cams for occlusion culling of shadows, or in some rare cases also simulate separate scenes.
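To illustrate the "shared meshes, instanced objects" idea from the last point, here is a minimal sketch of how such a data model could look. None of these names come from the actual lib, they are all hypothetical; the only assumption about the outside world is that matrices are 16 floats, which is what legacy OpenGL hands back.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical data model, NOT the lib's real interface.
using Mat4 = std::array<float, 16>;  // 16 floats, e.g. as read back from OpenGL

struct Mesh {
    std::vector<float>         positions;  // xyz triples
    std::vector<std::uint32_t> indices;    // triangle or quad indices
};

struct Instance {
    std::size_t meshId;  // index into Scene::meshes, so geometry is shared
    Mat4        world;   // per-object transformation
};

struct Camera {
    Mat4 view;
    Mat4 projection;
};

struct Scene {
    std::vector<Mesh>     meshes;     // each mesh stored once
    std::vector<Instance> instances;  // one entry per placed object

    std::size_t addMesh(Mesh m) {
        meshes.push_back(std::move(m));
        return meshes.size() - 1;
    }

    std::size_t addInstance(std::size_t meshId, const Mat4& world) {
        instances.push_back(Instance{meshId, world});
        return instances.size() - 1;
    }
};

inline Mat4 identity() {
    return Mat4{1.f, 0.f, 0.f, 0.f,
                0.f, 1.f, 0.f, 0.f,
                0.f, 0.f, 1.f, 0.f,
                0.f, 0.f, 0.f, 1.f};
}
```

With a layout like this, thousands of teapots cost one mesh plus one small record each, and a split screen game would simply keep two Camera values that are culled against the same Scene.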
the scene from the screenshot is quite a worst case for the culling lib: high poly and extreme drawcall counts, low pixel cost (opengl phong shading), and it still is a win or at least not a big hit. Games nowadays more commonly have high pixel cost with 1000-5000 drawcalls, which makes this tech very beneficial for real world scenarios.
if you're interested in integrating this lib into your existing project, drop me a pm please
SSE(1/2/4) was the most important "tool"
it's written internally in c++, using visual studio on win, xcode on mac and vi+make on linux.

18 Comments
What methods are you using for the occlusion culling?
Seriously, awesome work! What are you planning to do from here?
@Hodgman
it's a software rasterizer with triangle and quad support and some obvious optimizations like removing duplicated vertices etc.
in general the lib is working like a physics/collision lib, the camera is collided against the objects in the broad phase and the fine grain phase is checking objects against the software zbuffer.
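A rough sketch of what the fine grain phase against a software zbuffer could look like. This is an assumption-heavy stand-in: the lib rasterizes triangles and quads, while this sketch only uses screen-space rectangles, and all names (DepthBuffer, ScreenRect, drawOccluder, isVisible) are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Small software depth buffer; smaller depth means closer to the camera.
struct DepthBuffer {
    int width;
    int height;
    std::vector<float> depth;  // starts at 1.0f, i.e. the far plane

    DepthBuffer(int w, int h)
        : width(w), height(h),
          depth(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 1.0f) {}

    float& at(int x, int y) { return depth[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return depth[static_cast<std::size_t>(y) * width + x]; }
};

// Screen-space rectangle (x1/y1 exclusive) with a single depth value.
// Rectangles replace real triangle rasterization in this sketch.
struct ScreenRect { int x0, y0, x1, y1; float z; };

inline ScreenRect clip(const DepthBuffer& zb, ScreenRect r) {
    r.x0 = std::max(r.x0, 0);
    r.y0 = std::max(r.y0, 0);
    r.x1 = std::min(r.x1, zb.width);
    r.y1 = std::min(r.y1, zb.height);
    return r;
}

// Occluder pass: keep the nearest depth per pixel.
// r.z should be the FARTHEST depth of the occluder to stay conservative.
void drawOccluder(DepthBuffer& zb, ScreenRect r) {
    r = clip(zb, r);
    for (int y = r.y0; y < r.y1; ++y)
        for (int x = r.x0; x < r.x1; ++x)
            zb.at(x, y) = std::min(zb.at(x, y), r.z);
}

// Test pass: r.z should be the NEAREST depth of the tested object.
// Returns as soon as one pixel passes, so visible objects are cheap to confirm.
bool isVisible(const DepthBuffer& zb, ScreenRect r) {
    r = clip(zb, r);
    for (int y = r.y0; y < r.y1; ++y)
        for (int x = r.x0; x < r.x1; ++x)
            if (r.z < zb.at(x, y)) return true;
    return false;
}
```

Only objects that survive the broad phase would reach isVisible, and a rectangle that ends up entirely off screen after clipping is reported as not visible.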
@owl @Tachikoma
can't do anything wrong with teapots, everyone loves them, right?
@owl
I hoped some people would want to use the lib for their projects (and I got some pms which seem promising), as it is quite a basic thing for most engines, but people seem to struggle (or at least waste time) to get a well working system.
If it really gets used, I'll optimize it further, to make it a real alternative to portal, pvs and other systems.
then I'll add more optimizations and try to make it more of a 'drop in' solution. I've also designed everything in a way that it would be easy to get it working on PS3/X360, in case someone wants that.
Surprisingly I got requests from non-graphics programmers, who said they might have use for it for AI (visibility checking seems to be a hassle as it's done with raycasts?) and a sound programmer said that, with cubemap rendering, he could use it to check the occlusion of sound sources. (I guess that's easy to add).
I wonder how it compares to modern hardware approaches like CHC++. What do you think?
because I made 30K objects run at 100 fps with a GTX 280.
and I'm using DX11
Awesome, keep us posted!
From my point of view, occlusion culling is, just like frustum culling, a way to trade some CPU cycles for GPU cycles. Back in the early days of 3D, frustum culling on cpu was a real time consumer (I think on the first game I made it was 20%+ of the frame time), probably more than occlusion culling nowadays, as games usually don't occupy 100% of all cores, but with just one core and culling down to polygon level, you saw the impact of frustum culling.
That's why I prefer the cpu solution in general. if you were writing a molecular simulator, occupying 100% cpu time and showing some ogl spheres, I wouldn't recommend the software version.
the GPU solutions have quite some issues from my experience
@Tordin
I accept the challenge ;), if I draw just one triangle per object, I get 49fps with 135000 drawcalls per frame
If you intend to compare something more, you must specify more accurately how the scene and camera are set up, so I can provide you numbers.
there is no preview button here :/
I think in this context, this year's GDC demo might also be interesting for you to watch:
Mega Meshes - Modelling, rendering and lighting a world made of 100 billion polygons
http://miciwan.com/GDC2011/GDC2011_Mega_Meshes.pdf
@Krypt0n : I didn't mean to compare "our qualities" in any way. I was more interested in whether you were using a DX9 version or a DX11 version.
So both of us could improve the rendering.
And besides, a note to everyone: I don't believe the actual drawing in this case is the hard stuff, I believe the OC and FC is the part that takes more from the FPS. (that's only what I believe)
First, again, congratulations on this work! Looks like a really nice piece of tech you got here!
I would just like to know what the overall algo behind the library is. Some people already asked some of my questions but I still have a few:
- Is it meant to give an answer about the visibility of each object for the current frame? Are there some computations that are delayed between frames? (as in CHC++)
- If I understand well, you are first rendering a depth buffer. Then you again render each object to test per pixel visibility? Or are you keeping a list of pixels to test for each object? (ok this would be a very memory consuming bad idea...) Or any other tricks?
- If there is rasterization, there is a buffer. What is the size of the buffer for the performance you are giving us? Is it the same size as the render buffer (pixel level visibility)? Can you use a buffer with different sizes? (conservative rasterization)
Thank you in advance for your answers!
it's made to be fast enough to give you the result for the current frame, it's not "instant" though. I think most people will submit the current state and then want the result to render. But the interface can also work asynchronously, so you could in theory
- change the state of instances (e.g. transformations) in the culling lib
- notify the lib to start culling
- do something in-between, like calculating bone/skinning data, handling some streaming etc.
- sync to the culling lib
- render the returned objects
to make it async across frames, you would just reschedule this
- sync
- get the list of objects
- change the state of instances (e.g. transformations) in the culling lib for the next frame
- notify the culling lib
- process all returned objects
- swapbuffers to next frame
- update/logic etc.
- sync
the 2nd solution runs the culling asynchronously, it has no object popping like the usual occlusion queries that are based on the last frame, but it adds one frame of latency.
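The notify/sync pattern from the two lists above can be sketched with standard C++ futures. This is only an illustration under assumptions: AsyncCuller, start, sync and cullScene are hypothetical names, and the "culling" is a placeholder that just filters on a flag instead of rasterizing anything.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Stand-in for an instance; a real culler would hold bounds and transforms.
struct Object { bool occluded; };

// Placeholder for the actual culling work: returns indices of visible objects.
// Takes the objects by value so it works on its own snapshot of the state.
std::vector<std::size_t> cullScene(std::vector<Object> snapshot) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < snapshot.size(); ++i)
        if (!snapshot[i].occluded) visible.push_back(i);
    return visible;
}

class AsyncCuller {
public:
    // "notify the lib to start culling": copies the current state and
    // runs the culling on another thread.
    void start(const std::vector<Object>& objects) {
        pending_ = std::async(std::launch::async, cullScene, objects);
    }

    // "sync to the culling lib": blocks until the result is ready.
    // Must be called exactly once after each start().
    std::vector<std::size_t> sync() { return pending_.get(); }

private:
    std::future<std::vector<std::size_t>> pending_;
};
```

In the first variant you would call start(), do skinning or streaming work, then sync() and render in the same frame. In the second variant sync() moves to the top of the next frame, which is where the one frame of latency comes from.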
I don't have any separate buffer, although it wouldn't be that bad memory-wise, doubling the framebuffer memory. in usual cases you could even assume there won't be more than 64k objects and limit it to 16bit IDs, just the matrices for 135k objects are >8MB, so
I try to balance the resolution based on the time between triggering the culling and sync time. the lowest limit is 128x128 atm, but I'm thinking about some way to set the 'culling quality'.
most software occlusion cullers work at lower res, as just testing wouldn't be fast enough due to "fillrate" if done in full res. due to this you can of course suffer from some falsely detected occlusions. there are some ways to compensate for it a little, like extending the size of the primitives that you test, but there can always be cases, e.g. some fence that looks solid in your occlusion culler and hides everything behind it, but in reality has holes, so the objects behind it will disappear/flicker.
that's one disadvantage of the software solution, I admit ;)
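One way to picture "extending the size of the primitives that you test": when a full resolution screen rectangle is mapped to the low resolution buffer, round it outward and optionally pad it by a margin, so the test covers every low-res pixel the object might touch. A minimal sketch, with hypothetical names and the assumption that the low-res buffer is the full-res one divided by an integer scale:

```cpp
// Pixel rectangle, x1/y1 exclusive.
struct Rect { int x0, y0, x1, y1; };

// Integer division rounding toward negative infinity (b must be > 0).
inline int floorDiv(int a, int b) {
    int q = a / b;
    if ((a % b != 0) && (a < 0)) --q;
    return q;
}

// Integer division rounding toward positive infinity (b must be > 0).
inline int ceilDiv(int a, int b) { return -floorDiv(-a, b); }

// Maps a full-res rect to the low-res buffer, rounding outward and
// growing it by `margin` low-res pixels on every side.
Rect toLowRes(const Rect& full, int scale, int margin) {
    return Rect{floorDiv(full.x0, scale) - margin,
                floorDiv(full.y0, scale) - margin,
                ceilDiv(full.x1, scale) + margin,
                ceilDiv(full.y1, scale) + margin};
}
```

A bigger tested area means fewer objects get culled by mistake, at the cost of culling less overall. It does nothing for the fence case, since there the occluder itself is wrong in the low-res buffer.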
That was one question I also had on my mind. "Culling quality" seems to be a nice concept to adjust the buffer's resolution but is it a good idea to do this automatically? Doesn't this introduce popping/flickering? I am thinking of a scene where the camera is right in front of the above mentioned fence and from time to time some objects like birds or planes are appearing. Now would these extra objects change the buffer's resolution and thereby trigger culling of some previously visible objects?
Some more questions are keeping me busy
Does "camera is collided against objects" mean you are using ray casting in the broad phase?
Are you taking advantage of spatial coherency, e.g. using BVHs to cull a bunch of objects?
Are you utilizing temporal coherency? Something like "What is hidden this frame will also most probably be invisible in the next frame"
Sorry for asking so many questions but I am really into this kind of stuff and obviously you are the right person to ask ;-)
Thanks in advance
the lower resolution is already a source of error, adapting it to performance rather reduces the error, otherwise you might want to always run the fastest setup and always have the error. but in general the error isn't that noticeable, some games use that kind of occlusion culling (like Warhawk, crysis1, battlefield 3) and it seems to run fine.
a broad phase in physics usually means that you put everything into bounding objects (e.g. bounding boxes or spheres or simple primitives like capsule..) and just test those bounding volumes against each other to receive a list of potential collisions, so no, I don't do any raycasting for collisions/visibility.
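A broad phase of this kind can be sketched as a bounding sphere tested against the six planes of the camera frustum. This is the textbook test, not necessarily what the lib does internally; the types and the plane convention (normals pointing inward and normalized) are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3   { float x, y, z; };
struct Plane  { Vec3 n; float d; };          // inside when dot(n, p) + d >= 0
struct Sphere { Vec3 center; float radius; };

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Conservative test: returns false only if the sphere lies completely
// behind at least one plane. No rays are cast.
bool mayBeVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& p : frustum)
        if (signedDistance(p, s.center) < -s.radius) return false;
    return true;
}

// Collects the indices of all objects that survive the broad phase and
// should be handed to the finer (zbuffer) test.
std::vector<std::size_t> broadPhase(const std::array<Plane, 6>& frustum,
                                    const std::vector<Sphere>& bounds) {
    std::vector<std::size_t> candidates;
    for (std::size_t i = 0; i < bounds.size(); ++i)
        if (mayBeVisible(frustum, bounds[i])) candidates.push_back(i);
    return candidates;
}
```

The test can report a sphere near a frustum corner as potentially visible even though it is outside, which is fine for a broad phase since the fine phase sorts that out.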
yes, there is some spatial partitioning going on, as I want it to be a generic solution and in the usual world, you probably see less than 1% of all the objects that people place in the world.
Short: NO, I don't.
Long: there are two possibilities, coherency for how long ago something was checked and visible and how long ago it was checked and invisible.
Invisible:
- in that case you would gamble that something won't be visible. in a recent shooter (fps) I was involved in, we tried this at first, it worked in most cases (I'd guess 99.9%), but if some QA tester went into cover and jumped out of it, most of the world was usually invisible and popped in after some frames. but even worse, because there were no proper occluders (as they were assumed to be invisible), objects that are usually occluded were marked visible and were pushed into the streaming system, which rejected objects that actually should be visible in order to stream in invisible objects. This coherency was really not acceptable.
Visible:
- if objects are visible, you could assume that on average you get a "true" after testing 50% of the pixels. if you make it a little bit smarter, you'll actually be done very early if something is visible. in my lib, I think it's about 2% of the time that is wasted on visible objects, 98% on invisible objects (% in regard to cpu time, using some common profiling tool). so we are talking about speeding up 2% in the best case. On the other side, those tests are relatively cheap compared to the rendering of objects; testing every frame and rejecting as many drawcalls as possible, even if you spend those extra 2%, will probably save you more than 2% of the frame time.
that's why I don't use any coherency.
In previous occlusion cullers I added coherency testing, mainly to save rasterization time, by approximating the current zbuffer from the old zbuffer. for this I had to strictly split static from dynamic objects. I did not want to add this kind of limitation to this lib.
cheers
everyone who requested an sdk should have it by now, if not please contact me, but check your spam folder first
Cheers