Large Query Count?

11 comments, last by 3TATUK2 10 years, 5 months ago

So, I'm doing octree occlusion which means I have to generate an occlusion query for each node... Which at a depth of 6 is >16,000....

Once I glEndQuery() my 16256th query... It crashes.

Anyone have any clue about this?

If your usage is correct, it sounds like a driver bug. What's the stack trace and the exception given when it crashes?

However:
Does every node in your tree actually contain an occludee? You can probably skip most of them. Also, you should perform frustum culling before occlusion culling, which will also reduce this number.

I also suspected it's a driver/hardware bug...

Stack trace comes from GL driver which is basically useless:


#0  0x69e30a2c in atioglxx!atiPS () from C:\WINDOWS\SysWOW64\atioglxx.dll
#1  0x814e724c in ?? ()
#2  0x6994c4f8 in atioglxx!atiPPHSN () from C:\WINDOWS\SysWOW64\atioglxx.dll
#3  0x060f2910 in ?? ()
#4  0x6994b3c9 in atioglxx!atiPPHSN () from C:\WINDOWS\SysWOW64\atioglxx.dll
#5  0x064a4188 in ?? ()
#6  0x69088463 in atioglxx!atiPPHSN () from C:\WINDOWS\SysWOW64\atioglxx.dll
#7  0x814e7138 in ?? ()
#8  0x693f0577 in atioglxx!atiPPHSN () from C:\WINDOWS\SysWOW64\atioglxx.dll
#9  0x00000000 in ?? ()

Also, it's "hypothetically potential" that all octree nodes will be visible at one time - say you fly up into the air and look down onto the scene, or something... So I can't rely on a "workaround" like that.

Is it preferable to use the hardware for the occlusion queries? In several cases it has been stated that a software solution is actually better, although maybe more difficult to implement.

As far as I know, the problem with hardware occlusion queries is that you have to wait one or more frames for the results to arrive. Also, SLI/Crossfire requires separate queries to be created for each AFR unit.

Cheers!

Yeah, pre-generated "potentially visible set" stuff is better but much harder to implement... I might start seriously thinking about it. Also, I use the query result from the last frame, so it's not much of a problem.

You shouldn't be doing occlusion on each node.

First of all, use frustum culling to remove nodes that aren't in your view pyramid. Don't even bother testing these for occlusion.

Secondly, the way an octree is constructed, if any given node is occluded then all of its child nodes are also occluded (because a node contains all of its children). No need to test child nodes of an occluded node.
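The two rules above (frustum test first, then skip the children of an occluded node) can be sketched as a recursive walk. This is only an illustration: `Node`, `in_frustum` and `is_occluded` are hypothetical names, and in a real renderer `is_occluded` would be backed by an occlusion query result (possibly last frame's) rather than a lambda.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def collect_visible(node, in_frustum, is_occluded, out, stats):
    """Depth-first walk that prunes whole subtrees as early as possible."""
    if not in_frustum(node):
        return  # outside the view: no occlusion test needed at all
    stats["occlusion_tests"] += 1
    if is_occluded(node):
        return  # a node bounds its children, so they are occluded too
    if not node.children:
        out.append(node.name)  # visible leaf: draw its contents
        return
    for child in node.children:
        collect_visible(child, in_frustum, is_occluded, out, stats)


# Toy tree: a root with two inner nodes, each holding eight leaves (19 nodes).
leaves_a = [Node(f"a{i}") for i in range(8)]
leaves_b = [Node(f"b{i}") for i in range(8)]
root = Node("root", [Node("a", leaves_a), Node("b", leaves_b)])

visible, stats = [], {"occlusion_tests": 0}
collect_visible(
    root,
    in_frustum=lambda n: True,            # assumption: everything is in view
    is_occluded=lambda n: n.name == "a",  # assumption: subtree "a" is hidden
    out=visible,
    stats=stats,
)
```

In this toy case only 11 of the 19 nodes are ever tested, because the eight leaves under "a" are rejected by the single test on their parent.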

Your hypothetical case where all nodes are visible at the same time - that's a rare edge case. It's not going to happen unless all of your nodes are on the same plane and the viewpoint is sufficiently far away to bring them all on-screen together. I wouldn't even worry about that at all.


Even if all the nodes are visible, you can decide to LOD some of them and use the parent instead of the leaf nodes.
If you're not going to do any stuff that exploits the hierarchical nature of the tree (including what mhagain said above), then you've basically just got a 3D grid, not a 3D tree.

A pre-generated PVS system can be extremely simple, though. In one game demo that I made, I simply manually created bounding volumes for different sectors, and then manually wrote into a text file lists of which sectors could be seen from which other sectors.
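A hand-written sector list like that could be loaded along these lines. The file format (`sector: visible sectors`), the sector names and the function names are all made up for illustration; the post does not say what the actual file looked like.

```python
def parse_pvs(text):
    """Parse lines like 'hall: lobby stairs' into {sector: set_of_visible}."""
    pvs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # allow comments and blank lines
        if not line:
            continue
        sector, _, rest = line.partition(":")
        sector = sector.strip()
        # A sector can always see itself.
        pvs[sector] = {sector, *rest.split()}
    return pvs


def sectors_to_draw(pvs, camera_sector):
    """Sectors to draw from the camera's sector; draw everything if unknown."""
    if camera_sector not in pvs:
        return set(pvs)
    return pvs[camera_sector]


# Hypothetical hand-authored visibility list.
PVS_TEXT = """
# sector: sectors visible from it
lobby: hall
hall: lobby stairs
stairs: hall
vault:
"""
pvs = parse_pvs(PVS_TEXT)
draw_from_lobby = sectors_to_draw(pvs, "lobby")
```

Falling back to "draw everything" for an unknown sector is a design choice of this sketch: it keeps the scene correct (just slower) when the camera leaves the authored volumes.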

As well as pre-generated PVS, there are lots of runtime approaches that use the CPU rather than the GPU:

  • Lots of games have used sector/portal systems.
  • I know of one proprietary AAA engine that just had the level designers place 3D rectangles throughout the world as occluders (e.g. in the walls of buildings). The engine would then find the ~6 largest of these rectangles that passed the frustum test, and then use them to create frustums, and then brute-force cull any object that was inside of those occluder-frustums.
  • Fallout 3 used a combination of both of the above.
  • Here's another example of occluder frustums: http://www.gamasutra.com/view/feature/131388/rendering_the_great_outdoors_fast_.php?print=1
  • Lots of modern games have implemented software rasterizers, where they render occlusion geometry on the CPU to create a float Z-buffer in main RAM, which you can then test object bounding boxes against.
  • Personally, I'm using this solution, where you implement a software rasterizer, but only allocate one bit per pixel (1=written to, 0=not yet written to). A clever implementation can then write to (up to) 128 pixels per instruction using SIMD. Occludee and occluder triangles are then sorted from front to back and rasterized. Occluder triangles write 1's, occludee triangles test for 0's (and early exit if a zero is found).
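The one-bit-per-pixel idea from the last bullet can be sketched as follows. This is a heavily simplified illustration, not the poster's implementation: it uses screen-space rectangles instead of triangles, and one Python integer per scanline as the bitmask in place of SIMD registers. All names are hypothetical.

```python
def row_mask(x0, x1):
    """Bitmask with bits x0 .. x1-1 set (one bit per pixel in a scanline)."""
    return ((1 << (x1 - x0)) - 1) << x0


def cull(width, height, items):
    """items: (name, kind, depth, (x0, y0, x1, y1)), kind is 'occluder' or
    'occludee', rectangle bounds are half-open pixel coordinates.
    Returns the names of occludees that have at least one uncovered pixel."""
    rows = [0] * height  # 1 = written to, 0 = not yet written to
    visible = []
    # Front to back: nearer items are processed first.
    for name, kind, depth, (x0, y0, x1, y1) in sorted(items, key=lambda it: it[2]):
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width), min(y1, height)
        if x0 >= x1 or y0 >= y1:
            continue  # entirely off-screen
        mask = row_mask(x0, x1)
        if kind == "occluder":
            for y in range(y0, y1):
                rows[y] |= mask  # occluders write 1s
        else:
            for y in range(y0, y1):
                if (rows[y] & mask) != mask:  # found a 0: early exit
                    visible.append(name)
                    break
    return visible


result = cull(16, 8, [
    ("wall",      "occluder", 1.0, (0, 0, 8, 8)),
    ("crate",     "occludee", 5.0, (2, 2, 6, 6)),   # fully behind the wall
    ("barrel",    "occludee", 5.0, (6, 2, 10, 6)),  # sticks out past the wall
    ("near_lamp", "occludee", 0.5, (2, 2, 4, 4)),   # in front of the wall
])
```

Because everything is sorted by depth first, no per-pixel depth value is needed: anything already written must be nearer than the item currently being tested.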

On the GPU side, there are also options other than occlusion queries. For example, Splinter Cell: Conviction reimplemented the HZB (hierarchical Z-buffer) idea, doing culling in a pixel shader, allowing them to do tens of thousands of queries in a fraction of a millisecond on a GPU from 2006 (much, much faster than HW occlusion queries, but slightly less accurate).
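To show roughly what an HZB test does, here is a CPU-side sketch in plain Python. It is not how the game did it (that ran in a pixel shader); it assumes a square, power-of-two depth buffer where larger values are farther away, and all function names are hypothetical.

```python
def build_hzb(depth):
    """Mip chain where each texel stores the FARTHEST depth of the 2x2 block
    beneath it. Assumes a square buffer with a power-of-two side length."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)
        ])
    return levels


def is_occluded(levels, rect, nearest_z):
    """rect = (x0, y0, x1, y1), inclusive pixel bounds of the object's
    screen-space bounding box; nearest_z is the object's closest depth."""
    x0, y0, x1, y1 = rect
    level = 0
    # Climb until the rect spans at most 2 texels on each axis.
    while level < len(levels) - 1 and (
        (x1 >> level) - (x0 >> level) > 1 or (y1 >> level) - (y0 >> level) > 1
    ):
        level += 1
    mip = levels[level]
    farthest = max(
        mip[y][x]
        for y in range(y0 >> level, (y1 >> level) + 1)
        for x in range(x0 >> level, (x1 >> level) + 1)
    )
    # Hidden only if even the farthest occluder depth is nearer than the object.
    return nearest_z > farthest


# 8x8 depth buffer: a near wall (0.2) on the left half, far background (1.0).
depth = [[0.2 if x < 4 else 1.0 for x in range(8)] for y in range(8)]
hzb = build_hzb(depth)
behind_wall = is_occluded(hzb, (0, 0, 3, 3), 0.5)
straddling = is_occluded(hzb, (2, 0, 5, 3), 0.5)
```

The test reads at most four texels per object regardless of its screen size, which is where the speed comes from. It is conservative, so an object may be reported visible when it is actually hidden, which matches the "slightly less accurate" remark above.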

Stack trace comes from GL driver which is basically useless

So as long as your usage of the functions is perfectly compliant with the GL spec, then that means you can legitimately blame the driver and/or report a driver bug.

You have posted a lot of threads all revolving around occlusion culling. In each, you seem to target a problem that isn't actually your problem.

Have you done any profiling or performed any measurements that actually tell you that you need this sort of occlusion culling? You should never do something as diffuse as optimization without some form of measurement in place that tells you whether you are heading in the right direction and whether you are still on track.


Also, it's "hypothetically potential" that all octree nodes will be visible at one time - say you fly up into the air and look down onto the scene, or something... So I can't rely on a "workaround" like that.


If your game is supposed to render just fine at 30 FPS even when everything is visible, why do you even care about culling?

Doesn't 16k occlusion queries also mean 16k draw calls? If you hit 16k draw calls before the first real triangle is even rendered, you might just as well be better off without any culling.

Why don't you try some LOD and pure frustum culling and see how far it gets you? If you are indoors rather than outdoors, try portal culling. It is actually quite easy to implement for hand-placed cells/portals and extremely efficient.


You have posted a lot of threads all revolving around occlusion culling. In each, you seem to target a problem that isn't actually your problem.

Yes? That's what I'm focusing on implementing now and each post is aimed at a different specific issue. It's your opinion that "they're not my problem".


Have you done any profiling or performed any measurements that actually tell you that you need this sort of occlusion culling?

Yes. For example on a 200K polygon map, I would get a constant 40 FPS since all vertices were transformed. With my occlusion culling I get a varying 60-150 FPS throughout the map.


If your game is supposed to render just fine at 30 FPS even when everything is visible, why do you even care about culling?

I don't care so much that it's supposed to be *fast* in such a hypothetical situation - but that it still *works*.


Doesn't 16k occlusion queries also mean 16k draw calls?

No. Once an octree node is not visible, all of its leaves aren't processed.


Why don't you try some LOD and pure frustum culling and see how far it gets you?

I tried frustum culling and my current octree occlusion works better.


If you are indoors rather than outdoors, try portal culling.

It might be "easy" - but it puts a limitation on the mapping aspect, and makes things less flexible.

You have posted a lot of threads all revolving around occlusion culling. In each, you seem to target a problem that isn't actually your problem.


Yes? That's what I'm focusing on implementing now and each post is aimed at a different specific issue. It's your opinion that "they're not my problem".


I didn't mean to imply that you were posting too many threads or asking too many questions. On the contrary, I would encourage you to do so. It is, however, my opinion (and yes, this is only my opinion and you are free to ignore it) that you might be chasing ghosts.


Have you done any profiling or performed any measurements that actually tell you that you need this sort of occlusion culling?


Yes. For example on a 200K polygon map, I would get a constant 40 FPS since all vertices were transformed. With my occlusion culling I get a varying 60-150 FPS throughout the map.


This is a good start, but can you try to get more into the details? Without occlusion culling you get 40 FPS, which equals 25 ms for 200k tris. Is this 25 ms on the CPU or on the GPU, and are you sure that this is vertex transformation time? Can you tilt the camera so that no triangles hit the screen and measure again? OpenGL timer queries are an extremely cool tool for measuring GPU times and latencies.

To give you a reference, my notebook GPU can render a ~700k triangle terrain in ~1 ms as long as no triangle actually hits the screen. So in terms of pure vertex and polygon processing, it should be able to render 17,500k triangles in those 25 ms, not 200k. You will probably never reach the vertex efficiency of a terrain with regular models, but still, with 200k tris in 25 ms, I find it hard to believe that you are actually vertex bound.
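The back-of-the-envelope numbers in this post work out as follows. The inputs are the figures quoted in the thread; the ~700k triangles per millisecond is one notebook GPU with nothing hitting the screen, so treat the result as an order-of-magnitude estimate only.

```python
fps = 40
frame_ms = 1000 / fps                      # 25.0 ms per frame at 40 FPS

reference_tris_per_ms = 700_000            # ~700k triangles in ~1 ms
budget_tris = reference_tris_per_ms * frame_ms   # triangles that fit in 25 ms

observed_tris = 200_000                    # the 200K polygon map
headroom = budget_tris / observed_tris     # how far below that budget it is
```

That gives a budget of 17.5 million triangles per frame, about 87.5 times the 200k actually being drawn, which is the basis for doubting that vertex processing is the bottleneck.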

Doesn't 16k occlusion queries also mean 16k draw calls?


No. Once an octree node is not visible, all of its leaves aren't processed.


Are you using something like the concept proposed in GPU Gems 2?
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter06.html

Because they still perform blocking waits on query results in there, which might end up being a PITA once timings start to shift around, due to code, content or hardware changes.


If you are indoors rather than outdoors, try portal culling.


It might be "easy" - but it puts a limitation on the mapping aspect, and makes things less flexible.


We went down the same road for our last commercial game, for the same reasons (reduce the workload for the level designers), and boy did we regret that decision.

This topic is closed to new replies.