Trying to make sense of this shader article [occlusion]

Started by
4 comments, last by mcclure111 12 years, 7 months ago
Hi! I haven't posted here before, but some people from Tigsource said this was a good place to ask technical questions. Somebody linked me this paper, which describes a shader technique. I'm having trouble understanding it, I think because it is written targeting DirectX (DX9, I think), which I don't use. I'm trying to figure out how to translate it into OpenGL-ese.

http://gamedeveloper...1109?pg=37#pg37

The article suggests you could use the GPU to accelerate large-scale occlusion detection. I think I get their general approach: you have a scene containing a number of objects which are themselves complex, like people or buildings. If one of these objects is completely obscured by another object, you'd prefer not to even try to render it -- maybe these objects are so complex that even with z-buffer testing you're spending a lot by trying to draw them only to have their pixels discarded. So before you actually draw, you render a simplified version of the scene -- instead of the building, you draw a plain box with the same dimensions as the building -- and then you use a shader to report back which of the boxes wound up visible in the final scene. Okay, that makes sense. But then I start thinking about how to implement the technique they describe and I just wind up feeling like an idiot.
Here's where I'm lost:

- When do you do this? Are they really proposing doing this every frame?
- Once the GPU has created the list of "here's what is and isn't occluded", the CPU has to act on it. Is it really cheaper to ship that data back from the GPU to the CPU every frame than to do the occlusion tests on the CPU? I thought shipping data GPU->CPU was basically the most expensive thing you could do.
- My big point of confusion. They say:

[quote]After dispatching the compute shader to test all the bounds for visibility, you'll have time to do some processing on the CPU while you wait for the results... Once the compute shader finishes, it's just a matter of locking the writable buffer from steps 3-5, and copying the float values representing visible or hidden objects into a CPU-accessible buffer so that a final list of visible objects can be prepared for rendering.[/quote]
They suggest storing the floats in a "writable buffer". I don't think I've ever heard of such a concept. They also make reference (sec. 3) to a "compute shader", a piece of vocabulary I'm not familiar with. Are these DirectX things? Do they have OpenGL analogues? It sounds as if they are running a shader program that has access to a plain array, can sample a texture at will, and can write into random-access places in that array. Individual indexes in the array correspond to objects being tested for occlusion, and somehow the shader also has access to the bounding box of each object it's working on. (They have a diagram where they show a 3D object and then draw a bounding box for it on the screen; they seem to think they can get this bounding box for any given object -- in fact, they seem to believe getting this information is so easy they don't even bother telling you how to do it. Once they have this box, they plan to test its four corners for visibility and conclude that if any corner is visible, so is the object. That doesn't sound like a fair assumption for a sphere, but...) Do these techniques translate to OpenGL-land at all?
The closest I can get to making something like this work with the OpenGL tools I know of is:

1. Render the "simplified" scene such that the depth buffer goes into a texture.
2. Render a very small scene (like, 16x16) consisting of one polygon covering the screen, with the vertex shader passing each pixel a varying that tells it its screen coordinate. This gives me 256 runs of a pixel shader.
3. Each pixel shader invocation uses its coordinate to determine which object it's responsible for testing; it samples the depth texture from (1), computes whether the object was visible, and writes either black or white at its pixel.
4. Copy the small 16x16 texture back to the CPU and read each pixel to decide whether to draw the corresponding object.

...but this doesn't work, because I have NO IDEA how I would get the "bounding box" -- or, if I magically had a list of screen bounding boxes for objects, how I would pass that list into the pixel shader.
Am I missing anything?!

Thanks for any responses! I don't actually plan to use this exact technique, but I do want to develop my shader-programming skillset to the point where I can at least UNDERSTAND an article like this :/
[quote]When do you do this? Are they really proposing doing this every frame?[/quote]Yes, occlusion testing is done every frame.

[quote]Once the GPU has created the list of "here's what is and isn't occluded", the CPU has to act on it. Is it really cheaper to ship back data every frame from the GPU to the CPU than to do the occlusion tests on the CPU? I thought shipping data GPU->CPU was basically the most expensive thing you could do.[/quote]In general, yes, sending data back to the CPU from the GPU is a very bad idea, due to the latency involved.
Often, to get around this problem, people use the results with a one-frame delay -- e.g.

[Update #1] [Occlusion #1] [Render #1 using results from Update #1 and Occlusion #0]
[Update #2] [Occlusion #2] [Render #2 using results from Update #2 and Occlusion #1]

This causes "popping" artefacts, where objects take a single frame to appear after they come into view, so it's an unpopular technique for many people.
However, occlusion results only need one bit per object. If you're occluding 10,000 objects, that's ~1KiB of data, which isn't very much.
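To make that concrete, here's a rough CPU-side sketch (my own, not from the article) of packing one visibility flag per object into a bit array -- 10,000 objects pack down to 1,250 bytes:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack one visibility bit per object into a byte
   array, to show how little data the GPU->CPU readback actually needs.
   `out` must have room for (count + 7) / 8 bytes. */
static size_t pack_visibility(const uint8_t *visible, size_t count,
                              uint8_t *out)
{
    size_t bytes = (count + 7) / 8;
    memset(out, 0, bytes);
    for (size_t i = 0; i < count; ++i)
        if (visible[i])
            out[i / 8] |= (uint8_t)(1u << (i % 8));
    return bytes;  /* number of bytes to transfer back */
}
```

For 10,000 objects this returns 1,250 bytes -- roughly the ~1KiB figure above.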
You still have to find a way to mask the latency -- e.g.
1. Render occlusion volumes.
2. "Render" occlusion tests.
3. Initiate the data transfer to the CPU.
4. Render something that doesn't depend on the occlusion results (keeping the GPU busy during the latency/stall period).
5. Do some kind of non-rendering task on the CPU (keeping the CPU busy during the latency/stall period).
6. Render the things that do depend on the occlusion tests.
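If it helps, here's a tiny sketch (hypothetical names, not from the article) of the double buffering that the one-frame-delay schedule implies -- the CPU reads last frame's results while the GPU writes this frame's:

```c
#include <stdint.h>

/* Hypothetical double-buffered result store for frame-delayed occlusion:
   the CPU only ever reads the buffer the GPU finished LAST frame, so it
   never stalls waiting on an in-flight transfer. */
typedef struct {
    uint8_t results[2][1024]; /* two frames' worth of visibility bytes  */
    int     write_idx;        /* buffer the GPU is writing this frame   */
} OcclusionRing;

/* Results the CPU may safely read this frame (written last frame). */
static const uint8_t *readable(const OcclusionRing *r) {
    return r->results[1 - r->write_idx];
}

/* Call once per frame, after kicking off this frame's GPU-side tests. */
static void flip(OcclusionRing *r) {
    r->write_idx = 1 - r->write_idx;
}
```

The one-frame "popping" artefact mentioned above is exactly the cost of this scheme: `readable()` is always a frame stale.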

[quote]They suggest storing the floats in a "writable buffer". I don't think I've ever heard of such a concept.[/quote]A render-target / frame-buffer / whatever you want to call it is a writable buffer.
[quote]They also make (sec. 3) reference to a "compute shader", a piece of vocabulary I'm not familiar with. Are these DirectX things? Do they have OpenGL analogues?[/quote]Compute shaders were introduced in DX11. If you're not using DirectX, you can use them via OpenCL. I believe you can find some of the compute-shader code on the author's blog.
However, this technique can be implemented using only pixel/vertex shaders instead of compute shaders.
[quote]They seem to believe getting this information is so easy they don't even bother telling you how to do it.[/quote]The information comes from the context of GDC etc., attended by people who would find it obvious and be bored by an explanation of how to do it ;)
[quote]It sounds as if they are running a shader program that has access to a plain array, and can sample a texture at will and write into random-access places in that array. They're having individual indexes in the array correspond to objects that are being tested for occlusion, and somehow also the shader has access to the bounding boxes of the individual objects it's working on[/quote]It's a much simpler problem than it first sounds -- picture this:
You've got a 64x64 render-target/frame-buffer, which is basically an array of 4096 pixels.
You've got a VBO with 4096 vertices in it -- each vertex represents a single object to be tested by the occlusion system.
Each vertex has a 2D position (which corresponds to a unique pixel position in the render target) -- this is where the output/result will be stored.
Each vertex also has a 3D AABB encoded in its texture coordinates.

You render this VBO to the render-target using "point" primitives, so each vertex becomes a single output pixel.
You use a vertex shader which simply places the output vertex at the 2D pixel position specified in the input vertex and passes through the "texcoord"/attribute data.
You use a pixel shader that takes the attribute data, decodes the AABB from it, and does all the occlusion logic. During this occlusion logic, a few texture samples are taken from the HZB and compared against the AABB's depth.
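As a rough CPU-side model of what that pixel shader's occlusion logic boils down to (my own sketch, not the article's code -- it ignores the HZB mip optimization and just scans the box's on-screen footprint in a plain depth buffer, assuming 0 = near, 1 = far):

```c
#include <stdbool.h>

/* Hypothetical CPU reference for the per-object test: the object is
   hidden only if every pixel under its screen-space rect already holds
   something NEARER than the AABB's closest depth. The rect is assumed
   to be clipped to the buffer before calling. */
static bool aabb_visible(const float *depth, int w, int h,
                         int x0, int y0, int x1, int y1, /* screen rect */
                         float aabb_near_depth)
{
    for (int y = y0; y <= y1 && y < h; ++y)
        for (int x = x0; x <= x1 && x < w; ++x)
            if (depth[y * w + x] >= aabb_near_depth)
                return true;  /* occluder is farther here: box shows */
    return false;             /* every covered pixel is nearer: hidden */
}
```

The real shader replaces the inner loops with a handful of HZB samples at a coarse mip, but the comparison is the same idea.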
Alternatively, the method that you describe -- rendering a full-screen polygon -- would also work. In that case you would:
1. Encode all of the AABBs into a 64x64 texture (or several textures, if the data won't fit in one).
2. Draw a full-screen quad to a 64x64 render-target.
3. In the pixel shader, use the screen position to read an AABB from the input texture(s). Perform the occlusion test on the AABB using the HZB texture, and output black or white.
With either of these, you end up with a 64x64 (or whatever size you need) render-target containing black or white pixels, each pixel representing the visibility result of a single object. You can then download that back to the CPU to determine which objects to render.
As an additional step, you can compress this texture before moving it to the CPU -- each pixel in the results texture takes up 32 bits (assuming RGBA8 render-targets), but you only need 1 bit of data per pixel.
So, you can take this results texture and render a full-screen quad to a 2x64-sized target. In the pixel shader, you take 32 texture samples horizontally and pack them together into a single RGBA8 output pixel.
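A CPU model of that packing pass might look like this (hypothetical, just to show the bit layout -- on the GPU it would be a pixel shader taking the 32 samples):

```c
#include <stdint.h>

/* Hypothetical model of the compression pass: each output value packs
   32 horizontal black/white results into one 32-bit RGBA8 pixel,
   turning a 64x64 results target into a 2x64 one. */
static void pack_row_bits(const uint8_t *results, /* 64*64 values, 0/1 */
                          uint32_t *packed)       /* 2*64 outputs      */
{
    for (int y = 0; y < 64; ++y)
        for (int gx = 0; gx < 2; ++gx) {   /* two output pixels per row */
            uint32_t bits = 0;
            for (int b = 0; b < 32; ++b)
                if (results[y * 64 + gx * 32 + b])
                    bits |= 1u << b;
            packed[y * 2 + gx] = bits;
        }
}
```

That shrinks the readback from 16KiB to 512 bytes before it ever crosses the bus.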

[quote]Once they have this box they plan to test the four corners for visibility and conclude that if any corner is visible, so is the object.[/quote]No, they calculate a mip level of the HZB where 4 samples will cover the entire area of the AABB, not just its corners.
Look up occlusion queries. You can query back how many pixels were drawn between start() and stop(). You would draw your big occluders, like buildings, all normally shaded and such; then you would draw basic bounding volumes for your small objects, like cars (with writes to the screen disabled via glColorMask). For each object you start an occlusion query, and after you are done drawing them all, you request the query results back all at once.

I don't see much use for occlusion queries outside of cities, so you could optimize by spatially dividing your city into blocks and having all the cars etc. belong to those blocks. That way you only have a few queries to actually check, and then you mark everything inside each block as visible or not.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

[quote]Look up occlusion query. ... I don't see much use for occlusion queries other than cities[/quote]The above technique doesn't use hardware occlusion queries. In the "Rendering with Conviction" explanation of the above technique, they mention that the cost of a single occlusion query for them was 0.05ms, whereas performing the above technique for 20,000 objects cost only 0.4ms total (0.00002ms each).

Many current games, even ones based on popular engines like Unreal, use this technique for their visibility culling (instead of PVS, etc). They render a low-poly mesh of the entire level, and then batch-test every single object in the level at once (without any fancy hierarchical structures at all).
[quote]single occlusion query for them was 0.05ms, whereas performing the above technique for 20,000 objects cost only 0.4ms total (0.00002ms each).[/quote]But this does not mean 20,000 queries = 0.05 * 20,000. I don't know what it equals, because I have never used it. It's also a lot easier than implementing it on the CPU, as he seems confused already anyway. I'll have to try to pull up my old CPU version of this and time it. Sounds about right, but again, 0.05ms might just be the overhead of starting the query process: 20,000 queries could be 0.05ms overhead + 0.0000001ms per query. I haven't seen many numbers on occlusion queries CPU vs. GPU, though.


Thanks all for the responses!

