OpenGL Trying to make sense of this shader article [occlusion]


[font="Verdana, Helvetica, sans-serif"]Hi! haven't posted here before but some people from Tigsource said that this was a good place to go to ask technical questions. S[/font][font="Verdana, Helvetica, sans-serif"]omebody linked me this paper, it describes a shader technique. I'm having trouble understanding it, I think because it is written targeting DirectX (DX9, I think), which I do not use. I'm trying to figure out how to translate this to OpenGL-ese.[/font][font="Verdana, Helvetica, sans-serif"]
[/font]
[font="Verdana, Helvetica, sans-serif"]http://gamedeveloper...1109?pg=37#pg37[/font]
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]The article suggests you could use a GPU to accelerate large-scale occlusion detection. I think I get their general approach: You have a scene containing a number of objects which are in themselves complex, like people or buildings. If one of these objects is completely obscured by another object, you'd prefer not to even try to render it-- maybe these objects are so complex that even with z-buffer testing you're spending a lot by trying to draw it only to have its pixels discarded. So before you actually try to draw, you draw a simplified version of the scene-- instead of the building, you draw a plain box that has the same dimensions as the building-- and then you use a shader to report back information about which of the boxes wound up in the final scene. Okay, that makes sense. But then I start thinking about how to implement the technique they describe and I just wind up feeling like an idiot.[/font]
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]Here's where I'm lost:[/font]
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]- When do you do this? Are they really proposing doing this every frame?[/font]
[font="Verdana, Helvetica, sans-serif"]- Once the GPU has created the list of "here's what is and isn't occluded", the CPU has to act on it. Is it really cheaper to ship back data every frame from the GPU to the CPU than to do the occlusion tests in CPU? I thought shipping data GPU->CPU was basically the most expensive thing you could do.[/font]
[font="Verdana, Helvetica, sans-serif"]- My big point of confusion. They say:[/font][font="Verdana, Helvetica, sans-serif"]
[/font]
[font="Verdana, Helvetica, sans-serif"]
After dispatching the compute shader to test all the bounds for visibility, you'll have time to do some processing on the CPU while you wait for the results... Once the compute shader finishes, it's just a matter of locking the writable buffer from steps 3-5, and copying the float values representing visible or hidden objects into a CPU-accessible buffer so that a final list of visible objects can be prepared for rendering.[/quote][/font][font="Verdana, Helvetica, sans-serif"]

They suggest storing the floats in a "writable buffer". I don't think I've ever heard of such a concept. They also make reference (sec. 3) to a "compute shader", a piece of vocabulary I'm not familiar with. Are these DirectX things? Do they have OpenGL analogues? It sounds as if they are running a shader program that has access to a plain array, can sample a texture at will, and can write into random-access locations in that array. They have individual indexes in the array correspond to objects being tested for occlusion, and somehow the shader also has access to the bounding box of each object it's working on. (They have a diagram where they show a 3D object and then draw a bounding box for it on the screen; they seem to assume they can get this bounding box for any given object -- in fact, they seem to consider getting this information so easy that they don't even bother telling you how to do it. Once they have this box, they plan to test the four corners for visibility and conclude that if any corner is visible, so is the object. That doesn't sound like a fair assumption for a sphere, but...) Are these techniques that translate to OpenGL-land at all?
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]The closest I can get to making something like this work with the OpenGL tools I know of are:[/font][font="Verdana, Helvetica, sans-serif"]
[/font]
[font="Verdana, Helvetica, sans-serif"]1. Render the "simplified" scene such that the depth buffer goes into a texture.[/font]
[font="Verdana, Helvetica, sans-serif"]2. Render a very small scene (like, 16x16) consisting of one polygon covering the screen, with the vertex shader just passing a varying to each pixel which tells it its screen coordinate. This will give me 256 runs into a pixel shader.[/font]
[font="Verdana, Helvetica, sans-serif"]3. Each pixel shader runthrough uses its coordinate to determine which object it's responsible for testing; it samples the depth texture from (1), computes whether the object was visible or not, and writes either black or white at its pixel. [/font]
[font="Verdana, Helvetica, sans-serif"]4. Copy the small 16x16 texture back to the CPU and read each pixel to decide whether to draw the object.[/font]
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]...but, this doesn't work because I have NO IDEA how I would get the "bounding box", or, if I magically had a list of screen-bounding-boxes-for-objects. how I would pass this list of information into the pixel shader.[/font]
[font="Verdana, Helvetica, sans-serif"]
[/font][font="Verdana, Helvetica, sans-serif"]Am I missing anything?![/font]
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]Thanks for any responses! I don't actually exactly plan to use this technique, but I do want to develop my shader programming skillset to the point where I can at least UNDERSTAND an article like this :/[/font]

[font="Verdana, Helvetica, sans-serif"]
When do you do this? Are they really proposing doing this [/font][font="Verdana, Helvetica, sans-serif"]every frame?[/quote]Yes, occlusion testing is done every frame.
[/font][font="Verdana, Helvetica, sans-serif"]- Once the GPU has created the list of "here's what is and isn't occluded", the CPU has to act on it. Is it really cheaper to ship back data every frame from the GPU to the CPU than to do the occlusion tests in CPU? I thought shipping data GPU->CPU was basically the most expensive thing you could do.[/quote]In general, sending data back to the CPU from the GPU is a very bad idea, yes, due to the latency involved.[/font]
[font="Verdana, Helvetica, sans-serif"]Often, to get around this problem, people use the results with a one frame delay -- e.g. [/font]
[font="Verdana, Helvetica, sans-serif"][Update #1] [Occlusion #1] [Render #1 using results from update #1 and Occlusion #0]
[/font][font="Verdana, Helvetica, sans-serif"][Update #2] [Occlusion #2] [Render #2 using results from update #2 and Occlusion #1] [/font]
[font="Verdana, Helvetica, sans-serif"]^^this causes "popping" artefacts where objects take a single frame to apper after they come into view, so it's an unpopular technique for many people.

However, occlusion results only need one bit per object. If you're occluding 10,000 objects, that's ~1KiB of data, which isn't very much.
You still have to find a way to mask the latency -- e.g. (see the sketch after this list):
1. Render the occlusion volumes.
2. "Render" the occlusion tests.
3. Initiate the data transfer to the CPU.
4. Render something that doesn't depend on the occlusion results (keeping the GPU busy during the latency/stall period).
5. Do some kind of non-rendering task on the CPU (keeping the CPU busy during the latency/stall period).
6. Render the things that do depend on the occlusion tests.
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]
[/font][font="Verdana, Helvetica, sans-serif"]They suggest storing the floats in a "writable buffer". I don't think I've ever heard of such a concept.[/quote]A render-target, frame-buffer, whateveryouwanttocallit is a writeable buffer.[/font]
[font="Verdana, Helvetica, sans-serif"]
[/font][font="Verdana, Helvetica, sans-serif"]
They also make (sec. 3) reference to a "compute shader", a piece of vocabulary I'm not familiar with. Are these DirectX things? Do they have OpenGL analogues?[/quote]Compute shaders were introduced in DX11. If you're not using DirectX, you can use them via OpenCL. I believe you can find some of the compute-shader code on the author's blog.[/font]
[font="Verdana, Helvetica, sans-serif"]However, this technique can be implemented using only pixel/vertex shaders instead of using compute shaders.[/font]
[font="Verdana, Helvetica, sans-serif"]
[/font]
[font="Verdana, Helvetica, sans-serif"]
[/font][font="Verdana, Helvetica, sans-serif"]they seem to believe getting this information is so easy they don't even bother telling you how to do it.[/quote]The information comes from the context of GDC/etc, attended by people who would find it obvious and be bored by the explanation of how to do it ;)[/font]
[font="Verdana, Helvetica, sans-serif"]
[/font]
[font="Verdana, Helvetica, sans-serif"]
It sounds as if they are running a shader program that has access to a plain array, and can sample a texture at will and write into random-access places in that array. They're having individual indexes in the array correspond to objects that are being tested for occlusion, and somehow also the shader has access to the bounding boxes of the individual objects it's working on[/quote]It's a lot simpler a problem than it first sounds - picture this:[/font]
[font="Verdana, Helvetica, sans-serif"]You've got a 64x64 render-target/frame-buffer, which is basically an array of 4096 pixels.[/font]
[font="Verdana, Helvetica, sans-serif"]You've got a VBO with 4096 vertices in it -- each vertex represents a single object to be tested by the occlusion system.[/font]
[font="Verdana, Helvetica, sans-serif"]Each vertex has a 2d position (which corresponds to a unique pixel position in the render target) -- this is where the output/result will be stored.[/font]
[font="Verdana, Helvetica, sans-serif"]Each vertex also has a 3D AABB encoded in it's texture coordinates.[/font]
[font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]You render this VBO to this render-target using "points" primitives, so each vertex becomes a single output pixel.[/font]
[font="Verdana, Helvetica, sans-serif"]You use a vertex shader which simply places the output vertex at the 2D pixel position specified in the input vertex and passes through the "texcoord"/attribute data.[/font]
[font="Verdana, Helvetica, sans-serif"]You use a pixel shader that takes the attribute data, decodes the AABB from it and does all the occlusion logic. During this occlusion logic, a few texture samples are taken from the HZB and compared against the AABB's depth.[/font]
[font="Verdana, Helvetica, sans-serif"]
[/font][font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]Alternatively, the method that you describe -- of rendering a full-screen polygon -- would also work. In that case you would:[/font]
[font="Verdana, Helvetica, sans-serif"]Encode all of the AABB's into a 64x64 texture (or several textures if the data won't fit in one).[/font]
[font="Verdana, Helvetica, sans-serif"]Draw a full-screen quad to a 64x64 render-target.[/font]
[font="Verdana, Helvetica, sans-serif"]In the pixel shader, use the screen-position to read an AABB from the input texture(s). Perform the occlusion test on the AABB using the HZB texture and output black/white.[/font]
[font="Verdana, Helvetica, sans-serif"]
[/font][font="Verdana, Helvetica, sans-serif"][/font]
[font="Verdana, Helvetica, sans-serif"]With either of these, you end up with a 64x64 (or whatever size you need) render-target containing black or white pixels, with each pixel representing the visibility result of a single object. [/font][font="Verdana, Helvetica, sans-serif"]You can then download that back to the CPU to determine which objects to render.[/font][font="Verdana, Helvetica, sans-serif"]
[/font]
[font="Verdana, Helvetica, sans-serif"]As an additional step, you can first compress this texture before moving it to the CPU -- each pixel in the texture takes up 32bits (assuming RGBA8 render-targets), but you only require 1 bit of data per pixel.[/font]
[font="Verdana, Helvetica, sans-serif"]So, you can take this 'results' texture and render a full-screen quad to a 2x64 sized target. In the pixel shader, you take 32 texture samples horizontally and pack them together into a single RGBA8 output pixel.[/font]

[font="Verdana, Helvetica, sans-serif"]
[/font][font="Verdana, Helvetica, sans-serif"] Once they have this box they plan to test the four corners for visibility and conclude that if any corner is visible, so is the object.[/quote]No, they calculate a mip-level of the HZB where 4 samples will cover the entire area of the AABB, not just it's corners.[/font]
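For reference, the usual way to pick that mip level (a sketch; the variable names are invented) is to find the level where one HZB texel is at least as big as the box's screen rect, so a 2x2 block of samples is guaranteed to cover it:

[code]
// rectMin/rectMax: the AABB's screen rect in [0,1] UVs; hzbSize: size of HZB mip 0 in pixels
vec2 sizePx = (rectMax - rectMin) * hzbSize;
float mip   = ceil(log2(max(sizePx.x, sizePx.y)));
// e.g. a 300x120-pixel rect gives ceil(log2(300)) = 9; at mip 9 one texel spans 512 pixels
// of the original depth buffer, so four samples always cover the whole rect.
[/code]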

Look up occlusion queries. You can query back how many pixels were drawn between a query's start and stop. You would draw your big occluders, like buildings, normally shaded and such, then draw basic bounding volumes for your small objects like cars etc. (with writes to the screen disabled via glColorMask). You start an occlusion query for each object, and after you are done drawing them all, you request all the query results back at once.
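A rough sketch of what that looks like with the OpenGL query API; the Object type and the draw helpers are hypothetical placeholders, not from any particular engine:

[code]
// Hardware occlusion queries (core since OpenGL 1.5) -- sketch only.
#include <vector>

void occlusionTestObjects(std::vector<Object>& objects)
{
    std::vector<GLuint> queries(objects.size());
    glGenQueries((GLsizei)queries.size(), queries.data());

    drawBigOccluders();                                   // buildings etc., drawn normally

    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // proxies shouldn't write colour...
    glDepthMask(GL_FALSE);                                // ...or depth
    for (size_t i = 0; i < objects.size(); ++i) {
        glBeginQuery(GL_SAMPLES_PASSED, queries[i]);
        drawBoundingVolume(objects[i]);                   // cheap box/sphere proxy
        glEndQuery(GL_SAMPLES_PASSED);
    }
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);

    // request all the results back in one go at the end
    // (this can still stall if the GPU hasn't finished the queries yet)
    for (size_t i = 0; i < objects.size(); ++i) {
        GLuint samplesPassed = 0;
        glGetQueryObjectuiv(queries[i], GL_QUERY_RESULT, &samplesPassed);
        objects[i].visible = (samplesPassed > 0);
    }
    glDeleteQueries((GLsizei)queries.size(), queries.data());
}
[/code]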

I don't see much use for occlusion queries outside of things like cities, so you could optimize it by spatially dividing your city into blocks and keeping all the cars etc. inside those city blocks. That way you would only have a few queries to actually check, and then you mark everything inside each block as visible or not.

[quote]Look up occlusion query. ... I don't see much use for occlusion queries other than cities[/quote]
The above technique doesn't use hardware occlusion queries. In the "Rendering with Conviction" explanation of the above technique, they mention that a single occlusion query cost them 0.05ms, whereas performing the above technique on 20,000 objects cost only 0.4ms total (0.00002ms each).

Many current games, even ones based on popular engines like Unreal, use this technique for their visibility culling (instead of PVS, etc). They render a low-poly mesh of the entire level, and then batch-test every single object in the level at once (without any fancy hierarchical structures at all).

[quote]single occlusion query for them was 0.05ms, whereas performing the above technique for 20,000 objects cost only 0.4ms total (0.00002ms each).[/quote]
But that does not mean 20,000 queries = 0.05 * 20,000. I don't know what it equals, because I have never measured it. Using queries is also a lot easier than implementing this on the CPU, and he seems confused already anyway. I'll have to try to pull up my old CPU version of this and time it. The numbers sound about right, but again, that 0.05ms might just be the overhead of starting the query process -- 20,000 queries could be 0.05 overhead + 0.0000001ms for each query. I haven't seen many numbers on occlusion queries CPU vs. GPU, though.
