3D Proper Pixel Coverage Testing for CPU Occlusion Culling

Posted by D.V.D

Hey guys, I've been writing a software occlusion culling algorithm (mostly for fun and learning), and I came across an issue. I store my geometry as an octree that I traverse front to back, and I check each node for coverage to see if it and its children are occluded. The node itself is a cube, but when checking a node for occlusion I approximate it as the minimum on-screen bounding box that covers the cube. My issue is that when checking this on-screen bounding box for occlusion, I don't know exactly when a pixel should be considered touched by the geometry. Currently, I generate min/max points for every node on screen, round them to the nearest integer, and loop as shown in the code below:


f32* DepthRow = RenderState->DepthMap + MinY*ScreenX + MinX;
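// NOTE: Walk the depth buffer over the node's screen bounding box; any texel
// storing a depth greater (farther) than NodeZ means the node is not occluded there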
for (i32 Y = MinY; Y < MaxY; ++Y)
{
    f32* Depth = DepthRow;
    for (i32 X = MinX; X < MaxX; ++X)
    {
        if (*Depth > NodeZ)
        {
            // NOTE: We aren't occluded
        }

        ++Depth;
    }

    DepthRow += ScreenX;
}
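(For reference, here is a minimal sketch of how those screen-space min/max values might be computed from the node's 8 projected corners. ProjectPoint, the v2 type, and the f32/i32/u32 typedefs are taken from the render code later in this thread; CubeCorners, Min, Max, and RoundToNearest are hypothetical helpers.)

v2 First = ProjectPoint(CubeCorners[0]);
f32 FMinX = First.x, FMaxX = First.x;
f32 FMinY = First.y, FMaxY = First.y;
for (u32 CornerId = 1; CornerId < 8; ++CornerId)
{
    v2 P = ProjectPoint(CubeCorners[CornerId]);
    FMinX = Min(FMinX, P.x); FMaxX = Max(FMaxX, P.x);
    FMinY = Min(FMinY, P.y); FMaxY = Max(FMaxY, P.y);
}

// Round to nearest to get the integer loop bounds used above
i32 MinX = RoundToNearest(FMinX);
i32 MaxX = RoundToNearest(FMaxX);
i32 MinY = RoundToNearest(FMinY);
i32 MaxY = RoundToNearest(FMaxY);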

This method is roughly what software rasterizers do, in that a pixel is only considered part of the box if the center of the pixel is inside the box. Sometimes, though, I get nodes that are 2x1 pixels in size, and when I subdivide them, none of the children are rendered because of where they happen to fall relative to the pixel centers. This is illustrated in the two images below:

[Two attached images: hand-drawn examples of a small node and its children relative to the pixel centers]

So in the above examples, one node happens to overlap 2 pixels, but once it's subdivided, some of its children are empty and the ones which aren't overlap 0 pixels because their min and max values round to the same integer, so nothing gets rendered. My issue is how to better decide when a node should overlap a pixel to avoid such cases. I thought I could make my loop inclusive by checking Y <= MaxY and X <= MaxX, but then I have to subdivide nodes until MinX == MaxX and MinY == MaxY, which makes my program render 10 times more nodes than it usually does. My other option is to check whether the pre-rounded Min/Max values are within a distance of 1 of each other and render that, regardless of whether the block covers more than one pixel because of its orientation. The only issue I have with this method is whether I can really consider my renders pixel accurate: in the worst case, a node can have a diameter of 1 on screen but cover 4 pixels, so I can get blockier results than I should.
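(To make that failure concrete, here is a standalone sketch with made-up numbers, using the same f32/i32 typedefs as above; it assumes pixel X owns the interval [X, X+1) with its center at X + 0.5.)

// Number of pixels the exclusive loop visits for a 1D span [FMin, FMax],
// with round-to-nearest bounds (the +0.5f cast trick is valid for non-negative coordinates)
i32 CoveredPixels(f32 FMin, f32 FMax)
{
    i32 Min = (i32)(FMin + 0.5f);
    i32 Max = (i32)(FMax + 0.5f);
    return Max - Min; // iterations of "for (X = Min; X < Max; ++X)"
}

// CoveredPixels(3.1f, 4.9f) == 2  -> a parent spanning [3.1, 4.9] touches pixels 3 and 4
// CoveredPixels(3.6f, 4.4f) == 0  -> a child spanning [3.6, 4.4] touches nothing, even though
//                                    it straddles both pixels, because neither pixel center
//                                    (3.5 or 4.5) lies inside the span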

Is there a good fix for this issue? There aren't many resources on software rasterization, especially for very small geometry, so I don't know how well explored this issue really is.


If your test geometry is an approximation, you need to test in a conservative way. The easiest way is to truncate MinX and MinY to integers and take the ceiling of MaxX and MaxY.
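(A minimal sketch of that conservative rounding, in the same f32/i32 style as the code above; floorf/ceilf come from <math.h>.)

#include <math.h>

// Conservative bounds: include every pixel the box overlaps at all
i32 MinX = (i32)floorf(FMinX);
i32 MinY = (i32)floorf(FMinY);
i32 MaxX = (i32)ceilf(FMaxX);
i32 MaxY = (i32)ceilf(FMaxY);

// The exclusive loop "for (X = MinX; X < MaxX; ++X)" then visits every pixel the box
// overlaps, never fewer, so the occlusion test can only err on the side of calling a
// node visible.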

3 hours ago, rapso said:

If your test geometry is an approximation, you need to test in a conservative way. The easiest way is to truncate MinX and MinY to integers and take the ceiling of MaxX and MaxY.

So I guess I should clarify. The model geometry is the actual octree. When I check for occlusion, I approximate the cube as the screen bounding box of the cube's 8 vertices. When I render the cube, I render it as the screen bounding box as well. I just hope that the model is detailed enough that the cubes I render are 1 pixel in size, so the screen bounding box converges to the same result as rendering the cube outright. You can see this below as I modify the max level of detail:

[Three attached screenshots of the renderer at different maximum levels of detail]

So if I were to test conservatively, I'd also have to render conservatively, and then a node is one pixel in size when MaxX - MinX == 0 and MaxY - MinY == 0, which is what I did before, and the program would render 10 times more nodes than it currently does.


Software occlusion culling is usually used to get a _potential_ set of visible objects. You won't get "screen bounding box converges to the same result as rendering the cube outright", because a screen-space box != world-space box.

"and the program would render 10 times more nodes then it currently does." it will return "potentially" visible objects. it should not be 10 times, but there will be some more "potentially visible" than you really have, and that's usually OK. The goal is to reject objects that are 100% not visible.

If you really want a 100% matching result, you need to test-rasterize the real objects, not just bounding rects. (You can optimize that by using screen rects for big objects and accurate tests for small coverage, e.g. <= 2x2 pixels, but obviously 99% of the cost will be in the tiny/accurate objects.)
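(In code form, that split might look roughly like this; TestRectOcclusion and TestExactOcclusion are placeholders for the two tests, not functions from the thread, and the b32/i32 types are the ones used in the thread's code.)

b32 NodeIsVisible;
if ((MaxX - MinX) <= 2 && (MaxY - MinY) <= 2)
{
    // Tiny on screen: rasterize the node's real silhouette against the depth buffer
    NodeIsVisible = TestExactOcclusion(Node, DepthMap);
}
else
{
    // Large on screen: the cheap, conservative screen-space rect test is good enough
    NodeIsVisible = TestRectOcclusion(MinX, MinY, MaxX, MaxY, MinZ, DepthMap);
}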

 

btw. If all you want is to render voxels, then raycasting might be the fastest solution.

23 hours ago, rapso said:

Software occlusion culling is usually used to get a _potential_ set of visible objects. You won't get "screen bounding box converges to the same result as rendering the cube outright", because a screen-space box != world-space box.

"and the program would render 10 times more nodes then it currently does." it will return "potentially" visible objects. it should not be 10 times, but there will be some more "potentially visible" than you really have, and that's usually OK. The goal is to reject objects that are 100% not visible.

If you really want a 100% matching result, you need to test-rasterize the real objects, not just bounding rects. (You can optimize that by using screen rects for big objects and accurate tests for small coverage, e.g. <= 2x2 pixels, but obviously 99% of the cost will be in the tiny/accurate objects.)

 

btw. If all you want is to render voxels, then raycasting might be the fastest solution.

Right, so I guess I should have stated differently what I'm doing. I'm rendering voxels stored in octrees. Part of that process is checking for occlusion. So I walk through the octree front to back, and if a node is occluded, I skip it and all of its children; otherwise, I traverse it as well.

So when I say it would render 10 times more nodes, that's what it did when I tested it. I'm assuming the reason is that lots of nodes get rounded up to have a diameter of 1 (so MaxX - MinX == 1) instead of 0, so the program traverses way further than it should. I know it is traversing way more nodes than it should because, on average, the children of a parent node in an octree are about 60% of the size of their parent on screen. Doing the math for a 512x512 screen, I should traverse 12-13 levels to get pixel-sized octree nodes, which my current method gets me. When I require nodes to have MinX == MaxX and MinY == MaxY, that becomes a couple of levels deeper, which shouldn't be happening.
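(Spelling that estimate out: if each level's screen footprint is roughly 0.6 times its parent's, then the depth n needed to shrink 512 pixels down to about 1 pixel satisfies

    512 \cdot 0.6^{n} \approx 1 \quad\Longrightarrow\quad n \approx \frac{\ln 512}{\ln(1/0.6)} \approx 12.2,

which matches the 12-13 levels mentioned above.)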

Right, so to get 100% accurate results that would be the correct path, but once the nodes become 1-2 pixels in size, the difference between the two approaches becomes very close to 0. My issue is with the small nodes; the large nodes seem to be working just fine, it's the nodes approaching pixel size that give me inconsistent results. So in the examples I drew in my first post, I have a 2x2 node that would be considered visible on screen, but when I subdivide it, none of its full (non-empty) children overlap a pixel, so I get nothing drawn to the screen. This makes the traversal provide no results, and I don't think it's visually the correct result to be getting.


Hey all, so I did some experimenting to see what rendered output various methods give me. I'm focusing right now on making sure my rendered image draws hard edges properly, and I want it to do so in as few octree traversals as possible. My octree render function looks a little like this:


void RenderOctree(v3 Center, octree* Node, u32 Level)
{
    b32 IsLeaf = CheckIfLeaf(Node);
    
    i32 MinX, MaxX, MinY, MaxY;
    f32 FMinX, FMaxX, FMinY, FMaxY;
    f32 MinZ, MaxZ;
    ApproximateNodeSize(Center, Level, &FMinX, &FMaxX, &FMinY, &FMaxY, &MinZ, &MaxZ);
    
    if (MaxZ <= 0.0f)
    {
        // NOTE: Node is behind camera, cull it
        return;
    }
    
    MinX = SomeRoundingMethod(FMinX);
    MaxX = SomeRoundingMethod(FMaxX);
    MinY = SomeRoundingMethod(FMinY);
    MaxY = SomeRoundingMethod(FMaxY);
    
    if (NodeIsPixelSize())
    {
        // Render the node as a dot
        v2 ProjectedCenter = ProjectPoint(Center);
                    
        u32 PixelId = SomeRoundingMethod(ProjectedCenter.y)*ScreenX + SomeRoundingMethod(ProjectedCenter.x);
        f32 Depth = DepthMap[PixelId];
        if (Depth > Center.z)
        {
            RenderState->DepthMap[PixelId] = Center.z;
        }
        
        return;
    }
    
    if (MinX >= ScreenX || MinY >= ScreenX || MaxX <= 0 || MaxY <= 0)
    {
        // NOTE: Node is outside of the screen, don't render it or its children
        return;
    }
    
    MinX = Max(0, MinX);
    MinY = Max(0, MinY);
    MaxX = Min(ScreenX, MaxX);
    MaxY = Min(ScreenX, MaxY);
    
    b32 IsOccluded;
    {
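        // NOTE: The node is occluded only if every depth texel it covers is already
        // nearer than (or equal to) MinZ, the node's closest possible depth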
        IsOccluded = true;
        f32* DepthRow = DepthMap + MinY*ScreenX + MinX;
        for (i32 Y = MinY; Y < MaxY; ++Y)
        {
            f32* Depth = DepthRow;
            for (i32 X = MinX; X < MaxX; ++X)
            {
                if (*Depth > MinZ)
                {
                    // NOTE: We aren't occluded
                    IsOccluded = false;
                    goto EndLoop;
                }

                ++Depth;
            }

            DepthRow += ScreenX;
        }
    }
EndLoop:
    if (IsOccluded)
    {
        return;
    }


    for (u32 CurrNodeId = 0; CurrNodeId < 8; ++CurrNodeId)
    {
        u32 SortedNodeId = Indicies[GetFrontBackSortId(CurrNodeId)];
        if (Node->Children[SortedNodeId])
        {
            RenderOctree(GetChildCenter(Node, SortedNodeId), Node->Children[SortedNodeId], Level + 1);
        }
    }
}

So in the above code, we first approximate the node's size on screen and get its float min/max xy values, then use some sort of rounding method to convert those values to ints, check if the node's size is a pixel (and render it if it is), and otherwise clip the node, check for occlusion, and traverse its children. For the various experiments I tried, I used the above code and only modified the rounding method for Min/Max, the rounding method for rendering the 1-pixel node, and whether the occlusion check was inclusive or exclusive of the max x,y values.

The first methods I tried render single-pixel nodes by projecting the center of the node and flooring it to convert it to an integer pair, which represents the pixel that node takes up. The reason we floor the center of the node is that flooring corresponds to rendering the node to the pixel it covers the most.

Method 1:

[Screenshot: Method 1 render]

Render Info: Num Traversals: 5056321 NumRejected: 562733 NumRendered: 3656973

This method takes the floor of the FMinX/Y values and the ceiling of the FMaxX/Y values to get the pixels a node intersects (for occlusion checks). The idea is to be conservative about the pixels the node is expected to cover. This method produces renders with sharp edges but traverses a ton of extra nodes to do so. We also check whether a node is ready to render using the FMin/Max values: the node must satisfy (FMaxX - FMinX) <= 1.0f and (FMaxY - FMinY) <= 1.0f to be rendered.

Method 2:

[Screenshot: Method 2 render]

Render Info: Num Traversals: 3406177 NumRejected: 646950 NumRendered: 2193007

Here we round the FMin/Max values to the nearest integer to calculate the integer Min/Max values. If you look at some of the edges in the render, they have pixel crawling (random dots sticking out). Method 1 didn't have this artifact because it would fill in the extra line where the dots stick out (the edges were thicker). In this method, we get a ton of nodes that get rounded down to be inside the edge instead of outside, and are thus rejected and prevented from being rendered correctly, which gives us the pixel crawling. We again use the FMin/Max values to check if a node is small enough to be rendered.

Method 3:

[Screenshot: Method 3 render]

Render Info: Num Traversals: 3705105 NumRejected: 299739 NumRendered: 2784177

Here we use the integer Min/Max values to decide if a node is small enough to be considered a single pixel in size: we check if MaxX - MinX <= 1 and MaxY - MinY <= 1. We take the floor of the FMin/FMax values to generate the Min/Max values, and we make the occlusion check inclusive of pixels at the MaxX and MaxY coordinates. The result has lots of holes and pixel crawling, which seems to be caused by improper coverage of the nodes due to the rounding method. I figured that being inclusive with the bounds check would make the node size correspond to the actual pixels the node takes up, but apparently that isn't the case.

Method 4: 

[Screenshot: Method 4 render]

Render Info: Num Traversals: 2544985 NumRejected: 340016 NumRendered: 1777500

Here we use the integer Min/Max values to decide when to render a node, and we calculate them by rounding FMin/Max to the nearest integer. The idea is that the previous method wasn't capturing the node size properly, so maybe rounding gives the actual size the nodes take up on screen. We again use the integer Min/Max to decide if a node is ready to be rendered. The result looks almost perfect, but it still has a little pixel crawling on hard edges.

Method 5: 

[Screenshot: Method 5 render]

Render Info: Num Traversals: 2686033 NumRejected: 1045595 NumRendered: 1184283

This method (like Method 4) uses the integer Min/Max to decide when a node should be rendered, and it calculates the Min/Max values by rounding to the nearest integer. The only difference is that if a node's MinX == MaxX or MinY == MaxY, the node is discarded completely and not rendered. This seems to clean up Method 4's image and make all the edges sharp.
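(For clarity, here is a compact sketch of Method 5's rules as described above. RoundToNearest and RenderNodeAsDot are placeholders, not functions from the actual code; the FMin/FMax values and the f32/i32 types are the ones used throughout this thread.)

// Method 5: nearest-integer bounds, integer-based size test, and outright
// rejection of nodes that round to zero pixels wide or tall
i32 MinX = RoundToNearest(FMinX);
i32 MaxX = RoundToNearest(FMaxX);
i32 MinY = RoundToNearest(FMinY);
i32 MaxY = RoundToNearest(FMaxY);

if (MinX == MaxX || MinY == MaxY)
{
    // NOTE: Degenerate node, covers no pixels under this rounding, discard it
    return;
}

if ((MaxX - MinX) <= 1 && (MaxY - MinY) <= 1)
{
    // NOTE: Pixel-sized node, splat it at its projected center instead of recursing
    RenderNodeAsDot(Center);
    return;
}

// Otherwise clip, occlusion-test, and recurse into the children as in RenderOctree above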

 

Method 5 seems to be the most correct so far. Best of all, it also has a similar number of traversals to Method 4, so we aren't being overly conservative in approximating the node's size when converting the coordinates to integers. One thing that bothers me with this method is the existence of nodes which are 0 pixels wide or tall. Of course those nodes should be discarded since they cover 0 pixels, but it feels odd that they exist in the first place and have to be removed. I guess that's just a sacrifice that has to be made because of the rounding errors that can occur.

The other thing that bothers me with this method (and the other methods which use integer Min/Max values to decide if a node should be rendered) is the resulting level map. When I render each node, I also write into a separate buffer a color for the level (recursion depth) the node is at: one color for level 1, another color for level 2, and so on. If we use the float FMin/FMax values to decide if a node should be rendered, our level map looks like this:

[Screenshot: level map when using float FMin/FMax values]

If we use integer Min/Max, our level map looks like this:

[Screenshot: level map when using integer Min/Max values]

Now there are cases where a node that is 1 pixel by 1 pixel in size can actually cover 4 pixels, if its center happens to land where 4 pixel corners meet. By that logic, it makes sense for the node to be rendered based on the integer Min/Max values. But from the above images, we would expect layers of levels moving farther away from the camera, like what we get when we use the float FMin/FMax values to decide if a node is small enough to render.

So my question is, what's the right answer here? Should my rendering method generate 0-sized nodes that need to be discarded? Should it also have the weird level map shown above? Or are there other rounding methods that make more sense theoretically/experimentally?
