
# OpenGL Voxel Cone Tracing Experiment - Part 2 Progress

## 54 posts in this topic

Looks pretty good, where's your main bottleneck for performance, is it really your new mip mapping?


##### Share on other sites

Looks pretty good, where's your main bottleneck for performance, is it really your new mip mapping?

No, mip-mapping is still pretty cheap in the grand scheme of things (but the cost may accumulate later when I try to implement cascades).

Right now the main performance bottlenecks are soft shadowing, SSR and SSAO.

I haven't set up any way of querying the actual cost of each feature, so I can't tell you accurately where the costs come from. All I can give you is the framerate I remember from before I implemented soft shadowing, SSR and SSAO: it was running at 50fps with the same cone tracing features (except that I now have the modified mip-mapping). I think soft shadowing (for the main point light and 3 emissive objects) pushed it down to ~35fps, SSR pushed it down to ~25fps and SSAO pushed it to ~20fps.
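Frame rates don't subtract cleanly, but frame times do, so the remembered fps figures can be turned into rough per-feature costs. A minimal sketch, assuming the quoted frame rates are accurate and that nothing else changed between measurements; `frameTimeMs` and `featureCostMs` are names made up for this illustration:

```cpp
#include <cassert>

// Convert a frame rate in frames per second to a frame time in milliseconds.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Cost in milliseconds of a feature that dropped the frame rate
// from fpsBefore to fpsAfter.
double featureCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

With the numbers above, `featureCostMs(50, 35)` is roughly 8.6 ms for soft shadowing, `featureCostMs(35, 25)` roughly 11.4 ms for SSR and `featureCostMs(25, 20)` is 10 ms for SSAO. These are estimates from memory, not measurements.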

I believe that there is a lot of cost in binding all these textures, so I think my next step in improving performance would be to delve into bindless graphics. I also want to try to get partially resident textures working on my NVIDIA card with OpenGL 4.4, but I haven't found any resources to help me with this. Has anyone been able to implement this?


##### Share on other sites

It's well worth spending a day to implement a basic GPU profiling system (I guess using ARB_timer_query in GL? I've only done it in D3D so far.) so you can see how many milliseconds are being burnt by your different shaders/passes. You can then print them to the screen, or the console, or a file, etc, and get decent statistics for all of your different features in one go.

This is an excellent idea, especially since there are plenty of optimizations to pare down SSAO/SSR, those are pretty well established and researched. You'd really be looking at cone tracing for trying novel optimizations, though I can think of several already done.

One is downsampling before or otherwise binning together pixel blocks for the diffuse trace, which would work well with pixels relatively close to each other but miss thin and edge objects in the right cases. Epic also does this with the specular trace though they never explained how other than a hand wavy "then you upsample and scatter".

To reduce the number of samples for the specular trace you can check a lower mip level of the volume for the alpha to see if it's empty and whether you should skip it. I'm also interested in realtime signed distance fields, and may eventually get back to trying to figure them out. These should give you a minimum step size you can skip ahead by when tracing, reducing the number of samples you need.
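To make the signed distance field idea concrete, here is a minimal CPU-side sketch of sphere tracing, where each step advances by the distance the field reports as free of geometry. It assumes an analytic sphere as the whole scene; `Vec3`, `sphereSdf` and `sphereTrace` are hypothetical names, and a real implementation would sample a distance volume in a shader instead.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Signed distance from point p to a sphere (hypothetical test scene).
double sphereSdf(Vec3 p, Vec3 center, double radius) {
    double dx = p.x - center.x, dy = p.y - center.y, dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// March along origin + t*dir, advancing by the SDF value at each step.
// Returns the number of steps taken and writes the hit distance to tHit
// (left at -1 if nothing was hit within maxDist).
int sphereTrace(Vec3 origin, Vec3 dir, Vec3 center, double radius,
                double maxDist, double* tHit) {
    assert(tHit != nullptr);
    const double eps = 1e-4;  // distance at which we call it a hit
    double t = 0.0;
    int steps = 0;
    *tHit = -1.0;
    while (t < maxDist && steps < 256) {
        Vec3 p{origin.x + t * dir.x, origin.y + t * dir.y, origin.z + t * dir.z};
        double d = sphereSdf(p, center, radius);
        if (d < eps) { *tHit = t; break; }
        t += d;  // the field guarantees no surface is closer than d
        ++steps;
    }
    return steps;
}
```

For a ray starting 9 units in front of a unit sphere, this reaches the surface in a single step, where a fixed step of 0.1 would need around 90 samples to cover the same distance.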

As for a huge area (if you want to go that far), volume LOD and a Directed Acyclic Graph (as I mentioned earlier) should reduce memory consumption a lot, since you're using volume textures and not a sparse octree. The paper is based on doing this for an octree, though, so I'm not sure how a uniform volume texture would play out.

Edited by Frenetic Pony

##### Share on other sites

Your post-processing is fairly expensive. With just those effects you couldn't run a game at 60fps.


##### Share on other sites

I'm curious as to why your SSAO + SSR are so expensive.


##### Share on other sites

I'm curious as to why your SSAO + SSR are so expensive.

Did some debugging and found out that (because I'm using forward rendering) I had accidentally used the hi-res version of the Buddha model for my ssao and ssr (over a million tris).

So instead of 1.0ms from the vertex shader with the low-poly model, I was getting 10ms.

My ssao is about 8.5ms now.

However, when I previously reported my results, I didn't actually have any SSR turned on. Turning SSR on for the entire scene (all surfaces) costs an additional 8.8ms.

I guess there's still a lot of room to optimize my ssr - when I implemented it, I was looking more for getting the best quality I could get than performance.

I've managed to reduce my ssao to 4.7ms without too much quality loss.

I'm trying to calculate whether deferred shading has an advantage over my current forward shading. With deferred shading, I have to render the Buddha at full res for position, normal and albedo textures so this will be a fixed vertex shader cost of 30ms. At the moment with forward shading, I render the model at full res once and at low-res 7 times, so that makes 17ms altogether for vertex shader costs.

Edited by gboxentertainment

##### Share on other sites

Hi! Try an SSR implementation with an iterative step instead of a fixed step. You will find the reflected pixel in 3-8 steps. SSR should be faster than any other post-process effect.

SSAO: it is better to implement it at multiple resolutions with upsampling. That is faster, looks better, has no noise and needs no post-blur.


##### Share on other sites

I would ditch all screen-space hacks if I already had a voxel data structure somewhere...


##### Share on other sites

Yeah, your SSAO and SSR implementations seem very bloated. There's definitely a lot of room for optimizations here.

As for deferred shading: why would your cost for geometry go up 3x? You just need to use MRT to output some g-buffer data.
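The arithmetic behind the two estimates, and what a single MRT pass would change, can be written out as a small sketch. It assumes the per-pass vertex costs quoted earlier in the thread (about 10 ms for the high-res Buddha, about 1 ms for the low-poly one) and that cost scales linearly with the number of passes; the names are made up for illustration.

```cpp
#include <cassert>

// Vertex-shader cost figures quoted earlier in the thread, in milliseconds.
const double kHiResPassMs  = 10.0;  // one pass over the high-res Buddha
const double kLowResPassMs = 1.0;   // one pass over the low-poly Buddha

// Total vertex cost for a given number of high-res and low-res passes,
// assuming the cost scales linearly with the number of passes.
double vertexCostMs(int hiResPasses, int lowResPasses) {
    assert(hiResPasses >= 0 && lowResPasses >= 0);
    return hiResPasses * kHiResPassMs + lowResPasses * kLowResPassMs;
}
```

`vertexCostMs(1, 7)` gives the 17 ms quoted for forward rendering and `vertexCostMs(3, 0)` the 30 ms for three separate G-buffer passes. Writing position, normal and albedo from one pass via MRT would be `vertexCostMs(1, 0)`, about 10 ms under these assumptions.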


##### Share on other sites

I would ditch all screen-space hacks if I already had a voxel data structure somewhere...

The problem is computation and memory cost go up quite quickly with increased voxel resolution, and he's only getting that with a small room. Ideally you'd have say, a 16 square kilometer grid centered around the player. Which is going to cost more than enough by itself, without getting into the same voxel resolution as screen resolution.


##### Share on other sites

Screen-space effects are still useful for voxel cone tracing because the voxels often don't have enough resolution to provide finer details. For instance, ambient occlusion generated naturally by the cone tracing tends to look a bit washed out due to the lack of geometric detail, and thus can benefit from SSAO to add finer details.

The same goes for reflections: if you want sharp reflections you'd need very small voxels, which is impractical and expensive, so in this case screen-space reflections can help a lot. For blurred reflections, however, voxel cone tracing is very good.


##### Share on other sites

Here's my SSR code, for anyone who can help me optimize it while still keeping some plausible quality:

    vec4 bColor = vec4(0.0);

    // Build the (optionally bump-mapped) normal.
    vec4 N = normalize(fNorm);
    mat3 tbn = mat3(tanMat*N.xyz, bitanMat*N.xyz, N.xyz);
    vec4 bumpMap = texture(bumpTex, texRes*fTexCoord);
    vec3 texN = (bumpMap.xyz*2.0 - 1.0);
    vec3 bumpN = bumpOn == true ? normalize(tbn*texN) : N.xyz;

    // Reflection vector in camera space.
    vec3 camSpaceNorm = vec3(view*(vec4(bumpN,N.w)));
    vec3 camSpacePos = vec3(view*worldPos);

    vec3 camSpaceViewDir = normalize(camSpacePos);
    vec3 camSpaceVec = normalize(reflect(camSpaceViewDir,camSpaceNorm));

    // Start point in [0,1] screen space.
    vec4 clipSpace = proj*vec4(camSpacePos,1);
    vec3 NDCSpace = clipSpace.xyz/clipSpace.w;
    vec3 screenSpacePos = 0.5*NDCSpace+0.5;

    // Project a second point along the reflection to get a screen-space step.
    vec3 camSpaceVecPos = camSpacePos+camSpaceVec;
    clipSpace = proj*vec4(camSpaceVecPos,1);
    NDCSpace = clipSpace.xyz/clipSpace.w;
    vec3 screenSpaceVecPos = 0.5*NDCSpace+0.5;
    vec3 screenSpaceVec = 0.01*normalize(screenSpaceVecPos - screenSpacePos);

    vec3 oldPos = screenSpacePos + screenSpaceVec;
    vec3 currPos = oldPos + screenSpaceVec;
    int count = 0;
    int nRefine = 0;
    float farPlane = 2.0;
    float nearPlane = 0.1;

    float cosAngInc = -dot(camSpaceViewDir,camSpaceNorm);
    cosAngInc = clamp(1-cosAngInc,0.3,1.0);

    if(specConeRatio <= 0.1 && ssrOn == true)
    {
        // Fixed-step march, at most 50 steps.
        while(count < 50)
        {
            // Stop when the ray leaves the screen-space volume.
            if(currPos.x < 0 || currPos.x > 1 || currPos.y < 0 || currPos.y > 1 || currPos.z < 0 || currPos.z > 1)
                break;

            vec2 ssPos = currPos.xy;

            float currDepth = 2.0*nearPlane/(farPlane+nearPlane-currPos.z*(farPlane-nearPlane));
            float sampleDepth = 2.0*nearPlane/(farPlane+nearPlane-texture(depthTex, ssPos).x*(farPlane-nearPlane));
            float diff = currDepth - sampleDepth;
            float error = length(screenSpaceVec);
            if(diff >= 0 && diff < error)
            {
                // Just behind a front face: step back and shrink the step (up to 3 refinements).
                screenSpaceVec *= 0.7;
                currPos = oldPos;
                nRefine++;
                if(nRefine >= 3)
                {
                    break;
                }
            } else if(diff > error){
                // Well behind the front face: test against the back-face depth instead.
                bColor.xyz = vec3(0);
                sampleDepth = 2.0*nearPlane/(farPlane+nearPlane-texture(depthBTex, ssPos).x*(farPlane-nearPlane));
                diff = currDepth - sampleDepth;
                if(diff >= 0 && diff < error)
                {
                    screenSpaceVec *= 0.7;
                    currPos = oldPos;
                    nRefine++;
                    if(nRefine >= 3)
                    {
                        break;
                    }
                }
            }

            oldPos = currPos;
            currPos = oldPos + screenSpaceVec;
            count++;

        }
    }


Note that the second half of the code (after the `else if(diff > error)`) is where I cover the back faces of models (depthBTex is a depth texture rendered with front-face culling), so that the backs of models are reflected.
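The expression used for `currDepth` and `sampleDepth` is the common perspective depth linearization. For a standard OpenGL projection and an NDC depth in [-1, 1] it evaluates to eye-space depth divided by the far plane. A minimal CPU-side sketch of just that expression (`linearDepth01` is a made-up name):

```cpp
#include <cassert>

// Map an NDC depth value in [-1, 1] to eye-space depth divided by the far
// plane, using the same expression as the shader snippet.
double linearDepth01(double zNdc, double nearPlane, double farPlane) {
    assert(nearPlane > 0.0 && farPlane > nearPlane);
    return 2.0 * nearPlane /
           (farPlane + nearPlane - zNdc * (farPlane - nearPlane));
}
```

With the snippet's planes (near 0.1, far 2.0), an input of -1 maps to near/far = 0.05 and an input of 1 maps to 1. The shader passes [0, 1] values into the expression, which still keeps both depths on the same monotonic scale for the comparison; whether exact linearity was intended there is not clear from the snippet.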


##### Share on other sites
    float L=0.1;
    float4 T=0;
    float3 NewPos;
    for(int i=0;i<10;i++){
        NewPos=RealPos+R*L; // RealPos - current position, R - reflection vector
        T=mul(float4(NewPos,1),mat_ViewProj); // project the new position to the screen
        T.xy=0.5+0.5*float2(1,-1)*T.xy/T.w;
        NewPos=GetWorldPos( GBufferPositions.Load(uint2(gbufferDim.xy* T),0),T.xy,mat_ViewProjI); // find the world position stored at that pixel

        L=length(RealPos-NewPos); // new distance
    }

After the loop, T.xy is the texture coordinate of the reflected pixel.
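The loop is a fixed-point iteration on the ray length: guess a distance, look up the surface under the projected point, and use the distance to that surface as the next guess. A toy CPU-side sketch of the same idea, under strong simplifying assumptions that are not in the original post (2D scene, orthographic camera looking along +z so the screen coordinate is just x, and a flat wall at depth 5 as the only geometry):

```cpp
#include <cassert>
#include <cmath>

// Toy "G-buffer": a flat wall at constant depth (assumption for illustration).
double wallDepthAt(double /*screenX*/) { return 5.0; }

// Repeat the loop body from the snippet: step along the reflection ray by the
// current distance guess L, look up the surface under that screen position,
// and use the distance to that surface point as the next guess. Under the
// orthographic projection only the x component of the ray direction matters.
double iterateReflectionDistance(double px, double pz, double rx,
                                 int iterations) {
    assert(iterations > 0);
    double L = 0.1;
    for (int i = 0; i < iterations; ++i) {
        double surfX = px + rx * L;         // NewPos = RealPos + R * L, projected
        double surfZ = wallDepthAt(surfX);  // surface position at that pixel
        L = std::sqrt((surfX - px) * (surfX - px) +
                      (surfZ - pz) * (surfZ - pz));  // new distance
    }
    return L;
}
```

Starting at (x, z) = (0, 2) with reflection direction (0.6, 0.8), the true hit on the wall is 3.75 units away. In this toy case the estimate is within about 0.1 of that after three iterations, which is in line with the "3-8 steps" claim; real scenes with depth discontinuities will not converge this cleanly.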


##### Share on other sites

So I've managed to remove some of the artifacts from my soft shadows:

Previously, when I had used front-face culling I got the following issue:

This was due to backfaces not being captured by the shadow-caster camera when at overlapping surfaces, thus leading to a gap of missing information in the depth test. There's also the issue of back-face self shadowing artifacts.

Using back-face culling (only rendering the front faces) resolves this problem, but leads to the following problem:

These are front-face self-shadowing artifacts. No amount of bias resolves this, because it is caused by the jittering process during depth testing.

I came up with a solution that resolves all these issues for direct lighting shadows, which is to also store an individual object id for each object in the scene from the shadow-caster's point of view. During depth testing, I then compare the object id from the player camera's point of view with that from the shadow-caster's point of view and make it so that each object does not cast its own shadow onto itself:
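A minimal sketch of that comparison, written as CPU-side code for clarity. The struct layout and names are assumptions; in the actual renderer the depth and id would come from shadow-map textures and the test would run in the fragment shader.

```cpp
#include <cassert>

// One texel of the shadow map as described above: the closest depth seen by
// the shadow caster plus the id of the object that produced it.
struct ShadowTexel {
    float depth;
    int objectId;
};

// Returns true if the fragment is in shadow. A fragment is never shadowed by
// the object it belongs to, which removes self-shadowing without a depth bias.
bool inShadow(const ShadowTexel& texel, float fragDepthFromLight,
              int fragObjectId) {
    assert(fragObjectId >= 0);
    if (texel.objectId == fragObjectId) {
        return false;  // same object: skip the depth comparison entirely
    }
    return texel.depth < fragDepthFromLight;
}
```

The trade-off of this design is that an object can no longer shadow itself at all, so concave objects lose legitimate self-shadowing.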

Now this is all good for direct lighting, because everything that is not directly lit I set to zero, including shadows, and then I add the indirect light to that zero - so there's a smooth transition between the shadow and the non-lit part of each object.

For indirectly lit scenes with no direct lighting at all (i.e. lit only by emissive objects), things are a bit different. I don't separate the secondary bounce from the subsequent bounces; all bounces are tied together. Thus I cannot just treat the secondary bounce as the "direct lighting", set everything else including shadows to zero, and then add the subsequent bounces. This would require an additional voxel texture and I would need to double the number of cone traces.

I cheat by making the shadowed parts of the scene darker than the non-shadowed parts (when a more accurate algorithm would be to make shadowed areas zero and add subsequent bounces to those areas). This, together with the removal of any self-shadowing leads to shadow leaking:

So I think I have two options:

1. Add another voxel texture for the second bounce and double the number of cone traces (most expensive).
2. Switch back to back-face rendering with front-face culling for the shadow mapping only for emissive lighting shadows (lots of ugly artifacts).

I wonder if anyone can come up with any other ideas.


##### Share on other sites

I just tested this with my brand new EVGA GTX780 and it runs at an average of 95fps at 1080p with all screen-space effects turned on (SSAO, SSR, all soft shadows). In fact, screen-space effects seem to make little dent in the framerate.

I discovered something very unusual when testing the voxel depth. Here are my results:

32x32x32 -> 95fps (37MB memory)

64x64x64 -> 64fps (37MB memory)

128x128x128 -> 52fps (37MB memory)

256x256x256 -> 31fps (38MB memory)

512x512x512 -> 7fps (3.2GB memory)

How on earth did I jump from 38MB memory to 3.2GB of memory used when going from 256 to 512 3d texture depths?!
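For reference, the raw storage of a cubic 3D texture grows by 8x with every doubling of resolution. A minimal sketch of the arithmetic, assuming 4 bytes per voxel (RGBA8), which the post does not state, and ignoring how many voxel textures the renderer actually allocates:

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for the base level of a cubic 3D texture.
std::uint64_t voxelBaseBytes(std::uint64_t res, std::uint64_t bytesPerVoxel) {
    assert(res > 0 && bytesPerVoxel > 0);
    return res * res * res * bytesPerVoxel;
}

// Bytes for the full mip chain down to 1x1x1 (res assumed a power of two).
std::uint64_t voxelMipChainBytes(std::uint64_t res,
                                 std::uint64_t bytesPerVoxel) {
    std::uint64_t total = 0;
    for (std::uint64_t r = res; r >= 1; r /= 2) {
        total += voxelBaseBytes(r, bytesPerVoxel);
    }
    return total;
}
```

Under that assumption a single 256x256x256 texture is 64 MiB at the base level and a 512x512x512 one is 512 MiB. Since the reported figure stays at about 37MB all the way up to 256, one possible reading (a guess, not something measured here) is that the number being watched does not include texture storage in video memory until the 512 case no longer fits there.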


##### Share on other sites

Obviously your profiler is broken somehow, as I doubt your experiment manages to hold ever increasing data in the same exact amount of ram.

Edited by Frenetic Pony

##### Share on other sites

Obviously your profiler is broken somehow, as I doubt your experiment manages to hold ever increasing data in the same exact amount of ram.

Actually I'm using the task manager to get the amount of ram that my application is using.


##### Share on other sites

Just a general idea regarding the light-info accumulation concept, which has been floating around my head for some time now and which I finally want to get rid of:

Instead of cone tracing per screen pixel (which is how the technique works by default, IIRC), couldn't you separate your view frustum into cells (similar to what you do for clustered shading, but perhaps with cube-shaped cells), accumulate the light information in these cells, represented by spherical harmonics, using cone tracing, and finally use this SH 'volume' to light your scene?

You would of course end up with low-frequency information only suitable for diffuse lighting (like with light propagation volumes, but with less quantization, since you would not necessarily propagate the information iteratively, or would at least use fewer steps if you chose to do so to keep the trace range shorter). On the other hand, you could probably reduce the number of required cone traces considerably (you would also only need to fill cells that intersect geometry, if you choose not to propagate iteratively) and, to some extent, decouple the number of traces from the output pixel count.
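As a rough illustration of the per-cell accumulation, the sketch below projects a handful of traced radiance samples into the first four spherical-harmonic coefficients and reconstructs radiance from them. It assumes scalar radiance, directions spread uniformly over the sphere, and only bands 0 and 1; `Dir`, `shBasis`, `projectToSh` and `evalSh` are hypothetical names, and the cone tracing itself is not shown.

```cpp
#include <cassert>

struct Dir { double x, y, z; };

// First four real spherical-harmonic basis functions (bands 0 and 1).
void shBasis(Dir d, double out[4]) {
    out[0] = 0.282095;
    out[1] = 0.488603 * d.y;
    out[2] = 0.488603 * d.z;
    out[3] = 0.488603 * d.x;
}

// Project radiance samples taken along n unit directions into 4 coefficients,
// assuming the directions cover the sphere uniformly (Monte Carlo weight 4*pi/n).
void projectToSh(const Dir* dirs, const double* radiance, int n,
                 double coeffs[4]) {
    assert(n > 0);
    const double kPi = 3.14159265358979323846;
    for (int i = 0; i < 4; ++i) coeffs[i] = 0.0;
    for (int s = 0; s < n; ++s) {
        double y[4];
        shBasis(dirs[s], y);
        for (int i = 0; i < 4; ++i) coeffs[i] += radiance[s] * y[i];
    }
    for (int i = 0; i < 4; ++i) coeffs[i] *= 4.0 * kPi / n;
}

// Reconstruct radiance in direction d from the coefficients.
double evalSh(const double coeffs[4], Dir d) {
    double y[4];
    shBasis(d, y);
    return coeffs[0] * y[0] + coeffs[1] * y[1] +
           coeffs[2] * y[2] + coeffs[3] * y[3];
}
```

Each cell would store one such coefficient set per color channel, filled by a few cone traces, and every pixel inside the cell would then evaluate it with its own normal instead of tracing its own diffuse cones.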

Just an idea.

Edited by Bummel

##### Share on other sites

That's a similar idea to what others already did, which is just downsample before tracing and then upsample the results (with some trickery for fine edges). The main problem with just doing cells is that an always present (and temporally stable) specular term is part of the thing that really sells GI to begin with. Still, it's an idea if you're really performance bound.

I think I mentioned a similar idea but just for particles, which are going to be diffuse only anyway for the most part and would be really helpful with layers of transparency. And now that I think about it, it would also work well for highly distant objects. While specular doesn't actually fall off of course, anything but primary specular (say from the sun) shouldn't be too noticeable really far away.

As for transparency, "inferred" or stippled transparency rendering would be really useful for cone tracing. I'm not sure you could also downsample the tracing simultaneously, but it would still prevent tracing from multiple layers of transparency.

As for using a directed acyclic graph: I've been thinking that you'd need to separately store albedo/position information, mipmap that, and then figure out a way to apply lighting to different portions dynamically and uniquely using the indirection table. In case you missed what I'm talking about, a Directed Acyclic Graph would merge identical copies of voxel areas into just one copy, and then use an "indirection table" to direct the tracing to where each copied block is in world space.
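The merging step alone can be sketched as follows. This is only the "identical blocks share one copy plus an indirection table" part, flattened to a single level, and not the hierarchical DAG from the paper; the 2x2x2 brick size and all names are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// A 2x2x2 brick of voxel values (hypothetical block size for illustration).
using Brick = std::array<std::uint8_t, 8>;

struct DedupedVolume {
    std::vector<Brick> uniqueBricks;       // one copy of each distinct brick
    std::vector<std::size_t> indirection;  // per-position index into uniqueBricks
};

// Collapse identical bricks into a single stored copy and record, for every
// brick position, which stored copy it refers to.
DedupedVolume deduplicate(const std::vector<Brick>& bricks) {
    DedupedVolume out;
    std::map<Brick, std::size_t> seen;
    for (const Brick& b : bricks) {
        auto it = seen.find(b);
        if (it == seen.end()) {
            it = seen.emplace(b, out.uniqueBricks.size()).first;
            out.uniqueBricks.push_back(b);
        }
        out.indirection.push_back(it->second);
    }
    assert(out.indirection.size() == bricks.size());
    return out;
}
```

This also shows why lighting is the awkward part: two positions that share one stored brick cannot hold different lit values unless the lighting is stored separately and addressed per position.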


##### Share on other sites

The main problem with just doing cells is that an always present (and temporally stable) specular term is part of the thing that really sells GI to begin with.

As I understand it, the diffuse part is actually the costly one because of the large amount of cones you need to trace per pixel in the default solution. So for rather sharp glossy highlights you could keep tracing them per pixel without the intermediate accumulation step into the SH-volume. But that's of course just the theory.
