
R&D Realtime raytracing on a 2014 GPU


Super awesome!

I get 83 fps on a Fury X at 1440p :O

Keep us updated... :)

You could try the Spatiotemporal Variance-Guided Filter (SVGF) to make it much 'faster'. I guess you know the Quake 2 demo is open source, but there's also a project on this site: http://cwyman.org/papers.html

Very impressive work! :D

 


Very cool, thanks for explaining how it works! I played around with some similar techniques (ray-tracing voxels using an SDF to accelerate the trace) about a year ago, and it was a fun little project. I was mainly interested in doing it in the context of real-time/interactive lightmap baking on the GPU, and it seemed to work really well for that. But yeah, even with jump flooding, building that SDF got really slow, and the voxel grid could also end up using quite a bit of memory. But still very promising!


How do you voxelize a scene to a texture? I wonder how big this 3D texture is; I can't really imagine one texture handling that much detail.

13 hours ago, spike1 said:

The global illumination works pretty simply too: when voxelizing the scene I throw some rays out from each voxel, and since those rays trace against the voxels themselves, each frame I get an additional bounce of light :D

If you do this, you can get infinite bounces for free (not just a single one).

I do it this way using surfels instead of voxels, tracing only from the surfels and caching irradiance to get realtime lightmaps. The only downside is the spatial discretization: for full detail at geometric frequencies below the voxel resolution, you'd still need to trace from screen space as well.
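To illustrate the feedback idea in the quote (this is my own sketch, not spike1's actual shader): the voxel texture being traced against holds last frame's lit result, so every sample already carries all earlier bounces, and re-injecting direct light each frame adds exactly one more. All names, the hash, and the naive fixed-step march are assumptions.

```glsl
#version 430
// Feedback-bounce sketch: each voxel gathers light by marching rays through
// the voxel grid itself. The grid being sampled holds last frame's result,
// so each frame adds one more bounce ("infinite" bounces over time).
layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;

layout(binding = 0) uniform sampler3D voxelPrev;   // rgb = last frame's radiance, a = occupancy
layout(binding = 1) uniform sampler3D voxelDirect; // direct lighting injected per voxel
layout(binding = 0, rgba16f) uniform writeonly image3D voxelCurr;

const float GRID = 512.0;

// Cheap hash-based direction; a real version would use a better sequence.
vec3 randomDir(uvec3 p, uint i)
{
    uvec3 h = (p + i * 747796405u) * uvec3(1664525u, 1013904223u, 22695477u);
    h = (h ^ (h >> 16u)) * 2246822519u;
    return normalize(vec3(h & 1023u) / 511.5 - 1.0);
}

// Naive fixed-step march; returns radiance at the first occupied voxel.
vec3 marchGrid(vec3 uvw, vec3 dir)
{
    vec3 delta = dir / GRID; // roughly one-voxel steps
    for (int i = 0; i < 128; ++i)
    {
        uvw += delta;
        vec4 s = textureLod(voxelPrev, uvw, 0.0);
        if (s.a > 0.5)
            return s.rgb; // already includes all earlier bounces
    }
    return vec3(0.0);     // miss; a sky term could go here
}

void main()
{
    ivec3 voxel = ivec3(gl_GlobalInvocationID);
    vec3  uvw   = (vec3(voxel) + 0.5) / GRID;

    vec3 bounce = vec3(0.0);
    const uint RAYS = 4u;
    for (uint i = 0u; i < RAYS; ++i)
        bounce += marchGrid(uvw, randomDir(uvec3(voxel), i));

    // Re-inject direct light every frame; the traced term carries the bounces.
    vec3 direct = texelFetch(voxelDirect, voxel, 0).rgb;
    imageStore(voxelCurr, voxel, vec4(direct + bounce / float(RAYS), 1.0));
}
```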

On 2/25/2019 at 3:49 AM, JoeJ said:

Super awesome!

I get 83 fps on a Fury X at 1440p 😮

You could try the Spatiotemporal Variance-Guided Filter (SVGF) to make it much 'faster'.

Thanks for the timing, that's awesome :D As for SVGF, I've kinda avoided using any sort of spatial denoising; I usually find the blobby noise artefacts distracting. That said, I had a go at implementing a precursor to that paper (the à-trous wavelet filter one; that, combined with the temporal supersampling I already have, is pretty close to the one you mentioned), and wow, it does a great job of smoothing the bounce lighting!

[Screenshot: the bounce lighting after the à-trous filter]

I think denoising that separately from the direct light might work well, if I can speed up the filter (4 passes of a 5x5 kernel at 1080p with the edge awareness costs about 15 ms... yikes). Even in that paper the filter costs 10 ms on a Titan X, but it could be good for fully clearing up the image on higher-end GPUs.
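For reference, one edge-aware à-trous pass can be as small as the sketch below, roughly following the edge-avoiding à-trous wavelet filter (Dammertz et al. 2010) with separate depth and normal textures as described in this thread. All texture and uniform names are assumptions, and the edge-stopping weights are simplified compared to the paper:

```glsl
#version 430
// One pass of an edge-aware à-trous filter (run 4x with stepSize = 1, 2, 4, 8).
layout(local_size_x = 8, local_size_y = 8) in;

layout(binding = 0) uniform sampler2D inColor;   // noisy (or partially filtered) bounce light
layout(binding = 1) uniform sampler2D gNormal;   // world-space normals
layout(binding = 2) uniform sampler2D gDepth;    // linear depth
layout(binding = 0, rgba16f) uniform writeonly image2D outColor;

uniform int   stepSize;     // doubles each pass: 1, 2, 4, 8
uniform float sigmaDepth;   // edge-stopping sensitivities
uniform float sigmaNormal;

// 1D B3-spline weights; the 5x5 kernel is the outer product of these.
const float kernel1D[3] = float[](3.0 / 8.0, 1.0 / 4.0, 1.0 / 16.0);

void main()
{
    ivec2 p      = ivec2(gl_GlobalInvocationID.xy);
    ivec2 maxPix = textureSize(inColor, 0) - 1;
    vec3  nC     = texelFetch(gNormal, p, 0).xyz;
    float zC     = texelFetch(gDepth,  p, 0).x;

    vec3  sum  = vec3(0.0);
    float wSum = 0.0;
    for (int y = -2; y <= 2; ++y)
    for (int x = -2; x <= 2; ++x)
    {
        ivec2 q = clamp(p + ivec2(x, y) * stepSize, ivec2(0), maxPix);
        float h = kernel1D[abs(x)] * kernel1D[abs(y)];

        // Edge-stopping: down-weight samples across depth or normal discontinuities.
        float wz = exp(-abs(texelFetch(gDepth, q, 0).x - zC) / sigmaDepth);
        float wn = pow(max(dot(texelFetch(gNormal, q, 0).xyz, nC), 0.0), sigmaNormal);

        float w = h * wz * wn;
        sum  += texelFetch(inColor, q, 0).rgb * w;
        wSum += w;
    }
    imageStore(outColor, p, vec4(sum / max(wSum, 1e-4), 1.0));
}
```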

On 2/25/2019 at 10:18 AM, _WeirdCat_ said:

How do you voxelize a scene to a texture? I wonder how big this 3D texture is; I can't really imagine one texture handling that much detail.

Yeah, memory is a problem with these sorts of techniques, so I do cheat a bit. I find I can store the colour data at half resolution without it affecting the diffuse lighting much (reflections, on the other hand...). On top of that, the voxels are reasonably large (4 cm-ish per side, in a 512x512x512 texture), which is really just enough to make the lighting passable - there are still lots of artefacts around :P
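To put those numbers in perspective (simple arithmetic from the figures above, assuming one byte per voxel for occupancy and RGBA8 for colour): a 512x512x512 8-bit texture is 512³ = 134,217,728 texels ≈ 128 MiB and covers roughly 512 × 4 cm ≈ 20.5 m per side, while a colour volume at half resolution (256³, four bytes per texel) adds another 64 MiB.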

On 2/25/2019 at 7:09 AM, MJP said:

Very cool, thanks for explaining how it works! I played around with some similar techniques (ray-tracing voxels using an SDF to accelerate the trace) about a year ago, and it was a fun little project. I was mainly interested in doing it in the context of real-time/interactive lightmap baking on the GPU, and it seemed to work really well for that. But yeah, even with jump flooding, building that SDF got really slow, and the voxel grid could also end up using quite a bit of memory. But still very promising!

It's interesting that you and JoeJ both mention lightmaps; for a while the intention behind this was also to generate real-time lightmaps. I would start off calculating at a low lightmap resolution, then adaptively subdivide to improve shadow edges and whatnot. That adaptive subdivision required the result to be noiseless, though (or else you get really distracting flickering patches), which was achievable for simpler Quake-complexity levels, but past that... yeah. I've sped up the raytracing since then, though, so maybe I could have a look at that again...

On 2/25/2019 at 4:23 PM, JoeJ said:

If you do this, you can get infinite bounces for free (not just a single one).

Sorry, yeah, that's what I meant to say :P.


Outside of the denoising filter, I also got the sample rejection working right again in the temporal supersampling filter, along with initially-linear convergence. Now I can increase the number of frames blended, and instead of smearing artefacts when looking at new areas I just get noise, which isn't too bad except for one particular case: whenever I move the camera by small amounts, the edges of objects all get rejected, which looks really bad :P. Here's an exaggerated example (I let it converge fully before moving):

[Screenshot: noise along object edges after the history is rejected]

I'm hoping that once I add a second temporal anti-aliasing filter on top (which will use neighbourhood clamping rather than complete rejection), this might be less noticeable.

The other option is to throw more samples at newly-revealed areas, which works but slows things down a bit while the camera is in motion - that said, there are likely more efficient ways to implement it than a variable-length for loop :P
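As a point of comparison, here's a minimal sketch of the neighbourhood-clamping variant mentioned above: rather than discarding history that fails a validation test (which is what produces the noisy edges), the reprojected history colour is clamped into the min/max range of the current pixel's 3x3 neighbourhood. All names (currFrame, historyFrame, blendFactor) are assumptions:

```glsl
// Temporal resolve with neighbourhood clamping instead of hard rejection.
// A full filter would also handle motion vectors and disocclusion.
uniform sampler2D currFrame;     // this frame's noisy result
uniform sampler2D historyFrame;  // accumulated history
uniform float blendFactor;       // e.g. 0.05-0.1: how much of the new frame to blend in

vec3 resolveTemporal(ivec2 p, vec2 historyUV)
{
    vec3 curr = texelFetch(currFrame, p, 0).rgb;

    // Colour bounds of the 3x3 neighbourhood around the current pixel.
    ivec2 maxPix = textureSize(currFrame, 0) - 1;
    vec3 nMin = curr, nMax = curr;
    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x)
    {
        vec3 c = texelFetch(currFrame, clamp(p + ivec2(x, y), ivec2(0), maxPix), 0).rgb;
        nMin = min(nMin, c);
        nMax = max(nMax, c);
    }

    // Instead of throwing the history away at edges (which re-introduces
    // noise under small camera moves), clamp it into the current range.
    vec3 hist = clamp(textureLod(historyFrame, historyUV, 0.0).rgb, nMin, nMax);

    return mix(hist, curr, blendFactor);
}
```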

2 hours ago, spike1 said:

I think denoising that separately from the direct light might work well, if I can speed up the filter (4 passes of a 5x5 kernel at 1080p with the edge awareness costs about 15 ms... yikes). Even in that paper the filter costs 10 ms on a Titan X

I remember those timings given in the papers, and I wonder how Schied's Quake 2 demo works so well and so fast now. Does anyone know the runtime of the filter there? I have no RTX yet and can't test it myself. It must be much faster than 10 ms, I guess.

Personally, I do compute raytracing too, but I update the full environment of stochastic lightmap texels, so I have no noise and no denoising experience either.

2 hours ago, spike1 said:

On top of that, the voxels are reasonably large (4 cm-ish per side, in a 512x512x512 texture)

What do you think about the idea of storing voxels using only one bit? Of course for occlusion only, but 4x4x4 voxels in a 64-bit value seems interesting, especially if you need no filtering.

I assume only the primary hit uses accurate triangles, and all secondary rays for GI trace only against voxels?

7 hours ago, JoeJ said:

I remember those timings given in the papers, and I wonder how Schied's Quake 2 demo works so well and so fast now. Does anyone know the runtime of the filter there? I have no RTX yet and can't test it myself. It must be much faster than 10 ms, I guess.

Personally, I do compute raytracing too, but I update the full environment of stochastic lightmap texels, so I have no noise and no denoising experience either.

From a quick look at the source, it looks like they're using a pretty complete implementation of the paper, but I do notice some differences from my rushed implementation (like they use a combined depth-normals texture rather than separate ones like I do), so those may account for some of the speed. Is there anywhere I can see your work? It sounds rather interesting :)

8 hours ago, JoeJ said:

What do you think about the idea of storing voxels using only one bit? Of course for occlusion only, but 4x4x4 voxels in a 64-bit value seems interesting, especially if you need no filtering.

I assume only the primary hit uses accurate triangles, and all secondary rays for GI trace only against voxels?

Yeah, I probably should have been clearer about the camera rays - they're just the usual triangle rasterization. If you turn on Visualize Voxels in the demo you can see how crude the voxels are, haha XD.

As for the binary voxelization, I've played around with it a bit in the past in some other contexts. A cool property is that if you store every voxel along a precalculated ray direction in a single value, you can find the intersection with just a single sample, a bit mask and a findLSB. I figured I could precalculate several directions and sample all of them to improve the bounce lighting quality, but I didn't end up using it (I can't remember why, to be honest - perhaps generating the directions cost too much, and without enough directions the coherency of the rays becomes obvious and looks distracting).
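For what it's worth, the single-axis version of that trick is tiny. The sketch below assumes a hypothetical usampler2D in which texel (x, y) packs the occupancy of a 32-deep column of voxels along +Z (bit i set = voxel at depth i occupied); a ray travelling along +Z then resolves with one fetch, one mask, and one findLSB:

```glsl
// Packed-direction trace along +Z; columnOccupancyZ is a hypothetical
// usampler2D where each texel stores 32 voxels of occupancy along Z.
layout(binding = 0) uniform usampler2D columnOccupancyZ;

// Returns the depth of the first occupied voxel at or beyond z0 (0..31),
// or -1 on a miss (findLSB(0u) is defined to return -1).
int traceAlongZ(ivec2 xy, int z0)
{
    uint bits   = texelFetch(columnOccupancyZ, xy, 0).x;
    uint masked = bits & (0xFFFFFFFFu << uint(z0)); // drop voxels behind the origin
    return findLSB(masked);
}
```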

But yeah, the idea you describe sounds good, and it would fit into my current method pretty smoothly. I actually compute the acceleration structure at half resolution, so once I hit the finest-detail cells I start sampling from the highest-resolution voxel texture - an 8-bit texture which really only needs to be 1-bit :P. If I just treat each 8-bit texel as a 2x2x2 block, I should be able to halve the size of the voxels without any memory increase; the only difficult part is generating that texture.

When voxelizing, I use the "project each triangle along the axis with the largest visible surface area in a compute shader, then while rasterizing write manually into the right voxel spot using imageStore" method, and I was hoping I'd be able to use imageAtomicOr to create this texture directly - unfortunately it only works on 32-bit ints.

I'm wondering if texture views, or maybe the format access qualifiers, might give me a way to work around this - I don't have any experience with them, so I haven't a clue :P
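One workaround that needs no texture views at all (a sketch; the packing layout is my assumption, not what the demo does): keep the binary volume as an r32ui image in which each 32-bit texel packs 32 voxels along X, and set individual bits with imageAtomicOr, which does support r32ui:

```glsl
// Binary voxel write without sub-32-bit atomics: pack 32 voxels along X into
// each r32ui texel, so a 512^3 binary volume becomes a 16x512x512 uint image.
layout(binding = 0, r32ui) uniform coherent uimage3D binaryVoxels;

void setVoxel(ivec3 v)
{
    ivec3 texel = ivec3(v.x >> 5, v.y, v.z); // which 32-voxel word
    uint  bit   = 1u << uint(v.x & 31);      // which bit inside it
    imageAtomicOr(binaryVoxels, texel, bit);
}
```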

6 hours ago, spike1 said:

Is there anywhere I can see your work? It sounds rather interesting

No, I have no renderer at all yet, just the compute shaders to generate lightmaps (I visualize texels as quads :) for now). The surfels I'm using have similar properties to voxels, except they form a quadtree over the surface instead of an octree in space. That's promising, but hard to do. I wasted more than a full year of work implementing quadrangulation papers and improving them, but nothing was good enough for my requirement to support LOD. I've finally solved the problem and can move on, but I feel exhausted, and pretty dumb for losing so much time on this when easier alternatives would have worked as well.

So it will still take some time until I can show something. I use a Quake level, and the RTX demo looks very similar to what I have, including the specular reflections (which I get from 4x4 env maps per lightmap texel). But I can do it in 5 ms on first-gen GCN, with infinite bounces.

I planned to raytrace my surfel hierarchy for sharper reflections, but now it might make more sense to use RTX for that. Not sure. RTX seems totally wrong to me: restricted to non-LOD triangles, single-threaded rays, a black-boxed BVH. That's not what I would have expected to be good for games, and I don't like it. So I should focus on the renderer and wait to see what happens with the next-gen consoles (but I guess I can't resist trying it anyway :) )

6 hours ago, spike1 said:

2x2x2 block

Yeah, I'm quite a voxel non-believer, but after another dev brought up some compression ideas to me recently, I find them more interesting again. Things like this: http://jcgt.org/published/0006/02/01/paper-lowres.pdf - 0.048 bits per voxel. Streaming large worlds and unpacking them to multilevel bit grids should work well, I guess.

6 hours ago, spike1 said:

I'm wondering if texture views, or maybe the format access qualifiers, might give me a way to work around this - I don't have any experience with them, so I haven't a clue :P

I don't know those things either, but Vulkan finally has support for 64-bit atomics (so the 4x4x4-voxels-per-value idea), and I guess OpenGL does too. That could be useful when rasterization becomes inefficient at distance for lower LOD cascades, and splatting just the vertices would suffice.

 


Impressive work! 

I implemented Voxel Cone Tracing (VCT) a while back using a technique similar to yours (see here for images and a description), and reading your description made me wonder: why didn't you take it a step further and turn it into a cone tracer to get rid of all that noise? I'm very curious, because your implementation seems very close to one.
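To make the comparison concrete, the core loop of a cone trace is roughly the sketch below: march along the cone through a mipmapped radiance volume, pick a mip level whose texel footprint matches the cone's width at each step, and composite front to back. Names and constants are assumptions, not spike1's code:

```glsl
// Single cone trace through a mipmapped 3D radiance texture
// (rgb = radiance, a = occlusion).
layout(binding = 0) uniform sampler3D radianceVolume;

const float VOXEL_SIZE = 1.0 / 512.0; // in texture UVW units

vec4 coneTrace(vec3 origin, vec3 dir, float halfAngleTan, float maxDist)
{
    vec4 acc = vec4(0.0);
    float t = VOXEL_SIZE; // start a voxel out to avoid self-intersection
    while (t < maxDist && acc.a < 0.99)
    {
        // Cone diameter grows with distance; the mip level is chosen so
        // one texel covers the cone's footprint.
        float diameter = max(VOXEL_SIZE, 2.0 * halfAngleTan * t);
        float mip      = log2(diameter / VOXEL_SIZE);
        vec4  s        = textureLod(radianceVolume, origin + dir * t, mip);

        // Front-to-back compositing: prefiltering trades per-ray noise
        // for smoothed (sometimes over-smoothed) averages.
        acc.rgb += (1.0 - acc.a) * s.a * s.rgb;
        acc.a   += (1.0 - acc.a) * s.a;

        t += diameter * 0.5; // step proportional to the cone width
    }
    return acc;
}
```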
 

