Real-time hybrid rasterization-raytracing engine

Details

The aim of this project is to create a renderer that almost matches Path Tracing in quality while generating noiseless images hundreds of times faster. Instead of tracing thousands of rays per pixel to get soft shadows and diffuse reflections, it traces a single ray per pixel and uses smart blur techniques in screen space. It also uses standard Deferred Rendering to rasterize the main image. One of the project's fundamental rules is that everything is dynamic (no pre-baked data).

Current implementation of ray tracing is very slow, with significant fixed cost per object. Performance will be dramatically improved after switching graphics API to DirectX12. The switch is needed to randomly access meshes from within the shader, rather than processing one by one as it is now. Despite that, image processing of shadows and reflections/refractions is pretty fast, which allows for real-time operation in small/medium sized scenes.

I'm certain that most operations could be sped up 2-10x if some more work was put into them.

In development since 2014, 1500+ hours of work.

Differences from NVIDIA RTX demos (such as SVGF - Spatiotemporal Variance-Guided Filtering):
- engine doesn't shoot rays in random directions, but rather offsets them slightly based on pixel position in image-space. This results in zero temporal noise.
- engine traces rays one layer (bounce) at a time and then stores results in render targets, which allows blurring of each light bounce separately. NVIDIA traces all bounces at once and then blurs the result.
- engine doesn't use temporal accumulation of samples, although it could be added for extra quality.
- NVIDIA demos analyze noise patterns to deduce how much blur needs to be applied to a given screen region, whereas I deduce this from the distance to the nearest occluder (for shadows) or to the nearest reflected/refracted object. This gives temporally stable results and saves some processing time (a couple of ms per frame for NVIDIA), but introduces its own set of challenges.
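
As a rough illustration of the last point, a blur width can be derived from the occluder distance by similar triangles. This is the standard penumbra estimate (the same one used by percentage-closer soft shadows), not necessarily the formula this engine uses; `penumbraWidth` and its parameter names are made up for the sketch.

```cpp
#include <cassert>

// Estimated penumbra width on the receiver, by similar triangles.
//   lightDiameter      - size of the area light
//   occluderToLight    - distance from the occluder to the light (> 0)
//   receiverToOccluder - distance from the shaded point to the occluder (>= 0)
// All names are hypothetical; units only need to be consistent.
inline float penumbraWidth(float lightDiameter, float occluderToLight, float receiverToOccluder)
{
    assert(occluderToLight > 0.0f);
    return lightDiameter * receiverToOccluder / occluderToLight;
}
```

A point touching its occluder gets zero blur, and the blur grows linearly as the receiver moves away from the occluder, which is why storing the distance to the occluder per pixel is enough to drive the filter.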

Working features:
• Deferred Rendering
• Light reflections/refractions
· Locally correct
· Fully dynamic
· Diffuse - for rough surfaces
· Multiple refractions/reflections
· Blur level depending on surface-to-object distance
• Shadows
· Fully dynamic
· Soft - depending on light source radius and distance to occluder
· "Infinite" precision - tiny/huge objects cast proper soft shadows, light source can be 1000 km away
· Visible in reflections/refractions
· Correct shadows from transparent objects - depending on alpha texture
• Physically Based Rendering - only one material, which supports all effects
• Tonemapping, bloom, FXAA, ASSAO (from Intel)

Future plans:
• Switching from DX11 to DX12 - to allow for indexing meshes inside shaders, support for huge scenes
• Shadows for non-overlapping lights can be calculated in a single step! Amazing potential for rendering huge scenes.
• Adjustable visual quality through tracing a few more rays per pixel - like 9, reducing the need for heavy blur
• Dynamic Global Illumination
• Support for skeletal animation
• Support for arbitrary animation - e.g. based on physics - cloths, fluids
• Dynamic caustics
• Volumetric effects, particles - smoke, fire, clouds
• Creating a sandbox game for testing

Amazing! This really kicks ass

I work on something similar with the focus on infinite bounce GI using a hierarchy of surface samples, so i cache results in textures as well. I use texels and mip maps the same way a quad tree works, and i need to access neighbouring / parent texels for interpolation and blurring too.
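
The texel/mip-map quad tree mentioned here comes down to a little index arithmetic. A minimal sketch, assuming a square power-of-two texture in which each mip texel covers a 2x2 block of the next finer level; `Texel`, `parentOf` and `firstChildOf` are names invented for this illustration, not taken from the poster's code.

```cpp
#include <cassert>

struct Texel { int x, y, level; };   // level 0 = finest mip

// Parent texel one mip level up (it covers a 2x2 block of children).
inline Texel parentOf(Texel t)
{
    return { t.x / 2, t.y / 2, t.level + 1 };
}

// Top-left of the four children one mip level down;
// the other three are at +1 in x and/or y.
inline Texel firstChildOf(Texel t)
{
    assert(t.level > 0);
    return { t.x * 2, t.y * 2, t.level - 1 };
}
```

Neighbour lookups at the same level are plain +/-1 offsets in x and y, which only work across the whole surface if the parameterization has no seams.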

This brought up a major problem: Traditional lightmap UVs are not continuous and have seams - no more easy interpolation. How do you handle this?

Personally i've gone the hard way: Quadrangulate the surface to get a seamless global parameterization by constraining seams with 90 degree rotations and integer displacements. Spent half a year on that, still lots of work to get fully automatic tools...

"- engine doesn't shoot rays in random directions, but rather offsets them slightly based on pixel position in image-space. This results in zero temporal noise."

If you don't mind, could you explain this in more detail?

Hey JoeJ, thanks for the kind words

I think I wasn't clear about the "texture" part. I store ray tracing results in various render targets, rather than in scene objects' textures (lightmaps etc). So there is really no problem of seams in my case.

I have no GI yet, only shadows and reflections/refractions. But GI offers the most opportunities for interesting solutions and optimizations, so I very much look forward to writing it. I'm yet to figure out where to store light data from GI. It's one of those subjects I wonder about in the bath. I've been thinking of storing it in various acceleration structures or textures as you do. It's good to know what kind of problems lie ahead! I've been thinking of merging similar neighboring light samples together and then coming up with some cheap/approximate shadowing/shading method which could handle thousands of weak light sources. For that, some sort of directional occlusion information could be useful, possibly stored in screen space. And ray tracing only the geometry further from the lit surface, using very low-res geometry. Just some thoughts on the subject. Life will verify those ideas.

"- engine doesn't shoot rays in random directions, but rather offsets them slightly based on pixel position in image-space. This results in zero temporal noise."

It can be best explained if we start with tracing just a single ray towards the center point of a light source. It gives us a hard shadow, with a proper silhouette. But we can also trace multiple rays to multiple points on the light source. If those points on the light source are always the same (e.g. corners + center) we will get several hard steps of shadow fade-out. But we can also shoot those rays randomly (Path Tracing) and we get a temporally noisy shadow with fewer and fewer shadow samples further from the shadow center. Or we can decide where to target our rays based on pixel position (some modulo of the pixel position in screen space etc) and we get some weird pattern of shadowed/lit pixels, which is fixed in time (beneficial for hiding noise). That's almost what I do.

Almost, because I then reduce the number of traced shadow rays to just one per pixel. It basically makes that pattern stronger, but still easy to blur. And way cheaper! That however is not visible in the video or screenshots, as it is my newest work, and it's in a different league of quality and reliability. What the screenshots/video show is a hard shadow (also 1 shadow ray per pixel, with rays aimed at the center of the light source) blurred into a soft shadow. It looks good, but supports only a limited penumbra size and has reliability problems in some complex scenes. I could go on about the problems with shadow overlapping etc that I had to solve, but I'll leave that for another occasion.
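
The "target based on pixel position" idea can be sketched as follows. This only illustrates the modulo scheme described above, assuming a square area light and one shadow ray per pixel; the engine's actual pattern is not shown in this thread, and `Offset2` and `lightSampleForPixel` are hypothetical names.

```cpp
#include <cassert>

struct Offset2 { float x, y; };   // position on the light, in [-1, 1] x [-1, 1]

// Pick a fixed target point on a square area light from the pixel position.
// The pixels of each grid x grid tile cover a regular grid of light sample
// points, so the pattern repeats across the screen and never changes
// between frames (no temporal noise).
inline Offset2 lightSampleForPixel(int px, int py, int grid)
{
    assert(grid > 0 && px >= 0 && py >= 0);
    int ix = px % grid;
    int iy = py % grid;
    // Centre of cell (ix, iy), remapped from [0, 1] to [-1, 1].
    float u = (ix + 0.5f) / grid * 2.0f - 1.0f;
    float v = (iy + 0.5f) / grid * 2.0f - 1.0f;
    return { u, v };
}
```

The shadow ray for a pixel would then aim at the light center plus `u` and `v` times the light's half-extent along its two axes. Neighbouring pixels hit different parts of the light, so a screen-space blur over a tile averages them into a penumbra.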

Thanks for the explanation, i'm surprised it works so well although you work in screenspace!

(Also that means i just quoted you wrong in the marching cubes thread.)

4 hours ago, IcyTower said:

I'm yet to figure out where to store light data from GI.

I store it in the surface sample hierarchy, so in world space. My decision to use a layout compatible with textures has many advantages, but it also turned out to be a very complex approach requiring much more work than expected.

I recommend however to use world space in some form and not screen space. By doing so you can cache results robustly and you get infinite bounces for free. Also you already have a result whenever you hit something - no need to trace random paths just to get a tiny fraction of this result. Path tracing does nothing like this, and that's why it's slow.

Unfortunately caching in world space leads to high complexity to solve many problems: Need for global parameterization? How to interpolate from sample positions? Need to interpolate between parents and children to hide LOD switches? How to manage memory? Need some form of unique data per surface, so no more easy instancing? etc...

This is said as a warning, and you should keep thinking about simpler alternatives... E.g. the Seed demo, which uses just random surfels - easy to do and the important optimization is still there.

Keep up this impressive work!

Posted (edited)

Could you elaborate more on how Seed GI works? It's the first time I hear about "surfels"...

The reason why the screen-space approach works so well is that the source of information is very reliable - ray tracing. But then it's crucial to blur some parts of the image while leaving others sharp, rejecting some samples where positions/normals differ, doing that gradually etc. Lots of bugs I've been through and lots of failed approaches. And I sample various mipmap levels based on the blur I want to get, while still trying to take at least a couple of samples to be able to reject some of them. Also, to enable shadow overlapping, I need to store shadows in 3 layers - hard, medium and soft shadows. I wonder if I can improve on that with the new approach. But this allows me to store and blur distance-to-occluder values properly without mixing very small and very high values. The core of the algorithm actually.
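
The gradual sample rejection described above can be sketched as an edge-aware weight. This is a generic bilateral-style weight written for illustration, not the engine's filter; `Vec3`, `blurSampleWeight` and the `maxDistance` tuning parameter are assumptions, and normals are assumed to be unit length.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline float saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// Weight in [0, 1] for a neighbouring blur sample. It falls off gradually as
// the neighbour's position moves away from the centre pixel's position or as
// its normal turns away, and reaches 0 (sample rejected) at maxDistance or at
// a 90 degree normal difference.
inline float blurSampleWeight(Vec3 centerPos, Vec3 centerNormal,
                              Vec3 samplePos, Vec3 sampleNormal,
                              float maxDistance)
{
    Vec3 d = { samplePos.x - centerPos.x,
               samplePos.y - centerPos.y,
               samplePos.z - centerPos.z };
    float dist = std::sqrt(dot(d, d));
    float positionWeight = saturate(1.0f - dist / maxDistance);
    float normalWeight = saturate(dot(centerNormal, sampleNormal));
    return positionWeight * normalWeight;
}
```

The blurred value is the weighted sum of the samples divided by the sum of the weights, so rejected samples simply drop out. Taking at least a few samples per pixel keeps that sum from becoming empty.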

I just thought that GI could also be stored in screen-space, but with some spherical camera perspective. When moving the camera, most samples could be reused, and some would need to be recalculated. Just another idea...

Edited by IcyTower

1 hour ago, IcyTower said:

Could you elaborate more on how Seed GI works? It's the first time I hear about "surfels"...

There is a presentation about Seed. It's not that detailed, but i guess it works like so:

'Surfel' mostly means (disc shaped) samples on the surface. You can place them regularly (like light map texels) or distribute them more randomly but still densely (they have an image showing they use the latter).

The surfel stores the incoming light from the whole hemisphere it can see, and they apply this to the final image using bilateral filtering tech you already have: More expensive than simple bilinear filtering light maps but avoids all the pain i've mentioned.

The main problem then is how to calculate incoming light for each sample quickly. Two options:

Trace many rays for each sample to capture the full environment, but update only a few samples per frame. (Closer to the Radiosity method, no noise)

Trace one ray for all samples and use denoising. (Closer to Path Tracing)

Either way, if you use the surfel data at the ray hit points as well, there is no more need to add recursive paths because you already know the outgoing light at the hit point. This way, e.g. if updating all surfels once per frame, you integrate one bounce of GI per frame. So infinite bounces are free, with a lag of one frame per bounce. The lag is not very noticeable because the nth bounce contributes much less than the first bounce. (But the lag coming from updating only a small number of surfels or tracing only a single ray of course is noticeable.)
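
The "one bounce per frame" behaviour is easy to see in a toy case. A minimal sketch, assuming just two facing patches with the same albedo and a constant form factor; `TwoPatchScene` and its values are invented for this illustration.

```cpp
// Two facing patches. Patch 0 emits light, patch 1 only reflects.
struct TwoPatchScene
{
    double emission[2]  = { 1.0, 0.0 };
    double radiosity[2] = { 0.0, 0.0 };  // cached outgoing light, reused next frame
    double albedo       = 0.5;
    double formFactor   = 0.5;

    // One "frame": each patch gathers the other patch's cached outgoing
    // light from the previous frame instead of tracing a recursive path.
    void simulateOneBounce()
    {
        double next0 = emission[0] + albedo * formFactor * radiosity[1];
        double next1 = emission[1] + albedo * formFactor * radiosity[0];
        radiosity[0] = next0;
        radiosity[1] = next1;
    }
};
```

After the first call only the emitter is lit, after the second the reflector has received the first bounce, and after the third the emitter has received light back from the reflector. Each further bounce is scaled by albedo times form factor again (0.25 here), which is why the lag of the later bounces is hard to notice.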

This is no new idea, it has been used by Radiosity solvers even before path tracing came up. Also many static approaches using precomputed light transport use this form of integration. It ends up being much faster than path tracing, but it has the limitation of discretized sample locations.

Personally i don't trace real triangles but only a hierarchy of such surfels. So if surfels are 10cm apart then i have no way to capture shadows of smaller objects. Also i use small spherical environment maps for each surfel (similar to the Many LODs paper) to have directional information for bump mapping and glossy reflections, but sharp mirror reflections are not possible this way.

That's why i consider raytracing like you do to add those missing high frequency details...

1 hour ago, IcyTower said:

I just thought that GI could also be stored in screen-space, but with some spherical camera perspective.

Even if you would render a full cube map for the camera, the information you get is too incomplete to support indirect lighting inside buildings. I think about such ideas as well to add local details, but it's just a hack. Something like this https://www.youtube.com/watch?v=7LrlKEzCkh0 looks beautiful, but think of how it would break with moving objects like a 3rd person character. McGuire showed some work with layered framebuffers to get around this, but at that point you can do it just right as well i think.

I remember i made a small example of surfel radiosity and still have the code. It's just brute force but contains all the math. Init once and per frame call Simulate and Visualize.

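// Brute-force surfel radiosity example.
// sVec3, PI, FP_EPSILON and FP_TINY are not defined in this snippet;
// they are expected to come from your own math headers.
struct Radiosity
{
    typedef sVec3 vec3;

    // component-wise multiply
    inline vec3 cmul (const vec3 &a, const vec3 &b)
    {
        return vec3 (a[0]*b[0], a[1]*b[1], a[2]*b[2]);
    }

    struct AreaSample
    {
        vec3 pos;
        vec3 dir;       // unit surface normal
        float area;

        vec3 color;
        float emission; // using just color * emission to save memory
        vec3 received;  // incoming light gathered by the last SimulateOneBounce call
                        // (this member was missing from the pasted snippet although the code reads it)
    };

    AreaSample *samples;
    int sampleCount;

    void InitScene ()
    {
        // simple cylinder, seen from the inside

        int nU = 144;
        int nV = int( float(nU) / float(PI) );
        float scale = 2.0f;

        float area = (2 * scale / float(nU) * float(PI)) * (scale / float(nV) * 2);

        sampleCount = nU*nV;
        samples = new AreaSample[sampleCount];

        AreaSample *sample = samples;
        for (int v=0; v<nV; v++)
        {
            float tV = float(v) / float(nV);

            for (int u=0; u<nU; u++)
            {
                float tU = float(u) / float(nU);
                float angle = tU * 2.0f*float(PI);
                vec3 d (sin(angle), 0, cos(angle));
                vec3 p = (vec3(0,tV*2,0) + d) * scale;

                sample->pos = p;
                sample->dir = -d; // normals point towards the cylinder axis
                sample->area = area;

                // one half grey, the other half green, with a small bright emitter patch
                sample->color = ( d[0] < 0 ? vec3(0.7f, 0.7f, 0.7f) : vec3(0.0f, 1.0f, 0.0f) );
                sample->emission = ( (d[0] < -0.97f && tV > 0.87f) ? 35.0f : 0 );
                sample->received = vec3(0,0,0);

                sample++;
            }
        }
    }

    void SimulateOneBounce ()
    {
        for (int rI=0; rI<sampleCount; rI++) // receiver
        {
            vec3 rP = samples[rI].pos;
            vec3 rD = samples[rI].dir;
            vec3 accum (0,0,0);

            for (int eI=0; eI<sampleCount; eI++) // emitter
            {
                vec3 diff = samples[eI].pos - rP;

                // note: diff is not normalized, so cosR and cosE are each scaled by its length
                float cosR = rD.Dot(diff);
                if (cosR > FP_EPSILON)
                {
                    float cosE = -samples[eI].dir.Dot(diff);
                    if (cosE > FP_EPSILON)
                    {
                        float visibility = 1.0f; // todo: In this example we know each surface sees any other surface, but in practice: Trace a ray from receiver to emitter and set to zero if any hit (or use multiple rays for accuracy)

                        if (visibility > 0)
                        {
                            float area = samples[eI].area;
                            float d2 = diff.Dot(diff) + FP_TINY;
                            // dividing by d2 removes the length scaling from cosR * cosE
                            float formFactor = (cosR * cosE) / (d2 * (float(PI) * d2 + area)) * area;

                            vec3 reflect = cmul (samples[eI].color, samples[eI].received);
                            vec3 emit = samples[eI].color * samples[eI].emission;

                            accum += (reflect + emit) * visibility * formFactor;
                        }
                    }
                }
            }

            // Store the gathered light (this write was missing from the pasted snippet).
            // It is updated in place, so samples processed later in this pass already see the new value.
            samples[rI].received = accum;
        }
    }

    void Visualize ()
    {
        for (int i=0; i<sampleCount; i++)
        {
            vec3 reflect = cmul (samples[i].color, samples[i].received);
            vec3 emit = samples[i].color * samples[i].emission;

            vec3 color = reflect + emit;

            //float radius = sqrt (samples[i].area / float(PI)); // disc with exactly the sample's area

            float radius = sqrt(samples[i].area * 0.52f); // a bit larger than the exact disc

            // Draw a disc with this color and radius at samples[i].pos, facing samples[i].dir.
            // The rendering call is application specific and not part of this snippet.
        }
    }

};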
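
For readers following the `formFactor` line, this is how the expression in the code reads as a formula. The notation is introduced here and is not from the original post: $\Delta = p_E - p_R$ is `diff`, $d = |\Delta|$, $A$ is the emitter's `area`, and $n_R$, $n_E$ are the unit normals (`dir`) of receiver and emitter.

```latex
\cos\theta_R = \frac{n_R \cdot \Delta}{d}, \qquad
\cos\theta_E = \frac{-\,n_E \cdot \Delta}{d}, \qquad
F = \frac{\cos\theta_R \,\cos\theta_E \; A}{\pi d^2 + A}
```

The code keeps `diff` unnormalized, so `cosR * cosE` carries an extra factor of $d^2$, which the division by `d2` removes. For a small or distant emitter ($A \ll \pi d^2$) this approaches the usual point-to-patch form factor $\cos\theta_R \cos\theta_E \, A / (\pi d^2)$, while the extra $A$ in the denominator keeps $F$ below 1 when two samples are very close together.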

Posted (edited)

Thank you for the great explanation. I need to look at that presentation as well. That seems to be the way to go.

1 hour ago, JoeJ said:

That's why i consider raytracing like you do to add those missing high frequency details...

That's funny because I have the exact opposite problem with reflections. I need low-frequency results that still work with normal maps. Blurring a sharp image fails in that case. I also think of altering ray directions as I started doing with shadows, and maybe adding some more rays per pixel. The problem is what to do for secondary bounces, where the number of rays would grow quickly. But I think it's doable. Hopefully that could give me rough reflections for bump mapped surfaces.

I wish I could spend more time on this, so many interesting ideas to test...

Is there somewhere to see your project and some screenshots/videos?

Edited by IcyTower

6 minutes ago, IcyTower said:

That's funny because I have the exact opposite problem with reflections.

Yeah, we have exact opposite strengths everywhere

I was already thinking about bumpy reflections with your approach - it's an acceptable limitation of course.

11 minutes ago, IcyTower said:

Is there somewhere to see your project and some screenshots/videos?

Not yet. I have only compute shaders to calculate the surfel stuff yet (it's huge - took me more than a year just to port from CPU to GPU!)

I'm not done with the preprocessing tools necessary to apply this to generic game geometry. I'm using a simple Quake level for now - pretty tired of it after 10 years, haha. I also have nothing for the graphics pipeline yet - starting with this when the preprocessing stuff is done.

I'd be lucky if i can show something before the year is over.

That's sad because performance seems good enough for XBox One so far, but i guess i've already missed current generation. So i consider adding stuff like you do to be attractive for the next. Pretty unsure how upcoming raytracing APIs should affect this...


Developer:
Category: Developer Tool
Type: Engine
Status: In Development
Platforms:
Engine: Custom


Last updated 04/17/18