implement and understand voxel cone tracing


Hello,

I am trying to implement voxel cone tracing in my game engine.

I have read many publications about this, but some crucial portions are still not clear to me.

As a first step I am trying to implement the easiest "poor man's" method:

a. My test scene "Sponza Atrium" is voxelized completely into a static 128^3 voxel grid (a structured buffer contains the albedo).

b. I don't care about "conservative rasterization" and don't use any sparse voxel access structure.

c. Every voxel has the same color for every side (top, bottom, front, ...).

d. One directional light injects light into the voxels (another structured buffer).

I will try to state what I think is correct (please correct me).

GI lighting of a given vertex with an ideal method

A. We would shoot many (e.g. 1000) rays into the hemisphere oriented along the normal of that vertex.

B. We would take into account every occluder (which is a lot of work) and sample the color from the hit point.

C. According to the angle between the ray and the vertex normal we would weight the color (cosine), sum up all samples and divide by the number of rays (see the sketch below).
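To make steps A to C concrete, here is a minimal HLSL-style sketch of that brute-force estimator. RandomHemisphereDirection and TraceSceneRay are hypothetical placeholder functions (a random hemisphere sampler and a scene ray tracer), not part of any engine mentioned in this thread, and constant factors of the rendering equation are left out:

// Hypothetical helpers, declared here only so the sketch is complete:
float3 RandomHemisphereDirection(float3 N, uint i); // random direction in the hemisphere around N
float3 TraceSceneRay(float3 origin, float3 dir);    // radiance of the nearest scene hit along the ray

// Brute-force reference for the diffuse GI of one surface point (not realtime).
float3 DiffuseGIReference(float3 P, float3 N, uint rayCount)
{
    float3 sum = 0;
    for (uint i = 0; i < rayCount; ++i)
    {
        float3 dir = RandomHemisphereDirection(N, i);
        float3 hitColor = TraceSceneRay(P, dir);
        sum += hitColor * saturate(dot(N, dir)); // cosine weighting (step C)
    }
    return sum / rayCount; // divide by the ray count (step C)
}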

Voxel GI lighting

In principle we want to do the same thing with our voxel structure.

Even if we knew where the correct hit points for that vertex are, we would still have to calculate the weighted sum of many voxels.

Saving time on the weighted summing of voxel colors

To save the time for this weighted summing of voxel colors, we build bricks or clusters.
Every 8 neighbouring voxels make a "cluster voxel" of level 1 (this is done recursively for many levels).

The color of a side of a "cluster voxel" is the average of the colors of the four contained voxel sides with the same orientation.

After having done this we can sample far-away parts just by sampling the corresponding "cluster voxel" at the corresponding level and get the summed-up color.

In practice this is done by mip mapping a 3D texture that contains the voxel colors, which also places the colors of neighbouring voxels close to each other in the texture.
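As an illustration of that mip-mapping step, a minimal sketch of one downsampling pass for such a 3D texture could look like the HLSL compute shader below. The resource names and thread-group size are made up for this example, and it simply averages the eight child voxels (real implementations may weight color by opacity):

// One downsampling step: each texel of the coarser mip averages a 2x2x2 block of the finer mip.
Texture3D<float4>   SrcMip : register(t0); // finer level, e.g. 128^3
RWTexture3D<float4> DstMip : register(u0); // coarser level, e.g. 64^3

[numthreads(4, 4, 4)]
void GenerateMipCS(uint3 dst : SV_DispatchThreadID)
{
    uint3 src = dst * 2;
    float4 sum = 0;
    for (uint z = 0; z < 2; ++z)
    {
        for (uint y = 0; y < 2; ++y)
        {
            for (uint x = 0; x < 2; ++x)
            {
                sum += SrcMip[src + uint3(x, y, z)];
            }
        }
    }
    DstMip[dst] = sum / 8.0f; // plain average of the 8 child voxels (color and opacity)
}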

Cone tracing, how to?

Here my understanding gets confused. How is the voxel structure traced efficiently?

I simply cannot understand how the occlusion problem is solved quickly, so that we know which single voxel or "cluster voxel" of which level we have to sample.

Suppose I am in a dark room that is filled with many boxes of different sizes and I have a flashlight, e.g. with a pyramid-shaped light cone:

- I would see some single voxels, near or far

- I would also see many different kinds of boxes ("cluster voxels") of different sizes which are partly occluded

How do I make a weighted sum over this lit area?

E.g. if I want to sample a "cluster voxel" of level 4, I have to take into account what percentage of the area of this "cluster voxel" is occluded.

Please be patient with me, I really try to understand, but maybe I need some more explanation than others.

best regards evelyn

 

 


Hi, first of all, a concrete simple example of how to sample the voxel grid contained in a 3D texture: https://github.com/turanszkij/WickedEngine/blob/master/WickedEngine/voxelConeTracingHF.hlsli#L32

For understanding cone tracing, you should first understand ray tracing and how to approximate it numerically when you don't have an explicit parametric definition of your scene surface, just a bunch of data. The approximation is called ray marching, and the data is your texture, which is built from pixels. In ray marching, you start at the start of the ray, look up the corresponding pixel value, then advance along the ray direction by one pixel and sample the texture again. If you just sampled a pixel with opacity = 1, that means you just hit a surface.
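A minimal sketch of that ray marching loop in HLSL, assuming the scene is stored in a 3D texture with opacity in the alpha channel (resource and parameter names are made up for illustration):

// March a ray through a voxel volume until an opaque voxel is found.
Texture3D<float4> Voxels      : register(t0);
SamplerState      LinearClamp : register(s0);

// position and direction are given in the [0,1]^3 texture space of the volume,
// stepSize is roughly the size of one voxel in that space.
float4 RayMarch(float3 position, float3 direction, float stepSize, uint maxSteps)
{
    for (uint i = 0; i < maxSteps; ++i)
    {
        float4 smp = Voxels.SampleLevel(LinearClamp, position, 0);
        if (smp.a >= 1.0f)
        {
            return smp; // hit an opaque voxel: this is our "surface"
        }
        position += direction * stepSize; // advance by about one voxel
        if (any(position != saturate(position)))
        {
            break; // left the volume
        }
    }
    return 0; // no hit
}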

Cone tracing is the same, but you don't want to trace a single ray, you want to trace many rays at once, making up a cone. An approximation is pre-integrating the texture into different levels of detail, called mip mapping. Now each level contains less data than the previous one, each value being the average of values from the previous level. So you do the exact same thing as with ray marching, start at the ray beginning and go along the ray direction, but with each step you increase the mip level you sample from, so each sample gives you a precomputed average of samples. Linear filtering works for 3D textures as well, and it ensures that when you sample from an increased mip level the result is weighted by the sample position, so it produces a "nice" gradient of colors when visualized. Also, you must keep track of the opacity, because with the data being pre-integrated, the opacity value you read will not just be one or zero, but an average of nearby pixels. You can be sure that you hit a surface once the accumulated alpha exceeds one.
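Ignoring engine specifics, a single cone trace could then look roughly like the sketch below. It only illustrates the idea described above (grow the sample footprint with distance, pick the matching mip, accumulate color and alpha front-to-back); the constants and the assumption of a 128^3 volume in [0,1]^3 texture space with non-premultiplied color are illustrative, not the exact code of any particular engine:

// One cone: step along the axis, sampling coarser mips as the cone widens.
float4 ConeTraceSketch(Texture3D<float4> voxels, SamplerState lin,
                       float3 origin, float3 direction, float aperture) // aperture = tan(half angle)
{
    const float VOXEL_SIZE = 1.0f / 128.0f; // assumed: 128^3 volume in [0,1]^3 texture space
    float3 color = 0;
    float  alpha = 0;
    float  dist  = VOXEL_SIZE; // start a bit in front of the surface to avoid self-sampling

    while (alpha < 1.0f && dist < 1.0f)
    {
        float diameter = max(VOXEL_SIZE, 2.0f * aperture * dist); // cone footprint at this distance
        float mip      = log2(diameter / VOXEL_SIZE);             // mip whose texel matches the footprint

        float3 tc = origin + direction * dist; // already in texture space in this sketch
        if (any(tc != saturate(tc)))
        {
            break; // left the volume
        }
        float4 smp = voxels.SampleLevel(lin, tc, mip);

        // front-to-back accumulation: what lies behind is partly occluded by what we already passed
        color += (1.0f - alpha) * smp.a * smp.rgb;
        alpha += (1.0f - alpha) * smp.a;

        dist += diameter * 0.5f; // step roughly proportional to the footprint
    }
    return float4(color, alpha);
}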

I hope that made some sense, good luck!

hi turanszkij,

Many, many thanks for your answer.

49 minutes ago, turanszkij said:

Hi, first of all, a concrete simple example of how to sample the voxel grid contained in a 3D texture: https://github.com/turanszkij/WickedEngine/blob/master/WickedEngine/voxelConeTracingHF.hlsli#L32

Some months ago I studied your source code and your home page intensively. Although I work with C#, I managed to compile and run your engine with my basic C++ knowledge. Sadly I did not save the code, and after trying to download and compile it again I saw that I cannot use the engine, because I only have Win7 64-bit and your engine requires Win10 / DirectX 12. Do you still have a Win7 version?

1 hour ago, turanszkij said:

The approximation is called ray marching, and the data is your texture, which is built from pixels. In ray marching, you start at the start of the ray, look up the corresponding pixel value, then advance along the ray direction by one pixel and sample the texture again. If you just sampled a pixel with opacity = 1, that means you just hit a surface.

In the ray marching context I understand. But there opacity is easy to detect and the step is only e.g. one pixel, so the problem I have is not touched: my problem of "understanding the occlusion query" is solved by tracing pixel by pixel.

1 hour ago, turanszkij said:

An approximation is pre-integrating the texture into different levels of detail, called mip mapping. Now each level contains less data than the previous one, each value being the average of values from the previous level.

Now here is my problem. When doing mip mapping on a 3D texture, several coarser 3D texture mip maps are created, right? But when sampling the 3D texture at a given point and a given mip level it will always return the same value, right?

When sampling with "quadrilinear interpolation" we get smooth values, but always the same value, independent of the view direction of the cone. See the following part of your code.

The coneDirection is only used for calculating the next position in texture space where we sample from.

But if sampling the mip map is independent of view direction, how can it reproduce the correct color? Sampling a point within a 3D mip map is something totally different from getting a projection (for the view) of the colors of the voxels it consists of (when looking at it from a certain viewpoint)?

To my understanding the sampling just gives me the interpolation of the accumulated surrounding voxel colors at the sample point.

I still need help in understanding, please be patient.

best regards evelyn

Code excerpt from turanszkij's game engine:

float diameter = max(g_xWorld_VoxelRadianceDataSize, 2 * coneAperture * dist);
float mip = log2(diameter * g_xWorld_VoxelRadianceDataSize_Inverse);

// Because we do the ray-marching in world space, we need to remap into 3d texture space before sampling:
// todo: optimization could be doing ray-marching in texture space
float3 tc = startPos + coneDirection * dist;
tc = (tc - g_xWorld_VoxelRadianceDataCenter) * g_xWorld_VoxelRadianceDataSize_Inverse;
tc *= g_xWorld_VoxelRadianceDataRes_Inverse;
tc = tc * float3(0.5f, -0.5f, 0.5f) + 0.5f;

// break if the ray exits the voxel grid, or we sample from the last mip:
if (any(tc - saturate(tc)) || mip >= (float)g_xWorld_VoxelRadianceDataMIPs)
    break;

float4 sam = voxels.SampleLevel(sampler_linear_clamp, tc, mip);

 

4 hours ago, evelyn4you said:

GI lighting of a given vertex with an ideal method

A. We would shoot many (e.g. 1000) rays into the hemisphere oriented along the normal of that vertex.

B. We would take into account every occluder (which is a lot of work) and sample the color from the hit point.

C. According to the angle between the ray and the vertex normal we would weight the color (cosine), sum up all samples and divide by the number of rays.

I'm not sure what exactly you cannot understand, but maybe it is how to approximate the above with cone tracing.

First, notice that instead of weighting by cosine, you should change the distribution of rays: more rays near the normal direction, fewer near tangent directions (importance sampling). Then you can simply average and get better results with fewer rays... just to mention.

So, e.g. instead of tracing 1000 hemisphere rays we trace 4 cones. We can only expect to get an approximate, similar result, but we try to make it as good as possible.

We start by dividing 1000 rays into 4 bundles of 250 rays, and we calculate a bounding* cone for each of them.

Then we trace the cones, which is similar to sphere tracing, but for cone tracing we constantly grow the sphere size so it fits our cone as it marches along the ray. We also increase the step size by the same factor. The sphere size also sets the mip level for trilinear filtering, so the volume data resolution fits the sphere size.

Each time we sample from the volume data, we accumulate color and alpha, and if alpha > 1 we decide to stop tracing, because we assume all 250 rays have hit something at that point. The accumulated color divided by alpha then refers to the averaged color from all 250 rays.

It's just an approximation that becomes better and better the more cones you use (3 is the minimum to approximate a hemisphere; 4, 5, 7... give better results).

Direction (hit point normal) does not matter. You seem to be confused by that, but it would not matter for ray tracing either. The normal of the hit point would only matter if you did not just have a point at the hit but something like a patch or surfel, something with an area you would want to integrate - there the area visible from the receiver depends on the surface normal of the emitting patch (radiosity method). Ray tracing does not care - it approximates this by the distribution of the ray directions (or their weighting, as you said).
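As an illustration of the cone-bundle idea (not anyone's exact implementation), the diffuse gather could then look like the sketch below, reusing the hypothetical ConeTraceSketch function from earlier in the thread; the cone directions and aperture are just plausible example values:

// Gather diffuse GI by averaging a handful of cones over the hemisphere around N.
float4 DiffuseConeGather(Texture3D<float4> voxels, SamplerState lin,
                         float3 origin, float3 N)
{
    // Example: one cone along the normal plus a ring of tilted cones (directions are illustrative).
    const float3 coneDirs[5] =
    {
        float3( 0.0f, 1.0f,  0.0f),
        float3( 0.7f, 0.7f,  0.0f),
        float3(-0.7f, 0.7f,  0.0f),
        float3( 0.0f, 0.7f,  0.7f),
        float3( 0.0f, 0.7f, -0.7f),
    };
    const float aperture = 0.577f; // roughly tan(30 degrees), a wide cone for diffuse

    // Build a tangent frame so the cone set is oriented around the surface normal.
    float3 up        = abs(N.y) < 0.99f ? float3(0, 1, 0) : float3(1, 0, 0);
    float3 tangent   = normalize(cross(up, N));
    float3 bitangent = cross(N, tangent);

    float4 sum = 0;
    [unroll]
    for (uint i = 0; i < 5; ++i)
    {
        float3 dir = normalize(coneDirs[i].x * tangent + coneDirs[i].y * N + coneDirs[i].z * bitangent);
        sum += ConeTraceSketch(voxels, lin, origin, dir, aperture);
    }
    return sum / 5.0f; // simple average of the cones
}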

 

I like this example for understanding how diffuse GI is calculated:

Place a perfect mirror ball at the receiving point you want to calculate.

Take a photo of the mirror ball from the direction along the receiver's surface normal (avoid perspective in the photo, so orthographic if possible).

Sum up all the pixel colors of the mirror ball in the photo and divide by the number of pixels (no weighting necessary).

This is the final incoming light, we're done.

Notice that this simple example explains the most important things about GI and how easy it is to derive related math (like ray or cone distribution or weighting, cone directions and angles, etc.) from there. Also the normal direction of emitters does not matter - only the image that appears on the mirror ball.

 

*) Bounding cones would not be very good, as they would overlap and cover the space between the cones twice, but just to visualize...

 

 

EDIT: There are limitations in volume tracing where direction indeed matters. E.g. at some distance the front and back side of a wall will be merged and you accidentally sample them both. No way to fix this - it's the main problem of voxel mips. (Anisotropic voxels with 6 colors, one for each side, can't really fix it either.)

 

 

 

1 hour ago, evelyn4you said:

I cannot use the engine because I only have Win7 64-bit and your engine requires Win10 / DirectX 12. Do you still have a Win7 version?

Unfortunately, I don't have a machine with Win7 any more and can't keep up with syncing against the old DirectX SDK. Developing just for the new one is so much easier. Though I have nearly finished the Vulkan implementation, which should eliminate the Win10 requirement.

1 hour ago, evelyn4you said:

Now here is my problem. When doing mip mapping on a 3D texture, several coarser 3D texture mip maps are created, right? But when sampling the 3D texture at a given point and a given mip level it will always return the same value, right?

When sampling with "quadrilinear interpolation" we get smooth values, but always the same value, independent of the view direction of the cone. See the following part of your code.

The coneDirection is only used for calculating the next position in texture space where we sample from.

But if sampling the mip map is independent of view direction, how can it reproduce the correct color? Sampling a point within a 3D mip map is something totally different from getting a projection (for the view) of the colors of the voxels it consists of (when looking at it from a certain viewpoint)?

To my understanding the sampling just gives me the interpolation of the accumulated surrounding voxel colors at the sample point.

 

You are right, mip maps here are like different 3D textures, but in DirectX they are stored in the same resource as the main 3D texture, which enables efficient access from the shaders, because with one sample operation you can possibly load from multiple mips (sub-3D-textures) at once, if your sampler has a "linear" filtering mode. This is also called quadrilinear filtering, as you mentioned.

We don't need view dependency for the diffuse part of the illumination. This calculation only takes the surface normal into consideration and shoots rays inside a hemisphere directed along that normal. This step involves first having your surface position, which you start the rays from. You also have the surface normal in world space, which gives the ray direction. Now you can start stepping along the ray. On each step, you convert the position on the ray, which is in world space, to your voxel texture space, then sample. Repeat until you have accumulated more than one alpha, reached the last mip level, or reached some predefined maximum distance (to avoid an infinite loop).

You are actually right, sampling only gives the surrounding voxel colors, but those should already contain the scene with direct illumination, so it's exactly what you want.

You will need view direction information for specular reflections, which you can also retrieve from the voxels. The algorithm is the same, but you are shooting a single ray in the reflection direction. Note that my code uses the function "ConeTrace" for both diffuse and specular GI.
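A sketch of that specular case, again reusing the hypothetical ConeTraceSketch function from earlier in the thread; deriving the cone aperture from roughness as shown is an assumption for illustration:

// Specular reflection: a single cone along the reflection direction,
// with an aperture that grows with surface roughness.
float4 SpecularCone(Texture3D<float4> voxels, SamplerState lin,
                    float3 position, float3 N, float3 viewDir, float roughness)
{
    float3 reflectDir = normalize(reflect(-viewDir, N)); // viewDir points from the surface to the eye
    float aperture = max(0.01f, roughness);              // narrow cone for mirrors, wide for rough surfaces
    return ConeTraceSketch(voxels, lin, position, reflectDir, aperture);
}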

I hope that was of some help.

Many thanks for your answers, JoeJ and turanszkij.

@JoeJ
Your "perfect mirror ball" helped me a lot to understand, also your info about weighting rays vs.  changing the distribution of rays

The part that comes closest to my understanding problem is the front/back side wall problem.

Test scene: just imagine a room which is divided by a wall with a closed door.

Let's assume the voxel length is 10 units and the wall is even 20 units thick. One room has a strong point light, the other is completely dark.

The voxels of the dividing wall on one side will be bright because of the direct light, but the voxels on the other side of the wall will be completely dark.
But at mip levels 1, 2, 3, ... the voxels of the dividing wall will become a medium-bright area that will give indirect illumination in the dark room.

Is this right?

Yes, you are right. This is the reason why I personally (and others too) gave up on voxel GI many years ago.

It's not just the front/back problem; a thin wall starts to produce too little occlusion as well, becoming more and more transparent the higher the mip.

You can get usable results if you adapt your level design to those limitations - large and complex interiors are a problem. What most people do is to use it only near the eye, often with only one volume (so no nested volumes with various resolutions like terrain clip maps). Far geometry then has no GI but could fall back to baking or whatever.

However, I don't know how the performance is with current hardware. Being able to trace more cones with smaller angles would reduce those problems.

 

 

Now my misunderstanding is cleared up.

But now I am back at my starting point: which GI solution would be best for me?

I have already tried LPVs, but the results did not please me.

My game takes place indoors in a castle. Until now I have worked with a high resolution dynamic cube map light probe (sampling 6 viewport directions) with mip mapping.

In the room where I place it, the illumination and reflection look marvelous, but I haven't found an automatic solution to determine which vertex should be lit by which probe. So the probe works on all vertices in its range, through walls and doors.

Does a good standard solution exist for this?

 

From what you say - did you consider light maps? This allows fast, robust and high quality diffuse lighting. Probes would be necessary only for reflections. Missing occlusion can be limited by manually placing probes with boundary volumes / SSR.

But you should tell us a lot more to get good help on this decision. The more you know about your game, the better the choice you can make. The more generic and flexible you want to be to support multiple games, the more you need to accept compromises and limitations.

Here are the usual questions arising:

Is the world geometry mostly static, does it feature destruction or building, or does it need to be fully dynamic?

Are light positions mostly static or dynamic? Are light colors and intensities static or dynamic? Dynamic time of day?

Are materials mostly rough and diffuse, or are there many reflections from metals?

 

 

Hi JoeJ,

- All scenes happen indoors in the house or castle, or on the balcony or terrace (no open world scenario).
- There is no destruction (no such things as explosions, no shooter scenario).
- Only the characters and the animals (bird, cat, dog) are dynamic.
- Lights have static positions, but there is a slow day and night cycle (sitting on the terrace at sunset...).
- I put a lot of effort into creating believable characters. This was the reason why I did not use Unity or Unreal:
  I could not transfer my morph, bone and face animation the way I wanted to
  (lack of C++ skill for Unreal, and problems with Unity shader integration).
  E.g. a character shall grab a glass with her/his hand and drink. This is NOT done by a simple preprocessed animation for this special
  character, but with an animation template and realtime IK calculation, taking into account whether the person is small or large (long or short arms, distance from shoulder to glass and so on).

I think lightmapping would probably be the best solution for me, but it was too hard for me to switch between Blender, unwrapping, ray tracing, baking lightmaps, importing them...
I have also read about doing the lightmap baking myself, but my skills are too limited to integrate an engine like NVIDIA's OptiX (C++) into my C# engine.
My intention was to bake the interior scenes without characters and to combine the baked lighting with my dynamic directional, spot and point lights.

I have been working on my engine for nearly 16 months and have achieved much, much more than I ever thought I could.
In principle my results are not bad, but they don't reach the standard I want to reach.

In the last weeks I have read a lot about GI, but my English is not the best and nearly all publications are in English with a lot of mathematical background. My mathematical level is, I think, quite high, but some things explained in the papers exceed my knowledge.

Placing small point lights here and there helps to create a pleasing scene, but the overall impression is not what I want to achieve.
It is how "old style" games were made.

Things like AO, bloom and an HDR pipeline I have already integrated.

 
