


#5295575 Phong model BRDF

Posted by on 08 June 2016 - 12:00 AM

1. Reflectance is the ratio of outgoing light to incoming light. In other words, for any beam of light striking the surface it tells you how much of that light will reflect off of it instead of getting absorbed. Since it's a ratio, only [0,1] values make sense if you're going to enforce energy conservation. You can compute it for a BRDF and a light direction by integrating the result of the BRDF over the entire hemisphere of viewing directions. So you can essentially think of that process as summing up all of the light that's reflected in all directions from a single ray of light.


2. Rspec is the reflectance of the specular BRDF.


3. By "glancing angle" they mean that the vector to the light source is close to parallel with the surface plane. This is consistent with the common usage of the term glancing angle in the field of optics, where it refers to the angle between an incoming ray and the surface that it strikes.


4. So as that paragraph says, they compute the directional-hemispherical reflectance of the specular BRDF with the light direction held constant at θi = 0. Since the reflectance is highest when θi = 0, you know that this value represents the maximum possible reflectance for the BRDF. So by computing the maximum reflectance and then dividing the BRDF by that value, you can be sure that the reflectance of the BRDF never exceeds 1 (as long as Cspec is <= 1).


If you want to derive this result yourself, first start with the specular BRDF. This is the right side of 7.46:




As we established earlier, we can compute directional-hemispherical reflectance by integrating our BRDF about the hemisphere of possible viewing directions. We'll call this set of directions Ωv, and in spherical coordinates we'll refer to the two coordinates as ϕv and θv (not to be confused with θi, which refers to our incident lighting direction). The integral we want to evaluate looks like this:




The "sinθv" term is part of the differential element of a spherical surface, which is defined as dS = r² sinθ dθ dϕ. In our case we're working on a unit hemisphere, so r = 1.


Now we're going to evaluate this with θi held constant as zero. In this case α is equal to θv, and so we can make that substitution. We can also pull out Cspec / π, since that part is constant:




You can verify the result of this integral using Wolfram Alpha.
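The equation images from this post are not reproduced here, so the following is only a reconstruction from the surrounding text. It assumes that the specular term of 7.46 has the form (Cspec / π) cos^m αr with m as the Phong exponent, and that the reflectance integral carries the usual cosθv projection factor. Neither assumption can be checked against the missing images.

```latex
% Specular BRDF (assumed form of the specular term of 7.46)
f_{spec}(\mathbf{l}, \mathbf{v}) = \frac{c_{spec}}{\pi} \cos^{m} \alpha_r

% Directional-hemispherical reflectance over the viewing hemisphere \Omega_v
R_{spec}(\mathbf{l})
  = \int_{\Omega_v} f_{spec}(\mathbf{l}, \mathbf{v}) \cos\theta_v \, d\Omega_v
  = \int_{0}^{2\pi} \int_{0}^{\pi/2}
      \frac{c_{spec}}{\pi} \cos^{m} \alpha_r \cos\theta_v \sin\theta_v
      \, d\theta_v \, d\phi_v

% With \theta_i = 0 we have \alpha_r = \theta_v, and c_{spec} / \pi is constant
R_{spec}
  = \frac{c_{spec}}{\pi} \int_{0}^{2\pi} \int_{0}^{\pi/2}
      \cos^{m+1} \theta_v \sin\theta_v \, d\theta_v \, d\phi_v
  = \frac{c_{spec}}{\pi} \cdot 2\pi \cdot \frac{1}{m + 2}
  = \frac{2 \, c_{spec}}{m + 2}
```

Under those assumptions, dividing the BRDF by 2 / (m + 2) gives a normalized specular term of ((m + 2) / 2π) Cspec cos^m αr, whose reflectance at θi = 0 is exactly Cspec.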


EDIT: I accidentally left out the "dΩ" and "dθdϕ" from the integrals in the middle image. Please pretend that I put them in there.  :)
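If you would rather check the integral numerically than with Wolfram Alpha, here is a minimal Python sketch. It assumes the specular term is (Cspec / π) cos^m θv at θi = 0 and that the integrand includes the cosθv projection factor; the function names are made up for this example.

```python
import math


def phong_specular_reflectance(c_spec, m, steps=20000):
    """Midpoint-rule estimate of the specular reflectance at theta_i = 0.

    Assumes the BRDF is (c_spec / pi) * cos(theta_v)^m when theta_i = 0.
    """
    d_theta = (math.pi / 2.0) / steps
    total = 0.0
    for k in range(steps):
        theta_v = (k + 0.5) * d_theta
        brdf = (c_spec / math.pi) * math.cos(theta_v) ** m
        # cos(theta_v) is the projection factor, sin(theta_v) comes from
        # the solid angle element dOmega = sin(theta) dtheta dphi.
        total += brdf * math.cos(theta_v) * math.sin(theta_v) * d_theta
    # The integrand does not depend on phi, so that integral is just 2*pi.
    return 2.0 * math.pi * total


def closed_form_reflectance(c_spec, m):
    """Closed-form value of the same integral: 2 * c_spec / (m + 2)."""
    return 2.0 * c_spec / (m + 2.0)
```

Comparing `phong_specular_reflectance(1.0, 10)` against `closed_form_reflectance(1.0, 10)` should agree to several decimal places, which is a quick sanity check on the (m + 2) / 2 normalization factor.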

#5295168 SampleLevel not honouring integer texel offset

Posted by on 05 June 2016 - 06:50 PM

It's possible to just use the new compiler and still use the old SDK, if for some reason you're really keen on not switching. If you're using fxc.exe it's easy: just use the new version. If you're linking to the D3DCompiler DLL it's a little trickier, since you will probably have trouble making sure that your app links to the correct import lib. One way to make sure that you use the right version is to not use an import lib at all, and instead manually call LoadLibrary/GetProcAddress to get a pointer to the function you want to use from d3dcompiler_47.dll.

#5294517 [D3D12] Issue pow HLSL function

Posted by on 01 June 2016 - 12:51 PM

If you pass a hard-coded 0.0f as the exponent parameter of pow, the compiler is going to optimize away the pow() completely and just replace the whole expression with 1.0f. However if the exponent is not hard-coded and instead comes from a constant buffer or the result of some other computation, then it will need to actually evaluate the pow(). One catch with pow() is that DX bytecode doesn't contain a pow assembly instruction, which is consistent with the native ISA of many GPUs. Instead the compiler will use the following approximation:

pow(x, y) = exp2(y * log2(x))

If you take a look at the generated assembly for your program, you should find a sequence that corresponds to this approximation. Here's a simple example program and the resulting bytecode:

cbuffer Constants : register(b0)
{
    float x;
    float y;
}

float PSMain() : SV_Target0
{
    return pow(x, y);
}

dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[1], immediateIndexed
dcl_output o0.x
dcl_temps 1
log r0.x, cb0[0].x
mul r0.x, r0.x, cb0[0].y
exp o0.x, r0.x

Notice the log instruction (which is a base-2 logarithm) followed by the exp instruction (which is also base-2).

The one thing you need to watch out for with the log instruction is that it will return -INF if passed a value of 0, and NAN if passed a value that's < 0. This is why the compiler will often emit a warning if you don't use saturate() or abs() on the value that you pass as the first parameter to pow().


In light of all of this, I would take a look at the assembly being generated for your shader. It may reveal why you don't get the results you expect, or possibly an issue with how the compiler is generating the bytecode. You should also double-check that you're not passing a negative value as the first parameter of pow(), which you can avoid by passing saturate(RdotV).
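To make the edge cases concrete, here is a small CPU-side Python sketch of the exp2(y * log2(x)) expansion. It is an emulation under stated assumptions rather than real GPU behavior: `gpu_log2`, `gpu_exp2` and `emulated_pow` are hypothetical helper names, and the handling of 0 and negative inputs simply follows the description of the log instruction above.

```python
import math


def gpu_log2(x):
    """Base-2 log with the edge cases described above: -INF at 0, NaN below 0."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    if x == 0.0:
        return -math.inf
    return math.log2(x)


def gpu_exp2(t):
    """Base-2 exponential that returns INF/0 instead of raising on extremes."""
    if math.isnan(t):
        return math.nan
    if t == -math.inf:
        return 0.0
    try:
        return 2.0 ** t
    except OverflowError:
        return math.inf


def emulated_pow(x, y):
    """pow(x, y) expanded as exp2(y * log2(x)), mirroring the log/mul/exp sequence."""
    return gpu_exp2(y * gpu_log2(x))
```

In this emulation `emulated_pow(-0.5, 2.0)` comes out as NaN even though the mathematical result is well defined, and `emulated_pow(0.0, 0.0)` is also NaN because 0 * -INF is NaN. Wrapping the base in `abs()` or clamping it first (the equivalent of saturate()) avoids the negative case.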

#5294228 (Physically based) Hair shading

Posted by on 30 May 2016 - 02:33 PM



In fact, I did not solve the IBL problem yet. In my knowledge The Order uses tangent irradiance maps



We didn't end up shipping with that, since we removed all usage of SH from the game towards the end of the project. Instead we applied specular from the 9 spherical gaussian lobes stored in our probe grid.

#5294083 Question about GI and Pipelines

Posted by on 29 May 2016 - 05:15 PM

As Hodgman already explained, you can implement VCT as part of a deferred or forward renderer. However a deferred renderer will generally give you more flexibility in how you can fit it into your rendering pipeline. Back when UE4 was telling everyone that they were going to use VCT, their presentations mentioned that they couldn't afford to perform the cone traces at full resolution. Instead they were doing specular at half-resolution and diffuse even lower than that, and then upsampling. This is really only feasible with a deferred renderer, since a forward renderer typically rules out any kind of mixed-resolution shading.

#5293844 Still no DX12 Topic Prefix for the forums?

Posted by on 27 May 2016 - 12:49 PM

I don't think I have permission to do this, so I'll need to get in touch with the admins.

#5293843 When will SlimDX be updated so as to contain also DirectX 12 ?

Posted by on 27 May 2016 - 12:49 PM

Promit answered this question yesterday.

#5293675 does video memory cache effect efficiency

Posted by on 26 May 2016 - 03:35 PM

I think you'll find more results if you search for "GPU cache" instead of "video memory cache". This is because the cache structure is really a part of a GPU and not its onboard memory, and also because the term "video memory" is pretty outdated.


Unlike CPU's, there's no generic cache structure that's used by all GPU's. Instead GPU's have a mixture of special-case and general-purpose caches where the exact number and details can vary significantly between hardware vendors, or even across different architectures from the same vendor. They also tend to be much smaller and much more transient compared to the CPU caches. CPU's actually dedicate a relatively large portion of their die space to cache, while GPU's tend to dedicate more space to their SIMD ALU units and corresponding register files. Ultimately this all means that the cache behavior ends up being different than what you would expect from a CPU with large L1/L2 caches, and you can't always apply the same rules-of-thumb.

#5292928 How to organize worlds/levels as seen in the new Doom

Posted by on 22 May 2016 - 02:39 PM

The new Doom uses Umbra, it says so right when you start up the game.


A lot of games still use some form of PVS, but probably not based on BSP's like in the Quake days. The games that I have worked on used manual camera volumes where the set of visible meshes/lights/particles/whatever was computed at build-time based on generated or hand-placed sample points. Other games compute visibility completely at runtime by rasterizing a coarse depth buffer on the CPU, and then testing bounding volumes for visibility. Some newer games are moving towards doing all occlusion culling and scene submission on the GPU.

#5292920 Phong model BRDF

Posted by on 22 May 2016 - 02:16 PM


Why is the outgoing radiance equal to 0 when the angle between n and l is <= 0?

Because the surface is back-facing from the light's point of view. The rendering equation itself -- which the BRDF plugs into -- multiplies the entire BRDF by N⋅L, so having this condition within the BRDF is actually superfluous. I guess it's just mentioned because most realtime renderers actually don't implement the rendering equation properly, so that condition is required to avoid getting specular highlights on the wrong side of an object.
As for the rest, this cos^m term seems weird. Phong is based around (L⋅reflect(V,N))^m, which is equivalent to cos(θr)^m, where θr is the angle between the reflection direction and the light direction...


It's not cos^m, it's cos^m(αr), where αr is the angle between the light direction and the reflected view direction.

#5292549 What's the advantage of Spherical Gaussians used in The Order vs. Envirom...

Posted by on 19 May 2016 - 06:36 PM

Let's back up a bit here. Our ultimate goal for our baked lightmap/probe data was to store a representation of the incoming radiance on the sphere or hemisphere surrounding a point, so that we can integrate our diffuse and specular BRDF's against that radiance in order to determine the outgoing radiance for the viewing direction. In simpler terms we need to store the incoming lighting in a way that's convenient to compute the visible light coming out. Spherical Gaussians (SG's) are nice for this purpose, since a small number of them can potentially represent the radiance field as long as the radiance is somewhat low-frequency. They can do this because they have a "width", which means that the sum of them can potentially represent the entire surface area of a hemisphere or sphere. SG's are also convenient because it's possible to come up with a reasonable closed-form approximation for the integral of (BRDF * SG). This ultimately means that computing the outgoing lighting for a pixel is an O(N) operation, where N is the number of SG's per sample point.
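As a rough illustration of the representation, here is a minimal Python sketch that evaluates a sum of SG lobes in a given direction. It assumes the commonly used SG form amplitude * exp(sharpness * (dot(axis, direction) - 1)) with a scalar amplitude, which is not spelled out in this post. The function names are made up for this example, and it only reconstructs the stored radiance; it does not implement the closed-form (BRDF * SG) integral.

```python
import math


def evaluate_sg(amplitude, axis, sharpness, direction):
    """Evaluate one spherical Gaussian lobe in a unit direction.

    Assumed form: amplitude * exp(sharpness * (dot(axis, direction) - 1)).
    Both axis and direction are expected to be unit-length 3-tuples.
    """
    cos_angle = sum(a * d for a, d in zip(axis, direction))
    return amplitude * math.exp(sharpness * (cos_angle - 1.0))


def evaluate_radiance(lobes, direction):
    """Sum all lobes for one direction: O(N) in the number of SG's."""
    return sum(evaluate_sg(amp, axis, sharp, direction)
               for amp, axis, sharp in lobes)
```

A lobe peaks at its amplitude when the direction matches its axis and falls off smoothly away from it, with larger sharpness values giving a narrower lobe. That falloff is the "width" that lets a handful of lobes cover the whole sphere.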


This is all somewhat separate from the issue of how the data is actually stored in textures so that it can be accessed at runtime. In our game, we stored the data in 2D textures for static meshes (just like standard lightmaps) and in 3D textures for our probe grids. In practice we actually keep an array of textures, where each texture contains just 1 SG for each texel. Storing as textures means that we can use texture filtering hardware to interpolate spatially between neighboring texels, either along the surface for static meshes or in 3D space for dynamic meshes. Textures also allow you to use block compression formats like BC6H, which can let you save memory.


So to finally get to your question, you're asking why you wouldn't store the baked data in lots of mini 3x3 or 4x4 environment maps. Let's look at this in terms of the two aspects I just mentioned: how the data is stored, and what you're actually storing. Having a set of environment maps allows you to interpolate in the angular domain, since each texel essentially represents a patch of surface area on a sphere or hemisphere. However if we're using SG's, then we don't actually need angular interpolation. Instead we consider each SG individually, which requires no interpolation. However we do want to interpolate spatially, since we'll want to shade pixels at a rate that's likely to be greater than the density of the baked texels. As I mentioned earlier 2D or 3D textures work well for spatial interpolation since we can use hardware filtering, whereas if you used miniature environment maps you would have to manually interpolate between sample points.


Now to get to the other part: what you actually store in the texture. You seem to be suggesting an approach where the environment contains some kind of single radiance value in each texel, instead of a distribution of radiance like you have with an SG. The problem here is how do you integrate against your BRDF(s)? If you consider each texel to contain the radiance for an infinitely small solid angle, then you can essentially treat each texel as a directional light and iterate over each. However this is not great since there will be "holes" in the hemisphere that aren't covered by your sparse radiance samples. So instead of doing that, you might try to pre-integrate your BRDF against a whole bunch of radiance samples, and store that result per-texel (I believe that this is what you're suggesting when you say "approximate a cone"). This is pretty much exactly the approach used by most games for their specular IBL probes. The big catch with pre-filtering is that you can't actually pre-integrate a specular BRDF against a radiance field and store the result in a 2D texture, since the view dependence adds too many dimensions. So you're forced to pre-integrate the NDF assuming that V == N, sample the environment map based on the peak specular direction, and apply the fresnel/visibility terms after the fact. Separating the NDF from fresnel/visibility leads to error, and unfortunately this error gets worse as your roughness increases. The pre-integration also assumes a fixed roughness, and so you need to store a mip chain with different roughness values that you interpolate between. This doesn't sound particularly appealing to me due to the error from pre-integration and mip interpolation, and you also lose the benefit of being able to spatially interpolate between your sample points. On top of all of this you still need to handle your diffuse BRDF, which requires an entirely different pre-integration.


TL;DR - our approach allows us to use hardware filtering to interpolate samples, and we can use an analytical approximation for both diffuse and specular without having to rely on pre-integration.

#5290911 Some questions about cascaded variance shadow mapping

Posted by on 09 May 2016 - 07:33 PM

The key insight here is that all of your cascades are parallel with each other. In other words, they all project along a common local Z axis. With that assumption, you can remove the need for having a completely separate matrix for each cascade. To do this you need to think of your shadow matrix as being 3 separate transforms composed together: a rotation, a scale, and a translation. The rotation is based on the orientation of your directional light: applying it will transform a coordinate so that it's now in a local coordinate space relative to the light's direction, where the Z axis is aligned with the light's direction. The translation will transform the position so that it's now relative to the origin of the projection, which is typically the center of the projection's near clip plane. Finally, the scale will transform the position such that -1 is the left/bottom of the projection, and 1 is the right/top of the projection. The translation and scale are typically unique for each cascade, since the projections will be different in size (in order to allow cascades to cover increasing amounts of the viewable area) and will also be located at different locations in world space. However the orientation will be the same, since it's purely based on the light direction. This means we can set things up such that we have one shared matrix representing the cascade rotation, while having a unique scale + translation for each cascade.


Storing your shadow projections in this manner can be a good idea from a performance point of view, since the data is more compact than having a full matrix per cascade. However it also allows you to handle gradients in an elegant way. Let's say you were to take a single point, compute the shadow map UV coordinate for two different cascades, and then compute the gradients. The gradients for each cascade would obviously be different, otherwise you wouldn't have issues at cascade boundaries. However since the projections are orthographic, the two gradients will always be proportional to one another. In fact, the ratio between the two is equal to the ratio between the scale components of the respective cascade transforms. So if we store the cascade scales separately, we can use them to "adjust" a gradient based on the cascade that was selected without having to apply the full transform to the original surface position.


The way that I implemented this in my shadow demo was to store a single 4x4 matrix that represented the transform for the first cascade. Then I would store a translation and scale for all cascades, where that translation and scale represented the values needed to transform a point from the first cascade to the Nth cascade. This means that the translation for the first entry would be 0, and the scale would be 1.0. However they would be different for the following cascades, since they would be larger and centered around different points. Computing these scale and translation values is fairly straightforward: you can do it by transforming points with the first cascade matrix, then transforming by your Nth cascade matrix, and then comparing the values. Applying the transform in the shader is also pretty simple: first transform by the matrix for the first cascade, then apply the scale/translation based on the cascade that was selected for that pixel. If you need gradients for VSM, you just need to compute the gradients after applying the transform from the first matrix, and then scale them by the scale value for the selected cascade.
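Here is a minimal Python sketch of that scale/translation bookkeeping. It assumes each cascade maps a light-space position p to scale_i * p + offset_i per axis (the shared rotation has already been applied), and it derives the relative values algebraically rather than by transforming and comparing sample points as in the demo. All names are hypothetical.

```python
def relative_scale_offset(scale_0, offset_0, scale_n, offset_n):
    """Per-axis scale/offset that maps cascade-0 coordinates to cascade-N.

    Assumes coords_i = scale_i * p + offset_i for a light-space position p.
    """
    rel_scale = tuple(sn / s0 for sn, s0 in zip(scale_n, scale_0))
    rel_offset = tuple(on - o0 * rs
                       for on, o0, rs in zip(offset_n, offset_0, rel_scale))
    return rel_scale, rel_offset


def to_cascade(coords_0, rel_scale, rel_offset):
    """Move coordinates computed for the first cascade into cascade N."""
    return tuple(c * s + o for c, s, o in zip(coords_0, rel_scale, rel_offset))


def scale_gradient(gradient_0, rel_scale):
    """Gradients only need the scale, since the offset is constant."""
    return tuple(g * s for g, s in zip(gradient_0, rel_scale))
```

For the first cascade this yields a scale of 1 and an offset of 0, matching the description above. In a shader you would transform by the first cascade's matrix once, take the derivatives there, and then apply the per-cascade scale and offset for whichever cascade was selected.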


Regarding your second question about the mip levels: it may be true that only the first cascade uses the highest-resolution mip level. However you wouldn't want to directly rasterize to a lower-resolution mip level, since doing this will give you poor results. If you were to do that, you'll get a lot of aliasing since you're rasterizing at a lower sample rate. The idea with mipmaps is that you rasterize at a high sampling rate, then pre-filter to lower-resolution mip levels so that you get a nice, stable result.
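As a toy illustration of "rasterize high, then pre-filter down", here is a Python sketch that builds the next mip level of a VSM-style moments texture with a 2x2 box filter. The box filter and the list-of-tuples texture layout are assumptions made for this example, not a description of any particular implementation.

```python
def downsample_2x(texels):
    """Average each 2x2 block of (depth, depth_squared) moments.

    texels is a list of rows with even width and height, and every entry
    is a 2-tuple. Returns a texture with half the width and height.
    """
    height = len(texels)
    width = len(texels[0])
    result = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = (texels[y][x], texels[y][x + 1],
                     texels[y + 1][x], texels[y + 1][x + 1])
            # Moments are linear quantities, so averaging them is valid
            # (unlike averaging the result of a depth comparison).
            row.append(tuple(sum(t[c] for t in block) / 4.0
                             for c in range(2)))
        result.append(row)
    return result
```

Each lower mip is then an average of samples that were all rasterized at the full rate, which is what keeps the result stable compared to rasterizing directly at the lower resolution.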

#5290200 Irrandiance Volume v.s. 4-Basis PRT in Farcry

Posted by on 05 May 2016 - 12:40 AM

It seems 2 band SH also has ringing. When I implemented my SH2 irr-vol I used an Lanczos window to reduce them. Since only trivial SH co-effs multiplies involved in this windowing ops, from the performance perspectives I feel it doesn't like a big deal. Maybe the storage & filtering performance cost is not be the point about the question.


The FarCry's motivation really confused me for a while until by chance I found the Order 1886 (Sig’15 course) also used an multi-basis SG baking solution. One of most interesting things about the course is that they shared some experiences about using SH3 irradiance-cube to represent the HDR lighting, namely, HDR lighting can cause some SH lobes to be very large negative numbers to cancel out the high positive co-effs, which is really bad for baking quality and compression.


So finally I find my own answer: Don’t ever use SH irradiance-cube under HDR lighting situation. The irradiance-cube representation by using low-band SH under HDR situation may be far from accuate, and it's not suitable for baking output. Use muli-basis PRT method instead.


Indeed, that was the conclusion we eventually came to while working on The Order. SH has some really great properties, but ultimately it doesn't do well for storing arbitrary lighting environments. It's not so bad if you're storing very low-frequency data from indirect lighting, but if you ever try to bake in direct lighting from an area light source the result is unusable without filtering. But then once you filter, you completely lose the directionality which also doesn't look right. SG's are much better in this regard, and also have the capability of storing higher-frequency signals.

#5289015 [D3D12] Binding multiple shader resources

Posted by on 27 April 2016 - 06:44 PM

The CopyDescriptors approach is mostly for convenience and rapid iteration, since it doesn't require you to have descriptors in a contiguous table until you're ready to draw. For a real engine where you care about performance, you'll probably want to pursue something along the lines of what Jesse describes: put your descriptors in contiguous tables from the start, so that you're not constantly copying things around while you're building up your command buffers.


I also want to point out that the sample demonstrates another alternative to both approaches in its use of indexing into descriptor tables. In that sample it works by grabbing all of the textures needed to render the entire scene, putting them in one contiguous descriptor table, and then looking up the descriptor indices from a structured buffer using the material ID. Using indices can effectively give you an indirection, which means that your descriptors don't necessarily have to be contiguous inside the descriptor heap.

#5288662 How does material layering work ?

Posted by on 25 April 2016 - 03:21 PM

This seems to be a really nice workflow for artists as they have some kind of material library which they can customize and blend to obtain advanced materials on complicated object. This seems to be the best regarding to performances.

Yes, I would say that it has worked out very well for us. It helps divide the responsibility appropriately among the content team: a lot of environment artists can just pull from common material libraries and composite them together in order to create unique level assets. At the same time our texture/shader artists can author the low-level material templates, and whenever they make changes they are automatically propagated to the final runtime materials.

So you are using some kind of uber shader that accepts multiple albedos, normals, etc. each with his associated tiling and offsets with a masking texture for each layer ?

Yup. We have an ubershader that has a for loop over all of the material layers, but we generate a unique shader for every material with certain constants and additional code compiled in. The number of layers ends up being a hard-coded constant at compile time, and so we unroll the loop that samples the textures for each layer and blends the resulting parameters.
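To give a rough idea of what that per-layer loop computes, here is a Python sketch of blending material parameters across layers. Blending with a linear interpolation driven by a per-layer mask is an assumption made for this example, as are the parameter names; the real shader samples textures per layer and has the layer count baked in at compile time so the loop can be unrolled.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def blend_layers(layers):
    """Blend a stack of material layers from the bottom up.

    layers is a list of (params, mask) pairs, where params is a dict of
    scalar material parameters and mask is the blend weight in [0, 1].
    The first layer is the base, so its mask is ignored.
    """
    result = dict(layers[0][0])
    for params, mask in layers[1:]:
        for name in result:
            result[name] = lerp(result[name], params[name], mask)
    return result
```

A mask of 0 leaves the layers below untouched and a mask of 1 fully replaces them, so the whole stack resolves to a single set of parameters before any lighting is evaluated.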

You might also have multiple drawcalls from those layers which are not present in the above technics, right ? This has some performances costs, can those be neglected ?

I don't think you would ever want to have multiple draw calls for runtime layer blending. It would likely be quite a bit more expensive than doing it all in a loop in the pixel shader.