


#5251034 Spherical Harmonics Cubemap

Posted by MJP on 07 September 2015 - 01:36 PM

I had to change CosineA2 to be Pi * 0.25f, which corresponds with the 1, 2/3, 1/4 bands

Whoops, sorry about that! I corrected the code, in case anybody else looks at this thread later on.

Either way, I'm glad that you got it to work!

#5250931 Spherical Harmonics Cubemap

Posted by MJP on 06 September 2015 - 08:59 PM

The long answer:

To compute irradiance, you need to take your incoming radiance and integrate it with a cosine lobe oriented about the surface normal. With spherical harmonics you typically have your incoming radiance represented as a set of SH coefficients (this is what you're computing when you integrate your cubemap onto the SH basis functions), which means that it makes sense to also represent the cosine lobe with SH. If you do this, then computing the integral can be done using an SH convolution, which is essentially just a dot product of two sets of SH coefficients. This paper, which is also by Ravi Ramamoorthi, goes into the full details of how to represent your cosine lobe with SH coefficients. Basically you take the zonal harmonics coefficients for a cosine lobe oriented about the Z axis (This is just a set of constants, since it never changes), and then you rotate it so that it's now aligned with your normal direction. Once you've done that you perform the SH dot product with your incoming radiance, and you get irradiance as a result.

The short answer:

You compute a new set of coefficients from your normal direction, and perform a dot product with the SH coefficients that you got from integrating the cubemap. This is simple enough to do in a pixel shader, so you can just do it all at runtime per-pixel:
static const float Pi = 3.141592654f;
static const float CosineA0 = Pi;
static const float CosineA1 = (2.0f * Pi) / 3.0f;
static const float CosineA2 = Pi * 0.25f;

struct SH9
{
    float c[9];
};

struct SH9Color
{
    float3 c[9];
};

SH9 SHCosineLobe(in float3 dir)
{
    SH9 sh;

    // Band 0
    sh.c[0] = 0.282095f * CosineA0;

    // Band 1
    sh.c[1] = 0.488603f * dir.y * CosineA1;
    sh.c[2] = 0.488603f * dir.z * CosineA1;
    sh.c[3] = 0.488603f * dir.x * CosineA1;

    // Band 2
    sh.c[4] = 1.092548f * dir.x * dir.y * CosineA2;
    sh.c[5] = 1.092548f * dir.y * dir.z * CosineA2;
    sh.c[6] = 0.315392f * (3.0f * dir.z * dir.z - 1.0f) * CosineA2;
    sh.c[7] = 1.092548f * dir.x * dir.z * CosineA2;
    sh.c[8] = 0.546274f * (dir.x * dir.x - dir.y * dir.y) * CosineA2;

    return sh;
}

float3 ComputeSHIrradiance(in float3 normal, in SH9Color radiance)
{
    // Compute the cosine lobe in SH, oriented about the normal direction
    SH9 shCosine = SHCosineLobe(normal);

    // Compute the SH dot product to get irradiance
    float3 irradiance = 0.0f;
    for(uint i = 0; i < 9; ++i)
        irradiance += radiance.c[i] * shCosine.c[i];

    return irradiance;
}

float3 ComputeSHDiffuse(in float3 normal, in SH9Color radiance, in float3 diffuseAlbedo)
{
    // Diffuse BRDF is albedo / Pi
    return ComputeSHIrradiance(normal, radiance) * diffuseAlbedo * (1.0f / Pi);
}
I assembled and simplified this code from different places, so I apologize if there's a typo in there. But it should give you the general idea. Those CosineA0/A1/A2 terms are from Ravi's paper, and they're the zonal harmonics coefficients of a cosine lobe oriented around the Z axis. Also, you should notice that to compute the final diffuse value, I had to divide the result by Pi. This is because the actual diffuse BRDF is albedo / Pi, and if you forget that factor the result will be too bright. If you'd like, you can fold the 1 / Pi into the A0/A1/A2 terms, which simplifies nicely.
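For example, folding the 1 / Pi into the constants gives the familiar 1, 2/3, 1/4 band factors mentioned earlier in the thread (a minimal sketch; the rest of the code stays the same):

// Cosine lobe coefficients with the diffuse 1 / Pi already folded in
static const float CosineA0 = 1.0f;
static const float CosineA1 = 2.0f / 3.0f;
static const float CosineA2 = 0.25f;

With those constants, ComputeSHDiffuse just returns ComputeSHIrradiance(normal, radiance) * diffuseAlbedo, without the 1.0f / Pi factor.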

#5250908 Spherical Harmonics Cubemap

Posted by MJP on 06 September 2015 - 04:58 PM

The whole point is that you don't generate a "cubemap" as output, you generate a set of SH coefficients. Basically you iterate over every texel of your input cubemap, compute the SH coefficients for that direction, multiply the coefficients by the cubemap value, multiply by a correction factor that accounts for cubemap->sphere distortion, and sum the results into a set of SH coefficients. Page 9 of this paper has some pseudo-code for the process.
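In case it helps, here's a rough C++ sketch of that loop. FetchCubemapTexel and CubemapTexelToDirection are hypothetical helpers that return a texel's radiance and the normalized direction through its center, and Float3/uint32 are assumed framework types; the basis functions and their ordering match the SHCosineLobe code above so that the dot product lines up.

// Rough sketch of projecting a cubemap onto 3rd-order SH (9 coefficients)
void ProjectCubemapOntoSH9(uint32 faceSize, Float3 shCoefficients[9])
{
    for(uint32 i = 0; i < 9; ++i)
        shCoefficients[i] = Float3(0.0f, 0.0f, 0.0f);

    for(uint32 face = 0; face < 6; ++face)
    {
        for(uint32 y = 0; y < faceSize; ++y)
        {
            for(uint32 x = 0; x < faceSize; ++x)
            {
                // Face coordinates in [-1, 1], sampled at the texel center
                float u = ((x + 0.5f) / faceSize) * 2.0f - 1.0f;
                float v = ((y + 0.5f) / faceSize) * 2.0f - 1.0f;

                // Differential solid angle of this texel, which is the correction
                // factor for the cube->sphere distortion (texels near the corners
                // cover less solid angle than texels at the face centers)
                float texelArea = (2.0f / faceSize) * (2.0f / faceSize);
                float distSq = u * u + v * v + 1.0f;
                float weight = texelArea / (distSq * std::sqrt(distSq));

                Float3 dir = CubemapTexelToDirection(face, u, v);
                Float3 radiance = FetchCubemapTexel(face, x, y);

                // Evaluate the 9 SH basis functions for this direction
                float sh[9];
                sh[0] = 0.282095f;
                sh[1] = 0.488603f * dir.y;
                sh[2] = 0.488603f * dir.z;
                sh[3] = 0.488603f * dir.x;
                sh[4] = 1.092548f * dir.x * dir.y;
                sh[5] = 1.092548f * dir.y * dir.z;
                sh[6] = 0.315392f * (3.0f * dir.z * dir.z - 1.0f);
                sh[7] = 1.092548f * dir.x * dir.z;
                sh[8] = 0.546274f * (dir.x * dir.x - dir.y * dir.y);

                // Accumulate radiance * basis * solid angle
                for(uint32 i = 0; i < 9; ++i)
                    shCoefficients[i] += radiance * sh[i] * weight;
            }
        }
    }
}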

#5250907 Shader PS_3_0 Fails to Run (returns black frame)

Posted by MJP on 06 September 2015 - 04:53 PM

You can find the meaning of the various FVF codes here. XYZRHW indicates that the positions are already transformed, and they should be used by the rasterizer directly. I believe it's the same as using D3DDECLUSAGE_POSITIONT in a vertex declaration. They both essentially specify that vertex processing should be disabled. If you want to use that, then you need to pre-transform your vertex positions into screen space, and I believe you also need to make sure that your position has a valid W component.
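For reference, a pre-transformed vertex for use with XYZRHW might look something like this (a sketch; the exact layout depends on what else your FVF includes):

// Pre-transformed vertex as described above: x/y are in screen pixels,
// z is in [0, 1], and rhw is 1 / w (1.0f is fine for simple 2D overlays)
struct TransformedVertex
{
    float x, y, z, rhw;
    D3DCOLOR color;
};

const DWORD TransformedFVF = D3DFVF_XYZRHW | D3DFVF_DIFFUSE;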

#5250905 Stop HLSL from optimizing out unused textures

Posted by MJP on 06 September 2015 - 04:41 PM

I have never seen such behavior from the shader compiler, which makes me very skeptical that the compiler is actually re-assigning it to register 0. You can check this pretty easily by using fxc.exe to dump the disassembly, which includes the register assignments for shader resources. Alternatively, you can also use a program like RenderDoc to capture a frame and then inspect the disassembly. If you do this, you can also easily confirm that you are in fact binding your shader resource view to the correct slot.

If all else fails, try running with the reference device. If it works correctly with the reference device, then you may have hit a driver bug.

#5250762 Shader PS_3_0 Fails to Run (returns black frame)

Posted by MJP on 05 September 2015 - 06:32 PM

I believe D3D9 debug runtimes only worked on Windows XP?

I'm pretty sure that the debug runtime at least worked on Vista and Win7, but I haven't used D3D9 in quite some time so I can't say for sure.

Can I find a sample of setting the dummy vertex shader anywhere?

Here's a simple example:
float4x4 Projection;

struct VSInput
{
    float3 Position : POSITION;
    float4 Color : COLOR0;
    float2 TexCoord : TEXCOORD0;
};

struct VSOutput
{
    float4 Position : POSITION;
    float4 Color : COLOR0;
    float2 TexCoord : TEXCOORD0;
};

VSOutput MainVS(in VSInput Input)
{
    VSOutput Output;

    Output.Position = mul(float4(Input.Position, 1.0f), Projection);
    Output.TexCoord = Input.TexCoord;
    Output.Color = Input.Color;

    return Output;
}
All this shader does is transform the vertex position by a projection matrix, and then pass along the texture coordinates and color to the pixel shader.

Do I simply need to load a dummy vertex shader and call SetVertexShader? Any additional vertex configuration or rendering configuration that needs to be done?

You will also need to make sure that your projection matrix gets set to the appropriate vertex shader constant registers. You do this in the same way that you're setting constants for your pixel shader: get the ID3DXConstantTable for the shader, and then call SetMatrix. Calling IDirect3DDevice9::SetTransform has no effect on vertex shaders, so you no longer need to call that.
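A sketch of what that might look like, assuming you compiled the shader with D3DX, kept the constant named "Projection" as in the example above, and that shaderByteCode, vertexShader, device, and screenWidth/screenHeight are placeholders for your own variables (error handling omitted):

// Grab the constant table from the compiled vertex shader byte code,
// then use it to set the Projection matrix
ID3DXConstantTable* constantTable = nullptr;
D3DXGetShaderConstantTable(shaderByteCode, &constantTable);   // byte code from D3DXCompileShader, etc.

D3DXMATRIX projection;
D3DXMatrixOrthoOffCenterLH(&projection, 0.0f, screenWidth, screenHeight, 0.0f, 0.0f, 1.0f);

device->SetVertexShader(vertexShader);
constantTable->SetMatrix(device, "Projection", &projection);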

One thing I'm not sure about is whether you can use FVF codes with vs_3_0 vertex shaders. FVF codes were a legacy feature for D3D9 that was carried over from D3D8. D3D9 also supports using vertex declarations to specify your vertex buffer layouts, which are a lot more flexible than FVF codes. You can try it with your FVF code first to see if it works, and if you get an error then you can try mapping your FVF code to a vertex declaration.
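If you do end up needing a vertex declaration, one that matches the VSInput layout from the shader above might look roughly like this (the offsets assume a tightly packed position/color/texcoord vertex, so adjust them to match your actual structure):

// Position (float3), diffuse color (DWORD), one set of 2D texture coordinates
D3DVERTEXELEMENT9 elements[] =
{
    { 0,  0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
    { 0, 12, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0 },
    { 0, 16, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
    D3DDECL_END()
};

IDirect3DVertexDeclaration9* vertexDecl = nullptr;
device->CreateVertexDeclaration(elements, &vertexDecl);
device->SetVertexDeclaration(vertexDecl);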

#5250533 Shader PS_3_0 Fails to Run (returns black frame)

Posted by MJP on 03 September 2015 - 11:55 PM

ps_3_0 shaders always need to be run with a vs_3_0 vertex shader. You can't mix pixel shaders with fixed-function vertex processing, which is what you're doing in your code. If you enable the debug runtimes in the D3D control panel, it will output an error message when you do this. In your case the shader can be very simple: you just need to transform the vertex position by your orthographic matrix, and pass along the other data to the pixel shader.

By the way, you can use D3DXFloat16To32Array from D3DX to convert from half to full precision.
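Something like this, for example (a sketch; halfData and the count are placeholders for wherever your 16-bit float data comes from):

// Expand an array of 16-bit half floats into regular 32-bit floats
const UINT numValues = 1024;   // placeholder count
float fullPrecision[numValues];
D3DXFloat16To32Array(fullPrecision, reinterpret_cast<const D3DXFLOAT16*>(halfData), numValues);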

#5250328 Performance questions with post-processing render-to-texture

Posted by MJP on 02 September 2015 - 03:52 PM

Those GPU times sound pretty high relative to your SSAO cost. How are you measuring the timings? What GPU are you running on? What resolution are you running at?

The budget allocated to post-processing will vary depending on the game, what kind of effects it's using, what hardware it's running on, etc. For the PS4 game that I worked on, we budgeted around 3.0-3.25ms for post FX, which was close to 10% of a 33.3ms frame. I think around 0.85ms of that was DOF, 0.65-0.75ms or so for lens flares, 0.1-0.2ms for bloom, 0.3-0.5ms for motion blur, and maybe 0.85 ms for the final step which combined tone mapping, exposure, bloom/flare composite, chromatic aberration, film grain, lens distortion, vignette, and color correction. But of course this was after lots and lots of optimization for our target hardware.

#5249233 Firing many rays at one pixel?

Posted by MJP on 27 August 2015 - 03:27 PM

For path tracers, it's pretty common to use random or pseudo-random sampling patterns for this purpose. In contrast to regular sample patterns (AKA sample patterns that are the same per-pixel, like in Hodgman's example), they will hide aliasing better but will replace it with noise. I would strongly suggest reading through Physically Based Rendering if you haven't already, since it has a great overview of some of the more popular sampling patterns (stratified, Hammersley, Latin Hypercube, etc.)
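As a concrete example, here's a minimal sketch of stratified (jittered) sampling within a pixel, which is one of the simpler patterns from that book. RandomFloat01 is a placeholder for whatever uniform random number generator you're using, and Float2 is an assumed framework type.

// Generate gridSize x gridSize stratified sample positions inside a pixel:
// divide the pixel into a grid and jitter one sample within each cell
void GenerateStratifiedSamples(uint32 gridSize, std::vector<Float2>& samples)
{
    samples.clear();
    for(uint32 y = 0; y < gridSize; ++y)
    {
        for(uint32 x = 0; x < gridSize; ++x)
        {
            Float2 sample;
            sample.x = (x + RandomFloat01()) / gridSize;   // in [0, 1)
            sample.y = (y + RandomFloat01()) / gridSize;
            samples.push_back(sample);
        }
    }
}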

#5248832 Simple Solar Radiance Calculation

Posted by MJP on 25 August 2015 - 01:05 PM

When we were working on this a few months ago, we had been using a sample implementation of the Preetham solar radiance function because we were having some trouble getting the Hosek solar radiance function to work correctly. I revisited this a little while ago and I was able to get the Hosek sample implementation to work correctly, so I would suggest using that instead. The one catch is that their sample code only has a spectral implementation, so you need to do the spectral-RGB conversion yourself. To do that, I used the Spectrum classes from pbrt-v3.

The way their sample code works is that you need to make a sky model state for each wavelength that you're sampling. The pbrt SampledSpectrum class uses 60 samples ranging from 400-700nm, so that's what I used. Then for each wavelength you can sample a point on the solar disc to get the corresponding radiance, which you do by passing the sky state to their solar radiance function. I just create and delete the sky states on the fly, but you can cache and reuse them if you want. You just need to regenerate the states if the sun elevation, turbidity, or ground albedo changes. Their function also returns non-uniform radiance across the solar disc, so you may want to take multiple samples around the disc to get the most accurate result. Otherwise you can just take one sample right in the center.

This is the code I'm using at the moment. I can't promise that it's bug-free, but it seems to be working.

const float SunSize = DegToRad(0.27f);  // Angular radius of the sun from Earth

float thetaS = std::acos(1.0f - sunDirection.y);
float elevation = (Pi / 2.0f) - thetaS;

Float3 sunRadiance;

SampledSpectrum groundAlbedoSpectrum = SampledSpectrum::FromRGB(GroundAlbedo);
SampledSpectrum solarRadiance;

const uint64 NumDiscSamples = 8;
for(uint64 x = 0; x < NumDiscSamples; ++x)
{
    for(uint64 y = 0; y < NumDiscSamples; ++y)
    {
        float u = (x + 0.5f) / NumDiscSamples;
        float v = (y + 0.5f) / NumDiscSamples;
        Float2 discSamplePos = SquareToConcentricDiskMapping(u, v);

        float theta = elevation + discSamplePos.y * SunSize;
        float gamma = discSamplePos.x * SunSize;

        for(int32 i = 0; i < nSpectralSamples; ++i)
        {
            ArHosekSkyModelState* skyState = arhosekskymodelstate_alloc_init(elevation, turbidity, groundAlbedoSpectrum[i]);
            float wavelength = Lerp(float(SampledLambdaStart), float(SampledLambdaEnd), i / float(nSpectralSamples));

            solarRadiance[i] = float(arhosekskymodel_solar_radiance(skyState, theta, gamma, wavelength));

            arhosekskymodelstate_free(skyState);
            skyState = nullptr;
        }

        Float3 sampleRadiance = solarRadiance.ToRGB();
        sunRadiance += sampleRadiance;
    }
}

// Account for coordinate system scaling, and sample averaging
sunRadiance *= 100.0f * (1.0f / NumDiscSamples) * (1.0f / NumDiscSamples);
This computes an average radiance across the entire solar disc. I'm doing it this way so that the code works with the rest of our framework, which currently works off the assumption that the solar disc has a uniform radiance. If you just want to compute the appropriate intensity to use for a directional light, then you can just directly compute irradiance instead. To do this you need to evaluate the integral of cos(theta) * radiance, which you can do with Monte Carlo. Basically, for each sample you compute, you would multiply by N dot L (where 'N' is the direction towards the center of the sun, and 'L' is your current sample direction), and accumulate the sum. Then you would need to multiply the sum by InversePDF / NumSamples. Otherwise, if you assume the radiance is uniform then you can compute the irradiance integral analytically:

static float IlluminanceIntegral(float theta)
{
    float cosTheta = std::cos(theta);
    return Pi * (1.0f - (cosTheta * cosTheta));
}
where 'theta' is the angular radius of the sun. So the final irradiance would be IlluminanceIntegral(SunSize) * sunRadiance.
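If you go the Monte Carlo route described above instead of assuming uniform radiance, the weighting might look something like this sketch. SampleDirectionInCone, SampleSunRadiance, and Dot are hypothetical helpers: the first returns a direction uniformly distributed within the solar cone, the second evaluates the solar radiance for that direction (e.g. via the Hosek model as above), and the third is a vector dot product.

// Monte Carlo estimate of sun irradiance for a directional light
Float3 sunIrradiance;
const uint64 NumSamples = 64;
const float inversePDF = 2.0f * Pi * (1.0f - std::cos(SunSize));  // uniform cone sampling

for(uint64 s = 0; s < NumSamples; ++s)
{
    Float3 sampleDir = SampleDirectionInCone(sunDirection, SunSize);
    Float3 sampleRadiance = SampleSunRadiance(sampleDir);

    // Weight by N dot L, where N points at the center of the sun
    // (this is very close to 1 for such a small cone, but it's the correct weight)
    float nDotL = Dot(sunDirection, sampleDir);
    sunIrradiance += sampleRadiance * nDotL;
}

sunIrradiance *= inversePDF / NumSamples;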

Oh, and that 'SquareToConcentricDiskMapping' function is just an implementation of Peter Shirley's method for mapping from a unit square to a unit circle:

inline Float2 SquareToConcentricDiskMapping(float x, float y)
{
    float phi = 0.0f;
    float r = 0.0f;

    // -- (a,b) is now on [-1,1]^2
    float a = 2.0f * x - 1.0f;
    float b = 2.0f * y - 1.0f;

    if(a > -b)                      // region 1 or 2
    {
        if(a > b)                   // region 1, also |a| > |b|
        {
            r = a;
            phi = (Pi / 4.0f) * (b / a);
        }
        else                        // region 2, also |b| > |a|
        {
            r = b;
            phi = (Pi / 4.0f) * (2.0f - (a / b));
        }
    }
    else                            // region 3 or 4
    {
        if(a < b)                   // region 3, also |a| >= |b|, a != 0
        {
            r = -a;
            phi = (Pi / 4.0f) * (4.0f + (b / a));
        }
        else                        // region 4, |b| >= |a|, but a==0 and b==0 could occur.
        {
            r = -b;
            if(b != 0)
                phi = (Pi / 4.0f) * (6.0f - (a / b));
            else
                phi = 0;
        }
    }

    Float2 result;
    result.x = r * std::cos(phi);
    result.y = r * std::sin(phi);
    return result;
}
Hope this helps!

#5248132 The Order 1886: Spherical Gaussian Lightmaps

Posted by MJP on 21 August 2015 - 03:43 PM

We had a custom GI baking system written on top of OptiX. Our tools were integrated into Maya (including our renderer), so the lighting artists would open the scene in Maya and initiate bakes. From there, we would package up the scene data and distribute it to multiple nodes on our bake farm, which were essentially Linux PCs running mostly GTX 780s.

We're still working on finishing up our course notes, but once they're available there will be a lot more details about representing the NDF with an SG and warping it to the correct space. We're also working on a code sample that bakes SG lightmaps and renders the scene.

Also, regarding the golden spiral: if you do a google search for "golden spiral on sphere", you can find some articles (like this one) that show you how to do it.
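For what it's worth, the usual approach those articles describe boils down to something like this sketch (using the same Float3 type assumed in the other snippets; the golden angle is Pi * (3 - sqrt(5))):

// Distribute numPoints roughly evenly over the unit sphere using the
// golden spiral / Fibonacci lattice: step z uniformly while rotating
// around the z axis by the golden angle each step
void GoldenSpiralOnSphere(uint32 numPoints, std::vector<Float3>& points)
{
    const float goldenAngle = Pi * (3.0f - std::sqrt(5.0f));

    points.clear();
    for(uint32 i = 0; i < numPoints; ++i)
    {
        float z = 1.0f - 2.0f * (i + 0.5f) / numPoints;    // z in (-1, 1)
        float radius = std::sqrt(1.0f - z * z);            // radius of the z slice
        float phi = goldenAngle * i;

        Float3 point;
        point.x = radius * std::cos(phi);
        point.y = radius * std::sin(phi);
        point.z = z;
        points.push_back(point);
    }
}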

#5247472 Eye rendering - parallax correction

Posted by MJP on 18 August 2015 - 02:54 PM

First, look up the equations for refraction. These will tell you how to compute the refracted light direction based on the surface normal and IOR. If you have a mesh for the cornea that matches the actual dimensions of the human eye, then calculating the refraction is really easy in the pixel shader: your incoming light direction will be the eye->pixel vector, and the normal will be the interpolated surface normal of the mesh. Once you've calculated the refracted view direction, you just need to intersect it with the iris. A simple way to do this is to treat the iris as a flat plane that's 2.18mm from the apex of the cornea. You can then do a simple ray/plane intersection test to find the point on the surface of the iris that you're shading. To get the right UV coordinates to use, you just need a simple way of mapping your iris UVs to your actual positions on the iris (I just used an artist-configurable scale value on the XY coordinates of the iris surface). I would recommend doing all of this in a coordinate space that's local to the eye, since it makes the calculations simpler. For instance, you could have it set up such that the apex of the cornea is at X=Y=Z=0, and the iris is a plane parallel to the XY plane located 2.18mm from the origin.
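A rough sketch of those steps in that eye-local space, with the cornea apex at the origin and +Z pointing into the eye. The Float3/Float2 types and the Dot helper are assumed framework pieces, and the cornea IOR and UV scale values are placeholders, not numbers from the original setup.

// Refract the eye->pixel view ray at the cornea surface, then intersect the
// refracted ray with the iris plane to find the iris UV. viewDir points from
// the eye toward the surface, surfaceNormal points outward from the cornea.
Float2 ComputeIrisUV(Float3 surfacePos, Float3 surfaceNormal, Float3 viewDir)
{
    const float corneaIOR = 1.376f;          // approximate cornea IOR (assumption)
    const float irisDepth = 2.18f;           // mm from the cornea apex to the iris plane
    const float irisUVScale = 1.0f / 6.0f;   // artist-configurable mm -> UV scale (placeholder)

    // Refract the incoming view direction at the cornea surface (air -> cornea).
    // No total internal reflection check is needed since eta < 1 here.
    float eta = 1.0f / corneaIOR;
    float cosI = -Dot(surfaceNormal, viewDir);
    float k = 1.0f - eta * eta * (1.0f - cosI * cosI);
    Float3 refractedDir = viewDir * eta + surfaceNormal * (eta * cosI - std::sqrt(k));

    // Intersect the refracted ray with the iris plane at z = irisDepth
    float t = (irisDepth - surfacePos.z) / refractedDir.z;
    Float3 irisPos = surfacePos + refractedDir * t;

    // Map the XY position on the iris plane to UV coordinates
    Float2 uv;
    uv.x = irisPos.x * irisUVScale + 0.5f;
    uv.y = irisPos.y * irisUVScale + 0.5f;
    return uv;
}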

#5246781 UpdateSubresource on StructuredBuffer

Posted by MJP on 15 August 2015 - 04:28 PM

The interface kinda lets you believe that a DrawCall is executed when called.

Indeed, it does make it appear like that is the case. That's actually one of the major changes for D3D12: with D3D12 you build up one or more command lists, and then you must explicitly submit them to the GPU. This makes it very clear that you're buffering up commands in advance, and also lets you make the choice as to how much latency you want between building command lists and having the GPU execute them. It also completely exposes the memory synchronization to the programmer. So instead of having something like D3D11_MAP_WRITE_DISCARD where the driver is responsible for doing things behind the scenes to avoid stalls, it's up to you to make sure that you don't accidentally write to memory that the GPU is currently using.
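To make that concrete, submission in D3D12 looks roughly like this (a minimal sketch with error handling omitted; commandList, commandQueue, fence, fenceEvent, fenceValue, and MaxFramesInFlight are placeholders for your own objects):

// Record commands, close the command list, and explicitly submit it
commandList->Close();
ID3D12CommandList* lists[] = { commandList };
commandQueue->ExecuteCommandLists(1, lists);

// Signal a fence for this frame, then wait until the GPU has finished the
// frame that's MaxFramesInFlight behind us; this is how you choose how far
// the CPU is allowed to get ahead of the GPU
commandQueue->Signal(fence, ++fenceValue);
if(fenceValue > MaxFramesInFlight && fence->GetCompletedValue() < fenceValue - MaxFramesInFlight)
{
    fence->SetEventOnCompletion(fenceValue - MaxFramesInFlight, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}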

#5246571 PBR specular BRDF

Posted by MJP on 14 August 2015 - 03:24 PM

So it sounds like you're asking why you would calculate the specular contribution from analytical light sources, when you could just include them in a pre-integrated environment map that's used for IBL specular. There are three main reasons for this:

1. Computing analytical specular is generally going to be higher quality than anything you get from a cubemap. The current techniques commonly used for IBL specular have some heavy approximations. For instance, you don't get the "stretched highlights" look that you're supposed to get from microfacet BRDFs, since cubemaps don't store enough information for full view-dependent calculations. You also can end up with a lot of locality issues due to the fact that your cubemaps are generated at sparse locations throughout your scene. This leads to a lack of proper occlusion, and poor parallax. If you can represent your lighting source analytically, you can use the full BRDF and get correct behavior.

2. If you handle the light separately, then the light can move or change intensity.

3. If you handle the light separately, then you can generate shadow maps to give you dynamic occlusion.

#5246545 UpdateSubresource on StructuredBuffer

Posted by MJP on 14 August 2015 - 01:29 PM

Mapping a DYNAMIC resource with D3D11_MAP_WRITE_DISCARD is meant to prevent any kind of GPU synchronization and stalls. Typically the GPU won't be executing commands until quite some time after the CPU issues D3D commands. The D3D user-mode drivers will typically buffer things so that they can be executed on a separate thread, and the driver will send off packets of work to the GPU at some later point. You can end up having the GPU be up to 3 frames behind the CPU, although in practice it's usually closer to 1 frame. Because of that lag, you have a potential issue with updating GPU resources from the CPU. If the CPU just modifies a resource with no synchronization (which is effectively what happens when you use D3D11_MAP_WRITE_NO_OVERWRITE), it might be changing the resource while the GPU is still using it, or hasn't used it yet. This is obviously bad, since you want the GPU to work with the data that you originally specified for the frame that it's working on. To get around this, DISCARD allows the driver to silently hand you a new resource behind the scenes, which is known as "buffer renaming".

By giving you a new piece of memory to work with, you can write to that one while the GPU is still using the old piece of memory from a previous frame. Doing this can add a fair bit of overhead, since the driver might implement this by having some sort of pool where it frees up old allocations by waiting on labels to ensure that the GPU has finished using them. It may also decide to block you if insufficient memory is available, so that it can wait for the GPU in order to free up more memory. And then of course once the driver has given you the memory to write to, it will probably take a while to actually fill such a large buffer. Even at peak CPU bandwidth, it will surely take at least a few milliseconds to touch 120 MB of memory. It can also be slower in some cases, since the memory you get back from Map will typically be in uncached, write-combined memory so that it can be visible to the GPU.

The first thing I would probably do here is try to profile how much of your overhead is coming from Map(), and how much of it is coming from just filling the buffer with data. If Map() is taking a long time, you may want to consider an alternative approach. DYNAMIC is usually used for small, frequently-updated resources like constant buffers. The driver's internal mechanisms may not be scaling particularly well for this case. Another approach you can try is to have your own pool of buffers that have STAGING usage. You can cycle through these (2-3 should be enough), and then once you've filled them you can use CopyResource to copy their contents to a GPU-accessible buffer with DEFAULT usage.
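A sketch of that staging-pool approach (buffer sizes, the pool count, and the srcData/device/context names are placeholders; error handling omitted):

// Create a small pool of STAGING buffers to cycle through, plus the
// DEFAULT-usage structured buffer that the shaders actually read from
const UINT NumStagingBuffers = 3;
ID3D11Buffer* stagingBuffers[NumStagingBuffers] = { };
UINT currStaging = 0;

D3D11_BUFFER_DESC stagingDesc = { };
stagingDesc.ByteWidth = bufferSizeInBytes;
stagingDesc.Usage = D3D11_USAGE_STAGING;
stagingDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
stagingDesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
stagingDesc.StructureByteStride = structSizeInBytes;
for(UINT i = 0; i < NumStagingBuffers; ++i)
    device->CreateBuffer(&stagingDesc, nullptr, &stagingBuffers[i]);

D3D11_BUFFER_DESC defaultDesc = { };
defaultDesc.ByteWidth = bufferSizeInBytes;
defaultDesc.Usage = D3D11_USAGE_DEFAULT;
defaultDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
defaultDesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
defaultDesc.StructureByteStride = structSizeInBytes;
ID3D11Buffer* gpuBuffer = nullptr;
device->CreateBuffer(&defaultDesc, nullptr, &gpuBuffer);

// Each update: fill the next staging buffer in the pool, then copy it
// over to the GPU-accessible buffer
ID3D11Buffer* staging = stagingBuffers[currStaging];
currStaging = (currStaging + 1) % NumStagingBuffers;

D3D11_MAPPED_SUBRESOURCE mapped = { };
context->Map(staging, 0, D3D11_MAP_WRITE, 0, &mapped);
memcpy(mapped.pData, srcData, bufferSizeInBytes);
context->Unmap(staging, 0);

context->CopyResource(gpuBuffer, staging);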