
#5250533 Shader PS_3_0 Fails to Run (returns black frame)

Posted by MJP on 03 September 2015 - 11:55 PM

ps_3_0 shaders always need to be run with a vs_3_0 vertex shader. You can't mix pixel shaders with fixed-function vertex processing, which is what you're doing in your code. If you enable the debug runtimes in the D3D control panel, it will output an error message when you do this. In your case the shader can be very simple: you just need to transform the vertex position by your orthographic matrix, and pass along the other data to the pixel shader.
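As a sketch, a minimal vs_3_0 shader for that setup might look something like this (the input layout, semantics, and constant name here are assumptions, not taken from your code):

```hlsl
// Minimal vs_3_0 vertex shader: transform by an orthographic matrix
// and pass the remaining vertex data through to the pixel shader.
float4x4 OrthoMatrix;   // set from the app, e.g. via SetVertexShaderConstantF

struct VSInput
{
    float3 Position : POSITION;
    float2 TexCoord : TEXCOORD0;
    float4 Color    : COLOR0;
};

struct VSOutput
{
    float4 Position : POSITION;
    float2 TexCoord : TEXCOORD0;
    float4 Color    : COLOR0;
};

VSOutput VSMain(VSInput input)
{
    VSOutput output;
    output.Position = mul(float4(input.Position, 1.0f), OrthoMatrix);
    output.TexCoord = input.TexCoord;
    output.Color = input.Color;
    return output;
}
```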

By the way, you can use D3DXFloat16To32Array from D3DX to convert from half to full precision.
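If you'd rather not depend on D3DX, the conversion can also be done by hand. Here's a rough, self-contained sketch of decoding an IEEE 754 half to a float (the function name is mine; it mirrors what D3DXFloat16To32Array does per element):

```cpp
#include <cstdint>
#include <cstring>

// Decode a 16-bit half-precision value into a 32-bit float.
// Handles zeros, subnormals, normals, and Inf/NaN.
inline float HalfToFloat(uint16_t h)
{
    uint32_t sign = uint32_t(h & 0x8000) << 16;
    uint32_t exponent = (h >> 10) & 0x1F;
    uint32_t mantissa = h & 0x3FF;

    uint32_t bits;
    if(exponent == 0)
    {
        if(mantissa == 0)
            bits = sign;                    // +/- zero
        else
        {
            // Subnormal half: renormalize it into a normal float
            exponent = 127 - 15 + 1;
            while((mantissa & 0x400) == 0)
            {
                mantissa <<= 1;
                --exponent;
            }
            mantissa &= 0x3FF;
            bits = sign | (exponent << 23) | (mantissa << 13);
        }
    }
    else if(exponent == 31)
        bits = sign | 0x7F800000 | (mantissa << 13);            // Inf/NaN
    else
        bits = sign | ((exponent - 15 + 127) << 23) | (mantissa << 13);

    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```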

#5250328 Performance questions with post-processing render-to-texture

Posted by MJP on 02 September 2015 - 03:52 PM

Those GPU times sound pretty high relative to your SSAO cost. How are you measuring the timings? What GPU are you running on? What resolution are you running at?

The budget allocated to post-processing will vary depending on the game, what kind of effects it's using, what hardware it's running on, etc. For the PS4 game that I worked on, we budgeted around 3.0-3.25ms for post FX, which was close to 10% of a 33.3ms frame. I think around 0.85ms of that was DOF, 0.65-0.75ms or so for lens flares, 0.1-0.2ms for bloom, 0.3-0.5ms for motion blur, and maybe 0.85 ms for the final step which combined tone mapping, exposure, bloom/flare composite, chromatic aberration, film grain, lens distortion, vignette, and color correction. But of course this was after lots and lots of optimization for our target hardware.

#5249233 Firing many rays at one pixel?

Posted by MJP on 27 August 2015 - 03:27 PM

For path tracers, it's pretty common to use random or pseudo-random sampling patterns for this purpose. In contrast to regular sample patterns (i.e. sample patterns that are the same for every pixel, like in Hodgman's example), they hide aliasing better, but replace it with noise. I would strongly suggest reading through Physically Based Rendering if you haven't already, since it has a great overview of some of the more popular sampling patterns (stratified, Hammersley, Latin hypercube, etc.)
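As a quick illustration (my own sketch, not code from the book), here's one way to generate two such patterns: the Hammersley point set and jittered stratified samples:

```cpp
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Van der Corput radical inverse in base 2 (bit reversal).
inline float RadicalInverseBase2(uint32_t bits)
{
    bits = (bits << 16) | (bits >> 16);
    bits = ((bits & 0x55555555u) << 1) | ((bits & 0xAAAAAAAAu) >> 1);
    bits = ((bits & 0x33333333u) << 2) | ((bits & 0xCCCCCCCCu) >> 2);
    bits = ((bits & 0x0F0F0F0Fu) << 4) | ((bits & 0xF0F0F0F0u) >> 4);
    bits = ((bits & 0x00FF00FFu) << 8) | ((bits & 0xFF00FF00u) >> 8);
    return float(bits) * 2.3283064365386963e-10f;   // divide by 2^32
}

// Hammersley point set: sample i is (i / N, radicalInverse(i)).
std::vector<std::pair<float, float>> HammersleySamples(uint32_t numSamples)
{
    std::vector<std::pair<float, float>> samples;
    for(uint32_t i = 0; i < numSamples; ++i)
        samples.emplace_back(float(i) / float(numSamples), RadicalInverseBase2(i));
    return samples;
}

// Stratified sampling: one jittered sample per cell of an NxN grid.
std::vector<std::pair<float, float>> StratifiedSamples(uint32_t gridSize, uint32_t seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<std::pair<float, float>> samples;
    for(uint32_t y = 0; y < gridSize; ++y)
        for(uint32_t x = 0; x < gridSize; ++x)
            samples.emplace_back((x + dist(rng)) / gridSize, (y + dist(rng)) / gridSize);
    return samples;
}
```

Both stay in the unit square; the Hammersley set is deterministic and very evenly distributed, while the stratified set trades some of that evenness for per-pixel randomness.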

#5248832 Simple Solar Radiance Calculation

Posted by MJP on 25 August 2015 - 01:05 PM

When we were working on this a few months ago, we had been using a sample implementation of the Preetham solar radiance function because we were having some trouble getting the Hosek solar radiance function to work correctly. I revisited this a little while ago and I was able to get the Hosek sample implementation to work correctly, so I would suggest using that instead. The one catch is that their sample code only has a spectral implementation, so you need to do the spectral-RGB conversion yourself. To do that, I used the Spectrum classes from pbrt-v3.

The way their sample code works is that you need to make a sky model state for each wavelength that you're sampling. The pbrt SampledSpectrum class uses 60 samples ranging from 400-700nm, so that's what I used. Then for each wavelength you can sample a point on the solar disc to get the corresponding radiance, which you do by passing the sky state to their solar radiance function. I just create and delete the sky states on the fly, but you can cache and reuse them if you want. You just need to regenerate the states if the sun elevation, turbidity, or ground albedo changes. Their function also returns non-uniform radiance across the solar disc, so you may want to take multiple samples around the disc to get the most accurate result. Otherwise you can just take one sample right in the center.

This is the code I'm using at the moment. I can't promise that it's bug-free, but it seems to be working.

const float SunSize = DegToRad(0.27f);  // Angular radius of the sun from Earth

float thetaS = std::acos(1.0f - sunDirection.y);
float elevation = (Pi / 2.0f) - thetaS;

Float3 sunRadiance;

SampledSpectrum groundAlbedoSpectrum = SampledSpectrum::FromRGB(GroundAlbedo);
SampledSpectrum solarRadiance;

const uint64 NumDiscSamples = 8;
for(uint64 x = 0; x < NumDiscSamples; ++x)
{
    for(uint64 y = 0; y < NumDiscSamples; ++y)
    {
        float u = (x + 0.5f) / NumDiscSamples;
        float v = (y + 0.5f) / NumDiscSamples;
        Float2 discSamplePos = SquareToConcentricDiskMapping(u, v);

        float theta = elevation + discSamplePos.y * SunSize;
        float gamma = discSamplePos.x * SunSize;

        for(int32 i = 0; i < nSpectralSamples; ++i)
        {
            ArHosekSkyModelState* skyState = arhosekskymodelstate_alloc_init(elevation, turbidity, groundAlbedoSpectrum[i]);
            float wavelength = Lerp(float(SampledLambdaStart), float(SampledLambdaEnd), i / float(nSpectralSamples));

            solarRadiance[i] = float(arhosekskymodel_solar_radiance(skyState, theta, gamma, wavelength));

            // Delete the sky state on the fly, as described above
            arhosekskymodelstate_free(skyState);
            skyState = nullptr;
        }

        Float3 sampleRadiance = solarRadiance.ToRGB();
        sunRadiance += sampleRadiance;
    }
}

// Account for coordinate system scaling, and sample averaging
sunRadiance *= 100.0f * (1.0f / NumDiscSamples) * (1.0f / NumDiscSamples);
This computes an average radiance across the entire solar disc. I'm doing it this way so that the code works with the rest of our framework, which currently works off the assumption that the solar disc has a uniform radiance. If you just want to compute the appropriate intensity to use for a directional light, then you can directly compute irradiance instead. To do this you need to evaluate the integral of cos(theta) * radiance, which you can do with Monte Carlo: for each sample you compute, you would multiply by N dot L (where 'N' is the direction towards the center of the sun, and 'L' is your current sample direction), and accumulate the sum. Then you would need to multiply the sum by InversePDF / NumSamples. Otherwise, if you assume the radiance is uniform, then you can compute the irradiance integral analytically:

static float IlluminanceIntegral(float theta)
{
    float cosTheta = std::cos(theta);
    return Pi * (1.0f - (cosTheta * cosTheta));
}
where 'theta' is the angular radius of the sun. So the final irradiance would be IlluminanceIntegral(SunSize) * sunRadiance.
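To make the Monte Carlo version concrete, here's a hedged sketch that assumes a uniform radiance of 1 across a disc of angular radius 'theta', draws sample directions uniformly over the spherical cap around the sun direction, and can be checked against the analytical integral (the helper names are mine):

```cpp
#include <cmath>
#include <cstdint>
#include <random>

static const float Pi = 3.14159265358979323846f;

// Analytical irradiance integral for a uniform-radiance disc of
// angular radius 'theta' (same formula as in the post above).
static float IlluminanceIntegral(float theta)
{
    float cosTheta = std::cos(theta);
    return Pi * (1.0f - (cosTheta * cosTheta));
}

// Monte Carlo estimate: sample directions uniformly over the spherical
// cap, weight each sample by N dot L (the cosine of the angle between
// the sample and the cap center), then multiply by InversePDF / NumSamples.
// The PDF of uniform cap sampling is 1 / solidAngle.
static float IrradianceMonteCarlo(float theta, uint32_t numSamples, uint32_t seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    float solidAngle = 2.0f * Pi * (1.0f - std::cos(theta));
    float sum = 0.0f;
    for(uint32_t i = 0; i < numSamples; ++i)
    {
        // Uniformly sample the cosine of the angle from the center direction
        float cosAngle = 1.0f - dist(rng) * (1.0f - std::cos(theta));
        sum += cosAngle;    // radiance (1.0) * N dot L
    }
    return sum * solidAngle / float(numSamples);    // sum * InversePDF / NumSamples
}
```

With enough samples the two should converge to the same value, which is a handy sanity check before swapping the uniform radiance for the per-sample Hosek values.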

Oh, and that 'SquareToConcentricDiskMapping' function is just an implementation of Peter Shirley's method for mapping from a unit square to a unit circle:

inline Float2 SquareToConcentricDiskMapping(float x, float y)
{
    float phi = 0.0f;
    float r = 0.0f;

    // -- (a,b) is now on [-1,1]^2
    float a = 2.0f * x - 1.0f;
    float b = 2.0f * y - 1.0f;

    if(a > -b)                      // region 1 or 2
    {
        if(a > b)                   // region 1, also |a| > |b|
        {
            r = a;
            phi = (Pi / 4.0f) * (b / a);
        }
        else                        // region 2, also |b| > |a|
        {
            r = b;
            phi = (Pi / 4.0f) * (2.0f - (a / b));
        }
    }
    else                            // region 3 or 4
    {
        if(a < b)                   // region 3, also |a| >= |b|, a != 0
        {
            r = -a;
            phi = (Pi / 4.0f) * (4.0f + (b / a));
        }
        else                        // region 4, |b| >= |a|, but a==0 and b==0 could occur
        {
            r = -b;
            if(b != 0)
                phi = (Pi / 4.0f) * (6.0f - (a / b));
            else
                phi = 0.0f;
        }
    }

    Float2 result;
    result.x = r * std::cos(phi);
    result.y = r * std::sin(phi);
    return result;
}
Hope this helps!

#5248132 The Order 1886: Spherical Gaussian Lightmaps

Posted by MJP on 21 August 2015 - 03:43 PM

We had a custom GI baking system written on top of OptiX. Our tools were integrated into Maya (including our renderer), so the lighting artists would open the scene in Maya and initiate bakes. From there, we would package up the scene data and distribute it to multiple nodes on our bake farm, which were essentially Linux PCs mostly running GTX 780s.

We're still working on finishing up our course notes, but once they're available there will be a lot more details about representing an NDF with an SG and warping it to the correct space. We're also working on a code sample that bakes SG lightmaps and renders the scene.

Also, regarding the golden spiral: if you do a Google search for "golden spiral on sphere", you can find some articles (like this one) that show you how to do it.

#5247472 Eye rendering - parallax correction

Posted by MJP on 18 August 2015 - 02:54 PM

First, look up the equations for refraction. These will tell you how to compute the refracted light direction based on the surface normal and IOR. If you have a mesh for the cornea that matches the actual dimensions of the human eye, then calculating the refraction is really easy in the pixel shader: your incoming light direction will be the eye->pixel vector, and the normal will be the interpolated surface normal of the mesh. Once you've calculated the refracted view direction, you just need to intersect it with the iris. A simple way to do this is to treat the iris as a flat plane that's 2.18mm from the apex of the cornea. You can then do a simple ray/plane intersection test to find the point on the surface of the iris that you're shading.

To get the right UV coordinates to use, you just need a simple way of mapping your iris UVs to actual positions on the iris (I just used an artist-configurable scale value on the XY coordinates of the iris surface). I would recommend doing all of this in a coordinate space that's local to the eye, since it makes the calculations simpler. For instance, you could set it up such that the apex of the cornea is at X=Y=Z=0, and the iris is a plane parallel to the XY plane, located 2.18mm from the origin.
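Here's a rough sketch of those two steps in eye-local space (the vector helpers and names are mine, and treat the constants as assumptions rather than anatomy references):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 Scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Snell's-law refraction of incident direction I (pointing toward the
// surface) about unit normal N, with eta = IOR_outside / IOR_inside.
// This matches the HLSL/GLSL refract() intrinsic; it returns {0,0,0}
// on total internal reflection, which can't happen entering the cornea
// from air.
static Vec3 Refract(Vec3 I, Vec3 N, float eta)
{
    float cosI = -Dot(N, I);
    float k = 1.0f - eta * eta * (1.0f - cosI * cosI);
    if(k < 0.0f)
        return {0.0f, 0.0f, 0.0f};
    return Add(Scale(I, eta), Scale(N, eta * cosI - std::sqrt(k)));
}

// Intersect a ray (origin O, direction D) with the iris plane z = planeZ,
// in an eye-local space with the cornea apex at the origin.
static Vec3 IntersectIrisPlane(Vec3 O, Vec3 D, float planeZ)
{
    float t = (planeZ - O.z) / D.z;
    return Add(O, Scale(D, t));
}
```

For light entering the cornea from air, eta would be roughly 1.0 / 1.376, using a commonly quoted cornea IOR.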

#5246781 UpdateSubresource on StructuredBuffer

Posted by MJP on 15 August 2015 - 04:28 PM

The interface kinda lets you believe that a DrawCall is executed when called.

Indeed, it does make it appear like that is the case. That's actually one of the major changes for D3D12: with D3D12 you build up one or more command lists, and then you must explicitly submit them to the GPU. This makes it very clear that you're buffering up commands in advance, and also lets you make the choice as to how much latency you want between building command lists and having the GPU execute them. It also completely exposes the memory synchronization to the programmer. So instead of having something like D3D11_MAP_WRITE_DISCARD where the driver is responsible for doing things behind the scenes to avoid stalls, it's up to you to make sure that you don't accidentally write to memory that the GPU is currently using.

#5246571 PBR specular BRDF

Posted by MJP on 14 August 2015 - 03:24 PM

So it sounds like you're asking why you would calculate the specular contribution from analytical light sources, when you could just include them in a pre-integrated environment map that's used for IBL specular. There are three main reasons for this:

1. Computing analytical specular is generally going to be higher quality than anything you get from a cubemap. The current techniques commonly used for IBL specular have some heavy approximations. For instance, you don't get the "stretched highlights" look that you're supposed to get from microfacet BRDFs, since cubemaps don't store enough information for full view-dependent calculations. You can also end up with a lot of locality issues due to the fact that your cubemaps are generated at sparse locations throughout your scene, which leads to a lack of proper occlusion and poor parallax. If you can represent your light source analytically, you can use the full BRDF and get correct behavior.

2. If you handle the light separately, then the light can move or change intensity.

3. If you handle the light separately, then you can generate shadow maps to give you dynamic occlusion.

#5246545 UpdateSubresource on StructuredBuffer

Posted by MJP on 14 August 2015 - 01:29 PM

Mapping a DYNAMIC resource with D3D11_MAP_WRITE_DISCARD is meant to prevent any kind of GPU synchronization and stalls. Typically the GPU won't be executing commands until quite some time after the CPU issues D3D commands. The D3D user-mode drivers will typically buffer things so that they can be executed on a separate thread, and the driver will send off packets of work to the GPU at some later point. The GPU can end up as much as 3 frames behind the CPU, although in practice it's usually closer to 1 frame. Because of that lag, you have a potential issue with updating GPU resources from the CPU: if the CPU just modified a resource with no synchronization (which is effectively what happens when you use D3D11_MAP_WRITE_NO_OVERWRITE), the CPU might be changing it while the GPU is still using it, or before the GPU has used it at all. This is obviously bad, since you want the GPU to work with the data that you originally specified for the frame that it's working on.

To get around this, DISCARD allows the driver to silently hand you a new resource behind the scenes, which is known as "buffer renaming". By giving you a new piece of memory to work with, you can write to that one while the GPU is still using the old piece of memory from a previous frame. Doing this can add a fair bit of overhead, since the driver might implement this by having some sort of pool where it frees up old allocations by waiting on labels to ensure that the GPU has finished using them. It may also decide to block you if insufficient memory is available, so that it can wait for the GPU in order to free up more memory. And then of course once the driver has given you the memory to write to, it will probably take a while to actually fill such a large buffer: even at peak CPU bandwidth, it will surely take at least a few milliseconds to touch 120 MB of memory. It can also be slower in some cases, since the memory you get back from Map will typically be uncached, write-combined memory so that it can be visible to the GPU.

The first thing I would probably do here is try to profile how much of your overhead is coming from Map(), and how much of it is coming from just filling the buffer with data. If Map() is taking a long time, you may want to consider an alternative approach. DYNAMIC is usually used for small, frequently-updated resources like constant buffers. The driver's internal mechanisms may not be scaling particularly well for this case. Another approach you can try is to have your own pool of buffers that have STAGING usage. You can cycle through these (2-3 should be enough), and then once you've filled them you can use CopyResource to copy their contents to a GPU-accessible buffer with DEFAULT usage.

#5246314 UpdateSubresource on StructuredBuffer

Posted by MJP on 13 August 2015 - 04:13 PM

UpdateSubresource only works for resources created with DEFAULT usage. For DYNAMIC you should Map and Unmap.

#5243906 Normal offsets in view space?

Posted by MJP on 31 July 2015 - 04:14 PM

There should be no difference between doing this in world space or view space, provided that everything you're working with is in the same coordinate space. I would suspect that you have something in world space that hasn't been converted to view space.

#5243437 Is it worth to use index buffers?

Posted by MJP on 29 July 2015 - 03:14 PM

The way most GPUs work is that they have a "post-transform vertex cache", which stores the results of the vertex shader for some number of vertices. The idea is that if the same index comes up multiple times, the GPU won't have to invoke the vertex shader every time. However, since the cache size is limited, you want to sort your triangles so that the same indices are near each other in the index buffer.

Another thing to consider is that there is sometimes another cache (or caches) that's used for reading the actual vertex data needed by the vertex shader. For instance on AMD's recent hardware, all vertex fetching is done as standard vector memory loads that go through both the L2 and L1 caches. In light of that, you may also want to sort the order of the elements in your vertex buffer so that you get better locality, which reduces cache misses.

You should be able to find some links if you search for "vertex cache optimization" on Google. You'll probably want to use an existing implementation, like this one.
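To illustrate why triangle order matters, here's a small sketch that models the post-transform cache as a simple FIFO and counts vertex shader invocations (real hardware uses different sizes and replacement policies, so treat this purely as an approximation):

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <vector>

// Count vertex shader invocations for an index list, modeling the
// post-transform cache as a FIFO of 'cacheSize' entries.
static uint32_t CountVertexShades(const std::vector<uint32_t>& indices, size_t cacheSize)
{
    std::deque<uint32_t> cache;
    uint32_t shades = 0;
    for(uint32_t index : indices)
    {
        if(std::find(cache.begin(), cache.end(), index) == cache.end())
        {
            ++shades;                   // cache miss: the vertex shader runs
            cache.push_back(index);
            if(cache.size() > cacheSize)
                cache.pop_front();      // evict the oldest entry
        }
    }
    return shades;
}
```

For example, with a 4-entry cache the list {0,1,2, 2,1,3} shades only 4 vertices because it reuses indices immediately, while {0,1,2, 3,4,5, 0,1,2} shades 9 even though it only touches 6 unique vertices, since the first triangle's entries get evicted before they're reused.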

#5243258 why do we need to divide w in texture projection

Posted by MJP on 28 July 2015 - 03:15 PM

Guys, frankly, I thank you all for replying, but I still don't get it.
Why is this divide called the perspective divide? And it's (x or y) / depth => tan(fov / 2), right?
I'm going to go through some basic knowledge again; I must be missing something, seriously!

The "perspective divide" comes from the use of homogeneous coordinates. If you're not familiar with that term, I would suggest reading some background material. This link that I found explains some of the basics, and how it applies to graphics.

#5243116 why do we need to divide w in texture projection

Posted by MJP on 27 July 2015 - 11:30 PM

tex2Dproj is intended to be used for texture lookups that are the result of applying a perspective projection to a coordinate. The most common example (and almost certainly the only intended use case) is for shadow maps: typically you sample a shadow map by transforming your pixel position by a light-space projection matrix, and then use the resulting projected position to compute the UV address for the shadow map texture. tex2Dproj just saves you from having to do the last step in that process, which is the perspective divide. It was also used as the means of accessing so-called "hardware PCF" functionality on older GPU's, hence the reason for passing the Z coordinate of the projected position (the Z coordinate was used for comparing with the shadow map depth).
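As a sketch of what that last step looks like when done manually (the [-1,1] to [0,1] remap shown here is often baked into the shadow matrix instead, and the struct and function names are mine):

```cpp
struct Float4 { float x, y, z, w; };
struct Float2 { float x, y; };

// Manual equivalent of what tex2Dproj does before the texture fetch:
// divide the projected position by w, then remap clip space [-1, 1]
// to texture space [0, 1] (with the usual D3D flip on Y).
static Float2 ProjectedPosToShadowUV(Float4 shadowPos)
{
    float invW = 1.0f / shadowPos.w;        // the "perspective divide"
    Float2 uv;
    uv.x = (shadowPos.x * invW) * 0.5f + 0.5f;
    uv.y = (shadowPos.y * invW) * -0.5f + 0.5f;
    return uv;
}
```

The divided Z (shadowPos.z * invW) is what would then be compared against the shadow map depth.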

#5241843 Shader Returns Black Screen When Compiling With ps_3_0

Posted by MJP on 21 July 2015 - 08:18 PM

Are you using it with a vs_3_0 vertex shader?