
MJP

Member Since 29 Mar 2007

#5079747 API Wars horror - Will it matter?

Posted by MJP on 22 July 2013 - 10:22 PM

 

My bad! That's what I remembered reading in the past, but like I said, I never programmed for the PS3.


At the start of the PS3's lifetime, the fact that an OGL|ES implementation existed was jumped on by the 'OpenGL everywhere!' gang, and it has since been reported as fact that the PS3 uses OpenGL... alas, to this day the misinformation persists, and thus this common mistake crops up :(

 

Yeah, I still see that all the time (especially on general gaming forums) and it drives me nuts. Hopefully the same thing doesn't happen with the PS4.




#5079622 [Instancing] Flickering when updating one instance.

Posted by MJP on 22 July 2013 - 11:52 AM

 

If your common use case is to only change a small part of the buffer at a time, then you can try using a two-buffer approach. Create your primary instancing buffer as D3D11_USAGE_IMMUTABLE, and then create a secondary buffer with D3D11_USAGE_STAGING. Then when you want to change the buffer contents, map your staging buffer and then use CopySubresourceRegion to copy some or all of the staging buffer to your primary buffer.


Wait, copying to an immutable resource? I've just tried that with D3D 11.0 and I get this:

ID3D11DeviceContext::CopySubresourceRegion: Cannot invoke CopySubresourceRegion when the destination Resource was created with the D3D11_USAGE_IMMUTABLE Usage.

What am I missing? Is this some D3D 11.1 or higher feature?

 

You're not missing anything, I messed up. That should have been D3D11_USAGE_DEFAULT instead of D3D11_USAGE_IMMUTABLE.




#5079195 [Instancing] Flickering when updating one instance.

Posted by MJP on 20 July 2013 - 01:06 PM

When you map with D3D11_MAP_WRITE_DISCARD you lose all previous contents of that resource. So if you use it, you need to fill up the entire buffer again.

If your common use case is to only change a small part of the buffer at a time, then you can try using a two-buffer approach. Create your primary instancing buffer as D3D11_USAGE_DEFAULT, and then create a secondary buffer with D3D11_USAGE_STAGING. Then when you want to change the buffer contents, map your staging buffer and then use CopySubresourceRegion to copy some or all of the staging buffer to your primary buffer.
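
For example, here's a rough (untested) sketch of that setup. InstanceData, NumInstances, newData, i, device, and context are all placeholder names, and it assumes d3d11.h and cstring are included:

// Two-buffer setup: DEFAULT buffer for rendering, STAGING buffer for CPU writes
const UINT stride = sizeof(InstanceData);

D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = stride * NumInstances;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
ID3D11Buffer* instanceBuffer = NULL;
device->CreateBuffer(&desc, NULL, &instanceBuffer);

desc.Usage = D3D11_USAGE_STAGING;
desc.BindFlags = 0;                               // staging resources can't be bound
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
ID3D11Buffer* stagingBuffer = NULL;
device->CreateBuffer(&desc, NULL, &stagingBuffer);

// To update instance 'i': write the new data into the staging buffer,
// then copy just that byte range into the DEFAULT buffer.
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(stagingBuffer, 0, D3D11_MAP_WRITE, 0, &mapped);
memcpy((char*)mapped.pData + i * stride, &newData, stride);
context->Unmap(stagingBuffer, 0);

D3D11_BOX box = { i * stride, 0, 0, (i + 1) * stride, 1, 1 };  // left/top/front/right/bottom/back
context->CopySubresourceRegion(instanceBuffer, 0, i * stride, 0, 0, stagingBuffer, 0, &box);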




#5079090 f16tof32 f32tof16 doesn't work correctly

Posted by MJP on 19 July 2013 - 11:12 PM

Are you using DXGI_FORMAT_R16G16B16A16_FLOAT for the position element in your vertex buffer? You said the format is R16G16B16A16, but there are several varieties of that format.

For debugging compute shaders the only tools you can use are Nsight (Nvidia) and GPU PerfStudio (AMD). However, you don't necessarily need any tools to debug this: you can copy your buffer to a staging buffer that you can Map on the CPU, and then print out the values or inspect them in a debugger.
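
Something along these lines (an untested sketch; gpuBuffer, device, and context are placeholder names, and it assumes the buffer holds floats):

// Copy the buffer your compute shader wrote to into a CPU-readable
// staging buffer, then map it and inspect the contents.
D3D11_BUFFER_DESC desc;
gpuBuffer->GetDesc(&desc);
desc.Usage = D3D11_USAGE_STAGING;
desc.BindFlags = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.MiscFlags = 0;
ID3D11Buffer* staging = NULL;
device->CreateBuffer(&desc, NULL, &staging);

context->CopyResource(staging, gpuBuffer);

D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped);  // stalls until the GPU is done
const float* values = (const float*)mapped.pData;
for(UINT j = 0; j < desc.ByteWidth / sizeof(float); ++j)
    printf("%u: %f\n", j, values[j]);
context->Unmap(staging, 0);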




#5078810 Shader Unlimited Lights

Posted by MJP on 18 July 2013 - 06:48 PM

One can break the constant limit - and force the use of a proper loop - in SM 3.0 by encoding stuff in textures

I think there's an echo in here :P




#5078634 Shader Unlimited Lights

Posted by MJP on 17 July 2013 - 11:27 PM

 

D3D9 pixel shaders have a major limitation, which is that they can't dynamically index into shader constants. This means that the compiler can't use an actual loop construct in assembly to implement your for loop; instead, it has to unroll it and do something like this:

 

 

It's possible to use a loop with D3D9, take a look at this sample:

http://www.dhpoware.com/demos/d3d9NormalMappingWithManyLights.html

 

Have you looked at the generated assembly? It looks like this:

ps_3_0
def c46, -4, -5, -6, -7
def c47, 0, 1, 2, 3
dcl_texcoord v0.xyz
dcl_texcoord1 v1.xy
dcl_texcoord2 v2.xyz
dcl_texcoord3 v3.xyz
dcl_2d s0
nrm r0.xyz, v3
dp3 r0.w, v2, v2
rsq r0.w, r0.w
mov r1, c47.x
mov r2.x, c47.x
rep i0
  add r3, r2.x, -c47
  add r4, r2.x, c46
  mov r5.x, c47.x
  cmp r2.yzw, -r3_abs.x, c0.xxyz, r5.x
  cmp r2.yzw, -r3_abs.y, c5.xxyz, r2
  cmp r2.yzw, -r3_abs.z, c10.xxyz, r2
  cmp r2.yzw, -r3_abs.w, c15.xxyz, r2
  cmp r2.yzw, -r4_abs.x, c20.xxyz, r2
  cmp r2.yzw, -r4_abs.y, c25.xxyz, r2
  cmp r2.yzw, -r4_abs.z, c30.xxyz, r2
  cmp r2.yzw, -r4_abs.w, c35.xxyz, r2
  add r2.yzw, r2, -v0.xxyz
  cmp r5.y, -r3_abs.x, c4.x, r5.x
  cmp r5.y, -r3_abs.y, c9.x, r5.y
  cmp r5.y, -r3_abs.z, c14.x, r5.y
  cmp r5.y, -r3_abs.w, c19.x, r5.y
  cmp r5.y, -r4_abs.x, c24.x, r5.y
  cmp r5.y, -r4_abs.y, c29.x, r5.y
  cmp r5.y, -r4_abs.z, c34.x, r5.y
  cmp r5.y, -r4_abs.w, c39.x, r5.y
  rcp r5.y, r5.y
  mul r2.yzw, r2, r5.y
  dp3 r5.y, r2.yzww, r2.yzww
  add r5.z, -r5.y, c47.y
  max r6.x, r5.z, c47.x
  rsq r5.y, r5.y
  mul r2.yzw, r2, r5.y
  mad r5.yzw, v2.xxyz, r0.w, r2
  nrm r7.xyz, r5.yzww
  dp3_sat r2.y, r0, r2.yzww
  dp3_sat r2.z, r0, r7
  pow r5.y, r2.z, c44.x
  cmp r7, -r3_abs.x, c1, r5.x
  cmp r7, -r3_abs.y, c6, r7
  cmp r7, -r3_abs.z, c11, r7
  cmp r7, -r3_abs.w, c16, r7
  cmp r7, -r4_abs.x, c21, r7
  cmp r7, -r4_abs.y, c26, r7
  cmp r7, -r4_abs.z, c31, r7
  cmp r7, -r4_abs.w, c36, r7
  mad r7, r6.x, r7, c45
  cmp r8, -r3_abs.x, c2, r5.x
  cmp r8, -r3_abs.y, c7, r8
  cmp r8, -r3_abs.z, c12, r8
  cmp r8, -r3_abs.w, c17, r8
  cmp r8, -r4_abs.x, c22, r8
  cmp r8, -r4_abs.y, c27, r8
  cmp r8, -r4_abs.z, c32, r8
  cmp r8, -r4_abs.w, c37, r8
  mul r8, r8, c41
  mul r8, r2.y, r8
  mul r8, r6.x, r8
  mad r7, c40, r7, r8
  cmp r8, -r3_abs.x, c3, r5.x
  cmp r8, -r3_abs.y, c8, r8
  cmp r8, -r3_abs.z, c13, r8
  cmp r3, -r3_abs.w, c18, r8
  cmp r3, -r4_abs.x, c23, r3
  cmp r3, -r4_abs.y, c28, r3
  cmp r3, -r4_abs.z, c33, r3
  cmp r3, -r4_abs.w, c38, r3
  mul r3, r3, c43
  mul r3, r5.y, r3
  cmp r3, -r2.y, c47.x, r3
  mad r3, r3, r6.x, r7
  add r1, r1, r3
  add r2.x, r2.x, c47.y
endrep
texld r0, v1, s0
mul oC0, r0, r1

Because of the constant indexing limitation it has to do a compare and select for every single constant register. It's just a different variant of what I mentioned. Basically it's like doing this:

for(uint i = 0; i < NumLights; ++i)
{
    float3 LightPos = Lights[0].Position;
    if(i == 1)
        LightPos = Lights[1].Position;
    else if(i == 2)
        LightPos = Lights[2].Position;
    else if(i == 3)
        LightPos = Lights[3].Position;
    ...
    else if(i == 7)
        LightPos = Lights[7].Position;
        
    float3 LightColor = Lights[0].Color;
    if(i == 1)
        LightColor = Lights[1].Color;
    else if(i == 2)
        LightColor = Lights[2].Color;
    else if(i == 3)
        LightColor = Lights[3].Color;
    ...
    else if(i == 7)
        LightColor = Lights[7].Color;
        
    // and so on
}



#5078547 "Rough" material that is compatible with color-only lightmaps?

Posted by MJP on 17 July 2013 - 02:49 PM

When you bake a diffuse lightmap you're pre-integrating your lighting environment at each texel with your BRDF. You can do this and end up with a single value because you assume that the surface normal never changes, and that the reflected light doesn't depend on the viewing angle (which is true for a Lambertian BRDF). For most other BRDF's that last assumption doesn't hold, so you can't really use the same approach and get correct results. You would need to either...

  • Store the lighting environment without pre-integrating your BRDF using some sort of basis (such as spherical harmonics), and integrate your BRDF at runtime taking the current viewing angle into account (with this approach you can also vary the surface normal at runtime, which allows for normal mapping)
  • Pre-integrate with your BRDF and store multiple values corresponding to multiple viewing angles, then interpolate between them at runtime based on the current viewing angle
  • Don't change the way you bake lightmaps, and instead attempt to apply some approximating curve to the value that takes the viewing angle into account.



#5078542 Shader Unlimited Lights

Posted by MJP on 17 July 2013 - 02:29 PM

D3D9 pixel shaders have a major limitation, which is that they can't dynamically index into shader constants. This means that the compiler can't use an actual loop construct in assembly to implement your for loop; instead, it has to unroll it and do something like this:

 

if(numLights > 0)
    CalcLight(Light0);
 
if(numLights > 1)
    CalcLight(Light1);
 
if(numLights > 2)
    CalcLight(Light2);
 
// ...and so on

The only way to dynamically index into your light data would be to use textures to store your light properties. This is a major reason why shader permutations were very popular for this era of hardware.
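
If you go the texture route, the idea is something like this (an untested sketch; MAX_LIGHTS, lights, numLights, and device are made-up names, one texel column per light):

// D3D9: pack light data into an FP32 texture so the shader can index it.
// Row 0 = positions, row 1 = colors; sample with point filtering.
IDirect3DTexture9* lightTex = NULL;
device->CreateTexture(MAX_LIGHTS, 2, 1, D3DUSAGE_DYNAMIC,
                      D3DFMT_A32B32G32R32F, D3DPOOL_DEFAULT, &lightTex, NULL);

D3DLOCKED_RECT locked;
lightTex->LockRect(0, &locked, NULL, D3DLOCK_DISCARD);
float* positions = (float*)locked.pBits;
float* colors = (float*)((char*)locked.pBits + locked.Pitch);
for(int i = 0; i < numLights; ++i)
{
    memcpy(positions + i * 4, &lights[i].Position, sizeof(float) * 3);
    memcpy(colors + i * 4, &lights[i].Color, sizeof(float) * 3);
}
lightTex->UnlockRect(0);

// In the ps_3_0 shader you'd then fetch per-light data inside the loop:
//   float4 lightPos = tex2Dlod(LightSampler, float4((i + 0.5f) / MAX_LIGHTS, 0.25f, 0, 0));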

 

Now even if you get this working, you have to keep in mind that it's very important for performance to not just blindly apply every light to every mesh that you draw. Even a dozen or so lights can be a huge strain on the GPU if those lights affect every pixel on the screen. This means either doing work on the CPU to selectively bind lights to certain meshes/draw calls, or switching to a deferred approach that uses the GPU to figure out which lights affect which pixels.
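
Here's a minimal sketch of the CPU-side approach, assuming a small float3 vector type with the usual operators and a dot() function (all names here are made up for illustration):

// Gather only the lights whose sphere of influence overlaps a mesh's
// bounding sphere, up to the shader's per-draw light limit.
struct Light  { float3 Position; float3 Color; float Radius; };
struct Sphere { float3 Center; float Radius; };

int GatherLights(const Sphere& meshBounds, const Light* lights, int numLights,
                 Light* outLights, int maxOut)
{
    int count = 0;
    for(int i = 0; i < numLights && count < maxOut; ++i)
    {
        float3 d = lights[i].Position - meshBounds.Center;
        float maxDist = lights[i].Radius + meshBounds.Radius;
        if(dot(d, d) <= maxDist * maxDist)   // sphere-sphere overlap test
            outLights[count++] = lights[i];
    }
    return count;   // set these as the shader constants for this draw call
}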

 




#5078060 Constant buffer madness! [fixed]

Posted by MJP on 16 July 2013 - 12:02 AM

 

 

Each field in a constant buffer must be 16-byte aligned

#define ALIGN_REGISTER __declspec(align(16))

struct LightingConstants
{
     DirectX::XMFLOAT3    myLightDirection;
     ALIGN_REGISTER DirectX::XMFLOAT3    myLightColour;
     ALIGN_REGISTER DirectX::XMFLOAT3    myCameraPosition;
     ALIGN_REGISTER DirectX::XMFLOAT2    myHalfPixel;
     ALIGN_REGISTER DirectX::XMFLOAT4X4    myInverseViewProjection;
     float            myPadding;
};

.... should work. But maybe the XM... types are already aligned; with D3DX I have to align the fields manually.

 

My explanation is quite bad; please check this: http://msdn.microsoft.com/en-us/library/bb509632%28v=vs.85%29.aspx

Not all variables in a constant buffer will be aligned to 16-byte boundaries. Alignment issues only occur when a variable will straddle a 16-byte boundary. So for instance this constant buffer won't require any special alignment in your C++ struct:

cbuffer Constants
{
    float Var0; // Offset == 0
    float Var1; // Offset == 4
    float Var2; // Offset == 8
    float Var3; // Offset == 12
}
However, something like this will cause the alignment rules to kick in:

cbuffer Constants
{
    float Var0;  // Offset == 0
    float Var1;  // Offset == 4
    float3 Var2; // Offset == 16, since a float3 would occupy bytes [8-19] and cross the boundary
    float Var3;  // Offset == 28
}

Hello MJP. That's why I suggested following the link....

Could we say that vectors and matrices must be 16-byte aligned and scalars mustn't?

 

Yes indeed, the link spells out all of the rules in detail; I just wanted to make sure that anyone reading this thread didn't come away with incorrect information. :)

Saying that vectors need to be aligned and scalars don't is also incorrect. A float2 vector won't need to be aligned if it starts on byte 4 or byte 8, since it can still fit within a 16-byte boundary. Same goes for a float3 that starts on byte 4. Matrices can also be a bit weird if you don't use a float4x4, since you have to consider the alignment for each row. For instance a float3x3 will get packed as 3 float3's, with each float3 aligned to the next 16-byte boundary.
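
To make that concrete, here's one way (of many) to mirror those rules on the C++ side with explicit padding instead of alignment macros, using the second cbuffer from above (assumes DirectXMath.h is included):

// C++ mirror of:  float Var0; float Var1; float3 Var2; float Var3;
struct Constants
{
    float Var0;              // bytes [0-3]
    float Var1;              // bytes [4-7]
    float Padding0[2];       // bytes [8-15]: a float3 can't straddle the boundary
    DirectX::XMFLOAT3 Var2;  // bytes [16-27]
    float Var3;              // bytes [28-31]: still fits in the second register
};

// And a float3x3 packs as three float3 rows, each padded out to 16 bytes:
struct Float3x3Constant
{
    DirectX::XMFLOAT3 Row0; float Padding0;
    DirectX::XMFLOAT3 Row1; float Padding1;
    DirectX::XMFLOAT3 Row2; float Padding2;
};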




#5077957 Constant buffer madness! [fixed]

Posted by MJP on 15 July 2013 - 02:33 PM

Each field in a constant buffer must be 16-byte aligned

#define ALIGN_REGISTER __declspec(align(16))

struct LightingConstants
{
     DirectX::XMFLOAT3    myLightDirection;
     ALIGN_REGISTER DirectX::XMFLOAT3    myLightColour;
     ALIGN_REGISTER DirectX::XMFLOAT3    myCameraPosition;
     ALIGN_REGISTER DirectX::XMFLOAT2    myHalfPixel;
     ALIGN_REGISTER DirectX::XMFLOAT4X4    myInverseViewProjection;
     float            myPadding;
};

.... should work. But maybe the XM... types are already aligned; with D3DX I have to align the fields manually.

 

My explanation is quite bad; please check this: http://msdn.microsoft.com/en-us/library/bb509632%28v=vs.85%29.aspx

Not all variables in a constant buffer will be aligned to 16-byte boundaries. Alignment issues only occur when a variable will straddle a 16-byte boundary. So for instance this constant buffer won't require any special alignment in your C++ struct:

 

cbuffer Constants
{
    float Var0; // Offset == 0
    float Var1; // Offset == 4
    float Var2; // Offset == 8
    float Var3; // Offset == 12
}

However, something like this will cause the alignment rules to kick in:

 

cbuffer Constants
{
    float Var0;  // Offset == 0
    float Var1;  // Offset == 4
    float3 Var2; // Offset == 16, since a float3 would occupy bytes [8-19] and cross the boundary
    float Var3;  // Offset == 28
}



#5077939 Energy conservation of diffuse term

Posted by MJP on 15 July 2013 - 01:21 PM

In general, the combination of a diffuse and a specular term is there to simulate materials that are actually composed of multiple layers with different reflective properties. The classic example would be a coated plastic, where you have a very smooth specular reflection on the surface layer, while underneath you have a layer that's much rougher, with subsurface scattering that causes the reflected light to take on the albedo color. You can generalize this to having a material composed of a sum of BRDF's, instead of a more rigid diffuse/specular relationship. For instance on cars you typically have a clear coat on top, and a more metallic surface underneath that is still very much view-dependent, which necessitates another specular lobe. In all cases you just need to be careful in how you set up the interactions between the BRDF terms if you want to maintain energy conservation.
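
As a simple example of that kind of coupling (just one common approach, not the only one), you can scale the diffuse term by the energy that the specular layer didn't reflect, using the Fresnel reflectance $F$:

$$f(\mathbf{l}, \mathbf{v}) = \left(1 - F(\mathbf{l}, \mathbf{h})\right) f_{\mathrm{diffuse}}(\mathbf{l}, \mathbf{v}) + f_{\mathrm{specular}}(\mathbf{l}, \mathbf{v})$$

Whatever fraction of the incoming light the top layer reflects never reaches the diffuse layer underneath, so the sum can't exceed the incoming energy.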




#5077757 Global illumination techniques

Posted by MJP on 14 July 2013 - 08:57 PM

Yeah, that's the gist of it: you store ambient diffuse lighting encoded in some basis at sample locations, and interpolate between the samples based on the position of the object sampling the probes, as well as the structure of the probes themselves (grid, loose points, etc.). Some common bases used for encoding diffuse lighting:

  • Single value - store a single diffuse color, use it for all normal directions. Makes objects look very flat in ambient lighting, since the ambient diffuse doesn't change with normal direction. There's no way to actually compute diffuse lighting without a surface normal, so typically it will be the average of the computed diffuse in multiple directions.
  • Hemispherical - basically you compute diffuse lighting for a normal pointed straight up and one for a normal pointed straight down, then you interpolate between the two values using the Y component of the surface normal used for rendering.
  • Ambient cube - this is what Valve used in HL2. Similar to hemispherical, except diffuse lighting is computed and stored in 6 directions (usually aligned to the world-space axes). The contribution from each of the six values is determined by taking the dot product of the surface normal with the corresponding axis direction.
  • Spherical harmonics - very commonly used in modern games. Basically you can store a low-frequency version of any spherical signal by projecting onto the SH basis functions up to a certain order, with the order determining how much detail will be retained as well as the number of coefficients that you need to store. SH has all kinds of useful properties, for instance they're essentially a frequency-domain representation of your signal which means you can perform convolutions with a simple multiplication. Typically this is used to convolve the lighting environment with a cosine lobe, which essentially gives you Lambertian diffuse. However you can also convolve with different kernels, which allows you to use SH with non-Lambertian BRDF's as well. You can even do specular BRDF's, however the low-frequency nature of SH typically limits you to very high roughnesses (low specular power, for Phong/Blinn-Phong BRDF's).
  • Spherical Radial Basis Functions - with these you basically approximate the full lighting environment surrounding a probe point with a set number of lobes (usually Gaussian) oriented at arbitrary directions about a sphere. These can be cool because they can let you potentially capture high-frequency lighting. However they're also difficult because you have to use a non-linear solver to "fit" a set of lobes to a lighting environment. You can also have issues with interpolation, since each probe can potentially have arbitrary lobe directions.
  • Cube Maps - this isn't common, but it's possible to integrate the irradiance for a set of cubemap texels where each texel represents a surface normal of a certain direction. This makes your shader code for evaluating lighting very simple: you just lookup into the cubemap based on the surface normal. However it's generally overkill, since something like SH or Ambient Cube can store diffuse lighting with relatively little error while having a very compact representation. Plus you don't have to mess around with binding an array of cube map textures, or sampling from them.

For all of these (with the exception of SRBF's) you can generate the probes either by ray-tracing and directly projecting onto the basis, or by rasterizing to a cubemap first and then projecting. This can potentially be very quick; in fact, you could do it in real time for a limited number of probe locations. SRBF's are trickier because of the non-linear solve, which is typically an iterative process.
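
To give an idea of the cubemap path, here's a rough CPU-side sketch of projecting onto 3rd-order SH. FaceDirection and SampleCubeFace are placeholders, float3 is assumed to be a small vector type with the usual operators, and the per-texel solid-angle weight is left at 1 for brevity (a real implementation should account for it):

// Project a rasterized cubemap onto 9 SH coefficients per color channel
void ProjectCubemapOntoSH9(int faceSize, float3 shCoeffs[9])
{
    for(int i = 0; i < 9; ++i)
        shCoeffs[i] = float3(0.0f, 0.0f, 0.0f);
    float weightSum = 0.0f;

    for(int face = 0; face < 6; ++face)
    {
        for(int y = 0; y < faceSize; ++y)
        {
            for(int x = 0; x < faceSize; ++x)
            {
                float3 n = normalize(FaceDirection(face, x, y, faceSize)); // placeholder: texel -> direction
                float3 radiance = SampleCubeFace(face, x, y);              // placeholder: texel color
                float weight = 1.0f;                                       // should be the texel's solid angle

                // Same SH basis functions as in the shader code below
                float sh[9] =
                {
                    0.282095f,
                    0.488603f * n.y, 0.488603f * n.z, 0.488603f * n.x,
                    1.092548f * n.x * n.y, 1.092548f * n.y * n.z,
                    0.315392f * (3.0f * n.z * n.z - 1.0f),
                    1.092548f * n.x * n.z,
                    0.546274f * (n.x * n.x - n.y * n.y),
                };

                for(int i = 0; i < 9; ++i)
                    shCoeffs[i] += radiance * (sh[i] * weight);
                weightSum += weight;
            }
        }
    }

    // Normalize so the projection integrates over the whole sphere
    for(int i = 0; i < 9; ++i)
        shCoeffs[i] *= (4.0f * 3.14159265f) / weightSum;
}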

EDIT: I looked at the links posted, and there are two things I'd like to point out. In that GPU Gems article they evaluate the diffuse from the SH lighting environment by pre-computing the set of "lookup" SH coefficients into a cube map lookup texture, but this is totally unnecessary, since you can just directly compute these coefficients in the shader. Something like this should work:

// 'lightingSH' is the lighting environment projected onto SH (3rd order in this case),
// and 'n' is the surface normal
float3 ProjectOntoSH9(in float3 lightingSH[9], in float3 n)
{
    float3 result = 0.0f;
    
    // Cosine kernel
    const float A0 = 1.0f;
    const float A1 = 2.0f / 3.0f;
    const float A2 = 0.25f;

    // Band 0
    result += lightingSH[0] * 0.282095f * A0;

    // Band 1
    result += lightingSH[1] * 0.488603f * n.y * A1;
    result += lightingSH[2] * 0.488603f * n.z * A1;
    result += lightingSH[3] * 0.488603f * n.x * A1;

    // Band 2
    result += lightingSH[4] * 1.092548f * n.x * n.y * A2;
    result += lightingSH[5] * 1.092548f * n.y * n.z * A2;
    result += lightingSH[6] * 0.315392f * (3.0f * n.z * n.z - 1.0f) * A2;
    result += lightingSH[7] * 1.092548f * n.x * n.z * A2;
    result += lightingSH[8] * 0.546274f * (n.x * n.x - n.y * n.y) * A2;

    return result;
}

This brings me to my second point: the WebGL article mentions the lookup texture as a disadvantage of SH, which really isn't valid since you don't need it at all. This makes SH a much more attractive option for storing irradiance, especially if your goal is runtime generation of irradiance maps, since with SH you don't need an expensive convolution step. Instead, your projection onto SH is basically a repeated downsampling process, which can be done very quickly. This is especially true if you use compute shaders, since you can use a parallel reduction to perform the integration in shared memory with fewer steps.

 

For an introduction to using SH for this purpose, I would definitely recommend reading Ravi Ramamoorthi's 2001 paper on the subject. Robin Green's paper is also a good place to start.




#5077405 Fast exp2() function in shader

Posted by MJP on 13 July 2013 - 01:57 PM

Almost all GPU's have a native exp2 instruction in their ALU's, so you're not going to make a faster version on your own. Converting to integer often has a performance cost, and on most GPU's integer instructions run at 1/2 or 1/4 rate, which means it's unlikely you'll get better performance with bit shifts. You'll have to check the available docs on the various architectures to find out the specifics.


#5076815 Debugging the stencil buffer

Posted by MJP on 10 July 2013 - 11:07 PM

It depends on which format you used to create the depth buffer. If you used DXGI_FORMAT_R24G8_TYPELESS, then use DXGI_FORMAT_X24_TYPELESS_G8_UINT to create your SRV and then access the G channel in your shader. If you used DXGI_FORMAT_R32G8X24_TYPELESS, then do the same with DXGI_FORMAT_X32_TYPELESS_G8X24_UINT.

In your shader, make sure that you declare your texture as Texture2D<uint2>. You'll then get an integer that's [0, 255], which you can convert to a [0,1] float for displaying.
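
For the 24/8 case, creating that SRV might look like this (an untested sketch; depthTexture and device are placeholder names):

// SRV exposing the stencil bits of a depth-stencil texture that was
// created with DXGI_FORMAT_R24G8_TYPELESS
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_X24_TYPELESS_G8_UINT;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
srvDesc.Texture2D.MostDetailedMip = 0;
srvDesc.Texture2D.MipLevels = 1;
ID3D11ShaderResourceView* stencilSRV = NULL;
device->CreateShaderResourceView(depthTexture, &srvDesc, &stencilSRV);

// Shader side:
//   Texture2D<uint2> StencilTexture;
//   uint stencil = StencilTexture[pixelPos].g;   // integer in [0, 255]
//   float visualized = stencil / 255.0f;         // [0, 1] for display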




#5076706 Global illumination techniques

Posted by MJP on 10 July 2013 - 03:06 PM

 

Hmm, you're right. It looks like they're using cascaded shadow maps for both the static and dynamic geometry, which is interesting. I assume they bake only the indirect lighting and then just add in the direct lighting on the fly. If nothing else, it's probably easier to implement than storing the contribution of direct light onto static geometry.

 

Guys, I understand the part with shadows. It's not interesting if they are using static shadow maps for static level geometry. I don't think they just bake the indirect lighting and that's it. The actors and other objects moving through the level receive indirect lighting as well. I have a feeling they have some sort of lightmap on the static levels, and also some "fill lights" placed here and there to simulate bounced light and to illuminate dynamic objects that move around.

 

 

It's fairly common to bake ambient lighting into probes located throughout the level, and then have dynamic objects sample from those probes as they move through the level.





