Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 29 Mar 2007
Offline Last Active Yesterday, 11:04 PM

#5287884 [D3D12] About CommandList, CommandQueue and CommandAllocator

Posted by on 20 April 2016 - 08:44 PM

There is no implied copy from CPU->GPU memory when you submit a command list. GPU's are perfectly capable of reading from CPU memory across PCI-e, and on some systems the CPU and GPU may even share the memory.

#5287707 How to blend World Space Normals

Posted by on 19 April 2016 - 07:34 PM

The sample implementation of RNM on that blog post assumes that the "s" vector is a unit z vector, which is the case for tangent-space normal maps. This is represented in equations 5/6/7. If you want to work in world-space, then you need to implement equation 4 as a function that takes s as an additional parameter:


float3 ReorientNormal(in float3 u, in float3 t, in float3 s)
    // Build the shortest-arc quaternion
    float4 q = float4(cross(s, t), dot(s, t) + 1) / sqrt(2 * (dot(s, t) + 1));
    // Rotate the normal
    return u * (q.w * q.w - dot(q.xyz, q.xyz)) + 2 * q.xyz * dot(q.xyz, u) + 2 * q.w * cross(q.xyz, u);


If you pass float3(0, 0, 1) as the "s" parameter, then you will get the same result as the pre-optimized version. However the compiler may not be able to optimize it as well as the hand-optimized code provided in the blog.

#5286063 D3D alternative for OpenGL gl_BaseInstanceARB

Posted by on 09 April 2016 - 03:09 PM

ExecuteIndirect supports setting arbitrary 32-bit constants through the D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT argument type. You can use this to specify transform/materialID data per-draw without having to abuse the instance offset. You can also set a root CBV or SRV via a GPU virtual address, which means you can use that to directly specify a pointer to the draw's transform data or material data.

#5285926 PIXBeginEvent and PIXEndEvent member functions on CommandList object

Posted by on 08 April 2016 - 04:55 PM

The documentation you linked to is the old pre-release documentation. The final documentation doesn't list those methods. Instead it has BeginEvent and EndEvent, which are called by the helper functions in pix.h.

#5285343 [D3D12] Synchronization on resources creation. Need a fence?

Posted by on 05 April 2016 - 02:55 PM

Yeah, there's no need to wait for commands to finish executing because they don't actually issue any commands. If you look at some of the other samples, they all have a wait at the end of LoadAssets. They do this so that they can ensure that any GPU copies finish before they destroy upload resources. So for instance if you look at the HelloTexture sample, it goes like this:


  • Create upload resource
  • Map upload resource, and fill it with data
  • Issue GPU copy commands on a direct command list
  • Submit the direct command list
  • Wait for the GPU to finish executing the command list
  • ComPtr destructor calls Release on the upload resource, destroying it

#5285175 When would you want to use Forward+ or Differed Rendering?

Posted by on 04 April 2016 - 10:28 PM

In The Order we output depth and vertex normals in our prepass so that we could compute AO from our capsule-based proxy occluders. Unfortunately this means having a pixel shader, even if that shader is extremely simple. Rasterizing the scene twice is a real bummer, especially if you want to push higher geometric complexity. But at the same time achieving decent Z sorting is also pretty hard, unless you're doing GPU-driven rendering and/or you have a really good occlusion culling.

#5285174 Soft Particles and Linear Depth Buffers

Posted by on 04 April 2016 - 10:22 PM

Yes, z/w is very non-linear. If you're using a hardware depth buffer, you can compute the original view-space Z value by using the original projection matrix used for transforming the vertices: 


float linearDepth = Projection._43 / (zw - Projection._33);


If you'd like you can then normalize this value to [0, 1] by dividing by the far clip plane, by doing z = (z - nearClip) / (farClip - nearClip). 


Using a linear depth value for soft particles should give you much more consistent results across your depth range, so I would recommend doing that.

#5284941 Texture sample as uniform array index.

Posted by on 03 April 2016 - 07:09 PM

All modern hardware that I know of can dynamically index into constant (uniform) buffers. For AMD hardware it's basically the same as using a structured buffer: for an index that can vary per-thread in a wavefront, the shader unit will issue a vector memory load through a V# contains the descriptor (base address, num elements, etc.). On Nvidia, there's 2 different paths for constant buffers and structured buffers. They recommend using constant buffers if the data is very coherent between threads, since this will be lower-latency path compared to structured buffers. I have no idea what the situation is for Intel, or any mobile GPU's.

#5283795 In Game Console window using DirectX 11

Posted by on 27 March 2016 - 11:11 PM

You can use an orthographic projection matrix to map from a standard 2D coordinate system (where (0,0) is the top left, and (DisplayWidth, DisplayHeight) is the bottom right) to D3D normalized device coordinates (where (-1, -1) is the bottom left and (1, 1) is the top right). DirectXMath has the XMMatrixOrthographicOffCenterLH function which you can use to generate such a matrix. Just fill out the parameters such that Top = 0, Left = 0, Bottom = DisplayHeight, and Right = DisplayWidth. If you look at the documentation from the old D3DX function for doing the same thing, you can see how it generates a matrix such that it has an appropriate scale and translation.

#5283446 Per Triangle Culling (GDC Frostbite)

Posted by on 25 March 2016 - 03:10 PM

Nvidia has OpenGL and D3D extensions for a "passthrough" GS that's meant to be fast as long as you can live with the restrictions (no arbitrary amplification, only triangles, no stream out, etc.). So if you could use that to do per-triangle culling, it could potentially be much easier to get it working. If anybody actually tries it, I'd love to hear about the results. :)

#5283071 Math behind anisotropic filtering?

Posted by on 23 March 2016 - 11:42 PM

Is there an article/explanation and is it standardized somehow or vendor dependant? (in gl it's not core AFAIK even if all vendors supports it)

It's an extension in GL because it's patented. :(

#5282547 Iso-/Dimetric tile texture has jagged edges

Posted by on 22 March 2016 - 01:22 AM

I had a similiar issue once, and it turned out I was doing windowed mode wrong in terms of calculating the window size to fit the backbuffer
size etc., resulting in a vaguely stretched display that was hard to notice for a long while.. Maybe you could check your window+DirectX
initialisation code?

I was going to say the same thing. You want to make sure that the client area of your window is the same size as your D3D backbuffer,
otherwise you'll get really crappy scaling when the backbuffer is blit onto the window. You can use something like this:
RECT windowRect;
SetRect(&windowRect, 0, 0, backBufferWidth, backBufferHeight);

BOOL isMenu = (GetMenu(hwnd) != nullptr);
if(AdjustWindowRectEx(&windowRect, style, isMenu, exStyle) == 0)

if(SetWindowPos(hwnd, HWND_NOTOPMOST, 0, 0, windowRect.right - windowRect.left, windowRect.bottom - windowRect.top, SWP_NOMOVE) == 0)
See the docs for AdjustWindowRectEx for more details.

#5282527 EVSM, 2 component vs 4 component

Posted by on 21 March 2016 - 10:54 PM

I went with 16-bit because there's too many artifacts when using the 2-component version EVSM. Specifically, you run into issues in areas with high geometrical complexity where the receiver surface is non-planar relative to the filter kernel. See the original paper (go to section 7) for some more details.


I really noticed it on our characters and faces, due to their dense, curved geometry. I took some screenshots from my sample app in attempt to replicate the issues that I saw in The Order:


This is 4-component EVSM with 32-bit textures:




This is 2-component EVSM with 32-bit textures (look at the shadow cast by the nose):




And this is 4-component EVSM with 16-bit textures, with the bias and leak reduction turned up:



#5282243 Object Space Lightning

Posted by on 20 March 2016 - 07:04 PM

That said, I don't get the comparisons to REYES and overall it seems like a very special purpose, application-specific approach to rendering


Yeah, I agree that the frequent mentioning of REYES is misleading. The only real commonality with REYES is the idea of not shading per-pixel, and even in that regard REYES has a very different approach (dicing into micropolygons followed by stochastic rasterization).


I also agree that it's pretty well-tailored to their specific style of game, and the general requirements of that genre (big terrain, small meshes, almost no overdraw). I would image that to adopt something similar for more general scenes you would need to a much much better job of allocating appropriately-sized tiles, and you would need to account for occlusion. I could see maybe going down the megatexture approach of rasterizing out tile ID's, and then analyzing that on the CPU or GPU to allocate memory for tiles. However this implies latency, unless you do it all on the GPU and rasterize your scene twice. Doing it all on the GPU would rule out any form of tiled resources/sparse textures, since you can't update page tables from the GPU.

ptex would be nice for avoiding UV issues (it would also be nice for any kind of arbitrary-rate surface calculations, such as pre-computed lightmaps or real-time GI), but it's currently a PITA to use on the GPU (you need borders for HW filtering, and you need quad->page mappings and lookups).

#5282237 Gamma correction. Is it lossless?

Posted by on 20 March 2016 - 06:49 PM

The sRGB->Linear transformation for color textures will typically happen in the fixed-function texture units, before filtering. You can think of the process as going something like this:


result = 0.0
foreach(texel in filterFootPrint):
    encoded = ReadMemory(texel)
    decoded = DecodeTexel(encoded)  // Decompress from block compression and/or convert from 8-bit to intermediate precision (probably 16-bit)
    linear = sRGBToLinear(decoded)
    result += linear * FilterWeight(texel) // Apply bilinear/trilinear/anisotropic filtering
return ConvertToFloat(result)


It's important that the filtering and sRGB->Linear conversion happen at some precision that's higher than 8-bit, otherwise you will get banding. For sRGB conversion 16-bit fixed point or floating point is generally good enough for this purpose. The same goes for writing to a render target: the blending and linear->sRGB conversion need to happen at higher precision than the storage format, or you will get poor results. You will also get poor results if you write linear data to an 8-bit render target, since there will be insufficient precision in the darker range.


Probably the vast majority of modern game are gamma-correct. It's a bona fide requirement for PBR, which almost everyone is adopting in some form or another. I seem to recall someone mentioning that Unity maintains a non-gamma-correct rendering path for legacy compatibility, but don't quote me on that.