
Member Since 29 Mar 2007

#5222476 What to do when forward vector equals up vector

Posted by MJP on 10 April 2015 - 12:51 PM

Using an "up" vector to generate a basis is only necessary when you don't have a basis to begin with. In other words, you should only need to do it when all you have is one "forward" direction and nothing else. This typically doesn't need to be the case for a camera: it's usually easy to hold onto a full basis for your camera, and then there's no need for generating perpendicular vectors. How you would implement this depends on how your camera works, but if you share some details about how you handle camera orientation then I can probably help you modify it so that you don't need an "up" vector.
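As an illustration (a minimal sketch with a hypothetical CameraBasis type, not tied to any particular engine or camera scheme), storing the full basis and rotating all three axes together means there is never a degenerate case when the forward direction approaches the world up:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Rotate a vector around the world Y axis by 'angle' radians.
static Vec3 rotateY(const Vec3& v, float angle)
{
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    return { c * v.x + s * v.z, v.y, -s * v.x + c * v.z };
}

// Hypothetical camera that stores a full orthonormal basis instead of
// deriving right/up from a forward vector and a fixed world "up".
struct CameraBasis
{
    Vec3 right   { 1.0f, 0.0f, 0.0f };
    Vec3 up      { 0.0f, 1.0f, 0.0f };
    Vec3 forward { 0.0f, 0.0f, 1.0f };

    // Applying the same rotation to all three axes keeps the basis
    // orthonormal, so looking straight up or down is not a special case.
    void yaw(float angle)
    {
        right   = rotateY(right, angle);
        up      = rotateY(up, angle);
        forward = rotateY(forward, angle);
    }
};
```

Pitch and roll work the same way: rotate the whole basis about its own right or forward axis rather than reconstructing anything from a world up vector.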

#5221771 Difference between SDSM and PSSM?

Posted by MJP on 06 April 2015 - 10:05 PM

This only happens when your shadow casters are clipped, due to not lying within the Z extents of your shadow projection. The common way to fix this is to use "pancaking", where you force outlying triangles to get rasterized with depth = 0.0. It can be done easily in the vertex shader by forcing the output Z component to be >= 0, or in D3D11 you can do it by creating a rasterizer state with "DepthClipEnable" set to "false".
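The vertex-shader clamp boils down to the following (sketched here as plain C++ rather than HLSL, applied to the clip-space Z of the output position):

```cpp
// "Pancake" a shadow caster that lies in front of the shadow near plane:
// clamping clip-space z to 0 makes outlying triangles rasterize at
// depth = 0.0 instead of being clipped away by the rasterizer.
inline float pancakeClipZ(float zClip)
{
    return (zClip < 0.0f) ? 0.0f : zClip;
}
```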

#5221565 Questions about compute shader

Posted by MJP on 05 April 2015 - 11:38 PM

1. No. It's specifying that each thread group should have 20x1 threads in it, so it will always execute with at least 20 threads. The actual number of threads executed depends on the number of thread groups that are dispatched, which is a parameter of the Dispatch() function called by the CPU.


2. SV_DispatchThreadID gives you GroupID * ThreadGroupSize + GroupThreadID. In this case it's being used to assign every thread to a particular index of the input and output buffers. So for instance thread 7 in group #5 would read and write using index (5 * 20 + 7) = 107.
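That mapping can be sketched as a small helper (assuming a 1D dispatch, matching the 20x1x1 group size above):

```cpp
#include <cstdint>

// SV_DispatchThreadID for a 1D dispatch:
// GroupID * ThreadGroupSize + GroupThreadID.
inline uint32_t dispatchThreadID(uint32_t groupID,
                                 uint32_t threadGroupSize,
                                 uint32_t groupThreadID)
{
    return groupID * threadGroupSize + groupThreadID;
}
```

So thread 7 in group #5 with a group size of 20 gets dispatch thread ID 107, which is the buffer index it reads and writes.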

#5221564 Difference between SDSM and PSSM?

Posted by MJP on 05 April 2015 - 11:32 PM

Is it really necessary to do all of the SDSM steps, or is the depth reduction pass alone enough?
I actually just do the depth reduction pass.


It does improve quality, if that's what you're asking. Where it really helps is cases where the visible surfaces are very constrained along the view-space X and Y axes. For instance, consider looking down a narrow alleyway. I believe the original paper or presentation had some images demonstrating this case.

#5221040 BC6H Viewers?

Posted by MJP on 02 April 2015 - 06:35 PM

Are they DDS files? Visual Studio 2012 and 2013 can view DDS files, and will display BC6H/BC7. However, it will only display the [0,1] range of BC6H textures, and it will clip the rest (there are no exposure or tone mapping controls).


RenderDoc can view DDS files if you just drag them onto the window. I think it has an exposure slider, which is nice.


HDRSee can open DDS files using FreeImage, but I don't know if the version of FreeImage that they use supports the newer DDS formats.

#5221007 Difference between SDSM and PSSM?

Posted by MJP on 02 April 2015 - 03:37 PM

If you use the GPU to produce the splits, do you have to read back the values on the CPU afterwards? So the CPU has to stall for the GPU to finish before continuing?


Yes, you need to read back the results if you want to do split setup, culling, and draw setup on the CPU. If you just wait immediately you'll get a stall on the CPU where it waits for the GPU to finish processing pending commands, and that will immediately be followed by a stall on the GPU (since it's run out of commands to execute). Adding latency can allow you to avoid the stall, but if you decide to do a full frame of latency your split setup will be out-of-date. This can result in artifacts if the splits are constrained too much for the visible pixels, which can happen due to either camera movement or object movement. It's possible to compute instantaneous velocity and use that to predict a particular pixel's position for the next frame, which can be used during the depth buffer analysis to compute bounds that (hopefully) work for the next frame. This can work pretty well for simple camera movement, but will still break with more complex movement or for camera teleportation.


With modern API's it's possible to actually do all of the setup and culling on the GPU, which eliminates the need for reading back the data. It can still be awkward and limiting depending on the API and extensions used, but it's definitely doable. In fact the demo that I made for this article has a GPU-only implementation that makes use of a compute shader batching system and DrawIndirect, so you can look at that if you want. It's not really an implementation that would scale up to a real-world scenario (unless your game is pretty simple), but it might give you some ideas. DX12, Vulkan, and some of the newer GL extensions have even more functionality available for GPU-driven rendering, which should allow for more flexible implementations.

#5220835 Direct3D 12 documentation is now public

Posted by MJP on 01 April 2015 - 06:32 PM

A 2D texture will have N subresources, where N = ArraySlices * Mips. To get the subresource index for a given mip level of a particular array slice, you compute SubresourceIndex = (NumMips * ArraySlice) + MipLevel. There's actually a D3D11 function (D3D11CalcSubresource) that does this for you.
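A sketch of the same computation (this mirrors what D3D11CalcSubresource does):

```cpp
#include <cstdint>

// Subresource index for a texture array: each array slice owns a full
// mip chain, and slices are laid out one after another.
inline uint32_t calcSubresource(uint32_t mipLevel,
                                uint32_t arraySlice,
                                uint32_t numMips)
{
    return (numMips * arraySlice) + mipLevel;
}
```

For example, mip 2 of array slice 3 in a texture with 4 mips is subresource (4 * 3) + 2 = 14.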

#5220760 Difference between SDSM and PSSM?

Posted by MJP on 01 April 2015 - 12:19 PM

SDSM works by rasterizing to a depth buffer on the GPU, and then using a compute shader to analyze the depth samples in order to come up with optimal projections for the shadow map splits. This can be much more accurate than using object bounding boxes, especially when you consider that it handles occlusion.

The SDSM paper proposes a few different techniques. The simplest one is to just compute the min and max Z values visible to the camera using the depth buffer, which you can then use to compute optimal split distances. It also proposes taking things a step further by transforming every depth buffer position into the local space of the directional light, and then fitting a tight AABB per split.
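A CPU-side sketch of the simplest variant (hypothetical helper; the real thing would be a parallel reduction over the depth buffer in a compute shader):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Find the min/max depth actually covered by visible geometry. Samples
// at the far plane (depth == 1.0) are treated as sky and skipped, so
// the resulting range can be used to compute tight split distances.
inline std::pair<float, float> reduceDepth(const std::vector<float>& depthSamples)
{
    float zMin = 1.0f;
    float zMax = 0.0f;
    for (float d : depthSamples)
    {
        if (d >= 1.0f)
            continue;
        zMin = std::min(zMin, d);
        zMax = std::max(zMax, d);
    }
    return { zMin, zMax };
}
```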

#5220316 Mip-chain generation compute shader

Posted by MJP on 30 March 2015 - 04:56 PM

That's all legal. Read/write hazards are tracked per-subresource, so you can read from one mip level and write to another. I would just make sure that you run the code with the debug layer active: it will emit an error message if you accidentally introduce a read/write hazard.

#5220097 The Atomic Man: Are lockless data structures REALLY worth learning about?

Posted by MJP on 29 March 2015 - 11:30 PM

Just wanted to quickly give a +1 to what Hodgman mentioned in the second half of his post: you can often get better performance *and* have fewer bugs by removing the need for mutable shared resources.

#5220072 Copy 2D array into Texture2D?

Posted by MJP on 29 March 2015 - 08:09 PM

How is it crashing, and where? When you get the crash while running in the debugger, you should be getting some information on the unhandled exception that led to the crash. You should also be able to break into the debugger, and find out the callstack of the thread that encountered the exception. These two things can potentially give you valuable information for tracking down the crash.

#5220046 Copy 2D array into Texture2D?

Posted by MJP on 29 March 2015 - 05:24 PM

First of all...are you looking to create a static texture that you initialize once and use for a long time? Or are you looking to create a dynamic texture, whose contents are updated frequently with new data from CPU memory? You're setting things up for the latter case, and I just want to make sure that this is what you intended.

Those last 4 lines, where you map the texture and update its contents, are definitely wrong. You're performing a memcpy using a pointer to your D3D11_SUBRESOURCE_DATA struct as your source, and then copying "width" bytes. This isn't copying the right data, and will surely result in the memcpy reading garbage data off the stack. In fact you don't need that D3D11_SUBRESOURCE_DATA struct at all; that's only something that you use for initializing a texture with data. You're also using the address of mappedResource.pData as your destination, which means you're passing a pointer to a pointer to your mapped data, rather than passing a pointer to your mapped data. This will result in your memcpy stomping over your stack, which will result in very bad things happening.

You want something like this:


ZeroMemory(&desc2, sizeof(desc2));
desc2.Width = width;
desc2.Height = height;
desc2.ArraySize = 1;
desc2.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
desc2.Usage = D3D11_USAGE_DYNAMIC;
desc2.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
desc2.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc2.MipLevels = 1;
desc2.SampleDesc.Count = 1;
desc2.SampleDesc.Quality = 0;
device->CreateTexture2D(&desc2, NULL, &textureBuffer);
const uint32_t texelSize = 16;  // size of DXGI_FORMAT_R32G32B32A32_FLOAT
D3D11_MAPPED_SUBRESOURCE mappedResource;
deviceContext->Map(textureBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
uint8_t* dstData = reinterpret_cast<uint8_t*>(mappedResource.pData);
const uint8_t* srcData = reinterpret_cast<const uint8_t*>(grid);
for(uint32_t i = 0; i < height; ++i)
{
    // Copy one row at a time: RowPitch may be larger than width * texelSize
    memcpy(dstData, srcData, width * texelSize);
    dstData += mappedResource.RowPitch;
    srcData += width * texelSize;
}
deviceContext->Unmap(textureBuffer, 0);

#5218986 Documentation for PSGL and GNMX

Posted by MJP on 25 March 2015 - 12:32 AM

The GNM/GNMX docs, like any other console-specific documentation, are not public and are only available to registered developers who sign NDAs. The same goes for PSGL and PSSL (PSGL was an incomplete OpenGL implementation for PS3, and PSSL is the official shading language for PS4).


Unfortunately I am covered by said NDAs, so I can't really tell you anything about them specifically. I will just say that, in general, console API's can be really nice since they're tailored to exactly one GPU, and can therefore expose it completely (although this can sometimes make things a bit harder, since details that are normally abstracted away by PC API's instead have to be managed manually).

#5218553 Particle Z Fighting

Posted by MJP on 23 March 2015 - 01:05 PM

The order in which elements are added to an append buffer is non-deterministic, since it uses a global atomic increment under the hood. If you're using standard alpha blending, you'll need to sort your particles by Z order if you want to get the correct result.
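A minimal CPU-side sketch of the sort (with a hypothetical Particle type carrying view-space depth):

```cpp
#include <algorithm>
#include <vector>

struct Particle
{
    float viewZ;  // distance from the camera along the view direction
    // ... position, color, etc.
};

// Back-to-front sort for standard ("over") alpha blending: the farthest
// particles must be drawn first so that nearer ones blend on top.
inline void sortBackToFront(std::vector<Particle>& particles)
{
    std::sort(particles.begin(), particles.end(),
              [](const Particle& a, const Particle& b)
              { return a.viewZ > b.viewZ; });
}
```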

#5218343 Link shader code to program

Posted by MJP on 22 March 2015 - 04:21 PM

Most rendering engines treat their shaders as data, similar to how you would treat textures or meshes. For rendering API's that support offline compilation (D3D, Metal) this data will consist of pre-compiled bytecode, and for GL the data will be GLSL source code (typically generated from HLSL or another meta-language, with optimizations and pre-processing already applied).
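Loading that data at runtime can be as simple as reading the pre-compiled blob off disk (a hypothetical helper, not from any particular engine):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read a pre-compiled shader bytecode file (e.g. a .cso produced by an
// offline compile) into memory, ready to hand to the API's shader
// creation function.
inline std::vector<char> loadShaderBytecode(const std::string& path)
{
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
        return {};
    const std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> bytecode(static_cast<size_t>(size));
    file.read(bytecode.data(), size);
    return bytecode;
}
```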


For simple projects embedding shaders right in the executable can be convenient, but it doesn't scale up very well. For GL it's also pretty annoying to write and edit your shaders inside of string literals.