
#5200384 Multi-threading for performance gains

Posted by MJP on Today, 01:21 AM

Ditto on what Hodgman said. Big commercial games have been moving en masse to job-based systems, where you chunk up your work into discrete tasks and then let a scheduler throw them on your available cores. Doing a coarse-grained setup where you assign major sub-systems to persistent threads can give you a lot of scheduling problems that lead to idle cores.
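
As a rough illustration of the job-based idea, here is a minimal sketch in standard C++. The name RunJobs and the shared-counter design are assumptions made for this example, not something from the thread; a production job system would also handle dependencies, priorities, and persistent worker threads.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs every job in `jobs` on `workerCount` threads.
// Workers pull the next job index from a shared atomic counter, so a core
// that finishes early immediately picks up more work instead of idling.
inline void RunJobs(const std::vector<std::function<void()>>& jobs,
                    unsigned workerCount)
{
    if (workerCount == 0)
        workerCount = 1;

    std::atomic<std::size_t> nextJob{0};

    auto worker = [&jobs, &nextJob]() {
        for (;;) {
            // Claim the next unclaimed job; stop when none are left.
            const std::size_t index = nextJob.fetch_add(1);
            if (index >= jobs.size())
                return;
            jobs[index]();
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(workerCount);
    for (unsigned i = 0; i < workerCount; ++i)
        threads.emplace_back(worker);
    for (std::thread& t : threads)
        t.join();
}
```

Because the work is split into many small jobs rather than one thread per sub-system, the load balances itself as long as there are more jobs than cores.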

#5200383 A few quick questions about compute shaders

Posted by MJP on Today, 01:11 AM

If you create a hardware device (by passing D3D_DRIVER_TYPE_HARDWARE when creating the device), then there will be no software emulation/fallback. The hardware will either support the feature or it won't. Windows does provide a software implementation called the WARP driver, but you have to specifically request it when creating your device (by passing D3D_DRIVER_TYPE_WARP). There's also the reference device, which also runs in software, but it's slow beyond belief because it's intended for correctness and not performance. You also have to specifically request that, by passing D3D_DRIVER_TYPE_REFERENCE.


The simplest way to ensure compute shader functionality is to just require FEATURE_LEVEL_11_0 support from the GPU. It's easy to do this: when you call D3D11CreateDevice, just check the returned feature level (from the pFeatureLevel parameter) and then make sure that it's >= D3D_FEATURE_LEVEL_11_0. If that's the case, then you're guaranteed cs_5_0 support from the hardware. D3D_FEATURE_LEVEL_11_0 corresponds to all of the video cards that were marketed as "DX11-capable", which started with the AMD HD 5000 series and the Nvidia GTX 400 series. Technically there are FEATURE_LEVEL_10_0 and FEATURE_LEVEL_10_1 (DX10/DX10.1) video cards out there that supported the stripped down cs_4_0/cs_4_1 profiles, but support is optional and so you have to query the device at runtime to find out if you can use it (using ID3D11Device::CheckFeatureSupport and passing D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS). Honestly though cs_4_x is really limited in what it can do, and isn't really worth the effort. 

#5199344 Current-Gen Lighting

Posted by MJP on 20 December 2014 - 06:55 PM

Most games use their specular BRDF with analytical direct light sources (point lights, or area lights with approximations) and then use another means for applying lighting from more complex emissive and bounce lighting. At the moment pre-convolved cubemaps are pretty popular for the latter part. It's absolutely true that you can't actually pre-integrate your BRDF with a lighting environment and store the result in a cubemap, so games resort to approximations. See Epic's course notes from SIGGRAPH 2013 for what's considered "state-of-the-art" in this regard, and then see the Frostbite presentation from 2014 for some updates. For our game we still use cubemaps for high-gloss materials, but have a different solution for everything else (which we will share hopefully in the near future).

#5197733 sRGB and deferred rendering

Posted by MJP on 12 December 2014 - 01:19 AM

Now granted I haven't tried this so perhaps it doesn't work in practice, but isn't it possible to have two render targets to the same texture, one render target with sRGB conversion and one without? i.e., one RT with DXGI_FORMAT_R8G8B8A8_UNORM and one with DXGI_FORMAT_R8G8B8A8_UNORM_SRGB - drawing the HUD using the former, and doing linear->sRGB using the latter?


Yes, you can do this. You just need to create the Texture2D with the associated TYPELESS format, and then specify the appropriate UNORM/SRGB format when creating shader resource views or render target views.

#5196714 Create Geometry Shader with Stream Output

Posted by MJP on 06 December 2014 - 05:32 PM

Did you create your device with the D3D11_CREATE_DEVICE_DEBUG flag enabled? If you do that, you'll get error messages in the debugger output telling you why API calls failed.

#5195451 MSAA with non-msaa data

Posted by MJP on 30 November 2014 - 01:32 AM

You can manually fill your MSAA depth buffer with the contents of your non-MSAA depth buffer. The easiest way to do this is with a full screen pixel/fragment shader that writes directly to your MSAA depth buffer. Basically you have each pixel sample the depth value from your non-MSAA depth buffer, and then output it using SV_Depth/gl_FragDepth (there's no need to output a color). You can then use your MSAA depth buffer for rendering your HUD.


The major caveat is that this is basically doing an upscale of your depth data with point sampling. Consequently, you won't actually have any sub-pixel data in your MSAA depth buffer after filling it with the contents of your non-MSAA depth buffer. This means that your HUD won't get antialiased results for pixels where your lines intersect with your actual scene geometry. 

#5195305 EVSM is the best needed ?

Posted by MJP on 28 November 2014 - 10:02 PM

There's a lot of factors that go into determining whether a particular shadowing technique is really "the best", and those factors are usually different for every project. Otherwise everyone would agree, and we would be using the same thing in every game. It's going to be up to you to weigh the pros and cons of different approaches, and try to decide what's actually the best fit for your project and target hardware.


In general, "standard" depth buffer shadow maps with various forms of PCF are still the most popular choice for games. They're cheap to render, easy to set up, they're supported on a very wide range of hardware, and they have well understood flaws (mostly related to filtering and biasing).


VSM's primary advantage over standard shadow maps is that they're pre-filterable. Unlike depth buffer shadow maps, where you have to perform the depth comparison before filtering, you can filter VSM's as soon as you have them in two-component VSM format. This opens the door to things like MSAA, mipmaps, and separable blur passes. This can not only give you better quality, but can also possibly make things cheaper relative to standard shadow maps (this is especially true if you cache your shadow maps across frames).

Their other main advantage is that they are much easier to bias compared to standard shadow maps. With VSM it's possible to pick a single "magic" value that will work across a wide range of conditions, whereas with standard shadow maps you typically need artist-authored offsets combined with complex techniques that adjust the bias based on the sample position and receiver slope.

The main disadvantages are light bleeding (which I'm sure that you're already familiar with), and the need to convert from "standard" depth into the variance format. Light bleeding can be reduced with a few tricks, but it will always be present to a certain extent for certain occluder/receiver configurations. The conversion requires either using a pixel shader when rendering to the shadow map, or having a conversion step after you've finished rendering to a depth buffer. The conversion can possibly be rolled into an MSAA resolve or the filtering step, if you use those things. You might see additional memory storage brought up as a concern for VSM, but in practice I've found that using an R16G16_UNORM format provides completely adequate precision, with the same footprint as a 32-bit depth buffer.
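
To make the "pre-filterable" point concrete, here is a small CPU-side sketch of the usual VSM math: store depth and depth squared, filter those freely, then apply the one-sided Chebyshev bound at lookup time. The names Moments, DepthToMoments, Average, and ChebyshevUpperBound are made up for this example, and minVariance stands in for the single "magic" bias value mentioned above.

```cpp
#include <algorithm>

// The two values stored per texel: E[d] and E[d^2].
struct Moments {
    float mean;
    float meanSq;
};

// Conversion from "standard" depth into the variance format.
inline Moments DepthToMoments(float depth)
{
    return {depth, depth * depth};
}

// Moments can be averaged like any other texture data. This is what makes
// blurring, mipmapping, and MSAA resolves valid before the shadow test.
inline Moments Average(Moments a, Moments b)
{
    return {0.5f * (a.mean + b.mean), 0.5f * (a.meanSq + b.meanSq)};
}

// Upper bound on the fraction of the filtered region that is lit
// for a receiver at `receiverDepth`.
inline float ChebyshevUpperBound(Moments m, float receiverDepth, float minVariance)
{
    if (receiverDepth <= m.mean)
        return 1.0f;  // receiver is in front of the average occluder

    // Clamping the variance acts as the bias term.
    const float variance = std::max(m.meanSq - m.mean * m.mean, minVariance);
    const float d = receiverDepth - m.mean;
    return variance / (variance + d * d);
}
```

For example, averaging occluders at depths 0.2 and 0.6 gives a mean of 0.4 and a variance of 0.04, so a receiver at depth 0.8 gets a bound of 0.04 / (0.04 + 0.16) = 0.2.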

EVSM is much like VSM, except it attempts to address light bleeding by using an exponential warp. This warp can be very effective at reducing or eliminating light bleeding in most cases, but it won't fix all cases. Since it's an exponential warp you're pretty much required to use a floating-point format, which can quickly leave you with precision problems if you're not careful. 32-bit floats will give the best results, but you can get away with 16-bit if you're very careful about restricting your depth ranges and also use a more conservative warping factor. However you really need to include the negative warp for best results, so at best you're looking at R16G16B16A16_FLOAT which is double the footprint of a 32-bit depth buffer or a VSM texture. For maximum quality, you'll want 32-bit floating point which means 4x the footprint of a standard shadow map.
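
The warp itself can be sketched as follows. This assumes depth in [0, 1] is first remapped to [-1, 1], and the names EvsmMoments, WarpDepth, and MaxWarpExponent are hypothetical. The second helper shows why 16-bit storage forces a more conservative warping factor: the squared moment reaches exp(2c), which has to stay below the largest finite value of the format (65504 for half floats).

```cpp
#include <cassert>
#include <cmath>

// The four values stored per texel when both warps are used.
struct EvsmMoments {
    float pos;    //  exp( cPos * d)
    float posSq;  //  pos * pos
    float neg;    // -exp(-cNeg * d)
    float negSq;  //  neg * neg
};

// Largest exponent for which exp(2 * c) still fits in a format whose
// largest finite value is `formatMax`.
inline float MaxWarpExponent(float formatMax)
{
    return 0.5f * std::log(formatMax);
}

// Applies the positive and negative exponential warps to a depth in [0, 1].
inline EvsmMoments WarpDepth(float depth, float posExponent, float negExponent)
{
    assert(depth >= 0.0f && depth <= 1.0f);
    const float d = depth * 2.0f - 1.0f;  // assumed remap to [-1, 1]
    const float pos = std::exp(posExponent * d);
    const float neg = -std::exp(-negExponent * d);
    return {pos, pos * pos, neg, neg * neg};
}
```

Both warps are monotonically increasing in depth, so the filtered moments can be tested with the same Chebyshev bound used for VSM, once per warp, keeping the smaller of the two results.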

Having done a lot of work with EVSM myself, I will say that it's definitely a viable approach for higher-spec hardware that has good support for floating-point formats (for instance DX11 PC video cards, or current-gen consoles). If you're targeting more modest hardware, then you're probably better off sticking to standard shadow maps or VSM. I don't currently know of any shipping games or games in development that use EVSM, with the exception of the game that I'm working on (The Order: 1886). So when it comes out, you can have a look and judge the quality for yourself.

#5195112 Putting a calculation in the vertex shader vs. pixel shader

Posted by MJP on 28 November 2014 - 12:08 AM

What mhagain said is exactly right: if you move a calculation to the vertex shader, you're only going to get the same result if the value is the result of a linear function. So for your attenuation example, it would depend on how you calculate your attenuation factor. If your factor is linear like this:

float attenuation = (100.0f - lightDistance);

then you can use linear interpolation and get the same result. If it's non-linear, like this:

float attenuation = 1.0f / (lightDistance * lightDistance);

then interpolation won't give you the same results.
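
A quick numeric check of the two cases, written as plain C++ rather than shader code. It assumes lightDistance itself is the value being interpolated across the primitive and ignores perspective correction; Lerp, LinearAttenuation, and InverseSquareAttenuation are names made up for this sketch.

```cpp
#include <cmath>

// What the rasterizer does with a per-vertex output (perspective ignored).
inline float Lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

inline float LinearAttenuation(float lightDistance)
{
    return 100.0f - lightDistance;
}

inline float InverseSquareAttenuation(float lightDistance)
{
    return 1.0f / (lightDistance * lightDistance);
}

// Per-vertex evaluation followed by interpolation, sampled at parameter t.
template <typename Fn>
inline float PerVertex(Fn attenuation, float d0, float d1, float t)
{
    return Lerp(attenuation(d0), attenuation(d1), t);
}

// Interpolation of the distance followed by per-pixel evaluation.
template <typename Fn>
inline float PerPixel(Fn attenuation, float d0, float d1, float t)
{
    return attenuation(Lerp(d0, d1, t));
}
```

With vertex distances of 1 and 3, halfway between them the linear function gives 98 both ways. The inverse-square function gives (1 + 1/9) / 2, roughly 0.556, when computed per vertex, but 1/4 when computed per pixel.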

#5192784 One back buffer for several swap chains

Posted by MJP on 13 November 2014 - 09:42 PM

Every swap chain has its own backbuffer; there's no way to share backbuffers between multiple swap chains.

#5191996 Generating mipmaps for depth texture?

Posted by MJP on 10 November 2014 - 01:05 AM

Like Hodgman mentioned, it's probably not useful to compute the average depth value for a neighborhood of texels in a depth buffer. Computing SSAO(Avg(Depth(x, y))) gives you a much different result than Avg(SSAO(Depth(x, y))). Computing the min and max depth can be useful for improving the performance of general heightfield intersections, but that may or may not be useful for SSAO. A lot of games that perform SSAO and other effects at a lower resolution will just point sample the depth buffer.
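
For reference, a min/max reduction of one mip level might look like the sketch below. It runs on the CPU over a row-major float array and assumes even dimensions; MinMax and DownsampleMinMax are hypothetical names, and a real implementation would normally do this in a pixel or compute shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct MinMax {
    float minDepth;
    float maxDepth;
};

// Reduces a width x height depth buffer to (width/2) x (height/2),
// keeping the min and max of every 2x2 block instead of the average.
inline std::vector<MinMax> DownsampleMinMax(const std::vector<float>& depth,
                                            std::size_t width,
                                            std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    assert(depth.size() == width * height);

    std::vector<MinMax> result;
    result.reserve((width / 2) * (height / 2));

    for (std::size_t y = 0; y < height; y += 2) {
        for (std::size_t x = 0; x < width; x += 2) {
            const float a = depth[y * width + x];
            const float b = depth[y * width + x + 1];
            const float c = depth[(y + 1) * width + x];
            const float d = depth[(y + 1) * width + x + 1];
            result.push_back({std::min({a, b, c, d}), std::max({a, b, c, d})});
        }
    }
    return result;
}
```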

#5191622 Datatype Size and Struct Compiler Padding

Posted by MJP on 07 November 2014 - 12:26 AM

At our studio we enable compiler padding warnings to make sure that we're explicitly accounting for any padding in our serialized data structures. For Visual C++ it's warning 4820, and for gcc/clang it's -Wpadded.


We generally remove padding whenever we can, but for places where we can't we have a special templated type where you can specify the padding size to use. This padding is then initialized with all 0's in the constructor. Doing this makes sure that data produced by our content build system is deterministic, which is important if you're hashing your data.
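
A minimal sketch of what such a padding type could look like. This is a guess at the general shape, not the studio's actual code; Padding and MeshHeader are names invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Explicit padding of N bytes that is always zero-filled, so two structs
// with equal fields are also equal byte-for-byte.
template <std::size_t N>
struct Padding {
    std::uint8_t bytes[N];
    Padding() : bytes{} {}  // value-initializes every byte to 0
};

// Hypothetical serialized header. Without `pad`, the compiler would insert
// 3 unnamed bytes after `flags` (and warn under C4820 / -Wpadded).
struct MeshHeader {
    std::uint32_t vertexCount = 0;
    std::uint8_t  flags = 0;
    Padding<3>    pad;
    std::uint32_t indexCount = 0;
};

// If this fails, the compiler added padding that was not accounted for.
static_assert(sizeof(MeshHeader) == 12, "unexpected implicit padding");

// Byte-wise comparison is only meaningful because no byte is left
// uninitialized; the same property makes hashing deterministic.
inline bool SameBytes(const MeshHeader& a, const MeshHeader& b)
{
    return std::memcmp(&a, &b, sizeof(MeshHeader)) == 0;
}
```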

#5191424 Multipass rendering with depth compare EQUAL

Posted by MJP on 05 November 2014 - 04:17 PM

At work that is exactly how we handle alpha tested geometry: alpha test during the depth prepass, EQUAL depth testing during the main forward pass (we have a Forward+-like setup). This has always worked for us in DX11, although we only run on Nvidia hardware so I can't say that I've tested it on a wide range of GPU's. You definitely need to make sure that your position calculations exactly match in both vertex shaders, although it seems you've already considered that. Have you double-checked the resulting shader assembly from both shaders to ensure that they're using the same sequence of instructions to compute the position? You'll especially want to check for any usage of the mad instruction, since that will have different precision relative to issuing separate multiply + add instructions. If that is the case, then you can try using the precise keyword to force the compiler to be more strict about the instructions that it uses.
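
The same kind of mismatch can be reproduced on the CPU, which may help show why an EQUAL depth test is so sensitive to it. The sketch below uses std::fma as a stand-in for a fused mad; that is an analogy only, since how a given GPU rounds mad is not something this example can demonstrate. MulThenAdd and FusedMulAdd are made-up names.

```cpp
#include <cmath>

// Separate multiply and add: the product is rounded to float first.
// The volatile store keeps the compiler from fusing the two operations.
inline float MulThenAdd(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
inline float FusedMulAdd(float a, float b, float c)
{
    return std::fma(a, b, c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding the product to float drops the 2^-24 term, so the separate version returns 0, while the fused version returns 2^-24. A difference of that size is enough to fail an exact depth comparison.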

#5190270 Best book for Shadow and Lighting ?

Posted by MJP on 30 October 2014 - 07:50 PM

As others have mentioned, Real-Time Shadows gives a great overview of modern shadowing techniques, and Real-Time Rendering has a lot of the basic theory that's necessary for understanding physically-based shading. If you're looking for a glimpse into how physically based shading is being used in the latest games and movies, then I would suggest reading through the slides and course notes from the physically based shading course at SIGGRAPH (2012, 2013, 2014).

#5189462 DXGI_USAGE_UNORDERED_ACCESS and RWTexture2D in pixel shader?

Posted by MJP on 27 October 2014 - 12:52 PM

Yeah, that's a really annoying limitation of RWTexture2D: it can only read from R32_FLOAT, R32_UINT, and R32_SINT formats. It comes from early D3D11 hardware that had bad support for UAV's.

For the case of an R8G8B8A8_UNORM texture, you can work around this by aliasing the texture as R32_UINT and then manually performing the UNORM conversion in the shader. There's documentation on how to do this here.
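
The manual conversion is just bit shifting plus a scale by 255. Here is a sketch of the math in C++ (in a shader the same operations would be written in HLSL). Float4, UnpackRGBA8Unorm, ToUnorm8, and PackRGBA8Unorm are names made up for this example, and the layout assumes red sits in the lowest 8 bits.

```cpp
#include <algorithm>
#include <cstdint>

struct Float4 {
    float r, g, b, a;
};

// R32_UINT -> four normalized floats, red in the low byte.
inline Float4 UnpackRGBA8Unorm(std::uint32_t packed)
{
    return {
        static_cast<float>(packed & 0xFFu) / 255.0f,
        static_cast<float>((packed >> 8) & 0xFFu) / 255.0f,
        static_cast<float>((packed >> 16) & 0xFFu) / 255.0f,
        static_cast<float>((packed >> 24) & 0xFFu) / 255.0f,
    };
}

// Float -> 8-bit UNORM: clamp to [0, 1], scale, round to nearest.
inline std::uint32_t ToUnorm8(float value)
{
    const float clamped = std::min(std::max(value, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(clamped * 255.0f + 0.5f);
}

// Four normalized floats -> R32_UINT with the same byte layout.
inline std::uint32_t PackRGBA8Unorm(Float4 color)
{
    return ToUnorm8(color.r)
         | (ToUnorm8(color.g) << 8)
         | (ToUnorm8(color.b) << 16)
         | (ToUnorm8(color.a) << 24);
}
```

Reading a texel then becomes a load of the R32_UINT value followed by the unpack, and writing is the pack followed by a store.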

#5189331 Mac or PC - Really, this is a programming question.

Posted by MJP on 26 October 2014 - 10:36 PM

At home I use Windows 8 with Visual Studio 2013, and at work I use Windows 7 with Visual Studio 2012 (although I do a lot of coding in Sublime). The current-gen consoles both use Windows + Visual Studio as the development environment, so if you're going to work on them then you at least need access to a PC that runs Windows.