


#5210216 Particle and Mipmapping

Posted by MJP on 12 February 2015 - 02:16 AM

Alpha testing is always going to give you funky, aliased results when you undersample the texture map. Mipmapping just plain doesn't work, because you can't pre-filter the results of the comparison and the subsequent depth testing/rasterization. Basically what you end up with is AlphaTest(Filter(TextureAlpha)) when what you really want is Filter(AlphaTest(TextureAlpha)) instead (FYI, this is almost exactly the same problem that you have with filtering shadow maps). Performing texture filtering and mipmapping on the alpha value will give you a smoothed alpha value in the shader, but it won't accurately represent the shape you're trying to capture. This can get really bad in the lower mip levels, where the averaging of neighboring alpha values can cause features to essentially "disappear". You can alleviate the problem somewhat by increasing your sampling rate, which requires enabling MSAA, taking multiple samples from your alpha texture, and then using the alpha test results for each sample to generate an output coverage mask.
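To make the ordering problem concrete, here's a tiny Python sketch (the values and function names are mine, purely illustrative): filtering two texels and then applying the test gives a hard pass/fail, while testing each texel first and then filtering gives the fractional coverage you actually want.

```python
# Illustrative sketch: filtering and alpha testing don't commute.

ALPHA_REF = 0.5  # hypothetical alpha-test threshold

def alpha_test(a):
    """1.0 if the fragment passes the test, else 0.0."""
    return 1.0 if a >= ALPHA_REF else 0.0

def filter2(a0, a1, t=0.5):
    """Linear interpolation, standing in for bilinear texture filtering."""
    return a0 * (1.0 - t) + a1 * t

# Two neighboring texels: one fully opaque, one fully transparent.
a0, a1 = 1.0, 0.0

# What the hardware gives you: AlphaTest(Filter(TextureAlpha)).
filtered_then_tested = alpha_test(filter2(a0, a1))

# What you actually want: Filter(AlphaTest(TextureAlpha)).
tested_then_filtered = filter2(alpha_test(a0), alpha_test(a1))

print(filtered_then_tested)  # 1.0: a hard all-or-nothing result
print(tested_then_filtered)  # 0.5: the half-covered edge you wanted
```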


A common, much cheaper method to work around this issue is to change the way you generate mipmaps for the alpha channel such that you end up with something that's a better representation of the overall coverage of the shape that you're trying to represent. This article explains a simple way to do that. It's not always a silver-bullet solution, but it can certainly help in a lot of cases.
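The rough idea behind that kind of coverage-preserving mip generation can be sketched like this (a toy Python version of my own; the names and the binary search are illustrative, not taken from the article): measure the fraction of texels that pass the alpha test at mip 0, then rescale each lower mip's alpha values until its coverage matches.

```python
# Toy sketch of coverage-preserving alpha mipmaps (illustrative only).

ref = 0.5  # hypothetical alpha-test threshold

def coverage(alphas, ref):
    """Fraction of texels whose alpha passes the test at threshold `ref`."""
    return sum(1 for a in alphas if a >= ref) / len(alphas)

def find_alpha_scale(mip_alphas, target_coverage, ref, steps=64):
    """Binary-search a multiplier that brings this mip's coverage back up
    to the coverage measured at mip 0."""
    lo, hi = 0.0, 8.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if coverage([min(a * mid, 1.0) for a in mip_alphas], ref) < target_coverage:
            lo = mid   # too few texels pass: scale alpha up
        else:
            hi = mid   # enough texels pass: try a smaller scale
    return hi          # hi always satisfies the coverage target

# A shape with thin features: alternating mostly-opaque/transparent texels.
mip0 = [0.9, 0.0, 0.8, 0.1, 0.7, 0.0, 0.6, 0.1]
target = coverage(mip0, ref)   # 0.5

# Plain box filtering pulls everything toward the mean...
mip1 = [0.5 * (mip0[i] + mip0[i + 1]) for i in range(0, len(mip0), 2)]
# ...and the features "disappear": nothing in mip1 passes the test anymore.

scale = find_alpha_scale(mip1, target, ref)
scaled = [min(a * scale, 1.0) for a in mip1]
print(coverage(mip1, ref), coverage(scaled, ref))  # 0.0 0.5
```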

#5210188 A question on texture sampling

Posted by MJP on 11 February 2015 - 09:52 PM

You absolutely can.  One such way would be to use SampleLevel.


FYI, the D3D9 version of that function is tex2Dlod.

#5210187 D3DCOMPILER_47.dll ?

Posted by MJP on 11 February 2015 - 09:49 PM

I've definitely run into some cases where the old _43 compiler would take minutes to compile a complex compute shader that takes only a few seconds with _46 or _47.

#5209756 Does gpu guaranties order of execution?

Posted by MJP on 10 February 2015 - 01:45 AM

Let me try to give you a TL;DR version from a DX11 point of view:

  • When you draw triangles, the value output from the pixel shader (using SV_Target) will be written to your render target in the order of triangle submission. 
  • The actual pixel shader threads themselves will generally not execute in triangle order. So if you use a UAV to make arbitrary memory writes, those writes probably won't be ordered. Same goes for compute shader threads within a single dispatch.
  • If you split things into separate Draw or Dispatch calls, the driver will be forced to sync and flush such that the writes from dispatch A get written to memory before dispatch B. This allows dispatch B to use the results from dispatch A.

In case you're curious, the way that most desktop GPUs enforce render target write ordering is by having special hardware in the ROPs, which handle memory read/write operations for render targets.

#5209541 Sky box rendering - Any downside to depthfunc LESS_EQUAL vs LESS?

Posted by MJP on 09 February 2015 - 01:03 AM

To me it sounds like a bad idea to pick a depth state for everything based on just your skydome. I would just set the depth state explicitly when rendering your skydome, and that way you can use whatever makes the most sense. With skydomes/skyboxes it's also typical to do things like setting a viewport such that everything is forced to the z=1 plane, and so you may find that you want to set some other custom render states anyway. On top of that, as you already mentioned, you always want to draw your skydome last, and so you may as well bucket it differently compared to all of your other geo.


As for what mesh you use, it really doesn't matter at all if you're using a cubemap. For that, all that matters is that you sample the cubemap with the view direction, and you can do that with just a full-screen quad if you really want. However some games will use standard 2D sky textures instead of cubemaps, and in those cases it usually makes sense to use an actual dome-shaped mesh that has the appropriate UVs to match how the texture was authored.

#5209208 Render from tbuffer to tbuffer?

Posted by MJP on 06 February 2015 - 08:22 PM

By far the easiest way to do this would be to use structured buffers instead of tbuffers, and then write a compute shader to do the matrix multiply. A structured buffer will give you the same performance characteristics, except with much more flexibility. They're also easier to work with, IMO.

#5209126 Does stream-out maintain ordering?

Posted by MJP on 06 February 2015 - 02:09 PM

Yes, it's required to be in order. That's actually one of the reasons why it can be kind of slow on certain hardware.

#5208979 PBR diffuse textures in UE4

Posted by MJP on 05 February 2015 - 07:30 PM

FWIW, this is kinda wasteful, because we're dedicating bits to storing values <50 and >244, and then telling artists to never use those bits...
CryEngine has a feature where the artists can author their textures in high precision 16-bit, and then the engine's tools identify the darkest and lightest colours, remapping the darkest colour to 0 and the lightest colour to 255 (so that there's no wasted bits in the 8 bit version), and then storing offset = darkest, multiplier = 1/(lightest-darkest) alongside the texture, so the shader can reconstruct the original colours.


Yeah, we do that too. It can make a pretty big difference for certain textures. For instance if you start using roughness/gloss maps everywhere, you'll often have a lot of materials that don't span the entire roughness range and so you can allocate your available precision accordingly. One funny thing that came out of that was that "solid" textures would get completely optimized out of the shader, since we generated shader code for every material and the offset/multiplier were fixed constants.
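As a rough illustration of the offset/multiplier scheme described above (hypothetical numbers and function names, not actual engine code): find the min/max of the source data, remap to the full 8-bit range so no codes are wasted, and store (offset, multiplier) so the shader can reconstruct the original values.

```python
# Illustrative sketch of range-remapped 8-bit quantization.

def quantize_with_range(values):
    """Remap `values` (floats in [0, 1]) to 8-bit, wasting no codes."""
    lo, hi = min(values), max(values)
    scale = hi - lo if hi > lo else 1.0
    texels = [round((v - lo) / scale * 255) for v in values]
    # Constants stored alongside the texture for shader-side reconstruction:
    offset, multiplier = lo, scale
    return texels, offset, multiplier

def reconstruct(texel, offset, multiplier):
    """What the shader does: value = texel/255 * multiplier + offset."""
    return texel / 255.0 * multiplier + offset

gloss = [0.30, 0.35, 0.40, 0.45]          # material with a narrow gloss range
texels, offset, mult = quantize_with_range(gloss)
print(texels)                              # [0, 85, 170, 255]
roundtrip = [reconstruct(t, offset, mult) for t in texels]
print([round(v, 3) for v in roundtrip])    # [0.3, 0.35, 0.4, 0.45]
```

Note that a "solid" texture (all values equal) collapses to a constant offset with a degenerate range, which is exactly why those textures could be optimized out of the generated shader code entirely.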

#5208475 ShadowMapping in the year 2015?

Posted by MJP on 03 February 2015 - 04:25 PM

No one likes VSM anymore, shadow leaking sucks and evsm is very expensive to fix it

The Order is shipping with EVSM, by the way. :)

#5208293 3D video game in C++ with OpenGL and DirectX 10

Posted by MJP on 02 February 2015 - 05:11 PM

I'm honestly not really sure what you're asking here. Are you looking for help with finding tools that you can use to develop games?

#5208130 D3DCOMPILER_47.dll ?

Posted by MJP on 01 February 2015 - 09:25 PM

The D3DCompiler DLL is included as part of the Windows SDK folder. If you browse to C:\Program Files (x86)\Windows Kits\8.1\bin\x64 on your development PC you'll find the DLL, along with the matching version of fxc.exe. You can just copy it into your project folder if you'd like. Just make sure that if you distribute your program, you also include the DLL for people running Windows 7 or Windows 8.


How do you setup that to work? Is it possible to take the shader binaries and add them as static linked resources so building a Release will add them automatically inside the executable?


See the docs here and here. Basically you just add the shader source file to your project, and Visual Studio will execute a custom build step that runs fxc.exe and produces a .cso file containing the compiled bytecode. It's possible to then include that .cso file as an embedded resource, but an easier option might be to use the /Fh option of fxc.exe. This outputs a C++ header file that declares an array containing the bytecode.

#5207967 "bind slots vs. registers" concern

Posted by MJP on 31 January 2015 - 06:23 PM

The HLSL compiler accepts bogus register assignments, and will just silently ignore them. So in your first example it will ignore the "c0" assignment, and just assign it the first available t# register (which would be t1 in your case).


Here's a quick cheat sheet for register assignments in D3D11:


  • t# - shader resource views (Texture2D/Buffer/StructuredBuffer/ByteAddressBuffer/etc.)
  • b# - constant buffers (cbuffer)
  • u# - unordered access views (RWTexture2D/RWBuffer/RWStructuredBuffer/etc.)
  • c# - manual variable offsets within a constant buffer

#5207811 Parallax Corrected Cube Maps

Posted by MJP on 31 January 2015 - 01:34 AM

Any sort of complex environment is always going to be difficult to represent with simple proxy geo. Most games that use this technique just aim to capture the overall shape of the room or environment, and accept the error for everything else. I don't think that I've ever seen anyone try to address the particular situation you've described, although it's possible that racing games have done something special for that case.

Are you actually trying to handle the case with the 4 cars, or are you just using that as a hypothetical scenario? If you're really set on trying to handle arbitrarily complex geometry and you have some memory and performance to spare, then you could try capturing depth along with color for each cubemap. You could then treat the depth as a heightfield, and raymarch through it to find the intersection point. Of course this will come with performance overhead, and you'll run into the typical undersampling artifacts.
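As a toy illustration of the depth-as-heightfield idea (a 1D Python sketch of my own construction, not from any shipping implementation): step a ray through a stored height array and return the first sample where the ray dips below the surface.

```python
# Toy 1D heightfield raymarch (illustrative only).

def raymarch_heightfield(heights, ray_start, ray_dir, step=0.25, max_t=16.0):
    """March along the ray; heights[i] is the surface height at x = i."""
    t = 0.0
    while t < max_t:
        x = ray_start[0] + ray_dir[0] * t
        y = ray_start[1] + ray_dir[1] * t
        i = int(x)
        if 0 <= i < len(heights) and y <= heights[i]:
            return (x, y)          # hit: ray is at or below the heightfield
        t += step
    return None                     # miss

heights = [0.0, 0.0, 2.0, 2.0, 0.0, 0.0]   # a bump standing in for a car
hit = raymarch_heightfield(heights, ray_start=(0.0, 1.0), ray_dir=(1.0, 0.0))
print(hit)   # (2.0, 1.0): first sample inside the bump
```

The fixed step size here is also where the typical undersampling artifacts come from: a feature thinner than one step can be skipped entirely.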

#5207338 Constant Buffer matrix array not updating.

Posted by MJP on 28 January 2015 - 08:29 PM

I would suspect that you're either not binding the constant buffer correctly, or your shader isn't interpreting it correctly. Keep in mind that by default the HLSL compiler will assume that matrices in constant buffers use column-major layout, and will treat them accordingly. If you're storing row-major matrices in your constant buffer, then you can add the "row_major" prefix to your float4x4 array and the compiler will interpret it correctly.
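To see why the layout convention matters, here's a small Python sketch (hypothetical data, my own helper names) of reading the same flat array of floats under row-major vs. column-major assumptions:

```python
# Illustrative sketch: the same 16 floats read back as M or transpose(M),
# depending on the layout the reader assumes.

def rows_from_row_major(data, n=4):
    """Interpret a flat array as row-major: consecutive floats form a row."""
    return [data[i * n:(i + 1) * n] for i in range(n)]

def rows_from_column_major(data, n=4):
    """Interpret the same flat array as column-major (HLSL's default)."""
    return [[data[c * n + r] for c in range(n)] for r in range(n)]

flat = list(range(16))               # matrix written to the cbuffer row-major
as_row_major = rows_from_row_major(flat)
as_col_major = rows_from_column_major(flat)

print(as_row_major[0])   # [0, 1, 2, 3]
print(as_col_major[0])   # [0, 4, 8, 12]  <- a transposed read of the same bytes
```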


Not pertinent to your question, but do you really have 96 influence bones?


That's nothing, we had 256 just in our heads! :)

#5207043 Modeling Light Sources

Posted by MJP on 27 January 2015 - 08:12 PM

In my opinion you definitely don't want to have separate diffuse/specular controls for your lights. If you were to do this, you would essentially be allowing your lights to override the carefully-balanced response of a material. As an extreme example, imagine that you had a plastic surface and you set a light to have only specular response: the plastic will now basically look like a piece of metal, since it no longer has any diffuse.