
Member Since 29 Mar 2007

#5178941 HLSL compiler weird performance behavior

Posted by MJP on 08 September 2014 - 02:58 PM

I'm not sure why it takes so long to compile. You'd really need to get someone from the DirectX team to help you out. Historically the loop simulator used for unrolling loops has always been rather slow in the HLSL compiler, but it doesn't make sense that it would be so slow for your particular case. It must be doing some sort of bounds-checking that is slowing it down.

I tried changing your shader to use a StructuredBuffer instead of a constant buffer for storing the array of bone matrices, and it compiles almost instantly, so you can do that as a workaround. A StructuredBuffer shouldn't be any slower (in fact it probably takes the same path as a constant buffer on most recent hardware), and it will give you the same functionality.
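A minimal sketch of that workaround (the declarations and names are illustrative, not taken from the original shader):

```hlsl
// Before: bone matrices in a constant buffer array, which hits the
// compiler's slow loop-handling path in this case:
// cbuffer BonesCB : register(b1)
// {
//     float4x4 Bones[128];
// };

// After: the same data in a StructuredBuffer, bound through an SRV:
StructuredBuffer<float4x4> Bones : register(t0);

// Indexing looks identical in the shader body:
// float4x4 bone = Bones[boneIndex];
```

On the CPU side the buffer is created with the D3D11_RESOURCE_MISC_BUFFER_STRUCTURED flag and bound with *SetShaderResources instead of *SetConstantBuffers.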

As for unrolling, it's almost always something you want to do for loops with a fixed number of iterations. It generally results in better performance, because it allows the compiler to better optimize the resulting code, and it prevents the hardware from having to execute looping/branching instructions every iteration. So you'll probably want to be explicit and put an [unroll] attribute on your loop.
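For example (the names here are illustrative, not from the original shader), a fixed-count skinning loop can be explicitly unrolled like this:

```hlsl
// Hypothetical 4-bone skinning loop: the trip count is a compile-time
// constant, so [unroll] replaces the loop with four straight-line
// iterations and no branch instructions.
float4 skinned = 0.0f;

[unroll]
for (uint i = 0; i < 4; ++i)
    skinned += input.Weights[i] * mul(input.Position, Bones[input.Indices[i]]);
```

The opposite attribute is [loop], which forces the compiler to emit an actual loop with per-iteration branching.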

#5178236 Intrinsics to improve performance of interpolation / mix functions

Posted by MJP on 04 September 2014 - 11:40 PM

I don't think there's anything available that the compiler won't already be using, and even if there were, there's no guarantee that it would actually map to a single instruction once it's JIT-compiled for your GPU.

#5178150 "DirectX Texture Tool : An error occurred trying to open that file" w...

Posted by MJP on 04 September 2014 - 02:56 PM

PIX comes with the old DirectX SDK, which you seem to already have installed. Just be aware that PIX will not work with up-to-date versions of Windows 7 or any version of Windows 8 without patching the EXE and one of its DLLs.


You may want to consider trying RenderDoc instead, which is a very awesome third-party tool that aims to be a worthy successor to PIX.

#5178149 DX12 - Documentation / Tutorials?

Posted by MJP on 04 September 2014 - 02:50 PM



what kind of documentation are you searching for?

Sorry, should've been more specific. I'm referring to documentation on the binary format to allow you to produce/consume compiled shaders like you can with SM1-3, without having to pass through Microsoft DLLs or HLSL. Consider projects like MojoShader that could make use of this functionality to decompile SM4/5 code to GLSL when porting software, or a possible Linux D3D11 driver that would need to translate compiled SM4/5 code into Gallium IR and eventually GPU machine code.

There's also no way with SM4/5 to write assembly and compile it, which is a pain for various tools that don't want to work through HLSL or the HLSL compiler.



I'm not sure what the actual problem you have here is.  It's an ID3DBlob.


If you want to load a precompiled shader, it's as simple as (and I'll even do it in C, just to prove the point) fopen, fread and a bunch of ftell calls to get the file size.  Similarly to save one it's fopen and fwrite.


Unless you're looking for something else that Microsoft actually have no obligation whatsoever to give you, that is...



He's specifically talking about documentation of the final bytecode format contained in that blob, so that people could write their own compilers or assemblers without having to go through d3dcompiler_*.dll, as well as being able to disassemble a bytecode stream without that same DLL. That information (along with the full, complete D3D specification) is only available to driver developers.

#5177977 DX12 - Documentation / Tutorials?

Posted by MJP on 03 September 2014 - 11:39 PM



D3D12 will be the same, except it will perform much better (D3D11 deferred contexts do not actually provide good performance increases in practice... or this is the excuse of AMD and Intel, which do not support driver command lists).

Fixed.
AMD supports them in Mantle and multiple game console APIs. It's a back-end D3D (Microsoft code) issue, forcing a single thread in the kernel-mode driver to be responsible for kickoff. The D3D12 presentations have pointed out this flaw themselves.



Indeed. There's also potential issues resulting from the implicit synchronization and abstracted memory management model used by D3D11 resources. D3D12 gives you much more manual control over memory and synchronization, which saves the driver from having to jump through crazy hoops when generating command buffers on multiple threads.

#5177956 Seemingly incorrect buffer data used with indirect draw calls

Posted by MJP on 03 September 2014 - 06:49 PM

Like any other GPU-executed command, CopyStructureCount has implicit synchronization with any commands issued afterwards. So there shouldn't be any kind of manual waiting or synchronization required, the driver is supposed to handle it.

Your approach sounds okay, and I've successfully implemented something similar several times in the past. I'm not sure whether you have a bug somewhere in your code, there's a driver issue, or RenderDoc is giving you incorrect information. Driver bugs can usually be diagnosed by enabling the reference rasterizer and comparing the output, so you should do that if you haven't already (warning: it's *very* slow). Reading back the value yourself on the CPU would be my other suggestion, but it sounds like you're doing that already.

#5177610 Having trouble with energy conservation with IBL.

Posted by MJP on 02 September 2014 - 01:55 AM

Yeah, it definitely looks like you've got a bug somewhere. For comparison, here are some images from my ground-truth renderer, showing roughness values starting at 0.01 and ending at 1.0:

FYI these are taken with an exposure of -2.5, which is a linear exposure of 0.176. It also has filmic tone mapping applied after exposure, followed by gamma correction.

#5177123 Compiling 64-bit Applications in Visual Studio Express 2013

Posted by MJP on 30 August 2014 - 04:46 PM

VS 2012 and 2013 Express include the x86-to-x64 cross compiler, but not the full x64 compiler. Like SmkViper already explained, this essentially means that the compiler and linker are 32-bit executables, but they can still produce a 64-bit executable. For the most part this isn't a big deal, unless you're working on a huge project that causes the linker to exceed its 4GB virtual address space. I have no idea why they only include the cross compiler and not the full x64 compiler; it doesn't seem to make much sense to me.

#5176603 Why are their bumps in my shadow mapping?

Posted by MJP on 28 August 2014 - 12:36 AM

Having a stair-step pattern is totally expected, since it's just aliasing due to rasterization. However in your picture the edges are rounded, which is strange. Are you using bilinear filtering when sampling the shadow map, or performing any other filtering/blurring on the shadow map before you sample it?

#5176602 what's this error message mean?

Posted by MJP on 28 August 2014 - 12:33 AM

So the error message says that the program crashed due to an unhandled SEH exception with code C0000005. If you're not familiar with SEH, it's basically the low-level mechanism that Windows uses for handling critical failures caused by program behavior. For more information, I would suggest reading the Windows SDK documentation as well as the VC++ documentation.


In your particular case, the code C0000005 corresponds to EXCEPTION_ACCESS_VIOLATION, which indicates that there was a memory access violation in your process or one of its loaded DLLs. Typically this happens due to accessing a pointer with a bad address, which is why mhagain suggested that it's probably due to a NULL pointer. That particular exception also comes with additional information telling you a bit more about what happened. In particular, it includes a flag indicating the memory access type, and another integer containing the virtual address that was accessed. Your error report only includes the first parameter (the flag) and not the address. The flag has a value of 8, which tells us that the violation was caused by DEP, a feature that tries to make sure that programs don't start executing instructions that aren't part of the executable's actual code (it's meant to protect against buffer-overflow exploits).


So it seems you've already figured out that DEP is involved; however, you most definitely should not try to turn it off. You almost certainly have a bug in your code (or a DLL) that is causing your program to start executing from the wrong place in memory. There are a few ways that this can happen:


1. Using a bad function pointer

2. Stomping a vtable, and then calling a virtual function that uses the vtable

3. Trashing the stack in such a way that you don't return to the calling code of a function


Like Hodgman suggested, a good way to handle these problems is to write an exception handler (either by installing a global SEH handler, or by putting a __try/__except bracket around your main() function) and then write out a crash dump when the handler is invoked. You can have your beta tester send you the dump file, and then you can debug it on your machine to try to figure out what went wrong.

#5176341 What you think of my Depth Pass AA?

Posted by MJP on 26 August 2014 - 08:41 PM

What you're doing is a rather crude method of antialiasing. Such approaches were used maybe 5 to 7 years ago, before FXAA and MLAA were introduced. I believe the original Crysis used it when MSAA was disabled. While it's true that filtering is an essential component of antialiasing, just doing a simple blur along edges will remove details and will be susceptible to temporal flickering/crawling. This is because you really need sub-pixel information to do a decent job of antialiasing. MSAA naturally gives you sub-pixel information because it causes the rasterizer to run at a higher resolution. FXAA and MLAA try to get around this by analyzing neighboring pixel colors, and reconstructing analytical triangle data based on the edges they find in the pixels. Temporal solutions use previous frame results as a means of increasing the effective sample count.

TXAA isn't anything like what you've described. Unfortunately they don't share any specifics, but from what I've read in their press material and previous (now-deleted) blog posts, it's essentially MSAA with a custom resolve combined with temporal supersampling. Custom resolves let you implement custom filter kernels, which can be used to boost quality over the "standard" box filter used in hardware MSAA resolves.

If you're interested in temporal supersampling, here are some presentations that you can check out:

#5176117 What you think of my Depth Pass AA?

Posted by MJP on 25 August 2014 - 08:03 PM

I can't really give you feedback unless you give us some information about what you're doing.

#5176068 Sampling the Render target while rendering to it

Posted by MJP on 25 August 2014 - 03:30 PM

Although it can be done as Yourself mentions, note that it's for a single format - DXGI_FORMAT_R32_UINT - which may not be compatible for color sampling.


Indeed, which is why he mentioned having to do manual packing and unpacking. Another possible workaround is to use a StructuredBuffer instead of a texture.

#5175858 Vector3.Unproject return NAN [SharpDX]

Posted by MJP on 24 August 2014 - 02:21 PM

Your far clip plane is way too big. It should probably be no bigger than 1000 or so.

#5175391 Getting ASSIMP Working/Building

Posted by MJP on 21 August 2014 - 08:30 PM

You can't just set the path to assimp.dll in Executable Directories if you want Windows to find the DLL. You either need to add it to your PATH environment variable (you can do that using the "Environment" debugging option), or just copy it to your project folder. Personally I like to use a post-build step to copy any dependent DLLs to my project folder.
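As a sketch, a Visual Studio post-build event along these lines does the copy (the source path here is hypothetical; point it at wherever your assimp build puts the DLL):

```
xcopy /y /d "$(ProjectDir)..\external\assimp\bin\assimp.dll" "$(OutDir)"
```

$(ProjectDir) and $(OutDir) are standard Visual Studio build macros, and /d skips the copy when the DLL hasn't changed.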