HLSL compiler weird performance behavior

17 comments, last by satanir 9 years, 7 months ago
If you don't specify loop/flatten attributes, the compiler will try both options before picking one. Even if you do specify [loop], it still seems to partially unroll loops to see if maybe it's a better choice.

That's one reason you should always compile your shaders ahead of time, instead of on your loading screen!


I tried changing your shader to use a StructuredBuffer instead of a constant buffer for storing the array of bone matrices, and it compiles almost instantly. So you can do that as a workaround. A StructuredBuffer shouldn't be any slower (in fact it probably takes the same path on most recent hardware), and will give you the same functionality.

Cool, I'll do that.


That's one reason you should always compile your shaders ahead of time, instead of on your loading screen!

Usually, I agree. But in this case it's a framework I use for shader prototyping, and I like the ability to re-compile shaders on the fly without having to stop the app. It makes the development process somewhat more fluid.

I've been using this shader at work for over a year now for facial animation study. We use a highly detailed face model with three 4Kx4K textures, and I always thought the long loading time was due to loading the model. Yesterday I implemented a model viewer that just creates state objects without loading anything, and the loading time was still there... all because of the bones array. After reducing the size of the bone array in my facial animation code, half of the loading time is gone!

There's probably a moral there somewhere, something about not assuming things and such...

Well, I'll just tag this compiler issue as a small DirectX wonder.


Usually, I agree. But in this case it's a framework I use for shader prototyping, and I like the ability to re-compile shaders on the fly without having to stop the app. It makes the development process somewhat more fluid.
There's no reason you can't still do that with pre-compiled shaders. Instead of reading text from disc, compiling to binary and recreating your resources, you just load binary from disc and recreate your resources.

I had a quick play with the shader, and found compilation went much faster if I:

1. Manually inlined the function call (this had the biggest impact).

2. Used the [fastopt] attribute on the loop.

3. Instead of #2, disabled optimization completely in the shader compiler, for a bigger impact.

Note that [fastopt] can make the compiler generate worse code, so I wouldn't recommend it outside of prototyping. The same goes for disabling optimization in the shader compiler. Having said that, the driver optimizes the shader too, so the runtime performance hit from either of those isn't usually very big.

As a side note, you can generally get away with 4x3 matrices for your bones, which cuts down on the size of the constant buffer and saves a few instructions in the shader.
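
To make the size saving concrete, here is a minimal CPU-side sketch in C++ of packing bones before upload. It assumes affine bone transforms stored row-major with a column-vector convention, so the last row is always (0, 0, 0, 1) and can be dropped. The type and function names are hypothetical, not taken from the original shader, and the matching HLSL type depends on your own multiplication convention.

```cpp
#include <array>
#include <cstddef>

// Hypothetical types for illustration only.
// Full affine bone transform: row-major, column-vector convention (p' = M * p).
// For an affine transform the last row is always (0, 0, 0, 1).
using Bone4x4 = std::array<float, 16>;

// Compact form: only the first three rows are kept (12 floats).
using BonePacked = std::array<float, 12>;

struct Vec3 { float x, y, z; };

// Drop the constant last row of the 4x4 matrix.
BonePacked packBone(const Bone4x4& m) {
    BonePacked out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = m[i];
    }
    return out;
}

// Transform a point with the packed rows; the implicit w component is 1.
Vec3 transformPoint(const BonePacked& b, const Vec3& p) {
    return {
        b[0] * p.x + b[1] * p.y + b[2]  * p.z + b[3],
        b[4] * p.x + b[5] * p.y + b[6]  * p.z + b[7],
        b[8] * p.x + b[9] * p.y + b[10] * p.z + b[11],
    };
}

// Bytes needed for an array of bones in each layout.
constexpr std::size_t fullBytes(std::size_t bones)   { return bones * 16 * sizeof(float); }
constexpr std::size_t packedBytes(std::size_t bones) { return bones * 12 * sizeof(float); }
```

Each bone shrinks from 64 to 48 bytes, so the bone array takes a quarter less space whatever its length.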


Manually inlined the function call (this had the biggest impact).

Tried that. It still takes 2s, same as with unroll.


2. Used the [fastopt] attribute on the loop.
3. Instead of #2 disabled optimization completely in the shader compiler for a bigger impact.

Tried that as well. Even with optimizations disabled it still takes 1s - still a lot for a simple shader.


As a side note, you can generally get away with 4x3 matrices for your bones, which cuts down on the size of the constant buffer and saves a few instructions in the shader.

Nice tip, thanks.


There's no reason you can't still do that with pre-compiled shaders. Instead of reading text from disc, compiling to binary and recreating your resources, you just load binary from disc and recreate your resources.

Sure, but that means I have to re-compile the shader outside my app every time I change it. By letting the app re-compile, I just change the hlsl file, press a button and let the app do the magic for me.


Sure, but that means I have to re-compile the shader outside my app every time I change it. By letting the app re-compile, I just change the hlsl file, press a button and let the app do the magic for me.
Sorry, I'm taking you way off topic.

Yeah, workflow often trumps theoretical performance, but I'd still recommend supporting both text and binary shader files if you're going to go that way, so you can iterate quickly and load quickly in shipping builds.

The engine I'm currently using (and a different proprietary one I used in '09) have a system-tray tool that subscribes to OS notifications about changed files in your game's content directory, automatically passes those files to the appropriate data-compiler plugins, and then notifies the game that these compiled data files have changed. That way, the staff just have to press ctrl+S on the text files, the game engine itself remains simple with a single code-path for loading binary data, and end-users get fast load times.


Yeah, workflow often trumps theoretical performance, but I'd still recommend supporting both text and binary shader files if you're going to go that way, so you can iterate quickly and load quickly in shipping builds.

I guess it's a matter of requirements. The framework I implemented is used for algorithmic development, where the top requirement is fast shader prototyping (think DXUT, but way way better and simpler to use). I don't see a lot of use for pre-compiled shaders in our case.

If I was working on games - then yeah, compiling shaders at load time would make people go german-kid-crazy.

Just to add some more information.

What's the graphics card in your PC?



What's the graphics card in your PC?

ATI 6850.

But the compilation performance is unrelated to the graphics card. It happens when I compile with FXC, so it's a Microsoft compiler issue.

This topic is closed to new replies.
