Member Since 29 Mar 2012
Offline Last Active Today, 12:48 AM

Posts I've Made

In Topic: Intrinsics to improve performance of interpolation / mix functions

20 September 2014 - 01:20 PM

Hey thanks osmanb for those explanations and hints. I'll consider them.


What about branches (in the sense of if-then-else code sections) and loops? Both are known to potentially affect performance heavily, so are there guidelines for speeding up these control-flow constructs, i.e. what kinds of optimizations on the HLSL side help generate well-performing GPU code?
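To make the branch part of the question concrete, here is a minimal sketch of two common ways to express a simple selection in HLSL. The function and parameter names are hypothetical, and whether either form is actually faster depends on the GPU and driver; the `[flatten]` and `[branch]` attributes are only hints to the compiler.

```hlsl
// Hypothetical helpers for illustration; names are not from the thread.

// Variant 1: arithmetic selection, no if/else in the source.
float4 SelectArithmetic(float4 colA, float4 colB, float mask, float threshold)
{
    // step(threshold, mask) is 1 when mask >= threshold, otherwise 0,
    // so lerp returns colB or colA respectively.
    float t = step(threshold, mask);
    return lerp(colA, colB, t);
}

// Variant 2: an explicit if/else with a compiler hint.
float4 SelectFlattened(float4 colA, float4 colB, float mask, float threshold)
{
    float4 result;
    // [flatten] asks the compiler to evaluate both sides and pick one;
    // [branch] would request a real dynamic branch instead.
    [flatten]
    if (mask >= threshold)
        result = colB;
    else
        result = colA;
    return result;
}
```

Both variants return the same value; comparing their fxc assembly output is one way to see what the compiler made of each.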

In Topic: Intrinsics to improve performance of interpolation / mix functions

19 September 2014 - 02:08 PM

It's impossible to do without tools from the GPU vendor. AMD provides such a tool, Nvidia does not.


Thanks. But there seem to be at least some heuristics, or experiences, or basic information on the particular instruction set of a GPU, right?


I mean, people seem to know whether a GPU has a horizontal add instruction or not, so I guess there are other instructions that I could specifically target with particular HLSL code, and I'm interested in learning those.

In Topic: Intrinsics to improve performance of interpolation / mix functions

18 September 2014 - 04:04 PM

Thank you guys! I'll try those tweaks.


Btw, when talking about "thinking low level while coding high level", how can I determine how many atomic GPU instructions a certain GPU's driver / JIT compiler produces from a given bytecode-compiled shader?


I mean, examining fxc's assembly output for performance reasons doesn't seem to make much sense if the GPU instructions produced by the JIT compiler differ much from it (e.g. if one piece of assembly code contains, say, 10 "bytecode" instructions but the driver makes 15 GPU instructions out of those, while a different bytecode with 15 instructions produces the same number of GPU instructions).

In Topic: Loop Compilation - Bad Performance

18 September 2014 - 03:52 PM

Thank you for the explanations.




Are you sure you can't just use pre-compiled shaders in your mod? Anyway, there's definitely no doubt that shader compilation is not meant to be a real-time process. You should be blaming the game's developers if they don't allow pre-compiled shaders, not the compiler for being slow. Microsoft has repeatedly stated that developers should always use pre-compiled shaders.


Yes I am. And yes, I do blame the game's developers.


Found another thing meanwhile. Using [unroll] instead of [loop] causes my shader to crash upon compilation. I don't get any specific compiler error in that case, but from what I've read on the topic there are cases where unrolling a loop at compile time is not possible. So given that [unroll] doesn't work and [loop] does not change a thing, I'd assume my shader isn't unrolled by the compiler, right? I'll check the assembly code nonetheless.
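For reference, a minimal sketch of the difference between the two attributes, assuming a Direct3D 9 style effect file. The sampler, constants and function names are hypothetical and are not taken from the shader discussed here.

```hlsl
// Hypothetical resources for illustration only.
sampler2D SourceSampler;
static const int TAP_COUNT = 4;   // trip count known at compile time
int RuntimeTaps;                  // trip count set by the application

float4 BlurFixed(float2 uv, float2 stepUV)
{
    float4 sum = 0;
    // Constant bound: the compiler can replicate the body TAP_COUNT times.
    [unroll]
    for (int i = 0; i < TAP_COUNT; i++)
        sum += tex2D(SourceSampler, uv + stepUV * i);
    return sum / TAP_COUNT;
}

float4 BlurDynamic(float2 uv, float2 stepUV)
{
    float4 sum = 0;
    // Bound only known at run time: keep it as a real loop.
    // tex2Dlod takes an explicit mip level, so no gradients are needed
    // inside the dynamic flow control.
    [loop]
    for (int i = 0; i < RuntimeTaps; i++)
        sum += tex2Dlod(SourceSampler, float4(uv + stepUV * i, 0, 0));
    return sum / max(RuntimeTaps, 1);
}
```

In the assembly listing, an unrolled loop shows up as the body repeated with no loop instructions, while a preserved loop keeps explicit loop/endloop (or rep/endrep) instructions.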

In Topic: Intrinsics to improve performance of interpolation / mix functions

16 September 2014 - 07:15 AM

On the topic: Is there a simple and fast way to get and compare the average of the components of a vector, e.g. luminance of a color? I'm often doing things like this:

float lum1 = (col1.r+col1.g+col1.b)/3.f;
float lum2 = (col2.r+col2.g+col2.b)/3.f;

if (lum2-lum1 > threshold) { ... }

Which of course generates loads of instructions for a simple average computation and comparison. Is there a simpler / leaner / more elegant way?
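One possible direction, as a sketch rather than a definitive answer: since the average is a linear operation, the difference of the two averages equals the average of the difference, and an average over three components can be written as a dot product with constant weights. The helper name and the weights constant below are hypothetical; col1, col2 and threshold follow the snippet above.

```hlsl
// Hypothetical helper; equal weights reproduce the plain (r+g+b)/3 average.
static const float3 AVG_WEIGHTS = float3(1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0);

bool BrighterByThreshold(float3 col1, float3 col2, float threshold)
{
    // avg(col2) - avg(col1) == avg(col2 - col1), so one vector subtraction
    // and one dot product replace two separate averages.
    return dot(col2 - col1, AVG_WEIGHTS) > threshold;
}
```

Usage would look like `if (BrighterByThreshold(col1.rgb, col2.rgb, threshold)) { ... }`. If perceptual luminance is wanted instead of a plain average, the weights could be swapped for the Rec. 709 coefficients (0.2126, 0.7152, 0.0722) without changing the instruction count.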