Sharpen shader performance

15 comments, last by JoeJ 7 years, 6 months ago
Personally I'd recommend treating GLSL like JavaScript on the modern Web -- never directly write the files that will ship; always have them generated from some build system and potentially have the real source code in a better language altogether. That way you can have clean source code and ship ugly optimized GLSL files, such as with pre-unrolled loops.
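As a rough illustration (the original shader isn't posted, so the 3x3 sharpen kernel and names below are assumptions), the source you maintain can stay loop-based while the build step emits the expanded, constant-folded file that actually ships:

#version 330 core

// Hand-maintained source: a clean, loop-based 3x3 filter.
uniform sampler2D u_scene;
uniform vec2 u_texelSize;   // 1.0 / texture resolution

in vec2 v_uv;
out vec4 fragColor;

const float kernel[9] = float[9]( 0.0, -1.0,  0.0,
                                 -1.0,  5.0, -1.0,
                                  0.0, -1.0,  0.0);
const vec2 offsets[9] = vec2[9](vec2(-1.0, -1.0), vec2(0.0, -1.0), vec2(1.0, -1.0),
                                vec2(-1.0,  0.0), vec2(0.0,  0.0), vec2(1.0,  0.0),
                                vec2(-1.0,  1.0), vec2(0.0,  1.0), vec2(1.0,  1.0));

void main()
{
    vec3 sum = vec3(0.0);
    for (int i = 0; i < 9; ++i)
        sum += kernel[i] * texture(u_scene, v_uv + offsets[i] * u_texelSize).rgb;
    fragColor = vec4(sum, 1.0);
}

The generated file that ships would contain the same shader with the loop written out explicitly and the constant weights substituted in.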

Making a shader transpiler and a new shader language is a whole project in itself, Hodg :D When you don't have a tooling team that makes that stuff, the time might be best invested in something else.

What happens if you declare the kernel/offset arrays as uniforms and use the for loop?

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator


What happens if you declare the kernel/offset arrays as uniforms and use the for loop?

I tried that too after reading some positive feedback about it, but without success.
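For reference, a minimal sketch of that uniform-array variant (declarations assumed to match the rest of the shader). One plausible reason it doesn't help: with uniforms the compiler no longer knows the weights at compile time, so it can't strip the zero-weight taps or fold the constants.

#version 330 core

uniform sampler2D u_scene;
uniform float u_kernel[9];    // filter weights uploaded from the CPU
uniform vec2  u_offsets[9];   // tap offsets, pre-scaled by the texel size

in vec2 v_uv;
out vec4 fragColor;

void main()
{
    vec3 sum = vec3(0.0);
    for (int i = 0; i < 9; ++i)
        sum += u_kernel[i] * texture(u_scene, v_uv + u_offsets[i]).rgb;
    fragColor = vec4(sum, 1.0);
}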


Making a shader transpiler and a new shader language is a whole project in itself, Hodg :D When you don't have a tooling team that makes that stuff, the time might be best invested in something else.

Yeah, this is practically beyond the capacity of an indie studio.

However, I wonder if a compute shader implementation would be faster, e.g. processing 8x8 pixels per invocation.

There would be much less texture access, and maybe it even beats the texture cache.

I'd be interested to know how a compute shader would behave in that case.
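A sketch of one possible version of that (assuming GL 4.3+, an 8x8 thread group, and the same assumed 3x3 sharpen kernel as above): the group cooperatively loads a 10x10 tile (8x8 plus a one-texel halo) into shared memory once, then every pixel is filtered from shared memory instead of issuing its own 9 texture fetches.

#version 430

layout(local_size_x = 8, local_size_y = 8) in;

layout(binding = 0)        uniform sampler2D u_scene;
layout(binding = 1, rgba8) uniform writeonly image2D u_result;

const float kernel[9] = float[9]( 0.0, -1.0,  0.0,
                                 -1.0,  5.0, -1.0,
                                  0.0, -1.0,  0.0);

// 8x8 output tile plus a 1-texel halo on each side for the 3x3 kernel.
shared vec3 tile[10][10];

void main()
{
    ivec2 sceneSize  = textureSize(u_scene, 0);
    ivec2 tileOrigin = ivec2(gl_WorkGroupID.xy) * 8 - 1;   // top-left texel of the haloed tile

    // Cooperatively fill the 10x10 shared tile: 100 texels, 64 threads -> two rounds.
    for (uint i = gl_LocalInvocationIndex; i < 100u; i += 64u)
    {
        ivec2 t     = ivec2(int(i % 10u), int(i / 10u));
        ivec2 texel = clamp(tileOrigin + t, ivec2(0), sceneSize - 1);
        tile[t.y][t.x] = texelFetch(u_scene, texel, 0).rgb;
    }
    memoryBarrierShared();
    barrier();

    // Each thread filters one pixel entirely from shared memory.
    ivec2 local = ivec2(gl_LocalInvocationID.xy);
    vec3 sum = vec3(0.0);
    for (int y = 0; y < 3; ++y)
        for (int x = 0; x < 3; ++x)
            sum += kernel[y * 3 + x] * tile[local.y + y][local.x + x];

    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    if (all(lessThan(pixel, sceneSize)))
        imageStore(u_result, pixel, vec4(sum, 1.0));
}

Whether this actually beats the texture cache would need measuring; for a small 3x3 kernel the fragment-shader version may already be quite cache-friendly.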

Some news about the issue:

Using the AMD Shader Analyzer to analyze both the looped and unrolled versions of the shader, I saw that both were almost identical. One thing though:

The unrolled version has only 5 texture lookups instead of 9. Why? Because some of the "kernel" array values are just "0.0f", and the compiler strips the related texture lookups.

But the assembly code shows that the AMD compiler is able to unroll the loop by itself.
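Assuming a standard 3x3 sharpen kernel (the actual values aren't posted in the thread), the stripped, unrolled shader boils down to the five non-zero taps:

#version 330 core

uniform sampler2D u_scene;
uniform vec2 u_texelSize;   // 1.0 / texture resolution

in vec2 v_uv;
out vec4 fragColor;

void main()
{
    // Centre tap plus the four edge taps; the zero-weight corner taps are gone.
    vec3 sum = 5.0 * texture(u_scene, v_uv).rgb
                   - texture(u_scene, v_uv + vec2(0.0, -u_texelSize.y)).rgb
                   - texture(u_scene, v_uv - vec2(u_texelSize.x, 0.0)).rgb
                   - texture(u_scene, v_uv + vec2(u_texelSize.x, 0.0)).rgb
                   - texture(u_scene, v_uv + vec2(0.0, u_texelSize.y)).rgb;
    fragColor = vec4(sum, 1.0);
}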

So, after removing them, I ran both versions to see how they behave:

- Loop version: 1.7 ms
- Unrolled version: 0.24 ms

Conclusion: no real change on NVidia, except that both versions run a little bit faster.

Is there a tool similar to the AMD Shader Analyzer, but for NVidia GPUs?

Making a shader transpiler and a new shader language is a whole project in itself, Hodg :D When you don't have a tooling team that makes that stuff, the time might be best invested in something else.

Yeah, this is practically beyond the capacity of an indie studio.
Good thing GLSLOptimizer is open source :)
If it takes a few days to integrate GLSLOptimizer and it saves you a week in tweaking GLSL code, then you can't afford not to :wink:

There are a few other existing projects that convert D3D bytecode to GLSL too. I'm thinking about taking that direction in the future rather than making a new language on top of HLSL/GLSL.

In the past we looked at making an HLSL-like language and a transpiler - I didn't work on it, but it only took one person a week to get the prototype working.

I wouldn't think so. When Cg was still active, you could actually compile from Cg->GLSL :wink:
Silence must be thinking of the fact that GLSL will be compiled into an internal format that's specific to NVidia/AMD/Intel/Qualcomm/etc, and God knows what happens in that step.
At least with HLSL->D3D bytecode->Vendor ASM (and GLSL->SPIR-V bytecode->Vendor ASM) there's a middle step where you can see what optimisations the offline compiler has performed :D

OK. That's not exactly what I was thinking :)

Good thing GLSLOptimizer is open source :) If it takes a few days to integrate GLSLOptimizer and it saves you a week in tweaking GLSL code, then you can't afford not to

Yup, I'm definitely going to integrate it. The readme says it's not 100% compatible with GL330+, but I keep my fingers crossed.

However, I also have the bad habit of trying to understand how things work. :D

If you benchmark shaders on the desktop, be aware that drivers often run a "quick and dirty" version of the shader first and replace it later (after many seconds) with an optimized version.

So if you only run a benchmark for a few seconds, you may not even see what the driver's optimized version does.

Is there a tool similar to the AMD Shader Analyzer, but for NVidia GPUs?

Don't know about NV, but Shader Analyzer is outdated; you may want to look at CodeXL instead.

CodeXL is great: it shows things like ISA code, runtime, occupancy, LDS & register usage, cache hits, stall time due to bandwidth limits, spilled registers, etc.

It's easy to come to conclusions like: "If I could decrease the register count by 2 and LDS usage by 0.5 kB, occupancy would rise from 50% to 60%, probably resulting in a 10% speed-up."

I've only used it for compute, but I assume it's similarly useful for other shaders as well.

This topic is closed to new replies.
