tanzanite7

Member Since 20 Nov 2005
Offline Last Active May 20 2013 04:20 AM
-----

Posts I've Made

In Topic: Optimizing Out Uniforms, Attributes, and Varyings

19 May 2013 - 06:00 PM

Yeah I was just sticking with the terminology already present in the thread.

Ah, that explains it. Now, speaking of which - to OP, are you sure you are not using an unnecessarily old GLSL?

GL2's API for dealing with shaders, uniforms, attributes and varyings is absolutely terrible compared to the equivalents in D3D9, or the more modern APIs of D3D10/GL3...

Quite, was a bit puzzled myself when the GLSL first surfaced. OGL sure has long lasting developmental issues (stemming from: crippling need for consensus / compatibility / support).
 

BTW when using interface blocks for uniforms, the default behaviour in GL is similar to D3D in that all uniforms in the block will be "active" regardless of whether they're used or not though - no optimisation to remove unused uniforms is done. It's nice that GL gives you a few options here though (assuming that every GL implementation acts the same way with these options...).

Did not quite understand what you said here (not sure which parts refer to D3D and which OGL).
About uniforms with OGL: Uniform buffers are not part of a program object and hence are not directly bound to any program. Plain uniforms are bound to the program object - however, it appears that every driver builds an internal buffer for those under the hood, so the two cases are indistinguishable at hardware level.

So, in either case, there is no special "loading" code generated for uniforms and the only optimization of removing unused stuff one can speak of is ... well, just do not use the parts of the uniform you do not use ... duh. As the underlying hardware is the same then D3D is bound to end up doing the exact same thing here (unused uniforms are, in all regards that matter, thrown out - regardless of what is seen/reported at API side).

Ie. only buffers are bound - the individual uniforms are just offsets in machine code.

Oh, and thank goodness for std140 or my head would explode in agony.

With the default GL behaviour, the layout of the block isn't guaranteed, which means you can't precompile your buffers either. The choice to allow this optimisation to take place means that you're unable to perform other optimisations.

What optimizations (makes zero difference at driver/OGL/GPU side)? You mean CPU side, ie. filling buffers with data? Yeah, it would be pretty painful not to use std140. IIRC, it was added at the same time as interface blocks - so, if you can use interface blocks then you can always use the fixed format also ... a bit late here to go digging to check it though.
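To make the CPU-side point concrete, here is a rough sketch of why a fixed layout helps: with std140 the byte offsets can be worked out ahead of time, without asking the driver. The table covers only a handful of types, the block and its member names are made up for illustration, and rounding the total up to 16 bytes is a conservative assumption for sizing the buffer.

```python
import struct

# (base alignment, size) in bytes for a few GLSL types under std140.
STD140 = {
    "float": (4, 4),
    "int": (4, 4),
    "vec2": (8, 8),
    "vec3": (16, 12),
    "vec4": (16, 16),
    "mat4": (16, 64),  # laid out as four vec4 columns
}

def std140_offsets(members):
    """Return ({name: byte offset}, buffer size) for a list of (name, glsl_type)."""
    offsets = {}
    cursor = 0
    for name, glsl_type in members:
        align, size = STD140[glsl_type]
        cursor = (cursor + align - 1) // align * align  # round up to the alignment
        offsets[name] = cursor
        cursor += size
    # Assumption: pad the buffer to a multiple of 16 bytes to be safe.
    total = (cursor + 15) // 16 * 16
    return offsets, total

# Hypothetical block:
#   layout(std140) uniform Scene { mat4 viewProj; vec3 lightDir; float time; vec2 resolution; };
block = [("viewProj", "mat4"), ("lightDir", "vec3"),
         ("time", "float"), ("resolution", "vec2")]
offsets, size = std140_offsets(block)

# Fill the buffer on the CPU; the bytes could then be uploaded as-is.
buf = bytearray(size)
struct.pack_into("<3f", buf, offsets["lightDir"], 0.0, 1.0, 0.0)
struct.pack_into("<f", buf, offsets["time"], 1.5)
```

Note how the float lands in the padding right after the vec3. That is exactly the kind of detail one would otherwise have to query per program.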

Once you've created the individual D3D shader programs for each stage (vertex, pixel, etc), it's assumed that you can use them (in what you call a mix-and-match) fashion straight away, as long as you're careful to only mix-and-match shaders with interfaces that match exactly, without the runtime doing any further processing/linking.

Yep, got that when i re-read the "separate shader objects" extension ( http://www.opengl.org/registry/specs/ARB/separate_shader_objects.txt ) - it uses "mix-and-match" to describe it. It has been core since 4.1. Not using it any time soon, nice to have the option though (as most shader programs do not particularly benefit from "whole-program-optimization").


Having a precompiled intermediate is one of the most recurring requests on the OGL side (even after binary blobs were added) - with D3D brought up as an example time and time and time again. So, what's the holdup? If it makes sense for OGL then why is it not added?

That's a pretty silly argument.


Ee.. you lost me here :/. I think you implied an argument where there was none. I was conveying "wonderment" as perceived by me - it is not an argument from me nor from the "wonderer".

But if i would speculate anyway then the reason it has not been added to OGL might be:
* Khronos is slow and half the time i just want to throw my shoe at them.
* Instead of one specification that one has to hope is implemented correctly, there would now be two.
* Consensus lock / competition ... ie. no MS as arbitrator to break the lock.
* Insufficient demand from those that matter (learning-ogl-complaining-a-lot persons do not matter).
* The question whether it would be worthwhile for OGL specifically has not been confidently settled.
* There are more important matters to attend to - maybe later.
* All the above.

PS. i would like to have a GLSL intermediate option - as you said, in case of shader explosion (as i call it), it gets problematic (uncached runs).

wait for 10 minutes the first time they load the game.

Then i would say that one is doing something wrong. One does not need thousands of shaders to show the splash screen ;)

... again, i am not against an intermediate, i would just like to point out that its absence is not as widespread and grave a problem as it is often portrayed to be.

Sure, you can trade runtime performance in order to reduce build times, but to be a bit silly again, if this is such a feasible option, why do large games not do it?

Yep, that is silly indeed. Cannot quite use what one does not have - D3D, as far as i gather from your responses, does not have the option (no intermediate / whole-program-optimization) to begin with. Asking why the option that does not exist is not used more often ... well, good question.

Another reason why runtime compilation in GL-land is a bad thing, is because the quality of the GLSL implementation varies widely between drivers.

Having an extra specification and implementation is unlikely to be less problematic than not having the extra.

To deal with this, Unity has gone as far as to build their own GLSL compiler, which parses their GLSL code and then emits clean, standardized and optimized GLSL code, to make sure that it runs the same on every implementation. Such a process is unnecessary in D3D due to there being a single, standard compiler implementation.

... continuation: It is unnecessary in OGL too - GLSL etc is well specified. If implementers fail to follow the spec then changing the spec content (intermediate spec etc) to make them somehow read the darn thing and not fuck up implementing that ... is silly.

Leaving the bad example aside, what you wanted to say, if i may, is that a third party (Khronos) compiler would be helpful as it would leave only the intermediate for driver.

Perhaps. However, I highly doubt it would be any less buggy. Compilers are not rocket science (having written a few myself) - the inconsistencies stem from a smaller user base, some extremely lazy driver developers, and shader writers not reading the spec either. A "Khronos compiler" would not have fixed any of those.

---------------------------
I hope we are not annoying OP with this somewhat OT tangent (assessing the merits of an intermediate language in the context of OGL and D3D). At least i can say i know more about the D3D side than before - yay and thanks for that. Got my answer, which turned out to be relevant to OP too, as OGL also has the D3D mix-and-match option, which, if used, indeed has the same limitations.

need sleep.
edit: yep, definitely bedtime.

In Topic: GLSL fragment shader, prevent depth writing

18 May 2013 - 05:11 PM

thanks both of you. I tried it out, and the results I got were:
-when early depth testing is enabled: layout(early_fragment_tests) in;
the whole sprite is overwritten by the topmost (despite the black areas being discarded, see pic)
-if disabled it works fine.

Weird :/.

Generally, if early depth test is possible then it will be done - there is no reason to ask for it (funky exceptions, which i am not aware of, excepted). When explicitly requiring it causes "discard" to be ignored then ... that is a strong suggestion that early depth test is impossible given the shader.

Discard is similar to alpha testing - it sounds like your hardware just does not support early depth test with alpha testing. When i think about it then it rings true - i vaguely remember that alpha test indeed is not compatible with early depth test.

Logically it is also true - depth test and write happen at the same time, it can not do that early if the alpha test/discard result is only known after the fragment program ran.
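A toy model of that ordering problem, for a single pixel. This is only a sketch of the logic, not how any particular GPU works: each fragment is a hypothetical (depth, color, discarded-by-shader) tuple, the depth test is "less than", and the depth buffer starts at 1.0.

```python
def draw(fragments, early_test):
    """Rasterize fragments into one pixel; return the final color."""
    depth, color = 1.0, "background"
    for frag_depth, frag_color, discarded in fragments:
        if early_test:
            # Depth test AND depth write happen before the shader runs.
            if frag_depth >= depth:
                continue
            depth = frag_depth
            # The shader runs afterwards; discard can only skip the color write.
            if not discarded:
                color = frag_color
        else:
            # The shader runs first, so a discarded fragment never touches depth.
            if discarded:
                continue
            if frag_depth >= depth:
                continue
            depth = frag_depth
            color = frag_color
    return color

# A transparent (discarded) texel of the front sprite, then an opaque one behind it.
fragments = [(0.2, "front", True), (0.8, "back", False)]
late_result = draw(fragments, early_test=False)
early_result = draw(fragments, early_test=True)
```

With the late test the back sprite shows through the hole, as intended. With the forced early test the discarded front fragment has already written its depth, so the back sprite is rejected and the pixel stays at the background color, which matches the symptom described above.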

In short: cannot use early depth test.

So, back to: Why do you need it ... really? What platform are you on that it is a concern? How do you know the overdraw is a problem? Is discard without early depth test really too slow? Could you perhaps simplify your shader?

edit:
Wiki agrees with me: http://www.opengl.org/wiki/Early_Fragment_Test

In Topic: Optimizing Out Uniforms, Attributes, and Varyings

18 May 2013 - 04:41 PM

addendum: did not quite remember what the "separate shader objects" extension brought to OGL land ... well, reminded myself and: it is a way to use the D3D mix-and-match approach of forming a shader program from different stages. It comes with the same pitfalls - there is no "whole program optimization" done (although, i guess, the driver might choose to do that in the background when it gets some extra time).

 

In short:

D3D: mix-and-match.

OGL: whole-program, or mix-and-match if you want it.


In Topic: Optimizing Out Uniforms, Attributes, and Varyings

18 May 2013 - 03:49 PM

For reference, as your OGL information is horrifically out of date, this is how it goes in OGL land:

First, shader stage inputs and outputs, regardless of stage, are called ... input and output: "in" / "out". The, imho pointless, notion of "attributes" and "varyings" in glsl source have been deprecated - good riddance.

Information is exchanged between shader stages per variable and/or through one or more interface blocks (i use only interface blocks, except for vertex in and fragment out, as the drivers were a bit buggy way back then and i just got too used to not using interface blocks there).

Vertex shader inputs (ie. attributes, sourced from buffers or as dangling attributes if not) do need location numbers, which are usually specified in the glsl source ("layout(location=7)") or, as a rarely used alternative, queried/changed outside glsl.

Fragment shader outputs work similarly (MRT for example).

Not sure whether one can specify variable locations inside or outside interface blocks elsewhere - would be insanely ridiculous thing to do (common sense would leave it as implementation detail outside OGL specification), so i highly doubt it is even allowed. The whole location querying/setting business is just soo silly that i did my best to forget all of it the moment the alternative got added to core OGL. So, can not say for certain that it is impossible.

Variables outside interface blocks are matched by name and interface blocks by block name (not the variable name that uses the block - which is very convenient). Sounds very similar to D3D, except the mandatory register allocation stuff.

Shader stages are compiled in isolation (that is the way it has always been), and linked together into shader program(s) later. If an input is not used then it is thrown away - D3D does probably the same, no? A shader stage can not know whether its output is used and it will be kept, of course.

OGL is usually a good specification and as expected, what exactly "compiling" and "linking" does under the hood is implementation detail and driver writers are free to do what they think is best for their particular hardware. Generally, tho, the final steps of compiling are done at link time for better results (like "whole program optimization" in VC). I can not see any reason for D3D not to do the same (it needs to recompile the intermediate anyway).
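As a rough illustration of the compile-in-isolation / link-later split described above, here is a toy model. What a real driver does is an implementation detail, as said, so treat the function names and the variable names as hypothetical: a stage on its own can only drop inputs it never reads, while the link step sees both sides of the interface.

```python
def compile_stage(declared_inputs, used_inputs, outputs):
    """Compile one stage in isolation: unused inputs go, every output stays."""
    return {
        "inputs": [n for n in declared_inputs if n in used_inputs],
        "outputs": list(outputs),
    }

def link(producer, consumer):
    """Match the consumer's inputs to the producer's outputs by name."""
    missing = [n for n in consumer["inputs"] if n not in producer["outputs"]]
    if missing:
        raise ValueError("unmatched inputs: " + ", ".join(missing))
    # With the whole program visible, outputs nobody reads can be dropped too.
    live = [n for n in producer["outputs"] if n in consumer["inputs"]]
    return {"interface": live}

# Hypothetical vertex and fragment stages; the fragment stage never reads v_color.
vs = compile_stage(["position", "normal", "uv"],
                   {"position", "normal", "uv"},
                   ["v_normal", "v_uv", "v_color"])
fs = compile_stage(["v_normal", "v_uv", "v_color"],
                   {"v_normal", "v_uv"},
                   ["frag_color"])
program = link(vs, fs)
```

The vertex stage keeps v_color after its own compile, because it cannot know better. Only the link step can tell that nothing consumes it, which is the gain the mix-and-match approach gives up.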

PS. shader programs can be extracted as binary blobs for caching to skip all of the compiling/linking altogether - i have never found any reason to use them myself (ie. never suffered shader count explosion).

"uniforms" are per shader stage and are hence easy to throw away at compile time. Uniform locations can also have their location defined in glsl source - however, they kind of "forgot" to add that ability with 3.* core, so either 4.3 core or ARB_explicit_uniform_location is needed.

This design choice allows you to pre-compile all your shaders individually offline, and then use them in many ways at runtime with extremely little error checking or linking code inside the driver.
GL can cull variables because it requires both an expensive compilation and linking step to occur at runtime. Basically, D3D traded a small amount of shader author effort in order to greatly simplify the runtimes for CPU performance.

AFAICS, both directions have their good and bad points and a lot of muddy water in-between. However, i can share my observations from OGL fence (NB! Just observations - i can not say whether nor how much it holds water nowadays).

Having a precompiled intermediate is one of the most recurring requests on the OGL side (even after binary blobs were added) - with D3D brought up as an example time and time and time again. So, what's the holdup? If it makes sense for OGL then why is it not added? To paraphrase what the people who actually write the drivers say: the D3D intermediate destroys information, information which is vital for recompiling and optimizing the D3D intermediate (which is far from trivial) into the stuff the hardware actually needs. I imagine the D3D intermediate has significantly improved over the years (ie. adding more high level stuff into it - making it a high level language and undoing the gains the intermediate initially had), but i am not sure (driver devs have become a rarity in the public forums, to put it mildly). Either way, it can not be better than not having the middle-muddle at all.

All it is good for is faster compilation times (at least as is often claimed, but i suspect the claims might be fairly out of date) - which is still way slower than no compiling/linking at all with OGL binary blobs (surely D3D has something similar?). Except one needs to cache those first ... dang.

In Topic: GLSL fragment shader, prevent depth writing

17 May 2013 - 01:03 PM

Early depth testing is very efficient (nothing even remotely as efficient can be done in shader) - mucking with the depth in fragment shader will turn early depth test off (because there is no depth early available to do anything with ... early).

 

You can not disable depth writes in fragment shader - just discard the fragment completely (the "discard" you mentioned) omitting the depth write also.

 

Newer OGL versions might provide something that could be of use tho (but i doubt it) - i rarely use anything above OGL3.3.

