Max performance and support

Started by
11 comments, last by TheChubu 9 years, 1 month ago


Are you implying that this is automatically done when the context is created?
As in: if I do nothing special (no library, an implementation from absolute scratch) and create an OpenGL context targeting my application's highest supported version, will I automatically get support for that version's functions?

Or are you saying that the library that you are using does some additional work to check?

Loading a dynamic library is, in the end, a task of the operating system. One step is to resolve all symbols of the library that are marked for automatic resolution. If the OS cannot do that for even one symbol, loading the library fails. So the library uses symbols it expects to exist directly, and any useful symbols that merely may exist are checked inside the library's own initialization code, so to say. The details depend on the operating system and the compiler/linker used, of course.


Are we talking about the ARB extensions here? Functions such as those in GL_ARB_map_buffer_range?

Yes; ARB, EXT, perhaps even vendor-specific extensions (depending on the target platform). Using something like GLEW in between is possible, of course, but it does not relieve you of checking for availability.


iOS partially does in certain circumstances, such as when attributes in a vertex buffer are misaligned.

i.e., not aligned to 16 bytes? ES 2? I've seen it thrown around that explicit 16-byte alignment is good for some desktop hardware too, AMD cards apparently. I'm assuming that if you're using another API (Mantle? ES 3?) you'd have to do proper alignment in any case.
Every card that I've been able to find specs for in the past decade has required attributes to be 4-byte aligned.
AFAIK, D3D forces this on you, e.g. by not allowing you to define an attribute of data type short3 in an input layout / vertex descriptor: it has a huge enum of all valid formats, each of which combines a data type (such as unsigned short integer) with a component count (such as 4/RGBA).
GL, on the other hand, lets you declare the type (e.g. short integer) and the component count (4/RGBA) separately, which allows you to specify combinations that no hardware supports, such as a short3 with 6-byte alignment.
As mentioned by LS, in that particular case the GL driver has to reallocate your buffer and insert the padding bytes after each element itself, wasting a massive amount of CPU time... It would be much better to just fail hard and early, rather than limping on :(

As for 16/32-byte alignment: these values are generally for the stride of your streams, not the size of each element.
E.g. two interleaved float4 attribs (= 32-byte stride) are better than an interleaved float4 attrib and a float3 attrib (= 28-byte stride).
This rule is less universal and varies greatly by GPU. The reason it's important on some GPUs is that they first have instructions to fetch a whole cache line, and then instructions to fetch individual attributes from that line.
If the vertex fetch cache uses 32-byte cache lines and you also have a 32-byte stride, then it's smooth sailing: just fetch one cache line and then fetch the attribs from it!
But if you have a 28-byte stride, then in the general case you have to fetch two cache lines and deal with reassembling attributes that potentially straddle the boundary between those two lines, resulting in a lot more move instructions being generated at the head of the vertex program :(

Every card that I've been able to find specs for in the past decade has required attributes to be 4-byte aligned. [...]
As mentioned by LS, in that particular case the GL driver has to reallocate your buffer and insert the padding bytes after each element itself, wasting a massive amount of CPU time... It would be much better to just fail hard and early, rather than limping on
Ahh, I see. Yeah, a fail-fast approach would be nice in that case. Or at least a mandatory warning through ARB_debug_output.

Then again, it doesn't sound like a specifically iOS issue then...


As for 16/32-byte alignment - these values are generally for the stride of your streams, not the size of each element. [...] if you have a 28-byte stride, then in the general case you have to fetch two cache lines and deal with reassembling attributes that potentially straddle the boundary between those two lines, resulting in a lot more move instructions being generated at the head of the vertex program
Ahh, I see. I just mentioned the 16-byte number because it's also the alignment for uniform buffer members, i.e., everything has to be fitted into vec4 chunks. I thought it might be the same for vertex attributes.

Not aligned according to the guidelines here. Readin'

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

This topic is closed to new replies.
