OpenGL samplers, textures and texture units (design question)


I'm porting a DX rendering framework to OpenGL. The DX framework follows the DX API: there is no relation between samplers and textures, and binding them to shaders is done separately. Pseudo-code of the DX framework looks like this:


pProgram->SetSampler(sampler_shader_var_name, pSomeSampler);
pProgram->SetTexture(texture_shader_var_name, pSomeTexture);

So from the user's perspective, both are just another type of shader variable.

I'm trying to achieve the same simplicity using OpenGL, but the way GL works complicates things.

The problem is that samplers are not bound to shader variables, but to a texture unit. Once a sampler is bound to a texture unit, it affects whatever texture is bound to that unit.
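For context, this is roughly what the raw GL calls look like (a minimal sketch; names such as unit, someTexture, someSampler and texUniformLoc are just placeholders):

glActiveTexture(GL_TEXTURE0 + unit);       // select a texture unit
glBindTexture(GL_TEXTURE_2D, someTexture); // attach the texture to that unit
glBindSampler(unit, someSampler);          // attach the sampler object to the same unit
glUniform1i(texUniformLoc, unit);          // point the GLSL sampler uniform at the unit

The GLSL sampler uniform only ever refers to a texture unit, so the texture and the sampler object meet at the unit rather than at the shader variable.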

The simplest solution is to let the user manage texture units on their own, but that means losing the abstraction.

Another solution is to create a pseudo-sampler state (not a GL object), let the user bind it to a texture, and apply it by changing the texture's sampler parameters. This is not good, since it means I can only use one sampler at a time with each texture.

I have a bunch of other solutions, but nothing as clean as the DX code.

Well, I'm stuck. Spent the last 4 hours thinking, coding, deleting and vice-versa. Any advice on a clean way to do it?

Ohhh... you're in a world of pain.

Mostly because very advanced developers and engineers can't agree whether the separation (DX11) or the merge (GL) is the best one. Arguments about being faster/more efficient, hardware friendly, clearer, and easier to use have been made for... both. Sometimes even the same reasoning has been made!
The only explanation is that some people simply prefer one method, and others prefer the other.

Sampler objects are a nice way to deal with the issue, and they were made specifically with DX11 porting in mind. The extension is widely supported, so it's pretty convenient.
While sampler objects will be bound to a texture unit (and hence, from the GLSL perspective, they're merged), from a C++ perspective you can treat textures and samplers as separate objects until it's time to assign them to a texture unit (i.e. merge them).

Edit: Personally I think you're overthinking it, because it is very rare to see the same texture sampled with different filtering parameters / mip settings within the same shader in the same pass. Just bind the same texture to two texture units and use two GL sampler objects, one for each texture unit.
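For example, here is a rough sketch of how the C++ side could keep the DX-style separation and merge only at bind time (the Program class, TextureBinding struct and member names are all made up, and it assumes the program is already bound with glUseProgram):

struct TextureBinding { GLint uniformLocation; GLuint texture; GLuint sampler; }; // one per texture shader variable

void Program::ApplyBindings()
{
    GLuint unit = 0;
    for (const TextureBinding& b : mTextureBindings)
    {
        glActiveTexture(GL_TEXTURE0 + unit);
        glBindTexture(GL_TEXTURE_2D, b.texture);      // texture the user set on this variable
        glBindSampler(unit, b.sampler);               // sampler the user set on this variable
        glUniform1i(b.uniformLocation, (GLint)unit);  // merge: point the GLSL sampler at this unit
        ++unit;
    }
}

If the same texture is used with two different samplers, it simply ends up bound to two units.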

While sampler objects will be bound to a texture unit

Is that so? I always thought it's the texture object that stores the sampling parameters.

I've made the DX9 implementation of my render lib behave that way.

https://www.opengl.org/wiki/Sampler_Object

"You could say that a texture object contains a sampler object, which you access through the texture interface."

https://www.opengl.org/wiki/Texture

[Image: Anatomy of a Texture]

PS: re-checked. Sampling parameters are stored in the texture object, not in the texture unit.

Binding different textures to the same texture unit won't force the same filtering on all of them.


PS: re-checked. Sampling parameters are stored in the texture object, not in the texture unit.

Binding different textures to the same texture unit won't force the same filtering on all of them.

Sorry, my original post was unclear: I was talking about sampler objects, not sampler parameters. Sampler objects are bound to a texture unit.

Is that so? I always thought it's the texture object that stores the sampling parameters.
I've made the DX9 implementation of my render lib behave that way.

It's the old way that sampling parameters are stored with the texture object. But it was recognized that this isn't a clean approach, because it is totally legal, and sometimes wanted, to change the sampling while the texture's pixel data stays the same, or to change the pixel data while the sampling is kept. (One could say that texture objects violate the single responsibility principle.) The solution currently available is the sampler object, which stores sampling parameters only. However, sampling parameters have not (yet) been removed from texture objects. IIRC, if you bind a sampler object to a texture unit, then the sampling parameters of the texture object are ignored and those of the sampler object are used; if no sampler object is bound, the sampling parameters within the texture object are used.

Indeed. It's as haegarr said.

Before the SamplerObjects extension, texture parameters lived inside the texture. If you wanted to use the same texture with different sampling parameters (i.e. clamp vs. wrap, point vs. bilinear filtering, etc.), you had to clone the texture and use twice as much GPU RAM.

The SamplerObjects extension addressed this issue: sampling parameters are now separated from the texture, and when a sampler object is bound to a texture unit, it overrides the internal settings of the texture.
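A minimal sketch of that override behaviour (assuming a GL 3.3+ / ARB_sampler_objects context; tex is a placeholder for an existing texture):

// The texture still carries its own (legacy) sampling state...
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

// ...but a sampler object bound to the unit overrides it for as long as it stays bound.
GLuint smp;
glGenSamplers(1, &smp);
glSamplerParameteri(smp, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glSamplerParameteri(smp, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glBindSampler(0, smp);   // unit 0 now samples with the sampler object's parameters

glBindSampler(0, 0);     // unbinding falls back to the texture's own parameters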


The solution currently available is the sampler object

Yeah, I didn't read Matias correctly. He clearly wrote "sampler object".

I'm not entirely sure this could be applied effectively for bigger projects, but as a way to simplify all of this I decided simply to fix samplers to specific texture units.

For example, texture unit 0 is the diffuse texture, texture unit 1 is the normal map, texture unit 2 is the glow map, etc. That way you just bind specific samplers to those: say, a 16x AF sampler for the first two and a plain linear sampler for the third.

This makes texture binding very straightforward: if the mesh has a diffuse map, it gets bound to texture unit 0; if it has a normal map, it gets bound to texture unit 1; and so on. If you want to change the sampling parameters for all textures of a given kind, just rebind a sampler on that texture unit.

Now, this obviously restricts what you can do quite a bit, and if you want to go with a configurable rendering pipeline it's not a good idea. But I'd say to trim the corners whenever you can: code not for all the possible things you could do, but for the things you're sure you're going to do.
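As a rough illustration of that convention (the enum values and sampler handles are made up):

enum TexUnit { TEX_DIFFUSE = 0, TEX_NORMAL = 1, TEX_GLOW = 2 };

// At init time: bind the samplers once, they rarely change.
glBindSampler(TEX_DIFFUSE, anisotropicSampler);   // e.g. 16x AF
glBindSampler(TEX_NORMAL,  anisotropicSampler);
glBindSampler(TEX_GLOW,    linearSampler);

// Per mesh: only the textures change.
glActiveTexture(GL_TEXTURE0 + TEX_DIFFUSE);
glBindTexture(GL_TEXTURE_2D, mesh.diffuseTex);
glActiveTexture(GL_TEXTURE0 + TEX_NORMAL);
glBindTexture(GL_TEXTURE_2D, mesh.normalTex);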

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator


Mostly because very advanced developers and engineers can't agree whether the separation (DX11) or the merge (GL) is the best one. Arguments about being faster/more efficient, hardware friendly, clearer, and easier to use have been made for... both.

On current generation hardware, when programming for the GPUs directly (instead of programming for D3D/GL), there are structures that map to the GPU-native data structures used by the hardware when performing memory fetches. For simplicity, they can look like (where the number 4 is made up and may differ -- but is actually accurate for SamplerDesc on the AMD Southern Islands ISA):


typedef unsigned int u32; // a 32-bit GPU register word
struct TextureDesc { u32 registers[4]; }; // where the texture lives, its format, dimensions, ...
struct SamplerDesc { u32 registers[4]; }; // filtering / addressing / LOD state
struct BufferDesc  { u32 registers[4]; }; // where the buffer lives, its size/stride, ...

Internally, ID3D11SamplerState would contain one of these SamplerDesc structures. CreateSamplerState converts the platform-agnostic D3D11_SAMPLER_DESC structure into this GPU-specific structure.

Likewise, an ID3D11ShaderResourceView object contains either a TextureDesc or a BufferDesc, which in turn contains a pointer to the memory allocation, the format of the data, the width/height/etc...
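Purely as an illustration (this is not real D3D or driver code), that relationship can be pictured like this:

struct MyDriverSamplerState { SamplerDesc desc; }; // CreateSamplerState fills desc from D3D11_SAMPLER_DESC
struct MyDriverShaderResourceView                  // a view wraps one resource descriptor
{
    bool isBuffer;
    TextureDesc tex;   // used when the view wraps a texture
    BufferDesc  buf;   // used when the view wraps a buffer
};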

When a shader is compiled into actual GPU-native assembly, it ends up with a per-draw-call header looking something like below (where the comments are metadata used by the driver to match up these header entries with API slots). Say our shader has 4 HLSL texture uniforms, 1 sampler uniform and 1 cbuffer uniform --


struct MyShaderNumber42
{
  TextureDesc textures[4];//diffuse(location: t0), specular(location: t1), normal(location: t2), lightmap(location: t7)
  SamplerDesc samplers[1];//smp_bilinear(location: s0)
  BufferDesc buffers[1];//cbObject(location: b7)
};

The actual GPU-native shader assembly language has instructions to load data from textures/buffers without filtering -- these functions take a TextureDesc or BufferDesc (and an offset/coordinate) as parameters.

It also has instructions to load from textures with filtering -- these functions take a TextureDesc and a SamplerDesc (and an offset/coordinate).

Assuming you're using regular filtered texture-sampling instructions, whether or not you choose to use one-sampler-per-texture-slot has a huge impact on performance and on the memory overhead incurred by each draw call.

If you have 4 textures, but they all use the same filtering options, then one-sampler-per-texture-slot results in (sizeof(TextureDesc)+sizeof(SamplerDesc))*4 == 128 bytes of descriptors that have to be fetched per-pixel-wavefront.

With a shared sampler you get sizeof(TextureDesc)*4 + sizeof(SamplerDesc) == 80 bytes of descriptor data to fetch per wavefront.

In order for a pixel shader to carry out a texture fetch, its wavefront has to fetch the TextureDesc and SamplerDesc objects from memory first, so that it knows how to perform the fetch and where from.

Over a single full-screen 1080p draw-call, that's a saving of ~1.5MiB of memory bandwidth [(128-80 bytes) * 1920*1080 pixels / 64 pixels-per-wavefront] (or 89MiB/s if the game is running at 60Hz), simply by not fetching the useless sampler descriptors. (D3D style = ~2.5MiB/draw, GL-style = ~4MiB/draw)

That's a decent GPU-side saving... Even if I do exaggerate slightly -- because that number is assuming that there is no cache between the shader units and RAM. In practice, the actual traffic between the GPU-RAM and the shader-unit's L2 cache is going to be somewhat lower of course.

D3D11 makes this optimization simple for the driver-authors -- the HLSL shader code makes the separation explicit, so they know at compile time how many Texture and Sampler descriptor structures are required, and which objects are the inputs for each fetch instruction. They can generate the appropriate header and fetch instructions at compile-time.

GL makes the drivers very complex :( AFAIK, even with the separate sampler objects extension, GLSL still doesn't expose the separation, and still acts like there is one sampler per texture...

This leaves the driver authors two choices:

1) They implement the sampler objects extension to let the users (i.e. us) pretend that we're using the new DX11 style of using separate samplers, but internally they still make one sampler descriptor for every texture and then just copy our shared sampler object's contents many times into duplicated descriptors. This option makes porting from DX11 easier, but incurs the stupid GPU-side penalties described above (and the tiny CPU-side per-draw-call overhead of duplicating the sampler objects about the place).

2) They do actually send the minimal number of SamplerDesc structures to the GPU, like DX11 can easily do. However, this is very complex, as the shaders have been written assuming one sampler per texture slot, which means that when choosing this option, the driver authors can't fully pre-compile the shaders into GPU-specific assembly ahead of time. So, at draw-call time they have to analyze the currently bound objects and find the unique set of samplers. They then need to potentially patch the shader ASM, generating a smaller header with the right number of SamplerDesc structures, and fix up all of the texture-fetch instructions to reference the correct SamplerDesc structure within the header. They'll then need to cache that modified permutation of the shader so that it can be quickly fetched the next time there's a draw-call with the same kind of sampler bindings... This is exacerbated by the fact that if a user has an HLSL shader that uses one texture with two different samplers, then in their GLSL port they'll end up with two textures to represent it! So this advanced GL driver also needs to realize that the unique set of bound textures is smaller than the number of texture slots described by the shader, and take this information into account when recompiling/re-optimizing the shader as well...

Needless to say, that's a huge first-time-draw-call overhead (and a moderate every-time-draw-call overhead) to perform an optimization that should be dead simple and done once at shader-load time. You should not be compiling shader code inside your draw calls... It's another reason why it's important to 'prime' the driver by drawing every object with its shader and all of its potential pipeline states once at load time, to ensure the driver has actually finished generating and caching all the code it needs.

If your GL drivers are advanced enough to use option #2, then using the sampler objects extension may help improve GPU-side performance, but it may come at a cost of occasional CPU-side driver-time spikes due to shader patching... If your GL drivers are using option #1, then it really makes no difference whether you use the sampler objects extension or not.

So basically: separate sampler states are a great choice on modern GPUs; however, it likely doesn't actually matter whether you use them or not under GL, because you're probably screwed either way. :D

@Hodgman:

Oh, you want to start a war, don't you? :)

The thing is, DX has its trade-offs. GL may use more bandwidth in the worst-case scenario (although these descriptors often fit in the L1 cache, and wave occupancy also influences how often the data is re-fetched), but the DX style involves more instructions and more pointer chasing. GDDR is good at bandwidth, not at pointer chasing.

Timothy Lottes has two posts with a very thorough analysis of both styles on modern HW. The short version is that there is no ultimate best solution; it depends on what your bottleneck is and on the characteristics of your scene.

Interestingly, Timothy Lottes ends up concluding that GL-style fetching is still superior in the general case. However, this will be an endless debate...

