# Porting OpenGL to Direct3D 11: How to handle Input Layouts?


## Recommended Posts

Hi guys!

These days I'm trying to add support for Direct3D 11 to my rendering engine, which currently supports OpenGL 3.3 upwards. While writing the abstraction I hit a bit of a roadblock: Input Layouts. To my knowledge, in Direct3D 11 you have to define an Input Layout per shader (by providing shader bytecode), whereas in OpenGL you make glVertexAttribPointer calls for each attribute.

Currently I am using VAOs and store the attribute locations in them by calling glVertexAttribPointer after binding the buffers, like so:

glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);
glGenBuffers(1, &IBO);

glBindVertexArray(VAO);

glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, Vertices.size() * sizeof(Vertex), &Vertices[0], GL_STATIC_DRAW);

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, IBO);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, Indices.size() * sizeof(GLuint), &Indices[0], GL_STATIC_DRAW);

// then do attribpointer calls before unbinding VAO

glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (GLvoid*)0);

......

glBindVertexArray(0);
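For comparison, the rough Direct3D 11 counterpart of the attribute setup above is a D3D11_INPUT_ELEMENT_DESC array paired with shader bytecode at creation time. A sketch only, assuming the Vertex struct holds float3 position, float3 normal, and float2 texcoord members, and that device, vsBytecode, and vsBytecodeSize already exist:

```cpp
// Describes the same attributes the glVertexAttribPointer calls would set up.
D3D11_INPUT_ELEMENT_DESC layoutDesc[] = {
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,
      D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT,
      D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT,
      D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

// The bytecode is used to pair these semantics with the VS input signature.
ID3D11InputLayout* inputLayout = nullptr;
device->CreateInputLayout(layoutDesc, 3, vsBytecode, vsBytecodeSize, &inputLayout);
```

The resulting inputLayout is then bound with IASetInputLayout before drawing, and can be reused by any shader whose VS input signature matches.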


But since doing this per-VAO won't work with Direct3D, I have come up with the following:

Call glVertexAttribPointer every frame for each shader that uses a particular vertex layout (similar to calling IASetInputLayout).

Question 1:

Is this a good idea? Will calling glVertexAttribPointer so often hurt performance? How do you guys handle this in your engines?

Question 2 (bonus :D):

Should I use vertex buffers and index buffers in my OpenGL implementation without VAOs (since Direct3D has no such thing)? Or should I try to emulate VAOs in Direct3D somehow (an array of buffers? That seems awful).

I know it's a lot of questions but it's been driving me nuts. I really want to hear your thoughts. Thanks in advance guys!

##### Share on other sites

While writing the abstraction I hit a bit of a roadblock: Input Layouts. To my knowledge, in Direct3D 11 you have to define an Input Layout per shader (by providing shader bytecode), whereas in OpenGL you make glVertexAttribPointer calls for each attribute.

It's not per-shader, but per vertex shader input structure. If two shaders use the same vertex structure as their input, they can share an Input Layout. The bytecode parameter when creating an IL is actually only used to extract the shader's vertex input structure and pair it up with the attributes described in your descriptor.
In my engine, I never actually pass any real shaders into that function -- I compile dummy code for each of my HLSL vertex structures which is only used during IL creation -- e.g. given some structure definitions:
StreamFormat("colored2LightmappedStream",  -- VBO attribute layouts
{
    [VertexStream(0)] =
    {
        { Float, 3, Position },
    },
    [VertexStream(1)] =
    {
        { Float, 3, Normal },
        { Float, 3, Tangent },
        { Float, 2, TexCoord, 0 },
        { Float, 2, TexCoord, 1, "Unique_UVs" },
        { Float, 4, Color, 0, "Vertex_Color" },
        { Float, 4, Color, 1, "Vertex_Color_Mat" },
    },
})

VertexFormat("colored2LightmappedVertex",  -- VS input structure
{
    { "position",  float3, Position },
    { "color",     float4, Color, 0 },
    { "color2",    float4, Color, 1 },
    { "texcoord",  float2, TexCoord, 0 },
    { "texcoord2", float2, TexCoord, 1 },
    { "normal",    float3, Normal },
    { "tangent",   float3, Tangent },
})

StreamFormat("basicPostStream",  -- VBO attribute layouts
{
    [VertexStream(0)] =
    {
        { Float, 2, Position },
        { Float, 2, TexCoord },
    },
})

VertexFormat("basicPostVertex",  -- VS input structure
{
    { "position", float2, Position },
    { "texcoord", float2, TexCoord },
})
this HLSL file is automatically generated and then compiled by my engine's toolchain, to be used as the bytecode when creating IL objects:
/*[FX]
Pass( 0, 'test_basicPostVertex', {
    vertexLayout = 'basicPostVertex';
})*/
float4 vs_test_basicPostVertex( basicPostVertex inputs ) : SV_POSITION
{
    float4 hax = (float4)0;
    hax += (float4)(float)inputs.position;
    hax += (float4)(float)inputs.texcoord;
    return hax;
}

/*[FX]
Pass( 1, 'test_colored2LightmappedVertex', {
    vertexLayout = 'colored2LightmappedVertex';
})*/
float4 vs_test_colored2LightmappedVertex( colored2LightmappedVertex inputs ) : SV_POSITION
{
    float4 hax = (float4)0;
    hax += (float4)(float)inputs.position;
    hax += (float4)(float)inputs.color;
    hax += (float4)(float)inputs.color2;
    hax += (float4)(float)inputs.texcoord;
    hax += (float4)(float)inputs.texcoord2;
    hax += (float4)(float)inputs.normal;
    hax += (float4)(float)inputs.tangent;
    return hax;
}

Won't claim this is the best/only way of doing this, but I define a "Geometry Input" object that is more or less equivalent to a VAO. It holds a vertex format and the buffers that are bound all together in one bundle. The vertex format is defined identically to D3D11_INPUT_ELEMENT_DESC in an array. In GL, this pretty much just maps onto a VAO. (It also virtualizes neatly to devices that don't have working implementations of VAO. Sadly they do exist.) In D3D, it holds an input layout plus a bunch of buffer references and the metadata for how they're bound to the pipeline.
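A minimal sketch of what such a "Geometry Input" bundle might hold (names are hypothetical; the attribute description roughly mirrors D3D11_INPUT_ELEMENT_DESC):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical attribute description -- roughly mirrors D3D11_INPUT_ELEMENT_DESC.
struct AttributeDesc {
    const char*   semantic;       // "POSITION", "TEXCOORD", ...
    std::uint32_t semanticIndex;  // e.g. the 0 in TEXCOORD0
    std::uint32_t format;         // engine enum mapped to a DXGI_FORMAT / GL type+size
    std::uint32_t bufferSlot;     // which bound vertex buffer feeds this attribute
    std::uint32_t byteOffset;     // offset of the attribute within a vertex
};

// "Geometry Input": maps onto a VAO in GL; in D3D11 it holds an input layout
// plus buffer references and the metadata needed to bind them.
struct GeometryInput {
    std::vector<AttributeDesc> attributes;    // the vertex format
    std::vector<std::uint32_t> vertexBuffers; // backend buffer handles, one per slot
    std::vector<std::uint32_t> strides;       // per-slot stride for IASetVertexBuffers
    std::uint32_t indexBuffer = 0;            // 0 = non-indexed geometry
};
```

The GL backend translates this into one VAO; the D3D backend keeps the buffer handles and strides around for IASetVertexBuffers and looks up a matching IL at draw time.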

The only problem with that is that an IL is a glue/translation object between a "Geometry Input" and a VS input structure -- it doesn't just describe the layout of your geometry/attributes in memory, but also describes the order that they appear in the vertex shader. In the general case, you can have many different "Geometry Input" data layouts that are compatible with a single VS input structure -- and many VS input structures that are compatible with a single "Geometry Input" data layout.
i.e. in general, it's a many-to-many relationship between the layouts of your buffered attributes in memory, and the structure that's declared in the VS.

In my engine:
* the "geometry input" object contains an "attribute layout" object handle, which describes how the different attributes are laid out within the buffer objects.
* the "shader program" object contains a "vertex layout" object handle, which describes which attributes are consumed and in what order.
* When you create a draw-item (which requires specifying both a "geometry input" and a "shader program"), then a compatible D3D IL object is fetched from a 2D table, indexed by the "attribute layout" ID and the "vertex layout" ID.
* This table is generated ahead of time by the toolchain, by inspecting all of the attribute layouts and vertex layouts that have been declared, and creating input layout descriptors for all the compatible pairs. Edited by Hodgman
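The 2D-table lookup described above can be sketched roughly like this (names are hypothetical; integer handles stand in for real ID3D11InputLayout pointers):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical handle for a baked ID3D11InputLayout.
using InputLayoutHandle = std::uint32_t;
constexpr InputLayoutHandle kIncompatible = 0;

// 2D table indexed by [attributeLayoutId][vertexLayoutId], filled ahead of
// time by the toolchain for every compatible (attribute layout, VS layout) pair.
class InputLayoutTable {
public:
    InputLayoutTable(std::size_t numAttribLayouts, std::size_t numVertexLayouts)
        : stride_(numVertexLayouts),
          table_(numAttribLayouts * numVertexLayouts, kIncompatible) {}

    void set(std::size_t attribId, std::size_t vertexId, InputLayoutHandle il) {
        table_[attribId * stride_ + vertexId] = il;
    }

    // Fetched at draw-item creation time; kIncompatible means the pair was
    // never declared compatible by the toolchain.
    InputLayoutHandle get(std::size_t attribId, std::size_t vertexId) const {
        return table_[attribId * stride_ + vertexId];
    }

private:
    std::size_t stride_;
    std::vector<InputLayoutHandle> table_;
};
```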

##### Share on other sites

For our relatively simple purposes and limited formats, we can simply standardize attribute ordering. In general I find the D3D 11 shader bytecode thing to be an unwelcome extra step - though maybe the wider hardware selection forces it? It seems like just a validation step I don't want. You have the order as a result of the input element desc array anyway, so it's easy to generate a shader just to get through the validation. Metal simply has the shader fetch from the buffer, and it's your own responsibility to make sure that what the shader is fetching and what the buffer is offering match. That's way cleaner for me.

I'm not sure we're doing anything all that different, though.

Edited by Promit

##### Share on other sites

though maybe the wider hardware selection forces it? It seems like it's just a validation step I don't want though.

I used to struggle with the concept myself, and everyone I've worked with in the graphics space also finds it a pain. It's especially painful for simpler projects.

It's both required on some hardware and an important optimization on others.

I struggled with this a lot too, and even various pro graphics developers I know rather hate it. Unfortunately, it's not going away - note that in D3D12, the input layout is baked into the PSO and so is even more cumbersome than it is in D3D11. If it helps, mentally tack the word State onto the end of ID3D11InputLayout. It's just like BlendState, RasterizerState, or DepthStencilState in many regards, except of course for the caveat that it binds very directly to the shader code in use.

So, let's explain. The shader is expecting various inputs. Say, it's expecting an input in slot 3 that must be decoded to a float4. Armed with only this information, the input assembler doesn't know what the _source_ format is. Are you supplying floats? Normalized integers? Unnormalized integers? A compressed format? The shader doesn't know and hence the IA can't know just from the bytecode.

The input geometry has its attributes placed in various formats. It knows that a given attribute is in the first buffer and is a 4-component integer that should be normalized. Unfortunately, there's no indication of what to do with that input. Is it even being used? Which IA slot should actually consume it? Is the semantic TEXCOORD0 supposed to go into slot 0, 1, 2, etc.? Is there some further operation being performed on it? What format should it be unpacked into?

Some hardware hence requires the IA to have a block of state defining all these bindings very explicitly. Even if it weren't required, looking at the bytecode and input layout, finding all the common elements, hashing those, and looking those up internally in the driver on every IASetInputLayout or VSSetShader call is difficult to make efficient.

In D3D's model you have to write a cache yourself, but you can make said cache minimal and efficient. In GL's model, the VAO/shader cache must be magic in the driver, and must handle a great deal of generalization that probably doesn't matter to your app. This is just another example of D3D's state-block model being more cumbersome than GL's state-machine model while being more efficient - just as D3D12 and Vulkan are far more cumbersome yet can be far more efficient. Likewise, just as D3D12's power comes at the cost of decreased efficiency if you aren't putting in the right effort, D3D11's model is likely less efficient than GL's _if_ you don't bother with a decent geometry<->shader cache.
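A minimal sketch of such a self-managed cache (all names hypothetical; assumes geometry layouts and VS input signatures have been reduced to small integer ids elsewhere):

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Key identifying a (geometry layout, VS input signature) pairing.
struct LayoutPairKey {
    std::uint32_t geometryLayoutId;
    std::uint32_t vsSignatureId;
    bool operator==(const LayoutPairKey& o) const {
        return geometryLayoutId == o.geometryLayoutId &&
               vsSignatureId == o.vsSignatureId;
    }
};

struct LayoutPairHash {
    std::size_t operator()(const LayoutPairKey& k) const {
        // Pack both 32-bit ids into one 64-bit value and hash that.
        return std::hash<std::uint64_t>()(
            (std::uint64_t(k.geometryLayoutId) << 32) | k.vsSignatureId);
    }
};

// Lazily creates an IL handle per pairing; every later draw is a map lookup.
class InputLayoutCache {
public:
    template <class CreateFn>  // CreateFn is called only on a cache miss
    std::uint32_t getOrCreate(LayoutPairKey key, CreateFn create) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        std::uint32_t il = create(key);  // e.g. wraps CreateInputLayout
        cache_.emplace(key, il);
        return il;
    }

private:
    std::unordered_map<LayoutPairKey, std::uint32_t, LayoutPairHash> cache_;
};
```

Because the app knows exactly which pairings it uses, this cache stays tiny compared to the fully general one a GL driver must maintain.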

These days I'm trying to add support for Direct3D 11 to my rendering engine, which currently supports OpenGL 3.3 upwards. While writing the abstraction I hit a bit of a roadblock: Input Layouts.

There's a decent approach very similar to what has been mentioned previously that more closely matches what the Big Engines do and closely matches what most Small Engines probably _should_ do.

The first thing to realize is that your geometry buffers represent some actual object(s). They have meaning, semantics, and purpose. They can only be rendered in a handful of ways relevant to your engine. At the simplest commercializable end, these ways are possibly just "render to G-Buffer" and "render to depth buffer" (for shadows, z prepass, etc.).

When you define your objects/geometry you then also generally _must_ define this set of rendering methods, often called Materials (though that term has a few different meanings). A Material only needs to work with geometry in a particular input layout because the asset baking pipeline can ensure this will happen (and you can split up bigger and more complex interleaved buffers).

These rendering methods are, roughly, shaders + state (e.g. mostly a PSO in D3D12). That also turns into a great place to put input layouts! In fact in D3D12, it _must_ be where you put the input layouts since those are baked into PSOs.

Your code can thus be structured something like this:

class Buffer;

class RenderPass {
public:
    void setDepthState(bool enabled, DepthOp op);
    // ... setters for the rest of the fixed-function state
};

class Material {
public:
    void setLayout(const AttributeLayout& attributes); // all passes must be compatible with this layout
};

class MaterialInstance { // this is the object that holds or caches all the compiled things
    Material* material;
};

class Object {
    MaterialInstance* material;
    std::vector<Buffer*> buffers; // matches the layout required by the Material
};

class RenderQueue {
public:
    void draw(const Object& object, PassId passId);
};


The above is easy to extend with static vs dynamic attributes which is handy for more advanced graphical effects and architectures.

##### Share on other sites

though maybe the wider hardware selection forces it? It seems like it's just a validation step I don't want though.

I used to struggle with the concept myself, and everyone I've worked with in the graphics space also finds it a pain. It's especially painful for simpler projects.

It's both required on some hardware and an important optimization on others.

I struggled with this a lot too, and even various pro graphics developers I know rather hate it. Unfortunately, it's not going away - note that in D3D12, the input layout is baked into the PSO and so is even more cumbersome than it is in D3D11. If it helps, mentally tack the word State onto the end of ID3D11InputLayout. It's just like BlendState, RasterizerState, or DepthStencilState in many regards, except of course for the caveat that it binds very directly to the shader code in use.

So, let's explain. The shader is expecting various inputs. Say, it's expecting an input in slot 3 that must be decoded to a float4. Armed with only this information, the input assembler doesn't know what the _source_ format is. Are you supplying floats? Normalized integers? Unnormalized integers? A compressed format? The shader doesn't know and hence the IA can't know just from the bytecode.

The input geometry has its attributes placed in various formats. It knows that a given attribute is in the first buffer and is a 4-component integer that should be normalized. Unfortunately, there's no indication of what to do with that input. Is it even being used? Which IA slot should actually consume it? Is the semantic TEXCOORD0 supposed to go into slot 0, 1, 2, etc.? Is there some further operation being performed on it? What format should it be unpacked into?

Some hardware hence requires the IA to have a block of state defining all these bindings very explicitly. Even if it weren't required, looking at the bytecode and input layout, finding all the common elements, hashing those, and looking those up internally in the driver on every IASetInputLayout or VSSetShader call is difficult to make efficient.

Ok, I see what you're driving at. When you supply the bytecode, there's wiring of what attributes link to what semantics in the shader, and the order of these two need not match in declaration or layout. So the runtime has to generate fetch rules in the IA that tell it how to link things up. Once it's done that, there are specific rules defining what IAs are compatible with what shaders (prefixes, basically). For my purposes, the declaration and memory layout ordering always matches - I forgot this was a potential issue in the first place. That's why I can get away with generating a shader instead of using a real one (though usage-wise it's preferred to supply a real one).

On PSO APIs, my 'shader' object encapsulates a total PSO and all other pieces of the pipeline are required to conform. Works well enough in my smaller setting, in larger engines it'd be tooling-generated as Hodgman mentioned.

In D3D's model you have to write a cache yourself, but you can make said cache minimal and efficient. In GL's model, the VAO/shader cache must be magic in the driver, and must handle a great deal of generalization that probably doesn't matter to your app. This is just another example of D3D's state-block model being more cumbersome than GL's state-machine model while being more efficient - just as D3D12 and Vulkan are far more cumbersome yet can be far more efficient. Likewise, just as D3D12's power comes at the cost of decreased efficiency if you aren't putting in the right effort, D3D11's model is likely less efficient than GL's _if_ you don't bother with a decent geometry<->shader cache.

The VAO design is idiotic. What do we have in D3D 11/12, Vulkan, Metal, etc? There's a block of state, which pre-configures the necessary pipeline pieces to do all the operations they need to do. All we do at that point is plug in the pointers to the right buffers, render, change pointers, render. But GL/VAO? No, GL and VAO have to link the actual physical buffer pointers themselves into the state. So now I have a unique VAO per object (or group of objects in the cases that I can share VBOs and offset) and I have to switch between them even when the actual vertex layout itself matches. So now they're making things difficult for both the user AND the driver, AND this intersects poorly with a proper PSO implementation. Ugh. Of course they finally had to go back and fix this with ARB_vertex_attrib_binding, which was standardized for 4.3. Naturally, a number of platforms don't support it.

Edited by Promit

##### Share on other sites

No, GL and VAO have to link the actual physical buffer pointers themselves into the state. So now I have a unique VAO per object (or group of objects in the cases that I can share VBOs and offset) and I have to switch between them even when the actual vertex layout itself matches. So now they're making things difficult for both the user AND the driver, AND this intersects poorly with a proper PSO implementation. Ugh. Of course they finally had to go back and fix this with ARB_vertex_attrib_binding, which was standardized for 4.3. Naturally, a number of platforms don't support it.

Oh goodness, I totally forgot about that little turd of a detail.

The reason for it is that GL also binds buffer-specific attributes into its "layout" details in the VAO. Note in D3D for instance that you specify the offset and stride when you bind the buffers, not in your input layout / PSO. That said, it's still a bit bewildering that GL didn't let you swap the buffers even when they have the same offset and stride. :/

Did they at least fix/replace VAO's in OpenGL 4.nobody_can_use_this_yet ? I feel like they did.

Ah, yeah: https://www.opengl.org/wiki/GLAPI/glBindVertexBuffer / https://www.opengl.org/sdk/docs/man/html/glBindVertexBuffer.xhtml

glBindVertexBuffer modifies the currently bound VAO. So you'd use VAO as an input layout equivalent then rebind the buffers as needed before drawing. Those are extensions in 4.3 and core in 4.5, though, so hardware support is likely very hit and miss (and mobile is screwed as usual).
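With those extensions, the format/buffer split looks roughly like this (a sketch only; assumes a GL 4.3+ context and a Vertex struct with a position member):

```cpp
// Format half (the "input layout" equivalent) -- set once on the VAO:
glBindVertexArray(vao);
glEnableVertexAttribArray(0);
glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, offsetof(Vertex, position));
glVertexAttribBinding(0, 0);  // attribute 0 pulls from binding point 0

// Buffer half (the IASetVertexBuffers equivalent) -- rebind per object:
glBindVertexBuffer(0, vbo, 0, sizeof(Vertex));  // binding point, buffer, offset, stride
```

This finally matches D3D's split: the VAO holds only the format, and buffer pointers can be swapped without touching it.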

##### Share on other sites

That's the extension I mentioned, ARB_vertex_attrib_binding. You're slightly off on the versions - it's an extension to 4.2, available in 4.3 core. Also in ES 3.1. This means Windows on recent drivers yes, Linux on proprietary drivers yes, Mac no, mobile only cutting edge. Assuming it's implemented correctly...

Edited by Promit
