Graphics engines: what's more common; standard vertex structure or dynamic based on context?

Started by
14 comments, last by Hodgman 8 years, 10 months ago
What's more common in engines? Using a standard one-size-fits-all vertex structure, or a custom vertex structure depending on the context, and delegating the choice of vertex composition, and associated shader input layouts, to the game developer? Is there any significant downside to hardcoding a standard vertex structure? Is vertex structure flexibility a case of YAGNI?

Vertex structure depends on your GPU pipeline; that has been the case for at least 7 years.

You don't want to waste memory/bandwidth and so on, so you don't want a one structure fits all thing.

Not only is it simple to achieve, it's also important for high performance, so no, it's nothing like YAGNI.

A little explanation of how to proceed:

Your vertex buffer is a black box: unless you perform morphing/shape animation on the CPU, you don't need to know anything about it besides its size; only the GPU needs to know the layout. So you load your buffer into memory and connect it to your GPU pipeline (by that I mean your set of shaders and states making a pipeline, as in the D3D12/Mantle/Vulkan definition, which we have used for a long time on the hardware side... ^^).

You'll probably have either a scripting system (FX-like) or compiled code defining the vertex layout for a given GPU pipeline. Both are fine, since you'll need compiled code to generate the vertex shader signature anyway. (With the second solution you spare yourself writing a parser, but if you already have one, the first is a little more flexible, i.e. data driven.)

-* So many things to do, so little time to spend. *-

My engine is data driven, and I'd expect the same from other big engines. In the game's data files, I declare "stream formats" (how vertex data is laid out on disk / in memory) and "vertex formats" (the input to vertex shaders) and then declare which ones are compatible with each other. Those declared compatible pairs describe the fixed-function config of the input-assembler stage -- a.k.a. vertex declaration / input layout / vertex attribute formats.

These input assembler layouts, and reflection data on the other structures, are compiled from human-readable text into an efficient binary format for the engine runtime.

When compiling a model file supplied by an artist, I first check which material is assigned to it, which tells you which vertex shaders are potentially going to be used to draw the model. That gives you a list of vertex formats that the model has to be compatible with, and from there you can find a stream format that is compatible with all of them. You can then validate that the artist's data is compatible with that stream format, and import their data / convert it to that format and save to disk.

If you can't find a stream format that's compatible with the required vertex formats, I emit an error blaming the graphics programmer.
If the artist's file isn't compatible with the stream format, I emit an error blaming the artist (e.g. this material requires tangents, but the model was exported without tangents).
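The compatibility check described above can be sketched roughly like this. This is a minimal sketch, not the actual engine code: `StreamFormat`, `VertexFormat`, and matching attributes by name are hypothetical simplifications (a real engine would compare element formats, offsets, and strides too):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical: StreamFormat describes how vertex data is laid out on disk /
// in memory; VertexFormat describes the attributes a vertex shader reads.
struct StreamFormat { std::string name; std::vector<std::string> attributes; };
struct VertexFormat { std::string name; std::vector<std::string> attributes; };

// A stream format is compatible with a vertex format if it supplies every
// attribute the vertex shader consumes (a superset check).
bool isCompatible(const StreamFormat& s, const VertexFormat& v)
{
    for (const auto& attr : v.attributes)
        if (std::find(s.attributes.begin(), s.attributes.end(), attr) == s.attributes.end())
            return false;
    return true;
}

// Find one stream format compatible with *all* vertex formats the material's
// shaders might use; nullptr means "blame the graphics programmer".
const StreamFormat* findStreamFormat(const std::vector<StreamFormat>& streams,
                                     const std::vector<VertexFormat>& required)
{
    for (const auto& s : streams)
    {
        bool ok = true;
        for (const auto& v : required)
            ok = ok && isCompatible(s, v);
        if (ok)
            return &s;
    }
    return nullptr;
}
```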

Is there any significant downside to hardcoding a standard vertex structure?

Severe performance impacts on the vertex shader, and increased memory requirements. For a PC game you could probably get away with it. On a last-gen console game (or a current-gen one later on in this cycle when people are pushing the limits), not so much.

There are 3 categories here and you need to decide where you fit.

#1: tri-Ace—an in-house engine designed to meet a set of specific and focused needs. While intended for use on many projects, there aren't many variations on the vertex-buffer layouts. The models export to only a few formats and there is only one primary programmer-usable format. Even if you only care about positions and colors, you still get normals, tangents, and UVs along with them, which wastes memory and vertex transfers.
This still works because the hard-coded format is avoided as much as possible and the model formats are not very different from each other.

#2: Luminous Engine—an in-house engine designed to meet the demands of more types of games, but still with some focus on the kinds of games it targets.
There are no hard-coded formats, and a few more variations on model formats than #1. There is no waste, but the 2 systems are separate (models vs. your own custom buffers), which decreases the time spent developing the systems but makes it a bit harder to use.

#3: Any public game engine—a fully general-purpose game engine with no specific focus in mind. There are no limitations on the formats you can use and the models use the same system as you would if you were to make one manually. Everything is user-friendly, but it takes more development time to create a system that is also fully capable and flexible.


Developing a fully capable and flexible wrapper around vertex formats while making them easy to use takes a bit of time and know-how, but as you can see there are ways to meet in the middle. You should do what best suits your needs.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Why not just split vertex buffers depending on what attributes are present, and then only bind those that exist when drawing?

Creating the buffers:


// position buffer
D3D11_BUFFER_DESC bufferDescription;
ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
bufferDescription.Usage = D3D11_USAGE_IMMUTABLE;
bufferDescription.ByteWidth = vertexData.size() * sizeof(float);
bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;

D3D11_SUBRESOURCE_DATA initData;
ZeroMemory(&initData, sizeof(D3D11_SUBRESOURCE_DATA));
initData.pSysMem = &vertexData.at(0);
DXCALL(device->CreateBuffer(&bufferDescription, &initData, &mVertexBuffer));

// normal buffer
if (normalData.size() > 0)
{
    ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
    bufferDescription.Usage = D3D11_USAGE_IMMUTABLE;
    bufferDescription.ByteWidth = normalData.size() * sizeof(float);
    bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;

    ZeroMemory(&initData, sizeof(D3D11_SUBRESOURCE_DATA));
    initData.pSysMem = &normalData.at(0);
    DXCALL(device->CreateBuffer(&bufferDescription, &initData, &mNormalBuffer));
}

// tangent buffer
if (tangentData.size() > 0)
{
    // ...
}

// bitangent buffer
if (bitangentData.size() > 0)
{
    // ...
}

// texcoord buffer
if (texCoords.size() > 0)
{
    // ...
}

Drawing:


const uint32_t vertexSize = sizeof(float) * 3;
const uint32_t offset = 0;

mContext->IASetVertexBuffers(VertexBufferSlot::VERTEX_BUFFER_SLOT_VERTICES, 1, &mVertexBuffer.p, &vertexSize, &offset);
if (mNormalBuffer)
    mContext->IASetVertexBuffers(VertexBufferSlot::VERTEX_BUFFER_SLOT_NORMALS, 1, &mNormalBuffer.p, &vertexSize, &offset);
if (mTangentBuffer)
    mContext->IASetVertexBuffers(VertexBufferSlot::VERTEX_BUFFER_SLOT_TANGENTS, 1, &mTangentBuffer.p, &vertexSize, &offset);
if (mBitangentBuffer)
    mContext->IASetVertexBuffers(VertexBufferSlot::VERTEX_BUFFER_SLOT_BITANGENTS, 1, &mBitangentBuffer.p, &vertexSize, &offset);
if (mTexcoordBuffer)
{
    const uint32_t texcoordSize = sizeof(float) * 2;
    mContext->IASetVertexBuffers(VertexBufferSlot::VERTEX_BUFFER_SLOT_TEXCOORDS, 1, &mTexcoordBuffer.p, &texcoordSize, &offset);
}

mContext->IASetIndexBuffer(mIndexBuffer, DXGI_FORMAT_R16_UINT, 0);
mContext->DrawIndexed(mNumIndices, 0, 0);

The gbuffer vertex shader is a little trickier though, introducing branching:


struct GBufferVSIn
{
    float3 mPosition : POSITION;
    float3 mNormal : NORMAL;
    float3 mTangent : TANGENT;
    float3 mBitangent : BITANGENT;
    float2 mTexcoord : TEXCOORD;
};

struct GBufferVSOut
{
    float4 mPosition : SV_POSITION;
    float3 mNormal : NORMAL;
    float3 mTangent : TANGENT;
    float3 mBitangent : BITANGENT;
    float2 mTexcoord : TEXCOORD;
};

cbuffer GBufferConstants : register(CBUFFER_REGISTER_VERTEX)
{
    float4x4 gWVPMatrix;
    float4x4 gWorldViewMatrix;
    float gTextureTilingFactor;
    bool gHasDiffuseTexture;
    bool gHasNormalTexture;
};


GBufferVSOut vs_main(GBufferVSIn input)
{
    GBufferVSOut output;

    output.mPosition = mul(gWVPMatrix, float4(input.mPosition, 1.0));
    output.mNormal = mul((float3x3)gWorldViewMatrix, input.mNormal);
    if (gHasNormalTexture)
    {
        output.mTangent = mul((float3x3)gWorldViewMatrix, input.mTangent);
        output.mBitangent = mul((float3x3)gWorldViewMatrix, input.mBitangent);
    }
    if (gHasDiffuseTexture)
        output.mTexcoord = gTextureTilingFactor * input.mTexcoord;

    return output;
}

Are there any noticeable downsides to using this process? Is the branching really that bad?

On most game engines I have worked on we generally stick to a few base types. Usually all types have position, UV, and a compressed tangent space. Lightmapped objects also have a second UV set, and skinned meshes have blend weights and indices (sometimes packed together). Position-only streams are also common for depth-only rendering, which is used for the depth prepass and shadow maps.

KaiserJohan: the downside to what you suggest is potentially more discrete reads, which can lower vertex rate. For example, on the X360 the vertex fetcher can do one 32-byte "megafetch" per clock at 500 MHz. If you split into 4 streams as above, you need 4 megafetches per vertex, so you are now limited to 125 MVerts/sec. This isn't as relevant on modern consoles, but you can stretch the caches a little further.

How about the shadow pass? You would not need to send all vertex attributes, right? With a fat vertex structure I assume you have to send all attributes no matter what. With the split-stream layout (as KaiserJohan is suggesting) you could bind just the needed data. Wouldn't that help?


Yes, it is more efficient to send position only in this case. As well as having a position-only stream, if you can spare a bit more memory you can remove all the duplicate positions (duplicated only because of the other attributes, which are no longer needed) and have a depth-only index buffer that references only the unique positions. Think of a cube, which needs 24 vertices for regular rendering but only 8 for depth only.

For alpha tested objects you generally also need the UV as well as the position.
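The depth-only index buffer idea can be sketched like this. It's a minimal sketch: `buildDepthOnlyStream` is a hypothetical tool-side helper that collapses vertices differing only in non-position attributes (the cube's 24 vertices become 8):

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <vector>

using Position = std::array<float, 3>;

// Build a deduplicated position stream and a depth-only index buffer from a
// full mesh: vertices that differed only in normal/UV collapse to one entry.
void buildDepthOnlyStream(const std::vector<Position>& positions,
                          const std::vector<uint16_t>& indices,
                          std::vector<Position>& outPositions,
                          std::vector<uint16_t>& outIndices)
{
    std::map<Position, uint16_t> remap; // position -> new index
    outPositions.clear();
    outIndices.clear();
    outIndices.reserve(indices.size());
    for (uint16_t idx : indices)
    {
        const Position& p = positions[idx];
        auto it = remap.find(p);
        if (it == remap.end())
        {
            it = remap.emplace(p, static_cast<uint16_t>(outPositions.size())).first;
            outPositions.push_back(p);
        }
        outIndices.push_back(it->second);
    }
}
```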

I have a couple base types that are byte-for-byte matched to their shaders (now mandatory in Metal), and a couple specialized types that are for specific things but also matched to shaders. So there's essentially a utility vertex, a skinned model vertex, a scene model vertex, and a few specialty things like water surface vertices. Technically the underlying mesh format is data driven and allows you to declare any vertex type you like, but the tooling will only give you a few types to export.

I don't like to custom build and interleave vertex formats based on the shaders, though I've seen this done. I'd rather pay some extra transfer cost than deal with the static memory use explosion when shader formats differ in trivial ways. I also don't like to deinterleave the streams and bind them separately as KaiserJohan suggested, because this is actually not optimal on the GPU side. In a few specialized cases we do use vertices that assemble from multiple streams, but I try to avoid it.

SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

I think it should be designed as a kind of "supply and demand" scheme: the shader demands a set of attributes (Pos, TexCoord) and your engine supplies it with these (and only these). In Direct3D 11 you can use shader reflection to obtain these demands from a shader. In my engine I have a function that does this (once) whenever a shader is loaded: it finds out what the vertex shader's input (demand) is, so that only the required buffers are set, without any redundancies.

D3DReflect: https://msdn.microsoft.com/en-us/library/windows/desktop/dd607334%28v=vs.85%29.aspx
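The matching logic (supply vs. demand) might look like the sketch below. This is a platform-neutral simplification: real code would query the input semantics once via D3DReflect / ID3D11ShaderReflection rather than taking them as plain strings, and would bind actual buffers rather than returning names:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// The "demand" side: input semantics the vertex shader declares (as obtained
// from shader reflection at load time). The "supply" side: the attribute
// streams a mesh actually has. Returns the streams to bind: only what the
// shader demands, with no redundancies.
std::vector<std::string> selectStreams(const std::vector<std::string>& shaderInputs,
                                       const std::vector<std::string>& meshStreams)
{
    std::vector<std::string> toBind;
    for (const auto& semantic : shaderInputs)
        if (std::find(meshStreams.begin(), meshStreams.end(), semantic) != meshStreams.end())
            toBind.push_back(semantic);
    return toBind;
}
```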

