Interlaced or separate vertex buffers?

Started by
3 comments, last by backstep 10 years, 4 months ago

The question is which one to use, and when:

- to use one vertex buffer that stores all vertex information

- to use multiple vertex buffers, one for each vertex attribute.
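
To make the two options concrete, here is a rough C++ sketch of both layouts on the CPU side. The type and function names (`Float3`, `InterleavedVertex`, `SeparateBuffers`, `deinterleave`) are made up for illustration and do not come from Direct3D or OpenGL; the position/normal/uv attribute set is just an assumed example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical attribute types, for illustration only.
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Option 1: one buffer, all attributes of a vertex stored next to each other.
struct InterleavedVertex {
    Float3 position;
    Float3 normal;
    Float2 uv;
};
using InterleavedBuffer = std::vector<InterleavedVertex>;

// Option 2: one buffer per attribute.
struct SeparateBuffers {
    std::vector<Float3> positions;
    std::vector<Float3> normals;
    std::vector<Float2> uvs;
};

// Distance in bytes between two consecutive positions in each layout.
constexpr std::size_t kInterleavedStride  = sizeof(InterleavedVertex);
constexpr std::size_t kPositionOnlyStride = sizeof(Float3);

// Convert option 1 into option 2 by copying each attribute to its own array.
SeparateBuffers deinterleave(const InterleavedBuffer& in) {
    SeparateBuffers out;
    out.positions.reserve(in.size());
    out.normals.reserve(in.size());
    out.uvs.reserve(in.size());
    for (const InterleavedVertex& v : in) {
        out.positions.push_back(v.position);
        out.normals.push_back(v.normal);
        out.uvs.push_back(v.uv);
    }
    assert(out.positions.size() == in.size());
    return out;
}
```

With 4-byte floats, a position-only pass steps over 32 bytes per vertex in the first layout and 12 bytes in the second.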

Well (I'm not sure about this), *the best method* is to use interleaved buffers because of the way the GPU reads vertex buffers and cache lines?

The other problem is that when we have a lot of effects that use only vertex positions, the interleaved layout will slow us down because a lot of useless data would be sent.

I need more details on how the GPU handles vertex buffers, and probably some hints on how to *predict* which method will perform faster in particular cases.

In general you can't predict because it varies from program to program.

Sending more data might be slower, but are you really bandwidth-bound? Even if you are, it's a tradeoff against the extra state changes that separate buffers require.

It's a common enough mistake to just make an isolated measurement like this and assume that it's going to hold true for the entire program, but in reality you've got a lot of different parts of the graphics pipeline interacting, and gains in one area may lead to losses in another. Likewise a numerical gain that looks good on paper may amount to absolutely nothing in terms of performance if the area you got the gain in is not even a bottleneck (in your program) to begin with. Then again it may work out to be a worthwhile gain after all.

Wanting to make such a prediction is really just a form of premature optimization; the standard advice still applies: profile your program, profile the alternate approaches, and know how each of them performs in your program.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

In general you can't predict because it varies from program to program.

(Thinking out loud here...)

It will also vary across hardware; on anything recent which tends towards unified L1 and L2 caches there is a reasonable chance that split vertex streams will work better depending on your data access patterns.

For example, if you have a vertex with {float3 position, float3 normal, float2 uv}, that's 32 bytes of data, which means you can only fit two of these vertices into a single cache line (assuming 64 bytes per line). But chances are you won't need the normal or uv until later in the program, so if you split it into {float3 position} and {float3 normal, float2 uv} you'll be able to fit 5 positions into a cache line (well, 5.33), which will make your GPU happy when doing coherent data fetches for all threads in a thread group.

So, on an AMD GCN card you'd use 12 cache lines to fetch 64 threads' worth of positional data, versus 32 for the 'fat' format.
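
The arithmetic behind those numbers can be checked with a small C++ helper. This is only a sketch: `cacheLinesTouched` is a made-up name, and it assumes tightly packed data that starts on a cache-line boundary, with 64-byte lines as above.

```cpp
#include <cassert>
#include <cstddef>

// Number of cache lines covered by `count` tightly packed elements of
// `strideBytes` each. Assumes the array starts on a cache-line boundary.
std::size_t cacheLinesTouched(std::size_t count,
                              std::size_t strideBytes,
                              std::size_t cacheLineBytes = 64) {
    assert(cacheLineBytes > 0);
    const std::size_t totalBytes = count * strideBytes;
    // Round up: a partially used line still has to be fetched.
    return (totalBytes + cacheLineBytes - 1) / cacheLineBytes;
}
```

For 64 threads, `cacheLinesTouched(64, 12)` gives 12 lines for positions only (768 bytes) and `cacheLinesTouched(64, 32)` gives 32 lines for the fat vertex (2048 bytes).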

Of course, the driver and/or HLSL might well do vertex attribute assembly up front, which would render this all meaningless: it would fetch all the data before you do anything rather than on demand, making this whole post somewhat redundant ;)
(I should probably look into that...)

The other option would be to store each model twice - once in a "fat" format for use where required, and once in a "skinny" format, also for use where required. That way you'd get the benefits of interleaving where the "fat" format is required, but at a tradeoff of extra memory usage, and again it's something you'd need to profile and use your own judgement on in your own program (e.g. you may not be using much video RAM to begin with so the extra storage might be something you could easily afford to spend).

I think the current recommendation is to split vertex data into separate buffers - http://developer.amd.com/wordpress/media/2013/04/DX11PerformanceReloaded.ppsx (slide 21).

They give the example of buffer1 = position, texcoord and buffer2 = normal, tangent, etc. For opaque geometry you might be able to change that so that buffer1 contains only position data, and move the texcoords to the second buffer.

They do mention that you should only bind 2-3 buffers at once, otherwise fetch performance can be negatively impacted. I think that basically means don't take it to the extreme and use a buffer per attribute.

It makes sense really: when you're rendering shadow maps, why fill the cache with data that won't be used (normals/tangents, and UVs for opaque geometry)?
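
As a rough sketch of that two-buffer grouping, the C++ below describes which attribute lives in which stream and adds up the bytes a pass reads per vertex for the streams it binds. Everything here is hypothetical (`AttributeDesc`, `kLayout`, `streamStride`, `bytesFetchedPerVertex`), the attribute sizes are assumed (float3 position and normal, float2 texcoord, float4 tangent), and it ignores cache-line granularity and whatever the driver does behind the scenes.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

enum class Attribute { Position, TexCoord, Normal, Tangent };

struct AttributeDesc {
    Attribute   attribute;
    unsigned    stream;  // which vertex buffer the attribute is stored in
    std::size_t bytes;   // assumed size of the attribute
};

// Grouping from the slide: stream 0 = position, texcoord; stream 1 = normal, tangent.
const AttributeDesc kLayout[] = {
    {Attribute::Position, 0, 12},
    {Attribute::TexCoord, 0, 8},
    {Attribute::Normal,   1, 12},
    {Attribute::Tangent,  1, 16},
};

// Upper end of the "2-3 buffers at once" guidance.
constexpr std::size_t kMaxBoundStreams = 3;

// Per-vertex stride of one stream: the sum of the attributes stored in it.
std::size_t streamStride(unsigned stream) {
    std::size_t stride = 0;
    for (const AttributeDesc& a : kLayout) {
        if (a.stream == stream) stride += a.bytes;
    }
    return stride;
}

// Per-vertex bytes read by a pass that binds only the given streams.
std::size_t bytesFetchedPerVertex(std::initializer_list<unsigned> boundStreams) {
    assert(boundStreams.size() <= kMaxBoundStreams);
    std::size_t total = 0;
    for (unsigned s : boundStreams) total += streamStride(s);
    return total;
}
```

Under these assumptions a shadow pass that binds only stream 0 reads 20 bytes per vertex, against 48 bytes for a pass that binds both streams.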
