VBO: what does the GPU prefer?

Started by
8 comments, last by tanzanite7 11 years, 9 months ago
Does the GPU prefer vertex arrays that are contiguous by vertex or by data type?

Example 1:

Vertex1 Position
Vertex1 Color
Vertex1 Normal

Vertex2 Position
Vertex2 Color
Vertex2 Normal

Vertex3 Position
Vertex3 Color
Vertex3 Normal

Vertex4 Position
Vertex4 Color
Vertex4 Normal

Vertex5 Position
Vertex5 Color
Vertex5 Normal

Example 2:

Vertex1 Position
Vertex2 Position
Vertex3 Position
Vertex4 Position
Vertex5 Position

Vertex1 Color
Vertex2 Color
Vertex3 Color
Vertex4 Color
Vertex5 Color

Vertex1 Normal
Vertex2 Normal
Vertex3 Normal
Vertex4 Normal
Vertex5 Normal

So, for rendering speed, is it worth using arrays without a stride?

Peace and love, now I understand really what it means! Guardian Angels exist! Thanks!

Example 1 is preferred by GPUs, unless you've got software T&L, in which case example 2 is best.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Example 2 is still super fast. I've been using it forever and just focused on other things, since I don't need to nitpick over optimization yet. Example 1 is going to give you some boost for sure; I would think that in big scenes it would even be noticeable.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

OK, thanks for the fast answers :) By the way, what does "T&L" stand for?


The first example is called an 'interleaved' format, and the second example is called a 'planar' format.
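To make the two layouts concrete, here is a minimal C sketch of where attribute `a` of vertex `i` lives in each one. It assumes every attribute has a fixed byte size; the helper names are illustrative and not part of any API. The stride and offset computed this way are the values you would hand to `glVertexAttribPointer`. In OpenGL a stride of 0 means "tightly packed", which is what "arrays without stride" amounts to in the planar case.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of all attribute sizes: the byte size of one whole vertex. */
static size_t vertex_size(const size_t *sizes, size_t n_attribs)
{
    size_t total = 0;
    for (size_t k = 0; k < n_attribs; ++k)
        total += sizes[k];
    return total;
}

/* Interleaved (example 1): vertex i starts at i * vertex_size, and
   attribute a sits after the attributes that precede it in the vertex.
   The stride passed to the API is vertex_size(). */
static size_t interleaved_offset(const size_t *sizes, size_t n_attribs,
                                 size_t a, size_t i)
{
    assert(a < n_attribs);
    size_t attrib_start = 0;
    for (size_t k = 0; k < a; ++k)
        attrib_start += sizes[k];
    return i * vertex_size(sizes, n_attribs) + attrib_start;
}

/* Planar (example 2): each attribute has its own block of vertex_count
   tightly packed elements, so the stride is just sizes[a]
   (equivalently 0 in OpenGL, meaning "tightly packed"). */
static size_t planar_offset(const size_t *sizes, size_t n_attribs,
                            size_t a, size_t i, size_t vertex_count)
{
    assert(a < n_attribs && i < vertex_count);
    size_t block_start = 0;
    for (size_t k = 0; k < a; ++k)
        block_start += sizes[k] * vertex_count;
    return block_start + i * sizes[a];
}
```

With three 12-byte float3 attributes (position, color, normal) and five vertices, the normal of vertex 1 sits at byte 60 in the interleaved buffer but at byte 132 in the planar one. In the interleaved case, everything one vertex needs lies within a single 36-byte span.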

I am under the impression that interleaved formats are always faster, and I cannot remember any source that recommended a planar vertex buffer layout for GPU performance reasons. There are tons of sources that recommend interleaved data, e.g. Apple's OpenGL ES best practices documentation. On every platform with a GPU chip I have programmed for (PC, PSP, Nintendo DS, Android, iOS...), interleaved data has been preferred.

There is one (possibly slight) benefit of planar data, namely that it compresses better on disk than interleaved data. Grouping similar data together is a common data compression technique, since it lets compressors detect similar data better. E.g. the crunch library takes advantage of this effect in the context of textures and reorders its internal on-disk memory layout to be planar before compression.
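The reordering described above can be sketched in C as a pair of byte-level transforms between interleaved and planar order. This is a minimal illustration, assuming fixed-size records with known field sizes; it is not how crunch itself is implemented. A mesh could stay interleaved in memory, be converted to planar just before compression, and be converted back after decompression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Total byte size of one record, i.e. the sum of its field sizes. */
static size_t record_size(const size_t *field_sizes, size_t n_fields)
{
    size_t total = 0;
    for (size_t f = 0; f < n_fields; ++f)
        total += field_sizes[f];
    return total;
}

/* Interleaved -> planar: all bytes of field 0 for every record, then all
   bytes of field 1, and so on. src and dst must not overlap. */
static void interleaved_to_planar(const unsigned char *src, unsigned char *dst,
                                  size_t count, const size_t *field_sizes,
                                  size_t n_fields)
{
    assert(src != dst);
    size_t rec = record_size(field_sizes, n_fields);
    size_t field_offset = 0; /* where field f starts inside one record */
    for (size_t f = 0; f < n_fields; ++f) {
        for (size_t i = 0; i < count; ++i)
            memcpy(dst + field_offset * count + i * field_sizes[f],
                   src + i * rec + field_offset, field_sizes[f]);
        field_offset += field_sizes[f];
    }
}

/* Planar -> interleaved: the exact inverse, e.g. after decompression. */
static void planar_to_interleaved(const unsigned char *src, unsigned char *dst,
                                  size_t count, const size_t *field_sizes,
                                  size_t n_fields)
{
    assert(src != dst);
    size_t rec = record_size(field_sizes, n_fields);
    size_t field_offset = 0;
    for (size_t f = 0; f < n_fields; ++f) {
        for (size_t i = 0; i < count; ++i)
            memcpy(dst + i * rec + field_offset,
                   src + field_offset * count + i * field_sizes[f],
                   field_sizes[f]);
        field_offset += field_sizes[f];
    }
}
```

The planar buffer is what would be handed to the compressor. Because the transform is lossless and exactly invertible, the GPU-facing data can remain interleaved.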

T&L means "Transform and Lighting"; software T&L loosely translates to "CPU-based vertex shaders" in modern terms. I'm somewhat skeptical that planar formats would be faster even in that case; only profiling will tell which is best.

There's actually an old Intel doc which specifically cites the example 2 layout (what I'd call "streamed") as being more efficient, and calls out - but does not explicitly name - APIs which do not provide the capability to use this kind of layout as being inherently more inefficient. If you Google for some of the ancient API wars history you may come across a copy of it (I won't sully this thread by digging up direct links to some of the nonsense that went on back then, but if I do come across a link to the Intel doc I'll definitely provide it).

That doc must be viewed in the light of history. At the time it was written the per-vertex pipeline was predominantly handled in software by the driver, the API they call out (but do not name) has long since gained the ability to handle streamed layouts, and Intel - being a CPU company who only relatively recently added hardware T&L to their gfx chips - would naturally focus on something that would be more efficient when run on a CPU.

Also worth noting that streamed layout conforms to the "structure of arrays" design which still can be much more efficient in many cases (just not this one).
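As a minimal illustration of where the structure-of-arrays layout can pay off, here is a hypothetical CPU-side transform (roughly the "software T&L" case) over positions stored as separate x, y and z arrays. The struct, function name and row-major 3x4 matrix convention are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Structure-of-arrays ("streamed") positions: one array per component. */
typedef struct {
    float *x, *y, *z;
    size_t count;
} PositionsSoA;

/* Apply a row-major 3x4 affine matrix (linear part in the first three
   columns, translation in the fourth) to every position, in place.
   Each component array is walked sequentially, which tends to map well
   onto SIMD lanes on a CPU. */
static void transform_positions(PositionsSoA *p, const float m[3][4])
{
    assert(p != NULL);
    for (size_t i = 0; i < p->count; ++i) {
        float x = p->x[i], y = p->y[i], z = p->z[i];
        p->x[i] = m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3];
        p->y[i] = m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3];
        p->z[i] = m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3];
    }
}
```

On the GPU side the trade-off runs the other way: the vertex fetch wants all attributes of one vertex close together, which is why the interleaved layout wins there.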


Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during this pass.
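A minimal sketch of that split, assuming a hypothetical vertex with float3 position, color and normal. Positions go to their own tightly packed stream (12-byte stride) that a depth-only pass can bind alone. The remaining attributes stay interleaved in a second stream (24-byte stride) that is bound alongside it for the full pass. All type and function names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical full vertex, as it might be stored fully interleaved. */
typedef struct {
    float position[3];
    float color[3];
    float normal[3];
} Vertex;

/* Stream 0: positions only; the only stream a depth-only pass needs. */
typedef struct { float position[3]; } PositionStream;

/* Stream 1: the remaining attributes, still interleaved with each other. */
typedef struct {
    float color[3];
    float normal[3];
} AttribStream;

/* Split interleaved vertices into the two streams above. */
static void split_streams(const Vertex *in, size_t count,
                          PositionStream *pos_out, AttribStream *attr_out)
{
    assert(in != NULL && pos_out != NULL && attr_out != NULL);
    for (size_t i = 0; i < count; ++i) {
        for (int c = 0; c < 3; ++c) {
            pos_out[i].position[c] = in[i].position[c];
            attr_out[i].color[c] = in[i].color[c];
            attr_out[i].normal[c] = in[i].normal[c];
        }
    }
}
```

Both streams are addressed by the same vertex index, so one index buffer serves both the depth-only pass and the full shading pass.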



Quite true and it highlights the most important thing which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely have to adapt to your program's requirements.




Quite true. Often a non-interleaved layout is not even much of an option. I (relatively) recently had to shrink my primary vertex format (2 variants) from 32 bytes to 16 bytes due to memory consumption:
* 3*2B - vertex position (+ dangling attribute for full range)
* 2*1B - extra material data
* 4*1B / 2*2B - material data OR tex coord
* 4*1B - normal + unused byte OR quaternion (for reasonable tangent space approximation)

... it's not really reasonable to split that into separate non-interleaved arrays, especially as the 3*2B position attribute aligns badly.
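One possible C reading of the 16-byte format listed above, as a sketch. The field names, the signedness of each field, and the use of a union for the "material data OR tex coord" variant are assumptions, not the poster's actual declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte packed vertex following the listed layout. */
typedef struct {
    int16_t position[3];       /* 3*2B: quantized vertex position         */
    uint8_t extra[2];          /* 2*1B: extra material data               */
    union {                    /* 4*1B material data OR 2*2B tex coord    */
        uint8_t  material[4];
        uint16_t texcoord[2];
    } mat_or_uv;
    int8_t normal_or_quat[4];  /* 4*1B: normal + unused byte OR quaternion */
} PackedVertex;

/* Catch accidental compiler padding at compile time (C11). */
static_assert(sizeof(PackedVertex) == 16, "PackedVertex must be 16 bytes");
```

In this arrangement the 6-byte position is immediately followed by the 2-byte extra field, so every later attribute starts on a 4-byte boundary. Split into planar arrays, the positions alone would form a 6-byte-stride array. That is presumably the bad alignment meant above, since many GPUs and APIs prefer vertex attribute data aligned to 4 bytes.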

This topic is closed to new replies.
