VBO: what does the GPU prefer?

Graphics and GPU Programming

Started by Dario Oliveri June 27, 2012 09:37 PM

8 comments, last by tanzanite7 11 years, 9 months ago

290

Author

June 27, 2012 09:37 PM

Does the GPU prefer vertex arrays laid out contiguously by vertex or by data type?

Example 1:

Vertex1 Position
Vertex1 Color
Vertex1 Normal

Vertex2 Position
Vertex2 Color
Vertex2 Normal

Vertex3 Position
Vertex3 Color
Vertex3 Normal

Vertex4 Position
Vertex4 Color
Vertex4 Normal

Vertex5 Position
Vertex5 Color
Vertex5 Normal

Example 2:

Vertex1 Position
Vertex2 Position
Vertex3 Position
Vertex4 Position
Vertex5 Position

Vertex1 Color
Vertex2 Color
Vertex3 Color
Vertex4 Color
Vertex5 Color

Vertex1 Normal
Vertex2 Normal
Vertex3 Normal
Vertex4 Normal
Vertex5 Normal

So, for rendering speed, is it worth using arrays without a stride?
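To make the two layouts concrete, here is a minimal Python sketch (standard `struct` module only) that packs the same vertices both ways into raw buffer bytes. It assumes a 3-float position, a 4-byte RGBA color and a 3-float normal per vertex; those formats and the helper names are illustrative assumptions, not anything from the thread.

```python
import struct

# Assumed per-vertex attribute formats (little-endian, no padding):
POS_FMT = "<3f"     # x, y, z as 32-bit floats    -> 12 bytes
COLOR_FMT = "<4B"   # r, g, b, a as unsigned bytes -> 4 bytes
NORMAL_FMT = "<3f"  # nx, ny, nz as 32-bit floats  -> 12 bytes


def pack_interleaved(positions, colors, normals):
    """Example 1: all attributes of vertex 1, then vertex 2, and so on."""
    out = bytearray()
    for p, c, n in zip(positions, colors, normals):
        out += struct.pack(POS_FMT, *p)
        out += struct.pack(COLOR_FMT, *c)
        out += struct.pack(NORMAL_FMT, *n)
    return bytes(out)


def pack_planar(positions, colors, normals):
    """Example 2: every position, then every color, then every normal."""
    out = bytearray()
    for p in positions:
        out += struct.pack(POS_FMT, *p)
    for c in colors:
        out += struct.pack(COLOR_FMT, *c)
    for n in normals:
        out += struct.pack(NORMAL_FMT, *n)
    return bytes(out)
```

Both buffers hold the same 28 bytes per vertex; only the order differs, which is what the stride question comes down to.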

Peace and love, now I really understand what it means! Guardian Angels exist! Thanks!

21st Century Moose

13,459

June 27, 2012 09:57 PM

Example 1 is preferred by GPUs, unless you've got software T&L, in which case example 2 is best.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

dpadam450

2,403

June 27, 2012 09:59 PM

Example 2 is still super fast. I've been using it forever and just focused on other things, since I don't need to nitpick over optimization yet. Example 1 is going to give you some boost for sure; I would think in big scenes it would even be noticeable.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

Dario Oliveri

290

Author

June 27, 2012 10:02 PM

OK, thanks for the fast answers.

By the way, what does "T&L" stand for?


clb

2,152

June 27, 2012 10:03 PM

The first example is called an 'interleaved' format, and the second example is called a 'planar' format.

I believe interleaved formats are always faster, and I cannot remember any source that has recommended a planar vertex buffer layout for GPU performance reasons. There are tons of sources that recommend using interleaved data, e.g. Apple's OpenGL ES Best Practices documentation. On every platform with a GPU chip I have programmed for (PC, PSP, Nintendo DS, Android, iOS, ...), interleaved data has been preferred.
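For reference, the difference between the two layouts shows up in the stride and offset you pass for each attribute (in OpenGL, the `stride` and `pointer` arguments of `glVertexAttribPointer`). The Python sketch below only computes those numbers; the attribute sizes (12-byte position, 4-byte color, 12-byte normal) and the function names are assumptions for illustration. With interleaved data, everything one vertex needs sits inside a single stride-sized record, which is the locality argument usually given for that layout.

```python
# Hypothetical attribute table: (name, size in bytes), in buffer order.
ATTRIBUTES = [("position", 12), ("color", 4), ("normal", 12)]


def interleaved_pointers(attributes):
    """(stride, offset) per attribute for an interleaved buffer."""
    stride = sum(size for _, size in attributes)  # one whole vertex record
    pointers = {}
    offset = 0
    for name, size in attributes:
        pointers[name] = (stride, offset)
        offset += size  # next attribute follows inside the same record
    return pointers


def planar_pointers(attributes, vertex_count):
    """(stride, offset) per attribute for a planar buffer.

    Each attribute array is tightly packed, so its stride is just the
    attribute size (OpenGL also accepts 0 to mean "tightly packed").
    """
    pointers = {}
    offset = 0
    for name, size in attributes:
        pointers[name] = (size, offset)
        offset += size * vertex_count  # skip past this whole attribute array
    return pointers
```

For five vertices this gives stride 28 with offsets 0, 12 and 16 for the interleaved case, versus strides 12, 4 and 12 with offsets 0, 60 and 80 for the planar case.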

There is one (possibly slight) benefit of planar data, namely that it compresses better on disk than interleaved data. Grouping similar data together is a common data compression technique, since it lets compressors detect similar data more easily. For example, the crunch library takes advantage of this effect for textures and reorders the internal on-disk memory layout to be planar before compression.
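A quick way to check the compression claim on your own data is to regroup an interleaved buffer into planar order and compress both with a general-purpose compressor. This Python sketch uses `zlib` from the standard library as a stand-in; it is not how crunch works internally, and the helper names are hypothetical. Which layout compresses smaller, and by how much, depends on the mesh, so no particular result is assumed here.

```python
import zlib


def to_planar(interleaved, attribute_sizes):
    """Regroup an interleaved buffer so each attribute's bytes are contiguous."""
    stride = sum(attribute_sizes)
    if len(interleaved) % stride:
        raise ValueError("buffer length is not a multiple of the vertex stride")
    planes = [bytearray() for _ in attribute_sizes]
    for start in range(0, len(interleaved), stride):
        offset = start
        # Copy each attribute of this vertex into its own plane.
        for plane, size in zip(planes, attribute_sizes):
            plane += interleaved[offset:offset + size]
            offset += size
    return b"".join(planes)


def compressed_sizes(interleaved, attribute_sizes, level=9):
    """Return (interleaved_size, planar_size) after zlib compression."""
    planar = to_planar(interleaved, attribute_sizes)
    return (len(zlib.compress(interleaved, level)),
            len(zlib.compress(planar, level)))
```

Feeding it a real exported vertex buffer, rather than synthetic data, is what tells you whether the reordering is worth doing before writing to disk.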

clb

2,152

June 27, 2012 10:06 PM

T&L means "Transform and Lighting"; software T&L loosely translates to "CPU-based vertex shaders" in modern architectures. I'm somewhat skeptical that planar formats would be faster in that case either; only profiling will tell which is best.

21st Century Moose

13,459

June 27, 2012 10:40 PM

There's actually an old Intel doc which specifically cites the example 2 layout (what I'd call "streamed") as being more efficient, and calls out, but does not explicitly name, APIs which do not provide the capability to use this kind of layout as being inherently less efficient. If you Google some of the ancient API wars history you may come across a copy of it (I won't sully this thread by digging up direct links to some of the nonsense that went on back then, but if I do come across a link to the Intel doc I'll definitely provide it).

That doc must be viewed in the light of history. At the time it was written, the per-vertex pipeline was predominantly handled in software by the driver; the API they call out (but do not name) has long since gained the ability to handle streamed layouts; and Intel, being a CPU company that only relatively recently added hardware T&L to its gfx chips, would naturally focus on something that would be more efficient when run on a CPU.

It's also worth noting that the streamed layout conforms to the "structure of arrays" design, which can still be much more efficient in many cases (just not this one).


web383

804

June 28, 2012 10:18 PM

Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during that pass.
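A minimal sketch of that split, again in Python with `struct`: positions go into one tightly packed stream, while color and normal stay interleaved in a second stream for the full shading pass. The attribute formats and the function name are assumptions for illustration, and how the two streams get bound (two VBOs, two vertex buffer bindings, etc.) is left out.

```python
import struct


def split_streams(positions, colors, normals):
    """Return (position_stream, attribute_stream).

    position_stream holds only tightly packed 3-float positions (12 bytes
    each), so a depth-only pass can bind it alone. attribute_stream
    interleaves a 4-byte color and a 3-float normal (16 bytes per vertex)
    for the shading pass.
    """
    position_stream = b"".join(struct.pack("<3f", *p) for p in positions)
    attribute_stream = b"".join(
        struct.pack("<4B3f", *c, *n) for c, n in zip(colors, normals)
    )
    return position_stream, attribute_stream
```

The depth-only pass then reads 12 bytes per vertex instead of the full record, which is the saving the post is pointing at.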

21st Century Moose

13,459

June 29, 2012 12:06 PM

Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during that pass.

Quite true, and it highlights the most important thing, which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely have to adapt to your program's requirements.


tanzanite7

1,409

July 02, 2012 12:14 AM

... it highlights the most important thing which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely have to adapt to your program's requirements.

Quite true. Often non-interleaved is not even much of a choice. I (relatively) recently had to shrink my primary vertex format (2 variants) from 32 bytes to 16 bytes due to memory consumption:
* 3*2B - vertex position (+ dangling attribute for full range)
* 2*1B - extra material data
* 4*1B / 2*2B - material data OR tex coord
* 4*1B - normal + unused byte OR quaternion (for reasonable tangent space approximation)

... it's not really reasonable to flatten that, especially as the 3*2B attribute aligns badly.
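To see why the 3*2B position aligns badly and how the 2*1B field fills the gap, here is a Python `struct` sketch of the two 16-byte variants listed above. The signedness of each field (signed 16-bit position, unsigned 16-bit texture coordinates, signed bytes for the normal/quaternion) is my assumption; the post only gives the sizes.

```python
import struct

# "<" = little-endian with no implicit padding, so sizes are exactly as listed.
# Variant A: position, extra material, 4 x 1 B material data, normal/quaternion.
VERTEX_A = struct.Struct("<3h2B4B4b")
# Variant B: same, but 2 x 2 B texture coordinates instead of material data.
VERTEX_B = struct.Struct("<3h2B2H4b")


def field_offsets(field_formats):
    """Return (start offset of each field, total size) for packed fields."""
    offsets, offset = [], 0
    for fmt in field_formats:
        offsets.append(offset)
        offset += struct.calcsize("<" + fmt)
    return offsets, offset


# The position ends at byte 6, which is not 4-byte aligned; the 2 x 1 B extra
# material field pads it out so the next two attributes start at 8 and 12.
print(field_offsets(["3h", "2B", "4B", "4b"]))  # prints ([0, 6, 8, 12], 16)
```

Splitting this into planar arrays would leave a 6-byte-per-element position array on its own, which is the awkward case the post mentions.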

This topic is closed to new replies.
