Dario Oliveri

VBO: what does the GPU prefer?

Dario Oliveri    290
Does the GPU prefer vertex arrays that are contiguous by vertex or by data type?

Example 1:

Vertex1 Position
Vertex1 Color
Vertex1 Normal

Vertex2 Position
Vertex2 Color
Vertex2 Normal

Vertex3 Position
Vertex3 Color
Vertex3 Normal

Vertex4 Position
Vertex4 Color
Vertex4 Normal

Vertex5 Position
Vertex5 Color
Vertex5 Normal

Example 2:

Vertex1 Position
Vertex2 Position
Vertex3 Position
Vertex4 Position
Vertex5 Position

Vertex1 Color
Vertex2 Color
Vertex3 Color
Vertex4 Color
Vertex5 Color

Vertex1 Normal
Vertex2 Normal
Vertex3 Normal
Vertex4 Normal
Vertex5 Normal

So, for rendering speed, is it worth using arrays without stride?

dpadam450    2357
Example 2 is still super fast. I've been using it forever and have focused on other things, since I don't need to nitpick over optimization yet. Example 1 is going to give you some boost for sure. I would think it would even be noticeable in big scenes.

clb    2147
The first example is called an 'interleaved' format, and the second example is called a 'planar' format.
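
To make the two terms concrete, here is a minimal sketch of where each attribute would sit in a buffer under both layouts. It assumes three floats each for position, color, and normal (the thread does not state the attribute sizes), and the names `AttribLayout`, `interleaved_layout`, and `planar_layout` are made up for illustration. The offset and stride it computes are the kind of values one would hand to `glVertexAttribPointer` as `pointer` and `stride`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed attribute set: 3 floats each for position, color, normal. */
typedef struct {
    float position[3];
    float color[3];
    float normal[3];
} InterleavedVertex;

/* Byte offset of an attribute's first element and the byte distance
   between consecutive elements of that attribute. */
typedef struct {
    size_t offset;
    size_t stride;
} AttribLayout;

enum { ATTR_POSITION = 0, ATTR_COLOR = 1, ATTR_NORMAL = 2 };

/* Example 1 (interleaved): all attributes of one vertex are adjacent. */
AttribLayout interleaved_layout(int attr) {
    AttribLayout l;
    assert(attr >= ATTR_POSITION && attr <= ATTR_NORMAL);
    l.offset = (size_t)attr * 3 * sizeof(float);
    l.stride = sizeof(InterleavedVertex);
    return l;
}

/* Example 2 (planar): one tightly packed array per attribute,
   with the arrays placed back to back in a single buffer. */
AttribLayout planar_layout(int attr, size_t vertex_count) {
    AttribLayout l;
    assert(attr >= ATTR_POSITION && attr <= ATTR_NORMAL);
    l.offset = (size_t)attr * vertex_count * 3 * sizeof(float);
    l.stride = 3 * sizeof(float);
    return l;
}
```

With 4-byte floats, the interleaved color attribute starts 12 bytes in and repeats every 36 bytes, while in the planar case every attribute has a 12-byte stride and only the starting offset depends on the vertex count.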

I believe interleaved formats are always faster, and I cannot remember any source [b]ever[/b] recommending a planar vertex buffer layout for GPU performance reasons. There are tons of sources that recommend using interleaved data, e.g. the [url="http://developer.apple.com/library/ios/#documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html"]Apple OpenGL ES Best Practices[/url] documentation. On every platform with a GPU chip I have programmed for (PC, PSP, Nintendo DS, Android, iOS, ...), interleaved data has been preferred.

There is one (possibly slight) benefit for planar data, namely that it compresses better on disk than interleaved data. It is a common data compression technique to group similar data together, since it allows compressors to detect similar data better. E.g. the [url="http://code.google.com/p/crunch/"]crunch library[/url] takes advantage of this effect in the context of textures and reorders the internal on-disk memory layout to be planar before compression.
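
As a rough illustration of that reordering step, the sketch below converts interleaved vertex data to planar order before it would be handed to a compressor, and back again after loading. It assumes three attributes of three floats each; `deinterleave` and `interleave` are hypothetical helper names, and this is not how crunch itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

enum { FLOATS_PER_ATTR = 3, ATTR_COUNT = 3 };  /* position, color, normal */
enum { FLOATS_PER_VERTEX = FLOATS_PER_ATTR * ATTR_COUNT };

/* Reorder interleaved data (P0 C0 N0 P1 C1 N1 ...) into planar data
   (P0 P1 ... C0 C1 ... N0 N1 ...). src and dst must not overlap and
   must each hold vertex_count * FLOATS_PER_VERTEX floats. */
void deinterleave(const float *src, float *dst, size_t vertex_count) {
    assert(src != NULL && dst != NULL);
    for (size_t v = 0; v < vertex_count; ++v) {
        for (size_t a = 0; a < ATTR_COUNT; ++a) {
            for (size_t k = 0; k < FLOATS_PER_ATTR; ++k) {
                size_t from = v * FLOATS_PER_VERTEX + a * FLOATS_PER_ATTR + k;
                size_t to = (a * vertex_count + v) * FLOATS_PER_ATTR + k;
                dst[to] = src[from];
            }
        }
    }
}

/* Inverse of deinterleave: planar data back to interleaved order. */
void interleave(const float *src, float *dst, size_t vertex_count) {
    assert(src != NULL && dst != NULL);
    for (size_t v = 0; v < vertex_count; ++v) {
        for (size_t a = 0; a < ATTR_COUNT; ++a) {
            for (size_t k = 0; k < FLOATS_PER_ATTR; ++k) {
                size_t from = (a * vertex_count + v) * FLOATS_PER_ATTR + k;
                size_t to = v * FLOATS_PER_VERTEX + a * FLOATS_PER_ATTR + k;
                dst[to] = src[from];
            }
        }
    }
}
```

The idea would be to store the planar form on disk and run `interleave` once at load time, so the GPU still sees the interleaved layout.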

clb    2147
T&L means "Transform and Lighting", which loosely translates to "CPU-based vertex shaders" in modern architectures. I'm somewhat skeptical that planar formats would be faster for that case either; only profiling will tell which is best.

mhagain    13430
There's actually an old Intel doc which specifically cites the example 2 layout (what I'd call "streamed") as being more efficient, and calls out (but does not explicitly name) APIs which cannot use this kind of layout as being inherently less efficient. If you Google for some of the ancient API wars history you may come across a copy of it (I won't sully this thread by digging up direct links to some of the nonsense that went on back then, but if I do come across a link to the Intel doc I'll definitely provide it).

That doc must be viewed in the light of history. At the time it was written the per-vertex pipeline was predominantly handled in software by the driver, the API they call out (but do not name) has long since gained the ability to handle streamed layouts, and Intel - being a CPU company who only relatively recently added hardware T&L to their gfx chips - would naturally focus on something that would be more efficient when run on a CPU.

Also worth noting that streamed layout conforms to the "structure of arrays" design which still [i]can[/i] be [i]much[/i] more efficient in many cases (just not this one).
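
A small CPU-side sketch of the kind of case where "structure of arrays" tends to help: a loop that only needs one component of every vertex. The struct and function names are invented for this example, and the actual speed difference depends on the CPU and compiler, so treat it as an illustration of the access pattern rather than a benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structures: one 36-byte record per vertex (assumed layout). */
typedef struct {
    float x, y, z;     /* position */
    float r, g, b;     /* color    */
    float nx, ny, nz;  /* normal   */
} VertexAoS;

/* Largest x coordinate, array-of-structures version. Each iteration
   steps over a whole vertex to reach the next x value. */
float max_x_aos(const VertexAoS *verts, size_t count) {
    assert(verts != NULL && count > 0);
    float best = verts[0].x;
    for (size_t i = 1; i < count; ++i) {
        if (verts[i].x > best) best = verts[i].x;
    }
    return best;
}

/* Largest x coordinate, structure-of-arrays version. The x values are
   contiguous, so every byte loaded is used and the loop is easy for a
   compiler to vectorize. */
float max_x_soa(const float *xs, size_t count) {
    assert(xs != NULL && count > 0);
    float best = xs[0];
    for (size_t i = 1; i < count; ++i) {
        if (xs[i] > best) best = xs[i];
    }
    return best;
}
```

Both functions return the same value; the difference is only in how much memory the loop has to walk through to get it.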

web383    804
Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during this pass.
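
One way to picture that split, as a sketch only: positions go in one stream and the remaining attributes stay interleaved in a second stream that is bound only for the main pass. The struct names and the `depth_pass_bytes` helper are hypothetical, and the estimate assumes the hardware ends up pulling whole vertex strides through its cache, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Stream 0: positions only, used by both the depth-only and main pass. */
typedef struct {
    float position[3];
} PositionVertex;

/* Stream 1: everything else, bound only for the main pass. */
typedef struct {
    float color[3];
    float normal[3];
} ShadingVertex;

/* For comparison: a single fully interleaved vertex. */
typedef struct {
    float position[3];
    float color[3];
    float normal[3];
} FullVertex;

/* Rough estimate of the vertex bytes a depth-only pass walks over,
   given the stride of the buffer that holds the positions. */
size_t depth_pass_bytes(size_t vertex_count, size_t position_stride) {
    assert(position_stride >= 3 * sizeof(float));
    return vertex_count * position_stride;
}
```

For 1000 vertices with 4-byte floats, `depth_pass_bytes(1000, sizeof(FullVertex))` comes to 36000 bytes while `depth_pass_bytes(1000, sizeof(PositionVertex))` comes to 12000, which is the saving the separate position stream is after.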

mhagain    13430
[quote name='web383' timestamp='1340921894' post='4953776']
Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during this pass.
[/quote]

Quite true, and it highlights the most important thing, which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely [i]have[/i] to adapt to your program's requirements.

tanzanite7    1410
[quote name='mhagain' timestamp='1340971570' post='4953933']
... it highlights the most important thing which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely [i]have[/i] to adapt to your program's requirements.
[/quote]
Quite true. Often non-interleaved is not even much of a choice. I (relatively) recently had to shrink my primary vertex format (2 variants) from 32 bytes to 16 bytes due to memory consumption:
* 3*2B - vertex position (+ dangling attribute for full range)
* 2*1B - extra material data
* 4*1B / 2*2B - material data OR tex coord
* 4*1B - normal + unused byte OR quaternion (for reasonable tangent space approximation)

... it is not really reasonable to flatten that, especially as the 3*2B attribute aligns badly.
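
For readers who want to see the byte budget, here is one possible C rendering of a 16-byte vertex along those lines. The field names, the signedness of each field, and the `quantize_snorm16` helper are guesses made for illustration; the post only gives the sizes, and the "dangling attribute" is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t position[3];         /* 3*2B: quantized position          */
    uint8_t material_extra[2];   /* 2*1B: extra material data         */
    union {
        uint8_t  material[4];    /* 4*1B: material data, OR           */
        uint16_t texcoord[2];    /* 2*2B: texture coordinate          */
    } u;
    int8_t normal_or_quat[4];    /* 4*1B: normal + unused byte, OR
                                    a quaternion for tangent space    */
} PackedVertex;

/* Hypothetical quantization: map a coordinate in [-1, 1] to a signed
   16-bit integer, rounding to nearest. */
int16_t quantize_snorm16(float v) {
    assert(v >= -1.0f && v <= 1.0f);
    float scaled = v * 32767.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

On common ABIs this struct is 16 bytes with fields at offsets 0, 6, 8, and 12. Interleaved, the 2-byte material field pads the 6-byte position out to an 8-byte boundary. Pulled out into its own planar array, the position would have a 6-byte stride, so every other element would start on an address that is not a multiple of 4, which is presumably the alignment problem meant above.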
