VBO: what does the GPU prefer?


9 replies to this topic

#1 DemonRad   Members   -  Reputation: 290


Posted 27 June 2012 - 03:37 PM

Does the GPU prefer vertex arrays that are contiguous per vertex or per attribute type?

Example 1:

Vertex1 Position
Vertex1 Color
Vertex1 Normal

Vertex2 Position
Vertex2 Color
Vertex2 Normal

Vertex3 Position
Vertex3 Color
Vertex3 Normal

Vertex4 Position
Vertex4 Color
Vertex4 Normal

Vertex5 Position
Vertex5 Color
Vertex5 Normal

Example 2:

Vertex1 Position
Vertex2 Position
Vertex3 Position
Vertex4 Position
Vertex5 Position

Vertex1 Color
Vertex2 Color
Vertex3 Color
Vertex4 Color
Vertex5 Color

Vertex1 Normal
Vertex2 Normal
Vertex3 Normal
Vertex4 Normal
Vertex5 Normal

So, is it worth using arrays without stride for rendering speed?

Peace and love, now I understand really what it means! Guardian Angels exist! Thanks!



#2 mhagain   Crossbones+   -  Reputation: 8000


Posted 27 June 2012 - 03:57 PM

Example 1 is preferred by GPUs, unless you've got software T&L in which case example 2 is best.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#3 dpadam450   Members   -  Reputation: 928


Posted 27 June 2012 - 03:59 PM

Example 2 is still super fast. I've been using it forever and have focused on other things, since I don't need to nitpick over optimization yet. Example 1 is going to give you some boost for sure; I would think that in big scenes it would even be noticeable.

#4 DemonRad   Members   -  Reputation: 290


Posted 27 June 2012 - 04:02 PM

OK, thanks for the fast answers :) By the way, what does "T&L" stand for?



#5 clb   Members   -  Reputation: 1781


Posted 27 June 2012 - 04:03 PM

The first example is called an 'interleaved' format, and the second example is called a 'planar' format.
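
To make the two layouts concrete, here is a minimal sketch of the byte offset and stride each attribute would have in either format. The `Vertex` struct, its attribute sizes, and the helper names are assumptions made for illustration (the thread never states attribute sizes); the resulting offset/stride pairs are the kind of values one would hand to a call such as `glVertexAttribPointer`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vertex: attribute types and sizes are assumed, not from the thread.
struct Vertex {
    float position[3];
    float color[4];
    float normal[3];
};

// Where one attribute lives in a buffer: byte offset of its first element
// and byte distance between consecutive elements.
struct AttributeLayout {
    std::size_t offset;
    std::size_t stride;
};

struct BufferLayout {
    AttributeLayout position;
    AttributeLayout color;
    AttributeLayout normal;
};

// Example 1 (interleaved): all attributes share the stride sizeof(Vertex).
inline BufferLayout interleavedLayout() {
    return {
        {offsetof(Vertex, position), sizeof(Vertex)},
        {offsetof(Vertex, color), sizeof(Vertex)},
        {offsetof(Vertex, normal), sizeof(Vertex)},
    };
}

// Example 2 (planar): each attribute is a tightly packed block, and the
// blocks follow one another in the same buffer.
inline BufferLayout planarLayout(std::size_t vertexCount) {
    assert(vertexCount > 0);
    const std::size_t posSize = sizeof(Vertex::position);
    const std::size_t colSize = sizeof(Vertex::color);
    const std::size_t nrmSize = sizeof(Vertex::normal);
    return {
        {0, posSize},
        {vertexCount * posSize, colSize},
        {vertexCount * (posSize + colSize), nrmSize},
    };
}
```

With interleaved data, everything belonging to one vertex sits within a single 40-byte span, which is presumably why a per-vertex fetch tends to favor it.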

My understanding is that interleaved formats are always faster, and I cannot recall any source that recommends a planar vertex buffer layout for GPU performance reasons. Plenty of sources recommend interleaved data, e.g. Apple's OpenGL ES best-practices documentation. On every platform with a GPU that I have programmed for (PC, PSP, Nintendo DS, Android, iOS, ...), interleaved data has been preferred.

There is one (possibly slight) benefit for planar data, namely that it compresses better on disk than interleaved data. It is a common data compression technique to group similar data together, since it allows compressors to detect similar data better. E.g. the crunch library takes advantage of this effect in the context of textures and reorders the internal on-disk memory layout to be planar before compression.
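
As a rough sketch of that reordering step, the function below copies interleaved vertices into a planar byte buffer that could then be handed to a general-purpose compressor. The `Vertex` struct and the name `toPlanar` are hypothetical, and no particular compressor or on-disk format is implied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical vertex: attribute types and sizes are assumed for illustration.
struct Vertex {
    float position[3];
    float color[4];
    float normal[3];
};
static_assert(sizeof(Vertex) == 10 * sizeof(float), "sketch assumes no padding");

// Copies interleaved vertices into one planar buffer:
// all positions first, then all colors, then all normals.
inline std::vector<unsigned char> toPlanar(const std::vector<Vertex>& vertices) {
    const std::size_t n = vertices.size();
    const std::size_t posSize = sizeof(Vertex::position);
    const std::size_t colSize = sizeof(Vertex::color);
    const std::size_t nrmSize = sizeof(Vertex::normal);

    std::vector<unsigned char> out(n * sizeof(Vertex));
    assert(out.size() == n * (posSize + colSize + nrmSize));

    unsigned char* pos = out.data();
    unsigned char* col = pos + n * posSize;
    unsigned char* nrm = col + n * colSize;
    for (std::size_t i = 0; i < n; ++i) {
        std::memcpy(pos + i * posSize, vertices[i].position, posSize);
        std::memcpy(col + i * colSize, vertices[i].color, colSize);
        std::memcpy(nrm + i * nrmSize, vertices[i].normal, nrmSize);
    }
    return out;
}
```

The loader would run the inverse copy after decompression, so the GPU can still be fed interleaved data.
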
Me+PC=clb.demon.fi | C++ Math and Geometry library: MathGeoLib, test it live! | C++ Game Networking: kNet | 2D Bin Packing: RectangleBinPack | Use gcc/clang/emcc from VS: vs-tool | Resume+Portfolio | gfxapi, test it live!

#6 clb   Members   -  Reputation: 1781


Posted 27 June 2012 - 04:06 PM

T&L means "Transform and Lighting"; software T&L loosely translates to "CPU-based vertex shaders" in modern architectures. I'm somewhat skeptical that planar formats would be faster in that case either. Only profiling will tell which is best.

#7 mhagain   Crossbones+   -  Reputation: 8000


Posted 27 June 2012 - 04:40 PM

There's actually an old Intel doc that specifically cites the example 2 layout (what I'd call "streamed") as being more efficient, and calls out, without explicitly naming them, APIs that cannot use this kind of layout as being inherently less efficient. If you Google for some of the ancient API wars history you may come across a copy of it (I won't sully this thread by digging up direct links to some of the nonsense that went on back then, but if I do come across a link to the Intel doc I'll definitely provide it).

That doc must be viewed in the light of history. At the time it was written, the per-vertex pipeline was predominantly handled in software by the driver. The API they call out (but do not name) has long since gained the ability to handle streamed layouts, and Intel, being a CPU company that only relatively recently added hardware T&L to its graphics chips, would naturally focus on something that is more efficient when run on a CPU.

It is also worth noting that the streamed layout conforms to the "structure of arrays" design, which can still be much more efficient in many cases (just not this one).
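
A small sketch of why "structure of arrays" can help when the work is done on the CPU: a pass that only needs positions walks one tightly packed array and never touches colors or normals. All type and function names here are hypothetical, and whether this actually wins on a given machine is something only profiling can settle.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// "Streamed" (structure of arrays) mesh: one tightly packed array per attribute.
struct StreamedMesh {
    std::vector<Vec3> positions;
    std::vector<Vec3> normals;
    std::vector<std::uint32_t> colors;  // assumed packed RGBA8
};

// CPU-side translation of every vertex. Only the position stream is read
// and written; the other streams are left alone.
inline void translate(StreamedMesh& mesh, Vec3 delta) {
    assert(mesh.positions.size() == mesh.normals.size());
    assert(mesh.positions.size() == mesh.colors.size());
    for (Vec3& p : mesh.positions) {
        p.x += delta.x;
        p.y += delta.y;
        p.z += delta.z;
    }
}
```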

Edited by mhagain, 27 June 2012 - 04:41 PM.



#8 web383   Members   -  Reputation: 777


Posted 28 June 2012 - 04:18 PM

Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during this pass.
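
A minimal sketch of such a split, assuming positions go in one stream and the remaining attributes stay interleaved in a second one. The struct names, attribute sizes, and the byte-count helper are hypothetical, and the estimate simply assumes each vertex is read once per pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stream 0: positions only, used by every pass.
struct Position {
    float x, y, z;
};

// Stream 1: everything else, interleaved, skipped by the depth-only pass.
struct Shading {
    float color[4];
    float normal[3];
};

struct SplitMesh {
    std::vector<Position> positions;
    std::vector<Shading> shading;
};

enum class Pass { DepthOnly, Color };

// Rough estimate of vertex bytes a pass has to read, assuming every vertex
// is fetched exactly once.
inline std::size_t vertexBytesRead(const SplitMesh& mesh, Pass pass) {
    assert(mesh.positions.size() == mesh.shading.size());
    std::size_t bytes = mesh.positions.size() * sizeof(Position);
    if (pass == Pass::Color) {
        bytes += mesh.shading.size() * sizeof(Shading);
    }
    return bytes;
}
```

Under these assumptions the depth-only pass reads 12 bytes per vertex instead of 40, at the cost of the color pass pulling from two streams.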

#9 mhagain   Crossbones+   -  Reputation: 8000


Posted 29 June 2012 - 06:06 AM

Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during this pass.


Quite true and it highlights the most important thing which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely have to adapt to your program's requirements.



#10 tanzanite7   Members   -  Reputation: 1304


Posted 01 July 2012 - 06:14 PM

... it highlights the most important thing which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely have to adapt to your program's requirements.

Quite true. Often non-interleaved is not even much of a choice. I (relatively) recently had to shrink my primary vertex format (2 variants) from 32 bytes to 16 bytes due to memory consumption:
* 3*2B - vertex position (+ dangling attribute for full range)
* 2*1B - extra material data
* 4*1B / 2*2B - material data OR tex coord
* 4*1B - normal + unused byte OR quaternion (for reasonable tangent space approximation)

... it is not really reasonable to flatten that, especially as the 3*2B attribute aligns badly.
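
One way such a 16-byte format could be declared is sketched below. The field names, the signedness of each field, and the `packSnorm8` helper are assumptions made for illustration; only the byte sizes come from the list above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte vertex matching the sizes listed above.
struct PackedVertex {
    std::int16_t position[3];            // 3*2B vertex position
    std::uint8_t extraMaterial[2];       // 2*1B extra material data
    std::uint8_t materialOrTexCoord[4];  // 4*1B material data, or 2*2B tex coord
    std::int8_t  normalOrQuat[4];        // 4*1B normal + unused byte, or quaternion
};

static_assert(sizeof(PackedVertex) == 16, "vertex must stay at 16 bytes");
static_assert(offsetof(PackedVertex, extraMaterial) == 6, "follows the 6-byte position");
static_assert(offsetof(PackedVertex, materialOrTexCoord) == 8, "4-byte aligned");
static_assert(offsetof(PackedVertex, normalOrQuat) == 12, "4-byte aligned");

// Assumed quantization: maps a float in [-1, 1] to a signed byte in [-127, 127].
inline std::int8_t packSnorm8(float v) {
    assert(v >= -1.0f && v <= 1.0f);
    return static_cast<std::int8_t>(std::lround(v * 127.0f));
}
```

Interleaved, the 6-byte position is padded out by the 2-byte material field, so the later attributes start at 4-byte boundaries; as a separate planar stream, consecutive 6-byte positions would not, which is presumably the alignment problem mentioned above.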






