VBO: what does the GPU prefer?
#1 Members - Reputation: 163
Posted 27 June 2012 - 03:37 PM
Example 1:
Vertex1 Position
Vertex1 Color
Vertex1 Normal
Vertex2 Position
Vertex2 Color
Vertex2 Normal
Vertex3 Position
Vertex3 Color
Vertex3 Normal
Vertex4 Position
Vertex4 Color
Vertex4 Normal
Vertex5 Position
Vertex5 Color
Vertex5 Normal
Example 2:
Vertex1 Position
Vertex2 Position
Vertex3 Position
Vertex4 Position
Vertex5 Position
Vertex1 Color
Vertex2 Color
Vertex3 Color
Vertex4 Color
Vertex5 Color
Vertex1 Normal
Vertex2 Normal
Vertex3 Normal
Vertex4 Normal
Vertex5 Normal
So, for rendering speed, is it worth using arrays without stride?
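To make the two layouts concrete, here is a minimal sketch of where each attribute would live in the buffer. The attribute types (3 floats for position, 4 bytes for color, 3 floats for normal) and all names are assumptions for illustration; the post does not specify them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vertex; the attribute types are assumed, not from the post.
struct Vertex {
    float         position[3];
    unsigned char color[4];
    float         normal[3];
};

enum class Attrib { Position, Color, Normal };

// Byte offset of the first element and byte distance between elements.
struct AttribLocation {
    std::size_t offset;
    std::size_t stride;
};

// Example 1 (interleaved): every attribute shares the stride sizeof(Vertex).
inline AttribLocation interleavedLocation(Attrib a) {
    switch (a) {
        case Attrib::Position: return {offsetof(Vertex, position), sizeof(Vertex)};
        case Attrib::Color:    return {offsetof(Vertex, color),    sizeof(Vertex)};
        case Attrib::Normal:   return {offsetof(Vertex, normal),   sizeof(Vertex)};
    }
    return {0, 0};
}

// Example 2 (planar): each attribute is a tightly packed array of its own,
// placed after the full arrays of the attributes that precede it.
inline AttribLocation planarLocation(Attrib a, std::size_t vertexCount) {
    assert(vertexCount > 0);
    const std::size_t positionBytes = sizeof(Vertex::position) * vertexCount;
    const std::size_t colorBytes    = sizeof(Vertex::color) * vertexCount;
    switch (a) {
        case Attrib::Position: return {0, sizeof(Vertex::position)};
        case Attrib::Color:    return {positionBytes, sizeof(Vertex::color)};
        case Attrib::Normal:   return {positionBytes + colorBytes, sizeof(Vertex::normal)};
    }
    return {0, 0};
}
```

These are the values that would go into the stride and pointer arguments of glVertexAttribPointer. For the planar case a stride of 0 is also accepted there, since 0 means "tightly packed", which is the "arrays without stride" variant.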
#2 Members - Reputation: 4028
Posted 27 June 2012 - 03:57 PM
It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.
#3 Members - Reputation: 551
Posted 27 June 2012 - 03:59 PM
#5 Members - Reputation: 1603
Posted 27 June 2012 - 04:03 PM
I am under the belief that interleaved formats are always faster, and I cannot remember any source ever that would have recommended planar vertex buffer layout for GPU performance reasons. There are tons of sources that recommend using interleaved data, e.g. Apple OpenGL ES Best Practices documentation. On every platform with a GPU chip I have programmed for (PC, PSP, Nintendo DS, Android, iOS ..), interleaved data has been preferred.
There is one (possibly slight) benefit for planar data, namely that it compresses better on disk than interleaved data. It is a common data compression technique to group similar data together, since it allows compressors to detect similar data better. E.g. the crunch library takes advantage of this effect in the context of textures and reorders the internal on-disk memory layout to be planar before compression.
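As a rough illustration of that regrouping step, the sketch below converts an interleaved byte buffer into a planar one before it would be handed to a compressor. It is not taken from crunch or any other library; the function name and the attribute-size list are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Regroup interleaved vertex bytes so that all values of the first attribute
// come first, then all values of the second, and so on.
// attribSizes holds the byte size of each attribute within one vertex, in order.
std::vector<std::uint8_t> toPlanar(const std::vector<std::uint8_t>& interleaved,
                                   const std::vector<std::size_t>& attribSizes) {
    std::size_t stride = 0;
    for (std::size_t size : attribSizes) stride += size;
    assert(stride > 0 && interleaved.size() % stride == 0);
    const std::size_t vertexCount = interleaved.size() / stride;

    std::vector<std::uint8_t> planar;
    planar.reserve(interleaved.size());

    std::size_t attribOffset = 0;  // offset of the current attribute inside a vertex
    for (std::size_t size : attribSizes) {
        for (std::size_t v = 0; v < vertexCount; ++v) {
            const std::uint8_t* src = interleaved.data() + v * stride + attribOffset;
            planar.insert(planar.end(), src, src + size);
        }
        attribOffset += size;
    }
    return planar;
}
```

The loader would run the inverse transform after decompression, so the GPU still receives interleaved data.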
#6 Members - Reputation: 1603
Posted 27 June 2012 - 04:06 PM
#7 Members - Reputation: 4028
Posted 27 June 2012 - 04:40 PM
That doc must be viewed in the light of history. At the time it was written the per-vertex pipeline was predominantly handled in software by the driver, the API they call out (but do not name) has long since gained the ability to handle streamed layouts, and Intel - being a CPU company who only relatively recently added hardware T&L to their gfx chips - would naturally focus on something that would be more efficient when run on a CPU.
Also worth noting that streamed layout conforms to the "structure of arrays" design which still can be much more efficient in many cases (just not this one).
Edited by mhagain, 27 June 2012 - 04:41 PM.
#9 Members - Reputation: 4028
Posted 29 June 2012 - 06:06 AM
Keep in mind that it is sometimes more appropriate to use separate streams. For example, if you need a depth-only pass, the positional data should be kept in its own buffer and sent to the GPU separately during this pass.
Quite true and it highlights the most important thing which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely have to adapt to your program's requirements.
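A small sketch of the depth-only case, assuming the same hypothetical attribute types as a 28-byte position/color/normal vertex: positions go into one stream, everything else into a second one, and the depth pass binds only the first. The struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stream 0: bound in every pass.
struct PositionStream {
    float position[3];
};

// Stream 1: bound only when shading, interleaved within itself.
struct ShadingStream {
    unsigned char color[4];
    float         normal[3];
};

enum class Pass { DepthOnly, Color };

// Bytes of vertex data per vertex in the buffers a pass has to bind.
inline std::size_t splitBytesPerVertex(Pass pass) {
    return pass == Pass::DepthOnly
        ? sizeof(PositionStream)
        : sizeof(PositionStream) + sizeof(ShadingStream);
}

// With a single fully interleaved buffer, every pass steps over the whole vertex.
inline std::size_t interleavedBytesPerVertex(Pass) {
    return sizeof(PositionStream) + sizeof(ShadingStream);
}
```

With these assumed types the depth pass steps over 12 bytes per vertex instead of 28. How much of that turns into a real saving depends on the hardware's vertex fetch and caching, so it is worth measuring.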
#10 Members - Reputation: 604
Posted 01 July 2012 - 06:14 PM
... it highlights the most important thing which is that there is no single absolute "best" layout that is going to be most suitable in all cases. You absolutely have to adapt to your program's requirements.
Quite true. Often non-interleaved is not even much of a choice. I (relatively) recently had to shrink my primary vertex format (2 variants) from 32 bytes to 16 bytes due to memory consumption:
* 3*2B - vertex position (+ dangling attribute for full range)
* 2*1B - extra material data
* 4*1B / 2*2B - material data OR tex coord
* 4*1B - normal + unused byte OR quaternion (for reasonable tangent space approximation)
... not really reasonable to flatten that, especially as the 3*2B attribute aligns badly.
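One possible reading of that 16-byte format as a C++ struct is sketched below. The field names, the signedness of each component and the use of a union for the two variants are assumptions; the post only gives component counts and sizes, and the "dangling attribute" is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packed vertex matching the sizes listed above.
struct PackedVertex {
    std::int16_t position[3];        // 3*2B vertex position
    std::uint8_t extraMaterial[2];   // 2*1B extra material data
    union {
        std::uint8_t  material[4];   // 4*1B material data, OR
        std::uint16_t texCoord[2];   // 2*2B texture coordinate
    } materialOrTexCoord;
    std::int8_t normalOrQuat[4];     // 4*1B normal + unused byte, OR quaternion
};

static_assert(sizeof(PackedVertex) == 16, "vertex should pack into 16 bytes");
static_assert(offsetof(PackedVertex, extraMaterial) == 6, "position takes 6 bytes");
static_assert(offsetof(PackedVertex, materialOrTexCoord) == 8, "4-byte aligned");
static_assert(offsetof(PackedVertex, normalOrQuat) == 12, "4-byte aligned");

// In a flattened (planar) layout the positions would form their own tightly
// packed array of 6-byte elements, so every odd vertex starts off a 4-byte boundary.
inline std::size_t planarPositionOffset(std::size_t vertexIndex) {
    return vertexIndex * sizeof(PackedVertex::position);
}
```

Interleaved, the 2 bytes of extra material data pad the 6-byte position out to 8, so the attributes after it start on 4-byte boundaries. Flattened, that padding is gone, which is presumably the alignment problem the post refers to.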






