Technical question: Does splitting a model's vertex buffer into multiple vertex buffers decrease performance?

This topic is 3492 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

To organize my code and make it easier to support any shader in my engine, I want to split the vertex buffer of a single model into multiple vertex buffers: one for positions, another for normals, one for UVs, and so on. I want to do this because I would like to read the vertex shader's input structure and pass it exactly what it needs. I will also lock the position and normal buffers when applying skeletal animation to the model, but I will never lock the UV buffer. The question is: does splitting vertex buffers like that decrease performance?

There's an article on the topic in the book "GPU Gems 2".

Having multiple vertex buffers is, more often than not, the only way to go. The DirectX SDK documentation says that a vertex structure should ideally be 32 bytes in size: if your vertex structure is smaller than that, you should pad it to 32 bytes, and if it is larger, you should split it into multiple 32-byte structures.
However, you shouldn't overdo it, because splitting too far can actually hurt performance. I currently use a layout very similar to the one suggested in GPU Gems 2:


// float2/float3 are 2 and 3 packed floats; color is a packed 32-bit value.
// Each structure below is 32 bytes.
struct Vertex1
{
    float3 position;
    float3 normal;
    color color0;
    color color1;
};

struct Vertex2
{
    float2 texCoord0;
    float2 texCoord1;
    float2 texCoord2; // texCoord2 and texCoord3 hold the tangent vector
    float2 texCoord3; // for normal mapped meshes
};

struct Vertex3
{
    float blendWeights[4];
    float blendIndices[4];
};


Most meshes are made up of Vertex1 and Vertex2, skinned meshes also have a Vertex3 buffer.
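
As a rough check of the 32-byte figure, the three structures can be written out in C++ with compile-time size assertions. This is only a sketch: `float2`, `float3` and `color` are not defined in the post, so the definitions here are assumptions (packed floats and a packed 32-bit colour).

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for the pseudo-types used in the post.
struct float2 { float x, y; };
struct float3 { float x, y, z; };
using color = std::uint32_t;  // assumption: packed 32-bit colour

struct Vertex1 {
    float3 position;
    float3 normal;
    color  color0;
    color  color1;
};

struct Vertex2 {
    float2 texCoord0;
    float2 texCoord1;
    float2 texCoord2;  // texCoord2 and texCoord3 hold the tangent vector
    float2 texCoord3;  // for normal mapped meshes
};

struct Vertex3 {
    float blendWeights[4];
    float blendIndices[4];
};

// Fail the build if a stream element drifts away from 32 bytes.
static_assert(sizeof(Vertex1) == 32, "Vertex1 should be 32 bytes");
static_assert(sizeof(Vertex2) == 32, "Vertex2 should be 32 bytes");
static_assert(sizeof(Vertex3) == 32, "Vertex3 should be 32 bytes");
```

With these assumed types, each stream element comes out at exactly 32 bytes, so adding or removing a member shows up as a compile error rather than a silent layout change.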

Thanks Harry Hunt,

Note: the skeletal animation in my engine is calculated on the CPU, not the GPU, so I don't have blendWeights and blendIndices.

The point is that I don't want a fixed vertex structure; I want it to be as dynamic as possible. One model may be rendered by a textured lighting shader that needs position, normal and UV; another mesh may be rendered by a fixed-color shader that needs only position; and a third mesh may be rendered with a normal-mapped reflection shader that needs position, normal, tangent and UV.

I wanted to read the vertex shader's input structure and upload EXACTLY what it needs to the GPU, to save GPU memory and bus traffic.
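
To make the idea concrete, here is a small sketch of picking per-attribute streams from a shader's declared inputs. Everything in it is hypothetical: the flag names, the helper functions and the attribute sizes (3 floats for position, normal and tangent, 2 floats for UV) are assumptions for illustration, not part of any real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical attribute flags; each attribute lives in its own vertex buffer.
enum Attribute : std::uint32_t {
    kPosition = 1u << 0,
    kNormal   = 1u << 1,
    kTangent  = 1u << 2,
    kUV       = 1u << 3,
};

struct AttributeInfo {
    std::uint32_t flag;
    std::size_t   bytes;  // assumed size of one element in that stream
};

static const AttributeInfo kAttributes[] = {
    {kPosition, 12}, {kNormal, 12}, {kTangent, 12}, {kUV, 8},
};

// Bytes fetched per vertex when only the required streams are bound.
std::size_t bytesPerVertex(std::uint32_t required) {
    std::size_t total = 0;
    for (const AttributeInfo& a : kAttributes)
        if (required & a.flag) total += a.bytes;
    return total;
}

// Number of separate vertex buffers that would have to be bound.
std::size_t streamCount(std::uint32_t required) {
    std::size_t count = 0;
    for (const AttributeInfo& a : kAttributes)
        if (required & a.flag) ++count;
    return count;
}
```

Under these assumptions the fixed-color shader binds one stream, while the normal-mapped reflection shader binds four. That stream count is the part the replies in this thread warn about.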

But if that decreases performance, I could use two buffers instead: one for the data that changes at runtime (position, normal, tangent...) and another for the data that doesn't change (UV...).

What do you think of that approach?

I would also still like to know how much it decreases performance (benchmarks, etc.).

It's usually a good idea to find a compromise between flexibility and performance. You could have a dozen different vertex formats, but that would probably lead to poor batching. Also, when vertices are not 32 bytes in size, cache performance will suffer severely. I'm guessing that bus traffic would be the least of your concerns here.
Dividing your buffers into data that changes at runtime and data that never changes is a good idea.
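
A minimal sketch of that split, using `std::vector` as a stand-in for real vertex buffers. The type names and the `animate` function are hypothetical, and the translation is only a placeholder for CPU skinning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rewritten every frame by CPU skinning.
struct DynamicVertex { float position[3]; float normal[3]; };
// Written once at load time and never locked again.
struct StaticVertex  { float uv[2]; };

struct Mesh {
    std::vector<DynamicVertex> dynamicStream;
    std::vector<StaticVertex>  staticStream;
};

// Placeholder for CPU skinning: moves every position by (dx, dy, dz).
// Only the dynamic stream is touched. Returns the number of bytes that
// would need to be re-uploaded for this frame.
std::size_t animate(Mesh& mesh, float dx, float dy, float dz) {
    for (DynamicVertex& v : mesh.dynamicStream) {
        v.position[0] += dx;
        v.position[1] += dy;
        v.position[2] += dz;
    }
    return mesh.dynamicStream.size() * sizeof(DynamicVertex);
}
```

With this layout, a frame of animation rewrites 24 bytes per vertex instead of the 32 bytes an interleaved position/normal/UV vertex would need, and the UV data is never locked.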

I haven't done any benchmarks on how much having multiple vertex buffers degrades performance, but I think GPU Gems 2 mentions that it does (not how much though).

Quote:
Original post by Harry Hunt
I haven't done any benchmarks on how much having multiple vertex buffers degrades performance, but I think GPU Gems 2 mentions that it does (not how much though).

I did that some time ago. Keeping the number of VBs low and streaming from fewer VBs had a benefit, but I am cautious about saying how much. In some cases it was noticeably faster; in others it was "more or less the same". I never had the time to run more involved tests, but in general I ended up splitting my buffers by usage only.
