# VertexBuffer performance issue. Idea for a strategy?

18 replies to this topic

### #1 Wartime  Members

Posted 18 April 2012 - 09:39 AM

Hi there,

I'm new to this forum.
At our university we have to program a game with DirectX 9.

Some other students and I wanted to program a Minecraft-like game, but without unlimited terrain (don't worry).
Now we have a performance problem.

Our strategy is this: a chunk holds 16^3 blocks. We go through all blocks and check whether each one has a neighbour above, in front, and so on.
If there is one, we don't put the vertices and indices of that side of the cube into the buffer. This works really quickly.
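
The neighbour test described above can be sketched roughly as follows. This is only an illustration under assumptions: `BlockGrid`, `isEmpty` and `countVisibleFaces` are hypothetical names, 0 is taken to mean "no block", and cells outside the chunk are treated as empty, so faces on the chunk border are always kept.

```cpp
#include <array>
#include <cstdint>

constexpr int kChunkSize = 16; // chunk edge length, as in the post

// Hypothetical block storage: 0 means "no block here".
using BlockGrid =
    std::array<std::array<std::array<std::uint8_t, kChunkSize>, kChunkSize>, kChunkSize>;

// True if (x, y, z) lies outside the chunk or holds no block.
bool isEmpty(const BlockGrid& g, int x, int y, int z) {
    if (x < 0 || y < 0 || z < 0 ||
        x >= kChunkSize || y >= kChunkSize || z >= kChunkSize) {
        return true;
    }
    return g[x][y][z] == 0;
}

// Counts the cube faces that would actually be written to the buffers:
// a face is kept only when the neighbouring cell in that direction is empty.
int countVisibleFaces(const BlockGrid& g) {
    static const int dirs[6][3] = {
        {-1, 0, 0}, {1, 0, 0}, {0, -1, 0}, {0, 1, 0}, {0, 0, -1}, {0, 0, 1}};
    int faces = 0;
    for (int x = 0; x < kChunkSize; ++x)
        for (int y = 0; y < kChunkSize; ++y)
            for (int z = 0; z < kChunkSize; ++z) {
                if (g[x][y][z] == 0) continue; // nothing to draw for an empty cell
                for (const auto& d : dirs)
                    if (isEmpty(g, x + d[0], y + d[1], z + d[2])) ++faces;
            }
    return faces;
}
```

For a completely solid chunk this keeps only the outer shell, 6 * 16 * 16 faces, instead of 6 faces for each of the 4096 blocks.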

We made a class for a chunk. In this class we create the buffers, put in the vertices and indices, and also save them in a std::vector.

When rendering, we fill the buffers with memcpy and draw the primitives.

If I draw 4 chunks, everything works fine at 60 fps. But if I draw more chunks (e.g. 64), the performance drops to 8 fps.

I've added my source code and wanted to ask for a strategy to improve the performance.

I hope you understand me (I'm German and my English isn't very good).

### #2 Waterlimon  Members

Posted 18 April 2012 - 09:58 AM

Are you also removing covered faces between 2 chunks?

### #3 Wartime  Members

Posted 18 April 2012 - 10:10 AM

Not directly.

I remove the faces between two blocks, not between chunks.

Another strategy I use (it's not in the source) is to calculate the camera's view direction and compare it with the normals of the chunk's faces.
That way I only fill in vertices that are visible and not facing away, but it doesn't increase the performance.

### #4 jischneider  Members

Posted 18 April 2012 - 11:09 AM

I don’t have time right now to look at your code, and my DX9 knowledge is limited.

But if I understand you correctly, you copy all your vertex data from the CPU to the GPU each frame. If that’s true, your performance will suffer. Instead, you could use dynamic vertex buffers. The problem comes when you remove a random block: you then have to copy the entire chunk back to the GPU.

Also, vertex processing is very fast on the GPU, so don’t focus too much on culling per vertex; cull per chunk only.

The most important thing is to communicate with your GPU as little as you can.

Project page: < XNA FINAL Engine >

### #5 Wartime  Members

Posted 18 April 2012 - 11:23 AM

Ok, thanks.

How does a dynamic vertex buffer work and how can I use it?

Is there an example, or can anybody post some example code?

Thanks

### #6 jischneider  Members

Posted 18 April 2012 - 11:31 AM

Why don’t you first try copying the buffers only once and see how it performs? The scene will be static and there won’t be any culling, of course. But if it runs fast, then we have information about your bottleneck.

Dynamic buffers copy only the information that you change to the GPU; the problem is the lack of flexibility when you modify the buffer. Search Google for an in-depth explanation.

### #7 Wartime  Members

Posted 18 April 2012 - 02:38 PM

I've found a bottleneck in my code.
I was calling SetTexture for every chunk.
Now I call it once and the performance is better.

I've searched for "Dynamic Vertex Buffers", but I don't understand them.

### #8 Wartime  Members

Posted 18 April 2012 - 02:53 PM

> Why don’t you first try copying the buffers only once and see how it performs? The scene will be static and there won’t be any culling, of course. But if it runs fast, then we have information about your bottleneck.

If I fill the buffers once and draw the primitives, the program runs at 60 fps.
I think you're right that the bottleneck is copying the std::vectors into the buffers.

Do you have any idea how to fix the performance issue?

### #9 jischneider  Members

Posted 18 April 2012 - 03:28 PM

> I've found a bottleneck in my code.
> I was calling SetTexture for every chunk.
> Now I call it once and the performance is better.

> If I fill the buffers once and draw the primitives, the program runs at 60 fps.
> I think you're right that the bottleneck is copying the std::vectors into the buffers.
>
> Do you have any idea how to fix the performance issue?

Both problems seem related to CPU-GPU communication.

Copy the buffers only when you make modifications, and copy only the chunk being altered.
Therefore:
• Load method: create a set of chunks.
• Update method: if the player adds or removes a block, rebuild the affected chunk.
• Render method: just render the buffers.

If you need even more performance, you can improve the update method with dynamic buffers.
Vertex buffers are arrays of information stored in GPU memory. The problem is that accessing this memory is costly (for several reasons), so you should communicate as little as possible. Dynamic buffers are like regular vertex buffers that can be altered with user commands. You are still communicating between CPU and GPU, but dynamic buffers allow you to update per vertex, so less communication is needed.
One more thing: is the system freeing the memory used for the previous buffers? Like I said, I don’t know much about DX9 commands, so don’t ask me how to check that.

### #10 mhagain  Members

Posted 18 April 2012 - 04:33 PM

Look for the "Performance Optimizations" article in your DXSDK; there's a section on "Using Dynamic Vertex and Index Buffers" that explains how this is done.

Personally I think your std::vector is contributing to your slowdown. Yes, I know the whole "don't use raw pointers/arrays in C++" thing, but dynamic vertex buffers are not intended to be used in this manner, so that part of your code could use some reworking.

The general usage is to Lock the buffer before you do anything. That will give you a pointer, and then you write your data directly into that pointer, following which you unlock. No std::vector, just use the pointer directly. This pattern will avoid any intermediate storage, avoid memory copies, avoid potential runtime memory allocations, and run faster as a result.

For optimal dynamic vertex buffer performance you should ensure that it's created in D3DPOOL_DEFAULT and has usage D3DUSAGE_DYNAMIC and D3DUSAGE_WRITEONLY.

When filling it make sure that you only append to the buffer. So you have a counter starting at 0, Lock from an offset of this counter * vertexsize and size of numverts * vertexsize with D3DLOCK_NOOVERWRITE. When you unlock add numverts to the counter.

If there is no room left in the buffer for your data you will instead Lock with D3DLOCK_DISCARD and offset and size 0, resetting the counter to 0.

Try to keep the number of Lock/Unlock pairs per-frame as low as possible. You should be able to know the number of verts you'll require beforehand, and Lock as much of the buffer as possible.

That should give you optimal performance with a dynamic vertex buffer, and rule that out as a possible cause of slowdowns.
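
The append-then-discard bookkeeping described above might look roughly like this. It is a sketch of the counter logic only, with no Direct3D calls: `LockMode`, `LockRange` and `DynamicBufferCursor` are hypothetical names, and `LockMode` merely stands in for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.

```cpp
#include <cstddef>

// Stand-ins for the two lock flags named above (not the real D3D9 constants).
enum class LockMode { NoOverwrite, Discard };

// What the caller would pass to Lock, plus where the new vertices start.
struct LockRange {
    std::size_t offsetBytes;
    std::size_t sizeBytes;   // 0 together with offset 0 means "whole buffer"
    LockMode    mode;
    std::size_t firstVertex; // start vertex for the following draw call
};

class DynamicBufferCursor {
public:
    DynamicBufferCursor(std::size_t capacityVerts, std::size_t vertexSize)
        : capacity_(capacityVerts), vertexSize_(vertexSize) {}

    // Decides where the next numVerts vertices go.
    // Assumes numVerts <= capacity; a real implementation should check this.
    LockRange reserve(std::size_t numVerts) {
        LockRange r{};
        if (next_ + numVerts > capacity_) {
            // No room left: discard the old contents and restart at the front.
            next_ = 0;
            r.offsetBytes = 0;
            r.sizeBytes = 0;
            r.mode = LockMode::Discard;
        } else {
            // Room left: append behind the data the GPU may still be reading.
            r.offsetBytes = next_ * vertexSize_;
            r.sizeBytes = numVerts * vertexSize_;
            r.mode = LockMode::NoOverwrite;
        }
        r.firstVertex = next_;
        next_ += numVerts; // advance the counter, as after Unlock
        return r;
    }

private:
    std::size_t capacity_;
    std::size_t vertexSize_;
    std::size_t next_ = 0;
};
```

In real code the returned offset, size and mode would be passed on to the buffer's Lock call, and `firstVertex` to the draw call that follows.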

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.

### #11 Wartime  Members

Posted 18 April 2012 - 11:29 PM

Here is my code to create a chunk:

void WorldChunk::createChunk()
{
vert_count = 0;
index_count = 0;
index_number = 0;
CUSTOMVERTEX* Vertices;
int* Indices;
cdevice->CreateVertexBuffer( 24 * 16 * 16 * 16 * sizeof( CUSTOMVERTEX ), D3DUSAGE_WRITEONLY, D3DFVF_CUSTOMVERTEX, D3DPOOL_DEFAULT, &VB, NULL );
cdevice->CreateIndexBuffer(36 * 16 * 16 * 16 *sizeof(int),D3DUSAGE_WRITEONLY,D3DFMT_INDEX32,D3DPOOL_DEFAULT,&IB,NULL);
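VB->Lock( 0, 0, ( void** )&Vertices, 0 ); // static buffer: D3DLOCK_DISCARD is only valid with D3DUSAGE_DYNAMIC
IB->Lock( 0, 0, ( void** )&Indices, 0 ); // was missing: Indices must point at the locked index buffer before it is written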
for(int x = 0; x < 16; x++)
{
for(int y = 0; y < 16; y++)
{
for(int z = 0; z < 16; z++)
{
// If there is no block here, we don't draw it...
if(chunk[x][y][z] == 0)
continue;
block_type = chunk[x][y][z];
// Are we at the left edge? Then there is no left neighbour; otherwise we fetch it from the chunk array.
// The same applies to all other directions (I can't be bothered to repeat this for every check ;-) )
if(x > 0)
{
testblock = chunk[x-1][y][z];
}else{
testblock = 0;
}
if(testblock == 0)
{
Vertices[vert_count].position = D3DXVECTOR3(x,y,z+1);
Vertices[vert_count].tu = 0.0f+((float)(block_type-1))*0.25f;
Vertices[vert_count].tv = 1.0f;
ver.push_back(Vertices[vert_count]);
Indices[index_count] = index_number;
ind.push_back(Indices[index_count]);
index_count++;

Vertices[vert_count+1].position = D3DXVECTOR3(x,y+1,z+1);
Vertices[vert_count+1].tu = 0.0f+((float)(block_type-1))*0.25f;
Vertices[vert_count+1].tv = 0.0f;
ver.push_back(Vertices[vert_count+1]);
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+2].position = D3DXVECTOR3(x,y,z);
Vertices[vert_count+2].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+2].tv = 1.0f;
ver.push_back(Vertices[vert_count+2]);
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+3].position = D3DXVECTOR3(x,y+1,z);
Vertices[vert_count+3].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+3].tv = 0.0f;
ver.push_back(Vertices[vert_count+3]);
Indices[index_count] = index_number+3;
ind.push_back(Indices[index_count]);
index_count++;
index_number += 4;
vert_count += 4; // Advance the counter by 4, because we wrote 4 vertices for this face
}

if(x < 16-1)
{
testblock = chunk[x+1][y][z];
}else{
testblock = 0;
}
if(testblock == 0)
{
Vertices[vert_count].position = D3DXVECTOR3(x+1,y,z+1);
Vertices[vert_count].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count].tv = 1.0f;
ver.push_back(Vertices[vert_count]);
Indices[index_count] = index_number;
ind.push_back(Indices[index_count]);
index_count++;

Vertices[vert_count+1].position = D3DXVECTOR3(x+1,y+1,z+1);
Vertices[vert_count+1].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count+1].tv = 0.0f;
ver.push_back(Vertices[vert_count+1]);
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+2].position = D3DXVECTOR3(x+1,y,z);
Vertices[vert_count+2].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+2].tv = 1.0f;
ver.push_back(Vertices[vert_count+2]);
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+3].position = D3DXVECTOR3(x+1,y+1,z);
Vertices[vert_count+3].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+3].tv = 0.0f;
ver.push_back(Vertices[vert_count+3]);
Indices[index_count] = index_number+3;
ind.push_back(Indices[index_count]);
index_count++;
index_number += 4;
vert_count += 4; // Advance the counter by 4, because we wrote 4 vertices for this face
}
if(y > 0)
{
testblock = chunk[x][y-1][z];
}else{
testblock = 0;
}
if(testblock == 0)
{
Vertices[vert_count].position = D3DXVECTOR3(x,y,z);
Vertices[vert_count].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count].tv = 1.0f;
ver.push_back(Vertices[vert_count]);
Indices[index_count] = index_number;
ind.push_back(Indices[index_count]);
index_count++;

Vertices[vert_count+1].position = D3DXVECTOR3(x,y,z+1);
Vertices[vert_count+1].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count+1].tv = 0.0f;
ver.push_back(Vertices[vert_count+1]);
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+2].position = D3DXVECTOR3(x+1,y,z);
Vertices[vert_count+2].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+2].tv = 1.0f;
ver.push_back(Vertices[vert_count+2]);
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+3].position = D3DXVECTOR3(x+1,y,z+1);
Vertices[vert_count+3].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+3].tv = 0.0f;
ver.push_back(Vertices[vert_count+3]);
Indices[index_count] = index_number+3;
ind.push_back(Indices[index_count]);
index_count++;
index_number += 4;
vert_count += 4; // Advance the counter by 4, because we wrote 4 vertices for this face
}
if(y < 16-1)
{
testblock = chunk[x][y+1][z];
}else{
testblock = 0;
}
if(testblock == 0)
{
Vertices[vert_count].position = D3DXVECTOR3(x,y+1,z);
Vertices[vert_count].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count].tv = 1.0f;
ver.push_back(Vertices[vert_count]);
Indices[index_count] = index_number;
ind.push_back(Indices[index_count]);
index_count++;

Vertices[vert_count+1].position = D3DXVECTOR3(x,y+1,z+1);
Vertices[vert_count+1].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count+1].tv = 0.0f;
ver.push_back(Vertices[vert_count+1]);
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+2].position = D3DXVECTOR3(x+1,y+1,z);
Vertices[vert_count+2].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+2].tv = 1.0f;
ver.push_back(Vertices[vert_count+2]);
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+3].position = D3DXVECTOR3(x+1,y+1,z+1);
Vertices[vert_count+3].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+3].tv = 0.0f;
ver.push_back(Vertices[vert_count+3]);
Indices[index_count] = index_number+3;
ind.push_back(Indices[index_count]);
index_count++;
index_number += 4;
vert_count += 4; // Advance the counter by 4, because we wrote 4 vertices for this face
}
if(z > 0)
{
testblock = chunk[x][y][z-1];
}else{
testblock = 0;
}
if(testblock == 0)
{
Vertices[vert_count].position = D3DXVECTOR3(x,y,z);
Vertices[vert_count].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count].tv = 1.0f;
ver.push_back(Vertices[vert_count]);
Indices[index_count] = index_number;
ind.push_back(Indices[index_count]);
index_count++;

Vertices[vert_count+1].position = D3DXVECTOR3(x,y+1,z);
Vertices[vert_count+1].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count+1].tv = 0.0f;
ver.push_back(Vertices[vert_count+1]);
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+2].position = D3DXVECTOR3(x+1,y,z);
Vertices[vert_count+2].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+2].tv = 1.0f;
ver.push_back(Vertices[vert_count+2]);
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+3].position = D3DXVECTOR3(x+1,y+1,z);
Vertices[vert_count+3].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+3].tv = 0.0f;
ver.push_back(Vertices[vert_count+3]);
Indices[index_count] = index_number+3;
ind.push_back(Indices[index_count]);
index_count++;
index_number += 4;
vert_count += 4; // Advance the counter by 4, because we wrote 4 vertices for this face
}
if(z < 16-1)
{
testblock = chunk[x][y][z+1];
}else{
testblock = 0;
}
if(testblock == 0)
{
Vertices[vert_count].position = D3DXVECTOR3(x,y,z+1);
Vertices[vert_count].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count].tv = 1.0f;
ver.push_back(Vertices[vert_count]);
Indices[index_count] = index_number;
ind.push_back(Indices[index_count]);
index_count++;

Vertices[vert_count+1].position = D3DXVECTOR3(x,y+1,z+1);
Vertices[vert_count+1].tu = 0.0f+(block_type-1)*0.25f;
Vertices[vert_count+1].tv = 0.0f;
ver.push_back(Vertices[vert_count+1]);
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+2].position = D3DXVECTOR3(x+1,y,z+1);
Vertices[vert_count+2].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+2].tv = 1.0f;
ver.push_back(Vertices[vert_count+2]);
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+2;
ind.push_back(Indices[index_count]);
index_count++;
Indices[index_count] = index_number+1;
ind.push_back(Indices[index_count]);
index_count++;
Vertices[vert_count+3].position = D3DXVECTOR3(x+1,y+1,z+1);
Vertices[vert_count+3].tu = 0.25f+(block_type-1)*0.25f;
Vertices[vert_count+3].tv = 0.0f;
ver.push_back(Vertices[vert_count+3]);
Indices[index_count] = index_number+3;
ind.push_back(Indices[index_count]);
index_count++;
index_number += 4;
vert_count += 4; // Advance the counter by 4, because we wrote 4 vertices for this face
}
}
}
}
//****************************************************************************************************************
IB->Unlock();
VB->Unlock();
}


Once it's created, I use the vectors to "quick fill" the buffers in each following frame (the vertices don't change):

void WorldChunk::QuickFill()
{
void* vv = NULL;
void* ii = NULL;
cdevice->CreateVertexBuffer( ver.size()*sizeof(CUSTOMVERTEX),D3DUSAGE_DYNAMIC, D3DFVF_CUSTOMVERTEX, D3DPOOL_DEFAULT, &VB, NULL );
cdevice->CreateIndexBuffer(ind.size()*sizeof(int),D3DUSAGE_DYNAMIC,D3DFMT_INDEX32,D3DPOOL_DEFAULT,&IB,NULL);
// The Lock calls are absent from the snippet as posted; memcpy needs the pointers they return.
VB->Lock(0, 0, &vv, D3DLOCK_DISCARD);
memcpy(vv,&ver[0],ver.size()*sizeof(CUSTOMVERTEX));
VB->Unlock();
IB->Lock(0, 0, &ii, D3DLOCK_DISCARD);
memcpy(ii,&(ind[0]),ind.size()*sizeof(int));
IB->Unlock();
}
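
Each of the six face blocks in createChunk writes 4 vertices and 6 indices with the index pattern 0, 1, 2, 2, 1, 3. As a rough sketch of how that repetition could be folded into one helper, assuming a hypothetical `QuadVertex` struct in place of CUSTOMVERTEX and plain std::vectors instead of locked buffers (`QuadMesh` and `appendQuad` are made-up names too):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CUSTOMVERTEX: position plus one texture coordinate pair.
struct QuadVertex { float x, y, z, tu, tv; };

struct QuadMesh {
    std::vector<QuadVertex>    vertices;
    std::vector<std::uint32_t> indices;
};

// Appends one cube face. corners[0..3] are given in the same order as in
// createChunk; blockType selects the 0.25-wide column of the texture atlas.
void appendQuad(QuadMesh& m, const float corners[4][3], int blockType) {
    const float u0 = (blockType - 1) * 0.25f;
    const float u1 = u0 + 0.25f;
    const float tu[4] = {u0, u0, u1, u1};
    const float tv[4] = {1.0f, 0.0f, 1.0f, 0.0f};

    // Indices are relative to the first vertex of this face.
    const std::uint32_t base = static_cast<std::uint32_t>(m.vertices.size());
    for (int i = 0; i < 4; ++i) {
        m.vertices.push_back({corners[i][0], corners[i][1], corners[i][2], tu[i], tv[i]});
    }
    static const std::uint32_t pattern[6] = {0, 1, 2, 2, 1, 3}; // two triangles
    for (std::uint32_t p : pattern) {
        m.indices.push_back(base + p);
    }
}
```

With a helper like this, each direction check would shrink to a corner table and one call. The worst-case sizes used in createChunk follow the same counts: 24 * 16 * 16 * 16 = 98,304 vertices and 36 * 16 * 16 * 16 = 147,456 indices per chunk.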


### #12 NightCreature83  Members

Posted 19 April 2012 - 12:59 AM

If the verts don't change you do not need to update the vertex buffer, and for what you are doing you really want to use a dynamic vertex buffer. Basically any mesh that updates fairly frequently should be stored in a dynamic vertex buffer, any mesh that doesn't should be in a static one.

One more thing that will help you is to not update the vertex buffers when they haven't changed since the last frame: the fastest data you send to the GPU is data you never send. Rendering with the same vertex buffer as in the last frame, when it hasn't changed, gives the same result as sending the same buffer again.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max

### #13 mhagain  Members

Posted 19 April 2012 - 04:03 AM

If you're calling CreateVertexBuffer and CreateIndexBuffer every frame, that explains why things slow down. Object creation is an expensive operation and should only be done during startup. In this particular case, you could use DrawPrimitiveUP instead of DrawPrimitive and it would run a lot faster - although completely reworking your code to use vertex buffers properly would be the real solution.

### #14 Wartime  Members

Posted 19 April 2012 - 06:31 AM

Thank you both.

If I understand you correctly, the solution is:
• Create the vertex and index buffers once at startup.
• Fill the buffers until they are full.
• Draw the primitives.
• Clear the buffers.
• If there are more vertices, go to step 2.

If this is right, I've got another question:

What happens if I walk around the map?
The chunks in view change constantly if I turn around or walk forward for a long time, so I have to change the buffer contents all the time.
How can I do this? Or is there another solution?

### #15 NightCreature83  Members

Posted 19 April 2012 - 06:43 AM

You've got the idea, yes, but you should fill the buffers only when they change: no change, no update. A change happens either when a chunk enters or leaves the view area, or when a block in a visible chunk changes.

Also, it isn't bad to keep a system-memory copy of the vertices; just send it only when the chunk is visible or a change has happened to it.
class Chunk
{
public:
void update()
{
// If you add or remove blocks in this chunk, set m_dirty = true so that the VB and IB are re-uploaded.
}
private:
bool m_dirty; // Only update the render buffers when this flag is set.
std::vector<Vertex> m_localChunkVertexData; // Change this in update(); it then has to be re-uploaded to the GPU, so only change it when it is actually there.
std::vector<unsigned int> m_localIndexData;
};


This will allow you to change the vertex list without having to lock the vertex or index buffers until you are ready to upload to the device. Those vectors can also be local variables of the update function, which you write to the VB and IB once you have filled them with the update you wanted.
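
To make the "upload only when dirty" idea concrete, here is a small sketch that counts uploads instead of talking to a real device. Everything in it is an assumption for illustration: `GpuBufferMock` stands in for the vertex buffer (its `upload` is where Lock, memcpy and Unlock would go), and `ChunkVertex` and `DirtyChunk` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex layout for the sketch.
struct ChunkVertex { float x, y, z, tu, tv; };

// Stand-in for the GPU-side buffer; it only records how often it was filled.
struct GpuBufferMock {
    std::size_t uploads = 0;
    std::size_t vertexCount = 0;
    void upload(const std::vector<ChunkVertex>& v) {
        vertexCount = v.size();
        ++uploads;
    }
};

class DirtyChunk {
public:
    // Any change to the system-memory copy marks the chunk dirty.
    void addVertex(const ChunkVertex& v) {
        m_vertices.push_back(v);
        m_dirty = true;
    }

    // Called once per frame: re-upload only if something changed.
    void render(GpuBufferMock& gpu) {
        if (m_dirty) {
            gpu.upload(m_vertices);
            m_dirty = false;
        }
        // The draw call would go here and reuse the buffer as-is.
    }

private:
    bool m_dirty = false;
    std::vector<ChunkVertex> m_vertices;
};
```

Rendering an unchanged chunk for 60 frames then costs one upload rather than 60.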

### #16 Wartime  Members

Posted 19 April 2012 - 09:05 AM

OK,

One more question.

Situation:
I fill the buffer until it's full. Then I draw the vertices and flush the buffer.
Next, I fill it again with other vertices (because the buffer was full) and render.

If I re-render the frame (nothing has changed), I have to fill the buffer twice:
once with the first set of data and then with the second, to redraw all vertices. Is that right?

### #17 mhagain  Members

Posted 19 April 2012 - 10:13 AM

One option is to just create a bigger buffer - make it large enough to hold data for an entire frame worth of drawing, and don't bother worrying about this.

That may not always be possible. Depending on how much you're drawing a full frame's worth of data may be too much. In that case don't worry about it either - just fill and flush the buffer as you need.

The important thing to remember is that there is no guaranteed one-size-fits-all approach to this. Depending on your application's needs you'll be making adjustments to the recommended basic approach. Sometimes you'll keep a system memory copy, sometimes you won't, sometimes you don't bother refilling the buffer if data doesn't change, sometimes it's not that big a deal and is cheaper to just fill the buffer anyway, and sometimes using a group of smaller static vertex buffers is preferable to using one big dynamic buffer.

### #18 NightCreature83  Members

Posted 19 April 2012 - 11:43 AM

> One option is to just create a bigger buffer - make it large enough to hold data for an entire frame worth of drawing, and don't bother worrying about this.
>
> That may not always be possible. Depending on how much you're drawing a full frame's worth of data may be too much. In that case don't worry about it either - just fill and flush the buffer as you need.
>
> The important thing to remember is that there is no guaranteed one-size-fits-all approach to this. Depending on your application's needs you'll be making adjustments to the recommended basic approach. Sometimes you'll keep a system memory copy, sometimes you won't, sometimes you don't bother refilling the buffer if data doesn't change, sometimes it's not that big a deal and is cheaper to just fill the buffer anyway, and sometimes using a group of smaller static vertex buffers is preferable to using one big dynamic buffer.

With VBs and IBs the trick is to find the batch size that works best across the set of cards you want to support. I am making a maze-crawling game with a lot of 4-vert squares which make up the walls. When I submitted these as separate draw calls, my performance tanked massively with fewer than 100,000 verts on screen. When I batched them into a single vertex buffer, performance jumped back to a solid 60 fps (vsync) on an HD4850.

A good rule of thumb is to aim for about 10,000 verts per vertex buffer if you have massive numbers of vertices to draw; this number can change according to the situation and profiling, of course.

GPUs are bad at drawing buffers with a low number of verts in them, as most of the card is then doing nothing; but go over a threshold and performance dies as well, as the card is too busy to deal with all the data you give it. It's a balance you have to find through trial and error and profiling.
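
As a rough illustration of the batching idea, the sketch below splits a run of quads into batches that stay under a vertex budget. `Batch` and `makeBatches` are hypothetical names, and the 10,000-vert budget is only the rule of thumb from this post, to be tuned by profiling.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One draw call's worth of quads: a contiguous range in the quad list.
struct Batch {
    std::size_t firstQuad;
    std::size_t quadCount;
};

// Splits quadCount quads (4 vertices each) into batches of at most
// maxVertsPerBatch vertices. Returns no batches if the budget is below 4.
std::vector<Batch> makeBatches(std::size_t quadCount, std::size_t maxVertsPerBatch) {
    std::vector<Batch> batches;
    const std::size_t quadsPerBatch = maxVertsPerBatch / 4;
    if (quadsPerBatch == 0) {
        return batches; // budget too small to hold even one quad
    }
    for (std::size_t first = 0; first < quadCount; first += quadsPerBatch) {
        const std::size_t n = std::min(quadsPerBatch, quadCount - first);
        batches.push_back({first, n});
    }
    return batches;
}
```

For example, 6,000 wall quads with a 10,000-vert budget would come out as three batches of 2,500, 2,500 and 1,000 quads, that is three draw calls instead of 6,000.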

### #19 Wartime  Members

Posted 25 April 2012 - 09:38 AM