Wartime

VertexBuffer performance issue. Idea for a strategy?

Hi there,

I'm new to this forum.
At our university we have to program a game with DirectX 9.

Some other students and I wanted to program a Minecraft-like game, but without unlimited terrain (don't worry).
Now we have a performance problem.

Our strategy is to use chunks of 16^3 blocks. We go through all blocks and check whether each one has a neighbour above, in front, and so on.
If there is one, we don't put the vertices and indices of that side of the cube into the buffer. This works really quickly.
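
The neighbour check described above can be sketched roughly like this. It is only an illustration and not the code from the attachment: Chunk, isSolid, set and countVisibleFaces are made-up names, and the sketch just counts the faces that would be emitted instead of building real vertices and indices. Treating everything outside the chunk as empty is also an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int kChunkSize = 16;

// Hypothetical chunk layout: one bool per block, true when a block is present.
struct Chunk {
    std::array<bool, kChunkSize * kChunkSize * kChunkSize> solid{};

    static bool inRange(int x, int y, int z) {
        return x >= 0 && y >= 0 && z >= 0 &&
               x < kChunkSize && y < kChunkSize && z < kChunkSize;
    }

    static std::size_t index(int x, int y, int z) {
        return (static_cast<std::size_t>(x) * kChunkSize + y) * kChunkSize + z;
    }

    // Positions outside the chunk count as empty, so border faces are kept.
    bool isSolid(int x, int y, int z) const {
        return inRange(x, y, z) && solid[index(x, y, z)];
    }

    void set(int x, int y, int z, bool value) {
        assert(inRange(x, y, z));
        solid[index(x, y, z)] = value;
    }
};

// Counts the cube faces that would go into the vertex/index buffers:
// a face is skipped when the neighbouring block on that side is solid.
int countVisibleFaces(const Chunk& chunk) {
    static const int kNeighbours[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    int faces = 0;
    for (int x = 0; x < kChunkSize; ++x)
        for (int y = 0; y < kChunkSize; ++y)
            for (int z = 0; z < kChunkSize; ++z) {
                if (!chunk.isSolid(x, y, z)) continue;
                for (const auto& n : kNeighbours)
                    if (!chunk.isSolid(x + n[0], y + n[1], z + n[2])) ++faces;
            }
    return faces;
}
```

With this rule a completely filled chunk only produces its outer shell (6 sides of 16 x 16 faces), which is why the culling step itself is cheap compared to drawing every cube in full.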

Now we have made a class for a chunk. In this class we create the buffers, generate the vertices and indices, and save them in a std::vector.

When rendering, we fill the buffers with memcpy and draw the primitives.

If I try to draw 4 chunks, everything works fine at 60 fps. But if I try to draw more chunks (e.g. 64), the performance drops to 8 fps.

I've attached my source code and wanted to ask for a strategy to improve the performance.

Source: [attachment=8313:Kubos.zip]

I hope you understand me (I'm German and my English isn't very good).

Not directly.

I remove the faces between two blocks, not between chunks.

Another strategy I use (it's not in the source) is to calculate the camera's view direction and compare it with the normals of the chunk.
That way I only fill in vertices that are visible and not facing away, but it doesn't improve the performance.

I don’t have time right now to look at your code, and my DX9 knowledge is limited.

But if I understand you correctly, you copy all your vertex data from the CPU to the GPU each frame. If that’s true, your performance will suffer. Instead, you could use dynamic vertex buffers. The problem is when you erase a random block; then you have to copy the entire chunk back to the GPU.

Also, vertex processing is very fast on the GPU. So don’t focus too much on culling per vertex, only per chunk.

The most important thing is to communicate with your GPU as little as you can.

Ok, thanks.

How does a dynamic vertex buffer work, and how can I use it?

Is there an example or can anybody post some example code?

Thanks

Why don’t you first try copying the buffers only once and see how it performs? The scene will be static and there won’t be any culling, of course. But if it runs fast, then we have information about your bottleneck.

Dynamic buffers only copy to the GPU the information that you change; the problem is the lack of flexibility when you modify the buffer. Search on Google for an in-depth explanation.

I've found a bottleneck in my code.
I was calling SetTexture for every chunk.
Now I call it once and the performance is better.

I've searched for "Dynamic Vertex Buffers", but I don't understand them.

If I fill the buffers once and draw the primitives, the program runs at 60 fps.
I think you're right that the bottleneck is copying the std::vectors into the buffers.

Do you have any idea how to fix the performance issue?

Both problems seem related to CPU-GPU communication.

Only copy the buffers when you make modifications, and only copy the chunk being altered.
Therefore:
Load method: create a set of chunks.
Update method: if the player adds or removes a block, rebuild the affected chunk.
Render method: just render the buffers.
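
The load/update/render split above could look something like the sketch below. It is not DX9 code: MockGpuBuffer is a stand-in I made up so that the example compiles without d3d9.h, and ChunkMesh, setVertices and render are hypothetical names as well. With a real IDirect3DVertexBuffer9 the upload step would be a Lock, a copy and an Unlock.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for a GPU vertex buffer (assumption, not a Direct3D type).
struct MockGpuBuffer {
    std::vector<float> data;
    int uploads = 0;  // counts CPU-to-GPU copies

    void upload(const std::vector<float>& vertices) {
        data = vertices;
        ++uploads;
    }
};

// Hypothetical per-chunk mesh that re-uploads only after a modification.
class ChunkMesh {
public:
    // Update step: call when the player adds or removes a block.
    void setVertices(std::vector<float> vertices) {
        cpuVertices_ = std::move(vertices);
        dirty_ = true;
    }

    // Render step: copy to the GPU only if the chunk changed, then "draw".
    void render() {
        if (dirty_) {
            gpu_.upload(cpuVertices_);
            dirty_ = false;
        }
        ++drawCalls_;  // a real renderer would issue the draw call here
    }

    int uploads() const { return gpu_.uploads; }
    int drawCalls() const { return drawCalls_; }

private:
    std::vector<float> cpuVertices_;
    MockGpuBuffer gpu_;
    bool dirty_ = false;
    int drawCalls_ = 0;
};
```

The point of the dirty flag is that rendering the same chunk for many frames costs one upload in total, and editing a block costs one more upload for that chunk only.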

If you need even more performance, you can improve the update method with dynamic buffers.
Vertex buffers are arrays of data stored in GPU memory. The problem is that accessing this memory is costly (for several reasons). Consequently, you should communicate as little as possible. Dynamic buffers are like regular vertex buffers that can be altered with user commands. You are still communicating between CPU and GPU, but dynamic buffers allow you to do it per vertex, so less communication is needed.
One more thing: is the system freeing the memory used by the previous buffers? Like I said, I don’t know much about DX9 commands, so don’t ask me how to check that.

Look for the "Performance Optimizations" article in your DXSDK; there's a section on "Using Dynamic Vertex and Index Buffers" that explains how this is done.

Personally I think your std::vector is contributing to your slowdown. Yes, I know the whole "don't use raw pointers/arrays in C++" thing, but dynamic vertex buffers are not intended to be used in this manner, so that part of your code could use some reworking.

The general usage is to Lock the buffer before you do anything. That will give you a pointer, and then you write your data directly into that pointer, following which you unlock. No std::vector, just use the pointer directly. This pattern will avoid any intermediate storage, avoid memory copies, avoid potential runtime memory allocations, and run faster as a result.

For optimal dynamic vertex buffer performance you should ensure that it's created in D3DPOOL_DEFAULT and has usage D3DUSAGE_DYNAMIC and D3DUSAGE_WRITEONLY.

When filling it, make sure that you only append to the buffer. Keep a counter starting at 0, and Lock from an offset of counter * vertexsize with a size of numverts * vertexsize, using D3DLOCK_NOOVERWRITE. When you unlock, add numverts to the counter.

If there is no room left in the buffer for your data you will instead Lock with D3DLOCK_DISCARD and offset and size 0, resetting the counter to 0.
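
The counter bookkeeping from the last two paragraphs can be sketched on its own, without any D3D calls. LockMode, LockRange and DynamicBufferCursor are hypothetical names; the two enum values only stand in for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD so the example builds without d3d9.h. One simplification: the counter is advanced as soon as the range is reserved rather than at Unlock time.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD (assumption).
enum class LockMode { NoOverwrite, Discard };

// The offset, size and flag you would pass to Lock on the vertex buffer.
struct LockRange {
    std::size_t offsetBytes;
    std::size_t sizeBytes;  // 0 together with offset 0 means "whole buffer"
    LockMode mode;
};

// Hypothetical helper that tracks the append position in a dynamic buffer.
class DynamicBufferCursor {
public:
    DynamicBufferCursor(std::size_t capacityVerts, std::size_t vertexSize)
        : capacityVerts_(capacityVerts), vertexSize_(vertexSize) {}

    // Returns where the next numVerts vertices should be written.
    LockRange reserve(std::size_t numVerts) {
        assert(numVerts <= capacityVerts_);
        if (counter_ + numVerts > capacityVerts_) {
            // No room left: discard the old contents and start again at 0.
            counter_ = numVerts;
            return LockRange{0, 0, LockMode::Discard};
        }
        // Still room: append behind the data already written.
        LockRange range{counter_ * vertexSize_, numVerts * vertexSize_,
                        LockMode::NoOverwrite};
        counter_ += numVerts;
        return range;
    }

private:
    std::size_t capacityVerts_;
    std::size_t vertexSize_;
    std::size_t counter_ = 0;  // vertices written since the last discard
};
```

In real code you would take the returned offset, size and flag, pass them to Lock, write your vertices through the pointer you get back, and then Unlock.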

Try to keep the number of Lock/Unlock pairs per-frame as low as possible. You should be able to know the number of verts you'll require beforehand, and Lock as much of the buffer as possible.

That should give you optimal performance with a dynamic vertex buffer, and rule that out as a possible cause of slowdowns.
