# How to optimize here?



Hello,
I've divided my terrain into chunks, and now performance is really bad. I think it's because of the many draw calls. Here is a picture which explains my situation:

I've made the different chunks visible here. The reason I did the "chunking" is that the terrain can be modified by bombs or similar exploding things in-game. Regenerating the whole mesh (computed via PolyVox from voxels) takes some time: enough that you feel the impact on the FPS in release mode, and in debug mode it takes up to 10 seconds. With chunks, performance when regenerating only some of them is quite good, even in debug mode, because I can simply leave out the untouched chunks.

Is it good practice to put the vertices of all chunks together into a new vertex buffer after one of them has been modified? Is that fast? Or is there another solution? I guess I can't use instancing here because the meshes are different, and a displacement map won't work here either.

Any thoughts?

(Edit: It may look like VSync, but note that I get ~400 FPS without the terrain.)

##### Reply
There is an old DX9 rule that says 500 draw calls per 1 GHz of CPU. As CPUs have improved somewhat, you can do some more draw calls, but in general the situation has not changed for DX9.

Based on the values in your screenshot, you are drawing just ~50 polygons per draw call. You should really aim for significantly higher values (> 1000), so use larger chunks.
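To put rough numbers on this advice: the 500-calls-per-GHz figure is only the rule of thumb quoted above, and the ~50 and 1000 triangle counts come from this reply. The sketch below (function names are hypothetical) turns them into a draw-call budget and the number of small chunks to merge per draw call.

```cpp
#include <cstddef>

// Rule of thumb quoted in this thread (originally for DX9): roughly
// 500 draw calls per GHz of CPU clock. Treat it as a rough budget only.
constexpr double drawCallBudget(double cpuGhz) {
    return 500.0 * cpuGhz;
}

// How many small chunks must be merged into one draw call so that each
// call submits at least `targetTrianglesPerDraw` triangles.
constexpr std::size_t chunksPerDraw(std::size_t trianglesPerChunk,
                                    std::size_t targetTrianglesPerDraw) {
    // Ceiling division; an empty chunk count is treated as "no merging".
    return trianglesPerChunk == 0
               ? 1
               : (targetTrianglesPerDraw + trianglesPerChunk - 1) / trianglesPerChunk;
}
```

With ~50 triangles per chunk and a target of 1000, `chunksPerDraw(50, 1000)` gives 20, i.e. roughly 20 of the current chunks per draw call.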

You can still do the recalculation on smaller parts and just update the part of the vertex buffer that contains the changes. It's not perfect, but it's better than having many small vertex buffers, as each of them has some overhead.
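A minimal CPU-side sketch of the partial update, assuming each chunk owns a fixed, pre-reserved range inside one shared vertex buffer. The `Vertex`, `ChunkRange`, and `updateChunk` names are hypothetical; the comment only indicates the D3D11 call one would likely use on the GPU side, not a tested implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical vertex layout; position only, for illustration.
struct Vertex { float x, y, z; };

// Each chunk owns a fixed range [offset, offset + capacity) of the shared buffer.
struct ChunkRange { std::size_t offset; std::size_t capacity; std::size_t count; };

// Overwrite only the vertices of one chunk; the rest of the buffer is untouched.
// In D3D11 the GPU-side equivalent would be ID3D11DeviceContext::UpdateSubresource
// with a D3D11_BOX whose left/right fields are this range's byte offsets
// (top = 0, bottom = 1, front = 0, back = 1) on a default-usage vertex buffer.
void updateChunk(std::vector<Vertex>& buffer, ChunkRange& range,
                 const std::vector<Vertex>& newVertices) {
    if (range.offset + range.capacity > buffer.size())
        throw std::out_of_range("chunk range lies outside the buffer");
    if (newVertices.size() > range.capacity)
        throw std::length_error("chunk outgrew its reserved range");
    std::copy(newVertices.begin(), newVertices.end(),
              buffer.begin() + static_cast<std::ptrdiff_t>(range.offset));
    range.count = newVertices.size();
}
```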

Besides this, it looks like you are drawing chunks that can never be visible because they are completely surrounded by others. You should try to eliminate them during generation.
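One way to eliminate them during generation, assuming fully buried chunks come out of the surface extraction with an empty mesh: rebuild a list of drawable chunks only when chunks are regenerated, rather than testing every chunk each frame. `ChunkMesh` and `buildDrawList` are hypothetical names. This only drops empty chunks; it is not occlusion culling for non-empty chunks hidden behind others.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-chunk summary produced after voxel surface extraction.
struct ChunkMesh { std::size_t vertexCount; };

// Rebuild the indices of chunks worth drawing. Intended to run only when
// chunks are (re)generated; the render loop then iterates this list directly.
std::vector<std::size_t> buildDrawList(const std::vector<ChunkMesh>& chunks) {
    std::vector<std::size_t> drawList;
    drawList.reserve(chunks.size());
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        // An empty mesh has nothing to draw, so it costs no draw call at all.
        if (chunks[i].vertexCount > 0) drawList.push_back(i);
    }
    return drawList;
}
```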

##### Reply
Quote:
There is an old DX9 rule that says 500 draw calls per 1 GHz of CPU. As CPUs have improved somewhat, you can do some more draw calls, but in general the situation has not changed for DX9.

I use DX11 btw, but judging by my FPS with the terrain, I guess that rule also applies to DX11. ;)

Quote:
You can still do the recalculation on smaller parts and just update the part of the vertex buffer that contains the changes. It's not perfect, but it's better than having many small vertex buffers, as each of them has some overhead.

How do I do that? Let's say I have a vertex buffer that holds 4 chunks at once, and chunk 2 changes. Do I have to re-feed the whole buffer then (with the old data from chunks 1, 3, 4 and the new data from 2)? Is that fast?

Quote:
Besides this, it looks like you are drawing chunks that can never be visible because they are completely surrounded by others. You should try to eliminate them during generation.

Adding this to my rendering loop:

```cpp
// Skip chunks whose extracted mesh is empty.
if (Meshes[Subset - 1]->GetVertexCount() == 0) { continue; }
```

That gained me about 100 FPS! Now I'm confused. (Still ~150 draw calls and somewhat low FPS for drawing just one terrain.)

##### Reply
Don't measure FPS; measure ms/frame. It's a much better value for comparisons.
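Because FPS is the reciprocal of frame time, the same FPS difference means very different amounts of work depending on where you start. A small sketch with hypothetical helper names: 400 FPS corresponds to 2.5 ms per frame, and going from 300 to 400 FPS saves only about 0.83 ms, while going from 50 to 150 FPS saves about 13.3 ms.

```cpp
// Frame time in milliseconds for a given frame rate.
constexpr double frameTimeMs(double fps) {
    return 1000.0 / fps;
}

// Extra milliseconds per frame that a feature costs, given the frame rate
// without it (higher) and with it (lower).
constexpr double costMs(double fpsWithout, double fpsWith) {
    return frameTimeMs(fpsWith) - frameTimeMs(fpsWithout);
}
```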

The situation for DX11 is somewhat improved, but there are other ways to kill your performance. You can easily shoot yourself in the foot by using constant buffers the wrong way. Maybe you should check this, too.

If you use a large vertex buffer, you can just update the areas for the chunks that change. Then use an index buffer to build a larger chunk from the small ones. This way you only need to refill your index buffer, which is much smaller.
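A sketch of the index-buffer approach, assuming 32-bit indices (`DXGI_FORMAT_R32_UINT`) and that each small chunk knows where its vertices start in the shared vertex buffer. The struct and function names are hypothetical. Chunk-local indices are offset by the chunk's base vertex, so a whole group can be drawn with a single `DrawIndexed` call.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one small chunk inside the shared vertex buffer.
struct ChunkIndices {
    std::uint32_t baseVertex;            // first vertex of this chunk in the big buffer
    std::vector<std::uint32_t> indices;  // triangle indices relative to baseVertex
};

// Concatenate several chunks into one index list that addresses the shared
// vertex buffer directly. When a chunk changes, its slice of the vertex buffer
// is updated in place and this list is rebuilt; other chunks' vertices stay put.
std::vector<std::uint32_t> buildGroupIndices(const std::vector<ChunkIndices>& chunks) {
    std::size_t total = 0;
    for (const auto& c : chunks) total += c.indices.size();

    std::vector<std::uint32_t> out;
    out.reserve(total);
    for (const auto& c : chunks)
        for (std::uint32_t i : c.indices) out.push_back(c.baseVertex + i);
    return out;
}
```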

Updating the data in a resource is always faster than creating a new one. Therefore you should create your terrain buffer somewhat larger to leave room for changes. This way you only need to reallocate if the changes become too large.
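A sketch of the "allocate somewhat larger" policy. The 50% headroom is an arbitrary assumption for illustration, and both function names are hypothetical.

```cpp
#include <cstddef>

// Capacity (in vertices) to use when (re)creating the shared terrain buffer:
// the currently needed size plus 50% headroom (an assumed value) for growth.
constexpr std::size_t capacityWithHeadroom(std::size_t neededVertices) {
    return neededVertices + neededVertices / 2;
}

// True if the buffer must be recreated because the new data no longer fits;
// otherwise the existing buffer is simply updated.
constexpr bool needsReallocation(std::size_t currentCapacity, std::size_t neededVertices) {
    return neededVertices > currentCapacity;
}
```

For example, 1000 vertices of terrain would get a 1500-vertex buffer, and updates stay in place until the terrain grows past 1500 vertices.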

##### Reply
Quote:
Original post by Demirug: The situation for DX11 is somewhat improved, but there are other ways to kill your performance. You can easily shoot yourself in the foot by using constant buffers the wrong way. Maybe you should check this, too.

What kind of wrong usage could that be? I have around 5 constant buffers in the shader...

Quote:
If you use a large vertex buffer, you can just update the areas for the chunks that change. Then use an index buffer to build a larger chunk from the small ones. This way you only need to refill your index buffer, which is much smaller.

That is the part I don't understand. Let's say I have 2 chunks that I want to put into the buffer. So I allocate a buffer which is big enough to hold both meshes plus some extra space. Let's say Mesh A has 20 vertices and Mesh B has 10 vertices.

So I allocate a buffer with space for 40 vertices, for example.
The first time, I simply put both meshes into the buffer:

Buffer:
- 0..20: Mesh A
- 20..30: Mesh B
- 30..40: Empty

Now Mesh A gets 5 more vertices.

Buffer:
- 0..25: Mesh A
- 25..35: Mesh B
- 35..40: Empty

Do I have to shift the following data 5 places to make room? Do I have to do this manually, like:

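(Pseudo-code; names are placeholders, sizes are vertex counts, and `memcpy` takes dest, src, byte count:)

```cpp
// 1. Save Mesh B (it starts right after the old Mesh A) to a temporary buffer.
std::memcpy(TempBuffer, VertexBuffer + MeshAOldSize, MeshBSize * sizeof(Vertex));
// 2. Write the new, larger Mesh A at the start of the buffer.
std::memcpy(VertexBuffer, MeshAData, MeshANewSize * sizeof(Vertex));
// 3. Copy Mesh B back, directly after the new Mesh A.
std::memcpy(VertexBuffer + MeshANewSize, TempBuffer, MeshBSize * sizeof(Vertex));
```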

Or is there a way with less copying or shifting around?
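For the shift itself, `std::memmove` (unlike `std::memcpy`) is defined for overlapping ranges, so Mesh B can be moved in place without a temporary buffer. A sketch with hypothetical names (`Vertex`, `growFrontMesh`); it still moves everything that follows the changed mesh, and that whole shifted range then has to be re-uploaded.

```cpp
#include <cstddef>
#include <cstring>

struct Vertex { float x, y, z; };  // hypothetical layout

// Resize the mesh at the front of `vb` from oldCount to newCount vertices,
// shifting the following `tailCount` vertices (Mesh B) to sit right after it.
// memmove handles the overlap, so no temporary copy is needed.
// Precondition: `vb` has room for newCount + tailCount vertices.
void growFrontMesh(Vertex* vb, std::size_t oldCount, std::size_t newCount,
                   std::size_t tailCount, const Vertex* newMeshData) {
    std::memmove(vb + newCount, vb + oldCount, tailCount * sizeof(Vertex));
    std::memcpy(vb, newMeshData, newCount * sizeof(Vertex));
}
```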

Quote:
Updating the data in a resource is always faster than creating a new one. Therefore you should create your terrain buffer somewhat larger to leave room for changes. This way you only need to reallocate if the changes become too large.

I do this already for the little chunks, so I have all code ready for this. No problem here.

##### Reply
Constant buffers can slow you down in these cases:

- Too many constants that are not used.
- More than one constant buffer per shader type and update interval.
- Mixing different update intervals in one buffer.
- Using separate constant buffer resources per object instead of reusing a single (dynamic) one.
- Too many different update intervals. Three is a good number: settings, frame, and object. Sometimes it can make sense to add another one for render target changes.
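A sketch of constant-buffer layouts following the three-interval split above. The struct and member names and the register slots are hypothetical, and the matching HLSL `cbuffer` declarations (not shown) must use the same layout under HLSL packing rules. D3D11 requires a constant buffer's `ByteWidth` to be a multiple of 16 bytes, which the `static_assert`s check.

```cpp
// One constant buffer per update interval.

struct CbSettings {           // changes rarely, e.g. when options change (register b0)
    float fogColor[4];
    float fogStart, fogEnd;
    float pad[2];             // explicit padding up to a 16-byte multiple
};

struct CbPerFrame {           // updated once per frame (register b1)
    float viewProj[16];       // 4x4 matrix
    float cameraPos[3];
    float time;               // packs with cameraPos into one 16-byte register
};

struct CbPerObject {          // ONE dynamic buffer, rewritten before each draw,
    float world[16];          // e.g. via Map with D3D11_MAP_WRITE_DISCARD (register b2)
};

static_assert(sizeof(CbSettings) % 16 == 0, "ByteWidth must be a multiple of 16");
static_assert(sizeof(CbPerFrame) % 16 == 0, "ByteWidth must be a multiple of 16");
static_assert(sizeof(CbPerObject) % 16 == 0, "ByteWidth must be a multiple of 16");
```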

There are three solutions for adding more vertices to a chunk:

1. Leave some room between the data for each chunk.
2. Use an area at the end of the vertex buffer as an overflow region.
3. Use a large vertex buffer as a heap.

The first solution is better for memory caching but requires rewriting the whole buffer if a chunk becomes too large. The second one is more flexible but harder to maintain and less cache-friendly. The third one is in general the best, but requires that you write a whole heap manager for an external memory resource.
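A minimal sketch of the bookkeeping behind option 3: a first-fit allocator over vertex offsets that merges neighbouring free blocks when a range is released. The class name is hypothetical, it tracks offsets only (the actual buffer writes are separate), it does not detect double frees, and it needs C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

class VertexRangeHeap {
public:
    explicit VertexRangeHeap(std::size_t capacity) { free_[0] = capacity; }

    // Offset of a free range of `count` vertices, or nothing if no single free
    // block is large enough (the caller would then grow the buffer).
    std::optional<std::size_t> allocate(std::size_t count) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= count) {
                std::size_t offset = it->first;
                std::size_t remaining = it->second - count;
                free_.erase(it);
                if (remaining > 0) free_[offset + count] = remaining;
                return offset;
            }
        }
        return std::nullopt;
    }

    // Return a range to the heap and merge it with adjacent free blocks.
    void release(std::size_t offset, std::size_t count) {
        auto it = free_.emplace(offset, count).first;
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;  // absorb the following block
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;  // merge into the preceding block
                free_.erase(it);
            }
        }
    }

    std::size_t freeBlockCount() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size of each free block
};
```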

##### Reply
If you can set aside enough space for each chunk to fit the largest possible chunk, that'd be best. For example, if your maximum chunk is 1000 vertices of 32 bytes each and you have 100 chunks, set up a buffer 3,200,000 bytes in size. This way you never have to move anything and don't have to manage the memory. You'll be wasting video memory, but as long as your numbers aren't much bigger than this example, the memory you're wasting is small compared to what current cards have.
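The arithmetic for this fixed-slot layout, using the numbers from this post (constant and function names are hypothetical). Each chunk's slot starts at a fixed vertex and byte offset, so an update never moves another chunk. Drawing several slots with one call would still need an index list, since the unused tail of each slot contains stale vertices.

```cpp
#include <cstddef>

// Example numbers from the post.
constexpr std::size_t kMaxVerticesPerChunk = 1000;
constexpr std::size_t kVertexStrideBytes   = 32;
constexpr std::size_t kChunkCount          = 100;

// First vertex of a chunk's slot (e.g. a StartVertexLocation for Draw).
constexpr std::size_t slotFirstVertex(std::size_t chunkIndex) {
    return chunkIndex * kMaxVerticesPerChunk;
}

// Byte offset of a chunk's slot (e.g. D3D11_BOX::left for a partial update).
constexpr std::size_t slotByteOffset(std::size_t chunkIndex) {
    return slotFirstVertex(chunkIndex) * kVertexStrideBytes;
}

// Total size of the buffer to create: 100 * 1000 * 32 = 3,200,000 bytes.
constexpr std::size_t totalBufferBytes() {
    return kChunkCount * kMaxVerticesPerChunk * kVertexStrideBytes;
}
```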
