How to optimize here?

Graphics and GPU Programming

Started by mind in a box July 04, 2010 03:57 AM

5 comments, last by ET3D 13 years, 9 months ago

887

Author

July 04, 2010 03:57 AM

Hello,
I've divided my terrain into chunks. Performance is now really bad, and I think the many draw calls are the cause. Here is a picture which explains my situation:

I've made the different chunks visible here. The reason I did the "chunking" is that the terrain can be modified in-game by bombs or similar exploding things. Regenerating the whole mesh (it is computed from voxels via PolyVox) takes some time: enough that you feel the impact on the FPS in release mode, and up to 10 seconds in debug mode. With the chunks, performance is quite good when only some of them are regenerated, even in debug mode, because I can simply leave out the untouched chunks.

Is it a good practice to put all vertices of all chunks into a new vertex buffer together after one of them got modified? Is that fast? Or is there another solution? I guess I can't use instancing here because the meshes are different and a displacement map isn't working here.

Any thoughts?

(Edit: It may look like VSync, but you should know that I get ~400fps without the terrain)

D3D11-Renderer for Gothic I&II

Demirug

884

July 04, 2010 04:28 AM

There is an old DX9 rule of thumb that says 500 draw calls per 1 GHz of CPU. As CPUs have improved somewhat, you can do some more draw calls, but in general the situation has not changed for DX9.

Based on the values in your screenshot, you are drawing just ~50 polys per draw call. You should really aim for significantly higher values (> 1000). => Use larger chunks.

You can still do the recalculation on smaller parts and just update the part of the vertex buffer that contains the changes. It’s not perfect, but it is better than having many small vertex buffers, as there is an overhead for each of them.

Besides this, it looks like you are drawing chunks that could never be visible, as they are completely surrounded by others. You should try to eliminate them during generation.
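As a rough illustration of the partial update described above, here is a minimal CPU-side sketch. It assumes all chunks share one buffer and each chunk knows its first vertex and vertex count; `ChunkRange`, `ByteSpan`, `dirtySpan` and `updateChunk` are hypothetical names, not anything from the thread or from Direct3D. In D3D11 the computed byte span would probably end up as the `left`/`right` members of a `D3D11_BOX` passed to `ID3D11DeviceContext::UpdateSubresource`, but that part is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of where one chunk lives inside the shared buffer.
struct ChunkRange {
    std::size_t firstVertex;  // index of the chunk's first vertex in the buffer
    std::size_t vertexCount;  // vertices currently used by the chunk
};

// Byte span [begin, end) that has to be rewritten when the chunk changes.
struct ByteSpan {
    std::size_t begin;
    std::size_t end;
};

ByteSpan dirtySpan(const ChunkRange& chunk, std::size_t vertexStride) {
    return { chunk.firstVertex * vertexStride,
             (chunk.firstVertex + chunk.vertexCount) * vertexStride };
}

// Overwrite only the chunk's bytes in a CPU-side stand-in for the vertex buffer.
void updateChunk(std::vector<unsigned char>& buffer, const ChunkRange& chunk,
                 std::size_t vertexStride, const void* newVertices) {
    const ByteSpan span = dirtySpan(chunk, vertexStride);
    assert(span.end <= buffer.size());
    std::memcpy(buffer.data() + span.begin, newVertices, span.end - span.begin);
}
```

The other chunks' bytes are never touched, so the cost of an update scales with the size of the modified chunk rather than with the whole terrain.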

mind in a box

887

Author

July 04, 2010 04:38 AM

Quote:
There is an old DX9 rule of thumb that says 500 draw calls per 1 GHz of CPU. As CPUs have improved somewhat, you can do some more draw calls, but in general the situation has not changed for DX9.

I use DX11 btw, but when I look at my FPS with the terrain I guess that rule also applies for DX11. ;)

Quote:
You can still do the recalculation on smaller parts and just update the part of the vertex buffer that contains the changes. It’s not perfect, but it is better than having many small vertex buffers, as there is an overhead for each of them.

How do I do that? I mean, let's say I have a vertex buffer which holds 4 chunks at once, and chunk 2 changes. Do I have to re-feed the whole buffer then (with the old data from chunks 1, 3 and 4 and the new data from chunk 2)? Is that fast?

Quote:
Besides this, it looks like you are drawing chunks that could never be visible, as they are completely surrounded by others. You should try to eliminate them during generation.

Adding this to my rendering loop:

if (Meshes[Subset-1]->GetVertexCount() == 0) { continue; }

Gained me about 100fps! Now I'm confused. (Still ~150 draw calls and somewhat low FPS for drawing just one terrain)

Demirug

884

July 04, 2010 05:17 AM

Don’t measure FPS; measure ms/frame. It’s a much better value to compare.

The situation for DX11 is somewhat improved, but there are other ways to kill your performance. You can easily shoot yourself in the foot through wrong usage of constant buffers. Maybe you should check this, too.

If you use a large vertex buffer, you can update just the areas for the chunks that change. Then use an index buffer to build a larger chunk from the small ones. This way you only need to refill your index buffer, which is much smaller.

Updating the data in a resource is always faster than creating a new one. Therefore you should create your terrain buffer somewhat larger to leave room for changes. This way you only need to reallocate if the changes become too large.
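A small sketch of why frame time compares better than FPS. The ~400fps figure comes from the thread; the 50, 150 and 300 fps inputs used in the note below are made-up illustration values, and both function names are hypothetical.

```cpp
#include <cassert>

// Convert a frame rate to the time spent on one frame, in milliseconds.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds saved per frame when the frame rate rises from fpsBefore to fpsAfter.
double savedMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsBefore) - frameTimeMs(fpsAfter);
}
```

For example, `frameTimeMs(400.0)` is 2.5 ms. A gain of 100 fps going from 50 to 150 saves roughly 13 ms per frame, while the same 100 fps gain going from 300 to 400 saves less than 1 ms, so "100 fps more" says little without the starting point.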
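One possible reading of the index-buffer idea, as a CPU-side sketch: each small chunk keeps indices relative to its own first vertex, and a combined index list is rebuilt whenever a chunk changes. `Chunk` and `buildCombinedIndices` are hypothetical names, and a triangle list is assumed; uploading the result to the GPU is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side record for one small chunk.
struct Chunk {
    std::uint32_t baseVertex;                 // where the chunk's vertices start in the shared vertex buffer
    std::vector<std::uint32_t> localIndices;  // triangle-list indices relative to baseVertex
};

// Concatenate the indices of all chunks into one index list,
// rebased so they address the shared vertex buffer directly.
std::vector<std::uint32_t> buildCombinedIndices(const std::vector<Chunk>& chunks) {
    std::vector<std::uint32_t> combined;
    for (const Chunk& chunk : chunks) {
        assert(chunk.localIndices.size() % 3 == 0);  // whole triangles only
        for (std::uint32_t index : chunk.localIndices) {
            combined.push_back(chunk.baseVertex + index);
        }
    }
    return combined;
}
```

Chunks with no indices contribute nothing, so empty chunks drop out of the combined draw without a separate check.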

mind in a box

887

Author

July 04, 2010 05:59 AM

Quote:Original post by Demirug
The situation for DX11 is somewhat improved, but there are other ways to kill your performance. You can easily shoot yourself in the foot through wrong usage of constant buffers. Maybe you should check this, too.

Which wrong usage could that be? I have around 5 constant buffers in the shader...

Quote:
If you use a large vertex buffer, you can update just the areas for the chunks that change. Then use an index buffer to build a larger chunk from the small ones. This way you only need to refill your index buffer, which is much smaller.

That is the part I don't understand. Let's say I have 2 chunks that I want to put into the buffer. So I allocate a buffer which is big enough to hold both meshes plus some extra space. Let's say mesh A has 20 vertices and mesh B has 10 vertices.

So I allocate a buffer with space for 40 vertices for example.
The first time, I simply put both meshes into the buffer:

Buffer:
0..20: Mesh A
20..30: Mesh B
30..40: Empty

Now Mesh A gets 5 more vertices.

Buffer:
0..25: Mesh A
25..35: Mesh B
35..40: Empty

Do I have to shift the next data 5 places around to get there? Do I have to do this manually, like:

(Pseudo-code, maybe nonworking at all :D)

memcpy(TempBuffer, &VertexBuffer[MeshBStart], MeshBSize);    // save Mesh B
memcpy(VertexBuffer, MeshAData, MeshANewSize);               // write the grown Mesh A
memcpy(&VertexBuffer[MeshANewSize], TempBuffer, MeshBSize);  // put Mesh B right behind it

Or is there a way with less copying or shifting around?

Quote:
Updating the data in a resource is always faster than creating a new one. Therefore you should create your terrain buffer somewhat larger to leave room for changes. This way you only need to reallocate if the changes become too large.

I do this already for the little chunks, so I have all code ready for this. No problem here.

Demirug

884

July 04, 2010 06:27 AM

Constant buffers can slow you down in these cases.

-Too many constants that are not used.
-More than one constant buffer per shader type and update interval.
-Mixing different update intervals in one buffer.
-Using separate constant buffer resources per object instead of reusing a single (dynamic) one.
-Too many different update intervals. 3 is a good value: settings, frame and object. Sometimes it could make sense to add another one for render target changes.
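To make the "settings, frame and object" split concrete, here is a hedged sketch of what the CPU-side structs for three constant buffers might look like. All struct and member names are illustrative assumptions, not taken from the thread. The size check reflects the D3D11 requirement that a constant buffer's `ByteWidth` be a multiple of 16.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts; the member names are illustrative only.

// Changes only when settings change (e.g. resolution or quality options).
struct alignas(16) PerSettingsConstants {
    float projection[16];
};

// Changes once per frame.
struct alignas(16) PerFrameConstants {
    float view[16];
    float cameraPosition[3];
    float time;
};

// Changes for every drawn object; a single dynamic buffer would be refilled per draw.
struct alignas(16) PerObjectConstants {
    float world[16];
};

// True when a size is usable as a constant buffer size (non-zero multiple of 16 bytes).
constexpr bool isValidConstantBufferSize(std::size_t bytes) {
    return bytes != 0 && bytes % 16 == 0;
}

static_assert(isValidConstantBufferSize(sizeof(PerSettingsConstants)), "bad size");
static_assert(isValidConstantBufferSize(sizeof(PerFrameConstants)), "bad size");
static_assert(isValidConstantBufferSize(sizeof(PerObjectConstants)), "bad size");
```

Keeping each struct tied to one update interval means a per-object change never forces the per-frame or per-settings data to be uploaded again.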

There are three solutions for adding more vertices to a chunk.
1. Leave some room between the data for each chunk.
2. Use an area at the end of the vertex buffer as an overflow region.
3. Use a large vertex buffer as a heap.
The first solution is better for memory caching but requires rewriting the whole buffer if a chunk becomes too large. The second one is more flexible but harder to maintain and less cache-friendly. The third one is in general the best but requires that you write a whole heap manager for an external memory resource.
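The third option needs some bookkeeping for which vertex ranges of the big buffer are free. The following is only a minimal first-fit sketch of such a "heap manager", tracking ranges in vertex units on the CPU; `VertexRangeHeap` and its members are hypothetical, and a production allocator would likely need alignment handling, fragmentation control and error reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Minimal first-fit allocator over the vertex indices of one large buffer.
class VertexRangeHeap {
public:
    static constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

    explicit VertexRangeHeap(std::size_t capacity) {
        if (capacity > 0) free_[0] = capacity;
    }

    // Returns the first vertex of a free range of `count` vertices, or kFailed.
    std::size_t allocate(std::size_t count) {
        assert(count > 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < count) continue;
            const std::size_t start = it->first;
            const std::size_t remaining = it->second - count;
            free_.erase(it);
            if (remaining > 0) free_[start + count] = remaining;
            return start;
        }
        return kFailed;
    }

    // Gives a range back and merges it with adjacent free ranges.
    void release(std::size_t start, std::size_t count) {
        auto it = free_.emplace(start, count).first;
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;
                free_.erase(it);
            }
        }
    }

private:
    std::map<std::size_t, std::size_t> free_;  // start -> length of each free range
};
```

When a chunk grows, one way to use this is to allocate a new range of the larger size, write the chunk there, and release the old range, so no other chunk's data has to be shifted.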

ET3D

810

July 04, 2010 03:48 PM

If you can set aside enough space for each chunk that'd fit the largest chunk, that'd be best. For example, if your maximum chunk is 1000 vertices of 32 bytes, and you have 100 chunks, set up a buffer 3,200,000 bytes in size. This way you will never have to move anything, and don't have to manage the memory. You'll be wasting card RAM, but as long as your numbers aren't much bigger than this example, the memory you're wasting is small compared to what current cards have.
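The arithmetic of this fixed-slot scheme can be sketched as below, using the numbers from the post (1000 vertices of 32 bytes, 100 chunks). `FixedSlotLayout` and its members are hypothetical names. The first-vertex value is the kind of number that could be used as a base vertex when drawing (for instance the `BaseVertexLocation` argument of `DrawIndexed` in D3D11), though that wiring is an assumption and not shown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-slot layout: every chunk owns a slot sized for the largest chunk.
struct FixedSlotLayout {
    std::size_t maxVerticesPerChunk;
    std::size_t vertexStride;  // bytes per vertex
    std::size_t chunkCount;

    std::size_t slotBytes() const { return maxVerticesPerChunk * vertexStride; }
    std::size_t totalBytes() const { return slotBytes() * chunkCount; }

    // Byte offset of a chunk's slot; it never changes, so nothing has to move.
    std::size_t slotOffset(std::size_t chunkIndex) const {
        assert(chunkIndex < chunkCount);
        return chunkIndex * slotBytes();
    }

    // Index of the first vertex in a chunk's slot.
    std::size_t firstVertex(std::size_t chunkIndex) const {
        assert(chunkIndex < chunkCount);
        return chunkIndex * maxVerticesPerChunk;
    }
};
```

With `FixedSlotLayout{1000, 32, 100}`, `totalBytes()` gives the 3,200,000 bytes mentioned above, and chunk 2 always starts at byte 64,000 no matter how the other chunks change.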

ET's Place
