Increasing terrain performance (loading + drawing)

32 comments, last by gnmgrl 11 years, 8 months ago
erase is not calling delete for you, you need to do that.
However, if you're still calling new every frame then you haven't fixed much. Especially if you recreate vertex-buffers. They should probably only be created at the start of the program with the device, and never again.
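On the erase point, here is a minimal sketch, assuming the chunk list is a std::vector of raw pointers (the thread does not show its declaration). The Chunk struct and the removeChunk helper are hypothetical stand-ins, not the original poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the thread's chunk class; counts live instances.
struct Chunk {
    static int alive;
    Chunk()  { ++alive; }
    ~Chunk() { --alive; }
};
int Chunk::alive = 0;

// Raw pointers: erase() only removes the pointer from the vector,
// so the object has to be deleted explicitly first.
void removeChunk(std::vector<Chunk*>& chunks, std::size_t index) {
    assert(index < chunks.size());
    delete chunks[index];
    chunks.erase(chunks.begin() + static_cast<std::ptrdiff_t>(index));
}

// Alternative: with std::unique_ptr the element's destructor frees the
// chunk, so erase() alone is enough.
void removeChunk(std::vector<std::unique_ptr<Chunk>>& chunks, std::size_t index) {
    assert(index < chunks.size());
    chunks.erase(chunks.begin() + static_cast<std::ptrdiff_t>(index));
}
```

Either way, this only fixes the leak. It does not remove the cost of allocating every frame.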

Also, as others have already pointed out, you shouldn't need to do anything every frame.
If your map is 4096x4096, try for example 16x16 chunks of size 256x256 at full LOD, and keep 9 of those loaded at full LOD at a time, the ones around the current player position. Then only ever unload/reload chunks when you cross a chunk boundary, that is when the player enters a new chunk, not every time you move.

When that's working correctly and you want to make it even better, look into loading a little bit at a time and not many chunks at once, and not unloading old chunks exactly when you cross the boundary but add some safety distance so things aren't reloaded constantly by moving back and forth across a boundary.
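The boundary-crossing idea with a safety distance could look roughly like this. It is only a sketch: ChunkWindow, chunkIndex and the margin parameter are names made up for illustration, and it assumes square chunks laid out on a regular grid starting at the world origin.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Index of the chunk that contains a world-space coordinate.
int chunkIndex(float pos, float chunkSize) {
    return static_cast<int>(std::floor(pos / chunkSize));
}

// Hypothetical helper: tracks the chunk that the loaded 3x3 block is centered on.
struct ChunkWindow {
    float chunkSize;  // world units per chunk edge, e.g. 256
    float margin;     // safety distance past the center chunk's edge
    int centerX;
    int centerZ;

    ChunkWindow(float size, float safetyMargin, float startX, float startZ)
        : chunkSize(size), margin(safetyMargin),
          centerX(chunkIndex(startX, size)), centerZ(chunkIndex(startZ, size)) {
        assert(size > 0.0f && safetyMargin >= 0.0f && safetyMargin < size);
    }

    // Keep the current center until the player is more than `margin`
    // past its edge, so hovering around a boundary reloads nothing.
    int recenterAxis(int center, float pos) const {
        const float lo = center * chunkSize - margin;
        const float hi = (center + 1) * chunkSize + margin;
        return (pos < lo || pos >= hi) ? chunkIndex(pos, chunkSize) : center;
    }

    // Call when the player moves; returns true only when chunks must be
    // loaded or unloaded.
    bool update(float playerX, float playerZ) {
        const int newX = recenterAxis(centerX, playerX);
        const int newZ = recenterAxis(centerZ, playerZ);
        const bool moved = (newX != centerX) || (newZ != centerZ);
        centerX = newX;
        centerZ = newZ;
        return moved;
    }

    // True for the 9 chunks that should be resident at full LOD.
    bool isLoaded(int cx, int cz) const {
        return std::abs(cx - centerX) <= 1 && std::abs(cz - centerZ) <= 1;
    }
};
```

With a chunk size of 256 and a margin of 16, a player walking from x = 250 to x = 260 and back triggers no reload at all. The window only recenters once x reaches 272.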
It's basically a form of death by 1000 cuts. An alternative approach would be to pre-create a pool of objects at startup and draw from that pool rather than re-initializing everything every frame. Either way, and especially with D3D11 (where object creation is documented as being so expensive) you do need to move away from run-time creation.
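A rough sketch of such a pool, to show the shape of it. PooledChunk and ChunkPool are hypothetical names. In the real thing each pooled chunk would also own its pre-created ID3D11Buffer, which is left out here so the sketch compiles without Direct3D.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunk payload with its storage allocated up front.
struct PooledChunk {
    std::vector<float> heights;
    bool inUse = false;
};

class ChunkPool {
public:
    // All allocation happens here, once, at startup.
    ChunkPool(std::size_t count, std::size_t verticesPerChunk) : chunks_(count) {
        for (PooledChunk& c : chunks_) {
            c.heights.resize(verticesPerChunk);
        }
    }

    // Hands out a free chunk, or nullptr if the pool is exhausted.
    // Nothing is allocated on this path.
    PooledChunk* acquire() {
        for (PooledChunk& c : chunks_) {
            if (!c.inUse) {
                c.inUse = true;
                return &c;
            }
        }
        return nullptr;
    }

    // Returns a chunk to the pool; its storage is kept for reuse.
    void release(PooledChunk* chunk) {
        assert(chunk != nullptr && chunk->inUse);
        chunk->inUse = false;
    }

private:
    std::vector<PooledChunk> chunks_;
};
```

A linear scan is fine for a handful of chunks. A free list would be the usual choice if the pool gets large.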

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

EDIT: I just figured out that you can update EVERY ID3D11Buffer with UpdateSubresource. I didn't know that, so now I can try only updating the buffers instead of creating new ones. I'll try this and reply if it works.

----- ----- -----
I get that CreateBuffer should only be called at the beginning. The problem is that I have to call it when I want to create a new chunk, because that is where I fill it. I only create chunks when they are needed as I move, of course.

Maybe I'm getting something wrong. At the moment, this is what I call every time I create a chunk:



// in chunk.h
ID3D11Buffer* vertexBuffer;
D3D11_BUFFER_DESC vertexBufferDesc;

// in chunk.cpp
ZeroMemory( &vertexBufferDesc, sizeof(vertexBufferDesc) );
vertexBufferDesc.Usage = D3D11_USAGE_DEFAULT;
vertexBufferDesc.ByteWidth = sizeof(VertexPosNormalTexColorColor2) * (lowwidth*lowheight);
vertexBufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
vertexBufferDesc.CPUAccessFlags = 0;
vertexBufferDesc.MiscFlags = 0;


D3D11_SUBRESOURCE_DATA vertexBufferData;
ZeroMemory( &vertexBufferData, sizeof(vertexBufferData) );
vertexBufferData.pSysMem = verticesToLock;
vertexBufferData.SysMemPitch = 0;
vertexBufferData.SysMemSlicePitch = 0;
d3d11Device->CreateBuffer( &vertexBufferDesc, &vertexBufferData, &vertexBufferIn);




How can I get rid of this? I have to call CreateBuffer, because the verticesToLock data is created in every chunk's Init() function.
Why would you need to create vertex buffers at all, except in the beginning when you load your height map?

When you move you just decide for every frame which vertex buffers to use, depending on LoD (and set it with a call to IASetVertexBuffers).
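As a sketch of that per-frame decision, assuming LOD 0 is full detail and the buffers for every level already exist: selectLod and its 1.5-chunk ring width are illustrative choices, not something specified in this thread.

```cpp
#include <cassert>
#include <cmath>

// Picks a LOD level from the horizontal distance between the camera and
// the chunk center. LOD 0 is full detail; each further ring drops one level.
int selectLod(float chunkCenterX, float chunkCenterZ,
              float cameraX, float cameraZ,
              float chunkSize, int lodCount) {
    assert(chunkSize > 0.0f && lodCount > 0);
    const float dx = chunkCenterX - cameraX;
    const float dz = chunkCenterZ - cameraZ;
    const float distance = std::sqrt(dx * dx + dz * dz);
    // Assumed ring width of 1.5 chunk lengths: with the camera near the
    // middle of its chunk, the surrounding 3x3 block stays at LOD 0.
    const int ring = static_cast<int>(distance / (1.5f * chunkSize));
    return ring < lodCount ? ring : lodCount - 1;
}
```

The returned level would then index an array of pre-created buffers, and only the binding passed to IASetVertexBuffers changes from frame to frame.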

If you can't load all buffers at load time, you should probably, as others have pointed out, let a worker thread carry that out in advance.

If your map is 4096x4096, try for example 16x16 chunks of size 256x256 at full LOD


Just to come back to this point: many terrain tutorials out there are old, or based on old tutorials, from a time when GPUs weren't as powerful, vertex processing was slower or done on the CPU, and spending CPU work to reduce the number of vertices drawn and processed was worthwhile.

However, technology has progressed. GPUs need to be fed large chunks of work to run in parallel with the CPU, and CPU cost (and the associated memory accesses) is the bottleneck in many cases, so don't be afraid to throw larger patches at the GPU than old wisdom used to say was the norm.
So, I managed to do everything you mentioned here. Memory is only allocated at the start, and buffers are only created there too, then reused. That works fine, but when I set the chunkSize to 257, the problem is that, when only 9 chunks are there, FPS already drop to ~55. I haven't reimplemented LOD yet, but those 9 chunks shall be set to full detail anyway.

So what I need to do now is increase the FPS.
(I already got FrustumCulling and LOD)
There are many general-purpose techniques for increasing framerate.

For one, sorting by render-state/shaders/textures is one of the most important.
3D Performance Tips

Since each chunk is drawn with the same shader and many—if not all—of the same textures, you will see a large boost from just this (unless you are already doing it).

You should also be able to share vertex buffers. Switching vertex buffers is also costly, and terrain provides you with many ways to share vertex data.
For example, X and Z in one buffer that is shared across all chunks and let the Y be in a separate buffer, with only that buffer being swapped between calls.
This also allows you to heavily compress the X and Z values into 16-bit values each, which saves bandwidth.
You should also be using only one index buffer for all chunks.
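To make the split concrete, here is a CPU-side sketch of the two streams. GridXZ, buildSharedXZ and buildChunkHeights are hypothetical names. It assumes the vertex shader adds a per-chunk world offset to the grid coordinates, and that the shared stream is declared with a 16-bit format such as DXGI_FORMAT_R16G16_UINT in the input layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of the shared stream: grid coordinates inside a chunk,
// 4 bytes per vertex instead of two 32-bit floats.
struct GridXZ {
    std::uint16_t x;
    std::uint16_t z;
};

// Built once and bound for every chunk.
std::vector<GridXZ> buildSharedXZ(std::uint16_t verticesPerSide) {
    assert(verticesPerSide > 0);
    std::vector<GridXZ> grid;
    grid.reserve(static_cast<std::size_t>(verticesPerSide) * verticesPerSide);
    for (std::uint16_t z = 0; z < verticesPerSide; ++z) {
        for (std::uint16_t x = 0; x < verticesPerSide; ++x) {
            grid.push_back(GridXZ{x, z});
        }
    }
    return grid;
}

// Per-chunk stream: heights only, in the same row-major order as the grid.
// This is the only vertex data that differs between chunks.
std::vector<float> buildChunkHeights(const std::vector<float>& heightMap,
                                     std::size_t mapWidth,
                                     std::size_t originX, std::size_t originZ,
                                     std::size_t verticesPerSide) {
    assert(mapWidth > 0 && heightMap.size() % mapWidth == 0);
    assert(originX + verticesPerSide <= mapWidth);
    assert((originZ + verticesPerSide) * mapWidth <= heightMap.size());
    std::vector<float> heights;
    heights.reserve(verticesPerSide * verticesPerSide);
    for (std::size_t z = 0; z < verticesPerSide; ++z) {
        for (std::size_t x = 0; x < verticesPerSide; ++x) {
            heights.push_back(heightMap[(originZ + z) * mapWidth + (originX + x)]);
        }
    }
    return heights;
}
```

The shared grid would go in slot 0 and stay bound, while slot 1 receives a different height buffer per chunk.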

Draw near-to-far to reduce overdraw.
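A small sketch of that ordering, assuming each visible chunk exposes the world-space center of its footprint. VisibleChunk and sortNearToFar are made-up names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record for a chunk that passed frustum culling.
struct VisibleChunk {
    int id;
    float centerX;
    float centerZ;
};

// Orders chunks by squared horizontal distance to the camera, nearest first,
// so that closer terrain fills the depth buffer before farther terrain is drawn.
void sortNearToFar(std::vector<VisibleChunk>& chunks, float cameraX, float cameraZ) {
    auto distanceSq = [cameraX, cameraZ](const VisibleChunk& c) {
        const float dx = c.centerX - cameraX;
        const float dz = c.centerZ - cameraZ;
        return dx * dx + dz * dz;
    };
    auto nearer = [&distanceSq](const VisibleChunk& a, const VisibleChunk& b) {
        return distanceSq(a) < distanceSq(b);
    };
    std::sort(chunks.begin(), chunks.end(), nearer);
    // Debug-only sanity check on the resulting order.
    assert(std::is_sorted(chunks.begin(), chunks.end(), nearer));
}
```

Squared distance is enough for ordering, so no square root is needed.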

Use compressed textures, especially on terrain.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


So, I managed to do everything you mentioned here. Memory is only allocated at the start, and buffers are only created there too, then reused. That works fine, but when I set the chunkSize to 257, the problem is that, when only 9 chunks are there, FPS already drop to ~55. I haven't reimplemented LOD yet, but those 9 chunks shall be set to full detail anyway.


A couple of questions in addition to what's already been mentioned:
What GPU etc. are you running this on?
What happens if you decrease the size of your chunks to something similar to what you had before, but still use these other techniques to avoid reloading every frame?

Also I think we need to see the code for drawing one chunk in order to give more tips.
Thanks for the quick reply. I'll read over your article. The index buffer is shared already.
What do you mean by compressed textures? Using them in .jpg format?
You suggest to split the vertex buffer, setting one only once (x + z), then pass a y-buffer every chunk?

I only set one set of textures once per frame for the terrain at the moment.

It seems to run smoother with smaller chunks, but I'm pretty sure that's only subjective; the FPS is the same.
Here's the code for drawing:


// in terrain.cpp ( called once per frame )
d3d11DevCon->VSSetShader(VS, 0, 0);
d3d11DevCon->PSSetShader(PS, 0, 0);

d3d11DevCon->OMSetBlendState(0, 0, 0xffffffff);

LightBufferType* dataPtr2;
D3D11_MAPPED_SUBRESOURCE mappedResource;
d3d11DevCon->Map(lightBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
// Get a pointer to the data in the constant buffer.
dataPtr2 = (LightBufferType*)mappedResource.pData;
// Copy the lighting variables into the constant buffer.
dataPtr2->ambientColor = D3DXVECTOR4(0.3f, 0.3f, 0.3f, 1.0f); //Everythingcolor
dataPtr2->diffuseColor = D3DXVECTOR4(1.0f, 1.0f, 1.0f, 1.0f); //Lightcolor
dataPtr2->lightDirection = D3DXVECTOR3(0.5f, -0.5f, 0.5f);
dataPtr2->padding = 0.0f; // Just Filler
// Unlock the constant buffer.
d3d11DevCon->Unmap(lightBuffer, 0);

// Finally set the light constant buffer in the pixel shader with the updated values.
d3d11DevCon->PSSetConstantBuffers(0, 1, &lightBuffer);
// For Texture
d3d11DevCon->PSSetShaderResources(0, 1, &slopeTexture);
d3d11DevCon->PSSetShaderResources(1, 1, &rockTexture);
d3d11DevCon->PSSetShaderResources(2, 1, &grassTexture);

d3d11DevCon->PSSetSamplers( 0, 1, &SamplerState );

d3d11DevCon->IASetInputLayout( vertLayout );
d3d11DevCon->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

d3d11DevCon->UpdateSubresource( cbPerObjectBuffer, 0, NULL, &cbPerObj, 0, 0 );
d3d11DevCon->VSSetConstantBuffers( 0, 1, &cbPerObjectBuffer );

d3d11DevCon->RSSetState(RSCullNormal);

// Draw only the chunks that passed the visibility test.
int numVisChunks = 0;
for (size_t i = 0; i < chunkList.size(); i++) {
    if (chunkList[i]->isVisible) {
        chunkList[i]->Draw(d3d11DevCon);
        numVisChunks++;
    }
}

// chunk.cpp Draw() ( called once per chunk )
d3d11DevCon->IASetIndexBuffer( indexBuffer, DXGI_FORMAT_R32_UINT, 0); // I need different indexBuffers for different LODs, right?
d3d11DevCon->IASetVertexBuffers( 0, 1, &vertexBuffer, &stride, &offset );
d3d11DevCon->DrawIndexed(numIndices, 0, 0 );


What do you mean by compressed textures? Using them in .jpg format?

That is a disk compression format.
I am talking about run-time compression formats such as DXT, BCn, etc.
http://wiki.polycount.com/DXT
http://msdn.microsof...531(VS.85).aspx
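To put numbers on the difference, here is a small calculation sketch. It relies on the block layout of these formats: 4x4 texel blocks, 8 bytes per block for BC1 (DXT1) and 16 bytes per block for BC3 (DXT5). The function names are made up for illustration, and only a single mip level is counted.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for one mip level of a block-compressed texture.
// bytesPerBlock is 8 for BC1 (DXT1) and 16 for BC3 (DXT5).
std::size_t blockCompressedBytes(std::size_t width, std::size_t height,
                                 std::size_t bytesPerBlock) {
    assert(bytesPerBlock == 8 || bytesPerBlock == 16);
    const std::size_t blocksWide = (width + 3) / 4;   // round up to whole blocks
    const std::size_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * bytesPerBlock;
}

// Bytes for one mip level of an uncompressed 32-bit RGBA texture.
std::size_t rgba8Bytes(std::size_t width, std::size_t height) {
    return width * height * 4;
}
```

For a 1024x1024 texture that is 4 MiB uncompressed, 512 KiB as BC1 and 1 MiB as BC3. Unlike a .jpg, the texture stays that small in video memory, because the GPU samples the compressed blocks directly.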



You suggest to split the vertex buffer, setting one only once (x + z), then pass a y-buffer every chunk?

I do. In addition to compressing the X and Z values to 16 bits.




I only set one set of textures once per frame for the terrain at the moment.

The way you are setting them is not efficient.
3 calls instead of 1?
Why not:
ID3D11ShaderResourceView * psrvViews[] = {
slopeTexture,
rockTexture,
grassTexture
};
d3d11DevCon->PSSetShaderResources( 0, 3, psrvViews );


And furthermore, if terrain is the only thing you are drawing, those textures will already be set, so setting them again and again is just wasting time.
You need to make wrappers for basically all of the DirectX 11 calls and check the last values you sent to them, and when matching, don’t call the DirectX 11 function.
Sorting by render states, textures, and shaders only has meaning if you are doing this.
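A wrapper of that kind might look roughly like this for the texture call. To keep the sketch runnable without Direct3D, FakeContext stands in for ID3D11DeviceContext and only counts calls. ShaderResourceCache, kSlots and the use of const void* for the views are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for ID3D11DeviceContext; it just counts how often it is called.
struct FakeContext {
    int setCalls = 0;
    void PSSetShaderResources(unsigned startSlot, unsigned count,
                              const void* const* views) {
        (void)startSlot; (void)count; (void)views;
        ++setCalls;
    }
};

// Number of slots tracked by the cache (an arbitrary choice for this sketch).
constexpr std::size_t kSlots = 8;

// Remembers the last view bound to each slot and forwards the call only
// when at least one slot in the requested range actually changes.
class ShaderResourceCache {
public:
    explicit ShaderResourceCache(FakeContext& context) : context_(context) {
        bound_.fill(nullptr);
    }

    void setPixelShaderResources(unsigned startSlot, unsigned count,
                                 const void* const* views) {
        assert(startSlot + count <= kSlots);
        bool changed = false;
        for (unsigned i = 0; i < count; ++i) {
            if (bound_[startSlot + i] != views[i]) {
                bound_[startSlot + i] = views[i];
                changed = true;
            }
        }
        if (changed) {
            context_.PSSetShaderResources(startSlot, count, views);
        }
    }

private:
    FakeContext& context_;
    std::array<const void*, kSlots> bound_;
};
```

The same pattern would apply to shaders, samplers, blend states and so on. Once every call goes through a cache like this, draws that share state become cheap to submit back to back, which is what makes the sorting pay off.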


L. Spiro


