Increasing terrain performance (loading + drawing)

32 comments, last by gnmgrl 11 years, 8 months ago
I'll see what I can do with the compression (does it help a lot or just a little?) and with splitting the vertex buffer.
I implemented your suggestion about the textures. I'm not only drawing terrain; I draw a full world with models, etc.
I can see that if I have 20 draws of models with the same texture, setting the texture only once and then drawing all of them will boost the FPS.
But I have to set the terrain textures every frame anyway, because I need to set the textures for all the models too, right? So I don't really see how I can save calls here.

A question: how do I set an array in my shader? Just with the usual constant buffer?
Are the vertices processed by the shader in the order I store them in the vertex buffer?
It is very simple.
As I said, make wrappers for all of those functions.
An appropriate class name might be CDirectX11.

This class keeps local copies of the last values sent to every DirectX 11 function (or at least a lot of them).
For example:


/** Textures ready to be sent to the shaders (public). */
static ID3D11ShaderResourceView * m_psrvActiveTextures[LSG_MAX_TEXTURE_UNITS];

/** Last textures (private). */
static ID3D11ShaderResourceView * m_psrvLastActiveTextures[LSG_MAX_TEXTURE_UNITS];


When your CTexture class wants to be activated into a slot (you do have a wrapper for your DirectX 11 textures, right? It is standard practice), it simply does this:

/**
* Activate this texture in a given slot.
*
* \param _ui32Slot Slot in which to place this texture.
* \return Returns true if the texture is activated successfully.
*/
LSBOOL LSE_CALL CDirectX11StandardTexture::Activate( LSUINT32 _ui32Slot ) {
    // Only record the request; nothing is sent to DirectX 11 here.
    CDirectX11::m_psrvActiveTextures[_ui32Slot] = m_psrvShaderView;
    return true;
}



Notice how nothing has been sent to DirectX 11 yet.
Textures, constant buffers, samplers, etc. only need to be sent when it is actually time to draw.
So there should be another function that is called just before an actual render:

/**
* Called just before rendering to allow performing of any final tasks.
*/
LSVOID LSE_CALL CDirectX11::PreRender() {
    LSUINT32 ui32Index = 0UL, ui32Total = 0UL;
    LSUINT32 ui32Max = CStd::Min<LSUINT32>( LSG_MAX_TEXTURE_UNITS, CFndBase::m_mMetrics.ui32MaxTexSlot );
    for ( LSUINT32 I = 0UL; I < ui32Max; ++I ) {
        if ( m_psrvActiveTextures[I] != m_psrvLastActiveTextures[I] ) {
            // Slot I changed: extend the current run of changed slots.
            ++ui32Total;
            m_psrvLastActiveTextures[I] = m_psrvActiveTextures[I];
        }
        else {
            // Slot I is unchanged: send the run that ended at I - 1, if any.
            if ( ui32Total ) {
                m_pdDevice->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
            }
            ui32Index = I + 1UL;
            ui32Total = 0UL;
        }
    }
    // Send a run that reaches the last slot.
    if ( ui32Total ) {
        m_pdDevice->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
    }
}


It is really not that complicated. All it does is make the fewest possible calls to PSSetShaderResources() on each render: it compares the currently active textures with the textures that were active on the last render and sends each run of consecutive changed slots in a single call. This is the only place where PSSetShaderResources() may be called, so the local record of the last textures sent to DirectX 11 is always accurate.
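The run-batching logic in PreRender() can be tried outside Direct3D. The sketch below is a portable C++ version, assuming plain pointers stand in for ID3D11ShaderResourceView*; the `ShaderView` alias and `ChangedSlotRanges()` are hypothetical names, not part of DirectX or of L. Spiro's engine. Each returned pair corresponds to one PSSetShaderResources(firstSlot, count, ...) call.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for ID3D11ShaderResourceView*; only pointer equality matters here.
using ShaderView = const void*;

// Returns (firstSlot, count) pairs covering every run of consecutive slots whose
// view differs from the one sent last time, and updates `last` to match `active`.
std::vector<std::pair<std::size_t, std::size_t>>
ChangedSlotRanges(const std::vector<ShaderView>& active, std::vector<ShaderView>& last) {
    assert(active.size() == last.size());
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t start = 0, count = 0;
    for (std::size_t i = 0; i < active.size(); ++i) {
        if (active[i] != last[i]) {
            if (count == 0) start = i;          // a new run of changed slots begins
            ++count;
            last[i] = active[i];                // remember what is about to be sent
        } else if (count != 0) {
            ranges.emplace_back(start, count);  // unchanged slot closes the run
            count = 0;
        }
    }
    if (count != 0) ranges.emplace_back(start, count);  // run reaching the last slot
    return ranges;
}
```

For example, if slots 0, 2 and 3 changed since the last draw, it yields two calls, (0, 1) and (2, 2), instead of three single-slot calls.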

You basically need a similar system in place for everything. Samplers, textures, constant buffers, etc.
And then you need to make this actually useful by implementing a render queue to maximize the number of times the same texture, shader, etc. are used in repeated render calls.
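A render queue can be as simple as sorting the frame's draw records before submitting them. This sketch assumes each draw is described by hypothetical `shaderId`/`textureId`/`meshId` integers and sorts by shader, then texture, so that redundancy checks like the ones above skip as many binds as possible; `CountStateChanges()` exists only to show the effect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record; engines often pack these IDs into one 64-bit sort key instead.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t meshId;
};

// Groups draws by shader, and within each shader by texture.
void SortRenderQueue(std::vector<DrawItem>& queue) {
    std::sort(queue.begin(), queue.end(), [](const DrawItem& a, const DrawItem& b) {
        return std::tie(a.shaderId, a.textureId, a.meshId) <
               std::tie(b.shaderId, b.textureId, b.meshId);
    });
}

// Counts the shader and texture binds a redundancy-checking wrapper would still issue.
int CountStateChanges(const std::vector<DrawItem>& queue) {
    int changes = 0;
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (i == 0 || queue[i].shaderId != queue[i - 1].shaderId) ++changes;
        if (i == 0 || queue[i].textureId != queue[i - 1].textureId) ++changes;
    }
    return changes;
}
```

For example, draws with (shader, texture) = (1,7), (2,5), (1,5), (2,5), (1,7) need 8 shader/texture binds in submission order but only 5 after sorting.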


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I would like to ask again, what hardware are you running on?
If this is integrated laptop graphics or similar, that would explain the low FPS, and vertex count can become more important.

I certainly don't mean to argue against the good points about state changes, but considering the code posted they will probably have close to zero impact in this case. All chunks in one buffer is probably better, but if you get bad FPS from drawing 9 chunks of 256x256, and the terrain drawing is actually the bottleneck, then you'd need like a thousand state changes per chunk to notice a major difference.

Just to set a performance baseline, create a small test program that draws a 1024x1024 or similar terrain with a single vertex buffer and a single draw call and absolutely nothing else, to determine what your computer is capable of.
I'm using a gaming notebook that can even run BF3, so that shouldn't be the problem.
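For such a baseline, the index buffer of a regular grid can be generated in a few lines. This is a minimal sketch (the function name is hypothetical) that builds a triangle list for an N x N quad grid whose vertices are stored row by row; `BuildGridIndices(1024)` would suit the test program, and `BuildGridIndices(256)` produces the 256*256*6 indices of one chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Triangle-list indices for quadsPerSide x quadsPerSide quads, i.e. (quadsPerSide + 1)^2
// vertices laid out row-major. 32-bit indices are used because a 257x257 grid already
// has 66,049 vertices, more than a 16-bit index can address.
std::vector<std::uint32_t> BuildGridIndices(std::uint32_t quadsPerSide) {
    const std::uint32_t vertsPerRow = quadsPerSide + 1;
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadsPerSide) * quadsPerSide * 6);
    for (std::uint32_t z = 0; z < quadsPerSide; ++z) {
        for (std::uint32_t x = 0; x < quadsPerSide; ++x) {
            const std::uint32_t topLeft = z * vertsPerRow + x;
            const std::uint32_t topRight = topLeft + 1;
            const std::uint32_t bottomLeft = topLeft + vertsPerRow;
            const std::uint32_t bottomRight = bottomLeft + 1;
            // Two triangles per quad; swap the winding if the wrong face gets culled.
            indices.insert(indices.end(), {topLeft, bottomLeft, topRight,
                                           topRight, bottomLeft, bottomRight});
        }
    }
    return indices;
}
```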

I'm very new to DirectX and everything I know I learned from books and tutorials; none of them ever mentioned wrappers.
I simply create the textures with D3DX11CreateShaderResourceViewFromFile( d3d11Device, "tt1.jpg", NULL, NULL, &slopeTexture, NULL );
and set them later as in my code above.

EDIT: I think I have an idea where the low FPS could come from: terrain::Update, the function that is called every frame to check where new chunks are needed.
Let me post it; I'm sure you will find tons of things to fix:

What it does is:
If the player is inside the terrain's bounds, check every chunk position around him for an existing chunk. If there is none, create a new one and pass it its part of the height map, normal map and shadow map. Then check whether there are more chunks than I want to keep in memory, and erase the first ones that are not visible.
Could the checking be time-consuming and therefore slowing everything down? (I'm only using one thread at the moment!)



void terrain::Update(float playerXin, float playerZin){

    int playerChunkX = ((int)playerXin/(chunkSize-1))*(chunkSize-1);
    int playerChunkZ = ((int)playerZin/(chunkSize-1))*(chunkSize-1);
    // LOAD NEW // UNLOAD OLD CHUNKS!!
    if(playerXin > startX && playerZin > startZ && playerXin < startX+terrainWidth && playerZin < startZ+terrainHeight){
        for(int z=0;z<loadPerSide;z++){
            for(int x=0;x<loadPerSide;x++){
                int xIndex = (x-((loadPerSide-1)/2))*(chunkSize-1);
                int zIndex = (z-((loadPerSide-1)/2))*(chunkSize-1);

                chunkNum = chunkList.size();
                bool isThereAChunk = false;

                //ADD NEW CHUNKS
                for(int i=0;i<chunkNum;i++){
                    if(playerXin+xIndex > chunkList[i]->startX && playerXin+xIndex < chunkList[i]->startX+chunkList[i]->width && playerZin+zIndex > chunkList[i]->startZ && playerZin+zIndex < chunkList[i]->startZ+chunkList[i]->height){
                        isThereAChunk = true;
                        break; // found one, no need to test the remaining chunks
                    }
                }
                // test if there is a chunk in this zone to load
                if(isThereAChunk == false && playerXin+xIndex > 0 && playerZin+zIndex > 0 && playerXin+xIndex < startX+terrainWidth && playerZin+zIndex < startZ+terrainHeight-chunkSize){
                    // index of this chunk's first sample in the full-size maps
                    int mapOffset = ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*(chunkSize-1)*terrainWidth;

                    //get heights, normals and shadows from the maps in one pass
                    int k = 0;
                    for(int cz=0;cz<chunkSize;cz++){
                        for(int cx=0;cx<chunkSize;cx++){
                            int src = cz*terrainWidth+cx + mapOffset;
                            heightsToPass[cz*chunkSize+cx] = heightMap[src];
                            normalsToPass[cz*chunkSize+cx] = normalMap[src];
                            lightsToPass[k]   = lightMapImage[src*3];   // 3 bytes per light-map texel
                            lightsToPass[k+1] = lightMapImage[src*3+1];
                            lightsToPass[k+2] = lightMapImage[src*3+2];
                            k += 3;
                        }
                    }

                    int timebegin = timeGetTime();
                    chunkAddIndex++;
                    int toLoad = -1;
                    for(int i=0;i<chunkCache;i++){
                        if(bufferInUse[i] == false){
                            toLoad = i;
                            bufferInUse[i] = true;
                            break;
                        }
                    }
                    if(toLoad == -1){ // NOT FINISHED!!!!
                        printf("chunkCache buffers full!\n");
                        continue; // no free buffer: vBuffers[-1] would be out of bounds
                    }

                    chunkList.push_back(new chunk());
                    chunkList[chunkNum]->preInit(d3d11Device, d3d11DevCon, chunkSize, chunkSize, playerChunkX+xIndex, playerChunkZ+zIndex, chunkAddIndex, vBuffers[toLoad], iBuffers[toLoad]);
                    chunkList[chunkNum]->Init(heightsToPass, indices, normalsToPass, lightsToPass, verticesToLock);
                    int timeend = timeGetTime();
                    //printf("init chunk took %d ms\n", timeend-timebegin);
                }
            } // X end!
        }// Z end!
    }//if player > 0 end !

    // unload invisible chunks, oldest first, until no more than chunkCache remain
    for(int i=0; i<(int)chunkList.size() && (int)chunkList.size() > chunkCache; ){
        if(chunkList[i]->isVisible == false){
            chunkList[i]->CleanUp();
            delete chunkList[i]; // allocated with new above
            chunkList.erase(chunkList.begin()+i); // do not advance i: the next chunk moved into slot i
        }else{
            i++;
        }
    }
    chunkNum = chunkList.size();

    visibleIndex = 0;

    for(int i = 0; i<(int)chunkList.size(); i++){ // calculate visible chunks
        if(FCD->CheckRectangle(chunkList[i]->CenterX, chunkList[i]->CenterY, chunkList[i]->CenterZ, chunkList[i]->width, 256.0f, chunkList[i]->height) == true){
            chunkList[i]->isVisible = true;
            currentlyVisible[visibleIndex] = i;
            visibleIndex++;
        }else{
            chunkList[i]->isVisible = false;
        }
    }
    playerX = playerXin;
    playerZ = playerZin;
    for(int i=0;i<(int)chunkList.size();i++){
        chunkList[i]->Update(playerX, playerZ);
    }

}






chunk::Update() only stores playerX and playerZ in the chunk.
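Two details of terrain::Update are worth isolating. First, `((int)x/(chunkSize-1))*(chunkSize-1)` truncates toward zero, so every coordinate strictly between -(chunkSize-1) and chunkSize-1 snaps to chunk 0. Second, the existence test scans chunkList once per candidate position; with a handful of chunks loaded that is only loadPerSide² × chunkList.size() cheap comparisons per frame, while the expensive copying runs only when a chunk is actually created. The sketch below shows a flooring snap plus a set keyed by chunk origin as an alternative to the scan; `SnapToChunk()` and `LoadedChunkSet` are hypothetical helpers, and std::set is a substitution, not something used in the thread.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <utility>

// Snaps a world coordinate to the origin of the chunk containing it.
// chunkStep plays the role of (chunkSize - 1). std::floor keeps negative
// coordinates in the chunk to their left instead of truncating toward 0.
int SnapToChunk(float worldCoord, int chunkStep) {
    return static_cast<int>(std::floor(worldCoord / static_cast<float>(chunkStep))) * chunkStep;
}

// Hypothetical index of loaded chunks keyed by snapped (x, z) origin. It would have to
// be updated wherever a chunk is pushed into or erased from chunkList.
class LoadedChunkSet {
public:
    bool Contains(int originX, int originZ) const {
        return origins_.count(std::make_pair(originX, originZ)) != 0;
    }
    void Add(int originX, int originZ) { origins_.insert(std::make_pair(originX, originZ)); }
    void Remove(int originX, int originZ) { origins_.erase(std::make_pair(originX, originZ)); }

private:
    std::set<std::pair<int, int>> origins_;
};
```

With chunkStep = 256, a player at x = 700 snaps to 512, and x = -10 snaps to -256 rather than 0.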
Wrappers are just the next step up, so they are nothing you can’t handle.

But Erik Rufelt is also correct. These redundancy optimizations are important, especially for the long run, but right now it seems clear that you have bigger issues at hand.

Put all of those chunks into one buffer and draw with one call. If the FPS remains mostly similar, it means you have a bandwidth problem, and the 16-bit vertex data optimization should be a large help.
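Merging the chunks means concatenating their vertices and shifting each chunk's indices by the number of vertices that precede it. A minimal sketch with a hypothetical three-float `Vertex` (the real vertex has more members): nine 257x257 chunks hold 594,441 vertices, so the merged index buffer needs 32-bit indices. When chunks stay in one shared buffer but are drawn separately, the BaseVertexLocation parameter of ID3D11DeviceContext::DrawIndexed applies the same offset without rewriting the indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };  // hypothetical minimal vertex

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // 0-based, local to this mesh
};

// Concatenates meshes so everything can be drawn with one indexed draw call.
Mesh MergeMeshes(const std::vector<Mesh>& meshes) {
    Mesh merged;
    for (const Mesh& m : meshes) {
        // Every index of this mesh must skip the vertices appended before it.
        const std::uint32_t base = static_cast<std::uint32_t>(merged.vertices.size());
        merged.vertices.insert(merged.vertices.end(), m.vertices.begin(), m.vertices.end());
        for (std::uint32_t i : m.indices) merged.indices.push_back(base + i);
    }
    return merged;
}
```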

Also, draw your terrain normally, but move the camera out so that it is only a small part of the screen.
If the FPS increases dramatically, you have a fill-rate problem related to your pixel shader. You could then start examining that.


L. Spiro


When I move the camera out, the FPS stays the same.
The FPS is about the same with a single vertex buffer and a single draw call.
The 16-bit optimization could really help, but I don't have the slightest idea how to do it.

According to your blog, I have to find the size of my vertex structure and then pad it to 16, 32 or 64 bytes, but how do I pad a vertex buffer?
So the size of each element of the vertex structure (3*sizeof(float) + 2*sizeof(float) + ... + padding) would be 32, for example?
To be clear, are you saying that zooming out so far that the terrain occupies only about 100-500 total pixels on the screen results in a similar framerate?
And you are sure that the rest of whatever you are drawing is not causing this slow framerate?

If so, you definitely have a bus-transfer problem, and the two main optimizations would be 16-bit vertices (with a second vertex stream for the Y values) and compressed textures.
Compressed textures are the easiest to implement so start there.
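A "second stream" means a second vertex buffer bound to another input slot, not a constant buffer. One possible layout, sketched here under assumptions: X/Z as 16-bit integers that are identical for every chunk (so one copy could be shared, with the chunk's world offset supplied separately), and Y as a per-chunk stream of 16-bit normalized values. In Direct3D 11 these would be bound with IASetVertexBuffers to slots 0 and 1, with input-layout formats such as DXGI_FORMAT_R16G16_SINT and DXGI_FORMAT_R16_UNORM; the helper names below are hypothetical. Position then costs 6 bytes per vertex instead of 12.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stream 0: grid X/Z as 16-bit integers, the same for every chunk.
struct GridXZ { std::int16_t x, z; };

std::vector<GridXZ> BuildSharedGrid(int vertsPerSide) {
    std::vector<GridXZ> grid;
    grid.reserve(static_cast<std::size_t>(vertsPerSide) * vertsPerSide);
    for (int z = 0; z < vertsPerSide; ++z)
        for (int x = 0; x < vertsPerSide; ++x)
            grid.push_back({static_cast<std::int16_t>(x), static_cast<std::int16_t>(z)});
    return grid;
}

// Stream 1: heights quantized to 16-bit unsigned normalized values in [minY, maxY].
std::vector<std::uint16_t> QuantizeHeights(const std::vector<float>& heights, float minY, float maxY) {
    std::vector<std::uint16_t> out;
    out.reserve(heights.size());
    const float range = maxY - minY;
    for (float h : heights) {
        float t = range > 0.0f ? (h - minY) / range : 0.0f;
        t = std::fmin(std::fmax(t, 0.0f), 1.0f);  // clamp into [0, 1]
        out.push_back(static_cast<std::uint16_t>(std::lround(t * 65535.0f)));
    }
    return out;
}

// What the vertex shader would compute from the UNORM value it receives in [0, 1].
float DequantizeHeight(std::uint16_t q, float minY, float maxY) {
    return minY + (q / 65535.0f) * (maxY - minY);
}
```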

When my site talks about padding, yes, it is as in your example. Add some fake bytes so that the next element in the buffer is 32 bytes after the previous element.
But while this will help, it is not going to give you results you will find acceptable.
This and redundancy checks should be put on hold while you address your biggest issue: bandwidth.
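Padding is just unused bytes at the end of the vertex struct; the input layout never reads them, and the stride passed to IASetVertexBuffers becomes sizeof() of the padded struct. A minimal sketch using a hypothetical 20-byte position + texcoord vertex padded to 32 bytes, with compile-time checks:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vertex: 20 bytes of real data padded to a 32-byte stride.
struct PaddedVertex {
    float pos[3];          // 12 bytes
    float texcoord[2];     //  8 bytes -> 20 bytes of data
    std::uint8_t pad[12];  // 12 unused bytes -> 32-byte stride
};

static_assert(sizeof(PaddedVertex) == 32, "vertex stride should be 32 bytes");
static_assert(offsetof(PaddedVertex, texcoord) == 12, "texcoord directly follows position");
```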


L. Spiro


Yes, when I zoom out very far, I only get ~10+ fps. I commented out everything but the terrain, so it's the only thing that could be causing this right now.

By bandwidth problem, do you mean that the data I pass from the CPU to the GPU is much too large?
So should I split the vertex buffer and update only the Y values (using a constant buffer? I'm not sure what you mean by a stream)?
I'll start with the compressed textures right now.
EDIT: Compressing the textures (using BC2) gave me a slight FPS increase of 5-10.
I added a screenshot of it to my first post!
Bandwidth = the transfer from the CPU to the GPU.
It means the total amount you send to the GPU is too large. This includes textures, index buffers, vertex buffers, etc.
That is why using compressed textures can help.
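The gain from compression follows from the block sizes: BC1 stores each 4x4 texel block in 8 bytes and BC2/BC3 in 16 bytes, versus 64 bytes for the same block in uncompressed RGBA8. A JPEG such as tt1.jpg is decoded to plain texels when it is loaded, so unless a BC format is requested it costs 4 bytes per texel on the GPU. A small constexpr sketch of the arithmetic, ignoring mip levels (function names are hypothetical):

```cpp
#include <cstdint>

// Uncompressed RGBA8: 4 bytes per texel.
constexpr std::uint64_t Rgba8Bytes(std::uint64_t w, std::uint64_t h) { return w * h * 4; }

// Block compression works on whole 4x4 blocks; dimensions are rounded up.
constexpr std::uint64_t BlockCompressedBytes(std::uint64_t w, std::uint64_t h,
                                             std::uint64_t bytesPerBlock) {
    return ((w + 3) / 4) * ((h + 3) / 4) * bytesPerBlock;
}

static_assert(Rgba8Bytes(1024, 1024) == 4ull * 1024 * 1024, "4 MiB uncompressed");
static_assert(BlockCompressedBytes(1024, 1024, 16) == 1024ull * 1024, "BC2/BC3: 1 MiB (4:1)");
static_assert(BlockCompressedBytes(1024, 1024, 8) == 512ull * 1024, "BC1: 512 KiB (8:1)");
```

Textures that need no alpha channel (which includes anything coming from a JPEG) could use BC1 instead of BC2 and halve their size again.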

How many elements are in your vertex buffer?
How many bits in your index buffer?


L. Spiro



//vertexBuffer:

D3DXVECTOR3 pos;
D3DXVECTOR3 normal;
D3DXVECTOR2 texcoord;
D3DXVECTOR4 color;
D3DXVECTOR4 shadowColor;
//257*257 vertices per chunk
//indexbuffer: unsigned long 256*256*6
// as optimised as I was able to get it
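Plugging these numbers in shows how much data each chunk carries. This is just arithmetic, assuming 4-byte floats for the D3DXVECTOR types and a 32-bit `unsigned long` (as on Windows); the constant names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kVertexBytes = 3 * 4    // pos
                                   + 3 * 4    // normal
                                   + 2 * 4    // texcoord
                                   + 4 * 4    // color
                                   + 4 * 4;   // shadowColor -> 64 bytes
constexpr std::size_t kVerticesPerChunk = 257 * 257;     // 66,049
constexpr std::size_t kIndicesPerChunk = 256 * 256 * 6;  // 393,216
constexpr std::size_t kIndexBytes = 4;                   // 32-bit unsigned long

constexpr std::size_t kVertexBufferBytes = kVertexBytes * kVerticesPerChunk;  // 4,227,136 (~4.0 MiB)
constexpr std::size_t kIndexBufferBytes = kIndicesPerChunk * kIndexBytes;     // 1,572,864 (1.5 MiB)
constexpr std::size_t kNineChunkBytes = 9 * (kVertexBufferBytes + kIndexBufferBytes);  // 52,200,000
```

Note that 66,049 vertices is more than a 16-bit index can address, so a full 257x257 chunk cannot simply switch to 16-bit indices.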
