
Increasing terrain performance (loading + drawing)


33 replies to this topic

#1 gnomgrol   Members   -  Reputation: 584


Posted 26 July 2012 - 06:17 AM

Hey guys,
I've got a pretty nice terrain engine by now.
I load a (big) heightmap and a shadowmap.
Then I split it into chunks wherever there isn't a chunk yet (passing each one its part of the two maps) and draw them (each with its own set of vertex buffers for the different LODs).
The chunks are stored in a vector (push_back(new chunk(...))), and every frame I check whether there are more chunks than the Max_chunks I want to have loaded; the extra ones are removed with vector.erase(the first one not visible).
So, that's all running fine, and thanks to the LOD levels I get 100-150 fps when I don't move.
But loading/unloading and creating/destroying chunks seems to take too much time: 3-9 ms currently, and I have to load/unload about 100 or more every frame. That makes the program freeze a little when you move around, and now I have to get rid of this.

One thing that was already suggested to me in this forum is memory pools: I just grab a block of raw memory when the program starts and then use placement new, new(myMemory) chunk(...), to allocate from it. But that turned out to be pretty awkward to implement because of the way I create the chunks, and I need access to this memory inside each chunk to allocate the memory the LOD levels need (they delete[] it right after the vertex buffer is set anyway).

So, are there any other ways to improve the performance of new/delete, and how can I pass raw memory through a function properly?
Any ways to increase the draw performance even more are welcome too.

Feel free to ask for code, if it's needed anywhere.
Thanks for your concern

--gnomgrol

http://imageshack.us/photo/my-images/215/terrainla.png/

Edited by gnomgrol, 28 July 2012 - 07:40 AM.



#2 Erik Rufelt   Crossbones+   -  Reputation: 3479


Posted 26 July 2012 - 06:51 AM

Do you recreate the vertex buffers every frame also?
You should probably have a static number of vertex buffers always created and then just refill them with new data when a chunk is switched out and another takes its place. How big is the heightmap?
Perhaps you can store all the vertices at least in RAM all the time, and just update vertex buffers when changes occur.
If your heightmap isn't very very large then you can probably even store all the vertices in a vertex-buffer statically, and just use different index buffers to draw different LOD levels to improve performance. If your heightmap is so large that you run out of memory, then look into reusing the same memory for a new chunk instead of reallocating things.
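Erik's last suggestion (keep the vertices static and draw different LODs with different index buffers) can be sketched as follows. This is a minimal sketch, assuming a square, row-major grid of `side` x `side` vertices where `side - 1` is divisible by the LOD step (65 works with steps 1, 2, 4, ..., 64); `BuildLodIndices` is a hypothetical helper, and the winding order is an assumption that may need flipping to match the rasterizer's cull mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle-list index buffer for a square, row-major grid of
// side x side vertices, skipping vertices by `step` (1 = full detail,
// 2 = every other vertex, ...). The vertex data never changes; each LOD
// level only needs its own index list.
// Assumes (side - 1) is a multiple of step. Winding order is an assumption.
std::vector<uint32_t> BuildLodIndices(uint32_t side, uint32_t step) {
    std::vector<uint32_t> indices;
    uint32_t quads = (side - 1) / step;  // quads per row at this LOD
    indices.reserve(static_cast<size_t>(quads) * quads * 6);
    for (uint32_t z = 0; z + step < side; z += step) {
        for (uint32_t x = 0; x + step < side; x += step) {
            uint32_t topLeft = z * side + x;
            uint32_t topRight = topLeft + step;
            uint32_t bottomLeft = (z + step) * side + x;
            uint32_t bottomRight = bottomLeft + step;
            // Two triangles per quad.
            indices.insert(indices.end(), {topLeft, bottomLeft, topRight});
            indices.insert(indices.end(), {topRight, bottomLeft, bottomRight});
        }
    }
    return indices;
}
```

Every LOD level indexes the same vertices, so switching LOD only means binding a different index buffer (and using its index count in the draw call).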

#3 Ashaman73   Crossbones+   -  Reputation: 7500


Posted 26 July 2012 - 06:56 AM

and i have to load/unload about 100 or more every frame

Every frame? You should reconsider the size of your tiles. It seems to me that your chosen size is too small compared to your view distance.

memorypools

You should use a cache, e.g. an LRU cache.

It's best to load and process the tiles in a separate thread, choose a sensible tile size (neither too small nor too large), and use some kind of cache.
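An LRU tile cache of the kind suggested above might look like this. It is only a sketch: `Tile`, `TileKey` and `TileCache` are hypothetical names, the capacity is assumed to be at least 1, and the `load` callback stands in for whatever actually reads and processes a tile (ideally on a worker thread).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical tile payload: just the heights for one tile.
struct Tile {
    std::vector<float> heights;
};

// Packs a signed (x, z) tile coordinate into one 64-bit key.
inline uint64_t TileKey(int32_t x, int32_t z) {
    return (static_cast<uint64_t>(static_cast<uint32_t>(x)) << 32) |
           static_cast<uint32_t>(z);
}

// Least-recently-used cache: recently touched tiles stay resident; when the
// cache is full, the tile untouched for the longest time is evicted.
class TileCache {
public:
    using Loader = std::function<Tile(int32_t, int32_t)>;
    TileCache(size_t capacity, Loader load) : capacity_(capacity), load_(std::move(load)) {}

    // Returns the tile, calling the loader only on a cache miss.
    const Tile& Get(int32_t x, int32_t z) {
        uint64_t key = TileKey(x, z);
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        if (order_.size() == capacity_) {
            // Full: evict the least recently used entry at the back.
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, load_(x, z));
        index_[key] = order_.begin();
        ++misses;
        return order_.front().second;
    }

    size_t misses = 0;  // number of loads performed, for diagnostics

private:
    using Entry = std::pair<uint64_t, Tile>;
    size_t capacity_;
    Loader load_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
};
```

Moving back and forth near a tile boundary then hits the cache instead of reloading the same tiles.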

#4 L. Spiro   Crossbones+   -  Reputation: 13595


Posted 26 July 2012 - 07:00 AM

You are considering using faster memory allocators (etc.) to solve your problem, when the real issue is actually the concept of what you are doing itself.
Since the FPS is high when you are not moving, we can assume that your terrain system itself is overall fine.

When you move, it drops, and the only thing that happens when you move is that there is deallocation and allocation.
The solution is not to make deallocation and allocation faster, but to simply eliminate them from happening at all.

You should be reusing memory as much as possible for starters, but aside from that you should have a system in place that simply does not need that much memory reallocation or at least spread it out over a longer duration. Of course there are some worlds in which there is no possible way to keep everything necessary in memory, but when memory operations are needed they are put on hold until a certain major event happens, not done every frame.

This issue is not about efficiency, it is about planning. You need to reconceptualize what you are doing so that memory allocations are as little a part of your plans as possible.


L. Spiro
It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#5 gnomgrol   Members   -  Reputation: 584


Posted 26 July 2012 - 07:30 AM

@Erik Rufelt
Yes, I'm creating a new one for each chunk. The heightmap is 4096x4096 right now, but I would like to go bigger.
Refilling seems to be a good idea, I'll see what I can do with it.

@ Ashaman73
The chunk size is 65x65 vertices; if I go larger, the freezes get bigger. I'll look into the cache stuff.
Multithreading appeared to be difficult, because I can't use non-static members in a thread function, which would be necessary because the terrain is its own class, as is the chunk.


@ Spiro
So basically, what you are suggesting is to rework everything so that I don't need to 'new' and 'delete' every time, reusing memory to start with.
Since the terrain class controls and holds the chunks, I could allocate a 'chunkVerticesNum * maxNumChunks' array at startup and then pass a part of it to each chunk to fill. That should work out fine; I'll try it as soon as possible.
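The single up-front allocation described above could be sketched like this: one contiguous array of `chunkVerticesNum * maxNumChunks` vertices, where each chunk is handed a non-owning pointer-plus-count view of its own slice instead of allocating. This is also one way to pass raw memory through a function. `Vertex`, `Slice`, `VertexSlab` and `FillFlatChunk` are hypothetical names; the engine's real vertex struct would replace `Vertex`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout; the engine's own vertex struct would go here.
struct Vertex {
    float x, y, z;
};

// A non-owning view of one chunk's portion of the shared array.
struct Slice {
    Vertex* data;
    size_t count;
};

// Allocates vertex storage for every chunk once, at startup.
class VertexSlab {
public:
    VertexSlab(size_t verticesPerChunk, size_t maxChunks)
        : perChunk_(verticesPerChunk), storage_(verticesPerChunk * maxChunks) {}

    // Chunk slot i always maps to the same slice; refilling a slot overwrites
    // the old data, so nothing is allocated after construction.
    Slice SliceFor(size_t slot) {
        assert((slot + 1) * perChunk_ <= storage_.size());
        return Slice{storage_.data() + slot * perChunk_, perChunk_};
    }

private:
    size_t perChunk_;
    std::vector<Vertex> storage_;
};

// Example fill routine (hypothetical): a chunk writes a flat grid of
// side x side vertices into the slice it was given.
void FillFlatChunk(Slice out, size_t side, float originX, float originZ) {
    for (size_t i = 0; i < out.count; ++i) {
        out.data[i] = Vertex{originX + float(i % side), 0.0f, originZ + float(i / side)};
    }
}
```

The chunk never owns or frees the memory; it only writes into the slice, so swapping a chunk out is just refilling its slot.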


Why do memory de/allocations take so long anyway?

Thanks for the quick reply!

Edited by gnomgrol, 26 July 2012 - 08:53 AM.


#6 phantom   Moderators   -  Reputation: 7278


Posted 26 July 2012 - 09:26 AM

65*65 is still far far too small for the loaded data.

I worked on an open world game and our 'chunk' sizes were much bigger than that.
The system had a 3x3 grid of high-resolution chunks around the player, meaning we had 9 high-resolution chunks loaded at any given time (as well as a number of lower-resolution chunks beyond that). The view distance was 330 meters, and we started loading a new chunk when its bounding box came within the view distance + 20% of the player, to give us some time to stream in the world.
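The streaming trigger described above (start loading once a chunk's bounding box is within the view distance plus 20%) comes down to a point-to-box distance test. A minimal sketch, assuming axis-aligned chunk bounds on the XZ plane with height ignored; the type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct Vec2 {
    float x, z;
};

// Axis-aligned chunk bounds on the ground plane.
struct ChunkBounds {
    Vec2 min, max;
};

// Squared distance from a point to the nearest point of the box
// (zero when the point is inside the box).
float DistanceSqToBox(Vec2 p, const ChunkBounds& b) {
    float dx = std::max({b.min.x - p.x, 0.0f, p.x - b.max.x});
    float dz = std::max({b.min.z - p.z, 0.0f, p.z - b.max.z});
    return dx * dx + dz * dz;
}

// Start streaming a chunk once it is within the view distance plus a 20%
// margin, leaving time for the load to finish before it becomes visible.
bool ShouldStartLoading(Vec2 player, const ChunkBounds& b, float viewDistance) {
    float loadDistance = viewDistance * 1.2f;
    return DistanceSqToBox(player, b) <= loadDistance * loadDistance;
}
```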

We also pre-allocated the chunk buffers, so we could maintain a freelist of them and just load directly into them.

So trying to load and drop 64*64 'chunks' is just too much work - you also won't be able to stream without some form of multi-threading otherwise you'll just end up stalling the thread while you copy memory about.
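A freelist over pre-allocated chunk buffers could be sketched like this. The pool hands out slot indices, and releasing a slot keeps its memory for the next chunk. In an actual D3D11 engine each slot would presumably also own a vertex buffer created once at startup; here a plain `std::vector<float>` stands in for it, and `ChunkBufferPool` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Fixed set of chunk buffers allocated once; a freelist tracks unused ones.
class ChunkBufferPool {
public:
    ChunkBufferPool(size_t slotCount, size_t floatsPerSlot)
        : buffers_(slotCount, std::vector<float>(floatsPerSlot)) {
        freeList_.reserve(slotCount);  // Release() never needs to grow it
        // Push in reverse so slot 0 is handed out first.
        for (size_t i = slotCount; i-- > 0;) freeList_.push_back(i);
    }

    // Takes a free slot, or returns nothing if every buffer is in use.
    std::optional<size_t> Acquire() {
        if (freeList_.empty()) return std::nullopt;
        size_t slot = freeList_.back();
        freeList_.pop_back();
        return slot;
    }

    // Returns a slot to the freelist; its memory is kept for the next chunk.
    void Release(size_t slot) { freeList_.push_back(slot); }

    std::vector<float>& Buffer(size_t slot) { return buffers_[slot]; }
    size_t FreeCount() const { return freeList_.size(); }

private:
    std::vector<std::vector<float>> buffers_;
    std::vector<size_t> freeList_;
};
```

Streaming a chunk in then means Acquire(), loading straight into Buffer(slot), and Release(slot) when it is dropped; no allocation happens after startup.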

#7 gnomgrol   Members   -  Reputation: 584


Posted 26 July 2012 - 12:05 PM

So, I managed to set up everything as you mentioned above. It helped a little, but it's still pretty laggy. I'm pretty sure that's because I put all chunks in a vector using push_back(new chunk(...)) and erase the old ones using chunkList.erase(chunkList.begin()). I tried some other containers, like maps and deques, but vector was the only one that really worked; the others brought it down to 5 fps.

So I need a better way to manage the chunks. Any suggestions?

Edited by gnomgrol, 27 July 2012 - 12:55 AM.


#8 quiSHADgho   Members   -  Reputation: 325


Posted 26 July 2012 - 02:39 PM

I would say maybe you have to work on threading. In my C# engine I added a worker thread for chunk updating, plus double buffering. In your case I suppose you would keep an array of two vectors, so the update thread can update vector1 while the main thread draws from vector2. After updating, you swap them, so it continues updating vector2 while drawing from vector1...
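Translated to C++, the double-buffering idea might look like the sketch below: a worker rebuilds the back list while the render thread reads the front list, and the two are swapped under a mutex once an update is finished. It only swaps CPU-side chunk records; `ChunkInfo` and `DoubleBufferedChunks` are hypothetical, and calls on the D3D11 immediate context would still have to stay on the render thread, since the immediate context is not thread-safe.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-chunk record the render thread needs.
struct ChunkInfo {
    int gridX, gridZ;
    int lod;
};

// The worker fills back_ while the renderer reads front_ (single worker,
// single render thread assumed).
class DoubleBufferedChunks {
public:
    // Worker thread: rebuild the back list. Returns false (and does nothing)
    // if the previous result has not been picked up by the renderer yet.
    template <typename Fill>
    bool UpdateBack(Fill fill) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pendingSwap_) return false;
        }
        back_.clear();  // keeps capacity, so no reallocation after warm-up
        fill(back_);
        std::lock_guard<std::mutex> lock(mutex_);
        pendingSwap_ = true;
        return true;
    }

    // Render thread, once per frame: publish the finished back list.
    const std::vector<ChunkInfo>& AcquireFront() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pendingSwap_) {
            std::swap(front_, back_);
            pendingSwap_ = false;
        }
        return front_;
    }

private:
    std::mutex mutex_;
    std::vector<ChunkInfo> front_, back_;
    bool pendingSwap_ = false;
};
```

The worker never touches the front list and only writes the back list while no swap is pending, so the two threads never access the same vector at the same time.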

#9 mhagain   Crossbones+   -  Reputation: 7978

Like
Posted 26 July 2012 - 03:54 PM

Are you putting things in a vector using push_back at runtime? And also clearing the vector at runtime? And has this vector got any memory reserved or does it need to reallocate all the time? This is definitely not an optimal use case, but at the same time it should not be giving the kind of trouble you're experiencing. 100 objects per frame is nothing - or at least it should be nothing. Have you a constructor for the objects you're creating, and - if so - what's going on in the constructor?

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#10 gnomgrol   Members   -  Reputation: 584


Posted 27 July 2012 - 12:55 AM

Yes, I'm using push_back and erase at runtime, as well as allocating at runtime. I'm not calling delete; someone said erase does this for me.
The constructor takes a few things, like the size, the d3d11Device, etc. Then I call the chunk's init(...) function, which mainly creates its vertex buffer.

#11 Erik Rufelt   Crossbones+   -  Reputation: 3479


Posted 27 July 2012 - 02:29 AM

erase is not calling delete for you, you need to do that.
However, if you're still calling new every frame then you haven't fixed much. Especially if you recreate vertex-buffers. They should probably only be created at the start of the program with the device, and never again.
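To illustrate the ownership point: with `std::vector<Chunk*>`, `erase` removes only the pointer and the `Chunk` leaks unless `delete` is called first, whereas `std::vector<std::unique_ptr<Chunk>>` destroys the chunk on `erase`. A minimal sketch with a hypothetical `Chunk` that counts live instances; note this fixes the leak, not the per-frame allocation cost.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical chunk that tracks how many instances are alive.
struct Chunk {
    static inline int alive = 0;
    Chunk() { ++alive; }
    ~Chunk() { --alive; }
};

// Raw pointers: erase() removes the pointer but never runs the destructor,
// so the chunk must be deleted by hand first.
void EraseFirstRaw(std::vector<Chunk*>& list) {
    delete list.front();
    list.erase(list.begin());
}

// unique_ptr: erase() destroys the owned Chunk automatically.
void EraseFirstOwned(std::vector<std::unique_ptr<Chunk>>& list) {
    list.erase(list.begin());
}
```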

Also, as others have already pointed out, you shouldn't need to do anything every frame.
If your map is 4096x4096, try for example 16x16 chunks of size 256x256 at full LOD, and keep 9 of those loaded at full LOD at a time, the ones around the current player position. Then only ever unload/reload chunks when you cross a chunk boundary, that is when the player enters a new chunk, not every time you move.
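Doing work only when the player enters a new chunk could be sketched as follows. It assumes square chunks of `chunkSize` world units, a grid anchored at the world origin and no clamping at the map edges; `ChunkCoord`, `ChunkOf`, `ResidentSet` and `ChunkStreamer` are hypothetical names, and the loader is reduced to counting how many chunks would need loading.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <utility>

using ChunkCoord = std::pair<int, int>;  // (x, z) index in the chunk grid

// Which chunk a world position falls in (floor so negative positions work).
ChunkCoord ChunkOf(float worldX, float worldZ, float chunkSize) {
    return {static_cast<int>(std::floor(worldX / chunkSize)),
            static_cast<int>(std::floor(worldZ / chunkSize))};
}

// The 3x3 block of chunks centred on the player's chunk.
std::set<ChunkCoord> ResidentSet(ChunkCoord center) {
    std::set<ChunkCoord> out;
    for (int dz = -1; dz <= 1; ++dz)
        for (int dx = -1; dx <= 1; ++dx)
            out.emplace(center.first + dx, center.second + dz);
    return out;
}

// Called every frame, but only does work when the player's chunk changes.
class ChunkStreamer {
public:
    explicit ChunkStreamer(float chunkSize) : chunkSize_(chunkSize) {}

    // Returns how many chunks would have to be loaded this frame.
    int Update(float playerX, float playerZ) {
        ChunkCoord now = ChunkOf(playerX, playerZ, chunkSize_);
        if (hasCurrent_ && now == current_) return 0;  // common case: no work
        std::set<ChunkCoord> wanted = ResidentSet(now);
        int loads = 0;
        for (const ChunkCoord& c : wanted)
            if (!resident_.count(c)) ++loads;  // a real engine would queue a load here
        resident_ = std::move(wanted);  // chunks no longer wanted can be reused
        current_ = now;
        hasCurrent_ = true;
        return loads;
    }

private:
    float chunkSize_;
    ChunkCoord current_{};
    bool hasCurrent_ = false;
    std::set<ChunkCoord> resident_;
};
```

Crossing one boundary loads a single row or column of three chunks, and every other frame costs one coordinate comparison.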

When that's working correctly and you want to make it even better, look into loading a little bit at a time and not many chunks at once, and not unloading old chunks exactly when you cross the boundary but add some safety distance so things aren't reloaded constantly by moving back and forth across a boundary.
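The safety distance amounts to hysteresis: a chunk is loaded when the player comes within one radius but unloaded only beyond a larger one, so moving back and forth across a boundary does not trigger reloads. A minimal per-chunk sketch; the class name and the radii used are assumptions.

```cpp
#include <cassert>

// Tracks one chunk's residency with separate load/unload thresholds.
// Between the two radii the previous decision is kept unchanged.
class ResidencyWithHysteresis {
public:
    ResidencyWithHysteresis(float loadRadius, float unloadRadius)
        : loadRadius_(loadRadius), unloadRadius_(unloadRadius) {}

    // distance: player-to-chunk distance this frame.
    // Returns true when the resident state changed (i.e. real work is needed).
    bool Update(float distance) {
        if (!resident_ && distance <= loadRadius_) {
            resident_ = true;
            return true;
        }
        if (resident_ && distance > unloadRadius_) {
            resident_ = false;
            return true;
        }
        return false;
    }

    bool Resident() const { return resident_; }

private:
    float loadRadius_;
    float unloadRadius_;  // assumed to be larger than loadRadius_
    bool resident_ = false;
};
```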

Edited by Erik Rufelt, 27 July 2012 - 02:30 AM.


#12 mhagain   Crossbones+   -  Reputation: 7978


Posted 27 July 2012 - 03:32 AM

It's basically a form of death by 1000 cuts. An alternative approach would be to pre-create a pool of objects at startup and draw from that pool rather than re-initializing everything every frame. Either way, and especially with D3D11 (where object creation is documented as being so expensive) you do need to move away from run-time creation.


#13 gnomgrol   Members   -  Reputation: 584


Posted 27 July 2012 - 06:08 AM

EDIT: I just found out that you can update an existing ID3D11Buffer (one created with D3D11_USAGE_DEFAULT, like mine) with UpdateSubresource. I didn't know that, so I can try just updating the buffers instead of creating new ones. I'll try this and report back whether it works.

----- ----- -----
I get that CreateBuffer should only be called at the beginning. The problem is that I have to call it when I create a new chunk, because that's where I fill it. Maybe I'm getting something wrong; at the moment, this is what I call every time I create a chunk:

I only create chunks when they are needed as I move, of course.


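// in chunk.h
ID3D11Buffer* vertexBuffer;
D3D11_BUFFER_DESC vertexBufferDesc;

// in chunk.cpp
// Describe a GPU vertex buffer holding lowwidth*lowheight vertices.
// DEFAULT usage: read by the GPU, and it can later be refilled with UpdateSubresource.
ZeroMemory( &vertexBufferDesc, sizeof(vertexBufferDesc) );
vertexBufferDesc.Usage = D3D11_USAGE_DEFAULT;
vertexBufferDesc.ByteWidth = sizeof(VertexPosNormalTexColorColor2) * (lowwidth*lowheight);
vertexBufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
vertexBufferDesc.CPUAccessFlags = 0;
vertexBufferDesc.MiscFlags = 0;


// Initial contents: the vertex array built in the chunk's Init().
D3D11_SUBRESOURCE_DATA vertexBufferData;
ZeroMemory( &vertexBufferData, sizeof(vertexBufferData) );
vertexBufferData.pSysMem = verticesToLock;
vertexBufferData.SysMemPitch = 0;      // only used for 2D/3D textures
vertexBufferData.SysMemSlicePitch = 0; // only used for 3D textures
// Creating a GPU resource is expensive, and this runs every time a chunk is created.
d3d11Device->CreateBuffer( &vertexBufferDesc, &vertexBufferData, &vertexBufferIn);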



How can I get rid of this? I have to call CreateBuffer, because the verticesToLock data is created in every chunk's Init() function.

Edited by gnomgrol, 27 July 2012 - 06:52 AM.


#14 SamiHuutoniemi   Members   -  Reputation: 259


Posted 27 July 2012 - 07:33 AM

Why would you need to create vertex buffers at all, except in the beginning when you load your height map?

When you move you just decide for every frame which vertex buffers to use, depending on LoD (and set it with a call to IASetVertexBuffers).

If you can't load all buffers at load time, you should probably, as others have pointed out, let a worker thread carry that out in advance.
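Picking the LOD per chunk each frame can be as simple as comparing the camera distance with a table of thresholds and then binding the buffers for that level. A sketch, assuming LOD 0 is full detail and the (hypothetical) thresholds are sorted in ascending order.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the LOD index for a chunk centre at (cx, cz) seen from (camX, camZ).
// thresholds[i] is the maximum distance at which LOD i is still used;
// anything beyond the last threshold gets the coarsest LOD.
size_t SelectLod(float camX, float camZ, float cx, float cz,
                 const std::vector<float>& thresholds) {
    float distance = std::hypot(cx - camX, cz - camZ);
    for (size_t lod = 0; lod < thresholds.size(); ++lod)
        if (distance <= thresholds[lod]) return lod;
    return thresholds.size();  // coarsest level
}
```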

Edited by SamiHuutoniemi, 27 July 2012 - 07:35 AM.


#15 phantom   Moderators   -  Reputation: 7278


Posted 27 July 2012 - 05:50 PM

If your map is 4096x4096, try for example 16x16 chunks of size 256x256 at full LOD


Just to come back to this point: many, many terrain tutorials out there are old, or based on old tutorials from when GPUs weren't as powerful, vertex processing was slower or done on the CPU, and doing some work on the CPU to reduce the number of vertices drawn and processed was worthwhile.

However, technology has progressed: GPUs need to be fed large chunks of work to run in parallel with the CPU, and CPU cost (and the associated memory accesses) is the bottleneck in many, many cases, so don't be afraid to throw large patches at the GPU, beyond what old wisdom used to say was the norm.

#16 gnomgrol   Members   -  Reputation: 584


Posted 28 July 2012 - 02:33 AM

So, I managed to do everything you mentioned here. Memory is only allocated at the start, and buffers are only created there too and then reused. That works fine, but when I set the chunkSize to 257, the FPS already drops to ~55 with only 9 chunks loaded. I haven't reimplemented LOD yet, but those 9 chunks are supposed to be at full detail anyway.

So what I need to do now is increase the FPS.
(I already have frustum culling and LOD.)

Edited by gnomgrol, 28 July 2012 - 02:44 AM.


#17 L. Spiro   Crossbones+   -  Reputation: 13595


Posted 28 July 2012 - 02:43 AM

There are many general-purpose techniques for increasing framerate.

For one, sorting by render-state/shaders/textures is one of the most important.
3D Performance Tips

Since each chunk is drawn with the same shader and many—if not all—of the same textures, you will see a large boost from just this (unless you are already doing it).
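One common way to sort by render state is to pack the state each draw needs into an integer key and sort the draw list by that key, so draws sharing a shader or texture end up next to each other. The 64-bit layout below (shader in the top bits, then texture, then a depth tie-breaker) and all names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawItem {
    uint16_t shaderId;
    uint16_t textureId;
    uint32_t depth;  // quantized view depth, used as a tie-breaker
    int chunkIndex;  // what to draw
};

// Higher bits dominate the order: shader first, then texture, then depth.
uint64_t SortKey(const DrawItem& d) {
    return (uint64_t(d.shaderId) << 48) | (uint64_t(d.textureId) << 32) | d.depth;
}

void SortDraws(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return SortKey(a) < SortKey(b); });
}

// Counts how often the shader would be switched when drawing in list order.
int CountShaderSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (size_t i = 0; i < items.size(); ++i)
        if (i == 0 || items[i].shaderId != items[i - 1].shaderId) ++switches;
    return switches;
}
```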

You should also be able to share vertex buffers. Switching vertex buffers is also costly, and terrain provides you with many ways to share vertex data.
For example, X and Z in one buffer that is shared across all chunks and let the Y be in a separate buffer, with only that buffer being swapped between calls.
This also allows you to heavily compress the X and Z values into 16-bit values each, which saves bandwidth.
You should also be using only one index buffer for all chunks.
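The X/Z and Y split described above could be sketched as follows, written as CPU code for clarity: a shared array of 16-bit local grid coordinates built once, a per-chunk array of heights, and the reconstruction a vertex shader would perform with a per-chunk origin, e.g. from a constant buffer. `GridXZ`, `BuildSharedGrid`, `Reconstruct` and the grid spacing are hypothetical; on the GPU side the packed coordinates could be described with an input-layout format such as `DXGI_FORMAT_R16G16_UINT`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shared by every chunk: local (x, z) grid indices as 16-bit integers.
struct GridXZ {
    uint16_t x, z;
};

// Built once at startup: 4 bytes per vertex instead of 8 for two floats.
std::vector<GridXZ> BuildSharedGrid(uint16_t side) {
    std::vector<GridXZ> grid;
    grid.reserve(size_t(side) * side);
    for (uint16_t z = 0; z < side; ++z)
        for (uint16_t x = 0; x < side; ++x)
            grid.push_back({x, z});
    return grid;
}

struct Float3 {
    float x, y, z;
};

// What the vertex shader would compute: the shared XZ stream plus the
// per-chunk height stream, offset by the chunk origin.
Float3 Reconstruct(const GridXZ& g, float height, float originX, float originZ, float spacing) {
    return {originX + g.x * spacing, height, originZ + g.z * spacing};
}
```

Only the height stream (and the chunk origin) changes from chunk to chunk; the XZ stream and the index buffer stay bound.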

Draw near-to-far to reduce overdraw.
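Near-to-far ordering only needs the visible chunks sorted by distance to the camera before the draw loop; comparing squared distances avoids a square root. A sketch, assuming each chunk is represented by its centre point on the XZ plane (`VisibleChunk` is hypothetical).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct VisibleChunk {
    float centerX, centerZ;
    int index;  // which chunk to draw
};

// Sorts front-to-back so nearer terrain fills the depth buffer first; on
// hardware with early depth testing, hidden pixels behind it can then be
// rejected before pixel shading.
void SortNearToFar(std::vector<VisibleChunk>& chunks, float camX, float camZ) {
    auto distSq = [&](const VisibleChunk& c) {
        float dx = c.centerX - camX, dz = c.centerZ - camZ;
        return dx * dx + dz * dz;
    };
    std::sort(chunks.begin(), chunks.end(),
              [&](const VisibleChunk& a, const VisibleChunk& b) { return distSq(a) < distSq(b); });
}
```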

Use compressed textures, especially on terrain.


L. Spiro

#18 Erik Rufelt   Crossbones+   -  Reputation: 3479


Posted 28 July 2012 - 03:23 AM

So, I managed to do everything you mentioned here. Memory is only allocated at the start, and buffers are only created there too, then reused. That works fine, but when I set the chunkSize to 257, the problem is that, when only 9 chunks are there, FPS already drop to ~55. I haven't reimplemented LOD yet, but those 9 chunks shall be set to full detail anyway.


A couple of questions in addition to what's already been mentioned:
What GPU etc. are you running this on?
What happens if you decrease the size of your chunks to something similar to what you had before, but still use these other techniques to avoid reloading every frame?

Also I think we need to see the code for drawing one chunk in order to give more tips.

#19 gnomgrol   Members   -  Reputation: 584


Posted 28 July 2012 - 03:39 AM

Thanks for the quick reply. I'll read over your article. The index buffer is shared already.
What do you mean by compressed textures? Using them in .jpg format?
You suggest splitting the vertex buffer, setting one (x + z) only once, then passing a y-buffer for every chunk?

I only set one set of textures once per frame for the terrain at the moment.

It seems to run smoother with smaller chunks, but I'm pretty sure that's only subjective; the FPS is the same.
Here's the code for drawing:

// in terrain.cpp   ( called once per frame )
d3d11DevCon->VSSetShader(VS, 0, 0);
d3d11DevCon->PSSetShader(PS, 0, 0);

d3d11DevCon->OMSetBlendState(0, 0, 0xffffffff);

LightBufferType* dataPtr2;
D3D11_MAPPED_SUBRESOURCE mappedResource;
d3d11DevCon->Map(lightBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
// Get a pointer to the data in the constant buffer.
dataPtr2 = (LightBufferType*)mappedResource.pData;
// Copy the lighting variables into the constant buffer.
dataPtr2->ambientColor = D3DXVECTOR4(0.3f, 0.3f, 0.3f, 1.0f); //Everythingcolor
dataPtr2->diffuseColor = D3DXVECTOR4(1.0f, 1.0f, 1.0f, 1.0f); //Lightcolor
dataPtr2->lightDirection = D3DXVECTOR3(0.5f, -0.5f, 0.5f);
dataPtr2->padding = 0.0f; // Just Filler
// Unlock the constant buffer.
d3d11DevCon->Unmap(lightBuffer, 0);

// Finally set the light constant buffer in the pixel shader with the updated values.
d3d11DevCon->PSSetConstantBuffers(0, 1, &lightBuffer);
// For Texture
d3d11DevCon->PSSetShaderResources(0, 1, &slopeTexture);
d3d11DevCon->PSSetShaderResources(1, 1, &rockTexture);
d3d11DevCon->PSSetShaderResources(2, 1, &grassTexture);

d3d11DevCon->PSSetSamplers( 0, 1, &SamplerState );

d3d11DevCon->IASetInputLayout( vertLayout );
d3d11DevCon->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

d3d11DevCon->UpdateSubresource( cbPerObjectBuffer, 0, NULL, &cbPerObj, 0, 0 );
d3d11DevCon->VSSetConstantBuffers( 0, 1, &cbPerObjectBuffer );

d3d11DevCon->RSSetState(RSCullNormal);

// Draw every chunk that passed frustum culling and count them.
int numVisChunks = 0;
for (size_t i = 0; i < chunkList.size(); i++) {
    if (chunkList[i]->isVisible) {
        chunkList[i]->Draw(d3d11DevCon);
        numVisChunks++;
    }
}

// chunk.cpp Draw()  ( called once per chunk )
d3d11DevCon->IASetIndexBuffer( indexBuffer, DXGI_FORMAT_R32_UINT, 0); // I need different indexBuffers for different LODs, right?
d3d11DevCon->IASetVertexBuffers( 0, 1, &vertexBuffer, &stride, &offset );
d3d11DevCon->DrawIndexed(numIndices, 0, 0 );


Edited by gnomgrol, 28 July 2012 - 04:32 AM.


#20 L. Spiro   Crossbones+   -  Reputation: 13595


Posted 28 July 2012 - 04:48 AM

What do you mean by compressed textures? Using them in .jpg format?

That is a file (disk) compression format.
I am talking about run-time compression formats such as DXT, BCn, etc.
http://wiki.polycount.com/DXT
http://msdn.microsof...531(VS.85).aspx


You suggest splitting the vertex buffer, setting one (x + z) only once, then passing a y-buffer for every chunk?

I do, in addition to compressing the X and Z values to 16 bits.



I only set one set of textures once per frame for the terrain at the moment.

The way you are setting them is not efficient.
3 calls instead of 1?
Why not:
ID3D11ShaderResourceView * psrvViews[] = {
	slopeTexture,
	rockTexture,
	grassTexture
};
d3d11DevCon->PSSetShaderResources( 0, 3, psrvViews );

And furthermore, if terrain is the only thing you are drawing, those textures will already be set, so setting them again and again is just wasting time.
You need to make wrappers for basically all of the DirectX 11 calls and check the last values you sent to them, and when matching, don’t call the DirectX 11 function.
Sorting by render states, textures, and shaders only has meaning if you are doing this.
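A redundant-state filter of this kind might look like the sketch below. To stay self-contained it does not include the Direct3D headers: `ResourceView` stands in for `ID3D11ShaderResourceView*`, and the `apply` callback stands in for the real `PSSetShaderResources` call. The cache has to be invalidated whenever the bindings change behind its back, for example after `ClearState`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

using ResourceView = const void*;  // stand-in for ID3D11ShaderResourceView*

// Remembers what is bound to each pixel-shader texture slot and forwards a
// bind call only when the requested views differ from the cached ones.
template <size_t Slots>
class ShaderResourceCache {
public:
    using Apply = std::function<void(size_t startSlot, size_t count, const ResourceView* views)>;
    explicit ShaderResourceCache(Apply apply) : apply_(std::move(apply)) {}

    void Set(size_t startSlot, size_t count, const ResourceView* views) {
        bool changed = false;
        for (size_t i = 0; i < count; ++i) {
            size_t s = startSlot + i;
            if (!known_[s] || bound_[s] != views[i]) {
                bound_[s] = views[i];
                known_[s] = true;
                changed = true;
            }
        }
        if (changed) apply_(startSlot, count, views);  // one call for the whole range
    }

    // Forget the cached state, e.g. after the context was cleared.
    void Invalidate() { known_.fill(false); }

private:
    Apply apply_;
    std::array<ResourceView, Slots> bound_{};
    std::array<bool, Slots> known_{};  // false = slot state unknown
};
```

With the terrain textures bound through such a wrapper, the per-frame PSSetShaderResources call becomes a no-op whenever nothing else has changed those slots.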


L. Spiro

Edited by L. Spiro, 28 July 2012 - 04:49 AM.




