Increasing terrain performance (loading + drawing)

Started by
32 comments, last by gnmgrl 11 years, 8 months ago
Hey guys,
I've got a pretty nice terrain engine by now.
I load a (big) heightmap and a shadow map.
Then I split it up into chunks, creating a chunk wherever there isn't one already (passing each its part of the two maps), and draw them (each with its own set of vertex buffers for the different LODs).
The chunks are stored in a vector (push_back(new chunk(...))), and every frame I check whether more chunks are loaded than the Max_chunks I want to keep; the extras are removed with vector.erase (the first one that isn't visible).
So, that's all running fine, and thanks to the LOD levels I get 100-150 fps when I don't move.
But loading/unloading and creating/destroying chunks seems to take too much time: 3-9 ms currently, and I have to load/unload about 100 or more every frame. That makes the program freeze a little when you move around, and I have to get rid of that.

One thing already suggested to me in this forum is memory pools: I grab a block of raw memory when the program starts, and then use placement new, new(myMemory) chunk(...), to allocate from it. But that turned out to be pretty awkward to implement because of the way I create the chunks, and I would need access to this memory inside each chunk to allocate the memory the LOD levels need (they delete[] it right after the vertex buffer is set anyway).

So, are there any other ways to improve the performance of new/delete, and how can I pass raw memory through a function properly?
Any ideas for improving the draw performance even further are welcome too.

Feel free to ask for code, if it's needed anywhere.
Thanks for your concern

--gnomgrol

http://imageshack.us/photo/my-images/215/terrainla.png/
Do you recreate the vertex buffers every frame also?
You should probably have a static number of vertex buffers always created and then just refill them with new data when a chunk is switched out and another takes its place. How big is the heightmap?
Perhaps you can store all the vertices at least in RAM all the time, and just update vertex buffers when changes occur.
If your heightmap isn't very very large then you can probably even store all the vertices in a vertex-buffer statically, and just use different index buffers to draw different LOD levels to improve performance. If your heightmap is so large that you run out of memory, then look into reusing the same memory for a new chunk instead of reallocating things.
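
A minimal sketch of the "one vertex buffer, several index buffers" idea, assuming a square chunk whose vertices are stored row by row. The function name buildLodIndices and its parameters are made up for illustration, and the triangle winding is arbitrary; flip it to match your cull mode. It does not handle cracks between neighbouring chunks drawn at different LODs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle-list index buffer for one LOD level of a square chunk.
// vertsPerSide: vertices along one edge (65 in this thread).
// lod: 0 = full detail; each further level doubles the step between used vertices.
// Requires (vertsPerSide - 1) to be divisible by 2^lod.
std::vector<std::uint32_t> buildLodIndices(std::uint32_t vertsPerSide, std::uint32_t lod)
{
    const std::uint32_t step = 1u << lod;
    assert((vertsPerSide - 1) % step == 0);
    const std::uint32_t quads = (vertsPerSide - 1) / step;

    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quads) * quads * 6);

    for (std::uint32_t qz = 0; qz < quads; ++qz) {
        for (std::uint32_t qx = 0; qx < quads; ++qx) {
            const std::uint32_t x0 = qx * step, x1 = x0 + step;
            const std::uint32_t z0 = qz * step, z1 = z0 + step;
            // Indices into the shared, full-resolution vertex grid.
            const std::uint32_t i00 = z0 * vertsPerSide + x0;
            const std::uint32_t i10 = z0 * vertsPerSide + x1;
            const std::uint32_t i01 = z1 * vertsPerSide + x0;
            const std::uint32_t i11 = z1 * vertsPerSide + x1;
            // Two triangles per quad.
            indices.insert(indices.end(), {i00, i01, i10, i10, i01, i11});
        }
    }
    return indices;
}
```

Every chunk of the same size can share these index buffers, so they only need to be built once at startup, one per LOD level.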

and i have to load/unload about 100 or more every frame

Every frame? You should reconsider the size of your tiles. It seems to me that your chosen size is too small compared to your view distance.


memorypools

You should use a cache, e.g. LRU.

Best to load and process the tiles in a separate thread, choose a decent tile size (not too small, not too large), and use some kind of cache.
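
For reference, an LRU (least recently used) cache can be sketched roughly like this. It assumes chunks are identified by integer grid coordinates; ChunkKey, makeKey and LruCache are hypothetical names, not something from the original poster's engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical key: packs the chunk's grid coordinates into one integer.
using ChunkKey = std::uint64_t;

inline ChunkKey makeKey(std::uint32_t cx, std::uint32_t cz)
{
    return (static_cast<ChunkKey>(cx) << 32) | cz;
}

template <typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Returns the cached value and marks it most recently used,
    // or nullptr on a miss.
    Value* get(ChunkKey key)
    {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    // Inserts or overwrites; evicts the least recently used entry when full.
    void put(ChunkKey key, Value value)
    {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<ChunkKey, Value>;
    std::size_t capacity_;
    std::list<Entry> order_;   // front = most recently used
    std::unordered_map<ChunkKey, typename std::list<Entry>::iterator> index_;
};
```

The point for terrain is that a chunk the camera just left stays cached for a while, so turning around does not trigger a reload.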
You are considering using faster memory allocators (etc.) to solve your problem, when the real issue is actually the concept of what you are doing itself.
Since the FPS is high when you are not moving, we can assume that your terrain system itself is overall fine.

When you move, it drops, and the only thing that happens when you move is that there is deallocation and allocation.
The solution is not to make deallocation and allocation faster, but to simply eliminate them from happening at all.

You should be reusing memory as much as possible for starters, but aside from that you should have a system in place that simply does not need that much memory reallocation or at least spread it out over a longer duration. Of course there are some worlds in which there is no possible way to keep everything necessary in memory, but when memory operations are needed they are put on hold until a certain major event happens, not done every frame.

This issue is not about efficiency, it is about planning. You need to reconceptualize what you are doing so that memory allocations are as little a part of your plans as possible.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

@Erik Rufelt
Yes, I'm creating a new one for each chunk. The heightmap is 4096x4096 right now, but I would like to go bigger.
Refilling seems to be a good idea, I'll see what I can do with it.

@ Ashaman73
The chunk size is 65x65 vertices; if I go larger, the freezes get longer. I'll look into the cache stuff.
Multithreading appeared to be difficult, because I can't use non-static members in a thread function, which would be necessary because terrain is its own class, as is chunk.


@ Spiro
So basically, what you are suggesting is to rework everything so that I don't need to 'new' and 'delete' all the time, with reusing memory as a starting point.
Since the terrain class controls and holds the chunks, what I could do is allocate a 'chunkVerticesNum * maxNumChunks' array at start and then pass a part of it to each chunk to fill. That should work out fine, I'll try as soon as possible.


Why is it that memory de/allocs are taking so long anyway?

Thanks for the quick reply!
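
On the "non-static members in a thread function" problem mentioned in this post: assuming a C++11 compiler, std::thread can run a non-static member function directly if it is given the object as well. A rough sketch, where the Terrain class and its loadLoop body are placeholders and not the poster's actual code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class Terrain {
public:
    // std::thread accepts a pointer to member function followed by the
    // object to call it on, so no static function is needed.
    void startLoading()
    {
        assert(!worker_.joinable());   // only one worker in this sketch
        worker_ = std::thread(&Terrain::loadLoop, this);
    }

    // Waits for the worker to finish.
    void finish()
    {
        if (worker_.joinable()) worker_.join();
    }

    ~Terrain() { finish(); }

    int chunksLoaded() const { return chunksLoaded_.load(); }

private:
    // Placeholder for the real work: it only counts to a fixed number.
    void loadLoop()
    {
        for (int i = 0; i < 9; ++i) {
            ++chunksLoaded_;   // stand-in for "read heightmap tile, build vertices"
        }
    }

    std::thread worker_;
    std::atomic<int> chunksLoaded_{0};
};
```

With older thread APIs that only take a plain function and a void* argument, the usual workaround is a static function that casts the argument back to the object and calls the member on it.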
65*65 is still far far too small for the loaded data.

I worked on an open world game and our 'chunk' sizes were much bigger than that.
The system had a 3x3 high-resolution chunk grid around the player, meaning we had 9 high-resolution chunks loaded at any given time (as well as a number of lower-resolution chunks beyond that). The view distance was 330 meters, and we started loading a new chunk when its bounding box was within view distance + 20% of the player, to give us some time to stream in the world.
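
The "view distance + 20%" trigger could look something like this. It is only a sketch: Box2D, distanceToBox and shouldStreamIn are invented names, and it measures distance in the horizontal plane only, which is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical chunk footprint on the ground plane.
struct Box2D { float minX, minZ, maxX, maxZ; };

// Distance from a point to the nearest point of the box (0 if inside).
float distanceToBox(const Box2D& b, float px, float pz)
{
    const float dx = std::max({b.minX - px, 0.0f, px - b.maxX});
    const float dz = std::max({b.minZ - pz, 0.0f, pz - b.maxZ});
    return std::sqrt(dx * dx + dz * dz);
}

// True once the chunk is close enough that loading should begin.
// margin = 0.2f gives the "view distance + 20%" rule from the post.
bool shouldStreamIn(const Box2D& chunk, float px, float pz,
                    float viewDistance, float margin = 0.2f)
{
    return distanceToBox(chunk, px, pz) <= viewDistance * (1.0f + margin);
}
```

With a 330 meter view distance the trigger radius is about 396 meters, so the chunk has roughly 66 meters of player travel in which to finish loading before it becomes visible.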

We also pre-allocated the chunk buffers, so we could maintain a freelist of them and just load directly into one.
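
A pre-allocated buffer pool with a freelist might be sketched as follows. This is one possible shape, not the system described in the post: Vertex, ChunkBufferPool and the slot layout are assumptions. All memory is allocated once, and a chunk receives a pointer to its slice plus the slice length instead of calling new.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };   // hypothetical vertex layout

class ChunkBufferPool {
public:
    ChunkBufferPool(std::size_t vertsPerChunk, std::size_t maxChunks)
        : vertsPerChunk_(vertsPerChunk),
          storage_(vertsPerChunk * maxChunks)   // the only allocation
    {
        freeSlots_.reserve(maxChunks);
        for (std::size_t i = maxChunks; i > 0; --i) freeSlots_.push_back(i - 1);
    }

    // Returns a slice of vertsPerChunk() vertices, or nullptr if the pool is exhausted.
    Vertex* acquire()
    {
        if (freeSlots_.empty()) return nullptr;
        const std::size_t slot = freeSlots_.back();
        freeSlots_.pop_back();
        return storage_.data() + slot * vertsPerChunk_;
    }

    // Hands a slice back to the freelist. No memory is freed.
    void release(Vertex* slice)
    {
        const std::ptrdiff_t offset = slice - storage_.data();
        assert(offset >= 0 && static_cast<std::size_t>(offset) % vertsPerChunk_ == 0);
        freeSlots_.push_back(static_cast<std::size_t>(offset) / vertsPerChunk_);
    }

    std::size_t vertsPerChunk() const { return vertsPerChunk_; }
    std::size_t freeCount() const { return freeSlots_.size(); }

private:
    std::size_t vertsPerChunk_;
    std::vector<Vertex> storage_;
    std::vector<std::size_t> freeSlots_;   // indices of unused slots
};
```

A chunk would then take the Vertex* and vertsPerChunk() as arguments to its fill function, which is also one answer to the earlier question of how to pass raw memory through a function.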

So trying to load and drop 64*64 'chunks' is just too much work - you also won't be able to stream without some form of multi-threading otherwise you'll just end up stalling the thread while you copy memory about.
So, I managed to set everything up as you described above. It helped a little, but it's still pretty laggy. I'm pretty sure that's because I put all chunks in a vector using push_back(new chunk(...)), and erase the old ones using chunkList.erase(chunkList.begin()). I tried some other things, like maps and deques, but vector was the only one that really worked; the rest of them brought it down to 5 fps.

So I need some better stuff to manage the chunks, any suggestions?
I would say you probably have to work on threading. In my C# engine I added an update worker thread for chunk updating, together with double buffering. In your case I suppose you would make an array of two vectors, so the update thread can update vector1 while the main thread draws from vector2. After updating, you switch them, so it continues updating vector2 while drawing from vector1...
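
In C++ the double-buffering scheme described above could be sketched like this, assuming C++11 threads. ChunkData and DoubleBufferedChunks are hypothetical names, and a real version would hold more than an id per chunk.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct ChunkData { int id; };   // hypothetical stand-in for a loaded chunk

class DoubleBufferedChunks {
public:
    // Update thread: hands over a freshly built list.
    void publish(std::vector<ChunkData> next)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        back_ = std::move(next);
        dirty_ = true;
    }

    // Render thread only: swaps in the newest list if there is one and
    // returns the list to draw. The reference stays valid until the next
    // call to acquireFront(), because publish() never touches front_.
    const std::vector<ChunkData>& acquireFront()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (dirty_) {
            front_.swap(back_);
            dirty_ = false;
        }
        return front_;
    }

private:
    std::mutex mutex_;
    std::vector<ChunkData> front_;   // read by the render thread
    std::vector<ChunkData> back_;    // written by the update thread
    bool dirty_ = false;
};
```

The lock is held only for a move or a swap, so neither thread waits on the other's actual work.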
Are you putting things in a vector using push_back at runtime? And also clearing the vector at runtime? And has this vector got any memory reserved or does it need to reallocate all the time? This is definitely not an optimal use case, but at the same time it should not be giving the kind of trouble you're experiencing. 100 objects per frame is nothing - or at least it should be nothing. Have you a constructor for the objects you're creating, and - if so - what's going on in the constructor?

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Yes, I'm using push_back and erase at runtime, as well as allocating at runtime. I'm not calling delete; someone said erase does this for me.
The constructor takes a few things like the size, the d3d11Device, etc. And I call their init(...) function, which mainly creates their vertex buffer.
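
A note on "erase does this for me": for a vector of raw pointers, erase removes only the pointer and never calls delete on the object it points to. The sketch below demonstrates the difference with a live-instance counter; Chunk here is a stand-in, not the poster's class.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Chunk {
    static int alive;      // counts live instances, for demonstration only
    Chunk() { ++alive; }
    ~Chunk() { --alive; }
};
int Chunk::alive = 0;

// Raw pointers: erase() removes the pointer, but the Chunk itself stays allocated.
int eraseRawPointer()
{
    std::vector<Chunk*> chunks;
    chunks.push_back(new Chunk());
    Chunk* orphan = chunks.front();
    chunks.erase(chunks.begin());
    const int stillAlive = Chunk::alive;   // the destructor has not run
    delete orphan;                         // clean up so the demo itself does not leak
    return stillAlive;
}

// unique_ptr: erase() destroys the unique_ptr, which deletes the Chunk.
int eraseUniquePointer()
{
    std::vector<std::unique_ptr<Chunk>> chunks;
    chunks.reserve(16);    // reserve up front to avoid reallocation on push_back
    chunks.push_back(std::unique_ptr<Chunk>(new Chunk()));
    chunks.erase(chunks.begin());
    return Chunk::alive;
}
```

If each chunk also owns a D3D11 vertex buffer, a leaked chunk presumably leaks that buffer as well, since its destructor (where Release() would normally be called) never runs. That part is a guess about code not shown in the thread.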

This topic is closed to new replies.
