Rendering a lot of different geometry

5 comments, last by L. Spiro 9 years, 2 months ago

Hi there!

I'm currently working on replacing an old game's D3D7 renderer with a more modern D3D11 one (it's Gothic 2, in case you know that game!). The game does a lot of CPU-side culling, because graphics cards weren't so strong back then, and that limits it to about 40 fps in the more demanding main areas, even on a modern system.

Well, they caught up so much that it is faster now to just not cull anything at all.

However, now comes the problem:

I have no clue how to render those objects quickly. I typically have about 1000 objects on screen, each with ~3 submeshes, out of about 17000 objects in the whole world.

To gather them, I have to traverse a BSP tree and pack them into a list, which I can then sort by texture and vertex buffer.
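A minimal sketch of that gather-and-sort step, assuming hypothetical `RenderObject` and `BspNode` types with integer texture and vertex-buffer ids (the game's real structures are not shown in the thread). The walk appends pointers from every visible node, then one in-place sort groups objects by texture and then by vertex buffer.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the game's types; the real engine's structures differ.
struct RenderObject {
    std::uint32_t textureId;       // texture used by the object's submesh
    std::uint32_t vertexBufferId;  // vertex buffer the geometry lives in
};

struct BspNode {
    const BspNode* front = nullptr;            // children; both null for a leaf
    const BspNode* back = nullptr;
    bool visible = true;                       // result of a per-node visibility test
    std::vector<const RenderObject*> objects;  // objects stored in this node
};

// Recursively walks the tree and appends the objects of every visible node.
void collectVisible(const BspNode* node, std::vector<const RenderObject*>& out) {
    if (node == nullptr || !node->visible) return;  // skips the whole subtree
    out.insert(out.end(), node->objects.begin(), node->objects.end());
    collectVisible(node->front, out);
    collectVisible(node->back, out);
}

// Sorts in place so objects sharing a texture, then a vertex buffer, are adjacent,
// which reduces state changes when draw calls are issued in list order.
void sortByState(std::vector<const RenderObject*>& list) {
    std::sort(list.begin(), list.end(), [](const RenderObject* a, const RenderObject* b) {
        if (a->textureId != b->textureId) return a->textureId < b->textureId;
        return a->vertexBufferId < b->vertexBufferId;
    });
}
```

Because `out` is passed by reference, the caller can keep one list alive across frames instead of building a new one each time.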

The drawing itself doesn't take much time on my GTX 970, but I'm already down to 80 fps (from ~300) even with the actual draw calls disabled.

Debugging and profiling suggest that one part of the problem is the lists I'm filling with pointers to the objects.

Let's say I have 1000 objects on screen. I'm on 32-bit, so a pointer is 4 bytes, which makes a total of about 4 KB that I stuff into my list every frame.

This list also gets copied during the sort by texture, so I have 8 KB of data lying around in lists in my render function, which needs to be cleaned up at the end. That doesn't sound too good for real-time rendering, right?

I then made the lists static vectors, which don't shrink in capacity between frames, so no cleanup is needed. However, I can't do that for the sorting part, so that's still 4 KB.

Everything aside, what are the best options in such a case? The objects aren't all different; in fact, I have about 500 meshes loaded right now, so I could use instancing for the static objects (with D3D11). But how do I handle frustum culling then, without updating all the constant buffers every frame?
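One common way to combine instancing with CPU culling is to cull first and then pack only the visible instances, grouped by mesh, into a single per-frame array. That array is uploaded once (for example into a dynamic buffer mapped with `D3D11_MAP_WRITE_DISCARD`, bound as a per-instance vertex buffer), and each mesh is drawn with one `DrawIndexedInstanced` call whose `StartInstanceLocation` is the batch offset. The sketch below covers only the CPU-side packing; `StaticObject`, `InstanceData` and `InstancedBatch` are hypothetical types, and the D3D11 calls are only mentioned in comments.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-instance data; a real renderer would likely upload a 4x4 world matrix.
struct InstanceData { float world[16]; };

struct StaticObject {
    std::uint32_t meshId;   // index into the loaded meshes
    InstanceData instance;  // this object's transform
    bool visible;           // result of this frame's CPU frustum/distance cull
};

// One instanced draw: `count` instances of `meshId`, starting at `firstInstance`
// in the shared per-frame instance array.
struct InstancedBatch {
    std::uint32_t meshId;
    std::uint32_t firstInstance;
    std::uint32_t count;
};

class InstanceBatcher {
public:
    void build(const std::vector<StaticObject>& objects, std::uint32_t meshCount) {
        // Count visible instances per mesh. The member vectors persist across frames,
        // so in typical implementations this stops reallocating once they are big enough.
        counts_.assign(meshCount, 0);
        for (const StaticObject& o : objects)
            if (o.visible) ++counts_[o.meshId];

        // Exclusive prefix sum: each mesh gets a contiguous range in the instance array.
        cursor_.assign(meshCount, 0);
        batches_.clear();
        std::uint32_t offset = 0;
        for (std::uint32_t m = 0; m < meshCount; ++m) {
            cursor_[m] = offset;
            if (counts_[m] > 0) batches_.push_back({m, offset, counts_[m]});
            offset += counts_[m];
        }

        // Scatter each visible object's instance data into its mesh's range.
        instances_.resize(offset);
        for (const StaticObject& o : objects)
            if (o.visible) instances_[cursor_[o.meshId]++] = o.instance;
    }

    // Uploaded once per frame; each batch then maps to one DrawIndexedInstanced call.
    const std::vector<InstanceData>& instances() const { return instances_; }
    const std::vector<InstancedBatch>& batches() const { return batches_; }

private:
    std::vector<std::uint32_t> counts_, cursor_;
    std::vector<InstanceData> instances_;
    std::vector<InstancedBatch> batches_;
};
```

With this layout, culling only changes what goes into the one instance buffer; no per-object constant buffer has to be updated.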

Any help would be appreciated!

There is a lot being asked here, so I'll try to take it point by point.

As far as rendering a lot of objects fast goes, this problem has a lot of solutions. Honestly, you could probably brute-force that many objects with some basic culling and be okay, depending on the types of shaders you're running. But if you're looking to eliminate draw calls, batching your meshes can help: you generate a single mesh from a bunch of submeshes that share textures, render states, etc.
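A minimal sketch of that kind of static batching, assuming hypothetical `SubMesh` records whose vertices are already in world space and whose texture and render state are summarized by one `materialId`. Submeshes sharing a material are concatenated into one vertex/index list, with each submesh's indices shifted past the vertices already in the batch.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Vertex { float x, y, z, u, v; };

// Hypothetical static submesh; vertices are assumed to be in world space already.
struct SubMesh {
    std::uint32_t materialId;            // texture + render-state combination
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // indices into `vertices`
};

struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Merges all submeshes that share a material into one vertex/index list per material,
// so each material can later be drawn with a single indexed draw call.
std::map<std::uint32_t, Batch> buildBatches(const std::vector<SubMesh>& subMeshes) {
    std::map<std::uint32_t, Batch> batches;
    for (const SubMesh& sm : subMeshes) {
        Batch& b = batches[sm.materialId];
        // This submesh's indices must point past the vertices already in the batch.
        const auto base = static_cast<std::uint32_t>(b.vertices.size());
        b.vertices.insert(b.vertices.end(), sm.vertices.begin(), sm.vertices.end());
        for (std::uint32_t i : sm.indices) b.indices.push_back(base + i);
    }
    return batches;
}
```

This suits geometry that never moves; it would typically be done once at load time, not per frame.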

As for culling, most titles I know of use CPU frustum culling when they do their initial gather of objects to render.
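A typical CPU frustum test checks each object's bounding sphere against the six frustum planes. This sketch assumes the planes are already extracted, with unit-length normals pointing into the frustum; `Plane` and `Sphere` are illustrative types, not taken from the thread.

```cpp
#include <array>

// Points p with nx*px + ny*py + nz*pz + d >= 0 are on the inner side of the plane.
struct Plane { float nx, ny, nz, d; };
struct Sphere { float x, y, z, radius; };  // bounding sphere of an object

// Returns false only if the sphere lies entirely outside at least one plane.
// The test is conservative: some spheres near frustum corners pass even though
// they are outside, which is acceptable for culling.
bool sphereInFrustum(const std::array<Plane, 6>& planes, const Sphere& s) {
    for (const Plane& p : planes) {
        // Signed distance from the sphere center to the plane (unit normals assumed).
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}
```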

So you walk your BSP to find all applicable nodes, frustum cull the objects in those nodes, generate your lists for transparent and opaque geometry, sort them as necessary, batch, and then submit the draw calls.
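The list-generation and sorting step can be sketched as follows, assuming a hypothetical `DrawItem` that already carries a packed state key and a view-space depth. Sorting opaque items by state key and transparent items back to front is a common convention, not something stated in the thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw item produced after frustum culling.
struct DrawItem {
    std::uint64_t stateKey;  // e.g. packed shader/texture/vertex-buffer ids
    float viewDepth;         // distance from the camera along the view direction
    bool transparent;
};

// Splits visible items into opaque and transparent lists, then sorts opaque items by
// state key (fewer state changes) and transparent items back to front (so alpha
// blending composites correctly). The output vectors are cleared and reused.
void buildDrawLists(const std::vector<DrawItem>& visible,
                    std::vector<DrawItem>& opaque,
                    std::vector<DrawItem>& transparent) {
    opaque.clear();
    transparent.clear();
    for (const DrawItem& item : visible)
        (item.transparent ? transparent : opaque).push_back(item);

    std::sort(opaque.begin(), opaque.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.stateKey < b.stateKey; });
    std::sort(transparent.begin(), transparent.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.viewDepth > b.viewDepth; });
}
```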

Also, I've seen per-frame lists much larger than 4 KB work fine, so that probably won't kill you, as long as you're not copying the list too much.

Perception is when one imagination clashes with another

8 KB per frame is only 480 KB/s at 60 fps, and a modern CPU can handle about 25 GB/s of memory bandwidth.
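The 480 KB/s figure corresponds to 60 fps. Even at the ~300 fps the renderer reached before, the list traffic stays a tiny fraction of the quoted bandwidth (decimal units):

$$8\ \text{KB/frame} \times 60\ \text{frames/s} = 480\ \text{KB/s}$$

$$8\ \text{KB/frame} \times 300\ \text{frames/s} = 2.4\ \text{MB/s} \approx 0.01\%\ \text{of}\ 25\ \text{GB/s}$$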

Thanks for the replies. I'm already doing frustum and distance culling, which is quite fast. I'm also not trying to eliminate draw calls, because performance is poor even with the draw calls disabled, so the cause must be something else.
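Distance culling is usually done with squared distances so no square root is needed per object. A minimal sketch; `Vec3` and the `maxDrawDistance` parameter are hypothetical, not values from the game.

```cpp
// Simple 3-component vector for the example.
struct Vec3 { float x, y, z; };

// True if the object's center is within maxDrawDistance of the camera.
// Comparing squared lengths gives the same answer as comparing lengths
// for non-negative distances, without calling sqrt.
bool withinDrawDistance(const Vec3& camera, const Vec3& objectCenter, float maxDrawDistance) {
    const float dx = objectCenter.x - camera.x;
    const float dy = objectCenter.y - camera.y;
    const float dz = objectCenter.z - camera.z;
    return dx * dx + dy * dy + dz * dz <= maxDrawDistance * maxDrawDistance;
}
```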

Regarding the list sizes: I'm just asking because that seemed like a lot for something that should run as fast as possible, and I'm almost out of ideas.

Another thing is that the game I'm hooking uses the SmartHeap library, which happily hooks into my injected DLL as well and takes over its memory allocations. I don't know how fast that 15-year-old library is, but I caught it spending a lot of time freeing what I think were my lists.

I'm going to try to batch what I can using instancing now, which hopefully doesn't suffer from these problems.

My general advice is that you should avoid premature optimization. Run the game until it becomes an issue, then identify the bottleneck and optimize from there.
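For a first pass at finding the bottleneck, a scoped CPU timer around suspect sections (the BSP walk, list building, sorting) can complement a real profiler. A minimal sketch using `std::chrono`; the class name and output format are arbitrary choices for illustration.

```cpp
#include <chrono>
#include <cstdio>

// RAII timer: prints how long the enclosing scope took when it is destroyed.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() { std::printf("%s: %.3f ms\n", label_, elapsedMs()); }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    // Milliseconds since construction, measured with a monotonic clock.
    double elapsedMs() const {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        return std::chrono::duration<double, std::milli>(elapsed).count();
    }

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage: `{ ScopedTimer t("gather"); /* code to measure */ }` prints the time for that block when the scope ends.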

It actually became a problem when the game ran slower on my weaker laptop than it did with the original renderer.

But I just got a decent performance boost by simply using straightforward instancing! I'm happy with the performance as it is now. Thanks for your time :)

Let's say I have 1000 Objects on screen. I am on 32-Bit, so a pointer is 4 bytes, which makes for a total of about 4 KB I stuff into my list every frame.
This list also gets copied in the process to sort for texture, so I have 8 KB of stuff lying around in lists in my render function, which need to get cleaned up in the end. That doesn't sound too good for realtime rendering, right?

#1: Why is the list duplicated? Pass a reference/pointer to the list.
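A minimal sketch of point #1, with a hypothetical `Mesh` type: taking the list by non-const reference sorts the caller's vector in place, so no second list is created. `std::sort` works in place; `std::stable_sort`, by contrast, may allocate a temporary buffer.

```cpp
#include <algorithm>
#include <vector>

struct Mesh { int textureId; };  // hypothetical stand-in for the game's object type

// Sorts the caller's list in place: no copy of the vector is made,
// so nothing extra is allocated or freed per frame by this step.
void sortByTexture(std::vector<const Mesh*>& list) {
    std::sort(list.begin(), list.end(),
              [](const Mesh* a, const Mesh* b) { return a->textureId < b->textureId; });
}
```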

which need to get cleaned up in the end. That doesn't sound too good for realtime rendering, right?

#2: No. The list needs to be reset back to 0 items, not “cleaned up” (no allocations should ever take place during the per-frame filling and emptying of the list).
The first 1 or 2 frames might stutter as a few allocations are made, but that memory should never be released until the game shuts down or the stage is reset, etc. std::vector<>::clear() does this.
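A minimal sketch of point #2, assuming a hypothetical `RenderQueue` that lives as long as the renderer rather than the frame. `clear()` removes the elements but leaves `capacity()` unchanged, so once the first frames have grown the buffer, refilling it to a similar size allocates nothing.

```cpp
#include <cstddef>
#include <vector>

struct Object;  // hypothetical; only pointers are stored, so it can stay incomplete

// Per-frame list owned by the renderer, so its memory survives between frames.
class RenderQueue {
public:
    // Called at the start of each frame: drops the items, keeps the capacity.
    void beginFrame() { items_.clear(); }
    void push(const Object* obj) { items_.push_back(obj); }
    std::size_t size() const { return items_.size(); }
    std::size_t capacity() const { return items_.capacity(); }

private:
    std::vector<const Object*> items_;
};
```

If the typical object count is known up front, calling `reserve()` on the vector once at startup also avoids the allocations in those first frames.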


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid
