Hi there!
I'm currently replacing an old game's D3D7 renderer with a more modern D3D11 one (it's Gothic 2, in case you know the game!). The original does a lot of CPU culling, which keeps it at around 40fps most of the time in the more demanding main areas, even on a modern system. That made sense back then, because graphics cards weren't very powerful.
Well, they have caught up so much that it's now faster to not cull anything at all.
However, now comes the problem:
I have no idea how to render these objects quickly. Typically about 1000 objects, each with ~3 submeshes, are on screen, out of about 17000 objects in the whole world.
To collect them, I traverse a BSP tree and pack them into a list, which I then sort by texture and vertex buffer.
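One common way to sort by texture and vertex buffer is to pack both into a single integer sort key, so one `std::sort` orders the draws by texture first and vertex buffer second. A minimal sketch, assuming every texture and vertex buffer has been assigned a small integer ID at load time (the `RenderItem` struct and its fields are hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-submesh draw record; the IDs are assumed to be small
// integers assigned when textures and vertex buffers are loaded.
struct RenderItem {
    std::uint32_t textureId;
    std::uint32_t vertexBufferId;
    const void* object;  // pointer back to the game object (not used for sorting)
};

// Texture ID in the high 32 bits, vertex buffer ID in the low 32 bits:
// ordering by this key groups by texture first, then by vertex buffer.
inline std::uint64_t SortKey(const RenderItem& item) {
    return (static_cast<std::uint64_t>(item.textureId) << 32) | item.vertexBufferId;
}

// Sorts the list in place so that state changes between draws are minimized.
void SortForBatching(std::vector<RenderItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const RenderItem& a, const RenderItem& b) {
                  return SortKey(a) < SortKey(b);
              });
}
```

Comparing a single 64-bit key is cheaper than a chain of field comparisons, and more criteria (e.g. shader or blend state) can be packed into the same key if the IDs fit.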
The drawing itself doesn't take much time on my GTX 970, but I'm already down to 80fps (from ~300) even with the actual draw call disabled.
Debugging and profiling showed that part of the problem seems to be the lists I fill with pointers to the objects.
Let's say I have 1000 objects on screen. I'm on 32-bit, so a pointer is 4 bytes, which adds up to about 4 KB that I push into my list every frame.
This list is also copied during the texture sort, so I have 8 KB lying around in lists in my render function, all of which has to be freed at the end. That doesn't sound good for real-time rendering, does it?
I then made the lists static vectors, whose capacity isn't reduced between frames, so no cleanup is needed. However, I can't do that for the sorting part, so that's still 4 KB per frame.
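For what it's worth, `std::vector::clear()` destroys the elements but leaves the capacity unchanged, and `std::sort` reorders a range in place, so neither the collection list nor the sort should need a per-frame allocation or a second list once the vector has grown to its peak size (`std::stable_sort`, by contrast, may allocate a temporary buffer). A minimal sketch, assuming the visible set is gathered into one persistent vector; the `GameObject` and `FrameLists` types and the collect callback are hypothetical stand-ins for the real objects and the BSP walk:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical game object; only the sort criterion is shown.
struct GameObject {
    unsigned textureId;
};

class FrameLists {
public:
    // Called once per frame. clear() keeps the capacity from earlier frames,
    // so push_back only reallocates when a frame sees more objects than any
    // previous frame did.
    template <typename CollectFn>
    const std::vector<GameObject*>& Build(CollectFn collectVisible) {
        visible_.clear();
        collectVisible(visible_);  // e.g. the BSP traversal appends pointers here
        // std::sort permutes the existing elements in place; no copy of the list is made.
        std::sort(visible_.begin(), visible_.end(),
                  [](const GameObject* a, const GameObject* b) {
                      return a->textureId < b->textureId;
                  });
        return visible_;
    }

    std::size_t Capacity() const { return visible_.capacity(); }

private:
    std::vector<GameObject*> visible_;  // persists across frames
};
```

Keeping the vector as a member of the renderer (rather than a function-local `static`) gives the same reuse while keeping its lifetime tied to the renderer.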
All that aside, what are the best options in a case like this? The objects aren't all unique; I currently have about 500 meshes loaded, so I could use instancing (D3D11) for the static objects. But how would I handle frustum culling then, without updating all the constant buffers every frame?
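One possible approach, sketched under assumptions rather than as a definitive answer: keep the per-instance data of the static objects on the CPU, test each instance's bounding sphere against the six frustum planes every frame, and write only the visible instances, grouped by mesh, into one contiguous array. That array could then be uploaded to a single dynamic instance buffer with one `ID3D11DeviceContext::Map` call using `D3D11_MAP_WRITE_DISCARD`, and each mesh drawn with `DrawIndexedInstanced`, passing its group's offset as `StartInstanceLocation`. That replaces per-object constant buffer updates with one buffer write per frame. The types below are hypothetical, and the planes are assumed to be normalized with normals pointing into the frustum:

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct Plane { float nx, ny, nz, d; };      // inside: nx*x + ny*y + nz*z + d >= 0
struct Sphere { float x, y, z, radius; };

// Hypothetical static instance: the mesh it uses, its world-space bounds,
// and the per-instance data the vertex shader would read (a world matrix).
struct StaticInstance {
    std::uint32_t meshId;
    Sphere bounds;
    std::array<float, 16> world;
};

// Visible instances of one mesh occupy [startInstance, startInstance + instanceCount).
struct DrawRange { std::uint32_t meshId, startInstance, instanceCount; };

// A sphere is culled if it lies entirely behind any single plane.
inline bool SphereInFrustum(const Sphere& s, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        if (p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d < -s.radius) return false;
    }
    return true;
}

// Culls all instances and appends the visible ones, grouped by mesh, to 'out'.
// Assumes 'instances' was sorted by meshId once at load time, so this is a
// single linear pass with no per-frame sort. Both output vectors are reused.
void CullAndBatch(const std::vector<StaticInstance>& instances,
                  const std::array<Plane, 6>& planes,
                  std::vector<std::array<float, 16>>& out,
                  std::vector<DrawRange>& ranges) {
    out.clear();
    ranges.clear();
    for (const StaticInstance& inst : instances) {
        if (!SphereInFrustum(inst.bounds, planes)) continue;
        if (ranges.empty() || ranges.back().meshId != inst.meshId) {
            ranges.push_back({inst.meshId, static_cast<std::uint32_t>(out.size()), 0});
        }
        out.push_back(inst.world);
        ++ranges.back().instanceCount;
    }
}
```

Each `DrawRange` then maps to one instanced draw call, so the number of draw calls scales with the number of visible meshes rather than with the number of visible objects.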
Any help would be appreciated!