Render a huge number of objects

28 comments, last by poigwym 7 years, 8 months ago

Hello!

On a modern GPU with a modern graphics API (DX12, Vulkan), how many objects can be drawn at most while holding 60 fps, with one light?

My scene with 100 boxes and a directional light runs at 15 fps. Is that normal?

I had a look at the Horde3D engine. It seems to draw 100 crowded animated models without instancing and still runs smoothly, I'd guess faster than 60 fps. How does it do that?

I'd appreciate tutorials or links about rendering very large scenes.

Sounds like something is wrong, as that is a very low frame rate for such a simple scene.

Have you run it through a profiler yet?

Do that first and let us know your results :)

What are you using to render (API, libraries, etc.)? Are you using any form of instancing? Posting your render code might allow people to spot some simple mistakes.

Interested in Fractals? Check out my App, Fractal Scout, free on the Google Play store.

Same here, show some details and we'll take a peek. Assuming the boxes are made up of 8 vertices and 12 triangles, I agree that it's not OK.

It also helps to know which hardware/GPU you're running it on (just to be sure).

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

What cozzie said. Not all 66 ms of work are born equal. 66 ms of work on a GTX 980 isn't the same as 66 ms of work on an Intel HD 2000.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

I have a profiler in my engine, and it shows that matrix multiplication and committing cbuffers take the most time of anything.

With all lights turned off, the process is simple: for every object, update the transform cbuffer, commit it, and draw.

In my engine, all shaders share 10 cbuffers (5 for the VS, 5 for the PS).

One of the cbuffers looks like this:
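struct CBTransform /*: register(b0)*/
{
    Matrix4f world_matrix;
    Matrix4f world_invTrans_matrix;   // inverse-transpose of world, used for normals
    Matrix4f world_view_proj_matrix;
    Matrix4f light_view_proj_matrix;
};

void setTransform(Renderable* node)
{
    // Update the transform cbuffer. WRITE_DISCARD hands back fresh memory with undefined
    // contents, so every member must be written on every map.
    CBTransform* p = reinterpret_cast<CBTransform*>(renderbase->MapResource(_cbtrans, 0, D3D11_MAP_WRITE_DISCARD, 0));
    if (!p)
        return;

    Matrix4f world;
    if (node->_hasBone)
        world.initIdentity();   // skinned: presumably the bone matrices carry the transform
    else
        world = node->getTransform();

    p->world_matrix = world;
    // Was "= world", which only matches when world has no non-uniform scale.
    // inverse()/transpose() are assumed Matrix4f methods; substitute your math library's.
    p->world_invTrans_matrix = world.inverse().transpose();

    // view * proj is the same for every object and could be computed once per frame.
    Camera* camera = Engine::sceneManager().getMainCamera();
    Matrix4f viewProj = camera->getView() * camera->getProjection();
    p->world_view_proj_matrix = world * viewProj;

    if (_curlight) {
        p->light_view_proj_matrix = world * _curlight->getLightTransform(); // world * light viewproj
    } else {
        Matrix4f identity;
        identity.initIdentity();
        p->light_view_proj_matrix = identity;   // don't leave discarded memory uninitialised
    }

    renderbase->unMapResource(_cbtrans, 0);
}

// Since all shaders share 10 cbuffers, I bind all 10 before every draw call. I'm not sure if this method is right??
_context->VSSetConstantBuffers(0, (int)CBufType::MAX_CBUF_GROUP, _cBufs[(int)ShaderType::VERTEX_SHADER]);
_context->PSSetConstantBuffers(0, (int)CBufType::MAX_CBUF_GROUP, _cBufs[(int)ShaderType::PIXEL_SHADER]);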

Hehe, I forgot to say: the 100 boxes I draw all look the same and use the same vertex buffer, but I don't use instancing.

So I update and commit the cbuffer 100 times per frame.

Is it possible to draw 100 dynamic boxes, each with its own vertex buffer and texture, without instancing, at 60 fps on a modern GPU? My CPU and GPU are a little old.

Put some timing code into the hot spots you've found (setTransform, etc.) and find out exactly how many microseconds per frame you spend in that logic.

If you are on a desktop, you can try an old Flash demo I made (linked below) to test what your GPU can handle.

It's in Flash, by the way, so native code should be able to beat what you see there (and it isn't heavily optimised either).

There is no instancing and each object has a unique transform; the only thing constant between draws is the material.

Lower-end GPUs should manage 500-1,000 objects without problems, mid-range 1,500-3,000, and high-end can hit 8,000+.

http://blog.bwhiting.co.uk/?p=314

Not sure if I understood you correctly, but if you're using 10 CBuffers to render 100 boxes with some (forward) lighting, you could make do with 2 constant buffers instead of 10:

1. a CB per frame, containing e.g. the viewProjection matrix and your light properties (for multiple lights)

2. a CB per object, which you update for each object before its draw call (i.e. after the previous one has been drawn)

Each has a corresponding C++ struct in your code.

If you're using 10 different CBuffers, that might explain part of the unexpected performance.
