# Vulkan: rendering a huge number of objects


If you are on a desktop you can see an old Flash demo I did here to test what your GPU can handle.

This is in Flash, by the way, so natively you should be able to beat what you see here (it is not massively optimised either).

There is no instancing and each object has a unique transform; the only thing constant between draws is the material.

Lower-end GPUs should manage 500-1,000 objects with no problems, mid-range 1,500-3,000, and high-end can hit 8,000+.

http://blog.bwhiting.co.uk/?p=314

My computer runs your demo at 30fps with 1000+ total render objects. Can you show the code you use to update the transform 1000+ times?

##### Share on other sites

Not sure if I understood you correctly, but if you're using 10 CBuffers to render 10 boxes with some (forward) lighting, you could make do with 2 constant buffers (not 10):

1. a CB per frame, containing for example the viewProjection matrix and your light properties (for multiple lights)

2. a CB per object, which you update for each object, after the previous one is drawn

Both having a corresponding C++ struct in your code.

If you're using 10 different CBuffers, that might explain a part of the unexpected performance.
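
A minimal sketch of what those two C++ structs might look like, assuming HLSL-style packing in 16-byte chunks. Every name here (`PerFrameCB`, `PerObjectCB`, `Light`, `kMaxLights`, `setLightCount`, `fillPerObject`) is hypothetical and chosen for illustration; none of it is taken from the poster's code.

```cpp
#include <cassert>
#include <cstring>

constexpr int kMaxLights = 4;  // hypothetical upper bound on lights

struct Light {
    float position[4];  // xyz + one float of padding
    float color[4];     // rgb + intensity
};

// Updated once per frame.
struct PerFrameCB {
    float viewProjection[16];
    Light lights[kMaxLights];
    int   lightCount;
    int   pad[3];  // keeps the struct size a multiple of 16 bytes
};

// Updated once per object, between draw calls.
struct PerObjectCB {
    float world[16];
};

static_assert(sizeof(PerFrameCB) % 16 == 0, "constant buffers are sized in 16-byte units");
static_assert(sizeof(PerObjectCB) % 16 == 0, "constant buffers are sized in 16-byte units");

inline void setLightCount(PerFrameCB& cb, int count) {
    assert(count >= 0 && count <= kMaxLights);
    cb.lightCount = count;
}

// Copies one world matrix (16 floats) into the CPU-side per-object struct.
inline void fillPerObject(PerObjectCB& cb, const float* world16) {
    std::memcpy(cb.world, world16, sizeof(cb.world));
}
```

The GPU upload itself (map/update calls) is left out on purpose, since it depends on the API in use.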

I only use one cbuffer to hold the transform matrix. The 10 cbuffers are all different: 5 for the vertex shader, 5 for the pixel shader.

The 5 cbuffers are transform, light, camera, skin, and frame.

Since I use one cbuffer for the transformation matrices, I guess it could cause something called CPU-GPU contention.

Edited by poigwym

##### Share on other sites

I do it slightly different than some.

Each 3d object has a transform. This has getters/setters for scale/position/rotation.

There is a 32-bit dirty flag that is updated through the setters. So at any given time you can know whether the scale, rotation or position has been changed, and in detail too, i.e. which component.

When a matrix is required, the transform is asked to build it; if the dirty flag is non-zero the matrix needs rebuilding. Depending on which flags are set it will do it differently. Scales and translations are very fast: just set the values directly. But if rotations are included then a more complex recompose is done (sin/cos etc.). You can do this in a number of ways; look online for various approaches.

A common one is to build each rotation matrix required and combine it.
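
A rough sketch of the dirty-flag idea described above, not the poster's actual code. To keep it short it assumes one bit per property group instead of one per component, rotation about the Z axis only, and a row-major matrix with translation in the last row; the class and member names are invented for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum DirtyBits : std::uint32_t {
    kDirtyPosition = 1u << 0,
    kDirtyScale    = 1u << 1,
    kDirtyRotation = 1u << 2,
};

class Transform {
public:
    void setPosition(float x, float y, float z) { px_ = x; py_ = y; pz_ = z; dirty_ |= kDirtyPosition; }
    void setScale(float x, float y, float z)    { sx_ = x; sy_ = y; sz_ = z; dirty_ |= kDirtyScale; }
    void setRotationZ(float radians)            { rz_ = radians; dirty_ |= kDirtyRotation; }

    // Rebuilds lazily: nothing happens unless a setter was called since the last request.
    const float* matrix() {
        if (dirty_ != 0) rebuild();
        return m_;
    }
    int rebuildCount() const { return rebuilds_; }

private:
    void rebuild() {
        if (dirty_ == kDirtyPosition) {
            // Fast path: only the translation changed, so patch three floats.
            m_[12] = px_; m_[13] = py_; m_[14] = pz_;
        } else {
            // Full recompose: scale, then rotate about Z, then translate.
            const float c = std::cos(rz_), s = std::sin(rz_);
            m_[0]  =  sx_ * c; m_[1]  = sx_ * s; m_[2]  = 0.0f; m_[3]  = 0.0f;
            m_[4]  = -sy_ * s; m_[5]  = sy_ * c; m_[6]  = 0.0f; m_[7]  = 0.0f;
            m_[8]  = 0.0f;     m_[9]  = 0.0f;    m_[10] = sz_;  m_[11] = 0.0f;
            m_[12] = px_;      m_[13] = py_;     m_[14] = pz_;  m_[15] = 1.0f;
        }
        dirty_ = 0;
        ++rebuilds_;
    }

    float px_ = 0, py_ = 0, pz_ = 0;
    float sx_ = 1, sy_ = 1, sz_ = 1;
    float rz_ = 0;
    std::uint32_t dirty_ = 0;
    int rebuilds_ = 0;
    float m_[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
};
```

Calling `matrix()` twice in a row only rebuilds once, which is the whole point of the flag.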

Then if the transform has a parent it needs to be concatenated with its parent's transform too. Managing these relationship updates can be tricky and I am still not sold on the best way to do it.

You don't have to do it this way, of course; you can just operate directly on a matrix, appending transformations to it as you wish (would probably be faster).

The view/projection matrix is calculated once per frame and shared across each draw call, only the world matrix is updated in the buffer between calls, so that is just copying 16 floats into the buffer and nothing else - should be pretty quick.

Hope that helps.

##### Share on other sites

Wow, you reply so quickly! This is the first time it has felt like an online chat rather than waiting many hours.

##### Share on other sites

Then if the transform has a parent it needs to be concatenated with its parent's transform too. Managing these relationship updates can be tricky and I am still not sold on the best way to do it.

Are you worried that the parent is not dirty but the child is, so you would skip updating the child's transform?

I solve it by using three flags called "childNeedUpdate", "selfDirt" and "parentHasUpdate". When a child changes its transform, it sets "childNeedUpdate" to true on all its ancestors. When updating the tree, if a node only has the "childNeedUpdate" flag set, it doesn't need to recompute its transform matrix; it just acts as a bridge to call its children's update. A node recomputes its transform matrix only when either "parentHasUpdate" or "selfDirt" is true. It must then remember to set "parentHasUpdate" to true on all its children, and then call the children's update.

I think it is fast, but I haven't tested it with a huge number of objects.
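
A small sketch of the three-flag scheme as described, under some loud assumptions: the "matrix" is replaced by a single float offset so the example stays short, and `SceneNode`, `attach`, `setLocal` and `recomputeCount` are invented names. Only the three flag names come from the post.

```cpp
#include <cassert>
#include <vector>

struct SceneNode {
    SceneNode* parent = nullptr;
    std::vector<SceneNode*> children;
    float local = 0.0f;   // stand-in for the local matrix (a 1D offset)
    float world = 0.0f;   // stand-in for the world matrix
    bool selfDirt = false;
    bool childNeedUpdate = false;
    bool parentHasUpdate = false;
    int recomputeCount = 0;  // instrumentation for the example only

    void setLocal(float value) {
        local = value;
        selfDirt = true;
        // Tell every ancestor that something below it changed.
        for (SceneNode* a = parent; a != nullptr; a = a->parent)
            a->childNeedUpdate = true;
    }

    void update() {
        const bool recompute = selfDirt || parentHasUpdate;
        if (recompute) {
            world = (parent ? parent->world : 0.0f) + local;
            ++recomputeCount;
            for (SceneNode* c : children) c->parentHasUpdate = true;
        }
        // With only childNeedUpdate set, the node is just a bridge to its children.
        if (recompute || childNeedUpdate)
            for (SceneNode* c : children) c->update();
        selfDirt = childNeedUpdate = parentHasUpdate = false;
    }
};

inline void attach(SceneNode& parent, SceneNode& child) {
    assert(child.parent == nullptr);  // each node has at most one parent
    child.parent = &parent;
    parent.children.push_back(&child);
}
```

Changing a single leaf and calling `update()` on the root recomputes only that leaf; changing the root recomputes everything below it.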

Edited by poigwym

##### Share on other sites
The method I've used in the past is to put all the hierarchical transforms into a big array, and make sure it's sorted by hierarchical depth, i.e. parents always appear in the array before their children. Something like:

```cpp
#include <cassert>
#include <vector>

// Matrix4x4 is assumed to be your math library's 4x4 matrix type with operator*.
struct Node
{
    Matrix4x4 localToParent; // modified when moving the node around
    Matrix4x4 localToWorld;  // computed per frame - this node's world matrix
    int parent;              // -1 for root nodes
};

std::vector<Node> nodes;

// Then the update loop is super simple
void updateWorldTransforms()
{
    for (int i = 0, end = (int)nodes.size(); i != end; ++i)
    {
        Node& n = nodes[i];   // non-const: localToWorld is written below
        assert(i > n.parent); // assert parents are updated before their children
        if (n.parent == -1)   // root node
            n.localToWorld = n.localToParent;
        else                  // child node
            n.localToWorld = n.localToParent * nodes[n.parent].localToWorld; // local to parent * parent to world == local to world
    }
}
```

##### Share on other sites

aaaaa I clicked on something and lost my essay of a message, I should really install a form saver plugin!!!

@hodgman, interesting and tidy approach, but does it end up being more efficient than a normal tree traversal? I guess it depends on how much changes from frame to frame; if nothing does, then a full tree traversal just for transform updating is pointless. But sorting arrays sounds slow too.

I was aiming for a solution that only touches the minimal set of nodes to respond to a change but also scales well from zero changes to changes in every object in the scene. Me wants cake and eating it!

@poigwym, flags like that should work well I think.

##### Share on other sites

@hodgman thinking about this further, are you suggesting that you only add nodes into that array in reaction to something changing? In fact, scratch that: you would also have to add any child nodes in that case, and it wouldn't work if a transform was changed multiple times.

There will always need to be a complete hierarchy pass then, I think; I can't see how to avoid it. In which case it still makes sense to just update all transforms.

Some odd cases to think about.

• Leaf node modified followed by parent followed by its parent... all the way up to the root in that order.
• Root node modified (all children will need updating)
• Leaf node modified followed by its parent's parent alternating all the way to the root.

There are 2 things at play with transforms the way I see it, the local update of a matrix when it is changed... then the re-combining of all the child matrices - this is where I am struggling to see the optimal solution.

Rebuild from scratch? Update and recombine using a 3rd snapshot matrix that represents the hierarchy above? Some other genius idea of justice?

EDIT::

If I get time I might make a 2d test bed to test this, a simple visual 2d tree that is update-able via mouse drags. I can then try various approaches and rather than benchmark I can compare how much work is done/or saved.

Edited by bwhiting

##### Share on other sites

Storing trees in linear arrays is always good, especially if the tree structure remains static (e.g. a character).

That does not mean you have to process the whole tree even if there are only a few changes.

The advantage is cache friendly linear memory access. You get this also for partial updates, if you use a nice memory order (typically sorted by tree level as hodgman said, and all children of any node in gapless order).
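
One way a partial update over a depth-sorted array could look, as a sketch only. It assumes parents are stored before their children, and it replaces the matrices with a single float offset to stay short; `FlatNode`, `dirty`, `updated` and `updateDirty` are made-up names. The pass still reads every node's flags linearly, but it skips the (normally expensive) recombination for untouched subtrees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FlatNode {
    float localToParent = 0.0f;  // stand-in for a matrix (1D offset)
    float localToWorld  = 0.0f;
    int   parent  = -1;          // -1 for root nodes
    bool  dirty   = false;       // set whenever localToParent is modified
    bool  updated = false;       // scratch: was this node recomputed in the current pass?
};

// Recomputes only dirty nodes and their descendants. Returns how many were recomputed.
inline int updateDirty(std::vector<FlatNode>& nodes) {
    int recomputed = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        FlatNode& n = nodes[i];
        assert(n.parent < static_cast<int>(i));  // parents must come first
        // The parent was already visited in this pass, so its 'updated' flag is current.
        const bool parentUpdated = n.parent != -1 && nodes[n.parent].updated;
        n.updated = n.dirty || parentUpdated;
        if (n.updated) {
            const float parentWorld = (n.parent == -1) ? 0.0f : nodes[n.parent].localToWorld;
            n.localToWorld = parentWorld + n.localToParent;
            ++recomputed;
        }
        n.dirty = false;
    }
    return recomputed;
}
```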

However, 100 is a small number and I can't imagine tree traversal or transformation causing such low fps.

Do you upload each transform individually to GPU? It seems you map / unmap buffers for each object - that's slow and is probably the reason.

Use a single buffer containing all transforms instead so you have only one upload per frame.
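
A CPU-side sketch of packing all transforms into one block so they can go to the GPU in a single upload. The upload itself is deliberately left out because it depends on the API. The alignment parameter is an assumption: in Vulkan, per-draw offsets into a uniform buffer must be multiples of the device's `minUniformBufferOffsetAlignment` limit, so query the real value rather than hard-coding one. `PackedTransforms`, `packTransforms` and `alignUp` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using Mat4 = std::array<float, 16>;

// Rounds value up to the next multiple of a power-of-two alignment.
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

struct PackedTransforms {
    std::vector<unsigned char> bytes;  // one contiguous block, ready for a single copy
    std::size_t stride = 0;            // byte distance between consecutive objects
    std::size_t offsetOf(std::size_t objectIndex) const { return objectIndex * stride; }
};

inline PackedTransforms packTransforms(const std::vector<Mat4>& worlds,
                                       std::size_t minOffsetAlignment) {
    PackedTransforms out;
    out.stride = alignUp(sizeof(Mat4), minOffsetAlignment);
    out.bytes.assign(out.stride * worlds.size(), 0);
    for (std::size_t i = 0; i < worlds.size(); ++i)
        std::memcpy(out.bytes.data() + out.offsetOf(i), worlds[i].data(), sizeof(Mat4));
    return out;
}
```

Each draw would then bind the same buffer with `offsetOf(i)` as its offset, instead of mapping and unmapping a buffer per object.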

Also make sure all render data (transforms and vertices) are on gpu memory and not on host memory.

##### Share on other sites

One thing I notice is that you do world * view * proj in setTransform every time. Can't you just calculate view * proj once per frame, and pass that in to be multiplied with the world transform? That would cut the matrix multiplications from two per object to one (roughly half), if I'm understanding this correctly.
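
To make the saving concrete, here is a small sketch comparing the two orders of work. `Mat4`, `multiply`, `wvpNaive` and `wvpHoisted` are illustrative names, and the matrices are assumed to be row-major with row vectors (world * view * proj). For N objects the naive version does 2N matrix multiplications, the hoisted one N + 1.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Mat4 = std::array<float, 16>;  // row-major 4x4

inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

// Naive: two multiplications for every object.
inline std::vector<Mat4> wvpNaive(const std::vector<Mat4>& worlds,
                                  const Mat4& view, const Mat4& proj) {
    std::vector<Mat4> out;
    out.reserve(worlds.size());
    for (const Mat4& w : worlds)
        out.push_back(multiply(multiply(w, view), proj));
    return out;
}

// Hoisted: view * proj once per frame, then one multiplication per object.
inline std::vector<Mat4> wvpHoisted(const std::vector<Mat4>& worlds,
                                    const Mat4& view, const Mat4& proj) {
    const Mat4 viewProj = multiply(view, proj);
    std::vector<Mat4> out;
    out.reserve(worlds.size());
    for (const Mat4& w : worlds)
        out.push_back(multiply(w, viewProj));
    return out;
}
```

With real floating-point data the two results can differ in the last bits because the multiplication order changes, which is harmless for rendering.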
