Hello. I'd like to ask a general question about how rendering is performed. If anything seems off in my understanding, please feel free to correct me.

Generally, when rendering an object, you start from each vertex position of the object and:

1. Apply the world transformation to go from object space to world space

2. Apply the view transformation to go from world space to camera space

3. Apply the perspective projection transform to go from camera space to clip space, which becomes the canonical view volume after the perspective divide
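
To make the three steps concrete, here is a minimal Python sketch of how I understand the chain. It assumes the row-vector convention used by DirectXMath (`v * World * View * Proj`). The helper names (`mat_mul`, `vec_mul`, `translation`, `perspective`) are my own, and the toy projection assumes a 90-degree field of view and an aspect ratio of 1, so please treat it as an illustration rather than a reference implementation.

```python
# Row-vector convention (as in DirectXMath): clip = v * World * View * Proj.

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def vec_mul(v, m):
    """Multiply a 4-component row vector by a 4x4 matrix."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]

def translation(tx, ty, tz):
    """Translation matrix for row vectors (the offset lives in the last row)."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [tx, ty, tz, 1]]

def perspective(near, far):
    """Toy left-handed perspective: 90-degree FOV, aspect ratio 1 (assumed)."""
    q = far / (far - near)
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, q, 1],
            [0, 0, -near * q, 0]]

world = translation(2.0, 0.0, 0.0)   # object sits at x = 2 in the world
view = translation(0.0, 0.0, 5.0)    # camera at z = -5, looking down +z
proj = perspective(1.0, 10.0)

# The combined matrix only needs to be computed once per object.
wvp = mat_mul(mat_mul(world, view), proj)

local_pos = [0.0, 0.0, 0.0, 1.0]          # vertex at the object's origin
clip = vec_mul(local_pos, wvp)            # clip-space position
ndc = [c / clip[3] for c in clip[:3]]     # perspective divide

print(clip, ndc)
```

With these numbers the vertex ends up at roughly (0.4, 0, 0.89) after the divide, which is inside the view volume as I would expect for an object 5 units in front of the camera.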

In a normal situation, you have a single camera and a single screen, so you need only one view matrix and one projection matrix to render a frame. Before rendering begins, you put this information in a constant buffer so that the vertex shader can transform the vertices of each object.

Considering that you also need a world transform matrix for every object, **how is this process actually done in practical situations**?

The easiest and most obvious solution I can think of is to put the world matrices in a constant buffer. However, if I understand the architecture of video card memory correctly, you cannot allocate constant buffers dynamically. This looks like a big problem to me, because in real situations the number of objects to render varies in real time.

Several thoughts I had:

1. A DirectX11 tutorial I came across handles this situation in a rather concerning way. If objects A, B, and C exist, instead of passing all objects to the vertex shader at once, it goes through the following process:

Write the world matrix of object A into a constant buffer, then draw object A

Write the world matrix of object B into a constant buffer, then draw object B

Write the world matrix of object C into a constant buffer, then draw object C

This works, but it seems counter-intuitive from a performance perspective, since you have to do a lot of synchronization for a simple rendering process. On top of that, the drawing operation becomes O(n) in the number of objects. It also doesn't fit well with the philosophy of shaders, where parallel processing is heavily favored.

2. A random guess of mine. You would first dynamically allocate sizeof(XMMATRIX) * (object count) bytes of memory on the GPU, and when transforming a vertex in the vertex shader, you would select which matrix to use from the list of world matrices. This seems very doable in CUDA, but I am not sure about HLSL. Is this idea even legitimate to begin with?

3. Somehow send the object's vertices, already transformed into world space, directly to the vertex shader.
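
To show what I mean by ideas 1 and 2, here is a toy CPU-side model in Python. No graphics API is involved: `constant_buffer`, `matrix_array`, `vertex_shader`, and the `uploads` counter are hypothetical stand-ins that I made up for illustration, and the "mesh" is a single vertex to keep the example short.

```python
# Toy CPU model of ideas 1 and 2; nothing here is a real graphics API.

def transform(v, m):
    """Row vector (x, y, z, w) times a 4x4 matrix given as nested lists."""
    return tuple(sum(v[k] * m[k][j] for k in range(4)) for j in range(4))

def translation(tx, ty, tz):
    """Translation matrix for row vectors (offset in the last row)."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]

def vertex_shader(position, world):
    """Stand-in for the vertex shader: object space to world space."""
    return transform(position, world)

# Three objects sharing one mesh.
mesh = [(0.0, 0.0, 0.0, 1.0)]
world_matrices = {
    "A": translation(1, 0, 0),
    "B": translation(0, 2, 0),
    "C": translation(0, 0, 3),
}

def draw_per_object(mesh, world_matrices):
    """Idea 1: one small buffer, overwritten before every draw."""
    constant_buffer = {}
    out, uploads = [], 0
    for world in world_matrices.values():
        constant_buffer["world"] = world       # one upload per object
        uploads += 1
        out += [vertex_shader(v, constant_buffer["world"]) for v in mesh]
    return out, uploads

def draw_indexed(mesh, world_matrices):
    """Idea 2: upload every matrix once, then select by object index."""
    matrix_array = list(world_matrices.values())   # single upload
    uploads = 1
    out = []
    for object_index in range(len(matrix_array)):
        out += [vertex_shader(v, matrix_array[object_index]) for v in mesh]
    return out, uploads

per_object, uploads_1 = draw_per_object(mesh, world_matrices)
indexed, uploads_2 = draw_indexed(mesh, world_matrices)
print(per_object == indexed, uploads_1, uploads_2)
```

Both functions produce the same world-space positions. The only difference in this model is the number of uploads (three versus one), which is the cost I am worried about in idea 1.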

I hope my question makes sense. I'd be happy to elaborate on any parts that don't. Many thanks.