Practical approach to rendering objects

4 comments, last by Vilem Otte 11 years, 3 months ago

Hello. I'd like to ask a general question about how rendering is performed. If anything seems off in my understanding, please feel free to correct me.

Generally, when rendering an object, starting from each vertex position in object space you would:

1. Apply world transformation to go from object space to world space

2. Apply view transformation to go from world space to camera space

3. Apply perspective projection transform to go from camera space to canonical view volume

In a normal situation you only have a single camera and screen to work with, so you only need one view matrix and one projection matrix to render a frame. So, before rendering begins, you put this information into a constant buffer so that the vertex shader can transform the vertices of each object.
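In HLSL terms, I picture the constant buffers and vertex shader looking roughly like this (the names and register slots are just made up for illustration):

    cbuffer PerFrame : register(b0)
    {
        float4x4 view;        // world space -> camera space
        float4x4 projection;  // camera space -> clip space
    };

    cbuffer PerObject : register(b1)
    {
        float4x4 world;       // object space -> world space
    };

    float4 VSMain(float3 position : POSITION) : SV_POSITION
    {
        float4 worldPos = mul(float4(position, 1.0f), world);
        float4 viewPos  = mul(worldPos, view);
        return mul(viewPos, projection);
    }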

Considering you need a world transform matrix for every object, how is this process actually done in practical situations?


The easiest and most obvious solution I can think of is to put the world matrices in a constant buffer, but if I am not mistaken about the architecture of video card memory, you are not able to dynamically allocate constant buffers. I can see this being a big problem because in real situations the number of objects to be rendered varies at runtime.


Several Thoughts I had:

1. A DirectX11 tutorial I came across handles this situation in a rather concerning way. If objects A, B, and C exist, instead of passing all of the objects to the vertex shader, it goes through the following process:

Write the world matrix of object A into a constant buffer, then draw object A

Write the world matrix of object B into a constant buffer, then draw object B

Write the world matrix of object C into a constant buffer, then draw object C

This works, but it seems really counter-intuitive from a performance perspective, as you have to do a lot of synchronization for a simple rendering process. Not to mention the draw operation becomes O(n) in the number of objects. It also doesn't fit too well with the philosophy of shaders, where parallel processing is heavily favored.

2. A random guess of mine: you first dynamically allocate (size of XMMatrix) * (object count) bytes of GPU memory, and when transforming a vertex in the vertex shader, you select which matrix to use from that list of world matrices. This seems very doable in CUDA, but I'm not so sure about HLSL. Is this idea even legitimate to begin with?

3. Somehow send the object's vertices, already positioned in world space, directly to the vertex shader.

I hope my question makes sense; I'd be happy to elaborate on any parts that don't. Many thanks.


1. You can't really avoid putting the matrices in constant buffers. For something like the world transform, which is not necessarily constant frame-to-frame, more than one rotated constant buffer dedicated to the world transform would be necessary to avoid sync issues (unless you depend on the driver to do orphaning, which is far from guaranteed). It might be worth it, though, to simply modify a single 'matrices' constant buffer for each object each frame, and only do something more complex if you actually need to.

2. Which is essentially how drawing tons of copies of one object with one draw call (instancing) works; see the sketch after point 3.

3. Probably not worth the trouble unless you have a ton of unique non-moving objects and that single matrix multiplication in the shader is slowing things down (unlikely).
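For point 2, a minimal sketch of what the shader side could look like (register slots and names here are just illustrative): upload all the world matrices into one buffer, bind it as a shader resource, and index it with the instance ID.

    // All per-instance world matrices, uploaded once and bound as a shader resource.
    StructuredBuffer<float4x4> g_WorldMatrices : register(t0);

    cbuffer PerFrame : register(b0)
    {
        float4x4 g_ViewProjection; // view * projection, shared by every instance
    };

    float4 VSMain(float3 position : POSITION,
                  uint instanceID : SV_InstanceID) : SV_POSITION
    {
        float4 worldPos = mul(float4(position, 1.0f), g_WorldMatrices[instanceID]);
        return mul(worldPos, g_ViewProjection);
    }

On the CPU side, a single DrawInstanced/DrawIndexedInstanced call with the instance count is enough; SV_InstanceID picks the right matrix for each instance.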

New C/C++ Build Tool 'Stir' (doesn't just generate Makefiles, it does the build): https://github.com/space222/stir

How about a different kind of logic:

If you have varying numbers of objects of different types:

- Loop through the list of visible objects, and place the world matrices / shaders / materials of each object type in their proper bucket. If you are able to group similar objects beforehand into bigger entities, you can save a lot of data copying here.

- You should probably sort the buckets based on the resources used (buffers, textures, shaders) in order to minimize state changes.

- For each bucket, copy the transforms to a generic buffer object; a single constant buffer probably isn't big enough to hold a scene's worth of data. It is enough to map/unmap/discard the buffer object only once per frame. For each bucket, store its start position in the buffer object. This system also allows a different amount of data per bucket, so you don't need a separate scheme for skinned objects, or for cases where you want some per-instance data such as a color or texture index.

- Draw each bucket using Draw(Indexed)Instanced.
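On the shader side, each bucket then only needs its start offset into the shared buffer; a rough sketch (the structured buffer layout and names are just one possible way to do it):

    // One big per-frame buffer holding every visible instance's data,
    // filled with a single map/discard/unmap and indexed in the shader.
    struct InstanceData
    {
        float4x4 world;
        float4   color;   // example of optional per-instance data
    };

    StructuredBuffer<InstanceData> g_Instances : register(t0);

    cbuffer PerFrame : register(b0)
    {
        float4x4 g_ViewProjection;
    };

    cbuffer PerBucket : register(b1)
    {
        uint g_FirstInstance;  // this bucket's start position in g_Instances
    };

    float4 VSMain(float3 position : POSITION,
                  uint instanceID : SV_InstanceID) : SV_POSITION
    {
        // SV_InstanceID restarts at 0 for every draw call, so add the bucket offset.
        InstanceData data = g_Instances[g_FirstInstance + instanceID];
        float4 worldPos = mul(float4(position, 1.0f), data.world);
        return mul(worldPos, g_ViewProjection);
    }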

Of course, the worst-case scenario, where all the objects are of different kinds, doesn't improve much.

Cheers!

To clarify:

- When rendering an object in OpenGL 2.x with the fixed-function pipeline (or even in most shaders), there are exactly two matrices involved: the projection matrix and the modelview matrix.

- When rendering an object in OpenGL 4.x, you must use a shader, and there are no built-in matrices, so you can have as many as you want. If your shader does use them, it's fastest to compute P*V*M on the CPU and pass that in as one matrix; that way, there's only one matrix multiply per vertex.

- In your OP, I'm assuming you're asking about drawing many objects with instancing, and you want each to have a different model matrix, but the same view and projection matrices?

In the latter case, a crufty strategy that I know will work is to store each matrix in four texels of an RGBA32F texture and then fetch them in your vertex shader. I think the right way to do this, though, is to look at uniform buffer arrays.

Or, you know, just draw your objects one at a time.

And a Unix user said rm -rf *.* and all was null and void... | There's no place like 127.0.0.1 | The Application "Programmer" has unexpectedly quit. An error of type A.M. has occurred.

This works, but it seems really counter-intuitive from a performance perspective, as you have to do a lot of synchronization for a simple rendering process. Not to mention the draw operation becomes O(n) in the number of objects.

A couple of quick thoughts:

- There isn't as much synchronisation as you think, because commands to the GPU are pipelined. If you rapidly issue multiple draw calls, the GPU is going to overlap their processing as and when it can (provided you don't explicitly force synchronisation).

- Rendering N objects is always an O(N) operation. Parallelisation just obscures that fact.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

The easiest and most obvious solution I can think of is to put the world matrices in a constant buffer, but if I am not mistaken about the architecture of video card memory, you are not able to dynamically allocate constant buffers. I can see this being a big problem because in real situations the number of objects to be rendered varies at runtime.

I don't think this is such a huge problem; you can actually allocate a new (bigger) buffer, copy the old buffer into the new one, and delete the old one.

The right way is definitely through uniforms (as mentioned); for example, we send just a single matrix to the GPU (World * View * Projection, computed on the CPU), and it's fine for single objects.

For instancing we pre-compute the matrices into a texture from which we read in the vertex shader. Note that you can dynamically add/remove instances from the texture, and it auto-sizes in our implementation: when you would go over capacity we "realloc" to twice the space, and when you drop below half of capacity we "realloc" to fit again. It works, and reallocations are really rare at runtime (with the exception of the state where you're editing your world, where "reallocations" happen more often).
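Roughly, the vertex-shader side of the texture approach looks like this (HLSL here to match the OP; assuming each matrix is packed as four consecutive RGBA32F texels, one matrix row per texel, and the texture width is a multiple of 4 so a matrix never straddles two rows):

    // World matrices packed into a float texture: instance i occupies texels 4*i .. 4*i+3.
    Texture2D<float4> g_MatrixTexture : register(t0);

    float4x4 LoadWorldMatrix(uint instanceID)
    {
        uint width, height;
        g_MatrixTexture.GetDimensions(width, height);

        uint texel = instanceID * 4;
        uint row = texel / width;
        uint col = texel % width;

        float4 r0 = g_MatrixTexture.Load(int3(col + 0, row, 0));
        float4 r1 = g_MatrixTexture.Load(int3(col + 1, row, 0));
        float4 r2 = g_MatrixTexture.Load(int3(col + 2, row, 0));
        float4 r3 = g_MatrixTexture.Load(int3(col + 3, row, 0));
        return float4x4(r0, r1, r2, r3);
    }

The lookup is done per instance in the vertex shader (e.g. indexed by SV_InstanceID), and growing/shrinking the texture is the "realloc" described above.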

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

