How relevant is this skinned instancing technique?


Reference: http://developer.download.nvidia.com/SDK/10/direct3d/Source/SkinnedInstancing/doc/SkinnedInstancingWhitePaper.pdf

The paper recommends baking animation frames into a single texture. Is this a good technique in practice? I suppose blending between two selected frames can be driven by per-instance data, but what about things like playing animations only on selected nodes? Or additive animations? Would I have to bake all combinations of such animations?

Also, would you recommend using this technique for all skinned meshes in your engine?


It's still mostly relevant, although with modern APIs you have more flexibility in how you read data in a shader. For instance, you now have structured buffers, which are generally more convenient to work with than textures or constant buffers.

Typically your vertex shader will only be concerned with reading the final skeleton pose data, and won't be performing any of the actual animation or blending work that you're referring to. You would probably want to do this ahead of time on the CPU (or possibly in a compute shader if you want to get fancy), writing out the final bone transforms into buffers for each unique animation state. Your instances would then read from these buffers, with multiple instances possibly sharing the same animation state. That whitepaper achieves this sharing of animation states by putting an indirection index into the per-instance data, which is then used as an offset into a global combined bone texture/buffer. On modern APIs you could also achieve this with bindless techniques, where instead of an offset into a buffer you would have an index into an array of descriptors.
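To make that concrete, here's a minimal D3D11-style HLSL sketch of that indirection scheme, assuming the blended pose matrices have already been written into one global structured buffer. All names (g_BonePalette, boneOffset, and so on) are made up for illustration:

    // One pose (array of bone matrices) per unique animation state, packed
    // back to back. Filled on the CPU or by a compute pass each frame.
    StructuredBuffer<float4x4> g_BonePalette : register(t0);

    struct InstanceData
    {
        float4x4 world;
        uint     boneOffset; // where this instance's pose starts in g_BonePalette
    };
    StructuredBuffer<InstanceData> g_Instances : register(t1);

    cbuffer Camera : register(b0)
    {
        float4x4 g_ViewProj;
    };

    struct VSIn
    {
        float3 position : POSITION;
        uint4  boneIDs  : BLENDINDICES;
        float4 weights  : BLENDWEIGHT;
        uint   instID   : SV_InstanceID;
    };

    float4 VSMain(VSIn input) : SV_Position
    {
        InstanceData inst = g_Instances[input.instID];

        // Standard 4-bone skinning; the only instancing-specific part is
        // the per-instance boneOffset indirection.
        float4x4 skin =
            g_BonePalette[inst.boneOffset + input.boneIDs.x] * input.weights.x +
            g_BonePalette[inst.boneOffset + input.boneIDs.y] * input.weights.y +
            g_BonePalette[inst.boneOffset + input.boneIDs.z] * input.weights.z +
            g_BonePalette[inst.boneOffset + input.boneIDs.w] * input.weights.w;

        float4 posWS = mul(mul(float4(input.position, 1.0f), skin), inst.world);
        return mul(posWS, g_ViewProj);
    }

Multiple instances in the same animation state simply share the same boneOffset, which is the same sharing the whitepaper gets from its bone texture.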

Also, would you recommend using this technique for all skinned meshes in your engine?

There are at least two possible reasons why you might not want to do it on all skinned meshes:

  1. You do not have hundreds or even thousands of instances. Instancing is not free of overhead; with only a handful of instances it will perform the same at best, possibly slower. It's also more trouble to get working. So if all you have is a dozen characters on screen, you may as well not bother.
  2. Your meshes vary greatly in their number of vertices. Note that instanced meshes do not need to be identical (on modern hardware), nor do they even need the same number of vertices. But they should have at least approximately the same vertex count, because every batch must submit as many vertices as the largest mesh. Huge differences therefore mean either wasting bandwidth and vertex-stage work, or falling back to many batches.

Otherwise, sure. You can draw more stuff in the same time.

Thanks all for the information and pointing out my misunderstandings.

Also @samoth, what do you mean that my meshes need not be identical? When you call DrawIndexedInstanced(), the understanding is that you use one mesh's vertex/index buffers and keep your instance data in a constant buffer, hence it's the same mesh every time.

He most likely means instancing in combination with manual vertex fetching and/or merge instancing. Check out this presentation: https://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau (manual vertex fetching starts on page 7, merge instancing on page 20). Merge instancing is also described here: http://www.humus.name/index.php?page=Articles&ID=5
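For reference, manual vertex fetching boils down to binding no vertex buffers at all and having the shader index into buffers itself. A minimal HLSL sketch, with an assumed vertex layout and illustrative names:

    // No input layout: the shader pulls everything via SV_VertexID.
    struct Vertex
    {
        float3 position;
        float3 normal;
        float2 uv;
    };
    StructuredBuffer<Vertex> g_Vertices : register(t0);
    StructuredBuffer<uint>   g_Indices  : register(t1); // indices fetched manually too

    cbuffer Camera : register(b0)
    {
        float4x4 g_ViewProj;
    };

    // Issued with a plain Draw(indexCount, 0) - no index buffer bound either.
    float4 VSMain(uint vertexID : SV_VertexID) : SV_Position
    {
        Vertex v = g_Vertices[g_Indices[vertexID]];
        return mul(float4(v.position, 1.0f), g_ViewProj);
    }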

-potential energy is easily made kinetic-

He most likely means instancing in combination with manual vertex fetching and/or merge instancing. Check out this presentation: https://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau (manual vertex fetching starts on page 7, merge instancing on page 20). Merge instancing is also described here: http://www.humus.name/index.php?page=Articles&ID=5

Thanks for the slides! Looks like I need to adjust to the newer methods of rendering; I'm still using the older instancing techniques, especially with my particles.

He most likely means instancing in combination with manual vertex fetching and/or merge instancing.

Yep, that exactly.

Manual fetch is practically no different in performance from fixed-function vertex fetching nowadays (the difference is hardly measurable), which is very nice.

So "all meshes must be the same" is no longer really true for instancing. Or rather, it is true insofar as all meshes are the same, having no vertex data at all. You pull what you need from a buffer, using the vertex id and the instance id.

You still need the same number of vertices since that is the way instancing works. However...

...there's primitive restart, and there are points at infinity (or you could just manually cull vertices above some threshold).

So the requirement is really only "approximately the same count"; you just let the last bunch of vertices go to waste. In principle you could combine any number, it just might be wasteful. Imagine instancing objects with 50 and with 5,000 vertices together: the 50-vertex instances will burn 4,950 vertices to no effect. But if you have, say, 4,875 and 5,000 vertices (and completely different vertex positions, of course), then... who cares. Burning a couple of vertices will be far faster than drawing the meshes in separate batches.
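In shader terms, "burning" the padding vertices can be as simple as pushing them outside the clip volume. A rough sketch building on manual vertex fetching, with illustrative names:

    struct MeshInfo
    {
        uint vertexStart; // first vertex of this mesh in the shared buffer
        uint vertexCount; // this mesh's real vertex count
    };
    StructuredBuffer<MeshInfo> g_Meshes     : register(t0);
    StructuredBuffer<uint>     g_InstToMesh : register(t1); // mesh index per instance
    StructuredBuffer<float3>   g_Positions  : register(t2);

    cbuffer Camera : register(b0)
    {
        float4x4 g_ViewProj;
    };

    float4 VSMain(uint vertexID : SV_VertexID, uint instID : SV_InstanceID) : SV_Position
    {
        MeshInfo mesh = g_Meshes[g_InstToMesh[instID]];

        // Every instance is submitted with the vertex count of the largest
        // mesh; vertices past this mesh's real count collapse to a point
        // outside the clip volume, so the rasterizer discards them.
        if (vertexID >= mesh.vertexCount)
            return float4(0.0f, 0.0f, -1.0f, 1.0f);

        float3 p = g_Positions[mesh.vertexStart + vertexID];
        return mul(float4(p, 1.0f), g_ViewProj);
    }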

When you get more complex animations driven by input changes (think of something like FIFA), you may be blending several animations, and that is easier to deal with on the CPU, especially since you will need the final bone transforms for physics-related things anyway. However, putting baked animations into a buffer/texture can be good for something like crowds, where instances only have a few idle or walk animations. You can then offset into the buffer and have all your instances playing a random section of an idle loop.
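As a rough sketch of that crowd case, assuming the baked buffer stores one loop frame-major (all bones of frame 0, then frame 1, and so on) and each instance carries a random phase; names are illustrative:

    StructuredBuffer<float4x4> g_BakedAnim : register(t0); // baked loop
    StructuredBuffer<float>    g_InstPhase : register(t1); // random 0..1 per instance

    cbuffer AnimParams : register(b0)
    {
        float g_time;       // seconds
        float g_fps;        // rate the animation was baked at
        uint  g_frameCount; // frames in the loop
        uint  g_boneCount;  // bones per frame
    };

    float4x4 FetchBone(uint instID, uint boneID)
    {
        // The per-instance phase spreads instances across the loop so a
        // crowd sharing one idle animation doesn't move in lockstep.
        float t     = g_time * g_fps + g_InstPhase[instID] * g_frameCount;
        uint  frame = (uint)t % g_frameCount;
        return g_BakedAnim[frame * g_boneCount + boneID];
    }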

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

Thanks for the slides! Looks like I need to adjust to the newer methods of rendering; I'm still using the older instancing techniques, especially with my particles.

You're welcome... I should also mention another presentation that builds on merge instancing: the GPU-driven pipeline talk, available here: http://advances.realtimerendering.com/s2015/

Basically, they break their meshes down into smaller clusters with a fixed number of triangles each. On the GPU they then perform frustum, backface, and occlusion culling on those clusters to reduce the geometry load for a given scene, and finally render the visible clusters using instancing and manual vertex fetching.
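A compute-shader sketch of just the frustum test from that idea, with illustrative names (the backface-cone and occlusion tests from the talk are omitted):

    struct Cluster
    {
        float3 center; // bounding-sphere center, world space
        float  radius;
    };
    StructuredBuffer<Cluster>    g_Clusters        : register(t0);
    AppendStructuredBuffer<uint> g_VisibleClusters : register(u0);

    cbuffer CullParams : register(b0)
    {
        float4 g_frustumPlanes[6]; // xyz = plane normal, w = distance
        uint   g_clusterCount;
    };

    [numthreads(64, 1, 1)]
    void CSMain(uint3 id : SV_DispatchThreadID)
    {
        if (id.x >= g_clusterCount)
            return;

        Cluster c = g_Clusters[id.x];

        // Reject the cluster if its bounding sphere is fully behind any plane.
        [unroll]
        for (uint i = 0; i < 6; ++i)
        {
            if (dot(g_frustumPlanes[i].xyz, c.center) + g_frustumPlanes[i].w < -c.radius)
                return;
        }

        // Survivors feed an indirect draw of the visible clusters.
        g_VisibleClusters.Append(id.x);
    }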

-potential energy is easily made kinetic-
