Why are we still using index/vertex/instance buffers?

Started by
18 comments, last by Corvo 6 years, 3 months ago

There are a number of options now in each API for storing data on the GPU. Specifically in DX11, we have Buffers which can be bound as vertex buffers, index buffers, constant buffers or shader resources (structured buffers, for example). Constant buffers have the most limitations, and I think that's because they are so heavily optimized for non-random access (an optimization we have no control over). Vertex buffers and index buffers, however, have so few limitations compared to shader resource buffers that I question their value.

For example, the common way of drawing geometry is to provide a vertex buffer (and maybe an instance buffer) through a specific call to IASetVertexBuffers. We also provide index buffers with a specific call. At this point we also have to provide an input layout. That is significantly more management overhead than if we provided the vertex and index buffers as shader resources and indexed them with system values (e.g. SV_VertexID) in the shader.

Now, I haven't actually tried doing vertex buffer management this way, but I am looking forward to it if no one points out the faults in my way of thinking.

As you've pointed out, on any recent hardware/API there's nothing stopping you from doing fully programmable vertex fetch. There are three things you should keep in mind though:

  1. The performance of programmable and fixed-function vertex fetch may not be the same. There are still GPUs out there that have dedicated hardware for vertex fetch, and using it could possibly be the fastest path depending on what you're doing. On the other hand, some hardware (for instance anything made by AMD in the last 7 years) has no dedicated vertex fetch, and will generate shader code that implements your input layout. But even then there can be differences depending on what types of resources you fetch your data from (structured buffer vs. formatted buffer vs. textures), and your data layout (AoS vs. SoA). For an example, here's what happened when someone benchmarked a bunch of different ways to fetch vertex data on their GTX 970.
  2. GPUs will typically tie their post-VS cache to indices from an index buffer, so you'll still need to use a dedicated index buffer to benefit from it. You may want to look through this thread for some ideas on how to do interesting things within the limitations of standard index buffers.
  3. Input layouts let you have some decoupling between your vertex buffer layout and your actual vertex shader, which can be convenient in some cases. However, it's possible that different input layouts will cause the driver to generate different permutations of your VS (or different VS preludes) behind the scenes.
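
To make point 2 a bit more concrete, here is a rough CPU-side C++ sketch of a post-VS cache. It assumes a plain FIFO keyed on the index value; real GPUs use their own sizes and policies, and the function name countVertexShaderRuns is made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many times the vertex shader would run for an indexed triangle
// list if the GPU kept a FIFO post-VS cache of `cacheSize` entries.
// ASSUMPTION: FIFO policy keyed on the index value; actual hardware differs.
std::size_t countVertexShaderRuns(const std::vector<std::uint32_t>& indices,
                                  std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t index : indices) {
        if (std::find(fifo.begin(), fifo.end(), index) != fifo.end()) {
            continue;  // hit: the transformed vertex is reused
        }
        ++misses;      // miss: the vertex shader has to run for this index
        fifo.push_back(index);
        if (fifo.size() > cacheSize) {
            fifo.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}
```

For two triangles sharing an edge (indices 0,1,2, 2,1,3) this model runs the vertex shader 4 times instead of 6. Without indices from an index buffer there is nothing for the cache to key on, which is why the dedicated index buffer still matters.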

Thank you! I actually didn't think of using a dedicated index buffer but I see now that it still has value. What I am also interested in is that this way you can easily do hard edge normals and UV discontinuities without duplicated position vertices. I am already using deinterleaved vertex buffers (for more efficient shadow rendering/zprepass) so implementing that should not be very hard.
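
One possible way to picture the hard-edge idea, as a CPU-side C++ sketch rather than actual shader code: each unique corner stores one index into the position stream and one into the normal stream, and the hardware index buffer points at corners. The Corner layout and the fetchVertex name are assumptions made for this example, not something from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// One entry per unique (position, normal) combination.
// ASSUMPTION: hypothetical layout chosen for illustration.
struct Corner { std::uint32_t position; std::uint32_t normal; };

struct Vertex { Float3 position; Float3 normal; };

// CPU stand-in for a vertex shader that fetches manually: vertexId plays the
// role of SV_VertexID, which is the value read from the index buffer.
Vertex fetchVertex(std::uint32_t vertexId,
                   const std::vector<Corner>& corners,
                   const std::vector<Float3>& positions,
                   const std::vector<Float3>& normals) {
    assert(vertexId < corners.size());
    const Corner corner = corners[vertexId];
    assert(corner.position < positions.size());
    assert(corner.normal < normals.size());
    return Vertex{positions[corner.position], normals[corner.normal]};
}
```

Two faces meeting at a hard edge can then share the same position entries while pointing at different normals, so only the small corner entries are duplicated instead of full positions.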

Oh, and something to keep in mind: graphics debuggers (at least Nsight) cannot visualize geometry information without an input layout, which is certainly a downside.

BTW, since it wasn't explicitly stated: what you do is use a null vertex buffer. This allows you to generate vertices procedurally or to fetch them manually using the SV_VertexID and SV_InstanceID system values. It has been documented here:

https://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau

or

Starting page seven.
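
As a small illustration of generating vertices procedurally from the vertex ID alone, here is a CPU-side C++ sketch of the commonly used full-screen triangle trick. It is not taken from the linked slides, and the function name is made up; in a vertex shader the same arithmetic would be applied to SV_VertexID with no vertex buffer bound and a 3-vertex draw.

```cpp
#include <cassert>
#include <cstdint>

struct Float2 { float x, y; };

// Clip-space position of a single triangle that covers the whole screen,
// computed only from the vertex ID (0, 1 or 2).
// ASSUMPTION: D3D-style clip space, with y = +1 at the top of the screen.
Float2 fullScreenTrianglePosition(std::uint32_t vertexId) {
    assert(vertexId < 3);
    const float u = static_cast<float>((vertexId << 1) & 2u);  // 0, 2, 0
    const float v = static_cast<float>(vertexId & 2u);         // 0, 0, 2
    return Float2{u * 2.0f - 1.0f, 1.0f - v * 2.0f};
}
```

The three IDs map to (-1, 1), (3, 1) and (-1, -3), a triangle large enough to cover the [-1, 1] clip-space square.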

-potential energy is easily made kinetic-

At the end of the day, vertex buffers and index buffers are just buffers with semantics attached. As pointed out above, I think vertex caching is one of the biggest reasons this distinction still exists; without it, the API would have to be able to flag a generic buffer as being cacheable.

MJP's point 2 is the most important: you need an index buffer in order to benefit from the post-vertex-transform cache. Besides that, if you only target recent hardware, you could go SoA (not interleave your vertex data) and fetch manually; that's what will happen on any GCN hardware anyway.

As MJP also mentioned, NVIDIA hardware works differently (not sure about the latest generation). With all consoles being GCN, we tend to optimise for it...

-* So many things to do, so little time to spend. *-

Ugh, I implemented it in my engine for every scene mesh render pass and it performs significantly worse on my GTX 1070 than using regular vertex buffers. I was rendering shadows on the Sponza scene in 2 ms for 6 point lights, and the custom vertex fetch moves it up to 11 ms (which is insane). The Z prepass went from 0.2 ms up to 0.4 ms. These passes use position and sometimes texcoord and instance deinterleaved buffers.

The vertex buffers are float4 buffers which I create as shader resources with DXGI_FORMAT_R32G32B32A32_FLOAT views. In the shader I declare them as Buffer<float4>. The instance buffers are structured buffers holding 4x4 float matrices.

I don't understand what could be going on, but it is very fishy; I expected a very minor performance difference.

I haven't implemented this myself, but you could try eliminating the overhead of automatic type conversion that buffers have. i.e. the buffer SRV contains a format field, specifying that the data is in a particular format, and the HLSL code says that it wants it converted to DXGI_FORMAT_R32G32B32A32_FLOAT format -- this ability for general purpose conversion might have an overhead on NV?

To avoid that, you could try using a ByteAddressBuffer, and something like asfloat(buffer.Load4(vertexId*16)), which hard-codes the expectation that the buffer will be in DXGI_FORMAT_R32G32B32A32_FLOAT format.
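
To spell out what that expression does, here is a CPU-side C++ sketch of the same raw load. It only mimics the HLSL with a byte array and memcpy; the Float4 struct and the loadFloat4 name are made up for this example and say nothing about how any GPU implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Float4 { float x, y, z, w; };
static_assert(sizeof(Float4) == 16, "one vertex is expected to occupy 16 bytes");

// CPU stand-in for asfloat(buffer.Load4(vertexId * 16)): read 16 raw bytes at
// a fixed stride and reinterpret them as four floats, with no format
// conversion involved.
Float4 loadFloat4(const std::vector<std::uint8_t>& rawBuffer, std::uint32_t vertexId) {
    const std::size_t byteOffset = static_cast<std::size_t>(vertexId) * sizeof(Float4);
    assert(byteOffset + sizeof(Float4) <= rawBuffer.size());
    Float4 value;
    std::memcpy(&value, rawBuffer.data() + byteOffset, sizeof(Float4));
    return value;
}
```

The stride of 16 is baked into the code, which is the hard-coded format expectation mentioned above: if the data were stored in some other format, the shader itself would have to change.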

Alternatively you could try using a StructuredBuffer<float4>.

I'd be very interested to know if these three types of buffers have any performance differences... :wink:

Yeah I will check with the other buffer types too and post my findings. And double check my implementation too, maybe I missed something more obvious. And I am using a hardware index buffer by the way.

You should use a ByteAddressBuffer as suggested by MJP.
