DirectX: What are the (dis)advantages of multi-stream multi-index rendering?

3 comments, last by Hodgman 11 years, 5 months ago
What are the advantages and disadvantages of multi-stream, multi-index rendering (as demonstrated in a D3D10 sample)?
Is it often used in graphics engines? When should I bother to implement it?
It should probably only be used in cases where optimising for memory usage is your highest priority.

The default method of indexed rendering only supports a single index stream, which is used to fetch all attributes. This method definitely has specialized hardware designed for it in the GPU, so that indexing is fast.

The multi-index method is implemented by the user on top of the general-purpose shading hardware, so it has greater overheads and will likely perform worse. The advantage of indexing each attribute separately is that you might end up with less data.

e.g. a cube with a square texture applied to each face requires 8 unique positions and 4 unique UV coordinates.
However, with the default method, because a single index value is used to address all attributes, you need at least 8 UV coordinates (one per position). Moreover, although each face shares positions with its neighbours, it may not share their tex-coords. As a worst case, you end up with 24 (4 verts * 6 faces) unique vertices (position+UV combinations).

There might also be a motivation to use it if you're writing visualisation software for formats that natively use multiple index streams, such as OBJ and COLLADA files, and you don't want to bother converting the data to a single-index format.
So would a AAA console game, where memory (and/or bandwidth) is tight, typically use separate index streams or the default single stream? Or would it be decided on a game-by-game basis?
IIRC from something I read elsewhere, when the hardware reads in a vertex, the read is done in 32-byte chunks from the vertex buffer, every time. So let's say your vertex shader input looks like this:


struct Vertex
{
float3 Position;
float3 Normal;
float2 TexCoord;
};


and your vertex buffers are set up like this:

// Vertex buffer 1 - Positions
pos1 | pos2 | pos3 | pos4 | ...
// Vertex buffer 2 - Normals
norm1 | norm2 | norm3 | norm4 | ...
// Vertex buffer 3 - TexCoords
tex1 | tex2 | tex3 | tex4 | ...


Then the device has to do three 32-byte reads per vertex, for a total of 96 bytes, 64 of which are useless and will be discarded. However, if I pack everything into one vertex buffer:

// Vertex buffer 1 - Positions, normals and texcoords all interleaved
pos1 | norm1 | tex1 | pos2 | norm2 | tex2 | pos3 | norm3 | tex3 | pos4 | norm4 | tex4 | ...

Then the device only reads one 32-byte chunk, which is a 3x bandwidth saving.

If you're complicating things further by using different indices, that's another lot of Buffer<ushort>::Load() calls slowing down the shader. There's no point sacrificing texture samples to save a little GPU memory; I bet your entire game's vertex content weighs less than your 2048x2048 shadow map anyway.

(Doesn't everybody have a 2048x2048 shadow map these days? Or several :))

So would a AAA console game, where memory (and/or bandwidth) is tight, typically use separate index streams or the default single stream? Or would it be decided on a game-by-game basis?
When making modern games on 6 year old hardware, everything is done on a game-by-game basis.

Going by publicly accessible information (to respect NDAs), Wikipedia says the PS3's GPU uses the G70 architecture, which is DX9-level. This multiple-index-stream technique requires a DX10-level GPU that can perform manual fetching from buffers in the vertex shader. However, the PS3 has got the SPUs, which are fully programmable (and much more powerful than its GPU), so in theory you could use the SPUs to do your vertex shading, but this would require careful synchronisation between the SPUs and GPU... Options to be evaluated game-by-game ;)
I bet your entire game's vertex content weighs less than your 2048x2048 shadow map anyway
That's a good point -- vertices are cheap. You can fit half a million of your hypothetical vertex structure into the same space as that single texture.
If you did want to save vertex space, you might be better off storing the normal and tex-coord in 16-bits per component instead of full 32-bit float.

