Using system-generated SV_VertexID in the vertex shader fails

10 comments, last by evelyn4you 5 years, 2 months ago

Hi,

I have been hunting a bug for hours without finding it, and it is giving me a real headache.

In my indie game engine I use DrawIndexed

and want to use the vertex ID in my vertex shader in DirectX 11 (working on Win7).

Here is the relevant part of the Microsoft documentation:

Using System-Generated Values

VertexID

A vertex id is used by each shader stage to identify each vertex. It is a 32-bit unsigned integer whose default value is 0. It is assigned to a vertex when the primitive is processed by the IA stage. Attach the vertex-id semantic to the shader input declaration to inform the IA stage to generate a per-vertex id.

The IA will add a vertex id to each vertex for use by shader stages. For each draw call, the vertex id is incremented by 1. Across indexed draw calls, the count resets back to the start value. For ID3D11DeviceContext::DrawIndexed and ID3D11DeviceContext::DrawIndexedInstanced, the vertex id represents the index value. If the vertex id overflows (exceeds 2³²– 1), it wraps to 0.

For all primitive types, vertices have a vertex id associated with them (regardless of adjacency).

End of the Microsoft text.

When I set my Material_idx manually, e.g. to 25 or another value, the shape is displayed accordingly.

But when I try to derive the actual shape index from the system-generated vertex ID, I always get the attributes of shape 0.

Yet the manual states that for a DrawIndexed call the vertex ID represents the index value.

The question is:

Does this mean that the vertex ID will always stay in the range of e.g. 0 to 7 for a cube vertex buffer, even if I issue a draw call with a bound index buffer of e.g. 36,000 indices that all refer to the same 8 vertices in the tiny vertex buffer? (I am trying to draw 1000 shapes with different attributes.)

In the blog post http://www.joshbarczak.com/blog/?p=667 "Why Geometry Shaders Are Slow" there is a section where the author does exactly what I want to do:

a big index buffer and a tiny vertex buffer to draw multiple cubes, but NOT with the DrawInstanced method.

Method 2: The “Instancing Sucks” Way

Let’s do the same thing, except let’s not use instancing. The reasons for this will become clear shortly. Now, we don’t want to go doing one drawcall per box, because that would just introduce more overhead. We also don’t want to duplicate the same unit cube 250K times. That would consume unnecessary bandwidth. Instead, we’ll do this by generating a gigantic index buffer and doing SV_VertexID math. We know that cube i will reference vertices 8*i through 8*i+7, so we can figure out our own local instance and vertex ID from a flat index buffer. The only drawback is that now we need to fetch our vertex and instance data explicitly:


 
uniform row_major float4x4 g_ViewProj;
 
Buffer<float4> Verts;
Buffer<float4> XForms;
 
float4 main( 
    uint vid : SV_VertexID ) : SV_Position
{
    uint xform = vid/8;
    float4 v  = Verts[vid%8];
    float4 R0 = XForms[3*xform];
    float4 R1 = XForms[3*xform+1];
    float4 R2 = XForms[3*xform+2];
 
    // deform unit box into desired oriented box
    float3 vPosWS = float3( dot(v,R0), dot(v,R1), dot(v,R2) );
 
    // clip-space transform
    return mul( float4(vPosWS,1), g_ViewProj );
}
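
As a CPU-side sketch of the index layout the quoted post describes (written in C++ rather than HLSL so it can be checked without a GPU): it assumes the flat index buffer stores 8*i + corner for cube i, and that the shader receives that stored value as SV_VertexID, as the Microsoft text quoted above says for DrawIndexed. Under that assumption the IDs for N cubes would top out at 8*N - 1, not at the number of indices. The names kCubeCorners, buildFlatIndexBuffer and decode are hypothetical, and the corner ordering is just one possible choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 36 corner indices (12 triangles) over the 8 shared corners of a unit cube.
// Assumed convention: corner c has x = c & 1, y = (c >> 1) & 1, z = (c >> 2) & 1.
// The winding is not tuned for back-face culling; this is only an illustration.
const std::array<std::uint32_t, 36> kCubeCorners = {
    0, 1, 3,  0, 3, 2,   // z = 0 face
    4, 5, 7,  4, 7, 6,   // z = 1 face
    0, 1, 5,  0, 5, 4,   // y = 0 face
    2, 3, 7,  2, 7, 6,   // y = 1 face
    0, 2, 6,  0, 6, 4,   // x = 0 face
    1, 3, 7,  1, 7, 5    // x = 1 face
};

// Builds the "gigantic" flat index buffer: cube i references 8*i .. 8*i+7.
std::vector<std::uint32_t> buildFlatIndexBuffer(std::uint32_t cubeCount) {
    assert(cubeCount <= 0xFFFFFFFFu / 8u);  // keep 8*cube + corner inside 32 bits
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(cubeCount) * kCubeCorners.size());
    for (std::uint32_t cube = 0; cube < cubeCount; ++cube) {
        for (std::uint32_t corner : kCubeCorners) {
            indices.push_back(8u * cube + corner);
        }
    }
    return indices;
}

struct Decoded {
    std::uint32_t cube;    // which transform / material to fetch (vid / 8)
    std::uint32_t corner;  // which of the 8 shared corners to fetch (vid % 8)
};

// Mirrors the vid/8 and vid%8 math from the quoted shader.
Decoded decode(std::uint32_t vertexId) {
    return { vertexId / 8u, vertexId % 8u };
}
```

With this layout the divisor is 8 (corners per cube), because the value that arrives in the shader is the stored index, not the position inside the index buffer.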

 



DrawBoneShapes_VSOut DrawShapeColored_unique_sequenz_VertexShader(DrawShapeColored_VSIn input, uint vertexID : SV_VertexID)
{
    uint a_Material_idx = vertexID;

    // Test case: cube
    // has 8 vertices
    // has 6 sides * 2 triangles * 3 vertices = 36 indices
    // vertexID never gets greater than 7 (I assume), so the attributes of cube 0 are always displayed
    a_Material_idx = vertexID/36;

    // manually select the attributes of the 26th cube (this works fine!)
    a_Material_idx = 25;

    DrawBoneShapes_VSOut Output;

    float4x4 WorldViewProj_Shape = mul(BoneShapeList_WVP[a_Material_idx], WorldViewProj);
    
    Output.Position     = mul(input.Position, WorldViewProj_Shape);
    Output.Normal       = float3(0, 0, 0);
    Output.TexCoord     = float2(0, 0);
    Output.Depth        = 1;

    Output.Material_idx = a_Material_idx;
    return Output;
}

 


Basically, there are two ways your SV_VertexID can be filled:

  1. Draw(): the system automatically generates vertex IDs from 0 to vertexCount - 1
  2. DrawIndexed(): the vertex ID is read from your index buffer. If your index buffer contains indices for an 8-vertex cube, you don't have to do a vertexID % 8 in the shader

If you have an 8-vertex vertex buffer and a 12-triangle index buffer representing 1 mesh (a cube with shared vertices, let's say) and want to instance it 3000 times using DrawIndexedInstanced, your vertex shaders will see the following.

All 8 vertices of cube K will see SV_InstanceID = K, where K goes from 0 to 2999.

Every first vertex of any cube will see SV_VertexID = 0, every second vertex of any cube will see SV_VertexID = 1, etc... up to 7.

In this simple case, there will be 3000x8 vertex shader invocations in total.

You don't have to prepare an index buffer containing 3000x12 triangles to render 3000 cubes. No. But you do have to provide a buffer with 3000 elements containing your material index, for example. This could be a constant buffer or a structured read-only buffer (SRV), or even a vertex buffer, with D3D11_INPUT_ELEMENT_DESC::InputSlotClass = D3D11_INPUT_PER_INSTANCE_DATA.
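
The ID pattern described above can be modelled on the CPU. This is only a sketch: it assumes zero start and base offsets, and the names Invocation, simulateDrawIndexedInstanced and countUniqueInvocations are made up for illustration. The 3000x8 figure corresponds to counting each (instance, vertex) pair once, i.e. ideal reuse of shared vertices; real hardware may shade some of them more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct Invocation {
    std::uint32_t instanceId;  // what the shader would see as SV_InstanceID
    std::uint32_t vertexId;    // what the shader would see as SV_VertexID
};

// Models the IDs of an instanced, indexed draw with all offsets set to 0:
// every instance walks the same index buffer from the start.
std::vector<Invocation> simulateDrawIndexedInstanced(
        const std::vector<std::uint32_t>& indexBuffer,
        std::uint32_t instanceCount) {
    assert(!indexBuffer.empty());
    std::vector<Invocation> out;
    out.reserve(indexBuffer.size() * instanceCount);
    for (std::uint32_t instance = 0; instance < instanceCount; ++instance) {
        for (std::uint32_t index : indexBuffer) {
            out.push_back({ instance, index });
        }
    }
    return out;
}

// Counts distinct (instance, vertex) pairs, i.e. the number of vertex shader
// invocations if every shared vertex were shaded exactly once per instance.
std::size_t countUniqueInvocations(const std::vector<Invocation>& all) {
    std::set<std::pair<std::uint32_t, std::uint32_t> > unique;
    for (const Invocation& inv : all) {
        unique.insert(std::make_pair(inv.instanceId, inv.vertexId));
    }
    return unique.size();
}
```

For a 36-index cube over 8 corners and 3000 instances, this model yields 108,000 index fetches but only 24,000 distinct pairs, which matches the 3000x8 count given above.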


@pcmaster

Thanks for your reply. You describe the typical way to draw instanced, but with small vertex buffers this method has been shown to be slow,
as the blog post linked in my first post also shows.

@turanszkij

Thanks for your answer as well, but as in the linked blog post I need to divide SV_VertexID by 36 to get the index of the shape whose color or transform matrix I want to look up in the attribute constant buffer.

If SV_VertexID does NOT give the actual index in the range of the full index buffer, how could the author of the blog post do it?

 

It took me a few reads of Method 2: The “Instancing Sucks” Way to grasp what he's doing there.

He still wants to only have 8 vertices (not duplicated) and he uses a FLAT index buffer. If he wants 250000 cubes, that's 250000 * 6 * 2 triangles, he needs 250K * 12 * 3 = 9M indices, if I count right.

He'd issue a DrawIndexed(9000000, 0, 0) draw-call, which will actually generate SV_VertexID 0 through 8999999!

Now the deal is that you must ensure that the input assembler (IA) won't read past the end of your VB. I reckon that the best way is to bind NO vertex buffer to the IA at all and also no vertex declaration / layout. There'd be no VB at all!

EDIT: I'm rewriting this for the third time :) I think you don't even need an index buffer at all. No VBs, no IBs. Just a (constant or SRV) buffer with the vertices to be sampled by manual indexing plus an instance buffer (SRV) with the matrices and/or other per-instance data. No need for DrawIndexed, a simple Draw(9M) would be enough!

Note: There's no way to access the index of the vertex-index (the index into the index buffer) nor SV_PrimitiveID inside HLSL vertex shaders.
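
The arithmetic for the buffer-less variant can be checked with a small sketch. It assumes a non-indexed draw of 36 vertices per cube starting at vertex 0, so the ID simply counts up; kVerticesPerCube, vertexCountForCubes and decodeNonIndexed are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// 6 sides * 2 triangles * 3 vertices per cube when nothing is shared.
const std::uint32_t kVerticesPerCube = 6u * 2u * 3u;

// Vertex count to pass to a non-indexed draw for the given number of cubes.
std::uint32_t vertexCountForCubes(std::uint32_t cubeCount) {
    assert(cubeCount <= 0xFFFFFFFFu / kVerticesPerCube);  // stay inside 32 bits
    return cubeCount * kVerticesPerCube;
}

struct CubeVertex {
    std::uint32_t cube;  // index into the per-cube data (matrices, materials)
    std::uint32_t slot;  // 0..35, index into a 36-entry position table
};

// With a counting vertex ID, the cube is id / 36 and the slot is id % 36.
CubeVertex decodeNonIndexed(std::uint32_t vertexId) {
    return { vertexId / kVerticesPerCube, vertexId % kVerticesPerCube };
}
```

For 250,000 cubes this gives 9,000,000 vertices, and the last ID, 8,999,999, decodes to cube 249,999, slot 35.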

 

There's one downside to the index-less approach I described. There will be no vertex cache use and even shared vertices will be reshaded, because they are unique. Also, on some HW, the fetch performance from a SRV or CBV might be worse than letting the IA fetch the VB and IB. The performance hit will depend on what kind of topology you're rendering.

You don't have to use a vb, ib, cbv or srv at all. You can do what you want. You can render 1000's of cubes just from SV_VertexID. Presumably you'd still use a CBV for projection/view matrix but could just build the mesh based on SV_VertexID if you want.

Rendering a grid without any buffers:


PS_IN VSMain(in uint index : SV_VertexID)
{
    PS_IN output = (PS_IN)0;

    float4 pos = float4(0, 0, 0, 1);

    uint blockIndex = index / 6;
    uint pointIndex = index % 6;

    if (pointIndex == 1)
    {
        pos.x += 1;
    }
    if (pointIndex == 2)
    {
        pos.x += 1;
        pos.y += 1;
    }
    if (pointIndex == 4)
    {
        pos.x += 1;
        pos.y += 1;
    }
    if (pointIndex == 5)
    {
        pos.y += 1;
    }

    pos.x += blockIndex % (MAP_SIZE);
    pos.y += blockIndex / (MAP_SIZE);

.

.

.
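
For reference, a CPU-side port of the visible part of the grid shader above, as a sketch. It covers only the position math shown before the elided part; Float2 and gridVertex are made-up names, and mapSize stands in for MAP_SIZE, whose real value is not given here.

```cpp
#include <cassert>
#include <cstdint>

struct Float2 {
    float x;
    float y;
};

// Reproduces the if-chain of the shader as a lookup table: six vertices per
// grid cell forming the triangles (0,0),(1,0),(1,1) and (0,0),(1,1),(0,1).
Float2 gridVertex(std::uint32_t index, std::uint32_t mapSize) {
    assert(mapSize > 0);
    static const Float2 kCorner[6] = {
        { 0.0f, 0.0f }, { 1.0f, 0.0f }, { 1.0f, 1.0f },
        { 0.0f, 0.0f }, { 1.0f, 1.0f }, { 0.0f, 1.0f }
    };
    const std::uint32_t blockIndex = index / 6u;  // which grid cell
    const std::uint32_t pointIndex = index % 6u;  // which vertex of the cell

    Float2 pos = kCorner[pointIndex];
    pos.x += static_cast<float>(blockIndex % mapSize);  // column of the cell
    pos.y += static_cast<float>(blockIndex / mapSize);  // row of the cell
    return pos;
}
```

Calling gridVertex for every ID from 0 to 6 * mapSize * mapSize - 1 would enumerate the whole grid, which is what a non-indexed draw with that vertex count and no bound buffers would feed to the shader.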

 

Rendering cubes with a buffer only to locate each cube:


static float3 CUBE_VERTS[24] = { float3(-1, 0, -1), float3(0, -1, 0), float3(-1, 0, 1),
float3(-1, 0, 1), float3(0, -1, 0), float3(1, 0, 1),
float3(1, 0, 1), float3(0, -1, 0), float3(1, 0, -1),
float3(1, 0, -1), float3(0, -1, 0), float3(-1, 0, -1),
float3(-1, 0, -1), float3(0, 1, 0), float3(-1, 0, 1),
float3(-1, 0, 1), float3(0, 1, 0), float3(1, 0, 1),
float3(1, 0, 1), float3(0, 1, 0), float3(1, 0, -1),
float3(1, 0, -1), float3(0, 1, 0), float3(-1, 0, -1),
};

PS_INPUT VSMain(in uint index : SV_VertexID)
{
    PS_INPUT o = (PS_INPUT)0;

    o.vPosition.w = 1;

    o.vPosition.xyz = CUBE_VERTS[index % 24] * SCALE;

.

.

.

To render thousands of cubes, say, calculate their world positions using whatever system you want: sampling a buffer, a texture, algorithmically (like a noise function), etc. In DX11, sampling a texture is probably a good choice, but passing a buffer is fine too. I have also used a technique where you generate the cube in the geometry shader, using each vertex as the cube position, and then stream out an entire cube per vertex, which is kind of a hack but works. It is not efficient but is convenient for voxel editing tools.

 

You can also render using SV_VertexID where a texture sample provides the vertex info. This is from an old system I used, showing how I packed vertex info into a texture and used the index to reference which pixel to sample:

[Image: MeshToTex]

 

Or like this even for animations, again no vertex or index buffer was provided, just SV_VertexID was used and a texture sampled for the vertex information:

 

[Image: Packing1]

 

The reason for doing it this way was so that you don't have to load different buffers for each mesh. A texture is ultimately just another buffer and in some ways is a good choice, especially pre-DX12 or if you want to interpolate (not really relevant in the above example).

 

Also, I think one of the posts above misrepresents things: using an index buffer only saves on vertex shader calls, not on later shader processing, so index buffers are not magically better in situations like this, where it might be convenient not to use one. Also, don't confuse indexed with instanced rendering. The post above claiming that instanced rendering of small buffers is less efficient should be backed up with a source, because I'm not sure that is true (unless you are only rendering a few instances).

 

IN SUMMARY: You don't have to follow the traditional vertex buffer pipeline although the standard techniques are standard for a reason and if something was superior, it would probably be documented somewhere. Still, it doesn't hurt to experiment and see for yourself.

Hi pcmaster and TeaTreeTim,

You are both very kind to give such comprehensive answers. Although I am experienced in DirectX 11, I have big question marks when it comes to the special cases (binding no vertex buffer, no index buffer, not setting an input layout...?).

I understand your explanations quite well, especially TeaTreeTim's code example, but the devil is in the implementation details.

Quote

He'd issue a DrawIndexed(9000000, 0, 0) draw-call, which will actually generate SV_VertexID 0 through 8999999!

This failing was the source of my problem. I only got an SV_VertexID range of e.g. [0..7].

So what exactly are the conditions under which I get a full-range, counted-up SV_VertexID?

Are the following assumptions correct?

a. Using DrawIndexed(1000, 0, 0), there has to be an index buffer and a vertex buffer, so the SV_VertexID range is e.g. [0..7].

b. Using Draw(1000) with a vertex buffer attached, the SV_VertexID range is e.g. [0..7].

c. Using Draw(1000) without a vertex buffer attached, the vertex shader is magically called 1000 times and
  the SV_VertexID range is e.g. [0..999]. Is this right?

  So in case c:

  The Microsoft manual states that SV_VertexID can only be input into the first active shader.

  1. When using a vertex shader, I have to pass the SV_VertexID explicitly to the pixel shader.
  2. When using no vertex shader, I can use the vertex ID directly in the pixel shader.
 

Quote

Other system values (SV_VertexID, SV_InstanceID, SV_IsFrontFace) can only be input into the first active shader in the pipeline that can interpret the particular value; after that the shader function must pass the values to subsequent stages.

You can't have a PS without a VS, so for your purposes, VS will be always the first active shader stage and you'll see SV_VertexID in it, don't worry :) Don't confuse VB and VS (just in case).

Assumption A is ambiguous. The SV_VertexID will be exactly what's stored in the index buffer. If your IB looks like 0,1,2,3,4,5,6,7, 0,1,2,3,4,5,6,7, 0,1,2,3,4,5,6,7, 0,1,2,3,4,5,6,7, then yes.

Assumption B is incorrect. The SV_VertexID will always run up to the argument of the Draw call (not DrawIndexed). So for Draw(1000), you'll see SV_VertexID from 0 to 999.

Just in case, note that the Draw() argument is the number of vertices, whereas the DrawIndexed() argument is the number of indices and DrawIndexedInstanced wants the number of indices and instances.

Assumption C = There's no magic.

If there's also a vertex buffer attached (or several vertex buffers, as defined by the input layout declaration), IA might be reading past the end of the vertex buffer (not good). The VertexID doesn't depend on the size of the VB(s) attached.

Just don't attach any vertex buffers and don't attach any index buffer.
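
The difference between the two draw calls can be modelled in a few lines. This is a sketch under the rules stated in this thread (a counting ID for Draw, the stored index value for DrawIndexed), with all start and base offsets assumed to be 0; vertexIdsForDraw and vertexIdsForDrawIndexed are hypothetical helper names, not D3D11 API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models a non-indexed draw starting at vertex 0: IDs count 0 .. vertexCount-1,
// regardless of what is (or is not) bound as a vertex buffer.
std::vector<std::uint32_t> vertexIdsForDraw(std::uint32_t vertexCount) {
    std::vector<std::uint32_t> ids;
    ids.reserve(vertexCount);
    for (std::uint32_t i = 0; i < vertexCount; ++i) {
        ids.push_back(i);
    }
    return ids;
}

// Models an indexed draw with zero offsets: each ID is the value stored in
// the index buffer, so the range depends on the buffer's contents only.
std::vector<std::uint32_t> vertexIdsForDrawIndexed(
        const std::vector<std::uint32_t>& indexBuffer,
        std::uint32_t indexCount) {
    assert(indexCount <= indexBuffer.size());
    return std::vector<std::uint32_t>(indexBuffer.begin(),
                                      indexBuffer.begin() + indexCount);
}
```

In this model, an index buffer that repeats 0..7 never yields an ID above 7, so an expression like vertexID / 36 evaluates to 0 for every vertex, which would explain always getting the attributes of shape 0 in the original shader.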

I can't try this right now, so you'll have to try if a nullptr will work for IASetInputLayout (if not, declare a layout with 0 input elements). I also don't know about having an index buffer but no vertex buffer.

 

 

Also, as @TeaTreeTim mentioned, I cannot confirm that rendering too small meshes instanced (with DrawIndexedInstanced) is a performance problem. It isn't. It's totally fine to render e.g. 2-triangle (1-quad, 4-verts, 6-index for example) billboards (made up example). The input vertex buffer data will be cached just fine (if any!) and the vertex shader invocations are nicely 'packed' into 64-thread waves, at least that's what I observed on AMD GCN.

Hello pcmaster,

thank you very much for your last post. Now I understand how it works.

I have already implemented it in my code and it works fine.

I asked myself how to set a null vertex buffer (vertex buffer binding), but actually this is done by setting a null input layout.

When using an index buffer, which is not strictly necessary, we have to create a static float4 vertex position array [8] (e.g. for a cube) in the vertex shader. (This I have implemented quick and dirty.)

When using NO index buffer, we have to create a static float4 vertex position array [36] (e.g. for a cube).
This I will try out next.

So, as you all said: a solution with no vertex buffer and no index buffer!

The attributes of each cube can be read from other bound buffers using the self-calculated instance index.
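
One way to get from the 8-entry array to the 36-entry array is to expand the corners through a fixed index list once, on the CPU, and paste the result into the shader as the static table. The sketch below assumes a unit cube with corner c at x = c & 1, y = (c >> 1) & 1, z = (c >> 2) & 1; Float4, cornerPosition, kCubeIndices and expandCube are made-up names, and the winding is not tuned for culling.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Float4 {
    float x, y, z, w;
};

// The 8 shared corners of a unit cube (the [8] variant of the table).
Float4 cornerPosition(std::uint32_t corner) {
    assert(corner < 8u);
    return { static_cast<float>(corner & 1u),
             static_cast<float>((corner >> 1) & 1u),
             static_cast<float>((corner >> 2) & 1u),
             1.0f };
}

// 12 triangles over the 8 corners, two per face.
const std::array<std::uint32_t, 36> kCubeIndices = {
    0, 1, 3,  0, 3, 2,   // z = 0 face
    4, 5, 7,  4, 7, 6,   // z = 1 face
    0, 1, 5,  0, 5, 4,   // y = 0 face
    2, 3, 7,  2, 7, 6,   // y = 1 face
    0, 2, 6,  0, 6, 4,   // x = 0 face
    1, 3, 7,  1, 7, 5    // x = 1 face
};

// Expands the corners into the [36] variant used without an index buffer:
// entry i is the position to return for vertexID % 36 == i.
std::array<Float4, 36> expandCube() {
    std::array<Float4, 36> positions{};
    for (std::size_t i = 0; i < kCubeIndices.size(); ++i) {
        positions[i] = cornerPosition(kCubeIndices[i]);
    }
    return positions;
}
```

The trade-off mentioned earlier in the thread still applies: the 36-entry table needs no index buffer, but shared corners are shaded once per triangle that uses them.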

greetings evelyn

