Triangles can't keep up?

25 comments, last by Hodgman 9 years, 3 months ago
On modern hardware, yep, you can just get the vertex ID in your shader, divide by 4 (or 6 if non-indexed - there's 6 verts in a quad made of triangles) and then use that resulting value as an index into your position buffer, which you fetch from manually. Not much point relying on the fixed function features when you can do it yourself now.

On modern AMD hardware, there is no fixed-function vertex attribute fetching in hardware anyway! The driver takes the vertex attribute layout you've described through the API, compiles it into a shader subroutine that does the fetches using regular shader logic, and prepends it to your vertex shader.


With total manual fetching, don't bind any attributes; bind the buffers to uniforms instead. Then draw triangles with the primitive-count in the draw function being 2*numPoints (or the vertex count being 6*numPoints) - what you want to do is basically expand each source point into 6 vertices.
Then in the shader:
TexCoord/Corner index = VertexID%6;
Position index = VertexID/6;
Fetch from the buffers using those indices.
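The draw-call side of that could look roughly like this - just a sketch with made-up names (emptyVao, particleProgram, numPoints), with the buffers themselves exposed to the shader separately (e.g. as buffer textures, covered further down):

// No vertex attributes at all: an empty VAO (core profile still needs one bound),
// a vertex shader that derives its own indices from gl_VertexID, and 6 verts per point.
glBindVertexArray(emptyVao);                    // VAO with no attribute arrays enabled
glUseProgram(particleProgram);                  // this shader does the manual fetching
glDrawArrays(GL_TRIANGLES, 0, 6 * numPoints);   // gl_VertexID runs 0 .. 6*numPoints-1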


The fixed function names for the feature are: Stream source frequency (D3D9), Vertex attribute divisor (GL3), Instance data step rate (D3D11).
I can't remember the GL2 name, it was probably some wacky extension.

On DX9/GL2 era fixed function, the HW's logic when fetching vertex data was:
VertexBufferOffset = StartVertex / Divider + VertexIndex / Divider * StreamStride + StreamOffset

This meant that if Divider=1 then it was a per-vertex attribute, if Divider=NumVertsInDrawCall then it was a per-instance attribute.
If Divider was N or N*NumVertsInDrawCall then it was per N-verts/N-instances.

So, you could bind tex-coords as a per-vertex attribute and quad-center positions as a per-4/6-vertices attribute.
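For example, plugging in StreamOffset=0, StartVertex=0, StreamStride=12 and Divider=6: vertices 0-5 all read the element at byte offset 0, vertices 6-11 read byte offset 12, and so on - i.e. exactly the "one quad-center position per 6 vertices" behaviour described above.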


On GL3/D3D11, this feature has somehow become less powerful :(
Instead, the new logic is that if Divider=0 it's per-vertex, if Divider=1 it's per-instance, and if Divider=N it's per-N-instances.
i.e. There's no way to do per-N-vertices attributes via fixed function on the new APIs :(

However, you can still do instanced quad rendering. You have 6 values in an index buffer, for the verts of the two triangles that make up the quad. You then have 4 tex-coords in a per-vertex attribute. You then bind the positions as a per-instance attribute, and submit a draw-instanced command specifying two triangles and the number of quads as the number of instances.
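On the GL side, the setup for that could look something like this - a sketch with assumed names (cornersVbo, positionsVbo, quadIbo, numQuads) and attribute locations 0/1:

// Per-vertex corner/tex-coord data: 4 entries, advances every vertex.
glBindBuffer(GL_ARRAY_BUFFER, cornersVbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0);
// Per-instance quad-center position: advances once per instance.
glBindBuffer(GL_ARRAY_BUFFER, positionsVbo);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, 0);
glVertexAttribDivisor(1, 1);
// 6 indices describing the two triangles of the quad, e.g. 0,1,2, 2,1,3.
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, quadIbo);
glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0, numQuads);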


TL;DR - GL2/D3D9 have a cool fixed-function feature you can use; on modern GPUs you should probably just do it yourself in the shader, though you also have the option of instancing.

I'm targeting OpenGL 3.3, so my questions will be focused around that area.

On modern hardware, yep, you can just get the vertex ID in your shader, divide by 4 (or 6 if non-indexed - there's 6 verts in a quad made of triangles) and then use that resulting value as an index into your position buffer, which you fetch from manually. Not much point relying on the fixed function features when you can do it yourself now.

With total manual fetching, don't bind any attributes; bind the buffers to uniforms instead. Then draw triangles with the primitive-count in the draw function being 2*numPoints (or the vertex count being 6*numPoints) - what you want to do is basically expand each source point into 6 vertices.
Then in the shader:
TexCoord/Corner index = VertexID%6;
Position index = VertexID/6;
Fetch from the buffers using those indices.


I interpret this two different ways. In the top portion you make it seem like I have the ability to do:


//In vertex shader main
int index = gl_VertexID % 4;
gl_Position[index] = vec4(1,1,1,0);

Now that to me sounds like a pipe dream, so if I can actually do this, please let me know. When you mention the portion about uniforms I can see this making more sense.
My question here is: do you literally mean I can have a VBO sent through a uniform, because I know there are calls like glUniform4uiv, which will allow me to send in an array of values? Or are you talking about making my VBO bind as a uniform buffer object (UBO)?

On GL3/D3D11, this feature has somehow become less powerful :(
Instead, the new logic is that if Divider=0 it's per-vertex, if Divider=1 it's per-instance, and if Divider=N it's per-N-instances.
i.e. There's no way to do per-N-vertices attributes via fixed function on the new APIs :(

However, you can still do instanced quad rendering. You have 6 values in an index buffer, for the verts of the two triangles that make up the quad. You then have 4 tex-coords in a per-vertex attribute. You then bind the positions as a per-instance attribute, and submit a draw-instanced command specifying two triangles and the number of quads as the number of instances.


After re-reading your post a few times I started to think maybe this is what you meant: that I could just use the glVertexAttribDivisor call, but not actually use one of the glDraw** instance calls. But when looking at the reference docs I see it says: "modify the rate at which generic vertex attributes advance during instanced rendering" :(

Do you think that matters? I have seen some documentation say that it just advances the attribute faster.

I could use instancing, but I've heard, and get the notion, that this is a trap: that the performance is poor unless I'm doing a certain number of vertices per object, and that instancing a quad is not worth it. Something best avoided, much like a geometry shader. Maybe I have my wires crossed here?

I could use instancing, but I've heard, and get the notion, that this is a trap: that the performance is poor unless I'm doing a certain number of vertices per object, and that instancing a quad is not worth it. Something best avoided, much like a geometry shader. Maybe I have my wires crossed here?

Yes, the performance wins with instancing mostly appear when you have many vertices per instance. Only having 4 verts per instance is not ideal... but it might still be worth it because it makes your code for drawing particles very simple -- one buffer with per-vertex data (just 4 tex-coords/corner values), one buffer with per-particle data (position/etc).

At the moment I am actually using this instancing technique to draw a crowd of 100000 people (each person is a textured quad, so 100k instances of a 4-vertex mesh) - so the performance is not terrible, it's just not as good as it could be in theory.

That I could just use the glVertexAttribDivisor call, but not actually use one of the glDraw** instance calls

On GL2/D3D9 you can do this... On GL3/D3D10 you have to use instancing and the per-instance divisor (or do it yourself in a shader).

I interpret this two different ways. In the top portion you make it seem like I have the ability to do:
//In vertex shader main
int index = gl_VertexID % 4;
gl_Position[index] = vec4(1,1,1,0);

No, more like


int cornerIndex = gl_VertexID % 4;
int particleIndex = gl_VertexID / 4;
vec3 position = u_PositionBuffer[particleIndex];
vec2 texcoord = u_VertexBuffer[cornerIndex];
gl_Position = mvp * vec4(position + vec3(texcoord * 2.0 - 1.0, 0.0), 1.0);

My question here is do you literally mean I can have a VBO sent through a uniform?

Shaders can have raw uniforms (e.g. a vec4 variable), UBOs (buffers that hold a structure of raw uniforms), Textures (which you can sample/load pixels from) and yes, VBOs (which you can load elements from).
In D3D, you can't directly bind textures/buffers (resources) to shaders - you can only bind 'views' of those resources. So once you have a Texture or Buffer resource, you create a "Shader Resource View" for it, and then you bind that shader-resource-view to the shader. This means that binding a buffer to a shader is exactly the same as binding a texture to a shader -- they're both just "resource views".

In GL it's a bit different. In GL there are no "resource views"; instead, you can just bind texture resources to shaders directly (I assume you know how to do this already - texturing is important :D).

In order to bind a buffer to a shader, you have to make GL think that it is a texture. You do this by making a "buffer texture" object, which links to your VBO but gives you a new texture handle! You can then bind this to the shader like any other texture, but internally it'll actually be reading the data from your VBO. In your shader, you can read vertices from your buffer by using the texelFetch GLSL function, e.g.


uniform samplerBuffer u_positionBuffer;
...
vec3 position = texelFetch( u_positionBuffer, index ).xyz;
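Creating that buffer texture on the GL side is only a few calls - this is a sketch, where positionVbo and program are assumed names, and the VBO is assumed to hold vec4/RGBA32F data per particle:

GLuint positionTbo;                                       // texture handle that views the VBO
glGenTextures(1, &positionTbo);
glBindTexture(GL_TEXTURE_BUFFER, positionTbo);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, positionVbo);  // interpret the VBO as RGBA32F texels
// At draw time, bind it like any other texture:
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_BUFFER, positionTbo);
glUniform1i(glGetUniformLocation(program, "u_positionBuffer"), 0);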

On GL3/D3D10 you have to use instancing and the per-instance divisor (or do it yourself in a shader).


When you talk about 'doing it yourself in the shader', I assume you're talking about using the Texture Buffer Objects (TBOs) you mentioned below that portion. I have a few questions about limitations:

1. Since it is a texture, does that mean values over 255 can't be used without some special magic, or is this just nonsense?
2. In a TBO I see that I can only access up to the value of GL_MAX_TEXTURE_BUFFER_SIZE, which must be at least 65536. So restricting myself to that minimum, I believe that means I could never hit a 1 million particle count, correct?
1 - no, this depends on the formats that you specify
You should be able to store float4's in your buffer if you like.
BTW you can also store float4's in "normal" textures too - there's a lot more texture formats than 8-bit available.

2 - the extension spec decided to give a low minimum of 2^16, but IIRC it's expected that all vendors will support something more like 2^24.
So assuming the GPU/driver are sensible, you should be able to do 1m particles in a draw call. If not though, you can fall-back to doing multiple draw-calls, with each one binding the buffers with different offsets.

Get that value on your PC and see if it's reported as being much higher than 65k.
FWIW, I'm not aware of this kind of restriction under D3D, and the same GPUs/drivers also run D3D programs fine :)
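Querying it is straightforward, e.g. (a minimal C sketch):

GLint maxTexels = 0;
glGetIntegerv(GL_MAX_TEXTURE_BUFFER_SIZE, &maxTexels);   // limit is in texels, not bytes
printf("GL_MAX_TEXTURE_BUFFER_SIZE = %d\n", maxTexels);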

1 - no, this depends on the formats that you specify
You should be able to store float4's in your buffer if you like.
BTW you can also store float4's in "normal" textures too - there's a lot more texture formats than 8-bit available.


Now for this format I assume I should choose one that supports my highest Vec* value used in my shader. But what happens if I have mixed Vec* values? E.g. vec3 for position and just a float for particle lifespan. Would I need to use filler dummy values to offset, or will it automatically be 0 if nothing is specified?

2 - the extension spec decided to give a low minimum of 2^16, but IIRC it's expected that all vendors will support something more like 2^24.
So assuming the GPU/driver are sensible, you should be able to do 1m particles in a draw call. If not though, you can fall-back to doing multiple draw-calls, with each one binding the buffers with different offsets.

Get that value on your PC and see if it's reported as being much higher than 65k.

I checked what this value was on my dev PC (a 5+ year old laptop) and it's stupidly high - close to 300 million. Makes me wonder what hardware would only support that minimum value.

You do this by making a "buffer texture"

While we are on the subject of TBOs: I implemented these for the rendering portion of my system using a single TBO and an IBO. This kind of got me thinking - I know there is a technique to just use textures for updating particle states, so is there a point to using transform feedback? I know with transform feedback I can use the GPU for processing while the CPU does its own thing, but can't I do this with just using textures too? Or am I only limited to updating the texture through the CPU in a glBufferData call?

Now for this format I assume I should choose one that supports my highest Vec* value used in my shader. But what happens if I have mixed Vec* values? E.g. vec3 for position and just a float for particle lifespan. Would I need to use filler dummy values to offset, or will it automatically be 0 if nothing is specified?

You can use a vec4 to hold a position + a lifespan.
It doesn't really matter though as long as each element is 4-byte aligned. i.e. a vec3 + a float will be fine as a 12-byte and a 4-byte element. Alternatively you can use a 16-byte vec4, but it's much the same.
If you're using smaller types though, it can end up being more vital to pack things together.
e.g. 16-bit x 3, 8-bit x 1, 8-bit x 2 and 8-bit x 3 are unusable vertex/pixel formats, as they result in bad alignments.
Strangely, 16-bit x 1 seems to be the exception, where 2-byte alignment is allowed.
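To make that concrete, a 12-byte position plus a 4-byte lifespan pack naturally into a 16-byte element, e.g. (illustrative C struct, names are mine):

typedef struct {
    float position[3];   // 12 bytes, each float 4-byte aligned
    float lifespan;      // 4 bytes - no padding needed, element is 16 bytes total
} ParticleVertex;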

While we are on the subject of TBOs: I implemented these for the rendering portion of my system using a single TBO and an IBO. This kind of got me thinking - I know there is a technique to just use textures for updating particle states, so is there a point to using transform feedback? I know with transform feedback I can use the GPU for processing while the CPU does its own thing, but can't I do this with just using textures too? Or am I only limited to updating the texture through the CPU in a glBufferData call?

Yeah holding the state in a texture is completely valid too. In this technique, you'd have a large 2D texture (or many large textures) holding the state of each particle.
e.g. if you required 8 floats to hold a particle's state, and you had 1M particles, you could use 2 textures that were each RGBA_FP32 and 1024*1024.

To update the particles, you'd need to double-buffer the data, as you can't read/write a texture simultaneously using the graphics pipeline.
You'd bind one set of your textures to an FBO, and then render a quad that fills the viewport (covers all 1024*1024 pixels), reading texture-set A, computing the new state, and outputting to texture-set B.
Next update you'll read from B and write to A, etc...
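Put together, one update pass could look roughly like this - a sketch only, assuming two 1024*1024 state textures stateTex[0]/stateTex[1] with matching FBOs stateFbo[0]/stateFbo[1], an updateProgram that does the simulation in its fragment shader, a frame counter 'frame', and a hypothetical drawFullScreenQuad() helper:

int src = frame % 2, dst = 1 - src;                 // ping-pong between the two sets
glBindFramebuffer(GL_FRAMEBUFFER, stateFbo[dst]);   // write new state to set B
glViewport(0, 0, 1024, 1024);                       // one fragment per particle
glUseProgram(updateProgram);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, stateTex[src]);        // read old state from set A
glUniform1i(glGetUniformLocation(updateProgram, "u_prevState"), 0);
drawFullScreenQuad();                               // covers all 1024*1024 pixels
glBindFramebuffer(GL_FRAMEBUFFER, 0);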

To draw the particles, you render a million quads, and use their quad ID (vertex ID / 4) to generate a texture coordinate, then read from your textures in the vertex shader to get the particle's properties.

n.b. Instead of using the graphics pipeline, on modern (DX11/GL4) GPUs, you could store the particle state in a buffer or texture, and update it using a compute shader, which is allowed to read+write the same resource!

Some older (DX9/GL2) GPUs may support reading from textures in the vertex shader, but might not support transform-feedback - so this technique could be a possible fallback for those GPUs.
Other DX9/GL2 GPUs don't support VTF (reading textures within vertex shaders) nor do they support transform-feedback... but some support a weird extension known as R2VB (render to vertex buffer) that lets you bind a vertex buffer to an FBO, so the pixel shader is actually outputting vertices!

So:
1 - GL2/DX9 A) No way to have the GPU generate vertices -- CPU generated particles only.
2 - GL2/DX9 B) Has R2VB -- Fragment shader writes particle state to a vertex buffer.
3 - GL2/DX9 C) Has VTF -- Fragment shader writes particle state to a texture, vertex shader reads values from texture.
4 - GL3/DX10) Has transform-feedback -- Vertex shader writes particle state to a vertex buffer.
5 - GL4/DX11) Has compute -- Compute shader writes particle state to a generic buffer (or texture).

#2 era also supports #1
#3 era also supports #1
#4 era also supports #1, #3
#5 era also supports #1, #3, #4

