Endar

Multiple Vertex Types in large studios


I'm getting to the point in my own engine where I'm rendering to a texture and then rendering that texture to the screen, which necessitates at least two vertex types. There's the type I use while rendering normal meshes, and then there's the simpler type I use when rendering the texture to a screen-sized quad.

[source lang="cpp"]
// Vec3, u32 and f32 are the engine's vector and sized-scalar typedefs.

// Full vertex used for normal meshes.
typedef struct Vertex
{
    Vec3 pos;
    Vec3 normal;
    u32  colourARGB;
    f32  u;
    f32  v;
} Vertex;

// Simpler vertex used for drawing the texture to a screen-sized quad.
typedef struct ScreenVertex
{
    f32 x; // byte offset 0
    f32 y;
    f32 z;
    f32 h;
    f32 u; // byte offset 16
    f32 v;
} ScreenVertex;
[/source]

Something like that. What I was wondering is how often larger studios use multiple vertex types. I mean, most of them would need at least these two, but would they compile mesh files that use several different vertex types, or is the usual practice to have a single type that supports everything they'll ever need, like multiple sets of texture coordinates and so on?
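For reference, in D3D9 terms (an assumption; DX comes up later in the thread) the first struct would map to a vertex declaration along these lines. A sketch; the offsets assume no padding in the struct:

[source lang="cpp"]
// Sketch: a D3D9 vertex declaration matching the Vertex struct above.
D3DVERTEXELEMENT9 elements[] =
{
    // stream, byte offset, type, method, usage, usage index
    { 0,  0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
    { 0, 12, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,   0 },
    { 0, 24, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0 },
    { 0, 28, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
    D3DDECL_END()
};

IDirect3DVertexDeclaration9* decl = NULL;
device->CreateVertexDeclaration(elements, &decl); // 'device' is an IDirect3DDevice9*
[/source]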

Krypt0n
Our internal API allows binding streams, and before a draw call is issued a fitting vertex declaration is selected (and, if needed, created) based on the streams that are currently set and the shader's inputs.

Having exactly the streams a shader needs can gain you performance; the same is true for the interpolators that you pass from the vertex shader to the pixel shader.
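A minimal sketch of that idea, assuming D3D9 (the key encoding and the buildDeclaration helper are hypothetical):

[source lang="cpp"]
#include <map>
#include <utility>

// Hypothetical key identifying a stream/shader-input combination.
struct DeclKey
{
    u32 boundStreams; // bitmask of the attribute streams currently set
    u32 shaderInputs; // bitmask of the attributes the vertex shader reads

    bool operator<(const DeclKey& o) const
    {
        if (boundStreams != o.boundStreams) return boundStreams < o.boundStreams;
        return shaderInputs < o.shaderInputs;
    }
};

// Hypothetical helper: fills a D3DVERTEXELEMENT9 array from the key and
// calls IDirect3DDevice9::CreateVertexDeclaration().
IDirect3DVertexDeclaration9* buildDeclaration(const DeclKey& key);

std::map<DeclKey, IDirect3DVertexDeclaration9*> g_declCache;

// Called just before each draw call: reuse a fitting declaration, or
// create one the first time this combination is seen.
IDirect3DVertexDeclaration9* selectDeclaration(const DeclKey& key)
{
    std::map<DeclKey, IDirect3DVertexDeclaration9*>::iterator it = g_declCache.find(key);
    if (it == g_declCache.end())
        it = g_declCache.insert(std::make_pair(key, buildDeclaration(key))).first;
    return it->second;
}
[/source]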

There are other times you might switch vertex formats (besides your model vs screen-coord example) -- is it light-mapped (does it need a second set of UVs)? Is it animated (skinning weights/indices)? Does it have colours or lighting (e.g. AO) baked into a vertex channel? Is it normal-mapped (tangent/binormal required)?
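To make that concrete, such optional attribute sets are often described with per-mesh flags; a sketch (names hypothetical):

[source lang="cpp"]
// Hypothetical flags for the optional attribute sets a mesh may carry.
enum VertexFlags
{
    VF_LIGHTMAP_UV   = 1 << 0, // second set of UVs for light-mapping
    VF_SKINNING      = 1 << 1, // blend weights + bone indices for animation
    VF_VERTEX_COLOUR = 1 << 2, // baked colours/lighting such as AO
    VF_TANGENT_FRAME = 1 << 3  // tangent/binormal for normal mapping
};
[/source]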

Endar
Yeah, I had completely forgotten about DX being able to render from multiple streams, and was stuck thinking that all these companies had 12 different vertex formats floating around, which would be crazy (I think so, at least). Having streams of normals, positions, colours, etc. would make it much easier to render things with multiple formats.

Hodgman
Note that the convenience of using many streams does have a cost though -- interleaved streams ([i]i.e. a stream with multiple elements, such as position/normal/etc[/i]) are friendlier on the cache and will make your vertex processing go [i]slightly[/i] faster, depending on the GPU etc...
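To illustrate the two layouts (array sizes arbitrary):

[source lang="cpp"]
// Interleaved: one stream, all of a vertex's attributes adjacent in memory,
// so fetching a whole vertex touches one contiguous run of bytes.
struct MeshVertex { Vec3 pos; Vec3 normal; f32 u, v; };
MeshVertex interleaved[1024];

// Separate streams: one tightly packed buffer per attribute; the GPU has to
// gather each vertex from several distinct memory locations.
Vec3 positions[1024];
Vec3 normals[1024];
f32  texcoords[1024][2];
[/source]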

Endar
Is it just the cache cost that will make it go slower? I mean, depending on the GPU, but in the general case, is that going to be the main thing affecting it, just having the cache thrashed? I assume that, since I've got all vertex elements in a single structure and have an array of that structure, I'm actually using interleaved streams?

phantom
While I can't speak for other companies, where I worked vertex formats and shaders were pretty much linked.

So a model would be processed by our tools pipeline, which would spit out a file with the (interleaved) vertex buffer data in it and a header which, on load, was processed to figure out what streams the file had so they could be bound later (reusing/creating the required information behind the scenes).

At the other end, the vertex shader had to match these streams; if it didn't, the model would render visibly wrong and one of us rendering monkeys would have to fix it. (You could probably add a validation step somewhere, however, to avoid the need for someone to spot it.)
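A sketch of the kind of header such a file might carry (layout entirely hypothetical):

[source lang="cpp"]
// Hypothetical compiled-mesh header: the loader reads it, works out which
// streams the file contains, and binds/creates the matching vertex
// declaration behind the scenes.
struct MeshFileHeader
{
    u32 magic;        // identifies the file format
    u32 version;      // tools-pipeline version that wrote the file
    u32 streamFlags;  // which attributes are present (position, normal, ...)
    u32 vertexCount;
    u32 vertexStride; // size in bytes of one interleaved vertex
    u32 indexCount;
    // the interleaved vertex data and the index data follow the header
};
[/source]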

[quote name='Endar' timestamp='1305555031' post='4811430']
Is it just the cache cost that will make it go slower? I mean, depending on the GPU, but in the general case, is that going to be the main thing affecting it, just having the cache thrashed?[/quote]

Look at it this way - your input to a vertex shader is a chunk of data containing position, normals, texcoords, and whatever else you may have. That has to come from somewhere, and if you've got multiple streams the GPU has to assemble it from those streams. Take stream 1, grab position, copy to VS registers, move forward 12 bytes. Take stream 2, grab normal, copy to VS registers, move forward 12 bytes. And so on until it has everything. On the other hand, with a single interleaved stream it goes: take the full vertex, copy to VS registers, move forward (vertexsize) bytes.

If the GPU is capable of processing multiple vertexes in parallel (and most are) then the multiple-stream approach has to hop back and forth between streams, moving forward an itty little bit at a time. With a single stream it's just a fast, linear, sequential scan and copy through a single block of data. So feeding data to the GPU in the order and layout it likes best will generally get you the best performance.
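In D3D9 terms (assuming that API; buffer names hypothetical) the two cases look like this:

[source lang="cpp"]
// Single interleaved stream: one buffer, one stride.
device->SetStreamSource(0, vbInterleaved, 0, sizeof(Vertex));

// Multiple streams: one buffer per attribute; the GPU assembles each vertex
// by fetching from all of them before the vertex shader can run.
device->SetStreamSource(0, vbPositions, 0, sizeof(Vec3));
device->SetStreamSource(1, vbNormals,   0, sizeof(Vec3));
device->SetStreamSource(2, vbTexcoords, 0, sizeof(f32) * 2);
[/source]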

[quote name='Endar' timestamp='1305555031' post='4811430']I assume that, since I've got all vertex elements in a single structure and have an array of that structure, I'm actually using interleaved streams?
[/quote]

Yup, that's it.

[quote name='phantom' timestamp='1305557854' post='4811448']
... vertex formats and shaders were pretty much linked.
So, a model would be processed by our tools pipeline and that would spit out a file with the (interleaved) vertex buffer data in it ... At the other end the vertex shader had to match these streams[/quote]

Can I ask how you deal with the case where a single model needs to be rendered using different shaders?


We've recently been using an engine that works in exactly the same way, but we found this approach restrictive when it came to experimenting with changes to the rendering pipeline.
e.g. a forward-lit pass, vs a shadow-map pass, vs a g-buffer pass, vs a deferred-lighting materials pass, could potentially all require different vertex inputs.
On this engine we ended up duplicating entire models in the cases where a model needed different shaders :/

Krypt0n
Sorry - by "streams" I wasn't strictly talking about memory buffers, but just about vertex attributes (like a normal stream, position stream, colour stream, ...).

[quote name='Hodgman' timestamp='1305553234' post='4811420']
Note that the convenience of using many streams does have a cost though -- interleaved streams ([i]i.e. a stream with multiple elements, such as position/normal/etc[/i]) are friendlier on the cache and will make your vertex processing go [i]slightly[/i] faster, depending on the GPU etc...
[/quote]
Interleaved streams are cache-friendlier if you use all the attributes in the buffer; they are cache-unfriendlier if you don't, as they pollute the cache with unused data.

For this reason it's common to combine the streams that are used together into the same buffer:

- positions -> used for the diffuse pass, as well as for shadow-map rendering (and, if you have one, also for the z-pass)
- UVs for the albedo texture -> used for the diffuse pass (and, with alpha-tested geometry, also for shadow maps and the z-pass)
- UVs for bump/specular maps -> used for the diffuse pass
- tangent/bitangent -> diffuse pass
- normal -> diffuse pass
...
So it would make sense to put:

- positions in one buffer; if your geometry is alpha-tested (and that's hugely defined by the geometry - it's rare to use the same geometry for both alpha-tested and non-alpha-tested materials, e.g. vegetation leaves), then you can also put the albedo UVs into the same buffer
- the UVs for normal maps and the tangent/bitangent in the same buffer, since they usually won't be used separately
- normals (e.g. used on lower LODs, or on simple meshes without bump-/normal mapping) in a stream with the albedo-UV stream
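A sketch of that grouping (struct names hypothetical, following the reasoning above):

[source lang="cpp"]
// Stream 0: everything the z-pass and shadow-map pass need to read;
// include the albedo UVs only for alpha-tested geometry.
struct DepthStream
{
    Vec3 pos;
    f32  u0, v0;
};

// Stream 1: attributes read only by the full material pass.
struct MaterialStream
{
    f32  u1, v1;    // normal-map UVs
    Vec3 tangent;
    Vec3 bitangent;
};
[/source]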




For screen-sized quads you shouldn't really care: make the fattest vertex you can create and if you don't need some data, don't use it. It will never be a performance bottleneck. It's WAY more important that your VS->PS data is as tightly packed as possible, as this is the part that costs.
(Oh, and this was just an example; if you have a deferred renderer, or your diffuse and normal maps always share the same layout, or you're making a cel-shaded renderer, it might change quite a bit :D.)

There is also an article about vertex stream utilization in, I think, ShaderX2 or X3, from one of the "Gothic" game guys; although I don't fully agree with it ;), it's one of the rare references on this particular topic.

[quote name='Krypt0n' timestamp='1305622882' post='4811842']
For screen-sized quads you shouldn't really care: make the fattest vertex you can create and if you don't need some data, don't use it. It will never be a performance bottleneck. It's WAY more important that your VS->PS data is as tightly packed as possible, as this is the part that costs.
[/quote]

Are you really sure about this? This is a text fragment from Nvidia's GPU Programming Guide (section 3.4.10.1), and you are saying the opposite... (the guide covers the GeForce 9 series, so it's not that outdated):

"When interpolating between VS and PS don’t pack
When passing data from a vertex shader to a pixel shader, the number of scalar
attributes is the only thing that matters.
Packing too much information into a calculation can make it harder for the
compiler to optimize your code efficiently. For example, if you are passing
down a tangent matrix, do not include the view vector in the 3 q components."

"Note: this is not true for vertex declarations (i.e. INPUT to a vertex
shader). When creating a vertex buffer, the number of attributes AND the
number of vector values are both relevant and packing is a valid optimization.
See section 4.4 for more information about vertex setup and attribute
boundedness. "

