Multiple Vertex Types in large studios

I'm getting to the point in my own engine where I'm rendering to a texture and then rendering that texture to the screen, which necessitates at least two vertex types. There's the type I use while rendering normal meshes, and then there's the simpler type I use when rendering the texture to a screen-sized quad.

[source lang="cpp"]

typedef struct Vertex
{
Vec3 pos;
Vec3 normal;
u32 colourARGB;
f32 u;
f32 v;
}Vertex;


typedef struct ScreenVertex
{
f32 x; // 0
f32 y;
f32 z;
f32 h;
f32 u; // 16
f32 v;
}ScreenVertex;
[/source]

Something like that. What I was wondering is how often larger studios use multiple vertex types. I mean, most of them would need at least these two, but would they have different mesh files compiled that use several different vertex types, or is the usual practice to have a single, most-used type that supports everything they'll ever need, like multiple sets of texture coords and so on?
[size="2"][size=2]Mort, Duke of Sto Helit: NON TIMETIS MESSOR -- Don't Fear The Reaper
Our internal API lets us bind streams, and before a draw call is issued a fitting vertex declaration is selected (and, if needed, created) based on the streams that are currently set and the shader input.

Having exactly the streams a shader needs can gain you performance; the same is true for the interpolators you pass from the vertex shader to the pixel shader.
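A minimal sketch of that idea, assuming a cache keyed on whichever attribute streams happen to be bound. The StreamBit flags and findOrCreateDeclaration name are made up for illustration; in D3D9 the creation step would end in a CreateVertexDeclaration call.

[source lang="cpp"]
#include <cstdint>
#include <map>

// Hypothetical flags for the attribute streams a draw call has bound.
enum StreamBit : uint32_t
{
    STREAM_POSITION = 1 << 0,
    STREAM_NORMAL   = 1 << 1,
    STREAM_COLOUR   = 1 << 2,
    STREAM_UV0      = 1 << 3,
    STREAM_UV1      = 1 << 4,
    STREAM_TANGENT  = 1 << 5,
};

struct VertexDeclaration; // API-specific object (e.g. IDirect3DVertexDeclaration9)

// Cache of declarations, keyed by the set of bound streams.
// A real engine would also fold the shader's input signature into the key.
static std::map<uint32_t, VertexDeclaration*> g_declCache;

VertexDeclaration* findOrCreateDeclaration(uint32_t boundStreams)
{
    auto it = g_declCache.find(boundStreams);
    if (it != g_declCache.end())
        return it->second; // reuse an existing declaration

    // Build the element list from the flags and create the API object here
    // (e.g. fill a D3DVERTEXELEMENT9 array and call CreateVertexDeclaration).
    VertexDeclaration* decl = nullptr; // created via the graphics API
    g_declCache[boundStreams] = decl;
    return decl;
}
[/source]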

Other times you might switch vertex formats (besides your model vs. screen-coord example): is it light-mapped (does it need a 2nd set of UVs)? Is it animated (skinning weights/indices)? Does it have colours or lighting (e.g. AO) baked into a vertex channel? Is it normal-mapped (tangent/binormal required)?
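As a rough illustration of how those cases translate into per-mesh layouts: the flag names and the vertexStride helper below are hypothetical, assuming float3 positions/normals/tangents, float2 UVs, and 4-byte packed colours and skinning data.

[source lang="cpp"]
#include <cstdint>

// Hypothetical per-mesh flags for optional vertex channels.
enum ChannelFlags : uint32_t
{
    CH_LIGHTMAP_UV = 1 << 0, // 2nd UV set
    CH_SKINNING    = 1 << 1, // blend weights + indices
    CH_COLOUR      = 1 << 2, // baked colour / AO
    CH_TANGENTS    = 1 << 3, // tangent + binormal for normal mapping
};

// Stride of one interleaved vertex for a given combination of channels.
uint32_t vertexStride(uint32_t flags)
{
    uint32_t stride = 12 + 12 + 8;            // position + normal + base UV
    if (flags & CH_LIGHTMAP_UV) stride += 8;  // second UV set
    if (flags & CH_SKINNING)    stride += 8;  // 4 weights + 4 indices, packed
    if (flags & CH_COLOUR)      stride += 4;  // ARGB colour
    if (flags & CH_TANGENTS)    stride += 24; // tangent + binormal (float3 each)
    return stride;
}
[/source]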
Yeah, I had completely forgotten about DX being able to render from streams, and was stuck thinking that all these companies had 12 different vertex formats floating around, which would be crazy (I think so, at least). Having a stream each for normals, positions, colours, etc. would make it much easier to render things with multiple formats.
[size="2"][size=2]Mort, Duke of Sto Helit: NON TIMETIS MESSOR -- Don't Fear The Reaper
Note that the convenience of using many streams does have a cost though -- interleaved streams (i.e. a stream with multiple elements, such as position/normal/etc) are friendlier on the cache and will make your vertex processing go slightly faster, depending on the GPU etc...
Is it just the cache cost that will make it go slower? I mean, depending on the GPU, but in the general case is that the main thing affecting it, just having the cache thrashed? I assume that since I've got all the vertex elements in a single structure, and an array of that structure, I'm actually using an interleaved stream?
[size="2"][size=2]Mort, Duke of Sto Helit: NON TIMETIS MESSOR -- Don't Fear The Reaper
While I can't speak for other companies, where I work vertex formats and shaders were pretty much linked.

So, a model would be processed by our tools pipeline, which would spit out a file with the (interleaved) vertex buffer data in it and a header which, on load, was processed to figure out what streams the file has so they could be bound later (reusing/creating the required information behind the scenes).
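A minimal sketch of what such a file header might look like, purely as illustration; the struct layout, semantic codes and names here are invented, not the poster's actual format.

[source lang="cpp"]
#include <cstdint>

// Hypothetical on-disk header for a compiled mesh file.
struct MeshFileHeader
{
    uint32_t magic;        // file identifier / version check
    uint32_t vertexCount;
    uint32_t indexCount;
    uint32_t elementCount; // number of StreamElement entries that follow
};

// One vertex element within the interleaved buffer.
struct StreamElement
{
    uint16_t semantic; // e.g. 0 = position, 1 = normal, 2 = uv0, 3 = tangent...
    uint16_t format;   // e.g. 0 = float3, 1 = float2, 2 = u32 colour...
    uint16_t offset;   // byte offset within one vertex
    uint16_t stride;   // byte size of one whole vertex
};

// On load: read the header, then elementCount StreamElement entries, and use
// them to find or create a matching vertex declaration before binding the
// interleaved vertex buffer that follows in the file.
[/source]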

At the other end the vertex shader had to match these streams; if it didn't, the model would render visibly wrong and one of us rendering monkeys would have to fix it. (You could probably add some validation steps somewhere to avoid the need for someone to spot it.)
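A hypothetical validation step along those lines (the names and bit-mask scheme are invented for the example): compare the streams a mesh provides against the streams a shader expects at load time, rather than waiting for someone to spot broken rendering.

[source lang="cpp"]
#include <cstdint>
#include <cstdio>

// Hypothetical bit masks describing which attribute streams are present in a
// mesh and which are required by a shader (position, normal, uv0, tangent...).
bool validateMeshAgainstShader(uint32_t meshStreams, uint32_t shaderStreams,
                               const char* meshName, const char* shaderName)
{
    const uint32_t missing = shaderStreams & ~meshStreams;
    if (missing != 0)
    {
        // Report at load/build time instead of rendering visibly wrong.
        std::printf("Mesh '%s' is missing streams 0x%X required by shader '%s'\n",
                    meshName, static_cast<unsigned>(missing), shaderName);
        return false;
    }
    return true;
}
[/source]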

Is it just the cache cost that will make it go slower? I mean, depending on the GPU, but in the general case, is that going to be the main thing affecting it, just having the cache thrashed?


Look at it this way - your input to a vertex shader is a chunk of data containing position, normals, texcoords and whatever else you may have. That has to come from somewhere, and if you've got multiple streams the GPU has to assemble it from those streams. Take stream 1, grab position, copy to VS registers, move forward 12 bytes. Take stream 2, grab normal, copy to VS registers, move forward 12 bytes. And so on until it has everything. On the other hand, with a single interleaved stream it goes like: take the full vertex, copy to VS registers, move forward (vertexsize) bytes.

If the GPU is capable of processing multiple vertexes in parallel (and most are), then the multiple-stream approach has to hop back and forth between streams, moving forward an itty little bit at a time. With a single stream it's just a fast, linear, sequential scan and copy through a single block of data. So feeding data to the GPU in the order and layout it likes best will always get you the best performance.
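To make the two layouts concrete, here's a small illustrative comparison; the struct and array names are just for the example.

[source lang="cpp"]
// Interleaved layout (one stream): each vertex's attributes sit next to each
// other, so the GPU reads one contiguous chunk per vertex.
struct InterleavedVertex
{
    float pos[3];
    float normal[3];
    float uv[2];
};
InterleavedVertex interleaved[1024]; // single buffer, stride = 32 bytes

// Separate streams: one tightly packed array per attribute. Assembling a
// vertex means fetching from three different buffers per vertex.
float positions[1024][3];
float normals[1024][3];
float uvs[1024][2];
[/source]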

I assume that since I've got all vertex elements in a single structure and have an array of that structure that I'm actually using interleaved streams?


Yup, that's it.



... vertex formats and shaders were pretty much linked.
So, a model would be processed by our tools pipeline and that would spit out a file with the (interleaved) vertex buffer data in it ... At the other end the vertex shader had to match these streams
Can I ask how you deal with the case where one model needs to be rendered using different shaders?


We've recently been using an engine that works in the exact same way, but we found this approach restricting when it came to experimenting with changes to the rendering pipeline.
e.g. a forward lit pass, vs a shadow-map pass, vs a g-buffer pass, vs a deferred-lighting materials pass, could potentially all require different vertex inputs.
On this engine we ended up duplicating entire models in the cases where a model needed different shaders :/
Sorry, by streams I wasn't strictly talking about memory buffers, but just about vertex attributes (like a normal stream, position stream, colour stream, ...).


Note that the convenience of using many streams does have a cost though -- interleaved streams (i.e. a stream with multiple elements, such as position/normal/etc) are friendlier on the cache and will make your vertex processing go slightly faster, depending on the GPU etc...

Interleaved streams are cache-friendlier if you use all the attributes in the buffer; they are cache-unfriendlier if you don't, as they pollute the cache with unused data.

For this reason it's common to combine the streams that you use together into the same buffer. For example:

- Positions -> used for the diffuse rendering, as well as shadow-map rendering (and, if you have one, also for the z-pass)
- UV-albedo texture -> used for diffuse (with alpha-tested geometry, also for shadow maps and the z-pass)
- UV-bump/specular -> used for diffuse
- tangent/bitangent -> used for diffuse
- normal -> used for diffuse
- ...

So it would make sense to put (see the sketch after this list):

- Positions in one buffer; if your geometry is alpha-tested (and that's largely defined by the geometry itself: it's rare that you use the same geometry for both alpha-tested and non-alpha-tested materials, e.g. vegetation leaves), you can also put the albedo UVs into the same buffer.
- UVs for normal maps and tangent/bitangent should be in the same buffer, as they usually won't be used separately.
- Normals (e.g. used on lower LODs or on simple meshes without bump/normal mapping) can go in the same stream as the diffuse UVs.
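A rough sketch of that split for a typical forward renderer; the struct names and exact packing are illustrative assumptions, not a prescription.

[source lang="cpp"]
// Buffer 0: everything the depth-only and shadow-map passes need. For
// alpha-tested geometry the albedo UVs live here too, so those passes never
// touch the second buffer.
struct GeometryStreamVertex
{
    float pos[3];
    float uvAlbedo[2];
};

// Buffer 1: attributes only the full shading (diffuse) pass reads.
struct ShadingStreamVertex
{
    float uvBump[2];
    float tangent[3];
    float bitangent[3];
    float normal[3];
};

// Shadow/z-pass: bind buffer 0 only  -> small, densely used vertices.
// Diffuse pass:  bind buffers 0 + 1  -> full attribute set.
[/source]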




For screen-sized quads you shouldn't really care: make the fattest vertex you can create and, if you don't need some of the data, don't use it. It will never be a performance bottleneck. It's WAY more important that your VS->PS data is as tightly packed as possible, as that is what costs.




(Oh, and this was just an example; if you have a deferred renderer, or your diffuse and normal maps always have the same layout, or you're making a cel-shaded renderer, it might change quite a bit :D.)

There is also an article about vertex stream utilisation in, I think, ShaderX2 or X3, from one of the "Gothic" game guys; although I don't fully agree with it ;), it's one of those rare references on this particular topic.
