The average 3D game has a lot of geometry data. Quake 3 levels have thousands of polygons; more recent systems like HL2 have orders of magnitude more. Thousands of polygons bring with them thousands of vertices. We've got limited video memory, so we can only load so many vertices onto the card at once; however, we want to avoid splitting the vertices up into separate buffers as much as possible, because switching buffers is slow. How can we improve the memory footprint of our vertices?
One of the things that many people seem to overlook is the flexibility we have in our vertex formats. In these days of the programmable pipeline, position data doesn't necessarily mean an (X, Y, Z) float tuple.
So firstly, you can drop bits you don't need. I was writing a water renderer recently, in which all vertices in the source mesh were at the same height. If they're all at the same height, I don't need to store that height in each vertex - I can store it in a vertex shader constant instead, and the shader can 'reassemble' the position before using it. So that's what I did. You can use a similar approach for, say, a mesh that is a constant colour but has varying alpha. And if you get really desperate for space, you can drop components that can be deduced in the vertex shader - if your normals are normalised, you can calculate the third component from the other two (up to its sign, which you either know by convention or stash in a spare bit). Same for blend weights, which sum to 1, so the last one comes for free.
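As a concrete sketch of the normal trick - on the CPU side in C++ rather than in shader code, and with the sign handling as an assumed convention rather than anything from a particular API:

```cpp
#include <cmath>

// Rebuild the third component of a unit-length normal from the first two.
// Works because x*x + y*y + z*z == 1 for a normalised vector. The sign of
// z is NOT recoverable from x and y alone; here we assume it's either
// known by convention or stored in a spare bit elsewhere in the vertex.
float reconstruct_normal_z(float x, float y, bool z_is_negative = false)
{
    float zz = 1.0f - x * x - y * y;
    float z = std::sqrt(zz > 0.0f ? zz : 0.0f); // clamp against rounding error
    return z_is_negative ? -z : z;
}
```

A vertex shader would do the same thing with a `sqrt(saturate(...))`.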
Secondly, you've got a choice of formats. We know that X bits can store (1 << X) different values. Consider your source data - do you really have that many unique values to store? Even if you do, is it important that the values be so diverse? Through quantization - mapping the full set of values onto a restricted set, like rounding everything to the nearest integer - you can reduce the number of bits you need.
Consider texture coordinates. You can create meshes that only have texture coordinates in the 0..1 range, by adjusting and subdividing anything outside that. 0..1 might be a bit too restrictive, though... we'll say -8 to +8, just to give us some room to wrap a little. So, our texcoords are in the -8 to +8 range, and most people would store them as 32-bit floats. A 32-bit float has (1 << 32) distinct bit patterns - around 4.3 billion - so even if every one of those values fell evenly across our -8 to +8 range, the smallest difference between two values would be in the region of 0.000000003. (Floats actually concentrate their precision near zero rather than spreading it evenly, but the ballpark holds.) If we were using a 256x256 texture, that resolution lets us store texture coordinates to an accuracy of around 0.000001 of a texel.
If you ask me, that's overkill.
So let's drop it down a size. Let's implement texture coordinates as 'short's instead - 16-bit values. (1 << 16) is 65536, and 65536 values across our [-8..8] range gives us about 0.00024 between adjacent values. For a 256x256 texture, that results in texture coordinates accurate to 0.0625 of a texel - or 1/16th. Much better.
A 'short' here is an unsigned 16-bit integer type, ranging from 0 to 65535. Our texture coordinates are floating-point values ranging from -8 to +8. How do we consolidate the two? Simple - linear mapping. -8 will map to 0, and +8 will map to 65535, with values in between evenly distributed. Encoded value = ((value + 8) / 16) * 65535. In the vertex shader, just reverse the process - it takes one multiply-add instruction - and you're sorted.
And in the process you've halved the space that texture coordinate was taking up. Nice job.
Check out the docs for a full list of available formats. I'm eyeing DEC3N for normals, amongst other things. It's slightly irritating that the set of available types is fairly limited - there is no SHORT1 or SHORT3 type, for example - but you can try and take advantage of spare slots by packing data together. Need to store UV and a bone index? Pack them together into a SHORT4, and hey, if you've got an extra 1D texcoord knocking around there's a free slot for it.
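A packed layout along those lines might look like this in C++. The field names are purely illustrative, and the trio of shorts at the end is the UV/bone-index/spare-texcoord packing suggested above, occupying what would be a SHORT4-style slot:

```cpp
#include <cstdint>

#pragma pack(push, 1) // no padding, so the struct matches the declaration
struct PackedVertex
{
    float    pos[2];    // x and z; y comes from a shader constant
    float    normal[2]; // x and y; z reconstructed in the shader
    uint16_t uv[2];     // quantised texcoords
    uint16_t boneIndex; // packed into the same SHORT4-style slot...
    uint16_t spare;     // ...leaving a free slot for a 1D texcoord
};
#pragma pack(pop)
```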
Just a quick clarification... from my general reading on the subject, very few games (note: not tech demos!) are transform-limited. Thus, I presume, it's fairly safe to say that the overhead of reconstructing/decompressing vertex data is of little to no importance performance-wise?
A regular vertex:
Position : 3 floats
Normal : 3 floats
TexCoord : 2 floats
---
32 bytes
A compressed vertex:
Position: 2 floats (1 is a constant height -> only works for your water example though)
Normal  : 2 floats (normalized => 3rd can be known)
TexCoord: 2 shorts (as per your example)
---
20 bytes
So, for the "optimal" 4MB static vertex buffer, approximately 78 thousand more vertices for your money [grin]
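The packed layout works out to 20 bytes per vertex (two floats for position, two for the normal, two shorts for the texcoords), and the buffer arithmetic is easy to sanity-check:

```cpp
// Vertices that fit in a 4 MB static buffer at each stride.
const unsigned bufferBytes   = 4u * 1024u * 1024u;
const unsigned originalVerts = bufferBytes / 32; // 32-byte uncompressed vertex
const unsigned packedVerts   = bufferBytes / 20; // 20-byte compressed vertex
const unsigned extraVerts    = packedVerts - originalVerts; // ~78.6k more
```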
Jack