
Recommended byte size of a single vertex


I'm wondering, what is the recommended byte size of a single vertex?

Currently, we're using 80 bytes per vertex in our engine:

struct Vertex
{
    DirectX::XMFLOAT3 position;  //  0 + 12 = 12 bytes
    DirectX::XMFLOAT3 normal;    // 12 + 12 = 24 bytes
    DirectX::XMFLOAT3 tangent;   // 24 + 12 = 36 bytes
    DirectX::XMFLOAT3 bitangent; // 36 + 12 = 48 bytes
    DirectX::XMFLOAT2 uv;        // 48 +  8 = 56 bytes
    DirectX::XMFLOAT4 color;     // 56 + 16 = 72 bytes
    uint32_t bone_id;            // 72 +  4 = 76 bytes
    float bone_weight;           // 76 +  4 = 80 bytes
};

What are the things to keep in mind when setting up a vertex layout?
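For context, here's a rough sketch of what a matching D3D11 input layout for a struct like this might look like (the semantic names and formats are assumptions on my part, not taken from the engine; the offsets just mirror the running totals in the struct's comments):

#include <d3d11.h>

const D3D11_INPUT_ELEMENT_DESC kVertexLayout[] =
{
    { "POSITION",     0, DXGI_FORMAT_R32G32B32_FLOAT,    0,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",       0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TANGENT",      0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 24, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "BITANGENT",    0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 36, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD",     0, DXGI_FORMAT_R32G32_FLOAT,       0, 48, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "COLOR",        0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 56, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "BLENDINDICES", 0, DXGI_FORMAT_R32_UINT,           0, 72, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "BLENDWEIGHT",  0, DXGI_FORMAT_R32_FLOAT,          0, 76, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};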

 


Have it be a multiple of your GPU's cache line size (which typically means a multiple of 32 bytes), and keep it as small as possible.

In your case there is ample opportunity for compression: you can, for example, drop the bitangent and compute it at runtime (in your vertex shader) from your normal and tangent - that's a 12-byte saving (and may even run faster since compute is very cheap these days).  Similarly, your color could be reduced to a ubyte4 type which is normally sufficient - another 12 bytes, and now your vertex size is 56.
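As a rough sketch of the math involved (CPU-side DirectXMath purely for illustration; in practice the cross product would live in the vertex shader, and the function names are mine):

#include <DirectXMath.h>
#include <cstdint>
using namespace DirectX;

// Bitangent reconstructed from normal and tangent: B = sign * cross(N, T).
// 'sign' is a per-vertex handedness value (+1 or -1) exported with the mesh.
inline XMFLOAT3 ReconstructBitangent(const XMFLOAT3& normal, const XMFLOAT3& tangent, float sign)
{
    XMVECTOR n = XMLoadFloat3(&normal);
    XMVECTOR t = XMLoadFloat3(&tangent);
    XMFLOAT3 result;
    XMStoreFloat3(&result, XMVectorScale(XMVector3Cross(n, t), sign));
    return result;
}

// Color packed down from a 16-byte XMFLOAT4 to a 4-byte ubyte4 (assumes components already in [0, 1]).
inline uint32_t PackColorUByte4(const XMFLOAT4& c)
{
    auto q = [](float v) { return static_cast<uint32_t>(v * 255.0f + 0.5f) & 0xFF; };
    return q(c.x) | (q(c.y) << 8) | (q(c.z) << 16) | (q(c.w) << 24);
}

In the vertex shader itself this boils down to a single cross() and a multiply.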

Edited by mhagain


Have it be a multiple of your GPU's cache line size (which typically means a multiple of 32 bytes), and keep it as small as possible.

In your case there is ample opportunity for compression: you can, for example, drop the bitangent and compute it at runtime (in your vertex shader) from your normal and tangent - that's a 12-byte saving (and may even run faster since compute is very cheap these days).  Similarly, your color could be reduced to a ubyte4 type which is normally sufficient - another 12 bytes, and now your vertex size is 56.

Is that sort of stuff worth it though? Reducing memory usage at the cost of increasing computation?


Is that sort of stuff worth it though? Reducing memory usage at the cost of increasing computation?


Computation is cheap, and in this case it's just a cross product per vertex, so it's not something you really need to worry about excessively.

Also worth noting that we're not talking about reducing overall memory usage as a goal here.  I need to be absolutely clear on that, because (so long as you're not swapping) the amount of memory used is almost never an arbiter of performance.  The goal is instead to reduce the vertex size so that it fits into fewer cache lines.


First, a question: it's been a while since I've done skeletal animation, but shouldn't you have 3 bone weights since you have 4 bones?

Second, the optimal layout would have each stream be a power of 2 in size.  So if you have a full 48-byte vertex, split it into one 32-byte-per-element stream and one 16-byte-per-element stream.  Also, as previously mentioned, the smaller the better.

Edit: forget the question, I misread.
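A sketch of what that kind of two-stream split could look like in D3D11 (the attribute choices and semantic names here are hypothetical, just to show InputSlot 0 vs. 1):

#include <d3d11.h>

// Hypothetical 48-byte vertex split into a 32-byte stream and a 16-byte stream.
const D3D11_INPUT_ELEMENT_DESC kSplitLayout[] =
{
    // Stream 0: 32 bytes per vertex.
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 24, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    // Stream 1: 16 bytes per vertex.
    { "TANGENT",  0, DXGI_FORMAT_R32G32B32_FLOAT, 1,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "COLOR",    0, DXGI_FORMAT_R8G8B8A8_UNORM,  1, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
// Each slot then gets its own vertex buffer via IASetVertexBuffers, with strides of 32 and 16 bytes.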

Edited by Infinisearch


In your case there is ample opportunity for compression: you can, for example, drop the bitangent and compute it at runtime (in your vertex shader) from your normal and tangent - that's a 12-byte saving (and may even run faster since compute is very cheap these days).

Or go a step smaller and store a quaternion tangent frame instead of a separate normal and tangent. Crytek did a presentation a while ago on this.
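A rough CPU-side sketch of the idea using DirectXMath (function names are mine; this assumes an orthonormal, right-handed TBN frame, so mirrored UVs still need a handedness bit, which is usually hidden in the sign of the quaternion):

#include <DirectXMath.h>
using namespace DirectX;

// Build a tangent-frame quaternion from orthonormal T, B, N (the rows of a rotation matrix).
inline XMVECTOR BuildTangentFrameQuat(const XMFLOAT3& t, const XMFLOAT3& b, const XMFLOAT3& n)
{
    XMMATRIX tbn(
        t.x,  t.y,  t.z,  0.0f,
        b.x,  b.y,  b.z,  0.0f,
        n.x,  n.y,  n.z,  0.0f,
        0.0f, 0.0f, 0.0f, 1.0f);
    return XMQuaternionRotationMatrix(tbn);
}

// At runtime (normally in the vertex shader) the frame comes back out by
// rotating the reference basis vectors with the quaternion.
inline void ReconstructFrame(FXMVECTOR q, XMFLOAT3& outTangent, XMFLOAT3& outNormal)
{
    XMStoreFloat3(&outTangent, XMVector3Rotate(XMVectorSet(1.0f, 0.0f, 0.0f, 0.0f), q));
    XMStoreFloat3(&outNormal,  XMVector3Rotate(XMVectorSet(0.0f, 0.0f, 1.0f, 0.0f), q));
}

The quaternion itself can then be quantized, e.g. to four 16-bit snorm values, so the whole tangent frame fits in 8 bytes.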


Normals can also be compressed into 4 bytes with negligible quality loss -- 11_11_10 is OK, 16_16 octahedral is better.
That means you can store normal + tangent in 8 bytes.
You probably don't need full 32-bit float precision for UVs, so you could store them as 16-bit half-floats, which reduces them from 8 bytes to 4 bytes total.
If colour is not HDR, it can be stored at 8 bits per channel, reducing it from 16 bytes to 4 bytes.
Bone ID and weight can often be 8-bit, but we've got space left over in the structure, so let's say they need to be 16-bit each.
This gives you a 32-byte vertex -- a 2.5x improvement!

struct Vertex
{
    float3 position;       //  0 + 12 = 12 bytes
    uint   normal;         // 12 +  4 = 16 bytes
    uint   tangent;        // 16 +  4 = 20 bytes
    half2  uv;             // 20 +  4 = 24 bytes
    uint   color;          // 24 +  4 = 28 bytes
    uint   bone_id_weight; // 28 +  4 = 32 bytes
};

You also need one bit to tell you whether the bitangent is cross(normal, tangent) or -cross(normal, tangent), but you could squeeze that into one of the bits of the bone_id_weight field -- 15 bits for the weight is still way more than required.
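A minimal CPU-side sketch of the 16_16 octahedral encode mentioned above (function names are mine; the shader-side decode is the mirror image of this):

#include <DirectXMath.h>
#include <cstdint>
#include <cmath>
using namespace DirectX;

// Octahedral-encode a unit normal into two values in [-1, 1].
inline XMFLOAT2 OctEncode(const XMFLOAT3& n)
{
    float invL1 = 1.0f / (fabsf(n.x) + fabsf(n.y) + fabsf(n.z));
    float x = n.x * invL1;
    float y = n.y * invL1;
    if (n.z < 0.0f)
    {
        // Fold the lower hemisphere across the diagonals.
        float fx = (1.0f - fabsf(y)) * (x >= 0.0f ? 1.0f : -1.0f);
        float fy = (1.0f - fabsf(x)) * (y >= 0.0f ? 1.0f : -1.0f);
        x = fx;
        y = fy;
    }
    return XMFLOAT2(x, y);
}

// Quantize the two components to 16-bit snorm and pack them into a single uint32_t.
inline uint32_t PackOct16(const XMFLOAT2& e)
{
    auto snorm16 = [](float v) {
        return static_cast<uint16_t>(static_cast<int16_t>(roundf(v * 32767.0f)));
    };
    return static_cast<uint32_t>(snorm16(e.x)) | (static_cast<uint32_t>(snorm16(e.y)) << 16);
}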

Is that sort of stuff worth it though? Reducing memory usage at the cost of increasing computation?

Computers double in computation power roughly every two years, but only double in memory bandwidth roughly every ten years... Per decade that's about 32x more compute against 2x more bandwidth, which is another way to say that every decade the "bytes transferred per FLOP" performance of computers gets about 16 times WORSE! :o (on the flipside, "FLOPS available per transferred byte" gets 16 times better each decade! :D )
Optimizing for memory bandwidth has always been important for GPUs, and it's getting more important with every new generation of hardware :(

Edited by Hodgman


Is that sort of stuff worth it though? Reducing memory usage at the cost of increasing computation?

Computers double in computation power roughly every two years, but only double in memory bandwidth roughly every ten years... Per decade that's about 32x more compute against 2x more bandwidth, which is another way to say that every decade the "bytes transferred per FLOP" performance of computers gets about 16 times WORSE! :o (on the flipside, "FLOPS available per transferred byte" gets 16 times better each decade! :D )
Optimizing for memory bandwidth has always been important for GPUs, and it's getting more important with every new generation of hardware :(

 

Well, not anymore: Moore's law has technically been dead for around 2+ years now, and HBM is a huge boost in memory bandwidth! But that doesn't apply to consoles, or to almost any GPU out now, for that matter. So for the most part, a handful of decompression instructions to minimize memory bandwidth and size is an easy tradeoff to make.


Well, not anymore: Moore's law has technically been dead for around 2+ years now, and HBM is a huge boost in memory bandwidth!

 [citation needed] :D

http://www.telegraph.co.uk/technology/2017/01/05/ces-2017-moores-law-not-dead-says-intel-boss/
Technically, Moore's law is about how many transistors you can fit onto a surface, and Intel is still keeping up the pace in 2017. In practice, the number of transistors you can fit onto a surface does roughly correlate with compute performance.
In recent years single-core performance has plateaued, but we can still just keep adding more and more cores to keep the performance charts growing.
On a timescale where bandwidth doubles every decade, sudden leaps like HBM don't really affect the long-term trend -- or if they do, you can't be sure until a decade's time when you look back over the data :)
