hupsilardee

Posted 21 November 2012 - 06:30 PM

IIRC from something I read elsewhere, when the hardware reads in a vertex, the read is done in 32-byte chunks from the vertex buffer, every time. So let's say your vertex shader input looks like this:

struct Vertex
{
	 float3 Position;
	 float3 Normal;
	 float2 TexCoord;
};

and your vertex buffers are set up like this:
// Vertex buffer 1 - Positions
pos1 | pos2 | pos3 | pos4 | ...
// Vertex buffer 2 - Normals
norm1 | norm2 | norm3 | norm4 | ...
// Vertex buffer 3 - TexCoords
tex1 | tex2 | tex3 | tex4 | ...
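
For concreteness, here's a minimal sketch of how that three-stream setup might be declared, assuming you're on D3D11 (the array and variable names here are just illustrative, not from anyone's actual code):

// Split-stream layout: each attribute comes from its own input slot,
// each bound as a separate buffer with a tight stride.
D3D11_INPUT_ELEMENT_DESC splitLayout[] =
{
	 { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
	 { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 1, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
	 { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    2, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
// Bound as three buffers:
// UINT strides[3] = { 12, 12, 8 };
// UINT offsets[3] = { 0, 0, 0 };
// context->IASetVertexBuffers(0, 3, buffers, strides, offsets);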

Then the device has to do three 32-byte reads per vertex, for a total of 96 bytes, 64 of which are useless and will be discarded (the vertex itself is only 12 + 12 + 8 = 32 bytes). However, if you pack everything into a single vertex buffer:
// Vertex buffer 1 - Positions, normals and texcoords all interleaved
pos1 | norm1 | tex1 | pos2 | norm2 | tex2 | pos3 | norm3 | tex3 | pos4 | norm4 | tex4 | ...
Then the device only reads one 32-byte chunk per vertex, which is a 3x bandwidth saving.
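
Again assuming D3D11, the interleaved equivalent is a single 32-byte-stride buffer in slot 0 (sketch only, same caveats as above):

// Interleaved layout: one buffer, offsets 0 / 12 / 24,
// stride = 12 + 12 + 8 = 32 bytes, i.e. exactly one 32-byte chunk per vertex.
D3D11_INPUT_ELEMENT_DESC interleavedLayout[] =
{
	 { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,  D3D11_INPUT_PER_VERTEX_DATA, 0 },
	 { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
	 { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 24, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
// UINT stride = 32, offset = 0;
// context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);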

If you're complicating things further by using different indices per attribute, that's another lot of Buffer<ushort>::Load() calls slowing down the shader. There's no point sacrificing texture samples to save a little GPU memory; I bet your entire game's vertex content weighs less than your 2048x2048 shadow map anyway.

(Doesn't everybody have a 2048x2048 shadow map these days? Or several :))

