Jump to content

  • Log In with Google      Sign In   
  • Create Account

14 years ago on June 15th Gamedev.net was first launched! We want to thank all of you for being part of our community and hope the best years are ahead of us. Happy birthday Gamedev.net!

#ActualC0lumbo

Posted 23 December 2012 - 02:45 AM

I do think the software approach will beat the vertex shader approach, so you should probably go ahead and give that a shot.

But - I think you should be able to manage more than 16 cubes per batch. Firstly, you don't need a full 4x4 matrix - you can switch to 4x3 matrices and implicitly assume that the last row is 0, 0, 0, 1. You might need to transpose your matrices to achieve this.

Also, I don't think you need two arrays of matrices. I assume one of the matrices is for your normals, and one for the positions. But you can usually use the same matrix for your positions and your normals and simply ignore the position for the normals.

So, by my reckoning you should be able to manage 128/3=42 cubes each batch. Your GLSL code might end up looking a bit like this (untested, not compiled):

uniform float4 g_vMatrices[42*3];

...

vWorldPos.x = dot(g_vMatrices[iCubeIndexAttribute*3], float4(vPositionAttribute, 1.0));
vWorldPos.y = dot(g_vMatrices[iCubeIndexAttribute*3+1], float4(vPositionAttribute, 1.0));
vWorldPos.z = dot(g_vMatrices[iCubeIndexAttribute*3+2], float4(vPositionAttribute, 1.0));
vWorldNormal.x = dot(g_vMatrices[iCubeIndexAttribute*3], float4(vNormalAttribute, 0.0));
vWorldNormal.y = dot(g_vMatrices[iCubeIndexAttribute*3+1], float4(vNormalAttribute, 0.0));
vWorldNormal.z = dot(g_vMatrices[iCubeIndexAttribute*3+2], float4(vNormalAttribute, 0.0));


Oops - just realised you'd then need to transform the positions by the viewproj matrix which will take up a few more uniforms, so you'll end up with only 41 cubes per batch.

#2C0lumbo

Posted 23 December 2012 - 02:43 AM

I do think the software approach will beat the vertex shader approach, so you should probably go ahead and give that a shot.

But - I think you should be able to manage more than 16 cubes per batch. Firstly, you don't need a full 4x4 matrix - you can switch to 4x3 matrices and implicitly assume that the last row is 0, 0, 0, 1. You might need to transpose your matrices to achieve this.

Also, I don't think you need two arrays of matrices. I assume one of the matrices is for your normals, and one for the positions. But you can usually use the same matrix for your positions and your normals and simply ignore the position for the normals.

So, by my reckoning you should be able to manage 128/3=42 cubes each batch. Your GLSL code might end up looking a bit like this (untested, not compiled):

uniform float4 g_vMatrices[42*3];

...

vWorldPos.x = dot(g_vMatrices[iCubeIndexAttribute*3], float4(vPositionAttribute, 1.0)));
vWorldPos.y = dot(g_vMatrices[iCubeIndexAttribute*3+1], float4(vPositionAttribute, 1.0)));
vWorldPos.z = dot(g_vMatrices[iCubeIndexAttribute*3+2], float4(vPositionAttribute, 1.0)));
vWorldNormal.x = dot(g_vMatrices[iCubeIndexAttribute*3], float4(vNormalAttribute, 0.0)));
vWorldNormal.y = dot(g_vMatrices[iCubeIndexAttribute*3+1], float4(vNormalAttribute, 0.0)));
vWorldNormal.z = dot(g_vMatrices[iCubeIndexAttribute*3+2], float4(vNormalAttribute, 0.0)));


Oops - just realised you'd then need to transform the positions by the viewproj matrix which will take up a few more uniforms, so you'll end up with only 41 cubes per batch.

#1C0lumbo

Posted 23 December 2012 - 02:42 AM

<blockquote class="ipsBlockquote" data-author="space_cadet" data-cid="5013406"><p>The one big drawback about this method is, like you pointed out, the limited uniform space. For those who don't have a clue what that is (like I did): it is the space available for declaring uniform variables in the shader. I read somewhere that this space relates to the amount / size of registers the GPU has - correct me if I'm wrong here. For anyone interested, calling <em>glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS)</em> will return a number that says how much uniform space you have on your system. <a data-cke-saved-href="http://www.khronos.org/opengles/sdk/docs/man/xhtml/glGet.xml" href="http://www.khronos.org/opengles/sdk/docs/man/xhtml/glGet.xml">The specification for OpenGL ES 2.0 says it has to be at least 128</a>. The number is expressed in vectors, and since each matrix has 4 vectors, that means you could declare a maximum of 32 matrix variables or, in my case, two arrays of 16 matrices. In other words, I can batch a maximum of 16 cubes now.&nbsp;<br />&nbsp;<br />&nbsp;<br /><strong>NEXT</strong><br />&nbsp;<br />Dynamic indexing has certainly improved performance, but I am not entirely happy yet. I will implement the abovementioned option 1, hoping to improve performance by shifting work from the GPU to the CPU and, as a next step, parallize the <em>move()</em> and <em>draw()</em> operations.</p></blockquote><br />I do think the software approach will beat the vertex shader approach, so you should probably go ahead and give that a shot.<br /><br />But - I think you should be able to manage more than 16 cubes per batch. Firstly, you don't need a full 4x4 matrix - you can switch to 4x3 matrices and implicitly assume that the last row is 0, 0, 0, 1. You might need to transpose your matrices to achieve this.<br /><br />Also, I don't think you need two arrays of matrices. I assume one of the matrices is for your normals, and one for the positions. But you can usually use the same matrix for your positions and your normals and simply ignore the position for the normals.<br /><br />So, by my reckoning you should be able to manage 128/3=42 cubes each batch. Your GLSL code might end up looking a bit like this (untested, not compiled):<br /><br />uniform float4 g_vMatrices[42*3];<br /><br />...<br /><br />vWorldPos.x = dot(g_vMatrices[iCubeIndexAttribute*3], float4(vPositionAttribute, 1.0)));<br />vWorldPos.y = dot(g_vMatrices[iCubeIndexAttribute*3+1], float4(vPositionAttribute, 1.0)));<br />vWorldPos.z = dot(g_vMatrices[iCubeIndexAttribute*3+2], float4(vPositionAttribute, 1.0)));<br />vWorldNormal.x = dot(g_vMatrices[iCubeIndexAttribute*3], float4(vNormalAttribute, 0.0)));<br />vWorldNormal.y = dot(g_vMatrices[iCubeIndexAttribute*3+1], float4(vNormalAttribute, 0.0)));<br />vWorldNormal.z = dot(g_vMatrices[iCubeIndexAttribute*3+2], float4(vNormalAttribute, 0.0)));<br /><br /><br />Oops - just realised you'd then need to transform the positions by the viewproj matrix which will take up a few more uniforms, so you'll end up with only 41 cubes per batch.<br />

PARTNERS