Hello all, long time since my last post.
I worked on a voxel engine about a year ago in XNA, using hardware instancing of each cube's faces. While making it, I posted here asking for advice (link below) on how to improve the framerate, as my code was incredibly inefficient, from doing too much unneeded update logic to simply drawing too much. The engine improved vastly during that discussion, but I had to put the project down without implementing the single most important piece of advice I took from that post: reducing the amount of data stored per instance. Below is a quote explaining how much I was using and how much I could have been using.
It sounds like your instance vertex format is something like this?
transform 16*4 -> 64 bytes
texcoord 2*4 -> 8 bytes
texcoord 2*4 -> 8 bytes
texcoord 2*4 -> 8 bytes
color 4*4 -> 16 bytes
As kalle_h mentioned, you should be able to reduce the color to 4 bytes. You can also use lower precision values for the texcoords. Using HalfVector2 for the texcoords will cut their size in half (or use the index method kalle_h mentioned). These are simple changes to the vertex format; you don't need to change the shader.
For the transform, it sounds like you're passing a whole matrix? You actually only need to pass some of the matrix elements, and you can "reconstruct" the matrix in the shader. Certainly you could cut this down to 12 floats. If you only need translation, then you could cut it down to 3 floats. If you also need a uniform scale, that's only 1 more float. Rotation? Probably 4 more.
So, conservatively, you get:
transform 12*4 -> 48 bytes
texcoord 2*2 -> 4 bytes
texcoord 2*2 -> 4 bytes
texcoord 2*2 -> 4 bytes
color 4*1 -> 4 bytes
TOTAL: 64 bytes
More aggressively, say you only need translation for your transform:
transform 3*4 -> 12 bytes
texcoord 2*2 -> 4 bytes
texcoord 2*2 -> 4 bytes
texcoord 2*2 -> 4 bytes
color 4*1 -> 4 bytes
TOTAL: 28 bytes
My goal is to take the last byte count down even further.
transform 3*4 -> 12 bytes
As the quote shows, I was indeed passing an entire matrix, when only the translation info was needed. I solved this easily enough by simply passing a Vector3 and rebuilding the world matrix in the shader.
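As a rough illustration of that rebuild, here is a minimal sketch in plain C rather than HLSL. It assumes the row-vector convention XNA uses, where the translation sits in the fourth row of the matrix. `Mat4`, `Vec3`, `world_from_translation`, and `transform_point` are made-up names for this sketch, not anything from the engine.

```c
/* Hypothetical types for this sketch only. */
typedef struct { float m[4][4]; } Mat4;
typedef struct { float x, y, z; } Vec3;

/* Rebuild a translation-only world matrix from three floats.
   Row-vector convention: the translation occupies the fourth row. */
static Mat4 world_from_translation(Vec3 t)
{
    Mat4 w = {{
        { 1.0f, 0.0f, 0.0f, 0.0f },
        { 0.0f, 1.0f, 0.0f, 0.0f },
        { 0.0f, 0.0f, 1.0f, 0.0f },
        { t.x,  t.y,  t.z,  1.0f },
    }};
    return w;
}

/* Transform a point (implicit w = 1) as a row vector: p' = p * M. */
static Vec3 transform_point(Vec3 p, Mat4 w)
{
    Vec3 r;
    r.x = p.x * w.m[0][0] + p.y * w.m[1][0] + p.z * w.m[2][0] + w.m[3][0];
    r.y = p.x * w.m[0][1] + p.y * w.m[1][1] + p.z * w.m[2][1] + w.m[3][1];
    r.z = p.x * w.m[0][2] + p.y * w.m[1][2] + p.z * w.m[2][2] + w.m[3][2];
    return r;
}
```

With a translation-only matrix, the transform collapses to adding the offset to the vertex position, so the full matrix is only worth rebuilding if other code expects one.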
Next, I want to take the next three variables (coordinates into a texture atlas for the base, overlay, and breaking textures) and reduce them further than 2*2, down to a single 16-bit unsigned integer, which allows 65,536 different atlas coordinates. The problems are:
(1). I don't know how to turn an index into an x and a y int in HLSL code. I could do it in C# fine by just dividing the index by the number of pieces per row, storing the rounded-down result as y, and subtracting y times the number of pieces per row from the index to get x.
(2). HLSL has no 16-bit integer. Is it possible to still send a 16-bit int, but convert it to a 32-bit int once it's there? Or am I just missing unlisted integral types?
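For reference, the arithmetic from (1) and the widening from (2) can be sketched in plain C. This is only a sketch under assumptions: `TileCoord`, `tile_from_index`, `index_from_tile`, and `tiles_per_row` are made-up names, and it does not cover how the 16-bit value actually reaches the shader (the vertex element format, or any conversion to float on the way). On a shader model without native integer math, the division would probably need an explicit floor() instead of relying on integer truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: column (x) and row (y) of a tile in the atlas. */
typedef struct {
    uint32_t x;
    uint32_t y;
} TileCoord;

/* Widen a 16-bit index to 32 bits, then split it into column and row.
   tiles_per_row is an assumed atlas parameter, not a value from the engine. */
static TileCoord tile_from_index(uint16_t packed_index, uint32_t tiles_per_row)
{
    assert(tiles_per_row > 0);
    uint32_t index = (uint32_t)packed_index;  /* lossless widening */
    TileCoord t;
    t.y = index / tiles_per_row;              /* integer division rounds down */
    t.x = index - t.y * tiles_per_row;        /* same as index % tiles_per_row */
    return t;
}

/* Inverse, for packing on the CPU side before filling the instance buffer.
   The caller must keep the result within 0..65535. */
static uint16_t index_from_tile(uint32_t x, uint32_t y, uint32_t tiles_per_row)
{
    assert(tiles_per_row > 0 && x < tiles_per_row);
    return (uint16_t)(y * tiles_per_row + x);
}
```

For example, with 16 tiles per row, index 35 decodes to column 3, row 2, and packing (3, 2) gives 35 back.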
Lastly, for my color, I only want to send 2 bytes. The first will represent red, green, and blue, while the second will represent transparency. How do I convert 2 bytes into 2 floats when the time comes?
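The byte-to-float conversion itself is just a divide by 255. A minimal sketch in plain C follows, assuming the first byte is one brightness value shared by red, green, and blue (my reading of the description above) and the second is alpha. `ShadeAlpha`, `unorm8_to_float`, `float_to_unorm8`, and `unpack_color` are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked color: one shade shared by R, G, B, plus alpha. */
typedef struct {
    float shade;
    float alpha;
} ShadeAlpha;

/* Map a byte 0..255 onto a float 0.0..1.0. */
static float unorm8_to_float(uint8_t b)
{
    return (float)b / 255.0f;
}

/* Inverse, for packing on the CPU side; rounds to the nearest byte. */
static uint8_t float_to_unorm8(float f)
{
    assert(f >= 0.0f && f <= 1.0f);
    return (uint8_t)(f * 255.0f + 0.5f);
}

/* Turn the two instance bytes into two floats. */
static ShadeAlpha unpack_color(uint8_t shade_byte, uint8_t alpha_byte)
{
    ShadeAlpha c;
    c.shade = unorm8_to_float(shade_byte);
    c.alpha = unorm8_to_float(alpha_byte);
    return c;
}
```

If the bytes are declared with a normalized vertex format, the hardware may already hand the shader values in the 0 to 1 range, in which case no divide is needed there. I have not checked that against this setup.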
All in all, I just need help converting between data types so that I use as little bandwidth as possible when sending my instance information.
For those of you that would like to view the original discussion, click this link: http://www.gamedev.net/topic/629247-face-instancing-dividing-draw-calls/
EDIT: I just realized that since I am instancing individual faces of each cube, I do also need rotation, so memory per instance will actually look like this:
transform 12*4 -> 48 bytes