If I want each stick to be animated and positioned correctly, I'd need three 4x3 transformation matrices to define how to move and rotate each of the sticks in 3D space. If I do what you propose, I'd have 3 indices per vertex, and an array containing three 4x3 matrices? This doesn't scale very well as a model will have 100's of vertices which means 100's of 4x3 matrices.
I don't understand how you've come up with three in the bolded bit. Each stick only has two bones (head/feet), so each stick has two matrices. Each vertex also has only one bone index, because it's connected either to the head or to the feet.
It doesn't matter how many vertices are in the feet/head. Per object, you have one 'feet' transform and one 'head' transform.
Transform buffer: {Head0, Feet0, Head1, Feet1, Head2, Feet2...}
Vertex buffer, if the head were made up of two verts and the feet also of two verts:
{
//stick 0's verts
{pos={a,b,c},uv={d,e},bone={0/*aka Head0*/}},
{pos={f,g,h},uv={i,j},bone={0/*aka Head0*/}},
{pos={k,l,m},uv={n,o},bone={1/*aka Feet0*/}},
{pos={p,q,r},uv={s,t},bone={1/*aka Feet0*/}},
//stick 1's verts
{pos={u,v,w},uv={x,y},bone={2/*aka Head1*/}},
{pos={z,A,B},uv={C,D},bone={2/*aka Head1*/}},
{pos={E,F,G},uv={H,I},bone={3/*aka Feet1*/}},
...
}
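To make the indexing concrete, here is a rough CPU-side sketch in Python of how such a vertex buffer could be generated. The helper names (bone_index, build_vertex_buffer) and the dict-based vertex layout are made up for illustration, and the uv field is left out for brevity. The point it tries to show is that the number of transforms depends only on the number of sticks, not on how many verts the head or feet have.

```python
# Hypothetical helpers, not from any engine: they only illustrate the indexing.
HEAD, FEET = 0, 1
BONES_PER_STICK = 2  # one 'head' transform and one 'feet' transform per stick

def bone_index(stick, part):
    """Index into the transform buffer {Head0, Feet0, Head1, Feet1, ...}."""
    return stick * BONES_PER_STICK + part

def build_vertex_buffer(stick_count, head_positions, feet_positions):
    """Emit one vertex per position, each tagged with a single bone index."""
    vertices = []
    for stick in range(stick_count):
        for pos in head_positions:
            vertices.append({"pos": pos, "bone": bone_index(stick, HEAD)})
        for pos in feet_positions:
            vertices.append({"pos": pos, "bone": bone_index(stick, FEET)})
    return vertices

# Two verts in the head, two in the feet, three sticks.
head = [(0.0, 2.0, 0.0), (0.1, 2.0, 0.0)]
feet = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
vertices = build_vertex_buffer(3, head, feet)

# The transform count grows with the stick count only.
transform_count = 3 * BONES_PER_STICK
```

Adding more verts to the head or feet lists makes the vertex buffer longer but leaves transform_count unchanged.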
In the vertex shader, you then do something like:
int boneIndex = vertex.bone;
Vec4 transform0 = TransformBuffer.Load(boneIndex*3+0);//index*3 because we have 3 Vec4's per transform
Vec4 transform1 = TransformBuffer.Load(boneIndex*3+1);
Vec4 transform2 = TransformBuffer.Load(boneIndex*3+2);
Mat4 transform = Mat4( transform0, transform1, transform2, Vec4(0,0,0,1) );
Vec3 worldPosition = mul(transform, Vec4(vertex.position,1) ).xyz;//mul returns a Vec4, we only need xyz
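If it helps to check the addressing on the CPU, the same lookup can be sketched in Python. This is only an illustration under assumptions: the function names are invented, the buffer is a plain list of 4-float rows, and the transforms are simple translations so the result is easy to verify by hand.

```python
# CPU-side sketch of the vertex-shader lookup (illustrative names only).
# Each transform is stored as 3 rows of 4 floats; the fourth row (0,0,0,1)
# is implicit, which is why the addressing is bone_index * 3 + row.

def make_translation(tx, ty, tz):
    """Return three Vec4 rows for a transform that translates by (tx, ty, tz)."""
    return [
        (1.0, 0.0, 0.0, tx),
        (0.0, 1.0, 0.0, ty),
        (0.0, 0.0, 1.0, tz),
    ]

def pack_transforms(transforms):
    """Flatten a list of 3-row transforms into one flat list of Vec4 rows."""
    buffer = []
    for rows in transforms:
        buffer.extend(rows)
    return buffer

def transform_vertex(transform_buffer, bone_index, position):
    """Load the 3 rows for bone_index and apply them to position (w = 1)."""
    p = (position[0], position[1], position[2], 1.0)
    out = []
    for row in range(3):
        r = transform_buffer[bone_index * 3 + row]
        out.append(sum(r[i] * p[i] for i in range(4)))
    return tuple(out)

# Transform buffer laid out as {Head0, Feet0, Head1, Feet1}.
transform_buffer = pack_transforms([
    make_translation(0.0, 2.0, 0.0),  # Head0
    make_translation(0.0, 0.0, 0.0),  # Feet0
    make_translation(5.0, 2.0, 0.0),  # Head1
    make_translation(5.0, 0.0, 0.0),  # Feet1
])

# A vertex with bone index 2 (Head1) is moved by Head1's translation.
head1_vertex = transform_vertex(transform_buffer, 2, (1.0, 1.0, 1.0))
```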
Then as an extension to this, you can get "skinning" (soft transitions between bones) by using more than one bone index per vertex.
e.g. A vertex that's 75% controlled by the head bone, but 25% by the feet bone:
{pos={a,b,c},uv={d,e},bones={0/*aka Head0*/, 1/*aka Feet0*/}, weights={0.75,0.25}},
The vertex shader then loads multiple bone indices, plus a blend weight for each one:
int boneIndex0 = vertex.bones.x;
Vec4 transform0_0 = TransformBuffer.Load(boneIndex0*3+0);
Vec4 transform0_1 = TransformBuffer.Load(boneIndex0*3+1);
Vec4 transform0_2 = TransformBuffer.Load(boneIndex0*3+2);
int boneIndex1 = vertex.bones.y;
Vec4 transform1_0 = TransformBuffer.Load(boneIndex1*3+0);
Vec4 transform1_1 = TransformBuffer.Load(boneIndex1*3+1);
Vec4 transform1_2 = TransformBuffer.Load(boneIndex1*3+2);
Vec4 transform0 = transform0_0 * vertex.weights[0] + transform1_0 * vertex.weights[1];//weights[0] scales bone 0, weights[1] scales bone 1
Vec4 transform1 = transform0_1 * vertex.weights[0] + transform1_1 * vertex.weights[1];
Vec4 transform2 = transform0_2 * vertex.weights[0] + transform1_2 * vertex.weights[1];
Mat4 transform = Mat4( transform0, transform1, transform2, Vec4(0,0,0,1) );
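For reference, the two-bone blend can also be written as a small CPU-side Python sketch. The function names and the list-of-rows buffer are assumptions made for illustration; the weights are expected to sum to 1, as in the 75%/25% example.

```python
# Sketch of a two-bone linear matrix blend (illustrative names only).

def load_rows(transform_buffer, bone_index):
    """Fetch the three Vec4 rows that belong to one bone."""
    return [transform_buffer[bone_index * 3 + row] for row in range(3)]

def blend_rows(rows_a, weight_a, rows_b, weight_b):
    """Blend two transforms row by row: a * weight_a + b * weight_b."""
    return [
        tuple(a * weight_a + b * weight_b for a, b in zip(ra, rb))
        for ra, rb in zip(rows_a, rows_b)
    ]

def apply_rows(rows, position):
    """Multiply the 3 rows by (x, y, z, 1) to get the transformed position."""
    p = (position[0], position[1], position[2], 1.0)
    return tuple(sum(r[i] * p[i] for i in range(4)) for r in rows)

def skin_vertex(transform_buffer, bones, weights, position):
    """Blend the two bone transforms by their weights, then apply the result."""
    rows0 = load_rows(transform_buffer, bones[0])
    rows1 = load_rows(transform_buffer, bones[1])
    blended = blend_rows(rows0, weights[0], rows1, weights[1])
    return apply_rows(blended, position)

transform_buffer = [
    # Head0: translate by (0, 4, 0)
    (1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 4.0), (0.0, 0.0, 1.0, 0.0),
    # Feet0: identity
    (1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0),
]

# 75% head, 25% feet: the vertex moves 75% of the head's translation.
skinned = skin_vertex(transform_buffer, (0, 1), (0.75, 0.25), (1.0, 1.0, 1.0))
```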
p.s. the above code does a horrible linear blend of matrices, which doesn't produce very good quality. Animation systems will often store a quaternion + a vec3 scale + a vec3 position instead, blend each of them individually, and then use the blended results to construct a Mat4x4.
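A rough Python sketch of that quaternion + scale + position idea, for two transforms. Everything here is an assumption for illustration: the function names are invented, quaternions are stored as (x, y, z, w), t is the weight of the second transform, and the rotation is blended with a normalized lerp (nlerp), which is just one simple choice; slerp is another.

```python
import math

def lerp3(a, b, t):
    """Component-wise linear interpolation of two 3-vectors."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def nlerp_quat(qa, qb, t):
    """Normalized lerp of two unit quaternions stored as (x, y, z, w)."""
    # Negate qb if the quaternions point apart, so we blend along the shorter arc.
    dot = sum(a * b for a, b in zip(qa, qb))
    if dot < 0.0:
        qb = tuple(-c for c in qb)
    q = tuple(a + (b - a) * t for a, b in zip(qa, qb))
    length = math.sqrt(sum(c * c for c in q))
    return tuple(c / length for c in q)

def compose_rows(rotation, scale, position):
    """Build three Vec4 rows: rotation times scale, with translation in w."""
    x, y, z, w = rotation
    sx, sy, sz = scale
    tx, ty, tz = position
    return [
        ((1 - 2 * (y * y + z * z)) * sx, (2 * (x * y - z * w)) * sy, (2 * (x * z + y * w)) * sz, tx),
        ((2 * (x * y + z * w)) * sx, (1 - 2 * (x * x + z * z)) * sy, (2 * (y * z - x * w)) * sz, ty),
        ((2 * (x * z - y * w)) * sx, (2 * (y * z + x * w)) * sy, (1 - 2 * (x * x + y * y)) * sz, tz),
    ]

# Blend halfway between "no rotation" and "90 degrees about Z".
identity = (0.0, 0.0, 0.0, 1.0)
quarter_turn_z = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))

rotation = nlerp_quat(identity, quarter_turn_z, 0.5)
scale = lerp3((1.0, 1.0, 1.0), (1.0, 1.0, 1.0), 0.5)
position = lerp3((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), 0.5)

rows = compose_rows(rotation, scale, position)
```

Because the rotation is re-normalized after blending, the upper 3x3 part of rows stays a proper rotation (here 45 degrees about Z), whereas linearly blending the two matrices directly would shrink it.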
p.p.s. Half-Life 1 in 1998 was one of the first games I know of that pioneered "skinned animation", and it's been the de facto standard character animation technique ever since. It's common these days to have characters with, say, 10k verts and 50 bone matrices. Next-gen goes even higher, more like 100k verts and 150 bone matrices.