Hm, a buffer reorganisation / draw splitting routine would be useful yeah. It's like having a 16-bit index buffer but still allowing use of a vertex buffer containing more than 65k vertices?
Yep. In my (C#) tools I've got some code like this, which takes a single non-indexed triangle list and splits it into one or more indexed triangle lists with at most 65,534 (0xFFFE) unique verts per sub-list, so that 16-bit indices can be used:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IndexedTriList[] ReindexTriList(TriList list)
{
    var outputLists = new List<IndexedTriList>();
    int numInputVerts = list.vertices.Count();
    //a non-indexed triangle list is made of whole triangles, 3 verts each
    if (numInputVerts % 3 != 0)
        throw new ArgumentException("Vertex count must be a multiple of 3", nameof(list));
    for (int vertIdx = 0; vertIdx < numInputVerts; )
    {
        //This will become the index buffer content for this group of up to ~65k verts:
        var indices = new List<int>();
        //This will become the vertex buffer content for this group of up to ~65k verts:
        var vertices = new List<Vertex>();
        //This lets us keep track of duplicate vertices, mapping them to their index into vertices
        var uniqueVertices = new Dictionary<Vertex, int>(new VertexEqualityComparer());
        //number of triangles in this group:
        int triCount = 0;
        //keep pushing triangles into the group until we've consumed them all, or the group is full.
        //A triangle adds at most 3 new verts, so only starting one while Count < 0xFFFC
        //caps the group at 0xFFFE verts (indices 0..0xFFFD), which fits in 16 bits.
        for (; vertIdx < numInputVerts && vertices.Count < 0xFFFC; vertIdx += 3)
        {
            ++triCount;
            //read the next triangle out of the input non-indexed triangle list
            for (int j = 0; j != 3; ++j)
            {
                Vertex v = list.vertices[vertIdx + j];
                //check if we've already added this vertex to the group
                if (!uniqueVertices.TryGetValue(v, out int index))
                {
                    //if not, add this vertex to the group now
                    index = vertices.Count;
                    vertices.Add(v);
                    uniqueVertices.Add(v, index);
                }
                //add the index of the vertex within the group to the group's index buffer
                indices.Add(index);
            }
        }
        //add this group of <=0xFFFE verts to the output
        outputLists.Add(new IndexedTriList(triCount, vertices.ToArray(), indices.ToArray()));
    }
    return outputLists.ToArray();
}
```
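For anyone who wants to try the dedupe-and-split technique without the TriList/IndexedTriList/Vertex/VertexEqualityComparer types from my tools, here's a self-contained sketch. Everything in it is hypothetical: `Vertex` is a position-only stand-in (a `record struct`, so it gets value equality and hashing without a custom comparer; C# 10 assumed), `IndexedGroup` and `Reindexer.Split` are made-up names, and the `maxVerts` parameter exists only so the splitting can be exercised with tiny inputs. It writes `ushort` indices directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in vertex: real vertices would also carry normals, UVs, bone weights...
readonly record struct Vertex(float X, float Y, float Z);

// Hypothetical output: one group's vertex buffer contents plus its 16-bit index buffer.
sealed record IndexedGroup(Vertex[] Vertices, ushort[] Indices);

static class Reindexer
{
    // Splits a non-indexed triangle list into indexed groups of at most maxVerts unique
    // vertices each. The default 0xFFFE matches the "Count < 0xFFFC" check above.
    public static List<IndexedGroup> Split(IReadOnlyList<Vertex> triList, int maxVerts = 0xFFFE)
    {
        if (triList.Count % 3 != 0)
            throw new ArgumentException("Input must contain whole triangles.", nameof(triList));
        if (maxVerts < 3 || maxVerts > 0x10000)
            throw new ArgumentOutOfRangeException(nameof(maxVerts));

        var groups = new List<IndexedGroup>();
        int i = 0;
        while (i < triList.Count)
        {
            var vertices = new List<Vertex>();
            var indices = new List<ushort>();
            var lookup = new Dictionary<Vertex, ushort>();
            // Only start a triangle if all 3 of its corners could still be new vertices.
            while (i < triList.Count && vertices.Count + 3 <= maxVerts)
            {
                for (int j = 0; j < 3; ++j, ++i)
                {
                    Vertex v = triList[i];
                    if (!lookup.TryGetValue(v, out ushort index))
                    {
                        // First time this vertex appears in the group: append it.
                        index = (ushort)vertices.Count;
                        vertices.Add(v);
                        lookup.Add(v, index);
                    }
                    indices.Add(index);
                }
            }
            groups.Add(new IndexedGroup(vertices.ToArray(), indices.ToArray()));
        }
        return groups;
    }
}
```

A quad sent as two triangles (a, b, c) and (c, b, d) comes back as one group with 4 vertices and indices 0, 1, 2, 2, 1, 3; forcing `maxVerts: 3` splits it into two groups of 3 vertices each.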
You can take the resulting array of indexed tri-lists and put them all into a single vertex buffer and a single index buffer if you like, then use the DrawIndexedPrimitive parameters to select each group's range within those buffers: BaseVertexIndex (a 32-bit value added to each 16-bit index), StartIndex (the offset into the index buffer at which to start reading), plus NumVertices and PrimitiveCount for the group's vertex and triangle counts.
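As a sketch of that bookkeeping, assuming the groups are appended to the shared buffers in order and drawn as triangle lists: each group's BaseVertexIndex is the number of vertices appended before it, and its StartIndex is the number of indices appended before it. `GroupSize`, `DrawRange` and `BufferLayout.ComputeDrawRanges` are hypothetical names; the fields only mirror the parameters of IDirect3DDevice9::DrawIndexedPrimitive and nothing here calls D3D.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical summary of one group: how many vertices and indices it appended.
record struct GroupSize(int VertexCount, int IndexCount);

// Hypothetical per-group arguments for one DrawIndexedPrimitive call (triangle list).
record struct DrawRange(int BaseVertexIndex, int MinIndex, int NumVertices, int StartIndex, int PrimitiveCount);

static class BufferLayout
{
    // Walks the groups in the order they were copied into the shared buffers and
    // records where each one starts.
    public static List<DrawRange> ComputeDrawRanges(IEnumerable<GroupSize> groups)
    {
        var ranges = new List<DrawRange>();
        int vertexOffset = 0, indexOffset = 0;
        foreach (var g in groups)
        {
            ranges.Add(new DrawRange(
                BaseVertexIndex: vertexOffset,     // added to every 16-bit index of this group
                MinIndex: 0,                       // group-local indices start at 0
                NumVertices: g.VertexCount,        // group-local indices span [0, VertexCount)
                StartIndex: indexOffset,           // first index of this group in the index buffer
                PrimitiveCount: g.IndexCount / 3)); // 3 indices per triangle
            vertexOffset += g.VertexCount;
            indexOffset += g.IndexCount;
        }
        return ranges;
    }
}
```

For two groups of (65,534 verts, 120,000 indices) and (100 verts, 150 indices), the second draw uses BaseVertexIndex 65,534, StartIndex 120,000 and PrimitiveCount 50.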
I'm not planning to 'thread' the transformations into the dynamic vertex buffer, because I build the bone matrices for a given frame practically right before rendering, so it seems pointless for the rendering thread to wait for a separate thread to finish what the rendering thread itself could do.
Most engines use a "job system" for threading these days. Say you've got 100k vertices to process: you could add 100 jobs to a job queue, each responsible for processing 1k vertices. All of your threads (the main thread and the worker threads) can then consume those jobs whenever they've got nothing else to do. While the main thread is waiting for these jobs to finish (before it can continue with rendering), it can consume jobs too. This model lets you (periodically) reach 100% CPU usage (all cores busy) fairly easily, at least whenever you've got large batches of data to process.
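A minimal sketch of that pattern using only the .NET base class library: the work is cut into fixed-size jobs in a `ConcurrentQueue`, a few worker threads drain the queue, and the calling ("main") thread keeps pulling jobs while it waits for the batch instead of idling. `JobSketch.RunChunked` is a hypothetical name, and this is deliberately simplified: a real job system would keep its workers alive across frames rather than starting threads per batch, and would deal with exceptions thrown by jobs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class JobSketch
{
    // Splits [0, count) into jobs of chunkSize items, runs them on workerCount extra
    // threads plus the calling thread, and returns once every job has finished.
    public static void RunChunked(int count, int chunkSize, int workerCount, Action<int, int> body)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var jobs = new ConcurrentQueue<Action>();
        int remaining = 0; // jobs not yet completed
        for (int start = 0; start < count; start += chunkSize)
        {
            int s = start, e = Math.Min(start + chunkSize, count);
            ++remaining;
            jobs.Enqueue(() => { body(s, e); Interlocked.Decrement(ref remaining); });
        }

        // Worker threads pop jobs until the queue is empty, then exit.
        var workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; ++i)
        {
            workers[i] = new Thread(() => { while (jobs.TryDequeue(out var workerJob)) workerJob(); });
            workers[i].Start();
        }

        // The main thread "waits" by consuming jobs itself; once the queue is empty it
        // only yields until the jobs already taken by workers have completed.
        while (Volatile.Read(ref remaining) > 0)
        {
            if (jobs.TryDequeue(out var job)) job();
            else Thread.Yield();
        }
        foreach (var w in workers) w.Join();
    }
}
```

For the 100k-vertex case above, that would be something like `JobSketch.RunChunked(100_000, 1000, Environment.ProcessorCount - 1, (start, end) => SkinVertices(start, end));`, where `SkinVertices` is a hypothetical function that transforms the vertices in `[start, end)`.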
Lastly, I noticed there are no usage flags for creating read-only vertex buffers? Should I just leave out D3DUSAGE_WRITEONLY when creating the vertex buffer, and use D3DLOCK_READONLY when I lock it for reading? Is that the fastest read-only static vertex buffer? Or should I just not use D3D vertex buffers at all and go with std::vector or similar...
If you're never going to be sending the data to the GPU or otherwise transforming it using D3D... then yeah, just allocate the memory yourself instead of using a D3D buffer object. If you do need CPU-side D3D data for whatever reason though, D3DPOOL_SYSTEMMEM is what you're looking for.