Currently all I would know how to do is treat the whole thing as poly soup (which would mean both meshes would be linked by a triangle).
It depends on the primitive type that you're drawing with.
Often games use triangle lists, where 3 values are read from the index buffer at a time to define a triangle. Triangles can be disconnected from each other / there's no need for them to all be linked. If you use triangle lists, there's no problem at all.
If you're using a triangle strip / triangle fan, then yes, all the triangles are joined together, as after the first triangle, only one index is read at a time to 'attach' the next vertex in the strip/fan. This causes a problem when you want to draw different parts in one draw-call, without them being linked.
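To make the difference concrete, here's a rough sketch of how the same index buffer gets read under each primitive type. The function names are made up for illustration, and this ignores the fact that real APIs flip the winding order on every second strip triangle.

```python
def triangles_from_list(indices):
    """Triangle list: every 3 consecutive indices form one independent triangle."""
    usable = len(indices) - len(indices) % 3  # ignore a trailing partial triangle
    return [tuple(indices[i:i + 3]) for i in range(0, usable, 3)]


def triangles_from_strip(indices):
    """Triangle strip: after the first triangle, each new index adds one triangle
    that reuses the previous two indices."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


def triangles_from_fan(indices):
    """Triangle fan: every triangle shares the first index."""
    return [(indices[0], indices[i], indices[i + 1])
            for i in range(1, len(indices) - 1)]
```

With a list, two separate quads just need 12 indices (4 independent triangles). With a strip or fan, every triangle shares vertices with the previous one, which is where the linking problem comes from.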
In this situation you can use a trick known as a "degenerate triangle" to link two physically separate meshes together. The GPU will immediately discard any triangle with zero area without drawing anything - these triangles are called degenerates.
e.g. say you've got two quads, defined with these indices:
1-2 5-6
|/| |/|
3-4 7-8
For a tri-strip draw-call, you can use the index buffer values of 1234455678, instead of 12345678. The duplicate indices in the middle create the degenerate linking triangles.
Those indices interpreted as a triangle strip will result in 8 triangles using these indices:
123 234 344 445 455 556 567 678
...but with 4 of those triangles being degenerate (zero area, because the same vertex is used twice per triangle) and ignored by the GPU.
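If you want to build that stitched index buffer in code, something like the following would do it. This is only a sketch: `join_strips`, `strip_triangles` and `is_degenerate` are names invented here, and the odd-length rule is an extra detail not covered above (one more duplicate index is needed when the first strip has an odd number of indices, otherwise the second strip's winding order gets flipped).

```python
def join_strips(first, second):
    """Join two triangle strips into one index list using degenerate triangles."""
    if not first:
        return list(second)
    if not second:
        return list(first)
    # Repeat the last index of the first strip and the first index of the second.
    bridge = [first[-1], second[0]]
    # Assumption beyond the post: with an odd-length first strip, add one more
    # duplicate so the second strip starts on an even triangle (same winding).
    if len(first) % 2 == 1:
        bridge.append(second[0])
    return list(first) + bridge + list(second)


def strip_triangles(indices):
    """Read an index list the way a triangle strip is read."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


def is_degenerate(triangle):
    """A triangle that uses the same vertex index twice has zero area."""
    return len(set(triangle)) < 3
```

For the two quads above, `join_strips([1, 2, 3, 4], [5, 6, 7, 8])` gives 1 2 3 4 4 5 5 6 7 8, and only 123, 234, 567 and 678 survive the `is_degenerate` check.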
Oh, and another thing, because something you said piqued my interest: it can be faster to do multiple draw calls!
How come? I mean, I always understood it was a design limitation (sharing material) vs performance. How could it possibly be faster to call draw multiple times instead of once?
Whenever you call any D3D/GL function, it takes time on the CPU. Reducing the number of state-changes (GL/D3D calls other than 'draw' ones, such as changing textures/shaders/etc) is just as important as reducing the number of draw-calls.
However, in D3D11, calls are much cheaper than they were in D3D9. In D3D9, it's quite important to keep your D3D-call count low, but in D3D11 you can make many more calls in the same amount of CPU time... so it's not quite as much of a concern as it used to be.
In any case, reducing draw-calls is a CPU-side optimization.
The GPU, on the other hand, doesn't really care how many draw-calls you make, as long as between each group of state-changes (i.e. changing shaders, textures, etc) it is given a decent amount of work.
Usually you'll sort your objects so that all the ones that share a material are drawn one after the other, so that you can make all the state-changes once, then execute all the draw-calls together without state-changes in between. This reduces your CPU overheads, and it also gives the GPU more work to do between groups of state changes, so the GPU will stall less.
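As a rough illustration of why the sorting helps, here's a small sketch that counts how many times the material has to be switched for a given draw order. Objects are just hypothetical `(name, material)` pairs here; a real renderer would sort on a key built from shader, textures and so on.

```python
def count_state_changes(draw_order):
    """Count how often the material changes while walking the draw order."""
    changes = 0
    current = None
    for _name, material in draw_order:
        if material != current:
            changes += 1  # would be a set-shader / set-texture call in D3D/GL
            current = material
    return changes


def sort_by_material(objects):
    """Group objects that share a material (stable sort keeps relative order)."""
    return sorted(objects, key=lambda obj: obj[1])
```

With four objects alternating between two materials, the unsorted order needs 4 material switches, while the sorted order needs only 2. The number of draw-calls is the same in both cases.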
On the GPU side, if you set up your material (set your shader, textures, etc) once and then make 10 small draw-calls, or just make 1 big draw-call, the GPU acts the same.
On the CPU side though, the version with 10 draw-calls has to do up to 10 times more work.
If you have a fast CPU and a slow GPU, then CPU-side optimisations will have no effect on your framerate.
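A very simplified way to picture this: assume the CPU and GPU work in parallel, so the frame takes as long as whichever one is slower. That model, the function name and all of the numbers below are assumptions for illustration only, not measurements.

```python
def frame_time_us(draw_calls, cpu_us_per_call, gpu_us):
    """Toy model: frame time is the slower of CPU submission and GPU work."""
    cpu_us = draw_calls * cpu_us_per_call  # CPU cost scales with call count
    return max(cpu_us, gpu_us)             # GPU cost assumed independent of it


# Hypothetical numbers: 20 microseconds of CPU time per draw-call.
gpu_bound_before = frame_time_us(1000, 20, 30000)  # GPU needs 30 ms
gpu_bound_after = frame_time_us(100, 20, 30000)    # 10x fewer calls
cpu_bound_before = frame_time_us(1000, 20, 10000)  # GPU needs 10 ms
cpu_bound_after = frame_time_us(100, 20, 10000)
```

In the GPU-bound case, cutting the draw-calls from 1000 to 100 leaves the frame time untouched, because the GPU was the bottleneck all along. In the CPU-bound case, the same change halves the frame time in this model.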
Also, if your 'batching' strategy requires you to copy vertex data around, merge buffers, pre-transform stuff on the CPU, etc, etc, then all that work might be more expensive than the savings you get from 'batching'. It may be faster to just do nothing ;)