1) What does a DrawCall actually look like? You mention type, offset and primitive count, but what about indexed/non-indexed, instanced, multi-draw-indirect, etc.? Obviously you can figure out whether you need an indexed draw by checking if an index buffer is bound, but indexed draw calls need an additional index-buffer-offset parameter, instanced rendering needs the number of instances to draw, and so on. Do you just have a bunch of generic uint32_t parameters to handle this? (I don't think so, since your DrawItem structs are really small.)
I have a separate draw-call-descriptor struct for each type, which the user fills in prior to compiling a draw-item.
The interface to create draw-items looks something like:
struct LinearDrawDesc
{
    PrimitiveType::Type primitive;
    u32  primitiveCount;
    u32  vbOffset;      // counted in vertices from the start of the buffer, NOT in bytes
    u8   stencilRef;
    bool useStencilRef;
};
struct IndexedDrawDesc
{
    PrimitiveType::Type primitive;
    u32  primitiveCount;
    u32  vbOffset;      // counted in vertices from the start of the buffer, NOT in bytes
    u32  ibOffset;      // counted in indices, NOT in bytes
    u8   stencilRef;
    bool useStencilRef;
};
class DrawItemWriter
{
public:
    DrawItemWriter();
    // Either pass a Scope allocator, or pass 'Persistent'.
    // In the Persistent case: each DrawItem must have its Release function called, and
    // the DrawItemSharedResources must have its Release function called.
    // In the Scope case: the DrawItems and DrawItemSharedResources will be released
    // automatically by the supplied Scope. Do not call Release on them.
    void Begin( GpuDevice& gpu, Scope& alloc );
    void Begin( GpuDevice& gpu, Persistent_tag, DrawItemSharedResources* reuseExistingSharedData=0 );
    void BeginPass( u32 pass, const PassState*, const RenderTargetState& );
    void BeginPass( const RenderPass& );
    // If you're going to use the same state-group stack for multiple draws within a pass,
    // this lets you pay the stack-flattening cost once.
    void PreFlattenStates( FlattenedDrawStates& output, u32 stateGroupCount, const StateGroup*const* stateGroups );
    DrawItem* Add( const char* name, const DrawDesc&, u32 stateGroupCount, const StateGroup*const* stateGroups, const DrawItemOptions& opt = DrawItemOptions() );
    DrawItem* Add( const char* name, const DrawDesc&, const FlattenedDrawStates&, const DrawItemOptions& opt = DrawItemOptions() );
    void EndPass();
    DrawItemSharedResources* End();
};
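To make the PreFlattenStates idea more concrete, here is a minimal sketch of flattening a state-group stack into one set of state IDs. The slot enum, the fixed-size per-slot StateGroup layout, the kUnset sentinel and the FlattenStateStack name are all hypothetical, and so is the precedence rule (groups earlier in the stack win); the post doesn't show how StateGroup is laid out internally.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical state slots; the real StateGroup layout isn't shown in the post.
enum StateSlot { kBlend, kDepthStencil, kRasterizer, kVertexData, kInstanceData, kSlotCount };
const uint16_t kUnset = 0xFFFF; // marks a slot that a group doesn't touch

struct StateGroup {
    std::array<uint16_t, kSlotCount> ids;
    StateGroup() { ids.fill(kUnset); }
};

struct FlattenedDrawStates {
    std::array<uint16_t, kSlotCount> ids;
    FlattenedDrawStates() { ids.fill(kUnset); }
};

// Assumption: groups earlier in the stack take precedence, so the first
// group that sets a slot wins and later groups only fill in the gaps.
void FlattenStateStack(FlattenedDrawStates& output, uint32_t stateGroupCount,
                       const StateGroup* const* stateGroups) {
    output = FlattenedDrawStates();
    for (uint32_t g = 0; g < stateGroupCount; ++g)
        for (int s = 0; s < kSlotCount; ++s)
            if (output.ids[s] == kUnset)
                output.ids[s] = stateGroups[g]->ids[s];
}
```

Flattening once and feeding the result to the second Add overload means this loop runs once per stack instead of once per draw.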
Internally, every draw-item starts with a 64-bit header, which mostly contains state IDs, but also contains a jump-table index.
Jump tables are kind of like a vtable used for virtual function calls - an array of function pointers. The array itself differs per platform, but looks something like:
typedef void(*PfnDraw)(void*, const void*);
static const PfnDraw s_drawJumpTable[] =
{
 // linear   indexed  indirect linear/indexed
    &DL <0>, &DI <0>, &IDL<0>, &IDI<0>, // non-instanced, no per-draw stencil-ref
    &DLI<0>, &DII<0>, &IDL<0>, &IDI<0>, // instanced, no per-draw stencil-ref
    &DL <1>, &DI <1>, &IDL<1>, &IDI<1>, // non-instanced, per-draw stencil-ref
    &DLI<1>, &DII<1>, &IDL<1>, &IDI<1>, // instanced, per-draw stencil-ref
};
That's 16 different drawing-function permutations, depending on linear/indexed, instanced or not, direct or indirect, and whether the stencil-ref value is set per pass or per draw(!). More on that last one later.
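Reading the table above row by row suggests a simple bit layout for the index: bit 0 selects indexed, bit 1 selects indirect, bit 2 selects instanced and bit 3 selects a per-draw stencil-ref. The sketch below assumes that layout; JumpTableIndex is a hypothetical helper, not a function from the post.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, inferred from the table's rows and columns:
// bit 0 = indexed, bit 1 = indirect, bit 2 = instanced, bit 3 = per-draw stencil-ref.
uint32_t JumpTableIndex(bool indexed, bool indirect, bool instanced, bool perDrawStencilRef) {
    return (indexed           ? 1u : 0u)
         | (indirect          ? 2u : 0u)
         | (instanced         ? 4u : 0u)
         | (perDrawStencilRef ? 8u : 0u);
}
```

For example, an instanced indexed draw without a per-draw stencil-ref maps to index 5 (DII<0>), and an indirect indexed draw with one maps to 15 (IDI<1>). Because the indirect columns repeat in the instanced rows, the instanced bit doesn't change which indirect function is picked.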
When building a draw-item, the DrawItemWriter asks the back-end for the appropriate jump-table index for the type of draw-call it's building, and then stores this index in the header. Note that a table of 16 entries requires 4 bits in the header to store this info.
The actual draw-item itself can then be one of 16 different structures, as it will be interpreted by the corresponding function in that table.
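A minimal sketch of that dispatch, assuming the jump-table index sits in the low 4 bits of the header and the remaining 60 bits hold packed state IDs (the real bit layout isn't given). The item structs, the Counters context and the two-entry table are hypothetical stand-ins; the point is that every layout begins with the header, so the header can be read before the layout is known.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

const uint64_t kJumpIndexMask = 0xF; // assumed: low 4 bits = jump-table index

uint64_t PackHeader(uint64_t stateBits, uint32_t jumpIndex) { // stateBits must fit in 60 bits
    return (stateBits << 4) | (jumpIndex & kJumpIndexMask);
}
uint32_t JumpIndexOf(uint64_t header) { return uint32_t(header & kJumpIndexMask); }
uint64_t StateBitsOf(uint64_t header) { return header >> 4; }

// Two of the possible layouts; each begins with the 64-bit header.
struct LinearItem  { uint64_t header; uint32_t primitiveCount, vbOffset; };
struct IndexedItem { uint64_t header; uint32_t primitiveCount, vbOffset, ibOffset; };

struct Counters { uint32_t linearPrims = 0, indexedPrims = 0; }; // stand-in for a GPU context

typedef void (*PfnDraw)(void*, const void*);
void DrawLinear(void* ctx, const void* item) {
    const LinearItem* d = static_cast<const LinearItem*>(item);
    static_cast<Counters*>(ctx)->linearPrims += d->primitiveCount; // real code would issue Draw()
}
void DrawIndexed(void* ctx, const void* item) {
    const IndexedItem* d = static_cast<const IndexedItem*>(item);
    static_cast<Counters*>(ctx)->indexedPrims += d->primitiveCount; // real code would issue DrawIndexed()
}
// Only two slots in this sketch; a real table would have all 16.
const PfnDraw s_table[] = { &DrawLinear, &DrawIndexed };

void Execute(void* ctx, const void* item) {
    uint64_t header;
    std::memcpy(&header, item, sizeof header); // header is always first
    s_table[JumpIndexOf(header)](ctx, item);   // the chosen function knows the layout
}
```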
2) Regarding InputAssemblerConfig: in your StateGroup you have VertexData and InstanceData separated, but then you have an InputAssemblerConfigID in your DrawItem. How does that work?
Furthermore, does it even make sense to separate InstanceData, since VertexData holds the stream format, which already has to know which attributes come from an instance buffer (doesn't it)?
See DrawItemSharedResources, above. The DrawItemWriter keeps track of these potentially reusable structures while building a collection of draw-items (between one Begin/End pair). The Persistent overload of Begin also takes a pointer to an existing DrawItemSharedResources, if you want a new set of draw-items to keep using the same pool as an earlier batch of draw-items. That class itself is basically a pool / hash-table, yep.
With the separate instance/vertex data, I split those because they tend to come from separate sources, which means separate state-groups. The mesh itself has a state-group that binds the per-vertex buffers, and an instancing system will have another state-group that binds the per-instance buffers.
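A minimal sketch of such a pool, assuming the input-assembler config is keyed on the pair of vertex-data and instance-data IDs that the two state-groups contribute. InputAssemblerKey, SharedResourcePool and GetOrCreateIaConfig are hypothetical names; the post only says the class is basically a pool / hash-table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical key: which vertex-data and instance-data groups were bound.
struct InputAssemblerKey {
    uint16_t vertexDataId;
    uint16_t instanceDataId; // assumption: 0 means no instancing group on the stack
    bool operator==(const InputAssemblerKey& o) const {
        return vertexDataId == o.vertexDataId && instanceDataId == o.instanceDataId;
    }
};
struct InputAssemblerKeyHash {
    std::size_t operator()(const InputAssemblerKey& k) const {
        return (std::size_t(k.vertexDataId) << 16) | k.instanceDataId;
    }
};

// Hands out one small ID per unique combination, so every draw-item built
// from the same mesh + instancing pair shares one InputAssemblerConfig.
class SharedResourcePool {
public:
    uint16_t GetOrCreateIaConfig(const InputAssemblerKey& key) {
        auto it = m_lookup.find(key);
        if (it != m_lookup.end()) return it->second;
        uint16_t id = uint16_t(m_configs.size());
        m_configs.push_back(key); // a real back-end would create the API object here
        m_lookup.emplace(key, id);
        return id;
    }
    std::size_t Count() const { return m_configs.size(); }
private:
    std::vector<InputAssemblerKey> m_configs;
    std::unordered_map<InputAssemblerKey, uint16_t, InputAssemblerKeyHash> m_lookup;
};
```

Keeping the two halves as separate inputs means the mesh and the instancing system never need to know about each other; the combination only materializes when the draw-item is compiled.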
3) In order to compile a DrawItem, you need to use a specific RenderPass (for overrides & defaults). For immediate mode this is pretty simple, but what about, e.g., meshes that are rendered in a deferred pass, a shadow pass, a reflection pass, ...? Any tips on how to implement this? Up until now, I could just generate my StateGroups and insert them into any pass. The pass would first evaluate a unique DrawItem, binding its own state (cbuffers, ...), followed by the submitted items. Now I need to somehow register the renderable with the pass, to create a state-item for each pass/renderable combination. Does that sound about right? Where do I store the generated DrawItem for the pass: inside a lookup-table/array in the Renderable, or inside the Pass itself?
4) Furthermore, upon execution of a DrawItem I have to submit the RenderPass information (render-targets, viewport). How do you handle this in combination with your render-queue/DrawItem sorting?
That problem simply isn't solved in my low-level API -- it just trusts that you submit a draw-item alongside the same render-pass that you created it with, and the behaviour is undefined otherwise.
The GpuContext submit function looks like:
//Submit a list of draw-calls, optionally clearing before the first draw
void Submit( const RenderPass&, const DrawList&, const ClearCommand* c=0 );
Where DrawList is a lightweight class that basically contains a DrawItem** and a count. So yes, you submit a collection of DrawItems and explicitly state the RenderPass to use with them.
At a higher level, rendering systems will query their model's shaders as to which passes that model is compatible with, and will query the rendering pipeline as to which passes it intends to render models with. The rendering systems will then pre-generate several draw-items for each mesh -- one for each pass that it will be used in.
A scene traversal / sorting system can then get the list of required passes from the rendering pipeline, and then fill in one array per pass from these rendering systems, sort them, and hand the sorted arrays over to the rendering pipeline.
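A minimal sketch of that flow, under a few assumptions: each renderable keeps one pre-built draw-item pointer per pass (nullptr where its shaders don't support that pass), and each item exposes a sortKey. Both fields, the Renderable struct and the helper names are hypothetical, since real draw-items are opaque at this level. DrawList follows the post's description of a pointer array plus a count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct DrawItem { uint64_t sortKey; }; // stand-in; real draw-items are opaque

// Lightweight view: a pointer to an array of draw-item pointers plus a count.
struct DrawList {
    DrawItem* const* items;
    uint32_t count;
};

// Hypothetical: one pre-generated draw-item per pass, nullptr if unused.
struct Renderable {
    std::vector<DrawItem*> itemPerPass;
};

// Fill one array per pass from the visible renderables, then sort each array.
std::vector<std::vector<DrawItem*>> BuildPassLists(const std::vector<Renderable>& visible,
                                                   uint32_t passCount) {
    std::vector<std::vector<DrawItem*>> perPass(passCount);
    for (const Renderable& r : visible)
        for (uint32_t p = 0; p < passCount && p < r.itemPerPass.size(); ++p)
            if (r.itemPerPass[p]) perPass[p].push_back(r.itemPerPass[p]);
    for (auto& list : perPass)
        std::sort(list.begin(), list.end(),
                  [](const DrawItem* a, const DrawItem* b) { return a->sortKey < b->sortKey; });
    return perPass;
}

DrawList MakeDrawList(const std::vector<DrawItem*>& items) {
    return DrawList{ items.data(), uint32_t(items.size()) };
}
```

Each sorted array would then be wrapped in a DrawList and passed to Submit together with the RenderPass its items were compiled against.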
5) Where/how does the scissor rect fit into the StateGroup/DrawItem? In DX11 it's part of the rasterizer stage, but making it part of the rasterizer config seems clumsy: I would definitely need more than 256 scissor rects, so I would need to increase the ID size of the rasterizer config from 8 to 16 bits, and I'd also have to create a whole rasterizer config when just the scissor rect changed.
The draw-item header has one bit indicating whether a per-draw-item scissor rect is supplied. If not, there's no extra overhead; otherwise four u16s containing the scissor rect are appended to the end of the draw-item (I don't support floating-point rect coords :( ).
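A minimal sketch of the optional trailing scissor rect, assuming the flag lives at bit 4 of the header and that the caller knows the fixed payload size for the item's type (both assumptions; the real layout isn't given). ScissorRect, DrawItemSize and ReadScissor are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const uint64_t kHasScissorBit = uint64_t(1) << 4; // assumed flag position

struct ScissorRect { uint16_t left, top, right, bottom; }; // four u16s, integer coords only

// Size of a draw-item: its fixed payload, plus 8 bytes only when the flag is set.
std::size_t DrawItemSize(uint64_t header, std::size_t baseSize) {
    return baseSize + ((header & kHasScissorBit) ? sizeof(ScissorRect) : 0);
}

// Returns true and fills 'out' if the item carries a trailing scissor rect.
bool ReadScissor(const unsigned char* item, std::size_t baseSize, ScissorRect& out) {
    uint64_t header;
    std::memcpy(&header, item, sizeof header);
    if (!(header & kHasScissorBit)) return false;
    std::memcpy(&out, item + baseSize, sizeof out);
    return true;
}
```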
The scissor rect is not part of the rasterizer-state object in DX11 - only a bool saying whether a scissor rect is in use or not is part of that object. So, I always compile a pair of D3D rasterizer-state objects for each of my own rasterizer-states, so I can pick the right one depending on whether a draw-item/pass wants scissoring or not.
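A minimal sketch of the "compile a pair" approach with the API objects mocked out. ApiRasterizerState stands in for something like an ID3D11RasterizerState, whose description does carry only a scissor-enable flag; RasterizerPair, CompileRasterizerPair and Select are hypothetical names.

```cpp
#include <cassert>

// Stand-in for a compiled API object (e.g. an ID3D11RasterizerState* on D3D11).
struct ApiRasterizerState { bool scissorEnable; int id; };

// Both variants are built up front from one engine-side rasterizer state,
// differing only in the scissor-enable flag.
struct RasterizerPair {
    ApiRasterizerState noScissor;
    ApiRasterizerState withScissor;
};

RasterizerPair CompileRasterizerPair(int id) {
    return RasterizerPair{ {false, id}, {true, id} };
}

// Pick at submit time, depending on whether the draw-item or the pass wants scissoring.
const ApiRasterizerState& Select(const RasterizerPair& pair, bool itemHasScissor, bool passHasScissor) {
    return (itemHasScissor || passHasScissor) ? pair.withScissor : pair.noScissor;
}
```

This keeps the engine's rasterizer-state IDs at 8 bits: the scissor rect itself never becomes part of the state object, only the choice between two pre-built variants.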
Also see the above example of having the draw submission code vary based on whether the draw-item has a per-draw stencil-ref or not -- you can build similar permutations that deal with extra data like a scissor rect being present or not.
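A minimal sketch of extending the permutation idea to a second flag. The CommandLog context and the SubmitDraw template are hypothetical; the increments mark where real code would set the stencil-ref or the scissor rect. Because both bools are template parameters, each instantiation has its branches fixed at compile time, which the compiler can fold away.

```cpp
#include <cassert>
#include <cstdint>

struct CommandLog { uint32_t stencilSets = 0, scissorSets = 0, draws = 0; };

template <bool PerDrawStencil, bool HasScissor>
void SubmitDraw(CommandLog& log) {
    if (PerDrawStencil) ++log.stencilSets; // would set the per-draw stencil-ref here
    if (HasScissor)     ++log.scissorSets; // would set the trailing scissor rect here
    ++log.draws;                           // would issue the draw here
}

typedef void (*PfnSubmit)(CommandLog&);
// Index = PerDrawStencil + 2 * HasScissor.
const PfnSubmit s_submitTable[] = {
    &SubmitDraw<false, false>, &SubmitDraw<true, false>,
    &SubmitDraw<false, true>,  &SubmitDraw<true, true>,
};
```

The trade-off is that each extra flag doubles the table: adding a scissor bit to the 16-entry table above would make it 32 entries and need 5 header bits.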