Advanced Render Queue API


31 replies to this topic

#21 IncidentRay   Members   -  Reputation: 150

Posted 12 January 2013 - 01:38 PM

Thanks for the reply, Hodgman.  Memcpy does seem to work on non-POD types in this case.  I'm interested in the alternate design you suggested using composition rather than inheritance.  That looks like it would work quite well for one level of inheritance, but I'm not sure how you would extend this to multiple levels of inheritance.  For this renderer design, I think you might need two levels of inheritance -- you mentioned that State inherits from Command, and then, in my understanding, you have multiple render-state structs derived from State.  Would you just do something like this?

struct Command {
    u8 id;
};

struct State {
    Command baseClass;
    // other members...
};

struct BindShader {
    State baseClass;
    Shader* shader;
};



#22 TiagoCosta   Crossbones+   -  Reputation: 1712

Posted 12 January 2013 - 03:14 PM

IncidentRay wrote: "For this renderer design, I think you might need two levels of inheritance -- you mentioned that State inherits from Command, and then, in my understanding, you have multiple render-state structs derived from State. Would you just do something like this?"

 

I'm pretty sure you just need a single level of inheritance.

If you have a Command struct:

struct Command
{
     u8 id;
};

You can have as many command structs as you need:

struct BindVSCommand
{
     Command cmd;
     VertexShader* pVS;
};

struct BindVSTexture0Command
{
     Command cmd;
     Texture* pTexture;
};

struct BindVSTexture1Command
{
     Command cmd;
     Texture* pTexture;
};

In your State struct you don't need any other members so you can remove that structure.

 

In my renderer (which follows many of Hodgman's ideas) I don't even use a Command struct, so there is no need for inheritance.

I simply store an int before each command struct in the CommandGroup blob (in this topic called StateGroup).

When executing CommandGroups I read the int from the blob and switch on it to dispatch the appropriate command.

 

CommandGroup cmdGroup;

for(int i = 0; i < cmdGroup.numCommands; i++)
{
    int cmdID = cmdGroup.blob.next<int>();

    switch(cmdID)
    {
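        case BIND_VERTEX_SHADER:
        {
            // Braces give cmd its own scope; C++ does not allow jumping past an initialized local to a later case label.
            BindVertexShaderCommand cmd = cmdGroup.blob.next<BindVertexShaderCommand>();
            //all kinds of state check etc
            executeCommand(cmdID, cmd);
            break;
        }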
        //all other commands
    }
}
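
For anyone following along, here is a minimal, self-contained sketch of how a blob with a next<T>() reader like the one in the loop above might be put together. Everything in it is an assumption made for illustration: the Blob class, its push/next functions, the two command structs, the integer handles and the FakeDevice that stands in for real API calls are all hypothetical, and this is not TiagoCosta's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical command IDs and payloads (not taken from any real engine).
enum CommandId : int { BIND_VERTEX_SHADER = 1, BIND_TEXTURE = 2 };
struct BindVertexShaderCommand { int shaderHandle; };
struct BindTextureCommand { int slot; int textureHandle; };

// A byte blob that is appended to with push() and consumed with next().
class Blob {
    std::vector<unsigned char> bytes;
    std::size_t readPos = 0;
public:
    template <typename T> void push(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "blob stores raw bytes");
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }
    // Copies the next sizeof(T) bytes out; memcpy avoids unaligned reads.
    template <typename T> T next() {
        static_assert(std::is_trivially_copyable<T>::value, "blob stores raw bytes");
        assert(readPos + sizeof(T) <= bytes.size());
        T value;
        std::memcpy(&value, bytes.data() + readPos, sizeof(T));
        readPos += sizeof(T);
        return value;
    }
};

struct CommandGroup { Blob blob; int numCommands = 0; };

// Records what was "bound" so the sketch can be checked without a GPU.
struct FakeDevice { int boundShader = -1; int boundTexture[4] = {-1, -1, -1, -1}; };

void execute(CommandGroup& group, FakeDevice& device) {
    for (int i = 0; i < group.numCommands; ++i) {
        int id = group.blob.next<int>();  // the ID precedes each command
        switch (id) {
        case BIND_VERTEX_SHADER: {
            BindVertexShaderCommand cmd = group.blob.next<BindVertexShaderCommand>();
            device.boundShader = cmd.shaderHandle;
            break;
        }
        case BIND_TEXTURE: {
            BindTextureCommand cmd = group.blob.next<BindTextureCommand>();
            device.boundTexture[cmd.slot] = cmd.textureHandle;
            break;
        }
        default:
            assert(false && "unknown command id");
            break;
        }
    }
}
```

Because each template instantiation of next<T>() is tiny, a compiler will typically inline it, which is the situation the later replies about template cost assume.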

 

 

To Hodgman: is there any performance penalty in using template functions?


Edited by TiagoCosta, 12 January 2013 - 03:32 PM.

Tiago Costa
Aqua Engine - my DirectX 11 game "engine" - In development

#23 melbow   Members   -  Reputation: 215

Posted 12 January 2013 - 07:50 PM

While I am not Hodgman, I will say that, broadly speaking, templates increase the compiled code's size. If the function is inlined though (which I assume it is in your case, TiagoCosta), there should not be any performance penalty.

#24 IncidentRay   Members   -  Reputation: 150

Posted 12 January 2013 - 09:18 PM

TiagoCosta wrote: "I'm pretty sure you just need a single level of inheritance"

 

You're right, it does look like a single level of inheritance would be fine.  I was wondering about the purpose of the State struct, as I can't think of what data members it would have.

 

About templates (sorry, not Hodgman!): I'm pretty sure templates are a purely compile-time feature, so they shouldn't have much of a runtime-performance penalty.  However, melbow has a good point: templates often do increase the size of the compiled code.  I think there probably wouldn't be much benefit if you implemented exactly the same thing but without using templates (e.g. just copying and pasting your classes/structs/functions and changing the variable types).  Templated functions could possibly cause cache misses if the compiler hasn't placed the code for both instantiations nearby.  In your example, if the instructions for next<int> and next<BindVertexShaderCommand> aren't nearby, this might cause an i-cache miss.  However, this only applies if they're not inlined, so if they are inlined, you should be fine.  I've only started learning about how caches work recently, though, so someone should correct me if I'm wrong.



#25 Hodgman   Moderators   -  Reputation: 24222

Posted 12 January 2013 - 09:54 PM

Yeah you could do this with a single level of inheritance. I do use two levels like you demonstrated, IncidentRay, but State (and DrawCall) don't have any other members.

I only have this second layer of inheritance so that elsewhere in the API I can say that StateGroups contain States (which are just Commands) and SubMeshes/RenderItems contain DrawCalls (which are also just Commands). It just ensures that the user can't put a DrawCall-derived command in a StateGroup, nor use a State-derived command to represent geometry in a SubMesh, without generating a compile-time error.
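
To make that concrete, here is a small sketch of how empty State and DrawCall marker types can turn a misuse into a compile-time error. It is only an illustration under assumptions: StateGroupBuilder, RenderItem, BindShader, DrawIndexed and the integer handles are hypothetical names, not the engine's real interface.

```cpp
#include <cassert>
#include <type_traits>

typedef unsigned char u8;

struct Command  { u8 id; };
struct State    : Command {};  // marker type, adds no members
struct DrawCall : Command {};  // marker type, adds no members

// Hypothetical concrete commands.
struct BindShader  : State    { int shaderHandle; };
struct DrawIndexed : DrawCall { int indexCount; };

// Hypothetical containers that only accept the matching command family.
struct StateGroupBuilder {
    int count = 0;
    void add(const State&) { ++count; }
};
struct RenderItem {
    const DrawCall* draw = nullptr;
};

// The marker types are what the compiler checks against.
static_assert(std::is_convertible<BindShader*, State*>::value,
              "a BindShader can be used wherever a State is expected");
static_assert(!std::is_convertible<DrawIndexed*, State*>::value,
              "a draw call cannot be passed off as a State");
static_assert(!std::is_convertible<BindShader*, DrawCall*>::value,
              "a state cannot be passed off as a DrawCall");
// Holds on mainstream compilers: the empty marker layer costs no storage.
static_assert(sizeof(State) == sizeof(Command), "marker adds no data");

// StateGroupBuilder b; DrawIndexed d; b.add(d);  // would not compile
```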

 

TiagoCosta's method of not even putting the ID integer in the command structure (and just having it precede the structure in the parent allocation) is also a popular choice -- I've often seen this approach used in networking systems.

 

 

Yeah when using templates, you can imagine what the code would be like if you manually implemented them, e.g. if you wrote NextInt() and NextBindVertexShaderCommand() -- the compiler will basically be doing that behind the scenes.

 

If two different template functions happen to produce the same assembly code (e.g. vector<void*> and vector<Foo*> are likely exactly the same at the asm level) then your compile times will suffer, as the compiler will generate the same code multiple times, but then a modern linker will remove/merge the duplicates.

On old compilers (nothing you'll be using these days), the linker wouldn't perform this step, so if you had many different vector<T*> (with different T's) you'd end up with a ton of identical functions in your final executable... This is one reason why game developers used to dislike templates and the STL.



#26 IncidentRay   Members   -  Reputation: 150

Posted 12 January 2013 - 10:19 PM

Hodgman wrote: "Yeah you could do this with a single level of inheritance. I do use two levels like you demonstrated, IncidentRay, but State (and DrawCall) don't have any other members.
I only have this second layer of inheritance so that elsewhere in the API I can say that StateGroups contain States (which are just Commands) and SubMeshes/RenderItems contain DrawCalls (which are also just Commands). It just ensures that the user can't put a DrawCall-derived command in a StateGroup, nor use a State-derived command to represent geometry in a SubMesh, without generating a compile-time error."

 

 

Moving the errors to compile time is definitely a good thing.  So then the RenderItem struct can have a pointer to a DrawCall rather than just a pointer to a Command. I'm not quite sure how it would work for the state-group, however.  I was thinking of having a StateGroup struct which actually just stores the header -- i.e. the number of states, the size in bytes, etc.  Then the actual commands would be allocated contiguously after the header.  For example,

 

struct StateGroup {
    u8 numStates;
    u32 sizeInBytes;
};

void Foo(StateGroup* group)
{
    State* states = (State*)((u8*)group + sizeof(StateGroup));
}

 

I can't see a way to stop the user from putting DrawCalls in the state group.  Do you implement StateGroups differently, or is there something I'm missing?

 

 

Hodgman wrote: "If two different template functions happen to produce the same assembly code (e.g. vector and vector are likely exactly the same at the asm level) then your compile times will suffer, as the compiler will generate the same code multiple times, but then a modern linker will remove/merge the duplicates.
On old compilers (nothing you'll be using these days), the linker wouldn't perform this step, so if you had many different vector (with different T's) you'd end up with a ton of identical functions in your final executable... This is one reason why game developers didn't used to like using templates or the STL."

 

Yeah, I forgot that linkers can do that; thanks for the reminder.  TiagoCosta's use of templates is probably completely fine then.

 

EDIT: This editor seems to keep messing up the angle brackets for templates, so I can't get the quote above right.


Edited by IncidentRay, 12 January 2013 - 10:23 PM.


#27 Hodgman   Moderators   -  Reputation: 24222

Posted 12 January 2013 - 10:38 PM

If you're going for memory efficiency, be aware of your compiler's padding behaviour when designing your header structures -- e.g. your numStates var may as well be a u32 on my compiler.

If you support having external tools that generate these kinds of data-structures (e.g. I put state-groups directly in my engine's model, material and technique files so that I don't have to allocate extra memory when parsing those files on-load), then you've also got to be aware that those tools are dependent on your compiler's structure layout, or you've got to use compiler-specific extensions to force it to use the structure layout that you want.

struct StateGroup {
    u8 numStates;
    u8 _padding_[3];//!
    u32 sizeInBytes;
};

struct StateGroupB {
    u16 numStates;
    u16 sizeInBytes;//64KiB of state data should be enough for anyone ;)
    u32 _padding_;//n.b. your compiler might do this to align the u64 below!
    u64 stateMask;//this isn't required, but I chose to add it
};

#pragma pack(push)//compiler-specific code to ensure no padding...
#pragma pack(1)//you're now responsible for understanding the alignment requirements of your target CPU...
struct StateGroupC {
    u64 stateMask;
    u16 numStates;
    u16 sizeInBytes;
};
#pragma pack(pop)
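
One way to keep the layout assumptions above honest is to state them as static_asserts, so a compiler or tool that lays the header out differently fails the build instead of silently misreading files. This is only a sketch: the struct names StateGroupPadded and StateGroupPacked are made up for the example, and #pragma pack is a compiler extension (supported by MSVC, GCC and Clang), not standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint8_t  u8;
typedef std::uint16_t u16;
typedef std::uint32_t u32;
typedef std::uint64_t u64;

// Naive header: the compiler inserts padding after numStates.
struct StateGroupPadded {
    u8  numStates;
    u32 sizeInBytes;
};

// Packed header: no padding anywhere, 8 + 2 + 2 bytes.
#pragma pack(push, 1)
struct StateGroupPacked {
    u64 stateMask;
    u16 numStates;
    u16 sizeInBytes;
};
#pragma pack(pop)

// sizeInBytes is pushed out to the alignment of u32, so the single
// u8 in front of it effectively occupies alignof(u32) bytes.
static_assert(offsetof(StateGroupPadded, sizeInBytes) == alignof(u32),
              "padding follows numStates");
static_assert(sizeof(StateGroupPacked) == 12,
              "packed header has no padding");
```

An offline tool that writes these headers can carry the same asserts, which catches layout drift between the tool's compiler and the engine's.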

I do pretty much the same thing that you've thought of here, except I also put a bitfield in the header, with one bit for each type of state.

For an example of where this may be useful -- in my loop that processes state-groups (posted earlier), I allow an item to contain 'layers' of state-groups, where state-values in earlier/higher groups take precedence over the values in later/lower groups. The bitfield lets me quickly check if a state-group contains any states that I'm interested in looking at, or whether it only contains states that I've already got values for and can be skipped entirely.
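
A rough sketch of that layered lookup, to show where the bitfield saves work. The data model here is deliberately simplified and hypothetical (a Group holding a mask plus a vector of id/value pairs rather than a packed blob), and makeGroup and resolve are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

typedef std::uint64_t u64;

// Hypothetical simplified state-group: one bit per state type in the mask.
struct StateValue { int id; int value; };
struct Group { u64 stateMask; std::vector<StateValue> states; };

Group makeGroup(std::vector<StateValue> states) {
    Group g{0, std::move(states)};
    for (const StateValue& s : g.states) g.stateMask |= u64(1) << s.id;
    return g;
}

// Resolves layers front to back: the first group to set a state wins.
// Returns how many groups were skipped without looking at their states.
int resolve(const std::vector<Group>& layers, int (&resolved)[64]) {
    u64 have = 0;  // bits for states that already have a value
    int skipped = 0;
    for (const Group& g : layers) {
        if ((g.stateMask & ~have) == 0) {  // nothing new in this group
            ++skipped;
            continue;
        }
        for (const StateValue& s : g.states) {
            u64 bit = u64(1) << s.id;
            if (have & bit) continue;  // an earlier layer takes precedence
            resolved[s.id] = s.value;
            have |= bit;
        }
    }
    return skipped;
}
```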

IncidentRay wrote: "I can't see a way to stop the user from putting DrawCalls in the state group.  Do you implement StateGroups differently, or is there something I'm missing?"

It depends on how you generate your StateGroups. I use the same allocation scheme -- a header followed by states -- and use a stack allocator to achieve this.
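As a simple example of how you could enforce compile-time checking while creating a state-group:

class StateGroupWriter
{
  StackAlloc& a;
  StateGroupHeader* header;
public:
  // A reference member has to be bound in a constructor.
  explicit StateGroupWriter( StackAlloc& alloc ) : a(alloc), header(0) {}
  // Allocates the header; states written afterwards land directly behind it.
  void Begin() { header = a.Alloc<StateGroupHeader>(); header->numStates = 0; header->stateMask = 0; }
  // Only accepts State-derived commands, so passing a DrawCall fails to compile.
  void Write( State& state )
  {
    uint size = state.Size();
    u8* mem = a.Alloc(size);
    memcpy(mem, &state, size);
    ++header->numStates;
    header->stateMask |= u64(1)<<state.id;
  }
  // Everything allocated since Begin() belongs to this group.
  StateGroupHeader* End()
  {
    StateGroupHeader* result = header;
    header = 0;
    result->sizeInBytes = a.Mark() - ((u8*)result);
    return result;
  }
};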

Edited by Hodgman, 12 January 2013 - 11:03 PM.


#28 IncidentRay   Members   -  Reputation: 150

Posted 12 January 2013 - 11:26 PM

Hodgman wrote: "If you're going for memory efficiency, be aware of your compiler's padding behaviour when designing your header structures"

 

Yeah, that StateGroup struct was just a quick example; I'd definitely worry about padding in real code.

 

Hodgman wrote: "e.g. I put state-groups directly in my engine's model, material and technique files"

 

Putting the state groups in resource files is an intriguing idea.  How do you deal with pointers in this case, as obviously they will have different values each time you run the engine? I suppose you could store some other data in the pointer field, and then fix the pointers at runtime -- e.g. maybe a hash of a filename.  Or do you do something else?

 

 

Hodgman wrote: "I do pretty much the same thing that you've thought of here, except I also put a bitfield in the header, with one bit for each type of state."

 

Your recommendation of storing a state bitfield sounds good -- that looks like it would be quite useful.  As I understand it, the bitfield can be used to quickly check whether a state group can be totally ignored, but if not, the next step is to check each state individually?

 

Thanks for the detailed example of how you might generate state groups; that answers several questions I had.



#29 Hodgman   Moderators   -  Reputation: 24222

Posted 13 January 2013 - 12:39 AM

IncidentRay wrote: "How do you deal with pointers in this case, as obviously they will have different values each time you run the engine? I suppose you could store some other data in the pointer field, and then fix the pointers at runtime -- e.g. maybe a hash of a filename.  Or do you do something else?"

First up, if you want to research this, the technique of loading your runtime data structures straight out of a file with no (or little) on-load processing is usually called "load-in-place" or "in-place memory", or something similar. There's a Gamasutra article here.

 

For pointers within a particular asset (e.g. a pointer from a model header to an array of state-group pointers, to a state-group) I use the Offset (relative address) and Address (absolute address) classes in this header -- i.e. I don't use actual pointers.

 

Things that need to be pointers at runtime but can't be known in advance (e.g. a pointer to an actual D3D vertex buffer) have to undergo "pointer patching" on-load.

For example, say a model asset has n vertex-buffers, and also has some state-groups that need to contain pointers to those vertex buffers. The state groups are saved with an integer in the range 0..n-1 in place of the VB pointer, which indicates the index of the VB that it should point to. On-load, the model's VBs are created by D3D and we now know the real pointer values. We can then iterate through the state-groups, reading these index integers and using them to look up the appropriate VB pointer, and writing the pointer over the top of the integer.
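
As a sketch of that patching step, assuming a much simplified layout: VertexBuffer here is a plain struct standing in for the real D3D object, and VertexBufferRef, BindVertexBuffer and patch are hypothetical names. The field is a union so the same bytes hold the index in the file and the pointer after loading.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a GPU vertex buffer created at load time.
struct VertexBuffer { int debugId; };

// On disk this holds an index; after patching it holds a real pointer.
union VertexBufferRef {
    std::uintptr_t index;
    VertexBuffer*  ptr;
};

// Hypothetical state that refers to one of the model's vertex buffers.
struct BindVertexBuffer { VertexBufferRef vb; };

// Overwrites each saved index with the address of the matching buffer.
void patch(std::vector<BindVertexBuffer>& states,
           std::vector<VertexBuffer>& buffers) {
    for (BindVertexBuffer& s : states) {
        std::uintptr_t index = s.vb.index;  // read the saved index first
        assert(index < buffers.size());
        s.vb.ptr = &buffers[index];         // then write the pointer over it
    }
}
```

Patching must run exactly once per load, and only after the buffers vector has stopped growing, since the stored addresses point into it.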

 

For references to other assets, I use filename hashes, yep. As above, these hashes can be converted to real pointers on-load, if required.

 

IncidentRay wrote: "but if not, the next step is to check each state individually?"

Yeah, my states are variable size, and the group-header doesn't contain the actual offset of each state. Therefore to iterate through the group, you have to inspect each state in a linear order, and determine the current state's size to know how far to jump ahead to find the next state. The bitfield does allow you to halt this iteration early if you know that you've already inspected all of the 'interesting' states in the group though.
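
A sketch of that linear walk with the early exit. The encoding is an assumption made for the example: each state is a one-byte ID followed by a payload whose size depends on the ID, and the IDs, sizes and function names (payloadSize, appendState, findStates) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint8_t  u8;
typedef std::uint64_t u64;

// Hypothetical state IDs and payload sizes (bytes following the 1-byte ID).
enum StateId : u8 { STATE_SHADER = 0, STATE_TEXTURE = 1, STATE_BLEND = 2 };

std::size_t payloadSize(u8 id) {
    switch (id) {
    case STATE_SHADER:  return 4;
    case STATE_TEXTURE: return 8;
    case STATE_BLEND:   return 1;
    default:            return 0;
    }
}

// Appends one state (ID plus zeroed payload) to the byte stream.
void appendState(std::vector<u8>& bytes, u8 id) {
    bytes.push_back(id);
    bytes.insert(bytes.end(), payloadSize(id), u8(0));
}

// Walks the states in order, recording the byte offset of each wanted one.
// Stops as soon as every wanted bit has been found; returns states inspected.
int findStates(const std::vector<u8>& bytes, int numStates, u64 wanted,
               std::size_t (&offsets)[64]) {
    int inspected = 0;
    std::size_t cursor = 0;
    for (int i = 0; i < numStates && wanted != 0; ++i) {
        u8 id = bytes[cursor];
        u64 bit = u64(1) << id;
        if (wanted & bit) {
            offsets[id] = cursor;
            wanted &= ~bit;
        }
        cursor += 1 + payloadSize(id);  // jump ahead to the next state
        ++inspected;
    }
    return inspected;
}
```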

 

As an alternative, you could allocate an array of size numStates in/after the header and write the offset of each state into this array. If you then ordered the states by their ID value, you'd be able to quickly jump to a particular state that you're interested in without iterating through each one.
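
The offset-table alternative could look roughly like this. It is a sketch under assumptions: StateEntry and findOffset are made-up names, the table is shown as a std::vector rather than bytes placed after the header, and a binary search is just one way to exploit the ordering by ID.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint8_t  u8;
typedef std::uint16_t u16;

// Hypothetical table entry: one per state, kept sorted by id.
struct StateEntry { u8 id; u16 offset; };

// Returns the byte offset of the state with the given id, or -1 if the
// group does not contain it. Relies on the table being sorted by id.
int findOffset(const std::vector<StateEntry>& table, u8 id) {
    auto it = std::lower_bound(
        table.begin(), table.end(), id,
        [](const StateEntry& e, u8 wanted) { return e.id < wanted; });
    if (it == table.end() || it->id != id) return -1;
    return it->offset;
}
```

The trade-off is a few extra bytes per state in exchange for skipping the linear walk, which probably only pays off for groups with many states.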


Edited by Hodgman, 13 January 2013 - 12:43 AM.


#30 melbow   Members   -  Reputation: 215

Posted 13 January 2013 - 06:57 AM

Is the reason for storing StateGroups as a header followed by a chunk of States that it is more cache-friendly due to locality?

Also, you say you store StateGroups in Resources. Am I correct in assuming some StateGroups must still be created at runtime, e.g. for geometry instance transforms? And therefore RenderItem creation also cannot be done when the game object is initialized, but must be done every frame? Is this correct or am I completely off base?

#31 IncidentRay   Members   -  Reputation: 150

Posted 13 January 2013 - 06:12 PM

Hodgman wrote: "First up, if you want to research this, the technique of loading your runtime data structures straight out of a file with no (or little) on-load processing is usually called 'load-in-place' or 'in-place memory', or something similar. There's a Gamasutra article here.

For pointers within a particular asset (e.g. a pointer from a model header to an array of state-group pointers, to a state-group) I use the Offset (relative address) and Address (absolute address) classes in this header -- i.e. I don't use actual pointers."

 

I had heard of this technique before; I was more wondering about the specific case of state-groups, where it seems that a lot of states would need to store pointers.  For example, pointers to shaders, textures, vertex buffers, index buffers, cbuffer data, etc.  Your example of model assets cleared up my questions about this, though.  I wasn't quite sure what the name for this technique was, however, so your suggestions for terms to research look helpful.  I'll have a look at the article you linked as well.

 

So in general, the idea is to store the actual data alongside the state groups, use an offset or address to that data in place of an actual pointer, and then perform pointer-patching when loading.

 

Hodgman wrote: "Yeah, my states are variable size, and the group-header doesn't contain the actual offset of each state. Therefore to iterate through the group, you have to inspect each state in a linear order, and determine the current state's size to know how far to jump ahead to find the next state. The bitfield does allow you to halt this iteration early if you know that you've already inspected all of the 'interesting' states in the group though.

As an alternative, you could allocate an array of size numStates in/after the header and write the offset of each state into this array. If you then ordered the states by their ID value, you'd be able to quickly jump to a particular state that you're interested in without iterating through each one."

 

I think linear iteration would probably be fine for me as well.  Using the bitfield to skip the rest of a state-group if possible sounds like a good idea.


Edited by IncidentRay, 13 January 2013 - 06:13 PM.


#32 melbow   Members   -  Reputation: 215

Posted 15 January 2013 - 04:48 PM

So I think I've managed to go full circle from virtual Draw methods to a Render Command Queue back to virtual Draw methods. It just doesn't seem worth it to do all this work just for a unified renderer when you could have a virtual Draw method and call it a day. If renderable objects are managed and batched prior to being made into RenderItems then as far as I can tell, you could just give the batch a Draw method and enqueue the batch.

I guess I don't really see the point of the Commands.

I am at a loss because I want to do this the right way, but I have too little experience with coding rendering APIs to know what that is. And please don't say, "There is no 'right way.'"

Could anybody recommend a book that covers render queues similar to what has been described by Hodgman?



