Advanced Render Queue API

melbow · 2013-01-15T22:48:22

Let me preface this by saying I have read everything I could find on the web on this topic, but have been unable to answer my question. This list includes most notably:

http://realtimecollisiondetection.net/blog/?p=86
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter05.html
http://www.gamedev.net/topic/604899-frostbite-rendering-architecture-question/
http://www.gamedev.net/topic/605065-renderqueue-design-theory-and-implementation/#entry4828468
http://www.gamedev.net/topic/602839-whats-the-point-of-having-a-single-render-device/#entry4815585

I understand sorting the draw calls and how to use this system under the fixed-function pipeline. However, when I change to a programmable pipeline, I can't seem to come up with how a "Render Operation" would be structured. The best solution I could come up with was an object list for both uniforms and attributes. However, I can't see how a VAO or VBO would fit into this scheme. Roughly, my code might look like:


class ShaderProgram
{
public:
    void Create(char*,char*);
    void MakeCurrent();
    ShaderAttributeInfo const & GetAttribute(const char* name);
    ...
private:
    List<ShaderAttributeInfo> m_Attributes;
    List<ShaderUniformInfo> m_Uniforms;
};

class ShaderAttribInfo
{
    GLint handle;
    GLenum type;
public:
    Set(void* data);
};

class ShaderAttrib
{
    ShaderAttribInfo* info;
    void* data;
    ...
};

... // Same basic structure for Uniforms

class RenderOperation
{
    List<ShaderAttrib> m_Attributes;
    List<ShaderUniform> m_Uniforms;
    ShaderProgram* m_Program;
};

I'm trying to make my system as versatile, yet efficient as possible. To reiterate, my question is: How would a "Render Operation" be formatted for a Render Queue using a programmable pipeline? The operation should allow for any sort of uniform or attribute and allow for VAOs and VBOs. And I don't feel the need to support a fixed function pipeline, so don't worry about that.

I hope I made my confusion clear enough. Thanks a ton.


Started by melbow December 27, 2012 06:37 AM

30 comments, last by melbow 11 years, 3 months ago

IncidentRay

154

January 12, 2013 07:38 PM

Thanks for the reply, Hodgman. Memcpy does seem to work on non-POD types in this case. I'm interested in the alternate design you suggested using composition rather than inheritance. That looks like it would work quite well for one level of inheritance, but I'm not sure about how you would extend this to multiple levels of inheritance. For this renderer design, I think you might need two levels of inheritance -- you mentioned that State inherits from Command, and then, in my understanding, you have multiple render-state structs derived from State. Would you just do something like this?


struct Command {
    u8 id;
};

struct State {
    Command baseClass;
    // other members...
};

struct BindShader {
    State baseClass;
    Shader* shader;
};
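As a rough illustration of the composition approach above (not code from anyone's engine), the sketch below checks at compile time that nesting the base struct as the first member keeps `id` at offset 0 through both levels, so a pointer to any command can be read as a `Command*`. The `u8` typedef, the opaque `Shader`, the `CMD_BIND_SHADER` constant, and the helpers `MakeBindShader` and `IdOf` are assumptions made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

typedef std::uint8_t u8;
struct Shader;  // opaque here; assumed to be defined elsewhere

enum : u8 { CMD_BIND_SHADER = 1 };  // hypothetical command ID

struct Command    { u8 id; };
struct State      { Command baseClass; };                  // first member = "base"
struct BindShader { State baseClass; Shader* shader; };    // second level

// Standard-layout structs keep their first member at offset 0, so the
// command ID sits at the very start of every nested command struct.
static_assert(std::is_standard_layout<BindShader>::value, "must stay standard-layout");
static_assert(offsetof(State, baseClass) == 0, "Command must be first in State");
static_assert(offsetof(BindShader, baseClass) == 0, "State must be first in BindShader");

// Hypothetical helper: fills in the ID through both levels of nesting.
inline BindShader MakeBindShader(Shader* shader) {
    BindShader cmd;
    cmd.baseClass.baseClass.id = CMD_BIND_SHADER;
    cmd.shader = shader;
    return cmd;
}

// Hypothetical helper: reads the ID of any command built this way.
inline u8 IdOf(const void* anyCommand) {
    return static_cast<const Command*>(anyCommand)->id;
}
```

The cost of the extra level is only the verbose `baseClass.baseClass.id` access; the memory layout is the same as with a single level.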

Aqua Costa

3,705

January 12, 2013 09:14 PM

For this renderer design, I think you might need two levels of inheritance -- you mentioned that State inherits from Command, and then, in my understanding, you have multiple render-state structs derived from State. Would you just do something like this?

I'm pretty sure you just need a single level of inheritance.

If you have a Command struct:


struct Command
{
    u8 id;
};

You can have as many command structs as you need:


struct BindVSCommand
{
    Command cmd;
    VertexShader* pVS;
};

struct BindVSTexture0Command
{
    Command cmd;
    Texture* pTexture;
};

struct BindVSTexture1Command
{
    Command cmd;
    Texture* pTexture;
};

In your State struct you don't need any other members so you can remove that structure.

In my renderer (which follows many of Hodgman's ideas) I don't even use a command struct so no need for inheritance.

I simply store an int before each Command struct in the CommandGroup blob (in this topic called StateGroup).

When executing CommandGroups I read the int from the blob and switch on it to dispatch the appropriate command.


CommandGroup cmdGroup;

for(int i = 0; i < cmdGroup.numCommands; i++)
{
    int cmdID = cmdGroup.blob.next<int>();

    switch(cmdID)
    {
        case BIND_VERTEX_SHADER:
        {
            //braces give cmd its own scope, so later case labels don't jump over its initialization
            BindVertexShaderCommand cmd = cmdGroup.blob.next<BindVertexShaderCommand>();
            //all kinds of state check etc
            executeCommand(cmdID, cmd);
            break;
        }
        //all other commands
    }
}
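For anyone who wants to try the ID-before-struct layout, here is a minimal self-contained sketch of such a blob. It is an assumption about how `next<T>()` could work, not TiagoCosta's actual implementation: `Blob`, `Add`, `Execute`, `FakeDevice`, the integer handles, and the second command type are all invented for the example, and there is no bounds checking.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

enum CommandID : int { BIND_VERTEX_SHADER = 0, BIND_TEXTURE = 1 };

struct BindVertexShaderCommand { int shaderHandle; };
struct BindTextureCommand      { int slot; int textureHandle; };

// Hypothetical byte blob: values are appended raw and read back in order.
class Blob {
public:
    template <typename T>
    void write(const T& value) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }
    template <typename T>
    T next() {  // no bounds check: the caller must read what was written
        T value;
        std::memcpy(&value, bytes_.data() + cursor_, sizeof(T));
        cursor_ += sizeof(T);
        return value;
    }
    void rewind() { cursor_ = 0; }
private:
    std::vector<unsigned char> bytes_;
    std::size_t cursor_ = 0;
};

struct CommandGroup { Blob blob; int numCommands = 0; };

// Stand-in for the real graphics API so the sketch runs anywhere.
struct FakeDevice { int boundShader = -1; int boundTexture[4] = {-1, -1, -1, -1}; };

// Writes the integer ID first, then the command payload.
template <typename T>
void Add(CommandGroup& group, CommandID id, const T& cmd) {
    group.blob.write<int>(id);
    group.blob.write(cmd);
    ++group.numCommands;
}

inline void Execute(CommandGroup& group, FakeDevice& device) {
    group.blob.rewind();
    for (int i = 0; i < group.numCommands; ++i) {
        switch (group.blob.next<int>()) {
            case BIND_VERTEX_SHADER: {
                BindVertexShaderCommand cmd = group.blob.next<BindVertexShaderCommand>();
                device.boundShader = cmd.shaderHandle;
                break;
            }
            case BIND_TEXTURE: {
                BindTextureCommand cmd = group.blob.next<BindTextureCommand>();
                device.boundTexture[cmd.slot] = cmd.textureHandle;  // assumes slot is 0..3
                break;
            }
        }
    }
}
```

Reading through `memcpy` sidesteps alignment problems when a payload does not start on a naturally aligned address, at the cost of a small copy per command.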

To Hodgman: is there any performance penalty in using template functions?

melbow

221

Author

January 13, 2013 01:50 AM

While I am not Hodgman, I will say that, broadly speaking, templates increase the compiled code's size. If the function is inline though (which I assume it is in your case, TiagoCosta), there should not be any performance penalty.

IncidentRay

154

January 13, 2013 03:18 AM

[quote name='TiagoCosta' timestamp='1358025244' post='5020848']
I'm pretty sure you just need a single level of inheritance
[/quote]

You're right, it does look like a single level of inheritance would be fine. I was wondering about the purpose of the State struct, as I can't think of what data members it would have.

About templates (sorry, not Hodgman!): I'm pretty sure templates are a purely compile-time feature, so they shouldn't have too much of a runtime-performance penalty. However, melbow has a good point: templates often do increase the size of the compiled code. I think there probably wouldn't be much benefit if you implemented exactly the same thing but without using templates (e.g. just copying and pasting your classes/structs/functions and changing the variable types). Templated functions could possibly cause cache-misses if the compiler hasn't placed the code for both implementations nearby. In your example, if the instructions for next<int> and next<BindVertexShaderCommand> aren't nearby, this might cause an i-cache miss. However, this only applies if they're not inlined, so if they are inlined, you should be fine. I've only started learning about how caches work recently, though, so someone should correct me if I'm wrong.

Hodgman

52,717

January 13, 2013 03:54 AM

Yeah you could do this with a single level of inheritance. I do use two levels like you demonstrated, IncidentRay, but State (and DrawCall) don't have any other members.

I only have this second layer of inheritance so that elsewhere in the API I can say that StateGroups contain States (which are just Commands) and SubMeshes/RenderItems contain DrawCalls (which are also just Commands). It just ensures that the user can't put a DrawCall-derived command in a StateGroup, nor use a State-derived command to represent geometry in a SubMesh, without generating a compile-time error.
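A minimal sketch of that compile-time check, using real inheritance for brevity. The empty `State` and `DrawCall` types act purely as tags; `StateGroupBuilder`, `BindShader`, `DrawIndexed`, and their members are hypothetical names for the example rather than the actual engine API.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

struct Command  { std::uint8_t id; };
struct State    : Command {};  // tag: may live in a StateGroup
struct DrawCall : Command {};  // tag: may live in a SubMesh/RenderItem

struct BindShader  : State    { int shaderHandle; };
struct DrawIndexed : DrawCall { int indexCount; };

// Hypothetical builder that only accepts State-derived commands.
class StateGroupBuilder {
public:
    template <typename T>
    void Add(const T& state) {
        static_assert(std::is_base_of<State, T>::value,
                      "StateGroups may only contain State-derived commands");
        ids_.push_back(state.id);  // a real builder would copy the whole struct
    }
    std::size_t Count() const { return ids_.size(); }
private:
    std::vector<std::uint8_t> ids_;
};

// builder.Add(DrawIndexed{}) would fail to compile with the message above.
```

A plain `void Add(const State&)` overload rejects draw calls just as well; the `static_assert` version only buys a more readable error message.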

TiagoCosta's method of not even putting the ID integer in command structure (and just having it precede the structure in the parent allocation) is also a popular choice -- I've often seen this approach used in networking systems.

Yeah when using templates, you can imagine what the code would be like if you manually implemented them, e.g. if you wrote NextInt() and NextBindVertexShaderCommand() -- the compiler will basically be doing that behind the scenes.

If two different template functions happen to produce the same assembly code (e.g. vector<void*> and vector<Foo*> are likely exactly the same at the asm level) then your compile times will suffer, as the compiler will generate the same code multiple times, but then a modern linker will remove/merge the duplicates.

On old compilers (nothing you'll be using these days), the linker wouldn't perform this step, so if you had many different vector<T*> (with different T's) you'd end up with a ton of identical functions in your final executable... This is one reason why game developers didn't use to like using templates or the STL.

. 22 Racing Series .

IncidentRay

154

January 13, 2013 04:19 AM

Yeah you could do this with a single level of inheritance. I do use two levels like you demonstrated, IncidentRay, but State (and DrawCall) don't have any other members.
I only have this second layer of inheritance so that elsewhere in the API I can say that StateGroups contain States (which are just Commands) and SubMeshes/RenderItems contain DrawCalls (which are also just Commands). It just ensures that the user can't put a DrawCall-derived command in a StateGroup, nor use a State-derived command to represent geometry in a SubMesh, without generating a compile-time error.

Moving the errors to compile time is definitely a good thing. So then the RenderItem struct can have a pointer to a DrawCall rather than just a pointer to a Command. I'm not quite sure how it would work for the state-group, however. I was thinking of having a StateGroup struct which actually just stores the header -- i.e. the number of states, the size in bytes, etc. Then the actual commands would be allocated contiguously after the header. For example,


struct StateGroup {
    u8 numStates;
    u32 sizeInBytes;
};

void Foo(StateGroup* group)
{
    State* states = (State*)((u8*)group + sizeof(StateGroup));
}

I can't see a way to stop the user from putting DrawCalls in the state group. Do you implement StateGroups differently, or is there something I'm missing?

If two different template functions happen to produce the same assembly code (e.g. vector and vector are likely exactly the same at the asm level) then your compile times will suffer, as the compiler will generate the same code multiple times, but then a modern linker will remove/merge the duplicates.
On old compilers (nothing you'll be using these days), the linker wouldn't perform this step, so if you had many different vector (with different T's) you'd end up with a ton of identical functions in your final executable... This is one reason why game developers didn't used to like using templates or the STL.

Yeah, I forgot that linkers can do that; thanks for the reminder. TiagoCosta's use of templates is probably completely fine then.

EDIT: This editor seems to keep messing up the angle brackets for templates, so I can't get the quote above right.

Hodgman

52,717

January 13, 2013 04:38 AM

If you're going for memory efficiency, be aware of your compiler's padding behaviour when designing your header structures -- e.g. your numStates var may as well be a u32 on my compiler.
If you support having external tools that generate these kinds of data-structures (e.g. I put state-groups directly in my engine's model, material and technique files so that I don't have to allocate extra memory when parsing those files on-load), then you've also got to be aware that those tools are dependent on your compiler's structure layout, or you've got to use compiler-specific extensions to force it to use the structure layout that you want.

struct StateGroup {
    u8 numStates;
    u8 _padding_[3];//!
    u32 sizeInBytes;
};
struct StateGroupB {
    u16 numStates;
    u16 sizeInBytes;//64KiB of state data should be enough for anyone ;)
    u32 _padding_;//n.b. your compiler might do this to align the u64 below!
    u64 stateMask;//this isn't required, but I chose to add it
};
#pragma pack(push)//compiler-specific code to ensure no padding...
#pragma pack(1)//you're now responsible for understanding the alignment requirements of your target CPU...
struct StateGroupC {
    u64 stateMask;
    u16 numStates;
    u16 sizeInBytes;
};
#pragma pack(pop)
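One portable way to guard against layout surprises, sketched here under the assumption of a C++11 compiler, is to spell out the padding yourself and let `static_assert` verify the result. The struct names `HeaderExplicit` and `HeaderOrdered` and the `reserved` field are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

typedef std::uint8_t  u8;
typedef std::uint16_t u16;
typedef std::uint32_t u32;
typedef std::uint64_t u64;

// Padding written out by hand: nothing is left for the compiler to insert.
struct HeaderExplicit {
    u8  numStates;
    u8  _padding_[3];
    u32 sizeInBytes;
};
static_assert(sizeof(HeaderExplicit) == 8, "unexpected padding in HeaderExplicit");
static_assert(offsetof(HeaderExplicit, sizeInBytes) == 4, "sizeInBytes moved");

// Largest member first, with the tail filled explicitly.
struct HeaderOrdered {
    u64 stateMask;
    u16 numStates;
    u16 sizeInBytes;
    u32 reserved;  // hypothetical filler so the size is a multiple of 8
};
static_assert(sizeof(HeaderOrdered) == 16, "unexpected padding in HeaderOrdered");
static_assert(offsetof(HeaderOrdered, numStates) == 8, "numStates moved");
```

If an external tool writes these headers to disk, the same assertions in the tool's build catch any disagreement between the two compilers before bad data gets shipped.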

I do pretty much the same thing that you've thought of here, except I also put a bitfield in the header, with one bit for each type of state.
For an example of where this may be useful -- in my loop that processes state-groups (posted earlier), I allow an item to contain 'layers' of state-groups, where state-values in earlier/higher groups take precedence over the values in later/lower groups. The bitfield lets me quickly check if a state-group contains any states that I'm interested in looking at, or whether it only contains states that I've already got values for and can be skipped entirely.
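The layered lookup can be sketched roughly as follows. This is a simplified stand-in, not the loop from the earlier post: states are plain id/value pairs in a `std::vector` rather than a packed blob, and `Layer`, `StateValue`, `Resolved`, and `Resolve` are names invented for the example.

```cpp
#include <cstdint>
#include <vector>

typedef std::uint64_t u64;

struct StateValue { int id; int value; };  // id assumed to be in 0..63
struct Layer {
    u64 stateMask;                   // one bit per state type present
    std::vector<StateValue> states;
};

struct Resolved {
    u64 setMask = 0;        // bits for states that already have a value
    int values[64] = {};
    int layersSkipped = 0;  // bookkeeping for the example only
};

// Walks layers front to back; earlier layers take precedence.
inline Resolved Resolve(const std::vector<Layer>& layers) {
    Resolved out;
    for (const Layer& layer : layers) {
        // Nothing new in this layer: skip it without touching its states.
        if ((layer.stateMask & ~out.setMask) == 0) {
            ++out.layersSkipped;
            continue;
        }
        for (const StateValue& s : layer.states) {
            const u64 bit = u64(1) << s.id;
            if (out.setMask & bit) continue;  // an earlier layer already set it
            out.setMask |= bit;
            out.values[s.id] = s.value;
        }
    }
    return out;
}
```

The single AND against `~setMask` is what makes the skip cheap: a whole layer is rejected without reading any of its state data.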

I can't see a way to stop the user from putting DrawCalls in the state group. Do you implement StateGroups differently, or is there something I'm missing?

It depends on how you generate your StateGroups. I use the same allocation scheme -- a header followed by states -- and use a stack allocator to achieve this.
As a simple example of how you could enforce compile-time checking while creating a state-group:

class StateGroupWriter
{
  StackAlloc& a;
  StateGroupHeader* header;
public:
  explicit StateGroupWriter( StackAlloc& alloc ) : a(alloc), header(0) {}//a reference member has to be bound in a constructor
  void Begin() { header = a.Alloc<StateGroupHeader>(); header->numStates = 0; header->stateMask = 0; }
  void Write( State& state )//only State-derived commands are accepted, so passing a DrawCall is a compile-time error
  {
    uint size = state.Size();
    u8* mem = a.Alloc(size);
    memcpy(mem, &state, size);
    ++header->numStates;
    header->stateMask |= u64(1)<<state.id;
  }
  StateGroupHeader* End()
  {
    StateGroupHeader* result = header;
    header = 0;
    result->sizeInBytes = a.Mark() - ((u8*)result);
    return result;
  }
};


IncidentRay

154

January 13, 2013 05:26 AM

[quote name='Hodgman' timestamp='1358051934' post='5020960']
If you're going for memory efficiency, be aware of your compiler's padding behaviour when designing your header structures
[/quote]

Yeah, that StateGroup struct was just a quick example; I'd definitely worry about padding in real code.

[quote name='Hodgman' timestamp='1358051934' post='5020960']
e.g. I put state-groups directly in my engine's model, material and technique files
[/quote]

Putting the state groups in resources files is an intriguing idea. How do you deal with pointers in this case, as obviously they will have different values each time you run the engine? I suppose you could store some other data in the pointer field, and then fix the pointers at runtime -- e.g. maybe a hash of a filename. Or do you do something else?

[quote name='Hodgman' timestamp='1358051934' post='5020960']

I do pretty much the same thing that you've thought of here, except I also put a bitfield in the header, with one bit for each type of state.
[/quote]

Your recommendation of storing a state bitfield sounds good -- that looks like it would be quite useful. As I understand it, the bitfield can be used to quickly check whether a state group can be totally ignored, but if not, the next step is to check each state individually?

Thanks for the detailed example of how you might generate state groups; that answers several questions I had.

Hodgman

52,717

January 13, 2013 06:39 AM

How do you deal with pointers in this case, as obviously they will have different values each time you run the engine? I suppose you could store some other data in the pointer field, and then fix the pointers at runtime -- e.g. maybe a hash of a filename. Or do you do something else?

First up, if you want to research this, the technique of loading your runtime data structures straight out of a file with no (or little) on-load processing is usually called "load-in-place" or "in-place memory", or something similar. There's a Gamasutra article here.

For pointers within a particular asset (e.g. a pointer from a model header to an array of state-group pointers, to a state-group) I use the Offset (relative address) and Address (absolute address) classes in this header -- i.e. I don't use actual pointers.

For things that need to be pointers at runtime, but can't be known in advance (e.g. a pointer to an actual D3D vertex buffer), then they have to undergo "pointer patching" on-load.

For example, say a model asset has n vertex-buffers, and also has some state-groups that need to contain pointers to those vertex buffers. The state groups are saved with an integer in the range 0..n-1, in place of the VB pointer, which indicates the index of the VB that it should point to. On-load, the model's VBs are created by D3D and we now know the real pointer values. We can then iterate through the state-groups, reading these index integers and using them to look up the appropriate VB pointer, and writing the pointer over the top of the integer.
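The patching step described above might look something like this. It is a sketch only: `VertexBuffer` stands in for the real D3D object, and `VertexBufferRef`, `BindVertexBuffer`, and `PatchPointers` are invented names. The union makes the "integer on disk, pointer at runtime" dual use explicit.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct VertexBuffer { int id; };  // stand-in for a real GPU resource

// Same storage holds an index in the file and a pointer after loading.
union VertexBufferRef {
    std::uintptr_t index;    // as saved by the tool: 0..n-1
    VertexBuffer*  pointer;  // after patching
};

struct BindVertexBuffer { VertexBufferRef vb; };

// Overwrites each saved index with the address of the matching buffer.
// Returns false at the first out-of-range index; states before it are
// already patched at that point, so treat a failure as a corrupt asset.
inline bool PatchPointers(std::vector<BindVertexBuffer>& states,
                          std::vector<VertexBuffer>& buffers) {
    for (BindVertexBuffer& state : states) {
        const std::uintptr_t index = state.vb.index;
        if (index >= buffers.size()) return false;
        state.vb.pointer = &buffers[index];
    }
    return true;
}
```

Patching must run exactly once per load, since after the first pass the field holds an address and can no longer be interpreted as an index.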

For references to other assets, I use filename hashes, yep. As above, these hashes can be converted to real pointers on-load, if required.
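The post does not say which hash function is used, so the sketch below picks 32-bit FNV-1a purely as an assumption. `HashFilename`, `Asset`, and `AssetRegistry` are hypothetical names, and a real system would also need a policy for path normalization and hash collisions.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

struct Asset { std::string name; };  // stand-in for a loaded resource

// 32-bit FNV-1a over the bytes of the path (assumed hash, see above).
inline std::uint32_t HashFilename(const std::string& path) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (unsigned char c : path) {
        hash ^= c;
        hash *= 16777619u;             // FNV prime
    }
    return hash;
}

// Hypothetical registry: turns a hash stored in a file into a live pointer.
class AssetRegistry {
public:
    void Add(const std::string& path, Asset* asset) {
        byHash_[HashFilename(path)] = asset;
    }
    Asset* Resolve(std::uint32_t hash) const {
        auto it = byHash_.find(hash);
        return it != byHash_.end() ? it->second : nullptr;
    }
private:
    std::unordered_map<std::uint32_t, Asset*> byHash_;
};
```

The asset file then only needs to store the 4-byte hash, and `Resolve` plays the same role on-load as the index lookup does for vertex buffers.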

but if not, the next step is to check each state individually?

Yeah, my states are variable size, and the group-header doesn't contain the actual offset of each state. Therefore to iterate through the group, you have to inspect each state in a linear order, and determine the current state's size to know how far to jump ahead to find the next state. The bitfield does allow you to halt this iteration early if you know that you've already inspected all of the 'interesting' states in the group though.
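A rough sketch of that linear walk with the early exit, assuming each state begins with a one-byte ID and that sizes come from a per-ID table. `kStateSize`, its values, and `FindStates` are made up for the example; the real code determines sizes in its own way.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint8_t  u8;
typedef std::uint64_t u64;

// Hypothetical size table: total bytes of each state type, ID byte included.
static const std::size_t kStateSize[] = { 2, 5, 3 };

// Walks the packed states in order and stops as soon as every bit in
// 'wanted' has been seen. Returns how many states were inspected.
// 'wanted' should already be ANDed with the group's stateMask, otherwise
// a bit that is not present forces a full walk.
inline int FindStates(const u8* data, int numStates, u64 wanted,
                      std::vector<u8>& foundIds) {
    int visited = 0;
    for (int i = 0; i < numStates && wanted != 0; ++i) {
        const u8 id = data[0];          // every state starts with its ID
        ++visited;
        const u64 bit = u64(1) << id;
        if (wanted & bit) {
            foundIds.push_back(id);
            wanted &= ~bit;             // one fewer state to look for
        }
        data += kStateSize[id];         // jump to the next state
    }
    return visited;
}
```

With three states in the group and only the second one wanted, the walk inspects two states and never touches the third.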

As an alternative, you could allocate an array of size numStates in/after the header and write the offset of each state into this array. If you then ordered the states by their ID value, you'd be able to quickly jump to a particular state that you're interested in without iterating through each one.
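That alternative could be sketched as a binary search over the offset array, reading each candidate's ID straight out of the state data. It assumes states are sorted by ID, each ID appears at most once per group, and every state starts with a one-byte ID; `FindState` and the typedefs are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

typedef std::uint8_t  u8;
typedef std::uint16_t u16;

// offsets[i] is the byte offset of the i-th state inside stateData.
// Returns the offset of the state with the given ID, or -1 if absent.
inline int FindState(const u8* stateData, const u16* offsets,
                     int numStates, u8 id) {
    const u16* end = offsets + numStates;
    // Binary search: compare the ID stored at each offset with the target.
    const u16* it = std::lower_bound(
        offsets, end, id,
        [stateData](u16 offset, u8 wanted) { return stateData[offset] < wanted; });
    return (it != end && stateData[*it] == id) ? static_cast<int>(*it) : -1;
}
```

The table costs two bytes per state, so it is mainly worth having for large groups or when a few specific states are looked up often.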

