Advanced Render Queue API

31 posts in this topic

Thanks for replying Noizex. You gave some great input. I was wondering if you perform any batching of your Render Tasks and if so, how? I don't see any place for attributes that aren't in your VAO.

I don't really batch them. I plan to draw some objects with instancing (I'm not sure yet whether I will submit one RenderTask that carries additional info for instanced drawing, or many RenderTasks that the renderer somehow detects as instanceable and collapses into one draw call). I also batch some things before I submit a RenderTask, because often it's very specific to the thing that's being drawn.

The whole thing is flexible enough that I don't optimize much yet (like batching everything), because it's more convenient to have one VAO per object and draw it with one draw call per object. If I ever run into a draw-call problem, I can batch them just by modifying how tasks are processed by the renderer. I will certainly batch things like foliage, grass, particles and anything else that would otherwise waste too many draw calls, but for normal objects and terrain I don't want to optimize yet unless I see that I end up with too many calls.
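A minimal sketch of the second option mentioned above (submitting many RenderTasks and letting the renderer collapse them into instanced draws). The `RenderTask` fields `vao`, `material` and `transformIndex`, and the function `collapseToBatches`, are assumptions made for illustration and are not taken from Noizex's engine.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical task layout: one task per object, as described in the post.
struct RenderTask {
    std::uint32_t vao;            // geometry handle
    std::uint32_t material;       // shader/texture set
    std::uint32_t transformIndex; // index into a per-frame transform array
};

// One instanced draw: same geometry and material, many transforms.
struct InstancedBatch {
    std::uint32_t vao;
    std::uint32_t material;
    std::vector<std::uint32_t> transformIndices;
};

// Sort tasks by (vao, material), then merge equal neighbours into one batch.
std::vector<InstancedBatch> collapseToBatches(std::vector<RenderTask> tasks) {
    std::sort(tasks.begin(), tasks.end(),
              [](const RenderTask& a, const RenderTask& b) {
                  if (a.vao != b.vao) return a.vao < b.vao;
                  return a.material < b.material;
              });
    std::vector<InstancedBatch> batches;
    for (const RenderTask& t : tasks) {
        if (batches.empty() || batches.back().vao != t.vao ||
            batches.back().material != t.material) {
            batches.push_back({t.vao, t.material, {}});
        }
        batches.back().transformIndices.push_back(t.transformIndex);
    }
    return batches;
}
```

Each resulting batch could then be issued as a single instanced draw call, with the number of transform indices as the instance count.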

Hodgman, are your states then inherited from a State base class? If not, how do you determine how to set each state the correct way? I would think you would want to avoid the overhead of virtual functions on something so low-level in the engine.

So, for this particular case (renderer commands), I implemented a vtable-type system myself, where every different command class shares the same "vtable" -- this means that if I execute 100 commands in a row (from a pre-prepared command buffer), then the first one will generate a cache-miss when reading this "vtable", but the following commands are less likely to cause the same cache-miss.
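A rough sketch of what such a hand-rolled "vtable" could look like: one shared table of function pointers indexed by the command id. The command ids, the `Device` stand-in and the helper names below are assumptions for illustration, not Hodgman's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical command ids; kCommandCount is the table size.
enum CommandId : std::uint8_t { kSetValue = 0, kAddValue = 1, kCommandCount };

struct Device { int value = 0; };  // stand-in for the real device/context

// Each handler reads its payload and returns the payload size in bytes,
// so the interpreter can step to the next command.
using Handler = std::size_t (*)(Device&, const unsigned char* payload);

std::size_t execSetValue(Device& d, const unsigned char* p) {
    int v;
    std::memcpy(&v, p, sizeof v);  // memcpy sidesteps aliasing and alignment issues
    d.value = v;
    return sizeof v;
}

std::size_t execAddValue(Device& d, const unsigned char* p) {
    int v;
    std::memcpy(&v, p, sizeof v);
    d.value += v;
    return sizeof v;
}

// The single shared table: one function pointer per command id.
const Handler kDispatch[kCommandCount] = { execSetValue, execAddValue };

// Append "id byte, then int payload" to a pre-prepared command buffer.
void appendCommand(std::vector<unsigned char>& buf, CommandId id, int arg) {
    buf.push_back(id);
    unsigned char bytes[sizeof arg];
    std::memcpy(bytes, &arg, sizeof arg);
    buf.insert(buf.end(), bytes, bytes + sizeof arg);
}

// Walk the buffer and dispatch every command through the shared table.
void execute(Device& d, const std::vector<unsigned char>& buf) {
    std::size_t i = 0;
    while (i < buf.size()) {
        unsigned char id = buf[i++];
        assert(id < kCommandCount);
        i += kDispatch[id](d, buf.data() + i);
    }
}
```

Because every command goes through the same small table, the table stays hot in cache while a long buffer is executed.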

So if everything is a command, do you batch ahead of time like Noizex? Say for example, geometry instancing, how would you collect each instance for creation of the commands? And thanks for such a good example. You've helped clarify a ton already :D

Hodgman, wouldn't this violate the strict aliasing rule when you cast a Command reference to a Foo or Bar reference, or vice versa?

Also, the only value that we actually need to "alias" is the first member -- u8 id -- and it doesn't actually need to be aliased as a different type, so it's possible to write this system in a way that doesn't violate strict aliasing if you need to -- e.g.

Thanks for the example.  Would you still need the Command struct with this design?  Also, I was wondering whether you think it's worth trying to always avoid breaking the strict aliasing rule, or do you think it's better to just risk the undefined behavior if it's the simplest option?

No, the command struct has been replaced with a pointer to the id's primitive type.

Yes, breaking the strict-aliasing rule can be very bad, because it can cause the compiler to emit code that doesn't do what you intended it to! So it should be avoided.
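One way to stay inside the rules, sketched under assumptions: read the id through an `unsigned char` pointer (character types may alias any object) and copy the payload out with `memcpy` instead of casting to `Foo*` or `Bar*`. The `Foo` and `Bar` fields, the id values and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

typedef std::uint8_t u8;

// Hypothetical command payloads; each starts with its u8 id.
struct Foo { u8 id; int value; };
struct Bar { u8 id; float scale; };

enum : u8 { kFooId = 1, kBarId = 2 };

// Reading the first byte through an unsigned char pointer is always permitted.
inline u8 peekId(const void* command) {
    return *static_cast<const unsigned char*>(command);
}

// Copy the bytes into a properly typed object instead of casting the storage.
template <typename T>
T load(const void* command) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy needs a trivially copyable type");
    T out;
    std::memcpy(&out, command, sizeof(T));
    return out;
}

// Dispatch on the id, then load the matching type.
int dispatch(const void* command) {
    switch (peekId(command)) {
        case kFooId: return load<Foo>(command).value;
        case kBarId: return static_cast<int>(load<Bar>(command).scale);
        default:     return -1;  // unknown command id
    }
}
```

Compilers typically turn a small fixed-size `memcpy` like this into plain loads, so the safe version need not cost anything at runtime.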

I've taken this thread off-topic enough already, so I've started a new topic just about the strict aliasing rule over here.

I would like to add another question in this topic:

How do you handle RenderItems (objects) that require a Texture that is generated by a different RenderStage?

Example:

In a deferred renderer, every light source needs to have its shadow map generated, but you only have one GPU resource to store the shadow map, so you have to:

- Generate Light 1 shadow map;
- Draw Light 1;
- Generate Light 2 shadow map;
- Draw Light 2;
- ...

Currently I handle this by having a command called ExecuteRenderStage that stops the rendering of the current render stage, executes another stage and then restores the "main" one, but I would like to hear how you do it.

All this talk of unpredictable behavior has me questioning this approach. What if a Command was simply a sort of container, like:
struct Command {
    Commands::Type id;
    union {
        Foo* foo;
        Bar* bar;
    } u;
};

I don't know why I didn't mention it before, but in my own engine I get around the undefined behaviour (the potential aliasing issues) with inheritance...

struct Command { Commands::Type id; };
struct Foo : public Command { int value; };

How do you handle RenderItems (objects) that require a Texture that is generated by a different RenderStage?

I just submit a series of stages, e.g. the stage to generate a shadow-map, then a stage that draws the light (which is a draw-call whose paired state-group sets the texture generated by the first stage).
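As a rough illustration of "a series of stages" when all lights share one shadow-map resource, the frame could be described as an ordered list like the one below. The `Stage` fields and `buildLightStages` are hypothetical names, not part of the engine being discussed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stage description: which render target it writes
// and which texture its draw-calls read.
struct Stage {
    std::string name;
    int writesTarget;  // render target id, -1 for the back buffer
    int readsTexture;  // texture bound by the paired state-group, -1 for none
};

// Build the per-frame stage list: a shadow stage followed by a lighting stage
// for each light. The shadow map is shared, so the submission order matters.
std::vector<Stage> buildLightStages(int lightCount, int sharedShadowMap) {
    assert(lightCount >= 0);
    std::vector<Stage> stages;
    for (int i = 0; i < lightCount; ++i) {
        const std::string n = std::to_string(i + 1);
        stages.push_back({"ShadowMap light " + n, sharedShadowMap, -1});
        stages.push_back({"Draw light " + n, -1, sharedShadowMap});
    }
    return stages;
}
```

Executing the list front to back gives the generate/draw/generate/draw order from the question without a special command that interrupts the current stage.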

[quote name='Hodgman' timestamp='1357784838' post='5019739']
I get around the undefined behaviour with inheritance...
[/quote]

But if you use inheritance, don't the structs become non-POD types?  That might create more undefined behavior to deal with -- for example, I was thinking of using memcmp for detecting redundant state-changes in the RenderGroup class, but that would only work if the structs were POD.

I too am puzzled by how redundant state changes are eliminated in this model. Am I correct in that states may be submitted in any order? And if this is the case, then states may be sorted and then linearly compared. However, this seems expensive considering how many states may be set per frame. I'm sure you have a much more clever way of doing this.

But if you use inheritance, don't the structs become non-POD types?  That might create more undefined behavior to deal with -- for example, I was thinking of using memcmp for detecting redundant state-changes in the RenderGroup class, but that would only work if the structs were POD.
You've got a good eye for C++ details ;) I should've said inheritance avoids the strict-aliasing issues, but you're right, the standard says that using inheritance like that means they're now non-POD.
However, on the compilers that I support, they still act as if they were POD, so I can still memcmp/memcpy them on these compilers. Relying on compiler details should generally be avoided, but it's something you can choose to do.

Instead of inheritance, I guess I could've used composition to be fully compliant, e.g.
struct Command { int id; };
struct FooCommand { Command baseClass; int fooValue; };
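The difference between the two layouts can be checked at compile time with the C++11 type traits. This is only a sketch: `FooDerived`, `FooComposed` and `sameBytes` are names invented here, and the byte-wise comparison assumes the struct has no padding bytes.

```cpp
#include <cstring>
#include <type_traits>

struct Command { int id; };

// Inheritance version: data members in both the base and the derived class.
struct FooDerived : Command { int fooValue; };

// Composition version: the base is an ordinary first member.
struct FooComposed { Command baseClass; int fooValue; };

// Both are trivially copyable, so memcpy is well-defined for either one.
static_assert(std::is_trivially_copyable<FooDerived>::value, "memcpy is fine");
static_assert(std::is_trivially_copyable<FooComposed>::value, "memcpy is fine");

// Only the composition version is standard-layout (and so POD in C++11 terms).
static_assert(!std::is_standard_layout<FooDerived>::value,
              "members are split across base and derived");
static_assert(std::is_standard_layout<FooComposed>::value,
              "a single class holds all members");

// Byte-wise equality; assumes T has no padding bytes (true for two ints here).
template <typename T>
bool sameBytes(const T& a, const T& b) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte comparison needs a trivially copyable type");
    return std::memcmp(&a, &b, sizeof(T)) == 0;
}
```

If a command struct does contain padding, `memcmp` can report a difference between two logically equal values, so zero-initialising the storage before filling it in is a common precaution.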

I too am puzzled by how redundant state changes are eliminated in this model. Am I correct in that states may be submitted in any order? And if this is the case, then states may be sorted and then linearly compared. However, this seems expensive considering how many states may be set per frame. I'm sure you have a much more clever way of doing this.
I haven't really mentioned redundant state removal, except that I do it at the "second level". The 1st level takes a stream of commands, and can't do any redundant state removal besides the traditional technique, which is to check the value of every state before submitting it, something like:
if( 0!=memcmp(&cache[state.id], &state, sizeof(State)) ) { cache[state.id]=state; Apply(state); }
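Expanded into something compilable, the traditional per-state cache check might look like the sketch below. The `State` fields, the `Device` struct and the `applyCount` counter (standing in for the real `Apply(state)` call) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical fixed-size state record; real states carry different payloads.
struct State { int id; int value; };

const int kMaxStates = 8;

struct Device {
    State cache[kMaxStates];  // last value applied for each state id
    bool  valid[kMaxStates];  // false until the first apply for that id
    int   applyCount;         // counts real (non-redundant) device calls
};

// Start with an empty cache so the first set of every state goes through.
void initDevice(Device& d) {
    std::memset(&d, 0, sizeof d);
}

// Apply the state only if it differs from the cached value.
void setState(Device& d, const State& s) {
    assert(s.id >= 0 && s.id < kMaxStates);
    if (!d.valid[s.id] ||
        std::memcmp(&d.cache[s.id], &s, sizeof(State)) != 0) {
        d.cache[s.id] = s;
        d.valid[s.id] = true;
        ++d.applyCount;  // stand-in for the real Apply(state)
    }
}
```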

A lot of renderers do do redundant state checking at that level, which pretty much means having an if like the above every time you go to set a state. I do a little bit of this kind of state caching, but try to avoid it.
Instead, I do redundant state checking at the next level up -- the part that generates the sequences of commands in the first place. This part of the code also submits commands to set states back to their default values if a particular draw-call hasn't been paired with any values for that state.
After sorting my render-items, the "2nd layer" which produces the stream of commands for the 1st layer looks like:
defaults[maxStates] = {/*states to apply if a value doesn't exist for them*/}
previousState[maxStates] = {NULL} // a cache of which states are 'current'
nonDefaultState[maxStates] = {true} // which states have a non-default value

for each item in renderItems
    draw = item.draw
    stateGroups = item.stateGroups

    statesSet[maxStates] = {false} // which states have been set by this item
    for each group in stateGroups
        for each state in group
            if statesSet[state.id] == false // this state not set by a previous group in this item
            then
                statesSet[state.id] = true // marked even if the submit is skipped, so it isn't reset to default below
                if previousState[state.id] != state // not already current from a previous item
                then
                    Submit(state) // add to command buffer, or send to device
                    previousState[state.id] = state
                endif
            endif
        endfor
    endfor

    setToDefault = nonDefaultState & ~statesSet
    nonDefaultState = statesSet
    for each id in setToDefault
        Submit(defaults[id]) // add to command buffer, or send to device
        previousState[id] = defaults[id]
    endfor

    Submit(draw) // add to command buffer, or send to device
endfor
Except the actual C++ code uses a lot of bitmasks instead of arrays of bools, and uses pointers to identify state value equality, and everything is tightly laid out to be cache-friendly, etc...
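For readers who want something compilable, here is one possible C++ rendering of that second layer using a 32-bit mask and pointer equality. It is a sketch under assumptions, not Hodgman's code: the `State`, `StateGroup`, `RenderItem` and `Command` types are invented, and it starts with no states marked as non-default.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const int kMaxStates = 32;  // one bit per state id in a 32-bit mask

struct State { int id; int value; };           // hypothetical state record
typedef std::vector<const State*> StateGroup;  // shared states, compared by pointer
struct RenderItem { int draw; std::vector<const StateGroup*> groups; };

// Output command: either a state to set (state != nullptr) or a draw call.
struct Command { const State* state; int draw; };

std::vector<Command> buildCommands(const std::vector<RenderItem>& items,
                                   const std::vector<const State*>& defaults) {
    assert(defaults.size() == static_cast<std::size_t>(kMaxStates));
    std::vector<Command> out;
    const State* previous[kMaxStates] = {};  // which state object is current per id
    std::uint32_t nonDefault = 0;            // ids currently holding a non-default value
    for (const RenderItem& item : items) {
        std::uint32_t statesSet = 0;         // ids claimed by this item
        for (const StateGroup* group : item.groups) {
            for (const State* s : *group) {
                assert(s->id >= 0 && s->id < kMaxStates);
                const std::uint32_t bit = 1u << s->id;
                if (statesSet & bit) continue;  // an earlier group has priority
                statesSet |= bit;
                if (previous[s->id] != s) {     // pointer equality skips redundant sets
                    out.push_back({s, -1});
                    previous[s->id] = s;
                }
            }
        }
        // Reset states that earlier items changed but this item does not mention.
        const std::uint32_t toDefault = nonDefault & ~statesSet;
        nonDefault = statesSet;
        for (int id = 0; id < kMaxStates; ++id) {
            if (toDefault & (1u << id)) {
                out.push_back({defaults[id], -1});
                previous[id] = defaults[id];
            }
        }
        out.push_back({nullptr, item.draw});
    }
    return out;
}
```

With two consecutive items sharing the same state pointer, the state is submitted once; a third item that does not mention it gets the default submitted before its draw.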

Thanks again Hodgman. I really appreciate how detailed yet concise your responses are. The only thing that is still not completely clear to me is the generation of RenderItems. Are they allocated each frame from a data cache (like what is described here http://docs.madewithmarmalade.com/native/api_reference/iwgxapidocumentation/iwgxapioverview/datacache.html )? And would a higher level object like a GLShader or GeometryPacket class then maintain their respective Commands? I am not seeing a way to check for duplicate states by comparing pointers unless the Commands are maintained by global, shared resources, or I guess if Commands ARE global shared resources, but the first option seems cleaner.

Again, I really appreciate everyone's input on this thread, it has helped me a great deal.

Thanks for the reply, Hodgman.  Memcpy does seem to work on non-POD types in this case.  I'm interested in the alternate design you suggested using composition rather than inheritance.  That looks like it would work quite well for one level of inheritance, but I'm not sure how you would extend this to multiple levels of inheritance.  For this renderer design, I think you might need two levels of inheritance -- you mentioned that State inherits from Command, and then, in my understanding, you have multiple render-state structs derived from State.  Would you just do something like this?

struct Command {
    u8 id;
};

struct State {
    Command baseClass;
    // other members...
};

struct FooState { // hypothetical name
    State baseClass;
};


For this renderer design, I think you might need two levels of inheritance -- you mentioned that State inherits from Command, and then, in my understanding, you have multiple render-state structs derived from State. Would you just do something like this?

I'm pretty sure you just need a single level of inheritance.

If you have a Command struct:

struct Command
{
    u8 id;
};

You can have as many command structs as you need:

struct BindVSCommand
{
    Command cmd;
};

struct BindVSTexture0Command
{
    Command cmd;
    Texture* pTexture;
};

struct BindVSTexture1Command
{
    Command cmd;
    Texture* pTexture;
};

In your State struct you don't need any other members so you can remove that structure.

In my renderer (which follows many of Hodgman's ideas) I don't even use a Command struct, so there is no need for inheritance.

I simply store an int before each command struct in the CommandGroup blob (called StateGroup in this topic).

When executing CommandGroups, I read the int from the blob and switch on it to dispatch the appropriate command.

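CommandGroup cmdGroup;

for (int i = 0; i < cmdGroup.numCommands; i++)
{
    int cmdID = cmdGroup.blob.next<int>();

    switch (cmdID)
    {
    case BIND_VERTEX_SHADER: // hypothetical ID name
    {
        BindVertexShaderCommand cmd = cmdGroup.blob.next<BindVertexShaderCommand>();
        // all kinds of state checks etc.
        executeCommand(cmdID, cmd);
        break;
    }
    // all other commands
    }
}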
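A self-contained sketch of a blob with a templated `next<T>()` reader, to make the idea concrete. The `Blob` class, `SetViewportCommand` and `kSetViewport` are hypothetical and only illustrate the "int id, then payload" layout; this is not TiagoCosta's implementation.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical byte blob with a read cursor.
class Blob {
public:
    // Append the raw bytes of a trivially copyable value.
    template <typename T>
    void push(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "blob stores raw bytes");
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }
    // Read the next value of type T and advance the cursor.
    template <typename T>
    T next() {
        static_assert(std::is_trivially_copyable<T>::value, "blob stores raw bytes");
        assert(cursor_ + sizeof(T) <= bytes_.size());
        T out;
        std::memcpy(&out, bytes_.data() + cursor_, sizeof(T));  // no alignment or aliasing concerns
        cursor_ += sizeof(T);
        return out;
    }
    bool atEnd() const { return cursor_ == bytes_.size(); }
private:
    std::vector<unsigned char> bytes_;
    std::size_t cursor_ = 0;
};

struct SetViewportCommand { int width; int height; };  // hypothetical payload
enum { kSetViewport = 1 };                             // hypothetical command id

// Reads "int id, then payload" pairs; returns the area of the last viewport seen.
int run(Blob& blob) {
    int area = 0;
    while (!blob.atEnd()) {
        switch (blob.next<int>()) {
            case kSetViewport: {
                SetViewportCommand cmd = blob.next<SetViewportCommand>();
                area = cmd.width * cmd.height;
                break;
            }
            default:
                return -1;  // unknown id: payload size is unknown, so stop
        }
    }
    return area;
}
```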

To Hodgman: is there any performance penalty in using template functions?

While I am not Hodgman, I will say that, broadly speaking, templates increase the compiled code's size. If the function is inlined though (which I assume it is in your case, TiagoCosta), there should not be any performance penalty.

[quote name='TiagoCosta' timestamp='1358025244' post='5020848']
I'm pretty sure you just need a single level of inheritance
[/quote]

You're right, it does look like a single level of inheritance would be fine.  I was wondering about the purpose of the State struct, as I can't think of what data members it would have.

About templates (sorry, not Hodgman!): I'm pretty sure templates are a purely compile-time feature, so it shouldn't have too much of a runtime-performance penalty.  However, melbow has a good point: templates often do increase the size of the compiled code.  I think there probably wouldn't be much benefit if you implemented exactly the same thing but without using templates (e.g. just copying and pasting your classes/structs/functions and changing the variable types).  Templated functions could possibly cause cache-misses if the compiler hasn't placed the code for both implementations nearby.  In your example, if the instructions for next<int> and next<BindVertexShaderCommand> aren't nearby, this might cause an i-cache miss.  However, this only applies if they're not inlined, so if they are inlined, you should be fine.  I've only started learning about how caches work recently, though, so someone should correct me if I'm wrong.

Yeah you could do this with a single level of inheritance. I do use two levels like you demonstrated, IncidentRay, but State (and DrawCall) don't have any other members.

I only have this second layer of inheritance so that elsewhere in the API I can say that StateGroups contain States (which are just Commands) and SubMeshes/RenderItems contain DrawCalls (which are also just Commands). It just ensures that the user can't put a DrawCall-derived command in a StateGroup, nor use a State-derived command to represent geometry in a SubMesh, without generating a compile-time error.
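The compile-time guarantee described above can be sketched like this. `BlendState`, `DrawIndexed` and this simplified `StateGroup` are invented for illustration; only the Command/State/DrawCall hierarchy comes from the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

typedef std::uint8_t u8;

struct Command { u8 id; };
struct State    : Command {};  // category tag only, no extra members
struct DrawCall : Command {};  // category tag only, no extra members

// Hypothetical concrete commands.
struct BlendState  : State    { int mode; };
struct DrawIndexed : DrawCall { int indexCount; };

class StateGroup {
public:
    // Accepts anything derived from State; a DrawCall-derived pointer will not convert.
    void add(const State* s) {
        assert(s != nullptr);
        states_.push_back(s);
    }
    std::size_t size() const { return states_.size(); }
private:
    std::vector<const State*> states_;
};

// The type system enforces the category split at compile time.
static_assert(std::is_convertible<BlendState*, const State*>::value,
              "states are accepted");
static_assert(!std::is_convertible<DrawIndexed*, const State*>::value,
              "draw calls are rejected");
```

Calling `group.add(&someDrawIndexed)` would fail to compile, which is the error the extra inheritance layer is meant to produce.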

TiagoCosta's method of not even putting the ID integer in the command structure (and just having it precede the structure in the parent allocation) is also a popular choice -- I've often seen this approach used in networking systems.

Yeah when using templates, you can imagine what the code would be like if you manually implemented them, e.g. if you wrote NextInt() and NextBindVertexShaderCommand() -- the compiler will basically be doing that behind the scenes.

If two different template functions happen to produce the same assembly code (e.g. vector<void*> and vector<Foo*> are likely exactly the same at the asm level) then your compile times will suffer, as the compiler will generate the same code multiple times, but then a modern linker will remove/merge the duplicates.

On old compilers (nothing you'll be using these days), the linker wouldn't perform this step, so if you had many different vector<T*> (with different T's) you'd end up with a ton of identical functions in your final executable... This is one reason why game developers didn't use to like using templates or the STL.
