Advanced Render Queue API

melbow · 2013-01-15T22:48:22

Let me preface this by saying I have read everything I could find on the web on this topic, but have been unable to answer my question. This list includes most notably:

http://realtimecollisiondetection.net/blog/?p=86
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter05.html
http://www.gamedev.net/topic/604899-frostbite-rendering-architecture-question/
http://www.gamedev.net/topic/605065-renderqueue-design-theory-and-implementation/#entry4828468
http://www.gamedev.net/topic/602839-whats-the-point-of-having-a-single-render-device/#entry4815585

I understand sorting the draw calls and how to use this system under the fixed-function pipeline. However, when I change to a programmable pipeline, I can't work out how a "Render Operation" would be structured. The best solution I could come up with was an object list for both uniforms and attributes. However, I can't see how a VAO or VBO would fit into this scheme.

Roughly, my code might look like:

class ShaderProgram {
public:
    void Create(char*, char*);
    void MakeCurrent();
    ShaderAttribInfo const & GetAttribute(const char* name);
    ...
private:
    List<ShaderAttribInfo> m_Attributes;
    List<ShaderUniformInfo> m_Uniforms;
};

class ShaderAttribInfo {
    GLint handle;
    GLenum type;
public:
    void Set(void* data);
};

class ShaderAttrib {
    ShaderAttribInfo* info;
    void* data;
    ...
};

... // Same basic structure for Uniforms

class RenderOperation {
    List<ShaderAttrib> m_Attributes;
    List<ShaderUniform> m_Uniforms;
    ShaderProgram* m_Program;
};

I'm trying to make my system as versatile, yet as efficient, as possible. To reiterate, my question is: how would a "Render Operation" be formatted for a Render Queue using a programmable pipeline? The operation should allow for any sort of uniform or attribute and allow for VAOs and VBOs. I don't feel the need to support a fixed-function pipeline, so don't worry about that.

I hope I made my confusion clear enough. Thanks a ton.

Started by melbow December 27, 2012 06:37 AM

30 comments, last by melbow 11 years, 3 months ago

Hodgman

52,717

December 30, 2012 07:49 AM

Hodgman, wouldn't this violate the strict aliasing rule when you cast a Command reference to a Foo or Bar reference, or vice versa?

Yes. Technically, casting a Foo* to a Command* is undefined behaviour, but in practice, it will work in most situations.

We're never writing to an aliased Command and reading from an aliased Foo (or vice versa) inside the one function, which minimizes the risks.
e.g. this code would be dangerous:


assert( command.id == 0 );//assume the command is actually a "Foo"
command.id = 42;//change the id value
Foo& foo = *(Foo*)&command;
assert( foo.id == 42 );//the id value should be changed on the "Foo" also, but this might fail in optimized builds!

The worst thing in the earlier code is a sub-optimal assertion:


assert( command.id >= Commands::Bar0 && command.id <= Commands::Bar2 );//this will load command.id from RAM
Bar& bar = *(Bar*)&command;
device.SetBarSlot( bar.id - Commands::Bar0, bar.value );//bar.id will generate another "load" instruction here, even though the value was loaded above

Also, the only value that we actually need to "alias" is the first member -- u8 id -- and it doesn't actually need to be aliased as a different type, so it's possible to write this system in a way that doesn't violate strict aliasing if you need to -- e.g.


//Instead of this:
Foo foo = { Commands::Foo, 1337 };
Command* cmd = (Command*)&foo;
SubmitCommand( device, *cmd );

//We could use
Foo foo = { Commands::Foo, 1337 };
u8* cmd = &foo.id;
SubmitCommand( device, cmd );

//with:
inline void SubmitCommand(Device& device, u8* command)
{
	g_CommandTable[*command](device, command);
}
void Submit_Foo(Device& device, u8* command)
{
	assert( *command == Commands::Foo );
	Foo& foo = *(Foo*)(command - offsetof(Foo,id));
	device.DoFoo( foo.value );
}

P.S. u8* (my version of unsigned char*) is allowed to alias any other type (strict aliasing rule doesn't apply to it), but the above version will work even if this wasn't true.
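For anyone who wants to try the u8* variant, here is a minimal compilable sketch of it. The Device methods, the Bar command and the lastFoo/lastBar fields are hypothetical stand-ins added only so the dispatch can be observed; they are not from any real engine. It assumes every command struct begins with its u8 id.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char u8;

namespace Commands { enum Type : u8 { Foo, Bar, Count }; }

// Hypothetical device that just records what it was asked to do.
struct Device {
    int lastFoo = 0;
    int lastBar = 0;
    void DoFoo(int v) { lastFoo = v; }
    void DoBar(int v) { lastBar = v; }
};

// Every command starts with its id, so a pointer to the id identifies the command.
struct Foo { u8 id; int value; };
struct Bar { u8 id; int value; };

typedef void (*SubmitFn)(Device&, u8*);

void Submit_Foo(Device& device, u8* command)
{
    assert(*command == Commands::Foo);
    // Step back from the id member to the start of the enclosing struct.
    Foo& foo = *reinterpret_cast<Foo*>(command - offsetof(Foo, id));
    device.DoFoo(foo.value);
}

void Submit_Bar(Device& device, u8* command)
{
    assert(*command == Commands::Bar);
    Bar& bar = *reinterpret_cast<Bar*>(command - offsetof(Bar, id));
    device.DoBar(bar.value);
}

// Jump table indexed by command id; order must match Commands::Type.
static const SubmitFn g_CommandTable[Commands::Count] = { Submit_Foo, Submit_Bar };

inline void SubmitCommand(Device& device, u8* command)
{
    g_CommandTable[*command](device, command);
}
```

Usage would be something like: Foo foo = { Commands::Foo, 1337 }; SubmitCommand( device, &foo.id );. The pointer handed to Submit_Foo really does point into a Foo object, so no object is ever read through an unrelated type.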

. 22 Racing Series .

IncidentRay

154

December 31, 2012 01:43 AM

Also, the only value that we actually need to "alias" is the first member -- u8 id -- and it doesn't actually need to be aliased as a different type, so it's possible to write this system in a way that doesn't violate strict aliasing if you need to -- e.g.

Thanks for the example. Would you still need the Command struct with this design? Also, I was wondering whether you think it's worth trying to always avoid breaking the strict aliasing rule, or do you think it's better to just risk the undefined behavior if it's the simplest option?

Hodgman

52,717

January 01, 2013 01:52 PM

Thanks for the example. Would you still need the Command struct with this design? Also, I was wondering whether you think it's worth trying to always avoid breaking the strict aliasing rule, or do you think it's better to just risk the undefined behavior if it's the simplest option?

No, the command struct has been replaced with a pointer to the id's primitive type.

Yes, breaking the strict-aliasing rule can be very bad, because it can cause the compiler to emit code that doesn't do what you intended it to! So it should be avoided.

I've taken this thread off-topic enough already, so I've started a new topic just about the strict aliasing rule over here

Aqua Costa

3,705

January 03, 2013 12:24 AM

I would like to add another question in this topic:

How do you handle RenderItems (objects) that require a Texture that is generated by a different RenderStage?

Example:

In a deferred renderer, every light source needs to have its shadow map generated, but you only have one GPU resource to store the shadow map, so you have to:

-Generate Light 1 shadow map;

-Draw Light 1;

-Generate Light 2 shadow map;

-Draw Light 2;

...

Currently I handle this by having a command called ExecuteRenderStage that stops the rendering of the current render stage, executes another stage and then restores the "main" one, but I would like to hear how you do it.

melbow

221

Author

January 10, 2013 02:10 AM

All this talk of unpredictable behavior has me questioning this approach. What if a Command was simply a sort of container, like:
struct Command {
    Commands::Type id;
    union {
        Foo* foo;
        Bar* bar;
    } u;
};

Hodgman

52,717

January 10, 2013 02:27 AM

I don't know why I didn't mention it before, but in my own engine I get around the potential aliasing issues with inheritance...


struct Command { Commands::Type id; };
struct Foo : public Command { int value; };

How do you handle RenderItems (objects) that require a Texture that is generated by a different RenderStage?

I just submit a series of stages, e.g. the stage to generate a shadow-map, then a stage that draws the light (which is a draw-call whose paired state-group sets the texture generated by the first stage).
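As a rough illustration of "a series of stages", the sketch below builds a flat stage list for N lights that all share one shadow-map texture. The Stage struct, StageKind enum and BuildLightStages function are made-up names for this example only; a real stage would carry render targets, render items and so on.

```cpp
#include <vector>

// Hypothetical stage kinds for the shadow-mapping example.
enum class StageKind { GenerateShadowMap, DrawLight };

struct Stage {
    StageKind kind;
    int lightIndex;
    int shadowMapTexture; // written by GenerateShadowMap, read by DrawLight
};

// Emits "generate shadow map, draw light" once per light, in order,
// so the single shared shadow map is always filled just before it is read.
std::vector<Stage> BuildLightStages(int lightCount, int sharedShadowMap)
{
    std::vector<Stage> stages;
    for (int i = 0; i < lightCount; ++i) {
        stages.push_back({StageKind::GenerateShadowMap, i, sharedShadowMap});
        stages.push_back({StageKind::DrawLight, i, sharedShadowMap});
    }
    return stages;
}
```

Because the list is submitted in order, no stage ever has to interrupt another one; the dependency is expressed purely by the ordering.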

IncidentRay

154

January 10, 2013 05:02 AM

I get around the undefined behaviour with inheritance...

But if you use inheritance, don't the structs become non-POD types? That might create more undefined behavior to deal with -- for example, I was thinking of using memcmp for detecting redundant state-changes in the RenderGroup class, but that would only work if the structs were POD.

melbow

221

Author

January 11, 2013 08:06 AM

I too am puzzled by how redundant state changes are eliminated in this model. Am I correct in that states may be submitted in any order? And if this is the case, then states may be sorted and then linearly compared. However, this seems expensive considering how many states may be set per frame. I'm sure you have a much more clever way of doing this.

Hodgman

52,717

January 11, 2013 10:51 AM

But if you use inheritance, don't the structs become non-POD types? That might create more undefined behavior to deal with -- for example, I was thinking of using memcmp for detecting redundant state-changes in the RenderGroup class, but that would only work if the structs were POD.

You've got a good eye for C++ details ;) I should've said inheritance avoids the strict-aliasing issues, but you're right, the standard says that using inheritance like that means they're now non-POD.
However, on the compilers that I support, they still act as if they were POD, so I can still memcmp/memcpy them on these compilers. Relying on compiler details should generally be avoided, but it's something you can choose to do.

Instead of inheritance, I guess I could've used composition to be fully compliant, e.g.

struct Command { int id; };
struct FooCommand { Command baseClass; int fooValue; };
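If you have a C++11 compiler, the difference between the two layouts can be checked at compile time. This is only a sketch: FooDerived and SameCommand are names invented here, and the "no padding" assert is an assumption about typical platforms that you would need to verify for your own structs before relying on memcmp.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct Command { int id; };

// Inheritance version: data members are split across base and derived class.
struct FooDerived : Command { int value; };

// Composition version: all data members are declared in one class.
struct FooCommand { Command baseClass; int fooValue; };

// Both can be copied with memcpy...
static_assert(std::is_trivially_copyable<FooDerived>::value, "memcpy is fine");
static_assert(std::is_trivially_copyable<FooCommand>::value, "memcpy is fine");

// ...but only the composition version is standard-layout (and therefore POD).
static_assert(!std::is_standard_layout<FooDerived>::value, "not standard-layout");
static_assert(std::is_standard_layout<FooCommand>::value, "standard-layout");
static_assert(offsetof(FooCommand, baseClass) == 0, "Command sits at the start");

// memcmp compares padding bytes too, so insist that there are none.
static_assert(sizeof(FooCommand) == 2 * sizeof(int), "unexpected padding");

inline bool SameCommand(const FooCommand& a, const FooCommand& b)
{
    return std::memcmp(&a, &b, sizeof(FooCommand)) == 0;
}
```

With the composition version, a FooCommand* can be converted to a Command* pointing at its first member, which is the property the command dispatch needs.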

I too am puzzled by how redundant state changes are eliminated in this model. Am I correct in that states may be submitted in any order? And if this is the case, then states may be sorted and then linearly compared. However, this seems expensive considering how many states may be set per frame. I'm sure you have a much more clever way of doing this.

I haven't really mentioned redundant state removal, except that I do it at the "second level". The 1st level takes a stream of commands, and can't do any redundant state removal besides the traditional technique, which is to check the value of every state before submitting it, something like:

if( 0!=memcmp(&cache[state.id], &state, sizeof(State)) ) { cache[state.id]=state; Apply(state); }

A lot of renderers do do redundant state checking at that level, which pretty much means having an if like the above every time you go to set a state. I do a little bit of this kind of state caching, but try to avoid it.
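To make the "traditional technique" concrete, here is a small self-contained sketch of a per-state cache. StateCache, kCacheSlots and applyCount are invented for the example, and the increment stands in for whatever Apply(state) would really do. It assumes State is a plain struct with no padding, otherwise memcmp could report false differences.

```cpp
#include <cstring>

const int kCacheSlots = 4; // hypothetical number of distinct state ids

struct State { int id; int value; };
static_assert(sizeof(State) == 2 * sizeof(int), "memcmp assumes no padding");

struct StateCache {
    State cache[kCacheSlots];
    bool  valid[kCacheSlots];
    int   applyCount; // how many states actually reached the "device"

    StateCache() : applyCount(0)
    {
        std::memset(cache, 0, sizeof(cache));
        for (int i = 0; i < kCacheSlots; ++i) valid[i] = false;
    }

    // Returns true if the state differed from the cached one and was applied.
    bool Set(const State& state)
    {
        if (!valid[state.id] ||
            0 != std::memcmp(&cache[state.id], &state, sizeof(State))) {
            cache[state.id] = state;
            valid[state.id] = true;
            ++applyCount; // stand-in for Apply(state)
            return true;
        }
        return false; // redundant: the device already has this value
    }
};
```

The cost is one compare per state-set call, which is exactly the per-call "if" described above.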
Instead, I do redundant state checking at the next level up -- the part that generates the sequences of commands in the first place. This part of the code also submits commands to set states back to their default values if a particular draw-call hasn't been paired with any values for that state.
After sorting my render-items, the "2nd layer" which produces the stream of commands for the 1st layer looks like:

defaults[maxStates] = {/*states to apply if a value doesn't exist for them*/}
previousState[maxStates] = {NULL} // a cache of which states are 'current'
nonDefaultState[maxStates] = {true} // which states have a non-default value

for each item in renderItems

 draw = item.draw
 stateGroups = item.stateGroups

 statesSet[maxStates] = {false} //which states have been set by this item
 for each group in stateGroups
  for each state in group
   if statesSet[state.id] == false //this state not set by a previous group in this item
    then
     statesSet[state.id] = true //mark it even if redundant, so it isn't reset to default below
     if previousState[state.id] != state //this state not set by a previous item and still current
      then
       Submit(state) // add to command buffer, or send to device
       previousState[state.id] = state
     endif
   endif
  endfor
 endfor

 setToDefault = nonDefaultState & ~statesSet
 nonDefaultState = statesSet
 for each id in setToDefault
  Submit(defaults[id]) // add to command buffer, or send to device
  previousState[id] = defaults[id]
 endfor

 Submit(draw) // add to command buffer, or send to device

endfor

Except the actual C++ code uses a lot of bitmasks instead of arrays of bools, and uses pointers to identify state value equality, and everything is tightly laid out to be cache-friendly, etc...
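For readers who want something compilable, below is one possible C++ rendering of that loop using a bitmask per item and pointer comparison for state equality. It is a sketch under assumptions, not the engine's actual code: State, StateGroup, RenderItem, Command, kStateSlots and BuildCommands are all names made up here, and the draw-call is reduced to an int. One deliberate choice: a state is marked as set by the item even when it is already current, so that it is not reset to its default afterwards.

```cpp
#include <cstdint>
#include <vector>

const int kStateSlots = 4; // hypothetical; must be <= 32 for the masks below

struct State      { int id; int value; };
struct StateGroup { std::vector<const State*> states; };
struct RenderItem { int draw; std::vector<const StateGroup*> groups; };

struct Command {
    enum Kind { SetState, Draw };
    Kind kind;
    const State* state; // valid when kind == SetState
    int draw;           // valid when kind == Draw
};

std::vector<Command> BuildCommands(const std::vector<RenderItem>& items,
                                   const State (&defaults)[kStateSlots])
{
    std::vector<Command> out;
    const State* previous[kStateSlots] = {};               // nothing is current yet
    std::uint32_t nonDefault = (1u << kStateSlots) - 1u;   // forces defaults on item 0

    for (const RenderItem& item : items) {
        std::uint32_t setByItem = 0;
        for (const StateGroup* group : item.groups) {
            for (const State* state : group->states) {
                const std::uint32_t bit = 1u << state->id;
                if (setByItem & bit) continue;   // an earlier group already set it
                setByItem |= bit;
                if (previous[state->id] != state) { // pointer identity == same value
                    out.push_back({Command::SetState, state, 0});
                    previous[state->id] = state;
                }
            }
        }
        // States left non-default by the previous item but not set by this one.
        const std::uint32_t toDefault = nonDefault & ~setByItem;
        nonDefault = setByItem;
        for (int id = 0; id < kStateSlots; ++id) {
            if (((toDefault >> id) & 1u) && previous[id] != &defaults[id]) {
                out.push_back({Command::SetState, &defaults[id], 0});
                previous[id] = &defaults[id];
            }
        }
        out.push_back({Command::Draw, nullptr, item.draw});
    }
    return out;
}
```

Pointer comparison only detects redundancy when items share the same State objects, so in this sketch state values are assumed to live in shared, long-lived storage.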

melbow

221

Author

January 12, 2013 04:30 AM

Thanks again Hodgman. I really appreciate how detailed yet concise your responses are. The only thing that is still not completely clear to me is the generation of RenderItems. Are they allocated each frame from a data cache (like what is described here http://docs.madewithmarmalade.com/native/api_reference/iwgxapidocumentation/iwgxapioverview/datacache.html )? And would a higher level object like a GLShader or GeometryPacket class then maintain their respective Commands? I am not seeing a way to check for duplicate states by comparing pointers unless the Commands are maintained by global, shared resources, or I guess if Commands ARE global shared resources, but the first option seems cleaner.

Again, I really appreciate everyone's input on this thread, it has helped me a great deal.
