lephyrius

OpenGL Render queue ids/design?


I've decided that I want a render queue!

I have decided 2 things:

1. Not submitting a new queue each frame (it seems like that creates a lot of memory fragmentation). I will then need to keep some reference to the scene graph. Is this a bad thing?

2. Have separate queues for "opaque 3D", "transparent 3D" and "2D", because they have slightly different sorting orders.

I'm using OpenGL in my example, but I feel this could apply to any other API as well.

So the ID you need to sort by is 64-bit.

The parameters, in priority order for the opaque 3D queue:
1. Shader id
2. Material id (probably a UBO or a combination of texture ids)
3. VBO id
4. IBO id

Now how do I combine the ids into one 64 bit integer?
Something like this?

uint64_t queue_id = ((uint64_t)(shader_id & 0xFFFF) << 48) | ((uint64_t)(material_id & 0xFFFF) << 32) | ((uint64_t)(vbo_id & 0xFFFF) << 16) | (uint64_t)(ibo_id & 0xFFFF);  // cast to 64-bit before shifting, mask each id down to 16 bits

Is there some other way of compacting the 32-bit numbers into 16 bits?

Or should I maybe create more of a wrapper around the GL ids?
So the shader permutation class:

class Shader {
public:
uint16_t id;
GLuint shader_id;
};


And have a manager (factory) that takes care of assigning ids that are compact.
 

class ShaderManager {
  public:
 
  Shader& get(const char* filename);  // The shader configuration file.
 
  std::map<std::string, uint16_t> loaded_shaders;
  std::vector<uint16_t> free_ids;
  std::vector<Shader> shaders;
};
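
For illustration, here is a minimal sketch of how get() might assign compact ids and recycle freed ones (the reuse policy and the skipped shader compilation are assumptions, not part of the design above):

// Sketch only: assumes the Shader and ShaderManager definitions above.
Shader& ShaderManager::get(const char* filename) {
  auto it = loaded_shaders.find(filename);
  if (it != loaded_shaders.end())
    return shaders[it->second];            // Already loaded: keep its compact id.

  uint16_t id;
  if (!free_ids.empty()) {                 // Recycle an id freed by an unloaded shader.
    id = free_ids.back();
    free_ids.pop_back();
  } else {                                 // Otherwise grow the table.
    id = static_cast<uint16_t>(shaders.size());
    shaders.emplace_back();
  }
  Shader& shader = shaders[id];
  shader.id = id;
  shader.shader_id = 0;                    // Compile/link the GL program from `filename` here.
  loaded_shaders[filename] = id;
  return shader;                           // Note: the reference is invalidated if `shaders` later grows.
}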

Hmmm..
This solution is probably a bit more compact and robust.

Hmmm...
I don't think I will ever have 65535 different shaders, materials, VBOs or IBOs, at least not at the same time. Then I could use uint8 fields instead and add z-order as well.

Maybe I should have different kinds of queues, something like this:
 

class MaterialQueue;  // Defined below.

class ShaderCommand {
public:
  GLuint shader_id;

  std::vector<MaterialQueue> material_queues;
};

class ShaderQueue {  // The root of the tree.
public:
  ShaderCommand& add();
  void remove(ShaderCommand& command);
  void render();

  std::vector<ShaderCommand> commands;
};

class MaterialCommand {
public:
  size_t material_id;
  GLuint textures[4];  // Probably will be UBOs, but I use textures for now for simplicity.

  // Next level of the tree would go here (e.g. a queue of VBO/IBO commands, not shown).
};

class MaterialQueue {
public:
  MaterialCommand& add();
  void remove(MaterialCommand& command);
  void render();

  std::vector<MaterialCommand> commands;
};

// This is just an example of the render function.
void ShaderQueue::render() {
  // Sort the commands based on shader_id. (skipped)
  for (ShaderCommand& command : commands) {
    glUseProgram(command.shader_id);  // Apply the shader!
    for (MaterialQueue& queue : command.material_queues) {
      queue.render();
    }
  }
}

The problem I feel with this approach is that it probably creates more memory fragmentation (and maybe hurts the cache), and it's harder to change materials on things (I wonder how often that needs to be done). But it is more of a bucket approach. Another problem is that this needs a lot of copying when I sort, unless I store the commands as pointers. I also see that, for instance, a mesh node in the scene graph would need to hold ShaderCommand, MaterialCommand, VBOCommand and IBOCommand references so it could change its material, shader and VBOs/IBOs.
At least it would solve the generation of the ids.

Am I overthinking this now?
Is there something I have totally missed? Or something I need to have or think about?

Hello. What I did is to have a render list (vector) for each different object type that holds only pointers to objects that are on screen. Then all I do is render each list and clear it, ready for the next frame. I'm using instancing as well. The list is really only used to get each object's world position. It's like buckets.


Hello. What I did is to have a render list (vector) for each different object type that holds only pointers to objects that are on screen. Then all I do is render each list and clear it, ready for the next frame. I'm using instancing as well. The list is really only used to get each object's world position. It's like buckets.

 

Hi
 
Well, I'm using a different design.
Let's say we have a scene graph where each renderable object holds a pointer to its material, and each material holds a pointer to its shader. When it comes to rendering,
we traverse the scene graph, adding each renderable object to its material's internal list.
Now when it comes to rendering solid things, we traverse the list of solid shaders, bind the program and 'global scale' uniforms, then traverse that shader's material list,
bind the per-material state (textures, 'material scale' uniforms), and then go through the material's list of renderable objects, bind their uniforms and finally render them.
 
By shader I mean: shader program + any shader-global uniforms (for example bumpShader or parallaxShader).
By material I mean: set of textures + any material-specific uniforms (for example diffuse_map.dds, bump_map.dds, specular_power, specular_color, etc. etc.).
By object I mean: VertexBuffer, IndexBuffer, ToWorld matrix, optionally SkinningPalette.
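
For illustration, a minimal sketch of the solid-pass traversal described above (all type and member names are placeholders, not the poster's actual code; GL headers and <vector> assumed):

struct Renderable {                        // VertexBuffer, IndexBuffer, ToWorld matrix, ...
  void bindObjectUniforms() { /* ... */ }
  void draw()               { /* ... */ }
};
struct Material {                          // textures + material-specific uniforms
  std::vector<Renderable*> renderables;    // filled during the scene-graph traversal
  void bindTexturesAndUniforms() { /* ... */ }
};
struct Shader {                            // program + shader-global uniforms
  GLuint program = 0;
  std::vector<Material*> materials;
  void bindGlobalUniforms() { /* ... */ }
};

void renderSolids(std::vector<Shader*>& solidShaders) {
  for (Shader* shader : solidShaders) {
    glUseProgram(shader->program);
    shader->bindGlobalUniforms();               // 'global scale' uniforms
    for (Material* material : shader->materials) {
      material->bindTexturesAndUniforms();      // per-material textures/uniforms
      for (Renderable* object : material->renderables) {
        object->bindObjectUniforms();           // ToWorld matrix, skinning palette, ...
        object->draw();
      }
      material->renderables.clear();            // refilled next frame by the traversal
    }
  }
}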
 
Instancing is easy: you just need to sort each material's renderables list by object_id (assuming that objects with exactly the same VertexBuffer and IndexBuffer share the same object_id),
then put all the ToWorld matrices of objects with the same object_id into an instancing table and draw them at once.
 
In reality, when you traverse the scene graph you will clip against the camera frustum, shadow frustum, reflection frustum(s), etc., and keep a bit flag with a bit set for each frustum in which the object is visible;
then at render time you just check whether visibility_mask & clip_bits is zero or not, to render only what's needed.
 
For objects with transparency there's a problem with sorting, as with this design you can only sort within one material bucket, but for me this was not an issue since I have only one transparent shader
and all the transparent objects share the same texture atlas.
 
Any comments on this design are welcome.


now when it comes to rendering we traverse scenegraph

A scenegraph is completely unrelated to rendering.  Every time a scenegraph is used for rendering an angel loses its wings.
 
 

Am I overthinking this now?

Your first example is more correct, but you need to target your priorities better.
 
Obviously the shader ID needs to be there, as changing shaders is the most expensive thing you can do.
The next most expensive is typically texture swaps.  You have no information regarding the textures used, unless that information is part of the material.
Next are the VBO’s and then IBO’s.  If texture information is not part of the material, discard the material—by this point it wouldn’t be of any help once everything else is sorted.
 
You have no depth information.  Sorting by depth is a good idea for depth pre-pass (if you use it) and necessary for transparent objects.
 
You also need some kind of priority system so that things can be drawn before other things.  This goes even if you have a layering system, as even within a layer you may need that kind of control.
 

Or should I maybe create more of a wrapper around the GL ids?

You should always use your own ID’s instead of theirs.  There is nothing in the OpenGL standard saying they have to give you shader ID’s starting at 1.  79,873 is a fully valid ID to get back for your first shader.
 
 

And have a manager (factory) that takes care of assigning ids that are compact.
 

class ShaderManager {
  public:
 
  Shader& get(const char* filename);  // The shader configuration file.
 
  std::map<std::string, uint16_t> loaded_shaders;
  std::vector<uint16_t> free_ids;
  std::vector<Shader> shaders;
};

Why should some manager assign ID’s, and why do you have a list of “free ID’s”? Are these ID’s that are available to be taken?

A simpler way would be:

 
class Shader {
	/** ID of this shader. */
	LSUINT32			m_ui32Id;

	/** ID counter. */
	static LSUINT32			m_ui32IdCounter;

	/** Thread safety. */
	static CCriticalSection		m_csBaseCrit;
};


//*************************
// CPP
//*************************

/** ID counter. */
LSUINT32 Shader::m_ui32IdCounter = 0UL;

/** Thread safety. */
CCriticalSection Shader::m_csBaseCrit;


Shader::Shader() {
	CCriticalSection::CLocker lLock( m_csBaseCrit );
	m_ui32Id = ++m_ui32IdCounter;
}
The shader can assign its own ID. More robust because it eliminates the necessity that the shader be created by the shader manager—manual creation does not break the system.
The critical section isn’t really necessary though; it could just be an atomic.


L. Spiro



The next most expensive is typically texture swaps. You have no information regarding the textures used, unless that information is part of the material.
Next are the VBO’s and then IBO’s. If texture information is not part of the material, discard the material—by this point it wouldn’t be of any help once everything else is sorted.

You have no depth information. Sorting by depth is a good idea for depth pre-pass (if you use it) and necessary for transparent objects.

Yeah!

At first I was thinking that a material is a combination of all the textures, like diffuse, normals and specular.

So let's say I have this structure (I assume you mean that I need a single 64-bit ID):

Then the structure would be more like this:

1. type(3D opaque, 3D transparent,fullscreen effects like SMAA/HDR, 2D/UI) = 2 bit

2. z-index = 8 bit

3. shader = 7 bit

4. diffuse = 6 bit

5. normals = 6 bit

6. specular = 6 bit

7. IBO = 6 bit

8. VBO = 6 bit

 

This is really conservative!

I get this to 2 + 8 + 7 + 5*6 = 47 bits.

 

Is this a correct order? Should I maybe move z-index down a bit?

Hmmm.. Layering. Couldn't this be solved with z-index?

Now I have 17 bits to play around with.

I should probably increase some bits. What do you think would be best to increase?

Or Have I missed something that I need to have?

Maybe I shouldn't specify diffuse, normals and specular exactly, but then I would need some kind of "material_id" like before. This feels more sustainable at least, and I can easily sort them. Maybe just have 3 texture pools, and you could store them in different slots for the shader?

 


Why should some manager assign ID’s, and why do you have a list of “free ID’s”? Are these ID’s that are available to be taken?

For shader permutations this is most likely all loaded when the game starts, so that's OK. Let's say we have 100 shader permutations; that would easily fit into 8 bits.

My concern is with textures (most likely 3 textures per model) and IBOs/VBOs (a lot more per model). They will change as the game progresses, and then I need to reuse the ids, because I must be able to squeeze everything into 64 bits in order to sort the render queue efficiently.
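
For illustration, a minimal sketch of an id allocator that recycles freed ids so a long-running game doesn't exhaust the id space (names and policy are assumptions):

#include <cstdint>
#include <vector>

class IdAllocator {
public:
  uint16_t acquire() {
    if (!free_ids.empty()) {        // Reuse an id from a destroyed texture/VBO/IBO.
      uint16_t id = free_ids.back();
      free_ids.pop_back();
      return id;
    }
    return next_id++;               // Otherwise hand out a fresh one (no overflow handling here).
  }
  void release(uint16_t id) { free_ids.push_back(id); }

private:
  uint16_t next_id = 0;
  std::vector<uint16_t> free_ids;
};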

 


The shader can assign its own ID. More robust because it eliminates the necessity that the shader be created by the shader manager—manual creation does not break the system.
The critical section isn’t really necessary though; it could just be an atomic.

I really like the idea, and I also like the idea of using atomics. The problem is that it feels like it could create too many ids if the counter only ever goes up.


I've been doing some thinking now that I've got some sleep.

 

I think I need 2 types of ids.

1 for opaque objects and 1 for transparent.

Just to clarify: in these lists, item 1 is the most significant bits.

So for opaque objects:

1. Type(like a tiny header in this case = 0(opaque 3D)) = 2 bit

2. layer = 4 bit

3. shader = 7 bit

4. diffuse = 7 bit

5. normals = 7 bit

6. specular = 7 bit

7. IBO = 7 bit

8. VBO = 7 bit

9. z-index = 8 bit

 

OK, this now has 2 + 4 + 7*6 + 8 = 56 bits.

Hmmm.. I think I could squeeze in another texture, a glow map; that would mean 2 + 4 + 7*7 + 8 = 63 bits.

Pretty close to 64.
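
For illustration, a sketch of packing the opaque layout above (without the glow map) into a 64-bit key, most significant field first; the function and parameter names are placeholders:

#include <cstdint>

uint64_t packOpaqueKey(uint64_t type, uint64_t layer, uint64_t shader,
                       uint64_t diffuse, uint64_t normals, uint64_t specular,
                       uint64_t ibo, uint64_t vbo, uint64_t z)
{
  uint64_t key = 0;
  key = (key << 2) | (type     & 0x3);   // 2 bits
  key = (key << 4) | (layer    & 0xF);   // 4 bits
  key = (key << 7) | (shader   & 0x7F);  // 7 bits
  key = (key << 7) | (diffuse  & 0x7F);
  key = (key << 7) | (normals  & 0x7F);
  key = (key << 7) | (specular & 0x7F);
  key = (key << 7) | (ibo      & 0x7F);
  key = (key << 7) | (vbo      & 0x7F);
  key = (key << 8) | (z        & 0xFF);  // 8 bits -> 56 bits used, upper 8 stay zero
  return key;
}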

 

So for the transparent objects:

1. Type(like a tiny header in this case = 1(transparent 3D)) = 2 bit

2. layer = 4 bit

3. z-index = 12 bit

4. shader = 7 bit

5. diffuse = 7 bit

6. normals = 7 bit

7. specular = 7 bit

8. IBO = 7 bit

9. VBO = 7 bit

 

So for this it's 2 + 4 + 12 + 7*6 = 60 bits.

Also, I could have 2 pools (1 for opaque and 1 for transparent) of shaders, diffuse textures, normal textures and specular textures, so I could get at least 256 IBO/VBO and texture combinations.

 

I think I need to think about this design a bit more so that I don't regret it in the future.


First-off, specifying diffuse, specular, normal, etc. textures specifically is a very good way to ensure you have to rewrite the whole thing at a later date.

Never hard-code the set of textures you have; you will always need to add one more new type. Besides, you are wasting bits.

 

A better way would be to have a map-like class that takes an array of any number of texture ID’s and for each unique order and combination returns a single ID that need only consume 16 bits or so.  Then only that single value needs to be tested to know if 2 objects have the same textures in the same orders (although once on the GPU order doesn’t really matter, so it really could be just a unique ID based off the combination of textures, not the order).
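
For illustration, a minimal sketch of such a map-like class, assuming order-independent combinations (the texture list is sorted before lookup); all names are placeholders:

#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Maps each unique combination of texture ids to a compact 16-bit id.
class TextureSetRegistry {
public:
  uint16_t idFor(std::vector<uint32_t> textures) {
    std::sort(textures.begin(), textures.end());     // Order-independent, as noted above.
    auto it = ids.find(textures);
    if (it != ids.end())
      return it->second;
    uint16_t id = static_cast<uint16_t>(ids.size()); // Sketch: no overflow handling.
    ids.emplace(std::move(textures), id);
    return id;
  }

private:
  std::map<std::vector<uint32_t>, uint16_t> ids;
};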

 


2. z-index = 8 bit


3. z-index = 12 bit

You have misunderstood.

Depth is not an index.  It is a distance from the camera.  It is a 32-bit floating-point value.  It won’t fit inside your single 64-bit value.  You have no choice but to bite the bullet.

 

 


1. Type(like a tiny header in this case = 0(opaque 3D)) = 2 bit

Why even put opaque and translucent objects together at all?  Why not just put them into separate render queues from the start?  They require different sorting routines anyway and sorting 2 smaller buckets is faster than sorting 1 larger one.

 

 

L. Spiro



Depth is not an index.  It is a distance from the camera.  It is a 32-bit floating-point value.  It won’t fit inside your single 64-bit value.  You have no choice but to bite the bullet.
If you don't care about perfect depth sorting, but do want a rough depth sort (good for opaque, not so good for translucent), you can definitely quantize it to a smaller number of bits.

e.g. quantized = (depth / maxDepth) * (2^numBits - 1)
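
In code, that quantization might look roughly like this (a sketch; numBits and maxDepth are whatever your key layout and scene use):

#include <cstdint>

// Quantize a depth in [0, maxDepth] into numBits bits for the sort key (numBits < 32).
uint32_t quantizeDepth(float depth, float maxDepth, unsigned numBits) {
  float t = depth / maxDepth;                       // 0..1
  if (t < 0.0f) t = 0.0f;
  if (t > 1.0f) t = 1.0f;
  uint32_t maxVal = (1u << numBits) - 1u;           // 2^numBits - 1
  return static_cast<uint32_t>(t * maxVal + 0.5f);  // round to nearest
}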


Is there some other way of compacting the 32-bit numbers into 16 bits?
If you need to be able to recover the original 32-bit index, there's no way to do this. You need to use 16-bit indices to begin with.

If you don't need to be able to recover the original values, there are many ways, including hashing. A very simple hash that I sometimes use is just:

hashed = ((original >> 16) ^ original)&0xFFFFU;

 

Speaking of recovering original values -- there are two design choices here:

1) You design your render queue so that from a large sorting key (e.g. 64-bits) you can extract all sorts of state information about how that object should be rendered. You can extract material ID's, geometry ID's, shader ID's, etc, etc.

2) You treat the sorting key as just a binary lump of data that's only used for sorting and nothing else. You store render-state information elsewhere.

 

Both methods have their merits. To offer another point of view though, I personally chose #2. The objects in my render queue look kind-of like:

{ u32:key, u32:drawCallData, u32:numStateGroups, StateGroup[]:states }

Each item contains the data that will be used to make the draw-call, and then it contains a variable-sized list of state-group pointers. A state-group then contains actual render-states, like commands to bind textures, buffers, shaders, etc...

 

The sort key is only used for sorting; no data is ever extracted from it. Different high level rendering systems can then populate it in whatever way they need to.

If translucent and opaque objects are going to be put into the same queue, then the most-significant bit can be used to define which one of these passes the object should appear in. Opaque objects could then follow this with an 8-bit shader hash, whereas translucent objects could follow it with a 24-bit depth value, etc...
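
For illustration, a sketch of populating such a key as described above (only the bit widths mentioned are used; the shader hash and quantized depth are assumed to come from elsewhere):

#include <cstdint>

// Most-significant bit selects the pass; the remaining bits differ per pass.
uint32_t makeOpaqueKey(uint8_t shaderHash) {
  return (0u << 31) | (uint32_t(shaderHash) << 23);          // 8-bit shader hash, lower bits free
}

uint32_t makeTranslucentKey(uint32_t quantizedDepth24) {
  // Depending on sort direction you may want to invert the depth first,
  // so translucent objects come out back-to-front.
  return (1u << 31) | ((quantizedDepth24 & 0xFFFFFFu) << 7); // 24-bit depth, lower bits free
}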


 

Hodgman has once again done a great job of explaining, and I have little to add. Here's the first source I saw suggesting this though: http://realtimecollisiondetection.net/blog/?p=86



First-off, specifying diffuse, specular, normal, etc. textures specifically is a very good way to ensure you have to rewrite the whole thing at a later date.

Never hard-code the set of textures you have; you will always need to add one more new type. Besides you are wasting bits.



A better way would be to have a map-like class that takes an array of any number of texture ID’s and for each unique order and combination returns a single ID that need only consume 16 bits or so. Then only that single value needs to be tested to know if 2 objects have the same textures in the same orders (although once on the GPU order doesn’t really matter, so it really could be just a unique ID based off the combination of textures, not the order).

So you mean like a hash code?

Like MurmurHash?

That sounds perfect. The problem I see is that then everything with the same diffuse but a different normal map will be totally different. It's like the material ids.

Hmmm...

I really don't like losing that much control.

Let's say I have a 16-bit number to play around with:

1. The first texture id = 8 bits

2. The second texture id = 8 bits
 

Maybe then have something that keeps track of all texture id usage (the number of times each texture id is used, stored in an array).

When you call "generate_material_id"/"generate_texture_id", it checks all the current textures in the material and selects the two most used.

That would ensure that most texture switches are mitigated as much as possible. Or are you thinking of something else?

 


2) You treat the sorting key as just a binary lump of data that's only used for sorting and nothing else. You store render-state information elsewhere.

 

Ohh!

This is another solution. I guess it's never black or white.

 


Both methods have their merits. To offer another point of view though, I personally chose #2. The objects in my render queue look kind-of like:

{ u32:key, u32:drawCallData, u32:numStateGroups, StateGroup[]:states }

Each item contains the data that will be used to make the draw-call, and then it contains a variable-sized list of state-group pointers. A state-group then contains actual render-states, like commands to bind textures, buffers, shaders, etc...

 

Why store the draw call data in the queue when you have state groups?

Why not store the draw call data in the state groups?

Why have a list of state groups? Isn't one enough?

