Render queue ids/design?



#1 lephyrius   Members   -  Reputation: 290


Posted 04 November 2013 - 05:13 AM

I've decided that I want a render queue!

I have decided 2 things:

1. Not rebuilding the queue every frame (it seems like that creates a lot of memory fragmentation). That means I'll have to keep some reference to the scene graph. Is this a bad thing?

2. Having separate queues for "opaque 3D", "transparent 3D" and "2D", because they need slightly different sort orders.

I'm using OpenGL in my examples, but I feel this could apply to any other API as well.

So the ID you sort by is 64-bit.

The parameters are in the priority order for the opaque 3D queue:
1. Shader id
2. Material id (probably a UBO or a combination of texture ids)
3. VBO id
4. IBO id

Now how do I combine the ids into one 64 bit integer?
Something like this?

// The operands need to be widened to 64 bits before shifting, and masked to 16 bits.
uint64_t queue_id = ((uint64_t)(shader_id   & 0xFFFF) << 48) |
                    ((uint64_t)(material_id & 0xFFFF) << 32) |
                    ((uint64_t)(vbo_id      & 0xFFFF) << 16) |
                     (uint64_t)(ibo_id      & 0xFFFF);

Is there some other way of compacting the 32-bit numbers into 16 bits?

Or should I maybe create more of a wrapper around the GL ids?
So the shader permutation class:

class Shader {
public:
uint16_t id;
GLuint shader_id;
};


And have a manager (factory) that takes care of assigning compact ids.
 

class ShaderManager {
  public:
 
  Shader& get(const char* filename);  // The shader configuration file.
 
  std::map<std::string, uint16_t> loaded_shaders;
  std::vector<uint16_t> free_ids;
  std::vector<Shader> shaders;
};
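For illustration, a minimal sketch of how get() could hand out compact ids and reuse freed ones (the actual compile/link step is omitted and everything here is just an assumption about how the manager might work):

#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Shader {
    uint16_t id;          // compact id used in the sort key
    unsigned gl_program;  // GLuint in real code; plain unsigned here
};

class ShaderManager {
public:
    Shader& get(const char* filename) {
        auto it = loaded_shaders.find(filename);
        if (it != loaded_shaders.end())
            return shaders[it->second];            // already loaded, reuse it

        uint16_t id;
        if (!free_ids.empty()) {                   // recycle an id returned by release()
            id = free_ids.back();
            free_ids.pop_back();
        } else {
            id = static_cast<uint16_t>(shaders.size());
            shaders.resize(shaders.size() + 1);    // note: growth can invalidate old references
        }
        shaders[id].id = id;
        shaders[id].gl_program = 0;                // compiling/linking would happen here
        loaded_shaders[filename] = id;
        return shaders[id];
    }

    void release(const Shader& s) { free_ids.push_back(s.id); }

    std::map<std::string, uint16_t> loaded_shaders;
    std::vector<uint16_t> free_ids;
    std::vector<Shader> shaders;
};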

Hmmm..
This solution is probably a bit more compact and robust.

Hmmm...
I don't think I'll ever have 65535 different shaders, materials, VBOs or IBOs, at least not at the same time. Then I could use uint8 and also add z-order.

Maybe I should have different kinds of queues, so that:
 

// Forward declaration: ShaderCommand stores MaterialQueues (defined below).
class MaterialQueue;

class ShaderCommand {
public:
  GLuint shader_id;

  std::vector<MaterialQueue> material_queues;
};

class ShaderQueue { // The root of the tree.
public:
  ShaderCommand& add();
  void remove(ShaderCommand& command);
  void render();

  std::vector<ShaderCommand> commands;
};

class MaterialCommand {
public:
  size_t material_id;
  GLuint textures[4];  // Probably will be UBOs, but I use textures for now for simplicity.

  // The next level of the tree (VBO/IBO queues) would go here.
};

class MaterialQueue {
public:
  MaterialCommand& add();
  void remove(MaterialCommand& command);
  void render();

  std::vector<MaterialCommand> commands;
};

// This is just an example of the render function.
void ShaderQueue::render() {
  // Sort the commands based on shader_id. (skipped)
  GLuint current_shader_id = 0;
  for (ShaderCommand& command : commands) {
    if (command.shader_id != current_shader_id) {
      glUseProgram(command.shader_id);   // Apply the shader only when it changes.
      current_shader_id = command.shader_id;
    }
    for (MaterialQueue& queue : command.material_queues) {
      queue.render();
    }
  }
}

The problem I see with this approach is that it probably creates more memory fragmentation (and maybe hurts the cache), and it's harder to change materials on things (I wonder how often that needs to be done). But it is more of a bucket approach. Another problem is that it needs a lot of copying when I sort, unless I store the commands through pointers. Also, a mesh node in the scene graph would need to hold ShaderCommand, MaterialCommand, VBOCommand and IBOCommand references so it could change its material, shader and VBOs/IBOs.
At least it will solve the generation of the ids.

Am I overthinking this now?
Is there something I have totally missed, or something else I need to think about?




#2 ankhd   Members   -  Reputation: 1274


Posted 04 November 2013 - 08:26 PM

Hello. What I did is to have a render list (vector) for each different object type that holds only pointers to objects that are on screen. Then all I do is render each list and clear them, ready for the next frame. I'm using instancing as well. The list is really only used to get each object's world position. It's like buckets.
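A rough sketch of what those per-type lists might look like (the types here are invented just to illustrate):

#include <vector>

struct Mesh { float world_pos[3]; /* ... */ };

struct RenderLists {
    std::vector<const Mesh*> trees;   // one list per object type,
    std::vector<const Mesh*> rocks;   // filled only with on-screen objects

    void clear() { trees.clear(); rocks.clear(); }  // ready for the next frame
};

Each frame you push the visible objects into the matching list, draw each list (instanced), then clear them all.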

#3 ADDMX   Members   -  Reputation: 229


Posted 06 November 2013 - 02:32 AM

Hello. What I did is to have a render list (vector) for each different object type that holds only pointers to objects that are on screen. Then all I do is render each list and clear them, ready for the next frame. I'm using instancing as well. The list is really only used to get each object's world position. It's like buckets.

 

Hi
 
Well, I'm using a different design.
Let's say we have a scene graph where each renderable object holds a pointer to its material, and each material holds a pointer to its shader. When it comes to rendering,
we traverse the scene graph, adding each renderable object to its material's internal list.
Then, to render the solid things, we traverse the list of solid shaders, bind the program and 'global scale' uniforms, then traverse the shader's material list,
bind the per-material state (textures, 'material scale' uniforms), then go through the material's list of renderable objects, bind their uniforms and finally render them.
 
By shader I mean: shader program + any shader-global uniforms (for example bumpShader or parallaxShader).
By material I mean: set of textures + any material-specific uniforms (for example diffuse_map.dds, bump_map.dds, specular_power, specular_color, etc.).
By object I mean: VertexBuffer, IndexBuffer, ToWorld matrix, optionally a SkinningPalette.
 
Instancing is easy: you just need to sort each material's renderables list by object_id (assuming that objects sharing exactly the same VertexBuffer and IndexBuffer have the same object_id),
then put all the ToWorld matrices of objects with the same object_id into an instancing table and draw them at once.
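A sketch of that grouping step (Renderable, Mat4 and the draw call are placeholders, not actual engine types):

#include <algorithm>
#include <cstdint>
#include <vector>

struct Mat4 { float m[16]; };

struct Renderable {
    uint32_t object_id;   // same VertexBuffer/IndexBuffer => same object_id
    Mat4     to_world;
};

// Sort a material's renderable list by object_id, then emit one instanced
// draw per run of equal ids, with all the ToWorld matrices packed together.
void draw_material_instanced(std::vector<Renderable>& list) {
    std::sort(list.begin(), list.end(),
              [](const Renderable& a, const Renderable& b) {
                  return a.object_id < b.object_id;
              });

    std::vector<Mat4> instance_matrices;
    for (size_t i = 0; i < list.size();) {
        size_t j = i;
        instance_matrices.clear();
        while (j < list.size() && list[j].object_id == list[i].object_id)
            instance_matrices.push_back(list[j++].to_world);

        // upload instance_matrices and issue one instanced draw call here
        i = j;
    }
}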
 
In reality, when you traverse the scene graph you will clip against the view frustum, shadow frustum, reflection frustum(s) etc. and keep a bit flag where a bit is set for each frustum in which the object is visible;
then at render time you just check whether visibility_mask & clip_bits is zero or not, to render only what's needed.
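The visibility test could be as simple as this sketch (the flag names are made up):

#include <cstdint>

enum : uint32_t {
    FRUSTUM_MAIN       = 1u << 0,
    FRUSTUM_SHADOW     = 1u << 1,
    FRUSTUM_REFLECTION = 1u << 2,
};

struct Renderable { uint32_t clip_bits; /* set during scene traversal */ };

// At render time, skip anything not visible in the pass we are drawing.
inline bool visible_in_pass(const Renderable& r, uint32_t pass_mask) {
    return (r.clip_bits & pass_mask) != 0;
}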
 
For objects with transparency there is a problem with sorting, since with this design you can only sort within one material bucket, but for me this was not an issue since I have only one transparent shader
and all the transparent objects share the same texture atlas.
 
Any comments on this design are welcome.


#4 L. Spiro   Crossbones+   -  Reputation: 13574


Posted 06 November 2013 - 04:21 AM

now when it comes to rendering we traverse scenegraph

A scenegraph is completely unrelated to rendering.  Every time a scenegraph is used for rendering an angel loses its wings.
 
 

Am I overthinking this now?

Your first example is more correct, but you need to target your priorities better.
 
Obviously the shader ID needs to be there, as changing shaders is the most expensive thing you can do.
The next most expensive is typically texture swaps.  You have no information regarding the textures used, unless that information is part of the material.
Next are the VBO’s and then IBO’s.  If texture information is not part of the material, discard the material—by this point it wouldn’t be of any help once everything else is sorted.
 
You have no depth information.  Sorting by depth is a good idea for depth pre-pass (if you use it) and necessary for transparent objects.
 
You also need some kind of priority system so that things can be drawn before other things.  This goes even if you have a layering system, as even within a layer you may need that kind of control.
 

Or should I maybe create more of a wrapper around the GL ids?

You should always use your own ID’s instead of theirs.  There is nothing in the OpenGL standard saying they have to give you shader ID’s starting at 1.  79,873 is a fully valid ID to get back for your first shader.
 
 

And have a manager(factory) that takes care of assigning ids that is compact.
 

class ShaderManager {
  public:
 
  Shader& get(const char* filename);  // The shader configuration file.
 
  std::map<std::string, uint16_t> loaded_shaders;
  std::vector<uint16_t> free_ids;
  std::vector<Shader> shaders;
};

Why should some manager assign ID’s, and why do you have a list of “free ID’s”? Are these ID’s that are available to be taken?

A simpler way would be:

 
class Shader {
	/** ID of this shader. */
	LSUINT32						m_ui32Id;

	/** ID counter. */
	static LSUINT32						m_ui32IdCounter;

	/** Thread safety. */
	static CCriticalSection					m_csBaseCrit;
};


//*************************
// CPP
//*************************

/** ID counter. */
LSUINT32 Shader::m_ui32IdCounter = 0UL;

/** Thread safety. */
CCriticalSection Shader::m_csBaseCrit;


Shader::Shader() {
	CCriticalSection::CLocker lLock( m_csBaseCrit );
	m_ui32Id = ++m_ui32IdCounter;
}
The shader can assign its own ID. More robust because it eliminates the necessity that the shader be created by the shader manager—manual creation does not break the system.
The critical section isn’t really necessary though; it could just be an atomic.
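A sketch of the same idea with std::atomic instead of the critical section (stripped of the engine types, so this is only an approximation):

#include <atomic>
#include <cstdint>

class Shader {
public:
    Shader() : m_id(++s_id_counter) {}    // each instance grabs the next id
    uint32_t id() const { return m_id; }

private:
    uint32_t m_id;
    static std::atomic<uint32_t> s_id_counter;
};

std::atomic<uint32_t> Shader::s_id_counter{ 0 };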


L. Spiro

Edited by L. Spiro, 06 November 2013 - 04:22 AM.


#5 lephyrius   Members   -  Reputation: 290


Posted 06 November 2013 - 08:41 AM


The next most expensive is typically texture swaps. You have no information regarding the textures used, unless that information is part of the material.
Next are the VBO’s and then IBO’s. If texture information is not part of the material, discard the material—by this point it wouldn’t be of any help once everything else is sorted.

You have no depth information. Sorting by depth is a good idea for depth pre-pass (if you use it) and necessary for transparent objects.

Yeah!

At first I was thinking that material is a combination of all the textures like diffuse, normals and specular.

So let's say I have this structure (I assume you mean I need a single 64-bit ID):

Then the structure would be more like this:

1. type(3D opaque, 3D transparent,fullscreen effects like SMAA/HDR, 2D/UI) = 2 bit

2. z-index = 8 bit

3. shader = 7 bit

4. diffuse = 6 bit

5. normals = 6 bit

6. specular = 6 bit

7. IBO = 6 bit

8. VBO = 6 bit

 

This is really conservative!

I get this to 2+8+7+5*6 = 47 bit

 

Is this a correct order? Should I maybe move z-index down a bit?

Hmmm.. Layering. Couldn't this be solved with z-index?

Now I got 17 bits to play around with.

I should probably increase some of the bit counts. What do you think would be best to increase?

Or have I missed something I need to have?

Maybe I shouldn't specify diffuse, normals and specular exactly, but then I would need some kind of "material_id" like before. This feels more sustainable at least, and I can easily sort them. Maybe just have 3 texture pools, and you could store them in different slots for the shader?

 


Why should some manager assign ID’s, and why do you have a list of “free ID’s”? Are these ID’s that are available to be taken?

Shader permutations are most likely loaded when the game starts, so there it's OK. Let's say we have 100 shader permutations; that would easily fit into 8 bits.

My concern is with textures (most likely 3 textures per model) and IBOs/VBOs (a lot more per model). They will change as the game progresses, and then I need to reuse the ids, because I must be able to squeeze everything into 64 bits in order to sort the render queue efficiently.

 


The shader can assign its own ID. More robust because it eliminates the necessity that the shader be created by the shader manager—manual creation does not break the system.
The critical section isn’t really necessary though; it could just be an atomic.

I really like that idea, and I also like the idea of using atomics. The problem is that it feels like it could create too many IDs if the counter just keeps going up.



#6 lephyrius   Members   -  Reputation: 290


Posted 07 November 2013 - 12:38 AM

I've had some sleep and been thinking.

 

I think I need 2 types of ids.

1 for opaque objects and 1 for transparent.

In these lists, just to clarify, item 1 is the most significant bits.

So for opaque objects:

1. Type(like a tiny header in this case = 0(opaque 3D)) = 2 bit

2. layer = 4 bit

3. shader = 7 bit

4. diffuse = 7 bit

5. normals = 7 bit

6. specular = 7 bit

7. IBO = 7 bit

8. VBO = 7 bit

9. z-index = 8 bit

 

Ok, this has now 2+4+7*6+8 = 56 bit

Hmmm.. I think I could squeeze in another texture and that's a glow map that would mean: 2+4+7*7+8 = 63 bit

Pretty close to 64.
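Just to illustrate the packing, a small helper that appends fields of the listed widths, most significant first (purely a sketch; where exactly the glow map slots in is my guess):

#include <cassert>
#include <cstdint>

// Append 'bits' low-order bits of 'value' to 'key', most significant field first.
inline void pack_field(uint64_t& key, unsigned& used, uint64_t value, unsigned bits) {
    assert(used + bits <= 64 && value < (1ull << bits));
    key = (key << bits) | value;
    used += bits;
}

uint64_t make_opaque_key(unsigned layer, unsigned shader, unsigned diffuse,
                         unsigned normals, unsigned specular, unsigned glow,
                         unsigned ibo, unsigned vbo, unsigned z) {
    uint64_t key = 0; unsigned used = 0;
    pack_field(key, used, 0, 2);        // type = opaque 3D
    pack_field(key, used, layer, 4);
    pack_field(key, used, shader, 7);
    pack_field(key, used, diffuse, 7);
    pack_field(key, used, normals, 7);
    pack_field(key, used, specular, 7);
    pack_field(key, used, glow, 7);
    pack_field(key, used, ibo, 7);
    pack_field(key, used, vbo, 7);
    pack_field(key, used, z, 8);        // 2+4+7*7+8 = 63 bits used
    return key;
}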

 

So for the transparent objects:

1. Type(like a tiny header in this case = 1(transparent 3D)) = 2 bit

2. layer = 4 bit

3. z-index = 12 bit

4. shader = 7 bit

5. diffuse = 7 bit

6. normals = 7 bit

7. specular = 7 bit

8. IBO = 7 bit

9. VBO = 7 bit

 

So for this it's 2+4+12+7*6 = 60 bit.

Also, I could have 2 pools (1 for opaque and 1 for transparent) of shaders, diffuse textures, normal textures and specular textures, so I could at least get 256 IBO/VBO and texture combinations.

 

I think I need to think about this design a bit more so that I don't regret it in the future.



#7 L. Spiro   Crossbones+   -  Reputation: 13574


Posted 07 November 2013 - 05:01 PM

First-off, specifying diffuse, specular, normal, etc. textures specifically is a very good way to ensure you have to rewrite the whole thing at a later date.

Never hard-code the set of textures you have; you will always need to add one more new type.  Besides you are wasting bits.

 

A better way would be to have a map-like class that takes an array of any number of texture ID’s and, for each unique order and combination, returns a single ID that need only consume 16 bits or so.  Then only that single value needs to be tested to know if 2 objects have the same textures in the same order (although once on the GPU order doesn’t really matter, so it really could be just a unique ID based off the combination of textures, not the order).
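A sketch of such a map-like class, assuming order doesn't matter so the texture list can be sorted before lookup (all names here are illustrative):

#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

class TextureSetRegistry {
public:
    // Returns the same 16-bit id for any array containing the same textures.
    uint16_t id_for(std::vector<uint32_t> texture_ids) {
        std::sort(texture_ids.begin(), texture_ids.end());   // order-independent
        auto it = m_ids.find(texture_ids);
        if (it != m_ids.end())
            return it->second;
        uint16_t id = static_cast<uint16_t>(m_ids.size());    // next unused id
        m_ids.emplace(std::move(texture_ids), id);
        return id;
    }

private:
    std::map<std::vector<uint32_t>, uint16_t> m_ids;
};

The returned id is what goes into the 16-or-so bits of the sort key; the actual texture handles stay outside the key.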

 


2. z-index = 8 bit


3. z-index = 12 bit

You have misunderstood.

Depth is not an index.  It is a distance from the camera.  It is a 32-bit floating-point value.  It won’t fit inside your single 64-bit value.  You have no choice but to bite the bullet.

 

 


1. Type(like a tiny header in this case = 0(opaque 3D)) = 2 bit

Why even put opaque and translucent objects together at all?  Why not just put them into separate render queues from the start?  They require different sorting routines anyway and sorting 2 smaller buckets is faster than sorting 1 larger one.

 

 

L. Spiro



#8 Hodgman   Moderators   -  Reputation: 30350


Posted 07 November 2013 - 05:24 PM


Depth is not an index.  It is a distance from the camera.  It is a 32-bit floating-point value.  It won’t fit inside your single 64-bit value.  You have no choice but to bite the bullet.
If you don't care about perfect depth sorting, but do want a rough depth sort (good for opaque, not so good for translucent), you can definitely quantize it to some smaller number of bits.

e.g. quantized = depth/maxDepth * (2^numBits - 1)
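In code that's roughly (clamping added for safety; this sketch assumes numBits is at most 32):

#include <algorithm>
#include <cstdint>

// Map depth in [0, maxDepth] onto an integer in [0, 2^numBits - 1].
inline uint32_t quantize_depth(float depth, float maxDepth, unsigned numBits) {
    float t = std::min(std::max(depth / maxDepth, 0.0f), 1.0f);
    return static_cast<uint32_t>(t * float((1ull << numBits) - 1));
}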


Is there some other way of compacting the 32-bits numbers into 16-bit?
If you need to be able to recover the original 32-bit index, there's no way to do this. You need to use 16-bit indices to begin with.

If you don't need to be able to recover the original values, there's many ways, including hashing. A very simple hash that I sometimes use is just:

hashed = ((original >> 16) ^ original)&0xFFFFU;

 

Speaking of recovering original values -- there's two design choices here:

1) You design your render queue so that from a large sorting key (e.g. 64-bits) you can extract all sorts of state information of how that object should be rendered. You can extract materials ID's, geometry ID's, shader ID's, etc, etc.

2) You treat the sorting key as just a binary lump of data that's only used for sorting and nothing else. You store render-state information elsewhere.

 

Both methods have their merits. To offer another point of view though, I personally chose #2. The objects in my render queue look kind-of like:

{ u32:key, u32:drawCallData, u32:numStateGroups, StateGroup[]:states }

Each item contains the data that will be used to make the draw-call, and then it contains a variable-sized list of state-group pointers. A state-group then contains actual render-states, like commands to bind textures, buffers, shaders, etc...

 

The sort key is only used for sorting; no data is ever extracted from it. Different high level rendering systems can then populate it in whatever way they need to.

If translucent and opaque objects are going to be put into the same queue, then the most-significant bit can be used to define which of these passes the object should appear in. Opaque objects could then follow this with an 8-bit shader hash, whereas translucent objects could follow it with a 24-bit depth value, etc...
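As a rough sketch of that split, with only the bit counts mentioned above filled in (the remaining low bits are left free for whatever else a given pass needs):

#include <cstdint>

// Opaque: [1-bit pass = 0][8-bit shader hash][remaining bits free]
inline uint32_t opaque_key(uint32_t shader_hash8) {
    return (0u << 31) | ((shader_hash8 & 0xFFu) << 23);
}

// Translucent: [1-bit pass = 1][24-bit quantized depth][remaining bits free]
inline uint32_t translucent_key(uint32_t depth24) {
    return (1u << 31) | ((depth24 & 0xFFFFFFu) << 7);
}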



#9 tool_2046   Members   -  Reputation: 1059


Posted 07 November 2013 - 07:30 PM

 


Hodgman has once again done a great job of explaining, and I have little to add. Here's the first source I saw suggesting this though: http://realtimecollisiondetection.net/blog/?p=86



#10 lephyrius   Members   -  Reputation: 290


Posted 08 November 2013 - 01:37 AM


First-off, specifying diffuse, specular, normal, etc. textures specifically is a very good way to ensure you have to rewrite the whole thing at a later date.

Never hard-code the set of textures you have; you will always need to add one more new type. Besides you are wasting bits.



A better way would be to have a map-like class that takes an array of any number of texture ID’s and, for each unique order and combination, returns a single ID that need only consume 16 bits or so. Then only that single value needs to be tested to know if 2 objects have the same textures in the same order (although once on the GPU order doesn’t really matter, so it really could be just a unique ID based off the combination of textures, not the order).

So you mean like a hashcode?

Like murmurhash?

That sounds perfect. The problem I see is that then everything with the same diffuse but not the same normal map will be totally different. It's like the material ids.

Hmmm...

I really don't like losing that much control.

Let's say I have a 16 bit number to play around with:

1. The first texture id = 8 bit

2. The second texture id = 8 bit

 

Maybe then have something that keeps track of texture id usage (the number of times each texture id is used, stored in an array).

When you call "generate_material_id"/"generate_texture_id", it checks all the current textures in the material and selects the two most used.

That would ensure that most texture switches are mitigated. Or are you thinking of something else?

 


2) You treat the sorting key as just a binary lump of data that's only used for sorting and nothing else. You store render-state information elsewhere.

 

Ohh!

This is another solution; I guess it's never black or white.

 


Both methods have their merits. To offer another point of view though, I personally chose #2. The objects in my render queue look kind-of like:

{ u32:key, u32:drawCallData, u32:numStateGroups, StateGroup[]:states }

Each item contains the data that will be used to make the draw-call, and then it contains a variable-sized list of state-group pointers. A state-group then contains actual render-states, like commands to bind textures, buffers, shaders, etc...

 

Why store the draw call data in the queue when you have state groups?

Why not store the draw call data in the state groups?

Why have a list of state groups; isn't one enough?



#11 L. Spiro   Crossbones+   -  Reputation: 13574


Posted 08 November 2013 - 07:24 PM


The problem I see is that then everything with the same diffuse but not the same normal map will be totally different.

From what I understand as being the reason for you putting everything into 64 bits, the exact same problem would exist if you try to do the sort by quickly doing a 64-bit integer compare.

 

If that is not your plan (quick compares) then you would be better off just using a structure and holding everything as separate members.  If you are planning to compare each set of bits in the 64-bit value separately then it will be terribly slow as you shift and mask off bits, not to mention a mess when it comes to adding things later, all on top of how limited you are in your usage of the bits.

 

I am seeing really no reason why you are trying to stick to 64 bits when you have many better options for tackling whatever issues you are trying to avoid.

-> Want to reduce copy overhead when passing render-queue items?  Pass references or pointers to them instead.

-> Want to avoid copy overhead during the sort?  Sort indices, not actual render-queue items.

-> Want to make comparisons between render-queue items fast?  Store things properly aligned inside a structure rather than bit-shifting them around in a 64-bit integer.

-> Want to avoid memory fragmentation by not recreating the queue every frame?  Wrong!  Recreate the render queue but don’t re-allocate unless you need more room.  In other words, clearing the render queue just means setting a count back to 0, not messing with memory.  Let it grow when needed and allocations will be kept to a minimum.

-> Want to take advantage of per-frame temporal coherency?  When the count is reset to 0, don’t modify the sorted indices.  The same objects as last frame will likely already be in sorted order and the indices already correct.  Use an insertion sort to take advantage of temporal coherency.
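Putting the last couple of points together, a minimal sketch of such a queue (RenderItem is a placeholder; this version simply rebuilds the index table when the count changes, which is a simplification of the idea above):

#include <cstddef>
#include <cstdint>
#include <vector>

struct RenderItem { uint64_t sort_key; /* shader, textures, vbo, ... */ };

class RenderQueue {
public:
    // Clearing just resets the count; capacity (and allocations) are kept.
    void clear() { m_count = 0; }

    void add(const RenderItem& item) {
        if (m_count == m_items.size())
            m_items.push_back(item);     // grow only when more room is needed
        else
            m_items[m_count] = item;
        ++m_count;
    }

    // Sort an index table instead of the items themselves. If the count is
    // unchanged, last frame's indices are reused as the starting point, so a
    // nearly-sorted insertion sort runs in close to linear time.
    void sort() {
        if (m_indices.size() != m_count) {
            m_indices.resize(m_count);
            for (size_t i = 0; i < m_count; ++i)
                m_indices[i] = static_cast<uint32_t>(i);
        }
        for (size_t i = 1; i < m_count; ++i) {
            uint32_t idx = m_indices[i];
            size_t j = i;
            while (j > 0 && m_items[m_indices[j - 1]].sort_key > m_items[idx].sort_key) {
                m_indices[j] = m_indices[j - 1];
                --j;
            }
            m_indices[j] = idx;
        }
    }

    const RenderItem& item(size_t n) const { return m_items[m_indices[n]]; }
    size_t            size() const         { return m_count; }

private:
    std::vector<RenderItem> m_items;
    std::vector<uint32_t>   m_indices;   // persisted across frames
    size_t                  m_count = 0;
};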

 

 

Trying to cram everything into a 64-bit value solves nothing and creates problems.  You are limited in your bits and comparisons are 3 times slower.

So perhaps you should explain why you feel the need to go this route and then perhaps we can give you a better alternative.

 

 

L. Spiro



#12 Hodgman   Moderators   -  Reputation: 30350


Posted 08 November 2013 - 11:38 PM


Why store the draw call data in the queue when you have state groups?

Why not store the draw call data in the state groups?
Why have a list of state groups; isn't one enough?

Conceptually, the GPU device has a lot of states that can be configured in many ways, and a "draw" command tells it to perform some action using those configured states. The "draw" is a command, whereas the states are arguments/parameters for the command. The way I describe my render items is one draw call with many states.

 

I let render items contain many state-groups so that state-groups can be shared. If there are different "sub-meshes" inside the same vertex-buffer, then you can have a state group that binds the right vertex streams, sets up the input assembler, etc... If different models share the same textures, then you can have a state-group that binds those textures, and share it between these different models.

Typically, my items will have state-groups such as per-camera (view/proj matrix), per-object (world matrix), per material (textures, shader), etc...



#13 lephyrius   Members   -  Reputation: 290


Posted 09 November 2013 - 01:45 AM


Trying to cram everything into a 64-bit value solves nothing and creates problems. You are limited in your bits and comparisons are 3 times slower.

So perhaps you should explain why you feel the need to go this route and then perhaps we can give you a better alternative.

 

I totally see that it creates a lot of problems storing everything in a 64 bit value.

Whoa, I didn't know that it's 3 times slower to compare a 64-bit value. I thought it was as fast as a 32-bit compare on 64-bit computers.

 

The thing is that I don't feel I need to go the route of cramming everything into 64 bits; it was just the first idea that came to me, and I tried to stick to it. I'm quite open to any idea. Initially I wanted some input so that I could avoid the worst bottlenecks and design decisions.

 

I really like the idea that Hodgman gave me to just treat the sorting key as a binary blob that is only used for sorting. That's something I would never have thought of if it wasn't for this thread.

 

Hmmm...

Coming up with a "perfect key" is impossible, I know that now.

It's almost as if an AI is needed to create these keys. Has someone tried building something like a neural network to create them?

The inputs would be shaders, uniforms, textures, VBOs, IBOs, z-depth, transparency and some other stuff. The output is a key. Training the network would use the number of draw calls or the frame time.

I think it would be slow, but would it be slower than all the unnecessary render state changes?

 


Typically, my items will have state-groups such as per-camera (view/proj matrix), per-object (world matrix), per material (textures, shader), etc...

 

Wouldn't this create a bunch of memory fragmentation? Or create a lot of dynamic dispatch?

 

I know writing branch-free code is really hard, so I guess that's OK.



#14 Hodgman   Moderators   -  Reputation: 30350


Posted 09 November 2013 - 06:03 AM


Wouldn't this create a bunch of memory fragmentation? Or create a lot of dynamic dispatch?
You get memory fragmentation when repeatedly allocating and freeing objects. Most of these state-groups aren't temporary objects (a material, object or camera may be as long-lived as a whole level). If I do need to create temporary ones, I'll typically allocate them using a dedicated "frame temporary" stack allocator that is cleared / reset to empty at the beginning of each frame.
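A minimal frame-temporary stack allocator along those lines might look like this sketch (a real one would handle alignment limits and overflow more gracefully):

#include <cstddef>
#include <cstdint>
#include <vector>

class FrameAllocator {
public:
    explicit FrameAllocator(size_t capacity) : m_buffer(capacity), m_offset(0) {}

    // Bump-allocate from the fixed buffer; there is no per-allocation free.
    void* alloc(size_t bytes, size_t align = alignof(std::max_align_t)) {
        size_t p = (m_offset + align - 1) & ~(align - 1);   // align must be a power of two
        if (p + bytes > m_buffer.size())
            return nullptr;              // out of frame memory
        m_offset = p + bytes;
        return m_buffer.data() + p;
    }

    // Called once at the start of each frame; everything from the previous
    // frame is implicitly discarded.
    void reset() { m_offset = 0; }

private:
    std::vector<uint8_t> m_buffer;
    size_t               m_offset;
};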

 

As for dynamic dispatch, yeah, in my engines "states" are basically polymorphic / derived from a common "state" interface. Yep, this means that applying a state-object / executing a state-change requires dynamic dispatch. You can just use regular virtual functions for this to begin with and optimize it later if it turns out to be a problem for you.

In my engine, I don't use virtual in this case, and instead the first byte of any state object is an index that identifies the type. I then have an array of pointers-to-functions, containing the functions to apply each type of state-change. When iterating through a list of state objects and applying them, this first byte is used to index that array, e.g. stateFunctions[state.type]( state );

In my case, this is slightly more optimal than virtual (single-indirection instead of double-indirection, and it allows me to use POD-types for the state objects instead of being forced to construct them properly due to the vtables), though I wouldn't recommend doing it in the general case.

My "draw calls" also work the same way, just having a different index into this function array. The core of the renderer then looks very much like a virtual machine: a simple loop that's just executing command packets.
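A skeleton of that dispatch, with invented state types and empty apply functions just to show the shape:

#include <cstddef>
#include <cstdint>

// Every packet starts with a type byte; the rest would be POD payload.
struct State { uint8_t type; };

enum StateType : uint8_t { STATE_BIND_TEXTURE = 0, STATE_BIND_SHADER = 1, STATE_DRAW = 2, STATE_COUNT };

static void apply_bind_texture(const State& s) { /* bind the texture here */ (void)s; }
static void apply_bind_shader (const State& s) { /* bind the program here */ (void)s; }
static void apply_draw        (const State& s) { /* issue the draw call here */ (void)s; }

using StateFn = void (*)(const State&);
static const StateFn stateFunctions[STATE_COUNT] = {
    apply_bind_texture, apply_bind_shader, apply_draw
};

// The core loop: a tiny virtual machine executing command packets.
void execute(const State* const* packets, size_t count) {
    for (size_t i = 0; i < count; ++i)
        stateFunctions[packets[i]->type](*packets[i]);
}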



#15 L. Spiro   Crossbones+   -  Reputation: 13574


Posted 10 November 2013 - 12:10 PM


Woah I didn't know that it's 3 times slower to compare a 64 bit value. So I thought that it was as fast 32-bit compare on 64 bit computers.

You have misunderstood.  I am talking about comparing parts of the 64-bit value, such as bits 17-22.  It requires a shift and a mask operation, making the compare slower.  A single 64-bit value compared to another on an x64 machine is the same speed as a 32-bit compare on an x86.

 

 

L. Spiro





