ongamex92

OpenGL DrawCalls as a struct


For a while I've been trying to correctly implement a struct that HOLDS all the data that could possibly be needed to execute a draw call. I thought I had invented hot water (something new), but it turns out that others are doing it too and, from what I *hear*, doing it better. To give you some context, here is what I've got:

#include <vector>

// Engine types (Buffer, Texture, SamplerState, ShadingProgram, Pair, ...) are defined elsewhere.
// Buffer, Texture, SamplerState ARE JUST POINTERS - LITERALLY
struct DrawCall
{
    typedef std::vector<Pair<unsigned, Buffer>>       BoundCBuffersContainer;
    typedef std::vector<Pair<unsigned, Texture>>      BoundTexturesContainer;
    typedef std::vector<Pair<unsigned, SamplerState>> BoundSamplersContainer;

    // Quick access to the 0th CBuffer in D3D or the plain uniforms in OpenGL.
    // Each entry holds the uniform name (as a hashed int), its type and its byte offset in m_uniformData.
    std::vector<BoundUniform> m_uniforms;
    std::vector<char>         m_uniformData;

    ShadingProgram          m_shadingProg;  // pointer
    Buffer                  m_vertexBuffers[GraphicsCaps::NUM_VERTEX_BUFFER_SLOTS]; // array of pointers; NUM_VERTEX_BUFFER_SLOTS is ~8
    uint32                  m_vbOffsets[GraphicsCaps::NUM_VERTEX_BUFFER_SLOTS];
    uint32                  m_vbStrides[GraphicsCaps::NUM_VERTEX_BUFFER_SLOTS];
    VertexDeclIndex         m_vertDeclIndex;
    PrimitiveTopology::Enum m_primTopology;
    Buffer                  m_indexBuffer;  // pointer
    UniformType::Enum       m_indexBufferFormat;
    uint32                  m_indexBufferByteOffset;
    BoundCBuffersContainer  m_boundCbuffers; // std::vector of pointers
    BoundTexturesContainer  m_boundTextures; // std::vector of pointers
    BoundSamplersContainer  m_boundSamplers; // std::vector of pointers
    FrameTarget             m_frameTarget;   // pointer (a set of render targets + depth buffer)
    Viewport                m_viewport;
    RasterizerState         m_rasterState;       // pointer
    DepthStencilState       m_depthStencilState; // pointer
    BlendState              m_blendState;        // pointer
    AABox2i                 m_scissorRect; // used only if useScissors is true in the rasterizer state
    DrawExecDesc            m_drawExec;    // linear/indexed etc., vertex buffer, number of primitives etc.

    DrawCall();
    ~DrawCall() = default;

    // Just setters for the values above.
    void setProgram(ShadingProgram pShadingProgram);
    void setVB(const int slot, Buffer pBuffer, const uint32 byteOffset, const uint32 stride);
    void SetVBDeclIndex(const VertexDeclIndex idx);
    void setPrimitiveTopology(const PrimitiveTopology::Enum pt);
    void setIB(Buffer pBuffer, const UniformType::Enum format, const uint32 byteOffset);
    // nameStrIdx is just a hash (actually retrieved by std::map<std::string, int>)
    void setCBuffer(const unsigned nameStrIdx, Buffer cbuffer);
    void setTexture(const unsigned nameStrIdx, Texture texture, const bool bindSampler = true);
    void setSampler(const unsigned nameStrIdx, SamplerState sampler);
    void setFrameTarget(FrameTarget frameTarget);
    void setRenderState(RasterizerState rasterState, DepthStencilState depthStencilState, BlendState blendState = BlendState());
    void setScissorsRect(const AABox2i& rect);
    void setViewport(const Viewport& viewport);

    void draw(const uint32 numVerts, const uint32 startVert, const uint32 numInstances = 1);
    void drawIndexed(const uint32 numIndices, const uint32 startIndex, const uint32 startVert, const uint32 numInstances = 1);

    // Checks the validity of the bound resources and returns true
    // if the draw call is valid-ish...
    // [CAUTION] this function is NOT complete.
    bool validateDrawCall() const;
};

So far so good...
This struct could later be passed to a "Context" that executes it. My problems are:

 - Under x64 the struct is 400 bytes long.

 - The struct allocates (via std::vector) in order to hold the bound data. Quick fixes for this would be:
     - use static arrays (this increases the overall size of the struct, but there are no dynamic allocations)
     - reuse a DrawCall struct when possible (which is pretty often), but this basically kills the whole idea of having DrawCall as a struct
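For the "static arrays" option, one way to avoid heap allocations without reserving a slot for every possible binding is a small fixed-capacity container stored inline in the struct. This is only a sketch: `FixedBindings`, the capacity of 8 and the use of `const void*` as a stand-in for the engine's pointer-like handles are assumptions, not part of the code above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical inline replacement for std::vector<Pair<unsigned, T>>:
// the storage lives inside the struct, so binding never allocates.
template <typename T, std::size_t Capacity>
struct FixedBindings
{
    std::array<std::pair<std::uint32_t, T>, Capacity> items{};
    std::size_t count = 0;

    // Binds 'value' to 'slot', overwriting an existing binding for that slot.
    // Returns false if the container is already full.
    bool set(std::uint32_t slot, T value)
    {
        for (std::size_t i = 0; i < count; ++i)
        {
            if (items[i].first == slot)
            {
                items[i].second = value;
                return true;
            }
        }
        if (count == Capacity)
            return false;
        items[count] = std::make_pair(slot, value);
        ++count;
        return true;
    }

    void clear() { count = 0; }
};

// Stand-in for the engine's pointer-like Texture handle (assumption).
using TextureHandle = const void*;
using BoundTextures = FixedBindings<TextureHandle, 8>;
```

The trade-off is the one mentioned above: a fixed upper bound on bindings per draw and a larger struct, in exchange for no allocations when building draw calls.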

 

Technically, reusing a DrawCall isn't that bad, but I have a gut feeling that it is not right. All suggestions are welcome.

My other issue is the way I bind resources (textures, constant buffers, sampler states, "regular uniforms" (I use those because they are WAY easier than constant buffers)).
Also, I like the idea of binding something by string name/string hash, but that lookup is a bit costly, so I think something like slots, as in D3D, might be better. Any suggestions here?
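One middle ground is to pay for the string lookup once, when the shader is loaded, and store plain slot indices in the draw call from then on. A minimal sketch, assuming the back-end can list a program's texture names in slot order (e.g. from shader reflection); `ProgramSlots` and `resolve` are hypothetical names:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-program table, built once at shader load time.
struct ProgramSlots
{
    std::unordered_map<std::string, std::uint32_t> textureSlots;

    explicit ProgramSlots(const std::vector<std::string>& textureNamesInSlotOrder)
    {
        for (std::uint32_t i = 0; i < textureNamesInSlotOrder.size(); ++i)
            textureSlots.emplace(textureNamesInSlotOrder[i], i);
    }

    // The (relatively) costly string lookup happens here, outside the draw loop.
    std::optional<std::uint32_t> resolve(const std::string& name) const
    {
        auto it = textureSlots.find(name);
        if (it == textureSlots.end())
            return std::nullopt;
        return it->second;
    }
};
```

A material or draw-call builder would call `resolve("diffuseMap")` once and cache the result; binding per draw is then just an array index, much like D3D's register slots.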

So, any suggestions on how I can improve this?

Edited by imoogiBG


Thanks for the great explanation, Hodgman.

I still have some conceptual questions that I cannot resolve.

Currently I don't have a game, just a small set of scenes that I render via my API. Usually I have only one constant buffer, which is constantly bound, and I update it before every draw call.
A while ago I did a measurement (if I did it right), and it turned out that having a constantly bound cbuffer which I update() is wayyy faster than having multiple CBuffers and just binding them (not including the updates). Because of that, I cannot understand your concept of CBuffers. To me they are not like the usual texture (which is immutable); to me cbuffers are constantly updated, and that's why I currently store "bare" uniforms in my draw call.

Could you go a little more in depth on that CBuffer thing (or explain what I could be doing wrong) and how you use them?


The thing with shader constants is that they typically change at different frequencies depending on what the data is, and some might be reused across many draw calls within the scene, e.g.
- View / projection matrices typically only change once per frame.
- Material constants are the same across all draw calls that use the same material.
- The model matrix changes per draw call.

The goal with constant buffers is to split your data up so that you only re-upload constants to the GPU when they actually change. For example, if you have a separate view & projection matrix per draw call, that's 128 bytes (two 4x4 float matrices) per draw call that you're uploading repeatedly. It doesn't sound like much, but that number scales very quickly as you add more data and more draw calls. Depending on what exactly your scene construction looks like, the cost of switching CBuffers is typically much lower than the CPU cost of recommitting the data into a single constantly bound cbuffer.
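To make the split concrete, here is a sketch of CPU-side structs mirroring three cbuffers grouped by update frequency. The struct names and contents are illustrative assumptions, and `float4`/`float4x4` are plain stand-ins rather than any particular math library. The `static_assert`s check the 128-byte figure above (2 x 16 floats x 4 bytes).

```cpp
#include <cstddef>

// Plain stand-ins for shader types (assumption: no padding beyond the floats).
struct float4   { float v[4]; };
struct float4x4 { float m[16]; };

// Changes once per frame: uploaded once, then only bound.
struct PerFrameConstants
{
    float4x4 view;
    float4x4 projection;
};

// Shared by every draw call that uses the same material: uploaded on load.
struct PerMaterialConstants
{
    float4 diffuseColor;
    float4 specularColorAndPower;
};

// Changes per draw call (for dynamic objects).
struct PerObjectConstants
{
    float4x4 world;
};

static_assert(sizeof(PerFrameConstants) == 128, "view + projection = 2 * 64 bytes");
static_assert(sizeof(PerObjectConstants) == 64, "one 4x4 float matrix");
// D3D11 constant buffer sizes must be a multiple of 16 bytes.
static_assert(sizeof(PerMaterialConstants) % 16 == 0, "multiple of 16 bytes");
```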

The cost of switching CBuffers is typically much lower than the CPU cost of recommitting the data into a single constantly bound cbuffer - depending on what exactly your scene construction looks like.

This is exactly the opposite of what I've measured, at least with my constant buffer, which is pretty small in bytes.
My measurements are kind of in line with https://developer.nvidia.com/content/constant-buffers-without-constant-pain-0, if I understand that article correctly?

OFFTOPIC:

How can I "ping" Hodgman? I'm not sure he has seen that this thread was updated.

Edited by imoogiBG


That's why I mentioned that it scales quickly with the amount of data in the cbuffer and the number of draw calls. Even if you just split it into 2 constant buffers that are always bound, e.g. one for constants that only change once per frame and one for everything else, you would be transferring less data to the GPU, while the GPU would still be doing the same work. But this stuff only matters when upload bandwidth is your performance bottleneck.
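The scaling argument can be checked with back-of-the-envelope arithmetic. This sketch only counts bytes copied from the CPU per frame; it ignores driver renaming and binding costs, and the sizes in the usage note are made-up examples.

```cpp
#include <cstdint>

// Bytes re-uploaded per frame when everything lives in one cbuffer
// that is rewritten before every draw call.
constexpr std::uint64_t bytesSingleBuffer(std::uint64_t draws,
                                          std::uint64_t perFrameBytes,
                                          std::uint64_t perDrawBytes)
{
    return draws * (perFrameBytes + perDrawBytes);
}

// Bytes uploaded per frame when the per-frame constants get their own
// cbuffer that is written once and stays bound.
constexpr std::uint64_t bytesSplitBuffers(std::uint64_t draws,
                                          std::uint64_t perFrameBytes,
                                          std::uint64_t perDrawBytes)
{
    return perFrameBytes + draws * perDrawBytes;
}
```

With 1000 draws, 128 bytes of view/projection data and 64 bytes of per-draw data, the single buffer uploads 1000 x 192 = 192,000 bytes per frame, while the split version uploads 128 + 64,000 = 64,128 bytes.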

Regarding your offtopic, if you quote their post they get a notification.


A while ago I did a measurement (if I did it right), and it turned out that having a constantly bound cbuffer which I update() is wayyy faster than having multiple CBuffers and just binding them (not including the updates). Because of that, I cannot understand your concept of CBuffers. To me they are not like the usual texture (which is immutable); to me cbuffers are constantly updated, and that's why I currently store "bare" uniforms in my draw call.
Which API did you measure that on?

You can provide a cbuffer abstraction to the DrawItem system, but then use a completely different back-end implementation.

e.g. on D3D9, there are no cbuffers, so the DrawItem has a CBufferID, which is associated with some memory that came from malloc, where the user has placed their constants, and the back-end copies these constants into D3D9's constant registers.
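A minimal sketch of that D3D9-style path, assuming the front-end only hands out `CBufferID`s backed by plain CPU memory. No real D3D9 call is made: the constant registers are simulated by an array (assumed to be 256 float4 registers), where a real back-end would pass the bytes to `IDirect3DDevice9::SetVertexShaderConstantF`. All other names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using CBufferID = std::uint32_t;

// Front-end (hypothetical): the user writes constants into CPU memory identified by an ID.
struct CBufferStore
{
    std::vector<std::vector<float>> buffers; // each buffer holds float4s, stored as floats

    CBufferID create(std::size_t numFloat4)
    {
        buffers.emplace_back(numFloat4 * 4, 0.0f);
        return static_cast<CBufferID>(buffers.size() - 1);
    }
    float* data(CBufferID id) { return buffers[id].data(); }
};

// Back-end (simulated D3D9): copies the CPU constants into float4 registers.
struct RegisterFile
{
    std::array<float, 256 * 4> regs{}; // assumed: 256 float4 constant registers

    bool upload(const CBufferStore& store, CBufferID id, std::uint32_t startRegister)
    {
        const std::vector<float>& src = store.buffers[id];
        if (startRegister * 4 + src.size() > regs.size())
            return false; // would overflow the register file
        // Real code: device->SetVertexShaderConstantF(startRegister, src.data(), src.size() / 4)
        std::memcpy(regs.data() + startRegister * 4, src.data(), src.size() * sizeof(float));
        return true;
    }
};
```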

Or on D3D12, I have a single massive "constant-ring" for streaming per-frame/dynamic constants to the GPU. The back-end copies the user's constants into this ring-buffer every frame, and they're overwritten again next frame.
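The constant-ring idea can be sketched as a linear allocator over one big buffer that is reset every frame. This is a simplified sketch with a single frame in flight: a `std::vector<std::uint8_t>` stands in for the mapped upload buffer, and real code must also avoid overwriting data the GPU is still reading (e.g. one ring region per frame in flight plus a fence). The 256-byte alignment matches D3D12's constant buffer placement alignment; the class name and API are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

class ConstantRing
{
public:
    explicit ConstantRing(std::size_t capacityBytes) : m_memory(capacityBytes) {}

    // Called once per frame: this frame's constants overwrite last frame's.
    void beginFrame() { m_head = 0; }

    // Copies 'size' bytes into the ring and returns their byte offset,
    // or SIZE_MAX if the ring is full for this frame.
    std::size_t push(const void* data, std::size_t size)
    {
        const std::size_t offset = alignUp(m_head, kAlignment);
        if (offset + size > m_memory.size())
            return SIZE_MAX;
        std::memcpy(m_memory.data() + offset, data, size);
        m_head = offset + size;
        return offset; // a draw would bind "ring GPU address + offset"
    }

    const std::uint8_t* bytes() const { return m_memory.data(); }

private:
    static constexpr std::size_t kAlignment = 256; // D3D12 CBV placement alignment
    static std::size_t alignUp(std::size_t v, std::size_t a) { return (v + a - 1) & ~(a - 1); }

    std::vector<std::uint8_t> m_memory;
    std::size_t m_head = 0;
};
```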

In both cases, the DrawItem still just stores a CBufferID as if it's using cbuffers/UBO's.

 

If you've got OpenGL profiling data showing that raw uniforms are indeed better than UBOs, then your OpenGL back-end can still use raw uniforms, even though the DrawItem system uses a UBO-like abstraction.

 

Also, what Digital said. You generally want to split up your constants by update frequency / data source. Constants from your camera system are updated rarely and shared between many draw calls. UBOs let you send this data to the GPU once and then simply bind a pointer to it for each draw call. Traditional GL uniforms, on the other hand, require you to send the data to the GPU again for each draw call, so in the general case they don't scale as well.

That said, I'm sure that on NVIDIA's GL drivers (the GL optimization kings), traditional uniforms probably do perform amazingly well in certain situations.

 

My measurements are kind of in sync with https://developer.nvidia.com/content/constant-buffers-without-constant-pain-0 if I understand this paper correctly?
It doesn't sound like it. They show SetConstantBuffers to be cheaper than updating a constant buffer.

If you're using multiple constant buffers correctly, then you don't need to update them for each draw call.

e.g.

* your per-material buffers and your per-object buffers for static objects get updated once (on load).

* your camera buffer gets updated once per frame

* your per-object buffers for dynamic objects get updated once per draw

 

So per frame you call UpdateSubresource once for the camera, and once for each dynamic object (unless you're using instancing/etc -- then it's likely once or less per group of mesh types)...

So when rendering static objects, you only need to call SetConstantBuffers, as all your cbuffers already contain the right data.

And when rendering dynamic (non-instanced) objects, you call UpdateSubresource to copy the dynamic data into a cbuffer, and then call SetConstantBuffers.
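The per-frame flow above can be sketched with a mock context that only counts calls. `MockContext`, `Object` and the method names are hypothetical; `updateSubresource` and `setConstantBuffers` merely stand in for `ID3D11DeviceContext::UpdateSubresource` and `VSSetConstantBuffers`.

```cpp
#include <vector>

// Counts calls instead of talking to D3D11 (hypothetical stand-in).
struct MockContext
{
    int updates = 0;
    int binds = 0;
    void updateSubresource(int /*cbuffer*/) { ++updates; }
    void setConstantBuffers(int /*firstSlot*/, int /*count*/) { ++binds; }
};

struct Object
{
    bool isDynamic;
    int objectCBuffer; // for static objects, filled once at load time
};

void renderFrame(MockContext& ctx, const std::vector<Object>& objects, int cameraCBuffer)
{
    ctx.updateSubresource(cameraCBuffer); // camera: once per frame
    for (const Object& obj : objects)
    {
        if (obj.isDynamic)
            ctx.updateSubresource(obj.objectCBuffer); // dynamic object: once per draw
        // Material and static-object cbuffers already hold the right data: just bind.
        ctx.setConstantBuffers(0, 3); // camera, material, object
        // ... draw ...
    }
}
```

For 3 static and 2 dynamic objects this makes 3 updates (1 camera + 2 dynamic) and 5 binds per frame.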

 

They then also describe an advanced feature that's available in D3D11.1 and OpenGL4 that lets you reduce the number of calls to UpdateSubresource dramatically.
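For reference, the D3D11.1 entry point is `ID3D11DeviceContext1::VSSetConstantBuffers1` (with equivalents for the other stages), which binds a window of a larger buffer given by `pFirstConstant`/`pNumConstants` in units of 16-byte constants; each offset and count must be a multiple of 16 constants (256 bytes). In OpenGL the counterpart is `glBindBufferRange`, whose offset must be a multiple of `GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT`. A sketch of computing those ranges when packing per-object data back to back into one buffer; the helper name is hypothetical:

```cpp
#include <cstdint>
#include <vector>

struct ConstantRange
{
    std::uint32_t firstConstant; // in 16-byte constants
    std::uint32_t numConstants;  // in 16-byte constants
};

// Lays out 'objectCount' blocks of 'bytesPerObject' bytes in one big buffer,
// rounding each block up to a multiple of 16 constants (256 bytes).
std::vector<ConstantRange> packObjectRanges(std::uint32_t objectCount, std::uint32_t bytesPerObject)
{
    const std::uint32_t constantsPerObject = (bytesPerObject + 15) / 16; // 16 bytes per constant
    const std::uint32_t rounded = (constantsPerObject + 15) / 16 * 16;   // multiple of 16 constants
    std::vector<ConstantRange> ranges;
    ranges.reserve(objectCount);
    for (std::uint32_t i = 0; i < objectCount; ++i)
        ranges.push_back({i * rounded, rounded});
    return ranges;
}
```

The whole buffer can then be filled with a single update per frame, and each draw just binds its own range.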

Which API did you measure that on?

Direct3D 11. Under OpenGL I've implemented uniform buffers; the problem is that I'm writing the shading language myself (it gets translated to HLSL/GLSL) and I haven't implemented that feature yet. So yeah, I'm going to drop these "regular" uniforms in the future.

Maybe I should re-measure with a more complex scene.

Otherwise, to back up my decision: as far as I know, a constant buffer can contain up to 4096 float4 variables, which is 64 KB. My scenes currently have very little cbuffer data (world, view, proj, a few colors, a few light colors and a few point light positions... nothing fancy).
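The 64 KB figure is 4096 x 16 bytes. For comparison, a guessed layout for the data listed above (the counts of colors and lights are assumptions) comes to only a few hundred bytes:

```cpp
#include <cstddef>

struct float4   { float v[4]; };
struct float4x4 { float m[16]; };

constexpr std::size_t kMaxCBufferBytes = 4096 * sizeof(float4); // 4096 float4s

// Hypothetical layout: world/view/proj, 4 colors, 4 light colors, 4 light positions.
struct SceneConstants
{
    float4x4 world, view, proj; // 3 * 64 = 192 bytes
    float4 colors[4];           // 64 bytes
    float4 lightColors[4];      // 64 bytes
    float4 lightPositions[4];   // 64 bytes
};

static_assert(kMaxCBufferBytes == 65536, "4096 * 16 bytes = 64 KB");
static_assert(sizeof(SceneConstants) == 384, "far below the 64 KB limit");
```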

There is no getting away from updating at least one constant buffer, because of the "world" matrix (not to mention animated materials, which I currently do not have, but which are a possibility).
So I did a bit of measuring, and it turns out that if I use only one cbuffer, the cost of updating one variable is the same as updating multiple variables (note again that my cbuffers aren't that big), and if I keep that cbuffer constantly bound, I don't pay for binding another buffer.
So yeah, that's the logic behind that decision.

I'm at a halt with this cbuffer thing; I really want to hear about your experience first, guys.
Otherwise, I've adopted the "StateGroup" technique and it fits really nicely with everything else so far. I'm now able to cache those StateGroups and reuse them between draw calls.


EDIT:

My concept will not scale well if an object has multiple materials, which isn't that rare (well, it depends... like everything in this world).

Edited by imoogiBG

Or on D3D12, I have a single massive "constant-ring" for streaming per-frame/dynamic constants to the GPU. The back-end copies the user's constants into this ring-buffer every frame, and they're overwritten again next frame.

 

I'll add that this is really handy. Even if for some awkward reason it runs slower than plain glUniforms, it's so much easier to manage. Instead of tying together a bunch of different glUniformBleh calls for different types, juggling the names and indices, and dealing with the fact that they work per shader, with UBOs you get a fat buffer, write bytes into it, and just index into it inside the shader.
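The "fat buffer, write bytes" side can be sketched as a small byte writer. This assumes the GLSL block is declared with `layout(std140)`, where a `float` is 4-byte aligned and a `vec4` or each `mat4` column is 16-byte aligned; only those three non-array types are handled, and the class is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

class Std140Writer
{
public:
    // Each function returns the byte offset the value was written at.
    std::size_t writeFloat(float f) { return write(&f, sizeof f, 4); }
    std::size_t writeVec4(const float (&v)[4]) { return write(v, sizeof v, 16); }
    // Expects column-major data: 4 vec4 columns, each 16-byte aligned.
    std::size_t writeMat4(const float (&m)[16]) { return write(m, sizeof m, 16); }

    // Contents to hand to glBufferData/glBufferSubData.
    const std::vector<unsigned char>& bytes() const { return m_bytes; }

private:
    std::size_t write(const void* src, std::size_t size, std::size_t alignment)
    {
        const std::size_t offset = (m_bytes.size() + alignment - 1) / alignment * alignment;
        m_bytes.resize(offset + size);
        std::memcpy(m_bytes.data() + offset, src, size);
        return offset;
    }

    std::vector<unsigned char> m_bytes;
};
```

Writing a float, a vec4, another float and a mat4 in that order lands them at offsets 0, 16, 32 and 48, matching what std140 expects on the shader side.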

 

Quoting that nVidia article: 

 


  1. Don’t update a subset of a larger constant buffer, as this increases the accumulated memory size more than necessary, and will cause the driver to reach the renaming limit more quickly. This piece of advice doesn’t apply when you are using the DX11.1 features that allow for partial constant buffer updates.

Partial updates have been possible in GL since UBOs were introduced, afaik (3.1, although I'm currently using them with GL 3.3). So it's a feature you can take advantage of on a lot of hardware (+/- driver bugs, as always :P ).
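A sketch of one way to use partial updates: track the smallest byte range that changed and upload only that, e.g. with `glBufferSubData(GL_UNIFORM_BUFFER, offset, size, data)`. To keep it self-contained, the upload is a callback rather than a real GL call; `DirtyRange` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

struct DirtyRange
{
    std::size_t begin = 0;
    std::size_t end = 0; // the range is empty when begin == end

    // Grows the dirty range to cover [offset, offset + size).
    void mark(std::size_t offset, std::size_t size)
    {
        if (begin == end) { begin = offset; end = offset + size; }
        else { begin = std::min(begin, offset); end = std::max(end, offset + size); }
    }

    // Uploads [begin, end) through 'upload' (e.g. a wrapper around glBufferSubData),
    // then resets the range.
    void flush(const std::function<void(std::size_t offset, std::size_t size)>& upload)
    {
        if (begin != end)
            upload(begin, end - begin);
        begin = end = 0;
    }
};
```

Merging into one range keeps it to a single call per buffer; if the dirty regions are far apart, tracking several ranges may upload less.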

Edited by TheChubu

