Porting OpenGL to Direct3D 11: How to handle Input Layouts?


Hi guys!

 

These days I'm trying to add support for Direct3D 11 to my rendering engine, which currently supports OpenGL 3.3 upwards. While writing the abstraction I hit a bit of a roadblock: Input Layouts. To my knowledge, in Direct3D 11 you have to define an Input Layout per shader (by providing shader bytecode), whereas in OpenGL you make glVertexAttribPointer calls for each attribute.

Currently I am using VAOs and store the attribute locations in them by calling glVertexAttribPointer after binding the buffers, like so:

glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);
glGenBuffers(1, &IBO);

glBindVertexArray(VAO);

glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, Vertices.size() * sizeof(Vertex), &Vertices[0], GL_STATIC_DRAW);

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, IBO);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, Indices.size() * sizeof(GLuint), &Indices[0], GL_STATIC_DRAW);

// then do attribpointer calls before unbinding VAO

glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (GLvoid*)0);

......

glBindVertexArray(0);

But since doing this per-VAO won't work with Direct3D, I have come up with the following:

Call glVertexAttribPointer every frame for each shader that uses a particular vertex layout (similar to calling IASetInputLayout).
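For reference, the Direct3D 11 side looks roughly like this (illustrative code, not from the engine; device, context, vsBytecode, Vertex, vertexBuffer and indexBuffer are assumed names, and the element list assumes a vertex of float3 position, float3 normal, float2 uv):

D3D11_INPUT_ELEMENT_DESC elements[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

// The layout is created once and validated against vertex shader bytecode.
ID3D11InputLayout* inputLayout = nullptr;
device->CreateInputLayout(elements, _countof(elements),
                          vsBytecode->GetBufferPointer(), vsBytecode->GetBufferSize(),
                          &inputLayout);

// At draw time the layout and the buffers are bound separately; stride and offset
// are supplied at the buffer binding, not in the layout.
UINT stride = sizeof(Vertex), offset = 0;
context->IASetInputLayout(inputLayout);
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
context->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);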

 

Question 1:

Is this a good idea? Will calling glVertexAttribPointer so often affect performance? How do you guys handle this in your engines?

Question 2 (bonus :D):

Should I use vertex buffers and index buffers in my OpenGL implementation without VAOs (since Direct3D does not have such a thing), or should I try to emulate VAOs in Direct3D somehow (an array of buffers? That seems awful)?

I know it's a lot of questions, but it's been driving me nuts. I really want to hear your thoughts. Thanks in advance, guys!


While writing the abstraction I hit a bit of a roadblock: Input Layouts. To my knowledge, in Direct3D 11 you have to define an Input Layout per shader (by providing shader bytecode), whereas in OpenGL you make glVertexAttribPointer calls for each attribute.

It's not per-shader, but per vertex shader input structure. If two shaders use the same vertex structure as their input, they can share an Input Layout. The bytecode parameter when creating an IL is actually only used to extract the shader's vertex input structure and pair it up with the attributes described in your descriptor.
In my engine, I never actually pass any real shaders into that function -- I compile dummy code for each of my HLSL vertex structures which is only used during IL creation -- e.g. given some structure definitions:
StreamFormat("colored2LightmappedStream",  -- VBO attribute layouts
{
	[VertexStream(0)] = 
	{
		{ Float, 3, Position },
	},
	[VertexStream(1)] = 
	{
		{ Float, 3, Normal },
		{ Float, 3, Tangent },
		{ Float, 2, TexCoord, 0 },
		{ Float, 2, TexCoord, 1, "Unique_UVs" },
		{ Float, 4, Color, 0, "Vertex_Color" },
		{ Float, 4, Color, 1, "Vertex_Color_Mat" },
	},
})
VertexFormat("colored2LightmappedVertex",  -- VS input structure
{
	{ "position",  float3, Position },
	{ "color",	   float4, Color, 0 },
	{ "color2",    float4, Color, 1 },
	{ "texcoord",  float2, TexCoord, 0 },
	{ "texcoord2", float2, TexCoord, 1 },
	{ "normal",    float3, Normal },
	{ "tangent",   float3, Tangent },
})
StreamFormat("basicPostStream",  -- VBO attribute layouts
{
	[VertexStream(0)] = 
	{
		{ Float, 2, Position },
		{ Float, 2, TexCoord },
	},
})
VertexFormat("basicPostVertex",  -- VS input structure
{
	{ "position", float2, Position },
	{ "texcoord", float2, TexCoord },
})
This HLSL file is automatically generated and then compiled by my engine's toolchain, to be used as the bytecode when creating IL objects:
/*[FX]
Pass( 0, 'test_basicPostVertex', {
	vertexShader = 'vs_test_basicPostVertex';
	vertexLayout = 'basicPostVertex';
})*/
float4 vs_test_basicPostVertex( basicPostVertex inputs ) : SV_POSITION
{
	float4 hax = (float4)0;
	hax += (float4)(float)inputs.position;
	hax += (float4)(float)inputs.texcoord;
	return hax;
}
/*[FX]
Pass( 1, 'test_colored2LightmappedVertex', {
	vertexShader = 'vs_test_colored2LightmappedVertex';
	vertexLayout = 'colored2LightmappedVertex';
})*/
float4 vs_test_colored2LightmappedVertex( colored2LightmappedVertex inputs ) : SV_POSITION
{
	float4 hax = (float4)0;
	hax += (float4)(float)inputs.position;
	hax += (float4)(float)inputs.color;
	hax += (float4)(float)inputs.color2;
	hax += (float4)(float)inputs.texcoord;
	hax += (float4)(float)inputs.texcoord2;
	hax += (float4)(float)inputs.normal;
	hax += (float4)(float)inputs.tangent;
	return hax;
}

Won't claim this is the best/only way of doing this, but I define a "Geometry Input" object that is more or less equivalent to a VAO. It holds a vertex format and the buffers that are bound all together in one bundle. The vertex format is defined identically to D3D11_INPUT_ELEMENT_DESC in an array. In GL, this pretty much just maps onto a VAO. (It also virtualizes neatly to devices that don't have working implementations of VAO. Sadly they do exist.) In D3D, it holds an input layout plus a bunch of buffer references and the metadata for how they're bound to the pipeline.

The only problem with that is that an IL is a glue/translation object between a "Geometry Input" and a VS input structure -- it doesn't just describe the layout of your geometry/attributes in memory, but also describes the order that they appear in the vertex shader. In the general case, you can have many different "Geometry Input" data layouts that are compatible with a single VS input structure -- and many VS input structures that are compatible with a single "Geometry Input" data layout.
i.e. in general, it's a many-to-many relationship between the layouts of your buffered attributes in memory, and the structure that's declared in the VS.
 
In my engine:
* the "geometry input" object contains an "attribute layout" object handle, which describes how the different attributes are laid out within the buffer objects.
* the "shader program" object contains a "vertex layout" object handle, which describes which attributes are consumed and in what order.
* When you create a draw-item (which requires specifying both a "geometry input" and a "shader program"), a compatible D3D IL object is fetched from a 2D table, indexed by the "attribute layout" ID and the "vertex layout" ID (a rough sketch of this lookup follows below).
* This table is generated ahead of time by the toolchain, by inspecting all of the attribute layouts and vertex layouts that have been declared and creating input layout descriptors for all the compatible pairs.
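A rough sketch of that 2D-table lookup (the type and member names below are illustrative, not Hodgman's actual code, and the table is assumed to be filled offline by the toolchain):

#include <cstdint>
#include <vector>
#include <d3d11.h>

struct InputLayoutTable
{
    uint32_t numAttributeLayouts = 0;
    uint32_t numVertexLayouts = 0;
    // numAttributeLayouts * numVertexLayouts entries; nullptr marks an incompatible pair.
    std::vector<ID3D11InputLayout*> table;

    ID3D11InputLayout* lookup(uint32_t attributeLayoutId, uint32_t vertexLayoutId) const
    {
        return table[attributeLayoutId * numVertexLayouts + vertexLayoutId];
    }
};

// At draw-item creation time, something like:
//   ID3D11InputLayout* il = g_layoutTable.lookup(geometryInput.attributeLayoutId,
//                                                shaderProgram.vertexLayoutId);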


For our relatively simple purposes and limited formats, we can simply standardize attribute ordering. In general I find the D3D 11 shader bytecode thing to be an unwelcome extra step, though maybe the wider hardware selection forces it? It seems like it's just a validation step I don't want though. You have the order as a result of the input element desc array anyway, so it's easy to generate a shader just to get through the validation. Metal simply has the shader fetch from the buffer, and it's your own responsibility to make sure that what the shader is fetching and what the buffer is offering match. That's way cleaner for me.
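Since the input element desc array already fixes the attribute order and formats, generating that throwaway shader can be done mechanically. A minimal sketch of the idea (not actual engine code; the format-to-HLSL-type mapping and all names here are assumptions):

#include <d3d11.h>
#include <d3dcompiler.h>
#include <string>
#pragma comment(lib, "d3dcompiler.lib")

// Map a DXGI format to an HLSL parameter type; extend for other formats
// (integer formats would need uint4/int4, etc.).
static std::string HlslTypeFor(DXGI_FORMAT fmt)
{
    switch (fmt)
    {
    case DXGI_FORMAT_R32G32_FLOAT:       return "float2";
    case DXGI_FORMAT_R32G32B32_FLOAT:    return "float3";
    case DXGI_FORMAT_R32G32B32A32_FLOAT: return "float4";
    default:                             return "float4";
    }
}

// Build a dummy VS whose input signature matches the element descs, compile it,
// and use the bytecode purely to get through CreateInputLayout's validation.
HRESULT CreateInputLayoutFromDesc(ID3D11Device* device,
                                  const D3D11_INPUT_ELEMENT_DESC* elems, UINT count,
                                  ID3D11InputLayout** outLayout)
{
    std::string src = "float4 main(";
    std::string body = "float4 r = (float4)0; ";
    for (UINT i = 0; i < count; ++i)
    {
        if (i) src += ", ";
        src += HlslTypeFor(elems[i].Format) + " a" + std::to_string(i) + " : " +
               elems[i].SemanticName + std::to_string(elems[i].SemanticIndex);
        body += "r += (float4)(float)a" + std::to_string(i) + "; "; // touch each input so it stays in the signature
    }
    src += ") : SV_POSITION { " + body + "return r; }";

    ID3DBlob* bytecode = nullptr;
    ID3DBlob* errors = nullptr;
    HRESULT hr = D3DCompile(src.c_str(), src.size(), nullptr, nullptr, nullptr,
                            "main", "vs_5_0", 0, 0, &bytecode, &errors);
    if (errors) errors->Release();
    if (FAILED(hr)) return hr;

    hr = device->CreateInputLayout(elems, count,
                                   bytecode->GetBufferPointer(), bytecode->GetBufferSize(),
                                   outLayout);
    bytecode->Release();
    return hr;
}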

 

I'm not sure we're doing anything all that different, though.



though maybe the wider hardware selection forces it? It seems like it's just a validation step I don't want though.


I used to struggle with the concept myself, and everyone I've worked with in the graphics space also finds it a pain. It's especially painful for simpler projects.

It's both required on some hardware and an important optimization on others.

I struggled with this a lot too, and even various pro graphics developers I know rather hate it. Unfortunately, it's not going away - note that in D3D12, the input layout is baked into the PSO and so is even more cumbersome than it is in D3D11. If it helps, mentally tack the word State onto the end of ID3D11InputLayout. It's just like BlendState, RasterizerState, or DepthStencilState in many regards, except of course for the caveat that it binds very directly to the shader code in use.

So, let's explain. The shader is expecting various inputs. Say, it's expecting an input in slot 3 that must be decoded to a float4. Armed with only this information, the input assembler doesn't know what the _source_ format is. Are you supplying floats? Normalized integers? Unnormalized integers? A compressed format? The shader doesn't know and hence the IA can't know just from the bytecode.

The input geometry has its attributes placed in various formats. It knows that a given attribute is in the first buffer and is a 4-component integer that should be normalized. Unfortunately, there's no indication of what to do with that input. Is it even being used? Which IA slot should actually consume it? Is the semantic TEXCOORD0 supposed to go into slot 0, 1, 2, etc.? Is there some further operation being performed on it? What format should it be unpacked into?

Some hardware hence requires the IA to have a block of state defining all these bindings very explicitly. Even if it weren't required, looking at the bytecode and input layout, finding all the common elements, hashing those, and looking those up internally in the driver on every IASetInputLayout or VSSetShader call is difficult to make efficient.

In D3D's model, you have to write a cache yourself, but you can make said cache minimal and efficient. In GL's model, the VAO/shader cache must be magic in the driver and must handle a great deal of generalization that probably doesn't matter to your app. This is just another example of D3D's state-block model being more cumbersome than GL's state-machine model while being more efficient, just as D3D12 and Vulkan are far more cumbersome yet can be far more efficient. Likewise, just as D3D12's power comes at the cost of decreased efficiency if you aren't putting in the right effort, D3D11's model is likely less efficient than GL's _if_ you don't bother with a decent geometry<->shader cache.
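A sketch of what such an app-side cache can look like (illustrative only; unlike the ahead-of-time table described earlier, this variant fills entries lazily the first time a particular pairing is seen):

#include <cstdint>
#include <functional>
#include <unordered_map>
#include <d3d11.h>

struct LayoutKey
{
    uint32_t vertexFormatId; // how the attributes sit in the buffers
    uint32_t vsSignatureId;  // which inputs the vertex shader consumes
    bool operator==(const LayoutKey& o) const
    {
        return vertexFormatId == o.vertexFormatId && vsSignatureId == o.vsSignatureId;
    }
};

struct LayoutKeyHash
{
    size_t operator()(const LayoutKey& k) const
    {
        return std::hash<uint64_t>()((uint64_t(k.vertexFormatId) << 32) | k.vsSignatureId);
    }
};

// The draw path does one cheap hash lookup; ID3D11Device::CreateInputLayout is only
// called on a miss, never per frame.
std::unordered_map<LayoutKey, ID3D11InputLayout*, LayoutKeyHash> g_inputLayoutCache;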
 

These days I'm trying to add support for Direct3D 11 to my rendering engine, which currently supports OpenGL 3.3 upwards. While writing the abstraction I hit a bit of a roadblock: Input Layouts.


There's a decent approach, very similar to what has been mentioned previously, that closely matches what the Big Engines do and what most Small Engines probably _should_ do.

The first thing to realize is that your geometry buffers represent some actual object(s). They have meaning, semantics, and purpose. They can only be rendered in a handful of ways relevant to your engine. At the simplest commercializable end, these ways are possibly just "render to G-Buffer" and "render to depth buffer" (for shadows, z prepass, etc.).

When you define your objects/geometry you then also generally _must_ define this set of rendering methods, often called Materials (though that term has a few different meanings). A Material only needs to work with geometry in a particular input layout because the asset baking pipeline can ensure this will happen (and you can split up bigger and more complex interleaved buffers).

These rendering methods are, roughly, shaders + state (e.g. mostly a PSO in D3D12). That also turns into a great place to put input layouts! In fact in D3D12, it _must_ be where you put the input layouts since those are baked into PSOs.

Your code can thus be structured something like this:
 
// Sketch only; concrete parameter types here are illustrative.
#include <vector>

class Shader;         // compiled VS/PS objects
class Buffer;         // vertex/index buffer wrapper
class InputAttribute; // one element of the vertex layout description

class RenderPass {
public:
  void setShaders(Shader* vs, Shader* ps);
  void setDepthState(bool enabled, int compareOp);
  // ... other pipeline state setters (blend, rasterizer, etc.)
};
class Material {
public:
  void addRenderPass(int pass_id, RenderPass* pass);
  void setLayout(const InputAttribute* attributes, int count); // all passes must be compatible with this layout
};
class MaterialInstance { // this is the object that holds or caches all the compiled things
  Material* material;
};
class Object {
  MaterialInstance* material;
  std::vector<Buffer*> buffers; // matches the layout required by the Material
};
class RenderQueue {
public:
  void draw(const Object& object, int pass_id);
};

The above is easy to extend with static vs. dynamic attributes, which is handy for more advanced graphical effects and architectures.


 


Ok, I see what you're driving at. When you supply the bytecode, there's wiring of what attributes link to what semantics in the shader, and the order of these two need not match in declaration or layout. So the runtime has to generate fetch rules in the IA that tell it how to link things up. Once it's done that, there are specific rules defining what IAs are compatible with what shaders (prefixes, basically). For my purposes, the declaration and memory layout ordering always matches - I forgot this was a potential issue in the first place. That's why I can get away with generating a shader instead of using a real one (though usage-wise it's preferred to supply a real one).

 

On PSO APIs, my 'shader' object encapsulates a total PSO, and all other pieces of the pipeline are required to conform. It works well enough in my smaller setting; in larger engines it'd be tooling-generated, as Hodgman mentioned.


The VAO design is idiotic. What do we have in D3D 11/12, Vulkan, Metal, etc? There's a block of state, which pre-configures the necessary pipeline pieces to do all the operations they need to do. All we do at that point is plug in the pointers to the right buffers, render, change pointers, render. But GL/VAO? No, GL and VAO have to link the actual physical buffer pointers themselves into the state. So now I have a unique VAO per object (or group of objects in the cases that I can share VBOs and offset) and I have to switch between them even when the actual vertex layout itself matches. So now they're making things difficult for both the user AND the driver, AND this intersects poorly with a proper PSO implementation. Ugh. Of course they finally had to go back and fix this with ARB_vertex_attrib_binding, which was standardized for 4.3. Naturally, a number of platforms don't support it.



No, GL and VAO have to link the actual physical buffer pointers themselves into the state. So now I have a unique VAO per object (or group of objects in the cases that I can share VBOs and offset) and I have to switch between them even when the actual vertex layout itself matches. So now they're making things difficult for both the user AND the driver, AND this intersects poorly with a proper PSO implementation. Ugh. Of course they finally had to go back and fix this with ARB_vertex_attrib_binding, which was standardized for 4.3. Naturally, a number of platforms don't support it.


Oh goodness, I totally forgot about that little turd of a detail.

The reason for it is that GL also binds buffer-specific attributes into its "layout" details in the VAO. Note in D3D for instance that you specify the offset and stride when you bind the buffers, not in your input layout / PSO. That said, it's still a bit bewildering that GL didn't let you swap the buffers even when they have the same offset and stride. :/

Did they at least fix/replace VAOs in OpenGL 4.nobody_can_use_this_yet? I feel like they did.

Ah, yeah: https://www.opengl.org/wiki/GLAPI/glBindVertexBuffer / https://www.opengl.org/sdk/docs/man/html/glBindVertexBuffer.xhtml

glBindVertexBuffer modifies the currently bound VAO. So you'd use the VAO as an input layout equivalent and then rebind the buffers as needed before drawing. Those are extensions in 4.3 and core in 4.5, though, so hardware support is likely very hit and miss (and mobile is screwed as usual).
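A hedged sketch of that usage with the 4.3 separated-format API (buffer names, attribute indices, and the interleaved float3 position / float2 texcoord layout are assumptions):

// Describe the "input layout" once: per-attribute format and relative offset,
// plus which binding slot feeds each attribute.
GLuint layoutVao = 0;
glGenVertexArrays(1, &layoutVao);
glBindVertexArray(layoutVao);

glEnableVertexAttribArray(0);
glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, 0);                 // position
glVertexAttribBinding(0, 0);

glEnableVertexAttribArray(1);
glVertexAttribFormat(1, 2, GL_FLOAT, GL_FALSE, sizeof(float) * 3); // texcoord
glVertexAttribBinding(1, 0);

// Per draw: reuse the same VAO as the "layout" and just repoint binding slot 0
// at the right buffer -- offset and stride are given at bind time, as in D3D.
glBindVertexArray(layoutVao);
glBindVertexBuffer(0, meshVbo, 0, sizeof(float) * 5);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, meshIbo);
glDrawElements(GL_TRIANGLES, meshIndexCount, GL_UNSIGNED_INT, nullptr);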


That's the extension I mentioned, ARB_vertex_attrib_binding. You're slightly off on the versions - it's an extension to 4.2, available in 4.3 core. Also in ES 3.1. This means Windows on recent drivers yes, Linux on proprietary drivers yes, Mac no, mobile only cutting edge. Assuming it's implemented correctly...

