Matias Goldberg

Member Since 02 Jul 2006
Offline Last Active Today, 10:09 PM

#5217154 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 17 March 2015 - 03:13 PM


If anyone has lots of questions, you can just compile and try Ogre 2.1, then dissect its source code to see how we're handling it. It's Open Source after all.
Doing what I'm saying is not impossible, otherwise we wouldn't be doing it.

To answer The Chubu's question, glDrawElementsInstancedBaseVertexBaseInstance has THREE key parameters:

  • baseInstance: With this I can send an arbitrary index as I explained, which I can use to index whatever I want from a constant (UBO) or texture (TBO) buffer. I can even perform multiple indirections (retrieve an index from an UBO using the index from baseInstance)
  • baseVertex: With this I can store as many meshes as I want in the same buffer object; and select them individually by providing the offset location to the start of the mesh I want to render. With this, I don't need to alter state at all (unless vertex format changes). The meshes don't even need to be contiguous in memory; they just need to be in the same Buffer Object and aligned to the vertex size.
  • indices: With this I can store as many meshes' index data as I want in the same buffer object, and select them individually by providing the offset location to the start of the index data. Remember to keep alignment to 4 bytes. Bonus points: You can keep the vertex and index data in the same buffer object.


The DX11 equivalent of this is DrawIndexedInstanced, and the analogous parameters are StartInstanceLocation, BaseVertexLocation & StartIndexLocation respectively.

We treat all of our draws with these functions.
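
To make the three parameters concrete, here is a minimal CPU-side sketch of the bookkeeping. It is not Ogre's code: `MeshPool`, `MeshRange`, `DrawArgs` and `makeDraw` are made-up names, 32-bit indices are assumed, and nothing here touches GL or D3D11. It only computes the values you would then pass to glDrawElementsInstancedBaseVertexBaseInstance or DrawIndexedInstanced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical argument bundle; the comments name the matching GL / D3D11 parameter.
struct DrawArgs {
    uint32_t indexCount;       // count / IndexCountPerInstance
    uint32_t instanceCount;    // instancecount / InstanceCount
    size_t   indexByteOffset;  // GL: byte offset passed as the "indices" argument
    uint32_t startIndex;       // D3D11: StartIndexLocation (counted in indices)
    int32_t  baseVertex;       // basevertex / BaseVertexLocation
    uint32_t baseInstance;     // baseinstance / StartInstanceLocation
};

// Where one mesh lives inside the shared vertex and index buffers.
struct MeshRange {
    uint32_t firstVertex;
    uint32_t firstIndex;
    uint32_t indexCount;
};

// Hypothetical linear allocator over one vertex buffer and one index buffer.
class MeshPool {
public:
    MeshRange add(uint32_t vertexCount, uint32_t indexCount) {
        MeshRange r{nextVertex_, nextIndex_, indexCount};
        nextVertex_ += vertexCount;
        nextIndex_ += indexCount;
        return r;
    }
private:
    uint32_t nextVertex_ = 0;
    uint32_t nextIndex_ = 0;
};

// Builds the arguments for one mesh. The arbitrary per-draw index rides in baseInstance.
inline DrawArgs makeDraw(const MeshRange& mesh, uint32_t drawId,
                         uint32_t instanceCount = 1) {
    assert(instanceCount >= 1);
    DrawArgs a;
    a.indexCount      = mesh.indexCount;
    a.instanceCount   = instanceCount;
    a.startIndex      = mesh.firstIndex;
    // Assumption: 32-bit indices, so every offset is automatically 4-byte aligned.
    a.indexByteOffset = size_t(mesh.firstIndex) * sizeof(uint32_t);
    a.baseVertex      = int32_t(mesh.firstVertex);
    a.baseInstance    = drawId;
    return a;
}
```

Switching from one mesh to another in the same pool only changes these numbers, which is why no state has to be touched between draws.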


The DX11 function works on DX10 hardware just fine. glDrawElementsInstancedBaseVertexBaseInstance was introduced in GL 4.2; however it is available to GL3 hardware via extension. The most notable remark is that OS X doesn't support this extension, at the time of writing.


The end result is that we just map the buffer(s) once; write all the data in sequence; bind these buffers and then issue a lot of consecutive glDrawElementsInstancedBaseVertexBaseInstance / DrawIndexedInstanced calls without any other API calls in between.

We only need to perform additional API calls when:

  • We need to bind a different buffer / buffer section (i.e. we've exhausted the 64kb limit)
  • We need to change state (shaders, vertex format, blending modes, rasterizer states; we keep them sorted to reduce this)
  • We're using more than one mesh pool (pool = a buffer where we store all our meshes together), and the next mesh is stored in another pool (we sort by pools though, in order to reduce this switching).
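
A rough way to see how few extra calls this leaves is to walk an already sorted queue and count them. The sketch below is an illustration under assumptions, not our actual submission loop: `QueuedDraw`, `SubmitStats` and `countApiCalls` are invented names, and the per-buffer draw capacity is passed in as a parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-draw record; field names are illustrative only.
struct QueuedDraw {
    uint32_t stateId;  // shaders, vertex format, blending and rasterizer state
    uint32_t poolId;   // mesh pool that holds this draw's geometry
};

struct SubmitStats {
    uint32_t drawCalls = 0;
    uint32_t stateChanges = 0;
    uint32_t poolBinds = 0;
    uint32_t constBufferBinds = 0;
};

// Walks a sorted queue and counts the API calls needed besides the draws themselves.
inline SubmitStats countApiCalls(const std::vector<QueuedDraw>& draws,
                                 uint32_t maxDrawsPerConstBuffer) {
    assert(maxDrawsPerConstBuffer > 0);
    SubmitStats s;
    uint32_t usedSlots = maxDrawsPerConstBuffer;  // forces a bind before the first draw
    bool first = true;
    QueuedDraw prev{};
    for (const QueuedDraw& d : draws) {
        if (first || d.stateId != prev.stateId) ++s.stateChanges;
        if (first || d.poolId != prev.poolId) ++s.poolBinds;
        if (usedSlots == maxDrawsPerConstBuffer) {  // buffer section exhausted
            ++s.constBufferBinds;
            usedSlots = 0;
        }
        ++usedSlots;
        ++s.drawCalls;
        prev = d;
        first = false;
    }
    return s;
}
```

With the queue sorted by state and then by pool, the three counters stay small compared to the number of draws.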

#5216918 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 16 March 2015 - 03:01 PM

1 is a valid value for the instance count.

Of course but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. No sense doing it if you can only index one thing (ie, you end up what I am doing, one glDraw and glUniform1i call per mesh drawn).

Doing this for just one instance is completely valid. If you do it the way you said, although valid, your API overhead will go through the roof, especially if you have a lot of different meshes.

#5216917 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 16 March 2015 - 02:54 PM

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws?


IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

That's one way of doing it, and doing it that way, then you're correct. We don't use D3D11.1 functionality, though since OpenGL does support setting constant buffers by offsets, we take advantage of that to further reduce how often a batch of draw calls has to be split.

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

By treating all your draws as instanced draws (even if they're just one instance) and using StartInstanceLocation.
I attach a "drawId" R32_UINT vertex buffer (instanced buffer) which is filled with 0, 1, 2, 3, 4 ... 4095 (basically, we can't batch more than 4096 draws together in the same call; that limit is not arbitrary: 4096 vectors * 4 floats per vector * 4 bytes per float = 64kb; aka the const buffer limit).
Hence the "drawId" vertex attribute will always contain the value I want as long as it is in range [0; 4096) and thus index whatever I want correctly.
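
The arithmetic and the fetch can be emulated on the CPU. This is only a sketch with made-up helper names (`makeDrawIdBuffer`, `fetchDrawId`); it assumes a per-instance step rate of 1 and one float4 of per-draw data, which is what the 4096 figure implies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

constexpr size_t kConstBufferBytes = 64 * 1024;          // 64kb const buffer limit
constexpr size_t kBytesPerVector   = 4 * sizeof(float);  // one float4 = 16 bytes
constexpr size_t kMaxDrawsPerBatch = kConstBufferBytes / kBytesPerVector;
static_assert(kMaxDrawsPerBatch == 4096, "64kb / 16 bytes per float4");

// Contents of the instanced R32_UINT vertex buffer: 0, 1, 2, ..., 4095.
inline std::vector<uint32_t> makeDrawIdBuffer() {
    std::vector<uint32_t> ids(kMaxDrawsPerBatch);
    std::iota(ids.begin(), ids.end(), 0u);
    return ids;
}

// Emulates the per-instance attribute fetch: the base/start instance offsets
// the fetch, even though the shader's own instance ID still starts at zero.
inline uint32_t fetchDrawId(const std::vector<uint32_t>& ids,
                            uint32_t baseInstance, uint32_t instanceId) {
    assert(baseInstance + instanceId < ids.size());
    return ids[baseInstance + instanceId];
}
```

So a single-instance draw issued with baseInstance = 37 sees drawId = 37 in the vertex shader and can index the constant buffer with it.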

This is the most compatible way of doing it which works with both D3D11 and OpenGL. There is a GL4 extension that exposes the keywords gl_DrawIDARB & gl_BaseInstanceARB which allows me to do the same without having to use an instanced vertex buffer (thus gaining some performance bits in memory fetching; though I don't know if it's noticeable since the vertex buffer is really small and doesn't consume much bandwidth; also the 4096 draws per call limit can be lifted thanks to this extension).

#5216326 ConstantBuffer is leaking memory after release

Posted by Matias Goldberg on 13 March 2015 - 02:23 PM

While chasing a completely unrelated bug in my code, I just ran by chance into this helpful debug message from the D3D11 runtime that is relevant to your topic:

Objects with Refcount=0 and IntRef=0 will be eventually destroyed through typical Immediate Context usage. However, if the application requires these objects to be destroyed sooner, ClearState followed by Flush on the Immediate Context will realize their destruction.

#5216265 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 13 March 2015 - 07:28 AM

I use the baseInstance parameter from glDraw*BaseVertexBaseInstance. gl_InstanceID will still be zero based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra GLSL variable with the value of baseInstance).

#5216192 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 12 March 2015 - 08:12 PM

- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

You should already be doing that on modern D3D11/GL.
In Ogre 2.1 we use 4 buffer slots:

  1. One for per-pass data
  2. One to store all materials (up to 273 materials per buffer due to the 64kb per const buffer restriction)
  3. One to store per-draw data
  4. One tbuffer to store per-draw data (similar to 3. but it's a tbuffer which stores more data where not having the 64kb restriction is handy)

We rarely change what is bound to any of those slots, not even for the per-draw parameters.

The only times we need to rebind buffers are when:

  1. We've exceeded one of the per-draw buffers size (so we bind a new empty buffer)
  2. We are in a different pass (we need another per-pass buffer)
  3. We have more than 273 materials overall and the previous draw referenced material #0 while the current one references material #280 (so we need to switch the material buffer)
  4. We change to a shader that doesn't use these bindings (very rare).

Point 2 happens very infrequently. Points 3 & 4 can be minimized by sorting by state in a RenderQueue. Point 1 happens very infrequently too, and if you're on GCN the 64kb limit gets upgraded to a 2GB limit, which means you wouldn't need to switch at all (and that also solves point 3 entirely).
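
Point 3 boils down to integer division. The sketch below is hedged: `locateMaterial`, `needsMaterialRebind` and `materialsPerBuffer` are invented names, and the 240-byte material size used in the usage note is only an assumed value that happens to reproduce the 273 figure, not a statement about Ogre's actual material layout.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMaterialsPerBuffer = 273;  // figure quoted above

struct MaterialLocation {
    uint32_t buffer;  // which 64kb material buffer must be bound
    uint32_t slot;    // index of the material inside that buffer
};

inline MaterialLocation locateMaterial(uint32_t materialId) {
    return {materialId / kMaterialsPerBuffer, materialId % kMaterialsPerBuffer};
}

// True when drawing `next` right after `prev` requires switching the material buffer.
inline bool needsMaterialRebind(uint32_t prev, uint32_t next) {
    return locateMaterial(prev).buffer != locateMaterial(next).buffer;
}

// How many fixed-size materials fit in one buffer (sizes are caller-supplied assumptions).
inline uint32_t materialsPerBuffer(uint32_t bufferBytes, uint32_t materialBytes) {
    assert(materialBytes > 0);
    return bufferBytes / materialBytes;
}
```

For example, `needsMaterialRebind(0, 280)` is true because material #280 lands in the second buffer, while `materialsPerBuffer(65536, 240)` gives 273 for the assumed 240-byte material.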

The bindings as a whole don't really change often, and this property can already be exploited using DX11 and GL4. DX12/Vulkan just makes the interface thinner; that's all.

#5216190 Forward+ vs Deferred rendering

Posted by Matias Goldberg on 12 March 2015 - 07:45 PM

MJP failed to mention his awesome post.


As a comparison, Deferred vs Forward+:

  • Forward+ plays nice with multiple BRDFs, transparency, and MSAA (antialiasing). Deferred doesn't. Lots of hacks are needed that either sacrifice quality, a lot of performance, or just resort to forward to do some of that stuff (i.e. fallback to regular forward for transparents)
  • Deferred uses A LOT of bandwidth. It doesn't scale well with screen resolution. A lot of research has been put into compressing the deferred data (Albedo, G-Buffers, Depth, material properties); but it's far from being a solved problem. Forward+ doesn't have this problem.
  • Forward+ needs a Z-prepass which puts pressure on the vertex shader (and draw call overhead depending on which API is used). Like MJP said, it can be skipped, but you'll be sacrificing pixel shader efficiency. Sometimes the Z-Prepass is a win if the BRDF is heavy. But it can also be a loss.
  • Forward+ requires atomic counters for implementing efficiently on compute shaders; which makes DX11 level hardware a requirement. Deferred can be implemented to run on DX10 HW just fine.
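
To put the bandwidth point in numbers, here is a back-of-the-envelope sketch. The layout is purely an assumption for illustration (the usage note picks four 32-bit color targets plus a 32-bit depth buffer); real G-Buffer layouts vary a lot, and `GBufferLayout` / `gbufferBytes` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a G-Buffer; every value is supplied by the caller.
struct GBufferLayout {
    uint32_t colorTargets;
    uint32_t bytesPerColorTexel;
    uint32_t bytesPerDepthTexel;
};

// Bytes touched by writing the whole G-Buffer once (reading it back for
// lighting costs the same amount again).
inline uint64_t gbufferBytes(uint32_t width, uint32_t height,
                             const GBufferLayout& layout,
                             uint32_t msaaSamples = 1) {
    assert(msaaSamples >= 1);
    const uint64_t perSample =
        uint64_t(layout.colorTargets) * layout.bytesPerColorTexel +
        layout.bytesPerDepthTexel;
    return uint64_t(width) * height * msaaSamples * perSample;
}
```

With the assumed layout `{4, 4, 4}` that is 20 bytes per sample: 1920x1080 costs 41,472,000 bytes per write pass, 3840x2160 costs four times that, and 4x MSAA multiplies it by four again.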

#5215564 Vulkan is Next-Gen OpenGL

Posted by Matias Goldberg on 09 March 2015 - 08:51 PM

Gpu hang are properly spotted in Windows and Linux, you can try using arb bind less texture and over filling memory/passing half handle or wrong handle to shader. in Windows the display server is resetted, in Linux Xorg is killed.

I've had more than two dozen experiences while doing very advanced shader stuff in the last two weeks that prove otherwise, where I had to hit the reset button because not even Caps Lock worked. Complete system hang, and the OSes didn't notice.
TDR and killing XOrg are last-resort fail-safe mechanisms, not your everyday exception handlers. They're not guaranteed to work. For simple stuff, they do.
But it's not always possible. A few months ago I managed to lock up the bus (a hardware design flaw in one of the motherboards I tested with) while doing something terribly wrong with AZDO code (I was asked not to divulge the specific details when I reported the problem. Don't know what happened next).
TDR can't recover from something like that. On Linux, killing XOrg won't help if the video driver is hung on a deadlock or stuck waiting for the GPU to finish its infinite loop.

That's the sort of stuff that can happen when the bubble D3D11 (and other old APIs) puts you in gets broken.
The best possible solution is to keep improving upon GPU virtualization and a more stable hardware interface (dreaming that some day we won't need APIs to control the video card) so that all GPU exceptions can get easily caught.

#5215051 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 06 March 2015 - 05:53 PM

There is something I don't really understand in Vulkan/DX12, it's the "descriptor" object. Apparently it acts as a gpu readable data chunk that hold texture pointer/size/layout and sampler info, but I don't understand the descriptor set/pool concept work, this sounds a lot like array of bindless texture handle to me.

Without going into detail: it's because only AMD & NVIDIA cards support bindless textures in their hardware; there's one major Desktop vendor that doesn't support it even though it's DX11 HW. Also keep in mind both Vulkan & DX12 want to support mobile hardware as well.
You will have to give the API a table of textures based on frequency of updates: One blob of textures for those that change per material, one blob of textures for those that rarely change (e.g. environment maps), and another blob of textures that don't change (e.g. shadow maps).
It's very analogous to how we have been doing constant buffers with shaders (provide different buffers based on frequency of update).
And you put those blobs into a bigger blob and tell the API "I want to render with this big blob which is a collection of blobs of textures"; so the API can translate this very well to all sorts of hardware (mobile, Intel on desktop, and bindless like AMD's and NVIDIA's).

If all hardware were bindless, this set/pool wouldn't be needed because you could change one texture anywhere with minimal GPU overhead like you do in OpenGL4 with bindless texture extensions.
Nonetheless this descriptor pool set is also useful for non-texture stuff, (e.g. anything that requires binding, like constant buffers). It is quite generic.

#5215030 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 06 March 2015 - 03:50 PM

I feel like to fully support these APIs I need to almost abandon the previous APIs support in my engine since the veil is so much thinner, otherwise I'll just end up adding the same amount of abstraction that DX11 does already, kind of defeating the point.

But it depends. For example if you were doing AZDO OpenGL, many of the concepts will already be familiar to you.
However, for example, AZDO never dealt with textures as thin as Vulkan or D3D12 do so you'll need to refactor those.
If you weren't following AZDO, then it's highly likely that the way you were using the old APIs is incompatible with the new ways.

Actually there are way to do kindof multithreading in OpenGL 4 : (...). There is also glBufferStorage + IndirectDraw which allows you to access a buffer of instanced data that can be written like any others buffer, eg concurrently.
But it's not as powerful as what Vulkan or DX12 which allow to issue any command and not just instanced ones.

Actually DX12 & Vulkan are following exactly the same path glBufferStorage + IndirectDraw did. It just got easier, was made thinner, and can now handle other misc aspects from within multiple cores (texture binding, shader compilation, barrier preparation, etc).

The rest was covered by Promit's excellent post.

#5214737 Vulkan is Next-Gen OpenGL

Posted by Matias Goldberg on 05 March 2015 - 08:36 AM

subjecting yourself to the tortures that OpenGL driver writers had to endure for so long (and still will unless they got promoted).
The OpenGL API is significantly flawed, which is specifically why these kinds of major upgrades have been requested for so long(’s Peak).


That might be fun as a pet project but otherwise I don’t see the point(...)

IMO the point is that instead of having one GL implementation per vendor; we could have just one running on top of Vulkan. So if it doesn't work in my machine due to an implementation bug, I can at least be 90% certain it won't work in your machine either.
In principle it's no different from ANGLE which translates GL calls and shaders into DX9.
However ANGLE is limited to ES2/WebGL-like functionality and DX9 is a high level API with high overhead; while running on top of Vulkan could deliver very acceptable performance and support the latest GL functionality.

#5214490 Vulkan is Next-Gen OpenGL

Posted by Matias Goldberg on 04 March 2015 - 08:57 AM

THIS. A lot of people don't seem to get these are very low level APIs with a focus on raw memory manipulation and baking of objects/commands that are needed very frequently. You destroyed a texture while it was still in use?

Come on, time has changed. Current game engines uses multithreading and multithreading is one of the best ways to kill your game project, still people are able to code games

It's not really the same. Multithreading problems can be debugged and there's a lot of literature and tools to understand them.
It's much harder to debug a problem that locks up your entire system every time you try to analyze it.

I'm currently at the state of handling many things by buffers and in the application itself and that with OGL2.1 (allocate buffer, manage double/triple buffering yourself, handling buffer sync yourself etc.). Most likely I use only a few % of the API at all. I think that a modern OGL architecture (AZDO, using buffers everywhere including UBOs etc) will be close to what you could expect from vulkan and that if they expose some vulkan features as extensions (command buffer), then switching over to vulkan will not be a pain in the ass.

If you're already doing AZDO with explicit synchronization then you will find these new APIs pleasing indeed. However there are breaking changes like how textures are being loaded and bound. Since there's no hazard tracking, you can't issue a draw call that uses a texture until it is actually in GPU memory. Drivers were also handling residency for you, but since now they don't, out-of-GPU-memory errors can be much more common unless you write your own residency solution. Also how textures are bound is going to change.
Then, in the case of D3D12, there are PSOs, which fortunately you should already be emulating for forward compatibility.

Indeed, professional developers won't have many problems; whatever annoyance they may have is obliterated by the happiness from the performance gains. I'm talking from a rookie perspective.

#5214486 Litterature about GPU architecture ?

Posted by Matias Goldberg on 04 March 2015 - 08:39 AM

Perhaps this is a bit of shameless self-promotion, but I talked a bit about memory operations on modern hardware; it may be of interest to you.

They're a bit outdated, but the ATI Radeon 2000 programming guide and Depth In Depth from Emil Persson explain a lot of background concepts that are still relevant today (Hi Z, Z Compression, Early Z, Fast Z Clear, dynamic branching and divergence).
Seeing his two recent talks for modern archs is also useful to find the differences.

#5214367 Vulkan is Next-Gen OpenGL

Posted by Matias Goldberg on 03 March 2015 - 11:03 PM

Remember, Vulkan is going to be a huge pain in the ass compared to GL. The Vulkan API is _much_ cleaner, yes, but it also eschews all the hand-holding and conveniences of GL and forces you to manage all kinds of hardware state and resource migration manually. Vulkan does not _replace_ OpenGL; it simply provides yet another alternative.

The same is true in Microsoft land: D3D11.3 is being released alongside D3D12, bringing the new hardware features to the older API because the newer API is significantly more complicated to use due to the greatly thinner abstractions; it's expected that the average non-AAA developer will want to stick with the older, easier APIs.

THIS. A lot of people don't seem to get these are very low level APIs with a focus on raw memory manipulation and baking of objects/commands that are needed very frequently. You destroyed a texture while it was still in use? BAM! Graphics corruption (or worse, BSOD). You wrote to a constant buffer while it was still in use? Let the random jumping of objects begin! You manipulated the driver buffers and had an off-by-1 error? BAM! Crash or BSOD. Your shader has a loop and is reading the count from uninitialized memory? BAM! TDR kicks in or system becomes highly unresponsive.
You need to change certain states more frequently than you thought? Too bad, turns out you need to make some architectural modifications to do what you want efficiently.
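
One common way to avoid the "destroyed while still in use" case is to defer destruction until the GPU has finished the last frame that referenced the resource. This is a generic pattern, not something these APIs hand you, and the sketch is only that: `DeferredDestroyQueue`, `ResourceId` and the frame counters are hypothetical, and how you learn the completed frame (a fence, typically) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

using ResourceId = uint32_t;  // hypothetical handle type

class DeferredDestroyQueue {
public:
    // Call this instead of destroying immediately. lastUseFrame is the last
    // frame whose commands reference the resource; calls must arrive in
    // non-decreasing frame order.
    void retire(ResourceId id, uint64_t lastUseFrame) {
        assert(pending_.empty() || pending_.back().frame <= lastUseFrame);
        pending_.push_back({id, lastUseFrame});
    }

    // gpuCompletedFrame is the newest frame the GPU has fully finished.
    // Returns the resources that are now safe to destroy for real.
    std::vector<ResourceId> collect(uint64_t gpuCompletedFrame) {
        std::vector<ResourceId> safe;
        while (!pending_.empty() && pending_.front().frame <= gpuCompletedFrame) {
            safe.push_back(pending_.front().id);
            pending_.pop_front();
        }
        return safe;
    }

    size_t pendingCount() const { return pending_.size(); }

private:
    struct Entry {
        ResourceId id;
        uint64_t frame;
    };
    std::deque<Entry> pending_;
};
```

The same idea applies to constant buffers: don't overwrite a region until the frame that read it has completed.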

It's hard. But I love it; with great power comes great responsibility. None of this is a show-stopper for people used to low level programming. But it is certainly not newbie friendly like D3D11 or GL were (if you considered those newbie friendly). Anyway, a lot of people learned hardcore programming back in the DOS days when it was a wild west. So maybe this is a good thing.

#5213319 Render Queue Design

Posted by Matias Goldberg on 27 February 2015 - 08:41 AM

You seem to be missing the base theory on which L. Spiro built his posts/improvements.

The article Order your draw calls around from 2008 should shed light on your questions.
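
The core idea there is to pack everything you want to sort by into one integer key, with the most expensive state change in the most significant bits, then sort the draws by that key. The sketch below does not reproduce the article's layout; the field widths and names (`makeSortKey`, `KeyedDraw`, `sortDraws`) are assumptions chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit key layout, most significant field first:
// [ pass : 8 ][ shader : 16 ][ pool : 8 ][ material : 16 ][ depth : 16 ]
inline uint64_t makeSortKey(uint32_t pass, uint32_t shader, uint32_t pool,
                            uint32_t material, uint32_t depth) {
    assert(pass < (1u << 8) && shader < (1u << 16) && pool < (1u << 8));
    assert(material < (1u << 16) && depth < (1u << 16));
    return (uint64_t(pass) << 56) | (uint64_t(shader) << 40) |
           (uint64_t(pool) << 32) | (uint64_t(material) << 16) |
           uint64_t(depth);
}

struct KeyedDraw {
    uint64_t key;
    uint32_t drawIndex;  // index into whatever array holds the real draw data
};

// Stable sort keeps submission order for draws with identical keys.
inline void sortDraws(std::vector<KeyedDraw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const KeyedDraw& a, const KeyedDraw& b) {
                         return a.key < b.key;
                     });
}
```

After sorting, draws that share a shader end up adjacent, and within a shader those that share a mesh pool, so state changes happen as rarely as the key layout allows.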