What are your opinions on DX12/Vulkan/Mantle?


#1   Members   -  Reputation: 781

0 Likes

Posted 06 March 2015 - 12:33 PM

I'm pretty torn on what to think about it. On one hand, being able to write the implementations for much of the resource management and command processing allows for a lot of gains, and for much better management of rendering across a wide range of hardware. Also, all the threading support is going to be fantastic.

But on the other hand, accepting that burden will vastly increase the cost of maintaining a platform and the amount of time needed to fully port. I understand that a lot of it can be ported piece by piece, but it seems like the amount of time necessary to even reach the performance promised by, say, DX12 is on the order of man-weeks.

I feel like to fully support these APIs I need to almost abandon support for the previous APIs in my engine, since the veil is so much thinner; otherwise I'll just end up adding the same amount of abstraction that DX11 already does, kind of defeating the point.

 

What are your opinions?


Perception is when one imagination clashes with another

#2   Members   -  Reputation: 10561

6 Likes

Posted 06 March 2015 - 01:13 PM


Depending on how you have designed things, the time to port should be pretty low and, importantly, the per-API code for the new Vulkan and DX12 paths will be very thin, as the two look very much the same in terms of API surface and functionality.

Older APIs might be a problem; however, again this depends on your abstraction level. If you have coupled too closely to D3D11's or OpenGL's style of doing things at the front end then yes, it'll cause you pain - but if your abstraction is light enough, you could possibly treat D3D11 and OpenGL as if they were D3D12 and Vulkan from the outside and do the fiddling internally.
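To give a rough idea of what a "light enough" abstraction boundary can look like, here's a sketch (the type names are invented for illustration, not any particular engine's interface): the front end only ever speaks in complete, self-contained draw packets, and each backend decides how literally to translate them.

#include <cstdint>
#include <cstddef>

// Hypothetical names, for illustration only.
using PipelineStateId = uint32_t;  // shaders + blend/depth/raster grouped into one object
using ResourceSetId   = uint32_t;  // textures, buffers and samplers grouped into one set

struct DrawArgs   { uint32_t indexCount, instanceCount, firstIndex, baseVertex; };
struct DrawPacket {               // everything needed for one draw, no hidden state
    PipelineStateId pipeline;
    ResourceSetId   resources;
    DrawArgs        args;
};

struct PipelineDesc;              // filled in by the front end
struct ResourceSetDesc;

class IRenderBackend {
public:
    virtual ~IRenderBackend() = default;
    virtual PipelineStateId createPipeline(const PipelineDesc&) = 0;
    virtual ResourceSetId   createResourceSet(const ResourceSetDesc&) = 0;
    // A D3D11/GL backend replays each packet as a series of state-setting calls;
    // a D3D12/Vulkan backend can bake packets into pipeline objects and command buffers.
    virtual void submit(const DrawPacket* packets, std::size_t count) = 0;
};

With a boundary like that, the old-API backends do the redundant-state filtering internally while the new-API backends mostly just record.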

At this point however it depends on where you are with things; if you want to release Soon then you'd be better off forgetting the new APIs and getting finished before going back, gutting your rendering backend and doing it again for another project/product.

In fact I'd probably argue 'releasing soon' is the only reason to stick with the current APIs; if you are just learning then by all means continue, however I would advocate switching to the newer stuff as soon as you can. That might mean waiting for a lib to abstract things a bit if you don't feel like fiddling, but the new APIs look much cleaner and much simpler to learn - a few tricky concepts are wrapped up in them but they aren't that hard to deal with and, with Vulkan at least, it looks like you'll have solid debugging tools and layers built in to help.

I guess if you are a hobbyist you could keep on with the old stuff; but I'd honestly say that if you are looking to be a gfx programmer in the industry switching sooner rather than later will help you out as you'll be useful in the future when you join up.

#3   Members   -  Reputation: 912

0 Likes

Posted 06 March 2015 - 01:39 PM

I feel like to fully support these APIs I need to almost abandon support for the previous APIs in my engine, since the veil is so much thinner; otherwise I'll just end up adding the same amount of abstraction that DX11 already does, kind of defeating the point.

 

 

That's highly unlikely. One of the reasons the classical APIs are so slow is that they have to do a whole lot of rule validation. When you google an OpenGL function and see that whole list of things an argument is allowed to be, what happens when something goes wrong, etc., the driver has to actually validate all of that at runtime (to fulfill the spec requirements correctly, but also to prevent you from crashing the GPU). That is because drivers can't make many assumptions about the context in which you execute those API calls. When you write a game engine with DX12 or Vulkan, that's not true: as the programmer you typically have complete knowledge of the relevant context and can make many assumptions, so you can skip a whole lot of work that a classical OpenGL driver would have to do.
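To make that concrete, here's some purely illustrative pseudo-driver code (nothing here is a real driver or a real GL header; the context fields are invented) showing the flavour of per-call rule checking the old model forces on the driver:

#include <cstddef>

enum DriverResult { OK, INVALID_ENUM, INVALID_VALUE, INVALID_OPERATION }; // mirrors GL error codes

struct DriverCtx { int maxMipLevel; bool unpackBufferBound; std::size_t unpackBufferSize; };

DriverResult driverTexSubImage2D(DriverCtx& ctx, unsigned target, int level,
                                 unsigned format, unsigned type,
                                 const void* pixels, std::size_t byteSize)
{
    const unsigned kTexture2D = 0x0DE1;               // GL_TEXTURE_2D's enum value
    if (target != kTexture2D)                 return INVALID_ENUM;      // spec rule
    if (level < 0 || level > ctx.maxMipLevel) return INVALID_VALUE;     // spec rule
    if (format == 0 || type == 0)             return INVALID_OPERATION; // spec rule (simplified)
    if (ctx.unpackBufferBound && byteSize > ctx.unpackBufferSize)
                                              return INVALID_OPERATION; // protects the GPU from a bad read
    (void)pixels; // ...only now would the actual upload start, usually behind more hidden state tracking
    return OK;
}

With Vulkan/DX12, that class of checking moves into optional validation layers that you enable during development and turn off for shipping builds.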

 

In addition to that multithreading in DX11 and OpenGL 4.x is still very crappy (although I'm not sure why) and with the new APIs you will be able to actually use multiple cores to do rendering API stuff for more than just 5% gains. 

 

Thinking about it, it's kind of like C++ vs. a managed language like Java or C#. Similar concepts and a very similar execution context, but one gives you more control and is a more precise abstraction of the hardware, while also letting you shoot yourself in the foot more (and in return is faster).


Edited by agleed, 06 March 2015 - 01:43 PM.


#4   Members   -  Reputation: 1066

0 Likes

Posted 06 March 2015 - 02:37 PM

 

 

In addition to that multithreading in DX11 and OpenGL 4.x is still very crappy (although I'm not sure why)

 

OpenGL doesn't really provide any means for multithreading, even in 4.4. An OpenGL context belongs to a single thread, which is the only one that can issue rendering commands. That's what Vulkan/DX12 are tackling with the "command buffer" object that can be created on any thread (although command buffers have to be committed to the command queue, which seems to belong to a single thread only).

 

Actually there are ways to do a kind of multithreading in OpenGL 4: you can share a context between threads, which is used to load textures asynchronously for instance, but I've heard that this is really inefficient. There is also glBufferStorage + IndirectDraw, which allows you to access a buffer of instanced data that can be written like any other buffer, e.g. concurrently.

But it's not as powerful as Vulkan or DX12, which allow you to issue any command and not just instanced ones.
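For reference, here is a rough sketch of that glBufferStorage + indirect-draw path (GL 4.4, assuming a current context on the submitting thread and a function loader; fencing and error handling omitted):

// Worker threads fill a persistently mapped indirect-command buffer; the thread that
// owns the GL context then issues a single multi-draw consuming all of it.
struct DrawElementsIndirectCommand {
    GLuint count, instanceCount, firstIndex, baseVertex, baseInstance;
};

const GLsizei    maxDraws = 4096;                                   // illustrative limit
const GLsizeiptr cmdBytes = maxDraws * sizeof(DrawElementsIndirectCommand);
const GLbitfield mapFlags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

GLuint indirectBuf = 0;
glGenBuffers(1, &indirectBuf);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf);
glBufferStorage(GL_DRAW_INDIRECT_BUFFER, cmdBytes, nullptr, mapFlags); // immutable storage

// Mapped once and left mapped; any thread may write into it, with your own synchronisation
// plus a glFenceSync/glClientWaitSync so you don't stomp commands the GPU is still reading.
auto* cmds = static_cast<DrawElementsIndirectCommand*>(
    glMapBufferRange(GL_DRAW_INDIRECT_BUFFER, 0, cmdBytes, mapFlags));

GLsizei numDraws = 0;
// ... worker threads write cmds[0..numDraws-1] and any per-instance data buffers ...

// GL thread: one call submits everything the workers produced.
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, numDraws, 0);

The limitation is that the workers only produce draw parameters; every actual GL call still goes through the single context-owning thread, which is exactly the gap the Vulkan/DX12 command buffers close.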



#5   Members   -  Reputation: 912

0 Likes

Posted 06 March 2015 - 02:45 PM

 

 

 

In addition to that multithreading in DX11 and OpenGL 4.x is still very crappy (although I'm not sure why)

 

OpenGL doesn't really provide any means for multithreading, even in 4.4. An OpenGL context belongs to a single thread, which is the only one that can issue rendering commands. That's what Vulkan/DX12 are tackling with the "command buffer" object that can be created on any thread (although command buffers have to be committed to the command queue, which seems to belong to a single thread only).

 

Actually there are ways to do a kind of multithreading in OpenGL 4: you can share a context between threads, which is used to load textures asynchronously for instance, but I've heard that this is really inefficient. There is also glBufferStorage + IndirectDraw, which allows you to access a buffer of instanced data that can be written like any other buffer, e.g. concurrently.

But it's not as powerful as Vulkan or DX12, which allow you to issue any command and not just instanced ones.

 

 

Yes, but I'm more interested in what prevented driver implementors from getting proper multithreading support into the APIs in the first place. DX11 has the concept of command lists too and it kind of works, but the practical gains from it are pretty small. I don't know what it is about the APIs (or the driver implementations) that prevents proper multithreading from working in DX11 and GL 4.x.



#6   Moderators   -  Reputation: 12198

89 Likes

Posted 06 March 2015 - 02:56 PM


Many years ago, I briefly worked at NVIDIA on the DirectX driver team (internship). This is Vista era, when a lot of people were busy with the DX10 transition, the hardware transition, and the OS/driver model transition. My job was to get games that were broken on Vista, dismantle them from the driver level, and figure out why they were broken. While I am not at all an expert on driver matters (and actually sucked at my job, to be honest), I did learn a lot about what games look like from the perspective of a driver and kernel.

 

The first lesson is: Nearly every game ships broken. We're talking major AAA titles from vendors who are everyday names in the industry. In some cases, we're talking about blatant violations of API rules - one D3D9 game never even called BeginScene/EndScene. Some are mistakes or oversights - one shipped bad shaders that heavily impacted performance on NV drivers. These things were day-to-day occurrences that went into a bug tracker. Then somebody would go in, find out what the game screwed up, and patch the driver to deal with it. There are lots of optional patches already in the driver that are simply toggled on or off as per-game settings, and then hacks that are more specific to games - up to and including total replacement of the shipping shaders with custom versions by the driver team. Ever wondered why nearly every major game release is accompanied by a matching driver release from AMD and/or NVIDIA? There you go.

 

The second lesson: The driver is gigantic. Think 1-2 million lines of code dealing with the hardware abstraction layers, plus another million per API supported. The backing function for Clear in D3D 9 was close to a thousand lines of just logic dealing with how exactly to respond to the command. It'd then call out to the correct function to actually modify the buffer in question. The level of complexity internally is enormous and winding, and even inside the driver code it can be tricky to work out how exactly you get to the fast-path behaviors. Additionally the APIs don't do a great job of matching the hardware, which means that even in the best cases the driver is covering up for a LOT of things you don't know about. There are many, many shadow operations and shadow copies of things down there.

 

The third lesson: It's unthreadable. The IHVs sat down starting from maybe circa 2005, and built tons of multithreading into the driver internally. They had some of the best kernel/driver engineers in the world to do it, and literally thousands of full blown real world test cases. They squeezed that system dry, and within the existing drivers and APIs it is impossible to get more than trivial gains out of any application side multithreading. If Futuremark can only get 5% in a trivial test case, the rest of us have no chance.

 

The fourth lesson: Multi GPU (SLI/CrossfireX) is fucking complicated. You cannot begin to conceive of the number of failure cases that are involved until you see them in person. I suspect that more than half of the total software effort within the IHVs is dedicated strictly to making multi-GPU setups work with existing games. (And I don't even know what the hardware side looks like.) If you've ever tried to independently build an app that uses multi GPU - especially if, god help you, you tried to do it in OpenGL - you may have discovered this insane rabbit hole. There is ONE fast path, and it's the narrowest path of all. Take lessons 1 and 2, and magnify them enormously. 

 

Deep breath.

 

Ultimately, the new APIs are designed to cure all four of these problems.

* Why are games broken? Because the APIs are complex, and validation varies from decent (D3D 11) to poor (D3D 9) to catastrophic (OpenGL). There are lots of ways to hit slow paths without knowing anything has gone awry, and often the driver writers already know what mistakes you're going to make and are dynamically patching in workarounds for the common cases.

* Maintaining the drivers with the current wide surface area is tricky. Although AMD and NV have the resources to do it, the smaller IHVs (Intel, PowerVR, Qualcomm, etc) simply cannot keep up with the necessary investment. More importantly, explaining to devs the correct way to write their render pipelines has become borderline impossible. There are too many failure cases. It's been understood for quite a few years now that you cannot max out the performance of any given GPU without having someone from NVIDIA or AMD physically grab your game source code, load it on a dev driver, and do a hands-on analysis. These are the vanishingly few people who have actually seen the source to a game, the driver it's running on, the Windows kernel it's running on, and the full specs for the hardware. Nobody else has that kind of access or engineering ability.

* Threading is just a catastrophe and is being rethought from the ground up. This requires a lot of the abstractions to be stripped away or retooled, because the old ones required too much driver intervention to be properly threadable in the first place.

* Multi-GPU is becoming explicit. For the last ten years, it has been AMD and NV's goal to make multi-GPU setups completely transparent to everybody, and it's become clear that for some subset of developers, this is just making our jobs harder. The driver has to apply imperfect heuristics to guess what the game is doing, and the game in turn has to do peculiar things in order to trigger the right heuristics. Again, for the big games somebody sits down and matches the two manually. 

 

Part of the goal is simply to stop hiding what's actually going on in the software from game programmers. Debugging drivers has never been possible for us, which meant a lot of poking and prodding and experimenting to figure out exactly what it is that is making the render pipeline of a game slow. The IHVs certainly weren't willing to disclose these things publicly either, as they were considered critical to competitive advantage. (Sure they are guys. Sure they are.) So the game is guessing what the driver is doing, the driver is guessing what the game is doing, and the whole mess could be avoided if the drivers just wouldn't work so hard trying to protect us.

 

So why didn't we do this years ago? Well, there are a lot of politics involved (cough Longs Peak) and some hardware aspects but ultimately what it comes down to is the new models are hard to code for. Microsoft and ARB never wanted to subject us to manually compiling shaders against the correct render states, setting the whole thing invariant, configuring heaps and tables, etc. Segfaulting a GPU isn't a fun experience. You can't trap that in a (user space) debugger. So ... the subtext that a lot of people aren't calling out explicitly is that this round of new APIs has been done in cooperation with the big engines. The Mantle spec is effectively written by Johan Andersson at DICE, and the Khronos Vulkan spec basically pulls Aras P at Unity, Niklas S at Epic, and a couple guys at Valve into the fold.

 

Three out of those four just made their engines public and free with minimal backend financial obligation.

 

Now there's nothing wrong with any of that, obviously, and I don't think it's even the big motivating raison d'etre of the new APIs. But there's a very real message that if these APIs are too challenging to work with directly, well the guys who designed the API also happen to run very full featured engines requiring no financial commitments*. So I think that's served to considerably smooth the politics involved in rolling these difficult to work with APIs out to the market, encouraging organizations that would have been otherwise reticent to do so.

[Edit/update] I'm definitely not suggesting that the APIs have been made artificially difficult, by any means - the engineering work is solid in its own right. It's also become clear, since this post was originally written, that there's a commitment to continuing DX11 and OpenGL support for the near future. That also helped the decision to push these new systems out, I believe.

 

The last piece to the puzzle is that we ran out of new user-facing hardware features many years ago. Ignoring raw speed, what exactly is the user-visible or dev-visible difference between a GTX 480 and a GTX 980? A few limitations have been lifted (notably in compute) but essentially they're the same thing. MS, for all practical purposes, concluded that DX was a mature, stable technology that required only minor work and mostly disbanded the teams involved. Many of the revisions to GL have been little more than API repairs. (A GTX 480 runs full featured OpenGL 4.5, by the way.) So the reason we're seeing new APIs at all stems fundamentally from Andersson hassling the IHVs until AMD woke up, smelled competitive advantage, and started paying attention. That essentially took a three year lag time from when we got hardware to the point that compute could be directly integrated into the core of a render pipeline, which is considered normal today but was bluntly revolutionary at production scale in 2012. It's a lot of small things adding up to a sea change, with key people pushing on the right people for the right things.

 

 

Phew. I'm no longer sure what the point of that rant was, but hopefully it's somehow productive that I wrote it. Ultimately the new APIs are the right step, and they're retroactively useful to old hardware which is great. They will be harder to code. How much harder? Well, that remains to be seen. Personally, my take is that MS and ARB always had the wrong idea. Their idea was to produce a nice, pretty looking front end and deal with all the awful stuff quietly in the background. Yeah it's easy to code against, but it was always a bitch and a half to debug or tune. Nobody ever took that side of the equation into account. What has finally been made clear is that it's okay to have difficult to code APIs, if the end result just works. And that's been my experience so far in retooling: it's a pain in the ass, requires widespread revisions to engine code, forces you to revisit a lot of assumptions, and generally requires a lot of infrastructure before anything works. But once it's up and running, there's no surprises. It works smoothly, you're always on the fast path, anything that IS slow is in your OWN code which can be analyzed by common tools. It's worth it.

 

(*See this post by Unity's Aras P for more thoughts. I have a response comment in there as well.)


Edited by Promit, 13 March 2015 - 10:04 AM.

SlimDX | Shark Eaters for iOS | Ventspace Blog | Twitter | Proud supporter of diversity and inclusiveness in game development

#7   Crossbones+   -  Reputation: 8584

3 Likes

Posted 06 March 2015 - 03:50 PM

I feel like to fully support these APIs I need to almost abandon support for the previous APIs in my engine, since the veil is so much thinner; otherwise I'll just end up adding the same amount of abstraction that DX11 already does, kind of defeating the point.

Yes.
But it depends. For example if you were doing AZDO OpenGL, many of the concepts will already be familiar to you.
However, for example, AZDO never dealt with textures in as thin a way as Vulkan or D3D12 do, so you'll need to refactor those parts.
If you weren't following AZDO, then it's highly likely that the way you were using the old APIs is incompatible with the new ways.

Actually there are ways to do a kind of multithreading in OpenGL 4: (...). There is also glBufferStorage + IndirectDraw, which allows you to access a buffer of instanced data that can be written like any other buffer, e.g. concurrently.
But it's not as powerful as Vulkan or DX12, which allow you to issue any command and not just instanced ones.

Actually DX12 & Vulkan are following exactly the same path glBufferStorage + IndirectDraw did. It just got easier, was made thinner, and can now handle other misc aspects from within multiple cores (texture binding, shader compilation, barrier preparation, etc).

The rest was covered by Promit's excellent post.

#8   Members   -  Reputation: 1066

0 Likes

Posted 06 March 2015 - 05:21 PM

Actually there are ways to do a kind of multithreading in OpenGL 4: (...). There is also glBufferStorage + IndirectDraw, which allows you to access a buffer of instanced data that can be written like any other buffer, e.g. concurrently.
But it's not as powerful as Vulkan or DX12, which allow you to issue any command and not just instanced ones.

Actually DX12 & Vulkan are following exactly the same path glBufferStorage + IndirectDraw did. It just got easier, was made thinner, and can now handle other misc aspects from within multiple cores (texture binding, shader compilation, barrier preparation, etc).

 

 

There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works; it sounds a lot like an array of bindless texture handles to me.



#9   Crossbones+   -  Reputation: 8584

3 Likes

Posted 06 March 2015 - 05:53 PM

There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works; it sounds a lot like an array of bindless texture handles to me.

Without going into detail: it's because only AMD & NVIDIA cards support bindless textures in their hardware; there's one major desktop vendor that doesn't support it even though it's DX11 HW. Also keep in mind that both Vulkan & DX12 want to support mobile hardware as well.
You will have to give the API a table of textures based on frequency of updates: One blob of textures for those that change per material, one blob of textures for those that rarely change (e.g. environment maps), and another blob of textures that don't change (e.g. shadow maps).
It's very analogous to how we have been doing constant buffers with shaders (provide different buffers based on frequency of update).
And you put those blobs into a bigger blob and tell the API "I want to render with this big blob which is a collection of blobs of textures"; so the API can translate this very well to all sorts of hardware (mobile, Intel on desktop, and bindless like AMD's and NVIDIA's).
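As a very rough sketch of that "blob of blobs" idea in D3D12 terms (descriptor-heap names from the public D3D12 headers; the device, command list, root signature layout and texture resources are assumed to already exist, and the API was still in preview when this was written):

// Sketch only: group SRVs by update frequency inside one shader-visible heap, then bind
// whole tables instead of individual textures. The root signature is assumed to declare
// descriptor tables at parameter 0 (per-material) and parameter 1 (rarely changing).
D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
heapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
heapDesc.NumDescriptors = 1024;
heapDesc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
ID3D12DescriptorHeap* heap = nullptr;
device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&heap));

const UINT stride = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
D3D12_CPU_DESCRIPTOR_HANDLE cpu = heap->GetCPUDescriptorHandleForHeapStart();
D3D12_GPU_DESCRIPTOR_HANDLE gpu = heap->GetGPUDescriptorHandleForHeapStart();

// "Rarely changes" blob, written once: environment map, then shadow map.
D3D12_GPU_DESCRIPTOR_HANDLE rareTable = gpu;
device->CreateShaderResourceView(envMap,    nullptr, cpu); cpu.ptr += stride;
device->CreateShaderResourceView(shadowMap, nullptr, cpu); cpu.ptr += stride;

// "Per material" blob, rewritten whenever the material's textures change.
D3D12_GPU_DESCRIPTOR_HANDLE materialTable = { gpu.ptr + 2 * stride };
device->CreateShaderResourceView(albedoTex, nullptr, cpu); cpu.ptr += stride;
device->CreateShaderResourceView(normalTex, nullptr, cpu); cpu.ptr += stride;

// Per draw, binding is just two table handles instead of N individual textures:
cmdList->SetDescriptorHeaps(1, &heap);
cmdList->SetGraphicsRootDescriptorTable(0, materialTable);
cmdList->SetGraphicsRootDescriptorTable(1, rareTable);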

If all hardware were bindless, this set/pool wouldn't be needed because you could change one texture anywhere with minimal GPU overhead like you do in OpenGL4 with bindless texture extensions.
Nonetheless this descriptor pool set is also useful for non-texture stuff, (e.g. anything that requires binding, like constant buffers). It is quite generic.

Edited by Matias Goldberg, 06 March 2015 - 05:55 PM.


#10   Prime Members   -  Reputation: 2875

0 Likes

Posted 06 March 2015 - 09:55 PM

So... is the Vulkan API available for us plebeians to peruse anywhere?



#11   Members   -  Reputation: 1274

4 Likes

Posted 06 March 2015 - 10:15 PM

So... is the Vulkan API available for us plebeians to peruse anywhere?

Not Yet.

 

" Vulkan initial specifications and implementations are expected later this year "

 

From the press release (https://www.khronos.org/news/press/khronos-reveals-vulkan-api-for-high-efficiency-graphics-and-compute-on-gpus)



#12   Prime Members   -  Reputation: 2875

0 Likes

Posted 06 March 2015 - 10:50 PM

 

So... is the Vulkan API available for us plebeians to peruse anywhere?

Not Yet.

 

" Vulkan initial specifications and implementations are expected later this year "

 

From the press release (https://www.khronos.org/news/press/khronos-reveals-vulkan-api-for-high-efficiency-graphics-and-compute-on-gpus)

 

 

:(



#13   Moderators   -  Reputation: 48904

19 Likes

Posted 07 March 2015 - 02:53 AM


Apparently the mantle spec documents will be made public very soon, which will serve as a draft/preview of the Vulkan docs that will come later.

I'm extremely happy with what we've heard about Vulkan so far. Supporting it in my engine is going to be extremely easy.

However, supporting it in other engines may be a royal pain.
e.g. If you've got an engine that's based around the D3D9 API, then your D3D11 port is going to be very complex.
However, if your engine is based around the D3D11 API, then your D3D9 port is going to be very simple.

Likewise for this new generation of APIs -- if you're focusing too heavily on current generation thinking, then forward-porting will be painful.

In general, implementing new philosophies using old APIs is easy, but implementing old philosophies on new APIs is hard.

 

In my engine, I'm already largely using the Vulkan/D3D12 philosophy, so porting to them will be easy.
I also support D3D9-11 / GL2-4 - and the code to implement these "new" ideas on these "old" APIs is actually fairly simple - so I'd be brave enough to say that it is possible to have a very efficient engine design that works equally well on every API - the key is to base it around these modern philosophies though!
Personally, my engine's cross-platform rendering layer is based on a mixture of Mantle and D3D11 ideas.

I've made my API stateless, where every "DrawItem" must contain a complete pipeline state (blend/depth/raster/shader programs/etc) and all resource bindings required by those programs - however, the way these states/bindings are described (in client/user code) is very similar to the D3D11 model.
DrawItems can/should be prepared ahead of time and reused, though you can create them every frame if you want... When creating a DrawItem, you need to specify which "RenderPass" it will be used for, which specifies the render-target format(s), etc.

On older APIs, this lets you create your own compact data structures containing all the data required to make the D3D/GL API calls for that draw-call.
On newer APIs, this lets you actually pre-compile the native GPU commands!

 

You'll notice that in the Vulkan slides released so far, when you create a command buffer, you're forced to specify which queue you promise to use when submitting it later. Different queues may exist on different GPUs -- e.g. if you've got an NVidia and an Intel GPU present. The requirement to specify a queue ahead of time means that you're actually specifying a particular GPU ahead of time, which means the Vulkan drivers can convert your commands to that GPU's actual native instruction set ahead of time!

In either case, submitting a pre-prepared DrawItem to a context/command-buffer is very simple/efficient.
As a bonus, you sidestep all the bugs involved in state-machine graphics APIs :D


Edited by Hodgman, 07 March 2015 - 04:27 AM.


#14   Members   -  Reputation: 912

0 Likes

Posted 07 March 2015 - 04:53 AM

Apparently the mantle spec documents will be made public very soon, which will serve as a draft/preview of the Vulkan docs that will come later.

I'm extremely happy with what we've heard about Vulkan so far. Supporting it in my engine is going to be extremely easy.

However, supporting it in other engines may be a royal pain.
e.g. If you've got an engine that's based around the D3D9 API, then your D3D11 port is going to be very complex.
However, if your engine is based around the D3D11 API, then your D3D9 port is going to be very simple.

Likewise for this new generation of APIs -- if you're focusing too heavily on current generation thinking, then forward-porting will be painful.

In general, implementing new philosophies using old APIs is easy, but implementing old philosophies on new APIs is hard.

 

In my engine, I'm already largely using the Vulkan/D3D12 philosophy, so porting to them will be easy.
I also support D3D9-11 / GL2-4 - and the code to implement these "new" ideas on these "old" APIs is actually fairly simple - so I'd be brave enough to say that it is possible to have a very efficient engine design that works equally well on every API - the key is to base it around these modern philosophies though!
Personally, my engine's cross-platform rendering layer is based on a mixture of Mantle and D3D11 ideas.

I've made my API stateless, where every "DrawItem" must contain a complete pipeline state (blend/depth/raster/shader programs/etc) and all resource bindings required by those programs - however, the way these states/bindings are described (in client/user code) is very similar to the D3D11 model.
DrawItems can/should be prepared ahead of time and reused, though you can create them every frame if you want... When creating a DrawItem, you need to specify which "RenderPass" it will be used for, which specifies the render-target format(s), etc.

On older APIs, this lets you create your own compact data structures containing all the data required to make the D3D/GL API calls for that draw-call.
On newer APIs, this lets you actually pre-compile the native GPU commands!

 

You'll notice that in the Vulkan slides released so far, when you create a command buffer, you're forced to specify which queue you promise to use when submitting it later. Different queues may exist on different GPUs -- e.g. if you've got an NVidia and an Intel GPU present. The requirement to specify a queue ahead of time means that you're actually specifying a particular GPU ahead of time, which means the Vulkan drivers can convert your commands to that GPU's actual native instruction set ahead of time!

In either case, submitting a pre-prepared DrawItem to a context/command-buffer is very simple/efficient.
As a bonus, you sidestep all the bugs involved in state-machine graphics APIs :D

 

That sounds extremely interesting. Could you give a concrete example of what the descriptions in a DrawItem look like? What is the granularity of a DrawItem? Is it a per-mesh kind of thing, or more like a "one draw item for every material type" kind of thing, where you then draw every mesh that uses that material with a single DrawItem?



#15   Members   -  Reputation: 4272

0 Likes

Posted 07 March 2015 - 05:44 AM

Can I say something I do not like (DX related)? The "new" feature levels, especially 12.1.

 

Starting from 10.1, Microsoft introduced the concept of the "feature level", a nice and smart way to collect hundreds of caps-bits and thousands of related permutations into a single - unique - decree. With feature levels you can target older hardware with the latest runtime available. Microsoft did not completely remove caps-bits for optional features, but their number was reduced dramatically, by something like two orders of magnitude. Even with Direct3D 11.2 the number of caps-bits remained relatively small, although they could have added a new feature level - let's call it feature level 11.2 - with all the new optional features and tier 1 of tiled resources; never mind, that's not a big deal after all - complaints should be focused on the OS support situation since D3D 11.1.

Since the new API is focused mostly on the programming model, new caps-bits and tier collections were expected with Direct3D 12, and Microsoft did a good job of dramatically reducing the complexity of the different hardware capability permutations. The new caps-bits and tiers of DX12 are not a big issue. At GDC15 they also announced two "new" feature levels (~14:00): feature level 12.0 and feature level 12.1. While feature level 12.0 looks reasonable (all GCN 1.1/1.2 and Maxwell 2.0 should support this - dunno about the first generation of Maxwell), feature level 12.1 adds only mandatory support for ROVs (OK) and tier 1 of conservative rasterization (the most useless tier!).

I will not go into explicit details (detailed information should still be under NDA), however the second feature level looks tailor-made for one particular piece of hardware (guess which!). Moreover, FL 12.1 does not require some really interesting features (a greater conservative rasterization tier, volume tiled resources, and even resource binding tier 3) that you would expect to be mandatory for future hardware. In substance, FL 12.1 really breaks the concept of the feature level in my view, which was a sort of "barrier" that defined new capabilities for upcoming hardware.


Edited by Alessio1989, 07 March 2015 - 05:47 AM.

"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

#16   Moderators   -  Reputation: 48904

11 Likes

Posted 07 March 2015 - 07:39 AM


That sounds extremely interesting. Could you give a concrete example of what the descriptions in a DrawItem look like? What is the granularity of a DrawItem? Is it a per-mesh kind of thing, or more like a "one draw item for every material type" kind of thing, where you then draw every mesh that uses that material with a single DrawItem?

My DrawItem corresponds to one glDraw* / Draw* call, plus all the state that needs to be set immediately prior the draw.
One model will usually have one DrawItem per sub-mesh (where a sub-mesh is a portion of that model that uses a material), per pass (where a pass is e.g. drawing to the gbuffer, drawing to a shadow-map, forward rendering, etc). When drawing a model, it will find all the DrawItems for the current pass and push them into a render list, which can then be sorted.

A DrawItem which contains the full pipeline state, the resource bindings, and the draw-call parameters could look like this in a naive D3D11 implementation:

struct DrawItem
{
  //pipeline state:
  ID3D11PixelShader* ps;
  ID3D11VertexShader* vs;
  ID3D11BlendState* blend;
  ID3D11DepthStencilState* depth;
  ID3D11RasterizerState* raster;
  D3D11_RECT* scissor;
  //input assembler state
  D3D11_PRIMITIVE_TOPOLOGY primitive;
  ID3D11InputLayout* inputLayout;
  ID3D11Buffer* indexBuffer;
  vector<tuple<int/*slot*/,ID3D11Buffer*,uint/*stride*/,uint/*offset*/>> vertexBuffers;
  //resource bindings:
  vector<pair<int/*slot*/, ID3D11Buffer*>> cbuffers;
  vector<pair<int/*slot*/, ID3D11SamplerState*>> samplers;
  vector<pair<int/*slot*/, ID3D11ShaderResourceView*>> textures;
  //draw call parameters:
  int numVerts, numInstances, indexBufferOffset, vertexBufferOffset;
};

That structure is extremely unoptimized though. It's a base size of ~116 bytes, plus the memory used by the vectors, which could be ~1KiB!

I'd aim to compress them down to 28-100 bytes in a single contiguous allocation, e.g. by using ID's instead of pointers, by grouping objects together (e.g. referencing a PS+VS program pair, instead of referencing each individually), and by using variable length arrays built into that structure instead of vectors.
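As a rough illustration of that kind of compression (the field sizes here are invented for the example, not an exact layout from any engine):

#include <cstdint>

// 16-bit IDs into engine-side tables instead of raw pointers, grouped state objects,
// and variable-length binding arrays stored inline after the fixed-size header.
struct DrawItemCompact
{
  // pipeline state:
  uint16_t shaderProgram;     // VS+PS pair referenced as one ID
  uint16_t renderState;       // blend/depth/raster/scissor grouped into one ID
  uint16_t inputLayout;
  uint16_t indexBuffer;
  // draw call parameters:
  uint32_t firstIndex;
  uint32_t indexCount;
  uint16_t instanceCount;
  // counts for the variable-length tail:
  uint8_t  numVertexBuffers;
  uint8_t  numCBuffers;
  uint8_t  numSamplers;
  uint8_t  numTextures;
  uint8_t  pad;
  // ...followed immediately in memory by:
  //   uint16_t vertexBufferIds[numVertexBuffers];
  //   uint16_t cbufferIds[numCBuffers];
  //   uint16_t samplerIds[numSamplers];
  //   uint16_t textureIds[numTextures];
  // so the whole item is one contiguous allocation with no heap-allocated vectors.
};

That fixed header is roughly 24 bytes, and a typical item with a handful of bindings lands in the size range mentioned above.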

When porting to Mantle/Vulkan/D3D12, that "pipeline state" section all gets replaced with a single "pipeline state object" and the "input assembler" / "resource bindings" sections get replaced by a "descriptor set". Alternatively, these new APIs also allow for a DrawItem to be completely replaced by a very small native command buffer!

 

There's a million ways to structure a renderer, but this is the design I ended up with, which I personally find very simple to implement on / port to every platform.


Edited by Hodgman, 08 March 2015 - 09:51 PM.


#17   Members   -  Reputation: 912

0 Likes

Posted 07 March 2015 - 10:08 AM

 

That sounds extremely interesting. Could you give a concrete example of what the descriptions in a DrawItem look like? What is the granularity of a DrawItem? Is it a per-mesh kind of thing, or more like a "one draw item for every material type" kind of thing, where you then draw every mesh that uses that material with a single DrawItem?

My DrawItem corresponds to one glDraw* / Draw* call, plus all the state that needs to be set immediately prior the draw.
One model will usually have one DrawItem per sub-mesh (where a sub-mesh is a portion of that model that uses a material), per pass (where a pass is e.g. drawing to the gbuffer, drawing to a shadow-map, forward rendering, etc). When drawing a model, it will find all the DrawItems for the current pass and push them into a render list, which can then be sorted.

A DrawItem which contains the full pipeline state, the resource bindings, and the draw-call parameters could look like this in a naive D3D11 implementation:

struct DrawItem
{
  //pipeline state:
  ID3D11PixelShader* ps;
  ID3D11VertexShader* vs;
  ID3D11BlendState* blend;
  ID3D11DepthStencilState* depth;
  ID3D11RasterizerState* raster;
  D3D11_RECT* scissor;
  //input assembler state
  D3D11_PRIMITIVE_TOPOLOGY primitive;
  ID3D11InputLayout* inputLayout;
  ID3D11Buffer* indexBuffer;
  vector<tuple<int/*slot*/,ID3D11Buffer*,uint/*stride*/,uint/*offset*/>> vertexBuffers;
  //resource bindings:
  vector<pair<int/*slot*/, ID3D11Buffer*>> cbuffers;
  vector<pair<int/*slot*/, ID3D11SamplerState*>> samplers;
  vector<pair<int/*slot*/, ID3D11ShaderResourceView*>> textures;
  //draw call parameters:
  int numVerts, numInstances, indexBufferOffset, vertexBufferOffset;
};

That structure is extremely unoptimized though. It's a base size of ~116 bytes, plus the memory used by the vectors, which could be ~1KiB!

I'd aim to compress them down to 28-100 bytes in a single contiguous allocation, e.g. by using ID's instead of pointers, by grouping objects together (e.g. referencing a PS+VS program pair, instead of referencing each individually), and by using variable length arrays built into that structure instead of vectors.

When porting to Mantle/Vulkan/D3D12, that "pipeline state" section all gets replaced with a single object and the "input assembler" / "resource bindings" sections get replaced by a descriptor. Alternatively, these new APIs also allow for a DrawItem to be completely replaced by a very small native command buffer!

 

There's a million ways to structure a renderer, but this is the design I ended up with, which I personally find very simple to implement on / port to every platform.

 

 

Thanks a lot for that description. I must say it sounds very elegant. It's almost like a functional programming approach to draw call submission, along with its disadvantages and advantages. 



#18   Members   -  Reputation: 1066

0 Likes

Posted 07 March 2015 - 12:29 PM

 

There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works; it sounds a lot like an array of bindless texture handles to me.

Without going into detail: it's because only AMD & NVIDIA cards support bindless textures in their hardware; there's one major desktop vendor that doesn't support it even though it's DX11 HW. Also keep in mind that both Vulkan & DX12 want to support mobile hardware as well.
You will have to give the API a table of textures based on frequency of updates: One blob of textures for those that change per material, one blob of textures for those that rarely change (e.g. environment maps), and another blob of textures that don't change (e.g. shadow maps).
It's very analogous to how we have been doing constant buffers with shaders (provide different buffers based on frequency of update).
And you put those blobs into a bigger blob and tell the API "I want to render with this big blob which is a collection of blobs of textures"; so the API can translate this very well to all sorts of hardware (mobile, Intel on desktop, and bindless like AMD's and NVIDIA's).

If all hardware were bindless, this set/pool wouldn't be needed because you could change one texture anywhere with minimal GPU overhead like you do in OpenGL4 with bindless texture extensions.
Nonetheless this descriptor pool set is also useful for non-texture stuff, (e.g. anything that requires binding, like constant buffers). It is quite generic.

 

 

Thanks.
I think it also makes sparse textures available? At least the tier level required by ARB_sparse_texture (i.e. without the shader function returning residency state).



#19   Members   -  Reputation: 4272

0 Likes

Posted 07 March 2015 - 12:45 PM

 

 

There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works; it sounds a lot like an array of bindless texture handles to me.

Without going into detail: it's because only AMD & NVIDIA cards support bindless textures in their hardware; there's one major desktop vendor that doesn't support it even though it's DX11 HW. Also keep in mind that both Vulkan & DX12 want to support mobile hardware as well.
You will have to give the API a table of textures based on frequency of updates: One blob of textures for those that change per material, one blob of textures for those that rarely change (e.g. environment maps), and another blob of textures that don't change (e.g. shadow maps).
It's very analogous to how we have been doing constant buffers with shaders (provide different buffers based on frequency of update).
And you put those blobs into a bigger blob and tell the API "I want to render with this big blob which is a collection of blobs of textures"; so the API can translate this very well to all sorts of hardware (mobile, Intel on desktop, and bindless like AMD's and NVIDIA's).

If all hardware were bindless, this set/pool wouldn't be needed because you could change one texture anywhere with minimal GPU overhead like you do in OpenGL4 with bindless texture extensions.
Nonetheless this descriptor pool set is also useful for non-texture stuff, (e.g. anything that requires binding, like constant buffers). It is quite generic.

 

 

Thanks.
I think it also makes sparse textures available? At least the tier level required by ARB_sparse_texture (i.e. without the shader function returning residency state).

 

 

On DirectX 12 feature level 11/11.1 GPUs, support for tier 1 of tiled resources (sparse textures) is still optional. In that GPU range, even where the architecture should support tier 1 of tiled resources, there are some GPUs (low/low-mid end, desktop and mobile) that do not expose it (e.g. driver support for tiled resources is still disabled on AMD HD 7700 Mobile GPUs). The same should apply to OGL/Vulkan.


Edited by Alessio1989, 07 March 2015 - 12:45 PM.

"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

#20   Moderators   -  Reputation: 48904

9 Likes

Posted 08 March 2015 - 05:40 AM


If all hardware were bindless, this set/pool wouldn't be needed because you could change one texture anywhere with minimal GPU overhead like you do in OpenGL4 with bindless texture extensions.
Nonetheless this descriptor pool set is also useful for non-texture stuff, (e.g. anything that requires binding, like constant buffers). It is quite generic.

They're actually designed specifically to exploit the strengths of modern bindless GPUs, especially AMD GCN, as they're basically copy&pasted from the Mantle specs (which were designed to be cross-vendor, but obviously somewhat biased by having AMD GCN as the min-spec).
 

There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works; it sounds a lot like an array of bindless texture handles to me.

A descriptor is a texture-view, buffer-view, sampler, or a pointer.
A descriptor set is an array/table/struct of descriptors.
A descriptor pool is basically a large block of memory that acts as a memory allocator for descriptor sets.

So yes, it's very much like bindless handles, but instead of them being handles, they're the actual guts of a texture-view, or an actual sampler structure, etc...
 
Say you've got a HLSL shader with:
Texture2D texture0 : register(t0);
SamplerState samLinear : register(s0);
 In D3D11, you'd bind resources to this shader using something like:
ID3D11SamplerState* mySampler = ...;
ID3D11ShaderResourceView* myTexture = ...;
ctx.PSSetSamplers( 0, 1, &mySampler );
ctx.VSSetSamplers( 0, 1, &mySampler );
ctx.PSSetShaderResources( 0, 1, &myTexture );
ctx.VSSetShaderResources( 0, 1, &myTexture );
ctx.Draw(...);//draw something using the bound resources
Let's say that these new APIs give us a nice new bindless way to describe the inputs to the shader. Instead of assigning resources to slots/registers, we'll just put them all into a struct -- that struct is the descriptor set.
Our hypothetical (because I don't know the new/final syntax yet) HLSL shader code might look like:
struct DescriptorSet : register(d0)
{
  Texture2D texture0;
  SamplerState samLinear;
};
In our C/C++ code, we can now "bind resources" to the shader with something like this:
I'm inventing the API here -- vulkan doesn't look like this, it's just a guess of what it might look like:
struct MyDescriptorSet // this matches our shader's structure, using corresponding Vulkan C types instead of the "HLSL" types above.
{
  VK_IMAGE_VIEW texture0;    //n.b. these types are the actual structures that the GPU is hard-wired to interpret, which means
  VK_SAMPLER_STATE samLinear;//      they'll change from driver-to-driver, so there must be some abstraction here over my example
};                           //      such as using handles or pointers to the actual structures?

descriptorHandle = vkCreateDescriptorSet( sizeof(MyDescriptorSet), descriptorPool );//allocate an instance of the structure in GPU memory

//copy the resource views that you want to 'bind' into the descriptor set.
MyDescriptorSet* descriptorSet = (MyDescriptorSet*)vkMapDescriptorSet(descriptorHandle);
descriptorSet->texture0 = *myTexture; // CPU is writing into GPU memory here, via write-combined uncached pages!
descriptorSet->samLinear = *mySampler;
vkUnmapDescriptorSet(descriptorHandle);

//later when drawing something 
vkCmdBindDescriptorSet(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, descriptorHandle, 0);
vkCmdDraw(cmdBuffer, ...);//draw something using the bound resources
You can see now, when drawing an object, there's only a single API call required to bind all of its resources.
Also, earlier we had to double up our API calls if the pixel-shader and the vertex-shader both needed the same resources, but now the descriptor-set is shared among all stages.
If an object always uses the same resources every frame, then you can prepare its descriptor set once, ahead of time, and then do pretty much nothing every frame! All you need to do is call vkCmdBindDescriptorSet and vkCmdDraw.
Even better, those two functions record their commands into a command buffer... so it's possible to record a command buffer for each object ahead of time, and then every frame you only need to call vkQueueSubmit per object to submit its pre-prepared command buffer.

If we want to modify which resources that draw-call uses, we can simply write new descriptors into that descriptor set. The easiest way is by mapping/unmapping the tables and writing with the CPU as above, but in theory you could also use GPU copy or compute jobs to modify them. GPU modification of descriptor sets would only be possible on truly bindless GPUs, so I'm not sure if this feature will actually be exposed by Vulkan/D3D12 -- maybe in an extension later... This would mean that when you want to change which material a draw-item uses, you could use a compute job to update that draw-item's descriptor set! Along with multi-draw-indirect, you could move even more CPU-side work over to the GPU.


Also, it's possible to put pointers to descriptor sets inside descriptor sets!
This is useful where you've got a lot of resource bindings that are shared across a series of draw-calls, so you don't want the CPU to have to re-copy all those bindings for each draw-call.

e.g. set up a shader with a per-object descriptor set, which points to a per-camera descriptor set:
cbuffer CameraData
{
  Matrix4x4 viewProj;
};

struct SharedDescriptorSet
{
  SamplerState samLinear;
  CameraData camera;
}
struct MainDescriptorSet : register(d0)
{
  Texture2D texture0;
  SharedDescriptorSet* shared;
};
The C side would then make an instance of each, and make one link to the other. When drawing, you just have to bind the per-object one:
sharedDescriptorHandle = vkCreateDescriptorSet( sizeof(SharedDescriptorSet), descriptorPool );
obj0DescriptorHandle = vkCreateDescriptorSet( sizeof(MainDescriptorSet ), descriptorPool );

SharedDescriptorSet* descriptorSet = (SharedDescriptorSet*)vkMapDescriptorSet(sharedDescriptorHandle);
descriptorSet->camera = *myCbufferView;
descriptorSet->samLinear = *mySampler;
vkUnmapDescriptorSet(sharedDescriptorHandle);

MainDescriptorSet * descriptorSet = (MainDescriptorSet *)vkMapDescriptorSet(obj0DescriptorHandle);
descriptorSet->texture0 = *myTexture;
descriptorSet->shared = sharedDescriptorHandle;
vkUnmapDescriptorSet(obj0DescriptorHandle);

//bind obj0Descriptor, which is a MainDescriptorSet, which points to sharedDescriptor, which is a SharedDescriptorSet
vkCmdBindDescriptorSet(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, obj0DescriptorHandle, 0); 
vkCmdDraw(cmdBuffer, ...);//draw something using the bound resources

Edited by Hodgman, 08 March 2015 - 05:51 AM.




