Seabolt

Vulkan: What are your opinions on DX12/Vulkan/Mantle?


 

If you sign up to the DX12 EAP you can access the source code of the UE4 DX12 implementation.

 

It isn't 'signing up'. It's applying. You have to be approved (I've yet to be approved, sadly).

 

 

Try requesting access again; it worked for me. Anyway, I have to admit that the approval process could be improved a lot.

Edited by Alessio1989


 

 


Try requesting access again; it worked for me. Anyway, I have to admit that the approval process could be improved a lot.

 

 

I have no idea what you mean by "try requesting access again".


I've refrained from replying to this for a few days while I've let the information that's recently come out, and the implications of it, bounce around my head for a bit, but I feel just about ready to do so now.

 

I'm really looking forward to programming in this style.

 

I'm aware and accept that there's going to be a substantial upfront investment required, but I think the payoff is going to be worth it.

 

I think a lot of code is going to get much cleaner as a result of all this.  A lot of really gross batching and state management/filtering code is just going to go away.  Things are going to get a lot simpler; once we tackle the challenge of managing (and being responsible for) GPU resources at a lower level, which I think is something that we're largely going to write once and then reuse across multiple projects, programming graphics is going to start being fun again.

 

I think it's going to start becoming a little like the old days of OpenGL; not quite at the level where you could just issue a glBegin/glEnd pair and start experimenting and seeing what kind of cool stuff you could do, but it will become a lot easier to just drop in new code without having to fret excessively about draw call counts, batching, state management, driver overhead, and "is this effect slow because it's slow, or is it slow because I've hit a slow path in the driver and I need to go back and rearchitect?"  That's really going to open up a lot of possibilities for people to start going nuts.

 

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what's become a traditional API usage over the past few years: lots of batching and instancing, in other words.  I have one project, using D3D11, that I think I would probably have to rewrite from scratch (I probably won't bother).  On the other hand, I have another, using a FrankenGL version, that I think will come over quite a bit more cleanly.  That's going to be quite cool and fun to do.

 

So unless I've got things badly wrong about all of this, I'm really stoked about the prospects.


I will not go into explicit details (the detailed information should still be under NDA), but the second feature level looks tailor-made for one particular piece of hardware (guess which!). Moreover, FL 12.1 does not require some really interesting features (a higher conservative rasterization tier, volume tiled resources, or even resource binding tier 3) that you would expect to be mandatory on future hardware. In essence, FL 12.1 breaks the concept of a feature level in my view, which used to act as a sort of "barrier" defining new capabilities for upcoming hardware.

So you have feature level 12.0 for mainstream hardware, older feature levels for old/low-end hardware, and 12.1 for "a certain particular hardware" and most foreseeable future hardware. How is this a problem? Clearly, if 12.1 is so similar to 12.0, 12.0 is the main target and you won't be writing much special case code for 12.1.

Edited by LorenzoGatti


 

 

 


I have no idea what you mean by "try requesting access again".

 

 

Try filling out the form again: http://aka.ms/dxeap

 

 


So you have feature level 12.0 for mainstream hardware, older feature levels for old/low-end hardware, and 12.1 for "a certain particular hardware" and most foreseeable future hardware. How is this a problem? Clearly, if 12.1 is so similar to 12.0, 12.0 is the main target and you won't be writing much special case code for 12.1.

 

 

It's not "a problem" per se; I'm just saying I was expecting to see a feature level for future hardware with more interesting and radical requirements than FL 12.1 has (e.g. mandatory support for 3D tiled resources, a higher tier of conservative rasterization, standard swizzle, tier 3 resource binding... and, what the hell, even PS stencil ref is still optional). FL 12.0 and 12.1 are almost identical except for ROVs (probably the most valuable requirement of FL 12.1) and conservative rasterization tier 1 (which is useless for anything except anti-aliasing).

I'm not saying anything else. With D3D12 you can still target every feature level you want (even 10Level9) and query for every single new hardware feature (e.g. you can use ROVs on an FL 11.0 GPU if they are supported by the hardware/driver).

Edited by Alessio1989


 

 

 

 


Try filling out the form again: http://aka.ms/dxeap

 

I've submitted the form at least three times. At this point, I've given up.


- Memory residency management. The presenters were talking along the lines of the developers being responsible for loading/unloading graphics resources from VRAM to System Memory whenever the loads are getting too high. This should be an edge case but it's still an entirely new engine feature.

Yeah it's going to be interesting to see what solutions different engines end up using here.
The simplest thing I can think of is to maintain a Set<Resource*> alongside every command buffer. Whenever you bind a resource, add it to the set. When submitting the command buffer, you can first use that set to notify Windows of the VRAM regions that are required to be resident.

The fail case there is when that residency request is too big... As you're building the command buffer, you'd have to keep track of an estimate of the VRAM residency requirement, and if it gets too big, finish the current command buffer and start a new one.
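Roughly, I'm imagining something like this (an untested C++ sketch against D3D12; the wrapper names are mine, and it assumes the command list has already been closed):

#include <d3d12.h>
#include <unordered_set>
#include <vector>

struct TrackedCommandList
{
    ID3D12GraphicsCommandList*          cmdList = nullptr;
    std::unordered_set<ID3D12Pageable*> referenced;     // filled by every Bind*() wrapper
    UINT64                              estimatedBytes = 0;

    void OnBind(ID3D12Resource* resource, UINT64 sizeInBytes)
    {
        if (referenced.insert(resource).second)
            estimatedBytes += sizeInBytes;              // count each resource only once
        // if estimatedBytes gets too big: close this list and start a new one
    }
};

void SubmitWithResidency(ID3D12Device* device, ID3D12CommandQueue* queue, TrackedCommandList& tcl)
{
    // Ask the OS to page the referenced set into VRAM before the GPU needs it.
    std::vector<ID3D12Pageable*> pageables(tcl.referenced.begin(), tcl.referenced.end());
    if (!pageables.empty())
        device->MakeResident((UINT)pageables.size(), pageables.data());

    ID3D12CommandList* lists[] = { tcl.cmdList };
    queue->ExecuteCommandLists(1, lists);
    // Later, after a fence confirms completion, Evict() anything no longer needed.
}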


- Secondary threads for resource loading/shader compilation. This is actually a really good thing that I'm excited for, but it does mean I need to change my render thread to start issuing and maintaining new jobs. It's necessary, and for the greater good, but another task nonetheless.

If you're using D3D11, you can start working on it now.
If you're on GL, you can start doing it for buffers/textures via context resource sharing... But it's potentially a lot of GL-specific code that you're not going to need in your new engine.
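For the D3D11 route, a minimal sketch of what I mean (the onReady callback and the pixel-loading step are made-up placeholders; only the device is free-threaded, so the worker never touches the immediate context):

#include <d3d11.h>
#include <thread>

// ID3D11Device is free-threaded, so creating resources off the render thread is legal;
// only the immediate context must stay on the render thread.
void LoadTextureAsync(ID3D11Device* device, const wchar_t* path,
                      void (*onReady)(ID3D11Texture2D*))
{
    std::thread([=]()
    {
        D3D11_TEXTURE2D_DESC   desc    = {};
        D3D11_SUBRESOURCE_DATA initial = {};
        // ... fill desc/initial from the file at 'path' (decoding omitted) ...

        ID3D11Texture2D* tex = nullptr;
        if (SUCCEEDED(device->CreateTexture2D(&desc, &initial, &tex)))
            onReady(tex);   // hand the finished resource back to the render thread
    }).detach();
}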

- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

Yeah if you can give frequency hints in your shader code, it might make your life easier.

When compiling a shader, I imagine you'd first try to fit all of its parameters into the root, and then fall back to other strategies if they don't fit.

The simplest strategy is putting everything required for your shader into a single big descriptor set, and having the root just contain the link to that set. I imagine a lot of people might start with something like that to begin with.
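As a sketch of that fallback (plain D3D12 calls, but the register counts are arbitrary): one root parameter, pointing at one big table of CBVs and SRVs.

#include <d3d12.h>

ID3D12RootSignature* CreateSingleTableRootSig(ID3D12Device* device)
{
    // One table holding everything the shader needs: two CBVs (b0-b1) then four SRVs (t0-t3).
    D3D12_DESCRIPTOR_RANGE ranges[2] = {};
    ranges[0].RangeType          = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
    ranges[0].NumDescriptors     = 2;
    ranges[0].BaseShaderRegister = 0;
    ranges[1].RangeType          = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
    ranges[1].NumDescriptors     = 4;
    ranges[1].BaseShaderRegister = 0;
    ranges[1].OffsetInDescriptorsFromTableStart = 2;

    // The root itself contains just one parameter: the pointer to that table.
    D3D12_ROOT_PARAMETER param = {};
    param.ParameterType                       = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
    param.DescriptorTable.NumDescriptorRanges = 2;
    param.DescriptorTable.pDescriptorRanges   = ranges;
    param.ShaderVisibility                    = D3D12_SHADER_VISIBILITY_ALL;

    D3D12_ROOT_SIGNATURE_DESC desc = {};
    desc.NumParameters = 1;
    desc.pParameters   = &param;
    desc.Flags         = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

    ID3DBlob* blob = nullptr;
    ID3D12RootSignature* rootSig = nullptr;
    if (SUCCEEDED(D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1, &blob, nullptr)))
        device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(),
                                    __uuidof(ID3D12RootSignature), (void**)&rootSig);
    if (blob) blob->Release();
    return rootSig;
}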

I don't have an update-frequency hinting feature, but my shader system does already group texture/buffer bindings together into "ResourceLists".
e.g. A DX11 shader might have material data in slots t0/t1/t2 and a shadowmap in t3. In the shader code, I declare a ResourceList containing the 3 material textures, and a 2nd ResourceList containing the shadowmap.
The user can't bind individual resources to my shader, they can only bind entire ResourceLists.
I imagine that on D3D12, these ResourceLists can actually just be descriptor tables (descriptor sets, in Mantle/Vulkan terms), and the root can just point to them.
So, not describing frequency, but at least describing which bindings are updated together.
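Something like this is what I have in mind for the ResourceList-to-table mapping (ResourceList and the heap-handle arguments are my own wrapper types, not D3D12 API, and it assumes the shader-visible heap has already been set on the command list with SetDescriptorHeaps):

#include <d3d12.h>
#include <vector>

struct ResourceList    // a group of SRVs that are always bound/updated together
{
    std::vector<D3D12_CPU_DESCRIPTOR_HANDLE> srvs;    // staging (non-shader-visible) descriptors
};

// Copy the list's descriptors into a shader-visible heap, then binding the whole group
// is a single root-table pointer update.
void BindResourceList(ID3D12Device* device, ID3D12GraphicsCommandList* cmd,
                      UINT rootParamIndex, const ResourceList& list,
                      D3D12_CPU_DESCRIPTOR_HANDLE heapCpuStart,
                      D3D12_GPU_DESCRIPTOR_HANDLE heapGpuStart)
{
    const UINT stride =
        device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    for (size_t i = 0; i < list.srvs.size(); ++i)
    {
        D3D12_CPU_DESCRIPTOR_HANDLE dst = { heapCpuStart.ptr + i * stride };
        device->CopyDescriptorsSimple(1, dst, list.srvs[i], D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    }
    cmd->SetGraphicsRootDescriptorTable(rootParamIndex, heapGpuStart);
}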

I'll also be adding in architecture for Compute Shaders for the first time, so I'm worried that I might be biting off too much at once.

Yeah, I haven't done a robust compute wrapper before either. I'm doing the same stateless-job kind of thing as I've already done for graphics so far.
With the next-generation APIs there are a few extra hassles with compute -- after a dispatch, you almost always have to submit a barrier, so that the next draw/dispatch call will stall until the preceding compute shader is actually complete.

The same goes for passes that render to a render target, actually. E.g. in a post-processing chain (where each draw reads the results of the previous one), you need barriers after each draw to transition from RT to texture, which has the effect of inserting these necessary stalls.
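For concreteness, the two cases look roughly like this in D3D12 (the resource names are placeholders):

#include <d3d12.h>

void BarrierExamples(ID3D12GraphicsCommandList* cmd,
                     ID3D12Resource* uavBuffer, ID3D12Resource* postFxTarget)
{
    // 1) After a dispatch: make the next draw/dispatch wait for the compute writes.
    D3D12_RESOURCE_BARRIER uav = {};
    uav.Type          = D3D12_RESOURCE_BARRIER_TYPE_UAV;
    uav.UAV.pResource = uavBuffer;
    cmd->ResourceBarrier(1, &uav);

    // 2) Between post-processing draws: finish writing a render target, then read it as a texture.
    D3D12_RESOURCE_BARRIER transition = {};
    transition.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    transition.Transition.pResource   = postFxTarget;
    transition.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    transition.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    transition.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
    cmd->ResourceBarrier(1, &transition);
}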

I think a lot of code is going to get much cleaner as a result of all this. A lot of really gross batching and state management/filtering code is just going to go away.

For simple ports, you might be able to leverage that ugly code :D
In the D3D12 preview from last year, they mentioned that when porting 3DMark, they replaced their traditional state-caching code with a PSO/bundle cache, and still got more than a 2x performance boost over DX11.

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what's become a traditional API usage over the past few years: lots of batching and instancing, in other words.

Stuff that's designed for traditional batching will probably be very well suited to the new "bundle" API.

I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts.

Here's hoping the debuggers are able to detect sync errors. The whole "transition" concept, which is a bit more abstracted than the reality, should help debuggers here. Even if the debugger can just put its hands up and say "you did *something* non-deterministic in that frame", then at least we'll know our app is busted.


- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

You should already be doing that on modern D3D11/GL.
In Ogre 2.1 we use 4 buffer slots:

  1. One for per-pass data
  2. One to store all materials (up to 273 materials per buffer, due to the 64KB-per-constant-buffer restriction)
  3. One to store per-draw data
  4. One tbuffer to store per-draw data (similar to 3, but a tbuffer can hold more data, which is handy when the 64KB restriction gets in the way)

We rarely change any of those slots, not even the per-draw parameters.

The only times we need to rebind buffers are when:

  1. We've exceeded one of the per-draw buffers' sizes (so we bind a new, empty buffer)
  2. We are in a different pass (we need another per-pass buffer)
  3. We have more than 273 materials overall, the previous draw referenced material #0, and the current one references material #280 (so we need to switch the material buffer)
  4. We change to a shader that doesn't use these bindings (very rare).

Point 2 happens very infrequently. Points 3 & 4 can be minimized by sorting by state in a RenderQueue. Point 1 also happens very infrequently, and if you're on GCN the 64KB limit gets upgraded to a 2GB limit, which means you wouldn't need to switch at all (and also solves point #3 entirely).

The bindings as a whole don't really change often, and this property can already be exploited using DX11 and GL4. DX12/Vulkan just make the interface thinner; that's all.
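In plain D3D11 terms, the layout above boils down to something like this sketch (slot numbers and names are mine, not Ogre's actual code):

#include <d3d11.h>

void BindFrameBuffers(ID3D11DeviceContext* ctx,
                      ID3D11Buffer* perPass,                      // b0: per-pass data
                      ID3D11Buffer* materials,                    // b1: all materials packed into 64KB
                      ID3D11Buffer* perDraw,                      // b2: per-draw data
                      ID3D11ShaderResourceView* perDrawTbuffer)   // t0: larger per-draw data
{
    ID3D11Buffer* cbs[3] = { perPass, materials, perDraw };
    ctx->VSSetConstantBuffers(0, 3, cbs);
    ctx->PSSetConstantBuffers(0, 3, cbs);
    ctx->VSSetShaderResources(0, 1, &perDrawTbuffer);
    // Bound once; individual draws then index into 'materials'/'perDraw'
    // instead of rebinding anything.
}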

Edited by Matias Goldberg


 One to store per-draw data 
Do you use some form of indexing into the UBO to fetch the data? I'm currently batching UBO updates (say, fitting as many transforms, lights or materials as I can into one glBufferSubData call) and doing a glUniform1i with an index, then indexing into the UBO to fetch the correct transform. This has the obvious limitation that I need one draw call per object being drawn, to update the index uniform in between, but honestly I'm not sure how else I could do it. And AFAIK it's also how it's done in an nVidia presentation about batching updates.

 

The good thing is that I can usually do batches of 100 to 200 in one buffer-update call; the bad thing is that I have an equivalent number of draw and glUniform1i calls. Bear in mind that I'm using OpenGL 3.3 here, so no multi-draw-indirect stuff :D
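For reference, the pattern I'm describing looks roughly like this (a simplified sketch; u_drawIndex is my uniform name, the UBO is assumed to already be attached to its binding point with glBindBufferBase, and PerDraw has to match the std140 block in the shader):

#include <GL/glcorearb.h>   // function pointers assumed loaded via GLEW/glad/etc.

struct PerDraw { float world[16]; float colour[4]; };   // must match the std140 layout

void DrawBatch(GLuint program, GLuint ubo, GLuint vao,
               const PerDraw* draws, int drawCount, const GLsizei* indexCounts)
{
    // One upload for the whole batch...
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(PerDraw) * drawCount, draws);

    // ...but still one glUniform1i + one draw call per object.
    glUseProgram(program);
    glBindVertexArray(vao);
    const GLint indexLoc = glGetUniformLocation(program, "u_drawIndex");
    for (int i = 0; i < drawCount; ++i)
    {
        glUniform1i(indexLoc, i);   // tells the shader which array element to read
        glDrawElements(GL_TRIANGLES, indexCounts[i], GL_UNSIGNED_INT, nullptr);
        // (per-object index-buffer offsets omitted for brevity)
    }
}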

 

And BTW, marking Promit's post as "Popular" is the understatement of the year (I'd never seen that badge before!). The thing got all the retweets and 300 comments on Reddit. You could sell Promit as an internet traffic attractor if the site is ever low on cash :P


I use the baseInstance parameter of the glDraw*BaseInstance calls (e.g. glDrawElementsInstancedBaseVertexBaseInstance). gl_InstanceID will still be zero-based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra GLSL variable with the value of baseInstance).
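A rough sketch of the setup (attribute slot 7 and the contents of drawIdBuffer, a buffer holding 0..N-1, are my assumptions):

#include <GL/glcorearb.h>   // function pointers assumed loaded via GLEW/glad/etc.

// drawIdBuffer contains consecutive uints 0,1,2,...,N-1.
void SetupDrawIdAttribute(GLuint drawIdBuffer)
{
    glBindBuffer(GL_ARRAY_BUFFER, drawIdBuffer);
    glEnableVertexAttribArray(7);                              // e.g. layout(location = 7) in uint drawId;
    glVertexAttribIPointer(7, 1, GL_UNSIGNED_INT, 0, nullptr);
    glVertexAttribDivisor(7, 1);                               // advances once per instance
}

void DrawWithDrawId(GLsizei indexCount, GLuint drawId)
{
    // gl_InstanceID stays zero-based, but attribute 7 now reads element 'drawId'.
    glDrawElementsInstancedBaseVertexBaseInstance(
        GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr,
        1,          // one instance
        0,          // baseVertex
        drawId);    // baseInstance selects the attribute's starting element
}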
