Seabolt

Vulkan: What are your opinions on DX12/Vulkan/Mantle?



 

If you sign up for the DX12 EAP you can access the source code of the UE4 DX12 implementation.

 

It isn't 'signing up'. It's applying. You have to be approved (I've yet to be approved, sadly).

 

 

Try to "ask" access another time, it worked for me. Anyway, I have to admit that the approval process could be improved a lot.

Edited by Alessio1989


I have no idea what you mean by "try to 'ask' access another time".


I've refrained from replying to this for a few days while letting the recently released information, and its implications, bounce around my head, but I feel just about ready to do so now.

 

I'm really looking forward to programming in this style.

 

I'm aware and accept that there's going to be a substantial upfront investment required, but I think the payoff is going to be worth it.

 

I think a lot of code is going to get much cleaner as a result of all this.  A lot of really gross batching and state management/filtering code is just going to go away.  Things are going to get a lot simpler; once we tackle the challenge of managing (and being responsible for) GPU resources at a lower level, which I think is something that we're largely going to write once and then reuse across multiple projects, programming graphics is going to start being fun again.

 

I think it's going to start becoming a little like the old days of OpenGL; not quite at the level where you could just issue a glBegin/glEnd pair and start experimenting and seeing what kind of cool stuff you could do, but it will become a lot easier to just drop in new code without having to fret excessively about draw call counts, batching, state management, driver overhead, and "is this effect slow because it's slow, or is it slow because I've hit a slow path in the driver and I need to go back and rearchitect?"  That's really going to open up a lot of possibilities for people to start going nuts.

 

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what has become traditional API usage over the past few years: lots of batching and instancing, in other words. I have one project, using D3D11, that I think I would probably have to rewrite from scratch (I probably won't bother). On the other hand, I have another, using a FrankenGL version, that I think will come over quite a bit more cleanly. That's going to be quite cool and fun to do.

 

So unless I've got things badly wrong about all of this, I'm really stoked about the prospects.


I will not go into explicit details (the detailed information should still be under NDA), but the second feature level looks tailor-made for one particular piece of hardware (guess which!). Moreover, FL 12.1 does not require some really interesting features (a higher conservative rasterization tier, volume tiled resources, and even resource binding tier 3) that you would expect to be mandatory in future hardware. In substance, FL 12.1 really breaks the concept of a feature level in my view, which was a sort of "barrier" that defined new capabilities for upcoming hardware.

So you have feature level 12.0 for mainstream hardware, older feature levels for old/low-end hardware, and 12.1 for "a certain particular hardware" and most foreseeable future hardware. How is this a problem? Clearly, if 12.1 is so similar to 12.0, 12.0 is the main target and you won't be writing much special case code for 12.1.

Edited by LorenzoGatti

I have no idea what you mean by "try to 'ask' access another time".

 

 

Try filling out the form a second time: http://aka.ms/dxeap

 

 


It's not "a problem" per se; I'm just saying I expected FL 12.1 to be a feature level for future hardware with more interesting and radical requirements (e.g. mandatory support for 3D tiled resources, a higher tier of conservative rasterization, standard swizzle, tier 3 resource binding... and, what the hell, even PS stencil ref is still optional). FL 12.0 and 12.1 are nearly identical except for ROVs (probably the most valuable requirement of FL 12.1) and conservative rasterization tier 1 (which is of little use beyond anti-aliasing).

I'm not saying anything else. With D3D12 you can still target every feature level you want (even 10Level9) and query for every single new hardware feature individually (e.g. you can use ROVs on an FL 11.0 GPU if they are supported by the hardware/driver).
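For illustration, a rough sketch of such a per-feature query through ID3D12Device::CheckFeatureSupport (error handling trimmed; the device is assumed to already exist):

    #include <d3d12.h>

    // Query an optional feature independently of the device's feature level.
    // 'device' is assumed to be a valid ID3D12Device created elsewhere.
    bool SupportsROVs(ID3D12Device* device)
    {
        D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
        if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                                  &options, sizeof(options))))
        {
            // Reported per device, so an FL 11.0 GPU may still expose ROVs.
            return options.ROVsSupported != FALSE;
        }
        return false;
    }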

Edited by Alessio1989


 

 

 

 


Try filling out the form a second time: http://aka.ms/dxeap

 

I've submitted the form at least three times. At this point, I've given up.


- Memory residency management. The presenters were talking along the lines of the developers being responsible for loading/unloading graphics resources from VRAM to System Memory whenever the loads are getting too high. This should be an edge case but it's still an entirely new engine feature.

Yeah it's going to be interesting to see what solutions different engines end up using here.
The simplest thing I can think of is to maintain a Set<Resource*> alongside every command buffer. Whenever you bind a resource, add it to the set. When submitting the command buffer, you can first use that set to notify Windows of the VRAM regions that are required to be resident.

The fail case there is when that residency request is too big... As you're building the command buffer, you'd have to keep track of an estimate of the VRAM residency requirement, and if it gets too big, finish the current command buffer and start a new one.
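As a rough sketch of that idea (the types here are made up for illustration, not from any real engine):

    #include <unordered_set>
    #include <cstddef>

    // Hypothetical resource wrapper; a real engine would hold the API object
    // plus the heap/allocation it lives in.
    struct Resource { std::size_t sizeInBytes; };

    struct CommandBufferRecorder
    {
        std::unordered_set<Resource*> usedResources;   // de-duplicated bind tracking
        std::size_t estimatedResidencyBytes = 0;

        void OnBind(Resource* r)
        {
            // Only count each resource once per command buffer.
            if (usedResources.insert(r).second)
                estimatedResidencyBytes += r->sizeInBytes;
        }

        bool ExceedsBudget(std::size_t budgetBytes) const
        {
            // If the estimate gets too big, close this command buffer and start a new one.
            return estimatedResidencyBytes > budgetBytes;
        }
    };

    // At submit time, walk usedResources and ask the OS to make the distinct heaps
    // resident (e.g. ID3D12Device::MakeResident) before executing the list.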


- Secondary threads for resource loading/shader compilation. This is actually a really good thing that I'm excited for, but it does mean I need to change my render thread to start issuing and maintaining new jobs. It's necessary, and for the greater good, but another task nonetheless.

If you're using D3D11, you can start working on it now.
If you're on GL, you can start doing it for buffers/textures via context resource sharing... But it's potentially a lot of GL-specific code that you're not going to need in your new engine.
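For example, on D3D11 resource creation through ID3D11Device is free-threaded (only the immediate context isn't), so a minimal worker-thread loader could look something like this sketch (lifetime management omitted):

    #include <d3d11.h>
    #include <thread>

    // ID3D11Device create calls are thread-safe; only ID3D11DeviceContext use has
    // to stay on one thread. Sketch only: the pixel data behind initialData.pSysMem
    // and the outTexture pointer must stay alive until the thread finishes.
    void LoadTextureAsync(ID3D11Device* device,
                          D3D11_TEXTURE2D_DESC desc,
                          D3D11_SUBRESOURCE_DATA initialData,
                          ID3D11Texture2D** outTexture)
    {
        std::thread([=]
        {
            // Safe to call from a secondary thread.
            device->CreateTexture2D(&desc, &initialData, outTexture);
            // Hand the finished resource back to the render thread via your own queue.
        }).detach();
    }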

- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

Yeah if you can give frequency hints in your shader code, it might make your life easier.

When compiling a shader, I imagine you'd first try to fit all of its parameters into the root, and then fall back to other strategies if they don't fit.

The simplest strategy is putting everything required for your shader into a single big descriptor set, and having the root just contain the link to that set. I imagine a lot of people might start with something like that to begin with.
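A rough D3D12 sketch of that single-table fallback (register counts are illustrative, error handling trimmed):

    #include <d3d12.h>

    // Build a root signature whose single parameter is one descriptor table
    // covering all of the shader's SRVs and CBVs.
    ID3D12RootSignature* CreateMonolithicRootSignature(ID3D12Device* device)
    {
        D3D12_DESCRIPTOR_RANGE ranges[2] = {};
        ranges[0].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
        ranges[0].NumDescriptors = 8;                                    // t0-t7
        ranges[0].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;
        ranges[1].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
        ranges[1].NumDescriptors = 4;                                    // b0-b3
        ranges[1].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;

        D3D12_ROOT_PARAMETER table = {};
        table.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
        table.DescriptorTable.NumDescriptorRanges = 2;
        table.DescriptorTable.pDescriptorRanges = ranges;

        D3D12_ROOT_SIGNATURE_DESC desc = {};
        desc.NumParameters = 1;
        desc.pParameters = &table;

        ID3DBlob* blob = nullptr;
        ID3D12RootSignature* rootSig = nullptr;
        if (SUCCEEDED(D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1,
                                                  &blob, nullptr)))
        {
            device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(),
                                        __uuidof(ID3D12RootSignature),
                                        reinterpret_cast<void**>(&rootSig));
            blob->Release();
        }
        // Binding at draw time is then a single SetGraphicsRootDescriptorTable call.
        return rootSig;
    }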

I don't have an update-frequency hinting feature, but my shader system does already group texture/buffer bindings together into "ResourceLists".
e.g. A DX11 shader might have material data in slots t0/t1/t2 and a shadowmap in t3. In the shader code, I declare a ResourceList containing the 3 material textures, and a 2nd ResourceList containing the shadowmap.
The user can't bind individual resources to my shader, they can only bind entire ResourceLists.
I imagine that on D3D12, these ResourceLists can actually just be DescriptorSets, and the root can just point to them.
So, not describing frequency, but at least describing which bindings are updated together.

I'll also be adding in architecture for Compute Shaders for the first time, so I'm worried that I might be biting off too much at once.

Yeah, I haven't done a robust compute wrapper before either. I'm doing the same stateless-job kind of thing as I've already done for graphics so far.
With the next-generation APIs, there are a few extra hassles with compute -- after a dispatch, you almost always have to submit a barrier, so that the next draw/dispatch call will stall until the preceding compute shader is actually complete.

Same goes for passes that render to a render target, actually. E.g. in a post-processing chain (where each draw reads the results of the previous one) you need barriers after each draw to transition from render target to texture, which has the effect of inserting these necessary stalls.
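In D3D12 terms, those two cases might look roughly like this (illustrative sketch):

    #include <d3d12.h>

    // After a dispatch that writes a UAV, a UAV barrier makes the next
    // draw/dispatch wait for the compute work to finish.
    void BarrierAfterCompute(ID3D12GraphicsCommandList* cmd, ID3D12Resource* uavResource)
    {
        D3D12_RESOURCE_BARRIER b = {};
        b.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
        b.UAV.pResource = uavResource;
        cmd->ResourceBarrier(1, &b);
    }

    // After rendering to a target that the next pass samples, transition it from
    // render target to pixel-shader resource (this is where the stall lives).
    void RenderTargetToTexture(ID3D12GraphicsCommandList* cmd, ID3D12Resource* rt)
    {
        D3D12_RESOURCE_BARRIER b = {};
        b.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
        b.Transition.pResource = rt;
        b.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
        b.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
        b.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
        cmd->ResourceBarrier(1, &b);
    }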

I think a lot of code is going to get much cleaner as a result of all this. A lot of really gross batching and state management/filtering code is just going to go away.

For simple ports, you might be able to leverage that ugly code :D
In the D3D12 preview from last year, they mentioned that when porting 3DMark, they replaced their traditional state-caching code with a PSO/bundle cache, and still got more than a 2x performance boost over DX11.
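A PSO cache along those lines can be as simple as this rough sketch (the hash is computed by the caller, and error handling is omitted):

    #include <d3d12.h>
    #include <cstdint>
    #include <unordered_map>

    // Replace per-draw state filtering with a cache of compiled pipeline state
    // objects keyed on a hash of the full description.
    struct PsoCache
    {
        std::unordered_map<std::uint64_t, ID3D12PipelineState*> cache;

        // 'descHash' is computed by the caller over everything in 'desc'.
        ID3D12PipelineState* GetOrCreate(ID3D12Device* device,
                                         const D3D12_GRAPHICS_PIPELINE_STATE_DESC& desc,
                                         std::uint64_t descHash)
        {
            auto it = cache.find(descHash);
            if (it != cache.end())
                return it->second;

            ID3D12PipelineState* pso = nullptr;
            device->CreateGraphicsPipelineState(&desc, __uuidof(ID3D12PipelineState),
                                                reinterpret_cast<void**>(&pso));
            cache.emplace(descHash, pso);   // real code would handle creation failure
            return pso;
        }
    };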

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what has become traditional API usage over the past few years: lots of batching and instancing, in other words.

Stuff that's designed for traditional batching will probably be very well suited to the new "bundle" API.
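For reference, recording and replaying a D3D12 bundle is roughly this simple (illustrative sketch; the allocator is assumed to have been created with the bundle type):

    #include <d3d12.h>

    // Record a small, fixed sequence of state + draws once into a bundle,
    // then replay it cheaply from the direct command list every frame.
    ID3D12GraphicsCommandList* RecordBundle(ID3D12Device* device,
                                            ID3D12CommandAllocator* bundleAllocator,
                                            ID3D12PipelineState* pso,
                                            ID3D12RootSignature* rootSig)
    {
        ID3D12GraphicsCommandList* bundle = nullptr;
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE, bundleAllocator, pso,
                                  __uuidof(ID3D12GraphicsCommandList),
                                  reinterpret_cast<void**>(&bundle));
        bundle->SetGraphicsRootSignature(rootSig);
        bundle->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
        bundle->DrawInstanced(36, 1, 0, 0);   // illustrative draw
        bundle->Close();
        return bundle;
    }

    // Per frame, on the direct command list: directList->ExecuteBundle(bundle);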

I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts.

Here's hoping the debuggers are able to detect sync errors. The whole "transition" concept, which is a bit more abstracted than the reality, should help debuggers here. Even if the debugger can just put its hands up and say "you did *something* non-deterministic in that frame", then at least we'll know our app is busted.


- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

You should already be doing that on modern D3D11/GL.
In Ogre 2.1 we use 4 buffer slots:

  1. One for per-pass data
  2. One to store all materials (up to 273 materials per buffer due to the 64kb per const buffer restriction)
  3. One to store per-draw data
  4. One tbuffer to store per-draw data (similar to 3, but as a tbuffer it can hold more data, which is handy where the 64kb restriction would get in the way)

We don't really change any of those slots; not even the per-draw parameters.

The only times we need to rebind buffers are when:

  1. We've exceeded one of the per-draw buffers size (so we bind a new empty buffer)
  2. We are in a different pass (we need another per-pass buffer)
  3. We have more than 273 materials overall and the previous draw referenced material #0 while the current one references material #280 (so we need to switch the material buffer)
  4. We change to a shader that doesn't use these bindings (very rare).

Point 2 happens very infrequently. Points 3 & 4 can be minimized by sorting by state in a RenderQueue. Point 1 happens very infrequently too, and if you're on GCN the 64kb limit gets upgraded to a 2GB limit, which means you wouldn't need to switch at all (and it also solves point #3 entirely).

The bindings as a whole don't really change often, and this property can already be exploited using DX11 and GL4. DX12/Vulkan just make the interface thinner; that's all.
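A rough D3D11 sketch of that layout (slot numbers are illustrative, not Ogre's actual assignments):

    #include <d3d11.h>

    // "Bind once, rarely rebind": per-pass, material-pool and per-draw buffers
    // live in fixed slots; draws index into them instead of rebinding.
    void BindFrameBuffers(ID3D11DeviceContext* ctx,
                          ID3D11Buffer* perPass,                    // b0: per-pass data
                          ID3D11Buffer* materials,                  // b1: pool of materials
                          ID3D11Buffer* perDraw,                    // b2: per-draw data
                          ID3D11ShaderResourceView* perDrawTbuffer) // t0: larger per-draw pool
    {
        ID3D11Buffer* cbs[3] = { perPass, materials, perDraw };
        ctx->VSSetConstantBuffers(0, 3, cbs);
        ctx->PSSetConstantBuffers(0, 3, cbs);
        ctx->VSSetShaderResources(0, 1, &perDrawTbuffer);
        // The draws that follow index into these pools; the bindings only change
        // when a pool fills up, the pass changes, or the material range is exceeded.
    }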

Edited by Matias Goldberg


 One to store per-draw data 
Do you use some form of indexing into the UBO to fetch the data? I'm currently batching UBO updates (say, fitting as many transforms, lights or materials as I can into one glBufferSubData call) and doing a glUniform1i with an index, then indexing into the UBO to fetch the correct transform. This has the obvious limitation that I need one draw call per object being drawn, to update the index uniform in between, but honestly I'm not sure how else I could do it. And AFAIK it's also how it's done in an NVIDIA presentation about batching updates.

 

The good thing is that I can usually do batches of 100 to 200 objects in one buffer update call; the bad thing is that I have an equivalent number of draw and glUniform1i calls. Keep in mind that I'm using OpenGL 3.3 here, so no multi-draw-indirect stuff :D
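For anyone following along, the pattern being described is roughly this (GL 3.3 sketch with made-up sizes; the loader header and shader declarations are assumptions):

    #include <GL/glew.h>   // assumption: GLEW (or any loader) provides the GL 3.3 entry points

    // One glBufferSubData for the whole batch, then one glUniform1i + draw per object.
    // The shader is assumed to declare:  uniform Transforms { mat4 transforms[256]; };
    //                                    uniform int drawIndex;
    // and the UBO is assumed to already be attached to that block via glBindBufferBase.
    void DrawBatch(GLuint ubo, GLint drawIndexLocation,
                   const float* transforms, int objectCount, GLsizeiptr bytesPerObject,
                   GLsizei indexCountPerObject)
    {
        glBindBuffer(GL_UNIFORM_BUFFER, ubo);
        glBufferSubData(GL_UNIFORM_BUFFER, 0, bytesPerObject * objectCount, transforms);

        for (int i = 0; i < objectCount; ++i)
        {
            glUniform1i(drawIndexLocation, i);   // which array element this draw reads
            glDrawElements(GL_TRIANGLES, indexCountPerObject, GL_UNSIGNED_SHORT, nullptr);
        }
    }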

 

And BTW, marking Promit's post as "Popular" is the understatement of the year (I had never seen that badge before!). The thing got all the retweets and 300 comments on Reddit. You could sell Promit as an internet traffic attractor if the site is ever low on cash :P


I use the baseInstance parameter of glDraw*InstancedBaseVertexBaseInstance. gl_InstanceID will still be zero-based, but you can use an instanced vertex element to work around that (or use an extension that exposes an extra GLSL variable with the value of baseInstance).
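Roughly like this (GL 4.2 / ARB_base_instance sketch; names are illustrative):

    #include <GL/glew.h>   // assumption: a loader exposing GL 4.2 / ARB_base_instance

    // An instanced integer vertex attribute sources element 'baseInstance' when
    // drawing a single instance, so it effectively becomes a per-draw index even
    // though gl_InstanceID stays zero-based.
    void SetupDrawIdAttribute(GLuint drawIdBuffer, GLuint attribLocation)
    {
        // drawIdBuffer holds 0, 1, 2, ... as GLints.
        glBindBuffer(GL_ARRAY_BUFFER, drawIdBuffer);
        glVertexAttribIPointer(attribLocation, 1, GL_INT, sizeof(GLint), nullptr);
        glVertexAttribDivisor(attribLocation, 1);
        glEnableVertexAttribArray(attribLocation);
    }

    void DrawWithObjectIndex(GLsizei indexCount, GLuint objectIndex)
    {
        glDrawElementsInstancedBaseVertexBaseInstance(
            GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, nullptr,
            1 /*instance count*/, 0 /*base vertex*/, objectIndex /*base instance*/);
    }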

