
Vulkan What are your opinions on DX12/Vulkan/Mantle?

On the flip side, I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts. It's hard enough dealing with that for one hardware configuration, so it's a little scary to imagine what could happen for PC games that have to run on everything. Hopefully there will be some good debugging/validation functionality available for tracking this down, otherwise we will probably end up with drivers automatically inserting sync points to prevent corruption (and/or removing unnecessary syncs for better performance). Either way, beginners are probably in for a rough time. 

Don't worry, a variety of shipping professional games will somehow make a complete mess of it in final build too.

Edited by Promit


New debugging tools are coming: https://channel9.msdn.com/Events/GDC/GDC-2015/Solve-the-Tough-Graphics-Problems-with-your-Game-Using-DirectX-Tools


Slightly off-topic, but I'm starting a new engine before Vulkan or D3D12 are released. Any pointers on how I can prepare my rendering pipeline architecture so that when they are released, I can use them efficiently? I'm planning to start with D3D11 and OpenGL 4.5.


If you sign up for the DX12 EAP, you can access the source code of the UE4 DX12 implementation.

 

It isn't 'signing up'. It's applying. You have to be approved (I've yet to be approved, sadly).


 


Try to "ask" access another time, it worked for me. Anyway, I have to admit that the approval process could be improved a lot.

Edited by Alessio1989


 

 


I have no idea what you mean by "try to 'ask' access another time".


I've refrained from replying to this for a few days while I've been letting the information that's recently come out, and the implications of it, bounce around my head for a bit, but feel roundabout ready to do so now.

 

I'm really looking forward to programming in this style.

 

I'm aware and accept that there's going to be a substantial upfront investment required, but I think the payoff is going to be worth it.

 

I think a lot of code is going to get much cleaner as a result of all this.  A lot of really gross batching and state management/filtering code is just going to go away.  Things are going to get a lot simpler; once we tackle the challenge of managing (and being responsible for) GPU resources at a lower level, which I think is something that we're largely going to write once and then reuse across multiple projects, programming graphics is going to start being fun again.

 

I think it's going to start becoming a little like the old days of OpenGL; not quite at the level where you could just issue a glBegin/glEnd pair and start experimenting and seeing what kind of cool stuff you could do, but it will become a lot easier to just drop in new code without having to fret excessively about draw call counts, batching, state management, driver overhead, and "is this effect slow because it's slow, or is it slow because I've hit a slow path in the driver and I need to go back and rearchitect?"  That's really going to open up a lot of possibilities for people to start going nuts.

 

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what's become a traditional API usage over the past few years: lots of batching and instancing, in other words.  I have one project, using D3D11, that I think I would probably have to rewrite from scratch (I probably won't bother).  On the other hand, I have another, using a FrankenGL version, that I think will come over quite a bit more cleanly.  That's going to be quite cool and fun to do.

 

So unless I've got things badly wrong about all of this, I'm really stoked about the prospects.


I will not go into explicit details (detailed information should still be under NDA); however, the second feature level looks tailor-made for a certain particular hardware (guess what!). Moreover, FL 12.1 does not require some really interesting features (a higher conservative rasterization tier, volume tiled resources and even resource binding tier 3) that you would expect to be mandatory on future hardware. In essence, FL 12.1 really breaks the concept of a feature level in my view, which was a sort of "barrier" that defined new capabilities for upcoming hardware.

So you have feature level 12.0 for mainstream hardware, older feature levels for old/low-end hardware, and 12.1 for "a certain particular hardware" and most foreseeable future hardware. How is this a problem? Clearly, if 12.1 is so similar to 12.0, 12.0 is the main target and you won't be writing much special case code for 12.1.

Edited by LorenzoGatti


 

 

 


I have no idea what you mean by "try to 'ask' access another time".

 

 

Try filling out the form a second time: http://aka.ms/dxeap

 

 

So you have feature level 12.0 for mainstream hardware, older feature levels for old/low-end hardware, and 12.1 for "a certain particular hardware" and most foreseeable future hardware. How is this a problem? Clearly, if 12.1 is so similar to 12.0, 12.0 is the main target and you won't be writing much special case code for 12.1.

 

 

It's not "a problem" per se; I'm just saying I expected to see a feature level for future hardware with more interesting and radical requirements, which is what FL 12.1 could have been (e.g., mandatory support for 3D tiled resources, a higher tier of conservative rasterization and standard swizzle, tier 3 resource binding... and what the hell, even PS stencil ref is still optional). FL 12.0 and 12.1 are nearly identical except for ROVs (probably the most valuable requirement of FL 12.1) and conservative rasterization tier 1 (which is useless except for anti-aliasing).

I'm not saying anything else. With D3D12 you can still target every feature level you want (even 10Level9) and query for every single new hardware feature (e.g., you can use ROVs on an FL 11.0 GPU if they are supported by the hardware/driver).

Edited by Alessio1989


 

 

 

 


 

Try filling out the form a second time: http://aka.ms/dxeap

 

I've submitted the form at least three times. At this point, I've given up.


- Memory residency management. The presenters were talking along the lines of the developers being responsible for loading/unloading graphics resources from VRAM to System Memory whenever the loads are getting too high. This should be an edge case but it's still an entirely new engine feature.

Yeah it's going to be interesting to see what solutions different engines end up using here.
The simplest thing I can think of is to maintain a Set<Resource*> alongside every command buffer. Whenever you bind a resource, add it to the set. When submitting the command buffer, you can first use that set to notify Windows of the VRAM regions that are required to be resident.

The fail case there is when that residency request is too big... As you're building the command buffer, you'd have to keep track of an estimate of the VRAM residency requirement, and if it gets too big, finish the current command buffer and start a new one.
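A rough sketch of that bookkeeping, using nothing API-specific: `Resource`, `CommandBuffer`, `CommandBufferBuilder` and the byte budget are all hypothetical names invented for illustration, and the real residency request to the OS is left out entirely. It only shows the "set per command buffer, split when the estimate gets too big" idea.

```cpp
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU resource; only its size matters here.
struct Resource {
    std::uint64_t sizeInBytes;
};

// One recorded command buffer plus the set of resources it references.
struct CommandBuffer {
    std::unordered_set<const Resource*> referenced;
    std::uint64_t estimatedBytes = 0;
};

class CommandBufferBuilder {
public:
    // budgetBytes is an assumed per-submission residency budget.
    explicit CommandBufferBuilder(std::uint64_t budgetBytes) : budget_(budgetBytes) {}

    // Call whenever a resource is bound for an upcoming draw/dispatch.
    void bind(const Resource& r) {
        if (current_.referenced.count(&r) != 0) {
            return;  // already counted for this command buffer
        }
        // If this resource would push a non-empty buffer over budget,
        // close the current buffer and start a new one.
        if (!current_.referenced.empty() &&
            current_.estimatedBytes + r.sizeInBytes > budget_) {
            finishCurrent();
        }
        current_.referenced.insert(&r);
        current_.estimatedBytes += r.sizeInBytes;
    }

    // Closes the buffer being built and returns everything recorded so far.
    std::vector<CommandBuffer> finish() {
        if (!current_.referenced.empty()) {
            finishCurrent();
        }
        return std::exchange(finished_, {});
    }

private:
    void finishCurrent() {
        finished_.push_back(std::move(current_));
        current_ = CommandBuffer{};
    }

    std::uint64_t budget_;
    CommandBuffer current_;
    std::vector<CommandBuffer> finished_;
};
```

With a budget of 100 and resources of 60, 30 and 50 bytes bound in that order, the first two land in one command buffer and the third starts a second one. A single resource bigger than the whole budget still gets a buffer of its own here, so that case would need separate handling.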


- Secondary threads for resource loading/shader compilation. This is actually a really good thing that I'm excited for, but it does mean I need to change my render thread to start issuing new jobs and maintaining. It's necessary, and for the better good, but another task nonetheless.

If you're using D3D11, you can start working on it now.
If you're on GL, you can start doing it for buffers/textures via context resource sharing... But it's potentially a lot of GL-specific code that you're not going to need in your new engine.

- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

Yeah if you can give frequency hints in your shader code, it might make your life easier.

When compiling a shader, I imagine you'd first try to fit all of its parameters into the root, and then fall back to other strategies if they don't fit.

The simplest strategy is putting everything required for your shader into a single big descriptor set, and having the root just contain the link to that set. I imagine a lot of people might start with something like that to begin with.

I don't have an update-frequency hinting feature, but my shader system does already group texture/buffer bindings together into "ResourceLists".
e.g. A DX11 shader might have material data in slots t0/t1/t2 and a shadowmap in t3. In the shader code, I declare a ResourceList containing the 3 material textures, and a 2nd ResourceList containing the shadowmap.
The user can't bind individual resources to my shader, they can only bind entire ResourceLists.
I imagine that on D3D12, these ResourceLists can actually just be DescriptorSets, and the root can just point out to them.
So, not describing frequency, but at least describing which bindings are updated together.

I'll also be adding in architecture for Compute Shaders for the first time, so I'm worried that I might be biting off too much at once.

Yeah, I haven't done a robust compute wrapper before either. I'm doing the same stateless job kinda thing as I've already done for graphics so far.
With the next generation APIs, there's a few extra hassles with compute -- after a dispatch, you almost always have to submit a barrier, so that the next draw/dispatch call will stall until the preceding compute shader is actually complete.

Same goes for passes that render to a render target, actually. E.g. in a post-processing chain (where each draw reads the results from the previous one) you need barriers after each draw to transition from RT to texture, which has the effect of inserting these necessary stalls.
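To make the post-processing case concrete, here is a minimal sketch of a state tracker that records a transition only when a resource's required state changes. `ResourceState`, `Barrier` and `StateTracker` are made-up names, not types from D3D12 or Vulkan, and a real implementation would hand the recorded barriers to the API instead of storing them in a vector.

```cpp
#include <unordered_map>
#include <vector>

// Simplified, hypothetical resource states.
enum class ResourceState { RenderTarget, ShaderResource };

// A recorded transition: resource `resourceId` goes from `before` to `after`.
struct Barrier {
    int resourceId;
    ResourceState before;
    ResourceState after;
};

class StateTracker {
public:
    // Registers a resource with its initial state.
    void declare(int resourceId, ResourceState initial) {
        states_[resourceId] = initial;
    }

    // Called before each use; records a barrier only on a state change.
    void require(int resourceId, ResourceState needed) {
        ResourceState& current = states_.at(resourceId);
        if (current != needed) {
            barriers_.push_back(Barrier{resourceId, current, needed});
            current = needed;
        }
    }

    const std::vector<Barrier>& barriers() const { return barriers_; }

private:
    std::unordered_map<int, ResourceState> states_;
    std::vector<Barrier> barriers_;
};
```

For a three-pass ping-pong chain (render to A; read A, render to B; read B, render to A) this records three transitions. Back-to-back compute passes that keep a buffer in the same state but still depend on each other are not covered by this state-change test and would need an explicit barrier of their own.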

I think a lot of code is going to get much cleaner as a result of all this. A lot of really gross batching and state management/filtering code is just going to go away.

For simple ports, you might be able to leverage that ugly code :D
In the D3D12 preview from last year, they mentioned that when porting 3DMark, they replaced their traditional state-caching code with a PSO/bundle cache, and still got more than a 2x performance boost over DX11.
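A PSO cache of that kind can be as simple as a hash map keyed on the full pipeline description. The sketch below is only a guess at the general shape, not what 3DMark actually did: `PipelineDesc`, `Pipeline` and `PipelineCache` are hypothetical, and "compiling" is faked by handing out an integer handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <unordered_map>

// Hypothetical, heavily simplified pipeline description.
struct PipelineDesc {
    std::uint32_t shaderId;
    std::uint32_t blendMode;
    std::uint32_t depthMode;
    std::uint32_t rasterMode;

    bool operator==(const PipelineDesc& o) const {
        return shaderId == o.shaderId && blendMode == o.blendMode &&
               depthMode == o.depthMode && rasterMode == o.rasterMode;
    }
};

// Combines the four fields into one hash value.
struct PipelineDescHash {
    std::size_t operator()(const PipelineDesc& d) const {
        std::size_t h = 0;
        for (std::uint32_t v : {d.shaderId, d.blendMode, d.depthMode, d.rasterMode}) {
            h ^= std::hash<std::uint32_t>{}(v) + 0x9e3779b9u + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Stand-in for a compiled pipeline state object.
struct Pipeline {
    int handle;
};

class PipelineCache {
public:
    // Returns the cached pipeline, "compiling" it on first use.
    const Pipeline& get(const PipelineDesc& desc) {
        auto it = cache_.find(desc);
        if (it == cache_.end()) {
            it = cache_.emplace(desc, Pipeline{nextHandle_++}).first;
            ++compiles_;
        }
        return it->second;
    }

    std::size_t compileCount() const { return compiles_; }

private:
    std::unordered_map<PipelineDesc, Pipeline, PipelineDescHash> cache_;
    int nextHandle_ = 0;
    std::size_t compiles_ = 0;
};
```

Existing state-filtering code can then stay more or less as it is and just build a `PipelineDesc` per draw; asking for the same description twice returns the same object without a second compile.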

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what's become a traditional API usage over the past few years: lots of batching and instancing, in other words.

Stuff that's designed for traditional batching will probably be very well suited to the new "bundle" API.

I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts.

Here's hoping the debuggers are able to detect sync errors. The whole "transition" concept, which is a bit more abstracted than the reality, should help debuggers here. Even if the debugger can just put its hands up and say "you did *something* non-deterministic in that frame", then at least we'll know our app is busted.


- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

You should already be doing that on modern D3D11/GL.
In Ogre 2.1 we use 4 buffer slots:

  1. One for per-pass data
  2. One to store all materials (up to 273 materials per buffer due to the 64kb per const buffer restriction)
  3. One to store per-draw data
  4. One tbuffer to store per-draw data (similar to 3. but it's a tbuffer which stores more data where not having the 64kb restriction is handy)

Of all those slots, we don't really change them. Even the per-draw parameters.

The only times we need to rebind buffers are when:

  1. We've exceeded the size of one of the per-draw buffers (so we bind a new empty buffer)
  2. We are in a different pass (we need another per-pass buffer)
  3. We have more than 273 materials overall and previous draw referenced material #0 and the current one is referencing material #280 (so we need to switch the material buffer)
  4. We change to a shader that doesn't use these bindings (very rare).

Point 2 happens very infrequently. Point 3 & 4 can be minimized by sorting by state in a RenderQueue. Point 1 happens very infrequently too, and if you're on GCN the 64kb limit gets upgraded to 2GB limit, which means you wouldn't need to switch at all (and also solves point #3 entirely).

The bindings as a whole don't really change often, and this property can already be exploited using DX11 and GL4. DX12/Vulkan just make the interface thinner; that's all.
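The arithmetic behind the material buffer can be sketched as follows. The 240-byte material size is an assumption chosen only because it is consistent with 273 materials per 64 KB buffer; the real struct size isn't stated here. `locate` and `countMaterialRebinds` are illustrative helpers, not Ogre functions.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kConstBufferBytes = 64 * 1024;  // 64 KB const buffer limit
constexpr std::size_t kMaterialBytes = 240;           // ASSUMPTION: not from the post
constexpr std::size_t kMaterialsPerBuffer = kConstBufferBytes / kMaterialBytes;
static_assert(kMaterialsPerBuffer == 273,
              "240-byte materials give 273 per 64 KB buffer");

// Which material buffer a material lives in, and where inside it.
struct MaterialSlot {
    std::size_t buffer;
    std::size_t index;
};

constexpr MaterialSlot locate(std::size_t materialId) {
    return MaterialSlot{materialId / kMaterialsPerBuffer,
                        materialId % kMaterialsPerBuffer};
}

// Counts how many times the material buffer binding has to change
// (including the initial bind) for draws issued in the given order.
inline std::size_t countMaterialRebinds(const std::vector<std::size_t>& materialIds) {
    std::size_t rebinds = 0;
    bool anyBound = false;
    std::size_t bound = 0;
    for (std::size_t id : materialIds) {
        const std::size_t needed = locate(id).buffer;
        if (!anyBound || needed != bound) {
            bound = needed;
            anyBound = true;
            ++rebinds;
        }
    }
    return rebinds;
}
```

Material #0 lands in buffer 0 and material #280 in buffer 1, which is the switch described in point 3. Drawing materials in the order 0, 280, 1, 281 costs four binds, while the sorted order 0, 1, 280, 281 costs two, which is the effect of sorting by state in the RenderQueue.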

Edited by Matias Goldberg


 One to store per-draw data 
Do you use some form of indexing into the UBO to fetch the data? I'm currently batching UBO updates (say, fitting as many transforms, lights or materials as I can into one glBufferSubData call), then doing a glUniform1i with an index and indexing into the UBO to fetch the correct transform. This has the obvious limitation that I need one draw call per object being drawn so I can update the index uniform in between, but honestly I'm not sure how else I could do that. And AFAIK it's also how it's done in an NVIDIA presentation about batching updates.

 

The good thing is that I can usually do batches of 100 to 200 in one buffer update call; the bad thing is that I have an equivalent number of draw and glUniform1i calls. Keep in mind that I'm using OpenGL 3.3 here, so no multi draw indirect stuff :D

 

And BTW, marking Promit's post as "Popular" is the understatement of the year (I never saw that badge before!). Thing hit like all the retweets and 300 comments on Reddit. You could sell Promit as internet traffic attractor if the site is low on cash :P


I use baseInstance parameter from glDraw*BaseInstanceBaseVertex. gl_InstanceID will still be zero based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra glsl variable with the value of baseInstance)


 

 

You should already be doing that on modern D3D11/GL.

 

That's true, and I am ashamed to say I stuck too closely to the DX9 port of my engine where I didn't have nearly as much register space and needed to swap things around on a per-draw basis at times.

 

Scrapping all of that now though and moving forward with DX11 and OGL 4.x and porting in DX12 and Vulkan when they are more public. 

You guys have assuaged most of my fears about the ports though :)

 


Edit: Said something stupid, sorry about that :)

Edited by AlexPol


 


 

 

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws? IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

 

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

Edited by Ameise

I use baseInstance parameter from glDraw*BaseInstanceBaseVertex. gl_InstanceID will still be zero based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra glsl variable with the value of baseInstance)

And what if you're drawing two different meshes? ie, not instancing a single mesh.

 

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws? IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

I have no idea about D3D11, but prolly isn't even necessary. Just update the entire buffer in one call. Buffer is defined as an array of structs, index into that to fetch the one that corresponds to the current thing being drawn.

Edited by TheChubu

And what if you're drawing two different meshes? ie, not instancing a single mesh.

 

1 is a valid value for the instance count.


1 is a valid value for the instance count.
Of course but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. No sense doing it if you can only index one thing (ie, you end up what I am doing, one glDraw and glUniform1i call per mesh drawn).


 

Of course but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. No sense doing it if you can only index one thing (ie, you end up what I am doing, one glDraw and glUniform1i call per mesh drawn).

 

 

The ID comes from the instance data, if I understand correctly, and not from gl_InstanceID. The ID is different for two different instances, and a different mesh is a different instance.

 

Think of this as 2 buffers: one is the instance buffer, which contains only IDs; the other is the vertex buffer.

A first draw call would use 10 instances from the instance buffer, starting from BaseInstance 0.
A second draw call would use 1 instance from the instance buffer, starting from BaseInstance 10.

 

So if you put the IDs in your instance buffer in ascending order, for example, all the IDs will be different.
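Here is a CPU-side sketch of that layout, with no GL calls at all. `DrawCall`, `assignBaseInstances`, `makeInstanceIdBuffer` and `fetchDrawId` are made-up helpers; `fetchDrawId` only mimics how an instanced vertex attribute is offset by the base instance, which is the behaviour this approach relies on.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// What each draw needs to pass to the draw call.
struct DrawCall {
    std::uint32_t instanceCount;
    std::uint32_t baseInstance;
};

// Gives each draw a contiguous, non-overlapping range of the instance buffer.
std::vector<DrawCall> assignBaseInstances(const std::vector<std::uint32_t>& instanceCounts) {
    std::vector<DrawCall> draws;
    std::uint32_t next = 0;
    for (std::uint32_t count : instanceCounts) {
        draws.push_back(DrawCall{count, next});
        next += count;
    }
    return draws;
}

// Instance buffer that holds nothing but ascending IDs: 0, 1, 2, ...
std::vector<std::uint32_t> makeInstanceIdBuffer(std::uint32_t total) {
    std::vector<std::uint32_t> ids(total);
    std::iota(ids.begin(), ids.end(), 0u);
    return ids;
}

// Mimics the per-instance attribute fetch: the element read for a given
// instance is offset by the draw's base instance.
std::uint32_t fetchDrawId(const std::vector<std::uint32_t>& instanceBuffer,
                          const DrawCall& draw, std::uint32_t instanceIndex) {
    return instanceBuffer[static_cast<std::size_t>(draw.baseInstance) + instanceIndex];
}
```

For the two draws above (10 instances, then 1 instance of a different mesh), the second draw gets base instance 10, so its single instance reads ID 10 and can index the uniform buffer with it, with no uniform update between the two draws.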
