What are your opinions on DX12/Vulkan/Mantle?


#21 CirdanValen   Members   


Posted 08 March 2015 - 11:40 AM

Many years ago, I briefly worked at NVIDIA on the DirectX driver team (internship). This was the Vista era, when a lot of people were busy with the DX10 transition, the hardware transition, and the OS/driver model transition. My job was to take games that were broken on Vista, dismantle them from the driver level, and figure out why they were broken.

...

 
That was very interesting, thanks for that!
 

A descriptor is a texture-view, buffer-view, sampler, or a pointer.
A descriptor set is an array/table/struct of descriptors.
A descriptor pool is basically a large block of memory that acts as a memory allocator for descriptor sets.

So yes, it's very much like bindless handles, but instead of them being handles, they're the actual guts of a texture-view, or an actual sampler structure, etc...
 
Say you've got a HLSL shader with:

...
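To make those three concepts concrete, here's a minimal sketch against the Vulkan C API as it eventually shipped (the spec wasn't public when this thread was written, so take this as illustrative; it assumes an already-created VkDevice named device):

    // Layout: one texture-view (sampled image) and one sampler, fragment-visible.
    VkDescriptorSetLayoutBinding bindings[2] = {};
    bindings[0].binding         = 0;
    bindings[0].descriptorType  = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
    bindings[0].descriptorCount = 1;
    bindings[0].stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;
    bindings[1].binding         = 1;
    bindings[1].descriptorType  = VK_DESCRIPTOR_TYPE_SAMPLER;
    bindings[1].descriptorCount = 1;
    bindings[1].stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    VkDescriptorSetLayoutCreateInfo layoutInfo = {};
    layoutInfo.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    layoutInfo.bindingCount = 2;
    layoutInfo.pBindings    = bindings;
    VkDescriptorSetLayout layout;
    vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &layout);

    // Pool: the big block of memory that descriptor sets are carved out of.
    VkDescriptorPoolSize poolSizes[2] = {
        { VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, 128 },
        { VK_DESCRIPTOR_TYPE_SAMPLER,       128 },
    };
    VkDescriptorPoolCreateInfo poolInfo = {};
    poolInfo.sType         = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
    poolInfo.maxSets       = 128;
    poolInfo.poolSizeCount = 2;
    poolInfo.pPoolSizes    = poolSizes;
    VkDescriptorPool pool;
    vkCreateDescriptorPool(device, &poolInfo, nullptr, &pool);

    // Set: allocated from the pool with the layout above. The actual
    // image-view/sampler "guts" get written in via vkUpdateDescriptorSets.
    VkDescriptorSetAllocateInfo allocInfo = {};
    allocInfo.sType              = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    allocInfo.descriptorPool     = pool;
    allocInfo.descriptorSetCount = 1;
    allocInfo.pSetLayouts        = &layout;
    VkDescriptorSet set;
    vkAllocateDescriptorSets(device, &allocInfo, &set);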


Also very informative; I'm starting to understand how to think in the "new way".


I'm looking forward to the new APIs (specifically Vulkan). Not only will we get better game performance, but it seems like they will be less of a headache, given what Promit said: the less black-box under-the-hood state management there is, the easier it will be to write and debug.

#22 Seabolt   Members   


Posted 09 March 2015 - 10:40 AM

I feel like I need to defend myself a little bit: I am a professional graphics programmer, and I've written renderers on Xbox 360 and Wii U, along with my own side project that can render in DirectX 9/11 and OpenGL 3.x. I've written a multi-threaded renderer before and already have an idea of how I plan to tackle the new APIs. My initial worry was that getting a minimally viable renderer up may be extremely painful, since my multi-threaded renderer will need to be restructured to create its own threads. Luckily it already does something along the lines of PSOs, since the game logic sends its own render commands, so I should be able to encapsulate them well enough.

Big concerns for me, (and these are initial thoughts from a GDC presentation on DX12) are:

- Memory residency management. The presenters were talking along the lines of developers being responsible for moving graphics resources between VRAM and system memory whenever loads get too high. This should be an edge case, but it's still an entirely new engine feature.

- Secondary threads for resource loading/shader compilation. This is actually a really good thing that I'm excited for, but it does mean I need to change my render thread to start issuing and maintaining new jobs. It's necessary, and for the greater good, but another task nonetheless.

 

- Root signatures/shader constant management

Again, really exciting stuff, but it seems like a huge potential source of issues, not to mention the engine now has to be acutely aware of how frequently the constants change, and map them appropriately.

 

@Promit: Thanks for the insight. That makes me feel better about the potential gains to be made, and helps to assuage my fear that adding my own abstractions between the game thread and the render thread would defeat the purpose of the API.

@Hodgman: I'm actually writing a new engine for the express purpose of supporting the new APIs, and my previous engine also had stateless rendering. (Kinda: the game thread would just append unique commands for whatever states it wanted, and those commands would be filtered by a cache before being dispatched to the rendering thread. With the new threading changes I'll likely abandon this approach so that I can add rendering commands from any thread.) I do like your idea of having specific render passes; that would allow me to reuse render commands for shadowing vs. shading passes, and I'll be able to better generate my command lists/bundles.

 

I'll also be adding in architecture for Compute Shaders for the first time, so I'm worried that I might be biting off too much at once.


Perception is when one imagination clashes with another

#23 Ohforf sake   Members   


Posted 09 March 2015 - 11:18 AM

Many years ago, I briefly worked at NVIDIA on the DirectX driver team ...


Congrats on ending up in John Carmack's Twitter feed!

#24 TheChubu   Members   


Posted 09 March 2015 - 11:35 AM



Congrats on ending up in John Carmack's Twitter feed!

LMAO at the clickbait title the guy gave it hahaha

 

He could have gone all the way:

 

"Nvidia's ex employee tells you 10 things you wouldn't believe about the company!"

"5 things nvidia isn't telling you about their drivers!"

"Weird facts from nvidia employee, shocking!"


"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#25 MJP   Moderators   


Posted 10 March 2015 - 12:10 AM


Overall, I think it's great to see renewed focus on lower CPU overhead. It's sometimes ridiculous just how much more efficient it can be to build command buffers on console vs. PC, and the new APIs look poised to close that gap a bit. Mobile in particular is the real winner here: they desperately need an API that is more efficient so that they can save battery power and/or cram in more draw calls (from what I've heard, draw call counts in the hundreds are pretty common on mobile). So far I haven't seen any major screw-ups from the GL camp, so if they keep going this way they have a real shot at dislodging D3D as the de facto API for Windows development. However, I think I still have more trust in MS to actually deliver exactly what they've promised, so I will reserve full judgement until Vulkan is actually released.

 

Personally, I'm mostly just looking forward to having a PC version of our engine that's more in line with our console version. Bindless is just fantastic in every way, and it's really painful having to go back to the old "bind a texture at this slot" stuff when working on Windows (not to mention that supporting both makes our code messy). Manual memory management and synchronization can also be really powerful, not to mention more efficient. Async compute is also huge if you use it correctly, and hopefully there will be much more discussion about it now that it will be exposed in public APIs.

 

On the flip side, I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track-down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts. It's hard enough dealing with that for one hardware configuration, so it's a little scary to imagine what could happen for PC games that have to run on everything. Hopefully there will be some good debugging/validation functionality available for tracking this down; otherwise we will probably end up with drivers automatically inserting sync points to prevent corruption (and/or removing unnecessary syncs for better performance). Either way, beginners are probably in for a rough time. :(


Edited by MJP, 10 March 2015 - 12:11 AM.


#26 Promit   Senior Moderators   


Posted 10 March 2015 - 12:22 AM


On the flip side, I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track-down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts. It's hard enough dealing with that for one hardware configuration, so it's a little scary to imagine what could happen for PC games that have to run on everything. Hopefully there will be some good debugging/validation functionality available for tracking this down; otherwise we will probably end up with drivers automatically inserting sync points to prevent corruption (and/or removing unnecessary syncs for better performance). Either way, beginners are probably in for a rough time.

Don't worry, a variety of shipping professional games will somehow make a complete mess of it in final builds too :rolleyes:


Edited by Promit, 10 March 2015 - 12:22 AM.

SlimDX | Shark Eaters for iOS | Ventspace Blog | Twitter | Proud supporter of diversity and inclusiveness in game development

#27 Alessio1989   Members   


Posted 10 March 2015 - 02:06 AM

On the flip side, I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track-down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts. It's hard enough dealing with that for one hardware configuration, so it's a little scary to imagine what could happen for PC games that have to run on everything. Hopefully there will be some good debugging/validation functionality available for tracking this down; otherwise we will probably end up with drivers automatically inserting sync points to prevent corruption (and/or removing unnecessary syncs for better performance). Either way, beginners are probably in for a rough time. :(

 

New debugging tools are coming: https://channel9.msdn.com/Events/GDC/GDC-2015/Solve-the-Tough-Graphics-Problems-with-your-Game-Using-DirectX-Tools


"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

#28 bioglaze   Members   


Posted 10 March 2015 - 03:54 AM

Slightly off-topic, but I'm starting a new engine before Vulkan or D3D12 are released. Any pointers on how I can prepare my rendering pipeline architecture so that when they are released, I can use them efficiently? I'm planning to start with D3D11 and OpenGL 4.5.



#29 Alessio1989   Members   


Posted 10 March 2015 - 04:05 AM

If you sign up for the DX12 EAP you can access the source code of the UE4 DX12 implementation.


"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

#30 Ameise   Members   


Posted 11 March 2015 - 03:27 PM

If you sign up for the DX12 EAP you can access the source code of the UE4 DX12 implementation.

 

It isn't 'signing up'. It's applying. You have to be approved (I've yet to be approved, sadly).



#31 Alessio1989   Members   


Posted 11 March 2015 - 03:43 PM

 

It isn't 'signing up'. It's applying. You have to be approved (I've yet to be approved, sadly).

Try asking for access another time; it worked for me :) Anyway, I have to admit that the approval process could be improved a lot.


Edited by Alessio1989, 11 March 2015 - 03:44 PM.

"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

#32 Ameise   Members   


Posted 11 March 2015 - 06:15 PM

 

 

Try asking for access another time; it worked for me :) Anyway, I have to admit that the approval process could be improved a lot.

I have no idea what you mean by "try asking for access another time".



#33 mhagain   Members   


Posted 11 March 2015 - 07:00 PM

I've refrained from replying to this for a few days while letting the information that's recently come out, and its implications, bounce around my head for a bit, but I feel just about ready to do so now.

 

I'm really looking forward to programming in this style.

 

I'm aware and accept that there's going to be a substantial upfront investment required, but I think the payoff is going to be worth it.

 

I think a lot of code is going to get much cleaner as a result of all this. A lot of really gross batching and state management/filtering code is just going to go away. Things are going to get a lot simpler: once we tackle the challenge of managing (and being responsible for) GPU resources at a lower level, which I think is something we're largely going to write once and then reuse across multiple projects, programming graphics is going to start being fun again.

 

I think it's going to start becoming a little like the old days of OpenGL; not quite at the level where you could just issue a glBegin/glEnd pair and start experimenting and seeing what kind of cool stuff you could do, but it will become a lot easier to just drop in new code without having to fret excessively about draw call counts, batching, state management, driver overhead, and "is this effect slow because it's slow, or is it slow because I've hit a slow path in the driver and I need to go back and rearchitect?"  That's really going to open up a lot of possibilities for people to start going nuts.

 

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what's become a traditional API usage over the past few years: lots of batching and instancing, in other words.  I have one project, using D3D11, that I think I would probably have to rewrite from scratch (I probably won't bother).  On the other hand, I have another, using a FrankenGL version, that I think will come over quite a bit more cleanly.  That's going to be quite cool and fun to do.

 

So unless I've got things badly wrong about all of this, I'm really stoked about the prospects.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#34 LorenzoGatti   Members   


Posted 12 March 2015 - 04:22 AM

I will not go into explicit details (detailed information should still be under NDA), but the second feature level looks tailor-made for one particular piece of hardware (guess which!). Moreover, FL 12.1 does not require some really interesting features (a greater conservative rasterization tier, volume tiled resources, and even resource binding tier 3) that you would expect to be mandatory in future hardware. In substance, FL 12.1 really breaks the concept of a feature level in my view, which was a sort of "barrier" that defined new hardware capabilities for upcoming hardware.

So you have feature level 12.0 for mainstream hardware, older feature levels for old/low-end hardware, and 12.1 for "a certain particular hardware" and most foreseeable future hardware. How is this a problem? Clearly, if 12.1 is so similar to 12.0, 12.0 is the main target and you won't be writing much special-case code for 12.1.


Edited by LorenzoGatti, 12 March 2015 - 04:22 AM.

Omae Wa Mou Shindeiru


#35 Alessio1989   Members   


Posted 12 March 2015 - 04:48 AM

 

 

 

I have no idea what you mean by "try asking for access another time".

Try filling in the form a second time: http://aka.ms/dxeap

So you have feature level 12.0 for mainstream hardware, older feature levels for old/low-end hardware, and 12.1 for "a certain particular hardware" and most foreseeable future hardware. How is this a problem? Clearly, if 12.1 is so similar to 12.0, 12.0 is the main target and you won't be writing much special-case code for 12.1.

It's not "a problem" per se. I'm just saying I expected to see a feature level for future hardware with more interesting and radical requirements than FL 12.1 ended up with (e.g. mandatory support for 3D tiled resources, a higher tier of conservative rasterization, standard swizzle, tier 3 resource binding... and, what the hell, even PS stencil ref is still optional). FL 12.0 and 12.1 are quite identical except for ROVs (probably the most valuable requirement of FL 12.1) and conservative rasterization tier 1 (which is useless except for anti-aliasing).

I'm not saying anything else. With D3D12 you can still target every feature level you want (even 10Level9) and query for every single new hardware feature (e.g. you can use ROVs on an FL 11.0 GPU if they're supported by the hardware/driver).


Edited by Alessio1989, 12 March 2015 - 04:49 AM.

"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

#36 Ameise   Members   


Posted 12 March 2015 - 09:21 AM

 

 

 

 

Try filling in the form a second time: http://aka.ms/dxeap

I've submitted the form at least three times. At this point, I've given up.



#37 Hodgman   Moderators   


Posted 12 March 2015 - 06:47 PM

- Memory residency management. The presenters were talking along the lines of developers being responsible for moving graphics resources between VRAM and system memory whenever loads get too high. This should be an edge case, but it's still an entirely new engine feature.

Yeah, it's going to be interesting to see what solutions different engines end up using here.
The simplest thing I can think of is to maintain a Set<Resource*> alongside every command buffer. Whenever you bind a resource, add it to the set. When submitting the command buffer, you can first use that set to notify Windows of the VRAM regions that are required to be resident.

The fail case there is when that residency request is too big... As you're building the command buffer, you'd have to keep track of an estimate of the VRAM residency requirement, and if it gets too big, finish the current command buffer and start a new one.
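A minimal sketch of that tracking scheme (Resource, VramSizeOf, and the budget are made up for illustration; on D3D12 the actual notification would go through ID3D12Device::MakeResident on the backing pageables):

    #include <unordered_set>
    #include <vector>

    struct Resource { /* engine-side wrapper, hypothetical */ };
    size_t VramSizeOf(const Resource*) { return 1u << 20; } // stub: assume 1 MB each

    class TrackedCommandBuffer {
        std::unordered_set<Resource*> bound;      // every resource referenced so far
        size_t residencyEstimate = 0;             // running VRAM estimate
        static const size_t kBudget = 512u << 20; // illustrative 512 MB cap

    public:
        // Returns false when the estimate blows the budget, i.e. "finish this
        // command buffer and start a new one".
        bool Bind(Resource* r) {
            if (bound.insert(r).second)
                residencyEstimate += VramSizeOf(r);
            return residencyEstimate <= kBudget;
        }

        // Called just before submit: hand this set to the OS so the regions
        // are resident before the GPU touches them.
        std::vector<Resource*> RequiredResident() const {
            return std::vector<Resource*>(bound.begin(), bound.end());
        }
    };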


- Secondary threads for resource loading/shader compilation. This is actually a really good thing that I'm excited for, but it does mean I need to change my render thread to start issuing and maintaining new jobs. It's necessary, and for the greater good, but another task nonetheless.

If you're using D3D11, you can start working on it now.
If you're on GL, you can start doing it for buffers/textures via context resource sharing... But it's potentially a lot of GL-specific code that you're not going to need in your new engine.

- Root signatures/shader constant management
Again, really exciting stuff, but it seems like a huge potential source of issues, not to mention the engine now has to be acutely aware of how frequently the constants change, and map them appropriately.

Yeah if you can give frequency hints in your shader code, it might make your life easier.

When compiling a shader, I imagine you'd first try to fit all of its parameters into the root, and then fall back to other strategies if they don't fit.

The simplest strategy is putting everything required for your shader into a single big descriptor set, and having the root just contain the link to that set. I imagine a lot of people might start with something like that to begin with.
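In D3D12 terms, that fallback looks something like this: a root signature whose single parameter is a descriptor table pointing at one big set (a sketch against the public headers; the range sizes are made up):

    #include <d3d12.h>

    // One root parameter: a descriptor table covering all the shader's SRVs and CBVs.
    ID3D12RootSignature* MakeEverythingInOneTableRootSig(ID3D12Device* device)
    {
        D3D12_DESCRIPTOR_RANGE ranges[2] = {};
        ranges[0].RangeType          = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
        ranges[0].NumDescriptors     = 8;   // t0-t7, illustrative
        ranges[0].BaseShaderRegister = 0;
        ranges[1].RangeType          = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
        ranges[1].NumDescriptors     = 4;   // b0-b3, illustrative
        ranges[1].BaseShaderRegister = 0;
        ranges[1].OffsetInDescriptorsFromTableStart = 8;

        D3D12_ROOT_PARAMETER param = {};
        param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
        param.DescriptorTable.NumDescriptorRanges = 2;
        param.DescriptorTable.pDescriptorRanges   = ranges;
        param.ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

        D3D12_ROOT_SIGNATURE_DESC desc = {};
        desc.NumParameters = 1;
        desc.pParameters   = &param;

        ID3DBlob *blob = nullptr, *error = nullptr;
        D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1, &blob, &error);
        ID3D12RootSignature* rootSig = nullptr;
        device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(),
                                    IID_PPV_ARGS(&rootSig));
        return rootSig;
    }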

I don't have an update-frequency hinting feature, but my shader system does already group texture/buffer bindings together into "ResourceLists".
e.g. A DX11 shader might have material data in slots t0/t1/t2 and a shadowmap in t3. In the shader code, I declare a ResourceList containing the 3 material textures, and a 2nd ResourceList containing the shadowmap.
The user can't bind individual resources to my shader, they can only bind entire ResourceLists.
I imagine that on D3D12, these ResourceLists can actually just be DescriptorSets, and the root can just point to them.
So, not describing frequency, but at least describing which bindings are updated together.
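My reading of that, as a sketch with made-up names (ResourceList is an engine-side abstraction, not a D3D type):

    #include <cstdint>
    #include <vector>

    struct TextureView {};  // hypothetical engine types
    struct BufferView  {};

    // A group of bindings that are always updated together. On D3D11 this fills
    // a contiguous range of t# slots; on D3D12 it could become one descriptor
    // set, with the root signature just pointing at it.
    struct ResourceList {
        uint32_t baseSlot = 0;  // e.g. t0 for the material data, t3 for the shadowmap
        std::vector<TextureView*> textures;
        std::vector<BufferView*>  buffers;
    };

    struct ShaderInstance {
        // Users bind whole lists, never individual slots.
        void Bind(uint32_t listIndex, const ResourceList& list) { /* record for submit */ }
    };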

I'll also be adding in architecture for Compute Shaders for the first time, so I'm worried that I might be biting off too much at once.

Yeah, I haven't done a robust compute wrapper before either. I'm doing the same stateless-job kind of thing as I've already done for graphics so far.
With the next-generation APIs, there are a few extra hassles with compute: after a dispatch, you almost always have to submit a barrier, so that the next draw/dispatch call will stall until the preceding compute shader is actually complete.

The same goes for passes that render to a render target, actually. E.g. in a post-processing chain (where each draw reads the results of the previous one) you need barriers after each draw to transition from RT to texture, which has the effect of inserting the necessary stalls.
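In D3D12 those two cases look something like this (a sketch against the public barrier API; cmdList, uavBuffer, and rtTexture are placeholders):

    #include <d3d12.h>

    void InsertComputeAndPostFxBarriers(ID3D12GraphicsCommandList* cmdList,
                                        ID3D12Resource* uavBuffer,
                                        ID3D12Resource* rtTexture)
    {
        // After a dispatch: a UAV barrier so the next draw/dispatch stalls
        // until the preceding compute shader's writes are complete.
        D3D12_RESOURCE_BARRIER uav = {};
        uav.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
        uav.UAV.pResource = uavBuffer;
        cmdList->ResourceBarrier(1, &uav);

        // Between post-processing draws: transition the previous render target
        // to a shader-readable state, which inserts the necessary stall.
        D3D12_RESOURCE_BARRIER t = {};
        t.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
        t.Transition.pResource   = rtTexture;
        t.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
        t.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
        t.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
        cmdList->ResourceBarrier(1, &t);
    }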

I think a lot of code is going to get much cleaner as a result of all this. A lot of really gross batching and state management/filtering code is just going to go away.

For simple ports, you might be able to leverage that ugly code :D
In the D3D12 preview from last year, they mentioned that when porting 3DMark, they replaced their traditional state-caching code with a PSO/bundle cache, and still got more than a 2x performance boost over DX11.

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what's become a traditional API usage over the past few years: lots of batching and instancing, in other words.

Stuff that's designed for traditional batching will probably be very well suited to the new "bundle" API.

I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track-down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts.

Here's hoping the debuggers are able to detect sync errors. The whole "transition" concept, which is a bit more abstracted than the reality, should help debuggers here. Even if the debugger can just put its hands up and say "you did *something* non-deterministic in that frame", then at least we'll know our app is busted.

#38 Matias Goldberg   Members   


Posted 12 March 2015 - 08:12 PM

- Root signatures/shader constant management
Again, really exciting stuff, but it seems like a huge potential source of issues, not to mention the engine now has to be acutely aware of how frequently the constants change, and map them appropriately.

You should already be doing that on modern D3D11/GL.
In Ogre 2.1 we use four buffer slots:

  1. One for per-pass data
  2. One to store all materials (up to 273 materials per buffer, due to the 64 KB per-const-buffer restriction)
  3. One to store per-draw data
  4. One tbuffer to store per-draw data (similar to 3, but it's a tbuffer, which stores more data; not having the 64 KB restriction is handy)

Of all those slots, we don't really change any of them often, not even the per-draw ones.

The only times we need to rebind buffers are when:

  1. We've exceeded one of the per-draw buffers' sizes (so we bind a new, empty buffer)
  2. We are in a different pass (we need another per-pass buffer)
  3. We have more than 273 materials overall, and the previous draw referenced material #0 while the current one references material #280 (so we need to switch the material buffer)
  4. We change to a shader that doesn't use these bindings (very rare)

Point 2 happens very infrequently. Points 3 and 4 can be minimized by sorting by state in a RenderQueue. Point 1 happens very infrequently too, and if you're on GCN the 64 KB limit gets upgraded to a 2 GB limit, which means you wouldn't need to switch at all (and that also solves point 3 entirely).

The full set of bindings really doesn't change often, and this property can already be exploited using DX11 and GL4. DX12/Vulkan just make the interface thinner; that's all.
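On the D3D11 side, the binding half of that scheme is just a handful of calls issued once per pass (a sketch; buffer creation and the per-material struct size behind the 273 figure are omitted):

    #include <d3d11.h>

    // Bind the four slots described above; after this, draws only index into
    // the buffers instead of rebinding them.
    void BindPassSlots(ID3D11DeviceContext* ctx,
                       ID3D11Buffer* perPass,    // 1. per-pass data        (b0)
                       ID3D11Buffer* materials,  // 2. all materials, 64 KB (b1)
                       ID3D11Buffer* perDraw,    // 3. per-draw data        (b2)
                       ID3D11ShaderResourceView* perDrawTbuf) // 4. tbuffer (t0)
    {
        ID3D11Buffer* cbs[3] = { perPass, materials, perDraw };
        ctx->VSSetConstantBuffers(0, 3, cbs);
        ctx->PSSetConstantBuffers(0, 3, cbs);
        ctx->VSSetShaderResources(0, 1, &perDrawTbuf);
    }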


Edited by Matias Goldberg, 12 March 2015 - 08:14 PM.


#39 TheChubu   Members   


Posted 12 March 2015 - 08:57 PM

 One to store per-draw data 
Do you use some form of indexing into the UBO to fetch the data? I'm currently batching UBO updates (say, fitting as many transforms, lights, or materials as I can into one glBufferSubData call), then doing a glUniform1i with an index and indexing into the UBO to fetch the correct transform. This has the obvious limitation that I need one draw call per object being drawn, to update the index uniform in between, but honestly I'm not sure how else I could do it. And AFAIK it's also how it's done in an nVidia presentation about batching updates.

 

The good thing is that I can usually do batches of 100 to 200 in one buffer update call; the bad thing is that I end up with an equivalent number of draw and glUniform1i calls. Bear in mind that I'm using OpenGL 3.3 here, so no multi-draw-indirect stuff :D
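For reference, the pattern being described boils down to this (GL 3.3; the names and block size are illustrative):

    // Upload a batch of transforms in one call...
    glBindBuffer(GL_UNIFORM_BUFFER, transformsUbo);
    glBufferSubData(GL_UNIFORM_BUFFER, 0,
                    batchCount * sizeof(Mat4), transforms);

    // ...then pay one glUniform1i + one draw per object to select the element.
    for (int i = 0; i < batchCount; ++i) {
        glUniform1i(instanceIndexLoc, i);
        glDrawElements(GL_TRIANGLES, objects[i].indexCount, GL_UNSIGNED_SHORT,
                       (const void*)(uintptr_t)objects[i].indexByteOffset);
    }
    // GLSL side:
    //   uniform int instanceIndex;
    //   layout(std140) uniform Transforms { mat4 world[256]; };
    //   gl_Position = viewProj * world[instanceIndex] * vec4(position, 1.0);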

 

And BTW, marking Promit's post as "Popular" is the understatement of the year (I never saw that badge before!). The thing got all the retweets and 300 comments on Reddit. You could sell Promit as an internet traffic attractor if the site is low on cash :P


"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#40 Matias Goldberg   Members   


Posted 13 March 2015 - 07:28 AM

I use the baseInstance parameter of glDraw*BaseVertexBaseInstance. gl_InstanceID will still be zero-based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra GLSL variable with the value of baseInstance).
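A sketch of that trick (it needs GL 4.2 or ARB_base_instance; kDrawIdAttrib and the buffers are illustrative):

    // One-time setup: a small VBO containing 0,1,2,... bound as an instanced
    // integer attribute. Instanced attribute fetches start at baseInstance, so
    // the shader receives the per-draw index even though gl_InstanceID stays
    // zero-based.
    glBindBuffer(GL_ARRAY_BUFFER, drawIdVbo);  // holds {0u, 1u, 2u, ...}
    glVertexAttribIPointer(kDrawIdAttrib, 1, GL_UNSIGNED_INT, 0, nullptr);
    glVertexAttribDivisor(kDrawIdAttrib, 1);   // advance once per instance
    glEnableVertexAttribArray(kDrawIdAttrib);

    // Per object: no uniform update, just pass the index as baseInstance.
    glDrawElementsInstancedBaseVertexBaseInstance(
        GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indexOffset,
        /*instanceCount*/ 1, baseVertex, /*baseInstance*/ drawIndex);

    // GLSL side: in uint drawId;  ...  world[drawId]  ...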





