What are your opinions on DX12/Vulkan/Mantle?

- - · 2015-06-08T18:47:33

Apple has announced their upcoming OS X version, El Capitan. I thought it would be relevant to this thread, as it will bring support for their Metal API to OS X. I have no details, but apparently there have been claims of a 50% improvement in performance and a 40% reduction in CPU usage.

Started by Seabolt March 06, 2015 06:33 PM

120 comments, last by Ubik 8 years, 11 months ago

agleed

April 19, 2015 10:34 AM

This brilliant programmer reverse-engineered the Mantle API and wrote a "Hello Triangle" tutorial. Definitely worth checking out.

https://medium.com/@Overv/implementing-hello-triangle-in-mantle-4302450fbcd2

Interesting. It seems like a verbose effort to get a triangle up and running (although, as I remember it, my first DX11 and GL3+ triangles were just as verbose), but conceptually it's a lot easier to grasp than I feared.

I haven't been active here for that long. What do the forums look like when a new API is released to the masses? A paradigm shift like DX12 and Mantle must be especially fun.

Klutzershy

May 14, 2015 02:51 AM

https://github.com/boreal-games/magma

Decided to write my own comprehensive Mantle headers and loading library so I can use Mantle while waiting for Vulkan. So far I've filled out mantle.h, mantleDbg.h, and mantleWsiWinExt.h.

There are a couple of minor issues that I've discovered so far:

- grWsiWinGetDisplays thinks the output array you pass into it is zero-length
- The Windows WSI error code values haven't been determined, for the aforementioned reason
- I could only get the functions working with the 64-bit DLLs so far; it seems to be a calling-convention issue

Over the next few days I'll be writing the other extensions, like the DMA queue extension.

"So there you have it, ladies and gentlemen: the only API I’ve ever used that requires both elevated privileges and a dedicated user thread just to copy a block of structures from the kernel to the user." - Casey Muratori

boreal.aggydaggy.com

L. Spiro

May 14, 2015 03:08 AM

otherwise I'll just end up adding the same amount of abstraction that DX11 does already, kind of defeating the point.

Porting such that your Direct3D 12 implementation mimics that of Direct3D 11 will leave you, on Direct3D 12, with about 50% of the performance you had on Direct3D 11. In other words, definitely do not model your graphics pipeline around that of Direct3D 11 inside Direct3D 12.

As for the porting itself, I rather enjoy it. I am working on Metal right now, and you get a sense of pride each time you re-implement a part of the system in a stable and reliable manner while getting the same results as on every other platform. I like having a single interface that produces the same results reliably across many APIs.
Naturally, I am an engine programmer, so your mental tolerance for this kind of low-level handling may vary.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Hodgman

May 14, 2015 03:19 AM

Porting such that your Direct3D 12 implementation mimics that of Direct3D 11 will leave you, on Direct3D 12, with about 50% of the performance you had on Direct3D 11. In other words, definitely do not model your graphics pipeline around that of Direct3D 11 inside Direct3D 12.

It's obviously not ideal, but it probably still performs better than just sticking with D3D11. MS showed off a naive port of Futuremark's engine, where they'd just shoe-horned D3D12 into their D3D11-oriented engine (by replacing their D3D11 redundant-state-removal/caching code with a D3D12 PSO hashmap), and still got ~2x the performance of the original D3D11 version.
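A minimal sketch of the kind of PSO hashmap described above, assuming the engine can reduce its D3D11-style piecemeal state (shaders, blend/raster/depth state, input layout) to a small key of IDs. PipelineKey, PipelineKeyHash, PipelineCache and FakePipelineState are hypothetical names; in a real port the cached value would be an ID3D12PipelineState created via ID3D12Device::CreateGraphicsPipelineState on a cache miss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical, heavily simplified key: IDs for the state objects a
// D3D11-style engine sets one at a time.
struct PipelineKey {
    std::uint64_t vertexShaderId;
    std::uint64_t pixelShaderId;
    std::uint32_t blendStateId;
    std::uint32_t rasterStateId;
    std::uint32_t depthStateId;
    std::uint32_t inputLayoutId;

    bool operator==(const PipelineKey& o) const {
        return vertexShaderId == o.vertexShaderId && pixelShaderId == o.pixelShaderId &&
               blendStateId == o.blendStateId && rasterStateId == o.rasterStateId &&
               depthStateId == o.depthStateId && inputLayoutId == o.inputLayoutId;
    }
};

// Combine the per-field hashes (boost-style hash_combine).
struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& k) const {
        std::size_t h = 0;
        auto mix = [&h](std::uint64_t v) {
            h ^= std::hash<std::uint64_t>{}(v) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
        };
        mix(k.vertexShaderId); mix(k.pixelShaderId); mix(k.blendStateId);
        mix(k.rasterStateId);  mix(k.depthStateId);  mix(k.inputLayoutId);
        return h;
    }
};

// Placeholder for a compiled pipeline object (ID3D12PipelineState in a real port).
struct FakePipelineState { PipelineKey key; };

class PipelineCache {
public:
    // Returns the pipeline for this state combination, "compiling" a new one
    // only the first time the combination is requested.
    const FakePipelineState& get(const PipelineKey& key) {
        auto [it, inserted] = cache_.try_emplace(key, FakePipelineState{key});
        if (inserted) ++compileCount_;   // a real port would create the PSO here
        assert(it->second.key == key);
        return it->second;               // references stay valid across rehashes
    }
    int compileCount() const { return compileCount_; }
    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<PipelineKey, FakePipelineState, PipelineKeyHash> cache_;
    int compileCount_ = 0;
};
```

The engine keeps setting state piecemeal as it did under D3D11, and each draw looks up (or lazily builds) the matching pipeline object, so the expensive creation only happens the first time a combination is seen.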

. 22 Racing Series .

L. Spiro

May 14, 2015 03:33 AM

My numbers come from our naïve initial port of our engine for this demo (which did largely the same thing Futuremark claims to have done):
http://www.rockpapershotgun.com/2015/05/05/square-enix-directx-12-tech-demo-witch-chapter-0-cry/

I don’t trust any company that says they got great performance just by shoehorning.

L. Spiro

Alessio1989

May 14, 2015 10:41 AM

Is it just me, or with these new APIs (at least with D3D12, once you've learned the new basics) does writing code and implementing things feel more natural and less artificial than with the older, higher-level APIs? Sure, it's still not like writing general code that runs only on the CPU, but it feels kind of closer... Or maybe it's just because these APIs are smaller and you have to remember fewer calls and structures XD

"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

vlj

May 17, 2015 12:41 AM

Is it just me, or with these new APIs (at least with D3D12, once you've learned the new basics) does writing code and implementing things feel more natural and less artificial than with the older, higher-level APIs?

I agree. I think that's because there are fewer members and structures; with OpenGL, at least, there are often several ways to achieve the same result, with very subtle differences.

For instance, there are two functions to create a buffer (glBufferData and glBufferStorage), both of which can also upload data; one function to upload data to a specific range (glBufferSubData); two functions to map the data (glMapBuffer and glMapBufferRange); two ways to define a VAO (one that binds the underlying storage together with the vertex format, and another that splits the vertex format description from the buffer binding); and so on...

With DX12, creation and upload are completely decoupled, and there is a single mapping function. There is also an upload function, but I haven't used it so far. It's much nicer.

The only "not so natural" thing that may come from DX12 is that you need to avoid modifying resources while a command list is using them. I do this by having two command allocators and two constant buffers that are swapped when a frame is finished.

https://pinkieduck.wordpress.com/

L. Spiro

May 18, 2015 12:28 AM

I do this by having two command allocators and two constant buffers that are swapped when a frame is finished.

You are supposed to use signals.

L. Spiro

Hodgman

May 18, 2015 02:32 AM

The only "not so natural" thing that may come from DX12 is that you need to avoid modifying resources while a command list is using them. I do this by having two command allocators and two constant buffers that are swapped when a frame is finished.

I find this to be a really natural change too, as it's just the result of being honest about parallel programming :)
All the old APIs have tried really hard to pretend that your graphics code is single-threaded, and that your function calls have an immediate result.
In truth all CPU+GPU programming is "multithreaded", so an honest API should reflect that.
When using the old APIs, which hide these details, it's very easy to do horribly slow operations, like accidentally reading from write-combined memory regions or synchronizing the CPU and GPU to lock a resource. To use these old APIs effectively, you really had to know what was happening behind their lies and work with the reality; e.g. this means that in D3D11 you should already be taking care to avoid modifying a resource that is in use! Lots of engines already use double buffering or ring buffering on D3D11, which is much more natural now on D3D12.

I do this by having two command allocators and two constant buffers that are swapped when a frame is finished.
You are supposed to use signals.

Double buffering works fine, as long as you can guarantee that the GPU has finished the previous frame before the CPU begins using that buffer... which requires the use of a signal, yeah :)
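To make the signal-based version concrete, here is a minimal, CPU-only sketch of fence-guarded double buffering. MockFence, FrameResources and FrameRing are hypothetical names: MockFence stands in for an ID3D12Fence that the command queue signals (ID3D12CommandQueue::Signal) when the GPU finishes a frame, and waitFor() plays the role of ID3D12Fence::SetEventOnCompletion plus waiting on the event.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical CPU-side stand-in for a GPU fence: the "GPU" calls signal()
// when it finishes a frame, and the CPU can block until a value is reached.
class MockFence {
public:
    void signal(std::uint64_t value) {
        { std::lock_guard<std::mutex> lock(m_); completed_ = value; }
        cv_.notify_all();
    }
    std::uint64_t completedValue() const {
        std::lock_guard<std::mutex> lock(m_);
        return completed_;
    }
    void waitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }
private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};

constexpr std::size_t kFramesInFlight = 2;

// Per-frame resources the CPU must not touch while the GPU uses them
// (command allocator, constant buffer, ...). Placeholders only.
struct FrameResources {
    std::uint64_t fenceValue = 0;    // fence value marking the end of this slot's last frame
    int constantBufferContents = 0;  // stand-in for a mapped constant buffer
};

class FrameRing {
public:
    explicit FrameRing(MockFence& fence) : fence_(fence) {}

    // Start of a CPU frame: move to the next slot and block until the GPU
    // has finished the frame that last used it.
    FrameResources& beginFrame() {
        current_ = (current_ + 1) % kFramesInFlight;
        FrameResources& f = frames_[current_];
        fence_.waitFor(f.fenceValue);  // returns immediately if already done
        assert(fence_.completedValue() >= f.fenceValue);
        // A real D3D12 port would reset this slot's command allocator here.
        return f;
    }

    // After submitting the frame: record the fence value that marks its end.
    // The caller would have the queue signal the fence with this value.
    std::uint64_t endFrame() {
        frames_[current_].fenceValue = ++nextFenceValue_;
        return nextFenceValue_;
    }

private:
    MockFence& fence_;
    std::array<FrameResources, kFramesInFlight> frames_{};
    std::size_t current_ = kFramesInFlight - 1;
    std::uint64_t nextFenceValue_ = 0;
};
```

The CPU only ever blocks when it gets a full kFramesInFlight frames ahead of the GPU; the swap itself is free, and the fence is what makes reusing the slot safe.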

We use the same strategy on D3D9/D3D11 for transient vertex data - CPU writes into unsynchronized buffers (NOOVERWRITE flag), swapping which one is used per frame (or which range of the buffer is used each frame). The CPU then just has to wait on an event signalling that the GPU has finished the previous frame.
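The "which range of the buffer is used each frame" variant can be sketched as a ring allocator whose space is released only once the GPU reports the frame that wrote it as finished. This is an illustrative sketch, not the engine's actual code: TransientRing and its methods are hypothetical, fence values are assumed to increase with each frame, and on D3D11 the returned offset is where you would write after mapping the buffer with D3D11_MAP_WRITE_NO_OVERWRITE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical ring allocator for transient per-frame data. The CPU writes
// without synchronization; safety comes from never handing out bytes that a
// frame still in flight on the GPU may be reading.
class TransientRing {
public:
    explicit TransientRing(std::size_t capacity) : capacity_(capacity) {}

    // Returns a byte offset for a contiguous block, or nullopt if the space
    // is still in use by unfinished frames (the caller must wait and retire).
    std::optional<std::size_t> allocate(std::size_t size) {
        if (size == 0 || size > capacity_) return std::nullopt;
        std::size_t padding = 0;
        std::size_t offset = head_;
        if (head_ + size > capacity_) {  // no room before the end: wrap to 0
            padding = capacity_ - head_; // skipped bytes stay reserved too
            offset = 0;
        }
        if (used_ + padding + size > capacity_) return std::nullopt;
        used_ += padding + size;
        frameBytes_ += padding + size;
        head_ = (offset + size) % capacity_;
        return offset;
    }

    // CPU finished recording the frame that ends at `fenceValue`.
    void endFrame(std::uint64_t fenceValue) {
        inFlight_.push_back({fenceValue, frameBytes_});
        frameBytes_ = 0;
    }

    // GPU reported completion up to `completedFenceValue`: free those frames.
    void retire(std::uint64_t completedFenceValue) {
        while (!inFlight_.empty() && inFlight_.front().fenceValue <= completedFenceValue) {
            used_ -= inFlight_.front().bytes;
            inFlight_.pop_front();
        }
        assert(used_ >= frameBytes_);  // the frame being recorded is never freed
    }

    std::size_t used() const { return used_; }

private:
    struct Frame { std::uint64_t fenceValue; std::size_t bytes; };
    std::size_t capacity_;
    std::size_t head_ = 0;        // next byte to hand out
    std::size_t used_ = 0;        // bytes reserved by recorded/in-flight frames
    std::size_t frameBytes_ = 0;  // bytes allocated by the frame being recorded
    std::deque<Frame> inFlight_;
};
```

When allocate() fails, the CPU waits on the fence (as described above), calls retire() with the newly completed value, and tries again.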
