Preparing for Mantle

Started by
16 comments, last by Krypt0n 10 years ago

What are your thoughts here? What should we expect? This could be a good chance to get ahead at the start. Any resources, ideas, etc.?

I really don't know where to start and how big a change it will be from GL/DX.


I would check out the recent presentations from GDC about Mantle:

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Mantle-Introducing-a-New-API-for-Graphics-Guennadi-Riguer.ppsx

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Rendering-Battlefield-4-with-Mantle-Johan-Andersson.ppsx

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Mantle-and-Nitrous-Combining-efficient-engine-design-with-a-modern-API-Dan-Baker.ppsx

FWIW, D3D12 looks similar to Mantle in many ways.

Sounds like a pain in the ass, to me.

SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

I'm starting to think that AMD will never release an SDK/bindings/whatever for it...

Like seriously, what is the point of all these "Mantle is cool" public presentations if only a few (big) companies can get a hold of it? They could just make private presentations for the 10 or so representatives of the companies they're interested in and be done with it...

"Oh thanks Frostbite team for explaining to me how to use Mantle properly! Now with all this newfound knowledge I'll go to my favorite IDE and fuck myself!"

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Like seriously, what is the point of all these "Mantle is cool" public presentations if only a few (big) companies can get a hold of it?


Building hype is a large part of advertising any product. Why do you think you hear about games months or even years before they come out?

Sean Middleditch – Game Systems Engineer – Join my team!

Let's not get off topic with rants about AMD. The topic is quite interesting, and if not for Mantle, the same points can be made for D3D12, and we can be quite sure MS will release that to the public at some point.

I think there are two main components that make the new APIs different from the previous ones.

1. A lot of caching/pre-creation of states. This can make your life quite difficult if you haven't designed for it. D3D11 already has states, but those are more a bundling of settings to reduce API calls; with the new APIs, it seems they optimize a lot of the whole GPU setup (kind of similar to shader linking in OpenGL). Previously you could have artist-controlled states or even states created dynamically by the game, but now you really don't want to create those at runtime.
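To make that concrete, here's a minimal sketch (in C++, with made-up types — real Mantle/D3D12 pipeline objects bundle far more state than this) of pre-creating pipelines at load time and only looking them up at runtime:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for a pipeline description.
struct PipelineDesc {
    std::string vertexShader;
    std::string pixelShader;
    bool colorWriteEnabled = true;
    bool operator==(const PipelineDesc& o) const {
        return vertexShader == o.vertexShader &&
               pixelShader == o.pixelShader &&
               colorWriteEnabled == o.colorWriteEnabled;
    }
};

struct PipelineDescHash {
    size_t operator()(const PipelineDesc& d) const {
        std::hash<std::string> h;
        return h(d.vertexShader) ^ (h(d.pixelShader) << 1) ^
               (d.colorWriteEnabled ? 0x9e3779b9u : 0u);
    }
};

struct Pipeline { int id; };  // opaque driver object in a real API

class PipelineCache {
public:
    // Called at load time for every state combination the game will need;
    // the expensive driver-side compile/link happens here, not mid-frame.
    const Pipeline& getOrCreate(const PipelineDesc& desc) {
        auto it = cache_.find(desc);
        if (it == cache_.end()) {
            ++creations_;
            it = cache_.emplace(desc, Pipeline{creations_}).first;
        }
        return it->second;
    }
    int creationCount() const { return creations_; }
private:
    std::unordered_map<PipelineDesc, Pipeline, PipelineDescHash> cache_;
    int creations_ = 0;
};
```

At runtime, every draw only does a hash lookup; nothing expensive is ever created mid-frame.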

The issues we had in the past console generation with shader permutations, where you had tons of 'cached' versions depending on flags, each of which doubles the shader count, will now apply to the whole rendering setup.
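Just to illustrate how fast that blows up: with N independent boolean feature flags you get 2^N variants, each of which would now be a full pipeline to pre-build (the flag count here is arbitrary, purely for illustration):

```cpp
#include <cstddef>
#include <vector>

// Each boolean feature flag doubles the number of variants to pre-build.
// Every bitmask value corresponds to one full pipeline state object.
std::vector<unsigned> enumeratePermutations(size_t numFlags) {
    std::vector<unsigned> keys;
    for (unsigned mask = 0; mask < (1u << numFlags); ++mask)
        keys.push_back(mask);
    return keys;
}
```

Ten flags already means 1024 pipelines to create at load time.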

You can probably set any vertex shader, any pixel shader, and then disable color writes; knowing the whole scope, the Mantle/D3D12 driver should be able to back-track the GPU setup to your vertex shader, see that just positions are needed, and strip out every other redundant bit (which previously those 'mysterious' driver threads might or might not have done).

But this might be quite vendor specific (one driver might end up with the same GPU setup for two different states, e.g. in one you disable color writes and in the other you set blending to Add(Zero, Zero), while another driver might not detect this). I'm not sure how this would be reflected in the API, i.e. whether you'd know two pipelines are the same so you could adjust your sorting to account for it.
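A renderer could at least catch some such duplicates itself by canonicalizing state before hashing/sorting. A hedged sketch (the types are my own illustration, not anything from the Mantle API, and I'm using a different equivalent pair than the example above): blending enabled with Add(One, Zero) outputs the source color unchanged, i.e. it's the same GPU setup as blending disabled:

```cpp
#include <tuple>

// Illustrative blend state; real APIs have per-channel and alpha factors too.
enum class BlendFactor { Zero, One, SrcAlpha };

struct BlendState {
    bool blendEnabled = false;
    BlendFactor src = BlendFactor::One;
    BlendFactor dst = BlendFactor::Zero;
};

BlendState canonicalize(BlendState s) {
    // Blending with Add(One, Zero) writes the source color unchanged,
    // which is identical to having blending disabled.
    if (s.blendEnabled && s.src == BlendFactor::One && s.dst == BlendFactor::Zero)
        s.blendEnabled = false;
    // Once blending is off, the factors are irrelevant; normalize them
    // so equivalent states hash and sort identically.
    if (!s.blendEnabled) {
        s.src = BlendFactor::One;
        s.dst = BlendFactor::Zero;
    }
    return s;
}

bool sameGpuSetup(const BlendState& a, const BlendState& b) {
    BlendState ca = canonicalize(a), cb = canonicalize(b);
    return std::tie(ca.blendEnabled, ca.src, ca.dst) ==
           std::tie(cb.blendEnabled, cb.src, cb.dst);
}
```

With canonicalized keys, two nominally different states collapse into one cached pipeline, and sort-by-pipeline batching benefits automatically.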

Everything at runtime needs to select from the pre-created permutation set to have a stable framerate. I wonder if there will be any guarantees on how long pipeline creation might take (in OpenGL (ES), shader linking sometimes takes several seconds). That's not only an issue of renderer architecture, but also of initialization time: you don't want to spend minutes caching thousands of states.

2. Multithreading: previously I think there was no game that used multithreading to speed up the API part, mainly because it was either not supported to use multiple interfaces, or where there was a way (e.g. D3D11 deferred contexts), it was actually slower.
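The recording model the new APIs are aiming for looks roughly like this sketch (plain C++ threads, with strings standing in for real GPU commands — not any actual API): each worker fills its own command list without locks, and the main thread submits them in a fixed order so the frame stays deterministic.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

using CommandList = std::vector<std::string>;

// Each worker records its chunk of the frame into its own list; no sharing,
// no locks while recording.
void recordChunk(CommandList& out, size_t firstDraw, size_t count) {
    for (size_t i = 0; i < count; ++i)
        out.push_back("draw_" + std::to_string(firstDraw + i));
}

CommandList recordFrameMultithreaded(size_t totalDraws, size_t numThreads) {
    std::vector<CommandList> lists(numThreads);
    std::vector<std::thread> workers;
    size_t perThread = totalDraws / numThreads;  // assumes even division
    for (size_t t = 0; t < numThreads; ++t)
        workers.emplace_back(recordChunk, std::ref(lists[t]),
                             t * perThread, perThread);
    for (auto& w : workers) w.join();

    // "Submit": concatenate in thread order, which fixes the draw order
    // regardless of which thread finished first.
    CommandList frame;
    for (auto& l : lists)
        frame.insert(frame.end(), l.begin(), l.end());
    return frame;
}
```

The win comes from the recording itself being cheap and lock-free; with D3D11 deferred contexts the driver largely serialized the work again behind your back.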

Yet it makes me wonder: are we really that much CPU-bound? From my perspective, it takes a really slow CPU to saturate on the API side. Usually, with instancing etc., any modern i3/i5/i7 is fast enough in a single thread to saturate on the GPU side.

And in case we don't want to optimize for the current saturation, but rather increase draw call counts etc., I really wonder at what point it becomes rather suboptimal on the GPU side. If we were able to push 10M draw calls/s, that's about 100 cycles per draw call on a modern GPU, and those have really deep pipelines, sometimes needing cache flushes, and every draw call needs some context for the GPU setup that has to be fetched. We'll end up with "yes, this could be done, but it would be suboptimal, let's go back to instancing etc. again".
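The 100 cycles per draw call figure is just this back-of-the-envelope division, assuming a roughly 1 GHz GPU front-end clock (an illustrative number, not a measured one):

```cpp
// 1 GHz clock / 10M draw calls per second = 100 cycles per draw call.
constexpr long long kGpuClockHz       = 1'000'000'000;
constexpr long long kDrawCallsPerSec  = 10'000'000;
constexpr long long kCyclesPerDrawCall = kGpuClockHz / kDrawCallsPerSec;
```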

That's no different from what we do now: a few 'big' setups per frame, and pushing as many draw calls with as few state/resource changes as possible, so we saturate on the shader/rasterization/fill-rate side instead.

The OpenGL extensions from NVidia's talk are much closer to what I'd hope for as the direction of 'next-gen APIs'. It's as easy to use as OpenGL always was, just extending the critical parts to perform better (I'm talking about http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead ). It actually makes things nicer with persistently mapped buffers (you don't need to guess and hope how every driver will 'optimize' your calls, and you get all the responsibility and possibilities that come with using persistent buffers). And if MultiDrawIndirect were extended a bit more to support an array of indexed shader objects, you could render the whole solid pass with one draw call. Shadow maps would possibly end up being one draw call each, and preparing those batched draw calls could be done in a multithreaded way if you want.
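The core trick of persistently mapped buffers can be sketched without a GL context. The pattern below uses a plain byte array as a stand-in for the pointer you'd get once from glMapBufferRange with GL_MAP_PERSISTENT_BIT, splits it into per-frame regions, and sub-allocates from the current region with no driver calls at all (real code would guard region reuse with a fence/sync object instead of trusting the modulo):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class PersistentRingBuffer {
public:
    PersistentRingBuffer(size_t frameSize, size_t frameCount)
        : storage_(frameSize * frameCount),
          frameSize_(frameSize), frameCount_(frameCount) {}

    // Selects this frame's region; in real GL, storage_.data() would be
    // the persistently mapped pointer obtained once at startup.
    uint8_t* beginFrame(size_t frameIndex) {
        cursor_ = 0;
        region_ = (frameIndex % frameCount_) * frameSize_;
        return storage_.data() + region_;
    }

    // Bump-allocates from the current frame's region; no Map/Unmap calls.
    uint8_t* alloc(size_t bytes) {
        assert(cursor_ + bytes <= frameSize_);
        uint8_t* p = storage_.data() + region_ + cursor_;
        cursor_ += bytes;
        return p;
    }

    // Byte offset into the buffer, as you'd pass to draw/bind calls.
    size_t offsetOf(const uint8_t* p) const {
        return static_cast<size_t>(p - storage_.data());
    }

private:
    std::vector<uint8_t> storage_;
    size_t frameSize_, frameCount_, region_ = 0, cursor_ = 0;
};
```

With e.g. three regions, the CPU writes frame N's region while the GPU is still reading frames N-1 and N-2, which is exactly the double/triple-buffering the AZDO talk describes.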

It feels like GL just lacks the big marketing campaign, but designing the next renderer would, for me, rather mean going the NV/GL way and mapping it to D3D12/Mantle under the hood.

and if multidrawindirect would be extended a bit more to support an array of indexed shader objects

Isn't that the point of shader subroutines?

If you're really game, you can read all about how to program the raw GCN architecture here:

http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf

http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf

Basically, if you were at AMD working on a GL/D3D/Mantle driver, that's the information you'd need to know.

Mantle is basically a super-thin driver, compared to the GL/D3D drivers, so the abstraction provided will be half way between D3D and the info in those PDFs ;)

and if multidrawindirect would be extended a bit more to support an array of indexed shader objects

Isn't that the point of shader subroutines?

Technically yes, but I wouldn't use that in practice.

You might get that opinion if you look at it from the programming point of view, and technically it would be possible, but there's also a lot about the hardware that's important:

1. With just one shader setup, you'd always need to involve all the shader stages that any of your 'subroutines' needs (hull, domain, geometry, pixel shader), and that might be wasteful.

2. Hardware allocates resources for the worst case. If you have a simple vertex shader for most geometry but a very complex one, e.g. for skinned characters, the GPU (or rather the driver) would allocate registers, caches etc. for the skinned version, reducing your throughput a lot for all the other cases.
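Point 2 can be put in rough numbers using the GCN limits from the ISA docs linked above (256 VGPRs per lane, at most 10 waves in flight per SIMD; the shader register counts below are made up for illustration). The waves in flight are set by the worst-case register use, so one heavyweight subroutine drags down every draw that uses the combined shader:

```cpp
// GCN-style occupancy limits (see the Southern Islands ISA document).
constexpr int kVgprBudgetPerLane = 256;  // VGPRs available per SIMD lane
constexpr int kMaxWavesPerSimd   = 10;   // hardware cap on waves in flight

// Waves the SIMD can keep in flight for a shader using this many VGPRs.
int wavesInFlight(int vgprsUsedByShader) {
    int byRegisters = kVgprBudgetPerLane / vgprsUsedByShader;
    return byRegisters < kMaxWavesPerSimd ? byRegisters : kMaxWavesPerSimd;
}
```

A simple 24-VGPR vertex shader runs at the full 10 waves, but merge it with an 84-VGPR skinning path via subroutines and everything drops to 3 waves, far less latency hiding even for the simple geometry.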

3. GPUs have optimizations for special cases, e.g. running early depth culling in the rasterizer stage if you don't modify the depth output in the shader, but with a unified shader and subroutines, if just one of them uses e.g. clip/kill, that optimization is disabled for all of the subroutines.

You're right in that it would already work on today's hardware, and maybe we should consider using it as a smart optimization in some very local cases where we know exactly what's going on. Yet I'd like to see it as a general-purpose solution, with no worries about whether it might hurt performance more than it helps. NVidia stated that their 'Volta' GPU should have some built-in ARM cores; maybe then they can process more high-level states (aka shaders :) ).


Yet it makes me wonder: are we really that much CPU-bound? From my perspective, it takes a really slow CPU to saturate on the API side. Usually, with instancing etc., any modern i3/i5/i7 is fast enough in a single thread to saturate on the GPU side.
In my experience it's very easy to be CPU-bound in D3D11 with real-world rendering scenarios: lots of draw calls, and lots of resource bindings. This is true for us even on beefy Intel CPUs. We've had to invest considerable amounts of engineering effort into changing our asset pipeline and engine runtime in ways that reduce CPU usage. Things aren't helped at all by the fact that we can't multithread D3D calls, or by the giant driver thread that's always using up a core.

The OpenGL extensions from NVidia's talk are much closer to what I'd hope for as the direction of 'next-gen APIs'. It's as easy to use as OpenGL always was, just extending the critical parts to perform better (I'm talking about http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead ). It actually makes things nicer with persistently mapped buffers (you don't need to guess and hope how every driver will 'optimize' your calls, and you get all the responsibility and possibilities that come with using persistent buffers). And if MultiDrawIndirect were extended a bit more to support an array of indexed shader objects, you could render the whole solid pass with one draw call. Shadow maps would possibly end up being one draw call each, and preparing those batched draw calls could be done in a multithreaded way if you want.
Really? The future you want is more instancing, wrapped up in a typical OpenGL layer of "you have to do it in this super-special way in order to hit the fast path"??? To me it's completely at odds with what actual software developers want. Everybody who works on consoles knows how low-overhead it *should* be to generate command buffers, and so they constantly beg for lower-overhead draw calls, better multithreading, and more access to GPU memory. Instead we get that "zero driver overhead" presentation that's like "lol, too bad, we're never going to change anything; here are some new extensions that only work on Nvidia and may require you to completely rewrite your rendering pipeline to use effectively." Great :-/

And in case we don't want to optimize for the current saturation, but rather increase draw call counts etc., I really wonder at what point it becomes rather suboptimal on the GPU side. If we were able to push 10M draw calls/s, that's about 100 cycles per draw call on a modern GPU, and those have really deep pipelines, sometimes needing cache flushes, and every draw call needs some context for the GPU setup that has to be fetched. We'll end up with "yes, this could be done, but it would be suboptimal, let's go back to instancing etc. again".
That's no different from what we do now: a few 'big' setups per frame, and pushing as many draw calls with as few state/resource changes as possible, so we saturate on the shader/rasterization/fill-rate side instead.
Of course you can end up GPU-limited, but the problem is that right now we can't even get to that point because there's too much CPU overhead. Software should be able to hit the breaking point where batching is needed for GPU performance, and then developers can decide case by case how much it makes sense for them to pursue instancing and things like that. It shouldn't be that you're forced into 100% instancing from the start or you're dead in the water on PC, at least in my humble opinion.

