rAm_y_

Preparing for Mantle


Recommended Posts

What are your thoughts here, and what should we expect? This could be a good chance to get ahead at the start. Any resources, ideas, etc.?

 

I really don't know where to start, or how big a change it will be from GL/DX.


I'm starting to think that AMD will never release an SDK/bindings/whatever for it...

 

Like seriously, what is the point of all these "Mantle is cool" public presentations if only a few (big) companies can get hold of it? They could just give private presentations to the 10 or so representatives of the companies they're interested in and be done with it...

 

"Oh thanks Frostbite team for explaining to me how to use Mantle properly! Now with all this newfound knowledge I'll go to my favorite IDE and fuck myself!"

Edited by TheChubu


Like seriously, what is the point of all these "Mantle is cool" public presentations if only a few (big) companies can get hold of it?


Building hype is a large part of advertising any product. Why do you think you hear about games months or even years before they come out?


Let's not get off topic with rants about AMD. The topic is quite interesting, and even if not for Mantle, the same points can be made for D3D12, and we can be quite sure MS will release that to the public at some point.

 

I think there are two main components that make the new APIs different from the previous ones.

1. A lot of caching/pre-creation of states. This can make your life quite difficult if you haven't designed for it. D3D11 already has state objects, but those are more a bundling of settings to reduce API calls; with the new APIs, it seems they optimize a lot of the whole GPU setup (kind of similar to shader linking in OpenGL). Previously you could have artist-controlled states, or even states created dynamically by the game, but now you really don't want to create those at runtime.

The issues we had in the past console generation with shader permutations, where you had tons of 'cached' versions depending on flags (each flag doubling the shader count), will now apply to the whole rendering setup.

You can probably set any vertex shader and any pixel shader and then disable color writes; knowing the whole scope, the Mantle/D3D12 driver should be able to trace the GPU setup back to your vertex shader, see that just positions are needed, and strip out every other redundant bit (which previously those 'mysterious' driver threads might or might not have done).

But this might be quite vendor specific (one driver might end up with the same GPU setup for two different states, e.g. in one you disable color writes and in the other you set blending to add(Zero, Zero), while another driver might not detect this). I'm not sure how this would be reflected in the API: whether you'd know two pipelines are the same, so you could adjust your sorting to account for it.

Everything at runtime needs to select from the pre-created permutation set to get a stable framerate. I wonder if there will be any guarantees on how long a pipeline creation might take (in OpenGL (ES), shader linking sometimes takes several seconds). That's not only an issue of renderer architecture, but also of initialization time: you don't want to spend minutes caching thousands of states.
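To make that concrete, here's a minimal CPU-side sketch of the pattern, with hypothetical names (`PipelineKey`, `PipelineCache`) and a stand-in `Pipeline` struct instead of a real driver call; actual D3D12/Mantle pipeline descriptions carry far more state than this:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical packed key describing one full GPU setup.
struct PipelineKey {
    uint32_t vertexShaderId;
    uint32_t pixelShaderId;
    uint32_t blendMode;   // e.g. 0 = opaque, 1 = additive
    bool     colorWrites;

    bool operator==(const PipelineKey& o) const {
        return vertexShaderId == o.vertexShaderId &&
               pixelShaderId  == o.pixelShaderId  &&
               blendMode      == o.blendMode      &&
               colorWrites    == o.colorWrites;
    }
};

struct PipelineKeyHash {
    size_t operator()(const PipelineKey& k) const {
        uint64_t h = (uint64_t(k.vertexShaderId) * 0x9E3779B97F4A7C15ull) ^
                     (uint64_t(k.pixelShaderId) << 17) ^
                     (uint64_t(k.blendMode) << 1) ^ uint64_t(k.colorWrites);
        return size_t(h);
    }
};

// Stand-in for the driver's expensive pipeline-creation result.
struct Pipeline { std::string debugName; };

class PipelineCache {
public:
    // Call for every permutation you can enumerate, at load time.
    const Pipeline& precreate(const PipelineKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, Pipeline{"compiled"}).first; // slow path
        return it->second;
    }
    // At runtime you only ever look up; a miss here would stall the frame.
    const Pipeline* lookup(const PipelineKey& key) const {
        auto it = cache_.find(key);
        return it == cache_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<PipelineKey, Pipeline, PipelineKeyHash> cache_;
};
```

The point of the split between `precreate` and `lookup` is exactly the concern above: the slow path must only ever run at load time, never mid-frame.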

2. Multithreading: previously, I think no game used multithreading to speed up the API side, mainly because either using multiple interfaces wasn't supported, or where there was a way (e.g. D3D11 deferred contexts), it was actually slower.
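The model the new APIs are built around can be sketched on the CPU side like this (hypothetical `DrawCommand` struct; real command buffers hold encoded GPU packets, and real submission goes to a queue rather than a concatenated vector):

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for an encoded GPU command.
struct DrawCommand { uint32_t meshId; };

// Each thread records into its own buffer, so there is no locking during
// recording. Only the final submission (here: concatenation) is serialized,
// which is the shape D3D12/Mantle command lists are built around.
std::vector<DrawCommand> recordInParallel(uint32_t drawCount, unsigned threads) {
    std::vector<std::vector<DrawCommand>> perThread(threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            // Static split of the draw list across threads.
            for (uint32_t i = t; i < drawCount; i += threads)
                perThread[t].push_back(DrawCommand{i});
        });
    }
    for (auto& w : workers) w.join();

    std::vector<DrawCommand> submitted;   // the serialized "queue submit"
    for (auto& buf : perThread)
        submitted.insert(submitted.end(), buf.begin(), buf.end());
    return submitted;
}
```

The difference from D3D11 deferred contexts is that here the per-thread buffers are supposed to be genuinely cheap to record, with no hidden driver serialization behind them.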

Yet it makes me wonder: are we really that CPU bound? From my perspective, it takes a really slow CPU to saturate on the API side. Usually, with instancing etc., any modern i3/i5/i7 is fast enough in a single thread to saturate the GPU.

And in case we don't want to optimize for the current saturation point, but rather increase draw call counts etc., I really wonder when it starts to become suboptimal on the GPU side. If we were able to push 10M draw calls/s, that's about 100 cycles per draw call on a modern GPU, and those have really deep pipelines, sometimes need cache flushes, and every draw call needs some context for the GPU setup that has to be fetched. We'll end up with "yes, this could be done, but it would be suboptimal; let's go back to instancing etc. again".

That's no different from what we do now: a few 'big' setups per frame, and pushing as many draw calls with as few state/resource changes as possible, so we saturate on the shader/rasterization/fillrate side instead.
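The "few big setups per frame" idea boils down to sorting the frame's draws by state. A minimal sketch, with a hypothetical `Draw` struct reusing the pipeline-key idea from point 1:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Draw { uint32_t pipelineKey; uint32_t meshId; };

// Sort the frame's draws so all draws sharing a pipeline are adjacent,
// then count how many pipeline binds the sorted order actually needs.
// In a real renderer the loop body would bind and issue the draw.
uint32_t sortAndCountBinds(std::vector<Draw>& draws) {
    std::sort(draws.begin(), draws.end(),
              [](const Draw& a, const Draw& b) { return a.pipelineKey < b.pipelineKey; });
    uint32_t binds = 0;
    uint32_t current = UINT32_MAX;
    for (const Draw& d : draws) {
        if (d.pipelineKey != current) { ++binds; current = d.pipelineKey; }
        // issue draw with the currently bound pipeline...
    }
    return binds;
}
```

With N distinct pipelines you pay N binds per pass regardless of draw count, which is why the permutation count from point 1 directly bounds the per-frame state-change cost.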

 

The OpenGL extensions from NVidia's talk are somehow much closer to what I'd hope for from 'next gen APIs'. They're as easy to use as OpenGL always was, just extending the critical parts to perform better (I'm talking about http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead ). It actually makes things nicer with persistently mapped buffers: you don't need to guess and hope how every driver will 'optimize' your calls, and you get all the responsibility and possibilities that come with using persistent buffers. And if MultiDrawIndirect were extended a bit more to support an array of indexed shader objects, you could render the whole solid pass with one draw call. Shadow maps would possibly end up being one draw call each, and preparing those batched draw calls could be done in a multithreaded way if you want.
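The batch-building half of that is plain CPU work. Here's a sketch that fills the command array defined by GL_ARB_multi_draw_indirect; the `Mesh` struct and `buildIndirectBuffer` are hypothetical, and a `std::vector` stands in for what would really be a persistently mapped buffer (glBufferStorage with GL_MAP_PERSISTENT_BIT), with the actual glMultiDrawElementsIndirect call omitted:

```cpp
#include <cstdint>
#include <vector>

// Command layout defined by GL_ARB_multi_draw_indirect / GL 4.3.
struct DrawElementsIndirectCommand {
    uint32_t count;         // index count of this sub-draw
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;  // often used as a per-draw ID readable in the shader
};

// Hypothetical per-mesh data living in one big shared index/vertex buffer.
struct Mesh { uint32_t indexCount; uint32_t firstIndex; int32_t baseVertex; };

// Fill the indirect buffer on the CPU (possibly from several threads); the
// whole solid pass then becomes a single glMultiDrawElementsIndirect call.
std::vector<DrawElementsIndirectCommand>
buildIndirectBuffer(const std::vector<Mesh>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(meshes.size());
    for (uint32_t i = 0; i < uint32_t(meshes.size()); ++i) {
        const Mesh& m = meshes[i];
        cmds.push_back({m.indexCount, 1u, m.firstIndex, m.baseVertex, i});
    }
    return cmds;
}
```

Stuffing the draw index into `baseInstance` is the common trick for letting the shader fetch per-draw data; the missing piece the post wishes for is the per-command shader object index.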

 

It feels like GL just doesn't have the great marketing campaign, but designing the next renderer would, for me, mean going the NV/GL way and mapping it to D3D12/Mantle under the hood.

and if multidrawindirect would be extended a bit more to support an array of indexed shader objects

Isn't that the point of shader subroutines?


If you're really game, you can read all about how to program the raw GCN architecture here:

http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf

http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf

 

Basically, if you were at AMD working on a GL/D3D/Mantle driver, that's the information you'd need to know.

Mantle is basically a super-thin driver compared to the GL/D3D drivers, so the abstraction provided will be halfway between D3D and the info in that PDF.


 

and if multidrawindirect would be extended a bit more to support an array of indexed shader objects

Isn't that the point of shader subroutines?

 

Technically yes, but I wouldn't use that in practice.

You might come to that opinion if you look at it from the programming point of view, and technically it would be possible, but a lot of what matters here is the hardware:

1. With just one shader setup, you'd always need to involve all the shader stages that any of your 'subroutines' needs (hull, domain, geometry, pixel shader), and that might be wasteful.

2. Hardware allocates resources for the worst case. If you have a simple vertex shader for most geometry but a very complex one, e.g. for skinned characters, the GPU (or rather the driver) would allocate registers, caches, etc. for the skinned version, reducing your throughput a lot for all the other cases.

3. GPUs have optimizations for special cases, e.g. running early depth culling in the rasterizer stage if you don't modify the depth output in the shader; but with a unified shader and subroutines, if just one subroutine uses e.g. clip/kill, that optimization would be disabled for all of them.

 

You're right to the point that it would already work nicely on today's hardware, and maybe we should consider using it as a smart optimization in some very local cases where we know exactly what's going on. Yet I'd like to see a general-purpose solution with no worries about whether it might hurt performance more than it gains. NVidia stated that their 'Volta' GPU should have some built-in ARM cores; maybe then they can process more high-level states (aka shaders).
