rAm_y_

Preparing for Mantle

17 posts in this topic

What are your thoughts here? What should we expect? This could be a good chance to get ahead at the start. Any resources, ideas, etc.?

 

I really don't know where to start and how big a change it will be from GL/DX.


Like seriously, what is the point of all these "Mantle is cool" public presentations if only a few (big) companies can get a hold of it?


Building hype is a large part of advertising any product. Why do you think you hear about games months or even years before they come out?
and if multidrawindirect would be extended a bit more to support an array of indexed shader objects

Isn't that the point of shader subroutines?


 

and if multidrawindirect would be extended a bit more to support an array of indexed shader objects

Isn't that the point of shader subroutines?

 

Technically, yes, but I wouldn't use it in practice.

You might reach that conclusion if you look at it purely from the programming point of view, and technically it would work, but the hardware matters a lot here:

1. With a single shader setup, you'd always have to enable every shader stage that any of your 'subroutines' needs (hull, domain, geometry, pixel shader), and that can be wasteful.

2. The hardware allocates resources for the worst case. If you have a simple vertex shader for most geometry but a very complex one for, e.g., skinned characters, the GPU (or rather the driver) would allocate registers, caches, etc. for the skinned version, reducing your throughput a lot in all the other cases.

3. GPUs have optimizations for special cases, e.g. running early depth culling in the rasterizer stage if the shader doesn't modify the depth result. With a unified shader and subroutines, if just one subroutine uses e.g. clip/kill, that optimization is disabled for all of them.
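A toy occupancy model makes point 2 concrete. The register budget and wave cap below are illustrative assumptions, not the specs of any real GPU. The only point is that a merged shader is allocated for its heaviest path, so every draw runs at that path's occupancy:

```python
# Toy model of point 2. All numbers are illustrative assumptions,
# not the specs of any particular GPU.
REGISTERS_PER_LANE = 256  # assumed register budget per SIMD lane
MAX_WAVES = 10            # assumed cap on resident waves per SIMD


def occupancy(registers_per_wave: int) -> int:
    """Resident waves per SIMD when every wave needs this many registers."""
    return min(MAX_WAVES, REGISTERS_PER_LANE // registers_per_wave)


def merged_occupancy(paths: dict) -> int:
    """One shader containing every 'subroutine' is allocated for its heaviest
    path, no matter which path a given draw actually executes."""
    return occupancy(max(paths.values()))


paths = {"static_mesh": 24, "skinned_character": 84}  # registers per path (assumed)
separate = {name: occupancy(regs) for name, regs in paths.items()}
merged = merged_occupancy(paths)
print(separate, merged)  # {'static_mesh': 10, 'skinned_character': 3} 3
```

With separate shaders the static meshes keep 10 waves in flight. In the merged version they drop to the skinned path's 3.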

 

You're right that it would already work nicely on today's hardware, and maybe we should consider using it as a smart optimization in a few very local cases where we know exactly what's going on. Still, I'd like to see it as a general-purpose solution, without worrying whether it might hurt performance more than it helps. NVIDIA stated that their 'Volta' GPU should have some built-in ARM cores; maybe then the GPU could process higher-level state (aka shaders :) ).


Sounds like a pain in the ass, to me.


Both Mantle and D3D12 should actually be quite a bit easier for most non-trivial renderer designs.

It's also actually fairly similar to how one might try to use D3D11 today in a multi-threaded renderer; some of the trickier/dumber parts are thankfully simplified. Create a command list, create some resources, execute a command list with a set of resources as inputs, done. The rest of the changes are conceptual changes to simplify the resources model (no more different kinds of buffers, simpler texture semantics, etc.), the more explicit threading model (only particularly relevant if you want/need render threading), and the more explicit device model (pick which GPU you use for what on multi-GPU systems).
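The record-then-execute flow described above can be sketched as follows. `CommandList`, `Queue` and their methods are hypothetical stand-ins, not the Mantle or D3D12 API. The sketch shows two things: each thread records into its own list, and execution order is fixed at submission rather than by whichever thread finished first.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class CommandList:
    """Hypothetical command list: commands are only stored here; nothing
    reaches the GPU until the list is closed and submitted to a queue."""
    commands: list = field(default_factory=list)
    closed: bool = False

    def set_resources(self, *resources):
        self._record(("set_resources", resources))

    def draw(self, vertex_count):
        self._record(("draw", vertex_count))

    def close(self):
        self.closed = True

    def _record(self, command):
        if self.closed:
            raise RuntimeError("cannot record into a closed command list")
        self.commands.append(command)


class Queue:
    """Hypothetical queue: executes closed command lists in submission order."""

    def __init__(self):
        self.executed = []

    def execute(self, command_lists):
        for cl in command_lists:
            if not cl.closed:
                raise RuntimeError("command list must be closed before execution")
            self.executed.extend(cl.commands)


def record_chunk(meshes):
    # Each worker records into its own list, so no locking is needed.
    cl = CommandList()
    for texture, vertex_count in meshes:
        cl.set_resources(texture)
        cl.draw(vertex_count)
    cl.close()
    return cl


chunks = [[("brick", 300), ("grass", 600)], [("metal", 90)]]
with ThreadPoolExecutor() as pool:
    lists = list(pool.map(record_chunk, chunks))  # results come back in input order
gpu_queue = Queue()
gpu_queue.execute(lists)
```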

Yet it makes me wonder: are we really that CPU-bound? From my perspective, it takes a really slow CPU to saturate on the API side. Usually, with instancing etc., any modern i3/i5/i7 is fast enough in a single thread to saturate the GPU.

In my experience it's very easy to be CPU-bound in D3D11 with real-world rendering scenarios: lots of draw calls, and lots of resource bindings. This is true for us even on beefy Intel CPUs. We've had to invest considerable engineering effort into changing our asset pipeline and engine runtime in ways that reduced CPU usage.

I'm implying that you'll end up doing the same for D3D12/Mantle. It won't be because of the CPU, but because the GPU will have idle bubbles in its pipeline if you start switching states (if you profile on consoles, with their low CPU overhead, that's what you'll see). It's still work that has to be done, and a 1 GHz sequential processor won't do any magic (I'm not talking about shaders, but about the command processor).
We have low-level access to the hardware on consoles. You might think we could afford to be wasteful with draw calls etc., but we actually spend a lot of SPU cycles batching meshes, removing redundant states, and even doing shader preparation that the GPU could handle, just to avoid that work on the GPU.
It's just moving the bottleneck to another place, not removing it. At some point you'll hit it again and end up with the same old thinking: the fastest optimization is to not do wasteful work, no matter how fast you could otherwise do it.
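The CPU-side work described here, sorting draws and dropping redundant state changes before they reach the GPU, might look like the minimal sketch below. The `(shader, texture, mesh)` draw tuples and the command names are hypothetical:

```python
def build_command_stream(draws):
    """draws: iterable of (shader, texture, mesh) tuples (hypothetical format).
    Sorting by state puts draws that share state next to each other; a state
    change is emitted only when it differs from what is already bound."""
    stream = []
    bound_shader = bound_texture = None
    for shader, texture, mesh in sorted(draws):
        if shader != bound_shader:
            stream.append(("set_shader", shader))
            bound_shader = shader
        if texture != bound_texture:
            stream.append(("set_texture", texture))
            bound_texture = texture
        stream.append(("draw", mesh))
    return stream


draws = [
    ("lit", "brick", "wall"),
    ("skin", "hero", "player"),
    ("lit", "brick", "floor"),
    ("lit", "grass", "ground"),
]
stream = build_command_stream(draws)
state_changes = sum(1 for op, _ in stream if op != "draw")
# 5 state changes for 4 draws, versus 8 if every draw rebound both states
```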

 
 

The OpenGL extensions from NVIDIA's talk are much closer to what I'd hope for as the direction of 'next-gen APIs'. They're as easy to use as OpenGL always was; they just extend the critical parts to perform better (I'm talking about http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead ). They actually make things nicer with persistently mapped buffers: you don't need to guess how every driver will 'optimize' your calls and hope for the best, and you get all the responsibility and possibilities that come with using persistent buffers. And if multidrawindirect were extended a bit more to support an array of indexed shader objects, you could render the whole solid pass with one draw call. Shadow maps would possibly end up being one draw call each, and preparing those batched draw calls could be done in a multithreaded way if you want.

Really? The future you want is more instancing, wrapped up in a typical OpenGL layer of "you have to do it in this super-special way in order to hit the fast path"??? To me that's completely at odds with what actual software developers want.

I have a feeling you haven't looked into Cass Everitt's talk.
It's not about classical instancing.
It's about creating a list of draw calls with various resources (vertex buffers, index buffers, textures...) and submitting all of it in one draw call. So instead of

for_all_my_drawcall
  gl set states
  gl draw mesh

you write

for_all_my_drawcall
  store_states into array
  store_mesh offsets/count etc. into array

gl draw_everything of array

So there is no "you have to do it in this super-special way in order to hit the fast path". It's quite the opposite: a very generic way. You don't have to touch the shaders to account for some special instancing scheme. You don't have to worry about resource limits and binding. All you do is create a vector of all draw calls, just like you'd 'record' them with Mantle/D3D12.
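The "store into an array, then draw everything" pattern maps onto `glMultiDrawElementsIndirect`. That call reads an array of `DrawElementsIndirectCommand` records (count, instanceCount, firstIndex, baseVertex, baseInstance). The sketch below only builds that array on the CPU and creates no GL context. Passing a material index through `baseInstance` so the shader can fetch per-draw state (for example via an instanced vertex attribute) is an assumption about how the shaders are written.

```python
import struct

# Field layout of OpenGL's DrawElementsIndirectCommand:
# uint count, uint instanceCount, uint firstIndex, int baseVertex, uint baseInstance
INDIRECT_CMD = struct.Struct("=IIIiI")  # native byte order, 20 bytes, no padding


def record_draws(meshes):
    """meshes: list of (index_count, first_index, base_vertex, material_id).
    Returns the bytes of an indirect buffer and the number of draws.
    material_id goes into baseInstance (assumed shader convention)."""
    buf = bytearray()
    for index_count, first_index, base_vertex, material_id in meshes:
        buf += INDIRECT_CMD.pack(index_count, 1, first_index, base_vertex, material_id)
    return bytes(buf), len(meshes)


meshes = [(36, 0, 0, 7), (960, 36, 24, 2), (6, 996, 664, 7)]
indirect_bytes, draw_count = record_draws(meshes)
# With a GL context, these bytes would be uploaded to GL_DRAW_INDIRECT_BUFFER
# and drawn with the C call:
# glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, draw_count, 0)
```

Because each record is plain data, different threads could fill disjoint slices of the array. That corresponds to the multithreaded preparation mentioned earlier in the thread.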

Yes, it's more limited right now, but that's why I said I'd rather see it extended.

 

Everybody who works on consoles knows how low-overhead it *should* be to generate command buffers, and so they constantly beg for lower-overhead draw calls, better multithreading, and more access to GPU memory. Instead we get that "zero driver overhead" presentation that's like "lol too bad we're never going to change anything, here's some new extensions that only work on Nvidia and may require you to completely rewrite your rendering pipeline to use effectively." Great :-/

I really disagree on that one.
It offers you persistent memory that you can write to from multiple threads and manage yourself, just like we do on consoles. It lets you create command lists (or rather vectors) in a multithreaded way, as you can on consoles. And it's not "we won't change a thing". It's rather "we've already given you a 90% solution that you can get your hands on right now, and the changes required are minimal compared to the D3D12/Mantle rewrite for the remaining 10%".

No offense intended, but have you really looked into it? I can't see why it would be a pipeline rewrite for you. The first part is just a small change in buffer management, which aligns well with what you do if you follow best-practice guides like https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf . The second part is 'recording' draw calls, which is less complex than with D3D12/Mantle because you don't have to pre-create and manage all states. That isn't much different from sorting your draw calls to minimize state switching. Everyone does that even on consoles, where draw calls should be cheap but state switches still hit you hard on the GPU.
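The buffer-management part usually means a persistently mapped buffer created in GL with `glBufferStorage` and `GL_MAP_PERSISTENT_BIT`, split into one region per frame in flight. Each region is fenced (`glFenceSync`/`glClientWaitSync`) before the CPU writes to it again. The sketch below models only that bookkeeping. GPU progress is simulated with a `gpu_completed_frame` number, and the three-region layout is an assumption:

```python
class PersistentRing:
    """CPU-side bookkeeping for a persistently mapped buffer split into one
    region per frame in flight. GPU completion is simulated: the caller passes
    the last frame number the GPU is known to have finished."""

    def __init__(self, size_bytes, regions=3):
        self.region_size = size_bytes // regions
        self.regions = regions
        self.last_frame_using = [None] * regions  # frame that last wrote each region
        self.frame = 0

    def acquire(self, gpu_completed_frame):
        """Byte offset to write this frame's data, or None if the GPU may still
        be reading that region (in GL: the region's fence has not signaled)."""
        region = self.frame % self.regions
        last = self.last_frame_using[region]
        if last is not None and last > gpu_completed_frame:
            return None
        return region * self.region_size

    def submit(self):
        """Mark the current region as in use by this frame (in GL: glFenceSync
        right after the draws that read it)."""
        self.last_frame_using[self.frame % self.regions] = self.frame
        self.frame += 1


ring = PersistentRing(3 * 1024, regions=3)
offsets = []
for _ in range(3):
    offsets.append(ring.acquire(gpu_completed_frame=-1))  # GPU has finished nothing yet
    ring.submit()
blocked = ring.acquire(gpu_completed_frame=-1)  # region 0 still in use by frame 0
reused = ring.acquire(gpu_completed_frame=0)    # frame 0 done: region 0 is free again
```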

 

And suppose we don't want to optimize for the current saturation point, but rather increase the draw call count etc. Then I really wonder when it starts to become suboptimal on the GPU side. If we were able to push 10M draw calls/s, that's about 100 cycles per draw call on a modern GPU. Those GPUs have really deep pipelines, sometimes need cache flushes, and every draw call needs some context for the GPU setup that has to be fetched. We'll end up with "yes, this could be done, but it would be suboptimal; let's go back to instancing etc. again".
That's no different from what we do now: a few 'big' setups per frame, pushing as many draw calls with as few state/resource changes as possible, so that we saturate on the shader/rasterization/fill-rate side instead.
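The 100-cycle figure follows from assuming a command processor clocked at about 1 GHz (the figure used earlier in the thread). At 60 fps, the same rate would also mean roughly 167,000 draws per frame:

```latex
\frac{10^{9}\ \text{cycles/s}}{10^{7}\ \text{draws/s}} = 100\ \text{cycles per draw},
\qquad
\frac{10^{7}\ \text{draws/s}}{60\ \text{frames/s}} \approx 1.67 \times 10^{5}\ \text{draws per frame}
```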

Of course you can end up getting GPU-limited, but the problem is that right now we can't even get to that point because there's too much CPU overhead. Software should be able to hit the breaking point where batching is needed for GPU performance, and then developers can decide case by case how much it makes sense for them to pursue instancing and things like that. It shouldn't be that you're forced into 100% instancing from the start or else you're dead in the water on PC, at least in my humble opinion.

Well, maybe I'm just too used to preparing everything in the best way for GPUs; we rarely ran into CPU limitations due to rendering. Most of the time, it's the GPU that limits our games. At first it looked as if consoles had an advantage due to their low overhead. Then you take some captures and realize you pay for cache and pipeline flushes, and the solution is just the plain old way you'd always optimize for pre-D3D12 APIs.
I just expect the same for D3D12/Mantle.

Regarding being CPU bound - this depends on whether you're making a graphical tech demo, or a game.
For the former, you might have 16ms of GPU time and 16ms of CPU time per frame dedicated to graphics.
For the latter, you've still got 16ms of GPU time (for now, until everyone else realizes you've ended up as the gatekeeper of their SPU-type jobs!), but maybe only a budget of 2ms of CPU time because all the other departments on the team need CPU time as well! In that situation, it's much easier to overrun your CPU budget...
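The difference between the two budgets is easy to quantify. The per-draw CPU cost below is a hypothetical figure chosen for illustration, not a measurement of any API:

```python
def frame_time_ms(fps: float) -> float:
    return 1000.0 / fps


def draws_within_budget(budget_ms: float, cpu_cost_per_draw_us: float) -> int:
    """Draws that fit in a CPU budget, given an assumed submission cost per draw."""
    return int(budget_ms * 1000.0 // cpu_cost_per_draw_us)


ASSUMED_COST_US = 10.0  # hypothetical CPU cost per draw, for illustration only
frame = frame_time_ms(60)                                # about 16.7 ms at 60 fps
tech_demo = draws_within_budget(16.0, ASSUMED_COST_US)  # whole frame for graphics
game = draws_within_budget(2.0, ASSUMED_COST_US)         # 2 ms slice of a shared CPU
```

Under the same assumed cost, the game gets an eighth of the tech demo's draw calls, so per-draw overhead matters far more once rendering shares the CPU.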

Yet there are very few games that saturate more than 2 cores. Most have a render thread that runs independently of the other parts. From an architectural point of view, that means rendering in today's games runs no differently than in tech demos, unless your job system really fills up all cores and could benefit from freeing up the rendering thread/core.
If you don't occupy all cores and you don't run a render thread, there is no reason to complain about API limitations.

P.S. I'm about to sign an NDA with AMD to get access to Mantle, so it's obviously being released more widely than just to DICE now :D

Part of the NDA is to not talk about the NDA ;)

Create a command list, create some resources, execute a command list with a set of resources as inputs, done.

So how do you keep a game (engine) flexible while knowing _all_ the states etc. up front, given that you don't want to create them at runtime? (Assume pipeline creation can take as long as shader linking in OpenGL, which is 1 s in bad cases.)
There is no longer a driver that does this for you in a background thread as fast as possible.
Assume you have about 1024 shader combinations. Then add the stencil, rasterizer, blend and render target permutations that might be part of the GPU setup and are therefore included in the one static state you have to create.
Assume the state creation isn't cross-platform but per driver+GPU, so you can't really do it offline before you ship the game.
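One way to frame the problem is a cache keyed by the complete pipeline state. It can optionally be prewarmed at load time from a list of states recorded in earlier play-throughs; that is a common mitigation, but still an assumption here. The permutation counts and the `create_fn` stand-in for the driver's pipeline compile are hypothetical:

```python
from math import prod

# Hypothetical number of variants per part of the pipeline state.
STATE_AXES = {
    "shader": 1024,
    "blend": 4,
    "rasterizer": 3,
    "depth_stencil": 5,
    "render_target_format": 2,
}


def worst_case_pipelines(axes):
    """Upper bound on distinct pipelines if every combination could occur."""
    return prod(axes.values())


class PipelineCache:
    """Creates each full pipeline state at most once. `create_fn` stands in
    for the slow, driver- and GPU-specific pipeline compile."""

    def __init__(self, create_fn):
        self.create_fn = create_fn
        self.pipelines = {}
        self.created = 0

    def get(self, key):
        pipeline = self.pipelines.get(key)
        if pipeline is None:
            pipeline = self.create_fn(key)  # the potentially 1 s hitch happens here
            self.pipelines[key] = pipeline
            self.created += 1
        return pipeline

    def prewarm(self, keys):
        # e.g. keys logged in earlier play-throughs, compiled on the player's
        # machine at load time rather than mid-frame
        for key in keys:
            self.get(key)


cache = PipelineCache(lambda key: ("compiled", key))
cache.prewarm([(0, 0, 0, 0, 0), (17, 1, 0, 2, 1)])
pso = cache.get((0, 0, 0, 0, 0))  # already compiled during prewarm: no new creation
```

The worst case here is 1024 × 4 × 3 × 5 × 2 = 122,880 pipelines. That is why compiling only the combinations a game actually uses matters so much.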
 

The rest of the changes are conceptual changes to simplify the resources model (no more different kinds of buffers, simpler texture semantics, etc.).

There still are. Check out the links in the 2nd post. It's split into 2 stages:
1. You allocate a bunch of memory.
2. You prepare it for a specific use case, e.g. as a render target or vertex buffer.
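The two stages might be modeled like this: one large allocation, sub-allocated with alignment, plus a separate description of how a range of it is used. `MemoryHeap`, `ResourceView` and the alignment values are illustrative assumptions, not Mantle's or D3D12's actual types:

```python
from dataclasses import dataclass


class MemoryHeap:
    """Stage 1: one large allocation, sub-allocated linearly with alignment."""

    def __init__(self, size):
        self.size = size
        self.offset = 0

    def allocate(self, size, alignment):
        start = (self.offset + alignment - 1) // alignment * alignment  # round up
        if start + size > self.size:
            raise MemoryError("heap exhausted")
        self.offset = start + size
        return start


@dataclass(frozen=True)
class ResourceView:
    """Stage 2: a range of the heap prepared for one specific usage."""
    offset: int
    size: int
    usage: str  # e.g. "vertex_buffer", "render_target", "texture"


heap = MemoryHeap(1 << 20)  # 1 MiB (illustrative)
vb = ResourceView(heap.allocate(4096, 256), 4096, "vertex_buffer")
rt = ResourceView(heap.allocate(65536, 65536), 65536, "render_target")
```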

Now assume you want to use a texture as a render target and then use it as a source in the 2nd draw call (e.g. some temporary texture you use in post-processing). You need to state that to the API.
Assume further that you take advantage of the new multithreaded command generation. Then you can't track the state of an object inside the object; you need to track states per command buffer/thread.
Assume further that you don't want to do redundant state conversions, as those might be quite expensive (changing buffer layouts to make them best suited for texture sampling, for rendering, for vertex access). You'd then need to somehow merge the states of the resources you use in consecutive command buffers.
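One possible approach to the merging problem works in two steps. Each command list remembers which state it needs on first use of a resource and which state it leaves the resource in. The missing transitions are then resolved at submission, when the order of the lists is known. This is a minimal sketch with hypothetical names and state strings. In a real API, the fix-up transitions would have to be recorded into a small extra command list executed before each list.

```python
class TrackedCommandList:
    """Tracks resource states locally, since other threads may be recording
    lists that touch the same resources at the same time."""

    def __init__(self):
        self.commands = []
        self.first_use = {}  # resource -> state needed at the start of this list
        self.current = {}    # resource -> state at the end of this list

    def use(self, resource, state):
        if resource not in self.current:
            # Incoming state is unknown while recording: resolve it at submit.
            self.first_use[resource] = state
        elif self.current[resource] != state:
            self.commands.append(("transition", resource, self.current[resource], state))
        self.current[resource] = state
        self.commands.append(("use", resource, state))


def submit(command_lists, global_state, initial_state="common"):
    """Stitch lists together in order, inserting only the transitions that are
    really needed between them, and update the global state."""
    stream = []
    for cl in command_lists:
        for resource, needed in cl.first_use.items():
            before = global_state.get(resource, initial_state)
            if before != needed:  # skip redundant conversions
                stream.append(("transition", resource, before, needed))
        stream.extend(cl.commands)
        global_state.update(cl.current)
    return stream


a = TrackedCommandList()  # e.g. recorded on thread 1
a.use("hdr", "render_target")
a.use("hdr", "texture")  # transition recorded inside the list
b = TrackedCommandList()  # e.g. recorded on thread 2
b.use("hdr", "texture")  # incoming state unknown while recording
b.use("backbuffer", "render_target")

global_state = {}
stream = submit([a, b], global_state)
transitions = [c for c in stream if c[0] == "transition"]
```

List `b` needs `hdr` as a texture, which is exactly the state list `a` left it in. No transition is emitted there.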


 

the more explicit threading model (only particularly relevant if you want/need render threading), and the more explicit device model (pick which GPU you use for what on multi-GPU systems).

You know you have to test and balance all that? CrossFire works across different GPUs: you can have an APU GPU + some mid-range Radeon HD 7700 + a top-notch R9 290X.
With D3D, there is a generic driver that might execute asymmetrically on those GPUs. With Mantle/D3D12, that's something you'd need to handle yourself.
I'm not saying it's impossible, but for the majority of devs it can go one of two ways. It can end up as a lot of work (testing all kinds of configurations in various parts of your game). Or it can disappoint some high-end users whose expensive 4x CrossFire setup is no faster than 3x CrossFire, or is even buggy.
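To illustrate the balancing problem, here is a sketch of split-frame rendering that gives each GPU a band of rows proportional to its relative throughput. The throughput numbers are assumptions. In practice they would have to be measured per configuration, which is exactly the testing burden described above.

```python
def split_rows(height, gpu_throughput):
    """Split-frame rendering: each GPU gets a contiguous band of rows sized in
    proportion to its relative throughput (values must be positive)."""
    total = sum(gpu_throughput.values())
    bands = {}
    start = cumulative = 0
    for name, throughput in gpu_throughput.items():
        cumulative += throughput
        end = round(height * cumulative / total)  # last band always ends at height
        bands[name] = (start, end)
        start = end
    return bands


# Relative throughputs are assumed, e.g. from a quick benchmark at startup.
bands = split_rows(1080, {"apu": 1, "hd7700": 3, "r9_290x": 8})
```

Rounding the cumulative boundary, rather than each band size separately, keeps the bands contiguous and guarantees that every row is covered exactly once.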


A lot of the work that drivers did before will end up in the hands of devs, and it's not optional; it's what you'll have to do. You might ship a perfectly fine game, and then some new GPU might take advantage of something that hasn't been used before. That could uncover a bug in your 1-year-old game that people still play. AMD/NV won't release a driver fix; you'll need to release a patch.

I see the benefits you've mentioned, but I also see all the drawbacks.
I like low-level programming on consoles, below what Mantle/D3D12 offers, but I'm not sure about the PC side. Back when there was Glide/S3 Metal/RRedline/..., and even GL worked differently (MiniGL/PowerSGL/...), every developer felt relieved when it all ended with D3D. And the RefRas (reference rasterizer) was actually pushed by game devs, to be able to confirm that something is a driver bug. Now it all seems forgotten, and this feels like a step back.

The Cass Everitt talk really seems like the best of both worlds to me (if it were extended a little bit).
Edited by Krypt0n
