lipsryme

Horrible performance or not?

10 posts in this topic

I've implemented a rather basic deferred renderer in DirectX11 but seem to be getting horrible performance even in the clearing stage.

I've got 4 GBuffer targets that I'm rendering to. I clear them with a clear shader that basically just sets everything to 0 (or 1.0f for depth), and this pass alone already takes almost 1 millisecond. Adding the GBuffer pass doubles that to almost 2 ms for just a single textured plane.

I just can't believe this is normal. Or is it?
I know my graphics card might not be the best out there, but 2 ms just for that? Could it be that 4 targets are too much of a strain on memory bandwidth?

My hardware:
Core i5 2.8 GHz
Radeon 5750M (mobile) 1 GB
4 GB RAM

And the sample is running in 1280x720 without any AA solution.


Are you measuring only the time it takes the GPU to perform its part of the job, or also the time it takes to issue the commands?

If you are measuring the full frame time, make sure you are doing so with a release build.


I'm measuring the full frame time, and the release build does not make it any faster :/
I realize you can't pinpoint the exact problem for me; I just need to know whether something is wrong here. I just don't get how rendering a fullscreen quad and outputting 0.0f or 1.0f to 4 RTs increases the frame time THAT much.

Looking down on a single textured plane with a directional light, I'm getting around 390 fps (so about 2.5 ms frame time) at 1280x720, which seems a little too much to me.

Is there a way to dig further into the performance impact with PIX or something?


Seems fairly reasonable to me. What is your graphics card texture bandwidth and fill rate?

Four 32-bit g-buffers at 1280x720 take up about 15 MB, so you are writing roughly 15 million bytes. You then sample those 15 million bytes back and write another 4 million bytes or so.

Assuming texture sample bandwidth and pixel fill rate are roughly equal (they probably aren't), that's about 35 million bytes in 2.5ms.

Or roughly 14GB/s (or 112 Gbps). Does that correspond to your graphics card's specs?
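
For anyone who wants to redo the back-of-the-envelope estimate above, here is a minimal C++ sketch. The function and constant names are made up for illustration, and the traffic model (write the g-buffer once, read it once, write one 32-bit output) is the simplification from this post, not a measurement.

```cpp
#include <cstdint>

// Bytes occupied by `targetCount` render targets of the given size.
constexpr std::uint64_t renderTargetBytes(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t bytesPerPixel,
                                          std::uint64_t targetCount) {
    return width * height * bytesPerPixel * targetCount;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for `bytes` moved in `milliseconds`.
constexpr double bandwidthGBps(double bytes, double milliseconds) {
    return bytes / (milliseconds * 1e-3) / 1e9;
}

// Four 32-bit targets at 1280x720: 14,745,600 bytes, i.e. "about 15 MB".
constexpr std::uint64_t kGBufferBytes = renderTargetBytes(1280, 720, 4, 4);

// Assumed traffic per frame: g-buffer write + g-buffer read + one 32-bit output write.
constexpr std::uint64_t kFrameBytes =
    2 * kGBufferBytes + renderTargetBytes(1280, 720, 4, 1);
```

With the rounded figures from the post, bandwidthGBps(35e6, 2.5) comes out at 14 GB/s; using the unrounded kFrameBytes gives a little over 13 GB/s.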


You should never clear the whole GBuffer. Simply clearing the depth should be enough.


[quote name='CryZe' timestamp='1345241785' post='4970687']
You should never clear the whole GBuffer. Simply clearing the depth should be enough.
[/quote]

That is the first time I've heard that, can you elaborate why?


@phil_t
These are the stats according to AMD's site:

Engine clock speed: 550 MHz
Processing power (single precision): 440 GigaFLOPS
Polygon throughput: 550M polygons/sec
Data fetch rate (32-bit): 44 billion fetches/sec
Texel fill rate (bilinear filtered): 11 Gigatexels/sec
Pixel fill rate: 4.4 Gigapixels/sec
Anti-aliased pixel fill rate: 17.6 Gigasamples/sec
Memory clock speed: 800 MHz GDDR5
Memory data rate: 3.2 Gbps GDDR5
Memory bandwidth: 51.2 GB/sec
TDP: 25 Watts


[quote name='phil_t' timestamp='1345240539' post='4970683']
Four 32-bit g-buffers at 1280x720 take up about 15 MB, so you are writing roughly 15 million bytes.
[/quote]QFE - phil_t's on the money here. Doing this in 1 ms indicates a frame-buffer write bandwidth of roughly 14 GiB/s, which isn't the highest I've seen, but might be typical for a "mobile" version of a card.
If you could profile your GPU in depth, you'd probably find that this operation is entirely ROP bound ([i]frame-buffer write operations[/i]), so looking at the theoretical fill-rate of the card will give you a "speed of light" value ([i]theoretical limit[/i]).
[edit] Apparently your specific card has a theoretical max of 4.4 billion pixel writes ([i]probably 32-bit ones[/i]) per second, so in theory, writing your 1280*720*4 pixels should take at least ~0.84 ms. What's your actual measured "[i]almost 1ms[/i]" value?
[quote name='lipsryme' timestamp='1345282612' post='4970781']That is the first time I've heard that, can you elaborate why?[/quote]There's no point clearing any buffer that you're going to overwrite the contents of later on. Assuming that geometry always fills your entire screen, then new geometry is going to fill your g-buffer anyway, so clearing it is a waste of time.
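
The ~0.84 ms lower bound for the clear is just pixel writes divided by fill rate. A small C++ sketch of that arithmetic, assuming the 4.4 Gigapixels/sec spec-sheet figure applies to 32-bit writes (all names here are illustrative):

```cpp
// Minimum time in milliseconds to write `pixelWrites` pixels at a peak
// fill rate of `pixelsPerSecond`. This is a theoretical floor, not a prediction.
constexpr double minWriteTimeMs(double pixelWrites, double pixelsPerSecond) {
    return pixelWrites / pixelsPerSecond * 1e3;
}

// One write per pixel for each of the four 1280x720 targets: 3,686,400 writes.
constexpr double kPixelWrites = 1280.0 * 720.0 * 4.0;

// Pixel fill rate quoted from the spec sheet earlier in the thread.
constexpr double kPixelFillRate = 4.4e9;

// Roughly 0.84 ms for clearing all four targets.
constexpr double kClearLowerBoundMs = minWriteTimeMs(kPixelWrites, kPixelFillRate);
```

A measured clear time close to this value would suggest the pass is already near the card's fill-rate limit.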


Alright, to give slightly more accurate results, here is what I've measured:

0.33ms for drawing nothing
0.53ms for clear depth RT only
1.28ms for clear & GBuffer
1.65ms for clear, GBuffer and Lighting
1.92ms for clear, GBuffer, Lighting and Compose

Clear Pass = 0.2ms
GBuffer Pass = 1.08ms
Lighting Pass = 0.37ms
Compose Pass = 0.27ms

Writing only to the GBuffer depth target in the clear pass made it a bit faster.
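
Per-pass costs can be derived by differencing the cumulative timings. A minimal C++ sketch of that (passCosts is a made-up helper, not part of any profiler API). Differencing gives 0.75 ms for the GBuffer pass; the 1.08 ms listed above matches 1.28 - 0.20, which would still include the 0.33 ms "drawing nothing" baseline, assuming the figures are read as cumulative frame times.

```cpp
#include <cstddef>
#include <vector>

// Given cumulative frame times (first entry = baseline with nothing drawn,
// each later entry = previous configuration plus one more pass), return the
// incremental cost of each added pass in the same unit.
std::vector<double> passCosts(const std::vector<double>& cumulativeMs) {
    std::vector<double> costs;
    for (std::size_t i = 1; i < cumulativeMs.size(); ++i) {
        costs.push_back(cumulativeMs[i] - cumulativeMs[i - 1]);
    }
    return costs;
}
```

Calling passCosts({0.33, 0.53, 1.28, 1.65, 1.92}) yields the clear, GBuffer, lighting and compose costs in that order. Full-frame timings like these also include CPU submission time, so they are only a rough guide to GPU cost.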


[quote name='Hodgman' timestamp='1345283090' post='4970783']
There's no point clearing any buffer that you're going to overwrite the contents of later on. Assuming that geometry always fills your entire screen, then new geometry is going to fill your g-buffer anyway, so clearing it is a waste of time.
[/quote]

In this case and in general this is true.

However, there is a point in clearing the render targets when using an SLI/Crossfire setup. A clear is one of the ways the driver is able to recognize which surfaces aren't needed by the other GPU, so it may skip transferring the framebuffer between the GPUs' memories. So keep your clear code for the case where the number of GPUs is greater than 1.

Otherwise, you may save some bandwidth if you reconstruct position from the hardware z-buffer instead of using another buffer for depth. The quality isn't as good, but it should be enough for typical scenarios.
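
As a rough illustration of reconstructing position from the hardware depth value, here is a CPU-side C++ sketch of the math. It assumes a D3D-style left-handed perspective projection with depth in [0, 1]; the struct and function names are hypothetical, and in a real renderer the same arithmetic would live in the lighting shader.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Parameters of the assumed perspective projection.
struct Projection {
    float tanHalfFovY;
    float aspect;
    float zNear;
    float zFar;
};

Projection makeProjection(float fovYRadians, float aspect, float zNear, float zFar) {
    return { std::tan(fovYRadians * 0.5f), aspect, zNear, zFar };
}

// Forward transform: view space -> NDC x,y in [-1, 1] and depth in [0, 1].
Vec3 projectToNdc(const Projection& p, const Vec3& v) {
    float depth = p.zFar * (v.z - p.zNear) / (v.z * (p.zFar - p.zNear));
    return { v.x / (v.z * p.aspect * p.tanHalfFovY),
             v.y / (v.z * p.tanHalfFovY),
             depth };
}

// Inverse: recover the view-space position from NDC x,y and the stored depth.
Vec3 viewPositionFromDepth(const Projection& p, float ndcX, float ndcY, float depth) {
    // Invert depth = f * (z - n) / (z * (f - n)) for z.
    float viewZ = p.zNear * p.zFar / (p.zFar - depth * (p.zFar - p.zNear));
    return { ndcX * viewZ * p.aspect * p.tanHalfFovY,
             ndcY * viewZ * p.tanHalfFovY,
             viewZ };
}
```

Because hardware depth is non-linear, most of its precision sits close to the near plane, which is one reason the reconstructed position is less accurate than a dedicated linear depth target.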

Best regards!


Good points Kauna.

In my engine, I force the user to be explicit about their usage of a render-target when they bind one -- the bind API forces them to choose some enum values, which boil down to:
1) When I bind this target, I need the previous contents to remain intact.
2) When I bind this target, I need it to be cleared to a specific value.
3) I'm going to overwrite every pixel in this target, so I don't care what its initial values are upon binding.
4) I don't care whether any values are actually written to this target or not.
5) When I'm finished with this target, I need it to be cloned into this texture.

I haven't done it yet, but I should apply your advice to case #3 -- if someone binds a target in this mode, I usually avoid clearing ([i]actually I clear to a random colour in non-shipping builds only, to test that their choice is valid[/i]), but I should issue a clear command if they are using an SLI setup.

#4 is used, for example, when you're rendering depth only, but the underlying API requires you to bind a colour target anyway ([i]e.g. GLES[/i]). This can be used to tell the driver not to 'resolve' the colour target, even though it's bound.

#5 is used when you want 2 or more copies of the rendered data. On some APIs, it can be done quicker using 2 resolve commands, instead of 1 resolve + 1 copy.
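
A minimal C++ sketch of what such an explicit bind mode could look like. The enum and function names are invented for illustration and are not the actual engine API; only the decision for cases 1 to 4 is modelled, and the multi-GPU rule follows kauna's SLI/Crossfire advice.

```cpp
// Hypothetical usage hints a caller must choose when binding a render target.
enum class BindMode {
    KeepContents,  // 1) previous contents must remain intact
    ClearToValue,  // 2) target must be cleared to a specific value
    OverwriteAll,  // 3) every pixel will be overwritten, initial values don't matter
    DontCare       // 4) it doesn't matter whether anything is written at all
};

// Decide whether a clear command should be issued when the target is bound.
// Assumption: with more than one GPU, a clear in OverwriteAll mode lets the
// driver skip copying the surface between GPUs, so it is issued anyway.
bool shouldClearOnBind(BindMode mode, int gpuCount) {
    switch (mode) {
        case BindMode::ClearToValue: return true;
        case BindMode::OverwriteAll: return gpuCount > 1;
        case BindMode::KeepContents: return false;
        case BindMode::DontCare:     return false;
    }
    return false;
}
```

For example, shouldClearOnBind(BindMode::OverwriteAll, 1) skips the clear on a single GPU, while the same call with gpuCount set to 2 issues it.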


[quote name='kauna' timestamp='1345312764' post='4970880']
[quote name='Hodgman' timestamp='1345283090' post='4970783']
There's no point clearing any buffer that you're going to overwrite the contents of later on. Assuming that geometry always fills your entire screen, then new geometry is going to fill your g-buffer anyway, so clearing it is a waste of time.
[/quote]

In this case and in general this is true.

However, there is a point in clearing the render targets when using an SLI/Crossfire setup. A clear is one of the ways the driver is able to recognize which surfaces aren't needed by the other GPU, so it may skip transferring the framebuffer between the GPUs' memories. So keep your clear code for the case where the number of GPUs is greater than 1.

Otherwise, you may save some bandwidth if you reconstruct position from the hardware z-buffer instead of using another buffer for depth. The quality isn't as good, but it should be enough for typical scenarios.

Best regards!
[/quote]
[quote name='Hodgman' timestamp='1345283090' post='4970783']
[quote name='lipsryme' timestamp='1345282612' post='4970781']That is the first time I've heard that, can you elaborate why?[/quote]There's no point clearing any buffer that you're going to overwrite the contents of later on. Assuming that geometry always fills your entire screen, then new geometry is going to fill your g-buffer anyway, so clearing it is a waste of time.
[/quote]

There's another reason to clear render targets on the tiled-architecture GPUs that are prevalent on mobile devices: according to [url="http://www.realtimerendering.com/downloads/MobileCrossPlatformChallenges_siggraph.pdf"]Unity's talk at Siggraph this year[/url], clearing render targets can avoid extra copies done by the driver. So if I'm reading the slides correctly, the render target clear can act as an equivalent of the EXT_discard_framebuffer operation on devices that don't expose that extension.