Smart Pointers Aren't Always So Smart

Jason Z


The title may seem more controversial than I really intend, but I think it fits the situation I recently encountered. I was going through each of the sample applications in Hieroglyph 3 after applying a user-supplied patch, and I noticed that the frame rate of the MirrorMirror sample had dropped off significantly. This particular sample is designed to make multi-threaded rendering in Direct3D 11 pay off by rendering three reflective spheres surrounded by a large number of simple objects. Here is a screenshot of the sample to give you a visual:

[Image: MirrorMirror100003.png, screenshot of the MirrorMirror sample]

Since each reflective sphere requires the scene to be rendered for its paraboloid map, the sample effectively renders the scene a total of four times - once for each sphere (both paraboloid maps are generated simultaneously) and then once to render the final scene. Thus if I specify 200 objects floating around the spheres, the sample ends up performing approximately 800 draw calls over four rendering passes. This is effectively the best use case for parallel rendering - the workloads are more or less evenly distributed over four threads, and there is a corresponding speed boost when using multi-threaded as opposed to single-threaded rendering.

However, when I did my test I found that the frame rate in debug mode had dropped from somewhere around 70 to ~9. After going back through my source code tree, I found that before some recent changes to how I handle the input assembler stage's state the performance was as expected. This seemed really strange, since the new state management actually should have been more or less equivalent to the old method.

To investigate further, I stepped through the drawing operation with the debugger and immediately found the issue. I had changed to using a standalone object to represent all of the input assembler state within the engine, including all of the available vertex buffer slots. To set up the situation, there are a maximum of D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT slots in the IA, which is currently defined as 32... I typically reference resources in the engine with a smart pointer to a proxy object. The proxy object contains the indices of a resource, plus any resource views that it would need. Applications use the proxy as a very easy way to reference several pieces of data as one, and overall it has worked out very well.

To get to the point, I was declaring an input assembler state object on the stack in each draw call, which initialized the state object to have null pointers for all of those 32 vertex buffer slots. Even though I was only using a single vertex buffer, I was still initializing all the other slots to null, which amounts to 32 assignments of the smart pointer per draw call. With a little math, we can see what was going on: 32 slots x 4 rendering passes x 200 objects = 25,600 smart pointer assignments per frame.
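The pattern described above can be sketched roughly as follows. This is not the actual Hieroglyph 3 code: `ResourceProxy`, `ResourcePtr`, `InputAssemblerState`, and `pointerWritesPerFrame` are hypothetical names, and `std::shared_ptr` stands in for whatever smart pointer the engine uses.

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the engine's resource proxy object.
struct ResourceProxy {
    int resourceIndex = -1;
};
using ResourcePtr = std::shared_ptr<ResourceProxy>;  // assumption: any ref-counted pointer

// Mirrors D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT (32) without needing d3d11.h.
constexpr std::size_t kVertexSlots = 32;

// State object that is cheap to write but costly to construct in a hot loop.
struct InputAssemblerState {
    std::array<ResourcePtr, kVertexSlots> vertexBuffers;

    InputAssemblerState() {
        // One smart-pointer assignment per slot, even for slots that are never used.
        for (auto& slot : vertexBuffers) {
            slot = nullptr;
        }
    }
};

// Smart-pointer writes per frame if one state object is built per draw call.
constexpr std::size_t pointerWritesPerFrame(std::size_t passes, std::size_t objects) {
    return kVertexSlots * passes * objects;
}
```

With 4 passes and 200 objects, `pointerWritesPerFrame(4, 200)` reproduces the 25,600 figure from the text.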

In the end, I simply switched the stored state to use the index of the vertex buffer directly (since the input assembler doesn't use resource views, there is no need to use the proxy object anyway). Just a few short changes popped the speed right back up to where it should have been. So the moral of the story is this - smart pointers are only as smart as the person using them, and sometimes (especially in my case) they end up being not so smart :)
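A minimal sketch of the index-based alternative, assuming resources are identified by plain integer indices. `InputAssemblerStateByIndex`, `kIaSlotCount`, and the `kNoBuffer` sentinel are hypothetical names, not the engine's real ones.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

constexpr std::size_t kIaSlotCount = 32;  // same value as D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT
constexpr int kNoBuffer = -1;             // hypothetical sentinel for "slot unused"

// Stores plain indices, so construction is a simple fill with no reference counting.
struct InputAssemblerStateByIndex {
    std::array<int, kIaSlotCount> vertexBufferIndices;

    InputAssemblerStateByIndex() { vertexBufferIndices.fill(kNoBuffer); }

    void setVertexBuffer(std::size_t slot, int resourceIndex) {
        vertexBufferIndices[slot] = resourceIndex;
    }
};

// Plain data: copying the state is a memcpy-style operation.
static_assert(std::is_trivially_copyable<InputAssemblerStateByIndex>::value,
              "index-based state should be trivially copyable");
```

The trade-off is that an index does not keep the resource alive, so something else in the engine has to own the vertex buffer for as long as the state refers to it.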

Anyways, this has opened my eyes to some state management issues with Hieroglyph 3, which I am now working on. My goal is to reduce the number of API calls to as few as possible, all the while properly managing the cached pipeline state with respect to multithreaded rendering... This will be the subject of my next post!



14 Comments


Heheh, in my game AdventureFar, I used smart pointers for handling chunks of the world. The speed wasn't fast enough, and profiling showed Boost's smart pointers as part of the problem. I now use regular pointers where speed is an issue, and smart pointers in less-critical areas.
In my case, it was the creation of the smart pointers (boost::make_shared()) that was part of the slowdown.

Smart pointers are very good, and I use them a lot in my code, but they shouldn't just be used with a "replace every pointer with a smart pointer!" line of thought. Like everything in C++, used properly, they are of great benefit. Used improperly, your head asplodes.

I completely agree - they are a great way to simplify and secure high level code. However, as soon as you dip into anything that is high frequency then you are really playing with fire. In my case, only the high level objects should have been using the smart pointers, it was just my mistake to use them on something at a lower level.

At least I can go to sleep tonight and know that I wasn't the only one to make the same mistake :)

Also, it is important to keep the structure of your application in mind. shared_ptr can be used to forget about responsibility. The Google C++ guidelines advise using them rarely, the reason being that no one has clear ownership of the data. They advise using a manager pattern instead. One can also resort to a kind of manager that issues weak_ptr handles instead of ID objects... Well, in any case, shared_ptr everywhere is just a bad Java (because at least Java is quite clever about circular connections, floating graph components, etc.; C++ isn't).
But don't get me wrong, I love shared_ptr and I use it... where it makes sense.
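One way to read the "manager that issues weak_ptr" idea is sketched below. It is only an illustration under assumed names (`Mesh`, `MeshManager`, `idOrMinusOne` are all hypothetical): the manager is the single owner, and callers hold non-owning handles that they must lock before use.

```cpp
#include <memory>
#include <vector>

struct Mesh {
    int id = 0;
};

// The manager is the only owner; everyone else gets a non-owning weak_ptr handle.
class MeshManager {
public:
    std::weak_ptr<Mesh> create(int id) {
        auto mesh = std::make_shared<Mesh>();
        mesh->id = id;
        meshes_.push_back(mesh);
        return mesh;  // implicit conversion shared_ptr -> weak_ptr
    }

    // Destroys every mesh; outstanding handles become expired instead of dangling.
    void destroyAll() { meshes_.clear(); }

private:
    std::vector<std::shared_ptr<Mesh>> meshes_;
};

// Callers must lock the handle and cope with the object being gone.
int idOrMinusOne(const std::weak_ptr<Mesh>& handle) {
    if (auto mesh = handle.lock()) {
        return mesh->id;
    }
    return -1;
}
```

Ownership stays in one place, while a stale handle is detectable rather than a dangling raw pointer. Locking a weak_ptr still touches the reference count, so this is a safety pattern rather than a speed optimization.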

The main reason shared_ptr has a performance impact is that the standard specifies it as thread-safe (the smart pointer only, not the object inside, so for example reset() is not thread-safe) and so it has to use atomic instructions to update the reference count--which imply memory barriers that will ruin any compiler and CPU reordering optimization of instructions around the atomic operation.
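A small sketch of where that cost comes from, using `std::shared_ptr` (the `Buffer` type and both function names are hypothetical). Copying the pointer has to update the shared reference count, while borrowing the pointee through a reference or raw pointer leaves the count untouched.

```cpp
#include <memory>

struct Buffer {
    int id = 0;
};

// Copying a shared_ptr bumps the shared reference count (and drops it again on scope exit).
long countWhileCopied(const std::shared_ptr<Buffer>& p) {
    std::shared_ptr<Buffer> copy = p;
    return copy.use_count();
}

// Borrowing via a raw pointer never touches the reference count.
long countWhileBorrowed(const std::shared_ptr<Buffer>& p) {
    const Buffer* raw = p.get();
    (void)raw;  // the borrowed pointer is only valid while p keeps the object alive
    return p.use_count();
}
```

In hot paths, passing `const std::shared_ptr<T>&` or a plain `T*`/`T&` avoids the count update entirely, provided some owner is guaranteed to outlive the call.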

That's good to know - I didn't realize that atomics were used in the smart pointer implementation. That would really explain a lot, and is a good thing to keep in mind in the future. Thanks for "pointing" it out :)

IIRC another reason shared_ptr can be slow is that a heap alloc is used for the ref counter. It's a one-time hit, but it can be non-trivial to work out when it will occur unless your design is tight.

(Edit: it's actually one time per ref counted class instance, of course)

Check out Alexandrescu's Loki library and "Modern C++ Design" book for an in depth look at designing smart pointers, and a very customisable implementation which has template parameters controlling thread-safety, intrusiveness, etc. I've never used Loki, but Alexandrescu's discussion of the issues has definitely helped me when choosing a smart pointer for a particular situation.

Prune wrote: "The main reason shared_ptr has a performance impact is that the standard specifies it as thread-safe (the smart pointer only, not the object inside, so for example reset() is not thread-safe) and so it has to use atomic instructions to update the reference count--which imply memory barriers that will ruin any compiler and CPU reordering optimization of instructions around the atomic operation."

Interesting. I wonder how practical it would be to create a non-thread-safe version from Boost's source for programs that aren't multi-threaded; or better yet, use an #ifdef to enable or disable thread safety depending on your application's needs.

I try to use shared_ptr as little as possible, but to avoid the extra heap allocation you can use make_shared.

Most of the time I use unique_ptr (which has basically no overhead at all) or intrusive_ptr (which also has no overhead), since D3D resources have built-in reference counts.
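The allocation difference can be observed with a rough experiment like the one below, which replaces the global `operator new` to count heap allocations. `Payload`, `g_allocations`, and the two helper functions are hypothetical names. The exact counts depend on the standard library and optimizer, so treat the numbers as typical rather than guaranteed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

static std::size_t g_allocations = 0;  // bumped by the replacement operator new below

// Replacement global allocation functions that count every heap allocation.
void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Payload {
    int value = 0;
};

// Object and control block are allocated separately (typically two allocations).
std::size_t allocationsForNewThenShared() {
    const std::size_t before = g_allocations;
    std::shared_ptr<Payload> p(new Payload);
    return g_allocations - before;
}

// make_shared can place object and control block in one allocation (typically one).
std::size_t allocationsForMakeShared() {
    const std::size_t before = g_allocations;
    auto p = std::make_shared<Payload>();
    return g_allocations - before;
}
```

On common implementations the first function usually reports two allocations and the second one. The standard only recommends the single allocation for `make_shared`; it does not require it.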

And of course there are the other types of smart pointers that are a bit more lightweight, like boost::scoped_ptr. I always check whether I can use one of those before resorting to raw pointers.

Prune wrote: "The main reason shared_ptr has a performance impact is that the standard specifies it as thread-safe and so it has to use atomic instructions to update the reference count"

And despite this, they're still not completely "thread safe" (terrible vague term that it is) -- the objects they point to can be owned by multiple threads, but a shared_ptr itself cannot be shared between multiple threads (if any of them have write access).
[If you've got a shared_ptr that is read/writable by two threads, and one sets it to null at the same time as another thread attempts to copy it, the reference count can reach zero (and the object deleted) just before the pointer is copied and the recently-deleted ref-counter incremented back to 1]

This has become a rant about shared_ptr, not smart ptrs in general?

We use intrusive ref counting to eliminate the extra heap allocations. Our pointer wrapper can be defined to be either thread safe or not (by default it is not).
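A minimal sketch of intrusive reference counting, assuming a single-threaded (non-atomic) counter. `RefCounted`, `IntrusivePtr`, and `Texture` are hypothetical names and not the commenter's actual wrapper.

```cpp
#include <cstddef>
#include <utility>

// Base class that embeds the counter in the object itself: no separate control block.
class RefCounted {
public:
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;

    void addRef() { ++refs_; }  // not thread safe by design
    void release() {
        if (--refs_ == 0) {
            delete this;
        }
    }
    std::size_t refCount() const { return refs_; }

protected:
    RefCounted() = default;
    virtual ~RefCounted() = default;

private:
    std::size_t refs_ = 0;
};

// Pointer wrapper that is exactly one raw pointer wide.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() = default;
    explicit IntrusivePtr(T* p) : ptr_(p) { if (ptr_) ptr_->addRef(); }
    IntrusivePtr(const IntrusivePtr& other) : ptr_(other.ptr_) { if (ptr_) ptr_->addRef(); }
    IntrusivePtr(IntrusivePtr&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
    IntrusivePtr& operator=(IntrusivePtr other) noexcept {
        std::swap(ptr_, other.ptr_);  // copy-and-swap; old target released by 'other'
        return *this;
    }
    ~IntrusivePtr() { if (ptr_) ptr_->release(); }

    T* get() const { return ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_ = nullptr;
};

struct Texture : RefCounted {
    int width = 0;
};
```

A thread-safe variant would swap the plain counter for `std::atomic<std::size_t>`, which is the kind of compile-time choice the comment describes.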

I agree that smart pointers can cause performance issues, mostly with cache misses since you have to fetch the object (or ref counter location) just to copy the pointer address.

It would indeed be wise to use raw pointers for low level systems such as rendering. The render manager may store a single smart pointer as objects are registered. This will preserve the object while it is inside the renderer. Then, internally raw pointers can be "safely" used.
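That arrangement might look something like the sketch below, with `std::shared_ptr` as the owning pointer. `Renderable` and `RenderManager` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Renderable {
    int drawCount = 0;
    void draw() { ++drawCount; }
};

class RenderManager {
public:
    // One reference-count update at registration time keeps the object alive.
    void registerObject(std::shared_ptr<Renderable> object) {
        drawList_.push_back(object.get());
        owned_.push_back(std::move(object));
    }

    // The per-frame loop touches only raw pointers: no reference counting here.
    void renderFrame() {
        for (Renderable* r : drawList_) {
            r->draw();
        }
    }

    std::size_t objectCount() const { return drawList_.size(); }

private:
    std::vector<std::shared_ptr<Renderable>> owned_;  // ownership
    std::vector<Renderable*> drawList_;               // hot-path access
};
```

The raw pointers are only "safe" because `owned_` and `drawList_` are kept in sync. An unregister function would have to remove the entry from both.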

Smart pointers also often cause circular references and memory leaks. But imo, their benefits FAR outweigh their pitfalls.

I hate them, I hate them, I hate them. You need a smart pointer manager to manage your smart pointers. Why don't smart pointer classes implement simple code to log where they were allocated in debug builds? How stupid is it that a smart pointer leaks, somewhere, here's the object on the heap - oh, you weren't tracing all heap allocations? Poor u! Smart is as smart does, it's a misnomer!

Oh, I forgot - here's a worthless counter representing all the places where you copy-constructed it and incremented the ref counter, now fix it.

I have to agree with bzroom - the performance issues were certainly caused by me. I simply used the wrong tool for the job, due to a lack of experience with them. However, this string of comments has been extremely informative and interesting.

I'm happy to see that there is such a huge amount of experience roaming around GDNet!