Resource management

Started by
29 comments, last by SmkViper 9 years, 5 months ago

Suppose now I do end up with a Manager per type of asset and I am giving away raw pointers, why would I want shared_ptr instead of a unique one? I see why I'd need a shared one if I give away weak_ptr, but why with raw pointers?

What is a manager, exactly?

Let's take a ModelManager.

I'm guessing you've got one set of calls that says "Load this model and give me a handle". That breaks down into several tasks. The model may already exist and be in use, so that means a model cache and/or a model pool. You have a handle to a model, but if you are sane you will not block: you want an immediate response that returns a proxy object rather than waiting half a second or longer for the data to load from disk, so you've got a proxy. Then you want the data from somewhere, probably a disk, but potentially a data stream, a network, or some other location, so you've got a loader.

So as I wrote earlier "manager" usually means "cache", "loader", "pool", and "proxy".

Also remember that you are dealing with data; your objects are wrappers around data. You don't care so much about the object at 0x32b34100, but you do care about the data "model of the player". That is a subtle but important distinction.

Using those we can answer your questions.

Is a ModelManager class (I'd prefer calling it a ModelPool) going to give away raw pointers to the data? In major projects, no way. You don't want to wait for loads (especially for complex models made up of many meshes and textures that potentially cause a cascade of loading many other resources), you want to allow reuse and restructuring of data, so you have a proxy object that serves for indirection to access data. The proxy can redirect to a placeholder. The proxy can redirect to loaded resource data. It is possible that the pool may be doing some shuffling internally and moving stuff around so there may be more than one copy of the data, you don't care where it is, only that you have a way to access valid data through an object. If the data needs to move for some reason the proxy can redirect it to a new copy or an old copy or whatever it wants. If the data gets evicted from the model cache for any reason the proxy can handle it correctly and give you CORRECT and USABLE data, even if that is not IDEAL data of the final model. Far better to get a single-point model for your proxy object that is available instantly rather than to stall the player for several seconds and pop up a loading screen.

Is a ModelManager class (= ModelPool) going to give away shared_ptr or unique_ptr or similar? No, because the classes that rely on it do not manage the object's lifetime. The cache or pool is in charge of the actual model data's lifetime. If you are using a proxy then you don't care about the lifetime of the proxy, only about the lifetime of the underlying resource. So you can return a proxy object directly, and you don't give out the details of the resource except within the library itself.

When the renderer needs the model and gives you a proxy to fill out, the proxy (not the manager) can return a raw pointer because the pointer will be immediately used and then discarded with an immediate single-use lifetime. That pointer may point to the actual model or to the default placeholder model, but that doesn't matter because it only exists for an instant and is not preserved anywhere. The only time such a pointer is handed out is for immediate use, never to be persisted nor stored by any system. The raw pointer should be considered invalidated very quickly depending on your system design. It may be invalid at the end of the function, it may be invalid at the end of the frame, it may be invalid at the end of the update, but whatever it is, the raw pointer should never be stored for longer term use.
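A minimal sketch of that proxy arrangement, under stated assumptions: `Model`, `ModelPool`, `ModelProxy`, `request`, `finishLoad`, `evict` and `resolve` are hypothetical names invented for illustration, loading is simulated by a direct call rather than a real asynchronous loader, and the pool is assumed to outlive every proxy it hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model data; a real one would hold meshes, textures, and so on.
struct Model {
    std::string name;
    int vertexCount;
};

class ModelPool;

// Long-lived, non-owning proxy. Clients store this, never a Model*.
class ModelProxy {
public:
    // Returns a pointer for immediate use only; callers must not store it.
    const Model* resolve() const;
private:
    friend class ModelPool;
    ModelProxy(const ModelPool* pool, std::size_t slot) : pool_(pool), slot_(slot) {}
    const ModelPool* pool_;
    std::size_t slot_;
};

class ModelPool {
public:
    ModelPool() : placeholder_{"placeholder", 1} {}

    // Returns immediately; the data may not be loaded yet.
    ModelProxy request(const std::string& name) {
        auto it = slotByName_.find(name);
        if (it == slotByName_.end()) {
            it = slotByName_.emplace(name, slots_.size()).first;
            slots_.push_back(nullptr);  // slot exists, data not loaded yet
        }
        return ModelProxy(this, it->second);
    }

    // Stand-in for an asynchronous loader finishing its work.
    void finishLoad(const std::string& name, int vertexCount) {
        auto it = slotByName_.find(name);
        assert(it != slotByName_.end() && "finishLoad for a model nobody requested");
        slots_[it->second] = std::make_unique<Model>(Model{name, vertexCount});
    }

    // Evicts the data; existing proxies fall back to the placeholder.
    void evict(const std::string& name) {
        auto it = slotByName_.find(name);
        if (it != slotByName_.end()) slots_[it->second].reset();
    }

private:
    friend class ModelProxy;
    const Model* lookup(std::size_t slot) const {
        const Model* loaded = slots_[slot].get();
        return loaded ? loaded : &placeholder_;  // always usable data
    }
    Model placeholder_;
    std::vector<std::unique_ptr<Model>> slots_;
    std::unordered_map<std::string, std::size_t> slotByName_;
};

inline const Model* ModelProxy::resolve() const { return pool_->lookup(slot_); }
```

The renderer would call `resolve()` each time it draws and drop the result straight away, so a load or an eviction between frames never leaves it holding a dangling pointer.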



As described, use a raw pointer or reference (to the resource or to the shared_ptr) if the resource is guaranteed to outlive the pointer/reference. There's no way to check whether the resource is still valid; you can only assume.

This method is actually a very valid way to go, since you usually will bundle up the resources to be used in a particular 'level' together. You can unload the resources when the level is finished (or whatever other logical construct you want to use) and in between you can utilize the naked pointer to your heart's content. That provides a very lightweight system, and as long as your non-managing code doesn't delete a resource, it would be viable.
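As a rough sketch of that per-level approach (the `Texture` and `LevelResources` names are made up for illustration, and "loading" just constructs an object), one owner holds everything and hands out naked pointers that stay valid until the whole bundle is destroyed:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

struct Texture { std::string name; };

// Owns every resource for one level; everything is released together
// when the LevelResources object is destroyed.
class LevelResources {
public:
    // Returns a non-owning pointer, valid until this object is destroyed.
    Texture* load(const std::string& name) {
        assert(!name.empty());
        std::unique_ptr<Texture>& slot = textures_[name];
        if (!slot) slot = std::make_unique<Texture>(Texture{name});  // stand-in for disk I/O
        return slot.get();
    }
    std::size_t count() const { return textures_.size(); }
private:
    std::unordered_map<std::string, std::unique_ptr<Texture>> textures_;
};
```

Nothing here can detect a stale pointer; the design relies entirely on the rule that no pointer outlives the level.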

Personally I would rather utilize (and do utilize) a proxy in between the resource and its users. This has many benefits, including the fact that the object that the client code has (the proxy in this case) does not have any ownership at all. The reference only points to an object, but doesn't have the ability to keep it alive. When you try to use it, the proxy is used to look up the reference and can check if the resource is still alive. This lets you assert that there aren't any calls to a resource that was unloaded.

I found that an oldish post on the bitsquid blog was a pretty good reference for this type of discussion: http://bitsquid.blogspot.com/2011/09/managing-decoupling-part-4-id-lookup.html

What's described in that article shares some of what I envision for the re-write of my ResourceCache, so thanks for the link.

In most ways I can think of, weak_ptr serves the function of a proxy, except that:

  • It only allows you to rehydrate a shared_ptr, which means you're taking an owning interest in the resource, if only temporarily. There's a risk, then, of mistakenly forgetting to release that shared_ptr, causing the resource to leak.
  • Because it relies on the shared_ptr control block, it can't be used with unique_ptr, which would better express the semantics of your solution if you wanted the cache/pool to be the sole owner; shared_ptr is also less efficient. Because weak_ptr necessarily leaks its underlying shared_ptr dependency, there's no way to work around this (I suppose you could wrap weak_ptr in another class and return that, but that's just boilerplate and doesn't resolve the inefficiency).
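For reference, the "rehydrate" step in the first point is `std::weak_ptr::lock()`. The sketch below uses a hypothetical `Mesh` type and helper name; it shows that the only way to reach the object is through a temporary shared_ptr, which is an owning reference for as long as it lives.

```cpp
#include <cassert>
#include <memory>

struct Mesh { int triangles; };

// Returns the triangle count, or -1 if the mesh has already been released.
int trianglesOrMinusOne(const std::weak_ptr<Mesh>& weak) {
    // lock() yields a shared_ptr: a temporary *owning* reference.
    if (std::shared_ptr<Mesh> strong = weak.lock()) {
        return strong->triangles;  // safe: strong keeps the mesh alive here
    }                              // strong is released at the end of this scope
    return -1;
}

void demoLock() {
    auto owner = std::make_shared<Mesh>(Mesh{12});
    std::weak_ptr<Mesh> weak = owner;
    assert(trianglesOrMinusOne(weak) == 12);
    owner.reset();  // last owner gone, mesh destroyed
    assert(trianglesOrMinusOne(weak) == -1);
}
```

If the locked shared_ptr were stored in a member instead of a local, the mesh would stay alive after the cache dropped it, which is the leak risk described above.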

So, it seemed to me that there's a legitimate need for something like weak_ptr, but for unique_ptr rather than shared_ptr (that is: weak_ptr is to shared_ptr, as ??? is to unique_ptr). Others agree, as a little google-fu led me to this question on StackOverflow, and a subsequent answer mentioning this observer_ptr proposal for C++, which acknowledges the need for something similar, though it only goes as far as being a standard semantic layer over a raw pointer and less so a functional one. It provides useful semantics for pointers intended as observers, but doesn't provide a means to check whether what it observes is still valid.

So it seems to me now that there's actually a missing pair of smart pointers that together implement the semantics of a uniquely-owned object and of observers that ensure the object is still alive before returning a one-time-use pointer to it. This would differ from shared_ptr/weak_ptr by its shared_ptr equivalent being strictly lighter weight, and by its weak_ptr equivalent offering no implication of rehydrating an ownership interest in the object from it.

Something like the proposed std::optional of unique_ptr combined with something weak_ptr-like but aware of the object's "optional-ness" seems to give the right semantics (though I haven't scrutinized it too hard), but I wonder how std::optional is intended to be implemented/specialized and whether a special implementation of this observed and observable smart pointer pair could be made more efficient by clever implementation (e.g. maybe as a tagged pointer).

throw table_exception("(? ???)? ? ???");



I can't think of any need for a "weak_ptr" for a unique_ptr. Because as soon as you have something like that it's no longer unique by simple virtue that in order to use the object you have to share ownership for a while so the unique_ptr doesn't delete it. It just makes no logical sense. Not to mention that in order to implement something like that you'd lose all the advantages of unique_ptr in the first place - namely the fact that unique_ptr is the exact same cost and speed as a raw pointer. You might as well use shared_ptr if you need a weak_ptr-like access because that's what you really wanted in the first place.

As far as I can tell from that proposal - they just re-implemented raw pointers with a fancy name. So... use a raw pointer. There is no mechanism for clearing the "observer pointer" when the unique_ptr nukes itself, simply because the unique_ptr has no way to tell anyone that it's dead; that would require additional data that people don't want in unique_ptr in the first place.

The closest thing I can think of is a handle system (which isn't standardized) that would basically store pointers in a giant table. When something is deleted you clear the entry in the table, and all the handles (pointing at the table entry) are invalidated automatically. You still have an overhead of a table, and you lose ownership semantics, but that sounds kind of like what you want. Of course, more advanced systems will also "version" the table and handles so that table entries can be re-used and long-lasting stale handles can detect the re-use and return null. (This still will not save you from someone nuking the table entry while someone is working with a resolved handle - to which you can add shared pointer mechanics to keep it around while the table entry is cleared... but we're right back where we started)
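A small sketch of such a versioned handle table, with the caveat that `Handle`, `HandleTable` and their members are hypothetical names and that this version is non-owning and not thread-safe. Each slot carries a generation counter that is bumped on removal, so a stale handle no longer matches even after the slot is reused.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Handle {
    std::uint32_t index;       // which table slot
    std::uint32_t generation;  // which "version" of that slot
};

template <typename T>
class HandleTable {
public:
    // Registers a (non-owned) object and returns a handle to it.
    Handle insert(T* object) {
        assert(object != nullptr);
        std::uint32_t index;
        if (!freeList_.empty()) {
            index = freeList_.back();  // reuse a cleared slot
            freeList_.pop_back();
        } else {
            index = static_cast<std::uint32_t>(entries_.size());
            entries_.push_back(Entry{nullptr, 0});
        }
        entries_[index].object = object;
        return Handle{index, entries_[index].generation};
    }

    // Clears the entry; every outstanding handle to it becomes stale.
    void remove(Handle h) {
        if (resolve(h) == nullptr) return;  // already stale or invalid
        entries_[h.index].object = nullptr;
        ++entries_[h.index].generation;
        freeList_.push_back(h.index);
    }

    // Returns nullptr for stale or out-of-range handles.
    T* resolve(Handle h) const {
        if (h.index >= entries_.size()) return nullptr;
        const Entry& e = entries_[h.index];
        return e.generation == h.generation ? e.object : nullptr;
    }

private:
    struct Entry {
        T* object;
        std::uint32_t generation;
    };
    std::vector<Entry> entries_;
    std::vector<std::uint32_t> freeList_;
};
```

As noted above, this does nothing to stop the entry being cleared while someone is still working with a pointer returned by `resolve()`.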

I've tried many systems in the past, reference counting, weak pointers, smart pointers. All have their strengths and weaknesses.

Honestly though, if your game design allows it, have a resource block per level and throw the old one away.

Nuke it from orbit, it's the only way to be sure.

If your game does not support a level based resource system, be prepared for problems and allow time for debugging.

We now use a very complex system with reference counted pointers, multi-stage tear down, and bit masked loading. (bit masked loading uses the hash value of the resource to see if it needs to be deleted, left alone, or loaded on a scene change).

It's a nightmare.

We have a system that allows designers to use a tool to generate small game objects as cpp files, and if they forget to check a check box you get crashes on a scene change that are hell on earth to track down. Sadly I can't change the default on that check box either or we get memory leaks.

And all this generated code gets built on the fly when you launch the game....

I didn't want to be a programmer, I wanted to be a lumberjack. Swinging from tree to tree across the mighty rivers of British Columbia. With my best girl by my side I'd sing... sing ... sing...

Sorry

Long day


I can't think of any need for a "weak_ptr" for a unique_ptr. Because as soon as you have something like that it's no longer unique by simple virtue that in order to use the object you have to share ownership for a while so the unique_ptr doesn't delete it. It just makes no logical sense. Not to mention that in order to implement something like that you'd lose all the advantages of unique_ptr in the first place - namely the fact that unique_ptr is the exact same cost and speed as a raw pointer. You might as well use shared_ptr if you need a weak_ptr-like access because that's what you really wanted in the first place.

As far as I can tell from that proposal - they just re-implemented raw pointers with a fancy name. So... use a raw pointer. There is no mechanism for clearing the "observer pointer" when the unique_ptr nukes itself, simply because the unique_ptr has no way to tell anyone that it's dead; that would require additional data that people don't want in unique_ptr in the first place.

I don't think it's hard at all to imagine what the desired and useful semantics of such a pointer would be -- you want a single pointer that owns its resource, and a co-pointer that can check whether it's still valid before returning a for-immediate-use-only non-owning pointer. The owning pointer would have the move/copy semantics of unique_ptr, but would itself be the same size as a shared_ptr (a pointer to the resource plus a pointer to a control block), or it could just point to the control block and pay a double-indirection cost to get to its resource, just like shared_ptr could; it would have a ref-counted control block that would work mostly like shared_ptr's, except it would have only the equivalent of weak_count. The "handle" pointer would behave mostly identically to weak_ptr, except it would return the pointer (or nullptr), not a shared_ptr -- and it would express single-use, non-owning semantics. Together, they could allow for the resource to be moved if necessary. These are the same semantics that many handle/body systems provide, like the one Jason described, albeit many of them accomplish it via a table-based manager class similar to what you describe.
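To make the shape of that idea concrete, here is a deliberately simplified, single-threaded sketch. Every name in it (`sketch::Owner`, `sketch::Observer`, `ControlBlock`, `observe`) is invented for illustration, the counter is a plain integer rather than an atomic, and it takes the double-indirection route in which both pointers hold only the control block.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

namespace sketch {

template <typename T>
struct ControlBlock {
    T* object;              // nullptr once the owner has destroyed the object
    std::size_t observers;  // the only count kept: the weak_count equivalent
    bool ownerAlive;
};

template <typename T> class Observer;

// Sole owner of the object; move-only, like unique_ptr.
template <typename T>
class Owner {
public:
    explicit Owner(T* object) : block_(new ControlBlock<T>{object, 0, true}) {}
    Owner(Owner&& other) noexcept : block_(std::exchange(other.block_, nullptr)) {}
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;
    Owner& operator=(Owner&&) = delete;  // left out to keep the sketch short
    ~Owner() {
        if (!block_) return;             // moved-from owner
        delete block_->object;
        block_->object = nullptr;        // observers now see nullptr
        block_->ownerAlive = false;
        if (block_->observers == 0) delete block_;
    }
    T* get() const { return block_ ? block_->object : nullptr; }
    Observer<T> observe() const {
        assert(block_ && "observe() called on a moved-from Owner");
        return Observer<T>(block_);
    }
private:
    ControlBlock<T>* block_;
};

// Non-owning co-pointer; keeps only the control block alive, never the object.
template <typename T>
class Observer {
public:
    Observer(const Observer& other) : block_(other.block_) { ++block_->observers; }
    Observer& operator=(const Observer&) = delete;  // left out to keep the sketch short
    ~Observer() {
        if (--block_->observers == 0 && !block_->ownerAlive) delete block_;
    }
    // Pointer for immediate use only, or nullptr if the object is gone.
    T* get() const { return block_->object; }
private:
    friend class Owner<T>;
    explicit Observer(ControlBlock<T>* block) : block_(block) { ++block_->observers; }
    ControlBlock<T>* block_;
};

}  // namespace sketch
```

A caller would write `if (T* p = obs.get()) { /* use p right now */ }`. Nothing in the sketch prevents the owner from destroying the object while `p` is still in use.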

Now, having thought about it over the weekend, I agree that it doesn't appear to have much advantage over just using shared_ptr at first blush -- after all, you could wrap a weak_ptr in a simple facade and just use a shared_ptr that you never share and get the same semantics (* that's a half-truth I'll get to in a second) as I describe for my imagined co-pointers if you're happy to pay the overhead of doing so. On the surface it doesn't appear that a custom implementation saves us much; in fact, it only saves one, maybe two, counters as near as I can tell. But that's not the whole story...

Now for the half-truth. The main expense of using shared_ptr isn't the size of the control block, or even the fact that it's twice as large as a raw pointer -- it's that every time you create or destroy a shared_ptr to the same resource you have to jump through hoops to be thread-safe while you change the use_count and potentially destroy the object. The semantics described by my imagined co-pointers don't require this kind of heavy synchronization because only the owning pointer determines the lifetime of the object; only the increment/decrement of the weak_count needs to be synchronized, and many processors have ISA-level support for that. The only time heavy synchronization might be required is when the owning pointer deletes the resource and informs the control block, but I think even that isn't necessary. Now, simply not over-using shared_ptr by passing it all the way down your call-stacks can save a lot of overhead itself, but the co-pointers I describe don't really have this flaw to begin with -- pass them around all you want, there's not much cost over a raw pointer except an extra pointer on the stack and an atomic increment.

But, I also don't disagree with your assertion that many times you do want to take a temporary ownership interest in something, and for that shared_ptr/weak_ptr express the right semantics. There are really two camps regarding resource managers -- the kind where ownership flows into the hands of a user and can be implemented with shared_ptr/weak_ptr, and the handle/body style that usually uses a table-based manager -- the co-pointers I describe aren't really proposing new semantics, just standard semantics for handle/body that don't require centralized management. Neither camp is wrong; it's just a different philosophy that offers different trade-offs.

Regarding the proposal for observer_ptr -- yes, it's little more than a semantic layer over a raw pointer, but that alone is incredibly useful. With other builtin types, like int or char, you know all there is to know about them that the language could reasonably tell you -- you might not know what purpose they serve algorithmically, but you have a reasonable idea of where the fences are, at least. But it's never been so with pointers -- if you have a pointer to an int you may know about this int what you always know about int, but you can't be certain without more context and you certainly don't have any earthly idea what semantics the pointer itself was meant to convey -- does it own or not own what it points to? Is what it points to on the stack or heap? Does it point into an array? How was that array allocated? Is it really pointing to a memory-mapped register? Good naming helps convey intent, but it doesn't erect any fences. Interestingly, if you read that paper, you'll see it's taken a lot longer for everyone to agree on the move/copy semantics observer_ptr should have than one would think if it really were as simple as doing what a raw pointer would do -- it's contentious enough that it's still a proposal and didn't make it into either C++11 or C++14.


All of that description can be satisfied with a simple handle or proxy object. You have a handle/proxy to the resource. It serves as a long-term reference that can even be persisted. You can check to see if the handle/proxy is currently live. You can request a raw pointer to the direct data, with some systems using some kind of locking mechanism if you need a longer-term pointer, but many returning a pointer that is invalid at the end of the render/update/simulate phase.

Lifetimes and ownership are important. The various forms of smart pointers are helpful to manage cases where lifetimes need to shift between systems. That is not often the case in most games; the lifetime is well established and the data rarely migrates between systems.

I don't think it's hard at all to imagine what the desired and useful semantics of such a pointer would be -- you want a single pointer that owns its resource, and a co-pointer that can check whether it's still valid before returning a for-immediate-use-only non-owning pointer.


I don't necessarily disagree on the usefulness aspect. I'm simply confused as to why people think we don't have this already with unique_ptr/raw ptr pair. If you are asking your "weak ptr" for an actual pointer you are going to want something to ensure the owner doesn't nuke the memory while you're using it - at which point you're using shared ptr semantics.

The owning pointer would have the move/copy semantics of unique_ptr, but would itself be the same size as a shared_ptr (a pointer to the resource plus a pointer to a control block), or it could just point to the control block and pay a double-indirection cost to get to its resource, just like shared_ptr could; it would have a ref-counted control block that would work mostly like shared_ptr's, except it would have only the equivalent of weak_count. The "handle" pointer would behave mostly identically to weak_ptr, except it would return the pointer (or nullptr), not a shared_ptr -- and it would express single-use, non-owning semantics. Together, they could allow for the resource to be moved if necessary. These are the same semantics that many handle/body systems provide, like the one Jason described, albeit many of them accomplish it via a table-based manager class similar to what you describe.


Ok - so you can save 32 bits - assuming your compiler/allocator doesn't pad it out to 64 bits anyway. That's a measurable win if applicable. I still contend that returning a raw pointer from a weak reference is a bad idea as the owner can just delete it out from under you while you're holding onto the raw pointer.

Now for the half-truth. The main expense of using shared_ptr isn't the size of the control block, or even the fact that it's twice as large as a raw pointer -- it's that every time you create or destroy a shared_ptr to the same resource you have to jump through hoops to be thread-safe while you change the use_count and potentially destroy the object. The semantics described by my imagined co-pointers don't require this kind of heavy synchronization because only the owning pointer determines the lifetime of the object; only the increment/decrement of the weak_count needs to be synchronized, and many processors have ISA-level support for that. The only time heavy synchronization might be required is when the owning pointer deletes the resource and informs the control block, but I think even that isn't necessary. Now, simply not over-using shared_ptr by passing it all the way down your call-stacks can save a lot of overhead itself, but the co-pointers I describe don't really have this flaw to begin with -- pass them around all you want, there's not much cost over a raw pointer except an extra pointer on the stack and an atomic increment.


I'm pretty sure the synchronization costs for your system and shared_ptr are identical. Both atomically increment/decrement a counter whenever you make copies of the shared_ptr/weak_ptr. The handle system doesn't get you anything here as far as I know - because you don't need to actually delete anything until the count goes to 0, which doesn't need to be synchronized because no one else points at the value by definition. Creation also doesn't need to be synchronized because, again, no one else is pointing at the object.

Regarding the proposal for observer_ptr -- yes, it's little more than a semantic layer over a raw pointer, but that alone is incredibly useful. With other builtin types, like int or char, you know all there is to know about them that the language could reasonably tell you -- you might not know what purpose they serve algorithmically, but you have a reasonable idea of where the fences are, at least. But it's never been so with pointers -- if you have a pointer to an int you may know about this int what you always know about int, but you can't be certain without more context and you certainly don't have any earthly idea what semantics the pointer itself was meant to convey -- does it own or not own what it points to? Is what it points to on the stack or heap? Does it point into an array? How was that array allocated? Is it really pointing to a memory-mapped register? Good naming helps convey intent, but it doesn't erect any fences. Interestingly, if you read that paper, you'll see it's taken a lot longer for everyone to agree on the move/copy semantics observer_ptr should have than one would think if it really were as simple as doing what a raw pointer would do -- it's contentious enough that it's still a proposal and didn't make it into either C++11 or C++14.


Here I'll agree - adding more type information can always be a benefit - especially when the compiler can use that to catch bugs.

As a counter-argument, however, you shouldn't be using pointers for anything but non-owning-optional-value. If you want a non-optional value, use references instead. If you want to portray ownership, pass in your shared_ptr/unique_ptr. If you're passing an array, use iterators instead (which lets you use more than just arrays if necessary). Doesn't matter if it is on the stack or heap because you shouldn't be doing anything to that pointer that would require it to matter. (i.e. you shouldn't be sticking raw pointers into a shared_ptr)
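Those conventions can be read off function signatures directly. The snippet below is only an illustration with made-up names (`Texture`, `Material`, `widthOf`, `widthOrZero`, `totalWidth`), one signature per case mentioned above.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Texture { int width; };

// Non-optional, non-owning: take a reference.
int widthOf(const Texture& t) { return t.width; }

// Optional, non-owning: take a raw pointer, where nullptr means "none".
int widthOrZero(const Texture* t) { return t ? t->width : 0; }

// Ownership transfer: take the unique_ptr itself, by value.
class Material {
public:
    void setTexture(std::unique_ptr<Texture> t) { texture_ = std::move(t); }
    const Texture* texture() const { return texture_.get(); }
private:
    std::unique_ptr<Texture> texture_;
};

// A sequence: take iterators rather than pointer-plus-length.
template <typename It>
int totalWidth(It first, It last) {
    int sum = 0;
    for (; first != last; ++first) sum += first->width;
    return sum;
}

void demoSignatures() {
    std::vector<Texture> atlas{Texture{64}, Texture{128}};
    assert(widthOf(atlas[0]) == 64);
    assert(widthOrZero(nullptr) == 0);
    assert(totalWidth(atlas.begin(), atlas.end()) == 192);

    Material material;
    material.setTexture(std::make_unique<Texture>(Texture{256}));
    assert(material.texture()->width == 256);
}
```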

The only exception to this rule would be if you're dealing with C APIs or old C++ libraries that haven't been updated to C++11/14, and, well, there's not much you can do there if it's 3rd party code. (And sometimes not much you can do if it's your own code if management won't let you 'fix' it)

It only allows you to rehydrate a shared_ptr, which means you're taking an owning interest in the resource, if only temporarily. There's a risk, then, of mistakenly forgetting to release that shared_ptr, causing the resource to leak. [...] The main expense of using shared_ptr isn't the size of the control block, or even the fact that it's twice as large as a raw pointer -- it's that every time you create or destroy a shared_ptr to the same resource you have to jump through hoops to be thread-safe while you change the use_count and potentially destroy the object.

Maybe I don't understand your issue correctly, but isn't that exactly what one would want?

The asset/resource cache/loader/manager/whatever will almost certainly run asynchronously. Which means unless you can register a (temporary) ownership, the cache will pull the chair you're sitting on from beneath your butt, figuratively. You're in the middle of uploading a model's vertices when the "manager" frees the memory to make room for something else it wants to load. Bang, you're dead.

Yes, you can do clever stuff like the resource IDs in that bitsquid article, and they are easier, faster, blah blah... but they also are not threadsafe at all. Making that beast threadsafe (lockfree, of course) is a nightmare. Or, you would have to design it so the cache is only allowed to delete objects at end-of-frame or such... not precisely pretty. Or, you would have to do without threads.

Holding shared_ptrs and handing out weak_ptrs avoids all problems one could think of. Yes, incrementing the refcount on the shared_ptr is not a free operation. It's an atomic increment, which is like 6-7 cycles instead of 2. But in the light of a single context switch or even a disk access, that's really no biggie. You don't have ten million assets loaded simultaneously, you maybe have a thousand or so (most likely less). So that's a few thousand atomic increments.

Yes, ownership is somewhat "blurred" because the moment you lock() the weak_ptr you obtain a (temporary) ownership, provided that the object is still in the cache. But hey, that is just what you want, it is what you need. And, it doesn't impair the cache's ability to do its job. The cache/manager/whatever is still in complete control of what's being kept and what's ditched. It will hold a single shared_ptr to each of the objects that it wishes to remain cached, and ditch that one once an object is to be destroyed. And destroyed it will be, but at the right time, when nobody is reading from it -- not randomly somewhere mid-frame. You could call it a kind of "deferred" delete.

The nice thing is, you have the guarantee that it will work, there is no way it could possibly fail (shared_ptr makes sure of that!), and it's something like a dozen lines of code. There's no way short of an infinite loop (and then you have a different problem!) you could leak either, since the shared_ptr you get from locking the weak_ptr is of automatic storage duration.
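Roughly what those dozen lines might look like, as a sketch only: `Model`, `ModelCache`, `get` and `evict` are placeholder names, loading is faked with `make_shared`, and the map itself is not protected against concurrent access, which a real asynchronous cache would need.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct Model { std::string name; };

class ModelCache {
public:
    // Hands out a weak_ptr; the cache keeps the only long-lived shared_ptr.
    std::weak_ptr<Model> get(const std::string& name) {
        std::shared_ptr<Model>& slot = cache_[name];
        if (!slot) slot = std::make_shared<Model>(Model{name});  // stand-in for loading
        return slot;
    }
    // Drops the cache's reference. Destruction is deferred until the last
    // locked shared_ptr elsewhere goes out of scope.
    void evict(const std::string& name) { cache_.erase(name); }
private:
    std::unordered_map<std::string, std::shared_ptr<Model>> cache_;
};

void demoDeferredDelete() {
    ModelCache cache;
    std::weak_ptr<Model> weak = cache.get("player");
    {
        std::shared_ptr<Model> inUse = weak.lock();  // temporary ownership
        cache.evict("player");                       // cache lets go mid-use
        assert(inUse && inUse->name == "player");    // still alive and readable
    }                                                // last owner gone: destroyed here
    assert(weak.expired());
}
```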

In addendum about shared_ptr inc/dec costs:

You shouldn't be doing a lot of those anyway. If you pass a shared_ptr you only do it because that function wants to take ownership and you do it by const ref - then you don't pay the increment/decrement costs until that function takes ownership.

Otherwise you pass by raw non-owning pointer, and all the inc/dec costs go away because you paid the cost once when you grabbed the pointer from the smart pointer. As long as you've written your code so that the shared_ptr exists as long as that call does, and the call doesn't store that raw pointer somewhere, you're fine. And if you're in a function that has access to a shared_ptr and needs to deref it multiple times, then just do it once.
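A short illustration of both cases, using hypothetical `Model`, `Scene`, `countVertices` and `adopt` names: the non-owning call never touches the reference count, and the owning call pays for exactly one increment at the moment it stores the pointer.

```cpp
#include <cassert>
#include <memory>

struct Model { int vertexCount; };

// Non-owning use: plain reference (or raw pointer), no refcount traffic.
int countVertices(const Model& model) { return model.vertexCount; }

class Scene {
public:
    // Might take ownership: accept by const reference, so the count is
    // incremented only if and when the pointer is actually stored.
    void adopt(const std::shared_ptr<Model>& model) {
        if (model) kept_ = model;
    }
private:
    std::shared_ptr<Model> kept_;
};

void demoPassing() {
    auto model = std::make_shared<Model>(Model{3});
    assert(countVertices(*model) == 3);
    assert(model.use_count() == 1);  // no increment for the non-owning call

    Scene scene;
    scene.adopt(model);
    assert(model.use_count() == 2);  // one increment, paid when ownership was taken
}
```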

(Of course, all of the above is based on you profiling your code and knowing that inc/dec costs are what is preventing you from getting 60fps)

This topic is closed to new replies.
