Jump to content
  • Advertisement
Sign in to follow this  
Decrius

[C++] My reference-counting implementation

This topic is 3446 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Not so long ago I was advised to use boost::shared_ptr and not my own implementation. Using well tested libraries makes you're project more stable and I quite liked the idea how it makes sure unused data is deleted. I did some quick not very reliable test runs (as reliable as I can get) to see what the difference was between boost::shared_ptr, my old reference counting system I made half a year back (which syntax I dislike, since it requires you to use lose(), instead of delete) and just plain old new and delete without keeping track of anything. I'm a performance freak, so these were the measurements: if not keeping track is 100%, then boost was 233% and my method was 113%. The higher above 100%, the slower compared to tracking nothing at all. So, I created a new project and tried to make a reference counting implementation with more ease of use. I wanted it to use new and delete (not gain and lose, neither only new like for boost), be easy to add to any class, be transparent to the user and have very little overhead. I think I got the concept done, need to see what I'll throw if allocation goes awry, and the variables are non-descriptive. This is how it works, the reference counting system (NOTE: adding backslash after each row makes it all appear on one row with this syntax highlighter, so you'll have to add backslashes at all ends of the rows except the last):
#define REFERENCE_COUNTING(T)
    unsigned int reference_count;
public:
    static void *operator new (size_t size) throw (const char *)
    {
        void *p = malloc(size);
        if (!p)
        {
            throw "allocation failure";
        }
        T *b = (T *) p;
        b->reference_count = 0;
        return p;
    }
    
    static void operator delete (void *p)
    {
        if (p)
        {
            T *b = (T *) p;
            if (!b->lose())
            {
                free(p);
            }
        }
    }
    
    void gain()
    {
        reference_count++;
    }

private:
    bool lose()
    {
        if (!reference_count)
        {
            return false;
        }
        reference_count--;
        return true;
    }

public:
    unsigned int count()
    {
        return reference_count;
    }
private:



How it's implemented:
class B
{
    REFERENCE_COUNTING(B)

public:
    B()
    {
        std::cout << "B();\n";
    }

    ~B()
    {
        std::cout << "~B();\n";
    }
};



How it's used:
// normal syntax works, reference counting is entirely transparent
B *b = new B;
delete b;

// how it could work in a library
B *b = new B; // developer using a library allocates the class of the library

library_use(b); // send pointer to library

delete b; // 'delete' instance, won't be deleted since it's still in the library

library_shutdown();

// inside the library
library_use(B *b)
{
    b->gain();
    this->b = b; // library holds the pointer somehow
}

library_shutdown()
{
    if (b)
    {
        delete b;
        b = NULL;
    }
}



What do you think of this way of approaching a reference counting mechanism? It might be not how 'real' reference counting works...I mean, it won't _automatically_ delete the pointer, but this is the way I wanted to make this. Also, from the measurements I did, this came out to be _FASTER_ then plain new and delete. I guess something is wrong here, it took this way 87% of the time it takes without reference counting. This is the measurement code:
#include <iostream>
#include <windows.h>
#include <boost/shared_ptr.hpp>

//#include <mmgr/mmgr.cpp>

double GetSystemTime();

///////////////////////////////////////////////////////////

class A
{
public:
    A()
    {
    }

    ~A()
    {
    }
};

class B
{
    REFERENCE_COUNTING(B)

public:
    B()
    {
    }

    ~B()
    {
    }
};

typedef boost::shared_ptr<A> A_Ptr;

///////////////////////////////////////////////////////////

int main()
{
    double time;

    // boost
    time = GetSystemTime();
    for (unsigned int i = 0; i < 10000; i++)
    {
        A_Ptr a(new A);
    }
    std::cout << (GetSystemTime() - time) << '\n';

    // my method
    time = GetSystemTime();
    for (unsigned int i = 0; i < 10000; i++)
    {
        B *a = new B;
        delete a;
    }
    std::cout << (GetSystemTime() - time) << '\n';

    // normal allocation
    time = GetSystemTime();
    for (unsigned int i = 0; i < 10000; i++)
    {
        A *a = new A;
        delete a;
    }
    std::cout << (GetSystemTime() - time) << '\n';

    std::cout << '\n';
    /* and again */

    // boost
    time = GetSystemTime();
    for (unsigned int i = 0; i < 10000; i++)
    {
        A_Ptr a(new A);
    }
    std::cout << (GetSystemTime() - time) << '\n';

    // my method
    time = GetSystemTime();
    for (unsigned int i = 0; i < 10000; i++)
    {
        B *a = new B;
        delete a;
    }
    std::cout << (GetSystemTime() - time) << '\n';

    // normal allocation
    time = GetSystemTime();
    for (unsigned int i = 0; i < 10000; i++)
    {
        A *a = new A;
        delete a;
    }
    std::cout << (GetSystemTime() - time) << '\n';

    return 0;
}

///////////////////////////////////////////////////////////

double GetSystemTime()
{
    static LARGE_INTEGER Frequency;
    static BOOL UseHighPerformanceTimer = QueryPerformanceFrequency(&Frequency);

    if (UseHighPerformanceTimer)
    {
        LARGE_INTEGER CurrentTime;
        QueryPerformanceCounter(&CurrentTime);

        return static_cast<double>(CurrentTime.QuadPart) / Frequency.QuadPart;
    }
    else
    {
        return GetTickCount() * 0.001;
    }
}

///////////////////////////////////////////////////////////



Thanks for any comments. [Edited by - Decrius on January 9, 2009 7:26:57 AM]

Share this post


Link to post
Share on other sites
Advertisement
Quote:

What do you think of this way of approaching a reference counting mechanism?

I do not like it:
It is intrusive on the class which it is tracking.
Relies on C style casts which could result in bad casts when an offset is required in a hierarchy.
new function - should this not be a static function and should it also not throw an exception which is derived from bad_alloc?
delete - shouldn't this also be static? if so what happens(when static) and a null pointer is deleted which is valid.
If reference count is one and you call delete then you decrement the variable and do not free!
What happens when it is used in a hierarchy? does each level get a different variable reference_count which is not synchronised.

Share this post


Link to post
Share on other sites
Quote:
Original post by dmail
Quote:

What do you think of this way of approaching a reference counting mechanism?

I do not like it:
It is intrusive on the class which it is tracking.
Relies on C style casts which could result in bad casts when an offset is required in a hierarchy.
new function - should this not be a static function and should it also not throw an exception which is derived from bad_alloc?
delete - shouldn't this also be static? if so what happens(when static) and a null pointer is deleted which is valid.
If reference count is one and you call delete then you decrement the variable and do not free!
What happens when it is used in a hierarchy? does each level get a different variable reference_count which is not synchronised.


What do you mean intrusive? And what's the problem with C casts?

And no, reference count always starts at zero...zero means 'one class is using it'. If no class is using it, the object doesn't exist, so why not using the zero? Perhaps I should call it other_references_counter.

And it should only be used in the base class, I myself don't need it for any derived class only.

And thanks, added static stuff and NULL checking.

Share this post


Link to post
Share on other sites
Quote:
Original post by Decrius
What do you mean intrusive? And what's the problem with C casts?


It's intrusive because you need to modify the class you're reference counting. boost::shared_ptr works with any class, not just ones you "own".

The problem with C-style casts is that it's not always clear what kind of cast you're doing. Is it's static_cast<>? reinterpret_cast<>? etc? A C++ class is more explicit in that respect.

Quote:
Original post by Decrius
And no, reference count always starts at zero...zero means 'one class is using it'. If no class is using it, the object doesn't exist, so why not using the zero? Perhaps I should call it other_references_counter.


That's a non-standard meaning for the reference count. I've never heard of a reference counting implementation where "0" means "1 reference". It still works, obviously, but if another programmer comes along and sees your work, they're going to be confused for a while until they figure out the non-standard implementation.

Share this post


Link to post
Share on other sites
Quote:
Original post by Codeka
It's intrusive because you need to modify the class you're reference counting. boost::shared_ptr works with any class, not just ones you "own".


Yes true. That isn't much of a feature-let-down for me, but it does mean it won't work for everything. The way this implementation works is that it needs to be IN a class, and doesn't CONTAIN the object. So it needs to be in a struct / class, while boost takes anything, true.

Quote:
Original post by Codeka
The problem with C-style casts is that it's not always clear what kind of cast you're doing. Is it's static_cast<>? reinterpret_cast<>? etc? A C++ class is more explicit in that respect.


Is it used to make code more clear, or make it safer as well?

Quote:
Original post by Codeka
That's a non-standard meaning for the reference count. I've never heard of a reference counting implementation where "0" means "1 reference". It still works, obviously, but if another programmer comes along and sees your work, they're going to be confused for a while until they figure out the non-standard implementation.


I get what you mean, but it didn't sound logical to me to not use the zero. All counters start at zero for computers...might change it to start at one, just for code clearness.

Share this post


Link to post
Share on other sites
Quote:
Original post by Decrius
What do you think of this way of approaching a reference counting mechanism? It might be not how 'real' reference counting works...I mean, it won't _automatically_ delete the pointer, but this is the way I wanted to make this.
It doesn't work, and I can't really think of a way of making it work. Every delete calls the destructor: you can't use the object once you delete it, regardless of whether you deallocated the memory or not, and you certainly can't delete it a second time.

Try to add a complex object member (say, an std::string) to your object and you'll see what I mean.

Share this post


Link to post
Share on other sites
Just glimpsed over it:


0) Code C++, not C/C++

You are using #define macros, which should be avoided in c++ except for very rare cases. Most of what those macros where used for in C can now be done better with inheritance and templates.


1) Security

You are assuming that an implementing class puts your macro inside public visibilty. It could be that the macro hence exposes protected and private members of the implementing class (and users of intellisense will use what they see in the intellisense-popup (not microsoft specific)).


2) Fight the optimiser

Your benchmark is bogus:
    // my method
time = GetSystemTime();
for (unsigned int i = 0; i < 10000; i++)
{
B *a = new B; // << alloc
delete a; // << dealloc
}
std::cout << (GetSystemTime() - time) << '\n';

Compilers are perfectly allowed to remove the whole loop, as it has no side effects, except the constructor or destructor do something that affects a bigger scope then the loop. It is generally quite hard to write benchmarks. A good start is to make the constructor somehow dependent on unpredictable data (for example user input), and to ensure the output is dependent on the processed input (enforce that the measured data participates in all stages of the IPO model).


3) Unreliable results

Point 3) is really an addendum to point 2). I see you have calls to std::cout<< in B::B() and B::~B(). While this would be a 99% guarantee that your loop will not be optimized away, it is not a good idea to read/write from/to standard i/o:
  • output buffering occurs, and every now and then (e.g. every 8192 chars) it will be flushed, which takes up processing time and makes measurement more or less unreliable
  • you interface with the operating system, which is always a (with respect to time) unpredictable process



4) Compare to boost::intrusive_ptr

You are comparing to boost::shared_ptr what really should be compared to boost::intrusive_ptr.


Appendix A)

Hint: If you are on Linux, you can get more reliable benchmark results by starting your benchmark as the only user process, by passing init=<path-to-your-benchmark> as an argument to the kernel (actually, if you're benchmark is stable (i.e. without std::cout or similar in the measure loop), you can get results that only vary in fractions of percents from run to run).


Appendix B)

For serious measurements, always check what your compiler spits out, i.e. have a look at the assembly, e.g. to make sure certain optimisations do not influence the results.


Further Reading on Optimisation





No offense, but you asked for it ;)

edit: Appendix B) added
edit2: Added optimisation links

[Edited by - phresnel on January 9, 2009 8:41:45 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by ToohrVyk
Quote:
Original post by Decrius
What do you think of this way of approaching a reference counting mechanism? It might be not how 'real' reference counting works...I mean, it won't _automatically_ delete the pointer, but this is the way I wanted to make this.
It doesn't work, and I can't really think of a way of making it work. Every delete calls the destructor: you can't use the object once you delete it, regardless of whether you deallocated the memory or not, and you certainly can't delete it a second time.

Try to add a complex object member (say, an std::string) to your object and you'll see what I mean.


You are right. That's really a pity. Is there then no other way then to use a wrapper to prevent calls to the destructor and deallocation of memory? :'(

So in fact, a custom delete operator isn't really custom...it still does things the stock delete operator does '-.-

Share this post


Link to post
Share on other sites
This looks like something I would have tried only to find out it doesn't work.

Rather than create loads of wrappers to do the same job, its a shame you can't overload pointers in the class definition.

Share this post


Link to post
Share on other sites
I used to use my own smart pointer implementation, mainly because I didn't want too much boost dependency.

However, since that shared_ptr is part of the c++ 0x standard, I think it's a much, much better solution, and I threw away my old implementation (which was also not quite as good and useful as shared_ptr).

The ability to specify a special deleter is very useful when you have situations calling for special end-of-life management (for instance if you have a persistence system, the deleter could put the object in a queue to be serialized back to disk before being deleted)

The ability to have untyped shared_ptr (ie shared_ptr< void >) that still do the right thing and delete the object properly is also very useful.

The standard (and boost) also provide weak_ptr. It can be a bit of a pain to do yourself and get it right, and you'll probably need it at some point. To do it you'll have to give up a lot of simplicity in your implementation because you'll need to allocate a separate object to hold the actual instance pointer, or alternatively you'll have to maintain a list of all the weak pointers pointing on your object.

Your implementation also won't work in situations where you have a reference counted class that you then use through a pointer to a parent class or an interface that it implements.

shared_ptr also works properly with multithreading. Your implementation doesn't.

Finally, since shared_ptr is in the standard, it's bound to become widely adopted. So in the future if you use shared_ptr you'll have an easier time interoperating with libraries also using those.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!