The Copy&Swap Idiom

Started by
31 comments, last by TheComet 10 years, 6 months ago

Great thread, especially that post by Hodgman on the history of exceptions smile.png I never have really liked them. They're ok for application development, but games can generally ensure that errors won't happen in the first place enough of the time to skip robust handling of them in most cases, and do direct checking of return values and such when needed.

I do like copy&swap, if only to save on code duplication. And on the topic of the rule of 3, more often than defining all 3, I just do the destructor, and inherit noncopyable to prevent any chance of the other two causing trouble later. Few complex classes ever actually need to be copied, IME, and you can always go back and add copying if you ever do need it.

And to TheComet: /)

Advertisement


I get the idea of copy and swap, but you're really just offloading the work to the copy ctor, aren't you? You could just as easily write the code into the assignment operator and then write the copy ctor to just call the assignment operator. If you're copying resources then you have to write the explicit copy behaviors somewhere.
  • The Big Rule Of 3 states: If you need either a copy-constructor, destructor, or overload the assignment operator, you will need all 3. http://en.wikipedia.org/wiki/Rule_of_three_%28C++_programming%29 This is always the case if your class manually manages resources (e.g. uses 'new' for its member variables).
    It is good practice to have a swap method for such classes in addition to all of that (the "3 and a half rule", some people call it). So with all of this in place, it is usually easier to write a copy&swap implementation and offloading the workload to the copy constructor rather than duplicating your copying code.
  • A copy-constructor's initialisation list will execute noticably faster than manually assigning values. http://stackoverflow.com/questions/13894415/c-creating-objects-initialization-lists-vs-assignment
  • When using copy & swap, no needless self-assignment checks have to be performed ( if( &cp == this ) return *this; )
  • copy & swap is guaranteed to be exception safe.

The bottom line is: There are more pro's than con's for a copy&swap over the traditional method.

-.-

Look, if you're talking Ro3 then you have at least one member that won't copy correctly on its own. That means that you're going to be specifying explicit behavior somewhere in there, otherwise there's no reason to write the functions. If that behavior is not exception safe, regardless of which method you put it in, then you don't have exception safe code. If it is exception safe then you do have exception safe code. C&S will not change this. When you accept by value in operator=(), you are triggering the copy ctor. If the copy ctor is not safe then you can have leaks.

Initializers can be faster than manual assignment, but you need to bench this before asserting that you're saving the universe through premature optimization here. believe it or not, a comparison for equality is likely to be a little more lightweight than constructing a whole new object in a significant number of scenarios. Additionally, the reason initializers are faster than manual assignments is that manual assignments in a ctor are touching members that have already been instantiated. You're not instantiating new members in operator=(), since they already exist, so manual assignment is not slower.

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

However, writing C++ code is already complex, and writing strongly-exception-safe code makes it even more complex.

To be fair, writing such code isn't just an issue where exceptions are used. Writing code without involving exceptions, that can just as equally fail to acquire resources for the same reasons, needs about as much work to ensure correctness.

It's not just resource handling code. The point there is kind of that if you choose to use C++ exceptions in your project, then you have to think about exception-safety in every function, not just in resource handling ones. e.g. Given a made up class with some invariant, in it's functions, we have to think about whether every other function that it calls throws or not.


void Widget::SetHealth( float value )
{
//class invariant: m_integerHealth == floor(m_realHealth)
  m_integerHealth = (int)floor(value);
  UpdateHealthHUD( m_integerHealth );//if this throws, the invariant is broken. Can/does it throw?
  m_realHealth = value;
}

The fix for the above code is pretty simple -- update both members before calling the HUD function, but that's not the point. Given that C++98 / C++03 give you no solid way to know whether a function throws or not (C++11 fixes this somewhat), this is a mental burden that companies/projects/people have the choice of opting out from, by simply shunning C++ exceptions, which in my experience has been a common choice. Contrast this to C#, where the IDE can very quickly tell me whether a function throws, and what specific exceptions is can throw -- in that case, the burden of writing safe code is lessened.

Even some nice safe looking C++11 code like this is actually a potential memory leak, if one of the constructors throws. C++ likes to set traps like that dry.png


f( std::unique_ptr<T1>{ new T1 }, std::unique_ptr<T2>{ new T2 } ); //looks safe; can leak

And on the topic of the rule of 3, more often than defining all 3, I just do the destructor, and inherit noncopyable to prevent any chance of the other two causing trouble later.

Same here biggrin.png With new classes, I usually follow YAGNI, and start off by inheriting a noncopyable base unless it's obvious that clones need to be made. Also, the RO3 isn't needed for most classes, as long as you use it in your RAII smart-pointers and containers -- from that point on, the rest of your classes can be composed out of these RAII objects, meaning they don't need destructors. So, the vast majority of my classes remain non-copyable and don't have explicit destructors, which takes care of the RO3.

When using copy & swap, no needless self-assignment checks have to be performed ( if( &cp == this ) return *this; )

You may still want to include a self-assignment check, even though the code still works without one. For example, if you have 100 member variables, each of which is expensive to copy and/or swap, then this check will act as a healthy optimization.

A copy-constructor's initialisation list will execute noticably faster than manually assigning values.

Only if those members are default initialized and then later assigned. If the members are primitive types, then they're not initialized by default, so using either initialization or assignment will be exactly equivalent.

copy & swap is guaranteed to be exception safe

The standard says that std::swap should never throw... but a clumsy programmer can still write a swap function for their own class (which overrides std::swap via ADL), which breaks this rule and throws exceptions. You need to have trust in all the programmers on the project that they actually will write nothrow swap functions.

@Khatharr

Sorry, I didn't mean to offend you by putting your quote there.

I did the profiling test as you requested, here are the results.


#include <ctime>

// ---------------------------------------------------------------------
// traditional implementation
// ---------------------------------------------------------------------
class Traditional
{
public:

    // default constructor
    Traditional() : m_Data( new int() )
    {
    }

    // copy constructor
    Traditional( const Traditional& that ) : m_Data( new int() )
    {
        *this = that;
    }

    // assignment operator overload
    Traditional& operator=( const Traditional& that )
    {
        if( &that == this ) return *this;
        *m_Data = *that.m_Data;
        return *this;
    }

    // default destructor
    ~Traditional()
    {
        delete m_Data;
    }

    void set( int i){ *m_Data = i; }
private:
    int* m_Data;
};

// ---------------------------------------------------------------------
// copy & swap implementation
// ---------------------------------------------------------------------
class CaS
{
public:

    // default constructor
    CaS() : m_Data( new int() )
    {
    }

    // copy constructor
    CaS( const CaS& that ) : m_Data( new int() )
    {
        *m_Data = *that.m_Data;
    }

    // assignment operator overload using copy & swap
    CaS& operator=( CaS that )
    {
        swap( that );
        return *this;
    }
    void swap( CaS& that )
    {
        using std::swap;
        swap( m_Data, that.m_Data );
    }

    // default destructor
    ~CaS()
    {
        delete m_Data;
    }

    void set( int i){ *m_Data = i; }
private:
    int* m_Data;
};

int main()
{
    std::clock_t begin;
    std::clock_t end;
    double elapsed;

    // ---------------------------------------------------------------------
    // testing: copy constructor speed
    // ---------------------------------------------------------------------
    std::cout << "TEST: copy constructor" << std::endl;
    begin = std::clock();
    for( int i = 0; i != 10000000; ++i )
    {
        Traditional test1;
        Traditional test2( test1 );
    }
    end = std::clock();
    elapsed = double(end-begin) / CLOCKS_PER_SEC;
    std::cout << "traditional: " << elapsed << std::endl;
    begin = std::clock();
    for( int i = 0; i != 10000000; ++i )
    {
        CaS test1;
        CaS test2( test1 );
    }
    end = std::clock();
    elapsed = double(end-begin) / CLOCKS_PER_SEC;
    std::cout << "copy&swap: " << elapsed << std::endl;

    // ---------------------------------------------------------------------
    // testing: assignment speed
    // ---------------------------------------------------------------------
    std::cout << "TEST: assignment" << std::endl;
    begin = std::clock();
    {
        Traditional test1, test2; test2.set(5);
        for( int i = 0; i != 10000000; ++i )
        {
            test1 = test2;
        }
    }
    end = std::clock();
    elapsed = double(end-begin) / CLOCKS_PER_SEC;
    std::cout << "traditional: " << elapsed << std::endl;
    begin = std::clock();
    {
        CaS test1, test2; test2.set(5);
        for( int i = 0; i != 10000000; ++i )
        {
            test1 = test2;
        }
    }
    end = std::clock();
    elapsed = double(end-begin) / CLOCKS_PER_SEC;
    std::cout << "copy&swap: " << elapsed << std::endl;

    return 0;
}

AlmGx.png

The conclusion of this test is that the traditional method of copying and overloading the assignment operator is faster then the copy & swap method. As long as people who use the traditional method don't do silly things like the following code, they should be on the faster side of things.

EDIT: IGNORE THE FOLLOWING CODE


// NOTE: assume m_Data has the definition: int* m_Data = new int[ m_Size ];

// DON'T do this
Foo& operator=( const Foo& that )
{
   if( &that == this ) return *this;
   int* oldData = m_Data;
   for( std::size_t i = 0; i != m_Size; ++i )
      m_Data[i] = that.m_Data[i];
   delete[] m_Data;
   return *this;
}

// DO this instead
Foo& operator=( const Foo& that )
{
   if( &that == this ) return *this;
   std::swap( m_Data, that.m_Data );
   return *this;
}

So when I opened this thread, I questioned how efficient the copy & swap idiom was. Am I safe to conclude that it is not as efficient as the traditional method, with the only benefit of being less error prone?

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

@Hodgman

I can't find the exact quote from you anymore, but somewhere you mentioned how unfortunate it is that the operator new throws exceptions, and can't just return 0 if it failed.

According to this: http://www.cplusplus.com/reference/new/operator%20new%5B%5D/

You can bring that functionality back by writing:


MyClass* myObject = (std::nothrow) new MyClass();
if( !myObject )
   std::cerr << "oh oh, couldn't allocate memory for myObject" << std::endl;
"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

I questioned how efficient the copy & swap idiom was. Am I safe to conclude that it is not as efficient as the traditional method, with the only benefit of being less error prone?

If you're striving for transactional / exception-safe code, it's definitely less error prone. As for efficiency, you'd have to look at a what's actually happening on a case by case basis.
The copy constructor of your CaS class should actually be faster than the Traditional one:


*Traditional copy*      *CaS copy*
 new integer             new integer
 self assignment check   copy integer
 copy integer

Your current CaS assignment is very inefficient though:


*Traditional assign*      *CaS assign*
 self assignment check     new integer
 copy integer              copy integer
                           swap pointers
                           delete integer

So, you're basically profiling the speed of new/delete here tongue.png

You could also try this version that defaults to using an empty pointer, so no new/delete happens unless necessary to store a value, and also this "Move" version, which takes advantage of the fact that it's ok to mess with the temporary copy and steal it's memory allocation.


class CaS
{
public:

    // default constructor
    CaS() : m_Data()
    {
    }

    // copy constructor
    CaS( const CaS& that )
    {
        if( !that.m_Data )
            m_Data = 0;
        else
        {
            m_Data = new int;
            *m_Data = *that.m_Data;
        }
    }

    // assignment operator overload using copy & swap
    CaS& operator=( const CaS& that )
    {
        swap( CaS(that) );
        return *this;
    }
    void swap( CaS& that )
    {
        using std::swap;
        swap( m_Data, that.m_Data );
    }

    // default destructor
    ~CaS()
    {
        delete m_Data;
    }

    void set( int i){ *m_Data = i; }
private:
    int* m_Data;
};


class Move
{
public:

    // default constructor
    Move() : m_Data()
    {
    }

    // copy constructor
    Move( const Move& that )
    {
        if( !that.m_Data )
            m_Data = 0;
        else
        {
            m_Data = new int;
            *m_Data = *that.m_Data;
        }
    }

    // assignment operator overload using copy & move
    Move& operator=( const Move& that )
    {
        move( Move(that) );
        return *this;
    }
    void move( Move& that )
    {
        delete m_Data;
        m_Data = that.m_Data;
        that.m_Data = 0;
    }

    // default destructor
    ~Move()
    {
        delete m_Data;
    }

    void set( int i){ *m_Data = i; }
private:
    int* m_Data;
};

In C++11, as well as copy constructors, we now have move constructors. You might want to read up on r-value references and move semantics if you're interested in this:
http://msdn.microsoft.com/en-us/library/dd293665.aspx
http://en.cppreference.com/w/cpp/language/move_constructor


// DO this instead
Foo& operator=( const Foo& that )
{
   if( &that == this ) return *this;
   std::swap( m_Data, that.m_Data );
   return *this;
}

You can't do that. That code modifies that.m_Data, and that is const, so it's not a valid assignment function (it's more of a swap function!)

So when I opened this thread, I questioned how efficient the copy & swap idiom was. Am I safe to conclude that it is not as efficient as the traditional method, with the only benefit of being less error prone?

I'm still disagreeing with the assertion that it's less error prone compared to RAII. (I'm not offended, just mildly facepalmy. You're fine.)

Foo& operator=( const Foo& that )
{
   if( &that == this ) return *this;
   int* oldData = m_Data;
   for( std::size_t i = 0; i != m_Size; ++i )
      m_Data[i] = that.m_Data[i];
   delete[] m_Data;
   return *this;
}
^ No, no, no...

//Make m_Data a unique_ptr<int[]>. RAII helps with exception safety!

Foo& Foo::operator=(const Foo& that) {
  if(&that == this) {return *this;}

  unique_ptr<int[]> tentativeData(new int[that.m_Size]);
  int* srcPtr = that.m_Data.get();
  std::copy(srcPtr, srcPtr + that.m_Size, tentativeData.get());
  //exception prior to here? tentativeData is deallocated by the smart pointer
  m_Data = std::move(tentativeData); //the original contents of m_Data will be deallocated by the move
  m_Size = that.m_Size;
  //the m_Size assignment won't be reached if an exception occurs during the move

  return *this;
}

Foo::Foo(const Foo& that) : m_Data(nullptr) {
  operator=(that);
}

//if that shows up as a performance concern (unlikely) then do it explicitly instead:

Foo::Foo(const Foo& that) : m_Data(new int[that.m_Size]), m_Size(that.m_Size) {
  int* srcPtr = that.m_Data.get();
  std::copy(srcPtr, srcPtr + m_Size, m_Data.get());
}


Also, this one isn't right. You're trying to swap the m_Data of the two objects here. The object you're assigning from would end up with the m_Data from the object you're assigning to. Except it can't happen because 'that' is const.
Foo& operator=( const Foo& that )
{
   if( &that == this ) return *this;
   std::swap( m_Data, that.m_Data );
   return *this;
}
void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

Oh god, now I face-palmed. Sorry, let's just step back a bit, and ignore that last code snippet I posted.



Foo& operator=( const Foo& that )
{
   if( &that == this ) return *this;
   int* oldData = m_Data;
   for( std::size_t i = 0; i != m_Size; ++i )
   m_Data[i] = that.m_Data[i];
   delete[] m_Data;
   return *this;
}

^ No, no, no...

Is this not how it's done traditionally? Or am I missing something?


You could also try this version that defaults to using an empty pointer, so no new/delete happens unless necessary to store a value, and also this "Move" version, which takes advantage of the fact that it's ok to mess with the temporary copy and steal it's memory allocation.

I'm trying to decifer how you intended your code to run, as it will not compile on my end (linux, gcc 4.8.1, c++11). Here are the errors:


swap( CaS(that) );

Requires a method accepting an rvalue reference (only possible in C++11), so the method signature should be void swap( CaS&& that ) instead of void swap( CaS& that ).

The same goes for the move method in the other class you provided.

I also changed both set methods to allocate an int if it hasn't been done yet.


void set( int i ){ if( !m_Data ) m_Data = new int(); *m_Data = i; }

And a note on thread safety:


    void move( Move&& that )
    {
        delete m_Data;   // m_Data is momentarily unavailable between here
        m_Data = that.m_Data; // and here
        that.m_Data = 0;
    }

Now to the profiling: Obviously, if you do not set the data, there is nothing to copy, so it is near instant. Setting the data yields the same results as the copy&swap profile test I made earlier.

But the question is: Does this actually optimize performance, or impede it due to all of the extra checks? I cannot think of many real world examples where a class would have non-initialized member variables, and the ones that do have this property are usually classes that aren't copied around the place all of the time (such as manager classes).


So, you're basically profiling the speed of new/delete here tongue.png

But that's exactly what the difference is between copy&swap and the traditional method, so it makes complete sense in my mind: The traditional method doesn't suffer from having to call a copy constructor, because the resources it needs to copy are already there. They just have to be copied across with a single assignment operator each.

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

I'm still disagreeing with the assertion that it's less error prone compared to RAII.

The downside I see with the traditional way of copying is it continuously modifies the original object. An exception could be thrown at any point during resource allocation, and if these resources are directly assigned to member variables of the original object, and one of them throws an exception, the original object may only have copied half of the resources.

Let me expand the example to better demonstrate. Let's assume we have the following class, for which the rule of 3 needs to be implemented. I will omit details such as how the Texture object is constructed for simplicity's sake.


class Player
{
//...
private:
   Texture* m_Texture;
   Sprite* m_Sprite;
   HealthController* m_Health;
};

The traditional way:


    // default constructor
    Player() : m_Texture( new Texture() ), m_Sprite( new Sprite() ), m_Health( new m_HealthController() )
    {
    }

    // copy constructor
    Player( const Player& that ) : m_Texture( new Texture() ), m_Sprite( new Sprite() ), m_Health( new m_HealthController() )
    {
        *this = that;
    }

    // destructor
    ~Player()
    {
        delete m_Health;
        delete m_Sprite;
        delete m_Texture;
    }

    // overload assignment operator
    Player& operator=( const Player& that )
    {
        if( &that == this ) return *this;
        
        // any of these could throw an exception
        *m_Texture = *that.m_Texture;
        *m_Sprite = *that.m_SpriteM
        *m_Health = *that.m_Health;
    }

The copy & swap way:


    // default constructor
    Player() : m_Texture( new Texture() ), m_Sprite( new Sprite() ), m_Health( new m_HealthController() )
    {
    }

    // copy constructor
    Player( const Player& that ) : m_Texture( new Texture(*that.m_Texture) ),
                                                  m_Sprite( new Sprite(*that.m_Texture) ),
                                                  m_Health( new m_HealthController(*that.m_Health) )

    {
    }

    // destructor
    ~Player()
    {
        delete m_Health;
        delete m_Sprite;
        delete m_Texture;
    }

    // overload assignment operator (copy & swap)
    Player& operator=( Player that ) // notice exceptions could be thrown here, but do not change object state
    {
        // this cannot throw exceptions, completely exception safe
        swap( that );
    }
    void swap( Player& that )
    {
        using std::swap;
        swap( m_Texture, that.m_Texture );
        swap( m_Sprite, that.m_Sprite );
        swap( m_Health, that.m_Health );
    }

That's why copy & swap is safer to use.

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

You may still want to include a self-assignment check, even though the code still works without one. For example, if you have 100 member variables, each of which is expensive to copy and/or swap, then this check will act as a healthy optimization.

Only if you perform enough self assignments to overcome the additional branch penalty for non-self assignment. I don't know what kind of code you're working on that has a significant amount of self assignment, but in my experience it's actually pretty rare. As always profile for specific cases, etc. but I would default to no self assignment check.

This topic is closed to new replies.

Advertisement