Overall Strategy For Move-Semantics? [C++11]

26 comments, last by l0calh05t 7 years, 8 months ago

Hello,

so here's something I've been wondering about for a while. How do you generally account for move semantics/rvalue references overall? What I mean is, in order to take advantage of move semantics, I've generally been doing the following so far:

1) Make all classes have move-ctors, where possible and advantageous. Since I'm making heavy use of STL wrappers, most classes have functioning default move-ctors already, with the exceptions being classes making use of std::unique_ptr and the like.

2) When I know that a function will take an object that is expensive to copy, but I know that it's only going to be a temporary in all foreseeable use cases (i.e. a local function that is only called for one-time object initialization/serialization), I declare it as &&, like this:


class MyClass
{
public:
    using MyVector = std::vector<ComplexClass>;

    // Takes ownership of the caller's temporary.
    void Function(MyVector&& vData)
    {
        m_vData = std::move(vData);
    }

private:

    MyVector m_vData;
};

However, things get a bit more complicated when I cannot foresee how a variable is going to be used, or when I know I will call the function with temporaries as well as with fixed data members that cannot be moved anyway.

So to all you fellow C++11 users, how do you take care of this? I basically see 4 options:

1) Do not account for it at all. Just use


void Function(const MyVector& vData)
{
    m_vData = vData;
}

like you would've done without C++11 and move semantics, and accept the additional copies, memory allocations and deallocations. It doesn't matter in most cases (so we're basically in premature-optimization land), and/or maybe the compiler can figure this out for itself (which I doubt in any case where the compiler does not inline the function, but I'm by no means an expert).

2) Write both versions, for move and no-move operations:


void Function(const MyVector& vData)
{
    m_vData = vData;
}

void Function(MyVector&& vData)
{
    m_vData = std::move(vData);
}

Obviously this will take care of both use cases, but it requires additional work for every function that benefits from move semantics, and produces code duplication for non-trivial functions. It also gets really messy once there are multiple parameters that could have move semantics.

3) Write only the move-semantic version


void Function(MyVector&& vData)
{
    m_vData = std::move(vData);
}

and when calling the function with a non-temporary, explicitly create a temporary:


const MyVector vDataFromSomewhere;
object.Function(MyVector(vDataFromSomewhere));

While this is just as efficient as before for this case (the temporary I create will be moved in), it requires additional typing for every non-temporary I pass in (so I now have to specify how I want to pass it in, via either a temporary ctor or std::move, for every parameter there is, ugh).

4) Now the next option was pretty much what sparked the question. I found out that I could do this:


void Function(MyVector vData)
{
    m_vData = std::move(vData);
}

MyVector vTemporaryData;
object.Function(std::move(vTemporaryData));

Which will move vTemporaryData to vData, and then vData to m_vData. So this means that for non-temporaries I do not have to do any additional typing (they are simply copied into vData), and it should also be equally efficient. For temporaries, it should also work like with MyVector&&, though in both cases there is an additional move-ctor call that would otherwise be avoided (though I imagine the compiler might be able to optimize this out, plus a move-ctor call is really nothing compared to copying a vector of 1000 elements).
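To make the cost of option 4 concrete, here is a minimal sketch that counts copies and moves. Tracer, Holder and CompareLvalueAndMovedArgument are hypothetical names invented for this illustration; they stand in for ComplexClass/MyVector and MyClass and are not part of the code discussed above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical tracer type: counts copies and moves so the cost is visible.
struct Tracer
{
    static int copies;
    static int moves;

    std::vector<int> payload;

    Tracer() : payload(1000, 42) {}
    Tracer(const Tracer& other) : payload(other.payload) { ++copies; }
    Tracer(Tracer&& other) noexcept : payload(std::move(other.payload)) { ++moves; }

    Tracer& operator=(const Tracer& other)
    {
        payload = other.payload;
        ++copies;
        return *this;
    }

    Tracer& operator=(Tracer&& other) noexcept
    {
        payload = std::move(other.payload);
        ++moves;
        return *this;
    }

    static void Reset() { copies = 0; moves = 0; }
};

int Tracer::copies = 0;
int Tracer::moves = 0;

class Holder
{
public:
    // Option 4: take the argument by value, then move it into the member.
    void Function(Tracer data) { m_data = std::move(data); }

private:
    Tracer m_data;
};

void CompareLvalueAndMovedArgument()
{
    Holder object;
    Tracer named;

    Tracer::Reset();
    object.Function(named);             // copy into the parameter, move into the member
    assert(Tracer::copies == 1 && Tracer::moves == 1);

    Tracer::Reset();
    object.Function(std::move(named));  // move into the parameter, move into the member
    assert(Tracer::copies == 0 && Tracer::moves == 2);
}
```

So compared to the const& overload, an lvalue costs one extra move, and compared to the && overload, an rvalue costs one extra move. For a true temporary such as Function(Tracer()), compilers usually elide the construction of the parameter, which leaves a single move.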

________________________________________________________________________

So now I know this is not the most important problem in the world and I probably shouldn't worry about it, but it's just something that I found interesting and wanted some opinions/real-world stories on. Do you account for move semantics in your code (if you generally use C++11, of course)? If so, do you use any of the four options I presented, or do you decide on a per-case basis like I used to do before (or is there maybe something completely different that I didn't see, the way I didn't see option 4 for the longest time)?

I find option 4 the best in this regard, though there is something weird and unusual about now suddenly passing all the expensive objects by value instead of by reference... though I guess the same applied before I found out that I could safely return stuff like MyVector from functions thanks to RVO/move semantics.


You could do this by combining Universal References with std::forward if you don't mind adding templates into the mix.

I haven't tested it but something like this should work:


template<class VectorT>
void Function(VectorT&& vData)
{
    m_vData = std::forward<VectorT>(vData); // std::forward needs the template argument spelled out
}

If universal references work the way I understand them to, then it should accept both kinds of references, and forward to the assignment operator while preserving the reference type.

You could add a static_assert with an is_vector-style type trait check to make sure you get a clean error message rather than a template type mismatch if you pass the wrong type.
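A rough sketch of how that could look in C++11. The is_vector trait mentioned above is not a standard trait, so this sketch swaps in std::is_same on the decayed type as a stand-in; Holder, Data() and ForwardBothValueCategories are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

class Holder
{
public:
    using MyVector = std::vector<int>;

    template<class VectorT>
    void Function(VectorT&& vData)
    {
        // Stand-in for an is_vector-style check: only MyVector, in any
        // const/reference flavour, is accepted.
        static_assert(std::is_same<typename std::decay<VectorT>::type, MyVector>::value,
                      "Function expects a MyVector");
        m_vData = std::forward<VectorT>(vData);
    }

    const MyVector& Data() const { return m_vData; }

private:
    MyVector m_vData;
};

void ForwardBothValueCategories()
{
    Holder object;
    Holder::MyVector source{1, 2, 3};

    object.Function(source);             // VectorT = MyVector&, copy-assigns
    assert(source.size() == 3 && object.Data().size() == 3);

    object.Function(std::move(source));  // VectorT = MyVector, move-assigns
    assert(object.Data().size() == 3);
}
```

Passing anything else, for example a std::string, stops compilation at the static_assert with the message above instead of an error deep inside the assignment.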

If you'd prefer not to have templates then I'd go with multiple overloaded functions. It's cleaner for the calling code:


object.Function(vDataFromSomewhere); // reference will copy data
object.Function(std::move(vDataFromSomewhere)); // move will move data

When I am in this situation (which does not happen really often) I have tried a few approaches since C++11 and ended up settling on (4).

I don't think thePyro_13's approach is useful though because it pretty much throws type safety out the window and prevents you from hiding any implementation details in its own compilation unit. It's obviously fine when you already write generic template code but I cannot see it work reasonably for the concrete stuff.

If you use universal references (which sounds like what you need) you can also add an enable_if to limit the parameter type instead of a static_assert.
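A minimal sketch of the enable_if variant, again using std::is_same on the decayed type as an assumed constraint. The second Function(std::size_t) overload, Data() and ConstrainedOverloads are hypothetical and only there to show what the constraint buys you over a static_assert.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

using MyVector = std::vector<int>;

class Holder
{
public:
    // Participates in overload resolution only when VectorT is some
    // const/reference flavour of MyVector.
    template<class VectorT,
             typename std::enable_if<
                 std::is_same<typename std::decay<VectorT>::type, MyVector>::value,
                 int>::type = 0>
    void Function(VectorT&& vData)
    {
        m_vData = std::forward<VectorT>(vData);
    }

    // Hypothetical second overload: fill the vector with `count` zeros.
    void Function(std::size_t count) { m_vData.assign(count, 0); }

    const MyVector& Data() const { return m_vData; }

private:
    MyVector m_vData;
};

void ConstrainedOverloads()
{
    Holder object;
    MyVector source{1, 2, 3};

    object.Function(source);  // template is selected and copies
    assert(object.Data().size() == 3);

    object.Function(2);       // template drops out, the size_t overload runs
    assert(object.Data().size() == 2 && object.Data()[0] == 0);
}
```

Without the constraint, the unconstrained template would be the better match for an int argument and then fail to compile in its body. With enable_if it is removed from the candidate set, so other overloads can coexist with it.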

Ah, I guess universal references are option 5) then. Though I really do not like this option in the cases I'm discussing here, since it would mean adding templates to a whole lot of functions, with all the downsides BitMaster already mentioned. I'm already using them in some places where it makes sense, but I think I'll stick to 4) when going for a broad-scale solution.

If you'd prefer not to have templates then I'd go with multiple overloaded functions. It's cleaner for the calling code:

That is one of the reasons why I disliked option 3), however as I've written 4) also has this property, without requiring additional overloads. Also imagine if I had a function like this:


void SetExtentionData(const std::wstring& stExtention, std::string&& stData, ExtentionData::VariableMap&& mAttributes);

In this case I've already accounted for what is likely to be movable and what not, but if I wanted it to be generic, I would have to write 8 different overloads (two per parameter, for three parameters) to account for all the different combinations (which is why I don't like 2) very much either).
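For comparison, a sketch of the same function written the option 4 way, where a single signature accepts every lvalue/rvalue combination. Extension, the accessors and MixedArguments are hypothetical, VariableMap is a stand-in for ExtentionData::VariableMap, and treating the first parameter as a sink too is an assumption made for the sake of the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for ExtentionData::VariableMap.
using VariableMap = std::map<std::string, std::string>;

class Extension
{
public:
    // One signature for all combinations: lvalue arguments are copied into
    // the parameters, rvalue arguments are moved into them.
    void SetExtentionData(std::wstring stExtention, std::string stData, VariableMap mAttributes)
    {
        m_stExtention = std::move(stExtention);
        m_stData = std::move(stData);
        m_mAttributes = std::move(mAttributes);
    }

    const std::wstring& Name() const { return m_stExtention; }
    const std::string& Data() const { return m_stData; }
    const VariableMap& Attributes() const { return m_mAttributes; }

private:
    std::wstring m_stExtention;
    std::string m_stData;
    VariableMap m_mAttributes;
};

void MixedArguments()
{
    Extension extension;
    std::string data = "payload";
    VariableMap attributes;
    attributes["key"] = "value";

    // Temporary, lvalue (copied) and explicitly moved argument in one call.
    extension.SetExtentionData(L"png", data, std::move(attributes));

    assert(data == "payload");  // the lvalue is untouched
    assert(extension.Data() == "payload");
    assert(extension.Attributes().at("key") == "value");
}
```

The price is one extra move per parameter compared to a hand-written overload for that exact combination.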

Option 4 looks a bit messy since it implies the caller has to think about this problem. I'd probably prefer to use option 2, migrating there from option 1 as and when profiles are showing the copies to be expensive. But it depends a lot on whether you're writing new code or maintaining old code, and whether you can assume everyone using the code will understand all the performance implications.

Option 4 is promoted by some of the standard committee members. My mind has accepted the reasoning, but my fingers refuse to type it. :P

Mostly I just pass by const ref, and use r-value ref when needed.

However, const-ref for when copies aren't needed, and pass-by-value for when copies are needed, is technically the correct way. My only "problem" with it is that it prevents me from forward-declaring classes (since pass-by-value requires the full class definition).

Option 4 looks a bit messy since it implies the caller has to think about this problem.

Assume you have a member function like this:


void MyClass::SetText(std::string text)
{
     this->text = std::move(text);
}

The requirement is, the class needs its own variable that it can modify without affecting external variables (i.e. a reference/pointer/handle to an externally owned variable is for some reason unacceptable in this circumstance - we'll assume the class' requirements have actually been designed well).

Since a copy is going to be made anyway, caller behavior is thus:

99.9% of the time, the caller doesn't need to think. If he isn't explicitly giving up ownership of his copy, the function then makes its own copy.


myClass.SetText(variableWeStillWantToUse); //Does a copy - like normal.


If the caller happens to call it with a literal or otherwise short-lived value (i.e. an 'r-value'), then the function silently does a move instead of a copy.


myClass.SetText("variable that is a literal"); //Does a move - a small optimization.

variableWeStillWantToUse = "The time is now %time%";
myClass.SetText(variableWeStillWantToUse.format(getTime())); //Does a move on the return-value of the format() call (if it's a temporary).


But, if the caller knows that he's not going to use the variable anymore, he can explicitly choose to give up ownership:


myClass.SetText(std::move(variableWeStillWantToUse)); //Does a move.

This last one would be "premature optimization" in most cases; callers generally, 99.9% of the time, shouldn't be calling std::move().

Callers never have to think about this, unless they want to make that micro-optimization.

The *default* is business as usual with automatic micro-optimizations taking place on variables that the function already knows are temporary (like literals, or the results of function calls like (25 + 17) or something.format(str)).

Being able to call std::move() explicitly is an added bonus (or a distraction) for callers, but one they don't have to think about. Essentially, move-semantics was added to provide automatic micro-optimizations, with no change necessary in caller-behavior, and only (opt-in) changes for class/function writers.

You'd still use const-ref for variables you don't need copies of, and references/pointers/handles for variables you want shared or owned elsewhere.

Of course, compilers will still do RVO to avoid even the moves, when possible, but when not possible, the moves are equal or superior to copies (a move is essentially a 'shallow-copy' with transfer of ownership, whereas a copy is a 'deep-copy', if the class has (for example) pointers to allocated memory or file handles or whatever).
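The shallow-versus-deep distinction can be observed directly with std::vector. A small sketch, where BufferCheck and InspectCopyVersusMove are hypothetical helper names, and the checks reflect how mainstream standard library implementations behave:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical result type for this illustration.
struct BufferCheck
{
    bool copyHasOwnBuffer;
    bool moveReusedBuffer;
    bool sourceEmptyAfterMove;
};

BufferCheck InspectCopyVersusMove()
{
    std::vector<int> source(1000, 7);
    const int* originalBuffer = source.data();

    std::vector<int> copied(source);            // deep copy: new allocation, 1000 ints copied
    std::vector<int> moved(std::move(source));  // move: takes over the existing allocation

    BufferCheck result;
    result.copyHasOwnBuffer = copied.data() != originalBuffer;
    result.moveReusedBuffer = moved.data() == originalBuffer;
    result.sourceEmptyAfterMove = source.empty();  // the moved-from vector gave up its elements
    return result;
}

void ShowCopyVersusMove()
{
    const BufferCheck check = InspectCopyVersusMove();
    assert(check.copyHasOwnBuffer);
    assert(check.moveReusedBuffer);
    assert(check.sourceEmptyAfterMove);
}
```

The move costs a few pointer assignments regardless of the element count, while the copy allocates and copies every element.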

The only problem with pass-by-value is that you can't forward-declare your variables, which makes it a no-go for me most of the time. :P

Option 4 looks a bit messy since it implies the caller has to think about this problem. I'd probably prefer to use option 2, migrating there from option 1 as and when profiles are showing the copies to be expensive. But it depends a lot on whether you're writing new code or maintaining old code, and whether you can assume everyone using the code will understand all the performance implications.

True, the caller having to think about it is something that I didn't really see. I'm not sure I'd agree that it's really that messy, though, since it allows the user not to think about it in the general case, and if they either measure that it's a performance problem or are overly paranoid, they can just add std::move().

It's mostly about me using my own code, though. It's a mixture of maintaining old code and writing new code (my codebase is now ~3 years old and is still being extended, though I also do heavy refactoring with stuff like this). So outside of my plugin API, the user who has to think about when to apply move is the same person as the writer of the function, so I guess I'll stick with option 4, though I see that option 1/2 might be beneficial for a widely used API.

However, const-ref for when copies aren't needed, and pass-by-value for when copies are needed, is technically the correct way. My only "problem" with it is that it prevents me from forward-declaring classes (since pass-by-value requires the full class definition).

Ah, true, that's something I didn't see. Now in most of my use cases it doesn't matter because I'm using STL/templated classes here, which means they are included in the header anyway. Outside of that, I will mostly store the moved variables as class members, which is yet another reason why their definition is already included in the header. Still a good catch and something I might look out for at times (I'm not a fan of scrapping forward declarations myself).

Assume you have a member function like this:

void MyClass::SetText(std::string text)
{
     this->text = std::move(text);
}

The requirement is, the class needs its own variable that it can modify without affecting external variables (i.e. a reference/pointer/handle to an externally owned variable is for some reason unacceptable in this circumstance - we'll assume the class' requirements have actually been designed well).

Since a copy is going to be made anyway, caller behavior is thus:

99.9% of the time, the caller doesn't need to think.


I think I'd expect it to go more like this:

  • They want to call MyClass::SetText
  • They see that the interface takes a copy instead of a const reference
  • They then have to dig into the implementation to see whether this is going to create unnecessary copies or not, because they're used to passing const references to functions like this
  • If they find the implementation then they realise it's not a problem.
  • If the implementation is hidden then they have to ask around and find out more, or just hope it's not an issue.

Whereas with option 2, the process is:

  • They want to call MyClass::SetText
  • They see the interface has an overload with && in it
  • They either:
    • a) recognize that it's an r-value reference that will do the right thing, so they're happy
    • b) wonder what the hell this double ampersand business is about, read up on it, and then they're happy

This is from my experience, where gamedev teams typically contain a majority of both legacy code and (legacy) coders that predate C++11 and do not typically use it, but do use older idioms designed to improve performance (e.g. never passing complex objects by value).

For my own personal stuff this seems like a situation that doesn't arise often - I rarely create temporaries that then get copied into something else. Again this is probably just an old habit where, in a world where all we had was auto_ptr and we were told to avoid it anyway, we were very careful about what created an object, how they got passed on, and about not copying them unnecessarily.

[...]


I feel the need to disagree. Back when I first got my hands on a C++11-capable compiler I was working on a tool with a legitimate need to inject an expensive-to-copy object into something else.

Obviously, I started out with the obvious: pass by rvalue reference, and all was good. At least for a while, but the program evolved and sometimes I truly did need to copy instead of move. Of course, the first time that happens you can just create a temporary and move out of that. Then you need to do it more than once and you add the overload. You proceed to notice your overload looks something like this:

void myFunc(const Expensive& expensive)
{
   Expensive tempCopy(expensive);
   myFunc(std::move(tempCopy));
}

Ideally at this point you immediately realize that this is a rather pointless overload, and that with a simple pass by value you can have the exact same effect, just without the overload-mania.

This becomes even more clear once you become aware it is also a very common pattern to specify copy/move assignment operators in one pass-by-value implementation as well.
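That pattern is usually known as copy-and-swap. A minimal sketch, assuming a hypothetical Buffer class that owns a heap array (none of these names come from the posts above):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer
{
public:
    explicit Buffer(std::size_t size = 0)
        : m_size(size), m_data(size ? new int[size]() : nullptr) {}

    Buffer(const Buffer& other)
        : m_size(other.m_size), m_data(other.m_size ? new int[other.m_size] : nullptr)
    {
        std::copy(other.m_data, other.m_data + m_size, m_data);
    }

    Buffer(Buffer&& other) noexcept
        : m_size(other.m_size), m_data(other.m_data)
    {
        other.m_size = 0;
        other.m_data = nullptr;
    }

    ~Buffer() { delete[] m_data; }

    friend void swap(Buffer& a, Buffer& b) noexcept
    {
        using std::swap;
        swap(a.m_size, b.m_size);
        swap(a.m_data, b.m_data);
    }

    // One operator for copy- and move-assignment: the by-value parameter is
    // copy- or move-constructed at the call site, then swapped into *this.
    // The old contents are released when `other` goes out of scope.
    Buffer& operator=(Buffer other) noexcept
    {
        swap(*this, other);
        return *this;
    }

    std::size_t size() const { return m_size; }
    int& operator[](std::size_t index) { return m_data[index]; }

private:
    std::size_t m_size;
    int* m_data;
};

void AssignBothWays()
{
    Buffer a(4);
    a[0] = 9;

    Buffer b;
    b = a;             // parameter is copy-constructed from a
    assert(b.size() == 4 && b[0] == 9 && a.size() == 4);

    Buffer c;
    c = std::move(a);  // parameter is move-constructed from a
    assert(c.size() == 4 && c[0] == 9 && a.size() == 0);
}
```

It is the same trade as option 4: one function body instead of two, at the cost of one extra move (here, a swap) per assignment.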

Here is the thing: C++11 is not the new kid on the block. It's half a decade old. If it were an actual kid it would be getting bored with kindergarten and preparing to move on to more interesting pastures. It's even older when you consider how long some of these core concepts have been around since the C++0x era, when everyone still believed in the '0' of 0x.

If you show so extremely little interest in the most basic tools at your disposal that standard idioms like that elude you after half a decade, then you really should not call yourself a C++ programmer. Or if you do, at least make sure someone who went to the trouble of actually investing skill points is close by.
