Sign in to follow this  
Servant of the Lord

Naming functions that 'operate on' vs 'copy and return'

Recommended Posts

What are your views on naming identical functions that do the same thing, with the only difference being whether they [b]operate on [/b]the variable passed in (by reference), or operate on a [b]copy[/b] of the variable (passed in by const reference) and returning it?

In my current code, I want to have two functions with the same purpose, but different usages. (both functions I have in a "[b]String::[/b]" namespace)
I have:
[CODE]std::string ReplaceAll(const std::string &str, ....);[/CODE]

I also have:
[CODE]void ReplaceAll(std::string &str, ...);[/CODE]

They do the exact same thing (takes a string to operate on, a string to find, and a string to replace it with), but I use them in different ways.
I have more than just 'ReplaceAll()', I want multiple versions of several different functions. (Since I can always implement one function by calling the other, it doesn't introduce code bloat, and bug-fixes or optimizations to one automatically effect the other).

The first is useful for passing as parameters, such as this type of situation: (Supposing ToLower() is the function we are talking about)
[CODE]map[ String::ToLower(key) ] = value;[/CODE]

It works well. If it didn't return a value, I'd have to do this:
[code]std::string lowercaseKey = key;
String::ToLower(lowercaseKey);
map[ lowercaseKey ] = value;[/code]

That makes code less clear when doing a series of operations, because you have boilerplate lines for every operation of the function, instead of just having each operation be self-explanatory.

However, when I'm operating on a single value, or a series of values, I don't want to go like this:
[code]str = Operation(str);
str = SecondOperation(str);
str = OtherOperation(str);
str = NextOperation(str);[/code]

I would much rather do the following, for code readability:

[code]Operation(str);
SecondOperation(str);
OtherOperation(str);
NextOperation(str);[/code]


But I can't have two different versions of the same function named the same thing, for two reasons:
1) It'd be confusing to read. Which version am I calling here, version A, or version B?
2) C++ wouldn't allow it, even if I wanted it - the function parameters are too identical so the function names would collide.

So what would you call the two functions to distinguish them?
I was thinking something like,
[CODE]ReplaceAllOn(str, ...) //Operates 'on' str, and returns void.
ReplaceAllFrom(str, ...) //Operates on a copy of str, and returns the copy.[/CODE]

Another possibility is using pointers for the 'operate on' version of functions, letting the & symbol indicate what's going on.
[CODE]ReplaceAll(&str, ...)
str = ReplaceAll(str, ...)
[/CODE]

However, I prefer using references when I know the variable address won't change during the function, and there's still the possibility of accidentally going:
[code]ReplaceAll(str, ...) //Returning a copy, and having no effect, because you forgot the '&' so it called the wrong function. Hard to spot mistake when scanning code visually.[/code]

What I'd really love is a compiler error if I'm using the wrong function, but I can't think of a way to make that happen.
[code]Blah = OperateOn(blah); //Compiler error: Blah = void is invalid. (This error already works)
OperateOnCopyOf(blah); //Compiler error: You're not assigning the return value to anything, you moron! (Sadly, this error my compiler doesn't inform me of)[/code]

Thoughts?

Share this post


Link to post
Share on other sites
Telastyn    3777
I would perhaps argue that this should be avoided. There should be one idiomatic usage in the code that is consistent.

That said, I would use something like ReplaceAllOn, ReplaceAllInplace or ReplaceAllMutating. Why? Mutating behavior should be the less preferred version, so should have the more annoying name. But it is similar behavior so should have the same beginning of the function name so it's next to the other one via intellisense/auto-doc and thus aiding in discover-ability.

The non-mutating version would be simply ReplaceAll.

Share this post


Link to post
Share on other sites
phresnel    953
Just about the technical side:

[quote]2) C++ wouldn't allow it, even if I wanted it - the function parameters are too identical so the function names would collide.[/quote]

C++ allows to overload on const and non-const references.

Share this post


Link to post
Share on other sites
Cornstalks    7030
The Standard Library seems to prefer putting "copy" on the copy version and no suffix to the in place version ([url="http://www.cplusplus.com/reference/algorithm/"]linky[/url]). But I don't really have an opinion.

Share this post


Link to post
Share on other sites
_moagstar_    465
+1 for the _copy suffix. I would argue that it is the most idiomatic "c++" way of doing things, it's the approach taken by both the standard library and boost. As an example, take a look at the std::string member functions, the name of the function is the operation, and they are all mutating:

[url="http://www.cplusplus.com/reference/string/string/replace/"]http://www.cplusplus.com/reference/string/string/replace/[/url]

So I would expect the default behaviour of a function to be mutating, unless the name of the function specifies otherwise.

I agree that the convenience of the copy version and the fact that it takes next to no work to implement make it worth the while.

For some inspiration you might want to take a look at the boost string algorithms:

[url="http://www.boost.org/doc/libs/1_48_0/doc/html/string_algo.html"]http://www.boost.org/doc/libs/1_48_0/doc/html/string_algo.html[/url]

Share this post


Link to post
Share on other sites
The thing std::string's functions having '_copy()' functions is because it's part of a class. My functions are free-floating - there's a difference in expected usage.

[quote name='Telastyn' timestamp='1328935627' post='4911883']
Mutating behavior should be the less preferred version[/quote]

Why should it be less-preferred? I get the benefits of compiler-errors if I mistake it for the copy version ("[i]Warning: Can't assign 'void' to 'std::string'[/i]"), but not vice-versa. Plus, it doesn't bother creating unneeded copies that it just discards. It's safer and faster, so why not prefer it?

What are your thoughts about using namespaces for this purpose?

[CODE]string = String::Copy::ReplaceAll(string);
String::Mutate::ReplaceAll(string);[/CODE]

Or maybe having the mutate version just be String::ReplaceAll() and the copy versions of all the string functions, being the specialized versions, exist in String::Copy::ReplaceAll().

Share this post


Link to post
Share on other sites
Cornstalks    7030
[quote name='Servant of the Lord' timestamp='1328989319' post='4912027']
The thing std::string's functions having '_copy()' functions is because it's part of a class. My functions are free-floating - there's a difference in expected usage.
[/quote]
You might wanna look at <algorithm> (which I linked to in my first post).

Personally, I'd prefer to have them in the same namespace.

Share this post


Link to post
Share on other sites
Ravyne    14300
For string mutation, you usually can't assume that the storage won't need to change size anyhow, so you probably end up doing a re-allocation internally anyhow. Just return the new string, and if you wish to over-write the old version, assign to it.

Share this post


Link to post
Share on other sites
Telastyn    3777
[quote name='Servant of the Lord' timestamp='1328989319' post='4912027']
Why should it be less-preferred? I get the benefits of compiler-errors if I mistake it for the copy version ("Warning: Can't assign 'void' to 'std::string'"), but not vice-versa. Plus, it doesn't bother creating unneeded copies that it just discards. It's safer and faster, so why not prefer it?
[/quote]

That C++ isn't good enough to return errors on unused return values isn't a mark against the practice of favoring mutating functions in general.


It's not safer. At least not in the ways that I value.

Mutating functions tend to cause side effects with other things that reference the object, causing harder to find bugs; for somethings this is expected behavior. For most things that are not literal objects (conceptually) it's not.
Mutating functions are far harder to test in isolation, causing more runtime errors.
Mutating functions are far harder to design with in a concurrent scenario, leading to more awkward code and/or worse bugs.
You can make mutating behavior by combining non-mutating behavior and assignment; you cannot make non-mutating behavior from a mutating function. This gives you more flexibility as things evolve.

Share this post


Link to post
Share on other sites
Very good points, thank you.

What are you defining 'literal objects' as?
I generally feel like unless the function holds onto the reference beyond the life of the function, it's pretty-clear cut and wont cause any side-effects.

I don't really get the distinction you are making. How is:
[CODE]ReplaceAll(str); //Changes 'str'.[/CODE]
Any different from:
[CODE]str.replace(...); //Changes 'str'.[/CODE]

[quote name='Telastyn' timestamp='1328998743' post='4912080']
You can make mutating behavior by combining non-mutating behavior and assignment; you cannot make non-mutating behavior from a mutating function. This gives you more flexibility as things evolve.
[/quote]
I'm not sure I understand this. I can make a non-mutating function from a mutating function.
[code]//Implementing DoWork_Copy() off of a DoWork_Mutating().
std::string DoWork_Copy(std::string original)
{
std::string copy = original;
DoWork_Mutating(copy);
return copy;
}[/code]

Versus:
[code]//Implementing DoWork_Mutating() off of a DoWork_Copy().
void DoWork_Mutating(std::string &original)
{
original = DoWork_Copy(original);
}[/code]
Perhaps I'm misunderstanding your terminology?

I haven't ever worked in multithreading code before, though I'm sure I will in the future. How is the first chunk of code harder to parallelize than the second chunk of code?

Share this post


Link to post
Share on other sites
Telastyn    3777
[quote name='Servant of the Lord' timestamp='1329002047' post='4912091']
What are you defining 'literal objects' as?
[/quote]

Sorry, poor wording. Things that are objects in the real world or problem domain. For example Dates and Strings are not objects. Mutating them is weird, and not immediately intuited (in my experience). Inventories or people are more objects with traits or children. Changing a trait on an object doesn't make a new object, but adding something to a value gives you a new value.

Perhaps I've been using C# for too long, but the distinction between [url="http://www.albahari.com/valuevsreftypes.aspx"]value and reference types[/url] is one thing that I think doesn't get as much credit as it should. Even in some things like strings that are reference types but immutable that distinction often comes if the type represents an object in real life as opposed to a thing or a concept.[quote name='Servant of the Lord' timestamp='1329002047' post='4912091']
// example of non-mutating function from a mutating one
[/quote]

This requires copy on assignment, and will end up in... 3 copies if I count correctly (one for the parameter into the function, one into the temp and then one in whatever catches the result). Again, used to working in languages where copy on assignment is uncommon (or mutation is highly frowned upon/unavailable).

[quote name='Servant of the Lord' timestamp='1329002047' post='4912091']
How is the first chunk of code harder to parallelize than the second chunk of code?
[/quote]

Those examples don't cause issues. Assume though that you were doing your mutation in-line on the same variable. You have to be careful how that is implemented. If for example you were doing a ReplaceAll on a string, that string would be off limits if you did something like

[code]
for( char *i = str; i; ++i ){
if( *i == search ){
*i = replace;
}
}
[/code]

Because another mutating function or something that gets the string's value might run any time during the middle of this loop and see any of the intermediary steps. You would have to essentially build the new buffer as a copy and then do the replacement; and even then you might undo another mutating effect that occurs at the same time. Or you could have to prevent access to the variable... which sucks.

With an immutable behavior there's no chance for an interruption because each pile of intermediate steps exists in its own call stack. There you still have to worry about the scenario where the value you thought you were working with changes before your operation is done. Those issues though are less catastrophic if you fail to address them, and 'compare and swap' operations are well known solutions to these problems. Plus, they allow the user of the classes to deal with concurrency where they are using them since what needs to be done depends a bit on usage.

I'm sure something there is not entirely clear... let me know and I'll see what I can do to clarify or further expound on things.

Share this post


Link to post
Share on other sites
MaulingMonkey    1728
While it's hard to generalize, where possible I generally trend towards making in-place operations verbs (e.g. "Normalize") and return-a-copy operations nouns (e.g. "Normal") or adjectives (e.g. "Normalized") and, depending on the language, possibly properties if they lack any inputs.

So a possibility would be "ReplaceAll" vs "WithAllReplaced". Of course, you can't enforce this on external libraries... and .NET's Replace for example returns a copy since strings are immutable there.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this