NULL vs nullptr



Seems like you're getting tripped up on semantics because you're using the wrong parameter type.

Write tells you exactly what it does.

If you want to write a string... then you make "Write(std::string)". If you want to write an array, then you make "Write(std::array)" - or more likely "template <typename ConstIterator> Write(ConstIterator, ConstIterator)".
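For illustration, a minimal sketch of what such an interface could look like (hypothetical Writer type and signatures, not anyone's actual library):

#include <iostream>
#include <string>

// Hypothetical writer: each overload's parameter type states exactly what gets written.
struct Writer {
    void Write(const std::string& s) { std::cout << s; }        // write a string
    template <typename ConstIterator>
    void Write(ConstIterator first, ConstIterator last) {       // write a range of elements
        for (; first != last; ++first) std::cout << *first << ' ';
    }
};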

Assigning arbitrary meaning to some random type (meaning the type never had) is the problem, not the naming of the function.

"char*" means "pointer to character". Not "pointer to a zero-terminated array of characters".

Well, unless you're using C, in which case you can't overload anyway.

I'm not using the wrong parameter type. The person who wrote the interface I'm coding against used the wrong parameter type. Yet I wouldn't know that until I attempted to use it and observed incorrect behavior, because the function name wasn't descriptive enough for the char* overload and the original author was relying too heavily on parameter information to imply behavior. Although you're right that this is C++, so I don't want to lean too heavily on my contrived C-string example. It was only one hypothetical case.

However this brings me back to my original point. The canonical use case of nullptr is dealing with ambiguous overloads, yet the fact that one would even have multiple ambiguous overloads where a parameter could be either an integral type or a pointer type already seems highly suspect to me. I've noticed that serialization libraries tend to be the worst offenders in this regard, since they have highly generic, nondescript function names (Read/Write) that support dozens of different overloads. It's highly unlikely that the verbs 'Read' and 'Write' can accurately and concisely convey their behavior for all the possible type overloads they support.

Function overloading is a great feature, but this is one example of how it's often abused.
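To make the kind of overload pair being criticized concrete, here's a minimal sketch (hypothetical functions, for illustration only):

#include <iostream>

// One verb, two very different behaviors depending on the parameter type.
void Write(int v)         { std::cout << "[int] " << v << '\n'; }
void Write(const char* s) { std::cout << "[c-string] " << (s ? s : "(null)") << '\n'; }

int main() {
    Write(42);       // integral overload
    Write("hello");  // pointer overload
    Write(0);        // literal 0 is an int, so this picks the integral overload...
    Write(nullptr);  // ...while nullptr unambiguously selects the pointer overload
}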

Ambiguous overloads are harder to avoid with overloaded operators. Consider ostream's operator<<: it has to have both int and pointer overloads.
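For example (standard library behavior, sketched with a local variable):

#include <iostream>

int main() {
    int x = 42;
    int* p = &x;
    std::cout << x  << '\n';  // int overload: prints 42
    std::cout << p  << '\n';  // const void* overload: prints the address
    std::cout << *p << '\n';  // dereferenced: back to the int overload
}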

I was under the impression, from a Microsoft interview with one of the VC developers, that nullptr was required to allow perfect forwarding to work with variadic templates.
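That matches the usual illustration: a forwarding template deduces NULL as an integer type, and that integer can't then be passed where a pointer is expected. A minimal sketch (hypothetical Widget and make, assuming a typical integral definition of NULL):

#include <memory>
#include <utility>

struct Widget {
    explicit Widget(int* /*data*/) {}   // expects a pointer
};

// Hypothetical factory that perfectly forwards its arguments to T's constructor.
template <typename T, typename... Args>
std::unique_ptr<T> make(Args&&... args) {
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

int main() {
    auto a = make<Widget>(nullptr);   // OK: deduces std::nullptr_t, which converts to int*
    // auto b = make<Widget>(NULL);   // typically fails: NULL deduces as an integer type,
                                      // and that integer won't convert to int* inside make
}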


Except that in the case of Write(char*), the function name doesn't say what it does. It could do one of several things, as I pointed out above. It's not the parameters' fault that it's ambiguous, I agree. But by your own admission, it's the fault of the function name... so you change the function name. You don't include it as one of the possible overloads of Write. You make WritePointerValue (and have a set of overloads to handle various pointer types), WritePointedToValue, WriteString for C-strings, etc.
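In sketch form (hypothetical names, mirroring the suggestion above):

#include <iostream>

void WriteString(const char* s)        { std::cout << "string: "  << s  << '\n'; }  // C-string contents
void WritePointedToValue(const int* p) { std::cout << "value: "   << *p << '\n'; }  // value behind the pointer
void WritePointerValue(const void* p)  { std::cout << "address: " << p  << '\n'; }  // the pointer itself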

Maybe, but for myself I wouldn't introduce new verbs if it performs the same logical function (that is, it's part of the same package of functionality, and is meant to be interchangeable with other "writes" in that context). Instead, by observing that writing/serializing a pointer itself is typically not useful because the address won't be preserved across sessions, I would simply avoid the confusion by explicitly converting it to a type that represents a handle that can be re-hydrated when the file is read back. With that in place, you then have write(int) which writes the integer value provided as a parameter, you have write(int*) which writes the integer value stored at the address provided by the parameter, and write(my_handle<int>) which writes some representation that makes it easier to correlate and re-hydrate its relationship to some integer. For the rare cases that you might actually want to write the actual address of some data (say, for debugging, or a hardware address), simply cast the pointer to an appropriate type like std::size_t and provide an overload for it (be aware that std::size_t doesn't apply to member pointers). You can also treat indexes/pointers into contiguous arrays specially, since there's a common base address.
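A minimal sketch of that overload set (hypothetical my_handle and write signatures, just to show the shape of the idea):

#include <cstddef>
#include <cstdint>
#include <iostream>

// Hypothetical handle: a stable id that can be re-hydrated on load, instead of a raw address.
template <typename T>
struct my_handle { std::uint32_t id; };

void write(int v)               { std::cout << "value "  << v    << '\n'; }  // the value itself
void write(const int* p)        { std::cout << "value "  << *p   << '\n'; }  // the value at the address
template <typename T>
void write(my_handle<T> h)      { std::cout << "handle " << h.id << '\n'; }  // re-hydratable reference
void write(std::size_t address) { std::cout << "addr "   << std::hex << address << '\n'; }  // rare: a raw address

// Usage for the rare "write the address itself" case:
//   write(reinterpret_cast<std::size_t>(&some_int));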

Also keep in mind that if you want your serialization to be cross-platform, you need to be mindful of 32-bit vs. 64-bit systems, of alignment requirements on different systems, of endianness, and other things.
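For instance, a minimal sketch of writing a 32-bit value in a fixed byte order so the file format doesn't depend on the host:

#include <cstdint>
#include <cstdio>

// Always emit little-endian bytes, regardless of the host's endianness or word size.
void write_u32_le(std::FILE* f, std::uint32_t v) {
    unsigned char bytes[4] = {
        static_cast<unsigned char>( v        & 0xFF),
        static_cast<unsigned char>((v >> 8)  & 0xFF),
        static_cast<unsigned char>((v >> 16) & 0xFF),
        static_cast<unsigned char>((v >> 24) & 0xFF),
    };
    std::fwrite(bytes, 1, sizeof bytes, f);
}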

See also: std::addressof for completeness of implementation.

All of that said, if you're working with an existing system that you can't change or can only change in limited ways, simply providing a different name for the function might be the path of least resistance.

throw table_exception("(╯°□°）╯︵ ┻━┻");


Ambiguous overloads are harder to avoid with overloaded operators. Consider ostream's operator<<: it has to have both int and pointer overloads.

Very true, you often don't have a choice with overloaded operators. And to be honest, I try to avoid using them if there could be any confusion over their behavior. A classic example is a vector class that overloads the multiplication operator for itself. Is that a dot product or component-wise multiplication?!
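Which is why many math libraries skip operator* there and spell both out, e.g. (a minimal sketch, hypothetical Vec3):

struct Vec3 { float x, y, z; };

// Named functions leave no doubt which product you get.
float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 component_multiply(const Vec3& a, const Vec3& b) {
    return { a.x * b.x, a.y * b.y, a.z * b.z };
}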

I wouldn't consider the const void* overload of ostream << ambiguous, either, since there's not much else you can do with a void pointer in that context other than treat it as an integral value.

With that in place, you then have write(int) which writes the integer value provided as a parameter, you have write(int*) which writes the integer value stored at the address provided by the parameter, and write(my_handle<int>) which writes some representation that makes it easier to correlate and re-hydrate its relationship to some integer.

To me that would make write(int*) unnecessary, since I could just dereference my pointers and it would resolve to the write(int) overload. Same with the other pointer types. Unless you're using write(int*) because it handles null pointers somehow. However more often than not, the calling code is already handling null pointers and missing data, because that may need to be handled differently depending on what's being serialized (sometimes you should write nothing, sometimes you write zero or some other sentinel value, sometimes you throw an exception, etc.).

For the rare cases that you might actually want to write the actual address of some data (say, for debugging, or a hardware address), simply cast the pointer to an appropriate type like std::size_t and provide an overload for it (be aware that std::size_t doesn't apply to member pointers).

Exactly, which is why I don't see the utility in the pointer overloads. Maybe we'll just have to agree to disagree.


To me that would make write(int*) unnecessary, since I could just dereference my pointers and it would resolve to the write(int) overload.

You can always dereference a pointer of course, but I think it's more than a matter of opinion and preference. First, to me, as a user of write, I don't particularly want to juggle that information unless it's explicitly important to me -- I want a convenience function to take care of it, and function overloading does just that with not even a character difference in spelling. In fact, I'd recommend implementing write(int*) in terms of write(int) to avoid code duplication; with inlining, performance will be equivalent. I'll make a second point on whether this is a matter of personal preferences in a bit.
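Something like this (a minimal sketch of the delegation being described):

#include <iostream>

inline void write(int v)        { std::cout << v << '\n'; }  // the real work
inline void write(const int* p) { write(*p); }               // convenience: dereference for the caller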


Unless you're using write(int*) because it handles null pointers somehow. However more often than not, the calling code is already handling null pointers and missing data, because that may need to be handled differently depending on what's being serialized (sometimes you should write nothing, sometimes you write zero or some other sentinel value, sometimes you throw an exception, etc.).

That's one distinction, yes, and I agree that often the client checks. Indeed, if the client needs to throw an exception or fail and exit the serialization immediately because the data is malformed, it has to be the client, because the serialization library doesn't know about the client's data structures. Likewise if the client prefers fine-grained control over whether a particular null pointer should be initialized to some valid default. However, if a value is written to encode a null value, that representation probably belongs to the serialization library, not the client, and if the client wants only coarse-grained control over whether null pointers should be initialized to some default, it's easier and safer for the serialization library to allow the client to set this behavior (either globally for all null pointers, or on a per-type basis) -- the job of a library is not to provide a minimum footprint, but to make doing the right thing easy (a minimal footprint and making it easy to do the right thing are not usually at odds, but sometimes they are).
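As a rough sketch of that kind of opt-in behavior (hypothetical Serializer and NullPolicy, not any real library's API):

#include <iostream>
#include <stdexcept>

enum class NullPolicy { WriteSentinel, SkipField, Throw };

class Serializer {
public:
    void set_null_policy(NullPolicy p) { policy_ = p; }   // coarse-grained, set once by the client

    void write(int v) { std::cout << v << '\n'; }

    void write(const int* p) {
        if (!p) {
            switch (policy_) {
                case NullPolicy::WriteSentinel: write(0); return;  // library-chosen encoding of "null" (assumed: 0)
                case NullPolicy::SkipField:     return;            // write nothing
                case NullPolicy::Throw:         throw std::runtime_error("null field");
            }
        }
        write(*p);
    }

private:
    NullPolicy policy_ = NullPolicy::WriteSentinel;
};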

Better still, putting my serialization library writer hat back on, providing those "convenience" functions means that I can change the implementation as I see fit -- for example, I might be able to notice that the user writes the same integer via its address several times. If I know, by policy or because its value is const, that the value doesn't change between calls to write during the same file serialization, then perhaps I can coalesce disk storage of the integer value itself, and encode a smaller means of referencing it into the file-stream. I can do similar for plain old integers -- write(int) -- too, but (and here's that second point) I can do neither if I can't tell the difference between a number of integers that all happen to have the same value and a number of pointers that point to the same integer. This is because a number of integers that happen to have the same value only share equivalence, while a number of pointers that point to the same integer share identity -- it's the difference between "has the same value" and "is the same entity", and that is (or at least can be) important. But you can't make the distinction if you wipe away its pointer-ness by prematurely dereferencing it.
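Roughly (a minimal sketch of the identity-tracking idea, with hypothetical emit_* helpers standing in for the actual encoding):

#include <cstdint>
#include <iostream>
#include <unordered_map>

class Writer {
public:
    void write(int v) { emit_plain(v); }            // plain ints only share equivalence

    void write(const int* p) {                      // pointers to the same int share identity
        auto it = seen_.find(p);
        if (it != seen_.end()) {
            emit_reference(it->second);             // seen before: emit a small back-reference
        } else {
            std::uint32_t id = next_id_++;
            seen_.emplace(p, id);
            emit_value(id, *p);                     // first sighting: store the value under an id
        }
    }

private:
    void emit_plain(int v)                   { std::cout << "int "  << v << '\n'; }
    void emit_reference(std::uint32_t id)    { std::cout << "ref #" << id << '\n'; }
    void emit_value(std::uint32_t id, int v) { std::cout << "def #" << id << ' ' << v << '\n'; }

    std::unordered_map<const int*, std::uint32_t> seen_;
    std::uint32_t next_id_ = 0;
};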

You could use the same kind of space-saving trick by remembering the contiguous memory ranges you might have already written (say, arrays or vectors), and then transforming pointers into that range into a potentially-smaller index, which could lead to considerably smaller files if the element type is large -- and smaller files are great and all, but really it's the ability to make the distinction between equivalence and identity that's important.
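In sketch form (hypothetical RangeRef, assuming the array was written earlier under some id):

#include <cstddef>
#include <cstdint>
#include <vector>

// A pointer into an already-written contiguous array becomes (array id, element index),
// which is both smaller and preserves identity across save/load.
struct RangeRef { std::uint32_t array_id; std::uint32_t index; };

RangeRef make_range_ref(std::uint32_t array_id, const std::vector<int>& array, const int* element) {
    std::size_t index = static_cast<std::size_t>(element - array.data());
    return { array_id, static_cast<std::uint32_t>(index) };
}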

throw table_exception("(╯°□°）╯︵ ┻━┻");

I use 0 for null pointers everywhere. I am not ashamed!

It's mostly because I'm too lazy to type out NULL.

I don't really see a reason not to use nullptr all the time even if for most cases it's not needed. Overloads aside, it's there and it makes it more obvious what the code is doing in my opinion.

"To know the road ahead, ask those coming back."

The most common problem with non-nullptr variants of null is not ambiguity between int and int *, but int and char *. You can also get ambiguity with void *.

For example:

print(NULL);

What should this code do?

If NULL is defined as literal 0, this might print the literal "0".

If this is C, and NULL is defined as (void*)0, this might print 0x00000000. This is probably the most intuitive result, but it doesn't work in C++.

Or, also in the C case, it might have a runtime error, when trying to dereference a null pointer assumed to point to a C-string.

Another result you might want in a case like this is a compile time error. Or at least a warning about bad practice.

In C++98, standard practice was to never use NULL and just write 0. This makes the behavior of code like this more obvious, but what if you really do want option 2? More generally, when you mean a null pointer, why should you have to write "0"? That's a different concept, yet you can't use a different symbol. Until nullptr came along.
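Putting the options side by side with a hypothetical overload set:

#include <iostream>

void print(int v)         { std::cout << "int: " << v << '\n'; }
void print(const char* s) { std::cout << "c-string: " << (s ? s : "(null)") << '\n'; }

int main() {
    print(0);        // option 1: the literal 0 goes to print(int)
    print(nullptr);  // a null pointer, stated as such: selects print(const char*)
    // print(NULL); // depends on how NULL is defined: it may behave like print(0),
                    // draw a warning, or fail to compile, which is the whole problem
}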


You can always dereference a pointer of course, but I think it's more than a matter of opinion and preference. First, to me, as a user of write, I don't particularly want to juggle that information unless it's explicitly important to me -- I want a convenience function to take care of it, and function overloading does just that with not even a character difference in spelling. In fact, I'd recommend implementing write(int*) in terms of write(int) to avoid code duplication; with inlining, performance will be equivalent. I'll make a second point on whether this is a matter of personal preferences in a bit.

But certainly you didn't choose to use a pointer (over a reference or a value) on a whim. You did so because you required the semantics of a pointer. But those semantics transfer to the functions that take them as parameters, and I think this is where our opinions differ. If I'm looking through a serialization interface and I see write(int*), I don't just see the function name 'write', I also see the pointer parameter and think that this method must require that pointer because it's doing something special (determining equivalence vs identity, performing some special handling if it's null, etc.) that it wouldn't otherwise be doing were it taking a reference or a value. I can't ignore the built-in reasoning behind why one would choose a pointer vs a reference vs a value, just because a function is overloaded with the same name. And that's not an assumption I would want the compiler accidentally making for me either, which brings me back to my original point of nullptr just being a band-aid for the better solution of using a different function name.


Better still, putting my serialization library writer hat back on, providing those "convenience" functions means that I can change the implementation as I see fit -- for example, I might be able to notice that the user writes the same integer via its address several times. If I know, by policy or because its value is const, that the value doesn't change between calls to write during the same file serialization, then perhaps I can coalesce disk storage of the integer value itself, and encode a smaller means of referencing it into the file-stream. I can do similar for plain old integers -- write(int) -- too, but (and here's that second point) I can do neither if I can't tell the difference between a number of integers that all happen to have the same value and a number of pointers that point to the same integer. This is because a number of integers that happen to have the same value only share equivalence, while a number of pointers that point to the same integer share identity -- it's the difference between "has the same value" and "is the same entity", and that is (or at least can be) important. But you can't make the distinction if you wipe away its pointer-ness by prematurely dereferencing it.

As neat as such a feature/optimization would be, the interface is now most certainly misleading. If the client were to assume that write(int*) was a wrapper around write(int), or that they were otherwise equivalent based solely on the fact that they're overloads, they would be inclined to haphazardly use one or the other based on whatever type was being used in the calling code, because it was convenient. They wouldn't be aware that the pointer overload could do some neat optimizations underneath, and that they should have paid more attention in the calling code to which parameter type they were using. Or perhaps they may want to explicitly circumvent any such optimizations.

