When C++ makes you want to throw your monitor out of the window

13 comments, last by cjmarsh 6 years, 4 months ago

Assuming a C++14 or C++17 compliant compiler, what is the (approximate, neglecting the exact addresses) output of this program?

No cheating! Say what you believe it is without consulting the standard for 30 minutes, and without compiling and running the program.


#include <cstdio>
#include <cstdint>
#include <cstdlib>
#include <cassert>

void* operator new  ( std::size_t sz ) { printf("alloc size %zu\n", sz); return malloc(sz); }
void operator delete( void* ptr, std::size_t sz ) noexcept { printf("delete size %zu\n", sz); free(ptr); }
void operator delete( void* ) noexcept { assert(0); }

int main()
{
    volatile int* x[5] = {};

    for(auto& y : x) y = new int(0);
    for(auto& y : x) printf("%d @ %p\n", *y, (void*)y);
    for(auto& y : x) delete y;

    puts("exit");
    return 0;
}

Not sure what the purpose is, but I don't see anything that would make it not work as expected on its face.

Of course with the language family there are always fun gotchas. The obfuscated code contests are a great example of that.

The only things I see that make me suspect oddity are the volatile modifier (which shouldn't do much here except prevent load/store caching if the loops are unrolled) and the multiple versions of delete, but I believe the most specialized version is always called because so many custom allocators require it. The noexcept specifiers make me wonder if there are new overloads I need to look up. It has been a while since I've looked closely at the memory allocation operators, but these seem correct: there have been heated discussions before about what exactly is required of them, and they return a buffer address and the language does the rest.

So my guess: Print five lines of "alloc size 4" or possibly "alloc size 8", or very unlikely "alloc size 2". Then five lines of "0 @ address" since you allocated five integers of value zero, then it should call your first delete function that prints "delete size #" with a matching size of 2, 4, or 8 for each item, followed by "exit".

There could easily be some trickery, but I'm not seeing it on a slow reading. Time to copy/paste this thing!

My analysis concurs with frob's, and VS2015 does exactly what we both expected.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

The only things I see that make me suspect oddity are the volatile modifier

The effect is independent of volatile. I only added it (besides calling printf) to be 100% on the safe side, so nobody could say "optimizing out unused stuff probably gets in the way somehow". That's also the reason why I'm looping over an array of pointers (but it works the same with a single pointer).

The verdict is a different one.
If you thought the output of this program should look somewhat like this:

alloc size 4
alloc size 4
alloc size 4
alloc size 4
alloc size 4
0 @ 0000000000375fa0
0 @ 0000000000375800
0 @ 0000000000375820
0 @ 0000000000375840
0 @ 0000000000375860
delete size 4
delete size 4
delete size 4
delete size 4
delete size 4
exit

... then you are of course right. That is exactly what the output has to look like. A "usual deallocation function" with exactly two parameters, the second of type size_t, is present, and the allocation is not an array of trivially destructible objects (in which case the single-argument global operator delete is to be used), so operator delete(void*, size_t) noexcept is the correctly chosen overload.
You can easily verify this by compiling with, e.g., GCC 7.1.

However, if you thought that the output might look like this:

alloc size 4
alloc size 4
alloc size 4
alloc size 4
alloc size 4
0 @ 0000000000375fa0
0 @ 0000000000375800
0 @ 0000000000375820
0 @ 0000000000375840
0 @ 0000000000375860
Assert failed: test.cpp, line 9: 0

...then, again, you are of course right. That is exactly what the output may look like. Because, you know, an implementation is allowed to call the single-argument overload instead at its own discretion, and it is a requirement that allocations work the same with either one being called (with or without size). No, I'm not joking.
The entire point of having a sized operator delete in the first place is that you do not need to explicitly store the size of the allocation.
Well,... duh. Except... a program that provides the void* overload should also provide the void*, size_t overload, and a program that provides the void*, size_t overload shall also provide the void* overload. Plus, you are required to make it "just work" anyway if you are only being called with void*.
Note the difference in wording. "Shall" means that your program is not well-formed if you don't fulfill the requirement (and GCC indeed emits a warning, although it "works fine"). Which overload is called is well-defined in a few exceptional cases, but in the general case it is unspecified.

But lastly, if you thought that maybe the output could look like:

alloc size 4
alloc size 4
alloc size 4
alloc size 4
alloc size 4
0 @ 000000000049a2e0
0 @ 000000000049a300
0 @ 000000000049a320
0 @ 000000000049a340
0 @ 000000000049a360
exit

... then, again you are of course right. That is exactly what the output may look like. It's what you get if you compile the same source code with clang 5.0, too.
Wait, neither of your overloads being called? What gives? Compiler bug? Sigh... No, if only that was the case...

More specifically, if you look at the generated code (via objdump -dSC; this is the single-object version without the loop, for brevity), here is what you get from GCC:

0000000000407ed0 <main>:
int main()
{
407ed0: 53 push %rbx
407ed1: 48 83 ec 20 sub $0x20,%rsp
407ed5: e8 86 98 ff ff callq 401760 <__main>
void* operator new ( std::size_t sz ) { printf("alloc size %u\n", sz); return malloc(sz); }
407eda: ba 04 00 00 00 mov $0x4,%edx
407edf: 48 8d 0d 1a 11 00 00 lea 0x111a(%rip),%rcx # 409000 <.rdata>
407ee6: e8 b5 ff ff ff callq 407ea0 <printf(char const*, ...)>
407eeb: b9 04 00 00 00 mov $0x4,%ecx
407ef0: e8 1b fc ff ff callq 407b10 <malloc>
auto x = new int(0);

printf("%d @ %p\n", *x, x);
407ef5: 31 d2 xor %edx,%edx
407ef7: 48 8d 0d 4e 11 00 00 lea 0x114e(%rip),%rcx # 40904c <.rdata+0x4c>
407efe: c7 00 00 00 00 00 movl $0x0,(%rax)
407f04: 49 89 c0 mov %rax,%r8
void* operator new ( std::size_t sz ) { printf("alloc size %u\n", sz); return malloc(sz); }
407f07: 48 89 c3 mov %rax,%rbx
printf("%d @ %p\n", *x, x);
407f0a: e8 91 ff ff ff callq 407ea0 <printf(char const*, ...)>
void operator delete( void* ptr, std::size_t sz ) noexcept { printf("delete size %u\n", sz); free(ptr); }
407f0f: ba 04 00 00 00 mov $0x4,%edx
407f14: 48 8d 0d f4 10 00 00 lea 0x10f4(%rip),%rcx # 40900f <.rdata+0xf>
407f1b: e8 80 ff ff ff callq 407ea0 <printf(char const*, ...)>
407f20: 48 89 d9 mov %rbx,%rcx
407f23: e8 10 fc ff ff callq 407b38 <free> <--- inlined everything, but OK

delete x;

puts("exit");
407f28: 48 8d 0d 26 11 00 00 lea 0x1126(%rip),%rcx # 409055 <.rdata+0x55>
407f2f: e8 cc fb ff ff callq 407b00 <puts>
return 0;
}
407f34: 31 c0 xor %eax,%eax
407f36: 48 83 c4 20 add $0x20,%rsp
407f3a: 5b pop %rbx
407f3b: c3 retq



whereas clang gives you:
00000000004015f0 <main>:
4015f0: 55 push %rbp
4015f1: 48 83 ec 20 sub $0x20,%rsp
4015f5: 48 8d 6c 24 20 lea 0x20(%rsp),%rbp
4015fa: e8 31 02 00 00 callq 401830 <__main>
void* operator new ( std::size_t sz ) { printf("alloc size %u\n", sz); return malloc(sz); }
4015ff: 48 8d 0d fa 79 00 00 lea 0x79fa(%rip),%rcx # 409000 <.rdata>
401606: ba 04 00 00 00 mov $0x4,%edx
40160b: e8 40 00 00 00 callq 401650 <printf(char const*, ...)>
401610: b9 04 00 00 00 mov $0x4,%ecx
401615: e8 c6 65 00 00 callq 407be0 <malloc> <--- So far so good
auto x = new int(0);
40161a: c7 00 00 00 00 00 movl $0x0,(%rax)

printf("%d @ %p\n", *x, x);
401620: 48 8d 0d 24 7a 00 00 lea 0x7a24(%rip),%rcx # 40904b <.rdata+0x4b>
401627: 31 d2 xor %edx,%edx
401629: 49 89 c0 mov %rax,%r8
40162c: e8 1f 00 00 00 callq 401650 <printf(char const*, ...)>

delete x; <--- what about this?

puts("exit");
401631: 48 8d 0d 1c 7a 00 00 lea 0x7a1c(%rip),%rcx # 409054 <.rdata+0x54>
401638: e8 93 65 00 00 callq 407bd0 <puts>
return 0;
40163d: 31 c0 xor %eax,%eax
40163f: 48 83 c4 20 add $0x20,%rsp
401643: 5d pop %rbp
401644: c3 retq <--- wait, that's all???

Because, you know, an implementation is allowed to call the single-argument overload instead at its own discretion, and it is a requirement that allocations work the same with either one being called (with or without size).

Is it?

http://en.cppreference.com/w/cpp/memory/new/operator_delete

According to the notes regarding the two-argument (void*, size_t) function:

5-6) Called instead of (1-2) if a user-defined replacement is provided except that it's implementation-defined whether (1-2) or (5-6) is called when deleting objects of incomplete type and arrays of non-class and trivially-destructible class types (since C++17). A memory allocator can use the given size to be more efficient. The standard library implementations are identical to (1-2).

The implementation is only allowed to call the simple overload for arrays of POD types or for incomplete types (so "incorrect" anyway), and only since C++17.
Where'd you get the idea that the implementation is free to call either function in all cases?

Where'd you get the idea that the implementation is free to call either function in all cases?

From how I read my copy of the draft, which is a bit less obvious but, alas, more binding than cppreference.

Let's start with the harmless and obvious stuff.

The library provides default definitions [...] Some global allocation and deallocation functions are replaceable.

A C++ program shall provide at most one definition of a replaceable allocation or deallocation function. Any such function definition replaces the default version provided in the library.

A usual deallocation function is a deallocation function that has: [...] exactly two parameters, the type of the second being either std::align_val_t or std::size_t


OK, so far it's all fine. No surprises, just what you'd expect.

An implementation is allowed to omit a call to a replaceable global allocation function. When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression.

OK, and that's fucking cool. It's indeed what both GCC and clang do if you do anything that doesn't absolutely require an allocation.
For example: int* a = new int(5); ++*a; printf("%d", *a); will, maybe much to your surprise, never allocate anything and call printf with an immediate! That's about as cool as you can get, and it's perfectly acceptable per the as-if rule too, because it's truly as-if, just without the allocation overhead. Nothing is leaked, no surprises.
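Here is that snippet wrapped in a function so it compiles on its own. The function name is made up for illustration, and a matching delete is added because the post says nothing is leaked. Whether the new-expression is actually omitted depends on the compiler and optimization level, so the only way to tell is to look at the -O2 assembly; the returned value is 6 either way.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: the forum snippet plus a matching delete.
int bump_heap_int() {
    int* a = new int(5);     // the implementation may omit this allocation entirely
    ++*a;
    std::printf("%d\n", *a); // with elision, this can become printf with the immediate 6
    int result = *a;
    delete a;                // if the allocation was omitted, no deallocation call is made
    return result;
}
```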

One might think that the artefact comes from omitting or extending the allocation, but that's not the case. The allocation takes place, but no deallocation happens. (Letting the process die and having the OS reclaim the memory will "work", but I don't think it is legal to do that in the presence of an explicit delete-expression; at least I can't see how it would be.)

The value of the operand of delete may be a null pointer value, a pointer to a non-array object [blah blah...]. If not, the behavior is undefined.
Note: This means that the syntax of the delete-expression must match the type of the object allocated by new


OK. Pretty obvious.

delete expression will call destructors
[...]
if [...] incomplete type at the point of deletion and [...] non-trivial destructor, behavior is undefined

OK. Again, pretty obvious. How should the compiler know what to do with an incomplete type?

If the allocation call for the new-expression [...] was not omitted and the allocation was not extended, the delete-expression shall call a deallocation function

That's what you expect!

Otherwise, if the allocation was extended [...]

Otherwise [if the allocation was omitted] ... will not call a deallocation function.


Again, what you expect. Makes perfect sense.

But now it starts...

If deallocation function lookup finds more than one usual deallocation function, the function to be called is selected as follows:

  • If [alignment], function with std::align_val_t preferred; otherwise without [obvious]
  • If the deallocation functions have class scope, the one without a parameter of type std::size_t is selected [irrelevant]
  • If the type is complete and if, for the second alternative (delete array) only... [irrelevant]
  • Otherwise, it is unspecified whether a deallocation function with a parameter of type std::size_t is selected.
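The class-scope bullet is the one rule in this list whose outcome is fully determined, so it is easy to check. The sketch below (the type Widget and both counters are hypothetical) declares a sized and an unsized class-specific operator delete; a delete-expression on a Widget should end up in the unsized one. Class-specific allocation functions are not replaceable global ones, so the allocation cannot be omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static int g_unsized_calls = 0;
static int g_sized_calls = 0;

struct Widget {
    int value = 0;

    // Class-specific allocation, so new/delete on Widget stay paired with malloc/free.
    static void* operator new(std::size_t sz) {
        if (void* p = std::malloc(sz)) return p;
        throw std::bad_alloc();
    }

    // Both forms exist; per the class-scope bullet the unsized one is selected.
    static void operator delete(void* p) noexcept {
        ++g_unsized_calls;
        std::free(p);
    }
    static void operator delete(void* p, std::size_t) noexcept {
        ++g_sized_calls;
        std::free(p);
    }
};
```

Usage: `Widget* w = new Widget; delete w;` should leave g_unsized_calls at 1 and g_sized_calls at 0. In C++14 the outcome is the same for a different reason: when a one-parameter member operator delete is declared, the two-parameter member is not a usual deallocation function at all.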

Bummer.

If exceptional case, or other exceptional case... otherwise (the general case): unspecified. And unspecified means nothing other than that the compiler can, in principle, just do what it wants.

But it goes on:

3.7.4
[...]
void operator delete(void*, std::align_val_t) noexcept;

17.5.4.6
void operator delete(void*, std::size_t);
[...]
The program’s definitions are used instead of the default versions supplied by the implementation.

18.6.1
[...]
void operator delete(void* ptr) noexcept;

(Note: I accidentally copied the wrong lines, but it's all the same either way. They're obviously all supposed to be the same version, e.g. void*, size_t. What I wonder about is why they're noexcept in one location and not in the other.)

So which one is it? The standard says the exception specification is part of the function type, thus it is not possible to overload these. Maybe a typo in the standard, and for a while I thought that might be the reason for the artefact that I see... but I tried both with and without noexcept, got the exact same behavior, and no error/warning in either case.

By the way, the standard says it's undefined if the deallocator throws, so making them explicitly noexcept is kinda superfluous (much like destructors).

[...]
Replaceable: A C++ program may define functions with any of these function signatures, [...].
If a function without a size parameter is defined, the program should also define the corresponding function with a size parameter.
If a function with a size parameter is defined, the program shall also define the corresponding version without the size parameter.
[...]
Required behavior: A call to an operator delete with a size parameter may be changed to a call to the corresponding operator delete without a size parameter, without affecting memory allocation.

Bummer, again. "Any", in common understanding, usually means "none, one, two, or whichever combination of as many as I want", but "shall" (in the context of the standard) means your program is not well-formed if you don't do as you're told.

I'm not quite sure how you're supposed to leave the (de)allocation unaffected if you're not given the possibly very relevant information of how much to free. The only conclusion can be that you're supposed to explicitly store the allocation size anyway (because you never know whether you'll be told the size or not).
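A minimal sketch of that conclusion, assuming a malloc-backed replacement: operator new stores the requested size in a small header in front of the block, so the unsized operator delete can recover it, and the sized overload simply forwards. The counter g_bytes_live and the constant kHeader are hypothetical, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static std::size_t g_bytes_live = 0; // hypothetical instrumentation

namespace {
// Header size keeps the returned pointer at the alignment malloc guarantees.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must be able to hold a size_t");
}

void* operator new(std::size_t sz) {
    void* raw = std::malloc(kHeader + sz);
    if (raw == nullptr) throw std::bad_alloc();
    *static_cast<std::size_t*>(raw) = sz;              // remember the requested size
    g_bytes_live += sz;
    return static_cast<unsigned char*>(raw) + kHeader;
}

void operator delete(void* ptr) noexcept {
    if (ptr == nullptr) return;
    void* raw = static_cast<unsigned char*>(ptr) - kHeader;
    g_bytes_live -= *static_cast<std::size_t*>(raw);   // size comes from the header
    std::free(raw);
}

// The size argument is ignored, so both overloads behave identically.
void operator delete(void* ptr, std::size_t) noexcept {
    operator delete(ptr);
}
```

Since the default operator new[] and operator delete[] forward to these functions, array allocations take the same path. The price is one header (typically 16 bytes) per allocation, which is exactly the overhead sized deallocation was meant to avoid.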

The first reading was without the benefit of the standard or reference sites. Now that we're past that, I'm going to argue a few of those.

Fortunately I had some time this morning to look these up, getting cozy with the language standard takes some time.

An implementation is allowed to omit a call to a replaceable global allocation function. When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression.

OK, and that's fucking cool. It's indeed what both GCC and clang do if you do anything that doesn't absolutely require an allocation. For example: int* a = new int(5); ++*a; printf("%d", *a); will, maybe much to your surprise, never allocate anything and call printf with an immediate!

It is true that would be amazing.

However, that's a piece of 5.3.4p10, and you've got to read the whole block, which includes both item 10 and 10.1 through 10.6 together. There are several constraints: the omitted allocations need to be consecutive, and the allocations need to invoke the same allocator and use the same delete functions, also at the same point. So your example doesn't quite work. Your original code could possibly merge them if you did something slightly different, since there are five identical allocations and five identical destructions if it were to unroll the loop.
Your specific example is covered in the example of paragraph 10.6, explicitly listed as "unmergeable". It is possible for the new operator to throw, meaning a catch handler would have observably different results. With a slight modification you could use the nothrow version of new (which returns a null pointer instead of throwing), and that code could be mergeable.

But that wasn't your example. If you made the change to use the nothrow new instead of the throwing new, then it could possibly be a single allocation and deallocation.
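For reference, the nothrow form mentioned here is spelled `new (std::nothrow) T(...)` and needs `<new>`. It yields a null pointer on failure, and the object is still released with an ordinary delete-expression. The helper name is hypothetical, and whether a given compiler then merges or omits the allocation is not something the code itself can show.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: same steps as the earlier snippet, but with nothrow new.
int bump_heap_int_nothrow() {
    int* a = new (std::nothrow) int(5); // null on failure instead of throwing std::bad_alloc
    if (a == nullptr) {
        return -1;                      // allocation failed; nothing to clean up
    }
    ++*a;
    int result = *a;
    delete a;                           // plain delete pairs with a nothrow single-object new
    return result;
}
```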

That is exactly what the output may look like. It's what you get if you compile the same source code with clang 5.0, too. Wait, neither of your overloads being called? What gives? Compiler bug? Sigh... No, if only that was the case...

That is covered under 5.3.5p7 which includes 7.1, 7.2, and 7.3. It is possible under some circumstances for destructors to be called but the deallocation function to not be called. I had forgotten about that when working through your first example (without referencing anything) although I've read it before.

So yes, that's legal. Destructors would be called, but deallocators can potentially be optimized into oblivion if the compiler can prove a few things.
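A sketch of the distinction, using a hypothetical type Noisy whose destructor has an observable side effect: even if the implementation omits the allocation (and therefore the matching deallocation call), the destructor's effect must still happen.

```cpp
#include <cassert>

static int g_destructor_runs = 0;

struct Noisy {
    ~Noisy() { ++g_destructor_runs; } // observable side effect
};

void create_and_destroy() {
    Noisy* n = new Noisy; // the allocation itself may be omitted
    delete n;             // the destructor always runs; operator delete is skipped only if the allocation was omitted
}
```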

As you described, that's a version that displays the text inside the new call, but doesn't actually call the deallocators, jumping immediately to the exit line. It is a cool little gotcha for an unusual situation.

Assert failed: test.cpp, line 9: 0

...then, again, you are of course right. That is exactly what the output may look like. Because, you know, an implementation is allowed to call the single-argument overload instead at its own discretion, and it is a requirement that allocations work the same with either one being called (with or without size). No, I'm not joking.

This one is a little more tricky. There are a bunch of possible deallocation functions the programmer can provide. Class-specific deallocators, array deallocators, placement deallocation, alignment-knowing and size-knowing versions, nothrow ("placement") versions, and user-defined versions that take whatever arguments you throw at them.

If you are ONLY looking at the delete functions within 18.6.1, you are right. The standard plainly describes how the compiler is free to use a less-parameter version instead of a more-parameter version.

There are assorted documents that discuss why that is the case. There have been additions to the functions over the years through different versions of the standard. C++98 had only the plain pointer version and the nothrow version (that's section 18.4 in the 1998 standard). In an effort to remain backward compatible with itself, compilers needed a way to fall back gracefully to prior versions.

That's where 5.3.5p10 comes in. The later versions added the paragraph to state that if both the size-providing version and the non-size-providing version are present, then the size-providing version must be used. The 2017 proposal (renumbered to 8.3.5p10) has expanded the selection even more, since there are even more possible forms of delete, describing how, if they are present, the most specific version must be called.
So in your case, where you specifically called out a 2014 or 2017 conforming compiler and you provided two versions, the compiler must choose the two-parameter version. If you had provided one of the proposed 2017 versions and required a 2017 conforming compiler, then those more specific versions would be the required form.

So no, I don't think that one is an option at all. On a recent compiler where both options are available, the less-specific version should not be called when a more-specific version is found.

That's where 5.3.5p10 comes in. The later versions added the paragraph to state that if both the size-providing version and the non-size-providing version are present, then the size-providing version must be used. The 2017 proposal (renumbered to 8.3.5p10) has expanded the selection even more, since there are even more possible forms of delete, describing how, if they are present, the most specific version must be called. So in your case, where you specifically called out a 2014 or 2017 conforming compiler and you provided two versions, the compiler must choose the two-parameter version. If you had provided one of the proposed 2017 versions and required a 2017 conforming compiler, then those more specific versions would be the required form. So no, I don't think that one is an option at all. On a recent compiler where both options are available, the less-specific version should not be called when a more-specific version is found.
So you are saying that this is in your opinion a clang compiler error rather than something allowed by the standard?

That would be too good to be true, because it would mean it has to be, and will be, fixed. Whereas if the standard allows it... well, you know. If it's legitimate, then the only thing you can do is throw your monitor out of the window in a fit of rage, as the topic title suggests, but it remains debatable how far this helps solve your programming problem. :)

Unfortunately (but maybe my English or standard-reading skills are just too weak), I understand the aforementioned 5.3.5p10 as explicitly allowing (not disallowing) the compiler to choose whatever it wants.

5.3.5p10.1 is about new-extended alignment, so not relevant

5.3.5p10.2 gives class-scope deallocators precedence (which kinda makes sense too, but is also irrelevant)

5.3.5p10.3 is about deleting an array of objects, so again it's irrelevant (for the example, anyway, very relevant otherwise)

Lastly 5.3.5p10.4 says "Otherwise, it is unspecified". In my understanding, the word "otherwise" means as much as "except for the preceding three situations, in every case", and the word "unspecified" means "can do what you want" (in the same sense as "can evaluate function arguments in any order you like").
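To make the comparison concrete: in the sketch below (all names hypothetical), the two arguments of takes_two may be evaluated in either order, so both "ab" and "ba" are conforming results. Because the order is unspecified rather than implementation-defined, the compiler doesn't have to document its choice or even be consistent about it. This is the latitude the post reads into 5.3.5p10.4.

```cpp
#include <cassert>
#include <string>

std::string g_order; // records which argument was evaluated first

int tag(char c) {
    g_order += c;
    return 0;
}

void takes_two(int, int) {}

// Returns "ab" or "ba" depending on the (unspecified) argument evaluation order.
std::string evaluation_order() {
    g_order.clear();
    takes_two(tag('a'), tag('b'));
    return g_order;
}
```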

If you want another C++ nightmare, this came up on the chat (thanks Bacterius :D ):
#include <thread>
#include <iostream>

class alignas(32) AlignedObject {
public:
  float dummy[8]; // I bet a smart auto-vectorizing compiler can use AVX to store/process this!
};

int main() {
  while (true) {
    std::thread([](){
      AlignedObject x;
      std::cout << &x;
      std::thread([x](){
        std::cout << &x;
      }).join();
    }).join();
  }

  return 0;
}
Both Clang and GCC will compile this into invalid AVX code that crashes due to unaligned accesses -- when passing the float[8] by value into the capture, the implementation does not align its internal allocation to the required 32 bytes. My first reaction was that this is a bug -- why the hell wouldn't they respect alignof(T) when allocating space to store a T... but then my very quick language lawyering seems to indicate that they are allowed to do this, as the largest scalar type available on the system is only 16 bytes, so compilers are allowed to declare this as the maximum possible alignment value.

another C++ nightmare
C++17 should fix that one.

The problem here is that alignas, like most things in C++14 (much to my dismay), was done half-assed only. It is only required to be respected for objects that aren't dynamically allocated. Which, of course, is not the case when you pass a lambda capture to a thread, because std::thread copies the callable, captures included, into dynamically allocated storage. Surprise, you're dead. And yes, although it's very obviously not what you want and complete bullshit, the compiler is perfectly right in what it's doing. It follows the requirements of the standard to the letter.

Now, luckily, this problem has bitten more than a couple of people, not just in this extreme example, so C++17 will be saner: it also requires dynamic allocations to respect alignas, which is what most people assumed to be the case in the first place.
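A quick way to check the C++17 behaviour, assuming a compiler in C++17 mode: a plain new-expression and std::vector (through std::allocator) should both place an alignas(32) type on a 32-byte boundary. AlignedObject mirrors the type from the thread; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

struct alignas(32) AlignedObject {
    float dummy[8];
};

// True if p sits on an alignof(T) boundary.
template <class T>
bool is_aligned(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Under C++17 over-aligned new, both allocations are expected to honour alignof == 32.
bool heap_objects_are_aligned() {
    std::unique_ptr<AlignedObject> single(new AlignedObject{});
    std::vector<AlignedObject> many(4);
    return is_aligned(single.get()) && is_aligned(many.data());
}
```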

