A bit of a misnomer in most C++ textbooks


In almost all (read: all) tutorials and books I've found, they say the same thing when they talk about object lifetime: that an automatic variable (one declared at local scope) is created when its definition is encountered and destroyed when its name goes out of scope. I find this to not be true. Even Stroustrup says this, on page 167 of The C++ Programming Language.

Usually such a variable is allocated on the stack, in a region called a stack frame. Now, I wrote a few simple C++ examples and opened them as MASM listings. From what I can see, the locals are all allocated immediately when the stack frame is created, that is, at some offset from the frame's base pointer (or whatever it's called); e.g. if we have two 32-bit ints, the second one is located at [EBP-8].
This also makes sense, since the space for all local variables can be set aside once, at the start of the function, instead of each variable being allocated at the point where it is declared. Allocating variables at several separate points in a function would also slow execution, since it would take additional instructions.
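To illustrate, here is the kind of function I was looking at (a minimal sketch; the exact offsets depend on the compiler, its settings, and the calling convention):

```cpp
int two_locals() {
    int a = 1;   // in an unoptimized 32-bit build, often at [ebp-4]
    int b = 2;   // and b often at [ebp-8], as described above
    // A typical prologue reserves room for both locals at once
    // (e.g. "sub esp, 8") rather than allocating per declaration.
    return a + b;
}
```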

The thing is, I don't understand why all the textbooks describe it the way I quoted above. It confused me at the beginning, until I read several lower-level (assembly) books. If I'm wrong in any way, please correct me.


I reckon you should see this in the context of the language. For higher-level languages, a LOT of lower-level stuff can be going on, but from within the high-level language you won't be able to make use of it (well, you can in some cases, but let's keep things simple).

So the books are correct at the level of the language they describe. Even if, in the resulting assembly or machine code, variables are allocated ahead of time, you cannot reliably and deterministically use them outside the scope of their definition.
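For instance (a minimal sketch):

```cpp
void scope_demo() {
    {
        int x = 5;   // x's storage may well be reserved at function entry...
        (void)x;
    }
    // ...but the *name* x is gone here; uncommenting the next line is a
    // compile error, regardless of where (or whether) the storage exists:
    // x = 6;
}
```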

Hope that helps!

You appear to be conflating allocation (the process of setting aside memory for use) and construction (initializing the memory, causing it to be a valid object). Those are different things, though from the perspective of your C++ code, typically they happen (or appear to happen, as you've discovered with stack-allocated objects) at the same time. It is possible to allocate memory for an object on the heap without constructing it (at that time), as well, though the new operator (by default) does both for you. Similarly, "destruction" does not necessarily mean "deallocation" - it just means the destructor is called on the object - though the delete operator (by default) also does both and a stack-allocated object will do both when it goes out of scope.

This allows for some types of custom allocator that might destroy an object without releasing its memory so that a new object can be constructed in the same memory block as the old one, for example.
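A minimal sketch of that separation, using placement new and an explicit destructor call (the names and the stack buffer here are just for illustration):

```cpp
#include <new>      // for placement new
#include <string>

int main() {
    // Allocation only: raw, suitably aligned storage; no string object yet.
    alignas(std::string) unsigned char buffer[sizeof(std::string)];

    // Construction only: build an object in the pre-allocated storage.
    std::string* s = new (buffer) std::string("first object");

    // Destruction only: the object ends, but the storage remains usable.
    s->~string();

    // Construct a second object in the same memory block.
    std::string* t = new (buffer) std::string("second object");
    t->~string();
}   // the buffer itself is released here, along with the stack frame
```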

So, no, the books and tutorials are not lying to you. :)

A language definition doesn't describe actual implementation behavior; it describes rules that must hold for any implementation of the language.

That is, an actual implementation of the language may do anything it likes, as long as all the rules in the language are fulfilled. This opens the door to optimizing the implementation. Different compilers make different choices there, resulting in differences in performance for a single program.

If you write your program while following all the rules of the language, it will work correctly with any compiler. If you go beyond what the language rules promise, you're in "undefined behavior" land, which may or may not work, depending on what an implementation actually does.

There is even more than that in the language standard's introduction. In the C++ standard the details run on for five pages. There are similar descriptions in other language standards.

The language standard is defined in the abstract. Actual compilers can make changes as needed as long as they don't change observable behavior. It is also sometimes called the "as if" rule: it can change anything it wants as long as ultimately it looks as if the C++ instructions were run exactly in the order they were given.
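A minimal sketch of the as-if rule in action (what an optimizer is allowed to do; the actual output naturally varies by compiler and flags):

```cpp
int sum() {
    int a = 2;      // a and b are "created" here as far as the language
    int b = 3;      // is concerned, but under the as-if rule they need
    return a + b;   // no storage at all: an optimizer may emit "return 5"
}
```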

Many choices are left up to the actual implementation. These include the sizes and alignment of things, how values are passed and processed, and how certain tasks and processing steps are ordered. As @Alberth describes, this allows for most language optimizations and language extensions (which are permitted in C++). Processing steps can be merged, tasks can be split, and unnecessary actions can be removed entirely, as long as the observable behavior is the same.

What you described, and what @Oberon_Command explained, is one of those pieces of non-observable behavior.  In the C++ code you create a variable and it eventually goes out of scope. There is no observable difference to the C++ code if the space is allocated a little bit earlier. There is also no observable difference to the C++ code if the space is allocated somewhere else. While it happens to be allocated on the stack in your case, there is nothing preventing a compiler from allocating it somewhere else, like a location in scratch memory or keeping the value in CPU registers. The allocation and release could be done earlier in the program, even at program startup long before execution hits those lines. Those implementation details aren't observable to the abstract language.
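For example (a sketch; consume() is a hypothetical function assumed to be defined in another translation unit, so the optimizer cannot see what it does with the pointer):

```cpp
void consume(int* p);  // hypothetical, assumed defined in another file

int locals() {
    int x = 42;    // address never taken: x can live purely in a register,
                   // or be folded away entirely
    int y = 7;
    consume(&y);   // y's address escapes, so y needs real storage in memory
    return x + y;
}
```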

In heavily optimized builds the compiler does an enormous amount of work to speed up the program, and it does all of it while preserving observable behavior. As a few examples:

- It can move things up outside of loops.
- It can combine loops to run in parallel.
- It can combine data, or split data apart, in certain types of processing.
- It can remove "dead code" it believes has no observable effects, such as writing a value and then immediately overwriting it.
- It can pre-compute values at compile time so they aren't computed at runtime.
- It can replace your math operations with different math operations that run faster on the hardware, like replacing multiplication by a constant with a shift and an addition.
- It can replace calls to library functions with intrinsic functions, like replacing a call to sin() with an equivalent instruction.
- It can inline functions so the parameters and return values are never processed at all.

And there are many more besides.
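A toy function that touches a few of these (a sketch; what actually happens depends entirely on the compiler and optimization level):

```cpp
int toy(int n) {
    int x = 1;        // dead store: overwritten before ever being read,
    x = 2;            // so the first write may be removed entirely
    int y = x * n;    // may be constant-folded and combined with the
    return y * 8;     // line below; multiply by 8 may become a shift (<< 3)
}
```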


59 minutes ago, Alberth said:

If you write your program while following all the rules of the language, it will work correctly with any compiler. If you go beyond what the language rules promise, you're in "undefined behavior" land, which may or may not work, depending on what an implementation actually does.

Exactly... I believe with almost every C++ compiler you can call a member function through a null pointer, like this:

reinterpret_cast<CMyClass *>(0)->MyMemberFunction(a,b,c);

It generally works fine and even may have some use. You can say "if (this == 0)" in your code. As long as you don't dereference anything through your null pointer, your program will happily run. In fact, to make it fail the compiler would have to put in extra code, so for performance reasons it doesn't. I think Microsoft famously used this in some of their system code and people were screaming at them for it :D
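A sketch of the pattern being described (to be clear: this is undefined behavior per the standard; it merely happens to "work" on many implementations, and a modern optimizer is entitled to assume 'this' is non-null and delete the check):

```cpp
#include <cstdio>

class CMyClass {                  // name borrowed from the snippet above
public:
    void MyMemberFunction() {     // non-virtual: the call itself never
        if (this == nullptr) {    // reads through 'this'; this check is
            std::puts("null");    // still UB and may be optimized away
            return;
        }
        std::puts("not null");
    }
};

int main() {
    CMyClass* p = nullptr;
    p->MyMemberFunction();        // undefined behavior: do not rely on it
}
```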

4 hours ago, Gnollrunner said:

It generally works fine and even may have some use.

Undefined behaviour may align with the programmer's expectations, right up until it doesn't.

Anyway, OP's concern was that the lifetime of variables declared with automatic storage duration is a misnomer because it doesn't correspond exactly to the lifetime of the automatic storage. His assumption was that the two lifetimes should align; the only actual requirement is that the lifetime of a storage region be at least as long as that of the variables stored therein (a variable that lasts beyond the lifetime of the free store, for example, is a memory leak). It's not a misnomer, it's an incomplete understanding on OP's part.

Stephen M. Webb
Professional Free Software Developer

This topic is closed to new replies.
