nullsquared

i=i++, sequence points (C++)


nullsquared    126
i = i++;
I'm sure we've all seen that. Assuming i is a primitive type (such as int), is the behavior undefined? I've read that it is indeed undefined since the = operator does not introduce a sequence point (*). Is this correct?

I don't fully understand why, though. The post-fix ++ operator increments by 1 and returns the old value, right? And then this old value will be assigned to i, effectively nullifying the change.

(*) Unless i is a non-primitive type and operator=() is overloaded, apparently. In which case it would introduce a sequence point because it is treated as a normal function, and therefore i++ would be fully evaluated before operator=() is actually called.

Would any of the C++ gurus like to shed some light on this?

nullsquared    126
Quote:
Original post by Sirisian
Quote:
Original post by nullsquared
Assuming i is a primitive type (such as int), is the behavior undefined?

Sequence Points 3rd paragraph.

Also open up the C++ standard and go to section 5, paragraph 4.

Also this: sequence points.


Yes, I've seen these sources. Thus the "I've read that it is indeed undefined [...]". Furthermore, did you read the part of my post involving "I don't fully understand why, though [...]"?

rip-off    10979
Your sentences "The post-fix ++ operator increments by 1 and returns the old value, right? And then this old value will be assigned to i, effectively nullifying the change." rely heavily on the assumption that the increment will take place before the assignment.

You have to forget about the semantics of the operators involved. It is obvious by looking at the code to see what the programmer intended (well, a sane programmer). But the compiler is a machine, it sees a statement that involves two assignments, neither of which has priority over the other.

nullsquared    126
Right; what I'm asking is what goes on behind the scenes. The way I see it, i++ is a single self contained operation - "increment, return old value." Obviously it's not a single instruction in assembly. Thus I want to understand how a non-overloaded operator= works such that it may cause undefined behavior in this case. How would the assembly get mixed up such that one happens before the other, considering operator= has lower precedence than operator++(int)?

If it was something like operator=(int &lhs, int rhs) and you called operator=(i, i++) then it would work fine because both parameters would be evaluated before the function is actually executed; clearly this isn't the case, however, thus my question.

Mike.Popoloski    3258
Undefined behavior has nothing to do with assembly getting "mixed up". The C++ language defines sequence points that dictate when and where you may modify a variable in a given expression. If you violate these specific rules, you will have undefined behavior on your hands, regardless of what your particular compiler does with the resulting assembly.

Quote:
If it was something like operator=(int &lhs, int rhs) and you called operator=(i, i++) then it would work fine because both parameters would be evaluated before the function is actually executed; clearly this isn't the case, however, thus my question.

The order of evaluation of parameters isn't specified and there is no sequence point there I believe, which would make that undefined behavior as well.

SiCrane    11839
Interestingly, this is the assembly that MSVC 2008 spat out in debug mode for i = i++:

mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR _i$[ebp], eax
mov ecx, DWORD PTR _i$[ebp]
add ecx, 1
mov DWORD PTR _i$[ebp], ecx

i gets loaded into a register, immediately reassigned back to itself, reloaded, incremented and then assigned back to itself.
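
For comparison, here is a rough C++ transcription of the two orderings being discussed. The function names are made up for illustration, and neither function is "what i = i++ means", since that expression is undefined for an int. The first simply mirrors the MSVC 2008 debug listing above step by step; the second is the ordering the original post assumed.

```cpp
// Hypothetical helper: the steps of the MSVC 2008 debug listing, one statement each.
int msvc_debug_order(int i)
{
    int eax = i;   // mov eax, [i]
    i = eax;       // mov [i], eax   (the assignment happens first, with the old value)
    int ecx = i;   // mov ecx, [i]
    ecx += 1;      // add ecx, 1
    i = ecx;       // mov [i], ecx   (the increment is stored last)
    return i;
}

// Hypothetical helper: the ordering the original post expected.
int expected_order(int i)
{
    const int old = i; // value yielded by i++
    i = old + 1;       // side effect of ++
    i = old;           // assignment of the old value, undoing the increment
    return i;
}
```

Starting from 2, msvc_debug_order returns 3 while expected_order returns 2, which is the whole disagreement in miniature: both orderings are plausible, and the standard does not pick one.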

nullsquared    126
Quote:
Original post by Mike.Popoloski
Undefined behavior has nothing to do with assembly getting "mixed up". The C++ language defines sequence points that dictate when and where you may modify a variable in a given expression. If you violate these specific rules, you will have undefined behavior on your hands, regardless of what your particular compiler does with the resulting assembly.

I understand that, I'm just trying to figure out if that's just an arbitrary decision by the standard or if there is some underlying issue/reason.

Quote:
Quote:
If it was something like operator=(int &lhs, int rhs) and you called operator=(i, i++) then it would work fine because both parameters would be evaluated before the function is actually executed; clearly this isn't the case, however, thus my question.

The order of evaluation of parameters isn't specified and there is no sequence point there I believe, which would make that undefined behavior as well.

The sequence point is before the function code is executed, which means both i and i++ will be evaluated. The reference will refer to the variable i, and i++ will return a copy of the old value of i:

// pseudo code
int i = 2;
operator=(i, i++);

operator=(int &lhs, int rhs)
{
    // lhs is i, which == 3
    // rhs is a copy of the old value of i, which is 2
    lhs = rhs; // i == 2 again
}

nullsquared    126
Quote:
Original post by SiCrane
Interestingly, this is the assembly that MSVC 2008 spat out in debug mode for i = i++:

mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR _i$[ebp], eax
mov ecx, DWORD PTR _i$[ebp]
add ecx, 1
mov DWORD PTR _i$[ebp], ecx

i gets loaded into a register, immediately reassigned back to itself, reloaded, incremented and then assigned back to itself.


That's really interesting.

Mike.Popoloski    3258
Quote:
Original post by nullsquared
The sequence point is before the function code is executed, which means both i and i++ will be evaluated. The reference will refer to the variable i, and i++ will return a copy of the old value of i:

// pseudo code
int i = 2;
operator=(i, i++);

operator=(int &lhs, int rhs)
{
    // lhs is i, which == 3
    // rhs is a copy of the old value of i, which is 2
    lhs = rhs; // i == 2 again
}


There is no sequence point between parameters, which means you are modifying a variable without a sequence point in between, which means it is undefined behavior, no matter whether it appears to work or not.

jpetrie    13149
Quote:

Right; what I'm asking is what goes on behind the scenes.

That depends on the specific compiler.

Quote:

The way I see it, i++ is a single self contained operation - "increment, return old value." Obviously it's not a single instruction in assembly.

It doesn't matter whether or not it's a single assembly instruction; nor does it matter whether you conceptualize the operation as a single one or an aggregation of many operations. The assembly and the precedence of the operators don't matter. What matters is what the C++ standard says, and the standard explicitly calls modification of a scalar value more than once between sequence points "undefined."

Quote:

Thus I want to understand how a non-overloaded operator= works such that it may cause undefined behavior in this case.

It will cause undefined behavior. Always.

Remember, undefined behavior (in the sense of the C++ standard) isn't necessarily going to have apparently malignant results. It can, in fact, do exactly what you would think is correct and logical. That does not preclude it from being undefined behavior. Undefined behavior is the standard saying "we do not account for this scenario" or "we do not consider this scenario to be well-formed" even though it is syntactically valid. Thus, compilers are free to behave however they like. In practice this generally translates to "do not write code to detect or handle this case," so the compiler continues code generation as normal. This may cause the compiler to produce code that would crash, or perform an operation out of order... it may cause the compiler to produce slightly different code each time (maybe some sort of algorithm isn't stable under these conditions, for example). It all really depends on how the compiler is implemented.

SiCrane    11839
Quote:
Original post by Mike.Popoloski
There is no sequence point between parameters, which means you are modifying a variable without a sequence point in between, which means it is undefined behavior, no matter whether it appears to work or not.

I'm pretty sure this isn't undefined behavior. You need to modify the value multiple times between sequence points to rise to undefined behavior, and the value is only modified once.
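
A minimal sketch of that case, for anyone who wants to try it. Built-in types can't have an overloaded operator=, so a named function (assign, a made-up name) stands in for the hypothetical operator=(int &lhs, int rhs). As far as I can tell this is well-defined under the sequence point rules discussed here: i is written once before the call, and binding the reference does not read its value.

```cpp
// Hypothetical stand-in for the operator=(int &lhs, int rhs) from the discussion.
void assign(int &lhs, int rhs)
{
    // By the time the body runs, both arguments have been fully evaluated,
    // so lhs refers to an i that has already been incremented.
    lhs = rhs;
}

// Hypothetical driver: returns the final value of i.
int assign_demo()
{
    int i = 2;
    assign(i, i++); // one write to i (the ++) before the call, one inside it
    return i;
}
```

assign_demo() returns 2 whichever argument is evaluated first, because the first argument only binds a reference and the second yields the old value.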

Mike.Popoloski    3258
Quote:
Original post by SiCrane
Quote:
Original post by Mike.Popoloski
There is no sequence point between parameters, which means you are modifying a variable without a sequence point in between, which means it is undefined behavior, no matter whether it appears to work or not.

I'm pretty sure this isn't undefined behavior. You need to modify the value multiple times between sequence points to rise to undefined behavior, and the value is only modified once.


Ah yes, jpetrie was just informing me of that. This just reinforces the idea that you shouldn't try to dance around the issue trying to be tricky, since it will most likely come back to bite you.

nullsquared    126
Quote:
Original post by Mike.Popoloski
There is no sequence point between parameters, which means you are modifying a variable without a sequence point in between, which means it is undefined behavior, no matter whether it appears to work or not.


As SiCrane points out, I don't believe that's undefined behavior. If you passed ++i as the first parameter and i++ as the second, then that would be undefined.

Quote:

Remember, undefined behavior (in the sense of the C++ standard) isn't necessarily going to have apparently malignant results.

jpetrie, you're right. I'm basing decisions on an incorrect definition of undefined [smile].

Edit:
Interestingly enough, here is GCC's version:

movl $1337, 12(%esp) // int i = 1337;
incl 12(%esp) // i = i++;

It increments i and doesn't even assign it back to itself.

Then again, it also says,

D:\programming\test_sequence_points\main.cpp:4: warning: operation on 'i' may be undefined

[grin]

Anyways, thanks for the discussion.

swiftcoder    18437
Quote:
Original post by SiCrane
Quote:
Original post by Mike.Popoloski
There is no sequence point between parameters, which means you are modifying a variable without a sequence point in between, which means it is undefined behavior, no matter whether it appears to work or not.

I'm pretty sure this isn't undefined behavior. You need to modify the value multiple times between sequence points to rise to undefined behavior, and the value is only modified once.

But the order of evaluation of parameters is undefined, so the output of printf("%d %d", i, i++) is not predictable, is it?

MaulingMonkey    1730
Quote:
Original post by nullsquared
Quote:
Original post by Mike.Popoloski
Undefined behavior has nothing to do with assembly getting "mixed up". The C++ language defines sequence points that dictate when and where you may modify a variable in a given expression. If you violate these specific rules, you will have undefined behavior on your hands, regardless of what your particular compiler does with the resulting assembly.

I understand that, I'm just trying to figure out if that's just an arbitrary decision by the standard or if there is some underlying issue/reason.

It allows for more aggressive optimizers. Consider:
*a = (*b)++;

What this rule means is that if a and b point to the same object, undefined behavior occurs. Let's reword that: The optimizer can act under the assumption that a and b don't point to the same thing, because it would be undefined behavior if they did. In this example, there's not much more optimization we can do based on this -- but we do still gain a little in that the optimizer doesn't have to be quite as strict about the order of operation.
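
If aliasing is a real possibility, one way to stay on defined ground is to give each step its own statement. A sketch follows; the function name is made up, and the ordering (increment first, assignment last) is my own choice rather than anything the standard implies for the one-line form.

```cpp
// Hypothetical helper: same intent as *a = (*b)++, but every read and write is
// a separate statement, so the result is defined even if a and b alias.
void store_old_then_increment(int *a, int *b)
{
    const int old = *b; // value that (*b)++ would yield
    *b = old + 1;       // the increment
    *a = old;           // the assignment, performed last
}
```

With distinct objects this leaves *b incremented and *a holding the old value. With a == b the assignment wins and the object keeps its old value, which is a decision the code now makes explicitly instead of leaving it to the optimizer.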

Quote:
Original post by SiCrane
Yes, I believe the standard calls it unspecified behavior rather than undefined behavior.

I believe it (printf("%d %d", i, i++)) still runs afoul of:
Quote:
The C++ Standard, 5¶4, emphasis added:

4 Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.

And thus is still undefined behavior.

The canonical example of f(a(),b()); does have merely 'unspecified' behavior as to whether a() or b() executes first.
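
A small sketch of that canonical example, with a log added so the order can be observed. The log and the run() wrapper are additions for illustration only. Either order is conforming, so no particular output should be relied on.

```cpp
#include <string>

// Hypothetical log: each call appends its own name.
std::string call_log;

int a() { call_log += 'a'; return 1; }
int b() { call_log += 'b'; return 2; }
int f(int x, int y) { return x + y; }

// Hypothetical driver: evaluates f(a(), b()) with a fresh log.
int run()
{
    call_log.clear();
    return f(a(), b()); // the log ends up "ab" or "ba"; the standard allows both
}
```

The return value is the same either way, since a() and b() don't touch shared state other than the log. The order only becomes visible, and only becomes a problem, once the two calls have side effects that interact.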

[Edited by - MaulingMonkey on February 13, 2010 6:24:43 PM]

nullsquared    126
Quote:
Original post by Makaan
i get something else from gcc for:

int i;
i = i++;

it is just :

add dword ptr [ebp-0x10],0x1

and i get no warnings.


What version of GCC? 4.4.2 here not only gives me the undefined behavior warning but also "warning: 'i' is used uninitialized in this function". And it generates this assembly:

incl 12(%esp)
