Knowledge has only one definition, "It has no Upper Bound"

Sunday, April 26, 2009

Sequence Points, Side Effects & The Confusion

Before proceeding further try answering these easy questions

int i=0, j=0, k=0;
i = i++;
a[j] = j++;

f(k, ++k, ++k);

1) What is the value of i, j, k?
2) What is the value of a[0], a[1] assuming entire array was initialized to zero?
3) If f is defined as
void f(int a, int b, int c){cout<<a<<b<<c;}
then what is the output?



Side Effects

A general definition: A side effect is an action that changes the state of the program's execution environment. Side effects include modifying an object, modifying a file, accessing a volatile object, or calling a function that does any of these operations.

All expressions have a type associated with them, they also yield a value (except for the void type expressions) All expressions have side effects even a void one.
For e.g. n++ is an expression whose value is n and has the side effect of incrementing n.

Let's look at a few more examples of side effects.

We can use the increment or decrement expression (based on unary ++ or --) as a full expression. A full expression is one that is not a sub-expression of some larger expression.
For e.g.
++i;
itself is a full expression.
So is the increment expression in:
for (i = 0; i!=10; i++)

In both the cases we are interested only in the side effect i.e. the incremented value. However, when we use an increment expression as a sub-expression of a larger expression, as in:
x = *p++;
then we are interested in the resulting value as well as the side effect.
No unary operator other than ++ and -- has a side effect.

An assignment expression such as:
m = n;
has an obvious side effect, that is it stores the value of n (converted to the type of m) into m. Less obvious and often ignored fact is that the assignment also returns a value, which is the value of m after the assignment. (Note that it does not return an lvalue. A statement such as a = b = c works only because the association is right to left.)
Arithmetic assignment operators, such as += and -= produce both values and side effects, but other binary operators such as + and - produce only a value. They have no side effects.

Finally, any expression that calls a standard library i/o function has the side effect of modifying a file. For example:
printf("i = %d; j = %d\n", i, j);
yields an int result, which we often ignore. Its side effect is the modification of the standard output file, this is what we are generally interested in.


Sequence Points

A sequence point is a point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations have completed.

The C/C++ standard defines the following sequence points:

  • The call to a function, after the arguments have been evaluated or before a function is entered in a function call. In the expression f(i++), f is called with the original value of i, however i itself is incremented before control enters the body of f.
  • The end of the first operand of the following operators: logical AND (&&); logical OR (||); conditional (?); comma (,).
    For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q.
    Similarly, in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after the first *p++, meaning it has already been incremented by the time the second instance is executed.
  • The end of a full expression: an initializer; the expression in an expression statement; the controlling expression of a selection statement (if or switch); the controlling expression of a while or do statement; each of the expressions of a for statement; the expression in a return statement.
  • The end of a full declarator: declarators.
    At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;.
  • At a function return, after the return value is copied into the calling context. (This sequence point is only specified in the C++ standard. Thou I believe it to be valid for C as well.)
The above given listing can be found in MSDN library at the following url
http://msdn.microsoft.com/en-us/library/azk8zbxd.aspx


The Confusion

I hope you have tried the above mentioned questions. Its time for the explanations.
Both the C and C++ standards are quiet precise in specifying the syntax and semantics of programs. However, neither of them mention the exact meaning of every construct rather they simply describe some of them as undefined or implementation dependent.

The reason is that the C/C++ standard does state that all side effects, after a sequence point, must be over before the next sequence point. However it does NOT mention anything about the order of the occurrence of the side effects. This order is, what is actually implementation dependent.
Consider the expression,

*p++ = *q++;

This expression has 5 operators here (2 increment, 2 de-referencing, 1 assignment) and three side effects.
- increment in the value of p
- increment in the value of q
- the update in the value of *p

The above expression can be evaluated as:
*p = *q;
p++;
q++;
OR
*p = *q;
q++;
p++;
OR
temp1 = p;
p++;
temp2 = q;
q++;
*temp1 = *temp2;

Now consider the questions given above.
i = i++;

here there are two side effects increment & assignment, if their sequences are altered we get two different interpretations:
i = i; i++; OR i++; i = i; here no matter which sequence is followed we get the same value.
But as we said this is implementation dependent so some thing like
temp = i;
i++;
i = temp;
is also possible. The above sequence would have worked perfectly if the left operand was anything but i.

Now consider a[j] = j++;
Here again we have the same problem what shall be the value of the index, old value of j or the incremented value of j.
Should a[j] be updated or a[j+1]? What shall be the updated value, j or j+1? Similar to the above examples we are sure that j will have an incremented value after the execution of above statement.

Although the comma operator was described as a sequence point, the commas that separate the arguments in a function call are not comma operators; they're mere punctuation. Therefore when we consider something like
f(k, ++k, ++k)
we do not know which argument is evaluated first so the above call can be
f(0, 1, 2) considering left to right evaluation
or f(2, 2, 1) in case its done right to left
or may be both increments are done before so we get f(2, 2, 2)
even f(0, 0, 0) is a valid possibility.
In all the above cases the one thing guaranteed is that before the execution of f begins the value of k in the calling function will be 2.


Mistakes out of smartness

Some of us have a habit of using the ++ or -- operators to shrink the size of our code, some even do it to make the code look professional.
Consider the e.g.
i = 2*i++;
This looks similar to out first problem, i = i++; but such an use is quiet uncommon.
However, anyone who wants a value to be doubled and then incremented would have written
i = 2*i; i++;
and a smart programmer combined the two to make i = 2*i++;
but due to its undefined behavior this may be treated as i = 2*(i++);

The standard states: "Between two consecutive sequence points an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored."
This is the reason why the above expression exhibits undefined behavior.


The Solution

The only solution out of this confusion is to avoid dependency on order of side effects. Such dependency leads to less portable code. Whenever there is a need of a particular order to be maintained, one should rely on enclosing sub-expressions in parenthesis or converting sub-expressions into full expressions, using temporary variables and checking that a value is not modified more than once between two consecutive sequence points.

A golden rule will be to trust the standards and not the compiler. Just because your compiler gives results as per your expectations, does not necessarily mean that you will have the same result on every other compiler.

No comments:

Post a Comment