Knowledge has only one definition, "It has no Upper Bound"

Friday, July 31, 2009

Modulo using Bitwise Operators

I was asked a question today how can we check whether a number N is divisible by 7 using bitwise operators. Then I was asked to do the same for 3.

Initial Approach (wrong approach):
To observe bit patterns
for e.g. 3 = 00000011, 6 = 00000110, 9 = 00001001 and so on...

I thought that all the binary 1's will conform to either 11 or 1001. BUT this was wrong, it failed for 81.

The Final Approach (courtesy google):
After a bit of googling I found an approach which has to do with modulo arithmetic in Numbers Theory.
(3)10 = (11)2 Note: I will use this notation whenever needed to avoid confusion

The algorithm:
1 - Take the Binary of the number
2 - Divide the entire sequence of bits into pairs of bits
3 - Take the sum of all these pairs
4 - If the sum is greater than three then repeat Steps 1, 2, 3 for this sum
5 - If sum is 0 or 3 then number is divisible by three else not. Moreover this sum will have the remainder.

For e.g. let us take (assume 8 bits for simplicity)

First Iteration:
(40)10 = (00101000)2 = 00 10 10 00 ==> Sum = (4)10
(41)10 = (00101001)2 = 00 10 10 01 ==> Sum = (5)10
(42)10 = (00101010)2 = 00 10 10 10 ==> Sum = (6)10

Second Iteration:
(4)10 = (00000100)2 = 00 00 01 00 ==> Sum = (1)10 ==> 40%3 = 1
(5)10 = (00000101)2 = 00 00 01 01 ==> Sum = (2)10 ==> 41%3 = 2
(6)10 = (00000110)2 = 00 00 01 10 ==> Sum = (3)10 ==> 42%3 = 0

Thus 42 is divisible
Similar approach is for checking divisibility with 7. Only this time instead this time taking pairs of consecutive bits we take triplets because (7)10 = (111)2

Consider a sample code for N % 7 using the above approach.

unsigned int calcRemainder(unsigned int n)
{
unsigned int sum = 0;
while(1)
{
sum = 0; // Correction done as mentioned by Jitendra
while(n)
{
sum += n&7;
n = n >> 3; // seven is 3 1s (ones) so shift by three
}
if(sum>7)
n = sum;
else if(sum==7)
return 0;
else
return sum;
}
}
The code looks similar to what we do for checking divisibility with 3 or 9 in decimal number system. We take the sum of all the digits recursively till we get a single either divisible or not divisible by 3 or 9.
The only limitation that I could observe was that this technique works for numbers like which of the form 2^n - 1 such as 3, 7, 15. (I havent tried it for other numbers anyways :P )


However I wasn't able to deduce this mathematically (I'm not a maths guy). So If any of you Mathos can give me mathematical proof and may be extend this algorithm for other numbers as well, kindly add them as a comment, or mail mel. I will surely update the blog.


Sunday, April 26, 2009

Sequence Points, Side Effects & The Confusion

Before proceeding further try answering these easy questions

int i=0, j=0, k=0;
i = i++;
a[j] = j++;

f(k, ++k, ++k);

1) What is the value of i, j, k?
2) What is the value of a[0], a[1] assuming entire array was initialized to zero?
3) If f is defined as
void f(int a, int b, int c){cout<<a<<b<<c;}
then what is the output?



Side Effects

A general definition: A side effect is an action that changes the state of the program's execution environment. Side effects include modifying an object, modifying a file, accessing a volatile object, or calling a function that does any of these operations.

All expressions have a type associated with them, they also yield a value (except for the void type expressions) All expressions have side effects even a void one.
For e.g. n++ is an expression whose value is n and has the side effect of incrementing n.

Let's look at a few more examples of side effects.

We can use the increment or decrement expression (based on unary ++ or --) as a full expression. A full expression is one that is not a sub-expression of some larger expression.
For e.g.
++i;
itself is a full expression.
So is the increment expression in:
for (i = 0; i!=10; i++)

In both the cases we are interested only in the side effect i.e. the incremented value. However, when we use an increment expression as a sub-expression of a larger expression, as in:
x = *p++;
then we are interested in the resulting value as well as the side effect.
No unary operator other than ++ and -- has a side effect.

An assignment expression such as:
m = n;
has an obvious side effect, that is it stores the value of n (converted to the type of m) into m. Less obvious and often ignored fact is that the assignment also returns a value, which is the value of m after the assignment. (Note that it does not return an lvalue. A statement such as a = b = c works only because the association is right to left.)
Arithmetic assignment operators, such as += and -= produce both values and side effects, but other binary operators such as + and - produce only a value. They have no side effects.

Finally, any expression that calls a standard library i/o function has the side effect of modifying a file. For example:
printf("i = %d; j = %d\n", i, j);
yields an int result, which we often ignore. Its side effect is the modification of the standard output file, this is what we are generally interested in.


Sequence Points

A sequence point is a point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations have completed.

The C/C++ standard defines the following sequence points:

  • The call to a function, after the arguments have been evaluated or before a function is entered in a function call. In the expression f(i++), f is called with the original value of i, however i itself is incremented before control enters the body of f.
  • The end of the first operand of the following operators: logical AND (&&); logical OR (||); conditional (?); comma (,).
    For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q.
    Similarly, in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after the first *p++, meaning it has already been incremented by the time the second instance is executed.
  • The end of a full expression: an initializer; the expression in an expression statement; the controlling expression of a selection statement (if or switch); the controlling expression of a while or do statement; each of the expressions of a for statement; the expression in a return statement.
  • The end of a full declarator: declarators.
    At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;.
  • At a function return, after the return value is copied into the calling context. (This sequence point is only specified in the C++ standard. Thou I believe it to be valid for C as well.)
The above given listing can be found in MSDN library at the following url
http://msdn.microsoft.com/en-us/library/azk8zbxd.aspx


The Confusion

I hope you have tried the above mentioned questions. Its time for the explanations.
Both the C and C++ standards are quiet precise in specifying the syntax and semantics of programs. However, neither of them mention the exact meaning of every construct rather they simply describe some of them as undefined or implementation dependent.

The reason is that the C/C++ standard does state that all side effects, after a sequence point, must be over before the next sequence point. However it does NOT mention anything about the order of the occurrence of the side effects. This order is, what is actually implementation dependent.
Consider the expression,

*p++ = *q++;

This expression has 5 operators here (2 increment, 2 de-referencing, 1 assignment) and three side effects.
- increment in the value of p
- increment in the value of q
- the update in the value of *p

The above expression can be evaluated as:
*p = *q;
p++;
q++;
OR
*p = *q;
q++;
p++;
OR
temp1 = p;
p++;
temp2 = q;
q++;
*temp1 = *temp2;

Now consider the questions given above.
i = i++;

here there are two side effects increment & assignment, if their sequences are altered we get two different interpretations:
i = i; i++; OR i++; i = i; here no matter which sequence is followed we get the same value.
But as we said this is implementation dependent so some thing like
temp = i;
i++;
i = temp;
is also possible. The above sequence would have worked perfectly if the left operand was anything but i.

Now consider a[j] = j++;
Here again we have the same problem what shall be the value of the index, old value of j or the incremented value of j.
Should a[j] be updated or a[j+1]? What shall be the updated value, j or j+1? Similar to the above examples we are sure that j will have an incremented value after the execution of above statement.

Although the comma operator was described as a sequence point, the commas that separate the arguments in a function call are not comma operators; they're mere punctuation. Therefore when we consider something like
f(k, ++k, ++k)
we do not know which argument is evaluated first so the above call can be
f(0, 1, 2) considering left to right evaluation
or f(2, 2, 1) in case its done right to left
or may be both increments are done before so we get f(2, 2, 2)
even f(0, 0, 0) is a valid possibility.
In all the above cases the one thing guaranteed is that before the execution of f begins the value of k in the calling function will be 2.


Mistakes out of smartness

Some of us have a habit of using the ++ or -- operators to shrink the size of our code, some even do it to make the code look professional.
Consider the e.g.
i = 2*i++;
This looks similar to out first problem, i = i++; but such an use is quiet uncommon.
However, anyone who wants a value to be doubled and then incremented would have written
i = 2*i; i++;
and a smart programmer combined the two to make i = 2*i++;
but due to its undefined behavior this may be treated as i = 2*(i++);

The standard states: "Between two consecutive sequence points an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored."
This is the reason why the above expression exhibits undefined behavior.


The Solution

The only solution out of this confusion is to avoid dependency on order of side effects. Such dependency leads to less portable code. Whenever there is a need of a particular order to be maintained, one should rely on enclosing sub-expressions in parenthesis or converting sub-expressions into full expressions, using temporary variables and checking that a value is not modified more than once between two consecutive sequence points.

A golden rule will be to trust the standards and not the compiler. Just because your compiler gives results as per your expectations, does not necessarily mean that you will have the same result on every other compiler.

Tuesday, April 21, 2009

restrict Qualifier

Recently I came across a new keyword “restrict”.
It is one of the new features in the recently approved C standard C99.
The C89 standards committee added two type qualifiers to C, const and volatile.
The C99 committee added a third type of qualifier named restrict. Individually, and in combination, these type qualifiers determine the assumptions a compiler may make when it accesses an object through an lvalue.
The definition of an lvalue is an object locator, typically an identifier or the dereferencing of a pointer.


Type Qualifiers

Type qualifiers were introduced in the language to provide greater control over optimization. "Controlling optimization, but How?"?"
The answer lies in the fact that optimization techniques are based on the principle of caching. Under certain circumstances the compiler can remember the last value accessed (read or written) from a location, and use this retained value the next time that location is read. If this memory is a machine register, for instance, the code can be smaller and faster using the register rather than accessing external memory.

Thus, similar to the const qualifier, using the restrict qualifier appropriately in C programs may allow the compiler to produce significantly faster executable code. How this optimization is achieved, will be clear with the coming examples.


What's the deal with "restrict"?

Objects referenced through a restrict qualified pointer have a special association with that pointer. All references to that object must directly or indirectly use the value of that pointer. In the absence of the restrict qualifier, other pointers can alias this object. The restrict keyword is "Just a Message" or "An assurance" from the programmer to the compiler. It tells the compiler that only the pointer or a value based on the pointer (such as pointer+1) will be used to access the object it points to.
In other words we can use this qualifier to make a Promise to the compiler that the value referred by a pointer will be accessed(read or written) using that qualified pointer only. AND that this promise will be kept throughout the lifetime of the pointer.


The confusion: const & restrict

Some text defined the restrict as: The restrict keyword enables the compiler to perform certain optimizations based on the assumption that a given object cannot be changed through another pointer.

Now it may seem that, "const already guarantees that." Not exactly, the qualifier const ensures that a variable cannot be changed through some particular pointer. However, it's still possible to change the variable through a different pointer.
So, the difference between const & restrict is that, const guarantees inaccessibility of a particular data from one pointer. However restrict allows the compiler to assume that only one pointer will access that data and that the compiler is free to generate optimal code.


How is this optimization achieved?

If a compiler cannot determine that two different pointers are being used to reference different objects, then it cannot apply optimizations such as maintaining the values of the objects in registers rather than in memory, or reordering loads and stores of these values. The restrict qualifier allows the compiler to determine the existence of such pointers.

We all know that all HLL code is converted to some lower level code prior to execution. For now let us assume our compiler generates something similar to assembly code.

Consider the following example:
1 void addCtoAB(int *ptrA, int *ptrB, int *ptrC)
2 {
3 *ptrA += *ptrC;
4 *ptrB += *ptrC;
6 }

So for the third statement the compiler generates something like-
load R1, *ptrC ; Load the value of *ptrC
load R2, *ptrA ; Load the value of *ptrA
add R2, R1 ; Perform Addition
store R2, *ptrA ; Update the value of *ptrA

Similarly for the fourth statement we have-
load R1, *ptrC ; Load the value of *ptrC
load R2, *ptrB ; Load the value of *ptrB
add R2, R1 ; Perform Addition
store R2, *ptrB ; Update the value of *ptrB

Now you would say, there was no need to load ptrC in R1 the second time. You are right, but the compiler does not know if ptrA, ptrB & ptrC refer to the same location. Thus, an update to *ptrA may have altered the contents of *ptrC as well.

Now consider the example:
1 void addCtoAB(int* restrict ptrA, int* restrict ptrB, int* restrict ptrC)
2 {
3 *ptrA += *ptrC;
4 *ptrB += *ptrC;
6 }

Here the compiler is given assurance of two things
  • First, ptrA, ptrB, ptrC point to different objects
  • Second, as long as the scope of ptrA, ptrB & ptrC exists no other pointer shall update the contents of memory locations pointed by them.
  • Thus the compiler is sure that an update of *ptrA will not alter the value of *ptrC

Therefore once ptrC is loaded to R1 there is no need to reload it, the compiler generates the following code:
load R1, *ptrC ; Load the value of *ptrC
load R2, *ptrA ; Load the value of *ptrA
add R2, R1 ; Perform Addition
store R2, *ptrA ; Update the value of *ptrA
load R2, *ptrB ; Load the value of *ptrB
add R2, R1 ; Perform Addition
store R2, *ptrB ; Update the value of *ptrB

Note that if the above method is called as
addCtoAB(&a, &b, &c); // The code runs fine

However a call such as
addCtoAB(&c, &c, &c); // This may lead to undefined behavior

The only logical explanation to this undefined behavior is that restrict keyword merely gives an assurance to the compiler. It is like a promise that the programmer makes to the compiler, and if this promise is not kept then we may face surprising results. Note that the assembly code is just a sample & not the exact code that any compiler will generate.


Revisit to const & restrict

As mentioned before, restrict is considered to be similar to const, but its not this shall be clear with the coming examples.

The execution model assumed here is:
  • Load primary memory data into cache
  • Perform computations
  • Restore primary memory data from cache
Terms R1, R2, RX, RY are being used to denote caching. Cache entries are associated with the memory addresses so RX & RY are used where we are not sure whether they will refer to same location in the primary memory. So if two pointers p1 & p2 contain the same address then *p1 & *p2 will be mapped to same cache location.

void funny(int *ptrA, int *ptrB)
{
*ptrA += *ptrB;
*ptrA += *ptrB; // a repeated statement
}

The code generated will be something like:
; For the first statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

; For the second statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB ponter
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

RX & RY may be the same cache register if ptrA == ptrB

As you can see in the above example the statement store [R2], RY is not needed, because we never updated *ptrB. Now this extra store statement can be removed by using const qualifier.

void funny(int *ptrA, const int *ptrB)
{
*ptrA += *ptrB;
*ptrA += *ptrB; // a repeated statement
}


; For the first statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB ponter
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA

; For the second statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA


Now you may say that we did not need to perform load RY, [R2] the second time either. The reasoning is similar to the previous example. The compiler does not know whether ptrA & ptrB refer to different locations and thus assumes that an update to *ptrA may also update *ptrB. Now comes restrict to our rescue.

void funny(int * restrict ptrA, int * restrict ptrB)
{
*ptrA += *ptrB;
*ptrA += *ptrB; // a repeated statement
}


; For the first statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

; For the second statement
load R1, ptrA ; Load the value of ptrA pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

Here you can again use the const qualifier to remove the extra store statements as well.


Typical Uses of restrict

It may be difficult to understand the complexity of the specification of the restrict qualifier. However, we only need to understand few simple patterns of usage explained below:

  1. A compiler can assume that a file-scope restrict qualified pointer is the sole initial means of access to an object, much as if it were the declared name of an array. This is useful for a dynamically allocated array whose size is not known until run time. Note in the following example how a single block of storage is effectively subdivided into two disjoint objects.

    float * restrict a1, * restrict a2;

    void init(int n)
    {
    float * t = malloc(2 * n * sizeof(float));
    a1 = t; // a1 refers to 1st half
    a2 = t + n; // a2 refers to 2nd half
    }

  2. A compiler can assume that a restrict qualified pointer which is a function parameter(at the beginning of each execution of the function), is the sole means of access to an object. However this assumption expires with the end of each execution. In the following example, parameters a1 and a2 can be assumed to refer to disjoint array objects because both are restrict qualified. This implies that each iteration of the loop is independent of the others, and so the loop can be aggressively optimized.

    void f1(int n, float * restrict a1, const float * restrict a2)
    {
    int i;
    for ( i = 0; i <>
    a1[i] += a2[i];
    }

    However a call such as f1(10, &a[0], &a[0]) may lead to undefined behavior because the compiler assumes a1 & a2 to be non-overlapping and thus produces an optimized code, by altering caching/load/store statements.

  3. A compiler can assume that a restrict qualified pointer declared with block scope the sole initial means of access to an object during each execution of the block. An invocation of the macro shown in the following example is equivalent to an inline version of a call to the function f1 above.

    # define f2(N,A1,A2) \
    { int n = (N); \
    float * restrict a1 = (A1); \
    float * restrict a2 = (A2); \
    int i; \
    for ( i = 0; i <>
    a1[i] += a2[i]; \
    }

  4. The restrict qualifier can be used in the declaration of a structure member. A compiler can assume that the member provides the sole initial means of access to an object of the type specified in the member declaration, whenever an identifier is declared that provides a means of access to an object of that structure type. The duration of the assumption depends on the scope of the identifier & not on the scope of the declaration of the structure. Thus in the example below a compiler can assume that s1.a1 and s1.a2 are used to refer to disjoint objects for the duration of the whole program, but that s2.a1 and s2.a2 are used to refer to disjoint objects only for the duration of each invocation of the f3 function.

    struct t
    {

    int n;
    float * restrict a1, * restrict a2;
    };

    struct t s1;

    void f3(struct t s2) { /* ... */ }


C++ & restrict


C++ doesn't support restrict yet. However, since many C++ compilers are also C compilers, it's likely that this feature will be added to most C++ compilers too.


A final summary
  • Type qualifiers such as const, volatile & restrict are actually intended to produce optimized code. So, we should stop thinking that const is used only to declare global constants rather consider its usage at a deeper level.
  • Caching the value in an object designated through a restrict qualified pointer is safe at the beginning of the block in which the pointer is declared, because no pre-existing aliases may now be used to reference that object.
  • The cached valued can be used again & again without re-loading it whenever such a pointer is used in any statement.
  • The cached value must be restored to the object by the end of the block, where pre-existing aliases again become available. This restore may be avoided if the pointer was qualified using const as well.
  • New aliases may be formed within the block, but these must all depend on the value of the restrict-qualified pointer, so that they can be identified and adjusted to refer to the cached value.
  • For a restrict qualified pointer at file scope, the block is the body of each function in the file.