Knowledge has only one definition, "It has no Upper Bound"

Tuesday, April 21, 2009

restrict Qualifier

Recently I came across a new keyword “restrict”.
It is one of the new features in the recently approved C standard C99.
The C89 standards committee added two type qualifiers to C, const and volatile.
The C99 committee added a third type of qualifier named restrict. Individually, and in combination, these type qualifiers determine the assumptions a compiler may make when it accesses an object through an lvalue.
The definition of an lvalue is an object locator, typically an identifier or the dereferencing of a pointer.


Type Qualifiers

Type qualifiers were introduced in the language to provide greater control over optimization. "Controlling optimization, but How?"?"
The answer lies in the fact that optimization techniques are based on the principle of caching. Under certain circumstances the compiler can remember the last value accessed (read or written) from a location, and use this retained value the next time that location is read. If this memory is a machine register, for instance, the code can be smaller and faster using the register rather than accessing external memory.

Thus, similar to the const qualifier, using the restrict qualifier appropriately in C programs may allow the compiler to produce significantly faster executable code. How this optimization is achieved, will be clear with the coming examples.


What's the deal with "restrict"?

Objects referenced through a restrict qualified pointer have a special association with that pointer. All references to that object must directly or indirectly use the value of that pointer. In the absence of the restrict qualifier, other pointers can alias this object. The restrict keyword is "Just a Message" or "An assurance" from the programmer to the compiler. It tells the compiler that only the pointer or a value based on the pointer (such as pointer+1) will be used to access the object it points to.
In other words we can use this qualifier to make a Promise to the compiler that the value referred by a pointer will be accessed(read or written) using that qualified pointer only. AND that this promise will be kept throughout the lifetime of the pointer.


The confusion: const & restrict

Some text defined the restrict as: The restrict keyword enables the compiler to perform certain optimizations based on the assumption that a given object cannot be changed through another pointer.

Now it may seem that, "const already guarantees that." Not exactly, the qualifier const ensures that a variable cannot be changed through some particular pointer. However, it's still possible to change the variable through a different pointer.
So, the difference between const & restrict is that, const guarantees inaccessibility of a particular data from one pointer. However restrict allows the compiler to assume that only one pointer will access that data and that the compiler is free to generate optimal code.


How is this optimization achieved?

If a compiler cannot determine that two different pointers are being used to reference different objects, then it cannot apply optimizations such as maintaining the values of the objects in registers rather than in memory, or reordering loads and stores of these values. The restrict qualifier allows the compiler to determine the existence of such pointers.

We all know that all HLL code is converted to some lower level code prior to execution. For now let us assume our compiler generates something similar to assembly code.

Consider the following example:
1 void addCtoAB(int *ptrA, int *ptrB, int *ptrC)
2 {
3 *ptrA += *ptrC;
4 *ptrB += *ptrC;
6 }

So for the third statement the compiler generates something like-
load R1, *ptrC ; Load the value of *ptrC
load R2, *ptrA ; Load the value of *ptrA
add R2, R1 ; Perform Addition
store R2, *ptrA ; Update the value of *ptrA

Similarly for the fourth statement we have-
load R1, *ptrC ; Load the value of *ptrC
load R2, *ptrB ; Load the value of *ptrB
add R2, R1 ; Perform Addition
store R2, *ptrB ; Update the value of *ptrB

Now you would say, there was no need to load ptrC in R1 the second time. You are right, but the compiler does not know if ptrA, ptrB & ptrC refer to the same location. Thus, an update to *ptrA may have altered the contents of *ptrC as well.

Now consider the example:
1 void addCtoAB(int* restrict ptrA, int* restrict ptrB, int* restrict ptrC)
2 {
3 *ptrA += *ptrC;
4 *ptrB += *ptrC;
6 }

Here the compiler is given assurance of two things
  • First, ptrA, ptrB, ptrC point to different objects
  • Second, as long as the scope of ptrA, ptrB & ptrC exists no other pointer shall update the contents of memory locations pointed by them.
  • Thus the compiler is sure that an update of *ptrA will not alter the value of *ptrC

Therefore once ptrC is loaded to R1 there is no need to reload it, the compiler generates the following code:
load R1, *ptrC ; Load the value of *ptrC
load R2, *ptrA ; Load the value of *ptrA
add R2, R1 ; Perform Addition
store R2, *ptrA ; Update the value of *ptrA
load R2, *ptrB ; Load the value of *ptrB
add R2, R1 ; Perform Addition
store R2, *ptrB ; Update the value of *ptrB

Note that if the above method is called as
addCtoAB(&a, &b, &c); // The code runs fine

However a call such as
addCtoAB(&c, &c, &c); // This may lead to undefined behavior

The only logical explanation to this undefined behavior is that restrict keyword merely gives an assurance to the compiler. It is like a promise that the programmer makes to the compiler, and if this promise is not kept then we may face surprising results. Note that the assembly code is just a sample & not the exact code that any compiler will generate.


Revisit to const & restrict

As mentioned before, restrict is considered to be similar to const, but its not this shall be clear with the coming examples.

The execution model assumed here is:
  • Load primary memory data into cache
  • Perform computations
  • Restore primary memory data from cache
Terms R1, R2, RX, RY are being used to denote caching. Cache entries are associated with the memory addresses so RX & RY are used where we are not sure whether they will refer to same location in the primary memory. So if two pointers p1 & p2 contain the same address then *p1 & *p2 will be mapped to same cache location.

void funny(int *ptrA, int *ptrB)
{
*ptrA += *ptrB;
*ptrA += *ptrB; // a repeated statement
}

The code generated will be something like:
; For the first statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

; For the second statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB ponter
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

RX & RY may be the same cache register if ptrA == ptrB

As you can see in the above example the statement store [R2], RY is not needed, because we never updated *ptrB. Now this extra store statement can be removed by using const qualifier.

void funny(int *ptrA, const int *ptrB)
{
*ptrA += *ptrB;
*ptrA += *ptrB; // a repeated statement
}


; For the first statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB ponter
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA

; For the second statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA


Now you may say that we did not need to perform load RY, [R2] the second time either. The reasoning is similar to the previous example. The compiler does not know whether ptrA & ptrB refer to different locations and thus assumes that an update to *ptrA may also update *ptrB. Now comes restrict to our rescue.

void funny(int * restrict ptrA, int * restrict ptrB)
{
*ptrA += *ptrB;
*ptrA += *ptrB; // a repeated statement
}


; For the first statement
load R1, ptrA ; Load the value of ptrA pointer
load R2, ptrB ; Load the value of ptrB pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

; For the second statement
load R1, ptrA ; Load the value of ptrA pointer
load RX, [R1] ; Load the value of *ptrA
load RY, [R2] ; Load the value of *ptrB
add RX, RY ; Perform Addition
store [R1], RX ; Restore value of *ptrA
store [R2], RY ; Restore value of *ptrB

Here you can again use the const qualifier to remove the extra store statements as well.


Typical Uses of restrict

It may be difficult to understand the complexity of the specification of the restrict qualifier. However, we only need to understand few simple patterns of usage explained below:

  1. A compiler can assume that a file-scope restrict qualified pointer is the sole initial means of access to an object, much as if it were the declared name of an array. This is useful for a dynamically allocated array whose size is not known until run time. Note in the following example how a single block of storage is effectively subdivided into two disjoint objects.

    float * restrict a1, * restrict a2;

    void init(int n)
    {
    float * t = malloc(2 * n * sizeof(float));
    a1 = t; // a1 refers to 1st half
    a2 = t + n; // a2 refers to 2nd half
    }

  2. A compiler can assume that a restrict qualified pointer which is a function parameter(at the beginning of each execution of the function), is the sole means of access to an object. However this assumption expires with the end of each execution. In the following example, parameters a1 and a2 can be assumed to refer to disjoint array objects because both are restrict qualified. This implies that each iteration of the loop is independent of the others, and so the loop can be aggressively optimized.

    void f1(int n, float * restrict a1, const float * restrict a2)
    {
    int i;
    for ( i = 0; i <>
    a1[i] += a2[i];
    }

    However a call such as f1(10, &a[0], &a[0]) may lead to undefined behavior because the compiler assumes a1 & a2 to be non-overlapping and thus produces an optimized code, by altering caching/load/store statements.

  3. A compiler can assume that a restrict qualified pointer declared with block scope the sole initial means of access to an object during each execution of the block. An invocation of the macro shown in the following example is equivalent to an inline version of a call to the function f1 above.

    # define f2(N,A1,A2) \
    { int n = (N); \
    float * restrict a1 = (A1); \
    float * restrict a2 = (A2); \
    int i; \
    for ( i = 0; i <>
    a1[i] += a2[i]; \
    }

  4. The restrict qualifier can be used in the declaration of a structure member. A compiler can assume that the member provides the sole initial means of access to an object of the type specified in the member declaration, whenever an identifier is declared that provides a means of access to an object of that structure type. The duration of the assumption depends on the scope of the identifier & not on the scope of the declaration of the structure. Thus in the example below a compiler can assume that s1.a1 and s1.a2 are used to refer to disjoint objects for the duration of the whole program, but that s2.a1 and s2.a2 are used to refer to disjoint objects only for the duration of each invocation of the f3 function.

    struct t
    {

    int n;
    float * restrict a1, * restrict a2;
    };

    struct t s1;

    void f3(struct t s2) { /* ... */ }


C++ & restrict


C++ doesn't support restrict yet. However, since many C++ compilers are also C compilers, it's likely that this feature will be added to most C++ compilers too.


A final summary
  • Type qualifiers such as const, volatile & restrict are actually intended to produce optimized code. So, we should stop thinking that const is used only to declare global constants rather consider its usage at a deeper level.
  • Caching the value in an object designated through a restrict qualified pointer is safe at the beginning of the block in which the pointer is declared, because no pre-existing aliases may now be used to reference that object.
  • The cached valued can be used again & again without re-loading it whenever such a pointer is used in any statement.
  • The cached value must be restored to the object by the end of the block, where pre-existing aliases again become available. This restore may be avoided if the pointer was qualified using const as well.
  • New aliases may be formed within the block, but these must all depend on the value of the restrict-qualified pointer, so that they can be identified and adjusted to refer to the cached value.
  • For a restrict qualified pointer at file scope, the block is the body of each function in the file.

2 comments: