09.md

Warm-up

Working with dereference operator

./a.out dragon firefly jupiter
  • Using the pointer arithmetics and the dereference operator * only, given the exact command line above, print 'g' from the first argument, 'r' from the 2nd, and 't' from the last one.
  • That is, you must not use brackets [] at all.
  • You can assume the command will have the arguments above, i.e. it's OK to crash if you run it without any arguments, for example.
  • It is still good to verify you have at least 3 arguments.
  • Remember program arguments

Operator precedence

There are 15 levels of operator precedence, see the table at http://en.cppreference.com/w/c/language/operator_precedence

Associativity

If there are multiple operators with the same precedence in an expression, the evaluation is decided based on their associativity.

For example:

8 / 2 % 3

has 2 operators with precedence level of 3 that have left-to-right associativity. Therefore, they will be evaluated as

(8 / 2) % 3

Examples

*p++ is *(p++) as suffix ++ is of higher priority than *. However, the value of the expression is still *p though as the dereference operator is applied on p since that is the value of p++.

Note that suffix ++ and -- has higher priority than prefix ++ and --, and also has different associativity.

🔧 Task: determine the outcome of these expressions/declarations:

  • *p++
  • ++*p
  • int *p[2]
  • int (*p)[3]

👀 pointer-to-array-type-difference.c shows why the array size is significant in the last example.

Operand evaluation order

Consider the following:

int foo(void);
int bar(void);

int x = foo() + bar();

Note that the order of evaluation of subexpressions is an unspecified behavior (see C99 6.5, paragraph 3). So, foo() can be called before or after bar(). If we add another function:

int foo(void);
int bar(void);
int another(void);

int x = foo() + bar() + another();

then the expression will become (foo() + bar()) + another() however the order in which foo() and bar() will be called is still not specified. Remember - unspecified means the implementation (= compiler) is free to choose which argument will be processes first and generally should not document the decision. More on that in types of behavior.

Common gotchas

== or != versus =

the condition in the statement:

if ((c = getchar()) != 0)
    ...

needs to be bracketed this way because == and != is of higher precedence than =.

& or * versus -> or .

-> and . (structure member access) have higher precedence than & (address of) and * (dereference). So, *somestruct->p_member dereferences the pointer p_member.

🔧 Operator precedence and structures

Consider the following structures:

struct bar {
	int val;
} bar;

struct {
	int a[42];
	char *b;
	struct bar *c;
} foo;

Write code to initialize the members of foo with 1, 2, 3, "BBB" and address of bar, respectively and val in bar with 42. Use designated initializers.

Write expressions to get:

  • The address of a
  • The address of b
  • The address of the second element of a
  • The address of the 3rd character from string b
  • The 3rd character from string b
  • Value of val in bar using foo
  • Address of val in bar using foo

Use as few brackets as possible.

🔑 struct-op-precedence.c

Pointer addressing

We already know that as with arrays, it is possible to use the array subscript operator [] on pointers, too:

int a[10];
int *p = a;

p[0] = ...

That goes directly from the standard:

p[x] is equivalent to *(p + x).

Remember that incrementing a pointer or an array expression goes by increments of the element type size.

int a[2] = { 1, 222 };

/* Will print 222. */
printf("%d\n", *(a + 1));

Array of pointers

Remember what we know about program arguments.

/*
 * [] is of higher precedence than *, so the following means:
 *
 *	 "a is an array of 10 pointers to char"
 */
char *a[10];

You could write it as char *(a[10]) but that would be quite uncommon and is not recommended.

You can initialize such an array the same way as before. Remember, a string literal is internally an array of chars, so we can use it to initialize the following array of char pointers (it is the same as if we did char *p = a).

char *a[] = { "hello", "world", "!" };

Modifying such strings is an undefined operation as they are read-only by the specification. Code doing so and compiled by GCC or Clang will crash.

👀 array-of-ptrs.c

Pointer to a pointer

We know that a pointer points to an object (unless its a null pointer or of type void), e.g. an int. However, the object may be another address, i.e. a pointer may point to another pointer.

int i = 13;
int *pi = &i;
int **ppi = π        // ppi is a pointer to a pointer to an int

// *pi is a pointer to an int, so another dereference is needed to get the value
printf("%d\n", **ppi);   // prints 13

You could write *(*ppi) but the associativity here is obvious and using parentheses would be very unusual here.

A pointer to a pointer to an array vs. a pointer to an array of pointers

Imagine this: let's assume we have an array of pointers to an int, as follows:

int *p[10];     // note it is equivalent to "int *(p[10])"

Now, let's have a function that takes an array of int pointers and prints the value pointed to by array's n-th element. We know that when passing an array as an argument, C converts it to a pointer. So, int *p[] can be passed as an argument to an int **p parameter.

void
foo(int **p, size_t idx)
{
        printf("%d\n", **(p + idx));
}

We can legally pass in any pointer to a pointer to an int. Clearly, that works fine in a situation when the object being passed in is indeed a pointer to an array of int pointers. Now what if someone passes in a pointer to a pointer to an int array?

Note that in both situations, ppa and ppb properly fit the declaration int **p. The difference is in semantics - is the actual array in the first or the second indirection? And that is a piece of information that must be provided upfront by whoever wrote the function accepting such an argument.

    +-------+     +-------+     +-------+
 i  |   42  |   j |   7   |   k |   99  |
    +-------+     +-------+     +-------+
        ^           /      _______/
        |         /       /
    +-------+-------+-------+           +------+------+------+
 a  |   &i  |   &j  |   &k  |         b |  42  |   7  |  99  |
    +-------+-------+-------+           +------+------+------+
        ^                                   ^
        |                                   |
    +-------+                           +------+.......
ppa |   a   |                        pb |   b  |   ?  :
    +-------+                           +------+.......
                                           ^
                                           |
                                        +------+
                                    ppb |  pb  |
                                        +------+
  • What is the difference in accessing the values? I.e. what will **(x + 1) and *(*x + 1) do when x is ppa or ppb?
  • Semantically, we could also pass in a pointer to an array of pointers to arrays of ints, which would have arrays on both levels in the above ASCII chart, and would still match the int **p declaration.
  • The following source code closely follows the ASCII art schema above. Please study the code and make sure you understand the output completely. Which address from the program output corresponds to the value of ? above?

👀 ptr-ptr-array.c

Example output:

$ ./a.out
 &i = 0x7ffee68b1880 (42)
 &j = 0x7ffee68b187c (7)
 &k = 0x7ffee68b1878 (99)
  a = 0x7ffee68b1890
  b = 0x7ffee68b1884
 pb = 0x7ffee68b1884 [*pb = 42]
ppb = 0x7ffee68b1868 [*ppb = 0x7ffee68b1884]
first:
0x7ffee68b1880 = 42
0x7ffee68b1884 = 42
second:
0x7ffee68b187c -> 7 (in hex 0x7)
0x7ffee68b1890 -> -427091840 (in hex 0xe68b1880)

The situation above is taken from a real-life problem. Quoting from https://unixpapa.com/incnote/pam.html:

When the conversation function is called, it is passed an array of prompts. This is always passed in as struct pam_message **mesg. However, the interpretation varies. In Linux-PAM and OpenPAM this is a pointer to an array of pointers to pam_message structures, whereas in Solaris it is a pointer to a pointer to an array of pam_message structures. Frequently there is only one prompt being passed in, so it doesn't matter. Under either interpretation, the first structure is addressable as **msg. However, accessing subsequent elements is different. In Linux-PAM and OpenPAM, the second element is at *(msg[2]), while in Solaris it is at (*msg)[2].

The situation when one prompt is passed only is similar to what happens when the function first() in

👀 ptr-ptr-array.c is called for ppa and ppb: we see 42 in both cases (element [0][0]). The problem manifests itself when we have and need to reference more than one number (or more than one, as in the PAM situation above, msg structures).

Changing pointers

  • In order to change the value of a pointer in a function, it has to be passed as a pointer to a pointer.
    • This makes sense, because the storage address of a plain pointer is not known to the function, and therefore we must pass in its address so that we can change the object the address points to by dereferencing the address.

👀 ptr-change.c

Explicit type conversion

We already saw an implicit conversion when working with integer types, see the module on conversions which also mentions the 6.3 spec section with all the details.

Casting is an explicit type conversion:

(type_name)expression;

int i = 13;
long l = (long)i;		// Just an example.  The casting not needed in this
				// case, the implicit integer conversion would
				// work fine.

Casting is used to avoid compiler warnings as a sort of a hint to the compiler that you know what you are doing.

👀 cast-double.c

Explicit cast for pointers of different types works:

(C99, 6.3.2.3, 7)

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

This will be especially handy for pointers to structures. See structure casting for more information.

You can always cast explicitly but it depends on what will be done with the result. For example, on some architectures, you cannot access improperly unaligned objects. The following example is from a SPARC machine.

$ isainfo
sparcv9 sparc

$ cat main.c
#include <stdio.h>

int
main(void)
{
        char *s = "hello, world, hello, world";
        long long int *p = (long long int *)(s + 1);

        printf("%lld\n", *p);
}

$ gcc main.c
$ ./a.out
Bus Error (core dumped)

👀 bus-error.c

Now see the following code and figure out what is going to happen.

🔧 👀 ptr-cast.c

Note that void * is a special pointer that cannot be dereferenced. You can always assign any pointer to a (void *) pointer without any need for casting, and you can also assign any (void *) pointer to any other pointer without any need for casting. These assignments are guaranteed to not lose any information.

int i = 99;
void *p = &i;
int *pi = p;

printf("%d\n", *pi);	// will print 99
printf("%d\n", *p);	// dereferencing (void *), will error out when compiled

🔧 Verify that a staticaly allocated 2D array is stored in one piece of memory, row by row.

Hint: you need to recast a 2D array to a 1D array, then print it as a single row. There are a few different ways to do it. Do not look at the solution until you write your own code.

int a[5][5] = { ... };
// ...

🔑 2d-static-array-as-1d.c

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Work with arrays and pointers

  • Define two integers i, j and initialize them with some arbitrary values.
  • Define an array a of pointers to int that contains pointers to these two integers.
  • Define an array b of ints that will contain the values of these two integers.
  • Define a pointer p, and assign i-th element of array a to it.
  • Define a pointer q, and assign an address of the i-th element of array b to it.
  • Print the address of p, q and then deference them.

What you are expected to get when dereferencing the p and q pointers?

What is the distance between these 2 pointers? Does it make sense to do that? A difference between two pointers is a signed integer of type ptrdiff_t. To print it out via printf, use a t modifier.

int a[2];
int *p = &a[1];
int *q = &a[0];

printf("%td\n", p - q);

Usage: ./a.out <i>

What is a reasonable value of the 1st argument (i) of the program ?

👀 ptr-to-array-of-ptrs.c