09.md

Pointer addressing

Like with arrays, it is possible to subscript pointers:

int a[10];
int *p = a;

p[0]...

This just goes directly from the standard: p[x] is equivalent to *(p + x). And remember that incrementing a pointer or an array expression goes by increments of the element type size.

int a[2] = { 1, 222 };

/* Will print 222. */
printf("%d\n", *(a + 1));

Array of pointers

See also what we already know about program arguments.

/*
 * [] is of higher precedence than *, so the following means:
 *
 *	 "a is an array of 10 pointers to char"
 */
char *a[10];

You could write it as char *(a[10]) but that would be quite uncommon and is not recommended.

You can initialize such an array the same way as before. Remember, a string literal is internally an array of chars, so we can use it to initialize the following array of char pointers (it is the same as char *p = a).

char *a[] = { "hello", "world", "!" };

👀 array-of-ptrs.c

10 minute warm-up

./a.out dragon firefly jupiter
  • Using the pointer arithmetics and the dereference operator * only , given the exact command line above, print 'g' from the first argument, 'r' from the 2nd, and 't' from the last one.
  • That is, you must not use brackets [] at all.
  • You can assume the command will have the arguments above, i.e. it's OK to crash if you run it without any arguments, for example.
  • It is still good to verify you have at least 3 arguments.
  • Remember program arguments

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Work with arrays and pointers

  • Define two integers i, j and initialize them with some arbitrary values.
  • Define an array a of pointers to int that contains pointers to these two integers.
  • Define an array b of ints that will contain the values of these two integers.
  • Define a pointer p, and assign i-th element of array a to it.
  • Define a pointer q, and assign an address of the i-th element of array b to it.
  • Print the address of p, q and then deference them.

What you are expected to get when dereferencing the p and q pointers?

What is the distance between these 2 pointers? Does it make sense to do that? A difference between two pointers is a signed integer of type ptrdiff_t. To print it out via printf, use a t modifier.

int a[2];
int *p = &a[1];
int *q = &a[0];

printf("%td\n", p - q);

Usage: ./a.out <i>

What is a reasonable value of the 1st argument (i) of the program ?

👀 ptr-to-array-of-ptrs.c

Pointer to a pointer

We know that a pointer points to an object (unless its a NULL pointer or of type void), eg. an int. However, the object may be another address, i.e. a pointer may point to another pointer.

int i = 13;
int *pi = &i;
int **ppi = &pi;        // ppi is a pointer to a pointer to an int

// *pi is a pointer to an int, so another dereference is needed to get the value
printf("%d\n", **ppi);   // prints 13

A pointer to a pointer to an array vs. a pointer to an array of pointers

Imagine this: let's assume we have an array of pointers to an integer, like this:

int *p[10];     // note it is equivalent to "int *(p[10])"

Now, let's have a function that takes an array of int pointers and prints the value pointed to by array's n-th element. We know that when passing an array as an argument, C converts it to a pointer. So, int *p[] can be written as int **p.

void
foo(int **p, size_t idx)
{
        printf("%d\n", **(p + idx));
}

Clearly, this works fine in a situation when pp is indeed a pointer to an array of integer pointers (left). Now what if someone passes a pointer to a pointer to an integer array (right)?

Note that in both situations, ppa and ppb properly fit the declaration int **p. That is, both arrays in the chart below are 2-dimensional and both have one of the dimensions set as 1. The difference is which of the two dimensions is 1. And that is a piece of information that must be provided upfront.

    +-------+     +-------+     +-------+
 i  |   42  |   j |   7   |   k |   99  |
    +-------+     +-------+     +-------+
        ^           /      _______/
        |         /       /
    +-------+-------+-------+           +------+------+------+
 a  |   &i  |   &j  |   &k  |         b |  42  |   7  |  99  |
    +-------+-------+-------+           +------+------+------+
        ^                                   ^
        |                                   |
    +-------+                           +------+.......
ppa |   a   |                        pb |   b  |   ?  :
    +-------+                           +------+.......
                                           ^
                                           |
                                        +------+
                                    ppb |  pb  |
                                        +------+
  • What is the difference in accessing the values? I.e. what will **(x + 1) and *(*x + 1) do when x is ppa or ppb?
  • The following source code closely follows the ASCII art schema above. Please study the code and make sure you understand the output completely. Which address from the program output corresponds to the value of ? above?

👀 ptr-ptr-array.c

Example output:

$ ./a.out
 &i = 0x7ffee68b1880 (42)
 &j = 0x7ffee68b187c (7)
 &k = 0x7ffee68b1878 (99)
  a = 0x7ffee68b1890
  b = 0x7ffee68b1884
 pb = 0x7ffee68b1884 [*pb = 42]
ppb = 0x7ffee68b1868 [*ppb = 0x7ffee68b1884]
first:
0x7ffee68b1880 = 42
0x7ffee68b1884 = 42
second:
0x7ffee68b187c -> 7 (in hex 0x7)
0x7ffee68b1890 -> -427091840 (in hex 0xe68b1880)

The situation above is taken from a real-life problem. Quoting from https://unixpapa.com/incnote/pam.html:

When the conversation function is called, it is passed an array of prompts. This is always passed in as struct pam_message **mesg. However, the interpretation varies. In Linux-PAM and OpenPAM this is a pointer to an array of pointers to pam_message structures, whereas in Solaris it is a pointer to a pointer to an array of pam_message structures. Frequently there is only one prompt being passed in, so it doesn't matter. Under either interpretation, the first structure is addressable as **msg. However, accessing subsequent elements is different. In Linux-PAM and OpenPAM, the second element is at *(msg[2]), while in Solaris it is at (*msg)[2].

The situation when one prompt is passed only is similar to what happens when the function first() in 👀 ptr-ptr-array.c is called for ppa and ppb: we see 42 in both cases (element [0][0]). The problem manifests itself when we have and need to reference more than one number (or more than one, as in the PAM situation above, msg structures).

Changing pointers

  • in order to change the value of a pointer in a function, it has to be passed as a pointer to a pointer
    • this makes sense, because the storage address of a plain pointer is not known to the function, and therefore we must pass in its address so that we can change the object the address points to by dereferencing the address.

👀 ptr-change.c

Explicit type conversion

We already saw an implicit conversion when working with integer types.

Casting is an explicit type conversion:

(type_name)expression;

int i = 13;
long l = (long)i;		// Just an example.  The casting not needed in this
				// case, the implicit integer conversion would
				// work fine.

Casting is used to avoid compiler warnings as a sort of a hint to the compiler that you know what you are doing.

👀 cast-double.c

Explicit cast for pointers of different types works:

(C99, 6.3.2.3, 7)

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

This will be especially handy for pointers to structures. See structure casting for more information.

You can always cast explicitly but it depends on what will be done with the result. For example, on some architectures, you cannot access improperly unaligned objects. The following example is from a SPARC machine.

$ isainfo
sparcv9 sparc

$ cat main.c
#include <stdio.h>

int
main(void)
{
        char *s = "hello, world, hello, world";
        long long int *p = (long long int *)(s + 1);

        printf("%lld\n", *p);
}

$ gcc main.c
$ ./a.out
Bus Error (core dumped)

👀 bus-error.c

Now see the following code and figure out what is gonna happen. 🔧 👀 ptr-cast.c

Note that void * is a special pointer that cannot be dereferenced. You can always assign any pointer to a (void *) pointer without any need for casting, and you can also assign any (void *) pointer to any other pointer without any need for casting. These assignments are guaranteed to not lose any information.

int i = 99;
void *p = &i;
int *pi = p;

printf("%d\n", *pi);	// will print 99
printf("%d\n", *p);	// dereferencing (void *), will error out when compiled

🔧 Verify that a staticly allocated 2D array is stored in one piece of memory, row by row.

Hint: you need to recast a 2D array to a 1D array, then print it as a single row. There are a few different ways to do it. Do not look at the solution until you write your own code.

int a[5][5] = { ... };
// ...

👀 2d-static-array-as-1d.c

heap/dynamic allocation: malloc()/free()

The memory automatically allocated for local variables and function parameters is allocated in an area called a stack. There is an area called a heap to allocate memory that lasts after the function returns. This is also called dynamic allocation.

The allocator in the standard library offers the malloc()/calloc()/free()/... APIs for heap allocation.

The malloc/calloc functions return a pointer to a memory area of a specified size or a NULL pointer if the allocation failed - always check that! (even on Linux where it seems it can never fail - to be prepared for change in a configuration and also for portability to systems with more conservative memory allocation).

#define	ARRLEN	20
int *p;

if ((p = malloc(ARRLEN * sizeof (int))) == NULL)
	err(1, "malloc");
p[0] = 99;
p[ARRLEN - 1] = 77;

The prototype for malloc is as follows:

void *malloc(size_t size);

Note that as malloc returns void *, there is no need to explicitly type its result when assigned to a pointer. That is, do not use:

int *p;

p = (int *)malloc(16);

See man malloc for more memory allocation related functions.

Pointer to a structure and type casting

  • pointers to structures are often used to achieve common interface for different types

    • e.g.
struct Common { int type; };
struct A      { int type; char data[8]; };	// type == 1
struct B      { int type; char data[16]; };	// type == 2
  • then a function can be declared like so:
int func(struct Common *c);
  • internally it can do e.g.:
if (c->type == 1) {
	struct A *ap = (struct A *)c;
	ap->data[7] = 'A';
} else if (c->type == 2) {
	struct B *bp = (struct B *)c;
	ap->data[15] = 'B';
}

This is possible since all the structures have the same member on the same offset (that is offset 0).

code: 👀 struct-common.c

Or, more commonly, a function will allocate a A or B structure and return its address as a pointer to the Common struct. This pointer then needs to be casted according to its first member.

See struct sockaddr, struct sockaddr_in and struct sockaddr_in6 definitions for example on how this is done in practice.

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Linked list

Declare a structure that will form a simple linked list and will hold an integer as a value. The program will be run with a single argument specifying how many items the list will have.

Allocate a new structure and insert into the head (global variable). Each new item will have its value incremented by one.

Aside from the value itself, each node needs to hold a pointer to the next structure in the list. The last node has the next pointer set as NULL.

Once the list is complete, print its value by traversing its items from head to end.

$ ./a.out
a.out: usage: ./a.out <num>
$ ./a.out 10
9
8
7
6
5
4
3
2
1
0

🔑 code: 👀 linked-list-free.c

Memory leaks

The C runtime does not have a garbage collector so all heap allocated memory has to be explicitly freed via free() after it is no longer needed. If not freed, that creates a resource leak called a memory leak. Depending on the problem this might cause the problem of running out of memory later on (and then malloc/calloc can start returning NULL).

The leaks can be checked using static or dynamic analyzers.

🔧 write a program that takes all arguments that follow argv[0], concatenates them (without the terminating NUL character) into one string dynamically allocated via malloc() and prints this string to the standard output.

  • the concatenation can be done either by hand (try that first) or strncat() (try that afterwards)

👀 argv-concat.c

You can then put the string processing in a loop and comment out the free() call on the allocated memory. Then check with top -s 1 on Linux, macOS, or any other Unix-like system (or use any other suitable system monitoring tool) that the memory size of the running program quickly increases.

👀 src/argv-concat-nofree.c