10.md

Warm-up

🔧 strsepc

Implement:

char *strsepc(char **stringp, int c);

which behaves like strsep(3) except that it searches only for the first occurence of a single character.

Try to use strsep() first to properly understand how it works.

👀 strsep.c

🔑 strsepc.c

heap/dynamic allocation: malloc()/free()

The memory automatically allocated for local variables and function parameters is allocated in an area called a stack. There is also an area called a heap to allocate memory that lasts after the function returns. That is also called a dynamic allocation.

The allocator in the standard C library offers the malloc()/calloc()/free()/... APIs for heap allocation.

❗ The malloc/calloc functions return a pointer to a memory area of a specified size or a null pointer if the allocation failed - always check that! (even on Linux where it seems it can never fail - to be prepared for change in a configuration and also for portability to systems with more conservative memory allocation).

#define	NUM_ELEMS	20
int *p;

if ((p = malloc(NUM_ELEMS * sizeof (int))) == NULL)
	err(1, "malloc");
p[0] = 99;
p[NUM_ELEMS - 1] = 77;

The prototype for malloc is as follows:

void *malloc(size_t size);

Note that as malloc returns void *, there is no need to explicitly type its result when assigned to a pointer, see Explicit type conversion. That is, do not use:

int *p;

p = (int *)malloc(16);	// unnecessary cast

See man malloc for more memory allocation related functions.

The C runtime does not have a garbage collector so all heap allocated memory has to be explicitly freed via free() after it is no longer needed.

🔧 Write a program that takes at least 2 arguments. First argument specifies a dimension of an array of ints, the rest are the ints to fill out the array. To allocate memory for the array, use malloc. When filling out the array, ignore extra arguments. If you have less arguments, use zero for the remaining array elements. At the end, print out the array. To convert a string to an integer, use atoi and assume numbers are correctly entered.

./a.out 10 1 2 3 7 8 999 7 7 7 9 9 9 10 11
1 2 3 7 8 999 7 7 7 9

🔑 allocate-and-fill-out-array.c

Dynamic allocation induced problems

Correctness

  • Use after free: read/write memory that is part of memory chunk that was free'd
    • Dangling pointer is often the cause of this; e.g. a structure contains a pointer to heap allocated memory which is freed at some point of time and the pointer is still used to access the memory afterwards, and expected data could be found in there, or not - in the meantime some other part of the program could have allocated piece(s) of the free'd memory.
    • The way to detect/prevent this is to set the pointer to NULL after freeing it.
  • Double free: free already free'd buffer
    • Can corrupt internal heap allocator structures if it does not track the state of memory chunks
    • Use static/dynamic analysis and/or heap allocator with the ability to detect this

👀 use-after-free.c

Memory leaks

If memory allocated on the heap is not freed, it creates a resource leak called a memory leak as the allocator deems such memory used even that your code no longer uses it.

Depending on its size(s) this might cause a problem of running out of memory later on, and then malloc/calloc can start returning a null pointer as an indication of allocation failure(s).

The leaks can be checked using static or dynamic analyzers.

🔧 Write a program that takes all arguments that follow argv[0], concatenates them (without the terminating NUL character) into one string dynamically allocated via malloc() and prints this string to the standard output.

The concatenation can be done either by hand (try that first) or using strncat() (try that afterwards).

👀 argv-concat.c

You can then put the string processing in a loop and comment out the free() call on the allocated memory. Then check with top -s 1 on Linux, macOS, or any other Unix-like system (or use any other suitable system monitoring tool) that the memory size of the running program quickly increases.

👀 argv-concat-nofree.c

Pointer to a structure and type casting

Pointers to structures are often used to achieve common interface for different types that all need to be passed as arugments to a function. Consider this:

struct common { int type; };
struct A      { int type; char data[8]; };	// type == 1
struct B      { int type; char data[16]; };	// type == 2

We have a common structure with the only structure member present in all structures, type. A function can be then declared as follows:

int func(struct common *c);

However, the function needs to process additional members for structures A, B, etc. So, internally it may cast the argument to the specific structure based on their common structure member, type:

if (c->type == 1) {
	struct A *ap = (struct A *)c;
	ap->data[7] = 'A';
} else if (c->type == 2) {
	struct B *bp = (struct B *)c;
	ap->data[15] = 'B';
}

And the func is used like this, i.e. the non-common argument is always casted to the common structure so that the compiler knows we know what we are doing:

struct A a;

if (func((struct common *)&a) == -1) {
	...

This is possible since all the structures have the same member on the same offset (that is offset 0). However, note that you need to cast properly to avoid warnings. See the code below.

👀 struct-common.c

A function may also allocate an A or B structure and return its address as a pointer to the common struct. This pointer then needs to be casted in the caller according to its first member.

See struct sockaddr, struct sockaddr_in and struct sockaddr_in6 definitions as an example on how this is done in practice.

🔧 Linked list

Declare a structure that forms a simple linked list and holds an integer as a value. The program is executed with a single argument specifying how many items the list will have.

Allocate a new structure and insert values into the head (global variable). Each new list item will have its value incremented by one.

Aside from the value itself, each node needs to hold a pointer to the next structure in the list. The last node has the next pointer set as NULL.

Once the list is completed, print its value by traversing its items from the head to its end.

$ ./a.out
a.out: usage: ./a.out <num>
$ ./a.out 10
9
8
7
6
5
4
3
2
1
0

🔑 linked-list-free.c

After the list is printed, free it. Make sure there are no memory leaks.

🔧 Binary tree from binary representation

Write a program that creates a binary tree with N + 1 nodes, where N is specified as a command line argument.

Each node, besides the root (that is always created), will contain an integer member and will be created like this: take a binary representation of the number in the sequence {0, 1, ..., N - 1} and traverse the pre-existing tree using this representation until a NULL node pointer is found. 0 means the left child, 1 means the right one.

Only the leaf nodes actually contain the number.

For example, for 5 (101 in binary), the tree will be created as follows:

  1. create root node
  2. create right node, descend
  3. create left node, descend
  4. create right node, descend, put the number in

Once the tree is populated, traverse the tree (choose your method), count the nodes and print the number to the standard output.

Variant: add a function pointer to each node that will be called on its visit. The number stored in the node will be the sole argument of this function. Use that to traverse the tree and free the memory in the process.

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Command line option parsing

Implement a command line parser working the same as getopt(3). The global variable myoptarg will be set if an option has an argument to mimic the optarg set by getopt(). Choose a useful subset of what getopt offers.

Note: call the function mygetopt() not to intermingle with the standard library's getopt(). Implement the "POSIXly correct solution", i.e. stop once non-option argument is encountered (i.e. one not starting with -) or after the -- option is reached.

🔧 Bonus task: implement myoptind/myopterr/myoptopt variables

See the following code on how to use getopt(3):

👀 getopt.c

Create and traverse path tree

In memory, build a tree of paths, where each node is a path element. The tree starts with root / (assuming Unix-like system), the non-leaf nodes are directories, the leaves are regular files. The tree can look e.g. like this for paths /foo/a.c, /foo/b.c, /f.txt, /bar/c.c:

	 "/"
       /  |   \
      /   |     \
  "foo/" "f.txt"  "bar/"
  /   \            |
"a.c" "b.c"       "c.c"

Populate the tree with given paths and print the leaves (no particular order), one per line. Then free the allocated memory.

Each node can have constant number of children.

Possible input:

char *paths[] = {
    "/beverages/coffee/espresso.java",
    "/beverages/alcohol/beer.c",
    "/food/healthy/vegetarian/salad.txt",
    "/food/healthy/fruit/blueberries.hs",
    "/food/unhealthy/cake.md"
};

🔑 dirtree.c

Bonus tasks

  • make the function used for tree traversal generic, i.e. pass in a function pointer
  • Print also non-leaf nodes via depth first traversal, with each node indented based on its tree depth. E.g:
$ ./a.out
/
foo/
	a.c
	b.c
f.txt
bar/
	c.c
  • Remove the limitation for the constant number of children.
  • Print full paths of all files in the tree (hint: backpointers)