11.md

Warm-up

Binary tree

  • Define a structure for a binary tree using the structure node_s name.
  • Each node of the tree can store either an int or a string (char[] of arbitrary size).
  • There is the member type of an int type. That determines the data type stored in the node, e.g. 0 for int, 1 for char[].
  • Build a tree using a bunch of nodes (can be totally unbalanced).
    • Can be statically initialized (or generated randomly however this will probably take more time to do and might require allocation on the heap).
  • Implement a function to perform the depth-first traversal that accepts a pointer to its root node and prints all data stored in the tree represented by the node based on the type of data in each node.

Unions

  • Unions resemble structures but they have a different semantics.
    • Unions store its members in the same memory location.
    • This allows for different "views" of the same data.
  • They are usually combined with structures.
  • Handy for efficient data storage or HW programming.

Consider this declaration:

union foo {
	unsigned short i; // 2 bytes
	struct {
		unsigned char low;
		unsigned char high;
	} bytes;
};
  • The sizeof(union foo) will be that of its largest member.
  • Modify a value of i and it will be reflected in the low/high values because i and bytes share the same location
		0             high memory addresses
		+-------------+
		|      i      |
		+-------------+
		| low  | high |
                +-------------+
  • Union can be part of a structure, 👀 union-in-struct.c and vice versa 👀 struct-in-union.c
  • Union can be "anonymous" (e.g. no name inside of a structure), however this is non-standard just like for structures.

🔧 Use the above declaration to find out if the program is running on a big or little endian system (least significant byte is stored on lowest address). The program will print either "big endian" or "little endian" to the standard output.

🔑 union-lowhigh.c

🔧 Take the binary tree implementation and convert it to use a union to store the data.

Storage classes

  • There are two storage classes - automatic and static.
  • The storage class determines the lifetime of the storage associated with the identified object.
  • Declaration within a block creates an automatic object. Its storage is valid only within the very same block.
  • Only one storage class specifier may be given in a declaration.
  • Objects declared outside of any block are always of the static storage class (e.g. global variables).
  • Static local objects (e.g. static int i;) retain their value upon reentry to functions and blocks.
  • You can initialize a static object. The initialization happens just once.

👀 fn-static-object.c

This source code example also shows how to use a goto statement. More on that later.

👀 block-static-object.c

Local variables are implicitly automatic but you could use the auto keyword (noone does that though):

int
main(void)
{
	auto int i;
}

You can also verify that as mentioned above, global variables may be only in the static storage class:

$ cat main.c
auto int j;

int main(void)
{
}
$ cc main.c
main.c:1:10: error: illegal storage class on file-scoped variable
auto int j;
         ^
1 error generated.

Static variables and functions

If a function is declared static, it is visible only from the same source code file.

Static variables: they retain value across function calls. That is, the initialization of a static variable within a function is done only once.

Yes, the word static is overloaded in C.

🔧 Take the binary tree implementation and use the static keyword where it makes sense.

Internal vs external linkage

Static objects with the keyword static are of internal linkage, meaning they are not seen from other compilation units. Static objects without the keyword static are implicitly external.

Note that global variables in always in the static storage class, and the following global variable is visible only from within the file where it is defined because of the static keyword:

static int x_global;

int
main(void)
{
	// ...
}

As we mentioned before, static is really an overloaded word/keyword in C and may lead to confusion.

However, the following global variable (of the static storage class) is of external linkage, meaning it is visible from other compilation modules (= files).

int x_global;

int
main(void)
{
	// ...
}

Use the extern keyword to declare objects that are defined in a different compilation unit (= file).

Example:

$ cc linkage.c ext.c
Undefined symbols for architecture x86_64:
  "_si", referenced from:
      _main in linkage-917564.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

👀 linkage.c 👀 ext.c

Also note that each object must have exactly one definition. For objects with internal linkage, this rule applies separately to each translation unit, because internally-linked objects are unique to a translation unit.

Function pointers

A function name is a pointer to the function. You can pass it as an argument.

int
myadd(int a, int b)
{
	return (a + b);
}

int
process_numbers(int a, int b, int (*f)(int, int))
{
	return ((*f)(a, b));
}

(void)process_numbers(7, 13, myadd));

👀 fn-ptr.c

🔧 Take the binary tree implementation and convert the depth-first traversal function to accept a second argument that is a callback. Implement two callback functions - one that prints the integer type nodes, another that prints the string type nodes (each callback will check the type of the node). The callback has a single argument - pointer to a node.

🔧 Sort argv

Use qsort(3) function to sort argv and print it sorted then. Check the man page, you will need to write a function that compares two elements of the array you provide to the qsort() function.

First use strcmp() to sort the argument alphabetically, then use atoi() (or a function that actually checks if a string represents a number) to sort them numerically.

After that, come up with another way of sorting the arguments (e.g. according to their length etc.) and write a function for it as well.

🔑 argv-sort.c (for alphabetical sorting only)

The last initializer and a comma

In C89, having a comma after the last initializer was a syntax error. Since C99, it is allowed and optional.

struct s {
	int a;
	int b;
} s = { 1, 2, };	// ',' after 2 is optional

See why keeping the last comma always there might be a very good idea:

👀 missing-comma-in-initializer.c

Enumeration constant

We already know integer constants, character and string constants, a floating-point constant, and a constant expression (an expression that involves only constants).

There is one other kind of a constant, the enumeration constant. An enumeration is a list of constant integer values, as in:

enum boolean { NO, YES};

The first name has value 0, the next one 1, etc. You can use explicit values as well. Unspecified values continue to progress from the last value specified. For example:

enum months {
	JAN = 1,
	FEB,
	MAR,
	APR,
	MAY,
	JUN,
	JUL,
	AUG,
	SEP,
	OCT,
	NOV,
	DEC,
};

You could also do as follows:

enum weird {
	one = 3,
	two,		// 4
	three = 3,
	four = 0,
	five,		// 1
	six,		// 2
	seven = -1,
	...
}

The list consists of enumerators. I.e. both JAN = 1 and FEB are enumerators. When used in code, JAN, FEB, MAR etc. are enumeration constants. The values of enumeration constants must fit an int. However, the implementation may define any integer type to be compatible with the enumerated type under the condition that all enumeration constants fit the type. So, the implementation might choose a char to be compatible with the enum boolean above.

enum white_space { NL = '\n', SP = ' ', TAB = '\t' };

The sizeof operator applied on an enum itself tells you the integer type used (sizeof (enum weird)), and sizeof on any enumeration constant gives you the same number.

Names even in different enumerations must be distinct (that is quite clear given that how enumerations constants are being used). In other words, all enumerators share the same namespace. Values in the same enumerations need not be distinct, e.g.:

enum not_really_useful { GGG = 1, HHH = 1, GHGH = 0 };

Example: 👀 enums.c, enum-values.c

Use enums!

This:

enum commands {
	XXX_LIST,
	XXX_EXTRACT,
	XXX_CREATE
}

is so much better than this:

#define	XXX_LIST	0
#define	XXX_EXTRACT	1
#define	XXX_CREATE	2

The reason is that the enum is known to the compiler and may be included into debug symbols. So, when debugging, you might, if the compiler/debugger support it, see XXX_LIST instead of just 0. In the latter case, as the compiler only sees the numbers when compiling the code (macros were replaced by numbers in teh preprocessor phase), you will never see any symbolic names. So, seeing the symbols instead of common numbers 0 or 1 is really helping.

🔧 Convert the binary tree implementation to use enum to store the data type.

The const keyword

const is a qualifier that may be applied to the declaration of any variable. It specifies its value cannot be changed. If you apply const to an array, the array elements cannot be modified.

Note that the verb qualify here means to limit the strength or meaning of.

You can initialize the variable declared with the const qualifier.

👀 const.c

Using const will not really make the storage constant, you just cannot use that variable for an assignment to the storage.

int i = 'c';
int const *p2 = &i;

// illegal
*p2 = 'x';
// ok
i = 'x';

👀 const-not-really-a-const.c

It also depends where you apply the const keyword.

char const *s1;		// s1 is a pointer to const char
const char *s1;		// same as above
char *const s2;		// s2 is a const pointer to a char

*s1 = 'c';		// illegal
s1 = NULL;		// legal

s2 = NULL;		// illegal
*s2 = 'a';		// legal

👀 const-and-pointers.c

Using const can get a bit confusing. It is mostly used for pointer arguments to constant memory to specify that an array will not be changed in the function. For example, string functions:

size_t strlen(const char *s);
char *strncpy(char *dest, const char *src, size_t n);
...

🔧 Verify how const works

The following is a const pointer to a const character. So, you can neither do *p = ... nor p = .... Verify.

const char *const p;

🔧 Take the binary tree implementation and use the const keyword where it makes sense.

The typedef keyword

Note that a char, signed and unsigned integer types, and the floating types are collectively called basic types.

We also have derived types - an array type, a structure, a union, a function type, and a pointer type.

With typedef, you create new data type names for existing types. Note that you never create new data types with typedef. In other words, with typedef, you create synonyms to the existing types.

typedef is most commonly used for derived types but many type names provided by the C standard itself are based on basic types. For example, size_t, see its definition in /usr/include/sys/types.h.

You can also create new type names using type names you created before, see below.

typedef is used as follows:

typedef int myint;
typedef char *mycharptr;

mycharptr p = "my string";

typedef is great to create complex type names in small steps. As we will see next.

typedef char **array[10];
array a;

👀 derived-types.c

The convention is to add _t to a new type name. For example:

typedef struct mystruct_s {
	int a;
	char c;
} mystruct_t;

And you can use it like this:

struct mystruct_s x;
mystruct_t y;

x and y are of the same type.

👀 typedef.c

🔧 Take the binary tree implementation and use typedef for the node structure itself and also fot the callback.

Variable namespace and scope

Identifiers fall into several namespaces that do not interfere with one another.

These distinct classes are:

  • objects, functions, typedef constants
  • labels
  • tags of structures or unions, and enumerations
  • members of a structure or union individually

The "individual" part means each structure or union has its own namespace. So, you can have two different structures, each using the same member names.

👀 identifier-namespace.c

goto statement

Any statement may be preceded by a prefix that declares an identifier as a label name. Labels by themselves do not alter the flow of control.

Within the same function, you can jump to a label via the goto statement.

int
main(int argc, char **argv)
{
	if (argc == 1)
		goto err;
	/* Process arguments here. */
	return (0);
err:
	errx(1, "missing arguments");
	return (1);
}

Use of goto in C is often frowned upon unless it is used to jump to an error/clean-up section or break from structured code. While even that use is questioned by some, large number of major projects (Linux kernel code, OpenSSH, OpenSSL, etc.) use it that way. We strongly believe such use simplifies the code if done well.

Consider the following code with conditionals that do not nest. It leads to code duplication (or large amount of indentation to fix that):

int
myfn(void)
{
	/* declarations ommited */

	if ((p = malloc(16)) == NULL)
		return (1);
	if ((p2 = malloc(16)) == NULL) {
		free(p);
		return (1);
	}
	if ((p3 = malloc(16)) == NULL) {
		free(p);
		free(p2);
		return (1);
	}
	if ((p4 = malloc(16)) == NULL) {
		free(p);
		free(p2);
		free(p3);
		return (1);
	}
	/* etc... */

	printf("Success.\n");
	free(p);
	free(p2);
	free(p3);
	free(p4);
	/* ... */
	return (0);
}

With the conservative use of goto, we can simplify the code as follows:

int
myfn(void)
{
	/* declarations ommited */

	int ret = 1;

	if ((p = malloc(16)) == NULL)
		return (1);
	if ((p2 = malloc(16)) == NULL)
		goto err;
	if ((p3 = malloc(16)) == NULL)
		goto err2;
	if ((p4 = malloc(16)) == NULL)
		goto err3;
	/* etc... */

	/* Getting here means all went fine. */
	ret = 0;
	printf("Success.\n");

	free(p4);
err3:
	free(p3);
err2:
	free(p2);
err:
	free(p);
	return (ret);
}

The good thing is that you clean up at one common place at the end of the function. If there is something new that needs to be cleaned up after the code is modified, you add it to the clean-up section. In the former example, you would have to check all error paths and modify those one by one.

We support reasonable use of goto for the clean-up and structured code breaks. If used wisely, it leads to cleaner code and saves lots of indentation. Also note that the break and continue statements are jumps as well, and imagine how to structure your code without it.

switch statement

switch (expression) {
	case constant1:
		// statements
		break;
	case constant2:
		// statements
		break;
	.
	.
	.
	default:
		// default statements
}

Each case is a label. For that reason, you must break unless you want to code fall through. The reason for that is historical and time showed it was not the right decision.

Very often used for command line option processing.

int opt;

while ((opt = getopt(argc, argv, "c:")) != -1) {
	switch (opt) {
	case 'c':
		/*
		 * optarg should be copied since it might be
		 * overwritten by another option or freed by getopt()
		 */
		if ((code = strdup(optarg)) == NULL)
			err(1, "cannot alloc memory for -c optarg");
		break;
	case '?':
		fprintf(stderr, "unknown option: '%c'\n", optopt);
		usage(argv0);
		break;
	}
}

👀 switch.c

Also note that "any statement may be preceded by... a label". As case is a label as well (only valid within the switch statement though), you cannot have a declaration right after the case label, because declaration is not a statement. Similarly, you also cannot have a label right before a closing }. So, a null statement to the rescue:

switch (c) {
case 'a':
	;	/* must be here as a declaration is not a statement. */
	int a;
	/* ... */
default:
	;	/* must be here as the closing '}' is not a statement. */
}

The other alternative would be to move the first statement that follows the variable declaration before it.

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Linked list, part II.

(Note: this assignment was given in one of the previous years as the assignment for the final in-person test with the 90 minute limit).

/*
 * Create a set of functions for manipulating a linked list.  The program
 * accepts input commands as command line arguments and executes them.  New
 * elements are always appended to the list and any item is always removed from
 * the head.  Keep pointers to the first list item (head) and the last one
 * (tail).
 *
 * Commands are: create a list, insert an item, remove an item, and print the
 * list.  It is mandatory to create a list before doing anything else.
 *
 *   create a list:		C
 *   insert an item:		I<nnn>
 *   remove an item:		R
 *   print a list:		P
 *
 * Do reasonable checking for dealing with invalid input.  See below.
 *
 * Example:
 *
 *   $ ./a.out C I1 I2 I3 I4 I5 P R P I6 P I7 P R P R P R P R R R P I88 P R P
 *   List: 1 2 3 4 5
 *   List: 2 3 4 5
 *   List: 2 3 4 5 6
 *   List: 2 3 4 5 6 7
 *   List: 3 4 5 6 7
 *   List: 4 5 6 7
 *   List: 5 6 7
 *   List: EMPTY
 *   List: 88
 *   List: EMPTY
 *
 *   $ ./a.out C R
 *   a.out: Cannot remove an item from an empty list.
 *
 *   $ ./a.out C P
 *   List: EMPTY
 *
 *   $ ./a.out C L
 *   a.out: Wrong command: L
 *
 *   $ ./a.out C C
 *   a.out: List already created, exiting.
 *
 *   $ ./a.out I88
 *   a.out: List not created yet.
 */

👀 linked-list.c

🔧 Animal sorting

You may remember working with an animal structure.

Use the same structure for this assignment.

🔧 Sort the array by number of legs, print it out to standard output.

🔧 Sort the array by the animal name. Print it out to standard output. Use strcmp() to do the comparison of names.

Make the comparison functions static.

Use the standard libc sort function qsort(3). Check the manual page on how it's used. You will need to define a callback function that the qsort() function will use to compare two array elements.

Make the program to accept an argument (0 or 1) and run the sorting function based on that.

Bonus: Sort the animals based on both name and number of legs.