13.md

Warm-up

๐Ÿ”ง alternating bits detection

Task: detect if binary representation of an int has alternating bits in the whole width of the type. (e.g. 01010101 if the type was char).

The answer from the program will be yes/no.

Start with naive/straightforward implementation, i.e. traverse all the bits (O(n) time) and use bit operations/arithmetics.

Note: this can be done in O(1) time using some tricks

๐Ÿ”ง Number of bits for binary representation

Given an int (assume 4 bytes) with a positive value, use bit operators to find out how many bits (and bytes) will be needed to represent the number in binary. Then rewrite ๐Ÿ‘€ print-in-binary.c using the acquired number.

Note: be careful about little vs big endian

goto statement

Any statement may be preceded by a prefix that declares an identifier as a label name. Labels by themselves do not alter the flow of control.

Within the same function, you can jump to a label via the goto statement.

int
main(int argc, char **argv)
{
	if (argc == 1)
		goto err;
	/* Process arguments here. */
	return (0);
err:
	errx(1, "missing arguments");
	return (1);
}

Use of goto in C is often frowned upon unless it is used to jump to an error/clean-up section or break from structured code. While even that use is questioned by some, large number of major projects (Linux kernel code, OpenSSH, OpenSSL, etc.) use it that way. We strongly believe such use simplifies the code if done well.

Consider the following code with conditionals that do not nest. It leads to code duplication (or large amount of indentation to fix that):

int
myfn(void)
{
	/* declarations ommited */

	if ((p = malloc(16)) == NULL)
		return (1);
	if ((p2 = malloc(16)) == NULL) {
		free(p);
		return (1);
	}
	if ((p3 = malloc(16)) == NULL) {
		free(p);
		free(p2);
		return (1);
	}
	if ((p4 = malloc(16)) == NULL) {
		free(p);
		free(p2);
		free(p3);
		return (1);
	}
	/* etc... */

	printf("Success.\n");
	free(p);
	free(p2);
	free(p3);
	free(p4);
	/* ... */
	return (0);
}

With the conservative use of goto, we can simplify the code as follows:

int
myfn(void)
{
	/* declarations ommited */

	int ret = 1;

	if ((p = malloc(16)) == NULL)
		return (1);
	if ((p2 = malloc(16)) == NULL)
		goto err;
	if ((p3 = malloc(16)) == NULL)
		goto err2;
	if ((p4 = malloc(16)) == NULL)
		goto err3;
	/* etc... */

	/* Getting here means all went fine. */
	ret = 0;
	printf("Success.\n");

	free(p4);
err3:
	free(p3);
err2:
	free(p2);
err:
	free(p);
	return (ret);
}

The good thing is that you clean up at one common place at the end of the function. If there is something new that needs to be cleaned up after the code is modified, you add it to the clean-up section. In the former example, you would have to check all error paths and modify those one by one.

We support reasonable use of goto for the clean-up and structured code breaks. If used wisely, it leads to cleaner code and saves lots of indentation. Also note that the break and continue statements are jumps as well, and imagine how to structure your code without it.

switch statement

switch (expression) {
	case constant1:
		// statements
		break;
	case constant2:
		// statements
		break;
	.
	.
	.
	default:
		// default statements
}

Each case is a label. For that reason, you must break unless you want to code fall through. The reason for that is historical and time showed it was not the right decision.

Very often used for command line option processing.

int opt;

while ((opt = getopt(argc, argv, "c:")) != -1) {
	switch (opt) {
	case 'c':
		/*
		 * optarg should be copied since it might be
		 * overwritten by another option or freed by getopt()
		 */
		if ((code = strdup(optarg)) == NULL)
			err(1, "cannot alloc memory for -c optarg");
		break;
	case '?':
		fprintf(stderr, "unknown option: '%c'\n", optopt);
		usage(argv0);
		break;
	}
}

๐Ÿ‘€ switch.c

Also note that "any statement may be preceded by... a label". As case is a label as well (only valid within the switch statement though), you cannot have a declaration right after the case label, because declaration is not a statement. Similarly, you also cannot have a label right before a closing }. So, a null statement to the rescue:

switch (c) {
case 'a':
	;	/* must be here as a declaration is not a statement. */
	int a;
	/* ... */
default:
	;	/* must be here as the closing '}' is not a statement. */
}

The other alternative would be to move the first statement that follows the variable declaration before it.

Building with a concrete C specification version

If you want to make sure your code is built following a concrete C version, it has two parts:

  • you must request the C specification version in your C code
  • you need an implementation (compiler, linker, ...) that supports the version you want

Limiting your code to the specific C spec version

In C89, there was no __STDC_VERSION__ macro (note there are two underscores both leading and terminating the macro). In the Normative Addendum 1 for C89 from 1994, the macro was defined with a value of 199409L. In the C99 specification the macro has a value of 199901L.

So, if you want your compiler to use a minimal or a specific C version, work with the macro and #ifdef and #error preprocessor directives.

#ifndef __STDC_VERSION__
#error "__STDC_VERSION__ not defined.  C99 compiler or greater required."
#else
#if __STDC_VERSION__ < 199901L
#error "C99 compiler or greater required."
#endif
#endif

๐Ÿ‘€ check-c-version.c

๐Ÿ‘€ request-c99.c

Compiler version

In the 1997: Single UNIX Specification (SUS) version 2, with systems supporting that version as UNIX98, the compiler binary supporting C89 had to be named c89. You can search SUS ver 2 here, and type c89.

In POSIX:2004, which corresponds to SUS ver 3 with updates, the C compiler binary supporting C99 standard had to be named c99. The latest SUS version (as of May, 2020) still defines c99. I expect the future SUS versions with require c11 binary to support the C11 standard.

Might be difficult to get the compiler work as the exact specification version. For example, the following will compile while // are not part of C89, and moreover, _Bool is not either and it does not issue even a warning.

janp:air:~$ cat main.c
int
main(void)
{
	// not in C89
	_Bool b;
}
$ c89 main.c
main.c:4:2: warning: // comments are not allowed in this language [-Wcomment]
        // not in C89
        ^
1 warning generated.
$ gcc -std=c89 -pedantic main.c
main.c:4:2: warning: // comments are not allowed in this language [-Wcomment]
        // not in C89
        ^
1 warning generated.

Flexible array member

  • since C99 (ยง6.7.2.1, item 16, page 103)
  • it is an array without a dimension specified as the last member of the structure
    • handy for structures with a fixed header and some "padding" data of flexible length that is allocated dynamically
    • why not to use a pointer instead? It is good when passing data over boundaries such as network, kernel/userland, etc. since deep structure copy is not necessary.
      • just copy the structure as a whole (however, it is necessary to know how large is the padding) because it is all contiguous memory
  • sizeof (the_structure) gives you the size of the structure as if the flexible array was empty.

Example:

struct item {
	int	itm_value;
	/* Other members follow*/
	/* ... */
	int	itm_plen;  // payload length in bytes
	char	itm_payload[];
	/* Nothing can follow! */
};

The memory layout looks like this:

+-------------+-----------------------+
| struct item |      payload ...      |
+-------------+-----------------------+

Previously, this was hacked around using array with 0 items and GCC accepted this. Since C99 this can be done properly using this technique.

The data for the structure is allocated as follows:

struct item *p;

p = malloc(sizeof (struct item) + payload_len * sizeof (p->itm_payload[0]));
assert(p != NULL);
p->itm_plen = payload_len * sizeof (p->itm_payload[0]);
  • with this approach the overall structure alignment might be lost
    • i.e. it is necessary to set the payload length according to the size of the structure if you want to maintain the alignment

๐Ÿ‘€ flexible-array-member.c

Structure bit fields

  • sometimes memory is scarce (imagine having to keep millions of big structures in memory) and there are members holding integer values that occupy just a couple of bytes
    • bit fields can be used to shrink the memory needed
struct foo {
	unsigned int a : 3;
	unsigned int b : 1;
	unsigned int c : 4;
};
  • the number after the colon specifies the number of bits for a given bit field

    • cannot exceed the size of the underlying data type (unsigned int in the above example)
  • you cannot use sizeof on a bit field

  • this is good for implementing network protocol headers or HW registers, however the layout of bit fields in a C structure is implementation dependant

    • if it needs to match a concrete layout, additional non-standard compiler features have to be used (#pragma's etc.)
    • there is no offsetof() for bit fields

๐Ÿ‘€ bitfield.c

  • the integer values will behave as expected, e.g.
struct foo_s {
	unsigned int a : 3;
	unsigned int b : 1;
	unsigned int c : 4;
} foo;

foo.a = 7;
foo.a++; // will wrap around to 0 since this is unsigned

๐Ÿ‘€ bitfield-wraparound.c

Tansparent handles may cause issues

If your library is stateful, the state is often managed by a handle used as an argument in the library function calls. Let's say internally in the library we need to track some ID and a position. Those members are expected to be used only by the library itself so the consumers of the library should not care or use whatever is inside the handle.

/* mylib.h */
struct ml_hndl {
	int ml_id;
	int ml_pos;
};

/* mylib.c */
struct handle *
ml_init(void)
{
	/* allocate and init the handle first */
	/* ... */

	return (h);
}

/* mylib.c continues */
enum ml_err
ml_parse(struct handle *h, char *url)
{
	enum ml_err e = ML_OK;

	/* parse the URI */
	/* ... */

	return (e);
}

/* ... */

And the consumer would use it as follows:

#include <mylib.h>

int
main(void)
{
	ml_err ret;
	struct ml_hndl *h;

	h = ml_init(void);
	if ((ret = ml_parse(h, "some URI")) != ML_OK)
		errx(1, "...");

	/* ... */
}

So far, so good. However, what if the consumer code delves into the handle and uses the information in there (because the header file is part of the library and available to the consumer)?

int
main(void)
{
	/* ... */

	print_debug("[Debug] handle ID: %d\n", h->md_id);

	/* ... */
}

This will work fine until the library provider changes the handle (believed to be internal to the library):

/* mylib.h */
struct ml_hndl {
	long long ml_id;	/* used to be an int */
	int ml_pos;
};

Unless the consumer code is recompiled after the change, which may not happen as it could be some 3rd party software bought by the customer years ago, the debug print is now incorrect as it only prints the first 4 bytes of the structure, not 8.

There are ways to mitigate that:

  • use something non-revealing as a handle, like in index to the internal array of structures that keep the states. That is possible but you now need to manage the internal structure that keeps the state structures. You also cannot change the index type unless you recompile all the consumers as well.
  • the library provider could use something called incomplete types and use them to provide an opaque handle.

Incomplete types

An incomplete type is a type that describes an object but lacks information needed to determine its size. You cannot declare objects using such types as the lack of the size prevents to allocated memory for them on the stack or heap.

$ cat main.c
struct x a;	/* declaring a variable using a structure of an unknown size */

int
main(void)
{
}

$ cc main.c
main.c:1:10: error: tentative definition has type 'struct x' that is never
      completed
struct x a;
         ^
main.c:1:8: note: forward declaration of 'struct x'
struct x a;
       ^
1 error generated.

The x structure could be completed, for all declarations of that type, by declaring the same structure tag with its defining content later in the same scope. We could only forward declare the structure though (that is, no defining any object of that structure type).

As we do not need to know the size of the x structure, the following code compiles.

$ cat main.c
struct x;	/* this is called forward declaration */

int
main(void)
{
}

$ cc main.c
$ echo $?
0

BTW, the void type is an incomplete type that can be never completed. So, you cannot declare a variable of type void.

$ cat main.c
int
main(void)
{
	void x;
}

$ cc main.c
main.c:4:7: error: variable has incomplete type 'void'
        void x;
             ^
1 error generated.

However, you can always declare and work with a pointer to an incomplete type (all structures are always aligned to the same byte boundary). The following will compile and run fine.

#include <stdio.h>

struct x *
myfn(struct x *p)
{
        return (p);
}

int
main(void)
{
        myfn(NULL);
}

This feature of the C language can be used to represent opaque handles, and those can be used to solve the problem with non-transparent handles .

Opaque structures (handles)

With the help of incomplete types, it is possible to declare a structure in a header file along with API that consumes it without revealing what the structure actually contains. The reason to do that is not to allow a programmer to depend (= use) on the structure members, thus allowing to change the structure innards anytime.

/* Forward declaration in foo.h */
struct foo;

void
do_stuff(struct foo *f);

The file implementing do_stuff() will cast the opaque structure object to an internal structure type.

/* in foo_impl.h not provided to the consumers */
struct foo_impl {
    int x;
    int y;
};

/* library implementation code */
void
do_stuff(struct foo *f)
{
	struct foo_impl *fi = (struct foo_impl *)f;

	fi->x = ...
	fi->y = ...
}

Then any .c file that includes foo.h can work with the structure (by passing it to do_stuff()) however cannot access its members directly. The structure is usable only via a pointer. Thus, the library has to provide a function to allocate a structure and return the pointer to it (and possibly also to initialize it) and functions to get/set the data inside the structure.

This is handy for libraries so they are free to change the layout of structures without breaking the consumers.

The consumer of the library then #includes only foo.h but it does not have access to foo_impl.h.

#include <stdio.h>

#include "foo.h"

int
main(void)
{
	struct foo *h;

	h = getFoo();
	doStuff(h);
}

๐Ÿ‘๏ธ