Warm-up

🔧 Print binary representation of positive number

Print argv[1] in binary (assume it is a correct positive decimal number that will fit unsigned int) to standard output. Note that printf(3) does not have a conversion specifier for it (unlike 'x' for hexa and 'o' for octal). Limit the input to positive ints.

Do not use bit operators even if you know how to (ie. do NOT use >> etc.)

Implement the functionality in void binprint(int) and call it from main().

To verify, use bc(1) with obase=2. E.g.:

$ bc
obase=2
10
1010
255
11111111
2^31-1
1111111111111111111111111111111

$ ./a.out 2147483647
1111111111111111111111111111111
$ ./a.out 255
11111111
$ ./a.out 348508345
10100110001011101000010111001

You can also use obase=8, obase=16, etc. Note that bc(1) takes only capital letters as digits, ie. use "E4" for 0xe4, not "e4". "e4" is treated as a variable:

$ bc
ibase=16
obase=2
e4
0
E4
11100100

🔑 print-in-binary.c

From 11 in 2025, run out of time in 11.

The `typedef` keyword

Note that a char, signed and unsigned integer types, and the floating types are collectively called basic types.

We also have derived types - an array type, a structure, a union, a function type, and a pointer type.

With typedef, you create new data type names for existing types. Note that you never create new data types with typedef. In other words, with typedef, you create synonyms to the existing types.

typedef is most commonly used for derived types but many type names provided by the C standard itself are based on basic types. For example, size_t, see its definition in /usr/include/sys/types.h.

You can also create new type names using type names you created before, see below.

typedef is used as follows:

typedef int myint;
typedef char *mycharptr;

mycharptr p = "my string";

typedef is great to create complex type names in small steps. As we will see next.

typedef char **array[10];
array a;

👀 derived-types.c

The convention is to add _t to a new type name. For example:

typedef struct mystruct_s {
	int a;
	char c;
} mystruct_t;

And you can use it like this:

struct mystruct_s x;
mystruct_t y;

x and y are of the same type.

👀 typedef.c

🔧 Take the binary tree implementation and use typedef for the node structure itself and also for the callback.

Bitwise operations

Bitwise operators are as follows:

Operator	Meaning
`&`	bitwise AND
`\|`	bitwise OR
`^`	bitwise XOR
`<<`	left shift
`>>`	right shift
`~`	one's complement (unary)

Note, individual bits are being processed. I.e.:

316 & 978			-> convert to binary to see it
100111100 & 1111010010		-> do logical AND bit by bit now

0100111100
1111010010
----------
0100010000	== 272 (use ibase=2 with bc(1) to verify)

So,

((338 & 978) == 272).	// == is of higher prority than &

When shifting to the left, the lower bits are filled with zeroes.

When shifting to the right, it depends on the signedness of the number. For unsigned values the upper bits are filled with zeroes. If the number is signed and negative (i.e. starts with bit 1 in two's complement representation), the behavior is implementation dependent. Usually the value it is sign-extended, i.e. filled with 1's.

👀 left-right-shift.c

Common uses of bitwise operators:

n = n & 01777;			// zero out some highest bits

/*
 * Setting individual flags.  Note that each of the flags has
 * just one bit set.
 */
#define	LIGHT_OFF	0x0000
#define	LIGHT_GREEN	0x0001
#define	LIGHT_RED	0x0002
#define	LIGHT_BLUE	0x0004
#define	LIGHT_YELLOW	0x0008
#define	LIGHT_VIOLET	0x0010
#define	LIGHT_WHITE	0x0020
// ...

lights = lights | LIGHT_YELLOW;	// turn on yellow light
lights = lights & ~LIGHT_RED;	// turn off red light

Another common case is to use an integer type to store multiple kinds of information, with mask to extract the parts. Here, a unsigned short type stores the byte value and associated flags:

#define	VALUE_MASK	0x00ff  // same as 0xff yet more readable
#define	FLAG_MASK	0xff00

#define	ONE		0x0100
#define	TWO		0x0200

unsigned short n = 16 | ONE | TWO;

if ((n & FLAG_MASK) & ONE)
	printf("ONE\n");

printf("%hhu\n", n & VALUE_MASK);

Binary operators and integer promotion

When working with bitwise operators, operands are promoted the same way as we learned in Conversions and promotions

That means:

13U & -1				-> promote int (-1) to unsigned,
					   i.e.  promote -1 to unsigned.
					   You get:

13 & 4294967295				-> now convert to binary
1101 & 11111111111111111111111111111111	-> the result clearly is 1101

👀 conv-and-prom-with-bitwise-ops.c

🔧 Task: rewrite print-in-binary.c using bitwise operators

🔧 Task: count 1 bits in a number (argv[1]).

🔑 count-1-bits.c

$ ./a.out 337
4
$ ./a.out 13375
9

bitwise operators vs. logical operators vs. comparison

Bit operators have lower precedence than comparison. As a consequence the evaluation of expressions such as (a & b == c & d) is counter intuitive.

👀 bitwise-vs-logical.c

In this case modern compilers (clang) will actually warn about missing parentheses.

You can read the explanation by Dennis Ritchie here: http://www.lysator.liu.se/c/dmr-on-or.html

🔧 Warm-up: bit difference

Task: write a program that counts the number of bits that need to be flipped so that number A becomes number B.

Note: use the character representation only for debug prints. The actual operations with the numbers are to be done with bit operations on integers.

Note: consider only positive integers, assume int.

🔑 bitdiff.c

Sub-task for home: flip those bits and verify that the numbers are the same.

🔧 alternating bits detection

Task: detect if binary representation of an unsigned int has alternating bits in the whole width of the type. (e.g. 01010101 if the type was char).

The answer from the program will be yes/no (printed to standard output).

Start with naive/straightforward implementation, i.e. traverse all the bits (O(n) time) and use bit operations/arithmetics.

Note: this can be done in O(1) time by comparing the value with 2 constants.

🔑 altbits.c

🔧 Number of bits for binary representation

Given an int (assume 4 bytes) with a positive value, use bit operators to find out how many bits (and bytes) will be needed to represent the number in binary. Then rewrite :eyes: print-in-binary.c using the acquired number.

Note: be careful about little vs big endian

🔑 numchars-for-binary.c

Reading complex declarations

To read a complex declaration, you use operator priorities and watch for parentheses.

char **argv

argv is a pointer to a pointer to char

int (*a)[10]

a is a pointer to an array of 10 ints

int *a[10]

a is an array of 10 pointers to int

void *myfunc()

myfunc is a function returning a pointer to void

void (*myfunc)()

myfunc is a pointer to a function returning void

char (*(*x())[])()

x is a function returning pointer to an array of pointers to a function returning a char

char (*(*x[3])())[5]

x is an array of 3 pointers to a function returning a pointer to an array of 5 chars

How to read it

find the identifier (non keyword or custom type) of a variable or function
start decoding:

left to right
) => reverse decoding direction
() => denotes function
[] => array
; => reverse the direction
when reading from right to left, we can hit:
- ( => reverse the direction
- * => pointer
- type identifier => starts the whole definition

Example:

char *(*(**foo[][8])())[];

You can mentally simplify as:

char *(*SOME_FUNCTION)[];

Where SOME_FUNCTION is (**foo[][8])()

Read as:

foo is an array of arrays of 8 pointers to a pointer to a function returning a pointer to an array of pointers to a char.

Alternatively, you can do it via two simpler steps:

typedef char *a_of_p[];
typedef a_of_p *(**foo[][8])();

👀 complex-declaration.c

🔧 Practice:

int *(*(*fp1)(int))[10];
char (*(*x())[])()
double *f()[]           // this is not legal in C as a function cannot
                        // return an array

Linking objects

Larger programs are often split into many source code files. One can create the binary as follows:

cc -o binary foo.c bar.c ...

and it will work however the files will be compiled every time even though they were not changed.

To save time the files are usually compiled individually and then linked into the binary using a runtime linker (sometimes called a link editor, or just a linker), a program usually called ld. This program also takes care of adding dependencies on dynamic libraries to the resulting binary, performing relocations (for relative addressing), etc.

It works like this (assuming all source code files changed):

cc -c foo.c
cc -c bar.c
...
cc -o binary foo.o bar.o ...

Do not confuse the runtime linker with dynamic linker, usually called ld.so. See its manual page for more information.

Note that the program ld was not invoked directly, it is the compiler that serves as an ld wrapper as it needs to add more object files with the C runtime under the hood. So in reality the compiler accepts some linker options (and passes them through) to make it easier for the programmer.

For GCC you can use the -### option to see what commands would actually be executed during compilation. Use -v to get the same debugging information but also execute the commands.

Usually it is desirable to avoid rebuilding objects unnecessarily, especially when there is a large number of source files. This is usually done via set of rules that establish dependencies. The most common tool for that is make and Makefile is its configuration file that drives the compilation.

Building with a concrete C specification version

If you want to make sure your code is built following a concrete C version, it has two parts:

you must request the C specification version in your C code
you need an implementation (compiler, linker, ...) that supports the version you want

Limiting your code to the specific C spec version

In C89, there was no __STDC_VERSION__ macro (note there are two underscores both leading and terminating the macro). In the Normative Addendum 1 for C89 from 1994, the macro was defined with a value of 199409L. In the C99 specification the macro has a value of 199901L.

So, if you want your compiler to use a minimal or a specific C version, work with the macro and #ifdef and #error preprocessor directives.

#ifndef __STDC_VERSION__
#error "__STDC_VERSION__ not defined.  C99 compiler or greater required."
#else
#if __STDC_VERSION__ < 199901L
#error "C99 compiler or greater required."
#endif
#endif

👀 check-c-version.c

👀 request-c99.c

Compiler version

In the 1997: Single UNIX Specification (SUS) version 2, with systems supporting that version as UNIX98, the compiler binary supporting C89 had to be named c89. You can search SUS ver 2 here, and type c89.

In POSIX:2004, which corresponds to SUS ver 3 with updates, the C compiler binary supporting C99 standard had to be named c99. The latest SUS version (as of May, 2020) still defines c99. I expect the future SUS versions with require c11 binary to support the C11 standard.

Might be difficult to get the compiler work as the exact specification version. For example, the following will compile while // are not part of C89, and moreover, _Bool is not either and it does not issue even a warning.

janp:air:~$ cat main.c
int
main(void)
{
	// not in C89
	_Bool b;
}
$ c89 main.c
main.c:4:2: warning: // comments are not allowed in this language [-Wcomment]
        // not in C89
        ^
1 warning generated.
$ gcc -std=c89 -pedantic main.c
main.c:4:2: warning: // comments are not allowed in this language [-Wcomment]
        // not in C89
        ^
1 warning generated.

Flexible array member

since C99 (§6.7.2.1, item 16, page 103)
it is an array without a dimension specified as the last member of the structure
- handy for structures with a fixed header and some "padding" data of flexible length that is allocated dynamically
- why not to use a pointer instead? It is good when passing data over boundaries such as network, kernel/userland, etc. since deep structure copy is not necessary.
  - just copy the structure as a whole (however, it is necessary to know how large is the padding) because it is all contiguous memory
sizeof (the_structure) gives you the size of the structure as if the flexible array was empty.

Example:

struct item {
	int	itm_value;
	/* Other members follow*/
	/* ... */
	int	itm_plen;  // payload length in bytes
	char	itm_payload[];
	/* Nothing can follow! */
};

The memory layout looks like this:

+-------------+-----------------------+
| struct item |      payload ...      |
+-------------+-----------------------+

Previously, this was hacked around using array with 0 items and GCC accepted this. Since C99 this can be done properly using this technique.

The data for the structure is allocated as follows:

struct item *p;

p = malloc(sizeof (struct item) + payload_len * sizeof (p->itm_payload[0]));
assert(p != NULL);
p->itm_plen = payload_len * sizeof (p->itm_payload[0]);

with this approach the overall structure alignment might be lost
- i.e. it is necessary to set the payload length according to the size of the structure if you want to maintain the alignment

👀 flexible-array-member.c

Structure bit fields

sometimes memory is scarce (imagine having to keep millions of big structures in memory) and there are members holding integer values that occupy just a couple of bytes
- bit fields can be used to shrink the memory needed

struct foo {
	unsigned int a : 3;
	unsigned int b : 1;
	unsigned int c : 4;
};

the number after the colon specifies the number of bits for a given bit field
- cannot exceed the size of the underlying data type (unsigned int in the above example)
you cannot use sizeof on a bit field
this is good for implementing network protocol headers or HW registers, however the layout of bit fields in a C structure is implementation dependant
- if it needs to match a concrete layout, additional non-standard compiler features have to be used (#pragma's etc.)
- there is no offsetof() for bit fields

👀 bitfield.c

the integer values will behave as expected w.r.t. underflow/overflow and integer conversions/manipulations, e.g.

struct foo_s {
	unsigned int a : 3;
	unsigned int b : 1;
	unsigned int c : 4;
} foo;

foo.a = 7;
foo.a++; // will wrap around to 0 since this is unsigned

👀 bitfield-wraparound.c

👀 bitfield-bitwise-ops.c

12.md