Warm-up

`toupper()`

Rewrite the convert small characters to upper case program using a function.

Implementation notes

There is a toupper() library function from C99 so use a different name, say mytoupper()
- What happens if the program defines int toupper(int) ?
  - The symbol in libc is weak so it works

Variant: do this via lookup table - array

Generate array indexed by a..z (with corresponding values A..Z) in main() and use the array in the mytoupper() function
Arrays passed as function argument are in reality converted to pointers (to be introduced later)
- So, it is useless to write their size like this:

/* see what is the value of sizeof(array) inside the func() */
void func(int array[3]);

This is better:

void func(int array[], size_t size);

Also, if the items in the array are changed in the function, they will be changed in the array itself (consequence of pointer conversion).
What happens if mytoupper(-1) ?
short is sufficient to store the value:

short uppper[] = { 'A', 'B', ... };

🔑 toupper-table.c

A trivial fgrep

There are multiple variants of the grep(1) Unix utility. One of them is fgrep aka grep -F that prints the lines received on standard input which contain given fixed string (as opposed to a regular exception). fgrep can match multiple strings.

🔧 Write code to implement part of the fgrep functionality where the program will match a single string which is hard-coded in the program in the form of (boundless) array (char needle[] = "foobar";). The program will use getchar() to read the characters from the standard input.

Sample program run (with the search string hard-coded as bash):

$ cat /etc/passwd | ./a.out
root:x:0:0:root:/root:/bin/bash
foo:x:1000:1000:Foo,,,:/home/foo:/bin/bash
backu:x:1001:1001:backup,,,:/home/backu:/bin/bash
perforce:x:133:141:Perforce Admin:/opt/perforce:/bin/bash

compared to the fgrep(1) utility:

$ fgrep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
foo:x:1000:1000:Foo,,,:/home/foo:/bin/bash
backu:x:1001:1001:backup,,,:/home/backu:/bin/bash
perforce:x:133:141:Perforce Admin:/opt/perforce:/bin/bash

Assume a reasonable maximum line width, e.g. 1024 bytes including the newline. Print an error and exit with 1 (i.e. return (1) from main) when hitting a line that crosses that limit. Test that it works as expected!

The line limit is enforced to make the program simple, i.e. prevent using huge amounts of memory and avoid memory reallocation.

Note that it is generally better to fail rather then silently choose what to do on situations like that. For example, just ignoring the rest of the line after the 1024 byte limit which might be very confusing for users.

Write tests (as a home assignment), watch out for corner cases.

Solution: TBD. You can send us your solution so that we use it here. Do it as simple as possible.

String length

We know that we can use sizeof to get a length of a character array including the terminating '\0' character. There is a more generic way to get a string length, function strlen() that returns the number of characters without the terminating zero. Note that when we know what pointers are, the existence of the function will make more sense.

The function is declared in <string.h> and returns a value of type size_t, that is the same type as the operator sizeof uses. So, remember to use the z modifier to the %u conversion specifier:

#include <string.h>

char s[] = "hello, world";

printf("Length of \"%s\": %zu\n", s, strlen(s));

Note that to include a literal " in a string, it is necessary to escape it with the backslash.

Checking empty string

strlen() can be used to check if string is empty, however if not, it will incur performance penalty as it has to go through all the characters.

Thus, it is better to check just the first character - if it is \0 then the string is empty.

👀 string-isempty.c

More on functions/arrays

Variable length array (VLA)

Automatically allocated based on a variable value. Only for local variables, not globals. Same warning applies w.r.t. array size and stack growth as with fixed width arrays.

void fn(int n, int m)
{
    int a[n + m];
    ...
}

👀 array-variable-length.c
❗ Note that it generates more code which might be avoided by choosing a reasonable constant, if possible. Also see a rant from Linus.

K&R

History: you may still see in very old code: K&R (Kernighan and Ritchie) definition with types of arguments on separate line(s):

int
foo(a, b)
int a; int b;
{
	/* body */
}

Some compilers have dropped the support for K&R definitions or will drop it soon.

Function is not an object

Like an array, function is not a first class object (i.e. no "functor" like in functional languages).

That said, there are pointers to functions (more on these later).

Scope

Every identifier has its scope.
The scope of an identifier is the part of the program within which the identifier name can be used.
For example, we can declare a variable int i in the main function, and as well as in another function. Those two variables reference different objects.
For now, we only care about two types of scope - a file, and a block.
When declaring an identifier inside a block or within the list of parameter declarations in a function definition, the identifier has block scope. Its scope is from the identifier declaration to the end of the block.
When declaring an identifier outside of any block or list of parameters, the identifier has file scope.
Note that a block is used in multiple constructs in C, e.g. while, for, etc.
Within the inner block, an identifier of the same name declared in the outer block is hidden (and not visible). See:
- 👀 shadow.c
- 👀 shadow-block.c

Functions with variable number of arguments

Functions can have variable number of arguments of different types (more on that later).
- It must have at least one fixed argument (or more), and the rest is denoted with three dots ...
- The dots are called ellipsis

int printf(fmt, ...);

The first argument typically describes the rest of arguments in some way
- Usually it is a format string as in printf. It could possibly be an argument count if those were of the same type however in that case they would probably be passed as an pointer (we will get to pointers in a later class).

🔧 write a function that takes variable number of ULL arguments and returns their sum

Use the stdarg(3) man page.
What happens if there is one int value passed in between the ULL arguments?

🔑 stdarg.c

We already saw in the numbers module that you need to match the type of arguments with the type of conversion specifiers. Why does the following work? We are using an argument of type char, which is 1 byte, but the printf functions assumes through the %d specifier there are 4 bytes on stack (x32) or in the register (x64). Yet it works. Why?

char c = 13;

printf("%d", c);

👀 wrong-modifier.c

Ternary operator

cond ? expr1 : expr 2

Is the same as:

if (cond) expr else expr2

Example:

max = (i > j) ? i : j;

Note that if we add a semicolon, an expression becomes a statement, e.g.:

(i == j) ? i++ : j++;

That will increment i or j. The parentheses are not needed but generally used for better readability.

code: 👀 ternary-assign.c

🔧 `toupper()` using ternary operator

Rewrite the convert small characters to upper case program from last time using a function that utilizes the ternary operator (single line of code)

🔑 toupper-ternary.c

🔧 Min/max of 3 integer values

Write an expression that returns maximum of 3 integers i, j, k using ternary operator. Do not use any preprocessor macros.

🔑 3max.c

🔧 Ternary operator: quiz

See the code in 👀 ternary.c

is this possible to do with ternary operator ?
how would you fix it ?
the usual solution is to put this into macro. many libraries/programs define their own MAX/MIN macros.

Negative numbers

The way negative numbers are stored is implementation defined in C99. In our world, they are usually stored in two's complement format. However, the other allowable options are one's complement and sign and magnitude.

In short, to get two's complement, you take an absolute value, create one's complement (inverting the digits in binary representation) and add 1. There are several advantages of this representation, one is that there is only 1 zero (no negative and positive zero if we used the highest bit to track the sign). That is why a signed char, for example, can hold -128 to 127, and not just -127 to 127.

For a char:

 10000000	-128
 ........
 11111101	  -3
 11111110	  -2      ^
 11111111	  -1      |
 --------------------------
 00000000	   0      |
 00000001	   1      v
 00000010	   2
 ........
 01111111	 127

On Unix systems the shell reports the -1 return value as 255 in echo $?. Even though the main() returns integer (usually 4 bytes), the calling program (shell) gets just the low 8 bits and interprets them as unsigned quantity.

👀 return-1.c

Arithmetic type conversions

In (very) general, if binary operators have operands of different types, the "lower" is promoted to "upper". E.g. if one operand is a long and the other is an int, the other operand is promoted to a long. An unsigned type is always higher to its corresponding signed type. That is, if summing up a signed int with an unsigned int, the signed one will be converted to an unsigned first.

To give you a glimpse into what this means:

int
main(void)
{
	/* Also try with '-10 + 1U + 10U' */
	printf("%ld\n", -10 + 1U + 10L);
}

You would expect the result to be 1, right?

$ cc main.c
$ ./a.out
4294967297

However, -10 + 11U + 10L would result in 11, and -1 + 2U would result in 1.

👀 not-one.c

What is more, in most operations (for every operator, the spec says what is done in this respect), char and short integer operands are promoted to an int, or to an unsigned int if they do not fit in an int. That conversion is called integer promotion. This is done to make the language runtime fast (on x86 32-bit arithmetics can be much faster than when using 16-bit operands).

Note that the only way an unsigned short would not fit a (signed) int but fit an unsigned int is if both types had the same size (e.g. 4 bytes). You will probably not use such a compiler/platform combination, ever.

The most operations from the above sentence means all binary operators. A ternary operator as well. And even some unary operators.

sizeof (1) is 4 because 1 is an int. However, if a number does not fit to an int, a higher type will be used. For example, 4294967296 (2^32 = UINT_MAX + 1) will be stored in 8 bytes, so sizeof (4294967296) is 8.

Does it sound confusing? Do not worry, we will give you specific rules later in Arithmetic/integer promotion and conversion section.

🔧 Verify that numbers that do not fit in an int will have a size of 8 bytes.

Examples

Assuming char c; declaration, then:

sizeof (999) is 4 as 999 is an int by definition.
sizeof (c) is 1, always
sizeof (c + c) is 4 as + is a binary operator and the integer promotion kicks in, as mentioned above
sizeof (++c) is still 1 as ++ is an unary operator for which the integer promotion rules do not apply.
sizeof (+c) is 4 as for unary + and -, the integer promotion is applied.
- You just need to know or check the spec if you do not remember. Most of the time it does not matter but do not fool yourself, if you do not know details like that in C, it will bite you sooner or later.
- Remember A Case Study of Toyota Unintended Acceleration and Software Safety from the first class
sizeof (1LL) will usually be 8 as long long is usually 8 bytes.

It gets more interesting if unsigned and signed numbers are involved. E.g. a signed int is promoted to an unsigned int if one of the int operand is unsigned. This is called an implicit type conversion. There is also explicit type conversion (called casting) which we will deal with later in Arithmetic/integer promotion and conversion section.

To learn more about the implicit conversion, e.g. what the specification says exactly about assigning a double value to an int, see the 6.3 Conversions section of the C99 specification

I suggest you try these out with printf("%zu", ...). The %zu format string (see the printf manual page in section 3 on Unix systems) matches the return type of the sizeof operand. The exact unsigned numeric type of what sizeof returns may differ in different implementations so %zu will work anywhere.

Example 2

If long is 8 bytes, and int 4 bytes, then -1L < 1U is true as you might expect because 1U is converted to a long because a long can represent all values an unsigned int.

However, if both types are 4 bytes, the relational expression is false! The reason is that in this case, as both types has the same integer conversion rank (C99 6.3.1.1), the signed number is converted to an unsigned. See 6.3.1.8 Usual arithmetic conversions, paragraph 1, for the details.

Two's complement representation of -1 in 4 bytes is:

(1) take absolute value of 1	00000000.00000000.00000000.00000001
(2) one's complement		11111111.11111111.11111111.11111110
(3) add 1 to get 2's complement	11111111.11111111.11111111.11111111

which is 2^32 - 1 when interpreted as unsigned quantity.

Just printf("%u\n", -1) to see it will print 4294967295 (use the Unix/Linux bc command and type 2^32-1 to verify).

👀 signed-plus-unsigned.c 👀 signed-to-unsigned.c

Example 3

The assymetry of the negative/positive interval can lead to the program crashing on architectures that detect it. This is a consequence of hardware handling rather than the language.

👀 crashme.c

Run with -INT_MIN (see limits.h) and -1. INT_MIN is usually -2147483648. The program would normally yield 2147483648 (positive number) however that would be INT_MAX + 1.
Should crash in 64-bit mode as well due to an int being passed in a 32-bit register.

$ cc -m64 crashme.c
$ ./a.out 2147483648 -1
-2147483648 -1
Floating point exception: 8
$ echo $?
136
$ kill -l 136
FPE

$ cc -m32 crashme.c
$ ./a.out 2147483648 -1
2147483647 -1
$

🔧 Quiz 1

What is the result if signed char represented in binary by 0xff and 0xff unsigned char are compared using the == operator ?
- Write down the hexadecimal representation of the integers corresponding to the 2 chars with printf()

🔑 int-promotion.c

Note that if the b character was defined as char b the result might be 1 because it is up to the compiler to choose whether char is signed or unsigned. Usually it is signed though. There are compiler options to specify this, e.g. GCC has -funsigned-char and -fsigned-char

🔧 Quiz 2

Will the program print the whole array ? - Try to come up with reason of the expected behavior before running the program.

👀 whole-array.c

Integer conversion rank

See 6.3.1.1 Boolean, characters, and integers for the full list.

Every integer type has an integer conversion rank defined as follows:

No two signed integer types shall have the same rank, even if they have the same representation.
The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.
The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
The rank of char shall equal the rank of signed char and unsigned char.
The rank of _Bool shall be less than the rank of all other standard integer types.
The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2).

Function arguments

Say, what happens if you put a char as an argument of a function that accepts an int? As you might expect, the byte is just assigned to an int.

In general, the expressions passed as arguments are evaluated first, then the value of the resulted type is converted to the parameter type as in assignment.

More on that later. For more information, see 6.9.1 in C99

The order of evaluation of function arguments is unspecified, see 6.5.2.2p10 in C99:

The order of evaluation of the function designator, the actual arguments, and subexpressions within the actual arguments is unspecified, but there is a sequence point before the actual call.

That means, do not do things like these as it is not specified whether the 2nd argument expression is evaluated after the first agument is processed, or before. So, based on the compiler you use, you may either call the function with (1, 2) or (2, 2).

int i = 1;

myfn(i, ++i);

Compiler warnings

Each compiler has its own set of warnings that usually give a clue that something strange and potentially harmful can happen.
- There are differences between various compilers.
- We will focus on GCC here.
- During the old days the tool producing these warnings was called lint that had to be run separately (there was not enough memory to do both at once). Now those extra checkes are usually part of the compiler itself.
- The basics: use -Wall -Wextra
  - -Wall catches things like missing return value from a function that should return one.
  - 👀 no-return.c
- There are many places where a beginner can shoot himself into a foot by not knowing the language intricacies.
  - E.g. variable cannot be modified more than once between sequencing points (C99 6.5 paragraph 2). The -Wsequence-point that is included in -Wall warns about that.
  - 👀 sequence-point-violation.c
- The -Wshadow can catch shadow variables overriding:
  - 👀 shadow.c
  - 👀 shadow-block.c
- All or specific warnings can be turned into errors: -Werrror or -Werror=<insert_specific_error>, respectively.
  - Unless the warnings contain false positives (and those can usually be suppressed) this will help to avoid troubles later.
- There are other means to check for correctness (static/dynamic analysis), more on those later.

🔧 Use the options

Go through the programs written so far and run the compiler using the -Wall -Wextra options.

What kind of problems have you discovered? How to fix those?

👀 whole-array.c (where only -Wextra gives some clue)

Explore the compiler documentation for more helpful options. Over the time you will find a set of warning options that will serve you well.

Types of behavior in C

From C99, Annex J.

3. Terms, definitions, and symbols

...

3.4.1

implementation-defined behavior is unspecified behavior where each implementation documents how the choice is made.

An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.

...

3.4.3

undefined behavior is behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

An example of undefined behavior is the behavior on integer overflow.

...

3.4.4

unspecified behavior is use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.

An example of unspecified behavior is the order in which the arguments to a function are evaluated.

Notes

More Information on Undefined Behavior

For the definition of undefined behavior see the types of behavior

Now an important note: it is not that a small part of a program (statement, expression, ...) has a result that is not predictable. Given that anything could have happened because of that, the entire execution of the program that experienced that undefined behavior is meaningless.

Not only when writing software for cars, trains, planes, nuclear power plants, etc., you really have to be aware of that.

More information can be found in A Guide to Undefined Behavior in C and C++.

Integer overflow

What happens if an int overflows?

The behavior of an integer overflow depends on whether the type is signed or unsigned.
- ❗ For signed types the behavior is undefined. I.e. you cannot rely that an overflow of a positive quantity in any signed integer will be turned into a negative number.
Some compilers will allow to choose the behavior of signed overflows using special options (-fwrapv for GCC), though.
For unsigned integers, the spec requires that overflow always wraps around (modulo power of 2).
- 👀 int-overflow.c
- 👀 unsigned-overflow.c
Use -fstrict-overflow -Wstrict-overflow (will become active only for higher optimization levels, i.e. -O<X> where X > 1) to stay on the safe side.

See the difference in using -ftrapv and not:

int
main(void)
{
	int i = 2147483647;
	printf("%d\n", ++i);
}

$ gcc main.c
$ ./a.out
-2147483648
$ gcc -ftrapv main.c
$ ./a.out
Aborted (core dumped)

🔧 Home assignments

Conway 1D

write a 1-D implementation of Conway's game of life

use rule 30 to determine the state
use two arrays (one 2-D and one 1-D) to represent the rules and its resulting values
- there are 8 rules, each has 3 values to compare against and one result
we now have the arsenal to write 2-D variant however to display 2-D world some way to refresh the terminal would be needed so we will stick with 1-D.
- use \r to reset the line between iterations
- sleep for 1 second between iterations (unistd.h is needed for that)
each life "tick" will print the line representing the world
use functions to refactor the code
once you have a working solution, try to rewrite it to be more efficient or elegant (or both)

Byte count to human readable string

Write function to convert uint64_t to human readable count (binary - see https://en.wikipedia.org/wiki/Orders\_of\_magnitude_(data) or https://en.wikipedia.org/wiki/Mebibyte, e.g. MiB as mebibyte, etc.) and print it to standard output.

void bytecnt2str(uint64_t num);

use character array to represent the magnitude letters and perform the lookup based on the actual magnitude
if there is a remainder, write a '+' following the number
write the output in single printf() (use ternary operator for the remainder)
example inputs/outputs:

1024		1 KiB
1025		1+ KiB
2047		1+ KiB
2048		2 KiB
2049		2+ KiB
5242880		5 MiB

solution: bytecnt2str.c

Offset checker

write a function that has the following prototype:

  bool check(long off, size_t size, size_t limit);

checks if the arguments are valid for accessing part of a file (no operations will be done, just the check) of size 'size' starting from an offset 'off'. The length of the file is given by the 'limit' argument.
offset may be negative

    |           |         |              |
    0          off     off+size        limit

try to capture all corner cases of what could go wrong with the values and return false on failure, true on success.

🔑 range-check.c

on Unix systems one would use ssize_t for the offset which is a signed integer type of the same size as size_t (ssize_t is not part of C99 but POSIX)
also, implement a set of test values in main() using an array

04.md