02.md

Warm-up

🔧 Convert units of measurement

Print a conversion table from ells (the rope Frodo and Sam used in The Lord of the Rings was some 30 ells long) to inches and centimeters, i.e. a table with 3 columns separated by tabs.

Print the centimeter value as a float with 2-digit precision. The centimeter value goes in the last column.

Every 10 rows, print a separator line (a sequence of - characters, say 20 of them). The separator immediately follows the table header and then appears after every 10 rows. Use a while loop to print the - characters.

Print 30 numeric rows.

Sample output:

Ell	Inches	Centimeters
--------------------
1	45	114.30
2	90	228.60
3	135	342.90
4	180	457.20
5	225	571.50
6	270	685.80
7	315	800.10
8	360	914.40
9	405	1028.70
10	450	1143.00
--------------------
11	495	1257.30
12	540	1371.60
13	585	1485.90
14	630	1600.20
15	675	1714.50
16	720	1828.80
17	765	1943.10
18	810	2057.40
19	855	2171.70
20	900	2286.00
--------------------
21	945	2400.30
22	990	2514.60
23	1035	2628.90
24	1080	2743.20
25	1125	2857.50
26	1170	2971.80
27	1215	3086.10
28	1260	3200.40
29	1305	3314.70
30	1350	3429.00

🔑 ell-in-cm.c

Source code management

  • Keep all of your code somewhere. Use a distributed source code management (SCM) system, e.g. Git or Mercurial.
    • You could even keep your repo in your home directory in the Linux lab, as some of those machines are accessible via SSH from anywhere, or use services like GitLab or GitHub.
    • We recommend never using centralized SCMs like Subversion or CVS unless you have to (e.g. when working on existing legacy software), as those belong to the past century.

Comments

  • /* One line comment */

  • Multiline comment:

  /*
   * Multiline comment.  Follow the C style.
   * Multiline comment.  Follow the C style.
   */
  • // One line comment from C99+

  • Use comments sparingly.

    • Not very useful:
    /* Increment i */
    ++i;
    • Produce meaningful comments, not like this:
    /* Probably makes sense, but maybe not */
    if (...)
            do_something();
  • Pick a reasonable style and stick to it. Mixing one-line comments using both // and /* */ is not the best style.

  • In general, you can always figure out what the code does; it is just a matter of time. However, it may be impossible to figure out why the code works that way. The reason might be historical, related to other decisions or existing code, purely random, or something else. If it is not obvious, commenting on the why is extremely important. See also Chesterton's fence principle.

Preprocessor

The main purposes of the preprocessor are: string replacement, file inclusion, general code template expansion (macros), and managing conditional compilation.

  • String replacement:

    • Basic defines: #define FOO or #define FOO 1
    • A define without a value is still meaningful for conditional compilation.
  • Including files:

    • #include "foo.h" (the search starts in the directory of the including file and then continues in the system paths) or #include <foo/bar.h> (system paths only)
      • Some compilers display the include search paths (e.g. clang with -v).
      • Use the -I compiler option to add search paths to the list.
  • Conditional compilation:

    • #if, #ifdef, #ifndef, #else, #endif
      • #if can be used with expressions:
        #if MY_VERS >= 42
        ...
        #endif
    • Also useful for header guards (to avoid including the same header file multiple times):
      #ifndef FOO_H
      #define FOO_H
      ...
      #endif
    • Can be used e.g. for debug code:
      #ifdef DEBUG
      ... // here can be anything (where valid):
          // statements, variable declarations/definitions, function definitions, ...
      #endif
      • Then the compiler can be run with -DDEBUG to enable the code.
  • Macros: for more complicated code snippets, e.g. #define IS_ZERO(a) a == 0

    • Each occurrence of the parameter is replaced textually with whatever argument is given.

    • Use parens for #define to prevent problems with macro expansion:

      • #define X (1 + 1)
      • Same for more complicated macros: #define MUL(a, b) ((a) * (b))

👀 mul.c

To see the result of running the preprocessor on your code, use cpp or the -E option of the compiler.

🔧 Task: reimplement fahr-to-cent.c using defines instead of literal numbers

🔑 fahr-to-cent_defines.c

Expressions

  • Every expression has a value.

  • A logical expression has a value of either 0 or 1, and its type is always an int.

1 > 10	... 0
10 > 1	... 1

printf("%d\n", 1 < 10);
--> 1
/* Yes, "equal to" in C is "==" as "=" is used for an assignment */
printf("%d\n", 100 == 101);
--> 0
  • Even constants are expressions (more on that later), e.g. 1.

  • As the while statement is defined in C99 6.8.5 as follows:

while (expression) statement

...and given that a constant is also an expression, a never-ending while loop can be written, for example, as follows. The loop runs until the expression becomes 0, which never happens in this case.

while (1) {
	...
}

You could also write while (2), while (1000), or while (-1) and it would still be a never-ending loop, but that is not how C programmers do it.

  • Note that the statement in the spec definition can be a compound statement (a code block), as in the example code above. More on that later.

The break statement

  • The break statement jumps out of the innermost enclosing while loop (in fact, any kind of loop, but we have only introduced the while loop so far).
int finished = 0;
while (1) {
	if (finished)
		break;
	/* not finished work done here */
	call_a_function();
	k = xxx;
	...
	if (yyy) {
		...
		finished = 1;
	}
	/* more work done here */
	...
}
  • There is no break <level> to specify how many loop levels to exit, as might be found e.g. in a Unix shell.

Basic operators

  • The equality operator is ==, since a single = is used for assignment.
int i = 13;
if (i == 13) {
	// will do something here
}
  • Logical AND and OR:
if (i == 13 && j < 10) {
	// ...
}

if (i == 1 || k > 100) {
	// ...
}
  • You do not need extra parentheses, as || and && have lower precedence than ==, !=, <, >, <=, and >=. We will learn more about operator precedence in later lectures.

  • Non-equality is !=

if (i != 13) {
	// ...
}

The comma operator

Useful for evaluating several expressions in one place. The left operand is evaluated first and its value is discarded, then the right operand is evaluated. The value of the whole expression is the value of the right operand, e.g.:

while (a = 3, b < 10) {
...
}

The loop is controlled by the value of the second expression.

This is not limited to just 2 expressions; you can chain more comma operators. The operator is left associative.

Note that the comma used in a variable declaration (int a, b = 3;) or in a function call is not the comma operator.

🔧 What will be returned? 👀 comma.c

This is handy in loop control expressions.

The boolean type

There is a new _Bool type as of C99. Use 0 as false and 1 as true; any non-zero value converted to _Bool becomes 1.

The keyword name starts with an underscore followed by an uppercase letter because such identifiers have always been reserved in C, while bool, true, and false never were. Older code might actually use those names, so if C99 had simply added them to the language, it would have broken such code, and that is generally not acceptable in C.

If you are certain that bool, true, and false are not already used in your code for something else, you can use those macros after including <stdbool.h>.

In that case, the macro bool expands to _Bool, true expands to 1, and false to 0; true and false may also be used in #if preprocessing directives.

See C99 section 7.16 for more information.

See 👀 bool.c

Numbers and types

  • For example, the decimal integer constants 1, 7, and 20000 always have type int, as long as the value fits in an int.
    • On common 32- and 64-bit platforms, an int takes 4 bytes and its range is [-2^31, 2^31 - 1]. However, an int may also be stored in only two bytes, with a range of [-2^15, 2^15 - 1]. You will likely never encounter such old platforms unless you look for them.
    • A larger decimal constant automatically becomes a long int, or a long long int if it does not fit in a (signed) long. Unsuffixed decimal constants never get an unsigned type, so even if unsigned long long is stored in 8 bytes, one cannot write 2^64 - 1 as a plain decimal constant and expect it to represent that value:
$ cat main.c
int
main(void)
{
	unsigned long long ull = 18446744073709551615;
}

$ gcc -Wall -Wextra -Wno-unused main.c
main.c: In function ‘main’:
main.c:4:34: warning: integer constant is too large for its type
    4 |         unsigned long long ull = 18446744073709551615;
      |                                  ^~~~~~~~~~~~~~~~~~~~
  • However, if you print ull (using %llu, as for unsigned long long int), you will likely get 18446744073709551615, but only because the number was first converted to -1 to fit the range (which is not guaranteed), then back to 18446744073709551615. More on that later in Arithmetic conversions.

  • Hexadecimal constants start with 0x or 0X, e.g. 0xFF, 0Xaa, 0x13f. In contrast to decimal constants, one can write 2^64 - 1 as the hexadecimal constant 0xFFFFFFFFFFFFFFFF when unsigned long long is stored in 8 bytes. More on that later.

  • Octal constants start with 0, e.g. 010 is 8 in decimal. Also remember the octal notation of Unix file modes and the umask, e.g. 0644 or 022.

  • 'A' is called a character constant and is always of type int. See man ascii for their numeric values. The ASCII standard defines characters with values 0-127.

  • Note that when we say a character, we mean a value that represents a character from the ASCII table. A character is not the same thing as the char type.

  • Types float, double

    • If you read man 3 printf, you can see that %f expects a double. You can still pass a float:
float pi = 3.14;
printf("%f\n", pi);
    • floats are automatically converted to doubles when used as arguments to functions with a variable number of arguments (known as variadic functions), e.g. printf().
  • char (1 byte), short (usually 2 bytes), long (4 or 8 bytes), long long (usually 8 bytes, and cannot be less). The sizes also depend on whether your binary is compiled in 32 or 64 bits.

    • 🔧 See what code your compiler emits by default (i.e. without using either -m32 or -m64 options)
      • Use the file command to display the information about the binary.
  • See also 5.2.4.2 Numerical limits in the C spec. For example, an int must have at least 16 bits (2 bytes), but the C spec does not prevent it from being 8 bytes in the future.

  • chars and shorts are automatically converted to int if used as arguments in variadic functions, and also if used as operands in many operators. More on that later.

  • As 'A' is an int but within 0-127 (see above on the ASCII standard), it is OK to do the following, as it is guaranteed to fit even when the char type is signed:

char c = 'A';
  • In printf, each argument must have the type expected by its conversion specification. Note that e.g. integers and floating point numbers have different representations, and printing an integer as a double (or vice versa) is undefined behavior with unexpected consequences. More on that later.
printf("%f\n", 1);

$ ./a.out
0.000000

👀 print-int-as-double.c

Signedness

  • Each integer type has a signed and an unsigned variant. By default, the integer types are signed, except for char, whose signedness depends on the implementation (of the C compiler). If you need an unsigned type, use the unsigned keyword. If you need a signed char, use signed char explicitly.
signed int si;	// not used though, just use 'int si'
unsigned int ui;
unsigned long ul;
unsigned long long ull;
...
  • For ints, you do not even need the int keyword, i.e. signed i and unsigned u are valid, but it is recommended to write int i and unsigned int u anyway.

  • You can write long int and long long int, or just long and long long, respectively. The shorter forms are more common in C.

  • char and short int are converted to int when passed to variadic functions (we will talk more about integer conversions later in the semester). That is why the following is correct: the compiler first converts the variable c to int, then puts it on the stack (the common argument-passing convention on IA-32) or in a register, up to a certain number of arguments (the common x86-64 calling convention).

/* OK */
char c = 127;
printf("%d\n", c);

/* OK */
short sh = 32767;	/* 32768 would not fit in a 2-byte short */
printf("%d\n", sh);

Modifiers for printf()

  • l for long, eg. long l; printf("%ld\n", l);

  • ll for long long, eg. long long ll; printf("%lld\n", ll);

  • u is unsigned decimal, x is unsigned hexadecimal (lowercase digits), X is unsigned hexadecimal (uppercase digits)

unsigned int u = 13;
printf("%u\n", u);

unsigned long long llu = 13;
printf("%llu\n", llu);

unsigned int u = 13;
printf("%x\n", u);
// --> d
printf("%X\n", u);
// --> D
  • The following is a problem though: when compiled in 32 bits, you put 4 bytes on the stack but printf reads 8 bytes. Older compilers may not warn you at all!
/* DEFINITELY NOT OK.  Remember, 13 is of the "int" type. */
printf("%lld\n", 13);

$ cc -m32 wrong-modifier.c
wrong-modifier.c:6:19: warning: format specifies type 'long
long' but the argument has type 'int' [-Wformat]
	printf("%lld\n", 13);
		~~~~     ^~
		%d
1 warning generated.
$ ./a.out
2026120757116941
  • When compiled in 64 bits, it is still as incorrect as before but it will probably print 13 anyway as 13 is assigned to a 64 bit register (because of commonly used calling convention on x86-64). So, if you use that code successfully in 64 bits you might be surprised if the code is then compiled in 32 bits and "suddenly gets broken". It was broken from the very beginning.
$ cc -m64 wrong-modifier.c
wrong-modifier.c:6:19: warning: format specifies type 'long
long' but the argument has type 'int' [-Wformat]
	printf("%lld\n", 13);
		~~~~     ^~
		%d
1 warning generated.
$ ./a.out
13

👀 wrong-modifier.c

Suffixes

  • You can explicitly give integer constants different integer types using suffixes:

    • 13L and 13l are of type long
    • 13LL and 13ll are of type long long (Ll and lL are invalid)
    • 13u and 13U are of type unsigned int
    • 13lu and 13LU are of type unsigned long
    • 13llu and 13LLU are of type unsigned long long
  • So, 0xFULL and 0XFULL is an unsigned long long 15 :-)

printf("%llu\n", 0xFULL);
// --> 15
printf("%lld", 13LL);	/* OK */
// --> 13
/* NOT OK as long may be 4 bytes while long long is 8+ bytes */
printf("%ld", 13LL);
// --> ??

Escape sequences

  • The escape sequences \ooo and \xhh (not \Xhh) are character-sized bit patterns, specified as octal or hexadecimal numbers respectively, each representing a single character. They can be used in both string literals and character constants.
    • See 5.2.1 Character sets and 6.4.4.4 Character constants for more information.
printf("\110\x6F\154\x61");	// Used in a string literal.
printf("%c\n", '\x21');		// Used in a character constant.
// -> Hola!

getchar()

  • The getchar function reads one character from the process's standard input and returns its value as an int.
    • When it reaches the end of input (for example, after you press Ctrl-D in the terminal), it returns EOF.
    • EOF is a macro, usually defined as -1. That is why getchar returns an int instead of a char: it needs one extra value, distinct from every character, for EOF.
    • getchar needs #include <stdio.h>.
    • You can verify that EOF is defined in <stdio.h>; search for "getchar" here: https://pubs.opengroup.org/onlinepubs/9699919799

🔧 Task: write code that reads characters from its standard input and prints them out.

It should work like this:

$ cat /etc/passwd | ./a.out > passwd
$ diff passwd /etc/passwd
$ echo $?
0
  • Remember, an assignment is just an expression, so it has a value. So, you can do the following:
if ((c = getchar()) == EOF)
	return (0);

instead of:

c = getchar();
if (c == EOF)
	return (0);

However, do not abuse this, as it can make code hard to read. Note the parentheses around the assignment: the = operator has lower precedence than the == operator. If the parentheses were omitted, the following would happen:

if (c = getchar() == EOF) would be evaluated as:

if (c = (getchar() == EOF)), meaning that c would be either 0 or 1 depending on whether we read a character or reached the end of input.

We will learn more about operator priority later in the semester.

🔑 getchar.c

The sizeof operator

  • The sizeof operator computes the size in bytes of its operand, which is either an expression or a parenthesized type name.
    • It is not a function, so you can use it without parentheses (sizeof foo), unless its operand is a type name, in which case the parentheses are required. However, parentheses are usually used for better readability.
sizeof (1);	// OK
sizeof 1;	// OK but "sizeof (1)" is better.
sizeof 1 + 1;	// See?
sizeof (int);	// OK
sizeof int;	// Syntax error.
  • The result has type size_t, which is an unsigned integer type according to the standard. The implementation (= compiler) chooses which one, e.g. unsigned int, unsigned long int, or unsigned long long int.
  • In printf(), the z length modifier combined with the u conversion means size_t, so this is the right way to do it:
printf("%zu\n", sizeof (13));
// --> 4
  • You may see code using %u, %lu, or %llu for sizeof values. However, that only works for a specific combination of compiler and architecture and may break with a different one. Always use %zu for arguments of type size_t.

  • The operand of sizeof is not evaluated, so any side effects in it never happen (some compilers warn about such code). Only the size in bytes of the operand's type is computed.

int i = 1;
printf("%zu\n", sizeof (i = i + 1));
// --> 4
printf("%d\n", i);
// --> 1
  • 🔧 Try sizeof on various values and types in printf(), compile with -m32 and -m64, and see the difference
sizeof (1);
sizeof (char);
sizeof (long);
sizeof (long long);
sizeof ('A');
sizeof ('\075');
sizeof (1LL);
// ...
  • We will get there later in the semester, but if you are bored, try to figure out why the following prints 1 4 4 1:
char c;
printf("%zu\n", sizeof (c));
// --> 1
printf("%zu\n", sizeof (c + 1));
// --> 4
printf("%zu\n", sizeof (+c));
// --> 4
printf("%zu\n", sizeof (++c));
// --> 1

The sizeof operator is usually evaluated at compile time; however, this is not universally true. For Variable Length Arrays (VLAs), it has to happen at runtime. VLAs will be explained later.

Integer constants

  • An integer constant can be a decimal, octal, or hexadecimal constant.

  • All of these are equal:

printf("%c\n", 0101);
// --> A
printf("%c\n", 0x41);
// --> A
printf("%c\n", 65);
// --> A
  • Technically, 0 itself is an octal constant, not a decimal one, since an octal constant always begins with 0. For the same reason, the following generates an error, as 9 is not an octal digit:
printf("%d\n", 099);
main.c: In function ‘main’:
main.c:6:17: error: invalid digit "9" in octal constant
    6 |  printf("%d\n", 099);
      |                 ^~~
  • If you pass a number that does not fit in a byte as an argument for the %c conversion, the higher bits are discarded. The rule here is that printf converts the int argument to unsigned char (not just char!), then prints it as a character (= letter). More on integer conversions in upcoming lectures. See also Numbers on what happens with char or short when passed as an argument to a variadic function.
    • Also note the existence of h and hh modifiers. See the printf() man page for more information.
printf("%c\n", 65 + 256 + 256 + 256 * 100);
// --> still prints A
  • An assignment is also an expression whose value is the value assigned, so the following is legal and the variables a, b, and c will all be set to 13 (assignment is right associative).
int a, b, c;
a = b = c = 13;

🔧 Task: print ASCII table

Print an ASCII table with hexadecimal values like the one in the OpenBSD ascii(7) man page, except that for non-printable characters you print NP (non-printable).

To determine whether a character is printable you can use the isprint() function.

Use just while and if (without else).

Sample output:

00 NP	01 NP	02 NP	03 NP	04 NP	05 NP	06 NP	07 NP	
08 NP	09 NP	0a NP	0b NP	0c NP	0d NP	0e NP	0f NP	
10 NP	11 NP	12 NP	13 NP	14 NP	15 NP	16 NP	17 NP	
18 NP	19 NP	1a NP	1b NP	1c NP	1d NP	1e NP	1f NP	
20  	21 !	22 "	23 #	24 $	25 %	26 &	27 '	
28 (	29 )	2a *	2b +	2c ,	2d -	2e .	2f /	
30 0	31 1	32 2	33 3	34 4	35 5	36 6	37 7	
38 8	39 9	3a :	3b ;	3c <	3d =	3e >	3f ?	
40 @	41 A	42 B	43 C	44 D	45 E	46 F	47 G	
48 H	49 I	4a J	4b K	4c L	4d M	4e N	4f O	
50 P	51 Q	52 R	53 S	54 T	55 U	56 V	57 W	
58 X	59 Y	5a Z	5b [	5c \	5d ]	5e ^	5f _	
60 `	61 a	62 b	63 c	64 d	65 e	66 f	67 g	
68 h	69 i	6a j	6b k	6c l	6d m	6e n	6f o	
70 p	71 q	72 r	73 s	74 t	75 u	76 v	77 w	
78 x	79 y	7a z	7b {	7c |	7d }	7e ~	7f NP	

🔑 ascii-hex.c

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

🔧 Count digit occurrence

If unsure about the behavior, compile our solution and run it.

  • Read characters until EOF and count the occurrences of each digit 0-9. Only use what we have learned so far. You may end up with longer code than otherwise necessary, but that is OK.
$ cat /etc/passwd | ./a.out
0: 27
1: 37
2: 152
3: 38
4: 39
5: 43
6: 34
7: 35
8: 29
9: 31

🔑 count-numbers.c

  • Variant: instead of printing occurrences, print * characters to get a histogram. Use log() (see math(3)) to scale the values down.

🔧 To upper

Convert lowercase letters in the input to uppercase. Use the fact that a-z and A-Z are two contiguous ranges in the ASCII table.

Use the else branch:

	if (a) {
		...
	} else {
		...
	}

Expected output:

	$ cat /etc/passwd  | ./a.out
	##
	# USER DATABASE
	#
	# NOTE THAT THIS FILE IS CONSULTED DIRECTLY ONLY WHEN THE SYSTEM IS RUNNING
	# IN SINGLE-USER MODE.  AT OTHER TIMES THIS INFORMATION IS PROVIDED BY
	# OPEN DIRECTORY.
	#
	# SEE THE OPENDIRECTORYD(8) MAN PAGE FOR ADDITIONAL INFORMATION ABOUT
	# OPEN DIRECTORY.
	##
	NOBODY:*:-2:-2:UNPRIVILEGED USER:/VAR/EMPTY:/USR/BIN/FALSE
	...
	...

🔑 to-upper.c