06.md

Array of characters

An array of characters (chars) is called a string or a string constant. Do not confuse it with a character constant as in 'A' (single quotes) that is actually an int. A string is put in double quotes "".

So, "hello, world" as string.

Technically, a string constant is an array of characters and may be used to initialize a char array:

char foo[] = "bar";

A special thing is that a string is always terminated by a NUL (zero) byte. That is how C knows where a string ends.

That means the size of the array is one byte more than the number of characters in the string. It is because of the terminating zero (\0) that the compiler adds to terminate the string.

So, you could use an initializer as introduced earlier:

char foo[] = { 'b', 'a', 'r', '\0' };

printf("%zu\n", sizeof (foo));		// prints 4
printf("%zu\n", sizeof ("bar"));	// prints 4

However, that is not generally used as you would normally use "bar" to initialize such an array.

Using '\0' suggests it is a terminating NUL character, and it is just a zero byte. So, you could just use 0, like this, but it is not generally used:

char foo[] = { 'b', 'a', 'r', 0 };

To print a string via printf(), you use the %s conversion specifier:

printf("%s\n", foo);

🔧 What happens if you forget to specify the terminating zero in the above per-char init and try to print the string ?

👀 array-char-nozero.c

Strings and String Literals

  • "xxx" is called a string literal or a string constant

  • in lays in a consecutive piece of memory: 'x' 'x' 'x' '\0'

  • '\0' terminates the string literal. That's the C convention.

  • a string literal internally initializes an array of characters from the string, with a NUL character appended.

  • a string is a contiguous sequence of characters terminated by and including the first NUL character

    • so, a string literal may include multiple NUL characters, thus defining multiple strings. In other words, a string literal and a string are two different things.
int
main(void)
{
	char s[] = "hello, world\0, I arrived.";

	printf("%s\n", s);
}
$ gcc -Wall -Wextra main.c
$ ./a.out
hello, world
  • '\0' is called a NUL character. It's just a zero value byte but the convention is to use '\0' if it represents a string element. '\0' is just a special case of octal representation '\0ooo' where 'o' is an octal figure (0-7).

  • aside from the { 'x', ... } initializer, character array initialization may be done also with a string literal:

/* sizeof (a) is 6 as the terminating 0 is counted in */
char a[] = "hello";
  • if {} is used for the initialization, you must add the terminating zero yourself unless you use the size of the array and the string was shorter (in which case the rest would be initialized to zero as usual):
char s[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

Anyway, you would probably never do it this way. Just use = "hello".

code: 👀 array-fill.c

  • string is printed via printf() as %s
printf("%s\n", "hello, world");
  • experiment with what %5s and %.5s do (use any reasonable number)

code: 👀 string-format.c

Warm-up

🔧 task:

  • extend the program 👀 tr-d-chars.c from last time to translate character input, e.g.
tail /etc/passwd | tr 'abcdefgh' '123#'
  • use character arrays defined with string literals to represent the 2 strings
    • see the tr(1) man page on what needs to happen if the 1st string is longer than the 2nd
      • do not store the expanded 2nd string as literal in your program !

code: 👀 tr.c

🔧 task (bonus): refactor the code into a function(s)

  • remember that arrays are passed into a function as a pointer (to be explained soon, not needed now) which can be used inside the function with an array subscript.

The for loop

Formally defined as:

for (<init>; <predicate>; <update>)
	<body>

It is the same as:

<init>
while (<predicate>) {
	<body>
	<update>
}

Using the for loop is very often easier and more readable.

Example:

int i;
for (i = 0; i < 10; ++i)
	printf("%d\n", i);

Or in C99:

for (int i = 0; i < 10; ++i)
	printf("%d\n", i);
  • if the break jump statement happens in the body, the control is transferred to the end of the loop without executing the <update> part.
  • with continue jump statement, the rest of the body is skipped. The <update> part is then performed for the for loop but not for the while loop.
  • both break and continue statements relate to the loop they are executed in

👀 for.c

🔧 task: compute minimum of averages of lines in 2-D integer array (of arbitrary dimensions) that have positive values (if there is negative value in given line, do not use the line for computation).

👀 2darray-min-avg.c

Expressions

"In mathematics, an expression or mathematical expression is a finite combination of symbols that is well-formed according to rules that depend on the context."

In C, expressions are, amongst others, variables, constants, strings, and expressions in parentheses. Also arithmetic expressions, relational expressions, function calls, sizeof, assignments, and ternary expressions.

http://en.cppreference.com/w/c/language/expressions

In C99, an expression can produce results (2+2 gets 4) or generate side effects (printf("foo") sends a string literal to the standard output).

code: 👀 expression-statement.c

🔧 task: make the warning an error with your choice of compiler (would be a variant of -W in GCC)

Statements

Statements are (only from what we already learned), expressions, selections (if, switch (not introduced yet)), {} blocks (known as compounds), iterations, and jumps (goto (not introduced yet), continue, break, return).

http://en.cppreference.com/w/c/language/statements

Basically, statements are pieces of code executed in sequence. The function body is a compound statement. The compound statement (a.k.a. block) is a sequence of statements and declarations. Blocks can be nested. Blocks inside a function body are good for variable reuse.

A semicolon is not used after a compound statement but it is allowed.

A declaration is not a statement (there are subtle consequences, we can show them later).

Some statements must end with ;. For example, expression statements. The following are all valid statements. They do not make much sense though and will probably generate a warning about an unused result.

char c;

c;
1 + 1;
1000;
"hello";

👀 null-statement.c 👀 compound-statement.c

w.r.t. compound statement vs. expression:

  • It is not allowed to have a compound statement within an expression.
    • That said, GCC has a language extension (gcc99) that can be used to allow this - the reason is for protecting multiple evaluation within macros.
    • The C99 standard does not define it.
    • gotcha: the code has to be compiled with gcc with the -pedantic-errors option in order to reveal the problem
gcc -std=c99 -Wall -Wextra -pedantic-errors compound-statement-invalid.c

Our recommendation is to always use these options.

👀 compound-statement-invalid.c

err() family of functions

  • not defined by any standard however handy
  • present in BSD, glibc (= Linux distros), Solaris, macOS, etc.
  • use the <err.h> include, see the err(3) man page
  • this is especially handy when working with I/O
  • instead of writing:
if (error) {
	fprintf(stderr, "error occured: %s\n", strerror(errno));
	exit(1);
}
  • write the following
if (error)
     err(1, "error occured: ");
  • notice that a newline is inserted automatically at the end of the error message.
  • or for functions that do not modify the errno value, use the x variant:
if (some_error)
	errx(1, "ERROR: %d", some_error);
  • there's also warn()/warnx() that do not exit the program

File API

Part of the standard since C90.

Opening/closing

  • fopen opens a file and returns an opaque handle (pointer)
    • NULL pointer means error
    • the mode argument controls the behavior: read, write, append
      • the + adds the other mode (write for read and vice versa, read for append)
    • write mode creates the file if it does not exist
    • the b binary mode usually does not have any effect (see the standard)
  • fclose closes the handle
    • important to avoid resource leak (it can allocate both memory and file descriptor)
  • freopen can be used to associate the standard streams (stderr, stdin, or stdout) with a file

🔧 write a code that opens the same file in an cycle (until fopen() fails) without calling fclose() on the handle. After how many iterations does it fail on your system?

👀 fopen-leak.c

I/O

  • fprintf - printf to a stream
  • fscanf
    • basically parses text input from a stream according to format string
    • except the format string all the parameters must be pointers
  • fputs/fgets - send/read string to/from a stream
  • fputc/fgetc - send/read char to/from a stream
  • fwrite/fread - for writing/reading binary data (such as structures or raw numeric types)

🔧 task: create a text file where each line begins with integer followed by space and a string (echo "1 foo" > file.txt; echo "2 bar" >> file.txt). Read the file using fscanf() and print the values (i.e. integer and a string) from each line to standard output.

Seeking

When reading/writing to a file using the above function, the current position changes accordingly. However, the position can be manipulated without performing any I/O.

  • fseek - moves the position
    • the whence parameter has 3 possible values and makes the offset parameter relative to:
      • SEEK_SET - the beginning of the file
      • SEEK_END - the end of the file
      • SEEK_CUR - the current location of the cursor in the file
  • ftell - get current position in the file

🔧 get a file size using the standard IO API (that is, lseek(2) is prohibited even if you know it).

🔧 write array of int values (use an arbitrary positive count with values ranging from INT_MAX to 0) to a file and read the values into another array and print them to the standard error output without closing the file in between the writing and reading. Use od(1) to verify the content of the file (thus it is handy to start with INT_MAX and e.g. divide by 2 for each successive value). 👀 fopen-binary.c

🔧 use the file created by the previous program. Read the values from the end of the file to the beginning of the file one by one without knowing the file size and print the numbers to the standard error output.

Pointers

Motivation:

  • memory allocation / shared memory

  • protocol buffer parsing

  • pointer arithmetics

  • the value stored in a pointer variable is the address of the memory storing the given pointer type

  • declared like this, e.g.

int *p; // pointer to an int

  • note on style:
int * p;
int *p;  // preferred in this lecture
  • good practice is to prefix pointers with the 'p' letter

  • a pointer is always associated with a type

  • to access the object the pointer points to, use a dereference operator *:

printf("%d", *p);
  • the dereference is also used to write a value to the variable pointed:
*p = 5;
  • in a declaration, you may assign like this:
int i = 5;
int *p = &i;
  • the & is an address-of operator and gets the address of the variable.

  • the pointer itself is obviously stored in memory too

     p
     +---------------+
     |     addr2     |
     +---------------+        i
     ^                        +-------+
     |                        | 5     |
   addr1                      +-------+
                              ^
                              |
                            addr2
  • the size of the pointer depends on architecture and the way the program was compiled (see -m32 / -m64 command line switches of gcc)

  • sizeof (p) will return the amount of memory to store the address of the object the pointer points to

  • sizeof (*p) will return the amount needed to store the object the pointer points to

🔧 Task: write a program to print:

  • address of the pointer
  • the address where it points to
  • the value of the pointed to variable

Use the %p formatting for the first two.

Code: 👀 ptr-basics.c

NULL pointer

  • the real danger with pointers is that invalid access results in a crash (the program is terminated by kernel)
    • this is because NULL address is left unmapped on purpose, or a page that cannot be accessed maps to the address. Note that C guarantees that zero is never a valid address for data.
  • can assign a number directly to a pointer (that should trigger a warning though. We will get to casting and how to fix that later).
int *p = 0x1234;

🔧 Task: create NULL pointer and try to read from it / write to it

Code: 👀 null-ptr.c

Basic operations

  • notice the difference:
int i;
int *p = &i;

vs.

int i;
int *p;

// set value of the pointer (i.e. the address where it points to).
p = &i;
  • operator precedence gotcha:
*p		// value of the pointed to variable
*p + 1		// the value + 1
*(p + 1)	// the value on the address + 1 (see below for what it
		// means)
  • note that the & reference operator is possible to use only on variables
    • thus this is invalid:
p = &(i + 1);
  • store value to the address pointed to by the pointer:
*p = 1;

Changing pointers:

  • pointers can be moved forward and backward
p = p + 1;
p++;
p--;

operator gotchas

  • * has bigger precedence than + so:
i = *p + 1;

is equal to

i = (*p) + 1;
  • postfix ++ has higher precedence than *:
i = *p++;

is evaluated as *(p++) but it still does the following because ++ is used in postfix mode, ie. the value of expression p++ is p:

i = *p; p++;
  • the pointer is moved by the amount of underlying (domain) data type when using arithmetics

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Pointer difference

🔧 Task: create a pointer to an int, print it out, create a new pointer that points to p + 1. See what is the difference between the 2 pointers.

👀 ptr-diff.c