04.md

Topics: strlen, ternary operator, arithmetic type conversions, negative numbers, integer overflows, compiler warning options

Recap of the last class

  • arrays: subscripting, initialization, sizeof
  • functions: declaration, return values, parameter passing, recursion
  • array of chars

String length

We know that we can use sizeof to get a length of a character array including the terminating '\0' character. There is a more general way to get a string length, function strlen(). Note that when we know what pointers are, the existence of the fuction will make more sense.

The function is declared in <string.h> and returns a value of type size_t, that is the same type as the operator sizeof uses. So, remember to use the z modifier to the %u conversion specifier:

#include <string.h>

char s[] = "hello, world";

printf("Length of \"%s\": %zu\n", s, strlen(s));

Note that to include a literal " in a string, you just escape it.

More on functions/arrays

variable length array (VLA)

  • automatically allocated based on a variable value. Only for local variables, not globals. Same warning applies w.r.t. array size and stack growth as with fixed width arrays.
	void fn(int n, int m)
	{
	    int a[n + m];
	    ...
	}

K&R

history: you may still see in very old code: K&R definition with types of arguments on separate line(s):

int
foo(a, b)
int a; int b;
{
      /* body */
}

Some compilers have dropped the support for K&R definitions or will drop soon.

Variable shadowing

  • every identifier has its scope

  • the scope of a name is the part of the program within which the name can be used.

  • for example, we can declare a variable int i in the main function, and as well as in another function. Those two variables reference different storage.

  • for now, we only care about two types of scope - a file, and a block.

  • when declaring a variable within the outer block of the function, its scope is from the variable declaration to the end of the function block.

  • note that a block is used in multiple constructs in C, e.g. while, for, etc.

  • 👀 shadow.c 👀 shadow-block.c

Variable number of arguments

  • functions can have variable number of arguments of different types (more on that later)
    • the three dots is called ellipsis
void func(fmt, ...);
  • the first argument usually describes the rest of arguments in some way
    • usually it is a format string. It could possibly be an argument count if they were of the same type however in that case they would probably be passed as an pointer (we will get to pointers in a later class).

Function is not object

Like an array, function is not a first class object (i.e. no "functor" like in functional languages).

  • that said, there are pointers to functions (more on these later)

warmup

🔧 toupper

rewrite the convert small characters to upper case program from last time using a function.

solution: 👀 toupper.c

implementation notes

  • there is a toupper() library function from C99 so use mytoupper()
    • what happens if the program defines int toupper(int) ? (the symbol in libc is weak so it works)

variant: do this via lookup table - array

  • generate array indexed by a..z (with corresponding values A..Z) in main() and use the array in the mytoupper() function
  • arrays passed as function argument are in reality converted to pointers (to be introduced later)
    • so it is useless to write their size like this:
         void func(int array[3]); // see what is the value of sizeof(array) inside the func()
  • this is better:
         void func(int array[], size_t size);
  • also, if the items in the array are changed in the function, they will be changed in the array itself (consequence of pointer conversion)

  • what happens if mytoupper(-1) ?

  • short is sufficient to store the value:

    short uppper[] = { 'A', 'B', ... };

solution: 👀 toupper-table.c

Ternary operator

    cond ? expr1 : expr 2

same as

    if (cond) expr else expr2

e.g.

    max = (i > j) ? i : j;

note that if we add a semicolon, an expression becomes a statement, e.g.:

      (i == j) ? i++ : j++;

will increment i or j. the parens are not needed but generally used for better readability.

code: 👀 ternary-assign.c

🔧 toupper using ternary operator

rewrite the convert small characters to upper case program from last time using a function that utilizes the ternary operator (single line of code)

solution: 👀 toupper-ternary.c

🔧 min/max of 3 integer values

write an expression that returns maximum of 3 integers i,j,k using ternary operator

  • do not use any macros

solution: 👀 3max.c

quiz

see the code in 👀 ternary.c

  • is this possible to do with ternary operator ?
  • how would you fix it ?
  • the usual solution is to put this into macro. many libraries/programs define their own MAX/MIN macros.

Arithmetic type conversions

In (very) general, if binary operators have operands of different types, the "lower" is promoted to "upper". Eg. if one operand is a long double, the other operand is promoted to a long double.

What is more, in most operations (for every operator, the spec says what is done in this respect), char and short int operands are converted to ints, and to unsigned ints if they do not fit in an int. That conversion is called integer promotion. This is done to make the language runtime fast (on x86 32-bit arithmetics can be much faster than when using 16-bit operands).

Most operations means all binary operators. A ternary operator as well. And even some unary operators.

sizeof (1) is 4 because 1 is an int. However, if a number does not fit to an int, a higher type will be used. For example, 4294967296 (2^32) will be stored in 8 bytes, and sizeof (4294967296) is 8.

Does it sound confusing? Do not worry, we will give you specific rules later.

🔧 verify that numbers that do not fit in an int will have a size of 8 bytes.

Examples (assuming char c; declaration);

  • sizeof (999) is 4 as "999" is an integer by definition.
  • sizeof (c) is 1
  • sizeof (c + c) is 4 as + is a binary operator and the integer promotion kicks in
  • sizeof (++c) is still 1 as ++ is an unary operator for which the integer promotion rules do not apply.
  • sizeof (+c) is **4** as for unary +and-`, the integer promotion is applied.
  • sizeof (1LL) will usually be 8 as long long is usually 8 bytes.

It gets more interesting if unsigned and signed numbers are involved. Eg. a signed int is promoted to an unsigned int if one of the operands is unsigned.

The above is called implicit type conversion. There is also explicit type conversion (called casting) which we will deal with later.

I suggest you try these out with printf("%zu", ...). %zu (see "man 3 printf") matches the return type of the sizeof operand. The exact unsigned numeric type of what sizeof returns may differ in different implementations so %zu will work anywhere.

Negative numbers

Negative numbers are usually stored in two's complement (however, that is implementation defined by the standard).

In short, you take an absolute value, create one's complement (inverting the digits in binary representation) and add 1. There are several advantages of this representation, one is that there is only 1 zero (not negative and positive zero if we used the highest bit to track the sign).
That is why a signed char, for example, can hold -128 to 127, and not just -127 to 127.

For a char:

 10000000	-128
 ........
 11111101	  -3
 11111110	  -2      ^
 11111111	  -1      |
 --------------------------
 00000000	   0      |
 00000001	   1      v
 00000010	   2
 ........
 01111111	 127

On Unix systems the shell reports the -1 return value as 255 in echo $?. Even though the main() returns integer (4 bytes), the shell takes just low 8 bits and interprets them as unsigned quantity.

code: 👀 return-1.c

If long is 8 bytes, and an int 4, then -1L < 1U as you might expect.

However, -1 > 1U because -1 is promoted to unsigned. Two's complement representation of -1 is (see the paragraph above):

(1) take absolute value of 1	00000000.00000000.00000000.00000001
(2) one's complement		11111111.11111111.11111111.11111110
(3) add 1 to get 2's complement	11111111.11111111.11111111.11111111

which is 2^32 - 1 when interpreted as unsigned quantity.

Just printf("%u\n", -1) to see it will print 4294967295 (use the unix/linux bc command and type 2^32-1 to verify).

code: 👀 signed-plus-unsigned.c 👀 signed-to-unsigned.c

The assymetry of the negative/positive interval can lead to the program crashing on architectures that detect it. This is consequence of hardware handling rather than the language.

code: 👀 crashme.c - run with -INT_MIN (see limits.h) and -1. INT_MIN is usually -2147483648. - works in 64-bit mode as well due to int being passed in 32-bit registers

	$ cc -m64 crashme.c
	$ ./a.out 2147483648 -1
	-2147483648 -1
	Floating point exception: 8
	$ echo $?
	136
	$ kill -l 136
	FPE

	$ cc -m32 crashme.c
	$ ./a.out 2147483648 -1
	2147483647 -1
	$

Quiz 1

  • what is the result if 0xff signed char and 0xff unsigned char are compared using the == operator ?
    • write down the hexadecimal representation of the integers corresponding to the 2 chars with printf()

code: 👀 int-promotion.c - note that if the b character was defined as char b the result might be 1 because it is up to the compiler to choose whether char will be signed or unsigned. Usually it is signed though. There are compiler options to specify this, e.g. GCC has -funsigned-char and -fsigned-char

Quiz 2

Will the program print the whole array ?

  • try to come up with reason of the expected behavior before running the program.

code: 🔧 👀 whole-array.c

Function arguments

Say, what happens if you put a char as an argument of a function that accepts an int? As you might expect, the byte is just assigned to an int.

In general, the expressions passes as arguments are first evaluated, then the value of the resulted type is converted to the parameter type as in assignment.

More on that later. For more information, see 6.9.1 in C99

Compiler warnings

  • each compiler has its own set of warnings that usually give a clue that something strange and potentially harmful can happen.
    • there are differences between various compilers

    • will focus on GCC here

    • during the old days the tool producing these warnings was called 'lint' that had to be run separately

    • the basic: -Wall -Wextra

      • -Wall catches things like missing return value from a function that should return one

        code: 👀 no-return.c

      • there are many places where a beginner can shoot himself into a foot by not knowing the language intricacies.

        • e.g. variable cannot be modified more than once between sequencing points. The -Wsequence-point that is included in -Wall warns about that

        code: 👀 sequence-point-violation.c

    • the -Wshadow can catch shadow variables overriding:

      code: 👀 shadow.c 👀 shadow-block.c

    • all or specific warnings can be turned into errors: -Werrror or -Werror=<insert_specific_error> , respectively

      • unless the warnings contain false positives (and those can usually be suppressed) this will help to avoid troubles later
    • there are other means to check for correctness (static/dynamic analysis), more on those later

🔧 Use the options

Go through the programs written so far and run the compiler using the -Wall -Wextra options.

  • what kind of problems did you discover ? how to fix them ?
    • see e.g. 👀 whole-array.c example above (where only -Wextra gives some clue)

Explore the compiler documentation for more helpful options. Over the time you will find a set of warning options that will serve you well.

Integer overflow

  • what happens if int overflows ?

    • the behavior of overflow depends on whether the type is signed or unsigned
      • for signed types the behavior is undefined! I.e. you cannot rely on overflow of a positive quantity in a signed int will be turned into a negative number. Some compilers will allow to specify the behavior of signed overflows using special options (-fwrapv for GCC), though.
      • for unsigned, an overflow always wraps around (modulo power of 2) and is a defined behavior.

    code: 👀 int-overflow.c 👀 unsigned-overflow.c

    • use -fstrict-overflow -Wstrict-overflow (will become active only for higher optimization levels, i.e. -O<X> where X > 1) to stay on the safe side

Home assignments 🔧

Conway 1D

write a 1-D implementation of Conway's game of life

  • use rule 30 to determine the next
  • use two arrays (one 2-D and one 1-D) to represent the rules and its resulting values
    • there are 8 rules, each has 3 values to compare against and one result
  • we now have the arsenal to write 2-D variant however to display 2-D world some way to refresh the terminal would be needed so we will stick with 1-D.
    • use \r to reset the line between iterations
    • sleep for 1 second between iterations (unistd.h is needed for that)
  • each life "tick" will print the line representing the world
  • use functions to refactor the code
  • once having a working solution, try to rewrite it to be more efficient or elegant (or both)

byte count to human readable string

write function to convert uint64_t to human readable count (binary - see https://en.wikipedia.org/wiki/Orders_of_magnitude_(data) or https://en.wikipedia.org/wiki/Mebibyte, e.g. MiB as mebibyte, etc.) and print it to standard output

  void bytecnt2str(uint64_t num);
  • use character array to represent the magnitude letters and perform the lookup based on the actual magnitude

  • if there is a remainder, write a '+' following the number

  • write the output in single printf() (use ternary operator for the remainder)

  • example inputs/outputs:

1024		1 KiB
1025		1+ KiB
2047		1+ KiB
2048		2 KiB
2049		2+ KiB
5242880		5 MiB

solution: bytecnt2str.c

Offset checker

write a function that has the following prototype:

  bool check(long off, size_t size, size_t limit);
  • checks if the arguments are valid for accessing part of a file (no operations will be done, just the check) of size 'size' starting from an offset 'off'. The length of the file is given by the 'limit' argument.
  • an offset may be negative
    |           |         |              |
    0          off     off+size        limit
  • try to capture all corner cases of what could go wrong with the values and return false on failure, true on success.

solution: 👀 range-check.c

  • on Unix systems one would use ssize_t for the offset which is a signed integer type of the same size as size_t (this is not part of C99 but POSIX)

  • also, implement a set of test values in main() using an array