05.md

Topics

Warm-up, common mistakes, multi-dimensional arrays, home assignment, integer promotion and conversions.

Warm-up

Implement functionality provided by the following command with the specific options. Just hardcode the parameter in the code, e.g.

#define CHAR_DO_DEL     'x'

See the tr(1) manual page if unsure.

Variant 1: delete a single char (tr -d '<char>')

Example:

$ cat /etc/passwd | tr -d r > output
$ cat /etc/passwd | ./a.out > output2
$ diff output output2
$ echo $?
0

Variant 2: squeeze a single char (tr -s '<char>')

Squeeze multiple occurrences of the character into one.

$ echo "hellooooooo, wooooooorld" | tr -s o
hello, world

Common mistakes I.

I noticed several times already that you may end up doing the following:

#define MYDEF = 3

And you hit a syntax error when MYDEF is used later in the code.

What happens? A preprocessor replaces occurences of MYDEF with whatever is after MYDEF, and separated from it with a sequence of white spaces (tabs, spaces, newlines if escaped).

In the case above, MYDEF will be literary replaced with "= 3".

Check with gcc -E (or cpp) which stops after the preprocessor phase (ie. it does NOT compile anything). The following is in 👀 👀 common-mistake-with-define.c

#define MYDEF = 3

if (i < MYDEF) {
	// ...
}

The code above will end up in a syntax error as i < = 3 is not a correct expression in C (because of the space between < and =).

$ gcc -E common-mistake-with-define.c
...
...

if (i < = 3) {

}

Multidimensional arrays

  • multi-dimensional arrays: it is an array of arrays
int a[3][2] = { {0, 1}, {2, 3}, {4, 5} };
  • a is an array of 3 elements. Each of the element in an array of 2 elements.

    • we read it like that as the operator [], array subscripting, is evaluated left-to-right. See operator precedence for more information.
  • to access a given item all elements have to be specified in square brackets. i.e. a[1, 2] does not work, you have to use a[1][2]

  • in memory this is stored as consequent individual "lines". In other words, any array, no matter how many dimensiones it has, is stored as one piece of memory.

  • for 3-D array it looks as follows. a is an array of 2-dimensional arrays:

int a[2][4][3] = {
	{ {1, 2, 3}, {4, 5, 6}, {8, 7, 8}, {9, 1, 9} },
	{ {0, 0, 0}, {1, 1, 1}, {2, 2, 3}, {4, 4, 5} },
};
  • in memory:
| 1 2 3 | 4 5 6 | 8 7 8 | 9 1 9 |
| 0 0 0 | 1 1 1 | 2 2 3 | 4 4 5 |

👀 3d-array.c

  • as with the 1-dimensional array, the most "significant" dimension of the array may be omitted if statically initialized, but nothing below, i.e.
int a[][5] = { 0 };
int a[][3][6] = { 0 };

are fine. However:

int a[5][6][] = { 0 };
int a[5][][8] = { 0 };

are not.

  • the reason is that if you do not know how may "lines" are there, it's OK, but you always need to know the "length" of the "line" so that you can put them one by one.

  • see above for a[2][4][3] and its memory layout. We can easily add a new element of a which is a 2-dimensional array, but there is no way we can store the a array unless we know the exact dimensions of the 2-d subarray.

  • as before, if you do not initialize the rest, it will be set as 0. You can also used designated initializers same way as with one-dimensional arrays.

  • sizeof works as expected. So, the following prints a size of of an element of a, which is an [3][6] subarray of ints, so 72 is printed (3 * 6 * sizeof (int)) unless you are on something you borrowed from the Computer History Museum. If you happen to be in Silicon Valley, it is worth going to Mountain View to visit this one.

int a[][3][6];
printf("sizeof = %zu\n", sizeof (array[0]));

🔧 Define an array [4][2], statically initialize it, and write a function that accepts such an array and prints the first element from each [2] subarray.

👀 2d-array.c

🔧 write a program that takes a 2-D array of integers and constructs a 1-D array of maximum values in each sub-array, then prints out the new array to the standard output. For the maximum value in a sub-array, write a function.

👀 2d-array-max.c

VLA, multi-dimensional arrays, and function arguments

You can use variable-length arrays in function arguments themselves, like this:

int
myfn(int width, int a[][width])
{
	...
}

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Mountain scenery generator

🔧 Write a simple moutain generator. At every iteration you go either straight (-), up (/), or down (\). The program generates something like the following:

$ a.out

               /                         /--
            / / \- /--                  /   \                       / /-
           / \    \   \              /--     \-                   /- \  \
          /            \            /          \             /-- /       \
      / /-              \-  /--   /-            \-          /   \         \
     / \                  \-   \ /                \-  /-- /-
    /                           \                   \-   \
 /--
-

With all the variants below, try to make the code as simple as possible. You can do really cool stuff with quite little code.

You will need a two-dimensional array to fill, then print. Use memset() to initialize the array with spaces. Check the man page for memset. For the first argument of memset, pass an array name. Its dimension does not matter in this case as long as you use its correct size in bytes.

For random numbers, use rand() and % for modulo. To initialize the random generator, use sranddev() if you have it, or srand(time(NULL)). Check the documentation if unsure (each function is supposed to have its manual page).

There is no language construct to initialize all elements of an array with a specific non-zero value, that is why we need memset. You can only zero it out using an initializer { 0 }, as we already know.

The algorithm goes from left to right, one character at a time. At each point it decides whether the mountain will grow, descend or remain the same (hence the random numbers).

Once you got a working program, refactor the code into small functions (one for printing a character based on random number, one for printing the whole 2-D array, etc.). Optionally you can try to avoid global variables by passing the array as parameter of a function. In that case, you might try to use a VLA in function arguments to see it works. See multi-dimensional arrays for more information.

👀 mountain-generator.c

Variant: nicer mountain

You can make it more complicated and make the ascii art smoother. For example, you can define that after / you cannot go one character down with \ (see above what we mean) but you could do /\, etc. You would need to keep a state of the previous character. You could generate something like this (use your imagination):

            .
           / \
          /   \__
      /\_/       \__/|
     /               |
    /                \__/............................
  _/

Variant: mountain range

The top-level function (mountain()) can be also called with the array (and its dimensions) as input and you can try calling it multiple times to see if a mountain range can be generated.

Variant: add snow caps

Usually, there is snow on the peaks.

Variant: ???

Come up with something else.

However, whatever you do, the objective is to write simple code. So, If you have something cool, send us the code, please!

Arithmetic promotion and conversions

NOTE: the following is a simplified version of what is in the standard. You need to consult the specification if unsure. See the standard for the PDF.

We already mentioned some of this in an Arithmetic type conversions section but now we will be more specific.

Integer promotion vs integer conversion vs arithmetic conversions

First, let's define three different actions, then we will go through those one by one.

Integer promotion: char -> int; short -> int; or both to unsigned int if the value does not fit a signed int (may happen if a short int is of the same size as an int, and unsigned short is used).

Integer conversion: converting integers (eg. assigning a signed long to an unsigned char, or assigning an unsigned int to a signed char)

Arithmetic conversion: many operators cause conversions. The effect is to bring the operands into a common type before the operator is applied. For example:

int i;
unsigned long long ll;

i + ll;		// type of the result is unsigned long long and 'i'
		// is converted to unsigned long long BEFORE '+' is
		// applied.  See below what happens if 'i' contains a
		// negative number.

Integer Promotion

Integer promotion (we promote chars and shorts only here) usually happens with binary and ternary operators before arithmetic conversion happens. It sometimes happens with unary operators as well. You need to consult the specification. For example:

++ and --	*NO* integer promotion
! (negation)	*NO* integer promotion
+ and -		integer promotion happens

That means for char c:

sizeof (c)	// is	1
sizeof (++c)	// is	1
sizeof (--c)	// is	1
sizeof (!c)	// is	1
sizeof (+c)	// is	4
sizeof (-c)	// is	4
sizeof (c + 1)	// is	4
// ...

Note that we already know that c is never modified as the expression in sizeof is never evaluated. See the sizeof module for to review the knowledge.

Printing sizeof value

Remember, to printf a value of sizeof, use it like the following (the sizeof result is always unsigned, and the z modifier makes sure its size (usually 4 or 8 bytes) matches the implementation, ie. whatever size_t is):

printf("%zu\n", sizeof (c));

Consult the printf man page if unsure about printf conversions and modifiers.

Integer promotion also happens in arguments of variadic functions (eg. printf). That is why the following works as expected:

char c = 'A';
printf("%c", c);

printf(3) man page says:

        c	The int argument is converted to an unsigned char, and the
		resulting character is written.

So it means that %c expects an int but we put a char there. However, since an integer promotion is applied arguments of variadic functions, c is converted to an int before it is used as an argument to printf. Note that on a 64-bit x86 platform, this conversion means no extra code as the argument (= char) is just put to a 32-bit register.

Integer conversion

This is about converting integers only.

There are three parts when converting integers:

  1. assigning an integer to another integer while the source value fits into the target -> the result is specified by the standard. It is the same value.

  2. assigning any integer to an unsigned integer -> result specified by the standard, see below.

  3. assigning an integer to a signed integer and the value does not fit. The standard says the result is implementation-defined.

Implementation-defined means the implementation (ie. the compiler) must choose how to behave in such a situation and must document it. See the standard, section 3.4.1, for the precise definition. See also behavior types.

The first rule is simple and needs no more discussion.

Integer to an unsigned integer

6.3.1.3 Signed and unsigned integers ... ...if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Use bc(1) to work with big numbers:

$ bc
2^32
4294967296
(2^32 + 17) % 2^16
17
...
...

So, the following happens when assigning the numbers below to an unsigned char:

257 -> 1 (257 - 256)
258 -> 2 (258 - 256)
1000 -> 232 (1000 - 3*256)
4294967279 -> 239 (4294967279 - 2^32 + 256)

Examples in C:

/* 'c' will be 24 */
unsigned char c;
c = -1000;		// 24 + 4 * 256

/* 'i' will be 4294967286 */
unsigned int i;
i = -10;		// -10 + 2^32

Assign to a signed type when the value does not fit

When any integer is converted to a signed type, the value is unchanged if it can be represented in the new type and is implementation-defined otherwise.

For gcc, that implementation-defined behavior is documented in Integers implementation

  The result of, or the signal raised by, converting an
  integer to a signed integer type when the value cannot be
  represented in an object of that type (C99 6.3.1.3):

	For conversion to a type of width N, the value is
	reduced modulo 2^N to be within range of the type; no
	signal is raised.

So, with gcc (and probably any other compilers you might meet today), this means the wrap-around rule is applied for signed integers as well. However, let us repeat that the following is an example of implementation-defined behavior in this case tied to the gcc compiler.

signed char c = 128;	// 128 - 256, ie. -128 will be in 'c'.  Might surprise
			// one, right?

printf's hh modifier is for printing a char, h for a short. Note that i is first converted to an int if it is not already, as arguments of variadic functions goes through integer promotion, as we already know. Then, it is converted to a char inside printf.

/* This will print 1 if compiled with gcc. */
int i = 257;
printf("%hhd\n", i);

The compiler will probably warn you to let you know that your int might be truncated.

xxx.c:10:19: warning: format specifies type 'char' but the argument has
	     type 'int' [-Wformat]
printf("%hhd\n", i);
	~~~~     ^
	%d

Generic integer conversion algorithm

BTW, when mentioning above those three parts when converting integers, it does not hurt to cite the C99 standard in full on this. The following section covers all the integer conversions we went through above. The below mentioned "implementation-defined signal" might be to print an error and exit, for example.

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer type other than
 _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly
  adding or subtracting one more than the maximum value that can be represented in
  the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it;
  either the result is implementation-defined or an implementation-defined signal
  is raised.

Example: 👀 integer-conversion.c

Arithmetic conversion

Simply put, arithmetic conversion is about converting all arguments of an operator to the smallest common type before the operator is applied. That includes arithmetic operators as well as logical operators (>, <, <=, ==, >=)

Example:

char c;
signed int i;
unsigned int u;
long long ll;

c + i + u + ll;

We will learn about associativity later, but accept that the compiler actually assumes the following:

((c + i) + u) + ll)

What happens:

  1. c is converted to an int based on Integer Promotion
  2. so the result type of (c + i) is an int
  3. that result is converted to an unsigned int because of u based on Integer Conversion we just introduced above.
  4. the result type of (c + i + u) is then an unsigned int
  5. the result from 4) is then converted to a long long because of ll
  6. a long long is the result type of the whole expression

Note that at every step, Integer Conversion rules were applied.

Specifically, an unsigned integer type is "bigger" than the corresponding signed type.

int i;
unsigned int u;

i + u;		// 'i' is first converted to unsigned int

That is why -1 > 1U is TRUE since -1 is first converted to an unsigned int, ie. -1 is converted into 2^32-1, ie. 4294967295, and the expression is then evaluated as:

(4294967295U > 1U)

A ternary operator also first converts its arguments to a common type no matter which branch will be returned. For example:

unsigned int i = 0 ? 1U : -1;
printf("%u\n", i);		// will print 2^32-1 (ie. 4294967295)

However, if you do this:

int i = 0 ? 1U : -1;

Then the result of the ternary expression is 4294967295 but then it is converted to a signed int. And 4294967295 does not fit an int whose range in two's complement (gcc) is [-2^31, 2^31-1]. So, if we rely on gcc implementation-defined behavior, there is a modulo operation to the rescue and we get -1:

// (-1 + 4294967295) == 2^32
int
main(void)
{
	int i = 0 ? 1U : -1;
	printf("%d\n", i);
}

The compiler will warn you though as you are triggering something that is implementation-defined:

$ cc main.c
"main.c", line 6: warning: initializer does not fit or is out of
range: 0xffffffff
$ ./a.out
-1

Signed numbers

The spec does not say how signed numbers are internally stored. That is just an implementation detail. There are other ways to do it than via the two's complement. See Signed number representations.

However, gcc, clang, and all other compilers you probably ever meet use two's complement, as documented on Integers implementation

-  Whether signed integer types are represented using sign and
   magnitude, two's complement, or one's complement, and
   whether the extraordinary value is a trap representation or
   an ordinary value (C99 and C11 6.2.6.2).

	GCC supports only two's complement integer types, and
	all bit patterns are ordinary values.

🔧 Assignment to practice your knowledge

Try to figure out using bc(1) and what you have just learned what will be in the variables:

/*
 * Let's assume that:
 *
 *      short int is 2 bytes
 *      int is 4 bytes
 *      long is 8 bytes
 *      long long is 8 bytes
 *
 * Compile with "gcc -m64" to match the above.
 *
 * Let's assume that we use gcc that uses modulo to fit when assigning to a
 * signed integer.
 *
 * Include <limits.h> for INT_MIN (-2^31) and INT_MAX (2^31 - 1).
 */

/*
 * Use bc(1) to manually compute the values of the left sides.  When done,
 * incorporate into the C code, compile, and verify you got it right.
 */

unsigned long long ll = -13;            // Use %llu in printf() to verify.
signed char c = 999;                    // %d
short int si = -1;                      // %hd
unsigned int = -1;                      // %u
unsigned short int usi = 999999;        // %hu
signed char c = 0 ? -10 : 0U;           // %d
signed char c = -1 ? -10 : 0U;          // %d
long long ll = 1U + -10;                // %lld
unsigned long long ull = 1U + -10;      // %llu
unsigned short int si = INT_MIN + 13LU; // %hu
unsigned short int si = -INT_MAX + 13U; // %hu
signed char c = 129;                    // %d

/* What is printed? */
printf("%hhu\n", -3);
printf("0x%hhx\n", -3);
printf("%c\n", 321);
printf("%c\n", -191);

Also check the following code and try to figure out what is gonna be printed before actually building and executing the code:

👀 int-promotion.c

Now you also know everything you need to know to figure out what happens in the following code. Hint, not all elements are printed as it seems from the first look. Another hint -- sizeof's type is unsigned, and the logical operator triggers the Arithmetic Conversion, see above.

👀 whole-array.c