Warm-up, common mistakes, multi-dimensional arrays, integer promotion and conversions.
Implement functionality provided by the following command with the specific options. Just hardcode the parameter in the code, e.g.
#define CHAR_DO_DEL 'x'
See the tr(1) manual page if unsure.
Example:
$ cat /etc/passwd | tr -d r > output
$ cat /etc/passwd | ./a.out > output2
$ diff output output2
$ echo $?
0
Squeeze multiple occurrences of the character into one.
$ echo "hellooooooo, wooooooorld" | tr -s o
hello, world
Now for both variants, you can extend it to use not a single character,
but a character set (using character arrays), just like tr
does.
Some of you often end up doing the following:
#define MYDEF = 3
And you hit a syntax error when MYDEF
is used later in the code.
What happens? The preprocessor replaces occurrences of MYDEF with whatever comes after MYDEF, separated from it by a sequence of white space (tabs, spaces, newlines if escaped). In the case above, MYDEF will be literally replaced with = 3.
Check with gcc -E (or cpp), which stops after the preprocessor phase (i.e. it does NOT compile anything).
👀 common-mistake-with-define.c
#define MYDEF = 3
if (i < MYDEF) {
// ...
}
The code above will end up in a syntax error as i < = 3 is not a correct expression in C because of the space between < and =. Note that there is an existing operator <= which means the obvious, but having the separate operators < and = next to each other is illegal.
$ gcc -E common-mistake-with-define.c
...
...
if (i < = 3) {
}
See the preprocessor for more information.
- Multi-dimensional arrays: it is an array of arrays
int a[3][2] = { {0, 1}, {2, 3}, {4, 5} };
- a is an array of 3 elements. Each of the elements is an array of 2 elements.
  - we read it that way because the operator [], called array subscripting, is evaluated left-to-right. See operator precedence for more information.
- To access a given item, all subscripts (= indexes) have to be specified in square brackets, i.e. a[1, 2] does not work, you have to use a[1][2]. However, beware that a[1, 2] is not a syntax error as 1, 2 are two operands separated by a comma operator, so a[1, 2] effectively means a[2].
- In memory this is stored as consecutive individual "lines". In other words, any array, no matter how many dimensions it has, is stored as one piece of contiguous memory.
- For a 3-D array it looks as follows. a is an array of 2-dimensional arrays:
int a[2][4][3] = {
{ {1, 2, 3}, {4, 5, 6}, {8, 7, 8}, {9, 1, 9} },
{ {0, 0, 0}, {1, 1, 1}, {2, 2, 3}, {4, 4, 5} },
};
- in memory stored as:
| 1 2 3 | 4 5 6 | 8 7 8 | 9 1 9 | 0 0 0 | 1 1 1 | 2 2 3 | 4 4 5 |
- As with the 1-dimensional array, the most "significant" dimension of the array may be omitted if statically initialized, but nothing below, i.e. these are fine:
int a[][5] = { 0 };
int a[][3][6] = { 0 };
However these are not:
int a[5][6][] = { 0 };
int a[5][][8] = { 0 };
- The reason is that if you do not specify how many "lines" there are in memory, it is OK as that can be recognized from the initializer, but you always need to know the "length" of the "line" so that you can store them one by one.
- See above for a[2][4][3] and its memory layout. We can easily add a new element of a, which is a 2-dimensional array, but there is no way we can store the a array unless we know the exact dimensions of the 2-d subarray.
- As before, if you do not initialize the rest, it will be set to 0. You can also use designated initializers the same way as with one-dimensional arrays.
- The operator sizeof works as expected. So, the following piece of code prints the size of an element of a, which is a [3][6] subarray of ints, so 72 is printed (3 * 6 * sizeof (int)) unless you are on something you borrowed from the Computer History Museum. Side note: if you happen to be in Silicon Valley, it is worth going to Mountain View to visit this one.
int a[][3][6] = { 0 };
printf("sizeof = %zu\n", sizeof (a[0]));
🔧 Define a 2-dimensional array of integers with dimensions [4][2], statically initialize it, and write a function that accepts such an array and prints the first element from each [2] subarray.
🔧 Write a program that takes a 2-D array of integers and constructs a 1-D array of maximum values in each sub-array, then prints out the new array to the standard output. For the maximum value in a sub-array, write a function.
You can use variable-length arrays in function arguments themselves, like this:
int
myfn(int width, int a[][width])
{
...
}
As mentioned before, this generates more code in comparison to statically defined array dimensions and we do not recommend using it.
NOTE: the following is a simplified version of what is in the standard. You need to consult the specification if unsure. See the standard for the PDF link.
We already mentioned some of this in an Arithmetic type conversions section but now we will be more specific.
First, let's define three different actions, then we will go through those one by one.
Integer promotion: char -> int; short -> int; or both to unsigned int if the value does not fit a signed int (may happen if a short int is of the same size as an int, and unsigned short is used).
Integer conversion: converting integers (e.g. assigning a signed long to an unsigned char, or assigning an unsigned int to a signed char).
Arithmetic conversion: many operators cause conversions. The effect is to bring the operands into a common type before the operator is applied. For example:
int i;
unsigned long long ull;
i + ull; // type of the result is unsigned long long and 'i'
// is converted to unsigned long long BEFORE '+' is
// applied. See below what happens if 'i' contains a
// negative number.
Integer promotion (we promote chars and shorts only here) usually happens with binary and ternary operators before arithmetic conversion happens. It sometimes happens with unary operators as well. You need to consult the specification. For example:
++ and -- *NO* integer promotion
! (negation) *NO* integer promotion, but the result always has type int
+ and - integer promotion happens for both unary and binary operations
That means for char c:
sizeof (c) // is 1
sizeof (++c) // is 1
sizeof (--c) // is 1
sizeof (!c) // is 4, the result of '!' has type int
sizeof (+c) // is 4
sizeof (-c) // is 4
sizeof (c + 1) // is 4
// ...
Note that we already know that c is never modified as the expression in sizeof is never evaluated. See the sizeof module to review the knowledge.
Remember, to printf
a value of sizeof
, use it like the following (the
sizeof
result is always unsigned, and the z
modifier makes sure its size
(usually 4 or 8 bytes) matches the implementation, ie. whatever size_t
is):
printf("%zu\n", sizeof (c));
Consult the printf
man page if unsure about printf conversions and modifiers.
Integer promotion also happens in arguments of variadic functions (eg.
printf
). That is why the following works as expected:
char c = 'A';
printf("%c", c);
printf(3) man page says:
c The int argument is converted to an unsigned char, and the
resulting character is written.
So it means that %c
expects an int
but we put a char
there. However,
since an integer promotion is applied on non-fixed arguments of
variadic functions,
c
is converted to an int
before it is used as an argument to printf
.
Note that on a 64-bit x86 platform, this conversion means no extra code as the
argument (= char
) is just put to a 32-bit register.
This is about converting integers only.
There are three parts when converting integers:
-
Assigning an integer to another integer while the source value fits into the target -> the result is specified by the standard. The result is the same value.
-
Assigning any integer to an unsigned integer -> result specified by the standard, see below.
-
Assigning an integer to a signed integer and the value does not fit. The standard says the result is implementation-defined. It means the implementation (i.e. the compiler) must choose how to behave in such a situation and must document it. See the standard, section 3.4.1, for the precise definition. See also types of behavior.
The first rule is simple and needs not much discussion.
long long int li = 13;
/* It is guaranteed that the one-byte 'c' will be 13 even though sizeof (li) is 8. */
signed char c = li;
Note that the case already showed above, char c = 'A';
, also fits this rule.
'A'
is a character constant from ASCII, so its type is an int
, and its value
fits a char
no matter whether the char
type is signed or unsigned. ASCII by
its specification only uses values 0-127 and even signed char
is required to
accommodate that range. So, as 'A'
fits the c
object, this situation is
covered by the first rule.
6.3.1.3 Signed and unsigned integers ... ...if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Use bc(1) to work with big numbers:
$ bc
2^32
4294967296
(2^32 + 17) % 2^16
17
...
...
So, the following happens when assigning the numbers below to an unsigned char
:
unsigned char c = 257; // -> 1 (257 - 256)
unsigned char c = 258; // -> 2 (258 - 256)
unsigned char c = 1000; // -> 232 (1000 - 3*256)
/* -> 239 (4294967279 - 2^32 + 256), and it's the same as 4294967279 % 256 */
unsigned char c = 4294967279;
Examples in C:
/* 'c' will be 24 */
unsigned char c;
c = -1000; // -1000 + 4 * 256
/* 'i' will be 4294967286 */
unsigned int i;
i = -10; // -10 + 2^32
When any integer is converted to a signed type, the value is unchanged if it can be represented in the new type (that is rule 1 above) and is implementation-defined otherwise.
For gcc, that implementation-defined behavior is documented in Integers implementation
The result of, or the signal raised by, converting an
integer to a signed integer type when the value cannot be
represented in an object of that type (C99 6.3.1.3):
For conversion to a type of width N, the value is
reduced modulo 2^N to be within range of the type; no
signal is raised.
So, with gcc
(and probably any other compilers you might meet today), this
means the wrap-around rule is applied for signed integers as well. However, let
us repeat that the following is an example of implementation-defined behavior
tied to the gcc compiler.
signed char c = 128; // 128 - 256, ie. -128 will be in 'c'. Might surprise
// one, right?
printf's hh modifier is for printing a char (as a number, do not confuse it with the c conversion which prints a character), h for a short. Note that i is first converted to an int if it is not already, as arguments of variadic functions go through integer promotion, as we already know. Then, it is converted to a char inside printf.
/* This will print 1 if compiled with gcc. */
int i = 257;
printf("%hhd\n", i);
The compiler will probably warn you to let you know that your int
might be
truncated.
xxx.c:10:19: warning: format specifies type 'char' but the argument has
type 'int' [-Wformat]
printf("%hhd\n", i);
~~~~ ^
%d
BTW, when mentioning above those three parts when converting integers, it does not hurt to cite the C99 standard in full on this. The following section covers all the integer conversions we went through above. The below mentioned implementation-defined signal might be to print an error and exit, for example.
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than
_Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be represented
in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it;
either the result is implementation-defined or an implementation-defined
signal is raised.
Example: 👀 integer-conversion.c
Simply put, arithmetic conversion is about converting all operands of an operator to the smallest common type before the operator is applied. That includes arithmetic operators as well as relational and equality operators (>, <, <=, ==, >=).
The details are in 6.3.1.8 Usual arithmetic conversions, paragraph 1.
Example:
char c;
signed int i;
unsigned int u;
long long ll;
c + i + u + ll;
We will learn about associativity later, but let's accept that the compiler actually assumes the following:
(((c + i) + u) + ll)
What happens:
1. c is converted to an int based on Integer Promotion with a binary + operator
2. the result type of (c + i) is an int as both operands are ints
3. that result is converted to an unsigned int (= type of u which is "bigger" than int) using the Integer Conversion we just introduced above.
4. the result type of (c + i + u) is then an unsigned int
5. the result from 4) is then converted to a long long because of the "bigger" type of ll
6. the long long is the result type of the whole expression
Note that at every step, Integer Conversion rules were applied. Also, an unsigned integer type is "bigger" than the corresponding signed type, see the integer conversion rank.
int i;
unsigned int u;
i + u; // 'i' is first converted to unsigned int
That is why -1 > 1U is TRUE since -1 is first converted to an unsigned int, i.e. -1 is converted into -1 + 2^32, i.e. 4294967295, and the expression is then evaluated as:
(4294967295U > 1U)
The arguments of ternary operator are also first converted to a common type no matter which branch will be returned. For example:
unsigned int i = 0 ? 1U : -1;
printf("%u\n", i); // will print 2^32-1 (ie. 4294967295)
However, if you do this:
int i = 0 ? 1U : -1;
Then the result of the ternary expression is 4294967295 but then it is converted to a signed int. And 4294967295 does not fit an int whose range in two's complement (gcc) is [-2^31, 2^31-1]. So, if we rely on gcc implementation-defined behavior, there is a modulo operation to the rescue and we get -1:
#include <stdio.h>

int
main(void)
{
	int i = 0 ? 1U : -1;	// 4294967295 - 2^32 == -1
	printf("%d\n", i);
}
The compiler may (or may not!) warn you as you are triggering something that is implementation-defined:
$ cc main.c
"main.c", line 6: warning: initializer does not fit or is out of
range: 0xffffffff
$ ./a.out
-1
The spec does not say how signed numbers are internally stored. That is just an implementation detail. There are other ways to do it than via the two's complement. See Signed number representations.
However, gcc, clang, and all other compilers you will probably ever meet use two's complement, as documented on Integers implementation:
- Whether signed integer types are represented using sign and
magnitude, two's complement, or one's complement, and
whether the extraordinary value is a trap representation or
an ordinary value (C99 and C11 6.2.6.2).
GCC supports only two's complement integer types, and
all bit patterns are ordinary values.
Try to figure out using bc(1) and what you have just learned what will be in the variables:
/*
* Let's assume that:
*
* short int is 2 bytes
* int is 4 bytes
* long is 8 bytes
* long long is 8 bytes
*
* Compile with "gcc -m64" to match the above.
*
* Let's assume that we use gcc that uses modulo to fit when assigning to a
* signed integer.
*
* Include <limits.h> for INT_MIN (-2^31) and INT_MAX (2^31 - 1).
*/
/*
* Use bc(1) to manually compute the values of the left sides. When done,
* incorporate into the C code, compile, and verify you got it right.
*/
unsigned long long ull = -13; // Use %llu in printf() to verify.
signed char sc = 999; // %d
short int si = -1; // %hd
unsigned int ui = -1; // %u
unsigned short int usi = 999999; // %hu
signed char sc = 0 ? -10 : 0U; // %d
signed char sc = -1 ? -10 : 0U; // %d
long long ll = 1U + -10; // %lld
unsigned long long ull = 1U + -10; // %llu
unsigned short int usi = INT_MIN + 13LU; // %hu
unsigned short int usi = -INT_MAX + 13U; // %hu
signed char sc = 129; // %d
/* What is printed? */
printf("%hhu\n", -3);
printf("0x%hhx\n", -3);
printf("%c\n", 321);
printf("%c\n", -191);
The code is in here to verify: 👀 integer-conversion-assignment.c
Also check the following code and try to figure out what is going to be printed:
Now you also know everything you need to know to figure out what happens in the following code. Hint: not all elements are printed, contrary to what it seems at first look. Another hint: sizeof's type is unsigned, and the comparison operator triggers the Arithmetic Conversion, see above.
Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.
🔧 Write a simple mountain generator. At every iteration you go either straight (-), up (/), or down (\). The program generates something like the following:
$ a.out
/ /--
/ / \- /-- / \ / /-
/ \ \ \ /-- \- /- \ \
/ \ / \ /-- / \
/ /- \- /-- /- \- / \ \
/ \ \- \ / \- /-- /-
/ \ \- \
/--
-
With all the variants below, try to make the code as simple as possible. You can do really cool stuff with quite little code.
You will need a two-dimensional array to fill, then print. Use memset()
to
initialize the array with spaces. Check the man page for memset
. For the
first argument of memset
, pass an array name. Its dimension does not matter
in this case as long as you use its correct size in bytes.
For random numbers, use rand()
and %
for modulo. To initialize the random
generator, use sranddev()
if you have it, or srand(time(NULL))
. Check the
documentation if unsure as each function is supposed to have its manual page.
There is no language construct to initialize all elements of an array with a
specific non-zero value, that is why we need memset
. You can only zero it out
using an initializer { 0 }
, as we already know.
The algorithm goes from left to right, one character at a time. At each point it decides whether the mountain will grow, descend or remain the same (hence the random numbers).
Once you have a working program, refactor the code into small functions (one for printing a character based on a random number, one for printing the whole 2-D array, etc.). Optionally you can try to avoid global variables by passing the array as a parameter of a function. In that case, you might try to use a VLA in function arguments to see that it works. See multi-dimensional arrays for more information.
You can make it more complicated and make the ASCII art smoother. For example, you can define that after / you cannot go one character down with \ (see above what we mean) but you could do /\, etc. You would need to keep a state of the previous character. You could generate something like this (use your imagination):
.
/ \
/ \__
/\_/ \__/|
/ |
/ \__/............................
_/
The top-level function (mountain()) can also be called with the array (and its dimensions) as input and you can try calling it multiple times to see if a mountain range can be generated.
Usually, there is snow on the peaks.
Come up with something else.
However, whatever you do, the objective is to write simple code. So, if you have something cool, send us the code, please!