Rewrite the convert small characters to upper case program using a function.
- There is a
toupper()
library function from C99 so use a different name, saymytoupper()
- What happens if the program defines
int toupper(int)
?- The symbol in
libc
is weak so it works
- The symbol in
- What happens if the program defines
- Generate array indexed by
a
..z
(with corresponding valuesA
..Z
) inmain()
and use the array in themytoupper()
function - Arrays passed as function argument are in reality converted to pointers (to be
introduced later)
- So, it is useless to write their size like this:
/* see what is the value of sizeof(array) inside the func() */
void func(int array[3]);
- This is better:
void func(int array[], size_t size);
-
Also, if the items in the array are changed in the function, they will be changed in the array itself (consequence of pointer conversion).
-
What happens if
mytoupper(-1)
? -
short
is sufficient to store the value:
short uppper[] = { 'A', 'B', ... };
There are multiple variants of the grep(1) Unix utility. One of them is
fgrep
aka grep -F
that prints the lines received on standard input which
contain given fixed string (as opposed to a regular exception).
fgrep
can match multiple strings.
🔧 Write code to implement part of the fgrep
functionality where the
program will match a single string which is hard-coded in the program
in the form of (boundless) array (char needle[] = "foobar";
).
The program will use getchar()
to read the characters from the standard input.
Sample program run (with the search string hard-coded as bash
):
$ cat /etc/passwd | ./a.out
root:x:0:0:root:/root:/bin/bash
foo:x:1000:1000:Foo,,,:/home/foo:/bin/bash
backu:x:1001:1001:backup,,,:/home/backu:/bin/bash
perforce:x:133:141:Perforce Admin:/opt/perforce:/bin/bash
compared to the fgrep(1) utility:
$ fgrep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
foo:x:1000:1000:Foo,,,:/home/foo:/bin/bash
backu:x:1001:1001:backup,,,:/home/backu:/bin/bash
perforce:x:133:141:Perforce Admin:/opt/perforce:/bin/bash
Assume a reasonable maximum line width, e.g. 1024 bytes including the newline.
Print an error and exit with 1 (i.e. return (1)
from main
) when hitting a
line that crosses that limit. Test that it works as expected!
The line limit is enforced to make the program simple, i.e. prevent using huge amounts of memory and avoid memory reallocation.
Note that it is generally better to fail rather then silently choose what to do on situations like that. For example, just ignoring the rest of the line after the 1024 byte limit which might be very confusing for users.
Write tests (as a home assignment), watch out for corner cases.
Solution: TBD. You can send us your solution so that we use it here. Do it as simple as possible.
We know that we can use sizeof
to get a length of a character array including
the terminating '\0'
character. There is a more generic way to get a string
length, function strlen()
that returns the number of characters without
the terminating zero. Note that when we know what pointers are, the
existence of the function will make more sense.
The function is declared in <string.h>
and returns a value of type size_t
,
that is the same type as the operator sizeof
uses. So, remember to use the
z
modifier to the %u
conversion specifier:
#include <string.h>
char s[] = "hello, world";
printf("Length of \"%s\": %zu\n", s, strlen(s));
Note that to include a literal "
in a string, it is necessary to escape it
with the backslash.
strlen()
can be used to check if string is empty, however if not, it will
incur performance penalty as it has to go through all the characters.
Thus, it is better to check just the first character - if it is \0
then
the string is empty.
- Automatically allocated based on a variable value. Only for local variables, not globals. Same warning applies w.r.t. array size and stack growth as with fixed width arrays.
void fn(int n, int m)
{
int a[n + m];
...
}
-
❗ Note that it generates more code which might be avoided with a choosing a reasonable constant, if possible. Also see a rant from Linus.
History: you may still see in very old code: K&R (Kernighan and Ritchie) definition with types of arguments on separate line(s):
int
foo(a, b)
int a; int b;
{
/* body */
}
Some compilers have dropped the support for K&R definitions or will drop it soon.
Like an array, function is not a first class object (i.e. no "functor" like in functional languages).
- That said, there are pointers to functions (more on these later).
-
Every identifier has its scope.
-
The scope of an identifier is the part of the program within which the identifier name can be used.
-
For example, we can declare a variable
int i
in themain
function, and as well as in another function. Those two variables reference different objects. -
For now, we only care about two types of scope - a file, and a block.
-
When declaring an identifier inside a block or within the list of parameter declarations in a function definition, the identifier has block scope. Its scope is from the identifier declaration to the end of the block.
-
When declaring an identifier outside of any block or list of parameters, the identifier has file scope.
-
Note that a block is used in multiple constructs in C, e.g.
while
,for
, etc. -
Within the inner block, an identifier of the same name declared in the outer block is hidden (and not visible). See:
- Functions can have variable number of arguments of different types (more on
that later).
- It must have at least one fixed argument (or more), and the rest is
denoted with three dots
...
- The dots are called ellipsis
- It must have at least one fixed argument (or more), and the rest is
denoted with three dots
int printf(fmt, ...);
- The first argument typically describes the rest of arguments in some way
- Usually it is a format string as in
printf
. It could possibly be an argument count if those were of the same type however in that case they would probably be passed as an pointer (we will get to pointers in a later class).
- Usually it is a format string as in
🔧 write a function that takes variable number of ULL
arguments and
returns their sum
- Use the stdarg(3) man page.
- What happens if there is one
int
value passed in between theULL
arguments?
🔑 stdarg.c
We already saw in the numbers module that you need to
match the type of arguments with the type of conversion specifiers. Why does
the following work? We are using an argument of type char
, which is 1 byte,
but the printf
functions assumes through the %d
specifier there are 4 bytes
on stack (x32) or in the register (x64). Yet it works. Why?
char c = 13;
printf("%d", c);
cond ? expr1 : expr 2
Is the same as:
if (cond) expr else expr2
Example:
max = (i > j) ? i : j;
Note that if we add a semicolon, an expression becomes a statement, e.g.:
(i == j) ? i++ : j++;
That will increment i
or j
. The parentheses are not needed but generally
used for better readability.
code: 👀 ternary-assign.c
Rewrite the convert small characters to upper case program from last time using a function that utilizes the ternary operator (single line of code)
Write an expression that returns maximum of 3 integers i
, j
, k
using
ternary operator. Do not use any macros (in case you know those).
🔑 3max.c
The way negative numbers are stored is implementation defined in C99. In our world, they are usually stored in two's complement format. However, the other allowable options are one's complement and sign and magnitude.
In short, to get two's complement, you take an absolute value, create one's
complement (inverting the digits in binary representation) and add 1. There are
several advantages of this representation, one is that there is only 1 zero (no
negative and positive zero if we used the highest bit to track the sign). That
is why a signed char
, for example, can hold -128 to 127, and not just -127 to
127.
For a char
:
10000000 -128
........
11111101 -3
11111110 -2 ^
11111111 -1 |
--------------------------
00000000 0 |
00000001 1 v
00000010 2
........
01111111 127
On Unix systems the shell reports the -1 return value as 255 in echo $?
. Even
though the main()
returns integer (4 bytes), the calling program (shell) gets
just the low 8 bits and interprets them as unsigned quantity.
In (very) general, if binary operators have operands of different types, the
"lower" is promoted to "upper". E.g. if one operand is a long
and the other
is an int
, the other operand is promoted to a long
. An unsigned type is
always higher to its corresponding signed type. That is, if summing up a
signed int
with an unsigned int
, the signed one will be converted to an
unsigned first.
To give you a glimpse into what this means:
int
main(void)
{
/* Also try with '-10 + 1U + 10U' */
printf("%ld\n", -10 + 1U + 10L);
}
You would expect the result to be 1
, right?
$ cc main.c
$ ./a.out
4294967297
However, -10 + 11U + 10L
would result in 11
, and -1 + 2U
would result in
1
.
What is more, in most operations (for every operator, the spec says what is
done in this respect), char
and short
integer operands are promoted to
an int
, or to an unsigned int
if they do not fit in an int
. That
conversion is called integer promotion. This is done to make the language
runtime fast (on x86 32-bit arithmetics can be much faster than when using
16-bit operands).
- Note that the only way an
unsigned short
would not fit a (signed)int
but fit anunsigned int
is if both types had the same size (e.g. 4 bytes). You will probably not use such a compiler/platform combination, ever.
The most operations from the above sentence means all binary operators. A ternary operator as well. And even some unary operators.
sizeof (1)
is 4 because 1 is an int
. However, if a number does not fit to
an int
, a higher type will be used. For example, 4294967296
(2^32 =
UINT_MAX + 1
) will be stored in 8 bytes, so sizeof (4294967296)
is 8.
Does it sound confusing? Do not worry, we will give you specific rules later in Arithmetic/integer promotion and conversion section.
🔧 Verify that numbers that do not fit in an int
will have a size of 8
bytes.
Assuming char c;
declaration, then:
-
sizeof (999)
is 4 as999
is anint
by definition. -
sizeof (c)
is 1, always -
sizeof (c + c)
is 4 as+
is a binary operator and the integer promotion kicks in, as mentioned above -
sizeof (++c)
is still 1 as++
is an unary operator for which the integer promotion rules do not apply. -
sizeof (+c)
is 4 as for unary+
and-
, the integer promotion is applied.- You just need to know or check the spec if you do not remember. Most of the time it does not matter but do not fool yourself, if you do not know details like that in C, it will bite you sooner or later.
- Remember A Case Study of Toyota Unintended Acceleration and Software Safety from the first class
-
sizeof (1LL)
will usually be 8 aslong long
is usually 8 bytes.
It gets more interesting if unsigned and signed numbers are involved. E.g. a
signed int
is promoted to an unsigned int
if one of the int
operand is
unsigned. This is called an implicit type conversion. There is also
explicit type conversion (called casting) which we will deal with later in
Arithmetic/integer promotion and conversion section.
To learn more about the implicit conversion, e.g. what the specification says
exactly about assigning a double
value to an int
, see the 6.3 Conversions
section of the
C99 specification
I suggest you try these out with printf("%zu", ...)
. The %zu
format string
(see the printf
manual page in section 3 on Unix systems) matches the return
type of the sizeof
operand. The exact unsigned numeric type of what sizeof
returns may differ in different implementations so %zu
will work anywhere.
If long
is 8 bytes, and int
4 bytes, then -1L < 1U
is true as you might
expect because 1U
is converted to a long
because a long
can represent all
values an unsigned int
.
However, if both types are 4 bytes, the relational expression is false! The reason is that in this case, as both types has the same integer conversion rank (C99 6.3.1.1), the signed number is converted to an unsigned. See 6.3.1.8 Usual arithmetic conversions, paragraph 1, for the details.
Two's complement representation of -1 in 4 bytes is:
(1) take absolute value of 1 00000000.00000000.00000000.00000001
(2) one's complement 11111111.11111111.11111111.11111110
(3) add 1 to get 2's complement 11111111.11111111.11111111.11111111
which is 2^32 - 1
when interpreted as unsigned quantity.
Just printf("%u\n", -1)
to see it will print 4294967295
(use the Unix/Linux
bc
command and type 2^32-1
to verify).
👀 signed-plus-unsigned.c 👀 signed-to-unsigned.c
The assymetry of the negative/positive interval can lead to the program crashing on architectures that detect it. This is a consequence of hardware handling rather than the language.
- Run with
-INT_MIN
(seelimits.h
) and-1
.INT_MIN
is usually-2147483648
. The program would normally yield2147483648
(positive number) however that would beINT_MAX + 1
. - Should crash in 64-bit mode as well due to an
int
being passed in a 32-bit register.
$ cc -m64 crashme.c
$ ./a.out 2147483648 -1
-2147483648 -1
Floating point exception: 8
$ echo $?
136
$ kill -l 136
FPE
$ cc -m32 crashme.c
$ ./a.out 2147483648 -1
2147483647 -1
$
- What is the result if
signed char
represented in binary by0xff
and0xff
unsigned char
are compared using the==
operator ?- Write down the hexadecimal representation of the integers
corresponding to the 2 chars with
printf()
- Write down the hexadecimal representation of the integers
corresponding to the 2 chars with
- Note that if the
b
character was defined aschar b
the result might be 1 because it is up to the compiler to choose whetherchar
is signed or unsigned. Usually it is signed though. There are compiler options to specify this, e.g. GCC has-funsigned-char
and-fsigned-char
Will the program print the whole array ? - Try to come up with reason of the expected behavior before running the program.
Say, what happens if you put a char
as an argument of a function that accepts
an int
? As you might expect, the byte is just assigned to an int
.
In general, the expressions passed as arguments are evaluated first, then the value of the resulted type is converted to the parameter type as in assignment.
More on that later. For more information, see 6.9.1 in C99
The order of evaluation of function arguments is unspecified, see 6.5.2.2p10 in C99:
The order of evaluation of the function designator, the actual arguments, and subexpressions within the actual arguments is unspecified, but there is a sequence point before the actual call.
That means, do not do things like these as it is not specified whether the 2nd
argument expression is evaluated after the first agument is processed, or
before. So, based on the compiler you use, you may either call the function
with (1, 2)
or (2, 2)
.
int i = 1;
myfn(i, ++i);
- Each compiler has its own set of warnings that usually give a clue that
something strange and potentially harmful can happen.
- There are differences between various compilers.
- We will focus on GCC here.
- During the old days the tool producing these warnings was called
lint
that had to be run separately (there was not enough memory to do both at once). Now those extra checkes are usually part of the compiler itself. - The basics: use
-Wall -Wextra
-
-Wall
catches things like missing return value from a function that should return one. - 👀 no-return.c
-
- There are many places where a beginner can shoot himself into a foot
by not knowing the language intricacies.
- E.g. variable cannot be modified more than once between
sequencing points (C99 6.5 paragraph 2). The
-Wsequence-point
that is included in-Wall
warns about that. - 👀 sequence-point-violation.c
- E.g. variable cannot be modified more than once between
sequencing points (C99 6.5 paragraph 2). The
- The
-Wshadow
can catch shadow variables overriding: - All or specific warnings can be turned into errors:
-Werrror
or-Werror=<insert_specific_error>
, respectively.- Unless the warnings contain false positives (and those can usually be suppressed) this will help to avoid troubles later.
- There are other means to check for correctness (static/dynamic analysis), more on those later.
Go through the programs written so far and run the compiler using the -Wall -Wextra
options.
- What kind of problems have you discovered? How to fix those?
👀 whole-array.c
(where only -Wextra
gives some clue)
Explore the compiler documentation for more helpful options. Over the time you will find a set of warning options that will serve you well.
From C99, Annex J.
...
implementation-defined behavior is unspecified behavior where each implementation documents how the choice is made.
An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
...
undefined behavior is behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
An example of undefined behavior is the behavior on integer overflow.
...
unspecified behavior is use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.
An example of unspecified behavior is the order in which the arguments to a function are evaluated.
See also More Information on Undefined Behavior.
What happens if an int
overflows?
- The behavior of an integer overflow depends on whether the type is
signed or unsigned.
- ❗ For signed types the behavior is undefined. I.e. you cannot rely that an overflow of a positive quantity in any signed integer will be turned into a negative number.
- Some compilers will allow to choose the behavior of signed overflows
using special options (
-fwrapv
for GCC), though. - For unsigned integers, the spec requires that overflow always wraps around (modulo power of 2).
- Use
-fstrict-overflow -Wstrict-overflow
(will become active only for higher optimization levels, i.e.-O<X>
whereX > 1
) to stay on the safe side.
See the difference in using -ftrapv
and not:
int
main(void)
{
int i = 2147483647;
printf("%d\n", ++i);
}
$ gcc main.c
$ ./a.out
-2147483648
$ gcc -ftrapv main.c
$ ./a.out
Aborted (core dumped)
write a 1-D implementation of Conway's game of life
- use rule 30 to determine the state
- use two arrays (one 2-D and one 1-D) to represent the rules and its
resulting values
- there are 8 rules, each has 3 values to compare against and one result
- we now have the arsenal to write 2-D variant however to display 2-D
world some way to refresh the terminal would be needed so we will stick
with 1-D.
- use
\r
to reset the line between iterations - sleep for 1 second between iterations (
unistd.h
is needed for that)
- use
- each life "tick" will print the line representing the world
- use functions to refactor the code
- once you have a working solution, try to rewrite it to be more efficient or elegant (or both)
Write function to convert uint64_t
to human readable count (binary - see
https://en.wikipedia.org/wiki/Orders\_of\_magnitude_(data) or
https://en.wikipedia.org/wiki/Mebibyte, e.g. MiB as mebibyte, etc.) and print it
to standard output.
void bytecnt2str(uint64_t num);
-
use character array to represent the magnitude letters and perform the lookup based on the actual magnitude
-
if there is a remainder, write a '+' following the number
-
write the output in single
printf()
(use ternary operator for the remainder) -
example inputs/outputs:
1024 1 KiB
1025 1+ KiB
2047 1+ KiB
2048 2 KiB
2049 2+ KiB
5242880 5 MiB
solution: bytecnt2str.c
write a function that has the following prototype:
bool check(long off, size_t size, size_t limit);
- checks if the arguments are valid for accessing part of a file (no operations will be done, just the check) of size 'size' starting from an offset 'off'. The length of the file is given by the 'limit' argument.
- offset may be negative
| | | |
0 off off+size limit
- try to capture all corner cases of what could go wrong with the values
and return
false
on failure,true
on success.
-
on Unix systems one would use
ssize_t
for the offset which is a signed integer type of the same size assize_t
(ssize_t
is not part of C99 but POSIX) -
also, implement a set of test values in
main()
using an array