Print out number of words read from the standard input. Ideally, use your
getchar
based code from your repository to start with.
Always try to reuse code you have already written! Then do not forget to upload it to your source code management repository.
A word is defined as a group of any characters surrounded by whitespace characters. Those are: a tabelator, a space, and a newline.
-
Write your own check for a whitespace character, do not use library functions for that.
-
Check correctness of your implementation with
wc -w <file>
-
What happens if the number of words exceeds the type that stores the count?
🔑 words.c
Does uses a library check isspace
for white space for simplicity.
🔑 words2.c It is even simpler than the first solution while not using the library function.
Example:
$ cat /etc/passwd | ./a.out
70
You define an array like this <element-type> array[<some-number>]
, for
example:
int array[5];
The integer value n
in [n]
specifies the number of array elements.
- An array subscript (also called an index) always starts from
0
and ends withn - 1
- A subscript may be any integer expression
- The type of array elements is called an element type
So, in array int array[3]
, elements will be accessible as a[0]
.. a[2]
.
Each element is of type int
, and therefore you can do e.g. printf("%d\n", a[2]);
.
-
0
was chosen as the first subscript so that it was easier to work with pointers, and also for array access efficiency - we will get to pointers later. -
What is not possible to do with arrays in C (limitations are important knowledge):
- Associative arrays
- Array subscripts returning a sub-array (like found e.g. in Python)
- Assigning to an array as a whole
-
As with integer and floating point variables, we may initialize an array during its definition. In general, the contruct for any variable initialization (not just arrays) is known as an initializer.
short array[3] = { 1, 2, 3 }; // "{ 1, 2, 3 }" is the initializer
If the array size is omitted the compiler will compute the size from the number of initializers. So, you can just do the following:
short array[] = { 1, 2, 3 };
char array[] = { 'h', 'e', 'l', 'l', 'o' };
If you need your array to contain only the elements from the initializer, omitting the array size is the way to go to avoid errors in modifying the initializer while forgetting to update the array size.
-
The
sizeof
operator on array always gets the array size in bytes. It will not get the array size in elements.- To get the number of elements in an array, you must divide the array
size in bytes by the size of its element. Always use
0
subscript, see below on why.
- To get the number of elements in an array, you must divide the array
size in bytes by the size of its element. Always use
int a[5];
printf("Elements in array: %zu\n", sizeof (a) / sizeof (a[0]));
The above is the correct approach that is immune to changing the array
declaration (i.e. the type of elements). As arrays may not be empty, there is
always an element [0]
(see below). Do not use the following:
sizeof (array) / sizeof (int)
🔧 Declare an array of char
s without setting the array size, initialize
the array during declaration (see above), then print each element of the array
on a separate line using a while
loop.
Arrays introduced so far are not dynamic and can not be resized.
- Try to perform an out-of-bound access. What is the threshold for behavior change on your system ?
- Why is it not faulting for the one-off error?
$ ./a.out
Number of array elements: 1024
One-off error (using index 1024)... OK
Assigning to index 4096... Segmentation Fault
You can also try to locate where it crashed. For more information, see some info on debugging.
You do not need to initialize all elements. With such type of an
initialization, you always start from subscript 0
, and there are no gaps:
short array[4] = { 1, 2, 3 };
In the example above, the elements not explicitly initalized are set to 0
so
the value of array[3]
will be initialized to 0
.
-
I.e.
int array[100] = { 0 };
will have all values set to0
-
The initialization is done in code added by the compiler. That code is part of the runtime system of the language.
-
Using
= {}
is not allowed by the C specification (allowed in C++) but generally accepted. Not with-Wpedantic
though:
cc -Wpedantic test.c
test.c:1:13: warning: use of GNU empty initializer extension
[-Wgnu-empty-initializer]
int a[10] = {};
^
1 warning generated.
Note: global variables are always zeroized.
There is a partial array initialization where the initializers are called designated initializers in the C spec:
char array[128] = { [0] = 'A', [2] = 'f', [4] = 'o', [6] = 'o' };
- A subscript is in square brackets
- The
[subscript]
is known as a designator. Increasing ordering is not required but expected. - The rest of elements will be initialized to zero
- If you do not specify the array size, it is taken from the highest designator index
- You can combine designators with fixed order initializers, and you always start with the next index. For example:
/* a[5] is 'e' etc. a[0]..a[3] are NUL characters (= zero bytes). */
char a[128] = { [4] = 'h', 'e', 'l', 'l', 'o' };
- You cannot specify a repetition or setting multiple elements to the same value (there is a gcc extension for that but let's not go there).
👀 array-designated-initializers.c
Note that the code file right above mentions in a comment that a missing =
is
a GCC extension. With a non-GCC compiler it does not compile:
$ cc array-designated-initializers.c
"array-designated-initializers.c", line 15: syntax error before or at: 'A'
cc: acomp failed for array-designated-initializers.c
Once an array is declared, its elements cannot be assigned at once. So, you can only do things as follows:
int array[4];
array[0] = 1;
array[1] = 2;
array[2] = array[3] = 3;
// ...
You cannot assign an array into another array - has to be done an element by element.
- Likewise for a comparison
Arrays cannot be declared as empty (int a[0]
).
- This is explicitly forbidden by the standard, see C99 6.7.5.2 Array declarators.
- GCC accepts that though. Do not use it like that.
This might be a bit confusing though:
$ gcc empty-array.c
$ ./a.out
4
0
Even if a compiler supports an empty array declaration, sizeof(a) / sizeof(a[0])
construction is always correct to compute a number of array
elements. Plus, remember the compiler does not do any array access when
computing sizeof(a[0])
as the expression is not evaluated (see
the sizeof operator
), the compiler only uses the argument to determine the size.
Functions can be used to encapsulate some work, for code re-factoring, etc.
A function has a single return value and multiple input parameters.
- If you also need to extract an error code with the value or get multiple return values, that needs to be done via passing data via reference (more on that when we have pointers).
- I.e. this is not like Go that returns an error along with the data.
A function declaration is only declaring an API without its body. In C, it is called a function prototype and it consists of a type, a function name, and an parameter list in parentheses. For example:
int digit(int c);
int space(int c);
float myfun(int c, int i, float f);
When defining the function, its body is inclosed in {}
(just like the main
function).
int
return_a_number(void)
{
return (1000);
}
When we declare or define a function, the function has parameters. When we
call such a function though, we pass in arguments. So, for the above
mentioned function digit
, if called as digit(7)
, we passed argument 7
as
the parameter c
. Note that in the past these were also called, respectively,
formal parameters and actual parameters.
Also as with the main
function, you can use void
instead of the argument
list to say the function accepts no arguments. You could just use ()
to
indicate the function has no arguments but that is an old way of doing things
and in that case the compiler will not verify the number of arguments when
calling the function (you can check it out for yourself). So, always use
(void)
if the function has no arguments.
void
hola(void)
{
printf("Hola!\n");
}
A function call is an expression so you can use it as such:
printf("%d\n", return_a_number());
The default return value type is an int
but a compiler will warn if it is
missing. You should always specify the type in
C99+
. You may use void
which means the function returns no value.
Write your function declarations at the beginning of a C file or include those into a separate header file.
If a compiler hits a function whose prototype is unknown, warnings are issued. Remember 👀 hello-world1.c ? See also here:
$ cat test.c
int
main(void)
{
fn();
}
float
fn(void)
{
return (1.0);
}
$
$ gcc test.c
test.c:4:2: warning: implicit declaration of function 'fn' is
invalid in C99
[-Wimplicit-function-declaration]
fn();
^
test.c:8:1: error: conflicting types for 'fn'
fn(void)
^
test.c:4:2: note: previous implicit declaration is here
fn();
^
1 warning and 1 error generated.
Parameter names may be omitted in prototypes, i.e.:
int myfn(int, int);
A variable defined in function parameters or locally within the function overrides global variables. Each identifier is visible (= can be used) only within a region of program text called its scope. See C99, section 6.2.1 Scopes of identifiers for more information.
Within a function body, you may use its parameters as any other local variables.
int
myfn(int i, int j)
{
i = i * j;
return (i);
}
As arguments are always passed by value in C, the variable value is not changed in the caller:
/* myfn defined here */
int
main(void)
{
int i = 3;
printf("%d\n", myfn(i, i)); // will print 9
printf("%d\n", i); // will print 3
}
Local variables are stored on a stack or in registers.
Argument passing depends on bitness and architecture. E.g. 32-bit x86 puts them on the stack, 64-bit x64 ABI puts first 6 arguments to registers, the rest on a the stack.
Functions can be recursive. 👀 recursive-fn.c
/*
* Print 0..9 recursively using a function call fn(9).
*/
#include <stdio.h>
void
myfn(int depth)
{
if (depth != 0)
myfn(depth - 1);
printf("%d\n", depth);
}
int
main(void)
{
myfn(9);
}
🔧 Print 9..0 using a recursive function.
solution 👀 recursive-fn2.c
Arrays in C are not a first class object, rather it is an aggregation of elements. An array is one of aggregate types, see C99 spec 6.2.5 Types (paragraph 21) for more information.
An array cannot be returned from a function. A pointer to an array can be but more on that later.
Not allowing to return an array is done for efficiency as copying the whole array (by value) would be too expensive.
- Watch for array sizes if used within functions.
- Arrays as local variable may significantly increase the stack size (and a stack size is limited in threaded environments).
The following may or may not happen in your environment:
$ cc func-large-array.c
$ ./a.out
Segmentation fault: 11
A contiguous sequence of non-zero bytes (char
s) terminated by a zero byte is
called a string (C99 7.1.1).
char foo[] = { 'b', 'a', 'r', 0 };
Note that a string does not have to be declared as an array as shown above, it may be just a piece of memory that meets the definition of a string above.
The crucial fact is that a string is always terminated by a null (zero) byte, sometimes also called NUL. That is how C knows where the string ends. There is no metadata involved with a string, e.g. its length. To figure out the length of a string, the string must be sequentially searched for the first NUL.
The string length is the number of bytes preceding the null character.
It also means the size of such an character array is one byte more than the
number of non-zero characters in the string. It is because of the terminating
zero (\0
). The NUL belongs to the string though.
char foo[] = { 'b', 'a', 'r', '\0' };
printf("%zu\n", sizeof (foo)); // prints 4
Using '\0'
suggests it is a terminating null character but it is still just a
zero byte. So, you could use 0
, as follows, but it is not generally used:
char foo[] = { 'b', 'a', 'r', 0 };
- To print a string via
printf
(), you use the%s
conversion specifier:
char foo[] = { 'b', 'a', 'r', 0 };
printf("%s\n", foo); /* will print "bar" without the quotes */
🔧 What happens if you forget to specify the terminating zero in the above per-char initializator and try to print the string ?
Note that the following array of characters contains three strings:
char foo[] = { 'a', '\0', 'b', 0, 'c', 0 };
And a character array does not need to be or contain a string at all:
char not_a_string[] = { 'h', 'i' };
A special case of of initializing a character array that is to represent a string is using double quotes. The terminating zero will be the last character of the array.
char mys[] = "This is a string";
int i = 0;
/* Now print the string from the array one by one. */
while (mys[i] != '\0')
printf("%c", mys[i++]);
printf("\n");
/* Normally, you would of course do the following. */
printf("%s\n", mys);
Note that the above really creates a local variable on the stack with its memory initialized from the string when the code is executed.
$ cc init-array-from-string.c
$ strings a.out | grep bar
foobar
Let's look at the following code. Obviously, the function does not return its value while it definitely should have. What happens if we use the function return value anyway?
#include <stdio.h>
int
addnum(int n1, int n2)
{
int n = n1 + n2;
}
int
main(void)
{
printf("%d\n", addnum(1, 99));
}
It simply cannot work, right? Oh, wait..., it does!?!
$ cc why-it-works.c
$ ./a.out
100
So what happened???
Well, it does not really work. It "works" by accident. Let's look at it more closely. To establish a common environment, I will be using the Linux lab at Malá Strana which you all have access to, so that you can try and see.
Before I begin, let me give you more information:
- It depends on a system and its version, and a compiler and its version.
- You can also try the
clang
compiler later on the same machine.
- You can also try the
- Will be using
gcc
which defaults to generate 64 bit binaries on the Linux distro installed on u-pl3.ms.mff.cuni.cz (and other machines in the lab). - 64 bit binaries on x86 use X86-64 ABI
- ABI is not API
- By this ABI (it's law!), the first two function integer arguments are passed through the general purpose edi and esi registers
- Integer function return value is passed back to the caller via the general purpose eax register
- The stack on x86 grows down
Let's compile the code and disassemble it. All we need are the main
and
addnum
function. See my notes inline.
$ cc why-it-works.c
$ objdump -d a.out
...
<removed non-relevant stuff>
...
0000000000001145 <addnum>:
1145: 55 push %rbp
1146: 48 89 e5 mov %rsp,%rbp
- initialize a frame pointer for this function.
1149: 89 7d ec mov %edi,-0x14(%rbp)
114c: 89 75 e8 mov %esi,-0x18(%rbp)
- here we put both function arguments on a stack. As we already
learned, function arguments may be used within a function as local
variables, so they need to be on a stack.
- why we put them to -0x14 and -0x18 offsets from the base? That's just
what this gcc version does. They are 4 bytes apart as we work with 4
byte integers.
114f: 8b 55 ec mov -0x14(%rbp),%edx
1152: 8b 45 e8 mov -0x18(%rbp),%eax
- moved the values of our local variables to general purpose registers
1155: 01 d0 add %edx,%eax
- sum up the values. As we add edx to eax, the result is in eax.
1157: 89 45 fc mov %eax,-0x4(%rbp)
- put the result to the local variable "n" which happens to be at offset
-0x4 from the frame pointer.
- now, see above, register eax is used in x86 64 ABI for the function
return value. So, by accident, we have the "right" value in the
register that is supposed to hold the return value!!!
115a: 90 nop
115b: 5d pop %rbp
115c: c3 retq
000000000000115d <main>:
115d: 55 push %rbp
115e: 48 89 e5 mov %rsp,%rbp
1161: be 63 00 00 00 mov $0x63,%esi
1166: bf 01 00 00 00 mov $0x1,%edi
- put the values 1 and 99 to the registers representing the 1st and the
2nd argument in x86 64 bit ABI
116b: e8 d5 ff ff ff callq 1145 <addnum>
- called our function from main()
1170: 89 c6 mov %eax,%esi
- x86 64 ABI expects the function return value in the register eax, so
we just put it to the input register esi representing the 2nd argument
for the printf() function.
- and, given that we happened to have the right value in eax, the code
looks like it works.
1172: 48 8d 3d 8b 0e 00 00 lea 0xe8b(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1179: b8 00 00 00 00 mov $0x0,%eax
117e: e8 bd fe ff ff callq 1040 <printf@plt>
1183: b8 00 00 00 00 mov $0x0,%eax
1188: 5d pop %rbp
1189: c3 retq
118a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
What if we increment variable n
, does it change anything?
int
addnum(int n1, int n2)
{
int n = n1 + n2;
++n;
}
$ gcc main.c
./a.out
100
Apparently not, let's see the disassembled code:
0000000000001145 <addnum>:
1145: 55 push %rbp
1146: 48 89 e5 mov %rsp,%rbp
1149: 89 7d ec mov %edi,-0x14(%rbp)
114c: 89 75 e8 mov %esi,-0x18(%rbp)
114f: 8b 55 ec mov -0x14(%rbp),%edx
1152: 8b 45 e8 mov -0x18(%rbp),%eax
1155: 01 d0 add %edx,%eax
1157: 89 45 fc mov %eax,-0x4(%rbp)
115a: 83 45 fc 01 addl $0x1,-0x4(%rbp)
- OK, the compiler just directly adds 1 to the memory address holding
local variable "n". As it does not change the register eax, it still
"works".
115e: 90 nop
115f: 5d pop %rbp
1160: c3 retq
So, let's involve another local variable.
int
addnum(int n1, int n2)
{
int n = n1 + n2;
int i = 13;
n = i + n;
}
$ gcc main.c
$ ./a.out
13
Bingo! It no longer "works". What happened?
0000000000001145 <addnum>:
1145: 55 push %rbp
1146: 48 89 e5 mov %rsp,%rbp
1149: 89 7d ec mov %edi,-0x14(%rbp)
114c: 89 75 e8 mov %esi,-0x18(%rbp)
114f: 8b 55 ec mov -0x14(%rbp),%edx
1152: 8b 45 e8 mov -0x18(%rbp),%eax
1155: 01 d0 add %edx,%eax
- this was summing up our two function arguments
1157: 89 45 f8 mov %eax,-0x8(%rbp)
- the result went to the local variable "n"
115a: c7 45 fc 0d 00 00 00 movl $0xd,-0x4(%rbp)
- we initialized local variable "i" (0xd in hexa is 13 in decimal)
1161: 8b 45 fc mov -0x4(%rbp),%eax
- See? Now we used eax for further arithmetic operations. We just put
the current value of local variable "i" to the register.
1164: 01 45 f8 add %eax,-0x8(%rbp)
- and we added the register value to the present value of local variable
"n". And as 13 was left in eax, that served as the function return
value.
1167: 90 nop
1168: 5d pop %rbp
1169: c3 retq
🔧 Try the clang
compiler and figure out what happened there.
$ clang main.c
main.c:7:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
1 warning generated.
$ ./a.out
0
What we should take away from this situation:
- Anything looking as working does not mean it is correct.
- Ideally, if possible, test on different architectures (like x86, SPARC, ARM, etc).
- Using different compilers and different systems may help as well. For
example, if you develop in a Linux distro with
gcc
, testing it also on a macOS laptop would be worth it (C compiler in macOS Xcode IDE isclang
). - If something magically stops working that did work before, be ready for breakage like this. Even something that has worked for ages does not necessarily means the code must have been correct.
- Always use
-Wall -Wextra
GCC options when building your code. More on that later in the seminar. See thatclang
warns by default which is a good thing.
Do you want more information on x86 ABI and how it really works behind the scene? Search for Solaris Crashdump Analysis Frank Hofmann for more details.
You probably do not want all the gory details though...
Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.
Write a simple program on counting the digit occurence as in one of the previous classes but now use an array for counting the figures.
Print total number of (decimal) numbers, white space (tab, space, newline) and anything else from the input. I.e. the program prints out three numbers.
Obviously, reuse code you wrote for digit occurence counting: count-digit-occurence.md
- write the program using direct comparisons like this:
c >= '0' && c <= '9'
c == ' ' || c == '\t' || c == '\n'
- put both the checks into separate functions
- then write a new version of the program and use:
-
isspace
() from C99 -
isdigit
() vsisnumber
() - the latter detects digits + possibly more characters (depending on locale setting)
-
- Count occurrences of all ASCII letters on standard input and print out a histogram
- This will be case insensitive (i.e. implement
mytolower(int)
first) - The output will look like this:
$ cat /some/file | ./a.out
a: ***
b: *
c: *****************************
...
z: *******
$
- Use a function to print a specific number of stars.
- Use a function to do all the printing.
- Declare array(s) to be as efficient as possible.
- That is, having the lowest possible size.
- Only use global variables if necessary.