08.md

Recap

  • pointer is just a variable, stores an address of another object
  • NULL by definition unmapped (modulo stuff like the 0@0.so library on Solaris - for the curious see the ld.so.1 manual page there)
  • to get the value of the underlying data, use the dereference operator (*)
  • sizeof() applied to a pointer will get the size of its storage
    • sizeof() applied on a dereferenced pointer will get the size of the underlying data object
  • array "name" used in an expression is converted to a pointer to its 1st elem (except as the operand of sizeof or &)
    • the name itself is not a modifiable expression
  • subscripting arrays vs. pointer arith: a[i] is equivalent to *(a + i)
  • using the pointer variable is using its value, i.e. the address
  • pointer decrement/increment advances by the size of the underlying type
  • addr operator & can be used on variables
  • pointers (into the same array) can be subtracted but not added (the sum of two addresses makes no sense)
  • void pointer can be used to assign back and forth
  • an array passed as an argument to a function is converted to a pointer to its first element
  • function arguments are always passed by value
    • corollary: to change an argument of a function, it is necessary to use a pointer
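
A few of the recap points can be checked with a short sketch. The function names (sum_via_pointer, swap_ints, elem_distance) are made up for this illustration and do not come from the course code.

```c
#include <stddef.h>

/* a[i] and *(a + i) are the same thing; the array argument arrives as a pointer. */
int
sum_via_pointer(const int *a, size_t len)
{
	int sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += *(a + i);
	return sum;
}

/* Arguments are passed by value, so changing the caller's variables needs pointers. */
void
swap_ints(int *x, int *y)
{
	int tmp = *x;

	*x = *y;
	*y = tmp;
}

/* Subtracting two pointers into the same array yields a count of elements, not bytes. */
ptrdiff_t
elem_distance(const int *from, const int *to)
{
	return to - from;
}
```

For an array int a[] = { 1, 2, 3, 4 }, calling sum_via_pointer(a, 4) passes the address of a[0], and elem_distance(&a[1], &a[3]) gives 2 regardless of sizeof (int).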

🔧 warm-up: implement strcmp()

Implement

int strcmp(const char *s1, const char *s2);

returns (according to the strcmp(3) manual page)

an integer greater than, equal to, or less than 0, according as the string s1 is greater than, equal to, or less than the string s2.

🔑 👀 strcmp.c

🔧 task: compare your solution with the above. Try to reimplement it so that it is as small (in terms of Lines Of Code) as possible.

Program arguments

int main(int argc, char *argv[]);
  • argv is declared as an array of pointers

    • i.e. argv[i] is a pointer to char
  • the arguments of main() can have arbitrary names; however, please stick to the convention to avoid confusing those who might be reading your program

  • argc is the number of command line arguments, including the command name itself (in argv[0]).

  • argv[i] are arguments as strings. Note, they are strings even if you put numbers there on the command line.

  • argv[argc] is NULL by definition.

Note: remember (see the notes about an array passed to a function) that an array declared as a function parameter is always treated as a pointer, so the above effectively becomes:

        int main(int argc, char **argv);

i.e. in this context, char *argv[] and char **argv are the same.

The declaration merely hints at the memory layout.

Also, you already know that you can use an array notation with characters as well, so you could use argv[i][j] to print individual characters. Just make sure that it's not out of range.

Code: 👀 argv-as-2d-array.c
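
As a minimal sketch of the two-level subscripting, the following function walks every character of every argument. The name count_arg_chars is hypothetical; it takes the same two parameters as main() so that it can be called directly with argc and argv.

```c
#include <stddef.h>

/* Count the characters of all arguments, not including the terminating '\0's. */
size_t
count_arg_chars(int argc, char *argv[])
{
	size_t total = 0;

	for (int i = 0; i < argc; i++) {
		/* argv[i][j] is the j-th character of the i-th argument. */
		for (size_t j = 0; argv[i][j] != '\0'; j++)
			total++;
	}
	return total;
}
```

The inner loop stops at the terminating '\0', which is what keeps argv[i][j] within range.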

  • the memory for argc, argv is allocated before main() is called

    • the standard (C99) leaves unspecified where argc/argv are stored

      section 5.1.2.2.1: the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

  • argv is an array of pointers to null-terminated strings and must be terminated by a NULL pointer. (quote from the execve(2) man page)

  argv
  +-----------+
  |           |---------->+--------------+
  +-----------+           |              |---->+---+---+---+---+----+
                          +--------------+     | p | r | o | g | \0 |
                          |              |-\   +---+---+---+---+----+
  argc                    +--------------+  \
  +----------+            |              |-  \->+---+---+---+----+
  |    3     |            +--------------+ \    | f | o | o | \0 |
  +----------+            |     NULL     |  \   +---+---+---+----+
                          +--------------+   \
                                              ->+---+---+---+----+
                                                | b | a | r | \0 |
                                                +---+---+---+----+
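
Because the array of pointers ends with NULL, it can be walked without knowing argc. A small sketch under that assumption (count_args is a made-up name):

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated argv-style array without using argc. */
int
count_args(char **argv)
{
	int n = 0;

	while (*argv != NULL) {
		n++;
		argv++;		/* advance to the next pointer in the array */
	}
	return n;
}
```

Called from main() as count_args(argv), the result should be equal to argc.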

🔧 Task: print command line arguments

  • print all command line arguments using argc
  • print all command line arguments using just argv
  • print all command line arguments not starting with -
  • print all command line arguments using a recursive function (that accepts pointer to pointer to char)

Note: for all arguments print their address as well

Note: do not print the terminating NULL entry

  • some printf() implementations barf on NULL pointer when printing via the %s format string

🔑 Code:

🔧 Task: get char distance in specific argument

write a program with usage ./a.out <a> <b> <string> to find the distance between the first occurrence of the characters <a> and <b> in the string <string>. If either character is not found in the string, print an error.

./a.out a x "ahello xworld"
7

Note: do not use strchr() or the like.

🔑 Code: 👀 argv-char-dist.c

usage: semi-formal specification of program arguments

  • usually printed when an invalid option or invalid arguments are specified
  • can be handled via errx()
  • the usage usually contains program name followed by the argument schema
    • see e.g. the nc(1) man page
  • optional arguments are enclosed in square brackets, mandatory arguments are enclosed in <> or left without brackets
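
A possible shape of such a usage helper is sketched below. The argument schema (<src> <dst> [mode]) and the function names are invented for the example. errx() comes from <err.h>, which is a BSD extension also shipped with glibc, so this assumes a system that provides it.

```c
#include <err.h>
#include <stdbool.h>

/* Hypothetical schema: two mandatory arguments and one optional argument. */
bool
argc_ok(int argc)
{
	/* argc counts argv[0] as well, so 2 or 3 arguments mean argc of 3 or 4. */
	return argc == 3 || argc == 4;
}

/* Print the usage to standard error and exit with status 1. */
void
usage(const char *progname)
{
	errx(1, "usage: %s <src> <dst> [mode]", progname);
}
```

In main() this would typically be used as: if (!argc_ok(argc)) usage(argv[0]);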

Task: write a program that takes 1 or 2 arguments, if run with any other count, print usage and exit.

🔑 Code: 👀 usage.c

🔧 Task: print the n-th character of the r-th argument (count from 0)

usage: ./a.out <r> <n> [args]

do not count argv[0..2]. If there are not enough arguments or the argument is not long enough, print a helpful message. Only use pointer arithmetic, do not use square brackets (i.e. argv[i][j] is not allowed).

./a.out 2 3 hey hi world
l

Note: use atoi() to convert the first 2 arguments to integers

🔑 Code: 👀 argv-nr.c

🔧 Task: what do these programs do when run with 2 arguments

Assume that the arguments are sufficiently long.

Skipping ahead: prefix ++ and dereference operator have the same precedence so they are evaluated based on associativity which is right-to-left.

int
main(int argc, char **argv)
{
	printf("%s\n", ++*++argv);
}

int
main(int argc, char **argv)
{
	printf("%s\n", argv[1]);
	printf("%s\n", ++*++argv);
	printf("%s\n", argv[0]);
	printf("%s\n", ++*++argv);
	printf("%s\n", argv[0]);
}

int
main(int argc, char **argv)
{
	printf("%s\n", *++*++argv);
}

Note: the last function might not compile with smarter compilers (such as LLVM) that include format string checks. What is expected to happen if the last piece of code does compile and is run with one argument?

Code:

Structures

Basics

  • collection of one or more members, possibly of different types, grouped together under a single name

  • structures permit a group of related members to be treated as a unit (precursor to a class in Object Oriented Programming)

  • structures can contain other structures

  • structure is specified as:

struct foo {
	... // members
};

e.g.

struct foo {
	int a;
	char b;
};
  • any type can be a member of a structure except the structure itself

    • however: a pointer to its own type is possible (remember, a pointer is just a number referencing a piece of memory)
    • unlike in C++, structure cannot contain functions. It may contain pointers to functions, though.
  • structure does not have to have a name, 👀 struct-unnamed.c

    • however then its use is limited to variable declaration
    • one can even have an "anonymous structure", however that has been part of the standard only since C11, 👀 struct-anon.c
  • struct declaration cannot contain initializers. However, the structure can be initialized with a list of initializers in the same way as arrays.

  • define a variable:

struct foo foo;
  • often the _s suffix is used to denote a structure name

Note: the struct keyword has to be used for its definition and declaration:

foo foo; is not valid.

  • can declare structure and its variables at the same time:
struct foo_s {
	...
} foo;
  • however usually this is not done because structures are normally saved to header files (and including such a header file would mean a variable definition which is usually not desirable)

  • for better code readability, members are sometimes prefixed with a letter to denote their structure type, e.g.:

// 'sin' is a shortcut for 'Sockaddr_IN', the Internet socket
// address
struct sockaddr_in {
	short   sin_family;
	u_short sin_port;
};
  • another reason is searching for variable names in a big source code repository (using ctags or cscope): there would be a large number of generically named variables like port, size, etc. in various source code files. However, with the prefix, like sin_port, very often you find just one, the one you are looking for.
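
The following sketch puts several of the above points together: a structure holding a pointer to its own type, a pointer to a function, the _s suffix and prefixed member names. All the names (node_s, n_val, twice, list_weight) are invented for illustration; the -> operator used here is covered in the section on struct members.

```c
#include <stddef.h>

struct node_s {
	int n_val;
	struct node_s *n_next;		/* pointer to its own type is fine */
	int (*n_weight)(int);		/* pointer to a function, not a function */
};

int
twice(int x)
{
	return 2 * x;
}

/* Walk the list and add up the values as transformed by each node's function. */
int
list_weight(const struct node_s *node)
{
	int sum = 0;

	for (; node != NULL; node = node->n_next)
		sum += node->n_weight(node->n_val);
	return sum;
}
```

A two-element list can be built with plain initializers, e.g. struct node_s second = { 2, NULL, twice }; struct node_s first = { 1, &second, twice };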

Struct layout in memory

struct X { int a; char b; int c; };
  • the offset of the first member will be always 0

  • other members can be padded to preserve self-alignment (i.e. member is always aligned in memory to multiple of its own size)

    • the value of the padding bytes is unspecified
  • what will be the result of sizeof (struct X) ?

    • why ? (think about efficiency of accessing members that cross a word in memory)
  • what if char d is added at the end of the data structure ?

    • why is that ? (think about arrays and memory access again)
  • what if char *d is added at the end of the data structure ? (i.e. it will have 4 members)

    • assume this is being compiled on 64-bit machine
    • for efficiency the access to the pointer should be aligned to its size
    • if in doubt, draw a picture
+-----------+----+--------+------------+
|     a     | b  |   pad  |      c     |
+-----------+----+--------+------------+
  • does the compiler reorder struct members ? no, members are laid out in the order of their declaration; C is designed to trust the programmer.

code: 👀 struct-X.c

note: gcc/Clang has the -fpack-struct option that will condense the members at the expense of speed when accessing them. Use only when you know what you are doing as it may not be safe on all architectures.

link: http://www.catb.org/esr/structure-packing/
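
One way to explore the layout questions above is to let the compiler report the numbers via sizeof and offsetof() from stddef.h. This is only a sketch with made-up helper names; the actual values depend on the platform and compiler.

```c
#include <stddef.h>

struct X { int a; char b; int c; };

/* Number of padding bytes the compiler added to struct X. */
size_t
padding_in_X(void)
{
	return sizeof (struct X) - (sizeof (int) + sizeof (char) + sizeof (int));
}

/* Offset of member c from the beginning of the structure. */
size_t
offset_of_c(void)
{
	return offsetof(struct X, c);
}
```

Printing both values with the %zu format before and after adding the extra members is a quick way to verify a drawn picture.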

Struct members

  • members are accessed via 2 operators: . and ->

    • infix, in the group of operators with the highest precedence
    • -> is used if the variable is a pointer, . otherwise
  • e.g.:

struct foo_s {
	int a;
	char b;
} foo;

foo.a = 42;
foo.b = 'C';
  • the . and -> operators have higher precedence than * and &, so:

    &foo.b gets the address of the member b

  • structure assignment

struct foo_s one, two;

one = two;
  • is done byte by byte (shallow copy - does not follow pointers)

    • watch out for members that are pointers: after the assignment both structures point to the same data
    • on the other hand for large structures (say hundreds of bytes) this can be quite an expensive operation
  • pointers to structures:

struct foo_s *foo;

foo->a = 42;
foo->b = 'C';

code: 👀 struct-reference.c
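
The shallow copy can be observed with a structure that has a pointer member. The structure buf_s and the function below are hypothetical, written just to show the effect.

```c
struct buf_s {
	int b_len;
	char *b_data;
};

/*
 * Copy the structure by assignment and modify the copy. Returns the first
 * character as seen through the original structure.
 */
char
shallow_copy_demo(char *storage)
{
	struct buf_s one = { 1, storage };
	struct buf_s two;

	two = one;		/* copies b_len and the pointer b_data itself */
	two.b_len = 42;		/* one.b_len is not affected */
	two.b_data[0] = 'X';	/* one.b_data points to the same memory */

	return one.b_data[0];
}
```

When called with a modifiable array such as char s[] = "abc", the function returns 'X' even though only the copy was written to.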

🔧 Task: write the above assignments to the members a and b using a de-reference operator on foo

code: 🔑 👀 struct-access.c

🔧 now if a was a pointer to integer, how would the code change ?

code: 🔑 👀 struct-access-ptr.c

Struct initialization

  • can initialize in definition using the initializer list of constant values
struct foo_s {
	int a;
	char b;
};

struct foo_s foo = { 1, 'C' };

code: 👀 struct-init.c

  • or using 'designated initializers' from C99:
struct foo_s foo = {
	.b = 'C',
	.a = 1,
};
  • the ordering in the struct declaration does not have to be preserved

  • omitted members are implicitly initialized the same as objects that have static storage duration (i.e. will be initialized to 0).

code: 👀 struct-designated-init.c
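
The implicit zeroing of omitted members can be sketched as follows, assuming a made-up configuration structure conf_s:

```c
struct conf_s {
	int c_verbose;
	int c_retries;
	const char *c_name;
};

/* Only c_retries is named; the remaining members are set to 0 (a null pointer for c_name). */
struct conf_s
default_conf(void)
{
	struct conf_s conf = { .c_retries = 3 };

	return conf;
}
```

This is a common way to get a fully initialized structure while spelling out only the non-zero members.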

Operations on structures

You can only:

  • copy a structure
  • assign to it as a unit
  • take its address with &
  • access its members

So, structures cannot be:

  • compared
  • incremented (obviously)
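
Since the == operator cannot be applied to structures, a comparison is usually written member by member. A minimal sketch for the foo_s structure used earlier (the function name foo_equal is an assumption):

```c
#include <stdbool.h>

struct foo_s {
	int a;
	char b;
};

/* Structures cannot be compared with ==, so compare member by member. */
bool
foo_equal(const struct foo_s *x, const struct foo_s *y)
{
	return x->a == y->a && x->b == y->b;
}
```

Comparing the raw bytes with memcmp() is not a reliable substitute because the padding bytes may differ even when all members are equal.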

🔧 Task: animals as structures

define array of structures of this type:

struct animal {
	char name[NAME_MAX];	// max filename length should be sufficient
				// even for these long Latin names
	size_t legs;		// can have many legs
};

and initialize it with some samples (can store the array in animals.h) and implement a function:

size_t count_minlegs(struct animal *, size_t len, size_t min);

that will return number of animals in the array (of len items) that have at least min legs.

Notice that the function returns size_t. This way it is ready for future expansion. If it returned unsigned int and 32 bits turned out not to be enough later on, the prototype would have to be changed, which would cause problems for the consumers of this API.

The function will be implemented in a separate file. (Do not forget to create a header file(s).)

In the main() program (first arg will specify the min parameter of the function) pass the array of structs to the function and report the result.

Note: will need:

  • limits.h for the NAME_MAX definition
  • stddef.h for size_t (as per C99, §7.17)

🔑 code:

Note: for compilation it is only necessary to compile the *.c files and then link them together.

It can be done e.g. like this:

  cc struct-animals.c animal_minlegs.c

where the compiler will do the compilation of the individual object files and then call the linker to construct the binary (named a.out).

Or like this:

  cc -c struct-animals.c animal_minlegs.c
  cc -o animals struct-animals.o animal_minlegs.o

which is closer to what would be done using a Makefile.

Technically, animals.h contains code, however, given it is included in a .c file it is not necessary to compile it individually.

🔧 Task: use the code from previous task and implement (in separate .c file)

  static size_t getlegs(struct animal *);

that will return number of legs for a given animal.

animals: maximum number of legs

implement:

  struct animal *maxlegs(struct animal *, size_t len);

that will use the getlegs() function and will return an animal with highest leg count. Return pointer to the structure (= array element) from the function.

The main() function (in separate file) will define an array of animals and will call maxlegs(). The name of the animal with maximum number of legs will be printed to standard output.

Note: does the original structure change if the structure returned from the function was modified within the function? How to fix this ?

🔑 code:

🔧 animal sorting

🔧 (home) Task: sort the array by number of legs, print it out to standard output.

🔧 Task: sort the array by the animal name. Print it out to standard output. Use strcmp() to do the comparison of names.

🔧 Task: add a function that will sort according to the number of legs

Make the comparison functions static.

Use the standard libc sort function qsort(3). Check the manual page on how it's used. You will need to define a callback function that the qsort() function will use to compare two array elements.
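
To show the general shape of a qsort() callback without giving away the task, here is a sketch that sorts an unrelated, hypothetical structure point_s by one of its members:

```c
#include <stdlib.h>

struct point_s {
	int p_x;
	int p_y;
};

/* qsort() callback: gets pointers to two array elements as const void *. */
static int
cmp_by_x(const void *a, const void *b)
{
	const struct point_s *pa = a;
	const struct point_s *pb = b;

	/* Avoid "pa->p_x - pb->p_x", which can overflow for large values. */
	return (pa->p_x > pb->p_x) - (pa->p_x < pb->p_x);
}

void
sort_points_by_x(struct point_s *points, size_t len)
{
	qsort(points, len, sizeof (points[0]), cmp_by_x);
}
```

The callback returns a negative, zero or positive number, the same convention strcmp() uses.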

Make the program accept an argument (0 or 1) and run the sorting function based on that.

🔑 code:

Operator precedence

There are 15 levels of operator precedence, see the table on http://en.cppreference.com/w/c/language/operator_precedence

Associativity

If there are multiple operators with the same precedence in an expression, the evaluation is decided based on associativity.

For example:

8 / 2 % 3

has 2 operators with precedence level of 3 that have left-to-right associativity. Therefore, they will be evaluated as

(8 / 2) % 3
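
The effect of the grouping can be verified directly. The two function names are made up for the example:

```c
/* Parsed as (8 / 2) % 3 because / and % associate left to right. */
int
left_to_right(void)
{
	return 8 / 2 % 3;
}

/* Explicit brackets change the grouping and the result. */
int
regrouped(void)
{
	return 8 / (2 % 3);
}
```

The first function returns 1 (4 % 3), the second one returns 4 (8 / 2).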

Examples

*p++ is *(p++) as postfix ++ has higher precedence than *. However, the value of the expression is still *p, because p++ yields the original value of p and the increment happens only as a side effect.
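
A classic use of this idiom is copying a string one character at a time. The sketch below uses a hypothetical function copy_str and assumes the destination buffer is large enough:

```c
#include <stddef.h>

/* Copy a string; returns the number of characters copied (without '\0'). */
size_t
copy_str(char *dst, const char *src)
{
	size_t n = 0;

	while (*src != '\0') {
		/* *dst++ is *(dst++): store through the old pointer, then advance. */
		*dst++ = *src++;
		n++;
	}
	*dst = '\0';
	return n;
}
```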

🔧 Task: determine the outcome of these expressions/declarations:

  • *p++
  • ++*p
  • int *p[2]
  • int (*p)[3]

Operand evaluation order

Consider the following:

int foo(void);
int bar(void);

int x = foo() + bar();

The standard does not specify the order in which the operands are evaluated: foo() can be called before or after bar(). If we add another function:

int foo(void);
int bar(void);
int another(void);

int x = foo() + bar() + another();

then the expression is parsed as (foo() + bar()) + another(); however, the order in which foo(), bar() and another() are called is still unspecified.
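
If the call order matters (e.g. because the functions have side effects), it can be forced by using separate statements. In this sketch foo() and bar() record when they were called; the bookkeeping variables and the helper names are invented for the example.

```c
static int counter;
static int foo_seq, bar_seq;

static int
foo(void)
{
	foo_seq = ++counter;	/* remember when foo() was called */
	return 1;
}

static int
bar(void)
{
	bar_seq = ++counter;	/* remember when bar() was called */
	return 2;
}

/* The compiler may call foo() and bar() in either order here. */
int
sum_unordered(void)
{
	return foo() + bar();
}

/* Separate statements force foo() to be called before bar(). */
int
sum_ordered(void)
{
	int f = foo();
	int b = bar();

	return f + b;
}

/* Reports the order of the most recent calls. */
int
foo_ran_before_bar(void)
{
	return foo_seq < bar_seq;
}
```

Both functions return the same sum; only after sum_ordered() is foo_ran_before_bar() guaranteed to be true.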

Common gotchas

== or != versus =

the condition in the statement:

if ((c = getchar()) != 0)
    ...

needs to be bracketed this way because ==/!= has higher precedence than =.
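
The difference can be made visible by writing out both groupings explicitly. The functions below are a sketch with made-up names; they take the first character of a string instead of calling getchar() so that they are easy to try out.

```c
/* c gets the character; the comparison is done on the assigned value. */
int
with_brackets(const char *s)
{
	int c;

	if ((c = *s) != 0)
		return c;
	return -1;
}

/* Without the inner brackets, "c = *s != 0" is parsed like this. */
int
without_brackets(const char *s)
{
	int c;

	if ((c = (*s != 0)))
		return c;	/* c is the result of the comparison, i.e. 1 */
	return -1;
}
```

For the string "A" the first function returns 'A' while the second one returns 1.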

& or * versus -> or .

-> and . (structure member access) have higher precedence than & (address of) or * (dereference)

🔧 consider structure

struct bar {
	int val;
} bar;

struct {
	int a[42];
	char *b;
	struct bar *c;
} foo;

initialize members of foo with 1, 2, 3 and "BBB", respectively and bar with 42. Use designated initializers.

write these expressions to get:

  • the address of a
  • the address of b
  • the address of the second item of a
  • the address of the 3rd character from string b
  • the 3rd character from string b
  • value of val in bar using foo
  • address of val in bar using foo

Use as few brackets as possible.

solution: 👀 struct-op-precedence.c