08.md

Warm-up

🔧 Implement strcmp

Implement:

int strcmp(const char *s1, const char *s2);

Returns (according to the strcmp(3) manual page):

an integer greater than, equal to, or less than 0, according as the string s1 is greater than, equal to, or less than the string s2.

You can add the following piece of code into your main function to test it (and include assert.h):

assert(strcmp("", "") == 0);
assert(strcmp("", "foo") < 0);
assert(strcmp("foo", "") > 0);
assert(strcmp("abc", "abd") < 0);
assert(strcmp("foo", "bar") > 0);
assert(strcmp("foo", "foo") == 0);
assert(strcmp("foo", "fooz") < 0);
assert(strcmp("fooz", "foo") > 0);

If any assert triggers, your implementation is buggy.

🔑 strcmp.c

🔧 Compare your solution with the above. Try to reimplement it so that it is as smallest (in terms of Lines Of Code) as possible.

Program arguments

int main(int argc, char *argv[]);
  • The argv is declared as an array of pointers.

    • i.e. argv[i] is a pointer to char
  • The arguments of main() can have arbitrary names however please stick to the convention to avoid confusion of those who might be reading your program.

  • The argc is a number of command line arguments, including the command name itself (in argv[0]).

  • argv[i] are arguments as strings. They are strings even if you put numbers there on the command line.

  • argv[argc] is a null pointer by definition.

Note: remember (see notes about array passed to function ) that in a function argument, an array is always treated as a pointer so the above effectively becomes:

int main(int argc, char **argv);

I.e. in this context, char *argv[] and char **argv are the same thing and there is not a preferred option.

The declaration merely hints at the memory layout. That is how it was concieved by the fathers of C, unfortunately it often causes confusion.

Also, you already know that you can use an array notation with strings as well, so you could use argv[i][j] to print individual characters. Just make sure that it is not out of range.

👀 argv-as-2d-array.c

  • The memory for argc, argv is allocated before main() is called and the C99 standard leaves unspecified where argc/argv are stored.

    section 5.1.2.2.1: the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

  • A quote from the execve(2) man page on Unix systems:

The argv is an array of pointers to null-terminated strings and must be terminated by a null pointer.

If unsure, draw a diagram. The memory addresses are just examples:

  argv
  +-----------+
  | 0xFF00    |---------->+--------------+
  +-----------+    0xFF00 |   0xBB0000   |------------>+---+---+---+---+----+
                          +--------------+   0xBB0000  | p | r | o | g | \0 |
                   0xFF08 |   0xFFAA00   |-\           +---+---+---+---+----+
  argc                    +--------------+  \
  +----------+     0xFF10 |   0xCCFF00   |-  \----------->+---+---+---+----+
  |    3     |            +--------------+ \    0xFFAA00  | f | o | o | \0 |
  +----------+            |     NULL     |  \             +---+---+---+----+
                          +--------------+   \
                                              ->+---+---+---+----+
                                      0xCCFF00  | b | a | r | \0 |
						+---+---+---+----+

🔧 Task: print command line arguments

  • Print all command line arguments using argc
  • Print all command line arguments using just argv
  • Print all command line arguments not starting with -
  • Print all command line arguments using a recursive function (that accepts pointer to pointer to char).

Note: for all arguments print their address as well.

Note: do not print the terminating null pointer entry.

Some printf() implementations barf on a null pointer when printing via the %s format string.

Code:

Also see:

🔧 Print command line arguments (part II.)

  • Print all command line arguments without using square brackets.
  • As above but do not use any variable aside from argv.

🔧 Get char distance in specific argument

Write a program with usage ./a.out <a> <b> <string> to find a distance (number of characters) between the first occurence of character <a> and <b> in a string <string>. If either of the character is not found in the string, print an error.

./a.out a x "ahello xworld"
7

Note: do not use strchr() or the like.

🔑 argv-char-dist.c

Usage string: a semi-formal specification of program arguments

  • Usually used to print when invalid option or arguments are specified
  • Can be handled via errx()
  • The usage usually contains program name followed by the argument schema
  • See e.g. the nc(1) man page
  • Optional arguments are enclosed in square brackets, mandatory arguments are enclosed in <> or left without brackets

🔧 Write a program that takes 1 or 2 arguments. If run with any other count, print a meaningful usage and exit.

🔑 usage.c

🔧 Print the n-th character of the r-th argument (count from 0)

Usage: ./a.out <r> <n> [args]

Ignore argv[0], argv[1], and argv[2]. If there are not at least n extra arguments or the n-th argument is not long enough, print a helpful message. Only use pointer arithmetics, do not use square brackets (ie. argv[i][j] is not allowed).

./a.out 2 3 hey hi world
l

Note: use atoi() to convert the first 2 arguments to integers

🔑 argv-nr.c

🔧 What do these programs do when run with 2 arguments

Assume that the arguments are sufficiently long enough.

Skipping ahead: prefix ++ and dereference operator * have the same precedence so they are evaluated based on associativity which is right-to-left.

int
main(int argc, char **argv)
{
	printf("%s\n", argv[1]);
	printf("%s\n", ++*++argv);
	printf("%s\n", argv[0]);
	printf("%s\n", ++*++argv);
	printf("%s\n", argv[0]);
}

now with extra dereference:

int
main(int argc, char **argv)
{
	printf("%s\n", *++*++argv);
}

Note: the last function might not compile with smarter compilers (such as LLVM) that include format string checks. What is expected to happen if the last piece of code does compile and is run with one argument?

Code:

Structures

Basics

  • Collection of one or more members, possibly of different types, grouped together under a single name.

  • Is one of the two aggregate types (the other aggregate type is the array)

  • Structures permit group of related members to be treated as a unit (precursor to a class in Object Oriented Programming).

  • Structures can contain other structures.

  • Structure is specified as:

struct foo {
	... // members
};

e.g.

struct foo {
	int a;
	char b;
};
  • Any type can be a member of a structure that it not incomplete and not a function. Incomplete means its size is unknown; more on that later.

    • That means a structure may not contain itself (before the structure definition is finished with the terminating }, it is an incomplete type as its size is not yet known)
    • However: a pointer to its own type is possible (remember, a pointer is just a number referencing a piece of memory, and its size is known)
    • Unlike in C++, structure cannot contain functions. It may contain pointers to functions, though.
  • Structure does not need a name, 👀 struct-unnamed.c

    • However then its use is limited to a variable declaration
    • One can even have an "anonymous structure", however that is a C11 extension, 👀 struct-anon.c
  • The struct declaration itself cannot contain initializers. However, the structure can be initialized with a list of initializers in the same way as arrays.

So, you cannot do:

struct foo {
	int a;
} = { 1 };
  • When the structure has been defined, you can declare a variable of its type:
struct foo f;
  • Sometimes the _s or _st postfix is used to hint that a name is a structure

Note: the struct keyword has to be used for its definition and declaration:

foo f; is not valid.

  • Can declare structure and objects of its type at the same time, and you can also initialize it at the same time:
struct foo_s {
	...
} foo = { 0, ... };
  • However, this is unusual because structures are normally saved to header files, and including such a header file would mean actual object definition(s) which is rarely desirable.

  • For better code readability and also to be able to search for the members in (large) code base, members are often prefixed with a common string ending with an underscore to denote their structure membership, e.g.:

/*
 * 'sin' is a shortcut for 'Sockaddr_IN', the Internet socket address
 */
struct sockaddr_in {
	short   sin_family;
	u_short sin_port;
};
  • When looking for variable names in a big source code repository (using ctags, cstyle or tools such as OpenGrok), there would be large amount of generally named variables like port, size, etc in various source code files. However, with the prefix, like sin_port, very often you find just one, the one you are looking for.

Structure layout in memory

struct X { int a; char b; int c; };
  • The offset of the first member will be always 0.

  • Other members will be padded to preserve self-alignment (i.e. a member is always aligned in memory to multiple of its own size).

    • The value of the padding bits is undefined by definition and you must not rely on it.
  • What will be the result of sizeof (struct X) above?

    • Why? (think about efficiency of accessing members that cross a word in memory).
  • What if char d is added at the end of the data structure?

    • Why is that? (think about arrays and memory access again).
  • What if char *d is added at the end of the data structure? (i.e. it will have 4 members).

    • Assume this is being compiled on 64-bit machine.
    • Again, for efficiency the access to the pointer should be aligned to its size.
    • If in doubt, draw a picture.
+-----------+----+--------+------------+
|     a     | b  |   pad  |      c     |
+-----------+----+--------+------------+
  • Does the compiler reorder struct members? No, C is designed to trust the programmer.

👀 struct-X.c

Note: gcc/Clang has the -fpack-struct option that will condense the members at the expense of speed when accessing them. Use only when you know what you are doing as it may not be safe on all architectures.

There is also attribute (or preprocessor pragma) than can be used on per structure basis.

Link: http://www.catb.org/esr/structure-packing/

Structure members

Members are accessed via 2 operators: . and ->

  • Infix operators, left-to-right associativity, both are in the group of operators with the highest precedence (priority)

  • -> is used if the variable is a pointer, . otherwise

  • E.g.:

struct foo_s {
	int a;
	char b;
} foo;

foo.a = 42;
foo.b = 'C';

The . and -> operators have higher precedence than * and &, so: &foo.b gets the address of the member b.

Structure assignment is done byte by byte (shallow copy - does not follow pointers):

struct foo_s one, two;

one = two;
  • Handy for members that are pointers.
  • On the other hand for large structures (say hundreds of bytes) this can be quite an expensive operation.

Pointers to structures are often used:

struct foo_s *foo;

foo->a = 42;
foo->b = 'C';

👀 struct-reference.c

🔧 Write the above assignments to the members a and b using a de-reference operator on foo.

🔑 struct-access.c

🔧 now if a was a pointer to integer, how would the code change?

🔑 struct-access-ptr.c

🔧 Getting offset of a member

Write a macro (or start with a function with hardcoded values) that will print the offset of the specified member of a given structure.

      offsetof(struct X, a)

Hint: exploit the fact that pointer can be assigned an integer (0) + use pointer arithmetics

The macro is useful for debugging (mapping disassembly to C code based on literal offsets) and also when working with flexible array member.

Note: offsetof() is a standard macro available since ANSI C via stddef.h.

🔑 offsetof.c

Structure initialization

Can initialize a structure in its definition using the initiator list of values.

You must either follow the ordering of members:

struct foo_s {
	int a;
	char b;
	char *s;
};

struct foo_s foo = { 1, 'C', "hello world" };

👀 struct-init.c

or use designated initializers from C99:

struct foo_s foo = {
	.b = 'C',
	.a = 1,
};

The ordering in the struct declaration does not have to be preserved (but you really should follow it though).

Omitted field members are implicitly initialized the same as objects that have static storage duration (ie. will be initialized to 0).

👀 struct-designated-init.c

Operations on structures

You can only:

  • Copy a structure.
  • Assign to it as a unit.
  • Taking its address with &.
  • Access its members.

So, structures cannot be:

  • Compared (for that one has to implement a comparator function).
  • Incremented (obviously).

🔧 Animals as structures

Define array of structures of this type:

struct animal {
	char name[NAME_MAX];	// max filename length should be sufficient
				// even for these long Latin names
	size_t legs;		// can have many legs
};

And initialize it with some samples (can define the array in animals.h) and implement a function:

size_t count_minlegs(struct animal *, size_t len, size_t min);

That will return number of animals in the array (of len items) that have at least min legs.

Notice that the function returns size_t. This way it is ready for future expansion. If it returned unsigned int and 32-bits was not found enough later on, the prototype would have to be changed which would cause problems for the consumers of this API.

The function will be implemented in a separate file. (Do not forget to create a header file(s).)

In the main() program (first program argument will specify the min parameter for the function) pass an array of structures to the count_minlegs function and report the result.

Note: you will need:

  • limits.h for the NAME_MAX definition
  • stddef.h for size_t (as per C99, §7.17)

Code:

Note: for compilation it is only necessary to compile the *.c files and then link them together.

It can be done e.g. like this:

cc struct-animals.c animal_minlegs.c

The compiler will do the compilation of the individual object files and then call the linker to contruct the binary (named a.out).

Or as follows which is closer to what would be done using a Makefile:

cc -c struct-animals.c animal_minlegs.c
cc -o animals struct-animals.o animal_minlegs.o

Technically, animals.h contains code, however, given it is included in a .c file it is not necessary to compile it individually.

🔧 Use the code from previous task and implement (in separate .c file)

static size_t getlegs(struct animal *);

That will return number of legs for a given animal.

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

Animals: maximum number of legs

Implement:

struct animal *maxlegs(struct animal *, size_t len);

It will use the getlegs() function and will return an animal with highest leg count. Return pointer to the structure (= array element) from the function.

The main() function (in separate file) will define an array of animals and will call maxlegs(). The name of the animal with maximum number of legs will be printed to standard output.

Note: does the original structure change if the structure returned from the function was modified within the function? How to fix this ?

Code:

🔑 animal_maxlegs.c 🔑 maxlegs.c 🔑 animals.h 🔑 animal.h