Print argv[1]
in binary (assume it is a correct positive decimal number
that will fit unsigned int
) to standard output.
Note that printf
(3) does not have a conversion specifier for it (unlike 'x'
for hexa and 'o' for octal). Limit the input to positive int
s.
Do not use bit operators even if you know how to (ie. do NOT use >>
etc.)
Implement the functionality in void binprint(int)
and call it from main()
.
To verify, use bc(1) with obase=2
. E.g.:
$ bc
obase=2
10
1010
255
11111111
2^31-1
1111111111111111111111111111111
$ ./a.out 2147483647
1111111111111111111111111111111
$ ./a.out 255
11111111
$ ./a.out 348508345
10100110001011101000010111001
You can also use obase=8
, obase=16
, etc. Note that bc(1) takes only capital
letters as digits, ie. use "E4" for 0xe4, not "e4". "e4" is treated as a
variable:
$ bc
ibase=16
obase=2
e4
0
E4
11100100
Bitwise operators are as follows:
Operator | Meaning |
---|---|
& |
bitwise AND |
| |
bitwise OR |
^ |
bitwise XOR |
<< |
left shift |
>> |
right shift |
~ |
one's complement (unary) |
Note, individual bits are being processed. I.e.:
316 & 978 -> convert to binary to see it
100111100 & 1111010010 -> do logical AND bit by bit now
0100111100
1111010010
----------
0100010000 == 272 (use ibase=2 with bc(1) to verify)
So,
((338 & 978) == 272). // == is of higher prority than &
When shifting to the left, the lower bits are filled with zeroes.
When shifting to the right, it depends on the signedness of the number. For unsigned values the upper bits are filled with zeroes. If the number is signed and negative (i.e. starts with bit 1 in two's complement representation), the behavior is implementation dependent. Usually the value it is sign-extended, i.e. filled with 1's.
n = n & 01777; // zero out some highest bits
/*
* Setting individual flags. Note that each of the flags has
* just one bit set.
*/
#define LIGHT_OFF 0x0000
#define LIGHT_GREEN 0x0001
#define LIGHT_RED 0x0002
#define LIGHT_BLUE 0x0004
#define LIGHT_YELLOW 0x0008
#define LIGHT_VIOLET 0x0010
#define LIGHT_WHITE 0x0020
// ...
lights = lights | LIGHT_YELLOW; // turn on yellow light
lights = lights & ~LIGHT_RED; // turn off red light
Another common case is to use an integer type to store multiple kinds of
information, with mask to extract the parts. Here, a unsigned short
type
stores the byte value and associated flags:
#define VALUE_MASK 0xff
#define FLAG_MASK 0xff00
#define ONE 0x0100
#define TWO 0x0200
unsigned short n = 16 | ONE | TWO;
if ((n & FLAG_MASK) & ONE)
printf("ONE\n");
printf("%hhu\n", n & VALUE_MASK);
When working with bitwise operators, operands are promoted the same way as we learned in Conversions and promotions
That means:
13U & -1 -> promote int (-1) to unsigned,
i.e. promote -1 to unsigned.
You get:
13 & 4294967295 -> now convert to binary
1101 & 11111111111111111111111111111111 -> the result clearly is 1101
👀 conv-and-prom-with-bitwise-ops.c
🔧 Task: rewrite print-in-binary.c
using bitwise operators
🔧 Task: count 1 bits in a number (argv[1]
).
$ ./a.out 337
4
$ ./a.out 13375
9
Task: write a program that counts the number of bits that need to be flipped so that number A becomes number B.
Note: use the character representation only for debug prints. The actual operations with the numbers are to be done with bit operations on integers.
Note: consider only positive integers, assume int
.
Sub-task for home: flip those bits and verify that the numbers are the same.
Task: detect if binary representation of an unsigned int
has alternating bits
in the whole width of the type. (e.g. 01010101
if the type was char
).
The answer from the program will be yes/no (printed to standard output).
Start with naive/straightforward implementation, i.e. traverse all the bits (O(n) time) and use bit operations/arithmetics.
Note: this can be done in O(1) time by comparing the value with 2 constants.
Given an int
(assume 4 bytes) with a positive value, use bit operators to find
out how many bits (and bytes) will be needed to represent the number in binary.
Then rewrite
:eyes: print-in-binary.c
using the acquired number.
Note: be careful about little vs big endian
To read a complex declaration, you use operator priorities and watch for parentheses.
char **argv
argv
is a pointer to a pointer to char
int (*a)[10]
a
is a pointer to an array of 10 int
s
int *a[10]
a
is an array of 10 pointers to int
void *myfunc()
myfunc
is a function returning a pointer to void
void (*myfunc)()
myfunc
is a pointer to a function returning void
char (*(*x())[])()
x
is a function returning pointer to an array of pointers to a function
returning a char
char (*(*x[3])())[5]
x
is an array of 3 pointers to a function returning a pointer to an
array of 5 char
s
- find the identifier (non keyword or custom type) of a variable or function
- start decoding:
- left to right
-
)
=> reverse decoding direction -
()
=> denotes function -
[]
=> array -
;
=> reverse the direction - when reading from right to left, we can hit:
-
(
=> reverse the direction -
*
=> pointer - type identifier => starts the whole definition
-
Example:
char *(*(**foo[][8])())[];
You can mentally simplify as:
char *(*SOME_FUNCTION)[];
Where SOME_FUNCTION
is (**foo[][8])()
Read as:
foo
is an array of arrays of 8 pointers to a pointer to a function
returning a pointer to an array of pointers to a char
.
Alternatively, you can do it via two simpler steps:
typedef char *a_of_p[];
typedef a_of_p *(**foo[][8])();
🔧 Practice:
int *(*(*fp1)(int))[10];
char (*(*x())[])()
double *f()[] // this is not legal in C as a function cannot
// return an array
Larger programs are often split into many source code files. One can create the binary as follows:
cc -o binary foo.c bar.c ...
and it will work however the files will be compiled every time even though they were not changed.
To save time the files are usually compiled individually and then linked into
the binary using a runtime linker (sometimes called a link editor, or just a
linker), a program usually called ld
. This program also takes care of adding
dependencies on dynamic libraries to the resulting binary, performing
relocations (for relative addressing), etc.
It works like this (assuming all source code files changed):
cc -c foo.c
cc -c bar.c
...
cc -o binary foo.o bar.o ...
Do not confuse the runtime linker with dynamic linker, usually called
ld.so
. See its manual page for more information.
Note that the program ld
was not invoked directly, it is the compiler that
serves as an ld
wrapper as it needs to add more object files with the C
runtime under the hood. So in reality the compiler accepts some linker options
(and passes them through) to make it easier for the programmer.
For GCC you can use the -###
option to see what commands would actually be
executed during compilation. Use -v
to get the same debugging information but
also execute the commands.
Usually it is desirable to avoid rebuilding objects unnecessarily, especially
when there is a large number of source files. This is usually done via set of
rules that establish dependencies. The most common tool for that is make
and
Makefile
is its configuration file that drives the compilation.
If you want to make sure your code is built following a concrete C version, it has two parts:
- you must request the C specification version in your C code
- you need an implementation (compiler, linker, ...) that supports the version you want
In C89, there was no
__STDC_VERSION__
macro (note there are two underscores both leading and
terminating the macro). In the Normative Addendum
1 for C89 from 1994, the macro was
defined with a value of 199409L
. In the
C99 specification
the macro has a value of 199901L
.
So, if you want your compiler to use a minimal or a specific C version, work
with the macro and #ifdef
and #error
preprocessor directives.
#ifndef __STDC_VERSION__
#error "__STDC_VERSION__ not defined. C99 compiler or greater required."
#else
#if __STDC_VERSION__ < 199901L
#error "C99 compiler or greater required."
#endif
#endif
In the 1997: Single UNIX Specification (SUS) version
2,
with systems supporting that version as UNIX98, the compiler binary supporting
C89 had to be named c89
. You can search SUS ver 2
here, and type c89
.
In
POSIX:2004,
which corresponds to SUS ver 3 with updates, the C compiler binary supporting
C99 standard had to be named c99
. The latest SUS
version (as of May, 2020)
still defines c99
. I expect the future SUS versions with require c11
binary
to support the C11 standard.
Might be difficult to get the compiler work as the exact specification version.
For example, the following will compile while //
are not part of C89, and
moreover, _Bool
is not either and it does not issue even a warning.
janp:air:~$ cat main.c
int
main(void)
{
// not in C89
_Bool b;
}
$ c89 main.c
main.c:4:2: warning: // comments are not allowed in this language [-Wcomment]
// not in C89
^
1 warning generated.
$ gcc -std=c89 -pedantic main.c
main.c:4:2: warning: // comments are not allowed in this language [-Wcomment]
// not in C89
^
1 warning generated.
- since C99 (§6.7.2.1, item 16, page 103)
- it is an array without a dimension specified as the last member of the
structure
- handy for structures with a fixed header and some "padding" data of flexible length that is allocated dynamically
- why not to use a pointer instead? It is good when passing data over
boundaries such as network, kernel/userland, etc. since deep structure
copy is not necessary.
- just copy the structure as a whole (however, it is necessary to know how large is the padding) because it is all contiguous memory
-
sizeof (the_structure)
gives you the size of the structure as if the flexible array was empty.
Example:
struct item {
int itm_value;
/* Other members follow*/
/* ... */
int itm_plen; // payload length in bytes
char itm_payload[];
/* Nothing can follow! */
};
The memory layout looks like this:
+-------------+-----------------------+
| struct item | payload ... |
+-------------+-----------------------+
Previously, this was hacked around using array with 0 items and GCC accepted this. Since C99 this can be done properly using this technique.
The data for the structure is allocated as follows:
struct item *p;
p = malloc(sizeof (struct item) + payload_len * sizeof (p->itm_payload[0]));
assert(p != NULL);
p->itm_plen = payload_len * sizeof (p->itm_payload[0]);
- with this approach the overall structure alignment might be lost
- i.e. it is necessary to set the payload length according to the size of the structure if you want to maintain the alignment
- sometimes memory is scarce (imagine having to keep millions of big structures
in memory) and there are members holding integer values that occupy just a
couple of bytes
- bit fields can be used to shrink the memory needed
struct foo {
unsigned int a : 3;
unsigned int b : 1;
unsigned int c : 4;
};
-
the number after the colon specifies the number of bits for a given bit field
- cannot exceed the size of the underlying data type (
unsigned int
in the above example)
- cannot exceed the size of the underlying data type (
-
you cannot use
sizeof
on a bit field -
this is good for implementing network protocol headers or HW registers, however the layout of bit fields in a C structure is implementation dependant
- if it needs to match a concrete layout, additional non-standard compiler
features have to be used (
#pragma
's etc.) - there is no
offsetof()
for bit fields
- if it needs to match a concrete layout, additional non-standard compiler
features have to be used (
- the integer values will behave as expected w.r.t. underflow/overflow and integer conversions/manipulations, e.g.
struct foo_s {
unsigned int a : 3;
unsigned int b : 1;
unsigned int c : 4;
} foo;
foo.a = 7;
foo.a++; // will wrap around to 0 since this is unsigned
If your library is stateful, the state is often managed by a handle used as an argument in the library function calls. Let's say internally in the library we need to track some ID and a position. Those members are expected to be used only by the library itself so the consumers of the library should not care or use whatever is inside the handle.
/* mylib.h */
struct ml_hndl {
int ml_id;
int ml_pos;
};
/* mylib.c */
struct handle *
ml_init(void)
{
/* allocate and init the handle first */
/* ... */
return (h);
}
/* mylib.c continues */
enum ml_err
ml_parse(struct handle *h, char *url)
{
enum ml_err e = ML_OK;
/* parse the URI */
/* ... */
return (e);
}
/* ... */
And the consumer would use it as follows:
#include <mylib.h>
int
main(void)
{
ml_err ret;
struct ml_hndl *h;
h = ml_init(void);
if ((ret = ml_parse(h, "some URI")) != ML_OK)
errx(1, "...");
/* ... */
}
So far, so good. However, what if the consumer code delves into the handle and uses the information in there (because the header file is part of the library and available to the consumer)?
int
main(void)
{
/* ... */
print_debug("[Debug] handle ID: %d\n", h->md_id);
/* ... */
}
This will work fine until the library provider changes the handle (believed to be internal to the library):
/* mylib.h */
struct ml_hndl {
long long ml_id; /* used to be an int */
int ml_pos;
};
Unless the consumer code is recompiled after the change, which may not happen as it could be some 3rd party software bought by the customer years ago, the debug print is now incorrect as it only prints the first 4 bytes of the structure, not 8.
There are ways to mitigate that:
- use something non-revealing as a handle, like in index to the internal array of structures that keep the states. That is possible but you now need to manage the internal structure that keeps the state structures. You also cannot change the index type unless you recompile all the consumers as well.
- the library provider could use something called incomplete types and use them to provide an opaque handle.
An incomplete type is a type that describes an object but lacks information needed to determine its size. You cannot declare objects using such types as the lack of the size prevents to allocated memory for them on the stack or heap.
$ cat main.c
struct x a; /* declaring a variable using a structure of an unknown size */
int
main(void)
{
}
$ cc main.c
main.c:1:10: error: tentative definition has type 'struct x' that is never
completed
struct x a;
^
main.c:1:8: note: forward declaration of 'struct x'
struct x a;
^
1 error generated.
The x
structure could be completed, for all declarations of that type, by
declaring the same structure tag with its defining content later in the same
scope. We could only forward declare the structure though (that is, no
defining any object of that structure type).
As we do not need to know the size of the x
structure, the following code
compiles.
$ cat main.c
struct x; /* this is called forward declaration */
int
main(void)
{
}
$ cc main.c
$ echo $?
0
BTW, the void
type is an incomplete type that can be never completed. So, you
cannot declare a variable of type void
.
$ cat main.c
int
main(void)
{
void x;
}
$ cc main.c
main.c:4:7: error: variable has incomplete type 'void'
void x;
^
1 error generated.
However, you can always declare and work with a pointer to an incomplete type (all structures are always aligned to the same byte boundary). The following will compile and run fine.
#include <stdio.h>
struct x *
myfn(struct x *p)
{
return (p);
}
int
main(void)
{
myfn(NULL);
}
This feature of the C language can be used to represent opaque handles, and those can be used to solve the problem with non-transparent handles .
With the help of incomplete types, it is possible to declare a structure in a header file along with API that consumes it without revealing what the structure actually contains. The reason to do that is not to allow a programmer to depend (= use) on the structure members, thus allowing to change the structure innards anytime.
/* Forward declaration in foo.h */
struct foo;
void
do_stuff(struct foo *f);
The file implementing do_stuff()
will cast the opaque structure object to
an internal structure type.
/* in foo_impl.h not provided to the consumers */
struct foo_impl {
int x;
int y;
};
/* library implementation code */
void
do_stuff(struct foo *f)
{
struct foo_impl *fi = (struct foo_impl *)f;
fi->x = ...
fi->y = ...
}
Then any .c
file that includes foo.h
can work with the structure
(by passing it to do_stuff()
) however cannot access its members directly.
The structure is usable only via a pointer. Thus, the library has to provide
a function to allocate a structure and return the pointer to it
(and possibly also to initialize it) and functions to get/set the data
inside the structure.
This is handy for libraries so they are free to change the layout of structures without breaking the consumers.
The consumer of the library then #include
s only foo.h
but it does not have
access to foo_impl.h
.
#include <stdio.h>
#include "foo.h"
int
main(void)
{
struct foo *h;
h = getFoo();
doStuff(h);
}
👁️
- 👀 foo.h
- 👀 foo_impl.h
- 👀 foo.c
- 👀 foo-consumer.c
Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.
Implement a function getbits(x,p,n)
that returns the (right adjusted) n
-bit
field of x
that begins at position p
. We assume that bit position 0
is at
the right end and that n
and p
are sensible positive values. For example,
getbits(x,4,3)
returns the three bits in positions 4, 3 and 2, right-adjusted.
The idea is to zero out anything that is to the left of p
, then shift those
n
bits to the very right, and you get the result.
/* 1 0000 0000: should return 1 */
i = getbits(0x100, 8, 1);
(void) printf("0x%X\n", i);
assert(i == 1);
/* 1110 0100: should return 9 (1001 in binary) */
i = getbits(0xe4, 5, 4);
(void) printf("0x%X\n", i);
assert(i == 9);
/* 1010 1010: should return the same number, ie. 0xAA */
i = getbits(0xAA, 7, 8);
(void) printf("0x%X\n", i);
assert(i == 0xAA);
In memory, build a tree of paths, where each node is a path element. The tree
starts with root /
(assuming Unix-like system), the non-leaf nodes are directories, the leaves are regular files.
The tree can look e.g. like this for paths /foo/a.c
, /foo/b.c
, /f.txt
, /bar/c.c
:
"/"
/ | \
/ | \
"foo/" "f.txt" "bar/"
/ \ |
"a.c" "b.c" "c.c"
Populate the tree with given paths and print the leaves (no particular order), one per line. Then free the allocated memory.
Each node can have constant number of children.
Possible input:
char *paths[] = {
"/beverages/coffee/espresso.java",
"/beverages/alcohol/beer.c",
"/food/healthy/vegetarian/salad.txt",
"/food/healthy/fruit/blueberries.hs",
"/food/unhealthy/cake.md"
};
- make the function used for tree traversal generic, i.e. pass in a function pointer
- Print also non-leaf nodes via depth first traversal, with each node indented based on its tree depth. E.g:
$ ./a.out
/
foo/
a.c
b.c
f.txt
bar/
c.c
- Remove the limitation for the constant number of children.
- Print full paths of all files in the tree (hint: backpointers)