12.md

🔧 Warm-up: bit difference

Task: write a program that counts the number of bits that need to be flipped so that number A becomes number B.

Note: use the character representation only for debug prints. The actual operations with the numbers are to be done with bit operations on integers.

Note: consider only positive integers, assume int.

👀 bitdiff.c 👀 binprint.c

Sub-task for home: flip those bits and verify that the numbers are the same.

Functions with variable number of arguments

🔧 write a function that takes variable number of ULL arguments and returns their sum

  • use the stdarg(3) man page
  • what happens if there is one int value passed in between the ULL arguments ?

🔑 solution: 👀 stdarg.c

Linking objects

Larger programs are often split into many source code files. One can create the binary as follows:

cc -o binary foo.c bar.c ...

and it will work however the files will be compiled every time even though they were not changed.

To save time the files are usually compiled individually and then linked into the binary using a runtime linker (sometimes called a link editor), a program usually called ld. This program also takes care of adding dependencies on dynamic libraries to the resulting binary.

It works like this (assuming all source code files changed):

cc -c foo.c
cc -c bar.c
...
cc -o binary foo.o bar.o ...

Note that the program ld was not invoked directly, it is the compiler that serves as an ld wrapper as it needs to add more object files with the C runtime under the hood. So in reality the compiler accepts some linker options (and passes them through) to make it easier for the programmer.

For GCC you can use the -### option to see what commands are executed during compilation.

Debugging

Debug prints

  • the most primitive debugging technique is to insert debug prints
  • fprintf()
    • do not use printf() for debugging unless you know what you are doing (stdout is cached in libc)
    • use a unique text for each debug text, for example:
/* some code */
fprintf(stderr, "DBG 1\n");
/* some code */
fprintf(stderr, "DBG 2\n"); ...
/* some code */
fprintf(stderr, "DBG 3\n"); ...
...
  • if you need to put another debug message in between 1 and 2, "1.5" seems like an easy way to do it.

Asserts

  • assert(3)
    • basic implementation of assert, provides the run-time condition checking
    • #define NDEBUG disables it. Check the C spec for more information.
    • you need <assert.h>
    • assert(0) is of common use to bail out and to know where it did

Task: write a custom assert() macro that prints the line and file information (using __LINE__, __FILE__ pre-defined macros) and exits.

👀 assert.c

  • there is also #error pre-processor pragma that can be used to fail during the compilation phase

  • there is static_assert() that is evaluated during compile time however this is available only in the C++0x standard

    • it can be handy for things (this is rather a hack) like:
int i;

static_assert(sizeof (void *) >= sizeof (i));
foo((void *) i);
  • in C99 we can use an array indexing with ternary operator as a check

👀 static_assert.c

  • to see how it works, run it through the pre-processor (use the -E compiler option or run cpp on the source file)

System call tracing

  • system calls: interface to the kernel to provide services like memory allocation, I/O, ...

  • prints system call arguments, return values, timing information etc.

  • this works by inserting break points or via dynamic instrumentation

    • the break points make the program slower, this overhead could be much larger than what is allowed in production
  • basic tools:

    • strace/truss/ltrace
    • dtrace (dynamic tracing)

🔧 Task: write a program that opens a file specified by the first argument of the program and reads a number of bytes specified by the second argument and writes them to standard output. Use fopen()/fread() without checking return values or error conditions. What happens if the program tries to read files like /etc/shadow on Unix? Run the program under strace to see possible clues. Refactor the program so that the I/O is performed in a function:

static void file_read(char *file, size_t len);

Make the function perform in a cycle with 16k iterations and run the program with /etc/passwd as the argument.

Again run the program under strace to see possible clues.

🔧 Now fix the program so that:

  • it properly detects all errors
  • the function can be called arbitrary number of times (in a sequence)

👀 file-open.c

Static analysis

  • compile time warnings are inherently limited
    • however, they can report on wide range of errors (format strings, offer missing include files, detect simple buffer overflows, ...)

👀 fmt-string-invalid.c 👀 buf-overflow.c

  • it fails to cover some corner cases

  • static analysis tools hook into compiler and can follow all code paths

    • by constructing graph of code blocks
    • this is useful also for taint analysis - following potentially unsanitized data input across all code paths to determine if it can be used to exploit code deficiencies
    • the method has its drawbacks - false positives, can be foiled sometimes because it does not understand enough context and program internals
  • clang: the scan-build script is a front-end to the library performing checks, accepts compilation line as argument

  • the compilation line can be a simple compiler invocation and also make

👀 buf-overflow-func.c

  • when compiled with -Wall -Wextra no warnings/errors are reported
  • when run, it usually does (depends on compiler/system) not produce any warnings and happily prints the corrupted array
    • try to compile with -DBUFFER_SIZE=128 and the result might be different due to stack smashing detection supplied by the compiler
      • if the program is compiled with -fno-stack-protector, the stack canary is not added and the program usually exits with a segmentation fault
      • try with different compilers (gcc, clang)

Run: scan-build clang buf-overflow.c

Note: works with gcc too

🔧 Task: write a program that accesses memory allocated on the heap after it is freed. See if the scan-build static analyzer can detect it. Modify the program to return the allocated memory from a function and then free + modify it in main(). Does scan-build still detect this use-after-free bug?

👀 use-after-free.c

Library call tracing

  • ltrace is the most widely used tool to print calls to library functions
    • it can provide indentation (-n)
    • can be used to do simple profiling (with -tt)
  • to print calls within the program itself use -e @MAIN
    • in a different tool, truss (Solaris, *BSD), one can do that with -u a.out

👀 recurs.c

  • run with: ltrace -n4 -L ./a.out 5

Note: ltrace can trace system calls as well with the -S option

Links:

Dynamic analysis

  • run-time analysis performed by running the program in an emulation layer (virtual machine), intercepting calls to memory allocator etc.
  • can provide more detailed detection at the cost of performance and coverage - can detect only in the code paths actually executed (compared to static analysis)
    • the performance degradation incurred by dynamic analysis is usually much higher than the one of syscall/library tracing

🔧 Task: Take a look at the code and try to find as many bugs as possible: 👀 shell.c

Then compile the program with as many checks as possible and run the program through the static analysis.

Then run it through Valgrind, perform couple of operations and exit the program, i.e.

  1. valgrind ./a.out
  2. echo couple of "commands"
  3. hit Ctrl-C

Also run it automatically, e.g.:

$ cat << EOF >/tmp/input
foo
bar
foo bar
EOF
$ valgrind ./a.out < /tmp/input
  • try to compile the program with debug info (-g) and re-run valgrind with --leak-check=full and see the difference in the output.
  • fix the program and re-run the test(s) again

Dynamic tracing

  • dtrace / SystemTap
  • should have minimal overhead, mostly can be run on production systems
  • provides syscall/library tracing as well as variety of other insights into the system (neworking abstractions, CPU performance counters/cache analysis, ...)
  • allows to ask (before unheard of) questions that could not be answered before
    • e.g. what process is stressing particular disk with I/O, what are the most contributing processes that send TCP traffic to port X and what are their most frequently executed stack traces, ...

Debugger

  • can be used for both live debugging and post-mortem analysis (from a core file produced by the program)

    • works by inserting breakpoints and displaying memory of the process
      • this is usually done using the ptrace(2) syscall
  • most common debuggers: gdb, lldb, dbx, mdb

    • some of them have the concept of plug-ins
  • compiling binaries for the debugging: the -g compiler option adds debugging information to the resulting binary, in particular mapping of assembly instructions to the source code lines

    • one can debug without this data, just some assembly knowledge is required
  • will describe gdb here

  • compile and run the program from gdb:

cc -Wall -g random.c
gdb ./a.out
> r
  • exiting the debugger:
> quit
  • or Ctrl-D

  • to get the core file, the system limit for core file may need to be bumped

ulimit -c 1000000
  • and re-run the program again

  • display the stack trace at the moment of breakpoint stop/crash:

> backtrace
  • see where exactly it crashed:
> disassembly
> info reg
> list
  • insert breakpoint
> b main
> b run
  • can also insert breakpoint to particular line (if the program was compiled with line information):
> b 8
  • print breakpoints:
> info breakpoints
  • stepping
    • single line (assembly instruction) stepping:
> next
  • step through:
> step
  • printing memory

    • print variable value:
> print idx
> print p
  • can use the usual dereference/ref operators: (once inside run())
> print *p[idx]
  • print a variable in each step:
> display i
  • to disable the printing:
> delete display i
  • breakpoints can be conditional:
> break 14 if i == idx

🔧 Warm-up/home task: alternating bits detection

Task: detect if an integer has alternating bits, e.g. 10101

Note: this can be done in O(1) time using bit operations + arithmetics

🔧 Warm-up: Number of bits for binary representation

Given an int (assume 4 bytes) with a positive value, use bit operators to find out how many bits (and bytes) will be needed to represent the number in binary. Then rewrite 👀 binprint.c using the acquired number.

Note: be careful about little vs big endian