Labs Assignment for 2019 - Simple Shell
=======================================

Write a simple Unix/Linux shell.

Version: Nov 1, 2019

When you are done, send us a link to your GitHub project, or to a
project on any similar hosting site that uses either Git or Mercurial.
Your project must provide a Makefile so that running the "make" command
builds the shell as a binary called "mysh" (this saves us work when
running automated tests).

The first phase must be finished and handed in before the end of the
Winter semester, i.e. by Sunday 23:59, January 12, 2020.  All examples
shown below must work as expected.

The second phase must be finished and handed in before the end of the
Summer semester, i.e. by Sunday 23:59, May 24, 2020.

The deadlines are hard requirements, please stick to the schedule.
Obviously, you can send us the whole thing in one step before the first
deadline; there is no need to split the project into two parts.

If in doubt, please ask.  We strongly prefer that you send questions to
the class mailing list, nswi015-l@mff.cuni.cz, so that others may
benefit from the answers as well.

C Style
-------

Please choose a good C style, for example
http://mff.devnull.cz/cstyle/, or pick another existing C style (e.g.
https://man.openbsd.org/style).  Please do not invent your own C style.

This class is about Unix/Linux programming in C, so we only accept the C
language, not C++.

Testing and verification scenario
---------------------------------

Use any Unix/Linux/MacOS system you want, but your code must build and
work on a Linux machine in the MFF UK lab with the "u-pl" prefix, e.g.:

        u-pl1.ms.mff.cuni.cz

Use Valgrind to make sure you do not mismanage memory.  Valgrind is
installed on all Linux machines in the MFF UK Unix lab.

Compiler and warnings
---------------------

Your code must build with no warnings with the default gcc installed on
a Linux machine in the MFF UK lab (see above).  You must use both the
-Wall and -Wextra options.

Implementation notes
--------------------

We are interested in you dealing with the Unix/Linux material you learn
in the class, so we recommend not spending time writing a linked list
implementation or a command line parser (the latter is much harder than
one might think).

For getting the input line, we recommend the "readline" library, see
"man readline".  You should find it on any Linux, *BSD, MacOS, or
Solaris machine.  By using readline, you get line editing, history,
completion, etc. for free.  However, note that readline is GPL (not
LGPL), so merely linking your shell to it requires the entire combined
application to be licensed under the GPL when distributed, to comply
with section 5 of the GPL.  If you ever plan to use your shell code
somewhere else, you need to keep that in mind.  If you do not want to do
that, an alternative is libtecla.  There is also BSD libedit
(https://man.openbsd.org/OpenBSD-6.0/el_init.3).

For tokenizing and parsing the command line, we suggest you use GNU
flex(1) and bison(1).  They are easy to use and you should find them
already installed virtually everywhere.  We recommend O'Reilly's book
"flex & bison".  Reading the first 30 pages of the book should be all
you need for this project.  I believe you can find the book online.

Of course, if you choose to, feel free to parse the command line
explicitly with your own code.

You will probably need a linked list implementation as using fixed
sized arrays would not suffice.  We suggest you use the macros from
<sys/queue.h>.  See "man queue" for more information.  The STAILQ_* and
TAILQ_* macros should be all you need.

FYI, our implementation of the 1st phase is 560 lines long, including
the flex (.l) and bison (.y) source files and not including the C parser
code generated from our .l and .y files, while using the usual C style
from http://mff.devnull.cz/cstyle/.  Our complete implementation
including the 2nd phase requirements fits within 900 lines of code.
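
As an illustration of the queue macros suggested above, here is a
minimal sketch of a tail queue holding the words of one command and its
conversion to an argv array suitable for execvp(3).  The names "arg",
"arg_list", arg_list_append(), arg_list_to_argv() and arg_list_free()
are hypothetical and are not part of the assignment; only the TAILQ_*
macros come from <sys/queue.h>.

```c
#define	_POSIX_C_SOURCE 200809L	/* for strdup(3) */

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* One word of a command line (hypothetical type, for illustration). */
struct arg {
	char *text;
	TAILQ_ENTRY(arg) link;
};

/* Declares "struct arg_list", the head of a tail queue of struct arg. */
TAILQ_HEAD(arg_list, arg);

/* Append a copy of word to the list.  Returns 0 on success, -1 on error. */
static int
arg_list_append(struct arg_list *list, const char *word)
{
	struct arg *a;

	assert(list != NULL && word != NULL);
	if ((a = malloc(sizeof (*a))) == NULL)
		return (-1);
	if ((a->text = strdup(word)) == NULL) {
		free(a);
		return (-1);
	}
	TAILQ_INSERT_TAIL(list, a, link);
	return (0);
}

/*
 * Build a NULL-terminated argv array.  The strings are still owned by
 * the list; the caller frees only the returned array.
 */
static char **
arg_list_to_argv(struct arg_list *list)
{
	struct arg *a;
	char **argv;
	size_t n = 0;

	TAILQ_FOREACH(a, list, link)
		n++;
	if ((argv = calloc(n + 1, sizeof (char *))) == NULL)
		return (NULL);
	n = 0;
	TAILQ_FOREACH(a, list, link)
		argv[n++] = a->text;
	argv[n] = NULL;
	return (argv);
}

/* Remove and free every element of the list. */
static void
arg_list_free(struct arg_list *list)
{
	struct arg *a;

	while ((a = TAILQ_FIRST(list)) != NULL) {
		TAILQ_REMOVE(list, a, link);
		free(a->text);
		free(a);
	}
}
```

A caller would initialize the head with TAILQ_INIT(), append each word
as the parser produces it, and free the list once the command has been
executed.  A tail queue is used here because appending at the end keeps
the words in command line order; STAILQ_* would work as well.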

IT GOES WITHOUT SAYING IT IS AN INDIVIDUAL, NOT A TEAM PROJECT.  DO NOT
USE CODE WRITTEN BY OTHER FELLOW STUDENTS.  THE IDEA IS YOU LEARN BY
CODING YOUR OWN PROJECT.

PROHIBITED FUNCTIONS
--------------------

The fopen(), fread() family of functions and the like are prohibited.
Instead, use functions from the printf(3) family, and system calls like
open(2), read(2), write(2), etc.

popen(3), system(3), and all other functions that fork() inside are
strictly prohibited.  The only way for you to create a new process is to
use fork(2).

You must not exec any shell from your code.  I.e. if you have a pipeline
"cat /etc/passwd | wc -l", the solution is not the following:

        if (fork() == 0) {
                execl("/bin/sh", "sh", "-c", "cat /etc/passwd | wc -l", NULL);
        }

You really have to fork a new process for each command in a pipeline,
and connect them together with pipes.

Phase 1
-------

Note - the "mysh$" prompt in the examples below represents the behavior
of the shell you implement.  The "Bash$" prompt represents the system
shell you start your shell from.

- Honor and use the existing PATH you get from your parent shell, i.e.
  so that one can use "ls" and the command will execute.

- ^C (= SIGINT) in the prompt kills the current unfinished line and
  offers a new prompt (the same way bash does it).  E.g.:

        mysh$ ls xxx^C
        mysh$

- An empty command just prints a new prompt (as bash does it).  E.g.:

        mysh$
        mysh$

- Implement simple foreground command execution.  Commands may be
  separated with a semicolon.  Any number of commands separated by a
  semicolon must be supported (i.e. until you reach some other system
  limits):

  - i.e. "cmd [; cmd2 [; cmd3 ...]]"

  - for example:

        mysh$ date
        Mon Oct 8 04:51:27 PDT 2018
        mysh$

        mysh$ /usr/bin/echo X
        X
        mysh$

        mysh$ echo X; date; echo Z
        X
        Monday, October 8, 2018 at 11:36:04 AM CEST
        Z
        mysh$

  Whitespace does not matter around the semicolon, i.e.:

        mysh$ echo X;date;echo Z
        X
        Monday, October 8, 2018 at 11:36:04 AM CEST
        Z
        mysh$

- A semicolon after the last command is legal (as it is in bash).
  E.g.:

        mysh$ echo x;
        x

  However, the following is not legal:

        mysh$ date;;
        error:1: syntax error near unexpected token ';'
        mysh$

- Implement an internal "exit" to exit the shell.  E.g.:

        Bash$ ./mysh
        mysh$ exit
        Bash$ echo $?
        0
        Bash$

  The internal exit command does not change the shell's return value
  set from a previous command execution.  See below for more
  information.

- ^D (i.e. Ctrl-D) exits the interactive shell as well (same as the
  "exit" internal command).  Again, see bash for an example.

- Implement an internal "cd" command.  It works in three modes:

  - "cd" goes to $HOME
  - "cd -" goes to the previous directory after it prints the target
  - "cd <dir>" goes to "<dir>"

  The internal cd must set the PWD and OLDPWD environment variables
  (again, see bash).  E.g.:

        mysh$ pwd
        /data/maish
        mysh$ cd
        mysh$ pwd
        /home/jpechane
        mysh$ cd -
        /data/maish
        mysh$ pwd
        /data/maish
        mysh$ cd /tmp
        mysh$ pwd
        /tmp
        mysh$ cd -
        /data/maish
        mysh$ pwd
        /data/maish

- The "-c" option is supported as in any other usual shell.  E.g.:

        Bash$ mysh -c "echo XXX; date"
        XXX
        Monday, October 8, 2018 at 1:36:56 PM CEST
        Bash$

- The "#" comment is supported.  E.g.:

        mysh$ echo X # comment
        X
        mysh$ # another comment
        mysh$

  Or:

        ./mysh -c 'date; echo X # comment'
        Wednesday, October 10, 2018 at 10:10:09 AM CEST
        X

  Comments must also work in the non-interactive mode, see below.

- The non-interactive mode is supported.

  - i.e. 'mysh test.sh' must work and the commands from the file are
    executed sequentially.

  - the shell errors out on the first syntax error as a usual shell
    does.  E.g.:

        Bash$ cat test.sh
        echo X
        date
        ;;
        echo Y
        Bash$ ./mysh test.sh
        X
        Monday, October 8, 2018 at 11:38:16 AM CEST
        error:3: syntax error near unexpected token ';'
        Bash$

  - the shell must treat syntax errors gracefully.  Print the line
    number where the problem occurred.  See the error message above.

- In all modes, the return value of your shell is the return value of
  the last command (or pipeline) executed, or some specific error value
  on a syntax error (e.g. bash returns 2, ksh returns 3).  E.g.
  if we use return value of 254 for a syntax error, it will look like
  this:

        Bash$ ./mysh -c "grep XXX123 /etc/passwd"
        Bash$ echo $?
        1

        Bash$ ./mysh -c ';;'
        error:1: syntax error near unexpected token ';'
        Bash$ echo $?
        254

        mysh$ grep XXX123 /etc/passwd
        mysh$ exit
        Bash$ echo $?
        1

        mysh$ grep XXX123 /etc/passwd
        mysh$ ^D
        Bash$ echo $?
        1

        mysh$ ls; ;
        error:1: syntax error near unexpected token ';'
        mysh$ exit
        Bash$ echo $?
        254

- The return value of an unknown command is 127 (as in bash).  E.g:

        Bash$ ./mysh -c 'xxx123'
        mysh: xxx123: No such file or directory
        Bash$ echo $?
        127

- Error and warning messages from the shell itself go to standard error
  output.  E.g. the message "mysh: xxx123: No such file or directory"
  from above goes to stderr.

- The interactive shell must handle SIGINT well (= killing the
  foreground process via ^C).  I.e.:

        mysh$ sleep 100
        ^CKilled by signal 2.
        mysh$

  If a command exits on receiving a signal and it is the last command
  run by the shell, the shell return value is 128 plus the signal
  number (same as in bash).  E.g.:

        Bash$ ./mysh
        mysh$ sleep 10
        ^CKilled by signal 2.
        mysh$ exit
        Bash$ echo $?
        130

  However, see also the paragraph about killing the unfinished command
  line with ^C.  Both must work as explained.

  The "Killed by..." message goes to stderr.  Also see above on error
  and warning messages.

- It would be nice if you put the current working directory in the
  prompt, like this:

        mysh:/data/mysh$ cd /
        mysh:/$ cd /tmp
        mysh:/tmp$

Phase 2
-------

- implement pipes
- implement the three basic redirections

Pipelines
~~~~~~~~~

The number of pipes in a pipeline is only limited by the machine
resources.  That means no static array for pipelines.  So, for example,
the following prints output of date(1):

        mysh$ date | cat | cat | cat | cat | cat | cat | sort -n | uniq

Pipelines may be separated by a semicolon, i.e.
it's an extension of the 1st phase:

        mysh$ cat /etc/passwd | wc -l; echo X | grep X
        26
        X

Note that the shell needs to wait for all commands in the pipeline to
finish before printing out the next prompt.  So you need to create the
pipeline HORIZONTALLY, meaning the shell is a parent of ALL processes in
the pipeline (remember, wait(2) only works on direct children).  For
example, the following will hang for 10 seconds before your shell
returns a new prompt:

        mysh$ date | sleep 10 | cat

If you only waited for the last command (cat(1) in this case), the shell
would offer the next prompt right away, which is NOT correct.  Note that
sleep(1) does not read from its input, and does not write anything to
STDOUT, so the cat(1) command finds no writer on the pipe and exits
right away.

Redirections
~~~~~~~~~~~~

Implement the three basic redirections:

- "<file"
- ">file"
- ">>file"

Note: there is no need to implement "2>xxx", ">&N" etc.

Whitespace is not significant.  For example, ">file", "> file" and
">   file" are all syntactically correct and equivalent.

Redirection operators may come anywhere on the command line.  The
following invocations are equivalent:

        mysh$ cat /etc/passwd > output
        mysh$ >output cat /etc/passwd
        mysh$ cat >output /etc/passwd

The same stands for all other redirections.  For example, the following
command lines are also equivalent:

        mysh$ cat output
        mysh$ >output cat output output cat
        mysh$ >output output >output2 >>output3

The last input redirection counts.  The following will print
/etc/passwd ONLY:

        mysh$ output >>output2 >output3

Normal shells always create all files named in ">" and ">>"
redirections, but you do not need to do that.  It is OK to create only
the last one (that is, the one that will be used for the actual output).
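
The input ("<"), output (">") and append (">>") redirections can each
be expressed as one open(2) call followed by dup2(2) onto the standard
descriptor.  The following is a minimal sketch under that assumption,
meant to be called in the child between fork(2) and exec; the function
names redirect_fd(), redirect_input(), redirect_output() and
redirect_append() are hypothetical and not part of the assignment.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path with the given flags and make target_fd refer to it.
 * Returns 0 on success, -1 on failure (errno is left as set by
 * open(2) or dup2(2)).  The 0666 mode is still subject to the umask.
 */
static int
redirect_fd(const char *path, int flags, int target_fd)
{
	int fd;

	assert(path != NULL);
	if ((fd = open(path, flags, 0666)) == -1)
		return (-1);
	if (fd != target_fd) {
		if (dup2(fd, target_fd) == -1) {
			(void) close(fd);
			return (-1);
		}
		(void) close(fd);
	}
	return (0);
}

/* "<file": read standard input from the file. */
static int
redirect_input(const char *path)
{
	return (redirect_fd(path, O_RDONLY, STDIN_FILENO));
}

/* ">file": create or truncate the file and write standard output to it. */
static int
redirect_output(const char *path)
{
	return (redirect_fd(path, O_WRONLY | O_CREAT | O_TRUNC,
	    STDOUT_FILENO));
}

/* ">>file": create the file if needed and append standard output to it. */
static int
redirect_append(const char *path)
{
	return (redirect_fd(path, O_WRONLY | O_CREAT | O_APPEND,
	    STDOUT_FILENO));
}
```

Doing this in the child only means the shell's own standard input and
output are never touched, so nothing needs to be restored after the
command finishes.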

Note that redirection comes after connecting the individual commands in
the pipeline with pipes, so the following copies /etc/passwd to file
"output", puts date's output to file "output2", and the last cat(1)
finds no writer on the pipe (since date(1) is printing to "output2"
instead) and just exits, finding STDIN empty:

        $ cat /etc/passwd > output | date > output2 | cat

Anything not listed above is NOT required
-----------------------------------------

I.e. in particular, we do NOT require any of these below:

- background execution via &

- environment variable support (i.e. neither "echo $HOME" nor
  "XXX=value ./a.out")

- the "export" internal command.

- Job control.  That also means there is no need to create a new
  process group for each pipeline (that's what a normal shell does).
  However, if you do, you will need to solve how background processes
  interact with the control terminal, as that was out of scope of our
  class.

- support for $? etc.

- string/quoting support, i.e. the following is not required:

        $ echo "xxx yy"
        xxx yy
        $ echo 'xxx yy'
        xxx yy

- `` and $( ... )

- complex commands like "if", "while", "for", "switch", etc.

- line continuation with '\'

If unsure, either ask on our class mailing list or check what bash
does.

vim:tw=72
vim:ft=conf