Labs Assignment for 2019 - Simple Shell
=======================================
Write a simple Unix/Linux shell.

Version: Nov 1, 2019

When you are done, send us a link to your project on GitHub or a
similar hosting site that uses either Git or Mercurial.  Your project
must provide a Makefile so that running the "make" command builds the
shell as a binary called "mysh" (this saves us work when running
automated tests).

The first phase must be finished and handed to us before the end of the
Winter semester, i.e. Sunday 23:59, January 12, 2020.  All examples
shown below must work as expected.

The second phase must be finished and handed over to us before the end
of the Summer semester, i.e. by Sunday 23:59, May 24, 2020.

The deadlines are hard requirements, please stick to the schedule.

Obviously, you can send us the whole thing in one step before the first
deadline, no need to separate the project into two parts.

If in doubt, please ask a question, and we strongly prefer you send it
to the class mailing list, nswi015-l@mff.cuni.cz, so that others may
benefit from the answers as well.

C Style
-------
Please choose a good C style, for example http://mff.devnull.cz/cstyle/,
or pick another existing cstyle (e.g. https://man.openbsd.org/style).
Please do not invent your own C style.

This class is about Unix/Linux programming in C, so we only accept the C
language, not C++.

Testing and verification scenario
---------------------------------
Use any Unix/Linux/MacOS system you want but your code must build and
work on a Linux machine in the MFF UK lab with the "u-pl" prefix, e.g:

	u-pl1.ms.mff.cuni.cz

Use Valgrind to make sure you do not mismanage memory.  Valgrind is
installed on all Linux machines in the MFF UK Unix lab.

Compiler and warnings
---------------------
Your code must build with no warnings with the default gcc installed on
a Linux machine in the MFF UK lab (see above).

You must use both -Wall and -Wextra options.

Implementation notes
--------------------
We are interested in you dealing with the Unix/Linux stuff you learn in
the class, so we recommend not spending time writing a linked list
implementation or a command line parser (the latter is much harder than
one might think).

For getting the input line, we recommend the "readline" library, see
"man readline".  You should find it on any Linux, *BSD, MacOS, or
Solaris machine.  By using readline, you get line editing, history,
completion, etc. for free.  However, note that readline is licensed
under the GPL (not the LGPL), so merely linking your shell to it
requires the entire combined application to be licensed under the GPL
when distributed, to comply with section 5 of the GPL.  If you ever plan
to use your shell code somewhere else, you need to keep that in mind.

If you do not want to do that, another alternative is libtecla.  There
is also BSD libedit (https://man.openbsd.org/OpenBSD-6.0/el_init.3).

For tokenizing and parsing the command line, we suggest you use GNU
flex(1) and bison(1).  They are easy to use and you should find them
already installed virtually everywhere.  We recommend O'Reilly's book
"flex & bison".  Reading the first 30 pages of the book should be all
you need for this project.  I believe you can find the book online.  Of
course, if you so choose, feel free to parse the command line explicitly
with your own code.

You will probably need a linked list implementation as fixed-size
arrays would not suffice.  We suggest you use the macros from
<sys/queue.h>.  See "man queue" for more information.  The STAILQ_* and
TAILQ_* macros should be all you need.

FYI, our implementation of the 1st phase is 560 lines long, including
flex (.l) and bison source files (.y) and not including the C parser
code generated from our .l and .y files, while using the usual C style
from http://mff.devnull.cz/cstyle/.  Our complete implementation
including the 2nd phase requirements fits within 900 lines of code.

IT GOES WITHOUT SAYING IT IS AN INDIVIDUAL, NOT A TEAM PROJECT.  DO NOT
USE CODE WRITTEN BY OTHER FELLOW STUDENTS.  THE IDEA IS YOU LEARN BY
CODING YOUR OWN PROJECT.

PROHIBITED FUNCTIONS
--------------------
The fopen(), fread() family of functions and the like are prohibited.
Instead, use functions from the printf(3) family and system calls like
open(2), read(2), write(2), etc.
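
To show what working without fopen() and friends may look like, here is
a minimal sketch that copies a file to a descriptor using only open(2),
read(2) and write(2), and reports errors with dprintf(3).  The helper
names write_all() and copy_file_to_fd() are made up for this example.
It assumes that retrying on EINTR and on short writes is the behavior
you want.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and on EINTR. */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}

/* Copy the file at path to descriptor out; returns 0, or -1 on error. */
static int
copy_file_to_fd(const char *path, int out)
{
	char buf[4096];
	ssize_t n;
	int fd, ret = 0;

	assert(path != NULL);
	if ((fd = open(path, O_RDONLY)) == -1) {
		/* Messages from the shell itself go to stderr. */
		dprintf(STDERR_FILENO, "mysh: %s: %s\n", path,
		    strerror(errno));
		return (-1);
	}
	while ((n = read(fd, buf, sizeof (buf))) != 0) {
		if (n == -1) {
			if (errno == EINTR)
				continue;
			ret = -1;
			break;
		}
		if (write_all(out, buf, (size_t)n) == -1) {
			ret = -1;
			break;
		}
	}
	(void) close(fd);
	return (ret);
}
```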

popen(3), system(3), and all other functions that fork() inside are
strictly prohibited.  The only way for you to create a new process is to
use fork(2).

You must not exec any shell from your code.  I.e. if you have a pipeline
"cat /etc/passwd | wc -l", the solution is not the following:

	if (fork() == 0) {
		execl("/bin/sh", "sh", "-c", "cat /etc/passwd | wc -l",
		    NULL);
	}

You really have to fork a new process for each command in a pipeline,
and connect them together with pipes.

Phase 1
-------
Note - the "mysh$" prompt in examples below represents the behavior of
the shell you implement.  The "Bash$" prompt represents the system shell
you start your shell from.

- Honor and use the existing PATH you get from your parent shell, i.e.
  so that one can use "ls" and the command will execute.

- ^C (= SIGINT) in the prompt kills the current unfinished line and
  offers a new prompt (same way as bash does it).  E.g:

	mysh$ ls xxx^C
	mysh$

- An empty command just prints a new prompt (as bash does it).  E.g.

	mysh$ <SPACE><SPACE><SPACE><ENTER>
	mysh$

- Implement simple foreground command execution.  Commands may be
  separated with a semicolon.  Any number of commands separated by a
  semicolon must be supported (i.e. until you reach some other system
  limits):

	- i.e. "cmd [; cmd2 [; cmd3 ...]]"
	- for example:

	mysh$ date
	Mon Oct  8 04:51:27 PDT 2018
	mysh$

	mysh$ /usr/bin/echo X
	X
	mysh$

	mysh$ echo X; date; echo Z
	X
	Monday, October  8, 2018 at 11:36:04 AM CEST
	Z
	mysh$

  Whitespace does not matter with the semicolon, i.e.:

	mysh$ echo X;date;echo Z
	X
	Monday, October  8, 2018 at 11:36:04 AM CEST
	Z
	mysh$

- A semicolon after the last command is legal (as it is in bash).  E.g:

	mysh$ echo x;
	x

  However, the following is not legal:

	mysh$ date;;
	error:1: syntax error near unexpected token ';'
	mysh$

- Implement an internal "exit" to exit the shell.  E.g.:

	Bash$ ./mysh
	mysh$ exit
	Bash$ echo $?
	0
	Bash$

  The internal exit command does not change the shell's return value set
  from a previous command execution.  See below for more information.

- ^D (i.e. Ctrl-D) exits the interactive shell as well (same as the
  "exit" internal command).  Again, see bash for an example.

- Implement an internal "cd" command.  It works in three modes:

	- "cd" goes to $HOME
	- "cd -" goes to previous directory after it prints the target
	- "cd <dir>" goes to "<dir>"

  The internal cd must set the PWD and OLDPWD environment variables
  (again see bash).

  E.g.:

	mysh$ pwd
	/data/maish
	mysh$ cd
	mysh$ pwd
	/home/jpechane
	mysh$ cd -
	/data/maish
	mysh$ pwd
	/data/maish
	mysh$ cd /tmp
	mysh$ pwd
	/tmp
	mysh$ cd -
	/data/maish
	mysh$ pwd
	/data/maish

- The "-c" option is supported as in any other usual shell.

  E.g.:

	Bash$ mysh -c "echo XXX; date"
	XXX
	Monday, October  8, 2018 at  1:36:56 PM CEST
	Bash$

- The '#' comment is supported.  E.g.:

	mysh$ echo X # comment
	X
	mysh$     # another comment
	mysh$

	Or:

	./mysh -c 'date; echo X # comment'
	Wednesday, October 10, 2018 at 10:10:09 AM CEST
	X

	Comments must also work in the non-interactive mode, see below.

- The non-interactive mode is supported.

	- i.e. 'mysh test.sh' must work and the commands from the file
	  are executed sequentially.

	- the shell errors out on first syntax error as a usual shell
	  does.  E.g:

	Bash$ cat test.sh 
	echo X
	date
	;;
	echo Y

	Bash$ ./mysh test.sh
	X
	Monday, October  8, 2018 at 11:38:16 AM CEST
	error:3: syntax error near unexpected token ';'
	Bash$

	- the shell must treat syntax errors gracefully.  Print the line
	  number where the problem occurred.  See the error message
	  above.

- In all modes, the return value of your shell is the return value of
  the last command (or pipeline) executed or some specific error on a
  syntax error (e.g. bash returns 2, ksh returns 3).  E.g. if we use
  return value of 254 for a syntax error, it will look like this:

	Bash$ ./mysh -c "grep XXX123 /etc/passwd"
	Bash$ echo $?
	1

	Bash$ ./mysh -c ';;'
	error:1: syntax error near unexpected token ';'
	Bash$ echo $?
	254

	mysh$ grep XXX123 /etc/passwd
	mysh$ exit
	Bash$ echo $?
	1

	mysh$ grep XXX123 /etc/passwd
	mysh$ ^D
	Bash$ echo $?
	1

	mysh$ ls; ;
	error:1: syntax error near unexpected token ';'
	mysh$ exit
	Bash$ echo $?
	254

- The return value of an unknown command is 127 (as in bash).  E.g:

	Bash$ ./mysh -c 'xxx123'
	mysh: xxx123: No such file or directory
	Bash$ echo $?
	127

- Error and warning messages from the shell itself go to standard error
  output.  E.g. the message "mysh: xxx123: No such file or directory"
  from above goes to stderr.

- The interactive shell must handle SIGINT well (= killing the
  foreground process via ^C).  I.e.:

	mysh$ sleep 100
	^CKilled by signal 2.
	mysh$ 

  If a command exits on receiving a signal and it is the last command
  run by the shell, the shell return value is 128 plus the signal number
  (same as in bash).  E.g.:
  
	Bash$ ./mysh
	mysh$ sleep 10
	^CKilled by signal 2.
	mysh$ exit
	Bash$ echo $?
	130

  However, see also the paragraph about killing the unfinished command
  line with ^C.  Both must work as explained.

  The "Killed by..." message goes to stderr.  Also see above on error
  and warning messages.

- It would be nice if you put the current working directory in the
  prompt, like this:

	mysh:/data/mysh$ cd /
	mysh:/$ cd /tmp
	mysh:/tmp$
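
The return value rules from the list above (127 for an unknown command,
128 plus the signal number for a command killed by a signal) can be
sketched as follows.  The function names status_to_retval() and
run_foreground() are made up for this example.  As a simplification,
the sketch returns 127 for any exec failure and does not deal with how
the shell itself should react to SIGINT while it waits.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Translate a status from waitpid(2) into the shell's return value. */
static int
status_to_retval(int status)
{
	if (WIFEXITED(status))
		return (WEXITSTATUS(status));
	if (WIFSIGNALED(status)) {
		dprintf(STDERR_FILENO, "Killed by signal %d.\n",
		    WTERMSIG(status));
		return (128 + WTERMSIG(status));
	}
	/* Not expected, we do not ask for stopped children. */
	return (1);
}

/* Run argv[0], looked up via PATH, in the foreground. */
static int
run_foreground(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);
	if ((pid = fork()) == -1) {
		dprintf(STDERR_FILENO, "mysh: fork: %s\n", strerror(errno));
		return (1);
	}
	if (pid == 0) {
		execvp(argv[0], argv);
		/* Only reached if the exec failed. */
		dprintf(STDERR_FILENO, "mysh: %s: %s\n", argv[0],
		    strerror(errno));
		_exit(127);
	}
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return (1);
	}
	return (status_to_retval(status));
}
```

The child uses _exit(2) after a failed exec so that it does not run
any cleanup inherited from the parent shell.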

Phase 2
-------
- implement pipes
- implement the three basic redirections

Pipelines
~~~~~~~~~
The number of pipes in a pipeline is only limited by the machine
resources.  That means no static array for pipelines.

So, for example, the following prints output of date(1):

	mysh$ date | cat | cat | cat | cat | cat | cat | sort -n | uniq

Pipelines may be separated by a semicolon, i.e. it's an extension of the
1st phase:

	mysh$ cat /etc/passwd | wc -l; echo X | grep X
	26
	X 

Note that the shell needs to wait for all commands in the pipeline to
finish before printing out the next prompt.  So you need to create the
pipeline HORIZONTALLY, meaning the shell is a parent of ALL processes in
the pipeline (remember, wait(2) only works on direct children).  For
example, the following will hang for 10 seconds before your shell
returns a new prompt:

	mysh$ date | sleep 10 | cat

If you only waited for the last command (cat(1) in this case), the shell
would offer the next prompt right away which is NOT correct.

Note that sleep(1) does not read from its input, and does not write
anything to STDOUT, so the cat(1) command finds no writer on the pipe
and exits right away.
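
The "horizontal" pipeline described above might be sketched like this:
the shell forks every command itself, connects neighbors with pipe(2),
and then waits for all of its children.  The function run_pipeline()
and its interface (an array of argv vectors) are assumptions made for
this example.  Redirections and signal handling are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run n commands connected with pipes; cmds[i] is a NULL-terminated
 * argv.  Returns the value of the last command, or 1 on internal error.
 */
static int
run_pipeline(char **cmds[], size_t n)
{
	pid_t *pids;
	int in = STDIN_FILENO;	/* read end feeding the next command */
	int ret = 1;
	size_t i, started = 0;

	assert(cmds != NULL && n > 0);
	if ((pids = calloc(n, sizeof (*pids))) == NULL)
		return (1);

	for (i = 0; i < n; i++) {
		int fd[2] = { -1, -1 };
		int last = (i == n - 1);

		if (!last && pipe(fd) == -1)
			break;
		if ((pids[i] = fork()) == -1) {
			if (!last) {
				(void) close(fd[0]);
				(void) close(fd[1]);
			}
			break;
		}
		if (pids[i] == 0) {
			/* Child: wire up stdin/stdout, drop spare ends. */
			if (in != STDIN_FILENO) {
				(void) dup2(in, STDIN_FILENO);
				(void) close(in);
			}
			if (!last) {
				(void) close(fd[0]);
				(void) dup2(fd[1], STDOUT_FILENO);
				(void) close(fd[1]);
			}
			execvp(cmds[i][0], cmds[i]);
			dprintf(STDERR_FILENO, "mysh: %s: %s\n", cmds[i][0],
			    strerror(errno));
			_exit(127);
		}
		started++;
		/* Parent: keep only the read end for the next command. */
		if (in != STDIN_FILENO)
			(void) close(in);
		if (!last)
			(void) close(fd[1]);
		in = last ? STDIN_FILENO : fd[0];
	}
	/* If the loop ended early, a read end may still be open. */
	if (in != STDIN_FILENO)
		(void) close(in);

	/* Wait for every child we started, not just the last one. */
	for (i = 0; i < started; i++) {
		int status = 0;
		pid_t w;

		do {
			w = waitpid(pids[i], &status, 0);
		} while (w == -1 && errno == EINTR);
		if (w == -1 || i != n - 1)
			continue;
		if (WIFEXITED(status))
			ret = WEXITSTATUS(status);
		else if (WIFSIGNALED(status))
			ret = 128 + WTERMSIG(status);
	}
	free(pids);
	return (ret);
}
```

Note that the parent closes its copies of the pipe ends as it goes.  If
it kept a write end open, a reader such as cat(1) would never see end
of file.  The array of process IDs is allocated dynamically, so the
pipeline length is limited only by machine resources.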

Redirections
~~~~~~~~~~~~
Implement the three basic redirections:

	- ">file"
	- "<file"
	- ">>file"

Note: there is no need to implement "2>xxx", ">&N" etc.

Whitespace is not significant.  For example, ">file", "> file" and
">   file" are all syntactically correct and equivalent.

Redirection operators may come anywhere on the command line.  The
following invocations are equivalent:

	mysh$ cat /etc/passwd > output
	mysh$ >output cat /etc/passwd
	mysh$ cat >output /etc/passwd

The same stands for all other redirections.  For example, the following
command lines are also equivalent:

	mysh$ cat <input >output
	mysh$ >output cat <input
	mysh$ cat >output <input
	mysh$ <input >output cat
	mysh$ >output <input cat

The last output redirection counts, i.e. the following will append the
output to "output3":

	mysh$ date >output >output2 >>output3

The last input redirection counts.  The following will print /etc/passwd
ONLY:

	mysh$ </etc/group cat </etc/passwd

And the following will write /etc/group to output3, truncating it first
if it existed beforehand:

	mysh$ cat </etc/passwd </etc/group >output >>output2 >output3

Normal shells always create all files with ">" and ">>" redirections but
you do not need to do that.  It's OK to create only the last one (that
is, the one that will be used for the actual output).
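
The three redirections described above map onto open(2) flags and
dup2(2) roughly as in the following sketch.  The enum redir_kind and the
functions redir_open() and redir_apply() are made up for this example.
The sketch assumes you call redir_apply() in the child process, after
fork(2) and before the exec, and only for the last input and the last
output redirection of the command.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Redirection kinds: "<file", ">file" and ">>file". */
enum redir_kind { REDIR_IN, REDIR_OUT, REDIR_APPEND };

/* Open path the way the redirection needs it; returns an fd or -1. */
static int
redir_open(enum redir_kind kind, const char *path)
{
	assert(path != NULL);
	switch (kind) {
	case REDIR_IN:
		return (open(path, O_RDONLY));
	case REDIR_OUT:
		return (open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666));
	case REDIR_APPEND:
		return (open(path, O_WRONLY | O_CREAT | O_APPEND, 0666));
	}
	return (-1);
}

/* Point standard input or standard output at path; returns 0 or -1. */
static int
redir_apply(enum redir_kind kind, const char *path)
{
	int target = (kind == REDIR_IN) ? STDIN_FILENO : STDOUT_FILENO;
	int fd = redir_open(kind, path);

	if (fd == -1) {
		dprintf(STDERR_FILENO, "mysh: %s: %s\n", path,
		    strerror(errno));
		return (-1);
	}
	if (fd != target) {
		if (dup2(fd, target) == -1) {
			(void) close(fd);
			return (-1);
		}
		(void) close(fd);
	}
	return (0);
}
```

The mode 0666 is further restricted by the process umask, which is what
the usual shells rely on as well.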

Note that redirection comes after connecting the individual commands in
the pipeline with pipes, so the following copies /etc/passwd to file
"output", puts date's output to file "output2", and the last cat(1)
finds no writer on the pipeline (since date(1) is printing to "output2"
instead) and just exits finding STDIN empty:

	$ cat /etc/passwd > output | date > output2 | cat

Anything not listed above is NOT required
-----------------------------------------
I.e. in particular, we do NOT require any of these below:

- background execution via &

- environment variable support (i.e. neither "echo $HOME" nor
  "XXX=value ./a.out")

- the "export" internal command.

- Job control.  That also means there is no need to create a new process
  group for each pipeline (that's what a normal shell does).  However,
  if you do, you will need to solve how background processes interact
  with the controlling terminal, which was out of scope of our class.

- support for $? etc.

- string/quoting support, i.e. the following is not required:

	$ echo "xxx   yy"
	xxx   yy
	$ echo 'xxx   yy'
	xxx   yy

- `` and $( ... )

- complex commands like "if", "while", "for", "switch", etc.

- line continuation with '\'

If unsure, either ask on our class mailing list or check what bash does.

vim:tw=72
vim:ft=conf