Linux

UNIX introduction and history

The UNIX system was created by Dennis Ritchie and Ken Thompson in the early 1970s while working at Bell Labs (the laboratory where the transistor was invented in 1947 by Bardeen, Brattain and Shockley, along with many other things, such as the laser). UNIX was not the first operating system, but it was simple and elegant and found wide adoption on large (at the time) computers, particularly at research universities. Over time, many different versions of UNIX were created, which can be traced through this giant wall chart. The main branches of the UNIX evolutionary tree are the BSD line of Unices (yes, that’s the plural!) and the System V type Unices. Macintosh OSX is a BSD derivative, and the Solaris operating system by Sun Microsystems grew out of the BSD-based SunOS before moving to a System V base, whereas Linux is an independent reimplementation that, roughly speaking, resembles System V. As computer technology evolved, UNIX became available on consumer machines, and today about 10% (guesstimate) of all computers run a UNIX-derived operating system. Linux in particular is important in high performance computing: because it is free, powerful, and very stable, it is deployed on large clusters and server farms with hundreds and sometimes thousands of compute nodes. (Installing a commercial operating system would require purchasing a separate license for each node, increasing the cost of such compute clusters significantly.) In addition, commercial operating systems are sometimes less stable than Linux, and stability is a huge issue if you run thousands of nodes!

Linux is a UNIX-like operating system created by Linus Torvalds, starting in 1991 when he was a student in computer science in Helsinki, Finland. At the time, UNIX systems were commercial software and very expensive. His dream was to create a UNIX that would be open source and could be shared, and improved, by a community of programmers. His work relied on the GNU gcc compiler; the GNU project had a long-standing wish to produce its own UNIX operating system, but had not yet been able to do so, as it was busy creating all the tools necessary for the job. Because Torvalds used the tools created by the GNU project, Linux is sometimes referred to as GNU/Linux.

The precise element of the UNIX operating system that Torvalds implemented is the so-called kernel: the central piece of code that controls all other pieces of code on the computer. The kernel manages processes, memory, file systems, and device drivers, and allows programs to access these elements through an application programming interface. No direct access to any of these sub-systems is allowed, because UNIX systems can have several users and need to protect users from interfering with each other.

Modern UNIX systems such as Macintosh OSX or Linux distributions such as Ubuntu are extremely user friendly with graphical user interfaces (GUIs). In the early days, such interfaces did not exist. The user interface was a simple text display, in which commands could be typed. (Actually, the first interfaces were so-called teletype machines: essentially typewriters hooked up to computers, such as this one.) At the time, program and data entry was through punch cards and paper tape readers. It is interesting that the principal way many programmers interact with a UNIX system today is still very similar to the old text-based UNIX terminals (fortunately, the punch cards and paper tape are gone!). The terminal is a program that allows you to type commands on a line, and the output is printed to the display; the display scrolls up as more lines are added. The bottommost line is the command line, and is the only line where anything can be entered. The text that has scrolled up cannot be edited anymore… just like on a teletype!

The terminal and the shell.

Here we will learn more about the terminal application. The terminal application in fact emulates an old text-mapped terminal display, so the similarity isn’t just superficial. The terminal runs a so-called shell program, which interprets commands typed by the user (or provided to the shell in a batch file of commands). The most common shell today is the bash shell, which comes standard on most Linux distributions and Mac OSX. Therefore, we will only cover the bash shell here. Note that when you type simple commands into the shell, most shells behave almost alike. However, the scripting languages used in so-called shell scripts have widely different syntaxes.

The shell is usually run in a terminal window that is initially 80 characters wide with 25 lines of text. Because the terminal nowadays is usually just a window in a GUI, the window can be resized and the shell will adapt to whatever dimensions correspond to the window size, which is convenient. In the old days, the character-mapped screens showed green characters on a black background. Today the terminal program colors can be chosen, and usually default to black on white. However, many people still adjust the terminal program to the old color scheme; it just looks more like a terminal!

In the shell, there is only one location where text can be typed: at the cursor position. The cursor appears only if there is also a so-called prompt. The user prompt in a bash shell is the dollar sign $. If you are the super user (root) the prompt changes to a hash sign #. The root prompt is a constant reminder that you have to be careful when typing commands: You could erase the entire system!

Command lines are ended with a return or enter key. The shell processes the command; during processing, the terminal is not available for user input. Instead, output is printed to the screen, if any. When processing is complete, the prompt and cursor are printed again and the next command can be entered.

Special keys in the shell

The shell supports some special function keys:

[control]-a	jump to the beginning of the line
[control]-e	jump to the end of the line
[control]-r	search history (see below)
right/left arrow	move right/left
up/down arrow	move up/down in the history
[control]-c	terminate the currently executing program
[control]-z	pause the currently executing program
[control]-s	scroll-lock mode – stop scrolling when there is a lot of output ([control]-q resumes)
[control]-d	logout

History

The shell keeps a history file of commands that have been entered. The contents of the file can be viewed using the history command. The history can be traversed using the up and down keys. The history can also be searched using [control]-R, followed by a search term. Repeatedly pressing [control]-R will move through all the matches in the history. If the desired match is found, the left or right arrow keys may be pressed. The matched line is then put on the command line and can be executed by typing return or enter, or it can be edited before executing. There are some special history commands: !! will execute the last command again. !540 will execute line 540 in the history file again (each line is numbered, see the output of history). Working with the history functions saves a lot of typing!

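The bash history is stored in a file in your home directory called .bash_history. You can delete this file to get rid of the history. This is sometimes useful, for example if you accidentally entered a password on the command line. Note that a running shell also keeps its history in memory and writes it back to the file when it exits; history -c clears the in-memory list.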

The file system

UNIX is built on a few important concepts. One of them is the file; another is the process.

Files are stored in a file system and provide permanent storage. Processes run the computer programs.
The file system allows the creation, access for reading and writing, and deletion of files. The files are organized into a hierarchical directory structure. The basis of the structure is the so-called root, denoted by a simple slash /. The root directory can contain other directories as well as files. If a file, say filename.txt, was stored in the root directory, it could be referenced by /filename.txt. If it was in a directory under root called mydir, it could be accessed using the path /mydir/filename.txt. There can be an almost unlimited number of sub-directories in a directory; the exact limit depends on the type of file system.

file system types

On Linux, the most common file system types in current use belong to the ext family (ext3 and its successor ext4). The xfs file system is often preferred for larger file servers, because it handles large files and large volumes efficiently. Other file systems include ZFS, developed by Sun, and HFS+, the file system of the Apple Macintosh.

Traversing the directory using the terminal.

The terminal provides commands to work with directories. When you first log into a UNIX system, you are in your “home directory”. The directory you are in is also called the “current directory” or “working directory”. The working directory can be determined using the command pwd (print working directory).

The directory contents can be listed using the command ls. It is a command that has a lot of options, which can be studied on its man page, using man ls. Nice command line options are ls -l, which prints more information with the files, such as access rights, modification times, and file size. ls -a also shows the so-called hidden files, which are files that start with a dot (remember the .bash_history file; it will not be listed with a simple ls).
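To see the difference between ls and ls -a for yourself, here is a minimal sketch. It works in a scratch directory created with mktemp, and the file names report.txt and .secret are invented for the example.

```shell
# Create a scratch directory with one normal and one hidden file.
# The names report.txt and .secret are hypothetical examples.
demo_dir=$(mktemp -d)
touch "$demo_dir/report.txt" "$demo_dir/.secret"

# Plain ls skips names that start with a dot.
visible=$(ls "$demo_dir")

# ls -a also lists hidden files, plus the entries . and ..
all=$(ls -a "$demo_dir")

# ls -l prints one line per file with permissions, owner, size and date.
ls -l "$demo_dir"

echo "ls:    $visible"
echo "ls -a:" $all

# Clean up the scratch directory.
rm -r "$demo_dir"
```

The hidden file only shows up in the second listing, together with the special entries . (the directory itself) and .. (its parent).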

To change the directory, you use the cd command (change directory). You can either give absolute paths (meaning paths starting at the root, thus starting the path with a slash) or a relative path, which will start at the working directory, and does not start with a slash. For example, a cd with an absolute path could be:

cd /etc/apt/

whereas a relative path could be

cd myreports/

This command works if there is a myreports/ directory in your current working directory.

cd .. will move to the parent directory. cd ../.. etc will move to the grand-parent, etc.

Simply typing cd brings you to your home directory. Another neat feature is cd -, which brings you back to the directory you were in before the last cd. So you can quickly go back and forth between two directories by repeatedly typing cd -.
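The different forms of cd can be tried out safely in a scratch directory. The following is a small sketch, not a recipe: the directory names projects and myreports are made up, and pwd -P is used so that the printed paths do not depend on symbolic links.

```shell
# Remember where we started, then build a small tree in a scratch location.
# The names projects and myreports are hypothetical examples.
start_dir=$(pwd)
top=$(cd "$(mktemp -d)" && pwd -P)   # resolved absolute path of the scratch directory
mkdir -p "$top/projects/myreports"

cd "$top/projects"      # absolute path: starts with a slash
cd myreports            # relative path: resolved from the working directory
here=$(pwd -P)
echo "now in:     $here"

cd ..                   # up to the parent directory
parent=$(pwd -P)
echo "parent:     $parent"

cd - > /dev/null        # back to the directory we were in before the last cd
back=$(pwd -P)
echo "after cd -: $back"

# Return to the starting directory and clean up.
cd "$start_dir"
rm -r "$top"
```

Note that cd - prints the directory it switches to; the sketch sends that message to /dev/null to keep the output short.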

File access privileges

UNIX is designed as a multiuser system, and therefore we need to protect users from accidental mishaps and from malevolent behavior of other users. The UNIX file system has provisions for protecting files from being written and read by other, non-privileged users. Every file has read, write and execute privileges, which can be set for the owner, the group, and everyone else. Every user has a username and user id, and can be part of one or more groups. Groups are something a system administrator will set up to allow people to work together on a set of files; for example, the people in the development department could be in a group called ‘development’. The files of the development group could then be configured such that users in the group ‘finances’ are not allowed to overwrite them, or not even allowed to read them, and vice versa.

You can figure out who you are by typing the command whoami (although if your memory is pretty good, you will still remember your login name. Unless you used su (switch user), you should be you). You can see your group affiliations by typing groups. Often, systems are configured such that every user also has their own group. Sometimes you will be in a group called users or similar.

You can inspect the privileges of a file (or directory) by using the -l option, as in ls -l.

drwxr-xr-x  2 mueller mueller   39 2010-06-10 15:04 .
drwxr-xr-x 49 mueller mueller 4096 2010-06-10 15:03 ..
-rw-r--r--  1 mueller mueller   22 2010-06-10 15:03 bitzy
-rw-r--r--  1 mueller mueller   23 2010-06-10 15:04 boo
-rw-r--r--  1 mueller mueller   21 2010-06-10 15:03 itzy

Note that the first column of the output gives us a cryptic looking code. The characters mean the following:

position	values	explanation
1	- (file), d (directory), s (socket), l (link)	the type of the entry
2, 3, 4	- no permission, r read, w write, x execute	the privileges of the owner
5, 6, 7	ditto	the privileges of the group
8, 9, 10	ditto	the privileges of everyone else

The next fields in the ls -l output are the number of links to that file (the number of directory entries that were created for this file, usually 1), the owner, the group, the size, the modification date and time, and the file name.
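The layout of the first column can be made concrete by cutting a permission string into its four parts. This is only an illustrative sketch: the sample string is the one shown for the file bitzy in the listing above, and the variable names are our own.

```shell
# Split the first column of an "ls -l" line into its four parts.
# The sample value is the one shown for the file bitzy.
mode='-rw-r--r--'

file_type=$(printf '%s' "$mode" | cut -c1)     # character 1: type of entry
owner=$(printf '%s' "$mode" | cut -c2-4)       # characters 2-4: owner
group=$(printf '%s' "$mode" | cut -c5-7)       # characters 5-7: group
other=$(printf '%s' "$mode" | cut -c8-10)      # characters 8-10: everyone else

# Translate the type character into words.
case "$file_type" in
    -) kind='regular file' ;;
    d) kind='directory' ;;
    l) kind='symbolic link' ;;
    s) kind='socket' ;;
    *) kind='other' ;;
esac

echo "type:  $kind"
echo "owner: $owner"
echo "group: $group"
echo "other: $other"
```

For this file, the owner may read and write, while the group and everyone else may only read.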

Changing the owner, group, and privileges

There are commands to change the owner, group and privilege setting of files:

chown changes the owner of the file. For example, chown mueller myfile.txt changes the owner to mueller from whatever it was before. Note that for most chown calls, you have to be root. Recursive ownership changes can be done using chown -R mueller mydir (this will change mydir itself and all the files in all of its subdirectories).

chgrp changes the group. Again, recursive calls are possible with -R.

chmod changes the specific read/write/execute properties of the owner, group, and ‘other’. There are a couple of different ways it can be used. The following syntax is quite easy: chmod g+w myfile adds write privileges for the group. chmod u+w and chmod o+w are the equivalents for owner and ‘other’. chmod a+w affects all three. In addition to +w, +r and +x can be used for reading and executing privileges, and a minus sign (as in g-w) removes a privilege. The less user-friendly way to use chmod is to specify the 3 bits for each of the user, group and other rwx settings by an octal number. rwx can be interpreted as a binary number with three digits. r--, for example, would correspond to binary 100, or octal 4. rw- would correspond to binary 110, or octal 6. rwx would correspond to binary 111, or octal 7. Using this notation for each of the privileges, we could therefore say chmod 774 myfile. This would correspond to rwxrwxr--.
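The octal arithmetic can be checked with a few lines of shell. The helper triplet_to_octal below is a hypothetical function written for this example (it is not a standard command); it adds up 4 for r, 2 for w and 1 for x, and the result is then applied to a throwaway file created with mktemp.

```shell
# Convert one rwx triplet into its octal digit: r=4, w=2, x=1.
# triplet_to_octal is a helper defined only for this example.
triplet_to_octal() {
    digit=0
    case "$1" in r??) digit=$((digit + 4)) ;; esac
    case "$1" in ?w?) digit=$((digit + 2)) ;; esac
    case "$1" in ??x) digit=$((digit + 1)) ;; esac
    echo "$digit"
}

octal="$(triplet_to_octal rwx)$(triplet_to_octal rwx)$(triplet_to_octal r--)"
echo "rwxrwxr-- is $octal"

# Apply the octal mode to a scratch file and read it back with ls -l.
scratch=$(mktemp)
chmod "$octal" "$scratch"
mode=$(ls -l "$scratch" | cut -c1-10)
echo "after chmod $octal: $mode"

# The symbolic form changes one privilege at a time.
chmod o-r "$scratch"
mode_after=$(ls -l "$scratch" | cut -c1-10)
echo "after chmod o-r: $mode_after"

# Clean up the scratch file.
rm -f "$scratch"
```

Only the first ten characters of the ls -l line are kept, because some systems append an extra marker character after the permission bits.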

Permission errors and sudo

Now that we know how the file system tracks permissions, we can understand the permission denied errors that plague UNIX users every now and then. For example, if you are in a directory mydir that is owned by someone other than you, and that grants write permission neither to a group you are a member of nor to everyone else, trying to create a file will result in a permission denied error. The same will happen if you copy (say, using cp) or move (using mv) a file to a location where you don’t have permission to write. In these cases, you need root access or at least sudo. The sudo command can be enabled for ordinary users by the system administrator; typing sudo cp file2 notmydir/ will then do the copy as root, overwriting without further checks any file of the same name in notmydir/.

passwd and groups files

The usernames and passwords were traditionally stored in the /etc/passwd file. If your system still has an /etc/passwd file, you can inspect it with less /etc/passwd (it is readable by everyone; however, the passwords are either stored in hashed form or, on modern systems, kept elsewhere, typically in /etc/shadow, which only root can read). The group affiliations are stored in the /etc/group file. You should never have to change these files directly by editing them.
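If your system has an /etc/passwd file, its colon-separated fields can be picked apart with cut. The sketch below assumes the traditional layout (name, password placeholder, user id, group id, comment, home directory, shell) and only reads the file.

```shell
# Fields in /etc/passwd are separated by colons:
#   name:password:uid:gid:comment:home:shell
# Print name, user id and home directory of the first few accounts,
# skipping comment lines that some systems put at the top.
grep -v '^#' /etc/passwd | cut -d: -f1,3,6 | head -n 5

# Look up the numeric user id of root, which is 0 by convention.
root_uid=$(grep '^root:' /etc/passwd | head -n 1 | cut -d: -f3)
echo "root has uid $root_uid"
```

On systems that manage accounts through a network directory, your own account may not appear in this file at all.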

Processes

We have seen that UNIX systems support many users, and thus should be able to run many programs at once, as every user will independently want to run programs. Users can even run several programs. How does UNIX achieve this, considering that most UNIX computers, until recently, had only one processor?

The answer is that the computational tasks are assigned to structures called “processes”. The system keeps an inventory of all processes, and assigns a numeric id to each process, called the process id. The system gives each process access to the CPU in a round-robin fashion. Because the process switching is so fast, it appears as if all processes run at the same time, but that’s an illusion, unless you have a multicore CPU or multiprocessor machine. When a computational task finishes, UNIX terminates the process. If a user types a new command in the shell, for example, a new process is spawned. Thus, processes come and go like people at a lively party. You can check out the party by typing the command top, which lists all the active processes in a dynamic display. You will see new processes appear in the list, and others disappear. The process list can be sorted in different ways (see man top). Another method to see the active processes is to use the ps command (a useful set of options is ps -elF; see the man page). In a simplistic way, a process can be thought of simply as a running program, but it has other associated properties, such as data, environment variables, open devices and disk files, and permissions.

In UNIX, a process can only be created by “forking” an existing process. A new process cannot come out of nowhere! When a process is forked, the “environment” (certain associated properties of the process) is inherited by the new process. The new process initially still runs the program of the parent process, and the exec system call is used to start a new program in the process (note that this all happens internally and is handled by the system, not by the user). That raises the question of where the first process comes from. The first process in the system is traditionally called the init process and is created by the kernel during system startup. The init process has a process id of 1. The init process has no parent process, and the parent process field simply contains the number 0.
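Inheritance of the environment is easy to observe from the shell. In this small sketch, the variable names SHARED_SETTING and private_setting are made up for the demonstration; sh -c starts a child shell as a new process.

```shell
# Each shell is a process with its own id; $$ holds the id of this shell.
parent_pid=$$
echo "this shell has process id $parent_pid"

# Starting another program creates a child process with a different id.
child_pid=$(sh -c 'echo $$')
echo "the child shell had process id $child_pid"

# Exported variables are part of the environment and are inherited;
# ordinary shell variables are not. Both names are hypothetical examples.
export SHARED_SETTING='inherited'
private_setting='not inherited'

seen_shared=$(sh -c 'echo "$SHARED_SETTING"')
seen_private=$(sh -c 'echo "$private_setting"')

echo "child sees SHARED_SETTING:  '$seen_shared'"
echo "child sees private_setting: '$seen_private'"
```

The child receives a copy of the exported variable, while the unexported one stays private to the parent shell and shows up as empty in the child.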

Processes in a terminal

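When you type a command in the terminal, the shell runs it in the foreground and waits until it has finished. (Built-in commands such as cd are executed by the shell process itself; other commands run in a child process.) Thus, all other terminal activity stops (you cannot type new commands) until the command terminates.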

You can also run a program in the background: the system will spawn a new process and execute the desired program in that process, without the shell waiting for it. The terminal remains responsive while your program executes (if it doesn’t eat up all of the system’s resources!).

The notation for running a program in the “background” is to append an ampersand to the command, as in emacs myfile.txt &. This will open a new window with the emacs editor in it, and the command prompt immediately returns, so we can type more commands into the terminal.

If we accidentally type emacs myfile.txt (omitting the ampersand), we can still type C-z (hold the control key and type a ‘z’). This suspends the currently running program. We can now either put the program back into the foreground using the command fg, or, more likely, in the background, using bg.

You can check all the programs that were spawned from your shell by using the command jobs. Each job has a number (different from the process id), that you can use as an argument to fg or bg. In addition, a + denotes the process that was last in the foreground. A - denotes the process that was the next to last process in the foreground. You can quickly switch between two processes by typing C-z fg - ….. C-z fg - …. etc.
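Interactive job control with C-z, fg and bg needs a terminal, but starting and stopping a background process can also be sketched in a few lines that work in a script. Here sleep 30 stands in for any long-running program; the variable names are our own.

```shell
# Start a long-running command in the background with a trailing ampersand.
sleep 30 &
bg_pid=$!                 # $! holds the process id of the last background command
echo "started sleep as process $bg_pid"

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$bg_pid" 2>/dev/null; then
    was_running=yes
else
    was_running=no
fi
echo "running: $was_running"

# Ask the process to terminate, then wait for it to finish.
kill "$bg_pid"
exit_code=0
wait "$bg_pid" 2>/dev/null || exit_code=$?
echo "sleep ended with exit code $exit_code"
```

The exit code is non-zero because the program was terminated by a signal instead of finishing on its own.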

Users

vi and emacs

Linux and distros

Useful UNIX commands

man pages

The man command gives access to the so-called manual pages, the documentation for the different UNIX commands. You can use man ls to learn more about the options of ls, for example.

Note that not all shell commands have man pages. The so-called built-in commands are documented in the man page of the shell. Built-ins include things like cd, pwd, fg, bg, jobs and many others.
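To find out whether a command is a built-in or a separate program, you can ask the shell with type. A short sketch (the exact wording of the messages may differ between shells and language settings):

```shell
# type reports how the shell would interpret a command name.
cd_kind=$(type cd)
ls_kind=$(type ls)

echo "$cd_kind"
echo "$ls_kind"

# Built-ins have no man page of their own; in bash, "help cd" prints
# the built-in documentation instead.
case "$cd_kind" in
    *builtin*) cd_is_builtin=yes ;;
    *)         cd_is_builtin=no ;;
esac
echo "cd is a built-in: $cd_is_builtin"
```

For ls, type normally prints the path of the program file, which tells you that man ls is the right place to look.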

pipes and redirect

basic

ls	list directory contents
cd	change directory
mkdir	create a new directory
rmdir	remove empty directory
more	page through file
less	page through file with more features

text handling

grep	match patterns in text files
cut	slice text files into columns
sort	sort lines in a file
wc	count lines, words and characters in a text file
vi	simple text editor
emacs	text editor (usually not in default install)
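Several of these text commands are typically chained together with the pipe symbol |, which feeds the output of one command into the next. The following is a minimal sketch on made-up data: the gene names, organisms and numbers are invented purely for illustration.

```shell
# Build a small tab-separated sample file with made-up data.
data=$(mktemp)
printf 'geneA\thuman\t120\n'  > "$data"
printf 'geneB\tmouse\t85\n'  >> "$data"
printf 'geneC\thuman\t300\n' >> "$data"
printf 'geneD\tyeast\t42\n'  >> "$data"

# grep keeps the lines that mention "human", cut extracts column 1
# (tab is the default separator), and sort -r orders the result in reverse.
human_genes=$(grep 'human' "$data" | cut -f1 | sort -r)
echo "$human_genes"

# wc -l counts lines; tr strips the padding some systems add.
line_count=$(wc -l < "$data" | tr -d ' ')
echo "lines in file: $line_count"

# Clean up the scratch file.
rm -f "$data"
```

Each command in the pipeline does one small job, which is the usual way of combining UNIX tools.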

file related

cp	copy a file	cp file1 file2
mv	move a file	mv oldfile newfile
scp	secure copy a file to or from another server	scp xyz.tar otherhost:abc.tar
rm	delete a file	rm * (delete everything – be careful with that one!)
touch	create empty file / update modification times on a file	touch myfile.txt
cat	show file contents; concatenate files	cat file1 file2 > file3
file	show information on file type
gzip / gunzip	compress / decompress a file
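The file commands are best tried out where nothing important can be damaged. This sketch works entirely inside a scratch directory created with mktemp; all file names are hypothetical examples.

```shell
# Work in a scratch directory so nothing important is touched.
work=$(mktemp -d)

printf 'first\n'  > "$work/file1"
printf 'second\n' > "$work/file2"

cat "$work/file1" "$work/file2" > "$work/file3"   # concatenate into file3
cp "$work/file3" "$work/copy"                     # copy: both names exist afterwards
mv "$work/copy" "$work/renamed"                   # move: only the new name exists
touch "$work/empty.txt"                           # create an empty file

combined=$(cat "$work/file3")
listing=$(ls "$work")
echo "$combined"
echo "$listing"

# Remove the scratch directory and everything in it.
rm -r "$work"
```

After the mv, the name copy is gone from the listing and renamed has taken its place.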

process related

top	show running processes and other system information
ps	list running processes
kill	send a signal to a process (by default, a request to terminate)
^z	suspend foreground process
bg	put process in background
fg	put process in foreground
jobs	list processes started from that terminal

system related

apt-get	install new system components (Debian and Ubuntu)
apt-cache	search component descriptions (Debian and Ubuntu)
mount	mount file systems
du	analyze disk usage

How to run Linux on your computer

You can run Linux on your computer without having to install a new operating system, using virtualization software. Several different systems are available. For this class, we recommend that you download the VMware Player software (https://www.vmware.com/products/player/, free download) and the virtual machine we created for this class, which already contains some bioinformatics packages such as blastall, bioperl, and other programs.

Exercises

Read the man page of the man command (man man).
Read the man page of the ls command (man ls).
