Introduction to UNIX (and Linux)

UNIX is a generic term for a whole series of similar but distinct computer operating systems (OSes) that originated from Ken Thompson’s and Dennis Ritchie’s early work on OSes at Bell Labs in the late 1960s. (Click here for an interesting and much more complete history of UNIX.) Some of the features of UNIX that have made it popular are: multitasking – the ability to do many different things at once; multiuser – more than one person can use the computer at the same time, or at different times; portability – UNIX and linux run on almost every computer architecture ever invented; and for every UNIX flavor there is a large built-in suite of powerful, free programs. There have been many UNIXes designed over the years for a whole host of (expensive) computer hardware architectures that you have probably never heard of. Relatively recently, however, operating systems based more or less on UNIX have begun to invade ordinary people’s lives in the form of linux and Apple Computer’s new Mac OS X. (Since, for the purposes of this class, UNIX and linux are indistinguishable, I will use them interchangeably. Many people do.)

Linux was started as a hobby of a guy called Linus Torvalds in 1991 as a way to use a free UNIX-like OS on computers that regular people used, like the PCs that your Mom would use. It took off like a rocket and today is wildly popular. But linux itself isn’t a complete OS. The reason that linux was successful, and the reason that we are able to use it for this course, is Richard Stallman’s GNU (which stands for GNU’s Not Unix) project. The GNU project is a *free* UNIX-like OS and software suite that is maintained by a large number of programmers located all over the world.

Like UNIX, there are many different flavors of linux that run on a whole host of computer architectures. Unlike UNIX, most linux distributions are completely free to download. Some, like the Red Hat Linux that is installed on our lab machines, cost about $100 if you want to buy CDs with the software instead of waiting for the download. Included in that $100 is the ability to call someone and complain when things don’t work. I am not certain, but I believe Red Hat is the most popular version of linux out there. So enjoy.

What you can do well with Linux:

Create and edit files (like Word)
Search through files (like “find in file” in Windows or on Macs)
Organize files (like “My Computer” and “My Pictures” in Windows or on Macs)
Run programs (like double clicking in Windows or on Macs)
Email
FTP
Biological sequence analysis (quickly do large numbers of blast runs, clustalw, perl scripting, etc.)

What you can’t do well with UNIX:

Watch movies
Play games
Run Java applets of dancing bears in your Web browser
and so on …

So the take home message here is that you can do a lot fast with a little practice with UNIX. It helps to know a little Perl. We’ll get to that.
The UNIX OS
The organization of the UNIX file system is very simple. It’s basically a whole bunch of directories (usually called folders in the Windows and Mac world) organized into a tree structure. Here is a toy UNIX file system that I will use as an example:

You’ll notice right away that at the top of the tree is a directory labeled / (which is called the root directory). It’s the very top of the UNIX file structure, and so it contains all of the directories, and everything else, in the file system. The next level down from the root directory contains many directories that have important system programs in them and (at least in this example) a directory that contains all of the users of the computer. For us, the important level of directories is the next one down, the user directories themselves (sudhir and john in the example). I only mention the non-user directories for completeness. You do not need, and it would be wise to stay out of, any of the directories above your user directory. It’s not likely, but it’s possible that you could accidentally mess things up. Bad. For everyone.

When you log in, you will be in the equivalent directory of john. (john is my login name, so my home directory is called john.) This is where the action will happen. In this directory I can make new directories to organize my stuff, move around, make new programs, run programs, edit files, and so on. Your home directory is the first “working directory” you will encounter. The concept of the working directory is simple but critical. Your current working directory is your location in the file structure at a given time. Or, in other words, wherever you currently are is your current working directory.
Getting started
By now, I’m sure you’re on the edge of your seat saying, “when can we get started?”. I know, I know, learning UNIX has that effect on people. First, open a new terminal, if one isn’t open already (ask if you don’t know how to do this). You will see what’s called a prompt waiting for you to issue a command, e.g.:

[john@ccbtl11 john]$

A “terminal” is the means by which you talk to the computer. The “prompt” or “shell prompt” is how the shell tells you it is ready for a command; the shell itself interprets what you type into language the computer understands and then returns the results back to you. You can open multiple terminals at once to make it easier to work in different directories simultaneously. They will all open up in your home directory.
Making directories and moving around
Issue the ls command to ‘list’ the contents of your home directory.

[john@ccbtl11 john]$ ls
[john@ccbtl11 john]$

Nothing should be listed, because nothing is there. To make a new directory, use the command mkdir.

So, for example, to make the directory taing, you would issue the command

[john@ccbtl11 john]$ mkdir taing
[john@ccbtl11 john]$

Now, do another ls, and the directory taing should be there

[john@ccbtl11 john]$ ls
taing
[john@ccbtl11 john]$

Now I’ll make the rest of the directories in the example.

[john@ccbtl11 john]$ mkdir crap rna mp3
[john@ccbtl11 john]$ ls
crap mp3 rna taing
[john@ccbtl11 john]$

You need to be able to move around into your newly-created directories. This is done with the ‘change directory’ command, cd. So, to change into the taing directory,

[john@ccbtl11 john]$ cd taing
[john@ccbtl11 taing]$

To see where you are in the directory tree, use the ‘print working directory’ command, pwd

[john@ccbtl11 taing]$ pwd
/home/john/taing
[john@ccbtl11 taing]$

Now say you want to go back up to the john directory. To do this, you need to know how relative directory positions are described in UNIX.

The current directory is always referred to as . (That’s right, just a period.)

The directory above you is always called .. (That’s two dots.)

So, in our example, to go back up to the john directory,

[john@ccbtl11 taing]$ cd ..
[john@ccbtl11 john]$ pwd
/home/john
[john@ccbtl11 john]$ ls
crap mp3 rna taing
[john@ccbtl11 john]$

Now, let’s make a couple more directories, one in which to store all of our perl programs that we’ll write this semester and one to store homework.

[john@ccbtl11 john]$ mkdir taing/perl taing/hw
[john@ccbtl11 john]$ ls taing
hw perl
[john@ccbtl11 john]$

Now you can see two directories listed in taing, hw and perl. Let’s move around a bit more, to get more comfortable with it. I’ve numbered the lines here for ease of discussion.

[01] [john@ccbtl11 john]$ cd taing/perl
[02] [john@ccbtl11 perl]$ pwd
[03] /home/john/taing/perl
[04] [john@ccbtl11 perl]$ cd ../hw
[05] [john@ccbtl11 hw]$ pwd
[06] /home/john/taing/hw
[07] [john@ccbtl11 hw]$ ls /
[08] bin home local nfs src usr
[09] [john@ccbtl11 hw]$ ls ../../../../
[10] bin home local nfs src usr
[11] [john@ccbtl11 hw]$ ls /home/john/
[12] crap mp3 rna taing
[13] [john@ccbtl11 hw]$ pwd
[14] /home/john/taing/hw
[15] [john@ccbtl11 hw]$ cd
[16] [john@ccbtl11 john]$ pwd
[17] /home/john
[18] [john@ccbtl11 john]$ cd taing/perl/
[19] [john@ccbtl11 perl]$ ls ~
[20] crap mp3 rna taing
[21] [john@ccbtl11 perl]$ cd ~
[22] [john@ccbtl11 john]$ pwd
[23] /home/john

In line [04], I moved directly from the perl directory into the hw directory. In lines [07] and [09], I listed the entire contents of the root directory (../../../../ climbs four levels up from hw, which lands on /). In line [11], I listed the contents of my home directory by giving the ls command what’s called the “complete path” of my home directory. In line [15] I just typed cd. This always brings you back to your home directory. In line [19] I listed the contents of my home directory from the taing/perl directory by using the tilde “~”. Tilde always means home directory. Line [21] does the same thing as line [15].

From these examples, you can see that you are able to string together any number of directories for moving around or listing the contents of directories from any other location in the file structure. You just need to know where you are going relative to where you are, or know the complete path of where you are going. Play around with it.
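
If you want to practice without touching your real home directory, here is a minimal sketch that builds a throwaway copy of the example tree and walks around it. The scratch location (made with mktemp) and the variable names are my own choices for illustration; they are not part of the lab setup.

```shell
# Build a scratch copy of the example tree in a temporary directory
# (the scratch path is an assumption made for this sketch).
sandbox=$(mktemp -d)
mkdir "$sandbox/john"
cd "$sandbox/john"
mkdir taing
mkdir taing/perl taing/hw

# Relative path: go down two levels from where we are.
cd taing/perl
in_perl=$(basename "$(pwd)")    # basename keeps only the last part of the path

# ".." means the directory above, so ../hw is the sibling of perl.
cd ../hw
in_hw=$(basename "$(pwd)")

# Two levels up from hw brings us back to the john directory.
cd ../..
back_in=$(basename "$(pwd)")

echo "$in_perl $in_hw $back_in"

# Clean up the scratch tree (-r removes a directory and everything in it).
cd /
rm -rf "$sandbox"
```

Each cd here uses a path relative to the current working directory; swapping in a complete path such as "$sandbox/john/taing/hw" would get you to the same place from anywhere.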
Making, editing, removing, and viewing the contents of files
OK, now you can make new directories to organize all your files. But you need to know how to make the files, right?

One of the cool things about UNIX is that all files that you work on are just plain text files. Have you ever written a paper in Word version 5 and tried to open it in Word version 6? Sometimes it works, sometimes it doesn’t. It always works in UNIX, since all of the files are in plain text, and always will be. You can make fancier things with other programs in UNIX, but most of the real work can easily be done in plain text. You will write your homework in this class in plain text (at least if you want me to grade it!), program in Perl using plain text, edit microarray data files that are in plain text, and so on. And you will do this all in one program that edits plain text files. It’s called emacs. There are a lot of good text editors out there, but we will use emacs because it’s what I know and it’s the best editor around. (If any of you are vi people, you of course are free to use it. Then again, if there are vi people out there, why are you in this class?) Emacs will take some getting used to, but it shouldn’t be that hard, and plus there’s no choice 😉
emacs is your friend
[Here is the emacs user guide table of contents. There is lots of useful info in it.]

To make a new, empty text file in emacs, type:

[john@ccbtl11 john]$ emacs newfile.txt

A new, empty emacs window should pop up on your screen. Move your cursor over the emacs window and type. Type anything. Your dog’s name, your favorite color, or your favorite muppet.

Now, find the “control” key (it may be CTRL on the keyboard). Push it, and hold it down. Keep holding it down. Now, push X and S in that order, without holding them down. Take your finger off of the CTRL key. You just saved what you wrote in a file called newfile.txt. Now hold down the CTRL key and hit X and then C. You just closed the emacs window. Do an ls in the directory to see the file you just made. These are the emacs commands that you will use about 99% of the time. A longer (but by no means complete!) list of emacs commands is:
Some emacs commands
CTRL-X-S Save contents of file.
CTRL-X-C Quit emacs.
CTRL-X-F Open a new file without stopping and restarting emacs.
CTRL-V Scroll down one page.
ALT-V Scroll up one page.
CTRL-A Move to the beginning of the line.
CTRL-E Move to the end of the line.
ALT-< Move to the beginning of the file. (note: hold down shift to type the < character)
ALT-> Move to the end of the file. (note: hold down shift to type the > character)
CTRL-S Search for a string letter by letter. In other words, as you type the word you are looking for, emacs finds the letters as you type.
CTRL-S, Return Plain old search for a string.
CTRL-G Start command sequence over again. E.g., if you mess up by hitting CTRL-X-X by accident, hit CTRL-G to start over. CTRL-G is your friend. You can do it anytime.

There are many, many more emacs commands, but these are some of the most useful. You can read the emacs tutorial anytime by using the Help tab at the top of the emacs window (that’s right, you can use your mouse!).
A quick note on file naming conventions
You will notice that in the above example, I named the file newfile.txt, not just newfile. This is for a reason. The reason is that the file I created is just a text file (hence the .txt extension). Proper usage of file extensions is critical to successful organization in the UNIX environment and in bioinformatics circles. They are not forced on you – you can name files anything you want (well, almost anything. See the last subsection of this section) – but if you ignore them I guarantee that you will quickly get confused.

Here are some examples:

Perl scripts end in the .pl extension
Files with sequence in the Fasta format end in .fa or .fas or .fasta
When I run the different forms of blast, I name the output file so that it corresponds to the type of blast I ran, e.g.: filename.blastn, filename.blastp, filename.blastx, filename.tblastx, etc.

and so on …
Moving files
Let’s download a file and look at it in emacs. Get this file. Be sure to note which directory the file gets saved into from the Netscape download. Make a new directory below your home directory called yeast. Now, move your new file into the yeast directory. This is done with the move command mv. E.g.:

[john@ccbtl11 john]$ ls
chr04.fsa crap/ mp3/ newfile.txt rna/ taing/
[john@ccbtl11 john]$ mkdir yeast
[john@ccbtl11 john]$ mv chr04.fsa yeast
[john@ccbtl11 john]$ ls
crap/ mp3/ newfile.txt rna/ taing/ yeast/
[john@ccbtl11 john]$ ls yeast/
chr04.fsa
[john@ccbtl11 john]$

There is a copy command, too: cp. Its syntax is the same as mv, e.g.:

cp [file to move/copy] [directory to move/copy file to]

Now, go into the yeast directory and look at the chromosome 4 sequence. Pretty exciting, isn’t it?
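
To see the difference between mv and cp side by side, here is a small sketch you can run safely. It works in a scratch directory, and the file contents and the backup directory are made up for the example; only the chr04.fsa and yeast names come from the text above.

```shell
# Work in a scratch directory so nothing real gets moved.
sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the downloaded sequence file (the contents are made up).
echo ">chr04 placeholder" > chr04.fsa
mkdir yeast backup

# mv: the file leaves the current directory and lands in yeast/.
mv chr04.fsa yeast

# cp: the original stays in yeast/ and a second copy appears in backup/.
cp yeast/chr04.fsa backup

# List both directories; each one should now show chr04.fsa.
ls yeast backup
```

After mv there is still only one copy of the file, just in a new place; after cp there are two. When you are done, the scratch directory can simply be deleted.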
Removing files
To remove files and directories, use the ‘remove’ command rm, e.g.:

[john@ccbtl11 john]$ cd
[john@ccbtl11 john]$ ls
crap/ mp3/ newfile.txt rna/ taing/ yeast/
[john@ccbtl11 john]$ rm newfile.txt
rm: remove newfile.txt (yes/no)? y
[john@ccbtl11 john]$ ls
crap/ mp3/ rna/ taing/ yeast/
[john@ccbtl11 john]$

Now to remove the yeast directory, use the rmdir command, e.g.:

[john@ccbtl11 john]$ rm -f yeast/chr04.fsa
[john@ccbtl11 john]$ rmdir yeast
[john@ccbtl11 john]$

Two things to note here. The first is that the -f flag is the “force” flag: rm didn’t ask whether or not I was sure I wanted the file removed, it just did it. The second is that the rmdir command only works on empty directories.
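
Here is a short sketch of that second point, run in a scratch directory so nothing of yours is at risk. The variable name first_try is just something I made up to record what happened.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir yeast
echo "placeholder" > yeast/chr04.fsa

# rmdir refuses to remove a directory that still has something in it.
# (2>/dev/null just hides the error message it would print.)
if rmdir yeast 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# Empty the directory first, and then rmdir succeeds.
rm -f yeast/chr04.fsa
rmdir yeast

echo "first attempt: $first_try"
```

The refusal is a safety feature: rmdir will never take files down with it by accident.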
more or less
To look quickly at a file’s contents without opening it in emacs, use the commands more or less. These quickly and crudely display the file’s contents in the terminal. You can scroll through the file page by page using the spacebar. Press Q to quit out of them and return to the shell prompt. less is basically just a fancier more program; it allows you to scroll through with up and down arrow keys. But less is not installed on all UNIX computers. These programs are useful when you just want to check the contents of a file without editing it. (E.g., is my data in the file important_file1.dat or important_file2.dat?)
Don’t use spaces, exclamation points, question marks, …
One more thing about file names in UNIX. Spaces are no good. Neither are exclamation points, question marks, and all such things. Don’t use them. Use the underscore “_” or dash “-” instead. Basically just stick to alphanumerics (a through z, A through Z, and 0 through 9), underscores, dashes, and periods. I know it’s kind of primitive, but it’s just the way it is.
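
If you are wondering why spaces are such trouble, this little sketch shows it. The shell uses spaces to separate the pieces of a command, so a file name containing one gets chopped in two unless you quote it every time. The file names here are invented for the demonstration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Quotes are the only way to even create a name with a space in it.
touch "my data.txt"

# Without quotes, ls looks for two files, "my" and "data.txt", and finds neither.
if ls my data.txt > /dev/null 2>&1; then
    unquoted="found"
else
    unquoted="not found"
fi

# With quotes, the name is treated as one piece and ls finds it.
if ls "my data.txt" > /dev/null 2>&1; then
    quoted="found"
else
    quoted="not found"
fi

echo "without quotes: $unquoted"
echo "with quotes:    $quoted"
```

Name the file my_data.txt instead and the problem never comes up.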
Miscellaneous tips and tricks
Here’s a bunch of things that don’t fit neatly into a category but are important and useful.
RTFM
Computer folks often use the acronym RTFM (for Read The F*#!ing Manual) in response to stupid questions. It’s not nice, but reading the manual is often a better way to learn than asking a question. UNIX has what are called “man pages” for many (but not all) of the common commands. If you want to learn more about ls, for example, type man ls at the command line. Below is an example of what a typical man page looks like. Try using some of the options with the ls command to see what the output looks like. (The -a and -l are the most commonly used flags with ls.)

[john@ccbtl11 john]$ man ls

NAME ls – list contents of directory

SYNOPSIS
/usr/bin/ls [ -aAbcCdfFgilLmnopqrRstux1 ] [ file … ]

For each file that is a directory, ls lists the contents of the directory; for each file that is an ordinary file, ls repeats its name and any other information requested. The output is sorted alphabetically by default. When no argument is given, the current directory is listed. When several arguments are given, the arguments are first sorted appropriately, but file arguments appear before directories and their contents.

The following options are supported:

-a List all entries, including those that begin with a dot (.), which are normally not listed.
-A List all entries, including those that begin with a dot (.), with the exception of the working directory (.) and the parent directory (..).
-b Force printing of non-printable characters to be in the octal \ddd notation.
-c Use time of last modification of the i-node (file created, mode changed, and so forth) for sorting (-t) or printing (-l or -n).
-C Multi-column output with entries sorted down the columns. This is the default output format.
-d If an argument is a directory, list only its name

… and so on …
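
As a quick way to try out one of those flags, here is a sketch that compares plain ls with ls -a in a scratch directory. The two file names are made up; the point is only that names starting with a dot stay hidden unless you ask for them.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
touch visible.txt .hidden.txt

# Plain ls skips names that begin with a dot.
plain=$(ls)

# -a lists everything, including ".", "..", and .hidden.txt.
all=$(ls -a)

echo "ls:    $plain"
echo "ls -a:" $all
```

Your home directory almost certainly has a few dot files in it already; ls -a there will show them.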

Some other helpful sites are:

Intro to UNIX from Lincoln Stein’s CSHL Genome Informatics course.

Intro to Unix commands from Indiana University.

It can also be helpful to do Google searches for UNIX tips. If you are really into it, I can recommend a couple of good UNIX books.
Tab completion
Tab completion rules. Say you have five files in a directory called

really_really_important_data_wow_so_important1.dat
really_really_important_data_wow_so_important2.dat
really_really_important_data_wow_so_important3.dat
really_really_important_data_wow_so_important4.dat
really_really_important_data_wow_so_important5.dat

After cursing yourself for naming the files so stupidly, you need to look through the files with less to find the data you want. Except you don’t want to keep typing really_really_important_data_wow_so_important… every time. Use tab completion. It works like this. Type “r”. Hit tab. The shell will finish typing the names of all the files that begin with “r”, up until there is a character that isn’t common to all the files. E.g.,

[john@ccbtl11 john]$ r

[Hit tab]

[john@ccbtl11 john]$ really_really_important_data_wow_so_important

[And the shell will type everything out up to the numbers 1, 2, 3, and so on.]

Here you can hit tab two times in a row quickly, and the shell will give you all of the files that complete the match. This works in any case. Try hitting tab twice at a blank command prompt. Cool, huh? Play around with tab completion, it’s a handy thing and second nature to UNIX folks.
Cutting and pasting, UNIX style
In the Windows and Mac world, you can cut and paste text between programs. You usually highlight what you want, go to the Edit tab at the top of the program, select Copy, go to the other program, put the cursor where you want to paste the text, go to the Edit tab at the top of the program and select Paste. You can do this in UNIX, too, but there’s a shortcut. Highlight the text you want to copy using the left mouse button, select where you want the text to go, and hit the middle mouse button. It’ll paste it there.
Wildcards
You can use what is called a wildcard (the “*” asterisk) when dealing with lists of files. For example, if you were in a directory with say 100 files in it (some clustalw files, some blastn files, some blastx files, and so on) and you were interested in only the blastx files in the directory, you could simply do an ls and look by eye (this example is from one of my directories, so don’t laugh):

[john@ccbtl11 john]$ ls
0_1_all.aln
39_FP_1.fa
BOXSHADE.ps
ORNL_39.out
YHR054C_and_flank.fa
YHR054C_and_flank_0_1_2_4.aln
YHR054C_and_flank_0_1_2_4.dnd
YHR054C_and_flank_0_1_2_4.fa
YHR054C_and_flank_REV.fa
YHR054C_and_flank_REV_0_1.aln
YHR054C_and_flank_REV_0_1.dnd
YHR054C_and_flank_REV_0_1.fa
YHR054C_and_flank_REV_0_1_2_4.aln
YHR054C_and_flank_REV_0_1_2_4.dnd
YHR054C_and_flank_REV_0_1_2_4.fa
allyeast_v_39_FP_1.blastn
allyeast_v_cand_39_intergenic.blastn
cand_39_intergenic.fa
cand_39_intergenic_and_cup1.fa
cand_39_intergenic_and_cup1_intergenic.fa
cand_39_intergenic_and_cup1_intergenic_REV.fa
find_orfs
find_orfs_new_assem
new_others_v_cand_39_intergenic.blastn
new_others_v_cand_39_intergenic.blastn.aln
new_others_v_cand_39_intergenic.blastn.parsed
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn.aln
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn.aln-only-top-two
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn.parsed
new_others_v_cand_39_intergenic_and_cup1_intergenic_handdone.blastn.parsed
new_others_v_cand_39_intergenic_and_cup1_intergenic_handdone.blastn.parsed.bak
new_v_39_inter_REV.aln
new_v_39_inter_REV.dnd
new_v_39_inter_REV.parsed
nt_v_cand_39_intergenic.blastn
nt_v_cand_39_intergenic.blastx
nt_v_cand_39_intergenic_and_cup1.blastn
others_v_YHR054C_and_flank.blastn
others_v_YHR054C_and_flank.blastn.aln
others_v_YHR054C_and_flank.blastn.dnd
others_v_YHR054C_and_flank.blastn.parsed
others_v_YHR054C_and_flank_REV.blastn
others_v_YHR054C_and_flank_REV.blastn.aln
others_v_YHR054C_and_flank_REV.blastn.dnd
others_v_YHR054C_and_flank_REV.blastn.parsed
others_v_cand_39_intergenic.blastn
others_v_cand_39_intergenic.blastn.aln
others_v_cand_39_intergenic.blastn.dnd
others_v_cand_39_intergenic.blastn.parsed
others_v_cand_39_intergenic_and_cup1.blastn
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn.aln
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn.dnd
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn.parsed
others_v_intergenic_and_cup1.blastn
others_v_intergenic_and_cup1.blastn.aln
others_v_intergenic_and_cup1.blastn.dnd
others_v_intergenic_and_cup1.blastn.parsed
others_v_intergenic_and_cup1_intergenic.blastn
others_v_intergenic_and_cup1_intergenic.blastn.parsed
pombe_v_pred_RNA_39.blastn
pred_RNA_39.fa
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn

Which is a bit of a chore to look through, don’t you think? Now, if I wanted to look at only the blastx files,

[john@ccbtl11 john]$ ls *.blastx
nt_v_cand_39_intergenic.blastx

Which is a slightly more manageable output. The wildcard is very powerful; play with it a bit. It’s also very dangerous when used in the rm command. Be careful!
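
One habit that takes the danger out of rm with a wildcard is to preview the match first. The sketch below does that in a scratch directory with a few made-up file names in the style of the listing above; the echo trick is my suggestion, not an official rule.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# Made-up file names for the demonstration.
touch a.blastn b.blastn c.blastx d.fa

# The shell expands *.blastn into the matching names before ls ever runs.
ls *.blastn

# Safety habit: use echo to see exactly what the wildcard matches.
preview=$(echo *.blastn)
echo "would remove: $preview"

# Only after checking the preview do we hand the same pattern to rm.
rm -f *.blastn
remaining=$(ls)
echo "left over:" $remaining
```

Because echo only prints, a pattern that matches more than you expected costs you nothing at the preview step.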
Putting processes in the background
Say you want to start emacs to edit a file but you want to run blast at the command line, too. You could open a new terminal, and this is a fine idea, but your desktop can get pretty crowded with windows. You can put processes in the background in UNIX. If you just type

[john@ccbtl11 john]$ emacs newfile.txt

you get your emacs window, but you don’t get the command line back. To get the command line back, append the “&” symbol to the end of the command, e.g.:

[john@ccbtl11 john]$ emacs newfile.txt &
[john@ccbtl11 john]$

Voilà! You have an emacs window and you have the shell prompt back to keep on working. If you forget to add the ampersand to the end of the command, you can get the shell prompt back by hitting CTRL-Z and then typing bg to put it in the background.
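
Here is a sketch of the same idea that runs without a graphical window. The sleep command (which just waits for the given number of seconds) stands in for a long-running program like emacs or blast, and the variable names are mine.

```shell
# sleep stands in for a long-running program such as emacs or blast.
sleep 1 &
bg_pid=$!    # $! holds the process ID of the most recent background job

# The shell is free right away, so this line runs while sleep is still going.
echo "prompt is back; background process ID is $bg_pid"

# wait pauses until that background job finishes and passes along its exit status.
wait "$bg_pid"
bg_status=$?
echo "background job finished with status $bg_status"
```

At an interactive prompt you would rarely need wait; it is here so the script does not end before the background job does.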
Redirecting output
Now imagine you are blasting your favorite protein against Genbank, and that your favorite protein is a kinase. You are gonna get a helluva lot of hits, and unless you are the Greatest American Hero, you ain’t gonna be able to read the blast report as it scrolls past your terminal at the speed of light. You need to direct the output into a file to study at your leisure. You can do this with the “>” symbol, e.g.:

[john@ccbtl11 john]$ blastp /data/blast-db/genbank my_kinase.fa E=0.001 -cpus=1 -filter=dust -wordmask=seg > genbank_v_mykinase.blastp &
[john@ccbtl11 john]$

This looks complicated, but it’s pretty simple. I blasted the file my_kinase.fa against the genbank database that was located in /data/blast-db/. The flags E, cpus, filter, and wordmask are things we will learn about later. The part of the command [ > genbank_v_mykinase.blastp &] tells the shell to put the results of the blast search into a file called genbank_v_mykinase.blastp and to put the whole process in the background so you can keep working while the blast job is running.
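
Redirection works with any command, not just blast. This sketch uses ls and echo in a scratch directory with made-up file names. It also shows “>>”, which is not used above: it appends to a file, whereas “>” overwrites whatever was there.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
touch one.fa two.fa

# ">" sends what would have gone to the screen into a file instead.
# If listing.txt already existed, ">" would overwrite it.
ls *.fa > listing.txt

# ">>" adds to the end of the file rather than overwriting it.
echo "end of listing" >> listing.txt

# cat prints the whole file to the screen.
cat listing.txt
```

The overwrite behavior of “>” is worth remembering before you redirect a second blast run into the same output file.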
Command history
OK, now say you just typed in the long blastp command line argument above (using tab completion of course), but that you accidentally typed in my_kinase.f instead of my_kinase.fa. Blastp will return an error. Instead of retyping the whole command, you can use the arrow keys. If you hit the “up” or “down” arrow keys, the shell will show you all of the commands you have typed in during this session (sometimes, depending on how the system is set up, from previous sessions, too). You can just edit the mistake you made using the left, right, backspace, etc. keys (just add an “a” to your filename in this case) rather than retyping the whole thing.

If you want to see what you have typed in recently, type history at the command line. It will show you a numbered list of the commands you have used. If you want to rerun a command and the command arguments are complicated (as in the blast example above), type “!” (usually called bang) followed by the number of the command and the shell will rerun it (e.g., !42 reruns command number 42).
Moving files to and from remote computers
Here is a website with a list of free SSH and SCP clients for windows and mac machines.

There are two ways to move files from computer to computer in the UNIX world, the bad way (using ftp), and the good way (using scp). ftp stands for File Transfer Protocol, and is the original file transfer program. It’s OK for transferring files, but it stinks security-wise. It sends your username and password in plain, human readable text across the network so that any punk 13 year old kid in the Netherlands can dip into the network traffic, get your username and password, and then bring down the Genetics department computers. The Officially-sanctioned Bio5488 Way to transfer files is with scp, or Secure CoPy. All of the computers that we use should have scp installed. scp’s syntax is simple to use; it’s very similar to the cp command, e.g.:

[john@ccbtl11 john]$ scp somefile.txt john@warlord.wustl.edu:~/somefile.txt
john@warlord’s password:
somefile.txt 100% |*****************************| 256 00:00
[john@ccbtl11 john]$

In this case, I am transferring the file somefile.txt to the computer warlord in my home directory (indicated by the tilde). The colon after the remote computer’s name is important – it tells scp that we are indeed transferring this file to a remote computer.

If I were transferring the file from a remote computer to the local machine, I would do something like this:

[john@ccbtl11 john]$ scp john@warlord.wustl.edu:~/perl/coolprog.pl .
john@warlord’s password:
coolprog.pl 100% |*****************************| 196585 00:01
[john@ccbtl11 john]$

Here I have told scp that we want the file called coolprog.pl from the perl directory (which itself is in my home directory) to be placed in the current directory, indicated by the “.” (which always means the current directory).
Permissions
Permissions are a very important part of the UNIX environment. Permissions are the rights that you and others have on files and directories in the UNIX filesystem. For example, a file that contains the website for this course will have much different permissions than the file the contains the scores for your homeworks. You can give and take away the rights to read, write, and execute the files and directories under your home directory. Here are the three types of permissions and what they enable one to do to files and directories:

r - read files; able to ls contents of directories
w - write to files; able to create or delete files in directories
x - execute files; able to cd into directories

These permissions can be granted to three different kinds of users, plus a shorthand for all of them at once:

u - you alone, the user
g - your group, i.e. the people in this class
o - other users not in the group, i.e. the rest of the world
a - all of the above (user, group, and others together)

To look at the permissions on a file or directory, use the ls -l command. E.g.:

[john@ccbtl11 john]$ cd taing
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxr-xr-x 2 john bio5488 512 Dec 20 10:41 perl/
[john@ccbtl11 taing]$

You can see the two directories hw and perl listed there, along with a bunch of other info. The important parts are noted here:

flags  owner  group  world     user  group         date modified

d      rwx    ---    ---    2  john  bio5488  512  Dec 20 10:41  hw/
d      rwx    r-x    r-x    2  john  bio5488  512  Dec 20 10:41  perl/

To change the permissions on files and directories, use the chmod command, e.g.:

[john@ccbtl11 taing]$ chmod a+w perl/
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxrwxrwx 2 john bio5488 512 Dec 20 10:41 perl/

Which says to give the entire world the right to make and delete files in the perl directory (bad idea). Let’s take that back, and take away execute permission from everyone else while we are at it:

[john@ccbtl11 taing]$ chmod og-wx perl/
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxr--r-- 2 john bio5488 512 Dec 20 10:41 perl/
[john@ccbtl11 taing]$

Better. Now everyone can ls the contents of perl, but no one but you can do anything with the files.
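
If you would like to watch the permission string change step by step, here is a sketch you can run in a scratch directory. The first chmod, which sets a known starting point with the “=” form, and the cut command, which trims the ls output down to the ten permission characters, are my additions for the demonstration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir perl

# Start from a known state: rwx for you, r-x for group and others.
chmod u=rwx,go=rx perl
start=$(ls -ld perl | cut -c1-10)    # keep only the first ten characters

# a+w gives everyone write permission (the bad idea).
chmod a+w perl
open=$(ls -ld perl | cut -c1-10)

# og-wx takes write and execute away from group and others.
chmod og-wx perl
locked=$(ls -ld perl | cut -c1-10)

echo "$start"
echo "$open"
echo "$locked"
```

Reading the three lines top to bottom, you should see the group and world columns open up and then close down again while your own rwx never changes.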
Pipes (the “|” symbol)
Pipes allow you to feed the output of one command into another. The most common time to use this feature for beginners (that includes me) is to feed the output of an ls command into the programs more or less in directories with so many files that the ls command cannot fit them into a single shell window. So, for example, if you were to do an ls in the directory used in the Wildcard section above in a smallish shell window, the results would just fly by you. Use the pipe:

[john@ccbtl11 john]$ ls | less

and you can scroll through the ls output at your leisure.
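
Since less needs a person at the keyboard, this sketch pipes into wc -l instead, which simply counts the lines it receives. The file names are made up, and I have borrowed grep from the next section to show that pipes can be chained.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
touch a.blastn b.blastn c.blastx

# ls prints one name per line when its output goes into a pipe,
# so counting lines counts files.
count=$(ls | wc -l)
echo "files in directory:" $count

# Pipes can be chained: list, keep only the blastn names, then count them.
blastn_count=$(ls | grep blastn | wc -l)
echo "blastn files:" $blastn_count
```

Each command in the chain only sees what the one before it printed, which is what makes small tools so easy to combine.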

Regular Expressions using grep
Regular expressions (regexps) are one of the reasons that UNIX and Perl are so widely used in genomics and computational biology. They take a few minutes to get used to, but after that most people quickly become hooked on them. What follows is a general introduction to regexps; we will see these in Perl a lot; we are just introducing them here.

Regexps are used in many different UNIX programs such as grep, emacs, and in the programming language Perl. They are basically a shorthand way of describing a set of strings without having to fully describe all of the strings (or a set of substrings you are interested in) in the set. (“String” is a term that will come up a lot – it’s just a computer-sciency term for “list of characters”. A sentence is a string. So is a list of numbers. Basically, anything you write or read in an emacs window could be described as a string.) For example, say you had a file of all the predicted proteins in the human. There would be on the order of 30,000 ORFs listed. In this case, we could describe a string as one of the 30,000 entries for the set of ORFs, where each entry (or string) contained the ORF name, ORF sequence, ORF molecular weight, and predicted ORF function. You could describe the entire set of entries as 30,000 strings, or you could use regexps as shorthand to describe a subset of entries that you are interested in.

Here is a toy example of a file similar to what I described above:

prot01 MKGLRWTYQSDCALA 1650 Unknown
xvr33 MALCPCPCPCDGR 1430 Copper binding

.
.
.

prot30000 MQDALA 660 Unknown

Now let’s use grep to parse through this file.

[john@ccbtl11 john]$ grep Unknown human.file
prot01 MKGLRWTYQSDCALA 1650 Unknown
prot30000 MQDALA 660 Unknown
[john@ccbtl11 john]$

It returned the lines that contained the word “Unknown”. That’s pretty straightforward.
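
To give a small taste of what regexps add beyond plain words, here is a sketch that rebuilds the three visible lines of the toy file in a scratch directory and tries a few patterns on it. The particular patterns are my choice of illustrations; we will cover regexps properly when we get to Perl.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Recreate the three visible lines of the toy file.
echo "prot01 MKGLRWTYQSDCALA 1650 Unknown" > human.file
echo "xvr33 MALCPCPCPCDGR 1430 Copper binding" >> human.file
echo "prot30000 MQDALA 660 Unknown" >> human.file

# A plain word is the simplest regexp: it matches itself.
grep Unknown human.file

# "^" anchors the match to the start of the line: names beginning with "prot".
grep '^prot' human.file

# "$" anchors the match to the end of the line; -c counts matching lines.
unknown_count=$(grep -c 'Unknown$' human.file)
echo "entries of unknown function: $unknown_count"

# "." matches any single character: a C, then anything, then another C.
cpc=$(grep 'C.C' human.file)
echo "$cpc"
```

The single quotes around each pattern keep the shell from interpreting characters like “$” before grep gets to see them.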

Spelling mistakes? Something not clear? E-mail Christina at chen@genetics.wustl.edu
Last updated: Friday Jan 9, 2004 Created by John McCutcheon, 2003