If this Red Dwarf quote strikingly reminds you of your computational biology skills, keep reading ;).
Well, speaking personally, I hardly didn't get no formal education at all.
No kidding, professor...
No, it's true, bud. That's why, sometimes, I don't know stuff. Like... well, practically everything.
Was this because you brought yourself up, sir?
Right. There was no one else around, so I had to teach myself. And seeing as I didn't know anything to begin with, lessons were long and slow; especially on Thursdays when I had double nothing.
You most likely have an administrator who will create an account for you. If you have sudo rights, you can create new users by typing: sudo adduser username. You can add a user to the sudoers list by running sudo visudo and inserting username ALL=(ALL) ALL at the end of the opened file.
Linux and Mac OS users:
1) open terminal (Ctrl+Alt+T in Linux)
2) type ssh username@servername
3) type your password (confirm authorization by typing yes if needed)
4) you are now in your home directory /home/your_username
Windows users:
1) download, install, and use PuTTY [http://www.putty.org/]
1) type passwd your_username
2) type your new password and confirm
1) type htop (or top)
2) see this page to understand its output: http://www.deonsworld.co.za/2012/12/20/understanding-and-using-htop-monitor-system-resources/
3) press Q to quit
Linux and Mac OS users can use SCP (Secure copy)
Copying file to server:
scp SourceFile user@host:directory/TargetFile
Copying file from server:
scp user@host:directory/SourceFile TargetFile
Copying directory from server:
scp -r user@host:directory/SourceFolder TargetFolder
Copying directory to server:
scp -r SourceDirectory user@host:directory/
Windows users can use WinSCP [http://winscp.net/eng/index.php] or any other SCP client.
How to make/copy/move/rename directories/files, and move around in the file system?
Read any introduction to Linux. Software Carpentry tutorials [http://software-carpentry.org/lessons.html] are very useful.
To display manual/help for a particular command:
To print your current working directory:
To list all files in your current folder:
To go into your home directory:
To go into a directory inside your current directory:
To change your current directory one level up:
To go into a particular directory given by an absolute path:
To make a directory:
To remove a file or an empty directory:
To remove a directory with files in it:
rm -rf dirname
To move or rename a file:
To copy a file:
cp filename newfilename
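Here is a small practice session for the tasks listed above. It is only a sketch using the standard commands (pwd, ls, cd, mkdir, rm, mv, cp), which I assume are the ones this list refers to; the directory and file names (project, seq.txt, backup.txt) are made up. Everything happens in a throwaway directory, so it is safe to paste.

```shell
# Work in a throwaway directory so nothing in your real home is touched.
scratch=$(mktemp -d)
cd "$scratch"                # go to a directory given by an absolute path

pwd                          # print the current working directory
mkdir project                # make a directory
cd project                   # go into a directory inside the current one
echo "ACGT" > seq.txt        # create a small file to play with
cp seq.txt seq_copy.txt      # copy a file
mv seq_copy.txt backup.txt   # move or rename a file
ls -l                        # list the files in the current folder
rm seq.txt                   # remove a file
listing=$(ls)                # only backup.txt should be left now
cd ..                        # go one level up
rm -rf project               # remove a directory with files in it (no undo!)
remaining=$(ls -A | wc -l)   # count what is left in the scratch directory

cd /                         # leave the scratch directory...
rmdir "$scratch"             # ...and remove it (rmdir only removes empty directories)
```

For help on any of these, man followed by the command name (e.g. man cp) opens its manual; press Q to leave it.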
How do permission and ownership rights work in Linux? Can other users open/copy (read permission) or modify (write permission) my data? How to make a script executable?
Read this article at http://linuxcommand.org/lts0070.php or any other article on Linux permissions. You only need to get familiar with two commands: chmod and chown.
By default, other users can open/copy all your files, but not modify them. If you have some folders/files which you do not want to be accessible by other users, type:
chmod -R 700 dirname
To change owner of a file or directory (helpful for sudoers):
sudo chown filip:filip filename
sudo chown -R filip:filip dirname
To make a script executable:
chmod u+x scriptname
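To see what these chmod commands actually change, you can try them on toy files. This is a sketch, assuming a Linux machine with GNU stat (the -c option does not exist on Macs); private_data and hello.sh are made-up names.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/private_data"
printf '#!/bin/sh\necho "hello from my script"\n' > "$workdir/hello.sh"

# Owner gets read/write/execute, everybody else gets nothing.
chmod -R 700 "$workdir/private_data"
# Add the execute permission for the owner only.
chmod u+x "$workdir/hello.sh"

# The first column of ls -l shows the permissions (drwx------ for the directory).
ls -ld "$workdir/private_data" "$workdir/hello.sh"

dir_mode=$(stat -c '%a' "$workdir/private_data")   # octal mode, GNU stat only
if [ -x "$workdir/hello.sh" ]; then is_exec=yes; else is_exec=no; fi
echo "directory mode: $dir_mode, script executable: $is_exec"

rm -rf "$workdir"
```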
1) First check if the program is already available system-wide, type:
Or if you don’t know the binary name, you can try:
2) If it's not installed, you can do it yourself
Usually, executable binaries are available for download. Download specific binaries for our system (called something like exe Linux 64bit), make them executable (see above), and you can run your program locally in your folder by typing:
Sometimes programs need to be compiled from source. Compilation differs a lot between programs, so read the Readme/Install files and manuals. The general procedure is to type ./configure followed by make.
3) How to put your program into your $PATH environmental variable?
Executable binaries (“programs”) can be run from any Linux folder as long as they sit in a folder included in your PATH. This is a very useful feature. To see which directories are in your path, type: echo $PATH | tr ':' '\n'. There is a hidden .bashrc file in your home folder. While in your home, type nano .bashrc and add a line like this: export PATH=/home/yourname/programs/programname:$PATH at the end of the .bashrc file to include your program-containing folder(s) in your path. The change takes effect in new terminal sessions, or right away if you type source ~/.bashrc.
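If you want to see the PATH mechanism at work before editing your .bashrc, here is a sketch that only changes PATH for the current terminal session. The program name toyprogram and its folder are made up for the illustration.

```shell
# Create a folder with one tiny "program" in it.
progdir=$(mktemp -d)
printf '#!/bin/sh\necho "toy program ran"\n' > "$progdir/toyprogram"
chmod u+x "$progdir/toyprogram"

# Put the folder at the front of PATH (current session only).
export PATH="$progdir:$PATH"

# Show the first few PATH entries, one per line.
echo "$PATH" | tr ':' '\n' | head -n 3

# The shell now finds the program by name, from any directory.
found_at=$(command -v toyprogram)
output=$(toyprogram)
echo "found at: $found_at"
echo "output: $output"

rm -rf "$progdir"
```

Because nothing was written to .bashrc, the change disappears when you close the terminal.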
There are many ways to make 3rd party programs available to all server users. To avoid having widely-used programs installed separately by many users in their home directories, sudo users can compile/move binaries into e.g. the /opt/src directory and symlink them into the /opt/bin directory (which has to be added to every user’s PATH).
1) before starting your analysis, type screen and confirm by pressing Space (or type screen -S analysisname to name your screen, then you can reattach using just its name)
2) start your analysis
3) press Ctrl+A, then D, to detach from your screen
4) now you can log out from the server and your process will keep running
5) to reattach to the running screen process, type screen -ls to see running screens
6) type screen -r name_of_the_screen_to_reattach
There are other ways to do the same thing (& and disown), but they are much less convenient than the screen command.
1) There are many workshops where you can get hands on experience, see http://evomics.org/.
2) If you're looking for books, get Practical Computing for Biologists [http://practicalcomputing.org/] or similar books from O'Reilly [http://shop.oreilly.com/category/browse-subjects/science-math/bioinformatics.do].
3) Search Google and internet forums for your questions [http://seqanswers.com/, http://www.biostars.org/, http://stackoverflow.com/].
4) If you know someone experienced, bug him/her with questions and pick his/her brain.
Yes, you do, and for practically all of biology, but try telling this to biologists...;). You'll probably need more than one language. I'd suggest starting with Bash (you're already using parts of it and it's pretty simple) and then moving on to Python [these two books are pretty awesome: http://pythonforbiologists.com/books/index.html], but you can do the same with Perl or R, and I'm pretty sure you'll meet these languages anyway along your learning curve.
Talk to other users! An unspoken rule in many labs is to use less than ⅔ of all threads. Feel free to use as many threads as needed during weekends and holidays (or if you see that nobody's using the server, e.g. overnight), but always leave one or two nodes for others to use for simple tasks. If you want to use more processors, lower the priority of your process below the default (nice/renice commands) so that it basically acts as a transient process, using resources only when they are available. This is what I usually do with most of my processes.
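A quick sketch of how nice and renice can be used for this. The sleep command is just a stand-in for a real analysis, and I assume the usual Linux niceness range, where 19 is the lowest priority.

```shell
# Niceness of the current shell (usually 0).
base=$(nice)

# Start a command with niceness raised by 10; the inner "nice" simply
# prints the niceness it is running with.
lowered=$(nice -n 10 nice)
echo "shell niceness: $base, niced command: $lowered"

# For a job that is already running, use renice with its process ID.
sleep 30 &
job_pid=$!
# Whichever way your renice interprets -n (absolute or relative),
# the job ends up at the lowest priority, 19.
renice -n 19 -p "$job_pid"

# Clean up the stand-in job.
kill "$job_pid"
wait "$job_pid" 2>/dev/null
```

In real life you would start the analysis directly as nice -n 19 followed by your command, and look up the process ID of an already running job in htop or top.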
To tar and compress a file using tar and gzip or bzip2:
tar -zcvf futurefilename.tar.gz filetocompress
tar -jcvf futurefilename.tar.bz2 filetocompress
To untar or decompress a file that was created using tar:
tar -zxvf filename.tar.gz
tar -jxvf data.tar.bz2
To compress a file by gzip or bzip2:
To decompress a .gz or .bz2 file:
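A round trip on a toy file shows how these fit together. This is a sketch, assuming plain gzip and gunzip for single files (bzip2 and bunzip2 are used the same way for .bz2); toy.fasta is a made-up file.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '>seq1\nACGTACGT\n' > toy.fasta

# Pack into a gzipped tar archive, then unpack into a separate folder.
tar -zcvf toy.tar.gz toy.fasta
mkdir unpacked
tar -zxvf toy.tar.gz -C unpacked

# cmp is silent and succeeds when the two files are identical.
tar_ok=no
cmp toy.fasta unpacked/toy.fasta && tar_ok=yes

# gzip replaces the file with toy.fasta.gz; gunzip restores it.
gzip toy.fasta
ls
gunzip toy.fasta.gz
gz_content=$(cat toy.fasta)

cd /
rm -rf "$workdir"
```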
Have a look (echo command) at the $BLASTDB and $HMMERDB environment variables. If they're set, you can use database names (such as nr or nt) for your blast searches without specifying absolute paths, and blast and hmmer should be able to find the particular database files. If you want to add a large database to this folder or need to update some of the databases, contact your admin. Often there is really no reason to blast against huge and poorly annotated databases such as nr and nt. Try to use RefSeq, SwissProt or other properly curated alternatives as much as possible.
Use user specified tabular (or XML) output with sscinames in it. Read BLAST manual for more info: http://www.ncbi.nlm.nih.gov/books/NBK1763/#CmdLineAppsManual.Quick_start. NCBI taxdb has to be in your/our $BLASTDB environmental variable.
-outfmt '6 qseqid sseqid evalue bitscore sgi sacc staxids sscinames scomnames stitle'
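Once you have this tabular output, the usual command line tools can summarize it. Below is a sketch on three made-up rows (they are not real BLAST results) that follow the column order of the -outfmt string above: e-value is column 3 and the scientific name is column 8. The 1e-10 cutoff is just an example value.

```shell
hits=$(mktemp)
# Made-up rows: qseqid sseqid evalue bitscore sgi sacc staxids sscinames scomnames stitle
printf 'q1\ts1\t1e-50\t200\t0\tACC_TOY1\t9606\tHomo sapiens\thuman\ttoy hit 1\n' > "$hits"
printf 'q2\ts2\t1e-20\t90\t0\tACC_TOY2\t9606\tHomo sapiens\thuman\ttoy hit 2\n' >> "$hits"
printf 'q3\ts3\t0.5\t30\t0\tACC_TOY3\t10090\tMus musculus\thouse mouse\ttoy hit 3\n' >> "$hits"

# Queries with a hit below the e-value cutoff ("+ 0" forces a numeric comparison).
strong=$(awk -F '\t' '$3 + 0 < 1e-10 { print $1 }' "$hits" | tr '\n' ' ')

# Number of hits per scientific name.
per_species=$(awk -F '\t' '{ n[$8]++ } END { for (s in n) print n[s] "\t" s }' "$hits" | sort)

echo "strong hits: $strong"
echo "$per_species"
rm -f "$hits"
```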
My analysis interferes with another analysis currently running (e.g. for RAM). Is there a way to pause this analysis, release its used memory and restart it when the other analysis is finished?
If you are running your analysis inside a screen, you can reattach to it, pause it with Ctrl+Z, and then detach. To restart it: reattach to the screen, resume your analysis by typing fg, and then detach again. This should work and eventually release memory in most cases. If not, some programs save checkpoints, so you can kill the job and then restart from the last saved checkpoint.
I would like to use a program with graphical user interface (IGV, IGB, Artemis, PathwayTools, ...), can I use the server for it?
Yes, you can, but I cannot guarantee that it will be fast enough for serious work, because it can be painfully slow. The only way to find out is to try it and see if it limits you in any way. Since these programs are usually really easy to install and not memory/CPU demanding, why not just use your laptop?
If you have sudo rights, using CPAN is extremely easy. Simply type:
Then specify which module you need to install, e.g.:
There are many ways for a non-sudo user to install modules just for themselves.
The simplest solution is to set the $PERL5LIB environment variable at the end of your .bashrc file like this:
echo 'export PERL5LIB=/home/yourname/my_perl_modules' >> ~/.bashrc
Then double-check that it got set by printing its content:
If you have sudo rights, type the commands below.
To install R packages:
To install Bioconductor modules:
If you have sudo rights, type one of the two following commands:
sudo pip install modulename
sudo easy_install modulename
To switch between available Java versions [for sudoers only], type the command below.
sudo update-alternatives --config java
Bam and sam files are usually enormous: keep only one of them, use --no-unal in bowtie2, pipes in samtools, and other ways to keep disk space usage low. Do not repeatedly copy and paste huge raw files from data folders; give programs the paths to the original files when using them for assemblies or mapping. If you trim and error-correct your files prior to assemblies, do it only once and keep the corrected files! If you do not use a file, compress it, especially if it's a fastq or sam file. Many programs can use gzipped files directly.
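Even when a program cannot read gzipped files itself, a pipe lets you work on a compressed file without ever writing the uncompressed copy to disk. A sketch with a made-up two-read fastq file, assuming the standard four lines per read:

```shell
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$workdir/toy.fastq"

# Compress the file; only toy.fastq.gz stays on disk.
gzip "$workdir/toy.fastq"

# Decompress to standard output and count reads on the fly
# (every 2nd line of each 4-line record is a sequence).
n_reads=$(gzip -dc "$workdir/toy.fastq.gz" | awk 'NR % 4 == 2' | wc -l)
echo "reads: $n_reads"

leftover=$(ls "$workdir")
rm -rf "$workdir"
```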
To get human readable info for files/directories in your current folder, type:
du -sh *
To find all your files bigger than 10 GB in your home folder, type:
find ~ -size +10G
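You can try both commands on a small scale first. This sketch uses a 3 MB dummy file and a 1 MB threshold instead of 10 GB, and assumes GNU find and dd as found on Linux servers; the file names are made up.

```shell
workdir=$(mktemp -d)
# One 3 MB file full of zeros and one tiny text file.
dd if=/dev/zero of="$workdir/big.dat" bs=1M count=3 2>/dev/null
printf 'tiny\n' > "$workdir/small.txt"

# Human readable size of each item in the folder.
du -sh "$workdir"/*

# Files bigger than 1 MB. Sizes are rounded up to whole units,
# so the tiny file counts as 1M and is not "more than 1M".
big_files=$(find "$workdir" -type f -size +1M)
echo "$big_files"

rm -rf "$workdir"
```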
My text files (fasta, phylip, nexus, ...) work when using my Windows (or an old Mac) machine, but they don't work when uploaded to the Linux server. What's wrong?
Characters used to define line breaks in text files differ between operating systems and most programs cannot deal with it [http://en.wikipedia.org/wiki/Newline].
Windows systems use a combination of a carriage return (CR) and a line feed (LF) mostly because of historic printer-compatibility reasons.
All Unix systems use line feed (LF) only. Old Macs used to use carriage return (CR) only, but newer Macs use the same line break (\n) as in Linux. Just in case you have some old text files from Macs, the mac2unix utility is also installed.
To figure out the origin of your text file, type:
To convert these line ends, type one of the commands below (it's pretty self-explanatory):
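If you want to see the problem with your own eyes, here is a sketch on a made-up Windows-style fasta file. Dedicated converters such as dos2unix (and the mac2unix mentioned above) are the comfortable way; in this example I swap in plain tr, which needs no extra installation and does the same job for Windows files.

```shell
workdir=$(mktemp -d)
# A toy file with Windows (CR+LF) line ends.
printf '>seq1\r\nACGT\r\n' > "$workdir/windows.fasta"

# "file" reports CRLF line terminators for such files (skipped if not installed).
command -v file >/dev/null && file "$workdir/windows.fasta"

# Delete every carriage return, leaving Unix (LF) line ends.
tr -d '\r' < "$workdir/windows.fasta" > "$workdir/unix.fasta"

# Count lines that still contain a carriage return.
cr=$(printf '\r')
cr_before=$(grep -c "$cr" "$workdir/windows.fasta")
cr_after=$(grep -c "$cr" "$workdir/unix.fasta")
echo "lines with CR before: $cr_before, after: $cr_after"

rm -rf "$workdir"
```

Note that this tr trick is not right for old Mac files (CR only), because deleting the CRs would glue all lines together.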
- this is in my opinion one of the most important things to learn for computing in biology
- just google "regex cheat sheet", there are tons of tutorials and cheat sheets available
- if you often need to extract and modify text strings in huge files (and Excel is slow or runs out of memory), these expressions can do the same thing and are really snappy
- once you manage the basic ones, you can use them in grep, sed, awk, perl, python, ... you name it
- be careful, though, and always test them properly with toy data sets as they can get pretty funky and idiosyncratic (not only for newbies...)
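To give you a taste, here is a sketch with grep and sed on a toy fasta file. The header format (gene name, a pipe, species name) is made up for the example; sed -E switches on extended regular expressions.

```shell
workdir=$(mktemp -d)
printf '>gene1|Homo_sapiens\nACGT\n>gene2|Mus_musculus\nGGCC\n' > "$workdir/toy.fasta"

# ^> matches lines starting with ">", i.e. the fasta headers.
n_seqs=$(grep -c '^>' "$workdir/toy.fasta")

# Capture everything between ">" and the first "|" and keep only that.
ids=$(grep '^>' "$workdir/toy.fasta" | sed -E 's/^>([^|]+)\|.*/\1/')

# Two capture groups, swapped in the replacement: >species_gene
renamed=$(sed -E 's/^>([^|]+)\|(.*)$/>\2_\1/' "$workdir/toy.fasta" | grep '^>')

echo "sequences: $n_seqs"
echo "$ids"
echo "$renamed"
rm -rf "$workdir"
```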
Ctrl+R -- to search in your bash history (all your previous commands)
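Ctrl+C -- to kill the process running in your terminal (Ctrl+D sends end-of-input instead, which ends programs waiting for typed input and logs you out of an idle shell)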
Tab -- to autocomplete commands/directories
Up and down arrows -- to show recently used commands
To open the vim editor and start practicing, type:
To open very simple command line text editor:
To print a text file:
To print a text file so that you can scroll down:
To download a file from an internet address:
To search for files:
To print the first 10 lines of a file:
To print the last 10 lines of a file:
To join lines of two files on a common field:
join filename1 filename2
To split a file into pieces:
To remove sections from each line of files:
To merge lines of files:
To sort lines of text files:
To translate or delete characters:
To report or omit repeated lines:
To transfer a URL:
A general purpose distributed information browser for the World Wide Web
To display a line of text:
To format and print data:
To print lines/words matching a pattern:
To filter and transform text:
To read from standard input and write to standard output and files:
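Several of these tools are most useful when chained with pipes. Here is a sketch on two made-up tab-separated tables, assuming the commands meant above are the standard head, tail, sort, uniq, cut, tr, join and tee. The one thing that regularly bites people is that join expects both files to be sorted on the joined field.

```shell
export LC_ALL=C              # same sort order for sort and join
workdir=$(mktemp -d)
cd "$workdir"
printf 'geneB\t5\ngeneA\t12\ngeneC\t7\ngeneA\t12\n' > counts.tsv
printf 'geneA\tkinase\ngeneB\ttransporter\ngeneC\tunknown\n' > annot.tsv

head -n 2 counts.tsv         # first 2 lines
tail -n 1 counts.tsv         # last line

# Sort the lines and drop the repeated one.
sort counts.tsv | uniq > counts.sorted.tsv

# Take the first column and turn line breaks into spaces.
genes=$(cut -f 1 counts.sorted.tsv | tr '\n' ' ')
echo "$genes"

# Join the two tables on their first field, using tab as the separator.
tab=$(printf '\t')
joined=$(join -t "$tab" counts.sorted.tsv annot.tsv)

# tee prints to the screen and writes the same lines to a file.
echo "$joined" | tee joined.tsv
n_lines=$(wc -l < joined.tsv)

cd /
rm -rf "$workdir"
```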