Here is a great source for understanding the Expectation Maximization (EM) algorithm and Hidden Markov Models (HMMs). It also explains how to apply HMMs in bioinformatics.

1st, 2nd, and 3rd Generation Genome Sequencing Technologies


Oxford Nanopore Technologies


Most popular software tools and technologies in bioinformatics

C, C++, C#, Perl, XML, Java, Python, R, SQL, CUDA, MATLAB, Octave, spreadsheet applications, etc.


Bioinformatics news in Nature.

The elements of bioinformatics

This is Eagle's Elements of Bioinformatics, also known as the Periodic Table of Bioinformatics. The table lists many bioinformatics tools by category, licence, and year of release. The online version is searchable, and clicking a tool's symbol shows more information about it.


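Some useful UNIX commands commonly used in bioinformatics

Use this command when a BED file must be sorted by chrom and then by chromStart. The first key, -k1.4,1.5n, compares characters 4 to 5 of the first field numerically, so it assumes chromosome names of the form chrN; the second key sorts numerically by chromStart.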

sort -k1.4,1.5n -k2,2n file > sorted_chrom_file
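The same ordering can be sketched in Python for files that also contain names such as chrX, which have no numeric part for the sort key above to compare. This is only a minimal sketch, assuming tab-separated BED lines; the function names bed_sort_key and sort_bed_lines are made up for illustration, and placing non-numeric chromosomes after the numbered ones is a choice of this sketch, not the behaviour of the sort command.

```python
def bed_sort_key(line):
    """Build a sort key (group, chromosome number, chromosome name, chromStart)."""
    fields = line.rstrip("\n").split("\t")
    chrom, start = fields[0], int(fields[1])
    # Strip the "chr" prefix if present (assumption: UCSC-style names).
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "", start)   # numbered chromosomes first
    return (1, 0, name, start)             # then chrX, chrY, chrM, ... by name


def sort_bed_lines(lines):
    """Return the BED records sorted by chrom, then chromStart."""
    # Skip blank lines and header/comment lines before sorting.
    records = [
        line for line in lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
    return sorted(records, key=bed_sort_key)


if __name__ == "__main__":
    example = [
        "chr10\t5\t9\n",
        "chr2\t300\t400\n",
        "chr2\t100\t200\n",
        "chrX\t1\t2\n",
        "chr1\t50\t60\n",
    ]
    for record in sort_bed_lines(example):
        print(record, end="")
```

To sort a real file, pass an open file object to sort_bed_lines and write the returned lines to a new file.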

Tutorials for setting up various bioinformatics tools and solving common problems

1. Bedtools installation (UNIX)

curl<version>.tar.gz > BEDTools.tar.gz
tar -zxvf BEDTools.tar.gz
cd BEDTools
make clean
make all
sudo cp bin/* /usr/local/bin/



2. Bedtools installation (Mac OS)

First option:
Go to the following webpage and install Homebrew.
Go to terminal and type this command: ruby -e "$(curl -fsSL"
Tap the Homebrew-science formulae with this command: brew tap homebrew/science
Install bedtools: brew install bedtools
This first option sometimes fails; if it does, try the second option.

Second option:
Download the bedtools source archive here: Bedtools
Unzip the downloaded file and change to its directory: cd /path/bedtools2-master
Type this command: make
Type this command: sudo cp ./bin/* /usr/local/bin
Check the installation by running: bedtools

3. Samtools installation (UNIX)

Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from and exports to the SAM (Sequence Alignment/Map) format, performs sorting, merging and indexing, and allows reads in any region to be retrieved swiftly.

Samtools is designed to work on a stream. It regards an input file ‘-’ as the standard input (stdin) and an output file ‘-’ as the standard output (stdout). Several commands can thus be combined with Unix pipes. Samtools always outputs warning and error messages to the standard error output (stderr).
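The ‘-’ convention can be illustrated by chaining two samtools commands from Python. This is a hedged sketch rather than a recommended workflow: it assumes samtools 1.x command syntax (view -b, sort -o), the file names in.sam and sorted.bam are placeholders, and the helper names pipeline_commands and run_pipeline are invented here.

```python
import shutil
import subprocess


def pipeline_commands(sam_path, out_bam):
    """Build the two commands; the trailing '-' makes sort read from stdin."""
    view_cmd = ["samtools", "view", "-b", sam_path]       # BAM written to stdout
    sort_cmd = ["samtools", "sort", "-o", out_bam, "-"]   # BAM read from stdin
    return view_cmd, sort_cmd


def run_pipeline(sam_path, out_bam):
    """Run 'samtools view | samtools sort' connected by a pipe."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    view_cmd, sort_cmd = pipeline_commands(sam_path, out_bam)
    view = subprocess.Popen(view_cmd, stdout=subprocess.PIPE)
    sort = subprocess.Popen(sort_cmd, stdin=view.stdout)
    view.stdout.close()  # so that view is signalled if sort exits early
    sort_rc = sort.wait()
    view_rc = view.wait()
    if view_rc != 0 or sort_rc != 0:
        raise RuntimeError("samtools pipeline failed")


if __name__ == "__main__":
    # Placeholder file names; replace them with real paths before running.
    print(pipeline_commands("in.sam", "sorted.bam"))
```

Warnings and errors from both commands still go to stderr, so they appear on the terminal instead of being mixed into the piped data.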

Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server if the BAM file name starts with ‘ftp://’ or ‘http://’. Samtools checks the current working directory for the index file and will download the index upon absence. Samtools does not retrieve the entire alignment file unless it is asked to do so.


3.1. Pileup Format

The pileup format describes the base-pair information at each chromosomal position. This format facilitates SNP/indel calling and brief alignment viewing by eye. The pileup format has several variants.

The default output by SAMtools looks like this:

seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&

seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+

seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6

seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<

seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<

seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<

seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<

seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<

where each line consists of the chromosome, the 1-based coordinate, the reference base, the number of reads covering the site, the read bases and the base qualities. In the read base column, a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand, `ACGTN' for a mismatch on the forward strand and `acgtn' for a mismatch on the reverse strand.

A pattern `\+[0-9]+[ACGTNacgtn]+' indicates an insertion between this reference position and the next reference position. The length of the insertion is given by the integer in the pattern, followed by the inserted sequence. Here is an example of 2bp insertions on three reads:

seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<

Similarly, a pattern `-[0-9]+[ACGTNacgtn]+' represents a deletion from the reference. Here is an example of a 4bp deletion from the reference, supported by two reads:

seq3 200 A 20 ,,,,,..,.-4CACC.-4CACC....,.,,.^~. ==<<<<<<<<<<<::<;2<<

Also in the read base column, a symbol `^' marks the start of a read segment, which is a contiguous subsequence on the read separated by `N/S/H' CIGAR operations. The ASCII code of the character following `^' minus 33 gives the mapping quality. A symbol `$' marks the end of a read segment. Start and end markers of a read are largely inspired by Phil Green's CALF format. These markers make it possible to reconstruct the read sequences from pileup.

SAMtools can optionally append mapping qualities to each line of the output. This makes the output much larger, but is necessary when a subset of sites is selected.
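The rules for the read base column can be sketched as a small Python parser. This is an illustrative sketch covering only the symbols described above; the names parse_read_bases and parse_pileup_line and the dictionary keys are invented here and are not part of SAMtools. Any other character is skipped.

```python
def parse_read_bases(bases):
    """Summarise the read base column of one pileup line."""
    summary = {
        "forward_matches": 0,      # '.'
        "reverse_matches": 0,      # ','
        "mismatches": [],          # ACGTN / acgtn
        "insertions": [],          # sequences after +<n>
        "deletions": [],           # sequences after -<n>
        "mapping_qualities": [],   # ASCII of the char after '^', minus 33
        "read_ends": 0,            # '$'
    }
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # The next character encodes the mapping quality, not a base.
            summary["mapping_qualities"].append(ord(bases[i + 1]) - 33)
            i += 2
        elif c == "$":
            summary["read_ends"] += 1
            i += 1
        elif c in "+-":
            # Read the integer length, then exactly that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            key = "insertions" if c == "+" else "deletions"
            summary[key].append(bases[j:j + length])
            i = j + length
        else:
            if c == ".":
                summary["forward_matches"] += 1
            elif c == ",":
                summary["reverse_matches"] += 1
            elif c in "ACGTNacgtn":
                summary["mismatches"].append(c)
            i += 1  # symbols not described above are skipped
    return summary


def parse_pileup_line(line):
    """Split one default-format pileup line into its six columns."""
    chrom, pos, ref, depth, bases, quals = line.split()[:6]
    return chrom, int(pos), ref, int(depth), parse_read_bases(bases), quals


if __name__ == "__main__":
    line = "seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<"
    print(parse_pileup_line(line))
```

On the insertion example, the matches plus mismatches add up to the depth of 11, which is a quick consistency check on the parsing.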




4. CONTRA installation (UNIX)

Download the CONTRA tarball and decompress it with the following command:

tar -xvzf CONTRA.<version>.tar.gz


To run CONTRA, you need the following programs:

  • Python 2.6+
Most of the CONTRA scripts are written in Python. Version 2.6 or later is required in order to use the multiprocessing module. Python Website


  • R (all versions of R are now supported)
R Website


  • BEDTools (Included in CONTRA package. See Installation Guide)
The original BEDTools source code can be found here: BEDtools source


  • SAMtools
SAMtools source


  • [Optional] DNACopy (R library used for predicting large CNVs)
DNACopy Website
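Before running CONTRA, it can help to check that the programs listed above are available. The following is a minimal sketch, assuming the executables are installed under the names python, R, bedtools and samtools; the function name missing_programs is made up for illustration, and the check says nothing about versions or about the optional DNACopy library.

```python
import shutil

# Assumed executable names for the required programs listed above.
REQUIRED_PROGRAMS = ["python", "R", "bedtools", "samtools"]


def missing_programs(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs(REQUIRED_PROGRAMS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required programs were found on PATH.")
```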


You can test CONTRA with the test files that its authors provide. A reference genome is also needed.

Example: We assume we have a target file target_test.BED and four BAM files that we want to turn into a baseline file: test1.BAM, test2.BAM, test3.BAM and test4.BAM. Our intended output folder is ~/Baseline/sampleBaseline/, and the final baseline file name is baseline_test.

To run the baseline script on this sample, the command line is:

python --target target_test.BED --files test1.BAM test2.BAM test3.BAM test4.BAM --output ~/Baseline/sampleBaseline/ --name baseline_test


Problems & Solutions


Error: The requested bed file (/home/usario/Base2/buf/CNATable.10rd.10bases.20bins.BED) could not be opened. Exiting!


Differences between bedtools versions can cause problems when running CONTRA. It is therefore advisable to install the bedtools version that is bundled with CONTRA.


How to calculate read depth with bedtools


DNA Hybridization


Bioinformatics Algorithms Course


How to convert bed format to fasta format


How to install zlib