Below is a list of programs and software packages frequently used at IEG. They are used for sequence comparison and alignment, oligonucleotide probe design, statistical analysis, processing of amplicon and shotgun sequencing data, primer design, phylogenetic analysis, and other tasks.
CommOligo: CommOligo can be used to design optimized oligonucleotides for microarray construction. The program designs single or multiple oligonucleotide probes (8-128mers) for individual sequences/genes, using a global alignment algorithm to calculate values for sequence identity, the length of maximum continuous stretches, and free energy. An accessory software tool, Commoligo_PE, can be used to select appropriate criteria if sufficient experimental data are available.
Please cite: Li X, He Z and Zhou J. 2005. Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation. Nucl Acids Res, 33: 6114-6123.
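Two of the criteria named above, sequence identity and the length of the maximum continuous stretch, can be illustrated with a minimal Python sketch. This is not CommOligo's implementation: it assumes the probe and the non-target sequence have already been aligned to the same length, it omits the free-energy criterion, and the function names and threshold defaults are hypothetical placeholders rather than recommended values.

```python
def _check_aligned(probe: str, other: str) -> None:
    """Both sequences must be non-empty and pre-aligned to the same length."""
    if not probe or len(probe) != len(other):
        raise ValueError("sequences must be non-empty and of equal length")


def percent_identity(probe: str, other: str) -> float:
    """Percentage of aligned positions that match (gap characters never match)."""
    _check_aligned(probe, other)
    matches = sum(
        a == b and a != "-" for a, b in zip(probe.upper(), other.upper())
    )
    return 100.0 * matches / len(probe)


def longest_match_stretch(probe: str, other: str) -> int:
    """Length of the longest run of consecutive matching positions."""
    _check_aligned(probe, other)
    best = run = 0
    for a, b in zip(probe.upper(), other.upper()):
        run = run + 1 if (a == b and a != "-") else 0
        best = max(best, run)
    return best


def is_specific(probe: str, non_target: str,
                max_identity: float = 85.0, max_stretch: int = 15) -> bool:
    """True if the probe stays within both (hypothetical) thresholds."""
    return (percent_identity(probe, non_target) <= max_identity
            and longest_match_stretch(probe, non_target) <= max_stretch)
```

For example, `percent_identity("ACGTACGTAC", "ACGTTCGTAC")` compares ten aligned positions with a single mismatch, so a probe like this would fail an 85% identity cutoff against that non-target.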
CommOligo2.0: This updated version of CommOligo is able to select group-specific oligonucleotide probes for a group of highly homologous sequences. It was specifically designed to create probes for functional genes with high similarities and is very useful for the construction of functional gene microarrays.
BLAST+: This is a standalone version of BLAST that can be installed on your PC and run in MS-DOS (command prompt) mode, as well as on other platforms, including general Linux and Unix systems. Instructions and options can be found on the NCBI website. It offers search, database and filtering applications.
Clustal: The Clustal programs for multiple alignment of nucleic acid and protein sequences are available with command-line or graphical interfaces and can be installed on your computer or run online. Clustal W/X is the "classic Clustal". It provides an integrated environment for performing multiple sequence and profile alignments and analyzing alignment results. Clustal Omega is the newest version and offers increased scalability, so that hundreds of thousands of sequences can be aligned quickly (within a few hours). Clustal is available for a number of platforms, including Windows, Linux, FreeBSD and Mac.
Primer3: Primer3 is a free program commonly used for designing PCR primers. It can design primers for multiple sequences in batch mode from the command line. It can also select primers for a single sequence using a web-based interface. The program can be run on Windows, Linux/Unix, and other systems. Primer3 is free and offered on an "as-is", use-at-your-own-risk basis.
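As a rough illustration of the database and search applications, the Python sketch below assembles command lines for `makeblastdb` and `blastn`. The file names are hypothetical, the e-value cutoff is only an example, and the sketch just builds and prints the commands rather than running them; the NCBI documentation remains the authority on available options.

```python
import shlex


def makeblastdb_cmd(fasta: str, db_name: str, dbtype: str = "nucl") -> list:
    """Command to build a BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", db_name]


def blastn_cmd(query: str, db_name: str, out_path: str,
               evalue: float = 1e-5) -> list:
    """Command for a nucleotide search with tabular output (-outfmt 6)."""
    return ["blastn", "-query", query, "-db", db_name, "-out", out_path,
            "-outfmt", "6", "-evalue", str(evalue)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in (makeblastdb_cmd("genes.fasta", "genes_db"),
                blastn_cmd("probes.fasta", "genes_db", "hits.tsv")):
        print(shlex.join(cmd))
```

Once BLAST+ is installed and on the PATH, each list could be passed to `subprocess.run(cmd, check=True)` to execute it.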
Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG (2012) Primer3 - new capabilities and interfaces. Nucleic Acids Research 40(15):e115
Koressaar T, Remm M (2007) Enhancements and modifications of primer design program Primer3. Bioinformatics 23(10):1289-91
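For batch-mode use, Primer3's command-line program reads records in its Boulder-IO format: one `TAG=VALUE` pair per line, with each record closed by a line containing only `=`. The Python sketch below writes such records for several sequences. The helper names, sequence IDs, and the product size range are hypothetical examples; only a few common tags are shown, and the Primer3 manual should be consulted for the full set.

```python
def boulder_record(seq_id: str, template: str,
                   product_size: str = "100-300", num_return: int = 1) -> str:
    """One Boulder-IO record: TAG=VALUE lines terminated by a lone '='."""
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"PRIMER_PRODUCT_SIZE_RANGE={product_size}",
        f"PRIMER_NUM_RETURN={num_return}",
        "=",
    ]
    return "\n".join(lines) + "\n"


def boulder_batch(sequences: dict) -> str:
    """Concatenate one record per (id, template) pair for batch processing."""
    return "".join(boulder_record(sid, seq) for sid, seq in sequences.items())


if __name__ == "__main__":
    # Hypothetical, far-too-short templates used only to show the layout.
    print(boulder_batch({"geneA": "ACGTACGTAC", "geneB": "TTGACCGGTA"}), end="")
```

The resulting text can be saved to a file and supplied to `primer3_core` on standard input.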
MEGA: Molecular Evolutionary Genetics Analysis is an integrated software package for multisequence alignment, phylogenetic tree construction, and evolutionary analysis (e.g., ancestral sequence, clock test, time tree, disease mutation). The newest version (v. 7) can be downloaded from here. An online manual is available.
MEGA is available free of charge for Windows, Mac OS X, and Linux.
Kumar S, G Stecher, and K Tamura. 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. doi:10.1093/molbev/msw054
PAML: Phylogenetic Analysis by Maximum Likelihood is a software package for the evolutionary analysis of phylogenetic trees using maximum likelihood methods. Possible analyses include tests of the molecular clock hypothesis, rate heterogeneity along lineages and among sites, ancestral sequence prediction, and simulations. Nucleotide, amino acid and codon-based models are all supported by the software. PAML can be downloaded from here. The following platforms are supported: Windows, Mac OS X, and Linux.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555-556.
Yang, Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586-1591.
BioPerl: BioPerl is a community effort to construct a set of standardized Perl modules designed to simplify common bioinformatics analyses. Tasks that can be carried out using BioPerl include: report parsing (Blast, HMMer, etc.); manipulation of sequence files (translation, format interconversion, etc.); sequence alignments (identify mismatches, multisequence alignment, etc.) and phylogenetic trees; evolution and population genetics analyses (PAML, pairwise statistics, Ka/Ks calculation, etc.); feature annotation. Additional information can be found here.
BioPerl requires Perl
The BioPerl package can be downloaded from here and is available for Windows, Mac OS X, and Linux.
Artemis/ACT: Artemis and the Artemis Comparison Tool (ACT) are available from the Wellcome Trust Sanger Institute. Artemis is a free genome viewer and analysis tool that can accept sequence files in a variety of formats, including annotated GenBank and EMBL formats. ACT is a DNA sequence comparison tool based on Artemis that is useful in comparative genomics analyses. Both Artemis and ACT are written in Java and may be run on Windows, Mac OS X, and Linux.
Carver T, Harris SR, Berriman M, Parkhill J and McQuillan JA. 2012. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics. 28(4):464-9.
Carver T, Berriman M, Tivey A, Patel C, Böhme U et al. 2008. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 24(23):2672-6.
PyMOL: PyMOL is an open source molecular visualization program useful for creating publication-quality protein structure images. The software can be downloaded from here; a subscription version is also available that comes with maintenance and support. The software is available for Windows, Mac OS X, and Linux.
“If you torture the data long enough, Nature will confess.”
Ronald H. Coase, 1991 Nobel Laureate in Economics.
Learning statistics and knowing how to use different methods cannot and should not be separated. Often more than one method can provide answers, but not all are necessarily legitimate for the situation. Almost every statistical analysis has options for improving and optimizing performance. Statistical methods should be selected not on the basis of the results they give but by a careful assessment of their assumptions. An analysis run with default settings, without determining whether the data meet the method’s assumptions, should not be trusted. In other words, analysis results cannot be interpreted without understanding the specific statistical method used. Specific statistical procedures can be learned easily by anyone with a sound foundation in statistics. Thus, it is always important to learn fundamental statistics before torturing the data to provide the answers one would like to see.
Some references for learning more about statistics and specific methods are listed below.
- van Emden, H. 2008. Statistics for Terrified Biologists. Oxford: Blackwell Publishing. - This book provides an overview of basic statistical concepts in plain English, focusing on the foundations of parametric statistics.
- http://ordination.okstate.edu/overview.htm - An excellent overview of ordination methods.
- SpringOpen Books has a variety of statistics-related titles. These are a few related to life sciences.
The R Package for multidimensional and spatial analysis
The R package is a group of programs with multiple modules for different multidimensional and spatial analysis procedures. A coauthor of the R package, Pierre Legendre, is also a coauthor of Numerical Ecology, the renowned “green book” of quantitative ecology. All of the modules implemented in version 4 of the R package are introduced in its manual and discussed in depth in the green book. One drawback of the R package in this Bill Gates-dominated world is that there is currently no Windows version. The R package is free and currently runs in the Classic environment of Mac OS X.
The R statistical computing environment, which is distinct from the R package described above, can be downloaded at https://www.r-project.org/, and RStudio can be downloaded at https://www.rstudio.com/.
SAS (SAS Institute, Cary, NC)
SAS is a statistical software package used for general statistical analysis. The current version is 9.4. SAS has many modules, such as SAS/STAT for statistical analysis, SAS/INSIGHT for exploratory data analysis, and SAS/IML for matrix programming. SAS is a very powerful program, but it is not very intuitive, since SAS/STAT is driven by written programs in which users must specify the location of the input file, the data format, the procedures to run, and the output.
Applied Statistics and the SAS Programming Language (website link) by Ronald P. Cody & Jeffrey K. Smith (Prentice Hall) provides an overview of the SAS program and detailed, practical coverage of SAS commands, data output, data interpretation and reporting results.
The University of Oklahoma provides SAS free for faculty, staff and students (Link to IT Software Catalog Entry).
GSLIB (Geostatistical Software LIBrary)
GSLIB is a collection of geostatistical software developed at Stanford University. GSLIB has numerous modules that run in the DOS command environment. The original version was written in Fortran 77 and the current version (v. 2.907) is written in Fortran 90. The software's lack of user-friendliness has been improved somewhat by a commercially available GUI, WinGslib (available for free: website link).
GSLIB, Geostatistical Software Library and User’s Guide by Clayton V. Deutsch & André G. Journel (Oxford Univ. Press) provides instructions on using this program.
Other available packages
General packages with a user-friendly GUI: SPSS & SYSTAT
Packages geared toward ecological analysis: CANOCO & PC-ORD
Packages specialized in community indices: EstimateS (Robert Colwell) & SPADE (Anne Chao)
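To give a sense of what "community indices" means in practice, the Python sketch below computes two widely used ones from a vector of per-taxon counts: the Shannon diversity index and the bias-corrected Chao1 richness estimator. It is a minimal illustration, not the implementation used by EstimateS or SPADE, and the function names are hypothetical.

```python
import math


def shannon_index(counts) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Hypothetical abundance vector: one entry per taxon.
    sample = [10, 5, 2, 2, 1, 1, 1, 0]
    print("Shannon:", shannon_index(sample))
    print("Chao1:", chao1(sample))
```

Chao1 adds to the observed richness an estimate of unseen taxa based on how many taxa were seen only once or twice, so samples with many singletons yield estimates well above the observed count.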