This page last changed on Mar 06, 2007 by maising.

The data analysis pipeline for the Illumina 1G analyzer consists of several modules that together perform the complete data analysis of a run. This includes analysis of the images captured by the instruments, remapping of cluster positions, base-calling and, optionally, sequence alignment, along with various statistics and graphical displays. The images, other instrument logs and all output produced by the analysis pipeline are stored in flat text files in a hierarchical folder structure called the "run folder". This structure is described in detail by the Run-Folder specification.

The installation of the pipeline is discussed in Pipeline installation. If you are unsure how to set up your IT infrastructure and analysis computer(s) for the pipeline, you may want to read Pipeline IT requirements.

An introduction to the usage of the pipeline is given in Pipeline usage. More detailed documentation for pipeline modules is linked below. There is also the Pipeline QuickStart Guide.

The pipeline is split into different modules. Each module is a collection of Perl and Python scripts, with C++ code for performance-critical operations. The different pieces of software are coordinated by the "make" utility, which is commonly used to manage the compilation of software projects. Each module has its own Makefile for its analysis tasks. Scripts autogenerate the Makefiles, analogous to what the configure mechanism does for a software project.
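To illustrate the idea of autogenerating a Makefile for a chain of analysis steps, here is a minimal Python sketch. It is not the pipeline's own generator: the function `render_makefile`, the target names and the `touch` commands are all hypothetical and only show how step dependencies can be turned into make rules.

```python
from typing import Dict, List


def render_makefile(stages: Dict[str, List[str]], commands: Dict[str, str]) -> str:
    """Render a minimal Makefile in which each analysis step is a target.

    stages maps a target name to the targets it depends on;
    commands maps a target name to the shell command that builds it.
    All names are hypothetical and only illustrate the idea.
    """
    # Default target: build every step.
    lines = ["all: " + " ".join(stages)]
    for target, deps in stages.items():
        lines.append("")
        lines.append(f"{target}: {' '.join(deps)}".rstrip())
        # make requires recipe lines to start with a tab character.
        lines.append("\t" + commands[target])
    return "\n".join(lines) + "\n"


# Hypothetical three-step analysis: each step just writes a marker file.
EXAMPLE_STAGES = {
    "images.done": [],
    "basecalls.done": ["images.done"],
    "alignment.done": ["basecalls.done"],
}
EXAMPLE_COMMANDS = {name: f"touch {name}" for name in EXAMPLE_STAGES}

if __name__ == "__main__":
    print(render_makefile(EXAMPLE_STAGES, EXAMPLE_COMMANDS))
```

Because each step is expressed as a make target with explicit dependencies, make can skip steps whose output already exists and resume an interrupted analysis from the first missing target.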

The main modules are currently:

  1. The image analysis module (called Firecrest).
  2. The base-caller (called Bustard).
  3. The sequence alignment. Two alternative modules are available:
    1. PhageAlign performs an exhaustive alignment (all possible alignments up to arbitrary edit distances are explored), but is slow.
    2. Eland is very fast and finds alignments with up to two errors relative to a reference.
  4. Allele caller (not part of the pipeline yet) and other sequence and alignment utilities.
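To make "up to two errors from a reference" concrete, the following Python sketch scans a reference for ungapped placements of a read with a bounded number of mismatches. It is a naive illustration only and not Eland's actual algorithm, which is far faster. The function name `mismatch_hits` is hypothetical, and it assumes that "errors" means base substitutions.

```python
from typing import List, Tuple


def mismatch_hits(read: str, reference: str,
                  max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for every ungapped placement of read
    on reference that has at most max_mismatches substitutions."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, reference[start:start + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many errors, abandon this position early
        else:
            # The inner loop finished without exceeding the error budget.
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    print(mismatch_hits("ACGT", "TTACGTTT"))
```

An exhaustive aligner in the style described for PhageAlign would instead explore alignments at larger edit distances, which explains the difference in speed between the two approaches.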

Two different modules are used to generate the relevant Makefiles:

  1. Goat generates Makefiles for image analysis, alignment and base-calling (Firecrest and Bustard). See Goat documentation - Firecrest, Bustard. Release notes can be found in Pipeline releases.
  2. Gerald generates Makefiles for sequence alignment (PhageAlign) and the other downstream analysis tools. See GERALD User Guide and FAQ. The use of ELAND is explained in Whole genome alignments using ELAND.

The two generators are developed largely independently and differ somewhat in syntax and usage. However, Gerald can be started automatically from Goat, so that a single invocation is sufficient to generate a complete analysis run, which can then be executed using the "make" command.
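Once the Makefiles exist, executing the run amounts to calling make in the generated analysis directory. The Python sketch below wraps that call. The helper names `make_command` and `run_analysis` and the directory name are hypothetical. The options used (`-C` to change directory, `-n` for a dry run, `-j` for parallel jobs) are standard GNU make options, not pipeline-specific ones, and whether a given pipeline Makefile is safe to run with `-j` is an assumption to check against the pipeline documentation.

```python
import subprocess
from typing import List, Optional


def make_command(analysis_dir: str, jobs: int = 1, dry_run: bool = False,
                 target: Optional[str] = None) -> List[str]:
    """Build the argument list for running make in analysis_dir."""
    cmd = ["make", "-C", analysis_dir]
    if jobs > 1:
        cmd += ["-j", str(jobs)]   # run up to `jobs` recipes in parallel
    if dry_run:
        cmd.append("-n")           # print the commands without running them
    if target:
        cmd.append(target)         # build only this target
    return cmd


def run_analysis(analysis_dir: str, **options) -> int:
    """Run make and return its exit status (0 means success)."""
    return subprocess.run(make_command(analysis_dir, **options)).returncode


if __name__ == "__main__":
    # Hypothetical directory name; show the command rather than running it.
    print(" ".join(make_command("example_analysis_dir", dry_run=True)))
```

A dry run is a cheap way to inspect which steps make would execute before committing to a long analysis.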

An overview of the output produced by the pipeline is given in Pipeline output and visualisation. If you are interested in more details, the folder structure and many of the primary output files are partly documented in the Run-Folder specification. To understand Gerald output, it may also be useful to look at What do the different files in an analysis directory mean.

Parallelisation of the pipeline and methods to decrease analysis turn-around time are discussed in Pipeline parallelisation.

Some frequently asked questions are discussed in Pipeline FAQ.

If you are interested in alignment scores or base-call calibration, Alignment Scoring Guide and FAQ and Base-call calibration may be useful.

Document generated by Confluence on Mar 09, 2007 16:11