This page last changed on Aug 22, 2007 by maising.

The following document describes the installation of the Genome Analyzer data analysis pipeline.

Platforms

The pipeline is developed and run on Linux, which is the only platform we officially support. It should also work on any Unix variant on which the prerequisites below are available, but this has not been tested and is not supported by Illumina. We may not be able to fix issues that you encounter on any platform other than Linux.

Prerequisites

The following prerequisites are required for the pipeline:

  • Perl (>= 5.8); the XML::Simple module and its dependencies have to be installed (see http://www.cpan.org)
  • Python (>= 2.3)
  • GNU make (>= 3.78); qmake from Sun Grid Engine (SGE) 6.0 has been reported to work
  • gnuplot (>= 3.7, 4.0 recommended)
  • ImageMagick (>= 5.4.7)
  • ghostscript
  • an SMTP server (if automated email run reports are desired)
  • zlib
  • bzlib
New dependencies in version 0.3

The last two dependencies (zlib and bzlib) were introduced with pipeline release 0.3; they were not needed in previous versions.
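As a quick sanity check before installing, the following sketch reports which prerequisite commands cannot be found on your PATH. The command names passed to it are assumptions about how the tools are usually installed (for example, ImageMagick is assumed to provide "convert" and ghostscript "gs"); it does not check version numbers or libraries such as zlib.

```shell
#!/bin/sh
# Sketch only: list the commands that are NOT available on PATH.
# missing_tools is a hypothetical helper, not part of the pipeline.
missing_tools() {
    for tool in "$@"; do
        # "command -v" succeeds if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call (command names are assumptions, adjust to your system):
#   missing_tools perl python make gnuplot convert gs
# The Perl module can be checked separately with:
#   perl -MXML::Simple -e 1
```

An empty result means every listed command was found; any names printed still need to be installed.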

For a compilation from source, the following additional software is required:

  • gcc (including g++)
  • An optimised FFT library, either:
    • FFTW (>= 3.0.1, >= 3.1 recommended; GPLed). Note that the single-precision version of FFTW is required (often called libfftw3f.a). This can be produced by passing the "--enable-single" option to the "./configure" script of FFTW. FFTW can be downloaded from http://www.fftw.org.
      ./configure --enable-single
      make
      make install
      
    • Intel Math Kernel Library
    • IBM's ESSL

For SRF support (optional) with a modified version of io_lib, the following additional software is required:

  • Autoconf (≥ 2.59)
  • Optional libraries:
    • libcurl to enable io_lib to access files via their URL
    • libidn to provide IDN support to libcurl

The Linux distribution we are using and testing against is RedHat; the listed dependencies are satisfied by the RedHat packages perl-*, python-*, make, autoconf, gnuplot, ImageMagick, ghostscript, zlib, zlib-devel, bzip2, bzip2-devel, libtiff-devel and gcc-* as well as their respective prerequisites. The Perl XML::Simple module and fftw3 may have to be downloaded separately and installed from source.

Setup directory

It is generally a good idea to have separate production and development copies of the code. We also recommend keeping old versions of the pipeline when you install a newer version.
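One possible way to arrange this is sketched below. The layout (a base directory with a "production" subdirectory per version and a single "development" directory) and the helper name are purely hypothetical; the pipeline itself does not require any particular directory structure.

```shell
#!/bin/sh
# Sketch only: create side-by-side directories so that old versions
# are kept when a newer one is installed.
# make_pipeline_dirs, "production" and "development" are hypothetical names.
make_pipeline_dirs() {
    base=$1      # e.g. a directory you have write access to
    version=$2   # e.g. the version string of the archive you received
    mkdir -p "$base/production/$version" "$base/development" || return 1
    # Print the directory into which this version should be unpacked
    printf '%s\n' "$base/production/$version"
}
```

Each new release would then be unpacked into its own directory under "production", leaving earlier releases untouched.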

Setting up email reporting

The script Gerald/runReport.pl is called at the end of a pipeline run and sends you an email when (if) a run successfully completes. To use the (optional) email notification you need to set up an SMTP server (in the unlikely case that you haven't got one already) and set the following parameters in the GERALD config file (see Gerald User Guide and FAQ).

EMAIL_LIST your.name@yourdomain.com clamouring.experimentalist@yourdomain.com

A space-separated list of email addresses you want the report to be sent to. You may get away with your.name instead of your.name@yourdomain.com; this depends on your email server.

WEB_DIR_ROOT http://server/SHARE

The software assumes it can create a valid URL from the GERALD folder path by removing the first two path elements and prepending WEB_DIR_ROOT. For example, if the path is /mnt/someDrive/blah/blah/GERALD and WEB_DIR_ROOT is http://server/SHARE, it will make links such as
http://server/SHARE/blah/blah/GERALD/Error.htm
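To check in advance what links your own folder layout would produce, the rule described above can be mimicked with a small shell function. This is only an illustration of the stated rule, not the code the pipeline uses, and make_report_url is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: drop the first two elements of an absolute path and
# prepend WEB_DIR_ROOT. make_report_url is a hypothetical helper.
make_report_url() {
    web_dir_root=$1
    gerald_path=$2
    rest=${gerald_path#/}   # remove the leading slash
    rest=${rest#*/}         # remove the first path element
    rest=${rest#*/}         # remove the second path element
    # Strip a trailing slash from the root to avoid "//" in the result
    printf '%s/%s\n' "${web_dir_root%/}" "$rest"
}

make_report_url http://server/SHARE /mnt/someDrive/blah/blah/GERALD
# prints http://server/SHARE/blah/blah/GERALD
```

If the printed URL does not resolve in your browser, WEB_DIR_ROOT or the web server share probably needs adjusting.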

EMAIL_DOMAIN yourdomain.com

Your SMTP server may refuse to accept emails from, or send emails to, addresses that don't end in @yourdomain.com.

EMAIL_SERVER yourserver:2525

yourserver is the name or IP address of a mail server willing to accept SMTP email requests from you. 2525 is the port number of the SMTP service on that server. Generally this will be 25, which is the default value if no port number is specified. The utility nmap (if you have it installed) may help you find out which port, if any, on a server is hosting an SMTP service.
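The server:port convention, including the default of 25, can be illustrated with the following sketch. It is not the parsing code used by the pipeline; the two helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: split an EMAIL_SERVER value into host and port.
# email_server_host and email_server_port are hypothetical helpers.
email_server_host() {
    # Everything before the first colon (the whole value if there is none)
    printf '%s\n' "${1%%:*}"
}

email_server_port() {
    case $1 in
        *:*) printf '%s\n' "${1##*:}" ;;  # text after the last colon
        *)   printf '25\n' ;;             # default SMTP port
    esac
}

email_server_host yourserver:2525   # prints yourserver
email_server_port yourserver:2525   # prints 2525
email_server_port yourserver        # prints 25
```

The two values can then be passed to a manual connectivity check such as telnet.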

If you don't get a friendly message when you run telnet yourserver yourPortNumber from the machine you're running GERALD on, email reporting will not work. You can run runReport.pl directly in test mode:

/runReport.pl --test yourserver:25 yourdomain.com anything your.name@yourdomain.com

This should send you a test email. If it doesn't, the transcript it prints out will hopefully tell you what went wrong.

Please note email reporting is considered a "nice to have" rather than a "must have" feature of the pipeline. The code as it stands works on all of the SMTP servers at Illumina. However, whether it will work for you will depend heavily on how your SMTP servers are set up locally. Your first port of call in the event of problems should be your local systems people. Failure of email reporting should not prevent the rest of a pipeline run coming to a successful completion.

Obtaining the source code

Installation from a source tar ball

This is the normal method to install the pipeline. Change to the directory in which you want to install the pipeline and type

tar xvfz GAPipeline-version.tar.gz

where version is the version corresponding to the archive you've got. Of course, you may have to adjust the path to the archive suitably.

Installation from a binary tar ball

Unpack the archive using the command

tar xvfz GAPipeline-version-bin.tar.gz

Installation from cvs

Note: We do not currently provide external cvs access.

The pipeline can be found in the cvs module "Pipeline".

Compilation

Change into the "Pipeline" directory and type

make
make install

This will first build all C++ code, and then copy the relevant executables into the directories "Goat" and "Gerald" which contain the scripts and Makefile generators. It may be useful to add these two directories to your default path.
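A minimal sketch of adding the two directories to your path follows. PIPELINE_DIR is a hypothetical variable introduced here for illustration; the sketch assumes it is run from the "Pipeline" directory unless PIPELINE_DIR is already set.

```shell
#!/bin/sh
# Sketch only: put the Goat and Gerald script directories on PATH.
# PIPELINE_DIR is a hypothetical variable; by default it is taken to be
# the current directory (assumed to be the unpacked "Pipeline" folder).
PIPELINE_DIR=${PIPELINE_DIR:-$(pwd)}
PATH=$PIPELINE_DIR/Goat:$PIPELINE_DIR/Gerald:$PATH
export PATH
```

To make the change permanent, the same lines (with PIPELINE_DIR set to an absolute path) could be placed in your shell start-up file.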

This build system is non-standard, as we chose to keep executable files within the pipeline folder structure rather than copying them to /usr/local/bin or similar. The main reasons for this decision are:

  1. No root access needed.
  2. Multiple versions of the pipeline can easily co-exist.
  3. Quite a number of scripts would need to be installed.

We may change this system at some stage in the future, and of course you can simply install the contents of the "installed" Goat/ and Gerald/ folders wherever you like.

For a compilation on a 64-bit platform, see the section below.

If you want to use the Intel Math Kernel Library as an FFT backend, you have to compile the image analysis module "Firecrest" separately from the rest of the project and pass the additional variable "MKL" to make, as in "make MKL=true". You will most likely also have to set the MKL-specific paths in the Makefile to the appropriate locations on your system.

Compilation on 64-bit Linux and other platforms

To our knowledge, the current pipeline Makefiles compile successfully on all platforms on which the pipeline is run, including many 32- and 64-bit Linux versions and Solaris. However, if compilation fails on a less commonly used platform (possibly a 64-bit architecture or a platform other than Linux), you may have to change the Makefiles manually, in particular the platform-specific gcc compiler flags. Because of the optimised FFT libraries, Firecrest/Makefile is particularly likely to be sensitive to platform-specific peculiarities. Please notify us of any changes that you needed to make.

Document generated by Confluence on Jan 11, 2008 15:41