Data analysis - documentation : Pipeline FAQ
This page last changed on Apr 03, 2007 by maising.
Installation/setup

I cannot install Scipy.

Scipy/Numpy/BLAS/LAPACK are no longer needed from pipeline version 0.1.9 onwards.

Our Perl/Python interpreters (or /usr/bin/env) are not in the path specified in your scripts' shebang lines.

We have tried to use /usr/bin/env in the scripts where possible; if that does not suit your needs, setting this up properly is really something you have to figure out yourself. Quoting from O'Reilly's "Learning Perl":
What parallelisation/cluster solutions do you recommend?

At Solexa UK we are currently using qmake from Sun Grid Engine 6.0 to run the pipeline. This is the only cluster solution we have tested. (Parallelisation on single SMP machines is automatically supported by make.) Parallelisation with LSF lsmake has been reported to work at external sites. In principle, any parallelised derivative of a recent version of GNU make should be able to run the pipeline on a cluster. However, your luck will also depend on the version of make used:
For more information on parallelisation see Pipeline parallelisation. We would be extremely grateful for feedback if you manage to run the pipeline under any other distributed make/cluster batch queueing system.

Does the pipeline run under LSF make?

LSF make has been reported to work outside of Solexa. See Pipeline parallelisation.

Goat/Bustard

I run the goat_pipeline.py/bustard.py script, but it does not actually produce any output!

Add the "--make" option.

Gerald

The pipeline does not produce any graphical output!

Make sure that gnuplot and ImageMagick's convert are installed and in the path for the user/machine the pipeline is running on. Make sure that ghostscript is installed. Test PostScript output for gnuplot. Make sure that the unfortunately named "convert" utility from the Staden package does not come before the ImageMagick convert in your path.

What should the contaminant file look like?

The contaminant file allows a second set of alignments to be calculated against a limited set of known contaminant sequences. These are given to the pipeline in "tag" or "monotemplate" mode; i.e., reads are aligned directly against the tags, starting from the first base, and no attempt is made to find an optimal alignment position. If you want to remove contamination from a genomic sample with a large number of sequences at different base positions, the recommended solution is to create a combined FASTA file containing the sequence of your sample and the suspected contaminant genome, separated by a series of "padding" bases ("N"), whose number should be equal to or greater than the experimental read length, to avoid spurious alignments across the genome boundaries.

What do the READ_LENGTH/USE_BASES parameters mean?

The read length is the length that is used for the alignments, not the actual sequence read length! If you are using the USE_BASES nYYYYY... option, the number of Ys in the string has to match READ_LENGTH exactly.
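The constraints on READ_LENGTH and USE_BASES discussed in this answer can be checked before starting a run. The following Python sketch is only an illustration and not part of the pipeline: the function name check_read_length is hypothetical, and it assumes that the USE_BASES mask is written out in full with one character per sequenced cycle ("Y" for a base that is used, "n" for one that is skipped).

```python
def check_read_length(use_bases, read_length, cycles_sequenced):
    """Return a list of problems; an empty list means the settings agree.

    Assumption: use_bases is a fully expanded mask with one character per
    cycle, "Y" for a used base and "n" for a skipped one.
    """
    problems = []

    # The number of Ys in the mask has to match READ_LENGTH exactly.
    used = use_bases.upper().count("Y")
    if used != read_length:
        problems.append(
            f"USE_BASES has {used} Y characters but READ_LENGTH is {read_length}"
        )

    # The mask cannot cover more cycles than were actually sequenced.
    if len(use_bases) > cycles_sequenced:
        problems.append(
            f"USE_BASES covers {len(use_bases)} cycles "
            f"but only {cycles_sequenced} were sequenced"
        )

    return problems


# 27 cycles sequenced, first base skipped, 26 bases used: consistent.
print(check_read_length("n" + "Y" * 26, 26, 27))

# 27 cycles sequenced, first base skipped, 27 bases requested: too long.
print(check_read_length("n" + "Y" * 27, 27, 27))
```

Returning a list of messages rather than raising on the first mismatch lets both problems be reported at once.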
READ_LENGTH always has to be less than or equal to the actual sequence read length. So if you have sequenced 27 cycles, and the first USE_BASES character is "n", READ_LENGTH has to be 26 or less.

Integration with other tools

How can I submit Solexa data to a trace archive?

There is currently an active discussion between various archives, research institutes and the commercial companies developing short-read technologies to standardise the short-read data formats. This effort is currently organised by British Columbia, and there is an email list, ssrformat@bcgsc.ca. You can subscribe to the list at http://www.bcgsc.ca/mailman/listinfo/ssrformat. Solexa expects to be able to make submissions to public archives soon.

How can I load Solexa data into Ensembl?

Integration with Ensembl is under active development at Solexa as part of a UK Department of Trade and Industry and Medical Research Council LINK grant to support a collaborative project with the European Bioinformatics Institute, Imperial College London and Wellcome Trust Sanger Institute. There is a working version running at Solexa. It requires modifications to release 39, and a separate Solexa database. All of this work will become publicly available, but there is no official release yet; if you are interested in this, contact analysis@solexa.com.

How can I get the pipeline output into gap4?

The latest version of gap4 has support for Solexa traces, but it struggles to handle large volumes of short-read data. An intermediate stop-gap version called "tgap", which can be used while gap5 is being developed, can be obtained from the website of the Sanger Institute; see Pipeline output and visualisation. The site also provides conversion utilities for Solexa data.

How can I import pipeline output into my database/favourite viewer/analysis package?

We do not currently have any explicit export functionality for packages other than gap4 or Ensembl.
However, by keeping all the final and intermediate analysis output in flat file formats and XML, we were hoping to make it easy for others to integrate the output into their own infrastructure. Future versions of the pipeline will most likely make use of database engines for the storage of analysis information.

General

What do all the files in the run-folder mean?

Most of the important files are described - at least at a basic level - in the Run-folder specification and "What do the different files in an analysis directory mean". If these files do not contain enough information, please contact us directly. Please note that some output files are not documented because they may be considered intermediate files for the software's own purpose, or the format may not be considered to be finalised and may change in future versions. We will also try to provide more documentation in future releases. Last but not least, you have access to the source code producing any of the output files, which may give some insight as to what the numbers in a file mean.

Who do I contact with questions regarding the pipeline?

We have an email address "analysis@illumina.com" for any data analysis and pipeline related questions, which would be the appropriate forum.
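For the combined reference file described under "What should the contaminant file look like?", the padding step can be sketched in Python as follows. This is only an illustration, not part of the pipeline: the function names, the record name and the 60-column line width are assumptions made here, and the short toy sequences stand in for real genomes.

```python
def combine_with_padding(sample_seq, contaminant_seq, read_length, pad_char="N"):
    """Join two sequences with read_length padding bases between them.

    Using read_length padding bases is the minimum suggested in the FAQ;
    a larger number also satisfies the requirement.
    """
    if read_length < 1:
        raise ValueError("read_length must be a positive integer")
    return sample_seq + pad_char * read_length + contaminant_seq


def to_fasta(name, sequence, width=60):
    """Format a single FASTA record, wrapping the sequence at width columns."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy example: two short sequences and a read length of 5.
combined = combine_with_padding("ACGTACGT", "TTGGCCAA", read_length=5)
print(to_fasta("sample_plus_contaminant", combined))
```

A read that straddles the junction then always overlaps at least one "N", which is the reason the padding must be at least as long as the read.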
Document generated by Confluence on Apr 05, 2007 11:25