This page last changed on Nov 22, 2007 by rshaw.

The Summary.htm page provides an overview of quality metrics for a run and links to more detailed information in the form of pages of graphs. It is intended to be quick to load into a browser; depending upon the number of lanes and tiles used, the pages to which it links may take much longer to display.

As well as being an HTML page, Summary.htm is also a valid XML file (or at least meets the expectations of the Perl XML::Simple module) to facilitate automated information extraction. The format of Summary.htm may, however, change to some extent between Pipeline releases, e.g. to provide additional statistics relevant to new analysis modes.
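Because the page is well-formed XML, any XML parser can be used for extraction. The text mentions Perl's XML::Simple; the following is a minimal Python sketch using the standard library instead. The sample fragment, the `h2`/`table` layout and the column names are assumptions made for illustration only; the real structure of Summary.htm should be checked for the Pipeline release in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for Summary.htm; the real layout may differ.
SAMPLE = """<html><body>
<h2>Chip Results Summary</h2>
<table>
<tr><td>Clusters</td><td>Clusters (PF)</td><td>Yield (kbases)</td></tr>
<tr><td>1000</td><td>800</td><td>28</td></tr>
</table>
</body></html>"""


def tables_by_heading(root):
    """Map each <h2> heading text to the rows of the first <table> after it."""
    result = {}
    heading = None
    for elem in root.iter():  # document order
        if elem.tag == "h2":
            heading = (elem.text or "").strip()
        elif elem.tag == "table" and heading is not None:
            rows = [[(cell.text or "").strip() for cell in tr.findall("td")]
                    for tr in elem.iter("tr")]
            result.setdefault(heading, rows)
    return result


# For a real file: root = ET.parse("Summary.htm").getroot()
root = ET.fromstring(SAMPLE)
tables = tables_by_heading(root)
header, values = tables["Chip Results Summary"]
chip = dict(zip(header, values))
print(chip)
```

Since the format may change between releases, a script of this kind is safer if it looks tables up by heading text rather than by position.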

Chip Summary

This reports the instrument ID and the run folder. The Chip ID is a placeholder field; currently `unknown'. (The terms `chip' and `flowcell' are used interchangeably.)

Chip Results Summary

This table displays summary chip-wide performance statistics for the run. Both the original number of detected clusters and the number that passed quality filtering are shown. In addition, a chip yield in kilobases is presented. This is the sum over analysed lanes of the product of the number of quality-filtered clusters and the number of bases per cluster used for analysis, i.e. excluding bases masked out by a USE_BASES directive.
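The yield calculation can be sketched as follows. The lane figures are invented for illustration, and the function name `chip_yield_kb` is hypothetical.

```python
def chip_yield_kb(lanes):
    """Sum over lanes of (PF clusters x bases used per cluster), in kilobases.

    `lanes` is a list of (pf_clusters, bases_used_per_cluster) tuples, where
    bases_used_per_cluster already excludes any bases masked by USE_BASES.
    """
    total_bases = sum(pf * bases for pf, bases in lanes)
    return total_bases / 1000.0


# Illustrative values only: three analysed lanes.
lanes = [(2_000_000, 36), (1_500_000, 36), (1_800_000, 32)]
print(chip_yield_kb(lanes))
```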

Lane Parameter Summary

This records information about the sample in each flowcell lane and the analysis that has been specified for it :-

  • Sample ID: placeholder field; currently `unknown' (undocumented approaches to supplying this value should be considered unsupported)
  • Sample Target: the reference sequence(s) against which reads from the sample in this lane are to be aligned. Depending on analysis mode this may be the name of a folder containing one or more sequence (and auxiliary) files or the name of an individual file; the required file formats also depend on analysis mode.
  • Sample Type: the analysis mode (the options are constrained by the nature of the sample) for reads from this lane.
  • Length: the number of bases used per read (excluding any bases masked out using USE_BASES); where multiple reads are produced per cluster and a distinction is maintained between them during analysis (e.g. eland_pair analysis of paired end reads), their respective lengths will be listed.
  • Filter: the criterion for clusters to be selected for analysis beyond the preliminary stages (in Summary.htm, statistics for all detected clusters and for the subset that pass filtering are annotated as `raw' and `PF', respectively).
  • Num Tiles: the number of tiles from the lane that will be used in the analysis
  • Tiles: a hyperlink for each lane to the location (still within Summary.htm) of the statistics for individual tiles in that lane.

Lane Results Summary

This table displays basic data quality metrics for each lane. Apart from Lane Yield, which is the total value for the lane, all the statistics are given as means and standard deviations over the tiles (used) in the lane :-

  • Clusters (raw): the number of clusters detected by the image analysis stage (Firecrest) of the Pipeline
  • Clusters (PF): the number of detected clusters that meet the filtering criterion (see Lane Parameter Summary)
  • 1st Cycle Int (PF): the average of the four intensities (one per channel, i.e. base type) measured at the first cycle (after any masking of cycles), averaged over filtered clusters.
  • % intensity after 20 cycles (PF): the corresponding intensity statistic at (masked) cycle 20 as a percentage of that at the first cycle.
  • % PF Clusters: the percentage of clusters passing filtering
  • % Align (PF): the percentage of filtered reads that were uniquely aligned to the reference
  • Alignment Score (PF): the average filtered read alignment score (reads with multiple alignments or none effectively contribute scores of 0)
  • % Error Rate (PF): the percentage of called bases in aligned reads that do not match the reference

If eland_pair analysis has been specified for one or more lanes, then there will be two such summaries, one for each read. All lanes for which analysis has been specified will be represented in `Lane Results Summary : Read 1' but only those for which eland_pair analysis has been specified will contribute statistics to the Read 2 table.
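As an illustration of how the per-lane figures relate to per-tile values, the sketch below derives % PF Clusters and % intensity after 20 cycles for each tile and then reports a mean and standard deviation over tiles. The tile values are invented, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; the page does not say which estimator the Pipeline uses.

```python
from statistics import mean, stdev

# Hypothetical per-tile values:
# (raw clusters, PF clusters, 1st cycle intensity, cycle 20 intensity)
tiles = [
    (20000, 15000, 400.0, 300.0),
    (22000, 17600, 420.0, 336.0),
    (18000, 12600, 380.0, 266.0),
]

# Per-tile percentages.
pct_pf = [100.0 * pf / raw for raw, pf, _, _ in tiles]
pct_int20 = [100.0 * c20 / c1 for _, _, c1, c20 in tiles]


def mean_sd(values):
    """Mean and (sample) standard deviation over the tiles of a lane."""
    return mean(values), stdev(values)


print("% PF Clusters:", mean_sd(pct_pf))
print("% intensity after 20 cycles:", mean_sd(pct_int20))
```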

Expanded Lane Summary

This displays more detailed quality metrics for each lane. Apart from the phasing and prephasing information, all values are tile means for the lane.

  • Clusters (tile mean) (raw): the number of clusters detected by the image analysis stage (Firecrest) of the Pipeline
  • % Phasing: the estimated (or specified) value used by the Pipeline for the percentage of molecules in a cluster for which sequencing falls behind the current position (cycle) within a read
  • % Prephasing: the estimated (specification is not recommended) value used by the Pipeline for the percentage of molecules in a cluster for which sequencing jumps ahead of the current position (cycle) within a read
  • % Error Rate (raw): the percentage of called bases in aligned reads (from all detected clusters) that do not match the reference
  • Equiv Perfect Clusters (raw): the number of clusters in the ideal situation of read base perfectly predicting reference base that would provide the same information content (entropy of reference base given read base and a priori assumption of equiprobable reference bases) as calculated for all actual detected clusters
  • % retained: the percentage of clusters that passed filtering
  • Cycle 2-4 Av Int (PF): the intensity averaged over cycles 2, 3 and 4 for clusters that passed filtering
  • Cycle 2-10 Av % Loss (PF): the average percentage intensity drop per cycle over cycles 2 to 10 (derived from a best fit straight line for log intensity v. cycle number)
  • Cycle 10-20 Av % Loss (PF): the average percentage intensity drop per cycle over cycles 10 to 20 (derived from a best fit straight line for log intensity v. cycle number)
  • % Align (PF): the percentage of filtered reads that were uniquely aligned to the reference
  • % Error Rate (PF): the percentage of called bases in aligned filtered reads that do not match the reference
  • Equiv Perfect Clusters (PF): the number of clusters in the ideal situation of read base perfectly predicting reference base that would provide the same information content (entropy of reference base given read base and a priori assumption of equiprobable reference bases) as calculated for the actual clusters that passed filtering
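One possible reading of the Equiv Perfect Clusters definition is sketched below; it is an interpretation, not the Pipeline's documented formula. It assumes that a perfect read conveys 2 bits per base (four equiprobable reference bases), that the information actually conveyed is 2 bits minus the conditional entropy of the reference base given the read base, and that the equivalent cluster count scales the actual count by that fraction. The confusion counts and the function name `info_fraction` are hypothetical.

```python
import math

BASES = "ACGT"


def info_fraction(counts):
    """Fraction of the ideal 2 bits per base conveyed by the reads.

    counts[ref][read] is the number of times reference base `ref` was
    observed with called base `read`. Reference bases are assumed
    equiprobable a priori, as stated in the definition.
    """
    # P(read | ref), normalised per reference base.
    likelihood = {}
    for r in BASES:
        total = sum(counts[r][b] for b in BASES)
        likelihood[r] = {b: counts[r][b] / total for b in BASES}

    # H(ref | read) = -sum over (ref, read) of P(ref, read) * log2 P(ref | read)
    cond_entropy = 0.0
    for b in BASES:
        joint = [0.25 * likelihood[r][b] for r in BASES]  # P(ref, read=b)
        p_read = sum(joint)
        for j in joint:
            if j > 0:
                cond_entropy -= j * math.log2(j / p_read)
    return (2.0 - cond_entropy) / 2.0


# Illustrative confusion counts: 97% of calls match the reference.
counts = {r: {b: (970 if b == r else 10) for b in BASES} for r in BASES}
clusters = 100_000
print(clusters * info_fraction(counts))
```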

In the same manner as for the Lane Results Summary, specification of eland_pair analysis for any of the lanes will result in two Expanded Lane Summary tables - displaying statistics for Read 1 and (for lanes where it exists) Read 2.
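The two `Av % Loss' statistics are described as coming from a best-fit straight line for log intensity against cycle number. A minimal sketch of that idea follows. The conversion from fitted slope to percentage loss per cycle, 100 × (1 − e^slope), is an assumption; the page does not state how the Pipeline converts the slope, and the intensity values are synthetic.

```python
import math


def avg_pct_loss_per_cycle(cycles, intensities):
    """Least-squares fit of ln(intensity) against cycle number, reported as
    an average percentage intensity drop per cycle."""
    logs = [math.log(v) for v in intensities]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    slope = sxy / sxx
    # Each cycle multiplies intensity by exp(slope); express the drop in percent.
    return 100.0 * (1.0 - math.exp(slope))


# Synthetic data: intensity falls by exactly 2% per cycle over cycles 2 to 10.
cycles = list(range(2, 11))
intensities = [1000.0 * 0.98 ** (c - 2) for c in cycles]
loss = avg_pct_loss_per_cycle(cycles, intensities)
print(round(loss, 3))
```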

Per-Tile Statistics

Below the two types of lane summary are per-tile statistics, grouped into a table for each lane. The statistics are a subset of those already presented in the Lane Results Summary, but in these tables they are averages over the detected (raw) or filtered (PF) clusters in individual tiles.

In the event that no clusters in a tile pass filtering, all the statistics for that tile will be displayed within square brackets. Such an occurrence suggests an exceptional situation (e.g. a bubble) within the tile; the brackets indicate the tile has thus been excluded from the calculation of lane statistics and that the values are reported only for diagnostic purposes.
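A script that recomputes lane statistics from the per-tile tables needs to honour this convention. The sketch below assumes each table cell holds a single number, optionally wrapped in square brackets; the exact cell format is an assumption and the function name `parse_tile_value` is hypothetical.

```python
from statistics import mean


def parse_tile_value(text):
    """Return (value, excluded) for a per-tile cell such as '15000' or '[0]'."""
    text = text.strip()
    excluded = text.startswith("[") and text.endswith("]")
    if excluded:
        text = text[1:-1]
    return float(text), excluded


# Illustrative PF cluster counts for four tiles; the third had no PF clusters.
cells = ["15000", "17600", "[0]", "12400"]
parsed = [parse_tile_value(c) for c in cells]

# Bracketed tiles are reported for diagnostics only and are left out here.
used = [value for value, excluded in parsed if not excluded]
lane_mean = mean(used)
print(lane_mean)
```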

Monotemplate Summary

This table appears after the per-tile summary table for lanes for which monotemplate analysis is specified (corresponding to monotemplate samples). Statistics are presented for each monotemplate specified :-

  • Lane: lane number
  • Template: the monotemplate sequence
  • Count: the number of reads that aligned to the monotemplate
  • Percent: the percentage of reads aligned to monotemplates that aligned to the current monotemplate
  • True 1st Cycle Intensity: the average intensity of the first base in reads aligned to the monotemplate
  • Av Error Rate: the average error rate over all cycles as a percentage of called bases for reads aligned to the monotemplate
  • % Perfect: the percentage of reads that are a perfect match to the monotemplate out of those that align to it
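The relationship between the Count, Percent and % Perfect columns can be sketched as below. The template sequences and counts are invented for illustration.

```python
# Hypothetical monotemplates: template -> (reads aligned, of which perfect matches)
templates = {
    "ACGTACGTAC": (600, 540),
    "TTGGCCAATT": (300, 240),
    "GATTACAGAT": (100, 50),
}

# Percent is relative to all reads aligned to any monotemplate.
total_aligned = sum(count for count, _ in templates.values())
percent = {t: 100.0 * count / total_aligned for t, (count, _) in templates.items()}

# % Perfect is relative to the reads aligned to that particular monotemplate.
pct_perfect = {t: 100.0 * perfect / count for t, (count, perfect) in templates.items()}

for t in templates:
    print(t, templates[t][0], percent[t], pct_perfect[t])
```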

Pair Summary

For lanes for which eland_pair analysis was performed, there will be two per-tile summary tables (one for each read) and these will be preceded by a set of tables collectively entitled the `Pair Summary'. These provide statistics about the alignment outcomes of the two reads individually and as a pair, the latter including relative orientation and separation (insert size) of partner read alignments.

Note that if the criteria for paired alignment to be attempted are not met, the subset of tables reporting paired alignment results will be replaced with the statement `Paired alignment not performed'.

Individual Alignments

This table displays the frequencies of the various possible combinations of individual alignment outcomes within a pair, i.e. how often each Read 1 alignment outcome occurred together with each Read 2 alignment outcome. The possible outcomes are :-

  • Unique : a unique alignment
  • Rescuable : multiple alignments but such that a unique paired alignment could potentially be derived if the partner read is either unique or similarly rescuable
  • Repeat : multiple alignments such that consideration of the partner read will not help in selecting between them
  • Not Matched : the read was not aligned
  • Low Quality : the read contained too many uncalled bases to attempt alignment
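Such a table is a cross-tabulation of Read 1 outcome against Read 2 outcome. A minimal sketch, using invented per-pair outcomes, is:

```python
from collections import Counter

OUTCOMES = ["Unique", "Rescuable", "Repeat", "Not Matched", "Low Quality"]

# Hypothetical (Read 1 outcome, Read 2 outcome) for five pairs.
pairs = [
    ("Unique", "Unique"),
    ("Unique", "Unique"),
    ("Unique", "Rescuable"),
    ("Repeat", "Not Matched"),
    ("Low Quality", "Unique"),
]

table = Counter(pairs)

# One row per Read 1 outcome, one column per Read 2 outcome.
for r1 in OUTCOMES:
    print(r1.ljust(12), [table[(r1, r2)] for r2 in OUTCOMES])
```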

Unique Paired Alignments

This table breaks down the unique paired alignments according to the alignment outcomes of their component reads, which can thus only be unique or rescuable.

Unique Paired Alignment Effects

This table is similar to the Unique Paired Alignments one but focuses upon what proportions of unique paired alignments were rescued, i.e. one or both of the partner reads were not individually aligned uniquely.

Non-unique Paired Alignments

This table breaks down the non-unique paired alignments according to the alignment outcomes of their component reads. (Paired alignment is attempted only for those pairs where the individual alignments of both partners are either unique or rescuable.)

Mispairing Rate

Mispairing is considered to happen when one read of a pair can be aligned (whether the alignment is unique, rescuable or against a repeat) but the other cannot (whether because it is of low quality or because no match can be found for it in the reference).
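The definition can be expressed as a simple predicate over the two individual outcomes. In the sketch below the rate is taken as a percentage of all pairs; that denominator is an assumption, since the page does not state what the rate is measured against, and the pair data are invented.

```python
ALIGNED = {"Unique", "Rescuable", "Repeat"}
UNALIGNED = {"Not Matched", "Low Quality"}


def is_mispaired(read1_outcome, read2_outcome):
    """True when exactly one read of the pair could be aligned."""
    return ((read1_outcome in ALIGNED and read2_outcome in UNALIGNED)
            or (read1_outcome in UNALIGNED and read2_outcome in ALIGNED))


# Hypothetical (Read 1 outcome, Read 2 outcome) for five pairs.
pairs = [
    ("Unique", "Unique"),
    ("Unique", "Rescuable"),
    ("Repeat", "Not Matched"),
    ("Low Quality", "Unique"),
    ("Not Matched", "Low Quality"),
]

mispaired = sum(1 for r1, r2 in pairs if is_mispaired(r1, r2))
# Assumed denominator: all pairs.
rate_pct = 100.0 * mispaired / len(pairs)
print(mispaired, rate_pct)
```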

Relative Orientation Statistics

The relative orientation of a pair is the orientation of Read 2 relative to that of Read 1, i.e. defining the Read 1 orientation to be forward. It is defined as positive if the Read 2 position is greater than the Read 1 position. These statistics are given only for those pairs in which both reads were individually uniquely aligned, since these are the ones used to determine the predominant relative orientation (other orientations are considered anomalous and filtered out).

The ASCII art in the column headings is intended as a visual reminder of the definitions of the four possible relative orientations.

Insert Size Statistics

Statistics are derived from the insert sizes of those pairs in which both reads were individually uniquely aligned and which also have the predominant relative orientation. First the median is determined. Then a standard deviation value is determined independently for those values below the median and those above it. The lower and upper thresholds for acceptable insert sizes are then defined as three of the relevant standard deviations below and above the median, respectively.
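The procedure can be sketched as follows. The page does not say how each one-sided standard deviation is computed; this sketch assumes a root-mean-square deviation from the median, taken separately over the values below and above it. The insert sizes are invented and the function name is hypothetical.

```python
import math
from statistics import median


def insert_size_thresholds(sizes, n_sd=3.0):
    """Return (median, lower threshold, upper threshold) for insert sizes."""
    med = median(sizes)
    below = [s for s in sizes if s < med]
    above = [s for s in sizes if s > med]

    def sd_about_median(values):
        # Assumed estimator: RMS deviation from the median.
        if not values:
            return 0.0
        return math.sqrt(sum((v - med) ** 2 for v in values) / len(values))

    low_sd = sd_about_median(below)
    high_sd = sd_about_median(above)
    return med, med - n_sd * low_sd, med + n_sd * high_sd


# Illustrative insert sizes with a longer upper tail.
sizes = [190, 195, 200, 200, 205, 220]
med, low, high = insert_size_thresholds(sizes)
print(med, round(low, 2), round(high, 2))
```

Treating the two sides separately lets the thresholds follow an asymmetric insert size distribution, which a single standard deviation about the mean would not.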

Insert Statistics (% of individually uniquely alignable pairs)

This table shows the numbers of inserts (out of those used to calculate insert size statistics) considered acceptable in size and of those falling outside the thresholds displayed in the above table. The percentages are relative to the original number of pairs in which both reads were individually uniquely aligned.
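A sketch of the classification and of the percentage base follows. The thresholds, insert sizes and pair count are invented, and treating values exactly on a threshold as acceptable is an assumption.

```python
def insert_statistics(sizes, low, high, n_unique_pairs):
    """Classify inserts against the thresholds and express each class as a
    percentage of all pairs whose reads were both individually unique."""
    too_small = sum(1 for s in sizes if s < low)
    too_large = sum(1 for s in sizes if s > high)
    acceptable = len(sizes) - too_small - too_large

    def pct(n):
        return 100.0 * n / n_unique_pairs

    return {
        "too small": (too_small, pct(too_small)),
        "too large": (too_large, pct(too_large)),
        "acceptable": (acceptable, pct(acceptable)),
    }


# Hypothetical: 10 pairs were individually uniquely aligned, of which 8 had
# the predominant relative orientation and so contributed an insert size.
sizes = [140, 180, 190, 200, 205, 210, 230, 300]
stats = insert_statistics(sizes, low=150, high=250, n_unique_pairs=10)
print(stats)
```

Because the denominator includes pairs with an anomalous orientation, the three percentages need not sum to 100.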

Document generated by Confluence on Jan 11, 2008 15:41