Detailed description of the ALEXA-Seq approach

Methods

Briefly, the goal of our analysis was to use short sequence reads (30-100 bp in length) generated by massively parallel RNA sequencing to perform gene expression analysis and to identify and quantify both known and novel mRNA isoforms in a genome-wide fashion. Accomplishing this goal involved six basic steps.

(1) Create a database of sequence features tailored to the analysis of alternative isoform expression. For example, we generated sequences representing all known connections of EnsEMBL exons as well as junctions representing all possible splicing events for the exons of each gene. We also identified 'exon regions' that are potentially informative of isoform expression. Features representing alternative exon boundaries, intron retentions and expression within intergenic regions were also generated. Existing annotations for these features were supplemented by examination of mRNA/EST databases to determine the sequence support and level of conservation of each expression feature. The figure below illustrates the creation of an 'ALEXA-Seq annotation database'.
(2) Align the sequence reads of each RNA sequence library to a sequence database consisting of the human genome, known transcripts and exon-exon junctions.
(3) Extract expression measurements by analysis of these alignments.
(4) Identify genes and sequence features that are differentially expressed between samples.
(5) Identify the subset of differentially expressed features that are indicative of alternative expression (e.g. differential expression of an exon skipping isoform) rather than differential expression of a whole gene.
(6) Import candidate gene lists and expression data into the ALEXA-Seq data viewer (see the Results tab).

A considerably more detailed description of our ALEXA-Seq analysis method is provided in Griffith et al. Refer to that manuscript for any questions regarding the methods and results pertaining to the generation of data for the ALEXA-Seq data viewer.
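
To make the junction part of step (1) concrete, the following minimal Python sketch enumerates every possible exon-exon connection for a single gene and flags which ones are already annotated. It is an illustration only, not the ALEXA-Seq implementation: the `Exon` and `Junction` types, the `enumerate_junctions` function, and the example coordinates are all hypothetical, and the sketch assumes exon coordinates are given in the direction of transcription.

```python
from itertools import combinations
from typing import Iterable, List, NamedTuple, Set, Tuple


class Exon(NamedTuple):
    name: str
    start: int  # start coordinate, 1-based inclusive (assumed convention)
    end: int    # end coordinate, inclusive


class Junction(NamedTuple):
    donor: str          # name of the upstream exon
    acceptor: str       # name of the downstream exon
    known: bool         # True if the connection is in the annotated set
    exons_skipped: int  # exons lying entirely between donor and acceptor


def enumerate_junctions(exons: Iterable[Exon],
                        known_pairs: Set[Tuple[str, str]]) -> List[Junction]:
    """Return all pairwise connections between non-overlapping exons of a gene."""
    ordered = sorted(exons, key=lambda e: (e.start, e.end))
    junctions = []
    for donor, acceptor in combinations(ordered, 2):
        if donor.end >= acceptor.start:
            continue  # overlapping exons cannot be joined by a splice
        skipped = sum(1 for e in ordered
                      if e.start > donor.end and e.end < acceptor.start)
        is_known = (donor.name, acceptor.name) in known_pairs
        junctions.append(Junction(donor.name, acceptor.name, is_known, skipped))
    return junctions


if __name__ == "__main__":
    # Hypothetical three-exon gene with two annotated connections.
    gene_exons = [Exon("E1", 100, 200), Exon("E2", 300, 400), Exon("E3", 500, 600)]
    annotated = {("E1", "E2"), ("E2", "E3")}
    for junction in enumerate_junctions(gene_exons, annotated):
        print(junction)
```

In this toy gene the E1-E3 connection is not annotated and skips one exon, so a read aligning across it would be evidence for a candidate novel exon skipping event. A real database would also need to extract the flanking sequence on each side of the junction so that reads can be aligned to it.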

Creation of an ALEXA-Seq annotation database

Figure: The regulation of gene transcription, transcript initiation, alternative splicing, and poly-adenylation.

To assist in the identification of alternative expression events from massively parallel RNA sequence data, we developed the 'ALEXA-Seq' annotation database. Briefly, this database defines expression 'features' that can be informative of alternative expression events such as exon skipping, alternative exon boundary usage, inclusion of cryptic exons and intron retention. For the human genome (build 36 / hg18), a total of ~3.8 million such features were defined. Each feature was assigned a descriptive name and annotated with information including its size, repeat content, protein coding content, mRNA/EST sequence support and cross-species conservation (determined by examining EST/mRNA alignments from other species). A detailed description of the values defined for each feature is available for download: ALEXA Seq Schema Description. For further details refer to Griffith et al.
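
As a rough picture of what one annotated feature carries, the sketch below models a single record as a Python dataclass. The field names, the example feature name, and the two derived properties are assumptions made for illustration; the authoritative list of values is the schema description referred to above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionFeature:
    # All field names are hypothetical, chosen to mirror the annotations
    # listed in the text rather than the actual database schema.
    name: str               # descriptive feature name
    feature_type: str       # e.g. "exon_region", "junction", "intron"
    size_bp: int            # feature length in bases
    repeat_masked_bp: int   # bases overlapping annotated repeats
    protein_coding_bp: int  # bases overlapping coding sequence
    supporting_mrnas: int   # same-species mRNAs supporting the feature
    supporting_ests: int    # same-species ESTs supporting the feature
    conserved_species: int  # other species with supporting EST/mRNA alignments

    @property
    def unmasked_fraction(self) -> float:
        """Fraction of the feature that lies outside repeat elements."""
        if self.size_bp == 0:
            return 0.0
        return 1.0 - (self.repeat_masked_bp / self.size_bp)

    @property
    def has_sequence_support(self) -> bool:
        """True if at least one mRNA or EST supports the feature."""
        return (self.supporting_mrnas + self.supporting_ests) > 0


if __name__ == "__main__":
    # Made-up values for a junction with no existing sequence support.
    feature = ExpressionFeature("E1_E3", "junction", 62, 0, 62, 0, 0, 0)
    print(feature.name, feature.unmasked_fraction, feature.has_sequence_support)
```

Keeping counts rather than derived fractions in the record means that quantities such as the unmasked fraction can be recomputed on demand, which is one plausible design choice among several.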