ALEXA-Seq schema description

Each of the following field names is followed by a brief description. Note that some fields correspond to specific feature types only.

FIELD DESCRIPTIONS - BASIC

XXX_ID - Primary ID of feature type. Guaranteed to be unique within the feature type (Transcript_ID, ExonRegion_ID, Junction_ID, etc.)

FID - Feature ID. Guaranteed to be unique across all feature types

Gene_ID - Unique ALEXA gene ID (use to obtain additional info from an ALEXA gene database). ALEXA and EnsEMBL gene IDs have a one-to-one relationship

Gene_Name - EnsEMBL gene symbol

EnsEMBL_Gene_ID - EnsEMBL gene ID

Gene_Evidence - Whether the gene is a known gene or a predicted gene

Gene_Type - Gene type according to EnsEMBL (e.g. protein_coding, pseudogene, miRNA, etc.)

Description - Full gene description from EnsEMBL

Supporting_EST_Count - The number of EST alignments from ESTs of the target species supporting the feature

Supporting_EnsEMBL_Count - The number of EnsEMBL transcripts supporting the feature

Supporting_mRNA_Count - The number of mRNA alignments from mRNAs of the target species supporting the feature

Supporting_xEST_Count - The number of EST alignments from ESTs of non-target species supporting the feature

Supporting_xmRNA_Count - The number of mRNA alignments from mRNAs of non-target species supporting the feature

Conserved_Species_Count - The number of species (other than the target species) with ESTs/mRNA supporting the feature

FIELD DESCRIPTIONS - COORDINATE/COMPOSITION RELATED

Chromosome - Source chromosome

Unit1_start - Start position in Gene coordinates (i.e. from start of the first exon to the end of the last exon including all introns)

Unit1_end - End position in Gene coordinates (i.e. from start of the first exon to the end of the last exon including all introns)

Unit1_start_chr - Start position in Chromosome coordinates

Unit1_end_chr - End position in Chromosome coordinates

Unit2_start - Start position in Gene coordinates of the downstream portion of an exon-exon junction sequence (as for 'Unit1_start' above; applies to exon-exon junction sequences only)

Unit2_end - End position in Gene coordinates of the downstream portion of an exon-exon junction sequence (as for 'Unit1_end' above; applies to exon-exon junction sequences only)

Unit2_start_chr - Start position in Chromosome coordinates of the downstream portion of an exon-exon junction sequence (as for 'Unit1_start_chr' above; applies to exon-exon junction sequences only)

Unit2_end_chr - End position in Chromosome coordinates of the downstream portion of an exon-exon junction sequence (as for 'Unit1_end_chr' above; applies to exon-exon junction sequences only)

Strand - Chromosome strand of the feature ('1' = '+ve strand'; '-1' = '-ve strand')

Seq_Name - A human readable, abbreviated feature name (e.g. ER1, E1_E3, I2, etc.)

Base_Count - Nucleotide base length of the feature

UnMasked_Base_Count - Number of bases that are not masked according to EnsEMBL

Coding_Base_Count - Number of bases of the feature that overlap 1 or more known open reading frames

FIELD DESCRIPTIONS - FEATURE-SPECIFIC ANNOTATIONS

Active_Base_Count - The number of 'Active' (EST/mRNA supported) bases contained within an intronic or intergenic region

Active_Region_Count - The number of discrete 'Active' (EST/mRNA supported) sub-regions contained within an intronic or intergenic region

Silent_Base_Count - The number of 'Silent' (not supported by an EST/mRNA sequence) bases contained within an intronic or intergenic region

Silent_Region_Count - The number of discrete 'Silent' (not supported by an EST/mRNA sequence) sub-regions contained within an intronic or intergenic region

Boundary_Type - For exon boundaries. Whether the boundary corresponds to an Exon-Intron (Donor) or Intron-Exon (Acceptor) site

Upstream_Gene_ID - For intergenic regions, the closest EnsEMBL gene in the upstream direction (indicated by ALEXA gene ID)

Downstream_Gene_ID - For intergenic regions, the closest EnsEMBL gene in the downstream direction (indicated by ALEXA gene ID)

Exon_Count - For genes, the number of unique exons

Exon_Content_Count - For genes, the number of distinct exon-clusters within a gene when overlapping exons are merged

Exons_Skipped - For exon-exon junctions, the number of exons skipped by the connection of two exons (0 for canonical junctions)

Feature_List - For transcripts, the names of the features that uniquely identify expression of this transcript

Gene_ID_List - In the case of intronic regions corresponding to overlapping genes on opposite strands, the list of these gene IDs

Multiple_Genes - Whether an intron region could correspond to multiple genes

Specific_Exon_Region_Count - For transcripts, the number of exon regions that are specific to this transcript

Specific_Junction_Count - For transcripts, the number of exon-exon junctions that are specific to this transcript

Specific_Trans_ID - For exon regions, exon-exon junctions and exon boundaries. If applicable, the ID of the transcript these features are unique to

Transcript_Count - For genes, the number of known transcripts

Transcript_Size - For transcripts, the length of the complete transcript. Base_Count for transcript features only considers the bases that are unique to each transcript

FIELD DESCRIPTIONS - EXPRESSION VALUES

Cumulative_Coverage - The cumulative number of bases mapped to a particular feature. For example, if a 42-mer read is mapped within the bounds of an exon, this would contribute 42 to the cumulative coverage value

Average_Coverage_RAW - The Cumulative_Coverage value divided by the bp size of the feature

Average_coverage_NORM1 - A correction of the Cumulative_Coverage value to allow comparisons between libraries of different size

Bases_Covered_1x - The total number of base positions of a feature that were covered to a sequencing depth of 1x or greater. A feature may have a high Average_Coverage value, but this does not necessarily mean that all of the base positions of that feature were sequenced

Percent_Coverage_1x - The proportion of all base positions of a feature that were covered to a sequencing depth of 1x or greater

Percent_Coverage_5x - The proportion of all base positions of a feature that were covered to a sequencing depth of 5x or greater

Percent_Coverage_10x - The proportion of all base positions of a feature that were covered to a sequencing depth of 10x or greater

Percent_Coverage_100x - The proportion of all base positions of a feature that were covered to a sequencing depth of 100x or greater

Expressed - If the feature was deemed to be expressed above the level of background noise this value will be '1' (otherwise '0')

Percent_Gene_Expression - The expression of a feature relative to the expression of the gene to which it belongs (not applicable for intergenic features)
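
The coverage fields above can be illustrated with a minimal Python sketch. It assumes a hypothetical per-base depth list for a single feature (the function name coverage_summary and its input are not part of ALEXA-Seq), and it assumes the Percent_Coverage fields are reported on a 0-100 scale; the descriptions above only say "proportion", so the scale is a guess.

```python
def coverage_summary(depths, thresholds=(1, 5, 10, 100)):
    """Summarize per-base sequencing depth for one feature.

    depths: list of ints, one entry per base position of the feature
            (hypothetical input, used here for illustration only).
    """
    base_count = len(depths)
    if base_count == 0:
        raise ValueError("feature has no bases")

    # Every mapped base adds 1 to the depth at its position, so the sum of
    # depths equals the cumulative number of bases mapped to the feature.
    cumulative = sum(depths)

    summary = {
        "Base_Count": base_count,
        "Cumulative_Coverage": cumulative,
        "Average_Coverage_RAW": cumulative / base_count,
        "Bases_Covered_1x": sum(1 for d in depths if d >= 1),
    }
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        # Assumption: reported as a percentage (0-100), not a fraction.
        summary[f"Percent_Coverage_{t}x"] = 100.0 * covered / base_count
    return summary


# A 100 bp feature to which a single 42-mer read maps (positions 10 to 51).
depths = [0] * 100
for pos in range(10, 52):
    depths[pos] += 1

result = coverage_summary(depths)
print(result["Cumulative_Coverage"])   # 42
print(result["Bases_Covered_1x"])      # 42
```

In this example the average coverage is 0.42, yet only 42 of the 100 base positions are covered at all, which is the distinction that Bases_Covered_1x and the Percent_Coverage fields are meant to capture.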

FIELD DESCRIPTIONS - DIFFERENTIAL EXPRESSION VALUES

Seq_Name - A human readable, abbreviated feature name (e.g. ER1, E1_E3, I2, etc.)

FID - Feature ID. Guaranteed to be unique across all feature types

A_Average_Coverage_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library A

A_Expressed - '1' if the feature is expressed above background noise levels in library A, otherwise '0'

B_Average_Coverage_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library B

B_Expressed - '1' if the feature is expressed above background noise levels in library B, otherwise '0'

A_Norm - The Cumulative_Coverage value corrected for library size, for the feature in library A

B_Norm - The Cumulative_Coverage value corrected for library size, for the feature in library B

Fold_Change - Fold change ratio (in expression level between library A and B) for the feature

Log2_Diff - Log2 difference (in expression level between library A and B) for the feature

BH - Fisher's exact test p-value corrected for multiple testing by the Benjamini & Hochberg method
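
As a rough illustration of the differential expression fields above, the following Python sketch derives a fold change and a log2 difference from two normalized values and applies a Benjamini & Hochberg correction to a list of p-values. The direction of the comparison (A relative to B) and the pseudocount of 1 that avoids division by zero are assumptions made for this example and may not match what ALEXA-Seq does. The function names are hypothetical.

```python
import math


def log2_diff(a_norm, b_norm, pseudocount=1.0):
    """Log2 difference in expression, A relative to B.

    Assumption: a pseudocount is added so that zero values are defined.
    """
    return math.log2(a_norm + pseudocount) - math.log2(b_norm + pseudocount)


def fold_change(a_norm, b_norm, pseudocount=1.0):
    """Fold change ratio, A relative to B (same assumptions as log2_diff)."""
    return (a_norm + pseudocount) / (b_norm + pseudocount)


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by n / rank and
    # keeping a running minimum so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(log2_diff(7, 1))     # 2.0
print(fold_change(7, 1))   # 4.0
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With these definitions the fold change is always 2 raised to the log2 difference, so either column can be recovered from the other.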

FIELD DESCRIPTIONS - ALTERNATIVE EXPRESSION VALUES

Gene_ID - Unique ALEXA gene ID (use to obtain additional info from an ALEXA gene database). ALEXA and EnsEMBL gene IDs have a one-to-one relationship

Seq_Name - A human readable, abbreviated feature name (e.g. ER1, E1_E3, I2, etc.)

FID - Feature ID. Guaranteed to be unique across all feature types

A_GENE_RAW - The Cumulative_Coverage value divided by the bp size of the gene for library A

B_GENE_RAW - The Cumulative_Coverage value divided by the bp size of the gene for library B

A_SEQ_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library A

B_SEQ_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library B

A_GENE_Norm - Cumulative_Coverage value corrected for library size, for the gene in library A

B_GENE_Norm - Cumulative_Coverage value corrected for library size, for the gene in library B

A_SEQ_Norm - Cumulative_Coverage value corrected for library size, for the feature in library A

B_SEQ_Norm - Cumulative_Coverage value corrected for library size, for the feature in library B

GENE_Fold_Change - Fold change ratio (in expression level between library A and B) for the entire gene

GENE_Log2_Diff - Log2 difference (in expression level between library A and B) for the entire gene

SEQ_Fold_Change - Fold change ratio (in expression level between library A and B) for the feature only

SEQ_Log2_Diff - Log2 difference (in expression level between library A and B) for the feature only

SI - Splicing index ratio (ratio of gene normalized feature expression values between library A and B)

Reciprocal - If gene and feature are changing in opposite directions, reciprocal = 1, otherwise reciprocal = 0

Reciprocity - A metric quantifying how reciprocal the feature change is relative to the gene change (N/A for non-reciprocal changes)

percent_SEQ_Log2_DE - The proportion of expression change that is happening at the feature level as opposed to the gene level

pvalue - Fisher's exact test p-value for change in tag counts between libraries

odds_ratio - Fisher's exact test odds ratio for change in tag counts between libraries

Bonferroni - Fisher's exact test p-value corrected for multiple testing by the Bonferroni method

BH - Fisher's exact test p-value corrected for multiple testing by the Benjamini & Hochberg method
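
The alternative expression fields above can be sketched in Python as follows. This is an illustration under stated assumptions, not the ALEXA-Seq implementation: SI is taken as a plain ratio, A relative to B (it may in practice be reported on a log2 scale), the layout of the 2x2 table of tag counts passed to Fisher's exact test is hypothetical, and the exact-test code is only practical for small counts. All function names are invented for this example.

```python
import math


def splicing_index(a_seq, a_gene, b_seq, b_gene):
    """Ratio of gene-normalized feature expression, A relative to B.

    Assumption: all four values are non-zero normalized expression values.
    """
    return (a_seq / a_gene) / (b_seq / b_gene)


def is_reciprocal(gene_log2_diff, seq_log2_diff):
    """Return 1 if gene and feature change in opposite directions, else 0."""
    return 1 if gene_log2_diff * seq_log2_diff < 0 else 0


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: a, c = tag counts for the feature in libraries
    A and B; b, d = all other tag counts in libraries A and B.
    Returns (odds_ratio, p_value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the same margins.
    probs = {
        x: math.comb(row1, x) * math.comb(row2, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Sum the tables that are at least as unlikely as the observed one.
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, min(1.0, p_value)


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


si = splicing_index(20, 100, 10, 100)
odds, p = fisher_exact_two_sided(3, 1, 1, 3)
print(odds)   # 9.0
```

An SI close to 1 under this definition would indicate that the feature changes in step with its gene, whereas values far from 1 point to a feature-level change such as alternative splicing.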