ALEXA-SEQ SCHEMA DESCRIPTION

Each of the following field names is followed by a brief description. Note that some fields apply to specific feature types only.

FIELD DESCRIPTIONS - BASIC

XXX_ID - Primary ID of the feature type (Transcript_ID, ExonRegion_ID, Junction_ID, etc.). Guaranteed to be unique within the feature type
FID - Feature ID. Guaranteed to be unique across all feature types
Gene_ID - Unique ALEXA gene ID (use to obtain additional info from an ALEXA gene database). ALEXA and EnsEMBL gene IDs have a one-to-one relationship
Gene_Name - EnsEMBL gene symbol
EnsEMBL_Gene_ID - EnsEMBL gene ID
Gene_Evidence - Whether the gene is known or predicted
Gene_Type - Gene type according to EnsEMBL (e.g. protein_coding, pseudogene, miRNA, etc.)
Description - Full gene description from EnsEMBL
Supporting_EST_Count - The number of EST alignments from ESTs of the target species supporting the feature
Supporting_EnsEMBL_Count - The number of EnsEMBL transcripts supporting the feature
Supporting_mRNA_Count - The number of mRNA alignments from mRNAs of the target species supporting the feature
Supporting_xEST_Count - The number of EST alignments from ESTs of non-target species supporting the feature
Supporting_xmRNA_Count - The number of mRNA alignments from mRNAs of non-target species supporting the feature
Conserved_Species_Count - The number of species (other than the target species) with ESTs/mRNAs supporting the feature

FIELD DESCRIPTIONS - COORDINATE/COMPOSITION RELATED

Chromosome - Source chromosome
Unit1_start - Start position in Gene coordinates (i.e. from start of the first exon to the end of the last exon including all introns)
Unit1_end - End position in Gene coordinates (i.e. from start of the first exon to the end of the last exon including all introns)
Unit1_start_chr - Start position in Chromosome coordinates
Unit1_end_chr - End position in Chromosome coordinates
Unit2_end - Similar to 'Unit1' above but for the downstream portion of an exon-exon junction sequence (applies to exon-exon junction sequences only)
Unit2_end_chr - Similar to 'Unit1' above but for the downstream portion of an exon-exon junction sequence (applies to exon-exon junction sequences only)
Unit2_start - Similar to 'Unit1' above but for the downstream portion of an exon-exon junction sequence (applies to exon-exon junction sequences only)
Unit2_start_chr - Similar to 'Unit1' above but for the downstream portion of an exon-exon junction sequence (applies to exon-exon junction sequences only)
Strand - Chromosome strand of the feature ('1' = '+ve strand'; '-1' = '-ve strand')
Seq_Name - A human readable, abbreviated feature name (e.g. ER1, E1_E3, I2, etc.)
Base_Count - Length of the feature in nucleotide bases
UnMasked_Base_Count - Number of bases that are not masked according to EnsEMBL
Coding_Base_Count - Number of bases of the feature that overlap 1 or more known open reading frames

FIELD DESCRIPTIONS - FEATURE-SPECIFIC ANNOTATIONS

Active_Base_Count - The number of 'Active' (EST/mRNA supported) bases contained within an intronic or intergenic region
Active_Region_Count - The number of discrete 'Active' (EST/mRNA supported) sub-regions contained within an intronic or intergenic region
Silent_Base_Count - The number of 'Silent' (not supported by an EST/mRNA sequence) bases contained within an intronic or intergenic region
Silent_Region_Count - The number of discrete 'Silent' (not supported by an EST/mRNA sequence) sub-regions contained within an intronic or intergenic region
Boundary_Type - For exon boundaries, whether the boundary corresponds to an Exon-Intron (Donor) or Intron-Exon (Acceptor) site
Upstream_Gene_ID - For intergenic regions, the closest EnsEMBL gene in the upstream direction (indicated by ALEXA gene ID)
Downstream_Gene_ID - For intergenic regions, the closest EnsEMBL gene in the downstream direction (indicated by ALEXA gene ID)
Exon_Count - For genes, the number of unique exons
Exon_Content_Count - For genes, the number of distinct exon-clusters within a gene when overlapping exons are merged
Exons_Skipped - For exon-exon junctions, the number of exons skipped by the connection of two exons (0 for canonical junctions)
Feature_List - For transcripts, the names of the features that uniquely identify expression of this transcript
Gene_ID_List - For intronic regions corresponding to overlapping genes on opposite strands, the list of these gene IDs
Multiple_Genes - Whether an intron region could correspond to multiple genes
Specific_Exon_Region_Count - For transcripts, the number of exon regions that are specific to this transcript
Specific_Junction_Count - For transcripts, the number of exon-exon junctions that are specific to this transcript
Specific_Trans_ID - For exon regions, exon-exon junctions and exon boundaries. If applicable, the ID of the transcript these features are unique to
Transcript_Count - For genes, the number of known transcripts
Transcript_Size - For transcripts, the length of the complete transcript. Base_Count for transcript features only considers the bases that are unique to each transcript

FIELD DESCRIPTIONS - EXPRESSION VALUES

Cumulative_Coverage - The cumulative number of bases mapped to a particular feature. For example, if a 42-mer read is mapped within the bounds of an exon, this would contribute 42 to the cumulative coverage value
Average_Coverage_RAW - The Cumulative_Coverage value divided by the bp size of the feature
Average_coverage_NORM1 - A correction of the Cumulative_Coverage value to allow comparisons between libraries of different size
Bases_Covered_1x - The total number of base positions of a feature that were covered to a sequencing depth of 1x or greater. A feature may have a high Average_Coverage value, but this does not necessarily mean that all of its base positions were sequenced
Percent_Coverage_1x - The proportion of all base positions of a feature that were covered to a sequencing depth of 1x or greater
Percent_Coverage_5x - The proportion of all base positions of a feature that were covered to a sequencing depth of 5x or greater
Percent_Coverage_10x - The proportion of all base positions of a feature that were covered to a sequencing depth of 10x or greater
Percent_Coverage_100x - The proportion of all base positions of a feature that were covered to a sequencing depth of 100x or greater
Expressed - '1' if the feature was deemed to be expressed above the level of background noise, otherwise '0'
Percent_Gene_Expression - The expression of a feature relative to the expression of the gene to which it belongs (not applicable for intergenic features)

FIELD DESCRIPTIONS - DIFFERENTIAL EXPRESSION VALUES

Seq_Name - A human readable, abbreviated feature name (e.g. ER1, E1_E3, I2, etc.)
FID - Feature ID. Guaranteed to be unique across all feature types
A_Average_Coverage_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library A
A_Expressed - '1' if the feature is expressed above background noise levels in library A, otherwise '0'
B_Average_Coverage_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library B
B_Expressed - '1' if the feature is expressed above background noise levels in library B, otherwise '0'
A_Norm - Cumulative_Coverage value corrected for library size, for the feature in library A
B_Norm - Cumulative_Coverage value corrected for library size, for the feature in library B
Fold_Change - Fold change ratio (in expression level between library A and B) for the feature
Log2_Diff - Log2 difference (in expression level between library A and B) for the feature
BH - Fisher's exact test p-value corrected for multiple testing by Benjamini & Hochberg method

FIELD DESCRIPTIONS - ALTERNATIVE EXPRESSION VALUES

Gene_ID - Unique ALEXA gene ID (use to obtain additional info from an ALEXA gene database). ALEXA and EnsEMBL gene IDs have a one-to-one relationship
Seq_Name - A human readable, abbreviated feature name (e.g. ER1, E1_E3, I2, etc.)
FID - Feature ID. Guaranteed to be unique across all feature types
A_GENE_RAW - The Cumulative_Coverage value divided by the bp size of the gene for library A
B_GENE_RAW - The Cumulative_Coverage value divided by the bp size of the gene for library B
A_SEQ_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library A
B_SEQ_RAW - The Cumulative_Coverage value divided by the bp size of the feature for library B
A_GENE_Norm - Cumulative_Coverage value corrected for library size, for the gene in library A
B_GENE_Norm - Cumulative_Coverage value corrected for library size, for the gene in library B
A_SEQ_Norm - Cumulative_Coverage value corrected for library size, for the feature in library A
B_SEQ_Norm - Cumulative_Coverage value corrected for library size, for the feature in library B
GENE_Fold_Change - Fold change ratio (in expression level between library A and B) for the entire gene
GENE_Log2_Diff - Log2 difference (in expression level between library A and B) for the entire gene
SEQ_Fold_Change - Fold change ratio (in expression level between library A and B) for the feature only
SEQ_Log2_Diff - Log2 difference (in expression level between library A and B) for the feature only
SI - Splicing index ratio (ratio of gene normalized feature expression values between library A and B)
Reciprocal - '1' if the gene and the feature are changing in opposite directions, otherwise '0'
Reciprocity - A metric quantifying how reciprocal the feature change is relative to the gene change (N/A for non-reciprocal changes)
percent_SEQ_Log2_DE - The proportion of expression change that is happening at the feature level as opposed to the gene level
pvalue - Fisher's exact test p-value for change in tag counts between libraries
odds_ratio - Fisher's exact test odds ratio for change in tag counts between libraries
Bonferroni - Fisher's exact test p-value corrected for multiple testing by Bonferroni method
BH - Fisher's exact test p-value corrected for multiple testing by Benjamini & Hochberg method
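
EXAMPLE - HOW SOME OF THE VALUES RELATE

The following Python sketch illustrates how several of the fields described above relate to one another. It is not the ALEXA-Seq implementation. The function names and the per-base depth list are hypothetical. It assumes that the Percent_Coverage fields are reported as percentages, that fold changes are taken as library A over library B, and that Reciprocal compares the signs of the gene and feature Log2 differences. The library size correction behind the NORM fields is not specified here, so the sketch takes already-normalized values as input and does not handle zero values.

```python
import math


def coverage_metrics(depths, thresholds=(1, 5, 10, 100)):
    """Summarise per-base sequencing depths for one feature.

    depths: hypothetical list with one depth value per base position.
    """
    base_count = len(depths)
    cumulative = sum(depths)  # every mapped base adds 1 to the total
    metrics = {
        "Base_Count": base_count,
        "Cumulative_Coverage": cumulative,
        "Average_Coverage_RAW": cumulative / base_count,
        "Bases_Covered_1x": sum(1 for d in depths if d >= 1),
    }
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        # Assumption: expressed as a percentage of all base positions
        metrics[f"Percent_Coverage_{t}x"] = 100.0 * covered / base_count
    return metrics


def fold_change_and_log2_diff(a, b):
    """Fold change and Log2 difference, assuming the direction A over B."""
    return a / b, math.log2(a) - math.log2(b)


def splicing_index(a_seq, a_gene, b_seq, b_gene):
    """Ratio of gene-normalized feature expression between A and B."""
    return (a_seq / a_gene) / (b_seq / b_gene)


def reciprocal_flag(gene_log2_diff, seq_log2_diff):
    """1 if gene and feature change in opposite directions, else 0."""
    return 1 if gene_log2_diff * seq_log2_diff < 0 else 0


# A 10 bp feature: high depth at some positions, none at others
m = coverage_metrics([0, 0, 1, 1, 2, 2, 2, 2, 6, 6])
print(m["Average_Coverage_RAW"], m["Percent_Coverage_1x"])

# Feature goes up (8 vs 2) while its gene goes down (2 vs 4)
seq_fc, seq_log2 = fold_change_and_log2_diff(8.0, 2.0)
gene_fc, gene_log2 = fold_change_and_log2_diff(2.0, 4.0)
print(splicing_index(8.0, 2.0, 2.0, 4.0), reciprocal_flag(gene_log2, seq_log2))
```

In the first example the average coverage is above 2x even though two of the ten positions were never sequenced, which is the situation the Bases_Covered_1x description warns about.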