Periodically, all ORegAnno data records are mapped to their respective current genome builds, and a tab-delimited file is created for upload as an official track in the UCSC Genome Browser. These files may also be useful to anyone without the skills or inclination to work with the XML dumps.

Two formats are provided. The 'UCSC' format contains only the sequences to be uploaded into the UCSC Genome Browser (the ORegAnno track under 'Expression and Regulation'). These sequences/records must (1) be active (not deprecated), (2) have been mapped successfully, and (3) have a positive outcome. The 'FULL' format is very similar, except that it also includes sequences that were not mapped successfully and/or had a negative outcome. The 'FULL' format also includes some additional fields, such as the actual sequences, evidence type and cell type, but it still lacks other details such as comments and validation history. For the most complete records, use the XML dumps or the SOAP web API methods; for many purposes, however, these files will likely contain all the necessary information.

Explanation of data fields:
Species - Species of the sequence/record
UCSC Build - Build name used by the UCSC Genome Browser (e.g., hg18, mm9)
Build - Internal build name; includes names for genomes not available from UCSC (e.g., CI_JGI2_45 for Ciona intestinalis, downloaded from JGI)
Mapping status - Whether the record was successfully mapped to the genome given in 'UCSC Build' or 'Build'
Outcome - Outcome of the experiment. 'Positive outcome' indicates that the experiments confirmed a regulatory function of the sequence.
chrom - Chromosome
chromStart - Start position
chromEnd - End position
Strand - Strand
Stable Id - ORegAnno stable identifier (example: OREG0000017)
Type - Type of ORegAnno record (example: TRANSCRIPTION FACTOR BINDING SITE)
Gene name - Official gene name from NCBI or Ensembl, or a user-defined name
Gene ID - Official gene identifier from NCBI or Ensembl, or a user-defined identifier
Gene Source - NCBI, Ensembl or user-defined
TF name - Official transcription factor name from NCBI or Ensembl, or a user-defined name
TF ID - Official transcription factor identifier from NCBI or Ensembl, or a user-defined identifier
TF Source - NCBI, Ensembl or user-defined
dbSNP ID - dbSNP identifier for Regulatory Polymorphism records
PMID - PubMed identifier of the source reference
Dataset - ORegAnno dataset name (example: FLYREG)
Evidence Subtypes - Semicolon-delimited list of ORegAnno evidence subtypes for the record

Data fields only in the 'FULL' version:
Evidence Lines (type|subtype|celltype) - As of 02 Sep 2008, 'Evidence Lines' replaces 'Evidence Subtypes' in the 'FULL' version. Each evidence line gives the evidence type, subtype and cell type, delimited by "|"; when a record has more than one evidence line, the lines are separated by semicolons.
Regulatory Sequence - Sequence shown to be bound and/or to demonstrate some regulatory function
Regulatory Sequence With Flank - Regulatory sequence together with flanking sequence, used to map it to the genome (necessary when 'Regulatory Sequence' is too short for alignment)
Sequence Search Space - Sequence assayed for function
Polymorphism Reference Sequence - For Regulatory Polymorphism records: the reference genome sequence
Polymorphism Variant Sequence - For Regulatory Polymorphism records: the variant genome sequence shown to affect regulation
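The record-selection rules and the 'Evidence Lines' layout described above lend themselves to a small parser. The Python sketch below reads a tab-delimited file with `csv.DictReader`, splits 'Evidence Lines' into (type, subtype, cell type) triples, and keeps only mapped, positive-outcome rows, roughly approximating criteria (2) and (3) of the 'UCSC' subset when starting from a 'FULL' file. It assumes the file has a header row using the field names listed above and that 'Mapping status' and 'Outcome' use the hypothetical values `MAPPED` and `POSITIVE OUTCOME` (compared case-insensitively); check both against a real download. Criterion (1) (active, not deprecated) is not checked because none of the listed fields records it. The sample rows and the stable IDs `OREG_A` and `OREG_B` are invented for illustration.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, NamedTuple, TextIO

# ASSUMPTIONS (not confirmed by the ORegAnno documentation): the file has a
# header row with these column names, and 'Mapping status' / 'Outcome'
# use these values. Adjust after inspecting a real file.
EVIDENCE_COLUMN = "Evidence Lines (type|subtype|celltype)"
MAPPED_VALUE = "MAPPED"
POSITIVE_VALUE = "POSITIVE OUTCOME"


class EvidenceLine(NamedTuple):
    """One 'type|subtype|celltype' entry from the 'Evidence Lines' field."""
    evidence_type: str
    subtype: str
    cell_type: str


def parse_evidence_lines(field: str) -> List[EvidenceLine]:
    """Split an 'Evidence Lines' value: entries by ';', parts by '|'."""
    result = []
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty values and a trailing ';'
        parts = [p.strip() for p in entry.split("|")]
        parts += [""] * (3 - len(parts))  # pad a missing subtype/cell type
        result.append(EvidenceLine(*parts[:3]))
    return result


def read_records(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per tab-delimited row, keyed by the header row."""
    # QUOTE_NONE: treat any quote characters as ordinary text.
    return csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def ucsc_like(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Keep rows that were mapped successfully and have a positive outcome."""
    for rec in records:
        # 'or ""' guards against None, which DictReader uses for short rows.
        mapped = (rec.get("Mapping status") or "").strip().upper() == MAPPED_VALUE
        positive = (rec.get("Outcome") or "").strip().upper() == POSITIVE_VALUE
        if mapped and positive:
            yield rec


# Made-up two-row sample, for illustration only.
SAMPLE = (
    "Stable Id\tMapping status\tOutcome\t" + EVIDENCE_COLUMN + "\n"
    + "OREG_A\tMAPPED\tPOSITIVE OUTCOME\tTF binding|EMSA|HeLa;Reporter|Luciferase|\n"
    + "OREG_B\tUNMAPPED\tPOSITIVE OUTCOME\t\n"
)

if __name__ == "__main__":
    rows = list(read_records(io.StringIO(SAMPLE)))
    kept = list(ucsc_like(rows))
    print([r["Stable Id"] for r in kept])  # ['OREG_A']
    for ev in parse_evidence_lines(kept[0][EVIDENCE_COLUMN]):
        print(ev.evidence_type, "/", ev.subtype, "/", ev.cell_type or "(no cell type)")
```

Filtering by value rather than by column position keeps the sketch tolerant of column reordering between the 'UCSC' and 'FULL' layouts; to read a real file, pass `open(path, newline="")` to `read_records` instead of the `io.StringIO` sample.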