validate subpackage¶
Sub-package Documentation¶
The validation sub-package is responsible for pulling supporting reads from the bam file and re-calling events based on the evidence in a standard notation.
Types of Output Files¶
A variety of intermediate output files are given for the user. These can be used to “drill down” further into events and also for developers debugging when adding new features, etc.
expected name/suffix | file type/format | content |
---|---|---|
*.raw_evidence.bam |
bam | raw evidence |
*.contigs.bam |
bam | aligned contigs |
*.evidence.bed |
bed | evidence collection window regions |
*.validation-passed.bed |
bed | validated event positions |
*.validation-failed.tab |
text/tabbed | failed events |
*.validation-passed.tab |
text/tabbed | validated events |
*.contigs.fa |
fasta | assembled contigs |
*.contigs.blat_out.pslx |
pslx | results from blatting contigs |
*.igv.batch |
IGV batch file | igv batch file |
Algorithm Overview¶
(For each breakpoint pair)
- Calculate the window/region to read from the bam and collect evidence
- Store evidence (flanking read pair, half-mapped read, spanning read, split read, compatible flanking pairs) which match the expected event type and position
- Assemble a contig from the collected reads. see theory - assembling contigs
Generate a fasta file containing all the contig sequences
Align contigs to the reference genome (currently blat is used to perform this step)
Make the final event calls. Each level of calls consumes all supporting reads so they are not re-used in subsequent levels of calls.
(For each breakpoint pair)
- call by contig
- call by spanning read
- call by split read
- call by flanking read pair. see theory - calling breakpoints by flanking evidence
Output new calls, evidence, contigs, etc
modules