processor ID: 17939
==========================================================================
|------------------------------------------------------------------|
|                                                                  |
|                *** Running a seeded analysis ***                 |
|                                                                  |
|------------------------------------------------------------------|

command line:
/home/grobertson/Fi/Code/20110605/GADEM_v1.3.1/bin/gadem -fseq /projects/remc_bigdata/Karsan/motifs/20110604/NFKB-ChIP-seq.score-300.L-100.12381-seqs.hg18.fa -fpwm0 /home/grobertson/Fi/PWMs/NFKB1_JASPAR_MA0105.mx -minN 6000 -maskR 1 -fout /projects/remc_bigdata/Karsan/motifs/20110604/report.txt -verbose 1 -nbs 20

Data:
  input (ChIP) sequence file: /projects/remc_bigdata/Karsan/motifs/20110604/NFKB-ChIP-seq.score-300.L-100.12381-seqs.hg18.fa
  number of sequences in input file: 12381
  average sequence length: 737
  total number of nucleotides: 9133212
  [a,c,g,t] frequencies: 0.2280 0.2720 0.2720 0.2280

Motif model:
  use the user-specified pwm as the starting PWM
  /home/grobertson/Fi/PWMs/NFKB1_JASPAR_MA0105.mx
  This pwm is repeatedly used as the starting PWM for the EM algorithm.
  Similar (motif variants) or different motifs may be identified.
  To identify motif variants set -ev large, e.g., 10000 and -minN small,
  e.g., numSeq/10. This allows gadem to identify motif variants that are
  present in at least 10 percent of the sequences. A large log(E-value)
  cutoff ensures such motifs are found.

Background model:
  background model estimated from the input data:
  /projects/remc_bigdata/Karsan/motifs/20110604/NFKB-ChIP-seq.score-300.L-100.12381-seqs.hg18.fa
  0.228050 0.271950 0.271950 0.228050
  background Markov order: 0th

Declaring BINDING SITES:
  pwm score p-value cutoff for declaring binding site: 2.500000e-04
  null pwm log-likelihood ratio score distribution determined using:
  Staden probability generating function method (Comput. Appl. Biosci., 5,89,1989).
Declaring MOTIFS:
  motif p-value (significance of alignment) is computed using the subroutine from MEME
  log(E-value) cutoff: 0.00

Motif prior probability type:
  motif prior probability type (see documentation): 1 (gaussian motif location prior)

Genetic Algorithm (GA):
  running a seeded analysis - GA not needed.

EM:
  maximal number of EM steps: 40
  EM convergence criterion: 1.000000e-04
  fraction (number) input sequences subject to EM 1.00 (12381)

MAXP value:
  run EM on the starting pwm /home/grobertson/Fi/PWMs/NFKB1_JASPAR_MA0105.mx 10 times, each with a different maxp:
  0.10*numSeq 0.20*numSeq 0.30*numSeq 0.40*numSeq 0.50*numSeq
  0.60*numSeq 0.70*numSeq 0.80*numSeq 0.90*numSeq 1.00*numSeq
  no spaced dyads are generated and used. pop=10 gen=1 (no GA).

Other parameters:
  simple repeats such as 'aaaaaaaa' (see usage) are masked before running GADEM
  minimal no. sites for each motif: 6000
  base extension and trimming? yes

job started: Sun Jun 5 07:15:53 2011
=========================================================================
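
Note on the MAXP schedule: the ten values reported above (0.10*numSeq through 1.00*numSeq) can be made concrete for this run's 12381 input sequences. The following is a minimal sketch and not GADEM's own code. The function name maxp_schedule is hypothetical, and rounding down to a whole number of sequences is an assumption, since the log does not state how fractional values are handled.

```python
def maxp_schedule(num_seq, steps=10):
    """Return the maxp values k/steps * num_seq for k = 1..steps.

    Assumption: fractional results are rounded down (integer division).
    The log only lists the multipliers, not the rounding rule.
    """
    if num_seq <= 0 or steps <= 0:
        raise ValueError("num_seq and steps must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return [num_seq * k // steps for k in range(1, steps + 1)]


# numSeq taken from "number of sequences in input file: 12381" in the log.
schedule = maxp_schedule(12381)
for k, maxp in enumerate(schedule, start=1):
    print(f"{k / 10:.2f}*numSeq = {maxp}")
```

Under the rounding-down assumption, the schedule for this run starts at 1238 and ends at 12381, one EM run per value.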