Methyl-C data from Lister et al.:

http://genomebrowser.wustl.edu/twang_stuff/Cydney/

General CpG content and MRE fragment files:

http://hgwdev.cse.ucsc.edu/~tingwang/Costello/remc.html

----- Notes from Ting ------

(1) Yes, it's from the Lister paper. I have non-CG data as well, but that
would be strand-specific. For simplicity and for display purposes I
only put CpG data there. If you need the non-CG data, I can send it to you.
However, most non-CG methylation occurs in regions that are already highly
methylated at CpGs, so my feeling is you don't lose anything by working
only on the CpG data.

(2) How the score is defined: at any given site you have a total C
count and an mC count (a methylated C still reads as C after bisulfite
conversion, while an unmethylated C reads as T, so Total_C = C + T reads).
Usually people use mC/Total_C to indicate methylation level. I used
mC/Total_C*1000 - 500, purely for display, so that unmethylated regions
get a negative score and are displayed differently. The score therefore
runs from -500 (fully unmethylated) to +500 (fully methylated), and 0
means there were equal numbers of methylated and unmethylated reads at
that position, i.e. the site is half methylated. A CpG with no reads gets
no score, which is why the H1 file has about 26M lines rather than one
line for each of the roughly 28M CpGs.
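A minimal Python sketch of the score defined above, assuming you already have per-CpG mC and Total_C counts. The function names are hypothetical, and the layout of the score files themselves is not specified here.

```python
from typing import Optional


def display_score(mc: int, total_c: int) -> Optional[float]:
    """Display score for one CpG: mC/Total_C*1000 - 500, in [-500, 500].

    total_c counts C reads (methylated) plus T reads (unmethylated).
    Returns None when the CpG has no reads, since such sites get no score.
    """
    if total_c == 0:
        return None
    return mc / total_c * 1000 - 500


def level_from_score(score: float) -> float:
    """Invert the display score back to a methylation level mC/Total_C in [0, 1]."""
    return (score + 500) / 1000


print(display_score(5, 10))    # 0.0, half methylated
print(display_score(10, 10))   # 500.0, fully methylated
print(display_score(0, 8))     # -500.0, unmethylated
print(level_from_score(250))   # 0.75
```

Because the transform is linear, averaging scores and then converting gives the same result as averaging levels directly, so working in either unit is fine for window averages.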

(3) I think you can start by calculating an average for each window.
You don't need to normalize by CpG density with this data. Depending
on the window size, you might want to throw out windows with very few
CpGs to avoid small-sample bias. You may also try to distinguish
regions with sub-structure: for example, a window whose first half is
methylated and whose second half is unmethylated has the same average
as a region that is partially methylated throughout. This type of data
almost always calls for a variable window size. I bet you can get most
of what you need by simply computing an average; then you can go after
more subtle things with a more complex model of the score.
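One way to sketch the window average and a crude sub-structure check in Python. Assumptions not taken from the notes: the input is a list of (position, score) pairs for one chromosome, windows are fixed 1 kb bins, windows with fewer than 5 scored CpGs are dropped, and the sub-structure check simply compares the mean score of the two halves of a window. All names and defaults are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (CpG position in bp, display score) pairs for one chromosome.
Sites = List[Tuple[int, float]]


def window_averages(sites: Sites, window_size: int = 1000, min_cpgs: int = 5) -> Dict[int, float]:
    """Mean score per fixed-size window, keyed by window start position.

    Windows with fewer than min_cpgs scored CpGs are dropped to avoid
    small-sample bias.
    """
    bins: Dict[int, List[float]] = {}
    for pos, score in sites:
        start = (pos // window_size) * window_size
        bins.setdefault(start, []).append(score)
    return {
        start: mean(scores)
        for start, scores in sorted(bins.items())
        if len(scores) >= min_cpgs
    }


def half_difference(sites: Sites, window_start: int, window_size: int = 1000) -> Optional[float]:
    """Mean score of a window's second half minus that of its first half.

    A large absolute value (the score spans 1000 units) hints at sub-structure
    that the plain window average hides. Returns None if either half has no CpGs.
    """
    mid = window_start + window_size // 2
    end = window_start + window_size
    first = [s for p, s in sites if window_start <= p < mid]
    second = [s for p, s in sites if mid <= p < end]
    if not first or not second:
        return None
    return mean(second) - mean(first)


# Methylated first half, unmethylated second half: the average is 0, the same as
# a uniformly half-methylated window, but half_difference tells them apart.
split = [(p, 500.0) for p in (100, 200, 300, 400)] + [(p, -500.0) for p in (600, 700, 800, 900)]
uniform = [(p, 0.0) for p in range(100, 1000, 100)]
print(window_averages(split), half_difference(split, 0))
print(window_averages(uniform), half_difference(uniform, 0))
```

Fixed bins keep the sketch short; as noted above, a variable window size is usually the better fit for this data, and a proper segmentation model would replace the half-split heuristic.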

The other thing to keep in mind is that each CpG has a different total C
count; in other words, the confidence interval of the measurement is
different for each CpG. This information is lost in the translation to
scores. If you are interested in regional calls, I've done this before:
collect all reads in the region and compute mC/Total_C for the entire
region (not for individual CpGs). I think I can pull that kind of data
for you as well, if you are interested.
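A sketch of the regional call described above, next to the naive average of per-CpG levels, to show why the distinction matters. It assumes you have raw (mC, Total_C) counts per CpG rather than scores. The per-site interval uses the Wilson score interval, a standard binomial interval chosen here for illustration, not something specified in these notes; all function names are hypothetical.

```python
import math
from typing import List, Tuple

# Each entry is (mC reads, Total_C reads) for one CpG in the region.
Counts = List[Tuple[int, int]]


def pooled_level(counts: Counts) -> float:
    """Regional level: total mC over total C, pooling all reads in the region."""
    total_mc = sum(mc for mc, _ in counts)
    total_c = sum(tc for _, tc in counts)
    if total_c == 0:
        raise ValueError("region has no reads")
    return total_mc / total_c


def mean_of_site_levels(counts: Counts) -> float:
    """Naive alternative: average of per-CpG mC/Total_C, every CpG weighted equally."""
    levels = [mc / tc for mc, tc in counts if tc > 0]
    if not levels:
        raise ValueError("region has no reads")
    return sum(levels) / len(levels)


def wilson_interval(mc: int, total_c: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for one CpG's methylation level."""
    if total_c <= 0:
        raise ValueError("CpG has no reads")
    p = mc / total_c
    z2 = z * z
    denom = 1 + z2 / total_c
    centre = (p + z2 / (2 * total_c)) / denom
    half = z * math.sqrt(p * (1 - p) / total_c + z2 / (4 * total_c ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


counts = [(1, 1), (10, 40)]
print(pooled_level(counts))          # 11/41, about 0.268
print(mean_of_site_levels(counts))   # 0.625
print(wilson_interval(1, 1), wilson_interval(10, 40))
```

Here the CpG covered by a single read pulls the naive average up to 0.625, while the pooled regional level stays near 0.27; the Wilson interval for that one-read CpG is also far wider than for the 40-read CpG, which is exactly the per-site confidence information the scores discard.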