--- title: "*MLLT1* and *MLLT3* expression in B-cell lymphomas" author: "Laura Hilton" date: today format: LCR_PDF-pdf: # Install with quarto install extension LCR-BCCRC/LCR-PDF toc: true toc-depth: 2 df-print: kable fig-pos: "ht" extra_dependencies: ["float"] html: toc: true toc-depth: 2 df-print: kable # gfm: # toc: true # toc-depth: 2 # df-print: kable execute: echo: false message: false warning: false cache: true bibliography: references.bib csl: nature-medicine.csl --- # Background The University of Montreal team of Javier Di Noia and NoƩ Seija Desivo have generated mouse and cell line models demonstrating the role of two chromatin readers, MLLT1 and MLLT3, in targeting AID to chromatin to induce somatic hypermutation (SHM). They are looking to orthogonally validate some of their findings in lymphomas: "Noe will prepare the coordinates/signal of the MLLT1 ChiP seq in RAMOS and SUDHL5 for correlation with SHM load. "The genes we are interested are all SEC components: the two histone readers (MLLT1 and MLLT3) and three scaffolds (AFF1, AFF3, AFF4) that we find contribute to AID activity. They can form 6 different complexes by combinations of one MLLT and one AFF. "This current manuscript we may submit in early spring (depending on our competitors timeline too) we would like to know if there is any correlation between *MLLT1* or *MLLT3* with AID expression and SHM load." They also have some data from TCGA to suggest that higher expression of *MLLT1* in DLBCL is associated with inferior outcomes. Here we will examine whether *MLLT1/3* expression varies across lymphomas and try to correlate it aSHM load. # Procedure ## Identify samples To examine the expression of *MLLT1/3* across lymphomas, we will incorporate BL, DLBCL, HGBCL-DH-BCL2, and FL RNAseq data used in Hilton et al, 2024 [@Hilton2024]. ```{r} #| label: setup library(tidyverse) library(GAMBLR) library(glue) library(rstatix) library(sigminer) library(BSgenome.Hsapiens.UCSC.hg19) library(quadprog) library(ComplexHeatmap) library(circlize) library(cowplot) library(ggpubr) library(ggbeeswarm) ``` ```{r} #| label: global_variables colours <- list( group = c( "BL EBV-" = "#926CAD", "BL EBV+" = "#B398C6", "FL" = "#EA8368", "HGBCL-DH-BCL2" = "#7A1616", "HGBCL-DH-BCL2 DHITsig+" = "#9c0202", "DZsig+" = "#D62828", "GCB-DLBCL" = "#F58F20", "UNCLASS-DLBCL" = "#05631E", "ABC-DLBCL" = "#05ACEF" ) ) gois <- c( "MLLT1", "MLLT3", "AFF1", "AFF3", "AFF4", "AICDA" ) ``` @tbl-load-metadata summarizes the available RNAseq data from different lymphoma types. ```{r} #| label: tbl-load-metadata #| tbl-cap: "Summary of available expression data" project_base <- "/projects/dscott_prj/breakpoint_architecture/" mrna_md <- read_tsv(glue("{project_base}/data/metadata/breakpoint_capture_md.tsv")) %>% filter(seq_type == "mrna") %>% filter(ICC_class %in% c( "BL", "DLBCL", "FL", "HGBCL-DH-BCL2" )) %>% mutate(group = ifelse( ICC_class == "DLBCL", glue("{dlbcl_call}-DLBCL"), ICC_class )) %>% mutate(group = case_when( group == "GCB-DLBCL" & dhitsig_call == "POS" ~ "DZsig+", group == "BL" & BL_EBV == "EBV-positive" ~ "BL EBV+", group == "BL" & BL_EBV == "EBV-negative" ~ "BL EBV-", group == "BL" ~ NA, TRUE ~ group )) %>% filter(group != "NA-DLBCL", !is.na(group)) %>% filter(!(group == "HGBCL-DH-BCL2" & dhitsig_call != "POS")) %>% mutate(group = factor(group, levels = names(colours$group))) %>% group_by(patient_id, biopsy_id) %>% slice_max(protocol == "Ribodepletion", n = 1, with_ties = FALSE) %>% ungroup() mrna_md %>% count(ICC_class, group) %>% knitr::kable() ``` ```{r} #| label: load_expression expr <- read_tsv(glue("{project_base}/results/expression/vst_matrix.tsv")) %>% select(hgnc_symbol, any_of(mrna_md$sample_id)) %>% filter(hgnc_symbol %in% gois) expr_mat <- column_to_rownames(expr, "hgnc_symbol") expr_long <- expr %>% pivot_longer( -hgnc_symbol, names_to = "sample_id", values_to = "expression" ) %>% left_join(mrna_md) ``` \newpage ## Compare expression across lymphoma types @fig-plot_expression shows the expression of SEC genes across different lymphoma subtypes. As shown before, *AICDA* expression is high in BL and ABC-DLBCL. Notably *MLLT1* expression is lowest in BL, which would be consistent with the globally lower rates of aSHM observed in BL. ```{r} #| label: fig-plot_expression #| fig-cap: "Expression of *AICDA* and SEC complex genes across lymphoma subtypes" #| fig-height: 10 #| fig-width: 10 #| fig-pos: h stats <- expr_long %>% group_by(hgnc_symbol) %>% pairwise_wilcox_test( expression ~ group, p.adjust.method = "BH" ) %>% filter(p.adj.signif != "ns") %>% add_y_position() expr_long %>% ggplot(aes( x = group, y = expression, colour = group )) + geom_boxplot() + geom_quasirandom() + scale_colour_manual(values = colours$group) + stat_pvalue_manual(stats, hide.ns = TRUE) + facet_wrap(~hgnc_symbol, nrow = 2) + theme_cowplot() + xlab("") + ylab("Normalized Expression") + theme( axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none" ) ``` \newpage @fig-scatter shows that there are no strong correlations between *AICDA* expression and that of any other SEC genes. ```{r} #| label: fig-scatter #| fig-cap: "Scatter plot correlating pairwise expression of the SEC genes" #| fig-height: 15 #| fig-width: 15 expr_long %>% select( sample_id, group, hgnc_symbol, expression ) %>% full_join( select(expr_long, sample_id, hgnc_symbol, expression), by = "sample_id", relationship = "many-to-many" ) %>% filter(hgnc_symbol.x != hgnc_symbol.y) %>% filter(hgnc_symbol.x == "AICDA") %>% ggplot(aes( x = expression.x, y = expression.y, colour = group )) + geom_point() + geom_smooth(method = "lm", se = FALSE) + facet_grid(group ~ hgnc_symbol.y, as.table = TRUE) + scale_colour_manual(values = colours$group) + stat_cor(label.y = 3) + xlab("AICDA Expression") + theme_cowplot() + theme(legend.position = "none") ``` \newpage @fig-heatmap shows the expression of each gene across lymphoma entities. ```{r} #| label: fig-heatmap #| fig-cap: "Heatmap of *AICDA* and SEC gene expression ordered on *AICDA* expression" #| fig-height: 8 #| fig-width: 12 # Set sample order based on expression of AICDA sample_order <- expr_long %>% filter(hgnc_symbol == "AICDA") %>% arrange(group, desc(expression)) %>% pull(sample_id) md_heatmap <- expr_long %>% # mutate(C_Gene = factor(C_Gene, levels = rev(levels(fl_igh_long$C_Gene)))) %>% select(sample_id, group) %>% distinct() %>% column_to_rownames("sample_id") md_heatmap <- md_heatmap[sample_order, , drop = FALSE] expr_scaled <- t(apply(expr_mat[c("AICDA", "AFF1", "AFF3", "AFF4", "MLLT1", "MLLT3"), rownames(md_heatmap)], 1, scale)) colnames(expr_scaled) <- rownames(md_heatmap) ha <- HeatmapAnnotation( "Group" = md_heatmap$group, col = list( "Group" = colours$group[names(colours$group) %in% unique(md_heatmap$group)] ), annotation_legend_param = list( "Group" = list(nrow = 1, title_position = "topleft") ) ) heatmap <- Heatmap( expr_scaled, name = " ", row_labels = rownames(expr_scaled), column_order = sample_order, row_names_side = "left", top_annotation = ha, column_split = md_heatmap$group, show_row_names = TRUE, show_column_names = FALSE, cluster_rows = FALSE, cluster_columns = FALSE, cluster_column_slices = FALSE, show_column_dend = FALSE, heatmap_legend_param = list( title = "Normalized Expression", legend_height = unit(5, "cm"), title_position = "leftcenter-rot" ), column_title_gp = gpar(fontsize = 10), row_names_gp = gpar(fontsize = 10) ) draw( heatmap, annotation_legend_side = "top" ) ``` \newpage The next version of the heatmap (@fig-heatmap-clustered) is clustered. This version emphasizes the heterogeneity of expression across and within different lymphoma types. ```{r} #| label: fig-heatmap-clustered #| fig-cap: "Clustered heatmap of *AICDA* and SEC gene expression" #| fig-height: 8 #| fig-width: 13 heatmap <- Heatmap( expr_scaled, name = " ", row_labels = rownames(expr_scaled), column_order = sample_order, row_names_side = "left", top_annotation = ha, column_split = md_heatmap$group, show_row_names = TRUE, show_column_names = FALSE, cluster_rows = TRUE, cluster_columns = TRUE, cluster_column_slices = TRUE, show_column_dend = TRUE, heatmap_legend_param = list( title = "Normalized Expression", legend_height = unit(5, "cm"), title_position = "leftcenter-rot" ), column_title_gp = gpar(fontsize = 10), row_names_gp = gpar(fontsize = 10) ) draw( heatmap, annotation_legend_side = "top" ) ``` \newpage ## Correlate expression with aSHM and SBS84 Next, we will try to layer on aSHM information for these tumours. For this, I'll obtain SSM status for mutations across aSHM space and layer on mutation signatures using SigMiner. This reduces the number of samples shown on the heatmap because not all samples with RNAseq also have WGS. aSHM regions are defined [here](https://raw.githubusercontent.com/morinlab/GAMBLR.data/refs/heads/master/inst/extdata/somatic_hypermutation_locations/0.5/GRCh37.tsv). Then, SigMiner will be run to quantify exposure to SBS84 across aSHM space specifically. ```{r} #| label: quantify_ashm #| output: false genomes <- read_tsv(glue("{project_base}/data/metadata/breakpoint_capture_md.tsv")) %>% filter(seq_type == "genome") %>% mutate(group = ifelse( ICC_class == "DLBCL", glue("{dlbcl_call}-DLBCL"), ICC_class )) %>% mutate(group = ifelse( group == "GCB-DLBCL" & dhitsig_call == "POS", "DZsig+", group )) %>% filter(group != "NA-DLBCL") %>% filter(!(group == "HGBCL-DH-BCL2" & dhitsig_call != "POS")) %>% mutate(group = factor(group, levels = names(colours$group))) ssm_ashm <- get_ssm_by_regions( these_sample_ids = genomes$sample_id, regions_bed = mutate(grch37_ashm_regions, name = str_c(gene, "_", "region")), use_name_column = TRUE, streamlined = FALSE, basic_columns = FALSE ) ssm_ashm_count <- ssm_ashm %>% count(Tumor_Sample_Barcode) %>% dplyr::rename( sample_id = Tumor_Sample_Barcode, total_aSHM_count = n ) %>% left_join(select(genomes, sample_id, biopsy_id)) %>% select(-sample_id) ``` ```{r} #| label: run-sigminer #| output: false maf_obj <- read_maf(ssm_ashm) tally_sbs <- sig_tally( object = maf_obj, ref_genome = "BSgenome.Hsapiens.UCSC.hg19", genome_build = "hg19", mode = "SBS" ) get_sbs <- sig_fit( t(tally_sbs$nmf_matrix), sig_index = "ALL", sig_db = "SBS" ) sbs84 <- data.frame(t(get_sbs["SBS84", , drop = FALSE])) %>% rownames_to_column("sample_id") %>% left_join(select(genomes, sample_id, biopsy_id)) %>% select(-sample_id) ``` The ordered heatmap (@fig-heatmap-SBS84) does not reveal any major patterns of association between expression of SEC genes and levels of aSHM or SBS84 exposure. ```{r} #| label: fig-heatmap-SBS84 #| fig-cap: "Heatmap of SEC gene expression ordered on *AICDA* expression annotated with aSHM amounts" #| fig-height: 8 #| fig-width: 12 # Set sample order based on expression of AICDA sample_order <- expr_long %>% filter(hgnc_symbol == "AICDA") %>% arrange(group, desc(expression)) %>% pull(sample_id) md_heatmap <- expr_long %>% select(sample_id, biopsy_id, group) %>% distinct() %>% left_join(ssm_ashm_count) %>% left_join(sbs84) %>% drop_na(total_aSHM_count) %>% column_to_rownames("sample_id") sample_order <- sample_order[sample_order %in% rownames(md_heatmap)] md_heatmap <- md_heatmap[sample_order, , drop = FALSE] expr_scaled <- expr_scaled[, sample_order] col_ashm <- colorRamp2( c(0, 50, 150), c("white", "orange", "purple") ) col_sbs84 <- colorRamp2( c(0, 50, 150), c("white", "orange", "purple") ) ha <- HeatmapAnnotation( "Group" = md_heatmap$group, "aSHM" = md_heatmap$total_aSHM_count, "SBS84" = md_heatmap$SBS84, col = list( "Group" = colours$group[names(colours$group) %in% unique(md_heatmap$group)], "aSHM" = col_ashm, "SBS84" = col_sbs84 ), annotation_legend_param = list( "Group" = list(nrow = 1, title_position = "topleft"), "aSHM" = list(nrow = 1, title_position = "topleft", direction = "horizontal"), "SBS84" = list(nrow = 1, title_position = "topleft", direction = "horizontal") ) ) heatmap <- Heatmap( expr_scaled, name = " ", row_labels = rownames(expr_scaled), column_order = sample_order, row_names_side = "left", top_annotation = ha, column_split = md_heatmap$group, show_row_names = TRUE, show_column_names = FALSE, cluster_rows = FALSE, cluster_columns = FALSE, cluster_column_slices = FALSE, show_column_dend = FALSE, heatmap_legend_param = list( title = "Normalized Expression", legend_height = unit(5, "cm"), title_position = "leftcenter-rot" ), column_title_gp = gpar(fontsize = 10), row_names_gp = gpar(fontsize = 10) ) draw( heatmap, annotation_legend_side = "top" ) ``` \newpage The clustered heatmap (@fig-heatmap-SBS84-clustered) suggests perhaps some association between levels of *MLLT1* and amount of aSHM in GCB-DLBCL and HGBCL-DH-*BCL2*. Let's explore that further with a correlation plot. ```{r} #| label: fig-heatmap-SBS84-clustered #| fig-cap: "Clustered heatmap of SEC gene expression annotated with aSHM amounts" #| fig-height: 8 #| fig-width: 12 heatmap <- Heatmap( expr_scaled, name = " ", row_labels = rownames(expr_scaled), column_order = sample_order, row_names_side = "left", top_annotation = ha, column_split = md_heatmap$group, show_row_names = TRUE, show_column_names = FALSE, cluster_rows = TRUE, cluster_columns = TRUE, cluster_column_slices = TRUE, show_column_dend = TRUE, heatmap_legend_param = list( title = "Normalized Expression", legend_height = unit(5, "cm"), title_position = "leftcenter-rot" ), column_title_gp = gpar(fontsize = 10), row_names_gp = gpar(fontsize = 10) ) draw( heatmap, annotation_legend_side = "top" ) ``` \newpage There aren't any clear linear correlations between the expression of each SEC gene and either total SBS84 (@fig-cor-SBS84) or total aSHM mutation load (@fig-cor-ashm). However, it's possible that the combination of *MLLT3* with *AICDA* is required to achieve high aSHM. For this, we will bin tumours into combinations of *MLLT3* high/low and *AICDA* high/low. We will set the median expression for each gene within each group to acknowledge the range of expression values across groups. ```{r} #| label: fig-cor-SBS84 #| fig-cap: "Correlation between SBS84 exposure and SEC genes" #| fig-height: 12 #| fig-width: 12 expr_long %>% select( sample_id, biopsy_id, group, hgnc_symbol, expression ) %>% left_join(sbs84) %>% filter(SBS84 < 100) %>% ggplot(aes( x = SBS84, y = expression, colour = group )) + geom_point() + geom_smooth(method = "lm", se = FALSE) + facet_grid(group ~ hgnc_symbol, as.table = TRUE) + scale_colour_manual(values = colours$group) + stat_cor(label.y = 3) + xlab("SBS84 Exposure") + ylab("Normalized Expression") + theme_cowplot() + theme(legend.position = "none") ``` \newpage ```{r} #| label: fig-cor-ashm #| fig-cap: "Correlation between total mutations in aSHM regions and SEC genes" #| fig-height: 12 #| fig-width: 12 expr_long %>% select( sample_id, biopsy_id, group, hgnc_symbol, expression ) %>% left_join(ssm_ashm_count) %>% filter(total_aSHM_count < 300) %>% ggplot(aes( x = total_aSHM_count, y = expression, colour = group )) + geom_point() + geom_smooth(method = "lm", se = FALSE) + facet_grid(group ~ hgnc_symbol, as.table = TRUE) + scale_colour_manual(values = colours$group) + stat_cor(label.y = 3) + xlab("Total aSHM Mutation") + ylab("Normalized Expression") + theme_cowplot() + theme(legend.position = "none") ``` ```{r} #| label: set-cutoffs med_aicda <- expr_long %>% filter(hgnc_symbol == "AICDA") %>% group_by(group) %>% summarize(med_aicda = median(expression, na.rm = TRUE)) med_mllt3 <- expr_long %>% filter(hgnc_symbol == "MLLT3") %>% group_by(group) %>% summarize(med_mllt3 = median(expression, na.rm = TRUE)) %>% ungroup() mllt3_aicda_groups <- expr_long %>% filter(hgnc_symbol %in% c("MLLT3", "AICDA")) %>% select(sample_id, group, hgnc_symbol, expression) %>% pivot_wider( names_from = "hgnc_symbol", values_from = "expression" ) %>% left_join(med_mllt3) %>% left_join(med_aicda) %>% mutate(AICDA_MLLT3_group = case_when( AICDA > med_aicda & MLLT3 > med_mllt3 ~ "high/high", AICDA > med_aicda & MLLT3 <= med_mllt3 ~ "high/low", AICDA <= med_aicda & MLLT3 > med_mllt3 ~ "low/high", AICDA <= med_aicda & MLLT3 <= med_mllt3 ~ "low/low" )) %>% select(sample_id, AICDA_MLLT3_group) mrna_md_binned <- mrna_md %>% left_join(sbs84) %>% left_join(ssm_ashm_count) %>% left_join(mllt3_aicda_groups) ``` \newpage There is no significant association between either SBS84 exposure (@fig-sbs84-binned) and *AICDA*/*MLLT3* expression or total aSHM mutations (@fig-ashm-binned); however, there is a pattern in HGBCL-DH-*BCL2*, DZSig+, and ABC-DLBCL where the cases with the highest aSHM/SBS84 burden are in the group with high *AICDA* and *MLLT3* expression. Let's explore this further with a linear model. ```{r} #| label: fig-sbs84-binned #| fig-cap: "Total aSHM mutations exposure binned by *AICDA* and *MLLT3* expression" #| fig-height: 5 #| fig-width: 12 stats <- mrna_md_binned %>% group_by(group) %>% pairwise_wilcox_test( total_aSHM_count ~ AICDA_MLLT3_group, p.adjust.method = "BH", ref.group = "low/low" ) %>% filter(p.adj.signif != "ns") mrna_md_binned %>% filter(!is.na(AICDA_MLLT3_group)) %>% ggplot(aes( x = AICDA_MLLT3_group, y = total_aSHM_count, colour = group )) + geom_boxplot(outlier.shape = NA) + geom_quasirandom() + facet_wrap(~group, nrow = 1) + scale_colour_manual(values = colours$group) + theme_cowplot() + theme( legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1) ) + ylab("Total aSHM Mutations") + xlab("Binned AICDA/MLLT3 Expression") ``` ```{r} #| label: fig-ashm-binned #| fig-cap: "SBS84 exposure binned by *AICDA* and *MLLT3* expression" #| fig-height: 5 #| fig-width: 12 stats <- mrna_md_binned %>% group_by(group) %>% pairwise_wilcox_test( SBS84 ~ AICDA_MLLT3_group, p.adjust.method = "BH", ref.group = "low/low" ) %>% filter(p.adj.signif != "ns") mrna_md_binned %>% filter(!is.na(AICDA_MLLT3_group)) %>% ggplot(aes( x = AICDA_MLLT3_group, y = SBS84, colour = group )) + geom_boxplot(outlier.shape = NA) + geom_quasirandom() + facet_wrap(~group, nrow = 1) + scale_colour_manual(values = colours$group) + theme_cowplot() + theme( legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1) ) + xlab("Binned AICDA/MLLT3 Expression") ``` \newpage @tbl-linear-model-ashm shows an association between *AICDA* expression and aSHM independent of lymphoma type, while there is no significant association with either *MLLT1* or *MLLT3*. A similar pattern appears for SBS84 (@tbl-linear-model-SBS84). ```{r} #| label: tbl-linear-model-ashm #| tbl-cap: "Linear model of aSHM vs expression of SEC genes" lm_tbl <- expr_long %>% left_join(sbs84) %>% left_join(ssm_ashm_count) %>% select( sample_id, group, expression, hgnc_symbol, total_aSHM_count, SBS84 ) %>% pivot_wider( names_from = "hgnc_symbol", values_from = "expression" ) ashm_lm <- glm(total_aSHM_count ~ group + AICDA + MLLT3 + MLLT1, data = lm_tbl) broom::tidy(ashm_lm) %>% knitr::kable() ``` ```{r} #| label: tbl-linear-model-SBS84 #| tbl-cap: "Linear model of SBS84 vs expression of SEC genes" sbs84_lm <- glm(SBS84 ~ group + AICDA + MLLT3 + MLLT1, data = lm_tbl) broom::tidy(sbs84_lm) %>% knitr::kable() ``` # Conclusions These analyses confirm an overall association between *AICDA* expression and levels of aSHM and SBS84 exposure. However, meaningful correlations with *MLLT1* and *MLLT3* expression were not identified. Further analyses of loci bound by these histone readers, as identified by ChIP-seq, may further be used to tease apart subtype-specific patterns of aSHM driven by these proteins. # References ::: {#refs} :::