Setup

library(ithi.utils)
load_base_libs()

library(methods)
library(grid)
library(gridExtra)
library(gridBase)

library(ithi.meta)
library(ithi.figures)
library(ithi.utils)
library(ithi.seq)
library(ithi.clones)
library(ithi.supp)
library(ithi.xcr)
ihc_table_path <- snakemake@input$ihc_table
xcr_table_path <- snakemake@input$xcr_table
tcr_diversity_file <- snakemake@input$tcr_diversity
bcr_diversity_file <- snakemake@input$bcr_diversity
nanostring_data_path <- snakemake@input$nanostring_data
nanostring_annotations_path <- snakemake@input$nanostring_annotations

db_path <- snakemake@params$db
annotation_colours <- ithi.figures::get_annotation_colours()

ihc_table <- fread(ihc_table_path)
xcr_table <- read_clonotypes(xcr_table_path, duplicates = FALSE, db_path = db_path)

Read 3.3% of 304822 rows
Read 36.1% of 304822 rows
Read 72.2% of 304822 rows
Read 304822 rows and 18 (of 18) columns from 0.070 GB file in 00:00:06
xcr_diversity <- ithi.supp::get_xcr_diversity(tcr_diversity_file, bcr_diversity_file, 
    db_path, xcr_table)

exprs <- fread(nanostring_data_path)
nanostring_labels <- fread(nanostring_annotations_path)

pathway_matrix <- ithi.expr::create_pathway_matrix(exprs, nanostring_labels, 
    db_path, convert_ids = TRUE, summary_method = "arithmetic_mean")
pathway_df <- pathway_matrix %>% t %>% as.data.frame %>% rownames_to_column(var = "condensed_id")
pathway_df$patient_id <- ithi.meta::map_id(pathway_df$condensed_id, from = "condensed_id", 
    to = "patient_id", db_path)

Analysis

The first point is expected. We commented on this in the manuscript – saying that patients with higher mean CD8+ TIL density also had higher TCR repertoire similarity.

The example given emphasized the wrong point – the point I wanted to highlight was that uniformity in TIL densities/T-cell associated gene expression does not imply uniformity in clonotype composition, highlighting the need to go beyond TIL densities when considering the possible utility of these TIL in clinical applications like adoptive cell therapy. If you’re treating a clonally heterogeneous disease, you better grab the TIL needed for all the clones – especially given our finding that TCR clones appear to track with tumour clones across space. The lesson would be to not be fooled by uniformity in TIL densities across sites – what appears to be uniformity in TIL response on IHC DOES NOT mean you can take TIL from only one site when doing adoptive cell therapy.

That being said, our analysis showing that higher TIL infiltrate => lower TCR variation implies that, for patients with high TIL densities, that last statement isn’t relevant, if patients with high TIL densities are the most promising candidates for this type of therapy.

There aren’t really any striking examples of patients with high TIL density, low TIL density variation, and low TCR similarity. So perhaps this line of analysis isn’t really that important, and should be kept to a side note, as it was presented in the paper, or entirely removed.

What do you guys think?

xcr_table$patient_id <- ithi.meta::factor_id(xcr_table$patient_id %>% as.character, 
    type = "patient_id", db_path)
# xcr_gt2_patients <- xcr_table %>% group_by(patient_id) %>%
# summarise(nsamples=length(unique(condensed_id))) %>% subset(nsamples > 2)

intersect_patients <- Reduce(intersect, list(xcr_table$patient_id, ihc_table$patient_id, 
    pathway_df$patient_id))
xcr_table_intersect <- subset(xcr_table, patient_id %in% intersect_patients)
xcr_table_intersect$patient_id <- factor(xcr_table_intersect$patient_id, levels = intersect(levels(xcr_table_intersect$patient_id), 
    intersect_patients))
ihc_table_intersect <- subset(ihc_table, patient_id %in% intersect_patients)
pathway_df_intersect <- subset(pathway_df, patient_id %in% intersect_patients)

tcr_objects <- plot_xcr_variability_patient(xcr_table_intersect, segment_type = "TRB", 
    filter_tissue = FALSE, id_type = "condensed_id", distance_method = "horn", 
    db_path, force_font = FALSE, resample = TRUE, include_diversity = FALSE, 
    tcr_diversity, bcr_diversity, margins = "minimal", xlabel = "", reorder_values = TRUE)
tcr_var_plot <- tcr_objects$p
tcr_patient_order <- tcr_objects$order

bcr_objects <- plot_xcr_variability_patient(xcr_table_intersect, segment_type = "IGH", 
    filter_tissue = FALSE, id_type = "condensed_id", distance_method = "horn", 
    db_path, force_font = FALSE, resample = TRUE, include_diversity = FALSE, 
    tcr_diversity, bcr_diversity, margins = "minimal", xlabel = "", reorder_values = TRUE)
bcr_var_plot <- bcr_objects$p
bcr_patient_order <- bcr_objects$order


T_cell_features <- c("T_CD8_density", "T_CD4_density", "T-Cell Functions")
B_cell_features <- c("T_CD20_density", "T_Plasma_density", "B-Cell Functions")
id_vars <- c("condensed_id", "patient_id")
T_cell_ihc_features <- intersect(T_cell_features, colnames(ihc_table_intersect))
B_cell_ihc_features <- intersect(B_cell_features, colnames(ihc_table_intersect))
T_cell_expr_features <- intersect(T_cell_features, colnames(pathway_df_intersect))
B_cell_expr_features <- intersect(B_cell_features, colnames(pathway_df_intersect))

ihc_table_intersect_melted <- reshape2::melt(ihc_table_intersect, id.vars = id_vars, 
    measure.vars = c(T_cell_ihc_features, B_cell_ihc_features), variable.name = "tiltype", 
    value.name = "density")
pathway_df_intersect_melted <- reshape2::melt(pathway_df_intersect, id.vars = id_vars, 
    measure.vars = c(T_cell_expr_features, B_cell_expr_features), variable.name = "pathway", 
    value.name = "expression")
pathway_df_intersect_melted$patient_id <- ithi.meta::factor_id(pathway_df_intersect_melted$patient_id, 
    type = "patient_id", db_path)

features_list <- list(T_cell_features, B_cell_features)
names(features_list) <- c("T_cell", "B_cell")

plot_boxes <- function(dat, yvar, ylabel, xlabel = "", log_y = FALSE, order = NULL, 
    center = FALSE) {
    if (!is.null(order)) {
        dat$patient_id <- factor(dat$patient_id, levels = as.character(order))
    }
    if (center) {
        dat <- dat %>% dplyr::group_by(patient_id) %>% mutate_(.dots = setNames(list(lazyeval::interp(quote(x_val - 
            mean(x_val, na.rm = TRUE)), x_val = as.name(yvar))), yvar))
    }
    
    p <- ggplot(dat, aes_string(x = "patient_id", y = yvar)) + stat_boxplot(geom = "errorbar", 
        width = 0.3) + geom_boxplot(aes(fill = patient_id)) + theme_bw() + theme_Publication() + 
        theme_nature() + ylab(ylabel) + guides(fill = FALSE) + xlab(xlabel) + 
        ithi.utils::ggmargins(type = "minimal") + scale_fill_manual(values = annotation_colours$patient_id)
    if (log_y) {
        p <- p + scale_y_continuous(breaks = ithi.utils::log_scale_breaks(), 
            labels = ithi.utils::log_scale_labels())
    }
    return(p)
}

order_boxplots <- FALSE
log_y <- TRUE
center <- FALSE

produce_variation_boxplots <- function(features, ihc_table_intersect_melted, 
    pathway_df_intersect_melted, patient_order = NULL) {
    ihc_feats <- intersect(features, ihc_table_intersect_melted$tiltype)
    
    ihc_plots <- lapply(ihc_feats, function(x) {
        dat <- subset(ihc_table_intersect_melted, tiltype == x)
        p <- plot_boxes(dat, yvar = "density", ylabel = x, xlabel = "", log_y = log_y, 
            order = patient_order, center = center)
        return(p)
    })
    names(ihc_plots) <- ihc_feats
    
    expr_feats <- intersect(features, pathway_df_intersect_melted$pathway)
    expr_plots <- lapply(expr_feats, function(x) {
        dat <- subset(pathway_df_intersect_melted, pathway == x)
        p <- plot_boxes(dat, yvar = "expression", ylabel = x, xlabel = "Patient", 
            order = patient_order, center = center)
        return(p)
    })
    names(expr_plots) <- expr_feats
    
    return(c(ihc_plots, expr_plots))
}
tcr_variation_boxplots <- produce_variation_boxplots(features = T_cell_features, 
    ihc_table_intersect_melted = ihc_table_intersect_melted, pathway_df_intersect_melted = pathway_df_intersect_melted, 
    patient_order = tcr_patient_order)

bcr_variation_boxplots <- produce_variation_boxplots(features = B_cell_features, 
    ihc_table_intersect_melted = ihc_table_intersect_melted, pathway_df_intersect_melted = pathway_df_intersect_melted, 
    patient_order = bcr_patient_order)


T_cell_components <- list(tcr_var_plot, tcr_variation_boxplots$T_CD8_density, 
    tcr_variation_boxplots$T_CD4_density, tcr_variation_boxplots$`T-Cell Functions`)
B_cell_components <- list(bcr_var_plot, bcr_variation_boxplots$T_CD20_density, 
    bcr_variation_boxplots$T_Plasma_density, bcr_variation_boxplots$`B-Cell Functions`)

T_cell_grobs <- lapply(T_cell_components, ggplotGrob)
B_cell_grobs <- lapply(B_cell_components, ggplotGrob)
T_cell_plot <- do.call(gridExtra::rbind.gtable, T_cell_grobs)
B_cell_plot <- do.call(gridExtra::rbind.gtable, B_cell_grobs)
grid.newpage()
grid.draw(T_cell_plot)

grid.newpage()
grid.draw(B_cell_plot)

What you should be looking to compare is the VALUE of the first boxplot in each 4-tuple with the SPREAD of each of the boxplots in the subsequent 3 boxplots. I realize this is confusing – what would be a better way of showing this?

  1. A key example to note would be, for example, patient 15 – high TCR similarity (low variation) while having high variation in T_CD8_density and T_CD4_density. This suggests that, despite having very similar TCR repertoires across the board, the degree of infiltration can vary substantially from site to site.

  2. Another interesting example would be patient 17 – fairly low variation in CD8 and CD4 densities, with relatively high CD8 and CD4 densities, and fairly variable TCR repertoires (low TCR similarity). This would imply that – even in patients that are uniformly TIL-hot, they can harbor vastly different TCR repertoires from site to site. However, I might be squinting a bit too hard at the plot; I would not say this is a particularly strong example. It’s not surprising that this case is rare, because high CD8 TIL density is associated with high TCR similarity.

Perhaps we can change the presentation of this entire section – we can start off by introducing how TCR/BCR diversity correlate to TIL densities, then show these variation plots (or a better version of them), and start by describing the lack of patients falling in (2) (on the other hand, low TCR similarity with similar TIL densities does exist, but only for TIL-cold patients), and from there talk about how CD8+ TIL density is correlated to TCR similarity. That way this information isn’t redundant, and provides a more comprehensive view of intrapatient variability before going into summary statistics.

CAMILA: Would be great to get your comments/thoughts about this intrapatient variation analysis.

---
title: "Variation analysis"
---
                        ```{r, echo=FALSE, message=FALSE, warning=FALSE}

######## Snakemake header ########
library(methods)
Snakemake <- setClass(
    "Snakemake",
    slots = c(
        input = "list",
        output = "list",
        params = "list",
        wildcards = "list",
        threads = "numeric",
        log = "list",
        resources = "list",
        config = "list",
        rule = "character"
    )
)
snakemake <- Snakemake(
    input = list('notebooks/immune_variation.Rmd', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', '/shahlab/alzhang/projects/ITH_Immune/data/expression/nanostring/pancancer_annotations.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', '/shahlab/alzhang/projects/ITH_Immune/results/nanostring_results/ith_full/qc/limma_quantile/normalized_expression_voa_labels_filtered.tsv', "notebook" = 'notebooks/immune_variation.Rmd', "ihc_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', "bcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', "nanostring_annotations" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/nanostring/pancancer_annotations.tsv', "xcr_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', "tcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', "nanostring_data" = '/shahlab/alzhang/projects/ITH_Immune/results/nanostring_results/ith_full/qc/limma_quantile/normalized_expression_voa_labels_filtered.tsv'),
    output = list('/shahlab/alzhang/projects/ITH_Immune/paper/results/review/notebooks/run2/immune_variation.nb.html'),
    params = list('immune_variation_analysis', '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3', "name" = 'immune_variation_analysis', "db" = '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3'),
    wildcards = list(),
    threads = 1,
    log = list('/shahlab/alzhang/clusttmp/paperreview2/notebooks/immune_variation_analysis.log'),
    resources = list(),
    config = list("he_results_dir" = '/shahlab/alzhang/data/ithi/finn_results/he_output_Nov29', "prevalence_threshold" = 0.01, "rooney_mutsigcv_file" = '/shahlab/alzhang/projects/ITH_Immune/external/other_papers/mmc6.xlsx', "molsubtypes" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/molsubtypes.tsv', "known_subtypes_array" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/array/subtypes/known_subtypes.tsv', "snv_cluster_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster', "til_clusters_output" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/intermediates/run2/til_clusters_output.txt', "ith_icgc_bc" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_icgc_merged_bc.tsv', "tilcluster_supervised_ipynb" = '/shahlab/alzhang/projects/ITH_Immune/paper/review/ipy/tilcluster_supervisedmulticlass.ipynb', "clone_trees" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/tree_data.tsv', "mmctm_final_patient_dir" = '/shahlab/alzhang/projects/ITH_Immune/results/mmctm_results/ith_by-patient_with-ov', "clola_result_file" = '/shahlab/alzhang/pipeline_outputs/ith_immune/clola/run4/clola_condensed_results/beta/clola_results.tsv', "somatic_coding_result_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/somatic_coding_variants', "ith_stats" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_statistics.tsv', "tcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', "total_tiltypes" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density'), "icgc_specimen" = '/shahlab/alzhang/data/ICGC/specimen.tsv', "array_expression_file" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/array/gene_exprs_rma_batch_corrected.txt', "tumour_purity" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/tumour_purity.tsv', "image_summary2" = '/shahlab/alzhang/data/ithi/yuan_hecr_image_results_2.csv', "notebook_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/review/notebooks/run2', "epitopes_unique_filtered" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/epitopes_unique_filtered.tsv', "variability_type" = 'stabilize', "distance_method" = 'horn', "tils_for_variability" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density'), "benchmarkdir" = '/shahlab/alzhang/benchmarks/paperreview2', "tils_for_cluster" = c('E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "breakpoint_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_breakpoints.tsv', "clone_prevalences" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/clone_data.tsv', "image_summary" = '/shahlab/alzhang/data/ithi/yuan_hecr_image_results.csv', "remixt_cellularity_ploidy" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/remixt_cellularity_ploidy.tsv', "logdir" = '/shahlab/alzhang/clusttmp/paperreview2', "ith_stat_types" = c('entropy', 'postprocessed_divergence', 'combined_ith_normalized', 'proportion_subclonal'), "ihc_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', "finnhe_pipeline_results_dir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/finnhe/run1', "ihc_features_output" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/intermediates/run2/ihc_features_output.txt', "snv_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_snvs.tsv', "igpartition_outdir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/igpartition/run22', "nanostring_data" = '/shahlab/alzhang/projects/ITH_Immune/results/nanostring_results/ith_full/qc/limma_quantile/normalized_expression_voa_labels_filtered.tsv', "bcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', "refseq_gene_file" = '/shahlab/alzhang/data/genome/hg19/refseq_genes.bed', "xcr_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', "nanostring_annotations" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/nanostring/pancancer_annotations.tsv', "clone_branch_lengths" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/branch_data.tsv', "icgc_subtypes" = '/shahlab/alzhang/data/ICGC/icgc_primary_tumour_subtypes.tsv', "copynumber_table" = '/shahlab/alzhang/data/ithi/master_copynumber_file.tsv', "db" = '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3', "patients_for_clonal" = c(1, 2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17), "all_tiltypes" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density', 'E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "table_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/review/tables/run2', "neoediting_outdir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/neoediting/run6'),
    rule = 'immune_variation_analysis'
)
######## Original script #########

                        ```


## Setup

```{r global_chunk_options, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy=TRUE, warning=FALSE, message=FALSE, cache=TRUE) #cache=TRUE
```

```{r}
library(ithi.utils)
load_base_libs()

library(methods)
library(grid)
library(gridExtra)
library(gridBase)

library(ithi.meta)
library(ithi.figures)
library(ithi.utils)
library(ithi.seq)
library(ithi.clones)
library(ithi.supp)
library(ithi.xcr)
```

```{r}
ihc_table_path <- snakemake@input$ihc_table
xcr_table_path <- snakemake@input$xcr_table
tcr_diversity_file <- snakemake@input$tcr_diversity
bcr_diversity_file <- snakemake@input$bcr_diversity
nanostring_data_path <- snakemake@input$nanostring_data
nanostring_annotations_path <- snakemake@input$nanostring_annotations

db_path <- snakemake@params$db
```

```{r}
annotation_colours <- ithi.figures::get_annotation_colours()

ihc_table <- fread(ihc_table_path)
xcr_table <- read_clonotypes(xcr_table_path, duplicates = FALSE, db_path = db_path)

xcr_diversity <- ithi.supp::get_xcr_diversity(tcr_diversity_file, bcr_diversity_file, db_path, xcr_table)

exprs <- fread(nanostring_data_path)
nanostring_labels <- fread(nanostring_annotations_path)

pathway_matrix <- ithi.expr::create_pathway_matrix(exprs, nanostring_labels, db_path, convert_ids = TRUE, summary_method = "arithmetic_mean")
pathway_df <- pathway_matrix %>% t %>% as.data.frame %>% rownames_to_column(var = "condensed_id")
pathway_df$patient_id <- ithi.meta::map_id(pathway_df$condensed_id, from = "condensed_id", to = "patient_id", db_path)
```

## Analysis

The first point is expected. We commented on this in the manuscript -- saying that patients with higher mean CD8+ TIL density also had higher TCR repertoire similarity. 

The example given emphasized the wrong point -- the point I wanted to highlight was that uniformity in TIL densities/T-cell associated gene expression does not imply uniformity in clonotype composition, highlighting the need to go beyond TIL densities when considering the possible  utility of these TIL in clinical applications like adoptive cell therapy. If you're treating a clonally heterogeneous disease, you better grab the TIL needed for all the clones -- especially given our finding that TCR clones appear to track with tumour clones across space. The lesson would be to not be fooled by uniformity in TIL densities across sites -- what appears to be uniformity in TIL response on IHC DOES NOT mean you can take  TIL from only one site when doing adoptive cell therapy. 

That being said, our analysis showing that higher TIL infiltrate => lower TCR variation implies that, for patients with high TIL densities, that last statement isn't relevant, if patients with high TIL densities are the most promising candidates for this type of therapy. 

There aren't really any striking examples of patients with high TIL density, low TIL density variation, and low TCR similarity. **So perhaps this line of analysis isn't really that important, and should be kept to a side note, as it was presented in the paper, or entirely removed.**

What do you guys think?

```{r}
xcr_table$patient_id <- ithi.meta::factor_id(xcr_table$patient_id %>% as.character, type = "patient_id", db_path)
#xcr_gt2_patients <- xcr_table %>% group_by(patient_id) %>% summarise(nsamples=length(unique(condensed_id))) %>% subset(nsamples > 2)

intersect_patients <- Reduce(intersect, list(xcr_table$patient_id, ihc_table$patient_id, pathway_df$patient_id))
xcr_table_intersect <- subset(xcr_table, patient_id %in% intersect_patients)
xcr_table_intersect$patient_id <- factor(xcr_table_intersect$patient_id, levels = intersect(levels(xcr_table_intersect$patient_id), intersect_patients))
ihc_table_intersect <- subset(ihc_table, patient_id %in% intersect_patients)
pathway_df_intersect <- subset(pathway_df, patient_id %in% intersect_patients)

tcr_objects <- plot_xcr_variability_patient(xcr_table_intersect, segment_type = "TRB", filter_tissue = FALSE,
                                            id_type = "condensed_id", distance_method = "horn", db_path, force_font = FALSE, resample = TRUE,
                                            include_diversity = FALSE, tcr_diversity, bcr_diversity, margins = "minimal", xlabel = "",
                                            reorder_values = TRUE)
tcr_var_plot <- tcr_objects$p
tcr_patient_order <- tcr_objects$order

bcr_objects <- plot_xcr_variability_patient(xcr_table_intersect, segment_type = "IGH", filter_tissue = FALSE,
                                            id_type = "condensed_id", distance_method = "horn", db_path, force_font = FALSE, resample = TRUE,
                                            include_diversity = FALSE, tcr_diversity, bcr_diversity, margins = "minimal", xlabel = "",
                                            reorder_values = TRUE)
bcr_var_plot <- bcr_objects$p
bcr_patient_order <- bcr_objects$order


T_cell_features <- c("T_CD8_density", "T_CD4_density",
                     "T-Cell Functions")
B_cell_features <- c("T_CD20_density", "T_Plasma_density",
                     "B-Cell Functions")
id_vars <- c("condensed_id", "patient_id")
T_cell_ihc_features <- intersect(T_cell_features, colnames(ihc_table_intersect))
B_cell_ihc_features <- intersect(B_cell_features, colnames(ihc_table_intersect))
T_cell_expr_features <- intersect(T_cell_features, colnames(pathway_df_intersect))
B_cell_expr_features <- intersect(B_cell_features, colnames(pathway_df_intersect))

ihc_table_intersect_melted <- reshape2::melt(ihc_table_intersect, id.vars = id_vars, measure.vars = c(T_cell_ihc_features, B_cell_ihc_features), 
                                             variable.name = "tiltype", value.name = "density")
pathway_df_intersect_melted <- reshape2::melt(pathway_df_intersect, id.vars = id_vars, measure.vars = c(T_cell_expr_features, B_cell_expr_features), variable.name = "pathway", value.name = "expression")
pathway_df_intersect_melted$patient_id <- ithi.meta::factor_id(pathway_df_intersect_melted$patient_id, type = "patient_id", db_path)

features_list <- list(T_cell_features, B_cell_features)
names(features_list) <- c("T_cell", "B_cell")

plot_boxes <- function(dat, yvar, ylabel, xlabel = "", log_y = FALSE, order = NULL, center = FALSE) {
  if (!is.null(order)) {
    dat$patient_id <- factor(dat$patient_id, levels = as.character(order))
  }
  if (center) {
    dat <- dat %>% dplyr::group_by(patient_id) %>% mutate_(.dots = setNames(list(lazyeval::interp(quote(x_val - mean(x_val, na.rm=TRUE)), x_val = as.name(yvar))), yvar))
  }
  
  p <- ggplot(dat, aes_string(x="patient_id", y = yvar)) + stat_boxplot(geom = 'errorbar', width = 0.3) + geom_boxplot(aes(fill=patient_id)) + theme_bw() + theme_Publication() + theme_nature() + ylab(ylabel) + guides(fill=FALSE) + xlab(xlabel) + ithi.utils::ggmargins(type = "minimal") + scale_fill_manual(values = annotation_colours$patient_id)
  if (log_y) {
    p <- p + scale_y_continuous(breaks = ithi.utils::log_scale_breaks(), labels = ithi.utils::log_scale_labels())
  }
  return(p)
}

order_boxplots <- FALSE
log_y <- TRUE
center <- FALSE

produce_variation_boxplots <- function(features, ihc_table_intersect_melted, 
                                       pathway_df_intersect_melted, patient_order = NULL) {
  ihc_feats <- intersect(features, ihc_table_intersect_melted$tiltype)
  
  ihc_plots <- lapply(ihc_feats, function(x) {
    dat <- subset(ihc_table_intersect_melted, tiltype == x)
    p <- plot_boxes(dat, yvar = "density", ylabel = x, xlabel = "", log_y = log_y, order = patient_order, center = center)
    return(p)
  })
  names(ihc_plots) <- ihc_feats
  
  expr_feats <- intersect(features, pathway_df_intersect_melted$pathway)
  expr_plots <- lapply(expr_feats, function(x) {
    dat <- subset(pathway_df_intersect_melted, pathway == x)
    p <- plot_boxes(dat, yvar = "expression", ylabel = x, xlabel = "Patient", order = patient_order, center = center)
    return(p)
  })
  names(expr_plots) <- expr_feats
  
  return(c(ihc_plots, expr_plots))
}
```

```{r}
tcr_variation_boxplots <- produce_variation_boxplots(features = T_cell_features, ihc_table_intersect_melted = ihc_table_intersect_melted, 
                                                     pathway_df_intersect_melted = pathway_df_intersect_melted, patient_order = tcr_patient_order)

bcr_variation_boxplots <- produce_variation_boxplots(features = B_cell_features, ihc_table_intersect_melted = ihc_table_intersect_melted, 
                                                     pathway_df_intersect_melted = pathway_df_intersect_melted, patient_order = bcr_patient_order)


T_cell_components <- list(tcr_var_plot,
                          tcr_variation_boxplots$T_CD8_density,
                          tcr_variation_boxplots$T_CD4_density,
                          tcr_variation_boxplots$`T-Cell Functions`
                          )
B_cell_components <- list(bcr_var_plot,
                          bcr_variation_boxplots$T_CD20_density,
                          bcr_variation_boxplots$T_Plasma_density,
                          bcr_variation_boxplots$`B-Cell Functions`
                          )

T_cell_grobs <- lapply(T_cell_components, ggplotGrob)
B_cell_grobs <- lapply(B_cell_components, ggplotGrob)
```

```{r}
T_cell_plot <- do.call(gridExtra::rbind.gtable, T_cell_grobs)
B_cell_plot <- do.call(gridExtra::rbind.gtable, B_cell_grobs)
```


```{r, fig.height = 10}
grid.newpage()
grid.draw(T_cell_plot)
```

```{r, fig.height = 10}
grid.newpage()
grid.draw(B_cell_plot)
```

What you should be looking to compare is the VALUE of the first boxplot in each 4-tuple with the SPREAD of each of the boxplots in the subsequent 3 boxplots. I realize this is confusing -- what would be a better way of showing this? 

(1) A key example to note would be, for example, patient 15 -- high TCR similarity (low variation) while having high variation in T_CD8_density and T_CD4_density. This suggests that, despite having very similar TCR repertoires across the board, the degree of infiltration can vary substantially from site to site.

(2) Another interesting example would be patient 17 -- fairly low variation in CD8 and CD4 densities, with relatively high CD8 and CD4 densities, and fairly variable TCR repertoires (low TCR similarity). This would imply that -- even in patients that are uniformly TIL-hot, they can harbor vastly different TCR repertoires from site to site. However, I might be squinting a bit too hard at the plot; I would not say this is a particularly strong example. It's not surprising that this case is rare, because high CD8 TIL density is associated with high TCR similarity. 

Perhaps we can change the presentation of this entire section -- we can start off by introducing how TCR/BCR diversity correlate to TIL densities, then show these variation plots (or a better version of them), and start by describing the lack of patients falling in (2) (on the other hand, low TCR similarity with similar TIL densities does exist, but only for TIL-cold patients), and from there talk about how CD8+ TIL density is correlated to TCR similarity. That way this information isn't redundant, and provides a more comprehensive view of intrapatient variability before going into summary statistics. 


**CAMILA**: Would be great to get your comments/thoughts about this intrapatient variation analysis. 
