Setup

library(ithi.utils)
load_base_libs()

library(methods)
library(grid)
library(gridExtra)
library(gridBase)

library(ithi.meta)
library(ithi.figures)
library(ithi.utils)
library(ithi.seq)
library(ithi.clones)
library(ithi.supp)
library(ithi.xcr)
ihc_table_path <- snakemake@input$ihc_table
xcr_table_path <- snakemake@input$xcr_table
neoediting_outdir <- snakemake@input$neoediting_outdir
snv_cluster_files <- snakemake@input$snv_cluster_files
clone_tree_file <- snakemake@input$clone_tree_file
clone_branch_length_file <- snakemake@input$clone_branch_length_file
clone_prevalence_file <- snakemake@input$clone_prevalence_file
tcr_diversity_file <- snakemake@input$tcr_diversity
bcr_diversity_file <- snakemake@input$bcr_diversity
epitopes_unique_file <- snakemake@input$epitopes_unique_filtered

db_path <- snakemake@params$db
annotation_colours <- ithi.figures::get_annotation_colours()

ihc_table <- fread(ihc_table_path)
xcr_table <- read_clonotypes(xcr_table_path, duplicates = FALSE, db_path = db_path)

Read 36.1% of 304822 rows
Read 88.6% of 304822 rows
Read 304822 rows and 18 (of 18) columns from 0.070 GB file in 00:00:04
tree_branch_data <- read_clone_tree_data(clone_tree_file, clone_branch_length_file, 
    clone_prevalence_file, db_path)

neoediting_res <- supp_neoediting(neoediting_outdir, ihc_table, db_path, tree_branch_data, 
    wtfilter = TRUE, full_epitopes = FALSE, snv_cluster_files = snv_cluster_files)
epitopes_unique <- fread(epitopes_unique_file)

Analysis

clonotype_publicity <- xcr_table %>% dplyr::group_by_(.dots = c("id", "type")) %>% 
    dplyr::summarise(npatients = length(unique(patient_id)))

public_clonotypes <- subset(clonotype_publicity, npatients > 1)
xcr_table_public <- subset(xcr_table, id %in% public_clonotypes$id)
xcr_table_public$id_simple <- factor(xcr_table_public$id) %>% as.numeric

xcr_table_public_sorted <- xcr_table_public[order(xcr_table_public$id_simple), 
    ]
xcr_table_public_sorted_subset <- subset(xcr_table_public_sorted, select = c("voa", 
    "patient_id", "condensed_id", "count", "freq", "type", "id_simple"))
ggplot(xcr_table, aes(x = freq)) + geom_histogram() + scale_x_log10() + theme_bw() + 
    theme_Publication() + theme_nature() + facet_wrap(~type, scales = "free")

ggplot(xcr_table_public_sorted, aes(x = freq)) + geom_histogram() + scale_x_log10() + 
    theme_bw() + theme_Publication() + theme_nature() + facet_wrap(~type, scales = "free")

So some of these clonotypes are definitely expanded. We can do a quick and dirty test to see if this is significant:

n <- 10000
segment_types <- unique(xcr_table_public_sorted$type)

empirical_pvals <- sapply(segment_types, function(segment) {
    dat <- subset(xcr_table_public_sorted, type == segment)
    observed_median <- dat$freq %>% median
    
    draws <- nrow(dat)
    null_medians <- sapply(1:n, function(i) sample(subset(xcr_table, type == 
        segment)$freq, size = draws, replace = FALSE) %>% median)
    
    p_empirical <- length(which(null_medians >= observed_median))/n
    
    return(p_empirical)
})
names(empirical_pvals) <- segment_types

empirical_pvals
IGH TRB 
  0   1 

So public TCRs are not expanded, while public BCRs are. This is interesting – but there is a possibility that contamination occurred to a greater degree in the BCR assays compared to the TCR assays. Unfortunately, some low-level contamination is unavoidable with the plate-based layout – even after doing informatic filtering.

So I’m hesitant to over-interpret this finding.

Next we need to see if public TCR clones are more commonly present in patients with public BCR clones.

The patients with public TCR and BCR clones are:

subset(xcr_table_public_sorted_subset, type == "TRB")$patient_id %>% unique %>% 
    sort
 [1]  1  2  3  4  7  8  9 14 17 20 21 22 23
subset(xcr_table_public_sorted_subset, type == "IGH")$patient_id %>% unique %>% 
    sort
 [1]  1  2  3  4  7  8  9 11 12 13 14 15 17 18 19 20 21 22 23

Almost every patient contains a public BCR clone. So instead, we should check if pairs correspond to each other (i.e if there are more patient A-B pairs for both TCR and BCR).

patient_combinations <- xcr_table_public_sorted_subset %>% dplyr::group_by(type, 
    id_simple) %>% do(unique_patients = unique(sort(.$patient_id)))

trb_contamination_combinations <- subset(patient_combinations, type == "TRB")
igh_contamination_combinations <- subset(patient_combinations, type == "IGH")

shared_combinations <- intersect(trb_contamination_combinations$unique_patients, 
    igh_contamination_combinations$unique_patients)

So 3 out of 15 TCR and 19 BCR patient combinations are shared. What shared combinations refers to are cases where patients A and B both share both >= 1 common TCR and BCR clone.

While this is probably higher than what’s expected purely by chance – and perhaps you could argue that publicity in TCRs correlates with publicity in BCRs – do keep in mind that contamination would have the same type of signal (patients w/ samples that have been contaminated could be contaminated in both TCRs and BCRs. Furthermore, if contamination occurs at a high level in TCRs – which makes it hard to filter out – it is also more likely to occur at a high level in BCRs.) So the potential biological interest of this finding is obscured by the very real – and quite probable – possibility that this is all just due to rare instances of contamination that couldn’t be adequately filtered out.

# epitopes_annotated <- neoediting_res$nonsynonymous
epitopes_annotated <- epitopes_unique
neoantigen_table <- subset(epitopes_annotated, `NetMHCpan MT Percentile` <= 
    2)

neoantigen_table$mutant_epitope_id <- factor(neoantigen_table$`MT Epitope Seq`) %>% 
    as.numeric

mutant_epitope_publicity <- neoantigen_table %>% dplyr::group_by(mutant_epitope_id, 
    `MT Epitope Seq`) %>% dplyr::summarise(npatient = length(unique(patient_id)))

nrow(subset(mutant_epitope_publicity, npatient > 1))
[1] 0

So there are no public neoepitopes. Nevertheless, as the relationship between TCRs/BCRs and antigens is many-to-many, this is not surprising and does not rule out the possibility of tumour-reactive, public clonotypes.

That being said, there is an aspect of the plate-based layout that is vulnerable to cross-sample contamination. Unfortunately this is unavoidable; and despite bioinformatic filters employed to eliminate this (Supp Methods), there is the possibility that public clonotypes are the result of contamination. As this probability is non-zero, we should perhaps acknowledge this in the main text?

Summary

We have public TCR and BCR clonotypes in many patients. Only patients 10 and 16 lack public BCR clonotypes. They also lack public TCR clonotypes, but the significance of these 2 patients is likely just because TIL densities are very low in these 2 patients.

However, the most likely source of public clonotypes is due to contamination, as a result of the plate-based layout of library preparation. We’ve taken steps to eliminate what appear to be likely incidences of contamination (see Supplementary Methods), but we cannot claim to capture all instances. As such, I’m hesitant to over-interpret any such findings – even if we find that certain samples that share more public TCR clonotypes also share more public BCR clonotypes – this would also be consistent with a pattern due to contamination, where samples that are contaminated for TCRs are more likely to be contaminated for BCRs (and vice versa), as the library prep for both is done with the same layout of samples on the plate(s).

We can confidently say that no neoepitopes are shared between patients.

---
title: "Public clonotype/neoantigen analysis"
---
                        ```{r, echo=FALSE, message=FALSE, warning=FALSE}

######## Snakemake header ########
library(methods)
Snakemake <- setClass(
    "Snakemake",
    slots = c(
        input = "list",
        output = "list",
        params = "list",
        wildcards = "list",
        threads = "numeric",
        log = "list",
        resources = "list",
        config = "list",
        rule = "character"
    )
)
snakemake <- Snakemake(
    input = list('/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', '/shahlab/alzhang/pipeline_outputs/ith_immune/neoediting/run6', '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_1.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_2.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_3.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_4.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_7.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_9.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_10.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_11.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_12.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_13.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_14.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_15.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_16.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_17.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/clone_data.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', 'notebooks/public_clonotypes.Rmd', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/epitopes_unique_filtered.tsv', '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/branch_data.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/tree_data.tsv', "xcr_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', "neoediting_outdir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/neoediting/run6', "bcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', "snv_cluster_files" = c('/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_1.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_2.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_3.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_4.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_7.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_9.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_10.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_11.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_12.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_13.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_14.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_15.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_16.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster/patient_17.tsv'), "clone_prevalence_file" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/clone_data.tsv', "ihc_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', "notebook" = 'notebooks/public_clonotypes.Rmd', "epitopes_unique_filtered" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/epitopes_unique_filtered.tsv', "tcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', "clone_branch_length_file" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/branch_data.tsv', "clone_tree_file" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/tree_data.tsv'),
    output = list('/shahlab/alzhang/projects/ITH_Immune/paper/results/review/notebooks/run2/public_clonotypes.nb.html'),
    params = list('public_clonotype_analysis', '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3', "name" = 'public_clonotype_analysis', "db" = '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3'),
    wildcards = list(),
    threads = 1,
    log = list('/shahlab/alzhang/clusttmp/paperreview2/notebooks/public_clonotype_analysis.log'),
    resources = list(),
    config = list("nanostring_data" = '/shahlab/alzhang/projects/ITH_Immune/results/nanostring_results/ith_full/qc/limma_quantile/normalized_expression_voa_labels_filtered.tsv', "tils_for_variability" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density'), "neoediting_outdir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/neoediting/run6', "snv_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_snvs.tsv', "total_tiltypes" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density'), "ith_stat_types" = c('entropy', 'postprocessed_divergence', 'combined_ith_normalized', 'proportion_subclonal'), "clola_result_file" = '/shahlab/alzhang/pipeline_outputs/ith_immune/clola/run4/clola_condensed_results/beta/clola_results.tsv', "logdir" = '/shahlab/alzhang/clusttmp/paperreview2', "tilcluster_supervised_ipynb" = '/shahlab/alzhang/projects/ITH_Immune/paper/review/ipy/tilcluster_supervisedmulticlass.ipynb', "icgc_subtypes" = '/shahlab/alzhang/data/ICGC/icgc_primary_tumour_subtypes.tsv', "epitopes_unique_filtered" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/epitopes_unique_filtered.tsv', "benchmarkdir" = '/shahlab/alzhang/benchmarks/paperreview2', "ith_stats" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_statistics.tsv', "igpartition_outdir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/igpartition/run22', "icgc_specimen" = '/shahlab/alzhang/data/ICGC/specimen.tsv', "prevalence_threshold" = 0.01, "patients_for_clonal" = c(1, 2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17), "ihc_features_output" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/intermediates/run2/ihc_features_output.txt', "til_clusters_output" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/intermediates/run2/til_clusters_output.txt', "rooney_mutsigcv_file" = '/shahlab/alzhang/projects/ITH_Immune/external/other_papers/mmc6.xlsx', "xcr_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', "image_summary" = '/shahlab/alzhang/data/ithi/yuan_hecr_image_results.csv', "snv_cluster_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster', "tils_for_cluster" = c('E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "ith_icgc_bc" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_icgc_merged_bc.tsv', "distance_method" = 'horn', "remixt_cellularity_ploidy" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/remixt_cellularity_ploidy.tsv', "refseq_gene_file" = '/shahlab/alzhang/data/genome/hg19/refseq_genes.bed', "clone_trees" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/tree_data.tsv', "molsubtypes" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/molsubtypes.tsv', "nanostring_annotations" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/nanostring/pancancer_annotations.tsv', "all_tiltypes" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density', 'E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "mmctm_final_patient_dir" = '/shahlab/alzhang/projects/ITH_Immune/results/mmctm_results/ith_by-patient_with-ov', "table_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/review/tables/run2', "image_summary2" = '/shahlab/alzhang/data/ithi/yuan_hecr_image_results_2.csv', "tumour_purity" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/tumour_purity.tsv', "bcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', "ihc_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', "clone_branch_lengths" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/branch_data.tsv', "clone_prevalences" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/clone_data.tsv', "copynumber_table" = '/shahlab/alzhang/data/ithi/master_copynumber_file.tsv', "he_results_dir" = '/shahlab/alzhang/data/ithi/finn_results/he_output_Nov29', "db" = '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3', "somatic_coding_result_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/somatic_coding_variants', "array_expression_file" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/array/gene_exprs_rma_batch_corrected.txt', "notebook_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/review/notebooks/run2', "known_subtypes_array" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/array/subtypes/known_subtypes.tsv', "variability_type" = 'stabilize', "tcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', "breakpoint_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_breakpoints.tsv', "finnhe_pipeline_results_dir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/finnhe/run1'),
    rule = 'public_clonotype_analysis'
)
######## Original script #########

                        ```


## Setup

```{r global_chunk_options, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy=TRUE, warning=FALSE, message=FALSE, cache=TRUE) #cache=TRUE
```

```{r}
library(ithi.utils)
load_base_libs()

library(methods)
library(grid)
library(gridExtra)
library(gridBase)

library(ithi.meta)
library(ithi.figures)
library(ithi.utils)
library(ithi.seq)
library(ithi.clones)
library(ithi.supp)
library(ithi.xcr)
```

```{r}
ihc_table_path <- snakemake@input$ihc_table
xcr_table_path <- snakemake@input$xcr_table
neoediting_outdir <- snakemake@input$neoediting_outdir
snv_cluster_files <- snakemake@input$snv_cluster_files
clone_tree_file <- snakemake@input$clone_tree_file
clone_branch_length_file <- snakemake@input$clone_branch_length_file
clone_prevalence_file <- snakemake@input$clone_prevalence_file
tcr_diversity_file <- snakemake@input$tcr_diversity
bcr_diversity_file <- snakemake@input$bcr_diversity
epitopes_unique_file <- snakemake@input$epitopes_unique_filtered 

db_path <- snakemake@params$db
```

```{r}
annotation_colours <- ithi.figures::get_annotation_colours()

ihc_table <- fread(ihc_table_path)
xcr_table <- read_clonotypes(xcr_table_path, duplicates = FALSE, db_path = db_path)

tree_branch_data <- read_clone_tree_data(clone_tree_file, clone_branch_length_file, clone_prevalence_file, db_path)

neoediting_res <- supp_neoediting(neoediting_outdir, ihc_table, db_path, tree_branch_data, wtfilter = TRUE, full_epitopes = FALSE, 
                                  snv_cluster_files = snv_cluster_files)
epitopes_unique <- fread(epitopes_unique_file)
```

## Analysis

```{r}
clonotype_publicity <- xcr_table %>% dplyr::group_by_(.dots = c("id", "type")) %>% 
  dplyr::summarise(npatients=length(unique(patient_id)))

public_clonotypes <- subset(clonotype_publicity, npatients > 1)
xcr_table_public <- subset(xcr_table, id %in% public_clonotypes$id)
xcr_table_public$id_simple <- factor(xcr_table_public$id) %>% as.numeric

xcr_table_public_sorted <- xcr_table_public[order(xcr_table_public$id_simple),]
xcr_table_public_sorted_subset <- subset(xcr_table_public_sorted, select = c("voa", "patient_id", "condensed_id", "count", "freq", "type", "id_simple"))
```

```{r}
ggplot(xcr_table, aes(x=freq)) + geom_histogram() + scale_x_log10() + theme_bw() + theme_Publication() + theme_nature() + facet_wrap(~ type, scales = "free")
```

```{r}
ggplot(xcr_table_public_sorted, aes(x=freq)) + geom_histogram() + scale_x_log10() + theme_bw() + theme_Publication() + theme_nature() + facet_wrap(~ type, scales = "free")
```

So some of these clonotypes are definitely expanded. We can do a quick and dirty test to see if this is significant:

```{r}
n <- 10000
segment_types <- unique(xcr_table_public_sorted$type)

empirical_pvals <- sapply(segment_types, function(segment) {
  dat <- subset(xcr_table_public_sorted, type == segment)
  observed_median <- dat$freq %>% median
  
  draws <- nrow(dat)
  null_medians <- sapply(1:n, function(i) sample(subset(xcr_table, type == segment)$freq, size = draws, replace = FALSE) %>% median)
  
  p_empirical <- length(which(null_medians >= observed_median))/n
  
  return(p_empirical)
})
names(empirical_pvals) <- segment_types

empirical_pvals
```

So public TCRs are not expanded, while public BCRs are. This is interesting -- but there is a possibility that contamination occurred to a greater degree in the BCR assays compared to the TCR assays. Unfortunately, some low-level contamination is unavoidable with the plate-based layout -- even after doing informatic filtering. 

So I'm hesitant to over-interpret this finding. 


Next we need to see if public TCR clones are more commonly present in patients with public BCR clones. 

The patients with public TCR and BCR clones are:

```{r}
subset(xcr_table_public_sorted_subset, type == "TRB")$patient_id %>% unique %>% sort
```

```{r}
subset(xcr_table_public_sorted_subset, type == "IGH")$patient_id %>% unique %>% sort
```

Almost every patient contains a public BCR clone. So instead, we should check if pairs correspond to each other (i.e if there are more patient A-B pairs for both TCR and BCR).

```{r}
patient_combinations <- xcr_table_public_sorted_subset %>% dplyr::group_by(type, id_simple) %>% do(unique_patients=unique(sort(.$patient_id)))

trb_contamination_combinations <- subset(patient_combinations, type == 'TRB')
igh_contamination_combinations <- subset(patient_combinations, type == "IGH")

shared_combinations <- intersect(trb_contamination_combinations$unique_patients, igh_contamination_combinations$unique_patients)
```

So `r length(shared_combinations)` out of `r trb_contamination_combinations$unique_patients %>% unique %>% length` TCR and `r igh_contamination_combinations$unique_patients %>% unique %>% length` BCR patient combinations are shared. What shared combinations refers to are cases where patients A and B both share both >= 1 common TCR *and* BCR clone. 

While this is probably higher than what's expected purely by chance -- and perhaps you could argue that publicity in TCRs correlates with publicity in BCRs -- do keep in mind that contamination would have the same type of signal (patients w/ samples that have been contaminated could be contaminated in both TCRs and BCRs. Furthermore, if contamination occurs at a high level in TCRs -- which makes it hard to filter out -- it is also more likely to occur at a high level in BCRs.) So the potential biological interest of this finding is obscured by the very real -- and quite probable -- possibility that this is all just due to rare instances of contamination that couldn't be adequately filtered out.  

```{r}
#epitopes_annotated <- neoediting_res$nonsynonymous
epitopes_annotated <- epitopes_unique
neoantigen_table <- subset(epitopes_annotated, `NetMHCpan MT Percentile` <= 2)

neoantigen_table$mutant_epitope_id <- factor(neoantigen_table$`MT Epitope Seq`) %>% as.numeric

mutant_epitope_publicity <- neoantigen_table %>% dplyr::group_by(mutant_epitope_id, `MT Epitope Seq`) %>% dplyr::summarise(npatient=length(unique(patient_id)))

nrow(subset(mutant_epitope_publicity, npatient > 1))
```

So there are no public neoepitopes. Nevertheless, as the relationship between TCRs/BCRs and antigens is many-to-many, this is not surprising and does not rule out the possibility of tumour-reactive, public clonotypes. 

That being said, there is an aspect of the plate-based layout that is vulnerable to cross-sample contamination. Unfortunately this is unavoidable; and despite bioinformatic filters employed to eliminate this (Supp Methods), there is the possibility that public clonotypes are the result of contamination. As this probability is non-zero, we should perhaps acknowledge this in the main text? 

## Summary

We have public TCR and BCR clonotypes in many patients. Only patients 10 and 16 lack public BCR clonotypes. They also lack public TCR clonotypes, but the significance of these 2 patients is likely just because TIL densities are very low in these 2 patients. 

However, the most likely source of public clonotypes is due to contamination, as a result of the plate-based layout of library preparation. We've taken steps to eliminate what appear to be likely incidences of contamination (see Supplementary Methods), but we cannot claim to capture all instances. As such, I'm hesitant to over-interpret any such findings -- even if we find that certain samples that share more public TCR clonotypes also share more public BCR clonotypes -- this would also be consistent with a pattern due to contamination, where samples that are contaminated for TCRs are more likely to be contaminated for BCRs (and vice versa), as the library prep for both is done with the same layout of samples on the plate(s). 

We can confidently say that no neoepitopes are shared between patients. 
