Setup

library(ithi.utils)
load_base_libs()

library(methods)
library(grid)
library(gridExtra)
library(gridBase)

library(ithi.meta)
library(ithi.figures)
library(ithi.seq)
library(ithi.clones)
library(ithi.supp)
library(ithi.xcr)
ihc_table_path <- snakemake@input$ihc_table
xcr_table_path <- snakemake@input$xcr_table
clone_tree_file <- snakemake@input$clone_tree_file
clone_branch_length_file <- snakemake@input$clone_branch_length_file
clone_prevalence_file <- snakemake@input$clone_prevalence_file
tcr_diversity_file <- snakemake@input$tcr_diversity
bcr_diversity_file <- snakemake@input$bcr_diversity
molecular_subtype_file <- snakemake@input$molsubtypes
clonal_measures_file <- snakemake@input$ith_stats_file
mutsig_dir <- snakemake@input$mutsig_dir
ith_icgc_bc_file <- snakemake@input$ith_icgc_bc
nanostring_annotations_path <- snakemake@input$nanostring_annotations
icgc_subtype_file <- snakemake@input$icgc_subtypes
icgc_specimen_file <- snakemake@input$icgc_specimen
master_variant_file <- snakemake@input$snv_table
master_breakpoint_file <- snakemake@input$breakpoint_table

db_path <- snakemake@params$db
tils_for_cluster <- snakemake@params$tils_for_cluster
stat_types <- snakemake@params$ith_stat_types
nclusts <- 3

annotation_colours <- ithi.figures::get_annotation_colours()

ihc_table <- fread(ihc_table_path)
xcr_table <- read_clonotypes(xcr_table_path, duplicates = FALSE, db_path = db_path)

# Console output (fread): Read 304822 rows and 18 (of 18) columns from 0.070 GB file in 00:00:05

molsubtypes <- fread(molecular_subtype_file)

tree_branch_data <- read_clone_tree_data(clone_tree_file, clone_branch_length_file, 
    clone_prevalence_file, db_path)

xcr_diversity <- ithi.supp::get_xcr_diversity(tcr_diversity_file, bcr_diversity_file, 
    db_path, xcr_table)

clonal_measures <- ithi.clones::read_ith_stats(clonal_measures_file, db_path, 
    duplicates = FALSE)

ith_icgc_bc <- fread(ith_icgc_bc_file)
icgc_specimen_data <- fread(icgc_specimen_file)

nanostring_labels <- fread(nanostring_annotations_path)

til_clusters <- ithi.figures:::get_til_clusters(ihc_table, molsubtypes, tils_for_cluster = tils_for_cluster, 
    nclusts = nclusts)

master_variant_table <- read_variant_file(master_variant_file, db_path)
master_breakpoint_table <- read_variant_file(master_breakpoint_file, db_path)

sig_results <- produce_signature_results(mutsig_dir, master_variant_table, master_breakpoint_table, 
    db_path)
sig_results_snv <- sig_results$snv
sig_results_sv <- sig_results$sv

Analysis

Reviewer 3 has several comments about the statistical tests used in the manuscript. These are all valid and easy to address: some are mistakes/typos on my part, others are things we could/should have done.

Fig S4B

Fixed to only show patients with samples in both groups.

cor_tilsubtype_clonal_res <- supp_cor_tilsubtype_clonal(ihc_table, tree_branch_data, 
    tils_for_cluster, nclusts, molsubtypes, db_path)
grid.newpage()
grid.draw(cor_tilsubtype_clonal_res$plots$tilclust_clonalsim)

There, no more ‘orphaned’ points.

Permutation test

Q: What analysis was performed in the sentence “(p > 0.3, permutation test, Figure S4B)” on page 6, which permutation test? Figure S4B is not a permutation test.

This was actually a nested ranks test. It is incorrectly noted as a Wilcoxon signed-rank test in the figure legend, and this needs to be fixed.

I can see why this would be mistaken for a MW test, though. What isn’t clearly noted is that the plot shows the MEANS for each patient, whereas the test compares all pairwise values within each patient, not just the means by cluster equality.

A plain Wilcoxon signed-rank test isn’t appropriate because, to make that type of comparison, patient needs to be accounted for as a random effect. The nested ranks test is one possible extension of a MW/Wilcoxon-type test that handles this grouping and uses bootstrapping to compute p-values. See the R package nestedRanksTest.

Tumour purity vs. ITH

Q: In figure S4A, the correlation between “proportion sub-clonal” and “cellularity” is significant (p=0.0169), yet in the manuscript the authors say: “and none of the clonal measures were confounded by tumor purity (all p > 0.2, Figure S4A)”. Page 6.

This is a wording issue. Technically we’re right, because proportion subclonal CN is not one of the CLONAL measures, but we should make that obvious by listing the clonal measures explicitly. It is a valid point though: we want to avoid ambiguous wording wherever possible, and saving a few characters isn’t worth making things more confusing.

Figure 1C

Q: Figure 1C does not say which statistical test was used, and if a Kruskal-Wallis was used, a post hoc analysis should be employed.

Point taken. We’ll run post-hoc tests to check pairwise significance between groups.

Since the omnibus test is a Kruskal-Wallis, the natural post-hoc test to use is the Dunn test, which is built on the same pooled ranks.

It may also be a good idea to annotate the pairwise results on the plot itself.

# Keep only the #RRGGBB part of each colour (drops any alpha suffix)
nonalpha_cluster_colours <- stringr::str_extract(annotation_colours$til_cluster_colours, 
    "#[0-9A-Fa-f]{6}")
ith_boxplots <- ithi.figures:::plot_ith_boxplots(clonal_measures, stat_types, 
    til_clusters, nonalpha_cluster_colours, force_font = FALSE, scale_factor = 7/18, 
    orientation = "wide", post_hoc = TRUE)
grid.newpage()
grid.draw(ith_boxplots)

TODO: Add a legend for the asterisks. One asterisk corresponds to 0.1, two to 0.05, and three to 0.001.

As per the results above, all measures except clone divergence show a significant difference between ES-TIL and each of the other classes.

Figure 6B

Q: Statistical analysis of 6B should include post hoc test.

This is going to be removed from the text (because the entire mutation signatures section is going to go poof), but let’s do it anyways for interest (and because I need this for a slide).

# Label file from the mutation signature output directory; fread needs
# exactly one match
label_file <- Sys.glob(file.path(mutsig_dir, "output", "*_labels.tsv"))
stopifnot(length(label_file) == 1)
labels <- data.table::fread(label_file)
labels <- subset(labels, select = c(library, project, histotype))
labels$library <- stringr::str_replace(labels$library, "^patient_", "")
labels <- labels %>% plyr::rename(c(library = "patient_id"))

# Signature columns are named like SNV-1 or SV-2
signature_labels <- stringr::str_extract(c(colnames(sig_results_snv), colnames(sig_results_sv)), 
    "SN?V-[0-9]+")
signature_labels <- signature_labels[!is.na(signature_labels)]

sigheat_patient_res <- ithi.supp:::get_formal_clusters(sig_results, labels, 
    signature_labels, annotation_colours)

# Cut the patient dendrogram into 3 clusters
col_clust <- sigheat_patient_res$hc
sample_order <- col_clust$labels[col_clust$order]
clusters <- cutree(col_clust, 3)
general_labels <- as.data.frame(clusters) %>% tibble::rownames_to_column(var = "patient_id") %>% 
    plyr::rename(c(clusters = "cluster"))
general_labels$cluster <- as.character(general_labels$cluster)

message("Plotting boxplots ...")
# Keep only the #RRGGBB part of each colour (drops any alpha suffix)
nonalpha_cluster_colours <- stringr::str_extract(annotation_colours$mutsig_clusters, 
    "#[0-9A-Fa-f]{6}")

# icgc_subtypes is never assigned in this notebook. The call relies on R
# evaluating arguments lazily and on plot_immune_boxplot() no longer using it.
immune_boxplot_fig <- ithi.figures:::plot_immune_boxplot(ith_icgc_bc, level = "patient", 
    nanostring_labels, general_labels, molsubtypes, icgc_specimen_data, annotation_colours, 
    sample_order, db_path, icgc_subtypes, nonalpha_cluster_colours, force_font = FALSE, 
    post_hoc = TRUE)
grid.newpage()
grid.draw(immune_boxplot_fig)

TODO: Modify the text to acknowledge that there is a significant difference between H-HRD and H-FBI-2 in SOME immune pathways. Alternatively, just be more specific and say that there isn’t a significant difference in the other categories (instead of saying that they’re similar overall).

---
title: "Statistical errata and post-hoc tests"
---
                        ```{r, echo=FALSE, message=FALSE, warning=FALSE}

######## Snakemake header ########
library(methods)
Snakemake <- setClass(
    "Snakemake",
    slots = c(
        input = "list",
        output = "list",
        params = "list",
        wildcards = "list",
        threads = "numeric",
        log = "list",
        resources = "list",
        config = "list",
        rule = "character"
    )
)
snakemake <- Snakemake(
    input = list('/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_snvs.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/branch_data.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', '/shahlab/alzhang/projects/ITH_Immune/data/expression/nanostring/pancancer_annotations.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_statistics.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/molsubtypes.tsv', '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_breakpoints.tsv', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/tree_data.tsv', '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_icgc_merged_bc.tsv', '/shahlab/alzhang/data/ICGC/specimen.tsv', '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/clone_data.tsv', '/shahlab/alzhang/data/ICGC/icgc_primary_tumour_subtypes.tsv', 'notebooks/statistical_errata_posthoc.Rmd', '/shahlab/alzhang/projects/ITH_Immune/results/mmctm_results/ith_by-patient_with-ov', "snv_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_snvs.tsv', "clone_branch_length_file" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/branch_data.tsv', "xcr_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', "ihc_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', "nanostring_annotations" = 
'/shahlab/alzhang/projects/ITH_Immune/data/expression/nanostring/pancancer_annotations.tsv', "ith_stats_file" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_statistics.tsv', "molsubtypes" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/molsubtypes.tsv', "breakpoint_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_breakpoints.tsv', "clone_tree_file" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/tree_data.tsv', "bcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', "ith_icgc_bc" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_icgc_merged_bc.tsv', "icgc_specimen" = '/shahlab/alzhang/data/ICGC/specimen.tsv', "tcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', "clone_prevalence_file" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/clone_data.tsv', "icgc_subtypes" = '/shahlab/alzhang/data/ICGC/icgc_primary_tumour_subtypes.tsv', "notebook" = 'notebooks/statistical_errata_posthoc.Rmd', "mutsig_dir" = '/shahlab/alzhang/projects/ITH_Immune/results/mmctm_results/ith_by-patient_with-ov'),
    output = list('/shahlab/alzhang/projects/ITH_Immune/paper/results/review/notebooks/run2/statistical_errata_posthoc.nb.html'),
    params = list(c('entropy', 'postprocessed_divergence', 'combined_ith_normalized', 'proportion_subclonal'), '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3', 'statistical_errata_posthoc_analysis', c('E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "ith_stat_types" = c('entropy', 'postprocessed_divergence', 'combined_ith_normalized', 'proportion_subclonal'), "db" = '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3', "tils_for_cluster" = c('E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "name" = 'statistical_errata_posthoc_analysis'),
    wildcards = list(),
    threads = 1,
    log = list('/shahlab/alzhang/clusttmp/paperreview2/notebooks/statistical_errata_posthoc_analysis.log'),
    resources = list(),
    config = list("known_subtypes_array" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/array/subtypes/known_subtypes.tsv', "image_summary" = '/shahlab/alzhang/data/ithi/yuan_hecr_image_results.csv', "somatic_coding_result_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/somatic_coding_variants', "notebook_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/review/notebooks/run2', "prevalence_threshold" = 0.01, "snv_cluster_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/snv_cluster', "array_expression_file" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/array/gene_exprs_rma_batch_corrected.txt', "snv_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_snvs.tsv', "finnhe_pipeline_results_dir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/finnhe/run1', "tils_for_cluster" = c('E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "clone_prevalences" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/clone_data.tsv', "patients_for_clonal" = c(1, 2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17), "ihc_features_output" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/intermediates/run2/ihc_features_output.txt', "all_tiltypes" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density', 'E_CD8_density', 'E_CD4_density', 'E_CD20_density', 'E_Plasma_density', 'S_CD8_density', 'S_CD4_density', 'S_CD20_density', 'S_Plasma_density'), "clone_branch_lengths" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/branch_data.tsv', "refseq_gene_file" = '/shahlab/alzhang/data/genome/hg19/refseq_genes.bed', "variability_type" = 'stabilize', "tils_for_variability" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density'), "bcr_diversity" = 
'/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/IGH/postfilter_diversity_stats/diversity.strict.resampled.txt', "xcr_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/xcr_table.tsv', "benchmarkdir" = '/shahlab/alzhang/benchmarks/paperreview2', "ihc_table" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ihc_table.tsv', "breakpoint_table" = '/shahlab/amcpherson/projects/ith3/ith3/notebooks/bespoke/ith_breakpoints.tsv', "db" = '/shahlab/alzhang/projects/ITH_Immune/metadata/db/immune_project.sqlite3', "epitopes_unique_filtered" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/epitopes_unique_filtered.tsv', "clone_trees" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/clones/tree_data.tsv', "igpartition_outdir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/igpartition/run22', "icgc_specimen" = '/shahlab/alzhang/data/ICGC/specimen.tsv', "he_results_dir" = '/shahlab/alzhang/data/ithi/finn_results/he_output_Nov29', "copynumber_table" = '/shahlab/alzhang/data/ithi/master_copynumber_file.tsv', "table_dir" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/review/tables/run2', "icgc_subtypes" = '/shahlab/alzhang/data/ICGC/icgc_primary_tumour_subtypes.tsv', "rooney_mutsigcv_file" = '/shahlab/alzhang/projects/ITH_Immune/external/other_papers/mmc6.xlsx', "tumour_purity" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/tumour_purity.tsv', "neoediting_outdir" = '/shahlab/alzhang/pipeline_outputs/ith_immune/neoediting/run6', "logdir" = '/shahlab/alzhang/clusttmp/paperreview2', "ith_stat_types" = c('entropy', 'postprocessed_divergence', 'combined_ith_normalized', 'proportion_subclonal'), "tcr_diversity" = '/shahlab/alzhang/pipeline_outputs/ith_immune/mixcr/mixcr_runs/ith_1_2_3/mixcr5/postprocess/TRB/postfilter_diversity_stats/diversity.strict.resampled.txt', "til_clusters_output" = 
'/shahlab/alzhang/projects/ITH_Immune/paper/results/intermediates/run2/til_clusters_output.txt', "remixt_cellularity_ploidy" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/remixt_cellularity_ploidy.tsv', "nanostring_annotations" = '/shahlab/alzhang/projects/ITH_Immune/data/expression/nanostring/pancancer_annotations.tsv', "distance_method" = 'horn', "molsubtypes" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/molsubtypes.tsv', "mmctm_final_patient_dir" = '/shahlab/alzhang/projects/ITH_Immune/results/mmctm_results/ith_by-patient_with-ov', "ith_icgc_bc" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_icgc_merged_bc.tsv', "nanostring_data" = '/shahlab/alzhang/projects/ITH_Immune/results/nanostring_results/ith_full/qc/limma_quantile/normalized_expression_voa_labels_filtered.tsv', "tilcluster_supervised_ipynb" = '/shahlab/alzhang/projects/ITH_Immune/paper/review/ipy/tilcluster_supervisedmulticlass.ipynb', "clola_result_file" = '/shahlab/alzhang/pipeline_outputs/ith_immune/clola/run4/clola_condensed_results/beta/clola_results.tsv', "total_tiltypes" = c('T_CD8_density', 'T_CD4_density', 'T_CD20_density', 'T_Plasma_density'), "ith_stats" = '/shahlab/alzhang/projects/ITH_Immune/paper/results/tables/run2/ith_statistics.tsv', "image_summary2" = '/shahlab/alzhang/data/ithi/yuan_hecr_image_results_2.csv'),
    rule = 'statistical_errata_posthoc_analysis'
)
######## Original script #########

                        ```


## Setup

```{r global_chunk_options, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy=TRUE, warning=FALSE, message=FALSE, cache=TRUE) #cache=TRUE
```

```{r}
library(ithi.utils)
load_base_libs()

library(methods)
library(grid)
library(gridExtra)
library(gridBase)

library(ithi.meta)
library(ithi.figures)
library(ithi.seq)
library(ithi.clones)
library(ithi.supp)
library(ithi.xcr)
```

```{r}
ihc_table_path <- snakemake@input$ihc_table
xcr_table_path <- snakemake@input$xcr_table
clone_tree_file <- snakemake@input$clone_tree_file
clone_branch_length_file <- snakemake@input$clone_branch_length_file
clone_prevalence_file <- snakemake@input$clone_prevalence_file
tcr_diversity_file <- snakemake@input$tcr_diversity
bcr_diversity_file <- snakemake@input$bcr_diversity
molecular_subtype_file <- snakemake@input$molsubtypes
clonal_measures_file <- snakemake@input$ith_stats_file
mutsig_dir <- snakemake@input$mutsig_dir
ith_icgc_bc_file <- snakemake@input$ith_icgc_bc
nanostring_annotations_path <- snakemake@input$nanostring_annotations
icgc_subtype_file <- snakemake@input$icgc_subtypes
icgc_specimen_file <- snakemake@input$icgc_specimen
master_variant_file <- snakemake@input$snv_table
master_breakpoint_file <- snakemake@input$breakpoint_table

db_path <- snakemake@params$db
tils_for_cluster <- snakemake@params$tils_for_cluster
stat_types <- snakemake@params$ith_stat_types
```

```{r}
nclusts <- 3

annotation_colours <- ithi.figures::get_annotation_colours()

ihc_table <- fread(ihc_table_path)
xcr_table <- read_clonotypes(xcr_table_path, duplicates = FALSE, db_path = db_path)
molsubtypes <- fread(molecular_subtype_file)

tree_branch_data <- read_clone_tree_data(clone_tree_file, clone_branch_length_file, clone_prevalence_file, db_path)

xcr_diversity <- ithi.supp::get_xcr_diversity(tcr_diversity_file, bcr_diversity_file, db_path, xcr_table)

clonal_measures <- ithi.clones::read_ith_stats(clonal_measures_file, db_path, duplicates = FALSE)

ith_icgc_bc <- fread(ith_icgc_bc_file)
icgc_specimen_data <- fread(icgc_specimen_file)

nanostring_labels <- fread(nanostring_annotations_path)

til_clusters <- ithi.figures:::get_til_clusters(ihc_table, molsubtypes, tils_for_cluster = tils_for_cluster, nclusts = nclusts)

master_variant_table <- read_variant_file(master_variant_file, db_path)
master_breakpoint_table <- read_variant_file(master_breakpoint_file, db_path)

sig_results <- produce_signature_results(mutsig_dir, master_variant_table, master_breakpoint_table, db_path)
sig_results_snv <- sig_results$snv
sig_results_sv <- sig_results$sv
```

## Analysis

Reviewer 3 has several comments about the statistical tests used in the manuscript. These are all valid and easy to address: some are mistakes/typos on my part, others are things we could/should have done.

### Fig S4B

Fixed to only show patients with samples in both groups. 

```{r}
cor_tilsubtype_clonal_res <- supp_cor_tilsubtype_clonal(ihc_table, tree_branch_data, tils_for_cluster, nclusts, molsubtypes, db_path)
```


```{r}
grid.newpage()
grid.draw(cor_tilsubtype_clonal_res$plots$tilclust_clonalsim)
```

There, no more 'orphaned' points. 
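
For reference, the filtering idea can be sketched in base R as below. The toy data frame and its column names (`patient_id`, `group`, `value`) are made up for illustration; the real filtering presumably lives inside `supp_cor_tilsubtype_clonal()`, whose internals aren't shown in this notebook.

```r
# Toy data (hypothetical): patient "2" only has values in one group,
# so it would show up as an 'orphaned' point.
toy <- data.frame(
  patient_id = c("1", "1", "2", "3", "3", "3"),
  group = c("same", "different", "same", "same", "different", "different"),
  value = c(0.8, 0.6, 0.7, 0.9, 0.5, 0.4),
  stringsAsFactors = FALSE
)

# Count how many distinct groups each patient contributes to
n_groups <- vapply(
  split(toy$group, toy$patient_id),
  function(g) length(unique(g)),
  integer(1)
)

# Keep only patients that have samples in both groups
complete_patients <- names(n_groups)[n_groups == 2L]
toy_complete <- toy[toy$patient_id %in% complete_patients, ]
```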

### Permutation test

**Q: What analysis was performed in the sentence "(p > 0.3, permutation test, Figure S4B)" on page 6, which permutation test? Figure S4B is not a permutation test.**

This was actually a nested ranks test. It is incorrectly noted as a Wilcoxon signed-rank test in the figure legend, and this needs to be fixed.

I can see why this would be mistaken for a MW test, though. What isn't clearly noted is that the plot shows the MEANS for each patient, whereas the test compares all pairwise values within each patient, not just the means by cluster equality.

A plain Wilcoxon signed-rank test isn't appropriate because, to make that type of comparison, patient needs to be accounted for as a random effect. The nested ranks test is one possible extension of a MW/Wilcoxon-type test that handles this grouping and uses bootstrapping to compute p-values. See the R package nestedRanksTest.
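
A rough sketch of what such a call might look like, on made-up data. The column names (`patient_id`, `cluster_equality`, `similarity`) and values are hypothetical, and the formula interface `response ~ treatment | group` is written as I understand it from the nestedRanksTest documentation, so check `?nestedRanksTest` before relying on it. This is not the code behind Figure S4B.

```r
# Toy data (hypothetical values): similarity scores nested within patient,
# split by whether the compared samples fall in the same TIL cluster.
toy <- data.frame(
  patient_id = factor(rep(c("P1", "P2", "P3"), each = 8)),
  cluster_equality = factor(rep(rep(c("same", "different"), each = 4), times = 3)),
  similarity = c(
    0.61, 0.58, 0.70, 0.66, 0.52, 0.49, 0.63, 0.55,
    0.80, 0.77, 0.74, 0.83, 0.71, 0.79, 0.68, 0.75,
    0.35, 0.41, 0.38, 0.44, 0.33, 0.40, 0.36, 0.30
  )
)

# Every patient should contribute values to both treatment levels
levels_per_patient <- vapply(
  split(toy$cluster_equality, toy$patient_id),
  function(g) length(unique(g)),
  integer(1)
)
stopifnot(all(levels_per_patient == 2L))

# Only run the test if the package is installed
if (requireNamespace("nestedRanksTest", quietly = TRUE)) {
  set.seed(42)  # the p-value comes from bootstrapping, so fix the seed
  res <- nestedRanksTest::nestedRanksTest(
    similarity ~ cluster_equality | patient_id,
    data = toy
  )
  print(res)
}
```

The point of the `| patient_id` term is that ranks are compared within each patient rather than pooled across patients.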

### Tumour purity vs. ITH

**Q: In figure S4A, the correlation between "proportion sub-clonal" and "cellularity" is significant (p=0.0169), yet in the manuscript the authors say: "and none of the clonal measures were confounded by tumor purity (all p > 0.2, Figure S4A)". Page 6.**

This is a wording issue. Technically we're right, because proportion subclonal CN is not one of the CLONAL measures, but we should make that obvious by listing the clonal measures explicitly. It is a valid point though: we want to avoid ambiguous wording wherever possible, and saving a few characters isn't worth making things more confusing.

### Figure 1C

**Q: Figure 1C does not say which statistical test was used, and if a Kruskal-Wallis was used, a post hoc analysis should be employed.**

Point taken. We'll run post-hoc tests to check pairwise significance between groups.

Since the omnibus test is a Kruskal-Wallis, the natural post-hoc test to use is the Dunn test, which is built on the same pooled ranks.

It may also be a good idea to annotate the pairwise results on the plot itself.
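
The post-hoc step itself happens inside `plot_ith_boxplots(..., post_hoc = TRUE)`, which isn't shown here. As a rough base-R sketch of what a Dunn test after a Kruskal-Wallis looks like: the toy data, the group names, the hand-rolled `dunn_pairwise()` helper and the Holm adjustment are all my own choices for illustration, not the package's implementation.

```r
# Toy data (hypothetical): three groups, the third clearly shifted upwards
toy <- data.frame(
  cluster = factor(rep(c("A", "B", "C"), each = 8)),
  value = c(1:8, seq(2.5, 9.5, by = 1), 20:27)
)

# Omnibus Kruskal-Wallis test
kw <- kruskal.test(value ~ cluster, data = toy)

# Dunn's test: z statistics on differences in mean pooled rank,
# with the usual tie correction, two-sided p-values and adjustment.
dunn_pairwise <- function(x, g, p_adjust = "holm") {
  g <- droplevels(factor(g))
  n_total <- length(x)
  r <- rank(x)  # pooled ranks, ties get midranks
  mean_rank <- vapply(split(r, g), mean, numeric(1))
  n_group <- lengths(split(r, g))
  tie_sizes <- table(x)
  tie_term <- sum(tie_sizes^3 - tie_sizes) / (12 * (n_total - 1))
  sigma2 <- n_total * (n_total + 1) / 12 - tie_term
  pairs <- combn(levels(g), 2)
  z <- apply(pairs, 2, function(p) {
    (mean_rank[[p[1]]] - mean_rank[[p[2]]]) /
      sqrt(sigma2 * (1 / n_group[[p[1]]] + 1 / n_group[[p[2]]]))
  })
  p_raw <- 2 * pnorm(-abs(z))
  data.frame(
    group1 = pairs[1, ], group2 = pairs[2, ],
    z = z, p = p_raw, p_adj = p.adjust(p_raw, method = p_adjust)
  )
}

dunn_res <- dunn_pairwise(toy$value, toy$cluster)
```

In practice a packaged implementation such as `dunn.test::dunn.test()` or `FSA::dunnTest()` is probably a safer choice than a hand-rolled one.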

```{r, results='hide'}
# Keep only the #RRGGBB part of each colour (drops any alpha suffix)
nonalpha_cluster_colours <- stringr::str_extract(annotation_colours$til_cluster_colours, "#[0-9A-Fa-f]{6}")
ith_boxplots <- ithi.figures:::plot_ith_boxplots(clonal_measures, stat_types, til_clusters, nonalpha_cluster_colours, force_font = FALSE, scale_factor = 7/18, orientation = "wide", post_hoc = TRUE)
```

```{r}
grid.newpage()
grid.draw(ith_boxplots)
```

TODO: Add a legend for the asterisks. One asterisk corresponds to 0.1, two to 0.05, and three to 0.001.
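
A possible mapping for that legend, assuming the thresholds are strict upper bounds (p < threshold). `p_to_stars()` is a hypothetical helper written for this note, not something in `ithi.figures`.

```r
# Map p-values to asterisk labels using the thresholds 0.1, 0.05 and 0.001.
# Assumption: a p-value exactly on a threshold falls into the weaker category.
p_to_stars <- function(p) {
  ifelse(p < 0.001, "***",
    ifelse(p < 0.05, "**",
      ifelse(p < 0.1, "*", "")
    )
  )
}

stars <- p_to_stars(c(0.0004, 0.03, 0.08, 0.4))
```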

As per the results above, all measures except clone divergence show a significant difference between ES-TIL and each of the other classes.

### Figure 6B

**Q: Statistical analysis of 6B should include post hoc test.**

This is going to be removed from the text (because the entire mutation signatures section is going to go poof), but let's do it anyways for interest (and because I need this for a slide). 

```{r, results='hide'}
# Label file from the mutation signature output directory; fread needs exactly one match
label_file <- Sys.glob(file.path(mutsig_dir, "output", "*_labels.tsv"))
stopifnot(length(label_file) == 1)
labels <- data.table::fread(label_file)
labels <- subset(labels, select=c(library, project, histotype))
labels$library <- stringr::str_replace(labels$library, "^patient_", "")
labels <- labels %>% plyr::rename(c("library"="patient_id"))

# Signature columns are named like SNV-1 or SV-2
signature_labels <- stringr::str_extract(c(colnames(sig_results_snv), colnames(sig_results_sv)), "SN?V-[0-9]+")
signature_labels <- signature_labels[!is.na(signature_labels)]

sigheat_patient_res <- ithi.supp:::get_formal_clusters(sig_results, labels, signature_labels, annotation_colours)

# Cut the patient dendrogram into 3 clusters
col_clust <- sigheat_patient_res$hc
sample_order <- col_clust$labels[col_clust$order]
clusters <- cutree(col_clust, 3)
general_labels <- as.data.frame(clusters) %>% tibble::rownames_to_column(var = "patient_id") %>% plyr::rename(c("clusters"="cluster"))
general_labels$cluster <- as.character(general_labels$cluster)

message("Plotting boxplots ...")
# Keep only the #RRGGBB part of each colour (drops any alpha suffix)
nonalpha_cluster_colours <- stringr::str_extract(annotation_colours$mutsig_clusters, "#[0-9A-Fa-f]{6}")

# icgc_subtypes is never assigned in this notebook. The call relies on R evaluating
# arguments lazily and on plot_immune_boxplot() no longer using it.
immune_boxplot_fig <- ithi.figures:::plot_immune_boxplot(ith_icgc_bc, level = "patient", nanostring_labels, general_labels, molsubtypes, icgc_specimen_data, annotation_colours, sample_order, db_path, icgc_subtypes, nonalpha_cluster_colours, force_font = FALSE, post_hoc = TRUE)
```
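
A note on why passing the unassigned `icgc_subtypes` in the chunk above doesn't blow up: R only evaluates a function argument when the function body first touches it. A minimal illustration with made-up function names (nothing here comes from `ithi.figures`):

```r
# The second argument is never used, so R never tries to look it up
scale_by_two <- function(x, unused_arg) {
  x * 2
}
ok <- scale_by_two(21, object_that_does_not_exist)

# As soon as the argument is forced, the missing object becomes an error
touch_arg <- function(x, unused_arg) {
  force(unused_arg)
  x * 2
}
err <- tryCatch(
  touch_arg(21, object_that_does_not_exist),
  error = function(e) conditionMessage(e)
)
```

This also means the call would start failing if `plot_immune_boxplot()` ever used that argument again, so dropping it from the signature would be the cleaner fix.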

```{r}
grid.newpage()
grid.draw(immune_boxplot_fig)
```

TODO: Modify the text to acknowledge that there is a significant difference between H-HRD and H-FBI-2 in SOME immune pathways. Alternatively, just be more specific and say that there isn't a significant difference in the other categories (instead of saying that they're similar overall).