library(ithi.utils)
load_base_libs()
library(methods)
library(forcats)
library(grid)
library(gridExtra)
library(gridBase)
library(ithi.meta)
library(ithi.figures)
library(ithi.utils)
library(ithi.seq)
library(ithi.supp)
library(ithi.xcr)
ihc_table_path <- snakemake@input$ihc_table
xcr_table_path <- snakemake@input$xcr_table
tcr_diversity_file <- snakemake@input$tcr_diversity
bcr_diversity_file <- snakemake@input$bcr_diversity
molecular_subtype_file <- snakemake@input$molsubtypes
db_path <- snakemake@params$db
total_tiltypes <- snakemake@params$total_tiltypes
all_tiltypes <- snakemake@params$all_tiltypes
tils_for_cluster <- snakemake@params$tils_for_cluster
annotation_colours <- ithi.figures::get_annotation_colours()
ihc_table <- fread(ihc_table_path)
xcr_table <- read_clonotypes(xcr_table_path, duplicates = FALSE, db_path = db_path)
Read 19.7% of 304822 rows
Read 55.8% of 304822 rows
Read 88.6% of 304822 rows
Read 304822 rows and 18 (of 18) columns from 0.070 GB file in 00:00:05
molsubtypes <- fread(molecular_subtype_file)
tcr_diversity <- read_xcr_diversity_file(tcr_diversity_file, db_path)
bcr_diversity <- read_xcr_diversity_file(bcr_diversity_file, db_path)
xcr_diversity <- ithi.supp::get_xcr_diversity(tcr_diversity_file, bcr_diversity_file,
db_path, xcr_table)
til_clusters <- ithi.figures:::get_til_clusters(ihc_table, molsubtypes, tils_for_cluster = tils_for_cluster,
nclusts = 3)
Naively, if it was true that less infiltrated tumours have more TCRs to be sequenced, N-TIL tumours should also have higher TCR/BCR diversity, which is not the case.
This is a valid point for ES-TIL vs. S-TIL. It could be argued that having more expanded TIL that result in missing detection of rare TIL clonotypes in high-TIL samples is actually signal that we’re trying to capture, though. If very rare clonotypes are squeezed out because we don’t have enough reads, that means that the clonotypes that we do see are expanded relative to those – and this should be quantified as less diversity. This was the motivation behind using species diversity measures like Shannon entropy (in addition to just using unique clonotype count).
We’ll address this 2 ways:
If the reason behind our findings is that rare TCRs are missed, the first point should reveal no differences between S-TIL and ES-TIL populations.
More broadly, if the degree of infiltration confounds these TCR/BCR diversity for some other reason, the second point should address that. Note that we use total TIL densities (and not epithelial/stromal split) when doing this, because TIL clusters are defined on the basis of epithelial/stromal TIL densities.
As for what the point of this analysis is – now that we’ve switched this analysis to using our TIL subtypes rather than the transcriptomic/molecular subtypes, the point of this analysis is to describe other properties of our subtypes, i.e. the properties of TCR/BCR diversity. Naively one might think from our discussion of total CD8/CD4 TIL densities being correlated with TCR diversity in the manuscript (PAGE NUMBER) that the most ‘immune-hot’ samples should have the highest TCR/BCR diversity. In fact, however, this analysis shows that it’s actually the S-TIL samples that have the highest diversity, implying that the stromal TIL populations in S-TIL samples may be more clonally diverse than the TIL populations in ES-TIL samples. This isn’t significant (it’s a trend), but still seems important to note.
Other ways to get at this question include using diversity measures that estimate the proportion of unsampled clonotypes (e.g. Chao1; though the accuracy of these measures is debated in the literature, see ____). We can also use other diversity measures that put greater weight on the more abundant clonotypes, i.e. Hill diversity for alpha >= 2.
tilsubtype_immune_res <- supp_molsubtype_immune_properties(ihc_table, molsubtypes,
xcr_table, tcr_diversity_file, bcr_diversity_file, db_path, tiltypes = all_tiltypes,
subtype_class = "til_clusters", raw_measures = c("clonotypes_unique", "D50_index",
"chao1", "efron_thisted", "inverse_simpson", "renyi_4", "renyi_8", "renyi_16",
"renyi_32", "renyi_64"), translated_measures = c("Unique clonotypes",
"D50 index", "Chao1", "Efron Thisted", "Inverse Simpson", "Renyi 4",
"Renyi 8", "Renyi 16", "Renyi 32", "Renyi 64"))
tilsubtype_immune_res$subtype_diversity_plots$Chao1
tilsubtype_immune_res$subtype_diversity_plots$`Efron Thisted`
which recapitulate the same results.
While the differences between S-TIL and ES-TIL aren’t significant, they’re visually comparable to the (also non-significant) findings with other diversity measures.
tilsubtype_immune_res$subtype_diversity_plots$`Inverse Simpson`
tilsubtype_immune_res$subtype_diversity_plots$`Renyi 4`
tilsubtype_immune_res$subtype_diversity_plots$`Renyi 8`
tilsubtype_immune_res$subtype_diversity_plots$`Renyi 16`
tilsubtype_immune_res$subtype_diversity_plots$`Renyi 32`
tilsubtype_immune_res$subtype_diversity_plots$`Renyi 64`
To get at this point, we can also design a model with total CD8/CD4/CD20/Plasma TIL density as an additional covariate. Note that we are not interested in the differences between epithelial and stromal here, since that is likely part of the signal.
ihc_table_subset <- subset(ihc_table, select = c("condensed_id", "patient_id",
total_tiltypes))
ihc_xcr_cluster_table <- Reduce(plyr::join, list(xcr_diversity, til_clusters,
ihc_table_subset))
ihc_xcr_cluster_table_filtered <- subset(ihc_xcr_cluster_table, !is.na(cluster) &
cluster != "N-TIL")
get_lm_result <- function(var = "tcr_shannon_entropy", random_effects = NULL) {
if (is.null(random_effects)) {
mod <- lm(as.formula(paste0(var, " ~ T_CD8_density + T_CD4_density + T_CD20_density + T_Plasma_density + cluster")),
data = ihc_xcr_cluster_table_filtered)
} else {
mod <- lmer(as.formula(paste0(var, " ~ T_CD8_density + T_CD4_density + T_CD20_density + T_Plasma_density + cluster + ",
random_effects)), data = ihc_xcr_cluster_table_filtered)
}
return(mod)
}
vars <- c("tcr_shannon_entropy", "tcr_clonotypes_unique", "bcr_shannon_entropy",
"bcr_clonotypes_unique")
In a linear model without any random effects:
lm_res <- lapply(vars, function(x) get_lm_result(x))
lm_res_summaries <- lapply(lm_res, summary)
names(lm_res_summaries) <- vars
lm_res_summaries
$tcr_shannon_entropy
Call:
lm(formula = as.formula(paste0(var, " ~ T_CD8_density + T_CD4_density + T_CD20_density + T_Plasma_density + cluster")),
data = ihc_xcr_cluster_table_filtered)
Residuals:
Min 1Q Median 3Q Max
-399.59 -133.20 -0.17 84.19 617.52
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 362.7619 92.1328 3.937 0.000474 ***
T_CD8_density 0.1778 0.2916 0.610 0.546748
T_CD4_density 2.0635 0.8220 2.510 0.017891 *
T_CD20_density 3.7129 1.9956 1.861 0.072968 .
T_Plasma_density -3.2347 1.4720 -2.198 0.036120 *
clusterES-TIL -275.3710 91.9434 -2.995 0.005568 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 211.8 on 29 degrees of freedom
Multiple R-squared: 0.4016, Adjusted R-squared: 0.2984
F-statistic: 3.893 on 5 and 29 DF, p-value: 0.008036
$tcr_clonotypes_unique
Call:
lm(formula = as.formula(paste0(var, " ~ T_CD8_density + T_CD4_density + T_CD20_density + T_Plasma_density + cluster")),
data = ihc_xcr_cluster_table_filtered)
Residuals:
Min 1Q Median 3Q Max
-572.83 -260.80 -11.26 199.30 997.70
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 618.8036 173.7924 3.561 0.0013 **
T_CD8_density 0.9359 0.5500 1.702 0.0995 .
T_CD4_density 3.8569 1.5505 2.487 0.0189 *
T_CD20_density 6.1439 3.7643 1.632 0.1135
T_Plasma_density -5.4491 2.7766 -1.963 0.0594 .
clusterES-TIL -477.2909 173.4352 -2.752 0.0101 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 399.5 on 29 degrees of freedom
Multiple R-squared: 0.358, Adjusted R-squared: 0.2473
F-statistic: 3.234 on 5 and 29 DF, p-value: 0.01929
$bcr_shannon_entropy
Call:
lm(formula = as.formula(paste0(var, " ~ T_CD8_density + T_CD4_density + T_CD20_density + T_Plasma_density + cluster")),
data = ihc_xcr_cluster_table_filtered)
Residuals:
Min 1Q Median 3Q Max
-390.67 -248.96 -91.31 40.16 1465.92
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 314.3465 194.8001 1.614 0.117
T_CD8_density -0.2258 0.6165 -0.366 0.717
T_CD4_density -2.2107 1.7380 -1.272 0.213
T_CD20_density 0.1582 4.2193 0.037 0.970
T_Plasma_density 0.1241 3.1122 0.040 0.968
clusterES-TIL 239.8725 194.3997 1.234 0.227
Residual standard error: 447.8 on 29 degrees of freedom
Multiple R-squared: 0.1212, Adjusted R-squared: -0.03034
F-statistic: 0.7997 on 5 and 29 DF, p-value: 0.559
$bcr_clonotypes_unique
Call:
lm(formula = as.formula(paste0(var, " ~ T_CD8_density + T_CD4_density + T_CD20_density + T_Plasma_density + cluster")),
data = ihc_xcr_cluster_table_filtered)
Residuals:
Min 1Q Median 3Q Max
-1317.7 -740.0 -81.1 346.1 3869.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2130.99076 520.89394 4.091 0.000312 ***
T_CD8_density -0.07509 1.64851 -0.046 0.963981
T_CD4_density -4.14022 4.64727 -0.891 0.380319
T_CD20_density 6.69066 11.28235 0.593 0.557764
T_Plasma_density -6.54300 8.32212 -0.786 0.438117
clusterES-TIL -292.44442 519.82317 -0.563 0.578041
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1197 on 29 degrees of freedom
Multiple R-squared: 0.08741, Adjusted R-squared: -0.06993
F-statistic: 0.5555 on 5 and 29 DF, p-value: 0.7329
In a linear model with patient as a random effect:
lm_res <- lapply(vars, function(x) get_lm_result(x, random_effects = "(1|patient_id)"))
lm_res_summaries <- lapply(lm_res, summary)
names(lm_res_summaries) <- vars
lm_res_summaries
$tcr_shannon_entropy
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [lmerMod]
Formula:
tcr_shannon_entropy ~ T_CD8_density + T_CD4_density + T_CD20_density +
T_Plasma_density + cluster + (1 | patient_id)
Data: ihc_xcr_cluster_table_filtered
REML criterion at convergence: 443.5
Scaled residuals:
Min 1Q Median 3Q Max
-1.63619 -0.60722 0.08228 0.42727 2.85964
Random effects:
Groups Name Variance Std.Dev.
patient_id (Intercept) 5454 73.85
Residual 40383 200.96
Number of obs: 35, groups: patient_id, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 349.1769 93.4229 27.1810 3.738 0.000875 ***
T_CD8_density 0.2091 0.2980 26.4730 0.702 0.489074
T_CD4_density 2.0257 0.8418 25.4810 2.406 0.023681 *
T_CD20_density 4.1567 2.0456 24.9620 2.032 0.052922 .
T_Plasma_density -3.2183 1.4873 28.1860 -2.164 0.039095 *
clusterES-TIL -270.9569 94.6134 23.6310 -2.864 0.008635 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) T_CD8_ T_CD4_ T_CD20 T_Pls_
T_CD8_dnsty 0.018
T_CD4_dnsty -0.483 -0.056
T_CD20_dnst 0.109 0.475 -0.268
T_Plsm_dnst -0.333 -0.748 0.004 -0.645
clstrES-TIL -0.486 -0.320 -0.261 -0.095 0.484
$tcr_clonotypes_unique
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [lmerMod]
Formula:
tcr_clonotypes_unique ~ T_CD8_density + T_CD4_density + T_CD20_density +
T_Plasma_density + cluster + (1 | patient_id)
Data: ihc_xcr_cluster_table_filtered
REML criterion at convergence: 480.4
Scaled residuals:
Min 1Q Median 3Q Max
-1.47738 -0.63967 -0.04096 0.50804 2.40637
Random effects:
Groups Name Variance Std.Dev.
patient_id (Intercept) 9071 95.24
Residual 151915 389.76
Number of obs: 35, groups: patient_id, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 607.9392 175.1537 27.3080 3.471 0.00174 **
T_CD8_density 0.9648 0.5571 25.6680 1.732 0.09534 .
T_CD4_density 3.8549 1.5715 25.1650 2.453 0.02143 *
T_CD20_density 6.4703 3.8159 24.9590 1.696 0.10240
T_Plasma_density -5.4571 2.7947 27.8740 -1.953 0.06097 .
clusterES-TIL -482.6953 176.1027 23.9530 -2.741 0.01139 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) T_CD8_ T_CD4_ T_CD20 T_Pls_
T_CD8_dnsty 0.034
T_CD4_dnsty -0.489 -0.056
T_CD20_dnst 0.095 0.475 -0.259
T_Plsm_dnst -0.340 -0.757 -0.004 -0.630
clstrES-TIL -0.509 -0.345 -0.237 -0.098 0.508
$bcr_shannon_entropy
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [lmerMod]
Formula:
bcr_shannon_entropy ~ T_CD8_density + T_CD4_density + T_CD20_density +
T_Plasma_density + cluster + (1 | patient_id)
Data: ihc_xcr_cluster_table_filtered
REML criterion at convergence: 452.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.71224 -0.12789 -0.03281 0.11130 2.43026
Random effects:
Groups Name Variance Std.Dev.
patient_id (Intercept) 160588 400.7
Residual 18893 137.5
Number of obs: 35, groups: patient_id, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 190.9305 132.8360 22.2040 1.437 0.165
T_CD8_density -0.1104 0.2796 19.2220 -0.395 0.697
T_CD4_density 0.1074 0.8374 20.2040 0.128 0.899
T_CD20_density -0.7765 2.1534 21.3180 -0.361 0.722
T_Plasma_density 0.6888 1.4028 19.7380 0.491 0.629
clusterES-TIL 78.4135 97.3060 20.2440 0.806 0.430
Correlation of Fixed Effects:
(Intr) T_CD8_ T_CD4_ T_CD20 T_Pls_
T_CD8_dnsty -0.103
T_CD4_dnsty -0.305 0.028
T_CD20_dnst 0.180 0.397 -0.395
T_Plsm_dnst -0.196 -0.652 0.090 -0.758
clstrES-TIL -0.212 -0.153 -0.429 -0.073 0.310
$bcr_clonotypes_unique
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [lmerMod]
Formula:
bcr_clonotypes_unique ~ T_CD8_density + T_CD4_density + T_CD20_density +
T_Plasma_density + cluster + (1 | patient_id)
Data: ihc_xcr_cluster_table_filtered
REML criterion at convergence: 523
Scaled residuals:
Min 1Q Median 3Q Max
-2.28387 -0.26752 -0.04838 0.21259 2.79887
Random effects:
Groups Name Variance Std.Dev.
patient_id (Intercept) 1236760 1112.1
Residual 272049 521.6
Number of obs: 35, groups: patient_id, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1894.9573 415.6618 25.1950 4.559 0.000115 ***
T_CD8_density -0.5746 1.0327 21.5620 -0.556 0.583707
T_CD4_density 0.6400 3.0640 23.0430 0.209 0.836379
T_CD20_density 7.1545 7.7956 24.6540 0.918 0.367637
T_Plasma_density -2.9706 5.1569 22.2680 -0.576 0.570359
clusterES-TIL -701.4705 355.6819 23.2780 -1.972 0.060583 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) T_CD8_ T_CD4_ T_CD20 T_Pls_
T_CD8_dnsty -0.109
T_CD4_dnsty -0.356 0.017
T_CD20_dnst 0.196 0.410 -0.377
T_Plsm_dnst -0.229 -0.662 0.075 -0.749
clstrES-TIL -0.256 -0.167 -0.422 -0.074 0.324
Regardless of whether patient is accounted for as a random effect, ES-TIL samples have less TCR diversity (unique clonotype count and entropy) than S-TIL samples, controlling for total CD8, CD4, CD20, and Plasma cell density.
Therefore, it appears to remain that TCR/BCR diversity is highest in ES-TIL samples, even when using diversity indices that estimate the number of missing clonotypes. While this is not significant, there is a trend.
Hopefully this is answered adequately. Thoughts?