R code for "The human immune system is robustly maintained in multiple stable equilibriums shaped by age and cohabitation"
Ed Carr, on behalf of co-authors
10 September 2015

Contents
Introduction
PART A - data import & R session setup
    Step 1 - read in the data
    Step 2 - load the libraries
    Step 3 - sessionInfo
PART B - Figure code and illustrations

Introduction
This document supports the main text of the paper. The code can be re-run by the reader on their own machine. There are two recommended ways to do this:
1. Copy and paste code from this PDF into an R session.
2. Open the .Rmd file in RStudio or your favourite code editor, and send lines of code from the editor to R as you wish. (RStudio describes this format as follows: "Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see rstudio.com.")

PART A - data import & R session setup

Step 1 - read in the data
Data are provided in two formats:
1. an xls file, to peruse in Excel;
2. an RData file, which contains each sheet of the xls file as a separate object.
The data in the xls and RData files are identical. For ease, we import the data into R from the RData file only.

# This code assumes that the RData file is in your working directory.
load(file = "Original_data_for_resource_v2.RData")
ls()
[1] "all.data"         "cell_type_pal"    "data"
[4] "short.flow.names"

These four objects are also the names of the xls sheets, if you wish to view them in Excel.
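The RData route can be tried without the real file. The sketch below saves two small hypothetical stand-in objects (toy values, not the study data) to a temporary file and reloads them into a separate environment. `load()` invisibly returns the names of the objects it restores, which is a quick alternative to calling `ls()` on the whole workspace.

```r
# Hypothetical stand-ins for two of the objects stored in the real RData file
data <- data.frame(PatientID = c("P01", "P02"), visit = c(1, 1))
cell_type_pal <- c("red", "blue")

rdata_path <- tempfile(fileext = ".RData")
save(data, cell_type_pal, file = rdata_path)

# Load into a fresh environment rather than the global workspace;
# load() invisibly returns the names of the restored objects
resource_env <- new.env()
loaded_names <- load(file = rdata_path, envir = resource_env)
print(loaded_names)
print(ls(resource_env))
```

With the real file, `load(file = "Original_data_for_resource_v2.RData", envir = resource_env)` would be expected to return the four object names listed above.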

Step 2 - load the libraries
# The following libraries are used. You will need to download these locally
# from CRAN/Bioconductor. Remove the # from the next 2 lines to download them:
# source('...')
# biocLite('relaimpo')
# Then re-run the biocLite line, replacing 'relaimpo' with each library name below.
library(relaimpo)
library(RColorBrewer)
library(ellipse)
library(gplots)
library(vegan)
library(pspearman)
# Random seed is set (so that re-runs look the same)
set.seed(42)

Step 3 - sessionInfo
# This command tells you which versions you are using. Useful if you get
# different results to those in the paper.
sessionInfo()
R version ( )
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] pspearman_0.3-0    vegan_2.3-0        lattice_
 [4] permute_0.8-4      gplots_            ellipse_0.3-8
 [7] RColorBrewer_1.1-2 relaimpo_2.2-2     mitools_2.3
[10] survey_            boot_              MASS_

loaded via a namespace (and not attached):
 [1] bitops_1.0-6     caTools_         cluster_2.0.3
 [4] corpcor_1.6.8    digest_0.6.8     evaluate_0.7.2
 [7] formatR_1.2      gdata_           gtools_3.4.2
[10] htmltools_0.2.6  KernSmooth_      knitr_1.11
[13] magrittr_1.5     Matrix_1.2-2     mgcv_1.8-7
[16] nlme_            parallel_3.1.2   rmarkdown_0.8
[19] stringi_0.5-5    stringr_1.0.0    tools_3.1.2
[22] yaml_
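To follow the advice above, the versions in this output can be compared with those installed on your own machine. The helper `compare_versions()` below is a hypothetical sketch, not part of the original code; it uses only base R (`requireNamespace()`, `packageVersion()`, `package_version()`) and lists only the packages whose version number survived in the output above. Versions are compared as `package_version` objects, so "1.1-2" and "1.1.2" count as the same.

```r
# Package versions reported in the sessionInfo() output above
# (only entries whose version number is complete)
paper_versions <- c(pspearman = "0.3-0", vegan = "2.3-0", permute = "0.8-4",
                    ellipse = "0.3-8", RColorBrewer = "1.1-2",
                    relaimpo = "2.2-2", mitools = "2.3")

# Hypothetical helper: installed version (NA if absent) next to the paper's
compare_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  # package_version() normalises "1.1-2" to "1.1.2" so both sides match
  paper <- as.character(package_version(unname(expected)))
  data.frame(package = names(expected), paper = paper,
             installed = unname(installed),
             same = !is.na(installed) & unname(installed) == paper,
             stringsAsFactors = FALSE)
}

print(compare_versions(paper_versions))
```

A `FALSE` in the `same` column flags a package whose version differs from the one used for the paper (or is not installed).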

PART B - Figure code and illustrations
For each figure, the R code is shown first; the same code is then run to plot the figure.

Figure 1
Panel A is drawn first.
# Correlation matrix using the data from the last visit of each individual:
flow.cytokine.data <- data[, 13:66]
xxc3 <- cor(flow.cytokine.data, use = "pairwise.complete.obs", method = "spearman")
levels(factor(short.flow.names$cell_type_col))
# Dendrogram drawn from heatmap function
par(mai = c(0, 0, 0, 0), oma = c(0, 0, 0, 0), omi = c(0, 0, 0, 0))
hm <- heatmap(xxc3, RowSideColors = short.flow.names$cell_type_col, labCol = "",
              labRow = "", Colv = NA, margins = c(0, 0), col = 0)
# These lines were set by user
lines(x = c(0.26, 0.3), y = c(1, 0.85), lwd = 2)
lines(x = c(0.26, 0.3), y = c(0, 0.05), lwd = 2)
# Re-order the correlation matrix to match the dendrogram
xxc3 <- xxc3[rev(hm$rowInd), rev(hm$rowInd)]
# Over-plot the entire device with a clear plot to allow full legend control.
par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0) + 0, new = TRUE)
mtext("a", side = 3, line = -2, cex = 1.5, adj = 0.1)
# Over-plot the entire device with a clear plot to allow full legend control.
par(fig = c(0, 1, 0, 0.6), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0) + 0, new = TRUE)
legend("topright", fill = cell_type_pal, legend = levels(short.flow.names$cell_type),
       cex = 0.8, bty = "n")
cor.col <- bluered(11)
# Over-plot the entire device with a clear plot
par(fig = c(0, 1, 0, 1), mar = c(0, 0, 0, 0), new = TRUE)
# plotcorr3 is built from the plotcorr function in the ellipse package.
# Minor, simple adjustments were made to alter the colour scheme and plot margins.
# The original plotcorr is a better function to use in nearly all circumstances;
# if you want to directly replicate our figure, use plotcorr3.
plotcorr3 <- function(corr, outline = TRUE, col = "grey", numbers = FALSE,
                      type = c("full", "lower", "upper"), diag = (type == "full"),
                      bty = "n", axes = FALSE, xlab = "", ylab = "", asp = 1,
                      cex.lab = par("cex.lab"), cex = 1 * par("cex"),
                      mar = c(0, 5.5, 0, 0), col.lab = "", ...) {
  savepar <- par(pty = "s", mar = mar)
  on.exit(par(savepar))
  if (is.null(corr))
    return(invisible())
  if ((!is.matrix(corr)) || (round(min(corr, na.rm = TRUE), 6) < -1) ||
      (round(max(corr, na.rm = TRUE), 6) > 1))
    stop("need a correlation matrix")
  plot.new()
  par(new = TRUE)
  rowdim <- dim(corr)[1]
  coldim <- dim(corr)[2]
  rowlabs <- dimnames(corr)[[1]]
  collabs <- dimnames(corr)[[2]]
  if (is.null(rowlabs)) rowlabs <- 1:rowdim
  if (is.null(collabs)) collabs <- 1:coldim
  rowlabs <- as.character(rowlabs)
  collabs <- as.character(collabs)
  col <- rep(col, length = length(corr))
  dim(col) <- dim(corr)
  type <- match.arg(type)
  cols <- 1:coldim
  rows <- 1:rowdim
  xshift <- 0
  yshift <- 0
  if (!diag) {
    if (type == "upper") {
      cols <- 2:coldim
      rows <- 1:(rowdim - 1)
      xshift <- 1
    } else if (type == "lower") {
      cols <- 1:(coldim - 1)
      rows <- 2:rowdim
      yshift <- -1
    }
  }
  maxdim <- max(length(rows), length(cols))
  plt <- par("plt")
  xlabwidth <- max(strwidth(rowlabs[rows], units = "figure", cex = cex.lab))/(plt[2] - plt[1])
  xlabwidth <- xlabwidth * maxdim/(1 - xlabwidth)
  ylabwidth <- max(strwidth(collabs[cols], units = "figure", cex = cex.lab))/(plt[4] - plt[3])
  ylabwidth <- ylabwidth * maxdim/(1 - ylabwidth)
  plot(c(-xlabwidth - 0.5, maxdim + 0.5), c(0.5, maxdim + 1 + ylabwidth), type = "n",
       bty = bty, axes = axes, xlab = "", ylab = "", asp = asp, cex.lab = cex.lab, ...)
  text(rep(0, length(rows)), length(rows):1, labels = rowlabs[rows], adj = 1,
       cex = cex.lab, col = col.lab)
  text(cols - xshift, rep(length(rows) + 1, length(cols)), labels = collabs[cols],
       srt = 90, adj = 0, cex = cex.lab, col = col.lab)
  mtext(xlab, 1, 0)
  mtext(ylab, 2, 0)
  mat <- diag(c(1, 1))
  # Draws one ellipse (or number); i and j come from the loops below
  plotcorrInternal <- function() {
    if (i == j && !diag) return()
    if (!numbers) {
      mat[1, 2] <- corr[i, j]
      mat[2, 1] <- mat[1, 2]
      ell <- ellipse(mat, t = 0.43)
      ell[, 1] <- ell[, 1] + j - xshift
      ell[, 2] <- ell[, 2] + length(rows) + 1 - i - yshift
      polygon(ell, col = col[i, j])
      if (outline) lines(ell)
    } else {
      text(j + 0.3 - xshift, length(rows) + 1 - i - yshift, round(10 * corr[i, j], 0),
           adj = 1, cex = cex)
    }
  }
  for (i in 1:dim(corr)[1]) {
    for (j in 1:dim(corr)[2]) {
      if (type == "full") {
        plotcorrInternal()
      } else if (type == "lower" && (i >= j)) {
        plotcorrInternal()
      } else if (type == "upper" && (i <= j)) {
        plotcorrInternal()
      }
    }
  }
  invisible()
}
# plotcorr
plotcorr3(xxc3, type = "lower", col = cor.col[5 * xxc3 + 6], cex.lab = 0.5, diag = T,
          xlab = "", ylab = "",
          col.lab = short.flow.names[rev(hm$rowInd), ]$cell_type_col)
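Two steps in panel A are easy to miss: `heatmap()` is called mainly for its dendrogram ordering (`hm$rowInd`), and `cor.col[5 * xxc3 + 6]` turns each correlation into one of 11 colours. The sketch below reproduces both ideas in base R on hypothetical toy data. It uses `hclust(dist(r))$order` as a simplified version of the ordering (`heatmap()` additionally reorders its dendrogram by row means, so the two orders can differ) and `colorRampPalette()` as a stand-in for `gplots::bluered(11)`.

```r
# Hypothetical toy data: 20 observations of 4 parameters
set.seed(42)
toy <- matrix(rnorm(80), ncol = 4,
              dimnames = list(NULL, c("p1", "p2", "p3", "p4")))
toy[, "p2"] <- toy[, "p1"] + rnorm(20, sd = 0.3)  # make p1 and p2 correlated

r <- cor(toy, use = "pairwise.complete.obs", method = "spearman")

# Simplified dendrogram ordering: cluster the rows of the correlation matrix
ord <- hclust(dist(r))$order
r_ord <- r[ord, ord]

# 5 * r + 6 maps r in [-1, 1] onto [1, 11]; as.integer() truncates towards
# zero, which is also what R does with fractional indices
colour_index <- function(r) as.integer(5 * r + 6)
palette11 <- colorRampPalette(c("blue", "white", "red"))(11)
cell_colours <- palette11[colour_index(r_ord)]
```

Strong negative correlations land on the first (blue) colour, zero on the sixth (white) and strong positive ones on the eleventh (red).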

Panel B is shown below.
flow.cytokine.data <- data[, 13:66]
xc <- cor(flow.cytokine.data, use = "pairwise.complete.obs", method = "spearman")
xc.dist <- dist(xc)
# monoMDS from vegan package
set.seed(42)
xcmds <- monoMDS(xc.dist, k = 2)
par(mar = c(4, 8, 1, 8))
# plot the MDS plot
plot(xcmds, type = "p", ylim = c(-1, 1.75), las = 1, cex.axis = 0.75,
     xlab = "First dimension of non-metric multidimensional scaling (NMDS)",
     ylab = "Second dimension of NMDS")
# Add ellipses for each cell type
for (i in 1:nlevels(short.flow.names$cell_type)) {
  ordiellipse(xcmds, groups = short.flow.names$cell_type, draw = "polygon",
              col = cell_type_pal[i], show.groups = levels(short.flow.names$cell_type)[i],
              alpha = 75)
}
# Labelled spider for 'precursors'
ordispider(xcmds, groups = short.flow.names$cell_type, label = T,
           show.groups = "Precursor", spiders = "centroid")
# Unlabelled spiders for the other cell types
ordispider(xcmds, groups = short.flow.names$cell_type, label = F,
           show.groups = c("Core cell types", "Cytokine", "Humoral", "Inflammatory",
                           "Regulatory"),
           spiders = "centroid")
legend("topleft", fill = adjustcolor(cell_type_pal, alpha.f = 1/255 * 125),
       levels(short.flow.names$cell_type), cex = 0.75, bty = "n")
mtext("b", side = 3, adj = -0.25, line = -1, cex = 1.5)
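`ordispider(..., spiders = "centroid")` joins each parameter's NMDS score to the centroid of its cell-type group. The base-R sketch below shows the same geometry on hypothetical scores for six parameters in two groups; it illustrates the idea and is not vegan's implementation.

```r
# Hypothetical 2-D ordination scores for six parameters in two groups
scores <- matrix(c(0, 0,
                   2, 0,
                   1, 3,
                   5, 5,
                   7, 5,
                   6, 8), ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("param", 1:6), c("Dim1", "Dim2")))
groups <- factor(c("A", "A", "A", "B", "B", "B"))

# Group centroids: within-group column means of the scores
centroids <- rowsum(scores, groups) / as.vector(table(groups))

# Draw the points, then one "spider leg" from each point to its group centroid
plot(scores, pch = 20, asp = 1)
segments(x0 = scores[, 1], y0 = scores[, 2],
         x1 = centroids[as.character(groups), 1],
         y1 = centroids[as.character(groups), 2])
points(centroids, pch = 3, cex = 2)
```

Here group A's centroid is (1, 1) and group B's is (6, 6); tightly clustered cell types give short spider legs.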

[Figure 1. Panel A: lower-triangle correlation plot of the immune parameters, ordered by the dendrogram, with labels coloured by cell type. Panel B: NMDS plot (x-axis: first dimension of non-metric multidimensional scaling (NMDS); y-axis: second dimension of NMDS) with ellipses by cell type. Legend: Core cell types, Cytokine, Humoral, Inflammatory, Precursor, Regulatory.]

Figure 2
# model: response ~ visit (within individuals) + PatientID (between individuals)
# All individuals
# For all flow parameters:
all.models <- matrix(nrow = nrow(short.flow.names), ncol = 12)
colnames(all.models) <- c("p_of_model", "r2_of_model", "prop_attributed_to_visit",
                          "prop_attributed_to_patientid", "p_cont_healthy",
                          "r2_cont_healthy", "prop_attributed_to_visit_cont_healthy",
                          "prop_attributed_to_patientid_cont_healthy", "p_diarrhoea",
                          "r2_diarrhoea", "prop_attributed_to_visit_diarrhoea",
                          "prop_attributed_to_patientid_diarrhoea")
rownames(all.models) <- short.flow.names$short_name
for (i in c(13:54)) {
  model <- lm(all.data[, i] ~ all.data[, "visit"] + all.data[, "PatientID"],
              na.action = "na.exclude")
  all.models[c(i - 12), 2] <- summary(model)$r.squared
  all.models[c(i - 12), 1] <- pf(summary(model)$fstatistic[1],
                                 summary(model)$fstatistic[2],
                                 summary(model)$fstatistic[3], lower.tail = F)
  if (i %in% c(13:23, 25:26, 28:39, 41:42, 44:46, 49:54)) {
    ca <- calc.relimp(model, diff = T, rela = T)
    all.models[c(i - 12), 3] <- ca$lmg[grepl(pattern = "visit", x = names(ca$lmg))]
    all.models[c(i - 12), 4] <- ca$lmg[grepl(pattern = "PatientID", x = names(ca$lmg))]
    rm(ca)
  }
  rm(model)
  cat(i, sep = "\t")
}
# All models have R2 and p from lm; those that calc.relimp can disentangle are disentangled.
colnames(all.data)
# Continuously healthy cohort
# Repeat these loops excluding the volunteers who got diarrhoea AND for whom we
# have their first visit, i.e. define a 'continuously healthy' population. If they
# got diarrhoea, they got it between visits 1 and 2. We have lots of people with
# visits 2, 3 and 4 but not visit 1, so they are 'continuously healthy'.
cont.healthy <- all.data[!all.data$sickduringtravel. == "yes", ]  # no one who was sick
# get just the multiple attenders:
cont.healthy.ids <- cont.healthy[duplicated(cont.healthy$PatientID), ]$PatientID
cont.healthy.ids <- factor(cont.healthy.ids)
cont.healthy <- cont.healthy[cont.healthy$PatientID %in% cont.healthy.ids, ]
summary(cont.healthy$visit)
cont.healthy$PatientID <- factor(cont.healthy$PatientID)
nlevels(cont.healthy$PatientID)
nrow(cont.healthy)
# Now get the travellers where we missed their diarrhoea spots:
sickies <- all.data[all.data$sickduringtravel. == "yes" &
                      duplicated(all.data$PatientID), ]$PatientID
sickies.visits <- all.data[all.data$PatientID %in% sickies, c("visit", "PatientID")]
sickies.from.start <- sickies.visits[sickies.visits$visit == 1, ]$PatientID
cont.healthy <- rbind(cont.healthy,
                      all.data[all.data$PatientID %in%
                                 sickies[(!sickies %in% sickies.from.start)], ])
cont.healthy$PatientID <- factor(cont.healthy$PatientID)
nlevels(cont.healthy$PatientID)
summary(cont.healthy$PatientID)
for (i in c(13:54)) {
  model <- lm(cont.healthy[, i] ~ cont.healthy[, "visit"] + cont.healthy[, "PatientID"],
              na.action = "na.exclude")
  all.models[c(i - 12), 6] <- summary(model)$r.squared
  all.models[c(i - 12), 5] <- pf(summary(model)$fstatistic[1],
                                 summary(model)$fstatistic[2],
                                 summary(model)$fstatistic[3], lower.tail = F)
  if (i %in% c(13:23, 25:26, 28:39, 41:42, 44:46, 49:54)) {
    ca <- calc.relimp(model, diff = T, rela = T)
    all.models[c(i - 12), 7] <- ca$lmg[grepl(pattern = "visit", x = names(ca$lmg))]
    all.models[c(i - 12), 8] <- ca$lmg[grepl(pattern = "PatientID", x = names(ca$lmg))]
    rm(ca)
  }
  rm(model)
  cat(i, sep = "\t")
}

Travelers cohort definition
Diarrhoeal cohort

travelers <- all.data[all.data$sickduringtravel. %in% c("yes", "no"), ]
# Get people immediately prior to and on return from travel.
travelers <- travelers[travelers$visit %in% c(1, 2), ]
# There's a re-staining of a PBMC sample with date 'NA'
travelers <- travelers[c(1:117, 119), ]
travelers$PatientID <- factor(travelers$PatientID)  # Remove unused levels
nlevels(travelers$PatientID)

both_visits <- levels(travelers$PatientID)[summary(travelers$PatientID, maxsum = 1e+05) == 2]
travelers <- travelers[travelers$PatientID %in% both_visits, ]
summary(travelers$visit)
summary(travelers$sickduringtravel.)
travelers$PatientID <- factor(travelers$PatientID)  # remove empty levels
all(travelers[travelers$visit == 1, ]$PatientID %in% travelers[travelers$visit == 2, ]$PatientID)
all(travelers[travelers$visit == 2, ]$PatientID %in% travelers[travelers$visit == 1, ]$PatientID)
# Repeat for those with diarrhoea:
diarrhoea.getters.id <- all.data[all.data$visit == 1 &
                                   all.data$sickduringtravel. == "yes", ]$PatientID
diarrhoea.getters.id <- factor(diarrhoea.getters.id)
for (i in c(13:23, 25:54)) {
  model <- lm(all.data[all.data$PatientID %in% diarrhoea.getters.id, i] ~
                all.data[all.data$PatientID %in% diarrhoea.getters.id, "visit"] +
                all.data[all.data$PatientID %in% diarrhoea.getters.id, "PatientID"],
              na.action = "na.exclude")
  all.models[c(i - 12), 10] <- summary(model)$r.squared
  all.models[c(i - 12), 9] <- pf(summary(model)$fstatistic[1],
                                 summary(model)$fstatistic[2],
                                 summary(model)$fstatistic[3], lower.tail = F)
  rm(model)
  cat(i, sep = "\t")
}

MDS plot

# Correlation matrix:
travelers.cor <- cor(t(travelers[, 13:66]), use = "pairwise.complete.obs",
                     method = "spearman")
rownames(travelers.cor) <- paste0(travelers$PatientID, "V", travelers$visit)
colnames(travelers.cor) <- paste0(travelers$PatientID, "V", travelers$visit)
# Distance matrix:
d <- dist(travelers.cor)
# MDS scaling
fit <- cmdscale(d, eig = TRUE, k = 2)  # k is the number of dimensions
# fit$points stores the x y coordinates, so we pull them out for ease
ff <- fit$points
# Measure distances: 1. make a 'pair' factor:
ff <- as.data.frame(ff)
ff$pair <- paste0(substr(rownames(ff), 1, 6))
ff$pair <- factor(ff$pair)
# 2. calculate distance between visits:
pc.pair.distances <- matrix(nrow = nlevels(ff$pair), ncol = 1)  # A container for the results
for (i in 1:nlevels(ff$pair)) {
  pair2 <- ff[ff$pair %in% levels(ff$pair)[i], ]  # this should give 2 rows and a single pair.
  pc.pair.distances[i, 1] <- sqrt(
    ((pair2[1, 1] - pair2[2, 1]) * (pair2[1, 1] - pair2[2, 1])) +  # PC1 difference
      ((pair2[1, 2] - pair2[2, 2]) * (pair2[1, 2] - pair2[2, 2]))  # PC2 difference
  )
  rm(pair2)
}
# 3. split into those who had diarrhoea and those who did not have diarrhoea:
diarrhoea <- travelers[travelers$visit == 1 & travelers$sickduringtravel. == "yes", ]
diarrhoea.pc.pair.distances <- pc.pair.distances[levels(ff$pair) %in% diarrhoea$PatientID, ]
nodiarrhoea.pc.pair.distances <- pc.pair.distances[!levels(ff$pair) %in% diarrhoea$PatientID, ]
# Now to construct the figure:
par(mar = c(6, 5, 2, 2) + 1)
m <- rbind(c(1, 1), c(1, 1), c(2, 2), c(2, 2), c(3, 4), c(3, 4), c(3, 4), c(5, 5), c(5, 5))
layout(m)
plot(all.models[1:43, "r2_of_model"], ylim = c(0, 1), ylab = expression(italic("r"^"2")),
     xlab = "", las = 1, pch = 1, cex = 1.5, xaxt = "n")
Axis(side = 1, at = 1:43, labels = rownames(all.models[1:43, ]), las = 3)
# points(all.models.3[,'r2_of_model3'], ylim = c(0,1),
#        ylab = expression('r' ^ '2' * ' of model'), xlab = 'Immune parameters',
#        las = 1, pch = 20)
points(all.models[1:43, "r2_cont_healthy"], ylim = c(0, 1), pch = 20)
points(all.models[1:43, "r2_diarrhoea"], ylim = c(0, 1), pch = 15)
legend("bottomright", pch = c(1, 20, 15),
       legend = c("All volunteers", "Continuously healthy", "Acute gastroenteritis"),
       ncol = 3, bty = "n")
mtext("a", side = 3, adj = 0, line = 1, cex = 1.5)
plot(-log10(p.adjust(all.models[1:43, "p_of_model"], method = "bonferroni",
                     n = nrow(all.models[!is.na(all.models[, "p_of_model"]), ]))),
     ylim = c(0, 25), ylab = expression("-log"["10"] * italic(" P")), xlab = "", las = 1,
     pch = 1, cex = 1.5, xaxt = "n")

Axis(side = 1, at = 1:43, labels = rownames(all.models[1:43, ]), las = 3)
points(-log10(p.adjust(all.models[1:43, "p_cont_healthy"], method = "bonferroni",
                       n = nrow(all.models[!is.na(all.models[, "p_cont_healthy"]), ]))),
       ylim = c(0, 25), pch = 20)
points(-log10(p.adjust(all.models[1:43, "p_diarrhoea"], method = "bonferroni",
                       n = nrow(all.models[!is.na(all.models[, "p_diarrhoea"]), ]))),
       ylim = c(0, 25), pch = 15)
# legend('topleft', pch = c(1,20), legend = c('all volunteers', 'No diarrhoea'))
abline(h = -log10(0.05))
mtext("b", side = 3, adj = 0, line = 1, cex = 1.5)
diarrhoea.pal <- brewer.pal(3, "Dark2")
diarrhoea.pal <- diarrhoea.pal[1:2]
diarrhoea.pal <- rev(diarrhoea.pal)
plot(ff[, 1], ff[, 2], xlab = "Principal coordinate 1", ylab = "Principal coordinate 2",
     type = "n", las = 1)
for (i in 1:nlevels(ff$pair)) {
  lines(ff[ff$pair == levels(ff$pair)[i], 1], ff[ff$pair == levels(ff$pair)[i], 2],
        col = "grey")
}
points(ff[, 1], ff[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2", type = "p",
       pch = ifelse(grepl(x = substr(rownames(ff), 7, 8), "V1"), 20, 18), cex = 1.3,
       col = ifelse(ff$pair %in% diarrhoea$PatientID, diarrhoea.pal[1], diarrhoea.pal[2]))
mtext("c", side = 3, adj = 0, line = 1, cex = 1.5)
# legend('bottomright', pch = c(20), col = diarrhoea.pal[c(2,1)],
#        legend = c('continuously healthy', 'Acute gastroenteritis'), ncol = 1,
#        bty = 'n', cex = 1.3)
stripchart(diarrhoea.pc.pair.distances, main = "", vertical = F,
           xlim = c(0, max(diarrhoea.pc.pair.distances) + 2), method = "jitter",
           jitter = 1, at = 4, pch = 20, xlab = "Immunological distance",
           col = diarrhoea.pal[1])
boxplot(pc.pair.distances, add = T, at = 2.5, horizontal = T, outline = F,
        frame.plot = F, axes = F)
stripchart(nodiarrhoea.pc.pair.distances, vertical = F, method = "jitter", jitter = 1,
           add = T, at = 0, col = diarrhoea.pal[2], pch = 20)
boxplot(nodiarrhoea.pc.pair.distances, add = T, at = -1.5, horizontal = T, outline = F,
        frame.plot = F, axes = F)
axis(side = 2, at = c(0, 4),
     labels = c("Continuously \n healthy", "Acute \n gastroenteritis"), las = 1)

legend("bottomright", pch = 20, col = diarrhoea.pal,
       legend = c("Acute gastroenteritis", "Continuously healthy"),
       ncol = 2, bty = "n")  # title = ('Paired samples')
mtext("d", side = 3, adj = 0, line = 1, cex = 1.5)
lines(x = c(2.7, 2.7), y = c(-1.5, 2.5), lwd = 2)
text(x = 2.9, y = 0.5, adj = c(0, NA),
     labels = paste0("p=", signif(wilcox.test(diarrhoea.pc.pair.distances,
                                             nodiarrhoea.pc.pair.distances)$p.value,
                                 digits = 2)),
     cex = 1.1)
barplot(t(all.models[1:43 & !is.na(all.models[, 3]), 3:4]),
        ylab = expression("Proportion of " * italic("r"^"2")), las = 2, cex.names = 1,
        names.arg = gsub("B trans", " ",
          x = gsub("Plasmablast", " ",
          x = gsub("CD8+GMCSF+", " ", fixed = T,
          x = gsub("CD4+GMCSF+", " ", fixed = T,
          x = gsub("B IgE+", " ", fixed = T,
          x = gsub("Th10", " ", rownames(all.models[1:43 & !is.na(all.models[, 3]), ]))))))),
        legend.text = gsub("prop_attributed_to_visit", "intraindividual",
                           gsub(pattern = "prop_attributed_to_patientid",
                                replacement = "interindividual",
                                colnames(all.models[1:43, 3:4]))),
        args.legend = list("topleft", bg = "white", bty = "o", box.lwd = 0))
mtext("e", side = 3, adj = 0, line = 1, cex = 1.5)
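Each model in the loops above is summarised by its r² and by the p-value of the overall F-test, rebuilt from `summary(model)$fstatistic` with `pf(..., lower.tail = F)`. The sketch below, on hypothetical toy data with the same `response ~ visit + PatientID` shape, checks that this p-value matches an explicit `anova()` comparison against an intercept-only model (valid here because the toy data have no missing values, so both fits use the same rows).

```r
# Hypothetical toy data: 6 individuals, 3 visits each
set.seed(1)
toy <- data.frame(PatientID = factor(rep(paste0("P", 1:6), each = 3)),
                  visit = rep(1:3, times = 6))
toy$response <- rnorm(18, mean = as.integer(toy$PatientID))  # between-person signal

model <- lm(response ~ visit + PatientID, data = toy)

# Overall F-test p-value, rebuilt as in the loops above
fstat <- summary(model)$fstatistic  # named vector: value, numdf, dendf
p_model <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
r2_model <- summary(model)$r.squared

# The same test written as a nested-model comparison
null_model <- lm(response ~ 1, data = toy)
p_anova <- anova(null_model, model)[2, "Pr(>F)"]
```

The two p-values agree because the overall F-test asks whether visit and PatientID together explain more variance than the mean alone.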

[Figure 2. Panel A: r² per immune parameter for all volunteers, continuously healthy volunteers and volunteers with acute gastroenteritis. Panel B: -log10 P for the same models. Panel C: principal coordinates 1 and 2 of paired samples (acute gastroenteritis vs continuously healthy). Panel D: immunological distance between paired samples, acute gastroenteritis vs continuously healthy, with p-value. Panel E: proportion of r² attributed to interindividual and intraindividual variation per immune parameter.]
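Panels c and d rest on a short chain: a sample-by-sample Spearman correlation matrix, Euclidean distances between its rows, classical MDS with `cmdscale()`, then, for each person, the distance between their two visits in the first two principal coordinates, and finally a Wilcoxon rank-sum test between groups. The sketch below runs that chain on hypothetical toy profiles; the IDs follow the six-character convention assumed by `substr(rownames(ff), 1, 6)`, and the grouping is made up.

```r
# Hypothetical toy data: 8 people x 2 visits, 10 immune parameters each
set.seed(7)
n_people <- 8
profiles <- matrix(rnorm(n_people * 2 * 10), nrow = n_people * 2)
rownames(profiles) <- paste0(sprintf("ID%04d", rep(1:n_people, each = 2)),
                             "V", rep(1:2, times = n_people))
# Visit 2 resembles visit 1 for the same person
for (k in seq(2, nrow(profiles), by = 2)) {
  profiles[k, ] <- profiles[k - 1, ] + rnorm(10, sd = 0.5)
}

# Sample-by-sample Spearman correlation, then distances and classical MDS
sample_cor <- cor(t(profiles), method = "spearman")
fit <- cmdscale(dist(sample_cor), k = 2, eig = TRUE)
coords <- fit$points

# Distance between each person's two visits in the first two coordinates
pair <- factor(substr(rownames(coords), 1, 6))
pair_distance <- sapply(levels(pair), function(p) {
  xy <- coords[pair == p, , drop = FALSE]
  sqrt(sum((xy[1, ] - xy[2, ])^2))
})

# Hypothetical grouping, e.g. acute gastroenteritis vs continuously healthy
sick <- levels(pair) %in% c("ID0001", "ID0002", "ID0003", "ID0004")
wilcox.test(pair_distance[sick], pair_distance[!sick])$p.value
```

Computing the distance with `sum((xy[1, ] - xy[2, ])^2)` is equivalent to the explicit PC1 and PC2 terms in the loop above, and extends unchanged to more coordinates.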

Figure 4
Firstly some calculations.
# Plot the correlates:
cors <- stats::cor(cbind(data[, 13:66], Age = data$Ageattimeofsampling.years.),
                   use = "pairwise.complete.obs", method = "spearman")
cors <- cors[rownames(cors) != ("Th10"), ]  # there is only 1 sample with age + Th10 data
cors <- cors[, colnames(cors) != ("Th10")]  # there is only 1 sample with age + Th10 data
# Ready for the plotcorr
xc <- cors[order(cors[, "Age"], na.last = F), order(cors[, "Age"], na.last = F)]
cor.col <- bluered(11)
rownames(xc)
# Coloured labels for the cell type colours
xc.col <- character(length = nrow(xc))
for (i in 1:(nrow(xc) - 1)) {
  xc.col[i] <- short.flow.names[short.flow.names$short_name == rownames(xc)[i], ]$cell_type_col
}
# Age - the final row - should be black
xc.col[nrow(xc)] <- "black"
# Calculate the p-values for age
colnames(xc)
# Get the data in the right order, so we get p values to correspond with the r2 in the corrplot
ord.flow.data <- data[, colnames(xc)[1:(ncol(xc) - 1)]]
# Get the data in the right order
ord.flow.data <- cbind(ord.flow.data, Age = data$Ageattimeofsampling.years.)
ord.flow.data <- ord.flow.data[!is.na(ord.flow.data$Age), ]
summary(ord.flow.data)
p.values <- numeric()
for (i in 1:(ncol(ord.flow.data) - 1)) {
  p.temp <- pspearman::spearman.test(ord.flow.data[!is.na(ord.flow.data[, i]), i],
                                     ord.flow.data[!is.na(ord.flow.data[, i]), ]$Age,
                                     approximation = "t-distribution")
  p.temp$p.value
  p.values[i] <- p.temp$p.value
  cat(i)
}
p.values <- p.adjust(p = p.values, method = "bonferroni", n = length(p.values))

Code for panel A.
plotcorr3(xc, mar = c(0.1 + c(0, 0, 0, 8)), type = "lower", col = cor.col[5 * xc + 6],
          cex.lab = 0.5, diag = T, col.lab = xc.col)
mtext("a", side = 3, adj = 0, line = 0, cex = 1.5)
# Over-plot for inset graphs control.
par(fig = c(0, 1, 0, 1), mar = c(0, 4, 0, 0), omi = c(2.9, 7.3/2 - 0.5, 0.8, 0.2),
    mgp = c(1, 0.6, 0), new = TRUE)
# Top of inset figure

plot(xc[!rownames(xc) %in% c("Age"), "Age"], ylab = "", ylim = c(-1, 1), pch = 20,
     col = cor.col[5 * xc[, "Age"] + 6], cex = 1, xlab = "", las = 1, cex.axis = 0.75,
     xaxt = "n")
points(xc[!rownames(xc) %in% c("Age"), "Age"], pch = 21, cex = 1)
axis(side = 1, at = c(0, 10, 20, 30, 40, 50), labels = F, tick = T)
abline(v = c(10, 20, 30, 40, 50, 60, 70, 80), lty = 3, cex = 1.5)
mtext(expression(italic("r")), side = 2, line = 1.5, adj = 0.5, cex = 0.75)
par(fig = c(0, 1, 0, 1), mar = c(0, 4, 0, 0), omi = c(1.9, 7.3/2 - 0.5, 1.8, 0.2),
    mgp = c(1, 0.6, 0), new = TRUE)
plot(-log10(p.values), ylab = "", pch = 20, cex = 1, xlab = "Immune parameter", las = 1,
     ylim = c(0, 30), cex.axis = 0.75, xaxt = "n")
Axis(side = 1, at = c(0, 10, 20, 30, 40, 50), cex.axis = 0.75)
mtext(text = "Immune parameter", side = 1, line = 1.5, adj = 0.5, cex = 0.75)
mtext(expression("-log"["10"] * italic(" P")), side = 2, line = 1.5, adj = 0.6, cex = 0.75)
abline(v = c(10, 20, 30, 40, 50, 60, 70, 80), lty = 3, cex = 1.5)
abline(h = -log10(0.01), cex = 3)
# Bottom half of inset figure
age.sex.models <- matrix(nrow = nrow(short.flow.names), ncol = 8)
rownames(age.sex.models) <- short.flow.names$short_name
for (i in c(13:23, 25:66)) {
  # Columns 1-4: age + sex model (R2, LMG share of age, LMG share of sex, p)
  model <- lm(data[, i] ~ data[, "Ageattimeofsampling.years."] + data[, "Sex"],
              na.action = "na.exclude")
  library(relaimpo)
  ca <- calc.relimp(model, diff = T, rela = T)
  age.sex.models[c(i - 12), 1] <- ca$R2
  age.sex.models[c(i - 12), 2] <- ca$lmg[grepl(pattern = "Age", x = names(ca$lmg))]
  age.sex.models[c(i - 12), 3] <- ca$lmg[grepl(pattern = "Sex", x = names(ca$lmg))]
  age.sex.models[c(i - 12), 4] <- pf(summary(model)$fstatistic[1],
                                     summary(model)$fstatistic[2],
                                     summary(model)$fstatistic[3], lower.tail = F)
  # Columns 5-6: age-only model
  model <- lm(data[, i] ~ data[, "Ageattimeofsampling.years."], na.action = "na.exclude")
  age.sex.models[c(i - 12), 5] <- summary(model)$r.squared
  age.sex.models[c(i - 12), 6] <- pf(summary(model)$fstatistic[1],
                                     summary(model)$fstatistic[2],
                                     summary(model)$fstatistic[3], lower.tail = F)
  # Columns 7-8: sex-only model
  model <- lm(data[, i] ~ data[, "Sex"], na.action = "na.exclude")
  age.sex.models[c(i - 12), 7] <- summary(model)$r.squared
  age.sex.models[c(i - 12), 8] <- pf(summary(model)$fstatistic[1],
                                     summary(model)$fstatistic[2],
                                     summary(model)$fstatistic[3], lower.tail = F)
  cat(i)
}
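The age p-values plotted in the inset come from `pspearman::spearman.test(..., approximation = "t-distribution")`, followed by a Bonferroni correction. As a base-R stand-in, `cor.test(method = "spearman", exact = FALSE)` uses, to our understanding, the same asymptotic t approximation, t = r * sqrt((n - 2)/(1 - r^2)) on n - 2 degrees of freedom. The sketch below checks that formula against `cor.test()` on hypothetical toy data (no ties) and shows that `p.adjust(method = "bonferroni")` multiplies each p-value by the number of tests and caps it at 1.

```r
# Hypothetical toy data: age and three immune parameters for 30 people
set.seed(3)
age <- runif(30, 18, 80)
params <- cbind(a = 0.05 * age + rnorm(30),
                b = rnorm(30),
                c = -0.02 * age + rnorm(30))

spearman_t_p <- function(x, y) {
  # Two-sided p-value from the t approximation to Spearman's rho
  n <- length(x)
  rho <- cor(x, y, method = "spearman")
  t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

p_manual <- apply(params, 2, spearman_t_p, y = age)
p_base <- apply(params, 2, function(x)
  cor.test(x, age, method = "spearman", exact = FALSE)$p.value)

# Bonferroni: multiply by the number of tests, cap at 1
p_bonf <- p.adjust(p_manual, method = "bonferroni", n = length(p_manual))
```

The horizontal line at `-log10(0.01)` in the inset is then a threshold on these adjusted values.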

Panels B to K drawn here.
correlations <- cbind(r = xc[rownames(xc) != "Age", "Age"], p.values,
                      neg.log10.p = -log10(p.values))
# r2 here is not truly R2, but it is a handy way to set an r <
correlations <- cbind(correlations, r2 = correlations[, "r"] * correlations[, "r"])
# Now we can plot a few, targeted scatterplots to show a correlation with age
sig.correlations <- correlations[correlations[, "p.values"] < 0.01 & correlations[, "r2"] > , ]
sig.correlations <- as.data.frame(sig.correlations)
# sig.correlations <- sig.correlations[rev(order(apply(data[!is.na(data$Ageattimeofsampling.years.),
#   rownames(sig.correlations)], 2, function(x) max(x, na.rm = TRUE)))), ]
# this was to sort by magnitude of response
sig.correlations$cell_type <- factor("", levels = levels(short.flow.names$cell_type))
sig.correlations$cell_type_col <- as.character(1:nrow(sig.correlations))
for (i in 1:nrow(sig.correlations)) {
  sig.correlations$cell_type[i] <- short.flow.names[short.flow.names$short_name ==
                                                      rownames(sig.correlations)[i], ]$cell_type
  sig.correlations$cell_type_col[i] <- short.flow.names[short.flow.names$short_name ==
                                                          rownames(sig.correlations)[i], ]$cell_type_col
}
# Sort these into order of cell_type groups:
sig.correlations$fudge_for_ordering <- as.character(sig.correlations$cell_type)
sig.correlations[sig.correlations$fudge_for_ordering == "Cytokine", ]$fudge_for_ordering <- "A"
sig.correlations <- sig.correlations[order(sig.correlations$fudge_for_ordering,
                                           decreasing = T), ]
library(RColorBrewer)
data$sex_cols <- factor(data$Sex, levels = c("F", "M"))
pal <- brewer.pal(3, "Set1")
par(mfrow = c(2, 5))
par(cex = 1)
par(mar = c(0.75, 0.75, 0.75, 0.75), oma = c(2, 3, 1, 3))
par(tcl = -0.25)
par(mgp = c(2, 0.4, 0))
for (i in 1:nrow(sig.correlations)) {
  plot(data[, rownames(sig.correlations)[i]] ~ data$Ageattimeofsampling.years.,
       col = pal[data$sex_cols], axes = FALSE, type = "p", pch = 20, cex = 0.6,
       ylim = c(ifelse(sig.correlations$cell_type[i] != "Cytokine", 0,
                       min(data[!is.na(data$Ageattimeofsampling.years.),
                                rownames(sig.correlations)[i]], na.rm = T)),
                1.2 * max(data[!is.na(data$Ageattimeofsampling.years.),
                               rownames(sig.correlations)[i]], na.rm = T)),
       yaxt = "n", xaxt = "n")
  Axis(x = ifelse(1.2 * max(data[!is.na(data$Ageattimeofsampling.years.),
                                 rownames(sig.correlations)[i]], na.rm = T) < 100,
                  1.2 * max(data[!is.na(data$Ageattimeofsampling.years.),
                                 rownames(sig.correlations)[i]], na.rm = T), 100),
       side = ifelse(sig.correlations$cell_type[i] == "Cytokine", 4, 2), labels = T,
       las = 1, cex.axis = 0.6)
  box()
  mtext(letters[i + 1], side = 3, line = 0, adj = 0, cex = 1.5)
  mtext(rownames(sig.correlations)[i], side = 3, line = 0.75, adj = 1, cex = 0.75,
        col = sig.correlations$cell_type_col[i])
  mtext(paste0("p=", signif(as.numeric(sig.correlations[i, "p.values"]), digits = 2)),
        side = 3, line = 0, adj = 1, cex = 0.75)
  if (i == 1) legend("topright", c("F", "M"), col = pal[1:nlevels(data$sex_cols)],
                     pch = 20, cex = 0.6, ncol = 1)
  if (i %in% c(6:10)) Axis(x = data$Ageattimeofsampling.years., side = 1,
                           cex.axis = 0.75, at = c(0, 25, 50, 75))
  if (i %in% c(1:5)) Axis(x = data$Ageattimeofsampling.years., side = 1,
                          at = c(0, 25, 50, 75), labels = F, tick = T, tcl = 0.25)
  # Regression lines: everyone (black), men, women
  abline(coef(lm(data[, rownames(sig.correlations)[i]] ~ data$Ageattimeofsampling.years.)),
         lwd = 2)
  abline(coef(lm(data[data$Sex == "M", rownames(sig.correlations)[i]] ~
                   data[data$Sex == "M", "Ageattimeofsampling.years."], na.action = na.omit)),
         untf = T, col = pal[2], lwd = 2)
  abline(coef(lm(data[data$Sex == "F", rownames(sig.correlations)[i]] ~
                   data[data$Sex == "F", "Ageattimeofsampling.years."], na.action = na.omit)),
         untf = T, col = pal[1], lwd = 2)
}
mtext("age", side = 1, outer = TRUE, line = 1, cex = 0.75)
mtext("% flow parameter", side = 2, outer = TRUE, line = 1, cex = 0.75)
mtext(expression("log"[10] * "[cytokine]/pg.ml"^"-1"), side = 4, outer = TRUE, line = 1,
      cex = 0.75)
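Each of panels b to k overlays three least-squares lines, one for everyone and one per sex, each drawn with `abline(coef(lm(...)))`. The sketch below, on hypothetical noise-free toy data, shows that fitting `lm()` on each sex's subset recovers that group's own intercept and slope, which the pooled line does not. The two colours stand in for `brewer.pal(3, "Set1")[1:2]`.

```r
# Hypothetical toy data: a parameter that changes with age at different rates
age <- c(20, 30, 40, 50, 60, 25, 35, 45, 55, 65)
sex <- factor(rep(c("F", "M"), each = 5), levels = c("F", "M"))
value <- ifelse(sex == "F", 2 + 0.5 * age, 1 + 0.3 * age)
toy <- data.frame(age, sex, value)

fit_all <- lm(value ~ age, data = toy)
fit_f <- lm(value ~ age, data = toy[toy$sex == "F", ])
fit_m <- lm(value ~ age, data = toy[toy$sex == "M", ])

pal <- c("red", "blue")  # stand-in for brewer.pal(3, "Set1")[1:2]
plot(value ~ age, data = toy, col = pal[toy$sex], pch = 20)
abline(coef(fit_all), lwd = 2)
abline(coef(fit_f), col = pal[1], lwd = 2)
abline(coef(fit_m), col = pal[2], lwd = 2)
```

Setting the factor levels explicitly (as `factor(data$Sex, levels = c("F", "M"))` does above) keeps `pal[toy$sex]` and the legend colours in step.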

Code for panels L and M
par(mfcol = c(1, 2), mar = c(3, 5, 0.5, 0.5), omi = c(0, 0, 0, 0), mgp = c(1, 0.6, 0))
# Age
plot(age.sex.models[, 5], ylab = "", xlab = "", xaxt = "n", yaxt = "n", las = 1, pch = 15,
     cex = 0.75, ylim = c(0, 1), cex.axis = 0.75, cex.lab = 0.75, col = "grey")
# Sex
points(age.sex.models[, 7], ylim = c(0, 1), ylab = expression(italic("r"^"2")), xlab = "",
       las = 1, cex = 0.75, xaxt = "n", pch = 20)
# Both
points(age.sex.models[, 1], ylim = c(0, 1), ylab = expression(italic("r"^"2")), xlab = "",
       las = 1, cex = 1.25, xaxt = "n", pch = 1, col = "black")
Axis(side = 1, at = c(0, 10, 20, 30, 40, 50), cex.axis = 0.75)
Axis(side = 2, at = c(0, 0.5, 1), las = 1, cex.axis = 0.75)
mtext(text = "Immune parameter", side = 1, line = 1.5, adj = 0.5, cex = 0.75)
mtext(expression(italic("r"^"2")), side = 2, line = 1.5, adj = 0.6, cex = 0.75)
mtext("l", side = 3, adj = -0.3, line = -1, cex = 1.5)
legend("topleft", pch = c(1, 20, 15), pt.cex = c(1.25, 0.75, 0.75),
       col = c("black", "black", "grey"), ncol = 3,
       legend = c("Sex + age", "Sex", "Age"), cex = 0.75, bty = "n")
plot(-log10(p.adjust(age.sex.models[, 6], method = "bonferroni", n = nrow(age.sex.models))),
     ylim = c(0, 40), ylab = "", xlab = "", xaxt = "n", las = 1, pch = 15, cex = 0.75,
     cex.axis = 0.75, cex.lab = 0.75, yaxt = "n", col = "grey")
points(-log10(p.adjust(age.sex.models[, 8], method = "bonferroni", n = nrow(age.sex.models))),
       ylim = c(0, 25), pch = 20, cex = 0.75)
points(-log10(p.adjust(age.sex.models[, 4], method = "bonferroni", n = nrow(age.sex.models))),
       ylim = c(0, 25), pch = 1, cex = 1.25)
Axis(side = 1, at = c(0, 10, 20, 30, 40, 50), cex.axis = 0.75)
Axis(side = 2, at = c(0, 20, 40), cex.axis = 0.75, las = 1)
mtext(text = "Immune parameter", side = 1, line = 1.5, adj = 0.5, cex = 0.75)
abline(h = -log10(0.05))
mtext(expression("-log"["10"] * italic(" P")), side = 2, line = 1.5, adj = 0.6, cex = 0.75)
mtext("m", side = 3, adj = -0.3, line = -1, cex = 1.5)
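Panel l compares the r² of three separate fits (age only, sex only, sex + age). The `calc.relimp(..., rela = T)` calls earlier in this figure's code additionally split the joint r² between the two predictors using the LMG metric, which, for two predictors, averages each predictor's r² gain over both orders of entry. The base-R sketch below computes that two-predictor LMG by hand on hypothetical toy data; it illustrates the metric and is not relaimpo's code (relaimpo generalises it to any number of regressors).

```r
# Hypothetical toy data: a parameter driven mostly by age, slightly by sex
set.seed(11)
toy <- data.frame(age = runif(40, 18, 80),
                  sex = factor(sample(c("F", "M"), 40, replace = TRUE)))
toy$value <- 0.04 * toy$age + 0.3 * (toy$sex == "M") + rnorm(40)

r2 <- function(formula) summary(lm(formula, data = toy))$r.squared
r2_age  <- r2(value ~ age)
r2_sex  <- r2(value ~ sex)
r2_both <- r2(value ~ age + sex)

# LMG for two predictors: average r2 gain over the two orders of entry
lmg_age <- (r2_age + (r2_both - r2_sex)) / 2
lmg_sex <- (r2_sex + (r2_both - r2_age)) / 2

# rela = TRUE style: express the shares as proportions of the joint r2
shares <- c(Age = lmg_age, Sex = lmg_sex) / r2_both
```

By construction the two LMG values add up to the joint r², so the normalised shares sum to 1, which is why columns 2 and 3 of `age.sex.models` can be read as proportions.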

[Figure: panel A, all immune parameters (flow parameters and cytokines); panels B to K, single parameters against age coloured by sex (F, M) with overall and per-sex regression lines: CD4+RTE, B trans, CD8+RTE, Th1, CD4+IL2+, Tc1, CD8+IL2+, CD8+, iNKT and IL 6 (y-axis % flow parameter, or log10 [cytokine]/pg.ml^-1 for cytokines); panel L, r² of the sex + age, sex and age models per immune parameter; panel M, -log10 P per immune parameter.]

Figure 5

```r
par(mar = c(6, 5, 2, 3) + 1)
mm <- cbind(c(1, 1, 2, 2, 3, 3, 3, 5, 5), c(1, 1, 2, 2, 4, 4, 4, 5, 5))
layout(mm)

# Now we can try a depression/anxiety model:
colnames(data)
HADS <- matrix(nrow = , ncol = 4)
rownames(HADS) <- colnames(data)[13:66]
colnames(HADS) <- c("p_of_model", "r2_of_model", "HADSdepression", "HADSanxiety")
# visually inspect ca$lmg from the loop functions below to check c
for (i in c(13:24, 26:64)) {
  model <- lm(data[, i] ~ data$hadsdepressionscore + data$hadsanxietyscore)
  # P value of the whole model, from its F statistic
  HADS[c(i ), 1] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
                       summary(model)$fstatistic[3], lower.tail = F)
  HADS[c(i ), 2] <- summary(model)$r.squared
  if (i %in% c(13:24, 26:63)) {
    # relative importance (LMG) of the depression and anxiety scores
    ca <- calc.relimp(model, diff = T, rela = T)
    HADS[c(i ), 3] <- ca$lmg[grepl(pattern = "depression", x = names(ca$lmg))]
    HADS[c(i ), 4] <- ca$lmg[grepl(pattern = "anxiety", x = names(ca$lmg))]
    rm(ca)
  }
  rm(model)
  cat(i, sep = "\t")
}
colnames(data)[13:66] == short.flow.names$short_name
length(colnames(data)[13:66])
length(short.flow.names$short_name)

# Check there are no children with BMI measurements:
summary(data[data$Ageattimeofsampling.years. < 18, ]$BMI)
# No one under the age of 18 has a BMI listed.
BMI.models <- matrix(nrow = , ncol = 8)
rownames(BMI.models) <- colnames(data)[13:66]
colnames(BMI.models) <- c("p_of_model", "r2_of_model", "BMI", "Age", "r2_for_bmi_alone",
                          "p_for_bmi_alone", "r2_for_age_alone", "p_for_age_alone")
# visually inspect ca$lmg from the loop functions below to c
data.no.kids <- data[data$Ageattimeofsampling.years. >= 18, ]  # Skipping out any children.
# BMI + age models, with the relative importance of each predictor
for (i in c(13:24, 26:65)) {
  model <- lm(data.no.kids[, i] ~ data.no.kids$BMI + data.no.kids$Ageattimeofsampling.years.,
              na.action = "na.exclude")
  BMI.models[c(i ), 1] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
                             summary(model)$fstatistic[3], lower.tail = F)
  BMI.models[c(i ), 2] <- summary(model)$r.squared
  cat(i)
  ca <- calc.relimp(model, diff = T, rela = T)
  BMI.models[c(i ), 3] <- ca$lmg[grepl(pattern = "BMI", x = names(ca$lmg))]
  BMI.models[c(i ), 4] <- ca$lmg[grepl(pattern = "Age", x = names(ca$lmg))]
  rm(ca)
}
# BMI-only and age-only models
for (i in c(13:24, 26:65)) {
  model.bmi <- lm(data.no.kids[, i] ~ data.no.kids$BMI)
  BMI.models[c(i ), 5] <- summary(model.bmi)$r.squared
  BMI.models[c(i ), 6] <- pf(summary(model.bmi)$fstatistic[1], summary(model.bmi)$fstatistic[2],
                             summary(model.bmi)$fstatistic[3], lower.tail = F)
  model.age <- lm(data.no.kids[, i] ~ data.no.kids$Ageattimeofsampling.years.)
  BMI.models[c(i ), 7] <- summary(model.age)$r.squared
  BMI.models[c(i ), 8] <- pf(summary(model.age)$fstatistic[1], summary(model.age)$fstatistic[2],
                             summary(model.age)$fstatistic[3], lower.tail = F)
  cat(i, sep = "\t")
}
rm(model)

# Panel a: r^2 of the age-only, BMI-only and BMI + age models
plot(BMI.models[, 7], ylim = c(0, 1), ylab = expression(italic("r"^"2")), xlab = "", las = 1,
     pch = 15, cex = 1.5, xaxt = "n", col = "grey")  # Age only
Axis(side = 1, at = 1:53, labels = rownames(all.models[1:53, ]), las = 3)
points(BMI.models[, 5], pch = 20)  # BMI only
points(BMI.models[, 2], pch = 1, cex = 1.5)  # Both
legend("topleft", pch = c(1, 20, 15), pt.cex = c(1.5, 1, 1), col = c("black", "black", "grey"),
       legend = c("BMI + age", "BMI", "Age"), ncol = 3, bty = "n")
mtext("a", side = 3, adj = 0, line = 1, cex = 1.5)

# Panel b: Bonferroni-adjusted -log10 P for the same three models
plot(-log10(p.adjust(BMI.models[, 8], method = "bonferroni", n = 65-13)),
     ylab = expression("-log"["10"] * italic(" P")), xlab = "", las = 1, pch = 15, cex = 1,
     col = "grey", xaxt = "n", ylim = c(0, 40))  # Age only
points(-log10(p.adjust(BMI.models[, 6], method = "bonferroni", n = 65-13)), pch = 20)  # BMI only
points(-log10(p.adjust(BMI.models[, 1], method = "bonferroni", n = 65-13)), pch = 1, cex = 1.5)  # Both
abline(h = -log10(0.05))
Axis(side = 1, at = 1:53, labels = rownames(all.models[1:53, ]), las = 3)
mtext("b", side = 3, adj = 0, line = 1, cex = 1.5)
```
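Column 1 of `HADS` and `BMI.models` holds the P value of the whole model, taken from the F statistic that `summary.lm()` reports: the upper tail of an F distribution with the model and residual degrees of freedom. A sketch on simulated data (the variables and values are illustrative only) confirming that this equals the F test comparing the model with an intercept-only model:

```r
set.seed(1)
# Simulated data for illustration: y depends on x1 but not on x2
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 * x1 + rnorm(n)

model <- lm(y ~ x1 + x2)
fstat <- summary(model)$fstatistic  # named: value, numdf, dendf

# Whole-model P value, computed as in the HADS and BMI loops
p_model <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)

# The same test written as a comparison with the intercept-only model
p_anova <- anova(lm(y ~ 1), model)[2, "Pr(>F)"]
print(c(p_model = unname(p_model), p_anova = p_anova))
```

The nested-model form makes explicit what the P value tests: whether the predictors together explain more variance than the mean alone.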

```r
# Panel c: BMI against age by sex, with overall and per-sex regression lines
plot(data.no.kids$BMI ~ data.no.kids$Ageattimeofsampling.years., ylab = "BMI", xlab = "Age",
     col = pal[data.no.kids$sex_cols], pch = 20, cex = 1.5)
abline(coef(lm(data.no.kids$BMI ~ data.no.kids$Ageattimeofsampling.years.)), lwd = 2.5)
abline(coef(lm(data.no.kids[data.no.kids$sex == "M", ]$BMI ~
                 data.no.kids[data.no.kids$sex == "M", "Ageattimeofsampling.years."],
               na.action = na.omit)), untf = T, col = pal[2], lwd = 2.5)
abline(coef(lm(data.no.kids[data.no.kids$sex == "F", ]$BMI ~
                 data.no.kids[data.no.kids$sex == "F", "Ageattimeofsampling.years."],
               na.action = na.omit)), untf = T, col = pal[1], lwd = 2.5)
legend("topleft", c("F", "M"), col = pal[1:nlevels(data.no.kids$sex_cols)], pch = 20, bty = "n")
mtext("c", side = 3, adj = 0, line = 1, cex = 1.5)

# Panel d: BMI and age shares of r^2, kept only for models significant after Bonferroni correction
BMI.models <- cbind(BMI.models, filt_bmi_r2 = BMI.models[, 3], filt_age_r2 = BMI.models[, 4])
colnames(BMI.models)
BMI.models[, "filt_bmi_r2"] <- ifelse(p.adjust(BMI.models[, 1], method = "bonferroni", n = ) < 0.05,
                                      BMI.models[, "filt_bmi_r2"], 0)
BMI.models[, "filt_age_r2"] <- ifelse(p.adjust(BMI.models[, 1], method = "bonferroni", n = ) < 0.05,
                                      BMI.models[, "filt_age_r2"], 0)
barplot(t(BMI.models[p.adjust(BMI.models[, 1], method = "bonferroni", n = ) < 0.05 & !is.na(BMI.models[, 1]),
                     c("filt_bmi_r2", "filt_age_r2")]),
        ylab = expression("Proportion of " * italic("r"^"2")), las = 2, cex.names = 1,
        names.arg = rownames(BMI.models[p.adjust(BMI.models[, 1], method = "bonferroni", n = ) < 0.05 &
                                          !is.na(BMI.models[, 1]), ]),
        legend.text = substr(colnames(BMI.models[p.adjust(BMI.models[, 1], method = "bonferroni", n = ) < 0.05,
                                                 c("filt_bmi_r2", "filt_age_r2")]), 6, 8),
        args.legend = list(x = 9, y = 0.9, bg = "white", box.lwd = 0))
mtext("d", side = 3, adj = 0, line = 1, cex = 1.5)

# Panel e: r^2 (left axis) and Bonferroni-adjusted -log10 P (grey, right axis) of the HADS models
plot(HADS[, 2], ylim = c(0, 1), ylab = expression(italic("r"^"2")), las = 1, xlab = "", pch = 20, xaxt = "n")
par(new = TRUE)
plot(-log10(p.adjust(HADS[, 1], method = "bonferroni", n = )), ylab = "", xlab = "", ylim = c(0, 10),
     pch = 20, col = "grey", xaxt = "n", frame = F, yaxt = "n")
Axis(x = -log10(p.adjust(HADS[, 1], method = "bonferroni", n = )), side = 4, las = 1)
Axis(side = 1, at = 1:53, labels = rownames(all.models[1:53, ]), las = 3)
legend("topleft", bty = "n", pch = 20, col = c("black", "grey"),
       c(expression(italic("r"^"2") * " for HADS"), expression("-log"["10"] * italic(" P") * " for HADS")),
       ncol = 2)
mtext(expression("-log"["10"] * italic(" P")), side = 4, line = 2, cex = 0.6)
mtext("e", side = 3, adj = 0, line = 1, cex = 1.5)
abline(h = -log10(0.05), col = "grey")
data.bmi <- data.no.kids[!(is.na(data.no.kids$BMI)), ]
```
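In the BMI and HADS loops, `calc.relimp(model, diff = T, rela = T)` from relaimpo supplies `ca$lmg`, the LMG relative-importance metric that panel d splits into BMI and age shares. LMG credits each predictor with its gain in R² averaged over all orders in which the predictors can enter the model; with `rela = TRUE` the shares are rescaled to sum to 1. For two predictors that average has only two terms, so it can be written out in base R. A sketch under that two-predictor assumption, on simulated data; `lmg_two` is a hypothetical helper, not relaimpo's implementation:

```r
# Base-R sketch of LMG for exactly two predictors (an illustration, not relaimpo's code):
# each predictor's share is its R^2 gain averaged over both orders of entry.
lmg_two <- function(y, x1, x2, rela = FALSE) {
  r2 <- function(fit) summary(fit)$r.squared
  r2_1  <- r2(lm(y ~ x1))
  r2_2  <- r2(lm(y ~ x2))
  r2_12 <- r2(lm(y ~ x1 + x2))
  lmg <- c(x1 = (r2_1 + (r2_12 - r2_2)) / 2,  # x1 entered first, or after x2
           x2 = (r2_2 + (r2_12 - r2_1)) / 2)  # x2 entered first, or after x1
  if (rela) lmg <- lmg / sum(lmg)  # proportions of R^2, as with rela = TRUE
  lmg
}

set.seed(2)
# Simulated, illustrative data: BMI rises with age, and both feed into y
n <- 60
age <- runif(n, 20, 80)
bmi <- 20 + 0.05 * age + rnorm(n)
y <- 0.03 * age + 0.2 * bmi + rnorm(n)

abs_lmg <- lmg_two(y, bmi, age)               # shares of R^2 (sum to the full-model R^2)
rel_lmg <- lmg_two(y, bmi, age, rela = TRUE)  # shares as proportions (sum to 1)
print(round(rbind(abs_lmg, rel_lmg), 3))
```

Because the two orders are averaged, the shared variance of correlated predictors such as BMI and age is split between them rather than credited to whichever enters first.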

[Figure 5: panel A, r² of the BMI + age, BMI and age models per immune parameter; panel B, -log10 P for the same models; panel C, BMI against age by sex (F, M); panel D, proportion of r² attributed to BMI and age for Th1, CD4+IL2+, CD8+RTE, Tc1, CD8+IL2+ and IL 6; panel E, r² and -log10 P of the HADS models per immune parameter.]

Figure 6

```r
parents <- data[grepl(pattern = "PB", data$patientid) | grepl(pattern = "PA", data$patientid), ]
# Call the family pairs:
parents$PairID <- substr(parents$patientid, start = 1, stop = 6)
parents$PairID <- factor(parents$PairID)
summary(parents$PairID, max = 1000)
# Separate the siblings out
parents$siblings <- substr(parents$patientid, start = 7, stop = 8)
summary(factor(parents$siblings))
parents <- parents[parents$siblings %in% c("F", "m", "M"), ]
parents$siblings <- gsub(pattern = "m", replacement = "M", parents$siblings)
# Find duplicate paired parents
paired.parents.list <- parents[duplicated(parents$PairID), ]$PairID
paired.parents <- parents[parents$PairID %in% paired.parents.list, ]
paired.parents$PairID <- factor(paired.parents$PairID)  # Lose empty levels of the factor
parent.xc <- cor(t(paired.parents[, 13:66]), use = "pairwise.complete.obs", method = "spearman")
ncol(parent.xc)
rownames(parent.xc) <- paired.parents$patientid
colnames(parent.xc) <- paired.parents$patientid
View(parent.xc)
# Drop PA/031F and PA/031M: these only have cytokine data
parent.xc2 <- parent.xc[!rownames(parent.xc) %in% c("PA/031F", "PA/031M"),
                        !colnames(parent.xc) %in% c("PA/031F", "PA/031M")]
nrow(parent.xc2)
m <- rbind(c(1, 2, 2))
layout(m)
d <- dist(parent.xc2)
fit <- cmdscale(d, eig = TRUE, k = 2)  # k is the number of dimensions
ff <- fit$points
pal <- brewer.pal(3, "Set1")
# Measure distances: 1. make a 'pair' factor:
ff <- as.data.frame(ff)
ff$pair <- paste0(substr(rownames(ff), 1, 2), substr(rownames(ff), 4, 6))
ff$pair <- factor(ff$pair)
summary(ff$sex)
# plot(fit$points[,1], fit$points[,2], xlab='coordinate 1', ylab='coordinate 2')
plot(ff[, 1], ff[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2", cex.axis = 1.3, type = "n", pch = 20)
```
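The Figure 6 code treats each person's row of the person-by-person Spearman correlation matrix as a profile, takes Euclidean distances between those rows with `dist()`, and embeds them in two dimensions with classical (metric) multidimensional scaling via `cmdscale()`. A self-contained sketch of the same pipeline on a small simulated parameter matrix (the names, sizes and values are illustrative only):

```r
set.seed(3)
# Simulated: 6 people (rows) x 10 immune parameters (columns)
params <- matrix(rnorm(60), nrow = 6,
                 dimnames = list(paste0("P", 1:6), paste0("param", 1:10)))

# Person-by-person Spearman correlation across parameters (t() puts people in columns)
xc <- cor(t(params), use = "pairwise.complete.obs", method = "spearman")

# Euclidean distances between rows of the correlation matrix, then 2-D classical MDS
d <- dist(xc)
fit <- cmdscale(d, eig = TRUE, k = 2)
coords <- as.data.frame(fit$points)
colnames(coords) <- c("coord1", "coord2")
print(round(coords, 3))
```

`cmdscale()` keeps the row labels of the distance object, which is what lets the figure code recover pair and sex from `rownames(ff)` afterwards.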

```r
for (i in 1:nlevels(ff$pair)) {
  # join the two members of each pair with a grey line
  lines(ff[ff$pair == levels(ff$pair)[i], 1], ff[ff$pair == levels(ff$pair)[i], 2], col = "grey")
}
# colour points by sex, matching the legend (F = pal[1], M = pal[2])
points(ff[, 1], ff[, 2], type = "p", pch = 20,
       col = ifelse(substr(rownames(ff), 7, 7) == "M", pal[2], pal[1]))
mtext("a", side = 3, adj = 0, line = 2, cex = 1.5)
legend("bottomleft", pch = 20, col = pal, legend = c("F", "M"), bty = "n")

# 2. calculate distance between pairs:
pc.pair.distances <- matrix(nrow = nlevels(ff$pair), ncol = 1)  # A container for the results
for (i in 1:nlevels(ff$pair)) {
  pair2 <- ff[ff$pair %in% levels(ff$pair)[i], ]  # this should give 2 rows and a single pair.
  pc.pair.distances[i, 1] <- sqrt(((pair2[1, 1] - pair2[2, 1]) * (pair2[1, 1] - pair2[2, 1])) +  # PC1 difference
                                    ((pair2[1, 2] - pair2[2, 2]) * (pair2[1, 2] - pair2[2, 2])))   # PC2 difference
  rm(pair2)
}

# 3. random pairs:
ff$sex <- substr(rownames(ff), 7, 7)
ff$sex <- factor(ff$sex)
ff <- ff[order(ff$sex), ]
summary(ff$sex)
dim(ff)
# Now the first 70 are females and the last 70 are males. We can iterate on this:
# pick a female based on levels(ff$pair)[i], then randomly select 5 non-i males.
random.pair.distances <- matrix(nrow = nlevels(ff$pair) * 5, ncol = 1)
set.seed(42)  # fix the random samples so the figures are the same every time we knit the pdf
for (i in 1:nlevels(ff$pair)) {
  pair3 <- ff[ff$pair %in% levels(ff$pair)[i] & ff$sex == "F", ]  # pick out each mother in turn
  pair3 <- rbind(pair3, ff[sample(rownames(ff)[71:140], size = 5, replace = F), ])  # randomly sample 5 men, who are in rows 71:140
  for (n in (1:5)) {
    random.pair.distances[(i - 1) * 5 + n, 1] <-
      sqrt(((pair3[1, 1] - pair3[(n + 1), 1]) * (pair3[1, 1] - pair3[(n + 1), 1])) +  # PC1 difference
             ((pair3[1, 2] - pair3[(n + 1), 2]) * (pair3[1, 2] - pair3[(n + 1), 2])))  # PC2 difference
  }
  rm(pair3)
}
rm(ff2)

# Panel b: distances within parental pairs versus random pairs
stripchart(pc.pair.distances, vertical = F, xlim = c(0, max(random.pair.distances[, 1]) + 1.5),
           method = "jitter", jitter = 1, at = 4, pch = 20, xlab = "distance", las = 1, ylim = c(-2, 6))
boxplot(pc.pair.distances, add = T, at = 2.5, horizontal = T, outline = F, frame.plot = F, axes = F)
stripchart(random.pair.distances, vertical = F, method = "jitter", jitter = 1, add = T, at = 0,
           col = "grey", pch = 20, frame.plot = F, axes = F)
boxplot(random.pair.distances, add = T, at = -1.5, horizontal = T, outline = F, axes = F)
axis(side = 2, at = c(0, 4), labels = c("Random", "Parents"))
legend("topleft", pch = 20, col = c("black", "grey"), c("Parental pairs", "Random pairs"), ncol = 2, bty = "n")
# vertical line and Wilcoxon rank-sum P value comparing the two sets of distances
lines(x = c(6, 6), y = (c(-1.5, 2.5)))
text(x = 6.2, y = 0.5, adj = 0,
     labels = paste0("p=", signif(wilcox.test(pc.pair.distances[, 1], random.pair.distances[, 1])$p.value, digits = 2)))
mtext("b", side = 3, adj = 0, line = 2, cex = 1.5)
```
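Steps 2 and 3 reduce each pair to the Euclidean distance between its two points on the two MDS coordinates, then compare parental-pair distances with random-pair distances using a Wilcoxon rank-sum test. A vectorised sketch of the distance step, assuming a two-column coordinate table and one pair label per row; `pair_distance` and all values here are hypothetical:

```r
# Hypothetical helper: Euclidean distance between the two members of each pair
pair_distance <- function(coords, pair) {
  coords <- as.matrix(coords)
  sapply(split(seq_len(nrow(coords)), pair), function(idx) {
    stopifnot(length(idx) == 2)  # each pair label must occur exactly twice
    sqrt(sum((coords[idx[1], ] - coords[idx[2], ])^2))
  })
}

# Toy coordinates (illustrative): pair A forms a 3-4-5 triangle, pair B coincides
toy <- data.frame(coord1 = c(0, 3, 1, 1), coord2 = c(0, 4, 2, 2))
d_pairs <- pair_distance(toy, pair = c("A", "A", "B", "B"))
print(d_pairs)  # A = 5, B = 0

# Comparing pair distances with random-pair distances, as in panel b (made-up values)
pair_d <- c(1.0, 1.2, 0.8, 1.1)
rand_d <- c(2.0, 2.5, 1.9, 2.2, 3.1)
w <- wilcox.test(pair_d, rand_d)
print(w$p.value)
```

Splitting row indices by pair label replaces the explicit loop and checks the assumption the loop above only states in a comment, that each pair contributes exactly two rows.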

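[Figure 6: panel A, parents on MDS coordinates 1 and 2, coloured by sex (F, M), with the two members of each pair joined; panel B, distances between parental pairs and random pairs, with the Wilcoxon P value.]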
