R code for "The human immune system is robustly maintained in multiple stable equilibriums shaped by age and cohabitation"

Ed Carr, on behalf of co-authors
10 September 2015

Contents

Introduction
PART A - data import & R session setup
    Step 1 - read in the data
    Step 2 - load the libraries
    Step 3 - sessionInfo
PART B - Figure code and illustrations
    Figures

Introduction

This document supports the main text of the paper. Readers can re-run the code on their own machine. There are two recommended ways to do this:

1. Copy and paste code from this PDF into an instance of R.
2. Open the .Rmd file in RStudio or your favourite code editor, and send lines of code from your editor to R as you wish. RStudio describes this format as follows: "Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see rstudio.com."

PART A - data import & R session setup

Step 1 - read in the data

Data are provided in two formats:

1. xls, to peruse in Excel.
2. RData file. This contains each sheet of the xls file as a separate object.

The data within the xls and RData files are identical. For ease, we will import the data into R only from the RData file.

# This code assumes that the RData file is in your working directory.
load(file = "Original_data_for_resource_v2.RData")
ls()

[1] "all.data"         "cell_type_pal"    "data"
[4] "short.flow.names"

These four objects are also the names of the xls sheets, if you wish to view them in Excel.
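An optional precaution when importing an RData file is to load it into its own environment first, so that its contents can be listed without overwriting anything already in the workspace. The sketch below is illustrative only: it writes a small stand-in file and inspects it. The names `demo.file`, `demo.env`, `toy.data` and `toy.palette` are hypothetical and are not part of the supplied data; for the real data, the file name would be the one used above.

```r
# Create a small stand-in RData file so the example is self-contained.
# (Hypothetical objects; they are not part of the supplied data.)
demo.file <- tempfile(fileext = ".RData")
toy.data <- data.frame(PatientID = c("A", "A", "B"), visit = c(1, 2, 1))
toy.palette <- c("red", "blue")
save(toy.data, toy.palette, file = demo.file)

# Load into a fresh environment rather than the global workspace.
demo.env <- new.env()
loaded.names <- load(file = demo.file, envir = demo.env)

# load() returns (invisibly) the names of the objects it restored.
print(sort(loaded.names))

# Individual objects can then be pulled out by name.
toy.copy <- get("toy.data", envir = demo.env)
str(toy.copy)
```

Loading into a separate environment is a design choice, not something the original workflow requires; calling `load()` directly, as above, gives the same objects in the global workspace.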

Step 2 - load the libraries

# The following libraries are used. You will need to download these locally
# from CRAN/Bioconductor. Remove the # from the next 2 lines to download this:
# source('
# biocLite('relaimpo')
# Then re-run the biocLite line, replacing with each library name below.
library(relaimpo)

Loading required package: MASS
Loading required package: boot
Loading required package: survey
Loading required package: grid

Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart

Loading required package: mitools
This is the global version of package relaimpo.
If you are a non-US user, a version with the interesting additional metric pmvd is available from Ulrike Groempings web site at prof.beuth-hochschule.de/groemping.

library(RColorBrewer)
library(ellipse)
library(gplots)

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

library(vegan)

Loading required package: permute
Loading required package: lattice

Attaching package: 'lattice'

The following object is masked from 'package:boot':

    melanoma

This is vegan

Attaching package: 'vegan'

The following object is masked from 'package:survey':

    calibrate

library(pspearman)
# Random seed is set (so that re-runs look the same)
set.seed(42)
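Before running the `library()` calls, it can help to check which of the required packages are already installed. The following is a minimal sketch using only base R; `needed`, `is.installed` and `missing.pkgs` are hypothetical names introduced for illustration, and the package list is simply the one loaded in this step.

```r
# Packages loaded in this step of the document.
needed <- c("relaimpo", "RColorBrewer", "ellipse", "gplots", "vegan", "pspearman")

# requireNamespace() returns TRUE or FALSE without attaching the package.
is.installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing.pkgs <- needed[!is.installed]

if (length(missing.pkgs) > 0) {
    message("Not installed: ", paste(missing.pkgs, collapse = ", "))
} else {
    message("All required packages are installed.")
}
```

Any packages reported as missing would then need to be installed by whichever route applies to your R version.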

Step 3 - sessionInfo

# This command tells you what versions you are using. Useful if you get
# different results to those in the paper.
sessionInfo()

R version ( )
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] pspearman_0.3-0    vegan_2.3-0        lattice_
 [4] permute_0.8-4      gplots_            ellipse_0.3-8
 [7] RColorBrewer_1.1-2 relaimpo_2.2-2     mitools_2.3
[10] survey_            boot_              MASS_

loaded via a namespace (and not attached):
 [1] bitops_1.0-6    caTools_        cluster_2.0.3
 [4] corpcor_1.6.8   digest_0.6.8    evaluate_0.7.2
 [7] formatR_1.2     gdata_          gtools_3.4.2
[10] htmltools_0.2.6 KernSmooth_     knitr_1.11
[13] magrittr_1.5    Matrix_1.2-2    mgcv_1.8-7
[16] nlme_           parallel_3.1.2  rmarkdown_0.8
[19] stringi_0.5-5   stringr_1.0.0   tools_3.1.2
[22] yaml_

PART B - Figure code and illustrations

For each figure, the R code is shown first, then the same code is run to plot the figure.

Figure 1

Panel A is drawn first.

# Correlation matrix using the data from
# the last visit of each individual:
flow.cytokine.data <- data[, 13:66]
xxc3 <- cor(flow.cytokine.data, use = "pairwise.complete.obs", method = "spearman")
levels(factor(short.flow.names$cell_type_col))

# Dendrogram drawn from heatmap function
par(mai = c(0, 0, 0, 0), oma = c(0, 0, 0, 0), omi = c(0, 0, 0, 0))
hm <- heatmap(xxc3, RowSideColors = short.flow.names$cell_type_col,
    labCol = "", labRow = "", Colv = NA, margins = c(0, 0), col = 0)

# These lines were set by user
lines(x = c(0.26, 0.3), y = c(1, 0.85), lwd = 2)
lines(x = c(0.26, 0.3), y = c(0, 0.05), lwd = 2)

# Re-order the correlation matrix to
# match the dendrogram
xxc3 <- xxc3[rev(hm$rowInd), rev(hm$rowInd)]

# Over-plot the entire device with a
# clear plot to allow full legend
# control.
par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0) + 0, new = TRUE)
mtext("a", side = 3, line = -2, cex = 1.5, adj = 0.1)

# Over-plot the entire device with a
# clear plot to allow full legend
# control.
par(fig = c(0, 1, 0, 0.6), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0) + 0, new = TRUE)
legend("topright", fill = cell_type_pal, legend = levels(short.flow.names$cell_type),
    cex = 0.8, bty = "n")
cor.col <- bluered(11)

# Over-plot the entire device with a
# clear plot
par(fig = c(0, 1, 0, 1), mar = c(0, 0, 0, 0), new = TRUE)

# The plotcorr3 function is built from the
# plotcorr function in ellipse. Minor,
# simple adjustments were made to alter
# the colour scheme and plot margins. The
# original plotcorr is a better function
# to use in nearly all circumstances. If
# you want to directly replicate our
# figure, use plotcorr3.
plotcorr3 <- function(corr, outline = TRUE, col = "grey", numbers = FALSE,
    type = c("full", "lower", "upper"), diag = (type == "full"), bty = "n", axes = FALSE,

    xlab = "", ylab = "", asp = 1, cex.lab = par("cex.lab"), cex = 1 * par("cex"),
    mar = c(0, 5.5, 0, 0), col.lab = "", ...) {
    savepar <- par(pty = "s", mar = mar)
    on.exit(par(savepar))
    if (is.null(corr))
        return(invisible())
    if ((!is.matrix(corr)) || (round(min(corr, na.rm = TRUE), 6) < -1) ||
        (round(max(corr, na.rm = TRUE), 6) > 1))
        stop("need a correlation matrix")
    plot.new()
    par(new = TRUE)
    rowdim <- dim(corr)[1]
    coldim <- dim(corr)[2]
    rowlabs <- dimnames(corr)[[1]]
    collabs <- dimnames(corr)[[2]]
    if (is.null(rowlabs))
        rowlabs <- 1:rowdim
    if (is.null(collabs))
        collabs <- 1:coldim
    rowlabs <- as.character(rowlabs)
    collabs <- as.character(collabs)
    col <- rep(col, length = length(corr))
    dim(col) <- dim(corr)
    type <- match.arg(type)
    cols <- 1:coldim
    rows <- 1:rowdim
    xshift <- 0
    yshift <- 0
    if (!diag) {
        if (type == "upper") {
            cols <- 2:coldim
            rows <- 1:(rowdim - 1)
            xshift <- 1
        } else if (type == "lower") {
            cols <- 1:(coldim - 1)
            rows <- 2:rowdim
            yshift <- -1
        }
    }
    maxdim <- max(length(rows), length(cols))
    plt <- par("plt")
    xlabwidth <- max(strwidth(rowlabs[rows], units = "figure", cex = cex.lab))/(plt[2] - plt[1])
    xlabwidth <- xlabwidth * maxdim/(1 - xlabwidth)
    ylabwidth <- max(strwidth(collabs[cols], units = "figure", cex = cex.lab))/(plt[4] - plt[3])
    ylabwidth <- ylabwidth * maxdim/(1 - ylabwidth)

    plot(c(-xlabwidth - 0.5, maxdim + 0.5), c(0.5, maxdim + 1 + ylabwidth),
        type = "n", bty = bty, axes = axes, xlab = "", ylab = "", asp = asp,
        cex.lab = cex.lab, ...)
    text(rep(0, length(rows)), length(rows):1, labels = rowlabs[rows],
        adj = 1, cex = cex.lab, col = col.lab)
    text(cols - xshift, rep(length(rows) + 1, length(cols)), labels = collabs[cols],
        srt = 90, adj = 0, cex = cex.lab, col = col.lab)
    mtext(xlab, 1, 0)
    mtext(ylab, 2, 0)
    mat <- diag(c(1, 1))
    plotcorrInternal <- function() {
        if (i == j && !diag)
            return()
        if (!numbers) {
            mat[1, 2] <- corr[i, j]
            mat[2, 1] <- mat[1, 2]
            ell <- ellipse(mat, t = 0.43)
            ell[, 1] <- ell[, 1] + j - xshift
            ell[, 2] <- ell[, 2] + length(rows) + 1 - i - yshift
            polygon(ell, col = col[i, j])
            if (outline)
                lines(ell)
        } else {
            text(j + 0.3 - xshift, length(rows) + 1 - i - yshift,
                round(10 * corr[i, j], 0), adj = 1, cex = cex)
        }
    }
    for (i in 1:dim(corr)[1]) {
        for (j in 1:dim(corr)[2]) {
            if (type == "full") {
                plotcorrInternal()
            } else if (type == "lower" && (i >= j)) {
                plotcorrInternal()
            } else if (type == "upper" && (i <= j)) {
                plotcorrInternal()
            }
        }
    }
    invisible()
}

# plotcorr
plotcorr3(xxc3, type = "lower", col = cor.col[5 * xxc3 + 6], cex.lab = 0.5, diag = T, xlab = "",

    ylab = "", col.lab = short.flow.names[rev(hm$rowInd), ]$cell_type_col)
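The colour argument `cor.col[5 * xxc3 + 6]` maps each correlation in [-1, 1] onto one of the 11 colours: r = -1 gives index 1, r = 0 gives index 6 and r = 1 gives index 11. Non-integer indices are truncated towards zero when R subsets a vector. The sketch below illustrates the mapping under one assumption: base `colorRampPalette` is used as a stand-in for `gplots::bluered` so that it runs without gplots. `demo.pal`, `r` and `idx` are hypothetical names.

```r
# An 11-colour blue-white-red palette (stand-in for bluered(11)).
demo.pal <- colorRampPalette(c("blue", "white", "red"))(11)

# Example correlations and the index each one maps to.
r <- c(-1, -0.5, 0, 0.33, 1)
idx <- 5 * r + 6
print(idx)

# Fractional indices are truncated towards zero when subsetting,
# so 3.5 selects colour 3 and 7.65 selects colour 7.
print(demo.pal[idx])
```

One consequence of truncation is that only a correlation of exactly 1 reaches the eleventh colour; values just below 1 fall into the tenth.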

Panel B is shown below.

flow.cytokine.data <- data[, 13:66]
xc <- cor(flow.cytokine.data, use = "pairwise.complete.obs", method = "spearman")
xc.dist <- dist(xc)
# monoMDS from vegan package
set.seed(42)
xcmds <- monoMDS(xc.dist, k = 2)
par(mar = c(4, 8, 1, 8))
# plot the MDS plot
plot(xcmds, type = "p", ylim = c(-1, 1.75), las = 1, cex.axis = 0.75,
    xlab = "First dimension of non-metric multidimensional scaling (NMDS)",
    ylab = "Second dimension of NMDS")
# Add ellipses for each cell type
for (i in 1:nlevels(short.flow.names$cell_type)) {
    ordiellipse(xcmds, groups = short.flow.names$cell_type, draw = "polygon",
        col = cell_type_pal[i], show.groups = levels(short.flow.names$cell_type)[i],
        alpha = 75)
}
# Labelled spider for 'precursors'
ordispider(xcmds, groups = short.flow.names$cell_type, label = T,
    show.groups = "Precursor", spiders = "centroid")
# Unlabelled spiders for the other cell
# types
ordispider(xcmds, groups = short.flow.names$cell_type, label = F,
    show.groups = c("Core cell types", "Cytokine", "Humoral", "Inflammatory", "Regulatory"),
    spiders = "centroid")
legend("topleft", fill = adjustcolor(cell_type_pal, alpha = 1/255 * 125),
    levels(short.flow.names$cell_type), cex = 0.75, bty = "n")
mtext("b", side = 3, adj = -0.25, line = -1, cex = 1.5)

[Figure 1. Panel A: Spearman correlation matrix of the immune parameters, ordered by the dendrogram, with labels coloured by cell type (Core cell types, Cytokine, Humoral, Inflammatory, Precursor, Regulatory). Panel B: NMDS plot of the same parameters with an ellipse per cell type; x-axis: First dimension of non-metric multidimensional scaling (NMDS); y-axis: Second dimension of NMDS.]

Figure 2

# model: response ~ visit (within
# individuals) + PatientID (between
# individuals)

# All individuals
# For all flow parameters:
all.models <- matrix(nrow = nrow(short.flow.names), ncol = 12)
colnames(all.models) <- c("p_of_model", "r2_of_model", "prop_attributed_to_visit",
    "prop_attributed_to_patientid", "p_cont_healthy", "r2_cont_healthy",
    "prop_attributed_to_visit_cont_healthy", "prop_attributed_to_patientid_cont_healthy",
    "p_diarrhoea", "r2_diarrhoea", "prop_attributed_to_visit_diarrhoea",
    "prop_attributed_to_patientid_diarrhoea")
rownames(all.models) <- short.flow.names$short_name
for (i in c(13:54)) {
    model <- lm(all.data[, i] ~ all.data[, "visit"] + all.data[, "PatientID"],
        na.action = "na.exclude")
    all.models[c(i ), 2] <- summary(model)$r.squared
    all.models[c(i ), 1] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    if (i %in% c(13:23, 25:26, 28:39, 41:42, 44:46, 49:54)) {
        ca <- calc.relimp(model, diff = T, rela = T)
        all.models[c(i ), 3] <- ca$lmg[grepl(pattern = "visit", x = names(ca$lmg))]
        all.models[c(i ), 4] <- ca$lmg[grepl(pattern = "PatientID", x = names(ca$lmg))]
        rm(ca)
    }
    rm(model)
    cat(i, sep = "\t")
}
# All models have R2 and p from lm; those that calc.relimp can disentangle are disentangled.
colnames(all.data)

# Continuously healthy cohort
# Repeat these loops excluding the volunteers
# who got diarrhoea AND for whom we have the
# first visit, i.e. define a
# 'continuously healthy' population. If
# they got diarrhoea, they got diarrhoea
# between visits 1 and 2. We have lots
# of people with visits 2, 3, 4 and not
# visit 1, so they are 'continuously
# healthy'.
cont.healthy <- all.data[!all.data$sickduringtravel. == "yes", ] # no one who was sick
# get just the multiple attenders:
cont.healthy.ids <- cont.healthy[duplicated(cont.healthy$PatientID),

    ]$PatientID
cont.healthy.ids <- factor(cont.healthy.ids)
cont.healthy <- cont.healthy[cont.healthy$PatientID %in% cont.healthy.ids, ]
summary(cont.healthy$visit)
cont.healthy$PatientID <- factor(cont.healthy$PatientID)
nlevels(cont.healthy$PatientID)
nrow(cont.healthy)
# Now get the travellers where we missed
# their diarrhoea spots:
sickies <- all.data[all.data$sickduringtravel. == "yes" & duplicated(all.data$PatientID),
    ]$PatientID
sickies.visits <- all.data[all.data$PatientID %in% sickies, c("visit", "PatientID")]
sickies.from.start <- sickies.visits[sickies.visits == 1, ]$PatientID
cont.healthy <- rbind(cont.healthy,
    all.data[all.data$PatientID %in% sickies[(!sickies %in% sickies.from.start)], ])
cont.healthy$PatientID <- factor(cont.healthy$PatientID)
nlevels(cont.healthy$PatientID)
summary(cont.healthy$PatientID)
for (i in c(13:54)) {
    model <- lm(cont.healthy[, i] ~ cont.healthy[, "visit"] + cont.healthy[, "PatientID"],
        na.action = "na.exclude")
    all.models[c(i ), 6] <- summary(model)$r.squared
    all.models[c(i ), 5] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    if (i %in% c(13:23, 25:26, 28:39, 41:42, 44:46, 49:54)) {
        ca <- calc.relimp(model, diff = T, rela = T)
        all.models[c(i ), 7] <- ca$lmg[grepl(pattern = "visit", x = names(ca$lmg))]
        all.models[c(i ), 8] <- ca$lmg[grepl(pattern = "PatientID", x = names(ca$lmg))]
        rm(ca)
    }
    rm(model)
    cat(i, sep = "\t")
}

Travelers cohort definition

Diarrhoeal cohort

travelers <- all.data[all.data$sickduringtravel. %in% c("yes", "no"), ]
travelers <- travelers[travelers$visit %in% c(1, 2), ] # Get people immediately prior and on return from travel.
travelers <- travelers[c(1:117, 119), ] # There's a re-staining of a PBMC sample with date 'NA'
travelers$PatientID <- factor(travelers$PatientID) # Remove unused levels
nlevels(travelers$PatientID)

both_visits <- levels(travelers$PatientID)[summary(travelers$PatientID, max = 1e+05) == 2]
travelers <- travelers[travelers$PatientID %in% both_visits, ]
summary(travelers$visit)
summary(travelers$sickduringtravel.)
travelers$PatientID <- factor(travelers$PatientID) # remove empty levels
all(travelers[travelers$visit == 1, ]$PatientID %in% travelers[travelers$visit == 2, ]$PatientID)
all(travelers[travelers$visit == 2, ]$PatientID %in% travelers[travelers$visit == 1, ]$PatientID)
# Repeat for those with diarrhoea:
diarrhoea.getters.id <- all.data[all.data$visit == 1 & all.data$sickduringtravel. == "yes",
    ]$PatientID
diarrhoea.getters.id <- factor(diarrhoea.getters.id)
for (i in c(13:23, 25:54)) {
    model <- lm(all.data[all.data$PatientID %in% diarrhoea.getters.id, i] ~
        all.data[all.data$PatientID %in% diarrhoea.getters.id, "visit"] +
        all.data[all.data$PatientID %in% diarrhoea.getters.id, "PatientID"],
        na.action = "na.exclude")
    all.models[c(i ), 10] <- summary(model)$r.squared
    all.models[c(i ), 9] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    rm(model)
    cat(i, sep = "\t")
}

MDS plot

# Correlation matrix:
travelers.cor <- cor(t(travelers[, 13:66]), use = "pairwise.complete.obs", method = "spearman")
rownames(travelers.cor) <- paste0(travelers$PatientID, "V", travelers$visit)
colnames(travelers.cor) <- paste0(travelers$PatientID, "V", travelers$visit)
# Distance matrix:
d <- dist(travelers.cor)
# MDS scaling
fit <- cmdscale(d, eig = TRUE, k = 2) # k is the number of dim
# fit$points stores the x y coordinates,
# so we pull them out for ease
ff <- fit$points
# Measure distances: 1. make a 'pair'
# factor:
ff <- as.data.frame(ff)
ff$pair <- paste0(substr(rownames(ff), 1, 6))
ff$pair <- factor(ff$pair)
# 2. calculate distance between visits:
pc.pair.distances <- matrix(nrow = nlevels(ff$pair),

    ncol = 1) # A container for the results
for (i in 1:nlevels(ff$pair)) {
    pair2 <- ff[ff$pair %in% levels(ff$pair)[i], ] # this should give 2 rows and a single pair.
    pc.pair.distances[i, 1] <- sqrt(((pair2[1, 1] - pair2[2, 1]) * (pair2[1, 1] - pair2[2, 1])) # PC1 difference
        + ((pair2[1, 2] - pair2[2, 2]) * (pair2[1, 2] - pair2[2, 2])) # PC2 difference
    )
    rm(pair2)
}
# 3. split into those who had diarrhoea
# and those who did not have diarrhoea:
diarrhoea <- travelers[travelers$visit == 1 & travelers$sickduringtravel. == "yes", ]
diarrhoea.pc.pair.distances <- pc.pair.distances[levels(ff$pair) %in% diarrhoea$PatientID, ]
nodiarrhoea.pc.pair.distances <- pc.pair.distances[!levels(ff$pair) %in% diarrhoea$PatientID, ]
# Now to construct the figure:
par(mar = c(6, 5, 2, 2) + 1)
m <- rbind(c(1, 1), c(1, 1), c(2, 2), c(2, 2), c(3, 4), c(3, 4), c(3, 4), c(5, 5), c(5, 5))
layout(m)
plot(all.models[1:43, "r2_of_model"], ylim = c(0, 1), ylab = expression(italic("r"^"2")),
    xlab = "", las = 1, pch = 1, cex = 1.5, xaxt = "n")
Axis(side = 1, at = 1:43, labels = rownames(all.models[1:43, ]), las = 3)
# points(all.models.3[,'r2_of_model3'],
# ylim = c(0,1), ylab = expression('r' ^
# '2' * ' of model'), xlab = 'Immune
# parameters', las = 1, pch = 20)
points(all.models[1:43, "r2_cont_healthy"], ylim = c(0, 1), pch = 20)
points(all.models[1:43, "r2_diarrhoea"], ylim = c(0, 1), pch = 15)
legend("bottomright", pch = c(1, 20, 15),
    legend = c("All volunteers", "Continuously healthy", "Acute gastroenteritis"),
    ncol = 3, bty = "n")
mtext("a", side = 3, adj = 0, line = 1, cex = 1.5)
plot(-log10(p.adjust(all.models[1:43, "p_of_model"], method = "bonferroni",
    n = nrow(all.models[!is.na(all.models[, "p_of_model"]), ]))), ylim = c(0, 25),
    ylab = expression("-log"["10"] * italic(" P")), xlab = "", las = 1, pch = 1,
    cex = 1.5, xaxt = "n")

Axis(side = 1, at = 1:43, labels = rownames(all.models[1:43, ]), las = 3)
points(-log10(p.adjust(all.models[1:43, "p_cont_healthy"], method = "bonferroni",
    n = nrow(all.models[!is.na(all.models[, "p_cont_healthy"]), ]))), ylim = c(0, 25), pch = 20)
points(-log10(p.adjust(all.models[1:43, "p_diarrhoea"], method = "bonferroni",
    n = nrow(all.models[!is.na(all.models[, "p_diarrhoea"]), ]))), ylim = c(0, 25), pch = 15)
# legend('topleft', pch = c(1,20), legend
# = c('all volunteers', 'No diarrhoea'))
abline(h = -log10(0.05))
mtext("b", side = 3, adj = 0, line = 1, cex = 1.5)
diarrhoea.pal <- brewer.pal(3, "Dark2")
diarrhoea.pal <- diarrhoea.pal[1:2]
diarrhoea.pal <- rev(diarrhoea.pal)
plot(ff[, 1], ff[, 2], xlab = "Principal coordinate 1", ylab = "Principal coordinate 2",
    type = "n", las = 1)
for (i in 1:nlevels(ff$pair)) {
    lines(ff[ff$pair == levels(ff$pair)[i], 1], ff[ff$pair == levels(ff$pair)[i], 2],
        col = "grey")
}
points(ff[, 1], ff[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2", type = "p",
    pch = ifelse(grepl(x = substr(rownames(ff), 7, 8), "V1"), 20, 18), cex = 1.3,
    col = ifelse(ff$pair %in% diarrhoea$PatientID, diarrhoea.pal[1], diarrhoea.pal[2]))
mtext("c", side = 3, adj = 0, line = 1, cex = 1.5)
# legend('bottomright', pch = c(20), col
# = diarrhoea.pal[c(2,1)], legend =
# c('continuously healthy', 'Acute
# gastroenteritis'),ncol = 1, bty = 'n',
# cex = 1.3)
stripchart(diarrhoea.pc.pair.distances, main = "", vertical = F,
    xlim = c(0, max(diarrhoea.pc.pair.distances) + 2), method = "jitter", jitter = 1,
    at = 4, pch = 20, xlab = "Immunological distance", col = diarrhoea.pal[1])
boxplot(pc.pair.distances, add = T, at = 2.5, horizontal = T, outline = F,
    frame.plot = F, axes = F)
stripchart(nodiarrhoea.pc.pair.distances, vertical = F, method = "jitter", jitter = 1,
    add = T, at = 0, col = diarrhoea.pal[2], pch = 20)
boxplot(nodiarrhoea.pc.pair.distances, add = T, at = -1.5, horizontal = T, outline = F,
    frame.plot = F, axes = F)
axis(side = 2, at = c(0, 4),
    labels = c("Continuously \n healthy", "Acute \n gastroenteritis"), las = 1)

legend("bottomright", pch = 20, col = diarrhoea.pal,
    c("Acute gastroenteritis", "Continuously healthy"), ncol = 2, bty = "n") # title = ('Paired samples')
mtext("d", side = 3, adj = 0, line = 1, cex = 1.5)
lines(x = c(2.7, 2.7), y = c(-1.5, 2.5), lwd = 2)
text(x = 2.9, y = 0.5, adj = c(0, NA),
    label = paste0("p=", signif(wilcox.test(diarrhoea.pc.pair.distances,
        nodiarrhoea.pc.pair.distances)$p.value, digits = 2)), cex = 1.1)
barplot(t(all.models[1:43 & !is.na(all.models[, 3]), 3:4]),
    ylab = expression("proportion of " * italic("r"^"2")), las = 2, cex.names = 1,
    names = gsub("btrans", " ",
        x = gsub("Plasmablast", " ",
        x = gsub("CD8+GMCSF+", " ", fixed = T,
        x = gsub("CD4+GMCSF+", " ", fixed = T,
        x = gsub("B IgE+", " ", fixed = T,
        x = gsub("Th10", " ", rownames(all.models[1:43 & !is.na(all.models[, 3]), ]))))))),
    legend.text = gsub("prop_attributed_to_visit", "intraindividual",
        gsub(pattern = "prop_attributed_to_patientid", replacement = "interindividual",
            colnames(all.models[1:43, 3:4]))),
    args.legend = list("topleft", bg = "white", bty = "o", box.lwd = 0))
mtext("e", side = 3, adj = 0, line = 1, cex = 1.5)
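The "immunological distance" in panel D is the Euclidean distance, in the two-dimensional MDS space, between the two visits of the same individual, and the two groups are compared with a Wilcoxon test. The following is a minimal, self-contained sketch of that calculation on simulated data. It makes two simplifying assumptions: the distances are taken directly between simulated measurement vectors (the code above first builds a sample-by-sample Spearman correlation matrix), and the group labels are arbitrary. All object names (`sim`, `ids`, `pair.dist`, `group`, `test`) are hypothetical.

```r
set.seed(42)

# Simulated stand-in: 6 individuals, 2 visits each, 5 measurements per sample.
n.id <- 6
ids <- rep(sprintf("ID%04d", 1:n.id), each = 2)   # 6-character IDs
visit <- rep(1:2, times = n.id)
sim <- matrix(rnorm(length(ids) * 5), ncol = 5)
rownames(sim) <- paste0(ids, "V", visit)

# Classical MDS on Euclidean distances between samples.
fit <- cmdscale(dist(sim), eig = TRUE, k = 2)
ff <- as.data.frame(fit$points)
ff$pair <- factor(substr(rownames(ff), 1, 6))

# Euclidean distance between the two visits of each individual.
pair.dist <- vapply(levels(ff$pair), function(p) {
    xy <- as.matrix(ff[ff$pair == p, 1:2])
    sqrt(sum((xy[1, ] - xy[2, ])^2))
}, numeric(1))

# Compare two arbitrary, simulated groups of individuals.
group <- rep(c("sick", "healthy"), each = n.id / 2)
test <- wilcox.test(pair.dist[group == "sick"], pair.dist[group == "healthy"])
print(pair.dist)
print(test$p.value)
```

Using `vapply` over the pair levels gives the same result as the explicit loop with a pre-allocated matrix; it is only a stylistic alternative.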

[Figure 2. Panel A: r2 of the model for each immune parameter (All volunteers, Continuously healthy, Acute gastroenteritis). Panel B: -log10 P for the same models. Panel C: paired visits plotted on Principal coordinate 1 against Principal coordinate 2. Panel D: Immunological distance for Acute gastroenteritis and Continuously healthy travellers. Panel E: Proportion of r2 attributed to interindividual and intraindividual variation for each immune parameter.]

Figure 4

Firstly some calculations.

# Plot the correlates:
cors <- stats::cor(cbind(data[, 13:66], Age = data$Ageattimeofsampling.years.),
    use = "pairwise.complete.obs", method = "spearman")
cors <- cors[rownames(cors) != ("Th10"), ] # there is only 1 sample with age + Th10 data
cors <- cors[, colnames(cors) != ("Th10")] # there is only 1 sample with age + Th10 data
# Ready for the plotcorr
xc <- cors[order(cors[, "Age"], na.last = F), order(cors[, "Age"], na.last = F)]
cor.col <- bluered(11)
rownames(xc)
# Coloured labels for the cell type
# colours
xc.col <- character(length = nrow(xc))
for (i in 1:(nrow(xc) - 1)) {
    xc.col[i] <- short.flow.names[short.flow.names$short_name == rownames(xc)[i],
        ]$cell_type_col
}
# Age - the final row - should be black
xc.col[nrow(xc)] <- "black"
# Calculate the p-values for age
colnames(xc)
ord.flow.data <- data[, colnames(xc)[1:(ncol(xc) - 1)]] # Get the data in the right order, so we get p values to correspond with the r2 in the corrplot
ord.flow.data <- cbind(ord.flow.data, Age = data$Ageattimeofsampling.years.) # Get the data in the righ
ord.flow.data <- ord.flow.data[!is.na(ord.flow.data$Age), ]
summary(ord.flow.data)
p.values <- numeric()
for (i in 1:(ncol(ord.flow.data) - 1)) {
    p.temp <- pspearman::spearman.test(ord.flow.data[!is.na(ord.flow.data[, i]), i],
        ord.flow.data[!is.na(ord.flow.data[, i]), ]$Age, approximation = "t-distribution")
    p.temp$p.value
    p.values[i] <- p.temp$p.value
    cat(i)
}
p.values <- p.adjust(p = p.values, method = "bonferroni", n = length(p.values))

Code for panel A.

plotcorr3(xc, mar = c(0.1 + c(0, 0, 0, 8)), type = "lower", col = cor.col[5 * xc + 6],
    cex.lab = 0.5, diag = T, col.lab = xc.col)
mtext("a", side = 3, adj = 0, line = 0, cex = 1.5)
par(fig = c(0, 1, 0, 1), mar = c(0, 4, 0, 0), omi = c(2.9, 7.3/2-0.5, 0.8, 0.2),
    mgp = c(1, 0.6, 0), new = TRUE) # Over-plot for inset graphs control.
# Top of inset figure

plot(xc[!rownames(xc) %in% c("Age"), "Age"], ylab = "", ylim = c(-1, 1), pch = 20,
    col = cor.col[5 * xc[, "Age"] + 6], cex = 1, xlab = "", las = 1, cex.axis = 0.75,
    xaxt = "n")
points(xc[!rownames(xc) %in% c("Age"), "Age"], pch = 21, cex = 1)
axis(side = 1, at = c(0, 10, 20, 30, 40, 50), labels = F, tick = T)
abline(v = c(10, 20, 30, 40, 50, 60, 70, 80), lty = 3, cex = 1.5)
mtext(expression(italic("r")), side = 2, line = 1.5, adj = 0.5, cex = 0.75)
par(fig = c(0, 1, 0, 1), mar = c(0, 4, 0, 0), omi = c(1.9, 7.3/2-0.5, 1.8, 0.2),
    mgp = c(1, 0.6, 0), new = TRUE)
plot(-log10(p.values), ylab = "", pch = 20, cex = 1, xlab = "Immune parameter", las = 1,
    ylim = c(0, 30), cex.axis = 0.75, xaxt = "n")
Axis(side = 1, at = c(0, 10, 20, 30, 40, 50), cex.axis = 0.75)
mtext(text = "Immune parameter", side = 1, line = 1.5, adj = 0.5, cex = 0.75)
mtext(expression("-log"["10"] * italic(" P")), side = 2, line = 1.5, adj = 0.6, cex = 0.75)
abline(v = c(10, 20, 30, 40, 50, 60, 70, 80), lty = 3, cex = 1.5)
abline(h = -log10(0.01), cex = 3)
# Bottom half of inset figure
age.sex.models <- matrix(nrow = nrow(short.flow.names), ncol = 8)
rownames(age.sex.models) <- rownames(short.flow.names$short_name)
for (i in c(13:23, 25:66)) {
    model <- lm(data[, i] ~ data[, "Ageattimeofsampling.years."] + data[, "Sex"],
        na.action = "na.exclude")
    library(relaimpo)
    ca <- calc.relimp(model, diff = T, rela = T)
    age.sex.models[c(i ), 1] <- ca$R2
    age.sex.models[c(i ), 2] <- ca$lmg[grepl(pattern = "Age", x = names(ca$lmg))]
    age.sex.models[c(i ), 3] <- ca$lmg[grepl(pattern = "Sex", x = names(ca$lmg))]
    age.sex.models[c(i ), 4] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    model <- lm(data[, i] ~ data[, "Ageattimeofsampling.years."], na.action = "na.exclude")
    age.sex.models[c(i ), 5] <- summary(model)$r.squared
    age.sex.models[c(i ), 6] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    model <- lm(data[, i] ~ data[, "Sex"], na.action = "na.exclude")

    age.sex.models[c(i ), 7] <- summary(model)$r.squared
    age.sex.models[c(i ), 8] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    cat(i)
}
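The loop above uses `calc.relimp` to split each model's R2 between age and sex. For a model with two predictors, the `lmg` metric can be understood as the R2 gained by adding a predictor, averaged over the two possible orders of entry. The sketch below reproduces that idea with base `lm` only, on simulated data. It is an illustration of the principle under that assumption and does not call relaimpo; all names (`age`, `sex`, `y`, `r2`, `shares`) are hypothetical.

```r
set.seed(42)

# Simulated response that depends on both age and sex.
n <- 100
age <- runif(n, 1, 80)
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
y <- 0.05 * age + 0.5 * (sex == "M") + rnorm(n)

# Helper returning the R2 of a linear model.
r2 <- function(formula) summary(lm(formula))$r.squared
r2.full <- r2(y ~ age + sex)
r2.age <- r2(y ~ age)
r2.sex <- r2(y ~ sex)

# Average each predictor's R2 gain over the two possible entry orders.
lmg.age <- mean(c(r2.age, r2.full - r2.sex))
lmg.sex <- mean(c(r2.sex, r2.full - r2.age))

# Relative shares, analogous to rela = TRUE: they sum to 1.
shares <- c(age = lmg.age, sex = lmg.sex) / r2.full
print(round(shares, 3))
```

By construction the two unnormalised contributions add up to the R2 of the full model, which is why the shares can be read as proportions of explained variance.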

Panels B to K drawn here.

correlations <- cbind(r = xc[rownames(xc) != "Age", "Age"], p.values,
    neg.log10.p = -log10(p.values))
correlations <- cbind(correlations, r2 = correlations[, "r"] * correlations[, "r"]) # r2 here is not truly R2, but it is a handy way to set an r <
# Now we can plot a few, targeted
# scatterplots to show a correlation with
# age
sig.correlations <- correlations[correlations[, "p.values"] < 0.01 & correlations[, "r2"] > , ]
sig.correlations <- as.data.frame(sig.correlations)
# sig.correlations <-
# sig.correlations[rev(order(apply(data[!
# is.na(data$ageattimeofsampling.years.)
# ,rownames(sig.correlations)], 2,
# function(x) max(x, na.rm = TRUE)))),]
# #this was to sort by magnitude of
# response
sig.correlations$cell_type <- factor("", levels = levels(short.flow.names$cell_type))
sig.correlations$cell_type_col <- as.character(1:nrow(sig.correlations))
for (i in 1:nrow(sig.correlations)) {
    sig.correlations$cell_type[i] <- short.flow.names[short.flow.names$short_name ==
        rownames(sig.correlations)[i], ]$cell_type
    sig.correlations$cell_type_col[i] <- short.flow.names[short.flow.names$short_name ==
        rownames(sig.correlations)[i], ]$cell_type_col
}
sig.correlations$fudge_for_ordering <- as.character(sig.correlations$cell_type)
sig.correlations[sig.correlations$fudge_for_ordering == "Cytokine", ]$fudge_for_ordering <- "A"
sig.correlations <- sig.correlations[order(sig.correlations$fudge_for_ordering,
    decreasing = T), ]
# Sort these into order cell_type groups:
library(RColorBrewer)
data$sex_cols <- factor(data$Sex, levels = c("F", "M"))
pal <- brewer.pal(3, "Set1")
par(mfrow = c(2, 5))
par(cex = 1)
par(mar = c(0.75, 0.75, 0.75, 0.75), oma = c(2, 3, 1, 3))
par(tcl = -0.25)
par(mgp = c(2, 0.4, 0))
for (i in 1:nrow(sig.correlations)) {
    plot(data[, rownames(sig.correlations)[i]] ~ data$Ageattimeofsampling.years.,
        col = pal[data$sex_cols], axes = FALSE, type = "p", pch = 20, cex = 0.6,
        ylim = c(ifelse(sig.correlations$cell_type[i] != "Cytokine", 0,
            min(data[!is.na(data$Ageattimeofsampling.years.), rownames(sig.correlations)[i]],
                na.rm = T)),
            1.2 * max(data[!is.na(data$Ageattimeofsampling.years.),

                rownames(sig.correlations)[i]], na.rm = T)), yaxt = "n", xaxt = "n")
    Axis(x = ifelse(1.2 * max(data[!is.na(data$Ageattimeofsampling.years.),
            rownames(sig.correlations)[i]], na.rm = T) < 100,
        1.2 * max(data[!is.na(data$Ageattimeofsampling.years.),
            rownames(sig.correlations)[i]], na.rm = T), 100),
        side = ifelse(sig.correlations$cell_type[i] == "Cytokine", 4, 2),
        labels = T, las = 1, cex.axis = 0.6)
    box()
    mtext(letters[i + 1], side = 3, line = 0, adj = 0, cex = 1.5)
    mtext(rownames(sig.correlations)[i], side = 3, line = 0.75, adj = 1, cex = 0.75,
        col = sig.correlations$cell_type_col[i])
    mtext(paste0("p=", signif(as.numeric(sig.correlations[i, "p.values"]), digits = 2)),
        side = 3, line = 0, adj = 1, cex = 0.75)
    if (i == 1)
        legend("topright", c("F", "M"), col = pal[1:nlevels(data$sex_cols)], pch = 20,
            cex = 0.6, ncol = 1)
    if (i %in% c(6:10))
        Axis(x = data$Ageattimeofsampling.years., side = 1, cex.axis = 0.75,
            at = c(0, 25, 50, 75))
    if (i %in% c(1:5))
        Axis(x = data$Ageattimeofsampling.years., side = 1, at = c(0, 25, 50, 75),
            labels = F, tick = T, tcl = 0.25)
    abline(coef(lm(data[, rownames(sig.correlations)[i]] ~ data$Ageattimeofsampling.years.)),
        lwd = 2)
    abline(coef(lm(data[data$Sex == "M", rownames(sig.correlations)[i]] ~
        data[data$Sex == "M", "Ageattimeofsampling.years."], na.action = na.omit)),
        untf = T, col = pal[2], lwd = 2)
    abline(coef(lm(data[data$Sex == "F", rownames(sig.correlations)[i]] ~
        data[data$Sex == "F", "Ageattimeofsampling.years."], na.action = na.omit)),
        untf = T, col = pal[1], lwd = 2)
}
mtext("age", side = 1, outer = TRUE, line = 1, cex = 0.75)
mtext("% flow parameter", side = 2, outer = TRUE, line = 1, cex = 0.75)
mtext(expression("log"[10] * "[cytokine]/pg.ml"^"-1"), side = 4, outer = TRUE, line = 1,
    cex = 0.75)

Code for panels L and M

par(mfcol = c(1, 2), mar = c(3, 5, 0.5, 0.5), omi = c(0, 0, 0, 0), mgp = c(1, 0.6, 0))
plot(age.sex.models[, 5], ylab = "", xlab = "", xaxt = "n", yaxt = "n", las = 1, pch = 15,
    cex = 0.75, ylim = c(0, 1), cex.axis = 0.75, cex.lab = 0.75, col = "grey") # Age
points(age.sex.models[, 7], ylim = c(0, 1), ylab = expression(italic("r"^"2")), xlab = "",
    las = 1, cex = 0.75, xaxt = "n", pch = 20) # Sex
points(age.sex.models[, 1], ylim = c(0, 1), ylab = expression(italic("r"^"2")), xlab = "",
    las = 1, cex = 1.25, xaxt = "n", pch = 1, col = "black") # Both
Axis(side = 1, at = c(0, 10, 20, 30, 40, 50), cex.axis = 0.75)
Axis(side = 2, at = c(0, 0.5, 1), las = 1, cex.axis = 0.75)
mtext(text = "Immune parameter", side = 1, line = 1.5, adj = 0.5, cex = 0.75)
mtext(expression(italic("r"^"2")), side = 2, line = 1.5, adj = 0.6, cex = 0.75)
mtext("l", side = 3, adj = -0.3, line = -1, cex = 1.5)
legend("topleft", pch = c(1, 20, 15), pt.cex = c(1.25, 0.75, 0.75),
    col = c("black", "black", "grey"), ncol = 3, legend = c("Sex + age", "Sex", "Age"),
    cex = 0.75, bty = "n")
plot(-log10(p.adjust(age.sex.models[, 6], method = "bonferroni", n = nrow(age.sex.models))),
    ylim = c(0, 40), ylab = "", xlab = "", xaxt = "n", las = 1, pch = 15, cex = 0.75,
    cex.axis = 0.75, cex.lab = 0.75, yaxt = "n", col = "grey")
points(-log10(p.adjust(age.sex.models[, 8], method = "bonferroni", n = nrow(age.sex.models))),
    ylim = c(0, 25), pch = 20, cex = 0.75)
points(-log10(p.adjust(age.sex.models[, 4], method = "bonferroni", n = nrow(age.sex.models))),
    ylim = c(0, 25), pch = 1, cex = 1.25)
Axis(side = 1, at = c(0, 10, 20, 30, 40, 50), cex.axis = 0.75)
Axis(side = 2, at = c(0, 20, 40), cex.axis = 0.75, las = 1)
mtext(text = "Immune parameter", side = 1, line = 1.5, adj = 0.5, cex = 0.75)
abline(h = -log10(0.05))
mtext(expression("-log"["10"] * italic(" P")), side = 2, line = 1.5, adj = 0.6, cex = 0.75)
mtext("m", side = 3, adj = -0.3, line = -1, cex = 1.5)
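Panels such as M plot Bonferroni-adjusted p-values on a -log10 scale against a significance line. The sketch below shows that step in isolation on simulated data. Two things are assumptions made for illustration: base `cor.test` is used in place of `pspearman::spearman.test` so that no extra package is needed, and the data and names (`age`, `params`, `raw.p`, `adj.p`) are hypothetical.

```r
set.seed(42)

# Simulated data: one parameter related to age, two unrelated.
n <- 60
age <- runif(n, 1, 80)
params <- data.frame(p1 = 0.1 * age + rnorm(n), p2 = rnorm(n), p3 = rnorm(n))

# Spearman test of each parameter against age.
raw.p <- vapply(params, function(x) {
    cor.test(x, age, method = "spearman", exact = FALSE)$p.value
}, numeric(1))

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
adj.p <- p.adjust(raw.p, method = "bonferroni", n = length(raw.p))

# Values on the plotting scale, and the 0.05 threshold on the same scale.
print(-log10(adj.p))
print(-log10(0.05))
```

Points lying above the threshold value correspond to adjusted p-values below 0.05.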

[Figure output, panels A to M. Panel A: the immune parameters (CD4+RTE, B trans, CD8+RTE, CD4+naive, CD8+naive, ..., Tc1, CD8+IL2+) shown against Age. Panels B to K: individual parameters plotted against age and coloured by sex (F, M): B, CD4+RTE; C, B trans; D, CD8+RTE; E, Th1; F, CD4+IL2+; G, Tc1; H, CD8+IL2+; I, CD8+; J, iNKT; K, IL-6. Axes: Age (x), % flow parameter (left y), log10 [cytokine]/pg.ml^-1 (right y). Panel L: R² per immune parameter for sex + age, sex, and age. Panel M: -log10 P per immune parameter.]

Figure 5

par(mar = c(6, 5, 2, 3) + 1)
mm <- cbind(c(1, 1, 2, 2, 3, 3, 3, 5, 5), c(1, 1, 2, 2, 4, 4, 4, 5, 5))
layout(mm)
# Now we can try a depression/anxiety model:
colnames(data)
HADS <- matrix(nrow = , ncol = 4)
rownames(HADS) <- colnames(data)[13:66]
colnames(HADS) <- c("p_of_model", "r2_of_model", "HADSdepression", "HADSanxiety")
# visually inspect ca$lmg from the loop functions below to check c
for (i in c(13:24, 26:64)) {
    model <- lm(data[, i] ~ data$hadsdepressionscore + data$hadsanxietyscore)
    # Overall model P value from the F statistic
    HADS[c(i ), 1] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    HADS[c(i ), 2] <- summary(model)$r.squared
    if (i %in% c(13:24, 26:63)) {
        ca <- calc.relimp(model, diff = T, rela = T)
        HADS[c(i ), 3] <- ca$lmg[grepl(pattern = "depression", x = names(ca$lmg))]
        HADS[c(i ), 4] <- ca$lmg[grepl(pattern = "anxiety", x = names(ca$lmg))]
        rm(ca)
    }
    rm(model)
    cat(i, sep = "\t")
}
colnames(data)[13:66] == short.flow.names$short_name
length(colnames(data)[13:66])
length(short.flow.names$short_name)
# Check there are no children with BMI measurements:
summary(data[data$Ageattimeofsampling.years. < 18, ]$BMI)
# No one under the age of 18 has a BMI listed.
BMI.models <- matrix(nrow = , ncol = 8)
rownames(BMI.models) <- colnames(data)[13:66]
colnames(BMI.models) <- c("p_of_model", "r2_of_model", "BMI", "Age",
    "r2_for_bmi_alone", "p_for_bmi_alone", "r2_for_age_alone", "p_for_age_alone")
# visually inspect ca$lmg from the loop functions below to c
data.no.kids <- data[data$Ageattimeofsampling.years. >= 18, ]  # Skipping out any children.
for (i in c(13:24, 26:65)) {
    model <- lm(data.no.kids[, i] ~ data.no.kids$BMI + data.no.kids$Ageattimeofsampling.years.,
        na.action = "na.exclude")

    BMI.models[c(i ), 1] <- pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
        summary(model)$fstatistic[3], lower.tail = F)
    BMI.models[c(i ), 2] <- summary(model)$r.squared
    cat(i)
    ca <- calc.relimp(model, diff = T, rela = T)
    BMI.models[c(i ), 3] <- ca$lmg[grepl(pattern = "BMI", x = names(ca$lmg))]
    BMI.models[c(i ), 4] <- ca$lmg[grepl(pattern = "Age", x = names(ca$lmg))]
    rm(ca)
}
# Single-predictor models: BMI alone, then age alone
for (i in c(13:24, 26:65)) {
    model.bmi <- lm(data.no.kids[, i] ~ data.no.kids$BMI)
    BMI.models[c(i ), 5] <- summary(model.bmi)$r.squared
    BMI.models[c(i ), 6] <- pf(summary(model.bmi)$fstatistic[1],
        summary(model.bmi)$fstatistic[2], summary(model.bmi)$fstatistic[3],
        lower.tail = F)
    model.age <- lm(data.no.kids[, i] ~ data.no.kids$Ageattimeofsampling.years.)
    BMI.models[c(i ), 7] <- summary(model.age)$r.squared
    BMI.models[c(i ), 8] <- pf(summary(model.age)$fstatistic[1],
        summary(model.age)$fstatistic[2], summary(model.age)$fstatistic[3],
        lower.tail = F)
    rm(model)
    cat(i, sep = "\t")
}
plot(BMI.models[, 7], ylim = c(0, 1), ylab = expression(italic("R"^"2")),
    xlab = "", las = 1, pch = 15, cex = 1.5, xaxt = "n", col = "grey")  # Age only
Axis(side = 1, at = 1:53, labels = rownames(all.models[1:53, ]), las = 3)
points(BMI.models[, 5], pch = 20)  # BMI only
points(BMI.models[, 2], pch = 1, cex = 1.5)  # Both
legend("topleft", pch = c(1, 20, 15), pt.cex = c(1.5, 1, 1),
    col = c("black", "black", "grey"), legend = c("BMI + age", "BMI", "Age"),
    ncol = 3, bty = "n")
mtext("A", side = 3, adj = 0, line = 1, cex = 1.5)
plot(-log10(p.adjust(BMI.models[, 8], method = "bonferroni", n = 65 - 13)),
    ylab = expression("-log"["10"] * italic(" P")), xlab = "", las = 1,
    pch = 15, cex = 1, col = "grey", xaxt = "n", ylim = c(0, 40))
points(-log10(p.adjust(BMI.models[, 6], method = "bonferroni", n = 65 - 13)),
    pch = 20)  # BMI only
points(-log10(p.adjust(BMI.models[, 1], method = "bonferroni", n = 65 - 13)),
    pch = 1, cex = 1.5)  # Both
abline(h = -log10(0.05))
Axis(side = 1, at = 1:53, labels = rownames(all.models[1:53, ]), las = 3)
mtext("B", side = 3, adj = 0, line = 1, cex = 1.5)
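The loops above obtain the overall P value of each linear model by passing the three components of `summary(model)$fstatistic` to `pf()`. The sketch below reproduces that calculation on simulated data (the variables `age`, `bmi` and `y` are made up for illustration) and cross-checks it against an F test comparing the model with an intercept-only fit:

```r
set.seed(1)
n <- 100
age <- runif(n, 18, 80)             # simulated ages
bmi <- rnorm(n, mean = 25, sd = 4)  # simulated BMI values
y <- 0.05 * age + rnorm(n)          # simulated immune parameter

model <- lm(y ~ bmi + age)

# fstatistic is a named vector: F value, numerator df, denominator df
fstat <- summary(model)$fstatistic
p.model <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
r2.model <- summary(model)$r.squared

# Cross-check: F test of the full model against the intercept-only model
p.anova <- anova(lm(y ~ 1), model)[2, "Pr(>F)"]

print(c(p.model = unname(p.model), p.anova = p.anova, r2 = r2.model))
```

The upper tail (`lower.tail = FALSE`) is the one needed here, because the P value is the probability of an F statistic at least as large as the one observed.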

plot(data.no.kids$BMI ~ data.no.kids$Ageattimeofsampling.years., ylab = "BMI",
    xlab = "Age", col = pal[data.no.kids$sex_cols], pch = 20, cex = 1.5)
abline(coef(lm(data.no.kids$BMI ~ data.no.kids$Ageattimeofsampling.years.)),
    lwd = 2.5)
abline(coef(lm(data.no.kids[data.no.kids$sex == "M", ]$BMI ~
    data.no.kids[data.no.kids$sex == "M", "Ageattimeofsampling.years."],
    na.action = na.omit)), untf = T, col = pal[2], lwd = 2.5)
abline(coef(lm(data.no.kids[data.no.kids$sex == "F", ]$BMI ~
    data.no.kids[data.no.kids$sex == "F", "Ageattimeofsampling.years."],
    na.action = na.omit)), untf = T, col = pal[1], lwd = 2.5)
legend("topleft", c("F", "M"), col = pal[1:nlevels(data.no.kids$sex_cols)],
    pch = 20, bty = "n")
mtext("C", side = 3, adj = 0, line = 1, cex = 1.5)
# Keep the relative importance values only for models that pass Bonferroni correction
BMI.models <- cbind(BMI.models, filt_bmi_r2 = BMI.models[, 3],
    filt_age_r2 = BMI.models[, 4])
colnames(BMI.models)
BMI.models[, "filt_bmi_r2"] <- ifelse(p.adjust(BMI.models[, 1],
    method = "bonferroni", n = ) < 0.05, BMI.models[, "filt_bmi_r2"], 0)
BMI.models[, "filt_age_r2"] <- ifelse(p.adjust(BMI.models[, 1],
    method = "bonferroni", n = ) < 0.05, BMI.models[, "filt_age_r2"], 0)
barplot(t(BMI.models[p.adjust(BMI.models[, 1], method = "bonferroni",
    n = ) < 0.05 & !is.na(BMI.models[, 1]), c("filt_bmi_r2", "filt_age_r2")]),
    ylab = expression("Proportion of " * italic("R"^"2")), las = 2,
    cex.names = 1,
    names.arg = rownames(BMI.models[p.adjust(BMI.models[, 1],
        method = "bonferroni", n = ) < 0.05 & !is.na(BMI.models[, 1]), ]),
    legend.text = substr(colnames(BMI.models[p.adjust(BMI.models[, 1],
        method = "bonferroni", n = ) < 0.05, c("filt_bmi_r2", "filt_age_r2")]),
        6, 8),
    args.legend = list(x = 9, y = 0.9, bg = "white", box.lwd = 0))
mtext("D", side = 3, adj = 0, line = 1, cex = 1.5)
plot(HADS[, 2], ylim = c(0, 1), ylab = expression(italic("R"^"2")), las = 1,
    xlab = "", pch = 20, xaxt = "n")
par(new = TRUE)
plot(-log10(p.adjust(HADS[, 1], method = "bonferroni", n = )), ylab = "",
    xlab = "", ylim = c(0, 10), pch = 20, col = "grey", xaxt = "n",
    frame = F, yaxt = "n")
Axis(x = -log10(p.adjust(HADS[, 1], method = "bonferroni", n = )), side = 4,
    las = 1)
Axis(side = 1, at = 1:53, labels = rownames(all.models[1:53, ]), las = 3)
legend("topleft", bty = "n", pch = 20, col = c("black", "grey"),
    c(expression(italic("R"^"2") * " for HADS"),
        expression("-log"["10"] * italic(" P") * " for HADS")), ncol = 2)
mtext(expression("-log"["10"] * italic(" P")), side = 4, line = 2, cex = 0.6)
mtext("E", side = 3, adj = 0, line = 1, cex = 1.5)
abline(h = -log10(0.05), col = "grey")
data.bmi <- data.no.kids[!(is.na(data.no.kids$BMI)), ]
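The Figure 5 code uses `calc.relimp(model, diff = T, rela = T)` and reads the `lmg` metric to split each model's R² between BMI and age. The sketch below does not call relaimpo. It works through the two-regressor case by hand on simulated data, assuming the standard definition of LMG as the sequential R² contribution of a regressor averaged over both orders of entry, with the shares rescaled to sum to 1 as `rela = TRUE` does. The variables `age`, `bmi`, `y` and the helper `r2()` are hypothetical.

```r
set.seed(2)
n <- 200
age <- runif(n, 18, 80)
bmi <- 22 + 0.05 * age + rnorm(n, sd = 3)  # simulated BMI, correlated with age
y <- 0.04 * age + 0.02 * bmi + rnorm(n)    # simulated immune parameter

# Helper (illustrative): R squared of a linear model given its formula
r2 <- function(formula) summary(lm(formula))$r.squared

r2.age <- r2(y ~ age)
r2.bmi <- r2(y ~ bmi)
r2.both <- r2(y ~ bmi + age)

# Average each regressor's contribution over the two possible orders of entry
lmg.bmi <- mean(c(r2.bmi, r2.both - r2.age))
lmg.age <- mean(c(r2.age, r2.both - r2.bmi))

# Rescale so that the two shares sum to 1
rel <- c(BMI = lmg.bmi, Age = lmg.age) / r2.both
print(rel)
```

Before rescaling, the two LMG values add up to the R² of the full model. That is why panel D can show them as proportions of R².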

[Figure 5 output. Panel A: R² for BMI + age, BMI alone, and age alone, per immune parameter (Lymphocyte, T cells, CD4+, ..., IFNg, BAFF). Panel B: -log10 P for the same models and parameters. Panel C: BMI against age, coloured by sex (F, M). Panel D: proportion of R² attributed to age and BMI for Th1, CD4+IL2+, CD8+RTE, Tc1, CD8+IL2+ and IL-6. Panel E: R² and -log10 P for the HADS models, per immune parameter.]

Figure 6

parents <- data[grepl(pattern = "PB", data$patientid) |
    grepl(pattern = "PA", data$patientid), ]
# Call the family pairs:
parents$PairID <- substr(parents$patientid, start = 1, stop = 6)
parents$PairID <- factor(parents$PairID)
summary(parents$PairID, max = 1000)
# Separate the siblings out
parents$siblings <- substr(parents$patientid, start = 7, stop = 8)
summary(factor(parents$siblings))
parents <- parents[parents$siblings %in% c("F", "m", "M"), ]
parents$siblings <- gsub(pattern = "m", replacement = "M", parents$siblings)
# Find duplicate paired parents
paired.parents.list <- parents[duplicated(parents$PairID), ]$PairID
paired.parents <- parents[parents$PairID %in% paired.parents.list, ]
paired.parents$PairID <- factor(paired.parents$PairID)  # Lose empty levels of the factor
# Spearman correlation between individuals across the immune parameters
parent.xc <- cor(t(paired.parents[, 13:66]), use = "pairwise.complete.obs",
    method = "spearman")
ncol(parent.xc)
rownames(parent.xc) <- paired.parents$patientid
colnames(parent.xc) <- paired.parents$patientid
View(parent.xc)
# Drop the pair that only has cytokine data. %in% is used because != would
# recycle the two IDs along the row names.
parent.xc2 <- parent.xc[!(rownames(parent.xc) %in% c("PA/031F", "PA/031M")),
    !(colnames(parent.xc) %in% c("PA/031F", "PA/031M"))]
nrow(parent.xc2)
m <- rbind(c(1, 2, 2))
layout(m)
d <- dist(parent.xc2)
fit <- cmdscale(d, eig = TRUE, k = 2)  # k is the number of dim
ff <- fit$points
pal <- brewer.pal(3, "Set1")
# Measure distances: 1. make a 'pair' factor:
ff <- as.data.frame(ff)
ff$pair <- paste0(substr(rownames(ff), 1, 2), substr(rownames(ff), 4, 6))
ff$pair <- factor(ff$pair)
summary(ff$sex)
# plot(fit$points[,1], fit$points[,2], xlab='coordinate 1', ylab='coordinate 2')
plot(ff[, 1], ff[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2",
    cex.axis = 1.3, type = "n", pch = 20)

for (i in 1:nlevels(ff$pair)) {
    lines(ff[ff$pair == levels(ff$pair)[i], 1], ff[ff$pair == levels(ff$pair)[i], 2],
        col = "grey")
}
points(ff[, 1], ff[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2",
    cex.axis = 1.3, type = "p", pch = 20,
    col = ifelse(substr(rownames(ff), 7, 7) == "M", pal[1], pal[2]))
mtext("A", side = 3, adj = 0, line = 2, cex = 1.5)
legend("bottomleft", pch = 20, col = pal, legend = c("F", "M"), bty = "n")
# 2. calculate distance between pairs:
pc.pair.distances <- matrix(nrow = nlevels(ff$pair), ncol = 1)  # A container for the results
for (i in 1:nlevels(ff$pair)) {
    pair2 <- ff[ff$pair %in% levels(ff$pair)[i], ]  # this should give 2 rows and a single pair.
    pc.pair.distances[i, 1] <- sqrt(((pair2[1, 1] - pair2[2, 1]) * (pair2[1, 1] - pair2[2, 1]))  # PC1 difference
        + ((pair2[1, 2] - pair2[2, 2]) * (pair2[1, 2] - pair2[2, 2]))  # PC2 difference
    )
    rm(pair2)
}
# 3 random pairs:
ff$sex <- substr(rownames(ff), 7, 7)
ff$sex <- factor(ff$sex)
ff <- ff[order(ff$sex), ]
summary(ff$sex)
dim(ff)
# Now the first 70 are females; the last 70 are males. We can iterate on this -
# pick a female based on levels(ff$pair)[i], then randomly select 5 non i males.
random.pair.distances <- matrix(nrow = nlevels(ff$pair) * 5, ncol = 1)
set.seed(42)  # so the random samples aren't resampled every time we knit the pdf, so the figures are co
for (i in 1:nlevels(ff$pair)) {
    pair3 <- ff[ff$pair %in% levels(ff$pair)[i] & ff$sex == "F", ]  # pick out the female of each pair in turn
    pair3 <- rbind(pair3, ff[sample(rownames(ff)[71:140], size = 5, replace = F), ])  # randomly sample 5 males, who are in rows 71:140
    for (n in (1:5)) {
        random.pair.distances[(i - 1) * 5 + n, 1] <- sqrt(((pair3[1, 1] - pair3[(n + 1), 1]) * (pair3[1, 1] - pair3[(n + 1), 1]))  # PC1 difference
            + ((pair3[1, 2] - pair3[(n + 1), 2]) * (pair3[1, 2] - pair3[(n + 1), 2]))  # PC2 difference
        )
    }
    rm(pair3)
}
rm(ff2)
stripchart(pc.pair.distances, vertical = F,
    xlim = c(0, max(random.pair.distances[, 1]) + 1.5), method = "jitter",
    jitter = 1, at = 4, pch = 20, xlab = "distance", las = 1, ylim = c(-2, 6))
boxplot(pc.pair.distances, add = T, at = 2.5, horizontal = T, outline = F,
    frame.plot = F, axes = F)
stripchart(random.pair.distances, vertical = F, method = "jitter", jitter = 1,
    add = T, at = 0, col = "grey", pch = 20, frame.plot = F, axes = F)
boxplot(random.pair.distances, add = T, at = -1.5, horizontal = T, outline = F,
    axes = F)
axis(side = 2, at = c(0, 4), labels = c("Random", "Parents"))
legend("topleft", pch = 20, col = c("black", "grey"),
    c("Parental pairs", "Random pairs"), ncol = 2, bty = "n")
lines(x = c(6, 6), y = (c(-1.5, 2.5)))
text(x = 6.2, y = 0.5, adj = 0, labels = paste0("p=",
    signif(wilcox.test(pc.pair.distances[, 1], random.pair.distances[, 1])$p.value,
        digits = 2)))
mtext("B", side = 3, adj = 0, line = 2, cex = 1.5)
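The Figure 6 code projects individuals onto two coordinates with `cmdscale()`, measures the Euclidean distance within each parental pair, and compares those distances with distances between unrelated individuals using `wilcox.test()`. The sketch below illustrates the same idea on simulated profiles. The data, the row-name scheme (`P1F`, `P1M`, ...) and the object names are all hypothetical. Unlike the original code, it compares each female with every non-partner male rather than with a random sample of five.

```r
set.seed(42)
n.pairs <- 10

# Simulated profiles: the two members of a pair share a common component
shared <- matrix(rnorm(n.pairs * 6), nrow = n.pairs)
profiles <- rbind(shared + rnorm(n.pairs * 6, sd = 0.3),
                  shared + rnorm(n.pairs * 6, sd = 0.3))
rownames(profiles) <- paste0("P", rep(1:n.pairs, 2),
                             rep(c("F", "M"), each = n.pairs))

# Classical multidimensional scaling onto two coordinates
fit <- cmdscale(dist(profiles), eig = TRUE, k = 2)
ff <- as.data.frame(fit$points)
ff$pair <- sub("[FM]$", "", rownames(ff))

# Distance between the two members of each pair
pair.dist <- sapply(unique(ff$pair), function(p) {
    xy <- as.matrix(ff[ff$pair == p, 1:2])
    sqrt(sum((xy[1, ] - xy[2, ])^2))
})

# Distances between each female and every male from a different pair
females <- ff[grepl("F$", rownames(ff)), ]
males <- ff[grepl("M$", rownames(ff)), ]
random.dist <- unlist(lapply(seq_len(nrow(females)), function(i) {
    others <- males[males$pair != females$pair[i], ]
    sqrt((others[, 1] - females[i, 1])^2 + (others[, 2] - females[i, 2])^2)
}))

p <- wilcox.test(pair.dist, random.dist)$p.value
print(p)
```

Because the distances are taken in the two-dimensional projection rather than in the full parameter space, the result depends on how much of the variation the first two coordinates capture.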

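[Figure 6 output. Panel A: multidimensional scaling plot, Coordinate 1 against Coordinate 2, points coloured by sex (F, M), with the members of each parental pair joined by a line. Panel B: distances for parental pairs and random pairs, annotated with the Wilcoxon test P value.]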


More information

Package covrna. R topics documented: September 7, Type Package

Package covrna. R topics documented: September 7, Type Package Type Package Package covrna September 7, 2018 Title Multivariate Analysis of Transcriptomic Data Version 1.6.0 Author Maintainer This package provides

More information

The following R code (R version 3.3.2) was used for the counts normalization (eg: brain 3,186 cells):

The following R code (R version 3.3.2) was used for the counts normalization (eg: brain 3,186 cells): Methods Single cell raw counts normalization From the single cell sequencing result, there were 3,186 brain vascular-associated cells, 1,504 lung vascular-associated cells, and 250 brain pure astrocytes

More information

Explore the data. Anja Bråthen Kristoffersen Biomedical Research Group

Explore the data. Anja Bråthen Kristoffersen Biomedical Research Group Explore the data Anja Bråthen Kristoffersen Biomedical Research Group density 0.2 0.4 0.6 0.8 Probability distributions Can be either discrete or continuous (uniform, bernoulli, normal, etc) Defined by

More information

Chapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004)

Chapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004) Chapter 5 Exercises 1 Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004) Preliminaries > library(daag) Exercise 2 The final three sentences have been reworded For each of the data

More information

Introduction to RStudio

Introduction to RStudio Introduction to RStudio Carl Tony Fakhry Jie Chen April 4, 2015 Introduction R is a powerful language and environment for statistical computing and graphics. R is freeware and there is lot of help available

More information

Package sklarsomega. May 24, 2018

Package sklarsomega. May 24, 2018 Type Package Package sklarsomega May 24, 2018 Title Measuring Agreement Using Sklar's Omega Coefficient Version 1.0 Date 2018-05-22 Author John Hughes Maintainer John Hughes

More information

(Re)introduction to Statistics Dan Lizotte

(Re)introduction to Statistics Dan Lizotte (Re)introduction to Statistics Dan Lizotte 2017-01-17 Statistics The systematic collection and arrangement of numerical facts or data of any kind; (also) the branch of science or mathematics concerned

More information

Package gma. September 19, 2017

Package gma. September 19, 2017 Type Package Title Granger Mediation Analysis Version 1.0 Date 2018-08-23 Package gma September 19, 2017 Author Yi Zhao , Xi Luo Maintainer Yi Zhao

More information

How many states. Record high temperature

How many states. Record high temperature Record high temperature How many states Class Midpoint Label 94.5 99.5 94.5-99.5 0 97 99.5 104.5 99.5-104.5 2 102 102 104.5 109.5 104.5-109.5 8 107 107 109.5 114.5 109.5-114.5 18 112 112 114.5 119.5 114.5-119.5

More information

Conditional variable importance in R package extendedforest

Conditional variable importance in R package extendedforest Conditional variable importance in R package extendedforest Stephen J. Smith, Nick Ellis, C. Roland Pitcher February 10, 2011 Contents 1 Introduction 1 2 Methods 2 2.1 Conditional permutation................................

More information

Visualizing Big Ranking Data

Visualizing Big Ranking Data Visualizing Big Ranking Data STAT3319 Statistics Project Written Report Name: Yiming Li University Number: 2011810187 Supervisor: Dr. Philip L.H. Yu 8 May, 2014 Contents 1 Introduction 3 2 Definitions

More information

Package cellscape. October 15, 2018

Package cellscape. October 15, 2018 Package cellscape October 15, 2018 Title Explores single cell copy number profiles in the context of a single cell tree Version 1.4.0 Description CellScape facilitates interactive browsing of single cell

More information

Package msir. R topics documented: April 7, Type Package Version Date Title Model-Based Sliced Inverse Regression

Package msir. R topics documented: April 7, Type Package Version Date Title Model-Based Sliced Inverse Regression Type Package Version 1.3.1 Date 2016-04-07 Title Model-Based Sliced Inverse Regression Package April 7, 2016 An R package for dimension reduction based on finite Gaussian mixture modeling of inverse regression.

More information

Package RootsExtremaInflections

Package RootsExtremaInflections Type Package Package RootsExtremaInflections May 10, 2017 Title Finds Roots, Extrema and Inflection Points of a Curve Version 1.1 Date 2017-05-10 Author Demetris T. Christopoulos Maintainer Demetris T.

More information

Metric Predicted Variable With One Nominal Predictor Variable

Metric Predicted Variable With One Nominal Predictor Variable Metric Predicted Variable With One Nominal Predictor Variable Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. Click here for more

More information

A course in statistical modelling. session 09: Modelling count variables

A course in statistical modelling. session 09: Modelling count variables A Course in Statistical Modelling SEED PGR methodology training December 08, 2015: 12 2pm session 09: Modelling count variables Graeme.Hutcheson@manchester.ac.uk blackboard: RSCH80000 SEED PGR Research

More information

Understanding p Values

Understanding p Values Understanding p Values James H. Steiger Vanderbilt University James H. Steiger Vanderbilt University Understanding p Values 1 / 29 Introduction Introduction In this module, we introduce the notion of a

More information

Introduction to ArcMap

Introduction to ArcMap Introduction to ArcMap ArcMap ArcMap is a Map-centric GUI tool used to perform map-based tasks Mapping Create maps by working geographically and interactively Display and present Export or print Publish

More information

Introductory Statistics with R: Linear models for continuous response (Chapters 6, 7, and 11)

Introductory Statistics with R: Linear models for continuous response (Chapters 6, 7, and 11) Introductory Statistics with R: Linear models for continuous response (Chapters 6, 7, and 11) Statistical Packages STAT 1301 / 2300, Fall 2014 Sungkyu Jung Department of Statistics University of Pittsburgh

More information

The evdbayes Package

The evdbayes Package The evdbayes Package April 19, 2006 Version 1.0-5 Date 2006-18-04 Title Bayesian Analysis in Extreme Theory Author Alec Stephenson and Mathieu Ribatet. Maintainer Mathieu Ribatet

More information

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data

More information

The Statistical Sleuth in R: Chapter 9

The Statistical Sleuth in R: Chapter 9 The Statistical Sleuth in R: Chapter 9 Linda Loi Kate Aloisio Ruobing Zhang Nicholas J. Horton January 21, 2013 Contents 1 Introduction 1 2 Effects of light on meadowfoam flowering 2 2.1 Data coding, summary

More information

Matematisk statistik allmän kurs, MASA01:A, HT-15 Laborationer

Matematisk statistik allmän kurs, MASA01:A, HT-15 Laborationer Lunds universitet Matematikcentrum Matematisk statistik Matematisk statistik allmän kurs, MASA01:A, HT-15 Laborationer General information on labs During the rst half of the course MASA01 we will have

More information

The Rain in Spain - Tableau Public Workbook

The Rain in Spain - Tableau Public Workbook The Rain in Spain - Tableau Public Workbook This guide will take you through the steps required to visualize how the rain falls in Spain with Tableau public. (All pics from Mac version of Tableau) Workbook

More information

Star Cluster Photometry and the H-R Diagram

Star Cluster Photometry and the H-R Diagram Star Cluster Photometry and the H-R Diagram Contents Introduction Star Cluster Photometry... 1 Downloads... 1 Part 1: Measuring Star Magnitudes... 2 Part 2: Plotting the Stars on a Colour-Magnitude (H-R)

More information

Electric Fields and Equipotentials

Electric Fields and Equipotentials Electric Fields and Equipotentials Note: There is a lot to do in this lab. If you waste time doing the first parts, you will not have time to do later ones. Please read this handout before you come to

More information

Advanced Forecast. For MAX TM. Users Manual

Advanced Forecast. For MAX TM. Users Manual Advanced Forecast For MAX TM Users Manual www.maxtoolkit.com Revised: June 24, 2014 Contents Purpose:... 3 Installation... 3 Requirements:... 3 Installer:... 3 Setup: spreadsheet... 4 Setup: External Forecast

More information

How to work correctly statistically about sex ratio

How to work correctly statistically about sex ratio How to work correctly statistically about sex ratio Marc Girondot Version of 12th April 2014 Contents 1 Load packages 2 2 Introduction 2 3 Confidence interval of a proportion 4 3.1 Pourcentage..............................

More information

Problems from Chapter 3 of Shumway and Stoffer s Book

Problems from Chapter 3 of Shumway and Stoffer s Book UNIVERSITY OF UTAH GUIDED READING TIME SERIES Problems from Chapter 3 of Shumway and Stoffer s Book Author: Curtis MILLER Supervisor: Prof. Lajos HORVATH November 10, 2015 UNIVERSITY OF UTAH DEPARTMENT

More information

Eyetracking Analysis in R

Eyetracking Analysis in R Eyetracking Analysis in R Michael Seedorff Department of Biostatistics University of Iowa Jacob Oleson Department of Biostatistics University of Iowa Grant Brown Department of Biostatistics University

More information

Description of the ED library Basic Atoms

Description of the ED library Basic Atoms Description of the ED library Basic Atoms Simulation Software / Description of the ED library BASIC ATOMS Enterprise Dynamics Copyright 2010 Incontrol Simulation Software B.V. All rights reserved Papendorpseweg

More information

Package SpatPCA. R topics documented: February 20, Type Package

Package SpatPCA. R topics documented: February 20, Type Package Type Package Package SpatPCA February 20, 2018 Title Regularized Principal Component Analysis for Spatial Data Version 1.2.0.0 Date 2018-02-20 URL https://github.com/egpivo/spatpca BugReports https://github.com/egpivo/spatpca/issues

More information

R Demonstration ANCOVA

R Demonstration ANCOVA R Demonstration ANCOVA Objective: The purpose of this week s session is to demonstrate how to perform an analysis of covariance (ANCOVA) in R, and how to plot the regression lines for each level of the

More information

1 Model Economy. 1.1 Demographics

1 Model Economy. 1.1 Demographics 1 Model Economy To quantify the effects demographic dynamics have on savings, investment, rate of return on the factors of production, and, finally, current account, we calibrate a two-country general

More information

Newton s Cooling Model in Matlab and the Cooling Project!

Newton s Cooling Model in Matlab and the Cooling Project! Newton s Cooling Model in Matlab and the Cooling Project! James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University March 10, 2014 Outline Your Newton

More information

Fitting Cox Regression Models

Fitting Cox Regression Models Department of Psychology and Human Development Vanderbilt University GCM, 2010 1 Introduction 2 3 4 Introduction The Partial Likelihood Method Implications and Consequences of the Cox Approach 5 Introduction

More information

Case Study: Modelling Industrial Dryer Temperature Arun K. Tangirala 11/19/2016

Case Study: Modelling Industrial Dryer Temperature Arun K. Tangirala 11/19/2016 Case Study: Modelling Industrial Dryer Temperature Arun K. Tangirala 11/19/2016 Background This is a case study concerning time-series modelling of the temperature of an industrial dryer. Data set contains

More information

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses.

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses. ISQS 6348 Final exam solutions. Name: Open book and notes, but no electronic devices. Answer short answer questions on separate blank paper. Answer multiple choice on this exam sheet. Put your name on

More information

Jian WANG, PhD. Room A115 College of Fishery and Life Science Shanghai Ocean University

Jian WANG, PhD. Room A115 College of Fishery and Life Science Shanghai Ocean University Jian WANG, PhD j_wang@shou.edu.cn Room A115 College of Fishery and Life Science Shanghai Ocean University Useful Links Slides: http://sihua.us/biostatistics.htm Datasets: http://users.monash.edu.au/~murray/bdar/index.html

More information

Probability and Discrete Distributions

Probability and Discrete Distributions AMS 7L LAB #3 Fall, 2007 Objectives: Probability and Discrete Distributions 1. To explore relative frequency and the Law of Large Numbers 2. To practice the basic rules of probability 3. To work with the

More information