pensim Package Example (Version 1.2.9)
Levi Waldron
March 13, 2014

Contents
1 Introduction
2 Example data
3 Nested cross-validation
  3.1 Summarization and plotting
4 Getting coefficients for model fit on all the data
5 Simulation of collinear high-dimensional data with survival or binary outcome
  5.1 Binary outcome example
  5.2 Survival outcome example

1 Introduction

This package acts as a wrapper to the penalized R package to add the following functionality to that package:

- repeated tuning on different data folds, with parallelization to take advantage of multiple processors
- two-dimensional tuning of the Elastic Net
- calculation of cross-validated risk scores through a nested cross-validation procedure, with parallelization to take advantage of multiple processors

It also provides a function for simulation of collinear high-dimensional data with survival or binary response.

This package was developed in support of the study by Waldron et al.[3]. This paper contains greater detail on proper application of the methods provided here. Please cite this paper when using the pensim package in your research, as well as the penalized package[2].
2 Example data

pensim provides example data from a microarray experiment investigating survival of cancer patients with lung adenocarcinomas (Beer et al., 2002)[1]. Load these data and pre-filter genes with low expression or low IQR:

> library(pensim)
> data(beer.exprs)
> data(beer.survival)
> ##select just 100 genes to speed computation, just for the sake of example:
> set.seed(1)
> beer.exprs.sample <- beer.exprs[sample(1:nrow(beer.exprs), 100), ]
> #
> gene.quant <- apply(beer.exprs.sample, 1, quantile, probs=0.75)
> dat.filt <- beer.exprs.sample[gene.quant>log2(100), ]
> gene.iqr <- apply(dat.filt, 1, IQR)
> dat.filt <- as.matrix(dat.filt[gene.iqr>0.5, ])
> dat.filt <- t(dat.filt)
> dat.filt <- data.frame(dat.filt)
> #
> library(survival)
> surv.obj <- Surv(beer.survival$os, beer.survival$status)

Note that the expression data are in wide format, with one column per predictor (gene). It is recommended to put covariate data in a data frame rather than a matrix.

3 Nested cross-validation

Unbiased estimation of prediction accuracy involves two levels of cross-validation: an outer level for estimating prediction accuracy, and an inner level for model tuning. This process is simplified by the opt.nested.crossval function.

It is recommended first to establish the arguments for the penalized regression by testing on the penalized package functions optL1 (for LASSO), optL2 (for Ridge) or cvl (for Elastic Net). Here we use LASSO. Setting maxlambda1=5 is not a generally recommended procedure, but is useful in this toy example to avoid converging on the null model.

> library(penalized)
> testfit <- optL1(response=surv.obj,
+                  penalized=dat.filt,
+                  fold=5,
+                  maxlambda1=5,
+                  positive=FALSE,
+                  standardize=TRUE,
+                  trace=FALSE)
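As an aside, the two-level scheme described at the start of this section can be sketched in base R without pensim. This is only an illustration under stated assumptions: it uses simulated Gaussian data, closed-form ridge regression, and mean squared error in place of the penalized Cox model and cross-validated likelihood. The helper names (ridge_fit, ridge_predict, make_folds, inner_tune, nested_cv) are hypothetical and are not part of pensim or penalized.

```r
# Nested cross-validation sketch (base R only; NOT pensim's implementation).
set.seed(1)
n_demo <- 60; p_demo <- 10
x_demo <- matrix(rnorm(n_demo * p_demo), n_demo, p_demo)
y_demo <- drop(x_demo[, 1:2] %*% c(2, -1)) + rnorm(n_demo)

# Closed-form ridge coefficients (unpenalized intercept) for a given lambda.
ridge_fit <- function(x, y, lambda) {
  xm <- colMeans(x); ym <- mean(y)
  xc <- sweep(x, 2, xm)
  beta <- drop(solve(crossprod(xc) + lambda * diag(ncol(x)),
                     crossprod(xc, y - ym)))
  list(beta = beta, intercept = ym - sum(xm * beta))
}
ridge_predict <- function(fit, newx) drop(newx %*% fit$beta) + fit$intercept

# Randomly assign each of n observations to one of k folds.
make_folds <- function(n, k) sample(rep(seq_len(k), length.out = n))

# Inner level: choose lambda by k-fold CV using the training data only.
inner_tune <- function(x, y, lambdas, k = 5) {
  folds <- make_folds(nrow(x), k)
  cv_mse <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(k), function(f) {
      fit <- ridge_fit(x[folds != f, , drop = FALSE], y[folds != f], lam)
      pred <- ridge_predict(fit, x[folds == f, , drop = FALSE])
      mean((y[folds == f] - pred)^2)
    }))
  })
  lambdas[which.min(cv_mse)]
}

# Outer level: each held-out fold is predicted by a model tuned without it.
nested_cv <- function(x, y, lambdas, outerfold = 5, fold = 5) {
  folds <- make_folds(nrow(x), outerfold)
  preds <- numeric(nrow(x))
  for (f in seq_len(outerfold)) {
    train <- folds != f
    lam <- inner_tune(x[train, , drop = FALSE], y[train], lambdas, k = fold)
    fit <- ridge_fit(x[train, , drop = FALSE], y[train], lam)
    preds[!train] <- ridge_predict(fit, x[!train, , drop = FALSE])
  }
  preds
}

preds_sketch <- nested_cv(x_demo, y_demo, lambdas = c(0.01, 0.1, 1, 10, 100))
cor(preds_sketch, y_demo)  # cross-validated predictions vs. observed response
```

The point of the sketch is that the tuning parameter is always chosen without access to the outer test fold, so the outer-level predictions are not biased by the tuning step.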
Now pass the optL1 arguments tested above to opt.nested.crossval() for cross-validated calculation and assessment of risk scores, with the additional arguments:

- outerfold and nprocessors: the number of folds for the outer level of cross-validation, and the number of processors to use for the outer level of cross-validation (see ?opt.nested.crossval)
- optfun and scaling: which optimization function to use (opt1d for LASSO or Ridge, opt2d for Elastic Net); see ?opt.splitval
- setpen and nsim: setpen defines whether to do LASSO ("L1") or Ridge ("L2"), and nsim defines the number of times to repeat tuning (see ?opt1d; opt2d has different required arguments)

> set.seed(1)
> preds <- opt.nested.crossval(outerfold=5, nprocessors=1, #opt.nested.crossval arguments
+                              optfun="opt1d", scaling=FALSE, #opt.splitval arguments
+                              setpen="L1", nsim=1, #opt1d arguments
+                              response=surv.obj, #rest are penalized::optL1 arguments
+                              penalized=dat.filt,
+                              fold=5,
+                              positive=FALSE,
+                              standardize=TRUE,
+                              trace=FALSE)

Ideally nsim would be 50, and outerfold and fold would be 10, but the values used here speed computation 200x compared to these recommended values. Note that here we are using the standardize=TRUE argument of optL1 rather than the scaling=TRUE argument of opt.splitval. These two approaches to scaling are roughly equivalent but not the same (scaling=TRUE converts to z-scores, whereas standardize=TRUE scales to unit central L2 norm), so results will not be identical. Also, standardize=TRUE scales variables but provides coefficients on the original scale, whereas scaling=TRUE scales variables in the training set and then applies the same scaling to the test set.

3.1 Summarization and plotting

Cox fit on the continuous risk predictions:

> coxfit.continuous <- coxph(surv.obj~preds)
> summary(coxfit.continuous)
Call:
coxph(formula = surv.obj ~ preds)

  n= 86, number of events= 24

      coef exp(coef) se(coef) z Pr(>|z|)
preds

      exp(coef) exp(-coef) lower.95 upper.95
preds

Concordance=  (se = )
Rsquare=  (max possible= )
Likelihood ratio test= 2.01 on 1 df, p=
Wald test            = 2.22 on 1 df, p=
Score (logrank) test = 2.3 on 1 df, p=

Dichotomize the cross-validated risk predictions at the median, for visualization:

> preds.dichot <- preds > median(preds)

Plot the ROC curve:

> if(require(survivalROC)){
+   nobs <- length(preds)
+   cutoff <- 12
+   preds.roc <- survivalROC(Stime=beer.survival$os, status=beer.survival$status,
+                            marker=preds, predict.time=cutoff, span = 0.01*nobs^(-0.20))
+   plot(preds.roc$FP, preds.roc$TP, type="l", xlim=c(0, 1), ylim=c(0, 1),
+        xlab=paste("FP", "\n", "AUC = ", round(preds.roc$AUC, 3)),
+        lty=2,
+        ylab="TP", main="LASSO predictions\n ROC curve at 12 months")
+   abline(0, 1)
+ }

Figure 1: ROC plot of cross-validated continuous risk predictions at 12 months. Note that the predictions are better if you don't randomly select 100 genes to start with! We only did this to ease the load on the CRAN checking servers.

4 Getting coefficients for model fit on all the data

Finally, we can get coefficients for the model fit on all the data, for future use. Note that nsim should ideally be greater than 1, to train the model using multiple foldings for cross-validation. The output of opt1d or opt2d will be a matrix with one row per simulation. The default behavior in opt.nested.crossval() is to take the simulation with the highest cross-validated partial log likelihood (CVL), which is the recommended way to select a model from the multiple simulations.

> beer.coefs <- opt1d(setpen="L1", nsim=1,
+                     response=surv.obj,
+                     penalized=dat.filt,
+                     fold=5,
+                     maxlambda1=5,
+                     positive=FALSE,
+                     standardize=TRUE,
+                     trace=FALSE)

We can also include unpenalized covariates, if desired. Note that when keeping only one variable as a penalized or unpenalized covariate, indexing a data frame like [1] instead of [, 1] preserves the variable name. With [, 1] the result is a plain vector and the variable name is lost.

> beer.coefs.unpen <- opt1d(setpen="L1", nsim=1,
+                           response=surv.obj,
+                           penalized=dat.filt[-1], # This is equivalent to dat.filt[,-1]
+                           unpenalized=dat.filt[1],
+                           fold=5,
+                           maxlambda1=5,
+                           positive=FALSE,
+                           standardize=TRUE,
+                           trace=FALSE)

Note the non-zero first coefficient this time, due to it being unpenalized:

> beer.coefs[1, 1:5] #example output with no unpenalized covariates
L1 cvl U21931_at Y00285_s_at S78187_at
> beer.coefs.unpen[1, 1:5] #example output with first covariate unpenalized
L1 cvl U21931_at Y00285_s_at S78187_at

5 Simulation of collinear high-dimensional data with survival or binary outcome

The pensim package also provides a convenient means to simulate high-dimensional expression data with a (potentially censored) survival outcome or a binary outcome that depends on specified covariates.

5.1 Binary outcome example

First, generate the data. Here we simulate 20 variables. The first 15 (group "a") are uncorrelated, and have no association with outcome. The final five (group "b") have covariance of 0.8 to each other variable in that group. The response variable is associated with the first variable of group "b" (firstonly=TRUE) with a coefficient of 2. Binary outcomes for n_s = 50 samples are simulated as a Bernoulli distribution with probability for patient s:

$$ p_s = \frac{1}{1 + \exp(-\beta x_s)} \qquad (1) $$

with \beta_{s,16} = 0.5 and all other \beta_{s,i} equal to zero. The code for this simulation is as follows:

> set.seed(9)
> x <- create.data(nvars=c(15, 5),
+                  cors=c(0, 0.8),
+                  associations=c(0, 2),
+                  firstonly=c(TRUE, TRUE),
+                  nsamples=50,
+                  response="binary",
+                  logisticintercept=0.5)

Take a look at the simulated data:

> summary(x)
             Length Class      Mode
summary        6    data.frame list
associations  20    -none-     numeric
covariance   400    -none-     numeric
data          21    data.frame list
> x$summary
  start end cors associations num firstonly
a                                      TRUE
b                                      TRUE

A simple logistic model fails at variable selection in this case:

> simplemodel <- glm(outcome ~ ., data=x$data, family=binomial)
> summary(simplemodel)
Call:
glm(formula = outcome ~ ., family = binomial, data = x$data)

[Deviance residuals and coefficient table for (Intercept), a.1 to a.15 and b.1 to b.5; numeric values not preserved]

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  e+01 on 49 degrees of freedom
Residual deviance:  e-07 on 29 degrees of freedom
AIC: 42

Number of Fisher Scoring iterations: 25

But LASSO does a better job, selecting several of the collinear variables in the "b" group of variables which are associated with outcome:

> lassofit <- opt1d(nsim=3, nprocessors=1, setpen="L1", penalized=x$data[1:20],
+                   response=x$data[, "outcome"], trace=FALSE, fold=10)
> print(lassofit)
[Matrix with rows [1,] to [3,] and columns L1, cvl, (Intercept), a.1 to a.15, b.1 to b.5; numeric values not preserved]

And visualize the data as a heatmap:
> dat <- t(as.matrix(x$data[, -match("outcome", colnames(x$data))]))
> heatmap(dat, ColSideColors=ifelse(x$data$outcome==0, "black", "white"))

Figure 2: Heatmap of simulated data with binary response.
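The simulation described in Section 5.1 can be approximated in base R. The following is a sketch under assumptions, not the code of create.data(): it assumes multivariate normal covariates with unit variance, a covariance of 0.8 within group "b", and a logistic model using the coefficient (2) and intercept (0.5) passed to create.data() above. The names sim_binary and b_cov are hypothetical.

```r
# Sketch: collinear covariates with a Bernoulli outcome (base R only).
# This mimics the setup of Section 5.1; it is NOT pensim's create.data().
sim_binary <- function(nsamples = 50, n_a = 15, n_b = 5,
                       b_cov = 0.8, beta = 2, intercept = 0.5) {
  p <- n_a + n_b
  # Covariance: identity for group "a", compound symmetry for group "b".
  sigma <- diag(p)
  idx_b <- n_a + seq_len(n_b)
  sigma[idx_b, idx_b] <- b_cov
  diag(sigma) <- 1
  # Multivariate normal draws via the Cholesky factor:
  # if t(R) %*% R == sigma, then rows of z %*% R have covariance sigma.
  z <- matrix(rnorm(nsamples * p), nsamples, p)
  x <- z %*% chol(sigma)
  colnames(x) <- c(paste0("a.", seq_len(n_a)), paste0("b.", seq_len(n_b)))
  # Only the first variable of group "b" drives the outcome (firstonly).
  prob <- plogis(intercept + beta * x[, "b.1"])
  data.frame(x, outcome = rbinom(nsamples, size = 1, prob = prob))
}

set.seed(9)
sim <- sim_binary()
dim(sim)                          # samples x (covariates + outcome)
cor(sim[, "b.1"], sim[, "b.2"])   # sample correlation within group "b"
```

Because b.2 to b.5 are strongly correlated with b.1, an unpenalized logistic fit to such data has difficulty attributing the effect to a single variable, which is the situation the LASSO example above addresses.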
5.2 Survival outcome example

We simulate these data in the same way, but with response="timetoevent". Here censoring is uniform random between times 2 and 10, generating approximately 34% censoring:

> set.seed(1)
> x <- create.data(nvars=c(15, 5),
+                  cors=c(0, 0.8),
+                  associations=c(0, 0.5),
+                  firstonly=c(TRUE, TRUE),
+                  nsamples=50,
+                  censoring=c(2, 10),
+                  response="timetoevent")

How many events are censored?

> sum(x$data$cens==0)/nrow(x$data)
[1] 0.42

Kaplan-Meier plot of this simulated cohort:
> library(survival)
> surv.obj <- Surv(x$data$time, x$data$cens)
> plot(survfit(surv.obj~1), ylab="Survival probability", xlab="time")

Figure 3: Kaplan-Meier plot of survival of simulated cohort.
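To make the role of uniform censoring concrete, here is a minimal base R sketch of how a censoring indicator like x$data$cens can arise. It is an illustration under assumptions, not the internals of create.data(): event times are assumed exponential with a baseline hazard of 0.2 (a hypothetical value chosen for this sketch) multiplied by exp(0.5 * covariate), and censoring times are uniform between 2 and 10 as in the example above.

```r
# Sketch: exponential event times with uniform random censoring (base R only).
set.seed(1)
n_obs <- 50
x1 <- rnorm(n_obs)                            # covariate associated with outcome
basehaz <- 0.2                                # assumed baseline hazard (hypothetical)
event_time <- rexp(n_obs, rate = basehaz * exp(0.5 * x1))
cens_time <- runif(n_obs, min = 2, max = 10)  # censoring uniform between 2 and 10

obs_time <- pmin(event_time, cens_time)       # observed follow-up time
cens <- as.integer(event_time <= cens_time)   # 1 = event observed, 0 = censored

mean(cens == 0)                               # fraction of censored observations
```

With only 50 samples the realized censoring fraction varies noticeably from one seed to the next, so a single simulated cohort need not match the expected censoring rate closely.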
References

[1] David G. Beer, Sharon L.R. Kardia, Chiang-Ching Huang, Thomas J. Giordano, Albert M. Levin, David E. Misek, Lin Lin, Guoan Chen, Tarek G. Gharib, Dafydd G. Thomas, Michelle L. Lizyness, Rork Kuick, Satoru Hayasaka, Jeremy M.G. Taylor, Mark D. Iannettoni, Mark B. Orringer, and Samir Hanash. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med, 8(8), 2002.

[2] Jelle J Goeman. L1 penalized estimation in the Cox proportional hazards model. Biometrical Journal. Biometrische Zeitschrift, 52(1):70-84, February 2010.

[3] Levi Waldron, Melania Pintilie, Ming-Sound Tsao, Frances A Shepherd, Curtis Huttenhower, and Igor Jurisica. Optimized application of penalized regression methods to diverse genomic data. Bioinformatics, 27(24), 2011.

> sessionInfo()
R Under development (unstable) ( r65113)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils
[6] datasets  methods   base

other attached packages:
[1] survivalROC_1.0.3 penalized_        survival_
[4] pensim_1.2.9

loaded via a namespace (and not attached):
[1] MASS_          parallel_3.1.0 tools_
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationAnalysing categorical data using logit models
Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester
More informationTruck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation
Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical
More informationPAPER 218 STATISTICAL LEARNING IN PRACTICE
MATHEMATICAL TRIPOS Part III Thursday, 7 June, 2018 9:00 am to 12:00 pm PAPER 218 STATISTICAL LEARNING IN PRACTICE Attempt no more than FOUR questions. There are SIX questions in total. The questions carry
More informationFast Regularization Paths via Coordinate Descent
August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationRegression Methods for Survey Data
Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear
More informationTento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/
Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.
More informationSTAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationRegression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.
Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression
More informationWeek 7 Multiple factors. Ch , Some miscellaneous parts
Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires
More informationGeneralised linear models. Response variable can take a number of different formats
Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion
More informationNeural networks (not in book)
(not in book) Another approach to classification is neural networks. were developed in the 1980s as a way to model how learning occurs in the brain. There was therefore wide interest in neural networks
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationAdvanced Quantitative Data Analysis
Chapter 24 Advanced Quantitative Data Analysis Daniel Muijs Doing Regression Analysis in SPSS When we want to do regression analysis in SPSS, we have to go through the following steps: 1 As usual, we choose
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationIntroduction to the Generalized Linear Model: Logistic regression and Poisson regression
Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)
More informationInteractions in Logistic Regression
Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/
More informationFollow-up data with the Epi package
Follow-up data with the Epi package Summer 2014 Michael Hills Martyn Plummer Bendix Carstensen Retired Highgate, London International Agency for Research on Cancer, Lyon plummer@iarc.fr Steno Diabetes
More information