Resampling-Based Information Criteria for Adaptive Linear Model Selection
1 Resampling-Based Information Criteria for Adaptive Linear Model Selection
Phil Reiss
January 5, 2010
Joint work with Joe Cavanaugh, Lei Huang, and Amy Krain Roy
2 Outline
- Motivating application: amygdala connectivity
- Linear models and linear model selection
- Akaike's information criterion and its limitations
- Resampling and its application to model selection
- Simulation results
- Back to the connectivity application
3 Motivation
Stein et al. (2007) proposed a network of effective amygdala connectivity comprising 10 connections among the following 8 regions of interest (ROIs): AMY (amygdala); INS (insula); OFC (orbitofrontal cortex); PCC (posterior cingulate cortex); PFC (prefrontal cortex); PHG (parahippocampal gyrus); SUB (subgenual cingulate); SUP (supragenual cingulate).
We have measured resting-state functional connectivity (FC) for these 10 connections in 42 individuals who also responded to a battery of psychological tests.
Goal: use multiple linear regression to investigate whether these 10 FC scores (x) predict psychological variables (y). More specifically, regress y on an appropriate subset of the x's (the "model selection" problem).
4 Figure 1: The eight brain ROIs studied by Stein et al. (2007), and the 10 connections that they identified. Black dots indicate ROIs that lie on the mid-sagittal plane shown, while grey dots indicate left-hemisphere ROIs that have been projected onto this plane for illustration.
5 Simple linear regression: fit least-squares line
[Figure: two scatterplots of y against x.]
6 Multiple linear regression: fit least-squares (hyper)plane
[Figure: two plots of y against x1 and x2.]
7 Which has lower sum of squared residuals: the exact model $y = 3 + 2x + \varepsilon$, or the fitted model $y = \hat\beta_0 + \hat\beta_1 x + \varepsilon$?
[Figure: scatterplot of y against x.]
8 Basic model equation for multiple linear regression
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,$$
where
- $y_i$ is the $i$th subject's outcome,
- $\beta_0$ is the intercept of the best-fit line (or plane...),
- $x_{i1}, \ldots, x_{ip}$ are subject $i$'s values for predictors $1, \ldots, p$,
- $\beta_1, \ldots, \beta_p$ are the coefficients or "effects" of predictors $1, \ldots, p$,
- $\varepsilon_i$ is an error term which in the "nicest" case is
  1. independent across subjects,
  2. identically distributed for each subject,
  3. normally distributed.
Each of these 3 conditions can be relaxed.
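To make the model equation concrete, the following is a minimal sketch of a least-squares fit on simulated data. The sample size matches the 42 individuals mentioned earlier, but the number of predictors, the coefficient values, and the error variance are assumptions chosen purely for illustration; they are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: n subjects, p predictors (illustrative values only).
n, p = 42, 3
X = rng.normal(size=(n, p))
beta_exact = np.array([1.0, 2.0, 0.0, -1.5])  # assumed intercept followed by p slopes
eps = rng.normal(scale=1.0, size=n)           # independent, identically distributed, normal errors
y = beta_exact[0] + X @ beta_exact[1:] + eps

# Design matrix with a leading column of ones for the intercept beta_0.
design = np.column_stack([np.ones(n), X])

# Least-squares estimates of (beta_0, ..., beta_p).
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

fitted = design @ beta_hat
residuals = y - fitted
print("estimated coefficients:", np.round(beta_hat, 2))
print("sum of squared residuals:", round(float(np.sum(residuals ** 2)), 2))
```

The residuals of a least-squares fit are orthogonal to every column of the design matrix, which is a quick sanity check on any implementation.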
9 Model selection
In many situations we are not given the p predictors in the model. Rather, we have a number (perhaps a large number) of candidate predictors, and we seek a subset of the predictors that will constitute the "best" model.
More ambitiously, we may say: we seek the "true model". But...
10 The "true model" has two meanings!
1. Right set of predictors. E.g., suppose a person's HAM-D score is a linear function of his/her age and an FC measure, plus random noise. Then the true model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,$$
where $y_i$ = $i$th individual's HAM-D, $x_{i1}$ = age, $x_{i2}$ = FC. All other linear models are
   - underfitted, e.g. $y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$;
   - overfitted, e.g. $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$ for some additional predictor $x_3$; or
   - both under- and overfitted, e.g. $y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$.
2. Right coefficient values: in the above example, the actual numeric values of $\beta_0, \beta_1, \beta_2$.
To reduce confusion, we'll refer to #1 as the "true model", and #2 as the "exact model". The exact model is inherently unknowable, assuming a finite sample. Model selection aims for the true model.
11 General strategy for model selection: minimize prediction error
Model fit gives rise to fitted values
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}$$
minimizing the sum of squared residuals
$$\mathrm{SS}_{\mathrm{resid}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
If we collected a new data set with the same x's but new outcomes $y^+$, the predicted outcomes would be $\hat{y}_1, \ldots, \hat{y}_n$ as defined above. Generally speaking, on average, the sum of squared prediction errors
$$\mathrm{SS}_{\mathrm{pred}} = \sum_{i=1}^{n} (y_i^+ - \hat{y}_i)^2$$
is smaller for the true model than for other models.
Strategy:
1. for each candidate model, estimate mean $\mathrm{SS}_{\mathrm{pred}}$;
2. select the model for which the estimate is lowest.
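12 How to estimate mean $\mathrm{SS}_{\mathrm{pred}}$ for each model?
Naïve idea: estimate mean $\mathrm{SS}_{\mathrm{pred}} = \sum_{i=1}^{n} (y_i^+ - \hat{y}_i)^2$ by $\mathrm{SS}_{\mathrm{resid}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, and select the model with the lowest $\mathrm{SS}_{\mathrm{resid}}$.
That's a bad idea, for at least two reasons.
First reason: as predictors are added to the model, $\mathrm{SS}_{\mathrm{resid}}$ never increases, and in practice it almost always decreases. So if you choose the minimum-$\mathrm{SS}_{\mathrm{resid}}$ model, you'll always choose the largest candidate model!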
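The first reason can be checked numerically. The sketch below fits a sequence of nested models to simulated data and records the sum of squared residuals at each step. The data-generating model (two real predictors out of ten candidates) and all numeric values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n, n_candidates = 42, 10
X = rng.normal(size=(n, n_candidates))
# Assumed truth for illustration: only the first two predictors matter.
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

def ss_resid(X_sub, y):
    """Sum of squared residuals of the least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X_sub])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta_hat) ** 2))

# Nested models: intercept only, then the first k predictors for k = 1, ..., 10.
ss_path = [ss_resid(X[:, :k], y) for k in range(n_candidates + 1)]
for k, ss in enumerate(ss_path):
    print(f"{k:2d} predictors: SS_resid = {ss:8.2f}")
```

The printed values never go up as predictors are added, so the largest model wins on this criterion even though eight of its ten predictors are pure noise by construction.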
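13 Another reason: even with the true model, $\mathrm{SS}_{\mathrm{resid}}$ is a poor estimate of mean $\mathrm{SS}_{\mathrm{pred}}$
[Figure: two scatterplots against x, one of the original outcomes y and one of new outcomes y_new.]
$$\mathrm{SS}^{\mathrm{fitted}}_{\mathrm{resid}} < \mathrm{SS}^{\mathrm{exact}}_{\mathrm{resid}}, \qquad E(\mathrm{SS}^{\mathrm{exact}}_{\mathrm{pred}}) < E(\mathrm{SS}^{\mathrm{fitted}}_{\mathrm{pred}})$$
$\mathrm{SS}_{\mathrm{resid}}$ is an overoptimistic (negatively biased) estimate of $E(\mathrm{SS}_{\mathrm{pred}})$.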
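A small simulation can illustrate both inequalities. The sketch below reuses the exact model $y = 3 + 2x + \varepsilon$ from the earlier slide; the sample size, the distribution of x, the unit error variance, and the number of replications are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

n, n_sims = 42, 2000
beta_exact = np.array([3.0, 2.0])      # exact model: y = 3 + 2x + eps
x = rng.uniform(0, 1, size=n)          # fixed x's, reused for the new outcomes
design = np.column_stack([np.ones(n), x])

resid_fit, resid_exact, pred_fit, pred_exact = [], [], [], []
for _ in range(n_sims):
    y = design @ beta_exact + rng.normal(size=n)      # observed outcomes
    y_new = design @ beta_exact + rng.normal(size=n)  # new outcomes y+ at the same x's
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid_fit.append(np.sum((y - design @ beta_hat) ** 2))
    resid_exact.append(np.sum((y - design @ beta_exact) ** 2))
    pred_fit.append(np.sum((y_new - design @ beta_hat) ** 2))
    pred_exact.append(np.sum((y_new - design @ beta_exact) ** 2))

print("mean SS_resid, fitted model:", round(float(np.mean(resid_fit)), 2))
print("mean SS_resid, exact model: ", round(float(np.mean(resid_exact)), 2))
print("mean SS_pred,  exact model: ", round(float(np.mean(pred_exact)), 2))
print("mean SS_pred,  fitted model:", round(float(np.mean(pred_fit)), 2))
```

Under these assumptions (n = 42, unit error variance, two fitted coefficients), standard least-squares theory gives expectations of 40, 42, 42, and 44 respectively, so the simulated means should land near those values. The gap between the first and last line is the optimism of $\mathrm{SS}_{\mathrm{resid}}$.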
14 Akaike information criterion (AIC)
In the early 1970s, Akaike proposed to do model selection by minimizing the Kullback-Leibler information, which in the linear model context is essentially the mean of $n \log(\mathrm{SS}_{\mathrm{pred}})$.
$n \log(\mathrm{SS}_{\mathrm{resid}})$ is a negatively biased estimate of this quantity. It turns out that, for a model with p predictors, the size of the bias is about 2p. Thus
$$\mathrm{AIC} = n \log(\mathrm{SS}_{\mathrm{resid}}) + 2p$$
is an approximately unbiased estimate of K-L information.
As the number of predictors grows, $n \log(\mathrm{SS}_{\mathrm{resid}})$ decreases while $2p$ increases. The AIC-minimizing model strikes an ideal balance between goodness of fit and model parsimony.
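As a sketch of how this trade-off plays out, the code below computes the slide's form of AIC, $n \log(\mathrm{SS}_{\mathrm{resid}}) + 2p$, along a sequence of nested models on simulated data. The function name `aic` and the data-generating values are illustrative assumptions. The more common textbook form differs from this one only by terms that are the same for every model fitted to the same data, so the ranking of models is unchanged.

```python
import numpy as np

def aic(X_sub, y):
    """Slide's form of AIC: n * log(SS_resid) + 2p for p predictors plus an intercept."""
    n, p = X_sub.shape
    design = np.column_stack([np.ones(n), X_sub])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_resid = np.sum((y - design @ beta_hat) ** 2)
    return float(n * np.log(ss_resid) + 2 * p)

rng = np.random.default_rng(3)
n, n_candidates = 42, 10
X = rng.normal(size=(n, n_candidates))
# Assumed truth for illustration: only the first two predictors matter.
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# Nested models: intercept only, then the first k predictors for k = 1, ..., 10.
aic_path = [aic(X[:, :k], y) for k in range(n_candidates + 1)]
best_k = int(np.argmin(aic_path))
for k, value in enumerate(aic_path):
    print(f"{k:2d} predictors: AIC = {value:8.2f}")
print("AIC-minimizing number of predictors:", best_k)
```

Unlike the raw sum of squared residuals, the criterion is not guaranteed to keep falling as noise predictors are added, because each extra predictor costs 2.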
15 AIC with many candidate models
In a standard application, we compute $\mathrm{AIC} = n \log(\mathrm{SS}_{\mathrm{resid}}) + 2p$ for each of several candidate models, and select the model with the lowest AIC.
But what if we have many candidate models? E.g., with 10 possible predictors, there are $2^{10} = 1024$ candidate models. Even if none of these are true predictors (the true model is the "null" model with no predictors), it is likely that at least one of the other 1023 models will have lower AIC than the null model just by chance.
This is closely related to the problem of multiple hypothesis testing.
To account for multiplicity, we need a modified AIC
$$n \log(\mathrm{SS}_{\mathrm{resid}}) + \text{penalty}$$
for some penalty $> 2p$. But how to find this penalty?
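The multiplicity problem can be seen in a small simulation. The sketch below generates data in which the null model is true, searches all non-empty subsets of the candidate predictors, and counts how often some subset has lower AIC than the null model. To keep the run short it uses 6 candidate predictors (63 non-null models) rather than the 10 in the text; that choice, the function names, and the number of replications are all assumptions made for illustration.

```python
import itertools
import numpy as np

def aic(X_sub, y):
    """Slide's form of AIC: n * log(SS_resid) + 2p for p predictors plus an intercept."""
    n, p = X_sub.shape
    design = np.column_stack([np.ones(n), X_sub])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_resid = np.sum((y - design @ beta_hat) ** 2)
    return float(n * np.log(ss_resid) + 2 * p)

def fraction_null_beaten(n, n_candidates, n_sims, rng):
    """Fraction of simulated null data sets in which some non-null model has lower AIC."""
    subsets = [s for k in range(1, n_candidates + 1)
               for s in itertools.combinations(range(n_candidates), k)]
    beaten = 0
    for _ in range(n_sims):
        X = rng.normal(size=(n, n_candidates))
        y = rng.normal(size=n)                 # null model is true: y is unrelated to X
        aic_null = aic(X[:, :0], y)            # intercept-only model
        best = min(aic(X[:, list(s)], y) for s in subsets)
        beaten += int(best < aic_null)
    return beaten / n_sims

rng = np.random.default_rng(5)
frac = fraction_null_beaten(n=42, n_candidates=6, n_sims=200, rng=rng)
print("fraction of null data sets where AIC prefers a non-null model:", frac)
```

Setting `n_candidates=10` would reproduce the 1024-model setting of the text at the cost of a longer run time.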
16 Resampling
We want to find a penalty such that
$$n \log(\mathrm{SS}_{\mathrm{pred}}) \approx n \log(\mathrm{SS}_{\mathrm{resid}}) + \text{penalty},$$
i.e.,
$$\text{penalty} \approx n \log(\mathrm{SS}_{\mathrm{pred}}) - n \log(\mathrm{SS}_{\mathrm{resid}}) = Q_{\mathrm{pop}} - Q_{\mathrm{sample}},$$
where $Q_{\mathrm{pop}}$ is a population quantity and $Q_{\mathrm{sample}}$ is a sample quantity.
Resampling is a general way to estimate the difference between an (observed) sample quantity $Q_{\mathrm{sample}}$ and a corresponding (unobservable) population quantity $Q_{\mathrm{pop}}$.
Idea: repeatedly draw resampled data sets ("samples from the sample"). Resample : Sample :: Sample : Population (approximately). The average of $Q_{\mathrm{sample}} - Q_{\mathrm{resample}}$ is an estimate of $Q_{\mathrm{pop}} - Q_{\mathrm{sample}}$.
Two ways to resample:
1. Bootstrapping: draw a sample of n with replacement. E.g., if n = 10, might take observations 1, 3, 3, 4, 5, 7, 7, 9, 10, …
2. Subsampling: draw a sample of m < n without replacement. E.g., if n = 10, m = 8, might take observations 1, 2, 3, 4, 7, 8, 9, …
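The two resampling schemes, and the "Resample : Sample :: Sample : Population" analogy, can be sketched as follows. The slides do not spell out the authors' actual criteria, so the `bootstrap_penalty` function below is only a generic optimism-style bootstrap for a single fixed model, written under that analogy; the function names, the simulated data, and the number of resamples are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Index draws for n = 10 and m = 8, as in the examples in the text (1-based labels).
n_small, m_small = 10, 8
boot_idx = np.sort(rng.choice(n_small, size=n_small, replace=True)) + 1   # repeats allowed
sub_idx = np.sort(rng.choice(n_small, size=m_small, replace=False)) + 1   # no repeats
print("bootstrap sample:", boot_idx)
print("subsample:       ", sub_idx)

def n_log_ss(design_fit, y_fit, design_eval, y_eval):
    """n * log(SS) on the evaluation data, using coefficients fitted on the fitting data."""
    beta_hat, *_ = np.linalg.lstsq(design_fit, y_fit, rcond=None)
    ss = np.sum((y_eval - design_eval @ beta_hat) ** 2)
    return len(y_eval) * np.log(ss)

def bootstrap_penalty(design, y, n_boot, rng):
    """Average gap between out-of-resample and in-resample n*log(SS) for one fixed model."""
    n = len(y)
    gaps = []
    for _ in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)
        d_b, y_b = design[idx], y[idx]
        q_on_sample = n_log_ss(d_b, y_b, design, y)   # sample plays the role of the population
        q_on_resample = n_log_ss(d_b, y_b, d_b, y_b)  # resample plays the role of the sample
        gaps.append(q_on_sample - q_on_resample)
    return float(np.mean(gaps))

# Hypothetical data: 42 subjects, intercept plus two predictors.
n = 42
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
design = np.column_stack([np.ones(n), X])
penalty = bootstrap_penalty(design, y, n_boot=200, rng=rng)
print("bootstrap penalty estimate:", round(penalty, 2))
```

A subsampling variant would draw `idx` with `size=m` and `replace=False` instead; how the resulting gap should then be rescaled is a design decision that this sketch does not address.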
17 Simulation study
We simulated four collections of data sets with 20 candidate predictors, of which 0, 2, 6, or 10 were true predictors. We then compared how well the true predictors were recovered by the following model selection methods:
1. AIC;
2. $\mathrm{AIC}_c$ (corrected AIC);
3. CIC (Tibshirani and Knight, 1999);
4. two bootstrap methods;
5. two subsampling methods.
18-21 [Four figure slides, each with panels titled: True coefficients; AIC; $\mathrm{AIC}_c$; CIC; Bootstrap, method 1; Bootstrap, method 2; Subsampling, method 1; Subsampling, method 2.]
22 Back to Stein seed data
Regressed the following outcomes on the ten connections:
1. Rosenberg self-esteem score
2. Mood and Anxiety Symptom Questionnaire (MASQ) General Distress: Depressive Symptoms (GDD) subscale
Selected best models using
1. AIC
2. $\mathrm{AIC}_c$
3. our subsampling criterion
23 Self-esteem results
13 models, with 1 to 5 predictors, had somewhat lower AIC than the null model.
However, both $\mathrm{AIC}_c$ and the subsampling method choose the null model as the best model, suggesting that chance variation accounts for the AIC-based findings.
24 MASQ-GDD results
The number of models outscoring the null model is 78 for AIC and 56 for $\mathrm{AIC}_c$, but only 6 for the subsampling method.
The PCC-AMY and SUB-INS connections tended to appear in most of the best models by all three methods.
The subsampling method favored more parsimonious models; AIC favored less parsimonious models; $\mathrm{AIC}_c$ fell in between.
PCC-AMY and SUB-INS appear positively related to depression: a one-SD increase in either FC score is associated with a 10% increase in the GDD subscale.
The coefficients of determination are roughly 11% for PCC-AMY alone, 13% for SUB-INS alone, and 25% for the model containing both.
25 [Figure: two panels, $\mathrm{AIC}_c$ (left) and $\mathrm{IC}^{\mathrm{ad}}_{\mathrm{sub}}$ (right). Axis label: Criterion value. Connection labels in each panel: AMY-PHG, AMY-SUB, INS-AMY, OFC-AMY, OFC-PFC, PCC-AMY, SUB-INS, SUB-SUP, SUP-AMY, SUP-PCC.]
Figure 2: Information criterion values for the best models for MASQ-GDD score, based on $\mathrm{AIC}_c$ and on $\mathrm{IC}^{\mathrm{ad}}_{\mathrm{sub}}$. The dotted line in the right plot indicates $\mathrm{IC}^{\mathrm{ad}}_{\mathrm{sub}}$ for the null model.
26 Summary
A popular way to select a model from a set of candidates is to take the model with the lowest AIC.
However, when selecting among a large set of models, the minimum-AIC model is likely to overfit.
Resampling-based information criteria can correct this tendency.
For our data set:
1. The minimum-AIC model suggests that self-esteem is predicted by some of the connections; our methods suggest these effects are spurious.
2. PCC-AMY and SUB-INS seem to be positively associated with the MASQ-GDD subscale; our method again gives more parsimonious results than AIC.
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the
More informationProblem Set 2: Box-Jenkins methodology
Problem Set : Box-Jenkins methodology 1) For an AR1) process we have: γ0) = σ ε 1 φ σ ε γ0) = 1 φ Hence, For a MA1) process, p lim R = φ γ0) = 1 + θ )σ ε σ ε 1 = γ0) 1 + θ Therefore, p lim R = 1 1 1 +
More informationLinear Regression In God we trust, all others bring data. William Edwards Deming
Linear Regression ddebarr@uw.edu 2017-01-19 In God we trust, all others bring data. William Edwards Deming Course Outline 1. Introduction to Statistical Learning 2. Linear Regression 3. Classification
More information10. Alternative case influence statistics
10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the
More informationIntroductory Econometrics
Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More informationMultiple Regression Introduction to Statistics Using R (Psychology 9041B)
Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment
More informationDifference in two or more average scores in different groups
ANOVAs Analysis of Variance (ANOVA) Difference in two or more average scores in different groups Each participant tested once Same outcome tested in each group Simplest is one-way ANOVA (one variable as
More informationTECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study
TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More information