Median Cross-Validation

1 Median Cross-Validation. Chi-Wai Yu (1) and Bertrand Clarke (2). (1) Department of Mathematics, Hong Kong University of Science and Technology; (2) Department of Medicine, University of Miami. IISA 2011.

2 Outline: 1. Motivational Example

3 1. Motivational Example
Consider a data set with n = 30, p = 2, and one response. Fit $Y = \beta_1 X_1 + \beta_2 X_2 + \epsilon$, a regression through the origin. Do the usual frequentist analysis with normal error, which is much the same as a Bayes analysis with a flat prior.
[Table: estimate, SE, and p-value for $\beta_1$ and $\beta_2$; the numerical entries did not survive transcription.]
It is not reasonable to take either estimate as 0. A Q-Q plot will confirm this.

4 Q-Q plot
[Figure: The normal quantile plot for the model of Y on $X_1$ and $X_2$. Panel title: Quantile plot of the model: Y on X1 and X2. Axis labels: quantile of N(0,1), quantile of standardized residuals.]

5 Here is the data:
[Table with columns Outcome, Y, $X_1$, $X_2$; the data values did not survive transcription.]

6 The Problem:
The model we used fits nearly perfectly; the natural regression equation is $\hat{Y} = 3.8 X_1 + X_2$.
PROBLEM: This is dead wrong. The data generator for Y was
$Y_i = 2 X_{1i} + E_i$, for $i = 1, \ldots, n$, (1)
where $\{E_i : i = 1, \ldots, n\}$ are IID standard Cauchy and $X_1$ is generated IID N(0, 1). That is, $X_2$ is not in the correct model! (So what did the p-value mean???) In fact, $X_2$ was constructed by setting $X_2 = \hat{e}$ + normal noise, where $\hat{e}$ is the residual from the LSEs in (1) using only Y and $X_1$. The heavy-tailed Cauchy is built into $X_2$.
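The construction described on this slide can be sketched in a few lines. This is only an illustration under assumptions: the seed, the standard deviation of the normal noise added to the residual (`noise_sd`), and all variable names are hypothetical, since the slides do not give them, so the fitted coefficients will not reproduce the talk's numbers.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, not from the slides
n = 30

# Data generator (1): Y_i = 2 X_1i + E_i with IID standard Cauchy errors.
x1 = rng.standard_normal(n)
y = 2.0 * x1 + rng.standard_cauchy(n)

# Residuals from least squares of Y on X_1 alone (regression through the origin).
b1_only = (x1 @ y) / (x1 @ x1)
resid = y - b1_only * x1

# X_2 = residual + normal noise, so the heavy tail is built into X_2.
noise_sd = 1.0  # assumed value; the slides do not state it
x2 = resid + noise_sd * rng.standard_normal(n)

# Fit Y = b1 X_1 + b2 X_2 through the origin.
X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_one = float(resid @ resid)
rss_two = float(np.sum((y - X @ coef) ** 2))
print("coefficients:", coef)
print("RSS with X_1 only:", rss_one, "| RSS with X_1 and X_2:", rss_two)
```

Because $X_2$ carries most of the Cauchy residual, adding it typically removes the heavy tail from the fitted residuals, which is why the normality checks look fine.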

7 Implications:
The technical term for this is cheating. However: the standard checks on normality can be misleading when a heavy tail is built into the explanatory variables. Otherwise put: we must not search naively through variables to select the ones that create a normal noise term when the noise term is not normal.
Could this have been detected? Yes... but you use up data as you estimate and perform tests. With n = 30 we've already estimated $\beta_1$, $\beta_2$, $\sigma$ and done 3 hypothesis tests ($\beta_1 = 0$, $\beta_2 = 0$, and $H_0$: normal error; plot as test statistic, Di Cook), using up degrees of freedom ( 4,?). If we sphere our data or use other transformations, we also use degrees of freedom.

8 Often must deal with heavy tails:
With small n and not-small p, you can't be sure you're not just constructing a normal noise in place of a heavy-tailed noise by variable selection. In addition, there is a lot of work on heavy-tailed distributions that deserves to be better known. The Normal-Independent (NI) class from Lange and Sinsheimer (1992) was used in Lachos, Bandyopadhyay, and Dey (2011) to analyse viral loads in HIV. The NI class includes the $t_m$ distributions, among others. Tressou (2008) used a (heavy-tailed) Pareto in a nonparametric Bayes setting for a clustering step in estimating dietary risks. In general, Cauchy = N(0,1)/N(0,1), the ratio of two independent standard normals, so ratios are often heavy tailed.
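The ratio fact can be checked numerically. The sketch below draws ratios of independent standard normals and compares their sample quartiles with those of the standard Cauchy, which are -1, 0, and 1; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
m = 200_000                     # arbitrary sample size

# Ratio of two independent N(0, 1) variables.
ratio = rng.standard_normal(m) / rng.standard_normal(m)

# Quartiles are used instead of mean and variance because the Cauchy has neither.
q25, q50, q75 = np.quantile(ratio, [0.25, 0.5, 0.75])
print(q25, q50, q75)
```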

9 More generally:
Lévy alpha-stable distributions: like the Cauchy, they usually don't have a mean. They occur in Brownian motions. The inverse Gaussian has a Lévy distribution limit in some cases. Applications in finance. Log-normal, Weibull (shape parameter < 1)...
Our Message: For heavy-tailed error, and other contexts, we propose a median version of CV that seems to work better than mean squared error CV. Simple: just replace the sum in regular CV by a median.

10 2. Forms of CV
Consider a regression model $Y_i = f_\lambda(X_i; \beta) + E_i$, where $\{E_i : i = 1, \ldots, n\}$ are IID with median 0. Suppose $\hat{f}_\lambda(X) = f_\lambda(X; \hat\beta)$ is an estimate of the regression function from $\mathcal{M} = \{f_\lambda : \lambda \in \Lambda\}$. Assume λ varies over a finite set, then estimate β. To choose λ, one could use information criteria (AIC, BIC, ...) or shrinkage methods (SCAD, ALASSO, AEN, ...).

11 Forms of CV:
Here we'll use CV approaches based on within-sample predictive accuracy. Benefits of CV: CV does not require choosing a penalty term, and CV combines both (internal) prediction and fit.
LOO-CV: Find the model with the smallest value of
$CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - f_\lambda(x_i; \hat\beta_n^{(-i)}) \right)^2$,
where $\hat\beta_n^{(-i)}$ is the estimate computed with observation i left out.
Usually one uses leave-k-out CV, with k increasing with n. Choose $k_n$? There is a vast literature on CV: see the review by Arlot (2010) and the theorem of Yang (2007). CV needs second moments and is very sensitive to outliers.
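A minimal sketch of the LOO-CV criterion for linear models fitted by least squares follows. The function names, the toy data, and the two candidate models are hypothetical and serve only to show how CV(λ) is computed and minimized over a finite model list.

```python
import numpy as np

def loo_squared_errors(X, y):
    """Squared leave-one-out prediction errors for a least-squares linear fit."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                       # drop observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errs[i] = (y[i] - X[i] @ beta) ** 2            # predict the held-out point
    return errs

def cv_score(X, y):
    """Regular LOO-CV: the mean of the squared leave-one-out errors."""
    return float(loo_squared_errors(X, y).mean())

# Toy data (assumed for illustration): only x1 enters the true model.
rng = np.random.default_rng(2)
n = 30
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 2.0 * x1 + rng.standard_normal(n)

candidates = {"X1": np.column_stack([x1]), "X1+X2": np.column_stack([x1, x2])}
scores = {name: cv_score(X, y) for name, X in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```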

12 Robust CV:
Robust CV (Huber 1964, 1973) finds λ to minimize
$\frac{1}{n} \sum_{i=1}^{n} \rho\left( y_i - f_\lambda(x_i; \hat\beta_n^{(-i)}) \right)$,
where ρ is subjectively chosen and $\hat\beta_n^{(-i)}$ is the estimate computed with observation i left out. If we choose ρ(t) so that ρ increases more slowly than $t^2$ as $|t| \to \infty$, then the minimum is less sensitive to extreme values of the residuals than regular CV. (This still requires a subjective choice of ρ.) In nonparametric regression, Leung (2005) shows the minimum is asymptotically independent of the choice of ρ. For small or moderate sample sizes, the minimum may depend on ρ.
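As one concrete choice of ρ, the sketch below uses Huber's function, which is quadratic near zero and linear in the tails. The tuning constant c = 1.345 is a commonly used default and is an assumption here, as is fitting each leave-one-out model by ordinary least squares; the slides do not specify either.

```python
import numpy as np

def huber_rho(t, c=1.345):
    """Huber's rho: 0.5 t^2 for |t| <= c, and c(|t| - 0.5 c) beyond."""
    a = np.abs(t)
    return np.where(a <= c, 0.5 * a**2, c * (a - 0.5 * c))

def loo_residuals(X, y):
    """Leave-one-out prediction residuals for a least-squares linear fit."""
    n = len(y)
    res = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        res[i] = y[i] - X[i] @ beta
    return res

def robust_cv_score(X, y, c=1.345):
    """Mean of rho applied to the leave-one-out residuals."""
    return float(np.mean(huber_rho(loo_residuals(X, y), c)))

# A residual of 10 costs 50 under 0.5 t^2 but far less under Huber's rho.
print(float(huber_rho(10.0)), 0.5 * 10.0**2)
```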

13 Median CV (MCV):
Use the sample median in place of the mean in CV:
$\hat\lambda = \arg\min_{\lambda} \operatorname{med}_{1 \le i \le n} \left( y_i - f_\lambda(x_i; \hat\beta^{(-i)}) \right)^2$,
where $\hat\beta^{(-i)}$ is the estimate computed with observation i left out.
Three advantages: 1) the median automatically gives invariance of the estimators up to increasing functions, 2) the minimum always exists, 3) it is resistant to outliers and more stable than moments. Loss functions are nonnegative and therefore right skewed. The median is a better location for skewed distributions than the mean is. Zheng and Yang (1998) used an MCV to choose the k in k-NN regression. More generally: model selection, smoothing parameters (Yu 2009), decay parameters, anywhere you might use CV.
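The change from CV to MCV is a one-line swap of the mean for the median. The sketch below shows both scores on a small made-up data set with one gross outlier; the leave-one-out fits use ordinary least squares for simplicity, whereas the talk pairs MCV with LMS, so this is an illustration of the criterion only.

```python
import numpy as np

def loo_squared_errors(X, y):
    """Squared leave-one-out prediction errors for a least-squares linear fit."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errs[i] = (y[i] - X[i] @ beta) ** 2
    return errs

def cv_score(X, y):
    """Regular CV: mean of the squared leave-one-out errors."""
    return float(np.mean(loo_squared_errors(X, y)))

def mcv_score(X, y):
    """Median CV: median of the squared leave-one-out errors."""
    return float(np.median(loo_squared_errors(X, y)))

# Made-up data: an exact line through the origin with one gross outlier.
X = np.arange(1.0, 8.0).reshape(-1, 1)
y = 2.0 * X[:, 0]
y[-1] += 100.0

print("CV :", cv_score(X, y))
print("MCV:", mcv_score(X, y))
```

The single outlier dominates the mean of the squared errors, while the median responds only to where the middle of the error distribution sits.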

14 Another Median Criterion:
Rousseeuw (1984) least median of squares (LMS):
$\hat\beta_{LMS} = \arg\min_{\beta} \operatorname{median}_{1 \le i \le n} \left[ y_i - f(x_i; \beta) \right]^2$.
LMS is an alternative to the LSEs and to other robustified estimators. LMS has a cube-root rate and asymptotic distribution; see Kim and Pollard (1987). When the error term in a regression model is heavy tailed, LSEs tend to do poorly because there will be outliers. By contrast, the LMS is highly resistant to outliers. MCV is an alternative to CV like LMS is an alternative to LSE. CV works better with LSEs than with LMS; MCV works better with LMS than with LSEs.
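The LMS criterion is not smooth, so it is usually minimized approximately. The sketch below uses a random search over exact fits to p-point subsets, a common approximation strategy; it is not claimed to be the algorithm used in the talk, and the function name, number of trials, and toy data are assumptions.

```python
import numpy as np

def lms_fit(X, y, n_trials=2000, seed=0):
    """Approximate LMS for a linear model by searching random p-point exact fits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])   # exact fit through p points
        except np.linalg.LinAlgError:
            continue                                  # singular subset, skip it
        crit = np.median((y - X @ beta) ** 2)         # the LMS criterion
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta, best_crit

# Toy data: y = 1 + 2x exactly, with the last 10 of 40 responses shifted by 50.
x = np.linspace(0.0, 1.0, 40)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[-10:] += 50.0

beta_lms, crit = lms_fit(X, y)
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("LMS:", beta_lms, "| LS:", beta_ls)
```

With a quarter of the responses contaminated, the least-squares line is pulled toward the outliers, while the median criterion is driven by the clean majority.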

15 Median CV: Consistency
Suppose three nested models:
Model 1: $Y = 2(1 + X_1) + E$,
Model 2: $Y = 2(1 + X_1 + X_2) + E$,
Model 3: $Y = 2(1 + X_1 + X_2 + X_3) + E$.
If Model 2 is true, Model 1 underfits and Model 3 overfits. Generate samples of size n = 50, take the covariates IID Unif[0,1], and use three noise distributions: i) standard normal, ii) standard Cauchy, and iii) contamination, 80% N(0, 1) + 20% N(15, 1). Over 1000 reps, find $P_{M_2}(\text{MSP chooses } M_k)$, k = 1, 2, 3, for MSP = CV-LS, MCV-LMS.
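The data-generating side of this simulation can be sketched as follows, using the mixture weights and the N(15, 1) component stated on this slide. The function names and seed are hypothetical, and the model-selection loop over 1000 replications is left out.

```python
import numpy as np

def draw_noise(kind, n, rng):
    """Draw n errors from one of the three noise distributions on the slide."""
    if kind == "normal":
        return rng.standard_normal(n)
    if kind == "cauchy":
        return rng.standard_cauchy(n)
    if kind == "contamination":
        # 80% N(0, 1) + 20% N(15, 1): shift a random 20% of the draws by 15.
        shifted = rng.random(n) < 0.2
        return rng.standard_normal(n) + 15.0 * shifted
    raise ValueError(f"unknown noise kind: {kind}")

def simulate_model2(n, kind, rng):
    """One sample from Model 2: Y = 2(1 + X_1 + X_2) + E, covariates IID Unif[0, 1]."""
    X = rng.uniform(0.0, 1.0, size=(n, 3))   # X_3 is generated but not used by Model 2
    y = 2.0 * (1.0 + X[:, 0] + X[:, 1]) + draw_noise(kind, n, rng)
    return X, y

rng = np.random.default_rng(3)  # arbitrary seed
X, y = simulate_model2(50, "contamination", rng)
print(X.shape, y.shape)
```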

16 N(0,1) error, 5-fold:
[Figure: Proportion of times each model is chosen by MCV and CV. Panel title: 5 fold, Standard normal N(0,1). Axis labels: Model, % for the chosen model. Legend includes: usual CV.]

17 Cauchy error, 5-fold:
[Figure: Proportion of times each model is selected by MCV and CV. Panel title: 5 fold, Standard Cauchy. Axis labels: Model, % for the chosen model. Legend includes: usual CV.]

18 Contamination, 5-fold:
[Figure: Proportion of times each model is selected by MCV and CV. Panel title: 5 fold, 4/5N(0,1)+1/5N(10,1). Axis labels: Model, % for the chosen model. Legend includes: usual CV.]

19 Median CV: Size of β
Suppose we use CV and MCV to compare
Model 1: $Y = X_1 \beta_1 + E$,
Model 2: $Y = X_1 \beta_1 + X_2 \beta_2 + E$.
When $\beta_2 = 0$, Model 2 reduces to Model 1. So for each value of $\beta_2 \neq 0$ taken as true, we can look at how well CV and MCV distinguish the two models. Note the difference when $\beta_2 = 0$!

20 Cauchy error, LOO
[Figure: LOO MCV and CV for Cauchy errors. Panel title: LOO CV and MCV with sample size 30 and Cauchy error. Axis labels: value of beta_2, % to choose the true model. Legend includes: usual CV.]

21 Cauchy error, 5-fold:
[Figure: 5 fold MCV and CV for Cauchy errors. Panel title: 5 fold CV and MCV with sample size 30 and Cauchy error. Axis labels: value of beta_2, % to choose the true model. Legend includes: usual CV.]

22 Two models, df vs. $\beta_2$, black dots = MCV better:
[Figure: Comparison of MCV and CV under the t error distributions. Axis labels: degree of freedom, beta_2.]

23 Tentative Inferences:
With normal error, or high dfs, CV wins ($\beta \neq 0$ not too large). As the error becomes heavier-tailed, MCV is better able than CV to identify the correct model. MCV seems able to ignore residuals that are too big because of large noise components, while CV focusses on them. CV tends to sparsify: it puts too much mass incorrectly on smaller models and prefers models that are too small even when they're wrong. Comment: Need the non-sparse case for prediction... Tianxi Cai: Can't just look at the top SNPs. If n increases, the probability that CV and MCV choose the right model increases (rising to well over 0.5), but the same qualitative properties hold. There is a range of βs, depending on n and df, for which MCV wins.

24 4. No theory... let the pictures do the talking.
Imagine 5 explanatory variables in a linear regression model. Generate the $X_j$s from a Unif[$-c$, $c$], where c is the 5-th percentile of a Cauchy. Consider $2^5$ models: all (non-nested) submodels of
$Y = \beta_0 + \sum_{j=1}^{5} \beta_j X_j + \epsilon$.
Which model classes do 5-fold CV-LS and MCV-LMS choose when ε is Cauchy, n = 70, and various models are taken as true? We'll see that CV misses all the small terms.
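The model list here is every subset of the five explanatory variables. A short sketch of how such a list can be enumerated follows; representing each submodel as a tuple of variable indices is an illustrative choice, not something the slides prescribe.

```python
from itertools import combinations

p = 5  # number of explanatory variables

# Every subset of {1, ..., p}, from the intercept-only model to the full model.
submodels = [
    subset
    for size in range(p + 1)
    for subset in combinations(range(1, p + 1), size)
]

print(len(submodels))   # 2**5 = 32 candidate models
print(submodels[:7])    # the empty model, then the five single-variable models, ...
```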

25 Model β = (5, 5, 5, 5, 5):
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop. Last model on RHS is correct.]

26 Model β = (5, 5, 5, 5, .5):
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop. Last model on RHS is correct; 2nd last model has $\beta_5 = 0$.]

27 β = (5, 5, 5, .5, .5):
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop. Last model on RHS is correct.]

28 β = (5, 5, .5, .5, .5):
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop. Last model on RHS is correct.]

29 β = (5, .5, .5, .5, .5):
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop. Last model on RHS is correct.]

30 β = (.5, .5, .5, .5, .5):
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop. Last model on RHS is correct.]

31 β = (5, 5, .5, 0, 0):
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop. CV splits its weight on the sparse model (purple, $\beta_3 = \beta_4 = \beta_5 = 0$) and the true model with $\beta_3 \neq 0$.]

32 Percent time correct choice as a function of df in t
Again, let's look at how the probability of correct selection of the true model depends on the dfs of the error, given a true model. For comparison purposes, we look at correct selection and at selection of a model that has symmetric difference at most 1 with the true model (it may miss or add a term). Again, n = 70, 5-fold (M)CV.
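The symmetric-difference criterion counts the terms that appear in exactly one of the selected and true models. A small sketch, with models represented as sets of variable indices (a hypothetical encoding chosen for illustration):

```python
def symmetric_difference_size(selected, true):
    """Number of terms that are in exactly one of the two models."""
    return len(set(selected) ^ set(true))

def within_one_term(selected, true):
    """True if the selected model misses or adds at most one term."""
    return symmetric_difference_size(selected, true) <= 1

true_model = {1, 2, 3}
print(symmetric_difference_size({1, 2, 3}, true_model))  # exact match
print(symmetric_difference_size({1, 2}, true_model))     # misses one term
print(symmetric_difference_size({1, 2, 4}, true_model))  # misses one and adds one
```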

33 β = (5, 5, 5, 5, 5) true:
[Figure: t distribution with different df; % that CV/MCV chose the true model vs. degree of freedom; legend: SD MCV, SD CV, MCV, CV.] MCV and SD-MCV coincide. As df increases, CV and SD-CV catch up to MCV, SD-MCV, by 1.7.

34 β = (5, 5, 5, .5, .5) true:
[Figure: t distribution with different df; % that CV/MCV chose the true model vs. degree of freedom; legend: SD MCV, SD CV, MCV, CV.] MCV, SD-MCV deteriorate as df increases; CV, SD-CV improve as df increases. Crossover 2.

35 β = (5, .5, .5, .5, .5) true:
[Figure: t distribution with different df; % that CV/MCV chose the true model vs. degree of freedom; legend: SD MCV, SD CV, MCV, CV.] MCV, SD-MCV deteriorate as df increases; CV, SD-CV improve as df increases. Crossover 1.8.

36 True Model Inside
New simulation: Consider 25 nested linear regression models. The true model is
$Y = \beta_0 + \sum_{j=1}^{J} \beta_j X_j + E$,
where E is Cauchy and J = 1, ..., 25. Suppose J = 20 is the true model, for which β_j = 2/(j 1) [the operator between j and 1 did not survive transcription]. Let's compare the sampling distributions of CV (with LS and LMS) and MCV (with LS and LMS) over the model list. We see that MCV-LMS is best at detecting small terms.

37 Cauchy error, 5-fold, nonsparse
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop, EMprop3, EMprop4. 25 models, model 21 is correct, βs decreasing, n 300.]

38 Cauchy error, 5-fold, nonsparse
[Figure: CV vs MCV, % of choosing true model; legend: Eprop, Mprop, EMprop3, EMprop4. 25 models, model 21 is correct, βs decreasing, n 1500.]

39 5. Conclusions and Future Work
For heavy-tailed errors, especially where the leading terms are not enough, use MCV, not CV. When you have light-tailed (normal) errors, use regular CV unless β_j/σ c_n [relation symbol lost in transcription] (simulations not shown). Diagnostic for using MCV rather than CV: histogram of residuals. (If not normal, use MCV. If normal, rule out having constructed the normal error, and use CV.) Maybe the Bahadur representation can help quantify these findings... Bahadur (1966), JKG (1971), Mazumder and Serfling (2009)... An empirical process approach, Kim and Pollard (1987) style, as for LMS? Random effects models???


Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Feature selection with high-dimensional data: criteria and Proc. Procedures

Feature selection with high-dimensional data: criteria and Proc. Procedures Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June

More information

EECS E6690: Statistical Learning for Biological and Information Systems Lecture1: Introduction

EECS E6690: Statistical Learning for Biological and Information Systems Lecture1: Introduction EECS E6690: Statistical Learning for Biological and Information Systems Lecture1: Introduction Prof. Predrag R. Jelenković Time: Tuesday 4:10-6:40pm 1127 Seeley W. Mudd Building Dept. of Electrical Engineering

More information

Recitation 5. Inference and Power Calculations. Yiqing Xu. March 7, 2014 MIT

Recitation 5. Inference and Power Calculations. Yiqing Xu. March 7, 2014 MIT 17.802 Recitation 5 Inference and Power Calculations Yiqing Xu MIT March 7, 2014 1 Inference of Frequentists 2 Power Calculations Inference (mostly MHE Ch8) Inference in Asymptopia (and with Weak Null)

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Focused fine-tuning of ridge regression

Focused fine-tuning of ridge regression Focused fine-tuning of ridge regression Kristoffer Hellton Department of Mathematics, University of Oslo May 9, 2016 K. Hellton (UiO) Focused tuning May 9, 2016 1 / 22 Penalized regression The least-squares

More information

Construction of PoSI Statistics 1

Construction of PoSI Statistics 1 Construction of PoSI Statistics 1 Andreas Buja and Arun Kumar Kuchibhotla Department of Statistics University of Pennsylvania September 8, 2018 WHOA-PSI 2018 1 Joint work with "Larry s Group" at Wharton,

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

CHAPTER 5. Outlier Detection in Multivariate Data

CHAPTER 5. Outlier Detection in Multivariate Data CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for

More information

Accounting for Complex Sample Designs via Mixture Models

Accounting for Complex Sample Designs via Mixture Models Accounting for Complex Sample Designs via Finite Normal Mixture Models 1 1 University of Michigan School of Public Health August 2009 Talk Outline 1 2 Accommodating Sampling Weights in Mixture Models 3

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

SUMMARIZING MEASURED DATA. Gaia Maselli

SUMMARIZING MEASURED DATA. Gaia Maselli SUMMARIZING MEASURED DATA Gaia Maselli maselli@di.uniroma1.it Computer Network Performance 2 Overview Basic concepts Summarizing measured data Summarizing data by a single number Summarizing variability

More information

Robustness. James H. Steiger. Department of Psychology and Human Development Vanderbilt University. James H. Steiger (Vanderbilt University) 1 / 37

Robustness. James H. Steiger. Department of Psychology and Human Development Vanderbilt University. James H. Steiger (Vanderbilt University) 1 / 37 Robustness James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 37 Robustness 1 Introduction 2 Robust Parameters and Robust

More information

Multivariate Calibration with Robust Signal Regression

Multivariate Calibration with Robust Signal Regression Multivariate Calibration with Robust Signal Regression Bin Li and Brian Marx from Louisiana State University Somsubhra Chakraborty from Indian Institute of Technology Kharagpur David C Weindorf from Texas

More information

Residuals and model diagnostics

Residuals and model diagnostics Residuals and model diagnostics Patrick Breheny November 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/42 Introduction Residuals Many assumptions go into regression models, and the Cox proportional

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

Distribution Fitting (Censored Data)

Distribution Fitting (Censored Data) Distribution Fitting (Censored Data) Summary... 1 Data Input... 2 Analysis Summary... 3 Analysis Options... 4 Goodness-of-Fit Tests... 6 Frequency Histogram... 8 Comparison of Alternative Distributions...

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Dynamic Time Series Regression: A Panacea for Spurious Correlations

Dynamic Time Series Regression: A Panacea for Spurious Correlations International Journal of Scientific and Research Publications, Volume 6, Issue 10, October 2016 337 Dynamic Time Series Regression: A Panacea for Spurious Correlations Emmanuel Alphonsus Akpan *, Imoh

More information

Machine Learning and Data Mining. Linear regression. Kalev Kask

Machine Learning and Data Mining. Linear regression. Kalev Kask Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance

More information

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Linear regression for heavy tails

Linear regression for heavy tails Linear regression for heavy tails Guus Balkema & Paul Embrechts Universiteit van Amsterdam & ETH Zürich Abstract There exist several estimators of the regression line in the simple linear regression Y

More information

AR, MA and ARMA models

AR, MA and ARMA models AR, MA and AR by Hedibert Lopes P Based on Tsay s Analysis of Financial Time Series (3rd edition) P 1 Stationarity 2 3 4 5 6 7 P 8 9 10 11 Outline P Linear Time Series Analysis and Its Applications For

More information

Chapter 3. Linear Models for Regression

Chapter 3. Linear Models for Regression Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information