Chapter 11: Robust & Quantile regression


Chapter 11: Robust & Quantile regression
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I
1/17

11.3 Influential cases remedial measure: Robust regression

Leverages h_ii and deleted residuals t_i are useful for finding cases with outlying x_i and Y_i (w.r.t. the model). Cook's D_i and DFFITS_i indicate which cases are highly influencing the fit of the model, i.e. the OLS b.

What to do with influential and/or outlying cases? Are they transcription errors, or somehow unrepresentative of the target population? Outliers are often interesting in their own right and can help guide the building of a better model.

Robust regression dampens the effect of outlying cases on estimation to provide a better fit to the majority of cases. It is useful in situations when there is no time for influence diagnostics or a more careful analysis.

2/17

Robust regression is effective when the error distribution is not normal, but heavy-tailed.

M-estimation is a general class of estimation methods. Choose β to minimize

Q(β) = Σ_{i=1}^n ρ(Y_i − x_i'β),

where ρ(·) is some function.

ρ(u) = u² gives the OLS b.
ρ(u) = |u| gives L1 regression, or least absolute residual (LAR) regression.
Huber's method, described next, builds a ρ(·) that is a compromise between OLS and LAR: it looks like u² for u close to zero and like |u| further away.

3/17
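As a concrete comparison (a minimal Python sketch, not from the slides), the three ρ(·) choices differ only in how fast they grow in the tails; the constant 1.345 is the usual Huber tuning constant, and the 1/2 in the quadratic piece makes the two pieces join smoothly:

```python
def rho_ols(u):
    """OLS: rho(u) = u^2 penalizes large residuals quadratically."""
    return u * u

def rho_lar(u):
    """LAR / L1: rho(u) = |u| penalizes large residuals linearly."""
    return abs(u)

def rho_huber(u, c=1.345):
    """Huber: quadratic near zero, linear in the tails (compromise)."""
    if abs(u) <= c:
        return u * u / 2.0
    return c * abs(u) - c * c / 2.0

# Near zero Huber behaves like u^2/2; far out it grows linearly like LAR.
for u in [0.5, 1.0, 3.0, 10.0]:
    print(u, rho_ols(u), rho_lar(u), rho_huber(u))
```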

Iteratively Reweighted Least Squares

Outlying values of r_i^j = Y_i − x_i'b^j are (iteratively) given less weight in the estimation process.

0. (Starting b^0) OLS: w_i^0 = 1/e_i^2 from e = Y − Xb. Set j = 1.
1. (WLS for b^j using W^{j−1}): b^j = (X'W^{j−1}X)^{−1} X'W^{j−1}Y.
2. σ̂_j = median{|Y_i − x_i'b^j| : i = 1,…,n} / Φ^{−1}(0.75).
3. w_i^j = W((Y_i − x_i'b^j)/σ̂_j), where

   W(u) = 1 if |u| ≤ 1.345, and W(u) = 1.345/|u| if |u| > 1.345.

   Set j to j + 1.
4. Repeat steps 1 through 3 until σ̂_j and b^j stabilize.

There are other weight functions (the SAS default is bisquare, pp. 439-440) and other methods for updating σ̂_j; see the book and SAS documentation if interested.

4/17
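The loop above can be sketched in Python with NumPy (an illustrative implementation, not PROC ROBUSTREG; for simplicity it starts from the plain OLS fit rather than the weighted start in step 0):

```python
import numpy as np
from statistics import NormalDist  # for Phi^{-1}(0.75) ~ 0.6745

def huber_weight(u, c=1.345):
    """Huber weight: 1 for small standardized residuals, c/|u| beyond c."""
    u = np.abs(u)
    return np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))

def irls_huber(X, y, tol=1e-8, max_iter=100):
    """Huber M-estimation by iteratively reweighted least squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # starting b from OLS
    for _ in range(max_iter):
        r = y - X @ b                             # current residuals
        # MAD-type scale estimate: median|r_i| / Phi^{-1}(0.75)
        sigma = max(np.median(np.abs(r)) / NormalDist().inv_cdf(0.75), 1e-12)
        w = huber_weight(r / sigma)               # downweight big residuals
        XtW = X.T * w                             # X'W without forming diag(w)
        b_new = np.linalg.solve(XtW @ X, XtW @ y) # WLS step
        if np.max(np.abs(b_new - b)) < tol:
            break
        b = b_new
    return b_new
```

Fitting a line to data with one gross outlier illustrates the downweighting: the robust slope stays near the clean-data slope while the OLS slope is pulled toward the outlier.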

SAS code: M-estimation with Huber weights

data prof;
  input state$ mathprof parents homelib reading tvwatch absences;
  datalines;
Alabama 252 75 78 34 18 18
Arizona 259 75 73 41 12 26
...et cetera...
Wisconsin 274 81 86 38 8 21
Wyoming 272 85 86 43 7 23
;

proc robustreg data=prof method=m(wf=huber);
  model mathprof=parents homelib reading tvwatch absences;
run;

proc robustreg data=prof method=m(wf=huber);
  model mathprof=parents homelib reading tvwatch absences;
  test absences reading;
run;

proc robustreg data=prof method=m(wf=huber);
  model mathprof=parents homelib tvwatch;
  id state;
  output out=out p=p sr=sr;
run;

goptions reset=all;
symbol1 v=dot pointlabel=("#state");
proc gplot data=out;
  plot sr*(p parents homelib tvwatch);
run;

5/17

SAS output...

The ROBUSTREG Procedure

Parameter Estimates (full model)

                           Standard     95% Confidence       Chi-
Parameter  DF  Estimate       Error         Limits         Square  Pr > ChiSq
Intercept   1  178.1630     28.2897  122.7162   233.6097    39.66      <.0001
parents     1    0.3574      0.2007   -0.0360     0.7507     3.17      0.0750
homelib     1    0.7702      0.1403    0.4952     1.0452    30.14      <.0001
reading     1    0.1495      0.2100   -0.2620     0.5611     0.51      0.4763
tvwatch     1   -0.8218      0.2752   -1.3612    -0.2824     8.92      0.0028
absences    1   -0.0121      0.2058   -0.4155     0.3912     0.00      0.9530

Robust Linear Tests (test absences reading)

               Test                        Chi-
Test      Statistic   Lambda    DF       Square   Pr > ChiSq
Rho          0.2201   0.8646     2         0.25       0.8805
Rn2          0.5073              2         0.51       0.7760

Parameter Estimates (reduced model)

                           Standard     95% Confidence       Chi-
Parameter  DF  Estimate       Error         Limits         Square  Pr > ChiSq
Intercept   1  185.8796     21.4793  143.7809   227.9784    74.89      <.0001
parents     1    0.3623      0.1766    0.0162     0.7085     4.21      0.0402
homelib     1    0.7543      0.1257    0.5080     1.0007    36.01      <.0001
tvwatch     1   -0.9369      0.2184   -1.3650    -0.5089    18.40      <.0001
Scale       1    4.2666

6/17

[Slides 7/17-10/17: PROC GPLOT output, standardized robust residuals sr plotted against the fitted values p and the predictors parents, homelib, and tvwatch, with points labeled by state.]

Quantile Regression

Another option is general quantile regression. L1 regression (p. 438), described above, is median regression and can be carried out in SAS PROC QUANTREG. Other quantiles can similarly be regressed upon.

The QUANTREG procedure uses robust multivariate location and scale estimates for leverage-point detection, so you can do similar analyses as in PROC REG, but robustly. QUANTREG solves the minimization problem using the simplex algorithm; details are in the documentation.

11/17

Median, or L1, regression minimizes

Q_{0.5}(β) = min_{β ∈ R^p} Σ_{i=1}^n |Y_i − x_i'β|.

Let 0 < τ < 1 be a probability. Koenker and Bassett (1978) define b_τ, the τth quantile regression effects vector, to minimize

Q_τ(β) = min_{β ∈ R^p} [ Σ_{i: Y_i ≥ x_i'β} τ|Y_i − x_i'β| + Σ_{i: Y_i < x_i'β} (1 − τ)|Y_i − x_i'β| ].

Let Y_h be a draw from the population with accompanying x_h. The b_τ that minimizes Q_τ(β) satisfies

P(Y_h ≤ x_h'b_τ) ≈ τ.

Note q_τ(x) = x'b_τ. How are the elements of b_τ interpreted?

12/17
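To see why minimizing Q_τ targets the τth quantile, consider the intercept-only case (each x_i = 1), where the minimizer over a scalar q is a sample τ-quantile. A small illustrative Python check (brute force, not PROC QUANTREG's simplex algorithm):

```python
def Q_tau(tau, y, q):
    """Koenker-Bassett objective for candidate quantile q, intercept-only model."""
    return sum(tau * (yi - q) if yi >= q else (1 - tau) * (q - yi) for yi in y)

y = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# An optimum always occurs at an observed data point, so a brute-force
# search over the observations suffices for illustration.
q50 = min(y, key=lambda q: Q_tau(0.50, y, q))  # a sample median
q25 = min(y, key=lambda q: Q_tau(0.25, y, q))  # a sample lower quartile
print(q50, q25)
```

For τ = 0.5 the objective ties over the whole interval between the two middle order statistics, consistent with the usual non-uniqueness of the sample median for even n.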

SAS code: Lung pressure data

data pressure;
  input spap empty eject gas @@;
  datalines;
49.0 45.0 36.0 45.0 55.0 30.0 28.0 40.0 85.0 11.0 16.0 42.0
32.0 30.0 46.0 40.0 26.0 39.0 76.0 43.0 28.0 42.0 78.0 27.0
95.0 17.0 24.0 36.0 26.0 63.0 80.0 42.0 74.0 25.0 12.0 52.0
37.0 32.0 27.0 35.0 31.0 37.0 37.0 55.0 49.0 29.0 34.0 47.0
38.0 26.0 32.0 28.0 41.0 38.0 45.0 30.0 12.0 38.0 99.0 26.0
44.0 25.0 38.0 47.0 29.0 27.0 51.0 44.0 40.0 37.0 32.0 54.0
31.0 34.0 40.0 36.0
;

data pressure2;
  set pressure;
  eject2=eject**2;

proc quantreg data=pressure2;
  model spap=eject / quantile=0.25;
  output out=o1 p=p1;
proc quantreg data=pressure2;
  model spap=eject / quantile=0.50;
  output out=o2 p=p2;
proc quantreg data=pressure2;
  model spap=eject / quantile=0.75;
  output out=o3 p=p3;

data o4;
  set o1 o2 o3;
proc sort data=o4;
  by quantile eject;

goptions reset=all;
symbol1 color=black value=dot interpol=none;
symbol2 color=black value=none l=3 interpol=join;
symbol3 color=black value=none l=1 interpol=join;
symbol4 color=black value=none l=3 interpol=join;
legend1 value=("data" "25%" "50%" "75%");
proc gplot data=o4;
  plot spap*eject p1*eject p2*eject p3*eject / overlay legend=legend1;
run;

13/17

SAS output...

The QUANTREG Procedure

Quantile 0.25
                               95% Confidence
Parameter  DF   Estimate           Limits
Intercept   1    49.3585    35.9819    88.2810
eject       1    -0.3774    -2.4717    -0.1364

Quantile 0.5
                               95% Confidence
Parameter  DF   Estimate           Limits
Intercept   1    65.1667    45.3958    83.0584
eject       1    -0.5370    -1.1718    -0.2699

Quantile 0.75
                               95% Confidence
Parameter  DF   Estimate           Limits
Intercept   1    82.5517    65.5742   110.0804
eject       1    -0.7126    -0.9293    -0.5127

14/17

[Slide 15/17: scatterplot of spap versus eject with the fitted 25th, 50th, and 75th percentile regression lines overlaid.]

Normal-errors regression is also quantile regression

For our standard model Y = x'β + ε, ε ~ N(0, σ²), note that

P(Y ≤ q_τ) = τ
⇔ P(x'β + ε ≤ q_τ) = τ
⇔ P(ε/σ ≤ (q_τ − x'β)/σ) = τ
⇔ Φ((q_τ − x'β)/σ) = τ
⇔ (q_τ − x'β)/σ = Φ^{−1}(τ)
⇔ q_τ(x) = Φ^{−1}(τ)σ + x'β.

16/17
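This identity is easy to check by simulation; a stdlib-only Python sketch (the values of β, σ, x, and τ below are arbitrary choices, not from the slides):

```python
import random
from statistics import NormalDist

random.seed(1)
beta0, beta1, sigma = 2.0, 0.5, 3.0
x = 4.0
tau = 0.9

# Simulate Y = beta0 + beta1*x + eps with eps ~ N(0, sigma^2),
# then compare the empirical tau-quantile to Phi^{-1}(tau)*sigma + x'beta.
ys = sorted(beta0 + beta1 * x + random.gauss(0.0, sigma) for _ in range(200_000))
empirical = ys[int(tau * len(ys))]
theoretical = NormalDist().inv_cdf(tau) * sigma + beta0 + beta1 * x

print(round(empirical, 2), round(theoretical, 2))
```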

When x_j increases by one, every quantile increases by β_j. What does this imply about the quantile functions? Could we see a plot like the one for the lung pressure data a few slides ago?

Final comment: L1 regression is obtained as the MLE from a standard regression model assuming the errors are distributed double-exponential (Laplace).

17/17
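The final comment can be made concrete: if the errors are iid Laplace(0, b), the negative log-likelihood is n·log(2b) + Σ|Y_i − x_i'β|/b, so for fixed b, maximizing the likelihood over β is exactly minimizing the L1 criterion. A small illustrative Python sketch (residual vectors below are made up for the comparison):

```python
import math

def l1_criterion(resids):
    """L1 regression objective: sum of absolute residuals."""
    return sum(abs(r) for r in resids)

def laplace_negloglik(resids, b=1.0):
    """-log likelihood for iid Laplace(0, b) errors: n*log(2b) + sum|r|/b."""
    return len(resids) * math.log(2 * b) + l1_criterion(resids) / b

# Two candidate coefficient vectors yield two residual vectors; for fixed b,
# the one with the smaller L1 criterion also has the smaller -loglik.
r_good = [0.1, -0.2, 0.3, -0.1]
r_bad = [1.0, -2.0, 3.0, -1.5]
print(laplace_negloglik(r_good) < laplace_negloglik(r_bad))
```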