Robust regression in R. Eva Cantoni


1 Robust regression in R Eva Cantoni Research Center for Statistics and Geneva School of Economics and Management, University of Geneva, Switzerland April 4th, 2017

2 1. Robust statistics philosophy; 2. Robust regression; 3. R resources; 4. Examples; 5. Bibliography

3 Against what is robust statistics robust? Robust Statistics aims at producing consistent and possibly efficient estimators and test statistics with stable level when the model is slightly misspecified. Model misspecification encompasses a relatively large set of possibilities, and robust statistics cannot deal with all types of model misspecifications. By slight model misspecification, we suppose that the data generating process lies in a neighborhood of the true (postulated) model, the one that is considered as useful for the problem under investigation.

4 Against what is robust statistics robust? This neighborhood is formalized as
$$F_\varepsilon = (1 - \varepsilon) F_\theta + \varepsilon G, \tag{1}$$
where $F_\theta$ is the postulated model, $\theta$ is a set of parameters of interest, $G$ is an arbitrary distribution and $0 \le \varepsilon \le 1$ captures the amount of model misspecification.
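As a rough illustration of the neighborhood (1), the R sketch below draws a sample from the contaminated distribution. The choices made here are assumptions for illustration only and do not come from the slides: the postulated model is N(0, 1), G is a point mass at z = 10, ε = 0.05, and the helper name `rcontam` is hypothetical.

```r
# Draw n observations from F_eps = (1 - eps) * F_theta + eps * G.
# Assumptions: F_theta = N(0, 1) and G = point mass at z.
rcontam <- function(n, eps, z = 10) {
  from_G <- rbinom(n, size = 1, prob = eps) == 1  # TRUE: observation comes from G
  x <- rnorm(n)                                   # draws from the postulated model
  x[from_G] <- z                                  # overwrite the contaminated draws
  x
}

set.seed(1)
x <- rcontam(1000, eps = 0.05)
c(mean = mean(x), median = median(x))  # the mean is pulled towards z, the median much less
```

Even a small ε moves the sample mean noticeably, which is the kind of slight misspecification the robust approach is meant to handle.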

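5 Against what is robust statistics robust?
Inference | G | 0 << ε < 1 | 0 < ε << 1 | ε = 0
Classical | arbitrary | ? | ? | $F_\theta$
Classical | $G = \Delta_z$ | $F_\varepsilon$ | $F_\varepsilon$ | $F_\theta$
Classical | $G = F_{(\theta,\theta')}$ | $F_\varepsilon$ | $F_\varepsilon$ | $F_\theta$
Classical | $G$ such that $F_\varepsilon = F_{(\theta,\theta')}$ | $F_{(\theta,\theta')}$ | $F_{(\theta,\theta')}$ | $F_\theta$
Robust | arbitrary | ? | $F_\theta$ | $F_\theta$
Robust | $G = \Delta_z$ | $F_\varepsilon$ | $F_\theta$ | $F_\theta$
Robust | $G = F_{(\theta,\theta')}$ | $F_\varepsilon$ | $F_\theta$ | $F_\theta$
Robust | $G$ such that $F_\varepsilon = F_{(\theta,\theta')}$ | $F_{(\theta,\theta')}$ | $F_\theta$ | $F_\theta$
Table: Models at which inference can be at best done.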

6 Robustness measures
Robust estimators protect against:
- bias under contamination
- breakdown point
They imply a trade-off between efficiency and robustness!
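The notion of breakdown can be made concrete with a small experiment. The sketch below is an illustration under assumptions chosen here (21 simulated observations, outliers of size 1e6, hypothetical helper `contaminate`): it replaces a growing number of observations by a gross outlier and tracks the sample mean and the sample median.

```r
set.seed(2)
x <- rnorm(21)

# Replace the first k observations by a gross outlier.
contaminate <- function(x, k, value = 1e6) {
  if (k > 0) x[seq_len(k)] <- value
  x
}

k <- 0:10  # up to 10 of 21 observations, i.e. fewer than half
res <- data.frame(
  k      = k,
  mean   = sapply(k, function(j) mean(contaminate(x, j))),
  median = sapply(k, function(j) median(contaminate(x, j)))
)
res
```

A single outlier is enough to carry the mean away, while the median stays within the range of the original data as long as fewer than half of the observations are replaced.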

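7 Linear model and classical estimation For $i = 1, \ldots, n$ consider
$$y_i = x_i^T \beta + \epsilon_i, \quad \text{with } \epsilon_i \sim N(0, \sigma^2).$$
The maximum likelihood estimator $\hat\beta_{ML}$ minimizes
$$\sum_{i=1}^n \left( \frac{y_i - x_i^T \hat\beta_{ML}}{\sigma} \right)^2 = \sum_{i=1}^n r_i^2,$$
or, alternatively, solves
$$\sum_{i=1}^n \left( \frac{y_i - x_i^T \hat\beta_{ML}}{\sigma} \right) x_i = \sum_{i=1}^n r_i x_i = 0.$$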
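The estimating equation can be checked numerically. The following sketch uses simulated data (the sample size, coefficients and seed are arbitrary choices made here) and verifies that the least-squares residuals are orthogonal to the columns of the design matrix. It works with raw residuals, since dividing by a common σ does not change the solution.

```r
set.seed(3)
n <- 50
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)   # simulated data with beta = (1, 2) and sigma = 1

fit <- lm(y ~ x)            # least squares = Gaussian maximum likelihood for beta
X <- model.matrix(fit)      # design matrix with rows x_i^T
r <- resid(fit)             # raw residuals y_i - x_i^T beta_hat

crossprod(X, r)             # sum_i r_i x_i: zero up to rounding error
```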

8 Outliers [Figure: scatter plot of y against x. Classification of outlying points: vertical outlier, good leverage point, bad leverage point.]

9 Robustness vs diagnostic Masking... [Figure: ML residuals, resid(olsmultiple), plotted against observation index.]

10 Robustness vs diagnostic Masking... [Figure: data and ML fit, y against x.]

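11 M and GM-estimation The M-estimator $\hat\beta_M$ solves
$$\sum_{i=1}^n \psi_c\left( \frac{y_i - x_i^T \hat\beta_M}{\sigma} \right) x_i = \sum_{i=1}^n \psi_c(r_i) x_i = \sum_{i=1}^n w_c(r_i) r_i x_i = 0,$$
where $w_c(r_i) = \psi_c(r_i)/r_i$, or minimizes
$$\sum_{i=1}^n \rho_c\left( \frac{y_i - x_i^T \hat\beta_M}{\sigma} \right).$$
The Mallows GM-estimator $\hat\beta_{GM}$ is an alternative that solves
$$\sum_{i=1}^n \psi_c\left( \frac{y_i - x_i^T \hat\beta_{GM}}{\sigma} \right) w(x_i) x_i = \sum_{i=1}^n w_c(r_i) w(x_i) r_i x_i = 0.$$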
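The weighted form of the M-estimating equation suggests an iteratively reweighted least squares (IRWLS) scheme. The sketch below is a minimal base-R illustration, not the algorithm used by any particular package. It assumes the Huber weight function, the conventional tuning constant c = 1.345, and a scale re-estimated by the MAD of the residuals at each step; the function names are hypothetical.

```r
# Huber weights: w_c(r) = psi_c(r) / r = min(1, c / |r|).
huber_weight <- function(r, c = 1.345) pmin(1, c / abs(r))

# M-estimation by IRWLS.
# Assumption: sigma is re-estimated by the MAD of the residuals at each step.
m_huber <- function(X, y, c = 1.345, tol = 1e-8, max_it = 100) {
  beta <- lm.fit(X, y)$coefficients                # least-squares start
  for (it in seq_len(max_it)) {
    res   <- drop(y - X %*% beta)
    sigma <- mad(res)                              # robust scale estimate
    w     <- huber_weight(res / sigma, c)          # weights w_c(r_i)
    beta_new <- lm.wfit(X, y, w)$coefficients      # weighted least-squares step
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Simulated data with one gross vertical outlier.
set.seed(4)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30, sd = 0.5)
y[15] <- 60
X <- cbind(Intercept = 1, x = x)

beta_ols <- lm.fit(X, y)$coefficients
beta_m   <- m_huber(X, y)
rbind(ols = beta_ols, m_huber = beta_m)
```

Because the weights depend only on the residuals, this scheme gives no protection against bad leverage points; the GM version would multiply `w` by an additional weight w(x_i) computed from the design.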

12 $\psi_c$ and $\rho_c$ functions [Figure: $\psi_c(r)$ plotted against r and $\rho(u)$ plotted against u, for the Huber ($\rho_H$) and Tukey or bisquare ($\rho_B$) functions.]
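For reference, the two families shown on this slide can be written down directly. The definitions below are the usual textbook forms; the normalization and the default tuning constants (1.345 and 4.685) are assumptions made here and may differ from those used for the plots.

```r
# Huber psi and rho: linear/quadratic in the middle, bounded psi in the tails.
psi_huber <- function(r, c = 1.345) pmax(-c, pmin(c, r))
rho_huber <- function(r, c = 1.345) {
  ifelse(abs(r) <= c, r^2 / 2, c * abs(r) - c^2 / 2)
}

# Tukey bisquare psi and rho: psi redescends to 0 beyond c, rho is bounded.
psi_bisquare <- function(r, c = 4.685) {
  ifelse(abs(r) <= c, r * (1 - (r / c)^2)^2, 0)
}
rho_bisquare <- function(r, c = 4.685) {
  ifelse(abs(r) <= c, c^2 / 6 * (1 - (1 - (r / c)^2)^3), c^2 / 6)
}

r <- c(-6, -1, 0, 1, 6)
rbind(psi_huber = psi_huber(r), psi_bisquare = psi_bisquare(r),
      rho_huber = rho_huber(r), rho_bisquare = rho_bisquare(r))
```

In both cases ψ is the derivative of ρ. The Huber ψ is monotone, whereas the bisquare ψ gives zero weight to residuals larger than c in absolute value.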

13 Comparison [Figure: data (y against x) with fitted lines: OLS, M Huber, GM Huber, M Tukey.]

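14 S-estimation The S-estimator $\hat\beta_S$ is an alternative that minimizes
$$\sum_{i=1}^n \rho_c\left( \frac{y_i - x_i^T \hat\beta_S}{s} \right),$$
where $s$ is a scale M-estimator that solves
$$\frac{1}{n} \sum_{i=1}^n \rho_c^{(1)}\left( \frac{y_i - x_i^T \beta}{s} \right) = b.$$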
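The scale equation can be solved numerically for a fixed vector of residuals. The sketch below is an illustration under assumptions made here: a Tukey bisquare ρ rescaled to have maximum 1, the commonly used pair c = 1.547 and b = 0.5, and a simple root search with `uniroot()`. The function names are hypothetical, and no minimization over β is attempted.

```r
# Tukey bisquare rho, rescaled so that its maximum is 1.
rho_bisq1 <- function(u, c) ifelse(abs(u) <= c, 1 - (1 - (u / c)^2)^3, 1)

# Scale M-estimate: the value s solving mean(rho(r / s)) = b.
m_scale <- function(r, c = 1.547, b = 0.5) {
  f  <- function(s) mean(rho_bisq1(r / s, c)) - b  # decreasing in s
  s0 <- mad(r)                                     # used only to bracket the root
  uniroot(f, lower = s0 / 100, upper = s0 * 100, tol = 1e-10)$root
}

set.seed(5)
r <- rnorm(5000)
s_clean <- m_scale(r)   # close to 1 for standard normal residuals

r[1:500] <- 50          # 10% gross outliers
s_contam <- m_scale(r)  # larger, but still bounded

c(clean = s_clean, contaminated = s_contam)
```

An S-estimator would repeat this computation over candidate values of β and keep the one giving the smallest scale, which is what makes it computationally demanding.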

15 MM-estimation The MM-estimator is a two-step estimator constructed as follows:
1. Let $s_n$ be the scale estimate from an initial S-estimator.
2. With $\rho_c^{(2)}(\cdot) \le \rho_c^{(1)}(\cdot)$, the MM-estimator $\hat\beta_{MM}$ minimizes
$$\sum_{i=1}^n \rho_c^{(2)}\left( \frac{y_i - x_i^T \hat\beta_{MM}}{s_n} \right).$$

16 Comparison [Figure: data (y against x) with fitted lines: OLS, M Huber, M Tukey, MM, S.]

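17 Robust GLM (GM-estimator) For the GLM (e.g. logistic, Poisson) $g(\mu_i) = x_i^T \beta$, where $E(Y_i) = \mu_i$, $\mathrm{Var}(Y_i) = v(\mu_i)$ and $r_i = (y_i - \mu_i)/\sqrt{\phi v_{\mu_i}}$, the robust estimator is defined by
$$\sum_{i=1}^n \left[ \psi_c(r_i)\, w(x_i)\, \frac{1}{\sqrt{\phi v_{\mu_i}}}\, \mu_i' - a(\beta) \right] = 0, \tag{2}$$
where $\mu_i' = \partial\mu_i/\partial\beta = (\partial\mu_i/\partial\eta_i)\, x_i$ and
$$a(\beta) = \frac{1}{n} \sum_{i=1}^n E[\psi_c(r_i)]\, w(x_i)\, \frac{1}{\sqrt{\phi v_{\mu_i}}}\, \mu_i'.$$
The constant $a(\beta)$ is a correction term to ensure Fisher consistency.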
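To see why a correction term is needed, the expectation of ψ_c(r_i) can be computed exactly for a Bernoulli response, since r_i takes only two values. The sketch below assumes a Huber ψ, φ = 1 and c = 1.5; the function names are hypothetical.

```r
psi_huber <- function(r, c) pmax(-c, pmin(c, r))

# E[psi_c(r_i)] for a Bernoulli response with mean mu (phi = 1):
# r_i equals (1 - mu) / sqrt(v) with probability mu and
# -mu / sqrt(v) with probability 1 - mu, where v = mu * (1 - mu).
expect_psi_bernoulli <- function(mu, c = 1.5) {
  v <- mu * (1 - mu)
  mu * psi_huber((1 - mu) / sqrt(v), c) +
    (1 - mu) * psi_huber(-mu / sqrt(v), c)
}

e_sym   <- expect_psi_bernoulli(0.5)           # symmetric case: no truncation bias
e_skew  <- expect_psi_bernoulli(0.9)           # truncation of the residual -3 leaves a bias
e_nocut <- expect_psi_bernoulli(0.9, c = Inf)  # no truncation: expectation is zero
c(e_sym, e_skew, e_nocut)
```

When the distribution of the Pearson residuals is asymmetric, truncating them shifts their expectation away from zero; subtracting a(β) removes this shift.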

18 R functions for robust linear regression
(G)M-estimation
- MASS: rlm() with method = "M" (Huber, Tukey, Hampel). Choice for the scale estimator: MAD, Huber Proposal 2
S-estimation
- robust: lmRob with estim = "Initial"
- robustbase: lmrob.S
MM-estimation
- MASS: rlm() with method = "MM"
- robust: lmRob (with estim = "Final", default)
- robustbase: lmrob()
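A hedged sketch of how some of these functions might be called is given below. It assumes that the packages MASS and robustbase are installed, and it uses the `coleman` data from robustbase purely as a convenient example; the object name `fits` is an arbitrary choice. The calls rely on package defaults apart from the arguments shown.

```r
# Runs only if both packages are available.
if (requireNamespace("MASS", quietly = TRUE) &&
    requireNamespace("robustbase", quietly = TRUE)) {
  data(coleman, package = "robustbase")

  fits <- list(
    ols     = lm(Y ~ ., data = coleman),
    # M-estimation with Huber psi and MAD scale
    m_huber = MASS::rlm(Y ~ ., data = coleman, method = "M",
                        psi = MASS::psi.huber, scale.est = "MAD"),
    # MM-estimation as implemented in MASS
    mm_rlm  = MASS::rlm(Y ~ ., data = coleman, method = "MM"),
    # MM-estimation as implemented in robustbase (method = "MM" is the default)
    mm      = robustbase::lmrob(Y ~ ., data = coleman)
  )

  # Coefficients side by side, one column per fit.
  print(round(sapply(fits, coef), 3))
}
```

Printing the coefficients side by side makes it easy to see where the robust fits depart from least squares.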

19 R functions for other models
- robustbase: glmrob (GM-estimation, Huber; includes Gaussian)
- Negative binomial model: glmrob.nb from
- From our book webpage: eva-cantoni/books/robust-methods-in-biostatistics/

20 Coleman Dataset coleman from package robustbase. A data frame with 20 observations on the following 6 variables.
- salaryP: staff salaries per pupil
- fatherWc: percent of white-collar fathers
- sstatus: socioeconomic status composite deviation: means for family size, family intactness, father's education, mother's education, and home items
- teacherSc: mean teacher's verbal test score
- motherLev: mean mother's educational level, one unit is equal to two school years
- Y: verbal mean test score (y, all sixth graders)

21 Coleman
> summary(m.lm)
Call:
lm(formula = Y ~ ., data = coleman)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
salaryP
fatherWc
sstatus e-05
teacherSc
motherLev
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 14 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 5 and 14 DF, p-value: e-07

22 Coleman
> require(robustbase)
> summary(m.lmrob, setting = "KS2011")
Call:
lmrob(formula = Y ~ ., data = coleman)
 \--> method = "MM"
Residuals:
Min 1Q Median 3Q Max
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
salaryP
fatherWc e-05
sstatus e-11
teacherSc e-08
motherLev
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Robust residual standard error:
Multiple R-squared: , Adjusted R-squared:
Convergence in 11 IRWLS iterations
Robustness weights:
observation 18 is an outlier with |weight| = 0 ( < );
The remaining 19 ones are summarized as
Min. 1st Qu. Median Mean 3rd Qu. Max.

23 Coleman (cont'd)
Algorithmic parameters:
tuning.chi bb tuning.psi refine.tol rel.tol
solve.tol eps.outlier eps.x warn.limit.reject warn.limit.meanrw
nResample max.it best.r.s k.fast.s k.max maxit.scale trace.lev mts compute.rd fast.s.large.n
psi subsampling cov compute.outlier.stats
"bisquare" "nonsingular" ".vcov.avar1" "SM"
seed : int(0)

24 Coleman [Figure: robustness weights, m.lmrob$rweights, plotted against observation index.]

25 Coleman
> summary(m.rlm)
Call:
rlm(formula = Y ~ ., data = coleman)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
Value Std. Error t value
(Intercept)
salaryP
fatherWc
sstatus
teacherSc
motherLev
Residual standard error: on 14 degrees of freedom

26 Breastfeeding UK study on the decision of pregnant women to breastfeed. 135 expectant mothers were asked about their feeding choice (breast = 1 if breastfeeding, trying to breastfeed, or mixed breast- and bottle-feeding; breast = 0 if exclusive bottle-feeding). Covariates: advancement of the pregnancy (pregnancy, end or beginning), how the mothers were fed as babies (howfed, some breastfeeding or only bottle-feeding), how the mothers' friends fed their babies (howfedfriends, some breastfeeding or only bottle-feeding), whether they had a partner (partner, no or yes), age (age), age at which they left full-time education (educat), ethnic group (ethnic, white or non white), and whether they ever smoked (smokebf, no or yes) or stopped smoking (smokenow, no or yes). The first listed level of each factor is used as the reference (coded 0).

27 Breastfeeding The sample characteristics are as follows: out of the 135 observations, 99 were from mothers who had decided at least to try to breastfeed, 54 mothers were at the beginning of their pregnancy, 77 were themselves breastfed as babies, 85 had friends who had breastfed their babies, 114 mothers had a partner, median age was (with minimum equal to 17 and maximum equal to 40), median age at the end of education was 17 (minimum = 14, maximum = 38), 77 mothers were white and 32 mothers were smoking during the pregnancy, whereas 51 had smoked before.

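28 Breastfeeding $\text{breast}_i \sim \text{Bernoulli}(\mu_i)$, so that $E(\text{breast}_i) = \mu_i$ and $\mathrm{Var}(\text{breast}_i) = \mu_i (1 - \mu_i)$ (Binomial family). Use the logit link:
$$\text{logit}(E(\text{breast})) = \text{logit}(P(\text{breast})) = \beta_0 + \beta_1\,\text{pregnancy} + \beta_2\,\text{howfed} + \beta_3\,\text{howfedfr} + \beta_4\,\text{partner} + \beta_5\,\text{ethnic} + \beta_6\,\text{smokebf} + \beta_7\,\text{smokenow} + \beta_8\,\text{age} + \beta_9\,\text{educat},$$
where $\text{logit}(\mu_i) = \log\left( \frac{\mu_i}{1 - \mu_i} \right)$, with $\mu_i/(1 - \mu_i)$ being the odds of a success, and $\mu_i = P(\text{breast})$ is the probability of at least trying to breastfeed.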
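The link function and the odds interpretation can be checked with a few lines of base R. The probabilities and the coefficient value used below are illustrative assumptions, not estimates from the breastfeeding data.

```r
mu <- c(0.1, 0.5, 0.9)            # illustrative values of P(breast = 1)

odds  <- mu / (1 - mu)            # odds of a success
logit <- log(odds)                # logit(mu) = log(mu / (1 - mu))

all.equal(logit, qlogis(mu))      # qlogis() is the logit function in base R
all.equal(plogis(logit), mu)      # plogis() is the inverse link

# A hypothetical coefficient of 0.7 on a binary covariate multiplies
# the odds by exp(0.7), whatever the baseline probability is.
beta_j   <- 0.7
new_odds <- exp(logit + beta_j)
new_odds / odds                   # constant ratio exp(beta_j)
```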

29 Breastfeeding
> require(robustbase)
> breast.glmrobwx = glmrob(decwhat ~ howfedfr + ethnic + educat + age + grp + howfed + partner + smokenow + smokebf, family = binomial, weights.on.x = "hat", tcc = 1.5, data = breast)
> summary(breast.glmrobwx)
Call:
glmrob(formula = decwhat ~ howfedfr + ethnic + educat + age + grp + howfed + partner + smokenow + smokebf, family = binomial, data = breast, weights.on.x = "hat", tcc = 1.5)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)
howfedfrBreast
ethnicnon white
educat
age
grpBeginning
howfedBreast
partnerPartner
smokenowyes
smokebfyes
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Robustness weights w.r * w.x:
Min. 1st Qu. Median Mean 3rd Qu. Max.
Number of observations: 135
Fitted by method 'Mqle' (in 12 iterations)
(Dispersion parameter for binomial family taken to be 1)

30 Breastfeeding (cont'd)
No deviance values available
Algorithmic parameters:
acc tcc
maxit
50
test.acc
"coef"

31 Breastfeeding [Figure: robustness weights w(x) and w(r), each plotted against observation index.]

32 Mailing list and conferences
- Dedicated mailing list:
- ICORS 2017 in Wollongong, Australia:
- ICORS 2016 in Geneva:

33 References
[1] Jean-Jacques Droesbeke, Gilbert Saporta, and Christine Thomas-Agnan. Méthodes robustes en statistique. Editions TECHNIP.
[2] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
[3] S. Heritier, E. Cantoni, S. Copt, and M.-P. Victoria-Feser. Robust Methods in Biostatistics. Wiley-Interscience.
[4] Peter J. Huber. Robust Statistics. Wiley, New York.
[5] Peter J. Huber and Elvezio M. Ronchetti. Robust Statistics. Wiley, New York. Second Edition.
[6] R.A. Maronna, R.D. Martin, and V.J. Yohai. Robust Statistics. Wiley, New York, 2006.


Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102 Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Outlier Robust Nonlinear Mixed Model Estimation

Outlier Robust Nonlinear Mixed Model Estimation Outlier Robust Nonlinear Mixed Model Estimation 1 James D. Williams, 2 Jeffrey B. Birch and 3 Abdel-Salam G. Abdel-Salam 1 Business Analytics, Dow AgroSciences, 9330 Zionsville Rd. Indianapolis, IN 46268,

More information

Maximum Likelihood Conjoint Measurement in R

Maximum Likelihood Conjoint Measurement in R Maximum Likelihood Conjoint Measurement in R Kenneth Knoblauch¹, Blaise Tandeau de Marsac¹, Laurence T. Maloney² 1. Inserm, U846 Stem Cell and Brain Research Institute Dept. Integrative Neurosciences Bron,

More information

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using

More information

Generalized Linear Models. stat 557 Heike Hofmann

Generalized Linear Models. stat 557 Heike Hofmann Generalized Linear Models stat 557 Heike Hofmann Outline Intro to GLM Exponential Family Likelihood Equations GLM for Binomial Response Generalized Linear Models Three components: random, systematic, link

More information

FAMA-FRENCH 1992 REDUX WITH ROBUST STATISTICS

FAMA-FRENCH 1992 REDUX WITH ROBUST STATISTICS FAMA-FRENCH 1992 REDUX WITH ROBUST STATISTICS R. Douglas Martin* Professor Emeritus Applied Mathematics and Statistics University of Washington IAQF, NYC December 11, 2018** *Joint work with Christopher

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

A SHORT COURSE ON ROBUST STATISTICS. David E. Tyler Rutgers The State University of New Jersey. Web-Site dtyler/shortcourse.

A SHORT COURSE ON ROBUST STATISTICS. David E. Tyler Rutgers The State University of New Jersey. Web-Site  dtyler/shortcourse. A SHORT COURSE ON ROBUST STATISTICS David E. Tyler Rutgers The State University of New Jersey Web-Site www.rci.rutgers.edu/ dtyler/shortcourse.pdf References Huber, P.J. (1981). Robust Statistics. Wiley,

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points

Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Ettore Marubini (1), Annalisa Orenti (1) Background: Identification and assessment of outliers, have

More information

Identifying and accounting for outliers and extreme response patterns in latent variable modelling

Identifying and accounting for outliers and extreme response patterns in latent variable modelling Identifying and accounting for outliers and extreme response patterns in latent variable modelling Irini Moustaki Athens University of Economics and Business Outline 1. Define the problem of outliers and

More information

General Regression Model

General Regression Model Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical

More information

Regression Methods for Survey Data

Regression Methods for Survey Data Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear

More information

A Robust Strategy for Joint Data Reconciliation and Parameter Estimation

A Robust Strategy for Joint Data Reconciliation and Parameter Estimation A Robust Strategy for Joint Data Reconciliation and Parameter Estimation Yen Yen Joe 1) 3), David Wang ), Chi Bun Ching 3), Arthur Tay 1), Weng Khuen Ho 1) and Jose Romagnoli ) * 1) Dept. of Electrical

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

arxiv: v4 [math.st] 16 Nov 2015

arxiv: v4 [math.st] 16 Nov 2015 Influence Analysis of Robust Wald-type Tests Abhik Ghosh 1, Abhijit Mandal 2, Nirian Martín 3, Leandro Pardo 3 1 Indian Statistical Institute, Kolkata, India 2 Univesity of Minnesota, Minneapolis, USA

More information

Robustness of location estimators under t- distributions: a literature review

Robustness of location estimators under t- distributions: a literature review IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Robustness of location estimators under t- distributions: a literature review o cite this article: C Sumarni et al 07 IOP Conf.

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

STA102 Class Notes Chapter Logistic Regression

STA102 Class Notes Chapter Logistic Regression STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Projection Estimators for Generalized Linear Models

Projection Estimators for Generalized Linear Models Projection Estimators for Generalized Linear Models Andrea Bergesio and Victor J. Yohai Andrea Bergesio is Teaching Assistant, Universidad Nacional del Litoral, IMAL, Güemes 3450, S3000GLN Santa Fe, Argentina

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Mixed-effects Maximum Likelihood Difference Scaling

Mixed-effects Maximum Likelihood Difference Scaling Mixed-effects Maximum Likelihood Difference Scaling Kenneth Knoblauch Inserm U 846 Stem Cell and Brain Research Institute Dept. Integrative Neurosciences Bron, France Laurence T. Maloney Department of

More information

Regression Analysis By Example

Regression Analysis By Example Regression Analysis By Example Third Edition SAMPRIT CHATTERJEE New York University ALI S. HADI Cornell University BERTRAM PRICE Price Associates, Inc. A Wiley-Interscience Publication JOHN WILEY & SONS,

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models Massimiliano Bratti & Alfonso Miranda In many fields of applied work researchers need to model an

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information