Indian Statistical Institute


Introductory Computer Programming

Robust Regression Methods with High Breakdown Point

Author: Roll No: MD1701

February 24, 2018

Contents

1 Introduction
2 Criteria for evaluating regression estimators
  2.1 Breakdown point
  2.2 The Mean Square Error
  2.3 The Bounded Influence
3 Robust Regression methods
  3.1 Least Trimmed Square estimator
  3.2 M estimator
  3.3 S estimator
4 Algorithms
  4.1 M estimation
  4.2 S estimation
5 Simulation Study
6 Simulation Results
  6.1 Mean of the estimates of coefficients
  6.2 Mean Square Error
  6.3 Absolute Bias
  6.4 Mean Square Error (n = 100)
  6.5 Absolute Bias (n = 100)
7 Comparison of Run times
  7.1 Least Absolute Deviation
  7.2 Huber M estimate
  7.3 Bisquare M estimate
8 Conclusions

1 Introduction

Regression analysis is used to study how a dependent variable is linearly related to a set of explanatory variables. The linear regression model is given by

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_{p-1} x_{i,p-1} + \epsilon_i, \qquad i = 1, 2, \dots, n$$

or, in matrix notation,

$$Y = X\beta + \epsilon,$$

where $Y$ denotes the $n \times 1$ column vector containing the values of the dependent variable, $X$ is an $n \times p$ matrix whose first column consists of 1s and whose $i$-th column (for $i \ge 2$) contains the $(i-1)$-th explanatory variable, $\beta$ is the unknown $p \times 1$ vector of parameters, and $\epsilon$ is the vector of errors.

Regression analysis aims to estimate the unknown vector $\beta$. One of the most commonly used methods is to choose as $\hat\beta$ the value that minimises $e'e$, where $e$ is the vector of residuals, i.e.,

$$\hat\beta = \arg\min_{\beta} \, (Y - X\beta)'(Y - X\beta).$$

To use this estimator, the assumptions on which it is based should be met, the most important being that the errors are normally distributed. A common problem, however, is the presence of outliers in the data, which can be attributed to different sources such as faulty measurement, an incorrect reading, or failure of an instrument. Outliers can in general be classified into two types: outliers in the x direction and outliers in the y direction. Outliers in the x direction (or high leverage points) are observations whose $x_i$ is an outlier in the $p$-dimensional space occupied by the rows of the regression matrix. Outliers in the y direction are observations with a large difference between the $y$ value and the value predicted by the model. In this study we compare various robust regression methods on the basis of their MSE and absolute bias in the case when the data contain outliers in the y direction.
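As a concrete illustration of the estimator above, here is a minimal R sketch computing the least squares estimate directly from the normal equations; the simulated data and variable names are ours for illustration, not taken from the report's code.

```r
## Minimal sketch: least squares via the normal equations (illustrative data).
set.seed(1)
n <- 30
X <- cbind(1, matrix(rnorm(n * 4), n, 4))   # first column of 1s, p = 5
beta <- rep(1, 5)                           # true coefficients all equal to 1
y <- X %*% beta + rnorm(n)

beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # minimises (y - Xb)'(y - Xb)
drop(beta_hat)                              # same as coef(lm(y ~ X[, -1]))
```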

2 Criteria for evaluating regression estimators

2.1 Breakdown point

The breakdown point is a very popular quantitative characteristic of robustness. Consider a random sample $x^0 = (x_1, x_2, \dots, x_n)$ and the corresponding value $T_n(x^0)$ of an estimator of a functional $T$ based on the sample $x^0$. Now suppose that in this original sample we may replace any $m$ of the components by arbitrary values (even infinity is allowed); denote the new sample by $x^m$ and let $T_n(x^m)$ denote the value of the estimator for this new sample. The breakdown point of the estimator $T_n$ for the sample $x^0$ is the number

$$\epsilon^*_n(T_n, x^0) = \frac{m^*(x^0)}{n},$$

where $m^*(x^0)$ is the smallest integer $m$ for which

$$\sup_{x^m} \left\| T_n(x^m) - T_n(x^0) \right\| = \infty.$$

2.2 The Mean Square Error

The mean square error is another performance criterion used to evaluate an estimator. It is defined as

$$\mathrm{MSE} = (\hat\beta_R - \beta)'(\hat\beta_R - \beta),$$

where $\hat\beta_R$ is the vector of robust parameter estimates and $\beta$ is the vector of the true model coefficients.

2.3 The Bounded Influence

Bounded influence in the X space is the estimator's resistance to being pulled toward extreme observations in the X space. Whether an estimator has bounded influence is determined by studying its influence function. Define the influence function $\mathrm{IF}_{T,F}(\cdot)$ of the estimator $T$, at the underlying probability distribution $F$, by

$$\mathrm{IF}_{T,F}(x_0) = \lim_{\epsilon \to 0} \frac{T(\tilde F) - T(F)}{\epsilon} = \left[ \frac{\partial\, T(\tilde F)}{\partial \epsilon} \right]_{\epsilon = 0}.$$
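The breakdown point can be made concrete with a small R experiment (ours, for illustration): replacing a single observation by an arbitrarily large value already carries the sample mean away, while the median resists contamination of almost half the sample. The mse() helper simply mirrors the MSE definition above.

```r
## Empirical breakdown of the sample mean vs. the median (illustrative).
set.seed(2)
x0 <- rnorm(20)
x_bad <- x0
x_bad[1] <- 1e12                # replace m = 1 component by an arbitrary value
c(mean(x0), mean(x_bad))        # the mean explodes: breakdown point 1/n
c(median(x0), median(x_bad))    # the median barely moves: ~50% breakdown

mse <- function(beta_hat, beta_true)   # the MSE criterion of Section 2.2
  sum((beta_hat - beta_true)^2)
```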

Here $T(F)$ denotes the estimator of interest expressed as a functional. The functional $T(\tilde F)$ represents the estimator of interest under the altered CDF $\tilde F = (1-\epsilon)F + \epsilon\,\delta_{x_0}$, which places mass $\epsilon$ at the point $x_0$. Thus the influence function is a first-order derivative of the estimator viewed as a functional, and measures the influence of the point $x_0$ on the estimator $T$.

3 Robust Regression methods

3.1 Least Trimmed Square estimator

The least trimmed squares estimator is defined as

$$\hat\beta_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} (e^2)_{(i)},$$

where $(e^2)_{(j)}$ denotes the $j$-th smallest squared residual. The best robustness properties are achieved when $h = n/2$, in which case the breakdown point equals 50%, the highest possible breakdown point.

3.2 M estimator

The M estimator of $\beta$ is defined as the solution $M_n$ of the minimisation

$$\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho(y_i - x_i'\beta),$$

where $\rho$ is continuous. Here the residuals should also be scaled, but we have merged the scaling factor into the tuning constant of the $\rho$ function. A reasonable $\rho$ function should satisfy the following properties:

- Non-negativity: $\rho(e) \ge 0$ for all $e \in \mathbb{R}$.
- $\rho(0) = 0$.
- Symmetry: $\rho(e) = \rho(-e)$.
- Monotonicity: $\rho(e) \ge \rho(e')$ if $|e| > |e'|$.

Let $\psi$ denote the derivative of $\rho$, i.e. $\psi = \rho'$, define the weight function $w(e) = \psi(e)/e$, and set $w_i = w(e_i)$.
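For a quick LTS fit in practice one can use ltsReg() from the robustbase package (the same package used later in this report for run-time comparisons); a sketch with our own simulated data, where alpha = 0.5 corresponds to $h \approx n/2$ and hence the maximal ~50% breakdown point:

```r
## Illustrative LTS fit via robustbase::ltsReg (not the report's own code).
library(robustbase)
set.seed(3)
n <- 30
X <- matrix(rnorm(n * 4), n, 4)
y <- 1 + rowSums(X) + rnorm(n)
y[1:6] <- y[1:6] + 15                            # plant some y-outliers
d <- data.frame(X, y)
fit_lts <- ltsReg(y ~ ., data = d, alpha = 0.5)  # alpha = 0.5: h ~ n/2
coef(fit_lts)
```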

Differentiating the objective above and setting the derivative equal to 0, we get a set of $p$ estimating equations; substituting $w_i = w(e_i)$ gives

$$\sum_{i=1}^{n} \psi(y_i - x_i'b)\, x_i = 0 \qquad (1)$$

$$\sum_{i=1}^{n} w_i (y_i - x_i'b)\, x_i = 0 \qquad (2)$$

Solving these equations is equivalent to solving a weighted least squares problem; however, the weights depend on the residuals themselves. We shall use an iterative algorithm (IRLS) to solve it, which is described later in detail.

The choice of the $\rho$ function determines the estimate of $\beta$. We will investigate the following $\rho$ functions:

1. Huber

$$\rho_H(e) = \begin{cases} e^2/2 & \text{for } |e| \le k \\ k|e| - k^2/2 & \text{otherwise} \end{cases} \qquad
w_H(e) = \begin{cases} 1 & \text{for } |e| \le k \\ k/|e| & \text{otherwise} \end{cases}$$

2. Bisquare

$$\rho_B(e) = \begin{cases} \dfrac{k^2}{6}\left\{1 - \left[1 - (e/k)^2\right]^3\right\} & \text{for } |e| \le k \\ k^2/6 & \text{otherwise} \end{cases} \qquad
w_B(e) = \begin{cases} \left[1 - (e/k)^2\right]^2 & \text{for } |e| \le k \\ 0 & \text{otherwise} \end{cases}$$

3. Least Squares

$$\rho_{LS}(e) = e^2, \qquad w_{LS}(e) = 1$$
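Written out in R, the $\rho$ and weight functions above are short one-liners (a sketch; the function names are ours, and $k$ is the tuning constant discussed in the note below):

```r
rho_huber <- function(e, k) ifelse(abs(e) <= k, e^2 / 2, k * abs(e) - k^2 / 2)
w_huber   <- function(e, k) ifelse(abs(e) <= k, 1, k / abs(e))

rho_bisq  <- function(e, k)
  ifelse(abs(e) <= k, (k^2 / 6) * (1 - (1 - (e / k)^2)^3), k^2 / 6)
w_bisq    <- function(e, k) ifelse(abs(e) <= k, (1 - (e / k)^2)^2, 0)

## Huber weights are bounded below by k/|e| and never reach 0; bisquare
## weights redescend to exactly 0, so gross outliers are ignored entirely.
curve(w_huber(x, 1.345), -6, 6, ylab = "w(e)", xlab = "e")
curve(w_bisq(x, 4.685), -6, 6, add = TRUE, lty = 2)
```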

4. Least Absolute Deviation

$$\rho_{LAD}(e) = |e|, \qquad w_{LAD}(e) = 1/|e|$$

Note that the least squares method is a special case of M estimation, with $\rho_{LS}(e) = e^2$.

NOTE: The value of $k$ for the Huber and bisquare estimators is called the tuning constant. Smaller values of $k$ produce more resistance to outliers, but at the expense of lower efficiency when the errors are normally distributed. The tuning constant is chosen to give reasonably high efficiency in the normal case. We have chosen $k = 1.345\,\hat\sigma$ for the Huber estimate and $k = 4.685\,\hat\sigma$ for the bisquare estimator, which provides an efficiency close to 95% in the case of normal errors.

3.3 S estimator

The goal of the S estimator is to have a simple high breakdown point estimator which shares the flexibility and nice asymptotic properties of the M estimator. The name S estimator was chosen because these estimators are based on an estimate of scale. The weakness of M estimation is its lack of consideration of the data distribution: it is not a function of the overall data, because only the median is used as the weighting value. The S estimator instead uses the residual standard deviation to overcome this weakness of the median.

Huber (1964) defined the M-scale estimate for $e = (e_1, \dots, e_n)$ as

$$s_M(e) = \inf\left\{ s > 0 : \frac{1}{n} \sum_{i=1}^{n} \rho\!\left(\frac{e_i}{s}\right) \le b \right\}.$$

To guarantee consistency when the data are normally distributed, the constant $b$ is usually chosen to be $E_\Phi(\rho(u))$, where $\Phi$ denotes the standard normal distribution. When $\rho$ is continuous, equality is achieved at $s_M(e)$.

The regression estimates associated with M-scales are the S estimators; in particular they satisfy

$$\hat\beta_n = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho\!\left(\frac{e_i(\beta)}{\hat s}\right), \qquad \text{where } \hat s = s_M(r(\hat\beta_n)).$$
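The M-scale $s_M(e)$ can be computed by one-dimensional root finding; below is a minimal R sketch (ours). We assume the bisquare $\rho$ with cut-off $c = 1.547$ and $b = 0.5\,\rho(\infty)$, the standard choice that gives consistency at the normal model together with a 50% breakdown point.

```r
rho_bisq <- function(u, c = 1.547)
  ifelse(abs(u) <= c, (c^2 / 6) * (1 - (1 - (u / c)^2)^3), c^2 / 6)

## Smallest s with mean(rho(e/s)) <= b; for continuous rho, equality holds.
m_scale <- function(e, c = 1.547, b = 0.5 * c^2 / 6) {
  f <- function(s) mean(rho_bisq(e / s, c)) - b
  uniroot(f, lower = 1e-8, upper = 10 * max(abs(e)))$root
}

set.seed(4)
m_scale(rnorm(1000))   # close to 1 for standard normal residuals
```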

4 Algorithms

4.1 M estimation

1. Obtain an initial estimate $\beta^{(0)}$, such as the least squares estimate.
2. At each iteration $t$, calculate the residuals $e_i^{(t-1)}$ and the associated weights $w_i^{(t-1)} = w[e_i^{(t-1)}]$.
3. Solve for the new weighted least squares estimate
$$b^{(t)} = \left[ X' W^{(t-1)} X \right]^{-1} X' W^{(t-1)} y,$$
where $W^{(t-1)} = \mathrm{diag}(w_i^{(t-1)})$.
4. Repeat steps 2 and 3 until the estimated coefficients converge.

This method is called the Iteratively Reweighted Least Squares (IRLS) method; a compact implementation sketch is given after this section.

4.2 S estimation

1. Obtain an initial estimate $\beta^{(0)}$ using the method of least squares.
2. Calculate the residuals $e_i = y_i - \hat y_i$.
3. Calculate the scale estimate
$$\hat\sigma = \begin{cases} \dfrac{\operatorname{median}_i |e_i - \operatorname{median}(e_i)|}{0.6745} & \text{iteration } = 1 \\[2mm] \sqrt{\dfrac{\sum_{i=1}^{n} w_i e_i^2}{nK}} & \text{iteration } > 1 \end{cases}$$
where $K$ is a consistency constant.
4. Calculate $u_i = e_i / \hat\sigma$.
5. Calculate the weights. If the iteration number is 1,
$$w_i = \begin{cases} \left[1 - \left(\dfrac{u_i}{c}\right)^2\right]^2 & |u_i| \le c \\ 0 & \text{otherwise} \end{cases}$$
with $c = 1.547$; if the iteration number is greater than 1,
$$w_i = \frac{\rho(u_i)}{u_i^2}.$$
6. Calculate $\hat\beta_S$ by the WLS method with weights $w_i$.
7. Repeat steps 2-6 to obtain a convergent value of $\hat\beta_S$.
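The following is a compact R sketch of the IRLS algorithm of Section 4.1 (our code). One detail is an assumption on our part: the residuals are standardised by the MAD at each iteration, a common choice, whereas the report merges the scale into the tuning constant. w_fun stands for any of the weight functions of Section 3.2 (e.g. w_huber above).

```r
## IRLS for M estimation: step numbers refer to Section 4.1.
irls <- function(X, y, w_fun, k, tol = 1e-8, max_iter = 100) {
  b <- solve(t(X) %*% X, t(X) %*% y)            # step 1: least squares start
  for (it in seq_len(max_iter)) {
    e <- drop(y - X %*% b)                      # step 2: residuals ...
    s <- median(abs(e - median(e))) / 0.6745    #   ... a MAD residual scale
    w <- w_fun(e, k * s)                        #   ... and weights
    b_new <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # step 3: WLS
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break                        # step 4: stop at convergence
  }
  drop(b)
}

## e.g. a Huber M fit with k = 1.345 * sigma_hat:
## irls(cbind(1, X), y, w_huber, 1.345)
```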

5 Simulation Study

A simulation study is conducted to compare different methods of estimation. These methods are:

1. The ordinary least squares estimator (OLS).
2. The least median squares estimator (LMS).
3. The least trimmed squares estimator (LTS).
4. The S estimator.
5. The least absolute deviation estimator (LAD).
6. The Huber M estimator (M_huber) with k = 1.345.
7. Tukey's M estimator (M_tukey) with k = 4.685.

The criteria used for comparison of the regression estimates are the Mean Square Error (MSE) and the Absolute Bias (AB). The data are generated according to the model

$$Y = 1 + X_1 + X_2 + X_3 + X_4 + e,$$

and hence the true values of the coefficients are all equal to 1. The data simulation is repeated 5000 times to obtain 5000 independent samples of X and Y of a given size n. This process is done for samples of size n = 30 and n = 100 (the least median squares estimator is only implemented for the n = 30 case). In order to cover the effects of various situations on the regression coefficients, five scenarios for the density of the error term have been used (a sketch of the data-generating step follows the list). These are as follows:

Scenario I: e ~ N(0,1), the standard normal distribution.
Scenario II: e ~ t distribution with 1 degree of freedom, i.e. the Cauchy distribution.
Scenario III: e ~ t distribution with 5 degrees of freedom.
Scenario IV: e ~ N(0,1) with 25% outliers in the y direction from an N(0,10) distribution.
Scenario V: e ~ N(0,1) with 40% outliers in the y direction from an N(0,10) distribution.
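A sketch of the data-generating step for one replication is below (ours; the exact contamination mechanics are assumptions, e.g. we read N(0,10) as variance 10 and draw the outlier indicator independently per observation):

```r
## One replication of the simulated data under a given error scenario.
gen_data <- function(n, scenario) {
  X <- matrix(rnorm(n * 4), n, 4)
  e <- switch(scenario,
    I   = rnorm(n),                          # standard normal
    II  = rt(n, df = 1),                     # Cauchy
    III = rt(n, df = 5),
    IV  = ifelse(runif(n) < 0.25,            # 25% y-outliers ...
                 rnorm(n, sd = sqrt(10)),    # ... N(0,10) read as variance 10
                 rnorm(n)),
    V   = ifelse(runif(n) < 0.40,            # 40% y-outliers
                 rnorm(n, sd = sqrt(10)),
                 rnorm(n)))
  list(X = X, y = 1 + rowSums(X) + e)
}
d <- gen_data(30, "IV")   # the study repeats this 5000 times per scenario
```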

6 Simulation Results

Simulation results for the case when n = 30.

6.1 Mean of the estimates of coefficients

[Tables: mean of the estimated coefficients $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$ under Scenarios I-V, for the LTS, OLS, M Bisquare, M Huber, LAD and S estimators. The numeric entries of these tables are not recoverable from this copy.]

6.2 Mean Square Error

[Tables: MSE under Scenarios I-V for the LTS, OLS, M Bisquare, M Huber, LAD and S estimators, n = 30. Numeric entries not recoverable.]

6.3 Absolute Bias

[Tables: absolute bias under Scenarios I-V for the same six estimators, n = 30. Numeric entries not recoverable.]

Case when n = 100.

6.4 Mean Square Error

[Tables: MSE under Scenarios I-V for the same six estimators, n = 100. Numeric entries not recoverable.]

6.5 Absolute Bias

[Tables: absolute bias under Scenarios I-V for the same six estimators, n = 100. Numeric entries not recoverable.]
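For concreteness, here is a sketch of how one cell of these tables is produced: the estimates over 5000 replications, then their mean, MSE and absolute bias. It is our code, not the report's; gen_data() is the sketch from Section 5, OLS stands in for any of the estimators, and whether AB is reported per coefficient is an assumption.

```r
## Mean, MSE and absolute bias of an estimator over 5000 replications.
n_rep <- 5000
beta_true <- rep(1, 5)
est <- t(replicate(n_rep, {
  d <- gen_data(30, "I")
  Xd <- cbind(1, d$X)
  drop(solve(t(Xd) %*% Xd, t(Xd) %*% d$y))   # OLS; swap in any estimator
}))

colMeans(est)                      # mean of the estimated coefficients
dev <- sweep(est, 2, beta_true)    # deviations from the true coefficients
mean(rowSums(dev^2))               # MSE as defined in Section 2.2
colMeans(abs(dev))                 # absolute bias, per coefficient
```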

7 Comparison of Run times

We carried out a study to compare the running times of two different implementations of the same methods. For the second implementation we used functions from the package robustbase.

7.1 Least Absolute Deviation

The function lmrob.lar() in the package uses the simplex method to solve the required minimisation, while we have used IRLS, which is an iterative algorithm.

[Figure: run time in seconds of the two LAD implementations, plotted against sample index.]
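A sketch of how such a timing curve can be produced is below (our code, not the report's). We assume lmrob.lar(x, y) can be called directly with a design matrix and response, as documented in robustbase; replacing that call with one's own IRLS-based LAD function gives the second curve.

```r
## Timing the simplex-based LAD of robustbase across sample sizes.
library(robustbase)
set.seed(5)
sizes <- seq(1000, 10000, by = 1000)
times <- sapply(sizes, function(n) {
  X <- cbind(1, matrix(rnorm(n * 4), n, 4))
  y <- drop(X %*% rep(1, 5) + rnorm(n))
  system.time(lmrob.lar(X, y))["elapsed"]
})
plot(sizes / 1000, times, type = "b",
     xlab = "Size of sample in thousands", ylab = "Time (seconds)")
```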

7.2 Huber M estimate

A comparison of the run times required by the two implementations was carried out. We used the function rlm() of the MASS package. From the graph it is clearly evident that rlm() performs significantly better than our m_huber() function.

[Figure: run time in seconds against sample size in thousands, for rlm() and m_huber().]
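For reference, one timing point with rlm() and the Huber psi looks roughly as follows (a sketch; the data are ours, and k = 1.345 is the tuning constant from Section 3.2, which is also psi.huber's default):

```r
## One timing point for MASS::rlm with the Huber psi.
library(MASS)
set.seed(6)
n <- 5000
X <- matrix(rnorm(n * 4), n, 4)
y <- 1 + rowSums(X) + rnorm(n)
system.time(fit <- rlm(y ~ X, psi = psi.huber, k = 1.345))
coef(fit)
```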

7.3 Bisquare M estimate

The function rlm() also gives the bisquare M estimate upon changing the psi function in its arguments.

[Figure: run time in seconds against sample size in thousands, for the two bisquare M implementations.]
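The corresponding call only swaps the psi argument (a sketch with our own data; c = 4.685 is the tuning constant from Section 3.2 and psi.bisquare's default):

```r
## MASS::rlm with the redescending bisquare psi; rlm initialises from
## least squares by default, which a redescending psi relies on.
library(MASS)
set.seed(7)
n <- 5000
X <- matrix(rnorm(n * 4), n, 4)
y <- 1 + rowSums(X) + rnorm(n)
fit_bisq <- rlm(y ~ X, psi = psi.bisquare, c = 4.685)
coef(fit_bisq)
```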

8 Conclusions

Scenario I: The OLS estimate obtains the best performance. The M estimates and the S estimates perform better than the high breakdown point estimate (i.e., LTS). As expected, the MSE is lower when the sample size is 100 than when it is 30; the same holds for the absolute bias.

Scenario II: The ordinary least squares method performs worst, with biased estimates and very high MSE. Even the LAD and Huber M estimates perform very poorly. The MSE of the S estimate is better than that of the others, but still very poor. The LTS estimate has the best performance. In terms of absolute bias, the LTS, bisquare M and S estimates have the lowest bias.

Scenario III: In this case the Huber M and bisquare M estimates have the best performance in terms of both MSE and absolute bias. OLS performs better than the LTS and LAD estimates.

Scenario IV: In terms of MSE, LAD and OLS have the worst performance, both for n = 30 and for n = 100. On the other hand, the S estimates have the best performance, followed by the LTS and M estimates. In terms of absolute bias, the LTS and S estimates have the lowest bias.

Scenario V: Again the worst performance is by OLS and then LAD, whereas the best performance is by the S and LTS estimates, followed by the M estimates. In terms of absolute bias, the LTS and S estimates have the lowest bias.

Final Verdict: Except for the case when the errors have a standard normal distribution, OLS performs the worst, suggesting the need for alternative, more robust methods. LTS and S perform the best in all of the other four scenarios. The S estimates perform better than the M estimates because the M estimates do not account for the scaling factor, which the S estimates do.
