Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Size: px
Start display at page:

Download "Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'"

Transcription

1 Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression

2 Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where f(x) is the systematic part (that can be predicted using the inputs X) and ε is a disturbance (error) term, accounting for the all the variation sources dierent from X. Assumptions: Linearity. The regression function is linear in the inputs: f(x) = β 0 + p β j X j Weak exogeneity: E(ε X) = 0, i.e. X and ε are independent in the mean. j=1 Homoscedasticity: E(ε 2 X) = Var(ε X) = σ 2.

3 The inputs set X includes quantititative variables, nonlinear transformations and basis expansions (to be dened later), as well as dummy variables coding the levels of qualitative inputs. Under these assumptions the regression function is interpreted as the conditional mean of Y given X f(x) = E(Y X). This is the optimal predictor of Y under square loss (minimum mean square estimator of Y ).

4 Estimation Set of training data {(y i, x i ), i = 1,..., N}, with x i = (x i1, x i2,..., x ip ). We assume that the training sample is a sample of size N, drawn independently (with replacement) from the underlying population, so that for the i-th individual we can write p y i = β 0 + β j x ij + ε i, j=1 Writing the N equations in matrix form: y = Xβ + ε i = 1,..., N with E(ε X) = 0 and Var(ε X) = σ 2 I. y 1 1 x 11 x x 1k... x 1p y 2 1 x 21 x x 2k... x 2p y =. y i, X = x i1 x i2... x ik... x ip, ε = y N 1 x N1 x N2... x Nk... x Np β = [β 0, β 1,..., β p] ε 1 ε 2. ε i. ε N

5 We aim at estimating the unknown parameters β 0,..., β p, and σ 2. An important estimation method is Ordinary Least Squares (OLS). OLS obtains an estimate of β denoted ˆβ, as the minimizers of the residual sum of squares RSS(β) = (y Xβ) (y Xβ). There is a closed form solution to the above problem. Computing the partial derivatives of RSS(β) w.r.t. each β j and equating to zero yields a p + 1 linear system of equations in p + 1 unknowns, X Xβ = X y which is solved w.r.t. β. The solution is unique provided that the X's are not collinear. In matrix notation the solution is ˆβ = (X X) 1 X y

6 Example: p = 1 (simple linear regression) Nβ 0 + β 1 i x i = i y i β 0 i x i + β 1 i x2 i = i x iy i We solve this system by substitution: solving wrt β 0 the 1st eqn. yields ˆβ 0 = ȳ ˆβ 1 x Replacing into the 2nd and solving wrt ˆβ 1 gives: 1 N i ˆβ 1 = x iy i xȳ Cov(x, y) i = = (x i x)(y i ȳ) x 2 i x 2 Var(x) i (x i x) 2 1 N

7 Predicted values and residuals Predicted (tted) values ŷ = Xˆβ = X(X X) 1 X y = Hy H = X(X X) 1 X is the hat matrix, projecting y orthogonally onto the vector space generated by X. Residuals The OLS residuals are where M = I H. e = y ŷ = (I H)y = My

8 Properties of ŷ and e The tted values and the LS residuals have properties that are simple to show and are good to know. Orthogonality. X e = 0. The residuals are uncorrelated with each of the variable in X. In the model includes the intercept, the residuals have zero mean. ŷ e = 0 RSS(ˆβ) = e e = i e2 i (henceforth RSS) From y = ŷ + e and the previous properties, y 2 = ŷ 2 + e 2 ȳ = N 1 i y = ŷ (average of predicted values)

9 Estimation of σ 2 For estimating σ 2 (the variance of the error term) we could use the variance of the residuals corrected for the number of degrees of freedom: ˆσ 2 i = e2 i N p 1 N p 1 is known as the number of degrees of freedom. ˆσ is the standard error of regression (SER).

10 Goodness of t Dene T SS = i (y i ȳ) 2 (Total sum of squares - Deviance of y i ) ESS = i (ŷ i ȳ) 2 (Explained sum of squares - Deviance of ŷ i ) RSS = i e2 i (Residual sum of squares - Deviance of e i) It is easy to prove that A relative measure of g.o.f. is T SS = ESS + RSS R 2 = ESS T SS = 1 RSS T SS

11 This measures suers from a serious drawback when used for model selection: the inclusion of (possibly irrelevant) additional regressors always produces an increase in R 2. Adjusted R 2 R 2 = 1 RSS/(N p 1) T SS/(N 1) = 1 N 1 N p 1 (1 R2 )

12 Properties of the OLS estimators We are going to look at the statistical properties of the OLS estimators (and other related quantities, such as the tted values and the residuals), hypothesizing that we can draw repeated training samples from the same population, keeping the X's xed and drawing dierent Y 's. A distribution of outcomes will arise. Under the stated assumptions, E(ˆβ X) = β, Var(ˆβ) = σ 2 (X X) 1 Also, E(ˆσ 2 ) = σ 2. Moreover, the OLS estimators are Best Linear Unbiased (Gauss Markov theorem): E( ˆβ j ) = β j, j and Var( ˆβ j ) = MSE( ˆβ j ) is the smallest among the class of all linear unbiased estimators. Example: p = 1 (simple regression model) [ Var( ˆβ 1 0 ) = σ 2 N + x 2 ] i (x i x) 2, Var( ˆβ σ 2 1 ) = i (x i x) 2,

13 If we further assume ɛ i x i NID(0, σ 2 ), y i x i N(β 0 + β 1 x i, σ 2 ) The OLS estimators have a normal distribution: ˆβ N(β, σ 2 (X X) 1 ). ˆβ is the maximum likelihood estimator of the coecients (we will say more about it). Letting s.e.( ˆβ j ) denote the estimated standard error of the OLS estimator, i.e. the j-th element of Var(ˆβ) ˆ = ˆσ 2 (X X) 1, t j = ˆβ j β j s.e.( ˆβ j ) t N p 1, a Student's t r.v. with N p 1 d.o.f. This result is used to test hypotheses on a single coecient and to produce interval estimates.

14 Suppose we wish to test for the signicance of subset of the coecients. The relevant statistic for the null that J coecients are all equal to zero is F = (RSS 0 RSS)/J RSS/(N p 1) F J,N p 1 where RSS 0 is the residual sum of squares of the restricted model (i.e. it has J less explanatory variables), and F J,N K is Fisher's distribution with J and N p 1 d.o.f. As a special case, the test for H 0 : β 1 = β 2 = = β p = 0 (all the regression coecients are zero, except for the intercept) is: F = ESS/p RSS/(N p 1) = R 2 /p (1 R 2 )/(N p 1)

15 Example: clothing data. Regression of total sales (logs) on hours worked and store size (logs). Call: lm(formula = y ~ x1 + x2, data = clothing) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** x < 2e-16 *** x e-12 *** --- Residual standard error: on 397 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 2 and 397 DF, p-value: < 2.2e-16

16 Figure : Fitted values log(hoursw) log(ssize) log(tsales)

17 Example: house price regression in R lm(formula = price ~ sqft + Age + Pool + Bedrooms + Pool + Firepl Waterfront + DOM) Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) sqft < 2e-16 *** Age e-06 *** Pool Bedrooms e-06 *** Fireplace Waterfront e-11 *** DOM Residual standard error: on 1072 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 7 and 1072 DF, p-value: < 2.2e-16

18 Figure : Histogram of residuals Frequency e+05 0e+00 2e+05 4e+05 6e+05 8e+05 e

19 Properties of the OLS residuals The i-th residual e i has zero expectation (under the model's assumptions) and Var(e i ) = σ 2 (1 h i ), where h i = x i(x X) 1 x i is the leverage of observation i (the i-th diagonal element of H). h i is an important diagnostic quantity. It is a measure of remoteness of the inputs for the i-th individual in the space of the inputs. The barplot of h i versus i illustrates the inuence of the i-th observation on the t. It is possible to show that i h i = p + 1 and the average is (p + 1)/N. The standardized residual is dened as r i = e i ˆσ 1 h i and is used in regression diagnostics (normal probability plot, check for homoscedasticity).

20 Figure : Anscombe Quartet: 4 datasets giving the same set tted values

21 Figure : Anscombe Quartet: Leverage plot (h i vs i)

22 Specication issues: omitted variable bias, collinearity, etc. Consider the regression model y = Xβ + γw + ε, where w is a vector of N measurements on an added variable W. We can show that the OLS estimator of γ is and ˆγ = w My w Mw, M = I X(X X) 1 X, Var(ˆγ) = σ2 w Mw. These expressions have a nice interpretation: ˆγ results from Regress Y on X and obtain the residuals e Y X = My; Regress W on X and obtain the residuals e W X = Mw; Regress e Y X on e W X to obtain the LS estimate ˆγ (partial regr. coe.).

23 Multicollinearity Mw are the residuals of the regression of W on X. w Mw is the residual sum of squares from that regression. If X explains most of the variation in W, this is very small, and Var(ˆγ) will be very high. We can conclude that the eect that W exerts on Y is poorly (i.e. very imprecisely) estimated.

24 Omission of relevant variables If W is omitted from the regression, the OLS estimator ˆβ is biased, but also more precise: E(ˆβ) = β + (X X) 1 X wγ, Var(ˆβ) = σ 2 (X X) 1 σ 2 (X M W X) 1 ; M W = I w(w w) 1 w, where σ 2 (X M W X) 1 is the variance of the estimator of β when W is included in the model. If W is irrelevant (true γ = 0) there is only a cost associated to including it in the model. In any case, if W is orthogonal to X the distribution of ˆβ is not aected by W.

25 Prediction of a new sample observation We are interested in predicting Y for a new unit with values of X in the vector x o = (1, x o1,..., x op ). The optimal predictor under square loss is ŷ o = ˆf(x o ) = ˆβ 0 + ˆβ 1 x o1 + + ˆβ p x op = x o ˆβ Under the stated assumptions, the predictor is unbiased, i.e. the prediction error has zero mean, E(Y o ŷ o ) = 0, and Var(Y o ŷ o ) = σ 2 (1 + x o(x X) 1 x o ) Note: the stated assumptions concern the selection of the set of predictors in X and the properties of ε.

26 Predictive accuracy depends on 1. Ability to select all the relevant variable. Failure to do so will result in an inated σ 2 and E(ε X) 0, so that ˆβ is no longer optimal. 2. How removed is the unit prole x o from the rest: x o(x X) 1 x o measures the (Mahalanobis) distance of the values of the X 1,..., X p for the o-th unit in the space of the X's.

27 If the true model is y = Xβ + γw + ɛ, E(ɛ X, w) = 0, Var(ɛ X, w) = σ 2 ɛ I, the above predictor is biased and its mean square error is with E[(Y 0 ˆf(x o )) 2 X, w, x o, w o ] = σ 2 ɛ (1 + x o(x X) 1 x o ) + Bias 2, Bias = (β E(ˆβ)) x o + γw o = (w o w X(X X) 1 x o )γ The term in parenthesis is the error of prediction of w o using the information in x o.

BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1

BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1 BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013)

More information

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Model Evaluation and Selection Predictive Ability of a Model: Denition and Estimation We aim at achieving a balance between parsimony

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Multivariate Regression Analysis

Multivariate Regression Analysis Matrices and vectors The model from the sample is: Y = Xβ +u with n individuals, l response variable, k regressors Y is a n 1 vector or a n l matrix with the notation Y T = (y 1,y 2,...,y n ) 1 x 11 x

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Analisi Statistica per le Imprese

Analisi Statistica per le Imprese , Analisi Statistica per le Imprese Dip. di Economia Politica e Statistica 4.3. 1 / 33 You should be able to:, Underst model building using multiple regression analysis Apply multiple regression analysis

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Linear Regression Models

Linear Regression Models Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Bias and variance (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 49 Our plan today We saw in last lecture that model scoring methods seem to be trading off two different

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

9. Least squares data fitting

9. Least squares data fitting L. Vandenberghe EE133A (Spring 2017) 9. Least squares data fitting model fitting regression linear-in-parameters models time series examples validation least squares classification statistics interpretation

More information

Advanced Quantitative Methods: ordinary least squares

Advanced Quantitative Methods: ordinary least squares Advanced Quantitative Methods: Ordinary Least Squares University College Dublin 31 January 2012 1 2 3 4 5 Terminology y is the dependent variable referred to also (by Greene) as a regressand X are the

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Chapter 3: Multiple Regression. August 14, 2018

Chapter 3: Multiple Regression. August 14, 2018 Chapter 3: Multiple Regression August 14, 2018 1 The multiple linear regression model The model y = β 0 +β 1 x 1 + +β k x k +ǫ (1) is called a multiple linear regression model with k regressors. The parametersβ

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Quantitative Methods I: Regression diagnostics

Quantitative Methods I: Regression diagnostics Quantitative Methods I: Regression University College Dublin 10 December 2014 1 Assumptions and errors 2 3 4 Outline Assumptions and errors 1 Assumptions and errors 2 3 4 Assumptions: specification Linear

More information

The general linear regression with k explanatory variables is just an extension of the simple regression as follows

The general linear regression with k explanatory variables is just an extension of the simple regression as follows 3. Multiple Regression Analysis The general linear regression with k explanatory variables is just an extension of the simple regression as follows (1) y i = β 0 + β 1 x i1 + + β k x ik + u i. Because

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

Lecture 4: Regression Analysis

Lecture 4: Regression Analysis Lecture 4: Regression Analysis 1 Regression Regression is a multivariate analysis, i.e., we are interested in relationship between several variables. For corporate audience, it is sufficient to show correlation.

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Multiple Regression Analysis

Multiple Regression Analysis Chapter 4 Multiple Regression Analysis The simple linear regression covered in Chapter 2 can be generalized to include more than one variable. Multiple regression analysis is an extension of the simple

More information

Lecture 11: Regression Methods I (Linear Regression)

Lecture 11: Regression Methods I (Linear Regression) Lecture 11: Regression Methods I (Linear Regression) Fall, 2017 1 / 40 Outline Linear Model Introduction 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) The Simple Linear Regression Model based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #2 The Simple

More information

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Ref.:   Spring SOS3003 Applied data analysis for social science Lecture note SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

11 Hypothesis Testing

11 Hypothesis Testing 28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes

More information

Lecture 11: Regression Methods I (Linear Regression)

Lecture 11: Regression Methods I (Linear Regression) Lecture 11: Regression Methods I (Linear Regression) 1 / 43 Outline 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear Regression Ordinary Least Squares Statistical

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Lecture 1: OLS derivations and inference

Lecture 1: OLS derivations and inference Lecture 1: OLS derivations and inference Econometric Methods Warsaw School of Economics (1) OLS 1 / 43 Outline 1 Introduction Course information Econometrics: a reminder Preliminary data exploration 2

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017 Summary of Part II Key Concepts & Formulas Christopher Ting November 11, 2017 christopherting@smu.edu.sg http://www.mysmu.edu/faculty/christophert/ Christopher Ting 1 of 16 Why Regression Analysis? Understand

More information

The Aitken Model. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 41

The Aitken Model. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 41 The Aitken Model Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 41 The Aitken Model (AM): Suppose where y = Xβ + ε, E(ε) = 0 and Var(ε) = σ 2 V for some σ 2 > 0 and some known

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X =

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X = The Gauss-Markov Linear Model y Xβ + ɛ y is an n random vector of responses X is an n p matrix of constants with columns corresponding to explanatory variables X is sometimes referred to as the design

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

The Multiple Regression Model Estimation

The Multiple Regression Model Estimation Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Econometrics I. Introduction to Regression Analysis (Outline of the course) Marijus Radavi ius

Econometrics I. Introduction to Regression Analysis (Outline of the course) Marijus Radavi ius Econometrics I Introduction to Regression Analysis (Outline of the course) Marijus Radavi ius Vilnius 2012 Content 1 Introduction 2 1.1 Motivation................................. 2 1.2 Terminology................................

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Steps in Regression Analysis

Steps in Regression Analysis MGMG 522 : Session #2 Learning to Use Regression Analysis & The Classical Model (Ch. 3 & 4) 2-1 Steps in Regression Analysis 1. Review the literature and develop the theoretical model 2. Specify the model:

More information

ECON 4230 Intermediate Econometric Theory Exam

ECON 4230 Intermediate Econometric Theory Exam ECON 4230 Intermediate Econometric Theory Exam Multiple Choice (20 pts). Circle the best answer. 1. The Classical assumption of mean zero errors is satisfied if the regression model a) is linear in the

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

MATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests

MATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests 1/16 MATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests Dominique Guillot Departments of Mathematical Sciences University of Delaware February 17, 2016 Statistical

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Statistics - Lecture Three. Linear Models. Charlotte Wickham   1. Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Lecture 1 Intro to Spatial and Temporal Data

Lecture 1 Intro to Spatial and Temporal Data Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1

More information

Chapter 14. Linear least squares

Chapter 14. Linear least squares Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

L2: Two-variable regression model

L2: Two-variable regression model L2: Two-variable regression model Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: September 4, 2014 What we have learned last time...

More information

x 21 x 22 x 23 f X 1 X 2 X 3 ε

x 21 x 22 x 23 f X 1 X 2 X 3 ε Chapter 2 Estimation 2.1 Example Let s start with an example. Suppose that Y is the fuel consumption of a particular model of car in m.p.g. Suppose that the predictors are 1. X 1 the weight of the car

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Variable Selection and Model Building

Variable Selection and Model Building LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 37 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur The complete regression

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods. TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

3. Linear Regression With a Single Regressor

3. Linear Regression With a Single Regressor 3. Linear Regression With a Single Regressor Econometrics: (I) Application of statistical methods in empirical research Testing economic theory with real-world data (data analysis) 56 Econometrics: (II)

More information

Estimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27

Estimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27 Estimation of the Response Mean Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 27 The Gauss-Markov Linear Model y = Xβ + ɛ y is an n random vector of responses. X is an n p matrix

More information

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators Thilo Klein University of Cambridge Judge Business School Session 4: Linear regression,

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

AMS-207: Bayesian Statistics

AMS-207: Bayesian Statistics Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information