Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'
2 Specification

Let Y be a univariate quantitative response variable. We model Y as

Y = f(X) + ε,

where f(X) is the systematic part (the component that can be predicted using the inputs X) and ε is a disturbance (error) term, accounting for all the sources of variation other than X.

Assumptions:
- Linearity. The regression function is linear in the inputs: f(X) = β_0 + Σ_{j=1}^p β_j X_j.
- Weak exogeneity. E(ε | X) = 0, i.e. X and ε are independent in the mean.
- Homoscedasticity. E(ε² | X) = Var(ε | X) = σ².
3 The input set X includes quantitative variables, nonlinear transformations and basis expansions (to be defined later), as well as dummy variables coding the levels of qualitative inputs.

Under these assumptions the regression function is interpreted as the conditional mean of Y given X:

f(X) = E(Y | X).

This is the optimal predictor of Y under squared loss (the minimum mean square error predictor of Y).
4 Estimation

Set of training data {(y_i, x_i), i = 1, ..., N}, with x_i = (x_i1, x_i2, ..., x_ip)'. We assume that the training sample is a sample of size N, drawn independently (with replacement) from the underlying population, so that for the i-th individual we can write

y_i = β_0 + Σ_{j=1}^p β_j x_ij + ε_i,   i = 1, ..., N,

with E(ε | X) = 0 and Var(ε | X) = σ²I. Writing the N equations in matrix form gives

y = Xβ + ε,

where

y = (y_1, y_2, ..., y_N)',   ε = (ε_1, ε_2, ..., ε_N)',   β = (β_0, β_1, ..., β_p)',

and X is the N × (p + 1) design matrix whose i-th row is (1, x_i1, x_i2, ..., x_ip).
5 We aim at estimating the unknown parameters β_0, ..., β_p and σ². An important estimation method is Ordinary Least Squares (OLS). OLS obtains an estimate of β, denoted β̂, as the minimizer of the residual sum of squares

RSS(β) = (y − Xβ)'(y − Xβ).

There is a closed-form solution to this problem. Computing the partial derivatives of RSS(β) with respect to each β_j and equating them to zero yields a linear system of p + 1 equations in p + 1 unknowns (the normal equations),

X'Xβ = X'y,

which is solved with respect to β. The solution is unique provided that the columns of X are not collinear. In matrix notation the solution is

β̂ = (X'X)⁻¹X'y.
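The normal equations above can be checked numerically. The following is a minimal NumPy sketch (the data, seed and coefficient values are illustrative, not from the slides): it solves X'Xβ = X'y directly and verifies the result against a library least-squares solver.

```python
import numpy as np

# Illustrative data: N = 50 observations, p = 2 regressors plus an intercept.
rng = np.random.default_rng(0)
N, p = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # design matrix
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

# OLS via the normal equations: solve X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the library least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

In practice one avoids forming (X'X)⁻¹ explicitly and uses a QR-based solver, but the normal-equations form mirrors the derivation on the slide.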
6 Example: p = 1 (simple linear regression)

The normal equations are

Nβ_0 + β_1 Σ_i x_i = Σ_i y_i
β_0 Σ_i x_i + β_1 Σ_i x_i² = Σ_i x_i y_i.

We solve this system by substitution: solving the first equation with respect to β_0 yields

β̂_0 = ȳ − β̂_1 x̄.

Replacing into the second equation and solving with respect to β̂_1 gives:

β̂_1 = (N⁻¹ Σ_i x_i y_i − x̄ȳ) / (N⁻¹ Σ_i x_i² − x̄²) = Cov(x, y) / Var(x) = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)².
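The closed-form simple-regression estimates can be verified directly. A short sketch with made-up data (names and numbers are ours):

```python
import numpy as np

# Simple linear regression: beta1_hat = Cov(x, y) / Var(x), beta0_hat = ybar - beta1_hat * xbar
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(size=100)

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# The same answer from a full OLS fit on the design matrix [1, x]
coef = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]
print(np.allclose([beta0_hat, beta1_hat], coef))  # True
```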
7 Predicted values and residuals

Predicted (fitted) values:

ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy.

H = X(X'X)⁻¹X' is the hat matrix, projecting y orthogonally onto the vector space spanned by the columns of X.

Residuals: the OLS residuals are

e = y − ŷ = (I − H)y = My,

where M = I − H.
8 Properties of ŷ and e

The fitted values and the LS residuals have properties that are simple to show and good to know.

- Orthogonality: X'e = 0. The residuals are uncorrelated with each of the variables in X. If the model includes the intercept, the residuals have zero mean.
- ŷ'e = 0.
- RSS(β̂) = e'e = Σ_i e_i² (henceforth RSS).
- From y = ŷ + e and the previous properties, ‖y‖² = ‖ŷ‖² + ‖e‖².
- ȳ = N⁻¹ Σ_i y_i equals the average of the predicted values ŷ_i.
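All of these properties can be confirmed numerically in a few lines. A sketch with illustrative data (our own choices of N, coefficients and seed):

```python
import numpy as np

# Numerical check of the orthogonality properties of fitted values and residuals.
rng = np.random.default_rng(2)
N = 40
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # intercept included
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=N)

H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
y_hat = H @ y
e = y - y_hat

print(np.allclose(X.T @ e, 0))            # X'e = 0
print(np.allclose(y_hat @ e, 0))          # yhat'e = 0
print(np.isclose(e.mean(), 0))            # zero-mean residuals (intercept in the model)
print(np.isclose(y @ y, y_hat @ y_hat + e @ e))  # ||y||^2 = ||yhat||^2 + ||e||^2
```

Each check prints True (up to floating-point tolerance), because H is a symmetric idempotent projection matrix.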
9 Estimation of σ²

For estimating σ² (the variance of the error term) we use the variance of the residuals corrected for the number of degrees of freedom:

σ̂² = Σ_i e_i² / (N − p − 1).

N − p − 1 is known as the number of degrees of freedom. σ̂ is the standard error of regression (SER).
10 Goodness of fit

Define:

TSS = Σ_i (y_i − ȳ)² (total sum of squares: deviance of y_i)
ESS = Σ_i (ŷ_i − ȳ)² (explained sum of squares: deviance of ŷ_i)
RSS = Σ_i e_i² (residual sum of squares: deviance of e_i)

It is easy to prove that

TSS = ESS + RSS.

A relative measure of goodness of fit is

R² = ESS / TSS = 1 − RSS / TSS.
11 This measure suffers from a serious drawback when used for model selection: the inclusion of (possibly irrelevant) additional regressors always produces an increase in R².

Adjusted R²:

R̄² = 1 − [RSS/(N − p − 1)] / [TSS/(N − 1)] = 1 − (N − 1)/(N − p − 1) · (1 − R²).
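The two formulas for R̄² are equivalent, and R̄² never exceeds R² since (N − 1)/(N − p − 1) ≥ 1. A minimal sketch computing both measures on illustrative data:

```python
import numpy as np

# R^2 and adjusted R^2 from an OLS fit (all data and coefficients illustrative).
rng = np.random.default_rng(3)
N, p = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([1.0, 0.8, 0.0, -0.4]) + rng.normal(size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat
RSS = e @ e
TSS = np.sum((y - y.mean()) ** 2)

R2 = 1 - RSS / TSS
R2_adj = 1 - (N - 1) / (N - p - 1) * (1 - R2)  # penalizes extra regressors
```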
12 Properties of the OLS estimators

We are going to look at the statistical properties of the OLS estimators (and of related quantities, such as the fitted values and the residuals), hypothesizing that we can draw repeated training samples from the same population, keeping the X's fixed and drawing different Y's. A distribution of outcomes will arise.

Under the stated assumptions,

E(β̂ | X) = β,   Var(β̂ | X) = σ²(X'X)⁻¹.

Also, E(σ̂²) = σ². Moreover, the OLS estimators are Best Linear Unbiased (Gauss-Markov theorem): E(β̂_j) = β_j for all j, and Var(β̂_j) = MSE(β̂_j) is the smallest within the class of all linear unbiased estimators.

Example: p = 1 (simple regression model):

Var(β̂_0) = σ² [1/N + x̄² / Σ_i (x_i − x̄)²],   Var(β̂_1) = σ² / Σ_i (x_i − x̄)².
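The "repeated samples with X fixed" thought experiment can be simulated directly. The sketch below (seed, N, σ and β are our illustrative choices) redraws y many times and compares the Monte Carlo mean and covariance of β̂ with the theoretical values β and σ²(X'X)⁻¹:

```python
import numpy as np

# Monte Carlo illustration of E(beta_hat) = beta and Var(beta_hat) = sigma^2 (X'X)^(-1),
# holding X fixed and redrawing y in each replication.
rng = np.random.default_rng(8)
N, sigma = 40, 0.5
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((5000, 2))
for r in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=N)
    draws[r] = XtX_inv @ (X.T @ y)   # OLS estimate for this replication

mc_mean = draws.mean(axis=0)         # close to beta (unbiasedness)
mc_cov = np.cov(draws.T)             # close to sigma^2 (X'X)^(-1)
```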
13 If we further assume ε_i | x_i ~ NID(0, σ²), then y_i | x_i ~ N(β_0 + β_1 x_i, σ²) and the OLS estimators have a normal distribution:

β̂ ~ N(β, σ²(X'X)⁻¹).

β̂ is the maximum likelihood estimator of the coefficients (we will say more about this). Letting s.e.(β̂_j) denote the estimated standard error of the OLS estimator, i.e. the square root of the j-th diagonal element of V̂ar(β̂) = σ̂²(X'X)⁻¹,

t_j = (β̂_j − β_j) / s.e.(β̂_j) ~ t_{N−p−1},

a Student's t random variable with N − p − 1 degrees of freedom. This result is used to test hypotheses on a single coefficient and to produce interval estimates.
14 Suppose we wish to test for the significance of a subset of the coefficients. The relevant statistic for the null hypothesis that J coefficients are all equal to zero is

F = [(RSS_0 − RSS)/J] / [RSS/(N − p − 1)] ~ F_{J, N−p−1},

where RSS_0 is the residual sum of squares of the restricted model (i.e. the model with J fewer explanatory variables), and F_{J, N−p−1} is Fisher's F distribution with J and N − p − 1 degrees of freedom.

As a special case, the test for H_0: β_1 = β_2 = ... = β_p = 0 (all the regression coefficients are zero, except for the intercept) is:

F = [ESS/p] / [RSS/(N − p − 1)] = [R²/p] / [(1 − R²)/(N − p − 1)].
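A sketch of both F statistics on illustrative data: the subset test compares the restricted and unrestricted residual sums of squares, and the overall test is recovered from R² (all names, sizes and coefficients below are our own assumptions):

```python
import numpy as np

# F test for H0: the last J coefficients are zero, and the overall F test.
rng = np.random.default_rng(4)
N, p, J = 80, 3, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([1.0, 2.0, 0.1, 0.0]) + rng.normal(size=N)

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

RSS_full = rss(X, y)
RSS_restricted = rss(X[:, : p + 1 - J], y)   # drop the last J regressors
F = ((RSS_restricted - RSS_full) / J) / (RSS_full / (N - p - 1))

# Special case: all p slopes zero; the R^2 form gives the same statistic
TSS = np.sum((y - y.mean()) ** 2)
R2 = 1 - RSS_full / TSS
F_all = (R2 / p) / ((1 - R2) / (N - p - 1))
```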
15 Example: clothing data. Regression of total sales (logs) on hours worked and store size (logs).

Call:
lm(formula = y ~ x1 + x2, data = clothing)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              < 2e-16 ***
x1                                       < 2e-16 ***
x2                                          e-12 ***
---
Residual standard error: on 397 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 2 and 397 DF, p-value: < 2.2e-16

(The numeric estimates were lost in transcription; only the layout of the R summary output survives.)
16 Figure: Fitted values (log(tsales) against log(hoursw) and log(ssize)).
17 Example: house price regression in R

lm(formula = price ~ sqft + Age + Pool + Bedrooms + Fireplace + Waterfront + DOM)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
sqft                                     < 2e-16 ***
Age                                         e-06 ***
Pool
Bedrooms                                    e-06 ***
Fireplace
Waterfront                                  e-11 ***
DOM

Residual standard error: on 1072 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 7 and 1072 DF, p-value: < 2.2e-16

(The numeric estimates were lost in transcription; only the layout of the R summary output survives.)
18 Figure: Histogram of the residuals.
19 Properties of the OLS residuals

The i-th residual e_i has zero expectation (under the model's assumptions) and

Var(e_i) = σ²(1 − h_i),

where h_i = x_i'(X'X)⁻¹x_i is the leverage of observation i (the i-th diagonal element of H). h_i is an important diagnostic quantity: it measures the remoteness of the inputs for the i-th individual in the space of the inputs. The barplot of h_i versus i illustrates the influence of the i-th observation on the fit. It is possible to show that Σ_i h_i = p + 1, so the average leverage is (p + 1)/N.

The standardized residual is defined as

r_i = e_i / (σ̂ √(1 − h_i))

and is used in regression diagnostics (normal probability plot, check for homoscedasticity).
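Leverages and standardized residuals fall out of the hat matrix directly. A sketch on illustrative data (our own N, p and coefficients), including the check that the leverages sum to p + 1:

```python
import numpy as np

# Leverages h_i (diagonal of H) and standardized residuals r_i.
rng = np.random.default_rng(5)
N, p = 30, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([0.0, 1.0, 1.0]) + rng.normal(size=N)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                          # leverages
e = y - H @ y                           # OLS residuals
sigma2_hat = e @ e / (N - p - 1)        # SER squared
r = e / np.sqrt(sigma2_hat * (1 - h))   # standardized residuals

print(np.isclose(h.sum(), p + 1))       # trace(H) = p + 1 -> True
```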
20 Figure: Anscombe Quartet: four datasets giving the same set of fitted values.
21 Figure: Anscombe Quartet: leverage plot (h_i vs i).
22 Specification issues: omitted variable bias, collinearity, etc.

Consider the regression model y = Xβ + γw + ε, where w is a vector of N measurements on an added variable W. We can show that the OLS estimator of γ is

γ̂ = (w'My) / (w'Mw),   M = I − X(X'X)⁻¹X',

and

Var(γ̂) = σ² / (w'Mw).

These expressions have a nice interpretation: γ̂ results from
1. regressing Y on X and obtaining the residuals e_{Y|X} = My;
2. regressing W on X and obtaining the residuals e_{W|X} = Mw;
3. regressing e_{Y|X} on e_{W|X} to obtain the LS estimate γ̂ (the partial regression coefficient).
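This partial-regression result (the Frisch-Waugh-Lovell theorem) says that γ̂ computed from the residual formula equals the coefficient on w in the full regression of y on (X, w). A sketch verifying this on illustrative data (all numbers our own):

```python
import numpy as np

# Partial regression coefficient gamma_hat = (w'My) / (w'Mw) versus the full OLS fit.
rng = np.random.default_rng(6)
N = 70
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
w = rng.normal(size=N) + X[:, 1]        # W correlated with one column of X
y = X @ np.array([1.0, 0.5, -0.5]) + 2.0 * w + rng.normal(size=N)

M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-maker for X
gamma_hat = (w @ M @ y) / (w @ M @ w)

full = np.linalg.lstsq(np.column_stack([X, w]), y, rcond=None)[0]
print(np.isclose(gamma_hat, full[-1]))  # True: same coefficient on w
```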
23 Multicollinearity

Mw contains the residuals of the regression of W on X, so w'Mw is the residual sum of squares from that regression. If X explains most of the variation in W, this quantity is very small and Var(γ̂) will be very large. We can conclude that the effect that W exerts on Y is poorly (i.e. very imprecisely) estimated.
24 Omission of relevant variables

If W is omitted from the regression, the OLS estimator β̂ is biased, but also more precise:

E(β̂) = β + (X'X)⁻¹X'wγ,
Var(β̂) = σ²(X'X)⁻¹ ≤ σ²(X'M_W X)⁻¹,   M_W = I − w(w'w)⁻¹w',

where σ²(X'M_W X)⁻¹ is the variance of the estimator of β when W is included in the model. If W is irrelevant (the true γ = 0), there is only a cost associated with including it in the model. In any case, if W is orthogonal to X the distribution of β̂ is not affected by W.
25 Prediction of a new sample observation

We are interested in predicting Y for a new unit with values of X in the vector x_o = (1, x_o1, ..., x_op)'. The optimal predictor under squared loss is

ŷ_o = f̂(x_o) = β̂_0 + β̂_1 x_o1 + ... + β̂_p x_op = x_o'β̂.

Under the stated assumptions the predictor is unbiased, i.e. the prediction error has zero mean, E(Y_o − ŷ_o) = 0, and

Var(Y_o − ŷ_o) = σ²(1 + x_o'(X'X)⁻¹x_o).

Note: the stated assumptions concern the selection of the set of predictors in X and the properties of ε.
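A sketch of the out-of-sample prediction and its error variance, with σ² replaced by its estimate σ̂² as one would do in practice (the data and the new input profile x_o are illustrative):

```python
import numpy as np

# Point prediction for a new unit and the estimated prediction-error variance
# sigma2_hat * (1 + x_o'(X'X)^(-1) x_o).
rng = np.random.default_rng(7)
N, p = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat
sigma2_hat = e @ e / (N - p - 1)

x_o = np.array([1.0, 0.5, -0.2])        # hypothetical new input profile (with intercept)
y_o_hat = x_o @ beta_hat                # point prediction
var_pred = sigma2_hat * (1 + x_o @ np.linalg.inv(X.T @ X) @ x_o)
```

Note that var_pred always exceeds σ̂²: the quadratic form x_o'(X'X)⁻¹x_o adds the estimation uncertainty in β̂ on top of the irreducible error variance.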
26 Predictive accuracy depends on:

1. The ability to select all the relevant variables. Failure to do so will result in an inflated σ² and E(ε | X) ≠ 0, so that β̂ is no longer optimal.
2. How far removed the unit profile x_o is from the rest: x_o'(X'X)⁻¹x_o measures the (Mahalanobis) distance of the values of X_1, ..., X_p for the o-th unit in the space of the X's.
27 If the true model is

y = Xβ + γw + ε,   E(ε | X, w) = 0,   Var(ε | X, w) = σ_ε²I,

the above predictor is biased and its mean square error is

E[(Y_o − f̂(x_o))² | X, w, x_o, w_o] = σ_ε²(1 + x_o'(X'X)⁻¹x_o) + Bias²,

with

Bias = (β − E(β̂))'x_o + γw_o = (w_o − w'X(X'X)⁻¹x_o)γ.

The term in parentheses is the error of predicting w_o using the information in x_o.
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationChapter 1. Linear Regression with One Predictor Variable
Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical
More information4 Multiple Linear Regression
4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ
More informationVariable Selection and Model Building
LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 37 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur The complete regression
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationRecent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data
Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)
More informationData Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More information3. Linear Regression With a Single Regressor
3. Linear Regression With a Single Regressor Econometrics: (I) Application of statistical methods in empirical research Testing economic theory with real-world data (data analysis) 56 Econometrics: (II)
More informationEstimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27
Estimation of the Response Mean Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 27 The Gauss-Markov Linear Model y = Xβ + ɛ y is an n random vector of responses. X is an n p matrix
More informationMFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators
MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators Thilo Klein University of Cambridge Judge Business School Session 4: Linear regression,
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More informationAMS-207: Bayesian Statistics
Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.
More informationSTATISTICS 110/201 PRACTICE FINAL EXAM
STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable
More information