14 Multiple Linear Regression
- Kathryn Allison
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice

14.1 The multiple linear regression model

In simple linear regression, the response variable y is expressed in terms of a single regressor variable x, with a corresponding regression model of the form

$$Y = \beta_0 + \beta_1 x + \varepsilon.$$

In multiple linear regression, the response variable is expressed as a linear function of k regressor (or predictor or explanatory) variables, $x_1, x_2, \ldots, x_k$, with a corresponding model of the form

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon.$$

Suppose that n observations have been made of the variables (where $n \ge k + 1$), with $x_{ij}$ the i-th observed value of the j-th regressor variable, corresponding to the i-th observed value $y_i$ of the response variable. Thus the data could be tabulated in the following form of a data matrix.

Observation   Variable
Number        y      x_1     x_2     ...   x_k
1             y_1    x_11    x_12    ...   x_1k
2             y_2    x_21    x_22    ...   x_2k
...           ...    ...     ...     ...   ...
n             y_n    x_n1    x_n2    ...   x_nk

The multiple linear regression model is

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{1}$$

where the $x_{ij}$, $i = 1, \ldots, n$; $j = 1, \ldots, k$, are regarded as fixed, $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters and the errors $\varepsilon_i$, $i = 1, \ldots, n$, are assumed to be $NID(0, \sigma^2)$, with $\sigma^2$ unknown.

The model (1) may be written in matrix notation as

$$Y = X\beta + \varepsilon, \tag{2}$$

where Y is an $n \times 1$ vector of observations, $Y = (Y_1, Y_2, \ldots, Y_n)'$, $\beta$ is a $p \times 1$ vector of parameters, where $p = k + 1$, $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$,
$\varepsilon$ is an $n \times 1$ vector of errors, $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$, and X is an $n \times p$ matrix, the design matrix,

$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}.$$

Equation (2) expresses the regression model in the form of what is known as the general linear model. The general linear model encompasses a wide variety of cases, including the linear statistical model for the one-way completely randomized design. For these models the design matrix X turns out to have a rather different form than for the regression model in that each element of the matrix is either 0 or 1. The term design matrix is used because X expresses the structure of the underlying experimental design.

According to the method of least squares, we choose as our estimates, $b = (b_0, b_1, \ldots, b_k)'$, the vector of parameters $\beta$ whose elements jointly minimize the error (or residual) sum of squares,

$$L = (y - X\beta)'(y - X\beta),$$

i.e., setting $x_{i0} = 1$,

$$L = \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=0}^{k} x_{ij} \beta_j \Bigr)^2. \tag{3}$$

The expression (3) is minimized by setting the partial derivatives with respect to each of the $\beta_r$, $r = 0, \ldots, k$, equal to zero. This yields the normal equations, a set of $p = k + 1$ simultaneous linear equations for the p unknowns, $b_0, b_1, \ldots, b_k$,

$$\sum_{i=1}^{n} \sum_{j=0}^{k} x_{ir} x_{ij} b_j = \sum_{i=1}^{n} x_{ir} y_i, \qquad r = 0, \ldots, k,$$

which may be written in matrix form as

$$X'Xb = X'y. \tag{4}$$

Note that $X'X$ is a symmetric $p \times p$ matrix.

14.2 Rank and invertibility

The rank, rank(A), of a matrix A is the number of linearly independent columns of A. Recall that our design matrix X is an $n \times p$ matrix with $n \ge p$. It follows that $\operatorname{rank}(X) \le p$. If $\operatorname{rank}(X) = p$ then X is said to be of full rank. It may be shown that

$$\operatorname{rank}(X'X) = \operatorname{rank}(X). \tag{5}$$
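As a hedged illustration of the normal equations (4) and the rank result (5), the following sketch is written in R, which is assumed here as a stand-in for S+ since the two share this syntax. It builds a small design matrix from made-up numbers, solves $X'Xb = X'y$ directly, and shows that the rank falls below the number of columns when one column is a scalar multiple of another. The data values and the names x1, x2, x3 and X_bad are invented for the example.

```r
# Made-up data: n = 6 observations, k = 2 regressors (illustrative only).
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9)

# Design matrix with a leading column of ones, so p = k + 1 = 3.
X <- cbind(1, x1, x2)
p <- ncol(X)

# rank(X) and rank(X'X) agree, as in result (5).
rank_X   <- qr(X)$rank
rank_XtX <- qr(crossprod(X))$rank   # crossprod(X) is t(X) %*% X

# X has full rank p, so the normal equations X'X b = X'y
# have a unique solution.
b <- solve(crossprod(X), crossprod(X, y))

# The same estimates from lm(), for comparison.
b_lm <- coef(lm(y ~ x1 + x2))

# Adding a column that is a scalar multiple of x1 makes the
# design matrix rank deficient: 4 columns but rank 3.
x3       <- 2.54 * x1
X_bad    <- cbind(1, x1, x2, x3)
rank_bad <- qr(X_bad)$rank
```

Solving the normal equations with solve() mirrors the algebra; in practice a fitting function such as lm() is usually preferred because it works from a decomposition of X rather than forming $X'X$.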
A square matrix is said to be non-singular if it has an inverse. A $p \times p$ square matrix is non-singular if and only if it is of full rank p. If $X'X$ is non-singular, which by the result (5) occurs if and only if X is of full rank p, then the normal equations (4) have a unique solution,

$$\hat{\beta} = (X'X)^{-1} X'Y. \tag{6}$$

It will generally be the case for sensible regression models that the design matrix X is of full rank, but this is not necessarily always the case. To take an extreme example, if the silly mistake is made of taking one of the regressor variables to be a scaled version of another, e.g., if one regressor variable is height measured in inches and another is the same height measured in cm, then the two corresponding columns of the matrix X are scalar multiples of each other and hence $\operatorname{rank}(X) < p$. The normal equations do not then have a unique solution: the estimates of the parameters are not well-determined.

For a given set of data, assuming that X is of full rank p, the formal mathematical solution (6) of the normal equations (4) is translated in a statistical package such as S+ into a numerical procedure for solving the normal equations.

14.3 The hat matrix and leverage

Assuming that X is of full rank, the vector $\hat{Y}$ of fitted values is given by

$$\hat{Y} = X\hat{\beta} = HY,$$

where, using Equation (6), the hat matrix H is defined by

$$H = X(X'X)^{-1}X'.$$

Note that H is a symmetric $n \times n$ matrix of rank p. The vector $\hat{\varepsilon}$ of residuals is given by

$$\hat{\varepsilon} = Y - \hat{Y} = (I - H)Y,$$

where I is the $n \times n$ identity matrix. It turns out that

$$\operatorname{var}(\hat{\varepsilon}_i) = (1 - h_i)\sigma^2, \qquad i = 1, \ldots, n,$$

where $h_i$ is the i-th diagonal element of the hat matrix H and is known as the leverage of the i-th observation. The leverage $h_i$ may be regarded as a measure of the remoteness of the i-th observation from the remaining $n - 1$ observations in the space of the regressor variables. It is always the case that

$$\frac{1}{n} \le h_i \le 1, \quad i = 1, \ldots, n, \qquad \text{and} \qquad \sum_{i=1}^{n} h_i = p, \quad \text{so that } \bar{h} = p/n.$$
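The variance formula and the sum of the leverages quoted above follow in a few lines from two facts: H is symmetric and idempotent, and $\operatorname{var}(Y) = \sigma^2 I$ under the model assumptions. A sketch of the argument, using only the definitions already given:

```latex
\begin{align*}
HH &= X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = H, \\
\operatorname{var}(\hat{\varepsilon})
   &= (I - H)\,\sigma^2 I\,(I - H)'
    = \sigma^2 (I - H)(I - H)
    = \sigma^2 (I - H), \\
\operatorname{var}(\hat{\varepsilon}_i) &= (1 - h_i)\,\sigma^2, \\
\sum_{i=1}^{n} h_i &= \operatorname{tr}(H)
    = \operatorname{tr}\{(X'X)^{-1}X'X\}
    = \operatorname{tr}(I_p) = p.
\end{align*}
```

The third line reads off the i-th diagonal element of the second, and the last line uses the cyclic property of the trace.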
If an individual $h_i$ is large then the corresponding observation may have a large influence in determining the estimated regression coefficients. Recall that we can obtain a list of the leverage values from within S+ by using the function lm.influence() applied to the appropriate model object. We may regard $h_i$ as being high if $h_i > \min(0.99, 3p/n)$.
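The leverage calculations can be sketched as follows, again in R as an assumed stand-in for S+. The data are made up, with the twelfth observation deliberately placed far from the others in x1. The code forms H explicitly, checks the stated properties of the $h_i$, compares them with the values returned by lm.influence(), and applies the rule $h_i > \min(0.99, 3p/n)$.

```r
# Made-up data: n = 12 observations, k = 2 regressors (illustrative only).
# Observation 12 is remote from the others in x1.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11)
y  <- c(3.0, 4.1, 6.9, 8.2, 10.8, 12.1, 15.2, 15.9, 19.1, 20.2, 22.8, 50.5)

X <- cbind(1, x1, x2)              # n x p design matrix
n <- nrow(X)
p <- ncol(X)

# Hat matrix H = X (X'X)^{-1} X' and its diagonal, the leverages h_i.
H <- X %*% solve(crossprod(X)) %*% t(X)
h <- diag(H)

# Checks of the stated properties: 1/n <= h_i <= 1 and sum(h_i) = p
# (a small tolerance allows for rounding error).
range_ok <- all(h >= 1 / n - 1e-12 & h <= 1 + 1e-12)
sum_h    <- sum(h)

# The same leverages taken from a fitted model object.
fit   <- lm(y ~ x1 + x2)
h_fit <- lm.influence(fit)$hat

# Flag observations with high leverage by the rule h_i > min(0.99, 3p/n).
cutoff <- min(0.99, 3 * p / n)     # here 3p/n = 0.75
high   <- which(h > cutoff)        # includes observation 12
```

Forming the full $n \times n$ matrix H is only sensible for small n; for larger data sets the diagonal elements alone are normally extracted from the fitted model object.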
14.4 The covariance matrix of the estimators of the parameters

It turns out that $\hat{\beta}_r$ is a normally distributed, unbiased estimator of $\beta_r$, $r = 0, \ldots, k$, with variance given by the r-th diagonal element of the matrix $\sigma^2 (X'X)^{-1}$, where the rows and columns of the matrix are indexed $0, 1, \ldots, k$ (rather than $1, 2, \ldots, k + 1$). In fact, it can be shown that

$$\hat{\beta} \sim N_p\bigl(\beta, \sigma^2 (X'X)^{-1}\bigr).$$

14.5 Example

An estimate is required of the percentage yield of petroleum spirit from crude oil, based upon certain rough laboratory determinations of properties of the crude oil. The following table shows actual percentage yields of petroleum spirit, y, and four properties, $x_1, x_2, x_3, x_4$, of the crude oil, for samples from 32 different crudes.
Table: Data on yields of petroleum spirit (columns y, x_1, x_2, x_3, x_4)

The variables recorded are as follows.

y: percentage yield of petroleum spirit
x_1: specific gravity of the crude
x_2: crude oil vapour pressure, measured in pounds per square inch
x_3: the ASTM 10% distillation point, in °F
x_4: the petroleum fraction end point, in °F
It is required to use these data to provide an equation for predicting y from measurements of the four explanatory variables, $x_1, x_2, x_3, x_4$ (or some subset of them).

The data have been read into an S+ data frame oil. The function names is used to assign names to the five variables. The linear model function lm is then used to carry out a multiple linear regression of the response variable spirit upon the four regressor variables, gravity, pressure, distil and endpoint, the results of which are stored in the object oil.lm.

    > y <- c(69, 144, 74, 85, 80, 28, 50, 122, 100, 152, 268, 140, 147, 64, 176, 223, 248, 260, 349, 182, 232, 180, 131, 161, 321, 347, 317, 336, 304, 266, 278, 457)/10
    > x1 <- c(384, 403, 400, 318, 408, 413, 381, 508, 322, 384, 403, 322, 318, 413, 381, 508, 322, 384, 403, 400, 322, 318, 408, 413, 381, 508, 322, 384, 400, 408, 413, 508)/10
    > x2 <- c(61, 48, 61, 2, 35, 18, 12, 86, 52, 61, 48, 24, 2, 18, 12, 86, 52, 61, 48, 61, 24, 2, 35, 18, 12, 86, 52, 61, 61, 35, 18, 86)/10
    > x3 <- c(220, 231, 217, 316, 210, 267, 274, 190, 236, 220, 231, 284, 316, 267, 274, 190, 236, 220, 231, 217, 284, 316, 210, 267, 274, 190, 236, 220, 217, 210, 267, 190)
    > x4 <- c(235, 307, 212, 365, 218, 235, 285, 205, 267, 300, 367, 351, 379, 275, 365, 275, 360, 365, 395, 272, 424, 428, 273, 358, 444, 345, 402, 410, 340, 347, 416, 407)
    > oil <- data.frame(y, x1, x2, x3, x4)
    > names(oil) <- c("spirit", "gravity", "pressure", "distil", "endpoint")
    > oil.lm <- lm(spirit ~ gravity + pressure + distil + endpoint, data = oil)
    > summary(oil.lm)

    Call: lm(formula = spirit ~ gravity + pressure + distil + endpoint, data = oil)
    Residuals:
     Min 1Q Median 3Q Max

    Coefficients:
                 Value Std. Error t value Pr(>|t|)
    (Intercept)
        gravity
       pressure
         distil
       endpoint

    Residual standard error: on 27 degrees of freedom
    Multiple R-Squared:
    Adjusted R-squared:
    F-statistic: 171.7 on 4 and 27 degrees of freedom, the p-value is 0
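The coefficient estimates and standard errors reported by summary() can be reproduced from Equation (6) and the covariance matrix $\sigma^2 (X'X)^{-1}$ of Section 14.4. The sketch below is written in R, assumed here as a stand-in for S+, and re-enters the data from the session above under their assigned names. Estimating $\sigma^2$ by the residual sum of squares divided by $n - p$ is the usual unbiased choice; the object names XtX_inv, beta_hat, s2, cov_beta and se are invented for the example.

```r
# Data as entered in the session above (32 crude oil samples).
spirit <- c(69, 144, 74, 85, 80, 28, 50, 122, 100, 152, 268, 140, 147, 64,
            176, 223, 248, 260, 349, 182, 232, 180, 131, 161, 321, 347,
            317, 336, 304, 266, 278, 457) / 10
gravity <- c(384, 403, 400, 318, 408, 413, 381, 508, 322, 384, 403, 322,
             318, 413, 381, 508, 322, 384, 403, 400, 322, 318, 408, 413,
             381, 508, 322, 384, 400, 408, 413, 508) / 10
pressure <- c(61, 48, 61, 2, 35, 18, 12, 86, 52, 61, 48, 24, 2, 18, 12, 86,
              52, 61, 48, 61, 24, 2, 35, 18, 12, 86, 52, 61, 61, 35, 18,
              86) / 10
distil <- c(220, 231, 217, 316, 210, 267, 274, 190, 236, 220, 231, 284, 316,
            267, 274, 190, 236, 220, 231, 217, 284, 316, 210, 267, 274, 190,
            236, 220, 217, 210, 267, 190)
endpoint <- c(235, 307, 212, 365, 218, 235, 285, 205, 267, 300, 367, 351,
              379, 275, 365, 275, 360, 365, 395, 272, 424, 428, 273, 358,
              444, 345, 402, 410, 340, 347, 416, 407)
oil <- data.frame(spirit, gravity, pressure, distil, endpoint)
oil.lm <- lm(spirit ~ gravity + pressure + distil + endpoint, data = oil)

X <- model.matrix(oil.lm)      # 32 x 5 design matrix, first column all ones
y <- oil$spirit
n <- nrow(X)
p <- ncol(X)

# Least squares estimates from Equation (6).
XtX_inv  <- solve(crossprod(X))
beta_hat <- drop(XtX_inv %*% crossprod(X, y))

# Estimate of sigma^2 on n - p = 27 degrees of freedom.
res <- y - drop(X %*% beta_hat)
s2  <- sum(res^2) / (n - p)

# Estimated covariance matrix s2 * (X'X)^{-1}; the square roots of its
# diagonal elements are the standard errors of the estimates.
cov_beta <- s2 * XtX_inv
se       <- sqrt(diag(cov_beta))

print(cbind(estimate = beta_hat, std.error = se, t.value = beta_hat / se))
```

The printed table can be set beside the Coefficients block of summary(oil.lm); in R the same covariance matrix is also available as vcov(oil.lm).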
14.6 ANOVA for multiple linear regression

We now begin to outline the theory that will enable us to interpret the above analysis of variance and to carry out further analyses. It turns out that we may partition the total sum of squares $SS_T \equiv S_{yy}$ into the sum of the regression sum of squares $SS_{Reg}$ and the error (or residual) sum of squares $SS_R$, i.e.,

$$SS_T = SS_{Reg} + SS_R.$$

The regression sum of squares, which is that part of the total sum of squares that is accounted for by the fitted regression, is given by

$$SS_{Reg} = \sum_{r=1}^{k} \hat{\beta}_r S_{ry}, \qquad \text{where} \qquad S_{ry} = \sum_{i=1}^{n} (x_{ir} - \bar{x}_{.r})(Y_i - \bar{Y}), \quad r = 1, \ldots, k.$$

Corresponding to the above partition of the sum of squares we have the following ANOVA, where, as in earlier ANOVAs, $S^2 \equiv MS_R$ is an unbiased estimator of the error variance $\sigma^2$.

ANOVA TABLE
Source       DF              SS                              MS
Regression   $k$             $\sum \hat{\beta}_r S_{ry}$     $SS_{Reg}/k$
Error        $n - k - 1$     by subtraction                  $S^2 \equiv SS_R/(n - k - 1)$
Total        $n - 1$         $S_{yy}$

We may wish to test the hypothesis that there is no linear relationship between the response variable y and the regressor variables $x_1, x_2, \ldots, x_k$. Formally, we test the null hypothesis

$$H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

against the alternative

$$H_1 : \beta_j \ne 0 \text{ for some } j, \quad j = 1, \ldots, k.$$

This hypothesis is tested using the test statistic

$$F = \frac{MS_{Reg}}{MS_R},$$

which under $H_0$ has the $F_{k, n-k-1}$ distribution.

Clearly, from the S+ output, there is overwhelming evidence ($F_{obs} = 171.7$, $p = 0.000$) to reject the null hypothesis of no linear relationship between the yield of petroleum spirit and the four regressor variables.
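The partition of the sum of squares and the F test can be checked numerically on the petroleum spirit data. The sketch below is in R, assumed here as a stand-in for S+, and re-enters the data so that it runs on its own. It computes $SS_{Reg}$ from the formula $\sum \hat{\beta}_r S_{ry}$ rather than by subtraction, so that the identity $SS_T = SS_{Reg} + SS_R$ is verified rather than imposed. The object names SS_T, SS_R, S_ry, SS_Reg, F_obs and p_value are invented for the example.

```r
# Data as entered in the example session (32 crude oil samples).
spirit <- c(69, 144, 74, 85, 80, 28, 50, 122, 100, 152, 268, 140, 147, 64,
            176, 223, 248, 260, 349, 182, 232, 180, 131, 161, 321, 347,
            317, 336, 304, 266, 278, 457) / 10
gravity <- c(384, 403, 400, 318, 408, 413, 381, 508, 322, 384, 403, 322,
             318, 413, 381, 508, 322, 384, 403, 400, 322, 318, 408, 413,
             381, 508, 322, 384, 400, 408, 413, 508) / 10
pressure <- c(61, 48, 61, 2, 35, 18, 12, 86, 52, 61, 48, 24, 2, 18, 12, 86,
              52, 61, 48, 61, 24, 2, 35, 18, 12, 86, 52, 61, 61, 35, 18,
              86) / 10
distil <- c(220, 231, 217, 316, 210, 267, 274, 190, 236, 220, 231, 284, 316,
            267, 274, 190, 236, 220, 231, 217, 284, 316, 210, 267, 274, 190,
            236, 220, 217, 210, 267, 190)
endpoint <- c(235, 307, 212, 365, 218, 235, 285, 205, 267, 300, 367, 351,
              379, 275, 365, 275, 360, 365, 395, 272, 424, 428, 273, 358,
              444, 345, 402, 410, 340, 347, 416, 407)
oil <- data.frame(spirit, gravity, pressure, distil, endpoint)
oil.lm <- lm(spirit ~ gravity + pressure + distil + endpoint, data = oil)

n <- nrow(oil)   # 32 observations
k <- 4           # number of regressor variables

# Total and residual sums of squares.
SS_T <- sum((oil$spirit - mean(oil$spirit))^2)    # S_yy
SS_R <- sum(residuals(oil.lm)^2)

# Regression sum of squares from the formula sum_r beta_hat_r * S_ry.
regressors <- oil[, c("gravity", "pressure", "distil", "endpoint")]
S_ry <- sapply(regressors, function(x) {
  sum((x - mean(x)) * (oil$spirit - mean(oil$spirit)))
})
SS_Reg <- sum(coef(oil.lm)[-1] * S_ry)            # [-1] drops the intercept

# Mean squares, F statistic and its p-value on (k, n - k - 1) df.
MS_Reg  <- SS_Reg / k
MS_R    <- SS_R / (n - k - 1)
F_obs   <- MS_Reg / MS_R
p_value <- pf(F_obs, k, n - k - 1, lower.tail = FALSE)
```

The value of F_obs can be compared with the $F_{obs} = 171.7$ quoted from the S+ output, and MS_R with the square of the residual standard error.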
More informationGeneral Linear Model (Chapter 4)
General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients
More informationIn the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2)
RNy, econ460 autumn 04 Lecture note Orthogonalization and re-parameterization 5..3 and 7.. in HN Orthogonalization of variables, for example X i and X means that variables that are correlated are made
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationTests of Linear Restrictions
Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some
More informationLecture 14 Simple Linear Regression
Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent
More informationRegression Analysis for Data Containing Outliers and High Leverage Points
Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain
More informationThe Classical Linear Regression Model
The Classical Linear Regression Model ME104: Linear Regression Analysis Kenneth Benoit August 14, 2012 CLRM: Basic Assumptions 1. Specification: Relationship between X and Y in the population is linear:
More informationMAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik
MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
More informationNo other aids are allowed. For example you are not allowed to have any other textbook or past exams.
UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In
More informationSchool of Mathematical Sciences. Question 1
School of Mathematical Sciences MTH5120 Statistical Modelling I Practical 8 and Assignment 7 Solutions Question 1 Figure 1: The residual plots do not contradict the model assumptions of normality, constant
More informationStatistics - Lecture Three. Linear Models. Charlotte Wickham 1.
Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions
More informationEstimable Functions and Their Least Squares Estimators. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 51
Estimable Functions and Their Least Squares Estimators Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 51 Consider the GLM y = n p X β + ε, where E(ε) = 0. p 1 n 1 n 1 Suppose
More information