MATH 567: Mathematical Techniques in Data Science
Linear Regression: old and new


Dominique Guillot
Department of Mathematical Sciences, University of Delaware
February 13, 2017

Linear Regression: old and new

Typical problem: we are given n observations of variables X_1, ..., X_p and Y.

Goal: use X_1, ..., X_p to try to predict Y.

Example: Cars data compiled using Kelley Blue Book (n = 805, p = 11).

Find a linear model

    Y = β_1 X_1 + β_2 X_2 + ... + β_p X_p.

In the example, we want:

    price = β_1 · mileage + β_2 · cylinder + ...
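In R, such a model would be fit with lm. A sketch only: the data frame kbb and the column names Price, Mileage, and Cylinder below are hypothetical stand-ins for the actual Kelley Blue Book data.

# Hypothetical sketch: `kbb` and its column names are assumed, not the
# actual Kelley Blue Book data set.
model <- lm(Price ~ Mileage + Cylinder, data = kbb)
coef(model)   # estimated intercept and coefficients for Mileage and Cylinder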

Linear regression: classical setting

p = number of variables, n = number of observations.

Classical setting: n ≫ p (n much larger than p). With enough observations, we hope to be able to build a good model.

Note: even if the true relationship between the variables is not linear, we can include transformations of variables, e.g.

    X_{p+1} = X_1^2, X_{p+2} = X_2^2, ...

(see the R sketch below).

Note: adding transformed variables can increase p significantly. A complex model requires a lot of observations. There is a trade-off between complexity and interpretability.

Modern setting: in modern problems, it is often the case that n ≪ p. This requires supplementary assumptions (e.g. sparsity), but one can still build good models with very few observations.
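In R, transformed variables can be added directly in the model formula. A minimal sketch, assuming a data frame df with predictors X1, X2 and response Y (hypothetical names):

# Minimal sketch (hypothetical data frame `df`): I() adds the squared
# terms as extra predictors, increasing p from 2 to 4.
model <- lm(Y ~ X1 + X2 + I(X1^2) + I(X2^2), data = df)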

Classical setting

Idea: stack the data into Y ∈ R^{n×1} and X ∈ R^{n×p}:

    Y = (y_1, y_2, ..., y_n)^T,    X = (x_1 | x_2 | ... | x_p),

where x_1, ..., x_p ∈ R^{n×1} are the observations of X_1, ..., X_p (one column per variable).

We want Y = β_1 X_1 + ... + β_p X_p. This is equivalent to solving

    Y = Xβ,    where β = (β_1, β_2, ..., β_p)^T.

Classical setting (cont.)

We need to solve Y = Xβ. In general, the system has no solution (n ≫ p) or infinitely many solutions (n ≪ p).

A popular approach is to solve the system in the least squares sense:

    β̂ = argmin_{β ∈ R^p} ||Y − Xβ||².

How do we compute the solution? Calculus approach: set the partial derivatives to zero,

    0 = ∂/∂β_i ||Y − Xβ||²
      = ∂/∂β_i Σ_{k=1}^n (y_k − X_{k1}β_1 − X_{k2}β_2 − ... − X_{kp}β_p)²
      = Σ_{k=1}^n 2 (y_k − X_{k1}β_1 − X_{k2}β_2 − ... − X_{kp}β_p)(−X_{ki}).

Therefore,

    Σ_{k=1}^n X_{ki}(X_{k1}β_1 + X_{k2}β_2 + ... + X_{kp}β_p) = Σ_{k=1}^n X_{ki} y_k.

Calculus approach (cont.)

Now, the system

    Σ_{k=1}^n X_{ki}(X_{k1}β_1 + X_{k2}β_2 + ... + X_{kp}β_p) = Σ_{k=1}^n X_{ki} y_k,    i = 1, ..., p,

is equivalent to

    X^T Xβ = X^T Y    (the normal equations).

If X^T X is invertible, then

    β̂ = (X^T X)^{-1} X^T Y

is the unique minimum of ||Y − Xβ||². This is proved by computing the Hessian matrix,

    ∂²/∂β_i ∂β_j ||Y − Xβ||² = 2 X^T X,

which is positive definite when X^T X is invertible.
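The closed form β̂ = (X^T X)^{-1} X^T Y is easy to verify numerically. A minimal sketch with simulated data (all names below are invented for illustration):

# Sketch: compare the normal-equation solution with lm() on simulated data.
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)   # n observations of p predictors
beta <- c(2, -1, 0.5)             # true coefficients
Y <- X %*% beta + rnorm(n)        # response with noise
# Solve the normal equations X^T X beta = X^T Y directly.
beta_hat <- solve(crossprod(X), crossprod(X, Y))
# Same fit via lm (no intercept, to match the model above).
coef(lm(Y ~ X - 1))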

Linear algebra approach

We want to solve Y = Xβ.

Recall: if V ⊆ R^n is a subspace and w ∉ V, then the best approximation of w by a vector in V is proj_V(w), best in the sense that

    ||w − proj_V(w)|| ≤ ||w − v||    for all v ∈ V.

Note: Xβ ∈ col(X) = span(x_1, ..., x_p). If Y ∉ col(X), then the best approximation of Y by a vector in col(X) is proj_{col(X)}(Y).

Linear algebra approach (cont.)

So

    ||Y − proj_{col(X)}(Y)|| ≤ ||Y − Xβ||    for all β ∈ R^p.

Therefore, to find β̂, we solve

    Xβ̂ = proj_{col(X)}(Y).

(Note: this system always has a solution.)

With a little more work, we can find an explicit solution. First,

    Y − Xβ̂ = Y − proj_{col(X)}(Y) = proj_{col(X)^⊥}(Y).

Recall that

    col(X)^⊥ = null(X^T).

Thus,

    Y − Xβ̂ = proj_{null(X^T)}(Y) ∈ null(X^T).

That implies

    X^T (Y − Xβ̂) = 0.

Equivalently,

    X^T X β̂ = X^T Y    (the normal equations).
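The normal equations say exactly that the residual vector is orthogonal to every column of X; continuing the simulated sketch above:

# Sketch: X^T (Y - X beta_hat) is numerically zero (continuing the
# simulated X, Y, beta_hat from the previous sketch).
res <- Y - X %*% beta_hat
crossprod(X, res)   # entries are zero up to floating-point error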

The least squares theorem

Theorem (Least squares theorem). Let A ∈ R^{n×m} and b ∈ R^n. Then:

1. Ax = b always has a least squares solution x̂.
2. A vector x̂ is a least squares solution if and only if it satisfies the normal equations A^T A x̂ = A^T b.
3. x̂ is unique ⟺ the columns of A are linearly independent ⟺ A^T A is invertible. In that case, the unique least squares solution is given by x̂ = (A^T A)^{-1} A^T b.

In R: model <- lm(Y ~ X1 + X2 + ... + Xp).

Measuring the fit of a linear model

How good is our linear model? We examine the mean squared error:

    MSE(β̂) = (1/n) ||Y − Xβ̂||² = (1/n) Σ_{i=1}^n (y_i − ŷ_i)².

Example:

model <- lm(Auto$mpg ~ Auto$horsepower + Auto$weight)
sm <- summary(model)
mean(sm$residuals^2)   # the MSE

The coefficient of determination

The coefficient of determination, called "R squared" and denoted R²:

    R² = 1 − [Σ_{i=1}^n (y_i − ŷ_i)²] / [Σ_{i=1}^n (y_i − ȳ)²].

It is often used to measure the quality of a linear model. In some sense, R² measures how much better the prediction is compared to a constant prediction equal to the average ȳ of the y_i's.

In R: sm$r.squared. (As above, sm <- summary(model).)

In a linear model with an intercept, R² equals the square of the correlation coefficient between the observed values Y and the predicted values Ŷ (see the check below).

A model with an R² close to 1 fits the data well.
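This last fact is easy to check numerically, reusing model and sm from the Auto example above:

# Sketch: with an intercept, R^2 equals the squared correlation between
# observed and fitted values (reusing `model` and `sm` from above).
sm$r.squared
cor(Auto$mpg, fitted(model))^2   # agrees with sm$r.squared up to rounding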

Measuring the fit of a linear model (cont.)

We can examine the distribution of the residuals:

    hist(sm$residuals)

Desirable properties: symmetry and light tails.

A heavy tail suggests there may be outliers. We can use transformations such as log x, √x, or 1/x to improve the fit.
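For example, a log transform of the response is a one-line change. A sketch on the same Auto data (this code is not from the original slides):

# Sketch (assumed, not from the slides): refit with a log-transformed
# response and inspect the new residuals.
model_log <- lm(log(Auto$mpg) ~ Auto$horsepower + Auto$weight)
hist(summary(model_log)$residuals)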

Measuring the fit of a linear model (cont.)

Plotting the residuals as a function of the mpg (or of the fitted values), we immediately observe some patterns. Outliers? Separate categories of cars?

Improving the model

Add more variables to the model.
Select the best variables to include.
Use transformations.
Separate cars into categories.
Etc.

For example, let us fit a model only for cars with mpg less than 25:
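The original slide's code is not part of this transcription; here is a minimal sketch of such a subset fit, assuming the Auto data used above:

# Sketch (assumed): refit using only the cars with mpg below 25.
low <- subset(Auto, mpg < 25)
model_low <- lm(mpg ~ horsepower + weight, data = low)
summary(model_low)$r.squared   # fit quality on the restricted subset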
