MATH 567: Mathematical Techniques in Data Science
Linear Regression: old and new
MATH 567: Mathematical Techniques in Data Science
Linear Regression: old and new (1/14)
Dominique Guillot
Department of Mathematical Sciences, University of Delaware
February 13, 2017
Linear Regression: old and new (2/14)

Typical problem: we are given n observations of variables X_1, ..., X_p and Y.
Goal: use X_1, ..., X_p to try to predict Y.
Example: Cars data compiled using Kelley Blue Book (n = 805, p = 11).
Find a linear model
$$Y = \beta_1 X_1 + \cdots + \beta_p X_p.$$
In the example, we want: price = β_1 mileage + β_2 cylinder + ...
Linear regression: classical setting (3/14)

p = nb. of variables, n = nb. of observations.
Classical setting: n ≫ p (n much larger than p). With enough observations, we hope to be able to build a good model.
Note: even if the true relationship between the variables is not linear, we can include transformations of variables, e.g. X_{p+1} = X_1^2, X_{p+2} = X_2^2, ...
Note: adding transformed variables can increase p significantly. A complex model requires a lot of observations. Trade-off between complexity and interpretability.
Modern setting: in modern problems, it is often the case that n ≪ p. This requires supplementary assumptions (e.g. sparsity). One can still build good models with very few observations.
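The effect of adding a transformed variable can be seen on synthetic data. The sketch below (Python/NumPy for illustration; the course's own examples use R, and the data here is made up) fits a truly quadratic response with and without the extra column X_1^2:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.uniform(-2, 2, size=n)
y = 1.0 + x1 ** 2 + 0.1 * rng.normal(size=n)  # truly nonlinear in x1

# Linear design matrix vs. one augmented with the transformed variable X_1^2.
X_lin = np.column_stack([np.ones(n), x1])
X_aug = np.column_stack([np.ones(n), x1, x1 ** 2])

def mse(X, y):
    """Least squares fit, then mean squared error of the fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

print(mse(X_lin, y) > mse(X_aug, y))  # True: adding X_1^2 improves the fit
```

The augmented model is still *linear in the coefficients*, which is why ordinary least squares applies unchanged; the price is a larger p.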
Classical setting (4/14)

Idea: Y ∈ R^{n×1}, X ∈ R^{n×p},
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 & x_2 & \cdots & x_p \end{pmatrix},$$
where x_1, ..., x_p ∈ R^{n×1} are the observations of X_1, ..., X_p.
We want Y = β_1 X_1 + ... + β_p X_p. Equivalent to solving
$$Y = X\beta, \qquad \beta = (\beta_1, \beta_2, \dots, \beta_p)^T.$$
Classical setting (cont.) (5/14)

We need to solve Y = Xβ. In general, the system has no solution (n > p) or infinitely many solutions (n < p).
A popular approach is to solve the system in the least squares sense:
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \|Y - X\beta\|^2.$$
How do we compute the solution? Calculus approach:
$$0 = \frac{\partial}{\partial \beta_i}\|Y - X\beta\|^2 = \frac{\partial}{\partial \beta_i}\sum_{k=1}^n (y_k - X_{k1}\beta_1 - X_{k2}\beta_2 - \cdots - X_{kp}\beta_p)^2 = 2\sum_{k=1}^n (y_k - X_{k1}\beta_1 - \cdots - X_{kp}\beta_p)(-X_{ki}).$$
Therefore,
$$\sum_{k=1}^n X_{ki}(X_{k1}\beta_1 + X_{k2}\beta_2 + \cdots + X_{kp}\beta_p) = \sum_{k=1}^n X_{ki}\, y_k.$$
Calculus approach (cont.) (6/14)

Now
$$\sum_{k=1}^n X_{ki}(X_{k1}\beta_1 + X_{k2}\beta_2 + \cdots + X_{kp}\beta_p) = \sum_{k=1}^n X_{ki}\, y_k, \qquad i = 1, \dots, p,$$
is equivalent to
$$X^T X \beta = X^T Y \qquad \text{(normal equations)}.$$
If X^T X is invertible, then
$$\hat\beta = (X^T X)^{-1} X^T Y$$
is the unique minimum of ‖Y − Xβ‖². This is proved by computing the Hessian matrix:
$$\frac{\partial^2}{\partial\beta_i\,\partial\beta_j}\|Y - X\beta\|^2 = 2\,X^T X.$$
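The normal equations can be checked numerically. A minimal sketch (Python/NumPy for illustration; the data and dimensions are made up) solves X^T X β = X^T Y directly and compares with a library least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve the normal equations X^T X beta = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Compare with the generic least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

In practice one avoids forming X^T X explicitly (it squares the condition number); QR-based solvers such as `lstsq`, or R's `lm`, are preferred.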
Linear algebra approach (7/14)

Want to solve Y = Xβ.
Recall: if V ⊆ R^n is a subspace and w ∉ V, then the best approximation of w by a vector in V is proj_V(w), best in the sense that
$$\|w - \mathrm{proj}_V(w)\| \le \|w - v\| \qquad \forall v \in V.$$
Note: Xβ ∈ col(X) = span(x_1, ..., x_p). If Y ∉ col(X), then the best approximation of Y by a vector in col(X) is proj_{col(X)}(Y).
Linear algebra approach (cont.) (8/14)

So
$$\|Y - \mathrm{proj}_{\mathrm{col}(X)}(Y)\| \le \|Y - X\beta\| \qquad \forall \beta \in \mathbb{R}^p.$$
Therefore, to find β̂, we solve
$$X\hat\beta = \mathrm{proj}_{\mathrm{col}(X)}(Y).$$
(Note: this system always has a solution.)
With a little more work, we can find an explicit solution. Recall
$$Y - X\hat\beta = Y - \mathrm{proj}_{\mathrm{col}(X)}(Y) = \mathrm{proj}_{\mathrm{col}(X)^\perp}(Y),$$
and col(X)^⊥ = null(X^T). Thus
$$Y - X\hat\beta = \mathrm{proj}_{\mathrm{null}(X^T)}(Y) \in \mathrm{null}(X^T).$$
That implies
$$X^T(Y - X\hat\beta) = 0,$$
equivalently,
$$X^T X \hat\beta = X^T Y \qquad \text{(normal equations)}.$$
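The geometric fact X^T(Y − Xβ̂) = 0, i.e. that the least squares residual is orthogonal to every column of X, is easy to verify numerically. A short sketch (Python/NumPy, synthetic data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
residual = Y - X @ beta_hat

# The residual lies in null(X^T): X^T (Y - X beta_hat) is numerically zero.
print(np.abs(X.T @ residual).max())
```

The printed value is zero up to floating-point roundoff, confirming that Xβ̂ is the orthogonal projection of Y onto col(X).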
The least squares theorem (9/14)

Theorem (Least squares theorem). Let A ∈ R^{n×m} and b ∈ R^n. Then:
1. Ax = b always has a least squares solution x̂.
2. A vector x̂ is a least squares solution if and only if it satisfies the normal equations A^T A x̂ = A^T b.
3. x̂ is unique ⟺ the columns of A are linearly independent ⟺ A^T A is invertible. In that case, the unique least squares solution is given by
$$\hat{x} = (A^T A)^{-1} A^T b.$$

In R: model <- lm(Y ~ X1 + X2 + ... + Xp).
Measuring the fit of a linear model (10/14)

How good is our linear model? We examine the mean squared error:
$$\mathrm{MSE}(\hat\beta) = \frac{1}{n}\|Y - X\hat\beta\|^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2.$$

Example:
model <- lm(Auto$mpg ~ Auto$horsepower + Auto$weight)
sm <- summary(model)
mean(sm$residuals^2)  # The MSE
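The same computation in Python/NumPy (illustrative sketch on synthetic data; the slide's Auto dataset is not reproduced here), including the intercept column that R's lm() adds by default:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, size=n)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column, as lm() includes by default.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

mse = np.mean((y - y_hat) ** 2)
print(mse)  # close to the noise variance, 0.25 here
```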
The coefficient of determination (11/14)

The coefficient of determination, called R squared and denoted R²:
$$R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}.$$
Often used to measure the quality of a linear model.
In some sense, R² measures how much better the prediction is, compared to a constant prediction equal to the average of the y_i's.
In R: sm$r.squared (as above, sm <- summary(model)).
In a linear model with an intercept, R² equals the square of the correlation coefficient between the observed Y and the predicted values Ŷ.
A model with an R² close to 1 fits the data well.
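The identity between R² and the squared correlation of observed and fitted values (valid when the model has an intercept) can be checked directly. A Python/NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])  # model with an intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# R^2 from its definition ...
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
# ... and the squared correlation between observed and fitted values.
corr = np.corrcoef(y, y_hat)[0, 1]
print(np.isclose(r2, corr ** 2))  # True
```

Without an intercept the identity can fail, which is one reason R reports a different R² for through-the-origin models.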
Measuring the fit of a linear model (cont.) (12/14)

We can examine the distribution of the residuals: hist(sm$residuals).
Desirable properties: symmetry, light tails. A heavy tail suggests there may be outliers.
Can use transformations such as log x, √x, or 1/x to improve the fit.
Measuring the fit of a linear model (cont.) (13/14)

Plotting the residuals as a function of the mpg (or fitted values), we immediately observe some patterns. Outliers? Separate categories of cars?
Improving the model (14/14)

Add more variables to the model. Select the best variables to include. Use transformations. Separate cars into categories. Etc.
For example, let us fit a model only for cars with an mpg less than 25:
More informationMath 3191 Applied Linear Algebra
Math 9 Applied Linear Algebra Lecture : Null and Column Spaces Stephen Billups University of Colorado at Denver Math 9Applied Linear Algebra p./8 Announcements Study Guide posted HWK posted Math 9Applied
More informationNotes 6: Multivariate regression ECO 231W - Undergraduate Econometrics
Notes 6: Multivariate regression ECO 231W - Undergraduate Econometrics Prof. Carolina Caetano 1 Notation and language Recall the notation that we discussed in the previous classes. We call the outcome
More informationMath 2331 Linear Algebra
6.2 Orthogonal Sets Math 233 Linear Algebra 6.2 Orthogonal Sets Jiwen He Department of Mathematics, University of Houston jiwenhe@math.uh.edu math.uh.edu/ jiwenhe/math233 Jiwen He, University of Houston
More informationMath Models of OR: Extreme Points and Basic Feasible Solutions
Math Models of OR: Extreme Points and Basic Feasible Solutions John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 1180 USA September 018 Mitchell Extreme Points and Basic Feasible Solutions
More informationLinear Systems. Class 27. c 2008 Ron Buckmire. TITLE Projection Matrices and Orthogonal Diagonalization CURRENT READING Poole 5.4
Linear Systems Math Spring 8 c 8 Ron Buckmire Fowler 9 MWF 9: am - :5 am http://faculty.oxy.edu/ron/math//8/ Class 7 TITLE Projection Matrices and Orthogonal Diagonalization CURRENT READING Poole 5. Summary
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 Work all problems. 60 points needed to pass at the Masters level, 75 to pass at the PhD
More informationLinear Regression Linear Least Squares
Linear Regression Linear Least Squares ME 120 Notes Gerald Recktenwald Portland State University Department of Mechanical Engineering gerry@me.pdx.edu ME120: Linear Regression Introduction Introduction
More informationWell-developed and understood properties
1 INTRODUCTION TO LINEAR MODELS 1 THE CLASSICAL LINEAR MODEL Most commonly used statistical models Flexible models Well-developed and understood properties Ease of interpretation Building block for more
More informationMath 243 OpenStax Chapter 12 Scatterplots and Linear Regression OpenIntro Section and
Math 243 OpenStax Chapter 12 Scatterplots and Linear Regression OpenIntro Section 2.1.1 and 8.1-8.2.6 Overview Scatterplots Explanatory and Response Variables Describing Association The Regression Equation
More informationWed Feb The vector spaces 2, 3, n. Announcements: Warm-up Exercise:
Wed Feb 2 4-42 The vector spaces 2, 3, n Announcements: Warm-up Exercise: 4-42 The vector space m and its subspaces; concepts related to "linear combinations of vectors" Geometric interpretation of vectors
More informationft-uiowa-math2550 Assignment NOTRequiredJustHWformatOfQuizReviewForExam3part2 due 12/31/2014 at 07:10pm CST
me me ft-uiowa-math2550 Assignment NOTRequiredJustHWformatOfQuizReviewForExam3part2 due 12/31/2014 at 07:10pm CST 1. (1 pt) local/library/ui/eigentf.pg A is n n an matrices.. There are an infinite number
More informationUNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75
More informationA Primer in Econometric Theory
A Primer in Econometric Theory Lecture 1: Vector Spaces John Stachurski Lectures by Akshay Shanker May 5, 2017 1/104 Overview Linear algebra is an important foundation for mathematics and, in particular,
More informationOrthonormal Bases; Gram-Schmidt Process; QR-Decomposition
Orthonormal Bases; Gram-Schmidt Process; QR-Decomposition MATH 322, Linear Algebra I J. Robert Buchanan Department of Mathematics Spring 205 Motivation When working with an inner product space, the most
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationFitting Linear Statistical Models to Data by Least Squares I: Introduction
Fitting Linear Statistical Models to Data by Least Squares I: Introduction Brian R. Hunt and C. David Levermore University of Maryland, College Park Math 420: Mathematical Modeling January 25, 2012 version
More informationCase Study: Least-Squares Solutions
Case Study: Least-Squares Solutions In this case study, we examine techniques for dealing with large systems of equations which have no solution. The problem of readjusting the North American Datum is
More informationEstimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.
Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.
More informationEigenvalues and Eigenvectors
Eigenvalues and Eigenvectors Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) Eigenvalues and Eigenvectors Fall 2015 1 / 14 Introduction We define eigenvalues and eigenvectors. We discuss how to
More informationDesigning Information Devices and Systems I Discussion 13B
EECS 6A Fall 7 Designing Information Devices and Systems I Discussion 3B. Orthogonal Matching Pursuit Lecture Orthogonal Matching Pursuit (OMP) algorithm: Inputs: A set of m songs, each of length n: S
More informationMATH 567: Mathematical Techniques in Data Science Logistic regression and Discriminant Analysis
Logistic regression MATH 567: Mathematical Techniques in Data Science Logistic regression and Discriminant Analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware March 6,
More informationCorrelation 1. December 4, HMS, 2017, v1.1
Correlation 1 December 4, 2017 1 HMS, 2017, v1.1 Chapter References Diez: Chapter 7 Navidi, Chapter 7 I don t expect you to learn the proofs what will follow. Chapter References 2 Correlation The sample
More informationOslo Class 6 Sparsity based regularization
RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More information2-7 Solving Quadratic Inequalities. ax 2 + bx + c > 0 (a 0)
Quadratic Inequalities In One Variable LOOKS LIKE a quadratic equation but Doesn t have an equal sign (=) Has an inequality sign (>,
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationMath Linear Algebra
Math 220 - Linear Algebra (Summer 208) Solutions to Homework #7 Exercise 6..20 (a) TRUE. u v v u = 0 is equivalent to u v = v u. The latter identity is true due to the commutative property of the inner
More information