9. Least squares data fitting
L. Vandenberghe, EE133A (Spring 2017)

9. Least squares data fitting

- model fitting
- regression
- linear-in-parameters models
- time series examples
- validation
- least squares classification
- statistics interpretation

Least squares data fitting 9-1
Model fitting

suppose x and a scalar quantity y are related as $y \approx f(x)$

- x is the explanatory variable or independent variable
- y is the outcome, response variable, or dependent variable
- we don't know f, but have some idea about its general form

Model fitting: find an approximate model $\hat f$ for f, based on observations; we use the notation $\hat y$ for the model prediction of the outcome y: $\hat y = \hat f(x)$

Least squares data fitting 9-2
Prediction error

- we have data $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$; data are also called observations, examples, samples, or measurements
- the model prediction for data sample i is $\hat y_i = \hat f(x_i)$
- the prediction error or residual for data sample i is $r_i = \hat y_i - y_i = \hat f(x_i) - y_i$
- the model $\hat f$ fits the data well if the N residuals $r_i$ are small
- prediction error can be quantified using the mean square error (MSE) $\frac{1}{N}\sum_{i=1}^N r_i^2$; the square root of the MSE is the RMS error

Least squares data fitting 9-3
Outline

- model fitting
- regression
- linear-in-parameters models
- time series examples
- validation
- least squares classification
- statistics interpretation
Regression

we first consider the regression model (page 1-32): $\hat f(x) = x^T\beta + v$

- the independent variable x is an n-vector; the elements of x are the regressors
- the model is parameterized by the weight vector $\beta$ and the offset (intercept) v
- the prediction error for data sample i is $r_i = \hat f(x_i) - y_i = x_i^T\beta + v - y_i$
- the MSE is $\frac{1}{N}\sum_{i=1}^N r_i^2 = \frac{1}{N}\sum_{i=1}^N (x_i^T\beta + v - y_i)^2$

Least squares data fitting 9-4
Least squares regression

choose the model parameters v, $\beta$ that minimize the MSE $\frac{1}{N}\sum_{i=1}^N (x_i^T\beta + v - y_i)^2$

this is a least squares problem: minimize $\|A\theta - y\|^2$ with

$A = \begin{bmatrix} 1 & x_1^T \\ 1 & x_2^T \\ \vdots & \vdots \\ 1 & x_N^T \end{bmatrix}, \quad \theta = \begin{bmatrix} v \\ \beta \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$

we write the solution as $\hat\theta = (\hat v, \hat\beta)$

Least squares data fitting 9-5
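As a minimal sketch of this construction (on synthetic data, not the course's data set), one can stack the matrix $A = [\,1 \;\; x_i^T\,]$ and solve the least squares problem with NumPy:

```python
import numpy as np

# Hypothetical small data set: N samples of an n-vector x and a scalar outcome y.
rng = np.random.default_rng(0)
N, n = 50, 2
X = rng.normal(size=(N, n))                 # row i holds x_i^T
true_beta, true_v = np.array([1.5, -2.0]), 0.5
y = X @ true_beta + true_v + 0.01 * rng.normal(size=N)

# Stack A = [1  x_i^T] and solve  minimize ||A theta - y||^2.
A = np.column_stack([np.ones(N), X])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
v_hat, beta_hat = theta[0], theta[1:]
```

With small noise, `v_hat` and `beta_hat` recover the coefficients used to generate the data.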
Example: house price regression model

example of page 1-32: $\hat y = x^T\beta + v$

- $\hat y$ is predicted sales price (in 1000 dollars); y is actual sales price
- two regressors: $x_1$ is house area; $x_2$ is number of bedrooms
- data set of N = 774 house sales
- RMS error of least squares fit is 74.8

(figure: scatter plot of predicted price $\hat y$ versus actual price y)

Least squares data fitting 9-6
Example: house price regression model

regression model with additional regressors: $\hat y = x^T\beta + v$; the feature vector x has 7 elements

- $x_1$ is area of the house (in 1000 square feet)
- $x_2 = \max\{x_1 - 1.5, 0\}$, i.e., area in excess of 1.5 (in 1000 square feet)
- $x_3$ is number of bedrooms
- $x_4$ is one for a condo; zero otherwise
- $x_5$, $x_6$, $x_7$ are 0/1 indicators that specify the location (four groups of ZIP codes, labeled A, B, C, D)

Least squares data fitting 9-7
Example: house price regression model

use least squares to fit the eight model parameters v, $\beta$; RMS fitting error is …

(figure: scatter plot of predicted price $\hat y$ versus actual price y, in thousand dollars)

Least squares data fitting 9-8
Outline

- model fitting
- regression
- linear-in-parameters models
- time series examples
- validation
- least squares classification
- statistics interpretation
Linear-in-parameters model

we choose the model $\hat f(x)$ from a family of models $\hat f(x) = \theta_1 f_1(x) + \theta_2 f_2(x) + \cdots + \theta_p f_p(x)$

- the functions $f_i$ are scalar-valued basis functions (chosen by us)
- the basis functions often include a constant function (typically, $f_1(x) = 1$)
- the coefficients $\theta_1, \ldots, \theta_p$ are the model parameters
- the model $\hat f(x)$ is linear in the parameters $\theta_i$
- if $f_1(x) = 1$, this can be interpreted as a regression model $\hat y = \beta^T \tilde x + v$ with parameters $v = \theta_1$, $\beta = \theta_{2:p}$, and new features $\tilde x$ generated from x: $\tilde x_1 = f_2(x), \ldots, \tilde x_{p-1} = f_p(x)$

Least squares data fitting 9-9
Least squares model fitting

fit a linear-in-parameters model to the data set $(x_1, y_1), \ldots, (x_N, y_N)$

- the residual for data sample i is $r_i = \hat f(x_i) - y_i = \theta_1 f_1(x_i) + \cdots + \theta_p f_p(x_i) - y_i$
- least squares model fitting: choose the parameters $\theta$ by minimizing the MSE $\frac{1}{N}(r_1^2 + r_2^2 + \cdots + r_N^2) = \frac{1}{N}\|r\|^2$

this is a least squares problem: minimize $\|A\theta - y\|^2$ with

$A = \begin{bmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_p(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_p(x_2) \\ \vdots & \vdots & & \vdots \\ f_1(x_N) & f_2(x_N) & \cdots & f_p(x_N) \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_p \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$

Least squares data fitting 9-10
Example: polynomial approximation

$\hat f(x) = \theta_1 + \theta_2 x + \theta_3 x^2 + \cdots + \theta_p x^{p-1}$

- a linear-in-parameters model with basis functions $1, x, \ldots, x^{p-1}$
- least squares model fitting: choose the parameters $\theta$ by minimizing the MSE $\frac{1}{N}\left( (\hat f(x_1) - y_1)^2 + (\hat f(x_2) - y_2)^2 + \cdots + (\hat f(x_N) - y_N)^2 \right)$
- in matrix notation: minimize $\|A\theta - y\|^2$ with

$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{p-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^{p-1} \end{bmatrix}$

Least squares data fitting 9-11
Example

(figure: least squares polynomial fits $\hat f(x)$ of degree 2 (p = 3), 6, 10, and 15 to a data set of 100 samples)

Least squares data fitting 9-12
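A sketch of the polynomial least squares fit, using a synthetic data set (the matrix A above is exactly what `np.vander` with `increasing=True` produces):

```python
import numpy as np

# Fit a degree-2 polynomial (p = 3) by least squares to 100 synthetic samples.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.normal(size=100)

# Column j of A holds x_i^j, so the columns are 1, x, x^2.
A = np.vander(x, 3, increasing=True)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
mse = np.mean((A @ theta - y) ** 2)
```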
Piecewise-affine function

- define knot points $a_1 < a_2 < \cdots < a_k$ on the real axis
- a piecewise-affine function is continuous, and affine on each interval between consecutive knot points
- a piecewise-affine function with knot points $a_1, \ldots, a_k$ can be written as $f(x) = \theta_1 + \theta_2 x + \theta_3 (x - a_1)_+ + \cdots + \theta_{k+2} (x - a_k)_+$ where $u_+ = \max\{u, 0\}$

(figure: the basis functions $(x+1)_+$ and $(x-1)_+$)

Least squares data fitting 9-13
Piecewise-affine function fitting

the piecewise-affine model is linear in the parameters $\theta$, with basis functions $f_1(x) = 1$, $f_2(x) = x$, $f_3(x) = (x - a_1)_+$, ..., $f_{k+2}(x) = (x - a_k)_+$

Example: fit a piecewise-affine function with knots $a_1 = -1$, $a_2 = 1$ to 100 points

(figure: the fitted function $\hat f(x)$ and the data points)

Least squares data fitting 9-14
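A sketch of this fit on synthetic data (the true function and noise level are assumptions for illustration):

```python
import numpy as np

# Piecewise-affine least squares fit with knots a1 = -1, a2 = 1.
rng = np.random.default_rng(2)
x = rng.uniform(-2.5, 2.5, size=100)
f_true = lambda u: 1.0 - 0.5 * u + 2.0 * np.maximum(u + 1, 0) - 3.0 * np.maximum(u - 1, 0)
y = f_true(x) + 0.05 * rng.normal(size=100)

# Basis functions: 1, x, (x - a1)_+, (x - a2)_+.
knots = [-1.0, 1.0]
A = np.column_stack([np.ones_like(x), x] + [np.maximum(x - a, 0) for a in knots])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
```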
Outline

- model fitting
- regression
- linear-in-parameters models
- time series examples
- validation
- least squares classification
- statistics interpretation
Time series trend

- N data samples from a time series: $y_i$ is the value at time i, for $i = 1, \ldots, N$
- the straight-line fit $\hat y_i = \theta_1 + \theta_2 i$ is the trend line
- the difference $y - \hat y$ is the de-trended time series
- least squares fitting of the trend line: minimize $\|A\theta - y\|^2$ with

$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ \vdots & \vdots \\ 1 & N \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{bmatrix}$

Least squares data fitting 9-15
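A minimal sketch of the trend-line fit on a synthetic series (slope and noise are illustrative assumptions):

```python
import numpy as np

# Trend line: y_hat_i = theta1 + theta2 * i.
rng = np.random.default_rng(3)
N = 200
t = np.arange(1, N + 1)
y = 10.0 + 0.3 * t + rng.normal(size=N)

A = np.column_stack([np.ones(N), t])   # columns: ones and the time index
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
detrended = y - A @ theta              # the de-trended time series
```

Because the ones column is in the span of A, the de-trended series has (numerically) zero mean.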
Example: world petroleum consumption

time series of world petroleum consumption (million barrels/day) versus year

(figures: left, data samples and trend line; right, de-trended time series)

Least squares data fitting 9-16
Trend plus seasonal component

model the time series as a linear trend plus a periodic component with period P: $\hat y = \hat y^{\text{trend}} + \hat y^{\text{seas}}$ with $\hat y^{\text{trend}} = \theta_1 (1, 2, \ldots, N)$ and $\hat y^{\text{seas}} = (\theta_2, \theta_3, \ldots, \theta_{P+1},\; \theta_2, \theta_3, \ldots, \theta_{P+1},\; \ldots,\; \theta_2, \theta_3, \ldots, \theta_{P+1})$

- the mean of $\hat y^{\text{seas}}$ serves as a constant offset
- the difference $y - \hat y$ is the de-trended, seasonally adjusted time series
- least squares formulation: minimize $\|A\theta - y\|^2$ with

$A_{1:N,1} = \begin{bmatrix} 1 \\ 2 \\ \vdots \\ N \end{bmatrix}, \quad A_{1:N,2:P+1} = \begin{bmatrix} I_P \\ I_P \\ \vdots \\ I_P \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$

Least squares data fitting 9-17
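A sketch of the trend-plus-seasonal fit; here N is assumed to be a multiple of P, and the data are synthetic:

```python
import numpy as np

# Trend plus seasonal component with period P = 12.
rng = np.random.default_rng(4)
P, cycles = 12, 10
N = P * cycles
t = np.arange(1, N + 1)
true_seasonal = np.sin(2 * np.pi * np.arange(P) / P)
y = 0.05 * t + np.tile(true_seasonal, cycles) + 0.1 * rng.normal(size=N)

# A = [trend column | stacked P x P identity blocks], as on the slide.
A = np.column_stack([t, np.tile(np.eye(P), (cycles, 1))])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
trend_slope, seas = theta[0], theta[1:]
```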
Example: vehicle miles traveled in the US, per month

(figures: monthly miles traveled (millions); the data, and the least squares fit of a linear trend plus seasonal (12-month) component)

Least squares data fitting 9-18
Auto-regressive (AR) time series model

$\hat z_{t+1} = \beta_1 z_t + \cdots + \beta_M z_{t-M+1}, \quad t = M, M+1, \ldots$

- $z_1, z_2, \ldots$ is a time series
- $\hat z_{t+1}$ is a prediction of $z_{t+1}$, made at time t
- the prediction $\hat z_{t+1}$ is a linear function of the previous M values $z_t, \ldots, z_{t-M+1}$
- M is the memory of the model

Least squares fitting of the AR model: given observed data $z_1, \ldots, z_T$, minimize $(\hat z_{M+1} - z_{M+1})^2 + (\hat z_{M+2} - z_{M+2})^2 + \cdots + (\hat z_T - z_T)^2$

this is a least squares problem: minimize $\|A\beta - y\|^2$ with

$A = \begin{bmatrix} z_M & z_{M-1} & \cdots & z_1 \\ z_{M+1} & z_M & \cdots & z_2 \\ \vdots & \vdots & & \vdots \\ z_{T-1} & z_{T-2} & \cdots & z_{T-M} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix}, \quad y = \begin{bmatrix} z_{M+1} \\ z_{M+2} \\ \vdots \\ z_T \end{bmatrix}$

Least squares data fitting 9-19
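A sketch of AR model fitting on a simulated series (the AR(2) coefficients below are assumptions used to generate test data):

```python
import numpy as np

# Least squares fit of an AR model with memory M to observed data z_1, ..., z_T.
rng = np.random.default_rng(5)
T, M = 2000, 2
z = np.zeros(T)
for t in range(2, T):                        # simulate a stable AR(2) process
    z[t] = 0.6 * z[t - 1] - 0.2 * z[t - 2] + 0.1 * rng.normal()

# Column j of A holds z_{t-1-j} for target z_t; y holds z_{M+1}, ..., z_T.
A = np.column_stack([z[M - 1 - j : T - 1 - j] for j in range(M)])
y = z[M:]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```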
Example: hourly temperature at LAX

- AR model of memory M = 8
- the model was fit on a time series of length T = 744 (May 1–31, 2016)
- (figure: temperature (°F) versus t; the plot shows the first five days)

Least squares data fitting 9-20
Outline

- model fitting
- regression
- linear-in-parameters models
- time series examples
- validation
- least squares classification
- statistics interpretation
Generalization and validation

Generalization ability: the ability of a model to predict outcomes for new data

Model validation: to assess generalization ability, divide the data into two sets: a training set and a test (or validation) set

- use the training set to fit the model
- use the test set to get an idea of the generalization ability
- this is also called out-of-sample validation

Over-fit model

- a model with low prediction error on the training set but bad generalization ability
- the prediction error on the training set is much smaller than on the test set

Least squares data fitting 9-21
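A sketch of out-of-sample validation (synthetic data and a hypothetical degree-5 polynomial model):

```python
import numpy as np

# Fit on a training set, evaluate RMS error on a held-out test set.
rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=200)
x_train, y_train = x[:100], y[:100]
x_test, y_test = x[100:], y[100:]

def fit_poly(xs, ys, degree):
    # Least squares polynomial fit via the Vandermonde matrix.
    return np.linalg.lstsq(np.vander(xs, degree + 1), ys, rcond=None)[0]

def rms(xs, ys, coef):
    return np.sqrt(np.mean((np.vander(xs, len(coef)) @ coef - ys) ** 2))

coef = fit_poly(x_train, y_train, 5)
rms_train, rms_test = rms(x_train, y_train, coef), rms(x_test, y_test, coef)
```

Comparable train and test RMS errors suggest the model generalizes; a much larger test error signals over-fitting.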
Example: polynomial fitting

(figure: relative RMS error on the training and test sets versus polynomial degree)

- the training set is the data set of 100 points used on page 9-11
- the test set is a similar set of 100 points
- the plot suggests using degree 6

Least squares data fitting 9-22
Over-fitting

(figures: polynomial of degree 20 evaluated on the training set and on the test set)

over-fitting is evident at the left end of the interval

Least squares data fitting 9-23
Cross-validation

an extension of out-of-sample validation

- divide the data into K sets (folds); typical values are K = 5, K = 10
- for i = 1 to K, fit model i using fold i as the test set and the other data as the training set
- compare the parameters and the train/test RMS errors of the K models

House price model (page 9-7) with 5 folds (155 or 154 samples each)

(table: for each fold, the model parameters v, $\beta_1, \ldots, \beta_7$ and the train/test RMS errors)

Least squares data fitting 9-24
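A 5-fold cross-validation sketch on a synthetic regression problem (data and model are illustrative assumptions):

```python
import numpy as np

# K-fold cross-validation of a least squares regression model.
rng = np.random.default_rng(7)
N, K = 200, 5
X = rng.normal(size=(N, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + 2.0 + 0.1 * rng.normal(size=N)
A = np.column_stack([np.ones(N), X])

folds = np.array_split(np.arange(N), K)
test_rms = []
for i in range(K):
    test_idx = folds[i]
    train_idx = np.setdiff1d(np.arange(N), test_idx)
    # Fit on the other folds, evaluate on fold i.
    theta = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)[0]
    r = A[test_idx] @ theta - y[test_idx]
    test_rms.append(np.sqrt(np.mean(r ** 2)))
```

Similar test RMS errors across the K folds indicate the fitted parameters are stable.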
Outline

- model fitting
- regression
- linear-in-parameters models
- time series examples
- validation
- least squares classification
- statistics interpretation
Boolean (two-way) classification

- a data fitting problem where the outcome y can take two values, +1 and −1
- the values of y represent two categories (true/false, spam/not spam, ...)
- the model $\hat y = \hat f(x)$ is called a Boolean classifier

Least squares classifier

- use least squares to fit a model $\tilde f(x)$ to the training data set $(x_1, y_1), \ldots, (x_N, y_N)$
- $\tilde f(x)$ can be a regression model $\tilde f(x) = x^T\beta + v$ or linear in parameters $\tilde f(x) = \theta_1 f_1(x) + \cdots + \theta_p f_p(x)$
- take the sign of $\tilde f(x)$ to get a Boolean classifier: $\hat f(x) = \mathrm{sign}(\tilde f(x)) = \begin{cases} +1 & \text{if } \tilde f(x) \ge 0 \\ -1 & \text{if } \tilde f(x) < 0 \end{cases}$

Least squares data fitting 9-25
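A sketch of the least squares classifier on synthetic, nearly separable data (the label rule is an assumption for illustration):

```python
import numpy as np

# Fit x^T beta + v to +/-1 labels by least squares, then take the sign.
rng = np.random.default_rng(8)
N = 400
x = rng.normal(size=(N, 2))
y = np.where(x[:, 0] + x[:, 1] + 0.1 * rng.normal(size=N) >= 0, 1.0, -1.0)

A = np.column_stack([np.ones(N), x])
theta = np.linalg.lstsq(A, y, rcond=None)[0]
y_hat = np.sign(A @ theta)
y_hat[y_hat == 0] = 1.0            # map sign(0) to +1, matching the slide
error_rate = np.mean(y_hat != y)
```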
Example: handwritten digit classification

- MNIST data set used in homework
- images of handwritten digits ($n = 28^2 = 784$ pixels)
- data set contains 60,000 training examples and 10,000 test examples
- we only use the 493 pixels that are nonzero in at least 600 training examples
- the Boolean classifier distinguishes the digit zero (y = +1) from the other digits (y = −1)

Least squares data fitting 9-26
Classifier with basic regression model

$\hat f(x) = \mathrm{sign}(\tilde f(x)) = \mathrm{sign}(x^T\beta + v)$

- x is the vector of 493 pixel intensities
- (figure: distribution of $\tilde f(x_i) = x_i^T\hat\beta + \hat v$ on the training data set, for the positive class (digit 0) and the negative class (digits 1–9))
- blue bars to the left of the dashed line are false negatives (misclassified zeros)
- red bars to the right of the dashed line are false positives (misclassified non-zeros)

Least squares data fitting 9-27
Prediction error

for each data point (x, y) we have four combinations of prediction and outcome:

- y = +1, $\hat y$ = +1: true positive; y = +1, $\hat y$ = −1: false negative
- y = −1, $\hat y$ = +1: false positive; y = −1, $\hat y$ = −1: true negative

a classifier can be evaluated by counting the data points for each combination

(tables: confusion matrices for the training data set of 60,000 examples and the test data set of 10,000 examples; the error rate is 1.6% on both)

Least squares data fitting 9-28
Classifier with additional nonlinear features

$\hat f(x) = \mathrm{sign}(\tilde f(x)) = \mathrm{sign}\left( \sum_{i=1}^p \theta_i f_i(x) \right)$

- the basis functions include a constant, the 493 elements of x, plus 5000 functions $f_i(x) = \max\{0, r_i^T x + s_i\}$ with randomly generated $r_i$, $s_i$
- (figure: distribution of $\tilde f(x_i)$ on the training data set, for the positive class (digit 0) and the negative class (digits 1–9))

Least squares data fitting 9-29
Prediction error

- training data set: error rate 0.21%
- test data set: error rate 0.24%

(tables: confusion matrices of predictions $\hat y = \pm 1$ versus outcomes $y = \pm 1$ for the two data sets)

Least squares data fitting 9-30
Multi-class classification

- a data fitting problem where the outcome y can take values $1, \ldots, K$
- the values of y represent K labels or categories
- the multi-class classifier $\hat y = \hat f(x)$ maps x to an element of $\{1, 2, \ldots, K\}$

Least squares multi-class classifier

- for $k = 1, \ldots, K$, compute a Boolean classifier to distinguish class k from not-k: $\hat f_k(x) = \mathrm{sign}(\tilde f_k(x))$
- define the multi-class classifier as $\hat f(x) = \operatorname{argmax}_{k=1,\ldots,K} \tilde f_k(x)$

Least squares data fitting 9-31
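A one-vs-rest sketch with K = 3 synthetic, well-separated classes (centers and noise level are assumptions):

```python
import numpy as np

# Least squares multi-class classifier: one Boolean classifier per class,
# final prediction by argmax of the continuous scores f_k(x).
rng = np.random.default_rng(9)
K, per = 3, 100
centers = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(per, 2)) for c in centers])
labels = np.repeat(np.arange(K), per)
A = np.column_stack([np.ones(K * per), X])

# Fit class-k-vs-rest with +1/-1 targets; stack the K parameter vectors.
Theta = np.column_stack([
    np.linalg.lstsq(A, np.where(labels == k, 1.0, -1.0), rcond=None)[0]
    for k in range(K)
])
pred = np.argmax(A @ Theta, axis=1)
error_rate = np.mean(pred != labels)
```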
Example: handwritten digit classification

we compute a least squares Boolean classifier for each digit versus the rest: $\hat f_k(x) = \mathrm{sign}(x^T\beta_k + v_k)$, $k = 1, \ldots, K$

(table: confusion matrix of predicted versus true digits on the test set; the error rate is 13.9%)

Least squares data fitting 9-32
Example: handwritten digit classification

the ten least squares Boolean classifiers use the 5000 new features (page 9-29)

(table: confusion matrix of predicted versus true digits on the test set; the error rate is 2.6%)

Least squares data fitting 9-33
Outline

- model fitting
- regression
- linear-in-parameters models
- time series examples
- validation
- least squares classification
- statistics interpretation
Linear regression model

$y = X\beta + \epsilon$

- $\beta$ is a (non-random) p-vector of unknown parameters
- X is n × p (the data matrix or design matrix, i.e., the result of the experiment design)
- if there is an offset v, we include it in $\beta$ and add a column of ones to X
- $\epsilon$ is a random n-vector (random error or disturbance)
- y is an observable random n-vector
- this notation differs from the previous sections but is common in statistics

we discuss methods for estimating the parameters $\beta$ from observations of y

Least squares data fitting 9-34
Assumptions

- X is tall (n > p) with linearly independent columns
- the random disturbances $\epsilon_i$ have zero mean: $\mathbf{E}\,\epsilon_i = 0$ for $i = 1, \ldots, n$
- the random disturbances have equal variances $\sigma^2$: $\mathbf{E}\,\epsilon_i^2 = \sigma^2$ for $i = 1, \ldots, n$
- the random disturbances are uncorrelated (have zero covariances): $\mathbf{E}(\epsilon_i \epsilon_j) = 0$ for $i, j = 1, \ldots, n$ and $i \ne j$

the last three assumptions can be combined using matrix and vector notation: $\mathbf{E}\,\epsilon = 0$, $\mathbf{E}\,\epsilon\epsilon^T = \sigma^2 I$

Least squares data fitting 9-35
Least squares estimator

the least squares estimate $\hat\beta$ of the parameters $\beta$, given the observations y, is $\hat\beta = X^\dagger y = (X^T X)^{-1} X^T y$

(figure: $y = X\beta + \epsilon$, its orthogonal projection $X\hat\beta$ on range(X), and the residual $e = y - X\hat\beta$)

- $X\hat\beta$ is the orthogonal projection of y on range(X)
- the residual $e = y - X\hat\beta$ is an (observable) random variable

Least squares data fitting 9-36
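A sketch on synthetic data: the normal-equations solution $(X^TX)^{-1}X^Ty$ agrees with NumPy's QR-based solver when X has independent columns, and the residual is orthogonal to range(X):

```python
import numpy as np

# Least squares estimate beta_hat = (X^T X)^{-1} X^T y on synthetic data.
rng = np.random.default_rng(10)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + 0.1 * rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)        # via the normal equations
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # QR-based solver
residual = y - X @ beta_hat                          # e = y - X beta_hat
```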
Mean and covariance of least squares estimate

$\hat\beta = X^\dagger (X\beta + \epsilon) = \beta + X^\dagger \epsilon$

- the least squares estimator is unbiased: $\mathbf{E}\,\hat\beta = \beta$
- the covariance matrix of the least squares estimate is $\mathbf{E}\,(\hat\beta - \beta)(\hat\beta - \beta)^T = \mathbf{E}\left( (X^\dagger\epsilon)(X^\dagger\epsilon)^T \right) = \mathbf{E}\left( (X^TX)^{-1} X^T \epsilon\epsilon^T X (X^TX)^{-1} \right) = \sigma^2 (X^TX)^{-1}$
- the covariance of $\hat\beta_i$ and $\hat\beta_j$ ($i \ne j$) is $\mathbf{E}\left( (\hat\beta_i - \beta_i)(\hat\beta_j - \beta_j) \right) = \sigma^2 \left( (X^TX)^{-1} \right)_{ij}$; for i = j, this is the variance of $\hat\beta_i$

Least squares data fitting 9-37
Estimate of $\sigma^2$

- $\mathbf{E}\,\|\epsilon\|^2 = n\sigma^2$
- $\mathbf{E}\,\|e\|^2 = (n - p)\sigma^2$
- $\mathbf{E}\,\|X(\hat\beta - \beta)\|^2 = p\sigma^2$ (proof on next page)

define the estimate $\hat\sigma$ of $\sigma$ as $\hat\sigma = \|e\| / \sqrt{n - p}$

$\hat\sigma^2$ is an unbiased estimate of $\sigma^2$: $\mathbf{E}\,\hat\sigma^2 = \frac{1}{n-p}\,\mathbf{E}\,\|e\|^2 = \sigma^2$

Least squares data fitting 9-38
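A Monte Carlo sketch of the unbiasedness claim: averaging $\hat\sigma^2 = \|e\|^2/(n-p)$ over many simulated disturbances should approach $\sigma^2$ (all problem dimensions below are illustrative assumptions):

```python
import numpy as np

# Check empirically that sigma_hat^2 = ||e||^2 / (n - p) is unbiased.
rng = np.random.default_rng(11)
n, p, sigma = 30, 4, 0.5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

estimates = []
for _ in range(5000):
    eps = sigma * rng.normal(size=n)
    y = X @ beta + eps
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    estimates.append(e @ e / (n - p))     # one realization of sigma_hat^2
sigma2_hat_mean = np.mean(estimates)
```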
Proof

the first expression is immediate: $\mathbf{E}\,\|\epsilon\|^2 = \sum_{i=1}^n \mathbf{E}\,\epsilon_i^2 = n\sigma^2$

to show that $\mathbf{E}\,\|X(\hat\beta - \beta)\|^2 = p\sigma^2$, first note that $X(\hat\beta - \beta) = XX^\dagger y - X\beta = XX^\dagger (X\beta + \epsilon) - X\beta = XX^\dagger \epsilon = X(X^TX)^{-1}X^T\epsilon$

on line 3 we used $X^\dagger X = I$ (however, note that $XX^\dagger \ne I$ if X is tall!)

the squared norm of $X(\hat\beta - \beta)$ is $\|X(\hat\beta - \beta)\|^2 = \epsilon^T (XX^\dagger)^2 \epsilon = \epsilon^T XX^\dagger \epsilon$; the first step uses symmetry of $XX^\dagger$; the second step, $X^\dagger X = I$

Least squares data fitting 9-39
the expected value of the squared norm is

$\mathbf{E}\,\|X(\hat\beta - \beta)\|^2 = \mathbf{E}\left( \epsilon^T XX^\dagger \epsilon \right) = \sum_{i,j} \mathbf{E}(\epsilon_i\epsilon_j)(XX^\dagger)_{ij} = \sigma^2 \sum_{i=1}^n (XX^\dagger)_{ii} = \sigma^2 \sum_{i=1}^n \sum_{j=1}^p X_{ij} (X^\dagger)_{ji} = \sigma^2 \sum_{j=1}^p (X^\dagger X)_{jj} = p\sigma^2$

the expression $\mathbf{E}\,\|e\|^2 = (n-p)\sigma^2$ on page 9-38 now follows from $\|\epsilon\|^2 = \|e + X\hat\beta - X\beta\|^2 = \|e\|^2 + \|X(\hat\beta - \beta)\|^2$

Least squares data fitting 9-40
Linear estimator

the linear regression model (p. 9-34), with the same assumptions as before (p. 9-35): $y = X\beta + \epsilon$

- a linear estimator of $\beta$ maps the observations y to the estimate $\hat\beta = By$
- the estimator is defined by the p × n matrix B
- the least squares estimator is an example, with $B = X^\dagger$

Least squares data fitting 9-41
Unbiased linear estimator

if B is a left inverse of X, then the estimator $\hat\beta = By$ can be written as $\hat\beta = By = B(X\beta + \epsilon) = \beta + B\epsilon$

- this shows that the linear estimator is unbiased ($\mathbf{E}\,\hat\beta = \beta$) if $BX = I$
- the covariance matrix of an unbiased linear estimator is $\mathbf{E}\left( (\hat\beta - \beta)(\hat\beta - \beta)^T \right) = \mathbf{E}\left( B\epsilon\epsilon^T B^T \right) = \sigma^2 BB^T$
- if c is a (non-random) p-vector, then the estimate $c^T\hat\beta$ of $c^T\beta$ has variance $\mathbf{E}\,(c^T\hat\beta - c^T\beta)^2 = \sigma^2 c^T BB^T c$
- the least squares estimator is an example, with $B = X^\dagger$ and $BB^T = (X^TX)^{-1}$

Least squares data fitting 9-42
Best linear unbiased estimator

if B is a left inverse of X, then for all p-vectors c, $c^T BB^T c \ge c^T (X^TX)^{-1} c$ (proof on next page)

- the left-hand side gives the variance of $c^T\hat\beta$ for the linear unbiased estimator $\hat\beta = By$
- the right-hand side gives the variance of $c^T\hat\beta_{\text{ls}}$ for the least squares estimator $\hat\beta_{\text{ls}} = X^\dagger y$
- the least squares estimator is the best linear unbiased estimator (BLUE); this is known as the Gauss-Markov theorem

Least squares data fitting 9-43
Proof

use $BX = I$ to write $BB^T$ as $BB^T = (B - (X^TX)^{-1}X^T)(B^T - X(X^TX)^{-1}) + (X^TX)^{-1} = (B - X^\dagger)(B - X^\dagger)^T + (X^TX)^{-1}$

hence $c^T BB^T c = c^T (B - X^\dagger)(B - X^\dagger)^T c + c^T (X^TX)^{-1} c = \|(B - X^\dagger)^T c\|^2 + c^T (X^TX)^{-1} c \ge c^T (X^TX)^{-1} c$

with equality if $B = X^\dagger$

Least squares data fitting 9-44
More informationECON 5350 Class Notes Functional Form and Structural Change
ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this
More informationLinear Models. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Linear Models DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Linear regression Least-squares estimation
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More informationThe outline for Unit 3
The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter
More informationThe linear model is the most fundamental of all serious statistical models encompassing:
Linear Regression Models: A Bayesian perspective Ingredients of a linear model include an n 1 response vector y = (y 1,..., y n ) T and an n p design matrix (e.g. including regressors) X = [x 1,..., x
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationThe regression model with one fixed regressor cont d
The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8
More informationSupport Vector Machines
Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More informationHOMEWORK #4: LOGISTIC REGRESSION
HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2019 Due: 11am Monday, February 25th, 2019 Submit scan of plots/written responses to Gradebook; submit your
More informationThe Simple Regression Model. Part II. The Simple Regression Model
Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square
More informationBusiness Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata'
Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Model Evaluation and Selection Predictive Ability of a Model: Denition and Estimation We aim at achieving a balance between parsimony
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information4 Multiple Linear Regression
4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ
More informationFIRST MIDTERM EXAM ECON 7801 SPRING 2001
FIRST MIDTERM EXAM ECON 780 SPRING 200 ECONOMICS DEPARTMENT, UNIVERSITY OF UTAH Problem 2 points Let y be a n-vector (It may be a vector of observations of a random variable y, but it does not matter how
More informationLinear Regression Communication, skills, and understanding Calculator Use
Linear Regression Communication, skills, and understanding Title, scale and label the horizontal and vertical axes Comment on the direction, shape (form), and strength of the relationship and unusual features
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationSection 3: Simple Linear Regression
Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationEstimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27
Estimation of the Response Mean Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 27 The Gauss-Markov Linear Model y = Xβ + ɛ y is an n random vector of responses. X is an n p matrix
More informationChapter 1. Linear Regression with One Predictor Variable
Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationWISE International Masters
WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are
More information6.867 Machine learning: lecture 2. Tommi S. Jaakkola MIT CSAIL
6.867 Machine learning: lecture 2 Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learning problem hypothesis class, estimation algorithm loss and estimation criterion sampling, empirical and
More informationThis model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that
Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear
More information1 Introduction to Generalized Least Squares
ECONOMICS 7344, Spring 2017 Bent E. Sørensen April 12, 2017 1 Introduction to Generalized Least Squares Consider the model Y = Xβ + ɛ, where the N K matrix of regressors X is fixed, independent of the
More informationPanel Data Models. James L. Powell Department of Economics University of California, Berkeley
Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel
More informationMultiple Linear Regression
Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More information4. Nonlinear regression functions
4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change
More informationBivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationPractical Econometrics. for. Finance and Economics. (Econometrics 2)
Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationInverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1
Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is
More information