Linear regression. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda



2 Linear models; Least-squares estimation; Overfitting; Example: Global warming

3 Regression
The aim is to learn a function $h$ that relates a response, or dependent variable, $y$ to several observed variables $x_1, x_2, \ldots, x_p$, known as covariates, features or independent variables. The response is assumed to be of the form
$$y = h(\vec{x}) + z,$$
where $\vec{x} \in \mathbb{R}^p$ contains the features and $z$ is noise.

4 Linear regression
The regression function $h$ is assumed to be linear:
$$y^{(i)} = (\vec{x}^{(i)})^T \vec{\beta} + z^{(i)}, \quad 1 \le i \le n.$$
Our aim is to estimate $\vec{\beta} \in \mathbb{R}^p$ from the data.

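5 Linear regression
In matrix form:
$$\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix} = \begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & \cdots & x^{(1)}_p \\ x^{(2)}_1 & x^{(2)}_2 & \cdots & x^{(2)}_p \\ \vdots & \vdots & \ddots & \vdots \\ x^{(n)}_1 & x^{(n)}_2 & \cdots & x^{(n)}_p \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} z^{(1)} \\ z^{(2)} \\ \vdots \\ z^{(n)} \end{bmatrix}$$
Equivalently, $\vec{y} = X\vec{\beta} + \vec{z}$.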
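A minimal sketch in plain Python (no external libraries) checking that the matrix form and the row-by-row form give the same responses. The sizes n = 5 and p = 3, the seed and the noise scale are hypothetical choices for illustration, not values from the slides.

```python
import random

def matvec(X, v):
    """Return the matrix-vector product X v, with X stored as a list of rows."""
    return [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in X]

rng = random.Random(0)  # hypothetical seed
n, p = 5, 3             # hypothetical sizes
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]  # row i is x^(i)
beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
z = [rng.gauss(0.0, 0.1) for _ in range(n)]  # made-up noise scale 0.1

# Matrix form: y = X beta + z
y = [xb + z_i for xb, z_i in zip(matvec(X, beta), z)]

# Row-by-row form: y^(i) = (x^(i))^T beta + z^(i), 1 <= i <= n
y_rows = [sum(X[i][j] * beta[j] for j in range(p)) + z[i] for i in range(n)]

print(max(abs(a - b) for a, b in zip(y, y_rows)))
```

Each entry of $X\vec{\beta}$ is the inner product of one row of $X$ with $\vec{\beta}$, which is why the two computations agree.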

6 Linear model for GDP
Features: population and unemployment rate (%). Response: GDP (USD millions). Data are available for California, Minnesota, Oregon, Nevada, Idaho and Alaska; the GDP of South Carolina is unknown (???).

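7 Linear model for GDP
After normalizing the features and the response, the data are arranged into a response vector $\vec{y}$ and a feature matrix $X$ (one row per state, one column per feature).
Aim: find $\vec{\beta} \in \mathbb{R}^2$ such that $\vec{y} \approx X\vec{\beta}$.
The estimate for the GDP of South Carolina will be $\vec{x}_{\text{sc}}^T \vec{\beta}$.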

8 Least-squares estimation

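9 Least squares
For fixed $\vec{\beta}$ we can evaluate the error using
$$\sum_{i=1}^{n} \left( y^{(i)} - (\vec{x}^{(i)})^T \vec{\beta} \right)^2 = \left\| \vec{y} - X\vec{\beta} \right\|_2^2.$$
The least-squares estimate $\vec{\beta}_{\text{LS}}$ minimizes this cost function:
$$\vec{\beta}_{\text{LS}} := \arg\min_{\vec{\beta}} \left\| \vec{y} - X\vec{\beta} \right\|_2.$$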
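Setting the gradient of $\|\vec{y} - X\vec{\beta}\|_2^2$ to zero gives the normal equations $X^T X \vec{\beta} = X^T \vec{y}$. The sketch below solves them with Gaussian elimination in plain Python, assuming the columns of $X$ are linearly independent; the helper names `solve` and `least_squares` and the toy data are hypothetical. Forming $X^T X$ squares the condition number, so in practice library routines such as NumPy's `numpy.linalg.lstsq`, which rely on more stable factorizations, are preferable.

```python
def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(m):
        # Partial pivoting: move the row with the largest |entry| in this column up.
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, m))
        w[r] = (M[r][m] - s) / M[r][r]
    return w

def least_squares(X, y):
    """Least-squares estimate via the normal equations X^T X beta = X^T y.

    Assumes the columns of X are linearly independent (X^T X invertible)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Toy data: an intercept column and one feature, with y = 1 + 2x exactly (no noise).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = least_squares(X, y)
print([round(b, 6) for b in beta])  # [1.0, 2.0]
```

On noiseless data the residual can be driven to zero, so the estimate recovers the intercept 1 and slope 2; with noisy data it returns the $\vec{\beta}$ minimizing the squared residual instead.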

10 Least-squares fit
[Figure: data and least-squares fit; axes $x$ and $y$.]

11 Linear model for GDP
The least-squares estimate $\vec{\beta}_{\text{LS}}$ shows that GDP is roughly proportional to the population, while unemployment doesn't help (at least not linearly).

12 Linear model for GDP
Table: GDP and its least-squares estimate for California, Minnesota, Oregon, Nevada, Idaho, Alaska and South Carolina.

13 Geometric interpretation
Any vector $X\vec{\beta}$ is in the span of the columns of $X$. The least-squares fit $X\vec{\beta}_{\text{LS}}$ is the closest vector to $\vec{y}$ that can be represented in this way, i.e. the projection of $\vec{y}$ onto the column space of $X$.
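For a single feature ($p = 1$) the column space of $X$ is a line and the projection has the closed form $\beta_{\text{LS}} = \langle \vec{x}, \vec{y}\rangle / \langle \vec{x}, \vec{x}\rangle$. This small sketch, using made-up numbers, checks that the residual $\vec{y} - \beta_{\text{LS}}\vec{x}$ is orthogonal to the column, which is what singles out the projection among all points on the line.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Made-up single feature column x (p = 1) and response y.
x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 4.0]

beta_ls = dot(x, y) / dot(x, x)              # closed form for one column
fitted = [beta_ls * xi for xi in x]          # projection of y onto span{x}
residual = [yi - fi for yi, fi in zip(y, fitted)]

print(beta_ls, dot(residual, x))             # second value is zero up to rounding
```

For general $p$ the same orthogonality holds for every column of $X$: it is exactly the condition $X^T(\vec{y} - X\vec{\beta}_{\text{LS}}) = \vec{0}$.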

14 Geometric interpretation

15 Probabilistic interpretation
We model the noise as an iid Gaussian random vector $\vec{Z}$ whose entries have zero mean and variance $\sigma^2$. The data are a realization of the random vector $\vec{Y} := X\vec{\beta} + \vec{Z}$, so $\vec{Y}$ is Gaussian with mean $X\vec{\beta}$ and covariance matrix $\sigma^2 I$.

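16 Likelihood
The joint pdf of $\vec{Y}$ is
$$f_{\vec{Y}}(\vec{a}) := \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} \left( a_i - (X\vec{\beta})_i \right)^2 \right) = \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \left\| \vec{a} - X\vec{\beta} \right\|_2^2 \right).$$
The likelihood is
$$\mathcal{L}_{\vec{y}}(\vec{\beta}) = \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \left\| \vec{y} - X\vec{\beta} \right\|_2^2 \right).$$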

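17 Maximum-likelihood estimate
The maximum-likelihood estimate is
$$\vec{\beta}_{\text{ML}} = \arg\max_{\vec{\beta}} \mathcal{L}_{\vec{y}}(\vec{\beta}) = \arg\max_{\vec{\beta}} \log \mathcal{L}_{\vec{y}}(\vec{\beta}) = \arg\min_{\vec{\beta}} \left\| \vec{y} - X\vec{\beta} \right\|_2^2 = \vec{\beta}_{\text{LS}}.$$
The logarithm is increasing, so it does not change the maximizer, and $\log \mathcal{L}_{\vec{y}}(\vec{\beta}) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\|\vec{y} - X\vec{\beta}\|_2^2$, in which only the last term depends on $\vec{\beta}$.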
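A numerical check of this argument, assuming a made-up one-feature data set and noise level $\sigma = 0.5$: the Gaussian log-likelihood differs from $-\|\vec{y} - X\vec{\beta}\|_2^2 / (2\sigma^2)$ only by a constant, so it peaks at the least-squares estimate. The function names `rss` and `log_likelihood` are illustrative.

```python
import math

def rss(beta, X, y):
    """Residual sum of squares ||y - X beta||_2^2 (X stored as a list of rows)."""
    return sum((yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
               for row, yi in zip(X, y))

def log_likelihood(beta, X, y, sigma):
    """log L_y(beta) for the model Y = X beta + Z, with Z iid N(0, sigma^2)."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi) - n * math.log(sigma)
            - rss(beta, X, y) / (2 * sigma ** 2))

# Made-up data with a single feature (p = 1) and an assumed noise level.
X = [[1.0], [2.0], [3.0]]
y = [2.0, 1.0, 4.0]
sigma = 0.5

x_col = [row[0] for row in X]
beta_ls = [sum(a * b for a, b in zip(x_col, y)) / sum(a * a for a in x_col)]

# Moving away from beta_ls increases the RSS and decreases the log-likelihood.
for delta in (-0.2, -0.05, 0.0, 0.05, 0.2):
    b = [beta_ls[0] + delta]
    print(f"beta={b[0]:.4f}  RSS={rss(b, X, y):.4f}  logL={log_likelihood(b, X, y, sigma):.4f}")
```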

18 Overfitting

19 Temperature predictor
A friend tells you: "I found a cool way to predict the temperature in New York: it's just a linear combination of the temperatures in every other state. I fit the model on data from the last month and a half and it's perfect!"

20 Overfitting
If a model is very complex, it may overfit the data. To evaluate a model, we separate the data into a training set and a test set:
1. We fit the model using the training set.
2. We evaluate the error on the test set.

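21 Experiment
$X_{\text{train}}$, $X_{\text{test}}$, $\vec{z}_{\text{train}}$ and $\vec{\beta}$ are iid Gaussian with mean 0 and variance 1, and
$$\vec{y}_{\text{train}} = X_{\text{train}}\vec{\beta} + \vec{z}_{\text{train}}, \qquad \vec{y}_{\text{test}} = X_{\text{test}}\vec{\beta}.$$
We use $\vec{y}_{\text{train}}$ and $X_{\text{train}}$ to compute $\vec{\beta}_{\text{LS}}$, and measure
$$\text{error}_{\text{train}} = \frac{\|X_{\text{train}}\vec{\beta}_{\text{LS}} - \vec{y}_{\text{train}}\|_2}{\|\vec{y}_{\text{train}}\|_2}, \qquad \text{error}_{\text{test}} = \frac{\|X_{\text{test}}\vec{\beta}_{\text{LS}} - \vec{y}_{\text{test}}\|_2}{\|\vec{y}_{\text{test}}\|_2}.$$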
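A minimal sketch of this experiment in plain Python. The number of features $p = 50$, the test-set size (taken equal to $n$), the values of $n$ and the seed are assumptions, since the slide does not state them; the noise level is taken to be $\|\vec{z}_{\text{train}}\|_2 / \|\vec{y}_{\text{train}}\|_2$. The solver uses the normal equations, which assumes $n \ge p$ and $X_{\text{train}}$ of full column rank.

```python
import math
import random

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (M[r][m] - sum(M[r][c] * w[c] for c in range(r + 1, m))) / M[r][r]
    return w

def least_squares(X, y):
    """beta_LS from the normal equations X^T X beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def matvec(X, v):
    return [sum(a * c for a, c in zip(row, v)) for row in X]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def relative_error(X, beta_hat, y):
    """||X beta_hat - y||_2 / ||y||_2, as defined on the slide."""
    return norm([a - c for a, c in zip(matvec(X, beta_hat), y)]) / norm(y)

def experiment(n, p, seed=0):
    """Return (training error, test error, noise level) for one random draw."""
    rng = random.Random(seed)

    def gaussian_matrix(rows):
        return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(rows)]

    beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    X_train, X_test = gaussian_matrix(n), gaussian_matrix(n)  # test size = n (assumption)
    z_train = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y_train = [a + c for a, c in zip(matvec(X_train, beta), z_train)]
    y_test = matvec(X_test, beta)  # noiseless test responses, as on the slide
    beta_ls = least_squares(X_train, y_train)
    return (relative_error(X_train, beta_ls, y_train),
            relative_error(X_test, beta_ls, y_test),
            norm(z_train) / norm(y_train))

p = 50  # hypothetical number of features
for n in (60, 100, 200, 400):
    train, test, noise = experiment(n, p)
    print(f"n={n:3d}  train={train:.3f}  test={test:.3f}  noise level={noise:.3f}")
```

The training residual is the projection of $\vec{z}_{\text{train}}$ onto the orthogonal complement of the column space of $X_{\text{train}}$, so the training error can never exceed the noise level; it is the test error that reveals overfitting when $n$ is close to $p$.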

22 Experiment
[Figure: relative error ($\ell_2$ norm) versus $n$ for the training error, the test error and the noise level in the training data.]

23 Example: Global warming

24 Maximum temperatures in Oxford, UK
[Figure: temperature (Celsius).]

25 Maximum temperatures in Oxford, UK
[Figure: temperature (Celsius).]

26 Linear model
$$y_t \approx \beta_0 + \beta_1 \cos\left(\frac{2\pi t}{12}\right) + \beta_2 \sin\left(\frac{2\pi t}{12}\right) + \beta_3 t,$$
where $1 \le t \le n$ is the time in months.
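The model is linear in $\vec{\beta}$ even though it is nonlinear in $t$, so it can be fitted by least squares once each month $t$ is mapped to the feature vector $[1, \cos(2\pi t/12), \sin(2\pi t/12), t]$. The sketch below builds that design matrix and fits noiseless synthetic data; $n = 120$ and the coefficients in `beta_true` are made up for illustration and are not the Oxford data. The solver uses the normal equations.

```python
import math

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (M[r][m] - sum(M[r][c] * w[c] for c in range(r + 1, m))) / M[r][r]
    return w

def least_squares(X, y):
    """beta_LS from the normal equations X^T X beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def temperature_features(t):
    """Design-matrix row for month t: [1, cos(2 pi t / 12), sin(2 pi t / 12), t]."""
    angle = 2 * math.pi * t / 12
    return [1.0, math.cos(angle), math.sin(angle), float(t)]

n = 120  # hypothetical: ten years of monthly data
X = [temperature_features(t) for t in range(1, n + 1)]

# Made-up coefficients, used only to generate noiseless synthetic data.
beta_true = [10.0, -6.0, -2.0, 0.005]
y = [sum(a * b for a, b in zip(row, beta_true)) for row in X]

beta_hat = least_squares(X, y)
print([round(b, 4) for b in beta_hat])
```

The cosine and sine terms with period 12 capture the yearly cycle, and $\beta_3$ captures the long-term linear trend.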

27 Model fitted by least squares
[Figure: data and model; temperature (Celsius).]

28 Model fitted by least squares
[Figure: data and model; temperature (Celsius).]

29 Model fitted by least squares
[Figure: data and model; temperature (Celsius).]

30 Trend: increase of 0.75 °C per 100 years (1.35 °F)
[Figure: data and trend; temperature (Celsius).]
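The trend is the coefficient $\beta_3$ of $t$. Since $t$ is measured in months, a slope in °C per month is multiplied by 1200 to give °C per 100 years, and a temperature difference converts to Fahrenheit with the factor 9/5 alone (the +32 offset applies only to absolute temperatures). A small arithmetic sketch with hypothetical helper names:

```python
def trend_per_century(beta3_per_month):
    """Convert a slope in degrees C per month (t in months) to degrees C per 100 years."""
    return beta3_per_month * 12 * 100

def celsius_change_to_fahrenheit(delta_c):
    """Convert a temperature *difference*: scale by 9/5, no +32 offset."""
    return delta_c * 9 / 5

print(trend_per_century(0.75 / 1200))                # a slope of 0.000625 C/month
print(round(celsius_change_to_fahrenheit(0.75), 2))  # 1.35
print(round(celsius_change_to_fahrenheit(0.88), 2))  # 1.58
```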

31 Model for minimum temperatures
[Figure: data and model; temperature (Celsius).]

32 Model for minimum temperatures
[Figure: data and model; temperature (Celsius).]

33 Model for minimum temperatures
[Figure: data and model; temperature (Celsius).]

34 Trend: increase of 0.88 °C per 100 years (1.58 °F)
[Figure: data and trend; temperature (Celsius).]
