Least Squares Data Fitting


1 Least Squares Data Fitting
Stephen Boyd
EE103 Stanford University
October 31, 2017

2 Outline
Least squares model fitting
Validation
Feature engineering

3 Setup
we believe a scalar y and an n-vector x are related by the model y ≈ f(x)
x is called the independent variable
y is called the outcome or response variable
f : R^n → R gives the relation between x and y
often x is a feature vector, and y is something we want to predict
we don't know f, which gives the true relationship between x and y

4 Data
we are given some data x^(1), ..., x^(N), y^(1), ..., y^(N)
also called observations, examples, samples, or measurements
(x^(i), y^(i)) is the ith data pair
x^(i)_j is the jth component of the ith data point x^(i)

5 Model
choose model ˆf : R^n → R, a guess or approximation of f
linear in the parameters model form: ˆf(x) = θ_1 f_1(x) + ··· + θ_p f_p(x)
f_i : R^n → R are basis functions that we choose
θ_i are model parameters that we choose
ŷ^(i) = ˆf(x^(i)) is (the model's) prediction of y^(i)
we'd like ŷ^(i) ≈ y^(i), i.e., model is consistent with observed data

6 Least squares data fitting
prediction error or residual is r^(i) = y^(i) - ŷ^(i)
least squares data fitting: choose model parameters θ_i to minimize RMS prediction error on data set,
( ((r^(1))^2 + ··· + (r^(N))^2) / N )^{1/2}
this can be formulated (and solved) as a least squares problem

7 Least squares data fitting
express y^(i), ŷ^(i), and r^(i) as N-vectors:
y_d = (y^(1), ..., y^(N)) is vector of outcomes
ŷ_d = (ŷ^(1), ..., ŷ^(N)) is vector of predictions
r_d = (r^(1), ..., r^(N)) is vector of residuals
rms(r_d) is RMS prediction error
define N × p matrix A with A_ij = f_j(x^(i)), so ŷ_d = Aθ
least squares data fitting: choose θ to minimize
||r_d||^2 = ||y_d - ŷ_d||^2 = ||y_d - Aθ||^2 = ||Aθ - y_d||^2
ˆθ = (A^T A)^{-1} A^T y_d (if columns of A are independent)
||Aˆθ - y_d||^2 / N is minimum mean-square (fitting) error
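as a concrete illustration, here is a minimal NumPy sketch of the procedure above, on hypothetical data with hand-picked basis functions (the data, seed, and basis choices are illustrative, not from the slides):

```python
import numpy as np

# hypothetical data: N pairs (x^(i), y^(i)) with scalar x
np.random.seed(0)
N = 100
x = np.random.uniform(-1, 1, N)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * np.random.randn(N)

# basis functions we choose (here: 1, x, x^2)
basis = [lambda u: np.ones_like(u), lambda u: u, lambda u: u**2]

# N x p matrix A with A_ij = f_j(x^(i))
A = np.column_stack([f(x) for f in basis])

# least squares fit: minimize ||A theta - y_d||^2
# (lstsq is numerically preferable to forming (A^T A)^{-1} A^T y_d)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

yhat = A @ theta                              # predictions ŷ_d
rms_error = np.sqrt(np.mean((y - yhat)**2))   # RMS prediction error
```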

8 Fitting a constant model
simplest possible model: p = 1, f_1(x) = 1, so model ˆf(x) = θ_1 is a constant function
A = 1 (the N-vector of ones), so
ˆθ_1 = (1^T 1)^{-1} 1^T y_d = (1/N) 1^T y_d = avg(y_d)
the mean of y^(1), ..., y^(N) is the least squares fit by a constant
MMSE is std(y_d)^2; RMS error is std(y_d)
more sophisticated models are judged against the constant model
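a quick numerical check of these facts, reusing y from the sketch above:

```python
# constant model: A is the N-vector of ones
A1 = np.ones((N, 1))
theta1, *_ = np.linalg.lstsq(A1, y, rcond=None)

assert np.isclose(theta1[0], y.mean())   # least squares constant fit = mean
rms = np.sqrt(np.mean((y - theta1[0])**2))
assert np.isclose(rms, y.std())          # RMS error = std(y_d)
```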

9 Fitting univariate functions
when n = 1, we seek to approximate a function f : R → R
we can plot the data (x^(i), y^(i)) and the model function ŷ = ˆf(x)

10 Straight-line fit
p = 2, with f_1(x) = 1, f_2(x) = x
model has form ˆf(x) = θ_1 + θ_2 x
matrix A has form
A = [ 1  x^(1)
      1  x^(2)
      ...
      1  x^(N) ]
can work out ˆθ_1 and ˆθ_2 explicitly:
ˆf(x) = avg(y_d) + ρ (std(y_d)/std(x_d)) (x - avg(x_d))
where x_d = (x^(1), ..., x^(N)) and ρ is the correlation coefficient of x_d and y_d
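a short check (same hypothetical data as before) that the closed-form expressions match the generic least squares solution:

```python
# straight-line fit via generic least squares
A2 = np.column_stack([np.ones(N), x])
theta2, *_ = np.linalg.lstsq(A2, y, rcond=None)

# closed form: slope = rho * std(y)/std(x), intercept from the means
rho = np.corrcoef(x, y)[0, 1]
slope = rho * y.std() / x.std()
intercept = y.mean() - slope * x.mean()

assert np.allclose([intercept, slope], theta2)
```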

11 Example
[plot: data points and straight-line fit ˆf(x) versus x]

12 Asset α and β
x is return of whole market, y is return of a particular asset
write straight-line model as ŷ = (r_rf + α) + β(x - μ_mkt)
μ_mkt is the average market return
r_rf is the risk-free interest rate
several other slightly different definitions are used
called asset α and β; widely used

13 Time series trend
y^(i) is value of quantity at time x^(i) = i
ŷ^(i) = ˆθ_1 + ˆθ_2 i, i = 1, ..., N, is called the trend line
y_d - ŷ_d is called the de-trended time series
ˆθ_2 is the trend coefficient

14 World petroleum consumption
[plot: petroleum consumption (million barrels per day) versus year]

15 World petroleum consumption, de-trended
[plot: de-trended petroleum consumption (million barrels per day) versus year]

16 Polynomial fit
f_i(x) = x^{i-1}, i = 1, ..., p
model is a polynomial of degree less than p:
ˆf(x) = θ_1 + θ_2 x + ··· + θ_p x^{p-1}
(here x^i means scalar x to the ith power; x^(i) is the ith data point)
A is the Vandermonde matrix
A = [ 1  x^(1)  ···  (x^(1))^{p-1}
      1  x^(2)  ···  (x^(2))^{p-1}
      ...
      1  x^(N)  ···  (x^(N))^{p-1} ]
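a sketch of the polynomial fit using NumPy's Vandermonde constructor (degree p - 1 = 3 is an arbitrary choice here, on the same hypothetical data):

```python
p = 4  # number of coefficients, so polynomial degree is p - 1

# Vandermonde matrix: row i is (1, x^(i), ..., (x^(i))^{p-1})
A_poly = np.vander(x, p, increasing=True)
theta_poly, *_ = np.linalg.lstsq(A_poly, y, rcond=None)

# evaluate the fitted polynomial on a grid (e.g., for plotting)
xg = np.linspace(x.min(), x.max(), 200)
yg = np.vander(xg, p, increasing=True) @ theta_poly
```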

17 Example
N = 100 data points
[four plots: polynomial fits ˆf(x) of degree 2, 6, 10, and 15]

18 Regression as general data fitting
regression model is affine function ŷ = ˆf(x) = x^T β + v
fits general fitting form with basis functions f_1(x) = 1, f_i(x) = x_{i-1}, i = 2, ..., n + 1
so model is ŷ = θ_1 + θ_2 x_1 + ··· + θ_{n+1} x_n = x^T θ_{2:n+1} + θ_1
β = θ_{2:n+1}, v = θ_1

19 General data fitting as regression
general fitting model ˆf(x) = θ_1 f_1(x) + ··· + θ_p f_p(x)
common assumption: f_1(x) = 1
same as regression model ˆf(x̃) = x̃^T β + v, where
x̃ = (f_2(x), ..., f_p(x)) are the transformed features
v = θ_1, β = θ_{2:p}

20 Auto-regressive time series model
time series z_1, z_2, ...
auto-regressive (AR) prediction model:
ẑ_{t+1} = θ_1 z_t + ··· + θ_M z_{t-M+1}, t = M, M + 1, ...
M is the memory of the model
ẑ_{t+1} is prediction of next value, based on previous M values
we'll choose θ to minimize sum of squares of prediction errors,
(ẑ_{M+1} - z_{M+1})^2 + ··· + (ẑ_T - z_T)^2
put in general form with y^(i) = z_{M+i}, x^(i) = (z_{M+i-1}, ..., z_i), i = 1, ..., T - M
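a sketch of fitting an AR model by least squares on a hypothetical series z (a random walk stands in for real data such as the temperatures on the next slide):

```python
# hypothetical time series z_1, ..., z_T (0-indexed in code)
T = 500
z = 20.0 + np.cumsum(np.random.randn(T))

M = 8  # memory of the model

# row i holds x^(i) = (z_{M+i-1}, ..., z_i); target y^(i) = z_{M+i}
A_ar = np.column_stack([z[M - 1 - k : T - 1 - k] for k in range(M)])
y_ar = z[M:]

theta_ar, *_ = np.linalg.lstsq(A_ar, y_ar, rcond=None)
rms = np.sqrt(np.mean((A_ar @ theta_ar - y_ar)**2))
```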

21 Example
hourly temperature at LAX in May 2016, length 744
average is … °F, standard deviation 3.05 °F
predictor ẑ_{t+1} = z_t gives RMS error 1.16 °F
predictor ẑ_{t+1} = z_{t-23} gives RMS error 1.73 °F
AR model with M = 8 gives RMS error 0.98 °F

22 Example
[plot: temperature (°F) versus hour k; solid line shows one-hour ahead predictions from AR model, first 5 days]

23 Outline
Least squares model fitting
Validation
Feature engineering

24 Generalization
basic idea: goal of model is not to predict outcome for the given data
instead it is to predict the outcome on new, unseen data
a model that makes reasonable predictions on new, unseen data has generalization ability, or generalizes
a model that makes poor predictions on new, unseen data is said to suffer from over-fit

25 Validation
a simple and effective method to guess whether a model will generalize:
split original data into a training set and a test set
typical splits: 80%/20%, 90%/10%
build ('train') model on training data set
then check the model's predictions on the test data set
(can also compare RMS prediction error on train and test data)
if they are similar, we can guess the model will generalize
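a minimal sketch of such a split with plain NumPy, reusing the data matrix A and outcomes y from the earlier sketch:

```python
# random 80%/20% split of the data
perm = np.random.permutation(N)
n_train = int(0.8 * N)
train, test = perm[:n_train], perm[n_train:]

# train on the training set only
theta_tr, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

rms_train = np.sqrt(np.mean((A[train] @ theta_tr - y[train])**2))
rms_test = np.sqrt(np.mean((A[test] @ theta_tr - y[test])**2))
# similar rms_train and rms_test suggest the model generalizes
```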

26 Validation
can be used to choose among different candidate models, e.g.,
polynomials of different degrees
regression models with different sets of regressors
we'd use the one with low, or lowest, test error

27 Example
polynomials fit using training set of 100 points
plots below show performance on a test set of 100 points
[four plots: ˆf(x) of degree 2, 6, 10, and 15]

28 Example
suggests degree 4, 5, or 6 are reasonable choices
[plot: relative RMS error versus degree, on the data (training) set and the test set]

29 Cross validation
to carry out cross validation: divide data into 10 folds
for i = 1, ..., 10, build (train) model using all folds except i
test model on data in fold i
interpreting cross validation results:
if test RMS errors are much larger than train RMS errors, model is over-fit
if test and train RMS errors are similar and consistent, we can guess the model will have a similar RMS error on future data
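a sketch of 10-fold cross validation in the same NumPy style:

```python
# split shuffled indices into 10 folds
folds = np.array_split(np.random.permutation(N), 10)

for i, fold in enumerate(folds):
    train = np.setdiff1d(np.arange(N), fold)  # all folds except i
    th, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    rms_tr = np.sqrt(np.mean((A[train] @ th - y[train])**2))
    rms_te = np.sqrt(np.mean((A[fold] @ th - y[fold])**2))
    print(f"fold {i + 1}: train RMS {rms_tr:.3f}, test RMS {rms_te:.3f}")
```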

30 Example
house price, regression fit with x = (area/1000 ft², bedrooms)
774 sales, divided into 5 folds (four of 155 sales, one of 154)
fit 5 regression models, removing each fold
[table: model parameters v, β_1, β_2 and train/test RMS errors for each fold]

31 Outline
Least squares model fitting
Validation
Feature engineering

32 Feature engineering
start with original or base feature n-vector x
choose basis functions f_1, ..., f_p to create mapped feature p-vector (f_1(x), ..., f_p(x))
now fit linear in parameters model with mapped features:
ŷ = θ_1 f_1(x) + ··· + θ_p f_p(x)
check the model using validation

33 Transforming features
standardizing features: replace x_i with (x_i - b_i)/a_i
b_i is the mean value of the feature across the data
a_i is the standard deviation of the feature across the data
new features are called z-scores
log transform: if x_i is nonnegative and spans a wide range, replace it with log(1 + x_i)
hi and lo features: create new features given by max{x_i - b, 0}, min{x_i - a, 0}
(called hi and lo versions of original feature x_i)
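a sketch of these transforms on a hypothetical feature matrix X (rows are data points, columns are base features; the knot values a, b in the hi/lo features are made up):

```python
# hypothetical nonnegative base features, one per column
X = np.abs(np.random.randn(N, 3)) * np.array([1.0, 10.0, 1000.0])

# standardizing: z-scores, with b = per-feature mean, a = per-feature std
b = X.mean(axis=0)
a = X.std(axis=0)
Z = (X - b) / a

# log transform for a nonnegative feature spanning a wide range
x_log = np.log1p(X[:, 2])  # log(1 + x_i)

# hi and lo versions of a feature, with hypothetical knots a = 0.5, b = 2.0
x_hi = np.maximum(X[:, 0] - 2.0, 0)  # max{x_i - b, 0}
x_lo = np.minimum(X[:, 0] - 0.5, 0)  # min{x_i - a, 0}
```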

34 Example: house price prediction
start with base features:
x_1 is area of house (in 1000 ft²)
x_2 is number of bedrooms
x_3 is 1 for condo, 0 for house
x_4 is zip code of address (62 values)
we'll use p = 8 basis functions:
f_1(x) = 1, f_2(x) = x_1, f_3(x) = max{x_1 - 1.5, 0}
f_4(x) = x_2, f_5(x) = x_3
f_6(x), f_7(x), f_8(x) are Boolean functions of x_4 which encode 4 groups of nearby zip codes (i.e., neighborhood)
five-fold model validation

35 Example
[table: model parameters θ_1, ..., θ_8 and train/test RMS errors for each of the 5 folds]

More information