Least Squares Data Fitting
Stephen Boyd, EE103, Stanford University
October 31, 2017
Outline
- Least squares model fitting
- Validation
- Feature engineering
Setup
- we believe a scalar y and an n-vector x are related by a model y ≈ f(x)
- x is called the independent variable
- y is called the outcome or response variable
- f : R^n → R gives the relation between x and y
- often x is a feature vector, and y is something we want to predict
- we don't know f, which gives the true relationship between x and y
Data
- we are given some data x^(1), ..., x^(N), y^(1), ..., y^(N)
- also called observations, examples, samples, or measurements
- (x^(i), y^(i)) is the ith data pair
- x_j^(i) is the jth component of the ith data point x^(i)
Model
- choose a model f̂ : R^n → R, a guess or approximation of f
- linear in the parameters model form: f̂(x) = θ_1 f_1(x) + ··· + θ_p f_p(x)
- f_i : R^n → R are basis functions that we choose
- θ_i are model parameters that we choose
- ŷ^(i) = f̂(x^(i)) is (the model's) prediction of y^(i)
- we'd like ŷ^(i) ≈ y^(i), i.e., the model is consistent with the observed data
Least squares data fitting
- prediction error or residual is r^(i) = y^(i) − ŷ^(i)
- least squares data fitting: choose model parameters θ_i to minimize the RMS prediction error on the data set,
  ( ((r^(1))^2 + ··· + (r^(N))^2) / N )^(1/2)
- this can be formulated (and solved) as a least squares problem
Least squares data fitting
- express y^(i), ŷ^(i), and r^(i) as N-vectors:
  - y^d = (y^(1), ..., y^(N)) is the vector of outcomes
  - ŷ^d = (ŷ^(1), ..., ŷ^(N)) is the vector of predictions
  - r^d = (r^(1), ..., r^(N)) is the vector of residuals
- rms(r^d) is the RMS prediction error
- define the N × p matrix A with A_ij = f_j(x^(i)), so ŷ^d = Aθ
- least squares data fitting: choose θ to minimize
  ‖r^d‖² = ‖y^d − ŷ^d‖² = ‖y^d − Aθ‖² = ‖Aθ − y^d‖²
- θ̂ = (A^T A)^(−1) A^T y^d (if the columns of A are independent)
- ‖Aθ̂ − y^d‖²/N is the minimum mean-square (fitting) error
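As a concrete sketch, the fit above can be computed in a few lines of NumPy. The data and the two basis functions (constant and identity) below are made up for illustration, not taken from the slides:

```python
import numpy as np

# illustrative data: N = 20 points generated from y = 1 + 2x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(20)

# build the N x p matrix A with A[i, j] = f_j(x^(i)); here f_1(x) = 1, f_2(x) = x
A = np.column_stack([np.ones_like(x), x])

# choose theta to minimize ||A theta - y||^2
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

# minimum mean-square fitting error ||A theta_hat - y||^2 / N
mmse = np.sum((A @ theta - y) ** 2) / len(y)
```

`np.linalg.lstsq` solves the same problem as the formula θ̂ = (A^T A)^(−1) A^T y^d, but via a factorization that is numerically more reliable than forming A^T A explicitly.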
Fitting a constant model
- simplest possible model: p = 1, f_1(x) = 1, so the model f̂(x) = θ_1 is a constant function
- A = 1 (the vector of ones), so θ̂_1 = (1^T 1)^(−1) 1^T y^d = (1/N) 1^T y^d = avg(y^d)
- the mean of y^(1), ..., y^(N) is the least squares fit by a constant
- MMSE is std(y^d)²; RMS error is std(y^d)
- more sophisticated models are judged against the constant model
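A quick numerical check of these two facts, on an illustrative outcome vector:

```python
import numpy as np

y = np.array([1.0, 3.0, 2.0, 6.0])  # illustrative outcomes

# with A = 1 (a column of ones), the least squares fit is the mean of y
ones = np.ones(len(y))
theta1 = (ones @ y) / (ones @ ones)

# the RMS fitting error of the constant model equals std(y)
rms_err = np.sqrt(np.mean((y - theta1) ** 2))
```

Here `theta1` coincides with `np.mean(y)` and `rms_err` with `np.std(y)`, as the slide states.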
Fitting univariate functions
- when n = 1, we seek to approximate a function f : R → R
- we can plot the data (x^(i), y^(i)) and the model function ŷ = f̂(x)
Straight-line fit
- p = 2, with f_1(x) = 1, f_2(x) = x
- model has the form f̂(x) = θ_1 + θ_2 x
- matrix A has the form
  A = [ 1  x^(1) ]
      [ 1  x^(2) ]
      [ ⋮    ⋮   ]
      [ 1  x^(N) ]
- can work out θ̂_1 and θ̂_2 explicitly:
  f̂(x) = avg(y^d) + ρ (std(y^d)/std(x^d)) (x − avg(x^d))
  where x^d = (x^(1), ..., x^(N)) and ρ is the correlation coefficient of x^d and y^d
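The closed-form expression and the matrix solution agree exactly; a small sketch verifying this on made-up data:

```python
import numpy as np

# illustrative data from a noisy line
rng = np.random.default_rng(1)
xd = rng.standard_normal(50)
yd = 0.5 + 1.5 * xd + 0.2 * rng.standard_normal(50)

# straight-line fit via the matrix A = [1  x]
A = np.column_stack([np.ones_like(xd), xd])
theta, *_ = np.linalg.lstsq(A, yd, rcond=None)

# the same fit via the closed-form expression with the correlation coefficient rho
rho = np.corrcoef(xd, yd)[0, 1]
slope = rho * np.std(yd) / np.std(xd)
intercept = np.mean(yd) - slope * np.mean(xd)
```

Both routes give the same θ̂_1 (intercept) and θ̂_2 (slope), since ρ·std(y^d)/std(x^d) is just the sample covariance of x^d and y^d divided by the variance of x^d.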
Example
- [figure: data points and straight-line fit f̂(x)]
Asset α and β
- x is the return of the whole market, y is the return of a particular asset
- write the straight-line model as ŷ = (r^rf + α) + β(x − μ^mkt)
- μ^mkt is the average market return
- r^rf is the risk-free interest rate
- several other slightly different definitions are used
- called asset α and β; widely used
Time series trend
- y^(i) is the value of a quantity at time x^(i) = i
- ŷ^(i) = θ̂_1 + θ̂_2 i, i = 1, ..., N, is called the trend line
- y^d − ŷ^d is called the de-trended time series
- θ̂_2 is the trend coefficient
World petroleum consumption
- [figure: petroleum consumption (million barrels per day) by year, with trend line]

World petroleum consumption, de-trended
- [figure: de-trended petroleum consumption (million barrels per day) by year]
Polynomial fit
- f_i(x) = x^(i−1), i = 1, ..., p
- model is a polynomial of degree less than p:
  f̂(x) = θ_1 + θ_2 x + ··· + θ_p x^(p−1)
- (here x^i means the scalar x to the ith power; x^(i) is the ith data point)
- A is the Vandermonde matrix
  A = [ 1  x^(1)  ···  (x^(1))^(p−1) ]
      [ 1  x^(2)  ···  (x^(2))^(p−1) ]
      [ ⋮    ⋮             ⋮        ]
      [ 1  x^(N)  ···  (x^(N))^(p−1) ]
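NumPy builds this matrix directly with `np.vander`. A minimal sketch on illustrative data (the target function sin(3x) and the degree are made up for the example):

```python
import numpy as np

# illustrative data: noisy samples of sin(3x)
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 100)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(100)

p = 5  # fit a polynomial of degree p - 1 = 4
# Vandermonde matrix with columns 1, x, ..., x^(p-1)
A = np.vander(x, p, increasing=True)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

yhat = A @ theta
rms = np.sqrt(np.mean((y - yhat) ** 2))
```

Note `increasing=True` orders the columns 1, x, ..., x^(p−1) to match the slide's convention; by default `np.vander` puts the highest power first.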
Example
- N = 100 data points
- [figure: polynomial fits f̂(x) of degree 2, 6, 10, and 15]
Regression as general data fitting
- regression model is the affine function ŷ = f̂(x) = x^T β + v
- fits the general form with basis functions f_1(x) = 1, f_i(x) = x_(i−1), i = 2, ..., n + 1
- so the model is ŷ = θ_1 + θ_2 x_1 + ··· + θ_(n+1) x_n = x^T θ_(2:n+1) + θ_1
- β = θ_(2:n+1), v = θ_1
General data fitting as regression
- general fitting model f̂(x) = θ_1 f_1(x) + ··· + θ_p f_p(x)
- common assumption: f_1(x) = 1
- same as the regression model f̂(x̃) = x̃^T β + v, with
  - x̃ = (f_2(x), ..., f_p(x)), the transformed features
  - v = θ_1, β = θ_(2:p)
Auto-regressive time series model
- time series z_1, z_2, ...
- auto-regressive (AR) prediction model:
  ẑ_(t+1) = θ_1 z_t + ··· + θ_M z_(t−M+1), t = M, M + 1, ...
- M is the memory of the model
- ẑ_(t+1) is the prediction of the next value, based on the previous M values
- we'll choose θ to minimize the sum of squares of the prediction errors,
  (ẑ_(M+1) − z_(M+1))² + ··· + (ẑ_T − z_T)²
- put in general form with y^(i) = z_(M+i), x^(i) = (z_(M+i−1), ..., z_i), i = 1, ..., T − M
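The reduction to general form amounts to stacking lagged windows of the series into a data matrix. A sketch on an illustrative series (a noisy sinusoid, not the slides' data):

```python
import numpy as np

# illustrative time series: a noisy oscillation
rng = np.random.default_rng(3)
t = np.arange(200)
z = np.sin(0.2 * t) + 0.05 * rng.standard_normal(200)

M = 8  # memory of the AR model
T = len(z)

# row i holds the M previous values (z_{M+i-1}, ..., z_i); the target is z_{M+i}
A = np.column_stack([z[M - 1 - k : T - 1 - k] for k in range(M)])
y = z[M:]

theta, *_ = np.linalg.lstsq(A, y, rcond=None)
rms = np.sqrt(np.mean((A @ theta - y) ** 2))
```

Each of the T − M rows of A is one feature vector x^(i); the one-step-ahead RMS prediction error then comes out near the noise level of the series.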
Example
- hourly temperature at LAX in May 2016, length 744
- average is … °F, standard deviation 3.05 °F
- predictor ẑ_(t+1) = z_t gives RMS error 1.16 °F
- predictor ẑ_(t+1) = z_(t−23) gives RMS error 1.73 °F
- AR model with M = 8 gives RMS error 0.98 °F
Example
- solid line shows one-hour-ahead predictions from the AR model, first 5 days
- [figure: temperature (°F) versus hour k]
Outline
- Least squares model fitting
- Validation
- Feature engineering
Generalization
- basic idea: the goal of a model is not to predict the outcome for the given data
- instead it is to predict the outcome on new, unseen data
- a model that makes reasonable predictions on new, unseen data has generalization ability, or generalizes
- a model that makes poor predictions on new, unseen data is said to suffer from over-fit
Validation
- a simple and effective method to guess whether a model will generalize:
  - split the original data into a training set and a test set
  - typical splits: 80%/20%, 90%/10%
  - build ("train") the model on the training data set
  - then check the model's predictions on the test data set
- (can also compare RMS prediction error on train and test data)
- if they are similar, we can guess the model will generalize
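The steps above can be sketched in a few lines; the data and the 80%/20% split below are illustrative:

```python
import numpy as np

# illustrative data from a noisy line
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 100)
y = 1 + 2 * x + 0.1 * rng.standard_normal(100)

# random 80%/20% split into training and test indices
perm = rng.permutation(100)
train, test = perm[:80], perm[80:]

# train on the training set only
A = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

# compare RMS prediction error on train and test data
rms_train = np.sqrt(np.mean((A[train] @ theta - y[train]) ** 2))
rms_test = np.sqrt(np.mean((A[test] @ theta - y[test]) ** 2))
```

When the model generalizes, as here, the two RMS errors come out similar (both near the noise level 0.1).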
Validation
- can be used to choose among different candidate models, e.g.
  - polynomials of different degrees
  - regression models with different sets of regressors
- we'd use one with low, or lowest, test error
Example
- polynomials fit using a training set of 100 points
- plots show performance on a test set of 100 points
- [figure: test-set performance of the degree 2, 6, 10, and 15 polynomial fits f̂(x)]
Example
- [figure: relative RMS error versus polynomial degree, on the data set and the test set]
- suggests degree 4, 5, or 6 are reasonable choices
Cross validation
- to carry out cross validation:
  - divide the data into 10 folds
  - for i = 1, ..., 10, build (train) the model using all folds except fold i
  - test the model on the data in fold i
- interpreting cross validation results:
  - if test RMS errors are much larger than train RMS errors, the model is over-fit
  - if test and train RMS errors are similar and consistent, we can guess the model will have a similar RMS error on future data
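The fold loop above can be sketched directly in NumPy; the data here are illustrative:

```python
import numpy as np

# illustrative data from a noisy line
rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 100)
y = 1 + 2 * x + 0.1 * rng.standard_normal(100)
A = np.column_stack([np.ones_like(x), x])

# divide the (shuffled) indices into 10 folds
folds = np.array_split(rng.permutation(100), 10)

# train on all folds except fold i, test on fold i
test_errors = []
for i in range(10):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(10) if j != i])
    theta, *_ = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)
    err = np.sqrt(np.mean((A[test_idx] @ theta - y[test_idx]) ** 2))
    test_errors.append(err)

avg_test_rms = np.mean(test_errors)
```

Consistent test errors across the 10 folds, all near the noise level, are the pattern the slide describes for a model that is not over-fit.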
Example
- house price, regression fit with x = (area/1000 ft², bedrooms)
- 774 sales, divided into 5 folds of 155 sales each
- fit 5 regression models, each with one fold removed
- [table: model parameters (v, β_1, β_2) and train/test RMS error for each fold]
Outline
- Least squares model fitting
- Validation
- Feature engineering
Feature engineering
- start with the original or base feature n-vector x
- choose basis functions f_1, ..., f_p to create the mapped feature p-vector (f_1(x), ..., f_p(x))
- now fit a linear in the parameters model with the mapped features:
  ŷ = θ_1 f_1(x) + ··· + θ_p f_p(x)
- check the model using validation
Transforming features
- standardizing features: replace x_i with (x_i − b_i)/a_i
  - b_i is the mean value of the feature across the data
  - a_i is the standard deviation of the feature across the data
  - new features are called z-scores
- log transform: if x_i is nonnegative and spans a wide range, replace it with log(1 + x_i)
- hi and lo features: create new features given by max{x_i − b, 0}, min{x_i − a, 0}
  (called the hi and lo versions of the original feature x_i)
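These three transforms are one-liners in NumPy. The feature matrix and the knot values b = 1.5, a = 1.0 below are illustrative choices, not from the slides:

```python
import numpy as np

X = np.array([[1.2, 3.0],
              [0.8, 2.0],
              [2.5, 4.0],
              [1.5, 3.0]])  # illustrative base features, one row per data point

# standardize: z-scores (x_i - b_i) / a_i, per column
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# log transform of a nonnegative feature spanning a wide range
x_wide = np.array([0.0, 9.0, 99.0, 999.0])
x_log = np.log(1 + x_wide)

# hi and lo versions of the first feature, with illustrative knots b = 1.5, a = 1.0
x1 = X[:, 0]
hi = np.maximum(x1 - 1.5, 0)
lo = np.minimum(x1 - 1.0, 0)
```

After standardizing, each column of Z has mean 0 and standard deviation 1, which puts differently scaled features on a common footing before fitting.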
Example: house price prediction
- start with base features:
  - x_1 is the area of the house (in 1000 ft²)
  - x_2 is the number of bedrooms
  - x_3 is 1 for a condo, 0 for a house
  - x_4 is the zip code of the address (62 values)
- we'll use p = 8 basis functions:
  - f_1(x) = 1, f_2(x) = x_1, f_3(x) = max{x_1 − 1.5, 0}
  - f_4(x) = x_2, f_5(x) = x_3
  - f_6(x), f_7(x), f_8(x) are Boolean functions of x_4 which encode 4 groups of nearby zip codes (i.e., neighborhood)
- five-fold model validation
Example
- [table: model parameters θ_1, ..., θ_8 and train/test RMS error for each of the 5 folds]
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More informationTime Series Analysis
Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Regression based methods, 1st part: Introduction (Sec.
More informationLecture 7: Kernels for Classification and Regression
Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationCSC411 Fall 2018 Homework 5
Homework 5 Deadline: Wednesday, Nov. 4, at :59pm. Submission: You need to submit two files:. Your solutions to Questions and 2 as a PDF file, hw5_writeup.pdf, through MarkUs. (If you submit answers to
More informationConcordia University (5+5)Q 1.
(5+5)Q 1. Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/1 40 Examination Date Time Pages Mid Term Test May 26, 2004 Two Hours 3 Instructor Course Examiner
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationConvex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Convex Functions Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Definition convex function Examples
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationMath Practice Exam 2 - solutions
Math 181 - Practice Exam 2 - solutions Problem 1 A population of dinosaurs is modeled by P (t) = 0.3t 2 + 0.1t + 10 for times t in the interval [ 5, 0]. a) Find the rate of change of this population at
More information