Statistical Machine Learning, Part I. Regression 2
1 Statistical Machine Learning, Part I: Regression 2 (mcuturi@i.kyoto-u.ac.jp)
2-4 Last Week
Regression: highlight a functional relationship between a predicted variable and predictors. To find an accurate function $f$ such that, for any pair $(x, y)$ that can appear, $f(x) \approx y$, use a data set and the least-squares criterion:
$$\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{j=1}^{N} \left(y_j - f(x_j)\right)^2$$
5 Last Week
When regressing a real number against a real number:
[Figure: scatter plot of Rent vs. Surface for apartments, with linear, quadratic, cubic, 4th-degree and 5th-degree least-squares fits.]
The least-squares criterion $L(b, a_1, \dots, a_p)$ is used to fit lines and polynomials, and results in solving a linear system: $L$ is of 2nd order in $(b, a_1, \dots, a_p)$, so each partial derivative $\partial L / \partial a_k$ is linear in $(b, a_1, \dots, a_p)$. Setting $\partial L / \partial a_k = 0$ gives $p+1$ linear equations for the $p+1$ variables.
6 Last Week
When regressing a real number against $d$ real numbers (a vector in $\mathbb{R}^d$), find the best fit $\alpha \in \mathbb{R}^d$ such that $(\alpha^T x + \alpha_0) \approx y$. Add to the $d \times N$ data matrix a row of 1's to get the predictor matrix $X$, and let $Y$ be the row of predicted values. The least-squares criterion also applies:
$$L(\alpha) = \|Y - \alpha^T X\|^2 = \alpha^T X X^T \alpha - 2\, Y X^T \alpha + \|Y\|^2.$$
$$\nabla_\alpha L = 0 \iff \alpha = (X X^T)^{-1} X Y^T$$
This works if $X X^T \in \mathbb{R}^{(d+1) \times (d+1)}$ is invertible.
7 Last Week >> (X X )\(X Y ) ans = x age x surface x distance JPY SML
8 Today
- A statistical / probabilistic perspective on LS regression
- A few words on polynomials in higher dimensions
- A geometric perspective
- Variable colinearity and the overfitting problem
- Some solutions, advanced regression techniques: subset selection, ridge regression, lasso
9 A (very) few words on the statistical/probabilistic interpretation of LS
10-11 The Statistical Perspective on Regression
Assume that the values of $y$ are stochastically linked to observations $x$ as
$$y - (\alpha^T x + \beta) \sim \mathcal{N}(0, \sigma).$$
This difference is a random variable $\varepsilon$ called the residual. The model can be rewritten as
$$y = (\alpha^T x + \beta) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma):$$
we assume that the difference between $y$ and $(\alpha^T x + \beta)$ behaves like a Gaussian (normally distributed) random variable. Goal as a statistician: estimate $\alpha$ and $\beta$ given observations.
12-16 Independently and Identically Distributed (i.i.d.) Observations
Statistical hypothesis: assume that the parameters are $\alpha = a$, $\beta = b$. In such a case, what would be the probability of each observation $(x_j, y_j)$? For each couple $(x_j, y_j)$, $j = 1, \dots, N$,
$$P(x_j, y_j \mid \alpha = a, \beta = b) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y_j - (a^T x_j + b)\right)^2}{2\sigma^2}\right)$$
Since each measurement $(x_j, y_j)$ has been independently sampled,
$$P\left(\{(x_j, y_j)\}_{j=1,\dots,N} \mid \alpha = a, \beta = b\right) = \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y_j - (a^T x_j + b)\right)^2}{2\sigma^2}\right)$$
This quantity, seen as a function of $a$ and $b$, is known as the likelihood of the dataset $\{(x_j, y_j)\}_{j=1,\dots,N}$:
$$L_{\{(x_j, y_j)\}}(a, b) = \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y_j - (a^T x_j + b)\right)^2}{2\sigma^2}\right)$$
17-20 Maximum Likelihood Estimation (MLE) of Parameters
Hence, for $a, b$, the likelihood function on the dataset $\{(x_j, y_j)\}_{j=1,\dots,N}$ is
$$L(a, b) = \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y_j - (a^T x_j + b)\right)^2}{2\sigma^2}\right)$$
Why not use the likelihood to guess $(a, b)$ given the data? The MLE approach selects the values of $(a, b)$ which maximize $L(a, b)$. Since $\max_{(a,b)} L(a, b) \iff \max_{(a,b)} \log L(a, b)$, and
$$\log L(a, b) = C - \frac{1}{2\sigma^2} \sum_{j=1}^{N} \left(y_j - (a^T x_j + b)\right)^2,$$
we get
$$\max_{(a,b)} L(a, b) \iff \min_{(a,b)} \sum_{j=1}^{N} \left(y_j - (a^T x_j + b)\right)^2 \dots$$
that is, MLE under Gaussian noise recovers the least-squares criterion.
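This equivalence is easy to check numerically; a sketch on one-dimensional toy data (all values illustrative), minimizing the negative log-likelihood with MATLAB's fminsearch and comparing with the closed-form least-squares fit:

    % MLE vs. least squares on toy data (sigma held fixed).
    N = 100; a = 2; b = -1; sigma = 0.5;
    x = randn(1, N); y = a * x + b + sigma * randn(1, N);
    nll = @(t) sum((y - (t(1) * x + t(2))).^2) / (2 * sigma^2);  % -log L, up to constants
    theta_mle = fminsearch(nll, [0; 0]);
    X = [x; ones(1, N)];
    theta_ls = (X * X') \ (X * y');           % closed-form least squares
    disp([theta_mle, theta_ls])               % the two columns agree up to solver tolerance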
21 Statistical Approach to Linear Regression
Properties of the MLE estimator: convergence of $\hat{\alpha} \to a$? Confidence intervals for coefficients; test procedures to assess whether the model fits the data.
[Figure: histogram of the residuals (relative frequency).]
Bayesian approaches: instead of looking for one optimal fit $(a, b)$, work with a whole density on $(a, b)$ to make decisions, etc.
22 A few words on polynomials in higher dimensions
23 A few words on polynomials in higher dimensions
For $d$ variables, that is for points $x \in \mathbb{R}^d$, the space of polynomials in these variables up to degree $p$ is generated by
$$\left\{x^u \mid u \in \mathbb{N}^d,\ u = (u_1, \dots, u_d),\ \sum_{i=1}^{d} u_i \le p\right\}$$
where the monomial $x^u$ is defined as $x_1^{u_1} x_2^{u_2} \cdots x_d^{u_d}$. Recurrence for the dimension of that space:
$$\dim_{p+1} = \dim_p + \binom{d+p}{p+1}$$
For $d = 20$ and $p = 5$ the dimension is already $\binom{25}{5} = 53130$. The problem with polynomial interpolation in high dimensions is this explosion of relevant variables (one for each monomial).
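The count is quick to tabulate; a small MATLAB sketch of how the number of monomials blows up with the degree:

    % Number of monomials of degree <= p in d variables: nchoosek(d+p, p).
    d = 20;
    for p = 1:5
        fprintf('p = %d: %d monomials\n', p, nchoosek(d + p, p));
    end
    % p = 5 already gives nchoosek(25, 5) = 53130 candidate variables.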
24 Geometric Perspective
25 Back to Basics
Recall the problem:
$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix} \in \mathbb{R}^{(d+1) \times N} \quad \text{and} \quad Y = \begin{bmatrix} y_1 & \cdots & y_N \end{bmatrix} \in \mathbb{R}^N.$$
We look for $\alpha$ such that $\alpha^T X \approx Y$.
26 Back to Basics
If we transpose this expression we get $X^T \alpha \approx Y^T$:
$$\begin{bmatrix} 1 & x_{1,1} & \cdots & x_{d,1} \\ 1 & x_{1,2} & \cdots & x_{d,2} \\ \vdots & & & \vdots \\ 1 & x_{1,N} & \cdots & x_{d,N} \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \vdots \\ \alpha_d \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
Using the notation $Y = Y^T$, $X = X^T$, and $X_k$ for the $(k+1)$-th column of $X$,
$$\sum_{k=0}^{d} \alpha_k X_k \approx Y$$
Note how $X_k$ corresponds to all values taken by the $k$-th variable. Problem: approximate/reconstruct $Y \in \mathbb{R}^N$ using $X_0, X_1, \dots, X_d \in \mathbb{R}^N$.
27-35 Linear System
Reconstructing $Y \in \mathbb{R}^N$ using $X_0, X_1, \dots, X_d$, vectors of $\mathbb{R}^N$: our ability to approximate $Y$ depends implicitly on the space spanned by $X_0, X_1, \dots, X_d$.
[Figure sequence] Consider the observed vector $Y \in \mathbb{R}^N$ of predicted values, and plot the first regressor $X_0$. Assume the next regressor $X_1$ is colinear to $X_0$, and so is $X_2$: there are very few ways left to approximate $Y$. Suppose instead that $X_2$ is actually not colinear to $X_0$: this opens new ways to reconstruct $Y$. When $X_0, X_1, X_2$ are linearly independent, $Y$ is in their span, since the space pictured is of dimension 3.
36 Linear System
Reconstructing $Y \in \mathbb{R}^N$ using $X_0, X_1, \dots, X_d$, vectors of $\mathbb{R}^N$: our ability to approximate $Y$ depends implicitly on the space spanned by $X_0, X_1, \dots, X_d$. The dimension of that space is $\mathrm{Rank}(X)$, the rank of $X$:
$$\mathrm{Rank}(X) \le \min(d+1, N).$$
37 Linear System
Three cases depending on $\mathrm{Rank}\,X$ and $d, N$:
1. $\mathrm{Rank}\,X < N$: the $d+1$ column vectors do not span $\mathbb{R}^N$, so for arbitrary $Y$ there is no solution to $\alpha^T X = Y$.
2. $\mathrm{Rank}\,X = N$ and $d+1 > N$: too many variables; they span the whole of $\mathbb{R}^N$ and there are infinitely many solutions to $\alpha^T X = Y$.
3. $\mathrm{Rank}\,X = N$ and $d+1 = N$: # variables = # observations; exact and unique solution $\alpha = X^{-1} Y$, for which $\alpha^T X = Y$.
In most applications $d+1 \neq N$, so we are either in case 1 or case 2.
38 Case 1: $\mathrm{Rank}\,X < N$
No solution to $\alpha^T X = Y$ (equivalently $X\alpha = Y$) in the general case. What about the orthogonal projection of $Y$ on the image of $X$,
$$\hat{Y} \in \mathrm{span}\{X_0, X_1, \dots, X_d\},$$
namely the point
$$\hat{Y} = \operatorname*{argmin}_{u \in \mathrm{span}\{X_0, X_1, \dots, X_d\}} \|Y - u\|\,?$$
39 Case 1: $\mathrm{Rank}\,X < N$
Lemma 1. $\{X_0, X_1, \dots, X_d\}$ is a linearly independent family $\iff$ $X^T X$ is invertible.
40 Case 1: $\mathrm{Rank}\,X < N$
Computing the projection $\hat{\omega}$ of a point $\omega$ on a subspace $V$ is well understood. In particular, if $(X_0, X_1, \dots, X_d)$ is a basis of $\mathrm{span}\{X_0, X_1, \dots, X_d\}$ (that is, $\{X_0, X_1, \dots, X_d\}$ is a linearly independent family), then $X^T X$ is invertible and
$$\hat{Y} = X (X^T X)^{-1} X^T Y$$
This gives us the $\alpha$ vector of weights we are looking for:
$$\hat{Y} = X \underbrace{(X^T X)^{-1} X^T Y}_{\hat{\alpha}} = X \hat{\alpha} \approx Y, \quad \text{or} \quad \hat{\alpha}^T X \approx Y$$
What can go wrong?
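A minimal sketch of the projection formula on synthetic data (in practice one solves with backslash rather than forming the inverse explicitly):

    % Orthogonal projection of Y on the span of the columns of X.
    N = 50; d = 2;
    X = [ones(N, 1), randn(N, d)];            % N x (d+1), linearly independent columns
    Y = randn(N, 1);
    alpha_hat = (X' * X) \ (X' * Y);          % (X^T X)^{-1} X^T Y
    Y_hat = X * alpha_hat;                    % projection of Y on span{X_0,...,X_d}
    disp(norm(X' * (Y - Y_hat)))              % ~ 0: the residual is orthogonal to every column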
41 Case 1: $\mathrm{Rank}\,X < N$
If $X^T X$ is invertible, $\hat{Y} = X (X^T X)^{-1} X^T Y$. If $X^T X$ is not invertible... we have a problem. And if $X^T X$'s condition number
$$\frac{\lambda_{\max}(X^T X)}{\lambda_{\min}(X^T X)}$$
is very large, a small change in $Y$ can cause dramatic changes in $\alpha$; in this case the linear system is said to be badly conditioned... Using the formula $\hat{Y} = X (X^T X)^{-1} X^T Y$ might return garbage, as can be seen in the following Matlab example.
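The kind of experiment alluded to might look as follows (illustrative data): two nearly colinear columns make the condition number huge, and a tiny perturbation of $Y$ moves the coefficients wildly:

    % Bad conditioning: two nearly colinear columns.
    N = 100;
    x1 = randn(N, 1);
    x2 = x1 + 1e-8 * randn(N, 1);             % almost a copy of x1
    X = [ones(N, 1), x1, x2];
    Y = randn(N, 1);
    fprintf('cond(X''*X) = %g\n', cond(X' * X));          % enormous
    a1 = (X' * X) \ (X' * Y);
    a2 = (X' * X) \ (X' * (Y + 1e-6 * randn(N, 1)));      % tiny change in Y
    disp([a1, a2])                            % the coefficients can change dramatically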
42 Case 2: $\mathrm{Rank}\,X = N$ and $d+1 > N$
The high-dimension, low-sample setting: an ill-posed inverse problem, since the set $\{\alpha \in \mathbb{R}^{d+1} \mid X\alpha = Y\}$ is a whole affine subspace. We need to choose one from many admissible points. When does this happen? In high-dimension, low-sample cases (DNA chips, multimedia, etc.). How to solve this? Use something called regularization.
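A sketch of why this case is ill-posed (random, purely illustrative data): any element of the null space of $X$ can be added to a particular solution without changing the fit:

    % Case 2: N = 5 observations, d+1 = 10 unknowns.
    N = 5; d = 9;
    X = randn(N, d + 1);                      % N x (d+1), transposed convention
    Y = randn(N, 1);
    alpha0 = pinv(X) * Y;                     % one particular (minimum-norm) solution
    Z = null(X);                              % orthonormal basis of the null space of X
    alpha1 = alpha0 + Z * randn(size(Z, 2), 1);           % another exact solution
    disp([norm(X * alpha0 - Y), norm(X * alpha1 - Y)])    % both ~ 0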
43 A practical perspective: Colinearity and Overfitting
44 A Few High-dimension, Low-sample Settings
DNA chips are very long vectors of measurements, one for each gene. Task: regress a health-related variable against gene expression levels.
45 A Few High-dimension, Low-sample Settings
Emails represented as bag-of-words count vectors $e_j \in \mathbb{R}^d$, e.g. please = 2, send = 1, money = 2, assignment = 0. Task: regress the probability that this is a spam email against the bag-of-words.
46 Correlated Variables
Suppose you run a real-estate company. For each apartment you have compiled a few hundred predictor variables, e.g.:
- distances to the nearest convenience store, pharmacy, supermarket, parking lot, etc.
- distances to all main locations in Kansai
- socio-economic variables of the neighborhood
- characteristics of the apartment
Some are obviously correlated (correlated = almost colinear), e.g. distance to the Post Office / distance to the Post ATM. In that case, we may have some problems (Matlab example).
47-50 Overfitting
Given $d$ variables (including the constant variable), consider the least-squares criterion
$$L_d(\alpha_1, \dots, \alpha_d) = \sum_{j=1}^{N} \left(y_j - \sum_{i=1}^{d} \alpha_i x_{i,j}\right)^2$$
Add any variable vector $x_{d+1,j}$, $j = 1, \dots, N$, and define
$$L_{d+1}(\alpha_1, \dots, \alpha_d, \alpha_{d+1}) = \sum_{j=1}^{N} \left(y_j - \sum_{i=1}^{d} \alpha_i x_{i,j} - \alpha_{d+1} x_{d+1,j}\right)^2$$
Then
$$\min_{\alpha \in \mathbb{R}^{d+1}} L_{d+1}(\alpha) \le \min_{\alpha \in \mathbb{R}^{d}} L_d(\alpha)$$
Why? Because $L_d(\alpha_1, \dots, \alpha_d) = L_{d+1}(\alpha_1, \dots, \alpha_d, 0)$. The residual sum of squares goes down whenever a variable is added... but is it relevant to add variables? (The sketch below illustrates the point.)
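Appending even a column of pure noise to the design never increases the minimal RSS; a short illustrative check:

    % Adding an irrelevant variable still lowers (or preserves) the RSS.
    N = 50;
    X = [ones(N, 1), randn(N, 4)];            % d = 5 columns, constant included
    Y = randn(N, 1);
    rss = @(A) norm(Y - A * (A \ Y))^2;       % minimal RSS for design A
    fprintf('RSS with d variables:   %.4f\n', rss(X));
    fprintf('RSS with d+1 variables: %.4f\n', rss([X, randn(N, 1)]));   % never larger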
51 Occam's Razor: a Formalization of Overfitting
Minimizing least squares (RSS) is not clever enough; we need another idea to avoid overfitting. Occam's razor (lex parsimoniae, the law of parsimony) is the principle that recommends selecting the hypothesis that makes the fewest assumptions: one should always opt for an explanation in terms of the fewest possible causes, factors, or variables. (Wikipedia: William of Ockham, died 1347.)
52 Advanced Regression Techniques
53 Quick Reminder on Vector Norms
For a vector $a \in \mathbb{R}^d$, the Euclidean norm is the quantity
$$\|a\|_2 = \sqrt{\sum_{i=1}^{d} a_i^2}.$$
More generally, the $q$-norm is, for $q > 0$,
$$\|a\|_q = \left(\sum_{i=1}^{d} |a_i|^q\right)^{1/q}.$$
In particular, for $q = 1$,
$$\|a\|_1 = \sum_{i=1}^{d} |a_i|,$$
and in the limits $q \to \infty$ and $q \to 0$,
$$\|a\|_\infty = \max_{i=1,\dots,d} |a_i|, \qquad \|a\|_0 = \#\{i \mid a_i \neq 0\}.$$
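These norms are one-liners in MATLAB; a small sketch on an example vector:

    % The norms above, on a small example vector.
    a = [3; 0; -4; 0; 1];
    q = 1.5;
    n2   = sqrt(sum(a.^2));                   % ||a||_2 = sqrt(26), about 5.10
    nq   = sum(abs(a).^q)^(1/q);              % general q-norm
    n1   = sum(abs(a));                       % ||a||_1 = 8
    ninf = max(abs(a));                       % ||a||_inf = 4
    n0   = nnz(a);                            % ||a||_0 = 3 nonzero entries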
54 Tikhonov Regularization ('43) - Ridge Regression ('62)
Tikhonov's motivation: solve ill-posed inverse problems by regularization. If $\min_\alpha L(\alpha)$ is achieved at many points... consider instead
$$\min_\alpha L(\alpha) + \lambda \|\alpha\|_2^2$$
We can show that this leads to selecting
$$\hat{\alpha} = (X^T X + \lambda I_{d+1})^{-1} X^T Y$$
The condition number has changed to
$$\frac{\lambda_{\max}(X^T X) + \lambda}{\lambda_{\min}(X^T X) + \lambda}.$$
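A minimal ridge sketch on the same kind of ill-conditioned design as before (illustrative data), showing how $\lambda$ tames the condition number:

    % Ridge regression on a nearly colinear design.
    N = 100;
    x1 = randn(N, 1); x2 = x1 + 1e-6 * randn(N, 1);
    X = [ones(N, 1), x1, x2];
    Y = 3 * x1 + randn(N, 1);
    lambda = 0.1;
    alpha_ridge = (X' * X + lambda * eye(3)) \ (X' * Y);  % (X^T X + lambda I)^{-1} X^T Y
    fprintf('cond: %g -> %g\n', cond(X' * X), cond(X' * X + lambda * eye(3)));
    disp(alpha_ridge)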
55 Subset Selection: Exhaustive Search
Following Occam's razor, ideally we would like to know, for any value $p$,
$$\min_{\alpha,\ \|\alpha\|_0 = p} L(\alpha)$$
i.e. select the best vector $\alpha$ which only gives weights to $p$ variables: find the best combination of $p$ variables. Practical implementation: for $p \le n$, there are $\binom{n}{p}$ possible combinations of $p$ variables. Brute-force approach: generate $\binom{n}{p}$ regression problems and select the one that achieves the best RSS. Impossible in practice with moderately large $n$ and $p$: already $\binom{30}{5} = 142506$.
56 Subset Selection: Forward Search
Since the exact search is intractable in practice, consider the forward heuristic. In forward search:
- Define $I_1 = \{0\}$.
- Given a set $I_k \subset \{0, \dots, d\}$ of $k$ variables, what is the most informative variable one could add? Compute for each variable $i$ in $\{0, \dots, d\} \setminus I_k$
$$t_i = \min_{(\alpha_k)_{k \in I_k},\ \alpha} \sum_{j=1}^{N} \left(y_j - \sum_{k \in I_k} \alpha_k x_{k,j} - \alpha\, x_{i,j}\right)^2$$
- Set $I_{k+1} = I_k \cup \{i^\star\}$ for any $i^\star$ such that $i^\star = \operatorname{argmin}_i t_i$; set $k = k+1$.
- Repeat until the desired number of variables is reached (see the sketch after this list).
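A sketch of this greedy loop (illustrative data; column 1 is the constant, so columns 1 to d+1 stand for variables 0 to d):

    % Forward selection: greedily grow the active set I.
    N = 100; d = 10; p = 3;
    X = [ones(N, 1), randn(N, d)];
    Y = X(:, [1 3 5]) * [1; 2; -1] + 0.1 * randn(N, 1);   % depends on variables 0, 2, 4
    I = 1;                                    % I_1 = {0}: start with the constant
    for step = 1:p
        best_t = inf;
        for i = setdiff(1:d+1, I)
            A = X(:, [I, i]);
            t = norm(Y - A * (A \ Y))^2;      % t_i: best RSS once variable i is added
            if t < best_t, best_t = t; best_i = i; end
        end
        I = [I, best_i];                      % I_{k+1} = I_k with i* added
    end
    disp(I - 1)                               % selected variables, 0-based indices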
57 Subset Selection: Backward Search
... or the backward heuristic. In backward search:
- Define $I_{d+1} = \{0, 1, \dots, d\}$.
- Given a set $I_k \subset \{0, \dots, d\}$ of $k$ variables, what is the least informative variable one could remove? Compute for each variable $i$ in $I_k$
$$t_i = \min_{(\alpha_k)_{k \in I_k \setminus \{i\}}} \sum_{j=1}^{N} \left(y_j - \sum_{k \in I_k \setminus \{i\}} \alpha_k x_{k,j}\right)^2$$
- Set $I_{k-1} = I_k \setminus \{i^\star\}$ for any $i^\star$ such that $i^\star = \operatorname{argmin}_i t_i$ (remove the variable whose removal increases the RSS the least); set $k = k-1$.
- Repeat until the desired number of variables is reached.
58 Subset Selection: LASSO
- Naive least squares: $\min_\alpha L(\alpha)$
- Best fit with $p$ variables (Occam!): $\min_{\alpha,\ \|\alpha\|_0 = p} L(\alpha)$
- Tikhonov-regularized least squares: $\min_\alpha L(\alpha) + \lambda \|\alpha\|_2^2$
- LASSO (least absolute shrinkage and selection operator): $\min_\alpha L(\alpha) + \lambda \|\alpha\|_1$
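MATLAB's Statistics Toolbox ships a lasso function for this objective; alternatively, a few lines of ISTA (proximal gradient descent, a standard solver for the LASSO) suffice as a sketch on synthetic data:

    % LASSO via ISTA: min_a 0.5*||Y - X*a||^2 + lambda*||a||_1
    N = 100; d = 20;
    X = randn(N, d);
    Y = X(:, 1:3) * [3; -2; 1] + 0.1 * randn(N, 1);       % only 3 relevant variables
    lambda = 5;
    L = norm(X)^2;                            % Lipschitz constant of the smooth gradient
    a = zeros(d, 1);
    for it = 1:500
        z = a - X' * (X * a - Y) / L;         % gradient step on the least-squares part
        a = sign(z) .* max(abs(z) - lambda / L, 0);       % soft-thresholding (prox of l1)
    end
    disp(find(a)')                            % indices of nonzero weights: a sparse fit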