Foundation of Intelligent Systems, Part I. Regression 2


1 Foundation of Intelligent Systems, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp FIS

2 Some Words on the Survey Not enough answers to say anything meaningful! Try again: survey. FIS

3 Last Week Regression: highlight a functional relationship between a predicted variable and predictors FIS

4 Last Week Regression: highlight a functional relationship between a predicted variable and predictors. Find a function f such that, for any pair (x,y) that can appear, f(x) ≈ y. FIS

5 Last Week Regression: highlight a functional relationship between a predicted variable and predictors. To find an accurate function f such that, for any pair (x,y) that can appear, f(x) ≈ y, use a data set and the least-squares criterion: min_{f ∈ F} (1/N) ∑_{j=1}^N (y_j − f(x_j))^2. FIS

6 Last Week Regression: highlight a functional relationship between a predicted variable and predictors, when regressing a real number vs a real number. [Figure: scatter plot of Rent vs. Surface for apartments, with fitted linear, quadratic, cubic, 4th-degree and 5th-degree polynomials.] The Least-Squares criterion L(b, a_1, ..., a_p) is used to fit lines and polynomials, and results in solving a linear system: L is of 2nd order in (b, a_1, ..., a_p), so each ∂L/∂a_k is linear in (b, a_1, ..., a_p). Setting ∂L/∂a_k = 0 for every coefficient gives p+1 linear equations for p+1 variables. FIS

7 Last Week Regression: highlight a functional relationship between a predicted variable and predictors. When regressing a real number vs d real numbers (vectors in R^d), find the best fit α, α_0 such that α^T x + α_0 ≈ y. Add to the d×N data matrix a row of 1's to get the predictor matrix X ∈ R^{(d+1)×N}, with Y the row of predicted values; the stacked weight vector is again written α ∈ R^{d+1}. The Least-Squares criterion also applies: L(α) = ‖Y − α^T X‖^2 = α^T X X^T α − 2 Y X^T α + ‖Y‖^2. Setting ∇_α L = 0 gives α = (X X^T)^{-1} X Y^T. This works if X X^T ∈ R^{(d+1)×(d+1)} is invertible. FIS

8 Last Week In Matlab: >> (X*X')\(X*Y') returns the fitted weights, an intercept plus one coefficient (in JPY) for each of the predictors age, surface and distance (numerical values not reproduced in this transcript). FIS
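
The slide-8 call can be reproduced end to end on synthetic data. The sketch below is not from the lecture: the data and variable names are made up, and it simply applies the formula α = (X X^T)^{-1} X Y^T from slide 7 via the backslash operator.

  % Minimal sketch (Matlab/Octave): multivariate least squares on synthetic data.
  % X is (d+1) x N with a first row of ones; alpha = (X X')^{-1} X Y', as on slide 7.
  N = 200; d = 3;                          % e.g. 3 predictors: age, surface, distance (made up)
  Xraw = randn(d, N);                      % synthetic predictor values
  alpha_true = [50; -2; 8; -1];            % [intercept; age; surface; distance], arbitrary
  X = [ones(1, N); Xraw];                  % add the row of 1's -> (d+1) x N
  Y = alpha_true' * X + 5 * randn(1, N);   % noisy responses, 1 x N
  alpha_hat = (X * X') \ (X * Y')          % normal equations, as in slide 8's (X*X')\(X*Y')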

9 Today A statistical / probabilistic perspective on LS-regression A few words on polynomials in higher dimensions A geometric perspective Variable co-linearity and Overfitting problem Some solutions: advanced regression techniques Subset selection Ridge Regression Lasso FIS

10 A (very) few words on the statistical/probabilistic interpretation of LS FIS

11 The Statistical Perspective on Regression Assume that the values of y are stochastically linked to observations x as y − (α^T x + β) ∼ N(0, σ). This difference is a random variable ε, called the residual. FIS

12 The Statistical Perspective on Regression This can be rewritten as y = (α^T x + β) + ε, with ε ∼ N(0, σ). We assume that the difference between y and (α^T x + β) behaves like a Gaussian (normally distributed) random variable. Goal as a statistician: estimate α and β given observations. FIS

13 Independent and Identically Distributed (i.i.d.) Observations Statistical hypothesis: assume that the parameters are α = a, β = b. FIS

14 Independent and Identically Distributed (i.i.d.) Observations Statistical hypothesis: assume that the parameters are α = a, β = b. In such a case, what would be the probability of each observation (x_j, y_j)? FIS

15 Independent and Identically Distributed (i.i.d.) Observations Statistical hypothesis: assuming that the parameters are α = a, β = b, what would be the probability of each observation? For each couple (x_j, y_j), j = 1, ..., N, P(x_j, y_j | α = a, β = b) = (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). FIS

16 Independent and Identically Distributed (i.i.d.) Observations Statistical hypothesis: assuming that the parameters are α = a, β = b, what would be the probability of each observation? For each couple (x_j, y_j), j = 1, ..., N, P(x_j, y_j | α = a, β = b) = (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). Since each measurement (x_j, y_j) has been independently sampled, P({(x_j, y_j)}_{j=1,...,N} | α = a, β = b) = ∏_{j=1}^N (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). FIS

17 Independent and Identically Distributed (i.i.d.) Observations Statistical hypothesis: assuming that the parameters are α = a, β = b, what would be the probability of each observation? For each couple (x_j, y_j), j = 1, ..., N, P(x_j, y_j | α = a, β = b) = (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). Since each measurement (x_j, y_j) has been independently sampled, P({(x_j, y_j)}_{j=1,...,N} | α = a, β = b) = ∏_{j=1}^N (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). Seen as a function of a and b, this quantity is known as the likelihood of the dataset {(x_j, y_j)}_{j=1,...,N}: L_{(x_j,y_j)}(a, b) = ∏_{j=1}^N (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). FIS

18 Maximum Likelihood Estimation (MLE) of Parameters Hence, for a, b, the likelihood function on the dataset {(x_j, y_j)}_{j=1,...,N}... L(a, b) = ∏_{j=1}^N (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). FIS

19 Maximum Likelihood Estimation (MLE) of Parameters Hence, for a, b, the likelihood function on the dataset {(x_j, y_j)}_{j=1,...,N}... L(a, b) = ∏_{j=1}^N (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). Why not use the likelihood to guess (a, b) given data? FIS

20 Maximum Likelihood Estimation (MLE) of Parameters Hence, for a, b, the likelihood function on the dataset {(x_j, y_j)}_{j=1,...,N}... L(a, b) = ∏_{j=1}^N (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). ...the MLE approach selects the values of (a, b) which maximize L(a, b). FIS

21 Maximum Likelihood Estimation (MLE) of Parameters Hence, for a, b, the likelihood function on the dataset {(x_j, y_j)}_{j=1,...,N}... L(a, b) = ∏_{j=1}^N (1/(σ√(2π))) exp(−(y_j − (a^T x_j + b))^2 / (2σ^2)). ...the MLE approach selects the values of (a, b) which maximize L(a, b). Since max_{(a,b)} L(a, b) ⇔ max_{(a,b)} log L(a, b), and log L(a, b) = C − (1/(2σ^2)) ∑_{j=1}^N (y_j − (a^T x_j + b))^2, we get max_{(a,b)} L(a, b) ⇔ min_{(a,b)} ∑_{j=1}^N (y_j − (a^T x_j + b))^2... FIS
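
Under the Gaussian-noise assumption above, the maximizer of the likelihood and the least-squares solution therefore coincide. The following sketch is not from the lecture: it uses synthetic one-dimensional data with arbitrary parameter values, and simply checks numerically that minimizing the negative log-likelihood (with fminsearch) and solving the normal equations give the same answer.

  % Minimal sketch: MLE under Gaussian noise agrees with least squares (1-D predictor).
  N = 100;
  x = 10 * rand(1, N);                      % one real-valued predictor
  a = 2.5; b = -1.0; sigma = 0.5;           % "true" parameters, arbitrary
  y = a * x + b + sigma * randn(1, N);
  X = [x; ones(1, N)];
  ab_ls = (X * X') \ (X * y');              % least squares: [a; b]
  % Negative log-likelihood; the constant C and sigma do not change the minimizer.
  negloglik = @(p) sum((y - (p(1) * x + p(2))).^2) / (2 * sigma^2);
  ab_mle = fminsearch(negloglik, [0; 0]);   % numerical MLE
  disp([ab_ls(:), ab_mle(:)])               % the two columns agree up to solver tolerance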

22 Statistical Approach to Linear Regression Properties of the MLE estimator: convergence of the estimate to the true α? Confidence intervals for coefficients, test procedures to assess whether the model fits the data. [Figure: histogram (relative frequency) of the residuals.] Bayesian approaches: instead of looking for one optimal fit (a, b), juggle with a whole density on (a, b) to make decisions, etc. FIS

23 A few words on polynomials in higher dimensions FIS

24 A few words on polynomials in higher dimensions For d variables, that is for points x ∈ R^d, the space of polynomials in these variables up to degree p is generated by {x^u | u ∈ N^d, u = (u_1, ..., u_d), ∑_{i=1}^d u_i ≤ p}, where the monomial x^u is defined as x_1^{u_1} x_2^{u_2} ··· x_d^{u_d}. Recurrence for the dimension of that space: dim_{p+1} = dim_p + C(d+p, p+1). For d = 20 and p = 5 the dimension already exceeds 50,000 (C(25, 5) = 53130). The problem with polynomial interpolation in high dimensions is the explosion of relevant variables (one for each monomial). FIS
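
As a quick check of this combinatorial explosion: the number of monomials of total degree at most p in d variables is C(d+p, d) (a standard stars-and-bars count, stated here as an aside; the 53130 figure above follows from it). A tiny sketch using only the built-in nchoosek:

  d = 20; p = 5;
  dim_p  = nchoosek(d + p, d)               % 53130 monomials of degree <= 5 in 20 variables
  dim_p1 = dim_p + nchoosek(d + p, p + 1)   % dimension for degree p+1, using the recurrence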

25 Geometric Perspective FIS

26 Back to Basics Recall the problem: X = [x_1 x_2 ··· x_N] ∈ R^{(d+1)×N} and Y = [y_1 ··· y_N] ∈ R^N. We look for α such that α^T X ≈ Y. FIS

27 Back to Basics If we transpose this expression we get X^T α ≈ Y^T, that is, the N×(d+1) matrix whose j-th row is (1, x_{1,j}, ..., x_{d,j}) times the column (α_0, ..., α_d)^T approximates the column (y_1, ..., y_N)^T. Using the notation Y := Y^T, X := X^T, and X_k for the (k+1)-th column of X, this reads ∑_{k=0}^d α_k X_k ≈ Y. Note how X_k collects all values taken by the k-th variable. Problem: approximate/reconstruct Y ∈ R^N using X_0, X_1, ..., X_d ∈ R^N. FIS

28 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. Consider the observed vector Y ∈ R^N of predicted values. FIS

29 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. Plot the first regressor X_0... FIS

30 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. Assume the next regressor X_1 is colinear to X_0... FIS

31 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. ...and so is X_2... FIS

32 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. Very few options remain to approximate Y... FIS

33 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. Suppose X_2 is actually not colinear to X_0. FIS

34 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. This opens new ways to reconstruct Y. FIS

35 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. When X_0, X_1, X_2 are linearly independent, FIS

36 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. Y is in their span, since in this three-dimensional picture the spanned space has dimension 3. FIS

37 Linear System Reconstructing Y ∈ R^N using X_0, X_1, ..., X_d, vectors of R^N. Our ability to approximate Y depends implicitly on the space spanned by X_0, X_1, ..., X_d. The dimension of that space is Rank(X), the rank of X, with Rank(X) ≤ min(d+1, N). FIS

38 Linear System Three cases depending on Rank(X) and d, N: 1. Rank(X) < N: the d+1 column vectors do not span R^N, so for arbitrary Y there is no solution to α^T X = Y. 2. Rank(X) = N and d+1 > N: too many variables, the columns span the whole of R^N, infinitely many solutions to α^T X = Y. 3. Rank(X) = N and d+1 = N: # variables = # observations, exact and unique solution α = X^{-1} Y, and we have α^T X = Y. In most applications d+1 ≠ N, so we are either in case 1 or case 2. FIS

39 Case 1: Rank(X) < N No solution to α^T X = Y (equivalently Xα = Y) in the general case. What about the orthogonal projection Ŷ of Y on the image of X, span{X_0, X_1, ..., X_d}? Namely, the point Ŷ such that Ŷ = argmin_{u ∈ span{X_0, X_1, ..., X_d}} ‖Y − u‖. FIS

40 Case 1: Rank(X) < N Lemma 1. {X_0, X_1, ..., X_d} is a linearly independent family ⇔ X^T X is invertible. FIS

41 Case 1: Rank(X) < N Computing the projection ω̂ of a point ω on a subspace V is well understood. In particular, if (X_0, X_1, ..., X_d) is a basis of span{X_0, X_1, ..., X_d} (that is, {X_0, X_1, ..., X_d} is a linearly independent family), then (X^T X) is invertible and Ŷ = X (X^T X)^{-1} X^T Y. This gives us the α vector of weights we are looking for: Ŷ = X α̂ with α̂ = (X^T X)^{-1} X^T Y, so that X α̂ ≈ Y, or equivalently α̂^T X ≈ Y. What can go wrong? FIS
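
A short numerical illustration of the projection formula on synthetic data (not from the lecture): the fitted Ŷ = X(X^T X)^{-1} X^T Y is the orthogonal projection of Y onto the span of the columns, so the residual Y − Ŷ is orthogonal to every X_k.

  % Minimal sketch: Yhat = X (X'X)^{-1} X' Y is the orthogonal projection of Y.
  N = 50; d = 4;
  X = [ones(N, 1), randn(N, d)];            % N x (d+1); columns play the role of X_0, ..., X_d
  Y = randn(N, 1);                          % arbitrary vector to approximate
  alpha_hat = (X' * X) \ (X' * Y);          % weights
  Yhat = X * alpha_hat;                     % projection of Y onto span{X_0, ..., X_d}
  norm(X' * (Y - Yhat))                     % ~ 0: the residual is orthogonal to every X_k
  norm(Y - Yhat) <= norm(Y - X * randn(d + 1, 1))   % 1: Yhat beats a random candidate in the span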

42 Case 1: Rank(X) < N If X^T X is invertible, Ŷ = X (X^T X)^{-1} X^T Y. If X^T X is not invertible... we have a problem. If X^T X's condition number λ_max(X^T X)/λ_min(X^T X) is very large, a small change in Y can cause dramatic changes in α. In this case the linear system is said to be badly conditioned... Using the formula Ŷ = X (X^T X)^{-1} X^T Y might return garbage, as can be seen in the following Matlab example. FIS
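
The Matlab example shown in class is not reproduced in this transcript. As a stand-in, here is a minimal sketch with two nearly collinear predictors: the condition number of X^T X is huge, and a tiny perturbation of Y moves the estimated weights dramatically.

  % Minimal sketch: nearly collinear predictors make the normal equations ill-conditioned.
  N = 100;
  x1 = randn(N, 1);
  x2 = x1 + 1e-6 * randn(N, 1);             % almost collinear with x1
  X  = [ones(N, 1), x1, x2];
  Y  = 3 + x1 + randn(N, 1);
  cond(X' * X)                              % huge: lambda_max(X'X) / lambda_min(X'X)
  a1 = (X' * X) \ (X' * Y);
  a2 = (X' * X) \ (X' * (Y + 1e-3 * randn(N, 1)));   % tiny change in Y
  [a1, a2]                                  % the weights on x1 and x2 change wildly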

43 Case 2: Rank(X) = N and d+1 > N The high-dimensional, low-sample setting. Ill-posed inverse problem: the set {α ∈ R^{d+1} | Xα = Y} is a whole affine subspace. We need to choose one from many admissible points. When does this happen? In the high-dimensional, low-sample case (DNA chips, multimedia, etc.). How to solve this? Use something called regularization. FIS

44 A practical perspective: Colinearity and Overfitting FIS

45 A Few High-dimension, Low-sample Settings DNA chips are very long vectors of measurements, one for each gene. Task: regress a health-related variable against gene expression levels. FIS

46 A Few High-dimension, Low-sample Settings Emails represented as bag-of-words vectors in R^d, e.g. please = 2, send = 1, money = 2, assignment = 0. Task: regress the probability that an email is spam against its bag-of-words vector. FIS

47 Correlated Variables Suppose you run a real-estate company. For each apartment you have compiled a few hundred predictor variables, e.g. distances to the nearest convenience store, pharmacy, supermarket, parking lot, etc.; distances to all main locations in Kansai; socio-economic variables of the neighborhood; characteristics of the apartment. Some are obviously correlated (correlated ≈ almost colinear), e.g. distance to the Post Office / distance to the Post ATM. In that case, we may have some problems (Matlab example). FIS

48 Overfitting Given d variables (including the constant variable), consider the least-squares criterion L_d(α_1, ..., α_d) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j})^2. Add any variable vector x_{d+1,j}, j = 1, ..., N, and define L_{d+1}(α_1, ..., α_d, α_{d+1}) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j} − α_{d+1} x_{d+1,j})^2. FIS

49 Overfitting Given d variables (including the constant variable), consider the least-squares criterion L_d(α_1, ..., α_d) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j})^2. Add any variable vector x_{d+1,j}, j = 1, ..., N, and define L_{d+1}(α_1, ..., α_d, α_{d+1}) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j} − α_{d+1} x_{d+1,j})^2. THEN min_{α ∈ R^{d+1}} L_{d+1}(α) ≤ min_{α ∈ R^d} L_d(α). FIS

50 Overfitting Given d variables (including the constant variable), consider the least-squares criterion L_d(α_1, ..., α_d) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j})^2. Add any variable vector x_{d+1,j}, j = 1, ..., N, and define L_{d+1}(α_1, ..., α_d, α_{d+1}) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j} − α_{d+1} x_{d+1,j})^2. Then min_{α ∈ R^{d+1}} L_{d+1}(α) ≤ min_{α ∈ R^d} L_d(α). Why? Because L_d(α_1, ..., α_d) = L_{d+1}(α_1, ..., α_d, 0). FIS

51 Overfitting Given d variables (including the constant variable), consider the least-squares criterion L_d(α_1, ..., α_d) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j})^2. Add any variable vector x_{d+1,j}, j = 1, ..., N, and define L_{d+1}(α_1, ..., α_d, α_{d+1}) = ∑_j (y_j − ∑_{i=1}^d α_i x_{i,j} − α_{d+1} x_{d+1,j})^2. Then min_{α ∈ R^{d+1}} L_{d+1}(α) ≤ min_{α ∈ R^d} L_d(α). Why? Because L_d(α_1, ..., α_d) = L_{d+1}(α_1, ..., α_d, 0). The residual sum of squares goes down... but is it relevant to add variables? FIS
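
A quick numerical illustration of this inequality on synthetic data (not from the lecture): appending a pure-noise column to the design matrix can only decrease the minimal residual sum of squares.

  % Minimal sketch: adding any column (even pure noise) cannot increase the minimal RSS.
  N = 60;
  X = [ones(N, 1), randn(N, 3)];            % constant + 3 predictors
  Y = X * [1; 2; -1; 0.5] + randn(N, 1);
  rss = @(A) norm(Y - A * ((A' * A) \ (A' * Y)))^2;   % minimal RSS for design matrix A
  Xbig = [X, randn(N, 1)];                  % append an irrelevant, pure-noise variable
  [rss(X), rss(Xbig)]                       % the second value is always <= the first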

52 Occam's razor: a formalization of overfitting Minimizing the least-squares criterion (RSS) is not clever enough; we need another idea to avoid overfitting. Occam's razor (lex parsimoniae, the law of parsimony) is the principle that recommends selecting the hypothesis that makes the fewest assumptions: one should always opt for an explanation in terms of the fewest possible causes, factors, or variables. (Wikipedia: William of Ockham, died 1347.) FIS

53 Advanced Regression Techniques FIS

54 Quick Reminder on Vector Norms For a vector a ∈ R^d, the Euclidean norm is the quantity ‖a‖_2 = (∑_{i=1}^d a_i^2)^{1/2}. More generally, the q-norm is, for q > 0, ‖a‖_q = (∑_{i=1}^d |a_i|^q)^{1/q}. In particular, for q = 1, ‖a‖_1 = ∑_{i=1}^d |a_i|. In the limits q → ∞ and q → 0, ‖a‖_∞ = max_{i=1,...,d} |a_i| and ‖a‖_0 = #{i | a_i ≠ 0}. FIS
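
For concreteness, a tiny sketch evaluating these norms on one vector; note that ‖a‖_0 is just a count of nonzero entries (it is not a true norm, since it is not homogeneous).

  a = [3; 0; -4; 0; 1];
  norm(a, 2)      % Euclidean norm: sqrt(9 + 16 + 1)
  norm(a, 1)      % sum of absolute values: 8
  norm(a, Inf)    % largest absolute value: 4
  nnz(a)          % "0-norm": number of nonzero entries, here 3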

55 Tikhonov Regularization ('43) - Ridge Regression ('62) Tikhonov's motivation: solve ill-posed inverse problems by regularization. If min_α L(α) is achieved on many points... consider min_α L(α) + λ ‖α‖_2^2. We can show that this leads to selecting α̂ = (X^T X + λ I_{d+1})^{-1} X^T Y. The condition number has changed to (λ_max(X^T X) + λ)/(λ_min(X^T X) + λ). FIS
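
A minimal ridge sketch on synthetic data (not from the lecture), reusing the nearly collinear design from the earlier conditioning example: it applies the closed form α̂ = (X^T X + λ I)^{-1} X^T Y and shows how λ improves the condition number.

  % Minimal sketch: ridge regression on an ill-conditioned design.
  N = 100; lambda = 1;
  x1 = randn(N, 1);  x2 = x1 + 1e-6 * randn(N, 1);    % nearly collinear pair
  X  = [ones(N, 1), x1, x2];
  Y  = 3 + x1 + randn(N, 1);
  I  = eye(size(X, 2));
  alpha_ridge = (X' * X + lambda * I) \ (X' * Y)      % (X'X + lambda I)^{-1} X'Y
  [cond(X' * X), cond(X' * X + lambda * I)]           % the condition number drops sharply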

56 Subset selection: Exhaustive Search Following Ockham's razor, ideally we would like to know, for any value p, min_{α, ‖α‖_0 = p} L(α), i.e. select the best vector α which only gives weights to p variables: find the best combination of p variables. Practical implementation: for p ≤ n there are C(n, p) possible combinations of p variables. Brute-force approach: generate the C(n, p) regression problems and select the one that achieves the best RSS. Impossible in practice with moderately large n and p... C(30, 5) = 142506. FIS
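
For a toy problem small enough to enumerate, the brute-force search can be written in a few lines (a sketch on synthetic data; the variable names are made up). For realistic n it is hopeless, as the C(30, 5) figure above suggests.

  % Minimal sketch: brute-force best subset of p variables (only feasible for tiny n).
  N = 80; n = 8; p = 3;
  X = randn(N, n);
  Y = X(:, [1 4 6]) * [2; -3; 1] + 0.1 * randn(N, 1);   % only variables 1, 4, 6 matter
  combos = nchoosek(1:n, p);                % all C(n, p) candidate subsets
  best_rss = Inf; best_subset = [];
  for c = 1:size(combos, 1)
      S = combos(c, :);
      alpha = (X(:, S)' * X(:, S)) \ (X(:, S)' * Y);
      r = norm(Y - X(:, S) * alpha)^2;
      if r < best_rss, best_rss = r; best_subset = S; end
  end
  best_subset                               % should recover [1 4 6]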

57 Subset selection: Forward Search Since the exact search is intractable in practice, consider the forward heuristic. In forward search: define I_1 = {0}. Given a set I_k ⊂ {0, ..., d} of k variables, what is the most informative variable one could add? Compute for each variable i in {0, ..., d} \ I_k: t_i = min_{(α_k)_{k ∈ I_k}, α} ∑_{j=1}^N (y_j − ∑_{k ∈ I_k} α_k x_{k,j} − α x_{i,j})^2. Set I_{k+1} = I_k ∪ {i*} for any i* such that t_{i*} = min_i t_i; set k = k+1 and repeat until the desired number of variables is reached. FIS
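
A greedy forward-selection sketch following the t_i criterion above (synthetic data, not from the lecture; the first column of ones plays the role of variable 0).

  % Minimal sketch: greedy forward selection, adding the variable that most reduces the RSS.
  N = 80; n = 10;
  X = [ones(N, 1), randn(N, n)];            % column 1 is the constant "variable 0"
  Y = 2 + 3 * X(:, 4) - 2 * X(:, 7) + 0.1 * randn(N, 1);
  rss = @(S) norm(Y - X(:, S) * ((X(:, S)' * X(:, S)) \ (X(:, S)' * Y)))^2;
  I = 1;                                    % I_1 = {0}: start from the constant column
  for step = 1:3                            % add 3 variables
      cand = setdiff(1:size(X, 2), I);
      t = arrayfun(@(i) rss([I, i]), cand); % t_i for every variable not yet selected
      [~, k] = min(t);
      I = [I, cand(k)];                     % I_{k+1} = I_k U {i*}
  end
  I                                         % selected columns; should include 4 and 7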

58 Subset selection: Backward Search ...or the backward heuristic. In backward search: define I_{d+1} = {0, 1, ..., d}. Given a set I_k ⊂ {0, ..., d} of k variables, what is the least informative variable one could remove? Compute for each variable i in I_k: t_i = min_{(α_k)_{k ∈ I_k \ {i}}} ∑_{j=1}^N (y_j − ∑_{k ∈ I_k \ {i}} α_k x_{k,j})^2. Set I_{k-1} = I_k \ {i*} for any i* such that t_{i*} = min_i t_i (removing that variable hurts the fit least); set k = k−1 and repeat until the desired number of variables is reached. FIS

59 Subset selection: LASSO Naive least squares: min_α L(α). Best fit with p variables (Occam!): min_{α, ‖α‖_0 = p} L(α). Tikhonov-regularized least squares: min_α L(α) + λ ‖α‖_2^2. LASSO (least absolute shrinkage and selection operator): min_α L(α) + λ ‖α‖_1. FIS
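
The LASSO objective is not differentiable, so it is not solved by a linear system. One simple option is proximal gradient descent (ISTA), where each step applies soft-thresholding. The sketch below only illustrates that idea on synthetic data and is not the course's reference method; it minimizes (1/2)‖Y − Xα‖_2^2 + λ‖α‖_1, which matches the LASSO criterion up to a rescaling of λ.

  % Minimal sketch: LASSO via proximal gradient (ISTA) with soft-thresholding.
  N = 100; n = 20; lambda = 5;
  X = randn(N, n);
  alpha_true = zeros(n, 1); alpha_true([3 8 15]) = [4; -3; 2];   % sparse ground truth
  Y = X * alpha_true + 0.1 * randn(N, 1);
  soft = @(v, t) sign(v) .* max(abs(v) - t, 0);   % soft-thresholding: prox of t * ||.||_1
  step = 1 / norm(X' * X);                        % step size, 1 / Lipschitz constant of the gradient
  alpha = zeros(n, 1);
  for it = 1:2000
      grad  = X' * (X * alpha - Y);               % gradient of (1/2) ||Y - X alpha||^2
      alpha = soft(alpha - step * grad, step * lambda);
  end
  find(alpha ~= 0)'                               % most entries are exactly zero (ideally [3 8 15])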
