Linear Models for Regression
1 Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1
2 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter of The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
3 Outline Regression Linear Basis Function Models Loss Functions for Regression Finding Optimal Weights Regularization Bayesian Linear Regression
5 Regression Given training set {(x_1, t_1), ..., (x_N, t_N)}. t_i is continuous: regression. For now, assume t_i ∈ R, x_i ∈ R^D. E.g. t_i is a stock price; x_i contains company profit, debt, cash flow, gross sales, number of spam emails sent, ...
6 Outline Regression Linear Basis Function Models Loss Functions for Regression Finding Optimal Weights Regularization Bayesian Linear Regression
7 Linear Functions A function f(·) is linear if f(αx_1 + βx_2) = αf(x_1) + βf(x_2). Linear functions will lead to simple algorithms, so let's see what we can do with them.
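The defining property above is easy to check numerically. A minimal sketch (not from the slides; the matrix A and the test vectors are arbitrary illustrative choices) verifying f(αx_1 + βx_2) = αf(x_1) + βf(x_2) for a linear map f(x) = Ax:

```python
import numpy as np

# An arbitrary linear map f(x) = A @ x (A chosen for illustration only).
A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

def f(x):
    return A @ x

x1 = np.array([1.0, -1.0])
x2 = np.array([0.5, 2.0])
alpha, beta = 2.0, -3.0

lhs = f(alpha * x1 + beta * x2)        # f(alpha*x1 + beta*x2)
rhs = alpha * f(x1) + beta * f(x2)     # alpha*f(x1) + beta*f(x2)
# lhs and rhs agree, confirming linearity of f.
```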
8 Linear Regression Simplest linear model for regression: y(x, w) = w_0 + w_1 x_1 + w_2 x_2 + ... + w_D x_D. Remember, we're learning w: set w so that y(x, w) aligns with the target values in the training data. This is a very simple model, limited in what it can do.
11 Linear Basis Function Models The simplest linear model y(x, w) = w_0 + w_1 x_1 + w_2 x_2 + ... + w_D x_D was linear in both x and w. Linearity in w is what will be important for simple algorithms. Extend it to include fixed non-linear functions of the data: y(x, w) = w_0 + w_1 ϕ_1(x) + w_2 ϕ_2(x) + ... + w_{M-1} ϕ_{M-1}(x). Linear combinations of these basis functions are also linear in the parameters.
14 Linear Basis Function Models The bias parameter w_0 allows a fixed offset in the data: y(x, w) = w_0 (bias) + w_1 ϕ_1(x) + w_2 ϕ_2(x) + ... + w_{M-1} ϕ_{M-1}(x). Think of simple 1-D x: y(x, w) = w_0 (intercept) + w_1 x (slope). For notational convenience, define ϕ_0(x) = 1: y(x, w) = Σ_{j=0}^{M-1} w_j ϕ_j(x) = w^T ϕ(x).
17 Linear Basis Function Models The function for regression y(x, w) is a non-linear function of x, but linear in w: y(x, w) = Σ_{j=0}^{M-1} w_j ϕ_j(x) = w^T ϕ(x). Polynomial regression is an example of this. Order-M polynomial regression: ϕ_j(x) = ? Answer: ϕ_j(x) = x^j, giving y(x, w) = w_0 x^0 + w_1 x^1 + ... + w_M x^M.
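As a concrete illustration (not from the slides), the polynomial basis ϕ_j(x) = x^j can be assembled into a design matrix with NumPy; `np.vander` with `increasing=True` puts the bias column ϕ_0(x) = x^0 first:

```python
import numpy as np

def polynomial_design_matrix(x, M):
    """Row n is (x_n^0, x_n^1, ..., x_n^M); column j is phi_j(x) = x**j."""
    return np.vander(np.asarray(x, dtype=float), N=M + 1, increasing=True)

x = [0.0, 1.0, 2.0]
Phi = polynomial_design_matrix(x, M=2)
# Phi = [[1, 0, 0],
#        [1, 1, 1],
#        [1, 2, 4]]
```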
20 Basis Functions: Feature Functions Often we extract features from x. An intuitive way to think of ϕ_j(x) is as feature functions. E.g. an automatic project report grading system. x is the text of the report: "In this project we apply the algorithm of Möller [2] to recognizing blue objects. We test this algorithm on pictures of you and I from my holiday photo collection. ..." ϕ_1(x) is the count of occurrences of "Möller [", ϕ_2(x) is the count of occurrences of "of you and I". Regression grade: y(x, w) = 20 ϕ_1(x) − 10 ϕ_2(x).
24 Other Non-linear Basis Functions Polynomial: ϕ_j(x) = x^j. Gaussian: ϕ_j(x) = exp{−(x − µ_j)^2 / (2s^2)}. Sigmoidal: ϕ_j(x) = 1 / (1 + exp((µ_j − x)/s)).
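A small sketch of the Gaussian and sigmoidal basis functions (the numbers are assumed examples, not from the slides); at x = µ_j the Gaussian peaks at 1 and the sigmoid passes through 0.5:

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))"""
    return np.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

def sigmoid_basis(x, mu, s):
    """phi_j(x) = 1 / (1 + exp((mu_j - x) / s))"""
    return 1.0 / (1.0 + np.exp((mu - x) / s))

g_at_mu = gaussian_basis(2.0, mu=2.0, s=0.5)   # Gaussian peak: 1.0
s_at_mu = sigmoid_basis(2.0, mu=2.0, s=0.5)    # sigmoid midpoint: 0.5
```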
25 Example - Gaussian Basis Functions: Temperature Use Gaussian basis functions, regression on temperature
26 Example - Gaussian Basis Functions: Temperature µ_1 = Vancouver, µ_2 = San Francisco, µ_3 = Oakland
27 Example - Gaussian Basis Functions: Temperature µ_1 = Vancouver, µ_2 = San Francisco, µ_3 = Oakland. Temperature in x = Seattle? y(x, w) = w_1 exp{−(x − µ_1)^2 / (2s^2)} + w_2 exp{−(x − µ_2)^2 / (2s^2)} + w_3 exp{−(x − µ_3)^2 / (2s^2)}. Compute distances to all µ; since Seattle is closest to µ_1, y(x, w) ≈ w_1.
29 Example - Gaussian Basis Functions Could define ϕ_j(x) = exp{−(x − x_j)^2 / (2s^2)}: a Gaussian around each training data point x_j, N of them. Could be used for modelling temperature or resource availability at spatial location x. Overfitting: this interpolates the data. Example of a kernel method; more later.
34 Outline Regression Linear Basis Function Models Loss Functions for Regression Finding Optimal Weights Regularization Bayesian Linear Regression
35 Loss Functions for Regression We want to find the best set of coefficients w. Recall, one way to define "best" was minimizing the squared error: E(w) = (1/2) Σ_{n=1}^{N} {y(x_n, w) − t_n}^2. We will now look at another way, based on maximum likelihood.
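The squared-error loss above is straightforward to compute; a minimal sketch (the Φ and t below are illustrative, not from the slides), where a weight vector that fits the data exactly gives E(w) = 0:

```python
import numpy as np

def squared_error(w, Phi, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2, with y(x_n, w) = w^T phi(x_n)."""
    residual = Phi @ w - t
    return 0.5 * residual @ residual

# Two 1-D points lying exactly on the line t = x, basis (1, x).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0]])
t = np.array([0.0, 1.0])

E_exact = squared_error(np.array([0.0, 1.0]), Phi, t)  # perfect fit: 0.0
E_bad = squared_error(np.array([0.0, 0.0]), Phi, t)    # predicts 0 everywhere
```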
36 Gaussian Noise Model for Regression We are provided with a training set {(x_i, t_i)}. Let's assume t arises from a deterministic function plus Gaussian distributed noise (with precision β): t = y(x, w) + ε. The probability of observing a target value t is then: p(t | x, w, β) = N(t | y(x, w), β^{−1}). Notation: N(x | µ, σ^2) means x is drawn from a Gaussian with mean µ and variance σ^2.
38 Maximum Likelihood for Regression The likelihood of the data t = {t_i} under this Gaussian noise model is: p(t | w, β) = Π_{n=1}^{N} N(t_n | w^T ϕ(x_n), β^{−1}). The log-likelihood is: ln p(t | w, β) = Σ_{n=1}^{N} ln [ √(β/(2π)) exp(−(β/2)(t_n − w^T ϕ(x_n))^2) ] = (N/2) ln β − (N/2) ln(2π) (const. wrt w) − β · (1/2) Σ_{n=1}^{N} (t_n − w^T ϕ(x_n))^2 (squared error). Minimizing the sum of squared errors is maximum likelihood under a Gaussian noise model.
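To see the equivalence numerically, here is a sketch (synthetic data and the value of β are assumptions, not from the slides): the least-squares weights achieve a higher Gaussian log-likelihood than a perturbation of them.

```python
import numpy as np

def log_likelihood(w, beta, Phi, t):
    """ln p(t | w, beta) = N/2 ln(beta) - N/2 ln(2 pi) - beta/2 * SSE(w)."""
    N = len(t)
    sse = np.sum((t - Phi @ w) ** 2)
    return 0.5 * N * np.log(beta) - 0.5 * N * np.log(2 * np.pi) - 0.5 * beta * sse

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)
t = 0.5 - 1.5 * x + 0.1 * rng.normal(size=30)   # noisy line
Phi = np.column_stack([np.ones_like(x), x])

w_ls = np.linalg.pinv(Phi) @ t                   # least-squares weights
ll_best = log_likelihood(w_ls, 100.0, Phi, t)
ll_off = log_likelihood(w_ls + np.array([0.1, 0.0]), 100.0, Phi, t)
# ll_best > ll_off: minimizing squared error maximizes the likelihood.
```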
42 Outline Regression Linear Basis Function Models Loss Functions for Regression Finding Optimal Weights Regularization Bayesian Linear Regression
43 Finding Optimal Weights How do we maximize the likelihood wrt w (or minimize the squared error)? Take the gradient of the log-likelihood wrt w: ∂/∂w_i ln p(t | w, β) = β Σ_{n=1}^{N} (t_n − w^T ϕ(x_n)) ϕ_i(x_n). In vector form: ∇ ln p(t | w, β) = β Σ_{n=1}^{N} (t_n − w^T ϕ(x_n)) ϕ(x_n)^T.
46 Finding Optimal Weights Set the gradient to 0: 0^T = Σ_{n=1}^{N} t_n ϕ(x_n)^T − w^T (Σ_{n=1}^{N} ϕ(x_n) ϕ(x_n)^T). Maximum likelihood estimate for w: w_ML = (Φ^T Φ)^{−1} Φ^T t, where Φ is the N × M design matrix whose row n is (ϕ_0(x_n), ϕ_1(x_n), ..., ϕ_{M-1}(x_n)). Φ† = (Φ^T Φ)^{−1} Φ^T is known as the pseudo-inverse (pinv in MATLAB).
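A sketch of the closed-form solution on synthetic data (the generating model t = 2 + 3x plus noise is an assumption, not from the slides); `np.linalg.pinv` plays the role of MATLAB's `pinv`:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200
x = rng.uniform(-1.0, 1.0, size=N)
t = 2.0 + 3.0 * x + 0.1 * rng.normal(size=N)   # true weights (2, 3)

# Design matrix: phi_0(x) = 1 (bias), phi_1(x) = x.
Phi = np.column_stack([np.ones(N), x])

# w_ML = (Phi^T Phi)^-1 Phi^T t, computed via the pseudo-inverse.
w_ml = np.linalg.pinv(Phi) @ t
# w_ml recovers the true weights up to noise.
```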
48 Geometry of Least Squares [Figure: the target vector t projected onto the subspace S spanned by ϕ_1 and ϕ_2, giving y.] t = (t_1, ..., t_N) is the target value vector. S is the space spanned by φ_j = (ϕ_j(x_1), ..., ϕ_j(x_N)). The solution y lies in S. The least squares solution is the orthogonal projection of t onto S. Can verify this by looking at y = Φ w_ML = Φ Φ† t = P t, where P^2 = P and P = P^T.
53 Sequential Learning In practice N might be huge, or data might arrive online. Can use a gradient descent method: start with an initial guess for w; update by taking a step in the direction of the negative gradient −∇E of the error function. Modify to use stochastic / sequential gradient descent: if the error function decomposes as E = Σ_n E_n (e.g. least squares), update by taking a step along −∇E_n for one example at a time. Details about the step size are important: decrease the step size towards the end.
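The sequential update above can be sketched as the least-mean-squares rule w ← w + η(t_n − w^T ϕ(x_n)) ϕ(x_n), applied one example at a time (the synthetic data and the step-size schedule are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
t = 1.0 + 2.0 * x + 0.05 * rng.normal(size=200)  # true weights (1, 2)
Phi = np.column_stack([np.ones_like(x), x])

w = np.zeros(2)   # initial guess
eta = 0.1         # step size
for epoch in range(100):
    for n in rng.permutation(len(t)):
        error = t[n] - w @ Phi[n]    # residual on one example
        w += eta * error * Phi[n]    # step along -grad E_n
    eta *= 0.97                      # decrease the step size over time
# w approaches the least-squares solution, close to (1, 2).
```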
57 Outline Regression Linear Basis Function Models Loss Functions for Regression Finding Optimal Weights Regularization Bayesian Linear Regression
58 Regularization We discussed regularization as a technique to avoid over-fitting: Ẽ(w) = (1/2) Σ_{n=1}^{N} {y(x_n, w) − t_n}^2 + (λ/2) ||w||^2, where the second term is the regularizer. Next on the menu: other regularizers; Bayesian learning and the quadratic regularizer.
59 Other Regularizers [Figure: contours of the regularizer for q = 0.5, q = 1, q = 2, q = 4.] Can use different norms for the regularizer: Ẽ(w) = (1/2) Σ_{n=1}^{N} {y(x_n, w) − t_n}^2 + (λ/2) Σ_{j=1}^{M} |w_j|^q. E.g. q = 2 is ridge regression, q = 1 is the lasso; the math is easiest with ridge regression.
60 Optimization with a Quadratic Regularizer With q = 2, the total regularized error is still a nice quadratic: Ẽ(w) = (1/2) Σ_{n=1}^{N} {y(x_n, w) − t_n}^2 + (λ/2) w^T w. Calculus gives: w = (λI + Φ^T Φ)^{−1} Φ^T t. Similar to unregularized least squares. Advantage: (λI + Φ^T Φ) is well conditioned, so inversion is stable.
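The regularized normal equations can be sketched directly (the toy Φ and t below are illustrative, not from the slides); note how a large λ shrinks the weights:

```python
import numpy as np

def ridge_weights(Phi, t, lam):
    """w = (lam * I + Phi^T Phi)^-1 Phi^T t; lam = 0 recovers least squares."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# Three points exactly on the line t = x, basis (1, x).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])

w_ls = ridge_weights(Phi, t, lam=0.0)       # exact fit: (0, 1)
w_shrunk = ridge_weights(Phi, t, lam=10.0)  # pulled towards the origin
```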
61 Ridge Regression vs. Lasso [Figure: error contours meeting the constraint region, a circle for ridge regression and a diamond for the lasso.] Ridge regression is aka parameter shrinkage: the weights w shrink back towards the origin. The lasso leads to sparse models: components of w tend to 0 with large λ (strong regularization). Intuitively, once the minimum is achieved at a large radius, the minimum lies at a corner such as w_1 = 0.
63 Outline Regression Linear Basis Function Models Loss Functions for Regression Finding Optimal Weights Regularization Bayesian Linear Regression
64 Bayesian Linear Regression Start with a prior over the parameters w. The conjugate prior is a Gaussian: p(w) = N(w | 0, α^{−1} I). This simple form will make the math easier; it can be done for an arbitrary Gaussian too. Data likelihood, Gaussian model as before: p(t | x, w, β) = N(t | y(x, w), β^{−1}).
66 Bayesian Linear Regression Posterior distribution on w: p(w | t) ∝ (Π_{n=1}^{N} p(t_n | x_n, w, β)) p(w) = [Π_{n=1}^{N} √(β/(2π)) exp(−(β/2)(t_n − w^T ϕ(x_n))^2)] (α/(2π))^{M/2} exp(−(α/2) w^T w). Take the log: ln p(w | t) = −(β/2) Σ_{n=1}^{N} (t_n − w^T ϕ(x_n))^2 − (α/2) w^T w + const. L_2 regularization is maximum a posteriori (MAP) with a Gaussian prior, with λ = α/β.
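For this conjugate setup the posterior is itself Gaussian, N(w | m_N, S_N) with S_N^{−1} = αI + βΦ^TΦ and m_N = βS_NΦ^Tt (the standard result in Bishop Ch. 3). A sketch with toy data (the Φ, t, α, β values are assumptions) also checks the claim above that the MAP estimate equals ridge regression with λ = α/β:

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Gaussian posterior N(w | m_N, S_N) for the prior N(w | 0, alpha^-1 I)."""
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
alpha, beta = 1.0, 25.0

m_N, S_N = posterior(Phi, t, alpha, beta)

# The posterior mean (MAP estimate) equals ridge regression at lambda = alpha/beta.
lam = alpha / beta
w_ridge = np.linalg.solve(lam * np.eye(2) + Phi.T @ Phi, Phi.T @ t)
```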
70 Bayesian Linear Regression - Intuition Simple example: x, t ∈ R, y(x, w) = w_0 + w_1 x. Start with a Gaussian prior in parameter space; samples are shown in data space. Receive data points (blue circles in data space). Compute the likelihood. The posterior is the prior (or previous posterior) times the likelihood.
76 Predictive Distribution A single estimate of w (ML or MAP) doesn't tell the whole story. We have a distribution over w, and can use it to make predictions. Given a new value for x, we can compute a distribution over t: p(t | t, α, β) = ∫ p(t, w | t, α, β) dw = ∫ p(t | w, β) (predict) · p(w | t, α, β) (probability) dw (sum). I.e. for each value of w, let it make a prediction, multiply by its probability, and sum over all w. For arbitrary models as the distributions, this integral may not be computationally tractable.
79 Predictive Distribution With the Gaussians we've used for these distributions, the predictive distribution will also be Gaussian (math on convolutions of Gaussians). [Figure: the green line is the true (unobserved) curve, blue points are data, the red line is the predictive mean, and the pink band is one standard deviation.] Uncertainty is small around the data points, and the pink region shrinks with more data.
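Under these Gaussians the predictive distribution has mean m_N^T ϕ(x) and variance σ^2(x) = 1/β + ϕ(x)^T S_N ϕ(x) (the standard result; the toy numbers below are assumptions, not from the slides). The sketch shows the point above: uncertainty grows away from the data.

```python
import numpy as np

# Toy posterior from three points on t = x with basis (1, x).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
alpha, beta = 1.0, 25.0

S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predictive(phi_x):
    """Mean and variance of the Gaussian predictive distribution at phi(x)."""
    mean = m_N @ phi_x
    var = 1.0 / beta + phi_x @ S_N @ phi_x  # noise floor + parameter uncertainty
    return mean, var

_, var_inside = predictive(np.array([1.0, 1.0]))    # x = 1, amid the data
_, var_outside = predictive(np.array([1.0, 10.0]))  # x = 10, far from the data
# var_outside > var_inside, and both exceed the noise floor 1/beta.
```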
80 Bayesian Model Selection So what do the Bayesians say about model selection? Model selection is choosing a model M_i, e.g. the degree of the polynomial or the type of basis function ϕ. Don't select, just integrate: p(t | x, D) = Σ_{i=1}^{L} p(t | x, M_i, D) p(M_i | D), where p(t | x, M_i, D) is the predictive distribution of model M_i. Average together the results of all models. Could instead choose the most likely model a posteriori, max_i p(M_i | D): more efficient, but an approximation.
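The model-averaged prediction is just a mixture weighted by the model posterior. A minimal sketch with hypothetical numbers (the two predictive means and the posterior probabilities are assumed, not computed from any data):

```python
import numpy as np

# Hypothetical predictive means of two candidate models at the same input x.
model_means = np.array([1.2, 0.8])
# Hypothetical model posterior p(M_i | D); the weights sum to 1.
model_post = np.array([0.7, 0.3])

averaged = model_means @ model_post              # Bayesian model average
best_model = model_means[np.argmax(model_post)]  # most-likely-model approximation
```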
84 Bayesian Model Selection How do we compute the posterior over models? p(M_i | D) ∝ p(D | M_i) p(M_i). Another likelihood + prior combination. Likelihood (the model evidence): p(D | M_i) = ∫ p(D | w, M_i) p(w | M_i) dw.
86 Conclusion Readings: Ch. 3.1, , , 3.4. Linear models for regression: a linear combination of (non-linear) basis functions. Fitting the parameters of a regression model: least squares; maximum likelihood (can be = least squares). Controlling over-fitting: regularization; Bayesian, use a prior (can be = regularization). Model selection: cross-validation (use held-out data); Bayesian (use the model evidence, likelihood).
These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationSCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati
SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell Informazione Regressione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy http://homes.di.unimi.it/donida
More informationNon-parametric Methods
Non-parametric Methods Machine Learning Alireza Ghane Non-Parametric Methods Alireza Ghane / Torsten Möller 1 Outline Machine Learning: What, Why, and How? Curve Fitting: (e.g.) Regression and Model Selection
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationToday. Calculus. Linear Regression. Lagrange Multipliers
Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationKernel Methods. Foundations of Data Analysis. Torsten Möller. Möller/Mori 1
Kernel Methods Foundations of Data Analysis Torsten Möller Möller/Mori 1 Reading Chapter 6 of Pattern Recognition and Machine Learning by Bishop Chapter 12 of The Elements of Statistical Learning by Hastie,
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationMachine learning - HT Basis Expansion, Regularization, Validation
Machine learning - HT 016 4. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford Feburary 03, 016 Outline Introduce basis function to go beyond linear regression Understanding
More informationOutline Lecture 2 2(32)
Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationWeek 3: Linear Regression
Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to
More informationCSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes
CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationLecture 1b: Linear Models for Regression
Lecture 1b: Linear Models for Regression Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced
More information1. Non-Uniformly Weighted Data [7pts]
Homework 1: Linear Regression Writeup due 23:59 on Friday 6 February 2015 You will do this assignment individually and submit your answers as a PDF via the Canvas course website. There is a mathematical
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review
More informationMachine Learning. 7. Logistic and Linear Regression
Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,
More informationPattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods
Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationMultivariate Bayesian Linear Regression MLAI Lecture 11
Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012 Outline Univariate Bayesian Linear Regression Multivariate
More informationThe evolution from MLE to MAP to Bayesian Learning
The evolution from MLE to MAP to Bayesian Learning Zhe Li January 13, 2015 1 The evolution from MLE to MAP to Bayesian Based on the linear regression, we illustrate the evolution from Maximum Loglikelihood
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationBayesian methods in economics and finance
1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationOverview c 1 What is? 2 Definition Outlines 3 Examples of 4 Related Fields Overview Linear Regression Linear Classification Neural Networks Kernel Met
c Outlines Statistical Group and College of Engineering and Computer Science Overview Linear Regression Linear Classification Neural Networks Kernel Methods and SVM Mixture Models and EM Resources More
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationFundamentals of Machine Learning
Fundamentals of Machine Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://icapeople.epfl.ch/mekhan/ emtiyaz@gmail.com Jan 2, 27 Mohammad Emtiyaz Khan 27 Goals Understand (some) fundamentals of Machine
More informationFundamentals of Machine Learning (Part I)
Fundamentals of Machine Learning (Part I) Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp April 12, 2018 Mohammad Emtiyaz Khan 2018 1 Goals Understand (some) fundamentals
More informationCSC2515 Assignment #2
CSC2515 Assignment #2 Due: Nov.4, 2pm at the START of class Worth: 18% Late assignments not accepted. 1 Pseudo-Bayesian Linear Regression (3%) In this question you will dabble in Bayesian statistics and
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationIntroduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module 2 Lecture 05 Linear Regression Good morning, welcome
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationCSC 411: Lecture 04: Logistic Regression
CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More informationArtificial Neural Networks
Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationCSC 411: Lecture 02: Linear Regression
CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression
More informationy(x) = x w + ε(x), (1)
Linear regression We are ready to consider our first machine-learning problem: linear regression. Suppose that e are interested in the values of a function y(x): R d R, here x is a d-dimensional vector-valued
More informationRelevance Vector Machines
LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian
More informationLinear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationRirdge Regression. Szymon Bobek. Institute of Applied Computer science AGH University of Science and Technology
Rirdge Regression Szymon Bobek Institute of Applied Computer science AGH University of Science and Technology Based on Carlos Guestrin adn Emily Fox slides from Coursera Specialization on Machine Learnign
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationDensity Estimation: ML, MAP, Bayesian estimation
Density Estimation: ML, MAP, Bayesian estimation CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Maximum-Likelihood Estimation Maximum
More information