Naive Bayes and Gaussian Bayes Classifier

1 Naive Bayes and Gaussian Bayes Classifier

Elias Tragas, October 3, 2016

2 Naive Bayes

Bayes rule:

    p(t \mid x) = \frac{p(x \mid t)\, p(t)}{p(x)}

Naive Bayes assumption:

    p(x \mid t) = \prod_{j=1}^{D} p(x_j \mid t)

Likelihood function:

    L(\theta) = p(x, t \mid \theta) = p(x \mid t, \theta)\, p(t \mid \theta)

3 Example: Spam Classification

Each vocabulary word is one feature dimension.
We encode each email as a feature vector x \in \{0, 1\}^{|V|}, with x_j = 1 iff vocabulary word j appears in the email.

4 Example: Spam Classification

Each vocabulary word is one feature dimension.
We encode each email as a feature vector x \in \{0, 1\}^{|V|}, with x_j = 1 iff vocabulary word j appears in the email.
We want to model the probability of any word x_j appearing in an email, given that the email is spam or not.
Example words: $10,000, Toronto, Piazza, etc.

5 Example: Spam Classification

Each vocabulary word is one feature dimension.
We encode each email as a feature vector x \in \{0, 1\}^{|V|}, with x_j = 1 iff vocabulary word j appears in the email.
We want to model the probability of any word x_j appearing in an email, given that the email is spam or not.
Example words: $10,000, Toronto, Piazza, etc.
Idea: use a Bernoulli distribution to model p(x_j \mid t).
Example: p("$10,000" \mid \text{spam}) = 0.3

6 Bernoulli Naive Bayes

Assume all data points x^{(i)} are i.i.d. samples, and p(x_j \mid t) follows a Bernoulli distribution with parameter \mu_{jt}:

    p(x^{(i)} \mid t^{(i)}) = \prod_{j=1}^{D} \mu_{j t^{(i)}}^{x_j^{(i)}} \left(1 - \mu_{j t^{(i)}}\right)^{1 - x_j^{(i)}}
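
As a concrete illustration of this product, here is a minimal NumPy sketch (not part of the original slides; the function name and array layout are hypothetical) that evaluates p(x^{(i)} | t^{(i)}) for every training example:

    import numpy as np

    def bernoulli_likelihood(X, t, mu):
        """p(x^(i) | t^(i)) for every example under the Bernoulli naive Bayes model.

        X  : (N, D) binary feature matrix
        t  : (N,) integer class labels
        mu : (D, K) Bernoulli parameters, mu[j, k] = p(x_j = 1 | t = k)
        """
        mu_t = mu[:, t].T                                 # (N, D): row i holds mu_{j, t_i}
        per_feature = np.where(X == 1, mu_t, 1.0 - mu_t)  # mu^x (1 - mu)^(1 - x) for binary x
        return per_feature.prod(axis=1)                   # (N,) product over the D features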

7 Bernoulli Naive Bayes

Assume all data points x^{(i)} are i.i.d. samples, and p(x_j \mid t) follows a Bernoulli distribution with parameter \mu_{jt}:

    p(x^{(i)} \mid t^{(i)}) = \prod_{j=1}^{D} \mu_{j t^{(i)}}^{x_j^{(i)}} \left(1 - \mu_{j t^{(i)}}\right)^{1 - x_j^{(i)}}

    p(t, x) = \prod_{i=1}^{N} p(t^{(i)})\, p(x^{(i)} \mid t^{(i)}) = \prod_{i=1}^{N} p(t^{(i)}) \prod_{j=1}^{D} \mu_{j t^{(i)}}^{x_j^{(i)}} \left(1 - \mu_{j t^{(i)}}\right)^{1 - x_j^{(i)}}

8 Bernoulli Naive Bayes

Assume all data points x^{(i)} are i.i.d. samples, and p(x_j \mid t) follows a Bernoulli distribution with parameter \mu_{jt}:

    p(x^{(i)} \mid t^{(i)}) = \prod_{j=1}^{D} \mu_{j t^{(i)}}^{x_j^{(i)}} \left(1 - \mu_{j t^{(i)}}\right)^{1 - x_j^{(i)}}

    p(t, x) = \prod_{i=1}^{N} p(t^{(i)})\, p(x^{(i)} \mid t^{(i)}) = \prod_{i=1}^{N} p(t^{(i)}) \prod_{j=1}^{D} \mu_{j t^{(i)}}^{x_j^{(i)}} \left(1 - \mu_{j t^{(i)}}\right)^{1 - x_j^{(i)}}

where p(t) = \pi_t. The parameters \pi_t and \mu_{jt} can be learnt using maximum likelihood.

9 Derivation of maximum likelihood estimator (MLE)

    \theta = [\mu, \pi]

    \log L(\theta) = \log p(x, t \mid \theta)

10 Derivation of maximum likelihood estimator (MLE)

    \theta = [\mu, \pi]

    \log L(\theta) = \log p(x, t \mid \theta) = \sum_{i=1}^{N} \left[ \log \pi_{t^{(i)}} + \sum_{j=1}^{D} x_j^{(i)} \log \mu_{j t^{(i)}} + \left(1 - x_j^{(i)}\right) \log\left(1 - \mu_{j t^{(i)}}\right) \right]

Want: \arg\max_\theta \log L(\theta) subject to \sum_k \pi_k = 1
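
A direct transcription of this objective as a NumPy sketch (illustrative names, not from the slides); it can be used to check that the closed-form estimates derived below do indeed increase the log-likelihood:

    import numpy as np

    def bernoulli_nb_log_likelihood(X, t, mu, pi, eps=1e-12):
        """log L(theta) = sum_i [ log pi_{t_i} + sum_j x_ij log mu_{j t_i} + (1 - x_ij) log(1 - mu_{j t_i}) ]."""
        mu = np.clip(mu, eps, 1.0 - eps)   # avoid log(0)
        mu_t = mu[:, t].T                  # (N, D) parameters for each example's class
        per_example = (np.log(pi)[t]
                       + (X * np.log(mu_t) + (1.0 - X) * np.log(1.0 - mu_t)).sum(axis=1))
        return per_example.sum()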

11 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \mu:

    \frac{\partial \log L(\theta)}{\partial \mu_{jk}} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left( \frac{x_j^{(i)}}{\mu_{jk}} - \frac{1 - x_j^{(i)}}{1 - \mu_{jk}} \right) = 0

12 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \mu:

    \frac{\partial \log L(\theta)}{\partial \mu_{jk}} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left( \frac{x_j^{(i)}}{\mu_{jk}} - \frac{1 - x_j^{(i)}}{1 - \mu_{jk}} \right) = 0

    \Rightarrow\; \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ x_j^{(i)} (1 - \mu_{jk}) - \left(1 - x_j^{(i)}\right) \mu_{jk} \right] = 0

13 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \mu:

    \frac{\partial \log L(\theta)}{\partial \mu_{jk}} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left( \frac{x_j^{(i)}}{\mu_{jk}} - \frac{1 - x_j^{(i)}}{1 - \mu_{jk}} \right) = 0

    \Rightarrow\; \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ x_j^{(i)} (1 - \mu_{jk}) - \left(1 - x_j^{(i)}\right) \mu_{jk} \right] = 0

    \Rightarrow\; \mu_{jk} \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x_j^{(i)}

14 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \mu:

    \frac{\partial \log L(\theta)}{\partial \mu_{jk}} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left( \frac{x_j^{(i)}}{\mu_{jk}} - \frac{1 - x_j^{(i)}}{1 - \mu_{jk}} \right) = 0

    \Rightarrow\; \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ x_j^{(i)} (1 - \mu_{jk}) - \left(1 - x_j^{(i)}\right) \mu_{jk} \right] = 0

    \Rightarrow\; \mu_{jk} \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x_j^{(i)}

    \Rightarrow\; \mu_{jk} = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x_j^{(i)}}{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}

15 Derivation of maximum likelihood estimator (MLE)

Use a Lagrange multiplier to derive \pi:

    \frac{\partial}{\partial \pi_k} \left[ \log L(\theta) + \lambda \left( \sum_{\kappa} \pi_{\kappa} - 1 \right) \right] = 0 \;\Rightarrow\; \lambda = -\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, \frac{1}{\pi_k}

16 Derivation of maximum likelihood estimator (MLE)

Use a Lagrange multiplier to derive \pi:

    \frac{\partial}{\partial \pi_k} \left[ \log L(\theta) + \lambda \left( \sum_{\kappa} \pi_{\kappa} - 1 \right) \right] = 0 \;\Rightarrow\; \lambda = -\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, \frac{1}{\pi_k}

    \Rightarrow\; \pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{-\lambda}

17 Derivation of maximum likelihood estimator (MLE)

Use a Lagrange multiplier to derive \pi:

    \frac{\partial}{\partial \pi_k} \left[ \log L(\theta) + \lambda \left( \sum_{\kappa} \pi_{\kappa} - 1 \right) \right] = 0 \;\Rightarrow\; \lambda = -\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, \frac{1}{\pi_k}

    \Rightarrow\; \pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{-\lambda}

Apply the constraint \sum_k \pi_k = 1 \;\Rightarrow\; -\lambda = N, so

    \pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{N}
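
A short sketch of these closed-form estimates (hypothetical function name; X is an N x D binary matrix, t holds labels in {0, ..., K-1}, and every class is assumed to appear at least once). In practice a Laplace smoothing count is often added to avoid zero probabilities, but it is omitted here to match the MLE on the slides:

    import numpy as np

    def fit_bernoulli_nb(X, t, K):
        """Closed-form MLE: mu[j, k] and pi[k] from the count formulas above."""
        N, D = X.shape
        mu = np.empty((D, K))
        pi = np.empty(K)
        for k in range(K):
            mask = (t == k)
            pi[k] = mask.sum() / N                       # pi_k = N_k / N
            mu[:, k] = X[mask].sum(axis=0) / mask.sum()  # mu_jk = co-occurrence count / N_k
        return mu, pi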

18 Sanity check

    \mu_{jk} = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x_j^{(i)}}{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}

This means each word's probability under a class is the number of times the word and the class co-occurred, divided by the number of times the class appeared.

    \pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{N}

This means the prior probability of a class is the number of times it appeared, divided by the total number of training examples.

19 Tips

Remember that these symbols are supposed to mean something. When you're doing a derivation, focus on keeping the context of all the symbols you introduce; it will help you realize when your results are nonsense.

It's important to think about how this model will behave in the real world. For example, the naive Bayes classifier's output is based on a product of numbers x < 1. What does this say about the behaviour of the output over long sequences?

We encode our word vector with binary occurrence indicators, not the number of times each word appears. Why does this make sense to do? What are the implications of this approach?

20 Spam Classification Demo
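
The demo itself is not reproduced here. The sketch below (illustrative, assuming the fit_bernoulli_nb parameters from the previous sketch) shows how the learned parameters would typically be used at test time: score each class with \log \pi_k + \sum_j [x_j \log \mu_{jk} + (1 - x_j) \log(1 - \mu_{jk})] and take the argmax. Working in log space avoids the underflow from long products of numbers below 1 mentioned in the tips:

    import numpy as np

    def predict_bernoulli_nb(X, mu, pi, eps=1e-12):
        """Most probable class for each row of the binary matrix X, scored in log space."""
        mu = np.clip(mu, eps, 1.0 - eps)
        # log p(t = k | x) up to a constant:
        #   log pi_k + sum_j [ x_j log mu_jk + (1 - x_j) log(1 - mu_jk) ]
        log_post = X @ np.log(mu) + (1.0 - X) @ np.log(1.0 - mu) + np.log(pi)
        return log_post.argmax(axis=1)

A typical usage would be mu, pi = fit_bernoulli_nb(X_train, t_train, K=2) followed by predict_bernoulli_nb(X_test, mu, pi).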

21 Gaussian Bayes Classifier

Instead of assuming conditional independence of the x_j, we model p(x \mid t) as a Gaussian distribution, and the dependence relations among the x_j are encoded in the covariance matrix.

22 Gaussian Bayes Classifier

Instead of assuming conditional independence of the x_j, we model p(x \mid t) as a Gaussian distribution, and the dependence relations among the x_j are encoded in the covariance matrix.

Multivariate Gaussian distribution:

    f(x) = \frac{1}{\sqrt{(2\pi)^D \det(\Sigma)}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

\mu: mean, \Sigma: covariance matrix, D = \dim(x)
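
A small sketch (hypothetical helper, not from the slides) that evaluates this density in log form for numerical stability:

    import numpy as np

    def gaussian_log_density(x, mu, Sigma):
        """log N(x | mu, Sigma) for a single D-dimensional point x."""
        D = mu.shape[0]
        diff = x - mu
        _, logdet = np.linalg.slogdet(Sigma)        # numerically stable log det(Sigma)
        maha = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
        return -0.5 * (D * np.log(2.0 * np.pi) + logdet + maha)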

23 Derivation of maximum likelihood estimator (MLE)

    \theta = [\mu, \Sigma, \pi], \quad Z = \sqrt{(2\pi)^D \det(\Sigma)}

    p(x \mid t) = \frac{1}{Z} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

24 Derivation of maximum likelihood estimator (MLE)

    \theta = [\mu, \Sigma, \pi], \quad Z = \sqrt{(2\pi)^D \det(\Sigma)}

    p(x \mid t) = \frac{1}{Z} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

    \log L(\theta) = \log p(x, t \mid \theta) = \log p(t \mid \theta) + \log p(x \mid t, \theta)

25 Derivation of maximum likelihood estimator (MLE)

    \theta = [\mu, \Sigma, \pi], \quad Z = \sqrt{(2\pi)^D \det(\Sigma)}

    p(x \mid t) = \frac{1}{Z} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

    \log L(\theta) = \log p(x, t \mid \theta) = \log p(t \mid \theta) + \log p(x \mid t, \theta)
                   = \sum_{i=1}^{N} \left[ \log \pi_{t^{(i)}} - \log Z_{t^{(i)}} - \frac{1}{2} \left(x^{(i)} - \mu_{t^{(i)}}\right)^T \Sigma_{t^{(i)}}^{-1} \left(x^{(i)} - \mu_{t^{(i)}}\right) \right]

Want: \arg\max_\theta \log L(\theta) subject to \sum_k \pi_k = 1

26 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \mu:

    \frac{\partial \log L}{\partial \mu_k} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, \Sigma_k^{-1} \left(x^{(i)} - \mu_k\right) = 0

27 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \mu:

    \frac{\partial \log L}{\partial \mu_k} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, \Sigma_k^{-1} \left(x^{(i)} - \mu_k\right) = 0

    \Rightarrow\; \mu_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x^{(i)}}{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}

28 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \Sigma^{-1} (not \Sigma).

29 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \Sigma^{-1} (not \Sigma). Note:

    \frac{\partial \det(A)}{\partial A} = \det(A)\, A^{-T}, \quad \det(A)^{-1} = \det(A^{-1}), \quad \frac{\partial\, x^T A x}{\partial A} = x x^T, \quad \Sigma^T = \Sigma

30 Derivation of maximum likelihood estimator (MLE)

Take the derivative w.r.t. \Sigma^{-1} (not \Sigma). Note:

    \frac{\partial \det(A)}{\partial A} = \det(A)\, A^{-T}, \quad \det(A)^{-1} = \det(A^{-1}), \quad \frac{\partial\, x^T A x}{\partial A} = x x^T, \quad \Sigma^T = \Sigma

    \frac{\partial \log L}{\partial \Sigma_k^{-1}} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ -\frac{\partial \log Z_k}{\partial \Sigma_k^{-1}} - \frac{1}{2} \left(x^{(i)} - \mu_k\right)\left(x^{(i)} - \mu_k\right)^T \right] = 0

32 Derivation of maximum likelihood estimator (MLE)

    Z_k = \sqrt{(2\pi)^D \det(\Sigma_k)}

    \frac{\partial \log Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{Z_k} \frac{\partial Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{(2\pi)^{D/2} \det(\Sigma_k)^{1/2}} \cdot \frac{\partial}{\partial \Sigma_k^{-1}} \left[ (2\pi)^{D/2} \det\left(\Sigma_k^{-1}\right)^{-1/2} \right]

33 Derivation of maximum likelihood estimator (MLE)

    Z_k = \sqrt{(2\pi)^D \det(\Sigma_k)}

    \frac{\partial \log Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{Z_k} \frac{\partial Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{(2\pi)^{D/2} \det(\Sigma_k)^{1/2}} \cdot \frac{\partial}{\partial \Sigma_k^{-1}} \left[ (2\pi)^{D/2} \det\left(\Sigma_k^{-1}\right)^{-1/2} \right]

    = \det\left(\Sigma_k^{-1}\right)^{1/2} \left( -\frac{1}{2} \right) \det\left(\Sigma_k^{-1}\right)^{-3/2} \det\left(\Sigma_k^{-1}\right) \Sigma_k^T = -\frac{1}{2} \Sigma_k

34 Derivation of maximum likelihood estimator (MLE)

    Z_k = \sqrt{(2\pi)^D \det(\Sigma_k)}

    \frac{\partial \log Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{Z_k} \frac{\partial Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{(2\pi)^{D/2} \det(\Sigma_k)^{1/2}} \cdot \frac{\partial}{\partial \Sigma_k^{-1}} \left[ (2\pi)^{D/2} \det\left(\Sigma_k^{-1}\right)^{-1/2} \right]

    = \det\left(\Sigma_k^{-1}\right)^{1/2} \left( -\frac{1}{2} \right) \det\left(\Sigma_k^{-1}\right)^{-3/2} \det\left(\Sigma_k^{-1}\right) \Sigma_k^T = -\frac{1}{2} \Sigma_k

    \frac{\partial \log L}{\partial \Sigma_k^{-1}} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ \frac{1}{2} \Sigma_k - \frac{1}{2} \left(x^{(i)} - \mu_k\right)\left(x^{(i)} - \mu_k\right)^T \right] = 0

35 Derivation of maximum likelihood estimator (MLE)

    Z_k = \sqrt{(2\pi)^D \det(\Sigma_k)}

    \frac{\partial \log Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{Z_k} \frac{\partial Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{(2\pi)^{D/2} \det(\Sigma_k)^{1/2}} \cdot \frac{\partial}{\partial \Sigma_k^{-1}} \left[ (2\pi)^{D/2} \det\left(\Sigma_k^{-1}\right)^{-1/2} \right]

    = \det\left(\Sigma_k^{-1}\right)^{1/2} \left( -\frac{1}{2} \right) \det\left(\Sigma_k^{-1}\right)^{-3/2} \det\left(\Sigma_k^{-1}\right) \Sigma_k^T = -\frac{1}{2} \Sigma_k

    \frac{\partial \log L}{\partial \Sigma_k^{-1}} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ \frac{1}{2} \Sigma_k - \frac{1}{2} \left(x^{(i)} - \mu_k\right)\left(x^{(i)} - \mu_k\right)^T \right] = 0

    \Rightarrow\; \Sigma_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left(x^{(i)} - \mu_k\right)\left(x^{(i)} - \mu_k\right)^T}{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}

36 Derivation of maximum likelihood estimator (MLE)

    \pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{N} \quad \text{(same as Bernoulli)}
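
Putting the three estimators together, a minimal fitting sketch (illustrative names, not from the slides; X is a real-valued N x D matrix, t holds labels in {0, ..., K-1}). The ridge term is an optional extra, not part of the MLE, that keeps each covariance invertible when a class has few examples:

    import numpy as np

    def fit_gaussian_bayes(X, t, K, ridge=0.0):
        """MLE for the Gaussian Bayes classifier: per-class mean, covariance, and prior."""
        N, D = X.shape
        mus, Sigmas, pis = [], [], []
        for k in range(K):
            Xk = X[t == k]
            mu_k = Xk.mean(axis=0)
            diff = Xk - mu_k
            Sigma_k = diff.T @ diff / Xk.shape[0] + ridge * np.eye(D)
            mus.append(mu_k)
            Sigmas.append(Sigma_k)
            pis.append(Xk.shape[0] / N)
        return np.array(mus), np.array(Sigmas), np.array(pis)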

37 Gaussian Bayes Classifier Demo

38 Gaussian Bayes Classifier

If we constrain \Sigma to be diagonal, then we can rewrite p(x \mid t) as a product of p(x_j \mid t):

    p(x \mid t) = \frac{1}{\sqrt{(2\pi)^D \det(\Sigma_t)}} \exp\left( -\frac{1}{2} (x - \mu_t)^T \Sigma_t^{-1} (x - \mu_t) \right)

39 Gaussian Bayes Classifier

If we constrain \Sigma to be diagonal, then we can rewrite p(x \mid t) as a product of p(x_j \mid t):

    p(x \mid t) = \frac{1}{\sqrt{(2\pi)^D \det(\Sigma_t)}} \exp\left( -\frac{1}{2} (x - \mu_t)^T \Sigma_t^{-1} (x - \mu_t) \right)
               = \prod_{j=1}^{D} \frac{1}{\sqrt{2\pi\, \Sigma_{t,jj}}} \exp\left( -\frac{(x_j - \mu_{jt})^2}{2\, \Sigma_{t,jj}} \right) = \prod_{j=1}^{D} p(x_j \mid t)

A diagonal covariance matrix satisfies the naive Bayes assumption.
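
A quick numerical check of this factorization (illustrative only; it reuses the gaussian_log_density sketch given earlier): with a diagonal covariance, the joint log-density equals the sum of D univariate Gaussian log-densities.

    import numpy as np

    # Diagonal covariance => the density factorizes over dimensions.
    rng = np.random.default_rng(0)
    D = 4
    mu = rng.normal(size=D)
    var = rng.uniform(0.5, 2.0, size=D)      # the diagonal entries Sigma_{t,jj}
    x = rng.normal(size=D)

    joint = gaussian_log_density(x, mu, np.diag(var))
    factored = np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var))
    assert np.allclose(joint, factored)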

40 Gaussian Bayes Classifier

Case 1: the covariance matrix is shared among classes:

    p(x \mid t) = \mathcal{N}(x \mid \mu_t, \Sigma)

Case 2: each class has its own covariance:

    p(x \mid t) = \mathcal{N}(x \mid \mu_t, \Sigma_t)

41 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is shared between classes, the decision boundary is where p(t = 1 \mid x) = p(t = 0 \mid x), equivalently p(x, t = 1) = p(x, t = 0).

42 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is shared between classes, the decision boundary is where p(t = 1 \mid x) = p(t = 0 \mid x):

    \log \pi_1 - \frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) = \log \pi_0 - \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0)

43 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is shared between classes, the decision boundary is where p(t = 1 \mid x) = p(t = 0 \mid x):

    \log \pi_1 - \frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) = \log \pi_0 - \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0)

Multiplying by -2 and writing C = 2 \log(\pi_0 / \pi_1) for the prior terms:

    C + x^T \Sigma^{-1} x - 2 \mu_1^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} \mu_1 = x^T \Sigma^{-1} x - 2 \mu_0^T \Sigma^{-1} x + \mu_0^T \Sigma^{-1} \mu_0

    \left[ 2 (\mu_0 - \mu_1)^T \Sigma^{-1} \right] x - \left( \mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1 \right) = -C

44 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is shared between classes, the decision boundary is where p(t = 1 \mid x) = p(t = 0 \mid x):

    \log \pi_1 - \frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) = \log \pi_0 - \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0)

Multiplying by -2 and writing C = 2 \log(\pi_0 / \pi_1) for the prior terms:

    C + x^T \Sigma^{-1} x - 2 \mu_1^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} \mu_1 = x^T \Sigma^{-1} x - 2 \mu_0^T \Sigma^{-1} x + \mu_0^T \Sigma^{-1} \mu_0

    \left[ 2 (\mu_0 - \mu_1)^T \Sigma^{-1} \right] x - \left( \mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1 \right) = -C

    a^T x - b = 0

The decision boundary is a linear function of x (a hyperplane in general).
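
A short sketch (hypothetical helper, not from the slides) that computes the hyperplane coefficients from the fitted parameters, with the shared covariance assumed; points with a^T x - b > 0 fall on the class-1 side of the boundary:

    import numpy as np

    def shared_cov_boundary(mu0, mu1, Sigma, pi0, pi1):
        """Coefficients (a, b) of the linear boundary a^T x - b = 0 for a shared covariance.

        a^T x - b > 0 means class 1 has the larger joint probability p(x, t).
        """
        a = np.linalg.solve(Sigma, mu1 - mu0)                  # Sigma^{-1} (mu_1 - mu_0)
        b = 0.5 * (mu1 @ np.linalg.solve(Sigma, mu1)
                   - mu0 @ np.linalg.solve(Sigma, mu0)) - np.log(pi1 / pi0)
        return a, b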

45 Relation to Logistic Regression

We can write the posterior distribution p(t = 0 \mid x) as

    p(t = 0 \mid x) = \frac{p(x, t = 0)}{p(x, t = 0) + p(x, t = 1)} = \frac{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma)}{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma) + \pi_1\, \mathcal{N}(x \mid \mu_1, \Sigma)}

46 Relation to Logistic Regression

We can write the posterior distribution p(t = 0 \mid x) as

    p(t = 0 \mid x) = \frac{p(x, t = 0)}{p(x, t = 0) + p(x, t = 1)} = \frac{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma)}{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma) + \pi_1\, \mathcal{N}(x \mid \mu_1, \Sigma)}

    = \left\{ 1 + \frac{\pi_1}{\pi_0} \exp\left[ -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right] \right\}^{-1}

47 Relation to Logistic Regression

We can write the posterior distribution p(t = 0 \mid x) as

    p(t = 0 \mid x) = \frac{p(x, t = 0)}{p(x, t = 0) + p(x, t = 1)} = \frac{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma)}{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma) + \pi_1\, \mathcal{N}(x \mid \mu_1, \Sigma)}

    = \left\{ 1 + \frac{\pi_1}{\pi_0} \exp\left[ -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right] \right\}^{-1}

    = \left\{ 1 + \exp\left[ \log \frac{\pi_1}{\pi_0} + (\mu_1 - \mu_0)^T \Sigma^{-1} x - \frac{1}{2} \left( \mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0 \right) \right] \right\}^{-1}

48 Relation to Logistic Regression

We can write the posterior distribution p(t = 0 \mid x) as

    p(t = 0 \mid x) = \frac{p(x, t = 0)}{p(x, t = 0) + p(x, t = 1)} = \frac{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma)}{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma) + \pi_1\, \mathcal{N}(x \mid \mu_1, \Sigma)}

    = \left\{ 1 + \frac{\pi_1}{\pi_0} \exp\left[ -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right] \right\}^{-1}

    = \left\{ 1 + \exp\left[ \log \frac{\pi_1}{\pi_0} + (\mu_1 - \mu_0)^T \Sigma^{-1} x - \frac{1}{2} \left( \mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0 \right) \right] \right\}^{-1}

    = \frac{1}{1 + \exp(-w^T x - b)}

so the posterior has exactly the sigmoid (logistic regression) form in x.
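
A small numerical check of this identity (illustrative only; it assumes the shared_cov_boundary and gaussian_log_density sketches from earlier): the exact posterior p(t = 1 | x) computed from the two Gaussians matches the sigmoid of the linear score a^T x - b.

    import numpy as np

    rng = np.random.default_rng(1)
    D = 3
    mu0, mu1 = rng.normal(size=D), rng.normal(size=D)
    A = rng.normal(size=(D, D))
    Sigma = A @ A.T + np.eye(D)                 # a valid shared covariance matrix
    pi0, pi1 = 0.4, 0.6
    x = rng.normal(size=D)

    # Sigmoid of the linear score from the shared-covariance boundary sketch.
    a, b = shared_cov_boundary(mu0, mu1, Sigma, pi0, pi1)
    post1_sigmoid = 1.0 / (1.0 + np.exp(-(a @ x - b)))

    # Exact posterior p(t = 1 | x) from the two class-conditional Gaussians.
    log_joint0 = np.log(pi0) + gaussian_log_density(x, mu0, Sigma)
    log_joint1 = np.log(pi1) + gaussian_log_density(x, mu1, Sigma)
    post1_exact = np.exp(log_joint1) / (np.exp(log_joint0) + np.exp(log_joint1))

    assert np.allclose(post1_sigmoid, post1_exact)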

49 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is not shared between classes, the decision boundary is again where p(t = 1 \mid x) = p(t = 0 \mid x).

50 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is not shared between classes, the decision boundary is again where p(t = 1 \mid x) = p(t = 0 \mid x):

    \log \pi_1 - \frac{1}{2} \log \det(\Sigma_1) - \frac{1}{2} (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) = \log \pi_0 - \frac{1}{2} \log \det(\Sigma_0) - \frac{1}{2} (x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0)

51 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is not shared between classes, the decision boundary is again where p(t = 1 \mid x) = p(t = 0 \mid x):

    \log \pi_1 - \frac{1}{2} \log \det(\Sigma_1) - \frac{1}{2} (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) = \log \pi_0 - \frac{1}{2} \log \det(\Sigma_0) - \frac{1}{2} (x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0)

    x^T \left( \Sigma_1^{-1} - \Sigma_0^{-1} \right) x - 2 \left( \mu_1^T \Sigma_1^{-1} - \mu_0^T \Sigma_0^{-1} \right) x + \left( \mu_1^T \Sigma_1^{-1} \mu_1 - \mu_0^T \Sigma_0^{-1} \mu_0 \right) = C

(C collects the constant prior and log-determinant terms.)

52 Gaussian Bayes Binary Classifier Decision Boundary

If the covariance is not shared between classes, the decision boundary is again where p(t = 1 \mid x) = p(t = 0 \mid x):

    \log \pi_1 - \frac{1}{2} \log \det(\Sigma_1) - \frac{1}{2} (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) = \log \pi_0 - \frac{1}{2} \log \det(\Sigma_0) - \frac{1}{2} (x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0)

    x^T \left( \Sigma_1^{-1} - \Sigma_0^{-1} \right) x - 2 \left( \mu_1^T \Sigma_1^{-1} - \mu_0^T \Sigma_0^{-1} \right) x + \left( \mu_1^T \Sigma_1^{-1} \mu_1 - \mu_0^T \Sigma_0^{-1} \mu_0 \right) = C

(C collects the constant prior and log-determinant terms.)

    x^T Q x - 2 b^T x + c = 0

The decision boundary is a quadratic function. In the 2-d case it looks like an ellipse, a parabola, or a hyperbola.
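
A sketch of the corresponding coefficients Q, b, c (hypothetical helper, not from the slides; the constants from the priors and normalizers are folded into c):

    import numpy as np

    def quadratic_boundary(mu0, mu1, Sigma0, Sigma1, pi0, pi1):
        """Coefficients (Q, b, c) of the boundary x^T Q x - 2 b^T x + c = 0."""
        S0inv, S1inv = np.linalg.inv(Sigma0), np.linalg.inv(Sigma1)
        Q = S1inv - S0inv
        b = S1inv @ mu1 - S0inv @ mu0
        # c collects the quadratic terms in the means plus the prior and log-determinant constants
        c = (mu1 @ S1inv @ mu1 - mu0 @ S0inv @ mu0
             - 2.0 * np.log(pi1 / pi0)
             + np.linalg.slogdet(Sigma1)[1] - np.linalg.slogdet(Sigma0)[1])
        return Q, b, c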

53 Thanks!
