Lecture 5. Gaussian Models - Part 1. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. November 29, 2016


1 Lecture 5 Gaussian Models - Part 1 Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza November 29, 2016 Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

2 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

3 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

4 Univariate Gaussian (Normal) Distribution

X is a continuous RV with values x ∈ R
X ∼ N(µ, σ^2), i.e. X has a Gaussian distribution or normal distribution, with pdf

N(x | µ, σ^2) = (1/√(2πσ^2)) exp(-(x - µ)^2 / (2σ^2)) (the pdf of X evaluated at x)

mean: E[X] = µ
mode: µ
variance: var[X] = σ^2
precision: λ ≜ 1/σ^2
(µ - 2σ, µ + 2σ) is the approx. 95% interval
(µ - 3σ, µ + 3σ) is the approx. 99.7% interval

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
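The pdf formula and the 95% interval can be checked numerically; a minimal NumPy sketch (the parameter values and sample size below are arbitrary):

```python
import numpy as np

def gauss_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
samples = rng.normal(mu, sigma, size=100_000)

# fraction of samples inside (mu - 2*sigma, mu + 2*sigma): should be close to 0.95
inside = np.mean(np.abs(samples - mu) < 2 * sigma)
print(f"P(|X - mu| < 2 sigma) ~= {inside:.3f}")                    # approx 0.954
print(f"density at the mode:    {gauss_pdf(mu, mu, sigma**2):.4f}")  # 1/sqrt(2*pi*sigma^2)
```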

5 Multivariate Gaussian (Normal) Distribution

X is a continuous RV with values x ∈ R^D
X ∼ N(µ, Σ), i.e. X has a Multivariate Normal distribution (MVN) or multivariate Gaussian, with pdf

N(x | µ, Σ) = 1/((2π)^{D/2} |Σ|^{1/2}) exp[ -(1/2) (x - µ)^T Σ^{-1} (x - µ) ]

mean: E[x] = µ
mode: µ
covariance matrix: cov[x] = Σ ∈ R^{D×D}, where Σ = Σ^T and Σ ≻ 0
precision matrix: Λ ≜ Σ^{-1}
spherical/isotropic covariance: Σ = σ^2 I_D

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
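The MVN pdf can be evaluated directly from the formula above; a small sketch, cross-checked against scipy.stats.multivariate_normal (µ, Σ and x below are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x | mu, Sigma), straight from the formula."""
    D = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)            # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])

print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))     # should match
```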

6 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

7 MLE for an MVN: Theorem

Theorem 1. If we have N iid samples x_i ∼ N(µ, Σ), then the MLE for the parameters is given by

1. µ_MLE = (1/N) ∑_{i=1}^N x_i ≜ x̄
2. Σ_MLE = (1/N) ∑_{i=1}^N (x_i - x̄)(x_i - x̄)^T = (1/N) (∑_{i=1}^N x_i x_i^T) - x̄ x̄^T

this theorem states the MLE parameter estimates for an MVN are just the empirical mean and the empirical covariance
in the univariate case, one has

ˆµ = (1/N) ∑_{i=1}^N x_i ≜ x̄
ˆσ^2 = (1/N) ∑_{i=1}^N (x_i - x̄)^2 = (1/N) (∑_{i=1}^N x_i^2) - x̄^2

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
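A minimal NumPy sketch of the two MLE formulas, also checking the equivalent scatter-matrix form (synthetic data, arbitrary true parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
true_mu = np.array([1.0, -2.0, 0.5])
true_Sigma = np.array([[1.0, 0.3, 0.0],
                       [0.3, 2.0, 0.4],
                       [0.0, 0.4, 0.5]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=N)    # N iid samples

mu_mle = X.mean(axis=0)                                     # (1/N) sum_i x_i
centered = X - mu_mle
Sigma_mle = centered.T @ centered / N                       # (1/N) sum_i (x_i - xbar)(x_i - xbar)^T

# equivalent form: (1/N) sum_i x_i x_i^T - xbar xbar^T
Sigma_alt = X.T @ X / N - np.outer(mu_mle, mu_mle)
assert np.allclose(Sigma_mle, Sigma_alt)
print(mu_mle)
print(Sigma_mle)
```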

8 MLE for an MVN: Theorem

proof sketch: in order to find the MLE one should maximize the log-likelihood of the dataset; given that x_i ∼ N(µ, Σ),

p(D | µ, Σ) = ∏_i N(x_i | µ, Σ)

the log-likelihood (dropping additive constants) is

l(µ, Σ) = log p(D | µ, Σ) = (N/2) log |Λ| - (1/2) ∑_i (x_i - µ)^T Λ (x_i - µ) + const

the MLE estimates can be obtained by maximizing l(µ, Σ) w.r.t. µ and Σ
homework: continue the proof for the univariate case

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

9 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

10 Generative Classifiers

probabilistic classifier: we are given a dataset D = {(x_i, y_i)}_{i=1}^N; the goal is to compute the class posterior p(y = c | x), which models the mapping y = f(x)

generative classifiers: p(y = c | x) is computed starting from the class-conditional density p(x | y = c, θ) and the class prior p(y = c | θ), given that

p(y = c | x, θ) ∝ p(x | y = c, θ) p(y = c | θ) (= p(y = c, x | θ))

this is called a generative classifier since it specifies how to generate the feature vector x for each class y = c (by using p(x | y = c, θ))
the model is usually fit by maximizing the joint log-likelihood, i.e. one computes ˆθ = argmax_θ ∑_i log p(y_i, x_i | θ)

discriminative classifiers: the model p(y = c | x) is directly fit to the data
the model is usually fit by maximizing the conditional log-likelihood, i.e. one computes ˆθ = argmax_θ ∑_i log p(y_i | x_i, θ)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
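A tiny numerical illustration of the generative recipe p(y = c | x, θ) ∝ p(x | y = c, θ) p(y = c | θ); the 1D class-conditional densities and priors below are made up for the example:

```python
import numpy as np
from scipy.stats import norm

# made-up 1D class-conditional densities and priors for two classes
priors = np.array([0.7, 0.3])                      # p(y = c)
class_cond = [norm(loc=0.0, scale=1.0),            # p(x | y = 0)
              norm(loc=2.0, scale=1.5)]            # p(x | y = 1)

def class_posterior(x):
    """p(y = c | x) proportional to p(x | y = c) p(y = c), normalized over the classes."""
    joint = np.array([cc.pdf(x) * pr for cc, pr in zip(class_cond, priors)])
    return joint / joint.sum()

print(class_posterior(1.0))   # posterior over the two classes at x = 1.0
```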

11 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

12 Gaussian Discriminant Analysis: GDA

we can use the MVN for defining the class-conditional densities in a generative classifier

p(x | y = c, θ) = N(x | µ_c, Σ_c) for c ∈ {1, ..., C}

this means the samples of each class c are characterized by a normal distribution
this model is called Gaussian Discriminant Analysis (GDA), but it is a generative classifier (not a discriminative one)
in the case Σ_c is diagonal for each c, this model is equivalent to a Naive Bayes Classifier (NBC), since

p(x | y = c, θ) = ∏_{j=1}^D N(x_j | µ_jc, σ_jc^2) for c ∈ {1, ..., C}

once the model is fit to the data, we can classify a feature vector by using the decision rule

ŷ(x) = argmax_c log p(y = c | x, θ) = argmax_c [ log p(y = c | π) + log p(x | y = c, θ_c) ]

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

13 Gaussian Discriminant Analysis: GDA

decision rule: ŷ(x) = argmax_c [ log p(y = c | π) + log p(x | y = c, θ_c) ]

given that y ∼ Cat(π) and x | (y = c) ∼ N(µ_c, Σ_c), the decision rule becomes (dropping additive constants)

ŷ(x) = argmin_c [ -log π_c + (1/2) log |Σ_c| + (1/2) (x - µ_c)^T Σ_c^{-1} (x - µ_c) ]

which can be thought of as a nearest centroid classifier
in fact, with a uniform prior and Σ_c = Σ,

ŷ(x) = argmin_c (x - µ_c)^T Σ^{-1} (x - µ_c) = argmin_c ||x - µ_c||_Σ^2

in this case, we select the class c whose center µ_c is closest to x (using the Mahalanobis distance ||x - µ_c||_Σ)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
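A minimal sketch of this decision rule and of its nearest-centroid simplification (shared Σ, uniform prior); the class parameters below are assumed already known or estimated:

```python
import numpy as np

def gda_predict(x, mus, Sigmas, priors):
    """yhat(x) = argmin_c [ -log pi_c + 0.5 log|Sigma_c| + 0.5 (x-mu_c)^T Sigma_c^{-1} (x-mu_c) ]."""
    scores = []
    for mu, Sigma, pi in zip(mus, Sigmas, priors):
        diff = x - mu
        maha = diff @ np.linalg.solve(Sigma, diff)
        scores.append(-np.log(pi) + 0.5 * np.linalg.slogdet(Sigma)[1] + 0.5 * maha)
    return int(np.argmin(scores))

def nearest_centroid(x, mus, Sigma):
    """Shared Sigma and uniform prior: pick the class with the smallest Mahalanobis distance."""
    d2 = [(x - mu) @ np.linalg.solve(Sigma, x - mu) for mu in mus]
    return int(np.argmin(d2))

mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
x = np.array([1.0, 0.2])
print(gda_predict(x, mus, [Sigma, Sigma], [0.5, 0.5]), nearest_centroid(x, mus, Sigma))
```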

14 Mahalanobis Distance

the covariance matrix Σ can be diagonalized since it is a symmetric real matrix:

Σ = U D U^T = ∑_{i=1}^D λ_i u_i u_i^T

where U = [u_1, ..., u_D] is an orthonormal matrix of eigenvectors (i.e. U^T U = I) and the λ_i are the corresponding eigenvalues (λ_i > 0 since Σ ≻ 0)
one has immediately

Σ^{-1} = U D^{-1} U^T = ∑_{i=1}^D (1/λ_i) u_i u_i^T

the Mahalanobis distance is defined as ||x - µ||_Σ ≜ ((x - µ)^T Σ^{-1} (x - µ))^{1/2}
one can rewrite

(x - µ)^T Σ^{-1} (x - µ) = (x - µ)^T (∑_{i=1}^D (1/λ_i) u_i u_i^T) (x - µ) = ∑_{i=1}^D (1/λ_i) (x - µ)^T u_i u_i^T (x - µ) = ∑_{i=1}^D y_i^2 / λ_i

where y_i ≜ u_i^T (x - µ) (or equivalently y ≜ U^T (x - µ))

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
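A quick numerical check that the eigendecomposition form ∑_i y_i^2/λ_i matches the direct quadratic form (Σ, µ and x below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)          # a symmetric positive definite covariance
mu = np.zeros(3)
x = rng.normal(size=3)

# direct quadratic form (x - mu)^T Sigma^{-1} (x - mu)
direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)

# eigendecomposition Sigma = U diag(lam) U^T, then sum_i y_i^2 / lam_i with y = U^T (x - mu)
lam, U = np.linalg.eigh(Sigma)
y = U.T @ (x - mu)
via_eig = np.sum(y ** 2 / lam)

print(direct, via_eig)                    # identical up to floating point
```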

15 Mahalanobis Distance

Σ = U D U^T = ∑_{i=1}^D λ_i u_i u_i^T
(x - µ)^T Σ^{-1} (x - µ) = ∑_{i=1}^D y_i^2 / λ_i (where y ≜ U^T (x - µ))

interpretation: (1) center w.r.t. µ, (2) rotate by U^T, (3) get a norm weighted by the 1/λ_i

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

16 Gaussian Discriminant Analysis: GDA

left: height/weight data for the two classes male/female
right: visualization of the 2D Gaussians fit to each class
we can see that the features are correlated (tall people tend to weigh more)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

17 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

18 Quadratic Discriminant Analysis: QDA

the complete class posterior with Gaussian densities is

p(y = c | x, θ) = π_c |2πΣ_c|^{-1/2} exp[ -(1/2) (x - µ_c)^T Σ_c^{-1} (x - µ_c) ] / ∑_{c'} π_{c'} |2πΣ_{c'}|^{-1/2} exp[ -(1/2) (x - µ_{c'})^T Σ_{c'}^{-1} (x - µ_{c'}) ]

the quadratic decision boundaries can be found by imposing

p(y = c | x, θ) = p(y = c' | x, θ)

or equivalently

log p(y = c | x, θ) = log p(y = c' | x, θ)

for each pair of adjacent classes (c, c'), which results in the quadratic equation

(1/2) (x - µ_c)^T Σ_c^{-1} (x - µ_c) = (1/2) (x - µ_{c'})^T Σ_{c'}^{-1} (x - µ_{c'}) + constant

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
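A minimal sketch of the QDA class posterior, obtained by normalizing the prior-weighted Gaussian densities over the classes (the parameters below are illustrative and assumed already estimated):

```python
import numpy as np
from scipy.stats import multivariate_normal

def qda_posterior(x, mus, Sigmas, priors):
    """p(y = c | x) with per-class covariances, normalized over the classes (QDA)."""
    joint = np.array([pi * multivariate_normal(mean=mu, cov=S).pdf(x)
                      for mu, S, pi in zip(mus, Sigmas, priors)])
    return joint / joint.sum()

mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigmas = [np.eye(2), np.array([[2.0, -0.5], [-0.5, 1.0]])]
priors = [0.6, 0.4]
print(qda_posterior(np.array([1.0, 1.0]), mus, Sigmas, priors))
```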

19 Quadratic Discriminant Analysis: QDA

left: dataset with 2 classes
right: dataset with 3 classes

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

20 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

21 Linear Discriminant Analysis: LDA

we now consider GDA in the special case Σ_c = Σ for c ∈ {1, ..., C}
in this case we have

p(y = c | x, θ) ∝ π_c exp[ -(1/2) (x - µ_c)^T Σ^{-1} (x - µ_c) ]
= exp[ -(1/2) x^T Σ^{-1} x ] exp[ µ_c^T Σ^{-1} x - (1/2) µ_c^T Σ^{-1} µ_c + log π_c ]

note that the quadratic term -(1/2) x^T Σ^{-1} x is independent of c and it will cancel out in the numerator and denominator of the complete class posterior equation
we define

γ_c ≜ -(1/2) µ_c^T Σ^{-1} µ_c + log π_c
β_c ≜ Σ^{-1} µ_c

and we can rewrite

p(y = c | x, θ) = e^{β_c^T x + γ_c} / ∑_{c'} e^{β_{c'}^T x + γ_{c'}}

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

22 Linear Discriminant Analysis: LDA

we have

p(y = c | x, θ) = e^{β_c^T x + γ_c} / ∑_{c'} e^{β_{c'}^T x + γ_{c'}} = S(η)_c

where η ≜ [β_1^T x + γ_1, ..., β_C^T x + γ_C]^T ∈ R^C and the function S(η) is the softmax function, defined as

S(η) ≜ [ e^{η_1} / ∑_{c'} e^{η_{c'}}, ..., e^{η_C} / ∑_{c'} e^{η_{c'}} ]^T

and S(η)_c ∈ R is just its c-th component
the softmax function S(η) is so called since it acts a bit like the max function; to see this, divide each component η_c by a temperature T, then, as T → 0,

S(η/T)_c = 1 if c = argmax_{c'} η_{c'}, and 0 otherwise

in other words, at low temperature S(η/T) puts all its mass on the most probable state, whereas at high temperature it spreads the mass over the states roughly uniformly (cf. the Boltzmann distribution in physics)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

23 Linear Discriminant Analysis: Softmax

softmax distribution S(η/T), with η = [3, 0, 1]^T, at different temperatures T
when the temperature is high (left), the distribution is uniform, whereas when the temperature is low (right), the distribution is spiky, with all its mass on the largest element

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
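The temperature behavior can be reproduced with a few lines of NumPy, using the same η = [3, 0, 1]^T as in the figure (implementation details are mine):

```python
import numpy as np

def softmax(eta):
    """S(eta)_c = exp(eta_c) / sum_c' exp(eta_c'), computed in a numerically stable way."""
    z = eta - np.max(eta)
    e = np.exp(z)
    return e / e.sum()

eta = np.array([3.0, 0.0, 1.0])
for T in [100.0, 2.0, 1.0, 0.1, 0.01]:
    print(f"T = {T:6.2f}   S(eta/T) = {np.round(softmax(eta / T), 3)}")
# high T: roughly uniform; low T: nearly all the mass on the largest component (eta_1 = 3)
```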

24 Linear Discriminant Analysis: LDA

in order to find the decision boundaries we impose

p(y = c | x, θ) = p(y = c' | x, θ)

which entails

e^{β_c^T x + γ_c} = e^{β_{c'}^T x + γ_{c'}}

in this case, taking the logs returns

β_c^T x + γ_c = β_{c'}^T x + γ_{c'}

which in turn corresponds to a linear decision boundary (1)

(β_c - β_{c'})^T x = γ_{c'} - γ_c

(1) in D dimensions this corresponds to a hyperplane, in 3D to a plane, in 2D to a straight line

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

25 Linear Discriminant Analysis: LDA

left: dataset with 2 classes
right: dataset with 3 classes

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

26 Linear Discriminant Analysis: two-class LDA

let us consider an LDA with just two classes (i.e. y ∈ {0, 1})
in this case

p(y = 1 | x, θ) = e^{β_1^T x + γ_1} / (e^{β_1^T x + γ_1} + e^{β_0^T x + γ_0}) = 1 / (1 + e^{(β_0 - β_1)^T x + (γ_0 - γ_1)})

that is

p(y = 1 | x, θ) = sigm((β_1 - β_0)^T x + (γ_1 - γ_0))

where sigm(η) ≜ 1 / (1 + exp(-η)) is the sigmoid function (aka logistic function)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

27 Linear Discriminant Analysis: two-class LDA

the linear decision boundary is

(β_1 - β_0)^T x + (γ_1 - γ_0) = 0

if we define

w ≜ β_1 - β_0 = Σ^{-1} (µ_1 - µ_0)
x_0 ≜ (1/2) (µ_1 + µ_0) - (µ_1 - µ_0) log(π_1/π_0) / ((µ_1 - µ_0)^T Σ^{-1} (µ_1 - µ_0))

we obtain w^T x_0 = -(γ_1 - γ_0)
the linear decision boundary can then be rewritten as w^T (x - x_0) = 0
in fact we have p(y = 1 | x, θ) = sigm(w^T (x - x_0))

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
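A minimal sketch of the two-class formulas, checking that sigm(w^T (x - x_0)) agrees with the generic two-class posterior computed via Bayes rule (µ_0, µ_1, Σ and the priors below are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def sigm(a):
    """Sigmoid (logistic) function."""
    return 1.0 / (1.0 + np.exp(-a))

mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.5]])
pi0, pi1 = 0.4, 0.6

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu0)
x0 = 0.5 * (mu0 + mu1) - (mu1 - mu0) * np.log(pi1 / pi0) / ((mu1 - mu0) @ Sigma_inv @ (mu1 - mu0))

x = np.array([1.0, 0.5])
p1_lda = sigm(w @ (x - x0))

# cross-check against the two-class posterior p(y=1|x) proportional to pi_1 N(x | mu_1, Sigma)
num = pi1 * multivariate_normal(mu1, Sigma).pdf(x)
den = num + pi0 * multivariate_normal(mu0, Sigma).pdf(x)
print(p1_lda, num / den)    # should match
```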

28 Linear Discriminant Analysis: two-class LDA

we have

w ≜ β_1 - β_0 = Σ^{-1} (µ_1 - µ_0)
x_0 ≜ (1/2) (µ_1 + µ_0) - (µ_1 - µ_0) log(π_1/π_0) / ((µ_1 - µ_0)^T Σ^{-1} (µ_1 - µ_0))

the linear decision boundary is w^T (x - x_0) = 0
in the case Σ = I and π_1 = π_0, one has

w = µ_1 - µ_0 and x_0 = (1/2) (µ_1 + µ_0)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

29 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

30 MLE for GDA

how to fit the GDA model? the simplest way is to use MLE
let's assume iid samples; then p(D | θ) = ∏_{i=1}^N p(x_i, y_i | θ)
one has

p(x_i, y_i | θ) = p(x_i | y_i, θ) p(y_i | π)
p(x_i | y_i, θ) = ∏_c N(x_i | µ_c, Σ_c)^{I(y_i = c)}
p(y_i | π) = ∏_c π_c^{I(y_i = c)}

where θ is a compound parameter vector containing the parameters π, µ_c and Σ_c
the log-likelihood function is

log p(D | θ) = ∑_{i=1}^N ∑_{c=1}^C I(y_i = c) log π_c + ∑_{c=1}^C ∑_{i : y_i = c} log N(x_i | µ_c, Σ_c)

which is the sum of C + 1 distinct terms: the first depending on π and the other C terms depending both on µ_c and Σ_c
we can estimate each parameter by optimizing the log-likelihood separately w.r.t. it

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

31 MLE for GDA

the log-likelihood function is

log p(D | θ) = ∑_{i=1}^N ∑_{c=1}^C I(y_i = c) log π_c + ∑_{c=1}^C ∑_{i : y_i = c} log N(x_i | µ_c, Σ_c)

for the class prior, as with the NBC model, we have

ˆπ_c = N_c / N

for the class-conditional densities, we partition the data based on its class label, and compute the MLE for each Gaussian term

ˆµ_c = (1/N_c) ∑_{i : y_i = c} x_i
ˆΣ_c = (1/N_c) ∑_{i : y_i = c} (x_i - ˆµ_c)(x_i - ˆµ_c)^T

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
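A minimal sketch of the GDA MLE fit (class priors, per-class means and covariances) on synthetic labeled data; function and variable names are mine:

```python
import numpy as np

def fit_gda_mle(X, y):
    """MLE for GDA: pi_c = N_c/N, mu_c = class mean, Sigma_c = class covariance (divide by N_c)."""
    classes = np.unique(y)
    N = X.shape[0]
    params = {}
    for c in classes:
        Xc = X[y == c]
        Nc = Xc.shape[0]
        mu_c = Xc.mean(axis=0)
        centered = Xc - mu_c
        Sigma_c = centered.T @ centered / Nc
        params[int(c)] = dict(pi=Nc / N, mu=mu_c, Sigma=Sigma_c)
    return params

# synthetic two-class data
rng = np.random.default_rng(3)
X0 = rng.multivariate_normal([0, 0], np.eye(2), size=200)
X1 = rng.multivariate_normal([3, 1], [[1.5, 0.4], [0.4, 0.8]], size=300)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 300)
for c, p in fit_gda_mle(X, y).items():
    print(c, p["pi"], p["mu"])
```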

32 Posterior Predictive for GDA

once the model is fit and the parameters are estimated, we can make predictions by using a plug-in approximation

p(y = c | x, ˆθ) ∝ ˆπ_c |2π ˆΣ_c|^{-1/2} exp[ -(1/2) (x - ˆµ_c)^T ˆΣ_c^{-1} (x - ˆµ_c) ]

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

33 Overfitting for GDA

the MLE is fast and simple; however, it can badly overfit in high dimensions
in particular, ˆΣ_c = (1/N_c) ∑_{i : y_i = c} (x_i - ˆµ_c)(x_i - ˆµ_c)^T ∈ R^{D×D} is singular for N_c < D
even when N_c > D, the MLE can be ill-conditioned (close to singular)
possible simple strategies to solve this issue (they reduce the number of parameters):
use the NBC model/assumption (i.e. the Σ_c are diagonal)
use LDA (i.e. Σ_c = Σ)
use diagonal LDA (i.e. Σ_c = Σ = diag(σ_1^2, ..., σ_D^2)) (following subsection)
use a Bayesian approach: estimate the full covariance by imposing a prior and then integrating it out (following subsection)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

34 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

35 Diagonal LDA

the diagonal LDA assumes Σ_c = Σ = diag(σ_1^2, ..., σ_D^2) for c ∈ {1, ..., C}
one has

p(x_i, y_i = c | θ) = p(x_i | y_i = c, θ_c) p(y_i = c | π) = N(x_i | µ_c, Σ) π_c = π_c ∏_{j=1}^D N(x_ij | µ_cj, σ_j^2)

and taking the logs

log p(x_i, y_i = c | θ) = -∑_{j=1}^D (x_ij - µ_cj)^2 / (2σ_j^2) + log π_c + const

typically the estimates of the parameters are

ˆµ_cj = (1/N_c) ∑_{i : y_i = c} x_ij
ˆσ_j^2 = (1/(N - C)) ∑_{c=1}^C ∑_{i : y_i = c} (x_ij - ˆµ_cj)^2 (pooled empirical variance)

in high-dimensional settings, this model can work much better than LDA and RDA

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
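A minimal sketch of the diagonal LDA estimates, in particular the pooled per-feature variance with the 1/(N - C) factor (synthetic data; names are mine):

```python
import numpy as np

def fit_diag_lda(X, y):
    """Per-class feature means and a pooled per-feature variance sigma_j^2 shared across classes."""
    classes = np.unique(y)
    N, D = X.shape
    C = len(classes)
    mus = {int(c): X[y == c].mean(axis=0) for c in classes}
    # pooled empirical variance: squared deviations summed over classes and samples, divided by N - C
    ss = np.zeros(D)
    for c in classes:
        ss += ((X[y == c] - mus[int(c)]) ** 2).sum(axis=0)
    sigma2 = ss / (N - C)
    return mus, sigma2

rng = np.random.default_rng(4)
X = np.vstack([rng.normal([0, 0, 0], [1, 2, 0.5], size=(100, 3)),
               rng.normal([1, -1, 2], [1, 2, 0.5], size=(150, 3))])
y = np.array([0] * 100 + [1] * 150)
mus, sigma2 = fit_diag_lda(X, y)
print(sigma2)
```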

36 Outline 1 Basics Multivariate Gaussian 2 MLE for an MVN Theorem 3 Gaussian Discriminant Analysis Generative Classifiers Gaussian Discriminant Analysis (GDA) Quadratic Discriminant Analysis (QDA) Linear Discriminant Analysis (LDA) MLE for Gaussian Discriminant Analysis Diagonal LDA Bayesian Procedure Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

37 Bayesian Procedure

we now follow the full Bayesian procedure to fit the GDA model
let's restart from the expression of the posterior predictive PDF

p(y = c | x, D) = p(y = c, x | D) / p(x | D) = p(x | y = c, D) p(y = c | D) / p(x | D)

since we are interested in computing

c* = argmax_c p(y = c | x, D)

we can neglect the constant p(x | D) and use the following simpler expression

p(y = c | x, D) ∝ p(x | y = c, D) p(y = c | D)

note that we didn't use the model parameters in the previous equation
now we use the Bayesian procedure, in which we integrate out the unknown parameters
for simplicity we now consider a vector parameter π for the PMF p(y = c | D) and a vector parameter θ_c for the PDF p(x | y = c, D)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

38 Bayesian Procedure

as for the PMF p(y = c | D), we can integrate out π as follows

p(y = c | D) = ∫ p(y = c, π | D) dπ

we know that y ∼ Cat(π), i.e. p(y | π) = ∏_c π_c^{I(y = c)}
we can decompose p(y = c, π | D) as follows

p(y = c, π | D) = p(y = c | π, D) p(π | D) = p(y = c | π) p(π | D) = π_c p(π | D)

where p(π | D) is the posterior w.r.t. π
using the previous equation in the integral above we have

p(y = c | D) = ∫ p(y = c, π | D) dπ = ∫ π_c p(π | D) dπ = E[π_c | D] = (N_c + α_c) / (N + α_0)

which is the posterior mean computed for the Dirichlet-multinomial model (cf. lecture 4 slides)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

39 Bayesian Procedure

as for the PDF p(x | y = c, D), we can integrate out θ_c as follows

p(x | y = c, D) = ∫ p(x, θ_c | y = c, D) dθ_c = ∫ p(x, θ_c | D_c) dθ_c

where for simplicity we introduce D_c ≜ {(x_i, y_i) ∈ D : y_i = c}
we know that p(x | θ_c) = N(x | µ_c, Σ_c), where θ_c = (µ_c, Σ_c)
we can use the following decomposition

p(x, θ_c | D_c) = p(x | θ_c, D_c) p(θ_c | D_c) = p(x | θ_c) p(θ_c | D_c)

where p(θ_c | D_c) is the posterior w.r.t. θ_c
hence one has

p(x | y = c, D) = ∫ p(x, θ_c | D_c) dθ_c = ∫ p(x | θ_c) p(θ_c | D_c) dθ_c = ∫∫ N(x | µ_c, Σ_c) p(µ_c, Σ_c | D_c) dµ_c dΣ_c

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

40 Bayesian Procedure

one has

p(x | y = c, D) = ∫∫ N(x | µ_c, Σ_c) p(µ_c, Σ_c | D_c) dµ_c dΣ_c

the posterior is (see the corresponding section of the book)

p(µ_c, Σ_c | D_c) = NIW(µ_c, Σ_c | m_N^c, κ_N^c, ν_N^c, S_N^c)

then (see the book)

p(x | y = c, D) = ∫∫ N(x | µ_c, Σ_c) NIW(µ_c, Σ_c | m_N^c, κ_N^c, ν_N^c, S_N^c) dµ_c dΣ_c
= T(x | m_N^c, ((κ_N^c + 1) / (κ_N^c (ν_N^c - D + 1))) S_N^c, ν_N^c - D + 1)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42

41 Bayesian Procedure

let's summarize what we obtained by applying the Bayesian procedure
we first found

p(y = c | D) = E[π_c | D] = (N_c + α_c) / (N + α_0)

and then

p(x | y = c, D) = T(x | m_N^c, ((κ_N^c + 1) / (κ_N^c (ν_N^c - D + 1))) S_N^c, ν_N^c - D + 1)

then, combining everything in the starting posterior predictive, we have

p(y = c | x, D) ∝ p(x | y = c, D) p(y = c | D) = E[π_c | D] T(x | m_N^c, ((κ_N^c + 1) / (κ_N^c (ν_N^c - D + 1))) S_N^c, ν_N^c - D + 1)

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
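A sketch of the resulting class-conditional posterior predictive; the NIW posterior updates below follow the book's formulas as recalled here (treat them, and the prior hyperparameter choices, as assumptions to verify against the book), and scipy's multivariate_t is used for the Student-t density:

```python
import numpy as np
from scipy.stats import multivariate_t

def niw_posterior(Xc, m0, kappa0, nu0, S0):
    """NIW posterior hyperparameters for one class, given its data Xc (formulas as recalled from the book)."""
    N, D = Xc.shape
    xbar = Xc.mean(axis=0)
    kappaN = kappa0 + N
    nuN = nu0 + N
    mN = (kappa0 * m0 + N * xbar) / kappaN
    centered = Xc - xbar
    S = centered.T @ centered                      # scatter matrix around the class mean
    SN = S0 + S + (kappa0 * N / kappaN) * np.outer(xbar - m0, xbar - m0)
    return mN, kappaN, nuN, SN

def class_predictive(x, mN, kappaN, nuN, SN):
    """Student-t predictive T(x | mN, (kappaN+1)/(kappaN (nuN-D+1)) SN, nuN-D+1)."""
    D = mN.shape[0]
    df = nuN - D + 1
    shape = (kappaN + 1) / (kappaN * df) * SN
    return multivariate_t(loc=mN, shape=shape, df=df).pdf(x)

# synthetic data for one class and a weak prior (hyperparameter choices are illustrative)
rng = np.random.default_rng(5)
Xc = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.2], [0.2, 0.5]], size=50)
D = Xc.shape[1]
m0, kappa0, nu0, S0 = np.zeros(D), 0.01, D + 2, np.eye(D)
post = niw_posterior(Xc, m0, kappa0, nu0, S0)
print(class_predictive(np.array([1.0, -1.0]), *post))
```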

42 Credits

Kevin Murphy's book

Luigi Freda ( La Sapienza University) Lecture 5 November 29, / 42
