Lecture 5. Gaussian Models - Part 1. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. November 29, 2016
2 Outline
1 Basics
   Multivariate Gaussian
2 MLE for an MVN
   Theorem
3 Gaussian Discriminant Analysis
   Generative Classifiers
   Gaussian Discriminant Analysis (GDA)
   Quadratic Discriminant Analysis (QDA)
   Linear Discriminant Analysis (LDA)
   MLE for Gaussian Discriminant Analysis
   Diagonal LDA
   Bayesian Procedure
4 Univariate Gaussian (Normal) Distribution
- $X$ is a continuous RV with values $x \in \mathbb{R}$
- $X \sim \mathcal{N}(\mu, \sigma^2)$, i.e. $X$ has a Gaussian distribution or normal distribution, with density
  $\mathcal{N}(x \mid \mu, \sigma^2) \triangleq \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}$
- mean $E[X] = \mu$; mode $\mu$; variance $\mathrm{var}[X] = \sigma^2$; precision $\lambda = 1/\sigma^2$
- $(\mu - 2\sigma, \mu + 2\sigma)$ is the approx. 95% interval
- $(\mu - 3\sigma, \mu + 3\sigma)$ is the approx. 99.7% interval
- note: $\mathcal{N}(x \mid \mu, \sigma^2)$ is a density value, not a probability $P(X = x)$
5 Multivariate Gaussian (Normal) Distribution
- $X$ is a continuous RV with values $x \in \mathbb{R}^D$
- $X \sim \mathcal{N}(\mu, \Sigma)$, i.e. $X$ has a Multivariate Normal distribution (MVN) or multivariate Gaussian distribution, with density
  $\mathcal{N}(x \mid \mu, \Sigma) \triangleq \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right]$
- mean: $E[x] = \mu$; mode: $\mu$
- covariance matrix: $\mathrm{cov}[x] = \Sigma \in \mathbb{R}^{D \times D}$, where $\Sigma = \Sigma^T$ and $\Sigma \succeq 0$
- precision matrix: $\Lambda \triangleq \Sigma^{-1}$
- spherical (isotropic) covariance: $\Sigma = \sigma^2 I_D$
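As an illustration, the MVN density above can be evaluated directly; this is a minimal numpy sketch (the helper name `mvn_pdf` is hypothetical, not from the lecture):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma): exp(-0.5 maha) / ((2 pi)^{D/2} |Sigma|^{1/2})."""
    D = len(mu)
    diff = x - mu
    # Mahalanobis quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    maha = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.zeros(2)
Sigma = np.eye(2)                # spherical (isotropic) covariance
print(mvn_pdf(mu, mu, Sigma))    # peak value at the mode: 1 / (2 pi)
```

With $\Sigma = \sigma^2 I_D$ the density factorizes into a product of $D$ univariate Gaussians.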
7 MLE for an MVN Theorem
Theorem 1. If we have $N$ iid samples $x_i \sim \mathcal{N}(\mu, \Sigma)$, then the MLE for the parameters is given by
1. $\mu_{MLE} = \frac{1}{N}\sum_{i=1}^N x_i \triangleq \bar{x}$
2. $\Sigma_{MLE} = \frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{N}\left(\sum_{i=1}^N x_i x_i^T\right) - \bar{x}\bar{x}^T$
- this theorem states that the MLE parameter estimates for an MVN are just the empirical mean and the empirical covariance
- in the univariate case, one has
  $\hat{\mu} = \frac{1}{N}\sum_{i=1}^N x_i = \bar{x}$,  $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})^2 = \frac{1}{N}\left(\sum_{i=1}^N x_i^2\right) - \bar{x}^2$
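The two equivalent forms of $\Sigma_{MLE}$ (centered outer products vs. uncentered sum minus $\bar{x}\bar{x}^T$) can be checked numerically; a sketch with synthetic data, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # N = 500 iid samples, D = 3

x_bar = X.mean(axis=0)               # mu_MLE: the empirical mean
# Centered form: 1/N sum_i (x_i - x_bar)(x_i - x_bar)^T
Sigma_mle = (X - x_bar).T @ (X - x_bar) / len(X)
# Uncentered form: (1/N sum_i x_i x_i^T) - x_bar x_bar^T
Sigma_alt = X.T @ X / len(X) - np.outer(x_bar, x_bar)

print(np.allclose(Sigma_mle, Sigma_alt))   # True
```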
8 MLE for an MVN Theorem
- proof sketch
- in order to find the MLE one should maximize the log-likelihood of the dataset; given that $x_i \sim \mathcal{N}(\mu, \Sigma)$,
  $p(\mathcal{D} \mid \mu, \Sigma) = \prod_i \mathcal{N}(x_i \mid \mu, \Sigma)$
- the log-likelihood (dropping additive constants) is
  $\ell(\mu, \Sigma) = \log p(\mathcal{D} \mid \mu, \Sigma) = \frac{N}{2}\log|\Lambda| - \frac{1}{2}\sum_i (x_i - \mu)^T \Lambda (x_i - \mu) + \text{const}$
- the MLE estimates can be obtained by maximizing $\ell(\mu, \Sigma)$ w.r.t. $\mu$ and $\Sigma$
- homework: continue the proof for the univariate case
10 Generative Classifiers
- probabilistic classifier: we are given a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^N$; the goal is to compute the class posterior $p(y = c \mid x)$, which models the mapping $y = f(x)$
- generative classifiers: $p(y = c \mid x)$ is computed starting from the class-conditional density $p(x \mid y = c, \theta)$ and the class prior $p(y = c \mid \theta)$, given that
  $p(y = c \mid x, \theta) \propto p(x \mid y = c, \theta)\, p(y = c \mid \theta) \;\; (= p(y = c, x \mid \theta))$
  this is called a generative classifier since it specifies how to generate the feature vector $x$ for each class $y = c$ (by using $p(x \mid y = c, \theta)$); the model is usually fit by maximizing the joint log-likelihood, i.e. one computes $\hat{\theta} = \arg\max_\theta \sum_i \log p(y_i, x_i \mid \theta)$
- discriminative classifiers: the model $p(y = c \mid x)$ is directly fit to the data; the model is usually fit by maximizing the conditional log-likelihood, i.e. one computes $\hat{\theta} = \arg\max_\theta \sum_i \log p(y_i \mid x_i, \theta)$
12 Gaussian Discriminant Analysis GDA
- we can use the MVN for defining the class-conditional densities in a generative classifier:
  $p(x \mid y = c, \theta) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$ for $c \in \{1, ..., C\}$
- this means the samples of each class $c$ are characterized by a normal distribution
- this model is called Gaussian Discriminant Analysis (GDA), but it is a generative classifier (not discriminative)
- in the case $\Sigma_c$ is diagonal for each $c$, this model is equivalent to a Naive Bayes Classifier (NBC), since
  $p(x \mid y = c, \theta) = \prod_{j=1}^D \mathcal{N}(x_j \mid \mu_{jc}, \sigma_{jc}^2)$ for $c \in \{1, ..., C\}$
- once the model is fit to the data, we can classify a feature vector by using the decision rule
  $\hat{y}(x) = \arg\max_c \log p(y = c \mid x, \theta) = \arg\max_c \left[\log p(y = c \mid \pi) + \log p(x \mid y = c, \theta_c)\right]$
13 Gaussian Discriminant Analysis GDA
- decision rule: $\hat{y}(x) = \arg\max_c \left[\log p(y = c \mid \pi) + \log p(x \mid y = c, \theta_c)\right]$
- given that $y \sim \mathrm{Cat}(\pi)$ and $x \mid (y = c) \sim \mathcal{N}(\mu_c, \Sigma_c)$, the decision rule becomes (dropping additive constants)
  $\hat{y}(x) = \arg\min_c \left[-\log \pi_c + \frac{1}{2}\log|\Sigma_c| + \frac{1}{2}(x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c)\right]$
  which can be thought of as a nearest centroid classifier
- in fact, with a uniform prior and $\Sigma_c = \Sigma$,
  $\hat{y}(x) = \arg\min_c (x - \mu_c)^T \Sigma^{-1} (x - \mu_c) = \arg\min_c \|x - \mu_c\|_\Sigma^2$
- in this case, we select the class $c$ whose center $\mu_c$ is closest to $x$ (using the Mahalanobis distance $\|x - \mu_c\|_\Sigma$)
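With a uniform prior and shared $\Sigma$, the rule reduces to picking the closest centroid in Mahalanobis distance; a minimal sketch (the helper name `nearest_centroid` is hypothetical):

```python
import numpy as np

def nearest_centroid(x, mus, Sigma):
    """GDA decision rule with uniform prior and shared covariance:
    argmin_c (x - mu_c)^T Sigma^{-1} (x - mu_c)."""
    d2 = [(x - mu) @ np.linalg.solve(Sigma, x - mu) for mu in mus]
    return int(np.argmin(d2))

mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
Sigma = np.eye(2)
print(nearest_centroid(np.array([0.5, 0.2]), mus, Sigma))   # 0
print(nearest_centroid(np.array([3.5, 4.1]), mus, Sigma))   # 1
```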
14 Mahalanobis Distance
- the covariance matrix $\Sigma$ can be diagonalized since it is a symmetric real matrix:
  $\Sigma = U D U^T = \sum_{i=1}^D \lambda_i u_i u_i^T$
  where $U = [u_1, ..., u_D]$ is an orthonormal matrix of eigenvectors (i.e. $U^T U = I$) and the $\lambda_i$ are the corresponding eigenvalues ($\lambda_i \ge 0$ since $\Sigma \succeq 0$)
- one has immediately
  $\Sigma^{-1} = U D^{-1} U^T = \sum_{i=1}^D \frac{1}{\lambda_i} u_i u_i^T$
- the Mahalanobis distance is defined as $\|x - \mu\|_\Sigma \triangleq \left((x - \mu)^T \Sigma^{-1} (x - \mu)\right)^{1/2}$
- one can rewrite
  $(x - \mu)^T \Sigma^{-1} (x - \mu) = (x - \mu)^T \left(\sum_{i=1}^D \frac{1}{\lambda_i} u_i u_i^T\right)(x - \mu) = \sum_{i=1}^D \frac{1}{\lambda_i}(x - \mu)^T u_i u_i^T (x - \mu) = \sum_{i=1}^D \frac{y_i^2}{\lambda_i}$
  where $y_i \triangleq u_i^T (x - \mu)$ (or equivalently $y \triangleq U^T (x - \mu)$)
15 Mahalanobis Distance
- $\Sigma = U D U^T = \sum_{i=1}^D \lambda_i u_i u_i^T$
- $(x - \mu)^T \Sigma^{-1} (x - \mu) = \sum_{i=1}^D \frac{y_i^2}{\lambda_i}$ (where $y \triangleq U^T (x - \mu)$)
- interpretation: (1) center w.r.t. $\mu$, (2) rotate by $U^T$, (3) get a norm weighted by the $1/\lambda_i$
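The eigendecomposition identity above is easy to verify numerically; a sketch with a random positive-definite $\Sigma$ (not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3.0 * np.eye(3)      # symmetric positive definite
mu = np.zeros(3)
x = rng.normal(size=3)

# Direct squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
d2 = (x - mu) @ np.linalg.solve(Sigma, x - mu)

# Via Sigma = U diag(lam) U^T and y = U^T (x - mu): sum_i y_i^2 / lam_i
lam, U = np.linalg.eigh(Sigma)
y = U.T @ (x - mu)
d2_eig = np.sum(y ** 2 / lam)

print(np.isclose(d2, d2_eig))   # True
```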
16 Gaussian Discriminant Analysis GDA
- left: height/weight data for the two classes male/female
- right: visualization of the 2D Gaussians fit to each class
- we can see that the features are correlated (tall people tend to weigh more)
18 Quadratic Discriminant Analysis QDA
- the complete class posterior with Gaussian densities is
  $p(y = c \mid x, \theta) = \frac{\pi_c\, |2\pi\Sigma_c|^{-1/2} \exp\left[-\frac{1}{2}(x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c)\right]}{\sum_{c'} \pi_{c'}\, |2\pi\Sigma_{c'}|^{-1/2} \exp\left[-\frac{1}{2}(x - \mu_{c'})^T \Sigma_{c'}^{-1} (x - \mu_{c'})\right]}$
- the quadratic decision boundaries can be found by imposing
  $p(y = c \mid x, \theta) = p(y = c' \mid x, \theta)$, or equivalently $\log p(y = c \mid x, \theta) = \log p(y = c' \mid x, \theta)$,
  for each pair of adjacent classes $(c, c')$, which results in the quadratic equation
  $\frac{1}{2}(x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c) = \frac{1}{2}(x - \mu_{c'})^T \Sigma_{c'}^{-1} (x - \mu_{c'}) + \text{constant}$
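The full QDA posterior is best computed in log space and normalized at the end for numerical stability; a sketch (the helper name `qda_posterior` is hypothetical):

```python
import numpy as np

def qda_posterior(x, pis, mus, Sigmas):
    """p(y = c | x) for GDA with per-class covariances (QDA)."""
    logp = []
    for pi_c, mu, Sig in zip(pis, mus, Sigmas):
        diff = x - mu
        maha = diff @ np.linalg.solve(Sig, diff)
        _, logdet = np.linalg.slogdet(Sig)
        # log pi_c - 0.5 log|Sigma_c| - 0.5 maha (additive constants cancel)
        logp.append(np.log(pi_c) - 0.5 * logdet - 0.5 * maha)
    logp = np.array(logp)
    logp -= logp.max()          # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum()

post = qda_posterior(np.array([1.0, 0.0]),
                     pis=[0.5, 0.5],
                     mus=[np.zeros(2), np.ones(2)],
                     Sigmas=[np.eye(2), 2 * np.eye(2)])
print(post)   # a proper distribution over the two classes
```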
19 Quadratic Discriminant Analysis QDA
- left: dataset with 2 classes
- right: dataset with 3 classes
21 Linear Discriminant Analysis LDA
- we now consider GDA in the special case $\Sigma_c = \Sigma$ for $c \in \{1, ..., C\}$
- in this case we have
  $p(y = c \mid x, \theta) \propto \pi_c \exp\left[-\frac{1}{2}(x - \mu_c)^T \Sigma^{-1} (x - \mu_c)\right] = \exp\left[-\frac{1}{2} x^T \Sigma^{-1} x\right] \exp\left[\mu_c^T \Sigma^{-1} x - \frac{1}{2}\mu_c^T \Sigma^{-1} \mu_c + \log \pi_c\right]$
- note that the quadratic term $-\frac{1}{2} x^T \Sigma^{-1} x$ is independent of $c$ and will cancel out in the numerator and denominator of the complete class posterior
- if we define
  $\gamma_c \triangleq -\frac{1}{2}\mu_c^T \Sigma^{-1} \mu_c + \log \pi_c$,  $\beta_c \triangleq \Sigma^{-1} \mu_c$
  we can rewrite
  $p(y = c \mid x, \theta) = \frac{e^{\beta_c^T x + \gamma_c}}{\sum_{c'} e^{\beta_{c'}^T x + \gamma_{c'}}}$
22 Linear Discriminant Analysis LDA
- we have
  $p(y = c \mid x, \theta) = \frac{e^{\beta_c^T x + \gamma_c}}{\sum_{c'} e^{\beta_{c'}^T x + \gamma_{c'}}} = \mathcal{S}(\eta)_c$
  where $\eta \triangleq [\beta_1^T x + \gamma_1, ..., \beta_C^T x + \gamma_C]^T \in \mathbb{R}^C$ and $\mathcal{S}(\eta)$ is the softmax function, defined as
  $\mathcal{S}(\eta) \triangleq \left[\frac{e^{\eta_1}}{\sum_{c'} e^{\eta_{c'}}}, ..., \frac{e^{\eta_C}}{\sum_{c'} e^{\eta_{c'}}}\right]^T$
  and $\mathcal{S}(\eta)_c \in \mathbb{R}$ is just its $c$-th component
- the softmax function is so called since it acts a bit like the max function; to see this, divide each component $\eta_c$ by a temperature $T$; then, as $T \to 0$,
  $\mathcal{S}(\eta/T)_c \to \begin{cases} 1 & \text{if } c = \arg\max_{c'} \eta_{c'} \\ 0 & \text{otherwise} \end{cases}$
- in other words, at low temperature $\mathcal{S}(\eta/T)$ returns the most probable state, whereas at high temperature it returns one of the states with uniform probability (cf. the Boltzmann distribution in physics)
23 Linear Discriminant Analysis Softmax
- softmax distribution $\mathcal{S}(\eta/T)$, where $\eta = [3, 0, 1]^T$, at different temperatures $T$
- when the temperature is high (left), the distribution is uniform, whereas when the temperature is low (right), the distribution is spiky, with all its mass on the largest element
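The temperature behaviour described above can be reproduced directly; a sketch using the slide's $\eta = [3, 0, 1]^T$ (the max-subtraction trick is a standard numerical-stability measure, not something shown on the slide):

```python
import numpy as np

def softmax(eta, T=1.0):
    """Temperature-scaled softmax S(eta / T)."""
    z = np.asarray(eta, dtype=float) / T
    z -= z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

eta = np.array([3.0, 0.0, 1.0])   # the slide's example
print(softmax(eta, T=100))   # nearly uniform
print(softmax(eta, T=0.1))   # nearly all mass on the first (largest) entry
```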
24 Linear Discriminant Analysis LDA
- in order to find the decision boundaries we impose $p(y = c \mid x, \theta) = p(y = c' \mid x, \theta)$, which entails $e^{\beta_c^T x + \gamma_c} = e^{\beta_{c'}^T x + \gamma_{c'}}$
- in this case, taking logs returns $\beta_c^T x + \gamma_c = \beta_{c'}^T x + \gamma_{c'}$, which in turn corresponds to a linear decision boundary
  $(\beta_c - \beta_{c'})^T x = -(\gamma_c - \gamma_{c'})$
- note: in $D$ dimensions this corresponds to a hyperplane, in 3D to a plane, in 2D to a straight line
25 Linear Discriminant Analysis LDA
- left: dataset with 2 classes
- right: dataset with 3 classes
26 Linear Discriminant Analysis two-class LDA
- let us consider an LDA with just two classes (i.e. $y \in \{0, 1\}$)
- in this case
  $p(y = 1 \mid x, \theta) = \frac{e^{\beta_1^T x + \gamma_1}}{e^{\beta_1^T x + \gamma_1} + e^{\beta_0^T x + \gamma_0}} = \frac{1}{1 + e^{(\beta_0 - \beta_1)^T x + (\gamma_0 - \gamma_1)}}$
- that is
  $p(y = 1 \mid x, \theta) = \mathrm{sigm}\left((\beta_1 - \beta_0)^T x + (\gamma_1 - \gamma_0)\right)$
  where $\mathrm{sigm}(\eta) \triangleq \frac{1}{1 + \exp(-\eta)}$ is the sigmoid function (aka logistic function)
27 Linear Discriminant Analysis two-class LDA
- the linear decision boundary is
  $(\beta_0 - \beta_1)^T x + (\gamma_0 - \gamma_1) = 0$
- if we define
  $w \triangleq \beta_1 - \beta_0 = \Sigma^{-1}(\mu_1 - \mu_0)$
  $x_0 \triangleq \frac{1}{2}(\mu_1 + \mu_0) - (\mu_1 - \mu_0)\,\frac{\log(\pi_1/\pi_0)}{(\mu_1 - \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0)}$
  we obtain $w^T x_0 = -(\gamma_1 - \gamma_0)$
- the linear decision boundary can be rewritten as $w^T (x - x_0) = 0$; in fact we have
  $p(y = 1 \mid x, \theta) = \mathrm{sigm}(w^T (x - x_0))$
28 Linear Discriminant Analysis two-class LDA
- we have
  $w \triangleq \beta_1 - \beta_0 = \Sigma^{-1}(\mu_1 - \mu_0)$
  $x_0 \triangleq \frac{1}{2}(\mu_1 + \mu_0) - (\mu_1 - \mu_0)\,\frac{\log(\pi_1/\pi_0)}{(\mu_1 - \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0)}$
- the linear decision boundary is $w^T (x - x_0) = 0$
- in the case $\Sigma = I$ and $\pi_1 = \pi_0$, one has $w = \mu_1 - \mu_0$ and $x_0 = \frac{1}{2}(\mu_1 + \mu_0)$
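A small sketch of the two-class rule (the helper names are hypothetical): with equal priors, the posterior is exactly 0.5 on the boundary, i.e. at the midpoint between the class means.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def two_class_lda_posterior(x, mu0, mu1, Sigma, pi0=0.5, pi1=0.5):
    """p(y = 1 | x) = sigm(w^T (x - x0)) with w = Sigma^{-1}(mu1 - mu0)."""
    d = mu1 - mu0
    w = np.linalg.solve(Sigma, d)
    x0 = 0.5 * (mu0 + mu1) - d * np.log(pi1 / pi0) / (d @ w)
    return sigmoid(w @ (x - x0))

mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
Sigma = np.eye(2)
# At the midpoint with equal priors the posterior is exactly 0.5
print(two_class_lda_posterior(np.array([1.0, 0.0]), mu0, mu1, Sigma))   # 0.5
```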
30 MLE for GDA
- how to fit the GDA model? the simplest way is to use MLE
- assuming iid samples, $p(\mathcal{D} \mid \theta) = \prod_{i=1}^N p(x_i, y_i \mid \theta)$
- one has
  $p(x_i, y_i \mid \theta) = p(x_i \mid y_i, \theta)\, p(y_i \mid \pi)$, with $p(x_i \mid y_i, \theta) = \prod_c \mathcal{N}(x_i \mid \mu_c, \Sigma_c)^{\mathbb{I}(y_i = c)}$ and $p(y_i \mid \pi) = \prod_c \pi_c^{\mathbb{I}(y_i = c)}$
  where $\theta$ is a compound parameter vector containing the parameters $\pi$, $\mu_c$ and $\Sigma_c$
- the log-likelihood function is
  $\log p(\mathcal{D} \mid \theta) = \sum_{i=1}^N \sum_{c=1}^C \mathbb{I}(y_i = c)\log \pi_c + \sum_{c=1}^C \sum_{i : y_i = c} \log \mathcal{N}(x_i \mid \mu_c, \Sigma_c)$
  which is the sum of $C + 1$ distinct terms: the first depending on $\pi$ and the other $C$ terms each depending on a $\mu_c$ and $\Sigma_c$
- we can estimate each parameter by optimizing the log-likelihood separately w.r.t. it
31 MLE for GDA
- the log-likelihood function is
  $\log p(\mathcal{D} \mid \theta) = \sum_{i=1}^N \sum_{c=1}^C \mathbb{I}(y_i = c)\log \pi_c + \sum_{c=1}^C \sum_{i : y_i = c} \log \mathcal{N}(x_i \mid \mu_c, \Sigma_c)$
- for the class prior, as with the NBC model, we have $\hat{\pi}_c = \frac{N_c}{N}$
- for the class-conditional densities, we partition the data based on its class label and compute the MLE for each Gaussian term:
  $\hat{\mu}_c = \frac{1}{N_c}\sum_{i : y_i = c} x_i$,  $\hat{\Sigma}_c = \frac{1}{N_c}\sum_{i : y_i = c} (x_i - \hat{\mu}_c)(x_i - \hat{\mu}_c)^T$
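These per-class estimates amount to partitioning the data by label and applying the MVN MLE to each partition; a sketch (the helper name `gda_mle` is hypothetical):

```python
import numpy as np

def gda_mle(X, y, C):
    """Per-class MLEs: priors N_c / N, means, and covariances."""
    N = len(y)
    pis, mus, Sigmas = [], [], []
    for c in range(C):
        Xc = X[y == c]
        Nc = len(Xc)
        mu = Xc.mean(axis=0)
        pis.append(Nc / N)
        mus.append(mu)
        Sigmas.append((Xc - mu).T @ (Xc - mu) / Nc)
    return pis, mus, Sigmas

X = np.array([[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [7.0, 5.0]])
y = np.array([0, 0, 1, 1])
pis, mus, Sigmas = gda_mle(X, y, C=2)
print(pis)        # [0.5, 0.5]
print(mus[0])     # [1. 0.]
```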
32 Posterior Predictive for GDA
- once the model is fit and the parameters are estimated, we can make predictions by using a plug-in approximation
  $p(y = c \mid x, \hat{\theta}) \propto \hat{\pi}_c\, |2\pi\hat{\Sigma}_c|^{-1/2} \exp\left[-\frac{1}{2}(x - \hat{\mu}_c)^T \hat{\Sigma}_c^{-1} (x - \hat{\mu}_c)\right]$
33 Overfitting for GDA
- the MLE is fast and simple; however, it can badly overfit in high dimensions
- in particular, $\hat{\Sigma}_c = \frac{1}{N_c}\sum_{i : y_i = c}(x_i - \hat{\mu}_c)(x_i - \hat{\mu}_c)^T \in \mathbb{R}^{D \times D}$ is singular for $N_c < D$
- even when $N_c > D$, the MLE can be ill-conditioned (close to singular)
- possible simple strategies to solve this issue (they reduce the number of parameters):
  - use the NBC model/assumption (i.e. the $\Sigma_c$ are diagonal)
  - use LDA (i.e. $\Sigma_c = \Sigma$)
  - use diagonal LDA (i.e. $\Sigma_c = \Sigma = \mathrm{diag}(\sigma_1^2, ..., \sigma_D^2)$) (following subsection)
  - use a Bayesian approach: estimate the full covariance by imposing a prior and then integrating it out (following subsection)
35 Diagonal LDA
- diagonal LDA assumes $\Sigma_c = \Sigma = \mathrm{diag}(\sigma_1^2, ..., \sigma_D^2)$ for $c \in \{1, ..., C\}$
- one has
  $p(x_i, y_i = c \mid \theta) = p(x_i \mid y_i = c, \theta_c)\, p(y_i = c \mid \pi) = \mathcal{N}(x_i \mid \mu_c, \Sigma)\,\pi_c = \pi_c \prod_{j=1}^D \mathcal{N}(x_{ij} \mid \mu_{cj}, \sigma_j^2)$
  and taking logs
  $\log p(x_i, y_i = c \mid \theta) = -\sum_{j=1}^D \frac{(x_{ij} - \mu_{cj})^2}{2\sigma_j^2} + \log \pi_c + \text{const}$
- typically the estimates of the parameters are
  $\hat{\mu}_{cj} = \frac{1}{N_c}\sum_{i : y_i = c} x_{ij}$,  $\hat{\sigma}_j^2 = \frac{1}{N - C}\sum_{c=1}^C \sum_{i : y_i = c}(x_{ij} - \hat{\mu}_{cj})^2$ (pooled empirical variance)
- in high-dimensional settings, this model can work much better than LDA and RDA
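The pooled-variance estimate can be written compactly; a sketch (hypothetical helper name; divides by $N - C$ as in the formula above):

```python
import numpy as np

def diag_lda_fit(X, y, C):
    """Class means mu_cj and a single pooled per-feature variance sigma_j^2."""
    mus = np.stack([X[y == c].mean(axis=0) for c in range(C)])
    resid = X - mus[y]                                  # x_ij - mu_{y_i, j}
    sigma2 = (resid ** 2).sum(axis=0) / (len(X) - C)    # pooled empirical variance
    return mus, sigma2

X = np.array([[0.0, 1.0], [2.0, 3.0], [10.0, 1.0], [12.0, 3.0]])
y = np.array([0, 0, 1, 1])
mus, sigma2 = diag_lda_fit(X, y, C=2)
print(sigma2)   # [2. 2.]
```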
37 Bayesian Procedure
- we now follow the full Bayesian procedure to fit the GDA model
- let's restart from the expression of the posterior predictive PDF
  $p(y = c \mid x, \mathcal{D}) = \frac{p(y = c, x \mid \mathcal{D})}{p(x \mid \mathcal{D})} = \frac{p(x \mid y = c, \mathcal{D})\, p(y = c \mid \mathcal{D})}{p(x \mid \mathcal{D})}$
- since we are interested in computing $c^* = \arg\max_c p(y = c \mid x, \mathcal{D})$, we can neglect the constant $p(x \mid \mathcal{D})$ and use the following simpler expression
  $p(y = c \mid x, \mathcal{D}) \propto p(x \mid y = c, \mathcal{D})\, p(y = c \mid \mathcal{D})$
- note that we didn't use the model parameters in the previous equation
- now we use the Bayesian procedure, in which we integrate out the unknown parameters
- for simplicity, we now consider a vector parameter $\pi$ for the PMF $p(y = c \mid \mathcal{D})$ and a vector parameter $\theta_c$ for the PDF $p(x \mid y = c, \mathcal{D})$
38 Bayesian Procedure
- as for the PMF $p(y = c \mid \mathcal{D})$, we can integrate out $\pi$ as follows
  $p(y = c \mid \mathcal{D}) = \int p(y = c, \pi \mid \mathcal{D})\, d\pi$
- we know that $y \sim \mathrm{Cat}(\pi)$, i.e. $p(y \mid \pi) = \prod_c \pi_c^{\mathbb{I}(y = c)}$
- we can decompose $p(y = c, \pi \mid \mathcal{D})$ as follows
  $p(y = c, \pi \mid \mathcal{D}) = p(y = c \mid \pi, \mathcal{D})\, p(\pi \mid \mathcal{D}) = p(y = c \mid \pi)\, p(\pi \mid \mathcal{D}) = \pi_c\, p(\pi \mid \mathcal{D})$
  where $p(\pi \mid \mathcal{D})$ is the posterior w.r.t. $\pi$
- using the previous equation in the integral above, we have
  $p(y = c \mid \mathcal{D}) = \int \pi_c\, p(\pi \mid \mathcal{D})\, d\pi = E[\pi_c \mid \mathcal{D}] = \frac{N_c + \alpha_c}{N + \alpha_0}$
  which is the posterior mean computed for the Dirichlet-multinomial model (cf. lecture 4 slides)
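The posterior-mean class prior is one line of counting; a sketch (hypothetical helper name, symmetric Dirichlet prior chosen as an illustrative assumption):

```python
import numpy as np

def class_prior_posterior_mean(y, C, alpha):
    """E[pi_c | D] = (N_c + alpha_c) / (N + alpha_0) under a Dirichlet(alpha) prior."""
    alpha = np.asarray(alpha, dtype=float)
    counts = np.bincount(y, minlength=C)     # N_c
    return (counts + alpha) / (len(y) + alpha.sum())

y = np.array([0, 0, 0, 1])
post_pi = class_prior_posterior_mean(y, C=2, alpha=[1.0, 1.0])   # (N_c + 1) / (N + 2)
print(post_pi)
```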
39 Bayesian Procedure
- as for the PDF $p(x \mid y = c, \mathcal{D})$, we can integrate out $\theta_c$ as follows
  $p(x \mid y = c, \mathcal{D}) = \int p(x, \theta_c \mid y = c, \mathcal{D})\, d\theta_c = \int p(x, \theta_c \mid \mathcal{D}_c)\, d\theta_c$
  where for simplicity we introduce $\mathcal{D}_c \triangleq \{(x_i, y_i) \in \mathcal{D} \mid y_i = c\}$
- we know that $p(x \mid \theta_c) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$, where $\theta_c = (\mu_c, \Sigma_c)$
- we can use the following decomposition
  $p(x, \theta_c \mid \mathcal{D}_c) = p(x \mid \theta_c, \mathcal{D}_c)\, p(\theta_c \mid \mathcal{D}_c) = p(x \mid \theta_c)\, p(\theta_c \mid \mathcal{D}_c)$
  where $p(\theta_c \mid \mathcal{D}_c)$ is the posterior w.r.t. $\theta_c$
- hence one has
  $p(x \mid y = c, \mathcal{D}) = \int p(x \mid \theta_c)\, p(\theta_c \mid \mathcal{D}_c)\, d\theta_c = \int \mathcal{N}(x \mid \mu_c, \Sigma_c)\, p(\mu_c, \Sigma_c \mid \mathcal{D}_c)\, d\mu_c\, d\Sigma_c$
40 Bayesian Procedure
- one has
  $p(x \mid y = c, \mathcal{D}) = \int \mathcal{N}(x \mid \mu_c, \Sigma_c)\, p(\mu_c, \Sigma_c \mid \mathcal{D}_c)\, d\mu_c\, d\Sigma_c$
- the posterior is (see the book)
  $p(\mu_c, \Sigma_c \mid \mathcal{D}_c) = \mathrm{NIW}(\mu_c, \Sigma_c \mid m_N^c, \kappa_N^c, \nu_N^c, S_N^c)$
- then (see the book)
  $p(x \mid y = c, \mathcal{D}) = \int \mathcal{N}(x \mid \mu_c, \Sigma_c)\, \mathrm{NIW}(\mu_c, \Sigma_c \mid m_N^c, \kappa_N^c, \nu_N^c, S_N^c)\, d\mu_c\, d\Sigma_c = \mathcal{T}\!\left(x \,\Big|\, m_N^c,\; \frac{\kappa_N^c + 1}{\kappa_N^c (\nu_N^c - D + 1)} S_N^c,\; \nu_N^c - D + 1\right)$
41 Bayesian Procedure
- let's summarize what we obtained by applying the Bayesian procedure
- we first found
  $p(y = c \mid \mathcal{D}) = E[\pi_c \mid \mathcal{D}] = \frac{N_c + \alpha_c}{N + \alpha_0}$
  and then
  $p(x \mid y = c, \mathcal{D}) = \mathcal{T}\!\left(x \,\Big|\, m_N^c,\; \frac{\kappa_N^c + 1}{\kappa_N^c (\nu_N^c - D + 1)} S_N^c,\; \nu_N^c - D + 1\right)$
- combining everything in the starting posterior predictive, we have
  $p(y = c \mid x, \mathcal{D}) \propto p(x \mid y = c, \mathcal{D})\, p(y = c \mid \mathcal{D}) = E[\pi_c \mid \mathcal{D}]\; \mathcal{T}\!\left(x \,\Big|\, m_N^c,\; \frac{\kappa_N^c + 1}{\kappa_N^c (\nu_N^c - D + 1)} S_N^c,\; \nu_N^c - D + 1\right)$
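The resulting class-conditional is a multivariate Student T; its log-density can be sketched as follows (the formula is the standard multivariate-T density with scale matrix $S$; the helper name is hypothetical). For large $\nu$ it approaches the Gaussian, consistent with the plug-in approximation of the earlier slide.

```python
import numpy as np
from math import lgamma

def mvt_logpdf(x, m, S, nu):
    """Log density of the multivariate Student T(x | m, S, nu), scale matrix S."""
    D = len(m)
    diff = x - m
    maha = diff @ np.linalg.solve(S, diff)
    _, logdet = np.linalg.slogdet(S)
    return (lgamma((nu + D) / 2.0) - lgamma(nu / 2.0)
            - 0.5 * (D * np.log(nu * np.pi) + logdet)
            - 0.5 * (nu + D) * np.log1p(maha / nu))

# Sanity check: for large nu the T density approaches the Gaussian N(x | m, S)
x, m, S = np.array([1.0, 0.0]), np.zeros(2), np.eye(2)
gauss_logpdf = -np.log(2 * np.pi) - 0.5   # log N(x | 0, I) at ||x|| = 1
print(mvt_logpdf(x, m, S, nu=1e6) - gauss_logpdf)   # close to 0
```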
42 Credits
- Kevin Murphy's book
CS534: Machine Learning Thomas G. Dietterich 221C Dearborn Hall tgd@cs.orst.edu http://www.cs.orst.edu/~tgd/classes/534 1 Course Overview Introduction: Basic problems and questions in machine learning.
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationMachine learning - HT Maximum Likelihood
Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationCOM336: Neural Computing
COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
University of Cambridge Engineering Part IIB Module 4F: Statistical Pattern Processing Handout 2: Multivariate Gaussians.2.5..5 8 6 4 2 2 4 6 8 Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2 2 Engineering
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationMachine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1
Machine Learning 1 Linear Classifiers Marius Kloft Humboldt University of Berlin Summer Term 2014 Machine Learning 1 Linear Classifiers 1 Recap Past lectures: Machine Learning 1 Linear Classifiers 2 Recap
More informationIntroduction to Machine Learning
Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationPart I. Linear Discriminant Analysis. Discriminant analysis. Discriminant analysis
Week 5 Based in part on slides from textbook, slides of Susan Holmes Part I Linear Discriminant Analysis October 29, 2012 1 / 1 2 / 1 Nearest centroid rule Suppose we break down our data matrix as by the
More informationChap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University
Chap 2. Linear Classifiers (FTH, 4.1-4.4) Yongdai Kim Seoul National University Linear methods for classification 1. Linear classifiers For simplicity, we only consider two-class classification problems
More informationRegularized Discriminant Analysis. Part I. Linear and Quadratic Discriminant Analysis. Discriminant Analysis. Example. Example. Class distribution
Part I 09.06.2006 Discriminant Analysis The purpose of discriminant analysis is to assign objects to one of several (K) groups based on a set of measurements X = (X 1, X 2,..., X p ) which are obtained
More informationExam 2. Jeremy Morris. March 23, 2006
Exam Jeremy Morris March 3, 006 4. Consider a bivariate normal population with µ 0, µ, σ, σ and ρ.5. a Write out the bivariate normal density. The multivariate normal density is defined by the following
More informationSupervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing
Supervised Learning Unsupervised learning: To extract structure and postulate hypotheses about data generating process from observations x 1,...,x n. Visualize, summarize and compress data. We have seen
More informationLecture 4 Discriminant Analysis, k-nearest Neighbors
Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se
More informationBayesian Decision Theory
Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationLecture 2: Priors and Conjugacy
Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationIntroduction to Machine Learning
Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationRegularized Discriminant Analysis and Reduced-Rank LDA
Regularized Discriminant Analysis and Reduced-Rank LDA Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Regularized Discriminant Analysis A compromise between LDA and
More informationCSC411 Fall 2018 Homework 5
Homework 5 Deadline: Wednesday, Nov. 4, at :59pm. Submission: You need to submit two files:. Your solutions to Questions and 2 as a PDF file, hw5_writeup.pdf, through MarkUs. (If you submit answers to
More informationInformatics 2B: Learning and Data Lecture 10 Discriminant functions 2. Minimal misclassifications. Decision Boundaries
Overview Gaussians estimated from training data Guido Sanguinetti Informatics B Learning and Data Lecture 1 9 March 1 Today s lecture Posterior probabilities, decision regions and minimising the probability
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationNaive Bayes & Introduction to Gaussians
Naive Bayes & Introduction to Gaussians Andreas C. Kapourani 2 March 217 1 Naive Bayes classifier In the previous lab we illustrated how to use Bayes Theorem for pattern classification, which in practice
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationBANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1
BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationSTATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify
More informationGenerative Learning algorithms
CS9 Lecture notes Andrew Ng Part IV Generative Learning algorithms So far, we ve mainly been talking about learning algorithms that model p(y x; θ), the conditional distribution of y given x. For instance,
More informationGenerative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham
Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham Outline We have already seen how Bayes rule can be turned into a classifier In all our examples
More informationLogistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationLecture 5: Classification
Lecture 5: Classification Advanced Applied Multivariate Analysis STAT 2221, Spring 2015 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department of Mathematical Sciences Binghamton
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationSTAT 730 Chapter 4: Estimation
STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationCMSC858P Supervised Learning Methods
CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors
More information