Clustering. Professor Ameet Talwalkar, CS260 Machine Learning Algorithms, March 8, 2017.


1 Clustering. Professor Ameet Talwalkar, CS260 Machine Learning Algorithms.

2 Outline: 1 Administration; 2 Review of last lecture; 3 Clustering.

3 Schedule and Final. Upcoming schedule: Today: last day of class. Next Monday (3/13): no class; I will hold office hours in my office (BH 4531F). Next Wednesday (3/15): Final Exam, HW6 due. Final Exam: cumulative but with more emphasis on new material; 8 short questions (recall that the midterm had 6 short and 3 long questions); should be much shorter than the midterm, but of equal difficulty; focus on major concepts (e.g., MLE, primal and dual formulations of SVM, gradient descent, etc.).

4 Outline: 1 Administration; 2 Review of last lecture; 3 Clustering.

5 Neural Networks. Basic idea: learning nonlinear basis functions and classifiers. Hidden layers are nonlinear mappings from input features to a new representation; output layers use the new representations for classification and regression. Learning parameters: backpropagation is an efficient algorithm for (stochastic) gradient descent; we can write down explicit updates via the chain rule of calculus.

6 Single Node. With a bias unit x_0 = 1 and input x = [x_0, x_1, x_2, x_3]^T, a single unit with weights θ computes h_θ(x) = g(θ^T x) = 1 / (1 + e^{-θ^T x}). Sigmoid (logistic) activation function: g(z) = 1 / (1 + e^{-z}). [Based on slide by Andrew Ng]
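A minimal NumPy sketch of this single sigmoid unit; the slide itself contains no code, and the example input and weights below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # logistic activation g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def single_node(theta, x):
    # output of one sigmoid unit: h_theta(x) = g(theta^T x);
    # x is assumed to already contain the bias entry x_0 = 1
    return sigmoid(theta @ x)

# illustrative input (x_0 = 1 is the bias unit) and weights
x = np.array([1.0, 0.5, -1.2, 2.0])
theta = np.array([0.1, -0.4, 0.3, 0.05])
print(single_node(theta, x))
```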

7 Neural Network. [Figure: a network with bias units; the inputs x feed the hidden activations a^{(2)}, which feed the output h_θ(x). Layer 1 (Input Layer), Layer 2 (Hidden Layer), Layer 3 (Output Layer).] [Slide by Andrew Ng]

8 Vectorization.
a_1^{(2)} = g(z_1^{(2)}) = g(Θ_{10}^{(1)} x_0 + Θ_{11}^{(1)} x_1 + Θ_{12}^{(1)} x_2 + Θ_{13}^{(1)} x_3)
a_2^{(2)} = g(z_2^{(2)}) = g(Θ_{20}^{(1)} x_0 + Θ_{21}^{(1)} x_1 + Θ_{22}^{(1)} x_2 + Θ_{23}^{(1)} x_3)
a_3^{(2)} = g(z_3^{(2)}) = g(Θ_{30}^{(1)} x_0 + Θ_{31}^{(1)} x_1 + Θ_{32}^{(1)} x_2 + Θ_{33}^{(1)} x_3)
h_Θ(x) = g(Θ_{10}^{(2)} a_0^{(2)} + Θ_{11}^{(2)} a_1^{(2)} + Θ_{12}^{(2)} a_2^{(2)} + Θ_{13}^{(2)} a_3^{(2)}) = g(z_1^{(3)})
Feed-forward steps: z^{(2)} = Θ^{(1)} x; a^{(2)} = g(z^{(2)}); add a_0^{(2)} = 1; z^{(3)} = Θ^{(2)} a^{(2)}; h_Θ(x) = a^{(3)} = g(z^{(3)}). [Based on slide by Andrew Ng]

9 Learning in NN: Backpropagation. Similar to the perceptron learning algorithm, we cycle through our examples: if the output of the network is correct, no changes are made; if there is an error, the weights are adjusted to reduce the error. We are just performing (stochastic) gradient descent! [Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par]

10 Optimizing the Neural Network. The regularized cost is
J(Θ) = -(1/n) ∑_{i=1}^n ∑_{k=1}^K [ y_{ik} log(h_Θ(x_i))_k + (1 - y_{ik}) log(1 - (h_Θ(x_i))_k) ] + (λ/2n) ∑_{l=1}^{L-1} ∑_{i=1}^{s_l} ∑_{j=1}^{s_{l+1}} (Θ_{ji}^{(l)})^2
Unlike before, J(Θ) is not convex. Solve via forward propagation and backpropagation, using
∂J(Θ)/∂Θ_{ij}^{(l)} = a_j^{(l)} δ_i^{(l+1)}   (ignoring λ; i.e., if λ = 0).
[Based on slide by Andrew Ng]

11 Forward Propagation. Given one labeled training instance (x, y):
a^{(1)} = x
z^{(2)} = Θ^{(1)} a^{(1)};  a^{(2)} = g(z^{(2)})   [add a_0^{(2)}]
z^{(3)} = Θ^{(2)} a^{(2)};  a^{(3)} = g(z^{(3)})   [add a_0^{(3)}]
z^{(4)} = Θ^{(3)} a^{(3)};  a^{(4)} = h_Θ(x) = g(z^{(4)})
[Based on slide by Andrew Ng]
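A NumPy sketch of these feed-forward steps for a general stack of sigmoid layers; the function name, matrix-shape convention, and example sizes are assumptions for illustration, not part of the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    # Forward propagation through fully connected sigmoid layers.
    # Thetas is a list [Theta1, Theta2, ...]; Theta_l has shape
    # (units in layer l+1, 1 + units in layer l), the extra column
    # multiplying the bias unit a_0 = 1. Returns all activations.
    a = np.concatenate(([1.0], x))           # a^(1) = x with bias
    activations = [a]
    for l, Theta in enumerate(Thetas):
        z = Theta @ a                         # z^(l+1) = Theta^(l) a^(l)
        a = sigmoid(z)                        # a^(l+1) = g(z^(l+1))
        if l < len(Thetas) - 1:               # add bias except at the output
            a = np.concatenate(([1.0], a))
        activations.append(a)
    return activations

# illustrative sizes: 3 inputs, hidden layers of 5 and 4 units, 1 output
rng = np.random.default_rng(0)
Thetas = [rng.normal(size=(5, 4)), rng.normal(size=(4, 6)), rng.normal(size=(1, 5))]
acts = forward(np.array([0.2, -1.0, 0.7]), Thetas)
print(acts[-1])   # h_Theta(x) = a^(4)
```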

12 Backpropagation: Gradient Computation. Let δ_j^{(l)} = the "error" of node j in layer l (number of layers L = 4). Backpropagation, with ".*" denoting the element-wise product:
δ^{(4)} = a^{(4)} - y
δ^{(3)} = (Θ^{(3)})^T δ^{(4)} .* g'(z^{(3)})
δ^{(2)} = (Θ^{(2)})^T δ^{(3)} .* g'(z^{(2)})
(There is no δ^{(1)}.) For the sigmoid, g'(z^{(3)}) = a^{(3)} .* (1 - a^{(3)}) and g'(z^{(2)}) = a^{(2)} .* (1 - a^{(2)}).
[Based on slide by Andrew Ng]

13 Training a Neural Network via Gradient Descent with Backprop.
Given: training set {(x_1, y_1), ..., (x_n, y_n)}
Initialize all Θ^{(l)} randomly (NOT to 0!)
Loop   // each iteration is called an epoch
    Set Δ_{ij}^{(l)} = 0  ∀ l, i, j   (used to accumulate the gradient)
    For each training instance (x_i, y_i):
        Set a^{(1)} = x_i
        Compute {a^{(2)}, ..., a^{(L)}} via forward propagation
        Compute δ^{(L)} = a^{(L)} - y_i
        Compute the errors {δ^{(L-1)}, ..., δ^{(2)}}   (backpropagation)
        Accumulate the gradients: Δ_{ij}^{(l)} = Δ_{ij}^{(l)} + a_j^{(l)} δ_i^{(l+1)}
    Compute the average regularized gradient:
        D_{ij}^{(l)} = (1/n) Δ_{ij}^{(l)} + λ Θ_{ij}^{(l)}   if j ≠ 0
        D_{ij}^{(l)} = (1/n) Δ_{ij}^{(l)}                    otherwise
    Update the weights via a gradient step: Θ_{ij}^{(l)} = Θ_{ij}^{(l)} - α D_{ij}^{(l)}
Until the weights converge or the max #epochs is reached
[Based on slide by Andrew Ng]
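The training loop above can be sketched in NumPy as follows. This is a hypothetical minimal implementation, not the slide's code: the function name, learning rate, regularization scaling, and all defaults are my own choices, sigmoid activations are used throughout, and Y is expected as one-hot rows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_nn(X, Y, layer_sizes, lr=0.5, lam=0.0, epochs=200, seed=0):
    # Batch gradient descent with backpropagation (a sketch).
    # X: (n, d) inputs, Y: (n, K) one-hot targets, layer_sizes e.g. [d, h, K].
    rng = np.random.default_rng(seed)
    # Theta^(l) has shape (s_{l+1}, s_l + 1); random init (NOT zeros)
    Thetas = [rng.normal(scale=0.1, size=(layer_sizes[l + 1], layer_sizes[l] + 1))
              for l in range(len(layer_sizes) - 1)]
    n = X.shape[0]
    for _ in range(epochs):
        Grads = [np.zeros_like(T) for T in Thetas]       # gradient accumulators
        for x, y in zip(X, Y):
            # forward propagation, keeping every activation (with bias)
            a = np.concatenate(([1.0], x))
            acts = [a]
            for l, T in enumerate(Thetas):
                a = sigmoid(T @ a)
                if l < len(Thetas) - 1:
                    a = np.concatenate(([1.0], a))
                acts.append(a)
            # backpropagation: delta^(L) = a^(L) - y, then push errors back
            delta = acts[-1] - y
            Grads[-1] += np.outer(delta, acts[-2])
            for l in range(len(Thetas) - 2, -1, -1):
                a_l = acts[l + 1]
                delta = (Thetas[l + 1].T @ delta) * a_l * (1 - a_l)
                delta = delta[1:]                         # drop the bias component
                Grads[l] += np.outer(delta, acts[l])
        # average gradient, L2-penalize non-bias weights, take a gradient step
        for l, T in enumerate(Thetas):
            D = Grads[l] / n
            D[:, 1:] += (lam / n) * T[:, 1:]   # scaling of the penalty is a choice
            Thetas[l] = T - lr * D
    return Thetas
```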

14 Summary of the course so far. Supervised learning has been our focus. Setup: given a training dataset {x_n, y_n}_{n=1}^N, we learn a function h(x) to predict x's true value y (i.e., regression or classification). Linear vs. nonlinear features: 1 Linear: h(x) depends on w^T x; 2 Nonlinear: h(x) depends on w^T φ(x), where φ is either explicit or depends on a kernel function k(x_m, x_n) = φ(x_m)^T φ(x_n). Loss function: 1 Squared loss: least squares for regression (minimizing the residual sum of errors); 2 Logistic loss: logistic regression; 3 Exponential loss: AdaBoost; 4 Margin-based loss: support vector machines. Principles of estimation: 1 Point estimate: maximum likelihood, regularized likelihood.

15 (cont'd). Optimization: 1 Methods: gradient descent, Newton's method; 2 Convex optimization: global optimum vs. local optimum; 3 Lagrange duality: primal and dual formulations. Learning theory: 1 Difference between training error and generalization error; 2 Overfitting, bias-variance tradeoff; 3 Regularization: various regularized models.

16 Supervised versus Unsupervised Learning. Supervised: learning from labeled observations; the labels teach the algorithm to learn a mapping from observations to labels. Examples: classification, regression.

17 Supervised versus Unsupervised Learning. Supervised: learning from labeled observations; the labels teach the algorithm to learn a mapping from observations to labels (classification, regression). Unsupervised: learning from unlabeled observations; the learning algorithm must find latent structure from the features alone. This can be a goal in itself (discovering hidden patterns, exploratory analysis) or a means to an end (preprocessing for a supervised task). Example: clustering (today).

18 Outline: 1 Administration; 2 Review of last lecture; 3 Clustering (K-means, Gaussian mixture models).

19 Clustering Setup. Given D = {x_n}_{n=1}^N and K, we want to output:

20 Clustering Setup. Given D = {x_n}_{n=1}^N and K, we want to output: {µ_k}_{k=1}^K, the prototypes of the clusters, and A(x_n) ∈ {1, 2, ..., K}, the cluster memberships. Toy example: cluster the data into two clusters. [Figure: panels (a) and (i) showing the toy data before and after clustering.]

21 Clustering Setup. Given D = {x_n}_{n=1}^N and K, we want to output: {µ_k}_{k=1}^K, the prototypes of the clusters, and A(x_n) ∈ {1, 2, ..., K}, the cluster memberships. Toy example: cluster the data into two clusters. [Figure: panels (a) and (i) showing the toy data before and after clustering.] Applications: identify communities within social networks; find topics in news stories; group similar sequences into gene families.

22-26 K-means example. [Figure: a sequence of panels (a) through (i) showing successive K-means iterations on the toy data, alternating between assigning points to the nearest prototype and moving each prototype to the mean of its assigned points.]

27 K-means clustering. Intuition: data points assigned to cluster k should be near the prototype µ_k.

28 K-means clustering. Intuition: data points assigned to cluster k should be near the prototype µ_k. Distortion measure (clustering objective function, cost function): J = ∑_{n=1}^N ∑_{k=1}^K r_{nk} ||x_n - µ_k||_2^2, where r_{nk} ∈ {0, 1} is an indicator variable with r_{nk} = 1 if and only if A(x_n) = k.
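The distortion J is a few lines of NumPy; a small sketch in which the function and argument names are illustrative:

```python
import numpy as np

def distortion(X, R, Mu):
    # J = sum_n sum_k r_nk * ||x_n - mu_k||_2^2
    # X: (N, d) data, R: (N, K) one-hot assignments, Mu: (K, d) prototypes
    d2 = ((X[:, None, :] - Mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K) squared distances
    return float((R * d2).sum())
```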

29 Algorithm. Minimize the distortion by alternating optimization between {r_{nk}} and {µ_k}. Step 0: initialize {µ_k} to some values.

30 Algorithm. Minimize the distortion by alternating optimization between {r_{nk}} and {µ_k}. Step 0: initialize {µ_k} to some values. Step 1: fix {µ_k} and minimize over {r_{nk}}, which gives the assignment r_{nk} = 1 if k = arg min_j ||x_n - µ_j||_2^2, and r_{nk} = 0 otherwise.

31 Algorithm. Minimize the distortion by alternating optimization between {r_{nk}} and {µ_k}. Step 0: initialize {µ_k} to some values. Step 1: fix {µ_k} and minimize over {r_{nk}}, which gives the assignment r_{nk} = 1 if k = arg min_j ||x_n - µ_j||_2^2, and r_{nk} = 0 otherwise. Step 2: fix {r_{nk}} and minimize over {µ_k}, which gives the update µ_k = ∑_n r_{nk} x_n / ∑_n r_{nk}. Step 3: return to Step 1 unless a stopping criterion is met.
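A compact NumPy sketch of this alternating procedure; the initialization to random data points and the stopping rule are my own choices, since the slides do not prescribe them:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    # Alternating minimization of the distortion J (Steps 0-3 above).
    rng = np.random.default_rng(seed)
    # Step 0: initialize prototypes, here to K distinct random data points
    Mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    assign = np.zeros(len(X), dtype=int)
    for it in range(max_iter):
        # Step 1: assign every point to its nearest prototype
        d2 = ((X[:, None, :] - Mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
        new_assign = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_assign, assign):
            break  # Step 3: stop once the assignments no longer change
        assign = new_assign
        # Step 2: move each prototype to the mean of its assigned points
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # keep the old prototype if a cluster is empty
                Mu[k] = members.mean(axis=0)
    return Mu, assign

# toy usage on random 2-D data
X = np.random.default_rng(1).normal(size=(200, 2))
Mu, assign = kmeans(X, K=2)
```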

32 Remarks. The prototype µ_k is the mean of the points assigned to cluster k, hence "K-means". The procedure reduces J in both Step 1 and Step 2 and thus makes improvements on each iteration. There is no guarantee that we find the global solution; the quality of the local optimum depends on the initial values chosen at Step 0. k-means++ is a principled approximation algorithm.

33 Probabilistic interpretation of clustering? How can we model p(x) to reflect our intuition that points stay close to their cluster centers?

34 Probabilistic interpretation of clustering? How can we model p(x) to reflect our intuition that points stay close to their cluster centers? [Figure (b): scatter plot of the data.] The points seem to form 3 clusters.

35 Probabilistic interpretation of clustering? How can we model p(x) to reflect our intuition that points stay close to their cluster centers? [Figure (b): scatter plot of the data.] The points seem to form 3 clusters. We cannot model p(x) with simple, known distributions; e.g., the data is not a Gaussian because we have 3 distinct concentrated regions.

36 Gaussian mixture models: intuition. [Figure (a): the data colored by region.] Model each region with a distinct distribution; we can use Gaussians, which gives Gaussian mixture models (GMMs).

37 Gaussian mixture models: intuition. [Figure (a): the data colored by region.] Model each region with a distinct distribution; we can use Gaussians, which gives Gaussian mixture models (GMMs). But we don't know the cluster assignments (labels), the parameters of the Gaussians, or the mixture components! We must learn them from the unlabeled data D = {x_n}_{n=1}^N.

38 Gaussian mixture models: formal definition. A GMM has the following density function for x: p(x) = ∑_{k=1}^K ω_k N(x | µ_k, Σ_k), where K is the number of Gaussians (they are called mixture components), and µ_k and Σ_k are the mean and covariance matrix of the k-th component.

39 Gaussian mixture models: formal definition. A GMM has the following density function for x: p(x) = ∑_{k=1}^K ω_k N(x | µ_k, Σ_k), where K is the number of Gaussians (they are called mixture components), and µ_k and Σ_k are the mean and covariance matrix of the k-th component. The ω_k are the mixture weights (or priors); they represent how much each component contributes to the final distribution and satisfy 2 properties:

40 Gaussian mixture models: formal definition. A GMM has the following density function for x: p(x) = ∑_{k=1}^K ω_k N(x | µ_k, Σ_k), where K is the number of Gaussians (they are called mixture components), and µ_k and Σ_k are the mean and covariance matrix of the k-th component. The ω_k are the mixture weights (or priors); they represent how much each component contributes to the final distribution and satisfy 2 properties: ∀k, ω_k > 0, and ∑_k ω_k = 1. These properties ensure that p(x) is in fact a probability density function.
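The mixture density above can be evaluated directly; a small sketch using SciPy's multivariate normal (the library choice and the toy parameter values are assumptions for illustration, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    # p(x) = sum_k w_k * N(x | mu_k, Sigma_k)
    return sum(w * multivariate_normal.pdf(x, mean=mu, cov=S)
               for w, mu, S in zip(weights, means, covs))

# toy 2-D mixture with K = 3 components (illustrative parameters)
weights = [0.5, 0.3, 0.2]
means = [np.zeros(2), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 0.2])]
print(gmm_density(np.array([0.5, 0.5]), weights, means, covs))
```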

41 GMM as the marginal distribution of a joint distribution. Consider the joint distribution p(x, z) = p(z) p(x | z), where z is a discrete random variable taking values between 1 and K.

42 GMM as the marginal distribution of a joint distribution. Consider the joint distribution p(x, z) = p(z) p(x | z), where z is a discrete random variable taking values between 1 and K. Denote ω_k = p(z = k). Now assume the conditional distributions are Gaussian: p(x | z = k) = N(x | µ_k, Σ_k).

43 GMM as the marginal distribution of a joint distribution. Consider the joint distribution p(x, z) = p(z) p(x | z), where z is a discrete random variable taking values between 1 and K. Denote ω_k = p(z = k). Now assume the conditional distributions are Gaussian: p(x | z = k) = N(x | µ_k, Σ_k). Then the marginal distribution of x is p(x) = ∑_{k=1}^K ω_k N(x | µ_k, Σ_k), namely the Gaussian mixture model.

44 GMMs: example. [Figure (a): data colored red, blue, and green.] The conditional distributions of x given z (representing the color) are p(x | z = red) = N(x | µ_1, Σ_1), p(x | z = blue) = N(x | µ_2, Σ_2), p(x | z = green) = N(x | µ_3, Σ_3).

45 GMMs: example. [Figure (a): data colored red, blue, and green.] The conditional distributions of x given z (representing the color) are p(x | z = red) = N(x | µ_1, Σ_1), p(x | z = blue) = N(x | µ_2, Σ_2), p(x | z = green) = N(x | µ_3, Σ_3). [Figure (b): the corresponding mixture density.] The marginal distribution is thus p(x) = p(red) N(x | µ_1, Σ_1) + p(blue) N(x | µ_2, Σ_2) + p(green) N(x | µ_3, Σ_3).

46 Parameter estimation for Gaussian mixture models. The parameters in GMMs are:

47 Parameter estimation for Gaussian mixture models. The parameters in GMMs are θ = {ω_k, µ_k, Σ_k}_{k=1}^K. Let's first consider the simple/unrealistic case where we have the labels z. The labeled set {x_n, z_n}_{n=1}^N is the complete data; D = {x_n}_{n=1}^N is the incomplete data. How can we learn our parameters?

48 Parameter estimation for Gaussian mixture models. The parameters in GMMs are θ = {ω_k, µ_k, Σ_k}_{k=1}^K. Let's first consider the simple/unrealistic case where we have the labels z. The labeled set {x_n, z_n}_{n=1}^N is the complete data; D = {x_n}_{n=1}^N is the incomplete data. How can we learn our parameters? Given the complete data, the maximum likelihood estimate of θ is θ = arg max_θ ∑_n log p(x_n, z_n).

49 Parameter estimation for GMMs: complete data. The complete-data likelihood is decomposable: ∑_n log p(x_n, z_n) = ∑_n log p(z_n) p(x_n | z_n) = ∑_k ∑_{n: z_n = k} log p(z_n) p(x_n | z_n), where we have grouped the data by the value of z_n.

50 Parameter estimation for GMMs: complete data. The complete-data likelihood is decomposable: ∑_n log p(x_n, z_n) = ∑_n log p(z_n) p(x_n | z_n) = ∑_k ∑_{n: z_n = k} log p(z_n) p(x_n | z_n), where we have grouped the data by the value of z_n. Let γ_{nk} ∈ {0, 1} be a binary variable that indicates whether z_n = k:

51 Parameter estimation for GMMs: complete data. The complete-data likelihood is decomposable: ∑_n log p(x_n, z_n) = ∑_n log p(z_n) p(x_n | z_n) = ∑_k ∑_{n: z_n = k} log p(z_n) p(x_n | z_n), where we have grouped the data by the value of z_n. Let γ_{nk} ∈ {0, 1} be a binary variable that indicates whether z_n = k: ∑_n log p(x_n, z_n) = ∑_n ∑_k γ_{nk} log p(z = k) p(x_n | z = k).

52 Parameter estimation for GMMs: complete data. The complete-data likelihood is decomposable: ∑_n log p(x_n, z_n) = ∑_n log p(z_n) p(x_n | z_n) = ∑_k ∑_{n: z_n = k} log p(z_n) p(x_n | z_n), where we have grouped the data by the value of z_n. Let γ_{nk} ∈ {0, 1} be a binary variable that indicates whether z_n = k: ∑_n log p(x_n, z_n) = ∑_n ∑_k γ_{nk} log p(z = k) p(x_n | z = k) = ∑_n ∑_k γ_{nk} [log ω_k + log N(x_n | µ_k, Σ_k)].

53 Parameter estimation for GMMs: complete data. From our previous discussion, we have ∑_n log p(x_n, z_n) = ∑_n ∑_k γ_{nk} [log ω_k + log N(x_n | µ_k, Σ_k)]. Regrouping, we have ∑_n log p(x_n, z_n) = ∑_k ∑_n γ_{nk} log ω_k + ∑_k { ∑_n γ_{nk} log N(x_n | µ_k, Σ_k) }.

54 Parameter estimation for GMMs: complete data. From our previous discussion, we have ∑_n log p(x_n, z_n) = ∑_n ∑_k γ_{nk} [log ω_k + log N(x_n | µ_k, Σ_k)]. Regrouping, we have ∑_n log p(x_n, z_n) = ∑_k ∑_n γ_{nk} log ω_k + ∑_k { ∑_n γ_{nk} log N(x_n | µ_k, Σ_k) }. The term inside the braces depends only on the k-th component's parameters. It is now easy to show (left as an exercise) that the MLE is: ω_k = ∑_n γ_{nk} / ∑_k ∑_n γ_{nk},  µ_k = (1 / ∑_n γ_{nk}) ∑_n γ_{nk} x_n,  Σ_k = (1 / ∑_n γ_{nk}) ∑_n γ_{nk} (x_n - µ_k)(x_n - µ_k)^T. What's the intuition?
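With the labels observed, these MLE formulas are just weighted counts, means, and covariances; a NumPy sketch in which the function and variable names are illustrative:

```python
import numpy as np

def gmm_mle_complete(X, z, K):
    # MLE for GMM parameters when the cluster labels z_n are observed.
    # X: (N, d) data, z: (N,) integer labels in {0, ..., K-1}.
    N, d = X.shape
    gamma = np.eye(K)[z]                  # one-hot indicators gamma_nk, shape (N, K)
    Nk = gamma.sum(axis=0)                # number of points per component
    weights = Nk / N                      # omega_k = fraction of points with z_n = k
    means = (gamma.T @ X) / Nk[:, None]   # mu_k = mean of points with z_n = k
    covs = np.zeros((K, d, d))
    for k in range(K):
        diff = X - means[k]
        covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]  # per-component covariance
    return weights, means, covs
```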

55 Intuition. Since γ_{nk} is binary, the previous solution is nothing but: ω_k, the fraction of the total data points whose z_n is k (note that ∑_k ∑_n γ_{nk} = N); µ_k, the mean of all data points whose z_n is k; Σ_k, the covariance of all data points whose z_n is k.

56 Intuition. Since γ_{nk} is binary, the previous solution is nothing but: ω_k, the fraction of the total data points whose z_n is k (note that ∑_k ∑_n γ_{nk} = N); µ_k, the mean of all data points whose z_n is k; Σ_k, the covariance of all data points whose z_n is k. This intuition will help us develop an algorithm for estimating θ when we do not know z_n (incomplete data).

57 Parameter estimation for GMMs: incomplete data. When z_n is not given, we can guess it via the posterior probability: p(z_n = k | x_n) = p(x_n | z_n = k) p(z_n = k) / p(x_n) = p(x_n | z_n = k) p(z_n = k) / ∑_{k'=1}^K p(x_n | z_n = k') p(z_n = k').

58 Parameter estimation for GMMs: incomplete data. When z_n is not given, we can guess it via the posterior probability: p(z_n = k | x_n) = p(x_n | z_n = k) p(z_n = k) / p(x_n) = p(x_n | z_n = k) p(z_n = k) / ∑_{k'=1}^K p(x_n | z_n = k') p(z_n = k'). To compute the posterior probability, we need to know the parameters θ! Let's pretend we know the value of the parameters, so we can compute the posterior probability. How is that going to help us?

59 Estimation with soft γ_{nk}. We define γ_{nk} = p(z_n = k | x_n).

60 Estimation with soft γ_{nk}. We define γ_{nk} = p(z_n = k | x_n). Recall that γ_{nk} was previously binary; now it is a soft assignment of x_n to the k-th component, and each x_n is assigned to a component fractionally according to p(z_n = k | x_n).

61 Estimation with soft γ_{nk}. We define γ_{nk} = p(z_n = k | x_n). Recall that γ_{nk} was previously binary; now it is a soft assignment of x_n to the k-th component, and each x_n is assigned to a component fractionally according to p(z_n = k | x_n). We now get the same expression for the MLE as before: ω_k = ∑_n γ_{nk} / ∑_k ∑_n γ_{nk},  µ_k = (1 / ∑_n γ_{nk}) ∑_n γ_{nk} x_n,  Σ_k = (1 / ∑_n γ_{nk}) ∑_n γ_{nk} (x_n - µ_k)(x_n - µ_k)^T. But remember, we're cheating by using θ to compute γ_{nk}!

62 Iterative procedure. Alternate between estimating γ_{nk} and computing the parameters. Step 0: initialize θ with some values (random or otherwise). Step 1: compute γ_{nk} using the current θ. Step 2: update θ using the just-computed γ_{nk}. Step 3: go back to Step 1.
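Putting Steps 0 through 3 together gives the following sketch of EM for a GMM. The initialization choices, the fixed iteration count, the use of SciPy for the Gaussian density, and the absence of numerical safeguards (e.g., covariance regularization, a convergence tolerance) are assumptions for the example, not part of the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, max_iter=100, seed=0):
    # EM for a Gaussian mixture, following Steps 0-3 above (a sketch).
    N, d = X.shape
    rng = np.random.default_rng(seed)
    # Step 0: initialize parameters
    weights = np.full(K, 1.0 / K)
    means = X[rng.choice(N, size=K, replace=False)].copy()
    covs = np.array([np.cov(X, rowvar=False) for _ in range(K)])
    for _ in range(max_iter):
        # Step 1 (E-step): responsibilities gamma_nk = p(z_n = k | x_n)
        gamma = np.column_stack([
            weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
            for k in range(K)])
        gamma /= gamma.sum(axis=1, keepdims=True)
        # Step 2 (M-step): re-estimate parameters from the soft assignments
        Nk = gamma.sum(axis=0)
        weights = Nk / N
        means = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        # Step 3: return to Step 1 (loop)
    return weights, means, covs, gamma
```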

63 Iterative procedure. Alternate between estimating γ_{nk} and computing the parameters. Step 0: initialize θ with some values (random or otherwise). Step 1: compute γ_{nk} using the current θ. Step 2: update θ using the just-computed γ_{nk}. Step 3: go back to Step 1. This is an example of the EM algorithm, a powerful procedure for model estimation with hidden/latent variables.

64 Iterative procedure. Alternate between estimating γ_{nk} and computing the parameters. Step 0: initialize θ with some values (random or otherwise). Step 1: compute γ_{nk} using the current θ. Step 2: update θ using the just-computed γ_{nk}. Step 3: go back to Step 1. This is an example of the EM algorithm, a powerful procedure for model estimation with hidden/latent variables. Connection with K-means?

65 Iterative procedure. Alternate between estimating γ_{nk} and computing the parameters. Step 0: initialize θ with some values (random or otherwise). Step 1: compute γ_{nk} using the current θ. Step 2: update θ using the just-computed γ_{nk}. Step 3: go back to Step 1. This is an example of the EM algorithm, a powerful procedure for model estimation with hidden/latent variables. Connection with K-means? GMMs provide a probabilistic interpretation for K-means: K-means is a "hard" GMM, or GMMs are "soft" K-means. The posterior γ_{nk} provides a probabilistic assignment of x_n to cluster k.
