Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
|
|
- Marylou Shona Grant
- 6 years ago
- Views:
Transcription
1 Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
2 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
3 Schedule and Final Upcoming Schedule Today: Last day of class Next Monday (3/13): No class I will hold office hours in my office (BH 4531F) Next Wednesday (3/15): Final Exam, HW6 due Final Exam Cumulative but with more emphasis on new material 8 short questions (recall that midterm had 6 short and 3 long questions) Should be much shorter than midterm, but of equal difficulty Focus on major concepts (e.g., MLE, primal and dual formulations of SVM, gradient descent, etc.) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
4 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
5 Neural Networks Basic Idea Learning nonlinear basis functions and classifiers Hidden layers are nonlinear mappings from input features to new representation Output layers use the new representations for classification and regression Learning parameters Backpropogation = efficient algorithm for (stochastic) gradient descent Can write down explicit updates via chain rule of calculus Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
6 bias'unit ' x x = X Single'Node' x = x x 1 x 2 x = 6 4 h (x) =g ( x) 1 = 1+e T x Sigmoid'(logis1c)'ac1va1on'func1on:' g(z) = 1 1+e z Based'on'slide'by'Andrew'Ng' 9'
7 Neural'Network' bias'units' x a (2) h (x) Layer'1' (Input'Layer)' Layer'2' (Hidden'Layer)' Layer'3' (Output'Layer)' Slide'by'Andrew'Ng' 11'
8 Vectoriza1on' a (2) = g z (2) 1 = g (1) 1 x + (1) 11 x 1 + (1) 12 x 2 + (1) 13 x 3 1 a (2) = g z (2) 2 = g (1) 2 x + (1) 21 x 1 + (1) 22 x 2 + (1) 23 x 3 2 a (2) = g z (2) 3 = g (1) 3 x + (1) 31 x 1 + (1) 32 x 2 + (1) 33 x 3 3 h (x) =g (2) 1 a(2) + (2) 11 a(2) 1 + (2) 12 a(2) 2 + (2) 13 a(2) 3 = g z (3) 1 Based'on'slide'by'Andrew'Ng' (1) (2) h (x) Feed@Forward'Steps:' z (2) = (1) x a (2) = g(z (2) ) Add a (2) =1 z (3) = (2) a (2) h (x) =a (3) = g(z (3) ) 15'
9 Learning'in'NN:'Backpropaga1on' Similar'to'the'perceptron'learning'algorithm,'we'cycle' through'our'examples' If'the'output'of'the'network'is'correct,'no'changes'are'made' If'there'is'an'error,'weights'are'adjusted'to'reduce'the'error' We'are'just'performing''(stochas1c)'gradient'descent!' Based'on'slide'by'T.'Finin,'M.'desJardins,'L'Getoor,'R.'Par' 32'
10 J( ) = Op1mizing'the'Neural'Network' 1 n " n X LX 1 + 2n i=1 k=1 s X l 1 l=1 Solve'via:'' KX y ik log(h (x i )) k +(1 Xs l i=1 j=1 (l) ji 2 # y ik ) log 1 (h (x i )) k Unlike before, J(Θ)'is'not' (l) ij J( ) =a (l) j (l+1) i (ignoring'λ;'if'λ = )' Based'on'slide'by'Andrew'Ng' Forward' Propaga1on' Backpropaga1on' 34'
11 Forward'Propaga1on' Given'one'labeled'training'instance'(x, y):' ' Forward'Propaga1on' a (1) '= x' z (2) = Θ (1) a (1) a (2) '= g(z (2) ) z (3) = Θ (2) a (2) a (3) '= g(z (3) ) z (4) = Θ (3) a (3) [add'a (2) ]' [add'a (3) ]' a (4) '= h Θ (x) = g(z (4) )' a (1) a (2) a (3) a (4) Based'on'slide'by'Andrew'Ng' 35'
12 Backpropaga1on:'Gradient'Computa1on' Let'δ j (l) =' error 'of'node'j'in'layer'l (#layers'l'='4) Backpropaga1on'' Element@wise' product'.*' δ (4) = a (4) y δ (3) = (Θ (3) ) T δ (4).* g (z (3) ) δ (2) = (Θ (2) ) T δ (3).* g (z (2) ) (No'δ (1) ) δ (2) δ (3) g (z (3) ) = a (3).* (1 a (3) )' g (z (2) ) = a (2).* (1 a (2) )' δ (4) ' ' Based'on'slide'by'Andrew'Ng' 42'
13 Training'a'Neural'Network'via'Gradient' Descent'with'Backprop' Given: training set {(x 1,y 1 ),...,(x n,y n )} Initialize all (l) randomly (NOT to!) Loop // each iteration is called an epoch (l) ij Set = 8l, i, j For each training instance (x i,y i ): Set a (1) = x i Compute {a (2),...,a (L) } via forward propagation Compute (L) = a (L) y i Compute errors { (L 1),..., (2) } Compute gradients (l) ij = (l) ij + a(l) j Compute avg regularized gradient D (l) ij (l+1) i = ( 1 n Update weights via gradient step (l) ij = (l) ij Until weights converge or max #epochs is reached (Used'to'accumulate'gradient)' 1 n (l) ij + (l) (l) ij D (l) ij ij if j 6= otherwise Backpropaga1on' Based'on'slide'by'Andrew'Ng' 45'
14 Summary of the course so far Supervised learning has been our focus Setup: given a training dataset {x n, y n } N n=1, we learn a function h(x) to predict x s true value y (i.e., regression or classification) Linear vs. nonlinear features 1 Linear: h(x) depends on w T x 2 Nonlinear: h(x) depends on w T φ(x), where φ is either explicit or depends on a kernel function k(x m, x n ) = φ(x m ) T φ(x n ) Loss function 1 Squared loss: least square for regression (minimizing residual sum of errors) 2 Logistic loss: logistic regression 3 Exponential loss: AdaBoost 4 Margin-based loss: support vector machines Principles of estimation 1 Point estimate: maximum likelihood, regularized likelihood Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
15 cont d Optimization 1 Methods: gradient descent, Newton method 2 Convex optimization: global optimum vs. local optimum 3 Lagrange duality: primal and dual formulation Learning theory 1 Difference between training error and generalization error 2 Overfitting, bias and variance tradeoff 3 Regularization: various regularized models Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
16 Supervised versus Unsupervised Learning Supervised Learning from labeled observations Labels teach algorithm to learn mapping from observations to labels Classification, Regression Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
17 Supervised versus Unsupervised Learning Supervised Learning from labeled observations Labels teach algorithm to learn mapping from observations to labels Classification, Regression Unsupervised Learning from unlabeled observations Learning algorithm must find latent structure from features alone Can be goal in itself (discover hidden patterns, exploratory analysis) Can be means to an end (preprocessing for supervised task) Clustering (Today) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
18 Outline 1 Administration 2 Review of last lecture 3 Clustering K-means Gaussian mixture models Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
19 Clustering Setup Given D = {x n } N n=1 and K, we want to output: Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
20 Clustering Setup Given D = {x n } N n=1 and K, we want to output: {µ k } K k=1 : prototypes of clusters A(x n ) {1, 2,..., K}: the cluster membership Toy Example Cluster data into two clusters. 2 (a) 2 (i) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
21 Clustering Setup Given D = {x n } N n=1 and K, we want to output: {µ k } K k=1 : prototypes of clusters A(x n ) {1, 2,..., K}: the cluster membership Toy Example Cluster data into two clusters. 2 (a) 2 (i) Applications Identify communities within social networks Find topics in news stories Group similiar sequences into gene families Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
22 K-means example 2 (a) 2 (b) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
23 K-means example 2 (a) 2 (b) 2 (c) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
24 K-means example 2 (a) 2 (b) 2 (c) (d) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
25 K-means example 2 (a) 2 (b) 2 (c) (d) 2 (e) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
26 K-means example 2 (a) 2 (b) 2 (c) (d) 2 (e) 2 (f) (g) 2 (h) 2 (i) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
27 K-means clustering Intuition Data points assigned to cluster k should be near prototype µ k Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
28 K-means clustering Intuition Data points assigned to cluster k should be near prototype µ k Distortion measure (clustering objective function, cost function) N K J = r nk x n µ k 2 2 n=1 k=1 where r nk {, 1} is an indicator variable r nk = 1 if and only if A(x n ) = k Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
29 Algorithm Minimize distortion Alternative optimization between {r nk } and {µ k } Step Initialize {µ k } to some values Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
30 Algorithm Minimize distortion Alternative optimization between {r nk } and {µ k } Step Initialize {µ k } to some values Step 1 Fix {µ k } and minimize over {r nk }, to get this assignment: r nk = { 1 if k = arg minj x n µ j 2 2 otherwise Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
31 Algorithm Minimize distortion Alternative optimization between {r nk } and {µ k } Step Initialize {µ k } to some values Step 1 Fix {µ k } and minimize over {r nk }, to get this assignment: r nk = { 1 if k = arg minj x n µ j 2 2 otherwise Step 2 Fix {r nk } and minimize over {µ k } to get this update: n µ k = r nkx n n r nk Step 3 Return to Step 1 unless stopping criterion is met Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
32 Remarks Prototype µ k is the mean of points assigned to cluster k, hence K-means The procedure reduces J in both Step 1 and Step 2 and thus makes improvements on each iteration No guarantee we find the global solution Quality of local optimum depends on initial values at Step k-means++ is a principled approximation algorithm Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
33 Probabilistic interpretation of clustering? How can we model p(x) to reflect our intuition that points stay close to their cluster centers? Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
34 Probabilistic interpretation of clustering? How can we model p(x) to reflect our intuition that points stay close to their cluster centers? 1 (b) Points seem to form 3 clusters Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
35 Probabilistic interpretation of clustering? How can we model p(x) to reflect our intuition that points stay close to their cluster centers? 1.5 (b) Points seem to form 3 clusters We cannot model p(x) with simple and known distributions E.g., the data is not a Guassian b/c we have 3 distinct concentrated regions.5 1 Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
36 Gaussian mixture models: intuition 1.5 (a) Model each region with a distinct distribution Can use Gaussians Gaussian mixture models (GMMs).5 1 Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
37 Gaussian mixture models: intuition 1.5 (a).5 1 Model each region with a distinct distribution Can use Gaussians Gaussian mixture models (GMMs) We don t know cluster assignments (label), parameters of Gaussians, or mixture components! Must learn from unlabeled data D = {x n } N n=1 Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
38 Gaussian mixture models: formal definition GMM has the following density function for x K p(x) = ω k N(x µ k, Σ k ) k=1 K: number of Gaussians they are called mixture components µ k and Σ k : mean and covariance matrix of k-th component Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
39 Gaussian mixture models: formal definition GMM has the following density function for x K p(x) = ω k N(x µ k, Σ k ) k=1 K: number of Gaussians they are called mixture components µ k and Σ k : mean and covariance matrix of k-th component ω k : mixture weights (or priors) represent how much each component contributes to final distribution. They satisfy 2 properties: Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
40 Gaussian mixture models: formal definition GMM has the following density function for x K p(x) = ω k N(x µ k, Σ k ) k=1 K: number of Gaussians they are called mixture components µ k and Σ k : mean and covariance matrix of k-th component ω k : mixture weights (or priors) represent how much each component contributes to final distribution. They satisfy 2 properties: k, ω k >, and ω k = 1 These properties ensure p(x) is in fact a probability density function k Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
41 GMM as the marginal distribution of a joint distribution Consider the following joint distribution p(x, z) = p(z)p(x z) where z is a discrete random variable taking values between 1 and K. Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
42 GMM as the marginal distribution of a joint distribution Consider the following joint distribution p(x, z) = p(z)p(x z) where z is a discrete random variable taking values between 1 and K. Denote ω k = p(z = k) Now, assume the conditional distributions are Gaussian distributions p(x z = k) = N(x µ k, Σ k ) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
43 GMM as the marginal distribution of a joint distribution Consider the following joint distribution p(x, z) = p(z)p(x z) where z is a discrete random variable taking values between 1 and K. Denote ω k = p(z = k) Now, assume the conditional distributions are Gaussian distributions p(x z = k) = N(x µ k, Σ k ) Then, the marginal distribution of x is p(x) = K ω k N(x µ k, Σ k ) k=1 Namely, the Gaussian mixture model Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
44 GMMs: example 1.5 (a).5 1 The conditional distribution between x and z (representing color) are p(x z = red) = N(x µ 1, Σ 1 ) p(x z = blue) = N(x µ 2, Σ 2 ) p(x z = green) = N(x µ 3, Σ 3 ) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
45 GMMs: example 1.5 (a).5 1 The conditional distribution between x and z (representing color) are p(x z = red) = N(x µ 1, Σ 1 ) p(x z = blue) = N(x µ 2, Σ 2 ) p(x z = green) = N(x µ 3, Σ 3 ) 1 The marginal distribution is thus (b).5 p(x) = p(red)n(x µ 1, Σ 1 ) + p(blue)n(x µ 2, Σ 2 ) + p(green)n(x µ 3, Σ 3 ).5 1 Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
46 Parameter estimation for Gaussian mixture models The parameters in GMMs are Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
47 Parameter estimation for Gaussian mixture models The parameters in GMMs are θ = {ω k, µ k, Σ k } K k=1 Let s first consider the simple/unrealistic case where we have labels z Define D = {x n, z n } N n=1 D is the complete data D the incomplete data How can we learn our parameters? Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
48 Parameter estimation for Gaussian mixture models The parameters in GMMs are θ = {ω k, µ k, Σ k } K k=1 Let s first consider the simple/unrealistic case where we have labels z Define D = {x n, z n } N n=1 D is the complete data D the incomplete data How can we learn our parameters? Given D, the maximum likelihood estimation of the θ is given by θ = arg max log D = n log p(x n, z n ) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
49 Parameter estimation for GMMs: complete data The complete likelihood is decomposable log p(x n, z n ) = log p(z n )p(x n z n ) = n n k n:z n=k log p(z n )p(x n z n ) where we have grouped data by its values z n. Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
50 Parameter estimation for GMMs: complete data The complete likelihood is decomposable log p(x n, z n ) = log p(z n )p(x n z n ) = n n k n:z n=k log p(z n )p(x n z n ) where we have grouped data by its values z n. Let γ nk {, 1} be a binary variable that indicates whether z n = k: Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
51 Parameter estimation for GMMs: complete data The complete likelihood is decomposable log p(x n, z n ) = log p(z n )p(x n z n ) = n n k n:z n=k log p(z n )p(x n z n ) where we have grouped data by its values z n. Let γ nk {, 1} be a binary variable that indicates whether z n = k: log p(x n, z n ) = n k γ nk log p(z = k)p(x n z = k) n Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
52 Parameter estimation for GMMs: complete data The complete likelihood is decomposable log p(x n, z n ) = log p(z n )p(x n z n ) = n n k n:z n=k log p(z n )p(x n z n ) where we have grouped data by its values z n. Let γ nk {, 1} be a binary variable that indicates whether z n = k: log p(x n, z n ) = n k = k γ nk log p(z = k)p(x n z = k) n γ nk [log ω k + log N(x n µ k, Σ k )] n Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
53 Parameter estimation for GMMs: complete data From our previous discussion, we have log p(x n, z n ) = γ nk [log ω k + log N(x n µ k, Σ k )] n k n Regrouping, we have log p(x n, z n ) = n k γ nk log ω k + n k { } γ nk log N(x n µ k, Σ k ) n Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
54 Parameter estimation for GMMs: complete data From our previous discussion, we have log p(x n, z n ) = γ nk [log ω k + log N(x n µ k, Σ k )] n k n Regrouping, we have log p(x n, z n ) = n k γ nk log ω k + n k { } γ nk log N(x n µ k, Σ k ) n The term inside the braces depends on k-th component s parameters. It is now easy to show that (left as an exercise) the MLE is: n ω k = γ nk 1 k n γ, µ k = nk n γ γ nk x n nk n 1 Σ k = n γ γ nk (x n µ k )(x n µ k ) T nk What s the intuition? n Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
55 Intuition Since γ nk is binary, the previous solution is nothing but ω k : fraction of total data points whose z n is k note that k n γ nk = N µ k : mean of all data points whose z n is k Σ k : covariance of all data points whose z n is k Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
56 Intuition Since γ nk is binary, the previous solution is nothing but ω k : fraction of total data points whose z n is k note that k n γ nk = N µ k : mean of all data points whose z n is k Σ k : covariance of all data points whose z n is k This intuition will help us develop an algorithm for estimating θ when we do not know z n (incomplete data) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
57 Parameter estimation for GMMs: incomplete data When z n is not given, we can guess it via the posterior probability p(z n = k x n ) = p(x n z n = k)p(z n = k) p(x n ) = p(x n z n = k)p(z n = k) K k =1 p(x n z n = k )p(z n = k ) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
58 Parameter estimation for GMMs: incomplete data When z n is not given, we can guess it via the posterior probability p(z n = k x n ) = p(x n z n = k)p(z n = k) p(x n ) = p(x n z n = k)p(z n = k) K k =1 p(x n z n = k )p(z n = k ) To compute the posterior probability, we need to know the parameters θ! Let s pretend we know the value of the parameters so we can compute the posterior probability. How is that going to help us? Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
59 Estimation with soft γ nk We define γ nk = p(z n = k x n ) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
60 Estimation with soft γ nk We define γ nk = p(z n = k x n ) Recall that γ nk was previously binary Now it s a soft assignment of x n to k-th component Each x n is assigned to a component fractionally according to p(z n = k x n ) Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
61 Estimation with soft γ nk We define γ nk = p(z n = k x n ) Recall that γ nk was previously binary Now it s a soft assignment of x n to k-th component Each x n is assigned to a component fractionally according to p(z n = k x n ) We now get the same expression for the MLE as before! n ω k = γ nk 1 k n γ, µ k = nk n γ γ nk x n nk n 1 Σ k = n γ γ nk (x n µ k )(x n µ k ) T nk But remember, we re cheating by using θ to compute γ nk! n Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
62 Iterative procedure Alternate between estimating γ nk and computing parameters Step : initialize θ with some values (random or otherwise) Step 1: compute γ nk using the current θ Step 2: update θ using the just computed γ nk Step 3: go back to Step 1 Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
63 Iterative procedure Alternate between estimating γ nk and computing parameters Step : initialize θ with some values (random or otherwise) Step 1: compute γ nk using the current θ Step 2: update θ using the just computed γ nk Step 3: go back to Step 1 This is an example of the EM algorithm a powerful procedure for model estimation with hidden/latent variables Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
64 Iterative procedure Alternate between estimating γ nk and computing parameters Step : initialize θ with some values (random or otherwise) Step 1: compute γ nk using the current θ Step 2: update θ using the just computed γ nk Step 3: go back to Step 1 This is an example of the EM algorithm a powerful procedure for model estimation with hidden/latent variables Connection with K-means? Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
65 Iterative procedure Alternate between estimating γ nk and computing parameters Step : initialize θ with some values (random or otherwise) Step 1: compute γ nk using the current θ Step 2: update θ using the just computed γ nk Step 3: go back to Step 1 This is an example of the EM algorithm a powerful procedure for model estimation with hidden/latent variables Connection with K-means? GMMs provide probabilistic interpretation for K-means K-means is hard GMM or GMMs is soft K-means Posterior γ nk provides a probabilistic assignment for x n to cluster k Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, / 26
CSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization
More informationSupport Vector Machines, Kernel SVM
Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM
More informationComputer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization
Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions
More informationClustering and Gaussian Mixture Models
Clustering and Gaussian Mixture Models Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 25, 2016 Probabilistic Machine Learning (CS772A) Clustering and Gaussian Mixture Models 1 Recap
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationTopics we covered. Machine Learning. Statistics. Optimization. Systems! Basics of probability Tail bounds Density Estimation Exponential Families
Midterm Review Topics we covered Machine Learning Optimization Basics of optimization Convexity Unconstrained: GD, SGD Constrained: Lagrange, KKT Duality Linear Methods Perceptrons Support Vector Machines
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationExpectation maximization
Expectation maximization Subhransu Maji CMSCI 689: Machine Learning 14 April 2015 Motivation Suppose you are building a naive Bayes spam classifier. After your are done your boss tells you that there is
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University October 29, 2016 David Rosenberg (New York University) DS-GA 1003 October 29, 2016 1 / 42 K-Means Clustering K-Means Clustering David
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 203 http://ce.sharif.edu/courses/9-92/2/ce725-/ Agenda Expectation-maximization
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: (Finish) Expectation Maximization Principal Component Analysis (PCA) Readings: Barber 15.1-15.4 Dhruv Batra Virginia Tech Administrativia Poster Presentation:
More informationLecture 14. Clustering, K-means, and EM
Lecture 14. Clustering, K-means, and EM Prof. Alan Yuille Spring 2014 Outline 1. Clustering 2. K-means 3. EM 1 Clustering Task: Given a set of unlabeled data D = {x 1,..., x n }, we do the following: 1.
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationLogistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48
Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression
More informationExpectation Maximization
Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationLecture 11: Unsupervised Machine Learning
CSE517A Machine Learning Spring 2018 Lecture 11: Unsupervised Machine Learning Instructor: Marion Neumann Scribe: Jingyu Xin Reading: fcml Ch6 (Intro), 6.2 (k-means), 6.3 (Mixture Models); [optional]:
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 6 Jan-Willem van de Meent (credit: Yijun Zhao, Chris Bishop, Andrew Moore, Hastie et al.) Project Project Deadlines 3 Feb: Form teams of
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationIntroduction to Machine Learning HW6
CS 189 Spring 2018 Introduction to Machine Learning HW6 Your self-grade URL is http://eecs189.org/self_grade?question_ids=1_1,1_ 2,2_1,2_2,3_1,3_2,3_3,4_1,4_2,4_3,4_4,4_5,4_6,5_1,5_2,6. This homework is
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationMachine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari
Machine Learning Basics: Stochastic Gradient Descent Sargur N. srihari@cedar.buffalo.edu 1 Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationGaussian Mixture Models
Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro
More informationLearning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I
Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationProbabilistic Machine Learning. Industrial AI Lab.
Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationReview and Motivation
Review and Motivation We can model and visualize multimodal datasets by using multiple unimodal (Gaussian-like) clusters. K-means gives us a way of partitioning points into N clusters. Once we know which
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationBut if z is conditioned on, we need to model it:
Partially Unobserved Variables Lecture 8: Unsupervised Learning & EM Algorithm Sam Roweis October 28, 2003 Certain variables q in our models may be unobserved, either at training time or at test time or
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationExpectation Maximization
Expectation Maximization Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Christopher M. Bishop Learning in DAGs Two things could
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationLeast Mean Squares Regression
Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method
More informationLearning From Data Lecture 25 The Kernel Trick
Learning From Data Lecture 25 The Kernel Trick Learning with only inner products The Kernel M. Magdon-Ismail CSCI 400/600 recap: Large Margin is Better Controling Overfitting Non-Separable Data 0.08 random
More information