# Clustering. Professor Ameet Talwalkar. CS260 Machine Learning Algorithms, March 8

## Transcription

1 Clustering. Professor Ameet Talwalkar, CS260 Machine Learning Algorithms, March 8

2 Outline

1. Administration
2. Review of last lecture
3. Clustering

3 Schedule and Final

Upcoming Schedule

- Today: Last day of class
- Next Monday (3/13): No class. I will hold office hours in my office (BH 4531F)
- Next Wednesday (3/15): Final Exam, HW6 due

Final Exam

- Cumulative but with more emphasis on new material
- 8 short questions (recall that midterm had 6 short and 3 long questions)
- Should be much shorter than midterm, but of equal difficulty
- Focus on major concepts (e.g., MLE, primal and dual formulations of SVM, gradient descent, etc.)

5 Neural Networks

Basic Idea

- Learning nonlinear basis functions and classifiers
- Hidden layers are nonlinear mappings from input features to a new representation
- Output layers use the new representations for classification and regression

Learning parameters

- Backpropagation = efficient algorithm for (stochastic) gradient descent
- Can write down explicit updates via chain rule of calculus

6 Single Node

With bias unit $x_0 = 1$, input vector $x$, and weight vector $\theta$:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Sigmoid (logistic) activation function:

$$g(z) = \frac{1}{1 + e^{-z}}$$

Based on slide by Andrew Ng

7 Neural Network

[Figure: network with bias units, input $x$, hidden activations $a^{(2)}$, and output $h_\Theta(x)$; Layer 1 (Input Layer), Layer 2 (Hidden Layer), Layer 3 (Output Layer). Slide by Andrew Ng.]

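8 Vectorization

$$a^{(2)}_1 = g\big(z^{(2)}_1\big) = g\big(\Theta^{(1)}_{10} x_0 + \Theta^{(1)}_{11} x_1 + \Theta^{(1)}_{12} x_2 + \Theta^{(1)}_{13} x_3\big)$$

$$a^{(2)}_2 = g\big(z^{(2)}_2\big) = g\big(\Theta^{(1)}_{20} x_0 + \Theta^{(1)}_{21} x_1 + \Theta^{(1)}_{22} x_2 + \Theta^{(1)}_{23} x_3\big)$$

$$a^{(2)}_3 = g\big(z^{(2)}_3\big) = g\big(\Theta^{(1)}_{30} x_0 + \Theta^{(1)}_{31} x_1 + \Theta^{(1)}_{32} x_2 + \Theta^{(1)}_{33} x_3\big)$$

$$h_\Theta(x) = g\big(\Theta^{(2)}_{10} a^{(2)}_0 + \Theta^{(2)}_{11} a^{(2)}_1 + \Theta^{(2)}_{12} a^{(2)}_2 + \Theta^{(2)}_{13} a^{(2)}_3\big) = g\big(z^{(3)}_1\big)$$

In vector form:

- $z^{(2)} = \Theta^{(1)} x$
- $a^{(2)} = g(z^{(2)})$
- Add $a^{(2)}_0 = 1$
- $z^{(3)} = \Theta^{(2)} a^{(2)}$
- $h_\Theta(x) = a^{(3)} = g(z^{(3)})$

Based on slide by Andrew Ng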

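10 Optimizing the Neural Network

$$J(\Theta) = -\frac{1}{n}\left[\sum_{i=1}^{n}\sum_{k=1}^{K} y_{ik}\log\big(h_\Theta(x_i)\big)_k + (1 - y_{ik})\log\Big(1 - \big(h_\Theta(x_i)\big)_k\Big)\right] + \frac{\lambda}{2n}\sum_{l=1}^{L-1}\sum_{i=1}^{s_{l-1}}\sum_{j=1}^{s_l}\big(\Theta^{(l)}_{ji}\big)^2$$

Unlike before, $J(\Theta)$ is not convex.

Solve via gradient-based optimization, using

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = a^{(l)}_j \, \delta^{(l+1)}_i \quad \text{(ignoring } \lambda \text{; if } \lambda = 0\text{)}$$

where $a^{(l)}_j$ comes from forward propagation and $\delta^{(l+1)}_i$ from backpropagation.

Based on slide by Andrew Ng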

11 Forward Propagation

Given one labeled training instance $(x, y)$:

- $a^{(1)} = x$
- $z^{(2)} = \Theta^{(1)} a^{(1)}$
- $a^{(2)} = g(z^{(2)})$ [add $a^{(2)}_0$]
- $z^{(3)} = \Theta^{(2)} a^{(2)}$
- $a^{(3)} = g(z^{(3)})$ [add $a^{(3)}_0$]
- $z^{(4)} = \Theta^{(3)} a^{(3)}$
- $a^{(4)} = h_\Theta(x) = g(z^{(4)})$

Based on slide by Andrew Ng
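The forward pass above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the tiny weight matrices are hypothetical, and each weight row is assumed to store the bias weight in column 0.

```python
import math

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, thetas):
    """Return the activations a^(1), ..., a^(L) for one input x.

    thetas[l] is a weight matrix given as a list of rows; column 0 of
    each row multiplies the bias unit a_0 = 1 (an assumed layout).
    """
    a = list(x)
    activations = [a]
    for theta in thetas:
        a_with_bias = [1.0] + a                      # add a_0 = 1
        z = [sum(w * v for w, v in zip(row, a_with_bias)) for row in theta]
        a = [sigmoid(zi) for zi in z]
        activations.append(a)
    return activations

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
theta1 = [[0.0, 1.0, -1.0],
          [0.5, -0.5, 0.5]]
theta2 = [[0.0, 1.0, 1.0]]
acts = forward([1.0, 1.0], [theta1, theta2])
h = acts[-1][0]   # network output h_Theta(x), a value in (0, 1)
```

The first hidden unit here receives $z = 0$, so its activation is exactly $g(0) = 0.5$, which makes a convenient sanity check.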

12 Backpropagation: Gradient Computation

Let $\delta^{(l)}_j$ = "error" of node $j$ in layer $l$ (#layers $L = 4$). With $.*$ denoting the element-wise product:

- $\delta^{(4)} = a^{(4)} - y$
- $\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} \;.*\; g'(z^{(3)})$, where $g'(z^{(3)}) = a^{(3)} \;.*\; (1 - a^{(3)})$
- $\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \;.*\; g'(z^{(2)})$, where $g'(z^{(2)}) = a^{(2)} \;.*\; (1 - a^{(2)})$
- (No $\delta^{(1)}$)

Based on slide by Andrew Ng
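One way to convince yourself that the $\delta$ recursion is right is to compare it with a finite-difference estimate of the gradient. The sketch below assumes a sigmoid output with cross-entropy loss and $\lambda = 0$, stores bias weights in column 0, and uses hypothetical weights and helper names throughout.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, thetas):
    """Activations per layer; every layer except the output gets a_0 = 1 prepended."""
    acts = []
    a = list(x)
    for theta in thetas:
        a = [1.0] + a
        acts.append(a)
        a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in theta]
    acts.append(a)            # output layer has no bias unit
    return acts

def cost(x, y, thetas):
    """Cross-entropy loss for one example, no regularization (lambda = 0)."""
    h = forward(x, thetas)[-1]
    return -sum(yk * math.log(hk) + (1 - yk) * math.log(1 - hk)
                for yk, hk in zip(y, h))

def backprop(x, y, thetas):
    """Gradients dJ/dTheta^(l)_ij = a^(l)_j * delta^(l+1)_i for one example."""
    acts = forward(x, thetas)
    delta = [hk - yk for hk, yk in zip(acts[-1], y)]   # delta^(L) = a^(L) - y
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        a = acts[l]                                    # includes the bias unit
        grads[l] = [[d * aj for aj in a] for d in delta]
        if l > 0:
            # delta^(l) = (Theta^(l))^T delta^(l+1) .* a .* (1 - a), bias entry dropped
            delta = [sum(thetas[l][i][j] * delta[i] for i in range(len(delta)))
                     * a[j] * (1 - a[j]) for j in range(1, len(a))]
    return grads

# Hypothetical 2-2-1 network and a single training instance.
theta1 = [[0.1, 0.4, -0.3], [-0.2, 0.2, 0.5]]
theta2 = [[0.3, -0.6, 0.7]]
x, y = [1.0, 2.0], [1.0]
grads = backprop(x, y, [theta1, theta2])

# Central finite difference on one weight as an independent check.
eps = 1e-6
theta1[0][1] += eps
j_plus = cost(x, y, [theta1, theta2])
theta1[0][1] -= 2 * eps
j_minus = cost(x, y, [theta1, theta2])
theta1[0][1] += eps
numeric = (j_plus - j_minus) / (2 * eps)
```

If the recursion is implemented correctly, `numeric` and `grads[0][0][1]` agree to several decimal places.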

13 Training a Neural Network via Gradient Descent with Backprop

Given: training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$

Initialize all $\Theta^{(l)}$ randomly (NOT to 0!)

Loop (each iteration is called an epoch):

1. Set $\Delta^{(l)}_{ij} = 0 \;\; \forall l, i, j$ (used to accumulate gradient)
2. For each training instance $(x_i, y_i)$:
   - Set $a^{(1)} = x_i$
   - Compute $\{a^{(2)}, \ldots, a^{(L)}\}$ via forward propagation
   - Compute $\delta^{(L)} = a^{(L)} - y_i$
   - Compute errors $\{\delta^{(L-1)}, \ldots, \delta^{(2)}\}$ (backpropagation)
   - Compute gradients $\Delta^{(l)}_{ij} = \Delta^{(l)}_{ij} + a^{(l)}_j \delta^{(l+1)}_i$
3. Compute avg regularized gradient $D^{(l)}_{ij} = \begin{cases} \frac{1}{n}\Delta^{(l)}_{ij} + \lambda \Theta^{(l)}_{ij} & \text{if } j \neq 0 \\ \frac{1}{n}\Delta^{(l)}_{ij} & \text{otherwise} \end{cases}$
4. Update weights via gradient step $\Theta^{(l)}_{ij} = \Theta^{(l)}_{ij} - \alpha D^{(l)}_{ij}$

Until weights converge or max #epochs is reached

Based on slide by Andrew Ng

14 Summary of the course so far

Supervised learning has been our focus

- Setup: given a training dataset $\{x_n, y_n\}_{n=1}^N$, we learn a function $h(x)$ to predict $x$'s true value $y$ (i.e., regression or classification)
- Linear vs. nonlinear features
  1. Linear: $h(x)$ depends on $w^T x$
  2. Nonlinear: $h(x)$ depends on $w^T \phi(x)$, where $\phi$ is either explicit or depends on a kernel function $k(x_m, x_n) = \phi(x_m)^T \phi(x_n)$
- Loss function
  1. Squared loss: least squares for regression (minimizing residual sum of squares)
  2. Logistic loss: logistic regression
  3. Exponential loss: AdaBoost
  4. Margin-based loss: support vector machines
- Principles of estimation
  1. Point estimate: maximum likelihood, regularized likelihood

15 Summary of the course so far (cont'd)

- Optimization
  1. Methods: gradient descent, Newton's method
  2. Convex optimization: global optimum vs. local optimum
  3. Lagrange duality: primal and dual formulation
- Learning theory
  1. Difference between training error and generalization error
  2. Overfitting, bias and variance tradeoff
  3. Regularization: various regularized models

17 Supervised versus Unsupervised Learning

Supervised Learning from labeled observations

- Labels teach the algorithm to learn a mapping from observations to labels
- Classification, Regression

Unsupervised Learning from unlabeled observations

- Learning algorithm must find latent structure from features alone
- Can be a goal in itself (discover hidden patterns, exploratory analysis)
- Can be a means to an end (preprocessing for a supervised task)
- Clustering (Today)

18 Outline

1. Administration
2. Review of last lecture
3. Clustering
   - K-means
   - Gaussian mixture models

21 Clustering

Setup: given $D = \{x_n\}_{n=1}^N$ and $K$, we want to output:

- $\{\mu_k\}_{k=1}^K$: prototypes of clusters
- $A(x_n) \in \{1, 2, \ldots, K\}$: the cluster membership

Toy Example: cluster data into two clusters.

[Figure: toy data, panels (a) and (i).]

Applications

- Identify communities within social networks
- Find topics in news stories
- Group similar sequences into gene families

26 K-means example

[Figure: successive K-means iterations on the toy data, panels (a) through (i).]

28 K-means clustering

Intuition: data points assigned to cluster $k$ should be near prototype $\mu_k$.

Distortion measure (clustering objective function, cost function):

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert_2^2$$

where $r_{nk} \in \{0, 1\}$ is an indicator variable: $r_{nk} = 1$ if and only if $A(x_n) = k$.
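As a small illustration of the distortion measure, the sketch below evaluates $J$ for two different assignments of the same four points. The data, the prototypes, and the function name are made up, and cluster indices are 0-based here rather than $1, \ldots, K$.

```python
def distortion(points, prototypes, assignment):
    """J = sum_n ||x_n - mu_{A(x_n)}||^2 for hard assignments A(x_n)."""
    total = 0.0
    for x, k in zip(points, assignment):
        total += sum((xi - mi) ** 2 for xi, mi in zip(x, prototypes[k]))
    return total

# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
prototypes = [(0.0, 1.0), (10.0, 1.0)]

good = distortion(points, prototypes, [0, 0, 1, 1])   # each point with its nearby prototype
bad = distortion(points, prototypes, [0, 1, 0, 1])    # two points sent to the far prototype
```

Here `good` is 4.0 and `bad` is 204.0, matching the intuition that points should sit near their prototype.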

31 Algorithm

Minimize distortion by alternating optimization between $\{r_{nk}\}$ and $\{\mu_k\}$:

- Step 0: Initialize $\{\mu_k\}$ to some values
- Step 1: Fix $\{\mu_k\}$ and minimize over $\{r_{nk}\}$, to get this assignment:

  $$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \lVert x_n - \mu_j \rVert_2^2 \\ 0 & \text{otherwise} \end{cases}$$

- Step 2: Fix $\{r_{nk}\}$ and minimize over $\{\mu_k\}$ to get this update:

  $$\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}$$

- Step 3: Return to Step 1 unless stopping criterion is met
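The four steps can be sketched as follows. This is a minimal illustration, not a tuned implementation: the stopping criterion (assignments stop changing, or an iteration cap is hit), the handling of empty clusters, and all names are assumptions, and the initial prototypes are passed in by the caller.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, init_prototypes, max_iter=100):
    mus = [tuple(m) for m in init_prototypes]               # Step 0
    assign = None
    for _ in range(max_iter):
        # Step 1: fix the prototypes, assign each point to the nearest one.
        new_assign = [min(range(len(mus)), key=lambda k: sq_dist(x, mus[k]))
                      for x in points]
        if new_assign == assign:                            # Step 3: assumed stopping rule
            break
        assign = new_assign
        # Step 2: fix the assignments, move each prototype to the mean of its points.
        for k in range(len(mus)):
            members = [x for x, a in zip(points, assign) if a == k]
            if members:                                     # an empty cluster keeps its prototype
                mus[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return mus, assign

# Hypothetical toy data and a deliberately poor initialization.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
mus, assign = kmeans(points, [(0.0, 0.0), (1.0, 0.0)])
```

On this toy data the procedure settles after one update, with prototypes at (0, 1) and (10, 1).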

32 Remarks

- Prototype $\mu_k$ is the mean of points assigned to cluster $k$, hence K-means
- The procedure reduces (or at least does not increase) $J$ in both Step 1 and Step 2 and thus makes improvements on each iteration
- No guarantee we find the global solution
- Quality of local optimum depends on initial values at Step 0
- k-means++ is a principled approximation algorithm
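The slides only name k-means++, so the following is a hedged sketch of its usual seeding rule rather than anything taken from the lecture: the first prototype is drawn uniformly from the data, and each later one is drawn with probability proportional to its squared distance from the nearest prototype chosen so far. The function name, the seed argument, and the data are illustrative.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial prototypes by squared-distance-weighted sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]                  # first prototype: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen prototype.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
              for x in points]
        # Far-away points are more likely to become the next prototype.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = kmeans_pp_init(points, 2)
```

The returned prototypes would then serve as the Step 0 initialization. An already chosen point has weight zero, so with distinct data points the same point is not selected twice.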

35 Probabilistic interpretation of clustering?

How can we model $p(x)$ to reflect our intuition that points stay close to their cluster centers?

[Figure (b): scatter plot of the data.]

- Points seem to form 3 clusters
- We cannot model $p(x)$ with simple and known distributions
- E.g., the data is not Gaussian because we have 3 distinct concentrated regions

37 Gaussian mixture models: intuition

[Figure (a): scatter plot of the data, colored by region.]

- Model each region with a distinct distribution
- Can use Gaussians: Gaussian mixture models (GMMs)
- We don't know cluster assignments (labels), parameters of Gaussians, or mixture components!
- Must learn from unlabeled data $D = \{x_n\}_{n=1}^N$

40 Gaussian mixture models: formal definition

A GMM has the following density function for $x$:

$$p(x) = \sum_{k=1}^{K} \omega_k \, N(x \mid \mu_k, \Sigma_k)$$

- $K$: number of Gaussians; they are called mixture components
- $\mu_k$ and $\Sigma_k$: mean and covariance matrix of the $k$-th component
- $\omega_k$: mixture weights (or priors), which represent how much each component contributes to the final distribution. They satisfy 2 properties:

  $$\forall k, \; \omega_k > 0, \quad \text{and} \quad \sum_k \omega_k = 1$$

  These properties ensure $p(x)$ is in fact a probability density function.
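To make the definition concrete, the sketch below evaluates a mixture density and checks numerically that it integrates to about 1. To stay short it assumes one-dimensional components, so each $\Sigma_k$ reduces to a scalar variance; the parameter values and function names are made up.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, weights, mus, variances):
    """p(x) = sum_k w_k N(x | mu_k, var_k)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances))

# Hypothetical 3-component mixture; the weights are positive and sum to 1.
weights, mus, variances = [0.5, 0.3, 0.2], [-2.0, 0.0, 3.0], [0.5, 1.0, 0.8]

# Riemann sum over [-15, 15] as a rough check that p(x) integrates to 1.
step = 0.01
area = sum(gmm_pdf(-15 + i * step, weights, mus, variances) * step
           for i in range(3001))
```

If the weights did not sum to 1, `area` would drift away from 1 by the same factor, which is why the two properties of $\omega_k$ matter.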

43 GMM as the marginal distribution of a joint distribution

Consider the following joint distribution

$$p(x, z) = p(z) \, p(x \mid z)$$

where $z$ is a discrete random variable taking values between 1 and $K$. Denote

$$\omega_k = p(z = k)$$

Now, assume the conditional distributions are Gaussian distributions

$$p(x \mid z = k) = N(x \mid \mu_k, \Sigma_k)$$

Then, the marginal distribution of $x$ is

$$p(x) = \sum_{k=1}^{K} \omega_k \, N(x \mid \mu_k, \Sigma_k)$$

Namely, the Gaussian mixture model.
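The joint-distribution view suggests a simple way to draw data from a GMM: first sample $z$ from $p(z)$, then sample $x$ from $p(x \mid z)$. The sketch below does this for one-dimensional components (an assumption made to keep it short); the parameters and names are illustrative only.

```python
import random

def sample_gmm(n, weights, mus, sigmas, seed=0):
    """Draw n pairs (x, z) from p(x, z) = p(z) p(x | z) with 1-D Gaussian components."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.choices(range(len(weights)), weights=weights, k=1)[0]  # z ~ p(z)
        x = rng.gauss(mus[z], sigmas[z])                               # x ~ p(x | z)
        samples.append((x, z))
    return samples

# Hypothetical parameters (sigmas are standard deviations, not variances).
weights, mus, sigmas = [0.5, 0.3, 0.2], [-2.0, 0.0, 3.0], [0.5, 1.0, 0.8]
samples = sample_gmm(20000, weights, mus, sigmas)

frac0 = sum(1 for _, z in samples if z == 0) / len(samples)   # should be near w_0
mean_x = sum(x for x, _ in samples) / len(samples)            # should be near sum_k w_k mu_k
```

Discarding the $z$ values leaves samples from the marginal $p(x)$, which is exactly the incomplete-data situation that clustering faces.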

45 GMMs: example

[Figure (a): data colored by component. Figure (b): the same data without colors.]

The conditional distributions of $x$ given $z$ (representing color) are

$$p(x \mid z = \text{red}) = N(x \mid \mu_1, \Sigma_1)$$

$$p(x \mid z = \text{blue}) = N(x \mid \mu_2, \Sigma_2)$$

$$p(x \mid z = \text{green}) = N(x \mid \mu_3, \Sigma_3)$$

The marginal distribution is thus

$$p(x) = p(\text{red}) \, N(x \mid \mu_1, \Sigma_1) + p(\text{blue}) \, N(x \mid \mu_2, \Sigma_2) + p(\text{green}) \, N(x \mid \mu_3, \Sigma_3)$$

48 Parameter estimation for Gaussian mixture models

The parameters in GMMs are $\theta = \{\omega_k, \mu_k, \Sigma_k\}_{k=1}^K$.

Let's first consider the simple/unrealistic case where we have labels $z$.

- Define $D' = \{x_n, z_n\}_{n=1}^N$
- $D'$ is the complete data; $D$ is the incomplete data
- How can we learn our parameters?

Given $D'$, the maximum likelihood estimate of $\theta$ is given by

$$\theta = \arg\max_\theta \log p(D') = \arg\max_\theta \sum_n \log p(x_n, z_n)$$

52 Parameter estimation for GMMs: complete data

The complete likelihood is decomposable:

$$\sum_n \log p(x_n, z_n) = \sum_n \log p(z_n) \, p(x_n \mid z_n) = \sum_k \sum_{n : z_n = k} \log p(z_n) \, p(x_n \mid z_n)$$

where we have grouped data by its values $z_n$. Let $\gamma_{nk} \in \{0, 1\}$ be a binary variable that indicates whether $z_n = k$:

$$\sum_n \log p(x_n, z_n) = \sum_k \sum_n \gamma_{nk} \log p(z = k) \, p(x_n \mid z = k) = \sum_k \sum_n \gamma_{nk} \left[ \log \omega_k + \log N(x_n \mid \mu_k, \Sigma_k) \right]$$

54 Parameter estimation for GMMs: complete data

From our previous discussion, we have

$$\sum_n \log p(x_n, z_n) = \sum_k \sum_n \gamma_{nk} \left[ \log \omega_k + \log N(x_n \mid \mu_k, \Sigma_k) \right]$$

Regrouping, we have

$$\sum_n \log p(x_n, z_n) = \sum_k \sum_n \gamma_{nk} \log \omega_k + \sum_k \left\{ \sum_n \gamma_{nk} \log N(x_n \mid \mu_k, \Sigma_k) \right\}$$

The term inside the braces depends only on the $k$-th component's parameters. It is now easy to show (left as an exercise) that the MLE is:

$$\omega_k = \frac{\sum_n \gamma_{nk}}{\sum_k \sum_n \gamma_{nk}}, \qquad \mu_k = \frac{1}{\sum_n \gamma_{nk}} \sum_n \gamma_{nk} x_n$$

$$\Sigma_k = \frac{1}{\sum_n \gamma_{nk}} \sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^T$$

What's the intuition?
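With labels available, these estimates amount to per-group counting and averaging. The sketch below illustrates this for one-dimensional data (so $\Sigma_k$ becomes a scalar variance) with 0-based labels; it assumes every component has at least one labeled point, and the data and function name are made up.

```python
def gmm_mle_complete(xs, zs, K):
    """MLE of (weights, means, variances) from complete data (x_n, z_n), 1-D case."""
    N = len(xs)
    weights, mus, variances = [], [], []
    for k in range(K):
        members = [x for x, z in zip(xs, zs) if z == k]
        n_k = len(members)                                  # sum_n gamma_nk
        mu = sum(members) / n_k                             # mean of points with z_n = k
        var = sum((x - mu) ** 2 for x in members) / n_k     # their (biased) variance
        weights.append(n_k / N)                             # fraction of points with z_n = k
        mus.append(mu)
        variances.append(var)
    return weights, mus, variances

# Hypothetical labeled data: two points in component 0, three in component 1.
xs = [1.0, 3.0, 10.0, 12.0, 14.0]
zs = [0, 0, 1, 1, 1]
weights, mus, variances = gmm_mle_complete(xs, zs, K=2)
```

For this data the weights come out as 2/5 and 3/5, the means as 2 and 12, and the variances as 1 and 8/3.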

56 Intuition

Since $\gamma_{nk}$ is binary, the previous solution is nothing but

- $\omega_k$: fraction of total data points whose $z_n$ is $k$ (note that $\sum_k \sum_n \gamma_{nk} = N$)
- $\mu_k$: mean of all data points whose $z_n$ is $k$
- $\Sigma_k$: covariance of all data points whose $z_n$ is $k$

This intuition will help us develop an algorithm for estimating $\theta$ when we do not know $z_n$ (incomplete data).

58 Parameter estimation for GMMs: incomplete data

When $z_n$ is not given, we can guess it via the posterior probability

$$p(z_n = k \mid x_n) = \frac{p(x_n \mid z_n = k) \, p(z_n = k)}{p(x_n)} = \frac{p(x_n \mid z_n = k) \, p(z_n = k)}{\sum_{k'=1}^{K} p(x_n \mid z_n = k') \, p(z_n = k')}$$

To compute the posterior probability, we need to know the parameters $\theta$!

Let's pretend we know the value of the parameters so we can compute the posterior probability. How is that going to help us?
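The posterior is just Bayes' rule applied component by component. The sketch below computes it for a single point under assumed, known parameters; it uses one-dimensional components for brevity, and the numbers and names are illustrative.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x, weights, mus, variances):
    """p(z = k | x) for every k, given the parameters theta."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
    evidence = sum(joint)                     # p(x) = sum_k' p(x | z = k') p(z = k')
    return [j / evidence for j in joint]

# Hypothetical symmetric mixture: two unit-variance components at -1 and +1.
weights, mus, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
at_middle = posterior(0.0, weights, mus, variances)   # equidistant from both means
at_right = posterior(1.0, weights, mus, variances)    # sitting on the second mean
```

A point halfway between the two means gets probability 0.5 for each component, while a point on the second mean is assigned mostly, but not entirely, to that component.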

61 Estimation with soft $\gamma_{nk}$

We define $\gamma_{nk} = p(z_n = k \mid x_n)$.

- Recall that $\gamma_{nk}$ was previously binary
- Now it's a soft assignment of $x_n$ to the $k$-th component
- Each $x_n$ is assigned to a component fractionally according to $p(z_n = k \mid x_n)$

We now get the same expression for the MLE as before!

$$\omega_k = \frac{\sum_n \gamma_{nk}}{\sum_k \sum_n \gamma_{nk}}, \qquad \mu_k = \frac{1}{\sum_n \gamma_{nk}} \sum_n \gamma_{nk} x_n$$

$$\Sigma_k = \frac{1}{\sum_n \gamma_{nk}} \sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^T$$

But remember, we're cheating by using $\theta$ to compute $\gamma_{nk}$!

65 Iterative procedure

Alternate between estimating $\gamma_{nk}$ and computing parameters:

- Step 0: initialize $\theta$ with some values (random or otherwise)
- Step 1: compute $\gamma_{nk}$ using the current $\theta$
- Step 2: update $\theta$ using the just computed $\gamma_{nk}$
- Step 3: go back to Step 1

This is an example of the EM algorithm, a powerful procedure for model estimation with hidden/latent variables.

Connection with K-means?

- GMMs provide a probabilistic interpretation for K-means
- K-means is "hard" GMM, or GMM is "soft" K-means
- Posterior $\gamma_{nk}$ provides a probabilistic assignment for $x_n$ to cluster $k$
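Putting Steps 0 to 3 together gives a compact loop. The sketch below is a simplified illustration under several assumptions: one-dimensional data, a fixed number of iterations instead of a convergence test, and a small variance floor to avoid degenerate components. None of these choices, nor the names, come from the slides.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, weights, mus, variances, n_iter=50, var_floor=1e-6):
    # Step 0: the caller supplies the initial theta; copy it so the inputs stay untouched.
    weights, mus, variances = list(weights), list(mus), list(variances)
    K, N = len(weights), len(xs)
    for _ in range(n_iter):                                  # Step 3: repeat
        # Step 1: gamma_nk = p(z_n = k | x_n) under the current theta.
        gammas = []
        for x in xs:
            joint = [weights[k] * normal_pdf(x, mus[k], variances[k]) for k in range(K)]
            total = sum(joint)
            gammas.append([j / total for j in joint])
        # Step 2: update theta using the soft counts.
        for k in range(K):
            n_k = sum(g[k] for g in gammas)
            weights[k] = n_k / N
            mus[k] = sum(g[k] * x for g, x in zip(gammas, xs)) / n_k
            var = sum(g[k] * (x - mus[k]) ** 2 for g, x in zip(gammas, xs)) / n_k
            variances[k] = max(var_floor, var)               # assumed safeguard
    return weights, mus, variances

# Hypothetical data: five points near -2 and five near 3.
xs = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.1, 2.9, 3.2, 2.8]
weights, mus, variances = em_gmm_1d(xs, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

On well-separated data like this, the responsibilities quickly become nearly 0 or 1, so the updates behave almost exactly like the K-means mean update, which is the "hard" versus "soft" connection noted above.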

### Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

### CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

### Neural Networks and Deep Learning

Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

### Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

### Statistical Pattern Recognition

Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization

### Support Vector Machines, Kernel SVM

Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM

### Clustering and Gaussian Mixture Models

Clustering and Gaussian Mixture Models Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 25, 2016 Probabilistic Machine Learning (CS772A) Clustering and Gaussian Mixture Models 1 Recap

### Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization

Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions

### Linear Regression (continued)

Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression

### Clustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning

Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades

### The Expectation-Maximization Algorithm

1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

### Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

### Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

### CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

### Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

### Latent Variable Models

Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:

### Topics we covered. Machine Learning. Statistics. Optimization. Systems! Basics of probability Tail bounds Density Estimation Exponential Families

Midterm Review Topics we covered Machine Learning Optimization Basics of optimization Convexity Unconstrained: GD, SGD Constrained: Lagrange, KKT Duality Linear Methods Perceptrons Support Vector Machines

### Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures

### Machine Learning Lecture 5

Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

### Machine Learning Lecture 5

Machine Learning Lecture 5 Linear Discriminant Functions 25.10.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

### Introduction to Machine Learning

Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s

### Mixtures of Gaussians

Mixtures of Gaussians Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 (One) bad case for k-means Clusters may overlap Some clusters may be wider than others 2 1 Density

### Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:

### Mixtures of Gaussians. Sargur Srihari

Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm

### Introduction to Logistic Regression and Support Vector Machine

Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel

### K-Means and Gaussian Mixture Models

K-Means and Gaussian Mixture Models David Rosenberg New York University October 29, 2016 David Rosenberg (New York University) DS-GA 1003 October 29, 2016 1 / 42 K-Means Clustering K-Means Clustering David

### ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning Topics: (Finish) Expectation Maximization Principal Component Analysis (PCA) Readings: Barber 15.1-15.4 Dhruv Batra Virginia Tech Administrativia Poster Presentation:

### Lecture 14. Clustering, K-means, and EM

Lecture 14. Clustering, K-means, and EM Prof. Alan Yuille Spring 2014 Outline 1. Clustering 2. K-means 3. EM 1 Clustering Task: Given a set of unlabeled data D = {x 1,..., x n }, we do the following: 1.

### Statistical Pattern Recognition

Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 203 http://ce.sharif.edu/courses/9-92/2/ce725-/ Agenda Expectation-maximization

### Expectation maximization

Expectation maximization Subhransu Maji CMSCI 689: Machine Learning 14 April 2015 Motivation Suppose you are building a naive Bayes spam classifier. After your are done your boss tells you that there is

### 6.036 midterm review. Wednesday, March 18, 15

6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that

### Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

### Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

### Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

### Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

### Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 11-13, 2018 Prof. Michael Paul Unsupervised Naïve Bayes Last week

### COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017

COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING

PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

### Pattern Recognition and Machine Learning

Christopher M. Bishop Pattern Recognition and Machine Learning Springer Contents: Preface; Mathematical notation; Contents; 1 Introduction; 1.1 Example: Polynomial Curve Fitting; 1.2 Probability

### Nonparametric Bayesian Methods (Gaussian Processes)

[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

### STATS 306B: Unsupervised Learning Spring Lecture 2 April 2

STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised

### Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

### Introduction to Machine Learning Midterm, Tues April 8

Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don't spend

### Linear & nonlinear classifiers

Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

### Lecture 11: Unsupervised Machine Learning

CSE517A Machine Learning Spring 2018 Lecture 11: Unsupervised Machine Learning Instructor: Marion Neumann Scribe: Jingyu Xin Reading: fcml Ch6 (Intro), 6.2 (k-means), 6.3 (Mixture Models); [optional]:

### Logistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48

Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression

### Gaussian Mixture Models

Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

### Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

### Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

### Gaussian Mixture Models, Expectation Maximization

Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

### Expectation Maximization

Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane © Ghane/Mori [Figure: plots, axis ticks omitted]

### FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

### Introduction to Machine Learning Midterm Exam

10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

### Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

### Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today's Topics Maximum Likelihood Estimation Bayesian Density Estimation Today's Topics Maximum Likelihood

### Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari

Machine Learning Basics: Stochastic Gradient Descent Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets

### Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

### Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo. Markov Chain Monte Carlo: In high-dimensional spaces, rejection sampling and importance sampling are very inefficient. An alternative is Markov Chain

### Data Mining Techniques

Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 6 Jan-Willem van de Meent (credit: Yijun Zhao, Chris Bishop, Andrew Moore, Hastie et al.) Project Project Deadlines 3 Feb: Form teams of

### Linear & nonlinear classifiers

Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

### Linear Models for Classification

Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

### CMU-Q Lecture 24:

CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

### Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

### Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we knew: P(y) Prior, P(x | y) Likelihood, x features, y ∈ {ω_1, ..., ω_K

### Latent Variable Models and Expectation Maximization

Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 [Figure: plots, axis ticks omitted]

### Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

### Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2

### Introduction to Machine Learning HW6

CS 189 Spring 2018 Introduction to Machine Learning HW6 Your self-grade URL is http://eecs189.org/self_grade?question_ids=1_1,1_2,2_1,2_2,3_1,3_2,3_3,4_1,4_2,4_3,4_4,4_5,4_6,5_1,5_2,6. This homework is

### STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

### CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#\$

### Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and ŷ be the predicted

### Lecture 3: Pattern Classification

EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 The problem of classification 2 Linear and nonlinear classifiers 3 Probabilistic classification 4 Gaussians, mixtures

### Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation

Clustering by Mixture Models General background on clustering Example method: k-means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide

### Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

### Linear Dynamical Systems

Linear Dynamical Systems Sargur N. Srihari srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

### Midterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.

CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators

### Lecture 6. Regression

Lecture 6. Regression Prof. Alan Yuille Spring 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron

### Introduction to Machine Learning Midterm Exam Solutions

10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

### Association studies and regression

Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

### Machine Learning, Midterm Exam

10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

### Least Mean Squares Regression

Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method

### Support Vector Machines

Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification: $P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$. But assume $p(x \mid y = 1)$ is fully factorized

### Support Vector Machines and Kernel Methods

2018 CS420 Machine Learning, Lecture 3 Handout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

### Gaussian Mixture Models

Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro

### Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

### Expectation Maximization

Expectation Maximization Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Christopher M. Bishop Learning in DAGs Two things could

### Logistic Regression. Machine Learning Fall 2018

Logistic Regression Machine Learning Fall 2018 Where are we? We have seen the following ideas: Linear models, Learning as loss minimization, Bayesian learning criteria (MAP and MLE estimation), The Naïve Bayes

### Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015

Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative

### Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan

Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning: Given $\{(x_i, y_i)\}_{i=1}^{N}$, learn a function $f: X \to Y$. Categorical output: classification

### Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

### Neural Network Training

Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

### Discriminative Models

No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models

### Midterm: CS 6375 Spring 2018

Midterm: CS 6375 Spring 2018 The exam is closed book (1 cheat sheet allowed). Answer the questions in the spaces provided on the question sheets. If you run out of room for an answer, use an additional

### Computational statistics

Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial

### Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines