Module Master Recherche Apprentissage et Fouille
1 Module Master Recherche Apprentissage et Fouille. Michele Sebag, Balazs Kegl, Antoine Cornuéjols. 19 November 2008
2 Unsupervised Learning: Clustering; Data Streaming; Application: clustering of EGEE jobs, intrusion detection
3 Clustering: K-Means, Expectation Maximization, Selecting the number of clusters, Affinity Propagation, Scalability
4 Clustering, example (figure; source: elias.pampalk/music/)
5 Clustering, Questions. Hard or soft? Hard: find a partition of the data. Soft: estimate the distribution of the data as a mixture of components. Parametric or non-parametric? Parametric: the number K of clusters is known. Non-parametric: find K (wrapping a parametric clustering algorithm). Caveats: complexity, outliers, validation.
6 Formal Background. Notations: $E = \{x_1, \ldots, x_N\}$ the dataset; $N$ the number of data points; $K$ the number of clusters (given or optimized); $C_k$ the k-th cluster; hard clustering: $\tau(i)$ the index of the cluster containing $x_i$; $f_k$ the k-th model; soft clustering: $\gamma_k(i) = \Pr(x_i \sim f_k)$. Solution: hard clustering, a partition $(C_1, \ldots, C_K)$; soft clustering, for all $i$: $\sum_k \gamma_k(i) = 1$.
7 Formal Background, 2. Quality / cost function: measures how well the clusters characterize the data. Soft clustering: the (log-)likelihood. Hard clustering: the dispersion $\sum_{k=1}^{K} \frac{1}{2|C_k|} \sum_{x_i, x_j \in C_k} d(x_i, x_j)^2$. Tradeoff: quality increases with K, so regularization is needed to avoid one cluster per data point.
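For concreteness, a minimal NumPy sketch (not on the slides) of this dispersion with squared Euclidean distances; `X` is the N x d data matrix and `labels[i]` plays the role of $\tau(i)$:

```python
import numpy as np

def dispersion(X, labels):
    """Hard-clustering cost: for each cluster C_k, the sum of pairwise
    squared Euclidean distances within C_k, normalized by 2|C_k|."""
    total = 0.0
    for k in np.unique(labels):
        Ck = X[labels == k]                      # points of cluster C_k
        diffs = Ck[:, None, :] - Ck[None, :, :]  # all pairwise differences
        total += (diffs ** 2).sum() / (2 * len(Ck))
    return total
```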
8 Clustering vs Classification (Marina Meila)

              Classification              Clustering
K             # classes (given)           # clusters (unknown)
Quality       generalization error        many cost functions
Focus on      test set                    training set
Goal          prediction (discriminant)   interpretation, analysis (exploratory)
Field         mature                      new
9 Non-Parametric Clustering: Hierarchical Clustering. Principle: agglomerative (join the nearest clusters) or divisive (split the most dispersed cluster). CONS: complexity $O(N^3)$.
10 Hierarchical Clustering, example
11 Influence of the distance/similarity. Euclidean distance: $d(x, x') = \sqrt{\sum_i (x_i - x'_i)^2}$. Cosine angle: $1 - \frac{\sum_i x_i x'_i}{\|x\| \cdot \|x'\|}$. Pearson: $\frac{\sum_i (x_i - \bar{x})(x'_i - \bar{x}')}{\|x - \bar{x}\| \cdot \|x' - \bar{x}'\|}$.
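In NumPy the three measures might read as follows (a sketch; the function names are mine):

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    return np.sqrt(((x - y) ** 2).sum())

def cosine_dissimilarity(x, y):
    """One minus the cosine of the angle between x and y."""
    return 1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def pearson(x, y):
    """Pearson correlation: cosine similarity of the centered vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
```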
12 Parametric Clustering (K is known). Algorithms based on distances: K-means, graph / cut. Algorithms based on models: mixture of models, the EM algorithm.
13 Clustering: K-Means, Expectation Maximization, Selecting the number of clusters, Affinity Propagation, Scalability
14 K-Means Algorithm
1. Init: uniformly draw K points $x_{i_1}, \ldots, x_{i_K}$ in E; set $C_k = \{x_{i_k}\}$
2. Repeat
3. Draw without replacement $x_i$ from E
4. $\tau(i) = \arg\min_{k=1 \ldots K} d(x_i, C_k)$ (find the best cluster for $x_i$)
5. $C_{\tau(i)} = C_{\tau(i)} \cup \{x_i\}$ (add $x_i$ to $C_{\tau(i)}$)
6. Until all points have been drawn
7. If the partition $C_1, \ldots, C_K$ has changed: define $x_{i_k}$ = best point in $C_k$, set $C_k = \{x_{i_k}\}$, go to 2; otherwise the partition is stable. The algorithm terminates.
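The slide describes the sequential variant; a minimal sketch of the common batch (Lloyd) form, with Euclidean distance to the cluster mean and the mean as best point, could look like this (assumptions: NumPy, random initialization from the data):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=None):
    """Batch (Lloyd) K-means: alternate assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]  # init: K points of E
    for _ in range(n_iter):
        # assignment step: tau(i) = argmin_k ||x_i - c_k||^2
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        tau = d2.argmin(axis=1)
        # update step: best point of C_k = its mean (keep old center if empty)
        new_centers = np.array([X[tau == k].mean(axis=0) if np.any(tau == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break  # partition unchanged: the algorithm terminates
        centers = new_centers
    return tau, centers
```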
15 K-Means, Knobs.
Knob 1: define $d(x_i, C_k)$:
- $\min\{d(x_i, x_j),\ x_j \in C_k\}$: favors long clusters
- (*) average$\{d(x_i, x_j),\ x_j \in C_k\}$: compact clusters
- $\max\{d(x_i, x_j),\ x_j \in C_k\}$: spherical clusters
Knob 2: define the best point in $C_k$:
- Medoid: $\arg\min_{x_i} \sum_{x_j \in C_k} d(x_i, x_j)$
- (*) Average: $\frac{1}{|C_k|} \sum_{x_j \in C_k} x_j$ (does not belong to E)
16 No single best choice
17 K-Means, Discussion. PROS: complexity $O(K \cdot N)$; can incorporate prior knowledge (initialization). CONS: sensitive to initialization; sensitive to outliers; sensitive to irrelevant attributes.
18 K-Means, Convergence. For the cost function $L(\tau) = \sum_k \sum_{i,j :\, \tau(i)=\tau(j)=k} d(x_i, x_j)$, with $d(x_i, C_k)$ = average$\{d(x_i, x_j),\ x_j \in C_k\}$ and the best point in $C_k$ defined as the average of the $x_j \in C_k$, K-means converges toward a (local) minimum of L.
19 K-Means, Practicalities. Initialization: uniform sampling; average of E + random perturbations; average of E + orthogonal perturbations; extreme points: select $x_{i_1}$ uniformly in E, then select $x_{i_j} = \arg\max_{x_i} \min_{k=1}^{j-1} d(x_i, x_{i_k})$ (a sketch follows). Pre-processing: mean-centering the dataset.
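A sketch of the extreme-points rule (the farthest-first heuristic; the other initializations are one-liners):

```python
import numpy as np

def extreme_points_init(X, K, seed=None):
    """Pick x_{i_1} uniformly, then repeatedly pick the point that is
    farthest from all centers chosen so far (argmax of the min distance)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, K):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])  # x_{i_j} = argmax_i min_k d(x_i, x_{i_k})
    return np.array(centers)
```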
20 Clustering: K-Means, Expectation Maximization, Selecting the number of clusters, Affinity Propagation, Scalability
21 Model-based clustering. Mixture of components: density $f = \sum_{k=1}^{K} \pi_k f_k$, with $f_k$ the k-th component of the mixture. The responsibility $\gamma_k(i) = \frac{\pi_k f_k(x_i)}{f(x_i)}$ induces $C_k = \{x_j :\ k = \arg\max_\ell\, \gamma_\ell(j)\}$. Nature of the components: prior knowledge; most often Gaussian, $f_k = \mathcal{N}(\mu_k, \Sigma_k)$. Beware: clusters are not always Gaussian...
22 Model-based clustering, 2. Search space, solution: $\theta = (\pi_k, \mu_k, \Sigma_k)_{k=1}^{K}$. Criterion: the log-likelihood of the dataset, to be maximized: $\ell(\theta) = \log \Pr(E) = \sum_{i=1}^{N} \log \Pr(x_i) = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k f_k(x_i) \right)$.
23 Model-based clustering with EM. Formalization: define $z_{i,k} = 1$ iff $x_i$ belongs to $C_k$; then $E[z_{i,k}] = \gamma_k(i)$, the probability that $x_i$ was generated by $\pi_k f_k$. Expectation of the log-likelihood: $E[\ell(\theta)] = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_k(i) \log(\pi_k f_k(x_i)) = \sum_{i,k} \gamma_k(i) \log \pi_k + \sum_{i,k} \gamma_k(i) \log f_k(x_i)$. EM optimization. E step: given $\theta$, compute $\gamma_k(i) = \frac{\pi_k f_k(x_i)}{f(x_i)}$. M step: given the $\gamma_k(i)$, compute $\theta = (\pi_k, \mu_k, \Sigma_k) = \arg\max E[\ell(\theta)]$.
24 Maximization step. $\pi_k$, the fraction of points in $C_k$: $\pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_k(i)$. $\mu_k$, the mean of $C_k$: $\mu_k = \frac{\sum_{i=1}^{N} \gamma_k(i)\, x_i}{\sum_{i=1}^{N} \gamma_k(i)}$. $\Sigma_k$, the covariance: $\Sigma_k = \frac{\sum_{i=1}^{N} \gamma_k(i)\, (x_i - \mu_k)(x_i - \mu_k)^\top}{\sum_{i=1}^{N} \gamma_k(i)}$.
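Putting the E and M steps together, a compact NumPy/SciPy sketch (the initialization and the small ridge keeping the covariances invertible are my additions, not part of the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=None):
    """EM for a Gaussian mixture, following the E and M steps above."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E step: gamma_k(i) = pi_k f_k(x_i) / f(x_i)
        gamma = np.column_stack(
            [pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k]) for k in range(K)])
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: closed-form updates of pi_k, mu_k, Sigma_k
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, gamma
```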
25 Clustering: K-Means, Expectation Maximization, Selecting the number of clusters, Affinity Propagation, Scalability
26 Choosing the number of clusters. K-means constructs a partition whatever the value of K. Selecting K: Bayesian approaches (tradeoff between accuracy and richness of the model); stability (varying the data should not change the result); gap statistics (compare with the null hypothesis that all data lie in the same cluster).
27 Bayesian approaches. Bayesian Information Criterion: $BIC(\theta) = \ell(\theta) - \frac{\#\theta}{2} \log N$; select $K = \arg\max BIC(\theta)$, where $\#\theta$ is the number of free parameters in $\theta$: if all components share one scalar variance $\sigma$, $\#\theta = K + Kd$; if each component has its own scalar variance $\sigma_k$, $\#\theta = K - 1 + K(d+1)$; if each component has a full covariance matrix $\Sigma_k$, $\#\theta = K - 1 + K(d + d(d+1)/2)$.
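A sketch of the criterion in the full-covariance case, reusing the mixture notation above (the `em_gmm` sketch given earlier is assumed):

```python
import numpy as np
from scipy.stats import multivariate_normal

def bic(X, pi, mu, Sigma):
    """BIC(theta) = l(theta) - (#theta / 2) log N, full-covariance case:
    K-1 weights + K*d means + K*d(d+1)/2 covariance terms."""
    N, d = X.shape
    K = len(pi)
    f = sum(pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k]) for k in range(K))
    n_params = (K - 1) + K * d + K * d * (d + 1) // 2
    return np.log(f).sum() - 0.5 * n_params * np.log(N)

# select K as the argmax of BIC over fitted models:
# best_K = max(range(1, 10), key=lambda K: bic(X, *em_gmm(X, K)[:3]))
```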
28 Gap statistics. Principle: hypothesis testing. 1. Consider the hypothesis $H_0$: there is no cluster in the data; E is generated from a no-cluster distribution $\pi$. 2. Estimate the distribution $f_{0,K}$ of $L(C_1, \ldots, C_K)$ for data generated according to $\pi$: analytically if $\pi$ is simple, with Monte-Carlo methods otherwise. 3. Reject $H_0$ with confidence $\alpha$ if the probability of generating the true value $L(C_1, \ldots, C_K)$ under $f_{0,K}$ is less than $\alpha$. Beware: the test is done for all K values...
29 Gap statistics, 2. Algorithm: assume E is extracted from a no-cluster distribution, e.g. a single Gaussian. 1. Sample a dataset according to this distribution. 2. Apply K-means on this sample. 3. Measure the associated loss function. Repeat, and compute the average $\bar{L}_0(K)$ and variance $\sigma_0(K)$. Define the gap: $Gap(K) = \bar{L}_0(K) - L(C_1, \ldots, C_K)$. Rule: select the smallest K such that $Gap(K) \geq Gap(K+1) - \sigma_0(K+1)$ (a sketch follows). What is nice: this also tells whether there are no clusters in the data...
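A minimal sketch of this procedure under a single-Gaussian null; the helper `loss_fn(data, K)`, assumed here and not part of the slides, runs K-means on `data` and returns the loss $L(C_1, \ldots, C_K)$:

```python
import numpy as np

def select_k_gap(X, loss_fn, K_max, n_ref=20, seed=None):
    """Gap statistic: compare the observed K-means loss to its average
    under a no-cluster (single Gaussian) null hypothesis."""
    rng = np.random.default_rng(seed)
    mu, cov = X.mean(axis=0), np.cov(X.T)
    gap, sigma0 = {}, {}
    for K in range(1, K_max + 1):
        # Monte-Carlo estimate of the loss under the null distribution
        ref = np.array([loss_fn(rng.multivariate_normal(mu, cov, len(X)), K)
                        for _ in range(n_ref)])
        gap[K] = ref.mean() - loss_fn(X, K)   # Gap(K) = L0(K) - L(C_1..C_K)
        sigma0[K] = ref.std()
    for K in range(1, K_max):
        if gap[K] >= gap[K + 1] - sigma0[K + 1]:
            return K                          # smallest K satisfying the rule
    return K_max
```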
30 Stability. Principle: consider E' perturbed from E; construct $C'_1, \ldots, C'_K$ from E'; evaluate the distance between $(C_1, \ldots, C_K)$ and $(C'_1, \ldots, C'_K)$; if the distance is small (stability), K is OK. Distortion $D(\tau)$: define S the Gram matrix, $S_{ij} = \langle x_i, x_j \rangle$; $(\lambda_i, v_i)$ the i-th (eigenvalue, eigenvector) pair of S; X the (normalized) assignment matrix, $X_{ij} = 1$ iff $x_i \in C_j$. Then $D(\tau) = \sum_i \|x_i - \mu_{\tau(i)}\|^2 = tr(S) - tr(X^\top S X)$. Minimal distortion: $D^* = tr(S) - \sum_{k=1}^{K-1} \lambda_k$.
31 Stability, 2. Results: if a clustering has low distortion, then $(\mu_1, \ldots, \mu_K)$ is close to the space spanned by $(v_1, \ldots, v_K)$; if two clusterings both have low distortion, they are close to each other (and close to the optimal clustering). Counter-example: Meila, ICML 06.
32 From K-Means to K-Centers. Assumptions behind K-means: a distance or dissimilarity, and the possibility to create artefacts (barycenters); not applicable in some domains: what is the average molecule? the average sentence? K-Centers, position of the problem: a combinatorial optimization problem; find $\sigma : \{1, \ldots, N\} \to \{1, \ldots, N\}$ minimizing $E[\sigma] = \sum_{i=1}^{N} d(x_i, x_{\sigma(i)})$. (What is missing here?)
33 Clustering: K-Means, Expectation Maximization, Selecting the number of clusters, Affinity Propagation, Scalability
34 Motivations. Clustering: unsupervised learning. Affinity Propagation and the state of the art:

              K-means         K-centers       AP
exemplar      artefact        actual point    actual point
parameter     K               K               s (penalty)
algorithm     greedy search   greedy search   message passing
performance   not stable      not stable      stable
complexity    N K             N K             N^2 log(N)

Clustering by Passing Messages Between Data Points. B.J. Frey, D. Dueck. Science 2007.
35 Affinity Propagation. Given $E = \{e_1, e_2, \ldots, e_N\}$ and the dissimilarity $d(e_i, e_j)$ between elements, find $\sigma : E \to E$, with $\sigma(e_i)$ the exemplar representing $e_i$, such that $\sigma = \arg\max \sum_{i=1}^{N} S(e_i, \sigma(e_i))$, where $S(e_i, e_j) = -d^2(e_i, e_j)$ if $i \neq j$ and $S(e_i, e_i) = s$, with s a penalty parameter. Particular cases: $s = -\infty$, only one exemplar (1 cluster); $s = 0$, every point is an exemplar (N clusters).
36 Affinity Propagation, Principle. Algorithm: message propagation, with responsibilities r(i, k) and availabilities a(i, k): could $x_k$ be the exemplar for $x_i$?
37 Affinity Propagation, 2. Two types of messages: r(i, k), the responsibility sent from i to k, and a(i, k), the availability of k as exemplar for i. Rules of propagation (a sketch follows):
$r(i,k) = S(e_i, e_k) - \max_{k' \neq k} \{ a(i,k') + S(e_i, e_{k'}) \}$
$r(k,k) = S(e_k, e_k) - \max_{k' \neq k} \{ S(e_k, e_{k'}) \}$
$a(i,k) = \min \{0,\ r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \}$
$a(k,k) = \sum_{i' \neq k} \max\{0, r(i',k)\}$
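These updates vectorize directly. A minimal NumPy sketch (the damping, which Frey and Dueck use in practice to stabilize the iterations, is my addition to the formulas above):

```python
import numpy as np

def affinity_propagation(S, n_iter=200, damping=0.5):
    """Message passing on a similarity matrix S, with the penalty s
    on the diagonal; returns the exemplar index sigma(e_i) of each point."""
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibilities r(i, k)
    A = np.zeros((N, N))  # availabilities a(i, k)
    for _ in range(n_iter):
        # r(i,k) = S(i,k) - max_{k' != k} { a(i,k') + S(i,k') }
        M = A + S
        idx = M.argmax(axis=1)
        first = M[np.arange(N), idx]
        M[np.arange(N), idx] = -np.inf
        second = M.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(N), idx] = S[np.arange(N), idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min{0, r(k,k) + sum_{i' not in {i,k}} max{0, r(i',k)}}
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))      # keep r(k,k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp  # column sums minus own term
        dA = np.diag(A_new).copy()            # a(k,k) = sum_{i' != k} max{0, r(i',k)}
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)
```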
38-45 Iterations of message passing (sequence of figures)
46-47 Affinity Propagation, cont'd (figures)
48 Affinity Propagation in a Nutshell. WHEN to use it: when averages don't make sense, e.g. molecules or documents. PROS vs K-centers: lower distortion $D(\sigma) = \sum_{i=1}^{N} d^2(e_i, \sigma(e_i))$. CONS: computational complexity; similarity computation $O(N^2)$, message passing $O(N^2 \log N)$.
49 Clustering: K-Means, Expectation Maximization, Selecting the number of clusters, Affinity Propagation, Scalability
50 Hierarchical AP Clustering data streams: Theory and practice. S. Guha, A. Meyerson, N. Mishra, R. Motwani. TKDE 2003.
51 Weighted AP. From AP to WAP:
$e_i \to (e_i, n_i)$
$S(e_i, e_j) \to n_i\, S(e_i, e_j)$
$S(e_i, e_i) \to S(e_i, e_i) + (n_i - 1)\, \epsilon$
with $S(e_i, e_j)$ the price for $e_i$ to select $e_j$ as an exemplar and $\epsilon$ the variance of the $n_i$ points. Proposition: WAP is equivalent to AP with duplicated elements (a sketch of the substitution follows).
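A small sketch of this substitution on a similarity matrix, applied literally to the table above (the array names are mine; `n[i]` is the multiplicity $n_i$ of element $e_i$):

```python
import numpy as np

def weighted_similarities(S, n, eps):
    """Build the WAP similarity matrix from the AP one: element e_i now
    stands for n_i near-duplicate points with variance eps."""
    Sw = S * n[:, None]                    # S(e_i, e_j) -> n_i * S(e_i, e_j)
    diag = np.diag(S) + (n - 1) * eps      # S(e_i, e_i) -> S(e_i, e_i) + (n_i - 1) eps
    np.fill_diagonal(Sw, diag)
    return Sw
```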
52 Hierarchical WAP. The complexity of Hi-WAP is $O(N^{3/2})$ and can be iteratively reduced to $O(N^{1+\gamma})$.
53 Validation of Hi-WAP on EGEE jobs. EGEE (Enabling Grids for E-sciencE): sites in 50 countries, 10,000 users, access to 80,000 CPUs, 300,000 jobs per day. Description of the jobs (237,087): 4 numeric features (e.g. duration of execution) and 1 symbolic feature (name of the queue).
54 Validation of Hi-WAP on EGEE jobs. Evaluation by the distortion $D(\sigma) = \sum_{i=1}^{N} d^2(e_i, \sigma(e_i))$; 237,087 jobs, each job in $\mathbb{R}^5$; CPU 10 (Intel 2.66GHz). Baseline: K-centers, best result out of multiple runs at the same overall computational cost.