Robust model based clustering: methods and applications
Francesco Dotto
Università di Roma La Sapienza
Rome, 9 February
Transcription
2 Outline of the presentation
1 General motivation
2 Algorithm based on trimming and reweighting
3 Algorithm based on trimming and geometric constraints
4 Clusterwise fuzzy regression model based on trimming
Parts 2 and 4: joint work with Alessio Farcomeni, Luis Angel Garcia Escudero and Agustin Mayo Iscar. Part 3: joint work with Alessio Farcomeni.
3 General Motivation
The presence of outlying observations may lead to unsatisfactory results, such as:
1 Heterogeneous groups artificially joined together
2 Spurious clusters, containing only outlying observations, being detected
3 Inconsistent estimation of the cluster parameters
To deal with these issues we propose some robust methods based on trimming.
4 Arising problems
Two main problems affect the existing robust methods (detailed review in [Farcomeni and Greco, 2015] and [Ritter, 2015]):
1 Tuning the parameter establishing the proportion of trimmed observations
2 Choosing a proper constraint on the scatter matrices (required to avoid the effect of spurious maximizers)
5 rtclust algorithm
For issue 1 we propose the usage of the rtclust procedure ([Dotto et al., 2015b]).
[Figure: scatter plots in three panels (True assignments, Initial, Final), shown for two examples]
We start from the output of a robust method (e.g. tclust [García-Escudero et al., 2008]) with a very high trimming level and then apply reweighting.
6 rtclust approach: sketch of the algorithm
The reweighting proceeds, for pre-fixed $\alpha_1 > \alpha_2 > \dots > \alpha_L$, for each $l = 1, \dots, L$ as:
1 Sort the Mahalanobis distances $d_{(1)}, \dots, d_{(n)}$ and take the sets:
$$A = \{x_i : d_i \le d_{([n(1-\alpha_l)])}\} \quad \text{and} \quad B = \{x_i : d_i \le \chi^2_{p,\alpha_L}\} \qquad (1)$$
2 Fix $A \cap B = \{H_1, \dots, H_K\}$, with
$$H_j = \Big\{x_i \in A \cap B \text{ such that } d_{\Sigma_j^l}(x_i, m_j^l) = \min_{q=1,\dots,k} d_{\Sigma_q^l}(x_i, m_q^l)\Big\} \qquad (2)$$
3 Update the cluster proportions and the current contamination level
4 Update the cluster centers and scatter matrices
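Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it is restricted to p = 2 so that the chi-square quantile has a closed form, it treats $d_i$ as the squared Mahalanobis distance of $x_i$ from its closest cluster, and it reads $\chi^2_{p,\alpha_L}$ as the $(1-\alpha_L)$ quantile. The function names (`mahalanobis_sq`, `chi2_quantile_2dof`, `reweight_step`) are hypothetical, and steps 3 and 4 (parameter updates) are left out.

```python
import math

def mahalanobis_sq(x, m, S):
    """Squared Mahalanobis distance of a 2-D point x from centre m with 2x2 scatter S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    dx, dy = x[0] - m[0], x[1] - m[1]
    # The inverse of [[a, b], [c, d]] is 1/det * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def chi2_quantile_2dof(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - prob)

def reweight_step(X, centers, scatters, alpha_l, alpha_L):
    """Return the index sets H_1, ..., H_K of step 2 for the current parameters."""
    n = len(X)
    closest = []  # (squared distance to the closest cluster, index of that cluster)
    for x in X:
        d2 = [mahalanobis_sq(x, m, S) for m, S in zip(centers, scatters)]
        j = min(range(len(d2)), key=d2.__getitem__)
        closest.append((d2[j], j))
    ordered = sorted(dist for dist, _ in closest)
    keep = max(1, math.floor(n * (1.0 - alpha_l)))   # [n(1 - alpha_l)]
    cutoff_A = ordered[keep - 1]                     # defines the set A
    cutoff_B = chi2_quantile_2dof(1.0 - alpha_L)     # defines the set B (assumed upper quantile)
    clusters = [[] for _ in centers]
    for i, (dist, j) in enumerate(closest):
        if dist <= cutoff_A and dist <= cutoff_B:    # x_i belongs to A and to B
            clusters[j].append(i)
    return clusters

# Toy data: two tight groups plus one point far from both centres.
X = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5),
     (10.0, 10.0), (10.5, 10.0), (10.0, 9.5), (5.0, 5.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
scatters = [((1.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))]
clusters = reweight_step(X, centers, scatters, alpha_l=0.2, alpha_L=0.01)
```

In this toy run the point (5, 5) falls outside both A and B, so it is left unassigned while the six remaining points are split between the two clusters.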
7 Simulation results: 5% contaminated dataset, estimated location parameter
[Figure 1: Model's performance for µ estimation when p = 2, 4, 6. Panels: p=2, p=4, p=6. Vertical axis: MSE of the estimated µ. Compared methods: rtclust, H&R, Iterated H&R, tclust. Axis labels: rtclust33, rtclust20, HR33, HR20, HR_it33, HR_it20, tclust33, tclust20, tclust10, tclust05. Values exceeding the scale of the plot are flagged with a marker.]
8 Simulation results: 5% contaminated dataset, estimated scale parameter
[Figure 2: Model's performance for Σ estimation when p = 2, 4, 6. Panels: p=2, p=4, p=6. Vertical axis: MSE of the estimated Σ. Compared methods: rtclust, H&R, Iterated H&R, tclust. Axis labels: rtclust33, rtclust20, HR33, HR20, HR_it33, HR_it20, tclust33, tclust20, tclust10, tclust05. Values exceeding the scale of the plot are flagged with a marker.]
9 Simulation results: 5% contaminated dataset, estimated contamination level
[Figure 3: Model's performance for α estimation when p = 2, 4, 6. Panels: p=2, p=4, p=6. Vertical axis: estimated α. Compared methods: rtclust, H&R, Iterated H&R, tclust. Axis labels: rtclust33, rtclust20, HR33, HR20, HR_it33, HR_it20, tclust33, tclust20, tclust10, tclust05. Values exceeding the scale of the plot are flagged with a marker.]
10 Constraining the obtained solution: Target function
Generally speaking, a robust procedure aims to maximize the objective function given by:
$$\prod_{j=1}^{K} \prod_{i \in R_j} f(x_i; \mu_j; \Sigma_j) \prod_{i \notin R} g_{\psi_i}(x_i) \qquad (3)$$
where:
1 $f(\cdot)$ stands for the multivariate normal density
2 $g_{\psi_i}(\cdot)$ is the contaminating density, with mild probabilistic assumptions on it
3 $K$ is the number of groups
4 $R = \bigcup_{j=1}^{K} R_j$ is the set of the clean observations and is such that $\#R = [n(1-\alpha)]$
11 Constraining the obtained solution: Why?
The objective function in Equation (3) is unbounded; for that reason the effect of spurious maximizers must be controlled. Spurious maximizers may be defined as parameter points having zero standard deviation for some components; they can be generated by any small number of sample points grouped sufficiently close together. Such points make the objective function tend to infinity, and thus a spurious solution is returned as output of the algorithm.
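The unboundedness can be seen numerically on a toy case. The sketch below is a simplified illustration rather than the trimmed objective of Equation (3): it assumes a univariate two-component normal mixture with equal weights, where one component is centred exactly on the first data point and its standard deviation is shrunk towards zero. The names `normal_pdf` and `mixture_loglik` and the data values are made up for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, sigma_small):
    """Log-likelihood of 0.5 * N(0, 1) + 0.5 * N(data[0], sigma_small^2)."""
    spike = data[0]  # the degenerate component sits on one sample point
    return sum(
        math.log(0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, spike, sigma_small))
        for x in data
    )

data = [0.3, -1.2, 0.8, 1.5, -0.4]
sigmas = (1.0, 1e-2, 1e-4, 1e-8)
values = [mixture_loglik(data, s) for s in sigmas]
for s, v in zip(sigmas, values):
    print(f"sigma = {s:g}: log-likelihood = {v:.3f}")
```

The log-likelihood keeps growing as the standard deviation of the degenerate component shrinks, which is the behaviour the constraints discussed next are meant to rule out.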
12 What spurious maximizers actually are??? A graphical representation
[Figure 4: Effect of the constraint in the maximization of the objective function. Panels: Constrained Variances, Unconstrained Variances. Axes: x, y.]
13 Constraining the obtained solution: Some feasible solutions
Many solutions can be used to avoid extreme cases like the one in Figure 4:
1 Constraining the ratio between the highest eigenvalue and the lowest eigenvalue is one of the possible solutions:
$$\frac{\max_{j=1,2,\dots,K} \max_{h=1,2,\dots,p} \lambda_h(\Sigma_j)}{\min_{j=1,2,\dots,K} \min_{h=1,2,\dots,p} \lambda_h(\Sigma_j)} = \frac{M_n}{m_n} \le c \quad \text{with } c \in \mathbb{R} \qquad (4)$$
Advantages: interpretability, easy to implement.
Disadvantages: under the constraint, the scale invariance of the obtained estimators is lost.
2 Inserting constraints on the estimated cluster shapes
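As a small illustration of constraint (4), the following sketch computes the eigenvalue ratio $M_n / m_n$ over a set of scatter matrices and checks it against a bound c. It assumes symmetric 2x2 scatter matrices so that the eigenvalues have a closed form; the helper names are hypothetical, and the sketch only checks the constraint without showing how an algorithm would enforce it.

```python
import math

def eigenvalues_sym2(S):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = S
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius

def eigenvalue_ratio(scatters):
    """M_n / m_n: largest over smallest eigenvalue across all scatter matrices."""
    eigs = [lam for S in scatters for lam in eigenvalues_sym2(S)]
    return max(eigs) / min(eigs)

def satisfies_constraint(scatters, c):
    """True when the eigenvalue ratio does not exceed the bound c."""
    return eigenvalue_ratio(scatters) <= c

scatters = [((2.0, 0.0), (0.0, 1.0)), ((4.0, 0.0), (0.0, 0.5))]
ratio = eigenvalue_ratio(scatters)  # eigenvalues are 2, 1, 4 and 0.5
```

With c close to 1 the clusters are forced to be nearly spherical and of similar size, while large values of c leave the scatter matrices almost unconstrained.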
14 How to obtain models?
Let us consider the eigenvalue decomposition given by ([Celeux and Govaert, 1995]):
$$\Sigma_k = \lambda_k D_k A_k D_k^T \quad \text{for } k = 1, 2, \dots, K \qquad (5)$$
where:
1 $\lambda_k = |\Sigma_k|^{1/d}$, with $d$ the dimension of the data, is the volume of the $k$-th cluster
2 $A_k$ is a diagonal matrix holding the eigenvalues of $\Sigma_k$ scaled so that $|A_k| = 1$; it determines the shape of each cluster
3 $D_k$ is an orthogonal matrix whose columns are given by the eigenvectors of $\Sigma_k$; it determines the direction of each cluster
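Decomposition (5) can be checked by hand on a small case. The sketch below assumes a symmetric positive definite 2x2 matrix, so that eigenvalues and eigenvectors have a closed form; `decompose_sym2` and `recompose` are illustrative names, not functions of any package.

```python
import math

def decompose_sym2(S):
    """Split a symmetric positive definite 2x2 matrix into (volume, shape, orientation)."""
    (a, b), (_, d) = S
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    l1, l2 = mean + radius, mean - radius      # eigenvalues, l1 >= l2 > 0
    volume = math.sqrt(l1 * l2)                # |Sigma|^(1/d) with d = 2
    shape = (l1 / volume, l2 / volume)         # diagonal of A, product equal to 1
    theta = 0.5 * math.atan2(2.0 * b, a - d)   # angle of the leading eigenvector
    c, s = math.cos(theta), math.sin(theta)
    D = ((c, -s), (s, c))                      # columns are the eigenvectors
    return volume, shape, D

def recompose(volume, shape, D):
    """Rebuild Sigma = volume * D * diag(shape) * D^T."""
    return [[volume * sum(D[i][h] * shape[h] * D[j][h] for h in range(2))
             for j in range(2)] for i in range(2)]

S = ((2.0, 1.0), (1.0, 2.0))
volume, shape, D = decompose_sym2(S)
rebuilt = recompose(volume, shape, D)
```

Fixing one or more of the three factors across clusters (same volume, same shape, same orientation) is what generates the family of models listed in the table on the next slide.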
15 Different Parametrization... models: mclust proposal
The proposal (work in progress) is to provide a robust version of all the models listed in Table 1.

Model Name   Parametrization        ER             Invariance
EII          λ I                    Not required   Isometric transformations
VII          λ_k I                  Not required   Isometric transformations
EEI          λ A                    Not required   Scaling
VEI          λ_k A                  Not required   Scaling
EVI          λ A_k                  Required       Translation
VVI          λ_k A_k                Required       Translation
EEE          λ D A D^T              Not required   Linear transformations
EEV          λ D_k A D_k^T          Not required   Linear transformations
VEV          λ_k D_k A D_k^T        Not required   Linear transformations
VVV          λ_k D_k A_k D_k^T      Required       Translation

Table 1: List of possible models obtained imposing shapes to the detected clusters
16 Proposal: Motivation and State of Art
Motivations:
1 Obtaining invariance properties of the estimators
2 Usage of geometric, interpretable output for the researcher
State of art:
1 Good performance in simulation w.r.t. existing proposals
2 Evaluating a suitable method to choose the proper model
17 Fuzzy and regression: Merging the two approaches
In [Dotto et al., 2015a] (in review for ADAC, many improvements required... crossed fingers!!!) we proposed a robust clusterwise regression model based on fuzziness:
1 Trimming allows us to reach robustness
2 Fuzziness allows us to deal with bridge uncertainty around the assignment
Generally speaking we aim to maximize the following objective function:
$$\sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} \log\big(p_j f(y_i; x_i' b_j + b_j^0, s_j^2)\big) \qquad (6)$$
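To make objective (6) concrete, here is a minimal sketch that evaluates it for given memberships and parameters. It assumes that $f$ is the univariate normal density with mean $x_i' b_j + b_j^0$ and variance $s_j^2$, and that trimmed observations are encoded by $u_{ij} = 0$ for every $j$; both points, like the function names, are assumptions of this example and not statements about the authors' code.

```python
import math

def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fuzzy_regression_objective(X, y, U, p, b, b0, s2, m):
    """Evaluate sum_i sum_j u_ij^m * log(p_j * f(y_i; x_i' b_j + b0_j, s2_j))."""
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        for j in range(len(p)):
            if U[i][j] == 0.0:
                continue  # zero membership (e.g. a trimmed unit) contributes nothing
            fitted = sum(xv * bv for xv, bv in zip(x_i, b[j])) + b0[j]
            total += U[i][j] ** m * math.log(p[j] * normal_pdf(y_i, fitted, s2[j]))
    return total

# One observation lying exactly on the single regression line y = 2x.
value = fuzzy_regression_objective(
    X=[[1.0]], y=[2.0], U=[[1.0]], p=[1.0], b=[[2.0]], b0=[0.0], s2=[1.0], m=2.0
)
trimmed = fuzzy_regression_objective(
    X=[[1.0]], y=[50.0], U=[[0.0]], p=[1.0], b=[[2.0]], b0=[0.0], s2=[1.0], m=2.0
)
```

Skipping zero memberships also avoids taking the logarithm of a density that has underflowed to zero for a very distant observation.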
18 Linear clustering
Clustering methods are generally based on estimating k groups around suitably defined centroids
⇓
Each unit is assigned to the group minimizing a function of the distance from the centroid.
Linear clustering methods are used to search for k linear structures around an explanatory variable
⇓
The assignment of a unit to each group is based on the minimization of the regression error.
19 Fuzzy
For a given dataset $x = (x_1, \dots, x_n)$ with $x_i \in \mathbb{R}^p$, a unit can be assigned to a cluster $1 \le j \le k$ following two approaches:
Crisp assignments
$$u_{ij} \in \{0, 1\} \qquad (7)$$
where
$$u_{ij} = \begin{cases} 1 & \text{if } x_i \in j \\ 0 & \text{if } x_i \notin j \end{cases}$$
Each observation belongs to only one cluster.
Fuzzy assignments
$$u_{ij} \in [0, 1] \qquad (8)$$
where
$$u_{ij} = \begin{cases} 1 & \text{if } x_i \text{ is fully assigned to } j \\ 0 & \text{if } x_i \text{ is not assigned to } j \end{cases}$$
Intermediate assignments are taken into account.
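The difference between (7) and (8) can be illustrated with a membership matrix stored as a list of rows, one row per unit. The sketch below is hypothetical: `is_crisp` checks definition (7), and `harden` turns fuzzy memberships into crisp ones by picking the largest membership in each row. Treating an all-zero row as a unit assigned to no cluster (for instance a trimmed one) is an assumption of the example.

```python
def is_crisp(U):
    """True when every membership is 0 or 1 and no unit belongs to more than one cluster."""
    return (all(u in (0.0, 1.0) for row in U for u in row)
            and all(sum(row) <= 1.0 for row in U))

def harden(U):
    """Turn fuzzy memberships into crisp ones by assigning each unit to its top cluster."""
    crisp = []
    for row in U:
        if max(row) == 0.0:
            crisp.append([0.0] * len(row))   # unit assigned to no cluster stays unassigned
            continue
        j = max(range(len(row)), key=row.__getitem__)
        crisp.append([1.0 if h == j else 0.0 for h in range(len(row))])
    return crisp

U = [[0.7, 0.3],   # intermediate assignment, allowed only in the fuzzy case
     [0.0, 1.0],   # full assignment to the second cluster
     [0.0, 0.0]]   # unit assigned to no cluster
crisp_U = harden(U)
```

Hardening discards exactly the intermediate information that the fuzzy approach is designed to keep, so it is shown here only to contrast the two definitions.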
20 Sketch of the simulation study: MSE of the estimated β
[Figure: MSE of the estimated β in panels (a) to (h). Method labels on the axis, as printed: creg, EM, A creg, FTCR.]
21 References I
Celeux, G. and Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28.
Dotto, F., Farcomeni, A., García-Escudero, L., and Mayo-Iscar, A. (2015a). A fuzzy approach to robust clusterwise regression. In review for Advances in Data Analysis and Classification.
Dotto, F., Farcomeni, A., García-Escudero, L., and Mayo-Iscar, A. (2015b). The rtclust procedure for robust clustering. In 10th Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society. Book of Abstracts. CUEC.
22 References II
Farcomeni, A. and Greco, L. (2015). Robust Methods for Data Reduction. Chapman & Hall/CRC Press.
García-Escudero, L., Gordaliza, A., Matrán, C., and Mayo-Iscar, A. (2008). A general trimming approach to robust cluster analysis. Ann Stat, 36.
Ritter, G. (2015). Robust Cluster Analysis and Variable Selection. Chapman & Hall/CRC Press.
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Goal (Lecture): To present Probabilistic Principal Component Analysis (PPCA) using both Maximum Likelihood (ML) and Expectation Maximization
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationProblem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30
Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain
More informationRobust estimation of principal components from depth-based multivariate rank covariance matrix
Robust estimation of principal components from depth-based multivariate rank covariance matrix Subho Majumdar Snigdhansu Chatterjee University of Minnesota, School of Statistics Table of contents Summary
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationData Exploration and Unsupervised Learning with Clustering
Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a
More information9 Multi-Model State Estimation
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State
More informationRestricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model
Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives
More informationLeast Squares Optimization
Least Squares Optimization The following is a brief review of least squares optimization and constrained optimization techniques, which are widely used to analyze and visualize data. Least squares (LS)
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More information