Multivariate clustering. Astro 585, Spring 2013


Astronomical context

Astronomers have constructed classifications of celestial objects for centuries:
- Asteros (fixed stars) vs. planetos (roving stars) [Greece, >2 kyr ago]
- Luminosae, Nebulosae, Occultae [Hodierna, mid-17th c.]
- Comet orbits & morphology [Alexander 1850, Lardner 1853, Barnard 1891]
- Stellar spectra: 6 classes (Secchi 1860s), 7 classes (Pickering/Cannon 1900-20s), 10 classes w/ brown dwarfs (Kirkpatrick 2005)
- Variable stars: 6+5 classes (Townley 1913), ~80 classes (Samus 2009)
- Galaxy morphology: 6+3 classes (Hubble 1926)
- Supernovae: Ia, Ib, Ic, IIb, IIP, IIn (Turatto 2003)
- Active galactic nuclei: Seyfert galaxies, radio galaxies, LINERs, quasars, QSOs, BL Lacs, blazars
- Gamma-ray bursts: short, long, intermediate (Kouveliotou 1993; Mukherjee 1998)
- Protostars/PMS stars: Class 0, 0/I, I, II, III (Lada 1992, Andre 1993)

In nearly every case, these classes were created by well-argued but subjective assessment of source properties. In statistical parlance, the problem is called unsupervised clustering of multivariate data.

Traditional clustering methods in astronomy

Histogram-like univariate criteria:
- Abell galaxy clusters with 80-129 galaxies within 2 magnitudes of their third-brightest member are classified as Richness Class 2 (Abell 1958)
- Young stellar objects with infrared colors 0.4 < [5.8]-[8.0] < 1.1 and 0.0 < [3.6]-[4.5] < 0.8 are classified as Class II (Allen 2004)
- Seyfert galaxies are distinguished from starburst emission-line galaxies by a curve on a 4-emission-line diagram (Veilleux & Osterbrock 1987)

In statistics and computer science, these are `deterministic decision trees'.
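A deterministic decision tree of this kind is just a set of hard-coded cuts. As a minimal sketch, the Class II color box quoted above (Allen 2004) can be written directly as a rule on Spitzer IRAC magnitudes; the function name and the "other" fallback label are illustrative, not part of the original scheme:

```python
def yso_class(m36, m45, m58, m80):
    """Classify a young stellar object from its IRAC magnitudes using
    the Class II color cuts quoted in the text (Allen 2004).  Anything
    outside the box is simply labeled 'other' here."""
    c1 = m36 - m45   # [3.6]-[4.5] color
    c2 = m58 - m80   # [5.8]-[8.0] color
    if 0.0 < c1 < 0.8 and 0.4 < c2 < 1.1:
        return "Class II"
    return "other"

# Colors 0.3 and 0.7 fall inside the Class II box:
print(yso_class(12.0, 11.7, 11.5, 10.8))  # -> Class II
```

The class boundaries are fixed constants chosen by the astronomer, which is exactly what makes the procedure deterministic rather than statistical.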

As with astronomical histograms, the class boundaries are typically chosen heuristically, without mathematical calculation.

Percolation, or the `friends-of-friends' algorithm:
1. Plot the data points in a 2-dimensional diagram.
2. Find the closest pair, and call the merged object a `cluster'.
3. Repeat step 2 until some chosen threshold is reached.

Some objects will lie in rich clusters, others have one companion, and others are isolated. (Turner & Gott, "Groups of Galaxies I. A Catalog," ApJS 1976.) In statistics, this is `single linkage hierarchical clustering'.
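The steps above can be sketched in a few lines: link every pair of points closer than a chosen linking length, and read off the connected components as groups. This is a toy illustration with a fixed distance threshold, not the Turner & Gott implementation:

```python
import math

def friends_of_friends(points, linking_length):
    """Friends-of-friends sketch: any two points separated by less than
    `linking_length` are linked, and connected components of the
    resulting graph are the groups.  A union-find structure tracks the
    merges."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < linking_length:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two tight pairs and one isolated object:
pts = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (20, 20)]
print(friends_of_friends(pts, linking_length=0.5))
```

Rich groups, binaries, and isolated objects all fall out of the same threshold, which is why the choice of linking length dominates the result.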

Statistical approach to unsupervised clustering

In unsupervised clustering of a multivariate n x p dataset, the number, location, size, and morphology of the data groupings are unknown. There is no `prior' knowledge of classes.

Nonparametric clustering algorithms:
- Agglomerative hierarchical clustering
- K-means partitioning
- Density-based clustering

Parametric clustering algorithms:
- Normal mixture models

Special class on Friday: isothermal ellipsoid mixture models for young star clusters (Mike Kuhn, PSU).

Nonparametric unsupervised clustering is a very uncertain enterprise, and different algorithms give different outcomes without mathematical guidance (e.g. there is no likelihood to maximize, and no stopping criterion to choose the number of clusters). Results should be viewed with great caution for scientific inference. Parametric unsupervised clustering rests on a stronger foundation (e.g. there is a likelihood to maximize, and BIC/AIC for model selection), but it assumes the clusters in fact follow the chosen parametric form.

Agglomerative hierarchical clustering
1. Construct the distance matrix d(x_i, x_j) for the dataset, assuming a distance metric (e.g. Euclidean with standardized variables). Call each point a `cluster'.
2. Merge the two clusters with the smallest `distance'. There are several common choices for measuring the `distance' between a cluster and a new data point:
- Minimum distance between any constituent point of the cluster and the new point = single linkage clustering. This procedure is equivalent to `pruning' the minimal spanning tree of the multivariate dataset, and is the astronomers' friends-of-friends or percolation algorithm. This method is vulnerable to spurious `chaining' of distinct clusters into elongated superclusters, and is not recommended by statisticians.
- Average distance between the constituent points of the cluster and the new point = average linkage clustering. This often gives an intermediate outcome, but is scale-dependent.
- Maximum distance between any constituent point of the cluster and the new point = complete linkage clustering. This is a conservative procedure that tends to give hyperspherical clusters.
- Minimize the intra-cluster variances (W matrix) = Ward's minimum variance clustering.

The result of an agglomerative (or divisive) clustering procedure is a dendrogram, or tree, showing the membership of each cluster at each stage of the clustering. There is no mathematical basis for choosing where to cut the tree, and thereby establishing the true number of clusters present. Qualitatively, objects combined at greater `heights' in the dendrogram are more dissimilar.

[Figure: comparison of hierarchical clustering methods on primate scapular shapes, N=105, p=7. A. J. Izenman, Modern Multivariate Statistical Techniques (2008).]
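The merge procedure itself is simple to write down. Below is a naive sketch of agglomerative clustering under the single and complete linkage rules described above; instead of cutting a dendrogram, it just stops when k clusters remain (the function name and stopping rule are illustrative):

```python
import math

def agglomerate(points, k, linkage="single"):
    """Naive agglomerative clustering sketch: start with each point as
    its own cluster and repeatedly merge the closest pair of clusters
    until k clusters remain.  'single' linkage uses the minimum
    point-to-point distance between clusters, 'complete' the maximum."""
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(a, b):
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        return min(d) if linkage == "single" else max(d)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

pts = [(0, 0), (0.2, 0), (0.4, 0), (10, 0), (10.2, 0)]
print(agglomerate(pts, k=2, linkage="single"))  # -> [[0, 1, 2], [3, 4]]
```

The O(n^2) distance scan at every merge is what real implementations avoid with priority queues; the linkage choice, however, enters only through `cluster_dist`, which makes the sensitivity of the result to that choice easy to see.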

k-means partitioning

An important non-hierarchical approach to multivariate clustering:
- Choose k, the number of clusters suspected to be present
- Select k seed locations in p-space
- Iteratively (re)assign each object to the nearest cluster to minimize W
- Recalculate cluster centroids after objects are added or subtracted
- Stop when no reassignments are made

Notes:
- Often run with different k and different seeds
- The pairwise distance matrix is not calculated or stored, so the method is good for large datasets
- An NP-hard problem, but with computationally efficient approximations
- Several computational procedures: Lloyd's algorithm (Voronoi iteration), the EM algorithm, kd-trees, the Linde-Buzo-Gray algorithm, ...
- Important variant: k-medoid partitioning (to reduce outlier effects)

Very important in data mining, computer vision, and other computationally intensive problems.
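The iterate-until-no-reassignments loop above is Lloyd's algorithm. A minimal sketch, assuming random data points as the k seed locations:

```python
import math, random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm sketch: seed k centroids from random data
    points, then alternate (1) assigning every object to its nearest
    centroid and (2) recomputing centroids, stopping early when the
    assignment no longer changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:          # converged: no reassignments
            break
        labels = new_labels
        for j in range(k):                # recompute each centroid
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(
                    sum(c) / len(members) for c in zip(*members)
                )
    return labels, centroids

pts = [(0, 0), (0.5, 0), (0, 0.5), (9, 9), (9.5, 9), (9, 9.5)]
labels, cents = kmeans(pts, k=2)
print(labels)
```

Note that no pairwise distance matrix is ever formed, only n x k distances per iteration, which is why the method scales to large datasets; rerunning with different seeds guards against convergence to a poor local minimum of W.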

Density-based clustering

Computer scientists and statisticians have developed methods for cases familiar to astronomers, where several clusters may exist in a background of unclustered objects. The methods assign objects either to clusters or to an unclustered background. Unlike k-means, these methods do not need a specified number of clusters.

- Friedman & Fisher (1998) `bump hunting' (PRIM): progressively shrink hyperrectangles to increase the enclosed density of points, with a preset minimum population. Remove the box from the dataset, and repeat. User interaction is permitted. Related to CART.
- DBSCAN (density-based spatial clustering of applications with noise) by Ester et al. (1996): the user specifies the minimum population and maximum extent (`reach') of a cluster. A cluster is expanded based on a single-linkage criterion.
- Others include BIRCH, DENCLUE, CHAMELEON, OPTICS, ...
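A compact sketch of the DBSCAN idea, with the user-chosen parameters named `eps` (the reach) and `min_pts` (the minimum population); the label -1 marks the unclustered background:

```python
import math

def dbscan(points, eps, min_pts):
    """DBSCAN sketch (after Ester et al. 1996).  Points with at least
    `min_pts` neighbors within `eps` (counting themselves) are core
    points; clusters grow from core points by single-linkage expansion;
    everything left over is labeled -1 (noise / background)."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                # noise, unless claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster       # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # core point: keep expanding
    return labels

pts = [(0, 0), (0.1, 0), (0.2, 0), (5, 5), (5.1, 5), (5.2, 5), (99, 99)]
print(dbscan(pts, eps=0.3, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never specified; it emerges from the density thresholds, and the isolated point is assigned to the background exactly as the slide describes.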

Normal mixture models

These are parametric models where the multivariate dataset is assumed to consist of k multivariate normal (MVN) clusters. Each cluster has a hyperellipsoidal morphology extending over the entire space, with mean vector mu_j and covariance matrix Sigma_j, where j = 1, ..., k. The model has 2kp+k+1 parameters: k means and k variances in p dimensions, k mixture weights, and k itself.

Parameters are estimated by MLE using the EM algorithm:
- Seed values of k, mu_j, and a lower bound to Sigma_j must be provided
- E step: calculate the likelihood of each object lying in each cluster
- M step: cluster mu_j and Sigma_j are updated with weighted objects
- Iterate EM until the likelihood (or MAP for Bayesians) is maximized
- Run for different k, with model selection from BIC or bootstrap

Other techniques include the use of W and B matrices, robust procedures, and penalties for roughness. Codes include EMMIX, MCLUST, and AutoClass (http://ti.arc.nasa.gov/tech/rse/synthesis-projects-applications/autoclass, Cheeseman & Stutz 1995).
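The E and M steps above can be sketched in the simplest non-trivial case: a two-component normal mixture in one dimension. This is an illustration only (the full MVN case replaces the scalar means and variances with mean vectors and covariance matrices); the seeding from the data extremes and the variance floor are ad hoc choices standing in for real initialization and the required lower bound on Sigma_j:

```python
import math, random

def em_gmm_1d(data, iters=200):
    """EM for a two-component 1-D normal mixture.  E step: compute each
    point's membership probability (responsibility) under each
    component.  M step: update weights, means, and variances with those
    responsibilities.  Means are seeded at the data extremes."""
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]                        # mixture weights

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w[j] * pdf(x, mu[j], var[j]) for j in range(2)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M step: weighted updates of weights, means, and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            w[j] = nj / len(data)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var[j] = max(
                sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,                     # crude lower bound on the variance
            )
    return w, mu, var

rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(10, 1) for _ in range(300)]
w, mu, var = em_gmm_1d(data)
print(sorted(mu))
```

Unlike the nonparametric methods, the fitted model has a likelihood, so different k (here fixed at 2) can be compared with BIC or a bootstrap, as the slide notes.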