Separating Signal from Background


Department of Statistics, Columbia University. February 6, 2009. Collaborative work with Matthew Walker, Mario Mateo & Michael Woodroofe.

Outline: Separating Signal and Foreground Stars

Problem

Most astronomical data sets are polluted to some extent by foreground/background objects ("contaminants" or "noise") that can be difficult to distinguish from the objects of interest ("members" or "signal"). Contaminants may have the same apparent magnitudes, colors, and even velocities as members. How do you separate out the signal stars? We develop an algorithm for evaluating membership: estimating the model parameters and the probability that an object belongs to the member population.

Example

Data on stars in nearby dwarf spheroidal (dSph) galaxies. Data: (X_{1i}, X_{2i}, V_{3i}, σ_i, ΣMg_i, ...). Velocity samples suffer from contamination by foreground Milky Way stars.

Approach

Our method is based on the Expectation-Maximization (EM) algorithm. We assign parametric distributions to the observables, in most cases derived from the underlying physics. The EM algorithm provides estimates of the unknown parameters (mean velocity, velocity dispersion, etc.), and also the probability that each star belongs to the signal population.

A toy example

Suppose N ~ Poisson(b + s) is the number of stars observed, where s is the rate for observing a member star and b is the foreground rate. Given N = n, we have W_1, ..., W_n ~ f_{b,s}, where the data are {W_i = (X_{1i}, X_{2i}, V_{3i}, σ_i)}_{i=1}^n and

    f_{b,s}(w) = [b f_b(w) + s f_s(w)] / (b + s).

We assume that f_b and f_s are parameterized probability densities (modeled by the underlying physics).
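In code, the two-component density above is a rate-weighted average of the foreground and signal densities. A minimal sketch, where a narrow and a broad Gaussian stand in for the physically motivated f_s and f_b (all numerical values are illustrative, not from the talk):

```python
import numpy as np
from scipy.stats import norm

def mixture_density(w, b, s, f_b, f_s):
    """Two-component density f_{b,s}(w) = [b*f_b(w) + s*f_s(w)] / (b + s)."""
    return (b * f_b(w) + s * f_s(w)) / (b + s)

# Illustrative stand-ins for the physical models:
f_s = lambda v: norm.pdf(v, loc=55.0, scale=8.0)    # signal: narrow velocity peak
f_b = lambda v: norm.pdf(v, loc=0.0, scale=100.0)   # foreground: broad distribution

# Density is high near the member velocity peak, low far from it
print(mixture_density(55.0, b=200.0, s=100.0, f_b=f_b, f_s=f_s))
```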

For galaxy (member) stars: The stellar density (number of stars per unit area) falls exponentially with radius R. The distribution of velocity given position is assumed to be normal with mean µ and variance σ² + σ_i².

For foreground stars: The density is uniform over the field of view. The distribution of the velocities V_{3i} is independent of the positions (X_{1i}, X_{2i}). We adopt the velocity distribution from the Besançon Milky Way model (Robin et al. 2003), which specifies velocity distributions of Milky Way stars along a given line of sight.
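The two component densities can be sketched as follows. This is an assumption-laden illustration: the exponential-profile normalization and the broad Gaussian standing in for the tabulated Besançon velocity distribution are placeholders, not the talk's actual implementation.

```python
import numpy as np
from scipy.stats import norm

def f_signal(R, v, sigma_i, r_e, mu, sigma2):
    """Member-star density: exponential surface-density profile in radius R
    times a Gaussian velocity law with variance sigma^2 + sigma_i^2
    (intrinsic dispersion plus per-star measurement error)."""
    spatial = np.exp(-R / r_e) / (2 * np.pi * r_e**2)   # exponential falloff
    velocity = norm.pdf(v, loc=mu, scale=np.sqrt(sigma2 + sigma_i**2))
    return spatial * velocity

def f_foreground(v, area):
    """Foreground density: uniform over the field of view, with a velocity
    law independent of position. A broad Gaussian is used here as a
    placeholder for the Besancon model's line-of-sight distribution."""
    return (1.0 / area) * norm.pdf(v, loc=0.0, scale=100.0)
```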

Extension: Scenario I

Introduce a non-parametric component. The velocity dispersion was previously assumed constant; now model it as a function of projected radius, σ(R). Estimating σ(R) requires a tuning parameter.
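One generic way to estimate a dispersion profile non-parametrically (not necessarily the estimator used in the talk) is to kernel-smooth the squared velocity residuals; the bandwidth h is exactly the kind of tuning parameter the slide refers to:

```python
import numpy as np

def sigma_of_R(R, v, mu, h):
    """Kernel-smoothed velocity dispersion profile sigma(R): at each grid
    radius, average the squared residuals (v - mu)^2 with Gaussian weights
    of bandwidth h (the tuning parameter). A generic smoother, shown only
    to illustrate the role of the tuning parameter."""
    R, v = np.asarray(R, dtype=float), np.asarray(v, dtype=float)
    grid = np.linspace(R.min(), R.max(), 50)
    W = np.exp(-0.5 * ((grid[:, None] - R[None, :]) / h) ** 2)
    var = (W * (v - mu) ** 2).sum(axis=1) / W.sum(axis=1)
    return grid, np.sqrt(var)
```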

Scenario II

Do not assume an exponential density profile. Assume only that the chance of observing a member star decreases as you move away from the center of the galaxy.

A further extension

Mixture model: f(x) = Σ_{j=1}^k π_j f_j(x), with Σ_{j=1}^k π_j = 1. Model f_1, f_2, ..., f_k as log-concave densities on R^p. No tuning parameter required; the model is completely non-parametric.

Model

Suppose N ~ Poisson(b + s) is the number of stars observed, where s is the rate for observing a member star and b is the foreground rate. Given N = n, we have W_1, ..., W_n ~ f_{b,s}, where the data are {W_i = (X_{1i}, X_{2i}, V_{3i}, σ_i, ΣMg_i, ...)}_{i=1}^n and

    f_{b,s}(w) = [b f_b(w) + s f_s(w)] / (b + s).

We assume that f_b and f_s are parameterized probability densities (modeled by the underlying physics).

The likelihood

Let Y_i be the indicator of a foreground star, i.e., Y_i = 1 if the i-th star is a foreground star and Y_i = 0 otherwise. Note that the Y_i are i.i.d. Bernoulli(b / (b + s)). Let Z = (W, Y, N) be the complete data, where W = (W_1, W_2, ..., W_n) and Y = (Y_1, Y_2, ..., Y_n). The likelihood for the complete data can be written as

    L^C(β) = e^{-(b + s)} (b + s)^N / N! × ∏_{i=1}^N [b f_b(W_i) / (b + s)]^{Y_i} [s f_s(W_i) / (b + s)]^{1 - Y_i}.
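The complete-data log-likelihood is easy to compute directly; note that the N log(b + s) term from the Poisson factor cancels against the log(b + s) terms in the Bernoulli factors. A minimal sketch (the densities f_b, f_s are passed in as vectorized callables):

```python
import numpy as np
from scipy.special import gammaln

def complete_log_likelihood(W, Y, b, s, f_b, f_s):
    """log L^C(beta): Poisson(b+s) count term plus per-star membership terms.
    After cancellation of the N*log(b+s) factors, this reduces to
    -(b+s) - log(N!) + sum_i [Y_i log(b f_b(W_i)) + (1-Y_i) log(s f_s(W_i))]."""
    W, Y = np.asarray(W, dtype=float), np.asarray(Y, dtype=float)
    N = len(W)
    return (-(b + s) - gammaln(N + 1)
            + np.sum(Y * np.log(b * f_b(W)) + (1 - Y) * np.log(s * f_s(W))))
```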

Algorithm

Start with some initial estimate of the parameter β (in our simple example, β = (s, b, µ, σ², ...)).

E-step: Evaluate the expectation of the complete-data log-likelihood given the observed data, under the current estimate of the unknown parameters: Q(β, β_n) = E_{β_n}[l(β) | W, N].

M-step: Maximize Q(β, β_n) with respect to β.

Iterate until the estimates stabilize (monotone improvement of the likelihood is guaranteed!).
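The E/M iteration can be sketched on a stripped-down, velocity-only version of the model: the signal is Gaussian with unknown (µ, σ²), a fixed broad Gaussian stands in for the Besançon foreground, and the mixture weight p plays the role of s/(b + s). This is a simplified illustration, not the talk's full algorithm (no positions, no per-star measurement errors):

```python
import numpy as np
from scipy.stats import norm

def em_membership(v, n_iter=200):
    """EM for a two-component velocity mixture: Gaussian signal vs. a fixed
    broad foreground. Returns (p, mu, var, m), where m[i] is the posterior
    probability that star i is a member."""
    p, mu, var = 0.5, np.median(v), np.var(v) / 4      # crude initial guesses
    fg = norm.pdf(v, loc=0.0, scale=100.0)             # fixed foreground density
    for _ in range(n_iter):
        # E-step: posterior membership probability for each star
        sig = norm.pdf(v, loc=mu, scale=np.sqrt(var))
        m = p * sig / (p * sig + (1 - p) * fg)
        # M-step: membership-weighted updates of weight, mean, dispersion
        p = m.mean()
        mu = np.sum(m * v) / np.sum(m)
        var = np.sum(m * (v - mu) ** 2) / np.sum(m)
    return p, mu, var, m

# Synthetic data: members at 55 km/s (dispersion 8) plus broad foreground
rng = np.random.default_rng(0)
v = np.concatenate([rng.normal(55, 8, 300), rng.normal(0, 100, 200)])
p, mu, var, m = em_membership(v)
```

On this synthetic sample the iteration recovers the member fraction, mean velocity, and dispersion, and m gives each star's membership probability, mirroring the two outputs listed on the slide.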

Summary: EM algorithm with mixture models

Estimates of the unknown parameters. Estimated probability that the i-th star is a member. Allows flexible (non-parametric) modeling of the data. All we need is to form the likelihood!

References
Walker et al. (2008): to appear in the Astronomical Journal
Sen, B., et al. (2008): to appear in Statistica Sinica
Cule, M. L. (2008): submitted