Dark Matter Detection: Finding a Halo in a Haystack


Paul Covington, Dan Frank, Alex Ioannidis

1 Introduction

The predictive modeling competition platform Kaggle™ recently posed the Observing Dark Worlds prize, challenging researchers to design a supervised learning algorithm that predicts the positions of dark matter halos in images of the night sky. Although dark matter neither emits nor absorbs light, its mass bends the path of a light beam passing near it. Regions with high densities of dark matter, termed dark matter halos, exhibit this so-called gravitational lensing, warping light from galaxies behind them. When a telescope captures an image of the sky, such galaxies appear elongated tangential to the dark matter halo in the image. That is, each galaxy appears stretched along a direction perpendicular to the radial vector from the dark matter to that galaxy. Aggregating this phenomenon across a sky filled with otherwise randomly oriented elliptical galaxies, one observes more galactic elongation (ellipticity) tangential to the dark matter than one would otherwise expect (Fig. 1 and 2). The challenge is to use this statistical bias in observed ellipticity to predict the positions of the unseen dark matter halos (Fig. 1 and 2, left).

Figure 1: Left (Training Sky 8): A sky with one halo (white ball at upper left). Galactic ellipticities are notably skewed by the halo. Right (Objective value for Sky 8): Plot of an objective function (described below) that we designed to have its minimum at the most likely location for the dark matter halo. The position of this minimum is marked by a red dot in the plot at left.

Figure 2: Left (Training Sky 34): A sky with three halos (white balls) with our method's predictions indicated by red dots. Due to the interactions of halos with differing strengths on each galaxy, it is no longer easy to discern the positions of the halos. The now 6-d objective (2 dimensions in the position of each halo) cannot be visualized. Right: Ellipticity, the elongation and orientation of an ellipse, can be described by components e1 and e2 such that ‖(e1, e2)‖ ≤ 1.
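The (e1, e2) parameterization in the caption above can be made concrete with a short sketch. This is not from the report's code; it assumes the standard weak-lensing convention |e| = (1 − q)/(1 + q) for an ellipse with axis ratio q = b/a and position angle θ, with e1 = |e| cos 2θ and e2 = |e| sin 2θ, which keeps ‖(e1, e2)‖ below 1.

```python
import numpy as np

def ellipticity_components(q, theta):
    """Convert axis ratio q = b/a (0 < q <= 1) and position angle theta
    (radians, measured from the x-axis) to the (e1, e2) components,
    using the |e| = (1 - q) / (1 + q) convention."""
    e = (1.0 - q) / (1.0 + q)
    return e * np.cos(2.0 * theta), e * np.sin(2.0 * theta)

# A circular galaxy (q = 1) has zero ellipticity in both components.
e1, e2 = ellipticity_components(1.0, 0.3)

# An ellipse aligned with the x-axis (theta = 0) has e1 > 0, e2 = 0.
e1x, e2x = ellipticity_components(0.5, 0.0)
```

The factor of 2θ reflects that an ellipse is unchanged under a 180° rotation, which is why e2 measures elongation along the 45° line rather than the y-axis.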

2 Methods

2.1 Datasets and Competition Metric

The training data consists of 300 skies, each with between 300 and 700 galaxies. For each galaxy i, a coordinate (x^(i), y^(i)) is given, ranging from 0 to 4200 pixels, along with an ellipticity (e1^(i), e2^(i)). Here e1^(i) represents stretching along the x-axis (positive for elongation along x and negative for elongation along y), and e2^(i) represents elongation along the line 45° to the x-axis (Fig. 2, right). Each sky also contains one to three halos, whose coordinates are given. Predicting these halo coordinates in test-set skies is the challenge. In these test skies only the galaxy coordinates, ellipticities, and the total number of halos are provided. The algorithm's performance on a test set of skies is evaluated by Kaggle according to the formula m = F/1000 + G, where F represents the average distance of a predicted halo from the actual halo and G is an angular term that penalizes algorithms with a preferred location for the halo in the sky (positional bias).

2.2 Objective Function

Physically, the distortion induced by a halo on a galaxy's ellipticity should be related to the distance of closest approach to the halo of a light beam traveling from the galaxy to the observer. We explored the functional form of this radial influence on distortion by plotting, for each galaxy in all single-halo training skies, the distance of the galaxy from the halo on the horizontal axis and, on the vertical, either e_tan = −(e1 cos(2φ) + e2 sin(2φ)) (the elongation along the tangential direction) or e_rad (the complementary elongation along a line 45° to the tangential direction) (Fig. 3). Two functional forms were proposed for the kernel K_θ(r) describing the dependence of tangential ellipticity on radius: exp(−(r/a)^2) (Gaussian) and the more general exp(−(r/a)^d) (learned exponential). The parameter vector, θ = a or θ = (a, d) respectively, is learned from the training skies (see the Parameter Fitting section below).
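The tangential ellipticity used above can be computed directly from the galaxy positions and ellipticity components. A minimal numpy sketch (the function name is illustrative, and sign conventions for e_tan differ across references):

```python
import numpy as np

def tangential_ellipticity(gx, gy, e1, e2, hx, hy):
    """Distance r and tangential ellipticity e_tan of each galaxy about a
    candidate halo position (hx, hy).  phi is the position angle of the
    galaxy relative to the halo; the minus sign makes a galaxy stretched
    perpendicular to the halo direction come out positive."""
    dx, dy = gx - hx, gy - hy
    phi = np.arctan2(dy, dx)
    r = np.hypot(dx, dy)
    e_tan = -(e1 * np.cos(2 * phi) + e2 * np.sin(2 * phi))
    return r, e_tan

# A galaxy due east of the halo (phi = 0), elongated along y (e1 < 0),
# is tangentially aligned, so e_tan comes out positive.
r, e_tan = tangential_ellipticity(np.array([100.0]), np.array([0.0]),
                                  np.array([-0.3]), np.array([0.0]),
                                  0.0, 0.0)
```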
Figure 3: Tangential ellipticity (left) for galaxies in a particular sky varies with distance r from the dark matter halo, while radial ellipticity (right) does not; the Gaussian and learned exponential model fits are overlaid.

For a one-halo sky the ellipticity of a particular galaxy is then modeled as

ê1(x, y, α) = −α K_θ(r) cos(2φ) + ε1
ê2(x, y, α) = −α K_θ(r) sin(2φ) + ε2

where r and φ are the distance and position angle of the galaxy relative to the halo, ε1 and ε2 represent the random components of the galaxy's ellipticity, and α is a parameter associated with each halo that represents its strength and is determined from the observed galaxy data (see the Optimization

section below). For a multiple halo sky the influence of each of the halos on a galaxy s ellipticity is assumed to form a linear superposition. The predicted ellipticity for a galaxy i is thus, Nh ê (i) ({(x, y)}, {α}) = = Nh ê (i) ({(x, y)}, {α}) = = ) cos(φ(i) ) + ɛ(i) ) sin(φ(i) ) + ɛ(i) The obective function is formed by summing the squared errors between the model predicted ellipticity components (ê (i) and ê (i) ) and the observed ellipticity components (e(i) and e (i) ) over all galaxies in a given sky. E({(x, y)}, {α}) = i e (i) + ) cos(φ(i) ) + e (i) + ) sin(φ(i) ) Finally, predicted halo locations {(x, y )} and strengths {α } are found by minimizing the obective for a particular sky, ({(x, y )}, {α }) = argmin (x,y,α) E({(x, y)}, {α}) This obective is a squared error loss function for the deviation of the observed galaxy ellipticity components from their maximum likelihood predictions given the halo positions. Assuming that the deviations (ɛ and ɛ ) of each galaxy s ellipticity components from their model predictions follows a bivariate Gaussian with spherical covariance, the optimum of this squared error loss function obective also is the maximum likelihood estimate for the halo positions. Moreover, assuming a uniform prior probability for the halos positions in the sky, this maximum likelihood estimator for the halo positions is also the Bayesian maximum a posteriori estimator for the halo positions. Thus, under these assumptions the highest probability positioning for the halos given the observed galaxy ellipticities is the argmin of this obective..3 Optimization The optimal halo strength parameters α i for each galaxy can be found analytically, since the obective is quadratic in the α i. 
Differentiating the objective with respect to each α_j and setting it equal to zero yields a linear system for the α_j,

A α = b,  with  A_{j,k} = Σ_i K_θ(r_j^(i)) K_θ(r_k^(i)) cos(2(φ_j^(i) − φ_k^(i)))  and  b_j = Σ_i K_θ(r_j^(i)) e_tan^(i)(x_j, y_j)

With the α_j* determined explicitly, and assuming the model parameters are already fit (see the Parameter Fitting section below), we need to optimize the objective only over the space of possible halo positions. Each halo may be positioned anywhere in the 2-dimensional sky image, so we have a 2-d, 4-d, or 6-d search space for the one-, two-, and three-halo problems respectively. This optimization is performed using a downhill simplex algorithm, employing random starts to overcome local minima.

2.4 Parameter Fitting

The radial influence models exp(−(r/a)^2) (Gaussian) and exp(−(r/a)^d) (learned exponential) require fitting values for the parameter vector θ, where θ = a or θ = (a, d) respectively. Since we consider these models to be approximations to a universal physical law, we require these parameters to be constant

for all halos across all skies. Fixing θ both prevents overfitting to the training data and allows us to arrive at a more accurate estimate of it. When fitting the more general learned exponential model these benefits are crucial, since, in contrast to a, which just alters the scaling, the parameter d alters the functional form of the model, introducing significant flexibility and implying very high dimensionality. The parametric estimation of θ was performed similarly to the non-parametric estimation of halo positions previously described. Specifically, the sum of squared errors between modeled and observed galaxy ellipticities over all single-halo training skies was minimized with respect to θ.

3 Discussion

To determine the efficacy of our optimization algorithm, we compared the value of the objective at the true halo locations in the training data to the value returned by the optimization routine (see Fig. 4). For a modest number of random starts (corresponding to a few minutes per sky on a desktop machine), the optimization algorithm plateaus, consistently finding a minimum below that of the true solution. This diagnostic suggests the optimization is finding the objective function's global minimum; but this minimum does not correspond to the true halo positions. We concluded that we should focus on improving our objective function rather than pursuing improvements to our optimization algorithm. We decided to improve our objective by refining our model of K_θ(r) from the Gaussian form to the learned exponential form.

Figure 4: A line is plotted for each sky showing the minimal normalized objective value, given by (E(x*, y*) − E(x_true, y_true)) / E(x_true, y_true), returned for increasing numbers of random starts. (The predicted values were derived with our Gaussian model.)

Another question addresses the validity of our
initial assumption of Gaussian-distributed deviations (ε1^(i) and ε2^(i)) of the galaxy ellipticity components (e1^(i) and e2^(i)) from their predicted values (ê1^(i) and ê2^(i)). This assumption was crucial, since it allowed us to use a squared-error loss objective to find our maximum likelihood (and maximum a posteriori) estimators. The quadratic form of this objective further allowed us to solve explicitly for the halo strength parameters α_j*, reducing the dimensionality of our final optimization space. Unfortunately, the assumption is clearly not true. The support of ε1^(i) and ε2^(i) is the unit circle (Fig. 2), so their deviations must come from a pdf with a similarly limited support, not a Gaussian with infinite support. However, this theoretical objection is not a severe problem in practice. Figure 5 shows that for moderate e_tan values (less than 0.4) the ellipticity pdfs are close to bivariate Gaussians with spherical covariance. Moreover, so little probability density is near the unit-circle boundary that approximating the ellipticity deviations with a Gaussian pdf is tolerable. This self-consistently justifies our original use of a minimum squared-error objective. For regions with predicted e_tan (plotted as ẽ1) larger than 0.4, approximating the ellipticity deviations with a bivariate Gaussian of spherical covariance becomes less tenable; see Fig. 5, bottom plot. Fortunately, very few galaxies lie so close to a dark matter halo as to be inside such a high predicted

e_tan region. (One might incorporate the departure from spherical covariance for galaxies in high predicted e_tan regions by adjusting the weights of the squared errors of e1 and e2 to vary with the magnitude of the predicted e_tan for the galaxy. This would still not alleviate the departures from even a spherical Gaussian shape that occur for extreme e_tan values.)

Figure 5: These four plots look across all training skies at regions of those skies where the tangential ellipticity is predicted (by the learned exponential model) to be 0.1, 0.2, 0.3, and 0.4 respectively. The actual tangential ellipticity observed for each galaxy in such regions is labeled ẽ1, and its complementary ellipticity component is labeled ẽ2. By aggregating these ellipticity values across the many training galaxies found in each region, and using kernel density estimation, the corresponding ellipticity pdf can be found. These are the pdfs plotted above.

4 Results

Our dark matter halo position prediction algorithm performed well on both the training set and the test set and compared favorably with the other Kaggle™ prize entrants; see Table 1 below, which compares the performance of our initial Gaussian model and our later, more general learned exponential model to the Kaggle™-provided benchmarks. Against all benchmarks we showed superior results. Indeed, our algorithm outperformed even an astrophysical standard, the Lenstool Maximum Likelihood code.

                               1-halo skies  2-halo skies  3-halo skies  all skies  Kaggle
                               distance      distance      distance      distance   metric
  gridded signal (Kaggle)      645           767           483           65         .77
  maximum likelihood (Kaggle)  63            -             -             -          -
  Gaussian model               77            84            7             8          .955
  learned exponential          33            74            934           73         .856

Table 1: The distance metric gives the average distance of predicted halos from actual halos on the training set. The Kaggle metric was described in Methods (lower is better) and describes error on the test set.
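The training-set distance columns in Table 1 are averages of prediction error. A minimal sketch of such a statistic, assuming predicted and true halos are simply paired in order (the official Kaggle metric applies its own halo matching and adds the angular bias term G):

```python
import numpy as np

def average_distance_error(pred, true):
    """Mean Euclidean distance between predicted and true halo positions,
    paired in the order given.  Reproduces only the simple distance part
    (F) of the competition score, not the angular term G."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.hypot(pred[:, 0] - true[:, 0],
                                  pred[:, 1] - true[:, 1])))
```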
Kaggle™ Ranking (team Skynet): 55th out of 337 competitors, with a score of .9783