MACHINE LEARNING FOR CLUSTER-GALAXY CLASSIFICATION

Silvia de Castro García
Supervisors: Dr. Ricardo Pérez Martínez, Dra. Ana María Pérez García
16/03/2018

INTRODUCTION: Context

Galaxy clusters are giant cosmic laboratories harboring thousands of objects with different origins and characteristics. It is commonly accepted that the evolution of galaxies within clusters differs from that in the field, although the main processes involved are still poorly understood. Key to a full characterization of these objects in such high-density environments is a comprehensive study of a coherent set of clusters, using a wide variety of photometric data from different space observatories and from optical surveys carried out with ground-based telescopes.

[Figure: galaxy cluster SDSS J1044+4112]

INTRODUCTION: Problem

However, current classification techniques are limited and do not scale well to the vast volume of data, and the variety of data formats, now available.

INTRODUCTION: Solution

Apply machine learning techniques (both supervised and unsupervised learning) to multi-wavelength datasets in order to classify cluster galaxies efficiently.

INTRODUCTION: Science Case

Objective: cluster membership determination. Develop a fast photo-z estimator able to establish cluster membership with an accuracy comparable to that of spectroscopic redshifts.
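The slides do not say how accuracy "comparable to spectroscopic redshifts" will be measured. One common convention in the photo-z literature, assumed here rather than taken from this work, compares each photo-z with its spec-z through the normalised residual dz = (z_phot - z_spec)/(1 + z_spec), summarised by its median (bias), a normalised median absolute deviation (sigma_NMAD) and an outlier fraction; the |dz| > 0.15 cut is a frequently used but arbitrary choice. A minimal sketch with a hypothetical function name:

```python
import statistics

def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (bias, sigma_nmad, outlier_fraction) for paired photo-z / spec-z.

    Uses normalised residuals dz = (z_phot - z_spec) / (1 + z_spec).
    The 0.15 outlier cut is a common convention, not a value from this work.
    """
    if len(z_phot) != len(z_spec) or not z_spec:
        raise ValueError("need two non-empty sequences of equal length")
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    bias = statistics.median(dz)
    # 1.48 rescales the median absolute deviation to a Gaussian sigma
    sigma_nmad = 1.48 * statistics.median([abs(d - bias) for d in dz])
    outlier_fraction = sum(abs(d) > outlier_cut for d in dz) / len(dz)
    return bias, sigma_nmad, outlier_fraction

# Illustrative values only, not results from the ZwCl0024+1652 catalogue
bias, nmad, f_out = photoz_metrics([0.41, 0.38, 0.40, 1.00], [0.40, 0.40, 0.40, 0.40])
print(f"bias={bias:.4f}  sigma_NMAD={nmad:.4f}  outliers={f_out:.0%}")
```

Medians are used instead of means so that a few catastrophic photo-z failures do not dominate the bias and scatter estimates; those failures are reported separately through the outlier fraction.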

BACKGROUND

Machine learning techniques are starting to be widely used in astronomy. Several works address photometric redshift estimation in different domains:
- Cooperative photometric redshift estimation, S. Cavuoti et al. 2017
- METAPHOR: a machine-learning-based method for the probability density estimation of photometric redshifts, S. Cavuoti et al. 2017
- Mapping the galaxy color-redshift relation: optimal photometric redshift calibration strategies for cosmology surveys, D. Masters et al. 2015
- Photometric redshifts for quasars in multi-band surveys, M. Brescia et al. 2013

THE DATA

Multi-wavelength photometric catalogue of the cluster ZwCl0024+1652 produced by Pérez Martínez et al. (2016), combining data from 7 different catalogues:
- XMM-Newton and Chandra catalogues for X-ray data;
- GALEX for ultraviolet data;
- the Moran et al. (2005) catalogue of optical/NIR information, including HST and ground-based broad-band data (from the CFHT and Hale 200-inch telescopes);
- IRAC and MIPS data from Spitzer;
- PACS and SPIRE data from Herschel.

In total: 19,670 sources; 1,262 cluster members; 32 photometric points.

THE TOOL

WEKA (Waikato Environment for Knowledge Analysis), https://www.cs.waikato.ac.nz/ml/weka/witten_et_al_2016_appendix.pdf

WEKA is a data-mining framework providing state-of-the-art machine learning techniques.

[Figure: Weka GUI Explorer and visualization]

Advantages:
- Easy-to-use GUI available;
- Highly portable (written in Java);
- Wide set of ML techniques, including data pre-processing, classification, regression, clustering, association rules and visualization capabilities;
- Open source (GNU General Public License).

Drawbacks:
- Specific, dedicated format (*.arff); no FITS compatibility;
- Not widely used in astronomy, so few use cases are available;
- Models cannot be trained on large datasets from the Weka Explorer GUI, although the developers claim this should be possible with the CLI (further work for Big Data shall be explored).

SCIENCE CASE 1: PHOTO-Z ESTIMATOR

1. Data pre-processing
- FITS2ARFF conversion;
- Adding attributes: deriving colours from the photometric points (10 colours);
- Removing redundant/irrelevant attributes.

2. Clustering
- Objective: find clusters in the colour data of the training set (1262 galaxies with spectroscopic z).
- ML technique: k-means algorithm with Euclidean distance.

3. Classifying
- Objective: classify the test set using the clusters found in the previous step.
- ML technique: k-nearest neighbours.

4. Photo-z determination
- Objective: estimate the photo-z.
- ML technique: computing the median photo-z of the sources in the cluster.
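The four steps above can be sketched in plain Python as follows. This is not the author's WEKA workflow, and several details are assumptions: colours are built from adjacent photometric bands (the slides only state that 10 colours were derived), k-means uses a seeded random initialisation, each test source is assigned to a cluster by a majority vote among its k nearest training galaxies, and step 4 is read as the median spectroscopic redshift of the training galaxies in that cluster. All function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

def colours(mags):
    """Colours from adjacent bands, m[i] - m[i+1] (hypothetical band pairing)."""
    return [a - b for a, b in zip(mags, mags[1:])]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means with Euclidean distance and seeded random initialisation."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids

def knn_cluster(x, train_points, train_labels, k=5):
    """Assign x to the cluster most common among its k nearest training points."""
    nearest = sorted(range(len(train_points)),
                     key=lambda i: math.dist(x, train_points[i]))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def photo_z(x, train_points, train_labels, train_zspec, k=5):
    """Median spec-z of the training galaxies in the cluster assigned to x."""
    cluster = knn_cluster(x, train_points, train_labels, k)
    return statistics.median(
        [z for z, lab in zip(train_zspec, train_labels) if lab == cluster])

# Toy training set: colour vectors with spectroscopic redshifts (made-up values)
train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
zspec = [0.30, 0.32, 0.34, 0.80, 0.82, 0.84]
labels, _ = kmeans(train, k=2)
print(photo_z([0.05, 0.05], train, labels, zspec, k=3))  # prints 0.32
```

In a real run the colour vectors would come from `colours()` applied to each source's magnitudes, and missing bands would need handling before any distance is computed.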

IN PROGRESS
- Pre-processing: selecting the most significant colours;
- Clustering: improving the selection of k for k-means (elbow method); Manhattan vs Euclidean distance;
- Classifying: testing different k-NN options.
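The slides list the elbow method and the Manhattan vs Euclidean comparison as work in progress without giving details, so the following is only a possible sketch under stated assumptions. The elbow is picked automatically as the k with the largest second difference of the cost curve, which is one heuristic among several (the curve is often inspected by eye instead). A deterministic farthest-first initialisation keeps the toy example reproducible. With Manhattan distance the centroid update switches from the mean to the per-coordinate median (i.e. k-medians), because the median is the point that minimises the summed absolute distances. Function names are hypothetical.

```python
import math
import statistics

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

# metric -> (distance for assignment, centroid rule, per-point cost).
# The mean minimises summed squared Euclidean distances; the per-coordinate
# median minimises summed Manhattan distances.
METRICS = {
    "euclidean": (math.dist, statistics.fmean, lambda p, c: math.dist(p, c) ** 2),
    "manhattan": (manhattan, statistics.median, manhattan),
}

def farthest_first(points, k, dist):
    """Deterministic seeding: start at points[0], then add the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans_cost(points, k, metric="euclidean", n_iter=100):
    """Run k-means (k-medians for Manhattan) and return (labels, total cost)."""
    dist, centre, cost = METRICS[metric]
    centroids = farthest_first(points, k, dist)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [centre(col) for col in zip(*members)]
    total = sum(cost(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, total

def elbow(points, k_max, metric="euclidean"):
    """Return (k at the sharpest bend of the cost curve, costs for k = 1..k_max)."""
    if k_max < 3:
        raise ValueError("k_max must be at least 3 to locate a bend")
    costs = [kmeans_cost(points, k, metric)[1] for k in range(1, k_max + 1)]
    # costs[i] is the cost for k = i + 1; use the largest second difference
    bend = max(range(1, k_max - 1),
               key=lambda i: costs[i - 1] - 2 * costs[i] + costs[i + 1])
    return bend + 1, costs

# Toy colour data with three well-separated groups (made-up values)
pts = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (0, 10), (1, 10), (0, 11)]
for metric in ("euclidean", "manhattan"):
    k_best, costs = elbow(pts, 4, metric)
    print(metric, k_best, [round(c, 2) for c in costs])
```

The two cost curves are in different units (summed squared vs. summed absolute distances), so only the position of the bend, not the cost values themselves, should be compared across the two metrics.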

NEXT STEPS
- Keep tuning the clustering and classification methods to improve results;
- Explore other ML techniques for the photo-z estimator (e.g. Self-Organised Maps or Expectation Maximization for clustering; Random Forest, SVM and Deep Learning for classification);
- Explore the semi-supervised approach;
- Extend the methodology to other cluster data;
- Compare results and draw conclusions.

Technology:
- Test WEKA CLI performance with larger datasets;
- Explore WEKA for Big Data;
- Check the suitability of WEKA vs. other tools (Python SciPy / Keras).

QUESTIONS? THANK YOU