Application of ILIUM to the estimation of the T_eff–[Fe/H] pair from BP/RP

prepared by: Coryn A.L. Bailer-Jones, Max Planck Institute for Astronomy, Heidelberg (Email: calj@mpia.de)
approved by:
reference:
issue: 1
revision: 1
date: 2009-02-10
status: Issued

ILIUM on T_eff, [Fe/H] (Technical Note)

Abstract

A new parameter estimation method, ILIUM, was introduced in GAIA-C8-TN-MPIA-CBJ-042, where it was demonstrated on the problem of estimating T_eff and log g from noisy BP/RP spectra. The current implementation is limited to two APs (one "strong" and one "weak"). Here I present results of estimating T_eff and [Fe/H] from the same data. For F, G, K dwarfs (4000 ≤ T_eff ≤ 7000 K) with metallicities ranging from +1 to −4 dex, we can estimate [Fe/H] to an accuracy of 0.14 dex and T_eff to 0.4% (mean absolute errors) at G=15, to 0.26 dex and 0.6% respectively at G=18.5, and to 0.82 dex and 1.6% at G=20 (data with an end-of-mission SNR corresponding to 72 transits, but non-oversampled spectra). The errors for giants are 50% larger at G=15 but only 10% larger at the fainter magnitudes. Surprisingly, the performance is hardly improved when stars with [Fe/H] < −2.0 are removed from the analysis. If ILIUM is applied to stars with unknown log g (having been trained on the full range of log g), then the performance at G=18.5 is 0.40 dex in [Fe/H] and 1.2% in T_eff. (Given that dwarfs heavily outnumber giants in a magnitude-limited sample, better overall results would be obtained if we trained on dwarfs.) From this I show that all three APs (T_eff, [Fe/H] and log g) can be estimated by successively applying the two 2-AP versions of ILIUM.

Contents

1 Introduction
2 Application to dwarfs
  2.1 G=15
  2.2 G=18.5
  2.3 G=20
3 Application to giants
4 Application to stars with unknown log g
5 Estimating all three astrophysical parameters
A Why we cannot correct for the systematic errors

1 Introduction

Estimation of the pair T_eff and [Fe/H] is a realistic two-parameter problem, because either (a) we can assume that most stars in a magnitude-limited survey are dwarfs, or (b) with Gaia we can use the parallaxes (via the derived absolute magnitude) along with a Gaia colour to independently estimate log g. I therefore apply ILIUM to estimate T_eff and [Fe/H] separately for dwarfs and giants. The dwarf sample is defined as having log g equal to 4.0, 4.5 or 5.0 dex. This comprises 1716 stars and is randomly split into equal-sized train and test sets. (Recall that the training data are used to fit the forward model and do the nearest-neighbour initialization.) The giant sample is defined as objects with log g equal to 1.0, 1.5, 2.0, 2.5 or 3.0 (1882 stars). The AP distribution for the dwarfs is shown in Fig. 1 (the grid for the giants is almost identical). See Fig. 5 of GAIA-C8-TN-MPIA-CBJ-042 for the T_eff–log g distribution.

FIGURE 1: The AP grid for the dwarfs (log g ∈ {4.0, 4.5, 5.0}). Axes: T_eff / K and [Fe/H] / dex.

The spectral data are exactly as in CBJ-042 (Sordo & Vallenari 2008), that is, they have a SNR corresponding to a stack of 72 transits, yet with the original wavelength dispersion, i.e. non-oversampled spectra. Oversampling should improve the spectral resolution, which may improve performance above that reported here. On the other hand, the spectral combination and oversampling procedure may introduce additional errors not yet accounted for in the simulations (although GOG does currently include some additional error sources beyond the usual triad of source, background and CCD readout; Zaldua et al. 2008). To get an idea of the quality of the spectra, Fig. 2 plots the median and the 10% and 90% quantiles of the SNR at each wavelength at G=18.5 and G=20.0. (Compared to the G=18.5 curve, the SNR at G=15 is 8–6 times larger between 400 and 650 nm and 7–12 times larger between 650 and 1000 nm.)

FIGURE 2: The median SNR (solid line) and 0.1 and 0.9 quantiles (dashed lines) across the dwarf sample for G=18.5 (red) and G=20.0 (blue). Axes: wavelength / nm and SNR per band.

ILIUM is used in its default mode with the internal parameters exactly as shown in Table 2 of CBJ-042. That is, the same values of parameters used there for the (standardized) log g measures are used here for the (standardized) [Fe/H] measures. Performance is reported using statistics of the AP residuals: the RMS, σ_φ; the mean absolute residual, ⟨|δφ|⟩; and the mean residual, ⟨δφ⟩ (a measure of the systematic error).

2 Application to dwarfs

The forward model fits are shown in Figs. 3 and 4. The scatter in the plot against [Fe/H] is due to the log g variations. The fits are as good as we could expect.

FIGURE 3: Predictions of the full forward model for the dwarfs as a function of log(T_eff) at constant [Fe/H] = −1.0 in 12 different bands (with wavelength in nm at the top of each panel: 338, 362, 392, 432, 488, 573, 681, 731, 789, 857, 934, 1020). The black crosses (barely visible) are the (noise-free) grid points, the red stars are the forward model predictions (at randomly selected AP values) and the blue circles the noisy (G=15) grid points. The flux plotted on the ordinate is in standardized units.

FIGURE 4: As Fig. 3, but now showing predictions of the full forward model as a function of [Fe/H] at constant T_eff = 5000 K.

2.1 G=15

The general pattern of the iterative updates is similar to that seen in Fig. 8 of CBJ-042, so it is not shown. The spectra of AP updates are interesting because they allow us to see which spectral bands contribute to the APs for which stars (and how these evolve over the iterations), but they are too numerous to include here. The residuals are shown in Fig. 5. The summary statistics are

ILIUM, dwarfs, G=15, full AP range
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.15      5.8e−4
  ⟨|δφ|⟩      0.68      0.0058
  σ_φ         1.25      0.0087

The overall metallicity performance is poor, because the sample includes many hot stars, and it is well known that [Fe/H] cannot be accurately estimated for hotter stars. This is also responsible for the systematic overestimation of [Fe/H] at low metallicity (the trend in the middle left panel). There is also a strong dependence of both the [Fe/H] and T_eff accuracy on T_eff, with both being worse for hotter stars (middle and bottom right panels). This is due to the metallicity spread, because we saw no such trend for constant metallicity. (Indeed, we see the opposite effect, namely lower T_eff error for hot stars: see Fig. 11 of CBJ-042.)

This strong dependence of the results on AP renders the above summary statistics (which average over a more or less uniform sampling in log(T_eff) up to 14 000 K) rather meaningless. For this reason we replot the residuals and recalculate the statistics after removing stars with true T_eff > 7000 K. (In this and all following examples we only remove objects from the analysis. ILIUM is not retrained, so it can still produce APs spanning the whole grid plus/minus the permitted 10% extrapolation.)

ILIUM, dwarfs, G=15, T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        5.6e−3    5.2e−5
  ⟨|δφ|⟩      0.14      0.0017
  σ_φ         0.24      0.002

The residuals are plotted in Fig. 6. The results are now dramatically different: the residuals for [Fe/H] have dropped by a factor of 5 and those for log(T_eff) by a factor of 3. (Note that an error in log(T_eff) of 0.0017 corresponds to a fractional error in T_eff of 0.4%: multiply by ln 10 ≈ 2.3.) We can still estimate [Fe/H] to an accuracy of better than 0.5 dex even at [Fe/H] = −4.0. (However, from the middle left panel we do see a slight tendency to overestimate the metallicity for [Fe/H] = −3.0 and −4.0.) The systematic in the [Fe/H] residual at low [Fe/H] has now vanished, confirming that it is a problem only for the hot stars.
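The conversion between a residual in log(T_eff) and a percentage error in T_eff can be checked directly. The following is a minimal sketch, assuming base-10 logarithms and small errors, so that dT/T ≈ ln(10) · d(log10 T); the function name is illustrative and not part of ILIUM. The three inputs are the mean absolute log(T_eff) residuals reported in this note for the cool dwarfs at G=15, G=18.5 and G=20.

```python
import math


def log10_error_to_percent(delta_log10):
    """Percentage error implied by a small error in log10 of a quantity.

    Uses the first-order relation dT/T = ln(10) * d(log10 T).
    """
    return 100.0 * math.log(10.0) * delta_log10


# Mean absolute log(T_eff) residuals for the cool dwarfs, taken from the tables.
for label, err in [("G=15", 0.0017), ("G=18.5", 0.0024), ("G=20", 0.0070)]:
    print(label, round(log10_error_to_percent(err), 1), "%")
```

The rounded values agree with the 0.4%, 0.6% and 1.6% quoted in the abstract.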

FIGURE 5: AP residuals for the dwarfs at G=15, plotted as a function of the true APs, for the full range of T_eff and [Fe/H] shown in Fig. 1.

FIGURE 6: AP residuals for the dwarfs at G=15 shown just for stars with true T_eff ≤ 7000 K, plotted as a function of the true APs.

FIGURE 7: AP residuals for the dwarfs at G=15 shown just for stars with true T_eff ≤ 7000 K and [Fe/H] ≥ −2.0, plotted as a function of the true APs.

As [Fe/H] decreases, its signature in the spectra weakens and is in principle harder to detect and estimate. We might therefore expect that removing the most metal-poor stars (which are anyway very rare in the Galaxy which Gaia will observe) improves the results. Removing also stars with true [Fe/H] < −2.0 dex (this removes 20% of the objects from the cool star sample) yields residuals as shown in Fig. 7 and

ILIUM, dwarfs, G=15, T_eff ≤ 7000 K and [Fe/H] ≥ −2.0 dex
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        5.6e−3    5.2e−5
  ⟨|δφ|⟩      0.11      0.0018
  σ_φ         0.16      0.002

The errors have hardly decreased, which tells us not only that ILIUM can estimate [Fe/H] equally well across the metallicity range (something we could anyway see in Fig. 6), but also that the T_eff accuracy is not affected by metallicity. Curiously, the systematic error in [Fe/H] has increased a bit, but this may not be significant (it is still four times smaller than the mean absolute error). These are now realistic summary statistics for Gaia, as even if we didn't know log g from the astrometry, the majority of objects are dwarfs which are not very metal poor.

If we remove only the low metallicity stars but retain the hot stars, then the results hardly improve with respect to the original (all APs) case. This confirms that it is the removal of hot stars which is crucial for estimating metallicity, and this is because they retain hardly any metallicity signature in their BP/RP spectra. Metal-poor stars, in contrast, retain a metallicity signature which we can still detect to an accuracy of 0.5 dex or better down to [Fe/H] = −4.0.

I remind the reader that ILIUM is allowed to estimate AP values which extend beyond the training grid, by 10% of the AP range in each direction (the default setting). Even though all the test data have true APs within the grid limits, we see that ILIUM does assign a few values beyond this, e.g. the two −4.5 dex stars we can identify in Fig. 6.

So far we have made cuts on the true APs in order to predict performance on populations of certain types of stars.
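The cut-and-recompute procedure behind these tables can be sketched as follows. This is an illustration only, not the ILIUM code: the function names are hypothetical, the residual is taken as estimated minus true, and the three toy stars are made-up numbers. The values passed in `cut_values` may be either true or estimated T_eff, depending on which kind of cut is wanted.

```python
import math


def residual_stats(estimated, true):
    """Return (mean residual, mean absolute residual, RMS residual)."""
    resid = [e - t for e, t in zip(estimated, true)]
    n = len(resid)
    mean = sum(resid) / n
    mean_abs = sum(abs(r) for r in resid) / n
    rms = math.sqrt(sum(r * r for r in resid) / n)
    return mean, mean_abs, rms


def stats_with_cut(estimated, true, cut_values, max_value):
    """Same statistics, keeping only objects with cut_values <= max_value.

    No retraining is implied: objects are only removed from the analysis.
    """
    keep = [i for i, v in enumerate(cut_values) if v <= max_value]
    return residual_stats([estimated[i] for i in keep],
                          [true[i] for i in keep])


# Toy example (made-up numbers): [Fe/H] estimates for three stars.
feh_est = [1.0, 2.0, 4.0]
feh_true = [1.0, 1.0, 2.0]
teff = [5000.0, 6000.0, 9000.0]

print(residual_stats(feh_est, feh_true))
print(stats_with_cut(feh_est, feh_true, teff, 7000.0))
```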
In the real application, we would need to make cuts based on the estimated APs. If we do this and recalculate the statistics for the cool star sample we get

ILIUM, dwarfs, G=15, estimated T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.016     8.0e−5
  ⟨|δφ|⟩      0.13      0.0017
  σ_φ         0.21      0.002

which is no worse than when making cuts on the true APs. Of course, to measure residuals and performance statistics like this we would need some independent truth, but it is still interesting to plot the residuals against the estimated APs, as in Fig. 8. The interpretation of the plots is left to the reader as an exercise.

FIGURE 8: AP residuals for the dwarfs at G=15 shown just for stars with estimated T_eff ≤ 7000 K, plotted as a function of the estimated APs.

ILIUM estimates the AP uncertainties for each star. Returning to the results for the full AP range, Fig. 9 shows the ratio of the AP uncertainties to the absolute value of the true residuals. The error predictions and their distribution are reasonable, although there is a tendency to underestimate the errors, especially for log(T_eff).

FIGURE 9: Estimated AP uncertainties for the dwarfs expressed as a ratio to the absolute value of the true residuals, for G=15.

2.2 G=18.5

Applying ILIUM to the same data at G=18.5, we again see poor statistics when averaging over the full range of T_eff and [Fe/H], so limiting to cool stars we get

ILIUM, dwarfs, G=18.5, T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.037     4.5e−6
  ⟨|δφ|⟩      0.26      0.0024
  σ_φ         0.42      0.0033

Even 3.5 magnitudes fainter than G=15, we still get very reasonable results, with little trend in the accuracy with T_eff or [Fe/H]. That the performance degrades little is not surprising, however, when we consider that the SNR per band is still over 20 for most of the spectrum (see Fig. 2).

2.3 G=20

We now apply ILIUM to stars at Gaia's magnitude limit. We do not necessarily expect the best science to come out of these objects (the median SNR per band is just 10), but as there will be so many of them it is important to assess how well we can estimate their APs. Furthermore, the performance on G=20 end-of-mission data is roughly what we expect for a single-transit spectrum of G=17.7 stars, a scaling which assumes that the noise is dominated by source noise. (That is, if the source delivers F photons per transit over N transits, I am assuming the scaling of SNR with F and N that holds when source noise dominates, which can be contrasted with the scaling that holds when background/readout noise dominate.) In practice the results would actually apply to slightly brighter stars, because the flux limit for a given SNR scales more rapidly with the number of transits than N^0.5, due to the source-independent noise terms and because additional noise will effectively be introduced by the spectral combination. The summary statistics for cool stars at G=20 are

ILIUM, dwarfs, G=20.0, T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.033     3.6e−4
  ⟨|δφ|⟩      0.82      0.0070
  σ_φ         1.14      0.009

and the residuals are plotted in Fig. 10. Over the whole metallicity range T_eff accuracy is still very good at 1.5%. Removing in addition the hot stars hardly improves this, decreasing it by about 7%.
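The G=17.7 equivalence quoted in section 2.3 can be reproduced with a few lines of arithmetic. The sketch below assumes, as the N^0.5 mentioned in that paragraph suggests, that the flux needed to reach a fixed SNR scales as N^0.5 with the number of stacked transits; the function name and the `exponent` parameter are illustrative only.

```python
import math


def equivalent_single_transit_mag(g_stacked, n_transits, exponent=0.5):
    """Magnitude at which one transit matches the SNR of a stack.

    Assumes the flux required for a fixed SNR scales as
    n_transits ** exponent (an assumption of this sketch).
    """
    flux_ratio = n_transits ** exponent
    return g_stacked - 2.5 * math.log10(flux_ratio)


# End-of-mission stack of 72 transits at G=20.
print(f"{equivalent_single_transit_mag(20.0, 72):.1f}")
```

A steeper dependence (a larger `exponent`) moves the equivalent magnitude to brighter stars, which is the direction of the caveat in the text.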
As expected, metallicity performance is much worse than at G=18.5, although we can still distinguish metal-poor stars ([Fe/H] < −2.5 dex) from solar metallicity ones at three times the mean absolute error (or at 2 sigma if we use the RMS). However, the most metal-poor stars suffer from systematic errors: stars with [Fe/H] = −4.0 have a systematic metallicity error of +0.9 dex (overestimated), and a standard deviation about this of 1.3 dex. At [Fe/H] = −3.0 the systematic is +0.6 dex with a standard deviation about this of 1.3 dex. We might think that we could correct for these systematics, but it turns out that we cannot (see Appendix A).

FIGURE 10: AP residuals for the dwarfs at G=20 shown just for stars with true T_eff ≤ 7000 K, plotted as a function of the true APs.

FIGURE 11: Correlation between the residuals on the dwarfs at G=20 for the full sample (left) and for the cool stars (T_eff ≤ 7000 K) (right). The Pearson correlation coefficients are −0.37 (left) and 0.003 (right). Axes: log(T_eff) residual and [Fe/H] residual.

Fig. 11 plots the correlation between the residuals. On the full sample there is a small but significant anticorrelation. This is presumably related to the systematic errors introduced by the hot stars, because there is no significant correlation once we remove these from the analysis (right panel).

It is almost as important to have a measure of uncertainty in an AP estimate as it is to have the AP estimate itself, as only then do we know whether (and to what degree) we can trust the estimate. Statistics based on test sets (i.e. those shown in the table above) are important, but it is desirable to have object-specific uncertainty estimates which take the actual measurement into account. ILIUM can do this, as was described in CBJ-042. These error predictions are shown in Fig. 12, plotted as a ratio over the true residuals for each object. The distribution is better than we saw for G=15: it extends over a larger range of values and is not skewed towards frequent underestimation.

In addition to error estimates, we need to know whether a presented unlabelled object fits into the domain of the classifier's training grid. We assess this via a Goodness-of-Fit (GoF) measure between the observed spectrum and the spectrum which ILIUM predicts. ILIUM currently measures this via the reduced χ², whereby a value of 1 is expected for a good fit. The distribution is shown in Fig. 13 and has a mean of 0.97 (median of 0.45).
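A reduced χ² of this kind can be computed as in the sketch below. It is an assumption of this example, not a statement about the ILIUM implementation, that the number of degrees of freedom is the number of spectral bands minus the two fitted APs, and that the per-band noise standard deviations are known; all names are hypothetical.

```python
def reduced_chi2(observed, predicted, sigma, n_fitted=2):
    """Reduced chi-squared between an observed and a predicted spectrum.

    observed, predicted, sigma: per-band fluxes and noise standard deviations.
    n_fitted: number of fitted parameters (two APs in this sketch).
    """
    dof = len(observed) - n_fitted
    if dof <= 0:
        raise ValueError("need more bands than fitted parameters")
    chi2 = sum(((o - p) / s) ** 2
               for o, p, s in zip(observed, predicted, sigma))
    return chi2 / dof


# Toy spectrum with four bands, each off by exactly one sigma.
print(reduced_chi2([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0], [1.0] * 4))
```

Values far above 1 would flag an object whose spectrum the forward model cannot reproduce, for example one lying outside the training grid.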

FIGURE 12: Estimated AP uncertainties for the dwarfs expressed as a ratio to the absolute value of the true residuals, for G=20, for the full range of APs. The red points are for objects with T_eff ≤ 7000 K: of these 327 objects, 24 (7%) have error estimate ratios greater than 8.
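The error estimate ratio plotted here is simply the predicted uncertainty divided by the absolute true residual. The sketch below uses hypothetical function names and made-up inputs; a ratio below 1 means the predicted uncertainty is smaller than the actual error for that object.

```python
def error_estimate_ratios(estimated_sigma, residuals):
    """Ratio of predicted uncertainty to |true residual| for each object."""
    ratios = []
    for s, r in zip(estimated_sigma, residuals):
        # A zero residual gives an unbounded ratio.
        ratios.append(float("inf") if r == 0 else s / abs(r))
    return ratios


def fraction_above(ratios, threshold):
    """Fraction of objects whose ratio exceeds the threshold."""
    return sum(1 for x in ratios if x > threshold) / len(ratios)


# Toy example (made-up numbers).
ratios = error_estimate_ratios([0.2, 0.2, 0.9], [0.1, -0.4, 0.0])
print(ratios, fraction_above(ratios, 8))

# The fraction quoted in the caption: 24 of 327 objects.
print(f"{100 * 24 / 327:.0f}%")
```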

FIGURE 13: Distribution of the goodness-of-fit (reduced χ²) values for the dwarf sample at G=20.

3 Application to giants

The forward model fits for the giants are shown in Figs. 14 and 15. Recall that the forward models are fit to noise-free data (the black crosses in the figures). To get an idea of how much the noisy data deviate from them, I overplot G=20 data as blue points in Figs. 14 and 15. This can be compared to the blue points in the dwarf forward model plots, which were for G=15 data. The fits (red points) against T_eff are quite similar to what we found for the dwarfs (Fig. 3), which is not surprising as T_eff is a strong parameter and so the different log g selection has little impact. (In both cases [Fe/H] was held constant at −1.0 dex.) At first glance the [Fe/H] fit looks very different from the dwarf case (Fig. 4), but this is mostly because of the different scales on the ordinate. Yet there are differences, indicating that the metallicity dependence of the flux depends on the surface gravity.

FIGURE 14: Predictions of the full forward model for the giants as a function of log(T_eff) at constant [Fe/H] = −1.0 in 12 different bands (with wavelength in nm at the top of each panel: 338, 362, 392, 432, 488, 573, 681, 731, 789, 857, 934, 1020). The black crosses are the (noise-free) grid points, the red stars are the forward model predictions (at randomly selected AP values) and the blue circles the noisy (G=20) grid points. The flux plotted on the ordinate is in standardized units.

FIGURE 15: As Fig. 14, but now showing predictions of the full forward model as a function of [Fe/H] at constant T_eff = 5000 K.

At G=15 the performance on the test set for the full T_eff and [Fe/H] range can be summarized as

ILIUM, giants, G=15, full AP range
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.089     1.0e−3
  ⟨|δφ|⟩      0.62      0.0048
  σ_φ         1.03      0.0072

This is very similar to what we found with the dwarfs. We likewise see here that we get better performance on the cooler stars

ILIUM, giants, G=15, T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.01      2.3e−4
  ⟨|δφ|⟩      0.22      0.0028
  σ_φ         0.34      0.0037

The systematic in [Fe/H] is again reduced and no longer significant. Both [Fe/H] and T_eff can be estimated accurately for giants at this magnitude, although the errors are about 50% larger than could be achieved with dwarfs. The performance at G=18.5 and G=20 is as follows

ILIUM, giants, G=18.5, T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.03      3.4e−4
  ⟨|δφ|⟩      0.31      0.0035
  σ_φ         0.50      0.0045

ILIUM, giants, G=20, T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.11      1.1e−3
  ⟨|δφ|⟩      0.74      0.0073
  σ_φ         1.08      0.0092

With respect to G=15, the [Fe/H] errors are 1.4 and 3.4 times higher at G=18.5 and G=20 respectively. T_eff can be estimated to 0.8% and 1.7% respectively (1.3 and 2.6 times higher than at G=15). In all three magnitude cases, removing the metal-poor stars ([Fe/H] < −2.0) barely improves the results over what we see in the above tables, the largest reduction being of 0.05 dex in ⟨|δφ|⟩ in [Fe/H] at G=20, just as we saw for the dwarfs.

4 Application to stars with unknown log g

So far we have trained ILIUM using stars with a restricted log g range and applied it to stars with the same log g range. However, there is no reason why we have to do this. For example, if we had a magnitude-limited sample and were unable to estimate log g, then we might decide to use the ILIUM model trained on dwarfs, because the majority of stars are dwarfs. This would presumably introduce larger errors on the giants. Alternatively, we might train ILIUM on a uniform sampling in log g. This we do here, fitting ILIUM on the full range of log g, from 0.5 to 5.0 in steps of 0.5 (see Fig. 5 of CBJ-042). The full T_eff, log g and [Fe/H] grid comprises 4361 stars, of which 75% are randomly selected for training and the remaining 25% for testing. The performance on data at G=18.5 is

ILIUM, all log g, G=18.5, T_eff ≤ 7000 K
              [Fe/H]    log(T_eff)
  ⟨δφ⟩        0.02      1.0e−4
  ⟨|δφ|⟩      0.40      0.0052
  σ_φ         0.60      0.0070

The performance is slightly worse in both APs than when we were restricted to either dwarfs or giants.

5 Estimating all three astrophysical parameters

The model in the previous section (call it ILIUM_[Fe/H]) allows us to estimate [Fe/H] and T_eff without knowing log g. Having estimated [Fe/H], we could then use a T_eff–log g version of ILIUM (call it ILIUM_logg) at the appropriate [Fe/H] to estimate log g, and thereby come up with a solution for all three APs. In CBJ-042 I demonstrated that log g could be estimated at G=18.5 to an accuracy of 0.35 dex (mean absolute error). That assumed [Fe/H] = 0, but we could of course build several models of ILIUM_logg at different metallicities and choose the appropriate one based on the estimated [Fe/H]. To check the viability of this approach, I have trained and tested an ILIUM_logg model now using a small range of metallicities, namely [Fe/H] ∈ {−2.5, −2.0, −1.5}, on G=18.5 data. This range simulates having identified a metal-poor star with some uncertainty in the metallicity estimation.
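The idea of choosing a log g model according to the estimated [Fe/H] can be sketched as a small two-stage routine. Everything here is hypothetical: the two estimators are stand-in callables returning fixed values, not real ILIUM models, and keeping the T_eff from the second stage is a choice made for this illustration only.

```python
def nearest_metallicity(feh_est, model_fehs):
    """Pick the model metallicity closest to the estimated [Fe/H]."""
    return min(model_fehs, key=lambda f: abs(f - feh_est))


def estimate_three_aps(spectrum, feh_model, logg_models):
    """Two-stage estimate of (T_eff, [Fe/H], log g).

    feh_model: callable returning (teff, feh) without knowledge of log g.
    logg_models: dict mapping a metallicity to a callable returning
        (teff, logg), each trained at or around that metallicity.
    """
    _teff_first, feh = feh_model(spectrum)
    key = nearest_metallicity(feh, logg_models.keys())
    teff, logg = logg_models[key](spectrum)
    return {"teff": teff, "feh": feh, "logg": logg, "logg_model_feh": key}


# Stand-in estimators with made-up outputs, for illustration only.
feh_model = lambda spectrum: (5000.0, -1.8)
logg_models = {
    0.0: lambda spectrum: (5100.0, 4.4),
    -2.0: lambda spectrum: (5050.0, 4.5),
}
print(estimate_three_aps([0.0] * 12, feh_model, logg_models))
```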
The full data set contains 874 stars, randomly split into equal-sized train and test sets.

ILIUM, G=18.5, [Fe/H] ∈ {−2.5, −2.0, −1.5}
              log g     log(T_eff)
  ⟨δφ⟩        0.077     4.2e−4
  ⟨|δφ|⟩      0.49      0.0058
  σ_φ         0.79      0.0081

T_eff can be estimated just as accurately as in the [Fe/H] = 0 case (CBJ-042). log g is slightly worse (it was ⟨|δφ|⟩ = 0.35 with [Fe/H] = 0) but still reasonable. This shows that a two-stage approach to estimating all three APs is viable: (1) use ILIUM_[Fe/H] to estimate [Fe/H] and T_eff; (2) use ILIUM_logg to estimate log g and T_eff. In principle we could even iterate this and re-estimate [Fe/H] with the ILIUM_[Fe/H] model and thereby achieve better accuracy. Maybe we would have the first ILIUM trained only on dwarfs to get the best accuracy on most stars. There are many alternatives. Of course, it is still quite possible that ILIUM can be extended to multiple weak and strong APs, as described in section 5 of CBJ-042.

References

Bailer-Jones C.A.L., 2008, ILIUM: An iterative local interpolation method for parameter estimation, GAIA-C8-TN-MPIA-CBJ-042
Sordo R., Vallenari A., 2008, Description of CU8 cycle 3 simulated data, GAIA-C8-DA-OAPD-RS-002
Zaldua I., et al., 2008, Interface Control Document for GOG v2.0.2 (cycle3), GAIA-C2-SP-UB-IZ-001-02

A Why we cannot correct for the systematic errors

FIGURE 16: Metallicity residuals for the G=20 dwarf experiment in section 2.3 for T_eff ≤ 7000 K. Top right: [Fe/H] residuals against true [Fe/H]. Top left: [Fe/H] residuals against estimated [Fe/H]. The red line is a linear fit to achieve a correction. Bottom left: [Fe/H] residuals after applying the correction vs. the corrected [Fe/H]. Bottom right: the corrected residuals plotted against the true [Fe/H].

Fig. 16 demonstrates why we cannot correct for systematic metallicity errors, at least not for the G=20 case discussed in section 2.3. First, we cannot deduce from the plot of the residuals against the true [Fe/H] whether or not a correction is possible (top right panel): the true [Fe/H] cannot be the basis of a correction of unlabelled data! If we plot against the estimated [Fe/H] (top left panel), then we see a systematic trend which we can fit, e.g. with the red line shown. We then subtract this from each estimated [Fe/H] to give the corrected [Fe/H]. We can then analyse how well this correction has performed. The bottom left panel shows the residual in the corrected [Fe/H] (i.e. corrected minus true) plotted as a function of the corrected [Fe/H]. Comparing to the plot above it, we can see how the correction has worked. However, if we now plot the residuals against the true metallicity, we see that the systematic has actually got worse (compare to the plot above it).
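The mechanics of this correction, and one way in which it can backfire, can be illustrated on synthetic data. The sketch below does not use the ILIUM results: it assumes unbiased estimates with Gaussian noise of 1 dex on true metallicities drawn uniformly between −4 and +1, which are assumptions of this example only. Noise alone then produces a trend of residual against estimated [Fe/H]; removing that trend flattens the residuals against the corrected [Fe/H] but introduces a slope against the true [Fe/H].

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def correction_experiment(n=2000, noise_sigma=1.0, seed=1):
    """Fit residual vs. estimate, subtract the fit, and report the slopes."""
    rng = random.Random(seed)
    true = [rng.uniform(-4.0, 1.0) for _ in range(n)]
    est = [t + rng.gauss(0.0, noise_sigma) for t in true]
    resid = [e - t for e, t in zip(est, true)]

    # Linear trend of the residual against the estimated value.
    slope, intercept = fit_line(est, resid)
    corrected = [e - (slope * e + intercept) for e in est]
    corr_resid = [c - t for c, t in zip(corrected, true)]

    slope_true_before, _ = fit_line(true, resid)
    slope_true_after, _ = fit_line(true, corr_resid)
    slope_est_after, _ = fit_line(corrected, corr_resid)
    return slope, slope_true_before, slope_true_after, slope_est_after


slope, before, after, est_after = correction_experiment()
print(f"trend vs estimated [Fe/H]:          {slope:+.2f}")
print(f"trend vs true, before correction:   {before:+.2f}")
print(f"trend vs true, after correction:    {after:+.2f}")
print(f"trend vs corrected, after:          {est_after:+.2f}")
```

Whether a similar mechanism accounts for the behaviour in Fig. 16 is not established here; the sketch only shows that a trend seen against the estimated value need not correspond to a correctable systematic against the true value.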