Making precise and accurate measurements with data-driven models
David W. Hogg
Center for Cosmology and Particle Physics, Dept. of Physics, NYU; Center for Data Science, NYU; Max-Planck-Institut für Astronomie, Heidelberg; Flatiron Institute, Simons Foundation, New York City
Hunstead Lecture, 2017 November 9
conclusions
A data-driven model — The Cannon — measures physical parameters of stars with better quality than any physics-driven pipeline; no physics is harmed.
Connections to other physical systems and models.
Connections to extra-solar-planet and Milky-Way science.
Criticisms of vanilla machine learning.
Everything open-source or public-domain.
with Melissa Ness (MPIA), Andy Casey (Cambridge), Anna Y. Q. Ho (Caltech), Lauren Anderson (Flatiron), Hans-Walter Rix (MPIA)
De-noising (Anderson)
Annie Jump Cannon
O B A F G K M — the temperature sequence! (alphabetical order is hydrogen-line-strength order)
Cannon understood the temperature sequence of stars without the benefit of physical models: data-driven, non-linear dimensionality reduction — manifold learning (using a huge amount of prior knowledge)
namesake of The Cannon
the paradoxes of contemporary physics
models are incredibly explanatory: QCD, ΛCDM, helioseismology
and yet... models are wrong (ruled out) in detail: reduced chi-squared χ²_ν ≫ 1 — the χ² statistic becomes a measure of the size of your data!
data are abundant and precise
missing physics, approximations, gastrophysics
models are fundamentally computational
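The "χ² measures the size of your data" point can be made concrete with a toy numerical check (assumed bias and noise levels, not any real survey): a model wrong by a tiny constant offset is consistent with small data sets but ruled out at enormous significance once the data set is large.

```python
import numpy as np

rng = np.random.default_rng(17)

def chi2(n_data, bias=0.1, sigma=1.0):
    """chi-squared of a model that is wrong by a small constant bias."""
    residual = rng.normal(0.0, sigma, n_data) + bias  # data minus (slightly wrong) model
    return np.sum((residual / sigma) ** 2)

# chi2 exceeds its expectation n by ~ n * bias^2, so the "tension"
# (chi2 - n) / sqrt(2 n) grows like sqrt(n): more data, stronger rejection
tension = {}
for n in (100, 10_000, 1_000_000):
    tension[n] = (chi2(n) - n) / np.sqrt(2 * n)
    print(f"n = {n:>9d}   tension = {tension[n]:5.1f} sigma")
```

The model never gets worse; the data just get good enough to see that it was always wrong.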
context: Galactic archaeology
stars populate orbits in the Milky Way — conserved actions (or chaotic equivalents)
stars are formed from particular gas clouds — stars have conserved surface abundances
the combined action–chemical space will be far more informative than either taken independently
context: Galactic archaeology
top priority for many new projects: Gaia & Gaia-ESO, HERMES & GALAH, SDSS-III APOGEE
terrifying inconsistencies in current approaches
models of stars are amazingly good... but chemical signatures are incredibly tiny
context: extra-solar planets
planets are measured relative to their host stars: transits, radial-velocity signals, astrometric signals
some planet measurements are now very precise — need stellar properties for accuracy
definition: physics-driven models (my usage)
put in everything you know: gravity, atomic and molecular transitions, radiation
make approximations to make things computable: sub-grid models, mixing length, etc.
compute like hell
definition: machine learning (my usage)
the most extreme of data-driven models: the data is the model; none of your knowledge is relevant
learn (fit) an exceedingly flexible model to explain or cluster the data, or to transform from data to labels
concepts of non-parametrics and of train, validate, and test
many packages and implementations (and outrageous successes)
When does (vanilla) machine learning help you?
a train-&-test situation, in which the training data are statistically identical to the test data: same noise amplitude, same distance or redshift distribution, same luminosity distribution — never true!
and the training data have accurate and precise labels
therefore, we can't use vanilla machine learning! (physicists rarely can)
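One way to see why the same-noise-amplitude requirement matters is a toy regression (hypothetical noise levels, nothing from a real survey): ordinary least squares bakes the training noise level into the learned map, so the fitted relation is biased whenever training noise differs from test noise.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x_true = rng.uniform(-1, 1, n)   # true spectral feature
y = 2.0 * x_true                 # true label: slope exactly 2

# fit trained on high-SNR features recovers the true relation
x_clean = x_true + rng.normal(0, 0.01, n)
slope_clean = np.polyfit(x_clean, y, 1)[0]

# fit trained on noisy features: least squares attenuates the slope toward zero
x_noisy = x_true + rng.normal(0, 0.5, n)
slope_noisy = np.polyfit(x_noisy, y, 1)[0]

print(f"slope trained at high SNR: {slope_clean:.2f}")
print(f"slope trained at low  SNR: {slope_noisy:.2f}")  # well below 2: biased!
```

Apply the low-SNR model to high-SNR test stars (or vice versa) and this bias propagates straight into the labels — the train-equals-test assumption is doing real work.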
definition: data-driven models (my usage)
make use of things you strongly believe: noise model & instrument resolution, causal structure (shared parameters)
capitalize on huge amounts of data with an exceedingly flexible model
concept of train, validate, and test
every situation will be bespoke
label transfer for stars
a few of your stars have good labels (from somewhere) — can you use this to label the other stars?
why would you want to do this?
you don't have good models at your wavelengths?
you want two surveys to be on the same system?
you have some stars at high SNR, some at low SNR?
you spent human time on some stars but can't on all?
stellar spectra
stars are very close to black bodies
to first order, a stellar spectrum depends on effective temperature Teff, surface gravity log g, and metallicity [Fe/H]
to second order: tens of chemical abundances, rotation, turbulence, activity
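The "close to black bodies" statement is just the Planck law (standard physics; SI constants rounded), here evaluated at APOGEE-like H-band wavelengths as an illustration:

```python
import numpy as np

# physical constants (SI, rounded)
h, c, kB = 6.626e-34, 2.998e8, 1.381e-23

def planck(lam, T):
    """Black-body spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    return (2.0 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

lam = np.linspace(1.5e-6, 1.7e-6, 5)  # H-band wavelengths in meters
b_cool = planck(lam, 4000.0)          # cool giant
b_warm = planck(lam, 5000.0)          # warmer star
print(b_cool, b_warm)
```

Both stars sit redward of their Planck peaks in the H band, so the continuum falls smoothly with wavelength; all the chemistry lives in small absorption dips on top of this smooth curve.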
stellar spectra
all chemical information is in absorption lines corresponding to atomic and molecular transitions
some 30 elements are visible in the best stars
spectroscopy at resolution R ≡ λ/Δλ > 20,000 is the primary tool
stellar astrophysics
[figure: normalized flux fλ vs. wavelength λ (15200–16800 Å) for four stars spanning the label space, e.g. Teff = 4750 K, log g = 3.0, [Fe/H] = 0.15 and Teff = 4849 K, log g = 2.2, [Fe/H] = −1.0]
SDSS-III APOGEE
Galactic archaeology
APOGEE DR12 & DR13: 156,000 stars (98,000 giants)
R = 22,500 spectra at 1.5 < λ < 1.7 µm
precise RVs and stellar parameters; 15–19 abundances per star
(our own home-built and special continuum normalization; ask me!)
all data completely public!
train, validate, and test
split the data into three disjoint subsets
in the training step, you set the parameters of your model using the training set
the validation set is used to set hyperparameters or model complexity
in the test step, you apply the model to the test set — new data — to make predictions or deliver results
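A minimal sketch of such a three-way split (the 60/20/20 fractions are an arbitrary choice for illustration, not anything from the talk):

```python
import numpy as np

def three_way_split(n, f_train=0.6, f_valid=0.2, seed=0):
    """Shuffle n indices and cut them into disjoint train/validate/test sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(f_train * n)
    n_valid = int(f_valid * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

train, valid, test = three_way_split(543)  # e.g. a 543-star sample
print(len(train), len(valid), len(test))   # 325 108 110
```

The disjointness is the whole point: a star used to set parameters or hyperparameters must never also be used to report results.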
The Cannon: Experiment 1: training set
543 stars (too few) from 19 clusters (too few)
Teff, log g, [Fe/H] labels from APOGEE — calling parameters and abundances "labels"
slight adjustments to labels to get them onto possible isochrones
terrible coverage of the main sequence: only the Pleiades, with home-made Pleiades labels (by Ness)
no [Fe/H] spread at high log g
The Cannon: Experiment 1: training set
[figure: training-set clusters in the Teff–log g plane (Teff 5500–4000 K, log g 0–5 dex) with fiducial isochrones; ages and [Fe/H] run from old metal-poor globulars (M92, 13 Gyr, −2.35; M15; M53; N5466; N4147; M2; M13; M3; M5; M107; M71) to younger, metal-rich open clusters (N2158; N2420; N188; M67; N7789; the Pleiades, 0.15 Gyr, +0.03; N6819; N6791, 5 Gyr, +0.47)]
The Cannon: model
a generative model of the APOGEE spectra: given label vector l, predict flux vector f
probabilistic prediction p(f | l, θ)
use every spectral pixel's uncertainty variance σ²_λn responsibly
details: the spectral expectation is quadratic in the labels; every wavelength λ is treated independently; an intrinsic Gaussian scatter s²_λ at every wavelength λ
80,000 free parameters in θ!
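The "quadratic in the labels" expansion is easy to write out explicitly; a generic sketch (not the released implementation):

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_label_vector(labels):
    """Expand K labels into {1, all linear terms, all quadratic cross terms}."""
    labels = np.asarray(labels, dtype=float)
    quad = [labels[i] * labels[j]
            for i, j in combinations_with_replacement(range(labels.size), 2)]
    return np.concatenate([[1.0], labels, quad])

v = quadratic_label_vector([4750.0, 3.0, 0.15])  # (Teff, log g, [Fe/H])
print(v.size)  # 1 + 3 + 6 = 10 coefficients per wavelength
```

Ten coefficients θ_λ plus one scatter s²_λ at each of the thousands of APOGEE pixels is how the model reaches of order the quoted 80,000 free parameters.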
The Cannon: model
ln p(f_n | l_n, θ) = Σ_{λ=1}^{L} ln p(f_λn | l_n, θ_λ, s²_λ)
ln p(f_λn | l_n, θ_λ, s²_λ) = −(1/2) [f_λn − θ_λᵀ l_n]² / (σ²_λn + s²_λ) − (1/2) ln(σ²_λn + s²_λ)
lᵀ ≡ {1, Teff, log g, [Fe/H], Teff², Teff·log g, ..., [Fe/H]²}
θᵀ ≡ {θ_λ, s²_λ}_{λ=1}^{L}
The Cannon: model
optimize ln p(f_n | l_n, θ):
training step: optimize w.r.t. parameters θ at fixed labels l using training-set data — linear least squares, every wavelength λ treated independently
test step: optimize w.r.t. labels l at fixed parameters θ using test-set (survey) data — non-linear optimization, every star treated independently
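The two-step structure can be sketched on synthetic data (a toy with made-up dimensions, linear-in-labels for brevity so both steps reduce to least squares; the real quadratic model makes the test step a small non-linear optimization per star):

```python
import numpy as np

rng = np.random.default_rng(1)
n_stars, n_pix, n_labels = 200, 30, 3

L_true = rng.normal(size=(n_stars, n_labels))   # (scaled) labels
A = np.hstack([np.ones((n_stars, 1)), L_true])  # design matrix {1, labels}
theta_true = rng.normal(size=(n_labels + 1, n_pix))
F = A @ theta_true + rng.normal(0.0, 0.02, (n_stars, n_pix))  # noisy "spectra"

# training step: fix labels, solve for theta — per-wavelength linear least squares
theta_hat, *_ = np.linalg.lstsq(A, F, rcond=None)

# test step: fix theta, solve for each star's labels (linear here by construction)
def labels_for(flux):
    l, *_ = np.linalg.lstsq(theta_hat[1:].T, flux - theta_hat[0], rcond=None)
    return l

print(np.round(labels_for(F[0]) - L_true[0], 3))  # small residuals
```

Both steps break into many small independent fits — per wavelength in training, per star at test time — which is why the whole thing runs fast in pure Python.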
The Cannon: model training
The Cannon: model training cross-validation
The Cannon: results
The Cannon is far faster than physical modeling: the model trains in seconds (thousands of fits); The Cannon labels 10⁵ stars per hour (pure Python on a laptop)
labels appear sensible: The Cannon labels lie near sensible isochrones; scatter against APOGEE labels is consistent with APOGEE precision; successfully puts labels on dwarfs
The Cannon: Experiment 1: comparison with ASPCAP labels
The Cannon: Experiment 1: label veracity
The Cannon: works at low signal-to-noise
The Cannon: shortcuts and choices
no Bayes; no partial or noisy labels
quadratic order — replacing the polynomial with a Gaussian process would give continuous model complexity and a non-parametric spectral representation
too-small training set
only three labels — what about age, [α/Fe], splitting the giant branch? how to go to many elements?
The Cannon: label transfer from APOGEE to LAMOST (Casey, Ho)
The Cannon: masses and ages for red giants (Ness)
The Cannon: detailed abundances (Casey)
[figure: cross-validation of Cannon labels — Teff, log g, and the abundances [Al/H], [Ca/H], [C/H], [Fe/H], [K/H], [Mg/H], [Mn/H], [Na/H], [Ni/H], [N/H], [O/H], [Si/H], [S/H], [Ti/H], [V/H] — with bias and scatter annotated per panel; abundance scatters range from ~0.03 dex ([Fe/H]) to ~0.26 dex ([V/H])]
The Cannon: detailed abundances (Ness)
lessons learned
regressions are different from density estimators
value of convex regularization
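"Convex regularization" here means penalties like L1 (lasso), whose convexity guarantees a unique optimum and drives coefficients of uninformative pixels exactly to zero. A toy sketch using proximal-gradient descent on synthetic data (illustrative only, not the Cannon code):

```python
import numpy as np

rng = np.random.default_rng(3)
n_obj, n_pix = 300, 50
X = rng.normal(size=(n_obj, n_pix))       # toy pixel fluxes
beta_true = np.zeros(n_pix)
beta_true[[5, 17]] = 1.0                  # only two "lines" carry the label
y = X @ beta_true + rng.normal(0.0, 0.1, n_obj)

# minimize 0.5 * ||X b - y||^2 + lam * ||b||_1 by ISTA (proximal gradient)
lam = 20.0
step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of the gradient
beta = np.zeros(n_pix)
for _ in range(2000):
    z = beta - step * (X.T @ (X @ beta - y))                     # gradient step
    beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold

support = np.flatnonzero(np.abs(beta) > 1e-6)
print("pixels selected:", support)        # the informative "lines" survive
```

Sparse, exactly-zero coefficients are what make the surviving pixels interpretable as spectral lines, as in the line-identification figure.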
The Cannon: identification of lines (Casey)
[figure: normalized coefficients θ/max θ vs. wavelength λ (15200–15360 Å and 16650–16800 Å), highlighting pixels that respond to [Al/H], [S/H], and [K/H]]
The Cannon: discovery of outliers (Ho)
The Cannon: chemical tagging
The future: Unsupervised (Anderson)
read more
original paper on The Cannon and APOGEE: Ness et al., arXiv:1501.07604
labeling LAMOST, RAVE: Ho et al., arXiv:1602.00303; Casey et al., arXiv:1609.02914
chemical abundances: Casey et al., arXiv:1603.03040; Ness et al., arXiv:1701.07829
red-giant masses and ages: Ness et al., arXiv:1511.08204; Ho et al., arXiv:1609.03195
chemical tagging: Hogg et al., arXiv:1601.05413
de-noising Gaia: Anderson et al., arXiv:1706.05055
working with missing and noisy labels: Eilers, in prep
modeling spectroscopic binaries: Price-Whelan, in prep
extreme-precision radial velocity: Bedell, in prep