Data-Intensive Statistical Challenges in Astrophysics
|
|
- Muriel York
- 5 years ago
- Views:
Transcription
1 Data-Intensive Statistical Challenges in Astrophysics Alex Szalay The Johns Hopkins University Lots of contributions from Tamas Budavari (JHU)
2 Scientific Data Analysis Today Scientific data is doubling every year, reaching PBs Data is everywhere, never will be at a single location Architectures increasingly CPU-heavy, IO-poor Data-intensive scalable architectures needed Databases are a good starting point Scientists spend 80% of time on menial tasks Scientists need special features (arrays, GPUs) Most data analysis done on midsize BeoWulf clusters Universities hitting the power wall Soon we cannot even store the incoming data stream Not scalable, not maintainable
3 Non-Incremental Changes Science is moving increasingly from hypothesisdriven to data-driven discoveries Need new randomized, incremental algorithms Best result in 1 min, 1 hour, 1 day, 1 week New computational tools and strategies not just statistics, not just computer science, not just astronomy, not just genomics Need new data intensive scalable architectures Astronomy has always been data-driven. now becoming more generally accepted
4 Scalable Data-Intensive Analysis Large data sets => data resides on hard disks Analysis has to move to the data Hard disks are becoming sequential devices For a PB data set you cannot use a random access pattern Both analysis and visualization become streaming problems Same thing is true with searches Massively parallel sequential crawlers (MR, Hadoop, etc) Spatial indexing needs to be maximally sequential Space filling curves (Peano-Hilbert, Morton, )
5 Why Astronomy Data? Approach inherently and traditionally data-driven Important spatio-temporal features Very large density contrasts in populations Real errors and covariances Many signals very subtle, buried in systematics Data sets very large, pushing scalability Worthless! Jim Gray
6 The Age of Surveys CMB Surveys (pixels) 1990 COBE Boomerang 10, CBI 50, WMAP 1 Million 2008 Planck 10 Million Angular Galaxy Surveys (obj) 1970 Lick 1M 1990 APM 2M 2005 SDSS 200M 2011 PS1 1000M 2020 LSST 30000M Time Domain QUEST SDSS Extension survey Dark Energy Camera Pan-STARRS LSST Galaxy Redshift Surveys (obj) 1986 CfA LCRS dF SDSS BOSS LAMOST Petabytes/year
7 Sloan Digital Sky Survey The Cosmic Genome Project Two surveys in one Photometric survey in 5 bands Spectroscopic redshift survey Data is public 2.5 Terapixels of images => 5 Tpx 10 TB of raw data => 120TB processed 0.5 TB catalogs => 35TB in the end Started in 1992, finished in 2008 Extra data volume enabled by Moore s Law Kryder s Law
8 Astro-Statistical Challenges The crossmatch problem (multi-, time domain) Photometric redshifts (prediction/regression problem) Correlations (auto/cross, higher order) Outlier detection in many dimensions Statistical errors vs systematics Comparing observations to models comparing distributions, updating models The unknown unknown, when we have no models.. Exascale streams are coming (Square Km Array) Scalability!!!
9 Relevant Astro Challenges I. Analysis of spectra (multispectral data) Sparse signal in large dimensions Much noise, and very rare events 4Kx1M SVD problem, perfect for randomized algorithms Motivated our work on robust incremental PCA Photometric measurements (segmentation) Currently 500M rows x 500 attributes Noisy, missing data Spatial, temporal searches and associations Dominated by unknown systematic errors Small number of subtle outliers, buried in very large typical population Very skewed distributions
10 Galaxy Diversity from PCA PC 1st [Average Spectrum] 2nd [Stellar Continuum] 3rd [Finer Continuum Features + Age] 4th 5th [Age] Balmer series hydrogen lines [Metallicity] Mg b, Na D, Ca II Triplet
11 Streaming PCA Initialization Eigensystem of a small, random subset Truncate at p largest eigenvalues Incremental updates Mean and the low-rank A matrix SVD of A yields new eigensystem Randomized algorithm! T. Budavari, D. Mishin 2011
12 Robust PCA PCA minimizes σ RMS of the residuals r = y Py Quadratic formula: r 2 extremely sensitive to outliers We optimize a robust M-scale σ 2 (Maronna 2005) Implicitly given by Fits in with the iterative method! Outliers can be processed separately
13 Eigenvalues in Streaming PCA 13 Classic Robust
14 Examples with SDSS Built on top of the Incremental Robust PCA Principal Component Pursuit (I. Csabai) CX decomposition (C-W Yip)
15 Principal component pursuit Low rank approximation of data matrix: X Standard PCA: min X E subject to rank( E) k 2 works well if the noise distribution is Gaussian outliers can cause bias Principal component pursuit min subjectto sparse spiky noise/outliers: try to minimize the number of outliers while keeping the rank low NP-hard problem A 0 X N A, rank( N) k The L1 trick: min N A subjectto X N A * N, A 1 numerically feasible convex problem (Augmented Lagrange Multiplier) min N, A N A subjectto X ( N A) * 1 2 * E. Candes, et al. Robust Principal Component Analysis. preprint, Abdelkefi et al. ACM CoNEXT Workshop (traffic anomaly detection)
16 Testing on Galaxy Spectra Slowly varying continuum + absorption lines Highly variable sparse emission lines This is the simple version of PCP: the position of the lines are known but there are many of them, automatic detection can be useful spiky noise can bias standard PCA DATA: Streaming robust PCA implementation for galaxy spectrum catalog (L. Dobos et al.) SDSS 1M galaxy spectra Morphological subclasses Robust averages + first few PCA directions
17 PCA PCA reconstruction Residual
18 Principal component pursuit Low rank Sparse Residual λ=0.6/sqrt(n), ε=0.03
19 Galaxy ID Galaxy ID Not Every Data Direction is Equal Wavelength Selected Wavelengths Wavelength A = C X Selected Wavelengths Procedure: 1. Perform SVD of A = U V T 2. Pick number of eigenvectors = K 3. Calculate Leverage Score = i V T ij 2 / K Mahoney and Drineas 2009
20 Wavelength Sampling Probability k = 2 c = 7 k = 4 c = 16 k = 6 c = 25 k = 8 c = 29
21 Ranking Astronomical Line Indices Subspace Analysis of Spectra Cutouts: -Othogonality -Divergence -Commonality (Worthey et al. 94; Trager et al. 98) (Yip et al in prep.)
22 Identifying New Line Indices, Objectively (Yip et al in prep.)
23 Relevant Astro Challenges II. Photometric redshifts High dimensional embedding of a noisy low-dimensional process: guess galaxy distance from its colors Subtle statistical problem, with huge computational needs => scalability challenge Training set => regression/estimation engine Models constrained by Bayesian priors during training (Budavari) Need to detect atypical objects that do not fit known patterns Large-Scale Fuzzy Spatial Cross-Matching Soon we need to do spatial matches 100Bx100B data sets Experimenting with different approaches (DB/Hadoop/GPU)
24 Photometric Redshifts Normally, distances from Hubble s Law H r Measure the Doppler shift of spectral lines distance! v 0 But spectroscopy is very expensive SDSS: 640 spectra in 45 minutes vs. 300K 5 color images Future big surveys will have no spectra Idea: Multicolor images are like a crude spectrograph Statistical estimation of the redshifts/distances
25 Photometric Redshifts Phenomenological (PolyFit, ANNz, knn, RF) Simple, quite accurate, fairly robust Little physical insight, difficult to extrapolate, Malmquist Template-based (KL, HyperZ ) Simple, physical model Calibrations, templates, issues with accuracy Hybrid ( base learner ) Physical basis, adaptive Complicated, compute intensive Important for next generation surveys! We must understand the errors!
26 Recent Developments Random Forests S. Carliles, C. Priebe, A. Szalay, T. Budavari, S. Heinis (2009) Slightly better than other estimators Estimated errors close to Gaussian, and accurate Physically motivated removal of various systematics Inclination Self Absorption in a galaxy (Yip et al 2011) Effect of emission lines Unified theory of photometric redshifts (Budavari 2010) Not a regression problem Kernel density estimators, constrained by model priors
27 RF is like Kernel Estimator For a given query point each tree picks a neighborhood The ensemble of these neighborhood points defines a kernel The smaller the sampling rate from the training set, the broader the kernel => effective degrees of freedom N eff Central limit theorem convergence is driven by N eff RF is computationally a very efficient way to regress
28 Carliles et al 2009 Zspec vs Zrf
29 RF on Cyberbricks 36-node Amdahl cluster using 1200W total Zotac Atom/ION motherboards 4GB of memory, N330 dual core Atom, 16 GPU cores Aggregate disk space 43.6TB 63 x 120GB SSD = 7.7 TB 27x 1TB Samsung F1 = 27.0 TB 18x.5TB Samsung M1= 9.0 TB Blazing I/O Performance: 18GB/s Amdahl number = 1 for under $30K Using the GPUs for data mining: 6.4B multidimensional regressions (photo-z) in 5 minutes over 1.2TB of data Running the Random Forest algorithm inside the DB
30 Correlation Functions 2-point, 3 point 2 nd order, higher order Spatial, angular Various estimators Direct estimators, edge corrections (Ripley, Landy-Szalay) Various algorithms 1 2 Tree codes (a. Gray) GPU-based codes (Budavari)
31 The Impact of GPUs We need to reconsider the N logn only approach Once we can run 100K threads, maybe running SIMD N 2 on smaller partitions is also acceptable Recent JHU effort on integrating CUDA with SQL Server, using SQL UDF Galaxy spatial correlations: 600 trillion real and random galaxy pairs using brute force N 2 Much faster than the tree codes! This is because high resolution was needed Tian, Budavari, Neyrinck, Szalay 2010
32 Visualizing Petabytes Visualizations are becoming IO limited Needs to be done where the data is It is easier to send a HD 3D video stream to the user than all the data It is possible to build individual servers with extreme data rates (5GBps per server see Data-Scope) Prototype on turbulence simulation already works: data streaming directly from SQL Server to GPU Interactive visualizations driven remotely
33 Streaming Visualization Kai Buerger (visited JHU from Munich) Implemented high performance streaming from DB into the GPU, exceeding 1GB/sec Limited by the current I/O subsystem Using the velocity field from our 30TB turbulence database
34
35
36 JHU Data-Scope Revised 1P 1S All P All S Full servers rack units capacity TB price $K power kw GPU* TF seq IO GBps IOPS kiops netwk bw Gbps * Without the GPU costs (it is about $1,600/ card)
37 Summary Real multi-pb solutions are needed NOW! We have to build it ourselves Multi-faceted challenges No single trivial breakthrough, will need multiple thrusts Various scalable algorithms, new statistical techniques Adoption of new approaches is a difficult process Need to train people with new skill sets Π shaped people Increasing diversification Also many common patterns across all disciplines A new, Fourth Paradigm of science is emerging but it is not incremental.
Data-Intensive Statistical Challenges in Astrophysics
Data-Intensive Statistical Challenges in Astrophysics Collaborators: T. Budavari, C-W Yip (JHU), M. Mahoney (Stanford), I. Csabai, L. Dobos (Hungary) Alex Szalay The Johns Hopkins University The Age of
More informationCOMPUTATIONAL STATISTICS IN ASTRONOMY: NOW AND SOON
COMPUTATIONAL STATISTICS IN ASTRONOMY: NOW AND SOON / The Johns Hopkins University Recording Observations Astronomers drew it Now kids do it on the SkyServer #1 by Haley Multicolor Universe Eventful Universe
More informationMULTIPLE EXPOSURES IN LARGE SURVEYS
MULTIPLE EXPOSURES IN LARGE SURVEYS / Johns Hopkins University Big Data? Noisy Skewed Artifacts Big Data? Noisy Skewed Artifacts Serious Issues Significant fraction of catalogs is junk GALEX 50% PS1 3PI
More informationBAYESIAN CROSS-IDENTIFICATION IN ASTRONOMY
BAYESIAN CROSS-IDENTIFICATION IN ASTRONOMY / The Johns Hopkins University Sketch by William Parsons (1845) 2 Recording Observations Whirlpool Galaxy M51 Discovered by Charles Messier (1773) Multicolor
More informationAstroinformatics in the data-driven Astronomy
Astroinformatics in the data-driven Astronomy Massimo Brescia 2017 ICT Workshop Astronomy vs Astroinformatics Most of the initial time has been spent to find a common language among communities How astronomers
More informationPRACTICAL ANALYTICS 7/19/2012. Tamás Budavári / The Johns Hopkins University
PRACTICAL ANALYTICS / The Johns Hopkins University Statistics Of numbers Of vectors Of functions Of trees Statistics Description, modeling, inference, machine learning Bayesian / Frequentist / Pragmatist?
More informationThe SDSS is Two Surveys
The SDSS is Two Surveys The Fuzzy Blob Survey The Squiggly Line Survey The Site The telescope 2.5 m mirror Digital Cameras 1.3 MegaPixels $150 4.3 Megapixels $850 100 GigaPixels $10,000,000 CCDs CCDs:
More informationData analysis of massive data sets a Planck example
Data analysis of massive data sets a Planck example Radek Stompor (APC) LOFAR workshop, Meudon, 29/03/06 Outline 1. Planck mission; 2. Planck data set; 3. Planck data analysis plan and challenges; 4. Planck
More informationExploiting Sparse Non-Linear Structure in Astronomical Data
Exploiting Sparse Non-Linear Structure in Astronomical Data Ann B. Lee Department of Statistics and Department of Machine Learning, Carnegie Mellon University Joint work with P. Freeman, C. Schafer, and
More informationarxiv: v1 [astro-ph.im] 9 May 2013
Astron. Nachr. / AN Volume, No. Issue, 1 4 (212) / DOI DOI Photo-Met: a non-parametric method for estimating stellar metallicity from photometric observations Gyöngyi Kerekes 1, István Csabai 1, László
More informationDesign and implementation of the spectra reduction and analysis software for LAMOST telescope
Design and implementation of the spectra reduction and analysis software for LAMOST telescope A-Li Luo *a, Yan-Xia Zhang a and Yong-Heng Zhao a *a National Astronomical Observatories, Chinese Academy of
More informationSome Statistical Aspects of Photometric Redshift Estimation
Some Statistical Aspects of Photometric Redshift Estimation James Long November 9, 2016 1 / 32 Photometric Redshift Estimation Background Non Negative Least Squares Non Negative Matrix Factorization EAZY
More informationAstronomy of the Next Decade: From Photons to Petabytes. R. Chris Smith AURA Observatory in Chile CTIO/Gemini/SOAR/LSST
Astronomy of the Next Decade: From Photons to Petabytes R. Chris Smith AURA Observatory in Chile CTIO/Gemini/SOAR/LSST Classical Astronomy still dominates new facilities Even new large facilities (VLT,
More informationProbabilistic photometric redshifts in the era of Petascale Astronomy
Probabilistic photometric redshifts in the era of Petascale Astronomy Matías Carrasco Kind NCSA/Department of Astronomy University of Illinois at Urbana-Champaign Tools for Astronomical Big Data March
More informationQuantifying correlations between galaxy emission lines and stellar continua
Quantifying correlations between galaxy emission lines and stellar continua R. Beck, L. Dobos, C.W. Yip, A.S. Szalay and I. Csabai 2016 astro-ph: 1601.0241 1 Introduction / Technique Data Emission line
More informationAstroinformatics: massive data research in Astronomy Kirk Borne Dept of Computational & Data Sciences George Mason University
Astroinformatics: massive data research in Astronomy Kirk Borne Dept of Computational & Data Sciences George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Ever since humans first gazed
More informationCONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE
CONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE Copyright 2013, SAS Institute Inc. All rights reserved. Agenda (Optional) History Lesson 2015 Buzzwords Machine Learning for X Citizen Data
More informationThe SDSS Data. Processing the Data
The SDSS Data Processing the Data On a clear, dark night, light that has traveled through space for a billion years touches a mountaintop in southern New Mexico and enters the sophisticated instrumentation
More informationA SPEctra Clustering Tool for the exploration of large spectroscopic surveys. Philipp Schalldach (HU Berlin & TLS Tautenburg, Germany)
A SPEctra Clustering Tool for the exploration of large spectroscopic surveys Philipp Schalldach (HU Berlin & TLS Tautenburg, Germany) Working Group Helmut Meusinger (Tautenburg, Germany) Philipp Schalldach
More informationHUBBLE SOURCE CATALOG. Steve Lubow Space Telescope Science Institute June 6, 2012
HUBBLE SOURCE CATALOG Steve Lubow Space Telescope Science Institute June 6, 2012 HUBBLE SPACE TELESCOPE 2.4 m telescope launched in 1990 In orbit above Earth s atmosphere High resolution, faint sources
More informationAutomatic Unsupervised Spectral Classification of all SDSS/DR7 Galaxies
Automatic Unsupervised Spectral Classification of all SDSS/DR7 Galaxies J. Sánchez Almeida, J. A. L. Aguerri, C. Muñoz-Tuñón, A. de Vicente Instituto de Astrofisica de Canarias, Spain 2010, ApJ, 714, 487
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationReal Astronomy from Virtual Observatories
THE US NATIONAL VIRTUAL OBSERVATORY Real Astronomy from Virtual Observatories Robert Hanisch Space Telescope Science Institute US National Virtual Observatory About this presentation What is a Virtual
More informationData-driven models of stars
Data-driven models of stars David W. Hogg Center for Cosmology and Particle Physics, New York University Center for Data Science, New York University Max-Planck-Insitut für Astronomie, Heidelberg 2015
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationIntroduction to Machine Learning
10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what
More informationClassifying Galaxy Morphology using Machine Learning
Julian Kates-Harbeck, Introduction: Classifying Galaxy Morphology using Machine Learning The goal of this project is to classify galaxy morphologies. Generally, galaxy morphologies fall into one of two
More informationCOMPRESSED AND PENALIZED LINEAR
COMPRESSED AND PENALIZED LINEAR REGRESSION Daniel J. McDonald Indiana University, Bloomington mypage.iu.edu/ dajmcdon 2 June 2017 1 OBLIGATORY DATA IS BIG SLIDE Modern statistical applications genomics,
More informationData Release 5. Sky coverage of imaging data in the DR5
Data Release 5 The Sloan Digital Sky Survey has released its fifth Data Release (DR5). The spatial coverage of DR5 is about 20% larger than that of DR4. The photometric data in DR5 are based on five band
More informationBig Bang, Big Iron: CMB Data Analysis at the Petascale and Beyond
Big Bang, Big Iron: CMB Data Analysis at the Petascale and Beyond Julian Borrill Computational Cosmology Center, LBL & Space Sciences Laboratory, UCB with Christopher Cantalupo, Theodore Kisner, Radek
More informationPrecise measurement of the radial BAO scale in galaxy redshift surveys
Precise measurement of the radial BAO scale in galaxy redshift surveys David Alonso (Universidad Autónoma de Madrid - IFT) Based on the work arxiv:1210.6446 In collaboration with: E. Sánchez, F. J. Sánchez,
More informationCosmology of Photometrically- Classified Type Ia Supernovae
Cosmology of Photometrically- Classified Type Ia Supernovae Campbell et al. 2013 arxiv:1211.4480 Heather Campbell Collaborators: Bob Nichol, Chris D'Andrea, Mat Smith, Masao Sako and all the SDSS-II SN
More informationFast Hierarchical Clustering from the Baire Distance
Fast Hierarchical Clustering from the Baire Distance Pedro Contreras 1 and Fionn Murtagh 1,2 1 Department of Computer Science. Royal Holloway, University of London. 57 Egham Hill. Egham TW20 OEX, England.
More informationModern Image Processing Techniques in Astronomical Sky Surveys
Modern Image Processing Techniques in Astronomical Sky Surveys Items of the PhD thesis József Varga Astronomy MSc Eötvös Loránd University, Faculty of Science PhD School of Physics, Programme of Particle
More informationVirtual Sensors and Large-Scale Gaussian Processes
Virtual Sensors and Large-Scale Gaussian Processes Ashok N. Srivastava, Ph.D. Principal Investigator, IVHM Project Group Lead, Intelligent Data Understanding ashok.n.srivastava@nasa.gov Coauthors: Kamalika
More informationRLW paper titles:
RLW paper titles: http://www.wordle.net Astronomical Surveys and Data Archives Richard L. White Space Telescope Science Institute HiPACC Summer School, July 2012 Overview Surveys & catalogs: Fundamental
More informationDimensionality Reduction and Principle Components Analysis
Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture
More informationIntroduction to the Sloan Survey
Introduction to the Sloan Survey Title Rita Sinha IUCAA SDSS The SDSS uses a dedicated, 2.5-meter telescope on Apache Point, NM, equipped with two powerful special-purpose instruments. The 120-megapixel
More informationLarge-Scale Behavioral Targeting
Large-Scale Behavioral Targeting Ye Chen, Dmitry Pavlov, John Canny ebay, Yandex, UC Berkeley (This work was conducted at Yahoo! Labs.) June 30, 2009 Chen et al. (KDD 09) Large-Scale Behavioral Targeting
More informationStatistics of the Universe: Exa-calculations and Cosmology's Data Deluge
Statistics of the Universe: Exa-calculations and Cosmology's Data Deluge Debbie Bard Matt Bellis Cosmology: the study of the nature and history of the Universe History of Universe driven by competing forces:
More informationSPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM
SPATIAL DATA MINING Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM INTRODUCTION The main difference between data mining in relational DBS and in spatial DBS is that attributes of the neighbors
More information9.1 Large Scale Structure: Basic Observations and Redshift Surveys
9.1 Large Scale Structure: Basic Observations and Redshift Surveys Large-Scale Structure Density fluctuations evolve into structures we observe: galaxies, clusters, etc. On scales > galaxies, we talk about
More informationDictionary Learning for photo-z estimation
Dictionary Learning for photo-z estimation Joana Frontera-Pons, Florent Sureau, Jérôme Bobin 5th September 2017 - Workshop Dictionary Learning on Manifolds MOTIVATION! Goal : Measure the radial positions
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More informationNonlinear Data Transformation with Diffusion Map
Nonlinear Data Transformation with Diffusion Map Peter Freeman Ann Lee Joey Richards* Chad Schafer Department of Statistics Carnegie Mellon University www.incagroup.org * now at U.C. Berkeley t! " 3 3
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationStudying galaxies with the Sloan Digital Sky Survey
Studying galaxies with the Sloan Digital Sky Survey Laboratory exercise, Physics of Galaxies, Spring 2017 (Uppsala Universitet) by Beatriz Villarroel *** The Sloan Digital Sky Survey (SDSS) is the largest
More informationGaussian Process Modeling in Cosmology: The Coyote Universe. Katrin Heitmann, ISR-1, LANL
1998 2015 Gaussian Process Modeling in Cosmology: The Coyote Universe Katrin Heitmann, ISR-1, LANL In collaboration with: Jim Ahrens, Salman Habib, David Higdon, Chung Hsu, Earl Lawrence, Charlie Nakhleh,
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationCosmology & Culture. Lecture 4 Wednesday April 22, 2009 The Composition of the Universe, & The Cosmic Spheres of Time.
Cosmology & Culture Lecture 4 Wednesday April 22, 2009 The Composition of the Universe, & The Cosmic Spheres of Time UCSC Physics 80C Why do scientists take seriously the Double Dark cosmology, which says
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationDark Matter on Small Scales: Merging and Cosmogony. David W. Hogg New York University CCPP
Dark Matter on Small Scales: Merging and Cosmogony David W. Hogg New York University CCPP summary galaxy merger rates suggest growth of ~1 percent per Gyr galaxy evolution is over can we rule out CDM now
More informationVector Space Models. wine_spectral.r
Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components
More informationLearning an astronomical catalog of the visible universe through scalable Bayesian inference
Learning an astronomical catalog of the visible universe through scalable Bayesian inference Jeffrey Regier with the DESI Collaboration Department of Electrical Engineering and Computer Science UC Berkeley
More informationIntroductory Review on BAO
Introductory Review on BAO David Schlegel Lawrence Berkeley National Lab 1. What are BAO? How does it measure dark energy? 2. Current observations From 3-D maps From 2-D maps (photo-z) 3. Future experiments
More informationCaesar s Taxi Prediction Services
1 Caesar s Taxi Prediction Services Predicting NYC Taxi Fares, Trip Distance, and Activity Paul Jolly, Boxiao Pan, Varun Nambiar Abstract In this paper, we propose three models each predicting either taxi
More informationDot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics
Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Chengjie Qin 1 and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced June 29, 2017 Machine Learning (ML) Is
More informationPhotometric Redshifts with DAME
Photometric Redshifts with DAME O. Laurino, R. D Abrusco M. Brescia, G. Longo & DAME Working Group VO-Day... in Tour Napoli, February 09-0, 200 The general astrophysical problem Due to new instruments
More informationThe Sloan Digital Sky Survey. Sebastian Jester Experimental Astrophysics Group Fermilab
The Sloan Digital Sky Survey Sebastian Jester Experimental Astrophysics Group Fermilab SLOAN DIGITAL SKY SURVEY Sloan Digital Sky Survey Goals: 1. Image ¼ of sky in 5 bands 2. Measure parameters of objects
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More informationA Unified Theory of Photometric Redshifts. Tamás Budavári The Johns Hopkins University
A Unified Theory of Photometric Redshifts Tamás Budavári The Johns Hopkins University Haunting Questions Two classes of methods: empirical and template fitting Why are they so different? Anything fundamental?
More informationAstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis
AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Joint work with: Ian Foster: Univ. of
More informationRecent BAO observations and plans for the future. David Parkinson University of Sussex, UK
Recent BAO observations and plans for the future David Parkinson University of Sussex, UK Baryon Acoustic Oscillations SDSS GALAXIES CMB Comparing BAO with the CMB CREDIT: WMAP & SDSS websites FLAT GEOMETRY
More informationDark Energy. Cluster counts, weak lensing & Supernovae Ia all in one survey. Survey (DES)
Dark Energy Cluster counts, weak lensing & Supernovae Ia all in one survey Survey (DES) What is it? The DES Collaboration will build and use a wide field optical imager (DECam) to perform a wide area,
More informationLecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University
Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization
More informationAstronomy 102: Stars and Galaxies Review Exam 3
October 31, 2004 Name: Astronomy 102: Stars and Galaxies Review Exam 3 Instructions: Write your answers in the space provided; indicate clearly if you continue on the back of a page. No books, notes, or
More informationStatistics in Astronomy (II) applications. Shen, Shiyin
Statistics in Astronomy (II) applications Shen, Shiyin Contents I. Linear fitting: scaling relations I.5 correlations between parameters II. Luminosity distribution function Vmax VS maximum likelihood
More informationDimensionality Reduction
Lecture 5 1 Outline 1. Overview a) What is? b) Why? 2. Principal Component Analysis (PCA) a) Objectives b) Explaining variability c) SVD 3. Related approaches a) ICA b) Autoencoders 2 Example 1: Sportsball
More informationUnveiling the misteries of our Galaxy with Intersystems Caché
Unveiling the misteries of our Galaxy with Intersystems Caché Dr. Jordi Portell i de Mora on behalf of the Gaia team at the Institute for Space Studies of Catalonia (IEEC UB) and the Gaia Data Processing
More information9. High-level processing (astronomical data analysis)
Master ISTI / PARI / IV Introduction to Astronomical Image Processing 9. High-level processing (astronomical data analysis) André Jalobeanu LSIIT / MIV / PASEO group Jan. 2006 lsiit-miv.u-strasbg.fr/paseo
More informationCosmology on the Beach: Experiment to Cosmology
Image sky Select targets Design plug-plates Plug fibers Observe! Extract spectra Subtract sky spec. Cosmology on the Beach: Experiment to Cosmology Fit redshift Make 3-D map Test physics! David Schlegel!1
More informationA523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011
A523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Lecture 1 Organization:» Syllabus (text, requirements, topics)» Course approach (goals, themes) Book: Gregory, Bayesian
More informationToday. Lookback time. ASTR 1020: Stars & Galaxies. Astronomy Picture of the day. April 2, 2008
ASTR 1020: Stars & Galaxies April 2, 2008 Astronomy Picture of the day Reading: Chapter 21, sections 21.3. MasteringAstronomy Homework on Galaxies and Hubble s Law is due April 7 th. Weak Lensing Distorts
More information2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51
2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each
More informationAlgorithms for Higher Order Spatial Statistics
Algorithms for Higher Order Spatial Statistics István Szapudi Institute for Astronomy University of Hawaii Future of AstroComputing Conference, SDSC, Dec 16-17 Outline Introduction 1 Introduction 2 Random
More informationFrom quasars to dark energy Adventures with the clustering of luminous red galaxies
From quasars to dark energy Adventures with the clustering of luminous red galaxies Nikhil Padmanabhan 1 1 Lawrence Berkeley Labs 04-15-2008 / OSU CCAPP seminar N. Padmanabhan (LBL) Cosmology with LRGs
More informationPower spectrum exercise
Power spectrum exercise In this exercise, we will consider different power spectra and how they relate to observations. The intention is to give you some intuition so that when you look at a microwave
More informationAutomated Classification of HETDEX Spectra. Ted von Hippel (U Texas, Siena, ERAU)
Automated Classification of HETDEX Spectra Ted von Hippel (U Texas, Siena, ERAU) Penn State HETDEX Meeting May 19-20, 2011 Outline HETDEX with stable instrumentation and lots of data ideally suited to
More informationFrom DES to LSST. Transient Processing Goes from Hours to Seconds. Eric Morganson, NCSA LSST Time Domain Meeting Tucson, AZ May 22, 2017
From DES to LSST Transient Processing Goes from Hours to Seconds Eric Morganson, NCSA LSST Time Domain Meeting Tucson, AZ May 22, 2017 Hi, I m Eric Dr. Eric Morganson, Research Scientist, Nation Center
More informationLecture 3a: The Origin of Variational Bayes
CSC535: 013 Advanced Machine Learning Lecture 3a: The Origin of Variational Bayes Geoffrey Hinton The origin of variational Bayes In variational Bayes, e approximate the true posterior across parameters
More informationApplied Machine Learning for Design Optimization in Cosmology, Neuroscience, and Drug Discovery
Applied Machine Learning for Design Optimization in Cosmology, Neuroscience, and Drug Discovery Barnabas Poczos Machine Learning Department Carnegie Mellon University Machine Learning Technologies and
More information17 Random Projections and Orthogonal Matching Pursuit
17 Random Projections and Orthogonal Matching Pursuit Again we will consider high-dimensional data P. Now we will consider the uses and effects of randomness. We will use it to simplify P (put it in a
More informationDetecting the Unexpected
Detecting the Unexpected Discovery in the Era of Astronomically Big Data Insights from Space Telescope Science Institute s first Big Data conference Josh Peek, Chair Sarah Kendrew, C0-Chair SOC: Erik Tollerud,
More informationLocally-biased analytics
Locally-biased analytics You have BIG data and want to analyze a small part of it: Solution 1: Cut out small part and use traditional methods Challenge: cutting out may be difficult a priori Solution 2:
More informationClassification: The rest of the story
U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher
More informationMultistage Anslysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models
Multistage Anslysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models Xixi Yu Imperial College London xixi.yu16@imperial.ac.uk 23 October 2018 Xixi Yu (Imperial College London) Multistage
More informationDES Galaxy Clusters x Planck SZ Map. ASTR 448 Kuang Wei Nov 27
DES Galaxy Clusters x Planck SZ Map ASTR 448 Kuang Wei Nov 27 Origin of Thermal Sunyaev-Zel'dovich (tsz) Effect Inverse Compton Scattering Figure Courtesy to J. Carlstrom Observables of tsz Effect Decrease
More informationAUTOMATIC MORPHOLOGICAL CLASSIFICATION OF GALAXIES. 1. Introduction
AUTOMATIC MORPHOLOGICAL CLASSIFICATION OF GALAXIES ZSOLT FREI Institute of Physics, Eötvös University, Budapest, Pázmány P. s. 1/A, H-1117, Hungary; E-mail: frei@alcyone.elte.hu Abstract. Sky-survey projects
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality
More informationCurrent Status of Chinese Virtual Observatory
Current Status of Chinese Virtual Observatory Chenzhou Cui, Yongheng Zhao National Astronomical Observatories, Chinese Academy of Science, Beijing 100012, P. R. China Dec. 30, 2002 General Information
More informationIntroduction to Machine Learning. Introduction to ML - TAU 2016/7 1
Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger
More informationDistant galaxies: a future 25-m submm telescope
Distant galaxies: a future 25-m submm telescope Andrew Blain Caltech 11 th October 2003 Cornell-Caltech Workshop Contents Galaxy populations being probed Modes of investigation Continuum surveys Line surveys
More informationNeuroscience Introduction
Neuroscience Introduction The brain As humans, we can identify galaxies light years away, we can study particles smaller than an atom. But we still haven t unlocked the mystery of the three pounds of matter
More informationRandomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory
Randomized Selection on the GPU Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory High Performance Graphics 2011 August 6, 2011 Top k Selection on GPU Output the top k keys
More informationAstronomical imagers. ASTR320 Monday February 18, 2019
Astronomical imagers ASTR320 Monday February 18, 2019 Astronomical imaging Telescopes gather light and focus onto a focal plane, but don t make perfect images Use a camera to improve quality of images
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationGalaxy Formation Now and Then
Galaxy Formation Now and Then Matthias Steinmetz Astrophysikalisches Institut Potsdam 1 Overview The state of galaxy formation now The state of galaxy formation 10 years ago Extragalactic astronomy in
More informationCrurent and Future Massive Redshift Surveys
Crurent and Future Massive Redshift Surveys Jean-Paul Kneib LASTRO Light on the Dark Galaxy Power Spectrum 3D mapping of the position of galaxies non-gaussian initial fluctuations Size of the Horizon:
More informationData Preprocessing Tasks
Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can
More information