The changing landscape of astroinformatics
|
|
- Marilynn Stokes
- 5 years ago
- Views:
Transcription
1 The changing landscape of astroinformatics astrostatistics and Eric Feigelson Center for Astrosta2s2cs Penn State University IAU Symposium #325 Astroinforma6cs Sorrento IT
2 An aphorism The scien)st collects data in order to understand natural phenomena The sta)s)cian helps the scien)st acquire understanding from the data The computer scien)st helps the scien)st perform the needed calcula)ons (*) The individual proficient at all these tasks is a data scien)st (*) needed only for Big Data or Big Theory
3 Outline 1 Astronomers & stat/comp methodology: o Role of statistics in astronomy o Education gap o Resurgence of astrostatistics 2 The R statistical software environment
4 Statistical questions for a scientist How reliable are my results? (P=0.05 vs. P=0.003) Have I used the most effective procedures for achieving my results? (KS vs. AD test) Do my results depend on uncertain models? (power law?) Do my results depend on my chosen analysis path? A suggested path: Ø Nonparametric data analysis (including hypothesis tests, local regression, Fourier or wavelet transforms, etc) Ø Clustering & classification (if dataset is inhomogeneous) Ø Parametric analysis: maximum likelihood with model selection Ø Bayesian analysis: weight likelihood with prior knowledge Ø Discuss: What have I learned from each step along the path?
5 Recommended steps in the statistical analysis of scientific data The application of statistics can reliably quantify information embedded in scientific data and help adjudicate the relevance of theoretical models. But this is not a straightforward, mechanical enterprise. It requires: Ø exploration of the data Ø careful statement of the scientific problem Ø model formulation in mathematical form Ø choice of statistical method(s) Ø calculation of statistical quantities Ø judicious scientific evaluation of the results Astronomers often do not adequately pursue each step
6 Astronomy involves many branches of modern statistics Cosmology Statistics Galaxy clustering Spatial point processes, clustering Galaxy morphology Galaxy luminosity fn Power law relationships Regression, mixture models Gamma distribution Pareto distribution Weak lensing morphology Geostatistics, density estimation Strong lensing morphology Shape statistics Strong lensing timing Time series with lag Faint source detection False Discovery Rate SN Ia & QSO lightcurves Nonstationary time series CMB spatial analysis Markov fields, ICA, etc ΛCDM parameters Bayesian inference & model selection Comparing data & simulation Uncertainty quantification
7 The education gap Statistics Astronomers usually take 0-1 courses in statistical methods during their university/graduate education, in contrast to extensive training in physics and associated mathematics. Students are thirsty for statistics training. Computation Survey of ~1100 astronomers worldwide (Momcheva & Tollerud 2015) shows that 90% write software but only 8% receive substantial training in software development and 43% receive no training. Preferred languages are Python (67%), IDL (44%), C/C++ (37%), Fortran (28%),, Matlab (8%), R (6%, 3% in USA), julia (1%). No correlation with age. Considering the wide-spread use of R in other scientific fields, its popularity among astronomers is strikingly low. The result of weak training is a widespread ignorance of the myriad advances in modern statistical & computational methodology.
8 Consequently The state of astrostatistics today is often poor! Many astronomical studies are confined to a narrow suite of familiar statistical methods: Fourier transform for temporal analysis (Fourier 1807) Least squares regression (Legendre 1805, Pearson 1901) Kolmogorov-Smirnov goodness-of-fit test (Kolmogorov, 1933) Principal components analysis for tables (Hotelling 1936) Even traditional methods are sometimes misused see Beware the Kolmogorov-Smirnov test! page on ASAIP
9 Under-utilized methodology: modeling (MLE, EM Algorithm, BIC, bootstrap) multivariate classification (LDA, SVM, CART, RFs) time series (autoregressive models, state space models) spatial point processes (Ripley s K, kriging) nondetections (survival analysis) image analysis (computer vision methods, False Detection Rate) statistical computing (R) Advertisement Modern Statistical Methods for Astronomy with R Applications E. D. Feigelson & G. J. Babu, Cambridge Univ Press, 2012! "#$$%&!'()'!*+,-.!/01&2!34&!! 5%67!/67&4$489!:!;4684<4=9!544>!!
10 Recent resurgence in astrostatistics Papers in astronomical literature doubled to ~500/yr in past decade ( Methods: statistical papers in NASA-Smithsonian Astrophysics Data System) Short training courses (Penn State, India, Brazil, Greece, China, Italy, France, Germany, Spain, CfA, Caltech, Sweden, IAU/AAS/CASCA/ meetings) Cross-disciplinary research collaborations (Harvard/ICHASC, Carnegie-Mellon, Penn State, NASA-Ames/Stanford, CEA-Saclay/Stanford, Cornell, UC-Berkeley, Michigan, Imperial College London, Swinburne, ) Cross-disciplinary conferences (Statistical Challenges in Modern Astronomy, Astronomical Data Analysis , PhysStat, SAMSI 2006/2012, Astroinformatics ) Scholarly society working groups: Int l Stat Institute s Int l Astrostatistical Assn, Int l Astro Union Commission, Amer Astro Soc Working Group, Amer Stat Assn Interest Group, LSST Science Collaboration, IEEE Astro Data Miner Task Force
11 Textbooks New resources in astrostatistics Bayesian Logical Data Analysis for the Physical Sciences: A Compara:ve Approach with Mathema:ca Support Gregory, 2005 Prac:cal Sta:s:cs for Astronomers, Wall & Jenkins, 2 nd ed, 2012 Modern Sta:s:cal Methods for Astronomy with R Applica:on, Feigelson & Babu, 2012 Sta:s:cs, Data Mining, and Machine Learning in Astronomy: A Prac:cal Python Guide for the Analysis of Survey Data,, Ivecic, Connolly, VanderPlas & Gray, 2014 hop://asaip.psu.edu Recent papers, meetings, jobs, blogs, courses, forums,
12 Statistical analysis needs software A brief history of statistical computing 1960s c2000: Sta6s6cal analysis developed by academic sta6s6cians, but implemented by private companies (SAS, SPSS, etc) 1980s: John Chambers (ATT, USA)) develops S sta6s6cal package in the new C language 1990s: Ross Ihaka & Robert Gentleman (Univ Auckland NZ) mimic S in an open source system, R. R Core Development Team expands, GNU GPL release. 2000s: Comprehensive R Analysis Network (CRAN) for user- provided specialized packages grows exponen6ally. Important packages incorporated into base- R.
13 Growth of CRAN contributed packages Oct : 9373 packages (~5/day) ~200,000 functions See The Popularity of Data Analysis Software, R. A. Muenchen,
14 The R statistical computing environment R integrates data manipula6on, graphics and extensive sta6s6cal analysis. Uniform documenta6on and coding standards. Fully programmable C- like language, similar to IDL. Specializes in vector/matrix inputs. Same compiler speed as IDL, Python, Matlab. Easy binary download for Windows, Mac or linux. On- the- fly installa6on of CRAN packages. Quick 2- way communica6on with C, Fortran, Python. Emulator of Matlab user- provided add- on CRAN packages including ~20 for astronomy (FITSio, astrolibr, )
15 R s growing importance in data science Kdnuggests 2014 poll data analytics/mining SPSS R Posts on software forums 2013 Job trends from Indeed.com
16 R is now the world s premier statistical computing package developed by statisticians with community packages from all of the sciences extensively used in genomics, biostatistics, econometrics, ecology, social sci, etc Data scientists recommend both Python and R Usage of both is growing rapidly (
17 Some functionalities of base R arithme6c & linear algebra bootstrap resampling empirical distribu6on tests exploratory data analysis generalized linear modeling graphics robust sta6s6cs linear programming local and ridge regression max likelihood es6ma6on multivariate analysis multivariate clustering neural networks smoothing spatial point processes statistical distributions statistical tests survival analysis time series analysis
18 Selected methods in Comprehensive R Archive Network (CRAN) Bayesian computation & MCMC, classification & regression trees, genetic algorithms, geostatistical modeling, hidden Markov models, irregular time series, kernel-based machine learning, least-angle & lasso regression, likelihood ratios, map projections, mixture models & modelbased clustering, nonlinear least squares, multidimensional analysis, multimodality test, multivariate time series, multivariate outlier detection, neural networks, non-linear time series analysis, nonparametric multiple comparisons, omnibus tests for normality, orientation data, parallel coordinates plots, partial least squares, periodic autoregression analysis, principal curve fits, projection pursuit, quantile regression, random fields, Random Forest classification, ridge regression, robust regression, Self-Organizing Maps, shape analysis, space-time ecological analysis, spatial analyisis & kriging, spline regressions, tessellations, three-dimensional visualization, wavelet toolbox
19 spatstat: A large CRAN package useful for astronomy ~1500 sta6s6cal func6ons ~800 book describing methods w/ recipes Spa2al Point Pa?erns: Methodology and Applica2ons with R Adrian Baddeley et al D/3D, 1M pts for ploing, 100K exploratory, 10K modeling (on laptop) Smoothing loca6ons and mark variables Global spa6al autocorrela6on func6ons (PCF, K, F, G, J, L*) Tests for inhomogeneity & anisotropy Clustering & hypothesis tests for mul6type data MLE and Bayesian modeling, model residuals & valida6on Poisson, Gibbs, user- supplied models Tessella6ons
20 CRAN Task Views ( CRAN Task Views provide brief overviews of CRAN packages by topic & func6onality. Maintained be expert volunteers. Bayesian ~110 (incl. rbugs, rjags, rstan, MCMCpack, LaplacesDemon, boa) Chem/Phys ~75 packages (incl. 20 for astronomy) Cluster/Mixture ~100 packages Graphics ~40 packages HighPerfComp ~75 packages (parallelia6on, GPUs, etc) Machine Learning ~70 packages Medical imaging ~20 packages Robust ~50 packages Spa6al ~135 packages (incl. spatstat) Survival ~200 packages TimeSeries ~170 packages
21 High Performance Compu2ng with R While originally designed for an individual exploring small datasets, R can be pipelined and can treat megadatsets. Several dozen CRAN packages are devoted to high- performance compu6ng. o foreach func6on released in R by Revolu6on Analy6cs. for loop distributed among local cores o Packages for parallel compu2ng communica2on treat Message Passing Interface, networkspaces, SOCK, PVM, Open MP, o Process management on mul2core computers & clusters: o batch: assists mul6ple parameter simula6ons, permuta6ons, etc o snowfall: wrapper around snow incl. error checks for MPI & PVM o Distributed compu6ng of mul2ple MCMC chains: bugsparallel
22 o Several packages for cloud & grid processing o cloudrmpi: processing using MPI on Amazon s EC2 cloud o GridR: executes R func6ons on remote hosts, clusters or grids o rmr and RHIPE: executes R func6ons via MapReduce on a Hadoop cluster o Several packages treat large memory & out- of- memory data o bigmemory: points to very large objects in memory o ff: fast access to large data on disk o HadoopStreaming: R opera6ons on streaming tabular or string data o Several packages provide GPU compu2ng o gputools: CUDA implementa6on of linear modeling, clustering, SVM, ICA, o magma: matrix algebra for mul6core CPU & GPU architectures o OpenCL: R interface to OpenCL for heterogeneous CPU/GPU systems See CRAN Task View on High Performance Compu)ng hop://cran.r- project.org/web/views/highperformancecompu6ng.html
23 # Support Vector Machine model, predic2on and valida2on Classifica2on in R install.packages('e1071') ; library(e1071) SDSS_svm_mod <- svm(sdss_train[,5] ~.,data=sdss_train[,1:4],cost=100, gamma=1) summary(sdss_svm_mod) ; str(sdss_svm_mod) SDSS_svm_test_pred <- predict(sdss_svm_mod, SDSS_test) plot(sdss_test[,1], SDSS_test[,2], xlim=c(- 0.7,3), col=round(sdss_svm_test_pred), ylim=c(- 0.7,1.8), pch=20, cex=0.5, main='support Vector Machine', xlab='u- g (mag) ) dev.copy2pdf(file='sdss_svm_class.pdf') Support Vector Machine u g (mag) g r (mag) Classification of g r (mag) r i (mag) SDSS Test Dataset r i (mag) i z (mag) www2.astro.psu.edu/users/edf/san6ago_2016
24 Closing remarks Astronomy graduate curriculum needs a year of statistical and computational methodology. Textbook in astroinformatics needed. Some astronomers should get M.S. in statistics or computer science. All astronomers should know methodology at Wikipedia level. Astrostatistics and astroinformatics can become a well-funded, cross-disciplinary research field involving a few percent of astronomers (cf. astrochemists) promulgating modern methodology and pushing its frontiers. Astronomers can benefit from regular use of many methods coded in R & CRAN. R propels implementation of modern procedures with little effort. Experts should investigate utility of R/CRAN in HPC environments.
Astrostatistics: Past, Present and Future
Astrostatistics: Past, Present and Future Eric Feigelson Penn State University Summer School in Statistics for Astronomy June 2017 What is astronomy? Astronomy is the observational study of matter beyond
More informationAstrostatistics: Past, Present & Future
Astrostatistics: Past, Present & Future Eric Feigelson Penn State University Summer School in Statistics for Astronomers XI June 2015 What is astronomy? Astronomy is the observational study of matter beyond
More informationIntroduction to Astrostatistics and R
Introduction to Astrostatistics and R Eric Feigelson Penn State University Osservatorio Astrofisico di Arcetri April 2014 My credentials Professor of Astronomy & Astrophysics and of Statistics Assoc Director,
More informationIntroduction to Astrostatistics and R
Introduction to Astrostatistics and R Eric Feigelson Penn State University Cosmology on the Beach, 2014 Lecture #1 My credentials Professor of Astronomy & Astrophysics and of Statistics Assoc Director,
More informationThe Challenges of Heteroscedastic Measurement Error in Astronomical Data
The Challenges of Heteroscedastic Measurement Error in Astronomical Data Eric Feigelson Penn State Measurement Error Workshop TAMU April 2016 Outline 1. Measurement errors are ubiquitous in astronomy,
More informationMaster of Science in Statistics A Proposal
1 Master of Science in Statistics A Proposal Rationale of the Program In order to cope up with the emerging complexity on the solutions of realistic problems involving several phenomena of nature it is
More informationIntroduction to astrostatistics
Introduction to astrostatistics Eric Feigelson (Astro & Astrophys) & G. Jogesh Babu (Stat) Center for Astrostatistics Penn State University Astronomy & statistics: A glorious history Hipparchus (4th c.
More informationPrerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3
University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.
More informationParallelizing Gaussian Process Calcula1ons in R
Parallelizing Gaussian Process Calcula1ons in R Christopher Paciorek UC Berkeley Sta1s1cs Joint work with: Benjamin Lipshitz Wei Zhuo Prabhat Cari Kaufman Rollin Thomas UC Berkeley EECS (formerly) IBM
More informationComputational Biology Course Descriptions 12-14
Computational Biology Course Descriptions 12-14 Course Number and Title INTRODUCTORY COURSES BIO 311C: Introductory Biology I BIO 311D: Introductory Biology II BIO 325: Genetics CH 301: Principles of Chemistry
More informationStatistics Toolbox 6. Apply statistical algorithms and probability models
Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of
More informationSTATISTICAL COMPUTING USING R/S. John Fox McMaster University
STATISTICAL COMPUTING USING R/S John Fox McMaster University The S statistical programming language and computing environment has become the defacto standard among statisticians and has made substantial
More informationSurprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University
Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Outline Astroinformatics Example Application:
More informationData Analysis Methods
Data Analysis Methods Successes, Opportunities, and Challenges Chad M. Schafer Department of Statistics & Data Science Carnegie Mellon University cschafer@cmu.edu The Astro/Data Science Community LSST
More informationData Exploration vis Local Two-Sample Testing
Data Exploration vis Local Two-Sample Testing 0 20 40 60 80 100 40 20 0 20 40 Freeman, Kim, and Lee (2017) Astrostatistics at Carnegie Mellon CMU Astrostatistics Network Graph 2017 (not including collaborations
More informationUnsupervised clustering of COMBO-17 galaxy photometry
STScI Astrostatistics R tutorials Eric Feigelson (Penn State) November 2011 SESSION 2 Multivariate clustering and classification ***************** ***************** Unsupervised clustering of COMBO-17
More informationStatistical Searches in Astrophysics and Cosmology
Statistical Searches in Astrophysics and Cosmology Ofer Lahav University College London, UK Abstract We illustrate some statistical challenges in Astrophysics and Cosmology, in particular noting the application
More informationClassifica(on and predic(on omics style. Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University
Classifica(on and predic(on omics style Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University Classifica(on Learning Set Data with known classes Prediction Classification rule Data with unknown
More informationMul$variate clustering & classifica$on with R
Mul$variate clustering & classifica$on with R Eric Feigelson Penn State Cosmology on the Beach, 2014 Astronomical context Astronomers have constructed classifica@ons of celes@al objects for centuries:
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationCosmological N-Body Simulations and Galaxy Surveys
Cosmological N-Body Simulations and Galaxy Surveys Adrian Pope, High Energy Physics, Argonne Na3onal Laboratory, apope@anl.gov CScADS: Scien3fic Data and Analy3cs for Extreme- scale Compu3ng, 30 July 2012
More informationAstroinformatics in the data-driven Astronomy
Astroinformatics in the data-driven Astronomy Massimo Brescia 2017 ICT Workshop Astronomy vs Astroinformatics Most of the initial time has been spent to find a common language among communities How astronomers
More informationSTATISTICS-STAT (STAT)
Statistics-STAT (STAT) 1 STATISTICS-STAT (STAT) Courses STAT 158 Introduction to R Programming Credit: 1 (1-0-0) Programming using the R Project for the Statistical Computing. Data objects, for loops,
More informationDANIEL WILSON AND BEN CONKLIN. Integrating AI with Foundation Intelligence for Actionable Intelligence
DANIEL WILSON AND BEN CONKLIN Integrating AI with Foundation Intelligence for Actionable Intelligence INTEGRATING AI WITH FOUNDATION INTELLIGENCE FOR ACTIONABLE INTELLIGENCE in an arms race for artificial
More informationAdvanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland
EnviroInfo 2004 (Geneva) Sh@ring EnviroInfo 2004 Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland Mikhail Kanevski 1, Michel Maignan 1
More informationStatistical Analysis of fmrl Data
Statistical Analysis of fmrl Data F. Gregory Ashby The MIT Press Cambridge, Massachusetts London, England Preface xi Acronyms xv 1 Introduction 1 What Is fmri? 2 The Scanning Session 4 Experimental Design
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationCS 6140: Machine Learning Spring What We Learned Last Week 2/26/16
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Sign
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationarxiv: v1 [astro-ph.im] 20 Jan 2017
IAU Symposium 325 on Astroinformatics Proceedings IAU Symposium No. xxx, xxx A.C. Editor, B.D. Editor & C.E. Editor, eds. c xxx International Astronomical Union DOI: 00.0000/X000000000000000X Deep learning
More informationSPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA
SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA D. Pokrajac Center for Information Science and Technology Temple University Philadelphia, Pennsylvania A. Lazarevic Computer
More informationLectures in AstroStatistics: Topics in Machine Learning for Astronomers
Lectures in AstroStatistics: Topics in Machine Learning for Astronomers Jessi Cisewski Yale University American Astronomical Society Meeting Wednesday, January 6, 2016 1 Statistical Learning - learning
More informationChallenges and Methods for Massive Astronomical Data
Challenges and Methods for Massive Astronomical Data mini-workshop on Computational AstroStatistics The workshop is sponsored by CHASC/C-BAS, NSF grants DMS 09-07185 (HU) and DMS 09-07522 (UCI), and the
More informationContents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1
Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services
More informationCONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE
CONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE Copyright 2013, SAS Institute Inc. All rights reserved. Agenda (Optional) History Lesson 2015 Buzzwords Machine Learning for X Citizen Data
More informationIE598 Big Data Optimization Introduction
IE598 Big Data Optimization Introduction Instructor: Niao He Jan 17, 2018 1 A little about me Assistant Professor, ISE & CSL UIUC, 2016 Ph.D. in Operations Research, M.S. in Computational Sci. & Eng. Georgia
More informationStatistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R
Statistical Clustering of Vesicle Patterns Mirko Birbaumer Rmetrics Workshop 3th July 2008 1 / 23 Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R Mirko
More informationModeling radiocarbon in the Earth System. Radiocarbon summer school 2012
Modeling radiocarbon in the Earth System Radiocarbon summer school 2012 Outline Brief introduc9on to mathema9cal modeling Single pool models Mul9ple pool models Model implementa9on Parameter es9ma9on What
More informationCourse in Data Science
Course in Data Science About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst. The course gives an
More informationExploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data
Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data Jinzhong Wang April 13, 2016 The UBD Group Mobile and Social Computing Laboratory School of Software, Dalian University
More informationSta$s$cs in astronomy and the SAMSI ASTRO Program
Sta$s$cs in astronomy and the SAMSI ASTRO Program G. Jogesh Babu (with input from Eric Feigelson) Penn State Why astrosta$s$cs? Astronomers encounter a surprising variety of sta$s$cal problems in their
More informationALL OF NONPARAMETRIC STATISTICS SOLUTIONS
page 1 / 5 page 2 / 5 all of nonparametric statistics pdf Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples
More informationFrom statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu
From statistics to data science BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Why? How? What? How much? How many? Individual facts (quantities, characters, or symbols) The Data-Information-Knowledge-Wisdom
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationMul$variate clustering. Astro 585 Spring 2013
Mul$variate clustering Astro 585 Spring 2013 Astronomical context Astronomers have constructed classifica
More informationOutline Challenges of Massive Data Combining approaches Application: Event Detection for Astronomical Data Conclusion. Abstract
Abstract The analysis of extremely large, complex datasets is becoming an increasingly important task in the analysis of scientific data. This trend is especially prevalent in astronomy, as large-scale
More informationUncertainty quantification and visualization for functional random variables
Uncertainty quantification and visualization for functional random variables MascotNum Workshop 2014 S. Nanty 1,3 C. Helbert 2 A. Marrel 1 N. Pérot 1 C. Prieur 3 1 CEA, DEN/DER/SESI/LSMR, F-13108, Saint-Paul-lez-Durance,
More informationImproved Astronomical Inferences via Nonparametric Density Estimation
Improved Astronomical Inferences via Nonparametric Density Estimation Chad M. Schafer, InCA Group www.incagroup.org Department of Statistics Carnegie Mellon University Work Supported by NSF, NASA-AISR
More informationExploiting Sparse Non-Linear Structure in Astronomical Data
Exploiting Sparse Non-Linear Structure in Astronomical Data Ann B. Lee Department of Statistics and Department of Machine Learning, Carnegie Mellon University Joint work with P. Freeman, C. Schafer, and
More informationMachine Learning Overview
Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 2. Types of Information Processing Problems Solved 1. Regression
More informationStatistical Applications in the Astronomy Literature II Jogesh Babu. Center for Astrostatistics PennState University, USA
Statistical Applications in the Astronomy Literature II Jogesh Babu Center for Astrostatistics PennState University, USA 1 The likelihood ratio test (LRT) and the related F-test Protassov et al. (2002,
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationECE 3800 Probabilistic Methods of Signal and System Analysis
ECE 3800 Probabilistic Methods of Signal and System Analysis Dr. Bradley J. Bazuin Western Michigan University College of Engineering and Applied Sciences Department of Electrical and Computer Engineering
More informationManual for a computer class in ML
Manual for a computer class in ML November 3, 2015 Abstract This document describes a tour of Machine Learning (ML) techniques using tools in MATLAB. We point to the standard implementations, give example
More informationRelated Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM
Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru
More informationMining Digital Surveys for photo-z s and other things
Mining Digital Surveys for photo-z s and other things Massimo Brescia & Stefano Cavuoti INAF Astr. Obs. of Capodimonte Napoli, Italy Giuseppe Longo Dept of Physics Univ. Federico II Napoli, Italy Results
More informationAstronomical time series (and more on R) Eric Feigelson. 3rd INPE Advanced School in Astrophysics: Astrostatistics 2009
Astronomical time series (and more on R) Eric Feigelson 3rd INPE Advanced School in Astrophysics: Astrostatistics 2009 Outline 1 Time series in astronomy 2 Frequency domain methods 3 More about R 4 R in
More informationLatent Dirichlet Alloca/on
Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which
More informationChart types and when to use them
APPENDIX A Chart types and when to use them Pie chart Figure illustration of pie chart 2.3 % 4.5 % Browser Usage for April 2012 18.3 % 38.3 % Internet Explorer Firefox Chrome Safari Opera 35.8 % Pie chart
More informationContinuous Machine Learning
Continuous Machine Learning Kostiantyn Bokhan, PhD Project Lead at Samsung R&D Ukraine Kharkiv, October 2016 Agenda ML dev. workflows ML dev. issues ML dev. solutions Continuous machine learning (CML)
More informationAstronomy of the Next Decade: From Photons to Petabytes. R. Chris Smith AURA Observatory in Chile CTIO/Gemini/SOAR/LSST
Astronomy of the Next Decade: From Photons to Petabytes R. Chris Smith AURA Observatory in Chile CTIO/Gemini/SOAR/LSST Classical Astronomy still dominates new facilities Even new large facilities (VLT,
More informationINTENSIVE COMPUTATION. Annalisa Massini
INTENSIVE COMPUTATION Annalisa Massini 2015-2016 Course topics The course will cover topics that are in some sense related to intensive computation: Matlab (an introduction) GPU (an introduction) Sparse
More informationMULTIPLE EXPOSURES IN LARGE SURVEYS
MULTIPLE EXPOSURES IN LARGE SURVEYS / Johns Hopkins University Big Data? Noisy Skewed Artifacts Big Data? Noisy Skewed Artifacts Serious Issues Significant fraction of catalogs is junk GALEX 50% PS1 3PI
More informationA Course on Advanced Econometrics
A Course on Advanced Econometrics Yongmiao Hong The Ernest S. Liu Professor of Economics & International Studies Cornell University Course Introduction: Modern economies are full of uncertainties and risk.
More informationSummary Report: MAA Program Study Group on Computing and Computational Science
Summary Report: MAA Program Study Group on Computing and Computational Science Introduction and Themes Henry M. Walker, Grinnell College (Chair) Daniel Kaplan, Macalester College Douglas Baldwin, SUNY
More informationModeling User s Cognitive Dynamics in Information Access and Retrieval using Quantum Probability (ESR-6)
Modeling User s Cognitive Dynamics in Information Access and Retrieval using Quantum Probability (ESR-6) SAGAR UPRETY THE OPEN UNIVERSITY, UK QUARTZ Mid-term Review Meeting, Padova, Italy, 07/12/2018 Self-Introduction
More informationBig Data Inference. Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts. Josh Speagle 1,
ICHASC, Apr. 25 2017 Big Data Inference Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts Josh Speagle 1, jspeagle@cfa.harvard.edu In collaboration with: Boris Leistedt
More informationAn Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology. Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University
An Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University Why Do We Care? Necessity in today s labs Principled approach:
More informationGenerative Model (Naïve Bayes, LDA)
Generative Model (Naïve Bayes, LDA) IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University Materials from Prof. Jia Li, sta3s3cal learning book (Has3e et al.), and machine learning
More informationCollaborative topic models: motivations cont
Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.
More informationAstroinformatics: massive data research in Astronomy Kirk Borne Dept of Computational & Data Sciences George Mason University
Astroinformatics: massive data research in Astronomy Kirk Borne Dept of Computational & Data Sciences George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Ever since humans first gazed
More informationBig Computing in High Energy Physics. David Toback Department of Physics and Astronomy Mitchell Institute for Fundamental Physics and Astronomy
Big Computing in High Energy Physics Big Data Workshop Department of Physics and Astronomy Mitchell Institute for Fundamental Physics and Astronomy October 2011 Outline Particle Physics and Big Computing
More information2005 ICPSR SUMMER PROGRAM REGRESSION ANALYSIS III: ADVANCED METHODS
2005 ICPSR SUMMER PROGRAM REGRESSION ANALYSIS III: ADVANCED METHODS William G. Jacoby Department of Political Science Michigan State University June 27 July 22, 2005 This course will take a modern, data-analytic
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationPackage BNN. February 2, 2018
Type Package Package BNN February 2, 2018 Title Bayesian Neural Network for High-Dimensional Nonlinear Variable Selection Version 1.0.2 Date 2018-02-02 Depends R (>= 3.0.2) Imports mvtnorm Perform Bayesian
More informationModels to carry out inference vs. Models to mimic (spatio-temporal) systems 5/5/15
Models to carry out inference vs. Models to mimic (spatio-temporal) systems 5/5/15 Ring-Shaped Hotspot Detection: A Summary of Results, IEEE ICDM 2014 (w/ E. Eftelioglu et al.) Where is a crime source?
More informationA CUDA Solver for Helmholtz Equation
Journal of Computational Information Systems 11: 24 (2015) 7805 7812 Available at http://www.jofcis.com A CUDA Solver for Helmholtz Equation Mingming REN 1,2,, Xiaoguang LIU 1,2, Gang WANG 1,2 1 College
More informationMapReduce in Spark. Krzysztof Dembczyński. Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland
MapReduce in Spark Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second semester
More informationTable of Contents. Multivariate methods. Introduction II. Introduction I
Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation
More informationStatistícal Methods for Spatial Data Analysis
Texts in Statistícal Science Statistícal Methods for Spatial Data Analysis V- Oliver Schabenberger Carol A. Gotway PCT CHAPMAN & K Contents Preface xv 1 Introduction 1 1.1 The Need for Spatial Analysis
More informationEconometric Analysis of Network Data. Georgetown Center for Economic Practice
Econometric Analysis of Network Data Georgetown Center for Economic Practice May 8th and 9th, 2017 Course Description This course will provide an overview of econometric methods appropriate for the analysis
More informationLanguage GTC April 7, These slides at Data Science Applications of GPUs in the R.
Data Science April 7, 2016 These slides at http://heather.cs.ucdavis.edu/gtc.pdf Why R? Why R? The lingua franca for the data science community. (R-Python-Julia battle looming?) Statistically Correct:
More informationPROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY
PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY Arto Klami Adapted from my talk in AIHelsinki seminar Dec 15, 2016 1 MOTIVATING INTRODUCTION Most of the artificial intelligence success stories
More informationReport and Opinion 2016;8(6) Analysis of bivariate correlated data under the Poisson-gamma model
Analysis of bivariate correlated data under the Poisson-gamma model Narges Ramooz, Farzad Eskandari 2. MSc of Statistics, Allameh Tabatabai University, Tehran, Iran 2. Associate professor of Statistics,
More informationregression modelling wih spatial pdf Beginner's Guide to Spatial, Temporal and Spatial-Temporal
DOWNLOAD OR READ : REGRESSION MODELLING WIH SPATIAL AND SPATIAL TEMPORAL DATA A BAYESIAN APPROACHBAYESIAN STATISTICS AND MARKETINGAPPLIED BAYESIAN FORECASTING AND TIME SERIES ANALYSIS PDF EBOOK EPUB MOBI
More informationAdvanced Computing Systems for Scientific Research
Undergraduate Review Volume 10 Article 13 2014 Advanced Computing Systems for Scientific Research Jared Buckley Jason Covert Talia Martin Recommended Citation Buckley, Jared; Covert, Jason; and Martin,
More informationTutorial on Probabilistic Programming with PyMC3
185.A83 Machine Learning for Health Informatics 2017S, VU, 2.0 h, 3.0 ECTS Tutorial 02-04.04.2017 Tutorial on Probabilistic Programming with PyMC3 florian.endel@tuwien.ac.at http://hci-kdd.org/machine-learning-for-health-informatics-course
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationData Intensive Computing meets High Performance Computing
Data Intensive Computing meets High Performance Computing Kathy Yelick Associate Laboratory Director for Computing Sciences, Lawrence Berkeley National Laboratory Professor of Electrical Engineering and
More informationECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications
ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications Yongmiao Hong Department of Economics & Department of Statistical Sciences Cornell University Spring 2019 Time and uncertainty
More informationRick Ebert & Joseph Mazzarella For the NED Team. Big Data Task Force NASA, Ames Research Center 2016 September 28-30
NED Mission: Provide a comprehensive, reliable and easy-to-use synthesis of multi-wavelength data from NASA missions, published catalogs, and the refereed literature, to enhance and enable astrophysical
More informationSTAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01
STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist
More informationStatistical Methods for Astronomy
Statistical Methods for Astronomy If your experiment needs statistics, you ought to have done a better experiment. -Ernest Rutherford Lecture 1 Lecture 2 Why do we need statistics? Definitions Statistical
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationBiology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:
Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week: Course general information About the course Course objectives Comparative methods: An overview R as language: uses and
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationECE 3800 Probabilistic Methods of Signal and System Analysis
ECE 3800 Probabilistic Methods of Signal and System Analysis Dr. Bradley J. Bazuin Western Michigan University College of Engineering and Applied Sciences Department of Electrical and Computer Engineering
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationHigh-Performance Scientific Computing
High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org
More information