Learning an astronomical catalog of the visible universe through scalable Bayesian inference

Size: px
Start display at page:

Download "Learning an astronomical catalog of the visible universe through scalable Bayesian inference"

Transcription

1 Learning an astronomical catalog of the visible universe through scalable Bayesian inference Jeffrey Regier with the DESI Collaboration Department of Electrical Engineering and Computer Science UC Berkeley December 9, 2016

2 The DESI Collaboration Jeff Regier Ryan Giordano Steve Howard Jon McAuliffe Michael Jordan Andy Miller Ryan Adams Jarrett Revels Andreas Jensen Kiran Pamnany Debbie Bard Rollin Thomas Dustin Lang David Schlegel Prabhat 2 / 31

3 An astronomical image An image from the Sloan Digital Sky Survey covering roughly one quarter square degree of the sky. 3 / 31

4 Faint light sources Most light sources are near the detection limit. 4 / 31

5 Outline 1. our graphical model for astronomical images (Celeste) 2. scaling approximate posterior inference to catalog the visible universe 3. model extensions 5 / 31

6 The Celeste graphical model 6 / 31

7 Scientific color priors 7 / 31

8 Galaxies: light-density model The light density for galaxy s is modeled as mixture of two extremal galaxy prototypes: h s (w) = θ s h s1 (w) + (1 θ s )h s0 (w). Each prototype (i = 0 or i = 1) is a mixture of bivariate normal distributions: J h si (w) = η ij φ (w; µ s, ν ij Q s ). j=1 An elliptical galaxy, θ s = 0 Shared covariance matrix Q s accounts for the scale σ s, rotation ϕ s, and axis ratio ρ s. A spiral galaxy, θ s = 1 8 / 31

9 Idealized sky view The brightness for sky position w is S G b (w) = l sb g s (w) s=1 where g s (w) = { 1 {µ s = w}, if a s = 0 ( star ) h s (w), if a s = 1 ( galaxy ). 9 / 31

10 Astronomical images Images differ from the idealized sky view due to 1. pixelation and point spread K f nbm (w) = ᾱ nbk φ ( w m ; w + ξ ) nbk, τ nbk k=1 G nbm = G b f nbm 2. background radiation and calibration F nbm = ι nb [ɛ nb + G nbm ] 3. finite exposure duration x nbm (a s, r s, c s ) S s=1 Poisson (F nbm) 10 / 31

11 Intractable posterior Let Θ = (a s, r s, c s ) S s=1. The posterior on Θ is intractable because of coupling between the sources: p(θ x) = p(x Θ)p(Θ) p(x) and p(x) = = p(x Θ)p(Θ) dθ N B M p(x nbm Θ)p(Θ) dθ. n=1 b=1 m=1 11 / 31

12 Variational inference Variational inference approximates the exact posterior p with a simpler distribution q Q. 12 / 31

13 Variational inference for Celeste An approximating distribution that factorizes across light sources (a structured mean-field assumption) makes most expectations tractable: q(θ) = S q(θ s ). s=1 The delta method for moments approximates the remaining expectations. Existing catalogs provide good initial settings for the variational parameters. Light sources are unlikely to contribute photons to distant pixels. The model contains an auxiliary variable indicating the mixture component that generated each source s colors. Newton s method converges in tens of iterations. 13 / 31

14 Validation from Stripe 82 Photo Celeste position missed gals 23 / / 421 missed stars 10 / / 421 color u-g color g-r color r-i color i-z brightness profile axis ratio scale angle Average error. Lower is better. Highlight scores are more than 2 standard deviations better. 14 / 31

15 Scaling inference to the visibile universe

16 The setting Big data Sloan Digital Sky Survey: 55 TB of images; hundreds of millions of stars and galaxies Large Synoptic Survey (2019): 15 TB of images nightly Fancy hardware Cori supercomputer, Phase 1: 1,630 nodes, each with 32 cores. Cori supercomputer, Phase 2: 9,300 nodes, each with 272 hardware threads. 30 teraflops. 1.5 PB array of SSDs 16 / 31

17 Julia programming language high-level syntax as fast as C++ (when necessary) a single language for hotspots and the rest multi-threading (experimental), not just multi-processing 17 / 31

18 Fast serial optimization algorithm analytic expectations (and one in delta method for moments) block coordinate ascent Newton steps rather than L-BFGS or a first-order method manually coded gradients and Hessians 3x overhead from computing exact Hessians 18 / 31

19 Parallelism among nodes A region of the sky, shown twice, divided into 25 overlapping boxes: A1,...,A9 and B1,...,B16. Each box corresponds to a task: to optimize all the light sources within its boundaries. 19 / 31

20 Parallelism among threads Light sources that do not overlap may be updated concurrently. 20 / 31

21 Parallelism among threads Light sources that do not overlap may be updated concurrently. [1] Pan, Xinghao, et al. CYCLADES: Conflict-free Asynchronous Machine Learning. NIPS / 31

22 Weak and strong scaling Celeste light sources/second. We observe perfect scaling up to 64 nodes. The we are limited by interconnect bandwidth. 21 / 31

23 Hero run: catalog of the Sloan Digital Sky Survey 512 nodes 32 cores/node = 16,384 cores 16,384 cores 16 hours = 250,000 core hours input: 55 TB of astronomical images output: catalog of 250 million light sources 22 / 31

24 A deep generative model for galaxies

25 A deep generative model for galaxies Example z n = [0.1, 0.5, 0.2, 0.1] f µ (z n ) = f σ (z n ) = z n N (0, I) x n z n N (f µ (z), f σ (z)) x n = 24 / 31

26 Autoencoder architecture 25 / 31

27 Sample fits Each row corresponds to a different example from a test set. The left column shows the input x. The center column shows the output f µ(z) for a z sampled from N (g µ(x), g σ(x)). The right column shows the output f σ(z) for the same z. 26 / 31

28 Second-order stochastic variational inference

29 Second-order Stochastic Variational Inference Require: ω is the initial vector of variational parameters; δ (δ min, δ max ) is the initial trust-region radius; γ > 1 is the trust region expansion factor; and η 1 (0, 1) and η 2 > 0 are constants. 1: for i 1 to M do 2: Sample e 1,..., e N iid from base distribution ɛ. 3: g ν ˆL(ν; e 1,..., e N ) ω 4: H 2 ˆL(ν; ν e 1,..., e N ) ω 5: ω arg max ν {g ν + ν Hν : ν δ} non-convex quadratic optimization 6: β g ω + ω Hω the expected improvement 7: Sample e 1,..., e N iid from base distribution ɛ. 8: α ˆL(ω ; e 1,..., e N ) ˆL(ω; e 1,..., e N ) the observed improvement 9: if α/β > η 1 and g η 2 δ then 10: ω ω 11: δ max(γδ, δ max ) 12: else 13: δ δ/γ 14: if δ < δ min or i = M then return ω 28 / 31

30 Second-order Stochastic Variational Inference Require: ω is the initial vector of variational parameters; δ (δ min, δ max ) is the initial trust-region radius; γ > 1 is the trust region expansion factor; and η 1 (0, 1) and η 2 > 0 are constants. 1: for i 1 to M do 2: Sample e 1,..., e N iid from base distribution ɛ. 3: g ν ˆL(ν; e 1,..., e N ) ω 4: H 2 ˆL(ν; ν e 1,..., e N ) ω 5: ω arg max ν {g ν + ν Hν : ν δ} non-convex quadratic optimization 6: β g ω + ω Hω the expected improvement 7: Sample e 1,..., e N iid from base distribution ɛ. 8: α ˆL(ω ; e 1,..., e N ) ˆL(ω; e 1,..., e N ) the observed improvement 9: if α/β > η 1 and g η 2 δ then 10: ω ω 11: δ max(γδ, δ max ) 12: else 13: δ δ/γ 14: if δ < δ min or i = M then return ω 68X fewer iterations than ADVI/SGD. 28 / 31

31 Case study: empirical convergence rates 29 / 31

32 Open questions How do we more accurately approximate the posterior distribution? normalizing flows without an encoder network hybrid VI/MCMC linear response variational Bayes (LRVB) How do we model a spatially varying point spread function (PSF)? Variational autoencoders fit independent PSFs well. But it isn t easy to account for dependence among the PSFs of nearby astronomical objects. How can we easily account for the details we don t yet account for? cosmic rays, airplanes, and satelights camera saturation (censoring) and imaging artifacts terrestrial vs extraterestrial background radiation transient events: supernovae, exoplanets, and near-earth asteroids additional imaging datasets, spectrographic datasets, calibration datasets 30 / 31

33 Thank you! 31 / 31

Celeste: Variational inference for a generative model of astronomical images

Celeste: Variational inference for a generative model of astronomical images Celeste: Variational inference for a generative model of astronomical images Jerey Regier Statistics Department UC Berkeley July 9, 2015 Joint work with Jon McAulie (UCB Statistics), Andrew Miller, Ryan

More information

Celeste: Scalable variational inference for a generative model of astronomical images

Celeste: Scalable variational inference for a generative model of astronomical images Celeste: Scalable variational inference for a generative model of astronomical images Jeffrey Regier 1, Brenton Partridge 2, Jon McAuliffe 1, Ryan Adams 2, Matt Hoffman 3, Dustin Lang 4, David Schlegel

More information

Celeste: Variational inference for a generative model of astronomical images

Celeste: Variational inference for a generative model of astronomical images : Variational inference for a generative model of astronomical images Jeffrey Regier, University of California, Berkeley Andrew Miller, Harvard University Jon McAuliffe, University of California, Berkeley

More information

Streaming Variational Bayes

Streaming Variational Bayes Streaming Variational Bayes Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael I. Jordan UC Berkeley Discussion led by Miao Liu September 13, 2013 Introduction The SDA-Bayes Framework

More information

VCMC: Variational Consensus Monte Carlo

VCMC: Variational Consensus Monte Carlo VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object

More information

Two Useful Bounds for Variational Inference

Two Useful Bounds for Variational Inference Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Overlapping Astronomical Sources: Utilizing Spectral Information

Overlapping Astronomical Sources: Utilizing Spectral Information Overlapping Astronomical Sources: Utilizing Spectral Information David Jones Advisor: Xiao-Li Meng Collaborators: Vinay Kashyap (CfA) and David van Dyk (Imperial College) CHASC Astrostatistics Group April

More information

Image Processing in Astronomy: Current Practice & Challenges Going Forward

Image Processing in Astronomy: Current Practice & Challenges Going Forward Image Processing in Astronomy: Current Practice & Challenges Going Forward Mario Juric University of Washington With thanks to Andy Connolly, Robert Lupton, Ian Sullivan, David Reiss, and the LSST DM Team

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Latent Variable Models

Latent Variable Models Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017

COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING

More information

Astronomical image reduction using the Tractor

Astronomical image reduction using the Tractor the Tractor DECaLS Fin Astronomical image reduction using the Tractor Dustin Lang McWilliams Postdoc Fellow Carnegie Mellon University visiting University of Waterloo UW / 2015-03-31 1 Astronomical image

More information

Algorithms for Variational Learning of Mixture of Gaussians

Algorithms for Variational Learning of Mixture of Gaussians Algorithms for Variational Learning of Mixture of Gaussians Instructors: Tapani Raiko and Antti Honkela Bayes Group Adaptive Informatics Research Center 28.08.2008 Variational Bayesian Inference Mixture

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Nov 2, 2016 Outline SGD-typed algorithms for Deep Learning Parallel SGD for deep learning Perceptron Prediction value for a training data: prediction

More information

Contrastive Divergence

Contrastive Divergence Contrastive Divergence Training Products of Experts by Minimizing CD Hinton, 2002 Helmut Puhr Institute for Theoretical Computer Science TU Graz June 9, 2010 Contents 1 Theory 2 Argument 3 Contrastive

More information

Photometric Techniques II Data analysis, errors, completeness

Photometric Techniques II Data analysis, errors, completeness Photometric Techniques II Data analysis, errors, completeness Sergio Ortolani Dipartimento di Astronomia Universita di Padova, Italy. The fitting technique assumes the linearity of the intensity values

More information

Auto-Encoding Variational Bayes

Auto-Encoding Variational Bayes Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower

More information

Variational Autoencoder

Variational Autoencoder Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational

More information

Hierarchical Bayesian Modeling

Hierarchical Bayesian Modeling Hierarchical Bayesian Modeling Making scientific inferences about a population based on many individuals Angie Wolfgang NSF Postdoctoral Fellow, Penn State Astronomical Populations Once we discover an

More information

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

Hubble s Law and the Cosmic Distance Scale

Hubble s Law and the Cosmic Distance Scale Lab 7 Hubble s Law and the Cosmic Distance Scale 7.1 Overview Exercise seven is our first extragalactic exercise, highlighting the immense scale of the Universe. It addresses the challenge of determining

More information

an introduction to bayesian inference

an introduction to bayesian inference with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Topic 3: Neural Networks

Topic 3: Neural Networks CS 4850/6850: Introduction to Machine Learning Fall 2018 Topic 3: Neural Networks Instructor: Daniel L. Pimentel-Alarcón c Copyright 2018 3.1 Introduction Neural networks are arguably the main reason why

More information

Probabilistic Graphical Models & Applications

Probabilistic Graphical Models & Applications Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

Model Checking and Improvement

Model Checking and Improvement Model Checking and Improvement Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Model Checking All models are wrong but some models are useful George E. P. Box So far we have looked at a number

More information

Probabilistic Catalogs and beyond...

Probabilistic Catalogs and beyond... Department of Statistics The University of Auckland https://www.stat.auckland.ac.nz/ brewer/ Probabilistic catalogs So, what is a probabilistic catalog? And what s beyond? Finite Mixture Models Finite

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Intro to Probability. Andrei Barbu

Intro to Probability. Andrei Barbu Intro to Probability Andrei Barbu Some problems Some problems A means to capture uncertainty Some problems A means to capture uncertainty You have data from two sources, are they different? Some problems

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

More information

Cosmology with the Sloan Digital Sky Survey Supernova Search. David Cinabro

Cosmology with the Sloan Digital Sky Survey Supernova Search. David Cinabro Cosmology with the Sloan Digital Sky Survey Supernova Search David Cinabro Cosmology Background The study of the origin and evolution of the Universe. First real effort by Einstein in 1916 Gravity is all

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 9: Expectation Maximiation (EM) Algorithm, Learning in Undirected Graphical Models Some figures courtesy

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 10-708 Probabilistic Graphical Models Homework 3 (v1.1.0) Due Apr 14, 7:00 PM Rules: 1. Homework is due on the due date at 7:00 PM. The homework should be submitted via Gradescope. Solution to each problem

More information

Understanding Covariance Estimates in Expectation Propagation

Understanding Covariance Estimates in Expectation Propagation Understanding Covariance Estimates in Expectation Propagation William Stephenson Department of EECS Massachusetts Institute of Technology Cambridge, MA 019 wtstephe@csail.mit.edu Tamara Broderick Department

More information

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

More information

Regression with Numerical Optimization. Logistic

Regression with Numerical Optimization. Logistic CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204

More information

Notes on Machine Learning for and

Notes on Machine Learning for and Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori

More information

Hierarchical Bayesian Modeling

Hierarchical Bayesian Modeling Hierarchical Bayesian Modeling Making scientific inferences about a population based on many individuals Angie Wolfgang NSF Postdoctoral Fellow, Penn State Astronomical Populations Once we discover an

More information

Online Algorithms for Sum-Product

Online Algorithms for Sum-Product Online Algorithms for Sum-Product Networks with Continuous Variables Priyank Jaini Ph.D. Seminar Consistent/Robust Tensor Decomposition And Spectral Learning Offline Bayesian Learning ADF, EP, SGD, oem

More information

Probabilistic Reasoning in Deep Learning

Probabilistic Reasoning in Deep Learning Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

X-ray observations of neutron stars and black holes in nearby galaxies

X-ray observations of neutron stars and black holes in nearby galaxies X-ray observations of neutron stars and black holes in nearby galaxies Andreas Zezas Harvard-Smithsonian Center for Astrophysics The lives of stars : fighting against gravity Defining parameter : Mass

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

Homework on Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, April 8 30 points Prof. Rieke & TA Melissa Halford

Homework on Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, April 8 30 points Prof. Rieke & TA Melissa Halford Homework on Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, April 8 30 points Prof. Rieke & TA Melissa Halford You are going to work with some famous astronomical data in this homework.

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Homework #7: Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, October points Profs. Rieke

Homework #7: Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, October points Profs. Rieke Homework #7: Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, October 31 30 points Profs. Rieke You are going to work with some famous astronomical data in this homework. The image data

More information

Efficient Likelihood-Free Inference

Efficient Likelihood-Free Inference Efficient Likelihood-Free Inference Michael Gutmann http://homepages.inf.ed.ac.uk/mgutmann Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh 8th November 2017

More information

The SDSS is Two Surveys

The SDSS is Two Surveys The SDSS is Two Surveys The Fuzzy Blob Survey The Squiggly Line Survey The Site The telescope 2.5 m mirror Digital Cameras 1.3 MegaPixels $150 4.3 Megapixels $850 100 GigaPixels $10,000,000 CCDs CCDs:

More information

Data Processing in DES

Data Processing in DES Data Processing in DES Brian Yanny Oct 28, 2016 http://data.darkenergysurvey.org/fnalmisc/talk/detrend.p Basic Signal-to-Noise calculation in astronomy: Assuming a perfect atmosphere (fixed PSF of p arcsec

More information

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Afternoon Meeting on Bayesian Computation 2018 University of Reading Gabriele Abbati 1, Alessra Tosi 2, Seth Flaxman 3, Michael A Osborne 1 1 University of Oxford, 2 Mind Foundry Ltd, 3 Imperial College London Afternoon Meeting on Bayesian Computation 2018 University of

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15

More information

Basics of Photometry

Basics of Photometry Basics of Photometry Photometry: Basic Questions How do you identify objects in your image? How do you measure the flux from an object? What are the potential challenges? Does it matter what type of object

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

UNSUPERVISED LEARNING

UNSUPERVISED LEARNING UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training

More information

Linear and logistic regression

Linear and logistic regression Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

MULTIPLE EXPOSURES IN LARGE SURVEYS

MULTIPLE EXPOSURES IN LARGE SURVEYS MULTIPLE EXPOSURES IN LARGE SURVEYS / Johns Hopkins University Big Data? Noisy Skewed Artifacts Big Data? Noisy Skewed Artifacts Serious Issues Significant fraction of catalogs is junk GALEX 50% PS1 3PI

More information

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

Counterfactuals via Deep IV. Matt Taddy (Chicago + MSR) Greg Lewis (MSR) Jason Hartford (UBC) Kevin Leyton-Brown (UBC)

Counterfactuals via Deep IV. Matt Taddy (Chicago + MSR) Greg Lewis (MSR) Jason Hartford (UBC) Kevin Leyton-Brown (UBC) Counterfactuals via Deep IV Matt Taddy (Chicago + MSR) Greg Lewis (MSR) Jason Hartford (UBC) Kevin Leyton-Brown (UBC) Endogenous Errors y = g p, x + e and E[ p e ] 0 If you estimate this using naïve ML,

More information

Variational Scoring of Graphical Model Structures

Variational Scoring of Graphical Model Structures Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational

More information

Probabilistic Time Series Classification

Probabilistic Time Series Classification Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign

More information

Chapter 23 Lecture. The Cosmic Perspective Seventh Edition. Dark Matter, Dark Energy, and the Fate of the Universe Pearson Education, Inc.

Chapter 23 Lecture. The Cosmic Perspective Seventh Edition. Dark Matter, Dark Energy, and the Fate of the Universe Pearson Education, Inc. Chapter 23 Lecture The Cosmic Perspective Seventh Edition Dark Matter, Dark Energy, and the Fate of the Universe Curvature of the Universe The Density Parameter of the Universe Ω 0 is defined as the ratio

More information

Fast Likelihood-Free Inference via Bayesian Optimization

Fast Likelihood-Free Inference via Bayesian Optimization Fast Likelihood-Free Inference via Bayesian Optimization Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology

More information

An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme

An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme Ghassen Jerfel April 2017 As we will see during this talk, the Bayesian and information-theoretic

More information

Natural Gradients via the Variational Predictive Distribution

Natural Gradients via the Variational Predictive Distribution Natural Gradients via the Variational Predictive Distribution Da Tang Columbia University datang@cs.columbia.edu Rajesh Ranganath New York University rajeshr@cims.nyu.edu Abstract Variational inference

More information

VIME: Variational Information Maximizing Exploration

VIME: Variational Information Maximizing Exploration 1 VIME: Variational Information Maximizing Exploration Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel Reviewed by Zhao Song January 13, 2017 1 Exploration in Reinforcement

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Convex Optimization Algorithms for Machine Learning in 10 Slides

Convex Optimization Algorithms for Machine Learning in 10 Slides Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,

More information

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight) CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information

More information

Andriy Mnih and Ruslan Salakhutdinov

Andriy Mnih and Ruslan Salakhutdinov MATRIX FACTORIZATION METHODS FOR COLLABORATIVE FILTERING Andriy Mnih and Ruslan Salakhutdinov University of Toronto, Machine Learning Group 1 What is collaborative filtering? The goal of collaborative

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

Lecture 13 Fundamentals of Bayesian Inference

Lecture 13 Fundamentals of Bayesian Inference Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up

More information

Bayesian Feature Selection with Strongly Regularizing Priors Maps to the Ising Model

Bayesian Feature Selection with Strongly Regularizing Priors Maps to the Ising Model LETTER Communicated by Ilya M. Nemenman Bayesian Feature Selection with Strongly Regularizing Priors Maps to the Ising Model Charles K. Fisher charleskennethfisher@gmail.com Pankaj Mehta pankajm@bu.edu

More information

GALAXIES. Hello Mission Team members. Today our mission is to learn about galaxies.

GALAXIES. Hello Mission Team members. Today our mission is to learn about galaxies. GALAXIES Discussion Hello Mission Team members. Today our mission is to learn about galaxies. (Intro slide- 1) Galaxies span a vast range of properties, from dwarf galaxies with a few million stars barely

More information

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008 Ages of stellar populations from color-magnitude diagrams Paul Baines Department of Statistics Harvard University September 30, 2008 Context & Example Welcome! Today we will look at using hierarchical

More information

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model Kai Yu Joint work with John Lafferty and Shenghuo Zhu NEC Laboratories America, Carnegie Mellon University First Prev Page

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Linear Models A linear model is defined by the expression

Linear Models A linear model is defined by the expression Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose

More information

LECTURE 1: Introduction to Galaxies. The Milky Way on a clear night

LECTURE 1: Introduction to Galaxies. The Milky Way on a clear night LECTURE 1: Introduction to Galaxies The Milky Way on a clear night VISIBLE COMPONENTS OF THE MILKY WAY Our Sun is located 28,000 light years (8.58 kiloparsecs from the center of our Galaxy) in the Orion

More information

Time-Sensitive Dirichlet Process Mixture Models

Time-Sensitive Dirichlet Process Mixture Models Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce

More information