Bayesian Evidence and Model Selection: A Tutorial in Two Acts
Slide 1. Bayesian Evidence and Model Selection: A Tutorial in Two Acts
Kevin H. Knuth, Depts. of Physics and Informatics, University at Albany, Albany NY, USA
Based on the paper: Knuth K.H., Habeck M., Malakar N.K., Mubeen A.M., Placek B., Bayesian evidence and model selection. In press at Digital Signal Processing. doi: /j.dsp
Download the talk: Google "knuthlab" and click Talks.
Slide 2. This tutorial follows the paper: Knuth K.H., Habeck M., Malakar N.K., Mubeen A.M., Placek B., Bayesian evidence and model selection. In press at Digital Signal Processing. doi: /j.dsp
- References are not provided in the talk slides; please consult the paper.
- Equations in the talk are numbered in accordance with the paper.
- When referencing anything from Act 1 of this talk, please reference the paper.
- When referencing anything from Act 2 of this talk, please reference the slides.
Slide 3. Outline
- Bayesian Evidence: Odds Ratios; Evidence, Model Order and Priors
- Numerical Techniques: Laplace Approximation; Importance Sampling; Annealed Importance Sampling; Variational Bayes; Nested Sampling
- Applications: Signal Detection (Brain-Computer Interface / Neuroscience); Sensor Characterization (Robotics / Signal Processing); Exoplanet Characterization (Astronomy / Astrophysics)
- Examples: Nested Sampling Demo; Nested Sampling and Phase Transitions
Slide 5. Bayesian Evidence
Slide 6. Bayesian Evidence : Odds Ratios : Bayes' Theorem
Bayes' theorem relates the posterior probability to the prior probability, the likelihood, and the evidence (also called the marginal likelihood).
M represents a class of models described by a set of model parameters.
m represents a particular model defined by a set of particular model parameter values.
d represents the acquired data.
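In the paper's notation, Bayes' theorem reads:

P(m \mid d, M, I) = \frac{P(m \mid M, I) \, P(d \mid m, M, I)}{P(d \mid M, I)}

where the left-hand side is the posterior, the numerator holds the prior and the likelihood, and the denominator is the evidence.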
Slide 7. Bayesian Evidence : Odds Ratios : Bayesian Evidence
The Bayesian evidence can be found by marginalizing the joint distribution P(m, d | M, I) over all model parameter values.
M represents a class of models described by a set of model parameters.
m represents a particular model defined by a set of particular model parameter values.
d represents the acquired data.
I represents the dependence on any relevant prior information.
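Written out, the marginalization is:

P(d \mid M, I) = \int dm \; P(m, d \mid M, I) = \int dm \; P(m \mid M, I) \, P(d \mid m, M, I)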
Slide 8. Bayesian Evidence : Odds Ratios : Model Comparison
We derive the ratio of the probabilities of two models given the data. If the prior probabilities of the models are equal, then this ratio is the ratio of evidences.
Slide 9. Bayesian Evidence : Odds Ratios : Odds Ratio or Bayes Factor
The ratio of the probabilities of the models given the data is proportional to the odds ratio.
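Concretely, for two models M_1 and M_2:

\frac{P(M_1 \mid d, I)}{P(M_2 \mid d, I)} = \frac{P(M_1 \mid I)}{P(M_2 \mid I)} \times \frac{P(d \mid M_1, I)}{P(d \mid M_2, I)}

The second factor, the ratio of evidences, is the Bayes factor; when the prior model probabilities are equal, the posterior ratio reduces to it.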
Slide 10. Bayesian Evidence : Evidence, Model Order and Priors : Evidence: Model Order and Priors
It is instructive to see how the evidence depends on both the model order and the prior probabilities. Consider a model with a single parameter x ∈ [x_min, x_max], with a prior width of Δx = x_max − x_min. Define the effective width δx of the likelihood through ∫ dx L(x) = L_max δx, where L_max is the maximum likelihood value.
Slide 11. Bayesian Evidence : Evidence, Model Order and Priors : Model with a Single Parameter
Consider a model with a single parameter x ∈ [x_min, x_max], with a prior width of Δx = x_max − x_min. Given the effective width δx, the evidence follows directly, as shown below.
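With the uniform prior P(x | M, I) = 1/Δx implied by the slide, the evidence becomes:

P(d \mid M, I) = \frac{1}{\Delta x} \int_{x_{\min}}^{x_{\max}} dx \; L(x) = L_{\max} \, \frac{\delta x}{\Delta x}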
Slide 12. Bayesian Evidence : Evidence, Model Order and Priors : Occam Factor
The evidence is proportional to the ratio of the effective width of the likelihood to the width of the prior. This ratio δx/Δx is called the Occam factor, after Occam's razor: "Non sunt multiplicanda entia sine necessitate" ("Entities must not be multiplied beyond necessity"), William of Ockham.
Slide 13. Bayesian Evidence : Evidence, Model Order and Priors : Model Order
For models with multiple parameters, this generalizes to the ratio of the volume of models compatible with both the data and the prior to the prior volume. If we assume that each of the K parameters has prior width Δx and effective width δx, then the Occam factor scales as (δx/Δx)^K. As model parameters are added, eventually one fits the data asymptotically well, so that δx attains a maximum value and further model parameters can only decrease the Occam factor. If we increase the flexibility of our model by introducing more model parameters, we reduce the Occam factor.
Slide 14. Bayesian Evidence : Evidence, Model Order and Priors : Odds Ratios and Occam Factors
We compute the odds ratio for a model M_0 without model parameters against a model M_1 with a single model parameter; it factors into a maximum-likelihood ratio and an Occam factor. The likelihood ratio is a classical statistic in frequentist model selection. If we consider only the likelihood ratio in model comparison problems, we fail to acknowledge the importance of Occam factors.
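One way to write the factorization the slide points to, assuming equal prior model probabilities and the single-parameter setup above:

\frac{P(d \mid M_1, I)}{P(d \mid M_0, I)} = \frac{L^{(1)}_{\max}}{L^{(0)}} \times \frac{\delta x}{\Delta x}

The first factor is the maximum-likelihood ratio; the second is the Occam factor, which penalizes the extra parameter.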
Slide 15. Numerical Techniques
Slide 16. Numerical Techniques
There is a wide variety of techniques that can be used to estimate the Bayesian evidence:
- Laplace Approximation
- Importance Sampling
- Path Sampling
- Thermodynamic Integration
- Simulated Annealing
- Annealed Importance Sampling
- Variational Bayes (Ensemble Learning)
- Nested Sampling
Slide 17. Numerical Techniques : Laplace Approximation
The Laplace approximation is a simple and useful method for approximating a unimodal probability density function with a Gaussian. Consider a function p(x) with a peak at x = x_0. We write a Taylor series expansion of ln p(x) about x = x_0; because the first derivative vanishes at the peak, it simplifies to ln p(x) ≈ ln p(x_0) + (1/2) [d² ln p/dx²]_{x_0} (x − x_0)².
Slide 18. Numerical Techniques : Laplace Approximation
Previously, we had the second-order expansion of ln p(x) about the peak. By defining σ² = −[d² ln p/dx²]_{x_0}^{−1}, we can write ln p(x) ≈ ln p(x_0) − (x − x_0)²/(2σ²).
Slide 19. Numerical Techniques : Laplace Approximation
Previously, we had ln p(x) ≈ ln p(x_0) − (x − x_0)²/(2σ²). By taking the exponential, we can approximate the density by a Gaussian whose integral gives the evidence, as written out below.
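Explicitly, exponentiating gives the Gaussian approximation and its integral:

p(x) \approx p(x_0) \exp\left[ -\frac{(x - x_0)^2}{2\sigma^2} \right], \qquad Z = \int dx \; p(x) \approx p(x_0) \sqrt{2\pi\sigma^2}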
Slide 20. Numerical Techniques : Laplace Approximation
In the case of a multidimensional posterior with peak at m_0, we have p(m) ≈ p(m_0) exp[−(1/2)(m − m_0)ᵀ Σ⁻¹ (m − m_0)], where Σ⁻¹ is the negative Hessian of ln p(m) evaluated at m_0. The evidence is then Z ≈ p(m_0) (2π)^{K/2} |Σ|^{1/2} for K model parameters.
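As a quick numerical sanity check of these formulas, here is a minimal Python sketch (not code from the talk; the test density is arbitrary):

import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Unnormalized, unimodal test density with a known integral: Gamma(4) = 6.
def p(x):
    return x**3 * np.exp(-x)

# Locate the peak x0 of ln p.
x0 = minimize_scalar(lambda x: -np.log(p(x)), bounds=(0.1, 20.0), method="bounded").x

# sigma^2 = -[d^2 ln p / dx^2]^(-1) at x0, via central finite differences.
h = 1e-4
d2 = (np.log(p(x0 + h)) - 2.0 * np.log(p(x0)) + np.log(p(x0 - h))) / h**2
sigma2 = -1.0 / d2

Z_laplace = p(x0) * np.sqrt(2.0 * np.pi * sigma2)   # Laplace estimate of the integral
Z_exact = quad(p, 0.0, np.inf)[0]                   # numerical integral for comparison
print(Z_laplace, Z_exact)                           # ~5.84 vs 6.00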
Slide 21. Numerical Techniques : Importance Sampling
Importance sampling allows one to find expectation values with respect to one distribution p(x) by computing expectation values with respect to a second distribution q(x) that is easier to sample from. The expectation value of f(x) with respect to p(x) can be rewritten by expressing p(x) as [p(x)/q(x)] q(x), which is valid as long as q(x) is nonzero wherever p(x) is nonzero.
Slide 22. Numerical Techniques : Importance Sampling
Writing p(x) as [p(x)/q(x)] q(x) turns the expectation under p into an expectation under q. As long as the ratio p(x)/q(x) does not attain extreme values, we can estimate it with samples drawn from q(x), as follows.
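In symbols, with samples x_i drawn from q:

\langle f \rangle_p = \int dx \, f(x) \, p(x) = \int dx \, f(x) \, \frac{p(x)}{q(x)} \, q(x) \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i) \, \frac{p(x_i)}{q(x_i)}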
Slide 23. Numerical Techniques : Importance Sampling
Importance sampling can be used to compute ratios of evidence values in a similar fashion, by writing the evidence ratio as an expectation over samples drawn from one of the two distributions.
Slide 24. Numerical Techniques : Importance Sampling
The evidence ratio can be found by sampling from q(x), as long as p(x) is sufficiently close to q(x) to avoid extreme ratios of p(x)/q(x).
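A minimal Python sketch of the evidence-ratio idea (illustrative densities, not the talk's code). If p*(x) and q*(x) are unnormalized and we can draw from the normalized q, then Z_p/Z_q = E_q[p*(x)/q*(x)]:

import numpy as np

rng = np.random.default_rng(0)

# Unnormalized zero-mean Gaussians; the exact evidence ratio is sigma_p / sigma_q = 0.8.
# Note q is chosen broader than p so the weights p*/q* stay bounded.
def log_p_star(x):
    return -0.5 * x**2 / 0.8**2

def log_q_star(x):
    return -0.5 * x**2

x = rng.standard_normal(100_000)                  # samples from the normalized q
weights = np.exp(log_p_star(x) - log_q_star(x))   # importance weights p*(x)/q*(x)
print(weights.mean())                             # ~0.8 = Z_p / Z_q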
Slide 25. Numerical Techniques : Variational Bayes
Variational Bayes, also known as ensemble learning, relies on approximating the posterior P(m | d, M, I) with another distribution Q(m). By defining the negative free energy F[Q] and the Kullback-Leibler (KL) divergence between Q(m) and the posterior, we can write the log evidence as their sum.
Slide 26. Numerical Techniques : Variational Bayes
With this expression in hand, we can show that the negative free energy is a lower bound on the log evidence. By maximizing the negative free energy, we can approximate the evidence.
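The decomposition referred to here, in standard form:

\ln P(d \mid M, I) = F[Q] + \mathrm{KL}\left[ Q(m) \, \| \, P(m \mid d, M, I) \right], \qquad F[Q] = \int dm \; Q(m) \ln \frac{P(d, m \mid M, I)}{Q(m)}

Since the KL divergence is nonnegative, F[Q] ≤ ln P(d | M, I), with equality exactly when Q matches the posterior.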
Slide 27. Numerical Techniques : Variational Bayes
By choosing a distribution Q(m) that factorizes into Q(m) = Q_0(m_0) Q_1(m_1), where the set of parameters m_0 is disjoint from m_1, we can optimize the negative free energy and estimate the evidence by updating one factor at a time, each chosen as below.
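For this factorized family, the optimal factors satisfy the standard mean-field fixed-point equations (sketched here for two parameter blocks):

Q_0(m_0) \propto \exp\left( \int dm_1 \; Q_1(m_1) \, \ln P(d, m_0, m_1 \mid M, I) \right)

and symmetrically for Q_1(m_1); alternating these coupled updates increases F[Q] at every step.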
Slide 28. Numerical Techniques : Nested Sampling
Nested sampling was developed by John Skilling to stochastically integrate the posterior probability to obtain the evidence; posterior estimates are then used to obtain model parameter estimates. Nested sampling aims to estimate the cumulative distribution function of the density of states (DOS), which is the prior probability mass enclosed within a likelihood boundary.
Slide 29. Numerical Techniques : Nested Sampling
Given a likelihood value L, one can find the prior mass X of the region of parameter space whose likelihood exceeds L: X(L) = ∫_{L(m) > L} dm P(m | M, I). (Figure: nested likelihood boundaries in parameter space.)
Slide 30. Numerical Techniques : Nested Sampling
One can then estimate the evidence via stochastic integration, using samples distributed according to the prior: Z = ∫₀¹ L(X) dX, the likelihood integrated over the prior mass.
Slide 31. Numerical Techniques : Nested Sampling
One begins with a set of N samples drawn from the prior. Use the sample with the lowest likelihood to define an implicit likelihood boundary; discarding it shrinks the enclosed prior volume by a fraction of 1/N on average. Sample from the prior (uniformly is easiest) from within the implicit likelihood boundary to maintain N samples. To estimate the evidence Z, keep track of the accumulated sum of L_i (X_i − X_{i+1}) over iterations i. A toy implementation follows.
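A toy nested-sampling loop in Python, following the recipe above (a sketch under simplifying assumptions, not the demo code from the talk: uniform prior on a box, and new constrained-prior samples drawn by brute-force rejection, which is only practical for low-dimensional toy problems):

import numpy as np

rng = np.random.default_rng(1)

def log_like(m):
    # 2-D Gaussian likelihood, sigma = 0.5, centered at the origin.
    return -np.sum(m**2, axis=-1) / (2 * 0.5**2) - np.log(2 * np.pi * 0.5**2)

N, n_iter = 100, 900
live = rng.uniform(-5.0, 5.0, size=(N, 2))   # N live points drawn from the prior
logL = log_like(live)
logX, Z = 0.0, 0.0                           # log enclosed prior mass; evidence

for _ in range(n_iter):
    worst = int(np.argmin(logL))             # lowest-likelihood live point
    logX_new = logX - 1.0 / N                # mass shrinks by e^(-1/N) in expectation
    Z += np.exp(logL[worst]) * (np.exp(logX) - np.exp(logX_new))  # L_i * (X_i - X_{i+1})
    while True:                              # resample the prior inside the boundary
        trial = rng.uniform(-5.0, 5.0, size=2)
        if log_like(trial) > logL[worst]:
            break
    live[worst], logL[worst] = trial, log_like(trial)
    logX = logX_new

Z += np.exp(logX) * np.mean(np.exp(logL))    # contribution of the remaining live points
print(np.log(Z))                             # analytic answer here: -log(100) ~ -4.61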
Slide 32. Numerical Techniques : Nested Sampling
Note how the prior volume contracts by a factor of roughly e^(-1/N) at each step. Early steps contribute little to the integral Z because the likelihood is very low; later steps contribute little because the change in prior volume is very small. The steps that contribute most lie in the middle of the sequence.
Slide 33. Numerical Techniques : Nested Sampling
Since nested sampling contracts along the prior volume, it is relatively unaffected by local maxima in the evidence (phase transitions); see Figure A. Methods based on tempering, such as simulated annealing, follow the slope of the log L curve and, as such, get stuck at phase transitions; see Figure B.
Slide 34. Numerical Techniques : Nested Sampling
The great challenge is sampling uniformly (from the prior) within the implicit likelihood boundaries. Several versions of nested sampling now exist:
- MultiNest (developed by Feroz and Hobson): clusters samples (K-means) and fits the clusters with ellipsoids, then samples uniformly from within those ellipsoids. Very fast, with excellent performance on multi-modal distributions. Clustering limits it to tens of parameters, and the ellipsoids may not cover the high-likelihood regions.
- Galilean Monte Carlo (developed by Feroz and Skilling): moves a new sample with momentum, reflecting off of log L boundaries. Excellent at handling ridges, both angled and curved.
- Constrained Hamiltonian Monte Carlo (developed by M. Betancourt): similar to Galilean Monte Carlo.
- Diffusive Nested Sampling (developed by Brewer): allows samples to diffuse to lower-likelihood nested levels and takes a weighted average.
- Nested Sampling with Demons (developed by M. Habeck): utilizes demon variables that smooth the constraint boundary and push the samples away from it.
Slide 36. Applications : Signal Detection
Signal Detection: Brain-Computer Interface / Neuroscience
Slide 37. Applications : Signal Detection : Signal Detection (Mubeen and Knuth)
We consider a practical signal detection problem in which the log odds ratio can be derived analytically; the specific application was originally the detection of evoked brain responses. The signal-absent case models the recording x in channel m as noise alone. The signal-present case models the recording x in channel m as signal plus noise, where the signal has an amplitude parameter α and can be coupled differently to different detectors (via coupling coefficients C).
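A sketch of the two models in notation consistent with the slide (the exact subscripting here is an assumption):

\text{signal absent: } x_m = n_m, \qquad \text{signal present: } x_m = \alpha \, C_m \, s + n_m

where s is the known target signal (e.g., the P300 template), C_m is the coupling of the signal into detector m, α is the amplitude parameter, and n_m is Gaussian noise.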
Slide 38. Applications : Signal Detection : Considering the Evidence
The odds ratio compares the evidences of the signal-present and signal-absent models. For the noise-only case, the evidence is just the (Gaussian) likelihood, since that model has no parameters to marginalize over.
Slide 39. Applications : Signal Detection : Considering the Evidence
In the signal-plus-noise case, the evidence is obtained by marginalizing over the amplitude α, assigning a Gaussian likelihood and a Gaussian prior for α.
Slide 40. Applications : Signal Detection : Considering the Evidence
Because both the likelihood and the prior for α are Gaussian, the marginalization can be carried out analytically, and the evidence can be written in closed form in terms of auxiliary quantities defined on the slide.
Slide 41. Applications : Signal Detection : Considering the Evidence
The closed form depends on the prior range of the amplitude: one expression results if the signal amplitude must be positive, α ∈ [0, +∞), and another if the amplitude can be positive or negative, α ∈ (−∞, +∞).
Slide 42. Applications : Signal Detection : Considering the Evidence
The expression, Eq. (86) of the paper, contains the cross-correlation term, which is what is typically used for the detection of a target signal in ongoing recordings. The log-OR detection filters incorporate more information, which leads to extra terms that serve to aid in target-signal detection.
Slide 43. Applications : Signal Detection : Detecting Signals
A. The P300 template target signal. B. An example of three channels (Cz, Pz, Fz) of synthetic ongoing EEG with two P300 target-signal events (indicated by the arrows) at an SNR of 5 dB.
Slide 44. Applications : Signal Detection : Signal Detection Performance
Detection performance is measured by the area under the ROC curve as a function of signal SNR. Both OR techniques outperform cross-correlation.
Slide 45. Applications : Sensor Characterization
Sensor Characterization: Robotics / Signal Processing
Slide 46. Applications : Sensor Characterization : Modeling a Robotic Sensor (Malakar, Gladkov, Knuth)
In this project, we aim to model the spatial sensitivity function of a LEGO light sensor for use on a robotic system. The application is a robotic arm that characterizes a white circle by measuring light intensities at various locations. By modeling the light sensor, we aim to increase the robot's performance.
Slide 47. Applications : Sensor Characterization : Modeling a Robotic Sensor
The LEGO light sensor was slowly moved over a black-and-white albedo pattern on the surface of a table to obtain calibration data. The sensor orientation was varied as well.
Slide 48. Applications : Sensor Characterization : Modeling a Robotic Sensor
Mixture-of-Gaussians (MoG) models were used, and four model orders were tested using nested sampling. The 1-MoG model was slightly favored. Note the increasing uncertainty in the evidence as the model becomes more complex; this suggests that the permutation space was not fully explored.
Slide 49. Applications : Sensor Characterization : Examining the Sensor Model Performance
Here we show a comparison between the 1-MoG model and the data.
Slide 50. Applications : Star System Characterization
Star System Characterization: Astronomy / Astrophysics
Slide 51. Applications : Star System Characterization : Star System Characterization (Placek and Knuth)
In our DSP paper, we give an example of Bayesian model testing applied to exoplanet characterization; Ben Placek also has a paper and poster here at MaxEnt 2015 on the topic. Here I will apply these model-testing concepts to determining the orbital configuration of a triple star system. (Image: Digitized Sky Survey, DSS.)
Slide 52. Applications : Star System Characterization : KIC Target: Two Periods
This star exhibits oscillations at two commensurate periods in its light curve: 6.45 days and a second period in a rare 10:1 resonance with it. Photometric data were obtained from the Kepler mission. (A) Quarter 13 light curve folded on the P1 = 6.45-day period; (B) Quarter 13 light curve folded on the P2 period; (C) the entire Q13 light curve. (Image: Digitized Sky Survey, DSS.)
Slide 53. Applications : Star System Characterization : KIC Target: Radial Velocity Measurements
Eleven radial velocity measurements were taken over the span of a week. The 6.45-day period is visible, but the second period is not. (Radial velocity data courtesy of Geoff Marcy and Howard Isaacson.)
Slide 54. Applications : Star System Characterization : KIC Target: Models
Two possible models of the system. The main star is a G star (like our Sun); at least one of the other companions (C1) is an M dwarf. (A) A hierarchical arrangement: C1 and C2 orbit G with the 6.45-day period and orbit one another with the second period. (B) A planetary arrangement: C1 orbits with the 6.45-day period, and C2 orbits with the second period.
Slide 55. Applications : Star System Characterization : KIC Target: Results
Testing the hierarchical model against the planetary model using the radial velocity data: the circular hierarchical model has the greatest evidence.
Slide 56. Applications : Star System Characterization : KIC Target
This system is a hierarchical triple consisting of a G star with two co-orbiting M dwarfs whose periods are in a 1:10 resonance (P1 = 6.45 days).
Slide 57. Applications : Star System Characterization : KIC Target (figure).
Slide 58. Demonstrations : Nested Sampling : Lighthouse Problem
Nested Sampling Demo (sans model testing)
Slide 59. Demonstrations : Nested Sampling : Lighthouse Problem : Nested Sampling Demo
The Lighthouse Problem (Gull). Consider a lighthouse located just off a straight shore that extends a great distance. Imagine that the lighthouse has a laser beam that it fires at random times as it rotates with uniform speed. Along the shore are light detectors that record laser-beam hits. Based on these data, where is the lighthouse?
Slide 60. Demonstrations : Nested Sampling : Lighthouse Problem : The Likelihood Function
It is a useful exercise to derive the likelihood via a change of variables. With the lighthouse at position α along the shore and at distance β offshore, a flash emitted at angle θ strikes the shore at x = β tan θ + α, and the likelihood is the Cauchy distribution p(x | α, β, I) = β / (π [β² + (x − α)²]). We assign a uniform prior for the location parameters α and β.
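The change-of-variables derivation the slide suggests: the lighthouse rotates uniformly, so the emission angle θ is uniform on (−π/2, π/2), and x = β tan θ + α maps it to the shore, giving

p(x \mid \alpha, \beta, I) = p(\theta) \left| \frac{d\theta}{dx} \right| = \frac{1}{\pi} \cdot \frac{\beta}{\beta^2 + (x - \alpha)^2}

using dx/dθ = β sec²θ = β(1 + tan²θ) = [β² + (x − α)²]/β.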
Slide 61. Demonstrations : Nested Sampling : Lighthouse Problem : Nested Sampling Run
Run using D = 64 data points (recorded flashes) and N = 100 samples; iteration is halted when Δ log Z < 10^-7. Results: 1193 iterations; mean(x) = 0.48 ± 0.26; mean(y) = 0.51 ± 0.28. (Figure: live samples, used samples, and the location of the lighthouse in the x-y plane.)
Slide 62. Demonstrations : Nested Sampling : Lighthouse Problem : Nested Sampling Run
This shows the relationship between log L and log prior volume.
Slide 63. Demonstrations : Nested Sampling : Lighthouse Problem : Nested Sampling Run with a Gaussian Likelihood
This shows the relationship between log L and log prior volume.
Slide 64. Applications : Nested Sampling and Phase Transitions
Nested Sampling and Phase Transitions
Slide 65. Applications : Nested Sampling and Phase Transitions : Peaks on Peaks
Here is a Gaussian likelihood with a taller peak on the side.
Slide 66. Applications : Nested Sampling and Phase Transitions : Nested Sampling with Phase Transitions
Phase transitions represent local peaks in the evidence. (Figure: the phase transition marked on the curve.)
Slide 67. Applications : Nested Sampling and Phase Transitions : Acoustic Source Localization: One Detector
Consider an acoustic source localization problem using a single detector. There is a low-frequency (red) source and a high-frequency (blue) source. Note how the high-frequency source is found first, inducing a phase transition.
Slide 68. Applications : Nested Sampling and Phase Transitions : Acoustic Source Localization: Two Detectors
In this example we have two detectors, which allow us to localize the sources to rings. Again, the low-frequency source is found first.
Slide 69. Acknowledgements
Michael Habeck, Nabin Malakar, Asim Mubeen, Ben Placek