Bayesian Evidence and Model Selection: A Tutorial in Two Acts


1 Bayesian Evidence and Model Selection: A Tutorial in Two Acts Kevin H. Knuth Depts. of Physics and Informatics, University at Albany, Albany NY USA Based on the paper: Knuth K.H., Habeck M., Malakar N.K., Mubeen A.M., Placek B Bayesian evidence and model selection. In press at Digital Signal Processing. doi: /j.dsp DOWNLOAD TALK NOW: Google knuthlab Click Talks 7/19/2015 MaxEnt 2015 Tutorial 1
2 This tutorial follows the paper: Knuth K.H., Habeck M., Malakar N.K., Mubeen A.M., Placek B. Bayesian evidence and model selection. In press at Digital Signal Processing. doi: /j.dsp References are not provided in the talk slides; please consult the paper. Equations in the talk are numbered in accordance with the paper. When referencing anything from Act 1 of this talk, please reference this paper. When referencing anything from Act 2 of this talk, please reference the slides.
3 Bayesian Evidence: Odds Ratios; Evidence, Model Order and Priors. Numerical Techniques: Laplace Approximation; Importance Sampling; Annealed Importance Sampling; Variational Bayes; Nested Sampling. Applications: Signal Detection (Brain Computer Interface / Neuroscience); Sensor Characterization (Robotics / Signal Processing); Exoplanet Characterization (Astronomy / Astrophysics). Examples: Nested Sampling Demo; Nested Sampling and Phase Transitions.
5 Bayesian Evidence
6 Bayes' Theorem. $P(m \mid d, M, I) = \dfrac{P(m \mid M, I)\, P(d \mid m, M, I)}{P(d \mid M, I)}$, where $P(m \mid d, M, I)$ is the posterior probability, $P(m \mid M, I)$ is the prior probability, $P(d \mid m, M, I)$ is the likelihood, and $P(d \mid M, I)$ is the evidence or marginal likelihood. M represents a class of models represented by a set of model parameters; m represents a particular model defined by a set of particular model parameter values; d represents the acquired data.
7 Bayesian Evidence. The Bayesian evidence can be found by marginalizing the joint distribution $P(m, d \mid M, I)$ over all model parameter values: $Z = P(d \mid M, I) = \int dm\, P(m, d \mid M, I) = \int dm\, P(m \mid M, I)\, P(d \mid m, M, I)$. M represents a class of models represented by a set of model parameters; m represents a particular model defined by a set of particular model parameter values; d represents the acquired data; I represents the dependence on any relevant prior information.
8 Model Comparison. We derive the ratio of probabilities of two models given data: $\dfrac{P(M_1 \mid d, I)}{P(M_2 \mid d, I)} = \dfrac{P(M_1 \mid I)}{P(M_2 \mid I)} \cdot \dfrac{P(d \mid M_1, I)}{P(d \mid M_2, I)}$. If the prior probabilities of the models are equal, then this is the ratio of evidences.
9 Odds Ratio or Bayes Factor. The ratio of probabilities of the models given the data is proportional to the odds ratio, which is the ratio of the evidences of the two models.
10 Evidence: Model Order and Priors. It is instructive to see how the evidence depends on both the model order and prior probabilities. Consider a model with a single parameter $x \in [x_{min}, x_{max}]$ with a width of $\Delta x = x_{max} - x_{min}$. Define the effective width $\delta x = \dfrac{1}{L_{max}} \int_{x_{min}}^{x_{max}} dx\, L(x)$, where $L_{max}$ is the maximum likelihood value.
11 Model with a Single Parameter. Consider a model with a single parameter $x \in [x_{min}, x_{max}]$ with a width of $\Delta x = x_{max} - x_{min}$. Given the effective width $\delta x = \dfrac{1}{L_{max}} \int_{x_{min}}^{x_{max}} dx\, L(x)$ and a uniform prior $1/\Delta x$, the evidence is $Z = \dfrac{1}{\Delta x} \int_{x_{min}}^{x_{max}} dx\, L(x) = L_{max}\, \dfrac{\delta x}{\Delta x}$.
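As a quick numerical check of the single-parameter result, the following minimal sketch computes the effective width and the evidence for a made-up Gaussian-shaped likelihood under a uniform prior. The likelihood shape, its peak value, and the prior range are illustrative assumptions, not values from the talk; `integrate` is a hypothetical helper using the midpoint rule.

```python
import math

def likelihood(x, x_hat=0.3, sigma=0.05, l_peak=2.0):
    """Hypothetical Gaussian-shaped likelihood with peak value l_peak at x_hat."""
    return l_peak * math.exp(-0.5 * ((x - x_hat) / sigma) ** 2)

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

x_min, x_max = -1.0, 1.0                          # assumed prior range
prior_width = x_max - x_min                       # Delta x
l_max = likelihood(0.3)                           # maximum likelihood value
effective_width = integrate(likelihood, x_min, x_max) / l_max   # delta x

# Evidence under a uniform prior 1 / Delta x, computed two ways.
z_direct = integrate(lambda x: likelihood(x) / prior_width, x_min, x_max)
z_occam = l_max * effective_width / prior_width

print(f"delta x = {effective_width:.5f}, delta x / Delta x = {effective_width / prior_width:.5f}")
print(f"Z (direct) = {z_direct:.5f}, Z (L_max * delta x / Delta x) = {z_occam:.5f}")
```

For a Gaussian-shaped likelihood well inside the prior range, the effective width should come out close to sigma * sqrt(2 * pi), so widening the prior lowers the evidence without changing the fit.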
12 Occam Factor. The evidence is proportional to the ratio of the effective width of the likelihood and the width of the prior. This ratio $\delta x / \Delta x$ is called the Occam factor after Occam's Razor: "Non sunt multiplicanda entia sine necessitate", "Entities must not be multiplied beyond necessity" (William of Ockham).
13 Model Order. For models with multiple parameters this generalizes to the ratio of the volume of the models that are compatible with both the data and the prior to the prior volume. If we assume that each of the K parameters has prior width $\Delta x$, then the Occam factor scales as $(\delta x / \Delta x)^K$. As model parameters are added, eventually one fits the data asymptotically well so that $\delta x$ attains a maximum value and further model parameters can only decrease the Occam factor. If we increase the flexibility of our model by the introduction of more model parameters, we reduce the Occam factor.
14 Odds Ratios and Occam Factors. We compute the odds ratio for a model $M_0$ without model parameters to a model $M_1$ with a single model parameter. With $Z_0 = L_0$ (no parameters to marginalize) and $Z_1 = L_{max}\, \delta x / \Delta x$, the odds ratio is $\dfrac{Z_0}{Z_1} = \dfrac{L_0}{L_{max}} \cdot \dfrac{\Delta x}{\delta x}$, that is, the $L_{max}$ ratio divided by the Occam factor $\delta x / \Delta x$. The likelihood ratio is a classical statistic in frequentist model selection. If we only consider the likelihood ratio in model comparison problems, we fail to acknowledge the importance of Occam factors.
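To make the role of the Occam factor concrete, here is a small sketch comparing a parameter-free model M0 (mean fixed at zero) with a one-parameter model M1 (mean uniform on a prior range). The data values, the noise level `sigma`, and the prior range are all made up for illustration; only the structure of the calculation follows the slide.

```python
import math

data = [0.21, -0.05, 0.33, 0.12, 0.18, 0.02, 0.27, 0.09]   # made-up measurements
sigma = 0.2                                                 # assumed known noise level

def likelihood(mu):
    """Gaussian likelihood of the data for a given mean mu."""
    norm = (2.0 * math.pi * sigma ** 2) ** (-0.5 * len(data))
    return norm * math.exp(-0.5 * sum((d - mu) ** 2 for d in data) / sigma ** 2)

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

mu_min, mu_max = -2.0, 2.0                    # assumed uniform prior range for M1
prior_width = mu_max - mu_min                 # Delta x
mu_hat = sum(data) / len(data)                # maximum-likelihood mean
l_max = likelihood(mu_hat)
l_0 = likelihood(0.0)                         # M0 has no free parameter: Z0 = L0

effective_width = integrate(likelihood, mu_min, mu_max) / l_max   # delta x
occam = effective_width / prior_width                             # Occam factor
likelihood_ratio = l_0 / l_max                                    # always <= 1
odds = l_0 / (l_max * occam)                                      # Z0 / Z1

print(f"likelihood ratio L0/Lmax = {likelihood_ratio:.3f}")
print(f"Occam factor             = {occam:.4f}")
print(f"odds ratio Z0/Z1         = {odds:.3f}")
```

With these made-up numbers the likelihood ratio is below 1 (the extra parameter always fits at least as well), while the odds ratio comes out above 1, so the simpler model is preferred once the Occam factor is included.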
15 Numerical Techniques
16 Numerical Techniques. There is a wide variety of techniques that can be used to estimate the Bayesian evidence: Laplace Approximation, Importance Sampling, Path Sampling, Thermodynamic Integration, Simulated Annealing, Annealed Importance Sampling, Variational Bayes (Ensemble Learning), and Nested Sampling.
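17 Laplace Approximation. The Laplace approximation is a simple and useful method for approximating a unimodal probability density function with a Gaussian. Consider a function $p(x)$ with a peak at $x = x_0$. We write a Taylor series expansion of $\ln p(x)$ about $x = x_0$: $\ln p(x) = \ln p(x_0) + \left[\dfrac{d}{dx} \ln p(x)\right]_{x_0} (x - x_0) + \dfrac{1}{2} \left[\dfrac{d^2}{dx^2} \ln p(x)\right]_{x_0} (x - x_0)^2 + \cdots$, which, because the first derivative vanishes at the peak, can be simplified to $\ln p(x) \approx \ln p(x_0) + \dfrac{1}{2} \left[\dfrac{d^2}{dx^2} \ln p(x)\right]_{x_0} (x - x_0)^2$.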
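18 Laplace Approximation. Previously, we had $\ln p(x) \approx \ln p(x_0) + \dfrac{1}{2} \left[\dfrac{d^2}{dx^2} \ln p(x)\right]_{x_0} (x - x_0)^2$. By defining $\sigma^{-2} = -\left[\dfrac{d^2}{dx^2} \ln p(x)\right]_{x_0}$, we can write $\ln p(x) \approx \ln p(x_0) - \dfrac{(x - x_0)^2}{2 \sigma^2}$.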
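19 Laplace Approximation. Previously, we had $\ln p(x) \approx \ln p(x_0) - \dfrac{(x - x_0)^2}{2 \sigma^2}$, with $\sigma^{-2} = -\left[\dfrac{d^2}{dx^2} \ln p(x)\right]_{x_0}$. By taking the exponential we can approximate the density by $p(x) \approx p(x_0) \exp\left[-\dfrac{(x - x_0)^2}{2 \sigma^2}\right]$, with an integral (evidence) of $Z \approx p(x_0) \sqrt{2 \pi \sigma^2}$.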
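The one-dimensional Laplace approximation can be tried on a density whose integral is known exactly. The sketch below uses the unnormalized Gamma-shaped density x^a exp(-x), an illustrative choice that is not from the talk; its integral over x > 0 is Gamma(a + 1), and its Laplace approximation reproduces Stirling's formula. The function `laplace_evidence` and the finite-difference step `h` are assumptions of this sketch.

```python
import math

def log_p(x, a=20.0):
    """Log of the unnormalized density x**a * exp(-x) (illustrative choice)."""
    return a * math.log(x) - x

def laplace_evidence(log_density, x0, h=1e-3):
    """Laplace estimate of the integral of exp(log_density) around its peak x0."""
    # Second derivative of the log density at the peak, by central differences.
    d2 = (log_density(x0 + h) - 2.0 * log_density(x0) + log_density(x0 - h)) / h ** 2
    sigma2 = -1.0 / d2                    # variance of the matching Gaussian
    return math.exp(log_density(x0)) * math.sqrt(2.0 * math.pi * sigma2)

x0 = 20.0                                 # peak of x**20 * exp(-x) is at x = a
z_laplace = laplace_evidence(log_p, x0)
z_exact = math.gamma(21.0)                # exact integral, Gamma(a + 1) = 20!

print(f"Laplace / exact = {z_laplace / z_exact:.4f}")
```

The approximation slightly underestimates the integral here because the true density is skewed, which a Gaussian cannot capture; the closer the density is to Gaussian, the better the estimate.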
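20 Laplace Approximation. In the case of a multidimensional posterior over N parameters we have $\ln P(\mathbf{x}) \approx \ln P(\mathbf{x}_0) - \dfrac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}\, (\mathbf{x} - \mathbf{x}_0)$, where $H_{ij} = -\left[\dfrac{\partial^2 \ln P(\mathbf{x})}{\partial x_i\, \partial x_j}\right]_{\mathbf{x}_0}$. The evidence is then $Z \approx P(\mathbf{x}_0) \sqrt{\dfrac{(2\pi)^N}{\det \mathbf{H}}}$.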
21 Importance Sampling. Importance sampling allows one to find expectation values with respect to one distribution $p(x)$ by computing expectation values with respect to a second distribution $q(x)$ that is easier to sample from. The expectation value of $f(x)$ with respect to $p(x)$ is given by $\langle f \rangle_p = \int dx\, f(x)\, p(x)$. One can write $p(x)$ as $\dfrac{p(x)}{q(x)}\, q(x)$ as long as $q(x)$ is nonzero whenever $p(x)$ is nonzero.
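22 Importance Sampling. Writing $p(x)$ as $\dfrac{p(x)}{q(x)}\, q(x)$, we have $\langle f \rangle_p = \int dx\, f(x)\, \dfrac{p(x)}{q(x)}\, q(x) = \left\langle f\, \dfrac{p}{q} \right\rangle_q$. As long as the ratio $p(x)/q(x)$ does not attain extreme values, we can estimate this with samples $x_i$ from $q(x)$ by $\langle f \rangle_p \approx \dfrac{1}{N} \sum_{i=1}^{N} f(x_i)\, \dfrac{p(x_i)}{q(x_i)}$.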
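A minimal sketch of this estimator, assuming a standard normal target p(x), a wider normal proposal q(x), and f(x) = x^2 (so the exact answer is 1). These choices, the sample count, and the helper names are illustrative assumptions.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_expectation(f, p_pdf, q_pdf, q_sampler, n):
    """Estimate <f>_p as the average of f(x) * p(x) / q(x) over samples x ~ q."""
    total = 0.0
    for _ in range(n):
        x = q_sampler()
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n

rng = random.Random(0)                       # fixed seed for reproducibility
estimate = importance_expectation(
    f=lambda x: x * x,                       # <x^2> under N(0, 1) is exactly 1
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sampler=lambda: rng.gauss(0.0, 2.0),
    n=100000,
)
print(f"importance-sampling estimate of <x^2>_p: {estimate:.3f}")
```

Choosing q wider than p keeps the ratio p/q bounded (here it never exceeds 2); a proposal narrower than the target would let the ratio blow up in the tails, which is the extreme-value problem the slide warns about.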
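23 Importance Sampling. Importance sampling can be used to compute ratios of evidence values in a similar fashion. Taking $p(x)$ and $q(x)$ here to be unnormalized densities with evidences $Z_p = \int dx\, p(x)$ and $Z_q = \int dx\, q(x)$, we write $\dfrac{Z_p}{Z_q} = \int dx\, \dfrac{p(x)}{q(x)}\, \dfrac{q(x)}{Z_q}$, which is the expectation of $p(x)/q(x)$ under the normalized distribution $q(x)/Z_q$.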
24 Importance Sampling. The evidence ratio can be found by sampling from $q(x)$ as long as $p(x)$ is sufficiently close to $q(x)$ to avoid extreme ratios of $p(x)/q(x)$.
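The evidence ratio can be sketched the same way. Below, p and q are taken to be unnormalized Gaussian shapes whose normalizations are known in closed form (sqrt(2 pi) and 2 sqrt(2 pi)), so the exact ratio Z_p / Z_q is 0.5. The two densities and the function name `evidence_ratio` are illustrative assumptions.

```python
import math
import random

def p_unnorm(x):
    """Unnormalized target, exp(-x^2 / 2); its integral is sqrt(2 pi)."""
    return math.exp(-0.5 * x * x)

def q_unnorm(x):
    """Unnormalized proposal, exp(-x^2 / 8); its integral is 2 sqrt(2 pi)."""
    return math.exp(-0.125 * x * x)

def evidence_ratio(n, seed=0):
    """Estimate Z_p / Z_q as the mean of p(x) / q(x) over samples from normalized q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)              # normalized q is N(0, 2^2)
        total += p_unnorm(x) / q_unnorm(x)
    return total / n

ratio = evidence_ratio(100000)
print(f"estimated Z_p / Z_q = {ratio:.3f} (exact value 0.5)")
```

Only the ratio is obtained this way; an absolute evidence requires a reference distribution q whose normalization is already known.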
25 Variational Bayes. Variational Bayes, which is also known as ensemble learning, relies on approximating the posterior $P(m \mid d, M, I)$ with another distribution $Q(m)$. By defining the negative free energy $F(Q) = \int dm\, Q(m) \ln \dfrac{P(m, d \mid M, I)}{Q(m)}$ and the Kullback-Leibler (KL) divergence $KL(Q \,\|\, P) = \int dm\, Q(m) \ln \dfrac{Q(m)}{P(m \mid d, M, I)}$, we can write $\ln Z = F(Q) + KL(Q \,\|\, P)$.
26 Variational Bayes. With the expression $\ln Z = F(Q) + KL(Q \,\|\, P)$ in hand, and because the KL divergence is never negative, we can show that the negative free energy is a lower bound to the log evidence: $F(Q) \le \ln Z$. By maximizing the negative free energy (equivalently, minimizing the KL divergence), we can approximate the evidence.
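The decomposition of the log evidence into negative free energy plus KL divergence can be verified numerically on a toy problem. This sketch assumes a discrete parameter with five values, so integrals become sums; the prior, likelihood values, and trial distribution `q_trial` are made up for illustration.

```python
import math

prior = [0.2, 0.2, 0.2, 0.2, 0.2]                   # uniform prior over 5 parameter values
likelihood = [0.01, 0.10, 0.40, 0.25, 0.05]         # made-up likelihood values
joint = [p * l for p, l in zip(prior, likelihood)]  # P(m, d | M, I)
z = sum(joint)                                      # evidence
log_z = math.log(z)
posterior = [j / z for j in joint]                  # P(m | d, M, I)

def neg_free_energy(q):
    """F(Q) = sum_m Q(m) ln[P(m, d) / Q(m)]."""
    return sum(qi * math.log(ji / qi) for qi, ji in zip(q, joint) if qi > 0.0)

def kl_divergence(q):
    """KL(Q || posterior) = sum_m Q(m) ln[Q(m) / P(m | d)]."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, posterior) if qi > 0.0)

q_trial = [0.1, 0.2, 0.4, 0.2, 0.1]                 # an arbitrary approximating distribution

print(f"ln Z           = {log_z:.6f}")
print(f"F(Q) + KL(Q)   = {neg_free_energy(q_trial) + kl_divergence(q_trial):.6f}")
print(f"F(Q)           = {neg_free_energy(q_trial):.6f}  (lower bound)")
print(f"F(posterior)   = {neg_free_energy(posterior):.6f}  (bound is tight)")
```

The bound becomes an equality exactly when Q equals the posterior, which is why improving Q by raising F(Q) brings the estimate closer to ln Z.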
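27 Variational Bayes. By choosing a distribution $Q(m)$ that factorizes into $Q(m) = Q_0(m_0)\, Q_1(m_1)$, where the set of parameters $m_0$ is disjoint from $m_1$, we can optimize the negative free energy and estimate the evidence by choosing $Q_0(m_0) \propto \exp[I_0(m_0)]$ and $Q_1(m_1) \propto \exp[I_1(m_1)]$, where $I_0(m_0) = \int dm_1\, Q_1(m_1) \ln P(m, d \mid M, I)$ and $I_1(m_1) = \int dm_0\, Q_0(m_0) \ln P(m, d \mid M, I)$.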
28 Nested Sampling. Nested sampling was developed by John Skilling to stochastically integrate the posterior probability to obtain the evidence. Posterior estimates are used to obtain model parameter estimates. Nested sampling aims to estimate the cumulative distribution function of the density of states (DOS), which is the prior probability mass enclosed within a likelihood boundary.
29 Nested Sampling. Given a likelihood $L^*$, one can find the prior mass $X(L^*)$ such that the likelihood of those states is greater than $L^*$: $X(L^*) = \int_{L(m) > L^*} dm\, P(m \mid M, I)$. (Figure: Parameter Space.)
30 Nested Sampling. One can then estimate the evidence via stochastic integration using samples distributed according to the prior: $Z = \int_0^1 L(X)\, dX$ (likelihood integrated over prior).
31 Nested Sampling. One begins with a set of N samples. Use the sample with the lowest likelihood to define an implicit likelihood boundary. This results in an average decrease of the prior volume by 1/N. Sample from the prior (uniformly is easiest) from within the implicit likelihood boundary to maintain N samples. To estimate the evidence Z, keep track of $\sum_i L_i\, (X_i - X_{i+1})$, where the enclosed prior volumes shrink at each step, $X_{i+1} < X_i$.
32 Nested Sampling. Note how the prior volume contracts by 1/N each time. Early steps contribute little to the integral (Z) since the likelihood is very low. Later steps contribute little to Z since the prior volume change is very small. The steps that contribute most are in the middle of the sequence.
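The nested sampling loop described above can be sketched in a few lines for a toy problem. This is a minimal illustration under stated assumptions, not any of the production samplers: the prior is uniform on [0, 1], the likelihood is a made-up Gaussian bump with peak value 1, new samples inside the likelihood boundary are found by plain rejection sampling from the prior (simple but inefficient), and the prior volume after step i is approximated by exp(-i / N). Each discarded sample is weighted by the width between successive prior volumes.

```python
import math
import random

def log_likelihood(x, x_hat=0.5, sigma=0.05):
    """Hypothetical Gaussian-shaped log likelihood with peak value ln(1) = 0."""
    return -0.5 * ((x - x_hat) / sigma) ** 2

def nested_sampling(n_live=100, n_iter=600, seed=1):
    """Toy nested sampling estimate of Z for a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]        # N samples from the prior
    live_logl = [log_likelihood(x) for x in live]
    z = 0.0
    x_prev = 1.0                                        # enclosed prior volume starts at 1
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_star = live_logl[worst]                    # implicit likelihood boundary
        x_i = math.exp(-i / n_live)                     # expected shrinkage per step
        z += math.exp(logl_star) * (x_prev - x_i)       # L_i times the volume width
        x_prev = x_i
        # Replace the worst sample with a prior sample inside the boundary.
        while True:
            candidate = rng.random()
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > logl_star:
                break
        live[worst] = candidate
        live_logl[worst] = candidate_logl
    # Add the contribution of the remaining live samples.
    z += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return z

z_estimate = nested_sampling()
z_exact = 0.05 * math.sqrt(2.0 * math.pi)               # Gaussian integral, tails negligible
print(f"ln Z estimate = {math.log(z_estimate):.3f}, ln Z exact = {math.log(z_exact):.3f}")
```

Rejection sampling from the full prior becomes slower as the enclosed volume shrinks, which is exactly the sampling challenge that the specialized variants of nested sampling are designed to address.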
33 Nested Sampling. Since nested sampling contracts along the prior volume, it is relatively unaffected by local maxima in evidence (phase transitions). (See Figure A) Methods based on tempering, such as simulated annealing, follow the slope of the log L curve and, as such, get stuck at phase transitions. (See Figure B)
34 Nested Sampling. The great challenge is sampling uniformly (from the prior) within the implicit likelihood boundaries. Several versions of Nested Sampling now exist. MultiNest (developed by Feroz and Hobson): clusters samples (K-means) and fits clusters with ellipsoids, then samples uniformly from within those ellipsoids; very fast, with excellent performance for multimodal distributions, but clustering limits this to tens of parameters and the ellipsoids may not cover the high-likelihood regions. Galilean Monte Carlo (developed by Feroz and Skilling): moves a new sample with momentum, reflecting off of log L boundaries; excellent at handling ridges, both angled and curved. Constrained Hamiltonian Monte Carlo (developed by M. Betancourt): similar to Galilean Monte Carlo. Diffusive Nested Sampling (developed by Brewer): allows samples to diffuse to lower-likelihood nested levels and takes a weighted average. Nested Sampling with Demons (developed by M. Habeck): utilizes demon variables that smooth the constraint boundary and push the samples away from it.
36 Signal Detection: Brain Computer Interface / Neuroscience
37 Signal Detection (Mubeen and Knuth). We consider a practical signal detection problem where the log odds ratio can be analytically derived. The specific application was originally for the detection of evoked brain responses. The signal-absent case models the recording x in channel m as noise. The signal-present case models the recording x in channel m as signal plus noise, where the signal has an amplitude parameter α and can be coupled differently to different detectors (via C).
38 Considering the Evidence. The odds ratio can be written as a ratio of the evidences for the two cases. For the noise-only case, the evidence is the likelihood (Gaussian).
39 Considering the Evidence. In the signal-plus-noise case, the evidence is found by marginalizing over the amplitude α, assigning a Gaussian likelihood and a prior for α.
40 Considering the Evidence. We can then write the evidence as where
41 Considering the Evidence. If the signal amplitude must be positive, $\alpha \in [0, +\infty)$, then: If the amplitude can be positive or negative, $\alpha \in (-\infty, +\infty)$, then:
42 Considering the Evidence. Look at: The expression E (86) contains the cross-correlation term, which is what is typically used for the detection of a target signal in ongoing recordings. The log OR detection filters incorporate more information that leads to extra terms, which serve to aid in target signal detection.
43 Detecting Signals. A. The P300 template target signal. B. An example of three channels (Cz, Pz, Fz) of synthetic ongoing EEG with two P300 target signal events (indicated by the arrows) at an SNR of 5 dB.
44 Signal Detection Performance. Detection performance is measured by the area under the ROC curve as a function of signal SNR. Both OR techniques outperform cross-correlation!
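For readers unfamiliar with the performance measure, the area under the ROC curve can be computed directly as the probability that a detector scores a signal-present segment higher than a signal-absent one. The sketch below uses this pairwise (rank-based) definition; the function name `roc_auc` and the example scores are hypothetical and are not results from the talk.

```python
def roc_auc(scores_present, scores_absent):
    """Area under the ROC curve via pairwise comparison of detector scores.

    Counts the fraction of (signal-present, signal-absent) pairs in which the
    signal-present score is higher; ties count as half.
    """
    wins = 0.0
    for s in scores_present:
        for n in scores_absent:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(scores_present) * len(scores_absent))

# Made-up detection statistics (for example log odds ratios) for illustration.
present = [0.9, 0.8, 0.4]
absent = [0.5, 0.3, 0.2]
auc = roc_auc(present, absent)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level detection and 1.0 to perfect separation, so comparing detectors at each SNR amounts to comparing these areas.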
45 Sensor Characterization: Robotics / Signal Processing
46 Modeling a Robotic Sensor (Malakar, Gladkov, Knuth). In this project, we aim to model the spatial sensitivity function of a LEGO light sensor for use on a robotic system. Here the application is to develop a robotic arm that can characterize the white circle by measuring light intensities at various locations. By modeling the light sensor, we aim to increase the robot's performance.
47 Modeling a Robotic Sensor. The LEGO light sensor was slowly moved over a black-and-white albedo pattern on the surface of a table to obtain calibration data. Sensor orientation was varied as well.
48 Modeling a Robotic Sensor. Mixture of Gaussians (MoG) models were used. Four model orders were tested using Nested Sampling. The 1-MoG model was slightly favored. Note the increasing uncertainty as the model becomes more complex. This suggests that the permutation space was not fully explored.
49 Examining the Sensor Model Performance. Here we show a comparison between the 1-MoG model and the data.
50 Star System Characterization: Astronomy / Astrophysics
51 Star System Characterization (Placek and Knuth). In our DSP paper, we give an example of Bayesian model testing applied to exoplanet characterization. Ben Placek also has a paper and poster here at MaxEnt 2015 on the topic. Here I will apply these model testing concepts to determining the orbital configuration of a triple star system. Image: Digital Sky Survey (DSS).
52 KIC : Two Periods. This star exhibits oscillations of two commensurate periods in its light curve: 6.45 days and days (a rare 10:1 resonance!). Image: Digital Sky Survey (DSS). Photometric data obtained from the Kepler mission. (A) Quarter 13 light curve folded on the P1 = 6.45 day period; (B) Quarter 13 light curve folded on the P2 = day period; (C) the entire Q13 light curve.
53 KIC : Radial Velocity Measurements. Eleven radial velocity measurements taken over the span of a week. The 6.45 day period is visible, but not the day period. Courtesy of Geoff Marcy and Howard Isaacson.
54 KIC : Models. Two possible models of the system. The main star is a G-star (like our Sun); at least one of the other companions (C1) is an M-dwarf. (A) A hierarchical arrangement (C1 and C2 orbit G with a 6.45 day period, and orbit one another with a day period). (B) A planetary arrangement (C1 orbits with a 6.45 day period, and C2 orbits with a day period).
55 KIC : Results. Testing the Hierarchical Model against the Planetary Model using the radial velocity data. The Circular Hierarchical Model has the greatest evidence (by a factor of exp ).
56 KIC . This system is a hierarchical triple system consisting of a G-star with two co-orbiting M-dwarfs in a 1:10 resonance (P1 = 6.45 day, P2 = day).
57 KIC
58 Nested Sampling Demo (sans model testing)
59 Nested Sampling Demo: The Lighthouse Problem (Gull). Consider a lighthouse located just off of a straight shore that extends a great distance. Imagine that the lighthouse has a laser beam that it fires at random times as it rotates with a uniform speed. Along the shore are light detectors that detect laser beam hits. Based on this data, where is the lighthouse?
60 The Likelihood Function. It is a useful exercise to derive the likelihood via the change of variables $x = \beta \tan\theta + \alpha$: $p(x \mid \alpha, \beta, I) = \dfrac{\beta}{\pi \left[\beta^2 + (\alpha - x)^2\right]}$. We assign a uniform prior for the location parameters α and β.
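The lighthouse likelihood is easy to explore by simulation. In this sketch the beam angle θ is drawn uniformly from (-π/2, π/2), flashes land at x = β tan θ + α, and the log likelihood sums the log of the density β / (π [β² + (α − x)²]) over the recorded flashes. The true position (α, β) = (1, 1), the seed, and the function names are illustrative assumptions.

```python
import math
import random

def simulate_flashes(alpha, beta, n, rng):
    """Shore positions x = beta * tan(theta) + alpha for theta ~ U(-pi/2, pi/2)."""
    return [beta * math.tan(rng.uniform(-0.5 * math.pi, 0.5 * math.pi)) + alpha
            for _ in range(n)]

def log_likelihood(alpha, beta, flashes):
    """Sum of log p(x | alpha, beta) with p = beta / (pi * (beta^2 + (alpha - x)^2))."""
    return sum(math.log(beta / (math.pi * (beta ** 2 + (alpha - x) ** 2)))
               for x in flashes)

rng = random.Random(3)                                  # fixed seed for reproducibility
true_alpha, true_beta = 1.0, 1.0                        # assumed lighthouse position
flashes = simulate_flashes(true_alpha, true_beta, 64, rng)

logl_true = log_likelihood(true_alpha, true_beta, flashes)
logl_far = log_likelihood(true_alpha + 5.0, true_beta, flashes)
print(f"log L at true position: {logl_true:.2f}")
print(f"log L 5 units along the shore: {logl_far:.2f}")

# Sanity check of the change of variables: half the flashes should land
# within a distance beta of alpha, since |theta| < pi/4 half of the time.
many = simulate_flashes(true_alpha, true_beta, 20000, rng)
fraction_near = sum(abs(x - true_alpha) < true_beta for x in many) / len(many)
print(f"fraction within beta of alpha: {fraction_near:.3f}")
```

Because the flash positions follow a heavy-tailed (Cauchy) distribution, the sample mean of the flashes is a poor estimate of α; evaluating the likelihood over (α, β), as nested sampling does, is the reliable route.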
61 Nested Sampling Run, using D = 64 data points (recorded flashes) and N = 100 samples. Iteration is halted when $\Delta \log Z = 10^{-7}$. # Iterations = 1193; Log Z = ; mean(x) = 0.48 ± 0.26; mean(y) = 0.51 ± 0.28. (Figure: Location of Lighthouse, x position vs. y position; o = live samples, + = used samples.)
62 Nested Sampling Run. This shows the relationship between log L and log Prior Volume.
63 Nested Sampling Run with a Gaussian Likelihood. This shows the relationship between log L and log Prior Volume.
64 Nested Sampling and Phase Transitions
65 Peaks on Peaks. Here is a Gaussian likelihood with a taller peak on the side.
66 Nested Sampling with Phase Transitions. Phase transitions represent local peaks in the evidence. (Figure label: Phase Transition.)
67 Acoustic Source Localization: One Detector. Consider an acoustic source localization problem using a single detector. There is a low (red) and a high (blue) frequency source. Note how the high frequency source is found first, inducing a phase transition.
68 Acoustic Source Localization: Two Detectors. In this example, we have two detectors, which allow us to localize the sources to rings. Again, the low frequency source is found first.
69 Acknowledgements: Michael Habeck, Nabin Malakar, Asim Mubeen, Ben Placek
More informationCENG 793. On Machine Learning and Optimization. Sinan Kalkan
CENG 793 On Machine Learning and Optimization Sinan Kalkan 2 Now Introduction to ML Problem definition Classes of approaches KNN Support Vector Machines Softmax classification / logistic regression Parzen
More informationPossible Prelab Questions.
Possible Prelab Questions. Read Lab 2. Study the Analysis section to make sure you have a firm grasp of what is required for this lab. 1) A car is travelling with constant acceleration along a straight
More informationCS 188: Artificial Intelligence Spring 2009
CS 188: Artificial Intelligence Spring 2009 Lecture 21: Hidden Markov Models 4/7/2009 John DeNero UC Berkeley Slides adapted from Dan Klein Announcements Written 3 deadline extended! Posted last Friday
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationA variational radial basis function approximation for diffusion processes
A variational radial basis function approximation for diffusion processes Michail D. Vrettas, Dan Cornford and Yuan Shen Aston University  Neural Computing Research Group Aston Triangle, Birmingham B4
More informationCS 6140: Machine Learning Spring 2016
CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Assignment
More informationHierarchical Bayesian Modeling
Hierarchical Bayesian Modeling Making scientific inferences about a population based on many individuals Angie Wolfgang NSF Postdoctoral Fellow, Penn State Astronomical Populations Once we discover an
More informationStatistical techniques for data analysis in Cosmology
Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UBIEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction
More informationDETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION. Alexandre Iline, Harri Valpola and Erkki Oja
DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION Alexandre Iline, Harri Valpola and Erkki Oja Laboratory of Computer and Information Science Helsinki University of Technology P.O.Box
More informationarxiv:physics/ v2 [physics.dataan] 16 Sep 2013
arxiv:physics/0605197v [physics.dataan] 16 Sep 013 Optimal DataBased Binning for Histograms Kevin H. Knuth Departments of Physics and Informatics University at Albany (SUNY) Albany NY 1, USA September
More informationModel Selection for Gaussian Processes
Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal
More informationUnit 1 Algebra Boot Camp #1 (2 weeks, August/September)
Unit 1 Algebra Boot Camp #1 (2 weeks, August/September) Common Core State Standards Addressed: F.IF.4: For a function that models a relationship between two quantities, interpret key features of graphs
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Nonparametric Gaussian Process (GP) GP Regression
More informationExpectation Maximization Algorithm
Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters
More informationP Values and Nuisance Parameters
P Values and Nuisance Parameters Luc Demortier The Rockefeller University PHYSTATLHC Workshop on Statistical Issues for LHC Physics CERN, Geneva, June 27 29, 2007 Definition and interpretation of p values;
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationThe Laplace Approximation
The Laplace Approximation Sargur N. University at Buffalo, State University of New York USA Topics in Linear Models for Classification Overview 1. Discriminant Functions 2. Probabilistic Generative Models
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte  CMPT 726 Bishop PRML Ch. 4 Classification: Handwritten Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationBayesian Inference by Density Ratio Estimation
Bayesian Inference by Density Ratio Estimation Michael Gutmann https://sites.google.com/site/michaelgutmann Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh
More informationStatistical Techniques in Robotics (16831, F12) Lecture#21 (Monday November 12) Gaussian Processes
Statistical Techniques in Robotics (16831, F12) Lecture#21 (Monday November 12) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan 1, M. Koval and P. Parashar 1 Applications of Gaussian
More informationAnnealed Importance Sampling for Neural Mass Models
for Neural Mass Models, UCL. WTCN, UCL, October 2015. Generative Model Behavioural or imaging data y. Likelihood p(y w, Γ). We have a Gaussian prior over model parameters p(w µ, Λ) = N (w; µ, Λ) Assume
More informationarxiv: v1 [astroph.ep] 13 Jul 2015
Draft version July 14, 2015 Preprint typeset using L A TEX style emulateapj v. 5/2/11 THE METALLICITIES OF STARS WITH AND WITHOUT TRANSITING PLANETS Lars A. Buchhave 1,2 David W. Latham 1 Draft version
More informationAstronomy 421. Lecture 8: Binary stars
Astronomy 421 Lecture 8: Binary stars 1 Key concepts: Binary types How to use binaries to determine stellar parameters The massluminosity relation 2 Binary stars So far, we ve looked at the basic physics
More informationAutonomous Science Platforms and QuestionAsking Machines
Autonomous Science Platforms and QuestionAsking Machines Kevin H. Knuth Departments of Physics and Informatics University at Albany (SUNY) Albany, NY, USA Email: kknuth@albany.edu Julian L. Center Autonomous
More informationChapter 1: NMR Coupling Constants
NMR can be used for more than simply comparing a product to a literature spectrum. There is a great deal of information that can be learned from analysis of the coupling constants for a compound. 1.1 Coupling
More informationNovelty Detection based on Extensions of GMMs for Industrial Gas Turbines
Novelty Detection based on Extensions of GMMs for Industrial Gas Turbines Yu Zhang, Chris Bingham, Michael Gallimore School of Engineering University of Lincoln Lincoln, U.. {yzhang; cbingham; mgallimore}@lincoln.ac.uk
More informationMIXTURE MODELS AND EM
Last updated: November 6, 212 MIXTURE MODELS AND EM Credits 2 Some of these slides were sourced and/or modified from: Christopher Bishop, Microsoft UK Simon Prince, University College London Sergios Theodoridis,
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationHidden Markov Models. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 19 Apr 2012
Hidden Markov Models Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 19 Apr 2012 Many slides courtesy of Dan Klein, Stuart Russell, or
More informationParticle lter for mobile robot tracking and localisation
Particle lter for mobile robot tracking and localisation Tinne De Laet K.U.Leuven, Dept. Werktuigkunde 19 oktober 2005 Particle lter 1 Overview Goal Introduction Particle lter Simulations Particle lter
More informationStat 451 Lecture Notes Monte Carlo Integration
Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:
More informationIntroduction to Particle Filters for Data Assimilation
Introduction to Particle Filters for Data Assimilation Mike Dowd Dept of Mathematics & Statistics (and Dept of Oceanography Dalhousie University, Halifax, Canada STATMOS Summer School in Data Assimila5on,
More informationBayesian Paradigm. Maximum A Posteriori Estimation
Bayesian Paradigm Maximum A Posteriori Estimation Simple acquisition model noise + degradation Constraint minimization or Equivalent formulation Constraint minimization Lagrangian (unconstraint minimization)
More informationBayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine
Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationDetection theory 101 ELECE5410 Signal Processing for Communications
Detection theory 101 ELECE5410 Signal Processing for Communications Binary hypothesis testing Null hypothesis H 0 : e.g. noise only Alternative hypothesis H 1 : signal + noise p(x;h 0 ) γ p(x;h 1 ) Tradeoff
More informationMonitoring the Behavior of Star Spots Using Photometric Data
Monitoring the Behavior of Star Spots Using Photometric Data P. Ioannidis 1 and J.H.M.M. Schmitt 1 1 Hamburger Sternwarte, Gojenbergsweg 112, 21029 HH  Germany Abstract. We use high accuracy photometric
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 203 http://ce.sharif.edu/courses/992/2/ce725/ Agenda Expectationmaximization
More informationMachine Learning, Midterm Exam
10601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv BarJoseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationLine Broadening. φ(ν) = Γ/4π 2 (ν ν 0 ) 2 + (Γ/4π) 2, (3) where now Γ = γ +2ν col includes contributions from both natural broadening and collisions.
Line Broadening Spectral lines are not arbitrarily sharp. There are a variety of mechanisms that give them finite width, and some of those mechanisms contain significant information. We ll consider a few
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 Kmeans clustering Getting the idea with a simple example 9.2 Mixtures
More informationLecture 8: Graphical models for Text
Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin QuiñoneroCandela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor  p. 1/15 Today s class BiasVariance tradeoff. Penalized regression. Crossvalidation.  p. 2/15 Biasvariance
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationQuantile POD for HitMiss Data
Quantile POD for HitMiss Data YewMeng Koh a and William Q. Meeker a a Center for Nondestructive Evaluation, Department of Statistics, Iowa State niversity, Ames, Iowa 50010 Abstract. Probability of detection
More information1. Use the properties of exponents to simplify the following expression, writing your answer with only positive exponents.
Math120  Precalculus. Final Review. Fall, 2011 Prepared by Dr. P. Babaali 1 Algebra 1. Use the properties of exponents to simplify the following expression, writing your answer with only positive exponents.
More informationThe Kalman Filter ImPr Talk
The Kalman Filter ImPr Talk Ged Ridgway Centre for Medical Image Computing November, 2006 Outline What is the Kalman Filter? State Space Models Kalman Filter Overview Bayesian Updating of Estimates Kalman
More informationTHE problem of phase noise and its influence on oscillators
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 54, NO. 5, MAY 2007 435 Phase Diffusion Coefficient for Oscillators Perturbed by Colored Noise Fergal O Doherty and James P. Gleeson Abstract
More information