ECE295, Data Assimilation and Inverse Problems, Spring 2015


ECE295, Data Assimilation and Inverse Problems, Spring 2015

Schedule:
1 April: Intro; linear discrete inverse problems (Aster Ch 1 and 2). Slides
8 April: SVD (Aster Ch 2 and 3). Slides
15 April: Regularization (Ch 4)
22 April: Sparse methods (Ch 7.2-7.3), radar
29 April: More on sparse methods
6 May: Bayesian methods and Monte Carlo methods (Ch 11), Markov Chain Monte Carlo
13 May: Introduction to sequential Bayesian methods, Kalman Filter (KF)
20 May: Gaussian Mixture Model (Nima)
27 May: Ensemble Kalman Filter (EnKF)
3 June: EnKF, Particle Filter

Homework: Just email the code to me (I don't need anything else). Call the files LastName_ExXX. Homework is due 8 am on Wednesday.
8 April: Hw 1: Download the MATLAB codes for the book (cd_5.3) from this website.
15 April: SVD analysis: SVD homework. You can also try replacing the matrix in the Shaw problem with the beamforming sensing matrix. The sensing matrix is available here.
22 April / Late April: Beamforming
May: Ice-flow from GPS

Generating samples from an arbitrary posterior PDF: the rejection method

Generating samples from the posterior PDF: the rejection method

But this requires us to know p_max, the maximum of the posterior.

Step 1: generate a uniform random variable x_i between a and b, with p(x_i) = 1/(b - a), a ≤ x_i ≤ b.
Step 2: generate a second uniform random variable y_i, with p(y_i) = 1/p_max, 0 ≤ y_i ≤ p_max.
Step 3: accept x_i if y_i ≤ p(x_i | d); otherwise reject.
Step 4: go to Step 1.
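Not from the slides: a minimal MATLAB sketch of the rejection method in one dimension. The target density p_target, the interval [a, b], and the bound pmax are assumptions chosen only for the illustration.

% Rejection sampling from a 1-D posterior known up to a constant.
% Assumed target: an unnormalized bimodal density on [a, b].
a = -4; b = 4;
p_target = @(x) exp(-0.5*(x - 1).^2) + 0.5*exp(-0.5*((x + 2)/0.7).^2);
pmax = 1.05;                      % upper bound on p_target over [a, b]

Ns = 5000;                        % number of accepted samples wanted
samples = zeros(Ns, 1);
n = 0;
while n < Ns
    xi = a + (b - a)*rand;        % Step 1: uniform x_i on [a, b]
    yi = pmax*rand;               % Step 2: uniform y_i on [0, pmax]
    if yi <= p_target(xi)         % Step 3: accept if y_i <= p(x_i | d)
        n = n + 1;
        samples(n) = xi;
    end                           % Step 4: otherwise reject and repeat
end
histogram(samples, 50);           % accepted samples follow the target shape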

Monte Carlo integration

Consider any integral of the form

I = ∫_M f(m) p(m | d) dm

Given a set of samples m_i (i = 1, ..., N_s) with sampling density h(m_i), the Monte Carlo approximation to I is

I ≈ Σ_{i=1}^{N_s} f(m_i) p(m_i | d) / h(m_i)

We only need to know p(m | d) up to a multiplicative constant. If the sampling density is proportional to p(m | d), then h(m) = N_s p(m | d) and

I ≈ (1/N_s) Σ_{i=1}^{N_s} f(m_i)

The variance of the f(m_i) values gives the numerical integration error in I.
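As an illustration (not part of the slides), here is a minimal MATLAB sketch of this estimate for an assumed one-dimensional, unnormalized Gaussian posterior with f(m) = m, using a uniform sampling density h on [a, b]; the weights are self-normalized so that the unknown constant cancels.

% Monte Carlo estimate of I = integral of f(m) p(m|d) dm,
% with p(m|d) known only up to a multiplicative constant.
p_unnorm = @(m) 3.7*exp(-0.5*((m - 2)/0.5).^2);   % arbitrary scale: only the shape matters
f        = @(m) m;                                % estimate the posterior mean

a = 0; b = 4;  Ns = 1e5;
m_i = a + (b - a)*rand(Ns, 1);      % samples from h(m) = 1/(b-a) on [a, b]
w   = p_unnorm(m_i) * (b - a);      % weights p(m_i|d)/h(m_i), up to a constant
I   = sum(w.*f(m_i)) / sum(w);      % self-normalized estimate: the constant cancels
fprintf('estimated posterior mean = %.3f (exact 2.000)\n', I);

% If the samples are drawn from the posterior itself (h proportional to p(m|d)),
% the estimate reduces to the simple average mean(f(m_i)).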

Example: Monte Carlo integration

Finding the area of a circle by throwing darts:

I = ∫_A f(m) dm,   f(m) = 1 if m is inside the circle, 0 otherwise

Throwing N_s darts uniformly over the region of area A gives h(m) = N_s / A, so

I ≈ Σ_{i=1}^{N_s} f(m_i)/h(m_i) = A × (number of points inside the circle) / (total number of points)
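A minimal MATLAB sketch of the dart-throwing estimate; the unit circle inside the square [-1, 1]^2 (area A = 4) is an assumed concrete choice.

% Estimate the area of the unit circle by throwing darts at the square [-1,1]^2.
Ns = 1e6;
m  = 2*rand(Ns, 2) - 1;                 % uniform darts in the square, area A = 4
inside = sum(m.^2, 2) <= 1;             % f(m_i) = 1 inside the circle, 0 otherwise
A  = 4;
I  = A * sum(inside)/Ns;                % area estimate; the exact value is pi
fprintf('estimated area = %.4f (pi = %.4f)\n', I, pi);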

Monte Carlo integration

We have

I = ∫_M f(m) p(m | d) dm ≈ Σ_{i=1}^{N_s} f(m_i) p(m_i | d)/h(m_i) ≈ (1/N_s) Σ_{i=1}^{N_s} f(m_i)   (when h ∝ p)

The variance in this estimate is given by

σ_I^2 ≈ (1/N_s) [ (1/N_s) Σ_{i=1}^{N_s} f^2(m_i) - ( (1/N_s) Σ_{i=1}^{N_s} f(m_i) )^2 ]

To carry out MC integration of the posterior we ONLY NEED to be able to evaluate the integrand up to a multiplicative constant. As the number of samples N_s grows, the error in the numerical estimate decreases with the square root of N_s. In principle any sampling density h(m) can be used, but the convergence rate will be fastest when h(m) ∝ p(m | d). What useful integrals should one calculate using samples distributed according to the posterior p(m | d)?
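A short MATLAB sketch (using the assumed dart-throwing example again) that applies the variance formula above and shows the 1/sqrt(N_s) decay of the error.

% Numerical error of the Monte Carlo estimate and its 1/sqrt(Ns) decay.
A = 4;
for Ns = [1e3 1e4 1e5 1e6]
    m = 2*rand(Ns, 2) - 1;
    f = A*double(sum(m.^2, 2) <= 1);              % f scaled so that mean(f) estimates the area
    I = mean(f);
    sigma_I = sqrt((mean(f.^2) - mean(f)^2)/Ns);  % variance formula from the slide
    fprintf('Ns = %7d  I = %.4f  sigma_I = %.4f\n', Ns, I, sigma_I);
end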

The volume of an N-dimensional cube of side 2 is 2^N, and the unit N-ball occupies a vanishing fraction of it. For N = 2 the fraction is π/2^2 ≈ 3/4; for N = 10 it is (π^5/120)/2^10 ≈ 2/1000. Uniform sampling of a high-dimensional space therefore wastes almost all of its samples: this will be hard!
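A brief MATLAB sketch (my illustration) that estimates the fraction of uniform samples from the cube [-1, 1]^N falling inside the unit ball, showing how the acceptance rate collapses with dimension.

% Fraction of uniform samples in the cube [-1,1]^N that land in the unit N-ball.
Ns = 1e6;
for N = [2 5 10]
    m = 2*rand(Ns, N) - 1;
    frac = sum(sum(m.^2, 2) <= 1) / Ns;
    fprintf('N = %2d  fraction inside the unit ball = %.2e\n', N, frac);
end
% For N = 10 the fraction is about 2.5e-3; naive uniform sampling of a
% high-dimensional space rejects almost everything, which is why MCMC is needed.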

Probabilistic inference: Bayes theorem and all that...

Books

Highly nonlinear inverse problems: multi-modal data misfit/objective functions. What value is there in an optimal model?

Probabilistic inference: History

p(m | data, I),   ∫ p(x) dx = 1,   Pr(x : a ≤ x ≤ b) = ∫_a^b p(x) dx   (Laplace, 1812)

We have already met the concept of using a probability density function p(x) to describe the state of a random variable. In the probabilistic (or Bayesian) approach, probabilities are also used to describe inferences (or degrees of belief) about x even if x itself is not a random variable. Laplace (1812) rediscovered the work of Bayes (1763) and used it to constrain the mass of Saturn. In 150 years the estimate changed by only 0.63%! But Laplace died in 1827, and then the arguments started...

Bayesian or Frequentist: the arguments

Some thought that using probabilities to describe degrees of belief was too subjective, and so they redefined probability as the long-run relative frequency of a random event. This became the Frequentist approach. To estimate the mass of Saturn the frequentist has to relate the mass to the data through a statistic. Since the data contain 'random noise', probability theory can be applied to the statistic (which becomes the random variable!). This gave birth to the field of statistics! But how to choose the statistic? '... a plethora of tests and procedures without any clear underlying rationale' (D. S. Sivia).

'Bayesian is subjective and requires too many guesses' - A Frequentist
'Frequentist is subjective, but Bayesian inference can solve problems more completely' - A Bayesian

For a discussion see Sivia (2005, pp. 8-11): Data Analysis: A Bayesian Tutorial, 2nd Ed., D. S. Sivia with J. Skilling, O.U.P. (2005).

Probability theory: Joint probability density functions

A PDF for the variable x: p(x). The joint PDF of x and y: p(x, y). Probability is proportional to the area under the curve or surface. If x and y are independent, their joint PDF is separable: p(x, y) = p(x) p(y).

Probability theory: Conditional probability density functions

Joint PDF of x and y: p(x, y), the PDF of x and y taken together. Conditional PDFs: p(x | y) and p(y | x), the PDF of x given a value for y (and vice versa). Relationship between joint and conditional PDFs: p(x, y) = p(x | y) p(y).

Probability theory: Marginal probability density functions

A marginal PDF is a summation of probabilities:

p(y) = ∫ p(x, y) dx,   p(x) = ∫ p(x, y) dy

Relationship between joint, conditional and marginal PDFs: p(x, y) = p(x | y) p(y).
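A small MATLAB sketch (illustration only, with an assumed correlated Gaussian joint PDF) that checks the marginal and conditional relations numerically on a grid.

% Numerical check of the marginal and conditional relations on a grid.
x = linspace(-5, 5, 401);  y = linspace(-5, 5, 401);
[X, Y] = meshgrid(x, y);                       % rows index y, columns index x
rho = 0.6;                                     % assumed correlation
pxy = exp(-(X.^2 - 2*rho*X.*Y + Y.^2)/(2*(1 - rho^2)));
pxy = pxy / trapz(y, trapz(x, pxy, 2));        % normalize the joint PDF

py = trapz(x, pxy, 2);                         % p(y) = int p(x,y) dx  (marginal)
px = trapz(y, pxy, 1);                         % p(x) = int p(x,y) dy  (marginal)
p_x_given_y = pxy ./ py;                       % p(x|y) = p(x,y)/p(y)  (conditional)

trapz(x, px)                                   % = 1: the marginal is a proper PDF
max(abs(trapz(x, p_x_given_y, 2) - 1))         % ~0: each conditional integrates to 1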

Prior probability density functions

What we know from previous experiments, or what we guess...

p(x) = k exp{ -(x - x_o)^2 / (2σ^2) },    p(m) = k exp{ -(1/2) (m - m_o)^T C_m^{-1} (m - m_o) }

Beware: there is no such thing as a non-informative prior. Under a change of variables, p(x) dx = p(y) dy, so p(y) = p(x) |dx/dy|. If p(x) = C is flat, the implied prior on x^2 is p(x^2) = C/(2x), which is no longer flat; and a flat density on an unbounded range is not even proper!

Likelihood functions

The likelihood that the data would have occurred for a given model:

p(d_i | x) = exp{ -(x - x_{o,i})^2 / (2σ_i^2) },    p(d | m) = exp{ -(1/2) (d - Gm)^T C_D^{-1} (d - Gm) }

Maximizing likelihoods is what Frequentists do. It is what we did earlier:

max_m p(d | m) = min_m [ -ln p(d | m) ] = min_m (d - Gm)^T C_D^{-1} (d - Gm)

Maximizing the likelihood = minimizing the data prediction error.

Bayes theorem

All information is expressed in terms of probability density functions (conditional PDFs). Bayes rule (1763), for model parameters m, data d, and assumptions I:

p(m | d, I) ∝ p(d | m, I) p(m | I)

Posterior probability density ∝ Likelihood × Prior probability density: what is known after the data are collected ∝ (a measure of the fit to the data) × (what is known before the data are collected).   [Thomas Bayes, 1702-1761]

Example: Measuring the mass of an object

Suppose we have an object whose mass, m, we wish to determine. Before we collect any data we believe that its mass is approximately 10.0 ± 1 g. In probabilistic terms we could represent this as a Gaussian prior distribution,

Prior: p(m) = (1/√(2π)) exp{ -(1/2)(m - 10.0)^2 }

Suppose a measurement is taken and a value of 11.2 g is obtained, and the measuring device is believed to give Gaussian errors with mean 0 and σ = 0.5 g. Then the likelihood function can be written

Likelihood: p(d | m) = (1/(0.5√(2π))) exp{ -2(m - 11.2)^2 }

Posterior: p(m | d) ∝ exp{ -(1/2)(m - 10.0)^2 - 2(m - 11.2)^2 } ∝ exp{ -(m - 10.96)^2 / (2 × (1/5)) }

The posterior PDF becomes a Gaussian centred at the value of 10.96 g with standard deviation σ = (1/5)^(1/2) ≈ 0.45 g.
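A small MATLAB sketch (illustration) of this Gaussian prior × Gaussian likelihood update, computed both with the conjugate precision-weighted formulas and numerically on a grid.

% Gaussian prior x Gaussian likelihood for the mass example.
m0 = 10.0;  s0 = 1.0;       % prior: mass ~ 10.0 +/- 1 g
d  = 11.2;  sd = 0.5;       % measurement and its standard deviation (g)

% Conjugate (precision-weighted) update:
post_var  = 1/(1/s0^2 + 1/sd^2);             % = 1/5
post_mean = post_var*(m0/s0^2 + d/sd^2);     % = 10.96
fprintf('posterior: mean %.2f g, std %.2f g\n', post_mean, sqrt(post_var));

% Numerical check on a grid: posterior is proportional to prior x likelihood.
m    = linspace(7, 14, 2001);
post = exp(-0.5*(m - m0).^2/s0^2) .* exp(-0.5*(m - d).^2/sd^2);
post = post / trapz(m, post);                % normalize
[~, imax] = max(post);
fprintf('grid posterior peaks at %.2f g\n', m(imax));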

Example: Measuring the mass of an object (continued)

The more accurate new data has changed the estimate of m and decreased its uncertainty. This was a one-data-point problem; for the general linear inverse problem we would have

Prior: p(m) ∝ exp{ -(1/2) (m - m_o)^T C_m^{-1} (m - m_o) }

Likelihood: p(d | m) ∝ exp{ -(1/2) (d - Gm)^T C_d^{-1} (d - Gm) }

Posterior PDF: p(m | d) ∝ exp{ -(1/2) [ (d - Gm)^T C_d^{-1} (d - Gm) + (m - m_o)^T C_m^{-1} (m - m_o) ] }

Product of Gaussians = Gaussian

For the general linear inverse problem (or the one-data-point problem) we have

Prior: p(m) ∝ exp{ -(1/2) (m - m_o)^T C_m^{-1} (m - m_o) }

Likelihood: p(d | m) ∝ exp{ -(1/2) (d - Gm)^T C_d^{-1} (d - Gm) }

Posterior PDF:

p(m | d) ∝ exp{ -(1/2) [ (d - Gm)^T C_d^{-1} (d - Gm) + (m - m_o)^T C_m^{-1} (m - m_o) ] }
         ∝ exp{ -(1/2) (m - m̂)^T S^{-1} (m - m̂) }

where

S^{-1} = G^T C_d^{-1} G + C_m^{-1}

m̂ = (G^T C_d^{-1} G + C_m^{-1})^{-1} (G^T C_d^{-1} d + C_m^{-1} m_o)
   = m_o + (G^T C_d^{-1} G + C_m^{-1})^{-1} G^T C_d^{-1} (d - G m_o)
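A minimal MATLAB sketch (illustration; the toy G, C_d, C_m and synthetic data are assumptions) that evaluates m̂ and S with these formulas and checks that the two expressions for m̂ agree.

% Linear-Gaussian posterior:
%   S^{-1} = G'*inv(Cd)*G + inv(Cm)
%   m_hat  = S*(G'*inv(Cd)*d + inv(Cm)*m_o) = m_o + S*G'*inv(Cd)*(d - G*m_o)
% Toy problem (assumed): 5 data, 3 model parameters.
rng(0);
G      = randn(5, 3);
m_true = [1; -2; 0.5];
Cd     = 0.1^2*eye(5);                      % data covariance
Cm     = 1.0^2*eye(3);                      % prior model covariance
m_o    = zeros(3, 1);                       % prior mean
d      = G*m_true + 0.1*randn(5, 1);        % synthetic data

Sinv   = G'/Cd*G + inv(Cm);                 % posterior precision
S      = inv(Sinv);                         % posterior covariance
m_hat1 = Sinv \ (G'/Cd*d + Cm\m_o);         % first form of the posterior mean
m_hat2 = m_o + Sinv \ (G'/Cd*(d - G*m_o));  % second (update) form
disp([m_hat1 m_hat2])                       % the two forms agree
disp(sqrt(diag(S)))                         % posterior standard deviations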

The biased coin problem

Suppose we have a suspicious coin and we want to know whether it is biased or not. Let θ be the probability that we get a head: θ = 1 means we always get a head, θ = 0 means we always get a tail, θ = 0.5 means equal likelihood of head or tail. We can collect data by tossing the coin many times: {H, T, T, H, ...}. We seek a probability density function for θ given the data,

p(θ | d, I) ∝ p(d | θ, I) p(θ | I)      (Posterior PDF ∝ Likelihood × Prior PDF)

The biased coin problem

What is the prior PDF for θ? Let us assume that it is uniform: p(θ | I) = 1, 0 ≤ θ ≤ 1. What is the likelihood function? The probability of observing R heads out of N coin tosses is

p(d | θ, I) ∝ θ^R (1 - θ)^(N-R)

so again p(θ | d, I) ∝ p(d | θ, I) p(θ | I)      (Posterior PDF ∝ Likelihood × Prior PDF)

The biased coin problem

We have the posterior PDF for θ given the data and our prior PDF:

p(θ | d, I) ∝ θ^R (1 - θ)^(N-R)

After N coin tosses, let R be the number of heads observed. Then we can plot the probability density p(θ | d) as the data are collected (figure: posterior vs. θ for increasing numbers of data).

The biased coin problem: p(θ | d, I) ∝ θ^R (1 - θ)^(N-R) (figure: posterior PDFs as the number of data increases).

The biased coin problem

But what if three people had different opinions about the coin prior to collecting the data? Dr. Blue knows nothing about the coin. Dr. Green thinks the coin is likely to be almost fair. Dr. Red thinks the coin is highly biased, either to heads or to tails. The likelihood is the same for all three: p(d | θ, I) ∝ θ^R (1 - θ)^(N-R).
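A short MATLAB sketch (illustration; the specific shapes assumed for the three priors and the true bias of the coin are my choices) that evaluates the three posteriors on a grid as the coin-toss data accumulate.

% Posterior p(theta|d) ∝ theta^R (1-theta)^(N-R) x prior, for three different priors.
theta = linspace(0, 1, 501);
prior_blue  = ones(size(theta));                       % knows nothing: uniform
prior_green = exp(-0.5*((theta - 0.5)/0.1).^2);        % assumed: nearly fair coin
prior_red   = exp(-0.5*((theta - 0.0)/0.1).^2) + ...   % assumed: strongly biased,
              exp(-0.5*((theta - 1.0)/0.1).^2);        % to heads or to tails

theta_true = 0.3;                                      % assumed bias of the coin
N = 100;  tosses = rand(N, 1) < theta_true;            % 1 = head, 0 = tail
for n = [5 20 100]
    R = sum(tosses(1:n));                              % heads in the first n tosses
    like = theta.^R .* (1 - theta).^(n - R);
    for prior = {prior_blue, prior_green, prior_red}
        post = like .* prior{1};
        post = post / trapz(theta, post);              % normalize the posterior
        % plot(theta, post); hold on                   % (plotting left to the reader)
    end
    fprintf('n = %3d, R = %3d\n', n, R);
end
% As the data accumulate, the three posteriors converge toward theta_true.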

The biased coin problem (figure: posterior PDFs for the three priors as the number of data increases).