Lecture 13 and 14: Bayesian estimation theory

Spring, EE 194: Networked estimation and control (Prof. Khan), March

I. BAYESIAN ESTIMATORS

Mother Nature conducts a random experiment that generates a parameter $\theta$ from a probability density function $p(\theta)$. This parameter $\theta$ then codes (or parameterizes) the conditional (or measurement) density $f(x \mid \theta)$. A random experiment generates a measurement $x$ from $f(x \mid \theta)$. The problem is to estimate $\theta$ from $x$. We denote the estimate by $\hat{\theta}(x)$. The Bayesian setup consists of the following notions.

Loss function: The quality of the estimate $\hat{\theta}(x)$ is measured by a real-valued loss function $L(\theta, \hat{\theta}(x))$. Some examples are:

Quadratic loss function: $L(\theta, \hat{\theta}(x)) = [\theta - \hat{\theta}(x)]^T [\theta - \hat{\theta}(x)]$.

Binary (0-1) loss function: $L(\theta, \hat{\theta}(x)) = 0$ if $\hat{\theta}(x) = \theta$, and $1$ otherwise.

Risk: The risk is the loss averaged over the measurement density $f(x \mid \theta)$; it addresses the average loss associated with the estimator $\hat{\theta}(x)$ when the parameter is $\theta$. Mathematically,

    R(\theta, \hat{\theta}) = E_x\big[ L(\theta, \hat{\theta}(x)) \big] = \int L(\theta, \hat{\theta}(x))\, f(x \mid \theta)\, dx.

The notation $E_x$ indicates that the expectation is over the distribution of the random measurement $x$ (with $\theta$ fixed).

Bayes risk: The Bayes risk is the risk averaged over the prior distribution on $\theta$:

    R(p, \hat{\theta}) = E_\theta\big[ R(\theta, \hat{\theta}) \big] = \int R(\theta, \hat{\theta})\, p(\theta)\, d\theta = \int\!\!\int L(\theta, \hat{\theta}(x))\, \underbrace{f(x \mid \theta)\, p(\theta)}_{f(x, \theta)}\, dx\, d\theta.

Bayes risk estimator: The Bayes risk estimator minimizes the Bayes risk:

    \hat{\theta}_B = \arg\min_{\hat{\theta}} R(p, \hat{\theta}),

i.e., the value of $\hat{\theta}$ that minimizes the Bayes risk.
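As a numerical illustration of these definitions (not part of the original notes), the following Python sketch estimates the risk and the Bayes risk by Monte Carlo for an assumed toy model: a scalar Gaussian prior, a Gaussian measurement, quadratic loss, and two candidate estimators. All numbers (variances, the shrinkage rule) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: theta ~ N(0, 1) prior, x | theta ~ N(theta, sigma2)
sigma2 = 0.5

def risk(estimator, theta, n_mc=200_000):
    """Risk R(theta, estimator) = E_x[(theta - est(x))^2] with theta held fixed."""
    x = theta + np.sqrt(sigma2) * rng.standard_normal(n_mc)
    return np.mean((theta - estimator(x)) ** 2)

def bayes_risk(estimator, n_mc=200_000):
    """Bayes risk R(p, estimator): the risk averaged over the prior p(theta)."""
    theta = rng.standard_normal(n_mc)                          # theta ~ N(0, 1)
    x = theta + np.sqrt(sigma2) * rng.standard_normal(n_mc)    # x | theta
    return np.mean((theta - estimator(x)) ** 2)

# Two candidate rules: the raw measurement, and the posterior-mean (shrinkage) rule
raw = lambda x: x
shrink = lambda x: x / (1.0 + sigma2)     # E[theta | x] for this conjugate Gaussian pair

print("R(theta=1, raw)    =", risk(raw, 1.0))      # ~ sigma2 = 0.5
print("Bayes risk, raw    =", bayes_risk(raw))     # ~ 0.5
print("Bayes risk, shrink =", bayes_risk(shrink))  # ~ sigma2/(1+sigma2) = 1/3, smaller
```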

The Bayes risk estimator is a rule for mapping the observations $x$ into estimates $\hat{\theta}_B(x)$. It depends on the conditional distribution of the measurements and on the prior distribution of the parameter. When this prior is not known, the mini-max principle may be used.

Mini-max estimator: Suppose an experimentalist (E) chooses an estimator $\hat{\theta}$ and Mother Nature (M) is allowed to choose her prior after the experimentalist has made his/her choice. If Mother Nature does not like the experimentalist, she will try to maximize the average risk for any choice $\hat{\theta}$:

    \max_p R(p, \hat{\theta}).

We can turn this into a game between M and E by allowing E to observe the resulting average risk and permitting him/her to choose a decision rule that minimizes this maximum average risk:

    \min_{\hat{\theta}} \max_p R(p, \hat{\theta}).

The estimator that does this is called the mini-max estimator $\tilde{\theta}$:

    \tilde{\theta} = \arg\min_{\hat{\theta}} \max_p R(p, \hat{\theta}).

There are other variants of this setup, and they lead to very fundamental questions in game theory.

II. COMPUTING BAYES RISK ESTIMATORS

Recall that the Bayes risk is given by

    R(p, \hat{\theta}) = \int\!\!\int L(\theta, \hat{\theta}(x))\, f(x, \theta)\, dx\, d\theta,

where $f(x, \theta) = f(x \mid \theta)\, p(\theta)$. From Bayes rule we have $f(x, \theta) = f(\theta \mid x)\, f(x)$, where $f(\theta \mid x)$ is the posterior density of $\theta$ given $x$ and $f(x)$ is the marginal density of $x$:

    f(\theta \mid x) = \frac{f(x \mid \theta)}{f(x)}\, p(\theta), \qquad f(x) = \int f(x \mid \theta)\, p(\theta)\, d\theta.

There is an important physical interpretation of the first formula: the prior density is mapped to the posterior density by the ratio of the conditional measurement density to the marginal density,

    p(\theta) \;\xrightarrow{\;x\;}\; f(\theta \mid x),
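The prior-to-posterior mapping above can be carried out directly on a grid. The short sketch below (not from the notes) uses an assumed Gaussian prior and Gaussian measurement density purely for illustration and applies Bayes rule numerically.

```python
import numpy as np

# Grid-based Bayes rule: posterior f(theta|x) = f(x|theta) p(theta) / f(x)
theta = np.linspace(-5, 5, 2001)
dtheta = theta[1] - theta[0]

prior = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)            # p(theta) = N(0, 1), assumed

def lik(x, sigma2=0.5):                                          # f(x|theta) = N(theta, sigma2), assumed
    return np.exp(-0.5 * (x - theta)**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

x_obs = 1.2
marginal = np.sum(lik(x_obs) * prior) * dtheta                   # f(x) = ∫ f(x|theta) p(theta) dtheta
posterior = lik(x_obs) * prior / marginal                        # the data x maps the prior to the posterior

print("∫ posterior dtheta =", np.sum(posterior) * dtheta)        # ~ 1
print("posterior mean     =", np.sum(theta * posterior) * dtheta)  # ~ x_obs/(1+sigma2) = 0.8
```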

i.e., the data $x$ is used to map the prior into the posterior. The Bayes risk estimator is thus

    \hat{\theta}_B(x) = \arg\min_{\hat{\theta}} \int\!\!\int L(\theta, \hat{\theta}(x))\, f(x, \theta)\, dx\, d\theta
                      = \arg\min_{\hat{\theta}} \int\!\!\int L(\theta, \hat{\theta}(x))\, f(\theta \mid x)\, f(x)\, dx\, d\theta
                      = \arg\min_{\hat{\theta}} \int \left( \int L(\theta, \hat{\theta}(x))\, f(\theta \mid x)\, d\theta \right) f(x)\, dx
                      = \arg\min_{\hat{\theta}} \underbrace{\int L(\theta, \hat{\theta}(x))\, f(\theta \mid x)\, d\theta}_{\text{conditional Bayes risk}},

since the marginal density $f(x)$ is non-negative. The result says that the Bayes risk estimator is the estimator that minimizes the conditional risk; the conditional risk is the loss averaged over the conditional distribution of $\theta$ given $x$. Now, to compute a particular estimator, we need to consider some typical loss functions.

Quadratic loss function: When the loss function is quadratic,

    L(\theta, \hat{\theta}(x)) = [\theta - \hat{\theta}(x)]^T [\theta - \hat{\theta}(x)],

we may write the conditional Bayes risk as

    \int L(\theta, \hat{\theta}(x))\, f(\theta \mid x)\, d\theta = \int [\theta - \hat{\theta}(x)]^T [\theta - \hat{\theta}(x)]\, f(\theta \mid x)\, d\theta.

The gradient of this risk with respect to $\hat{\theta}$ is

    \frac{\partial}{\partial \hat{\theta}} \int [\theta - \hat{\theta}(x)]^T [\theta - \hat{\theta}(x)]\, f(\theta \mid x)\, d\theta
    = \int \frac{\partial}{\partial \hat{\theta}} \big( [\theta - \hat{\theta}(x)]^T [\theta - \hat{\theta}(x)] \big) f(\theta \mid x)\, d\theta
    = -2 \int [\theta - \hat{\theta}(x)]\, f(\theta \mid x)\, d\theta,

and the second derivative (the Hessian) is $2I > 0$, so the stationary point is a minimum. Setting the gradient to zero,

    \int \theta\, f(\theta \mid x)\, d\theta = \hat{\theta}(x) \int f(\theta \mid x)\, d\theta = \hat{\theta}(x),

so the Bayes risk estimator becomes

    \hat{\theta}_B(x) = \int \theta\, f(\theta \mid x)\, d\theta = E(\theta \mid x).
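A quick numerical check of this result (an illustration, not from the notes): for an assumed toy posterior on a grid, the point estimate that minimizes the conditional Bayes risk under quadratic loss coincides with the posterior mean.

```python
import numpy as np

# Verify numerically that the posterior mean minimizes the conditional Bayes risk
# under quadratic loss. The posterior below is an assumed toy density on a grid.
theta = np.linspace(-5, 5, 2001)
dtheta = theta[1] - theta[0]
post = np.exp(-0.5 * (theta - 0.8)**2 / 0.4)        # unnormalized, roughly N(0.8, 0.4)
post /= post.sum() * dtheta

def cond_risk(d):
    """Conditional Bayes risk  ∫ (theta - d)^2 f(theta|x) dtheta  for a point estimate d."""
    return np.sum((theta - d)**2 * post) * dtheta

candidates = np.linspace(-1, 2, 301)
best = candidates[np.argmin([cond_risk(d) for d in candidates])]
post_mean = np.sum(theta * post) * dtheta

print("minimizer of conditional risk:", best)        # ~ 0.8
print("posterior mean E[theta|x]    :", post_mean)   # ~ 0.8 (they coincide)
```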

We say that the Bayes risk estimator under the quadratic loss function is the conditional mean of $\theta$ given $x$. In a nutshell, Bayes estimation under quadratic loss comes down to the computation of the mean of the conditional density $f(\theta \mid x)$. Nonlinear filtering is a generic term for this calculation, because in general the result is a nonlinear function of the measurement $x$.

Uniform loss function: Assume that the loss function is

    L(\theta, \hat{\theta}(x)) = \begin{cases} 0, & |\theta - \hat{\theta}(x)| \le \varepsilon, \\ 1, & |\theta - \hat{\theta}(x)| > \varepsilon, \end{cases}

where $\varepsilon > 0$. Based on this loss function, the expected posterior loss becomes

    E\big[ L(\theta, \hat{\theta}(x)) \big] = 1 \cdot P\big( |\theta - \hat{\theta}(x)| > \varepsilon \big) + 0 \cdot P\big( |\theta - \hat{\theta}(x)| \le \varepsilon \big)
    = 1 - P\big( |\theta - \hat{\theta}(x)| \le \varepsilon \big)
    = 1 - \int_{\hat{\theta}(x) - \varepsilon}^{\hat{\theta}(x) + \varepsilon} f(\theta \mid x)\, d\theta.

The above is minimized when the negative term is maximized:

    \hat{\theta}(x) = \arg\max_{\hat{\theta}} \int_{\hat{\theta}(x) - \varepsilon}^{\hat{\theta}(x) + \varepsilon} f(\theta \mid x)\, d\theta.

In the limit $\varepsilon \to 0$ the above becomes

    \lim_{\varepsilon \to 0} \hat{\theta}(x) = \arg\max_{\theta} f(\theta \mid x),

which is the MAP (maximum a posteriori) estimator.
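The limiting behavior can be seen numerically. The sketch below (not from the notes, with an assumed skewed toy posterior) maximizes the posterior mass in a window of half-width $\varepsilon$ and shows the estimate approaching the posterior mode as $\varepsilon$ shrinks.

```python
import numpy as np

# Uniform (epsilon) loss: the optimal estimate maximizes the posterior mass in
# [d - eps, d + eps]; as eps -> 0 this tends to the posterior mode (MAP).
# A skewed toy posterior on a grid (assumed for illustration).
theta = np.linspace(0, 10, 5001)
dtheta = theta[1] - theta[0]
post = theta**2 * np.exp(-theta)                 # Gamma(3,1) shape, mode at theta = 2
post /= post.sum() * dtheta

def window_mass(d, eps):
    """P(|theta - d| <= eps | x), the term to maximize under the epsilon loss."""
    mask = np.abs(theta - d) <= eps
    return np.sum(post[mask]) * dtheta

for eps in (2.0, 0.5, 0.05):
    d_opt = theta[np.argmax([window_mass(d, eps) for d in theta])]
    print(f"eps = {eps:4}: estimate = {d_opt:.2f}")

print("MAP (posterior mode):", theta[np.argmax(post)])   # 2.0
```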

Lecture 14: Wednesday

Example 1: A radioactive source emits $n$ radioactive particles and an imperfect Geiger counter records $k \le n$ of them. Our problem is to estimate $n$ from the measurement $k$. We assume that $n$ is drawn from a Poisson distribution with known parameter $\lambda$:

    P[n] = e^{-\lambda} \frac{\lambda^n}{n!}, \qquad n \ge 0.

The Poisson distribution characterizes the number of occurrences of a process in a given interval of time (or space). A large $n$ is likely when the expected number of occurrences $\lambda$ is high, and a small $n$ is likely when $\lambda$ is small. We can show that $E[n] = \lambda$ and $E[(n - E[n])^2] = \lambda$.

The number of recorded counts follows a binomial distribution:

    P[k \mid n] = \binom{n}{k} p^k (1-p)^{n-k}, \qquad 0 \le k \le n, \qquad E[k \mid n] = np, \qquad \mathrm{var}[k \mid n] = np(1-p).

The binomial distribution is the sum of i.i.d. Bernoulli trials: suppose a random variable is $1$ with probability $p$ and $0$ with probability $1-p$; then the binomial distribution characterizes the total number of $1$'s observed over $n$ trials.

In order to proceed with the Bayesian analysis, we need to compute the posterior distribution of $n$ given $k$:

    P[n \mid k] = \frac{P[n, k]}{P[k]},

which requires the joint and the marginal distributions. We have

    P[n, k] = P[k \mid n]\, P[n] = \binom{n}{k} p^k (1-p)^{n-k}\, e^{-\lambda} \frac{\lambda^n}{n!}, \qquad 0 \le k \le n, \quad n \ge 0.

The marginal of $k$ is

    P[k] = \sum_{n \ge k} \binom{n}{k} p^k (1-p)^{n-k}\, e^{-\lambda} \frac{\lambda^n}{n!}
         = \sum_{n \ge k} \frac{(\lambda p)^k\, (\lambda(1-p))^{n-k}}{k!\,(n-k)!}\, e^{-\lambda}
         = \frac{(\lambda p)^k}{k!}\, e^{-\lambda} \sum_{n \ge k} \frac{(\lambda(1-p))^{n-k}}{(n-k)!}
         = \frac{(\lambda p)^k}{k!}\, e^{-\lambda + \lambda - \lambda p}
         = \frac{(\lambda p)^k}{k!}\, e^{-\lambda p},
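The marginal result $P[k] = \mathrm{Poisson}(\lambda p)$ is easy to confirm by simulation. The sketch below (not from the notes) uses assumed values $\lambda = 10$ and $p = 0.6$ and compares empirical frequencies of $k$ against the Poisson($\lambda p$) pmf.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
lam, p = 10.0, 0.6          # expected emissions and detector efficiency (assumed values)

# Simulate: n ~ Poisson(lam) particles emitted, k | n ~ Binomial(n, p) recorded
n = rng.poisson(lam, size=200_000)
k = rng.binomial(n, p)

# The lecture derives that the marginal of k is Poisson(lam * p); check a few values
for kk in range(4, 9):
    pmf = math.exp(-lam * p) * (lam * p) ** kk / math.factorial(kk)
    print(kk, round(float(np.mean(k == kk)), 4), round(pmf, 4))
```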

which is Poisson with rate $\lambda p$. Now the posterior is

    P[n \mid k] = \frac{\dfrac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}\, e^{-\lambda} \dfrac{\lambda^n}{n!}}{e^{-\lambda p} \dfrac{(\lambda p)^k}{k!}}
                = \frac{1}{(n-k)!}\, (\lambda(1-p))^{n-k}\, e^{-\lambda(1-p)}, \qquad n \ge k,

which is similar to a Poisson distribution, but $n$ starts from $k$ instead of starting from $0$. This has been called the Poisson distribution with displacement $k$. The conditional mean and variance are

    E[n \mid k] = \sum_{n \ge k} n\, \frac{1}{(n-k)!}\, (\lambda(1-p))^{n-k}\, e^{-\lambda(1-p)}
                = \sum_{n \ge k} (n - k + k)\, \frac{1}{(n-k)!}\, (\lambda(1-p))^{n-k}\, e^{-\lambda(1-p)}
                = \sum_{n \ge k} (n - k)\, \frac{1}{(n-k)!}\, (\lambda(1-p))^{n-k}\, e^{-\lambda(1-p)} + k \sum_{n \ge k} \frac{1}{(n-k)!}\, (\lambda(1-p))^{n-k}\, e^{-\lambda(1-p)}
                = \lambda(1-p) + k;

    E\big[ (n - E[n \mid k])^2 \mid k \big] = \lambda(1-p) \quad \text{(Exercise)}.

When the loss function is quadratic, the optimal Bayes estimator is the conditional mean, and thus

    \hat{n}_B = E[n \mid k] = \lambda(1-p) + k.

The Bayes estimate is $k$ when $p = 1$, independent of the expected number of occurrences $\lambda$. Since each emitted particle is recorded as a Bernoulli trial with success probability $p$, we can show that $P(k = n \mid n) = 1$ when $p = 1$. Similarly, when $p = 0$, i.e., we record no counts almost surely, the Bayes estimate is $\lambda$, the expected number of occurrences. For $0 < p < 1$, the Bayes estimate optimally combines the two extremes. We can also think of $\lambda(1-p)$ as the expected number of missed counts; in this sense the Bayes estimate applies a correction to include the missed counts.

One can easily show that $E[\hat{n}_B] = \lambda = E[n]$, i.e., the estimate is unbiased. However, it is not conditionally unbiased:

    E[\hat{n}_B \mid n] = E[k \mid n] + \lambda(1-p) = np + \lambda(1-p) \ne n.

The mean squared error of the estimator is

    E\big[ (n - \hat{n}_B)^2 \big] = E_k\Big( E\big[ (n - \hat{n}_B)^2 \mid k \big] \Big) = E_k\Big( E\big[ (n - E[n \mid k])^2 \mid k \big] \Big) = E_k\big( \lambda(1-p) \big) = \lambda(1-p).
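The conditional mean and variance can also be checked by Monte Carlo. This sketch (not from the notes, with the same assumed $\lambda$ and $p$ as above) conditions on one recorded count $k_0$ and compares the empirical conditional moments against $\lambda(1-p) + k_0$ and $\lambda(1-p)$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, p = 10.0, 0.6                       # assumed values, as above

n = rng.poisson(lam, size=2_000_000)
k = rng.binomial(n, p)

k0 = 6                                   # condition on a particular recorded count
n_given_k = n[k == k0]
print("E[n | k]  :", n_given_k.mean(), " theory:", lam * (1 - p) + k0)   # 10
print("Var[n | k]:", n_given_k.var(),  " theory:", lam * (1 - p))        # 4
```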

III. MULTIVARIATE NORMAL

Let $x$ and $y$ be jointly distributed according to the normal distribution:

    \begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} R_{xx} & R_{xy} \\ R_{yx} & R_{yy} \end{bmatrix} \right).

Recall that the marginals are also normal, i.e.,

    x \sim N(0, R_{xx}), \qquad y \sim N(0, R_{yy}),

where $R_{xx} = E[x x^T]$ and so on. It can be shown that

    y \mid x \sim N\big( R_{yx} R_{xx}^{-1} x,\; R_{yy} - R_{yx} R_{xx}^{-1} R_{xy} \big),
    x \mid y \sim N\big( R_{xy} R_{yy}^{-1} y,\; R_{xx} - R_{xy} R_{yy}^{-1} R_{yx} \big).

Hence the optimal Bayes estimate under quadratic loss is the mean of the posterior, i.e.,

    \hat{x}_B = R_{xy} R_{yy}^{-1} y.

We can think of this as Mother Nature generating $x$ from the $p(x) = N(0, R_{xx})$ distribution and Father Nature generating a measurement from $f(y \mid x)$, which is also normal. What function relating $y$ to $x$ will result in the above $f(y \mid x)$? Recalling that the sum of two normal random variables is also normal, note that

    y = H x + r, \qquad H = R_{yx} R_{xx}^{-1}, \qquad r \sim N(0, Q) \text{ statistically independent of } x,

will result in the above $f(y \mid x)$. In other words, we can generate the jointly normal $x$ and $y$ process described above from two statistically independent normal random vectors $x \sim N(0, R_{xx})$ and $r \sim N(0, Q)$ by relating $y$ and $x$ as above. When generating this signal-plus-measurement model, i.e., $x$ being the signal and $y = Hx + r$ being the measurement, we define one new matrix $R_{yx}$, and $R_{yy}$ is then directly given by $R_{xx}$ and $Q$. Clearly $R_{xy} = R_{yx}^T$. Show that

    R_{yy} = R_{yx} R_{xx}^{-1} R_{xy} + Q.

In short, one can generate a jointly normal random process from two independent normal processes and a linear map.
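The construction above is easy to verify by simulation. The following sketch (not from the notes; the covariance matrices and linear map are assumed toy choices) generates a jointly normal pair from two independent normals, forms the Bayes estimate $\hat{x}_B = R_{xy} R_{yy}^{-1} y$, and checks its error covariance against the posterior covariance.

```python
import numpy as np

rng = np.random.default_rng(3)

# Generate a jointly normal (x, y) pair from two independent normals and a linear map,
# as in the lecture: x ~ N(0, Rxx), r ~ N(0, Q), y = H x + r.  Matrices are assumed toys.
Rxx = np.array([[2.0, 0.5], [0.5, 1.0]])
Q   = np.array([[0.3, 0.0], [0.0, 0.2]])
H   = np.array([[1.0, 0.0], [1.0, 1.0]])

Ryx = H @ Rxx
Rxy = Ryx.T
Ryy = H @ Rxx @ H.T + Q            # equals Ryx Rxx^{-1} Rxy + Q

N = 500_000
x = rng.multivariate_normal(np.zeros(2), Rxx, size=N)
r = rng.multivariate_normal(np.zeros(2), Q, size=N)
y = x @ H.T + r

# Bayes estimate under quadratic loss: the conditional mean x_B = Rxy Ryy^{-1} y
G = Rxy @ np.linalg.inv(Ryy)
x_B = y @ G.T
print("empirical MSE per component      :", np.mean((x - x_B)**2, axis=0))
print("diag(Rxx - Rxy Ryy^{-1} Ryx)     :",
      np.diag(Rxx - Rxy @ np.linalg.inv(Ryy) @ Ryx))
```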

IV. LINEAR STATISTICAL MODEL

Consider the following signal-plus-noise model:

    y = H x + n,

where $x \sim N(0, R_{xx})$ and $n \sim N(0, R_{nn})$ are statistically independent. The correlation between $x$ and $y$ is

    R_{yx} = E[y x^T] = E[(Hx + n) x^T] = H R_{xx},

and the covariance of $y$ is

    R_{yy} = E[y y^T] = H R_{xx} H^T + R_{nn}.

Thus $x$ and $y$ are jointly normal:

    \begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} R_{xx} & R_{xx} H^T \\ H R_{xx} & H R_{xx} H^T + R_{nn} \end{bmatrix} \right).

Clearly, the Bayes estimate under quadratic loss is the conditional mean of $x$ given $y$:

    \hat{x}_B = \underbrace{R_{xx} H^T (H R_{xx} H^T + R_{nn})^{-1}}_{G}\, y,

with conditional covariance

    P = R_{xx} - \underbrace{R_{xx} H^T (H R_{xx} H^T + R_{nn})^{-1}}_{G}\, H R_{xx}.

From the matrix inversion lemma, note that

    P = (R_{xx}^{-1} + H^T R_{nn}^{-1} H)^{-1}, \qquad P^{-1} = R_{xx}^{-1} + H^T R_{nn}^{-1} H.

Then

    G H R_{xx} = R_{xx} - P
               = P (P^{-1} R_{xx} - I)
               = P \big( (R_{xx}^{-1} + H^T R_{nn}^{-1} H) R_{xx} - I \big)
               = P (I + H^T R_{nn}^{-1} H R_{xx} - I)
               = P H^T R_{nn}^{-1} H R_{xx},

so $G = P H^T R_{nn}^{-1}$. Hence the estimator can be re-written as

    \hat{x}_B = P H^T R_{nn}^{-1} y, \qquad P = (R_{xx}^{-1} + H^T R_{nn}^{-1} H)^{-1}.
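The equivalence of the covariance-form and information-form expressions for the gain is easy to check numerically. The sketch below (not from the notes; dimensions and random matrices are assumptions) builds random positive-definite $R_{xx}$, $R_{nn}$ and a random $H$, and compares the two forms.

```python
import numpy as np

# Check numerically that the two forms of the gain agree (assumed toy matrices):
#   G  = Rxx H^T (H Rxx H^T + Rnn)^{-1}                          (covariance form)
#   G' = P H^T Rnn^{-1},  P = (Rxx^{-1} + H^T Rnn^{-1} H)^{-1}   (information form)
rng = np.random.default_rng(4)
n, m = 3, 4
A = rng.standard_normal((n, n)); Rxx = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); Rnn = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))

G_cov = Rxx @ H.T @ np.linalg.inv(H @ Rxx @ H.T + Rnn)

P = np.linalg.inv(np.linalg.inv(Rxx) + H.T @ np.linalg.inv(Rnn) @ H)
G_info = P @ H.T @ np.linalg.inv(Rnn)

print("max |G_cov - G_info| =", np.abs(G_cov - G_info).max())   # ~ 1e-15

# The posterior covariance also matches its covariance-form expression
P_cov = Rxx - G_cov @ H @ Rxx
print("max |P - P_cov|      =", np.abs(P - P_cov).max())
```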

V. SEQUENTIAL BAYES

The results of the previous section may be used to derive recursive estimates of the random vector $x$ when the measurement vector $y_t = [y_0, y_1, \ldots, y_t]^T$ increases in dimension with time. The basic idea is to write

    y_t = H_t x + n_t, \qquad \begin{bmatrix} y_{t-1} \\ y_t \end{bmatrix} = \begin{bmatrix} H_{t-1} \\ c_t^T \end{bmatrix} x + \begin{bmatrix} n_{t-1} \\ n_t \end{bmatrix},

i.e., the $k$th measurement can be written as

    y_k = c_k^T x + n_k,

where $x \sim N(0, R_{xx})$ and $n_t \sim N(0, R_t)$ are statistically independent, with $R_t$ diagonal with elements $r_{kk}$ on the diagonal; $R_0 = r_{00}$. This means that

    R_t = \begin{bmatrix} R_{t-1} & 0 \\ 0 & r_{tt} \end{bmatrix}, \qquad R_t^{-1} = \begin{bmatrix} R_{t-1}^{-1} & 0 \\ 0 & r_{tt}^{-1} \end{bmatrix}.

The joint distribution of $x$ and $y_t$ is

    \begin{bmatrix} x \\ y_t \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} R_{xx} & R_{xx} H_t^T \\ H_t R_{xx} & H_t R_{xx} H_t^T + R_t \end{bmatrix} \right).

The posterior is

    x \mid y_t \sim N(\hat{x}_t, P_t), \qquad \hat{x}_t = P_t H_t^T R_t^{-1} y_t, \qquad P_t^{-1} = R_{xx}^{-1} + H_t^T R_t^{-1} H_t.

The dimensions of $H_t$, $R_t$, and $y_t$ increase with time, whereas those of $\hat{x}_t$ and $P_t$ are fixed. How can we make the estimate equations recursive?
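The notes leave the recursion as a question. One standard answer (a sketch under assumed toy data, not a derivation from the notes) is to accumulate the information matrix and information vector one measurement at a time and compare the result against the batch formula above.

```python
import numpy as np

# A sketch of one standard way to make the equations recursive:
#   P_t^{-1} = P_{t-1}^{-1} + c_t c_t^T / r_tt,   z_t = z_{t-1} + c_t y_t / r_tt,
#   x_hat_t  = P_t z_t,   initialized with P^{-1} = Rxx^{-1} and z = 0.
rng = np.random.default_rng(5)

dim = 3
Rxx = np.eye(dim)
x = rng.multivariate_normal(np.zeros(dim), Rxx)     # the fixed random vector to estimate

Pinv = np.linalg.inv(Rxx)                           # running information matrix
z = np.zeros(dim)                                   # running information vector

C, Y, R = [], [], []                                # history kept only to check against the batch form
for t in range(200):
    c = rng.standard_normal(dim)                    # measurement vector c_t (assumed random)
    r = 0.5                                         # measurement noise variance r_tt (assumed)
    y = c @ x + np.sqrt(r) * rng.standard_normal()
    Pinv += np.outer(c, c) / r                      # recursive information update
    z += c * y / r
    C.append(c); Y.append(y); R.append(r)

x_rec = np.linalg.solve(Pinv, z)                    # recursive estimate after all measurements

# Batch estimate from the lecture's formula: x_hat_t = P_t H_t^T R_t^{-1} y_t
Ht, yt, Rt = np.array(C), np.array(Y), np.diag(R)
Pt = np.linalg.inv(np.linalg.inv(Rxx) + Ht.T @ np.linalg.inv(Rt) @ Ht)
x_batch = Pt @ Ht.T @ np.linalg.inv(Rt) @ yt

print("max |recursive - batch| =", np.abs(x_rec - x_batch).max())   # ~ 1e-12
```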
