Estimation Theory Fredrik Rusek. Chapter 11

Size: px
Start display at page:

Download "Estimation Theory Fredrik Rusek. Chapter 11"

Transcription

1 Estimation Theory Fredrik Rusek Chapter 11

2 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters If no MVU estimator exists, or is very hard to find, we can apply an MMSE estimator to deterministic parameters Recall the form of the Bayesian estimator for DC-levels in WGN Compute the MSE for a given value of A

3 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters If no MVU estimator exists, or is very hard to find, we can apply an MMSE estimator to deterministic parameters Recall the form of the Bayesian estimator for DC-levels in WGN α<1 Compute the MSE for a given value of A

4 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters If no MVU estimator exists, or is very hard to find, we can apply an MMSE estimator to deterministic parameters Recall the form of the Bayesian estimator for DC-levels in WGN α<1 Compute the MSE for a given value of A Variance smaller than classical estimator Large bias for large A

5 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters If no MVU estimator exists, or is very hard to find, we can apply an MMSE estimator To deterministic parameters Recall the form of the Bayesian estimator for DC-levels in WGN α<1 Compute the MSE for a given value of A MSE for Bayesian is smaller for A close to the prior mean, but larger far away

6 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters However, the BMSE is smaller To deterministic parameters

7 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters However, the BMSE is smaller To deterministic parameters

8 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters However, the BMSE is smaller To deterministic parameters

9 Chapter 10 Bayesian Estimation Section 10.8 Bayesian estimators for deterministic parameters However, the BMSE is smaller To deterministic parameters

10 Risk Functions To deterministic parameters p(θ) θ p(x θ) x Estimator θ Error: ε = θ - θ

11 Risk Functions To arameters p(θ) θ p(x θ) x Estimator θ Error: ε = θ - θ The MMSE estimator minimizes Bayes Risk where the cost function is

12 Risk Functions To arameters p(θ) θ p(x θ) x Estimator θ Error: ε = θ - θ The MMSE estimator minimizes Bayes Risk where the cost function is

13 Risk Functions To arameters An estimator that minimizez Bayes risk, for some cost, is termed a Bayes estimator p(θ) θ p(x θ) x Estimator θ Error: ε = θ - θ The MMSE estimator minimizes Bayes Risk where the cost function is

14 Let us now optimize for different To arameters For a quadratic cost, we already know that

15 Let us now optimize for different Bayes risk equals

16 Let us now optimize for different Bayes risk equals Minimize this to minimize Bayes risk

17 Let us now optimize for different Bayes risk equals

18 Let us now optimize for different Bayes risk equals

19 Let us now optimize for different Bayes risk equals We need depends on θ, but the limits of the integral Not standard differential

20 Interlude: Leibnitz s rule (very useful)

21 Leibnitz s rule (very useful) We have:

22 Leibnitz s rule (very useful)

23 Leibnitz s rule (very useful) u = θ φ 2 (u)= θ

24 Leibnitz s rule (very useful)

25 Leibnitz s rule (very useful)

26 Leibnitz s rule (very useful)

27 Leibnitz s rule (very useful) Lower limit does not depend on u: u = θ

28 Leibnitz s rule (very useful)

29 Leibnitz s rule (very useful)

30 Leibnitz s rule (very useful)

31 Let us now optimize for different Bayes risk equals We need depends on θ, but the limits of the integral Not standard differential

32 Let us now optimize for different Bayes risk equals

33 Let us now optimize for different Bayes risk equals θ is the median of the posterior

34 Let us now optimize for different Bayes risk equals θ = median

35 Let us now optimize for different Bayes risk equals θ = median

36 Let us now optimize for different Bayes risk equals θ = median θ = arg max

37 Let us now optimize for different Bayes risk equals θ = median θ = arg max Let δ->0: θ = arg max (maximum a posterori (MAP)) θ = arg max

38 Gausian posterior What is relation between mean, median and max? θ = median θ = arg max

39 Gausian posterior What is relation between mean, median and max? θ = median Gaussian posterior makes the three risk functions identical θ = arg max

40 Extension to vector parameter Suppose we have a vector parameter of unknowns θ Consider estimation of θ 1. It still holds that the MAP estimator uses

41 Extension to vector parameter Suppose we have a vector parameter of unknowns θ Consider estimation of θ 1. It still holds that the MAP estimator uses The parameters θ 2. θ N are nuisance parameters, but we can integrate them away

42 Extension to vector parameter Suppose we have a vector parameter of unknowns θ Consider estimation of θ 1. It still holds that the MAP estimator uses The parameters θ 2. θ N are nuisance parameters, but we can integrate them away The estimator is

43 Extension to vector parameter Suppose we have a vector parameter of unknowns θ Consider estimation of θ 1. It still holds that the MAP estimator uses The parameters θ 2. θ N are nuisance parameters, but we can integrate them away The estimator is

44 Extension to vector parameter Suppose we have a vector parameter of unknowns θ Consider estimation of θ 1. It still holds that the MAP estimator uses The parameters θ 2. θ N are nuisance parameters, but we can integrate them away The estimator is

45 Extension to vector parameter In vector form

46 Extension to vector parameter Observations Classical approach (non-bayesian): We must estimate all unknown paramters jointly, except if..what holds????

47 Extension to vector parameter Observations Classical approach (non-bayesian): We must estimate all unknown paramters jointly, except if Fisher information is diagonal Vector MMSE estimator minimizes the MSE for each component of the unknown vector parameter θ, i.e.,

48 Performance of MMSE estimator

49 Performance of MMSE estimator Function of x

50 Performance of MMSE estimator Bayes rule MMSE estimator

51 Performance of MMSE estimator By definition

52 Performance of MMSE estimator

53 Performance of MMSE estimator Element [1,1] of

54 Additive property Independent observations x 1,x 2 Estimate θ Assume that x 1,x 2, θ are jointly Gaussian Theorem 10.2

55 Additive property Independent observations x 1,x 2 Estimate θ Assume that x 1,x 2, θ are jointly Gaussian Typo in book, should include means as well Independent observations

56 Additive property Independent observations x 1,x 2 Estimate θ Assume that x 1,x 2, θ are jointly Gaussian MMSE estimate can be updated sequentially!!!

57 MAP estimator n

58 MAP estimator Benefits compared with MMSE Not needed (typically hard to find) Optimization generally easier than finding the conditional expectation n

59 MAP vs ML estimator Alexander Aljechin ( ) became world chess champion 1927 (by defeating Capablanca) Aljechin defended his title twice, and regained it once

60 MAP vs ML estimator Alexander Aljechin ( ) became world chess champion 1927 (by defeating Capablanca) Aljechin defended his title twice, and regained it once Magnus Calrsen became world champion 2013, and defended the title Once in 2014

61 MAP vs ML estimator Alexander Aljechin ( ) became world chess champion 1927 (by defeating Capablanca) Aljechin defended his title twice, and regained it once Magnus Calrsen became world champion 2013, and defended the title Once in 2014 Now consider a title game in Observe Y=y1, where y1=win Two hypotheses: H1: Aljechin defends title H2: Carlsen defends title

62 MAP vs ML estimator Alexander Aljechin ( ) became world chess champion 1927 (by defeating Capablanca) Aljechin defended his title twice, and regained it once Magnus Calrsen became world champion 2013, and defended the title Once in 2014 Now consider a title game in Observe Y=y1, where y1=win Two hypotheses: H1: Aljechin defends title H2: Carlsen defends title Given the above statistics f(y1 H1)>f(y1 H2)

63 MAP vs ML estimator Alexander Aljechin ( ) became world chess champion 1927 (by defeating Capablanca) Aljechin defended his title twice, and regained it once Magnus Calrsen became world champion 2013, and defended the title Once in 2014 Now consider a title game in Observe Y=y1, where y1=win Two hypotheses: H1: Aljechin defends title H2: Carlsen defends title Given the above statistics f(y1 H1)>f(y1 H2) ML rule: Aljechin takes title (although he died in 1946)

64 MAP vs ML estimator Alexander Aljechin ( ) became world chess champion 1927 (by defeating Capablanca) Aljechin defended his title twice, and regained it once Magnus Calrsen became world champion 2013, and defended the title Once in 2014 Now consider a title game in Observe Y=y1, where y1=win Two hypotheses: H1: Aljechin defends title H2: Carlsen defends title Given the above statistics f(y1 H1)>f(y1 H2) MAP rule: f(h1)=0, -> Carlsen defends title

65 Example DC-level in white noise, uniform prior U[-A 0,A 0 ] The posterior is We got stuck here: Cannot put the denominator in closed form Cannot integrate the nominator Lets try with the MAP estimator

66 Example DC-level in white noise, uniform prior U[-A 0,A 0 ] The posterior is Denominator: Does not depend on A -> irrelevant

67 Example DC-level in white noise, uniform prior U[-A 0,A 0 ] The posterior is Denominator: Does not depend on A -> irrelevant We need to maximize the nominator

68 Example DC-level in white noise, uniform prior U[-A 0,A 0 ]

69 Example DC-level in white noise, uniform prior U[-A 0,A 0 ]

70 Example DC-level in white noise, uniform prior U[-A 0,A 0 ] MAP estimator can be found! Lesson learned (generally true) MAP is easier to find than MMSE

71 Element-wise MAP for vector-valued parameter No-integration-needed benefit gone

72 Element-wise MAP for vector-valued parameter No-integration-needed benefit gone The estimator Minimizes the hit-or-miss risk for each I, where δ->0

73 Element-wise MAP for vector-valued parameter No-integration-needed benefit gone Let us now define another risk function Easy to prove that as δ->0, Bayes risk is minimized by the vector-map-estimator

74 Element-wise MAP and vector valued MAP are not the same Vector-valued MAP solution Element-wise MAP solution

75 Two properties of vector-map For jointly Gaussian x and θ, the conditional mean E(θ x) coincides with the peak of p(θ x). Hence, the vector-map and the MMSE coincide.

76 Two properties of vector-map For jointly Gaussian x and θ, the conditional mean E(θ x) coincides with the peak of p(θ x). Hence, the vector-map and the MMSE coincide. Invariance does not hold for MAP (as opposed to MLE)

77 Invariance Why does invariance hold for MLE? With α=g(θ), it holds that p(x α) = p θ (x g -1 (α))

78 Invariance Why does invariance hold for MLE? With α=g(θ), it holds that p(x α) = p θ (x g -1 (α)) However, MAP involves the prior, and it doesn t hold that p α (α)=p θ (g -1 (α)), since the two distributions are related through the Jacobian

79 Example Exponential Inverse gamma

80 Example Exponential Inverse gamma MAP

81 Example Exponential Inverse gamma MAP

82 Example Exponential Inverse gamma MAP

83 Example Consider estimation of? (holds for MLE)

84 Example Consider estimation of? (holds for MLE)

85 Example Consider estimation of? (holds for MLE)

86 Example Consider estimation of? (holds for MLE)

87 Example Consider estimation of? (holds for MLE)

88 Example Consider estimation of.

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Lecture 13 and 14: Bayesian estimation theory

Lecture 13 and 14: Bayesian estimation theory 1 Lecture 13 and 14: Bayesian estimation theory Spring 2012 - EE 194 Networked estimation and control (Prof. Khan) March 26 2012 I. BAYESIAN ESTIMATORS Mother Nature conducts a random experiment that generates

More information

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability

More information

Statistical learning. Chapter 20, Sections 1 3 1

Statistical learning. Chapter 20, Sections 1 3 1 Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

Detection Theory. Composite tests

Detection Theory. Composite tests Composite tests Chapter 5: Correction Thu I claimed that the above, which is the most general case, was captured by the below Thu Chapter 5: Correction Thu I claimed that the above, which is the most general

More information

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

Estimation Theory Fredrik Rusek. Chapters

Estimation Theory Fredrik Rusek. Chapters Estimation Theory Fredrik Rusek Chapters 3.5-3.10 Recap We deal with unbiased estimators of deterministic parameters Performance of an estimator is measured by the variance of the estimate (due to the

More information

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we

More information

Estimation Tasks. Short Course on Image Quality. Matthew A. Kupinski. Introduction

Estimation Tasks. Short Course on Image Quality. Matthew A. Kupinski. Introduction Estimation Tasks Short Course on Image Quality Matthew A. Kupinski Introduction Section 13.3 in B&M Keep in mind the similarities between estimation and classification Image-quality is a statistical concept

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter

More information

Statistical learning. Chapter 20, Sections 1 3 1

Statistical learning. Chapter 20, Sections 1 3 1 Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Detection theory 101 ELEC-E5410 Signal Processing for Communications

Detection theory 101 ELEC-E5410 Signal Processing for Communications Detection theory 101 ELEC-E5410 Signal Processing for Communications Binary hypothesis testing Null hypothesis H 0 : e.g. noise only Alternative hypothesis H 1 : signal + noise p(x;h 0 ) γ p(x;h 1 ) Trade-off

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

Estimation Theory Fredrik Rusek. Chapters 6-7

Estimation Theory Fredrik Rusek. Chapters 6-7 Estimation Theory Fredrik Rusek Chapters 6-7 All estimation problems Summary All estimation problems Summary Efficient estimator exists All estimation problems Summary MVU estimator exists Efficient estimator

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013 Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Module 2. Random Processes. Version 2, ECE IIT, Kharagpur

Module 2. Random Processes. Version 2, ECE IIT, Kharagpur Module Random Processes Version, ECE IIT, Kharagpur Lesson 9 Introduction to Statistical Signal Processing Version, ECE IIT, Kharagpur After reading this lesson, you will learn about Hypotheses testing

More information

Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning. Chapter 20, Sections 1 4 1 Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian

More information

Lecture 2: Statistical Decision Theory (Part I)

Lecture 2: Statistical Decision Theory (Part I) Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

Bayesian Gaussian / Linear Models. Read Sections and 3.3 in the text by Bishop

Bayesian Gaussian / Linear Models. Read Sections and 3.3 in the text by Bishop Bayesian Gaussian / Linear Models Read Sections 2.3.3 and 3.3 in the text by Bishop Multivariate Gaussian Model with Multivariate Gaussian Prior Suppose we model the observed vector b as having a multivariate

More information

EIE6207: Maximum-Likelihood and Bayesian Estimation

EIE6207: Maximum-Likelihood and Bayesian Estimation EIE6207: Maximum-Likelihood and Bayesian Estimation Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

ECE531 Lecture 8: Non-Random Parameter Estimation

ECE531 Lecture 8: Non-Random Parameter Estimation ECE531 Lecture 8: Non-Random Parameter Estimation D. Richard Brown III Worcester Polytechnic Institute 19-March-2009 Worcester Polytechnic Institute D. Richard Brown III 19-March-2009 1 / 25 Introduction

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

Parameter Estimation

Parameter Estimation 1 / 44 Parameter Estimation Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay October 25, 2012 Motivation System Model used to Derive

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables? Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

CS540 Machine learning L9 Bayesian statistics

CS540 Machine learning L9 Bayesian statistics CS540 Machine learning L9 Bayesian statistics 1 Last time Naïve Bayes Beta-Bernoulli 2 Outline Bayesian concept learning Beta-Bernoulli model (review) Dirichlet-multinomial model Credible intervals 3 Bayesian

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian

More information

Chapter 8: Least squares (beginning of chapter)

Chapter 8: Least squares (beginning of chapter) Chapter 8: Least squares (beginning of chapter) Least Squares So far, we have been trying to determine an estimator which was unbiased and had minimum variance. Next we ll consider a class of estimators

More information

Bayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Bayesian statistics. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Bayesian statistics DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall15 Carlos Fernandez-Granda Frequentist vs Bayesian statistics In frequentist statistics

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Statistical Learning. Philipp Koehn. 10 November 2015

Statistical Learning. Philipp Koehn. 10 November 2015 Statistical Learning Philipp Koehn 10 November 2015 Outline 1 Learning agents Inductive learning Decision tree learning Measuring learning performance Bayesian learning Maximum a posteriori and maximum

More information

EIE6207: Estimation Theory

EIE6207: Estimation Theory EIE6207: Estimation Theory Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: Steven M.

More information

y Xw 2 2 y Xw λ w 2 2

y Xw 2 2 y Xw λ w 2 2 CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:

More information

Bayesian Linear Regression. Sargur Srihari

Bayesian Linear Regression. Sargur Srihari Bayesian Linear Regression Sargur srihari@cedar.buffalo.edu Topics in Bayesian Regression Recall Max Likelihood Linear Regression Parameter Distribution Predictive Distribution Equivalent Kernel 2 Linear

More information

Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Bayesian Learning. Machine Learning Fall 2018 Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability

More information

Ch. 12 Linear Bayesian Estimators

Ch. 12 Linear Bayesian Estimators Ch. 1 Linear Bayesian Estimators 1 In chapter 11 we saw: the MMSE estimator takes a simple form when and are jointly Gaussian it is linear and used only the 1 st and nd order moments (means and covariances).

More information

Learning the hyper-parameters. Luca Martino

Learning the hyper-parameters. Luca Martino Learning the hyper-parameters Luca Martino 2017 2017 1 / 28 Parameters and hyper-parameters 1. All the described methods depend on some choice of hyper-parameters... 2. For instance, do you recall λ (bandwidth

More information

Lecture 2: Conjugate priors

Lecture 2: Conjugate priors (Spring ʼ) Lecture : Conjugate priors Julia Hockenmaier juliahmr@illinois.edu Siebel Center http://www.cs.uiuc.edu/class/sp/cs98jhm The binomial distribution If p is the probability of heads, the probability

More information

Parameter estimation Conditional risk

Parameter estimation Conditional risk Parameter estimation Conditional risk Formalizing the problem Specify random variables we care about e.g., Commute Time e.g., Heights of buildings in a city We might then pick a particular distribution

More information

Advanced Signal Processing Introduction to Estimation Theory

Advanced Signal Processing Introduction to Estimation Theory Advanced Signal Processing Introduction to Estimation Theory Danilo Mandic, room 813, ext: 46271 Department of Electrical and Electronic Engineering Imperial College London, UK d.mandic@imperial.ac.uk,

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

Remarks on Improper Ignorance Priors

Remarks on Improper Ignorance Priors As a limit of proper priors Remarks on Improper Ignorance Priors Two caveats relating to computations with improper priors, based on their relationship with finitely-additive, but not countably-additive

More information

Basic concepts in estimation

Basic concepts in estimation Basic concepts in estimation Random and nonrandom parameters Definitions of estimates ML Maimum Lielihood MAP Maimum A Posteriori LS Least Squares MMS Minimum Mean square rror Measures of quality of estimates

More information

Computational Biology Lecture #3: Probability and Statistics. Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Sept

Computational Biology Lecture #3: Probability and Statistics. Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Sept Computational Biology Lecture #3: Probability and Statistics Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Sept 26 2005 L2-1 Basic Probabilities L2-2 1 Random Variables L2-3 Examples

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample

More information

Parameter Estimation. Industrial AI Lab.

Parameter Estimation. Industrial AI Lab. Parameter Estimation Industrial AI Lab. Generative Model X Y w y = ω T x + ε ε~n(0, σ 2 ) σ 2 2 Maximum Likelihood Estimation (MLE) Estimate parameters θ ω, σ 2 given a generative model Given observed

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Introduction into Bayesian statistics

Introduction into Bayesian statistics Introduction into Bayesian statistics Maxim Kochurov EF MSU November 15, 2016 Maxim Kochurov Introduction into Bayesian statistics EF MSU 1 / 7 Content 1 Framework Notations 2 Difference Bayesians vs Frequentists

More information

A Few Notes on Fisher Information (WIP)

A Few Notes on Fisher Information (WIP) A Few Notes on Fisher Information (WIP) David Meyer dmm@{-4-5.net,uoregon.edu} Last update: April 30, 208 Definitions There are so many interesting things about Fisher Information and its theoretical properties

More information

E. Santovetti lesson 4 Maximum likelihood Interval estimation

E. Santovetti lesson 4 Maximum likelihood Interval estimation E. Santovetti lesson 4 Maximum likelihood Interval estimation 1 Extended Maximum Likelihood Sometimes the number of total events measurements of the experiment n is not fixed, but, for example, is a Poisson

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018 Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian

More information

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a

More information

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003 Statistical Approaches to Learning and Discovery Week 4: Decision Theory and Risk Minimization February 3, 2003 Recall From Last Time Bayesian expected loss is ρ(π, a) = E π [L(θ, a)] = L(θ, a) df π (θ)

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Predictive Hypothesis Identification

Predictive Hypothesis Identification Marcus Hutter - 1 - Predictive Hypothesis Identification Predictive Hypothesis Identification Marcus Hutter Canberra, ACT, 0200, Australia http://www.hutter1.net/ ANU RSISE NICTA Marcus Hutter - 2 - Predictive

More information

Support Vector Machines and Bayes Regression

Support Vector Machines and Bayes Regression Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

g-priors for Linear Regression

g-priors for Linear Regression Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,

More information

Rowan University Department of Electrical and Computer Engineering

Rowan University Department of Electrical and Computer Engineering Rowan University Department of Electrical and Computer Engineering Estimation and Detection Theory Fall 2013 to Practice Exam II This is a closed book exam. There are 8 problems in the exam. The problems

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Detection and Estimation Chapter 1. Hypothesis Testing

Detection and Estimation Chapter 1. Hypothesis Testing Detection and Estimation Chapter 1. Hypothesis Testing Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2015 1/20 Syllabus Homework:

More information

Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction

Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction 15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive

More information

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures

More information

Rejection sampling - Acceptance probability. Review: How to sample from a multivariate normal in R. Review: Rejection sampling. Weighted resampling

Rejection sampling - Acceptance probability. Review: How to sample from a multivariate normal in R. Review: Rejection sampling. Weighted resampling Rejection sampling - Acceptance probability Review: How to sample from a multivariate normal in R Goal: Simulate from N d (µ,σ)? Note: For c to be small, g() must be similar to f(). The art of rejection

More information