Lecture 11 and 12: Basic estimation theory

Spring 2012 - EE 194: Networked estimation and control, Prof. Khan, March 2012

I. MAXIMUM-LIKELIHOOD ESTIMATORS

"The maximum likelihood principle is deceptively simple." (Louis L. Scharf)

Let $x$ denote a random variable whose pdf $p_\theta(x)$ is parameterized by the unknown parameter $\theta$. For example, consider Fig. 1, which shows two densities, one with $\theta_1$ and the other with $\theta_2$.

[Fig. 1. Typical density functions.]

Suppose that $x_1$ is observed. Based on the prior model $p_\theta(x)$ (Fig. 1), we can say that $x_1$ is more probably observed when $\theta = \theta_2$ than when $\theta = \theta_1$. More generally, there may be a unique value of $\theta$ for which $x_1$ is more probably observed than for any other. We call this value of $\theta$ the maximum likelihood (ML) estimate and denote it by $\widehat{\theta}$:

$$\widehat{\theta} = \arg\max_\theta \, p_\theta(x). \tag{1}$$

The function $l(\theta, x) = p_\theta(x)$ is called the likelihood function, and its logarithm

$$L(\theta, x) = \ln p_\theta(x) \tag{2}$$

is called the log-likelihood function. When $L(\theta, x)$ is continuously differentiable in $\theta$, the ML estimate may be determined by differentiating the log-likelihood function. The ML estimate is then a root of the ML equation:

$$\frac{\partial}{\partial \theta} L(\theta, x) = \frac{\partial}{\partial \theta} \ln p_\theta(x) = 0. \tag{3}$$
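As a quick numerical companion to (1)-(3), the following Python sketch maximizes a log-likelihood directly. The single-Gaussian-observation model, the observed value, and the use of scipy are assumptions made for this illustration, not part of the lecture.

# Minimal numerical illustration of the ML equation (3): maximize the
# log-likelihood ln p_theta(x) over theta. The model below (a single
# Gaussian observation with known variance) is an assumption made for
# this sketch, not something specified in the notes.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

sigma = 1.0
x_obs = 2.3                      # one observed sample

def neg_log_likelihood(theta):
    # L(theta, x) = ln p_theta(x); we minimize its negative.
    return -norm.logpdf(x_obs, loc=theta, scale=sigma)

res = minimize_scalar(neg_log_likelihood)
print("ML estimate:", res.x)     # ~= x_obs, as the analysis predicts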

We will assume that there is only one value of $\theta$ for which the derivative is 0.

Example 1: What is the ML estimate of $\theta$ when we observe $y = \theta + r$, where $r \sim \mathcal{N}(\mu, \sigma^2)$?

First we compute $L(\theta, y) = \ln p_\theta(y)$. We note that when $y = \theta + r$, $y$ is a normal random variable $\mathcal{N}(\theta + \mu, \sigma^2)$. We have

$$\ln p_\theta(y) = \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{\big(y - (\mu + \theta)\big)^2}{2\sigma^2},$$

which implies that

$$\frac{\partial}{\partial\theta} \ln p_\theta(y) = \frac{y - (\mu + \theta)}{\sigma^2} = 0 \quad\Longrightarrow\quad \widehat{\theta} = y - \mu.$$

Example 2: What is the ML estimate of $\theta$ when we observe $y = \theta\mathbf{1} + r$, where $r \sim \mathcal{N}(\mu\mathbf{1}, \sigma^2 I)$? In other words, the noise vector $r$ consists of $n$ i.i.d. (independent, identically distributed) random variables, each distributed as $\mathcal{N}(\mu, \sigma^2)$.

From our observation model, we note that $y \sim \mathcal{N}\big((\theta + \mu)\mathbf{1}, \sigma^2 I\big)$, and $y$ is also a collection of $n$ independent rvs. Hence the joint density of $y$ is given by the product of the marginals. We have

$$\ln p_\theta(y) = \ln \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\big(y_i - (\mu + \theta)\big)^2}{2\sigma^2}\right) = \sum_{i=1}^{n}\left[\ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{\big(y_i - (\mu + \theta)\big)^2}{2\sigma^2}\right].$$

This leads to

$$\frac{\partial}{\partial\theta} \ln p_\theta(y) = \frac{1}{\sigma^2}\sum_{i=1}^{n} \big(y_i - (\mu + \theta)\big) = 0.$$

Finally, we obtain

$$\widehat{\theta} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \mu).$$
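The closed-form estimate in Example 2 can be sanity-checked numerically. In this sketch, the values of $\mu$, $\sigma$, $n$ and the random seed are arbitrary choices; the brute-force maximization should land on the same $\widehat{\theta}$ as the formula.

# Numerical check of Example 2, under assumed values for mu, sigma, and n
# (these numbers are illustrative, not from the lecture). The closed-form
# ML estimate (1/n) * sum(y_i - mu) should agree with a brute-force
# maximization of the log-likelihood over theta.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true, mu, sigma, n = 1.5, 0.4, 2.0, 50
y = theta_true + rng.normal(mu, sigma, size=n)    # y = theta*1 + r

closed_form = np.mean(y - mu)

def neg_log_lik(theta):
    return -np.sum(norm.logpdf(y, loc=mu + theta, scale=sigma))

numeric = minimize_scalar(neg_log_lik).x
print(closed_form, numeric)     # the two agree up to solver tolerance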

A very important point: so far $\theta$ is deterministic but unknown, whereas $\widehat{\theta}$ is a random variable.

Example 3: What is the mean of $\widehat{\theta}$?

$$\mathbb{E}[\widehat{\theta}] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \mu)\right] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[y_i - \mu] = \frac{1}{n}\sum_{i=1}^{n}\big((\mu + \theta) - \mu\big) = \theta.$$

An estimator with the property that $\mathbb{E}[\widehat{\theta}] = \theta$ is said to be an unbiased estimate.

Example 4: What is the variance of $\widehat{\theta}$? For now assume $\mu = 0$.

$$\mathbb{E}\big[(\widehat{\theta} - \theta)^2\big] = \mathbb{E}\left[\left(\frac{1}{n}\sum_{i=1}^{n}(y_i - \theta)\right)^2\right] = \frac{1}{n^2}\sum_{i=1}^{n}\mathbb{E}\big[(y_i - \theta)^2\big].$$

This is because all the cross-terms yield $\mathbb{E}\big[(y_i - \theta)(y_j - \theta)\big]$, $i \neq j$, which is 0 since the $y_i$ are i.i.d. It is then straightforward to show that

$$\operatorname{var}(\widehat{\theta}) = \mathbb{E}\big[(\widehat{\theta} - \theta)^2\big] = \frac{\sigma^2}{n}.$$

When $n = 1$ we are reduced to the first example. The estimate in the first example is also unbiased and has variance $\sigma^2$. As we add more observations ($n > 1$), the variance (uncertainty) in the estimate scales down by $1/n$, and goes to 0 as $n \to \infty$. In this case the ML estimate has the property that its variance goes to 0 as $n \to \infty$.

Example 5: What is the distribution (density) of $\widehat{\theta}$? This can be easily generalized.
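A small Monte Carlo experiment (with illustrative, assumed values of $\theta$, $\sigma$, and $n$) makes the two properties above concrete: the average of $\widehat{\theta}$ across trials approaches $\theta$, and its variance approaches $\sigma^2/n$.

# Monte Carlo check of Examples 3 and 4: the sample-mean estimator is
# unbiased and its variance is sigma^2 / n. The particular values of
# theta, sigma, and n are assumptions made for this illustration.
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, trials = 2.0, 3.0, 25, 200_000

y = theta + rng.normal(0.0, sigma, size=(trials, n))   # mu = 0, as in Example 4
theta_hat = y.mean(axis=1)                             # one estimate per trial

print("mean of estimates :", theta_hat.mean())         # ~ theta  (unbiased)
print("var of estimates  :", theta_hat.var())          # ~ sigma**2 / n
print("sigma^2 / n       :", sigma**2 / n)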

Example 6: What is the ML estimate of $\theta \in \mathbb{R}^n$ when we observe $y = A\theta + r$, where $r \in \mathbb{R}^m$, $r \sim \mathcal{N}(\mu, \Sigma)$? Clearly $A \in \mathbb{R}^{m \times n}$. The noise density is

$$p(r) = \frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2}(r - \mu)^T \Sigma^{-1} (r - \mu)\right). \tag{4}$$

From the observation model, $y - A\theta$ is distributed as $\mathcal{N}(\mu, \Sigma)$. We have the log-likelihood function

$$\ln p_\theta(y) = \ln\frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}} - \tfrac{1}{2}(y - A\theta - \mu)^T \Sigma^{-1} (y - A\theta - \mu),$$

which leads to

$$\frac{\partial}{\partial\theta} \ln p_\theta(y) = (y - A\theta - \mu)^T \Sigma^{-1} A = 0.$$

Hence

$$0 = A^T \Sigma^{-1} (y - A\theta - \mu), \qquad A^T \Sigma^{-1} y - A^T \Sigma^{-1} \mu = A^T \Sigma^{-1} A\,\theta,$$

$$\widehat{\theta} = \big(A^T \Sigma^{-1} A\big)^{-1}\big(A^T \Sigma^{-1} y - A^T \Sigma^{-1} \mu\big).$$

Recall the matrix-calculus identity

$$\frac{\partial}{\partial x}\,(Ax + b)^T C (Dx + e) = (Dx + e)^T C^T A + (Ax + b)^T C D.$$

Example 7: Is the above estimate unbiased?

Example 8: What is the variance of the above estimate?

Example 9: What is the distribution of the above estimate?
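The following sketch evaluates the closed-form $\widehat{\theta}$ for one simulated realization of the model in Example 6; the dimensions, the diagonal $\Sigma$, and all numeric values are assumptions made purely for illustration.

# Numerical illustration of Example 6: the ML estimate for y = A*theta + r,
# r ~ N(mu, Sigma), is (A^T Sigma^-1 A)^-1 (A^T Sigma^-1 y - A^T Sigma^-1 mu).
# Dimensions and numbers are assumptions for this sketch only.
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 2
A = rng.normal(size=(m, n))
theta_true = np.array([1.0, -0.5])
mu = np.full(m, 0.3)
Sigma = np.diag(rng.uniform(0.5, 2.0, size=m))       # simple diagonal covariance

r = rng.multivariate_normal(mu, Sigma)
y = A @ theta_true + r

Sinv = np.linalg.inv(Sigma)
theta_hat = np.linalg.solve(A.T @ Sinv @ A, A.T @ Sinv @ (y - mu))
print(theta_hat)      # close to theta_true for one realization; unbiased on average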

Lecture 12: Wednesday

Example 10: Consider $r_i$, $i = 1, \ldots, n$, to be i.i.d. $\mathcal{N}(\mu, \sigma^2)$. What is the ML estimate of $\mu$ and $\sigma^2$?

It is straightforward to show that

$$\widehat{\mu} = \frac{1}{n}\sum_{i=1}^{n} r_i,$$

i.e., the ML estimate $\widehat{\mu}$ is the sample mean. Note that this estimate is independent of the variance. For the variance, write $x = \sigma^2$ and note that the derivative of the log-likelihood function is

$$\frac{\partial}{\partial x} \ln p_x(r_1, \ldots, r_n) = \frac{\partial}{\partial x} \sum_{i=1}^{n} \ln p_x(r_i)
= \frac{\partial}{\partial x} \sum_{i=1}^{n} \left[\ln\frac{1}{\sqrt{2\pi}} - \frac{1}{2}\ln x - \frac{(r_i - \mu)^2}{2x}\right]
= \sum_{i=1}^{n} \left[-\frac{1}{2x} + \frac{(r_i - \mu)^2}{2x^2}\right] = 0.$$

Solving for $x$ yields

$$-\frac{n}{2x} + \frac{1}{2x^2}\sum_{i=1}^{n}(r_i - \mu)^2 = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n}(r_i - \mu)^2 = n x
\quad\Longrightarrow\quad
\widehat{x} = \widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^{n}(r_i - \mu)^2.$$

When the mean is unknown and is itself estimated by $\widehat{\mu}$, the variance estimate is

$$\widehat{x} = \widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^{n}(r_i - \widehat{\mu})^2.$$
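As a check on Example 10 (again with assumed, illustrative numbers), the sketch below compares the closed-form sample mean and $1/n$-normalized variance with a joint numerical maximization of the log-likelihood.

# Sketch for Example 10 (values assumed for illustration): the ML estimates
# of (mu, sigma^2) for i.i.d. Gaussian data are the sample mean and the
# 1/n-normalized sample variance. We check against a joint numerical
# maximization of the log-likelihood.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(3)
r = rng.normal(1.0, 2.0, size=100)

mu_hat = r.mean()
var_hat = np.mean((r - mu_hat) ** 2)        # 1/n normalization (ML)

def neg_log_lik(params):
    mu, log_var = params                    # optimize log-variance so it stays positive
    return -np.sum(norm.logpdf(r, loc=mu, scale=np.exp(0.5 * log_var)))

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
print(mu_hat, var_hat)
print(res.x[0], np.exp(res.x[1]))           # numerically the same point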

Example 11: Is the above estimate unbiased? Let's check. Writing $r_i - \widehat{\mu} = (r_i - \mu) - (\widehat{\mu} - \mu)$,

$$\mathbb{E}[\widehat{\sigma^2}] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}(r_i - \widehat{\mu})^2\right]
= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big[(r_i - \mu)^2\big] - \frac{2}{n}\sum_{i=1}^{n}\mathbb{E}\big[(\widehat{\mu} - \mu)(r_i - \mu)\big] + \mathbb{E}\big[(\widehat{\mu} - \mu)^2\big].$$

Since $\frac{1}{n}\sum_{i=1}^{n}(r_i - \mu) = \widehat{\mu} - \mu$ and $\mathbb{E}\big[(\widehat{\mu} - \mu)^2\big] = \sigma^2/n$, this gives

$$\mathbb{E}[\widehat{\sigma^2}] = \sigma^2 - 2\,\mathbb{E}\big[(\widehat{\mu} - \mu)^2\big] + \frac{\sigma^2}{n}
= \sigma^2 - \frac{2\sigma^2}{n} + \frac{\sigma^2}{n}
= \sigma^2 - \frac{\sigma^2}{n}
= \frac{n-1}{n}\,\sigma^2.$$

So $\widehat{\sigma^2}$ comes out to be biased. We can resolve this problem by normalizing the estimate as

$$s^2 = \frac{n}{n-1}\,\widehat{\sigma^2} = \frac{1}{n-1}\sum_{i=1}^{n}(r_i - \widehat{\mu})^2,$$

which is an unbiased estimate of the variance.

A. MLE for random parameters

In the case of random parameters, the likelihood function is modified to $p(x\,|\,\theta)$, the conditional density of the observations $x$ given the unknown random parameter $\theta$. Everything else remains the same; the log-likelihood function then becomes $\ln p(x\,|\,\theta)$.
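A Monte Carlo run (assumed $\sigma$, $n$, and trial count) illustrates the bias factor $(n-1)/n$ and the effect of the $1/(n-1)$ normalization.

# Monte Carlo illustration of Example 11 (assumed sigma, n, and trial count):
# the 1/n variance estimator is biased by a factor (n-1)/n, while the
# 1/(n-1) (Bessel-corrected) estimator is unbiased.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 0.0, 2.0, 5, 500_000

r = rng.normal(mu, sigma, size=(trials, n))
var_ml = r.var(axis=1, ddof=0)          # (1/n)     * sum (r_i - mu_hat)^2
var_unbiased = r.var(axis=1, ddof=1)    # (1/(n-1)) * sum (r_i - mu_hat)^2

print("true sigma^2        :", sigma**2)
print("mean of ML estimate :", var_ml.mean())        # ~ (n-1)/n * sigma^2
print("mean of s^2         :", var_unbiased.mean())  # ~ sigma^2
print("(n-1)/n * sigma^2   :", (n - 1) / n * sigma**2)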

II. MAXIMUM A POSTERIORI PROBABILITY (MAP) ESTIMATE

When a prior on the unknown random parameter $\theta$ is known, the principle of ML can be extended to maximizing the joint density function $\ln p(x, \theta)$. We write the log-likelihood as

$$\ln p(x, \theta) = \ln p(x\,|\,\theta)\,p(\theta) = \ln p(x\,|\,\theta) + \ln p(\theta),$$

and the estimate of $\theta$ is determined by maximizing the above. The above can also be viewed as maximizing

$$\ln p(x, \theta) = \ln p(\theta\,|\,x)\,p(x) = \ln p(\theta\,|\,x) + \ln p(x),$$

and the estimate is then the solution of

$$\frac{\partial}{\partial\theta}\big[\ln p(\theta\,|\,x) + \ln p(x)\big] = \frac{\partial}{\partial\theta}\ln p(\theta\,|\,x) = 0,$$

where $p(x)$ is the marginal density of $x$ after integrating $\theta$ out of the joint density. The conditional density $p(\theta\,|\,x)$ is called the posterior probability density of $\theta$ given $x$.

In summary, when $p(x\,|\,\theta)$ and $p(\theta)$ are known or can be computed, we use the first maximization; on the other hand, when $p(\theta\,|\,x)$ is known or can be computed, we opt for the second maximization. Notice that, in general, the conditional densities are known or easily computable, whereas the joint densities are much harder to compute.

Comparing MLE to MAP, the two estimates are the same when $\frac{\partial}{\partial\theta}\ln p(\theta) = 0$, or when $p(\theta)$ is independent of $\theta$ (a flat prior). MLE and MAP are point estimates, i.e., they estimate the parameter $\theta$ but do not provide, e.g., a confidence interval for $\theta$.
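The sketch below contrasts MLE and MAP for an assumed conjugate model, a Gaussian prior $\theta \sim \mathcal{N}(\theta_0, \tau^2)$ with i.i.d. Gaussian observations; the specific numbers are illustrative only. With a flat prior the MAP estimate collapses to the MLE, as noted above.

# Minimal MAP-vs-MLE sketch, assuming x_i ~ N(theta, sigma^2) i.i.d. and a
# Gaussian prior theta ~ N(theta0, tau^2); all numbers are illustrative.
# MAP maximizes ln p(x|theta) + ln p(theta).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
sigma, theta0, tau, n = 1.0, 0.0, 0.5, 10
x = rng.normal(2.0, sigma, size=n)

def neg_log_posterior(theta):
    return -(np.sum(norm.logpdf(x, theta, sigma)) + norm.logpdf(theta, theta0, tau))

theta_mle = x.mean()
theta_map = minimize_scalar(neg_log_posterior).x

# Closed form for this conjugate Gaussian case (posterior mean = posterior mode):
post_mean = (n * x.mean() / sigma**2 + theta0 / tau**2) / (n / sigma**2 + 1 / tau**2)
print(theta_mle, theta_map, post_mean)   # MAP is pulled from the MLE toward theta0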

III. BAYESIAN ESTIMATORS

Mother Nature conducts a random experiment that generates a parameter $\theta$ from a probability density function $p(\theta)$. This parameter $\theta$ then codes (or parameterizes) the conditional, or measurement, density $p(x\,|\,\theta)$. A random experiment generates a measurement $x$ from $p(x\,|\,\theta)$. The problem is to estimate $\theta$ from $x$. We denote the estimate by $\widehat{\theta}(x)$. The Bayesian setup consists of the following notions.

Loss function: The quality of the estimate $\widehat{\theta}(x)$ is measured by a real-valued loss function. Some examples are:

Quadratic loss function: $L(\theta, \widehat{\theta}(x)) = [\theta - \widehat{\theta}(x)]^T[\theta - \widehat{\theta}(x)]$.

Binary (0-1) loss function: $L(\theta, \widehat{\theta}(x)) = 0$ if $\widehat{\theta}(x) = \theta$, and $1$ otherwise.

Risk: The risk can be defined as the average loss over the density $p(x\,|\,\theta)$; it addresses the average loss (or risk) associated with the estimate $\widehat{\theta}(x)$. Mathematically,

$$R(\theta, \widehat{\theta}) = \mathbb{E}_x\big[L(\theta, \widehat{\theta}(x))\big] = \int L(\theta, \widehat{\theta}(x))\, p(x\,|\,\theta)\, dx.$$

The notation $\mathbb{E}_x$ indicates that the expectation is over the distribution of the random measurement $x$, with $\theta$ fixed.

Bayes risk: The Bayes risk is the risk averaged over the prior distribution on $\theta$:

$$R(\widehat{\theta}) = \mathbb{E}_\theta\big[R(\theta, \widehat{\theta})\big] = \int R(\theta, \widehat{\theta})\, p(\theta)\, d\theta = \iint L(\theta, \widehat{\theta}(x))\, \underbrace{p(x\,|\,\theta)\,p(\theta)}_{p(x,\theta)}\, dx\, d\theta.$$

Bayes risk estimator: The Bayes risk estimator minimizes the Bayes risk,

$$\widehat{\theta}_B = \arg\min_{\widehat{\theta}} R(\widehat{\theta}),$$

i.e., it is the value of $\widehat{\theta}$ that minimizes the Bayes risk.
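Under quadratic loss, the Bayes risk of two estimators can be compared by averaging the loss over joint draws of $(\theta, x)$, exactly as in the double integral above. The conjugate Gaussian model and the numbers in this sketch are assumptions for illustration.

# Monte Carlo sketch of the Bayes risk under quadratic loss, for an assumed
# conjugate model: theta ~ N(0, tau^2), x | theta ~ N(theta, sigma^2).
# (Model and numbers are illustrative.) The posterior-mean estimator
# shrink * x achieves a smaller Bayes risk than the MLE theta_hat(x) = x.
import numpy as np

rng = np.random.default_rng(6)
tau, sigma, trials = 1.0, 1.0, 1_000_000

theta = rng.normal(0.0, tau, size=trials)        # Nature draws theta ~ p(theta)
x = rng.normal(theta, sigma)                     # then a measurement x ~ p(x|theta)

shrink = tau**2 / (tau**2 + sigma**2)            # posterior-mean coefficient
risk_mle = np.mean((x - theta) ** 2)             # Bayes risk of theta_hat(x) = x
risk_bayes = np.mean((shrink * x - theta) ** 2)  # Bayes risk of the posterior mean

print(risk_mle, risk_bayes)                      # the second is smaller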