Estimation Theory
J. McNames, Portland State University, ECE 4/557

Overview
- Properties: bias, variance, and mean square error
- Cramér-Rao lower bound
- Maximum likelihood
- Consistency
- Confidence intervals
- Properties of the mean estimator

Introduction
- Up until now we have defined and discussed properties of random variables and processes. In each case we started with some known property (e.g., the autocorrelation) and derived other related properties (e.g., the PSD).
- In practical problems we rarely know these properties a priori. Instead, we must estimate what we wish to know from finite sets of measurements.

Terminology
- Suppose we have $N$ independent, identically distributed (i.i.d.) observations $\{x_i\}_{i=1}^N$.
- Ideally we would like to know the pdf of the data, $f(x;\theta)$, where $\theta \in \mathbb{R}^p$.
- In probability theory, we think about the likeliness of $\{x_i\}_{i=1}^N$ given the pdf and $\theta$. In inference, we are given $\{x_i\}_{i=1}^N$ and are interested in the likeliness of $\theta$; viewed this way, $f(x;\theta)$ is called the sampling distribution.
- We will use $\theta$ to denote the parameter (or vector of parameters) we wish to estimate. It could be, for example, the process mean $\mu_x$.

Estimators as Random Variables
- Our estimator is a function of the measurements: $\hat{\theta} = \hat{\theta}\left[\{x_i\}_{i=1}^N\right]$.
- It is therefore a random variable: it will be different for every different set of observations.
- It is called an estimate or, if $\theta$ is a scalar, a point estimate.
- Of course we want $\hat{\theta}$ to be as close to the true $\theta$ as possible.
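To make the "estimator as a random variable" idea concrete, here is a minimal simulation sketch (the Gaussian model, sample sizes, and variable names are my own choices, not from the slides): drawing many independent data sets shows the sample mean fluctuating from one realization to the next, with a spread that matches $\sigma/\sqrt{N}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_datasets = 100, 10_000          # illustrative sizes (assumptions)
mu_true, sigma_true = 2.0, 3.0       # assumed Gaussian model

# Each row is one data set of N i.i.d. Gaussian observations.
x = rng.normal(mu_true, sigma_true, size=(n_datasets, N))

# One estimate per data set: the estimator itself is a random variable.
mu_hat = x.mean(axis=1)

print(f"mean of estimates:    {mu_hat.mean():.4f}  (true mean {mu_true})")
print(f"std dev of estimates: {mu_hat.std():.4f}  "
      f"(theory sigma/sqrt(N) = {sigma_true / np.sqrt(N):.4f})")
```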

Natural Estimators

$$\hat{\mu}_x = \hat{\theta}\left[\{x_i\}_{i=1}^N\right] = \frac{1}{N}\sum_{n=1}^{N} x_n$$

- This is the obvious or natural estimator of the process mean, sometimes called the average or sample mean.
- It will also turn out to be the best estimator (I will define "best" shortly).

$$\hat{\sigma}_x^2 = \hat{\theta}\left[\{x_i\}_{i=1}^N\right] = \frac{1}{N}\sum_{n=1}^{N} \left(x_n - \hat{\mu}_x\right)^2$$

- This is the obvious or natural estimator of the process variance. It is not the best.

Good Estimators
- Without loss of generality, let us consider a scalar parameter $\theta$ for the time being.
- What is a good estimator?
  - The distribution of $\hat{\theta}$ should be centered at the true value.
  - We want the distribution to be as narrow as possible.
- Lower-order moments enable coarse measures of how good $\hat{\theta}$ is.

Bias
The bias of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as
$$B(\hat{\theta}) \triangleq E[\hat{\theta}] - \theta$$
- Unbiased: an estimator is said to be unbiased if $B(\hat{\theta}) = 0$. This implies the pdf of the estimator is centered at the true value $\theta$.
- The sample mean is unbiased; the estimator of the variance on the earlier slide is biased.
- Unbiased estimators are generally good, but they are not always best (more later).

Variance
The variance of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as
$$\operatorname{var}(\hat{\theta}) = \sigma_{\hat{\theta}}^2 \triangleq E\left[\left(\hat{\theta} - E[\hat{\theta}]\right)^2\right]$$
- A measure of the spread of $\hat{\theta}$ about its mean.
- We would like the variance to be as small as possible.
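A quick Monte Carlo check (a sketch under an assumed Gaussian model; the constants are mine, not from the slides) makes both claims above concrete: the sample mean is unbiased, while the natural $1/N$ variance estimator falls short of $\sigma_x^2$ by the factor $(N-1)/N$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 10, 200_000              # small N makes the bias visible
mu, sigma2 = 0.0, 4.0                # assumed true mean and variance

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))
mu_hat = x.mean(axis=1)
# Natural (1/N) variance estimator from the slide.
var_hat = ((x - mu_hat[:, None]) ** 2).mean(axis=1)

print(f"E[mu_hat]  ~ {mu_hat.mean():.4f}   (true {mu}: unbiased)")
print(f"E[var_hat] ~ {var_hat.mean():.4f}  "
      f"(true {sigma2}; biased toward {(N - 1) / N * sigma2:.4f})")
```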

The Bias-Variance Tradeoff
- In many cases minimizing variance conflicts with minimizing bias. Note that $\hat{\theta} = 0$ has zero variance, but is generally biased. In these cases we must trade variance for bias (or vice versa).
- Understanding the bias-variance tradeoff is crucial to this course.
- Unbiased models are not always best: the methods we will use to estimate the model coefficients are biased, but they may be more accurate because they have less variance. This idea applies to nonlinear models as well.

Bias, Variance, and Modeling
$$y(x) = g(x) + \varepsilon \qquad \hat{y}(x) = \hat{g}(x)$$
- In the modeling context, we are usually interested in estimating a function. For a given input $x$, this function is a scalar, so we can define $\theta = g(x)$.
- Thus, all of the ideas that apply to estimating parameters also apply to estimating functional relationships.

Notation and Prediction Error
$$y = g(x) + \varepsilon \qquad g = g(x) \qquad \hat{g} = \hat{g}(x) \qquad \hat{g}_e = E[\hat{g}(x)]$$
- The expectation is taken over the distribution of data sets used to construct $\hat{g}(x)$ and the distribution of the process noise $f(\varepsilon)$.
- Everything is a function of $x$; the dependence on $x$ is not shown, to simplify notation.
- Recall that $\varepsilon$ is i.i.d. with zero mean. We are treating $x$ as a fixed, non-random variable.
- The prediction error for a new, given input $x$ is defined as
$$\mathrm{PE}(x) = E[(y - \hat{g})^2] = E[((g - \hat{g}) + \varepsilon)^2] = E[(g - \hat{g})^2] + 2E[(g - \hat{g})\varepsilon] + E[\varepsilon^2] = \mathrm{MSE}(x) + \sigma_\varepsilon^2$$
where the cross term vanishes because $\varepsilon$ has zero mean and is independent of $\hat{g}$.
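The decomposition $\mathrm{PE}(x) = \mathrm{MSE}(x) + \sigma_\varepsilon^2$ can be checked numerically. Below is a sketch with a toy setup of my own (a straight-line fit to data generated from a quadratic, evaluated at one fixed point $x_0$): the estimated prediction error for fresh observations matches $\mathrm{MSE}(x_0) + \sigma_\varepsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
g = lambda x: 1.0 + x ** 2                 # assumed true function g(x)
sigma_eps = 0.5                            # assumed noise std dev
x_train = np.linspace(-1, 1, 20)
x0 = 0.7                                   # fixed evaluation point
trials = 20_000

g_hat = np.empty(trials)
for t in range(trials):
    y = g(x_train) + rng.normal(0, sigma_eps, x_train.size)
    b, a = np.polyfit(x_train, y, 1)       # degree-1 (underfit, biased) model
    g_hat[t] = a + b * x0                  # g_hat(x0) for this data set

mse = np.mean((g(x0) - g_hat) ** 2)
# Prediction error for a new observation y = g(x0) + eps.
pe = np.mean((g(x0) + rng.normal(0, sigma_eps, trials) - g_hat) ** 2)
print(f"PE(x0) ~ {pe:.4f}   MSE(x0) + sigma_eps^2 = {mse + sigma_eps ** 2:.4f}")
```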

The Bias-Variance Tradeoff Derivation
- Only $\hat{g}$ is a random function; nothing else depends on the data set.
$$\mathrm{MSE}(x) = E[(g - \hat{g})^2] = E\left[\left\{(g - \hat{g}_e) - (\hat{g} - \hat{g}_e)\right\}^2\right] = E\left[(g - \hat{g}_e)^2 - 2(g - \hat{g}_e)(\hat{g} - \hat{g}_e) + (\hat{g} - \hat{g}_e)^2\right]$$
- Since $g$ and $\hat{g}_e$ are non-random, the cross term is
$$E[(g - \hat{g}_e)(\hat{g} - \hat{g}_e)] = (g - \hat{g}_e)\,E[\hat{g} - \hat{g}_e] = (g - \hat{g}_e)\left(E[\hat{g}] - \hat{g}_e\right) = 0$$
- Thus
$$\mathrm{MSE}(x) = (g - \hat{g}_e)^2 + E[(\hat{g} - \hat{g}_e)^2] = (g - E[\hat{g}])^2 + E\left[(\hat{g} - E[\hat{g}])^2\right]$$

Bias-Variance Tradeoff Comments
$$\mathrm{MSE}(x) = (g - E[\hat{g}])^2 + E\left[(\hat{g} - E[\hat{g}])^2\right] = \mathrm{Bias}^2 + \mathrm{Variance}$$
- Large variance: the model is sensitive to small changes in the data set.
- Large bias: if the model were compared to the true function on a large number of data sets, the expected value of the model $\hat{g}(x)$ would not be close to the true function $g(x)$.
- If the model is sensitive to small changes in the data, a biased model may have smaller error (MSE) than an unbiased model. If the data are strongly collinear, biased estimation can result in more accurate models!

Bias-Variance Tradeoff Comments Continued
- Large variance, small bias: if the model is too flexible, it can overfit the data and will change dramatically from one data set to another. In this case it has high variance, but potentially low bias.
- Small variance, large bias: if the model is not very flexible, it may not capture the true relationship between the inputs and the output, but it will not vary much from one data set to another. In this case the model has low variance, but potentially high bias.
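The tradeoff shows up even in the simplest setting. As an illustration of my own (not from the slides), consider shrinking the sample mean toward zero, $\hat{\theta} = c\,\hat{\mu}_x$ with $0 < c < 1$: the shrunken estimator is biased, but its lower variance can yield a smaller MSE. The sketch also verifies $\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Variance}$ empirically.

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials, mu, sigma = 10, 200_000, 1.0, 3.0   # assumed Gaussian model

x = rng.normal(mu, sigma, size=(trials, N))
mu_hat = x.mean(axis=1)

for c in (1.0, 0.8, 0.6):
    theta_hat = c * mu_hat                     # shrunken (biased) estimator
    bias2 = (theta_hat.mean() - mu) ** 2
    var = theta_hat.var()
    mse = np.mean((theta_hat - mu) ** 2)
    print(f"c={c:.1f}: bias^2={bias2:.4f}  var={var:.4f}  "
          f"bias^2+var={bias2 + var:.4f}  mse={mse:.4f}")
```

With these numbers the unbiased choice $c = 1$ has MSE $\sigma^2/N = 0.9$, while the biased choices trade a small squared bias for a larger drop in variance and come out ahead.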

Mean Square Error
The mean square error of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as
$$\mathrm{MSE}(\hat{\theta}) \triangleq E\left[(\hat{\theta} - \theta)^2\right] = \sigma_{\hat{\theta}}^2 + B(\hat{\theta})^2$$
- We will often use the MSE as a global measure of estimator performance.
- Note that two different estimators may have the same MSE but different bias and variance.
- This criterion is convenient for building estimators (it creates a problem we can solve). Note the rationale is convenience: picking the MSE results in a simple bias/variance decomposition, which other error measures generally do not have.

Cramér-Rao Lower Bound
$$\operatorname{var}(\hat{\theta}) \ge \frac{1}{E\left[\left(\dfrac{\partial \ln f_{x;\theta}(x;\theta)}{\partial\theta}\right)^{2}\right]} = \frac{-1}{E\left[\dfrac{\partial^{2} \ln f_{x;\theta}(x;\theta)}{\partial\theta^{2}}\right]}$$
- Minimum Variance Unbiased (MVU): estimators that are unbiased and have the smallest variance of all unbiased estimators. Note that these do not necessarily achieve the minimum MSE.
- The Cramér-Rao Lower Bound (CRLB) shown above is a lower bound on the variance of unbiased estimators.
- The log likelihood function of $\theta$ is $\ln f_{x;\theta}(x;\theta)$. Note that the pdf $f_{x;\theta}(x;\theta)$ describes the distribution of the data (stochastic process), not the parameter: $\theta$ is not a random variable, it is a parameter that defines the distribution.

Cramér-Rao Lower Bound Comments
- Efficient estimator: an unbiased estimator that achieves the CRLB with equality.
- If it exists, the unique solution is given by $\dfrac{\partial \ln f_{x;\theta}(x;\theta)}{\partial\theta} = 0$, where the pdf is evaluated at the observed outcome $x(\zeta)$.
- Maximum likelihood (ML) estimate: an estimator that satisfies the equation above. This can be generalized to vectors of parameters.
- Limited use: $f_{x;\theta}(x;\theta)$ is rarely known in practice.

Consistency
A consistent estimator is an estimator such that
$$\lim_{N \to \infty} \mathrm{MSE}(\hat{\theta}) = 0$$
This implies the following as the sample size grows ($N \to \infty$):
- The estimator becomes unbiased.
- The variance approaches zero.
- The distribution $f_{\hat{\theta}}(\theta)$ becomes an impulse centered at $\theta$.
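For i.i.d. Gaussian data with known $\sigma^2$, the Fisher information for the mean is $N/\sigma^2$, so the CRLB is $\sigma^2/N$, and the sample mean (which is the ML estimator here) attains it. A minimal numerical check under that assumed model (constants and names are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials, mu, sigma = 25, 100_000, 0.0, 2.0   # assumed Gaussian model

x = rng.normal(mu, sigma, size=(trials, N))
mu_hat = x.mean(axis=1)                 # ML estimator of the Gaussian mean

crlb = sigma ** 2 / N                   # 1 / Fisher information = sigma^2 / N
print(f"var(mu_hat) ~ {mu_hat.var():.5f}   CRLB = {crlb:.5f}")
```

Since $\sigma^2/N \to 0$ as $N$ grows, rerunning this at increasing $N$ also illustrates the consistency of the sample mean.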

Confidence Intervals
- Confidence interval: an interval, $a \le \theta < b$, that has a specified probability of covering the unknown true parameter value:
$$\Pr\{a \le \theta < b\} = 1 - \alpha$$
- The interval is estimated from the data; therefore it is also a pair of random variables.
- Confidence level: the coverage probability of a confidence interval, $1 - \alpha$.
- The confidence interval is not uniquely defined by the confidence level.

Properties of the Sample Mean
$$\hat{\mu}_x = \frac{1}{N}\sum_{n=0}^{N-1} x(n) \qquad E[\hat{\mu}_x] = \mu_x \qquad \operatorname{var}(\hat{\mu}_x) = \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} \operatorname{cov}\big(x(k), x(l)\big) = \frac{\sigma_x^2}{N} \text{ (i.i.d. case)}$$
- The estimator is unbiased.
- It can also be shown that it
  - has minimum variance,
  - is the maximum likelihood estimator if the process is Gaussian,
  - attains the Cramér-Rao Lower Bound if the process is Gaussian.

Sample Mean Confidence Intervals
$$f_{\hat{\mu}_x}(\hat{\mu}_x) = \frac{1}{\sqrt{2\pi(\sigma_x^2/N)}} \exp\left[-\frac{1}{2}\left(\frac{\hat{\mu}_x - \mu_x}{\sigma_x/\sqrt{N}}\right)^{2}\right]$$
$$\Pr\left\{\mu_x - k\frac{\sigma_x}{\sqrt{N}} < \hat{\mu}_x < \mu_x + k\frac{\sigma_x}{\sqrt{N}}\right\} = \Pr\left\{\hat{\mu}_x - k\frac{\sigma_x}{\sqrt{N}} < \mu_x < \hat{\mu}_x + k\frac{\sigma_x}{\sqrt{N}}\right\} = 1 - \alpha$$
- In general, we don't know the pdf of the estimator.
- If we can assume the process is Gaussian and i.i.d., we know the pdf (sampling distribution) of the estimator.
- If $N$ is large and the distribution doesn't have heavy tails, the distribution of $\hat{\mu}_x$ is approximately Gaussian by the Central Limit Theorem (CLT).

Sample Mean Confidence Intervals Comments
$$\Pr\left\{\hat{\mu}_x - k\frac{\sigma_x}{\sqrt{N}} < \mu_x < \hat{\mu}_x + k\frac{\sigma_x}{\sqrt{N}}\right\} = 1 - \alpha$$
- In many cases the confidence intervals are accurate, even if they are only approximate.
- We can choose $k$ such that $1 - \alpha$ equals any probability we like. In general, the user picks $\alpha$; this controls how often the confidence interval does not cover $\mu_x$. Confidence levels of 95% and 99% are common choices.
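The coverage interpretation can be tested by simulation. Assuming Gaussian i.i.d. data with known $\sigma_x$ (an example of my own), the sketch below builds $\hat{\mu}_x \pm k\,\sigma_x/\sqrt{N}$ with $k = 1.96$ (the 97.5% Gaussian quantile) and counts how often the interval covers $\mu_x$; the empirical rate should be near 95%.

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials, mu, sigma = 50, 100_000, 5.0, 2.0   # assumed Gaussian model
k = 1.96                                       # quantile for 1 - alpha = 0.95

x = rng.normal(mu, sigma, size=(trials, N))
mu_hat = x.mean(axis=1)
half = k * sigma / np.sqrt(N)                  # half-width with known sigma

covered = (mu_hat - half < mu) & (mu < mu_hat + half)
print(f"empirical coverage: {covered.mean():.4f}  (nominal 0.95)")
```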

Sample Mean Variance when Gaussian and IID
- If $\sigma_x^2$ is unknown (as it usually is), it must be estimated from the data:
$$\hat{\sigma}_x^2 = \frac{1}{N-1}\sum_{n=0}^{N-1} \left[x(n) - \hat{\mu}_x\right]^2$$
- The corresponding z-score then has a different distribution. If $x(n)$ is i.i.d. and Gaussian,
$$\frac{\hat{\mu}_x - \mu_x}{\hat{\sigma}_x/\sqrt{N}}$$
has a Student's t distribution with $v = N - 1$ degrees of freedom.
- This approaches a Gaussian distribution as $v$ becomes large ($> 30$).

Sample Mean Variance when Gaussian
$$E[\hat{\mu}_x] = \mu_x \qquad \operatorname{var}(\hat{\mu}_x) = \frac{1}{N}\sum_{l=-(N-1)}^{N-1} \left(1 - \frac{|l|}{N}\right)\gamma_x(l)$$
- If $x(n)$ is Gaussian but not i.i.d., the sample mean is normal with mean $\mu_x$ and the variance above, where $\gamma_x(l)$ is the autocovariance sequence.
- The approximate confidence interval is given by a Gaussian pdf:
$$\Pr\left\{\hat{\mu}_x - k\sqrt{\operatorname{var}(\hat{\mu}_x)} < \mu_x < \hat{\mu}_x + k\sqrt{\operatorname{var}(\hat{\mu}_x)}\right\} = 1 - \alpha$$
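For the i.i.d. Gaussian case with unknown $\sigma_x$ from the first slide above, the Student's t quantile replaces the Gaussian one. A sketch of my own (using SciPy's t distribution; constants and names are assumptions) comparing the two intervals at small $N$, where the difference matters most:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
N, trials, mu, sigma, alpha = 8, 50_000, 0.0, 1.0, 0.05  # assumed model

x = rng.normal(mu, sigma, size=(trials, N))
mu_hat = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                    # sigma_x estimated with 1/(N-1)

k_t = stats.t.ppf(1 - alpha / 2, df=N - 1)   # Student's t quantile, v = N - 1
k_z = stats.norm.ppf(1 - alpha / 2)          # Gaussian quantile

for name, k in (("t", k_t), ("z", k_z)):
    half = k * s / np.sqrt(N)
    cov = np.mean((mu_hat - half < mu) & (mu < mu_hat + half))
    print(f"{name}-interval coverage: {cov:.4f}  (nominal {1 - alpha})")
```

At $N = 8$ the z-interval noticeably undercovers, while the t-interval hits the nominal level; as $N$ grows the two quantiles, and hence the two intervals, converge.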