Estimation Theory

Overview
- Properties: bias, variance, and mean square error
- Cramér-Rao lower bound
- Maximum likelihood
- Consistency
- Confidence intervals
- Properties of the mean estimator
- Properties of the variance estimator
- Examples

J. McNames, Portland State University, ECE 538/638 Estimation Theory, Ver. 1.9

Introduction
- Up until now we have defined and discussed properties of random variables and processes
- In each case we started with some known property (e.g., autocorrelation) and derived other related properties (e.g., PSD)
- In practical problems we rarely know these properties a priori
- Instead, we must estimate what we wish to know from finite sets of measurements

Terminology
- Suppose we have N observations {x(n), n = 0, …, N−1} collected from a WSS stochastic process
- This is one realization of the random process {x(n, ζ)}
- Ideally we would like to know the joint pdf f(x_1, x_2, …, x_N; θ_1, θ_2, …, θ_p)
- Here the θ's are unknown parameters of the joint pdf
- In probability theory, we think about the likeliness of {x(n)} given the pdf and θ
- In inference, we are given {x(n)} and are interested in the likeliness of θ
- We will use θ to denote a scalar parameter (or θ for a vector of parameters) we wish to estimate

Estimators as Random Variables
- Our estimator is a function of the measurements: θ̂ = θ̂[{x(n)}]
- It is therefore a random variable: it will be different for every different set of observations
- Its distribution is called the sampling distribution
- The value it produces is called an estimate or, if θ is a scalar, a point estimate
- Of course we want θ̂ to be as close to the true θ as possible
Natural Estimators
- μ̂_x = θ̂[{x(n)}] = (1/N) Σ_{n=0}^{N−1} x(n)
- This is the obvious or natural estimator of the process mean, sometimes called the average or sample mean
- It will also turn out to be the best estimator (best is defined shortly)
- σ̂²_x = θ̂[{x(n)}] = (1/N) Σ_{n=0}^{N−1} [x(n) − μ̂_x]²
- This is the obvious or natural estimator of the process variance; it is not the best

Good Estimators
[Figure: sketch of a sampling distribution f(θ̂) relative to the true value θ]
- What is a good estimator?
- The distribution of θ̂ should be centered at the true value θ
- We want the distribution to be as narrow as possible
- Lower-order moments enable coarse measurements of goodness

Bias
- The bias of an estimator θ̂ of a parameter θ is defined as B(θ̂) ≜ E[θ̂] − θ
- The normalized bias of an estimator θ̂ of a non-negative parameter θ is defined as ε_b ≜ B(θ̂)/θ
- Unbiased: an estimator is said to be unbiased if B(θ̂) = 0
- This implies the pdf of the estimator is centered at the true value θ
- The sample mean is unbiased
- The estimator of variance on the earlier slide is biased
- Unbiased estimators are generally good, but they are not always best (more later)

Variance
- The variance of an estimator θ̂ of a parameter θ is defined as var(θ̂) = σ²_θ̂ ≜ E[(θ̂ − E[θ̂])²]
- The normalized standard deviation of an estimator θ̂ of a non-negative parameter θ is defined as ε_r ≜ σ_θ̂/θ
- A measure of the spread of θ̂ about its mean
- We would like the variance to be as small as possible
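The bias and variance definitions above can be checked numerically. The following is a minimal Monte Carlo sketch (not from the slides; the trial count, sample size, and unit-variance Gaussian data are arbitrary illustration choices) showing that the sample mean is unbiased while the natural variance estimator is biased low:

```python
import numpy as np

# Monte Carlo check of the natural estimators for white Gaussian noise
# with true mean 0 and true variance 1 (illustration values).
rng = np.random.default_rng(0)
M, N = 100_000, 10                 # trials, observations per trial
x = rng.standard_normal((M, N))    # each row is one realization

mu_hat = x.mean(axis=1)                               # sample mean per trial
var_hat = ((x - mu_hat[:, None]) ** 2).mean(axis=1)   # natural (1/N) estimator

print(mu_hat.mean())    # near 0: the sample mean is unbiased
print(var_hat.mean())   # near (N-1)/N = 0.9: the natural variance estimator is biased
```

The average of `var_hat` settles near (N−1)/N rather than 1, matching the bias result derived later in these notes.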
Bias-Variance Tradeoff
[Figure: two sampling distributions f(θ̂) illustrating the tradeoff between bias and spread]
- In many cases minimizing variance conflicts with minimizing bias
- Note that a constant estimator θ̂ = c has zero variance, but is generally biased
- In these cases we must trade variance for bias (or vice versa)

Mean Square Error
- The mean square error of an estimator θ̂ of a parameter θ is defined as MSE(θ̂) ≜ E[|θ̂ − θ|²] = σ²_θ̂ + |B(θ̂)|²
- The normalized MSE of an estimator θ̂ of a parameter θ is defined as ε ≜ MSE(θ̂)/θ
- The decomposition of MSE into variance plus bias squared is very similar to the DC and AC decomposition of signal power
- We will use MSE as a global measure of estimator performance
- Note that two different estimators may have the same MSE, but different bias and variance
- This criterion is convenient for building estimators: it creates a problem we can solve

Cramér-Rao Lower Bound
- var(θ̂) ≥ 1 / E[(∂ ln f_x;θ(x; θ)/∂θ)²] = −1 / E[∂² ln f_x;θ(x; θ)/∂θ²]
- Minimum variance unbiased (MVU): estimators that are both unbiased and have the smallest variance of all possible estimators
- Note that these do not necessarily achieve the minimum MSE
- The Cramér-Rao lower bound (CRLB) is a lower bound on the variance of unbiased estimators (derived in text)
- The log likelihood function of θ is ln f_x;θ(x; θ)
- Note that the pdf f_x;θ(x; θ) describes the distribution of the data (stochastic process), not the parameter
- Recall that θ is not a random variable; it is a parameter that defines the distribution

Cramér-Rao Lower Bound Comments
- var(θ̂) ≥ 1 / E[(∂ ln f_x;θ(x; θ)/∂θ)²] = −1 / E[∂² ln f_x;θ(x; θ)/∂θ²]
- Efficient estimator: an unbiased estimator that achieves the CRLB with equality
- If it exists, then the unique solution is given by ∂ ln f_x;θ(x; θ)/∂θ = 0, where the pdf is evaluated at the observed outcome x(ζ)
- Maximum likelihood (ML) estimate: an estimator that satisfies the equation above
- This can be generalized to vectors of parameters
- Limited use: f_x;θ(x; θ) is rarely known in practice
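A standard concrete case of the CRLB (my illustration, not worked in the slides): for N IID Gaussian samples with known σ, the bound on any unbiased estimator of the mean is σ²/N, and the sample mean attains it, so it is efficient. A short Monte Carlo sketch with arbitrary parameter values:

```python
import numpy as np

# For x(n) IID N(mu, sigma^2) with sigma known, the CRLB for estimating mu
# is sigma^2/N. The sample mean is unbiased; its empirical variance should
# sit at the bound, i.e. the sample mean is an efficient estimator.
rng = np.random.default_rng(1)
M, N, sigma = 200_000, 8, 2.0
x = rng.normal(loc=3.0, scale=sigma, size=(M, N))

crlb = sigma**2 / N                  # 1 / E[(d ln f / d mu)^2] for the Gaussian
var_mean = x.mean(axis=1).var()      # Monte Carlo variance of the estimator

print(crlb, var_mean)                # the two values agree closely
```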
Consistency
- Consistent estimator: an estimator such that lim_{N→∞} MSE(θ̂) = 0
- This implies the following as the sample size grows (N → ∞):
- The estimator becomes unbiased
- The variance approaches zero
- The distribution f_θ̂(x) becomes an impulse centered at θ

Confidence Intervals
- Confidence interval: an interval, a < θ ≤ b, that has a specified probability of covering the unknown true parameter value: Pr{a < θ ≤ b} = 1 − α
- The interval is estimated from the data; therefore it is also a pair of random variables
- Confidence level: the coverage probability of a confidence interval, 1 − α
- The confidence interval is not uniquely defined by the confidence level (more later)

Properties of the Sample Mean
- μ̂_x = (1/N) Σ_{n=0}^{N−1} x(n)
- E[μ̂_x] = μ_x
- var(μ̂_x) = (1/N) Σ_{l=−N}^{N} (1 − |l|/N) γ_x(l) ≈ (1/N) Σ_{l=−N}^{N} γ_x(l)
- If x(n) is WN, then this reduces to var(μ̂_x) = σ²_x/N
- The estimator is unbiased
- If γ_x(l) → 0 as l → ∞, then var(μ̂_x) → 0 (the estimator is consistent)
- The variance increases as the correlation of x(n) increases
- In processes with long memory or heavy tails, it is harder to estimate the mean

Sample Mean Confidence Intervals
- f(μ̂_x) = (1/(√(2π) σ_x/√N)) exp[−(1/2)((μ̂_x − μ_x)/(σ_x/√N))²]
- Pr{μ_x − k σ_x/√N < μ̂_x < μ_x + k σ_x/√N} = Pr{μ̂_x − k σ_x/√N < μ_x < μ̂_x + k σ_x/√N} = 1 − α
- In general, we don't know the pdf
- If we can assume the process is Gaussian and IID, we know the pdf (sampling distribution) of the estimator
- If N is large and the distribution doesn't have heavy tails, the distribution of μ̂_x is Gaussian by the central limit theorem (CLT)
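The claim that correlation inflates the variance of the sample mean can be seen in a short simulation. This sketch (my illustration; the AR(1) coefficient a = 0.9 and the sizes are arbitrary choices) compares unit-variance white noise against a strongly correlated AR(1) process:

```python
import numpy as np

# Variance of the sample mean: sigma^2/N for white noise, much larger for a
# correlated process, since neighboring samples carry redundant information.
rng = np.random.default_rng(2)
M, N, a = 50_000, 100, 0.9

w = rng.standard_normal((M, N))      # white noise realizations (rows)
x = np.zeros((M, N))                 # AR(1): x(n) = a*x(n-1) + w(n)
for n in range(1, N):
    x[:, n] = a * x[:, n - 1] + w[:, n]

print(w.mean(axis=1).var())          # near 1/N = 0.01
print(x.mean(axis=1).var())          # far larger than 1/N
```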
Sample Mean Confidence Interval Comments
- Pr{μ̂_x − k σ_x/√N < μ_x < μ̂_x + k σ_x/√N} = 1 − α
- In many cases the confidence intervals are accurate, even if they are only approximate
- We can choose k such that 1 − α equals any probability we like
- In general, the user picks α; this controls how often the confidence interval does not cover μ_x
- 95% and 99% are common choices

Sample Mean Confidence Intervals when Gaussian and IID
- Pr{μ̂_x − k σ_x/√N < μ_x < μ̂_x + k σ_x/√N} = 1 − α
- If σ_x is unknown (usually), it must be estimated from the data: σ̂²_x = (1/(N−1)) Σ_{n=0}^{N−1} [x(n) − μ̂_x]²
- The corresponding z-score then has a different distribution
- If x(n) is IID and Gaussian, (μ̂_x − μ_x)/(σ̂_x/√N) has a Student's t distribution with v = N − 1 degrees of freedom
- This approaches a Gaussian distribution as v becomes large

Sample Mean Confidence Intervals when Gaussian
- E[μ̂_x] = μ_x
- var(μ̂_x) = (1/N) Σ_{l=−N}^{N} (1 − |l|/N) γ_x(l)
- If x(n) is Gaussian but not IID, the sample mean is normal with mean μ_x
- The approximate confidence interval is given by a Gaussian pdf: Pr{μ̂_x − k √var(μ̂_x) < μ_x < μ̂_x + k √var(μ̂_x)} = 1 − α
- Note that var(μ̂_x) requires knowledge of γ_x(l)

Example 1: Mean Confidence Intervals
Generate 1000 random experiments of a white noise signal of length N = 10 and N = 100. Plot the histograms of the 95% confidence intervals and the means, and specify the percentage of times that the true mean was within the confidence interval. Repeat for Gaussian and exponential distributions.
- N = 10, Normal: 94.4% coverage
- N = 10, Exponential: 88.9% coverage
- N = 100, Normal: 95.7% coverage
- N = 100, Exponential: 95.1% coverage
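The t-based interval above translates directly to NumPy/SciPy. This is a sketch of one such interval for a single realization (my translation of the idea, not the slide's MATLAB; the true mean 5, σ = 2, N = 10, and α = 0.05 are arbitrary illustration values):

```python
import numpy as np
from scipy import stats

# 95% t-based confidence interval for the mean when sigma is estimated.
rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=10)   # one realization
N, alpha = x.size, 0.05

mu_hat = x.mean()
s_hat = x.std(ddof=1)                         # unbiased (1/(N-1)) estimator
k = stats.t.ppf(1 - alpha / 2, df=N - 1)      # t quantile, N-1 degrees of freedom
half = k * s_hat / np.sqrt(N)                 # half-width of the interval

print(mu_hat - half, mu_hat + half)           # the 95% CI for the true mean
```

Note that the t quantile is wider than the Gaussian 1.96 for small N, reflecting the extra uncertainty from estimating σ.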
[Figure: Example 1 histogram of estimated means, Normal, N = 10]

[Figure: Example 1 histogram of estimated variances, Normal, N = 10]

[Figure: Example 1 histogram of estimated confidence intervals, Normal, N = 10]

Example 1: MATLAB Code

M  = 1000;       % No. experiments
N  = 10;         % No. observations
cl = 95;         % Confidence level
ds = 'Normal';
X  = randn(N,M);
tm = 0;          % True mean
mx = mean(X);    % Estimate the mean
sx = std(X);     % Estimated std. dev.
lc = mx + sx*tinv(  (1-cl/100)/2,N-1)/sqrt(N); % Lower confidence limit
uc = mx + sx*tinv(1-(1-cl/100)/2,N-1)/sqrt(N); % Upper confidence limit
fprintf('Mean covered: %5.2f%c\n',100*sum(lc<tm & uc>=tm)/M,char(37));

figure;
[n,x] = hist(mx,25);
h = bar(x,n,1.0);
set(h,'FaceColor',[.5 .5 1.0]);
xlim([-1 1]);
title('Estimated Mean Histogram');
xlabel('Estimated Mean');
box off;
eval(sprintf('print -depsc %sMeanHistogram%03d;',ds,N));
Example 1: MATLAB Code

figure;
[n,x] = hist(sx.^2,25);
h = bar(x,n,1.0);
set(h,'FaceColor',[.5 .5 1.0]);
xlim([0 5]);
title('Estimated Variance Histogram');
xlabel('Estimated Variance');
box off;
eval(sprintf('print -depsc %sVarianceHistogram%03d;',ds,N));

figure;
[n,x] = hist(lc,25);
h = bar(x,n,1.0);
set(h,'FaceColor',[.5 1.0 .5]);
hold on;
[n,x] = hist(uc,25);
h = bar(x,n,1.0);
set(h,'FaceColor',[1.0 .5 .5]);
hold off;
xlim([-2 2]);
title('Estimated Confidence Intervals');
xlabel('Confidence Interval');
box off;
eval(sprintf('print -depsc %sConfidenceHistogram%03d;',ds,N));

[Figure: Example 1 histogram of estimated means, Normal, N = 100]

[Figure: Example 1 histogram of estimated variances, Normal, N = 100]

[Figure: Example 1 histogram of estimated confidence intervals, Normal, N = 100]
[Figure: Example 1 histogram of estimated means, Exponential, N = 10]

[Figure: Example 1 histogram of estimated variances, Exponential, N = 10]

[Figure: Example 1 histogram of estimated confidence intervals, Exponential, N = 10]

[Figure: Example 1 histogram of estimated means, Exponential, N = 100]
[Figure: Example 1 histogram of estimated variances, Exponential, N = 100]

[Figure: Example 1 histogram of estimated confidence intervals, Exponential, N = 100]

Estimation of Variance
- The natural estimator of the variance is σ̂²_x ≜ (1/N) Σ_{n=0}^{N−1} [x(n) − μ̂_x]²
- In general, the mean of this estimator is E[σ̂²_x] = σ²_x − var(μ̂_x) = σ²_x − (1/N) Σ_{l=−N}^{N} (1 − |l|/N) γ_x(l)
- If x(n) is uncorrelated, this reduces to E[σ̂²_x] = ((N−1)/N) σ²_x
- Thus, σ̂²_x is a biased estimator!

Example 2: Biased Variance
Let w(n) ~ WN(0, σ²_w). Find a closed-form expression for E[σ̂²_w], where σ̂²_w is the natural variance estimator, in terms of σ²_w and the length of the sequence N.
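The uncorrelated-case result above, E[σ̂²_x] = ((N−1)/N) σ²_x, can be checked by simulation. A minimal sketch (my illustration; σ_w = 3 and N = 5 are arbitrary values):

```python
import numpy as np

# Monte Carlo check that the natural (1/N) variance estimator of white
# noise has mean (N-1)/N * sigma_w^2, i.e. it is biased low.
rng = np.random.default_rng(4)
M, N, sigma_w = 200_000, 5, 3.0
w = rng.normal(scale=sigma_w, size=(M, N))

var_hat = w.var(axis=1, ddof=0)          # natural estimator, divides by N
print(var_hat.mean())                    # near (N-1)/N * sigma_w^2
print((N - 1) / N * sigma_w**2)          # the closed-form expectation
```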
Example 2: Workspace

Estimation of Variance
- A better estimator (if the mean is unknown) is σ̂²_x ≜ (1/(N−1)) Σ_{n=0}^{N−1} [x(n) − μ̂_x]²
- var(σ̂²_x) ≈ 2σ⁴_x/N for large N
- If x(n) is uncorrelated, this estimator is unbiased
- As N → ∞, if γ_x(l) → 0 as l → ∞, then var(σ̂²_x) → 0 and the biased estimator is asymptotically unbiased
- Both estimators are consistent

Sample Variance Confidence Intervals
- σ̂²_x ≜ (1/(N−1)) Σ_{n=0}^{N−1} [x(n) − μ̂_x]²
- If the samples are IID and Gaussian, (N−1)σ̂²_x/σ²_x has a chi-squared distribution with v = N − 1 degrees of freedom
- Pr{σ̂²_x (N−1)/χ²_v(0.975) < σ²_x ≤ σ̂²_x (N−1)/χ²_v(0.025)} = 1 − α (here α = 0.05)
- The quantiles of χ²_v(·) can be obtained from look-up tables or MATLAB
- This confidence interval is sensitive to the normal assumption (unlike the confidence intervals for the mean)
- It is also sensitive to the IID assumption (like the mean)

Example 3: Variance Confidence Intervals
Generate 1000 random experiments of a white noise signal of length N = 10 and N = 100. Plot the histograms of the estimated variances, 95% confidence intervals, and the confidence interval lengths. Specify the percentage of times that the true variance was within the confidence interval. Repeat for Gaussian and exponential distributions.
- N = 10, Normal: 94.9% coverage
- N = 10, Exponential: 76.0% coverage
- N = 100, Normal: 95.6% coverage
- N = 100, Exponential: 68.4% coverage
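The chi-squared interval above translates directly to NumPy/SciPy. This is a sketch for a single Gaussian realization (my translation, not the slide's MATLAB; the true σ² = 4, N = 10, and α = 0.05 are arbitrary illustration values):

```python
import numpy as np
from scipy import stats

# 95% chi-squared confidence interval for the variance of IID Gaussian data.
rng = np.random.default_rng(5)
x = rng.normal(scale=2.0, size=10)     # one realization, true sigma^2 = 4
N, alpha = x.size, 0.05

s2 = x.var(ddof=1)                     # unbiased sample variance
lo = (N - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=N - 1)
hi = (N - 1) * s2 / stats.chi2.ppf(alpha / 2, df=N - 1)

print(lo, hi)                          # the 95% CI for sigma^2
```

Note the interval is asymmetric about s2, unlike the t interval for the mean, because the chi-squared distribution is skewed.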
[Figure: Example 3 histogram of estimated variances, Normal, N = 10]

[Figure: Example 3 histogram of estimated confidence intervals, Normal, N = 10]

[Figure: Example 3 histogram of confidence interval lengths, Normal, N = 10]

[Figure: Example 3 histogram of estimated variances, Normal, N = 100]
[Figure: Example 3 histogram of estimated confidence intervals, Normal, N = 100]

[Figure: Example 3 histogram of confidence interval lengths, Normal, N = 100]

[Figure: Example 3 histogram of estimated variances, Exponential, N = 10]

[Figure: Example 3 histogram of estimated confidence intervals, Exponential, N = 10]
[Figure: Example 3 histogram of confidence interval lengths, Exponential, N = 10]

[Figure: Example 3 histogram of estimated variances, Exponential, N = 100]

[Figure: Example 3 histogram of estimated confidence intervals, Exponential, N = 100]

[Figure: Example 3 histogram of confidence interval lengths, Exponential, N = 100]
Example 3: Relevant MATLAB Code

M  = 1000;     % No. experiments
N  = 10;       % No. observations
cl = 95;       % Confidence level
%ds = 'Exponential';
%tm = 1;       % True mean
%tv = 1;       % True variance
%X  = exprnd(tm,N,M);
ds = 'Normal';
tm = 0;        % True mean
tv = 1;        % True variance
X  = randn(N,M);

sx = std(X);   % Std. dev. estimate
lc = sx.^2*(N-1)/chi2inv(1-(1-cl/100)/2,N-1); % Lower confidence limit
uc = sx.^2*(N-1)/chi2inv(  (1-cl/100)/2,N-1); % Upper confidence limit
fprintf('Variance covered: %5.2f%c\n',100*sum(lc<tv & uc>=tv)/M,char(37));

Summary
- Estimators are random variables with a distribution called the sampling distribution
- Bias, variance, and mean square error are useful measures of performance because they only require knowledge of second-order statistics of the sampling distribution
- Confidence intervals are random, not the parameter being estimated
- In many cases it is very difficult to determine properties of the estimator (bias, variance, confidence intervals, etc.) because they often rely on unknown properties of the distribution
- The variance of μ̂_x depends on γ_x(l)

Summary (Continued)
- In some cases we can obtain good approximations based on the central limit theorem or other assumptions
- It is critical to scrutinize these assumptions and determine whether they are reasonable for your application
- Monte Carlo simulations are useful for examining the sampling distribution under controlled conditions