
1. Estimators

The Basic Statistical Model

As usual, our starting point is a random experiment with an underlying sample space and a probability measure P. In the basic statistical model, we have an observable random variable X taking values in a set S. Recall that in general, this variable can have quite a complicated structure. For example, if the experiment is to sample n objects from a population and record various measurements of interest, then the data vector has the form X = (X_1, X_2, ..., X_n), where X_i is the vector of measurements for the i-th object. The most important special case occurs when (X_1, X_2, ..., X_n) are independent and identically distributed (IID). In this case X is a random sample of size n from the distribution of an underlying measurement variable X.

Statistics

Recall also that a statistic is an observable function of the outcome variable of the random experiment: W = W(X). Thus, a statistic is simply a random variable derived from the observation variable X, with the assumption that W is also observable. As the notation indicates, W is typically also vector-valued.

Parameters

In the general sense, a parameter θ is a function of the distribution of X, taking values in a parameter space Θ. Typically, the distribution of X will have k real parameters of interest, so that θ has the form θ = (θ_1, θ_2, ..., θ_k) and thus Θ ⊆ R^k. In many cases, one or more of the parameters are unknown and must be estimated from the data variable X. This is one of the most important and basic of all statistical problems, and it is the subject of this chapter.

Estimators

Suppose now that we have an unknown real parameter θ taking values in a parameter space Θ ⊆ R. A real-valued statistic W = W(X) that is used to estimate θ is called, appropriately enough, an estimator of θ.
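As a concrete illustration (a minimal sketch in Python, assuming numpy; the exponential model, the parameter value, and the function names are arbitrary choices for this example), the following code simulates an observable data vector X and computes a statistic W(X) that serves as an estimator of the unknown parameter:

```python
import numpy as np

# Hypothetical setup: the unknown parameter is the mean mu of an
# exponential distribution; the data are an IID sample of size n.
rng = np.random.default_rng(0)
mu_true = 2.0        # unknown in practice; fixed here so we can simulate
n = 50
X = rng.exponential(scale=mu_true, size=n)   # observable data vector X = (X_1, ..., X_n)

# A statistic is an observable function of the data.  Here W(X) is the
# sample mean, used as an estimator of the parameter mu.
def W(x):
    return np.mean(x)

estimate = W(X)      # the observed value w = W(x) is the estimate of mu
print(f"estimate of mu from n={n} observations: {estimate:.3f}")
```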

Thus, the estimator is a random variable and hence has a distribution, a mean, a variance, and so on. When we actually run the experiment and observe the data x, the observed value w = W(x) (a single number) is the estimate of the parameter θ.

Basic Properties

The (random) error is the difference between the estimator and the parameter: W − θ. The expected value of the error is known as the bias: bias(W) = E(W − θ).

1. Use basic properties of expected value to show that bias(W) = E(W) − θ.

Thus, the estimator is said to be unbiased if the bias is 0 for all θ ∈ Θ, or equivalently if the expected value of the estimator is the parameter being estimated: E(W) = θ for all θ ∈ Θ.

The quality of the estimator is usually measured by computing the mean square error: MSE(W) = E((W − θ)²).

2. Use basic properties of expected value and variance to show that MSE(W) = var(W) + bias²(W).

In particular, if the estimator is unbiased, then the mean square error of W is simply the variance of W. Ideally, we would like to have unbiased estimators with small mean square error. However, this is not always possible, and Exercise 2 shows the delicate relationship between bias and mean square error. In the next section we will see an example with two estimators of a parameter that are multiples of each other; one is unbiased, but the other has smaller mean square error. However, if we have two unbiased estimators of θ, denoted U and V, we naturally prefer the one with the smaller variance (mean square error). The relative efficiency of V to U is simply the ratio of the variances:

eff(U, V) = var(U) / var(V)

Asymptotic Properties

Often we have a general formula that defines an estimator of θ for any sample size n. Technically, this gives a sequence of real-valued estimators of θ:

W_n = W_n(X_1, X_2, ..., X_n),  n ∈ N_+

In this case, we can discuss the asymptotic properties of the estimators as n → ∞. Most of the definitions are natural generalizations of the ones above. First, the sequence of estimators W_n is said to be asymptotically unbiased if bias(W_n) → 0 as n → ∞ for any θ ∈ Θ.
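These definitions can be checked numerically. The following sketch (assuming numpy; the normal model, sample size, and parameter value are illustrative choices) approximates the bias, variance, and mean square error of two estimators of a variance by Monte Carlo; one estimator is unbiased, while the other is a multiple of it that turns out to have smaller mean square error, foreshadowing the example mentioned above.

```python
import numpy as np

# Monte Carlo check of the bias / MSE definitions, under an assumed model:
# X_1, ..., X_n IID normal with variance sigma^2 = 4 (theta = 4).
rng = np.random.default_rng(1)
theta, n, reps = 4.0, 10, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(theta), size=(reps, n))
U = samples.var(axis=1, ddof=1)   # unbiased sample variance S^2
V = samples.var(axis=1, ddof=0)   # biased version, (n-1)/n times S^2

for name, W in [("S^2 (unbiased)", U), ("(n-1)/n * S^2 (biased)", V)]:
    bias = W.mean() - theta
    var = W.var()
    mse = np.mean((W - theta) ** 2)
    # the empirical MSE equals the empirical variance plus squared bias,
    # which is the identity of Exercise 2
    print(f"{name:25s} bias={bias:+.3f} var={var:.3f} mse={mse:.3f} var+bias^2={var + bias**2:.3f}")
```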

3. Show that W_n is asymptotically unbiased if and only if E(W_n) → θ as n → ∞ for any θ ∈ Θ.

Suppose now that U_n and V_n are two sequences of estimators that are asymptotically unbiased for θ. The asymptotic relative efficiency of V_n to U_n is the following limit, if it exists:

lim_{n → ∞} var(U_n) / var(V_n)

Naturally, we expect our estimators to improve, in some sense, as the sample size n increases. Specifically, the sequence of estimators W_n is said to be consistent for θ if W_n → θ as n → ∞ in probability:

P(|W_n − θ| > ε) → 0 as n → ∞ for every ε > 0 and every θ ∈ Θ

4. Suppose that MSE(W_n) → 0 as n → ∞ for any θ ∈ Θ. Show that W_n is consistent for θ. Hint: Use Markov's inequality.

The condition in Exercise 4 is known as mean-square consistency. Thus, mean-square consistency implies simple consistency. This is simply a statistical version of the theorem that states that mean-square convergence implies convergence in probability.

Estimation Problems

In the next several subsections, we will review several basic estimation problems that were studied in the chapter on Random Samples.

Estimating the Mean

Suppose that X = (X_1, X_2, ..., X_n) is a random sample of size n from the distribution of a real-valued random variable X that has mean μ and standard deviation σ. A natural estimator of the distribution mean μ is the sample mean, defined by

M(X) = (1/n) Σ_{i=1}^n X_i

5. Show or recall that
a. E(M) = μ, so M is an unbiased estimator of μ.
b. var(M) = σ²/n, so M is a consistent estimator of μ.

6. In the sample mean experiment, set the sampling distribution to gamma. Increase the sample size with the scroll bar and note, graphically and numerically, the unbiased and consistent properties. Run the experiment 1000 times, updating every 10 runs.
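Exercise 6 refers to the sample mean applet. As a simulation stand-in (a rough sketch assuming numpy, with an arbitrary gamma distribution), the following code shows the sample mean M settling near the distribution mean μ as the sample size grows, which is the consistency property described above:

```python
import numpy as np

# Assumed sampling distribution: gamma with shape k = 3 and scale 2, so mu = 6.
rng = np.random.default_rng(2)
k, scale = 3.0, 2.0
mu = k * scale

for n in [10, 100, 1_000, 10_000, 100_000]:
    X = rng.gamma(shape=k, scale=scale, size=n)
    M = X.mean()                      # sample mean M_n
    print(f"n={n:6d}  M_n={M:7.4f}  |M_n - mu|={abs(M - mu):.4f}")
```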

7. Run the normal estimation experiment 1000 times, updating every 10 runs, for several values of the parameters. In each case, compare the empirical bias and mean square error of M_n with the theoretical values.

The consistency of the sample mean M_n as an estimator of the distribution mean μ is simply the weak law of large numbers. Moreover, there are a number of important special cases of the results in Exercise 5. See the section on the Sample Mean for the details.

Suppose that X = 1(A), the indicator variable for an event A that has probability P(A). Then the sample mean of the random sample X is the relative frequency or empirical probability of A, denoted P_n(A). Hence P_n(A) is an unbiased and consistent estimator of P(A).

Suppose that F denotes the distribution function of a real-valued random variable X. Then for fixed x, the empirical distribution function F_n(x) is simply the sample mean for a random sample of size n from the distribution of the indicator variable 1(X ≤ x). Hence F_n(x) is an unbiased and consistent estimator of F(x).

Suppose that X is a random variable with a discrete distribution on a countable set S and f denotes the probability density function of X. Then for fixed x ∈ S, the empirical probability density function f_n(x) is simply the sample mean for a random sample of size n from the distribution of the indicator variable 1(X = x). Hence f_n(x) is an unbiased and consistent estimator of f(x).

8. In the matching experiment, the random variable is the number of matches. Run the simulation 1000 times, updating every 10 runs, and note the apparent convergence of the sample mean to the distribution mean and of the empirical density function to the distribution density function.

Estimating the Variance

As in the last subsection, suppose that X = (X_1, X_2, ..., X_n) is a random sample of size n from the distribution of a real-valued random variable X that has mean μ and standard deviation σ. We will also assume that the fourth central moment d_4 = E((X − μ)⁴) is finite. If μ is known (usually an artificial assumption), then a natural estimator of σ² is a special version of the sample variance, defined by

W²(X) = (1/n) Σ_{i=1}^n (X_i − μ)²
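The three special cases above are all sample means of indicator variables, so they can be simulated directly. A small sketch (assuming numpy, with an arbitrary exponential distribution) compares the empirical distribution function F_n(x) with the true distribution function F(x) at a few points:

```python
import numpy as np

# Assumed distribution: exponential with rate 1, so F(x) = 1 - exp(-x).
rng = np.random.default_rng(3)
n = 5_000
X = rng.exponential(scale=1.0, size=n)

for x in [0.5, 1.0, 2.0]:
    F_n = np.mean(X <= x)        # sample mean of the indicators 1(X_i <= x)
    F = 1.0 - np.exp(-x)
    print(f"x={x:.1f}  F_n(x)={F_n:.4f}  F(x)={F:.4f}")
```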

9. Show or recall that
a. E(W²) = σ², so W² is an unbiased estimator of σ².
b. var(W²) = (1/n)(d_4 − σ⁴), so W² is a consistent estimator of σ².

If μ is unknown (the more reasonable assumption), then a natural estimator of the distribution variance is the standard version of the sample variance, defined by

S²(X) = (1/(n − 1)) Σ_{i=1}^n (X_i − M(X))²

10. Show or recall that
a. E(S²) = σ², so S² is an unbiased estimator of σ².
b. var(S²) = (1/n)(d_4 − ((n − 3)/(n − 1)) σ⁴), so S² is a consistent estimator of σ².

11. Run the exponential experiment 1000 times with an update frequency of 10. Note the apparent convergence of the sample standard deviation to the distribution standard deviation.

12. Show that var(W²) < var(S²). Thus, W² is better than S², assuming that μ is known so that we can actually use W². The asymptotic relative efficiency of S² to W² is 1.

13. Run the normal estimation experiment 1000 times, updating every 10 runs, for several values of the parameters. In each case, compare the empirical bias and mean square error of S² and of W² to their theoretical values. Which estimator seems to work better?
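The comparison in Exercises 12 and 13 can also be approximated by simulation. The following sketch (assuming numpy; the exponential model with known mean μ = 1 and the small sample size are illustrative choices) estimates the mean and variance of W² and S² by Monte Carlo:

```python
import numpy as np

# Monte Carlo comparison of the two variance estimators, under an assumed
# model: exponential data with mean mu = 1, so sigma^2 = 1 and mu is "known".
rng = np.random.default_rng(4)
n, reps = 5, 100_000
mu = 1.0

samples = rng.exponential(scale=mu, size=(reps, n))
W2 = np.mean((samples - mu) ** 2, axis=1)      # uses the known mean mu
S2 = samples.var(axis=1, ddof=1)               # uses the sample mean M

print(f"E(W^2) ~ {W2.mean():.4f}   var(W^2) ~ {W2.var():.4f}")
print(f"E(S^2) ~ {S2.mean():.4f}   var(S^2) ~ {S2.var():.4f}")
# Both are (approximately) unbiased for sigma^2 = 1, and var(W^2) comes out
# smaller than var(S^2), consistent with Exercise 12.
```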

The Poisson Distribution

For an example of the ideas in the last two subsections, suppose that X has the Poisson distribution with unknown parameter a > 0. Then E(X) = var(X) = a, so we could use either the sample mean M or the sample variance S² as an estimator of a. Both are unbiased, so which is better? Naturally, we use mean square error as our criterion.

14. Show that
a. E(X) = a
b. E(X²) = a² + a
c. E(X³) = a³ + 3a² + a
d. E(X⁴) = a⁴ + 6a³ + 7a² + a
e. d_4 = 3a² + a

15. Show that
a. var(M) = a/n
b. var(S²) = (a/n)(1 + 2an/(n − 1))
c. var(M) < var(S²), so the sample mean M is a better estimator of the parameter a than the sample variance S².
d. The asymptotic relative efficiency of M to S² is 1 + 2a.

16. Run the Poisson experiment 100 times, updating every run, for several values of the parameter. In each case, compute the estimators M and S². Which estimator seems to work better?
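Exercise 16 refers to the Poisson applet. As a simulation stand-in (a sketch assuming numpy, with an arbitrary parameter value a = 4), the following code compares M and S² as estimators of a:

```python
import numpy as np

# Assumed Poisson parameter a = 4; compare M and S^2 as estimators of a.
rng = np.random.default_rng(5)
a, n, reps = 4.0, 30, 100_000

samples = rng.poisson(lam=a, size=(reps, n))
M = samples.mean(axis=1)
S2 = samples.var(axis=1, ddof=1)

print(f"E(M)   ~ {M.mean():.4f}   var(M)   ~ {M.var():.4f}   (theory: {a/n:.4f})")
print(f"E(S^2) ~ {S2.mean():.4f}   var(S^2) ~ {S2.var():.4f}   (theory: {a/n*(1 + 2*a*n/(n-1)):.4f})")
# Both are unbiased for a, but var(M) < var(S^2), as in Exercise 15.
```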

Estimating the Covariance

Suppose that ((X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)) is a random sample of size n from the distribution of (X, Y), where X is a real-valued random variable with mean μ and standard deviation σ, and where Y is a real-valued random variable with mean ν and standard deviation τ. Let δ denote the covariance of (X, Y). As usual, we will let X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_n); these are random samples of size n from the distributions of X and Y, respectively.

If μ and ν are known (usually an artificial assumption), then a natural estimator of the distribution covariance δ is a special version of the sample covariance, defined by

W(X, Y) = (1/n) Σ_{i=1}^n (X_i − μ)(Y_i − ν)

17. Show or recall that
a. E(W) = δ, so W is an unbiased estimator of δ.
b. W is a consistent estimator of δ.

If μ and ν are unknown (usually the more reasonable assumption), then a natural estimator of the distribution covariance δ is the usual version of the sample covariance, defined by

S(X, Y) = (1/(n − 1)) Σ_{i=1}^n (X_i − M(X))(Y_i − M(Y))

18. Show or recall that
a. E(S) = δ, so S is an unbiased estimator of δ.
b. S is a consistent estimator of δ.

Chapter Topics

The estimators of the mean, variance, and covariance that we have considered in this section have been natural in a sense. However, for other parameters, it is not clear how to even find a reasonable estimator in the first place. In the next several sections, we will consider the problem of constructing estimators. Then we return to the study of the mathematical properties of estimators, and consider the question of when we can know that an estimator is the best possible, given the data.
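Finally, the covariance estimators of the last subsection can be illustrated in the same way. A minimal sketch (assuming numpy; the bivariate model and parameter values are arbitrary choices) compares W(X, Y), which uses the known means, with the sample covariance S(X, Y):

```python
import numpy as np

# Assumed bivariate model: Y = 0.5 * X + noise, with X standard normal,
# so delta = cov(X, Y) = 0.5; the means mu = nu = 0 are treated as known for W.
rng = np.random.default_rng(6)
n, reps = 25, 100_000
mu, nu, delta = 0.0, 0.0, 0.5

X = rng.normal(size=(reps, n))
Y = 0.5 * X + rng.normal(scale=0.5, size=(reps, n))

W = np.mean((X - mu) * (Y - nu), axis=1)                           # known means
S = np.sum((X - X.mean(axis=1, keepdims=True)) *
           (Y - Y.mean(axis=1, keepdims=True)), axis=1) / (n - 1)  # sample means

print(f"E(W) ~ {W.mean():.4f}   E(S) ~ {S.mean():.4f}   (true delta = {delta})")
# Both averages should be close to delta, in line with Exercises 17 and 18.
```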