
Statistics - Lecture One

Charlotte Wickham
wickham@stat.berkeley.edu
http://www.stat.berkeley.edu/~wickham/

Outline
1. Basic ideas about estimation
2. Method of Moments
3. Maximum Likelihood
4. Confidence Intervals
5. Introduction to the Bootstrap

These notes follow Rice [2007] very closely.

Introduction

Our aim is to learn about a random process by observing a sample of outcomes. Generally, we have a sample X_1, ..., X_n drawn at random and we want to learn about their underlying distribution. For example, we might know that the distribution is normal and want to estimate the parameters for the mean and variance. We can divide the models we might consider into two classes: parametric models and non-parametric models. A parametric model can be defined up to a finite-dimensional parameter; otherwise the model is non-parametric. For example, whenever we assume the random variable follows a particular probability distribution up to an unknown parameter, we are considering a parametric model. In a non-parametric model the number and nature of the parameters is flexible and not fixed in advance. Here we will concentrate on estimation in the parametric case.

1 Estimation

Given a model that is defined up to a finite set of parameters and a random sample X_1, ..., X_n from the model, we wish to come up with a function of the sample that estimates the parameters we don't know. Once we have decided on this estimator, θ̂_n = t(X_1, ..., X_n), it is only useful for inference if we know something about how it behaves. Since θ̂_n is a function of a random sample, it is itself a random variable, and its behavior will depend on the sample size n.

Consider, for example, estimating the population mean of a normally distributed population (for illustrative purposes say N(10, 9)). The most obvious approach is to simply draw a sample and calculate the sample mean. If we repeat this process with a new sample we would expect to get a different estimate. The distribution that results from repeated sampling is called the sampling distribution of the estimate. Figure 1 illustrates 500 estimates of the population mean, each based on a sample of size 100. We can see that our estimates are generally centered around the true value of 10, but there is some variation (the standard error here is σ/√n = 3/√100 = 0.3).

Figure 1: 500 estimates of the population mean, each based on a sample of size 100.

These observations translate to more formal ideas: the expectation of the estimator and the standard error of the estimator. Of course, it is even nicer if we can also parameterize the sampling distribution.

So, what do we look for in a good estimator? Obviously, we want our estimate to be close to the true value. We also want θ̂_n to behave well as the sample size n increases: a large sample should give a more accurate estimate than a small one. Going back to our previous example, we can look at the difference in behavior for different sample sizes. Each histogram in Figure 2 represents 500 estimates of the population mean for sample sizes of 5, 10, 25, 50 and 100. We can see that the standard deviation of the estimate shrinks as the sample size increases. Formally, this is embodied in the principle of consistency. A consistent estimator converges to the true parameter value as the sample size increases.
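To make the sampling-distribution idea concrete, here is a minimal simulation sketch in Python (the language, seed, and summary choices are illustrative assumptions, not part of the original notes): it draws 500 samples of size 100 from N(10, 9) and summarizes the resulting sample means.

import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

n_reps, n = 500, 100
# N(10, 9): mean 10, variance 9, so standard deviation 3
estimates = np.array([rng.normal(loc=10, scale=3, size=n).mean()
                      for _ in range(n_reps)])

print(estimates.mean())        # close to the true mean, 10
print(estimates.std(ddof=1))   # close to the standard error 3/sqrt(100) = 0.3

A histogram of the estimates array is the analogue of Figure 1.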

Our estimate X̄ for the population mean of a normal distribution seems very well behaved. Now for some examples of not-so-well-behaved estimators.

Consistent but biased estimator

Here we estimate the variance of the normal distribution used above (i.e. the true variance is 9). We use the estimate

\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,

which happens to be the maximum likelihood estimate (to be discussed later). Histograms of 500 estimates based on different sample sizes are shown in Figure 3.

Figure 2: 500 estimates of the population mean based on sample sizes of 5, 10, 25, 50 and 100.

Figure 3: 500 maximum likelihood estimates of the population variance based on sample sizes of 5, 10, 25, 50 and 100.

For small sample sizes we can see that the estimates are not centered around 9. This is an example of a biased estimator. As the sample size increases the estimates move closer to being centered around 9; the estimator is said to be asymptotically unbiased. The estimates also have smaller variance as the sample size increases, and this estimator is consistent.

Inconsistent estimator

It is very easy to come up with inconsistent estimators. After all, any function of the sample can be called an estimator, even if it clearly will not have nice properties. For example, we could use the sample median to estimate the population mean. If the underlying distribution is not symmetric, so that its median and mean differ, this will clearly be a poor estimator. This is illustrated in Figure 4, where the underlying distribution is exponential with mean 1. We can see that although the estimates have decreasing variance, they are not getting closer to the true value of 1.

Figure 4: 500 estimates of the population mean of an exponential distribution using the sample median, based on sample sizes of 5, 10, 25, 50 and 100.

Formal Definitions

Consistency: Let θ̂_n be an estimate of θ based on a sample of size n. θ̂_n is consistent in probability if θ̂_n converges in probability to θ; that is, for any ε > 0,

P(|\hat{\theta}_n - \theta| > \epsilon) \to 0 \quad \text{as } n \to \infty.

Bias: θ̂_n is unbiased if E(θ̂_n) = θ.

Efficiency: An estimator is efficient if it has the lowest possible variance among all unbiased estimators.

Mean Squared Error: The quality of an estimator is generally judged by its mean squared error (MSE),

MSE = variance + bias^2.
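To make these definitions concrete, here is a small simulation sketch (Python, with settings of my own choosing, not part of the original notes). It estimates the bias of the maximum likelihood variance estimator σ̂² = (1/n)Σ(x_i − x̄)² for N(10, 9) data at several sample sizes; the empirical bias shrinks toward zero as n grows, matching Figure 3.

import numpy as np

rng = np.random.default_rng(1)  # assumed seed
true_var = 9.0

for n in (5, 10, 25, 50, 100):
    # 500 replications of the 1/n ("MLE") variance estimator
    est = np.array([np.var(rng.normal(10, 3, size=n), ddof=0)
                    for _ in range(500)])
    # empirical bias; theory gives E[sigma_hat^2] - sigma^2 = -sigma^2 / n
    print(n, est.mean() - true_var)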

2 Method of Moments

Let X be a random variable following some distribution. The kth moment of the distribution is defined as

\mu_k = E(X^k).

For example, μ_1 = E(X) and μ_2 = Var(X) + E(X)². The sample moments of observations X_1, X_2, ..., X_n, independent and identically distributed (iid) from some distribution, are defined as

\hat{\mu}_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k.

For example, μ̂_1 = X̄ is the familiar sample mean and μ̂_2 = σ̂² + X̄², where σ̂ is the standard deviation of the sample. The method of moments estimator simply equates the moments of the distribution with the sample moments (μ_k = μ̂_k) and solves for the unknown parameters. Note that this requires the distribution to have finite moments.

Example - Poisson

Assume X_1, ..., X_n are drawn iid from a Poisson distribution with mass function

P(X = x) = \lambda^x e^{-\lambda} / x!, \quad x = 0, 1, 2, ...,

where λ is an unknown parameter. Check that E(X) = λ. So μ_1 = E(X) = λ, and equating μ_1 = μ̂_1 = X̄ gives λ̂ = X̄. Hence, the method of moments estimator of λ is the sample mean.

Example - Gamma

Assume X_1, ..., X_n are drawn iid from a Gamma distribution with density

f(x \mid \alpha, \lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}, \quad x \geq 0,

where λ and α are unknown parameters. The first two moments of the gamma distribution are (check this yourself)

\mu_1 = \frac{\alpha}{\lambda}, \qquad \mu_2 = \frac{\alpha(\alpha + 1)}{\lambda^2}.

From the first equation,

\lambda = \frac{\alpha}{\mu_1}.

Substituting this into the second equation gives

\frac{\mu_2}{\mu_1^2} = \frac{\alpha + 1}{\alpha},

or

\alpha = \frac{\mu_1^2}{\mu_2 - \mu_1^2}.

Then we have

\lambda = \frac{\alpha}{\mu_1} = \frac{\mu_1}{\mu_2 - \mu_1^2}.

We substitute in the sample analogs of the moments and find that the method of moments estimators are

\hat{\lambda} = \frac{\bar{X}}{\hat{\sigma}^2}, \qquad \hat{\alpha} = \frac{\bar{X}^2}{\hat{\sigma}^2}.

Properties of the method of moments estimator

Nice properties:
- it is consistent
- sometimes easier to calculate than maximum likelihood estimates

Not so nice properties:
- sometimes not sufficient (sufficiency has a formal definition, but intuitively it means that all the data relevant to estimating the parameter of interest is used)
- sometimes gives estimates outside the parameter space

3 Maximum Likelihood

Let X_1, X_2, ..., X_n be a random vector of observations with joint density function f(x_1, ..., x_n | θ). Then the likelihood of θ as a function of the observed values X_i = x_i is defined as

lik(\theta) = f(x_1, \ldots, x_n \mid \theta).

The maximum likelihood estimate (MLE) of the parameter θ is the value of θ that maximizes the likelihood function. In general, it is easier to maximize the natural log of the likelihood. In the case that the X_i are iid, the log likelihood takes the form

l(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta).
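As a quick illustration of maximizing a log likelihood directly, here is a short sketch (Python, with simulated data and a grid search of my own choosing, not from the original notes). It evaluates the Poisson log likelihood over a grid of λ values and picks the maximizer, which lands very close to the sample mean, the answer derived analytically in the Poisson example below.

import numpy as np
from scipy.special import gammaln  # log(x!) = gammaln(x + 1)

rng = np.random.default_rng(2)        # assumed seed
x = rng.poisson(lam=3.0, size=200)    # simulated iid Poisson data

def loglik(lam):
    # l(lambda) = sum_i (x_i log(lambda) - lambda - log(x_i!))
    return np.sum(x * np.log(lam) - lam - gammaln(x + 1))

grid = np.linspace(0.1, 10, 1000)
lam_hat = grid[np.argmax([loglik(l) for l in grid])]
print(lam_hat, x.mean())  # the two should nearly coincide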

Example - Poisson

Assume the same conditions as the example in the previous section. The log likelihood is

l(\lambda) = \sum_{i=1}^{n} \left( x_i \log \lambda - \lambda - \log x_i! \right).

To find the maximum we set the first derivative to zero,

l'(\lambda) = \sum_{i=1}^{n} \left( \frac{x_i}{\lambda} - 1 \right) = \frac{1}{\lambda} \sum_{i=1}^{n} x_i - n = 0.

Solving for λ we find that the MLE is

\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{X}.

Note that this agrees with the method of moments estimator.

Example - Gamma

Again, assume the same conditions as the Gamma example in the previous section. The log likelihood is

l(\alpha, \lambda) = \sum_{i=1}^{n} \left( -\log \Gamma(\alpha) + \alpha \log \lambda + (\alpha - 1) \log x_i - \lambda x_i \right).

In this case we have two parameters, so we take the partial derivatives and set them both to zero:

\frac{\partial l}{\partial \alpha} = \sum_{i=1}^{n} \left( -\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \log \lambda + \log x_i \right) = n \log \lambda + \sum_{i=1}^{n} \log x_i - n \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = 0,

\frac{\partial l}{\partial \lambda} = \sum_{i=1}^{n} \left( \frac{\alpha}{\lambda} - x_i \right) = \frac{n \alpha}{\lambda} - \sum_{i=1}^{n} x_i = 0.

The second equation gives the MLE for λ as

\hat{\lambda} = \frac{\hat{\alpha}}{\bar{X}}.

Substituting this into the first equation, we find that the MLE for α must satisfy

n \log \hat{\alpha} - n \log \bar{X} + \sum_{i=1}^{n} \log x_i - n \frac{\Gamma'(\hat{\alpha})}{\Gamma(\hat{\alpha})} = 0.

This equation needs to be solved by numerical means. Note however this is a different estimate to that given by the method of moments.
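The equation for α̂ has no closed form, but it is a one-dimensional root-finding problem. Here is a sketch of one way it might be solved (Python with scipy; the simulated data, seed, and bracketing interval are illustrative assumptions, not part of the original notes). Note that digamma is Γ'(α)/Γ(α).

import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(3)                      # assumed seed
x = rng.gamma(shape=0.6, scale=1 / 1.1, size=200)   # simulated Gamma data (scale = 1/lambda)

xbar, mean_log = x.mean(), np.log(x).mean()

def score(alpha):
    # The alpha equation above, divided through by n:
    # log(alpha) - log(xbar) + mean(log x_i) - digamma(alpha) = 0
    return np.log(alpha) - np.log(xbar) + mean_log - digamma(alpha)

alpha_hat = brentq(score, 1e-6, 100.0)   # bracket assumed wide enough for a sign change
lambda_hat = alpha_hat / xbar            # from the lambda equation
print(alpha_hat, lambda_hat)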

Properties of maximum likelihood estimation

Nice properties:
- consistent
- very widely applicable
- unaffected by monotonic transformations of the data
- the MLE of a function of the parameters is that function of the MLE
- theory provides large-sample properties
- asymptotically efficient estimators

Not so nice properties:
- may be slightly biased
- a parametric model is required and must adequately describe the process generating the data
- can be computationally demanding
- fails in some cases, e.g. if there are too many nuisance parameters

(More on the asymptotic properties of the MLE in the second Stats lecture.)

4 Confidence Intervals

Confidence intervals are an intuitive way to present information about an estimate and its uncertainty. Formally, a 100(1 − α)% confidence interval is constructed so that, under repeated sampling, the interval contains the true value of the parameter 100(1 − α)% of the time. Since a confidence interval is calculated from observations of a random variable, it is itself random. Each time a sample is taken (or an experiment performed) we would expect to get a different interval. If we calculate a 95% confidence interval each time, we would expect about 1 in 20 of the intervals to miss the true parameter. Figure 5 illustrates this. Fifty samples of size thirty are drawn at random from a population with mean 15. For each sample a 95% confidence interval for the population mean is calculated, and these fifty intervals are plotted as vertical lines. We can see that 4 intervals missed the true value.

Example - Population mean

Say we want a 100(1 − α)% confidence interval for the population mean μ. By the Central Limit Theorem, √n(X̄ − μ)/σ is approximately distributed N(0, 1). Let z(α) be the value such that the area to the right of z(α) under the standard normal density is α. By the symmetry of the normal, z(1 − α) = −z(α). So,

P\left( -z(\alpha/2) \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le z(\alpha/2) \right) = 1 - \alpha.

Rearranging this we find

P\left( \bar{X} - z(\alpha/2)\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z(\alpha/2)\,\sigma/\sqrt{n} \right) = 1 - \alpha.

So a 100(1 − α)% confidence interval for μ is (X̄ − z(α/2)σ/√n, X̄ + z(α/2)σ/√n). Of course, in practice we don't know the value of σ, so an estimate is generally plugged in instead.
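A small coverage simulation in the spirit of Figure 5 might look like the following (a Python sketch; the population, sample size, and number of repetitions are assumptions for illustration, and the known σ is plugged in directly).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)   # assumed seed
mu, sigma, n, alpha = 15.0, 3.0, 30, 0.05
z = norm.ppf(1 - alpha / 2)      # z(alpha/2), about 1.96

covered = 0
for _ in range(50):
    x = rng.normal(mu, sigma, size=n)
    half_width = z * sigma / np.sqrt(n)
    lo, hi = x.mean() - half_width, x.mean() + half_width
    covered += (lo <= mu <= hi)

print(covered, "of 50 intervals contain the true mean")

With a 95% interval we expect roughly 47 or 48 of the 50 intervals to cover the true mean.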

Figure 5: Fifty 95% confidence intervals for the population mean calculated from samples of size thirty. The horizontal line is the true population mean.

5 Bootstrap

What if we don't know the properties of our estimator? The idea of the bootstrap is to use simulation on the data we have to get an estimate of the sampling distribution of the estimator.

General Idea

Repeatedly resample the data with replacement and calculate the estimate each time. The distribution of these bootstrap estimates approximates the sampling distribution of the estimate.

Details

Assume we have a sample X_1, ..., X_n from some distribution known up to a parameter θ, and an estimator θ̂_n = t(X_1, ..., X_n) of θ. Now we resample from the data with replacement. This gives a new sample X_1^*, ..., X_n^* of size n that may contain any of the original sample points any number of times; this resampled data set is one bootstrap replicate. From it we calculate

\hat{\theta}_n^* = t(X_1^*, \ldots, X_n^*),

an estimate of θ from a bootstrap replicate. The theory behind the bootstrap says that the distribution of (θ̂ − θ) can be approximated by the distribution of (θ̂* − θ̂).

This method is probably best explained through an example. Below we use the bootstrap to investigate the sampling distribution of the method of moments estimators for the parameters of a gamma distribution.

Example - Method of moments for gamma

We have a sample of size 50 from a Gamma distribution with unknown parameters λ and α. The data are as follows.

0.038 0.009 1.190 0.312 0.015 0.036 0.865 0.425 0.269 0.098
0.498 0.511 0.000 0.159 0.437 1.141 0.002 0.111 0.013 0.150
0.076 0.786 0.985 0.344 0.668 0.018 0.344 0.085 0.460 0.622
0.082 0.944 0.028 0.035 2.018 2.293 0.123 2.604 0.017 0.014
0.108 0.442 3.316 0.679 0.407 0.120 0.654 0.338 1.434 0.508

The sample statistics are X̄ = 0.537 and σ̂² = 0.496. The method of moments estimators are

\hat{\lambda} = \frac{\bar{X}}{\hat{\sigma}^2} = 1.082, \qquad \hat{\alpha} = \frac{\bar{X}^2}{\hat{\sigma}^2} = 0.581.

Figure 6 shows a histogram of the observed data along with the fitted density using the method of moments estimates.

Figure 6: Histogram of the observations along with the density of the estimated gamma distribution.

Now to the bootstrap. We take new samples, with replacement, from the original data. For example, the first resample is (sorted):

0.002 0.014 0.028 0.038 0.108 0.120 0.338 0.460 0.668 0.944
0.013 0.015 0.035 0.076 0.108 0.123 0.344 0.460 0.668 0.985
0.013 0.015 0.035 0.082 0.108 0.123 0.407 0.654 0.679 1.434
0.013 0.017 0.036 0.098 0.108 0.159 0.442 0.654 0.786 1.434
0.014 0.018 0.038 0.108 0.111 0.269 0.460 0.668 0.944 1.434

This resample has sample mean X̄* = 0.66 and sample variance σ̂*² = 0.801, which give method of moments estimates of λ̂* = 0.825 and α̂* = 0.545. We repeat this process for the remaining 999 bootstrap samples and plot the estimates in histograms (Figure 7).
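A sketch of this bootstrap procedure in Python might look like the following (the language, seed, and resampling loop are my own illustration of the procedure described above; the data array holds the 50 observations listed earlier).

import numpy as np

# The 50 observations from the table above.
data = np.array([
    0.038, 0.009, 1.190, 0.312, 0.015, 0.036, 0.865, 0.425, 0.269, 0.098,
    0.498, 0.511, 0.000, 0.159, 0.437, 1.141, 0.002, 0.111, 0.013, 0.150,
    0.076, 0.786, 0.985, 0.344, 0.668, 0.018, 0.344, 0.085, 0.460, 0.622,
    0.082, 0.944, 0.028, 0.035, 2.018, 2.293, 0.123, 2.604, 0.017, 0.014,
    0.108, 0.442, 3.316, 0.679, 0.407, 0.120, 0.654, 0.338, 1.434, 0.508,
])

def mom_gamma(x):
    # Method of moments: lambda_hat = xbar / s2, alpha_hat = xbar^2 / s2,
    # with s2 the (1/n) sample variance, as in the notes.
    xbar, s2 = x.mean(), x.var(ddof=0)
    return xbar / s2, xbar ** 2 / s2

rng = np.random.default_rng(5)      # assumed seed
B = 1000                            # number of bootstrap replicates
boot = np.array([mom_gamma(rng.choice(data, size=data.size, replace=True))
                 for _ in range(B)])
lam_star, alpha_star = boot[:, 0], boot[:, 1]

print(mom_gamma(data))                       # roughly (1.08, 0.58), as in the text
print(lam_star.mean(), alpha_star.mean())    # averages of the bootstrap estimates
print(lam_star.std(), alpha_star.std())      # bootstrap standard errors

The averages and standard deviations of lam_star and alpha_star correspond to the bias and standard error estimates discussed next.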

We can use these histograms as an approximation to the sampling distribution. We can see that the distribution for both parameters is right skewed.

Figure 7: Histograms of the estimates of λ and α from the bootstrap replicates. The dotted lines correspond to the estimates from the original sample.

We can estimate the bias in the estimates by comparing the average of the bootstrap estimates with the initial estimates. With B = 1000 bootstrap replicates,

\hat{\lambda}^*_{ave} = \frac{1}{B} \sum_{i=1}^{B} \hat{\lambda}^*_i = 1.208, \qquad \hat{\alpha}^*_{ave} = \frac{1}{B} \sum_{i=1}^{B} \hat{\alpha}^*_i = 0.631.

Comparing these to the initial estimates, we can see that the method of moments estimators might be slightly positively biased. We can also use the bootstrap replicates to estimate the standard errors of the estimates: the estimated sampling standard errors are simply the standard deviations of the bootstrap estimates,

\hat{se}_\lambda = \sqrt{\frac{1}{B} \sum_{i=1}^{B} \left( \hat{\lambda}^*_i - \hat{\lambda}^*_{ave} \right)^2} = 0.374, \qquad \hat{se}_\alpha = \sqrt{\frac{1}{B} \sum_{i=1}^{B} \left( \hat{\alpha}^*_i - \hat{\alpha}^*_{ave} \right)^2} = 0.15.

Approximate Confidence Intervals

We can also use the bootstrap to find approximate confidence intervals. The bootstrap procedure is the same. Once the bootstrap distribution is obtained, we can simply take the 2.5 and 97.5 percentiles as the lower and upper limits of a 95% confidence interval.
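Continuing the earlier sketch, the percentile intervals can be read off the bootstrap estimates directly (this assumes the lam_star and alpha_star arrays from the hypothetical code above).

import numpy as np

# 95% percentile bootstrap intervals from the bootstrap estimates computed above
lam_ci = np.percentile(lam_star, [2.5, 97.5])
alpha_ci = np.percentile(alpha_star, [2.5, 97.5])
print("lambda:", lam_ci)
print("alpha: ", alpha_ci)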

Figure 8: Histograms of the estimates of λ and α from the bootstrap replicates. The dotted lines correspond to the estimates from the original sample.

Example - Gamma

We find a 95% confidence interval for each parameter simply by finding the corresponding quantiles of the approximate sampling distributions found previously. These are plotted as dotted lines in Figure 8. The intervals are

α: (0.404, 0.982), λ: (0.770, 2.195).

In words we would say: we estimate the true value of α to be between 0.404 and 0.982 with 95% confidence, and similarly for λ.

References

John A. Rice. Mathematical Statistics and Data Analysis. Duxbury Press, Belmont, CA, third edition, 2007.