
Gov 2001: Section 4
February 20, 2013

Outline
1. The Likelihood Model with Covariates
2. Likelihood Ratio Test
3. The Central Limit Theorem and the MLE
4. What We Can Do with a Normal MLE
5. Efficiency

1. The Likelihood Model with Covariates

A roadmap
We started by introducing the concept of likelihood in the simplest univariate context: one observation, one variable. Then we moved to more than one observation and multiplied likelihoods together. Now we introduce covariates:

Stochastic: $Y_i \sim f(y_i \mid \gamma)$
Systematic: $\gamma = g(X_i, \theta)$

This lets us find estimated coefficients for our covariates, which is ultimately what we are really interested in!

A roadmap (ctd.)
Key to all of this is the distinction between stochastic and systematic components:

Stochastic: the probability distribution of the data; key to identifying what model (Poisson, binomial, etc.) you should use. E.g., $Y_i \sim f(y_i \mid \gamma)$.
Systematic: how the parameters of the probability distribution vary over your covariates; key to incorporating covariates into your model. E.g., $\gamma = g(X_i, \theta)$.

You'll need both parts to model the likelihood.

Back to our Running Example
Ex. Waiting for the Redline: how long will it take for the next T to get here? $Y$ is an Exponential random variable with parameter $\lambda$:
$$f(y) = \lambda e^{-\lambda y}$$
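As a quick visual aid (my addition, not from the original slides), we can plot this density for a few values of $\lambda$; a larger rate means shorter expected waits, since $E[Y] = 1/\lambda$:

# Sketch (not in the original slides): the Exponential density for a
# few rate parameters. Larger lambda puts more mass on short waits.
curve(dexp(x, rate = 0.5), from = 0, to = 10, xlab = "y", ylab = "f(y)")
curve(dexp(x, rate = 1), from = 0, to = 10, add = TRUE, lty = 2)
curve(dexp(x, rate = 2), from = 0, to = 10, add = TRUE, lty = 3)
legend("topright", c("lambda = 0.5", "lambda = 1", "lambda = 2"), lty = 1:3)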

Back to our Running Example (ctd.)
How long will it take for the next T to get here? But this time we want to add covariates. What do you think affects the wait for the Redline?

How would we model this?
We know the stochastic component:
$$Y_i \sim \text{Exponential}(\lambda_i), \qquad f(y_i \mid \lambda_i) = \lambda_i e^{-\lambda_i y_i}$$
Remember, for an Exponential, $\mu_i = 1/\lambda_i$. So we're going to set the systematic component:
$$\mu_i = \exp(X_i \beta), \qquad \lambda_i = \frac{1}{\exp(X_i \beta)}$$
Why do we use the exp? What are the parameters here?

Solve for the log-likelihood
First, write the log-likelihood in terms of $\lambda_i$:
$$L(\lambda_i \mid y_i) \propto f(y_i \mid \lambda_i) = \lambda_i e^{-\lambda_i y_i}$$
$$L(\lambda \mid y) \propto \prod_{i=1}^n \lambda_i e^{-\lambda_i y_i}$$
$$\ln L(\lambda \mid y) \propto \sum_{i=1}^n \left( \ln \lambda_i - \lambda_i y_i \right)$$

Solve for the log-likelihood (ctd.)
Next, plug in the systematic component:
$$\ln L(\beta \mid y) \propto \sum_{i=1}^n \left( \ln \frac{1}{\exp(X_i \beta)} - \frac{1}{\exp(X_i \beta)} y_i \right) = \sum_{i=1}^n \left( -X_i \beta - \frac{y_i}{\exp(X_i \beta)} \right)$$

Solve Using R
I'm going to say that whether or not it is Friday, and the minutes behind schedule, are important covariates. I'm going to create some fake data:

set.seed(02139)
n <- 1000
Friday <- sample(c(0,1), n, replace=T)
minsSch <- rnorm(n, 3, .5)
Y <- rexp(n, rate = 1/exp(1.25 - .5*Friday + .2*minsSch))
data <- as.data.frame(cbind(Y, Friday, minsSch))

Let's look at Y

hist(Y, col = "goldenrod", main = "Distribution of Y")

[Figure: histogram of Y, frequency against Y from 0 to about 40; heavily right-skewed, as expected for Exponential data.]

Solve with Zelig
We could solve with Zelig:

library(Zelig)
# First, create an indicator meaning every observation is fully
# observed (a 100% "death" rate in survival terms, i.e., no censoring).
data$ind <- rep(1, n)
# Next, specify the model
z.out <- zelig(Surv(Y, ind) ~ Friday + minsSch, data=data, model="exp")

Solve with Zelig (ctd.)

summary(z.out)

Call:
zelig(formula = Surv(Y, ind) ~ Friday + minsSch, model = "exp", data = data)

             Value Std. Error     z        p
(Intercept)  1.089     0.1928  5.65 1.64e-08
Friday      -0.463     0.0635 -7.29 3.02e-13
minsSch      0.212     0.0613  3.46 5.41e-04

Scale fixed at 1

Exponential distribution
Loglik(model)= -2503.4   Loglik(intercept only)= -2538.4
Chisq= 69.99 on 2 degrees of freedom, p= 6.7e-16
Number of Newton-Raphson Iterations: 4

Solving Manually
Remember the log-likelihood we solved for before:
$$\ln L(\beta \mid y) \propto \sum_{i=1}^n \left( -X_i \beta - \frac{y_i}{\exp(X_i \beta)} \right)$$
We can program the log-likelihood in two ways:

llexp <- function(param, y, x){
  # rate parameter implied by the systematic component
  rate <- 1/exp(x%*%param)
  sum(dexp(y, rate=rate, log=T))
}

llexp2 <- function(param, y, x){
  # the simplified analytic form derived above
  xb <- x%*%param
  sum(-xb - 1/exp(xb)*y)
}
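A quick sanity check (my addition, not in the original slides): the two implementations should agree exactly on any data, because dexp(y, rate, log=TRUE) returns exactly log(rate) - rate*y. Assuming the fake data from the earlier slide is still in memory:

# Sanity check: both log-likelihoods return the same value at an
# arbitrary parameter vector.
testX <- cbind(1, Friday, minsSch)
testpar <- c(1, -0.5, 0.2)
all.equal(llexp(testpar, y = Y, x = testX),
          llexp2(testpar, y = Y, x = testX))  # TRUE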

Solving Manually: Optimize

# Create X with an intercept
X <- cbind(1, Friday, minsSch)
# Specify starting values
param <- c(1,1,1)
# Solve using optim
out <- optim(param, fn=llexp, y=Y, x=X, method="BFGS",
             hessian=T, control=list(fnscale=-1))

Solving Manually: Output

> out$par
[1]  1.0885871 -0.4634621  0.2120591

Does this match the Zelig output?
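We can also check the standard errors (my addition, using the same negative-inverse-Hessian trick the slides use later for predictions):

# The negative inverse Hessian of the log-likelihood estimates the
# variance-covariance matrix of the MLEs (a sketch, not in the
# original slides).
se <- sqrt(diag(-solve(out$hessian)))
round(cbind(estimate = out$par, se = se), 3)
# Compare to Zelig: 1.089 (0.193), -0.463 (0.064), 0.212 (0.061).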

2. Likelihood Ratio Test

Likelihood Ratio Tests
Useful for when you are comparing two models. We'll call these restricted and unrestricted:
Unrestricted: $\beta_0 + \beta_1 X_1 + \beta_2 X_2$
Restricted: $\beta_0 + \beta_2 X_2$
We want to test the usefulness of the parameters included in the unrestricted model but omitted in the restricted model.

Likelihood Ratio Tests (ctd.)
Here's how to operationalize this. Let $L$ be the maximum of the unrestricted likelihood, and let $L_r$ be the maximum of the restricted likelihood. Adding more variables can never decrease the maximized likelihood. Thus $L \ge L_r$, or $\frac{L_r}{L} \le 1$. If the likelihood ratio is exactly 1, then the extra parameters have no effect at all.

Likelihood Ratio Tests (ctd.)
Now, let's define a test statistic:
$$R = -2 \ln \frac{L_r}{L} = 2\left(\ln L - \ln L_r\right)$$
$R$ will always be greater than or equal to zero. Under the null that the restrictions are valid, it asymptotically follows a $\chi^2$ distribution with $m$ degrees of freedom, where $m$ is the number of restrictions. Key question: how much greater than zero does $R$ have to be in order to convince us that the difference is due to systematic differences between the two models?
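As a small convenience (my sketch, not from the slides), the test can be wrapped in a helper; the next slides do the same computation inline for our running example:

# A generic LR test helper: takes the two maximized log-likelihoods
# and the number of restrictions, returns R and its asymptotic
# chi-squared p-value.
lr.test <- function(loglik.unres, loglik.res, df){
  R <- 2 * (loglik.unres - loglik.res)
  c(statistic = R, p.value = 1 - pchisq(R, df = df))
}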

Back to Our Example
What if we wanted to test whether the minutes behind schedule should be in our model at all?

> unrestricted <- optim(param, fn=llexp, y=Y, x=X, method="BFGS",
+   hessian=T, control=list(fnscale=-1))
> unrestricted$value
[1] -2503.445

versus

> restricted <- optim(c(1,1), fn=llexp, y=Y, x=cbind(1, Friday),
+   method="BFGS", hessian=T, control=list(fnscale=-1))
> restricted$value
[1] -2509.471

Back to Our Example (ctd.)
Under the null that the restriction is valid, the test statistic is distributed $\chi^2$ with one degree of freedom:

> r <- 2*(unrestricted$value - restricted$value)
> 1 - pchisq(r, df=1)
[1] 0.0005176814

So the probability of getting a test statistic this large under the null is extremely small. We reject.

3. The Central Limit Theorem and the MLE

The Central Limit Theorem and the MLE
Once the ML estimates are calculated, we'll want to know how good they are. How much information does the MLE contain about the underlying parameter? The MLE alone isn't satisfying; we need a way to quantify uncertainty.

Convince Yourself of a Normal Distribution for the MLE
I'm going to generate 1000 datasets of 10 observations each from an Exponential with $\lambda = .5$:

> n <- 10
> data <- sapply(seq(1,1000), function(x) rexp(n, rate=.5))
> dim(data)
[1]   10 1000

For each of these datasets, I'm going to find the maximum likelihood estimate for $\lambda$:

llexp <- function(param, y){
  sum(dexp(y, rate=param, log=T))
}
out <- NULL
for(i in 1:1000){
  out[i] <- optim(c(1), fn=llexp, y=data[,i], method="BFGS",
                  control=list(fnscale=-1))$par
}

Plot of the Distribution of Lambda when N is 10
[Figure: histogram of the 1000 MLEs of λ for N=10; estimates range from roughly 0.2 to 1.6 and the distribution is visibly right-skewed.]

Plot of the Distribution of Lambda when N is 100
[Figure: histogram of the MLEs for N=100; estimates now range from roughly 0.4 to 0.7 and look much more symmetric around 0.5.]

Plot of the Distribution of Lambda when N is 10000
[Figure: histogram of the MLEs for N=10000; estimates are tightly concentrated between about 0.485 and 0.515 and the shape is approximately Normal.]
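To reproduce these three plots for any sample size, the simulation can be wrapped in a function (my sketch, reusing the single-parameter llexp defined above):

# Simulate the sampling distribution of the MLE of lambda for a given
# sample size n (a sketch, not in the original slides).
simMLE <- function(n, nsims = 1000, rate = 0.5){
  sapply(1:nsims, function(s){
    y <- rexp(n, rate = rate)
    optim(c(1), fn = llexp, y = y, method = "BFGS",
          control = list(fnscale = -1))$par
  })
}
hist(simMLE(10),    main = "Histogram of Lambda for N=10")
hist(simMLE(100),   main = "Histogram of Lambda for N=100")
hist(simMLE(10000), main = "Histogram of Lambda for N=10000")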

How do we think about this intuitively?
The Central Limit Theorem states that the mean of independent random variables becomes approximately Normal as n goes to infinity. But we're not talking about a mean! Yes, but the log-likelihood is a sum of many independent contributions, so the maximum of the log-likelihood behaves essentially like a mean, and we use this maximum to estimate our parameter. Therefore, as n gets larger and there are more likelihood contributions to aggregate, the distribution of the estimate becomes more Normal.

Our Normal Variable
So for large $n$, our estimate $\hat\theta$ is distributed Normally with mean equal to the true value of $\theta$ and variance $[I(\hat\theta)]^{-1}$.

Measure of curvature, the Fisher information:
$$I(\hat\theta) = -\left.\frac{\partial^2 \ln L(\theta)}{\partial \theta^2}\right|_{\theta = \hat\theta}$$

The inverse of the Fisher information gives us $\widehat{\text{Var}}(\hat\theta)$:
$$[I(\hat\theta)]^{-1} = \widehat{\text{Var}}(\hat\theta)$$

The square root of $\widehat{\text{Var}}(\hat\theta)$ gives us $SE(\hat\theta)$:
$$SE(\hat\theta) = \sqrt{\widehat{\text{Var}}(\hat\theta)}$$
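For our Exponential simulation this can be worked out analytically (my worked example, applying the slide's formula):
$$\ln L(\lambda \mid y) = \sum_{i=1}^n \left(\ln \lambda - \lambda y_i\right), \qquad \frac{\partial^2 \ln L}{\partial \lambda^2} = -\frac{n}{\lambda^2}, \qquad I(\lambda) = \frac{n}{\lambda^2},$$
so $SE(\hat\lambda) \approx \hat\lambda/\sqrt{n}$. With $\lambda = 0.5$ and $n = 100$ this gives $SE \approx 0.05$, consistent with the spread of the N=100 histogram above.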

4. What We Can Do with a Normal MLE

We Can Compute Confidence Intervals
Under $H_0: \theta = \theta_0$,
$$Z_{\theta_0} = \frac{\hat\theta - \theta_0}{SE(\hat\theta)} \sim N(0, 1)$$
Given $\alpha$, the confidence interval is
$$\left[\hat\theta \pm z_{\alpha/2} \, SE(\hat\theta)\right]$$
where $z_{\alpha/2}$ is the $N(0,1)$ critical value. Why? Find all $\theta_0$ such that
$$P\left(-z_{\alpha/2} \le Z_{\theta_0} \le z_{\alpha/2}\right) = 1 - \alpha$$
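Applied to our running example (my sketch, not from the slides), where out is the covariate-model optim output from the Solving Manually slides:

# A 95% CI for the Friday coefficient, using the Normal approximation
# to the MLE and SEs from the inverted Hessian.
se <- sqrt(diag(-solve(out$hessian)))
out$par[2] + c(-1, 1) * qnorm(0.975) * se[2]
# roughly (-0.59, -0.34); zero is well outside the interval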

We Can Make Predictions
First, we find the variance-covariance matrix of the parameters we estimated before:

# llexp here is the covariate version defined on the Solving Manually slide
out <- optim(param, fn=llexp, y=Y, x=X, method="BFGS",
             hessian=T, control=list(fnscale=-1))
varcv <- -solve(out$hessian)

Using our assumption that for large n the $\beta$'s are distributed Normally, we simulate $\beta$'s:

library(mvtnorm)  # for rmvnorm
simbetas <- rmvnorm(1000, mean=out$par, sigma=varcv)

We Can Make Predictions (ctd.)
Say I wanted to know how much longer I will have to wait for the Redline on a Friday. First, I create covariates that will let me predict the wait on Friday:

predcovs <- c(1, 1, mean(minsSch))

Then I simulate Y's from our model and the simulated betas:

simyfriday <- apply(simbetas, 1, function(x)
  rexp(n=1, rate=1/exp(predcovs%*%x)))
plot(density(simyfriday), main="Expected Wait for the Redline on Friday",
     xlab="mins")

I would do the same for not Friday:

predcovs <- c(1, 0, mean(minsSch))
simy <- apply(simbetas, 1, function(x)
  rexp(n=1, rate=1/exp(predcovs%*%x)))
plot(density(simy), main="Expected Wait for the Redline not Friday",
     xlab="mins")

We Can Make Predictions (ctd.)
[Figure: two density plots of the simulated wait times in minutes, "Expected Wait for the Redline on Friday" and "Expected Wait for the Redline not Friday"; the Friday distribution is more concentrated at shorter waits.]
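The two predictive distributions can also be summarized numerically (my addition, not in the original slides), for example as a first difference:

# First difference: Friday minus not-Friday simulated waits, paired
# by the same beta draw.
mean(simyfriday) - mean(simy)  # negative: shorter expected wait on Friday
quantile(simyfriday - simy, c(0.025, 0.5, 0.975))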

5. Efficiency

Mean Squared Error and Efficiency
For estimator $\hat\theta$ and true parameter $\theta_0$,
$$MSE(\hat\theta) = E[(\hat\theta - \theta_0)^2] = \text{Var}(\hat\theta) + \text{Bias}(\hat\theta, \theta_0)^2$$
This is the bias-variance tradeoff. When two estimators $\hat\theta$ and $\tilde\theta$ are unbiased, we can compare their efficiency using
$$\text{eff}(\hat\theta, \tilde\theta) = \frac{\text{Var}(\tilde\theta)}{\text{Var}(\hat\theta)}$$
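As a concrete illustration (my simulation sketch, not from the slides), compare two unbiased estimators of the Exponential mean $\mu = 1/\lambda$: the sample mean, and $n \min(Y)$, which is unbiased because $\min(Y) \sim \text{Exponential}(n\lambda)$:

# Efficiency comparison of two unbiased estimators of mu.
set.seed(02139)
n <- 50; mu <- 2
sims <- replicate(5000, {
  y <- rexp(n, rate = 1/mu)
  c(mean = mean(y), nmin = n * min(y))
})
rowMeans(sims)  # both close to mu = 2: unbiased
var(sims["nmin", ]) / var(sims["mean", ])  # about n = 50: the mean is far more efficient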

Efficiency and the Cramér-Rao Inequality
So basically, for unbiased estimators, we want the one with the lowest variance.

Cramér-Rao Inequality: Let $X_1, \ldots, X_n$ be i.i.d. with density function $f(x \mid \theta)$, and let $T = t(X_1, \ldots, X_n)$ be an unbiased estimate of $\theta$. Then, under smoothness assumptions on $f(x \mid \theta)$,
$$\text{Var}(T) \ge \frac{1}{n I_1(\theta)} = \frac{1}{I(\theta)}$$
where $I_1(\theta)$ is the information in a single observation.

Among unbiased estimators, the MLE has the smallest asymptotic variance. The MLE is thus said to be asymptotically efficient.
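As a worked example (my addition, under the slide's smoothness assumptions): writing the Exponential in terms of its mean $\mu = 1/\lambda$, so $f(y \mid \mu) = \mu^{-1} e^{-y/\mu}$, we get
$$\frac{\partial^2}{\partial \mu^2} \ln f(y \mid \mu) = \frac{1}{\mu^2} - \frac{2y}{\mu^3}, \qquad I_1(\mu) = -E\left[\frac{1}{\mu^2} - \frac{2Y}{\mu^3}\right] = \frac{1}{\mu^2},$$
so any unbiased estimator of $\mu$ has variance at least $\mu^2/n$. The sample mean attains the bound exactly, since $\text{Var}(\bar Y) = \mu^2/n$, while $n\min(Y)$ from the simulation above has variance $\mu^2$, a factor of $n$ worse.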