Maximum Likelihood Exercises SOLUTIONS

Exercise 1: Frog covariates

(1) Before we can fit the model, we have to do a little recoding of the data. Both ReedfrogPred$size and ReedfrogPred$pred are text data, so we'll need to convert them to dummy variables before we can use them in the model. R is nice about doing this for you when you use lm(), but it won't do it automatically for custom maximum likelihood estimation using mle2(). So to recode both categorical variables to 0/1 dummies:

# need to recode size to a dummy variable
ReedfrogPred$sizecode <- ifelse( ReedfrogPred$size=="big", 1, 0 )
# need to recode pred to a dummy variable
ReedfrogPred$predcode <- ifelse( ReedfrogPred$pred=="pred", 1, 0 )

Now we can define the first model. All you need to do is modify the code I provided to expand the linear model portion. What we are assuming is that the probability of an individual tadpole surviving is given by:

Pr(survive | parameters) = 1 / (1 + exp(linear model))

So you can build the linear model part in the same way you'd build a linear regression. This code fits the model that includes parameters for baseline survival (the intercept), size, and predator presence:

m1 <- mle2( ReedfrogPred$surv ~ dbinom(
        prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode )),
        size=ReedfrogPred$density ),
    start=list(m=0,s=0,p=0) )

The part that corresponds to the probability model above is:

prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode ))

This is the same as:

Pr(survive | m, s, p) = 1 / (1 + exp(m + s*sizecode + p*predcode))

The estimates you come up with are:

          m           s           p
 -2.9231665   0.6738493    2.703926

What do these mean? To find out, you need to understand how the probability of survival responds to changes in these parameters. The answer is that negative estimates increase probability (as the value of the predictor increases), while positive estimates decrease probability. You can prove this for yourself (you should have) by simply plotting 1/(1 + exp(m)) for a range of m values:

curve( 1/(1+exp(x)), from=-5, to=5 )

[Plot of 1/(1 + exp(x)) for x from -5 to 5: the curve falls from near 1 on the left to near 0 on the right.]

Note that probability is highest for the smallest values of x. So looking at our parameter estimates again:

          m           s           p
 -2.9231665   0.6738493    2.703926

we can conclude that baseline survival, m, is equal to:

1 / (1 + exp(-2.92)) = 0.9489798

That's the probability of survival you get when size is zero and pred is zero. It's very high! The estimates for the other two parameters, s and p, are positive, so each decreases the chance of survival as its predictor increases. Being bigger means dying more. The presence of predators means dying more.

There is actually a deep and precise relationship between these estimates, on the log scale, and the odds of the event happening. The odds of tadpole survival, considering only baseline survival m, is:

Pr(survive) / (1 - Pr(survive)) = 0.9489798 / (1 - 0.9489798) = 18.60008

That means 18.6:1 odds of survival. Very good odds. Now consider the odds of survival when we add a predator. The chance of survival becomes:

1 / (1 + exp(m + p)) = 1 / (1 + exp(-2.92 + 2.70)) = 0.554596

So the new odds are:

0.554596 / (1 - 0.554596) = 1.2453

When predators are present, the odds of survival are 1.25:1. Substantially reduced.
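As a quick check, these odds can be reproduced directly from the fitted coefficients (a minimal sketch, assuming the first model object is named m1 as above):

# baseline probability and odds of survival (sizecode = 0, predcode = 0)
p.base <- 1/(1 + exp( coef(m1)["m"] ))
p.base / (1 - p.base)                                      # about 18.6:1
# probability and odds of survival with a predator present
p.pred <- 1/(1 + exp( coef(m1)["m"] + coef(m1)["p"] ))
p.pred / (1 - p.pred)                                      # about 1.25:1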

Now here's the magic. Compare 18.6:1 to 1.25:1. The proportional change in odds is:

1.2453 / 18.60008 = 0.06694224

This says that the new odds are about 6.7% of the previous odds. By coincidence (not really), if you evaluate exp(-p), where p is the parameter estimate for predation, you get:

exp(-p) = exp(-2.703926) = 0.0669427

That's the same number as before, the proportional change in odds. So you can interpret these coefficients inside the logistic as proportional changes in odds. Just use the formula:

proportional change in odds = exp(-estimate)

You can prove that this formula is correct, with a little algebra. We want to know the number x that satisfies:

[1/(1+exp(a))] / [1 - 1/(1+exp(a))] * x = [1/(1+exp(a+b))] / [1 - 1/(1+exp(a+b))]

for any values a and b. Each side is just an odds, and an odds of this form simplifies: [1/(1+exp(a))] / [1 - 1/(1+exp(a))] = exp(-a). Solving for x therefore yields:

x = exp(-(a+b)) / exp(-a) = exp(-b),

which is the result we induced above, via arithmetic. In summary, to get the proportional change in odds induced by a parameter, per unit of its predictor variable, use exp(-k), where k is the parameter. For this reason, you will sometimes see these logit parameter estimates referred to as log-odds. Coming back to our estimates, the odds changes induced by each are:

> exp( -coef(m1) )
          m           s           p
18.60009093  0.50974267   0.0669428

So you can read these as: (1) the baseline odds of survival are 18.6:1; (2) big tadpoles have half the odds of survival as small tadpoles; (3) when predators are present, a tadpole has 6.7% the odds of survival as when predators are absent.

Now let's try a model that interacts size and predation.

m2 <- mle2( ReedfrogPred$surv ~ dbinom(
        prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode
            + z*ReedfrogPred$sizecode*ReedfrogPred$predcode )),
        size=ReedfrogPred$density ),
    start=list(m=0,s=0,p=0,z=0) )

Which evaluates the probability model:

Pr(survive | m, s, p, z) = 1 / (1 + exp(m + s*sizecode + p*predcode + z*sizecode*predcode))

The estimates are:

          m           s           p           z
  -2.461974   -0.158644     2.13034    1.062006

Remember, positive estimates decrease the chance of survival. So now the estimate for size has changed direction, and being big helps you survive; that makes more sense. This main effect applies in the absence of predators, so we might interpret this as saying that bigger individuals die slightly less when there are no predators around. The interaction of size and predation tells us that being big hurts your survival when predators are around. This implies that predators target big tadpoles. Now converting to odds:

> exp( -coef(m2) )
          m           s           p           z
 11.7272758    1.173588   0.1187968    0.345765

We start with 11.7:1 odds of survival. Being big increases the odds of survival by about 17%. Predators reduce the odds of survival by about 1 - 0.12 = 0.88 = 88%. When a tadpole is big and predators are present, the odds of survival (already accounting for the main effects of size and predation) are reduced by an additional 1 - 0.35 = 0.65 = 65%.

(2) Confidence intervals for the second model I estimated above (using confint()) are:

        2.5 %      97.5 %
m  -2.926983   -2.0512882
s  -0.8043439   0.479364
p   1.652749    2.6486447
z   0.3401298   1.7915809

We can again convert these to changes in odds:

> exp( -confint(m2) )
Profiling...
        2.5 %      97.5 %
m  18.656568    7.7779393
s   2.2352294   0.6197865
p   0.1915240   0.07074703
z   0.7116779   0.16669644

So read the m line as: the baseline odds of survival lie between 18.7:1 and 7.8:1, with 95% confidence. The s line could be read as: the proportional change in odds induced by being big lies between 2.24 times and 0.62 times the original odds. Clearly the effect of size isn't estimated very precisely. The other two parameters, p and z, however, are consistent reductions in odds. So we can have more confidence about interpreting those effects.

To compute the quadratic estimate likelihood intervals, you can use the formula I provided in lecture (and in the source file for this week):

pse <- sqrt(diag( vcov(m2) ))

          m           s           p           z
   0.222047   0.3252592   0.2530069    0.368737

Note that these are the same standard errors you get if you type summary(m2). To make these into 95% confidence intervals, use the z-score that corresponds to a 95% interval, 1.96, and add/subtract 1.96 times the standard error to each parameter estimate:

coef(m2) + pse*1.96 # upper bounds
coef(m2) - pse*1.96 # lower bounds

These intervals are very similar to the profile likelihood intervals we produced with confint(). One reason the quadratic estimation intervals (the ones that are estimated from the vcov matrix) are so similar to those from the likelihood profiles is that the profiles are shaped very much like quadratic functions, in this case. You can see this when you plot the profiles, in the square-root space, using plot(profile(m2)):

[Figure: likelihood profiles for m, s, p, and z, each marked with the 50%, 80%, 90%, 95%, and 99% levels; all four profiles are nearly straight lines.]

If the lines were less straight, there would be more disagreement between the quadratic estimation intervals and the direct profile intervals.

(3) Now to plot the chance of survival, using the estimates. There are many ways to do this, so you may have used a different approach. That's fine. My tactic here is to use a basic bar chart, because we actually have discrete predictor categories. There are four groups of tadpoles, and each has a single estimated chance of survival for every tadpole in the group. The groups are: (1) small/no-pred, (2) small/pred, (3) big/no-pred, (4) big/pred. The code below computes these four probabilities and makes a simple bar chart of them.

p1 <- 1/(1+exp( coef(m2)[1] )) # small/no-pred

p2 <- 1/(1+exp( coef(m2)[1] + coef(m2)[2] )) # large/no-pred
p3 <- 1/(1+exp( coef(m2)[1] + coef(m2)[3] )) # small/pred
p4 <- 1/(1+exp( coef(m2)[1] + coef(m2)[2] + coef(m2)[3] + coef(m2)[4] )) # large/pred
barplot( c(p1,p2,p3,p4),
    names.arg=c("small/no-pred","large/no-pred","small/pred","large/pred"),
    ylab="probability of survival" )

All this is copacetic with our previous interpretation of the parameters.

Exercise 2: Making up data

(1) The histograms of estimated parameters from 100 runs of simulated data are:

[Histograms of the estimates across the 100 runs: co[,1] (the p estimates, left panel) and co[,2] (the k estimates, right panel).]

The left histogram is the p parameter. The right one is the k parameter. The mean of each is 6.04 and -7.73, respectively. The original (true) values are p = 5 and k = -2. So there appears to be some bias in the estimates: p tends to be overestimated, and k tends to be underestimated. (This bias doesn't go away as you increase the number of simulations.) We can get a single run of simulated data to compare against, using:

d <- gen.dat( 20, 5, -2 )

Estimate its parameters and compute confidence intervals:

m <- mle2( d$y ~ dbinom(prob=1/( 1 + exp(p + k*d$x)), size=1),
    start=list(p=0,k=0) )

Coefficients:
        p           k
 2.092665      -1.645

       2.5 %     97.5 %
p -0.8594779   6.428434
k -2.748847   -0.222747

Aside from the noted bias again, the histograms do provide a measure of the range of values that are plausible. The simulated distribution is much more skewed than the single sample estimates, though. This is due to the bias in estimation. So to answer the question, I would say they do not contain exactly the same information. Instead, we learn from the simulation exercise that the estimation is biased.
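For reference, the histograms above come from a simulation loop like the sketch below. This assumes gen.dat() from the week's source file has the same structure as gen.dat2() in part (4), just without the second predictor; mle2() comes from the bbmle package.

# a sketch of the simulation behind the histograms
library(bbmle)
gen.dat <- function(n,p,k) {
    x <- rnorm( n, 4, 2 )
    pr <- 1/(1+exp(p + k*x))
    y <- rbinom( prob=pr, size=1, n=n )
    data.frame(x=x,y=y)
}
for ( i in 1:100 ) {
    d <- gen.dat( 20, 5, -2 )
    m <- mle2( d$y ~ dbinom(prob=1/( 1 + exp(p + k*d$x)), size=1),
        start=list(p=0,k=0) )
    if ( i==1 ) {
        co <- coef(m)
    } else {
        co <- rbind(co,coef(m))
    }
}
hist( co[,1] )   # estimates of p
hist( co[,2] )   # estimates of k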

But look what happens when we make the true values p = 0 and k = 0. Now the simulated distributions and the confidence intervals from a single run are very similar. You can directly compare the implied confidence intervals of the simulated runs by finding the 2.5% and 97.5% cutoffs in the histograms. R makes this easy, with the quantile function:

d <- gen.dat( 20, 0, 0 )
m <- mle2( d$y ~ dbinom(prob=1/( 1 + exp(p + k*d$x)), size=1),
    start=list(p=0,k=0) )
confint(m)

for ( i in 1:100 ) {
    d <- gen.dat( 20, 0, 0 )
    m <- mle2( d$y ~ dbinom(prob=1/( 1 + exp(p + k*d$x)), size=1),
        start=list(p=0,k=0) )
    if ( i==1 ) {
        co <- coef(m)
    } else {
        co <- rbind(co,coef(m))
    }
}
quantile( co[,1], probs=c(0.025,0.975) )
quantile( co[,2], probs=c(0.025,0.975) )

(2) When we reduce the sample size, everything gets wider: the histograms and the confidence intervals. Less data means less confidence.

(3) Larger values of the parameters appear to induce more bias and make estimation harder. In particular, if the mean values tend to push the chance of seeing a 1 towards either 1 or 0, then it'll be very hard to estimate parameters, because the system is pushed against either the floor or the ceiling.

(4) To add another predictor variable:

gen.dat2 <- function(n,p,k,k2) {
    x <- rnorm( n, 4, 2 )
    x2 <- rnorm( n, 1, 2 )
    pr <- 1/(1+exp(p + k*x + k2*x2))
    y <- rbinom( prob=pr, size=1, n=n )
    data.frame(x=x,y=y,x2=x2)
}

for ( i in 1:100 ) {
    d <- gen.dat2( 50, -1, 3, 1 )
    m <- mle2( d$y ~ dbinom(prob=1/( 1 + exp(p + k*d$x + k2*d$x2)), size=1),
        start=list(p=0,k=0,k2=0) )
    if ( i==1 ) {
        co <- coef(m)
    } else {
        co <- rbind(co,coef(m))
    }
}
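The runs from part (4) can be summarized the same way as before (a small sketch; the true values used above are -1, 3, and 1):

colMeans(co)                              # average estimates of p, k, k2 across runs
quantile( co[,1], probs=c(0.025,0.975) )  # implied interval for p
quantile( co[,2], probs=c(0.025,0.975) )  # implied interval for k
quantile( co[,3], probs=c(0.025,0.975) )  # implied interval for k2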

Exercise 3: Making up data again

A Poisson random variable is a count of events that occur in a finite interval of time. The minimum is zero and the maximum is infinity. You can see what a Poisson distribution looks like with:

hist(rpois(1000,lambda=2))

The parameter lambda is the mean number of events. As you increase it, the distribution looks increasingly normal, but the values are still always integers (because counts are never fractions). All you have to do to convert the code from #2 into Poisson simulations is to replace the probability functions rbinom and dbinom with rpois and dpois. The trick, though, is to use exp() to ensure that the value of the mean number of events is always greater than zero.

gen.dat.pois <- function(n,p,k) {
    x <- rnorm( n, 4, 2 )
    y <- rpois( lambda=exp( p + k*x ), n=n )
    data.frame(x=x,y=y)
}

for ( i in 1:100 ) {
    d <- gen.dat.pois( 20, 1, 3 )
    m <- mle2( d$y ~ dpois( lambda=exp( p + k*d$x ) ), start=list(p=0,k=0) )
    if ( i==1 ) {
        co <- coef(m)
    } else {
        co <- rbind(co,coef(m))
    }
}
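As a small illustration of why the exp() link matters: dpois() is only defined for non-negative means, so a linear model that can dip below zero would break the likelihood, while exp() of the same linear model is always positive. For example:

dpois( 3, lambda=-0.5 )        # NaN, with a warning, because lambda must be non-negative
dpois( 3, lambda=exp(-0.5) )   # a valid probability, because exp() keeps the mean positive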