Maximum Likelihood Exercises SOLUTIONS
Exercise 1: Frog covariates

(1) Before we can fit the model, we have to do a little recoding of the data. Both ReedfrogPred$size and ReedfrogPred$pred are text data, so we'll need to convert them to dummy variables before we can use them in the model. R is nice about doing this for you when you use lm(), but it won't do it automatically for custom maximum likelihood estimation using mle2(). So to recode both categorical variables to 0/1 dummies:

    # need to recode size to a dummy variable
    ReedfrogPred$sizecode <- ifelse( ReedfrogPred$size=="big", 1, 0 )
    # need to recode pred to a dummy variable
    ReedfrogPred$predcode <- ifelse( ReedfrogPred$pred=="pred", 1, 0 )

Now we can define the first model. All you need to do is modify the code I provided to expand the linear model portion. What we are assuming is that the probability of an individual tadpole surviving is given by:

    Pr(survive | parameters) = 1 / (1 + exp(linear model)).

So you can build the linear model part in the same way you'd build a linear regression. This code fits the model that includes parameters for baseline survival (the intercept), size, and predators present:

    m <- mle2( ReedfrogPred$surv ~ dbinom(
        prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode )),
        size=ReedfrogPred$density ),
        start=list(m=0,s=0,p=0) )

The part that corresponds to the probability model above is:

    prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode ))

This is the same as:

    Pr(survive | m, s, p) = 1 / (1 + exp(m + s*sizecode + p*predcode)).

The estimates you come up with are:

    m       s       p

What do these mean? To find out, you need to understand how the probability of survival responds to changes in these parameters. The answer is that negative estimates increase probability (as the value of the predictor increases), while positive estimates decrease probability. You can prove this for yourself (you should have) by simply plotting 1/(1 + exp(m)), for a range of m values:

    curve( 1/(1+exp(x)), from=-5, to=5 )
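The shape of this inverse-logit curve can also be checked numerically. The exercise itself uses R, but here is a small Python sketch of the same 1/(1 + exp(x)) function (the name pr_survive is just for illustration), confirming that probability falls as the linear model grows:

```python
import math

def pr_survive(linear):
    # The model's inverse link: Pr(survive) = 1 / (1 + exp(linear model)).
    # Note the sign convention: there is no minus sign in the exponent,
    # so probability DECREASES as the linear model increases.
    return 1.0 / (1.0 + math.exp(linear))

print(pr_survive(-5.0))  # close to 1
print(pr_survive(0.0))   # exactly 0.5
print(pr_survive(5.0))   # close to 0
```

Because there is no minus sign in the exponent, this is the mirror image of the usual logistic function, which is why negative estimates increase probability here.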
[Figure: the curve 1/(1 + exp(x)) plotted against x]

Note that probability is highest for the smallest values of x. So looking at our parameter estimates again:

    m       s       p

we can conclude that baseline survival, m, is equal to:

    1 / (1 + exp(-2.92)) = 0.949.

That's the probability of survival you get when size is zero and pred is zero. It's very high! The parameter estimates for the other two parameters, s and p, are positive, so each decreases chance of survival as each predictor increases. Being bigger means dying more. The presence of predators means dying more.

There is actually a deep and precise relationship between these estimates, on the log scale, and the odds of the event happening. The odds of tadpole survival, considering only baseline survival m, is:

    Pr(survive) / (1 - Pr(survive)) = 0.949 / 0.051 = 18.6.

That means 18.6:1 odds of survival. Very good odds. Now consider the odds of survival when we add a predator. The chance of survival becomes:

    1/(1 + exp(m + p)) = 1/(1 + exp(-2.92 + 2.70)) = 0.556.

So the new odds are:

    0.556 / (1 - 0.556) = 1.25.

When predators are present, the odds of survival are 1.25:1. Substantially reduced.
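The odds arithmetic above can be reproduced in a few lines. A hedged Python check follows (R is the language of the exercise; also, the predator coefficient p = 2.70 used here is backed out from the reported 6.7% proportional change in odds, so treat it as approximate):

```python
import math

def prob(linear):
    # The exercise's inverse link: probability = 1/(1 + exp(linear model)).
    return 1.0 / (1.0 + math.exp(linear))

m = -2.92   # reported intercept estimate
p = 2.70    # approximate predator coefficient, backed out from the text

base = prob(m)                   # baseline survival probability, about 0.95
base_odds = base / (1.0 - base)  # about 18.6 to 1
pred = prob(m + p)               # survival probability with a predator
pred_odds = pred / (1.0 - pred)  # about 1.25 to 1

print(round(base, 3), round(base_odds, 1))
print(round(pred, 3), round(pred_odds, 2))
```

Note that the odds implied by 1/(1 + exp(m)) simplify algebraically to exp(-m), which is why the baseline odds equal exp(2.92).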
Now here's the magic. Compare 18.6:1 to 1.25:1. The proportional change in odds is:

    1.25 / 18.6 = 0.067.

This says that the new odds are about 6.7% of the previous odds. By coincidence (not really), if you evaluate exp(-p), where p is the parameter estimate for predation, you get:

    exp(-p) = exp(-2.70) = 0.067.

That's the same number as before, the proportional change in odds. So you can interpret these coefficients inside the logistic as proportional changes in odds. Just use the formula: proportional change in odds = exp(-estimate). You can prove that this formula is correct, with a little algebra. We want to know the number x that satisfies:

    [1/(1 + exp(a + b))] / [1 - 1/(1 + exp(a + b))] = x * [1/(1 + exp(a))] / [1 - 1/(1 + exp(a))],

for any values a and b. Each side simplifies, because the odds implied by the probability 1/(1 + exp(a)) are just exp(-a). Solving for x yields:

    x = exp(-(a + b)) / exp(-a) = exp(-b),

which is the result we induced above, via arithmetic. In summary, to get the proportional change in odds induced by a parameter, per unit of its predictor variable, use exp(-k), where k is the parameter. For this reason, you will sometimes see these logit parameter estimates referred to as log-odds.

Coming back to our estimates, the odds changes induced by each are:

    > exp( -coef(m) )
        m       s       p

So you can read these as: (1) the baseline odds of survival are 18.6:1; (2) big tadpoles have half the odds of survival as small tadpoles; (3) when predators are present, a tadpole has 6.6% the odds of survival as when predators are absent.

Now let's try a model that interacts size and predation.

    m2 <- mle2( ReedfrogPred$surv ~ dbinom(
        prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode
            + z*ReedfrogPred$sizecode*ReedfrogPred$predcode )),
        size=ReedfrogPred$density ),
        start=list(m=0,s=0,p=0,z=0) )

Which evaluates the probability model:

    Pr(survive | m, s, p, z) = 1 / (1 + exp(m + s*sizecode + p*predcode + z*sizecode*predcode)).

The estimates are:
    m       s       p       z

Remember, positive estimates decrease chance of survival. So now the estimate for size has changed direction, and being big helps you survive; that makes more sense. This main effect applies in the absence of predators, so we might interpret this as saying that bigger individuals die slightly less when there are no predators around. The interaction of size and predation tells us that being big hurts your survival when predators are around. This implies that predators target big tadpoles. Now converting to odds:

    > exp( -coef(m2) )
        m       s       p       z

We start with 11.7:1 odds of survival. Being big increases the odds of survival by about 17%. Predators reduce odds of survival by about 1 - 0.12 = 0.88 = 88%. When a tadpole is big and predators are present, the odds of survival (already accounting for the main effects of size and predation) are reduced by an additional 1 - 0.35 = 0.65 = 65%.

(2) Confidence intervals for the second model I estimated above (using confint()) are:

        2.5 %   97.5 %
    m
    s
    p
    z

We can again convert these to changes in odds:

    > exp( -confint(m2) )
    Profiling...
        2.5 %   97.5 %

So read the m line as: the baseline odds of survival lie between 18.7:1 and 7.8:1, with 95% confidence. The s line could be read as: the proportional change in odds induced by being big lies between 2.24 times and 0.62 times the original odds. Clearly the effect of size isn't estimated very precisely. The other two parameters, p and z, however, are consistent reductions in odds. So we can have more confidence about interpreting those effects.

To compute the quadratic estimate likelihood intervals, you can use the formula I provided in lecture (and in the source file for this week):

    pse <- sqrt(diag( vcov(m2) ))
        m       s       p       z
Note that these are the same standard errors you get if you type summary(m2). To make these into 95% confidence intervals, use the z-score that corresponds to a 95% interval, 1.96, and add/subtract 1.96 times the standard error to each parameter estimate:

    coef(m2) + pse*1.96 # upper bounds
    coef(m2) - pse*1.96 # lower bounds

These intervals are very similar to the profile likelihood intervals we produced with confint(). One reason the quadratic estimation intervals (the ones that are estimated from the vcov matrix) are so similar to those from the likelihood profiles is that the profiles are shaped very much like quadratic functions, in this case. You can see this when you plot the profiles, in the square-root space, using plot(profile(m2)):

[Figure: likelihood profiles for m, s, p, and z, each marked with 50%, 80%, 90%, 95%, and 99% levels]

If the lines were less straight, there would be more disagreement between the quadratic estimation intervals and the direct profile intervals.

(3) Now to plot the chance of survival, using the estimates. There are many ways to do this, so you may have used a different approach. That's fine. My tactic here is to use a basic bar chart, because we actually have discrete predictor categories. There are four groups of tadpoles, and each has a single estimated chance of survival for every tadpole in a group. The groups are: (1) small/no-pred, (2) small/pred, (3) big/no-pred, (4) big/pred. The code below computes these four probabilities and makes a simple bar chart of them.

    p1 <- 1/(1+exp( coef(m2)[1] )) # small/no-pred
    p2 <- 1/(1+exp( coef(m2)[1] + coef(m2)[2] )) # large/no-pred
    p3 <- 1/(1+exp( coef(m2)[1] + coef(m2)[3] )) # small/pred
    p4 <- 1/(1+exp( coef(m2)[1] + coef(m2)[2] + coef(m2)[3] + coef(m2)[4] )) # large/pred
    barplot( c(p1,p2,p3,p4),
        names.arg=c("small/no-pred","large/no-pred","small/pred","large/pred"),
        ylab="probability of survival" )

All this is copacetic with our previous interpretation of the parameters.

Exercise 2: Making up data

(1) The histograms from 100 runs of estimating parameters from simulated data are:

[Figure: histograms of co[,1] (left) and co[,2] (right), with Frequency on the vertical axis]

The left histogram is the p parameter. The right one is the k parameter. The mean of each is 6.04 and -7.73, respectively. The original (true) values are p = 5 and k = -2. So there appears to be some bias in the estimates: p tends to be overestimated, and k tends to be underestimated. (This bias doesn't go away as you increase the number of simulations.)

We can get a single run of simulated data to compare against, using:

    d <- gen.dat( 20, 5, -2 )

Estimate its parameters and compute confidence intervals:

    m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x)), size=1 ),
        start=list(p=0,k=0) )

    Coefficients:
        p       k

        2.5 %   97.5 %
    p
    k

Aside from the noted bias again, the histograms do provide a measure of the range of values that are plausible. The simulated distribution is much more skewed than the single sample estimates, though. This is due to the bias in estimation. So to answer the question, I would say they do not contain exactly the same information. Instead, we learn from the simulation exercise that the estimation is biased.
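The bias experiment itself can be sketched outside of R. The hedged Python version below hand-rolls a Newton-Raphson logistic fit in place of mle2(), and it uses milder true values than the text (large values cause the floor/ceiling problems noted in part 3), so it illustrates the procedure rather than reproducing the exact numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

def gen_dat(n, p, k):
    # Mirrors the exercise's gen.dat(): x ~ Normal(4, 2),
    # y ~ Bernoulli(1 / (1 + exp(p + k*x))).
    x = rng.normal(4.0, 2.0, n)
    pr = 1.0 / (1.0 + np.exp(p + k * x))
    y = rng.binomial(1, pr)
    return x, y

def fit_logistic(x, y, iters=25):
    # Newton-Raphson (IRLS) for (p, k). Standard logistic regression uses the
    # opposite sign convention, so we negate the coefficients at the end.
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # P(y = 1) in standard form
        w = mu * (1.0 - mu)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
    return -beta   # back to the exercise's (p, k) signs

# 100 simulated data sets with mild true values (chosen to avoid separation).
true_p, true_k = 1.0, -0.3
co = np.array([fit_logistic(*gen_dat(100, true_p, true_k)) for _ in range(100)])
print(co.mean(axis=0))   # averages should land near (1.0, -0.3)
```

With these mild values and n = 100 the averages sit close to the truth; pushing the true values up (as in the text's p = 5, k = -2 with n = 20) is exactly what makes the estimates skewed and biased.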
But look what happens when we make the true values p = 0 and k = 0. Now the simulated distributions and the confidence intervals from a single run are very similar. You can directly compare the implied confidence intervals of the simulated runs by finding the 2.5% and 97.5% cutoffs in the histograms. R makes this easy, with the quantile function:

    d <- gen.dat( 20, 0, 0 )
    m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x)), size=1 ),
        start=list(p=0,k=0) )
    confint(m)
    for ( i in 1:100 ) {
        d <- gen.dat( 20, 0, 0 )
        m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x)), size=1 ),
            start=list(p=0,k=0) )
        if ( i==1 ) { co <- coef(m) } else { co <- rbind(co,coef(m)) }
    }
    quantile( co[,1], probs=c(0.025,0.975) )
    quantile( co[,2], probs=c(0.025,0.975) )

(2) When we reduce the sample size, everything gets wider: the histograms and the confidence intervals. Less data means less confidence.

(3) Larger values of the parameters appear to induce more bias and make estimation harder. In particular, if the mean values tend to push the chance of seeing a 1 towards either 1 or 0, then it'll be very hard to estimate parameters, because the system is pushed against either the floor or the ceiling.

(4) To add another predictor variable:

    gen.dat2 <- function(n,p,k,k2) {
        x <- rnorm( n, 4, 2 )
        x2 <- rnorm( n, 1, 2 )
        pr <- 1/(1+exp( p + k*x + k2*x2 ))
        y <- rbinom( prob=pr, size=1, n=n )
        data.frame(x=x,y=y,x2=x2)
    }
    for ( i in 1:100 ) {
        d <- gen.dat2( 50, -1, 3, 1 )
        m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x + k2*d$x2)), size=1 ),
            start=list(p=0,k=0,k2=0) )
        if ( i==1 ) { co <- coef(m) } else { co <- rbind(co,coef(m)) }
    }
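The quantile step at the end of the loop translates directly: numpy's quantile plays the role of R's quantile(). A minimal sketch, with a made-up column of estimates standing in for co[,1]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one column of the co matrix: 100 simulated estimates.
# Drawn from a normal distribution purely for illustration.
estimates = rng.normal(0.0, 0.5, 100)

# The 2.5% and 97.5% cutoffs, analogous to
# quantile( co[,1], probs=c(0.025,0.975) ) in R.
lo, hi = np.quantile(estimates, [0.025, 0.975])
print(lo, hi)
```

These two cutoffs are what you compare against the confidence interval from a single confint() run.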
Exercise 3: Making up data again

A Poisson random variable is a count of events that occur in a finite interval of time. The minimum is zero and the maximum is infinity. You can see what a Poisson distribution looks like with:

    hist(rpois(1000,lambda=2))

The parameter lambda is the mean number of events. As you increase it, the distribution looks increasingly normal, but the values are still always integers (because counts are never fractions). All you have to do to convert the code from #2 into Poisson simulations is to replace the probability functions rbinom and dbinom with rpois and dpois. The trick, though, is to use exp() to ensure that the value of the mean number of events is always greater than zero.

    gen.dat.pois <- function(n,p,k) {
        x <- rnorm( n, 4, 2 )
        y <- rpois( lambda=exp( p + k*x ), n=n )
        data.frame(x=x,y=y)
    }
    for ( i in 1:100 ) {
        d <- gen.dat.pois( 20, 1, 3 )
        m <- mle2( d$y ~ dpois( lambda=exp( p + k*d$x ) ), start=list(p=0,k=0) )
        if ( i==1 ) { co <- coef(m) } else { co <- rbind(co,coef(m)) }
    }
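The role of exp() as a log link can be sanity-checked the same way. Below is a small Python sketch (with milder parameter values than the text, chosen so the simulated counts stay moderate) confirming that exp(p + k*x) is always a valid, positive Poisson mean, and that the simulated counts track those means:

```python
import numpy as np

rng = np.random.default_rng(2)

def gen_dat_pois(n, p, k):
    # Mirrors gen.dat.pois(): x ~ Normal(4, 2), y ~ Poisson(exp(p + k*x)).
    x = rng.normal(4.0, 2.0, n)
    lam = np.exp(p + k * x)   # the exp() link keeps the Poisson mean positive
    y = rng.poisson(lam)
    return x, y, lam

x, y, lam = gen_dat_pois(1000, 0.5, 0.2)
print(lam.min() > 0)                             # always True, thanks to exp()
print(abs(y.mean() - lam.mean()) / lam.mean())   # small: counts track the means
```

Without the exp() link, a linear predictor like p + k*x could go negative, and a negative mean is not a valid Poisson parameter; that is the whole point of the "trick" above.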
Announcements Solutions to Problem Set 3 are posted Problem Set 4 is posted, It will be graded and is due a week from Friday You already know everything you need to work on Problem Set 4 Professor Miller
More informationFinding the Median. 1 The problem of finding the median. 2 A good strategy? lecture notes September 2, Lecturer: Michel Goemans
18.310 lecture notes September 2, 2013 Finding the Median Lecturer: Michel Goemans 1 The problem of finding the median Suppose we have a list of n keys that are completely unsorted. If we want to find
More information7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis
Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression
More informationInduction 1 = 1(1+1) = 2(2+1) = 3(3+1) 2
Induction 0-8-08 Induction is used to prove a sequence of statements P(), P(), P(3),... There may be finitely many statements, but often there are infinitely many. For example, consider the statement ++3+
More informationMA554 Assessment 1 Cosets and Lagrange s theorem
MA554 Assessment 1 Cosets and Lagrange s theorem These are notes on cosets and Lagrange s theorem; they go over some material from the lectures again, and they have some new material it is all examinable,
More informationStat 587: Key points and formulae Week 15
Odds ratios to compare two proportions: Difference, p 1 p 2, has issues when applied to many populations Vit. C: P[cold Placebo] = 0.82, P[cold Vit. C] = 0.74, Estimated diff. is 8% What if a year or place
More informationAnalysing data: regression and correlation S6 and S7
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationProbability Distributions
CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas
More informationLogit Regression and Quantities of Interest
Logit Regression and Quantities of Interest Stephen Pettigrew March 5, 2014 Stephen Pettigrew Logit Regression and Quantities of Interest March 5, 2014 1 / 59 Outline 1 Logistics 2 Generalized Linear Models
More information4 Multicategory Logistic Regression
4 Multicategory Logistic Regression 4.1 Baseline Model for nominal response Response variable Y has J > 2 categories, i = 1,, J π 1,..., π J are the probabilities that observations fall into the categories
More informationFitting a Straight Line to Data
Fitting a Straight Line to Data Thanks for your patience. Finally we ll take a shot at real data! The data set in question is baryonic Tully-Fisher data from http://astroweb.cwru.edu/sparc/btfr Lelli2016a.mrt,
More information:Effects of Data Scaling We ve already looked at the effects of data scaling on the OLS statistics, 2, and R 2. What about test statistics?
MRA: Further Issues :Effects of Data Scaling We ve already looked at the effects of data scaling on the OLS statistics, 2, and R 2. What about test statistics? 1. Scaling the explanatory variables Suppose
More informationACCUPLACER MATH 0311 OR MATH 0120
The University of Teas at El Paso Tutoring and Learning Center ACCUPLACER MATH 0 OR MATH 00 http://www.academics.utep.edu/tlc MATH 0 OR MATH 00 Page Factoring Factoring Eercises 8 Factoring Answer to Eercises
More informationInference Tutorial 2
Inference Tutorial 2 This sheet covers the basics of linear modelling in R, as well as bootstrapping, and the frequentist notion of a confidence interval. When working in R, always create a file containing
More informationSTP 420 INTRODUCTION TO APPLIED STATISTICS NOTES
INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationLecture 3: Big-O and Big-Θ
Lecture 3: Big-O and Big-Θ COSC4: Algorithms and Data Structures Brendan McCane Department of Computer Science, University of Otago Landmark functions We saw that the amount of work done by Insertion Sort,
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 17, 2010 Instructor: John Parman Final Exam - Solutions You have until 12:30pm to complete this exam. Please remember to put your
More informationComparing IRT with Other Models
Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used
More informationMarginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal
Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect
More information