maximum likelihood Maximum likelihood families negative binomial ordered logit/probit

Dortha Cunningham
5 years ago

1 maximum likelihood Maximum likelihood families negative binomial ordered logit/probit

2 The story so far... We have some data. We want to predict it, using other data. We venture a hypothesis. We write that hypothesis as a mathematical function. We fit the function to the data.

3 $y_i \sim \mathrm{Zink}(u = \alpha + \beta x_i,\ v = \gamma)$

4 $y_i \sim \mathrm{Zink}(u = \alpha + \beta x_i,\ v = \gamma)$, that is, $\Pr(y_i \mid u, v) = \mathrm{Zink}(u, v)$

5 previous regressions: Gaussian, Binomial, Poisson
Gaussian: $y_i \sim \mathcal{N}(\alpha + \beta x_i,\ \sigma)$
    mle2( y ~ dnorm( mean=a+b*x, sd=s ), start=list(a=mean(y),b=0,s=1) )
    lm( y ~ x )
Binomial: $y_i \sim \mathrm{Binom}\left(1/(1 + \exp(\alpha + \beta x_i)),\ N\right)$
    mle2( y ~ dbinom( prob=1/(1+exp(a+b*x)), size=n ), start=list(a=0,b=0) )
Poisson: $y_i \sim \mathrm{Pois}(\alpha + \beta x_i)$
    mle2( y ~ dpois( lambda=a+b*x ), start=list(a=mean(y),b=0) )

6 new regressions Negative binomial: common in ecology. Ordered logit/probit: common in economics/poli sci/sociology.

7 negative binomial $y_i \sim \mathrm{NBinom}(\mu, n)$ Model of counts (like Poisson and Binomial). No upper bound (like Poisson). mu: mean. n: dispersion (smaller means more dispersed).
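A quick simulation can make the dispersion parameter concrete. The sketch below relies on the fact that, in R's `mu`/`size` parameterization, the negative binomial variance is mu + mu^2/size. The seed, sample size, and the helper name `nb_var` are illustrative choices, not part of the slides.

```r
set.seed(1)
mu <- 6

# Theoretical variance under R's mu/size parameterization (hypothetical helper)
nb_var <- function(mu, size) mu + mu^2 / size

# Same mean, two dispersion values: smaller size means a wider spread
y_tight <- rnbinom(1e5, mu = mu, size = 10)
y_wide  <- rnbinom(1e5, mu = mu, size = 2)

c(mean = mean(y_tight), var = var(y_tight), theory = nb_var(mu, 10))
c(mean = mean(y_wide),  var = var(y_wide),  theory = nb_var(mu, 2))
```

Both samples should have a mean near 6, while the variance grows as `size` shrinks.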

8 negative binomial [Figure: four histograms of negative binomial draws (Frequency vs. y); panels: mu=2,size=10; mu=2,size=2; mu=6,size=10; mu=6,size=2]

9 Nbinom converges to Poisson as n goes to infinity. [Figure: histograms (Frequency vs. y) of nbinom draws with size=1, size=10, and size=1000, each paired with a Poisson histogram of the same mean]
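The convergence can also be checked numerically rather than by eye. This is a small sketch that compares the two probability mass functions directly; the mean of 2 and the grid of `size` values are arbitrary choices for illustration.

```r
y <- 0:30
lambda <- 2

# Largest pointwise gap between the nbinom and Poisson mass functions
gap <- sapply(c(1, 10, 1000, 1e6), function(n) {
  max(abs(dnbinom(y, mu = lambda, size = n) - dpois(y, lambda)))
})
gap
```

The gap shrinks steadily as `size` grows, which is the sense in which nbinom is a flexible version of Poisson.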

10 negative binomial As a result, ecologists and social scientists almost always use nbinom as a flexible version of Poisson. nbinom is sometimes called overdispersed Poisson.

11 negative binomial Back to del Norte salamanders [Figure: histogram of frequency vs. salamander density]

12 negative binomial Q: Poisson or neg binom better? A: Let the data decide.
    # Poisson: a single rate parameter
    mp <- mle2( d$salaman ~ dpois( lambda=a ), start=list(a=mean(d$salaman)) )
    # Negative binomial: size is fit on the log scale so it stays positive
    mnb <- mle2( d$salaman ~ dnbinom( mu=a, size=exp(n) ), start=list(a=mean(d$salaman),n=0) )
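One way to "let the data decide" is to compare the two fits by AIC. The sketch below is a stand-in under stated assumptions: the counts are simulated (the salamander data are not reproduced here), base R's `optim()` replaces `mle2()` so that no extra package is needed, and both the mean and the dispersion are fit on the log scale.

```r
set.seed(1)
y <- rnbinom(200, mu = 3, size = 0.8)   # simulated overdispersed counts (assumption)

# Negative log-likelihoods; exp() keeps the parameters positive
nll_pois <- function(par) -sum(dpois(y, lambda = exp(par[1]), log = TRUE))
nll_nb   <- function(par) -sum(dnbinom(y, mu = exp(par[1]), size = exp(par[2]), log = TRUE))

fit_pois <- optim(log(mean(y)), nll_pois, method = "BFGS")
fit_nb   <- optim(c(log(mean(y)), 0), nll_nb, method = "BFGS")

# AIC = 2 * (negative log-likelihood) + 2 * (number of parameters)
aic_pois <- 2 * fit_pois$value + 2 * 1
aic_nb   <- 2 * fit_nb$value + 2 * 2
c(poisson = aic_pois, negbinom = aic_nb)
```

With overdispersed counts like these, the negative binomial should come out with the lower AIC despite its extra parameter.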

13 negative binomial [Figure: histogram of salamander density (frequency) overlaid with the Poisson prediction and the neg binom prediction]

14 Sometimes we have ordered categories: cold, hot; small, medium, large; strongly disagree, disagree, agree, strongly agree; 1,2,3,4,5,6,7. Not Gaussian, because discrete and bounded. Not counts (binomial, Poisson, neg binom), because distance between each category unknown.

15 Usual way to model this problem is by slicing up a probability distribution into n slices, where n is the number of categories. [Figure: probability axis cut into slices]

16 Easiest thing to do is compute: probability a value y is less than or equal to a category value k, for a set of cutoffs z (cumulative probability). $\Pr(y_i \le k \mid k, z) = \frac{1}{1 + \exp(z_k)}$, abbreviated $\Pr(y_i \le k \mid k, z) = p(z_k)$.

17 [Figure: probability axis with cutpoints $p(z_1)$, $p(z_2)$, $p(z_3)$, $p(z_4)$] 4 categories: 1,2,3,4. Need four z values to define slices.

18 [Figure: probability axis with cutpoints $p(z_1)$, $p(z_2)$, $p(z_3)$, $p(z_4)$] z_4 you get for free! Must choose value of z_4 so that $\Pr(y_i \le k \mid k = 4, z) = 1$ (y_i is always less than or equal to 4, because 4 is the max).

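19 Other z values are fit to the data. Example: equally likely to observe 1, 2, 3 or 4. [Figure: probability axis with cutpoints $p(z_1)$, $p(z_2)$, $p(z_3)$, $p(z_4)$]

20 Example: more likely to observe [Figure: probability axis with cutpoints $p(z_1)$, $p(z_2)$, $p(z_3)$, $p(z_4)$]

21 Example: more likely to observe [Figure: probability axis with cutpoints $p(z_1)$, $p(z_2)$, $p(z_3)$, $p(z_4)$]

22 Example: more likely to observe 3 and [Figure: probability axis with cutpoints $p(z_1)$, $p(z_2)$, $p(z_3)$, $p(z_4)$]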

23 How to choose those z values? If we use logistic for probability, then: $p(z_k) = \Pr(y_i \le k \mid k, z) = \frac{1}{1 + \exp(z_k)}$. If you know the p(z_k) you want, then you can use algebra to solve for z_k: $z_k = \ln\left(\frac{1}{p(z_k)} - 1\right)$
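The algebra above can be tried out directly. This is a minimal sketch for the "equally likely" case with four categories; `z_from_p` is a hypothetical helper name, and `logit` follows the slides' convention of 1/(1+exp(x)).

```r
logit    <- function(x) 1 / (1 + exp(x))   # the slides' convention
z_from_p <- function(p) log(1 / p - 1)     # solves p = logit(z) for z

cat_prob <- c(0.25, 0.25, 0.25, 0.25)      # desired probability of each category
cum_prob <- cumsum(cat_prob)               # Pr(y <= k) for k = 1..4

z <- z_from_p(cum_prob)
z          # the last cutoff is -Inf, because Pr(y <= 4) must be 1
logit(z)   # recovers the cumulative probabilities
```

Note that the cutoffs decrease as k increases, because this version of `logit` is a decreasing function of its argument.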

24 In practice, we use maximum likelihood to find values of z_1, z_2, ..., z_n. But it is still important to understand what they correspond to in the data.

25 An example: 1-4 ratings of satisfaction with course content
    > stem(y)
    The decimal point is at the |

26 First define a function to make our lives easier:
    # Note: this is a decreasing function of x, equal to plogis(-x)
    logit <- function(x) 1/(1+exp(x))

27 Now make a list of z values (just some starting values, for example):
    z <- c( 1, 0, -1, -Inf )
Probability y_i <= k (k in 1-4) is:
    > logit(z)
    [1] 0.2689414 0.5000000 0.7310586 1.0000000

28
    > logit(z)
    [1] 0.2689414 0.5000000 0.7310586 1.0000000
Why logit(z)? Because we said so: we just need some (cumulative) probability distribution.

29 Now, to fit our z values to the data, we need to write a probability density function (like dbinom, dpois, dnbinom). This function takes a list of y values and parameters and returns the likelihood of each value, given the parameters.

30 making your own density function For example, write your own Poisson density: $\Pr(x \mid \lambda) = \frac{\lambda^x \exp(-\lambda)}{x!}$
    my.dpois <- function( x, lambda ) {
      lambda^x * exp(-lambda) / factorial(x)
    }

31 making your own density function Produces the same results as the built-in function.
    my.dpois <- function( x, lambda ) {
      lambda^x * exp(-lambda) / factorial(x)
    }
    > my.dpois(1,2)
    [1] 0.2706706
    > dpois(1,2)
    [1] 0.2706706

32 making your own density function Structure of a density function:
x: values to compute likelihood of; must be called x because mle2 expects it.
parameters: whatever parameters it uses.
log: a parameter that tells the function whether to return likelihood or log likelihood.
    dsomename <- function( x, parameters, log=FALSE ) {
      # code that computes likelihood or log likelihood
    }

33 making your own density function
    my.dpois <- function( x, lambda, log=FALSE ) {
      p <- lambda^x * exp(-lambda) / factorial(x)
      if ( log==TRUE ) p <- log(p)   # return log likelihood on request
      p
    }
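The version above computes the likelihood first and takes the log afterwards, which can overflow for large counts. As a hedged alternative (not from the slides), the same density can be computed on the log scale throughout; `my.dpois.stable` is a hypothetical name for this variant.

```r
# Poisson density computed on the log scale (hypothetical variant of my.dpois)
my.dpois.stable <- function(x, lambda, log = FALSE) {
  # log( lambda^x * exp(-lambda) / x! ), written term by term
  lp <- x * log(lambda) - lambda - lgamma(x + 1)
  if (log) lp else exp(lp)
}

my.dpois.stable(1, 2)                    # matches dpois(1, 2)
my.dpois.stable(200, 150, log = TRUE)    # finite log likelihood

# The direct formula breaks down here: Inf / Inf gives NaN
naive <- suppressWarnings(150^200 * exp(-150) / factorial(200))
naive
```

Because `mle2` works with log likelihoods, a density that stays finite on the log scale tends to make fitting more robust.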

34 [Figure: probability axis with cutpoints $p(z_1)$ to $p(z_4)$; the slice up to logit(z_1) is dorderedlogit(1,z), where z <- c(z1,z2,z3,z4)]

35 [Figure: as before, adding the slice between logit(z_1) and logit(z_2), which is dorderedlogit(2,z)]

36 [Figure: as before, adding the slice between logit(z_2) and logit(z_3), which is dorderedlogit(3,z)]

37 [Figure: as before, adding the slice between logit(z_3) and logit(z_4), which is dorderedlogit(4,z)]

38 x: values to compute likelihood of; must be called x because mle2 expects it.
z: our list of z values.
log: a parameter that tells the function whether to return likelihood or log likelihood.
    dorderlogit <- function( x, z, log=FALSE ) {
      p <- logit( z[x] )      # prob y <= k
      nz <- c( Inf, z )       # shift by one; logit(Inf) = 0 covers k = 1
      np <- logit( nz[x] )    # prob y <= k-1
      p <- p - np             # subtract to get likelihood of y==k
      if ( log==TRUE ) p <- log(p)
      p
    }
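A quick sanity check on a density like this is that the category likelihoods are the gaps between successive cumulative probabilities and that they sum to one. The sketch below restates the function compactly so it runs on its own, using the example cutoffs from the earlier slide.

```r
logit <- function(x) 1 / (1 + exp(x))   # the slides' convention

dorderlogit <- function(x, z, log = FALSE) {
  p  <- logit(z[x])             # Pr(y <= k)
  np <- logit(c(Inf, z)[x])     # Pr(y <= k-1); zero when k = 1
  p  <- p - np                  # Pr(y == k)
  if (log) log(p) else p
}

z <- c(1, 0, -1, -Inf)          # example cutoffs
probs <- dorderlogit(1:4, z)
probs
sum(probs)                      # the four slices cover the whole distribution
```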

39 Now we can fit, as usual:
    m <- mle2( y ~ dorderlogit( z=c(z1,z2,z3,-Inf) ), start=list( z1=1, z2=0, z3=-1 ) )
Coefficients: z1 z2 z3
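To see what the fitted cutoffs correspond to in the data, the fit can be reproduced on simulated ratings. This is a sketch under stated assumptions: the ratings are simulated (the course data are not available here), and base R's `optim()` stands in for `mle2()`. With no predictors, the fitted cumulative probabilities should land on the observed cumulative proportions.

```r
set.seed(42)
# Simulated 1-4 ratings (assumption; not the data from the slides)
y <- sample(1:4, size = 500, replace = TRUE, prob = c(0.1, 0.2, 0.4, 0.3))

logit <- function(x) 1 / (1 + exp(x))
dorderlogit <- function(x, z, log = FALSE) {
  p <- logit(z[x]) - logit(c(Inf, z)[x])
  if (log) log(p) else p
}

# Negative log-likelihood in z1, z2, z3 (z4 is fixed at -Inf)
nll <- function(par) {
  z <- c(par, -Inf)
  if (is.unsorted(rev(z), strictly = TRUE)) return(1e10)  # require z1 > z2 > z3
  -sum(dorderlogit(y, z, log = TRUE))
}

fit <- optim(c(1, 0, -1), nll, control = list(maxit = 5000, reltol = 1e-10))

fitted_cum   <- logit(c(fit$par, -Inf))
observed_cum <- cumsum(table(y)) / length(y)
rbind(fitted = fitted_cum, observed = observed_cum)
```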

40 Plotting the results:
    plot.new()
    frame()
    lines( c(0,1), c(.5,.5) )
    lines( c( logit(coef(m)[1]), logit(coef(m)[1]) ), c(0,1) ) # z1
    lines( c( logit(coef(m)[2]), logit(coef(m)[2]) ), c(0,1) ) # z2
    lines( c( logit(coef(m)[3]), logit(coef(m)[3]) ), c(0,1) ) # z3
    lines( c( logit(-Inf), logit(-Inf) ), c(0,1) ) # z4 = -Inf

41 That's nice, but we want to predict the ordinal outcome with some other variable. Suppose we have the gender of each respondent.
    > y
    [1] [47]
    > f
    [1] [47]

42 Usual solution is to treat the z values as category-specific intercepts and add a linear model to each: $\Pr(y_i \le k \mid k, z, \beta, x_i) = \mathrm{logit}(z_k + \beta x_i)$

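43 Just the intercepts:
    > z <- c( 1, 0, -1, -Inf )
    > logit( z )
    [1] 0.2689414 0.5000000 0.7310586 1.0000000
With covariate:
    > logit( z - 1 )
    [1] 0.5000000 0.7310586 0.8807971 1.0000000
    > logit( z + 1 )
    [1] 0.1192029 0.2689414 0.5000000 1.0000000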
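The cumulative probabilities are easier to read once they are turned back into per-category probabilities. The sketch below does that for the same shifts of -1, 0, and +1; `cat_probs` is a hypothetical helper name, and `eta` stands for the value of the linear model.

```r
logit <- function(x) 1 / (1 + exp(x))   # the slides' convention

# Per-category probabilities implied by cutoffs z and linear term eta
cat_probs <- function(z, eta) diff(c(0, logit(z + eta)))

z <- c(1, 0, -1, -Inf)
shifted <- rbind(
  "eta = -1" = cat_probs(z, -1),
  "eta =  0" = cat_probs(z,  0),
  "eta = +1" = cat_probs(z,  1)
)
colnames(shifted) <- paste0("y=", 1:4)
shifted
```

Under this convention a positive linear term lowers every cumulative probability, so it moves mass toward the higher categories.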

44 New density function:
    dorderlogit2 <- function( x, z, model, log=FALSE ) {
      p <- logit( z[x] + model )     # prob y <= k
      nz <- c( Inf, z )
      np <- logit( nz[x] + model )   # prob y <= k-1
      p <- p - np                    # subtract to get likelihood of y==k
      if ( log==TRUE ) p <- log(p)
      p
    }

45 New fit:
    m2 <- mle2( y ~ dorderlogit2( z=c(z1,z2,z3,-Inf), model=b*f ), start=list( z1=1, z2=0, z3=-1, b=0 ) )
Coefficients: z1 z2 z3 b

46 Visualize as before: [Figure: fitted cutpoints on the probability axis, shown separately for males and females] Coefficients: z1 z2 z3 b

47 Another way to visualize: [Figure: probability vs. Gender (female=1)]

48 [Figure: two panels of probability vs. Gender (female=1)]

49 You can compute proportional odds, as with regular logit. But now it is proportional change in cumulative probability.
    > # proportional cumulative odds calculation
    > z2 <- coef(m2)[2]
    > b <- coef(m2)[4]
    > # original odds
    > o1 <- logit(z2)/(1-logit(z2))
    > # female odds
    > o2 <- logit(z2+b)/(1-logit(z2+b))
    > # show same as exp(-b)
    > o2/o1 # proportional cumulative odds for category 2
    > exp(-b)
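The reason the ratio equals exp(-b) is that, with logit(x) = 1/(1+exp(x)), the cumulative odds simplify to exp(-x). The sketch below checks this for every cutoff at once, using made-up values for the cutoffs and the coefficient; `cum_odds` is a hypothetical helper name.

```r
logit <- function(x) 1 / (1 + exp(x))   # the slides' convention

# Cumulative odds Pr(y <= k) / Pr(y > k) at linear term eta
cum_odds <- function(z, eta) logit(z + eta) / (1 - logit(z + eta))

z <- c(1, 0, -1)   # hypothetical cutoffs (the last one, -Inf, has infinite odds)
b <- 0.5           # hypothetical coefficient

ratio <- cum_odds(z, b) / cum_odds(z, 0)
ratio              # the same value at every cutoff
exp(-b)
```

The ratio being identical across cutoffs is what "proportional" refers to.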

50 The easy way is to use the polr() function, which does almost exactly what we just did manually. polr() just uses logit(-z+b) where we used logit(z+b), so its intercepts have the opposite sign.
    library(MASS)
    mpolr <- polr( as.ordered(y) ~ f )
Coefficients: f Intercepts:

51 polr() just uses logit(-z+b) where we used logit(z+b), so its intercepts have the opposite sign.
    library(MASS)
    mpolr <- polr( as.ordered(y) ~ f )
Coefficients: f Intercepts:
    mle2(minuslogl = y ~ dorderlogit2(z = c(z1, z2, z3, -Inf), model = b * f), start = list(z1 = 1, z2 = 0, z3 = -1, b = 0))
Coefficients: z1 z2 z3 b
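The correspondence between the two fits can be checked on simulated data. This is a sketch under stated assumptions: the data are simulated from a latent-variable ordered logit, and base R's `optim()` stands in for `mle2()`. polr() parameterizes Pr(y <= k) as plogis(zeta_k - beta*f), so its intercepts should match the manual z values with the sign flipped, while the slope should match b.

```r
library(MASS)   # provides polr()

set.seed(7)
n <- 400
f <- rbinom(n, 1, 0.5)                        # simulated 0/1 predictor
latent <- 0.8 * f + rlogis(n)                 # latent-variable form of ordered logit
y <- cut(latent, c(-Inf, -1, 0, 1, Inf), labels = FALSE)

logit <- function(x) 1 / (1 + exp(x))         # the slides' convention
dorderlogit2 <- function(x, z, model, log = FALSE) {
  p <- logit(z[x] + model) - logit(c(Inf, z)[x] + model)
  if (log) log(p) else p
}

# Manual fit: parameters are z1, z2, z3, b (z4 fixed at -Inf)
nll <- function(par) {
  z <- c(par[1:3], -Inf)
  if (is.unsorted(rev(z), strictly = TRUE)) return(1e10)  # require z1 > z2 > z3
  -sum(dorderlogit2(y, z, model = par[4] * f, log = TRUE))
}
manual <- optim(c(1, 0, -1, 0), nll, control = list(maxit = 5000, reltol = 1e-12))

mpolr <- polr(as.ordered(y) ~ f)

# Put both fits on the manual scale: z = -zeta, b = beta
comparison <- rbind(manual = manual$par,
                    polr   = c(-unname(mpolr$zeta), unname(coef(mpolr))))
colnames(comparison) <- c("z1", "z2", "z3", "b")
comparison
```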

52 Ordered logit lets you: model ordered discrete data; derive histograms that change as predictor variables change. Hard to interpret without visualizing.

53 ordered _obit Other probability distributions work fine. probit => gaussian density ("probit" is just the name statisticians give to the cumulative normal). A Tobit regression is not ordered at all; it's a kind of censored gaussian model. We won't talk about it, but it's common in Econ and PoliSci.
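Swapping in the cumulative normal takes only a one-line change to the density. The sketch below is an assumption-laden illustration: `probit` and `dorderprobit` are hypothetical names, and `pnorm(-x)` is used so that the function decreases in x, mirroring the logit convention used earlier in these slides.

```r
# Cumulative normal, written to decrease in x like the slides' logit()
probit <- function(x) pnorm(-x)

dorderprobit <- function(x, z, log = FALSE) {
  p <- probit(z[x]) - probit(c(Inf, z)[x])   # Pr(y <= k) - Pr(y <= k-1)
  if (log) log(p) else p
}

z <- c(1, 0, -1, -Inf)       # example cutoffs
pp <- dorderprobit(1:4, z)
pp
sum(pp)
```

The same cutoffs imply different category probabilities than under the logistic, because the normal has thinner tails.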
