1 Statistics 249: Categorical Data/GLM Lecture 23


Curtis Powell

1.1 Review of Multiple Responses

First:

1. Each response is modelled as f(covariates), that is, a GLM.
2. How the responses are related to each other: $A * B = A + B + A{:}B$, where $A{:}B$ is the interaction.
3. (1) + (2): a global model, $A, B = f(\text{covariates})$.

The colon and semicolon. Now: A : x means A is a function of x. If you had two separate models, A : x and B : x, then one could abbreviate "A : x & B : x" as "A; B : x". So those are two independent functions of the covariates.

Factor analysis: A : A + x. This is a factor analysis where x is a continuous covariate. The linear predictor for response level $a_i$ is

$$\eta_i = \alpha_i + \beta x,$$

where $\alpha_i$ is the effect of level $i$ of factor A and $\beta$ is the regression parameter.

A different model is A : A + Ax,

$$\eta_i = \alpha_i + \beta_i x,$$

a model where the $i$th level has its own coefficient $\beta_i$.

Another model: A : A + Bx. In this model one has a two-way response,

$$\eta_i = \alpha_i + \beta_j x,$$

where $j$ is the level of factor B.

1.2 Chapter 8 Continuous Responses.

The $Y$'s are continuous responses. Previously we have seen the variance treated as a constant. In this case $\mathrm{Var}(Y) = \sigma^2 \mu^2$, where $E[Y] = \mu$.
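The three linear predictors reviewed in section 1.1 can be written out as a minimal sketch. All parameter values and function names below are hypothetical, chosen only to show how the intercept and slope indices differ between A : A + x, A : A + Ax and A : A + Bx.

```python
# Hypothetical parameter values, chosen only to illustrate the notation.
ALPHA = [0.5, 1.0, 1.5]       # alpha_i: effect of level i of factor A
BETA = 2.0                    # common slope for the model A : A + x
BETA_BY_A = [2.0, 2.5, 3.0]   # beta_i: one slope per level of A (A : A + Ax)
BETA_BY_B = [0.1, 0.2]        # beta_j: one slope per level of B (A : A + Bx)


def eta_common_slope(i, x):
    """A : A + x, eta = alpha_i + beta * x."""
    return ALPHA[i] + BETA * x


def eta_slope_by_a(i, x):
    """A : A + Ax, eta = alpha_i + beta_i * x."""
    return ALPHA[i] + BETA_BY_A[i] * x


def eta_slope_by_b(i, j, x):
    """A : A + Bx, eta = alpha_i + beta_j * x, with j the level of factor B."""
    return ALPHA[i] + BETA_BY_B[j] * x
```

In the first model the lines for different levels of A are parallel; in the second and third the slope changes with the level of A or of B respectively.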

One could have used a log transformation, since this is equivalent to assuming that $Y$ is log-normal:

$$\log_e Y = X^T \beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$

However, one could also consider the Gamma model:

$$Y_i \sim \mathrm{Gamma}(\mu_i, \nu_i),$$

where $\nu_i = \omega_i \nu$ and $\mu_i = f(x\beta)$, and the weights $\omega_i$ are known. We have:

$$E[Y] = \mu, \qquad \mathrm{Var}(Y) = \frac{\mu^2}{\nu}, \qquad \nu^{-1} = \sigma^2.$$

One can check this relationship if one has groups of observations with similar means. If the variance appears to increase quadratically with the mean, one can fit a Gamma distribution.

Why use one model over the other? There has always been discussion of transformation and linear models. Why would one fit a Gamma model? Most of the time:

- Underlying information about the process generating the data.
- The Gamma is often used as a distribution of arrival times (weighted sums). The arrival time of the $\nu$th event has a Gamma distribution.

Another reason to prefer the Gamma is the advantage of link functions.

Link Functions

- Canonical link, $\eta = 1/\mu$. Problem: the fitted mean can be negative.
- Log link.
- Identity link, a literal fit.

One thing to note, especially for the canonical link, is that we require $\mu > 0$. Previously, canonical links were functions that mapped $\mu$ from a restricted interval (for the binomial, $0 < p < 1$) onto all of $\mathbb{R}$. On that scale, one could think of using a linear model to fit the transformed mean. But with the canonical link of the Gamma distribution one is still stuck with the range 0 to infinity. If you have certain observations with exact sampling probability 0.

Sometimes you need to worry about how the mean is related to the covariates. Sometimes $\mu$ changes with $i$ in a systematic way:

$$Y_1, Y_2, \ldots, Y_n \sim \mathrm{Gamma}(\mu_i, \nu),$$

with the same index $\nu$ for all observations.
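The relationship Var(Y) = mu^2 / nu stated for the Gamma model can be checked by simulation. The following is a minimal sketch using only the Python standard library; the function names, sample size and parameter values are assumptions made for illustration, not part of the lecture.

```python
import random
import statistics


def simulate_gamma(mu, nu, n, rng):
    """Draw n values from a Gamma distribution with mean mu and index nu."""
    # random.gammavariate takes (shape, scale); scale = mu / nu gives mean mu.
    return [rng.gammavariate(nu, mu / nu) for _ in range(n)]


def squared_cv(sample):
    """Sample variance over squared sample mean; this estimates 1 / nu."""
    return statistics.variance(sample) / statistics.fmean(sample) ** 2


rng = random.Random(0)
for mu in (1.0, 10.0, 100.0):
    sample = simulate_gamma(mu, nu=4.0, n=20000, rng=rng)
    # The variance grows with mu^2, so the ratio stays near 1 / nu = 0.25.
    print(mu, round(squared_cv(sample), 3))
```

If the ratio of variance to squared mean is roughly constant across groups with different means, that supports the quadratic variance function and hence a Gamma (or log-normal) model.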

If each of the $Y_i$ is itself a sample, however, one would use the weights:

$$Y_i \sim \mathrm{Gamma}(\mu_i, \nu_i).$$

Often, when one has observations $Y_i = 0$, one should put a zero weight on the zero samples. Of course, if $\nu > 1$ the model density is zero at $Y_i = 0$. The model is very sensitive around 0: if you have a few observations that are too close to zero, your estimates may have problems.

In the log-linear model for $\log_e Y$, if you get values $Y = 0$, you should add a constant to all of the data. If $Y$ has a negative value, you should add enough to move the data away from 0. Unfortunately, it is all too likely that you get bad data points like this.

Estimation of Dispersion Parameter

That is, estimating $\sigma^2$.

1. Assume that $Y \sim \mathrm{Gamma}(\mu, \nu)$, that is, with $\sigma^2 = 1/\nu$, and estimate $\nu$ by maximum likelihood.

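2. For weights $\omega_i = 1$, the log likelihood (up to a term not involving the parameters) is

$$l(\nu, \mu \mid Y) = \sum_{i=1}^{n} \left( -\nu \frac{y_i}{\mu_i} - \nu \log_e \mu_i + \nu \log_e y_i + \nu \log_e \nu - \log_e \Gamma(\nu) \right).$$

3. The derivative of the log likelihood is 0 at the maximum point:

$$\left. \frac{\partial l}{\partial \nu} \right|_{\hat\nu, \hat\mu} = 0.$$

4. That is,

$$\sum_{i=1}^{n} \left( -\frac{y_i}{\hat\mu_i} - \log_e \hat\mu_i + \log_e y_i + \log_e \hat\nu + 1 - \frac{\Gamma'(\hat\nu)}{\Gamma(\hat\nu)} \right) = 0. \qquad (1)$$

Remember the deviance:

$$D(y, \hat\mu) = 2 \sum_{i=1}^{n} \left( -\log_e \frac{y_i}{\hat\mu_i} + \frac{y_i - \hat\mu_i}{\hat\mu_i} \right).$$

(Here $D$ is the deviance; the deviance divided by the dispersion parameter is the scaled deviance.)

So, inserting the deviance $D(y, \hat\mu)$ into the first half of equation (1), one reaches:

$$0 = -D(y, \hat\mu) + 2n \left( \log_e \hat\nu - \frac{\Gamma'(\hat\nu)}{\Gamma(\hat\nu)} \right). \qquad (2)$$

1.3 Second Half

So if we solved this equation, we would have solved for $\hat\nu$. This is the same situation we had with mixed effect models when we wanted to estimate the $\sigma^2$ parameter: if we find the MLE, this estimate is usually biased. As before, the estimate $\hat\nu$ is biased. The adjusted equation for $\hat\nu$ is similar:

$$2n \left( \log_e \hat\nu - \frac{\Gamma'(\hat\nu)}{\Gamma(\hat\nu)} \right) - p\,\hat\nu^{-1} = D(y, \hat\mu),$$

where $p\,\hat\nu^{-1}$ is the adjustment to the MLE and $p$ is the number of covariates in $\eta = X^T \beta$ (with $X^T$ of dimension $n \times p$ and $\beta$ of dimension $p \times 1$).

If we use the MLE and $\nu$ is large ($\sigma^2$ is small), then one can simplify equation (2) by setting all terms of order $1/\nu^k$ for $k > 2$ to zero. By solving that equation we find that:

$$\hat\nu^{-1} \approx \frac{\bar D\,(6 + \bar D)}{6 + 2\bar D},$$

where we define the average of the deviances:

$$\bar D = \frac{D(y, \hat\mu)}{n}.$$

1.4 A Remark

$$\Gamma(\nu) = \frac{1}{\nu} \exp\left( \sum_{k=1}^{\infty} \frac{(-1)^k S_k \nu^k}{k} \right),$$

where $S_1$ is the Euler-Mascheroni constant and $S_k = \zeta(k)$ for $k \ge 2$, with $\zeta(\cdot)$ the Riemann zeta function.

$$\frac{\Gamma'(\nu)}{\Gamma(\nu)} = -\left( \frac{1}{\nu} + S_1 + \sum_{n=1}^{\infty} \left( \frac{1}{n + \nu} - \frac{1}{n} \right) \right)$$

This provides an equation needed for simplifying the result we saw.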
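The estimating equations for the index can be solved numerically. The sketch below is one possible approach under stated assumptions, not the method used in the lecture: it uses bisection, a hand-written digamma function (the standard library has none), and hypothetical function names. Setting `p = 0` solves the maximum likelihood equation 2n(log nu - Gamma'(nu)/Gamma(nu)) = D, and `p > 0` solves the adjusted equation, assuming p < n so that the left-hand side is decreasing in nu.

```python
import math


def digamma(x):
    """Gamma'(x) / Gamma(x) for x > 0, via recurrence plus an asymptotic series."""
    total = 0.0
    while x < 6.0:              # psi(x) = psi(x + 1) - 1 / x
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return total + math.log(x) - 0.5 / x - series


def gamma_deviance(y, mu):
    """D(y, mu) = 2 * sum(-log(y_i / mu_i) + (y_i - mu_i) / mu_i)."""
    return 2.0 * sum(-math.log(yi / mi) + (yi - mi) / mi for yi, mi in zip(y, mu))


def solve_nu(deviance, n, p=0, lo=1e-6, hi=1e8):
    """Solve 2n(log nu - digamma(nu)) - p / nu = deviance by bisection.

    Assumes deviance > 0, p < n, and that the root lies between lo and hi.
    """
    def h(nu):
        return 2.0 * n * (math.log(nu) - digamma(nu)) - p / nu - deviance

    for _ in range(200):
        mid = math.sqrt(lo * hi)   # bisect on the log scale
        if h(mid) > 0.0:           # h is decreasing, so the root is above mid
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def approx_nu_inverse(dbar):
    """Large-nu approximation to 1 / nu from the average deviance."""
    return dbar * (6.0 + dbar) / (6.0 + 2.0 * dbar)
```

For a fitted model one would call `solve_nu(gamma_deviance(y, mu_hat), len(y), p)` and compare `1 / nu` with `approx_nu_inverse(D / n)`; the two should agree closely when the dispersion is small.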

1.5 Ben's shortcut

We know that:

$$\log_e(\hat\nu) \approx \frac{\Gamma'(\hat\nu)}{\Gamma(\hat\nu)}.$$

Hence we can write an approximation, a Taylor series approximation:

$$\frac{\Gamma'(\hat\nu)}{\Gamma(\hat\nu)} = \log_e(\hat\nu) - \frac{1}{2\hat\nu} - \sum_{n=1}^{\infty} \frac{B_{2n}}{2n\,\hat\nu^{2n}}.$$

Method of Moments estimator

Easier still:

$$\hat\sigma^2 = \frac{X^2}{n - p}, \qquad \text{where } X^2 = \sum_i \left( \frac{y_i - \hat\mu_i}{\hat\mu_i} \right)^2.$$

By the definition of the method of moments estimate, as long as $\hat\beta$ is an unbiased estimate of $\beta$, we should have a safe method for finding $\hat\sigma^2$.

Why do we care?

$$\mathrm{Cov}(\hat\beta) \approx \sigma^2 \left( X^T W X \right)^{-1}, \qquad W = \mathrm{diag}\left( \left( \frac{d\mu_i}{d\eta_i} \right)^2 \Big/ V(\mu_i) \right).$$

If $\sigma^2$ is unknown, then estimating it by maximum likelihood or by the method of moments should be equivalent.

Last news

Rima needs the titles of our talks, and the references, soon. She would like us to write an evaluation of her performance, and we need to hand it in with a copy of the last problem set for additional points. 2 points if you do not print out the evaluation sheets.
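The method of moments estimator from section 1.5 and the diagonal entries of W can be sketched in a few lines. The function names and the set of links handled are assumptions for illustration; the weight formula uses the Gamma variance function V(mu) = mu^2, consistent with Var(Y) = sigma^2 mu^2 earlier in these notes.

```python
def pearson_dispersion(y, mu_hat, p):
    """sigma^2 estimate X^2 / (n - p), with X^2 = sum(((y - mu) / mu)^2)."""
    n = len(y)
    if n <= p:
        raise ValueError("need more observations than fitted coefficients")
    x2 = sum(((yi - mi) / mi) ** 2 for yi, mi in zip(y, mu_hat))
    return x2 / (n - p)


def gamma_weight(mu, link):
    """Diagonal entry of W: (dmu/deta)^2 / V(mu), with V(mu) = mu^2."""
    if link == "log":          # mu = exp(eta), so dmu/deta = mu
        dmu_deta = mu
    elif link == "inverse":    # mu = 1 / eta, so dmu/deta = -mu^2
        dmu_deta = -mu * mu
    elif link == "identity":   # mu = eta, so dmu/deta = 1
        dmu_deta = 1.0
    else:
        raise ValueError("unknown link: " + link)
    return dmu_deta ** 2 / (mu * mu)
```

Under the log link every weight equals 1, so in that case the approximate covariance of the coefficient estimates reduces to sigma^2 times the inverse of X^T X.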
