10. Generalized linear models

George Shaw
1 10. Generalized linear models
10.1 Homogeneous models: exponential families of distributions, link functions, likelihood estimation
10.2 Example: Tort filings
10.3 Marginal models and GEE
10.4 Random effects models
10.5 Fixed effects models: maximum likelihood, conditional likelihood, Poisson data
10.6 Bayesian inference
Appendix 10A Exponential families of distributions

2 10.1 Homogeneous models
Section outline: exponential families of distributions; link functions; likelihood estimation.
In this section, we consider only independent responses: no serial correlation, and no random effects that would induce serial correlation.

3 Exponential families of distributions
The basic one-parameter exponential family is
$$p(y, \theta, \phi) = \exp\left( \frac{y\theta - b(\theta)}{\phi} + S(y, \phi) \right).$$
Here, $y$ is a response and $\theta$ is the parameter of interest. The parameter $\phi$ is a scale parameter that we often will assume is known. The term $b(\theta)$ depends only on the parameter $\theta$, not the responses. $S(y, \phi)$ depends only on the responses and the scale parameter, not the parameter $\theta$. The response may be discrete or continuous.
Some straightforward calculations show that $\mathrm{E}\,y = b'(\theta)$ and $\mathrm{Var}\,y = b''(\theta)\,\phi$.

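4 Special cases of the basic exponential family
Normal. The probability density function is
$$f(y, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) = \exp\left( \frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2) \right).$$
Take $\mu = \theta$, $\sigma^2 = \phi$, $b(\theta) = \theta^2/2$ and $S(y, \phi) = -y^2/(2\phi) - \ln(2\pi\phi)/2$. Note that $\mathrm{E}\,y = b'(\theta) = \theta = \mu$ and $\mathrm{Var}\,y = b''(\theta)\,\sigma^2 = \sigma^2$.
Binomial, $n$ trials and probability $\pi$ of success. The probability mass function is
$$f(y, \pi) = \binom{n}{y} \pi^y (1-\pi)^{n-y} = \exp\left( y \ln\frac{\pi}{1-\pi} + n\ln(1-\pi) + \ln\binom{n}{y} \right).$$
Take $\ln(\pi/(1-\pi)) = \mathrm{logit}(\pi) = \theta$, $1 = \phi$, $b(\theta) = n\ln(1 + e^\theta)$ and $S(y, \phi) = \ln\binom{n}{y}$. Note that $\mathrm{E}\,y = b'(\theta) = n e^\theta/(1+e^\theta) = n\pi$ and $\mathrm{Var}\,y = b''(\theta)(1) = n e^\theta/(1+e^\theta)^2 = n\pi(1-\pi)$, as anticipated.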

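5 Another special case of the basic exponential family
Poisson. The probability mass function is
$$f(y, \lambda) = \frac{\lambda^y \exp(-\lambda)}{y!} = \exp\left( -\lambda + y\ln\lambda - \ln y! \right).$$
Take $\ln\lambda = \theta$, $1 = \phi$, $b(\theta) = e^\theta$ and $S(y, \phi) = -\ln(y!)$. Note that $\mathrm{E}\,y = b'(\theta) = e^\theta = \lambda$ and $\mathrm{Var}\,y = b''(\theta)(1) = e^\theta = \lambda$, as anticipated.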
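The moment identities $\mathrm{E}\,y = b'(\theta)$ and $\mathrm{Var}\,y = b''(\theta)\phi$ can be checked numerically. The following is a minimal sketch for the Poisson case only, assuming an illustrative mean of 3.5; the helper names and the finite-difference step are choices made here, not part of the slides.

```python
import math

def poisson_pmf(y, lam):
    """Poisson mass function, computed on the log scale for stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def b(theta):
    """Cumulant function b(theta) = exp(theta) for the Poisson family."""
    return math.exp(theta)

def derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

lam = 3.5               # illustrative Poisson mean (assumed value)
theta = math.log(lam)   # natural parameter, theta = ln(lambda)
phi = 1.0               # scale parameter for the Poisson

# Moments by direct summation of the mass function (the tail past 200 is negligible).
mean_direct = sum(y * poisson_pmf(y, lam) for y in range(200))
var_direct = sum((y - mean_direct) ** 2 * poisson_pmf(y, lam) for y in range(200))

# Moments from the cumulant function.
mean_from_b = derivative(b, theta)
var_from_b = second_derivative(b, theta) * phi
```

Both routes should agree up to the finite-difference error, and both should return the Poisson mean for the variance as well.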

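6 Table 10A.1 Selected Distributions of the One-Parameter Exponential Family

| Distribution | Parameters | Density or mass function | $\theta$ | $\phi$ | $b(\theta)$ | $S(y,\phi)$ | $\mathrm{E}\,y$ | $\mathrm{Var}\,y$ |
|---|---|---|---|---|---|---|---|---|
| General | $\theta, \phi$ | $\exp\left(\frac{y\theta - b(\theta)}{\phi} + S(y,\phi)\right)$ | $\theta$ | $\phi$ | $b(\theta)$ | $S(y,\phi)$ | $b'(\theta)$ | $b''(\theta)\phi$ |
| Normal | $\mu, \sigma^2$ | $\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$ | $\mu$ | $\sigma^2$ | $\theta^2/2$ | $-\left(\frac{y^2}{2\phi} + \frac{\ln(2\pi\phi)}{2}\right)$ | $\theta = \mu$ | $\phi = \sigma^2$ |
| Binomial | $\pi$ | $\binom{n}{y}\pi^y(1-\pi)^{n-y}$ | $\ln\frac{\pi}{1-\pi}$ | $1$ | $n\ln(1+e^\theta)$ | $\ln\binom{n}{y}$ | $\frac{n e^\theta}{1+e^\theta} = n\pi$ | $\frac{n e^\theta}{(1+e^\theta)^2} = n\pi(1-\pi)$ |
| Poisson | $\lambda$ | $\frac{\lambda^y \exp(-\lambda)}{y!}$ | $\ln\lambda$ | $1$ | $e^\theta$ | $-\ln(y!)$ | $e^\theta = \lambda$ | $e^\theta = \lambda$ |
| Gamma | $\alpha, \beta$ | $\frac{\beta^\alpha}{\Gamma(\alpha)} y^{\alpha-1}\exp(-y\beta)$ | $-\frac{\beta}{\alpha}$ | $\frac{1}{\alpha}$ | $-\ln(-\theta)$ | $-\phi^{-1}\ln\phi - \ln\Gamma(\phi^{-1}) + (\phi^{-1}-1)\ln y$ | $-\frac{1}{\theta} = \frac{\alpha}{\beta}$ | $\frac{\phi}{\theta^2} = \frac{\alpha}{\beta^2}$ |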

7 Link functions
To link up the univariate exponential family with regression problems, we define the systematic component of $y_{it}$ to be $\eta_{it} = x_{it}'\beta$.
The idea is to now choose a link between the systematic component and the mean of $y_{it}$, say $\mu_{it}$, of the form $\eta_{it} = g(\mu_{it})$. Here $g(\cdot)$ is the link function.
Linear combinations of explanatory variables, $\eta_{it} = x_{it}'\beta$, may vary between negative and positive infinity. However, means may be restricted to a smaller range. For example, Poisson means vary between zero and infinity.
The link function serves to map the domain of the mean function onto the whole real line.

8 Bernoulli illustration of links
Bernoulli means vary between 0 and 1, although linear combinations of explanatory variables may vary between negative and positive infinity.
Here are three important examples of link functions for the Bernoulli distribution:
Logit: $\eta = g(\mu) = \mathrm{logit}(\mu) = \ln(\mu/(1-\mu))$.
Probit: $\eta = g(\mu) = \Phi^{-1}(\mu)$, where $\Phi^{-1}$ is the inverse of the standard normal distribution function.
Complementary log-log: $\eta = g(\mu) = \ln(-\ln(1-\mu))$.
Each function maps the unit interval (0, 1) onto the whole real line.
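The three Bernoulli links and their inverses can be written down directly. This is a small sketch using only the Python standard library; the function names and the `LINKS` dictionary are illustrative choices rather than anything defined in the slides.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def logit(mu):
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def probit(mu):
    return _STD_NORMAL.inv_cdf(mu)

def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)

def cloglog(mu):
    return math.log(-math.log(1.0 - mu))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

# Each entry pairs a link g with its inverse g^{-1}.
LINKS = {
    "logit": (logit, inv_logit),
    "probit": (probit, inv_probit),
    "cloglog": (cloglog, inv_cloglog),
}

def round_trip_error(mu):
    """Largest |g^{-1}(g(mu)) - mu| over the three links."""
    return max(abs(inv(g(mu)) - mu) for g, inv in LINKS.values())
```

Each link takes a mean in (0, 1) to the real line, and its inverse takes any value of the systematic component back to a valid Bernoulli mean.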

9 Canonical links
As we have seen with the Bernoulli, there are several link functions that may be suitable for a particular distribution.
When the systematic component equals the parameter of interest ($\eta = \theta$), this is an intuitively appealing case. That is, the parameter of interest, $\theta$, equals a linear combination of explanatory variables, $\eta$.
Recall that $\eta = g(\mu)$ and $\mu = b'(\theta)$. Thus, if $g^{-1} = b'$, then $\eta = g(b'(\theta)) = \theta$.
The choice of $g$ such that $g^{-1} = b'$ is called a canonical link.
Examples, with the link written as a function of the mean $\mu$: Normal: $g(\mu) = \mu$; Binomial: $g(\mu) = \mathrm{logit}(\mu)$; Poisson: $g(\mu) = \ln\mu$.
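As a worked illustration of the condition $g^{-1} = b'$, the canonical links for the Poisson and for the Bernoulli (binomial with $n = 1$) follow from inverting $b'$, using the cumulant functions given for the special cases:

```latex
\begin{align*}
\text{Poisson:}\quad & b(\theta) = e^{\theta}, \qquad
  \mu = b'(\theta) = e^{\theta}
  \;\Longrightarrow\; \theta = \ln \mu,
  \quad\text{so } g(\mu) = \ln \mu. \\[4pt]
\text{Bernoulli:}\quad & b(\theta) = \ln\!\left(1 + e^{\theta}\right), \qquad
  \mu = b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}}
  \;\Longrightarrow\; \theta = \ln\frac{\mu}{1-\mu},
  \quad\text{so } g(\mu) = \operatorname{logit}(\mu).
\end{align*}
```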

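10 Estimation
Begin with likelihood estimation for canonical links.
Consider responses $y_{it}$, with mean $\mu_{it}$, systematic component $\eta_{it} = g(\mu_{it}) = x_{it}'\beta$ and canonical link so that $\eta_{it} = \theta_{it}$.
Assume the responses are independent. Then, the log-likelihood is
$$\ln p(y) = \sum_{it} \left\{ \frac{y_{it}\theta_{it} - b(\theta_{it})}{\phi} + S(y_{it}, \phi) \right\} = \sum_{it} \left\{ \frac{y_{it}(x_{it}'\beta) - b(x_{it}'\beta)}{\phi} + S(y_{it}, \phi) \right\}.$$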

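11 MLEs - Canonical links
The log-likelihood is
$$\ln p(y) = \sum_{it} \left\{ \frac{y_{it}(x_{it}'\beta) - b(x_{it}'\beta)}{\phi} + S(y_{it}, \phi) \right\}.$$
Taking the partial derivative with respect to $\beta$ yields the score equations:
$$0 = \sum_{it} x_{it}\, \frac{y_{it} - b'(x_{it}'\beta)}{\phi} = \sum_{it} x_{it}\, \frac{y_{it} - \mu_{it}}{\phi},$$
because $\mu_{it} = b'(\theta_{it}) = b'(x_{it}'\beta)$.
Thus, we can solve for the MLEs of $\beta$ through $0 = \sum_{it} x_{it}(y_{it} - \mu_{it})$.
This is a special case of the method of moments.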
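The score equations $0 = \sum x (y - \mu)$ generally have no closed-form solution and are solved iteratively. A minimal sketch follows, assuming a Poisson response with the canonical log link, an intercept plus one covariate, and made-up data; the names `fit_poisson` and `score` are illustrative, and Newton-Raphson is one common choice of solver rather than a method prescribed by the slides.

```python
import math

def fit_poisson(xs, ys, iterations=25):
    """Solve 0 = sum_i x_i (y_i - mu_i) for a Poisson model with log link.

    Each design vector is (1, x): an intercept plus one covariate.
    Newton-Raphson is used; the information matrix is sum_i mu_i x_i x_i'.
    """
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0   # start at the intercept-only fit
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: sum of x_i (y_i - mu_i).
        s0 = sum(y - m for y, m in zip(ys, mus))
        s1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Information matrix entries: sum of mu_i x_i x_i'.
        i00 = sum(mus)
        i01 = sum(m * x for x, m in zip(xs, mus))
        i11 = sum(m * x * x for x, m in zip(xs, mus))
        det = i00 * i11 - i01 * i01
        # Newton step beta <- beta + I^{-1} s, using the explicit 2x2 inverse.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (-i01 * s0 + i00 * s1) / det
    return b0, b1

def score(beta, xs, ys):
    """Score vector evaluated at beta = (b0, b1)."""
    b0, b1 = beta
    mus = [math.exp(b0 + b1 * x) for x in xs]
    return (sum(y - m for y, m in zip(ys, mus)),
            sum(x * (y - m) for x, y, m in zip(xs, ys, mus)))

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # made-up covariate values
ys = [1, 3, 2, 6, 9, 14]              # made-up counts
beta_hat = fit_poisson(xs, ys)
```

At the solution, the first score component says that fitted means sum to the observed total, which is the method-of-moments reading of the equations.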

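12 MLEs - general links
For general links, we no longer assume the relation $\theta_{it} = x_{it}'\beta$. We assume that $\beta$ is related to $\theta_{it}$ through $\mu_{it} = b'(\theta_{it})$ and $\eta_{it} = x_{it}'\beta = g(\mu_{it})$.
Recall that the log-likelihood is
$$\ln p(y) = \sum_{it} \left\{ \frac{y_{it}\theta_{it} - b(\theta_{it})}{\phi} + S(y_{it}, \phi) \right\}.$$
Further, $\mathrm{E}\,y_{it} = \mu_{it}$ and $\mathrm{Var}\,y_{it} = b''(\theta_{it})\,\phi$.
The $j$th element of the score function is
$$\frac{\partial}{\partial\beta_j} \ln p(y) = \sum_{it} \frac{\partial\theta_{it}}{\partial\beta_j}\, \frac{y_{it} - \mu_{it}}{\phi},$$
because $b'(\theta_{it}) = \mu_{it}$.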

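13 MLEs - more on general links
To eliminate $\theta_{it}$, we use the chain rule to get
$$\frac{\partial\mu_{it}}{\partial\beta_j} = \frac{\partial b'(\theta_{it})}{\partial\beta_j} = b''(\theta_{it})\, \frac{\partial\theta_{it}}{\partial\beta_j} = \frac{\mathrm{Var}\,y_{it}}{\phi}\, \frac{\partial\theta_{it}}{\partial\beta_j}.$$
Thus,
$$\frac{\partial\theta_{it}}{\partial\beta_j} = \frac{\partial\mu_{it}}{\partial\beta_j}\, \frac{\phi}{\mathrm{Var}\,y_{it}}.$$
This yields
$$\frac{\partial}{\partial\beta_j} \ln p(y) = \sum_{it} \frac{\partial\mu_{it}}{\partial\beta_j}\, (\mathrm{Var}\,y_{it})^{-1} (y_{it} - \mu_{it}).$$
This is called the generalized estimating equations form.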
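To see how the generalized estimating equations form specializes, here is a worked sketch for two cases. The Poisson case with the log link recovers the canonical-link score equations; the Bernoulli case with the probit link does not simplify. The symbol $\varphi$ for the standard normal density is notation introduced here for illustration only.

```latex
\begin{align*}
\text{Poisson, log link:}\quad
  & \mu_{it} = \exp(x_{it}'\beta), \qquad
    \frac{\partial \mu_{it}}{\partial \beta_j} = \mu_{it}\, x_{it,j}, \qquad
    \operatorname{Var} y_{it} = \mu_{it}, \\
  & \frac{\partial}{\partial \beta_j} \ln p(y)
    = \sum_{it} \mu_{it}\, x_{it,j}\, \mu_{it}^{-1}\, (y_{it} - \mu_{it})
    = \sum_{it} x_{it,j}\, (y_{it} - \mu_{it}). \\[6pt]
\text{Bernoulli, probit link:}\quad
  & \mu_{it} = \Phi(x_{it}'\beta), \qquad
    \frac{\partial \mu_{it}}{\partial \beta_j} = \varphi(x_{it}'\beta)\, x_{it,j}, \qquad
    \operatorname{Var} y_{it} = \mu_{it}(1 - \mu_{it}), \\
  & \frac{\partial}{\partial \beta_j} \ln p(y)
    = \sum_{it} x_{it,j}\,
      \frac{\varphi(x_{it}'\beta)}{\mu_{it}(1 - \mu_{it})}\, (y_{it} - \mu_{it}).
\end{align*}
```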

14 Overdispersion
When fitting models to data with binary or count dependent variables, it is common to observe that the variance exceeds that anticipated by the fit of the mean parameters. This phenomenon is known as overdispersion. A probabilistic model may be available to explain this phenomenon.
In many situations, analysts are content to postulate an approximate model through the relation $\mathrm{Var}\,y_{it} = \sigma^2\, \phi\, b''(x_{it}'\beta) / w_{it}$.
The scale parameter $\phi$ is specified through the choice of the distribution. The scale parameter $\sigma^2$ allows for extra variability.
When the additional scale parameter $\sigma^2$ is included, it is customary to estimate it by Pearson's chi-square statistic divided by the error degrees of freedom. That is,
$$\hat\sigma^2 = \frac{1}{N-K} \sum_{it} w_{it}\, \frac{\left( y_{it} - b'(x_{it}' b_{MLE}) \right)^2}{\phi\, b''(x_{it}' b_{MLE})}.$$
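The Pearson-based scale estimate is a short computation once fitted means are available. The sketch below assumes the fitted means are supplied directly and that the variance function is passed as a function of the mean (the default, $v(\mu) = \mu$, is the Poisson case); the function name, argument names, and data are illustrative.

```python
def pearson_scale(ys, mus, n_params, weights=None, phi=1.0, var_fn=lambda mu: mu):
    """Pearson chi-square statistic divided by the error degrees of freedom N - K.

    var_fn(mu) plays the role of b''(x'b) expressed through the fitted mean;
    the default var_fn(mu) = mu is the Poisson variance function.
    """
    if weights is None:
        weights = [1.0] * len(ys)
    chi_square = sum(w * (y - m) ** 2 / (phi * var_fn(m))
                     for y, m, w in zip(ys, mus, weights))
    return chi_square / (len(ys) - n_params)

ys = [2, 0, 7, 1, 12, 3]                # made-up counts
mus = [3.0, 2.0, 4.0, 3.0, 6.0, 4.0]    # made-up fitted means
sigma2_hat = pearson_scale(ys, mus, n_params=2)
```

A value well above 1 for count data fitted with a Poisson variance function is the usual symptom of overdispersion.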

15 10.2 Example: Tort filings
Table 10.2 State Characteristics

Dependent variable
NUMFILE: Number of filings of tort actions against insurance companies.

State legal characteristics
JSLIAB: An indicator of joint and several liability reform.
COLLRULE: An indicator of collateral source reform.
CAPS: An indicator of caps on non-economic reform.
PUNITIVE: An indicator of limits of punitive damage.

State economic and demographic characteristics
POP: The state population, in millions.
POPLAWYR: The population per lawyer.
VEHCMILE: Number of automobile miles per mile of road, in thousands.
POPDENSY: Number of people per ten square miles of land.
WCMPMAX: Maximum workers' compensation weekly benefit.
URBAN: Percentage of population living in urban areas.
UNEMPLOY: State unemployment rate, in percentages.

Source: An Empirical Study of the Effects of Tort Reforms on the Rate of Tort Filings, unpublished Ph.D. Dissertation, Han-Duck Lee, University of Wisconsin (1994).

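16 Table 10.3 Averages with Explanatory Indicator Variables

| Explanatory variable | JSLIAB | COLLRULE | CAPS | PUNITIVE |
|---|---|---|---|---|
| Average NUMFILE when explanatory variable = 0 | 15,530 | 20,727 | 24,682 | 17,693 |
| Average NUMFILE when explanatory variable = 1 | 25,967 | 20,027 | 6,727 | 26,469 |

The row giving the average of each explanatory variable is not preserved.

Table: Summary Statistics for Other Variables
Columns: Variable, Mean, Median, Minimum, Maximum, Standard deviation, Correlation with NUMFILE.
Rows: NUMFILE, POP, POPLAWYR, VEHCMILE, POPDENSY, WCMPMAX, URBAN, UNEMPLOY. The numeric entries are not preserved.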

17 Offsets
We assume that $y_{it}$ is Poisson distributed with parameter $POP_{it} \exp(x_{it}'\beta)$, where $POP_{it}$ is the population of the $i$th state at time $t$.
In GLM terminology, a variable with a known coefficient equal to 1 is known as an offset.
Using logarithmic population, our Poisson parameter for $y_{it}$ is
$$\exp\left( \ln POP_{it} + x_{it,1}\beta_1 + \cdots + x_{it,K}\beta_K \right) = \exp\left( \ln POP_{it} + x_{it}'\beta \right) = POP_{it} \exp(x_{it}'\beta).$$
An alternative approach is to use the average number of tort filings as the response and assume approximate normality.
Note that in the Poisson model above the expectation of the average response is $\mathrm{E}(y_{it}/POP_{it}) = \exp(x_{it}'\beta)$, whereas the variance is $\mathrm{Var}(y_{it}/POP_{it}) = \exp(x_{it}'\beta)/POP_{it}$.
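The effect of the offset on the mean and variance of the per-capita response can be illustrated with a few lines of code. This is a sketch with hypothetical coefficients and populations; the function names are not from the slides.

```python
import math

def poisson_mean_with_offset(pop, x, beta):
    """Poisson mean exp(ln POP + x'beta); ln POP enters with coefficient fixed at 1."""
    eta = math.log(pop) + sum(xk * bk for xk, bk in zip(x, beta))
    return math.exp(eta)

def rate_moments(pop, x, beta):
    """Mean and variance of y / POP when y is Poisson with the mean above."""
    mean_count = poisson_mean_with_offset(pop, x, beta)
    # Var(y / POP) = Var(y) / POP^2 and Var(y) equals the Poisson mean.
    return mean_count / pop, mean_count / pop ** 2

beta = [-1.0, 0.2]   # hypothetical coefficients: intercept and one covariate
x = [1.0, 3.0]       # design vector: constant and covariate value
mean_small, var_small = rate_moments(2.0, x, beta)    # population of 2 million
mean_large, var_large = rate_moments(20.0, x, beta)   # population of 20 million
```

The expected rate is the same for both states, while the variance of the rate shrinks in proportion to population, which is why treating the average response as homoscedastic normal is only an approximation.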

18 Tort filings
Purpose: to understand ways in which state legal, economic and demographic characteristics affect the number of filings.
Table 10.3 suggests more filings under JSLIAB and PUNITIVE but fewer under CAPS.
Table 10.5:
All variables under the homogeneous model are statistically significant.
However, the estimated scale parameter seems important. With it, only JSLIAB is (positively) statistically significant.
The time (categorical) variable seems important.

19 Table 10.5 Tort Filings Model Coefficient Estimates
Based on N = 112 observations from n = 19 states and T = 6 years. Logarithmic population is used as an offset.
Models (each reporting a parameter estimate and a p-value): homogeneous Poisson model; model with estimated scale parameter; model with scale parameter and time categorical variable.
Rows: Intercept, POPLAWYR/, VEHCMILE/, POPDENSY/, WCMPMAX/, URBAN/, UNEMPLOY, JSLIAB, COLLRULE, CAPS, PUNITIVE, Scale, Deviance, Pearson Chi-Square.
The coefficient estimates and most p-values are not preserved. Surviving fragments: Deviance 118,…; …; …,496.4. Pearson Chi-Square 129,…; …; …,073.9.

20 10.3 Marginal models
This approach reduces the reliance on the distributional assumptions by focusing on the first two moments.
We first assume that the variance is a known function of the mean up to a scale parameter, that is, $\mathrm{Var}\,y_{it} = v(\mu_{it})\,\phi$.
This is a consequence of the exponential family, although now it is a basic assumption. That is, in the GLM setting, we have $\mathrm{Var}\,y_{it} = b''(\theta_{it})\,\phi$ and $\mu_{it} = b'(\theta_{it})$. Because $b(\cdot)$ and $\phi$ are assumed known, $\mathrm{Var}\,y_{it}$ is a known function of $\mu_{it}$.
We also assume that the correlation between two observations within the same subject is a known function of their means, up to a vector of parameters $\tau$. That is, $\mathrm{corr}(y_{ir}, y_{is}) = \rho(\mu_{ir}, \mu_{is}, \tau)$, for $\rho(\cdot)$ known.

21 Marginal model
This framework incorporates the linear model nicely; we simply use a GLM with a normal distribution. However, for nonlinear situations, a correlation is not always the best way to capture dependencies among observations.
Here is some notation to help see the estimation procedures.
Define $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{iT_i})'$ to be the vector of means for the $i$th subject.
To express the variance-covariance matrix, we define a diagonal matrix of variances $V_i = \mathrm{diag}(v(\mu_{i1}), \ldots, v(\mu_{iT_i}))$ and the matrix of correlations $R_i(\tau)$ to be a matrix with $\rho(\mu_{ir}, \mu_{is}, \tau)$ in the $r$th row and $s$th column.
Thus, $\mathrm{Var}\,y_i = V_i^{1/2} R_i(\tau) V_i^{1/2}$.

22 Generalized estimating equations
These assumptions are suitable for a method of moments estimation procedure called generalized estimating equations (GEE) in biostatistics, also known as the generalized method of moments (GMM) in econometrics.
GEE with known correlation parameter. Assuming $\tau$ is known, the GEE is
$$\mathbf{0}_K = \sum_{i=1}^{n} G_\mu(b)'\, V_i(b)^{-1}\, (y_i - \mu_i(b)).$$
Here, the matrix
$$G_\mu(\beta) = \left( \frac{\partial\mu_{i1}}{\partial\beta}, \ldots, \frac{\partial\mu_{iT_i}}{\partial\beta} \right)'$$
is $T_i \times K$.
For linear models with $\mu_{it} = z_{it}'\alpha_i + x_{it}'\beta$, this is the GLS estimator introduced in Section 3.3.

23 Consistency of GEEs
The solution, $b_{EE}$, is asymptotically normal with covariance matrix
$$\mathrm{Var}\,b_{EE} = \left( \sum_{i=1}^{n} G_\mu(b)'\, V_i(b)^{-1}\, G_\mu(b) \right)^{-1}.$$
Because this is a function of the means, $\mu_i$, it can be consistently estimated.

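24 Robust estimation of standard errors
Empirical standard errors may be calculated using the following estimator of the asymptotic variance of $b_{EE}$:
$$\left( \sum_{i=1}^{n} G_\mu' V_i^{-1} G_\mu \right)^{-1} \left( \sum_{i=1}^{n} G_\mu' V_i^{-1} (y_i - \mu_i)(y_i - \mu_i)' V_i^{-1} G_\mu \right) \left( \sum_{i=1}^{n} G_\mu' V_i^{-1} G_\mu \right)^{-1}.$$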
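The difference between the model-based and the empirical (sandwich) variance is easiest to see with a single regression coefficient, where every matrix reduces to a number. The sketch below assumes a Poisson mean $\mu = \exp(x\beta)$ with no intercept, variance function $v(\mu) = \mu$, scale $\phi = 1$, and an independence working correlation; the data, the value of `beta`, and the function name are hypothetical.

```python
import math

def gee_variances(subjects, beta):
    """Model-based and empirical (sandwich) variance of a scalar GEE estimator.

    Assumes mu = exp(x * beta), v(mu) = mu, and an independence working
    correlation. `subjects` is a list of (xs, ys) pairs, one pair per subject.
    """
    bread = 0.0   # sum_i G_i' V_i^{-1} G_i, with G_it = mu_it * x_it
    meat = 0.0    # sum_i (G_i' V_i^{-1} (y_i - mu_i))^2
    for xs, ys in subjects:
        mus = [math.exp(x * beta) for x in xs]
        bread += sum(m * x * x for x, m in zip(xs, mus))
        # G_i' V_i^{-1} (y_i - mu_i) collapses to sum_t x_it (y_it - mu_it).
        subject_score = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        meat += subject_score ** 2
    return 1.0 / bread, meat / bread ** 2

subjects = [
    ([1.0, 2.0], [2, 5]),   # made-up covariates and counts, subject 1
    ([1.0, 1.5], [1, 4]),   # subject 2
]
model_based, empirical = gee_variances(subjects, beta=0.6)
```

The empirical version squares each subject's total score contribution, so within-subject dependence that the working correlation ignores still shows up in the standard error.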

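25 GEE - correlation parameter estimation
For GEEs with unknown correlation parameters, Prentice (1988) suggests using a second estimating equation of the form:
$$\sum_i \left( \frac{\partial\, \mathrm{E}\,y_i^*}{\partial\tau} \right)' W_i^{-1} \left( y_i^* - \mathrm{E}\,y_i^* \right) = 0,$$
where
$$y_i^* = \left( y_{i1}y_{i2},\; y_{i1}y_{i3},\; \ldots,\; y_{i,T_i-1}y_{iT_i},\; y_{i1}^2,\; y_{i2}^2,\; \ldots,\; y_{iT_i}^2 \right)'.$$
Diggle, Liang and Zeger (1994) suggest using the identity matrix for $W_i$ for most discrete data.
However, for binary responses, they note that the last $T_i$ observations are redundant because $y_{it} = y_{it}^2$ and should be ignored. They recommend using
$$W_i = \mathrm{diag}\left( \mathrm{Var}(y_{i1}y_{i2}), \ldots, \mathrm{Var}(y_{i,T_i-1}y_{iT_i}) \right).$$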

26 Tort filings
Assume an independent working correlation. This yields the same parameter estimators as in Table 10.5, under the homogeneous Poisson model with an estimated scale parameter.
JSLIAB is (positively) statistically significant, using both model-based and robust standard errors.
To test the robustness of this model fit, we fit the same model with an AR(1) working correlation. Again, JSLIAB is (positively) statistically significant.
It is interesting that CAPS is now borderline but in the opposite direction to that suggested by Table 10.3.

27 Table 10.6 Comparison of GEE Estimators
All models use an estimated scale parameter. Logarithmic population is used as an offset.
Columns: for each of the independent working correlation and the AR(1) working correlation, the estimate, the empirical standard error, and the model-based standard error.
Rows: Intercept, POPLAWYR/, VEHCMILE/, POPDENSY/, WCMPMAX/, URBAN/, UNEMPLOY, JSLIAB, COLLRULE, CAPS, PUNITIVE, Scale, AR(1) Coefficient.
Most numeric entries are not preserved. Surviving entries: UNEMPLOY 0.087*, JSLIAB 0.177*.
The asterisk (*) indicates that the estimate is more than twice the empirical standard error, in absolute value.

28 10.4 Random effects models
The motivation and sampling issues regarding random effects were introduced in Chapter 3.
The model is easiest to introduce and interpret in the following hierarchical fashion:
1. Subject effects $\{\alpha_i\}$ are a random sample from a distribution that is known up to a vector of parameters $\tau$.
2. Conditional on $\{\alpha_i\}$, the responses $\{y_{i1}, y_{i2}, \ldots, y_{iT_i}\}$ are a random sample from a GLM with systematic component $\eta_{it} = z_{it}'\alpha_i + x_{it}'\beta$.

29 Random effects models
This model is a generalization of:
1. The linear random effects model in Chapter 3, using a normal distribution.
2. The binary dependent variables random effects model, using a Bernoulli distribution. (In Section 9.2, we focused on the case $z_{it} = 1$.)
Because we are sampling from a known distribution with a finite/small number of parameters, the maximum likelihood method of estimation is readily available. We will use this method, assuming normally distributed random effects.
Also available in the literature is the EM (expectation-maximization) algorithm for estimation; see Diggle, Liang and Zeger (1994).

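30 Random effects likelihood
Conditional on $\alpha_i$, the likelihood for the $i$th subject at the $t$th observation is
$$\exp\left( \frac{y_{it}\theta_{it} - b(\theta_{it})}{\phi} + S(y_{it}, \phi) \right),$$
where $b'(\theta_{it}) = \mathrm{E}(y_{it} \mid \alpha_i)$ and $\eta_{it} = z_{it}'\alpha_i + x_{it}'\beta = g(\mathrm{E}(y_{it} \mid \alpha_i))$.
Conditional on $\alpha_i$, the likelihood for the $i$th subject is
$$\exp\left( \sum_t \left\{ \frac{y_{it}\theta_{it} - b(\theta_{it})}{\phi} + S(y_{it}, \phi) \right\} \right).$$
We take expectations over $\alpha_i$ to get the (unconditional) likelihood. To see this explicitly, let's use the canonical link so that $\theta_{it} = \eta_{it}$. The (unconditional) likelihood for the $i$th subject is
$$l_i = \exp\left( \sum_t S(y_{it}, \phi) \right) \int \exp\left( \sum_t \frac{y_{it}(z_{it}'a + x_{it}'\beta) - b(z_{it}'a + x_{it}'\beta)}{\phi} \right) dG(a).$$
Hence, the total log-likelihood is $\sum_i \ln l_i$. The constant $\sum_{it} S(y_{it}, \phi)$ is unimportant for determining MLEs.
Although evaluating, and maximizing, the likelihood requires numerical integration, it is easy to do on the computer.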

31 Random effects and serial correlation
We saw in Chapter 3 that permitting subject-specific effects, $\alpha_i$, to be random induced serial correlation in the responses. This is because the variance-covariance matrix of $y_i$ is no longer diagonal.
This is also true for the nonlinear GLM models. To see this, let's use a canonical link and recall that
$$\mathrm{E}(y_{it} \mid \alpha_i) = b'(\theta_{it}) = b'(\eta_{it}) = b'(\alpha_i + x_{it}'\beta).$$

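32 Covariance calculations
The covariance between two responses, $y_{i1}$ and $y_{i2}$, is
$$\mathrm{Cov}(y_{i1}, y_{i2}) = \mathrm{E}\,y_{i1}y_{i2} - \mathrm{E}\,y_{i1}\,\mathrm{E}\,y_{i2} = \mathrm{E}\{ b'(\alpha_i + x_{i1}'\beta)\, b'(\alpha_i + x_{i2}'\beta) \} - \mathrm{E}\,b'(\alpha_i + x_{i1}'\beta)\; \mathrm{E}\,b'(\alpha_i + x_{i2}'\beta).$$
To see this, use the law of iterated expectations and the conditional independence of the responses given $\alpha_i$:
$$\mathrm{E}\,y_{i1}y_{i2} = \mathrm{E}\,\mathrm{E}(y_{i1}y_{i2} \mid \alpha_i) = \mathrm{E}\{ \mathrm{E}(y_{i1} \mid \alpha_i)\, \mathrm{E}(y_{i2} \mid \alpha_i) \} = \mathrm{E}\{ b'(\alpha_i + x_{i1}'\beta)\, b'(\alpha_i + x_{i2}'\beta) \}.$$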

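33 More covariance calculations
Normal. For the normal distribution we have $b'(a) = a$. Thus, taking $\mathrm{E}\,\alpha_i = 0$,
$$\mathrm{Cov}(y_{i1}, y_{i2}) = \mathrm{E}\{ (\alpha_i + x_{i1}'\beta)(\alpha_i + x_{i2}'\beta) \} - \mathrm{E}(\alpha_i + x_{i1}'\beta)\, \mathrm{E}(\alpha_i + x_{i2}'\beta) = \mathrm{E}\,\alpha_i^2 + (x_{i1}'\beta)(x_{i2}'\beta) - (x_{i1}'\beta)(x_{i2}'\beta) = \mathrm{Var}\,\alpha_i.$$
Poisson. For the Poisson, we have $b'(a) = e^a$. Thus,
$$\mathrm{E}\,y_{it} = \mathrm{E}\,b'(\alpha_i + x_{it}'\beta) = \mathrm{E}\exp(\alpha_i + x_{it}'\beta) = \exp(x_{it}'\beta)\, \mathrm{E}\exp(\alpha_i)$$
and
$$\mathrm{Cov}(y_{i1}, y_{i2}) = \mathrm{E}\{ \exp(\alpha_i + x_{i1}'\beta) \exp(\alpha_i + x_{i2}'\beta) \} - \exp((x_{i1} + x_{i2})'\beta)\, \{ \mathrm{E}\exp(\alpha_i) \}^2$$
$$= \exp((x_{i1} + x_{i2})'\beta)\, \{ \mathrm{E}\exp(2\alpha_i) - (\mathrm{E}\exp(\alpha_i))^2 \} = \exp((x_{i1} + x_{i2})'\beta)\, \mathrm{Var}\exp(\alpha_i).$$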
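The Poisson covariance formula can be checked numerically once a distribution for $\alpha_i$ is fixed. The sketch below assumes a normal random intercept with mean zero and standard deviation `sigma`, for which $\mathrm{Var}\exp(\alpha) = e^{\sigma^2}(e^{\sigma^2} - 1)$ by the lognormal moments; the trapezoid rule, the grid settings, and all names are illustrative choices.

```python
import math

def normal_expectation(h, sigma, half_width=10.0, steps=20001):
    """E h(alpha) for alpha ~ N(0, sigma^2), by the trapezoid rule on
    a grid covering +/- half_width standard deviations."""
    lo, hi = -half_width * sigma, half_width * sigma
    step = (hi - lo) / (steps - 1)
    total = 0.0
    for k in range(steps):
        a = lo + k * step
        density = math.exp(-0.5 * (a / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, steps - 1) else 1.0   # trapezoid end-point weights
        total += weight * h(a) * density
    return total * step

def poisson_re_cov(x1_beta, x2_beta, sigma):
    """Cov(y_i1, y_i2) = exp(x1'b + x2'b) * Var exp(alpha), with the two
    moments of exp(alpha) computed by numerical integration."""
    m1 = normal_expectation(math.exp, sigma)
    m2 = normal_expectation(lambda a: math.exp(2 * a), sigma)
    return math.exp(x1_beta + x2_beta) * (m2 - m1 ** 2)

sigma = 0.5   # assumed standard deviation of the random intercept
cov_numeric = poisson_re_cov(0.3, -0.1, sigma)
# Closed form for a normal random intercept.
cov_closed = math.exp(0.3 - 0.1) * math.exp(sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
```

The covariance is positive whenever the random intercept has positive variance, which is the serial correlation induced by the subject effect.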

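34 Random effects likelihood
Recall, from Section 10.2, that the (unconditional) likelihood for the $i$th subject is
$$l_i = \exp\left( \sum_t S(y_{it}, \phi) \right) \int \exp\left( \sum_t \frac{y_{it}(z_{it}'a + x_{it}'\beta) - b(z_{it}'a + x_{it}'\beta)}{\phi} \right) dG(a).$$
Here, we use $z_{it} = 1$, $\phi = 1$, and $g(a)$ is the density of $\alpha_i$.
For the Poisson, we have $b(a) = e^a$ and $S(y, \phi) = -\ln(y!)$, so the likelihood is
$$l_i = \exp\left( -\sum_t \ln(y_{it}!) \right) \int \exp\left( \sum_t \left\{ y_{it}(a + x_{it}'\beta) - \exp(a + x_{it}'\beta) \right\} \right) g(a)\, da$$
$$= \exp\left( -\sum_t \ln(y_{it}!) \right) \exp\left( \sum_t y_{it}\, x_{it}'\beta \right) \int \exp\left( \sum_t \left\{ y_{it}\, a - \exp(a + x_{it}'\beta) \right\} \right) g(a)\, da.$$
As before, evaluating and maximizing the likelihood requires numerical integration, yet it is easy to do on the computer.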
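The integral in the Poisson random effects likelihood can be approximated on a grid. This is a minimal sketch assuming a normal random intercept with mean zero and standard deviation `sigma`, with $z_{it} = 1$ and $\phi = 1$ as in the slide; the trapezoid rule stands in for whatever quadrature a statistical package would use, and the function and argument names are illustrative.

```python
import math

def poisson_re_likelihood(ys, xbs, sigma, half_width=8.0, steps=4001):
    """Unconditional likelihood l_i for one subject under a Poisson model with
    a normal random intercept alpha ~ N(0, sigma^2).

    ys are the subject's counts and xbs the matching values of x'beta.
    The integral over a is approximated with the trapezoid rule.
    """
    const = math.exp(-sum(math.lgamma(y + 1) for y in ys))    # exp(-sum ln y!)
    fixed = math.exp(sum(y * xb for y, xb in zip(ys, xbs)))    # exp(sum y x'beta)
    lo = -half_width * sigma
    step = 2 * half_width * sigma / (steps - 1)
    integral = 0.0
    for k in range(steps):
        a = lo + k * step
        density = math.exp(-0.5 * (a / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        # Exponent inside the integral: sum_t { y_t * a - exp(a + x_t'beta) }.
        inner = sum(y * a - math.exp(a + xb) for y, xb in zip(ys, xbs))
        weight = 0.5 if k in (0, steps - 1) else 1.0
        integral += weight * math.exp(inner) * density
    return const * fixed * integral * step
```

For example, `math.log(poisson_re_likelihood([2, 1], [0.1, 0.3], sigma=0.7))` gives one subject's contribution to the total log-likelihood; summing such terms over subjects and maximizing over $\beta$ and `sigma` would give the MLEs.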

35 Table 10.7 Tort Filings Model Coefficient Estimates - Random Effects
Logarithmic population is used as an offset.
Models (each reporting a parameter estimate and a p-value): homogeneous model with estimated scale parameter; random effects model.
Rows: Intercept, POPLAWYR/, VEHCMILE/, POPDENSY/, WCMPMAX/, URBAN/, UNEMPLOY, JSLIAB, COLLRULE, CAPS, PUNITIVE, State Variance, Log Likelihood.
The coefficient estimates and most p-values are not preserved. Surviving entries: Log Likelihood 119,576 and 15,623.

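36 10.5 Fixed effects models
Consider responses $y_{it}$, with mean $\mu_{it}$, systematic component $\eta_{it} = g(\mu_{it}) = z_{it}'\alpha_i + x_{it}'\beta$ and canonical link so that $\eta_{it} = \theta_{it}$.
Assume the responses are independent. Then, the log-likelihood is
$$\ln p(y) = \sum_{it} \left\{ \frac{y_{it}\theta_{it} - b(\theta_{it})}{\phi} + S(y_{it}, \phi) \right\} = \sum_{it} \left\{ \frac{y_{it}(z_{it}'\alpha_i + x_{it}'\beta) - b(z_{it}'\alpha_i + x_{it}'\beta)}{\phi} + S(y_{it}, \phi) \right\}.$$
Thus, the responses depend on the parameters through only summary statistics. That is, the statistics $\sum_t y_{it} z_{it}$ are sufficient for $\alpha_i$, and the statistics $\sum_{it} y_{it} x_{it}$ are sufficient for $\beta$.
This is a convenient property of the canonical links. It is not available for other choices of links.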

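37 MLEs - Canonical links
The log-likelihood is
$$\ln p(y) = \sum_{it} \left\{ \frac{y_{it}(z_{it}'\alpha_i + x_{it}'\beta) - b(z_{it}'\alpha_i + x_{it}'\beta)}{\phi} + S(y_{it}, \phi) \right\}.$$
Taking the partial derivative with respect to $\alpha_i$ yields:
$$0 = \sum_t z_{it}\, \frac{y_{it} - b'(z_{it}'\alpha_i + x_{it}'\beta)}{\phi} = \sum_t z_{it}\, \frac{y_{it} - \mu_{it}}{\phi},$$
because $\mu_{it} = b'(\theta_{it}) = b'(z_{it}'\alpha_i + x_{it}'\beta)$.
Taking the partial derivative with respect to $\beta$ yields:
$$0 = \sum_{it} x_{it}\, \frac{y_{it} - b'(z_{it}'\alpha_i + x_{it}'\beta)}{\phi} = \sum_{it} x_{it}\, \frac{y_{it} - \mu_{it}}{\phi}.$$
Thus, we can solve for the MLEs of $\alpha_i$ and $\beta$ through $0 = \sum_t z_{it}(y_{it} - \mu_{it})$ and $0 = \sum_{it} x_{it}(y_{it} - \mu_{it})$.
This is a special case of the method of moments.
This may produce inconsistent estimates of $\beta$, as we have seen in Chapter 9.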

38 Table 10.8 Tort Filings Model Coefficient Estimates - Fixed Effects
All models have an estimated scale parameter. Logarithmic population is used as an offset.
Models (each reporting a parameter estimate and a p-value): homogeneous model; model with state categorical variable; model with state and time categorical variables.
Rows: Intercept, POPLAWYR/, VEHCMILE/, POPDENSY/, WCMPMAX/, URBAN/, UNEMPLOY, JSLIAB, COLLRULE, CAPS, PUNITIVE, Scale, Deviance, Pearson Chi-Square.
The coefficient estimates and most p-values are not preserved. Surviving fragments: Deviance 118,…; …; …,834.2. Pearson Chi-Square 129,…; …; …,763.0.

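39 Conditional likelihood estimation
Assume the canonical link so that $\theta_{it} = \eta_{it} = z_{it}'\alpha_i + x_{it}'\beta$.
Define the likelihood for a single observation to be
$$p(y_{it}, \alpha_i, \beta) = \exp\left( \frac{y_{it}(z_{it}'\alpha_i + x_{it}'\beta) - b(z_{it}'\alpha_i + x_{it}'\beta)}{\phi} + S(y_{it}, \phi) \right).$$
Let $S_i$ be the random vector representing $\sum_t z_{it} y_{it}$ and let $sum_i$ be the realization of $\sum_t z_{it} y_{it}$. Recall that $\sum_t z_{it} y_{it}$ are sufficient for $\alpha_i$.
The conditional likelihood of the data set is
$$\prod_{i=1}^{n} \frac{p(y_{i1}, \alpha_i, \beta)\, p(y_{i2}, \alpha_i, \beta) \cdots p(y_{iT_i}, \alpha_i, \beta)}{\mathrm{Prob}(S_i = sum_i)}.$$
This likelihood does not depend on $\{\alpha_i\}$, only on $\beta$. Maximizing it with respect to $\beta$ yields root-$n$ consistent estimates.
The distribution of $S_i$ is messy and is difficult to compute.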

40 Poisson distribution
The Poisson is the most widely used distribution for counted responses. Examples include the number of migrants from state to state and the number of tort filings within a state.
A feature of the fixed effects version of the model is that the mean equals the variance.
To illustrate the application of Poisson panel data models, let's use the canonical link and $z_{it} = 1$, so that
$$\ln \mathrm{E}(y_{it} \mid \alpha_i) = g(\mathrm{E}(y_{it} \mid \alpha_i)) = \theta_{it} = \eta_{it} = \alpha_i + x_{it}'\beta.$$
Through the log function, it links the mean to a linear combination of explanatory variables. It is the basis of the so-called log-linear model.

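41 Conditional likelihood estimation
We first examine the fixed effects model and thus assume that $\{\alpha_i\}$ are fixed parameters. Thus, $\mathrm{E}\,y_{it} = \exp(\alpha_i + x_{it}'\beta)$.
The distribution is
$$p(y_{it}, \alpha_i, \beta) = \frac{(\mathrm{E}\,y_{it})^{y_{it}} \exp(-\mathrm{E}\,y_{it})}{y_{it}!}.$$
From Section 10.1, $\sum_t y_{it}$ is a sufficient statistic for $\alpha_i$.
The distribution of $\sum_t y_{it}$ turns out to be Poisson, with mean $\exp(\alpha_i) \sum_t \exp(x_{it}'\beta)$.
Note that the ratio of means,
$$\pi_{it} = \frac{\mathrm{E}\,y_{it}}{\sum_t \mathrm{E}\,y_{it}} = \frac{\exp(x_{it}'\beta)}{\sum_t \exp(x_{it}'\beta)},$$
does not depend on $\alpha_i$.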

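42 Conditional likelihood details
Thus, as in Section 10.1, the conditional likelihood for the $i$th subject is
$$\frac{p(y_{i1}, \alpha_i, \beta) \cdots p(y_{iT_i}, \alpha_i, \beta)}{\mathrm{Prob}\left( S_i = \sum_t y_{it} \right)} = \frac{ (\mathrm{E}\,y_{i1})^{y_{i1}} \cdots (\mathrm{E}\,y_{iT_i})^{y_{iT_i}} \exp\left( -\sum_t \mathrm{E}\,y_{it} \right) \big/ \left( y_{i1}! \cdots y_{iT_i}! \right) }{ \left( \sum_t \mathrm{E}\,y_{it} \right)^{\sum_t y_{it}} \exp\left( -\sum_t \mathrm{E}\,y_{it} \right) \big/ \left( \sum_t y_{it} \right)! }$$

43 Conditional likelihood details
$$= \frac{\left( \sum_t y_{it} \right)!}{y_{i1}! \cdots y_{iT_i}!} \prod_t \pi_{it}^{\,y_{it}},$$
where
$$\pi_{it} = \frac{\mathrm{E}\,y_{it}}{\sum_t \mathrm{E}\,y_{it}} = \frac{\exp(x_{it}'\beta)}{\sum_t \exp(x_{it}'\beta)}.$$
This is a multinomial distribution.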

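44 Multinomial distribution
Thus, the joint distribution of $y_{i1}, \ldots, y_{iT_i}$ given $\sum_t y_{it}$ has a multinomial distribution.
The conditional likelihood is:
$$L = \prod_{i=1}^{n} \left\{ \frac{\left( \sum_t y_{it} \right)!}{y_{i1}! \cdots y_{iT_i}!} \prod_t \pi_{it}(\beta)^{\,y_{it}} \right\}, \quad \text{where } \pi_{it}(\beta) = \frac{\exp(x_{it}'\beta)}{\sum_t \exp(x_{it}'\beta)}.$$
Taking partial derivatives yields:
$$\frac{\partial}{\partial\beta} \ln L = \sum_i \sum_t y_{it}\, \frac{\partial}{\partial\beta} \ln \pi_{it}(\beta) = \sum_i \sum_t y_{it} \left( x_{it} - \sum_r x_{ir}\, \pi_{ir}(\beta) \right).$$
Thus, the conditional MLE, $b$, is the solution of:
$$\sum_i \sum_t y_{it} \left( x_{it} - \sum_r x_{ir}\, \pi_{ir}(b) \right) = 0.$$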
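The conditional score equation can be solved without ever estimating the subject effects. The following sketch assumes a single scalar covariate and made-up data, and uses Newton-Raphson as one reasonable solver; the function names are illustrative. The derivative used in the Newton step follows from $\partial \left( \sum_r x_{ir}\pi_{ir} \right) / \partial\beta$ being the variance of $x$ under the weights $\pi_{ir}$.

```python
import math

def conditional_score(beta, subjects):
    """Score and its derivative for the conditional (multinomial) log-likelihood.

    subjects is a list of (xs, ys) pairs with one scalar covariate per observation.
    """
    score, slope = 0.0, 0.0
    for xs, ys in subjects:
        weights = [math.exp(x * beta) for x in xs]
        total = sum(weights)
        pis = [w / total for w in weights]                  # pi_it(beta)
        x_bar = sum(p * x for p, x in zip(pis, xs))         # sum_r x_ir pi_ir(beta)
        x_var = sum(p * (x - x_bar) ** 2 for p, x in zip(pis, xs))
        score += sum(y * (x - x_bar) for x, y in zip(xs, ys))
        slope -= sum(ys) * x_var                            # d(score)/d(beta)
    return score, slope

def conditional_mle(subjects, beta=0.0, iterations=50):
    """Newton-Raphson on the conditional score equation."""
    for _ in range(iterations):
        score, slope = conditional_score(beta, subjects)
        beta -= score / slope
    return beta

subjects = [
    ([0.0, 1.0, 2.0], [1, 2, 6]),   # made-up covariates and counts, subject 1
    ([0.0, 1.0, 2.0], [3, 5, 9]),   # subject 2
    ([1.0, 2.0, 3.0], [0, 2, 3]),   # subject 3
]
b_hat = conditional_mle(subjects)
```

Because $\pi_{it}(\beta)$ depends only on differences in $x_{it}$ within a subject, shifting a subject's covariate by a constant leaves the estimate unchanged, which mirrors the way the conditioning removes $\alpha_i$.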
