Comparing groups using predicted probabilities

1 Comparing groups using predicted probabilities J. Scott Long Indiana University May 9, 2006 MAPSS - May 9, Page 1

2 The problem
Allison (1999): Differences in the estimated coefficients tell us nothing about the differences in the underlying impact of x on the two groups.

3 Overview
1. Does the effect of x on y differ across groups?
2. Comparing $\beta$s across groups in the LRM
3. Problems with comparing $\beta$s in logit and probit models
4. Tests and CIs for comparing $\Pr(y = 1)$ across groups
5. Example of gender differences in tenure
6. Complications due to nonlinearity of models

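4 LRM - Chow test
1. For example,
   Men: $y = \alpha^m + \beta^m_{educ}\,\text{educ} + \beta^m_{age}\,\text{age} + \varepsilon$
   Women: $y = \alpha^w + \beta^w_{educ}\,\text{educ} + \beta^w_{age}\,\text{age} + \varepsilon$
2. Do men and women have the same return for education?
   $H_0: \beta^m_{educ} = \beta^w_{educ}$
3. We compute:
   $z = \dfrac{\hat\beta^m_{educ} - \hat\beta^w_{educ}}{\sqrt{\mathrm{Var}(\hat\beta^m_{educ}) + \mathrm{Var}(\hat\beta^w_{educ})}}$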
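The z statistic on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the coefficient estimates and variances are hypothetical numbers, not results from the talk.

```python
import math

def coef_difference_z(b_m, var_m, b_w, var_w):
    """z statistic for H0: beta_m = beta_w, using estimates from two
    separately fitted linear regressions (so the estimates are independent)."""
    return (b_m - b_w) / math.sqrt(var_m + var_w)

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical estimates of the return to education for men and women.
z = coef_difference_z(b_m=0.50, var_m=0.0009, b_w=0.40, var_w=0.0016)
p = two_sided_p(z)
print(round(z, 3), round(p, 4))
```

The variances add because the two groups are estimated on separate samples; with a pooled, fully interacted model the same quantity would come from the covariance matrix of that single fit.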

5 Logit and probit using latent $y^*$
1. Chow type test is invalid for logit & probit (Allison 1999).
2. Structural model with a latent $y^*$:
   $y^* = \alpha + \beta x + \varepsilon$
3. Error is normal for probit or logistic for logit:
   $\varepsilon \sim f(0, \sigma_\varepsilon)$
4. We only observe a binary y:
   $y = 1$ if $y^* > 0$; $y = 0$ if $y^* \le 0$
5. Graphically...

6 Logit and probit using latent $y^*$

7 $y^*$ and $\Pr(y = 1)$
1. Link to observed data:
   $\Pr(y = 1 \mid x) = \Pr(y^* > 0 \mid x) = \Pr(\varepsilon < [\alpha + \beta x] \mid x)$
2. Since $y^*$ is not observed, $\beta$ is only identified up to a scale factor.

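8 Identification of $\beta$ and $\sigma_\varepsilon$
1. Model A:
   $y^*_a = \alpha^a + \beta^a_x x + \varepsilon_a$ where $\sigma_a = 1$
   $\phantom{y^*_a} = -6 + 1x + \varepsilon_a$
2. Model B:
   $y^*_b = \alpha^b + \beta^b_x x + \varepsilon_b$ where $\sigma_b = 2$
   $\phantom{y^*_b} = -12 + 2x + \varepsilon_b$
3. Where:
   $\alpha^b = 2\alpha^a$
   $\beta^b_x = 2\beta^a_x$
   $\sigma_b = 2\sigma_a$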

9 $\Pr(y = 1 \mid x)$ in model A

10 $\Pr(y = 1 \mid x)$ in model B

11 The problem and a solution
In terms of $\Pr(y = 1)$, these are empirically indistinguishable:
Case 1: A change in x of 1 when $\beta^a_x = 1$ and $\sigma_a = 1$.
Case 2: A change in x of 1 when $\beta^b_x = 2$ and $\sigma_b = 2$.
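A short numerical check makes the point concrete. The sketch below assumes a probit (normal errors) and uses illustrative intercepts chosen so that model B is model A with every parameter doubled; the helper names are hypothetical.

```python
import math

def normal_cdf(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def pr_y1(alpha, beta, sigma, x):
    """Pr(y = 1 | x) = Pr(eps < alpha + beta*x) for eps ~ N(0, sigma^2)."""
    return normal_cdf((alpha + beta * x) / sigma)

# Illustrative parameters: B doubles the intercept, slope, and error SD of A.
model_a = dict(alpha=-6.0, beta=1.0, sigma=1.0)
model_b = dict(alpha=-12.0, beta=2.0, sigma=2.0)

xs = [0, 2, 4, 6, 8, 10, 12]
probs_a = [pr_y1(x=x, **model_a) for x in xs]
probs_b = [pr_y1(x=x, **model_b) for x in xs]

for x, pa, pb in zip(xs, probs_a, probs_b):
    print(x, round(pa, 4), round(pb, 4))
```

Because only the ratio of the linear predictor to the error SD enters the probability, the two columns agree at every x, which is why the data cannot tell the two cases apart.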

12 Implications for group comparisons
1. Comparing propensity for tenure for men and women:
   Men: $y^* = \alpha^m + \beta^m_{articles}\,\text{articles} + \varepsilon_m$
   Women: $y^* = \alpha^w + \beta^w_{articles}\,\text{articles} + \varepsilon_w$
2. Assume the $\beta$s are equal:
   $\beta^m_{articles} = \beta^w_{articles}$
3. Assume women have more unobserved heterogeneity:
   $\sigma_w > \sigma_m$
4. Now we estimate the model...

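13 Estimation
1. Probit software assumes that $\sigma = 1$.
2. For men, the estimated model is:
   $\dfrac{y^*}{\sigma_m} = \dfrac{\alpha^m}{\sigma_m} + \dfrac{\beta^m_{articles}}{\sigma_m}\,\text{articles} + \dfrac{\varepsilon_m}{\sigma_m} = \tilde\alpha^m + \tilde\beta^m_{articles}\,\text{articles} + \tilde\varepsilon_m$; $\mathrm{sd}(\tilde\varepsilon_m) = 1$
3. For women, the estimated model is:
   $\dfrac{y^*}{\sigma_w} = \dfrac{\alpha^w}{\sigma_w} + \dfrac{\beta^w_{articles}}{\sigma_w}\,\text{articles} + \dfrac{\varepsilon_w}{\sigma_w} = \tilde\alpha^w + \tilde\beta^w_{articles}\,\text{articles} + \tilde\varepsilon_w$; $\mathrm{sd}(\tilde\varepsilon_w) = 1$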

14 Problem with Chow-type tests
1. We want to test:
   $H_0: \beta^m_{articles} = \beta^w_{articles}$
2. But, end up testing:
   $H_0: \tilde\beta^m_{articles} = \tilde\beta^w_{articles}$
3. And, $\tilde\beta^m_{articles} = \tilde\beta^w_{articles}$ does not imply $\beta^m_{articles} = \beta^w_{articles}$ unless $\sigma_m = \sigma_w$.
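A small arithmetic sketch shows why the test goes wrong. It assumes, purely for illustration, an identical structural slope for both groups and a larger error SD for women; all numbers and names below are hypothetical.

```python
def identified_slope(beta, sigma):
    """Slope a probit can estimate when it fixes sd(error) at 1: beta / sigma."""
    return beta / sigma

beta_articles = 0.10          # hypothetical structural slope, same for both groups
sigma_m, sigma_w = 1.0, 1.5   # hypothetical: more unobserved heterogeneity for women

beta_tilde_m = identified_slope(beta_articles, sigma_m)
beta_tilde_w = identified_slope(beta_articles, sigma_w)

# The structural slopes are equal, yet the estimable slopes differ,
# so a Chow-type test would tend to reject a true null hypothesis.
print(beta_tilde_m, beta_tilde_w)
```

The same arithmetic works in reverse: unequal structural slopes can produce equal estimable slopes when the error SDs differ in the offsetting direction.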

15 Alternatives for testing group differences
1. Allison's (1999) test of $H_0: \beta^m_x = \beta^w_x$:
   (a) Disentangles the $\beta$s and $\sigma_\varepsilon$s.
   (b) But requires that $\beta^m_z = \beta^w_z$ for some z.
2. I propose testing $H_0: \Pr(y = 1 \mid x)_m = \Pr(y = 1 \mid x)_w$ since the probabilities are invariant to $\sigma_\varepsilon$.
3. Graphically...

16 Invariance of $\Pr(y = 1 \mid x)$

17 Group comparisons of $\Pr(y = 1 \mid x)$

18 Testing $\Delta(x)$
1. Define:
   $\Delta(x) = \Pr(y = 1 \mid x)_m - \Pr(y = 1 \mid x)_w$
2. Then:
   $z = \dfrac{\hat\Delta(x)}{\sqrt{\widehat{\mathrm{Var}}[\hat\Delta(x)]}}$
3. Or the confidence interval:
   $\Pr(\Delta_{LB} \le \Delta(x) \le \Delta_{UB}) = .95$
4. Delta method is very fast; bootstrap is very slow; both are available with SPost's prvalue and prgen.

19 Comparing group differences in $\Pr(y = 1)$
1. Estimate:
   $\Pr(y = 1) = F(\alpha^w w + \beta^w_x\, wx + \alpha^m m + \beta^m_x\, mx)$
   where w = 1 for women, else 0; m = 1 for men, else 0; $wx = w \times x$; and $mx = m \times x$.
2. Then:
   $\Pr(y = 1 \mid x)_w = F(\alpha^w + \beta^w_x x)$
   $\Pr(y = 1 \mid x)_m = F(\alpha^m + \beta^m_x x)$
3. The gender difference is:
   $\Delta(x) = \Pr(y = 1 \mid x)_m - \Pr(y = 1 \mid x)_w$
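The three steps on this slide can be sketched as follows, assuming a logit link for F. The coefficient values are hypothetical and the function names are introduced only for this example; they are not estimates from the tenure data.

```python
import math

def logistic_cdf(t):
    """Logistic CDF, the F(.) of a logit model."""
    return 1.0 / (1.0 + math.exp(-t))

def group_difference(x, alpha_m, beta_m, alpha_w, beta_w, F=logistic_cdf):
    """Delta(x) = Pr(y=1 | x) for men minus Pr(y=1 | x) for women."""
    pr_m = F(alpha_m + beta_m * x)
    pr_w = F(alpha_w + beta_w * x)
    return pr_m - pr_w

# Hypothetical group-specific intercepts and slopes.
coefs = dict(alpha_m=-2.7, beta_m=0.10, alpha_w=-2.5, beta_w=0.04)

# Evaluate the gender difference over a grid of x values.
deltas = {x: group_difference(x, **coefs) for x in range(0, 51, 10)}
for x, d in deltas.items():
    print(x, round(d, 3))
```

Passing the normal CDF as `F` instead would give the probit version of the same comparison.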

20 Example: gender differences in tenure
                                           Women          Men
Variable                  Name             Mean   SD      Mean   SD
Is tenured?               tenure
Year                      year
Year-squared              yearsq
Bachelor's selectivity    select
Total articles            articles
Distinguished job?        distinguished
Prestige of job           prestige
Person-years                               1,121          1,824
Scientists

21 M1: articles only
                Women                Men
Variable        β    e^β    z        β    e^β    z
constant
articles
Log-lik
N               1,121                1,824

22 M1: Pr(tenure | articles)
[Figure: Pr(tenure) against number of articles, with curves for men and women. (group_ten03a.do 11Apr2006)]

23 M1: group differences in probabilities
1. We are interested in group differences in predictions:
   $\Delta(\text{articles}) = \Pr(y = 1 \mid \text{articles})_m - \Pr(y = 1 \mid \text{articles})_w$
2. We can test:
   $H_0: \Delta(\text{articles}) = 0$
3. Or:
   $\Delta(\text{articles})_{LowerBound}$, $\Delta(\text{articles})_{UpperBound}$
4. With one RHS variable, we can plot all comparisons.

24 M1: $\Delta$(articles) with confidence intervals
[Figure: Pr(men) - Pr(women) against number of articles; difference with 95% confidence interval. (group_ten03a.do 11Apr2006)]

25 Adding additional variables
1. Structural model:
   $y^* = \alpha + \beta_x x + \beta_z z + \varepsilon$
2. Setting z = Z changes the intercept:
   $y^* = \alpha + \beta_x x + \beta_z Z + \varepsilon = [\alpha + \beta_z Z] + \beta_x x + \varepsilon = \alpha^* + \beta_x x + \varepsilon$
3. Different values of z lead to different probability curves:
   $\Pr(y = 1 \mid x, z = Z) = \Pr(\varepsilon < [\alpha^* + \beta_x x]) = F(\alpha^* + \beta_x x)$

26 Effect of levels of other variables
[Figure: Pr(y=1 | x, z) against x, one curve per level of z, labeled a* = (a + b_z 4) = 8; a* = (a + b_z 3) = 10; a* = (a + b_z 2) = 12; a* = (a + b_z 1) = 14. (group_parallel)]

27 Comparing groups with added variables
1. For given z = Z:
   Men: $\Pr(y = 1 \mid x, Z)_m = F(\alpha^{*m} + \beta^m_x x)$
   Women: $\Pr(y = 1 \mid x, Z)_w = F(\alpha^{*w} + \beta^w_x x)$
2. Group differences depend on z:
   $\Delta(x; Z) = \Pr(y = 1 \mid x, Z)_m - \Pr(y = 1 \mid x, Z)_w$
3. $\Delta(x; Z)$ for a given x depends on the level of other variables.
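To see how the group difference moves with the level of another variable, the sketch below holds x fixed and varies Z. It assumes a logit link and uses hypothetical coefficients in which men and women share the same intercept and the same effect of z, so any change in the difference comes only from the nonlinearity of F.

```python
import math

def logistic_cdf(t):
    """Logistic CDF, the F(.) of a logit model."""
    return 1.0 / (1.0 + math.exp(-t))

def delta_at(x, Z, men, women):
    """Delta(x; Z): difference in Pr(y=1) between men and women at z = Z.
    Each group is a dict with keys alpha, beta_x, beta_z."""
    astar_m = men["alpha"] + men["beta_z"] * Z      # shifted intercept, men
    astar_w = women["alpha"] + women["beta_z"] * Z  # shifted intercept, women
    return (logistic_cdf(astar_m + men["beta_x"] * x)
            - logistic_cdf(astar_w + women["beta_x"] * x))

# Hypothetical coefficients: only the slope on x differs by group.
men = dict(alpha=-4.0, beta_x=0.08, beta_z=0.5)
women = dict(alpha=-4.0, beta_x=0.05, beta_z=0.5)

# Same x, different levels of z.
by_Z = {Z: delta_at(30, Z, men, women) for Z in (1, 2, 3, 4, 5)}
for Z, d in by_Z.items():
    print(Z, round(d, 4))
```

Even though z has the same coefficient for both groups here, the difference at x = 30 changes with Z, which is the reason the later slides report the difference separately by job type and prestige.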

28 M2: adding job type
                 Women                Men
Variable         β    e^β    z        β    e^β    z
constant
articles
distinguished
Log-lik
N                1,121                1,824
Chow tests:
   $\chi^2_{articles}$ = 7.93, df=1, p<.01; $\chi^2_{dist}$ = 1.38, df=1, p> [...]
Allison tests:
   $\chi^2_{articles}$ = 15.06, df=1, p<.01 (other effects equal).
   $\chi^2_{distinguished}$ = 3.54, df=1, p=.06 (other effects equal).

29 M2: Pr(tenure | articles)
[Figure: Pr(tenure) against number of articles, with curves for men and women in distinguished jobs. (group_ten04a.do 14Apr2006)]

30 M2: Pr(tenure | articles)
[Figure: Pr(tenure) against number of articles, with curves for men and women in distinguished and not distinguished jobs. (group_ten04a.do 14Apr2006)]

31 M2: $\Delta$(articles if not distinguished)
[Figure: Pr(men) - Pr(women) against number of articles for scientists not in distinguished departments; male-female difference with 95% confidence interval. (group_ten04a.do 11Apr2006)]

32 M2: $\Delta$(articles if distinguished)
[Figure: Pr(men) - Pr(women) against number of articles for scientists in distinguished departments; male-female difference with 95% confidence interval. (group_ten04a.do 11Apr2006)]

33 M2: $\Delta$(articles) by job type
[Figure: Pr(men) - Pr(women) against number of articles, with curves for distinguished and not distinguished jobs, each marked where the difference is not significant. (group_ten04a.do 11Apr2006)]

34 M3: full model
                 Women                Men
Variable         β    e^β    z        β    e^β    z
constant
year
yearsq
select
articles
prestige
Log-lik
N                1,121                1,824
Chow tests:
   $\chi^2_{articles}$ = 1.07, df=1, p= [...]; $\chi^2_{prestige}$ = 0.03, df=1, p= [...]
Allison tests:
   $\chi^2_{articles}$ = 1.73, df=1, p=.19 (other effects equal).
   $\chi^2_{prestige}$ = 1.55, df=1, p=.21 (other effects equal).

35 M3: Pr(tenure if prestige = 5)
[Figure: Pr(tenure) against number of articles for men and women, year 7 with prestige 5. (group_ten05b.do 04May2006)]

36 M3: $\Delta$(articles if prestige = 5)
[Figure: Pr(men) - Pr(women) against number of articles, year 7 with prestige 5; male-female difference with 95% confidence interval. (group_ten05b.do 04May2006)]

37 M3: $\Delta$(articles by prestige)
[Figure: Pr(men) - Pr(women) against number of articles, one curve per prestige level: Weak (prestige=1), Adequate (prestige=2), Good (prestige=3), Strong (prestige=4), Distinguished (prestige=5). (group_ten05b.do 12Apr2006)]

38 M3: $\Delta$(prestige by articles)
[Figure: Pr(men) - Pr(women) against job prestige, one curve per number of articles, in steps of 10 up to 50 articles. (group_ten05c.do 12Apr2006)]

39 M3: $\Delta$(articles, prestige)

40 Conclusions
1. Predictions offer a general approach for comparing groups.
2. The approach cannot distinguish between different processes generating the same probabilities. For example:
   Case 1: $\beta^m_x = \beta^w_x$, $\sigma_m \neq \sigma_w$ $\Rightarrow$ $\Delta(x) = 0$
   Case 2: $\beta^m_x \neq \beta^w_x$, $\sigma_m = \sigma_w$ $\Rightarrow$ $\Delta(x) = 0$
3. But, it provides a very detailed understanding of group differences.
4. Implications for interpreting nonlinear models.
5. In Stata, enter: findit spost_groups for examples

41 References
Allison, Paul D. 1999. Comparing Logit and Probit Coefficients Across Groups. Sociological Methods and Research 28.
Chow, G.C. Tests of equality between sets of coefficients in two linear regressions. Econometrica 28.
Long, J.S. and Freese, J. Regression Models for Categorical and Limited Dependent Variables with Stata. Second Edition. College Station, TX: Stata Press.
Long, J. Scott, Paul D. Allison, and Robert McGinnis. Rank Advancement in Academic Careers: Sex Differences and the Effects of Productivity. American Sociological Review 58.
Xu, J. and J.S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical outcomes. The Stata Journal 5.

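42 Delta method for $\Delta(x; \beta)$
1. Start with the predicted probability:
   $G(\beta) = \Pr(y = 1 \mid x) = F(x\beta)$
2. A Taylor series expansion of $G(\hat\beta)$:
   $G(\hat\beta) \approx G(\beta) + (\hat\beta - \beta)^T G'(\beta)$
3. Where
   $G'(\beta) = \left[ \dfrac{\partial F(x\beta)}{\partial \beta_0} \; \cdots \; \dfrac{\partial F(x\beta)}{\partial \beta_K} \right]^T$ and $\dfrac{\partial F(x\beta)}{\partial \beta_k} = f(x\beta)\, x_k$

43
4. Under standard assumptions, $G(\hat\beta)$ is distributed normally around $G(\beta)$ with a variance
   $\mathrm{Var}[G(\hat\beta)] = G'(\hat\beta)^T\, \mathrm{Var}(\hat\beta)\, G'(\hat\beta)$
5. For a difference in probability, let
   $G(\beta) = F(\beta \mid x_a) - F(\beta \mid x_b)$

44
6. Then
   $\mathrm{Var}[F(\hat\beta \mid x_a) - F(\hat\beta \mid x_b)] = \left[\dfrac{\partial F(\hat\beta \mid x_a)}{\partial \hat\beta}\right]^T \mathrm{Var}(\hat\beta) \left[\dfrac{\partial F(\hat\beta \mid x_a)}{\partial \hat\beta}\right] - \left[\dfrac{\partial F(\hat\beta \mid x_a)}{\partial \hat\beta}\right]^T \mathrm{Var}(\hat\beta) \left[\dfrac{\partial F(\hat\beta \mid x_b)}{\partial \hat\beta}\right] - \left[\dfrac{\partial F(\hat\beta \mid x_b)}{\partial \hat\beta}\right]^T \mathrm{Var}(\hat\beta) \left[\dfrac{\partial F(\hat\beta \mid x_a)}{\partial \hat\beta}\right] + \left[\dfrac{\partial F(\hat\beta \mid x_b)}{\partial \hat\beta}\right]^T \mathrm{Var}(\hat\beta) \left[\dfrac{\partial F(\hat\beta \mid x_b)}{\partial \hat\beta}\right]$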
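The delta-method variance for a difference in probabilities can be sketched in plain Python. This is an illustration under stated assumptions, not the SPost implementation: F is taken to be the logistic CDF, the coefficient vector is ordered as (women's intercept, women's slope, men's intercept, men's slope), and both the estimates and their covariance matrix are hypothetical.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def logistic_pdf(t):
    p = logistic_cdf(t)
    return p * (1.0 - p)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quad_form(u, V, v):
    """Return u' V v for vectors u, v and a square matrix V (list of rows)."""
    return sum(u[i] * V[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def delta_method_difference(beta, V, x_a, x_b, crit=1.959964):
    """Difference F(x_a b) - F(x_b b), its delta-method SE, z, and 95% CI."""
    eta_a, eta_b = dot(x_a, beta), dot(x_b, beta)
    diff = logistic_cdf(eta_a) - logistic_cdf(eta_b)
    # Gradient of F(x b) with respect to b is f(x b) * x.
    grad_a = [logistic_pdf(eta_a) * x for x in x_a]
    grad_b = [logistic_pdf(eta_b) * x for x in x_b]
    # Four-term expansion of the variance of the difference.
    var = (quad_form(grad_a, V, grad_a) - quad_form(grad_a, V, grad_b)
           - quad_form(grad_b, V, grad_a) + quad_form(grad_b, V, grad_b))
    se = math.sqrt(var)
    return diff, se, diff / se, (diff - crit * se, diff + crit * se)

# Hypothetical estimates: (alpha_w, beta_w, alpha_m, beta_m).
beta_hat = [-2.5, 0.04, -2.7, 0.10]
# Hypothetical covariance matrix, block-diagonal across the two groups.
V_hat = [[0.04, -0.001, 0.0, 0.0],
         [-0.001, 0.0001, 0.0, 0.0],
         [0.0, 0.0, 0.03, -0.0008],
         [0.0, 0.0, -0.0008, 0.00005]]
x_men = [0.0, 0.0, 1.0, 20.0]    # a man with 20 articles
x_women = [1.0, 20.0, 0.0, 0.0]  # a woman with 20 articles

diff, se, z, ci = delta_method_difference(beta_hat, V_hat, x_men, x_women)
print(round(diff, 4), round(se, 4), round(z, 2), [round(c, 4) for c in ci])
```

The four-term expansion is algebraically the same as the single quadratic form in the gradient of the difference, so an implementation could equally subtract the two gradients first.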

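45 Proof of invariance of $\Pr(y = 1)$ to $\sigma_\varepsilon$
1. If $\varepsilon \sim N(0, \sigma = 1)$:
   $\phi(\varepsilon) = \dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{\varepsilon^2}{2}\right)$
2. Then:
   $\Pr(y = 1 \mid x) = \displaystyle\int_{-\infty}^{\alpha + \beta x} \dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{t^2}{2}\right) dt$   (1)
3. If $\varepsilon \sim N(0, \sigma = 2)$:
   $\phi(\varepsilon) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{\varepsilon^2}{2\sigma^2}\right) = \dfrac{1}{2\sqrt{2\pi}} \exp\left(-\dfrac{\varepsilon^2}{8}\right)$

46
4. Then,
   $\Pr(y = 1 \mid x) = \displaystyle\int_{-\infty}^{2(\alpha + \beta x)} \dfrac{1}{2\sqrt{2\pi}} \exp\left(-\dfrac{t^2}{8}\right) dt$   (2)
5. Equations 1 and 2 simply involve a linear change of variables, so the probabilities are equal.
6. If $z = g(x)$ and $x = g^{-1}(z)$, then
   $\displaystyle\int_{B}^{A} f(z)\, dz = \int_{g^{-1}(B)}^{g^{-1}(A)} f[g(x)]\, \dfrac{dz}{dx}\, dx$
7. See Long (1997) for an alternative proof.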
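The change of variables in step 5 can be written out explicitly. The following is a sketch of that step, using the substitution $t = 2u$ (the symbol $u$ is introduced here only for illustration) to carry the integral for $\sigma = 2$ back to the integral for $\sigma = 1$.

```latex
\begin{aligned}
\int_{-\infty}^{2(\alpha + \beta x)} \frac{1}{2\sqrt{2\pi}}
    \exp\!\left(-\frac{t^{2}}{8}\right) dt
  &= \int_{-\infty}^{\alpha + \beta x} \frac{1}{2\sqrt{2\pi}}
    \exp\!\left(-\frac{(2u)^{2}}{8}\right) 2\, du
    && (t = 2u,\ dt = 2\, du) \\
  &= \int_{-\infty}^{\alpha + \beta x} \frac{1}{\sqrt{2\pi}}
    \exp\!\left(-\frac{u^{2}}{2}\right) du .
\end{aligned}
```

The last line is the integral in equation 1, so doubling the intercept, the slope, and the error standard deviation together leaves $\Pr(y = 1 \mid x)$ unchanged.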
