Københavns Universitet. Correlations and Non-Linear Probability Models Breen, Richard; Holm, Anders; Karlson, Kristian Bernt

Size: px

Start display at page:

Download "Københavns Universitet. Correlations and Non-Linear Probability Models Breen, Richard; Holm, Anders; Karlson, Kristian Bernt"

Elinor Pope
6 years ago
Views:

1 university of copenhagen Køenhavns Universitet Correlations and Non-Linear Proaility Models Breen, Richard; Holm, Anders; Karlson, Kristian Bernt Pulished in: Sociological Methods & Research DOI: / Pulication date: 014 Document Version Peer reviewed version Citation for pulished version (APA: Breen, R, Holm, A, & Karlson, K B (014 Correlations and Non-Linear Proaility Models Sociological Methods & Research, 43(4, DOI: / Download date: 9 Apr 018

2 Correlations and Non-Linear Proaility Models Richard Breen*, Anders Holm**, and Kristian Bernt Karlson*** THIS PAPER IS PUBLISHED IN SOCIOLOGICAL METHODS & RESEARCH, 014, VOL 43, NO 4, Link: This is a post-print (ie final draft post-refereeing version according to SHERPA/ROMEO * Center for Research on Inequality and the Life Course, Department of Sociology, Yale University, richardreen@yaleedu ** Department of Sociology, University of Copenhagen; SFI The Danish National Centre for Social Research, ah@sockudk *** SFI - The Danish National Centre for Social Research; Danish School of Education, Aarhus University, kk@dpudk Tuesday, May 8, 013 Acknowledgements: We would like to thank the SMR editor and reviewers and the participants at the RC8 conference held in Trento, Italy, in May 013, for helpful comments on this paper

3 Correlations and Non-Linear Proaility Models Astract Although the parameters of logit and proit and other non-linear proaility models are often eplained and interpreted in relation to the regression coefficients of an underlying linear latent variale model, we argue that they may also e usefully interpreted in terms of the correlations etween the dependent variale of the latent variale model and its predictor variales We show how this correlation can e derived from the parameters of non-linear proaility models, develop tests for the statistical significance of the derived correlation, and illustrate its usefulness in two applications Under certain circumstances, which we eplain, the derived correlation provides a way of overcoming the prolems inherent in cross-sample comparisons of the parameters of non-linear proaility models

4 3 Correlations and Non-Linear Proaility Models Both tetook epositions and discussions of research findings commonly interpret the coefficients of non-linear proaility models (henceforth NLPMs such as logits, proits, the ordered logit and proit, and the multinomial logit, as the coefficients of an underlying latent linear model, aleit identified only up to scale In this paper drawing on and etending the work of McKelvey and Zavoina (1975 we point out that the coefficients of NLPMs are, in fact, closely related to the correlations etween the dependent and predictor variales of the latent linear model, and, in some circumstances, interpreting them in the light of this could e particularly useful This is especially so when we want to compare the effects of the same variale in the same NLPM fitted to different groups (for the prolems in doing this see Allison 1999 We first develop the logit and proit models in a latent variale framework and show how their coefficients can e used to recover the correlation etween a predictor variale,, and y*, the underlying latent dependent variale We discuss the conditions under which the correlation coefficient is likely to prove useful and we provide statistical tests of the derived correlation coefficient and for differences in correlations etween samples We also show how to derive the correlation in the ordered and multinomial case We conclude the paper with two eamples dealing with trends in educational inequality The first is a reanalysis of data on educational inequality in Europe (Breen et al 009 The second uses GSS data to study the trend in educational inequality in the US among cohorts orn etween 1930 and 1969 Four appendices contain some technical details of our approach

5 4 A Latent Variale Formulation of Logit and Proit Models Let y* denote a continuous outcome variale, and let e a predictor variale whose effect on y* we want to estimate We can write a regression model for y* as y* = α + β y * + u, with sd( u = σ u (1 Here u is a random error term and σ u its standard deviation, which summaries that part of the variation in y* uneplained y α is the intercept of the model, and β y* captures the effect of on y* α, β y*, and σ u are all unknown parameters However, we assume that y* is unoserved and instead we oserve: y = 1 if y* > τ y = 0 if y* τ ( This means that we only oserve whether an oservation s value of y* is greater or less than a constant, τ, which is a threshold parameter whose value is unknown Following a standard latent variale formulation of discrete choice models we rewrite the error term in (1 such that u = sω For the logit model, we assume that ω is a standard logistic random variale, with mean ero and variance π /3 and s is a scale parameter, yielding a variance of σ = s π / 3 for the error term in (1 For the proit model, we assume that ω is a standard normal random variale, with mean ero and unit variance and s is a scale parameter, yielding a variance of σ u = s for the error term in (1 For the logit case we specify the proaility of success as a function of : u α τ β y* ep + u β α τ y* s s Pr( y = 1 = Pr( y* > τ = Pr > + =, (3a s s s α τ β y* 1+ ep + s s

6 5 where the final equality holds ecause we assumed ω to e a standard logistic random variale Taking the logarithm of the odds of the proaility in (3a, we otain the well-known logistic regression model: α τ β y* logit(pr( y = 1 = + = a + y (3 s s For the proit case we also specify the proaility of success as a function of : α τ s * Pr( y* > τ = Φ + y = Φ ( a + y β s (4 Where Φ( denotes the cumulative distriution function of the standard normal distriution, and where the first equality holds ecause we assumed ω to e a standard normal random variale The intercept of the logit or proit model depends on the intercept of the underlying linear model (α and the threshold parameter (τ and these two cannot e recovered separately, eing asored in the intercept, a, of the logit or proit The parameters y in the logit and proit models are equal to the regression coefficients from the underlying linear model in (1 divided y the scale parameter, which is a function of the underlying conditional error variance Thus in proit and logit models we can identify regression coefficients only up to scale, which is a function of the conditional standard deviation of the latent outcome variale This presents particular difficulties for parameter comparisons, whether these are etween parameter estimates for the same variale in different model specifications (for eample, with and without control variales; see Karlson, Holm and Breen 01 or parameter estimates for the same variale in the same models fitted to different samples (Allison 1999 In oth cases, real differences in a variale s effect will e confounded y possile differences in scaling The issues pertaining to scale identification of coefficients from non-linear proaility models have wide-ranging consequences for comparative research analying discrete outcomes

7 6 Most significantly, ecause these coefficients are inherently standardied, researchers cannot generally follow the recognied advice of using "unstandardied coefficients" in group comparisons (Tukey 1954; Blalock 1967 Consequently, we need to look to other metrics that might prove useful in certain areas of sociological research The Prolem in Group Comparisons When Not Assuming a Latent Outcome Variale The previous derivations hold under the assumption that we oserve the latent outcome, y*, via its inary manifestation, y However, in some sociological applications, motivating the logit model in terms of latent variales is less ovious Yet, even when the inary outcome variale can e said to e truly inary, the prolem in comparing logit or other nonlinear proaility model coefficients across groups remains Using the distinction etween marginal and conditional models (Agresti 00, in Appendi A we show that the group comparison prolem also applies when the outcome can e said to e truly inary and the latent variale formulation consequently does not apply In either case, group comparisons of coefficients are distorted y differences in unmeasured heterogeneity across groups even when the unmeasured heterogeneity is independent of the oserved predictor variales The Correlation and Its Relation to the Logit or Proit Coefficient Rather than interpreting logit and proit coefficients as underlying effects identified up to scale, they can also e interpreted as functions of the correlation coefficient etween the predictor and the underlying latent variale, y* To show this, we use results from the early literature on the relationship etween coefficients from linear models and correlations (eg, Blalock 1964, 1967; Linn and Werts 1969; Theil 197 The only difference is that we apply the results to y*, allowing us to derive the correlation coefficient from the proit and logit model

8 7 We can write the variance of the underlying latent variale, y*, as var( y* = β var( + var( y* = β var( s var( (5 y * y* + ω and the correlation etween y* and is equal to r sd ( = β sd ( y* y* y* Using (5 and the fact that the logit or proit coefficient, y, equals β y* s we can write r y* ys sd( y sd( = =, (6 s var( + s var( ω var( + var( ω y y since the s terms cancel For the logit we sustitute var(ω = π /3 and for the proit var(ω =1 With a single predictor variale the correlation in (6 equals the fully standardied coefficient suggested y McKelvey and Zavoina (1975: 115 Using (6 we can write the logit or proit coefficient as a function of the correlation coefficient To do so, we solve for k in y = k ry * Using that * sd( y s = and writing the sd( ω correlation and y coefficient in terms of variances and covariances, we otain cov(, y* cov(, y* = k, var( s sd( sd( y* cov(, y* sd ( ω sd ( sd ( y* sd ( ω sd ( y* 1 sd ( ω k = = = cov(, y* var( sd ( y sd ( sd ( y sd ( * * 1 ry* and thus

9 8 y = r y* 1 r y* sd( ω sd( (7 Thus logit and proit coefficients can e epressed as the square root of the ratio of the eplained to uneplained variance of y* scaled y the ratio of the standardied standard deviation to the standard deviation of the predictor variale 1 From this we see that a logit or proit coefficient has two parts, one which is scale invariant, r y* 1 r y*, and one which is scale-dependent, sd ( ω sd ( As we will argue, this separation provides an entry into addressing the prolem of making comparisons etween groups when using NLPMs The derivative of the logit or proit coefficient with respect to the correlation is r sd ( ω y = 3 / y* sd ( (1 r The derivative tells us that the logit or proit coefficient increases as the correlation deviates from ero From the forgoing we see that the relationship etween the correlation and the y coefficients depends only on known quantities namely sd (ω, which is known (y assumption, and sd ( which is known from the data This is in contrast to the relationship etween y and β y * which depends on the unknown quantity, s 1 In the proit case, ( 1 sd ω =, and if we standardie to have unit standard deviation, the squared proit coefficient will equal the ratio of eplained to uneplained variance in the underlying latent variale regression

10 9 Partial, Semi-Partial, and Fully Standardied Partial Coefficients Equation (6 applies when there is a single predictor variale With two or more predictor variales, we can derive the partial correlation of and y* given (this is demonstrated in Appendi B as r y* = y y sd( var( + var( ω, (8 and, as a consequence, we can write the partial logit or proit coefficient as y = r y* 1 r y* sd( ω sd( (9 The etension to the case with several control variales is straightforward Despite this, the etension from the simple to the partial case has never efore een demonstrated, as far as we are aware As in the linear case (Blalock 1967: 133, the partial correlation in (8 will generally differ from the fully standardied partial coefficient, * y, used y McKelvey and Zavoina (1975, * y = y sd( y var( + y var( + y y cov(, + var( ω, (10 their analytical relation eing: = r * y y* 1 r y* 1 r, Similarly, we derive the semi-partial correlation etween y* and given as The semi-partial correlation is also known as the part correlation

11 10 r y*( = y sd ( y var( + y var( + y y cov(, + var( ω meaning that the semi-partial correlation and the fully standardied partial coefficient have the following relation: * sd( 1 y = ry*( = ry*( sd( 1 r Thus, the fully standardied partial coefficient can e viewed as a rescaled version of the semipartial correlation, with the scale factor eing the square root of the proportion of the variance in that is uneplained y, r 1 When Is the Correlation a Useful Metric? Using correlations rather than logit or proit coefficients themselves implies a shift of focus for researchers interested in group comparisons In this section, we clarify the conditions under which the correlation coefficient is a useful metric for group comparisons and suggest which methods researchers might consider if the conditions we outline are not met When the Correlation Is Useful We have shown how to derive the correlation from the parameters of an NLPM: ut when, and why, should we want to do this? We outline three circumstancesthe first circumstance in which the correlation will e useful is when we want to compare the variation in y* etween and within values of This is particularly the case when is categorical, as, for eample, in studies that focus on the relationship etween educational attainment and social class ackground In many studies the oserved dependent variale, y, is whether or not a student makes a given educational transition (such as the transition from High School to College and the unoserved y* could e

12 11 interpreted as his or her propensity to do so In these sorts of analysis we are often interested in the changing relationship, over irth cohorts, etween dummy variales,, representing social classes, and y* (eg Shavit and Blossfeld 1993 The conventional approach would take comparisons of β y * over cohorts as the ideal measure, and one would try to recover these y estimating the corresponding logit or proit coefficients, y Logit or proit coefficients declining in magnitude over successive irth cohorts would then suggest equaliation in the particular educational transition But the prolem with this interpretation is that we cannot know the etent to which declines over cohorts in y are due to declines in β y * or to increases in the residual variation of y* An alternative measure of equaliation is the degree to which variation in the propensity to make the transition eists etween students from different classes, relative to the variation that eists among students within the same class A decline over cohorts in the variation in y* tells us that there is less variation as a whole: in this case the correlation would decline only if the variation etween students from different classes was declining more quickly than the total variation Thus, if the correlation was constant over irth cohorts, y could decline only as a result of a reduction in the variance of y* and/or an increase in the variance of But it was to remove these spurious influences on the measure of inequality that led sociologists to the use of NLPMs in the first place (Mare 1981: y the same argument, they should then prefer the (partial correlation as a measure of educational inequality The second circumstance in which the correlation can e informative is when we care aout the relative, rather than the asolute, value of the outcome variale In studies of intergenerational income moility, for eample, it is reasonale to focus on an individual s position within a given income distriution (that is, relative to others rather than, or in addition

13 1 to, the individual s asolute level of income For this reason, the correlation of (the logarithm of parent s and child s incomes is sometimes preferred, as a measure of intergenerational immoility, to the elasticity (the coefficient from the regression of log of child s income on log of parent s income ecause it is insensitive to differences in the variance of income in the parental and child generations (see Björklund and Jäntti 009 Because we can recover the correlation etween y* and one or more predictor variales, we can estimate the standardied regression coefficients linking them, and these will e particularly useful when we care aout relative rather than asolute position Consider the following eample: one would epect the impact of a scholar s rate of pulication on his or her academic salary to e affected y the mean and variance of each of these two variales within the scholar s field The pulication of two papers per year more than the mean pulication rate in mathematics is a more impressive performance than the same achievement in chemistry ecause the variance of pulication rates is much larger in the latter field Since the variance of annual salaries also differs across fields, it is unreasonale to epect that pulishing one additional paper during a period of time will have an equivalent impact on annual salary across fields In contrast, it seems more reasonale to epect that the impact of increasing one s pulication rate y one field-specific standard deviation will have an equivalent impact on field-specific standard scores for academic salary across fields (Hargens 1976: 5 If we sustitute the phrase propensity to otain tenure for annual salary we have a inary outcome and here an interpretation of logit or proit coefficients in terms of the -y* correlations or standardied regression coefficients should e preferred over an interpretation of them as underlying regression coefficients identified up to scale

14 13 The third circumstance concerns the situation in which researchers are interested in conditional inference; that is, interested in the association etween two variales (y* and net of a third variale ( For eample, stratification researchers might want to compare across irth cohorts the magnitude of the association etween college completion and parental social status net of race In such situations, the semi-partial correlation will often e preferred over the partial correlation The latter correlation partials out the variance in oth y* and eplained y, while the former only partials out the variance in eplained y In other words, the semi-partial correlation (or its squared counterpart is a measure of the fraction of eplained variance in y* due to net of relative to the unconditional or total variance in y* In contrast to the partial correlation, the semi-partial correlation is unaffected y the predictive power of on y*, therey not conflating differences in the predictive power of in group comparisons An alternative to the semi-partial correlation is the fully standardied partial coefficient of McKelvey and Zavoina (1975, stated in Equation (10 While this coefficient is just a rescaled version of the semi-partial correlation, it is useful whenever researchers are concerned with the relative, rather than the asolute, value of the outcome variale, as outlined aove In this situation, the fully standardied partial coefficient can e interpreted as the epected standard deviation change in y* for a standard deviation change in, given For eample, stratification researchers might want to know whether the dependency of the son's relative position in the socioeconomic status distriution on the relative position of his father in the father's corresponding distriution, once race is controlled, has changed over the 0th century In such a situation, the fully standardied partial coefficient might e preferred One thing which is clear from these eamples of where the correlation may prove useful, is that, when these conditions prevail, interpreting the coefficients of NLPMs in terms of

15 14 correlations also provides a solution to the prolem of comparing the effects of variales across samples, etween which the residual variation, captured in the scale parameter, s, might differ When the Correlation Is Not Useful The conditions under which the correlation is a useful metric for group comparisons may not always e met In this susection we first riefly outline these conditions and then discuss two other methods that might e useful when the conditions are not met Unstandardied or Standardied Coefficients? Unstandardied regression coefficients have long een preferred over standardied coefficients in social science research ecause the former are unaffected y the distriutions of the variales involved in the model (Blalock 1967; Kim and Mueller 1976; Schoenerg 197; Tukey 1954 In the linear regression model, the unstandardied regression coefficient of on y is given y cov(, y β =, var( yielding the epected change in y for a unit change in The standardied or equivalent correlation coefficient is given y sd( r = β, sd( y and is affected y oth the effect of on y and the distriutions of and y Using standardied or correlation coefficients for group comparisons will conflate differences in true effects, captured

16 15 y β, with potential differences in the distriutions of oth and y, captured y sd( and sd(y For this reason, and for reasons stated throughout this paper, using the correlation coefficient for group comparisons is not informative aout differences in effects; it is merely informative aout differences in the predictive power of on y (Achen 1977; King 1986 or aout the association etween relative positions in the distriutions For eample, the correlation coefficient would not e useful for comparing the influence of high school grades on college enrollment etween men and women Because the correlation standardies grades and the latent college enrollment propensity within gender, we cannot know whether gender differences in correlations reflect differences in effects or differences in the dispersion in the latent propensity distriution or in the grade distriution If the effect, β, and the dispersion in the latent propensity, sd(y, were the same for men and women, ut the dispersion in the grade distriution, sd(, was larger among men than women, then we would see larger correlations among men than women Such differences would not e informative aout gender differences in the influence of grades on college enrollment, even if we interpreted them as associations etween relative positions in the distriutions This interpretation would imply that women (men only compete with other women (men in otaining college positions But this is unlikely to e true in most countries Similar eamples can e given of comparison across supopulations that compete for the same goods Nevertheless, although using the correlation for comparative purposes is restricted in many respects, the same applies to nonlinear proaility models in which the coefficients, y design, are inherently standardied on the latent outcome variale via the scale parameter, s Since the true effect, β y*, cannot e separately identified, otaining unstandardied coefficients is not possile in nonlinear proaility models And since the scale parameter, s, depends on oth

17 16 the total variance in the latent outcome and the predictive power of, coefficients of nonlinear proaility models are difficult to interpret 3 This is precisely the reason why we propose viewing these coefficients in terms of correlations which, although limited in their use, have a well-defined interpretation However, viewing nonlinear proaility model coefficients as correlation coefficients is only one way of approaching the limitations of the inherent standardiation of these coefficients As we have already pointed out, the correlation coefficient is standardied on oth the predictor variale and the outcome variale Yet, ecause we know the scale of the predictor variale,, standardiing the coefficient on is not necessary Researchers may instead opt for the coefficient standardied on the latent outcome alone This coefficient was introduced in the seminal work y Winship and Mare ( In the simple case with only two variales, it is given y YSTD y β y* y = = sd ( y* y var( + var( ω This partially standardied coefficient differs from the fully standardied counterpart in (6 y not including sd( in the numerator It yields the epected standard deviation change in y* for a unit change in, and can easily e etended to the multiple case, YSTD y β y* y = ( * = sd y var( + var( + cov(, + var( ω y y y y, 3 The interpretation of coefficients of nonlinear proaility models is even more complicated when several predictor variales are included In this situation, the coefficients reflect the true effects of interest and the predictive power of all predictor variales 4 Breen and Karlson (013 discuss the advantages of using coefficients standardied on the outcome in causal inference involving nonlinear proaility models and show the equivalence etween these coefficients and Cohen's d, an effect sie metric much used in evidence-ased research

18 17 In the multiple case, the coefficient epresses the epected standard deviation change in y* for a unit change in, holding constant The partially standardied coefficient may prove useful in certain areas of comparative research For eample, in studying the lack and white gap in college attainment over the 0th century US, researchers might want to know whether the gap in the underlying propensity to complete college narrowed over time In this situation, the partially standardied coefficient could e used to compare the lack-white gap in y*, measured in standard deviations, across different time periods Such an analysis would e informative aout the changes in the lackwhite gap in college completion as a positional good Moreover, if researchers were interested in comparing these gaps net of socioeconomic status, the partially standardied coefficient for the multiple case could e used Given the often complicated interpretation of nonlinear proaility model coefficients, the partially standardied coefficient is an attractive alternative that acknowledges the inherent standardiation on the latent outcome variale and retains the scale interpretation of the predictor variale of interest Because this coefficient is not sensitive to the distriution in it may e useful in certain areas of comparative research We now turn to two other alternatives to using logit and proit models in group comparisons in sociological research Using Predicted Proailities The issue of comparing logit or proit coefficients across groups was addressed y Ai and Norton (003 whose approach uses the predicted values from the logit or proit model, and Long (009 shows how to implement the method when researchers want to use it to make group

19 18 comparisons Because predicted proailities are non-linear, group differences in them will vary across the distriution of the predictor variale of interest This means that an interaction effect cannot e reduced to a single quantity, ut has to e studied graphically across the distriution of the predictor variale To implement this idea, Long suggests using a logit or proit model to estimate, for each group, the predicted proaility of the outcome at different values of a predictor variale of interest and then plotting the difference etween the groups' predicted proailities across the distriution of the predictor variale Long (and Ai and Norton also suggest statistical tests of group difference in proailities This approach could e applied when hypotheses aout group differences can e tested through their implications for proailities It is also likely to e valuale in showing that the magnitude of group differences can vary across the distriution of predictor variales This can often est e seen graphically; indeed, as Greene (010 notes, such graphical output is needed whenever we are to evaluate interaction effects on the proaility margin Nevertheless, this method has some limitations As Greene (010: 95 oserves, "partial [marginal] effects are neither coefficients nor elements of the specification of the model They are implications of the specified and estimated model" (rackets added One consequence of this property is that tests of significance of differences in predicted proailities cannot e interpreted in the way we would interpret tests of significance of differences in coefficients Furthermore, while the method is simple to implement in models with inary outcomes, in the ordinal or multinomial case the graphical output will rapidly ecome cumersome Using the Heteroskedastic Non-Linear Proaility Model As well as pointing to the prolem of making comparisons across groups when using non-linear proaility models, Allison (1999 also proposed a solution that involved estimating the ratio of

20 19 error variances etween groups Williams (009 showed that Allison s method is a special case of what is variously termed a "heteroskedastic non-linear proaility model", a "heterogeneous choice model", and a "location-scale model" This model is characteried y an equation for the variance as well as the mean The variance can e written as a function of any relevant predictor variales, ut, in the contet of group comparisons, we would e most interested in models that allowed the variance to differ according to dummy variales for groups For eample, applying the model to the male-female iochemist data originally used y Allison, Williams (009: 540 reports that the residual variance for women is 135 times larger than for men What appears to have een overlooked in discussions of these models, however, is the issue of identification; that is, our aility to draw inferences aout the parameters of interest, such as differences in true coefficients across groups If we consider the case of comparisons using the ordered logit or proit model (see equation (11, elow 5, then two groups can differ in one or all of their threshold parameters, τ j, their regression coefficients, β, and their scaling factor, s A model in which all of these differ is not identified This follows ecause such a model is equivalent to fitting a separate model for each group and so, if it were identified, this would allow us to recover the group specific scaling factor from that group s ordered logit model something which we know is not possile Thus, to identify the relative sies of the different groups scaling factors, it is necessary to impose a group constraint on either or oth of τ j and β (see Williams 009: 551, whose application of the heterogeneous choice ordered proit model is "contingent on the thresholds eing the same for oth men and women" 5 Everything that follows holds for inomial as well as ordinal outcomes and, indeed, for non-linear proaility models as a whole

21 0 Whenever we have reason to elieve that a constraint on β or τ j is defensile, the model may e a good choice for making comparisons (see, eg, Mouw and Soel 001 ecause we can recover the group difference in the true coefficients of interest However, as Long (009 argues, in sociology we rarely have either the ody of accumulated knowledge or strong theory that would make the imposition of such constraints anything other than ad hoc 6 Thus, whenever the condition cannot e met, differences in unknown thresholds will e confounded with differences in unknown scaling factors, and so heteroskedastic non-linear proaility models will not solve the group comparaility prolem Some Further Results Etension to the Ordered and Multinomial Case In the ordered logit and proit we oserve not merely whether y* eceeds an unknown value or not (as in the inary logit or proit ut into which interval of y* a given oservation falls Thresholds divide the range of y* into disjoint and ehaustive intervals, τ 1 <τ < <τ J where τ = and J 1 τ = These allow us to define y O = j if τ 1 < y* < τ where j j O y is an ordered, discrete variale with J categories and the ordered proit or logit model takes the form: g(pr( y τ β s * j > τ j = + t, (11 s where g( is a link function the logit or the cumulative normal in the ordered logit and ordered proit respectively and = β / s is the ordered logit or ordered proit coefficient of Because the underlying latent variale model in the ordered models is the same as in the inary case (the 6 It is, of course, possile to make the residual variance depend on other variales But in so far as these variales do not wholly determine the error variance we are still in the dark as to how much it differs etween groups

22 1 difference etween them eing only in the degree to which y* is oserved, the epressions we have derived for the correlation and partial correlation (and for the coefficient of determination which we derive in Appendi D apply directly to the ordered case The multinomial case is more complicated ecause we have a numer of underlying latent variales We define as efore, ut we now define y*a as the propensity of an individual to choose alternative a from among a set of A possiilities We assume that and y*a are related such that y = α + β + s u, (1 * a a a a a where β a captures the effect of for alternative a, ua is an alternative-specific random error term, which is independent across alternatives and follows a standard type-i etreme value distriution, and sa is an alternative-specific scale parameter We only oserve which of the A alternatives the individual actually chooses and we assume that the individual chooses the alternative for which he or she has the greatest propensity: y = a y y > a a * * if a a 0, McFadden (1974 shows that the proaility of choosing alternative a given is equal to ep( ka + a Pr( y = a =, (13 ep( ka + a a with normaliation such as k 1 = 1 = 0 to secure identification, and where a= a= k a α a = and s a a β a = s a The odds of choosing a rather than the aseline category a' are ( k ep a + a and from this we can derive the correlation etween y * * a y a ' and in much the same way as in the inary case, when we add one important qualifier: The standard deviation of used in the correlation should

23 e calculated not on the entire sample, ut on the sample which has chosen either the alternative, a, or the reference category, a' The reason for this is that the standard deviation of on the contrast-specific sample can differ from the standard deviation of in the full sample We may then derive the correlation as r = a a * * ( ya ya ', a a + π sd (, (14 var( / 3 where we define sd( a as the standard deviation of on the sample pertaining to the alternative a versus the references contrast a' (and var( a is its squared counterpart The epression in (14 generalies easily to any contrast etween two alternatives Sensitivity to Mis-Specified Error Term As we saw earlier, the correlation etween a predictor and the latent outcome can e recovered from the parameters of a NLPM ecause we know the variance of the predictor and var(ω is given y assumption Typically, when y is a inary realiation of y* we assume ω to have either a standard normal (for the proit or standard logistic (for the logit distriution But this raises the question: how sensitive are our estimates of the correlation to such an assumption? To answer it we ran a Monte Carlo simulation in which we fitted logit and proit models to data in which we varied the true underlying error distriution, the sample sie, and the magnitude of the correlation In Tale 1 we report the mean, over 500 replications, of the asolute deviation from the true correlation It is noticeale that, ecept for the uniformly distriuted error, the metric is largely insensitive to pure misspecifications, with the mean deviations for the misspecified cases differing little from those where the model is correctly specified (the logit-logistic and proit-

24 3 normal cases This holds even for a lognormal distriution with a skewness of -1 where the error term is highly asymmetric As we would have epected, the sensitivity is less pronounced the larger the sample sie --TABLE 1 HERE -- Relationship to the Polyserial Correlation Coefficient The correlation coefficient we present in this paper is closely related to the polyserial correlation coefficient widely used in psychometrics (Co 1974; Olsson, Drasgow, and Dorens 198 However, we elieve that our approach is more general, more fleile, and easier to implement The polyserial correlation assumes that y* and follow a ivariate normal distriution, whereas our measure places no assumptions on the distriution of and allows y* to have distriutions other than normal (eg, the logistic, complementary log-log Our method is easily generaliale to partial correlations (using partial logits and conditional standard deviation of whereas the partial polyserial correlation has to e derived from several polyserial correlations, severely complicating the derivation of standard errors Furthermore, unlike the polyserial coefficient, our method etends to partial correlations without making any assumptions aout the distriution of the control variale, Significance Testing Because the correlation is asymmetrically distriuted, tests of its statistical significance are usually undertaken on the Fisher transformed coefficient The Fisher transformation of the correlation coefficient, r, is

25 r F ( r = ln = arctanh( r (15 1 r F(r is approimately normally distriuted with a known mean and standard error under the null However, in our case the correlation is not directly calculated from data: rather it is computed as a function of an estimated quantity, y In calculating the standard error of F(r we therefore take this into account, ut, having done this, we can then eploit the normality of F(r to calculate confidence intervals and significance tests We compute the standard error of F(r using the delta method The asymptotic standard error otained from this approach depends on the partial derivative of F(r with respect to and we compute this using the chain rule F( r F( r r = r y y The standard error also depends on the variance of y We can write the asymptotic variance of F(r as: df ( r dr dr d y var( y (16 where df( r dr dr d y = y sd( var( + var( ω The asymptotic standard error of F(r is given y the square root of equation (16 Because the Fisher transformed correlation is asymptotically normally distriuted under the null, we can test the null hypothesis of ero correlation via: Z F( r = (17 se( F( r We analyed the power of the test score in (17 using a Monte Carlo study The results are reported in Tale We generated the data using equation (1 with normally distriuted and u having a standard logistic distriution and we derived the estimated correlation from a logit

26 5 model We varied the true correlation and the sample sie and, for each scenario, we report the percentage of times out of 1000 replications that our test rejected the null hypothesis (ased on a 05 critical value In scenario A of Tale the null hypothesis is true, and here our test rejects the null hypothesis 5% of the time or less In scenarios B through F, the null hypothesis is false, and our test fails to reject it only when the sample sie is small and the true correlation is quite small Even in samples of 00 and for a correlation of 05 a false null has a 90% rejection rate In samples of 1,000 oservations or larger, we otain a power of aove 80 percent even for small correlations -- TABLE HERE -- The Fisher transform can easily e etended to test the null hypothesis of ero difference etween groups in their correlations: Z F( ry*, F( ry*,1 = (18 var ( F( r + var( F( r y*,1 y*, Here the suscript 1 and indicates different samples etween which we want to compare correlations We conducted a Monte Carlo study to evaluate the power of this test The results can e found in Appendi C They show that the larger the difference etween correlations and the larger the sample sie, the more likely the test is to reject the null hypothesis of no difference in correlations With a sample of 1000 the test is effective at rejecting the null hypothesis when the difference in correlations is 0 or greater: for a sample of 5000 the test performs well for differences in correlations of more than aout 07 Applications

27 6 The final part of our eposition applies the methods we have proposed to two eamples that illustrate the relationship etween the correlation and the coefficients of NLPMs Our first eample is a reanalysis of data on trends in educational inequality in Europe (Breen et al 009 Using the correlation slightly modifies the findings of the original study and we eplain why y investigating the source of the differences that we find etween the ordered logit coefficients on which the original conclusions were ased and the correlations on which we rest our conclusions Our second eample uses data from the General Social Survey (GSS to eamine trends in the relationship etween educational attainment and father s socio-economic inde over cohorts orn etween 1930 and 1969, and here we find that the differences etween trends ased on an ordered logit model and those ased on the correlation are much more pronounced Non-Persistent Inequality in Educational Attainment Shavit and Blossfeld (1993 summarie the results of their seminal study of inequalities in educational attainment across different cohorts under the title Persistent Inequality They claim that, despite dramatic educational epansion during the twentieth century, all ut two (Sweden and the Netherlands of the thirteen countries studied in their project ehiit staility of socioeconomic inequalities of educational opportunities (Shavit and Blossfeld 1993: Recently, Breen, Luijk, Müller and Pollak (009 have challenged Shavit and Blossfeld s conclusions using data on educational inequality in the twentieth century in eight European countries (see also Breen and Jonsson 005; Breen, Luijk, Müller and Pollak 010 All these studies ase their comparisons etween irth cohorts on the use of NLPMs: logits in the case of the Shavit and Blossfeld (1993 volume, the ordered logit in the Breen et al (009, 010 analysis This implies that their conclusions may confound real differences in educational

28 7 inequality across cohorts with differences in residual variation Here we re-analye data from three of the eight countries in the Breen et al study and compare the coefficients and thus the conclusions drawn from the original model with the corresponding partial correlations The correlation, as a standardied measure, is particular appropriate in this case ecause, from the perspective of the study of inequality, education can e seen as a positional good: what matters is a person s position in the distriution rather than his or her asolute level of education Furthermore, when, as here, the predictor variales are dummies, the correlation has an attractive interpretation, telling us how much of the variance in the latent y* lies etween classes rather than within them The data we use relate to men in Germany, France, and the Netherlands, orn in one of five cohorts: , , , , and The dependent variale is highest level of educational attainment measured using five ordered categories: 1 Compulsory education with or without elementary vocational education, Secondary intermediate education, vocational or general, 3 Full secondary education, 4 Lower tertiary education, 5 Higher tertiary The predictor variale is social class origins, ased on the respondent s report of his father s occupation when he was growing up Seven classes are distinguished using the EGP class schema (Erikson and Goldthorpe 199, chapter : I II IIIa Upper service, Lower service, Higher grade routine non-manual workers, IVa Self-employed and small employers,

29 8 IVc Farmers, V+VI Skilled manual workers, technicians and supervisors, VIIa+III Semi- and unskilled manual, agricultural, and lower grade routine nonmanual workers Sample sies are 17,14 for Germany, 51,705 for France, and 19,751 for the Netherlands Further detail on the data and on the surveys from which they were otained can e found in Breen et al (009: These three countries were chosen for reanalysis here ecause Breen et al found significant decline in class inequality in educational outcomes over successive irth cohorts in all three countries, whereas in Persistent Inequality a decline was found for the Netherlands ut not for Germany (France was not included The question we now seek to answer is whether Breen et al s claim of non-persistent inequality is roust if we use the correlation as our measure of inequality in educational attainment We egin our analysis y replicating theirs Breen et al use an ordered logit model to regress the five educational categories, considered an ordinal ranking of educational attainment, on dummy variales representing the origin classes They fit a separate model to each of the five irth cohorts and present their results in a series of figures, of which the most important is Figure 4 on page 1495 of their paper Our replication of their analysis is shown here in Figure 1 and this is identical with their Figure 4 -- FIGURE 1 HERE -- Class I serves as the omitted category, having an implicit coefficient of ero, and the lines plotting the coefficients for the other classes (which measure the etent to which the particular class s log-odds of eceeding any given level of education differ from those of class I show a

30 9 general trend of convergence towards ero as we move from later- to earlier-orn cohorts This trend etends across the whole of the 0 th century in the Netherlands ut does not egin until after the cohort orn in Germany and France The trend is most pronounced among the initially most disadvantaged classes, particularly farmers (IVc and unskilled workers (VII+III (see Breen et al 009: Figure shows, for each class, the partial correlation (that is, controlling for the effect of the other class dummies with latent educational attainment, y* These are all negative (ecause they are all measured relative to class I and so a line in Figure that rises towards ero over cohorts (a notale eample is the line for class IVa in the Netherlands tells us that a declining share of the conditional variance in y* lies etween that origin class and class I -- FIGURE HERE -- The trend towards greater equality, measured as the convergence of class coefficients, is somewhat less pronounced in Figure (correlations than in Figure 1 (logits, especially for France, and, in oth France and Germany, Figure suggests that the trend to equaliation egan later (after the cohort than Breen et al found Trends for particular classes are not always the same in the two figures: eg in France the partial correlations show that the positions of classes III and V+VI worsened relative to class I, whereas the logit coefficients in Figure 1 show that their position either remained unchanged or improved Equation (8 tells us that such discrepancies must e due to changes in the conditional (on the other classes variance of the

31 30 class dummies which will, to a considerale etent, reflect the changing relative sies of the social classes 7 We can investigate the differences etween the logit coefficients and the partial correlations y returning to equation (9 that shows that the partial logit coefficient is equal to the square root of the ratio of the eplained to the uneplained conditional variation in the latent y*, r y* 1 r y*, scaled y the ratio of the standardied standard deviation of the error ( π / 3 in the logit case to the residual standard deviation of, sd(ω/sd( We can therefore decompose the logit coefficients into these two parts In Tale 3 we report these two parts for the three countries In these epressions, stands for the dummy variales for the classes other than the one eing studied The logit coefficients shown in Figure 1 are the product of these two components --TABLE 3 HERE -- The trend in the French results reported in Figure are very similar to those of the ratio r shown in Tale 3 But the more pronounced decline in class origin effects in y* / 1 ry* France seen in the logits of Figure 1 is due to the steep reduction in the ratio sd(ω/sd( And since sd (ω is constant over cohorts, this change reflects the increase in the conditional standard deviations of the dummy variales over cohorts The Dutch case is similar: increases in the conditional standard deviations of the dummy variales together with declines in their eplanatory power give rise to the even more marked equaliation shown in Figure 1 But the 7 The conditional variance of a class dummy is the variance of the residual from a regression of that dummy variale on the dummy variales for the other classes Thus, change over cohorts in the conditional variance of a particular dummy reflects not just changes in the relative sie of that particular class; it is also sensitive to changes in the entire class distriution, aleit in a complicated way

32 31 German eperience is somewhat different: here Figures 1 and are more similar ecause less of the decline in the logits is attriutale to change in the conditional standard deviations of the class dummies As the trend in the ratio, sd( ω / sd(, in Tale 3 shows, although the conditional standard deviations of the class dummies have mainly increased (and so the ratio reported in the figure declines they have remained within a narrower range In general, changes in the residual standard deviation of the predictor variales could cause logits (or, generally, NLPMs to eaggerate or underplay trends or differences in the degree to which the predictors account for variation in the latent y* If sd ( increases over cohorts, as here, it will eaggerate a declining trend in educational inequality when measured with logit coefficients ut underestimate the opposite trend when measured with partial correlation coefficients A declining sd ( will magnify an increasing trend in logit coefficients ut dampen a declining one In the case at hand, where changes in sd ( magnify a declining trend, our overall conclusion would nevertheless e that inequalities in educational attainment declined over the 0 th century To support this conclusion, we report in Figure 3 the estimated R values (derived in Appendi D for each cohort We find a clear decline in the degree to which class origins account for variation in latent educational attainment And, ecause correlations are standardied measures, we can also compare them across countries In the oldest cohort class origins eplained most variation in educational attainment in Germany (30%, ut this is the country in which there has een the greatest asolute decline in R and y the youngest cohort class origins account for aout 15% of the variance in oth Germany and France In all cohorts the Netherlands displays the lowest amount of variance eplained y class origins: in the youngest

FinQuiz Notes

FinQuiz Notes Reading 9 A time series is any series of data that varies over time e.g. the quarterly sales for a company during the past five years or daily returns of a security. When assumptions of the regression