Bayesian Analysis of Treatment Effects in an Ordered Potential Outcomes Model


Mingliang Li, Department of Economics, SUNY-Buffalo, ml3@buffalo.edu
Justin L. Tobias, Department of Economics, Iowa State University, tobiasj@iastate.edu

Abstract

We describe a new Bayesian estimation algorithm for fitting a binary treatment, ordered outcome selection model in a potential outcomes framework. We show how recent advances in simulation methods, namely data augmentation, the Gibbs sampler and the Metropolis-Hastings algorithm, can be used to fit this model efficiently, and also introduce a reparameterization to help accelerate the convergence of our posterior simulator. Several computational strategies which allow for non-Normality are also discussed. Conventional treatment effects such as the Average Treatment Effect (ATE), the effect of treatment on the treated (TT) and the Local Average Treatment Effect (LATE) are adapted for this specific model, and Bayesian strategies for calculating these treatment effects are introduced. Finally, we review how one can potentially learn about (or at least bound) the non-identified cross-regime correlation parameter and use this learning to calculate (or bound) parameters of interest beyond mean treatment effects.

ACKNOWLEDGEMENTS: We would like to thank two anonymous referees, the editor Ed Vytlacil, and participants at the 4th Annual Advances in Econometrics Conference for helpful comments and suggestions. All errors are our own.

1 Introduction

As evidenced by the vast literature dedicated to the issue, the problem of identifying and estimating the effects of treatment from observational data is of central importance to economics and the social sciences. As suggested by the articles appearing in this volume, there are many estimation strategies commonly employed in this literature, and the assumptions made in and issues emphasized by these various approaches can be quite distinct. For instance, some studies employ fully parametric models to conduct their analyses, arguing that the use of such models permits the estimation of a wide range of policy-relevant parameters,[1] while others seek a more agnostic approach and thus pursue nonparametric or semiparametric techniques.[2] Many empirical studies in this area argue that the most convincing way to surmount the problem of treatment endogeneity is to make use of cleverly chosen natural experiments or instrumental variables,[3] while others are content to pursue more structural equation approaches where the role of the exclusion restriction is decidedly less important and the discussion surrounding the instrument is muted.[4] Finally, as in econometrics generally, there are both Bayesian and Classical approaches for handling these types of models. In this paper we focus primarily on this last distinction and take up the case of Bayesian estimation of a particular type of treatment-response model. While Bayesian work on the analysis of treatment or causal effects has become more common in the econometrics literature [e.g., Vijverberg (1993), Koop and Poirier (1997), Li (1998), Chib and Hamilton (2000, 2002), Poirier and Tobias (2003) and Li, Poirier and Tobias (2004)], the use of such techniques continues to remain rare relative to Classical approaches. We do not aim to reduce this disparity by proselytizing at length in this paper about the merits of the Bayesian approach relative to Classical methods.
Instead, our goal is to review how a Bayesian might handle specifications, similar to the Roy (1951) model, which are commonly encountered in the treatment effect literature, to review some computational advances which should appeal

[1] Heckman, Tobias and Vytlacil (2003), for example, discuss parametric approaches for estimating a variety of popular treatment effects under various distributional assumptions.
[2] Manski's (1990, 1994) nonparametric bounding is a leading example.
[3] See Angrist and Krueger (2001) for a review.
[4] Gould (2002, 2005), for example, argues that having strong predictors for treatment status is more important for practical identification purposes than requiring that some set of covariates be excluded from the outcome equation. In applied Bayesian work [e.g., Poirier and Tobias (2003), Munkin and Trivedi (2003) and Li, Poirier and Tobias (2004)], the instrument tends to receive decidedly less discussion. In empirical practice, however, such exclusion restrictions should be, and typically are, used when available.

to all researchers when faced with estimation of these types of models, to introduce an issue that is somewhat unique to the Bayesian literature on this topic, and to provide new results on Bayesian estimation of a specific type of treatment effect model. We take up the particular case of a treatment-response model where treatment status is binary and the outcome of interest is ordered. To our knowledge, a discussion of this particular model is new to the Bayesian literature, though highly related models, including those of the binary treatment / continuous outcome and ordered treatment / binary outcome varieties, have appeared in Chib and Hamilton (2000). We present our model in a potential outcomes framework and thus model both the observed outcome of the agent given her treatment choice as well as the potential or counterfactual outcome for that agent had she made a different treatment decision. We show how data augmentation [e.g., Tanner and Wong (1987), Albert and Chib (1993)] in conjunction with the Gibbs sampler and Metropolis-Hastings algorithm [e.g., Casella and George (1992), Tierney (1994), Chib and Greenberg (1995)] can be used to fit this particular model efficiently, and also introduce a reparameterization to help accelerate the convergence of our posterior simulator. Several computational strategies which allow for non-Normality are also discussed, though not employed. Treatment effects similar in spirit to the Average Treatment Effect (ATE), the effect of treatment on the treated (TT) and the Local Average Treatment Effect (LATE)[5] are adapted for the case of our ordered response, and Bayesian strategies for calculating these treatment effects are described. Finally, we discuss how one can potentially learn about (or at least bound) the non-identified cross-regime correlation parameter[6] and use this learning to calculate (or bound) parameters of interest beyond mean treatment effects. The outline of this paper is as follows. Section 2 presents the basic potential outcomes model and section 3 discusses our Bayesian estimation algorithm.
Often-reported treatment parameters such as ATE, TT and LATE are derived for our model in section 4 and procedures for calculating these effects are described. A generated data experiment which illustrates the performance of our algorithm is provided in section 5, and the paper concludes with a

[5] See, for example, Imbens and Angrist (1994) for a discussion of LATE and Heckman and Vytlacil (1999, 2000) for detailed discussions of these and other treatment effects.
[6] For related discussions on this topic, see Vijverberg (1993), Koop and Poirier (1997), Poirier (1998), Poirier and Tobias (2003) and Li, Poirier and Tobias (2004).

summary in section 6.

2 The Model

What we have in mind is the development of a parametric model that will enable researchers to investigate the impact of a binary (and potentially endogenous) treatment variable, denoted D, where D = 1 implies receipt of treatment and D = 0 implies non-receipt, on an ordered outcome of interest, denoted y ∈ {1, 2, …, J}. There are numerous examples where such a model would be appropriate. For example, one might use this model to investigate, say, the impact of enrolling in a supplemental learning center on attitudes toward education (measured as a categorical response) or the quantity of education ultimately received by the student. More generally, such a model is potentially of value in any situation where the outcome of interest (e.g., earnings, education, expenditure) is recorded categorically rather than continuously and the model also contains a dummy endogenous variable.[7] We cast this evaluation problem in a potential outcomes framework and thus explicitly model the counterfactual state: the ordered outcome that would have been observed had the agent made a different treatment decision. We let y(1) denote the outcome received by the agent in the treatment state and y(0) denote the outcome received without treatment. Only one outcome, denoted y, is ever observed for any agent, and thus y = D y(1) + (1 − D) y(0). We suppose that the observed treatment decision D and the observed and potential ordered outcomes y(1) and y(0) are generated by an underlying latent variable representation of the model. Specifically, we write:[8]

    D_i^* = w_i′β^(D) + u_i    (1)
    z_i^(1) = x_i′β^(1) + ε_i^(1)    (2)

[7] One can also conceive of situations where the modeling of count outcomes is desired (e.g., Munkin and Trivedi 2003). Clearly, approaches to modeling ordered and count outcomes impose different parametric assumptions on the response (e.g.,
ordered probit versus Poisson or negative binomial distribution), invoke different interpretations of the outcomes of interest (ordinal versus cardinal), and involve different assessments of the censoring feature of the outcomes (censored versus unbounded). Which approach is more appropriate depends critically on the type of application that is considered.
[8] In this paper, we assume that the same set of covariates appears in the treated and untreated states. If desired, this assumption could be relaxed and this extension incorporated into the derivations which follow.

    z_i^(0) = x_i′β^(0) + ε_i^(0).    (3)

The binary treatment indicator D_i is related to the latent D_i^* as follows:

    D_i = I(D_i^* > 0) = I[u_i > −w_i′β^(D)],    (4)

with I(·) denoting the standard indicator function. Similarly, the ordered responses y_i^(1) and y_i^(0) are related to the latent variables z_i^(1) and z_i^(0) as follows:

    y_i^(k) = j  iff  α_j^(k) < z_i^(k) ≤ α_{j+1}^(k),  k = 0, 1,  j = 1, 2, …, J.    (5)

The {α_j^(k)}, k = 0, 1, j = 1, 2, …, J, are cutpoints in the model, mapping the latent indices in both states into discrete values of our ordered response. We impose standard identification conditions on these cutpoints, namely, α_1^(1) = α_1^(0) = −∞, α_2^(1) = α_2^(0) = 0 and α_{J+1}^(1) = α_{J+1}^(0) = ∞. We also let α^(1) = [α_3^(1) α_4^(1) ⋯ α_J^(1)] denote the cutpoint vector for the treated state and define the 1 × (J − 2) vector α^(0) similarly. In this model we also assume the availability of an exclusion restriction: some covariate which enters w that is not contained in x. To motivate the importance of this assumption, consider a restricted version of (1)-(3) which consists of equation (1) and a latent variable equation like (2), the latter of which includes the observed D as an element of x. This restricted model would be of the form of a standard treatment or causal effect model that only works with observed rather than potential outcomes. Maddala (1983, page 122), for example, shows that the parameters of such a model are not identifiable unless the errors of the equation system are uncorrelated or such an exclusion restriction is present. The former condition often seems rather untenable in empirical practice, and thus we maintain that such an exclusion restriction is available. Finally, we fix ideas throughout the remainder of this discussion by assuming joint Normality of the error terms:[9]

    [u_i ε_i^(1) ε_i^(0)]′ | x_i, w_i ~ iid N(0, Σ),  where  Σ = [1 ρ^(1) ρ^(0); ρ^(1) 1 ρ^(10); ρ^(0) ρ^(10) 1].    (6)

Equations (1)-(6) then denote the complete specification of our ordered potential outcomes model.

[9] We discuss how this requirement can be relaxed in section 3.4 of this paper.
The variances of the errors in all the equations have been normalized to unity for identification purposes.
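As a concrete illustration, the data-generating process in (1)-(6) can be simulated directly. Below is a minimal sketch in Python for J = 3 outcome categories; all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical parameter values (illustration only).
beta_D = np.array([0.5, 1.0])      # treatment equation; second regressor is the instrument
beta_1 = np.array([1.0, 0.5])      # treated-state outcome equation
beta_0 = np.array([0.2, 0.3])      # untreated-state outcome equation
Sigma = np.array([[1.0, 0.4, 0.3],     # eq. (6): unit variances on the diagonal,
                  [0.4, 1.0, 0.6],     # off-diagonals are rho^(1), rho^(0), rho^(10)
                  [0.3, 0.6, 1.0]])
cuts = np.array([0.0, 1.2])        # alpha_2 = 0 fixed; alpha_3 = 1.2 free (J = 3)

# The instrument enters w but is excluded from x (exclusion restriction).
w = np.column_stack([np.ones(n), rng.normal(size=n)])
x = np.column_stack([np.ones(n), rng.normal(size=n)])
u, e1, e0 = rng.multivariate_normal(np.zeros(3), Sigma, size=n).T

D = (w @ beta_D + u > 0).astype(int)                   # eq. (4)
# eq. (5): y = j iff alpha_j < z <= alpha_{j+1}; right-closed bins via digitize
y1 = np.digitize(x @ beta_1 + e1, cuts, right=True) + 1
y0 = np.digitize(x @ beta_0 + e0, cuts, right=True) + 1
y = D * y1 + (1 - D) * y0                              # only one outcome is observed
```

Only (y, D, x, w) would be observed by the researcher; y1 and y0 play the role of the potential outcomes.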

2.1 The likelihood

Given the assumed conditional independence across observations, we can write the likelihood function for this model as:

    p(y, D | Γ) ≡ L(Γ; y, D) = [∏_{i: D_i=1} Pr(y_i^(1) = y_i, D_i = 1 | Γ)] [∏_{i: D_i=0} Pr(y_i^(0) = y_i, D_i = 0 | Γ)],

where Γ = [β^(D) β^(1) β^(0) α^(1) α^(0) ρ^(0) ρ^(1) ρ^(10)]. The joint probabilities required in calculating this likelihood can be obtained from the bivariate Normal cdf. For example,

    Pr(y_i^(1) = y_i, D_i = 1 | Γ) = Pr(α_{y_i}^(1) < z_i^(1) ≤ α_{y_i+1}^(1), u_i > −w_i′β^(D) | w_i, x_i, Γ)    (7)
      = Pr(α_{y_i}^(1) − x_i′β^(1) < ε_i^(1) ≤ α_{y_i+1}^(1) − x_i′β^(1), u_i > −w_i′β^(D) | w_i, x_i, Γ).    (8)

Provided one uses a statistical package containing a routine for evaluating a bivariate Normal cdf, standard MLE can be implemented. If no such routine is available in a particular package, one could first reduce the probabilities above to univariate integration problems and then employ standard numerical approximations such as Simpson's rule or Gaussian quadrature to approximate the requisite integrals. To see this more clearly, let P_{1,j} ≡ Pr(D_i = 1, y_i^(1) = j | Γ), and note from (8)

    P_{1,j} = Pr(α_j^(1) − x_i′β^(1) < ε_i^(1) ≤ α_{j+1}^(1) − x_i′β^(1), u_i > −w_i′β^(D) | x_i, w_i, Γ)
      = ∫_{α_j^(1) − x_i′β^(1)}^{α_{j+1}^(1) − x_i′β^(1)} ∫_{−w_i′β^(D)}^{∞} p(ε_i^(1), u_i) du_i dε_i^(1)
      = ∫_{α_j^(1) − x_i′β^(1)}^{α_{j+1}^(1) − x_i′β^(1)} Φ( [w_i′β^(D) + ρ^(1)ε_i^(1)] / √(1 − (ρ^(1))²) ) p(ε_i^(1)) dε_i^(1).

In this form a variety of approaches can be employed to approximate the required univariate integrals. In our discussion of treatment effects in section 4, we will return to one approach to this problem based on Monte Carlo integration using truncated Normal sampling. Importantly, we also recognize that our estimation strategy via data augmentation, as described in the following section, avoids the need for any numerical integration of the above form, and therefore provides an attractive alternative to the implementation of standard MLE.
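To make the reduction to a univariate integral concrete, the following sketch evaluates P_{1,j} for one observation both by quadrature over the Φ(·)p(·) integrand and through the bivariate Normal cdf representation in (8), confirming the two agree. All numerical values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.integrate import quad

# Hypothetical values for one observation (illustration only).
rho1 = 0.5          # rho^(1)
wbD = 0.3           # w_i' beta^(D)
xb1 = 0.8           # x_i' beta^(1)
a_lo, a_hi = 0.0, 1.2   # alpha_j^(1), alpha_{j+1}^(1)

# Univariate integrand: Pr(u > -w'beta | eps) * density of eps.
def integrand(e):
    return norm.cdf((wbD + rho1 * e) / np.sqrt(1 - rho1**2)) * norm.pdf(e)

p_quad, _ = quad(integrand, a_lo - xb1, a_hi - xb1)

# Cross-check via the bivariate Normal cdf of (eps, u):
# Pr(a < eps <= b, u > c) = [Phi(b) - Phi(a)] - [F(b, c) - F(a, c)].
bvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho1], [rho1, 1.0]])
a, b, c = a_lo - xb1, a_hi - xb1, -wbD
p_cdf = (norm.cdf(b) - norm.cdf(a)) - (bvn.cdf([b, c]) - bvn.cdf([a, c]))
```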

3 Bayesian Estimation

To perform a Bayesian analysis, a researcher first starts off as a classical econometrician might, by specifying the likelihood function for this model, as implied by (1)-(6) and described in the preceding section. To this likelihood, the researcher adds a prior density, say p(Γ), with Γ denoting the parameters of the model. This prior is chosen to reflect her subjective beliefs about values of the parameters, and in most cases is chosen to be sufficiently vague or flat so that information contained in the data will dominate information insinuated through the prior. The prior density p(Γ) combined with the likelihood p(y, D | Γ) yields the joint posterior density p(Γ | y, D) up to proportionality via Bayes' theorem. This joint posterior completely summarizes the output of a Bayesian procedure: from it, one can obtain point and interval estimates, marginal posterior densities, posterior quantiles, or other quantities of interest. While in theory this simple exercise outlines the machinery involved in Bayesian posterior calculations, in practice, extracting useful information from a given posterior p(Γ | y, D) can be difficult. Direct calculation of a posterior mean of an element of Γ, for example, first requires that the normalizing constant of the joint posterior be known (while often it is not), and even if the normalizing constant were known, the mean calculation would still require solving a high-dimensional integration problem. In models of moderate complexity, these integration problems usually have no analytic solutions. Instead of direct evaluation of this posterior, modern Bayesian empirical work makes use of recent advances in simulation methods to carry out a posterior analysis. Two simulation devices in particular, the Gibbs sampler and the Metropolis-Hastings algorithm, are widely used and have become indispensable instruments in an applied Bayesian's toolkit. Both of these algorithms solve the problem of calculating posterior moments, quantiles, marginal densities or other quantities of interest by first obtaining a set of draws from the posterior p(Γ | y, D).
Typically, one cannot draw directly from this density, but instead, one can generate a sequence of draws (by appropriately following the steps of the algorithms) that converges to this distribution. Once convergence has been achieved, the subsequent set of simulated parameter values can be used to calculate the desired quantities (e.g., posterior means). In the Gibbs sampler, a Markov chain whose limiting distribution is p(Γ | y, D) is produced by iteratively sampling from the complete posterior conditionals of the model. In many
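The mechanics of the Gibbs sampler can be seen in a toy example that is deliberately much simpler than the model of this paper: a bivariate Normal target with correlation ρ, sampled by alternating between the two univariate conditional distributions. This is a standard textbook illustration, not part of the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8                      # target: N(0, [[1, rho], [rho, 1]])
x, y = 0.0, 0.0                # arbitrary starting values
draws = []
for t in range(20_000):
    # Each conditional of a bivariate Normal is univariate Normal:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    if t >= 2_000:             # discard a burn-in period before convergence
        draws.append((x, y))

draws = np.array(draws)
corr = np.corrcoef(draws.T)[0, 1]   # should be close to rho at stationarity
```

The retained draws behave like (correlated) samples from the target, so posterior-style summaries such as the empirical correlation recover ρ.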

cases, typically in models with conditionally conjugate priors, these posterior conditionals have well-known forms and can be easily sampled. The Metropolis-Hastings algorithm is a generalization of the Gibbs sampler and is a multivariate accept-reject algorithm. The algorithm is, again, constructed so that the limiting distribution of the Markov chain is the target density, p(Γ | y, D).[10] In terms of the model described in this paper, Bayesian estimation of the specification in (1)-(6) would likely make use of data augmentation [e.g., Tanner and Wong (1987), Albert and Chib (1993)] in conjunction with the algorithms above. When data augmentation is used, the posterior is first expanded (or, as the name suggests, augmented) to include not only the parameter vector Γ, but also the latent data s_i = [D_i^* z_i^(1) z_i^(0)]′. Although this would seem to complicate the estimation exercise, use of data augmentation often simplifies the required posterior calculations. This is particularly true when data augmentation is used in conjunction with the Gibbs sampler since, conditioned on the latent data, inference regarding the regression parameters proceeds as in a linear regression model, and given the regression parameters, it is often straightforward to obtain draws from the posterior conditional for the latent data. For our model, this augmented posterior is of the form:

    p(D^*, z^(1), z^(0), Γ | y, D) ∝ p(y, D, D^*, z^(1), z^(0), Γ)    (9)
      = p(y, D | D^*, z^(1), z^(0), Γ) p(D^*, z^(1), z^(0) | Γ) p(Γ),    (10)

with p(Γ) denoting the prior for the parameters of our model. The middle term in the above expression is immediately known as a trivariate Normal density, given the joint Normality assumption in (6) combined with the model in (1)-(3). The last term simply denotes the prior for our model parameters. For the first term, conditioned on the latent variables and model parameters, the observed responses D and y are known with certainty and thus the joint (conditional) distribution of y and D is degenerate.
Putting these pieces together, and exploiting the assumed conditional independence across observations, we can write the

[10] A detailed review of these simulation methods is beyond the scope of this paper; the interested reader is invited to see Casella and George (1992), Tierney (1994), Chib and Greenberg (1995), Gilks et al. (1998), Geweke (1999), Chen, Shao and Ibrahim (2000), Carlin and Louis (2000), Geweke and Keane (2001), Chib (2001), Koop (2003), Lancaster (2004), Gelman et al. (2004), Poirier and Tobias (2006) and Koop, Poirier and Tobias (2006) (among others) for detailed and comprehensive descriptions of these and other methods.

augmented posterior as follows:

    p(D^*, z^(1), z^(0), Γ | D, y) ∝ p(Γ) ∏_{i=1}^{n} φ_3(s_i; r_iβ, Σ)    (11)
      × [ I(D_i = 1) I(D_i^* > 0) I(α_{y_i}^(1) < z_i^(1) ≤ α_{y_i+1}^(1)) + I(D_i = 0) I(D_i^* ≤ 0) I(α_{y_i}^(0) < z_i^(0) ≤ α_{y_i+1}^(0)) ].

In the above, we have defined

    s_i = [D_i^* z_i^(1) z_i^(0)]′,  r_i = [w_i′ 0 0; 0 x_i′ 0; 0 0 x_i′] (block diagonal),  β = [β^(D)′ β^(1)′ β^(0)′]′,    (12)

and φ_3(x; µ, Ω) denotes a trivariate Normal density with mean µ and covariance matrix Ω. Finally, Σ is defined in (6). The indicator functions added to (11) serve to capture the degenerate joint distribution of y and D given the latent data and model parameters.

3.1 A Useful Reparameterization

In theory, one could directly apply standard computational tools (namely the Gibbs sampler coupled with a few Metropolis-within-Gibbs steps) to fit the model in (11). However, it has been shown in related work [e.g., Cowles (1996), Nandram and Chen (1996) and Li and Tobias (2005)] that use of the standard Gibbs sampler in models with ordered responses suffers from slow mixing due to high correlation between the simulated cutpoints and latent data. As discussed in the previous section, the parameter draws obtained from our estimation algorithm form a Markov chain, and when the chain mixes slowly, we observe only very small local movements from iteration to iteration. As a result, it may take a very long time for our simulator to traverse the entire parameter space. When the lagged autocorrelations between the simulated parameters are very high, estimates of posterior features may be quite inaccurate, and numerical standard errors associated with those estimates will be unacceptably large. To mitigate this slow mixing problem, and move closer to a situation where we can obtain iid samples from the posterior, we suggest below an alternate parameterization of the model, building on the suggestion of Nandram and Chen (1996). To shed some insight on this reparameterization, first separate out the largest cutpoints from the treated state, α_J^(1), and untreated state, α_J^(0), and define the transformations:

    σ_1 ≡ 1/[α_J^(1)]²  and  σ_0 ≡ 1/[α_J^(0)]².

In addition, for any variable Q let Q̃^(1) ≡ √σ_1 Q^(1) and define Q̃^(0) ≡ √σ_0 Q^(0) similarly. The model in (1)-(3) is then observationally equivalent to

    D_i^* = w_i′β^(D) + u_i    (13)
    z̃_i^(1) = x_i′β̃^(1) + ε̃_i^(1)    (14)
    z̃_i^(0) = x_i′β̃^(0) + ε̃_i^(0),    (15)

where

    y_i^(k) = j  iff  α̃_j^(k) < z̃_i^(k) ≤ α̃_{j+1}^(k),  k = 0, 1.    (16)

In other words, the likelihood function for the observed data is unchanged when multiplying (2) and (3) by √σ_1 and √σ_0, respectively, and appropriately adjusting the rule in (16) which maps the latent data into the observed responses. The error variance matrix for the transformed disturbances now takes the following form:

    [u_i ε̃_i^(1) ε̃_i^(0)]′ | x_i, w_i ~ iid N(0, Σ̃),  where  Σ̃ = [1 σ_1D σ_0D; σ_1D σ_1 σ_10; σ_0D σ_10 σ_0],    (17)

where σ_1D ≡ √σ_1 ρ^(1), σ_0D ≡ √σ_0 ρ^(0) and σ_10 ≡ √σ_1 √σ_0 ρ^(10). The correlation parameters ρ^(1), ρ^(0) and ρ^(10) are defined in (6). When the model is written as in (13)-(17), it suggests that we can work with an augmented posterior distribution containing the latent variables D^*, z̃^(1), z̃^(0) and parameters Γ̃ = [β̃ σ_1D σ_0D σ_10 σ_1 σ_0 α̃^(1) α̃^(0)] instead of D^*, z^(1), z^(0) and Γ as in (11). The transformed cutpoint and coefficient vectors contained in Γ̃ are defined as follows:[11]

    α̃^(1) = [α̃_3^(1) α̃_4^(1) ⋯ α̃_{J−1}^(1)],  α̃^(0) = [α̃_3^(0) α̃_4^(0) ⋯ α̃_{J−1}^(0)],  and  β̃ = [β^(D)′ β̃^(1)′ β̃^(0)′]′.

Following similar derivations to those leading to (11), we obtain the augmented joint posterior distribution for the transformed parameters:

    p(D^*, z̃^(1), z̃^(0), Γ̃ | D, y) ∝ p(Γ̃) ∏_{i=1}^{n} φ_3(s̃_i; r_iβ̃, Σ̃)    (18)
      × [ I(D_i = 1) I(D_i^* > 0) I(α̃_{y_i}^(1) < z̃_i^(1) ≤ α̃_{y_i+1}^(1)) + I(D_i = 0) I(D_i^* ≤ 0) I(α̃_{y_i}^(0) < z̃_i^(0) ≤ α̃_{y_i+1}^(0)) ],

where s̃_i ≡ [D_i^* z̃_i^(1) z̃_i^(0)]′ and Σ̃ is defined in (17).

[11] Note that the largest cutpoints have been taken out of each cutpoint vector; these largest cutpoints are replaced by σ_1 and σ_0 in this alternate parameterization.
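The mapping between the structural parameters and the elements of Σ̃ in (17) is easy to invert, which is how the structural correlations and largest cutpoints are recovered from draws of the transformed parameters. A small numerical round-trip check (all values hypothetical, for illustration):

```python
import numpy as np

# Hypothetical structural values (illustration only).
alpha_J1, alpha_J0 = 2.0, 1.5            # largest free cutpoints alpha_J^(1), alpha_J^(0)
rho1, rho0, rho10 = 0.5, 0.3, 0.4        # structural correlations from eq. (6)
s1, s0 = 1.0 / alpha_J1**2, 1.0 / alpha_J0**2   # sigma_1, sigma_0

# Transformed covariance matrix, built as in eq. (17).
Sig_t = np.array([
    [1.0,                np.sqrt(s1) * rho1,       np.sqrt(s0) * rho0],
    [np.sqrt(s1) * rho1, s1,                       np.sqrt(s1 * s0) * rho10],
    [np.sqrt(s0) * rho0, np.sqrt(s1 * s0) * rho10, s0],
])

# Inverting the mapping recovers the structural quantities exactly.
s1_hat, s0_hat = Sig_t[1, 1], Sig_t[2, 2]
rho1_hat = Sig_t[0, 1] / np.sqrt(s1_hat)
rho0_hat = Sig_t[0, 2] / np.sqrt(s0_hat)
rho10_hat = Sig_t[1, 2] / np.sqrt(s1_hat * s0_hat)
alpha_J1_hat = 1.0 / np.sqrt(s1_hat)
alpha_J0_hat = 1.0 / np.sqrt(s0_hat)
```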

When working with this model, we employ independent priors for the parameters of Γ̃:

    p(Γ̃) = p(β̃) p(α̃^(1)) p(α̃^(0)) p(Σ̃).

We center the regression parameters around a prior mean of zero and specify them to be independently distributed with large prior variances: β̃ ~ N(b_0 = 0_{k×1}, V_β = 1000 I_k). The prior probability density function of α̃^(1) and α̃^(0) is assumed to be proportional to some constant:

    p(α̃_3^(1), …, α̃_{J−1}^(1), α̃_3^(0), …, α̃_{J−1}^(0)) ∝ c,

and finally, an inverse Wishart prior of the form Σ̃ ~ IW(ρ, R) with ρ = 6, R = I_3 is employed, subject to the restriction that the (1,1) element of Σ̃ is equal to one.

3.2 Benefits and Costs of Reparameterization

To this point we have offered no compelling arguments why one should work with the reparameterized model instead of working directly with the original structural representation of the model. The first argument in support of the reparameterization, as noted in Nandram and Chen (1996), and further shown in Li and Tobias (2005), is that the rescaling helps to significantly reduce the autocorrelation among the posterior simulations, thus accelerating the convergence of the algorithm. In other words, given an equal number of posterior draws, the numerical standard errors obtained when working with the reparameterized model will be significantly smaller than those obtained when using the original parameterization of the model. Second, and quite importantly from a computational point of view, this transformation effectively restores the conjugacy required to simulate the parameters of the covariance matrix. That is, in the original parameterization of the model in (6), there are restrictions on all three diagonal elements of the covariance matrix. This precludes drawing the elements of the inverse covariance matrix from a Wishart distribution (as is typically the case when conjugate priors are employed), since the posterior conditional is no longer Wishart given the diagonal restrictions.
On the other hand, in our reparameterized version of the model, the covariance matrix in (17) contains only one diagonal restriction, and using Algorithm 3 of Nobile (2000), one can generate draws from this restricted Wishart density. Thus, working with the reparameterized model facilitates simulating parameters of the covariance matrix, and no Metropolis-Hastings steps are required for this portion of the posterior simulator. Finally, for the specific case where there are three possible ordered outcomes (i.e., J = 3), there are effectively no unknown cutpoints in the transformed representation of this model.

For this case, the cutpoints are sampled through standard sampling of the elements of the covariance matrix. For this particular model with 3 outcomes, posterior simulation using the reparameterized model is quite fast, and no Metropolis-Hastings steps are required at any point in the algorithm. For J > 3, however, additional Metropolis-Hastings steps are required to simulate elements of the cutpoint vectors α̃^(1) and α̃^(0). The main, and perhaps only, drawback to working with the reparameterized model is that it requires us to place priors on the transformed parameters Γ̃. The priors we place on these parameters may seem reasonable and suitably default, but upon closer investigation, they may imply priors for the structural parameters that are unreasonable and at odds with our views about quantities for which we can more easily elicit our prior beliefs. For a two-equation treatment-response model containing an ordered treatment variable and an ordered response, Li and Tobias (2005) derive some connections between priors like those employed in (18) and their consequences for the priors implied on the structural parameters in (11). With suitably chosen hyperparameters, they argue that the implied priors on the structural coefficients can be reasonable, and that any costs associated with this prior selection issue are more than outweighed by the benefits afforded by the reparameterization. We take up a more detailed view of this issue of prior selection for this particular model in our generated data experiments of section 5.

3.3 The Posterior Simulator

We now introduce our posterior simulator for fitting our reparameterized treatment-response model. In what follows, we adopt the notation Γ̃_{−x} to denote all parameters other than x. We first group the joint posterior into [D^* z̃^(1) z̃^(0) α̃^(1) α̃^(0) β̃ Σ̃]. The latent data and cutpoints will be sampled in blocking steps, while the regression parameters and covariance matrix will be drawn from their complete posterior conditionals.
Step 1: Draw β̃ from[12]

    β̃ | s̃, Γ̃_{−β̃}, y, D ~ N(D_β d_β, D_β),

[12] It is useful to note that, conditional on the latent variables, our model is essentially a seemingly unrelated regressions (SUR) model, except for the restriction that one diagonal element of the covariance matrix is fixed at one.

where

    D_β ≡ ( ∑_{i=1}^{n} r_i′ Σ̃^{−1} r_i + V_β^{−1} )^{−1}  and  d_β ≡ ∑_{i=1}^{n} r_i′ Σ̃^{−1} s̃_i + V_β^{−1} b_0.

Step 2: Draw Σ̃ from

    Σ̃ | s̃, Γ̃_{−Σ̃}, y, D ~ IW( n + ρ, ∑_{i=1}^{n} (s̃_i − r_iβ̃)(s̃_i − r_iβ̃)′ + ρR ) · I(Σ̃_{11} = 1).

Algorithm 3 in Nobile (2000) is used to generate variates from this inverted Wishart distribution, conditioned on the value of the (1,1) element. The remaining steps in the posterior simulator involve joint sampling of the latent data s̃_i = [D_i^* z̃_i^(1) z̃_i^(0)]′ and cutpoint vectors α̃^(1) and α̃^(0). We attempt to mitigate autocorrelation in our parameter chains by blocking or grouping the cutpoints from a given equation together with the latent data appearing in that equation. Specifically, we proceed by sampling from the following densities:

    α̃^(1), z̃^(1) | z̃^(0), D^*, Γ̃_{−α̃^(1)}, y, D    (19)
    α̃^(0), z̃^(0) | z̃^(1), D^*, Γ̃_{−α̃^(0)}, y, D    (20)

and

    D^* | z̃^(1), z̃^(0), Γ̃, y, D.    (21)

Taking a closer look at the first of these three densities, we find from (18)

    α̃^(1), z̃^(1) | z̃^(0), D^*, Γ̃_{−α̃^(1)}, y, D    (22)
      ∝ ∏_{i=1}^{n} φ_3(s̃_i; r_iβ̃, Σ̃) [ I(α̃_{y_i}^(1) < z̃_i^(1) ≤ α̃_{y_i+1}^(1)) I(D_i = 1) + I(D_i = 0) ]
      ∝ ∏_{i=1}^{n} φ(z̃_i^(1); µ_{1i}^c, σ_1^c) [ I(α̃_{y_i}^(1) < z̃_i^(1) ≤ α̃_{y_i+1}^(1)) I(D_i = 1) + I(D_i = 0) ].

Note that the indicator functions involving D_i^* and z̃_i^(0) in (18) have disappeared completely, simply because we are now conditioning on these latent parameters. In the last line of (22), we have broken the trivariate Normal density for s̃_i into a conditional for z̃_i^(1) times the joint for z̃_i^(0) and D_i^*. The latter joint density is then absorbed into the normalizing constant of (22), as it does not involve α̃^(1) or z̃^(1). It follows that the conditional mean µ_{1i}^c and conditional variance σ_1^c are defined as:

    µ_{1i}^c ≡ x_i′β̃^(1) + [σ_1D σ_10] [1 σ_0D; σ_0D σ_0]^{−1} [D_i^* − w_i′β^(D); z̃_i^(0) − x_i′β̃^(0)]

and

    σ_1^c ≡ σ_1 − [σ_1D σ_10] [1 σ_0D; σ_0D σ_0]^{−1} [σ_1D; σ_10].

To obtain a draw from (19), we proceed in two steps and use the method of composition [see, e.g., Chib (2001)]. First, we marginalize (19) over z̃^(1) and describe a procedure for drawing α̃^(1) from this density. In the second step, we draw z̃^(1) from z̃^(1) | z̃^(0), D^*, Γ̃, y, D. The realized values of α̃^(1) and z̃^(1) then form a draw from (19). After integrating (19) over z̃^(1), we obtain:

    α̃^(1) | z̃^(0), D^*, Γ̃_{−α̃^(1)}, y, D ∝ ∏_{i: D_i=1} [ Φ( (α̃_{y_i+1}^(1) − µ_{1i}^c)/√σ_1^c ) − Φ( (α̃_{y_i}^(1) − µ_{1i}^c)/√σ_1^c ) ].    (23)

Step 3: To sample from the density in (23), we follow the suggestion of Nandram and Chen (1996), who suggest using a Dirichlet proposal density to sample differences[13] between the cutpoint values, q_j^(1) = α̃_{j+1}^(1) − α̃_j^(1), j = 3, …, J − 1. Given that the largest cutpoint takes the value of unity, we can then solve back to obtain the values of the cutpoints themselves. Specifically, we sample

    {q_j^(1)}_{j=3}^{J−1} ~ Dirichlet( {δ_j^(1) n_j^(1) + 1}_{j=3}^{J−1} ),

where δ_j^(1) = 0.1, j = 3, …, J − 1, are tuning parameters, and n_j^(1) ≡ ∑_{i=1}^{n} I(y_i = j) I(D_i = 1), j = 3, …, J − 1, are the numbers of individuals falling into each category of the outcome variable in the treated state. The probability of accepting the candidate draw is the standard Metropolis-Hastings probability, min(R, 1), where

    R = [ ∏_{i: D_i=1} ( Φ((α̃_{y_i+1,can}^(1) − µ_{1i}^c)/√σ_1^c) − Φ((α̃_{y_i,can}^(1) − µ_{1i}^c)/√σ_1^c) ) / ( Φ((α̃_{y_i+1,l−1}^(1) − µ_{1i}^c)/√σ_1^c) − Φ((α̃_{y_i,l−1}^(1) − µ_{1i}^c)/√σ_1^c) ) ] [ ∏_{j=3}^{J−1} ( q_{j,l−1}^(1) / q_{j,can}^(1) )^{δ_j^(1) n_j^(1)} ],

l − 1 denotes the current value of the algorithm and "can" denotes the candidate draw from the Dirichlet proposal density.

Step 4: Sample z̃^(1) independently from the conditional

    z̃_i^(1) | z̃^(0), D^*, Γ̃, y, D ~ ind  TN_{(α̃_{y_i}^(1), α̃_{y_i+1}^(1))}(µ_{1i}^c, σ_1^c)  if D_i = 1;  N(µ_{1i}^c, σ_1^c)  if D_i = 0,  i = 1, 2, …, n.

This is a Normal density with mean µ_{1i}^c and variance σ_1^c, truncated to the interval (α̃_{y_i}^(1), α̃_{y_i+1}^(1)) if individual i is observed to be in the treatment group.
When D_i = 0, no

[13] Note that sampling the cutpoints in this way enforces the ordering restriction on the cutpoint values.

restrictions arise regarding the latent data z̃_i^(1), and thus the draw is obtained from the untruncated Normal density. To generate draws from a univariate truncated Normal density, one can use standard inversion methods. That is, to generate x ~ TN_{(a,b)}(µ, σ²), first draw U uniformly on (0, 1) and then set

    x = µ + σ Φ^{−1}[ Φ((a − µ)/σ) + U ( Φ((b − µ)/σ) − Φ((a − µ)/σ) ) ].

Step 5: By similar arguments as those leading up to Step 3, one can show that

    α̃^(0) | z̃^(1), D^*, Γ̃_{−α̃^(0)}, y, D ∝ ∏_{i: D_i=0} [ Φ( (α̃_{y_i+1}^(0) − µ_{0i}^c)/√σ_0^c ) − Φ( (α̃_{y_i}^(0) − µ_{0i}^c)/√σ_0^c ) ],    (24)

where

    µ_{0i}^c ≡ x_i′β̃^(0) + [σ_0D σ_10] [1 σ_1D; σ_1D σ_1]^{−1} [D_i^* − w_i′β^(D); z̃_i^(1) − x_i′β̃^(1)]

and

    σ_0^c ≡ σ_0 − [σ_0D σ_10] [1 σ_1D; σ_1D σ_1]^{−1} [σ_0D; σ_10].

A strategy identical to that described in Step 3 can be used to simulate the cutpoints from this proposal density.

Step 6: Sample z̃^(0) independently from the conditional

    z̃_i^(0) | z̃^(1), D^*, Γ̃, y, D ~ ind  TN_{(α̃_{y_i}^(0), α̃_{y_i+1}^(0))}(µ_{0i}^c, σ_0^c)  if D_i = 0;  N(µ_{0i}^c, σ_0^c)  if D_i = 1,  i = 1, 2, …, n.

Step 7: Sample D^* independently from the conditional

    D_i^* | z̃^(0), z̃^(1), Γ̃, y, D ~ ind  TN_{(0,∞)}(µ_{Di}^c, σ_D^c)  if D_i = 1;  TN_{(−∞,0)}(µ_{Di}^c, σ_D^c)  if D_i = 0,  i = 1, 2, …, n.

In the above, we have defined

    µ_{Di}^c ≡ w_i′β^(D) + [σ_1D σ_0D] [σ_1 σ_10; σ_10 σ_0]^{−1} [z̃_i^(1) − x_i′β̃^(1); z̃_i^(0) − x_i′β̃^(0)]

and

    σ_D^c ≡ 1 − [σ_1D σ_0D] [σ_1 σ_10; σ_10 σ_0]^{−1} [σ_1D; σ_0D].

Iterating through Steps 1-7 produces a draw from the augmented joint posterior distribution. To recover the structural coefficients of interest, we simply invert the mappings described above (13) and below (17).
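The inversion method for truncated Normal sampling used in Steps 4, 6 and 7 can be coded directly; here is a minimal sketch (the function name is ours, for illustration). One-sided truncation, as needed in Step 7, is handled by passing an infinite endpoint.

```python
import numpy as np
from scipy.stats import norm

def trunc_norm(a, b, mu, sigma, rng):
    """Draw from N(mu, sigma^2) truncated to (a, b) by inversion:
    x = mu + sigma * PhiInv(Phi((a-mu)/sigma) + U * (Phi((b-mu)/sigma) - Phi((a-mu)/sigma)))."""
    Fa = norm.cdf((a - mu) / sigma)
    Fb = norm.cdf((b - mu) / sigma)
    u = rng.uniform()
    return mu + sigma * norm.ppf(Fa + u * (Fb - Fa))

rng = np.random.default_rng(2)
# Two-sided truncation, as in Steps 4 and 6 (hypothetical interval and moments).
draws = np.array([trunc_norm(0.0, 1.5, 0.2, 1.0, rng) for _ in range(2000)])
# One-sided truncation to (0, inf), as in Step 7 when D_i = 1.
pos = np.array([trunc_norm(0.0, np.inf, -1.0, 1.0, rng) for _ in range(2000)])
```

Every draw lands inside its truncation region by construction, since the uniform variate is mapped through the cdf restricted to (a, b).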

3.4 Extending the Model: Allowing for Non-Normality

A limitation of the model described thus far in this paper is its reliance on joint Normality. For some applications, such as log wage outcomes [e.g., Heckman and Sedlacek (1985), Heckman (2004)], the Normality assumption may be a reasonable approximation, and if the model passes a selection of diagnostic tests[14] no further refinements would be required. For other models, researchers may worry about heavy tails, asymmetry or possibly bimodality in the disturbance distribution. Below we outline simple computational tricks for capturing these features of the data and generalizing the Normality assumption. The most straightforward extension of the model is to expand to Student-t errors by simply adding the appropriate mixing variables to the disturbance variance [see, e.g., Carlin and Polson (1991), Geweke (1993), Albert and Chib (1993), Chib and Hamilton (2000) and Li, Poirier and Tobias (2004)]. For example, if we generalize the Normality assumption in (6) to

    [u_i ε_i^(1) ε_i^(0)]′ | λ_i, x_i, w_i, Σ ~ ind N(0, λ_i Σ),    (25)

with Σ as defined in (6), and specify a prior for λ_i of the form[15]

    λ_i ~ iid IG(ν/2, 2/ν),

it follows that (marginalized over the prior for λ_i):

    [u_i ε_i^(1) ε_i^(0)]′ | x_i, w_i, Σ ~ t_ν(0, Σ),    (26)

a multivariate Student-t density with mean zero, scale matrix Σ and ν degrees of freedom. This device is particularly useful for modeling symmetric error densities whose tails are heavier than those implied by the Normal density. In addition, such an extension of the model comes at little computational cost since, conditioned on {λ_i}, sampling the regression parameters and covariance matrix is straightforward, and each λ_i can be drawn independently from its complete posterior conditional, which is of an inverse Gamma form.

[14] For example, one can calculate posterior predictive p-values [Gelman et al. (2004)], QQ plots and other standard diagnostic criteria [e.g., Lancaster (2004), Koop, Poirier and Tobias (2006)] to evaluate the appropriateness of the Normality assumption.
For more on the performance of related models under non-Normality, see, for example, Goldberger (1983) or Paarsch (1984).

^15 The inverted Gamma (IG) random variable is parameterized as follows: p(x) ∝ x^{−(a+1)} exp[−1/(bx)].
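To see the scale-mixture device in action, the following sketch (illustrative values only; the correlation matrix below is hypothetical) verifies that mixing Normal draws over λ_i produces t_ν errors. Under the inverted Gamma parameterization of footnote 15, λ_i ~ IG(ν/2, 2/ν) is equivalent to 1/λ_i ~ Gamma(ν/2, scale = 2/ν):

```python
import numpy as np

rng = np.random.default_rng(1)
nu, n = 5.0, 50_000
Sigma = np.array([[1.0, 0.3, 0.2],   # hypothetical correlation matrix
                  [0.3, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])

# lambda_i ~ IG(nu/2, 2/nu)  <=>  1/lambda_i ~ Gamma(nu/2, scale = 2/nu)
lam = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)

# [u_i, eps1_i, eps0_i]' | lambda_i ~ N(0, lambda_i * Sigma)
z = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
e = np.sqrt(lam)[:, None] * z     # marginally t_nu(0, Sigma)

# a t_nu scale mixture inflates each marginal variance by nu/(nu - 2)
print(e[:, 0].var(), nu / (nu - 2.0))
```

With ν = 5 both printed numbers should sit near 5/3, confirming the heavier-than-Normal marginal implied by (26).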

An analogous and potentially more flexible extension of the model is to suppose that the errors were drawn from a mixture of Normal densities. Like (25), we might write

$$
\begin{bmatrix} u_i \\ \epsilon_i^{(1)} \\ \epsilon_i^{(0)} \end{bmatrix} \Bigg|\, \lambda_i, x_i, w_i, \Sigma \;\stackrel{ind}{\sim}\; \lambda_i N(0, \Sigma_1) + (1 - \lambda_i) N(0, \Sigma_2). \tag{27}
$$

So, conditioned on λ_i (which is unobserved), each observation is ascribed to one component of the mixture model with covariance matrix equal to either Σ_1 or Σ_2. Since the component assignment is known given λ_i, it is, again, straightforward to obtain draws from the regression parameters and component-specific covariance matrices. The λ_i are then simulated independently from a two-point distribution.^16

Finally, one can generalize this mixture model even further by allowing the regression parameters to vary across the mixture components. To do this, we write

$$
s_i \mid \lambda_i, \Gamma \;\stackrel{ind}{\sim}\; \lambda_i N(r_i \beta_1, \Sigma_1) + (1 - \lambda_i) N(r_i \beta_2, \Sigma_2),
$$

where s_i and r_i are as defined in (12), and β_j, Σ_j represent the regression parameter vector and covariance matrix from the j-th component of the mixture. Generalization to more than two components is also straightforward, and the component indicators and component probabilities can be simulated from multinomial and Dirichlet densities, respectively [see, e.g., Li, Poirier and Tobias (2004)].

4 Treatment Effects

In this section we derive expressions for conventional treatment effects in our ordered outcome treatment-response model. In particular, we adapt conventional treatment parameters including the Average Treatment Effect (ATE), the effect of Treatment on the Treated (TT) and the Local Average Treatment Effect (LATE) to our ordered response model, and describe how these can be calculated within this framework.

^16 See, e.g., McLachlan and Peel (2000). There is an important issue about local non-identifiability of the mixture model parameters; the parameters are identified only up to a permutation of the mixture components. To aid in identification, priors can be used that impose an ordering restriction on the variance parameters, regression parameters or component probabilities.
In some cases, there is little concern for component switching, but in other cases, this issue may be a significant concern.
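The two-point draw for the component indicators λ_i in (27) can be sketched as follows. This is a hypothetical illustration: the component probability p and the covariance matrices are made up for the example, and the ordering of the variances (Σ_1 tighter than Σ_2) reflects the identification device mentioned in the footnote above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def draw_indicators(E, Sigma1, Sigma2, p, rng):
    """Two-point draw for each lambda_i given the current errors E (n x 3):
    P(lambda_i = 1 | rest) is proportional to p * N(E_i; 0, Sigma1), and
    P(lambda_i = 0 | rest) is proportional to (1 - p) * N(E_i; 0, Sigma2)."""
    f1 = p * multivariate_normal.pdf(E, mean=np.zeros(3), cov=Sigma1)
    f2 = (1 - p) * multivariate_normal.pdf(E, mean=np.zeros(3), cov=Sigma2)
    prob1 = f1 / (f1 + f2)
    return (rng.uniform(size=len(E)) < prob1).astype(int)

rng = np.random.default_rng(2)
Sigma1, Sigma2 = np.eye(3), 9.0 * np.eye(3)   # ordered variances aid identification
E = np.vstack([rng.multivariate_normal(np.zeros(3), Sigma1, 200),
               rng.multivariate_normal(np.zeros(3), Sigma2, 200)])
lam = draw_indicators(E, Sigma1, Sigma2, p=0.5, rng=rng)
# draws from the tight component are mostly assigned lambda = 1
print(lam[:200].mean(), lam[200:].mean())
```

In the full sampler this step alternates with draws of the regression parameters and the component-specific covariance matrices, exactly as described in the text.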

4.1 The Average Treatment Effect

We begin with a discussion of the Average Treatment Effect (ATE). This parameter typically quantifies the expected outcome gain for a randomly chosen individual. Since our response is ordered, this parameter may not be of direct relevance, as it demands a cardinal representation of an ordinal variable.^17 In light of this issue, we choose to adapt the ATE parameter to describe across-regime changes in probabilities associated with various categories. To fix ideas, then, we consider the impact of the treatment on increasing (or decreasing) the probability that the outcome exceeds the lowest category:

$$
\begin{aligned}
ATE(x; \Gamma) &\equiv \Pr(y^{(1)} \ge 2 \mid x, \Gamma) - \Pr(y^{(0)} \ge 2 \mid x, \Gamma) \qquad (28) \\
&= \sum_{j=2}^{J} \left[ \Pr(y^{(1)} = j \mid x, \Gamma) - \Pr(y^{(0)} = j \mid x, \Gamma) \right] \qquad (29) \\
&= \sum_{j=2}^{J} \left( \left[ \Phi(\alpha^{(1)}_{j+1} - x\beta^{(1)}) - \Phi(\alpha^{(1)}_{j} - x\beta^{(1)}) \right] - \left[ \Phi(\alpha^{(0)}_{j+1} - x\beta^{(0)}) - \Phi(\alpha^{(0)}_{j} - x\beta^{(0)}) \right] \right).
\end{aligned}
$$

The choice of the lowest category is without loss of generality; other probabilities can be obtained in similar ways. We relate this quantity to ATE since it corresponds to a probability increase (or decrease) for a randomly chosen individual. A point estimate of this treatment impact is readily obtained using our simulated set of parameters drawn from the joint posterior:

$$
\widehat{ATE}(x) \equiv E_{\Gamma \mid y, D}\left[ ATE(x; \Gamma) \right] \approx \frac{1}{M} \sum_{m=1}^{M} ATE(x; \Gamma_m), \tag{30}
$$

where Γ_m ~ p(Γ | y, D) and is obtained from the algorithm described in section 3.3.

4.2 The Effect of Treatment on the Treated

The effect of Treatment on the Treated (TT) is a conceptually different parameter and describes the outcome gain (or loss) from treatment for those actually selecting into treatment.

^17 In some cases, however, it may be. For example, one could use an ordered model to analyze, say, years of schooling completed, and thus remain true to the integer-valued nature of the education data. In this case, the ordered variable has a natural cardinal interpretation, and thus the conventional ATE parameter would be of interest.
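The ATE calculation in (28)-(30) reduces to differencing ordered probit cell probabilities and averaging over posterior draws. A sketch with hypothetical cutpoints and index draws (the Γ_m-style draws below are stand-ins for output of the posterior simulator, not actual results):

```python
import numpy as np
from scipy.stats import norm

def pr_categories(xb, alpha):
    """Pr(y = j | x), j = 1..J, for an ordered probit with index xb and
    full cutpoint vector alpha (alpha_1 = -inf, ..., alpha_{J+1} = +inf)."""
    return norm.cdf(alpha[1:] - xb) - norm.cdf(alpha[:-1] - xb)

def ate_draw(xb1, xb0, alpha1, alpha0):
    """ATE(x; Gamma): sum over j >= 2 of Pr(y1 = j) - Pr(y0 = j), as in (28)-(29)."""
    p1, p0 = pr_categories(xb1, alpha1), pr_categories(xb0, alpha0)
    return (p1[1:] - p0[1:]).sum()

# hypothetical posterior draws Gamma_m (illustration only)
rng = np.random.default_rng(3)
M = 1000
alpha1 = np.array([-np.inf, 0.0, 0.5, 1.0, 1.5, np.inf])   # J = 5 categories
alpha0 = np.array([-np.inf, 0.0, 0.4, 0.9, 1.4, np.inf])
xb1_draws = 0.9 + 0.05 * rng.standard_normal(M)   # draws of x*beta^(1)
xb0_draws = 0.4 + 0.05 * rng.standard_normal(M)   # draws of x*beta^(0)
ate_hat = np.mean([ate_draw(b1, b0, alpha1, alpha0)
                   for b1, b0 in zip(xb1_draws, xb0_draws)])  # eq. (30)
print(ate_hat)
```

Note that the sum in (29) telescopes, so each draw's contribution equals Φ(xβ^{(1)} − α^{(1)}_2) − Φ(xβ^{(0)} − α^{(0)}_2); the summed form is kept to mirror the derivation in the text.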

Again, we examine the treatment effect on the probability that the outcome variable does not fall into the lowest category:

$$
\begin{aligned}
TT(x, w, D(w) = 1; \Gamma) &\equiv \Pr(y^{(1)} \ge 2 \mid x, w, D(w) = 1, \Gamma) - \Pr(y^{(0)} \ge 2 \mid x, w, D(w) = 1, \Gamma) \qquad (31) \\
&= \sum_{j=2}^{J} \left[ \Pr(y^{(1)} = j \mid x, w, D(w) = 1, \Gamma) - \Pr(y^{(0)} = j \mid x, w, D(w) = 1, \Gamma) \right].
\end{aligned}
$$

To economize on notation, let us define P^{TT}_{1,j}(Γ) ≡ Pr(y^{(1)} = j | x, w, D(w) = 1, Γ) and P^{TT}_{0,j}(Γ) ≡ Pr(y^{(0)} = j | x, w, D(w) = 1, Γ), keeping the conditioning on x, w and D(w) = 1 implicit. Given these definitions, it follows that

$$
TT(x, w, D(w) = 1; \Gamma) = \sum_{j=2}^{J} \left[ P^{TT}_{1,j}(\Gamma) - P^{TT}_{0,j}(\Gamma) \right]. \tag{32}
$$

Recalling our description of the likelihood in section 2.1, we can write the probabilities in (32) in more computationally convenient forms. For example,

$$
\begin{aligned}
P^{TT}_{1,j}(\Gamma) &\equiv \Pr(\alpha^{(1)}_{j} < z^{(1)} \le \alpha^{(1)}_{j+1} \mid u > -w\beta^{(D)}) \\
&= \Pr(\alpha^{(1)}_{j} - x\beta^{(1)} < \epsilon^{(1)} \le \alpha^{(1)}_{j+1} - x\beta^{(1)} \mid u > -w\beta^{(D)}) \\
&= \int_{\alpha^{(1)}_{j} - x\beta^{(1)}}^{\alpha^{(1)}_{j+1} - x\beta^{(1)}} p(\epsilon^{(1)} \mid u > -w\beta^{(D)})\, d\epsilon^{(1)} \\
&= \int_{\alpha^{(1)}_{j} - x\beta^{(1)}}^{\alpha^{(1)}_{j+1} - x\beta^{(1)}} \int_{-w\beta^{(D)}}^{\infty} \frac{p(\epsilon^{(1)}, u)}{\Pr(u > -w\beta^{(D)})}\, du\, d\epsilon^{(1)} \\
&= \int_{-w\beta^{(D)}}^{\infty} \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right] \frac{p(u)}{\Pr(u > -w\beta^{(D)})}\, du.
\end{aligned}
$$

The integral above is simply

$$
E_u \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right],
$$

where u ~ TN_{(−wβ^{(D)}, ∞)}(0, 1). Thus, the strong law of large numbers guarantees that

$$
\hat{P}^{TT}_{1,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{TT}_{1,j}(\Gamma),
$$

where {u^{(l)}}_{l=1}^{L} denotes an iid sample from the standard Normal distribution truncated to (−wβ^{(D)}, ∞).^18 Following similar arguments, one can show

$$
\hat{P}^{TT}_{0,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(0)}_{j+1} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(0)}_{j} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{TT}_{0,j}(\Gamma).
$$

Putting these results together, and, of course, exploiting the availability of draws from our joint posterior, we can calculate the following point estimate of TT:

$$
\widehat{TT}(x, w, D(w) = 1) \equiv E_{\Gamma \mid y, D}\left[ TT(x, w, D(w) = 1; \Gamma) \right] \approx \frac{1}{M} \sum_{m=1}^{M} \sum_{j=2}^{J} \left( \hat{P}^{TT}_{1,j}(\Gamma_m) - \hat{P}^{TT}_{0,j}(\Gamma_m) \right), \tag{33}
$$

with Γ_m ~ p(Γ | y, D).

4.3 The Local Average Treatment Effect

The Local Average Treatment Effect can be interpreted as measuring the outcome gain (or loss) from treatment for a group of compliers. This corresponds to the effect of treatment on a subgroup of the population who would choose to receive treatment at a particular value of the instrument, say w, but would not choose treatment at some w̃.^19 Consistent with our discussions in the previous subsections, our parameter of interest is the increased (or decreased) likelihood that the outcome variable exceeds the lowest category:

$$
\begin{aligned}
LATE(x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0; \Gamma) &= \Pr(y^{(1)} \ge 2 \mid x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0, \Gamma) \\
&\quad - \Pr(y^{(0)} \ge 2 \mid x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0, \Gamma) \\
&= \sum_{j=2}^{J} \left( P^{LATE}_{1,j}(\Gamma) - P^{LATE}_{0,j}(\Gamma) \right),
\end{aligned}
$$

where P^{LATE}_{k,j}(Γ) ≡ Pr(y^{(k)} = j | x, w, w̃, D(w) = 1, D(w̃) = 0, Γ), k = 0, 1.

^18 In practice, these integrals can be approximated quite accurately (and quickly) using relatively few draws from the truncated Normal distribution. A routine for drawing from such a distribution was provided in Step 4 of section 3.3.

^19 Heckman, Tobias and Vytlacil (2003) provide a similar definition of LATE in a parametric latent variable selection model.
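The simulation estimator P̂^{TT} from section 4.2 is a short computation once truncated Normal draws are available. A sketch with hypothetical parameter values, shown for regime 1 only (regime 0 is identical with α^{(0)}, β^{(0)}, ρ^{(0)} substituted):

```python
import numpy as np
from scipy.stats import norm, truncnorm

def p_tt_all(alpha, xb, rho, w_betaD, L, rng):
    """Monte Carlo estimates of P^TT_{.,j}(Gamma) for j = 1..J: average over
    draws u^(l) ~ N(0,1) truncated to (-w*betaD, inf) of the differences
    Phi((alpha_{j+1} - xb - rho*u)/s) - Phi((alpha_j - xb - rho*u)/s),
    where s = sqrt(1 - rho^2)."""
    u = truncnorm.rvs(a=-w_betaD, b=np.inf, size=L, random_state=rng)
    s = np.sqrt(1.0 - rho ** 2)
    cdf = norm.cdf((alpha[None, :] - xb - rho * u[:, None]) / s)  # L x (J+1)
    return (cdf[:, 1:] - cdf[:, :-1]).mean(axis=0)

rng = np.random.default_rng(4)
alpha1 = np.array([-np.inf, 0.0, 0.5, 1.0, 1.5, np.inf])  # hypothetical cutpoints
p1 = p_tt_all(alpha1, xb=0.5, rho=0.6, w_betaD=0.3, L=2000, rng=rng)
print(p1)            # Pr(y^(1) = j | treated), j = 1..5
print(p1[1:].sum())  # Pr(y^(1) >= 2 | treated), the term entering TT
```

Using the same set of truncated draws for all categories makes the estimated probabilities sum to one exactly, since the Φ-differences telescope within each draw.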

To calculate LATE, we follow a similar strategy to that outlined for calculating the TT effect. It follows that

$$
\hat{P}^{LATE}_{1,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{LATE}_{1,j}(\Gamma),
$$

$$
\hat{P}^{LATE}_{0,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(0)}_{j+1} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(0)}_{j} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{LATE}_{0,j}(\Gamma),
$$

where u^{(l)} ~ iid TN_{(−wβ^{(D)}, −w̃β^{(D)}]}(0, 1). We can then proceed to obtain a point estimate of LATE:

$$
\widehat{LATE}[x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0] \equiv E_{\Gamma \mid y, D}\left[ LATE(x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0; \Gamma) \right] \approx \frac{1}{M} \sum_{m=1}^{M} \sum_{j=2}^{J} \left( \hat{P}^{LATE}_{1,j}(\Gamma_m) - \hat{P}^{LATE}_{0,j}(\Gamma_m) \right),
$$

with Γ_m ~ p(Γ | y, D).

4.4 Beyond Mean Treatment Parameters: Learning about ρ^{(10)}

The treatment parameters discussed in the previous subsections are typical of the mean treatment effects considered in the literature. To see this, note that an equivalent expression of the parameter of interest Pr(y^{(1)} ≥ 2 | x) − Pr(y^{(0)} ≥ 2 | x) is E[I(y^{(1)} ≥ 2) − I(y^{(0)} ≥ 2) | x], where I(·) denotes the indicator function. Part of the appeal of these mean treatment parameters is that they enable researchers to quantify a feature of the treatment impact - the average gains or losses under various conditioning scenarios - even though y^{(0)} and y^{(1)} are not jointly observed.

Unlike the mean treatment effects described in the previous subsections, however, other quantities of significant policy relevance, such as the probability of a positive treatment effect Pr(y^{(1)} − y^{(0)} > 0 | x), will depend on the correlation parameter ρ^{(10)}. This correlation parameter does not enter the likelihood for the observed data (see, e.g., section 2.1) and thus is not identifiable. This

fact has, perhaps, limited the scope of most research to the estimation of mean treatment impacts.^20

For the Bayesian, this non-identifiability issue raises the question of what can and should be done about our treatment of the correlation parameter ρ^{(10)}.^21 One approach, which was used by Chib and Hamilton (2000), is to simply set ρ^{(10)} = 0, fit the model subject to this restriction and then impose that the restricted covariance matrix (subject to ρ^{(10)} = 0) is positive definite. While in most cases this will be an innocuous restriction, in some cases, this approach may have unanticipated consequences. For example, if we set ρ^{(10)} = 0 in (6), it follows that Σ is positive definite if and only if [ρ^{(1)}]² + [ρ^{(0)}]² ≤ 1. This restriction thus forces the identified correlation parameters ρ^{(1)} and ρ^{(0)} to lie within the unit circle rather than the unit square. To illustrate what this restriction means, suppose that we performed a generated data experiment and set ρ^{(1)} = ρ^{(0)} = .8. If we proceeded to fit this model subject to ρ^{(10)} = 0, and enforced that the restricted covariance matrix was positive definite, then our posterior mode must be inconsistent - the joint posterior ρ^{(1)}, ρ^{(0)} | y, D could never place any mass over the actual values used to generate the data, regardless of the size of the generated data set. This problem manifests itself for rather extreme cases of correlation among the unobservables; if the correlations are more moderate, then this is not likely to be a significant issue.^22

An alternate approach, which we have advocated in previous work [e.g., Koop and Poirier (1997), Poirier and Tobias (2003), Li, Poirier and Tobias (2004)], is to simply work with the full covariance matrix, as described in section 3.3, without restricting ρ^{(10)} a priori. As shown in Poirier and Tobias (2003), this does not induce an inconsistency regarding the identified model parameters, and moreover, one can potentially learn about the non-identified correlation parameter.
Intuitively, information arising through the likelihood function will enable us to pin down all of the correlation parameters in (6) that are identifiable, leaving

^20 Related work has sought to expand the focus beyond mean effects and identify outcome gain distributions. See, for example, Heckman and Honoré (1990), Heckman, Smith and Clements (1997), Heckman and Smith (1998) and Carneiro, Hansen and Heckman (2003).

^21 See Poirier (1998) for more on learning about non-identifiable parameters through prior information. Poirier and Tobias (2000) contain related material describing the implications of prior restrictions on ρ^{(10)}.

^22 This is particularly true, as Chib and Hamilton (2000) point out, in, say, panel models where most of the variation is captured through fixed or random effects, and one would suspect that any remaining correlation among the unobservables was minimal.

only ρ^{(10)} unknown. An additional source of information then arises from the fact that Σ must be positive definite. In particular, the p.d. restriction imposes that ρ^{(10)} must have the following conditional support:

$$
\rho^{(1)}\rho^{(0)} - \sqrt{(1 - [\rho^{(1)}]^2)(1 - [\rho^{(0)}]^2)} \;\le\; \rho^{(10)} \;\le\; \rho^{(1)}\rho^{(0)} + \sqrt{(1 - [\rho^{(1)}]^2)(1 - [\rho^{(0)}]^2)}. \tag{34}
$$

This equation provides identifiable bounds on ρ^{(10)} as a function of the identified correlation parameters ρ^{(1)} and ρ^{(0)}. Somewhat surprisingly, these bounds also suggest that selection bias may, in a particular sense, be a good thing. When ρ^{(1)} and ρ^{(0)} are large, the bounds given in (34) become increasingly informative. Intuitively, the presence of selection bias provides a vehicle for learning about ρ^{(10)} - if the errors in the outcome equations are correlated sufficiently with the error in the treatment equation, then to some extent, they must also be correlated with one another.

Equation (34) shows that beliefs regarding ρ^{(10)} will generally be revised from the data - as we learn about ρ^{(1)} and ρ^{(0)}, this information spills over and restricts the conditional support of ρ^{(10)}. This is, unfortunately, as far as the data will take us - the shape of the marginal posterior density of ρ^{(10)} within the bounds in (34) is not updated from the data. Poirier and Tobias (2003) show that in sufficiently large samples where ρ^{(1)} and ρ^{(0)} are estimated precisely and are approximately equal to, say, ρ^{(1,*)} and ρ^{(0,*)}:

$$
p(\rho^{(10)} \mid y, D) \approx p(\rho^{(10)} \mid \rho^{(1)} = \rho^{(1,*)}, \rho^{(0)} = \rho^{(0,*)}). \tag{35}
$$

That is, the marginal posterior for the non-identified correlation parameter is approximately equal to the conditional prior for that correlation parameter evaluated at the given values ρ^{(1,*)} and ρ^{(0,*)}. The support bounds in (34) are updated from the data, but within the bounds, the shape of the posterior is completely determined by the shape of the conditional prior. For the Bayesian, this is a natural result; in the absence of information arising from the data, one resorts to the use of prior information.^23
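The bounds in (34), and the unit-circle restriction discussed above, are simple to check numerically; a small sketch:

```python
import numpy as np

def rho10_bounds(rho1, rho0):
    """Conditional support of rho^(10) implied by positive definiteness, eq. (34)."""
    half_width = np.sqrt((1.0 - rho1 ** 2) * (1.0 - rho0 ** 2))
    return rho1 * rho0 - half_width, rho1 * rho0 + half_width

def is_pd(rho1, rho0, rho10):
    """Check positive definiteness of the correlation matrix Sigma."""
    Sigma = np.array([[1.0, rho1, rho0],
                      [rho1, 1.0, rho10],
                      [rho0, rho10, 1.0]])
    return bool(np.all(np.linalg.eigvalsh(Sigma) > 0.0))

# strong selection (rho1 = rho0 = .8) makes the bounds tight: [0.28, 1.0]
lo, hi = rho10_bounds(0.8, 0.8)
print(lo, hi)

# ...and rho10 = 0 is then infeasible, since [rho1]^2 + [rho0]^2 > 1
print(is_pd(0.8, 0.8, 0.0), is_pd(0.8, 0.8, 0.6))
```

This reproduces the paper's two points at once: strong selection narrows the feasible interval for ρ^{(10)}, and imposing ρ^{(10)} = 0 under strong selection rules out the true correlations entirely.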
The results of these studies suggest that there is, in one sense, a limited opportunity for expanding the focus of research beyond mean effects. One could at least bound ρ^{(10)} and then use these bounds to bound other parameters of interest. If one is comfortable with

^23 Heckman, Smith and Clements (1997), for example, informally discuss plausible prior beliefs for ρ^{(10)}. They write (page 510): "In considering outcomes like employment and earnings, many plausible models of program participation suggest that outcomes in the treatment state are positively related to outcomes in the non-treatment state... there is a widely-held belief that good persons are good at whatever they do."

insinuating prior information, however, one could obtain point estimates of any parameter of interest under a particular prior. Default priors yielding marginal posteriors that are uniform over the conditional support bounds may appeal to many researchers when carrying out these calculations. Use of such priors, however, typically makes the problem more challenging from a computational point of view, as they often break the inherent conjugacy of the model.

5 A Generated Data Experiment

In this section we conduct a generated data experiment to demonstrate the performance of our posterior simulator and address a potential concern regarding choice of prior. A sample of 5,000 observations is generated from the following ordered potential outcome model:

$$
\begin{aligned}
D_i^* &= \beta^{(D)}_0 + w_i \beta^{(D)}_1 + u_i \\
z_i^{(1)} &= \beta^{(1)} + \epsilon_i^{(1)} \\
z_i^{(0)} &= \beta^{(0)} + \epsilon_i^{(0)},
\end{aligned}
$$

where w_i is drawn independently from a N(0, 1) distribution and the error terms [u_i, ε_i^{(1)}, ε_i^{(0)}]′ are drawn jointly from the trivariate Normal distribution:

$$
\begin{bmatrix} u_i \\ \epsilon_i^{(1)} \\ \epsilon_i^{(0)} \end{bmatrix} \Bigg|\, w_i \;\stackrel{iid}{\sim}\; N\!\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \Sigma \right).
$$

We consider this specific design with a high degree of unobservable correlation to reveal how our algorithm performs when selection bias is a significant problem.^24 The non-identified correlation ρ^{(10)} is set to .6, and thus from (34) the covariance matrix is positive definite. Finally, the regression parameters β^{(D)}_0, β^{(D)}_1, β^{(1)} and β^{(0)} and cutpoint values α^{(k)}_j, j = 3, 4, 5, k = 0, 1, are enumerated in the first column of Table 1, and the observables D_i, y_i^{(1)} and y_i^{(0)} are generated as follows:

$$
\begin{aligned}
D_i &= I(D_i^* > 0), \\
y_i^{(1)} &= j \quad \text{if} \quad \alpha^{(1)}_{j} < z_i^{(1)} \le \alpha^{(1)}_{j+1}, \quad j = 1, 2, 3, 4, 5, \\
y_i^{(0)} &= j \quad \text{if} \quad \alpha^{(0)}_{j} < z_i^{(0)} \le \alpha^{(0)}_{j+1}, \quad j = 1, 2, 3, 4, 5.
\end{aligned}
$$

^24 We do not address the weak instruments problem here, but to fix ideas, we consider the case where the instrument plays a significant role in the treatment decision.
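The data-generating process above can be simulated directly. In the sketch below the parameter values are illustrative stand-ins: the actual β's, cutpoints and Σ used in the paper are listed in Table 1, which is not reproduced in this excerpt; only ρ^{(10)} = .6 is taken from the text, and the remaining correlations are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# illustrative parameter values (the values actually used are in Table 1)
betaD0, betaD1 = 0.1, 1.0
beta1, beta0 = 1.0, 0.2
rho1, rho0, rho10 = 0.7, 0.7, 0.6   # rho10 = .6 as in the text; rho1, rho0 hypothetical
Sigma = np.array([[1.0, rho1, rho0],
                  [rho1, 1.0, rho10],
                  [rho0, rho10, 1.0]])

w = rng.standard_normal(n)
u, e1, e0 = rng.multivariate_normal(np.zeros(3), Sigma, size=n).T

Dstar = betaD0 + betaD1 * w + u
z1 = beta1 + e1
z0 = beta0 + e0

# interior cutpoints alpha_2 = 0 < alpha_3 < alpha_4 < alpha_5 (illustrative)
cuts = np.array([0.0, 0.5, 1.0, 1.5])
D = (Dstar > 0).astype(int)
# digitize bins on the left; the boundary convention is immaterial for continuous z
y1 = np.digitize(z1, cuts) + 1
y0 = np.digitize(z0, cuts) + 1
y = np.where(D == 1, y1, y0)        # only one potential outcome is observed
print(D.mean(), np.bincount(y, minlength=6)[1:])
```

Feeding the observed (y, D, x, w) from such a simulation to the posterior simulator of section 3.3 is exactly the exercise the paper carries out.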

With this experimental design, the number of treated versus untreated observations is well-balanced, as 51% of the sample points are assigned to the treatment group. Of those sample points that are assigned to the treatment group, 5%, 5%, 8%, 10% and 71% are associated with ordered outcomes of y^{(1)} = 1, 2, 3, 4 and 5, respectively. Likewise, for those observations that do not receive treatment, 46%, 14%, 13%, 10% and 16% of them fall into the categories of y^{(0)} = 1, 2, 3, 4 and 5, respectively. We consider this design to be reasonably typical of actual empirical situations, where the outcome variables are not uniformly distributed over the set of possible choices.

We fit our model using the posterior simulator described in section 3.3, run the algorithm for 3,000 iterations, and discard the first 600 draws as the burn-in period. To illustrate the performance of the algorithm, we plot in Figure 1 the lagged autocorrelations up to order 100 for several selected parameters: β^{(D)}_0, β^{(1)}, α^{(0)}_3 and ρ^{(1)}. The lagged autocorrelation plots are a useful way to assess the mixing of the parameter chains - if the lagged autocorrelations remain close to unity, for example, then the posterior simulator only makes small local movements from iteration to iteration, resulting in inaccurate posterior estimates. As shown in Figure 1, the lagged autocorrelations drop away reasonably quickly for all the selected parameters, suggesting that posterior quantities can be approximated reasonably accurately with only a moderate number of posterior simulations.

Figure 1 about here

As discussed in section 3.2, one potential concern about working with the reparameterized model is that we need to impose priors directly on the transformed parameters instead of the structural parameters. This is an important issue because priors that look suitable for the transformed parameters may turn out to imply rather unreasonable (and possibly quite informative) priors for the structural parameters. For this generated data experiment, we employ the priors described in Section 3.1.
We can calculate the implied priors for the structural parameters by first sampling from the priors for the transformed parameters, inverting to obtain the values of the structural parameters, and then smoothing the collection of structural parameter values to obtain their approximate marginal prior densities. To demonstrate this process, we plot in Figure 2 the marginal priors and posteriors for the selected parameters β^{(D)}_0, β^{(1)}, α^{(0)}_3 and ρ^{(1)}. As can be seen clearly from the graphs, the prior densities for all the parameters are almost completely flat over the regions where the


More information

Methods Lunch Talk: Causal Mediation Analysis

Methods Lunch Talk: Causal Mediation Analysis Methods Lunch Talk: Causal Medaton Analyss Taeyong Park Washngton Unversty n St. Lous Aprl 9, 2015 Park (Wash U.) Methods Lunch Aprl 9, 2015 1 / 1 References Baron and Kenny. 1986. The Moderator-Medator

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

LOGIT ANALYSIS. A.K. VASISHT Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi

LOGIT ANALYSIS. A.K. VASISHT Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi LOGIT ANALYSIS A.K. VASISHT Indan Agrcultural Statstcs Research Insttute, Lbrary Avenue, New Delh-0 02 amtvassht@asr.res.n. Introducton In dummy regresson varable models, t s assumed mplctly that the dependent

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10)

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10) I. Defnton and Problems Econ7 Appled Econometrcs Topc 9: Heteroskedastcty (Studenmund, Chapter ) We now relax another classcal assumpton. Ths s a problem that arses often wth cross sectons of ndvduals,

More information

Marginal Effects in Probit Models: Interpretation and Testing. 1. Interpreting Probit Coefficients

Marginal Effects in Probit Models: Interpretation and Testing. 1. Interpreting Probit Coefficients ECON 5 -- NOE 15 Margnal Effects n Probt Models: Interpretaton and estng hs note ntroduces you to the two types of margnal effects n probt models: margnal ndex effects, and margnal probablty effects. It

More information

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1 On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

Andreas C. Drichoutis Agriculural University of Athens. Abstract

Andreas C. Drichoutis Agriculural University of Athens. Abstract Heteroskedastcty, the sngle crossng property and ordered response models Andreas C. Drchouts Agrculural Unversty of Athens Panagots Lazards Agrculural Unversty of Athens Rodolfo M. Nayga, Jr. Texas AMUnversty

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

U-Pb Geochronology Practical: Background

U-Pb Geochronology Practical: Background U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Learning from Data 1 Naive Bayes

Learning from Data 1 Naive Bayes Learnng from Data 1 Nave Bayes Davd Barber dbarber@anc.ed.ac.uk course page : http://anc.ed.ac.uk/ dbarber/lfd1/lfd1.html c Davd Barber 2001, 2002 1 Learnng from Data 1 : c Davd Barber 2001,2002 2 1 Why

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek Dscusson of Extensons of the Gauss-arkov Theorem to the Case of Stochastc Regresson Coeffcents Ed Stanek Introducton Pfeffermann (984 dscusses extensons to the Gauss-arkov Theorem n settngs where regresson

More information

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger JAB Chan Long-tal clams development ASTIN - September 2005 B.Verder A. Klnger Outlne Chan Ladder : comments A frst soluton: Munch Chan Ladder JAB Chan Chan Ladder: Comments Black lne: average pad to ncurred

More information

Testing for seasonal unit roots in heterogeneous panels

Testing for seasonal unit roots in heterogeneous panels Testng for seasonal unt roots n heterogeneous panels Jesus Otero * Facultad de Economía Unversdad del Rosaro, Colomba Jeremy Smth Department of Economcs Unversty of arwck Monca Gulett Aston Busness School

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

4.3 Poisson Regression

4.3 Poisson Regression of teratvely reweghted least squares regressons (the IRLS algorthm). We do wthout gvng further detals, but nstead focus on the practcal applcaton. > glm(survval~log(weght)+age, famly="bnomal", data=baby)

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30 STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear

More information

SDMML HT MSc Problem Sheet 4

SDMML HT MSc Problem Sheet 4 SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be

More information

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation Econ 388 R. Butler 204 revsons Lecture 4 Dummy Dependent Varables I. Lnear Probablty Model: the Regresson model wth a dummy varables as the dependent varable assumpton, mplcaton regular multple regresson

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott

More information

Statistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals

Statistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals Internatonal Journal of Scentfc World, 2 1) 2014) 1-9 c Scence Publshng Corporaton www.scencepubco.com/ndex.php/ijsw do: 10.14419/jsw.v21.1780 Research Paper Statstcal nference for generalzed Pareto dstrbuton

More information

III. Econometric Methodology Regression Analysis

III. Econometric Methodology Regression Analysis Page Econ07 Appled Econometrcs Topc : An Overvew of Regresson Analyss (Studenmund, Chapter ) I. The Nature and Scope of Econometrcs. Lot s of defntons of econometrcs. Nobel Prze Commttee Paul Samuelson,

More information

Uncertainty as the Overlap of Alternate Conditional Distributions

Uncertainty as the Overlap of Alternate Conditional Distributions Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information