Bayesian Analysis of Treatment Effects in an Ordered Potential Outcomes Model


Mingliang Li, Department of Economics, SUNY-Buffalo, ml3@buffalo.edu
Justin L. Tobias, Department of Economics, Iowa State University, tobiasj@iastate.edu

Abstract

We describe a new Bayesian estimation algorithm for fitting a binary treatment, ordered outcome selection model in a potential outcomes framework. We show how recent advances in simulation methods, namely data augmentation, the Gibbs sampler and the Metropolis-Hastings algorithm, can be used to fit this model efficiently, and also introduce a reparameterization to help accelerate the convergence of our posterior simulator. Several computational strategies which allow for non-Normality are also discussed. Conventional treatment effects such as the Average Treatment Effect (ATE), the effect of treatment on the treated (TT) and the Local Average Treatment Effect (LATE) are adapted for this specific model, and Bayesian strategies for calculating these treatment effects are introduced. Finally, we review how one can potentially learn about (or at least bound) the non-identified cross-regime correlation parameter and use this learning to calculate (or bound) parameters of interest beyond mean treatment effects.

ACKNOWLEDGEMENTS: We would like to thank two anonymous referees, the editor Ed Vytlacil, and participants at the 4th Annual Advances in Econometrics Conference for helpful comments and suggestions. All errors are our own.

1 Introduction

As evidenced by the vast literature dedicated to the issue, the problem of identifying and estimating the effects of treatment from observational data is of central importance to economics and the social sciences. As suggested by the articles appearing in this volume, there are many estimation strategies commonly employed in this literature, and the assumptions made in and issues emphasized by these various approaches can be quite distinct. For instance, some studies employ fully parametric models to conduct their analyses, arguing that the use of such models permits the estimation of a wide range of policy-relevant parameters,[1] while others seek a more agnostic approach and thus pursue nonparametric or semiparametric techniques.[2] Many empirical studies in this area argue that the most convincing way to surmount the problem of treatment endogeneity is to make use of cleverly chosen natural experiments or instrumental variables,[3] while others are content to pursue more structural equation approaches where the role of the exclusion restriction is decidedly less important and the discussion surrounding the instrument is muted.[4] Finally, as in econometrics generally, there are both Bayesian and Classical approaches for handling these types of models. In this paper we focus primarily on this last distinction and take up the case of Bayesian estimation of a particular type of treatment-response model. While Bayesian work on the analysis of treatment or causal effects has become more common in the econometrics literature [e.g., Vijverberg (1993), Koop and Poirier (1997), Li (1998), Chib and Hamilton (2000, 2002), Poirier and Tobias (2003) and Li, Poirier and Tobias (2004)], the use of such techniques continues to remain rare relative to Classical approaches. We do not aim to reduce this disparity by proselytizing at length in this paper about the merits of the Bayesian approach relative to Classical methods.
Instead, our goal is to review how a Bayesian might handle specifications, similar to the Roy (1951) model, which are commonly encountered in the treatment effect literature, to review some computational advances which should appeal

[1] Heckman, Tobias and Vytlacil (2003), for example, discuss parametric approaches for estimating a variety of popular treatment effects under various distributional assumptions.
[2] Manski's (1990, 1994) nonparametric bounding is a leading example.
[3] See Angrist and Krueger (2001) for a review.
[4] Gould (2002, 2005), for example, argues that having strong predictors for treatment status is more important for practical identification purposes than requiring that some set of covariates be excluded from the outcome equation. In applied Bayesian work [e.g., Poirier and Tobias (2003), Munkin and Trivedi (2003) and Li, Poirier and Tobias (2004)], the instrument tends to receive decidedly less discussion. In empirical practice, however, such exclusion restrictions should be, and typically are, used when available.

to all researchers when faced with estimation of these types of models, to introduce an issue that is somewhat unique to the Bayesian literature on this topic, and to provide new results on Bayesian estimation of a specific type of treatment effect model. We take up the particular case of a treatment-response model where treatment status is binary and the outcome of interest is ordered. To our knowledge, a discussion of this particular model is new to the Bayesian literature, though highly related models, including those of the binary treatment / continuous outcome and ordered treatment / binary outcome varieties, have appeared in Chib and Hamilton (2000). We present our model in a potential outcomes framework and thus model both the observed outcome of the agent given her treatment choice as well as the potential or counterfactual outcome for that agent had she made a different treatment decision. We show how data augmentation [e.g., Tanner and Wong (1987), Albert and Chib (1993)] in conjunction with the Gibbs sampler and Metropolis-Hastings algorithm [e.g., Casella and George (1992), Tierney (1994), Chib and Greenberg (1995)] can be used to fit this particular model efficiently, and also introduce a reparameterization to help accelerate the convergence of our posterior simulator. Several computational strategies which allow for non-Normality are also discussed, though not employed. Treatment effects similar in spirit to the Average Treatment Effect (ATE), the effect of treatment on the treated (TT) and the Local Average Treatment Effect (LATE)[5] are adapted for the case of our ordered response, and Bayesian strategies for calculating these treatment effects are described. Finally, we discuss how one can potentially learn about (or at least bound) the non-identified cross-regime correlation parameter[6] and use this learning to calculate (or bound) parameters of interest beyond mean treatment effects. The outline of this paper is as follows. Section 2 presents the basic potential outcomes model and section 3 discusses our Bayesian estimation algorithm.
Often-reported treatment parameters such as ATE, TT and LATE are derived for our model in section 4 and procedures for calculating these effects are described. A generated data experiment which illustrates the performance of our algorithm is provided in section 5, and the paper concludes with a

[5] See, for example, Imbens and Angrist (1994) for a discussion of LATE and Heckman and Vytlacil (1999, 2000) for detailed discussions of these and other treatment effects.
[6] For related discussions on this topic, see Vijverberg (1993), Koop and Poirier (1997), Poirier (1998), Poirier and Tobias (2003) and Li, Poirier and Tobias (2004).

summary in section 6.

2 The Model

What we have in mind is the development of a parametric model that will enable researchers to investigate the impact of a binary (and potentially endogenous) treatment variable, denoted D, where D = 1 implies receipt of treatment and D = 0 implies non-receipt, on an ordered outcome of interest, denoted y ∈ {1, 2, …, J}. There are numerous examples where such a model would be appropriate. For example, one might use this model to investigate, say, the impact of enrolling in a supplemental learning center on attitudes toward education (measured as a categorical response) or the quantity of education ultimately received by the student. More generally, such a model is potentially of value in any situation where the outcome of interest (e.g., earnings, education, expenditure) is recorded categorically rather than continuously and the model also contains a dummy endogenous variable.[7] We cast this evaluation problem in a potential outcomes framework and thus explicitly model the counterfactual state: the ordered outcome that would have been observed had the agent made a different treatment decision. We let y(1) denote the outcome received by the agent in the treatment state and y(0) denote the outcome received without treatment. Only one outcome, denoted y, is ever observed for any agent, and thus y = D y(1) + (1 − D) y(0). We suppose that the observed treatment decision D and the observed and potential ordered outcomes y(1) and y(0) are generated by an underlying latent variable representation of the model. Specifically, we write:[8]

    D_i^* = w_i′β^(D) + u_i    (1)
    z_i^(1) = x_i′β^(1) + ε_i^(1)    (2)

[7] One can also conceive of situations where the modeling of count outcomes is desired (e.g., Munkin and Trivedi 2003). Clearly, approaches to modeling ordered and count outcomes impose different parametric assumptions on the response (e.g.,
ordered probit versus Poisson or negative binomial distribution), invoke different interpretations of the outcomes of interest (ordinal versus cardinal), and involve different assessments of the censoring feature of the outcomes (censored versus unbounded). Which approach is more appropriate depends critically on the type of application that is considered.
[8] In this paper, we assume that the same set of covariates appears in the treated and untreated states. If desired, this assumption could be relaxed and this extension incorporated into the derivations which follow.

    z_i^(0) = x_i′β^(0) + ε_i^(0).    (3)

The binary treatment indicator D_i is related to the latent D_i^* as follows:

    D_i = I(D_i^* > 0) = I[u_i > −w_i′β^(D)],    (4)

with I(·) denoting the standard indicator function. Similarly, the ordered responses y_i^(1) and y_i^(0) are related to the latent variables z_i^(1) and z_i^(0) as follows:

    y_i^(k) = j  iff  α_j^(k) < z_i^(k) ≤ α_{j+1}^(k),  k = 0, 1,  j = 1, 2, …, J.    (5)

The {α_j^(k)}, k = 0, 1, j = 1, 2, …, J, are cutpoints in the model, mapping the latent indices in both states into discrete values of our ordered response. We impose standard identification conditions on these cutpoints, namely, α_1^(1) = α_1^(0) = −∞, α_2^(1) = α_2^(0) = 0 and α_{J+1}^(1) = α_{J+1}^(0) = ∞. We also let α^(1) = [α_3^(1) α_4^(1) ⋯ α_J^(1)] denote the cutpoint vector for the treated state and define the 1 × (J − 2) vector α^(0) similarly. In this model we also assume the availability of an exclusion restriction: some covariate which enters w that is not contained in x. To motivate the importance of this assumption, consider a restricted version of (1)-(3) which consists of equation (1) and a latent variable equation like (2), the latter of which includes the observed D as an element of x. This restricted model would be of the form of a standard treatment or causal effect model that only works with observed rather than potential outcomes. Maddala (1983, page 122), for example, shows that the parameters of such a model are not identifiable unless the errors of the equation system are uncorrelated or such an exclusion restriction is present. The former condition often seems rather untenable in empirical practice, and thus we maintain that such an exclusion restriction is available. Finally, we fix ideas throughout the remainder of this discussion by assuming joint Normality of the error terms:[9]

    [u_i ε_i^(1) ε_i^(0)]′ | x_i, w_i ~ iid N(0, Σ),  where  Σ = [1 ρ^(1) ρ^(0); ρ^(1) 1 ρ^(10); ρ^(0) ρ^(10) 1].    (6)

Equations (1)-(6) then denote the complete specification of our ordered potential outcomes model.

[9] We discuss how this requirement can be relaxed in section 3.4 of this paper.
The variances of the errors in all the equations have been normalized to unity for identification purposes.
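As a concrete illustration, the data-generating process in (1)-(6) can be simulated directly. Below is a minimal sketch in Python for J = 3 outcome categories; all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical parameter values (illustration only).
beta_D = np.array([0.5, 1.0])      # treatment equation; second regressor is the instrument
beta_1 = np.array([1.0, 0.5])      # treated-state outcome equation
beta_0 = np.array([0.2, 0.3])      # untreated-state outcome equation
Sigma = np.array([[1.0, 0.4, 0.3],     # eq. (6): unit variances on the diagonal,
                  [0.4, 1.0, 0.6],     # off-diagonals are rho^(1), rho^(0), rho^(10)
                  [0.3, 0.6, 1.0]])
cuts = np.array([0.0, 1.2])        # alpha_2 = 0 fixed; alpha_3 = 1.2 free (J = 3)

# The instrument enters w but is excluded from x (exclusion restriction).
w = np.column_stack([np.ones(n), rng.normal(size=n)])
x = np.column_stack([np.ones(n), rng.normal(size=n)])
u, e1, e0 = rng.multivariate_normal(np.zeros(3), Sigma, size=n).T

D = (w @ beta_D + u > 0).astype(int)                   # eq. (4)
# eq. (5): y = j iff alpha_j < z <= alpha_{j+1}; right-closed bins via digitize
y1 = np.digitize(x @ beta_1 + e1, cuts, right=True) + 1
y0 = np.digitize(x @ beta_0 + e0, cuts, right=True) + 1
y = D * y1 + (1 - D) * y0                              # only one outcome is observed
```

Only (y, D, x, w) would be observed by the researcher; y1 and y0 play the role of the potential outcomes.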

2.1 The likelihood

Given the assumed conditional independence across observations, we can write the likelihood function for this model as:

    p(y, D | Γ) ≡ L(Γ; y, D) = [∏_{i: D_i=1} Pr(y_i^(1) = y_i, D_i = 1 | Γ)] [∏_{i: D_i=0} Pr(y_i^(0) = y_i, D_i = 0 | Γ)],

where Γ = [β^(D) β^(1) β^(0) α^(1) α^(0) ρ^(0) ρ^(1) ρ^(10)]. The joint probabilities required in calculating this likelihood can be obtained from the bivariate Normal cdf. For example,

    Pr(y_i^(1) = y_i, D_i = 1 | Γ) = Pr(α_{y_i}^(1) < z_i^(1) ≤ α_{y_i+1}^(1), u_i > −w_i′β^(D) | w_i, x_i, Γ)    (7)
      = Pr(α_{y_i}^(1) − x_i′β^(1) < ε_i^(1) ≤ α_{y_i+1}^(1) − x_i′β^(1), u_i > −w_i′β^(D) | w_i, x_i, Γ).    (8)

Provided one uses a statistical package containing a routine for evaluating a bivariate Normal cdf, standard MLE can be implemented. If no such routine is available in a particular package, one could first reduce the probabilities above to univariate integration problems and then employ standard numerical approximations such as Simpson's rule or Gaussian quadrature to approximate the requisite integrals. To see this more clearly, let P_{1,j} ≡ Pr(D_i = 1, y_i^(1) = j | Γ), and note from (8)

    P_{1,j} = Pr(α_j^(1) − x_i′β^(1) < ε_i^(1) ≤ α_{j+1}^(1) − x_i′β^(1), u_i > −w_i′β^(D) | x_i, w_i, Γ)
      = ∫_{α_j^(1) − x_i′β^(1)}^{α_{j+1}^(1) − x_i′β^(1)} ∫_{−w_i′β^(D)}^{∞} p(ε_i^(1), u_i) du_i dε_i^(1)
      = ∫_{α_j^(1) − x_i′β^(1)}^{α_{j+1}^(1) − x_i′β^(1)} Φ( [w_i′β^(D) + ρ^(1)ε_i^(1)] / √(1 − (ρ^(1))²) ) p(ε_i^(1)) dε_i^(1).

In this form a variety of approaches can be employed to approximate the required univariate integrals. In our discussion of treatment effects in section 4, we will return to one approach to this problem based on Monte Carlo integration using truncated Normal sampling. Importantly, we also recognize that our estimation strategy via data augmentation, as described in the following section, avoids the need for any numerical integration of the above form, and therefore provides an attractive alternative to the implementation of standard MLE.
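To make the reduction to a univariate integral concrete, the following sketch evaluates P_{1,j} for one observation both by quadrature over the Φ(·)p(·) integrand and through the bivariate Normal cdf representation in (8), confirming the two agree. All numerical values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.integrate import quad

# Hypothetical values for one observation (illustration only).
rho1 = 0.5          # rho^(1)
wbD = 0.3           # w_i' beta^(D)
xb1 = 0.8           # x_i' beta^(1)
a_lo, a_hi = 0.0, 1.2   # alpha_j^(1), alpha_{j+1}^(1)

# Univariate integrand: Pr(u > -w'beta | eps) * density of eps.
def integrand(e):
    return norm.cdf((wbD + rho1 * e) / np.sqrt(1 - rho1**2)) * norm.pdf(e)

p_quad, _ = quad(integrand, a_lo - xb1, a_hi - xb1)

# Cross-check via the bivariate Normal cdf of (eps, u):
# Pr(a < eps <= b, u > c) = [Phi(b) - Phi(a)] - [F(b, c) - F(a, c)].
bvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho1], [rho1, 1.0]])
a, b, c = a_lo - xb1, a_hi - xb1, -wbD
p_cdf = (norm.cdf(b) - norm.cdf(a)) - (bvn.cdf([b, c]) - bvn.cdf([a, c]))
```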

3 Bayesian Estimation

To perform a Bayesian analysis, a researcher first starts off as a classical econometrician might, by specifying the likelihood function for this model, as implied by (1)-(6) and described in the preceding section. To this likelihood, the researcher adds a prior density, say p(Γ), with Γ denoting the parameters of the model. This prior is chosen to reflect her subjective beliefs about values of the parameters, and in most cases is chosen to be sufficiently vague or flat so that information contained in the data will dominate information insinuated through the prior. The prior density p(Γ) combined with the likelihood p(y, D | Γ) yields the joint posterior density p(Γ | y, D) up to proportionality via Bayes' theorem. This joint posterior completely summarizes the output of a Bayesian procedure: from it, one can obtain point and interval estimates, marginal posterior densities, posterior quantiles, or other quantities of interest. While in theory this simple exercise outlines the machinery involved in Bayesian posterior calculations, in practice, extracting useful information from a given posterior p(Γ | y, D) can be difficult. Direct calculation of a posterior mean of an element of Γ, for example, first requires that the normalizing constant of the joint posterior be known (while often it is not), and even if the normalizing constant were known, the mean calculation would still require solving a high-dimensional integration problem. In models of moderate complexity, these integration problems usually have no analytic solutions. Instead of direct evaluation of this posterior, modern Bayesian empirical work makes use of recent advances in simulation methods to carry out a posterior analysis. Two simulation devices in particular, the Gibbs sampler and the Metropolis-Hastings algorithm, are widely used and have become indispensable instruments in an applied Bayesian's toolkit. Both of these algorithms solve the problem of calculating posterior moments, quantiles, marginal densities or other quantities of interest by first obtaining a set of draws from the posterior p(Γ | y, D).
Typically, one cannot draw directly from this density, but instead, one can generate a sequence of draws (by appropriately following the steps of the algorithms) that converges to this distribution. Once convergence has been achieved, the subsequent set of simulated parameter values can be used to calculate the desired quantities (e.g., posterior means). In the Gibbs sampler, a Markov chain whose limiting distribution is p(Γ | y, D) is produced by iteratively sampling from the complete posterior conditionals of the model. In many
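The mechanics of the Gibbs sampler can be seen in a toy example that is deliberately much simpler than the model of this paper: a bivariate Normal target with correlation ρ, sampled by alternating between the two univariate conditional distributions. This is a standard textbook illustration, not part of the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8                      # target: N(0, [[1, rho], [rho, 1]])
x, y = 0.0, 0.0                # arbitrary starting values
draws = []
for t in range(20_000):
    # Each conditional of a bivariate Normal is univariate Normal:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    if t >= 2_000:             # discard a burn-in period before convergence
        draws.append((x, y))

draws = np.array(draws)
corr = np.corrcoef(draws.T)[0, 1]   # should be close to rho at stationarity
```

The retained draws behave like (correlated) samples from the target, so posterior-style summaries such as the empirical correlation recover ρ.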

cases, typically in models with conditionally conjugate priors, these posterior conditionals have well-known forms and can be easily sampled. The Metropolis-Hastings algorithm is a generalization of the Gibbs sampler and is a multivariate accept-reject algorithm. The algorithm is, again, constructed so that the limiting distribution of the Markov chain is the target density, p(Γ | y, D).[10] In terms of the model described in this paper, Bayesian estimation of the specification in (1)-(6) would likely make use of data augmentation [e.g., Tanner and Wong (1987), Albert and Chib (1993)] in conjunction with the algorithms above. When data augmentation is used, the posterior is first expanded (or, as the name suggests, augmented) to include not only the parameter vector Γ, but also the latent data s_i = [D_i^* z_i^(1) z_i^(0)]′. Although this would seem to complicate the estimation exercise, use of data augmentation often simplifies the required posterior calculations. This is particularly true when data augmentation is used in conjunction with the Gibbs sampler since, conditioned on the latent data, inference regarding the regression parameters proceeds as in a linear regression model, and given the regression parameters, it is often straightforward to obtain draws from the posterior conditional for the latent data. For our model, this augmented posterior is of the form:

    p(D^*, z^(1), z^(0), Γ | y, D) ∝ p(y, D, D^*, z^(1), z^(0), Γ)    (9)
      = p(y, D | D^*, z^(1), z^(0), Γ) p(D^*, z^(1), z^(0) | Γ) p(Γ),    (10)

with p(Γ) denoting the prior for the parameters of our model. The middle term in the above expression is immediately known as a trivariate Normal density, given the joint Normality assumption in (6) combined with the model in (1)-(3). The last term simply denotes the prior for our model parameters. For the first term, conditioned on the latent variables and model parameters, the observed responses D and y are known with certainty and thus the joint (conditional) distribution of y and D is degenerate.
Putting these pieces together, and exploiting the assumed conditional independence across observations, we can write the

[10] A detailed review of these simulation methods is beyond the scope of this paper; the interested reader is invited to see Casella and George (1992), Tierney (1994), Chib and Greenberg (1995), Gilks et al. (1998), Geweke (1999), Chen, Shao and Ibrahim (2000), Carlin and Louis (2000), Geweke and Keane (2001), Chib (2001), Koop (2003), Lancaster (2004), Gelman et al. (2004), Poirier and Tobias (2006) and Koop, Poirier and Tobias (2006) (among others) for detailed and comprehensive descriptions of these and other methods.

augmented posterior as follows:

    p(D^*, z^(1), z^(0), Γ | D, y) ∝ p(Γ) ∏_{i=1}^{n} φ_3(s_i; r_iβ, Σ)    (11)
      × [ I(D_i = 1) I(D_i^* > 0) I(α_{y_i}^(1) < z_i^(1) ≤ α_{y_i+1}^(1)) + I(D_i = 0) I(D_i^* ≤ 0) I(α_{y_i}^(0) < z_i^(0) ≤ α_{y_i+1}^(0)) ].

In the above, we have defined

    s_i = [D_i^* z_i^(1) z_i^(0)]′,  r_i = [w_i′ 0 0; 0 x_i′ 0; 0 0 x_i′] (block diagonal),  β = [β^(D)′ β^(1)′ β^(0)′]′,    (12)

and φ_3(x; µ, Ω) denotes a trivariate Normal density with mean µ and covariance matrix Ω. Finally, Σ is defined in (6). The indicator functions added to (11) serve to capture the degenerate joint distribution of y and D given the latent data and model parameters.

3.1 A Useful Reparameterization

In theory, one could directly apply standard computational tools (namely the Gibbs sampler coupled with a few Metropolis-within-Gibbs steps) to fit the model in (11). However, it has been shown in related work [e.g., Cowles (1996), Nandram and Chen (1996) and Li and Tobias (2005)] that use of the standard Gibbs sampler in models with ordered responses suffers from slow mixing due to high correlation between the simulated cutpoints and latent data. As discussed in the previous section, the parameter draws obtained from our estimation algorithm form a Markov chain, and when the chain mixes slowly, we observe only very small local movements from iteration to iteration. As a result, it may take a very long time for our simulator to traverse the entire parameter space. When the lagged autocorrelations between the simulated parameters are very high, estimates of posterior features may be quite inaccurate, and numerical standard errors associated with those estimates will be unacceptably large. To mitigate this slow mixing problem, and move closer to a situation where we can obtain iid samples from the posterior, we suggest below an alternate parameterization of the model, building on the suggestion of Nandram and Chen (1996). To shed some insight on this reparameterization, first separate out the largest cutpoints from the treated state, α_J^(1), and untreated state, α_J^(0), and define the transformations:

    σ_1 ≡ 1/[α_J^(1)]²  and  σ_0 ≡ 1/[α_J^(0)]².

In addition, for any variable Q let Q̃^(1) ≡ √σ_1 Q^(1) and define Q̃^(0) ≡ √σ_0 Q^(0) similarly. The model in (1)-(3) is then observationally equivalent to

    D_i^* = w_i′β^(D) + u_i    (13)
    z̃_i^(1) = x_i′β̃^(1) + ε̃_i^(1)    (14)
    z̃_i^(0) = x_i′β̃^(0) + ε̃_i^(0),    (15)

where

    y_i^(k) = j  iff  α̃_j^(k) < z̃_i^(k) ≤ α̃_{j+1}^(k),  k = 0, 1.    (16)

In other words, the likelihood function for the observed data is unchanged when multiplying (2) and (3) by √σ_1 and √σ_0, respectively, and appropriately adjusting the rule in (16) which maps the latent data into the observed responses. The error variance matrix for the transformed disturbances now takes the following form:

    [u_i ε̃_i^(1) ε̃_i^(0)]′ | x_i, w_i ~ iid N(0, Σ̃),  where  Σ̃ = [1 σ_1D σ_0D; σ_1D σ_1 σ_10; σ_0D σ_10 σ_0],    (17)

where σ_1D ≡ √σ_1 ρ^(1), σ_0D ≡ √σ_0 ρ^(0) and σ_10 ≡ √σ_1 √σ_0 ρ^(10). The correlation parameters ρ^(1), ρ^(0) and ρ^(10) are defined in (6). When the model is written as in (13)-(17), it suggests that we can work with an augmented posterior distribution containing the latent variables D^*, z̃^(1), z̃^(0) and parameters Γ̃ = [β̃ σ_1D σ_0D σ_10 σ_1 σ_0 α̃^(1) α̃^(0)] instead of D^*, z^(1), z^(0) and Γ as in (11). The transformed cutpoint and coefficient vectors contained in Γ̃ are defined as follows:[11]

    α̃^(1) = [α̃_3^(1) α̃_4^(1) ⋯ α̃_{J−1}^(1)],  α̃^(0) = [α̃_3^(0) α̃_4^(0) ⋯ α̃_{J−1}^(0)],  and  β̃ = [β^(D)′ β̃^(1)′ β̃^(0)′]′.

Following similar derivations to those leading to (11), we obtain the augmented joint posterior distribution for the transformed parameters:

    p(D^*, z̃^(1), z̃^(0), Γ̃ | D, y) ∝ p(Γ̃) ∏_{i=1}^{n} φ_3(s̃_i; r_iβ̃, Σ̃)    (18)
      × [ I(D_i = 1) I(D_i^* > 0) I(α̃_{y_i}^(1) < z̃_i^(1) ≤ α̃_{y_i+1}^(1)) + I(D_i = 0) I(D_i^* ≤ 0) I(α̃_{y_i}^(0) < z̃_i^(0) ≤ α̃_{y_i+1}^(0)) ],

where s̃_i ≡ [D_i^* z̃_i^(1) z̃_i^(0)]′ and Σ̃ is defined in (17).

[11] Note that the largest cutpoints have been taken out of each cutpoint vector; these largest cutpoints are replaced by σ_1 and σ_0 in this alternate parameterization.
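The mapping between the structural parameters and the elements of Σ̃ in (17) is easy to invert, which is how the structural correlations and largest cutpoints are recovered from draws of the transformed parameters. A small numerical round-trip check (all values hypothetical, for illustration):

```python
import numpy as np

# Hypothetical structural values (illustration only).
alpha_J1, alpha_J0 = 2.0, 1.5            # largest free cutpoints alpha_J^(1), alpha_J^(0)
rho1, rho0, rho10 = 0.5, 0.3, 0.4        # structural correlations from eq. (6)
s1, s0 = 1.0 / alpha_J1**2, 1.0 / alpha_J0**2   # sigma_1, sigma_0

# Transformed covariance matrix, built as in eq. (17).
Sig_t = np.array([
    [1.0,                np.sqrt(s1) * rho1,       np.sqrt(s0) * rho0],
    [np.sqrt(s1) * rho1, s1,                       np.sqrt(s1 * s0) * rho10],
    [np.sqrt(s0) * rho0, np.sqrt(s1 * s0) * rho10, s0],
])

# Inverting the mapping recovers the structural quantities exactly.
s1_hat, s0_hat = Sig_t[1, 1], Sig_t[2, 2]
rho1_hat = Sig_t[0, 1] / np.sqrt(s1_hat)
rho0_hat = Sig_t[0, 2] / np.sqrt(s0_hat)
rho10_hat = Sig_t[1, 2] / np.sqrt(s1_hat * s0_hat)
alpha_J1_hat = 1.0 / np.sqrt(s1_hat)
alpha_J0_hat = 1.0 / np.sqrt(s0_hat)
```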

When working with this model, we employ independent priors for the parameters of Γ̃:

    p(Γ̃) = p(β̃) p(α̃^(1)) p(α̃^(0)) p(Σ̃).

We center the regression parameters around a prior mean of zero and specify them to be independently distributed with large prior variances: β̃ ~ N(b_0 = 0_{k×1}, V_β = 1000 I_k). The prior probability density function of α̃^(1) and α̃^(0) is assumed to be proportional to some constant:

    p(α̃_3^(1), …, α̃_{J−1}^(1), α̃_3^(0), …, α̃_{J−1}^(0)) ∝ c,

and finally, an inverse Wishart prior of the form Σ̃ ~ IW(ρ, R) with ρ = 6, R = I_3 is employed, subject to the restriction that the (1,1) element of Σ̃ is equal to one.

3.2 Benefits and Costs of Reparameterization

To this point we have offered no compelling arguments why one should work with the reparameterized model instead of working directly with the original structural representation of the model. The first argument in support of the reparameterization, as noted in Nandram and Chen (1996), and further shown in Li and Tobias (2005), is that the rescaling helps to significantly reduce the autocorrelation among the posterior simulations, thus accelerating the convergence of the algorithm. In other words, given an equal number of posterior draws, the numerical standard errors obtained when working with the reparameterized model will be significantly smaller than those obtained when using the original parameterization of the model. Second, and quite importantly from a computational point of view, this transformation effectively restores the conjugacy required to simulate the parameters of the covariance matrix. That is, in the original parameterization of the model in (6), there are restrictions on all three diagonal elements of the covariance matrix. This precludes drawing the elements of the inverse covariance matrix from a Wishart distribution (as is typically the case when conjugate priors are employed), since the posterior conditional is no longer Wishart given the diagonal restrictions.
On the other hand, in our reparameterized version of the model, the covariance matrix in (17) contains only one diagonal restriction, and using Algorithm 3 of Nobile (2000), one can generate draws from this restricted Wishart density. Thus, working with the reparameterized model facilitates simulating parameters of the covariance matrix, and no Metropolis-Hastings steps are required for this portion of the posterior simulator. Finally, for the specific case where there are three possible ordered outcomes (i.e., J = 3), there are effectively no unknown cutpoints in the transformed representation of this model.

For this case, the cutpoints are sampled through standard sampling of the elements of the covariance matrix. For this particular model with 3 outcomes, posterior simulation using the reparameterized model is quite fast, and no Metropolis-Hastings steps are required at any point in the algorithm. For J > 3, however, additional Metropolis-Hastings steps are required to simulate elements of the cutpoint vectors α̃^(1) and α̃^(0). The main, and perhaps only, drawback to working with the reparameterized model is that it requires us to place priors on the transformed parameters Γ̃. The priors we place on these parameters may seem reasonable and suitably default, but upon closer investigation, they may imply priors for the structural parameters that are unreasonable and at odds with our views about quantities for which we can more easily elicit our prior beliefs. For a two-equation treatment-response model containing an ordered treatment variable and an ordered response, Li and Tobias (2005) derive some connections between priors like those employed in (18) and their consequences for the priors implied on the structural parameters in (11). With suitably chosen hyperparameters, they argue that the implied priors on the structural coefficients can be reasonable, and that any costs associated with this prior selection issue are more than outweighed by the benefits afforded by the reparameterization. We take up a more detailed view of this issue of prior selection for this particular model in our generated data experiments of section 5.

3.3 The Posterior Simulator

We now introduce our posterior simulator for fitting our reparameterized treatment-response model. In what follows, we adopt the notation Γ̃_{−x} to denote all parameters other than x. We first group the joint posterior into [D^* z̃^(1) z̃^(0) α̃^(1) α̃^(0) β̃ Σ̃]. The latent data and cutpoints will be sampled in blocking steps, while the regression parameters and covariance matrix will be drawn from their complete posterior conditionals.
Step 1: Draw β̃ from[12]

    β̃ | s̃, Γ̃_{−β̃}, y, D ~ N(D_β d_β, D_β),

[12] It is useful to note that, conditional on the latent variables, our model is essentially a seemingly unrelated regressions (SUR) model, except for the restriction that one diagonal element of the covariance matrix is fixed at one.

where

    D_β ≡ ( ∑_{i=1}^{n} r_i′ Σ̃^{−1} r_i + V_β^{−1} )^{−1}  and  d_β ≡ ∑_{i=1}^{n} r_i′ Σ̃^{−1} s̃_i + V_β^{−1} b_0.

Step 2: Draw Σ̃ from

    Σ̃ | s̃, Γ̃_{−Σ̃}, y, D ~ IW( n + ρ, ∑_{i=1}^{n} (s̃_i − r_iβ̃)(s̃_i − r_iβ̃)′ + ρR ) · I(Σ̃_{11} = 1).

Algorithm 3 in Nobile (2000) is used to generate variates from this inverted Wishart distribution, conditioned on the value of the (1,1) element. The remaining steps in the posterior simulator involve joint sampling of the latent data s̃_i = [D_i^* z̃_i^(1) z̃_i^(0)]′ and cutpoint vectors α̃^(1) and α̃^(0). We attempt to mitigate autocorrelation in our parameter chains by blocking or grouping the cutpoints from a given equation together with the latent data appearing in that equation. Specifically, we proceed by sampling from the following densities:

    α̃^(1), z̃^(1) | z̃^(0), D^*, Γ̃_{−α̃^(1)}, y, D    (19)
    α̃^(0), z̃^(0) | z̃^(1), D^*, Γ̃_{−α̃^(0)}, y, D    (20)

and

    D^* | z̃^(1), z̃^(0), Γ̃, y, D.    (21)

Taking a closer look at the first of these three densities, we find from (18)

    α̃^(1), z̃^(1) | z̃^(0), D^*, Γ̃_{−α̃^(1)}, y, D    (22)
      ∝ ∏_{i=1}^{n} φ_3(s̃_i; r_iβ̃, Σ̃) [ I(α̃_{y_i}^(1) < z̃_i^(1) ≤ α̃_{y_i+1}^(1)) I(D_i = 1) + I(D_i = 0) ]
      ∝ ∏_{i=1}^{n} φ(z̃_i^(1); µ_{1i}^c, σ_1^c) [ I(α̃_{y_i}^(1) < z̃_i^(1) ≤ α̃_{y_i+1}^(1)) I(D_i = 1) + I(D_i = 0) ].

Note that the indicator functions involving D_i^* and z̃_i^(0) in (18) have disappeared completely, simply because we are now conditioning on these latent parameters. In the last line of (22), we have broken the trivariate Normal density for s̃_i into a conditional for z̃_i^(1) times the joint for z̃_i^(0) and D_i^*. The latter joint density is then absorbed into the normalizing constant of (22), as it does not involve α̃^(1) or z̃^(1). It follows that the conditional mean µ_{1i}^c and conditional variance σ_1^c are defined as:

    µ_{1i}^c ≡ x_i′β̃^(1) + [σ_1D σ_10] [1 σ_0D; σ_0D σ_0]^{−1} [D_i^* − w_i′β^(D); z̃_i^(0) − x_i′β̃^(0)]

and

    σ_1^c ≡ σ_1 − [σ_1D σ_10] [1 σ_0D; σ_0D σ_0]^{−1} [σ_1D; σ_10].

To obtain a draw from (19), we proceed in two steps and use the method of composition [see, e.g., Chib (2001)]. First, we marginalize (19) over z̃^(1) and describe a procedure for drawing α̃^(1) from this density. In the second step, we draw z̃^(1) from z̃^(1) | z̃^(0), D^*, Γ̃, y, D. The realized values of α̃^(1) and z̃^(1) then form a draw from (19). After integrating (19) over z̃^(1), we obtain:

    α̃^(1) | z̃^(0), D^*, Γ̃_{−α̃^(1)}, y, D ∝ ∏_{i: D_i=1} [ Φ( (α̃_{y_i+1}^(1) − µ_{1i}^c)/√σ_1^c ) − Φ( (α̃_{y_i}^(1) − µ_{1i}^c)/√σ_1^c ) ].    (23)

Step 3: To sample from the density in (23), we follow the suggestion of Nandram and Chen (1996), who suggest using a Dirichlet proposal density to sample differences[13] between the cutpoint values, q_j^(1) = α̃_{j+1}^(1) − α̃_j^(1), j = 3, …, J − 1. Given that the largest cutpoint takes the value of unity, we can then solve back to obtain the values of the cutpoints themselves. Specifically, we sample

    {q_j^(1)}_{j=3}^{J−1} ~ Dirichlet( {δ_j^(1) n_j^(1) + 1}_{j=3}^{J−1} ),

where δ_j^(1) = 0.1, j = 3, …, J − 1, are tuning parameters, and n_j^(1) ≡ ∑_{i=1}^{n} I(y_i = j) I(D_i = 1), j = 3, …, J − 1, are the numbers of individuals falling into each category of the outcome variable in the treated state. The probability of accepting the candidate draw is the standard Metropolis-Hastings probability, min(R, 1), where

    R = [ ∏_{i: D_i=1} ( Φ((α̃_{y_i+1,can}^(1) − µ_{1i}^c)/√σ_1^c) − Φ((α̃_{y_i,can}^(1) − µ_{1i}^c)/√σ_1^c) ) / ( Φ((α̃_{y_i+1,l−1}^(1) − µ_{1i}^c)/√σ_1^c) − Φ((α̃_{y_i,l−1}^(1) − µ_{1i}^c)/√σ_1^c) ) ] [ ∏_{j=3}^{J−1} ( q_{j,l−1}^(1) / q_{j,can}^(1) )^{δ_j^(1) n_j^(1)} ],

l − 1 denotes the current value of the algorithm and "can" denotes the candidate draw from the Dirichlet proposal density.

Step 4: Sample z̃^(1) independently from the conditional

    z̃_i^(1) | z̃^(0), D^*, Γ̃, y, D ~ ind  TN_{(α̃_{y_i}^(1), α̃_{y_i+1}^(1))}(µ_{1i}^c, σ_1^c)  if D_i = 1;  N(µ_{1i}^c, σ_1^c)  if D_i = 0,  i = 1, 2, …, n.

This is a Normal density with mean µ_{1i}^c and variance σ_1^c, truncated to the interval (α̃_{y_i}^(1), α̃_{y_i+1}^(1)) if individual i is observed to be in the treatment group.
When D_i = 0, no

[13] Note that sampling the cutpoints in this way enforces the ordering restriction on the cutpoint values.

restrictions arise regarding the latent data z̃_i^(1), and thus the draw is obtained from the untruncated Normal density. To generate draws from a univariate truncated Normal density, one can use standard inversion methods. That is, to generate x ~ TN_{(a,b)}(µ, σ²), first draw U uniformly on (0, 1) and then set

    x = µ + σ Φ^{−1}[ Φ((a − µ)/σ) + U ( Φ((b − µ)/σ) − Φ((a − µ)/σ) ) ].

Step 5: By similar arguments as those leading up to Step 3, one can show that

    α̃^(0) | z̃^(1), D^*, Γ̃_{−α̃^(0)}, y, D ∝ ∏_{i: D_i=0} [ Φ( (α̃_{y_i+1}^(0) − µ_{0i}^c)/√σ_0^c ) − Φ( (α̃_{y_i}^(0) − µ_{0i}^c)/√σ_0^c ) ],    (24)

where

    µ_{0i}^c ≡ x_i′β̃^(0) + [σ_0D σ_10] [1 σ_1D; σ_1D σ_1]^{−1} [D_i^* − w_i′β^(D); z̃_i^(1) − x_i′β̃^(1)]

and

    σ_0^c ≡ σ_0 − [σ_0D σ_10] [1 σ_1D; σ_1D σ_1]^{−1} [σ_0D; σ_10].

A strategy identical to that described in Step 3 can be used to simulate the cutpoints from this proposal density.

Step 6: Sample z̃^(0) independently from the conditional

    z̃_i^(0) | z̃^(1), D^*, Γ̃, y, D ~ ind  TN_{(α̃_{y_i}^(0), α̃_{y_i+1}^(0))}(µ_{0i}^c, σ_0^c)  if D_i = 0;  N(µ_{0i}^c, σ_0^c)  if D_i = 1,  i = 1, 2, …, n.

Step 7: Sample D^* independently from the conditional

    D_i^* | z̃^(0), z̃^(1), Γ̃, y, D ~ ind  TN_{(0,∞)}(µ_{Di}^c, σ_D^c)  if D_i = 1;  TN_{(−∞,0)}(µ_{Di}^c, σ_D^c)  if D_i = 0,  i = 1, 2, …, n.

In the above, we have defined

    µ_{Di}^c ≡ w_i′β^(D) + [σ_1D σ_0D] [σ_1 σ_10; σ_10 σ_0]^{−1} [z̃_i^(1) − x_i′β̃^(1); z̃_i^(0) − x_i′β̃^(0)]

and

    σ_D^c ≡ 1 − [σ_1D σ_0D] [σ_1 σ_10; σ_10 σ_0]^{−1} [σ_1D; σ_0D].

Iterating through Steps 1-7 produces a draw from the augmented joint posterior distribution. To recover the structural coefficients of interest, we simply invert the mappings described above (13) and below (17).
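The inversion method for truncated Normal sampling used in Steps 4, 6 and 7 can be coded directly; here is a minimal sketch (the function name is ours, for illustration). One-sided truncation, as needed in Step 7, is handled by passing an infinite endpoint.

```python
import numpy as np
from scipy.stats import norm

def trunc_norm(a, b, mu, sigma, rng):
    """Draw from N(mu, sigma^2) truncated to (a, b) by inversion:
    x = mu + sigma * PhiInv(Phi((a-mu)/sigma) + U * (Phi((b-mu)/sigma) - Phi((a-mu)/sigma)))."""
    Fa = norm.cdf((a - mu) / sigma)
    Fb = norm.cdf((b - mu) / sigma)
    u = rng.uniform()
    return mu + sigma * norm.ppf(Fa + u * (Fb - Fa))

rng = np.random.default_rng(2)
# Two-sided truncation, as in Steps 4 and 6 (hypothetical interval and moments).
draws = np.array([trunc_norm(0.0, 1.5, 0.2, 1.0, rng) for _ in range(2000)])
# One-sided truncation to (0, inf), as in Step 7 when D_i = 1.
pos = np.array([trunc_norm(0.0, np.inf, -1.0, 1.0, rng) for _ in range(2000)])
```

Every draw lands inside its truncation region by construction, since the uniform variate is mapped through the cdf restricted to (a, b).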

3.4 Extending the Model: Allowing for Non-Normality

A limitation of the model described thus far in this paper is its reliance on joint Normality. For some applications, such as log wage outcomes [e.g., Heckman and Sedlacek (1985), Heckman (2004)], the Normality assumption may be a reasonable approximation, and if the model passes a selection of diagnostic tests[14] no further refinements would be required. For other models, researchers may worry about heavy tails, asymmetry or possibly bimodality in the disturbance distribution. Below we outline simple computational tricks for capturing these features of the data and generalizing the Normality assumption. The most straightforward extension of the model is to expand to Student-t errors by simply adding the appropriate mixing variables to the disturbance variance [see, e.g., Carlin and Polson (1991), Geweke (1993), Albert and Chib (1993), Chib and Hamilton (2000) and Li, Poirier and Tobias (2004)]. For example, if we generalize the Normality assumption in (6) to

    [u_i ε_i^(1) ε_i^(0)]′ | λ_i, x_i, w_i, Σ ~ ind N(0, λ_i Σ),    (25)

with Σ as defined in (6), and specify a prior for λ_i of the form[15]

    λ_i ~ iid IG(ν/2, 2/ν),

it follows that (marginalized over the prior for λ_i):

    [u_i ε_i^(1) ε_i^(0)]′ | x_i, w_i, Σ ~ t_ν(0, Σ),    (26)

a multivariate Student-t density with mean zero, scale matrix Σ and ν degrees of freedom. This device is particularly useful for modeling symmetric error densities whose tails are heavier than those implied by the Normal density. In addition, such an extension of the model comes at little computational cost since, conditioned on {λ_i}, sampling the regression parameters and covariance matrix is straightforward, and each λ_i can be drawn independently from its complete posterior conditional, which is of an inverse Gamma form.

[14] For example, one can calculate posterior predictive p-values [Gelman et al. (2004)], QQ plots and other standard diagnostic criteria [e.g., Lancaster (2004), Koop, Poirier and Tobias (2006)] to evaluate the appropriateness of the Normality assumption.
For more on the performance of related models under non-Normality, see, for example, Goldberger (1983) or Paarsch (1984).

^15 The inverted Gamma (IG) random variable is parameterized as follows: p(x) ∝ x^{−(a+1)} exp[−1/(bx)].
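To see the scale-mixture device in action, the following sketch (illustrative values only; the correlation matrix below is hypothetical) verifies that mixing Normal draws over λ_i produces t_ν errors. Under the inverted Gamma parameterization of footnote 15, λ_i ~ IG(ν/2, 2/ν) is equivalent to 1/λ_i ~ Gamma(ν/2, scale = 2/ν):

```python
import numpy as np

rng = np.random.default_rng(1)
nu, n = 5.0, 50_000
Sigma = np.array([[1.0, 0.3, 0.2],   # hypothetical correlation matrix
                  [0.3, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])

# lambda_i ~ IG(nu/2, 2/nu)  <=>  1/lambda_i ~ Gamma(nu/2, scale = 2/nu)
lam = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)

# [u_i, eps1_i, eps0_i]' | lambda_i ~ N(0, lambda_i * Sigma)
z = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
e = np.sqrt(lam)[:, None] * z     # marginally t_nu(0, Sigma)

# a t_nu scale mixture inflates each marginal variance by nu/(nu - 2)
print(e[:, 0].var(), nu / (nu - 2.0))
```

With ν = 5 both printed numbers should sit near 5/3, confirming the heavier-than-Normal marginal implied by (26).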

An analogous and potentially more flexible extension of the model is to suppose that the errors were drawn from a mixture of Normal densities. Like (25), we might write

$$
\begin{bmatrix} u_i \\ \epsilon_i^{(1)} \\ \epsilon_i^{(0)} \end{bmatrix} \Bigg|\, \lambda_i, x_i, w_i, \Sigma \;\stackrel{ind}{\sim}\; \lambda_i N(0, \Sigma_1) + (1 - \lambda_i) N(0, \Sigma_2). \tag{27}
$$

So, conditioned on λ_i (which is unobserved), each observation is ascribed to one component of the mixture model with covariance matrix equal to either Σ_1 or Σ_2. Since the component assignment is known given λ_i, it is, again, straightforward to obtain draws from the regression parameters and component-specific covariance matrices. The λ_i are then simulated independently from a two-point distribution.^16

Finally, one can generalize this mixture model even further by allowing the regression parameters to vary across the mixture components. To do this, we write

$$
s_i \mid \lambda_i, \Gamma \;\stackrel{ind}{\sim}\; \lambda_i N(r_i \beta_1, \Sigma_1) + (1 - \lambda_i) N(r_i \beta_2, \Sigma_2),
$$

where s_i and r_i are as defined in (12), and β_j, Σ_j represent the regression parameter vector and covariance matrix from the j-th component of the mixture. Generalization to more than two components is also straightforward, and the component indicators and component probabilities can be simulated from multinomial and Dirichlet densities, respectively [see, e.g., Li, Poirier and Tobias (2004)].

4 Treatment Effects

In this section we derive expressions for conventional treatment effects in our ordered outcome treatment-response model. In particular, we adapt conventional treatment parameters including the Average Treatment Effect (ATE), the effect of Treatment on the Treated (TT) and the Local Average Treatment Effect (LATE) to our ordered response model, and describe how these can be calculated within this framework.

^16 See, e.g., McLachlan and Peel (2000). There is an important issue about local non-identifiability of the mixture model parameters; the parameters are identified only up to a permutation of the mixture components. To aid in identification, priors can be used that impose an ordering restriction on the variance parameters, regression parameters or component probabilities.
In some cases, there is little concern for component switching, but in other cases, this issue may be a significant concern.
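The two-point draw for the component indicators λ_i in (27) can be sketched as follows. This is a hypothetical illustration: the component probability p and the covariance matrices are made up for the example, and the ordering of the variances (Σ_1 tighter than Σ_2) reflects the identification device mentioned in the footnote above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def draw_indicators(E, Sigma1, Sigma2, p, rng):
    """Two-point draw for each lambda_i given the current errors E (n x 3):
    P(lambda_i = 1 | rest) is proportional to p * N(E_i; 0, Sigma1), and
    P(lambda_i = 0 | rest) is proportional to (1 - p) * N(E_i; 0, Sigma2)."""
    f1 = p * multivariate_normal.pdf(E, mean=np.zeros(3), cov=Sigma1)
    f2 = (1 - p) * multivariate_normal.pdf(E, mean=np.zeros(3), cov=Sigma2)
    prob1 = f1 / (f1 + f2)
    return (rng.uniform(size=len(E)) < prob1).astype(int)

rng = np.random.default_rng(2)
Sigma1, Sigma2 = np.eye(3), 9.0 * np.eye(3)   # ordered variances aid identification
E = np.vstack([rng.multivariate_normal(np.zeros(3), Sigma1, 200),
               rng.multivariate_normal(np.zeros(3), Sigma2, 200)])
lam = draw_indicators(E, Sigma1, Sigma2, p=0.5, rng=rng)
# draws from the tight component are mostly assigned lambda = 1
print(lam[:200].mean(), lam[200:].mean())
```

In the full sampler this step alternates with draws of the regression parameters and the component-specific covariance matrices, exactly as described in the text.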

4.1 The Average Treatment Effect

We begin with a discussion of the Average Treatment Effect (ATE). This parameter typically quantifies the expected outcome gain for a randomly chosen individual. Since our response is ordered, this parameter may not be of direct relevance, as it demands a cardinal representation of an ordinal variable.^17 In light of this issue, we choose to adapt the ATE parameter to describe across-regime changes in probabilities associated with various categories. To fix ideas, then, we consider the impact of the treatment on increasing (or decreasing) the probability that the outcome exceeds the lowest category:

$$
\begin{aligned}
ATE(x; \Gamma) &\equiv \Pr(y^{(1)} \ge 2 \mid x, \Gamma) - \Pr(y^{(0)} \ge 2 \mid x, \Gamma) \qquad (28) \\
&= \sum_{j=2}^{J} \left[ \Pr(y^{(1)} = j \mid x, \Gamma) - \Pr(y^{(0)} = j \mid x, \Gamma) \right] \qquad (29) \\
&= \sum_{j=2}^{J} \left( \left[ \Phi(\alpha^{(1)}_{j+1} - x\beta^{(1)}) - \Phi(\alpha^{(1)}_{j} - x\beta^{(1)}) \right] - \left[ \Phi(\alpha^{(0)}_{j+1} - x\beta^{(0)}) - \Phi(\alpha^{(0)}_{j} - x\beta^{(0)}) \right] \right).
\end{aligned}
$$

The choice of the lowest category is without loss of generality; other probabilities can be obtained in similar ways. We relate this quantity to ATE since it corresponds to a probability increase (or decrease) for a randomly chosen individual. A point estimate of this treatment impact is readily obtained using our simulated set of parameters drawn from the joint posterior:

$$
\widehat{ATE}(x) \equiv E_{\Gamma \mid y, D}\left[ ATE(x; \Gamma) \right] \approx \frac{1}{M} \sum_{m=1}^{M} ATE(x; \Gamma_m), \tag{30}
$$

where Γ_m ~ p(Γ | y, D) and is obtained from the algorithm described in section 3.3.

4.2 The Effect of Treatment on the Treated

The effect of Treatment on the Treated (TT) is a conceptually different parameter and describes the outcome gain (or loss) from treatment for those actually selecting into treatment.

^17 In some cases, however, it may be. For example, one could use an ordered model to analyze, say, years of schooling completed, and thus remain true to the integer-valued nature of the education data. In this case, the ordered variable has a natural cardinal interpretation, and thus the conventional ATE parameter would be of interest.
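The ATE calculation in (28)-(30) reduces to differencing ordered probit cell probabilities and averaging over posterior draws. A sketch with hypothetical cutpoints and index draws (the Γ_m-style draws below are stand-ins for output of the posterior simulator, not actual results):

```python
import numpy as np
from scipy.stats import norm

def pr_categories(xb, alpha):
    """Pr(y = j | x), j = 1..J, for an ordered probit with index xb and
    full cutpoint vector alpha (alpha_1 = -inf, ..., alpha_{J+1} = +inf)."""
    return norm.cdf(alpha[1:] - xb) - norm.cdf(alpha[:-1] - xb)

def ate_draw(xb1, xb0, alpha1, alpha0):
    """ATE(x; Gamma): sum over j >= 2 of Pr(y1 = j) - Pr(y0 = j), as in (28)-(29)."""
    p1, p0 = pr_categories(xb1, alpha1), pr_categories(xb0, alpha0)
    return (p1[1:] - p0[1:]).sum()

# hypothetical posterior draws Gamma_m (illustration only)
rng = np.random.default_rng(3)
M = 1000
alpha1 = np.array([-np.inf, 0.0, 0.5, 1.0, 1.5, np.inf])   # J = 5 categories
alpha0 = np.array([-np.inf, 0.0, 0.4, 0.9, 1.4, np.inf])
xb1_draws = 0.9 + 0.05 * rng.standard_normal(M)   # draws of x*beta^(1)
xb0_draws = 0.4 + 0.05 * rng.standard_normal(M)   # draws of x*beta^(0)
ate_hat = np.mean([ate_draw(b1, b0, alpha1, alpha0)
                   for b1, b0 in zip(xb1_draws, xb0_draws)])  # eq. (30)
print(ate_hat)
```

Note that the sum in (29) telescopes, so each draw's contribution equals Φ(xβ^{(1)} − α^{(1)}_2) − Φ(xβ^{(0)} − α^{(0)}_2); the summed form is kept to mirror the derivation in the text.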

Again, we examine the treatment effect on the probability that the outcome variable does not fall into the lowest category:

$$
\begin{aligned}
TT(x, w, D(w) = 1; \Gamma) &\equiv \Pr(y^{(1)} \ge 2 \mid x, w, D(w) = 1, \Gamma) - \Pr(y^{(0)} \ge 2 \mid x, w, D(w) = 1, \Gamma) \qquad (31) \\
&= \sum_{j=2}^{J} \left[ \Pr(y^{(1)} = j \mid x, w, D(w) = 1, \Gamma) - \Pr(y^{(0)} = j \mid x, w, D(w) = 1, \Gamma) \right].
\end{aligned}
$$

To economize on notation, let us define P^{TT}_{1,j}(Γ) ≡ Pr(y^{(1)} = j | x, w, D(w) = 1, Γ) and P^{TT}_{0,j}(Γ) ≡ Pr(y^{(0)} = j | x, w, D(w) = 1, Γ), keeping the conditioning on x, w and D(w) = 1 implicit. Given these definitions, it follows that

$$
TT(x, w, D(w) = 1; \Gamma) = \sum_{j=2}^{J} \left[ P^{TT}_{1,j}(\Gamma) - P^{TT}_{0,j}(\Gamma) \right]. \tag{32}
$$

Recalling our description of the likelihood in section 2.1, we can write the probabilities in (32) in more computationally convenient forms. For example,

$$
\begin{aligned}
P^{TT}_{1,j}(\Gamma) &\equiv \Pr(\alpha^{(1)}_{j} < z^{(1)} \le \alpha^{(1)}_{j+1} \mid u > -w\beta^{(D)}) \\
&= \Pr(\alpha^{(1)}_{j} - x\beta^{(1)} < \epsilon^{(1)} \le \alpha^{(1)}_{j+1} - x\beta^{(1)} \mid u > -w\beta^{(D)}) \\
&= \int_{\alpha^{(1)}_{j} - x\beta^{(1)}}^{\alpha^{(1)}_{j+1} - x\beta^{(1)}} p(\epsilon^{(1)} \mid u > -w\beta^{(D)})\, d\epsilon^{(1)} \\
&= \int_{\alpha^{(1)}_{j} - x\beta^{(1)}}^{\alpha^{(1)}_{j+1} - x\beta^{(1)}} \int_{-w\beta^{(D)}}^{\infty} \frac{p(\epsilon^{(1)}, u)}{\Pr(u > -w\beta^{(D)})}\, du\, d\epsilon^{(1)} \\
&= \int_{-w\beta^{(D)}}^{\infty} \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right] \frac{p(u)}{\Pr(u > -w\beta^{(D)})}\, du.
\end{aligned}
$$

The integral above is simply

$$
E_u \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right],
$$

where u ~ TN_{(−wβ^{(D)}, ∞)}(0, 1). Thus, the strong law of large numbers guarantees that

$$
\hat{P}^{TT}_{1,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{TT}_{1,j}(\Gamma),
$$

where {u^{(l)}}_{l=1}^{L} denotes an iid sample from the standard Normal distribution truncated to (−wβ^{(D)}, ∞).^18 Following similar arguments, one can show

$$
\hat{P}^{TT}_{0,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(0)}_{j+1} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(0)}_{j} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{TT}_{0,j}(\Gamma).
$$

Putting these results together, and, of course, exploiting the availability of draws from our joint posterior, we can calculate the following point estimate of TT:

$$
\widehat{TT}(x, w, D(w) = 1) \equiv E_{\Gamma \mid y, D}\left[ TT(x, w, D(w) = 1; \Gamma) \right] \approx \frac{1}{M} \sum_{m=1}^{M} \sum_{j=2}^{J} \left( \hat{P}^{TT}_{1,j}(\Gamma_m) - \hat{P}^{TT}_{0,j}(\Gamma_m) \right), \tag{33}
$$

with Γ_m ~ p(Γ | y, D).

4.3 The Local Average Treatment Effect

The Local Average Treatment Effect can be interpreted as measuring the outcome gain (or loss) from treatment for a group of compliers. This corresponds to the effect of treatment on a subgroup of the population who would choose to receive treatment at a particular value of the instrument, say w, but would not choose treatment at some w̃.^19 Consistent with our discussions in the previous subsections, our parameter of interest is the increased (or decreased) likelihood that the outcome variable exceeds the lowest category:

$$
\begin{aligned}
LATE(x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0; \Gamma) &= \Pr(y^{(1)} \ge 2 \mid x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0, \Gamma) \\
&\quad - \Pr(y^{(0)} \ge 2 \mid x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0, \Gamma) \\
&= \sum_{j=2}^{J} \left( P^{LATE}_{1,j}(\Gamma) - P^{LATE}_{0,j}(\Gamma) \right),
\end{aligned}
$$

where P^{LATE}_{k,j}(Γ) ≡ Pr(y^{(k)} = j | x, w, w̃, D(w) = 1, D(w̃) = 0, Γ), k = 0, 1.

^18 In practice, these integrals can be approximated quite accurately (and quickly) using relatively few draws from the truncated Normal distribution. A routine for drawing from such a distribution was provided in Step 4 of section 3.3.

^19 Heckman, Tobias and Vytlacil (2003) provide a similar definition of LATE in a parametric latent variable selection model.
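The simulation estimator P̂^{TT} from section 4.2 is a short computation once truncated Normal draws are available. A sketch with hypothetical parameter values, shown for regime 1 only (regime 0 is identical with α^{(0)}, β^{(0)}, ρ^{(0)} substituted):

```python
import numpy as np
from scipy.stats import norm, truncnorm

def p_tt_all(alpha, xb, rho, w_betaD, L, rng):
    """Monte Carlo estimates of P^TT_{.,j}(Gamma) for j = 1..J: average over
    draws u^(l) ~ N(0,1) truncated to (-w*betaD, inf) of the differences
    Phi((alpha_{j+1} - xb - rho*u)/s) - Phi((alpha_j - xb - rho*u)/s),
    where s = sqrt(1 - rho^2)."""
    u = truncnorm.rvs(a=-w_betaD, b=np.inf, size=L, random_state=rng)
    s = np.sqrt(1.0 - rho ** 2)
    cdf = norm.cdf((alpha[None, :] - xb - rho * u[:, None]) / s)  # L x (J+1)
    return (cdf[:, 1:] - cdf[:, :-1]).mean(axis=0)

rng = np.random.default_rng(4)
alpha1 = np.array([-np.inf, 0.0, 0.5, 1.0, 1.5, np.inf])  # hypothetical cutpoints
p1 = p_tt_all(alpha1, xb=0.5, rho=0.6, w_betaD=0.3, L=2000, rng=rng)
print(p1)            # Pr(y^(1) = j | treated), j = 1..5
print(p1[1:].sum())  # Pr(y^(1) >= 2 | treated), the term entering TT
```

Using the same set of truncated draws for all categories makes the estimated probabilities sum to one exactly, since the Φ-differences telescope within each draw.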

To calculate LATE, we follow a similar strategy to that outlined for calculating the TT effect. It follows that

$$
\hat{P}^{LATE}_{1,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(1)}_{j+1} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(1)}_{j} - x\beta^{(1)} - \rho^{(1)} u^{(l)}}{\sqrt{1 - [\rho^{(1)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{LATE}_{1,j}(\Gamma),
$$

$$
\hat{P}^{LATE}_{0,j}(\Gamma) \equiv \frac{1}{L} \sum_{l=1}^{L} \left[ \Phi\!\left( \frac{\alpha^{(0)}_{j+1} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) - \Phi\!\left( \frac{\alpha^{(0)}_{j} - x\beta^{(0)} - \rho^{(0)} u^{(l)}}{\sqrt{1 - [\rho^{(0)}]^2}} \right) \right] \;\stackrel{p}{\to}\; P^{LATE}_{0,j}(\Gamma),
$$

where u^{(l)} ~ iid TN_{(−wβ^{(D)}, −w̃β^{(D)}]}(0, 1). We can then proceed to obtain a point estimate of LATE:

$$
\widehat{LATE}[x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0] \equiv E_{\Gamma \mid y, D}\left[ LATE(x, w, \tilde{w}, D(w) = 1, D(\tilde{w}) = 0; \Gamma) \right] \approx \frac{1}{M} \sum_{m=1}^{M} \sum_{j=2}^{J} \left( \hat{P}^{LATE}_{1,j}(\Gamma_m) - \hat{P}^{LATE}_{0,j}(\Gamma_m) \right),
$$

with Γ_m ~ p(Γ | y, D).

4.4 Beyond Mean Treatment Parameters: Learning about ρ^{(10)}

The treatment parameters discussed in the previous subsections are typical of the mean treatment effects considered in the literature. To see this, note that an equivalent expression of the parameter of interest Pr(y^{(1)} ≥ 2 | x) − Pr(y^{(0)} ≥ 2 | x) is E[I(y^{(1)} ≥ 2) − I(y^{(0)} ≥ 2) | x], where I(·) denotes the indicator function. Part of the appeal of these mean treatment parameters is that they enable researchers to quantify a feature of the treatment impact - the average gains or losses under various conditioning scenarios - even though y^{(0)} and y^{(1)} are not jointly observed.

Unlike the mean treatment effects described in the previous subsections, however, other quantities of significant policy relevance, such as the probability of a positive treatment effect Pr(y^{(1)} − y^{(0)} > 0 | x), will depend on the correlation parameter ρ^{(10)}. This correlation parameter does not enter the likelihood for the observed data (see, e.g., section 2.1) and thus is not identifiable. This

fact has, perhaps, limited the scope of most research to the estimation of mean treatment impacts.^20

For the Bayesian, this non-identifiability issue raises the question of what can and should be done about our treatment of the correlation parameter ρ^{(10)}.^21 One approach, which was used by Chib and Hamilton (2000), is to simply set ρ^{(10)} = 0, fit the model subject to this restriction and then impose that the restricted covariance matrix (subject to ρ^{(10)} = 0) is positive definite. While in most cases this will be an innocuous restriction, in some cases, this approach may have unanticipated consequences. For example, if we set ρ^{(10)} = 0 in (6), it follows that Σ is positive definite if and only if [ρ^{(1)}]² + [ρ^{(0)}]² ≤ 1. This restriction thus forces the identified correlation parameters ρ^{(1)} and ρ^{(0)} to lie within the unit circle rather than the unit square. To illustrate what this restriction means, suppose that we performed a generated data experiment and set ρ^{(1)} = ρ^{(0)} = .8. If we proceeded to fit this model subject to ρ^{(10)} = 0, and enforced that the restricted covariance matrix was positive definite, then our posterior mode must be inconsistent - the joint posterior ρ^{(1)}, ρ^{(0)} | y, D could never place any mass over the actual values used to generate the data, regardless of the size of the generated data set. This problem manifests itself for rather extreme cases of correlation among the unobservables; if the correlations are more moderate, then this is not likely to be a significant issue.^22

An alternate approach, which we have advocated in previous work [e.g., Koop and Poirier (1997), Poirier and Tobias (2003), Li, Poirier and Tobias (2004)], is to simply work with the full covariance matrix, as described in section 3.3, without restricting ρ^{(10)} a priori. As shown in Poirier and Tobias (2003), this does not induce an inconsistency regarding the identified model parameters, and moreover, one can potentially learn about the non-identified correlation parameter.
Intuitively, information arising through the likelihood function will enable us to pin down all of the correlation parameters in (6) that are identifiable, leaving

^20 Related work has sought to expand the focus beyond mean effects and identify outcome gain distributions. See, for example, Heckman and Honoré (1990), Heckman, Smith and Clements (1997), Heckman and Smith (1998) and Carneiro, Hansen and Heckman (2003).

^21 See Poirier (1998) for more on learning about non-identifiable parameters through prior information. Poirier and Tobias (2000) contain related material describing the implications of prior restrictions on ρ^{(10)}.

^22 This is particularly true, as Chib and Hamilton (2000) point out, in, say, panel models where most of the variation is captured through fixed or random effects, and one would suspect that any remaining correlation among the unobservables was minimal.

only ρ^{(10)} unknown. An additional source of information then arises from the fact that Σ must be positive definite. In particular, the p.d. restriction imposes that ρ^{(10)} must have the following conditional support:

$$
\rho^{(1)}\rho^{(0)} - \sqrt{(1 - [\rho^{(1)}]^2)(1 - [\rho^{(0)}]^2)} \;\le\; \rho^{(10)} \;\le\; \rho^{(1)}\rho^{(0)} + \sqrt{(1 - [\rho^{(1)}]^2)(1 - [\rho^{(0)}]^2)}. \tag{34}
$$

This equation provides identifiable bounds on ρ^{(10)} as a function of the identified correlation parameters ρ^{(1)} and ρ^{(0)}. Somewhat surprisingly, these bounds also suggest that selection bias may, in a particular sense, be a good thing. When ρ^{(1)} and ρ^{(0)} are large, the bounds given in (34) become increasingly informative. Intuitively, the presence of selection bias provides a vehicle for learning about ρ^{(10)} - if the errors in the outcome equations are correlated sufficiently with the error in the treatment equation, then to some extent, they must also be correlated with one another.

Equation (34) shows that beliefs regarding ρ^{(10)} will generally be revised from the data - as we learn about ρ^{(1)} and ρ^{(0)}, this information spills over and restricts the conditional support of ρ^{(10)}. This is, unfortunately, as far as the data will take us - the shape of the marginal posterior density of ρ^{(10)} within the bounds in (34) is not updated from the data. Poirier and Tobias (2003) show that in sufficiently large samples where ρ^{(1)} and ρ^{(0)} are estimated precisely and are approximately equal to, say, ρ^{(1,*)} and ρ^{(0,*)}:

$$
p(\rho^{(10)} \mid y, D) \approx p(\rho^{(10)} \mid \rho^{(1)} = \rho^{(1,*)}, \rho^{(0)} = \rho^{(0,*)}). \tag{35}
$$

That is, the marginal posterior for the non-identified correlation parameter is approximately equal to the conditional prior for that correlation parameter evaluated at the given values ρ^{(1,*)} and ρ^{(0,*)}. The support bounds in (34) are updated from the data, but within the bounds, the shape of the posterior is completely determined by the shape of the conditional prior. For the Bayesian, this is a natural result; in the absence of information arising from the data, one resorts to the use of prior information.^23
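The bounds in (34), and the unit-circle restriction discussed above, are simple to check numerically; a small sketch:

```python
import numpy as np

def rho10_bounds(rho1, rho0):
    """Conditional support of rho^(10) implied by positive definiteness, eq. (34)."""
    half_width = np.sqrt((1.0 - rho1 ** 2) * (1.0 - rho0 ** 2))
    return rho1 * rho0 - half_width, rho1 * rho0 + half_width

def is_pd(rho1, rho0, rho10):
    """Check positive definiteness of the correlation matrix Sigma."""
    Sigma = np.array([[1.0, rho1, rho0],
                      [rho1, 1.0, rho10],
                      [rho0, rho10, 1.0]])
    return bool(np.all(np.linalg.eigvalsh(Sigma) > 0.0))

# strong selection (rho1 = rho0 = .8) makes the bounds tight: [0.28, 1.0]
lo, hi = rho10_bounds(0.8, 0.8)
print(lo, hi)

# ...and rho10 = 0 is then infeasible, since [rho1]^2 + [rho0]^2 > 1
print(is_pd(0.8, 0.8, 0.0), is_pd(0.8, 0.8, 0.6))
```

This reproduces the paper's two points at once: strong selection narrows the feasible interval for ρ^{(10)}, and imposing ρ^{(10)} = 0 under strong selection rules out the true correlations entirely.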
The results of these studies suggest that there is, in one sense, a limited opportunity for expanding the focus of research beyond mean effects. One could at least bound ρ^{(10)} and then use these bounds to bound other parameters of interest. If one is comfortable with

^23 Heckman, Smith and Clements (1997), for example, informally discuss plausible prior beliefs for ρ^{(10)}. They write (page 510): "In considering outcomes like employment and earnings, many plausible models of program participation suggest that outcomes in the treatment state are positively related to outcomes in the non-treatment state... there is a widely-held belief that good persons are good at whatever they do."

insinuating prior information, however, one could obtain point estimates of any parameter of interest under a particular prior. Default priors yielding marginal posteriors that are uniform over the conditional support bounds may appeal to many researchers when carrying out these calculations. Use of such priors, however, typically makes the problem more challenging from a computational point of view, as they often break the inherent conjugacy of the model.

5 A Generated Data Experiment

In this section we conduct a generated data experiment to demonstrate the performance of our posterior simulator and address a potential concern regarding choice of prior. A sample of 5,000 observations is generated from the following ordered potential outcome model:

$$
\begin{aligned}
D_i^* &= \beta^{(D)}_0 + w_i \beta^{(D)}_1 + u_i \\
z_i^{(1)} &= \beta^{(1)} + \epsilon_i^{(1)} \\
z_i^{(0)} &= \beta^{(0)} + \epsilon_i^{(0)},
\end{aligned}
$$

where w_i is drawn independently from a N(0, 1) distribution and the error terms [u_i, ε_i^{(1)}, ε_i^{(0)}]′ are drawn jointly from the trivariate Normal distribution:

$$
\begin{bmatrix} u_i \\ \epsilon_i^{(1)} \\ \epsilon_i^{(0)} \end{bmatrix} \Bigg|\, w_i \;\stackrel{iid}{\sim}\; N\!\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \Sigma \right).
$$

We consider this specific design with a high degree of unobservable correlation to reveal how our algorithm performs when selection bias is a significant problem.^24 The non-identified correlation ρ^{(10)} is set to .6, and thus from (34) the covariance matrix is positive definite. Finally, the regression parameters β^{(D)}_0, β^{(D)}_1, β^{(1)} and β^{(0)} and cutpoint values α^{(k)}_j, j = 3, 4, 5, k = 0, 1, are enumerated in the first column of Table 1, and the observables D_i, y_i^{(1)} and y_i^{(0)} are generated as follows:

$$
\begin{aligned}
D_i &= I(D_i^* > 0), \\
y_i^{(1)} &= j \quad \text{if} \quad \alpha^{(1)}_{j} < z_i^{(1)} \le \alpha^{(1)}_{j+1}, \quad j = 1, 2, 3, 4, 5, \\
y_i^{(0)} &= j \quad \text{if} \quad \alpha^{(0)}_{j} < z_i^{(0)} \le \alpha^{(0)}_{j+1}, \quad j = 1, 2, 3, 4, 5.
\end{aligned}
$$

^24 We do not address the weak instruments problem here, but to fix ideas, we consider the case where the instrument plays a significant role in the treatment decision.
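The data-generating process above can be simulated directly. In the sketch below the parameter values are illustrative stand-ins: the actual β's, cutpoints and Σ used in the paper are listed in Table 1, which is not reproduced in this excerpt; only ρ^{(10)} = .6 is taken from the text, and the remaining correlations are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# illustrative parameter values (the values actually used are in Table 1)
betaD0, betaD1 = 0.1, 1.0
beta1, beta0 = 1.0, 0.2
rho1, rho0, rho10 = 0.7, 0.7, 0.6   # rho10 = .6 as in the text; rho1, rho0 hypothetical
Sigma = np.array([[1.0, rho1, rho0],
                  [rho1, 1.0, rho10],
                  [rho0, rho10, 1.0]])

w = rng.standard_normal(n)
u, e1, e0 = rng.multivariate_normal(np.zeros(3), Sigma, size=n).T

Dstar = betaD0 + betaD1 * w + u
z1 = beta1 + e1
z0 = beta0 + e0

# interior cutpoints alpha_2 = 0 < alpha_3 < alpha_4 < alpha_5 (illustrative)
cuts = np.array([0.0, 0.5, 1.0, 1.5])
D = (Dstar > 0).astype(int)
# digitize bins on the left; the boundary convention is immaterial for continuous z
y1 = np.digitize(z1, cuts) + 1
y0 = np.digitize(z0, cuts) + 1
y = np.where(D == 1, y1, y0)        # only one potential outcome is observed
print(D.mean(), np.bincount(y, minlength=6)[1:])
```

Feeding the observed (y, D, x, w) from such a simulation to the posterior simulator of section 3.3 is exactly the exercise the paper carries out.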

With this experimental design, the number of treated versus untreated observations is well-balanced, as 51% of the sample points are assigned to the treatment group. Of those sample points that are assigned to the treatment group, 5%, 5%, 8%, 10% and 71% are associated with ordered outcomes of y^{(1)} = 1, 2, 3, 4 and 5, respectively. Likewise, for those observations that do not receive treatment, 46%, 14%, 13%, 10% and 16% of them fall into the categories of y^{(0)} = 1, 2, 3, 4 and 5, respectively. We consider this design to be reasonably typical of actual empirical situations, where the outcome variables are not uniformly distributed over the set of possible choices.

We fit our model using the posterior simulator described in section 3.3, run the algorithm for 3,000 iterations, and discard the first 600 draws as the burn-in period. To illustrate the performance of the algorithm, we plot in Figure 1 the lagged autocorrelations up to order 100 for several selected parameters: β^{(D)}_0, β^{(1)}, α^{(0)}_3 and ρ^{(1)}. The lagged autocorrelation plots are a useful way to assess the mixing of the parameter chains - if the lagged autocorrelations remain close to unity, for example, then the posterior simulator only makes small local movements from iteration to iteration, resulting in inaccurate posterior estimates. As shown in Figure 1, the lagged autocorrelations drop away reasonably quickly for all the selected parameters, suggesting that posterior quantities can be approximated reasonably accurately with only a moderate number of posterior simulations.

Figure 1 about here

As discussed in section 3.2, one potential concern about working with the reparameterized model is that we need to impose priors directly on the transformed parameters instead of the structural parameters. This is an important issue because priors that look suitable for the transformed parameters may turn out to imply rather unreasonable (and possibly quite informative) priors for the structural parameters. For this generated data experiment, we employ the priors described in Section 3.1.
We can calculate the implied priors for the structural parameters by first sampling from the priors for the transformed parameters, inverting to obtain the values of the structural parameters, and then smoothing the collection of structural parameter values to obtain their approximate marginal prior densities. To demonstrate this process, we plot in Figure 2 the marginal priors and posteriors for the selected parameters β^{(D)}_0, β^{(1)}, α^{(0)}_3 and ρ^{(1)}. As can be seen clearly from the graphs, the prior densities for all the parameters are almost completely flat over the regions where the


More information

Methods Lunch Talk: Causal Mediation Analysis

Methods Lunch Talk: Causal Mediation Analysis Methods Lunch Talk: Causal Medaton Analyss Taeyong Park Washngton Unversty n St. Lous Aprl 9, 2015 Park (Wash U.) Methods Lunch Aprl 9, 2015 1 / 1 References Baron and Kenny. 1986. The Moderator-Medator

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

LOGIT ANALYSIS. A.K. VASISHT Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi

LOGIT ANALYSIS. A.K. VASISHT Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi LOGIT ANALYSIS A.K. VASISHT Indan Agrcultural Statstcs Research Insttute, Lbrary Avenue, New Delh-0 02 amtvassht@asr.res.n. Introducton In dummy regresson varable models, t s assumed mplctly that the dependent

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10)

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10) I. Defnton and Problems Econ7 Appled Econometrcs Topc 9: Heteroskedastcty (Studenmund, Chapter ) We now relax another classcal assumpton. Ths s a problem that arses often wth cross sectons of ndvduals,

More information

Marginal Effects in Probit Models: Interpretation and Testing. 1. Interpreting Probit Coefficients

Marginal Effects in Probit Models: Interpretation and Testing. 1. Interpreting Probit Coefficients ECON 5 -- NOE 15 Margnal Effects n Probt Models: Interpretaton and estng hs note ntroduces you to the two types of margnal effects n probt models: margnal ndex effects, and margnal probablty effects. It

More information

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1 On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

Andreas C. Drichoutis Agriculural University of Athens. Abstract

Andreas C. Drichoutis Agriculural University of Athens. Abstract Heteroskedastcty, the sngle crossng property and ordered response models Andreas C. Drchouts Agrculural Unversty of Athens Panagots Lazards Agrculural Unversty of Athens Rodolfo M. Nayga, Jr. Texas AMUnversty

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

U-Pb Geochronology Practical: Background

U-Pb Geochronology Practical: Background U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Learning from Data 1 Naive Bayes

Learning from Data 1 Naive Bayes Learnng from Data 1 Nave Bayes Davd Barber dbarber@anc.ed.ac.uk course page : http://anc.ed.ac.uk/ dbarber/lfd1/lfd1.html c Davd Barber 2001, 2002 1 Learnng from Data 1 : c Davd Barber 2001,2002 2 1 Why

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek Dscusson of Extensons of the Gauss-arkov Theorem to the Case of Stochastc Regresson Coeffcents Ed Stanek Introducton Pfeffermann (984 dscusses extensons to the Gauss-arkov Theorem n settngs where regresson

More information

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger JAB Chan Long-tal clams development ASTIN - September 2005 B.Verder A. Klnger Outlne Chan Ladder : comments A frst soluton: Munch Chan Ladder JAB Chan Chan Ladder: Comments Black lne: average pad to ncurred

More information

Testing for seasonal unit roots in heterogeneous panels

Testing for seasonal unit roots in heterogeneous panels Testng for seasonal unt roots n heterogeneous panels Jesus Otero * Facultad de Economía Unversdad del Rosaro, Colomba Jeremy Smth Department of Economcs Unversty of arwck Monca Gulett Aston Busness School

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

4.3 Poisson Regression

4.3 Poisson Regression of teratvely reweghted least squares regressons (the IRLS algorthm). We do wthout gvng further detals, but nstead focus on the practcal applcaton. > glm(survval~log(weght)+age, famly="bnomal", data=baby)

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30 STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear

More information

SDMML HT MSc Problem Sheet 4

SDMML HT MSc Problem Sheet 4 SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be

More information

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation Econ 388 R. Butler 204 revsons Lecture 4 Dummy Dependent Varables I. Lnear Probablty Model: the Regresson model wth a dummy varables as the dependent varable assumpton, mplcaton regular multple regresson

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott

More information

Statistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals

Statistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals Internatonal Journal of Scentfc World, 2 1) 2014) 1-9 c Scence Publshng Corporaton www.scencepubco.com/ndex.php/ijsw do: 10.14419/jsw.v21.1780 Research Paper Statstcal nference for generalzed Pareto dstrbuton

More information

III. Econometric Methodology Regression Analysis

III. Econometric Methodology Regression Analysis Page Econ07 Appled Econometrcs Topc : An Overvew of Regresson Analyss (Studenmund, Chapter ) I. The Nature and Scope of Econometrcs. Lot s of defntons of econometrcs. Nobel Prze Commttee Paul Samuelson,

More information

Uncertainty as the Overlap of Alternate Conditional Distributions

Uncertainty as the Overlap of Alternate Conditional Distributions Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information