The FIRE Project: FIRE3


Richard Butler, Brigham Young University

3 Identification Strategies

Largely from Lars Lefgren's BYU (Econ 488) class notes. The notation and some of the concepts in this section come from Heckman and Smith (1995) and Chay (2001). Another reference that came to the author's attention after this chapter was drafted is Angrist's Mostly Harmless Econometrics, which has a more sophisticated discussion of these same issues.

3.1 Covariance May Not Be Causal

OLS estimation seems eminently reasonable, but is it also statistically justified as a tool for retrieving causal estimates that can be used for testing theories or predicting outcomes? Note that in the simple regression case (with just one slope variable), the normal equation for β̂_1 yields an estimator that converges to cov(x, y)/var(x), where cov(x, y) is the covariance between x and y, and var(x) is the variance of x. Does a positive covariance mean the relationship is causal, in the sense that as x increases, y will increase? No, covariance does not necessarily indicate causality.

Consider urban, insured fire-loss damage. In cities, for one-alarm fires, relatively few crews show up to battle the blaze. In a two-alarm fire, more crews come, and so on. After we collect a lot of these fire-loss incidents, we regress insured losses on the number of alarms sounded. We find a strong, positive, linear relation: the more alarms, the greater the insured losses. Is the policy implication that we can minimize insured losses by sounding only one alarm for all fires? Of course not. The problem is that anticipated insured losses and the number of alarms sounded are jointly determined. The relationship is not causal. Covariance does not imply causality.

Another example: do low credit scores causally determine insurance claims? If that were the case, shouldn't insurers' loss prevention efforts for some lines of insurance focus on credit counseling?

What constitutes an appropriate x and y varies by the question being addressed and the sample information at hand: one person's x variable may be another person's y variable. As suggested above, there may be covariance between any two variables for a number of reasons. The first, and for our purposes the best, reason for the covariance to be positive (or negative) is that there may actually be a unidirectional causal relationship: x causes y.
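
The point can be illustrated with a short simulation, in the style of the Stata programs used later in these notes. Everything below is hypothetical (the variable names and magnitudes are invented for illustration): an unobserved fire severity drives both the number of alarms sounded and the insured loss, so the regression of losses on alarms has a strong positive slope with no causal content.

//*** STATA sketch: covariance without causality (hypothetical simulated data) ***//
clear
set seed 12345
set obs 1000
gen severity = runiform()                        // unobserved severity of the fire
gen alarms   = 1 + floor(4*severity)             // more severe fires trigger more alarms
gen losses   = 10000*severity + rnormal(0, 500)  // losses driven by severity, not by alarms
regress losses alarms                            // strong positive slope, but not causal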

Note the emphasis on getting the direction of causality correct: it is possible that we have confused the roles of x and y in our analysis. Rather than mistakenly regressing the number of children on accrued dependent dental costs (so that the model assumes accrued dental costs increase the number of children),

children_i = β_0 + β_1 dental expenses_i + µ_i

we should be regressing accrued dependent dental costs on the number of children: the causality is just the reverse of the specification above. The regression immediately above is a case of a reverse causal relationship; the y in our model is actually causing the x, and we have confused their respective roles in the specification.

Frequently, there may be some third variable driving both our observed x and y. Come back to the example of credit scores and insurance losses. Suppose that individuals vary in their aversion to risk, with those who are most risk averse being careful with their credit cards and also being relatively more careful while they are driving. Hence, the negative covariance between credit score and insured losses is actually the result of a more complex interaction with one or more other variables not in our model (namely, risk aversion in this case).

Finally, there may be a persistent covariance because x and y are jointly determined. This is almost always the case in markets where the relationship between price and quantity is being estimated. For example, the amount of term life insurance coverage demanded depends on the premium charged. On the other hand, the number of such policies offered also depends on the premiums. Premiums, and the quantity of term insurance coverage transacted, are jointly determined. (This is frequently called endogeneity.)

3.2 Identification: Causal Relationships With Discrete Treatment Variables

Consider an outcome, home maintenance expenditures, $H, and a binary treatment variable, D, which is a damage rider for tornado-related property losses. The FIRE researcher wants to estimate the impact of a recent mandate that all homeowners policies in the state must offer a tornado-damage rider if the insurer markets insurance in that state (or one could imagine the state government changing insurance in another way: dictating that the rider cost no more than a low, specified maximum amount). The causal response to be estimated is "How does a homeowner maintain her property after she is chosen at random to get the tornado rider?", or "Among those purchasing the rider, what happens when they get the rider relative to what they would have done had they not gotten the rider?" (that is, what is the average treatment effect for everyone, and what is the average treatment effect for those getting the treatment?). For each individual, home maintenance expenditures with the rider are $H^1, and home maintenance expenditures without the rider are $H^0. In the parlance of this literature, the treatment group is those who have purchased the insurance rider, and the control group is those who did not purchase the rider.

The causal issue here is whether the rider induces a change of behavior in those purchasing the rider (moral hazard), or whether those purchasing do so because they already do not take care of their property and so know that they are at greater risk if there is a storm (adverse selection).

The treatment effect for a particular individual is $H^1 - $H^0. The fundamental problem of causal identification is that we generally observe individuals in the treated state or the untreated state (control state), but not both. The estimation problem for the relevant population is to estimate an average treatment effect (ATE):

ATE (everyone) = E($H^1) - E($H^0)

or,

ATE (for the treated) = E($H^1 | D = 1) - E($H^0 | D = 1) = E($H^1 - $H^0 | D = 1)

where the conditional expectation has the same meaning as it did with assumption 3 in the first chapter: E($H^1 | D = 1) is the average expenditure among those choosing to buy the rider, and E($H^0 | D = 1) is the counterfactual that we do not observe: namely, what the average expenditures for those choosing to get the rider would have been if they had not gotten the rider. This notation allows us to consider how the rider might affect behavior.

However, these desirable ATEs listed above are not what we usually get by estimating the average mean differences in our samples. The expected outcome of the treated group is the average of their home maintenance expenses, given that they chose the rider, E($H^1 | D = 1). The expected outcome of the untreated group, those not choosing to buy the rider, is E($H^0 | D = 0). In other words, we see the treated outcome of those who chose treatment, and we see the untreated outcome of those who chose not to be treated. How does this observed difference in means relate to the ATE we want to estimate?

E($H^1 | D = 1) - E($H^0 | D = 0)
  = E($H^1 | D = 1) - E($H^0 | D = 1) + E($H^0 | D = 1) - E($H^0 | D = 0)
  = E($H^1 - $H^0 | D = 1) + E($H^0 | D = 1) - E($H^0 | D = 0)

(Polsky and Nicholson (2003) essentially use this framework to decompose HMO costs.) That is, the observed difference in means between those with and without the rider equals the ATE for those being treated, E($H^1 - $H^0 | D = 1), plus a term representing sample selection, E($H^0 | D = 1) - E($H^0 | D = 0). This sample selection term is the difference in home maintenance expenditures between the counterfactual expenditures of those acquiring the rider, asking what their "as if" behavior would have been without the rider (E($H^0 | D = 1)), and the expenditures of those not acquiring the rider (E($H^0 | D = 0)). The sample selection term is the bias between what is observed and the ATE for the treated.
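
A minimal simulated sketch of this decomposition (the numbers and variable names below are hypothetical): the true treatment effect is set to zero, but low-maintenance households select into the rider, so the simple difference in means is driven entirely by the selection term.

//*** STATA sketch: observed difference = ATE on the treated + selection term ***//
clear
set seed 2718
set obs 5000
gen h0 = 1000 + rnormal(0, 200)        // maintenance without the rider
gen h1 = h0                            // true treatment effect is zero by construction
gen d  = (h0 < 1000)                   // low-maintenance households buy the rider (adverse selection)
gen h  = cond(d, h1, h0)               // observed expenditures
ttest h, by(d)                         // nonzero difference in means despite a zero ATE on the treated
bysort d: summarize h0                 // the selection term: E($H^0|D=1) differs from E($H^0|D=0)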

So to get the observed difference equal to the ATE for the treated group, we need

E($H^0 | D = 1) = E($H^0 | D = 0)

that is, no sample selection between groups on the $H^0 side: the counterfactual behavior of those getting the treatment, E($H^0 | D = 1), would have been the same as that of those not getting the treatment, E($H^0 | D = 0), if the D = 1 group had in fact not gotten the treatment (as if the treatment had been randomly assigned). To get the observed difference to equal the ATE for everyone, we also need

E($H^1 | D = 1) = E($H^1 | D = 0)

or, going in the other counterfactual direction, no sample selection between groups on the $H^1 side. In summary, we get

E($H^1 | D = 1) - E($H^0 | D = 0) = E($H^1) - E($H^0)

only if

E($H^1 | D = 1) = E($H^1 | D = 0) and E($H^0 | D = 1) = E($H^0 | D = 0)

This means that the average difference between treated and untreated individuals corresponds to the treatment effect only if treated and untreated individuals are the same except for treatment status. That is, there is no sample selection.

Suppose that our data generating process is

$H_i = D_i α + X_i β + µ_i

Then, holding X_i constant between groups, we get

E($H^1_i | D = 1, X_i) - E($H^0_i | D = 0, X_i) = α + [E(µ_i | D = 1) - E(µ_i | D = 0)]

Hence, a big problem with biased estimation of the treatment effect (α̂) lies with the unobservables in the model (the term [E(µ_i | D = 1) - E(µ_i | D = 0)]), even when we are holding the observables (X_i) constant.

When are we likely to be able to get rid of the sample selection effect and get consistent ATE estimates? An identifying assumption is what you need to assume in order to identify the parameter of interest. So what is/are the identifying assumptions we can use to help us out?

Random assignment. By construction, selection into treatment is random. This means that selection into the treatment is not related to the ith person's observable (X_i) or unobservable (µ_i) characteristics. What do we identify with random assignment, or what can we learn from a random experiment?

E($H^1 - $H^0 | participated in the random assignment experiment)

Identifying assumption. There should be no sample selection effect: participation (getting the tornado rider) is uncorrelated with unobservable characteristics (things in the µ term of the regression). This would be plausible if participation is uncorrelated with observable characteristics, which we check by examining the differences in the means of the D = 1 and D = 0 groups.
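
A sketch of such a balance check in Stata, on simulated data with hypothetical covariates: under random assignment, regressing treatment status on baseline characteristics should yield an R-squared near zero and a jointly insignificant test.

//*** STATA sketch: balance check under random assignment (simulated data) ***//
clear
set seed 42
set obs 2000
gen age    = 25 + floor(40*runiform())
gen income = exp(10 + rnormal(0, .5))
gen d      = runiform() < .5           // randomly assigned rider
regress d age income                   // linear probability "balance" regression: R-squared near zero
testparm age income                    // joint test: should not reject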

OLS implementation. We model expected expenses as a linear function of observable characteristics and treatment status, and allow non-random selection to affect treatment through a linear function of observable characteristics:

$H_i = X_i β + D_i α + ɛ_i

where we are employing matrix notation for generality, α is the treatment effect, and ɛ is orthogonal to D (treatment status) in this population model. So that

E($H | D = 1, X) = Xβ + α

and

E($H | D = 0, X) = Xβ

Identifying assumption in this regression context. Treatment is uncorrelated with the error term, that is, uncorrelated with the unobservables in the model.

Assessing plausibility. Treatment status (D, getting the rider under the random assignment experiment) is likely uncorrelated with unobservable factors (ɛ) if it is uncorrelated with observable characteristics (X). So check the differences in means of the various variables in X between treatment and control status, or simply regress treatment on other X variables (that we did not use in the outcome, $H, regression) in the data set, expecting to find an R-squared close to zero.

Matching to make the control group comparable to the treatment group. In matching, we simulate random assignment by matching someone with the exact same X in both the D = 1 and D = 0 groups. (In practice, we match them as closely as we can.) This allows for an arbitrary (non-linear) relationship between X and the outcome. Treatment status is assumed to be random given a particular value of X. That is, the comparison is between a male, 33 years of age, with a college degree, married, and with a full-time job who has the rider (D = 1), and a male, 33 years of age, with a college degree, married, and with a full-time job who does not have the rider (D = 0). (You can see the problem of finding an exact match if we additionally included the number of children and detailed job occupation.) Again, it is assumed that conditioning on the same X removes sample selection effects, so that:

E($H^1 | D = 1, X) = E($H^1 | D = 0, X) and E($H^0 | D = 1, X) = E($H^0 | D = 0, X)

A matching example that mimics these assumptions has been the study of identical twins who ended up with different levels of education. Since many other characteristics are held constant within each twin pair (gender, age, genetic endowment, etc.), the resulting differences in earnings associated with differences in education represent a matched analysis of the effect of education on earnings. (Of course, what is not matched is which twin came out first; do you think that this could make a difference?)
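
A sketch of matching on covariates using Stata's teffects nnmatch command (not mentioned in the text; the data and names are simulated and hypothetical): treatment is as-if random conditional on age and income, and nearest-neighbor matching on those covariates recovers the treatment effect built into the simulation.

//*** STATA sketch: nearest-neighbor matching on covariates (simulated data) ***//
clear
set seed 99
set obs 4000
gen age    = 25 + floor(40*runiform())
gen income = exp(10 + rnormal(0, .5))
gen d      = runiform() < invlogit(-4 + .05*age + .00003*income)   // selection on observables only
gen h      = 500 + 10*age + .01*income + 200*d + rnormal(0, 50)    // true effect is 200
teffects nnmatch (h age income) (d), biasadj(age income)           // matched ATE estimate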

Identifying assumption. The treatment (the insurance rider, D, in our example) is uncorrelated with the error terms of matched observations.

Assessing plausibility. Do matched observations have comparable characteristics? If we matched on gender, age, schooling, and work status, is it also the case that they have similar occupations, wage and salary income, and metropolitan status (supposing these latter were unmatched characteristics)? If these unmatched characteristics are also similar, then the matching procedure is much more convincing.

Propensity score matching. Since there is a dimensionality problem with matching when there are many variables, or many values within each variable, matching by means of the propensity score is a convenient (sometimes essential) alternative. The Propensity Score Theorem (Rosenbaum and Rubin, 1983) says that if D is randomly assigned conditional on X (this is the notion behind matching: that for two observationally identical individuals, we can treat the insurance rider D as if it were randomly assigned), then it is randomly assigned conditional on the propensity score. The propensity score is the likelihood of getting treated (D = 1) conditional on observables X. You estimate it by running a probit or logit on the relevant X. From this, you get a predicted likelihood (p̂_i) of treatment for everyone in the sample, given their characteristics (again, their X_i). Matching someone who got treated with someone not treated but with the same propensity score (predicted likelihood of treatment given X_i) results in a match. Then you can get the treatment effect simply by comparing the means of the matched sample. Or you might incorporate the propensity score (or a polynomial of propensity scores, p̂_i) into your basic estimation ($H) equation.

Assessing plausibility. Same as for matching in general. Propensity scores can also give a sense of the non-random nature of the treatment (D): generate propensity scores for the whole sample, look at the boxplots of those scores by treatment status (for the D = 1 and D = 0 groups), and see how much overlap there is in the distributions. The more overlap, the more like random assignment (on the basis of observables) there is, and the better the comparison.

//*** STATA program to estimate propensity scores and do boxplots ***//
logit D <BASELINE VARIABLES, X>
predict prob_treat
graph box prob_treat, over(D)

//*** SAS program to estimate propensity scores and do boxplots ***//
proc logistic data=patient_variables descending;
  model D = <BASELINE VARIABLES, X>;
  output out=propensity_scores pred=prob_treat;
run;
proc boxplot data=propensity_scores;
  plot prob_treat*D;
run;
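
In more recent versions of Stata, the logit-plus-matching steps can also be done in one command with teffects psmatch; the sketch below (simulated data, hypothetical names and magnitudes) mirrors the program above and then matches on the estimated score.

//*** STATA sketch: propensity score matching with teffects psmatch (simulated data) ***//
clear
set seed 314
set obs 4000
gen x1 = rnormal()
gen x2 = rnormal()
gen d  = runiform() < invlogit(.5*x1 - .5*x2)
gen h  = 1000 + 100*x1 + 100*x2 + 150*d + rnormal(0, 50)   // true effect is 150
logit d x1 x2
predict phat, pr                       // estimated propensity score
graph box phat, over(d)                // overlap check, as in the program above
teffects psmatch (h) (d x1 x2, logit)  // ATE by matching on the propensity score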

Before and after comparison (i.e., simple difference). This strategy takes the difference in outcomes for the same individual before and after treatment.

Treatment effect = E($H^1 | D = 1, after) - E($H^0 | D = 1, before)

An important issue is what those receiving treatment would have done in the absence of treatment; that is, is it the case that they would have done nothing but for the treatment, namely, that E($H^0 | D = 1, after) = E($H^0 | D = 1, before)? There is good empirical evidence that this last condition does not always hold. Indeed, the "Ashenfelter dip" is the often-noted finding (first by Ashenfelter, 1978) that the average earnings of participants in employment and training programs usually decline during the period just prior to participation. That is, participants' behavior is different from nonparticipants' behavior even before participation arises. In terms of regression analysis, the difference in outcomes by treatment status is estimated as an interrupted time series:

$H_t = β_1 + β_2 trend_t + β_3 after_t + β_4 (trend_t × after_t) + ɛ_t

where after is a dummy variable equaling one in the second period (of the before/after comparison sample). The coefficient on the trend_t × after_t variable, β_4, indicates the effect of the rider on home maintenance expenses, if the identifying strategy is working.

Identifying assumption. The timing of treatment is uncorrelated with other unobserved determinants of the outcome (no Ashenfelter dip). Intuitively, is the level of the outcome immediately before treatment reflective of what the outcome would have been immediately after treatment in the absence of treatment?

Assessing plausibility. Was the timing of the treatment mostly random and unanticipated? Was the trend changing in anticipation of the treatment? Are observables changing at the time of treatment?
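
A simulated sketch of this interrupted time series regression (the numbers are hypothetical): a level and trend break is built in at the treatment date and then estimated with the equation above.

//*** STATA sketch: before/after interrupted time series (simulated data) ***//
clear
set seed 7
set obs 120                             // 120 months for one representative household
gen trend = _n
gen after = trend > 60                  // rider purchased at month 60
gen trend_after = trend*after
gen h = 800 + 2*trend + 50*after + 1*trend_after + rnormal(0, 20)
regress h trend after trend_after       // the coefficient on trend_after is the estimated trend break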

Panel Data Techniques: Fixed Effects and Lagged Dependent Variables. As we saw in the examples above, causal inference is often ruined by unobservable confounders: in FIRE1, omitted variables or classical measurement error biased the estimates; in the examples above, sample selection biased the estimates. Having repeated observations on the same individuals can help get rid of some of the unobserved, confounding influences (that is, it helps to have a panel data set with home maintenance expenditures each year for 10 years, for each of 1000 homeowners). Let A_i be the time-invariant, unobserved factors for individual i, and assume

E($H^0_it | A_i, X_it, D_it) = E($H^0_it | A_i, X_it)

where time trends may now be included in the X_it matrix of variables. The above equality says that, controlling for the time-invariant factors (completed schooling before the sample began, family background including parents' use of insurance when growing up, genetic factors, etc.), the purchase of the rider, D_it, is as if randomly assigned conditional on A_i and X_it. That is, the expected value of home maintenance expenditures before the insurance rider is actually purchased, $H^0_it, will not depend on whether the rider (D_it) was subsequently purchased, once we control for A_i and X_it. Since X_it is observed, it is easy to condition on those variables, so we are left with conditioning on A_i as well. How do we do that? As long as the A_i have a linear impact on the outcome ($H), we can demean the data, household by household, just as we did in FIRE1 for our multiple regression example using Y, X_1, X_2, except that with panel data sets we take deviations within each household rather than deviations within the whole sample (which we did in FIRE1). Or, alternatively, we can just difference the data, year by year, within each household. There are advantages to both (see Wooldridge, 2015). If the error term in our equation is homoskedastic and serially uncorrelated, it is more efficient to use fixed effects models rather than differencing models. (Statistical routines for demeaning within cohorts are given below in the appropriate FIRE lectures.)

Alternatively, researchers have also conditioned on lagged values of the dependent variable, as follows:

E($H^0_it | $H_{i,t-1}, X_it, D_it) = E($H^0_it | $H_{i,t-1}, X_it)

assuming that such conditioning generates an as-if random assignment of the treatment. (There is an old literature on why this might be the case: whatever makes home maintenance expenditures unique this period also made home maintenance expenditures unique last period within each household.) Since these two forms of conditioning are not nested (one is not a special case of the other), researchers have tried to generalize this conditioning with the following assumption:

E($H^0_it | A_i, $H_{i,t-1}, X_it, D_it) = E($H^0_it | A_i, $H_{i,t-1}, X_it)

While this is less restrictive than the previous two assumptions, note that it places more restrictions on the error structure required for consistent estimation of the model. (See Angrist and Pischke, 2009, chapter 5.)

Assessing plausibility. For all three versions of the model above, there are several ways to test the plausibility of the identifying assumptions. With sufficient data over time, one can estimate individual cohort time trends (time trends for each i group) and examine whether the results are robust to those trends. Where plausible, this should always be done. We can also test the plausibility of the assumptions making the D_it treatments as if randomly assigned by doing Granger (1969) type causality tests. This involves testing whether the treatment, the purchase of the insurance rider D_it this year, affected past behavior (it should not, if the identification strategy worked), as well as future behavior. The estimation equation for a Granger test would look something like:

$H_{i,t} = A_i + X_{i,t} β + Σ_{φ=0}^{m} γ_{-φ} D_{i,t-φ} + Σ_{φ=1}^{q} γ_{+φ} D_{i,t+φ} + µ_{i,t}

Treatments given in the future (namely, the Σ_{φ=1}^{q} γ_{+φ} D_{i,t+φ} terms) should have no influence on today's behavior, so all of the γ_{+φ} coefficients should be zero.
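
A simulated panel sketch (hypothetical names and magnitudes): households with a low time-invariant maintenance factor A_i select into the rider, so pooled OLS is biased, while the within-household (fixed effects) estimator recovers the built-in effect; a lead of the treatment is then included as a Granger-style check.

//*** STATA sketch: household fixed effects and a Granger-style lead check (simulated panel) ***//
clear
set seed 11
set obs 1000                                    // 1000 households
gen hhid  = _n
gen a_i   = rnormal(0, 100)                     // time-invariant household factor
gen buyer = runiform() < invlogit(-a_i/100)     // low-a_i households are more likely to buy the rider
expand 10                                       // 10 years of data per household
bysort hhid: gen year = 2000 + _n
gen d = buyer & (year >= 2005)                  // rider in force from 2005 on for buyers
gen h = 1000 + a_i + 75*d + rnormal(0, 30)      // true effect is 75
regress h d                                     // pooled OLS: biased by selection on a_i
xtset hhid year
xtreg h d i.year, fe                            // fixed effects recover the effect
xtreg h F1.d d i.year, fe                       // Granger-style check: lead of d should be insignificant
test F1.d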

Difference-in-differences. This is a special case of panel data sets, where the treatment is aggregated across individuals. We look at the before and after change for a treated group and compare it to the before and after change for a control (untreated) group.

Treatment effect = [E($H^1 | D = 1, after) - E($H^0 | D = 1, before)] - [E($H^0 | D = 0, after) - E($H^0 | D = 0, before)]

The strategy here is to control for the trend in the outcome (given by the before/after difference for the control group) not attributable to treatment. If the trend change in home maintenance for the treatment group without treatment is the same as the trend change in home maintenance for the control group, or

[E($H^0 | D = 1, after) - E($H^0 | D = 1, before)] = [E($H^0 | D = 0, after) - E($H^0 | D = 0, before)]

then, after substituting this into the equation above, we get

Treatment effect = E($H^1 | D = 1, after) - E($H^0 | D = 1, after)

the treatment effect on the treated group. The corresponding regression model is

$H_{i,t} = β_1 + β_2 after_t + β_3 treated_i + β_4 (after × treated)_{i,t} + ɛ_{i,t}

and, in before/after change form,

Δ$H_i = β_2 + β_4 treated_i + Δɛ_i

The change in home maintenance expenditures after the institution of the damage rider requirement (where those choosing the rider are the treated group, D = 1) equals the increase common to both the treated and control groups, β_2, plus the differential change for those getting the treatment, β_4. It is the β_4 coefficient that measures the treatment effect for those getting treated.

Identifying assumption. The trend of the control group maps out what would have happened to the trend of the treatment group in the absence of treatment.

Assessing plausibility. Do the pre-treatment trends of the treatment and control groups look similar? To check, run the following regression on the pre-treatment data:

trend check: $H_{i,t} = α_1 + α_2 trend_t + α_3 D_i + α_4 (D_i × trend_t) + µ_{i,t}

where D is a dummy variable for those who will eventually choose the tornado rider. Then the joint test for differences in trend is the test that α_3 = 0 and α_4 = 0.
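
A simulated difference-in-differences sketch (hypothetical numbers): both groups share a common before/after change, and the treated group gets an extra shift equal to the treatment effect, which the interaction coefficient recovers.

//*** STATA sketch: difference-in-differences regression (simulated data) ***//
clear
set seed 21
set obs 2000
gen id      = _n
gen treated = runiform() < .5
expand 2                                // two periods per household
bysort id: gen after = _n - 1           // 0 = before, 1 = after
gen h = 900 + 40*after + 30*treated + 60*after*treated + rnormal(0, 25)
regress h i.after##i.treated            // the interaction coefficient is the DiD estimate (about 60)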

Regression discontinuity. Treatment is strictly assigned on the basis of an observed index: if individuals are above a cutoff, they receive the treatment; if just below, they receive no treatment. For example, suppose there is a development bank that makes disaster aid available to local regions within less developed countries whose risk management index, RMI, exceeds a certain value, say 30 (suppose the bank has evaluated the availability of aid for 1000 such disaster incidents, scattered widely). To assess the value of disaster aid in promoting subsequent economic recovery, we compare the real economic growth of regions just below the cutoff value (barely not qualifying for the aid) with that of regions whose RMI is just above 30, and hence barely eligible for the aid, at times when disasters strike and such aid is needed. The difference between the just-below and just-above growth levels identifies the treatment effect. In practice, one controls for the level of the index as well. Is the effect identified for all regions? No, just for those near the cutoff; hence, it is a local average treatment effect (LATE). A regression for this type of identification strategy is

growth_{i,t} = β_1 + β_2 RMI_{i,t} + β_3 D_{i,t} + β_4 (RMI × D)_{i,t} + ɛ_{i,t}

where D_{i,t} = 1 if the region receives aid (at the time of the disaster, it must have an RMI of 30 or greater), and D_{i,t} = 0 if the region does not receive aid (RMI < 30) at the time disaster strikes. Assume that the cutoff value of 30 is well established and that many regional entities' RMIs populate either side of the cutoff. The regression is estimated for regional governments just around the cutoff value. The causal effect of receiving aid on growth is given by β̂_3 and β̂_4.

Identifying assumption. The unobserved characteristics on either side of the discontinuity are comparable.

Assessing plausibility. Show that observables are comparable on either side of the cutoff. Print out and examine the data to confirm there are no suspicious jumps in the density of observations near the cutoff.
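
A simulated regression discontinuity sketch (hypothetical numbers, cutoff at RMI = 30): the jump in growth at the cutoff, estimated in a window around it, is the local treatment effect built into the data.

//*** STATA sketch: regression discontinuity at RMI = 30 (simulated data) ***//
clear
set seed 30
set obs 1000
gen rmi    = 20 + 20*runiform()                           // index between 20 and 40
gen d      = rmi >= 30                                    // aid received above the cutoff
gen growth = 1 + .05*(rmi - 30) + 2*d + rnormal(0, .5)    // jump of 2 at the cutoff
regress growth c.rmi##i.d if inrange(rmi, 25, 35)         // estimate in a window around the cutoff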

Instrumental variables. Return to our regression of home maintenance on getting the tornado rider for the homeowner's policy, discussed earlier:

$H_i = β_1 + D_i α + ɛ_i

Further, let's assume that treatment D is correlated with the residual (for any one of a number of reasons, including a simultaneous equation structure, measurement error, or omitted variable bias in the structural, data generating process). This violates the standard OLS assumption, leading to biased estimates of α:

plim α̂_OLS = α + cov(D, ɛ)/var(D)

We can overcome this problem if we have an instrument, Z, that is correlated with treatment but uncorrelated with the residual:

cov(D, Z) ≠ 0;  cov(ɛ, Z) = 0

The intuition underlying this condition is that Z affects the outcome, home maintenance expenditures, only through its correlation with the insurance rider. Using this instrument, we can perform two stage least squares, as indicated in the first chapter. We regress treatment on the instrument in what is called a first-stage regression:

D_i = γ_1 + Z_i γ_2 + ν_i

In the second stage, we regress the outcome on the predicted treatment:

$H_i = β_0 + D̂_i α + ɛ_i    [where D̂_i = γ̂_1 + Z_i γ̂_2]

The IV estimate converges to the following:

plim α̂_IV = α + cov(Z, ɛ)/cov(Z, D)

Note that if the IV assumptions hold, the last term equals zero. When will the IV estimator be more or less biased than the OLS estimator?

Identifying assumption. The instrumental variable affects the outcome, home maintenance expenditures, only through the decision to acquire the insurance rider. Suppose you had data on whether the insured individual's parents bought insurance riders. This might work as an instrument if it increased the likelihood of a buyer acquiring a tornado rider, but had no effect on home maintenance expenditures otherwise. The instrument has to be uncorrelated with the unobserved determinants of the outcome.

Assessing plausibility. You need a compelling reason why the instrumental variable affects only the rider, with no other influence on home maintenance expenditures. Show that the instrument is uncorrelated with other observable determinants of the outcome.
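
A simulated two stage least squares sketch (hypothetical names and magnitudes): an unobservable drives both the rider purchase and maintenance, so OLS is biased, while an instrument that shifts purchases but does not enter the outcome equation recovers the built-in effect; ivregress 2sls performs both stages.

//*** STATA sketch: two stage least squares with a parental-purchase instrument (simulated data) ***//
clear
set seed 8
set obs 5000
gen z = runiform() < .5                       // instrument: parents bought riders
gen u = rnormal()                             // unobservable driving both d and h
gen d = runiform() < invlogit(-1 + 2*z + u)   // rider purchase
gen h = 1000 + 100*d + 200*u + rnormal(0, 50) // true effect is 100
regress h d                                   // OLS: biased upward by u
ivregress 2sls h (d = z)                      // IV: close to the true effect of 100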

Heterogeneous Treatment Effects. So far in these first two lectures, we have assumed that the model coefficients are stable across observations. Suppose otherwise, and allow treatment effects to differ across individuals, so that we now have i subscripts on the coefficients as well as on the variables. Then our data generating regression looks like:

$H_i = β_i + D_i α_i + ɛ_i

What does the IV estimate of α converge to? To simplify the issue, assume that Z takes two values (0 and 1) and D takes two values (0 and 1). Let's further assume that there are two types of individuals, each forming one half of the population.

Type A's (T = A) treatment status responds to Z: Pr(D = 1 | Z = 1, T = A) > Pr(D = 1 | Z = 0, T = A).
(prob(you got a rider | parents did) > prob(you got a rider | parents did not))

Type B's (T = B) treatment status does not respond to Z: Pr(D = 1 | Z = 1, T = B) = Pr(D = 1 | Z = 0, T = B).
(the probability of a rider is not affected by parents' purchases)

We'll assume that α is the same within types but differs across types (i.e., α_A does not equal α_B). Then what does this estimator converge to?

plim(α̂) = {[.5 E($H | Z = 1, T = A) + .5 E($H | Z = 1, T = B)] - [.5 E($H | Z = 0, T = A) + .5 E($H | Z = 0, T = B)]} / {[.5 Pr(D = 1 | Z = 1, T = A) + .5 Pr(D = 1 | Z = 1, T = B)] - [.5 Pr(D = 1 | Z = 0, T = A) + .5 Pr(D = 1 | Z = 0, T = B)]}

Or, since type B's outcomes and treatment probabilities do not vary with Z,

plim(α̂) = {E($H | Z = 1, T = A) - E($H | Z = 0, T = A)} / {Pr(D = 1 | Z = 1, T = A) - Pr(D = 1 | Z = 0, T = A)}

Or,

plim(α̂) = {α_A Pr(D = 1 | Z = 1, T = A) - α_A Pr(D = 1 | Z = 0, T = A)} / {Pr(D = 1 | Z = 1, T = A) - Pr(D = 1 | Z = 0, T = A)}

Or,

plim(α̂) = α_A

Note that the instrumental variable here gives us a local average treatment effect: this estimator converges to the treatment effect for that part of the population whose treatment status is sensitive to variations in the instrumental variable (the A's in the population, but not the B's). This is a general result for instrumental variables.
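
A simulated sketch of this LATE result (hypothetical numbers): type A individuals take the rider only when the instrument is switched on, type B never take it, and the IV estimate converges to the type A effect even though a different effect is assigned to type B.

//*** STATA sketch: IV as a local average treatment effect (simulated data) ***//
clear
set seed 55
set obs 10000
gen typeA = _n <= 5000                        // half the population responds to the instrument
gen z     = runiform() < .5
gen d     = typeA & z                         // type B never purchase the rider
gen h     = 1000 + 100*d*typeA + 300*d*(1-typeA) + rnormal(0, 50)   // alpha_A = 100, alpha_B = 300
ivregress 2sls h (d = z)                      // converges to alpha_A, about 100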

Structural Econometric Models. There is no clear dividing line between structural econometric modeling and non-structural modeling. Basically, structural modeling tries to tie the estimation equation more fully to economic theory (and sometimes statistical theory) in explaining how variations in x_k affect the outcome y and, sometimes, how the unobservable variables µ affect y as well. As noted in Reiss and Wolak (2007), there are three things that this attention to economic structure in the statistical modeling of relationships attempts to accomplish. First, structural modeling may be able to make inferences about unobserved behavioral relationships that could not be retrieved from nonexperimental data without reference to the structure. Such use of modeling is ubiquitous in FIRE projects. Second, structural models provide justification for speculation about counterfactual outcomes, or provide policy simulations. This is inevitably done in actuarial science and in studies of regulation, most often implicitly. Finally, structural models are often employed to distinguish the implications of competing theories. But as Reiss and Wolak note, the only sense in which one can test the two theories is to ask whether one of these ways of combining the same economic and stochastic primitives provides a markedly better description of observed or out-of-sample data. But even when physics, and metaphysics, conspire to make our theories unassailably true, the data used to estimate those theories always include unobservables that are not wholly accounted for by our theories. Therefore, all of the identification insights discussed above apply to the best of structural models. Hence, we will not distinguish further between structural and non-structural models below, though we will discuss Butler and Lambson (2015), both because their estimation equation derives from a structural specification (their first-order condition for portfolio maximization) and because its structural specification (a non-expected utility theory) implicitly contradicts assumptions implicit in much of the subsequent empirical studies we will be examining.

But first, some advice and a couple of useful, if somewhat out of the way, tools.


More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

PSC 504: Instrumental Variables

PSC 504: Instrumental Variables PSC 504: Instrumental Variables Matthew Blackwell 3/28/2013 Instrumental Variables and Structural Equation Modeling Setup e basic idea behind instrumental variables is that we have a treatment with unmeasured

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap of the course Introduction.

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

The FIRE Project: FIRE2

The FIRE Project: FIRE2 The FIRE Project: FIRE2 Richard Butler, Brigham Young University 2 Population Model and Insights into Informative Relationships Again, the data generating process (DGP) is a description of how an outcome,

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Analysis of Panel Data: Introduction and Causal Inference with Panel Data

Analysis of Panel Data: Introduction and Causal Inference with Panel Data Analysis of Panel Data: Introduction and Causal Inference with Panel Data Session 1: 15 June 2015 Steven Finkel, PhD Daniel Wallace Professor of Political Science University of Pittsburgh USA Course presents

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

Lecture Notes 12 Advanced Topics Econ 20150, Principles of Statistics Kevin R Foster, CCNY Spring 2012

Lecture Notes 12 Advanced Topics Econ 20150, Principles of Statistics Kevin R Foster, CCNY Spring 2012 Lecture Notes 2 Advanced Topics Econ 2050, Principles of Statistics Kevin R Foster, CCNY Spring 202 Endogenous Independent Variables are Invalid Need to have X causing Y not vice-versa or both! NEVER regress

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics STAT-S-301 Experiments and Quasi-Experiments (2016/2017) Lecturer: Yves Dominicy Teaching Assistant: Elise Petit 1 Why study experiments? Ideal randomized controlled experiments

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Econometrics Problem Set 3

Econometrics Problem Set 3 Econometrics Problem Set 3 Conceptual Questions 1. This question refers to the estimated regressions in table 1 computed using data for 1988 from the U.S. Current Population Survey. The data set consists

More information

Education Production Functions. April 7, 2009

Education Production Functions. April 7, 2009 Education Production Functions April 7, 2009 Outline I Production Functions for Education Hanushek Paper Card and Krueger Tennesee Star Experiment Maimonides Rule What do I mean by Production Function?

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply

ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply Dimitri Dimitropoulos Fall 2014 UToronto 1 / 55 References RW Section 3. Wooldridge, J. (2008). Introductory Econometrics:

More information

Ch 7: Dummy (binary, indicator) variables

Ch 7: Dummy (binary, indicator) variables Ch 7: Dummy (binary, indicator) variables :Examples Dummy variable are used to indicate the presence or absence of a characteristic. For example, define female i 1 if obs i is female 0 otherwise or male

More information

Teaching Causal Inference in Undergraduate Econometrics

Teaching Causal Inference in Undergraduate Econometrics Teaching Causal Inference in Undergraduate Econometrics October 24, 2012 Abstract This paper argues that the current way in which the undergraduate introductory econometrics course is taught is neither

More information