Are electric toothbrushes effective? An example of structural modelling.

Martin Browning
Department of Economics, University of Oxford
Revised, February

These notes are solely for the use of students on the Structural Modelling course. Please do not distribute them or quote them without permission.

1 Background.

Very broadly, there are four classes of explanation for a correlation between two variables $x$ and $y$:

1. Chance. Any two vectors of the same length will almost surely have a non-zero correlation.
2. There is a third factor, $w$, that causes both $x$ and $y$. A classic example is if $x$ and $y$ are trending time series variables. In that case, $w$ is time.
3. $x$ causes $y$.
4. $y$ causes $x$.

These are not mutually exclusive. For example, we might believe that both $x$ and $w$ could cause $y$. In that case, when testing whether $x$ actually causes $y$, we would want to control for $w$. A famous example of this is the link between smoking ($x$) and lung cancer ($y$). A strong correlation was found in epidemiological studies in the early 1950s. These showed that patients with lung cancer were much more likely to be heavy smokers than the general population. The correlation was so strong that chance could be ruled out. The immediate suspicion was that smoking was causing cancer ($x \to y$). The tobacco companies responded by, amongst other things, employing the greatest statistician of the 20th century, R. A. Fisher, to point out the weaknesses in this causal inference. One of his arguments was that, perhaps, cancer causes smoking ($y \to x$). It was easy to rule this out on the grounds that most people started smoking in their teens, whereas the onset of cancer almost certainly had to be later than this. Thus a temporal-sequence argument was used to rule out $y \to x$. Fisher's most telling argument was that perhaps some people had a genetic predisposition towards both smoking and lung cancer. Thus the third factor $w$ is genetic makeup. This was much more difficult to rule out in observational studies. One piece of evidence against it was that pipe smokers were less likely to contract lung cancer than cigarette smokers with the same dosage. This started to make a genetic cause less credible, but the genetic explanation was only rejected almost universally when randomized controlled experiments on beagles showed that dogs who systematically had their lungs filled with tobacco smoke were more likely to contract cancer.

2 A model of brushing.

We shall use a transparent example to illustrate structural modelling. Soon after electric toothbrushes were introduced there was some evidence that people who used electric toothbrushes (ET) had healthier teeth than those who used a regular toothbrush (RT). There are many possible reasons for this correlation. To rule out 1 above, we would test whether the ET effect was significantly different from zero. An example of 2 above would be if high income leads people to buy electric toothbrushes more and also leads them to spend more on dentistry. This is an example of selection on an observable (assuming we can observe income). It is relatively easy to take account of selection on observables.
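
The income example of selection on an observable can be made concrete with a small simulation. This is a minimal sketch under made-up assumptions, not data from any study: half the population is "rich", rich people buy an ET with probability 0.8 (everyone else with probability 0.2), and being rich raises a hypothetical health index by one unit through spending on dentistry, while the ET itself does nothing. Comparing ET and RT users separately within each income group is one simple way of taking account of the observable.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def simulate(n=20000, seed=1):
    """Hypothetical population in which income drives both ET purchase and dental health.
    The ET has no effect on health in this sketch."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rich = rng.random() < 0.5                          # the observable: high income
        et = rng.random() < (0.8 if rich else 0.2)         # richer people buy ETs more often
        health = (1.0 if rich else 0.0) + rng.gauss(0.0, 1.0)  # ...and spend more on dentistry
        rows.append((rich, et, health))
    return rows


def et_gap(rows):
    """Mean health of ET users minus mean health of RT users."""
    return mean([h for _, et, h in rows if et]) - mean([h for _, et, h in rows if not et])


def within_income_gap(rows):
    """Average of the ET-minus-RT gaps computed separately within each income group."""
    return mean([et_gap([r for r in rows if r[0] == rich]) for rich in (False, True)])


rows = simulate()
print(f"raw ET-RT gap:           {et_gap(rows):.3f}")
print(f"gap within income group: {within_income_gap(rows):.3f}")
```

With these numbers the raw gap should be close to 0.6 (the share of rich people is 0.8 among ET users and 0.2 among RT users), while the within-income gap should be close to zero. No such split is available for selection on an unobservable, since we cannot condition on something we do not see.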

Much more difficult to deal with is selection on an unobservable. This arises in our context if, for example, people who care more about their teeth bought an ET thinking that it was better for their teeth. But these people will also have had better dental health habits (avoiding sweets, flossing, brushing after every meal, etc.). So even if the ET is no better than the RT, we would observe better health amongst those who use one. Another possible explanation for the toothbrush correlation is causal: ETs really do make brushing more effective. Note that selection on an unobservable and causality could both be operating at the same time, in which case the coefficient on using an ET overstates the causal impact. With non-experimental data the way we would proceed is to try to find some observable variable that influences the choice of ET/RT but does not directly affect the healthiness of teeth (an instrument for using an ET). Such an instrument is quite difficult to find.

The effectiveness of ETs is a really big issue in dentistry, so the next step was to conduct controlled experiments in which some people were randomly assigned an ET (the treatment group) and others an RT (the control group). The broad conclusion from this was that, on average, those who had been assigned an ET had healthier teeth at the end of the experimental period than those who were assigned an RT.

2.1 The model.

The experimental result is mildly interesting but it raises more questions than it answers. To see why, consider a simple model in which a person cares about the health of their teeth, denoted $h$, and the time taken brushing, $t$.[1] One of the most fruitful aspects of modelling in microeconomics is to first consider constraints and preferences separately and then to bring them together to give observable behaviour.[2]

[1] In practice we should consider different types of healthiness and different decisions that impact on dental health (such as sugar intake). But this simple model will serve to illustrate all the points of interest.
[2] Many researchers in other social sciences think that this is nonsense. They believe that what is available conditions what you want and that we cannot consider preferences and constraints separately. This would invalidate most economic models.

2.1.1 Constraints.

We first specify constraints, in this case two production functions. Using a regular toothbrush, health is produced by the production function $h = f(t)$. Using an electric toothbrush, the production function is $h = g(t)$. We shall say that an ET is effective if $g(t) > f(t)$ for all $t > 0$. An ET is ineffective if $g(t) = f(t)$ for all $t$. Figure 1 illustrates a case in which an ET is more effective than an RT.[3]

[3] Note that we allow that too much brushing can be a bad thing. Also, the mode is irrelevant if the agent never brushes.

Most people would interpret the experimental finding above as proving that ETs are effective. As we shall see, this is not necessarily the case.
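
The two definitions can be checked mechanically for candidate technologies. The sketch below does this on a finite grid of brushing times, so it is a check rather than a proof; the three production functions are hypothetical choices for illustration, not estimates.

```python
import math


def is_effective(f, g, grid):
    """Definition from the text, checked on a finite grid: g(t) > f(t) at every grid point t > 0."""
    return all(g(t) > f(t) for t in grid if t > 0)


def is_ineffective(f, g, grid, tol=1e-12):
    """g(t) = f(t) at every grid point, up to floating-point tolerance."""
    return all(abs(g(t) - f(t)) <= tol for t in grid)


# Hypothetical production functions: minutes per day -> health index
def f_rt(t):
    return math.sqrt(t)                  # regular toothbrush


def g_et(t):
    return 1.5 * math.sqrt(t)            # an effective ET; note g(0) = f(0)


def g_cross(t):
    return 2 * math.sqrt(t) - 0.5 * t    # better than the RT only for t < 4


grid = [0.1 * i for i in range(201)]     # 0 to 20 minutes

print(is_effective(f_rt, g_et, grid))                                  # True
print(is_ineffective(f_rt, f_rt, grid))                                # True
print(is_effective(f_rt, g_cross, grid), is_ineffective(f_rt, g_cross, grid))  # False False
```

The two definitions are not exhaustive: a technology such as `g_cross`, which beats the RT at short brushing times but not at long ones, is neither effective nor ineffective in the sense used here.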

2.1.2 Preferences.

Let $d$ be a dummy variable that is 1 if the person uses an ET (and zero otherwise). Preferences are represented by the utility function $u(h, t, d)$. We assume that $u(\cdot)$ is increasing in $h$ and decreasing in $t$. The dependence on the mode, $d$, is to capture pure preference effects for using an ET. One possible reason for this is that using an ET requires less effort or, maybe, that an ET is considered too noisy for the early morning. An important consideration for the mode is whether it affects the trade-off between time and health. If the utility function can be written as:

$$u(h, t, d) = F(v(h, t), d) \tag{1}$$

then we say that preferences over $(h, t)$ are separable from $d$.[4] That is, the marginal rate of substitution between time and health is independent of $d$:

$$\frac{u_t}{u_h} = \frac{F_v v_t}{F_v v_h} = \frac{v_t}{v_h} \tag{2}$$

where the final expression on the right-hand side does not depend on $d$. In Figure 2 we show preferences which are non-separable, so that the mode does change the slope of the indifference curves (that is, indifference curves can cross). In the case illustrated, using an ET makes time spent brushing less onerous. To see this, note that at the point where the two indifference curves cross, a person using an RT would need a bigger increase in $h$ to compensate her for an increase in $t$ than if using an ET.

[4] We have already made an implicit separability assumption that preferences over $(h, t, d)$ are independent of the thousands of other things the agent cares about.

2.2 Choices.

Having established what the agent likes and what the agent can do, we can put them together to model their choice. For modes 0 (RT) and 1 (ET) respectively we have:

$$\hat u^0 = \max_t \{u(f(t), t, 0)\} \quad \text{(RT)}$$

$$\hat u^1 = \max_t \{u(g(t), t, 1)\} \quad \text{(ET)}$$
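
The separability condition (2) and the choice problems above can both be illustrated numerically. In this sketch, `mrs` computes $u_t/u_h$ by central finite differences and `best_choice` replaces the maximisation over $t$ by a grid search. The two utility functions and the RT technology $f(t) = \sqrt{t}$ are hypothetical examples, not part of the notes.

```python
import math


def mrs(u, h, t, d, step=1e-6):
    """u_t / u_h at (h, t, d), by central finite differences."""
    u_t = (u(h, t + step, d) - u(h, t - step, d)) / (2 * step)
    u_h = (u(h + step, t, d) - u(h - step, t, d)) / (2 * step)
    return u_t / u_h


def best_choice(u, prod, d, t_max=20.0, n=20000):
    """Grid-search stand-in for max_t u(prod(t), t, d); returns (t_hat, h_hat, u_hat)."""
    candidates = (t_max * i / n for i in range(1, n + 1))  # skip t = 0 to avoid log(0)
    return max(((t, prod(t), u(prod(t), t, d)) for t in candidates), key=lambda c: c[2])


def u_sep(h, t, d):
    """Separable: F(v(h, t), d) with v = ln h - 0.1 t and F(v, d) = v + 0.5 d."""
    return (math.log(h) - 0.1 * t) + 0.5 * d


def u_nonsep(h, t, d):
    """Non-separable: using an ET halves the marginal disutility of brushing time."""
    return math.log(h) - (0.1 - 0.05 * d) * t


print(mrs(u_sep, 2.0, 5.0, 0), mrs(u_sep, 2.0, 5.0, 1))        # both about -0.2
print(mrs(u_nonsep, 2.0, 5.0, 0), mrs(u_nonsep, 2.0, 5.0, 1))  # about -0.2 and -0.1

t_hat, h_hat, u_hat = best_choice(u_sep, math.sqrt, d=0)
print(round(t_hat, 3), round(h_hat, 3))  # 5.0 2.236
```

For `u_sep` the ratio is the same for both modes, as (2) requires; for `u_nonsep` it differs, so RT and ET indifference curves through the same point have different slopes. With `u_sep` and $f(t) = \sqrt{t}$ the agent maximises $0.5\ln t - 0.1t$, whose maximum is at $t = 5$.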

The corresponding choices are denoted $(\hat h^0, \hat t^0)$ and $(\hat h^1, \hat t^1)$. If we were considering non-experimental ("observational") data in which people choose whether to use an RT or an ET, we would need to take into account these utility levels and the different prices of the two modes ($p_0$ and $p_1$ for RT and ET respectively, with $p_1 > p_0$). This would require us to make allowance for differences in the marginal utility of money, $\lambda$. Thus the agent chooses to use the ET if and only if:

$$\hat u^1 - \lambda p_1 > \hat u^0 - \lambda p_0 \tag{3}$$

This is an example of cost-benefit analysis: there are two choices and the one chosen has the highest benefit minus cost. Rewriting (3) as $\hat u^1 - \hat u^0 > \lambda(p_1 - p_0)$ shows that a low $\lambda$ makes the ET more attractive, so this analysis gives the result discussed at the start of Section 2, in which richer people (who have a low $\lambda$) choose the ET option. In practice it is very difficult to control for the marginal utility of money (and other factors that might influence choice). One way around this is to conduct a controlled experiment in which people are assigned an RT or an ET.

2.3 Interpreting the experimental finding.

The purpose of this subsection is to show that we have to be very careful how we interpret experimental effects. In an experimental setting, we take the choice of mode as external to the agent, so that a control individual ($d = 0$) sets $(h, t) = (\hat h^0, \hat t^0)$ and a treated person ($d = 1$) sets $(h, t) = (\hat h^1, \hat t^1)$. Suppose the experimental finding tells us that the average $\hat h^1$ is statistically significantly higher than the average $\hat h^0$.[5] Without supplementary assumptions this does not tell us anything about whether or not the ET is effective. To see this, consider two cases.

[5] A simple way to do this is to regress the observed health variable on the treatment dummy and to test whether the coefficient on the latter is significantly greater than zero.

The first case is illustrated in Figure 3. Here the ET is effective (the ET production curve is above the RT curve) but people's choice of time spent brushing undoes the effect. As shown, agents take all of the increased productivity as a reduction in their time brushing and they keep health constant.[6]

[6] Indifference curves that shift horizontally correspond to quasi-linear preferences.

This would show up as a zero experimental effect:

$$\hat h^1 = g(\hat t^1) = f(\hat t^0) = \hat h^0 \tag{4}$$
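
A worked numerical version of this first case follows. It assumes hypothetical quasi-linear preferences $u = 20\ln h - t$ (the same for both modes, hence separable), an RT technology $f(t) = \sqrt{t}$, and an ET that works like four extra minutes with an RT, $g(t) = \sqrt{t + 4}$. This ET is effective, although, unlike the assumption in footnote 3, it gives $g(0) \ne f(0)$; the simplification keeps the algebra transparent. Analytically, the RT user solves $10/t = 1$ and the ET user solves $10/(t + 4) = 1$.

```python
import math


def best_t(u, prod, t_max=20.0, n=20000):
    """Grid-search stand-in for argmax_t u(prod(t), t)."""
    grid = (t_max * i / n for i in range(1, n + 1))
    return max(grid, key=lambda t: u(prod(t), t))


def u_quasi(h, t):
    """Hypothetical quasi-linear preferences, identical for both modes: 20 ln h - t."""
    return 20 * math.log(h) - t


def f_rt(t):
    return math.sqrt(t)          # hypothetical RT technology


def g_et(t):
    return math.sqrt(t + 4.0)    # hypothetical ET: like brushing 4 extra minutes with an RT


t0 = best_t(u_quasi, f_rt)
t1 = best_t(u_quasi, g_et)
h0, h1 = f_rt(t0), g_et(t1)
print(round(t0, 3), round(t1, 3))   # 10.0 6.0
print(round(h1 - h0, 6))            # 0.0, a zero experimental effect
```

Both agents end up with $h = \sqrt{10}$; the ET user simply brushes four minutes less, which is exactly equation (4).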

Thus the experiment shows no effect even though the ET is effective.

The converse case is illustrated in Figure 4. Here the ET is ineffective ($g(t) = f(t)$) but agents find that the ET requires less effort, and they increase $t$. Thus we have:

$$\hat h^1 = g(\hat t^1) = f(\hat t^1) > f(\hat t^0) = \hat h^0 \tag{5}$$

Thus the experiment shows a positive effect even though the ET is ineffective. Assuming that preferences over $(h, t)$ are separable from $d$ rules this case out, since with $g = f$ the RT and ET agents then solve the same problem for $t$. This shows that a positive experimental effect is neither necessary nor sufficient for the hypothesis that ETs are effective. In both cases, the problem is that people know whether they are in the treatment or the control group and they may change their behaviour accordingly. One solution to this problem would be to assign the time spent brushing. For example, everyone could be required to spend exactly 10 minutes per day brushing their teeth. Then we could just compare the post-experiment health of the two groups and determine whether ETs are effective at $t = 10$. This is obviously completely impractical.[7] Instead we might collect information on the time spent brushing and use this to control for changes in $t$ consequent on the assigned mode, RT or ET.

[7] In a medical randomized controlled trial a patient does not know whether they are receiving the treatment or a placebo and consequently cannot react to the assignment. In the toothbrush example it would be difficult to hide the mode of toothbrush from the agent!

Here we have taken the ideal case. For example, we are assuming that assignment is perfect in the sense that someone assigned an ET always uses it, and similarly for an RT. In practice experimental subjects do not always comply with the experimental assignment. This leads to additional complications for interpreting experimental outcomes. The situation is worse for quasi-experimental studies (or "natural experiments") in which there may even be doubt about the randomness of the assignment to treatment or control, as well as about the degree of compliance. To judge whether ETs are effective we need to build a structural model which uses information on the time spent brushing. Given $N$ people in total, the data are the values of $(h_i, t_i, d_i)$ for $i = 1, 2, \ldots, N$.
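
The converse case (inequality (5)) can be illustrated the same way. The sketch assumes an ineffective ET ($g = f = \sqrt{t}$) and hypothetical non-separable preferences $u = \ln h - (0.1 - 0.05d)t$, in which the ET halves the marginal disutility of brushing time. Analytically, the RT user sets $0.5/t = 0.1$ and the ET user sets $0.5/t = 0.05$.

```python
import math


def best_t(u, prod, d, t_max=20.0, n=20000):
    """Grid-search stand-in for argmax_t u(prod(t), t, d)."""
    grid = (t_max * i / n for i in range(1, n + 1))
    return max(grid, key=lambda t: u(prod(t), t, d))


def u_effort(h, t, d):
    """Hypothetical non-separable preferences: an ET halves the disutility of brushing time."""
    return math.log(h) - (0.1 - 0.05 * d) * t


def f(t):
    return math.sqrt(t)   # the same technology for both modes: the ET is ineffective (g = f)


t0, t1 = best_t(u_effort, f, 0), best_t(u_effort, f, 1)
h0, h1 = f(t0), f(t1)
print(round(t0, 3), round(t1, 3))   # 5.0 10.0
print(round(h1 - h0, 3))            # 0.926, a positive "experimental effect"
```

Health rises from $\sqrt{5}$ to $\sqrt{10}$ purely because treated agents brush for longer. Comparing the two groups at the same $t$ would show no difference, since $g(t) = f(t)$, which is why data on $t_i$ matter.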

3 Taking the theory to the data.

3.1 A regression approach.

We shall now consider how to take the theory to the data. We first consider choosing a functional form that gives a conventional-looking linear regression. We take the following functional form for the health production function:

$$h = t^{\alpha_2}\exp(\alpha_0 + \alpha_1 d + \varepsilon) \tag{6}$$

where $\varepsilon$ is a stochastic health production shifter. This represents unobserved heterogeneity. The variable $\varepsilon$ will vary across people according to the acidity of their saliva and their diet, and also their behaviour (how often they visit the dentist). This functional form has some desirable properties; for example, it is increasing and concave in $t$ if $\alpha_2 \in (0, 1)$. It also implies that the choice of mode does not affect health for those who never brush their teeth ($t = 0$). The functional form also has some undesirable properties. For example, it does not allow for any decreasing segment. More importantly, there is no heterogeneity in health amongst people who never brush their teeth (they all have $h = 0$). Finally, the effect of the mode (here given by $\alpha_1$) is assumed to be the same for everyone. We could probably do better, but at the cost of introducing more parameters. Choosing a flexible functional form that parsimoniously captures all desirable features in terms of theory and that fits the data is a very important element in parametric structural modelling.

An ET is effective if and only if $\alpha_1 > 0$. Thus $\alpha_1$ is the parameter of interest. There are different ways to quantify effectiveness. For example, we could take the difference in health between those who use an ET and those who do not, $h(d=1) - h(d=0)$. This is heterogeneous in the sense that

$$h(d=1) - h(d=0) = \{t^{\alpha_2}\exp(\alpha_0 + \varepsilon)\}(\exp(\alpha_1) - 1) \tag{7}$$

depends on the heterogeneity term $\varepsilon$ (as well as on $t$). An alternative definition of effectiveness is

$$\frac{h(d=1)}{h(d=0)} = \exp(\alpha_1) \tag{8}$$

which is homogeneous (that is, independent of $\varepsilon$). We shall assume that everyone spends some time brushing their teeth ($t > 0$) and take logs of (6) to give the simple linear-in-logs form:

$$\ln h = \alpha_0 + \alpha_1 d + \alpha_2 \ln t + \varepsilon \tag{9}$$

Given observations on $(h, t, d)$ we could run a regression on the full sample of $\ln h$ on $(d, \ln t)$ and find OLS estimates for the parameter of interest.[8] Often someone presenting regression results such as this will say they are "controlling for" the time spent brushing. The OLS coefficient estimates are consistent estimates of $(\alpha_0, \alpha_1, \alpha_2)$ only if $d$ and $\ln t$ are uncorrelated with the unobservable health heterogeneity, $\varepsilon$; that is, $E(\varepsilon \mid d, \ln t) = 0$. By design, random experimental assignment implies that $d$ is uncorrelated with $\varepsilon$. This is the great value of an experiment: we do not need to model the selection of mode as in (3).[9]

[8] A more sophisticated approach would be to model the dependence of $h$ on $t$ nonparametrically for each mode, RT and ET, and then test whether the two nonparametric curves are the same. This does not solve the problems we are about to discover. No amount of statistical pyrotechnics can solve an identification problem.
[9] This ignores the important issue of compliance. Some of the treated may use an RT and some of the controls may use an ET, even though they have been asked to use the assigned mode.

To determine whether it is plausible to assume that $\ln t$ is uncorrelated with $\varepsilon$, we need to reconsider the choice of the time spent brushing. We need only consider the RT group (the same analysis applies for the ET group). Writing the RT production function as $f(t, \varepsilon)$ to make its dependence on $\varepsilon$ explicit, we have the following program:[10]

$$\max_t \; u(f(t, \varepsilon), t, 0) \tag{10}$$

[10] It is important to note exactly what we are assuming here. In particular, we assume that people know the true production functions for the two modes and use this in their decision making. Since the whole point of the research in this area is to establish whether ETs are effective, this is a very strong assumption!

The first order condition is (denoting partial derivatives by subscripts):

$$u_h(\hat h, \hat t, 0)\, f_t(\hat t, \varepsilon) + u_t(\hat h, \hat t, 0) = 0 \tag{11}$$

where $\hat h = f(\hat t, \varepsilon)$. Generally the optimal time spent brushing, $\hat t$, is a function of $\varepsilon$. To determine whether $\hat t$ depends on $\varepsilon$, we differentiate (11) with respect to $\varepsilon$. After some manipulations we find that:

$$\frac{\partial \hat t}{\partial \varepsilon}\left[u_{hh} f_t^2 + 2u_{ht} f_t + u_h f_{tt} + u_{tt}\right] + \left[u_{hh} f_t f_\varepsilon + u_{ht} f_\varepsilon + u_h f_{t\varepsilon}\right] = 0 \tag{12}$$
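
A small simulation shows what OLS on (9) delivers when $\ln t$ is, and is not, uncorrelated with $\varepsilon$. The data-generating values ($\alpha_0 = 0$, $\alpha_1 = 0.2$, $\alpha_2 = 0.5$, $\sigma_\varepsilon = 0.3$) and the parameter `kappa`, which links $\ln t$ to $\varepsilon$, are hypothetical; `ols` is a plain normal-equations solver written out so that the sketch needs only the Python standard library.

```python
import math
import random


def ols(y, X):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan with partial pivoting)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def simulate(n, kappa, seed=0, a0=0.0, a1=0.2, a2=0.5, sigma=0.3):
    """Hypothetical data from (9): ln h = a0 + a1*d + a2*ln t + eps, with d randomly assigned.
    ln t = base + kappa*eps, so kappa = 0 makes ln t exogenous and kappa != 0 makes it endogenous."""
    rng = random.Random(seed)
    y, X = [], []
    for _ in range(n):
        d = 1.0 if rng.random() < 0.5 else 0.0
        eps = rng.gauss(0.0, sigma)
        ln_t = rng.uniform(math.log(2.0), math.log(15.0)) + kappa * eps
        y.append(a0 + a1 * d + a2 * ln_t + eps)
        X.append([1.0, d, ln_t])
    return y, X


for kappa in (0.0, 1.0):
    b0, b1, b2 = ols(*simulate(5000, kappa))
    print(f"kappa={kappa}: alpha1_hat={b1:.3f} (true 0.2), alpha2_hat={b2:.3f} (true 0.5)")
```

With `kappa = 0` both estimates should be close to their true values. With `kappa = 1`, `alpha2_hat` should be biased upwards (to roughly 0.7 with these numbers). `alpha1_hat` stays close to 0.2 here only because, in this sketch, $d$ is unrelated to $\ln t$; if the assigned mode also shifts $\hat t$, as it can in the model of the text, the bias spills over into the estimate of $\alpha_1$.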

If $\partial \hat t / \partial \varepsilon = 0$ then the second term in brackets in (12) must also be zero, so that:

$$u_{hh} f_t f_\varepsilon + u_{ht} f_\varepsilon + u_h f_{t\varepsilon} = 0 \tag{13}$$

(Conversely, the first bracket in (12) is the second-order condition for (10), which is negative at a strict maximum, so (13) also implies $\partial \hat t / \partial \varepsilon = 0$.) If this holds then OLS gives a consistent estimate of the effect of using an electric toothbrush. It is difficult (impossible?) to interpret this condition, but it looks unlikely to hold, given that it depends on both technology (the $f(\cdot)$ function) and tastes (the $u(\cdot)$ function). To check how restrictive it is, we now consider a parametric form for the utility function and use the parametric form (6).

3.2 A parametric model.

Let $T$ be the maximum number of minutes per day that anyone would brush (for example, $T = 20$). Denote the logistic function:

$$\psi(x) = \frac{\exp(x)}{1 + \exp(x)} \tag{14}$$

so that $\psi(x)$ is always between zero and unity. We shall take the following functional form for the utility function:

$$u(h, t, d) = \psi(\beta_0 + \beta_1 d + \eta)\ln(h + \gamma) + \left(1 - \psi(\beta_0 + \beta_1 d + \eta)\right)\ln(T - t) \tag{15}$$

with $\gamma \ge 0$, so that time brushing is a bad and health is a good. If the function value $\psi(\cdot)$ is high then the person cares a lot about the health of their teeth (relative to the time spent brushing). Preferences over $(h, t)$ are separable from the mode if $\beta_1 = 0$. The variable $\eta$ captures unobserved heterogeneity in tastes for healthy teeth; this may include concerns about how your teeth look. Maximising $u(h, t, d)$ subject to (6) gives the following first order condition for the optimal $t$, denoted $\hat t$:

$$\psi(\cdot)\,\frac{\alpha_2 \hat t^{\alpha_2 - 1}\exp(\alpha_0 + \alpha_1 d + \varepsilon)}{\hat t^{\alpha_2}\exp(\alpha_0 + \alpha_1 d + \varepsilon) + \gamma} - \frac{1 - \psi(\cdot)}{T - \hat t} = 0 \tag{16}$$

This does not have a closed-form solution for $\hat t$, but it is easy to solve numerically for given values of $(d, \varepsilon, \eta)$. The important point here is that the optimal value $\hat t$ will generally depend on the unobserved production heterogeneity $\varepsilon$. That is, $\ln t$ in equation (9) is endogenous. If you look carefully, however, you will see that we can rule this out by setting $\gamma = 0$.[11] If we do that then we can find a closed-form expression for the optimal $t$:

$$\hat t = \frac{\psi(\cdot)\,\alpha_2 T}{1 + \psi(\cdot)(\alpha_2 - 1)} \tag{17}$$

[11] It is tedious to check, but this is equivalent to condition (13) for the general case.

The most important implication of this is that the OLS estimates of (9) are now consistent estimates of the parameters of interest, so long as $\varepsilon$ and $\eta$ are uncorrelated. That is, if $\eta$ is correlated with $\varepsilon$, then $\hat t$ depends indirectly on $\varepsilon$ through $\psi(\beta_0 + \beta_1 d + \eta)$, even though $\varepsilon$ does not appear explicitly in (17). The assumption that $\varepsilon$ and $\eta$ are uncorrelated is a strong one, since $\varepsilon$ contains things other than $t$ and $d$ that determine the health of teeth (such as diet) and $\eta$ captures how much the agent cares about their teeth relative to the time spent brushing. One important thing to note is that separability of $(h, t)$ from $d$ in the utility function ($\beta_1 = 0$) is irrelevant for the exogeneity of $\ln t$ in equation (9).

If we are willing to assume that $\varepsilon$ and $\eta$ are uncorrelated, why not also assume $\gamma = 0$? It is difficult to decide whether this is a strong assumption. Within our model, it implies that if a person never brushed their teeth ($t = 0$) then they would have $h = 0$ and a utility level of $-\infty$. This is not necessarily a disaster for the model, since this sub-utility function is embedded within a wider utility function that may give low weight to a utility of $-\infty$ for this particular outcome.[12]

[12] For example, we might have an overall utility function that looks like $U = W(\exp(u(h, t, d)), \text{everything else})$, in which a sub-utility of $-\infty$ enters only as $\exp(-\infty) = 0$.

This analysis shows that once we take a structural model seriously, it is difficult to justify the claim that OLS estimates of equation (9) will deliver reliable estimates of whether ETs are effective. The obvious approach is to try to find an instrument for $\ln t$ in equation (9): for example, some measure of how busy the person is, which affects the time spent brushing but not health directly. Examples would include the hours of work and whether there are young children in the household. Suppose we have such a variable, denoted $z$. Assuming $z$ is uncorrelated with $(\varepsilon, \eta)$, we can consistently estimate the parameters of (9) by instrumental variable estimation, using $(d, z)$ as instruments.

The point of all this analysis is that the simple difference between dental health for those who are assigned an ET rather than an RT is not a reliable guide to whether ETs are effective. Instead, our structural model shows that we have to control for the time spent brushing and find a determinant of it that is uncorrelated with the unobservables. The latter have a precise definition within our structural model. Having developed a structural model, we can exploit the instrument to gain insight into the other parts of the model.

3.3 A structural model.

3.3.1 A nonparametric structural model.

One approach is to set up a nonparametric structural model. When doing this, the ideal would be to allow for full generality in tastes, that is, a utility function $u(h, t, d, \eta)$ which now also depends on the taste shifter $\eta$. We would also like to estimate the health production functions, $f(t, \varepsilon)$ and $g(t, \varepsilon)$, directly. Thus we only assume that the functions exist, are smooth and satisfy certain monotonicity and concavity assumptions. For example, we would assume that $u(\cdot)$ is strictly increasing and concave in $h$ and strictly decreasing and concave in $t$, for fixed values of the other variables. We may also require that the taste for health relative to brushing time is strictly increasing in $\eta$ for fixed $(h, t, d)$; that is:

$$\frac{\partial}{\partial \eta}\left(-\frac{u_h(h, t, d, \eta)}{u_t(h, t, d, \eta)}\right) > 0 \tag{18}$$

Similarly, we might impose that $f(\cdot)$ and $g(\cdot)$ are strictly increasing and concave in $t$ and strictly increasing in $\varepsilon$. We would also need to allow for nonparametric distributions for the heterogeneity terms $\varepsilon$ and $\eta$. Again, we may impose conditions on the probability distribution functions, such as continuity ("no mass points") and full support. Ideally the data then determine the form of all of the unknowns. If we go that route, we have to show nonparametric identification of the unknown structure; that is, of the forms of $u(\cdot)$, $f(\cdot)$ and $g(\cdot)$ and of the joint distribution of $\varepsilon$ and $\eta$. This identification analysis would assume that we have sufficiently rich data that we can consistently estimate the joint distribution of $(h, t, d)$. To show identification we need to show that if two structures give the same joint distribution for $(h, t, d)$, then they must be the same structure. If we found that the elements of the structure were not nonparametrically identified, a sophisticated response would be to conduct a partial identification analysis and to determine bounds on the effectiveness of an ET.
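
Returning to the parametric model of Section 3.2, the first order condition (16) is easy to solve numerically, as noted there. The sketch below uses bisection, which is valid because, for $0 < \alpha_2 < 1$ and $\gamma \ge 0$, the left-hand side of (16) falls strictly from $+\infty$ to $-\infty$ on $(0, T)$. The argument `a` bundles $\alpha_0 + \alpha_1 d + \varepsilon$, and the parameter values ($\alpha_2 = 0.5$, $T = 20$, $\psi = 0.5$) are hypothetical.

```python
import math


def logistic(x):
    """psi(x) in equation (14)."""
    return math.exp(x) / (1.0 + math.exp(x))


def foc(t, psi, a, alpha2, gamma, T):
    """Left-hand side of (16); a stands for alpha0 + alpha1*d + eps."""
    e = math.exp(a)
    return psi * alpha2 * t ** (alpha2 - 1) * e / (t ** alpha2 * e + gamma) - (1 - psi) / (T - t)


def t_hat(psi, a, alpha2, gamma, T, tol=1e-12):
    """Solve (16) by bisection on (0, T); the left-hand side is strictly decreasing in t."""
    lo, hi = 1e-12 * T, (1 - 1e-12) * T
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, psi, a, alpha2, gamma, T) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_hat_closed(psi, alpha2, T):
    """Closed form (17); valid only when gamma = 0."""
    return psi * alpha2 * T / (1 + psi * (alpha2 - 1))


psi = logistic(0.0)   # beta0 + beta1*d + eta = 0 gives psi = 0.5
for gamma in (0.0, 1.0):
    low, high = (t_hat(psi, a, 0.5, gamma, 20.0) for a in (0.0, 1.0))
    print(f"gamma={gamma}: t_hat(a=0)={low:.4f}, t_hat(a=1)={high:.4f}")
print(f"closed form (17): {t_hat_closed(psi, 0.5, 20.0):.4f}")   # 6.6667
```

With $\gamma = 0$ the two solutions coincide and match (17), $20/3 \approx 6.6667$ here, so $\hat t$ depends neither on $\varepsilon$ nor on $\alpha_1 d$. With $\gamma = 1$ the agent with the higher production shifter brushes for longer, which is exactly the dependence that makes $\ln t$ endogenous in (9).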

3.3.2 A fully parametric structural model.

A structural approach that is the complete opposite of the nonparametric structural approach is to start with a simple fully parametric model. In fact, we have almost done that in the last section: (9) and (16) give equations for the determination of the observables $(h, t)$ given the observable $d$ and the unobservables $(\varepsilon, \eta)$. All that remains is to specify distributions for the latter. For example, we might assume:

$$\begin{pmatrix} \varepsilon \\ \eta \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_\varepsilon^2 & \rho\sigma_\varepsilon\sigma_\eta \\ \rho\sigma_\varepsilon\sigma_\eta & \sigma_\eta^2 \end{pmatrix}\right) \tag{19}$$

This still requires an identification analysis to show that the parameter of interest, $\alpha_1$, is identified.[13] If we can point identify $\alpha_1$ (the nuisance parameters are the other coefficients in (9) and (15), together with $\rho$, $\sigma_\varepsilon^2$ and $\sigma_\eta^2$), then we could estimate by maximum likelihood. Then we would test $\alpha_1 = 0$ against the alternative $\alpha_1 > 0$. If additional observables such as age, gender and education are available, we could allow $\alpha_0$ and $\beta_0$ (the means of the heterogeneity) to depend on them. In practice, deriving the likelihood function for the Normal model may be tricky and we may have to resort to simulation-based estimation methods such as indirect inference ("simulated minimum distance").

[13] It is important to remember that we can have parametric models in which some parameters are point identified and others are not.

Goodness of fit.

If we do adopt a fully parametric model then we are open to the criticism that our results depend too much on the functional form assumptions. For example, what if we took some distribution for the heterogeneity other than the bivariate Normal? Or a different functional form for the production function? There are two ways to overcome this. The first is to adopt flexible functional forms for the various functions. For example, for the heterogeneity we could take a mixture of three bivariate Normals for $(\varepsilon, \eta)$. A mixture of three Normals can closely approximate a wide range of continuous distributions. The drawback of doing this is that the mixture distribution has 17 parameters (each of the three components has two means, two variances and a covariance, and there are two free mixing weights) rather than the 5 parameters of a single bivariate Normal such as (19), whose means are here absorbed into $\alpha_0$ and $\beta_0$. Similarly, we could take much more complicated production and utility functions than (6) and (15). If we did this we would have to conduct an identification analysis to show that the effectiveness of the ET is point identified. This approach involves a move toward the nonparametric model discussed above.

An alternative approach is to start with simple functional forms (such as those from the previous subsection), estimate a small number of parameters, and then apply a wide range of goodness of fit tests. For example, one would want to test whether the estimated parameters adequately fit the first four moments (means, variances, skewness and kurtosis) of the distributions of $h$ and $t$ for each mode. Note that one of these moments is the quasi-experimental effect: the difference between the mean health outcomes for the two groups (RT and ET). One would also want to check the fit for the dependence between these two variables (at a minimum, their correlation). More ambitiously, can we match the quantiles of, and the dependencies between, the $h$ and $t$ variables? If our estimated model fits these moments well then we have some confidence in our estimates. This is a progressive strategy in the sense that failures to fit in certain directions usually indicate in which way we need to generalise the model. We keep moving back and forth between theory, identification, estimation and goodness of fit until we have a model that captures all of the features of the observed joint distribution of $h$, $t$ and $d$. I would then take the final model as an adequate structural model for determining whether electric toothbrushes are effective. Note, however, that if the object of interest is not nonparametrically identified, then there will be other structural models that give a different estimate and also fit the data as well as our preferred model. This is a serious drawback: showing nonparametric identification remains the gold standard.
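
The moment-based checks described above are straightforward to code. This sketch computes the first four moments of $h$ and $t$ for each mode, their correlation, and the difference in mean health between the ET and RT groups. The data layout (a dict from mode to a pair of lists) and the toy numbers are hypothetical; in practice `simulated` would hold draws from the estimated model and `observed` the actual sample.

```python
import math


def four_moments(xs):
    """Mean, variance, skewness and kurtosis (population formulas, no small-sample correction)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(four_moments(xs)[1] * four_moments(ys)[1])


def fit_gaps(observed, simulated):
    """Simulated-minus-observed moments for each mode.
    Both arguments map a mode (0 = RT, 1 = ET) to a pair (h values, t values)."""
    gaps = {}
    for mode, (h_obs, t_obs) in observed.items():
        h_sim, t_sim = simulated[mode]
        gaps[mode] = {
            "h": [s - o for o, s in zip(four_moments(h_obs), four_moments(h_sim))],
            "t": [s - o for o, s in zip(four_moments(t_obs), four_moments(t_sim))],
            "corr(h,t)": correlation(h_sim, t_sim) - correlation(h_obs, t_obs),
        }
    return gaps


def experimental_effect(data):
    """Mean health of the ET group minus mean health of the RT group."""
    return four_moments(data[1][0])[0] - four_moments(data[0][0])[0]


# Toy numbers, for illustration only
observed = {0: ([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 5.0, 4.0, 6.0]),
            1: ([2.0, 3.0, 4.0, 5.0, 6.0], [3.0, 4.0, 4.0, 6.0, 7.0])}
print(four_moments(observed[0][0]))     # (3.0, 2.0, 0.0, 1.7)
print(experimental_effect(observed))    # 1.0
```

A formal test would also need sampling variability for these gaps (for example from a bootstrap); the sketch only reports the raw discrepancies.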

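[Figure 1: Production functions for ET and RT. Horizontal axis: time brushing, t; vertical axis: health, h. Curves: ET, h = g(t); RT, h = f(t).]

[Figure 2: Indifference curves for ET and RT. Horizontal axis: time brushing, t; vertical axis: health, h. Curves: ET, d = 1; RT, d = 0.]

[Figure 3: Effective ET and separable preferences. Horizontal axis: time brushing; vertical axis: health. Curves: ET and RT; marked points: h(et) = h(rt), t(et), t(rt).]

[Figure 4: Ineffective ET and non-separable preferences. Horizontal axis: time brushing; vertical axis: health. Curve: RT = ET; marked points: h(et), h(rt), t(rt), t(et).]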
