Estimation of the Binary Response Model using a Mixture of Distributions Estimator (MOD)
Mark Coppejans
Department of Economics


Mark Coppejans
Department of Economics, Duke University, Durham, NC, USA
Phone: (919)   Fax: (919)   mtc@econ.duke.edu
July

I would like to thank Rosa Matzkin, Torben Andersen, and especially Ian Domowitz for their comments. This and other papers by the author are available at

ABSTRACT

This paper develops a semiparametric sieve estimator, termed a mixture of distributions estimator (MOD), to estimate a binary response model when the distribution of the errors is unknown. The estimator for the distribution function is composed of a mixture of smooth distributions, where the number of mixture components increases with the sample size. The model is semiparametric because it is assumed that a parametric index-type restriction holds. Optimal rates of convergence are established for the distribution function under the $L_2$ norm, and conditions are derived under which estimates of the parametric component are asymptotically normal. An appealing feature of MOD is that the estimator of the distribution function can be restricted, a priori, to be smooth, non-negative, increasing, and to integrate to one. This has important practical and theoretical implications.

KEY WORDS: Binary Response Model, Mixture of Distributions, Sieve Estimator, Index Restriction.

1 Introduction

This paper develops a semiparametric sieve estimator, termed a mixture of distributions estimator (MOD), to estimate a binary response model. The underlying form is $y_i^* = x_i'\beta_0 + \epsilon_i$, $i = 1, \dots, n$, where $\{x_i', \epsilon_i\}'$ are i.i.d. random vectors, $x_i \in \Re^d$, $d \ge 1$, and $\epsilon_i \in \Re$, but the distribution of $\epsilon$, $F(\cdot)$, is unknown. One observes the sequence $\{y_i, x_i'\}'$, where $y_i$ is defined as
$$y_i = \begin{cases} 1, & \text{if } y_i^* > 0, \\ 0, & \text{if } y_i^* \le 0. \end{cases}$$
In many economic settings, the statistic of main interest is $E[y|x]$; this is not specific to the problem addressed here, nor to the estimation approach considered in this study. When the regressors and errors are mutually independent, $E[y|x] = 1 - F(-x'\beta_0) = E[y|x'\beta_0]$, and because the conditional expectation is completely specified as a function of $x'\beta_0$, the model is said to satisfy a single index restriction.

We propose to estimate $E[y|x]$ by estimating $F(\cdot)$ and $\beta_0$ simultaneously. The estimator for $F(\cdot)$ is a type of mixture of distributions in which the number of mixture components increases with the sample size. Given that the number of mixture terms grows at a suitable rate, optimal rates of convergence in $L_2$ for $E[y|x]$ and $F(\cdot)$ are obtained. We also provide some results on the estimates of the density of the errors, $f(\cdot)$, and on the asymptotic distribution of the estimates of $\beta_0$.

The estimation procedure is called the method of sieves, where a sieve is a sequence of finite-dimensional parameter spaces constructed so that, in the limit, the function of interest, $E[y|x]$, lies within it. Useful restrictions will be imposed on the sieve as well. For example, suppose a mixture of normals is used to estimate $F(\cdot)$, $\sum_{j=1}^{k} \lambda_j \Phi[(\cdot - \mu_j)/\sigma]$, where $\Phi(\cdot)$ is the normal c.d.f., the $\lambda_j$, $\mu_j$, and $\sigma$ are parameters, $k = n^{\kappa}$, and $\kappa > 0$ is to be determined later. The sieve then is this sequence of mixtures. Under unconstrained optimization, for moderate values of $k$, $\sigma$ will typically be set to zero. The result is a step function, which, in general, will lead to poor estimates of $F(\cdot)$ if we believe, as we do here, that $F(\cdot)$ is smooth. The key is to bound $\sigma$

from below, allowing it to decrease with $n$. One must use care in placing this bound, however. If it is too small, the estimated distribution will be close to a step function. If the bound is too large, the sieve might be a poor approximation to the underlying function, and as a result, optimal rates may not be obtainable.¹ The theory developed in this paper in part alleviates this problem by placing bounds on the rate at which $\sigma$ can decrease and $k$ can increase, where the choice of $k$ has important consequences. As Shen and Wong (1994) have shown in a general setting, if $k$ is too small, then optimal rates may not be obtainable, and if $k$ is too large, then even consistency may not be achieved.

Using a finite mixture model to estimate a binary response model is not new; it is, in fact, prevalent throughout economics. Nor is the idea of using a mixture model in which the number of components increases with the sample size new. One of the first theoretical papers on sieve estimation, Geman and Hwang (1983), proved that estimating a density by a mixture of densities is consistent under appropriate conditions. As another example, Heckman and Singer (1984) showed that estimating a distribution by a mixture of distributions is consistent, but their estimator is a step function, and as a result, it is not surprising that they obtained poor estimates.² What is novel in this paper is that we rigorously outline the conditions under which optimal estimates of $F(\cdot)$ in $L_2$ are guaranteed, asymptotically. This is a much stronger result than consistency alone. As a consequence, optimal estimates of $E[y|x]$ are also obtained.

It is well known that as the number of terms increases, mixtures of, say, normals can approximate any continuous distribution function arbitrarily well (e.g., Zubov, 1995). Nonetheless, this alone provides nothing more than consistency, so a stronger result is required for the rate of convergence calculations. This is obtained by bounding the sieve approximation error, which is defined here as the rate, in terms of $k$, at which $|F_k(\cdot) - F(\cdot)|$ tends to zero under the $L_2$ norm, where $F_k$ can be thought of as the distribution function composed of $k$ mixture components that is closest to $F(\cdot)$. In nonparametric estimation,

¹ The intuition is similar to that of a kernel estimator, where $\sigma$ plays the role of the bandwidth parameter.
² Heckman and Singer (1984) were also interested in estimates of structural parameters and $F(t) = \int G(t|\theta)\, d\nu(\theta)$, where, in their notation, $G(t|\theta)$ is known up to the scalar $\theta$ and $\nu(\cdot)$ is an unknown distribution. They found that even though they do not estimate $\nu(\cdot)$ very well, they did estimate the structural parameter and $F(t)$ well, which is not surprising because the functional form of $G(t|\theta)$ is known and integration acts as a smoother.

there is often a variance and bias² tradeoff, and the bias here can be thought of as the sieve approximation error. Once this is obtained, rates of convergence follow easily from the results in Shen and Wong (1994) or Chen and Shen (1998), both of whom treat the sieve approximation error as mainly given, focusing instead on the variance component of the tradeoff. The main theoretical contribution of this paper, therefore, is in obtaining a lower bound on the sieve approximation error. For example, suppose that $F(\cdot)$ is twice continuously differentiable. Then, on a compact support, the sieve approximation error is of order $k^{-6/7+\delta}$, $\delta > 0$ arbitrarily small, under the $L_2$ norm. In addition, $F_k(\cdot)$ and its first two derivatives converge to $F(\cdot)$ and its first two derivatives, respectively, in the strong norm.

There are many different approaches in the literature for estimating this type of binary response model. However, unlike most of the other methods, MOD has the property that the estimator for $F(\cdot)$ can be restricted, a priori, to be a proper distribution function that is also smooth. This has both theoretical and practical advantages. From a statistical standpoint, it provides an estimator for those who would prefer to estimate a distribution with a distribution. From a microeconomic standpoint, in which the underlying model is based on individual preferences, an estimator for $F(\cdot)$ that is not strictly monotonic implies that consumers are not necessarily utility maximizing, a violation of basic economic principles.³ Enforcing this type of restriction, which is derived from economic theory, is in line with Matzkin (1992), for example. From a practical standpoint, non-monotonic estimates of $E[y|x'\beta_0]$ are hard to interpret in an economic context.⁴ As an example, Stern (1996) estimated supply and demand functions, and in doing so illustrated the difficulties involved in analyzing non-monotonic estimates of $E[y|x'\beta_0]$. Stern (1996) proposed an alternative estimator that is monotonic, but it tends to have "kinks" and "flat spots", which are also difficult to interpret in an economic context. Another nice feature of MOD is that when it is composed of a mixture of normals, it can be viewed as a natural extension

³ For example, let $U_{i,j} = x_i'\beta_{0,j} + \epsilon_{i,j}$ be the $i$th person's indirect utilities associated with goods $j = 1, 2$. Set $\beta_0 = \beta_{0,2} - \beta_{0,1}$ and $\epsilon_i = \epsilon_{i,2} - \epsilon_{i,1}$. Then, defining $y_i = 1$ as the event that the $i$th person chooses good $j = 2$, we have $P(y_i = 1) = P(U_{i,2} > U_{i,1}) = 1 - F(-x'\beta_0)$, which is necessarily non-increasing in $-x'\beta_0$. Therefore, if an estimator for $F(\cdot)$ is somewhere decreasing, then it is possible to construct a situation such that $y_i = 1$ but $U_{i,2} < U_{i,1}$.
⁴ Of course there are ways of fixing this ex post, but the method is often arbitrary.

of the paradigm case for purely parametric settings, the probit model.

Given the importance of $E[y|x]$, it is somewhat surprising that most of the econometric literature in this area has focused almost entirely on estimates of $\beta_0$. This was fueled, in large part, by the results in Ruud (1983), who derived conditions under which consistent estimation of $\beta_0$ is possible even if $F(\cdot)$ is incorrectly specified as a normal. For example, Manski (1985), Stoker (1986), and Han (1987) have all constructed consistent estimates of $\beta_0$.⁵ Two more recent approaches, by Ichimura (1993) and Klein and Spady (1993), concentrated on the asymptotic distributional properties of the estimates of $\beta_0$, but their methods can also be used to obtain optimal estimates of $E[y|x]$ and $F(\cdot)$. Both estimators are set up as Nadaraya–Watson type kernel regressions, where the former uses a nonparametric nonlinear least squares objective function and the latter a nonparametric maximum likelihood objective function. The estimators also share the property that neither is necessarily monotonic.

The model developed in this paper is most similar to Cosslett's (1983) nonparametric MLE, because Cosslett's estimator for $F(\cdot)$ is a proper distribution function. However, Cosslett's estimator for $F(\cdot)$ is a step function, and rates of convergence are not established. Our method is also similar to Gallant and Nychka's (1987) technique for sample selection problems in that they estimate the unknown density function using a Hermite-type polynomial expansion. Gallant and Nychka established consistency results for their estimator, but they did not establish general results on rates of convergence. This paper instead uses the conditions developed in Shen and Wong (1994) to derive the rate results and the conditions in Shen (1997) to derive the asymptotic normality result. Horowitz (1996) has developed an $n^{-1/2}$-consistent, asymptotically normal, nonparametric estimator for $F(\cdot)$ for single index models with an unknown transformation of the dependent variable. Horowitz's results, however, do not extend to the binary response model, and the estimator is not necessarily monotonic.

It should also be noted that other types of sieve estimators can also be used to construct

⁵ Their work has generated many other papers, most of which provide asymptotic normality results for the estimates of $\beta_0$. These include Horowitz's (1992) extension of Manski's results; Powell et al.'s (1989), Härdle and Stoker's (1989), and Newey and Stoker's (1993) extensions of Stoker's (1986) results; and Sherman's (1993) extension of Han's results. See Powell (1994) for a thorough review.

optimal estimates of $E[y|x]$ and $F(\cdot)$. For example, let $s_k(x)$ be a polynomial of degree $k$ or a spline with $k$ knots. A possible estimator for $F(\cdot)$ that is also a proper distribution function is $\int_{-\infty}^{\cdot} s_k(x)^2\, dx \big/ \int_{-\infty}^{\infty} s_k(x)^2\, dx$. Nonetheless, this paper focuses on estimators formed as mixtures of distributions because their use is widespread throughout economics.

The organization of this paper is as follows. Section 2 begins by heuristically describing MOD. The rest of that section is broken into four parts. Subsection 2.1 covers identification, whereas Subsection 2.2 describes the sieve approximation error in more detail. The next subsection formally defines MOD and provides the asymptotic convergence results for $E[y|x]$ and for $F(\cdot)$ and its derivatives. Subsection 2.4 outlines the conditions needed for estimates of $\theta_0$ to be asymptotically normal. Section 3 is a Monte Carlo experiment, and the last section outlines a few extensions. All proofs are given in the Appendix.

2 Binary Response Model

If the errors and regressors are mutually independent, then
$$E[y|x] = P(\epsilon > -x'\beta_0) = 1 - F(-x'\beta_0),$$
where $F(\cdot)$ is the common c.d.f. of the errors. Hence the model reduces to a single index restriction, $E[y|x] = E[y|x'\beta_0]$. Klein and Spady (1993), among others, have shown that mutual independence of $x$ and $\epsilon$ is not necessary for the index restriction to hold; the assumption is used below only to identify $F(\cdot)$. If one is interested only in estimates of $E[y|x]$, then the mutual independence assumption can be relaxed as in Klein and Spady.

A linear transformation of a mixture of distributions is used to estimate $F(\cdot)$,
$$\Psi(\cdot\,; a, b, \{\lambda_j, \mu_j\}, \sigma) = a + (b - a) \sum_{j=1}^{k} \lambda_j H\!\left(\frac{\cdot - \mu_j}{\sigma}\right), \qquad (2.1)$$
where $k$ is the number of mixing components, $H(\cdot)$ is a smooth distribution function, $\lambda_j$, $0 \le \lambda_j \le 1$, $\sum_{j=1}^{k} \lambda_j = 1$, is a mixing weight, $\mu_j$ is a translation parameter, $\sigma$ is a scaling constant, $a$ is an intercept term, and $b - a$ is a slope term, $0 \le a \le b \le 1$. By construction, $\Psi$ is a mixture of smooth distributions.⁶

⁶ See Titterington et al. (1985) for an analysis of finite mixtures.
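The estimator in (2.1) is straightforward to evaluate numerically. The sketch below uses the logistic c.d.f. for $H(\cdot)$ purely to keep the example dependency-free; the paper's lead example uses the standard normal c.d.f. instead, and any smooth distribution function satisfying the assumptions below can be substituted.

```python
import numpy as np

def mod_cdf(z, a, b, lam, mu, sigma):
    """Evaluate Psi(z; a, b, {lam_j, mu_j}, sigma) from (2.1):
    a + (b - a) * sum_j lam_j * H((z - mu_j) / sigma).
    H is the logistic c.d.f. here (an illustrative choice of smooth
    distribution function; the paper's example uses the normal c.d.f.)."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    lam, mu = np.asarray(lam, dtype=float), np.asarray(mu, dtype=float)
    # (n_points, k) matrix of component c.d.f. values H((z - mu_j)/sigma)
    H = 1.0 / (1.0 + np.exp(-(z[:, None] - mu[None, :]) / sigma))
    return a + (b - a) * (H @ lam)
```

Because the weights are non-negative and sum to one and each component is an increasing c.d.f., the resulting function is automatically increasing from $a$ to $b$, which is exactly the a priori shape restriction that distinguishes MOD.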

In most parametric applications one would set $a = 0$ and $b = 1$, but as will be shown below, we need this additional parameterization when the support of $x'\beta_0$ is a strict subset of the support of $\epsilon$. This is partly due to the fact that if the support of $x'\beta_0$ is $[-M, M]$, $0 < M < \infty$, and the support of $\epsilon$ is $(-M_0, M_0)$, $M < M_0$, for example, then $F(\cdot)$ will not be identified over the regions $(-M_0, -M)$ and $(M, M_0)$, because there is no information there. The information in this setup comes from the possible values of $-x'\beta_0$.

The idea here is to increase the number of mixture components, $k$, as the sample size, $n$, increases. In this sense, MOD can be viewed as a natural extension of the purely parametric binary response model, where $k$ is held fixed. As $k$ gets larger, the estimator defined in (2.1) becomes more "flexible", enabling it to give a better approximation of the true distribution function $F(\cdot)$. This statistical procedure is called the method of sieves.

Given $(\beta, a, b, \{\lambda_j, \mu_j\}, \sigma)$ and $k$, an estimator for $E[y|x]$ is
$$1 - \Psi(-x'\beta; a, b, \{\lambda_j, \mu_j\}, \sigma) = 1 - \left[ a + (b - a) \sum_{j=1}^{k} \lambda_j H\!\left(\frac{-x'\beta - \mu_j}{\sigma}\right) \right]. \qquad (2.2)$$
Since $E\{y - E[y|x'\beta_0]\} = 0$, we can estimate $\beta_0$ and $F(\cdot)$ by maximizing
$$Q_n(\omega(x; \beta, \Psi)) = -\frac{1}{n} \sum_{i=1}^{n} \left\{ y_i - [1 - \Psi(-x_i'\beta; a, b, \{\lambda_j, \mu_j\}, \sigma)] \right\}^2 \qquad (2.3)$$
with respect to $(\beta, a, b, \{\lambda_j, \mu_j\}, \sigma)$, where $\omega(x; \beta, \Psi) \equiv 1 - \Psi(-x'\beta; a, b, \{\lambda_j, \mu_j\}, \sigma)$.

For a fixed $k$, the method described above is just (misspecified) nonlinear least squares. To ensure consistency, it is necessary that $k \to \infty$. We will also impose additional restrictions, such as placing a bound on the second derivative, which will be described in more detail below. Ichimura (1993) also exploited a similar type of objective function, but in that paper kernels were used to estimate the conditional expectation; hence estimation is performed only over $\beta$, given that the bandwidth parameter is some predetermined sequence. One could view this as advantageous because there are fewer parameters to estimate, but the cost is that the estimator for the distribution function is not necessarily a distribution itself.
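A minimal sketch of the objective (2.3) is given below, under some illustrative implementation choices that are not part of the paper: the constraints on the mixing weights, on $(a, b)$, and on $\sigma$ are enforced by reparameterization (softmax weights, sigmoid transforms for $a \le b$, and a hard lower bound on $\sigma$), and $H(\cdot)$ is again the logistic c.d.f. Any off-the-shelf optimizer can then maximize `qn` over the unconstrained vector.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def unpack(free, k, d, sigma_min):
    """Map an unconstrained vector of length (d-1) + 2k + 3 to sieve-constrained
    parameters: softmax weights on the simplex, 0 <= a <= b <= 1, sigma >= sigma_min.
    (An illustrative reparameterization, not the paper's algorithm.)"""
    theta = free[:d - 1]                                   # index coefficients (beta_1 = 1)
    lam = np.exp(free[d - 1:d - 1 + k]); lam = lam / lam.sum()
    mu = free[d - 1 + k:d - 1 + 2 * k]
    u, v = sigmoid(free[-3]), sigmoid(free[-2])
    a, b = u * v, v                                        # guarantees 0 <= a <= b <= 1
    sigma = sigma_min + np.exp(free[-1])                   # lower bound on the scale
    return theta, lam, mu, a, b, sigma

def qn(free, y, X, k, sigma_min):
    """Sample objective (2.3): minus the mean squared error of y against
    1 - Psi(-x'beta), with logistic H as the smooth mixing distribution."""
    d = X.shape[1]
    theta, lam, mu, a, b, sigma = unpack(free, k, d, sigma_min)
    index = X[:, 0] + X[:, 1:] @ theta                     # x'beta with beta_1 normalized to 1
    H = sigmoid((-index[:, None] - mu[None, :]) / sigma)
    psi = a + (b - a) * (H @ lam)
    return -np.mean((y - (1.0 - psi)) ** 2)
```

The reparameterization keeps every candidate in the sieve by construction, so the optimizer never has to handle the simplex or ordering constraints explicitly; the remaining sieve restriction, the bound on the second derivative, would be handled separately (e.g., by a penalty or a constrained solver).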

A nonlinear least squares type of objective function is preferred to a likelihood-based objective function, such as in Klein and Spady (1993), because (2.3) is bounded without any form of truncation. This makes maximization easier. The method developed here is also robust to certain types of temporal dependency, which will be further noted in Section 4.

The function to be estimated is $\omega_0(x; F, \beta_0) \equiv 1 - F(-x'\beta_0)$, where $\omega_0 \in \Theta_0$, a possibly infinite dimensional parameter space. For notational simplicity, we will usually denote $\Psi(\cdot\,; a, b, \{\lambda_j, \mu_j\}, \sigma)$ and $\omega(\cdot\,; \beta, \Psi)$ as just $\Psi(\cdot)$ and $\omega(\cdot)$. The pseudometric is the $L_2$ norm, $\rho(\omega_1, \omega_2)$, defined as
$$\rho(\omega_1, \omega_2) = \left\{ E[\omega_1(x) - \omega_2(x)]^2 \right\}^{1/2},$$
where the expectation is with respect to the distribution of $x$. Rates of convergence will be determined by the order at which $\rho(\hat\omega_n, \omega_0)$ is bounded in probability, where $\hat\omega_n$ is our estimate of $\omega_0$. In other words, the pseudometric is used as a measure of how close our estimates of $E[y|x]$ are to the conditional expectation itself.

2.1 Identification

Let $F(\cdot) \in \mathcal{F}$, where $\mathcal{F}$ is some family of distributions. As in Cosslett (1983), the model is considered identified if the following definition holds.

DEFINITION 1 Let $\beta_1 \in \Theta$ and $F_1(\cdot) \in \mathcal{F}$. The model is identified if, for all $\beta_1 \in \Theta$ and $F_1(\cdot) \in \mathcal{F}$ such that $F_1(-x'\beta_1) = F(-x'\beta_0)$ for almost all $x$, it follows that $\beta_1 = \beta_0$.

The following assumptions will be used in part to identify both $F(\cdot)$ and $\beta_0$.

ASSUMPTION 1 $\beta_0 = (\beta_{01}, \dots, \beta_{0d})'$, $1 \le d < \infty$, is an element in the interior of $\Theta \subset \Re^d$, where $\Theta$ is compact, $|\beta_{0j}| \le C_0 < \infty$, $j = 1, \dots, d$, and $C_0$ is some constant.

ASSUMPTION 2 The index, $x'\beta$, is known up to the parameter $\beta$.

ASSUMPTION 3 $\{\epsilon_i\}$ is a sequence of i.i.d. random variables with distribution function $F(\cdot)$.

ASSUMPTION 4 $\{x_i\}$, $x_i \in \Re^d$, $1 \le d < \infty$, is a sequence of i.i.d. random vectors with finite support.

ASSUMPTION 5 $x$ and $\epsilon$ are mutually independent.

The first two assumptions are standard for purely parametric models. The assumption that the regressors have finite support is made for simplicity. Assumptions 3 and 5 are used to identify $F(\cdot)$ and $\beta_0$. We know from Cosslett (1983) that, without further restrictions, the constant term (if there is one) is not identified and the slope coefficients are identified only up to scale. As a result, we use the normalization $\beta_{01} = 1$ and, slightly abusing notation, rewrite the index as
$$x'\beta = x_1 + \theta_1 x_2 + \dots + \theta_{d-1} x_d = x_1 + \tilde{x}'\theta.$$

2.2 Sieve Approximation Error

Denote a sequence of spaces $\Theta_1, \dots, \Theta_n$ as approximations to $\Theta_0$, the underlying parameter space. In the next subsection, these parameter spaces will be defined in more detail. Heuristically, the sieve approximation error measures how close $\Theta_n$ is to $\Theta_0$ with respect to the pseudometric $\rho(\cdot, \cdot)$. Formally, suppose that for any $\omega_0 \in \Theta_0$ there exists $\pi_n(\omega_0) \in \Theta_n$ such that $\rho(\pi_n(\omega_0), \omega_0) \to 0$ as $n \to \infty$. The rate at which this tends to zero is the sieve approximation error.

A bound on the sieve approximation error is calculated here by combining results from the kernel and neural network literatures. The calculation itself is quite involved, so the details are provided in the Appendix. However, some of the restrictions imposed in this paper are a direct result of this approximation error, and without some understanding of how it is bounded, the restrictions may appear unnecessary. Two examples already given are the $a$ and $b$ parameters defined in (2.2). With this in mind, a sketch of the main ideas behind the calculation is provided below.

Observe first that the sieve approximation error directly associated with $\beta_0$ is zero, because it will be assumed that $\beta_0$ lies in $\Theta_n$ for all $n$. But this is not the case for general $F(\cdot)$, since we can think of it as depending on an infinite number of mixture terms. Hence, in most

cases, $F(\cdot) \in \Theta_n$ only as $n \to \infty$. As the number of mixture terms increases, there does exist some unknown sequence of distribution functions, $F_k(\cdot)$, depending on $k$ mixture components, such that $F_k(\cdot)$ gets closer to $F(\cdot)$. A lower bound on the rate at which $\rho(F_k(\cdot), F(\cdot)) \to 0$ bounds the sieve approximation error here.

Again denoting the support of $-x'\beta_0$ by $[-M, M]$, suppose that we observe a sequence of i.i.d. random variables, $\{\tilde\epsilon_j\}_{j=1}^{\tilde{k}}$, generated from the conditional distribution of $F(\cdot)$ given $|\epsilon| \le M$. We can then estimate the conditional distribution function, $\tilde{F}(\cdot)$, using the standard Rosenblatt–Parzen type kernel estimator for distribution functions (Reiss, 1981). Denote this kernel estimator for $\tilde{F}(\cdot)$ as $K_{\tilde{k}}(\cdot) = (1/\tilde{k}) \sum_{j=1}^{\tilde{k}} H[(\cdot - \tilde\epsilon_j)/h]$, where $h$ is the bandwidth parameter. Using standard techniques, we first show that $K_{\tilde{k}}(\cdot)$ approximates $\tilde{F}(\cdot)$ at a certain rate under the $L_2$ norm. For reasons described below, we also need to show that the $q$th derivative of $K_{\tilde{k}}(\cdot)$ converges to the $q$th derivative of $\tilde{F}(\cdot)$ under the strong norm, where we will assume that $\tilde{F}(\cdot)$ is $q$ times continuously differentiable. Observe that $\tilde{F}(\cdot)$ is related to $F(\cdot)$ by
$$\tilde{F}(z) = \frac{F(z) - F(-M)}{F(M) - F(-M)}, \qquad -M \le z \le M,$$
or, in terms of $F(\cdot)$,
$$F(z) = [F(M) - F(-M)]\, \tilde{F}(z) + F(-M), \qquad -M \le z \le M.$$
To relate this to the estimator $\Psi(\cdot)$ in (2.1), set $a = F(-M)$, $b = F(M)$, $\lambda_j = 1/\tilde{k}$, $\mu_j = \tilde\epsilon_j$, $\sigma = h$, and $k = \tilde{k}$. Clearly, $a + (b - a) K_{\tilde{k}}(\cdot)$ approximates $F(\cdot)$ on $[-M, M]$ with the same accuracy as $K_{\tilde{k}}(\cdot)$ approximates $\tilde{F}(\cdot)$.

The parameter values for $\{\mu_j\}$ are random, but by an argument similar to Barron's (1993), this is enough to show that there exist nonstochastic values for $\{\mu_j\}$, call them $\{\mu_j^*\}$, such that $(1/\tilde{k}) \sum_{j=1}^{\tilde{k}} H[(\cdot - \mu_j^*)/\sigma]$ has the same approximating properties as $K_{\tilde{k}}(\cdot)$ does in probability. By also allowing the $\{\lambda_j\}$ to vary, we can use a result in Makovoz (1996) to prove that an approximation with the same accuracy as above can be constructed with $k < \tilde{k}$ components, $\sum_{j=1}^{k} \lambda_j H[(\cdot - \mu_j^*)/\sigma]$, where $k$ is considerably smaller than $\tilde{k}$. Define the final approximation to $F(\cdot)$ on $[-M, M]$, $F_k(\cdot) = a + (b - a) \sum_{j=1}^{k} \lambda_j H[(\cdot - \mu_j^*)/\sigma]$, as the sieve approximation, $F_k = \pi_n(F)$. This is a nonstochastic and unknown sequence of functions. Then $[E(F_k - F)^2]^{1/2}$ bounds the sieve approximation error, which will depend on the conditions imposed on $H(\cdot)$ and $F(\cdot)$.
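The Rosenblatt–Parzen type smoothed empirical c.d.f. used in this construction is simple to write down; the sketch below again uses the logistic c.d.f. as the smooth kernel distribution $H(\cdot)$, an illustrative choice rather than the paper's.

```python
import numpy as np

def kernel_cdf(z, eps, h):
    """Smoothed empirical c.d.f. (Rosenblatt-Parzen type, as in Reiss, 1981):
    K(z) = (1/m) * sum_j H((z - eps_j) / h),
    where eps is the observed sample and H is a smooth c.d.f.
    (logistic here, for illustration)."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    eps = np.asarray(eps, dtype=float)
    H = 1.0 / (1.0 + np.exp(-(z[:, None] - eps[None, :]) / h))
    return H.mean(axis=1)
```

Note that `kernel_cdf` is itself a mixture of the form (2.1) with $a = 0$, $b = 1$, equal weights $\lambda_j = 1/m$, locations $\mu_j = \epsilon_j$, and $\sigma = h$, which is exactly the correspondence the approximation argument exploits.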

2.3 Convergence Results

The idea behind the consistency argument is to obtain an estimate of $\omega_0$ on a bounded parameter space, $\Theta_n$, called a sieve. Even in the parametric case, theory often requires that estimation be performed over a compact parameter space (e.g., Amemiya, 1984, Theorem 4.1.1). The difference here is that the parameter space depends on $n$, which is a result of estimating $F(\cdot)$, the nonparametric component. For finite $n$, the true parameter space, $\Theta_0$, is generally too large to obtain consistent estimates over it. As a result, estimation is performed over $\Theta_n$, which is a more manageable parameter space. We require that $\Theta_n$ be a "reasonable" approximation to $\Theta_0$, so that in the limit the sieve is dense in the true parameter space, $\lim_{n\to\infty} \Theta_n \supseteq \Theta_0$. In other words, the sieve approximation error must tend to zero, which means that the sieve approximation must lie in $\Theta_n$ for all large $n$. Finally, we need to control the "size" of $\Theta_n$, where size is defined here as the metric entropy, the log of the minimum number of $\epsilon$-balls it takes to cover $\Theta_n$. The metric entropy plays an integral role in deriving rates of convergence.⁷ The assumptions below are used, in part, to construct a sieve with these desirable properties.

Denote the $s$th derivative of a function $G(x)$, $(\partial^s/\partial x^s) G(x)$, as $G^{(s)}(x)$, where $G^{(0)}(x) = G(x)$.

ASSUMPTION 6 $F(\cdot)$ is twice continuously differentiable, $|F^{(s)}(\cdot)| \le M_s < \infty$, $s = 1, 2$, and $\int |F^{(2)}(\epsilon)|\, d\epsilon < \infty$.

ASSUMPTION 7 $H(\cdot)$ in (2.1) is a four times continuously differentiable increasing function such that i) $\lim_{\epsilon\to-\infty} H(\epsilon) = 0$ and $\lim_{\epsilon\to\infty} H(\epsilon) = 1$; ii) $|H^{(s)}(\cdot)| \le C_s < \infty$, $s = 1, \dots, 4$; iii) $\int \epsilon H^{(1)}(\epsilon)\, d\epsilon = 0$; iv) $\int \epsilon^2 H^{(1)}(\epsilon)\, d\epsilon < \infty$; and v) $|\epsilon H^{(1)}(\epsilon)| \to 0$ as $|\epsilon| \to \infty$, where the $C_s$'s are constants.

ASSUMPTION 8 Denote by $r(\cdot)$ the density of $-x'\beta_0$. Then there exist constants $c_1, c_2, c_3, c_4$ such that $\int_{c_1}^{c_2} r(z)\, dz = 1$, $\inf_{c_1 \le z \le c_2} r(z) \ge c_3 > 0$, and $\sup_{c_1 \le z \le c_2} r(z) \le c_4 < \infty$.

⁷ See Alexander (1984), Section 3, for a good example of how the metric entropy is used in deriving rates of convergence.

Placing a smoothness requirement on $F(\cdot)$, as in Assumption 6, is common in nonparametric curve estimation. The added requirement that $\int |F^{(2)}(\epsilon)|\, d\epsilon < \infty$ is used to show that the second derivative of the sieve approximation converges uniformly to $F^{(2)}(\cdot)$, which helps in bounding the metric entropy. As Kolmogorov and Tikhomirov (1959) have shown, the metric entropy of a class of twice continuously differentiable functions on a bounded support is of order $\epsilon^{-1/2}$. Assumption 7 is similar to conditions in the kernel distribution literature (see Reiss, 1981); the normal distribution, for example, satisfies these requirements. We also place restrictions on the third and fourth derivatives, which are likewise used to show that the second derivative of our sieve approximation converges uniformly to $F^{(2)}(\cdot)$. Variants of Assumption 8 are common in single index settings because bounding the density of $-x'\beta_0$ from above and below makes some of the proofs less cumbersome.⁸ The conditions are primarily used here in the sieve approximation error calculation, which requires a Gabushin (1967) type bound (see the Appendix). Gabushin's result bounds derivatives of functions in terms of the function itself and a higher derivative. The problem is that this bound uses the non-weighted $L_2$ norm,
$$\left[ \int_{c_1}^{c_2} [\omega_1(z) - \omega_2(z)]^2\, dz \right]^{1/2},$$
so we use Assumption 8 to relate it to the weighted $L_2$ norm used in the sieve approximation error calculation,
$$\left[ \int_{c_1}^{c_2} [\omega_1(z) - \omega_2(z)]^2\, r(z)\, dz \right]^{1/2}.$$
The assumption could be weakened, as in Gallant and Nychka (1987), at the expense of making the proofs much more tedious.

The next definition formally defines the sieve by restricting the class of functions in (2.2).

⁸ For a concrete example, suppose $\tilde{x}'\theta_0 = (j-1)/J$, $j = 1, \dots, J+1$, $J < \infty$, with probability $p_j > 0$, and $x_1$ is distributed independently of $\tilde{x}'\theta_0$ with density $r_1(x_1)$, $x'\beta_0 = x_1 + \tilde{x}'\theta_0$. Let $0 < a_1 \le r_1(x_1) \le A_1 < \infty$ for all $x_1$, and let $r_1(\cdot)$ have support on $[0, b_1]$ (e.g., uniformly distributed on $[0, b_1]$). The density of $x'\beta_0$ is $\sum_j p_j\, r_1(x'\beta_0 - (j-1)/J)$. Hence the density of $-x'\beta_0$, $r(\cdot)$, has support on $[-b_1 - 1, 0]$. If $b_1 \ge 1/J$, then the conditions in Assumption 8 are satisfied with $c_1 = -b_1 - 1$, $c_2 = 0$, $c_3 = a_1 \min_j p_j$, and $c_4 = A_1$. If $b_1 < 1/J$, then the conditions are not satisfied, because $\inf r(\cdot) = 0$. As another example, suppose $\tilde{x}'\theta_0$ is uniformly distributed on $[-1, 1]$, and given $\tilde{x}'\theta_0$, $x_1$ is distributed as a standard truncated normal with values greater than $b_1 - \tilde{x}'\theta_0$ or smaller than $-b_1 - \tilde{x}'\theta_0$ truncated, where $b_1 > 1$. Then the density of $-x'\beta_0$ has support on $[-b_1, b_1]$ and $c_3 = [\Phi(b_1 + 1) - \Phi(b_1 - 1)]/\{2[\Phi(b_1) - \Phi(-b_1)]\}$, where $\Phi(\cdot)$ is the c.d.f. of the standard normal.

DEFINITION 2 The sieve, $\Theta_n$, is defined as the set of functions
$$\{\omega(x; \beta, \Psi)\} = \{1 - \Psi(-x'\beta; a, b, \{\lambda_j, \mu_j\}, \sigma)\} \qquad (2.4)$$
subject to
i) $k = O(n^{\kappa})$;
ii) $|\theta_j| \le C_\theta$, $j = 1, \dots, d-1$;
iii) $0 \le a < b \le 1$;
iv) $\sum_{j=1}^{k} \lambda_j = 1$, $0 \le \lambda_j \le 1$, $j = 1, \dots, k$;
v) $C_L \le \mu_j \le C_U$, $j = 1, \dots, k$;
vi) $c\, n^{-\gamma} \le \sigma \le C_\sigma$, $0 < c \le C_\sigma < \infty$;
vii) $\sup_{C_L \le z \le C_U} |\Psi^{(2)}(z)| \le C^{(2)}$;
where $\kappa, \gamma > 0$ will be determined below and $C_\theta \ge C_0$, $C_L < c_1$, $C_U > c_2$, $C^{(2)} \ge M_2$ are finite constants, with $C_0$, $c_1$, $c_2$, $M_2$ defined as in Assumptions 1, 8, and 6.

The constraints imposed on the sieve ensure that it is dense in $\Theta_0$. The restrictions on $\Psi^{(2)}(\cdot)$ directly bound the order of the metric entropy by $\epsilon^{-1/2}$. The bounds above are assumed to be constant, but this is not necessary; we could instead have them increase slowly with the sample size, at the cost of considerable notational complexity.⁹ We are now ready to state the main theorem.

THEOREM 1 Suppose that Assumptions 1–8 hold and that the model is identified. Let $\hat\omega_n = 1 - \hat\Psi(-x'\hat\beta_n) = 1 - \Psi(-x'\hat\beta_n; \hat{a}, \hat{b}, \{\hat\lambda_j, \hat\mu_j\}, \hat\sigma)$ be the estimate of $\omega_0$ that maximizes $Q_n(\omega)$ defined in (2.3) subject to the sieve defined in (2.4). For some arbitrarily small $\delta > 0$, put
$$\kappa = \frac{7}{15(1 - \delta)} \quad \text{and} \quad \gamma = \frac{1 - \delta/2}{5(1 - \delta)}.$$

⁹ See Shen and Wong (1994) for the necessary adjustments.

Then
$$\rho(\hat\omega_n, \omega_0) = O_p\!\left(n^{-2/5}\right).$$

This theorem states that the estimate of $E[y|x]$, $\hat\omega_n$, converges in probability at rate $n^{-2/5}$ under the $L_2$ norm. Stone (1982) has shown that this is the optimal rate for univariate functions that are twice continuously differentiable. Faster rates of convergence are possible if one assumes that $F(\cdot)$ is more than twice continuously differentiable. This requires extending Assumptions 6 and 7, and as a consequence, MOD unfortunately will no longer necessarily be monotonic. A related problem, namely that the estimator is no longer necessarily positive, occurs in standard nonparametric kernel density estimation when higher order kernels are used. Again, optimal rates of convergence are possible under the $L_2$ norm.

ASSUMPTION 9 $F(\cdot)$ is $q$ times continuously differentiable, $|F^{(s)}(\cdot)| \le M_s < \infty$, $s = 1, \dots, q$, $q \ge 3$, and $\int |F^{(q)}(\epsilon)|\, d\epsilon < \infty$.

ASSUMPTION 10 $H(\cdot)$ in (2.1) is $q + 2$ times continuously differentiable such that i) $\lim_{\epsilon\to-\infty} H(\epsilon) = 0$ and $\lim_{\epsilon\to\infty} H(\epsilon) = 1$; ii) $|H^{(s)}(\cdot)| \le C_s < \infty$, $s = 1, \dots, q+2$; iii) $\int \epsilon^s H^{(1)}(\epsilon)\, d\epsilon = 0$, $s = 1, \dots, q-1$; iv) $\int |\epsilon^q H^{(1)}(\epsilon)|\, d\epsilon < \infty$; and v) $|\epsilon^s H^{(s)}(\epsilon)| \to 0$ as $|\epsilon| \to \infty$, $s = 1, \dots, q-1$, where the $C_s$'s are constants.

As in Assumption 7, the restrictions imposed on $H(\cdot)$ in Assumption 10 are reasonable, so the usual methods for constructing higher order kernels can be used. The sieve restriction (2.4 vii) must also be adjusted: replace $\sup_{C_L \le z \le C_U} |\Psi^{(2)}(z)| \le C^{(2)}$ with
$$\sup_{C_L \le z \le C_U} |\Psi^{(q)}(z)| \le C^{(q)},$$
where $C^{(q)} \ge M_q$ and $M_q$ is as in Assumption 9.

COROLLARY 1 Suppose that the same setup as in Theorem 1 holds, with Assumptions 6 and 7 replaced by Assumptions 9 and 10. Put
$$\kappa = \frac{2q + 3}{3(2q + 1)(1 - \delta)} \quad \text{and} \quad \gamma = \frac{1 - \delta/2}{(2q + 1)(1 - \delta)}.$$

Then
$$\rho(\hat\omega_n, \omega_0) = O_p\!\left(n^{-q/(2q+1)}\right).$$

In some cases, such as estimation of quantiles, a stronger norm may be desired. The next result states the convergence rates of the estimates of $F(\cdot)$ and its derivatives, $\hat\Psi^{(s)}(\cdot)$, $s = 0, \dots, q-1$, under various norms. The convergence rate of
$$\left[ \int_{c_1}^{c_2} \left[ \hat\Psi(z) - F(z) \right]^2 r(z)\, dz \right]^{1/2}$$
is optimal, but the other rates, in general, are not.

COROLLARY 2 Given the conditions in Theorem 1 or Corollary 1, if $j \ge 2q/(q - i)$ for some integers $j$ and $i$, then
$$\left\| \hat\Psi^{(i)}(-x'\beta_0) - F^{(i)}(-x'\beta_0) \right\|_j = O_p(n^{-\nu}), \qquad \nu = \frac{2q\,(q - i + 1/j)}{(2q + 1)^2},$$
where $0 \le i \le q - 1$, $\|\cdot\|_j = (E|\cdot|^j)^{1/j}$, and $j = \infty$ is the strong norm ($q = 2$ with regard to Theorem 1).

2.4 Asymptotic Normality

A desirable feature of any semiparametric estimator is an asymptotic distributional result for the estimate of the parametric component. This is shown here by verifying the conditions in Shen (1997). In this subsection, let $\omega = (\theta, 1 - \Psi(-x'\beta_0))$, and define $\omega_0$, $\Theta_n$, and $\Theta_0$ analogously. The key is to choose an appropriate inner product (recall that $x = (x_1, \tilde{x}')'$),
$$\langle \omega_1, \omega_2 \rangle = E\left\{ \left[\tilde{x}'\theta_1 f(-x'\beta_0) + (1 - \Psi_1(-x'\beta_0))\right] \left[\tilde{x}'\theta_2 f(-x'\beta_0) + (1 - \Psi_2(-x'\beta_0))\right] \right\}.$$
Define the norm $\|\cdot\|$ such that $\langle \omega, \omega \rangle = \|\omega\|^2$. A main step in the proof is to find a $v^*$ such that
$$(\theta - \theta_0)'\tau = \langle \omega - \omega_0, v^* \rangle$$

17 for all! 2 n, where is an arbitrary unit vector in < d?1. By the Riesz representation theorem, v 2 V exists, where V is the completion of V, the space spanned by 0?! 0. Similar to Example 2 in Shen (1997) and Case 2.2 in Chen and Shen (1998), v =?1 ;?(?1 ) 0 f(?x 0 0 )E[~xj x 0 0 ] if f(?x 0 0 )E[~xj x 0 0 ] is smooth enough, where E n f(?x 0 0 ) 2 [~x? E(~xj x 0 0 )] [~x? E(~xj x 0 0 )] 0o : The next assumption guarantees that v 2 V, and it is in the same spirit as in Shen (1997) Example 2. ASSUMPTION 11 f(?x 0 0 )E[~xj x 0 0 ] is at least as smooth as F (?x 0 0 ). Unfortunately, Assumption 11 imposes a rather high-level condition. It is satised if E[~xj? x 0 0 ] = G(?x0 0 ) f(?x 0 0 ) ; where G() is some very smooth function. 10 Another case of when it will hold is if E[~xjx 0 0 ] and f(?x 0 0 ) are innitely times dierentiable with respect to?x 0 0. normal distribution is innitely times dierentiable. For example, the THEOREM 2 Suppose that the conditions in Theorem 1 or Corollary 1 hold and that Assumption 11 is satised. If is positive denite and x 1 is not a function of ~x 0 0, then p n(^n? 0 )! N(0;?1 J?1 ) in distribution, where J = EfF (?x 0 0 )[1? F (?x 0 0 )]f(?x 0 0 ) 2 [~x? E[~xjx 0 0 ]][~x? E[~xjx 0 0 ]] 0 g: As in Klein and Spady (1993) and Ichimura (1993), estimation of the covariance matrix can be performed in the usual way. The conditional expectations piece can be estimated by standard regression ernels with the order of the bandwidth set to n?1=3. This follows by the uniform convergence results in Klein and Spady. 10 As an example of when this might hold, let d = 2 and 0 = 1 for simplicity. Dene r(x 1 ; z) and r(z) as the respective joint and marginal densities, where z =?x 1?x 2. Suppose that on [c 1 ; c 2 ], r(z) is proportional to f(z). Then f(z)e[x 2 jz] = c R r(x 1 ; z) dx 1 for some positive constant c. In this case, Assumption 11 will be satised if r(; z) is very smooth with respect to z. 15

3 Simulations

The primary purpose of this section is to show that MOD performs reasonably well. The experiment is performed 1000 times at each of two sample sizes: n = 250 and n = 1000. There are two regressors, x_1 and x_2, with index x_1 + \beta_0 x_2, \beta_0 = 1, where the support of x_1 + \beta_0 x_2 is [-10, 10]. The errors are distributed as one of two mixtures of independent gammas. It is often the case in simulations that a mixture of normals is used instead, but that would not be a meaningful comparison here, since we will be using a type of mixture of normals as one of our estimators. Denote by G(\kappa, \theta) a gamma distribution with shape parameter \kappa and scale parameter \theta, by G(-\kappa, \theta) the distribution whose density is that of G(\kappa, \theta) reflected about the y-axis, and by [G(\kappa, \theta) + \mu] the distribution whose density is that of G(\kappa, \theta) shifted to the right by \mu. The error distributions considered in this experiment are

G_1 \equiv \pi_1 [G(6, 0.15) - 0.5] + \pi_2 [G(-6, 0.15) + 0.5] + \pi_3 [G(8, 1) - 2] + \pi_4 [G(-8, 1) + 2]

and

G_2 \equiv \pi_1 [G(4, 0.45) + 1] + \pi_2 [G(-4, 0.45) + 5] + \pi_3 [G(4, 0.75) - 1] + \pi_4 [G(-4, 0.75) - 5],

where the mixing weights \pi_j sum to one (the individual weights, fractions in eighths, are not legible in the source). The first mixture is symmetric and trimodal, and its form is not that atypical, whereas the shape of the second distribution is nonstandard. Even though it is unlikely that G_2 occurs in an economic setting, it is nonetheless a good test; if MOD performs well under this distribution, then it is likely to perform just as well or better under more "reasonable" distributions. Both G_1 and G_2 are two times continuously differentiable, their second derivatives are bounded by 0.5 and 0.7, respectively, in absolute value, and their first four moments are 0.0, 22.15, 0.0, ..., and 0.0, 22.60, ..., ..., respectively.

The first regressor, x_1, is uniformly distributed on [-1, 1], and the second regressor, x_2, is a truncated normal with support [-9, 9]. Let TN(0, \sigma^2, tr) represent a normal distribution with mean zero and variance \sigma^2, where values greater than tr in absolute value have been truncated. Then x_2 is distributed as either TN(0, 9^2, 9) or TN(0, 4.5^2, 9). Typically, the nonparametric component at any given point will only be estimated well if there are many observations around that point. Since the distribution function is defined over the range [-10, 10], estimates of it near the tails will only do well if the mass there, with respect to the distribution of -x'\beta_0, is large. To get a feel for the sensitivity of this, results for two different distributions are provided, with the latter, TN(0, 4.5^2, 9), having less mass in the tails.[11] In all, four models are studied,

Model 1: x_2 ~ TN(0, 9^2, 9), G_1;
Model 2: x_2 ~ TN(0, 4.5^2, 9), G_1;
Model 3: x_2 ~ TN(0, 9^2, 9), G_2;
Model 4: x_2 ~ TN(0, 4.5^2, 9), G_2;

across four different estimators. The estimators are the standard probit, two variants of MOD, and Ichimura's (1993) estimator. In all cases, the estimate of \beta_0 was restricted to be less than 10 in absolute value. For the probit model, the estimator is \Phi[(x_1 + \beta x_2 - \mu)/\sigma], where \mu, \sigma, and \beta are parameters to be estimated and \Phi(\cdot) is the c.d.f. of the standard normal.

The first version of MOD, called MOD-1, sets H(\cdot) as the standard normal distribution. As a practical matter, there are two crucial choices one must make before implementing MOD: a lower bound for \sigma and an upper bound on the second derivative of our estimator. Both of these bounds will be obtained from the results of a probit model. Define the \sigma estimated from the probit model as \hat{\sigma}_p. A lower bound for \sigma with respect to MOD-1 was set here as

min{ 0.5, 0.5 \hat{\sigma}_p n^{-0.2} },[11]

where the lower bound must decrease at a rate of about n^{-0.2} by the results from the last section.[12] One can use any positive constant besides 0.5 in the above calculation (since for large n it becomes irrelevant), but it is a reasonable upper bound to the lower bound because it deviates only slightly from the case of standard normality. Likewise, noting that the bound on the second derivative with regard to the probit model is exp(-0.5)(\hat{\sigma}_p^2 \sqrt{2\pi})^{-1}, we set the bound on the second derivative of MOD as

C^{(2)} \equiv min{ \frac{exp(-0.5)}{\sqrt{2\pi}} max[ 0.5^{-2}, (0.5 \hat{\sigma}_p n^{-0.2})^{-1.5} ], 1000 },

where 1000 is some predetermined upper bound.[13] The latter bound is constructed so that for moderate sample sizes, the lower bound on the second derivative, (\hat{\sigma}_p n^{-0.2})^{-1.5}, is decreasing at a slower rate than the lower bound under normality, (\hat{\sigma}_p n^{-0.2})^{-2}. This can be viewed as a transition to the case of larger sample sizes, where the derivative bound will be fixed at 1000, whereas the lower bound for \sigma keeps decreasing with n. In order to satisfy the second-derivative bound, we impose the following constraints,[14]

\frac{b - a}{\sigma^2} \sum_{j=1}^{K} \pi_j | H^{(2)}( (z_{j,l} - \mu_j)/\sigma ) | <= C^{(2)}, l = 1, 2; z_{j,1} = \mu_j - \sigma, z_{j,2} = \mu_j + \sigma.

Observe that |H^{(2)}[(z - \mu_j)/\sigma]| obtains a maximum value at \mu_j - \sigma and \mu_j + \sigma. These bounds do not guarantee that the overall second derivative is less than C^{(2)}, since we are only constraining the derivative at certain points. Nonetheless it works well, especially when \sigma is small, because the normal has exponential tails. After estimation, it was checked whether the overall derivative bounds were less than 1.5 C^{(2)}, and in all cases they were. Hence, with some abuse of notation, denote the overall derivative bound as 1.5 C^{(2)}.

[11] The distribution of the regressors was chosen here because both the uniform and the normal are commonplace in simulation studies. Hence this makes the analysis of the estimates of the error density and distribution easier. However, note that the density of z = -x_1 - x_2, r(z), when x_2 ~ TN(0, 9^2, 9), is [\Phi(min{z+1, 9}/9) - \Phi(max{z-1, -9}/9)] / {2[\Phi(1) - \Phi(-1)]} for -10 <= z <= 10. This implies that r(-10) = r(10) = 0, a violation of Assumption 8. The condition that the density has to be positive was only made for technical convenience, as stated earlier. Perhaps more problematic, in terms of the theory, is that the density is not differentiable at z = -8 and z = 8, though it is infinitely differentiable elsewhere. Thus Assumption 11 will also be violated. Nonetheless, MOD performing well under these conditions will provide some evidence of its robustness.

[12] The upper bound for \sigma was set to 50. The average lower bound across the models and sample sizes was fairly constant, taking on a value of about 0.47 in each case. The median estimate of \hat{\sigma}_p ranged from 4.5 to 5.0. In the larger sample size, the lower bound for \sigma was less than 0.5 about a third of the time, and likewise it occurred about a quarter of the time in the smaller sample size.

[13] Like the lower bound on \sigma, the average upper bound on the second derivative is fairly constant, with a value of about 1.0 in each case.

[14] In practice, these constraints may not be necessary and could instead be checked after optimization, but in a large-scale simulation they speed up the exercise. For example, one could estimate a range of mixtures, K = 1, ..., \bar{K}, without imposing the derivative bounds in the computer program itself, and select the largest model, in terms of K, that satisfies the derivative bounds, where the bounds are checked after estimation. This is much easier to program.
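The two practical bounds above can be computed directly. The sketch below is a hedged illustration, assuming the bound formulas as reconstructed here (min{0.5, 0.5 \hat{\sigma}_p n^{-0.2}} for the \sigma lower bound, and the capped second-derivative bound C^{(2)}); `mod_bounds` is an illustrative name and \hat{\sigma}_p comes from a first-stage probit.

```python
import math

def mod_bounds(sigma_hat_p, n, c=0.5, deriv_cap=1000.0):
    """Practical bounds for MOD-1, both driven by a first-stage probit sigma.
    Returns (lower bound for sigma, upper bound C2 on the second derivative)."""
    sigma_lb = min(c, c * sigma_hat_p * n ** -0.2)   # decreases at rate n^(-0.2)
    # max |d^2/dz^2 Phi((z - mu)/s)| = exp(-0.5) / (s^2 * sqrt(2*pi)), so the
    # bound plugs candidate scales into that expression and caps the result
    peak = math.exp(-0.5) / math.sqrt(2.0 * math.pi)
    c2 = min(peak * max(c ** -2.0, (c * sigma_hat_p * n ** -0.2) ** -1.5), deriv_cap)
    return sigma_lb, c2
```

For \hat{\sigma}_p near the reported median of 4.5 to 5.0, the resulting C^{(2)} is close to 1.0, which is consistent with footnote 13.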

The last step is to determine the number of terms, K, to use. The results from the last section state that it increases at an order of about n^{7/15}, but this leaves a wide range of choices. Even in moderately large samples, setting K = n^{7/15} seems unreasonably ambitious. Unfortunately, a trial of model selection procedures, such as BIC and a variant of generalized cross-validation, did not work very well, since they tended to under-predict the optimal number of mixtures (i.e., the number of mixtures that minimized the pseudometric on a per-sample basis). However, for moderately large K, the objective function did not perceptibly decrease as K increased. The reason is that the model is nested in terms of K, and if K was too large, the optimizer converged to a smaller model by either setting some of the \pi_i's to zero and/or by equating some of the \mu_i's.[15] So the key is to find a large enough K. Given that this is extremely time consuming if done per sample, we instead did a search over the first ten samples and chose K large enough so that in each sample the same result could have been achieved with a smaller number of terms. For n = 250, K was chosen as seven across all models; for n = 1000, K was set to nine for the first two models and to eight for the remaining two.[16]

Except for the nonlinear constraints, the optimization problem is the same as standard nonlinear least squares. There are a variety of computer packages that handle these types of constraints, and the one used here is NPSOL by Gill et al. (1986).[17]

As a means of comparison, we also estimated the model using Ichimura's (1993) approach, where the kernel is defined as

k(x) = 0 if |x| >= 1, and k(x) = 35(1 - x^2)^3 / 32 if |x| < 1,

which is constructed to be twice continuously differentiable and nonnegative everywhere. This kernel is similar to the one used in Lee (1995), who examined the simulated performance of \hat{\beta}_n under a closely related model. If the bandwidth is of order n^{-0.2}, then the distribution function converges at a rate of n^{-0.4}, which is the same as MOD. The rate n^{-0.2} gives little guidance for actual implementation, so the approach taken here is the same as in the empirical work of Stern (1996), who treated the bandwidth as a parameter to be estimated along with \beta_0. We have restricted the bandwidth to lie in [1 n^{-0.2}, 15 n^{-0.2}], where the constants 1 and 15 encompass the range used by Lee (1995). The Simplex method, outlined in Press et al. (1994), is used for optimization, as in Lee (1995).

To show that the results are not due to differences in kernels (i.e., H^{(1)}(\cdot) versus k(\cdot)), even though k(\cdot) is very similar ("bell shaped") to a normal density with a variance of 0.12, we used the following alternative function for H(\cdot), which is just H(x) = \int_{-1}^{x} k(\xi) d\xi,

H(x) = 0 if x <= -1; H(x) = 35(x - x^3 + 3x^5/5 - x^7/7)/32 + 0.5 if |x| < 1; H(x) = 1 if x >= 1,

and this model is called MOD-2. This function is only three times continuously differentiable, whereas the theory from the last section requires H(\cdot) to be four times continuously differentiable. It is easy to construct a kernel with bounded support satisfying the stronger smoothness conditions, but for comparison purposes the function above is used. The analogous strategy for setting and imposing the bounds, as in the case of MOD-1, is adopted here.

Because E[y|x] and (\partial/\partial x) E[y|x] depend on F(\cdot), f(\cdot), and \beta_0, only results for \hat{\Phi}_K(\cdot), \hat{\Phi}_K^{(1)}(\cdot), and \hat{\beta}_n are reported. A summary of the results for the distribution and density estimates is reported in Table 1, where \Delta(\hat{\Phi}_K, F) and \Delta(\hat{\Phi}_K^{(1)}, f) are the corresponding average (across the 1000 simulations) L_2 distances for the estimates of the distribution and density (i.e., \Delta(\hat{\Phi}_K, F) \equiv { \sum_{j=1}^{1000} \int_{-10}^{10} [\hat{\Phi}_{K,j}(z) - F(z)]^2 r(z) dz / 1000 }^{1/2}, where \hat{\Phi}_{K,j} is the estimate with respect to the jth simulation).

[15] Both Geman and Hwang (1983) and Heckman and Singer (1984) also commented on this result, with the latter paper calling it the "clustering phenomenon".

[16] The theory in the last section is for the case when K is a predetermined sequence such as K = \lfloor n^{7/15} \rfloor, where \lfloor \cdot \rfloor denotes the largest integer no bigger than its argument, but the theory easily extends to the case where the search is done over the set {max(\lfloor n^{7/15} \rfloor - C_1, 1), ..., min(\lfloor n^{7/15} \rfloor + C_2, \lfloor n \rfloor)}, where C_1 and C_2 are positive constants. The reason the theory carries over is that the order of the metric entropy remains the same.

[17] Optimization was performed on a Sun Ultra 2, and each run with a sample size of 1000 observations and K = 9 took about a minute to converge.
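The closed form for H(\cdot) above can be checked against a direct numerical integral of the kernel. The sketch below is an illustration only; the names `k_tri` and `H2` are not from the paper.

```python
def k_tri(x):
    """The bounded-support kernel from the text: 35(1-x^2)^3/32 on |x| < 1, else 0."""
    return 35.0 * (1.0 - x * x) ** 3 / 32.0 if abs(x) < 1.0 else 0.0

def H2(x):
    """MOD-2's H(x): the closed-form integral of k_tri from -1 to x."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 35.0 * (x - x**3 + 3.0 * x**5 / 5.0 - x**7 / 7.0) / 32.0 + 0.5
```

H2 is monotone with H2(0) = 0.5, and a Riemann sum of k_tri reproduces it, confirming the polynomial coefficients.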
A summary of the results for the coefficient estimates is provided in Table 2, where \hat{\beta}_n, SD, and ASE are the average estimate, the standard deviation of the estimates, and the average of the estimated asymptotic standard errors, respectively. For both MOD models, E[x_2 | x_1 + \beta x_2] was estimated using standard kernel regression with a normal density kernel and a bandwidth set to n^{-1/3}. In Table 2, results for the exact MLE (i.e., the constant term and the slope coefficient are the only parameters being estimated) are also given as a point of comparison. Not surprisingly, the efficiency loss of the semiparametric methods relative to the exact MLE is quite considerable. Plots of the average estimates of the distribution and density and their corresponding standard deviations are given in Figures 1 and 2 for Models 1 and 3.[18] Finally, to get a sense of the distribution of the estimated coefficients, some histograms are provided in Figure 3.

Overall, MOD performs very well, especially in the case of 1000 observations, where it estimates the distribution and \beta_0, relative to the probit and Ichimura's model, with good accuracy. It is surprising how similar MOD-1 and MOD-2 are, the only noticeable difference being in the standard deviations and standard errors for a sample size of 250 observations. MOD clearly outperforms the probit model by all measures in the case of the larger sample size. In the case of the smaller sample size, the only negative outcome is that there is little improvement (in L_2) in the estimates of the density. This should not be surprising, however, since it is typical in nonparametric estimation that derivatives are estimated less precisely. To compound matters here, we only observe an indicator, versus a continuum of values, which is very little information. The similarities between MOD's performance in Models 1 and 2, as well as in Models 3 and 4, suggest that the results are not overly sensitive to the underlying distribution of -x'\beta_0. The main reason the analogous pseudometrics are sometimes smaller in Models 2 and 4 is that the density of -x'\beta_0 under this second regime is smaller in the tails, and this is where much of the error lies. As expected, MOD does not estimate the second error distribution, G_2, as well as the first. In the case of such a complicated function, it appears that we need a relatively large sample size before we are able to obtain good estimates of it.
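The tail-mass point above can be made concrete by evaluating the density r(z) of z = -x_1 - x_2 given in footnote 11 (for the case x_2 ~ TN(0, 9^2, 9)). The sketch below simply transcribes that formula and checks that it behaves like a density; the function names are illustrative.

```python
import math

def Phi(t):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def r(z):
    """Density of z = -x1 - x2 with x1 ~ U[-1,1] and x2 ~ TN(0, 9^2, 9),
    as given in footnote 11; zero outside [-10, 10]."""
    if abs(z) > 10.0:
        return 0.0
    num = Phi(min(z + 1.0, 9.0) / 9.0) - Phi(max(z - 1.0, -9.0) / 9.0)
    return num / (2.0 * (Phi(1.0) - Phi(-1.0)))
```

Note that r(-10) = r(10) = 0 and that r is small near the endpoints, which is exactly why tail estimates of F and f carry little weight in the L_2 pseudometric.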
Even though probit does a better job of estimating the density in L_2, it misses, on average, important shape characteristics, such as the mass to the right of center and the fact that the mass at the left tail is larger than at the right tail. The histograms provided in Figure 3 give some visual evidence that the distribution of the estimates is approximately normal for moderately large sample sizes. As n increases, the sampling distribution appears more "bell shaped", even though it is still asymmetric.

Somewhat surprising is the performance of Ichimura's estimator. Except for the estimates

[18] MOD-2 is not shown since its averages are almost identical to those of MOD-1. Models 2 and 4 are similar to Models 1 and 3, and hence omitted.

Table 1: Average L_2 Distances. Columns: \Delta(\hat{\Phi}_K, F) and \Delta(\hat{\Phi}_K^{(1)}, f), each for n = 1000 and for n = 250. Rows: Models 1-4, each with the Probit, MOD-1, Ichimura, and MOD-2 estimators. (The numerical entries are not legible in the source.)

Table 2: Coefficient Estimates. Columns: \hat{\beta}_n, SD, and ASE, each for n = 1000 and for n = 250. Rows: Models 1-4, each with the Probit, MOD-1, Ichimura, MOD-2, and MLE estimators. (The numerical entries are not legible in the source.)

of \beta_0, there is little improvement over the standard probit case. It should be kept in mind that the averages of the distributions and densities in Figures 1 and 2 can be somewhat misleading for this estimator, since the non-monotonic regions, which can be very troublesome on a per-sample basis, get averaged out across the 1000 simulations. The estimates of the density can also be quite jagged, which is evident from the plots of the standard deviations. In summary, MOD performs very well relative to both probit and Ichimura's model, especially when the errors are distributed under G_1.

Given that the support of -x'\beta_0 was fixed at [-10, 10], the variances of both the errors and -x'\beta_0 were constructed to be large so that the results here would be meaningful. For example, if x_2 ~ TN(0, 1, 9), then P(|x'\beta_0| > 5) is approximately zero. This causes two problems for analysis: 1) in the L_2 calculations, the tail estimates of the distribution and density are relatively unimportant, since the density of -x'\beta_0 is close to zero there; 2) tail estimates will be very imprecise, since there is almost no information (i.e., observations) there. In practice, therefore, the researcher should also estimate the density of -x'\beta_0 and use caution with estimates of F(\cdot) and f(\cdot) where the estimated density of -x'\beta_0 is small.

4 Extensions

A useful extension is an ordered probit type model. For example, suppose that one observes

y_i = 1 if y_i^* > C_i; y_i = 0 if c_i <= y_i^* <= C_i; y_i = -1 if y_i^* < c_i,

where {c_i, C_i} is a known sequence of constants. Then

E[y_i | x_i] = -1 \cdot P(y_i^* < c_i | x_i) + 1 \cdot P(y_i^* > C_i | x_i) = 1 - F(c_i - x_i'\beta_0) - F(C_i - x_i'\beta_0).

Given this moment, the problem is in the same format as in Section 2. It is also noted here that MOD can estimate a general class of univariate monotone functions, or monotone functions that satisfy an index type restriction. To do this, simply relax the bounds on a and b in (2.4). As an example, suppose you want to estimate g(x),


ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor guirregabiria SOLUTION TO FINL EXM Monday, pril 14, 2014. From 9:00am-12:00pm (3 hours) INSTRUCTIONS:

More information

CV-NP BAYESIANISM BY MCMC. Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUEZ

CV-NP BAYESIANISM BY MCMC. Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUEZ CV-NP BAYESIANISM BY MCMC Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUE Department of Mathematics and Statistics University at Albany, SUNY Albany NY 1, USA

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

1/sqrt(B) convergence 1/B convergence B

1/sqrt(B) convergence 1/B convergence B The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been

More information

Likelihood Ratio Tests and Intersection-Union Tests. Roger L. Berger. Department of Statistics, North Carolina State University

Likelihood Ratio Tests and Intersection-Union Tests. Roger L. Berger. Department of Statistics, North Carolina State University Likelihood Ratio Tests and Intersection-Union Tests by Roger L. Berger Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series Number 2288

More information

Antonietta Mira. University of Pavia, Italy. Abstract. We propose a test based on Bonferroni's measure of skewness.

Antonietta Mira. University of Pavia, Italy. Abstract. We propose a test based on Bonferroni's measure of skewness. Distribution-free test for symmetry based on Bonferroni's Measure Antonietta Mira University of Pavia, Italy Abstract We propose a test based on Bonferroni's measure of skewness. The test detects the asymmetry

More information

Rank Estimation of Partially Linear Index Models

Rank Estimation of Partially Linear Index Models Rank Estimation of Partially Linear Index Models Jason Abrevaya University of Texas at Austin Youngki Shin University of Western Ontario October 2008 Preliminary Do not distribute Abstract We consider

More information

Function Approximation

Function Approximation 1 Function Approximation This is page i Printer: Opaque this 1.1 Introduction In this chapter we discuss approximating functional forms. Both in econometric and in numerical problems, the need for an approximating

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Nonparametric Identi cation of Regression Models Containing a Misclassi ed Dichotomous Regressor Without Instruments

Nonparametric Identi cation of Regression Models Containing a Misclassi ed Dichotomous Regressor Without Instruments Nonparametric Identi cation of Regression Models Containing a Misclassi ed Dichotomous Regressor Without Instruments Xiaohong Chen Yale University Yingyao Hu y Johns Hopkins University Arthur Lewbel z

More information

Calculation of maximum entropy densities with application to income distribution

Calculation of maximum entropy densities with application to income distribution Journal of Econometrics 115 (2003) 347 354 www.elsevier.com/locate/econbase Calculation of maximum entropy densities with application to income distribution Ximing Wu Department of Agricultural and Resource

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

A characterization of consistency of model weights given partial information in normal linear models

A characterization of consistency of model weights given partial information in normal linear models Statistics & Probability Letters ( ) A characterization of consistency of model weights given partial information in normal linear models Hubert Wong a;, Bertrand Clare b;1 a Department of Health Care

More information

Richard DiSalvo. Dr. Elmer. Mathematical Foundations of Economics. Fall/Spring,

Richard DiSalvo. Dr. Elmer. Mathematical Foundations of Economics. Fall/Spring, The Finite Dimensional Normed Linear Space Theorem Richard DiSalvo Dr. Elmer Mathematical Foundations of Economics Fall/Spring, 20-202 The claim that follows, which I have called the nite-dimensional normed

More information

A Local Generalized Method of Moments Estimator

A Local Generalized Method of Moments Estimator A Local Generalized Method of Moments Estimator Arthur Lewbel Boston College June 2006 Abstract A local Generalized Method of Moments Estimator is proposed for nonparametrically estimating unknown functions

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS. Abstract. We introduce a wide class of asymmetric loss functions and show how to obtain

ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS. Abstract. We introduce a wide class of asymmetric loss functions and show how to obtain ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS FUNCTIONS Michael Baron Received: Abstract We introduce a wide class of asymmetric loss functions and show how to obtain asymmetric-type optimal decision

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

Machine Learning Lecture Notes

Machine Learning Lecture Notes Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective

More information

A SEMIPARAMETRIC MODEL FOR BINARY RESPONSE AND CONTINUOUS OUTCOMES UNDER INDEX HETEROSCEDASTICITY

A SEMIPARAMETRIC MODEL FOR BINARY RESPONSE AND CONTINUOUS OUTCOMES UNDER INDEX HETEROSCEDASTICITY JOURNAL OF APPLIED ECONOMETRICS J. Appl. Econ. 24: 735 762 (2009) Published online 24 April 2009 in Wiley InterScience (www.interscience.wiley.com).1064 A SEMIPARAMETRIC MODEL FOR BINARY RESPONSE AND CONTINUOUS

More information

9 Classification. 9.1 Linear Classifiers

9 Classification. 9.1 Linear Classifiers 9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive

More information

Economics 241B Estimation with Instruments

Economics 241B Estimation with Instruments Economics 241B Estimation with Instruments Measurement Error Measurement error is de ned as the error resulting from the measurement of a variable. At some level, every variable is measured with error.

More information

APPROXIMATING CONTINUOUS FUNCTIONS: WEIERSTRASS, BERNSTEIN, AND RUNGE

APPROXIMATING CONTINUOUS FUNCTIONS: WEIERSTRASS, BERNSTEIN, AND RUNGE APPROXIMATING CONTINUOUS FUNCTIONS: WEIERSTRASS, BERNSTEIN, AND RUNGE WILLIE WAI-YEUNG WONG. Introduction This set of notes is meant to describe some aspects of polynomial approximations to continuous

More information

Contents. 6 Systems of First-Order Linear Dierential Equations. 6.1 General Theory of (First-Order) Linear Systems

Contents. 6 Systems of First-Order Linear Dierential Equations. 6.1 General Theory of (First-Order) Linear Systems Dierential Equations (part 3): Systems of First-Order Dierential Equations (by Evan Dummit, 26, v 2) Contents 6 Systems of First-Order Linear Dierential Equations 6 General Theory of (First-Order) Linear

More information

Economics 620, Lecture 20: Generalized Method of Moment (GMM)

Economics 620, Lecture 20: Generalized Method of Moment (GMM) Economics 620, Lecture 20: Generalized Method of Moment (GMM) Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 20: GMM 1 / 16 Key: Set sample moments equal to theoretical

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER. Department of Mathematics. University of Wisconsin.

STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER. Department of Mathematics. University of Wisconsin. STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER Department of Mathematics University of Wisconsin Madison WI 5376 keisler@math.wisc.edu 1. Introduction The Loeb measure construction

More information

Testing for Regime Switching: A Comment

Testing for Regime Switching: A Comment Testing for Regime Switching: A Comment Andrew V. Carter Department of Statistics University of California, Santa Barbara Douglas G. Steigerwald Department of Economics University of California Santa Barbara

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

The best expert versus the smartest algorithm

The best expert versus the smartest algorithm Theoretical Computer Science 34 004 361 380 www.elsevier.com/locate/tcs The best expert versus the smartest algorithm Peter Chen a, Guoli Ding b; a Department of Computer Science, Louisiana State University,

More information

PARAMETER IDENTIFICATION IN THE FREQUENCY DOMAIN. H.T. Banks and Yun Wang. Center for Research in Scientic Computation

PARAMETER IDENTIFICATION IN THE FREQUENCY DOMAIN. H.T. Banks and Yun Wang. Center for Research in Scientic Computation PARAMETER IDENTIFICATION IN THE FREQUENCY DOMAIN H.T. Banks and Yun Wang Center for Research in Scientic Computation North Carolina State University Raleigh, NC 7695-805 Revised: March 1993 Abstract In

More information

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector On Minimax Filtering over Ellipsoids Eduard N. Belitser and Boris Y. Levit Mathematical Institute, University of Utrecht Budapestlaan 6, 3584 CD Utrecht, The Netherlands The problem of estimating the mean

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

Generated Covariates in Nonparametric Estimation: A Short Review.

Generated Covariates in Nonparametric Estimation: A Short Review. Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition) Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

More information

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics UNIVERSITY OF CAMBRIDGE Numerical Analysis Reports Spurious Chaotic Solutions of Dierential Equations Sigitas Keras DAMTP 994/NA6 September 994 Department of Applied Mathematics and Theoretical Physics

More information

ARTIFICIAL INTELLIGENCE LABORATORY. and CENTER FOR BIOLOGICAL INFORMATION PROCESSING. A.I. Memo No August Federico Girosi.

ARTIFICIAL INTELLIGENCE LABORATORY. and CENTER FOR BIOLOGICAL INFORMATION PROCESSING. A.I. Memo No August Federico Girosi. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL INFORMATION PROCESSING WHITAKER COLLEGE A.I. Memo No. 1287 August 1991 C.B.I.P. Paper No. 66 Models of

More information

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis Glenn Heller Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center,

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

arxiv: v1 [physics.comp-ph] 22 Jul 2010

arxiv: v1 [physics.comp-ph] 22 Jul 2010 Gaussian integration with rescaling of abscissas and weights arxiv:007.38v [physics.comp-ph] 22 Jul 200 A. Odrzywolek M. Smoluchowski Institute of Physics, Jagiellonian University, Cracov, Poland Abstract

More information

Estimation of Treatment Effects under Essential Heterogeneity

Estimation of Treatment Effects under Essential Heterogeneity Estimation of Treatment Effects under Essential Heterogeneity James Heckman University of Chicago and American Bar Foundation Sergio Urzua University of Chicago Edward Vytlacil Columbia University March

More information

Extension of continuous functions in digital spaces with the Khalimsky topology

Extension of continuous functions in digital spaces with the Khalimsky topology Extension of continuous functions in digital spaces with the Khalimsky topology Erik Melin Uppsala University, Department of Mathematics Box 480, SE-751 06 Uppsala, Sweden melin@math.uu.se http://www.math.uu.se/~melin

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Statistical Learning Theory

Statistical Learning Theory Statistical Learning Theory Fundamentals Miguel A. Veganzones Grupo Inteligencia Computacional Universidad del País Vasco (Grupo Inteligencia Vapnik Computacional Universidad del País Vasco) UPV/EHU 1

More information

A Note on Demand Estimation with Supply Information. in Non-Linear Models

A Note on Demand Estimation with Supply Information. in Non-Linear Models A Note on Demand Estimation with Supply Information in Non-Linear Models Tongil TI Kim Emory University J. Miguel Villas-Boas University of California, Berkeley May, 2018 Keywords: demand estimation, limited

More information

Estimating Semi-parametric Panel Multinomial Choice Models

Estimating Semi-parametric Panel Multinomial Choice Models Estimating Semi-parametric Panel Multinomial Choice Models Xiaoxia Shi, Matthew Shum, Wei Song UW-Madison, Caltech, UW-Madison September 15, 2016 1 / 31 Introduction We consider the panel multinomial choice

More information