Preface
1 Nonparametric Density Estimation and Testing
1.1 Introduction
1.2 Univariate Density Estimation
Preface

Nonparametric econometrics has become one of the most important subfields of modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric techniques that are widely applied in empirical research, with a focus on kernel and sieve estimation. The kernel and sieve methods form the backbone of nonparametrics despite their distinctive nature in terms of local versus global approximation. We occasionally touch upon alternative estimation methodologies but refer the reader directly to the related books or articles.

The lecture note comprises three parts. In the first part, we provide a rigorous introduction to nonparametric methods. After we study nonparametric density estimation and testing, we move quickly to nonparametric regression analysis. Kernel estimation with mixed data will also be addressed. Then we study the sieve estimation of the conditional mean function. In the second part, we examine various semiparametric models, which bridge the gap between parametric and nonparametric models. Here the primary interest is typically to estimate a finite-dimensional parameter in the presence of one or several infinite-dimensional nuisance parameters (i.e., nonparametric components). We provide a unified framework to analyze the asymptotic properties of the semiparametric estimator of the finite-dimensional parameter and then study various semiparametric regression models in detail, including partially linear models, index models, and additive models. In the third part, we focus on various topics in nonparametric and semiparametric econometrics. We will mainly discuss nonparametric kernel and sieve estimation with endogenous regressors and study the estimation of various nonparametric and semiparametric panel data models. In each chapter, we shall first introduce the theory for nonparametric or semiparametric estimation, followed by one or two applications in related areas.
To help the reader grasp the material, we also include a small number of theoretical exercises together with some real-data exercises.

1 Nonparametric Density Estimation and Testing

1.1 Introduction

In this chapter we describe the most important method of estimating density functions, namely kernel density estimation. As Pagan and Ullah (1999) remarked, there are three areas in which one needs to estimate densities. First, density estimates may be required to capture the stylized facts that need explanation and to judge how well a potential model is likely to fit the data. Second, in the case where we need a complete picture of the distribution of an estimator, we need density estimates to summarize the information. Third, some parametric estimators (e.g., quantile estimators) have asymptotic distributions that depend on a density evaluated at a specific point.

Let $X$ be a generic random variable or vector with cumulative distribution function (CDF) $F(\cdot)$ and probability density function (PDF) $f(\cdot)$. Let the observations $\{X_i\}_{i=1}^n$ be drawn from the unknown distribution $F$. We are interested in estimating $f$ at a point $x$.

1.2 Univariate Density Estimation

Several estimators have been proposed to estimate the density function nonparametrically. These include the kernel density estimator of Rosenblatt (1956) and Parzen (1962), the nearest neighbor estimator
of Fix and Hodges (1951), the series estimator of Cencov (1962), the penalized likelihood estimator of Good and Gaskins (1971, 1980), and, more recently, the local likelihood estimator of Loader (1993). Pagan and Ullah (1999) discuss all of these estimators, among which the kernel density estimator is the best known; it is also better developed and more widely used than the others. Therefore, we focus exclusively on nonparametric kernel density estimation.

1.2.1 Motivation for the Kernel Density Estimator

For simplicity, we first look at the issue of estimating the density $f(x)$ of a scalar continuously distributed random variable $X$ at a particular point $x$. To motivate the method, note that $f(x) = F'(x)$ a.e., so one can obtain a simple estimator of $f(x)$:
$$\tilde f(x) = \frac{F_n(x+h) - F_n(x-h)}{2h}, \qquad (1.1)$$
where $h = h(n)$ is a sequence of positive constants and $F_n(\cdot)$ is the empirical distribution function of $\{X_i\}_{i=1}^n$:
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}(X_i \le x),$$
where $\mathbf{1}(\cdot)$ is the indicator function: $\mathbf{1}(A) = 1$ if $A$ holds and $0$ otherwise. Note that
$$2nh\,\tilde f(x) = \sum_{i=1}^n \mathbf{1}(x - h < X_i \le x + h),$$
as a summation of $n$ independent Bernoulli random variables, has the binomial distribution Binomial$(n,\,F(x+h) - F(x-h))$. It follows that
$$E[\tilde f(x)] = \frac{F(x+h) - F(x-h)}{2h} \to f(x) \quad\text{if } h \to 0,$$
and
$$\mathrm{Var}[\tilde f(x)] = \frac{[F(x+h) - F(x-h)]\,\{1 - [F(x+h) - F(x-h)]\}}{4nh^2} \to 0 \quad\text{if } h \to 0 \text{ and } nh \to \infty.$$
Thus, to guarantee good behavior of $\tilde f(x)$, we should choose $h$ such that $h \to 0$ and $nh \to \infty$ as $n \to \infty$. One can also calculate the MSE of $\tilde f(x)$ and establish asymptotic normality for it.

Rewrite $\tilde f(x)$ as
$$\tilde f(x) = \frac{1}{nh}\sum_{i=1}^n \frac{1}{2}\,\mathbf{1}\!\left(\left|\frac{x - X_i}{h}\right| \le 1\right). \qquad (1.2)$$
One can then propose a useful class of kernel density estimators of the form
$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right) = \frac{1}{n}\sum_{i=1}^n k_h(x - X_i), \qquad (1.3)$$
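The naive estimator built from the empirical CDF is easy to code directly. The following sketch (NumPy; the function name `naive_density` and the simulated standard normal sample are ours, for illustration only) evaluates it at a point:

```python
import numpy as np

def naive_density(x, data, h):
    """Naive (uniform-kernel) density estimate from the empirical CDF:
    f_tilde(x) = [F_n(x + h) - F_n(x - h)] / (2h)."""
    data = np.asarray(data)
    n = data.size
    Fn_hi = np.sum(data <= x + h) / n   # F_n(x + h)
    Fn_lo = np.sum(data <= x - h) / n   # F_n(x - h)
    return (Fn_hi - Fn_lo) / (2.0 * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(10_000)
# The true N(0,1) density at x = 0 is about 0.3989; with n = 10,000 and a
# moderate h the estimate should be close to it.
est = naive_density(0.0, sample, h=0.2)
```

Shrinking $h$ while keeping $nh$ large trades bias for variance, exactly as the mean and variance expressions above indicate.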
where we refer to $k(\cdot)$ as a kernel function on $\mathbb{R}$ and to $h$ as a smoothing parameter (or, alternatively, a bandwidth), and $k_h(u) = k(u/h)/h$. (1.2) is called a uniform kernel estimator because the kernel function $k(u) = \frac{1}{2}\mathbf{1}(|u| \le 1)$ corresponds to a uniform PDF on $[-1, 1]$. It is sometimes referred to as a naive kernel estimator and was first introduced in Fix and Hodges (1951).

In practice there are a variety of kernel functions that might be chosen, among which three are the most popular:

(1) the Gaussian kernel
$$k(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right); \qquad (1.4)$$
(2) the Epanechnikov kernel
$$k(u) = \frac{3}{4}(1 - u^2)\,\mathbf{1}(|u| \le 1); \qquad (1.5)$$
(3) the Biweight or Quartic kernel
$$k(u) = \frac{15}{16}(1 - u^2)^2\,\mathbf{1}(|u| \le 1). \qquad (1.6)$$

Three less frequent choices of kernels are:

(4) the Uniform kernel
$$k(u) = \frac{1}{2}\,\mathbf{1}(|u| \le 1); \qquad (1.7)$$
(5) the Triangular kernel
$$k(u) = (1 - |u|)\,\mathbf{1}(|u| \le 1); \qquad (1.8)$$
(6) the Triweight kernel
$$k(u) = \frac{35}{32}(1 - u^2)^3\,\mathbf{1}(|u| \le 1). \qquad (1.9)$$

It turns out that the choice among these kernels rarely makes a significant difference in the estimates. The kernel functions are used to smooth the data, whereas the amount of smoothing is controlled by the bandwidth $h > 0$. Intuitively, $\hat f(x)$ is the average of a set of weights. If a large number of the observations are near $x$, the weights are relatively large and $\hat f(x)$ is large. Conversely, if only a few $X_i$ are close to $x$, the weights are small and $\hat f(x)$ is small. The bandwidth controls the degree of closeness.

1.2.2 Asymptotic Properties of the Kernel Density Estimator

The properties of $\hat f(x)$ are well studied in the literature. To summarize some important properties of $\hat f(x)$, we make the following assumptions.

A1 The observations $\{X_i\}_{i=1}^n$ are IID with density $f$.
A2 The second-order derivatives of $f$ are continuous and bounded in a neighborhood of $x$.
A3 The kernel $k$ is a symmetric PDF around zero satisfying (i) $\int k(u)\,du = 1$; (ii) $\int u^2 k(u)\,du = \kappa_2 \in (0, \infty)$; (iii) $\int k(u)^2\,du < \infty$.
A4 As $n \to \infty$, $h = h(n) \to 0$ and $nh \to \infty$.

$k$ is a second-order kernel if $\int k(u)\,du = 1$, $\int u\,k(u)\,du = 0$, and $0 < \int u^2 k(u)\,du < \infty$. So Assumption A3 implies that $k$ is a second-order kernel.
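The six kernels above can be coded in a few lines. The sketch below (NumPy; the function names are ours) also checks numerically that each kernel integrates to one, as a valid second-order kernel must:

```python
import numpy as np

# Common second-order kernels; u may be a scalar or a NumPy array.
def gaussian(u):     return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
def epanechnikov(u): return 0.75 * (1 - u**2) * (np.abs(u) <= 1)
def biweight(u):     return (15 / 16) * (1 - u**2) ** 2 * (np.abs(u) <= 1)
def uniform(u):      return 0.5 * (np.abs(u) <= 1)
def triangular(u):   return (1 - np.abs(u)) * (np.abs(u) <= 1)
def triweight(u):    return (35 / 32) * (1 - u**2) ** 3 * (np.abs(u) <= 1)

# Numerical check on a fine grid: each kernel integrates to one.
u = np.linspace(-5, 5, 200_001)
checks = {k.__name__: np.trapz(k(u), u) for k in
          (gaussian, epanechnikov, biweight, uniform, triangular, triweight)}
```

Swapping one kernel for another in (1.3) changes the estimate very little in practice; the bandwidth $h$ is what matters.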
1. $\hat f(x)$ is a valid density. That is, $\hat f(x) \ge 0$ for all $x$, and it integrates to one:
$$\int \hat f(x)\,dx = \frac{1}{nh}\sum_{i=1}^n \int k\!\left(\frac{x - X_i}{h}\right) dx = \frac{1}{n}\sum_{i=1}^n \int k(u)\,du = 1,$$
where the second equality applies the Fubini theorem and the change of variables $u = (x - X_i)/h$.

2. The moments of the density $\hat f(x)$ can be calculated easily. The mean is
$$\int x\,\hat f(x)\,dx = \frac{1}{n}\sum_{i=1}^n \int (X_i + hu)\,k(u)\,du = \frac{1}{n}\sum_{i=1}^n X_i = \bar X,$$
where the last equality follows from the facts $\int k(u)\,du = 1$ and $\int u\,k(u)\,du = 0$. The second moment is
$$\int x^2\,\hat f(x)\,dx = \frac{1}{n}\sum_{i=1}^n \int (X_i + hu)^2 k(u)\,du = \frac{1}{n}\sum_{i=1}^n X_i^2 + h^2\kappa_2,$$
where $\kappa_2 = \int u^2 k(u)\,du$ is the variance of the kernel. Consequently, $\int x^2 \hat f(x)\,dx \to E X_1^2$ in probability if $h \to 0$ as $n \to \infty$.

3. MSE of $\hat f(x)$. Suppose that $f(x)$ is second-order continuously differentiable. Then
$$E\big[\hat f(x)\big] = E[k_h(x - X_1)] = \int k(u)\,f(x + hu)\,du = \int k(u)\left[f(x) + hu\,f'(x) + \frac{h^2 u^2}{2} f^{(2)}(x) + o(h^2)\right] du = f(x) + \frac{h^2}{2}\kappa_2 f^{(2)}(x) + o(h^2).$$
This leads to the bias expression
$$\mathrm{Bias}\big[\hat f(x)\big] = \frac{h^2}{2}\kappa_2 f^{(2)}(x) + o(h^2). \qquad (1.10)$$
For the variance, we have
$$\mathrm{Var}\big[\hat f(x)\big] = \frac{1}{n}\mathrm{Var}[k_h(x - X_1)] = \frac{1}{n}E\big[k_h(x - X_1)^2\big] - \frac{1}{n}\big\{E[k_h(x - X_1)]\big\}^2 = \frac{1}{nh}\int k(u)^2 f(x + hu)\,du + O(n^{-1}) = \frac{f(x)\,\kappa_{02}}{nh} + o\!\left(\frac{1}{nh}\right), \qquad (1.11)$$
where $\kappa_{02} = \int k(u)^2\,du$. Since the variance is of order $(nh)^{-1}$, Assumption A4 ensures that $\mathrm{Var}[\hat f(x)]$ converges to zero. Adding the square of (1.10) to (1.11), we obtain
$$\mathrm{MSE}\big[\hat f(x)\big] = E\big[\hat f(x) - f(x)\big]^2 = \frac{h^4}{4}\kappa_2^2\big[f^{(2)}(x)\big]^2 + \frac{f(x)\,\kappa_{02}}{nh} + o\!\left(h^4 + \frac{1}{nh}\right).$$
Integrating this expression over $x$, we obtain the mean integrated squared error (MISE):
$$\mathrm{MISE}(h) = \mathrm{AMISE}(h) + o\!\left(h^4 + \frac{1}{nh}\right), \qquad (1.12)$$
where
$$\mathrm{AMISE}(h) = \frac{h^4}{4}\kappa_2^2\,R\big(f^{(2)}\big) + \frac{\kappa_{02}}{nh} \qquad (1.13)$$
and $R(f^{(2)}) \equiv \int \big[f^{(2)}(x)\big]^2 dx$ is a measure of the total curvature of $f$. We call AMISE the asymptotic MISE, since it provides a useful large-sample approximation to the MISE. Both MISE and AMISE are global measures of precision for the estimation of $f(\cdot)$.

Notice that the integrated squared bias is asymptotically proportional to $h^4$, so for this quantity to decrease we need to take $h$ small. However, since the leading term of the integrated variance is proportional to $(nh)^{-1}$, we need to choose $h$ large to reduce the variance. Therefore, as $n$ increases, $h$ should vary in such a way that each of the components of the MISE or AMISE becomes smaller. This is known as the variance-bias trade-off; it is a mathematical quantification of the critical role of the bandwidth.

Example 1 (Density estimate of annual salary income). In Figure 1 we use the Panel Study of Income Dynamics (PSID) dataset to estimate the density of annual salary income in the USA. We have $n = 445$ observations on annual salary income for 2003. We choose the standard normal kernel. The bandwidth is chosen according to $h = c\,s\,n^{-1/5}$ for $c = 0.25$, $1.06$, and $4$, where $s$ is the sample standard deviation. For comparison, we also plot the density estimate using the least squares cross-validated (LSCV, see below) bandwidth. Figure 1 illustrates the effect of the choice of bandwidth. For a small value of $c$ ($c = 0.25$), the estimated density function is very spiky and hence very variable, in the sense that, over repeated sampling from the true density, the spikes would appear in different places. There is, however, very little bias.
When $c$ is increased, the variability is reduced at the expense of introducing bias. When the over-smoothing bandwidth is applied ($c = 4$), the estimated density function is too flat, indicating significant bias. For the given dataset, we observe that the density estimate using the ROT bandwidth ($c = 1.06$, to be introduced below) almost coincides with that using the LSCV bandwidth.

Nevertheless, the AMISE is a much simpler expression to comprehend than the expression for the MISE given by (1.12). The main advantage of the AMISE is that we can define the optimal bandwidth with
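The under/over-smoothing pattern in Example 1 is easy to reproduce on simulated data. The sketch below (NumPy; the simulated lognormal "income-like" sample and all names are ours, not the PSID data) evaluates a Gaussian-kernel estimate for $c = 0.25$, $1.06$, and $4$ and confirms that the estimate gets visibly smoother, here measured by the summed squared second differences on a grid, as $c$ grows:

```python
import numpy as np

def kde(xgrid, data, h):
    """Gaussian kernel density estimate on a grid of evaluation points."""
    u = (xgrid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
data = rng.lognormal(mean=10.0, sigma=0.7, size=2_000)   # skewed, income-like
s = data.std(ddof=1)
xgrid = np.linspace(data.min(), data.max(), 400)

# Bandwidth h = c * s * n^{-1/5} for the three choices of c from the example.
estimates = {c: kde(xgrid, data, c * s * data.size ** (-1 / 5))
             for c in (0.25, 1.06, 4.0)}
# Roughness (summed squared second differences) falls as c grows.
rough = {c: np.sum(np.diff(est, 2) ** 2) for c, est in estimates.items()}
```

Small $c$ gives a spiky, high-variance estimate; large $c$ gives a flat, biased one.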
Figure 1: Density estimate based on the PSID annual salary income data in 2003 (kernel: standard normal; bandwidth $h = c\,s\,n^{-1/5}$ for $c = 0.25$, $1.06$, and $4$; the least squares cross-validated (LSCV) bandwidth is also used).
respect to this criterion, and it has a closed-form expression. To be specific, we define the asymptotically optimal bandwidth $h_0$ as the value that minimizes $\mathrm{AMISE}(h)$: $h_0 = \arg\min_h \mathrm{AMISE}(h)$. The solution can be found by solving the first-order condition, yielding
$$h_0 = \left[\frac{\kappa_{02}}{\kappa_2^2\,R(f^{(2)})}\right]^{1/5} n^{-1/5}. \qquad (1.14)$$
Aside from its dependence on the known kernel constants $\kappa_2$ and $\kappa_{02}$, this expression shows that $h_0$ is inversely proportional to $R(f^{(2)})^{1/5}$. Thus, for a density with little curvature, $R(f^{(2)})$ will be small and a large bandwidth is called for. On the other hand, when $R(f^{(2)})$ is large, little smoothing will be desired.

Unfortunately, direct use of (1.14) to choose a good bandwidth in practice is not possible, since $R(f^{(2)})$ is unknown. However, there now exist several rules for choosing $h$ which may or may not be based on estimating $R(f^{(2)})$. Let $s$ denote the sample standard deviation. For the Gaussian, Epanechnikov, and Biweight kernels discussed above, one can choose $h_0 = 1.06\,s\,n^{-1/5}$, $2.34\,s\,n^{-1/5}$, and $2.78\,s\,n^{-1/5}$, respectively. Such choices of bandwidth are called bandwidths selected by Silverman's rule of thumb (ROT) in the literature.

Substituting (1.14) into (1.13) leads to
$$\inf_h \mathrm{AMISE}(h) = \frac{5}{4}\left\{\kappa_2^2\,\kappa_{02}^4\,R\big(f^{(2)}\big)\right\}^{1/5} n^{-4/5}, \qquad (1.15)$$
which is the smallest possible AMISE for the estimation of $f$ using the kernel $k$. We can restate the information conveyed by (1.14) and (1.15) in terms of the MISE itself using asymptotic notation. Let $h_*$ be the minimizer of $\mathrm{MISE}(h)$ defined in (1.12). Then
$$h_* \sim \left[\frac{\kappa_{02}}{\kappa_2^2\,R(f^{(2)})}\right]^{1/5} n^{-1/5} \quad\text{and}\quad \inf_h \mathrm{MISE}(h) \sim \frac{5}{4}\left\{\kappa_2^2\,\kappa_{02}^4\,R\big(f^{(2)}\big)\right\}^{1/5} n^{-4/5}.$$
These expressions give the rates of convergence to zero of the MISE-optimal bandwidth and the minimum MISE, respectively, as $n \to \infty$. Under the stated assumptions, the best obtainable rate of convergence of the MISE of the kernel density estimator is of order $n^{-4/5}$, which is slower than the typical parametric rate of order $n^{-1}$ for MSE convergence.

1.2.3 Univariate Bandwidth Selection

There are several ways to choose the bandwidth in practice. We introduce only the most widely used methods.
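Formula (1.14) can be evaluated in closed form under a normal reference. This sketch (ours, not from the text) plugs in the Gaussian-kernel constants $\kappa_{02} = 1/(2\sqrt{\pi})$ and $\kappa_2 = 1$ and the normal-reference curvature $R(f^{(2)}) = 3/(8\sqrt{\pi}\sigma^5)$, which recovers Silverman's familiar $1.06\,\sigma\,n^{-1/5}$ rule:

```python
import numpy as np

def rot_bandwidth(sigma, n):
    """AMISE-optimal bandwidth h0 = [kappa02 / (kappa2^2 R(f''))]^{1/5} n^{-1/5}
    under a normal reference f = N(0, sigma^2), Gaussian kernel."""
    kappa02 = 1.0 / (2.0 * np.sqrt(np.pi))      # integral of k(u)^2
    kappa2 = 1.0                                # variance of the kernel
    Rf2 = 3.0 / (8.0 * np.sqrt(np.pi) * sigma**5)  # R(f'') for N(0, sigma^2)
    return (kappa02 / (kappa2**2 * Rf2)) ** 0.2 * n ** (-0.2)

h = rot_bandwidth(sigma=1.0, n=1000)
# (4/3)^{1/5} is about 1.0592, so h is about 1.0592 * 1000^{-1/5}.
```

The constant $(4/3)^{1/5} \approx 1.06$ is exactly the Gaussian-kernel ROT constant quoted above.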
Rule of thumb and plug-in method. (1.14) shows that the optimal smoothing parameter depends on the integrated second derivative of the unknown density. In practice, we may choose an initial pilot value of $h$ (say, the ROT one) to estimate $R(f^{(2)}) \equiv \int \big[f^{(2)}(x)\big]^2 dx$ nonparametrically, and then use
this value to obtain an estimate of $h_0$. Such a method is referred to as a plug-in method in the literature. It is worth mentioning that the estimation of $R(f^{(2)})$ requires estimation of the second-order derivative of $f$, which is introduced in the next subsection.

A popular way of choosing the initial value of $h$ is to assume that $f(\cdot)$ belongs to a certain parametric family of distributions and then estimate the optimal bandwidth using (1.14). For example, if we assume $f(\cdot)$ is a normal density with variance $\sigma^2$, then $R(f^{(2)}) = 3/(8\sqrt{\pi}\sigma^5)$ and the pilot bandwidth is $h_{pilot} = (4/3)^{1/5}\sigma n^{-1/5} \approx 1.06\,\sigma n^{-1/5}$, which is plugged into $\int \big[\hat f^{(2)}(x)\big]^2 dx$. The latter is then used to obtain the estimate of the optimal bandwidth. In practice we replace $\sigma$ by the sample standard deviation of $\{X_i\}$, whereas Silverman (1986) advocates a robust measure of spread that replaces $\sigma$ by the adaptive measure min(standard deviation, interquartile range/1.34).

Least squares cross validation. Another popular method for choosing the bandwidth parameter is least squares cross validation, which is a fully automatic and data-driven method. It is based on the principle that the bandwidth should minimize the integrated squared error of the resulting estimate. The integrated squared difference between $\hat f$ and $f$ is
$$\int \big[\hat f(x) - f(x)\big]^2 dx = \int \hat f(x)^2\,dx - 2\int \hat f(x) f(x)\,dx + \int f(x)^2\,dx. \qquad (1.16)$$
Note that the last term does not depend on the bandwidth, and $\int \hat f(x) f(x)\,dx = E_X\big[\hat f(X)\big]$, where $E_X(\cdot)$ denotes expectation with respect to $X$, not with respect to the random observations $\{X_i\}$ used in defining $\hat f(\cdot)$. Therefore, $\int \hat f(x) f(x)\,dx$ can be estimated by
$$\frac{1}{n}\sum_{i=1}^n \hat f_{-i}(X_i) = \frac{1}{n(n-1)h}\sum_{i=1}^n \sum_{j \ne i} k\!\left(\frac{X_i - X_j}{h}\right),$$
where
$$\hat f_{-i}(x) = \frac{1}{(n-1)h}\sum_{j \ne i} k\!\left(\frac{x - X_j}{h}\right)$$
is the leave-one-out kernel estimator of $f(x)$. We ask the reader to verify as an exercise that we cannot use the usual kernel density estimate $\hat f(X_i)$ in estimating $\int \hat f(x) f(x)\,dx$. For the first term we have
$$\int \hat f(x)^2\,dx = \frac{1}{n^2 h^2}\sum_{i=1}^n \sum_{j=1}^n \int k\!\left(\frac{x - X_i}{h}\right) k\!\left(\frac{x - X_j}{h}\right) dx = \frac{1}{n^2 h}\sum_{i=1}^n \sum_{j=1}^n \bar k\!\left(\frac{X_i - X_j}{h}\right),$$
where $\bar k(v) = \int k(u)\,k(v - u)\,du$ is the convolution kernel derived from $k(\cdot)$. Given the exact form of $k(\cdot)$, we may obtain an analytic expression for $\bar k(\cdot)$. For example, if $k(u) = \frac{1}{\sqrt{2\pi}}\exp(-u^2/2)$, then $\bar k(v) = \frac{1}{2\sqrt{\pi}}\exp(-v^2/4)$, a normal density with zero mean and variance 2. This follows from the fact that two independent $N(0,1)$ random variables sum to an $N(0,2)$ random variable. So we choose $h$ to minimize
$$\mathrm{CV}(h) = \frac{1}{n^2 h}\sum_{i=1}^n \sum_{j=1}^n \bar k\!\left(\frac{X_i - X_j}{h}\right) - \frac{2}{n(n-1)h}\sum_{i=1}^n \sum_{j \ne i} k\!\left(\frac{X_i - X_j}{h}\right). \qquad (1.17)$$
Let $\hat h$ denote the solution to the above cross-validation problem. Härdle et al. (1988) show that
$$\frac{\hat h - h_0}{h_0} \to 0 \text{ in probability}.$$
That is, $\hat h / h_0$ converges to 1, as we would expect.

Likelihood cross validation. Another data-driven way to choose the bandwidth is likelihood cross validation. This approach yields a density estimate with an entropy interpretation: the bandwidth is chosen to minimize the Kullback-Leibler distance between the estimated density and the true one. In general, the Kullback-Leibler distance between two density functions $f$ and $g$ is given by
$$D(f, g) = \int f(x)\ln\!\left(\frac{f(x)}{g(x)}\right) dx = \int f(x)\ln(f(x))\,dx - \int f(x)\ln(g(x))\,dx.$$
Taking $g = \hat f$, we have
$$D(f, \hat f) = \int f(x)\ln(f(x))\,dx - \int f(x)\ln\big(\hat f(x)\big)\,dx.$$
The first term on the right-hand side does not depend on the bandwidth, and the second term is $E_X\big[\ln \hat f(X)\big]$, which can be estimated by $\frac{1}{n}\sum_{i=1}^n \ln \hat f_{-i}(X_i)$. Therefore, minimizing $D(f, \hat f)$ is equivalent to maximizing the log-likelihood. So we choose $h$ to maximize
$$\mathcal{L}(h) = \sum_{i=1}^n \ln \hat f_{-i}(X_i).$$
The main problem with likelihood cross validation is that it is severely affected by the tail behavior of $f(\cdot)$ and works poorly when the true distribution has fat tails.

1.2.4 Univariate Density Derivative Estimation

Estimation of density derivatives may be needed in several cases. First and second derivatives may be of intrinsic interest as measures of slope and curvature. Other important functions, such as the score function $f'/f$, depend on density derivatives. Automatic plug-in bandwidth selection methods require estimation of quantities involving density derivatives too.
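The criterion (1.17) with a Gaussian kernel has the closed-form convolution kernel $\bar k$, an $N(0,2)$ density, so LSCV is a few lines of NumPy. This sketch (function and variable names are ours) minimizes CV($h$) over a grid on simulated standard normal data:

```python
import numpy as np

def lscv(h, data):
    """Least squares cross-validation criterion CV(h), Gaussian kernel.
    kbar (= k convolved with k) is an N(0, 2) density."""
    n = data.size
    d = (data[:, None] - data[None, :]) / h
    kbar = np.exp(-0.25 * d**2) / np.sqrt(4 * np.pi)
    k = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)
    term1 = kbar.sum() / (n**2 * h)
    # Drop the i = j diagonal terms from the second (leave-one-out) sum.
    term2 = 2 * (k.sum() - n * k[0, 0]) / (n * (n - 1) * h)
    return term1 - term2

rng = np.random.default_rng(2)
data = rng.standard_normal(300)
grid = np.linspace(0.05, 1.5, 60)
h_cv = grid[np.argmin([lscv(h, data) for h in grid])]
```

For standard normal data with $n = 300$, the ROT bandwidth is roughly $1.06 \cdot 300^{-1/5} \approx 0.34$, and the LSCV choice should land in the same neighborhood.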
When the density function $f(x)$ is $r$th-order differentiable at $x$, a natural estimator of its $r$th derivative $f^{(r)}(x)$ is
$$\hat f^{(r)}(x) = \frac{1}{n h^{1+r}}\sum_{i=1}^n k^{(r)}\!\left(\frac{x - X_i}{h}\right), \qquad (1.18)$$
which is the $r$th derivative of $\hat f(x)$. The mean squared error properties of $\hat f^{(r)}(x)$ can be derived straightforwardly to obtain
$$\mathrm{MSE}\big[\hat f^{(r)}(x)\big] = \frac{f(x)\,R\big(k^{(r)}\big)}{n h^{2r+1}} + \left\{\frac{h^2}{2}\kappa_2 f^{(r+2)}(x)\right\}^2 + o\!\left(\frac{1}{n h^{2r+1}} + h^4\right), \qquad (1.19)$$
where $R(k^{(r)}) = \int \big[k^{(r)}(u)\big]^2 du$ and $\kappa_2 = \int u^2 k(u)\,du$. It follows that the MSE-optimal bandwidth for estimating $f^{(r)}(x)$ is of order $n^{-1/(2r+5)}$. Therefore, estimation of $f'(x)$ requires a bandwidth of order $n^{-1/7}$, compared with the optimal rate $n^{-1/5}$ for the estimation of $f(x)$ itself. Moreover, the optimal MSE, or its integrated version, is of order $n^{-4/(2r+5)}$. This rate becomes slower for higher values of $r$, which reflects the increasing difficulty inherent in the problem of estimating higher-order derivatives.

1.2.5 Univariate Cumulative Distribution Function Estimation

We can estimate the cumulative distribution function (CDF) $F(x)$ by the empirical distribution function (EDF) $F_n(x)$. Nevertheless, $F_n(x)$ is not smooth, as it jumps by $1/n$ at each sample realization point. We can obtain a smoothed kernel estimate of $F(x)$ by integrating $\hat f$:
$$\hat F(x) = \int_{-\infty}^x \hat f(v)\,dv = \frac{1}{n}\sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right),$$
where $K(v) = \int_{-\infty}^v k(u)\,du$. To calculate the MSE of $\hat F(x)$, we first calculate $E\big[\hat F(x)\big]$ and $\mathrm{Var}\big[\hat F(x)\big]$. For the mean,
$$E\big[\hat F(x)\big] = E\!\left[K\!\left(\frac{x - X_1}{h}\right)\right] = \int K\!\left(\frac{x - v}{h}\right) f(v)\,dv = \int k(u)\,F(x - hu)\,du = F(x) + \frac{h^2}{2}\kappa_2 F^{(2)}(x) + \frac{h^4}{4!}\kappa_4 F^{(4)}(x) + o(h^4),$$
where the odd-order terms in the Taylor expansion vanish by the symmetry of $k$,
and where $\kappa_j = \int u^j k(u)\,du$ ($j = 2, 4$) and $F^{(j)}(x)$ is the $j$th-order derivative of $F$. Similarly, we can show that
$$\mathrm{Var}\big[\hat F(x)\big] = \frac{1}{n}E\!\left[K\!\left(\frac{x - X_1}{h}\right)^2\right] - \frac{1}{n}\left\{E\!\left[K\!\left(\frac{x - X_1}{h}\right)\right]\right\}^2 = \frac{1}{n}F(x)[1 - F(x)] - \frac{h}{n}\psi_0 f(x) + o\!\left(\frac{h}{n}\right),$$
where $\psi_0 = 2\int u\,k(u)K(u)\,du$. Consequently,
$$\mathrm{MSE}\big[\hat F(x)\big] = \mathrm{Var}\big[\hat F(x)\big] + \big\{\mathrm{Bias}\big[\hat F(x)\big]\big\}^2 = \frac{1}{n}F(x)[1 - F(x)] - \frac{h}{n}\psi_0 f(x) + \frac{h^4}{4}\kappa_2^2\big[f'(x)\big]^2 + o\!\left(\frac{h}{n} + h^4\right).$$
To choose the bandwidth, we can minimize the mean integrated squared error of $\hat F$. It turns out that the optimal bandwidth in this case is
$$h_0 = \left[\frac{\psi_0}{\kappa_2^2\,R(f')}\right]^{1/3} n^{-1/3}, \qquad (1.20)$$
where $R(f') = \int \big[f'(x)\big]^2 dx$. Hence the optimal bandwidth for estimating a univariate CDF converges to zero at a faster rate than the optimal bandwidth for estimating a univariate PDF. Bowman et al. (1998) suggest choosing $h$ to minimize the following cross-validation function:
$$\mathrm{CV}(h) = \frac{1}{n}\sum_{i=1}^n \int \left[\mathbf{1}(X_i \le x) - \hat F_{-i}(x)\right]^2 dx, \qquad (1.21)$$
where $\hat F_{-i}(x) = (n-1)^{-1}\sum_{j \ne i} K\big((x - X_j)/h\big)$ is the leave-one-out estimator of $F(x)$. Let $\hat h$ denote the value of $h$ minimizing $\mathrm{CV}(h)$. It can be shown that $\hat h / h_0 \to 1$ in probability. Using $h = \hat h$, we can easily derive the asymptotic distribution of $\hat F(x)$:
$$\sqrt{n}\,\big[\hat F(x) - F(x)\big] \to N\big(0,\ F(x)(1 - F(x))\big) \text{ in distribution}. \qquad (1.22)$$
So the asymptotic property of $\hat F(x)$ is the same as that of the empirical distribution function $F_n(x)$. In particular, it converges to its probability limit $F(x)$ at the parametric rate $\sqrt{n}$.

Example 2 (Cumulative distribution estimates of annual salary income). Using the same data as in Example 1, we now estimate the cumulative distribution function $F(\cdot)$ of annual salary income in the USA. We choose the standard normal kernel. We could choose the bandwidth by minimizing the cross-validation criterion function in (1.21). For simplicity, we instead choose the bandwidth by using the LSCV criterion function for estimating the density function (see (1.17)) and then adjust it to have the optimal rate for estimating the CDF. That is, we set $\tilde h = \hat h^{5/3}$, which is of order $n^{-1/3}$, where $\hat h$ is the least squares cross-validated bandwidth for estimating the PDF. Figure 2 plots the smoothed kernel estimate $\hat F(x)$ of $F(x)$ together with the EDF estimate $F_n(x)$. The two curves almost coincide, except that the EDF curve is wiggly even for a dataset with a large sample size ($n = 445$).
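The smoothed CDF estimator and the bandwidth rescaling used in Example 2 can be sketched as follows (NumPy plus the standard library `erf`; simulated standard normal data and all names are ours, not the PSID data):

```python
import numpy as np
from math import erf, sqrt

def smoothed_cdf(x, data, h):
    """Smoothed kernel CDF estimate F_hat(x) = (1/n) sum_i K((x - X_i)/h),
    where K is the standard normal CDF (the integral of the Gaussian kernel)."""
    u = (x - data) / h
    return float(np.mean([0.5 * (1 + erf(v / sqrt(2))) for v in u]))

rng = np.random.default_rng(5)
data = rng.standard_normal(2_000)
n = data.size
h_pdf = 1.06 * data.std(ddof=1) * n ** (-1 / 5)   # ROT bandwidth for the pdf
h_cdf = h_pdf ** (5 / 3)                          # rescaled to the n^{-1/3} rate
est = smoothed_cdf(0.0, data, h_cdf)
# True value F(0) = 0.5; the smoothed estimate is root-n consistent, per (1.22).
```

Unlike the EDF, this estimate is smooth in $x$, yet it shares the EDF's parametric $\sqrt{n}$ rate.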
Figure 2: Smoothed kernel estimate and empirical distribution estimate for the CDF of annual salary income (the LSCV bandwidth is used in the kernel estimate; see the text for details).

1.2.6 Higher-Order Kernels and Bias Reduction

Recall that we have used so-called second-order kernels in the previous sections. For the univariate kernel density estimate, the best obtainable rate of convergence of the MISE is then of order $n^{-4/5}$. In this subsection, we demonstrate that it is possible to obtain better rates of convergence by relaxing the restriction that the kernel be a probability density function. To this end, higher-order kernels are needed, because they can help reduce the bias of the kernel density estimates. Recall the result for the bias of $\hat f(x)$ in (1.10):
$$\mathrm{Bias}\big[\hat f(x)\big] = \frac{h^2}{2}\kappa_2 f^{(2)}(x) + o(h^2).$$
Here the kernel is constrained to be a probability density function, so it is necessary to have $\kappa_2 = \int u^2 k(u)\,du > 0$. Without this restriction, however, it is possible to construct $k$ such that $\kappa_2 = 0$, which has the effect of reducing the bias to order $h^4$, provided that $f$ has a continuous, square-integrable fourth derivative. It is easy to check that the MSE and MISE then have the optimal rate of convergence of order $n^{-8/9}$ if such a kernel is used.

In general, the order of a kernel $k(\cdot)$, denoted $\nu$ ($\nu > 0$), is defined as the order of its first nonzero moment. A general $\nu$th-order kernel must satisfy the following conditions:
(i) $\int k(u)\,du = 1$;
(ii) $\int u^j k(u)\,du = 0$ for $j = 1, \ldots, \nu - 1$;
(iii) $0 < \int u^\nu k(u)\,du < \infty$.
For example, the standard normal (Gaussian) kernel $k(u) = (2\pi)^{-1/2}\exp(-u^2/2)$ is a second-order kernel. If we use a $\nu$th-order kernel in estimating a $d$-variate density, we can show that
$$\mathrm{Bias}\big[\hat f(x)\big] = O(h^\nu), \qquad (1.23)$$
$$\mathrm{Var}\big[\hat f(x)\big] = O\!\left(\frac{1}{nh^d}\right). \qquad (1.24)$$
Consequently,
$$\mathrm{MSE}\big[\hat f(x)\big] = O\!\left(h^{2\nu} + \frac{1}{nh^d}\right),$$
and, in the univariate case, the MSE-optimal bandwidth is of order $n^{-1/(2\nu+1)}$ with optimal MSE of order $n^{-2\nu/(2\nu+1)}$.

It is simple to construct symmetric higher-order kernel functions. In order to construct a $\nu$th-order ($\nu \ge 2$ even) kernel, one can begin with a second-order kernel such as the standard normal (Gaussian) kernel $\phi(u) = (2\pi)^{-1/2}\exp(-u^2/2)$, set up a polynomial in its argument,
$$k(u) = \sum_{j=0}^{\nu/2-1} a_j u^{2j}\,\phi(u), \qquad (1.25)$$
and solve for the constants $a_j$ ($j = 0, \ldots, \nu/2 - 1$) subject to the moment restrictions of a symmetric $\nu$th-order kernel. One can verify that the 4th-, 6th-, and 8th-order Gaussian kernels are given respectively by
$$k_4(u) = \left(\frac{3}{2} - \frac{1}{2}u^2\right)\phi(u),$$
$$k_6(u) = \left(\frac{15}{8} - \frac{5}{4}u^2 + \frac{1}{8}u^4\right)\phi(u),$$
$$k_8(u) = \left(\frac{35}{16} - \frac{35}{16}u^2 + \frac{7}{16}u^4 - \frac{1}{48}u^6\right)\phi(u).$$
The general formula for the kernels that are higher-order extensions of the second-order normal kernel is
$$k_\nu(u) = \sum_{j=0}^{\nu/2-1} \frac{(-1)^j}{2^j j!}\,\phi^{(2j)}(u), \qquad \nu = 2, 4, \ldots$$
See Wand and Schucany (1990) or Wand and Jones (1995, p. 34). If we start with the second-order Epanechnikov kernel (standardized to have unit variance),
$$k_2(u) = \frac{3}{4\sqrt{5}}\left(1 - \frac{u^2}{5}\right)\mathbf{1}\big(|u| \le \sqrt{5}\big),$$
the 4th-, 6th-, and 8th-order Epanechnikov kernels can be constructed analogously, by multiplying $k_2(u)$ by even polynomials in $u$ chosen to satisfy the corresponding moment restrictions.
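The Gaussian-based formula $k_\nu(u) = \sum_{j<\nu/2} \frac{(-1)^j}{2^j j!}\phi^{(2j)}(u)$ can be implemented and checked numerically, since $\phi^{(m)}(u) = (-1)^m \mathrm{He}_m(u)\phi(u)$ for the probabilists' Hermite polynomials. A sketch (NumPy; function names ours):

```python
import numpy as np
from math import factorial

def phi_deriv(u, m):
    """m-th derivative of the standard normal pdf via Hermite polynomials:
    phi^(m)(u) = (-1)^m He_m(u) phi(u)."""
    He = np.polynomial.hermite_e.HermiteE.basis(m)(u)
    return (-1) ** m * He * np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def gaussian_kernel(u, nu):
    """nu-th order (nu even) Gaussian-based kernel
    k_nu(u) = sum_{j < nu/2} (-1)^j / (2^j j!) * phi^(2j)(u)."""
    return sum((-1) ** j / (2 ** j * factorial(j)) * phi_deriv(u, 2 * j)
               for j in range(nu // 2))

u = np.linspace(-6, 6, 120_001)
k4 = gaussian_kernel(u, 4)
# A fourth-order kernel: zeroth moment 1, second moment 0, fourth moment nonzero.
moments = [np.trapz(u ** m * k4, u) for m in range(5)]
```

For $\nu = 4$ the sum reduces to $(3/2 - u^2/2)\phi(u)$, matching the closed form above; its fourth moment works out to $-3$, so condition (iii) holds with a negative sign.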
Figure 3: Gaussian kernels of various orders.

Figure 4: Epanechnikov kernels of various orders.
Obviously, when $\nu > 2$ the kernels can take negative values on their support. This is a drawback of higher-order kernels when used in density estimation, because they may assign negative weight in the density estimate; in some cases, we may obtain a negative density estimate.

Example 3 (Higher-order kernels). Figures 3 and 4 plot the Gaussian and Epanechnikov kernels of various orders. From the two figures, we clearly see that kernels of order higher than 2 can indeed take negative values.

There are other rules for constructing higher-order kernels (e.g., Jones and Foster, 1993). Let $k_\nu$ denote a $\nu$th-order kernel. Then one can use the following recursive formula to generate higher-order kernels:
$$k_{\nu+2}(u) = \frac{\nu+1}{\nu}\,k_\nu(u) + \frac{1}{\nu}\,u\,k_\nu'(u), \qquad (1.26)$$
where $k_\nu(\cdot)$ is assumed to be differentiable. For example, application of (1.26) to the second-order Gaussian kernel $k_2(u) = \phi(u)$ gives $\frac{3}{2}\phi(u) + \frac{1}{2}u\,\phi'(u)$, which leads directly to the fourth-order Gaussian kernel given above.

1.3 Multivariate Density Estimation

We now investigate the extension of the univariate kernel density estimator to the multivariate setting. As Wand and Jones (1995) remark, the need for nonparametric density estimates for recovering structure in multivariate data is greater, since parametric modelling is more difficult than in the univariate case. The extension of the univariate kernel methodology is not without its problems. First, the most general parametrization of the kernel density estimator in higher dimensions requires the specification of many more bandwidth parameters than in the univariate setting. This leads us to consider simpler smoothing parametrizations as well. Second, the sparseness of data in higher-dimensional space makes kernel smoothing difficult unless the sample size is very large. This phenomenon is usually referred to as the curse of dimensionality in the nonparametric literature.
It means that, with practical sample sizes, reasonable nonparametric density estimation is very difficult in more than about five dimensions (see the exercises in this chapter). Nevertheless, there have been many studies in which the kernel density estimator has been demonstrated to be an effective tool for displaying structure in bivariate samples (e.g., Silverman, 1986; Scott, 1992). The multivariate kernel density estimator has also played an important role in developments in the visualization of structure in three- and four-dimensional data sets (Scott, 1992). Also, many nonparametric hypothesis tests (e.g., tests for conditional independence; Su and White, 2007, 2008, 2013; Song, 2009; Huang, 2010) require the estimation of multivariate densities.

1.3.1 The Multivariate Kernel Density Estimator

Let $\{X_i\}_{i=1}^n$ denote a $d$-variate random sample having density $f$. We use the notation $X_i = (X_{i1}, \ldots, X_{id})'$ to denote the components of $X_i$, and a generic vector $x \in \mathbb{R}^d$ has the representation $x = (x_1, \ldots, x_d)'$. In its most general form, the $d$-dimensional kernel density estimator of $f(x)$ takes the form
$$\hat f(x) = \frac{1}{n}\sum_{i=1}^n K_H(x - X_i),$$
where $H$ is a symmetric positive definite $d \times d$ matrix called the bandwidth matrix, $K_H(u) = |H|^{-1/2} K\big(H^{-1/2} u\big)$, $K$ is a $d$-variate kernel function satisfying $\int K(u)\,du = 1$, and $|H|$ is the determinant of $H$.
The kernel function $K$ is often chosen to be a $d$-variate probability density function. There are two common techniques for generating multivariate kernels from a symmetric univariate kernel $k$:
$$K(u) = \prod_{s=1}^d k(u_s) \quad\text{and}\quad K(u) = \frac{k\big((u'u)^{1/2}\big)}{\int k\big((v'v)^{1/2}\big)\,dv}.$$
The first of these is often called a product kernel, and the second has the property of being spherically or radially symmetric. A popular choice for $K$ is the standard $d$-variate normal density
$$K(u) = (2\pi)^{-d/2}\exp\!\left(-\frac{1}{2}u'u\right),$$
in which case $K_H(x - X_i)$ is the $N(X_i, H)$ density evaluated at the vector $x$. It is well known that the $d$-variate normal kernel can be constructed from the univariate standard normal density using either the product or the spherically symmetric extension.

In general, the bandwidth matrix $H$ has $d(d+1)/2$ independent entries, which, even for moderate $d$, can be a substantial number of smoothing parameters to choose. The other extreme is to specify $H = h^2 I_d$, where $I_d$ is the identity matrix. In this case, the kernel estimator of the density is a straightforward generalization of (1.3):
$$\hat f(x) = \frac{1}{nh^d}\sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right), \qquad (1.27)$$
where $K$ is a multivariate kernel function such that $\int_{\mathbb{R}^d} K(u)\,du = 1$. Nevertheless, the use of a single bandwidth may not be appropriate if the variation in one component of $X_i$ is much greater than in the others. In that case, it may be more appropriate to use a vector or matrix of bandwidth parameters. An attractive practical approach is to linearly transform the data to have unit covariance matrix, apply (1.27) to the transformed data, and finally transform back to the original metric. Almost all the results in the previous subsection go through for the multivariate density estimator $\hat f(x)$ with obvious modifications. In particular, the ROT bandwidth is proportional to $n^{-1/(4+d)}$.

A realistic choice is to set $H = \mathrm{diag}(h_1^2, \ldots, h_d^2)$. Just as in the univariate case, some important choices have to be made when constructing a multivariate kernel density estimator, and the extension to higher dimensions means there are more degrees of freedom. Below we restrict ourselves to the product kernel and a diagonal $H$.
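A product-kernel estimator with componentwise bandwidths (the diagonal-$H$ case just described) can be sketched in a few lines. This illustration (NumPy; names and the simulated bivariate normal data are ours) uses a Gaussian product kernel and the $n^{-1/(4+d)}$ ROT rate:

```python
import numpy as np

def product_kde(x, data, h):
    """Product Gaussian kernel density estimate at a point x in R^d:
    f_hat(x) = (1/n) sum_i prod_s k((x_s - X_is)/h_s) / h_s."""
    u = (x[None, :] - data) / h              # (n, d) standardized differences
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return float(np.mean(np.prod(k / h, axis=1)))

rng = np.random.default_rng(3)
d, n = 2, 5_000
data = rng.standard_normal((n, d))
# Componentwise ROT-style bandwidths at the multivariate rate n^{-1/(4+d)}.
h = 1.06 * data.std(axis=0, ddof=1) * n ** (-1 / (4 + d))
est = product_kde(np.zeros(d), data, h)
# True bivariate standard normal density at the origin is 1/(2*pi), about 0.159.
```

The slower $n^{-1/(4+d)}$ bandwidth rate is one concrete face of the curse of dimensionality.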
For brevity, we write $K(u)$ for the product kernel $\prod_{s=1}^d k(u_s)$. We estimate $f(x)$ by
$$\hat f(x) = \frac{1}{n}\sum_{i=1}^n K_h(x - X_i) = \frac{1}{n}\sum_{i=1}^n \prod_{s=1}^d \frac{1}{h_s}\,k\!\left(\frac{x_s - X_{is}}{h_s}\right), \qquad (1.28)$$
where $h = (h_1, \ldots, h_d)$ and $K_h(x - X_i) = \prod_{s=1}^d h_s^{-1} k\big((x_s - X_{is})/h_s\big)$. We study the asymptotic properties of $\hat f(x)$ below.

1.3.2 Asymptotic Properties of the Multivariate Kernel Density Estimator

As in the univariate setting, we can obtain a simple asymptotic approximation to the MISE of the multivariate kernel density estimator under certain smoothness assumptions on the density and standard assumptions on the kernel and bandwidth. Before studying this, we first study the asymptotic normality
of the nonparametric kernel density estimator $\hat f(x)$. For this purpose, we make the following assumptions on $f$, $K$, and $h$.

A1* The observations $\{X_i\}_{i=1}^n$ are IID with density $f$.
A2* $f$ is third-order continuously differentiable at the interior point $x$ of its support.
A3* $K$ is a product of univariate kernels $k$, where $k$ is a symmetric function around zero satisfying (i) $\int k(u)\,du = 1$; (ii) $\int u^2 k(u)\,du = \kappa_2 \in (0, \infty)$; (iii) $\int k(u)^2\,du = \kappa_{02} < \infty$.
A4* As $n \to \infty$, $n h_1 \cdots h_d \to \infty$ and $h_s \to 0$ for $s = 1, \ldots, d$.

Clearly, Assumptions A1*-A4* parallel those in Section 1.2, and they are not the minimal requirements. The choice of a product kernel and a diagonal bandwidth matrix is just for simplicity of notation. One can generalize the results below straightforwardly to general choices of kernel and bandwidth (e.g., Wand and Jones, 1995). Just as in the univariate setting, under the above assumptions we can easily show that
$$\mathrm{Bias}\big[\hat f(x)\big] = \frac{\kappa_2}{2}\sum_{s=1}^d h_s^2\,f_{ss}(x) + o\!\left(\sum_{s=1}^d h_s^2\right) \qquad (1.29)$$
and
$$\mathrm{Var}\big[\hat f(x)\big] = \frac{\kappa_{02}^d\,f(x)}{n h_1 \cdots h_d} + o\!\left(\frac{1}{n h_1 \cdots h_d}\right), \qquad (1.30)$$
where $f_{ss}(x)$ is the second-order derivative of $f(x)$ with respect to $x_s$, $\kappa_2 = \int u^2 k(u)\,du$, and $\kappa_{02} = \int k(u)^2\,du$. Consequently,
$$\mathrm{MSE}\big[\hat f(x)\big] = \big\{\mathrm{Bias}\big[\hat f(x)\big]\big\}^2 + \mathrm{Var}\big[\hat f(x)\big] = O\!\left(\Big(\sum_{s=1}^d h_s^2\Big)^2 + \frac{1}{n h_1 \cdots h_d}\right), \qquad (1.31)$$
which converges to zero under Assumption A4*. Since convergence in MSE implies convergence in probability by the Chebyshev inequality, we have $\hat f(x) \to f(x)$ in probability.

To derive the asymptotic normality of $\hat f(x)$, we can write it as a summation over a double array of random variables and apply the Liapounov central limit theorem (CLT) in the appendix. Thus we state the following theorem.

Theorem 1.1. Suppose Assumptions A1*-A4* hold, and suppose that $\int |k(u)|^{2+\delta}\,du < \infty$ for some $\delta > 0$. If $n h_1 \cdots h_d \sum_{s=1}^d h_s^6 \to 0$, then
$$\sqrt{n h_1 \cdots h_d}\left(\hat f(x) - f(x) - \frac{\kappa_2}{2}\sum_{s=1}^d h_s^2\,f_{ss}(x)\right) \to N\big(0,\ \kappa_{02}^d\,f(x)\big) \text{ in distribution}. \qquad (1.32)$$

Proof. We only sketch the proof. Write
$$\hat f(x) - f(x) = \big\{\hat f(x) - E[\hat f(x)]\big\} + \big\{E[\hat f(x)] - f(x)\big\}.$$
The second term contributes to the bias of the estimator and the first term to the variance. By (1.29), it suffices to show that
$$\sqrt{n h_1 \cdots h_d}\,\big\{\hat f(x) - E[\hat f(x)]\big\} \to N\big(0,\ \kappa_{02}^d\,f(x)\big) \text{ in distribution}.$$
Define
$$Z_{ni} = \sqrt{\frac{h_1 \cdots h_d}{n}}\left\{\prod_{s=1}^d \frac{1}{h_s}\,k\!\left(\frac{x_s - X_{is}}{h_s}\right) - E\!\left[\prod_{s=1}^d \frac{1}{h_s}\,k\!\left(\frac{x_s - X_{is}}{h_s}\right)\right]\right\}.$$
Then $\sqrt{n h_1 \cdots h_d}\,\{\hat f(x) - E[\hat f(x)]\} = \sum_{i=1}^n Z_{ni}$. Note that $E(Z_{ni}) = 0$ and $\sum_{i=1}^n \mathrm{Var}(Z_{ni}) = \kappa_{02}^d f(x) + o(1)$. To verify the other conditions of the Liapounov CLT, it suffices to check that $\sum_{i=1}^n E|Z_{ni}|^{2+\delta} = o(1)$, which follows from the $c_r$ and Jensen inequalities together with $\int |k(u)|^{2+\delta}\,du < \infty$.¹ So we can apply the Liapounov CLT to conclude that $\sum_{i=1}^n Z_{ni} \to N\big(0,\ \kappa_{02}^d f(x)\big)$ in distribution.

Remark. It can easily be shown that $\mathrm{Cov}\big(\hat f(x_{(1)}), \hat f(x_{(2)})\big) = o\big((n h_1 \cdots h_d)^{-1}\big)$ for any two distinct points $x_{(1)}$ and $x_{(2)}$ in the support of $f(\cdot)$. By the Cramér-Wold device, we can then show that, for any finite collection of distinct points $x_{(1)}, \ldots, x_{(m)}$, the vector
$$\sqrt{n h_1 \cdots h_d}\,\Big(\hat f(x_{(1)}) - E\hat f(x_{(1)}),\ \ldots,\ \hat f(x_{(m)}) - E\hat f(x_{(m)})\Big) \to N\big(0,\ \kappa_{02}^d\,\mathrm{diag}\big(f(x_{(1)}), \ldots, f(x_{(m)})\big)\big) \text{ in distribution}.$$
That is, $\hat f(x_{(1)}), \ldots, \hat f(x_{(m)})$ are asymptotically independent of each other. This observation is very useful when we construct pointwise confidence intervals for $f(x)$ evaluated at distinct points, since we do not need to take into account any dependence between the estimators at different points.

For the asymptotic MISE approximation, we need the second-order partial derivatives of $f$ to be square integrable. From (1.29)-(1.31), we can obtain the MISE of $\hat f(x)$:
$$\mathrm{MISE} = \mathrm{AMISE} + o\!\left(\Big(\sum_{s=1}^d h_s^2\Big)^2 + \frac{1}{n h_1 \cdots h_d}\right), \qquad (1.33)$$
where
$$\mathrm{AMISE} = \frac{\kappa_2^2}{4}\int \left[\sum_{s=1}^d h_s^2\,f_{ss}(x)\right]^2 dx + \frac{\kappa_{02}^d}{n h_1 \cdots h_d}. \qquad (1.34)$$
Unlike the univariate setting, explicit expressions for the AMISE-optimal bandwidths are not available in general, and they can only be obtained numerically (see Wand, 1992a). In the special case $h_1 = \cdots = h_d = h$, the optimal bandwidth has an explicit formula and is given by
$$h_{opt} = \left[\frac{d\,\kappa_{02}^d}{\kappa_2^2\,R_2(f)}\right]^{1/(d+4)} n^{-1/(d+4)},$$

¹The $c_r$ inequality says that $E|X + Y|^r \le c_r\big(E|X|^r + E|Y|^r\big)$, with $c_r = 2^{r-1}$ for $r \ge 1$.
where $R_2(f) = \int \big[\sum_{s=1}^d f_{ss}(x)\big]^2 dx$. One can then obtain the minimum AMISE as
$$\inf_h \mathrm{AMISE} = \frac{d+4}{4}\left\{\frac{\kappa_2^{2d}\,\big(\kappa_{02}^d\big)^4\,R_2(f)^d}{d^d}\right\}^{1/(d+4)} n^{-4/(d+4)}.$$
The last expression implies that the rate of convergence of $\inf_h \mathrm{AMISE}$ is of order $n^{-4/(d+4)}$, a rate which becomes slower as the dimension $d$ increases. This slower rate reflects the curse of dimensionality discussed previously.

1.3.3 Multivariate Bandwidth Selection

As in the case of univariate density estimation, the optimal bandwidth $h = (h_1, \ldots, h_d)$ should balance the squared bias and variance terms. This requires $h_s^4 = O\big((n h_1 \cdots h_d)^{-1}\big)$, i.e., $h_s = O\big(n^{-1/(4+d)}\big)$ for each $s$. Let $h^0 = (h_1^0, \ldots, h_d^0)$ denote the optimal bandwidth vector that minimizes the AMISE of $\hat f(x)$. Let $\bar K(v) = \prod_{s=1}^d \bar k(v_s)$, where $\bar k(\cdot)$ is the convolution kernel of $k(\cdot)$. In practice, we often choose the bandwidth $h = (h_1, \ldots, h_d)$ by minimizing the following least squares cross-validation function:
$$\mathrm{CV}(h) = \frac{1}{n^2 h_1 \cdots h_d}\sum_{i=1}^n \sum_{j=1}^n \bar K\!\left(\frac{X_i - X_j}{h}\right) - \frac{2}{n(n-1) h_1 \cdots h_d}\sum_{i=1}^n \sum_{j \ne i} K\!\left(\frac{X_i - X_j}{h}\right), \qquad (1.35)$$
where division by $h$ is understood componentwise. This is a generalization of (1.17) from the univariate case to the multivariate density estimation setting. Let $\hat h = (\hat h_1, \ldots, \hat h_d)$ denote the solution to the above cross-validation problem. Then one can show that $\hat h_s / h_s^0 \to 1$ in probability for $s = 1, \ldots, d$. It is worth mentioning that one can also apply the plug-in principle to choose the bandwidth in the multivariate setting. Using the asymptotic MISE approximations developed in the last subsection, it is possible to develop a multivariate version of the plug-in bandwidth. See Wand and Jones (1995) for details.

1.3.4 Construction of Confidence Intervals

If undersmoothing bandwidths are chosen so that $n h_1 \cdots h_d \sum_{s=1}^d h_s^4 \to 0$, then Theorem 1.1 implies that the pointwise $100(1-\alpha)\%$ (or simply $1-\alpha$) asymptotic confidence interval (CI) for $f(x)$ can be constructed as follows:
$$\left[\hat f(x) - z_{\alpha/2}\sqrt{\frac{\kappa_{02}^d\,\hat f(x)}{n h_1 \cdots h_d}},\ \ \hat f(x) + z_{\alpha/2}\sqrt{\frac{\kappa_{02}^d\,\hat f(x)}{n h_1 \cdots h_d}}\right], \qquad (1.36)$$
where $z_{\alpha/2}$ denotes the $(1 - \alpha/2)$-percentile of the standard normal distribution. Nevertheless, the above CI can be badly behaved in finite samples, in that its coverage probability may be significantly smaller than the nominal level $1 - \alpha$. An alternative is to use the bootstrap to construct the CI for the kernel density estimator.
Since CIs for high-dimensional density estimators ($d\ge3$) are seldom used, we now turn to the construction of the confidence band for a univariate density estimator. In this case, the CI in (1.36) can be written as
$$\left[\hat{f}(x)-z_{\alpha/2}\sqrt{\frac{\kappa\,\hat{f}(x)}{nh}},\ \ \hat{f}(x)+z_{\alpha/2}\sqrt{\frac{\kappa\,\hat{f}(x)}{nh}}\right],\tag{1.37}$$
where $\kappa=\int K(v)^{2}dv$. The pointwise confidence band for $f(\cdot)$ is
$$B(h)=\left\{(x,t):x\in\mathcal{X},\ \hat{f}(x)-z_{\alpha/2}\sqrt{\frac{\kappa\,\hat{f}(x)}{nh}}\le t\le\hat{f}(x)+z_{\alpha/2}\sqrt{\frac{\kappa\,\hat{f}(x)}{nh}}\right\},$$
where $\mathcal{X}$ is the support of $X$. The coverage probability of $B(h)$ at a point $x$ is given by
$$\pi(x,h)=P\left\{(x,f(x))\in B(h)\right\}.\tag{1.38}$$
The limit of $\pi(x,h)$ is not $1-\alpha$ unless one uses an undersmoothing bandwidth to remove the asymptotic bias of $\hat{f}(x)$.

To consider the bootstrap CI for $f(x)$ or $E[\hat{f}(x)]$, we define another estimator of the variance of $\hat{f}(x)$. Note that the variance of $\hat{f}(x)$ (in the case $d=1$) is given by
$$\sigma_{n}^{2}(x)=\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h}K\left(\frac{x-X_{i}}{h}\right)\right)=\frac{1}{n}\operatorname{Var}\left(\frac{1}{h}K\left(\frac{x-X_{i}}{h}\right)\right)=\frac{1}{n}\left\{E\left[\frac{1}{h}K\left(\frac{x-X_{i}}{h}\right)\right]^{2}-\left[E\,\frac{1}{h}K\left(\frac{x-X_{i}}{h}\right)\right]^{2}\right\}.$$
Hall (1992) proposes to estimate $\sigma_{n}^{2}(x)$ by
$$\hat{\sigma}_{n}^{2}(x)=\frac{1}{n}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h^{2}}K\left(\frac{x-X_{i}}{h}\right)^{2}-\hat{f}(x)^{2}\right].$$
We now describe how to construct the bootstrap CI for $E[\hat{f}(x)]$ in several steps:

1. Draw a bootstrap sample $\{X_{i}^{*}\}_{i=1}^{n}$ randomly (with replacement) from the original sample $\mathcal{D}\equiv\{X_{i}\}_{i=1}^{n}$.

2. Compute, with an undersmoothing choice of bandwidth $h$:
$$\hat{f}^{*}(x)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h}K\left(\frac{x-X_{i}^{*}}{h}\right),\quad \hat{\sigma}_{n}^{*2}(x)=\frac{1}{n}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h^{2}}K\left(\frac{x-X_{i}^{*}}{h}\right)^{2}-\hat{f}^{*}(x)^{2}\right],\quad t^{*}(x)=\frac{\hat{f}^{*}(x)-\hat{f}(x)}{\hat{\sigma}_{n}^{*}(x)}.$$

3. Repeat Steps 1-2 a large number of times, say $B$ times, to obtain $\{t_{b}^{*}(x)\}_{b=1}^{B}$. Let $t_{\alpha/2}^{*}$ and $t_{1-\alpha/2}^{*}$ denote the $\alpha/2$ and $1-\alpha/2$ empirical percentiles of $\{t_{b}^{*}(x)\}_{b=1}^{B}$. Then the symmetric two-sided CI for $E[\hat{f}(x)]$ is given by
$$\left[\hat{f}(x)-\hat{\sigma}_{n}(x)\,t_{1-\alpha/2}^{*},\ \ \hat{f}(x)-\hat{\sigma}_{n}(x)\,t_{\alpha/2}^{*}\right].\tag{1.39}$$
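The three bootstrap steps above are easy to vectorize. Below is a hedged Python sketch of the percentile-$t$ construction at a single point; the sample, the (undersmoothed) bandwidth, the number of replications, and the evaluation point are all illustrative choices rather than prescriptions from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(300)          # illustrative sample
n, h, x0 = len(x), 0.25, 0.0          # undersmoothed bandwidth (assumption)

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def fhat(sample):
    return gauss((x0 - sample) / h).mean() / h

def sig(sample):
    # Hall (1992)-type variance estimate at x0
    k2 = (gauss((x0 - sample) / h) ** 2).mean() / h ** 2
    return np.sqrt((k2 - fhat(sample) ** 2) / n)

f0, s0 = fhat(x), sig(x)
tstar = np.empty(499)
for b in range(tstar.size):
    xb = rng.choice(x, size=n, replace=True)   # Step 1
    tstar[b] = (fhat(xb) - f0) / sig(xb)       # Step 2
lo, hi = np.quantile(tstar, [0.025, 0.975])    # Step 3
ci = (f0 - hi * s0, f0 - lo * s0)              # 95% percentile-t CI
```

With an undersmoothed bandwidth the interval can be read as a CI for $f(x_0)$ itself, as discussed in the text.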
Note that $E^{*}[\hat{f}^{*}(x)\mid\mathcal{D}]=\hat{f}(x)$. That is, $\hat{f}^{*}(x)$ is an unbiased estimator of $\hat{f}(x)$ conditional on the data, even though $\hat{f}(x)$ is a biased estimator of $f(x)$. Hence, $t^{*}(x)$ is a bootstrap $t$-statistic for forming a CI for $E[\hat{f}(x)]$. The CI in (1.39) can be regarded as a CI for $f(x)$ when we use an undersmoothing bandwidth, so that the bias of $\hat{f}(x)$ is asymptotically negligible. Horowitz (2001, Handbook of Econometrics) demonstrates that the bootstrap provides asymptotic refinements for tests of hypotheses and confidence intervals in density estimation.

More recently, Hall and Horowitz (2013, AoS) propose a simple bootstrap method for constructing the CI for the kernel density estimator without the need for undersmoothing. Instead of drawing the bootstrap observations randomly from $\mathcal{D}$, Hall and Horowitz (2013) suggest that we draw a random sample $\mathcal{D}^{*}=\{X_{i}^{*}\}_{i=1}^{n}$ from the distribution with density $\hat{f}(\cdot)$ and define $\hat{f}^{*}$ to be the corresponding kernel estimator of $\hat{f}$:
$$\hat{f}^{*}(x)=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-X_{i}^{*}}{h}\right).$$
One can show that, conditional on $\mathcal{D}$, the asymptotic bias and variance of $\hat{f}^{*}(x)$ are the same as those of $\hat{f}(x)$, ignoring asymptotically negligible terms, so that we can construct the CI or confidence band for $f(x)$ based on $\hat{f}^{*}(x)$. The bootstrap version of $B(h)$ in (1.38) is given by
$$B^{*}(h)=\left\{(x,t):x\in\mathcal{X},\ \hat{f}^{*}(x)-z_{\alpha/2}\sqrt{\frac{\kappa\,\hat{f}^{*}(x)}{nh}}\le t\le\hat{f}^{*}(x)+z_{\alpha/2}\sqrt{\frac{\kappa\,\hat{f}^{*}(x)}{nh}}\right\}.$$
The bootstrap estimator $\hat{\pi}(x,h)$ of the coverage probability $\pi(x,h)$ that $B(h)$ covers $(x,f(x))$ is defined by
$$\hat{\pi}(x,h)=P^{*}\left\{(x,\hat{f}(x))\in B^{*}(h)\mid\mathcal{D}\right\}$$
and can be computed, by Monte Carlo simulation, in the form
$$\frac{1}{B}\sum_{b=1}^{B}\mathbf{1}\left\{(x,\hat{f}(x))\in B_{b}^{*}(h)\right\},\tag{1.40}$$
where $B_{b}^{*}(h)$ is calculated as $B^{*}(h)$ based on the $b$th bootstrap resample. [The bootstrap resamples are independent of each other conditional on $\mathcal{D}$.] For large enough $B$, we can treat (1.40) as $\hat{\pi}(x,h)$, ignoring the small simulation error.
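Drawing $\mathcal{D}^{*}$ from the estimated density $\hat{f}$, rather than from the raw data, has a convenient closed form for a Gaussian kernel: pick a data point uniformly at random and perturb it by $h$ times a standard normal draw. A short sketch (the sample and bandwidth are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500)          # illustrative sample
n, h = len(x), 0.3

# A draw from the kernel density estimate fhat with a Gaussian kernel:
# X* = X_I + h * eps, with I uniform on {0,...,n-1} and eps ~ N(0,1),
# because fhat is an equal-weight mixture of N(X_i, h^2) components.
idx = rng.integers(0, n, size=n)
x_star = x[idx] + h * rng.standard_normal(n)
```

The same device extends to any kernel one can sample from; the resample has (conditional) variance inflated by roughly $h^{2}$ relative to the data, which is part of what makes the method mimic the smoothing bias.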
Then we can define $\hat{h}(x,\alpha_{0})$ to be the solution, in $h$, of $\hat{\pi}(x,h)=1-\alpha_{0}$. Take $\mathcal{X}_{0}=[a,b]$ as a subset of $\mathcal{X}$. Let $x_{1},\dots,x_{J}$ be evenly distributed on $[a,b]$ such that $x_{j}=a+j(b-a)/(J+1)$ for $j=1,\dots,J$, so that the distance between two adjacent points is $(b-a)/(J+1)$. Let $\hat{h}_{\xi}(\alpha_{0})$ denote the $\xi$-level empirical quantile of the points in the set $\{\hat{h}(x_{1},\alpha_{0}),\dots,\hat{h}(x_{J},\alpha_{0})\}$. For a value $\xi\in(0,1/2]$, construct the $1-\alpha_{0}$ confidence band $B(\hat{h}_{\xi}(\alpha_{0}))$ by taking $\xi$ sufficiently small. Hall and Horowitz (2013) recommend $\xi=0.1$ and remark that $\xi=0.05$ may be warranted in the case of large samples (p. 899).

1.3.5 Conditional Density Estimation

Even though conditional pdfs form the backbone of most popular statistical methods in use today, they are rarely modeled directly in the parametric setting, and they have received even less attention in the nonparametric literature. A few exceptions include Chen et al. (2003) and Li and Racine (2007). Let $W=(Y,X')'$, where $Y$ is a scalar and $X=(X_{1},\dots,X_{d})'$ is a $d\times1$ vector. We are interested in estimating the conditional density of $Y$ given $X=x$, which is denoted $f(y|x)$. Writing
$$f(y|x)=\frac{f(x,y)}{f(x)},$$
we estimate it by
$$\hat{f}(y|x)=\frac{\hat{f}(x,y)}{\hat{f}(x)},$$
where
$$\hat{f}(x,y)=\frac{1}{n}\sum_{i=1}^{n}K_{h}(x-X_{i})\,k_{h_{0}}(y-Y_{i})\quad\text{and}\quad\hat{f}(x)=\frac{1}{n}\sum_{i=1}^{n}K_{h}(x-X_{i}),$$
with $K_{h}(x-X_{i})=\prod_{s=1}^{d}h_{s}^{-1}k((x_{s}-X_{is})/h_{s})$ and $k_{h_{0}}(y-Y_{i})=h_{0}^{-1}k((y-Y_{i})/h_{0})$. The asymptotic properties of $\hat{f}(y|x)$ can easily be derived from those of $\hat{f}(x,y)$ and $\hat{f}(x)$.

To consider the choice of the bandwidths, we can consider the following criterion function based upon the weighted integrated squared error (ISE):
$$\operatorname{ISE}(h_{0},h)=\int\left\{\hat{f}(y|x)-f(y|x)\right\}^{2}f(x)\,w(x)\,dx\,dy=I_{1n}-2I_{2n}+\int f(y|x)^{2}f(x)\,w(x)\,dx\,dy,\tag{1.41}$$
where the last term does not depend on the bandwidths,
$$I_{1n}=\int\hat{f}(y|x)^{2}f(x)\,w(x)\,dx\,dy\quad\text{and}\quad I_{2n}=\int\hat{f}(y|x)\,f(y|x)\,f(x)\,w(x)\,dx\,dy.$$
Here we use the weight function $w(\cdot)$ to mitigate the random denominator problem. Let $\hat{G}(x)=\int\hat{f}(x,y)^{2}dy$. Then $I_{1n}=\int\{\hat{G}(x)/\hat{f}(x)^{2}\}f(x)\,w(x)\,dx$. One can verify that
$$\hat{G}(x)=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}K_{h}(x-X_{i})\,K_{h}(x-X_{j})\,\bar{k}_{h_{0}}(Y_{i}-Y_{j}),$$
where $\bar{k}_{h_{0}}(v)=h_{0}^{-1}\bar{k}(v/h_{0})$ and $\bar{k}(v)=\int k(u)k(v-u)\,du$ is the convolution kernel of $k(\cdot)$. We estimate $I_{1n}$ and $I_{2n}$ by
$$\hat{I}_{1n}=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{G}_{-i}(X_{i})\,w(X_{i})}{\hat{f}_{-i}(X_{i})^{2}}\quad\text{and}\quad\hat{I}_{2n}=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{f}_{-i}(X_{i},Y_{i})\,w(X_{i})}{\hat{f}_{-i}(X_{i})},$$
respectively, where the subscript $-i$ denotes the leave-one-out estimators; for example,
$$\hat{f}_{-i}(X_{i},Y_{i})=\frac{1}{n-1}\sum_{j\neq i}K_{h}(X_{i}-X_{j})\,k_{h_{0}}(Y_{i}-Y_{j}).$$
Thus, we can choose $(h_{0},h_{1},\dots,h_{d})$ to minimize the following cross-validation function:
$$CV(h_{0},h_{1},\dots,h_{d})=\hat{I}_{1n}-2\hat{I}_{2n}.$$
Let $(\hat{h}_{0},\hat{h}_{1},\dots,\hat{h}_{d})$ be the solution to the above minimization problem, and let $(h_{0}^{0},h_{1}^{0},\dots,h_{d}^{0})$ be the minimizer of (1.41). Hall et al. (2003a) show that $\hat{h}_{s}/h_{s}^{0}\to_{p}1$ for $s=0,1,\dots,d$, where the optimal bandwidths are of order $n^{-1/(d+5)}$.
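The ratio estimator $\hat{f}(y|x)=\hat{f}(x,y)/\hat{f}(x)$ is only a few lines of code. Below is a Python sketch for scalar $x$; the data-generating process and the fixed bandwidths are illustrative assumptions (in practice they would come from the cross-validation procedure above).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)   # so Y | X = x is N(0.5 x, 1)

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def f_cond(y0, x0, hx=0.3, hy=0.3):
    """Kernel estimate of f(y0 | x0) as fhat(x0, y0) / fhat(x0)."""
    kx = gauss((x0 - x) / hx) / hx
    ky = gauss((y0 - y) / hy) / hy
    return (kx * ky).mean() / kx.mean()

# True value at (y0, x0) = (0.5, 1.0) is the N(0.5, 1) density at 0.5
est = f_cond(0.5, 1.0)
```

The same ratio construction carries over verbatim to vector $x$ with a product kernel in the numerator and denominator.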
1.3.6 Uniform Rates of Convergence

Up to now we have only demonstrated pointwise and mean integrated squared error consistency of the density estimator $\hat{f}(x)$. In fact, the consistency of $\hat{f}(x)$ can be strengthened to uniform consistency: nonparametric kernel estimators are uniformly strongly (almost surely) consistent. This result is important for theoretical purposes, so we report it below; for the proof, we refer the reader to Masry (1996a,b).

Theorem 1.2 Under some regularity conditions given in Masry (1996a,b), we have

(i) $\sup_{x\in\mathcal{S}}\left|\hat{f}(x)-f(x)\right|=O_{a.s.}\!\left(\left(\dfrac{\ln n}{nh_{1}\cdots h_{d}}\right)^{1/2}+\sum_{s=1}^{d}h_{s}^{2}\right)$,

(ii) $\sup_{x\in\mathcal{S}}E\left[\hat{f}(x)-f(x)\right]^{2}=O\!\left(\dfrac{1}{nh_{1}\cdots h_{d}}+\sum_{s=1}^{d}h_{s}^{4}\right)$,

where $\mathcal{S}$ is a compact set in the interior of the support of $f$.

In the above theorem, we restrict ourselves to a compact set in the interior of the support of $f$. Suppose $f$ is compactly supported. It is well known that when $x$ lies at the boundary of the support, we cannot estimate $f(x)$ at the usual rate; in fact, the MSE of $\hat{f}(x)$ is not even $o(1)$ in this case. Some modifications are needed to consistently estimate $f(x)$ for $x$ at the boundary of the support. For details, see Gasser and Müller (1979), Rice (1984), Hall and Wehrly (1991), Scott (1992), and Wand and Jones (1995).

Example 4 (Boundary bias problem). Suppose $X$ has compact support $[0,1]$ and $f(0)>0$. Suppose we estimate $f(0)$ by the usual nonparametric kernel estimator, that is, $\hat{f}(0)=(nh)^{-1}\sum_{i=1}^{n}K\left(\frac{0-X_{i}}{h}\right)$. Then
$$E\left[\hat{f}(0)\right]=\frac{1}{h}\int_{0}^{1}K\left(\frac{0-x}{h}\right)f(x)\,dx=\int_{0}^{1/h}K(v)\,f(vh)\,dv\to f(0)\int_{0}^{\infty}K(v)\,dv=\frac{f(0)}{2},$$
where we have used the dominated convergence theorem and the fact that $K(\cdot)$ is a symmetric kernel (implying $\int_{0}^{\infty}K(v)\,dv=1/2$). So $\hat{f}(0)$ is a biased estimator of $f(0)$ even asymptotically, and the asymptotic bias is of magnitude $-f(0)/2$.

When $f$ is infinitely supported, however, the above uniform-consistency result can be extended to the full support of $f$. For more details, see Hansen (2008).

1.4 Testing Hypotheses about Densities

In this section we consider testing hypotheses about densities.
Suppose $f$ and $g$ are two possible densities for the random variable or vector $X$. We may wish to test several types of hypotheses regarding these densities, each of which can be formulated as testing
$$H_{0}:f(x)=g(x)\quad\text{versus}\quad H_{1}:f(x)\neq g(x).\tag{1.42}$$
Pagan and Ullah (1999) consider several examples, which we reformulate below.

Examples. (a) It is sometimes desirable to test whether a nonparametrically estimated density has a particular form, say that of a normal density. In this example, we estimate $f(x)$ nonparametrically and estimate $g(x)$ parametrically according to the parametric assumption $g(x)=g(x;\theta)$, where $\theta$ is a vector of unknown parameters.
It is interesting to note that the finite-dimensional parameter $\theta$ can usually be estimated at the parametric $\sqrt{n}$-rate, which is faster than any nonparametric rate of estimation, so that whether $\theta$ is estimated or known has no asymptotic impact on the test.

(b) Symmetry of a density around some point, say zero, is frequently assumed in the literature. If this assumption is not met, subsequent statistical inference may not be valid, so it is desirable to have a test for symmetry. For example, if we are testing whether the density $f(\cdot)$ of $X$ is symmetric around zero, we may let $g(x)=f(-x)$ in (1.42).

(c) Conditional symmetry of a conditional density may also be of great interest. It turns out that several tests are available for conditional symmetry. See, e.g., Su (2006).

(d) Independence is a fundamental concept in statistics, and a large number of parametric and nonparametric tests are available for it. In testing whether two random variables $Y$ and $Z$ are independent, we may put $X=(Y,Z)$. Let $f_{1}$, $f_{2}$, and $f$ denote the marginal density of $Y$, the marginal density of $Z$, and the joint density of $Y$ and $Z$, respectively, so that the null hypothesis in (1.42) can be written as $H_{0}:f(y,z)=f_{1}(y)f_{2}(z)$. Similarly, we can formulate hypotheses for testing various variants of independence, such as serial independence, spatial independence, or conditional independence.

(e) It is useful to compare densities $f(x)$ and $g(x)$ that come from two different groups (male or female, white or non-white), regions (rural or urban, coastal or inland), or time periods.

As Pagan and Ullah (1999) remark, the above testing problems can be tackled by considering a widely accepted measure of global distance (closeness) between the two densities $f(x)$ and $g(x)$. In practice, one frequently uses the weighted integrated squared error
$$J(f,g)=\int\left[f(x)-g(x)\right]^{2}w(x)\,dx,\tag{1.43}$$
where $w(x)$ is a nonnegative weight function.
For example, if one takes $w(x)=f(x)$ or $g(x)$, then (1.43) can be estimated by its sample analogue
$$\hat{J}=\frac{1}{n}\sum_{i=1}^{n}\left[\hat{f}(X_{i})-\hat{g}(X_{i})\right]^{2}.\tag{1.44}$$
Another measure of distance (affinity) between two densities is the well-known Kullback-Leibler (KL) distance (information) measure introduced earlier. Under the null hypothesis, the KL distance between $f$ and $g$ is zero, and it is nonzero otherwise.

1.4.1 Comparison with a Parametric Density Function

Now consider the problem of testing $H_{0}:f(x)=g(x;\theta)$, where $g(\cdot;\theta)$ is a fully specified (known) density up to the finite-dimensional parameter vector $\theta$. Given data $\{X_{i}\}_{i=1}^{n}$, let $\hat{f}(x)$ be the nonparametric kernel density estimator of $f$, and let $\hat{\theta}$ be the maximum likelihood estimator of $\theta$ based upon the parametric density $g(\cdot;\theta)$. The test is based on the observation that
$$I(\theta)=\int\left[f(x)-g(x;\theta)\right]^{2}dx=\int f(x)^{2}dx+\int g(x;\theta)^{2}dx-2\int f(x)\,g(x;\theta)\,dx=E\left[f(X_{i})\right]+\int g(x;\theta)^{2}dx-2E\left[g(X_{i};\theta)\right].$$
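The identity above suggests estimating each expectation by a sample average. Below is a hedged sketch comparing a kernel estimate with a fitted normal density; the simulated data, the bandwidth, and the use of the plain (rather than leave-one-out) kernel estimator are illustrative simplifications, and no studentization is performed, so this is the raw distance, not the full test statistic.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(300)
n, h = len(x), 0.35

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

# Kernel density estimate evaluated at each data point
fhat = (gauss((x[:, None] - x[None, :]) / h) / h).mean(axis=1)

# Fitted parametric (normal) density g(x; theta_hat) via MLE
mu, s = x.mean(), x.std()
g = gauss((x - mu) / s) / s

# Sample analogue of I = E f(X) + int g^2 dx - 2 E g(X); for the normal,
# int g(x; theta_hat)^2 dx = 1 / (2 s sqrt(pi))
I_hat = fhat.mean() + 1.0 / (2.0 * s * np.sqrt(np.pi)) - 2.0 * g.mean()
```

Because the data really are normal here, the estimated distance should sit near zero, up to smoothing bias and the self-term in the kernel average.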
Following Fan (1994), we can propose a feasible test statistic by replacing $f(\cdot)$ and $\theta$ with $\hat{f}(\cdot)$ and $\hat{\theta}$:
$$\hat{I}_{n}=\frac{1}{n}\sum_{i=1}^{n}\hat{f}(X_{i})+\int g\big(x;\hat{\theta}\big)^{2}dx-\frac{2}{n}\sum_{i=1}^{n}g\big(X_{i};\hat{\theta}\big).\tag{1.45}$$
Under certain conditions, we can follow the proof of Theorem 4.1 of Fan (1994) to prove the following theorem.

Theorem 1.3 Under some regularity conditions and $H_{0}$, we have
$$T_{n}=\frac{n(h_{1}\cdots h_{d})^{1/2}\,\hat{I}_{n}-c_{n}}{\hat{\sigma}_{n}}\ \stackrel{d}{\to}\ N(0,1),$$
where $c_{n}$ is a centering term and $\hat{\sigma}_{n}^{2}=\frac{2}{n^{2}h_{1}\cdots h_{d}}\sum_{i=1}^{n}\sum_{j\neq i}\bar{K}\left(\frac{X_{i}-X_{j}}{h}\right)^{2}$, with $K(u)=\prod_{s=1}^{d}k(u_{s})$ the product kernel and $\bar{K}$ its convolution analogue.

Fan (1994) works with a smoothed version of $\int g(x;\hat{\theta})^{2}dx$ and proves an analog of the above theorem. Since our test is one-sided, we reject the null when $T_{n}$ exceeds the upper $\alpha$-percentile of the standard normal distribution; the same applies to the other tests in this section.

Note that the integration in (1.45) may not need to be done numerically in practice. For example, if $g(\cdot;\hat{\theta})$ is the pdf of a normal distribution with mean $\hat{\mu}$ and variance $\hat{\sigma}^{2}$, then $g(x;\hat{\theta})^{2}$ is proportional to the pdf of a $N(\hat{\mu},\hat{\sigma}^{2}/2)$ random variable, and it is easy to verify that $\int g(x;\hat{\theta})^{2}dx=(4\pi\hat{\sigma}^{2})^{-1/2}$ in this case.

1.4.2 Testing for Symmetry

To test whether a density function $f(\cdot)$ is symmetric around zero, we write the null and alternative hypotheses as
$$H_{0}:f(x)=f(-x)\ \text{for all }x\quad\text{versus}\quad H_{1}:f(x)\neq f(-x)\ \text{for some }x.\tag{1.46}$$
Noting that
$$I=\int\left[f(x)-f(-x)\right]^{2}dx=\int\left[f(x)-f(-x)\right]f(x)\,dx-\int\left[f(x)-f(-x)\right]f(-x)\,dx=2\int\left[f(x)-f(-x)\right]f(x)\,dx,$$
Ahmad and Li (1997) propose a test based upon the last functional. Clearly, we can estimate it (up to the factor 2) by
$$\hat{I}_{n}=\frac{1}{n}\sum_{i=1}^{n}\left[\hat{f}(X_{i})-\hat{f}(-X_{i})\right]=\frac{1}{n^{2}h}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[K\left(\frac{X_{i}-X_{j}}{h}\right)-K\left(\frac{X_{i}+X_{j}}{h}\right)\right].\tag{1.47}$$
Under the null hypothesis and the standard assumptions that $h\to0$ and $nh\to\infty$, Ahmad and Li (1997) prove the following theorem.

Theorem 1.4 Under some regularity conditions and $H_{0}$, we have
$$\frac{nh^{1/2}\left(\hat{I}_{n}-\hat{c}_{n}\right)}{\hat{\sigma}_{n}}\ \stackrel{d}{\to}\ N(0,1),$$
where $\hat{\sigma}_{n}^{2}=\frac{4}{n}\sum_{i=1}^{n}\hat{f}(X_{i})\int K(v)^{2}dv$ and $\hat{c}_{n}$, a term involving $K(0)$, is used to correct for the finite-sample bias coming from the $i=j$ terms in (1.47). One can prove the above theorem by a simple application of the CLT for degenerate second-order U-statistics; see Theorem 1.9 in the appendix and Exercise 4 in this chapter.

1.4.3 Comparison with Unknown Densities

Comparison of two densities is important in empirical work. For example, we may be interested in comparing income distributions across two groups, regions, or time periods. Let $\{X_{i}\}_{i=1}^{n_{1}}$ and $\{Z_{i}\}_{i=1}^{n_{2}}$ be two samples of $d$-dimensional random vectors. Assume that $X$ and $Z$ have densities $f$ and $g$ and distribution functions $F$ and $G$, respectively. The null hypothesis of interest is $H_{0}:f(x)=g(x)$. Noticing that
$$I=\int\left[f(x)-g(x)\right]^{2}dx=\int f\,dF+\int g\,dG-\int f\,dG-\int g\,dF,$$
we can propose a feasible test statistic by replacing $f$, $g$, $F$, and $G$ with $\hat{f}$, $\hat{g}$, $\hat{F}$, and $\hat{G}$, respectively, where $\hat{f}(x)=\frac{1}{n_{1}h_{1}\cdots h_{d}}\sum_{i=1}^{n_{1}}K\left(\frac{x-X_{i}}{h}\right)$, $\hat{g}(x)=\frac{1}{n_{2}h_{1}\cdots h_{d}}\sum_{i=1}^{n_{2}}K\left(\frac{x-Z_{i}}{h}\right)$, and $\hat{F}$ and $\hat{G}$ are the empirical distribution functions of $\{X_{i}\}_{i=1}^{n_{1}}$ and $\{Z_{i}\}_{i=1}^{n_{2}}$, respectively. This leads to
$$\hat{I}_{n}=\int\hat{f}\,d\hat{F}+\int\hat{g}\,d\hat{G}-\int\hat{f}\,d\hat{G}-\int\hat{g}\,d\hat{F}=\frac{1}{n_{1}}\sum_{i=1}^{n_{1}}\hat{f}(X_{i})+\frac{1}{n_{2}}\sum_{i=1}^{n_{2}}\hat{g}(Z_{i})-\frac{1}{n_{1}}\sum_{i=1}^{n_{1}}\hat{g}(X_{i})-\frac{1}{n_{2}}\sum_{i=1}^{n_{2}}\hat{f}(Z_{i}),$$
where $K\left(\frac{x-X_{i}}{h}\right)=\prod_{s=1}^{d}k\left(\frac{x_{s}-X_{is}}{h_{s}}\right)$. The following theorem states the main result.

Theorem 1.5 Under some regularity conditions and $H_{0}$, we have
$$\frac{n_{1}(h_{1}\cdots h_{d})^{1/2}\left(\hat{I}_{n}-\hat{c}_{n}\right)}{\hat{\sigma}_{n}}\ \stackrel{d}{\to}\ N(0,1),$$
where $\hat{c}_{n}$ is a centering term involving $K(0)$ and the sample sizes $n_{1}$ and $n_{2}$, and $\hat{\sigma}_{n}^{2}$ is a consistent variance estimator built from within- and between-sample second moments of the kernel, i.e., terms of the form $\sum_{i}\sum_{j}K\left(\frac{X_{i}-X_{j}}{h}\right)^{2}$, $\sum_{i}\sum_{j}K\left(\frac{X_{i}-Z_{j}}{h}\right)^{2}$, and $\sum_{i}\sum_{j}K\left(\frac{Z_{i}-Z_{j}}{h}\right)^{2}$.

For a proof of the above result, see Li and Racine (2006); for a variant of the above test, see Li (1996).
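The four sample averages in the two-sample statistic can be computed from a pair of kernel matrices. Below is a univariate Python sketch; the samples, the common bandwidth, and the omission of the centering and studentization from the theorem are illustrative simplifications, so this shows the raw statistic, not the full test.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(400)            # sample 1
z = rng.standard_normal(300)            # sample 2, same law under H0
z_alt = 1.0 + rng.standard_normal(300)  # a shifted alternative
h = 0.35

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def I_stat(a, b):
    """Sample analogue of int f dF + int g dG - int f dG - int g dF."""
    fa = (gauss((a[:, None] - a[None, :]) / h) / h).mean(axis=1)  # fhat at a
    gb = (gauss((b[:, None] - b[None, :]) / h) / h).mean(axis=1)  # ghat at b
    ga = (gauss((a[:, None] - b[None, :]) / h) / h).mean(axis=1)  # ghat at a
    fb = (gauss((b[:, None] - a[None, :]) / h) / h).mean(axis=1)  # fhat at b
    return fa.mean() + gb.mean() - ga.mean() - fb.mean()

I_null = I_stat(x, z)      # near zero when the densities coincide
I_alt = I_stat(x, z_alt)   # clearly positive under the shifted alternative
```

The small positive offset of `I_null` away from exactly zero comes from the $i=j$ self-terms, which is precisely what the centering term in the theorem removes.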
1.4.4 Testing for Independence

Let $(Y',Z')'$ be a $(d_{1}+d_{2})$-dimensional random vector with joint cdf $F(y,z)$ and pdf $f(y,z)$. Further, let $F_{1}(y)$ and $F_{2}(z)$ denote the marginal cdfs of $Y$ and $Z$, with marginal pdfs $f_{1}(y)$ and $f_{2}(z)$, respectively. The null hypothesis of interest is $H_{0}:f(y,z)=f_{1}(y)f_{2}(z)$. Observing that
$$I=\int\left[f(y,z)-f_{1}(y)f_{2}(z)\right]^{2}dy\,dz=E\left[f(Y_{i},Z_{i})\right]+E\left[f_{1}(Y_{i})\right]E\left[f_{2}(Z_{i})\right]-2E\left[f_{1}(Y_{i})\,f_{2}(Z_{i})\right],$$
we can propose a feasible test statistic by replacing $f(\cdot,\cdot)$, $f_{1}(\cdot)$, and $f_{2}(\cdot)$ with their leave-one-out kernel estimators $\hat{f}_{-i}(\cdot,\cdot)$, $\hat{f}_{1,-i}(\cdot)$, and $\hat{f}_{2,-i}(\cdot)$. This leads to the following expression:
$$\hat{I}_{n}=\frac{1}{n}\sum_{i=1}^{n}\hat{f}_{-i}(Y_{i},Z_{i})+\left[\frac{1}{n}\sum_{i=1}^{n}\hat{f}_{1,-i}(Y_{i})\right]\left[\frac{1}{n}\sum_{i=1}^{n}\hat{f}_{2,-i}(Z_{i})\right]-\frac{2}{n}\sum_{i=1}^{n}\hat{f}_{1,-i}(Y_{i})\,\hat{f}_{2,-i}(Z_{i}),$$
where $\hat{f}_{-i}(Y_{i},Z_{i})=\frac{1}{n-1}\sum_{j\neq i}K_{h}(Y_{i}-Y_{j})L_{\lambda}(Z_{i}-Z_{j})$, $\hat{f}_{1,-i}(Y_{i})=\frac{1}{n-1}\sum_{j\neq i}K_{h}(Y_{i}-Y_{j})$, and $\hat{f}_{2,-i}(Z_{i})=\frac{1}{n-1}\sum_{j\neq i}L_{\lambda}(Z_{i}-Z_{j})$, with product kernels $K_{h}(u)=\prod_{s=1}^{d_{1}}h_{s}^{-1}k(u_{s}/h_{s})$ and $L_{\lambda}(v)=\prod_{s=1}^{d_{2}}\lambda_{s}^{-1}k(v_{s}/\lambda_{s})$. Under certain conditions, Ahmad and Li (1997) prove the following theorem.

Theorem 1.6 Under some regularity conditions and $H_{0}$, we have
$$T_{n}=\frac{n\left(h_{1}\cdots h_{d_{1}}\lambda_{1}\cdots\lambda_{d_{2}}\right)^{1/2}\hat{I}_{n}}{\hat{\sigma}_{n}}\ \stackrel{d}{\to}\ N(0,1),$$
where $\hat{\sigma}_{n}^{2}=\frac{2}{n^{2}h_{1}\cdots h_{d_{1}}\lambda_{1}\cdots\lambda_{d_{2}}}\sum_{i=1}^{n}\sum_{j\neq i}K\left(\frac{Y_{i}-Y_{j}}{h}\right)^{2}L\left(\frac{Z_{i}-Z_{j}}{\lambda}\right)^{2}$ with, e.g., $K\left(\frac{Y_{i}-Y_{j}}{h}\right)=\prod_{s=1}^{d_{1}}k\left(\frac{Y_{is}-Y_{js}}{h_{s}}\right)$.

Like the other nonparametric tests defined in this section, large values of $T_{n}$ are evidence in favor of the alternative, and we reject the null hypothesis if $T_{n}$ exceeds the upper $\alpha$-percentile of the standard normal distribution. It is worth mentioning that the previously developed theory for kernel density estimation and testing goes through under weak data-dependence conditions. In the next application, we consider testing for structural change in a time series framework.

1.4.5 Testing for Structural Change in Densities

Since Page (1956), the problem of testing for a structural change has generated much interest in both statistics and econometrics. Early studies mainly focused on parameter change in the parametric framework. More recently, much attention has been paid to testing for structural change at the level of the whole distribution or density. Let $\{X_{t}\}$ be a stationary strong mixing process, i.e., its mixing coefficients satisfy
$$\alpha(\tau)=\sup_{t}\sup\left\{\left|P(A\cap B)-P(A)P(B)\right|:A\in\mathcal{F}_{-\infty}^{t},\ B\in\mathcal{F}_{t+\tau}^{\infty}\right\}\to0\quad\text{as }\tau\to\infty,$$
where $\mathcal{F}_{s}^{t}=\sigma(X_{s},\dots,X_{t})$ is the $\sigma$-field generated by $X_{s},\dots,X_{t}$. We wish to test for a change in the marginal density of $\{X_{t}\}_{t=1}^{n}$. So the null hypothesis is
$$H_{0}:X_{1},\dots,X_{n}\ \text{have a common marginal density }f,$$
and the alternative hypothesis is
$$H_{1}:\ \text{for some }\tau\in(0,1),\ X_{1},\dots,X_{\lfloor n\tau\rfloor}\ \text{have a common density }f_{1}\ \text{and}\ X_{\lfloor n\tau\rfloor+1},\dots,X_{n}\ \text{have a common density }f_{2}\neq f_{1},$$
where $\lfloor a\rfloor$ denotes the largest integer less than or equal to $a$, and $f$, $f_{1}$, and $f_{2}$ are all unknown. To test $H_{0}$, define the subsample estimators
$$\hat{f}_{\lfloor ns\rfloor}(x)=\frac{1}{\lfloor ns\rfloor h}\sum_{t=1}^{\lfloor ns\rfloor}K\left(\frac{x-X_{t}}{h}\right)\quad\text{and}\quad\tilde{f}_{\lfloor ns\rfloor}(x)=\frac{1}{(n-\lfloor ns\rfloor)h}\sum_{t=\lfloor ns\rfloor+1}^{n}K\left(\frac{x-X_{t}}{h}\right),$$
and, with $\sigma^{2}(x)=f(x)\int K(v)^{2}dv$, the process
$$T_{n}(s,x)=\frac{\lfloor ns\rfloor\,(n-\lfloor ns\rfloor)}{n^{3/2}}\sqrt{\frac{h}{\sigma^{2}(x)}}\left\{\hat{f}_{\lfloor ns\rfloor}(x)-\tilde{f}_{\lfloor ns\rfloor}(x)\right\},\tag{1.48}$$
provided $\sigma(x)\neq0$; if $\sigma(x)=0$, (1.48) is defined to be zero. Under the null $H_{0}$, we can define the partial-sum process
$$W_{n}(s,x)=\sqrt{\frac{h}{n\,\sigma^{2}(x)}}\sum_{t=1}^{\lfloor ns\rfloor}\left\{\frac{1}{h}K\left(\frac{x-X_{t}}{h}\right)-E\left[\frac{1}{h}K\left(\frac{x-X_{t}}{h}\right)\right]\right\}$$
if $\sigma(x)\neq0$, and $W_{n}(s,x)=0$ if $\sigma(x)=0$. Then we can write
$$T_{n}(s,x)=W_{n}(s,x)-\frac{\lfloor ns\rfloor}{n}W_{n}(1,x).$$
Lee and Na (2004) show that, for fixed $x$, $\{W_{n}(s,x):0\le s\le1\}$ converges weakly to a standard Brownian motion, which implies that $\{T_{n}(s,x):0\le s\le1\}$ converges to a Brownian bridge. Let $x_{1},\dots,x_{J}$ be distinct real numbers. Define
$$T_{n}=\max_{1\le j\le J}\ \sup_{0\le s\le1}\left|T_{n}(s,x_{j})\right|.$$
Lee and Na (2004) prove the following theorem.

Theorem 1.7 Suppose the regularity conditions given in Lee and Na (2004) hold. (i) Under $H_{0}$, as $n\to\infty$,
$$T_{n}\ \stackrel{d}{\to}\ \max_{1\le j\le J}\ \sup_{0\le s\le1}\left|B_{j}^{0}(s)\right|,$$
where $B_{1}^{0},\dots,B_{J}^{0}$ are independent Brownian bridges. (ii) Under $H_{1}$, as $n\to\infty$, $T_{n}\to\infty$ in probability if $f_{1}(x_{j})\neq f_{2}(x_{j})$ for some $j\in\{1,\dots,J\}$.

Thus we reject the null if $T_{n}$ is large enough. In practice, one can tabulate the critical values based on simulations of Brownian bridges.
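The idea behind the test, comparing kernel density estimates before and after each candidate break point, can be illustrated directly. Below is a simplified Python sketch that scans break points at a single evaluation point; it omits the studentization and the Brownian-bridge critical values of the theorem, and all tuning choices (sample, break magnitude, bandwidth, evaluation point, trimming) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 600
# Break in the marginal density: N(0,1) in the first half, N(2,1) after
x = np.concatenate([rng.standard_normal(n // 2),
                    2.0 + rng.standard_normal(n // 2)])
h, pt = 0.4, 0.0   # bandwidth and evaluation point (illustrative)

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def fhat(sample):
    return (gauss((pt - sample) / h) / h).mean()

# |pre-break estimate - post-break estimate| over candidate break points,
# trimming the first and last 50 observations so both subsamples are large
gaps = [abs(fhat(x[:k]) - fhat(x[k:])) for k in range(50, n - 50)]
T = max(gaps)
k_hat = 50 + int(np.argmax(gaps))   # rough estimate of the break location
```

With a density break this pronounced, the scan statistic peaks near the true break point; in the actual test, the supremum is taken over the properly normalized process (1.48) at several points $x_{j}$ and compared against simulated Brownian-bridge quantiles.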
Queen s Economics Department Working Paper No. 1054 Inference via Kernel Smoothing of Bootstrap P Values Jeff Racine McMaster University James G. MacKinnon Queen s University Department of Economics Queen
More informationThree Papers by Peter Bickel on Nonparametric Curve Estimation
Three Papers by Peter Bickel on Nonparametric Curve Estimation Hans-Georg Müller 1 ABSTRACT The following is a brief review of three landmark papers of Peter Bickel on theoretical and methodological aspects
More informationSupplement to Quantile-Based Nonparametric Inference for First-Price Auctions
Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract
More informationDensity Estimation (II)
Density Estimation (II) Yesterday Overview & Issues Histogram Kernel estimators Ideogram Today Further development of optimization Estimating variance and bias Adaptive kernels Multivariate kernel estimation
More informationWhat s New in Econometrics. Lecture 13
What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments
More informationEco517 Fall 2004 C. Sims MIDTERM EXAM
Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering
More informationTesting Homogeneity Of A Large Data Set By Bootstrapping
Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp
More informationBayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples
Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,
More informationA Goodness-of-fit Test for Copulas
A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and
More informationG. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication
G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationChapter 2: Resampling Maarten Jansen
Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More information14.30 Introduction to Statistical Methods in Economics Spring 2009
MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationResearch Article A Nonparametric Two-Sample Wald Test of Equality of Variances
Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner
More informationOne-Sample Numerical Data
One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationMaximum-Likelihood Estimation: Basic Ideas
Sociology 740 John Fox Lecture Notes Maximum-Likelihood Estimation: Basic Ideas Copyright 2014 by John Fox Maximum-Likelihood Estimation: Basic Ideas 1 I The method of maximum likelihood provides estimators
More informationV. Properties of estimators {Parts C, D & E in this file}
A. Definitions & Desiderata. model. estimator V. Properties of estimators {Parts C, D & E in this file}. sampling errors and sampling distribution 4. unbiasedness 5. low sampling variance 6. low mean squared
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationBTRY 4090: Spring 2009 Theory of Statistics
BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationMeasure-Transformed Quasi Maximum Likelihood Estimation
Measure-Transformed Quasi Maximum Likelihood Estimation 1 Koby Todros and Alfred O. Hero Abstract In this paper, we consider the problem of estimating a deterministic vector parameter when the likelihood
More informationA nonparametric two-sample wald test of equality of variances
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David
More informationLocal Polynomial Regression
VI Local Polynomial Regression (1) Global polynomial regression We observe random pairs (X 1, Y 1 ),, (X n, Y n ) where (X 1, Y 1 ),, (X n, Y n ) iid (X, Y ). We want to estimate m(x) = E(Y X = x) based
More informationKullback-Leibler Designs
Kullback-Leibler Designs Astrid JOURDAN Jessica FRANCO Contents Contents Introduction Kullback-Leibler divergence Estimation by a Monte-Carlo method Design comparison Conclusion 2 Introduction Computer
More informationIntegrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University
Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y
More informationSTATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN
Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:
More informationPrevious lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)
Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value
More informationIEOR E4703: Monte-Carlo Simulation
IEOR E4703: Monte-Carlo Simulation Output Analysis for Monte-Carlo Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Output Analysis
More informationStatistical inference
Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationLecture 2: CDF and EDF
STAT 425: Introduction to Nonparametric Statistics Winter 2018 Instructor: Yen-Chi Chen Lecture 2: CDF and EDF 2.1 CDF: Cumulative Distribution Function For a random variable X, its CDF F () contains all
More informationTime Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY
Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference
More informationOn the Power of Tests for Regime Switching
On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating
More informationSINGLE-STEP ESTIMATION OF A PARTIALLY LINEAR MODEL
SINGLE-STEP ESTIMATION OF A PARTIALLY LINEAR MODEL DANIEL J. HENDERSON AND CHRISTOPHER F. PARMETER Abstract. In this paper we propose an asymptotically equivalent single-step alternative to the two-step
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationA COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES
A COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES MICHAEL O HARA AND CHRISTOPHER F. PARMETER Abstract. This paper presents a Monte Carlo comparison of
More information