Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation


Preface

Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric techniques that are widely applied in empirical research, with a focus on kernel and sieve estimation. The kernel and sieve methods form the backbone of nonparametrics despite their distinctive nature in terms of local versus global approximation. We occasionally touch upon alternative estimation methodology but refer the reader directly to the related books or articles.

The lecture note is comprised of three parts. In the first part, we provide a rigorous introduction to nonparametric methods. After we study nonparametric density estimation and testing, we move quickly to nonparametric regression analysis. Kernel estimation with mixed data will also be addressed. Then we study the sieve estimation of the conditional mean function. In the second part, we examine various semiparametric models, which bridge the gap between parametric and nonparametric models. Here the primary interest is typically to estimate a finite-dimensional parameter in the presence of one or several infinite-dimensional nuisance parameters (i.e., nonparametric components). We provide a unified framework to analyze the asymptotic properties of the semiparametric estimator of the finite-dimensional parameter and then study various semiparametric regression models in detail, including partially linear models, index models, and additive models. In the third part, we focus on various topics in nonparametric and semiparametric econometrics. We will mainly discuss nonparametric kernel and sieve estimation with endogenous regressors and study the estimation of various nonparametric and semiparametric panel data models.

In each chapter, we shall first introduce the theory for nonparametric or semiparametric estimation, followed by one or two applications in related areas. To help the reader grasp the material, we also include a small number of theoretical exercises together with some real data exercises.

1 Nonparametric Density Estimation and Testing

1.1 Introduction

In this chapter we describe the most important method of estimating density functions, namely kernel density estimation. As Pagan and Ullah (1999) remarked, there are three areas in which one needs to estimate densities. First, density estimates may be required to capture the stylized facts that need explanation and to judge how well a potential model is likely to fit the data. Second, in the case where we need a complete picture of the distribution of an estimator, we need density estimates to summarize the information. Third, some parametric estimators (e.g., quantile estimators) have asymptotic distributions that depend on a density evaluated at a specific point.

Let $X$ be a generic random variable or vector with cumulative distribution function (CDF) $F(\cdot)$ and probability density function (PDF) $f(\cdot)$. Let the observations $X_1, \dots, X_n$ be drawn from the unknown distribution function $F$. We are interested in estimating $f$ at a point $x$.

1.2 Univariate Density Estimation

Several estimators have been proposed to estimate the density function nonparametrically. These include the kernel density estimator by Rosenblatt (1956) and Parzen (1962), the nearest neighbor estimator

by Fix and Hodges (1951), the series estimator by Cencov (1962), the penalized likelihood estimator by Good and Gaskins (1971, 1980), and more recently the local likelihood estimator by Loader (1993). Pagan and Ullah (1999) discuss all of these estimators, among which the kernel density estimator is the best known; it is also better developed and more widely used than the others. Therefore, we focus on nonparametric kernel density estimation only.

1.2.1 Motivation for the Kernel Density Estimator

For simplicity, we first look at the issue of estimating the density $f(x)$ of a scalar continuously-valued random variable $X$ at a particular point $x$. To motivate the method, noticing that $f(x) = F'(x)$ a.e., one can obtain a simple estimator of $f$:

$$\hat f(x) = \frac{F_n(x+h) - F_n(x-h)}{2h}, \qquad (1.1)$$

where $h = h_n$ is a sequence of positive constants and $F_n(x)$ is the empirical distribution function of $X_1, \dots, X_n$:

$$F_n(x) = \frac1n\sum_{i=1}^n \mathbf 1(X_i \le x),$$

where $\mathbf 1(\cdot)$ is the indicator function: $\mathbf 1(A) = 1$ if $A$ holds and $0$ otherwise. Note that

$$2nh\,\hat f(x) = n\left[F_n(x+h) - F_n(x-h)\right] = \sum_{i=1}^n \mathbf 1(x-h < X_i \le x+h),$$

as a summation of independent Bernoulli random variables, has the binomial distribution Binomial$\big(n,\ F(x+h) - F(x-h)\big)$. It follows that

$$E[\hat f(x)] = \frac{F(x+h) - F(x-h)}{2h} \to f(x) \quad \text{if } h \to 0,$$

and

$$\operatorname{Var}[\hat f(x)] = \frac{\left[F(x+h) - F(x-h)\right]\left\{1 - \left[F(x+h) - F(x-h)\right]\right\}}{4nh^2} \to 0 \quad \text{if } h \to 0 \text{ and } nh \to \infty.$$

Thus, to guarantee good behavior of $\hat f(x)$, we should choose $h$ such that $h \to 0$ and $nh \to \infty$ as $n \to \infty$. One can also calculate the MSE of $\hat f(x)$ and establish asymptotic normality for it. Rewrite $\hat f(x)$ as

$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n \frac12\,\mathbf 1\!\left(\left|\frac{X_i - x}{h}\right| \le 1\right).$$

One can propose a general class of kernel density estimators of the form

$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n k\!\left(\frac{X_i - x}{h}\right) \qquad (1.2)$$

$$= \frac1n\sum_{i=1}^n k_h(X_i - x), \qquad (1.3)$$

where we refer to $k(\cdot)$ as a kernel function on $\mathbb R$ and to $h$ as a smoothing parameter (or alternatively, a bandwidth), and $k_h(u) = h^{-1}k(u/h)$.
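As a concrete illustration, the following is a minimal sketch of the estimator in (1.2), assuming a Gaussian kernel; the simulated sample, the grid, and the bandwidth value are hypothetical placeholders.

```python
# A sketch of (1.2) with a Gaussian kernel; data, grid and h are hypothetical.
import numpy as np

def kde(x, data, h):
    """Evaluate fhat(x) = (nh)^{-1} sum_i k((X_i - x)/h) on an array of points x."""
    u = (data[None, :] - np.asarray(x)[:, None]) / h    # (m, n) scaled distances
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)        # Gaussian kernel weights
    return k.mean(axis=1) / h                           # average over i, rescale by h

rng = np.random.default_rng(0)
data = rng.normal(size=500)                             # hypothetical sample
grid = np.linspace(-4.0, 4.0, 201)
fhat = kde(grid, data, h=0.3)                           # density estimate on the grid
```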

The estimator in (1.1) is called a uniform kernel estimator because its kernel function corresponds to a uniform pdf on $[-1, 1]$. It is sometimes referred to as a naïve kernel estimator and was first introduced in Fix and Hodges (1951). In practice there is a variety of kernel functions that might be chosen, among which three are the most popular choices:

(1) the Gaussian kernel

$$k(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right); \qquad (1.4)$$

(2) the Epanechnikov kernel

$$k(u) = \frac34\left(1 - u^2\right)\mathbf 1(|u| \le 1); \qquad (1.5)$$

(3) the Biweight or Quartic kernel

$$k(u) = \frac{15}{16}\left(1 - u^2\right)^2\mathbf 1(|u| \le 1). \qquad (1.6)$$

Another three less frequent choices of kernels are:

(4) the Uniform kernel

$$k(u) = \frac12\,\mathbf 1(|u| \le 1); \qquad (1.7)$$

(5) the Triangular kernel

$$k(u) = (1 - |u|)\,\mathbf 1(|u| \le 1); \qquad (1.8)$$

(6) the Triweight kernel

$$k(u) = \frac{35}{32}\left(1 - u^2\right)^3\mathbf 1(|u| \le 1). \qquad (1.9)$$

It turns out that the choice among these kernels rarely makes a significant difference in the estimates. The kernel function is used to smooth the data, whereas the amount of smoothing is controlled by the bandwidth $h > 0$. Intuitively, $\hat f(x)$ is an average of a set of weights. If a large number of the observations $X_i$ are near $x$, then the weights are relatively large and $\hat f(x)$ is large. Conversely, if only a few $X_i$ are close to $x$, then the weights are small and $\hat f(x)$ is small. The bandwidth $h$ controls the degree of closeness.

1.2.2 Asymptotic Properties of the Kernel Density Estimator

The properties of $\hat f(x)$ are well studied in the literature. To summarize some important properties of $\hat f(x)$, we make the following assumptions.

A1. The observations $X_1, \dots, X_n$ are IID with density $f$.

A2. The second-order derivatives of $f$ are continuous and bounded in a neighborhood of $x$.

A3. The kernel $k$ is a symmetric PDF around zero satisfying (i) $\int k(u)\,du = 1$; (ii) $\int u^2k(u)\,du = \kappa_2 \in (0, \infty)$; (iii) $\int k(u)^2\,du < \infty$.

A4. As $n \to \infty$, $h = h_n \to 0$ and $nh \to \infty$.

A kernel $k$ is a second-order kernel if $\int k(u)\,du = 1$, $\int u\,k(u)\,du = 0$, and $\int u^2k(u)\,du < \infty$, so Assumption A3 implies that $k$ is a second-order kernel.
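For reference, the six kernels in (1.4)-(1.9) translate directly into code; the sketch below also checks numerically that each integrates to one, which may be useful when experimenting.

```python
# Sketches of the kernels in (1.4)-(1.9), plus a numerical check that each
# integrates to one; the integration grid is wide enough for the Gaussian.
import numpy as np

def gaussian(u):     return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
def epanechnikov(u): return 0.75 * (1 - u**2) * (np.abs(u) <= 1)
def biweight(u):     return (15 / 16) * (1 - u**2) ** 2 * (np.abs(u) <= 1)
def uniform(u):      return 0.5 * (np.abs(u) <= 1)
def triangular(u):   return (1 - np.abs(u)) * (np.abs(u) <= 1)
def triweight(u):    return (35 / 32) * (1 - u**2) ** 3 * (np.abs(u) <= 1)

u = np.linspace(-6.0, 6.0, 12001)
for k in (gaussian, epanechnikov, biweight, uniform, triangular, triweight):
    print(k.__name__, np.trapz(k(u), u))                # each prints ~ 1.0
```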

Under Assumptions A1-A4, $\hat f$ has the following properties.

1. $\hat f(x)$ is a valid density. That is, $\hat f(x) \ge 0$ for all $x$, and it integrates to one:

$$\int \hat f(x)\,dx = \frac{1}{nh}\sum_{i=1}^n\int k\!\left(\frac{X_i - x}{h}\right)dx = \frac1n\sum_{i=1}^n\int k(u)\,du = 1,$$

where the second equality applies the Fubini theorem and the change of variables $u = (X_i - x)/h$.

2. The moments of the density $\hat f(x)$ can be calculated easily. The mean is

$$\int x\,\hat f(x)\,dx = \frac1n\sum_{i=1}^n\int x\,k_h(X_i - x)\,dx = \frac1n\sum_{i=1}^n\int (X_i + hu)\,k(u)\,du = \frac1n\sum_{i=1}^n X_i = \bar X,$$

where the last equality follows from the facts $\int k(u)\,du = 1$ and $\int u\,k(u)\,du = 0$. The second moment is

$$\int x^2\,\hat f(x)\,dx = \frac1n\sum_{i=1}^n\int (X_i + hu)^2\,k(u)\,du = \frac1n\sum_{i=1}^n X_i^2 + h^2\kappa_2,$$

where $\kappa_2 = \int u^2k(u)\,du$ is the variance of the kernel. Consequently, $\int x^2\,\hat f(x)\,dx \xrightarrow{p} E[X_i^2]$ if $h \to 0$ as $n \to \infty$.

3. MSE of $\hat f(x)$. Suppose that $f(x)$ is second-order continuously differentiable. Then

$$E[\hat f(x)] = E[k_h(X_i - x)] = \int k_h(t - x)\,f(t)\,dt = \int k(u)\,f(x + hu)\,du$$

$$= \int k(u)\left[f(x) + hu\,f'(x) + \frac{h^2u^2}{2}f^{(2)}(x) + o(h^2)\right]du = f(x) + \frac{h^2}{2}\kappa_2\,f^{(2)}(x) + o(h^2).$$

This leads to the bias expression

$$\operatorname{Bias}\left[\hat f(x)\right] = \frac{h^2}{2}\kappa_2\,f^{(2)}(x) + o(h^2). \qquad (1.10)$$

For the variance, we have

$$\operatorname{Var}\left[\hat f(x)\right] = \frac1n\operatorname{Var}[k_h(X_i - x)] = \frac1n E\left[k_h(X_i - x)^2\right] - \frac1n\left\{E[k_h(X_i - x)]\right\}^2$$

$$= \frac{1}{nh}\int k(u)^2\,f(x + hu)\,du - \frac1n\left[f(x) + O(h^2)\right]^2 = \frac{\kappa_{02}\,f(x)}{nh} + O\!\left(\frac1n\right), \qquad (1.11)$$

where $\kappa_{02} = \int k(u)^2\,du$. Since the variance is of order $O((nh)^{-1})$, Assumption A4 ensures that $\operatorname{Var}[\hat f(x)]$ converges to zero. Adding (1.11) to the square of (1.10), we obtain

$$\operatorname{MSE}\left[\hat f(x)\right] = \frac{h^4}{4}\kappa_2^2\left[f^{(2)}(x)\right]^2 + \frac{\kappa_{02}\,f(x)}{nh} + o\!\left(h^4 + \frac{1}{nh}\right).$$

Integrating this expression, we obtain the mean integrated squared error (MISE):

$$\operatorname{MISE}(h) = \operatorname{AMISE}(h) + o\!\left(h^4 + \frac{1}{nh}\right), \qquad (1.12)$$

where

$$\operatorname{AMISE}(h) = \frac{h^4}{4}\kappa_2^2\,R\!\left(f^{(2)}\right) + \frac{\kappa_{02}}{nh} \qquad (1.13)$$

and $R(f^{(2)}) \equiv \int\left[f^{(2)}(x)\right]^2dx$ is a measure of the total curvature of $f$. We call AMISE the asymptotic MISE, since it provides a useful large-sample approximation to the MISE. Both MISE and AMISE are global measures of precision for the estimation of $f(\cdot)$.

Notice that the integrated squared bias is asymptotically proportional to $h^4$, so for this quantity to decrease we need to take $h$ small. However, since the leading term of the integrated variance is proportional to $(nh)^{-1}$, we need to choose $h$ large to reduce the variance. Therefore, as $n$ increases, $h$ should vary in such a way that each of the components of the MISE or AMISE becomes smaller. This is known as the variance-bias trade-off; it is a mathematical quantification of the critical role of the bandwidth.

Example 1 (Density estimate of annual salary income). In Figure 1 we use the Panel Study of Income Dynamics (PSID) dataset to estimate the density of annual salary income in the USA. We have $n = 445$ observations on annual salary income for the year 2003. We choose the standard normal kernel, and the bandwidth is chosen according to $h = c\,s\,n^{-1/5}$ for $c = 0.25$, $1.06$, and $4$, where $s$ is the sample standard deviation. For comparison, we also include a plot of the density estimate using the least squares cross-validated (LSCV, see below) bandwidth. Figure 1 illustrates the effect of choosing the bandwidth. For the small value $c = 0.25$, the estimated density function is very spiky and hence very variable, in the sense that over repeated sampling from the true density the spikes would appear in different places; there is, however, very little bias. When $c$ is increased, the variability is reduced at the expense of introducing bias. When the over-smoothing bandwidth is applied ($c = 4$), the estimated density function is too flat, indicating significant bias. For the given dataset, we observe that the density estimate using the ROT bandwidth ($c = 1.06$, to be introduced below) almost coincides with that using the LSCV bandwidth.

[Figure 1: Density estimates based on the PSID annual salary income data in 2003. Kernel: standard normal; bandwidth $h = c\,s\,n^{-1/5}$ for $c = 0.25$, $1.06$, and $4$. The least squares cross-validated (LSCV) bandwidth is also used.]
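The variance-bias trade-off just described is easy to see by simulation. The following hedged sketch approximates the MISE of the Gaussian-kernel estimator of a standard normal density by Monte Carlo over a few hypothetical bandwidths; the sample size and replication count are placeholders.

```python
# A Monte Carlo sketch of the variance-bias trade-off: approximate MISE(h) for
# the Gaussian-kernel estimator of a standard normal density; all settings
# (n, bandwidths, replications) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(-4.0, 4.0, 401)
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)

def ise(data, h):
    u = (data[None, :] - grid[:, None]) / h
    fhat = (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).mean(axis=1) / h
    return np.trapz((fhat - f_true) ** 2, grid)         # integrated squared error

n, reps = 200, 200
for h in (0.05, 0.2, 0.5, 1.5):
    mise = np.mean([ise(rng.normal(size=n), h) for _ in range(reps)])
    print(f"h = {h:4.2f}   MISE ~ {mise:.4f}")          # U-shaped in h
```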

Nevertheless, the AMISE is a much simpler expression to comprehend than the expression for the MISE given by (1.12). The main advantage of the AMISE is that we can define the optimal bandwidth with respect to this criterion, and it has a closed-form expression. To be specific, we define the asymptotically optimal bandwidth $h_0$ as the value that minimizes $\operatorname{AMISE}(h)$:

$$h_0 = \arg\min_{h>0}\operatorname{AMISE}(h).$$

The solution can be found by solving the first-order condition, yielding

$$h_0 = \left[\frac{\kappa_{02}}{\kappa_2^2\,R\!\left(f^{(2)}\right)}\right]^{1/5} n^{-1/5}. \qquad (1.14)$$

Aside from its dependence on the known constants $\kappa_2$ and $\kappa_{02}$, this expression shows that $h_0$ is inversely proportional to $R(f^{(2)})^{1/5}$. Thus for a density with little curvature, $R(f^{(2)})$ will be small and a large bandwidth is called for; on the other hand, when $R(f^{(2)})$ is large, little smoothing will be desired. Unfortunately, direct use of (1.14) to choose a good bandwidth in practice is not possible, since $R(f^{(2)})$ is unknown. However, there now exist several rules for choosing $h$ which may or may not be based on estimating $R(f^{(2)})$.

Let $s$ denote the sample standard deviation. For the Gaussian, Epanechnikov, and Biweight kernels discussed above, one can choose $h_0 = 1.06\,s\,n^{-1/5}$, $2.34\,s\,n^{-1/5}$, and $2.78\,s\,n^{-1/5}$, respectively. Such choices of bandwidth are called bandwidths selected by Silverman's rule of thumb (ROT) in the literature.

Substituting (1.14) into (1.13) leads to

$$\inf_{h>0}\operatorname{AMISE}(h) = \frac54\left\{\kappa_2^2\,\kappa_{02}^4\,R\!\left(f^{(2)}\right)\right\}^{1/5} n^{-4/5}, \qquad (1.15)$$

which is the smallest possible AMISE for estimation of $f$ using the kernel $k$. We can rewrite the information conveyed by (1.14) and (1.15) in terms of the MISE itself using asymptotic notation. Let $h^*$ be the minimizer of $\operatorname{MISE}(h)$ defined in (1.12). Then

$$h^* \sim \left[\frac{\kappa_{02}}{\kappa_2^2\,R\!\left(f^{(2)}\right)}\right]^{1/5} n^{-1/5} \quad\text{and}\quad \inf_{h>0}\operatorname{MISE}(h) \sim \frac54\left\{\kappa_2^2\,\kappa_{02}^4\,R\!\left(f^{(2)}\right)\right\}^{1/5} n^{-4/5}.$$

These expressions give the rates of convergence to zero, as $n \to \infty$, of the MISE-optimal bandwidth and the minimum MISE, respectively. Under the stated assumptions, the best obtainable rate of convergence of the MISE of the kernel density estimator is of order $n^{-4/5}$, which is slower than the typical parametric rate of order $n^{-1}$ for MSE convergence.
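A minimal sketch of Silverman's rule-of-thumb choice with the kernel-specific constants quoted above; the `data` argument is a hypothetical univariate sample.

```python
# A sketch of Silverman's rule of thumb; `data` is a hypothetical sample.
import numpy as np

def rot_bandwidth(data, kernel="gaussian"):
    const = {"gaussian": 1.06, "epanechnikov": 2.34, "biweight": 2.78}[kernel]
    return const * np.std(data, ddof=1) * len(data) ** (-1 / 5)
```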

1.2.3 Univariate Bandwidth Selection

There are several ways to choose the bandwidth in practice. We introduce only the most widely used methods.

Rule of thumb and plug-in method. Equation (1.14) shows that the optimal smoothing parameter depends on the integrated second derivative of the unknown density. In practice, we may choose an initial pilot value of $h$ (say the ROT one) to estimate $R(f^{(2)}) \equiv \int\left[f^{(2)}(x)\right]^2dx$ nonparametrically, and then use this value to obtain an estimate of $h_0$. Such a method is referred to as a plug-in method in the literature. It is worth mentioning that the estimation of $R(f^{(2)})$ requires estimation of the second-order derivative of $f$, which is introduced in the next subsection.

A popular way of choosing the initial value of $h$ is to assume that $f(x)$ belongs to a certain parametric family of distributions, and then estimate the optimal bandwidth using (1.14). For example, if we assume $f(x)$ is a normal density function with variance $\sigma^2$, then $R(f^{(2)}) = \frac{3}{8\sqrt\pi}\sigma^{-5}$, and for the Gaussian kernel (1.14) gives the pilot bandwidth $h = (4\pi)^{-1/10}\left[\frac{3}{8\sqrt\pi}\right]^{-1/5}\sigma\,n^{-1/5} \approx 1.06\,\sigma\,n^{-1/5}$, which is plugged into $\int\big[\hat f^{(2)}(x)\big]^2dx$; the latter is then used to obtain the estimate of the optimal bandwidth. In practice we replace $\sigma$ by the sample standard deviation of $\{X_i\}$, whereas Silverman (1986) advocates a robust measure of spread which replaces $\sigma$ by the adaptive measure $\min(\text{standard deviation},\ \text{interquartile range}/1.34)$.

Least squares cross-validation. Another popular method for choosing the bandwidth parameter is least squares cross-validation, which is fully automatic and data-driven. The method is based on the principle that the bandwidth should be chosen to minimize the integrated squared error of the resulting estimate. The integrated squared difference between $\hat f$ and $f$ is

$$\int\left[\hat f(x) - f(x)\right]^2dx = \int\hat f(x)^2\,dx - 2\int\hat f(x)\,f(x)\,dx + \int f(x)^2\,dx. \qquad (1.16)$$

Note that the last term does not depend on the bandwidth, and the cross term can be written as $\int\hat f(x)\,f(x)\,dx = E_X[\hat f(X)]$, where $E_X(\cdot)$ denotes expectation with respect to $X$ and not with respect to the random observations $\{X_i\}$ used in defining $\hat f(\cdot)$. Therefore $\int\hat f(x)\,f(x)\,dx$ can be estimated by

$$\frac1n\sum_{i=1}^n\hat f_{-i}(X_i) = \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j\ne i}k_h(X_j - X_i),$$

where

$$\hat f_{-i}(X_i) = \frac{1}{n-1}\sum_{j\ne i}k_h(X_j - X_i)$$

is the leave-one-out kernel estimator of $f(X_i)$. We ask the reader to verify, as an exercise, that we cannot use the usual kernel density estimate $\hat f(X_i)$ in obtaining this term. For the first term in (1.16) we have

$$\int\hat f(x)^2\,dx = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n\int k_h(X_i - x)\,k_h(X_j - x)\,dx = \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n\bar k\!\left(\frac{X_i - X_j}{h}\right),$$

where $\bar k(v) = \int k(u)\,k(v - u)\,du$ is the convolution kernel derived from $k(\cdot)$. Given the exact form of $k(\cdot)$, we may obtain an analytic expression for $\bar k(\cdot)$. For example, if $k(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right)$, then $\bar k(v) = \frac{1}{2\sqrt\pi}\exp\!\left(-\frac{v^2}{4}\right)$, a normal density with zero mean and variance 2. This follows from the fact that two independent $N(0,1)$ random variables sum to an $N(0,2)$ random variable. So we choose $h$ to minimize

$$\operatorname{CV}(h) = \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n\bar k\!\left(\frac{X_i - X_j}{h}\right) - \frac{2}{n(n-1)h}\sum_{i=1}^n\sum_{j\ne i}k\!\left(\frac{X_i - X_j}{h}\right). \qquad (1.17)$$

Let $\hat h$ denote the solution to the above cross-validation problem. Härdle et al. (1988) show that

$$\frac{\hat h - h_0}{h_0} \xrightarrow{p} 0.$$

That is, $\hat h/h_0$ converges to 1, as we would expect.

Likelihood cross-validation. Another data-driven method for choosing the bandwidth is likelihood cross-validation. This approach yields a density estimate with an entropy interpretation: the bandwidth is chosen to minimize the Kullback-Leibler distance between the estimated density and the true one. In general, the Kullback-Leibler distance between two density functions $f$ and $g$ is given by

$$D_{KL}(f, g) = \int f(x)\ln\!\left(\frac{f(x)}{g(x)}\right)dx = \int f(x)\ln(f(x))\,dx - \int f(x)\ln(g(x))\,dx.$$

Here, taking $g = \hat f$, we have

$$D_{KL}(f, \hat f) = \int f(x)\ln(f(x))\,dx - \int f(x)\ln\big(\hat f(x)\big)\,dx.$$

The first term on the right-hand side does not depend on the bandwidth, and the second term is $E_X\big[\ln\hat f(X)\big]$, which can be estimated by $\frac1n\sum_{i=1}^n\ln\hat f_{-i}(X_i)$. Therefore, minimizing $D_{KL}(f, \hat f)$ is equivalent to maximizing the log-likelihood, and we choose $h$ to maximize

$$\mathcal L(h) = \sum_{i=1}^n\ln\hat f_{-i}(X_i).$$

The main problem with likelihood cross-validation is that it is severely affected by the tail behavior of $f(x)$ and works poorly when the true distribution has fat tails.
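A hedged implementation sketch of the criterion (1.17) for the Gaussian kernel, whose convolution kernel is the $N(0,2)$ density; the candidate bandwidth grid is a hypothetical choice, and in practice one would use a numerical optimizer.

```python
# A sketch of the LSCV criterion (1.17) for the Gaussian kernel.
import numpy as np

def lscv(h, data):
    n = len(data)
    d = (data[:, None] - data[None, :]) / h
    kbar = np.exp(-0.25 * d**2) / (2 * np.sqrt(np.pi))  # convolution kernel, N(0, 2)
    k = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)        # Gaussian kernel
    np.fill_diagonal(k, 0.0)                            # leave-one-out: drop i = j
    return kbar.sum() / (n**2 * h) - 2 * k.sum() / (n * (n - 1) * h)

rng = np.random.default_rng(1)
data = rng.normal(size=200)                             # hypothetical sample
hs = np.linspace(0.05, 1.0, 60)
h_cv = hs[np.argmin([lscv(h, data) for h in hs])]       # LSCV bandwidth
```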

1.2.4 Univariate Density Derivative Estimation

Estimation of density derivatives may be needed in several cases. First and second derivatives may be of intrinsic interest as measures of slope and curvature. Other important functions, such as the score function $-f'/f$, depend on density derivatives. Automatic plug-in bandwidth selection methods require estimation of quantities involving density derivatives too.

When the density function $f(x)$ is $r$th-order differentiable at $x$, a natural estimator of its $r$th derivative $f^{(r)}(x)$ is

$$\hat f^{(r)}(x) = \frac{1}{nh^{1+r}}\sum_{i=1}^n k^{(r)}\!\left(\frac{x - X_i}{h}\right), \qquad (1.18)$$

which is the $r$th derivative of $\hat f(x)$. The mean squared error properties of $\hat f^{(r)}(x)$ can be derived straightforwardly to obtain

$$\operatorname{MSE}\left[\hat f^{(r)}(x)\right] = \frac{f(x)}{nh^{2r+1}}\,R\!\left(k^{(r)}\right) + \frac{h^4}{4}\kappa_2^2\left[f^{(r+2)}(x)\right]^2 + o\!\left(\frac{1}{nh^{2r+1}} + h^4\right), \qquad (1.19)$$

where $R(k^{(r)}) = \int\left[k^{(r)}(u)\right]^2du$ and $\kappa_2 = \int u^2k(u)\,du$. It follows that the MSE-optimal bandwidth for estimating $f^{(r)}(x)$ is of order $n^{-1/(2r+5)}$. Therefore, estimation of $f'(x)$ requires a bandwidth of order $n^{-1/7}$, compared with the optimal bandwidth rate $n^{-1/5}$ for the estimation of $f(x)$ itself. Moreover, the optimal MSE, or its integrated version, is of order $n^{-4/(2r+5)}$. This rate becomes slower for higher values of $r$, which reflects the increasing difficulty inherent in the problem of estimating higher-order derivatives.
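A minimal sketch of (1.18) for $r = 1$ with the Gaussian kernel, for which $k'(u) = -u\,k(u)$; the evaluation points, sample, and bandwidth are hypothetical, and note the $h^{1+r}$ scaling.

```python
# A sketch of the density derivative estimator (1.18) for r = 1.
import numpy as np

def kde_deriv1(x, data, h):
    u = (np.asarray(x)[:, None] - data[None, :]) / h
    kprime = -u * np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # k'(u) for Gaussian k
    return kprime.mean(axis=1) / h**2                       # h^{1+r} scaling, r = 1
```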

1.2.5 Univariate Cumulative Distribution Function Estimation

We can estimate the cumulative distribution function (CDF) $F(x)$ by the empirical distribution function (EDF) $F_n(x)$. Nevertheless, $F_n(x)$ is not smooth: it jumps by $1/n$ at each sample realization point. We can obtain a smoothed kernel estimate of $F(x)$ by integrating $\hat f$:

$$\hat F(x) = \int_{-\infty}^x\hat f(t)\,dt = \frac1n\sum_{i=1}^n\int_{-\infty}^x k_h(t - X_i)\,dt = \frac1n\sum_{i=1}^n G\!\left(\frac{x - X_i}{h}\right),$$

where $G(v) = \int_{-\infty}^v k(u)\,du$. To calculate the MSE of $\hat F(x)$, we first calculate $E[\hat F(x)]$ and $\operatorname{Var}[\hat F(x)]$. A change of variables and a fourth-order Taylor expansion give

$$E[\hat F(x)] = E\left[G\!\left(\frac{x - X_i}{h}\right)\right] = \int F(x - hv)\,k(v)\,dv = F(x) + \frac{h^2}{2}\kappa_2\,F^{(2)}(x) + \frac{h^4}{4!}\kappa_4\,F^{(4)}(x) + o(h^4),$$

where $\kappa_j = \int v^jk(v)\,dv$ ($j = 2, 4$) and $F^{(j)}(\cdot)$ is the $j$th-order derivative of $F$. Similarly, we can show that

$$\operatorname{Var}[\hat F(x)] = \frac1n\operatorname{Var}\left[G\!\left(\frac{x - X_i}{h}\right)\right] = \frac1n\left(E\left[G\!\left(\frac{x - X_i}{h}\right)^2\right] - \left\{E\left[G\!\left(\frac{x - X_i}{h}\right)\right]\right\}^2\right)$$

$$= \frac1n F(x)\left[1 - F(x)\right] - \frac hn\theta_0\,f(x) + o\!\left(\frac hn\right),$$

where $\theta_0 = 2\int v\,k(v)\,G(v)\,dv$. Consequently,

$$\operatorname{MSE}\left[\hat F(x)\right] = \operatorname{Var}\left[\hat F(x)\right] + \left\{\operatorname{Bias}\left[\hat F(x)\right]\right\}^2 = \frac1nF(x)[1 - F(x)] - \frac hn\theta_0\,f(x) + h^4\,b_2(x)^2 + o\!\left(\frac hn + h^4\right),$$

where $b_2(x) = \frac{\kappa_2}{2}F^{(2)}(x)$. To choose the bandwidth, we can minimize the mean integrated squared error of $\hat F$. It turns out that the optimal bandwidth in this case is

$$h_0 = \left[\frac{\theta_0}{4\int b_2(x)^2\,dx}\right]^{1/3} n^{-1/3}. \qquad (1.20)$$

Hence the optimal bandwidth for estimating a univariate CDF has a faster rate of convergence to zero than the optimal bandwidth for estimating a univariate PDF. Bowman et al. (1998) suggest choosing $h$ to minimize the following cross-validation function:

$$\operatorname{CV}(h) = \frac1n\sum_{i=1}^n\int\left[\mathbf 1(X_i \le x) - \hat F_{-i}(x)\right]^2dx, \qquad (1.21)$$

where $\hat F_{-i}(x) = \frac{1}{n-1}\sum_{j\ne i}G\big((x - X_j)/h\big)$ is the leave-one-out estimator of $F(x)$. Let $\hat h$ denote the value of $h$ minimizing $\operatorname{CV}(h)$; it can be shown that $\hat h/h_0 \xrightarrow{p} 1$. Using $h = \hat h$, we can easily derive the asymptotic distribution of $\hat F(x)$:

$$\sqrt n\left[\hat F(x) - F(x)\right] \xrightarrow{d} N\big(0,\ F(x)[1 - F(x)]\big). \qquad (1.22)$$

So the asymptotic behavior of $\hat F(x)$ is the same as that of the empirical distribution function $F_n(x)$; in particular, it converges to its probability limit $F(x)$ at the parametric rate $\sqrt n$.

Example 2 (Cumulative distribution estimates of annual salary income). Using the same data as in Example 1, we now estimate the cumulative distribution function $F(\cdot)$ of annual salary income in the USA. We choose the standard normal kernel. We could choose the bandwidth by minimizing the cross-validation criterion function in (1.21). For simplicity, we instead choose the bandwidth by using the LSCV criterion function for estimating the density function (see (1.17)) and then adjust it to have the optimal rate for estimating the CDF: we set $h = \hat h_{LSCV}^{5/3}$, so that, with $\hat h_{LSCV} \asymp n^{-1/5}$, $h$ is of order $n^{-1/3}$, where $\hat h_{LSCV}$ is the least squares cross-validated bandwidth for estimating the pdf $f$. Figure 2 plots the smoothed kernel estimate $\hat F(x)$ of $F(x)$ together with the EDF estimate $F_n(x)$. The two curves almost coincide, except that the EDF curve is wiggly even for a dataset with a large sample size ($n = 445$).
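A minimal sketch of the smoothed CDF estimator $\hat F(x)$, assuming a Gaussian kernel so that $G$ is the standard normal CDF (available as scipy.special.ndtr); the inputs are hypothetical.

```python
# A sketch of the smoothed CDF estimator with a Gaussian kernel.
import numpy as np
from scipy.special import ndtr                          # standard normal CDF

def smoothed_cdf(x, data, h):
    v = (np.asarray(x)[:, None] - data[None, :]) / h
    return ndtr(v).mean(axis=1)                         # n^{-1} sum_i G((x - X_i)/h)
```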

[Figure 2: Smoothed kernel estimate and empirical distribution estimate for the CDF of annual salary income. The LSCV bandwidth is used in the kernel estimate; see the text for details.]

1.2.6 Higher Order Kernels and Bias Reduction

Recall that we have used so-called second-order kernels in the previous sections, for which the best obtainable rate of convergence of the MISE of the univariate kernel density estimate is of order $n^{-4/5}$. In this subsection, we demonstrate that it is possible to obtain better rates of convergence by relaxing the restriction that the kernel be a probability density function. To this end, higher-order kernels are needed, because they can help reduce the bias in the kernel density estimates. Recall the result for the bias of $\hat f(x)$ in (1.10):

$$\operatorname{Bias}\left[\hat f(x)\right] = \frac{h^2}{2}\kappa_2\,f^{(2)}(x) + o(h^2).$$

Here the kernel is constrained to be a probability density function, so it is necessary that $\kappa_2 = \int u^2k(u)\,du > 0$. Without this restriction, however, it is possible to construct $k$ such that $\kappa_2 = 0$, which has the effect of reducing the bias to order $h^4$, provided that $f$ has a continuous, square-integrable fourth derivative. It is easy to check that the MSE and MISE then have the optimal rate of convergence of order $n^{-8/9}$ if such a kernel is used.

In general, the order $\nu$ ($\nu > 0$) of a kernel $k(\cdot)$ is defined as the order of the first non-zero moment of the kernel. A general $\nu$th-order kernel must satisfy the following conditions:

(i) $\int k(u)\,du = 1$; (ii) $\int u^jk(u)\,du = 0$ for $j = 1, \dots, \nu - 1$; (iii) $\int u^\nu k(u)\,du \ne 0$.

For example, the standard normal (Gaussian) kernel $\phi(u) = (2\pi)^{-1/2}\exp(-u^2/2)$ is a second-order kernel. If we use a $\nu$th-order kernel in estimating the multivariate density, we can show that

$$\operatorname{Bias}\left[\hat f(x)\right] = O\!\left(\sum_{s=1}^d h_s^\nu\right), \qquad (1.23)$$

$$\operatorname{Var}\left[\hat f(x)\right] = O\!\left(\frac{1}{nh_1\cdots h_d}\right). \qquad (1.24)$$

Consequently,

$$\operatorname{MSE}\left[\hat f(x)\right] = O\!\left(\sum_{s=1}^d h_s^{2\nu} + \frac{1}{nh_1\cdots h_d}\right),$$

and, with all bandwidths of the optimal order $n^{-1/(2\nu + d)}$, $\operatorname{MSE}[\hat f(x)] = O\big(n^{-2\nu/(2\nu+d)}\big)$.

It is simple to construct symmetric higher-order kernel functions. In order to construct a $\nu$th-order kernel ($\nu \ge 2$ even), one can begin with a second-order kernel such as the standard normal (Gaussian) kernel, set up a polynomial in its argument, and solve for the coefficients of the polynomial subject to the desired moment restrictions. For example, with $\phi(u) = (2\pi)^{-1/2}\exp(-u^2/2)$, we could begin with

$$k_\nu(u) = \left(\sum_{j=0}^{\nu/2-1}a_j\,u^{2j}\right)\phi(u), \qquad (1.25)$$

where the $a_j$ ($j = 0, \dots, \nu/2 - 1$) are constants that must satisfy the requirements of a symmetric $\nu$th-order kernel. One can verify that the 4th, 6th, and 8th-order Gaussian kernels are given respectively by

$$k_4(u) = \frac12\left(3 - u^2\right)\phi(u),$$

$$k_6(u) = \frac18\left(15 - 10u^2 + u^4\right)\phi(u),$$

$$k_8(u) = \frac{1}{48}\left(105 - 105u^2 + 21u^4 - u^6\right)\phi(u).$$

The general formula for the kernels that are higher-order extensions of the second-order normal kernel is

$$k_\nu(u) = \sum_{j=0}^{\nu/2-1}\frac{(-1)^j}{2^j\,j!}\,\phi^{(2j)}(u), \qquad \nu = 2, 4, \dots;$$

see Wand and Schucany (1990) or Wand and Jones (1995, p. 34). If we start instead with the second-order Epanechnikov kernel (standardized to have unit variance),

$$k_2(u) = \frac{3}{4\sqrt5}\left(1 - \frac{u^2}{5}\right)\mathbf 1\!\left(|u| \le \sqrt5\right),$$

the 4th, 6th, and 8th-order Epanechnikov kernels are constructed in the same way, multiplying $k_2(u)$ by even polynomials in $u$ whose coefficients are chosen to satisfy the corresponding moment conditions.
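As a quick check of the moment conditions, the following sketch evaluates the fourth-order Gaussian kernel $k_4$ above and verifies numerically that $\int k_4 = 1$, $\int u^2k_4 = 0$, and $\int u^4k_4 \ne 0$.

```python
# A numerical check of k4(u) = (3 - u^2) phi(u) / 2: unit integral, vanishing
# second moment, non-zero fourth moment.
import numpy as np

phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
k4 = lambda u: 0.5 * (3 - u**2) * phi(u)

u = np.linspace(-10.0, 10.0, 100001)
for p in (0, 2, 4):
    print(p, np.trapz(u**p * k4(u), u))                 # ~ 1, ~ 0, ~ -3
```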

[Figure 3: Gaussian kernels of orders 2, 4, 6, and 8.]

[Figure 4: Epanechnikov kernels of orders 2, 4, 6, and 8.]

Obviously, when $\nu > 2$ the kernels can take negative values on their support. This is a drawback of higher-order kernels when used in density estimation, because they may assign negative weights in the density estimate; in some cases we may obtain a negative density estimate.

Example 3 (Higher-order kernels). Figures 3 and 4 plot the Gaussian and Epanechnikov kernels of various orders. From the two figures, we clearly see that kernels of order higher than 2 can indeed take negative values.

There are other rules for constructing higher-order kernels (e.g., Jones and Foster, 1993). Let $k_\nu$ denote a $\nu$th-order kernel. Then one can use the following recursive formula to generate higher-order kernels:

$$k_{\nu+2}(u) = \frac32k_\nu(u) + \frac12u\,k_\nu'(u), \qquad (1.26)$$

where $k_\nu(\cdot)$ is assumed to be differentiable. For example, application of (1.26) to the second-order Gaussian kernel $k_2(u) = \phi(u)$ leads directly to the fourth-order kernel given above.

1.3 Multivariate Density Estimation

We now investigate the extension of the univariate kernel density estimator to the multivariate setting. As Wand and Jones (1995) remark, the need for nonparametric density estimates for recovering structure in multivariate data is greater, since parametric modelling is more difficult than in the univariate case.

The extension of the univariate kernel methodology is not without its problems. First, the most general smoothing parametrization of the kernel density estimator in higher dimensions requires the specification of many more bandwidth parameters than in the univariate setting; this leads us to consider simpler smoothing parametrizations as well. Second, the sparseness of data in higher-dimensional spaces makes kernel smoothing more difficult unless the sample size is very large. This phenomenon is usually referred to as the curse of dimensionality in the nonparametric literature. It means that, with practical sample sizes, reasonable nonparametric density estimation is very difficult in more than about five dimensions (see the exercises in this chapter). Nevertheless, there have been many studies in which the kernel density estimator has been demonstrated to be an effective tool for displaying structure in bivariate samples (e.g., Silverman, 1986; Scott, 1992), and the multivariate kernel density estimate has also played an important role in the development of the visualization of structure in three- and four-dimensional data sets (Scott, 1992). Also, many nonparametric hypothesis tests (e.g., tests for conditional independence: Su and White, 2007, 2008, 2013; Song, 2009; Huang, 2010) require the estimation of multivariate densities.

1.3.1 The Multivariate Kernel Density Estimator

Let $X_1, \dots, X_n$ denote a $d$-variate random sample having density $f$. We use the notation $X_i = (X_{i1}, \dots, X_{id})'$ for the components of $X_i$, and a generic vector $x \in \mathbb R^d$ has the representation $x = (x_1, \dots, x_d)'$. In its most general form, the $d$-dimensional kernel density estimator of $f$ takes the form

$$\hat f(x) = \frac1n\sum_{i=1}^n K_H(X_i - x),$$

where $H$ is a symmetric positive definite $d \times d$ matrix called the bandwidth matrix, $K_H(u) = |H|^{-1/2}K\!\left(H^{-1/2}u\right)$, $K$ is a $d$-variate kernel function satisfying $\int K(u)\,du = 1$, and $|H|$ is the determinant of $H$.

The kernel function $K$ is often chosen to be a $d$-variate probability density function. There are two common techniques for generating multivariate kernels from a symmetric univariate kernel $k$:

$$K(u) = \prod_{s=1}^d k(u_s) \qquad\text{and}\qquad K(u) = \frac{k\!\left(\sqrt{u'u}\right)}{\int k\!\left(\sqrt{v'v}\right)dv}.$$

The first of these is often called a product kernel, and the second has the property of being spherically or radially symmetric. A popular choice for $K$ is the standard $d$-variate normal density

$$K(u) = (2\pi)^{-d/2}\exp\!\left(-\frac12u'u\right),$$

in which case $K_H(u - x)$ is the $N(x, H)$ density in the vector $u$. It is well known that the $d$-variate normal kernel can be constructed from the univariate standard normal density using either the product or the spherically symmetric extension.

In general, the bandwidth matrix $H$ has $d(d+1)/2$ independent entries, which, even for moderate $d$, can be a substantial number of smoothing parameters to choose. The other extreme case is to specify $H = h^2I_d$, where $I_d$ is the identity matrix. In this case, the kernel estimator of the density is a straightforward generalization of (1.3):

$$\hat f(x) = \frac{1}{nh^d}\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right), \qquad (1.27)$$

where $K$ is a multivariate kernel function such that $\int_{\mathbb R^d}K(u)\,du = 1$. Nevertheless, the use of a single bandwidth $h$ may not be appropriate if the variation in one component of $X_i$ is much greater than in the others; in this case, it may be more appropriate to use a vector or matrix of bandwidth parameters. An attractive practical alternative is to linearly transform the data to have a unit covariance matrix, apply (1.27) to the transformed data, and finally transform back to the original metric. Almost all of the results in the previous section go through for the multivariate density estimator $\hat f(x)$ with obvious modifications; in particular, the ROT bandwidth is proportional to $n^{-1/(4+d)}$. A realistic choice is to set $H = \operatorname{diag}\!\left(h_1^2, \dots, h_d^2\right)$.

Just as in the univariate case, some important choices have to be made when constructing a multivariate kernel density estimator, and the extension to higher dimensions means there are more degrees of freedom. Below we restrict ourselves to the product kernel and a diagonal $H$. For brevity, we write $K(\cdot)$ for the product kernel. We estimate $f(x)$ by

$$\hat f(x) = \frac1n\sum_{i=1}^n K_h(X_i - x) = \frac1n\sum_{i=1}^n\prod_{s=1}^d\frac{1}{h_s}k\!\left(\frac{X_{is} - x_s}{h_s}\right), \qquad (1.28)$$

where $h = (h_1, \dots, h_d)$. We study the asymptotic properties of $\hat f(x)$ below.
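A minimal sketch of the product-kernel estimator (1.28) with a diagonal bandwidth vector and Gaussian univariate kernels; the bivariate sample and the ROT-style bandwidths (proportional to $n^{-1/(4+d)}$) are hypothetical.

```python
# A sketch of (1.28) with diagonal bandwidths and Gaussian univariate kernels.
import numpy as np

def kde_product(x, data, h):
    """x: (m, d) evaluation points; data: (n, d) sample; h: (d,) bandwidths."""
    u = (data[None, :, :] - np.asarray(x)[:, None, :]) / h   # (m, n, d)
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k.prod(axis=2).mean(axis=1) / np.prod(h)

rng = np.random.default_rng(2)
data = rng.normal(size=(400, 2))                        # hypothetical bivariate sample
h = 1.06 * data.std(axis=0, ddof=1) * len(data) ** (-1 / 6)  # ~ n^{-1/(4+d)}, d = 2
print(kde_product(np.zeros((1, 2)), data, h))           # ~ 1/(2 pi) ~ 0.159
```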

1.3.2 Asymptotic Properties of the Multivariate Kernel Density Estimator

As in the univariate setting, we can obtain a simple asymptotic approximation to the MISE of a multivariate kernel density estimator under certain smoothness assumptions on the density and some standard assumptions on the kernel and bandwidths. Before doing so, we first study the asymptotic normality of the nonparametric kernel density estimator $\hat f(x)$. For this purpose, we make the following assumptions on $f$, $k$, and $h$.

A1*. The observations $X_1, \dots, X_n$ are IID with density $f$.

A2*. $f$ is third-order continuously differentiable at the interior point $x$ of its support.

A3*. $K$ is a product of a univariate kernel $k$ which is a symmetric function around zero satisfying (i) $\int k(u)\,du = 1$; (ii) $\int u^2k(u)\,du = \kappa_2 \in (0, \infty)$; (iii) $\int k(u)^2\,du = \kappa_{02} < \infty$.

A4*. As $n \to \infty$, $nh_1\cdots h_d \to \infty$ and $h_s \to 0$ for $s = 1, \dots, d$.

Clearly, Assumptions A1*-A4* parallel those in Section 1.2, and they are not the minimal requirements. The choice of product kernel and diagonal bandwidth matrix is just for notational simplicity; one can generalize the results below straightforwardly to general choices of kernel and bandwidth (e.g., Wand and Jones, 1995). Just as in the univariate setting, under the above assumptions we can easily show that

$$\operatorname{Bias}\left[\hat f(x)\right] = \frac{\kappa_2}{2}\sum_{s=1}^d h_s^2\,f_{ss}(x) + o\!\left(\sum_{s=1}^d h_s^2\right) \qquad (1.29)$$

and

$$\operatorname{Var}\left[\hat f(x)\right] = \frac{\kappa_{02}^d\,f(x)}{nh_1\cdots h_d} + o\!\left(\frac{1}{nh_1\cdots h_d}\right), \qquad (1.30)$$

where $f_{ss}(x)$ is the second-order derivative of $f(x)$ with respect to $x_s$, $\kappa_2 = \int u^2k(u)\,du$, and $\kappa_{02} = \int k(u)^2\,du$. Consequently,

$$\operatorname{MSE}\left[\hat f(x)\right] = \left\{\operatorname{Bias}\left[\hat f(x)\right]\right\}^2 + \operatorname{Var}\left[\hat f(x)\right] = O\!\left(\sum_{s=1}^d h_s^4 + \frac{1}{nh_1\cdots h_d}\right), \qquad (1.31)$$

which converges to zero under Assumption A4*. Since convergence in MSE implies convergence in probability by the Chebyshev inequality, we have $\hat f(x) \xrightarrow{p} f(x)$.

To derive the asymptotic normality of $\hat f(x)$, we write it as a summation over a double array of random variables and apply the Liapounov central limit theorem (CLT) in the appendix. Thus we state the following theorem.

Theorem 1.1. Suppose Assumptions A1*-A4* hold, and suppose that $\int|k(u)|^{2+\delta}\,du < \infty$ for some $\delta > 0$. If $nh_1\cdots h_d\sum_{s=1}^d h_s^6 \to 0$, then

$$\sqrt{nh_1\cdots h_d}\left(\hat f(x) - f(x) - \frac{\kappa_2}{2}\sum_{s=1}^d h_s^2\,f_{ss}(x)\right) \xrightarrow{d} N\!\left(0,\ \kappa_{02}^d\,f(x)\right). \qquad (1.32)$$

Proof. We only sketch the proof. Write

$$\hat f(x) - f(x) = \left\{E\left[\hat f(x)\right] - f(x)\right\} + \left\{\hat f(x) - E\left[\hat f(x)\right]\right\}.$$

The first term contributes to the bias of the estimator and the second contributes to the variance. By (1.29), it suffices to show that $\sqrt{nh_1\cdots h_d}\left\{\hat f(x) - E[\hat f(x)]\right\} \xrightarrow{d} N\!\left(0, \kappa_{02}^d f(x)\right)$. Define

$$Z_{ni} = \sqrt{\frac{h_1\cdots h_d}{n}}\Big\{K_h(X_i - x) - E\left[K_h(X_i - x)\right]\Big\}.$$

Then $\sqrt{nh_1\cdots h_d}\left\{\hat f(x) - E[\hat f(x)]\right\} = \sum_{i=1}^n Z_{ni}$. Note that $E(Z_{ni}) = 0$ and $\sum_{i=1}^n\operatorname{Var}(Z_{ni}) = \kappa_{02}^d\,f(x) + o(1)$. To verify the other conditions of Theorem 1.8 (the Liapounov CLT), it suffices to check that $\sum_{i=1}^n E|Z_{ni}|^{2+\delta} = o(1)$. By the $c_r$ and Jensen inequalities,

$$\sum_{i=1}^n E|Z_{ni}|^{2+\delta} \le C\,(nh_1\cdots h_d)^{-\delta/2} = o(1)$$

for some constant $C < \infty$. (The $c_r$ inequality says that $E|X + Y|^p \le c_p\left(E|X|^p + E|Y|^p\right)$, where $c_p = 2^{p-1}$ for $p \ge 1$.) So we can apply the Liapounov CLT to conclude that $\sqrt{nh_1\cdots h_d}\left\{\hat f(x) - E[\hat f(x)]\right\} \xrightarrow{d} N\!\left(0,\ \kappa_{02}^d f(x)\right)$. $\blacksquare$

Remark. It can easily be shown that $\operatorname{Cov}\big(\hat f(x_1), \hat f(x_2)\big) = o\big((nh_1\cdots h_d)^{-1}\big)$ for any two distinct points $x_1$ and $x_2$ in the support of $f(\cdot)$. By the Cramér-Wold device, we can then show that, for any finite collection of distinct points $x_1, \dots, x_m$,

$$\sqrt{nh_1\cdots h_d}\begin{pmatrix}\hat f(x_1) - f(x_1) - B_n(x_1)\\ \vdots\\ \hat f(x_m) - f(x_m) - B_n(x_m)\end{pmatrix} \xrightarrow{d} N\!\Big(0,\ \kappa_{02}^d\operatorname{diag}\big(f(x_1), \dots, f(x_m)\big)\Big),$$

where $B_n(x) = \frac{\kappa_2}{2}\sum_{s=1}^d h_s^2\,f_{ss}(x)$. That is, $\hat f(x_1), \dots, \hat f(x_m)$ are asymptotically independent of each other. This observation is very useful when we construct pointwise confidence intervals for $f(x)$ at distinct points, since we do not need to take into account any dependence between the estimators at different points.

For the asymptotic MISE approximation, we need the second-order partial derivatives of $f$ to be square integrable. From (1.29)-(1.31), we can obtain the MISE of $\hat f(x)$:

$$\operatorname{MISE}(h) = \operatorname{AMISE}(h) + o\!\left(\sum_{s=1}^d h_s^4 + \frac{1}{nh_1\cdots h_d}\right), \qquad (1.33)$$

where

$$\operatorname{AMISE}(h) = \frac{\kappa_2^2}{4}\int\left[\sum_{s=1}^d h_s^2\,f_{ss}(x)\right]^2dx + \frac{\kappa_{02}^d}{nh_1\cdots h_d}. \qquad (1.34)$$

Unlike the univariate setting, explicit expressions for the AMISE-optimal bandwidths are not available in general, and they can only be obtained numerically (see Wand, 1992a). In the special case $h_1 = \cdots = h_d = h$, the optimal bandwidth has an explicit formula and is given by

$$h_{\mathrm{opt}} = \left[\frac{d\,\kappa_{02}^d}{\kappa_2^2\int\left\{\sum_{s=1}^d f_{ss}(x)\right\}^2dx}\right]^{1/(d+4)} n^{-1/(d+4)},$$

where we write $R\!\left(\sum_s f_{ss}\right) = \int\left\{\sum_{s=1}^d f_{ss}(x)\right\}^2dx$. One can then obtain the minimum AMISE,

$$\inf_{h>0}\operatorname{AMISE}(h) = \frac{d+4}{4}\left\{\frac{\kappa_2^{2d}\left[R\!\left(\sum_s f_{ss}\right)\right]^d\left(\kappa_{02}^d\right)^4}{d^d}\right\}^{1/(d+4)}n^{-4/(d+4)}.$$

The last expression implies that the rate of convergence of $\inf_{h>0}\operatorname{AMISE}(h)$ is of order $n^{-4/(d+4)}$, a rate which becomes slower as the dimension $d$ increases. This slower rate reflects the curse of dimensionality discussed previously.

1.3.3 Multivariate Bandwidth Selection

As in univariate density estimation, the optimal bandwidths $h = (h_1, \dots, h_d)$ should balance the squared bias and variance terms. This requires $h_s^4 = O\big((nh_1\cdots h_d)^{-1}\big)$, i.e., $h_s = O\big(n^{-1/(4+d)}\big)$ for each $s$. Let $h^0 = (h_1^0, \dots, h_d^0)$ denote the optimal bandwidth vector that minimizes the AMISE of $\hat f(x)$, and let $\bar K(v) = \prod_{s=1}^d\bar k(v_s)$, where $\bar k(\cdot)$ is the convolution kernel of $k(\cdot)$. In practice, we often choose the bandwidth vector $h = (h_1, \dots, h_d)$ by minimizing the following least squares cross-validation function:

$$\operatorname{CV}(h) = \frac{1}{n^2h_1\cdots h_d}\sum_{i=1}^n\sum_{j=1}^n\bar K\!\left(\frac{X_i - X_j}{h}\right) - \frac{2}{n(n-1)h_1\cdots h_d}\sum_{i=1}^n\sum_{j\ne i}K\!\left(\frac{X_i - X_j}{h}\right), \qquad (1.35)$$

which is a generalization of (1.17) from the univariate to the multivariate density estimation setting, with $(X_i - X_j)/h$ understood componentwise. Let $\hat h = (\hat h_1, \dots, \hat h_d)$ denote the solution to the above cross-validation problem. One can show that $\hat h_s/h_s^0 \xrightarrow{p} 1$ for $s = 1, \dots, d$. It is worth mentioning that one can also apply the plug-in principle to choose bandwidths in the multivariate setting: using the asymptotic MISE approximations developed in the last subsection, it is possible to develop a multivariate version of the plug-in bandwidth. See Wand and Jones (1995) for details.

1.3.4 Construction of Confidence Intervals

If undersmoothing bandwidths are chosen so that $nh_1\cdots h_d\sum_{s=1}^d h_s^4 \to 0$, then Theorem 1.1 implies that the pointwise $100(1-\alpha)\%$ asymptotic confidence interval (CI) for $f(x)$ can be constructed as follows:

$$\left[\hat f(x) - z_{\alpha/2}\sqrt{\frac{\kappa_{02}^d\,\hat f(x)}{nh_1\cdots h_d}},\ \ \hat f(x) + z_{\alpha/2}\sqrt{\frac{\kappa_{02}^d\,\hat f(x)}{nh_1\cdots h_d}}\right], \qquad (1.36)$$

where $z_{\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the standard normal distribution. Nevertheless, the above CI can be badly behaved in finite samples, in that its coverage probability may be significantly smaller than the nominal level $1-\alpha$. An alternative is to use the bootstrap to construct the CI for the kernel density estimator. Since CIs for high-dimensional density estimators ($d \ge 3$) are seldom used, we now turn to the construction of the confidence band for a univariate density estimator. In this case, the CI in (1.36) can be written as

$$\left[\hat f(x) - z_{\alpha/2}\sqrt{\frac{\kappa_{02}\,\hat f(x)}{nh}},\ \ \hat f(x) + z_{\alpha/2}\sqrt{\frac{\kappa_{02}\,\hat f(x)}{nh}}\right], \qquad (1.37)$$

which is centered at $\hat f(x)$, an estimator of $\mu(x) \equiv E[\hat f(x)]$; here $\kappa_{02} = \int k(u)^2\,du$. The pointwise confidence band for $\mu(\cdot)$ is

$$\mathcal B(x) = \left\{(x, t) : x \in \mathcal X,\ \left|t - \hat f(x)\right| \le z_{\alpha/2}\sqrt{\frac{\kappa_{02}\,\hat f(x)}{nh}}\right\},$$

where $\mathcal X$ is the support of $X$. The coverage probability of $\mathcal B(x)$ at a point $x$ is given by

$$\pi(x\mid\alpha) = P\{(x, f(x)) \in \mathcal B(x)\}. \qquad (1.38)$$

The limit of $\pi(x\mid\alpha)$ is not $1-\alpha$ unless one uses an undersmoothing bandwidth to remove the asymptotic bias of $\hat f(x)$.

To construct the bootstrap CI for $f(x)$ or $E[\hat f(x)]$, we define another estimator of the variance of $\hat f(x)$. Note that the variance of $\hat f(x)$ (in the case $d = 1$) is given by

$$\sigma^2(x) = \operatorname{Var}\!\left(\frac{1}{nh}\sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right)\right) = \frac1n\left\{E\left[\frac1h k\!\left(\frac{x - X_i}{h}\right)\right]^2 - \left(E\left[\frac1h k\!\left(\frac{x - X_i}{h}\right)\right]\right)^2\right\},$$

where the first expectation is of the squared kernel weight. Hall (1992) proposes to estimate $\sigma^2(x)$ by

$$\hat\sigma^2(x) = \frac1n\left\{\frac1n\sum_{i=1}^n\left[\frac1h k\!\left(\frac{x - X_i}{h}\right)\right]^2 - \hat f(x)^2\right\}.$$

We now describe how to construct the bootstrap CI for $E[\hat f(x)]$ in several steps.

1. Draw a bootstrap sample $\{X_i^*\}_{i=1}^n$ randomly (with replacement) from the original sample $\mathcal D \equiv \{X_i\}_{i=1}^n$.

2. With an undersmoothing choice of bandwidth $h$, compute

$$\hat f^*(x) = \frac{1}{nh}\sum_{i=1}^n k\!\left(\frac{x - X_i^*}{h}\right), \qquad \hat\sigma^{*2}(x) = \frac1n\left\{\frac1n\sum_{i=1}^n\left[\frac1h k\!\left(\frac{x - X_i^*}{h}\right)\right]^2 - \hat f^*(x)^2\right\},$$

$$t^*(x) = \frac{\hat f^*(x) - \hat f(x)}{\hat\sigma^*(x)}.$$

3. Repeat Steps 1-2 a large number of times, say $B$ times, to obtain the bootstrap statistics $\{t_b^*(x)\}_{b=1}^B$. Let $t_{\alpha/2}^*$ and $t_{1-\alpha/2}^*$ denote the $\alpha/2$ and $1-\alpha/2$ percentiles of $\{t_b^*(x)\}_{b=1}^B$. Then the two-sided CI for $E[\hat f(x)]$ is given by

$$\left[\hat f(x) - t_{1-\alpha/2}^*\,\hat\sigma(x),\ \ \hat f(x) - t_{\alpha/2}^*\,\hat\sigma(x)\right]. \qquad (1.39)$$
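The three steps above translate into a short sketch; the sample, the undersmoothing bandwidth, and the number of bootstrap replications are hypothetical placeholders, and the variance estimator is Hall's $\hat\sigma^2(x)$ from above.

```python
# A sketch of the bootstrap-t CI in Steps 1-3 at a single point x.
import numpy as np

def fhat_and_sigma(x, data, h):
    k = np.exp(-0.5 * ((x - data) / h) ** 2) / np.sqrt(2 * np.pi)
    f = k.mean() / h                                    # fhat(x)
    s2 = ((k / h - f) ** 2).mean() / len(data)          # Hall's sigma^2(x) estimate
    return f, np.sqrt(s2)

rng = np.random.default_rng(3)
data = rng.normal(size=500)                             # hypothetical sample
x, h, B, alpha = 0.0, 0.15, 999, 0.05
f0, s0 = fhat_and_sigma(x, data, h)
t = np.empty(B)
for b in range(B):
    db = rng.choice(data, size=len(data), replace=True) # Step 1: resample
    fb, sb = fhat_and_sigma(x, db, h)                   # Step 2: bootstrap estimates
    t[b] = (fb - f0) / sb                               # bootstrap t-statistic
lo, hi = np.quantile(t, [alpha / 2, 1 - alpha / 2])     # Step 3: percentiles
ci = (f0 - hi * s0, f0 - lo * s0)                       # the CI in (1.39)
```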

Note that $E\big[\hat f^*(x)\mid\mathcal D\big] = \hat f(x)$. That is, $\hat f^*(x)$ is an unbiased estimator of $\hat f(x)$ conditional on the data, even though $\hat f(x)$ is a biased estimator of $f(x)$. Hence $t^*(x)$ is a bootstrap $t$-statistic for forming a CI for $E[\hat f(x)]$. The CI in (1.39) can be regarded as a CI for $f(x)$ when we use an undersmoothing bandwidth so that the bias of $\hat f(x)$ is asymptotically negligible. Horowitz (2001, Handbook of Econometrics) demonstrates that the bootstrap provides asymptotic refinements for tests of hypotheses and confidence intervals in density estimation.

More recently, Hall and Horowitz (2013, AoS) propose a simple bootstrap method for constructing the CI for the kernel density estimator without the need for undersmoothing. Instead of drawing the bootstrap observations randomly from $\mathcal D$, Hall and Horowitz (2013) suggest drawing a random sample $\mathcal D^* = \{X_i^*\}_{i=1}^n$ from the distribution with density $\hat f(\cdot)$ and defining $\hat f^*$ to be the corresponding kernel estimator of $\hat f$:

$$\hat f^*(x) = \frac{1}{nh}\sum_{i=1}^n k\!\left(\frac{x - X_i^*}{h}\right).$$

One can show that, conditional on $\mathcal D$, the asymptotic bias and variance of $\hat f^*(x)$ are the same as those of $\hat f(x)$, ignoring asymptotically negligible terms, so that we can construct the CI or confidence band for $f(x)$ based on $\hat f^*(x)$. The bootstrap version of $\mathcal B(x)$ in (1.38) is given by

$$\mathcal B^*(x) = \left\{(x, t) : x \in \mathcal X,\ \left|t - \hat f^*(x)\right| \le z_{\alpha/2}\sqrt{\frac{\kappa_{02}\,\hat f^*(x)}{nh}}\right\}.$$

The bootstrap estimator $\hat\pi(x\mid\alpha)$ of the coverage probability $\pi(x\mid\alpha)$ that $\mathcal B(x)$ covers $(x, f(x))$ is defined by

$$\hat\pi(x\mid\alpha) = P\left\{\big(x, \hat f(x)\big) \in \mathcal B^*(x)\ \Big|\ \mathcal D\right\}$$

and can be computed, by Monte Carlo simulation, in the form

$$\frac1B\sum_{b=1}^B\mathbf 1\left\{\big(x, \hat f(x)\big) \in \mathcal B_b^*(x)\right\}, \qquad (1.40)$$

where $\mathcal B_b^*(x)$ is calculated as $\mathcal B^*(x)$ based on the $b$th bootstrap resample. (The bootstrap resamples are independent of each other conditional on $\mathcal D$.) For large enough $B$, we can treat (1.40) as $\hat\pi(x\mid\alpha)$, ignoring the small simulation error. Then we can define $\hat\alpha(x\mid\alpha_0)$ to be the solution, in $\alpha$, of $\hat\pi(x\mid\alpha) = 1 - \alpha_0$.

Take $\mathcal X_0 = [\underline x, \bar x]$ as a subset of $\mathcal X$, and let $x_1, \dots, x_m$ be evenly distributed on $[\underline x, \bar x]$, with $x_j = \underline x + j\delta$ and $\delta = (\bar x - \underline x)/(m+1)$, so that the distance between two adjacent points is $\delta$. Let $\hat\alpha^\xi(\alpha_0)$ denote the $\xi$-level empirical quantile of the points in the set $\{\hat\alpha(x_1\mid\alpha_0), \dots, \hat\alpha(x_m\mid\alpha_0)\}$. For a value $\xi \in (0, 1/2]$, one constructs the $1-\alpha_0$ confidence band $\mathcal B\big(\cdot\mid\hat\alpha^\xi(\alpha_0)\big)$ by taking $\xi$ sufficiently small. Hall and Horowitz (2013) recommend $\xi = 0.1$ and remark that $\xi = 0.05$ may be warranted in the case of large samples (p. 899).

1.3.5 Conditional Density Estimation

Even though conditional PDFs form the backbone of most popular statistical methods in use today, they are not modeled directly in the parametric setting, and they have received even less attention in the nonparametric literature. A few exceptions include Chen et al. (2003) and Li and Racine (2007). Let $Z = (Y, X)$, where $Y$ is a scalar and $X = (X_1, \dots, X_d)$ is a $d$-dimensional vector. We are interested in estimating the conditional density of $Y$ given $X = x$, denoted $f(y\mid x)$.

Writing $f(y\mid x) = f(x, y)/f(x)$, we estimate it by

$$\hat f(y\mid x) = \frac{\hat f(x, y)}{\hat f(x)},$$

where

$$\hat f(x, y) = \frac1n\sum_{i=1}^n K_{h_1}(X_i - x)\,k_{h_2}(Y_i - y) \qquad\text{and}\qquad \hat f(x) = \frac1n\sum_{i=1}^n K_{h_1}(X_i - x),$$

with $K_{h_1}(\cdot)$ the product kernel as in (1.28) and $k_{h_2}(u) = h_2^{-1}k(u/h_2)$. The asymptotic properties of $\hat f(y\mid x)$ can easily be derived from those of $\hat f(x, y)$ and $\hat f(x)$.

To consider the choice of the bandwidths, we can consider the following criterion function based upon a weighted integrated squared error (ISE):

$$\operatorname{ISE} = \int\left[\hat f(y\mid x) - f(y\mid x)\right]^2 f(x)\,w(x)\,dx\,dy = I_{1n} - 2I_{2n} + \int f(y\mid x)^2 f(x)\,w(x)\,dx\,dy, \qquad (1.41)$$

where the last term does not depend on the bandwidths, and

$$I_{1n} = \int\hat f(y\mid x)^2 f(x)\,w(x)\,dx\,dy \qquad\text{and}\qquad I_{2n} = \int\hat f(y\mid x)\,f(y\mid x)\,f(x)\,w(x)\,dx\,dy.$$

Here we use the weight function $w(x)$ to mitigate the random denominator problem. Let $\hat G(x) = \int\hat f(x, y)^2\,dy$, so that $I_{1n} = \int\big[\hat G(x)/\hat f(x)^2\big]f(x)\,w(x)\,dx$. One can verify that

$$\hat G(x) = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n K_{h_1}(X_i - x)\,K_{h_1}(X_j - x)\,\bar k_{h_2}(Y_i - Y_j),$$

where $\bar k(\cdot)$ is the convolution kernel derived from $k(\cdot)$. We estimate $I_{1n}$ and $I_{2n}$ by

$$\hat I_{1n} = \frac1n\sum_{i=1}^n\frac{\hat G_{-i}(X_i)\,w(X_i)}{\hat f_{-i}(X_i)^2} \qquad\text{and}\qquad \hat I_{2n} = \frac1n\sum_{i=1}^n\frac{\hat f_{-i}(X_i, Y_i)\,w(X_i)}{\hat f_{-i}(X_i)},$$

respectively, where the subscript $-i$ denotes the leave-one-out estimators; for example,

$$\hat f_{-i}(X_i, Y_i) = \frac{1}{n-1}\sum_{j\ne i}K_{h_1}(X_j - X_i)\,k_{h_2}(Y_j - Y_i).$$

Thus we can choose $(h_1, h_2)$ to minimize the following cross-validation function:

$$\operatorname{CV}(h_1, h_2) = \hat I_{1n} - 2\hat I_{2n}.$$

Let $(\hat h_1, \hat h_2)$ be the solution to the above minimization problem, and let $(h_1^0, h_2^0)$ be the minimizer of (1.41). Hall et al. (2003a) show that $\hat h_1/h_1^0 \xrightarrow{p} 1$ and $\hat h_2/h_2^0 \xrightarrow{p} 1$, where the optimal bandwidths are of order $n^{-1/(d+5)}$.
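A minimal sketch of $\hat f(y\mid x)$ for scalar $X$ and $Y$ with Gaussian kernels; in practice $h_1$ and $h_2$ would come from the cross-validation just described, but here they are hypothetical inputs.

```python
# A sketch of fhat(y | x) = fhat(x, y) / fhat(x) for scalar X and Y.
import numpy as np

def cond_density(y, x, X, Y, h1, h2):
    phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    kx = phi((x - X) / h1) / h1                         # weights in the x direction
    ky = phi((y - Y) / h2) / h2                         # weights in the y direction
    return (kx * ky).mean() / kx.mean()                 # joint over marginal
```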

1.3.6 Uniform Rates of Convergence

Up to now we have only demonstrated pointwise and mean integrated squared error consistency for the density estimator $\hat f(x)$. In fact, the consistency of $\hat f(x)$ can be strengthened to a uniform consistency result: the nonparametric kernel estimators are uniformly strongly (almost surely) consistent. This result is important for theoretical purposes, so we report it below; for the proof, we refer the reader to Masry (1996a, b).

Theorem 1.2. Under the regularity conditions given in Masry (1996a, b), we have

(i) $\displaystyle\sup_{x\in\mathcal S}\left|\hat f(x) - f(x)\right| = O_{a.s.}\!\left(\left(\frac{\ln n}{nh_1\cdots h_d}\right)^{1/2} + \sum_{s=1}^d h_s^2\right)$;

(ii) $\displaystyle\sup_{x\in\mathcal S}E\left[\hat f(x) - f(x)\right]^2 = O\!\left(\frac{1}{nh_1\cdots h_d} + \sum_{s=1}^d h_s^4\right)$;

where $\mathcal S$ is a compact set in the interior of the support of $f$.

In the above theorem, we restrict ourselves to a compact set in the interior of the support of $f$. If $f$ is compactly supported, it is well known that when $x$ lies at the boundary of the support we cannot estimate $f(x)$ at the usual rate; in fact, the MSE of $\hat f(x)$ is not of the order in (ii) in this case. Some modifications are needed to estimate $f(x)$ consistently for $x$ at the boundary of the support. For details, see Gasser and Müller (1979), Rice (1984), Hall and Wehrly (1991), Scott (1992), and Wand and Jones (1995).

Example 4 (Boundary bias problem). Suppose $X$ has compact support $[0, 1]$ and $f(0) > 0$, and suppose we estimate $f(0)$ by the usual nonparametric kernel estimator $\hat f(0) = (nh)^{-1}\sum_{i=1}^n k\!\left(\frac{X_i - 0}{h}\right)$. Then

$$E\left[\hat f(0)\right] = \frac1h\int_0^1 k\!\left(\frac th\right)f(t)\,dt = \int_0^{1/h}k(u)\,f(hu)\,du \to f(0)\int_0^\infty k(u)\,du = \frac{f(0)}{2},$$

where we have used the dominated convergence theorem and the fact that $k(\cdot)$ is a symmetric kernel (implying $\int_0^\infty k(u)\,du = 1/2$). So $\hat f(0)$ is a biased estimator of $f(0)$ even asymptotically, and the asymptotic bias is of magnitude $-f(0)/2$. When $f$ is infinitely supported, however, we can extend the above result to the full support of $f$; for more details, see Hansen (2008).

1.4 Testing Hypotheses about Densities

In this section we consider testing hypotheses about densities. Suppose $f$ and $g$ are two possible densities for the random variable or vector $X$. We may wish to test several types of hypotheses regarding these densities, each of which can be formulated as testing

$$H_0: f(x) = g(x) \text{ a.e.} \qquad\text{versus}\qquad H_1: f(x) \ne g(x) \text{ on a set of positive measure}. \qquad (1.42)$$

Pagan and Ullah (1999) consider several examples, which we reformulate below.

Examples. (a) It is sometimes desirable to test whether a nonparametrically estimated density has a particular form, say that of a normal density. In this example, we estimate $f(x)$ nonparametrically and estimate $g(x)$ parametrically according to the parametric assumption $g(x) = g(x;\theta)$, where $\theta$ is a vector of unknown parameters.

It is interesting to note that the finite-dimensional parameter $\theta$ can usually be estimated at the parametric $\sqrt n$-rate, which is faster than any nonparametric rate of estimation, so that whether $\theta$ is known or estimated has no asymptotic impact on the test.

(b) Symmetry of a density around some point, say zero, is frequently assumed in the literature. If this assumption is not met, subsequent statistical inference may not be valid, so it is desirable to have a test for symmetry. For example, if we are testing whether the density $f(x)$ of $X$ is symmetric around zero, we may let $g(x) = f(-x)$ in (1.42).

(c) Conditional symmetry of a conditional density may also be of great interest. It turns out that several tests are available for conditional symmetry; see, e.g., Su (2006).

(d) Independence is a fundamental concept in statistics, and a large number of parametric and nonparametric tests are available for independence. In testing whether two random variables $Y$ and $Z$ are independent, we may put $X = (Y, Z)$. Let $f_1$, $f_2$, and $f$ denote the marginal density of $Y$, the marginal density of $Z$, and the joint density of $Y$ and $Z$, respectively, so that the null hypothesis in (1.42) can be written as $H_0: f(y, z) = f_1(y)\,f_2(z)$. Similarly, we can formulate hypotheses for testing various variants of independence, such as serial independence, spatial independence, or conditional independence.

(e) It is useful to compare densities $f(x)$ and $g(x)$ that come from two different groups (male or female, white or non-white), regions (rural and urban, coastal or inland), or time periods.

As Pagan and Ullah (1999) remark, the above testing problems can be tackled by considering a widely accepted measure of the global distance (closeness) between the two densities $f(\cdot)$ and $g(\cdot)$. In practice, people frequently use the weighted integrated squared error

$$I(f, g) = \int\left[f(x) - g(x)\right]^2w(x)\,dx, \qquad (1.43)$$

where $w(x)$ is a nonnegative weight function. For example, if one takes $w(x) = f(x)$ or $g(x)$, then (1.43) can be estimated by its sample analogue

$$\hat I = \frac1n\sum_{i=1}^n\left[\hat f(X_i) - \hat g(X_i)\right]^2. \qquad (1.44)$$

Another measure of distance (affinity) between two densities is the well-known Kullback-Leibler (KL) distance (information) measure introduced earlier. Under the null hypothesis, the KL distance between $f$ and $g$ is zero, and it is nonzero otherwise.

1.4.1 Comparison with a Parametric Density Function

Now consider the problem of testing $H_0: f(x) = g(x;\theta)$, where $g(\cdot;\theta)$ is a fully specified (known) density up to the finite-dimensional parameter $\theta$. Given data $\{X_i\}_{i=1}^n$, let $\hat f(x)$ be the nonparametric kernel density estimator of $f$, and let $\hat\theta$ be the maximum likelihood estimator of $\theta$ based on the parametric density $g(\cdot;\theta)$. The test is based on the observation that

$$I(\theta) = \int\left[f(x) - g(x;\theta)\right]^2dx = \int f(x)^2\,dx + \int g(x;\theta)^2\,dx - 2\int f(x)\,g(x;\theta)\,dx$$

$$= E_f[f(X)] + \int g(x;\theta)^2\,dx - 2E_f[g(X;\theta)].$$

Following Fan (1994), we can propose a feasible test statistic by replacing $f(\cdot)$ and $\theta$ with $\hat f(\cdot)$ and $\hat\theta$:

$$\hat I_n = \frac1n\sum_{i=1}^n\hat f(X_i) + \int g\big(x;\hat\theta\big)^2dx - \frac2n\sum_{i=1}^n g\big(X_i;\hat\theta\big). \qquad (1.45)$$

Under certain conditions, we can follow the proof of Theorem 4.1 of Fan (1994) to prove the following theorem.

Theorem 1.3. Under some regularity conditions and $H_0$, we have

$$T_n = \frac{nh^{d/2}\,\hat I_n - \hat c_n}{\hat\sigma_n} \xrightarrow{d} N(0, 1),$$

where $\hat c_n$ is a centering term and $\hat\sigma_n^2$ is a consistent estimator of the asymptotic variance of $nh^{d/2}\hat I_n$, both built from the pairwise product-kernel weights $K\big((X_i - X_j)/h\big)$ with $K(u) = \prod_{s=1}^d k(u_s)$; see Fan (1994) for the exact expressions. Fan (1994) uses $\int\hat f(x)^2\,dx$ in place of $\int f(x)^2\,dx$ and proves an analog of the above theorem. Since our test is one-sided, we reject the null when $T_n > z_\alpha$, the upper $\alpha$-percentile of the standard normal distribution; this applies to the other tests in this section too.

Note that the integration in (1.45) may not be needed in practice. For example, if $g\big(x;\hat\theta\big)$ is the pdf of a normal distribution with mean $\hat\mu$ and variance $\hat\sigma^2$, i.e., $\hat\theta = \big(\hat\mu, \hat\sigma^2\big)$, then $g\big(x;\hat\theta\big)^2$ is proportional to the density of a $N\big(\hat\mu, \hat\sigma^2/2\big)$ random variable, and it is easy to verify that $\int g\big(x;\hat\theta\big)^2dx = \big(4\pi\hat\sigma^2\big)^{-1/2}$ in this case.

1.4.2 Testing for Symmetry

To test whether a density function $f(x)$ is symmetric around zero, we write the null and alternative hypotheses as

$$H_0: f(x) = f(-x) \text{ a.e.} \qquad\text{versus}\qquad H_1: f(x) \ne f(-x) \text{ on a set of positive measure}. \qquad (1.46)$$

Noting that

$$\Gamma \equiv \int\left[f(x) - f(-x)\right]^2dx = \int\left[f(x) - f(-x)\right]f(x)\,dx - \int\left[f(x) - f(-x)\right]f(-x)\,dx = 2\int\left[f(x) - f(-x)\right]f(x)\,dx,$$

Ahmad and Li (1997) propose a test based upon the last functional. Clearly, we can estimate $\Gamma/2$ by

$$\hat\Gamma_n = \frac1n\sum_{i=1}^n\left[\hat f(X_i) - \hat f(-X_i)\right] = \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n\left[k\!\left(\frac{X_i - X_j}{h}\right) - k\!\left(\frac{X_i + X_j}{h}\right)\right]. \qquad (1.47)$$

Under the null hypothesis and the standard assumptions that $h \to 0$ and $nh \to \infty$, Ahmad and Li (1997) prove the following theorem.

Theorem 1.4. Under some regularity conditions and $H_0$, we have

$$T_n = \frac{nh^{1/2}\big(\hat\Gamma_n - \xi_n\big)}{\hat\sigma_n} \xrightarrow{d} N(0, 1),$$
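A hedged sketch of the statistic $\hat\Gamma_n$ in (1.47) with a Gaussian kernel; the studentization in Theorem 1.4 is omitted, and under symmetry about zero the statistic should be close to zero.

```python
# A sketch of the symmetry statistic (1.47) with a Gaussian kernel.
import numpy as np

def symmetry_stat(data, h):
    phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    dm = (data[:, None] - data[None, :]) / h            # (X_i - X_j)/h
    dp = (data[:, None] + data[None, :]) / h            # (X_i + X_j)/h
    n = len(data)
    return (phi(dm) - phi(dp)).sum() / (n**2 * h)       # ~ 0 under symmetry
```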

where $\hat\sigma_n^2 = \frac4n\sum_{i=1}^n\hat f(X_i)\int k(u)^2\,du$ and $\xi_n = \frac{1}{n^2h}\sum_{i=1}^n\left[k(0) - k\!\left(\frac{2X_i}{h}\right)\right]$ is used to correct for finite-sample bias. One can prove the above theorem with a simple application of the CLT for degenerate second-order U-statistics; see Theorem 1.9 in the appendix and Exercise 4 in this chapter.

1.4.3 Comparison with Unknown Densities

Comparison of two densities is important in some empirical work. For example, we may be interested in comparing income distributions across two groups, regions, or time periods. Let $\{X_i\}_{i=1}^{n_1}$ and $\{Y_i\}_{i=1}^{n_2}$ be two samples of $d$-dimensional random vectors. Assume that $X$ and $Y$ have densities $f$ and $g$ and distribution functions $F$ and $G$, respectively. The null hypothesis of interest is $H_0: f(x) = g(x)$ a.e. Noticing that

$$I(f, g) = \int\left[f(x) - g(x)\right]^2dx = \int f\,dF + \int g\,dG - \int g\,dF - \int f\,dG,$$

we can propose a feasible test statistic by replacing $f$, $g$, $F$, and $G$ with $\hat f$, $\hat g$, $\hat F$, and $\hat G$, respectively, where $\hat f(x) = \frac{1}{n_1}\sum_{i=1}^{n_1}K_h(X_i - x)$ and $\hat g(x) = \frac{1}{n_2}\sum_{i=1}^{n_2}K_h(Y_i - x)$, with $K_h(u) = \prod_{s=1}^d h^{-1}k(u_s/h)$, and $\hat F$ and $\hat G$ are the empirical distribution functions of $\{X_i\}_{i=1}^{n_1}$ and $\{Y_i\}_{i=1}^{n_2}$. This leads to

$$\hat I = \int\hat f\,d\hat F + \int\hat g\,d\hat G - \int\hat g\,d\hat F - \int\hat f\,d\hat G = \frac{1}{n_1}\sum_{i=1}^{n_1}\hat f(X_i) + \frac{1}{n_2}\sum_{i=1}^{n_2}\hat g(Y_i) - \frac{1}{n_1}\sum_{i=1}^{n_1}\hat g(X_i) - \frac{1}{n_2}\sum_{i=1}^{n_2}\hat f(Y_i).$$

The following theorem states the main result.

Theorem 1.5. Under some regularity conditions and $H_0$, we have

$$T_n = \frac{\left(n_1n_2h^d\right)^{1/2}\big(\hat I - \xi_n\big)}{\hat\sigma_n} \xrightarrow{d} N(0, 1),$$

where $\xi_n$ is a finite-sample bias-correction term involving $K(0) = [k(0)]^d$ and $\hat\sigma_n^2$ is a consistent estimator of the asymptotic variance built from the pairwise kernel weights within and across the two samples. For a proof of the above result, and for the exact expressions of $\xi_n$ and $\hat\sigma_n^2$, see Li and Racine (2006); for a variant of the above test, see Li (1996).

1.4.4 Testing for Independence

Let $(Y, Z)'$ be a $(d_1 + d_2)$-dimensional random vector with joint CDF $F(y, z)$ and PDF $f(y, z)$. Further, let $F_1(y)$ and $F_2(z)$ denote the marginal CDFs of $Y$ and $Z$, with marginal PDFs $f_1(y)$ and $f_2(z)$, respectively. The null hypothesis of interest is $H_0: f(y, z) = f_1(y)\,f_2(z)$ a.e. Observing that

$$I = \int\left[f(y, z) - f_1(y)\,f_2(z)\right]^2dy\,dz = E[f(Y, Z)] + E[f_1(Y)]\,E[f_2(Z)] - 2E[f_1(Y)\,f_2(Z)],$$

we can propose a feasible test statistic by replacing $f(\cdot,\cdot)$, $f_1(\cdot)$, and $f_2(\cdot)$ with their leave-one-out kernel estimators $\hat f_{-i}(\cdot,\cdot)$, $\hat f_{1,-i}(\cdot)$, and $\hat f_{2,-i}(\cdot)$. This leads to the following expression:

$$\hat I_n = \frac1n\sum_{i=1}^n\hat f_{-i}(Y_i, Z_i) + \frac{1}{n^2}\sum_{i=1}^n\hat f_{1,-i}(Y_i)\sum_{j=1}^n\hat f_{2,-j}(Z_j) - \frac2n\sum_{i=1}^n\hat f_{1,-i}(Y_i)\,\hat f_{2,-i}(Z_i),$$

where

$$\hat f_{-i}(Y_i, Z_i) = \frac{1}{n-1}\sum_{j\ne i}K_h(Y_j - Y_i)\,\bar K_h(Z_j - Z_i), \qquad \hat f_{1,-i}(Y_i) = \frac{1}{n-1}\sum_{j\ne i}K_h(Y_j - Y_i),$$

and $\hat f_{2,-i}(Z_i) = \frac{1}{n-1}\sum_{j\ne i}\bar K_h(Z_j - Z_i)$, with $K_h(u) = \prod_{s=1}^{d_1}h^{-1}k(u_s/h)$ and $\bar K_h(v) = \prod_{s=1}^{d_2}h^{-1}k(v_s/h)$. Under certain conditions, Ahmad and Li (1997) prove the following theorem.

Theorem 1.6. Under some regularity conditions and $H_0$, we have

$$T_n = \frac{nh^{(d_1+d_2)/2}\,\hat I_n}{\hat\sigma_n} \xrightarrow{d} N(0, 1),$$

where $\hat\sigma_n^2$ is a consistent estimator of the asymptotic variance built from the pairwise kernel weights; see Ahmad and Li (1997) for the exact expression. Like the other nonparametric tests introduced in this section, large values of $T_n$ favor the alternative, and we reject the null hypothesis if $T_n > z_\alpha$, the upper $\alpha$-percentile of the standard normal distribution.

It is worth mentioning that the theories on kernel density estimation and testing developed above go through under weak data-dependence conditions. In the next application, we consider testing for structural change in a time series framework.

1.4.5 Test for Structural Change in Densities

Since Page (1956), the problem of testing for a structural change has generated much interest in both statistics and econometrics. Early studies mainly focused on the case of parameter change in a parametric framework; more recently, much attention has been paid to testing for structural change at the level of the whole distribution or density. Let $\{X_t\}$ be a stationary strong mixing process satisfying

$$\alpha(\tau) = \sup_t\,\sup\left\{|P(A\cap B) - P(A)\,P(B)| : A \in \mathcal F_{-\infty}^t,\ B \in \mathcal F_{t+\tau}^\infty\right\} \to 0 \quad\text{as } \tau \to \infty,$$

where $\mathcal F_a^b = \sigma(X_a, \dots, X_b)$ is the $\sigma$-field generated by $X_a, \dots, X_b$. We wish to test for a change in the marginal density of $\{X_t\}_{t=1}^n$. The null hypothesis is

$$H_0: X_1, \dots, X_n \text{ have a common marginal density } f;$$

and the alternative hypothesis is

$$H_1: \text{for some } \tau^* \in (0, 1),\ X_1, \dots, X_{\lfloor n\tau^*\rfloor} \text{ have a common density } f_1 \text{ and } X_{\lfloor n\tau^*\rfloor+1}, \dots, X_n \text{ have a common density } f_2,$$

where $\lfloor a\rfloor$ denotes the largest integer less than or equal to $a$, and $f$, $f_1$, and $f_2$ are all assumed unknown. To test $H_0$, define the kernel density estimates based on the first $\lfloor n\tau\rfloor$ and the remaining $n - \lfloor n\tau\rfloor$ observations,

$$\hat f_{\lfloor n\tau\rfloor}(x) = \frac{1}{\lfloor n\tau\rfloor h}\sum_{t=1}^{\lfloor n\tau\rfloor}k\!\left(\frac{X_t - x}{h}\right) \qquad\text{and}\qquad \tilde f_{\lfloor n\tau\rfloor}(x) = \frac{1}{(n - \lfloor n\tau\rfloor)h}\sum_{t=\lfloor n\tau\rfloor+1}^n k\!\left(\frac{X_t - x}{h}\right),$$

and, provided $f(x) \ne 0$, the standardized difference process

$$W_n(\tau, x) = \frac{\lfloor n\tau\rfloor\,(n - \lfloor n\tau\rfloor)}{n^{3/2}}\cdot\frac{\hat f_{\lfloor n\tau\rfloor}(x) - \tilde f_{\lfloor n\tau\rfloor}(x)}{\sqrt{h^{-1}f(x)\int k(u)^2\,du}}; \qquad (1.48)$$

if $f(x) = 0$, (1.48) is defined to be zero. In practice, the unknown quantities in the normalization are replaced by consistent estimates; the precise construction follows Lee and Na (2004). Under the null $H_0$, $W_n(\tau, x)$ can be written in terms of a partial-sum process of the kernel weights, and Lee and Na (2004) show that, for fixed $x$, this suitably normalized partial-sum process converges weakly to a standard Brownian motion, which implies that $\{W_n(\tau, x) : 0 \le \tau \le 1\}$ converges to a Brownian bridge.

Let $x_1, \dots, x_m$ be distinct real numbers, and define

$$T_n = \max_{1\le j\le m}\ \sup_{0\le\tau\le1}\left|W_n(\tau, x_j)\right|.$$

Lee and Na (2004) prove the following theorem.

Theorem 1.7. Suppose the regularity conditions given in Lee and Na (2004) hold.

(i) Under $H_0$, as $n \to \infty$,

$$T_n \xrightarrow{d} \max_{1\le j\le m}\ \sup_{0\le\tau\le1}\left|B_j^0(\tau)\right|,$$

where $B_1^0, \dots, B_m^0$ are independent Brownian bridges.

(ii) Under $H_1$, as $n \to \infty$, $T_n \xrightarrow{p} \infty$ if $f_1(x_j) \ne f_2(x_j)$ for some $j \in \{1, \dots, m\}$.

Thus we reject the null if $T_n$ is large enough. In practice, one can tabulate the critical values based on simulations of Brownian bridges.


More information

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA This article was downloaded by: [University of New Mexico] On: 27 September 2012, At: 22:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Locally Robust Semiparametric Estimation

Locally Robust Semiparametric Estimation Locally Robust Semiparametric Estimation Victor Chernozhukov Juan Carlos Escanciano Hidehiko Ichimura Whitney K. Newey The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper

More information

Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects

Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects Liangjun Su and Yonghui Zhang School of Economics, Singapore Management University School of Economics, Renmin

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Multiscale Adaptive Inference on Conditional Moment Inequalities

Multiscale Adaptive Inference on Conditional Moment Inequalities Multiscale Adaptive Inference on Conditional Moment Inequalities Timothy B. Armstrong 1 Hock Peng Chan 2 1 Yale University 2 National University of Singapore June 2013 Conditional moment inequality models

More information

Kernel Density Estimation

Kernel Density Estimation Kernel Density Estimation and Application in Discriminant Analysis Thomas Ledl Universität Wien Contents: Aspects of Application observations: 0 Which distribution? 0?? 0.0 0. 0. 0. 0.0 0. 0. 0 0 0.0

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

More on Estimation. Maximum Likelihood Estimation.

More on Estimation. Maximum Likelihood Estimation. More on Estimation. In the previous chapter we looked at the properties of estimators and the criteria we could use to choose between types of estimators. Here we examine more closely some very popular

More information

Does k-th Moment Exist?

Does k-th Moment Exist? Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Nonparametric Cointegrating Regression with Endogeneity and Long Memory

Nonparametric Cointegrating Regression with Endogeneity and Long Memory Nonparametric Cointegrating Regression with Endogeneity and Long Memory Qiying Wang School of Mathematics and Statistics TheUniversityofSydney Peter C. B. Phillips Yale University, University of Auckland

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density ISSN 1440-771X Australia Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Bayesian estimation of bandwidths for a nonparametric regression model

More information

Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects

Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects Liangjun Su and Yonghui Zhang School of Economics, Singapore Management University School of Economics, Renmin

More information

The Nonparametric Bootstrap

The Nonparametric Bootstrap The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

A Novel Nonparametric Density Estimator

A Novel Nonparametric Density Estimator A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with

More information

Averaging Estimators for Regressions with a Possible Structural Break

Averaging Estimators for Regressions with a Possible Structural Break Averaging Estimators for Regressions with a Possible Structural Break Bruce E. Hansen University of Wisconsin y www.ssc.wisc.edu/~bhansen September 2007 Preliminary Abstract This paper investigates selection

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006 Analogy Principle Asymptotic Theory Part II James J. Heckman University of Chicago Econ 312 This draft, April 5, 2006 Consider four methods: 1. Maximum Likelihood Estimation (MLE) 2. (Nonlinear) Least

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Inference via Kernel Smoothing of Bootstrap P Values

Inference via Kernel Smoothing of Bootstrap P Values Queen s Economics Department Working Paper No. 1054 Inference via Kernel Smoothing of Bootstrap P Values Jeff Racine McMaster University James G. MacKinnon Queen s University Department of Economics Queen

More information

Three Papers by Peter Bickel on Nonparametric Curve Estimation

Three Papers by Peter Bickel on Nonparametric Curve Estimation Three Papers by Peter Bickel on Nonparametric Curve Estimation Hans-Georg Müller 1 ABSTRACT The following is a brief review of three landmark papers of Peter Bickel on theoretical and methodological aspects

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Density Estimation (II)

Density Estimation (II) Density Estimation (II) Yesterday Overview & Issues Histogram Kernel estimators Ideogram Today Further development of optimization Estimating variance and bias Adaptive kernels Multivariate kernel estimation

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

A Goodness-of-fit Test for Copulas

A Goodness-of-fit Test for Copulas A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Chapter 2: Resampling Maarten Jansen

Chapter 2: Resampling Maarten Jansen Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Maximum-Likelihood Estimation: Basic Ideas

Maximum-Likelihood Estimation: Basic Ideas Sociology 740 John Fox Lecture Notes Maximum-Likelihood Estimation: Basic Ideas Copyright 2014 by John Fox Maximum-Likelihood Estimation: Basic Ideas 1 I The method of maximum likelihood provides estimators

More information

V. Properties of estimators {Parts C, D & E in this file}

V. Properties of estimators {Parts C, D & E in this file} A. Definitions & Desiderata. model. estimator V. Properties of estimators {Parts C, D & E in this file}. sampling errors and sampling distribution 4. unbiasedness 5. low sampling variance 6. low mean squared

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Measure-Transformed Quasi Maximum Likelihood Estimation

Measure-Transformed Quasi Maximum Likelihood Estimation Measure-Transformed Quasi Maximum Likelihood Estimation 1 Koby Todros and Alfred O. Hero Abstract In this paper, we consider the problem of estimating a deterministic vector parameter when the likelihood

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Local Polynomial Regression

Local Polynomial Regression VI Local Polynomial Regression (1) Global polynomial regression We observe random pairs (X 1, Y 1 ),, (X n, Y n ) where (X 1, Y 1 ),, (X n, Y n ) iid (X, Y ). We want to estimate m(x) = E(Y X = x) based

More information

Kullback-Leibler Designs

Kullback-Leibler Designs Kullback-Leibler Designs Astrid JOURDAN Jessica FRANCO Contents Contents Introduction Kullback-Leibler divergence Estimation by a Monte-Carlo method Design comparison Conclusion 2 Introduction Computer

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:

More information

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure) Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value

More information

IEOR E4703: Monte-Carlo Simulation

IEOR E4703: Monte-Carlo Simulation IEOR E4703: Monte-Carlo Simulation Output Analysis for Monte-Carlo Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Output Analysis

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Lecture 2: CDF and EDF

Lecture 2: CDF and EDF STAT 425: Introduction to Nonparametric Statistics Winter 2018 Instructor: Yen-Chi Chen Lecture 2: CDF and EDF 2.1 CDF: Cumulative Distribution Function For a random variable X, its CDF F () contains all

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

SINGLE-STEP ESTIMATION OF A PARTIALLY LINEAR MODEL

SINGLE-STEP ESTIMATION OF A PARTIALLY LINEAR MODEL SINGLE-STEP ESTIMATION OF A PARTIALLY LINEAR MODEL DANIEL J. HENDERSON AND CHRISTOPHER F. PARMETER Abstract. In this paper we propose an asymptotically equivalent single-step alternative to the two-step

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

A COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES

A COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES A COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES MICHAEL O HARA AND CHRISTOPHER F. PARMETER Abstract. This paper presents a Monte Carlo comparison of

More information