7 Semiparametric Estimation of Additive Models

Additive models are very useful for approximating high-dimensional regression mean functions. They and their extensions have become one of the most widely used nonparametric techniques since the excellent monograph by Hastie and Tibshirani (1990) and the companion software described in Chambers and Hastie (1991). For a recent survey on additive models, see Horowitz (2014).

Much applied research in economics and statistics is concerned with the estimation of a conditional mean or quantile function. Specifically, let $(X, Y)$ be a random pair, where $Y$ is a scalar random variable and $X$ is a $d \times 1$ random vector that is continuously distributed. We are interested in the estimation of either the conditional mean function $m(x) \equiv E(Y \mid X = x)$ or the $\tau$th conditional quantile function $m_\tau(x) = \arg\min_a E[\rho_\tau(Y - a) \mid X = x]$ of $Y$ given $X = x$, which satisfies $P(Y \le m_\tau(x) \mid X = x) = \tau$.

In a classical nonparametric additive model, $m$ or $m_\tau$ is assumed to have the form

$$m(x) = c_0 + \sum_{j=1}^d m_j(x_j) \qquad (7.1)$$

or

$$m_\tau(x) = c_0 + \sum_{j=1}^d m_j(x_j), \qquad (7.2)$$

where $c_0$ is a constant, $x_j$ is the $j$th element of $x$, and $m_1, \ldots, m_d$ are one-dimensional smooth functions that are unknown and estimated nonparametrically. Model (7.1) or (7.2) can be extended to

$$m(x) = G\Big(c_0 + \sum_{j=1}^d m_j(x_j)\Big) \qquad (7.3)$$

or

$$m_\tau(x) = G\Big(c_0 + \sum_{j=1}^d m_j(x_j)\Big), \qquad (7.4)$$

where $G$ is a strictly increasing function that may be known or unknown. Below we focus on the estimation of model (7.1) and its extensions, and then briefly touch upon the models in (7.2)-(7.4).

7.1 The Additive Model and the Backfitting Algorithm

7.1.1 The Basic Additive Model

In the regression framework, a simple additive model is defined by

$$Y = c_0 + \sum_{j=1}^d m_j(X_j) + \varepsilon, \qquad (7.5)$$

where $E(\varepsilon \mid X) = 0$, $\sigma^2(x) = \mathrm{Var}(\varepsilon \mid X = x)$, and the $m_j$'s are arbitrary univariate functions that are assumed to be smooth and unknown. Note that we can add a constant to a component $m_j$ or to $c_0$ and subtract the same constant from another component in (7.5). Thus the $m_j$'s and $c_0$ are not identified without further restrictions. To prevent ambiguity, various identification conditions can be assumed.

For example, one can assume that either

$$E[m_j(X_j)] = 0, \quad j = 1, \ldots, d, \qquad (7.6)$$

or

$$m_j(0) = 0, \quad j = 1, \ldots, d, \qquad (7.7)$$

or

$$\int m_j(x_j)\, dx_j = 0, \quad j = 1, \ldots, d, \qquad (7.8)$$

whichever is convenient for the estimation method at hand. We also assume that the $m_j$'s are smooth functions, so that they can be estimated as well as in the one-dimensional nonparametric regression problem (Stone, 1985, 1986). Hence the curse of dimensionality is avoided. Frequently we will write

$$m(x) = E(Y \mid X = x) = c_0 + \sum_{j=1}^d m_j(x_j), \qquad (7.9)$$

where $x = (x_1, \ldots, x_d)'$ and $X = (X_1, \ldots, X_d)'$.

Model (7.5) allows us to examine the extent of the nonlinear contribution of each explanatory variable to the dependent variable $Y$. Under the identification conditions that $E[m_j(X_j)] = 0$ for each $j = 1, \ldots, d$ and $E(\varepsilon \mid X) = 0$, we have $c_0 = E(Y)$, so that the single finite-dimensional parameter $c_0$ can be estimated by the sample mean $\bar Y = n^{-1}\sum_{i=1}^n Y_i$. Since $\bar Y$ converges to $c_0$ at the parametric $\sqrt{n}$-rate, which is faster than any nonparametric convergence rate, below we simply work with the model without $c_0$ in (7.5) by assuming $E(Y) = 0$.

Additive models of the form (7.5) have been shown to be useful in practice. They naturally generalize linear regression models and allow interpretation of marginal changes, i.e., the effect of one variable, say $X_j$, on the conditional mean function $m(x)$, holding everything else constant. They are also interesting from a theoretical perspective since they combine flexible nonparametric modeling of many variables with the statistical precision that is typical of just one explanatory variable.

Example 7.1 (Additive AR(p) models) In the time series literature, a useful class of nonlinear autoregressive models are the additive models

$$Y_t = \sum_{j=1}^p m_j(Y_{t-j}) + \varepsilon_t. \qquad (7.10)$$

In this case, the model is called an additive autoregressive model of order $p$ and denoted AAR($p$). In particular, it includes the AR($p$) model as a special case and allows us to test whether an AR($p$) model holds reasonably well for a given time series; see the simulation sketch at the end of this subsection.

Restricting ourselves to the class of additive models (7.5), the prediction error can be written as

$$E\Big[Y - c_0 - \sum_{j=1}^d m_j(X_j)\Big]^2 = E[Y - m(X)]^2 + E\Big[m(X) - c_0 - \sum_{j=1}^d m_j(X_j)\Big]^2, \qquad (7.11)$$

where $m(X) = E(Y \mid X)$. Thus finding the best additive model to minimize the least squares prediction error is equivalent to finding the one that best approximates the conditional mean function, in the sense that $c_0$ and the $m_j$'s minimize the second term in (7.11). In the case where the additive model is not correctly specified (i.e., $\Pr\big(m(X) = c_0 + \sum_{j=1}^d m_j(X_j)\big) < 1$), we can interpret it as an approximation to the conditional mean function.
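To make Example 7.1 concrete, the following minimal sketch simulates an AAR(2) process; the particular component functions and error scale are illustrative assumptions, not taken from the text.

```python
# Simulate the additive autoregressive model (7.10) with p = 2.
# The component functions below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, burn = 500, 100
y = np.zeros(n + burn)
for t in range(2, n + burn):
    y[t] = np.sin(y[t - 1]) - 0.5 * np.tanh(y[t - 2]) + 0.5 * rng.standard_normal()
y = y[burn:]  # drop the burn-in so the retained series is near-stationary
```

Because both components are bounded, the recursion is stable; fitting an AAR(2) to the simulated series and comparing it with a linear AR(2) fit is exactly the kind of specification check mentioned above.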

7.1.2 The Backfitting Algorithm

The estimation of $m_1, \ldots, m_d$ can easily be done using the backfitting algorithm from the nonparametric literature. To do so, we first introduce some background on global spline approximation.

Local linear modelling cannot be directly applied to fit the additive model (7.5) with $c_0 = 0$. To approximate the unknown functions $m_1, \ldots, m_d$ locally at the point $(x_1, \ldots, x_d)$, we would need to localize simultaneously in the variables $x_1, \ldots, x_d$. This yields a $d$-dimensional hypercube that contains hardly any data points for small to moderate sample sizes unless the neighborhood is very large; but when the neighborhood is large, the approximation error is large. This is the key problem underlying the curse of dimensionality.

To attenuate the problem, we can approximate the nonlinear functions $m_1, \ldots, m_d$ by polynomial splines or Hermite polynomials, among others. For example, we can approximate $m_j(x_j)$ by

$$g_j(x_j, \beta_j) = \sum_{k=1}^{K_j} \beta_{jk} B_{jk}(x_j), \qquad (7.12)$$

where $\beta_j = (\beta_{j1}, \ldots, \beta_{jK_j})'$ and the $\{B_{jk}(\cdot)\}$ can be chosen from various bases of functions, including the trigonometric series $\{\sin(x), \cos(x), \ldots\}$, the polynomial series $\{1, x, x^2, \ldots\}$, and Gallant's (1982) flexible Fourier form $\{x, x^2, \sin(x), \cos(x), \sin(2x), \cos(2x), \ldots\}$. Below we introduce two popular choices of approximating functions, namely polynomial splines and Hermite polynomials.

Spline methods are very useful for nonparametric modelling. They are based on global approximation and are useful extensions of polynomial regression techniques. Let $t_1 < \cdots < t_J$ be a sequence of given knots. The knots can be chosen either by the researcher or by the data. A spline function of order $r$ is an $(r-2)$-times continuously differentiable function whose restriction to each of the intervals $(-\infty, t_1], [t_1, t_2), \ldots, [t_J, \infty)$ is a polynomial of degree $r-1$. The following formal definition is adapted from Eubank (1999, p. 281).

Definition 7.1 A polynomial spline function $s(x)$ of order $r$ with knots $t_1 < \cdots < t_J$ is a function of the form

$$s(x) = \sum_{k=1}^{r+J} \theta_k s_k(x) \qquad (7.13)$$

for some set of coefficients $\theta_1, \ldots, \theta_{r+J}$, where

$$s_k(x) = x^{k-1}, \quad k = 1, \ldots, r, \qquad s_{r+k}(x) = (x - t_k)_+^{r-1}, \quad k = 1, \ldots, J, \qquad (7.14)$$

and $(x - t_k)_+^{r-1} = \max\{(x - t_k)^{r-1}, 0\}$.

The above definition is equivalent to saying that (i) $s$ is a piecewise polynomial of degree $r-1$ on any subinterval $[t_k, t_{k+1})$; (ii) $s$ has $r-2$ continuous derivatives; and (iii) $s$ has a discontinuous $(r-1)$st derivative with jumps at the knots. Thus a spline is a piecewise polynomial whose polynomial segments are joined together at the knots in a fashion that ensures these continuity properties. Let $S(r; t_1, \ldots, t_J)$ denote the set of all functions of the form (7.13). Then $S(r; t_1, \ldots, t_J)$ is a vector space, in the sense that sums of functions in $S(r; t_1, \ldots, t_J)$ remain in the set, etc. Since the functions $1, x, \ldots, x^{r-1}, (x - t_1)_+^{r-1}, \ldots, (x - t_J)_+^{r-1}$ are linearly independent, it follows that $S(r; t_1, \ldots, t_J)$ has dimension $r + J$.
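The truncated power basis of Definition 7.1 is straightforward to construct. The following is a minimal sketch in NumPy; the function name spline_basis and the quantile-based knot choice are our own conventions, not from the text.

```python
# Truncated power basis of (7.13)-(7.14): columns 1, x, ..., x^(r-1),
# followed by (x - t_k)_+^(r-1) for each knot t_k. `order` is the spline
# order r, so each polynomial piece has degree r - 1.
import numpy as np

def spline_basis(x, knots, order=4):
    x = np.asarray(x, dtype=float)
    powers = np.column_stack([x**k for k in range(order)])
    truncated = np.column_stack([np.maximum(x - t, 0.0)**(order - 1) for t in knots])
    return np.hstack([powers, truncated])      # n x (r + J) design matrix

# Example: order r = 2 gives the continuous piecewise linear splines of
# Example 7.2 below, with knots at empirical quantiles.
x = np.random.uniform(-1, 1, 200)
knots = np.quantile(x, [0.25, 0.50, 0.75])
B = spline_basis(x, knots, order=2)            # shape (200, 2 + 3)
```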

Example 7.2 (Polynomial splines) Returning to our case, we can approximate $m_j$ ($j = 1, \ldots, d$) by a polynomial spline of order $r$ with knots $\{t_{j1}, \ldots, t_{jJ}\}$:

$$m_j(x_j) \simeq \sum_{k=0}^{r-1} \alpha_{jk} x_j^k + \sum_{k=1}^{J} \beta_{jk}(x_j - t_{jk})_+^{r-1} \equiv g_j(x_j, \theta_j). \qquad (7.15)$$

In applications, for any given number of knots $J$, the knots $\{t_{jk}\}$ can simply be chosen as empirical quantiles of $\{X_{ij}\}_{i=1}^n$, i.e., $t_{jk}$ = the $k/(J+1)$th quantile of $\{X_{ij}\}_{i=1}^n$ for $k = 1, \ldots, J$. When the knots are fine enough on the support of $X_j$, which is usually assumed to be compact, the resulting spline function $g_j(\cdot, \theta_j)$ can approximate the smooth function $m_j$ quite well. Two popular choices of $r$ are $r = 2$ and $r = 4$. When $r = 2$, $g_j(x_j, \theta_j)$ is simply the piecewise linear function

$$g_j(x_j, \theta_j) = \alpha_{j0} + \alpha_{j1} x_j + \sum_{k=1}^{J} \beta_{jk}(x_j - t_{jk})_+. \qquad (7.16)$$

One can easily verify that $g_j(\cdot, \theta_j)$ is piecewise linear and continuous, with kinks at the knots $t_{j1}, \ldots, t_{jJ}$.

Example 7.3 (Hermite polynomials of order $K$) We can also approximate the unknown function $m_j$ ($j = 1, \ldots, d$) by Hermite polynomials of order $K$:

$$m_j(x_j) \simeq \sum_{k=0}^{K} \beta_{jk}\Big(\frac{x_j - \mu_j}{\sigma_j}\Big)^k \exp\Big(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\Big) \equiv g_j(x_j, \beta_j), \qquad (7.17)$$

where $\mu_j$ and $\sigma_j^2$ can be chosen as the sample mean and sample variance of the data $\{X_{ij}\}_{i=1}^n$. Hermite polynomials are often chosen when the underlying variables have infinite support.

After the approximation, we can estimate the unknown parameters by the least squares method. That is, we choose $\beta_1, \ldots, \beta_d$ to minimize the criterion function

$$\frac{1}{n}\sum_{i=1}^n\{Y_i - g_1(X_{i1}, \beta_1) - \cdots - g_d(X_{id}, \beta_d)\}^2. \qquad (7.18)$$

Let the solution be $\hat\beta_1, \ldots, \hat\beta_d$. Then the estimated functions are simply

$$\hat m_j(x_j) = g_j(x_j, \hat\beta_j), \quad j = 1, \ldots, d. \qquad (7.19)$$

The above least squares problem can be solved directly, resulting in a large parametric problem requiring the inversion of a matrix of high order (see the sketch below). Alternatively, the optimization problem can be solved using the backfitting algorithm. Conditional expectations provide a simple intuitive motivation for backfitting. If the additive model (7.5) is correct with $c_0 = 0$ (otherwise replace $Y$ by $Y$ minus its sample mean), then for any $j = 1, \ldots, d$,

$$E\Big[Y - \sum_{k \ne j} m_k(X_k)\,\Big|\, X_j\Big] = m_j(X_j). \qquad (7.20)$$
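Before turning to backfitting, here is a hedged sketch of the direct least squares fit (7.18)-(7.19), reusing spline_basis() from the sketch above; the simulated design and the two component functions are illustrative assumptions.

```python
# Direct least squares fit of the additive model via stacked spline bases.
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.uniform(-1, 1, size=(n, 2))
Y = np.sin(np.pi * X[:, 0]) + (X[:, 1]**2 - 1/3) + 0.3 * rng.standard_normal(n)

knots = [np.quantile(X[:, j], [0.2, 0.4, 0.6, 0.8]) for j in range(2)]
# Drop each basis's constant column; the intercept is handled by centering Y,
# which also keeps the stacked design of (7.18) from being rank deficient.
bases = [spline_basis(X[:, j], knots[j], order=4)[:, 1:] for j in range(2)]
B = np.hstack(bases)
beta, *_ = np.linalg.lstsq(B, Y - Y.mean(), rcond=None)

cols = np.cumsum([0] + [b.shape[1] for b in bases])
def m_hat(j, x_new):
    """Estimated j-th component (7.19), centered in the spirit of (7.6)."""
    Bj = spline_basis(x_new, knots[j], order=4)[:, 1:]
    fit = Bj @ beta[cols[j]:cols[j + 1]]
    return fit - fit.mean()
```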

Equation (7.20) immediately suggests an iterative algorithm for computing $\hat m_1, \ldots, \hat m_d$:

Step 1. Given initial values of $\hat\beta_2, \ldots, \hat\beta_d$ (say, from the direct least squares solution), minimize (7.18) with respect to $\beta_1$. This is a much smaller parametric problem and can be solved relatively easily.

Step 2. With the estimated value $\hat\beta_1$ and the values $\hat\beta_3, \ldots, \hat\beta_d$, minimize (7.18) with respect to $\beta_2$. This yields an updated estimate $\hat\beta_2$. Repeat this exercise until $\hat\beta_d$ is updated.

Step 3. Repeat Steps 1-2 until a convergence criterion is met.

This is the basic idea of the backfitting algorithm (Ezekiel, 1924; Buja et al., 1989). Let $\hat\varepsilon_i^{(j)} = Y_i - \sum_{k \ne j} g_k(X_{ik}, \hat\beta_k)$ be the partial residuals computed without using the regressor $X_j$. Then the backfitting algorithm finds $\hat\beta_j$ by minimizing

$$\frac{1}{n}\sum_{i=1}^n\{\hat\varepsilon_i^{(j)} - g_j(X_{ij}, \beta_j)\}^2. \qquad (7.21)$$

This is a nonparametric regression of $\hat\varepsilon_i^{(j)}$ on the variable $X_j$. The resulting estimate is linear in the partial residuals $\{\hat\varepsilon_i^{(j)}\}$ and can be written as

$$\begin{pmatrix}\hat m_j(X_{1j})\\ \hat m_j(X_{2j})\\ \vdots\\ \hat m_j(X_{nj})\end{pmatrix} = S_j\begin{pmatrix}\hat\varepsilon_1^{(j)}\\ \hat\varepsilon_2^{(j)}\\ \vdots\\ \hat\varepsilon_n^{(j)}\end{pmatrix}, \qquad (7.22)$$

where $S_j$ is an $n \times n$ smoothing matrix. For ease of presentation, denote the left-hand side of (7.22) by $\hat{\mathbf g}_j$ and write $\mathbf Y = (Y_1, \ldots, Y_n)'$. Then (7.22) can be written as

$$\hat{\mathbf g}_j = S_j\Big(\mathbf Y - \sum_{k \ne j}\hat{\mathbf g}_k\Big). \qquad (7.23)$$

The above example uses polynomial splines or Hermite polynomials as the nonparametric smoother, but the idea applies to any nonparametric smoother. Let $S_j$ be the smoothing matrix obtained by regressing the partial residuals $\{\hat\varepsilon_i^{(j)}\}$ nonparametrically on $\{X_{ij}\}$. The general backfitting algorithm can be outlined as follows; a code sketch appears after the steps.

Step 1. Initialize the functions $\hat{\mathbf g}_1, \ldots, \hat{\mathbf g}_d$.

Step 2. For $j = 1, \ldots, d$, compute $\hat{\mathbf g}_j = S_j(\mathbf Y - \sum_{k \ne j}\hat{\mathbf g}_k)$ and center the estimator to obtain

$$\hat g_j(X_{ij}) = \hat g_j(X_{ij}) - \frac{1}{n}\sum_{l=1}^n \hat g_j(X_{lj}), \qquad (7.24)$$

where $\hat g_j(X_{ij})$ denotes the $i$th element of $\hat{\mathbf g}_j$.

Step 3. Repeat Step 2 until convergence.

See Hastie and Tibshirani (1990, p. 91) for a discussion of the above algorithm. The re-centering in Step 2 enforces the constraint in (7.6).
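The following minimal sketch implements the general backfitting loop with a Nadaraya-Watson smoother standing in for $S_j$; any linear smoother could be substituted, and the Gaussian kernel and fixed bandwidth are illustrative assumptions.

```python
# General backfitting (Steps 1-3) with an NW smoother playing the role of S_j.
import numpy as np

def nw_smooth(x, r, h):
    """Row-normalized Gaussian-kernel smoother matrix applied to r."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h)**2)
    return (W / W.sum(axis=1, keepdims=True)) @ r

def backfit(X, Y, h=0.2, max_iter=100, tol=1e-6):
    n, d = X.shape
    g = np.zeros((n, d))                     # Step 1: initialize g_1, ..., g_d
    Yc = Y - Y.mean()                        # absorb c_0 by centering Y
    for _ in range(max_iter):
        g_old = g.copy()
        for j in range(d):                   # Step 2: cycle over components
            partial = Yc - g.sum(axis=1) + g[:, j]   # partial residuals
            g[:, j] = nw_smooth(X[:, j], partial, h)
            g[:, j] -= g[:, j].mean()        # re-center as in (7.24)
        if np.max(np.abs(g - g_old)) < tol:  # Step 3: stop at convergence
            break
    return g                                 # g[:, j] estimates m_j at X[:, j]
```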

The convergence of the backfitting algorithm is a delicate issue and has been addressed via the concept of concurvity by Buja et al. (1989). Concurvity is the analogue of collinearity in linear regression models. Assuming that concurvity is not present, Buja et al. (1989) show that the backfitting algorithm converges and solves the system of equations

$$\begin{pmatrix} I & S_1 & \cdots & S_1\\ S_2 & I & \cdots & S_2\\ \vdots & & \ddots & \vdots\\ S_d & S_d & \cdots & I\end{pmatrix}\begin{pmatrix}\hat{\mathbf g}_1\\ \hat{\mathbf g}_2\\ \vdots\\ \hat{\mathbf g}_d\end{pmatrix} = \begin{pmatrix}S_1\mathbf Y\\ S_2\mathbf Y\\ \vdots\\ S_d\mathbf Y\end{pmatrix}. \qquad (7.25)$$

Solving (7.25) directly involves inverting an $nd \times nd$ matrix and can hardly be implemented on an average computer for moderate to large sample sizes. In contrast, backfitting does not share this drawback and is frequently used in practical implementations. See Mammen, Linton, and Nielsen (1999, Annals of Statistics) for the existence and asymptotic properties of a backfitting projection algorithm under weak conditions.

7.1.3 Generalized Additive Models: Logistic Regression

As Hastie and Tibshirani (1990) remark, the linear model is used for regression in a wide variety of contexts other than ordinary regression, including log-linear models, logistic regression, the proportional-hazards model for survival data, models for ordinal categorical responses, and transformation models. It is a convenient but crude first-order approximation to the regression function, and in many cases it is adequate. The additive model can be used to generalize all these models in an obvious way. For clarity, we focus on the logistic regression model.

In this setting the response variable $Y$ is dichotomous, such as yes/no, survived/died, or increase/decrease, and the data analysis is aimed at relating this outcome to the predictors. One quantity of interest is the proportion of outcomes as a function of the predictors (explanatory variables). In linear modelling of binary data, the most popular approach is logistic regression, which models the logit of the response probability in a linear form:

$$\mathrm{logit}\{p(x)\} \equiv \log\Big\{\frac{p(x)}{1 - p(x)}\Big\} = x'\beta, \qquad (7.26)$$

where $p(x) = P(Y = 1 \mid X = x)$. Alternatively, we can write

$$p(x) = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}. \qquad (7.27)$$

There are several reasons for its popularity, but the most compelling is that the logit model ensures that the proportions $p(x)$ lie in $(0, 1)$ (see (7.27)) without any constraints on the linear predictor $x'\beta$. We can generalize the model in (7.26) by replacing the linear predictor with an additive one,

$$\log\Big\{\frac{p(x)}{1 - p(x)}\Big\} = c_0 + \sum_{j=1}^d m_j(x_j), \qquad (7.28)$$

or that in (7.27) by

$$p(x) = G\Big(c_0 + \sum_{j=1}^d m_j(x_j)\Big), \qquad (7.29)$$

where $G(v) = \exp(v)/(1 + \exp(v))$ denotes the CDF of the standard logistic distribution. Thus (7.29) is a special case of the generalized additive model in (7.4) in which $G$ is a strictly increasing function with a known functional form.

Insight from the Linear Logistic Regression Model

To estimate the model (7.28), we can gain some insight from linear logistic regression methodology. Maximum likelihood is the most popular method for estimating the linear logistic model. For the present problem the log-likelihood has the form

$$\ell(\beta) = \sum_{i=1}^n\{Y_i\log p(X_i) + (1 - Y_i)\log(1 - p(X_i))\}, \qquad (7.30)$$

where $p(X_i) = \exp(X_i'\beta)/(1 + \exp(X_i'\beta))$. The score equations

$$\frac{\partial\ell(\beta)}{\partial\beta} = \sum_{i=1}^n X_i[Y_i - p(X_i)] = 0 \qquad (7.31)$$

are nonlinear in the parameters, so the solution must be found iteratively. The Newton-Raphson iterative method can be expressed in an appealing form. Given the current estimate $\hat\beta$, we estimate the probabilities $p(X_i)$ by $\hat p_i = \exp(X_i'\hat\beta)/(1 + \exp(X_i'\hat\beta))$ and form the linearized response

$$Z_i = X_i'\hat\beta + (Y_i - \hat p_i)/\{\hat p_i(1 - \hat p_i)\}, \qquad (7.32)$$

where $Z_i$ represents the first-order Taylor series approximation to $\mathrm{logit}(Y_i)$ about the current estimate $\hat p_i$. (Pretending that $Y_i$ is bounded away from 0 and 1 and is close to $\hat p_i$, we have, by a first-order Taylor expansion, $\mathrm{logit}(Y_i) = \log Y_i - \log(1 - Y_i) \approx X_i'\hat\beta + (Y_i - \hat p_i)/\{\hat p_i(1 - \hat p_i)\}$.) Denote $\varepsilon_i = (Y_i - \hat p_i)/\{\hat p_i(1 - \hat p_i)\}$. If $\hat\beta$, and hence $\hat p_i$, are treated as fixed, the variance of $Z_i$ is $1/\{\hat p_i(1 - \hat p_i)\}$, and hence we choose the weights $w_i = \hat p_i(1 - \hat p_i)$. Alternatively, we can verify that $E(\varepsilon_i \mid X_i) = 0$ and

$$E(\varepsilon_i^2 \mid X_i) = \frac{1}{p(X_i)(1 - p(X_i))} \qquad (7.33)$$

in the extreme case where $\hat p_i = p(X_i)$. So when $\hat p_i$ approximates $p(X_i)$, we expect that $E(\varepsilon_i \mid X_i) \approx 0$ and

$$E(\varepsilon_i^2 \mid X_i) \approx \frac{1}{p(X_i)(1 - p(X_i))} \approx \frac{1}{\hat p_i(1 - \hat p_i)}. \qquad (7.34)$$

Consequently, a new $\hat\beta$ can be obtained by weighted linear regression of $Z_i$ on $X_i$ with weights $w_i = \hat p_i(1 - \hat p_i)$. This is repeated until $\hat\beta$ converges.

The Local Scoring Algorithm

The above iterative algorithm lends itself ideally to the generalized additive model in (7.28). Define

$$Z_i = \tilde c_0 + \sum_{j=1}^d \tilde m_j(X_{ij}) + (Y_i - \hat p_i)/\{\hat p_i(1 - \hat p_i)\}, \qquad (7.35)$$

where $(\tilde c_0, \tilde m_1, \ldots, \tilde m_d)$ are the current estimates of the additive model components and

$$\hat p_i = \frac{\exp\big(\tilde c_0 + \sum_{j=1}^d \tilde m_j(X_{ij})\big)}{1 + \exp\big(\tilde c_0 + \sum_{j=1}^d \tilde m_j(X_{ij})\big)}. \qquad (7.36)$$

Define the weights

$$w_i = \hat p_i(1 - \hat p_i). \qquad (7.37)$$

The new estimates of $c_0$ and $m_j$ ($j = 1, \ldots, d$) are computed by fitting a weighted additive model to $Z_i$. Of course, this additive model fitting procedure is itself iterative. Fortunately, the functions from the previous step are good starting values for the next step. This procedure is called the local scoring algorithm in the literature. The new estimates from each local scoring step are monitored, and the iterations are stopped when their relative change is negligible. A code sketch of the local scoring loop is given below.
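The sketch below reuses backfit() from the previous sketch. For simplicity the inner additive fit ignores the weights (7.37), i.e., it is an unweighted approximation to the weighted step described above; a weighted smoother would implement (7.35)-(7.37) exactly.

```python
# Local scoring for the additive logistic model (7.28): outer IRLS loop on
# the linearized response, inner (here unweighted) backfitting of Z on X.
import numpy as np

def local_scoring(X, Y, h=0.2, outer_iter=25):
    n, d = X.shape
    c0, g = 0.0, np.zeros((n, d))
    for _ in range(outer_iter):
        eta = np.clip(c0 + g.sum(axis=1), -10, 10)   # guard against p = 0 or 1
        p = 1.0 / (1.0 + np.exp(-eta))               # (7.36)
        Z = eta + (Y - p) / (p * (1 - p))            # linearized response (7.35)
        c0 = Z.mean()                                # new intercept estimate
        g = backfit(X, Z, h=h)                       # new additive components
    return c0, g
```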

7.2 The Marginal Integration Method

7.2.1 The Marginal Integration Estimator

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from the additive model

$$Y_i = c_0 + \sum_{j=1}^d m_j(X_{ij}) + \varepsilon_i, \qquad (7.38)$$

where $E(\varepsilon_i \mid X_i) = 0$, $\mathrm{Var}(\varepsilon_i \mid X_i = x) = \sigma^2(x)$, and $\{m_j(\cdot)\}$ is a set of unknown functions satisfying $E[m_j(X_{ij})] = 0$. We follow Chen et al. (1995) in defining the marginal integration estimator.

Let $f_j(x_j)$ be the marginal density of $X_{ij}$, and let $m(x_1, \ldots, x_d) = c_0 + \sum_{j=1}^d m_j(x_j)$ be the conditional mean function. For any $j \in \{1, \ldots, d\}$, define $X_{i,\underline j} = (X_{i1}, \ldots, X_{i,j-1}, X_{i,j+1}, \ldots, X_{id})'$, and denote the joint density of $X_{i,\underline j}$ by $f_{\underline j}$. Then, for fixed $x = (x_1, \ldots, x_d)'$, the functional

$$\psi_j(x_j) \equiv \int m(x_1, \ldots, x_d)\, f_{\underline j}(x_{\underline j})\prod_{k \ne j} dx_k \qquad (7.39)$$

equals $c_0 + m_j(x_j)$, because each of the remaining components integrates to zero under the identification condition. Let $K(\cdot)$ and $L(\cdot)$ be kernel functions with compact support, and let

$$K_h(u) = h^{-1}K(u/h) \quad\text{and}\quad L_g(v) = g^{-(d-1)}L(v/g). \qquad (7.40)$$

Using the Nadaraya-Watson (NW) kernel method to estimate the mean function $m(\cdot)$ and then averaging over the observations $X_{i,\underline j}$, we obtain the following estimator. For $1 \le j \le d$ and any $x_j$ in the domain of $m_j(\cdot)$, define

$$\hat\psi_j(x_j) = \frac{1}{n}\sum_{i=1}^n \tilde m(x_j, X_{i,\underline j}) = \frac{1}{n}\sum_{i=1}^n\Bigg[\frac{\sum_{l=1}^n K_h(X_{lj} - x_j)\,L_g(X_{l,\underline j} - X_{i,\underline j})\,Y_l}{\sum_{l=1}^n K_h(X_{lj} - x_j)\,L_g(X_{l,\underline j} - X_{i,\underline j})}\Bigg]. \qquad (7.41)$$

If the regressors were independent, we might use $\sum_{l=1}^n K_h(X_{lj} - x_j)Y_l\big/\sum_{l=1}^n K_h(X_{lj} - x_j)$ to estimate $\psi_j(x_j)$; this is a one-dimensional NW estimator. Nevertheless, this estimator has a larger variance than the estimator above even in this restricted situation; see Härdle and Tsybakov (1995).

Let $f(\cdot)$ denote the joint density of $X_i = (X_{i1}, \ldots, X_{id})'$. The following assumptions are modified from Chen et al. (1995).

Assumptions

A1. $\{(X_i, Y_i)\}$ is IID.

A2. The densities $f$ and $f_{\underline j}$ are bounded, Lipschitz continuous, and bounded away from zero. The function $m$ has Lipschitz continuous derivatives.

A3. The conditional variance function $\sigma^2(x) = E(\varepsilon_i^2 \mid X_i = x)$ is Lipschitz continuous.

A4. The kernel function $K(\cdot)$ is a bounded, nonnegative, second-order kernel that is compactly supported and Lipschitz continuous. Write $\kappa_{02} = \int K^2(u)\,du$ and $\kappa_{21} = \int u^2K(u)\,du$.

A5. The kernel function $L$ is bounded, compactly supported, and Lipschitz continuous. $L$ is a $q$th-order kernel with $\|L\|_2^2 = \int L^2(u)\,du < \infty$.

A6. As $n \to \infty$, $h \propto n^{-1/5}$, $g \to 0$, and $ng^{d-1}/\log n \to \infty$.

Theorem 7.2 Suppose that Assumptions A1-A6 hold and the order $q$ of $L$ satisfies $q \ge (d-1)/2$. Then

$$n^{2/5}\{\hat\psi_j(x_j) - m_j(x_j) - c_0\} \to N\big(B_j(x_j), V_j(x_j)\big), \qquad (7.42)$$

where

$$B_j(x_j) = \lim_{n\to\infty} n^{2/5}h^2\,\frac{\kappa_{21}}{2}\int\Big(\frac{\partial^2 m(x)}{\partial x_j^2} + 2\,\frac{\partial m(x)}{\partial x_j}\,\frac{\partial\log f(x)}{\partial x_j}\Big)f_{\underline j}(x_{\underline j})\,dx_{\underline j},$$

$$V_j(x_j) = \lim_{n\to\infty}\frac{\kappa_{02}}{n^{1/5}h}\int\frac{\sigma^2(x)\,f_{\underline j}(x_{\underline j})^2}{f(x)}\,dx_{\underline j}.$$

Proof. See Chen et al. (1995).

Theorem 7.2 says that the rate of convergence to the asymptotic normal limit distribution does not suffer from the curse of dimensionality. Nevertheless, to achieve this rate of convergence, we must impose some restrictions on the bandwidth sequences and choices of kernels. Note that the above bandwidth conditions do not exclude the optimal one-dimensional smoothing rate $n^{-1/5}$ for both $h$ and $g$ when $d \le 4$; more importantly, we can then take $g \propto n^{-1/5}$. When $d \ge 5$ we can no longer use $g$ at the rate $n^{-1/5}$, and we need a higher-order kernel $L$ to reduce the bias associated with the use of $g$.

Estimation of the regression surface

Define

$$\hat m(x) = \sum_{j=1}^d \hat\psi_j(x_j) - (d-1)\hat c_0, \qquad (7.43)$$

where $\hat c_0 = \bar Y = n^{-1}\sum_{i=1}^n Y_i$. The following theorem gives the asymptotic distribution of $\hat m(x)$.

Theorem 7.3 Under the conditions of Theorem 7.2,

$$n^{2/5}\{\hat m(x) - m(x)\} \to N\big(B(x), V(x)\big), \qquad (7.44)$$

where $B(x) = \sum_{j=1}^d B_j(x_j)$ and $V(x) = \sum_{j=1}^d V_j(x_j)$.

Proof. See Chen et al. (1995).

Theorem 7.3 says that the covariance between $\hat\psi_j(x_j)$ and $\hat\psi_k(x_k)$ for $j \ne k$ is asymptotically negligible, i.e., it is of smaller order than the variances of the component function estimators. A code sketch of (7.41) and (7.43) follows.
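The sketch assumes Gaussian product kernels and a single bandwidth h playing the roles of both h and g; the compact-support and higher-order kernel conditions of A4-A6 are ignored in this illustration.

```python
# Marginal integration: psi_hat implements (7.41), m_hat_mi implements (7.43).
import numpy as np

def nw_full(point, X, Y, h):
    """d-dimensional NW estimate of m at `point` with a Gaussian product kernel."""
    w = np.exp(-0.5 * np.sum(((X - point) / h)**2, axis=1))
    return np.sum(w * Y) / np.sum(w)

def psi_hat(xj, j, X, Y, h):
    """Average the full-dimensional NW fit over the sample values X_{i,-j}."""
    vals = []
    for i in range(len(Y)):
        point = X[i].copy()
        point[j] = xj
        vals.append(nw_full(point, X, Y, h))
    return np.mean(vals)                     # estimates c_0 + m_j(x_j)

def m_hat_mi(x, X, Y, h):
    """Additive surface estimate: sum of psi_hat's minus (d - 1) * Ybar."""
    d = X.shape[1]
    return sum(psi_hat(x[j], j, X, Y, h) for j in range(d)) - (d - 1) * Y.mean()
```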

7.2.2 Marginal Integration Estimation of Additive Models with Known Links

For many situations, especially binary and survival time data, the model (7.38) may not be appropriate. In the parametric case, a more appropriate modelling framework is provided by the generalized linear models of McCullagh and Nelder (1989). Hastie and Tibshirani (1990) extend these ideas to nonparametric modelling. In the nonparametric case, the model can be fully or partially specified. In the fully specified case, the conditional distribution of $Y$ given $X$ is assumed to belong to an exponential family with known link function $G$ and mean function $m$ such that

$$G\{m(x)\} = c_0 + \sum_{j=1}^d m_j(x_j), \qquad (7.45)$$

where $E[m_j(X_{ij})] = 0$. This model is usually called a generalized additive model in the literature. It implies, for example, that the variance is functionally related to the mean. In the partially specified case, we keep the form (7.45) but do not restrict ourselves to the exponential family; in this latter case, the variance function is unrestricted.

Example 7.4 Clearly, when $G$ is the identity function we have the additive regression model examined above. Other examples include the logit and probit link functions for binary data, the logarithm transform for Poisson count data (McCullagh and Nelder, 1989, p. 30), and the Box-Cox transformation (e.g., $G(v) = (v^\lambda - 1)/\lambda$). It also includes cases where the regression function is multiplicative.

The backfitting procedure in conjunction with Fisher scoring is widely used to estimate (7.45) (Hastie and Tibshirani, 1990, p. 141). It exploits the likelihood structure. Nevertheless, it is even less tractable from a statistical perspective when $G$ is not the identity, because the estimate is not linear in $Y$.

Estimation of the Additive Components

Linton and Härdle (1996) propose a marginal-integration-based method of estimating the components in (7.45). The main advantage of their method is that one can derive its asymptotic properties. They also suggest how to take into account the additional information provided by the exponential family structure. Notice that under the additive structure (7.45),

$$\int G\{m(x_j, x_{\underline j})\}\,f_{\underline j}(x_{\underline j})\,dx_{\underline j} = c_0 + m_j(x_j). \qquad (7.46)$$

The general strategy is to replace both $m$ and $f_{\underline j}$ in (7.46) by their estimates. We estimate $m$ by the NW estimator

$$\tilde m(x) = \sum_{i=1}^n w_i(x)\,Y_i, \qquad (7.47)$$

with kernel weights

$$w_i(x) = \frac{K_h(X_i - x)}{\sum_{l=1}^n K_h(X_l - x)}, \qquad (7.48)$$

and $f_{\underline j}(\cdot)$ is estimated by its sample analogue (the empirical distribution of $X_{i,\underline j}$), which yields

$$\hat\psi_j(x_j) = \frac{1}{n}\sum_{i=1}^n G\{\tilde m(x_j, X_{i,\underline j})\}. \qquad (7.49)$$

When $G$ is the identity function, $\hat\psi_j(x_j)$ is linear in $Y_1, \ldots, Y_n$ (cf. equation (7.41)). In general, $\hat\psi_j(x_j)$ is a nonlinear function of $Y_1, \ldots, Y_n$; a code sketch follows.
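For a known link, the estimator (7.49) only changes the marginal integration sketch above by applying $G$ to the NW pilot before averaging. A hedged sketch with a logit link (an illustrative assumption) follows; nw_full() is the helper defined earlier.

```python
# Known-link marginal integration (7.49): apply G to the pilot fit, then average.
import numpy as np

def G_logit(m):
    m = np.clip(m, 1e-6, 1 - 1e-6)           # guard: the logit needs m in (0, 1)
    return np.log(m / (1 - m))

def psi_hat_link(xj, j, X, Y, h, G=G_logit):
    """Estimates c_0 + m_j(x_j) in the generalized additive model (7.45)."""
    vals = []
    for i in range(len(Y)):
        point = X[i].copy()
        point[j] = xj
        vals.append(G(nw_full(point, X, Y, h)))
    return np.mean(vals)
```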

The above procedure is carried out for each $j = 1, \ldots, d$, so we obtain estimates of each $m_j(\cdot)$ evaluated at each sample point. Let

$$\hat c_0 = \frac{1}{nd}\sum_{j=1}^d\sum_{i=1}^n \hat\psi_j(X_{ij}) \quad\text{and}\quad \hat m_j(x_j) = \hat\psi_j(x_j) - \hat c_0.$$

We re-estimate $m(x)$ by

$$\hat m(x) = G^{-1}\Big(\hat c_0 + \sum_{j=1}^d \hat m_j(x_j)\Big),$$

where $G^{-1}$ is the inverse function of $G$. Let $\hat\varepsilon_i = Y_i - \hat m(X_i)$ be the additive regression residuals, which estimate the errors $\varepsilon_i = Y_i - m(X_i)$. These residuals can be used to test the additive structure, i.e., to look for possible interactions ignored in the simple additive structure: when (7.45) is true, $\hat\varepsilon_i$ should be approximately uncorrelated with any function of $X_i$.

We modify Assumption A6 as follows.

A6*. As $n \to \infty$, $h \propto n^{-1/5}$, $g \to 0$, and $n^{2/5}g^{d-1} \to \infty$.

Theorem 7.4 Suppose that Assumptions A1-A5 and A6* hold and the order $q$ of $L$ satisfies $q \ge d - 1$. Assume that $G$ is twice continuously differentiable. Then

$$n^{2/5}\{\hat\psi_j(x_j) - m_j(x_j) - c_0\} \to N\big(B_j^G(x_j), V_j^G(x_j)\big), \qquad (7.50)$$

where

$$B_j^G(x_j) = \lim_{n\to\infty} n^{2/5}h^2\,\frac{\kappa_{21}}{2}\int\Big\{\frac{\partial^2 m(x)}{\partial x_j^2} + 2\,\frac{\partial m(x)}{\partial x_j}\,\frac{\partial\ln f(x)}{\partial x_j}\Big\}\,G'(m(x))\,f_{\underline j}(x_{\underline j})\,dx_{\underline j},$$

$$V_j^G(x_j) = \lim_{n\to\infty}\frac{\kappa_{02}}{n^{1/5}h}\int\big[G'(m(x))\big]^2\,\frac{\sigma^2(x)\,f_{\underline j}(x_{\underline j})^2}{f(x)}\,dx_{\underline j}.$$

Proof. See Linton and Härdle (1996).

As Linton and Härdle (1996) remark, we can use the local linear smoother as a pilot in place of the NW estimator. In this case, the asymptotic variance of $\hat\psi_j(x_j)$ is the same, but the bias takes the simpler form

$$B_j^G(x_j) = \lim_{n\to\infty} n^{2/5}h^2\,\frac{\kappa_{21}}{2}\int\frac{\partial^2 m(x)}{\partial x_j^2}\,G'(m(x))\,f_{\underline j}(x_{\underline j})\,dx_{\underline j}.$$

To construct an asymptotic confidence interval for $\hat\psi_j(x_j)$, we need to estimate $B_j^G(x_j)$ and $V_j^G(x_j)$. It is easy to show that $V_j^G(x_j)$ can be estimated consistently by a sample analogue that replaces $\varepsilon_i^2$, $G'(m(\cdot))$, and the densities with their estimates, where $\hat\varepsilon_i = Y_i - \hat m(X_i)$. The formula for the estimate of $B_j^G(x_j)$ is quite complicated and we thus omit it. Nevertheless, in the case of undersmoothing, i.e., $h = o(n^{-1/5})$, it suffices to estimate $V_j^G(x_j)$. Alternatively, we can use bootstrap methods (e.g., the wild bootstrap) to approximate the desired asymptotic confidence interval.

Estimation of the Regression Surface

After we obtain estimates of the additive components, we can obtain an estimate of the regression function. Note that we do not limit ourselves to the exponential family structure discussed earlier.

Theorem 7.5 Suppose the conditions of Theorem 7.4 hold and $G^{-1}$ is twice continuously differentiable. Then

$$n^{2/5}\{\hat m(x) - m(x)\} \to N\big(B^G(x), V^G(x)\big), \qquad (7.51)$$

where $B^G(x) = (G^{-1})'\big(c_0 + \sum_{j=1}^d m_j(x_j)\big)\sum_{j=1}^d B_j^G(x_j)$ and $V^G(x) = \big\{(G^{-1})'\big(c_0 + \sum_{j=1}^d m_j(x_j)\big)\big\}^2\sum_{j=1}^d V_j^G(x_j)$.

Proof. The result follows from Theorem 7.4, the delta method, and the fact that $\hat\psi_j(x_j)$ and $\hat\psi_k(x_k)$ are asymptotically independent for $j \ne k$.

Theorem 7.5 says that the rate of convergence of $\hat m(x)$ is free from the curse of dimensionality, as desired.

Remark. Yang, Sperlich, and Härdle (2003, Journal of Statistical Planning and Inference) study derivative estimation in generalized additive models via the kernel method. In addition, they study hypothesis tests on the derivatives.

7.2.3 Efficient Estimation of Generalized Additive Models

Linton (2000, Econometric Theory) defines new procedures for estimating generalized additive nonparametric regression models that are more efficient than those of Linton and Härdle (1996). He considers criterion functions based on the linear exponential family. When the linear exponential family specification is correct, the new estimator achieves certain oracle bounds. For brevity, we refer the reader to Linton (2000).

7.3 Additive Partially Linear Models

In this section we introduce two methods for estimating additive partially linear models: the series method of Li (2000, International Economic Review) and the kernel method of Fan and Li (2003, Statistica Sinica).

7.3.1 Series Estimation

A typical additive partially linear model is of the form

$$Y_i = Z_i'\gamma_0 + m_1(X_{i1}) + \cdots + m_d(X_{id}) + \varepsilon_i, \qquad (7.52)$$

where $E(\varepsilon_i \mid Z_i, X_i) = 0$, $Z_i$ is a $p \times 1$ vector of random variables that does not contain a constant term, $\gamma_0$ is a $p \times 1$ vector of unknown parameters, and the $m_j(\cdot)$ are unknown smooth functions. The arguments $X_{i1}, \ldots, X_{id}$ are non-overlapping subvectors of $X_i$, with $X_{ij}$ of dimension $d_j \ge 1$ and $\sum_{j=1}^d d_j = \dim(X_i)$. In practice, the most widely used case is $d_j = 1$ for all $j$, so that each $X_{ij}$ is a scalar random variable; we focus on this case below.

Clearly, the individual functions $m_j(\cdot)$ ($j = 1, \ldots, d$) are not identified without some identification conditions. In the literature on kernel estimation, it is convenient to impose $E[m_j(X_{ij})] = 0$ for all $j = 2, \ldots, d$. Nevertheless, such conditions are not easily imposed in series estimation. Instead, here it is more convenient to impose

$$m_j(0) = 0, \quad j = 2, \ldots, d. \qquad (7.53)$$

To construct a series estimator of the unknown parameters in the model, we use Li's (2000) definition of the class of additive functions.

Definition. A function $g(x)$ is said to belong to an additive class of functions $\mathcal G$ ($g \in \mathcal G$) if (i) $g(x) = \sum_{j=1}^d g_j(x_j)$, where each $g_j(\cdot)$ is continuous on its support $\mathcal Z_j$, which is a compact subset of $\mathbb R$ ($j = 1, \ldots, d$); (ii) $\sum_{j=1}^d E[g_j(X_{ij})^2] < \infty$; and (iii) $g_j(0) = 0$ for $j = 2, \ldots, d$. When $g(x)$ is a vector-valued function, we say that $g \in \mathcal G$ if each of its components belongs to $\mathcal G$.

In vector notation, we can write (7.52) as

$$\mathbf Y = \mathbf Z\gamma_0 + \mathbf m + \boldsymbol\varepsilon, \qquad (7.54)$$

where $\mathbf Y = (Y_1, \ldots, Y_n)'$, $\mathbf Z = (Z_1, \ldots, Z_n)'$, $\mathbf m = (m(X_1), \ldots, m(X_n))'$ with $m(X_i) = \sum_{j=1}^d m_j(X_{ij})$, and $\boldsymbol\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$.

For $j = 1, \ldots, d$, we use a linear combination of $k_j$ functions, $p^{k_j}(x_j) = [p_{j1}(x_j), \ldots, p_{jk_j}(x_j)]'$, to approximate $m_j(x_j)$. Let

$$p^K(x) = [p^{k_1}(x_1)', \ldots, p^{k_d}(x_d)']',$$

where $K = \sum_{j=1}^d k_j$. A linear combination of $p^K(x)$, say $p^K(x)'\beta$, forms an approximating function for $m(x) = \sum_{j=1}^d m_j(x_j)$. The approximating functions have the following properties: (i) $p^K(x)'\beta \in \mathcal G$; and (ii) as $k_j$ grows for all $j = 1, \ldots, d$, there is a linear combination of $p^K(x)$ that can approximate any $g \in \mathcal G$ arbitrarily well in the mean squared error sense.

Define

$$P = \big(p^K(X_1), \ldots, p^K(X_n)\big)', \qquad (7.55)$$

which is of dimension $n \times K$. Define $M = P(P'P)^-P'$, where $(P'P)^-$ is a symmetric generalized inverse of $P'P$. For a matrix $A$ with $n$ rows, let $\tilde A = A - MA$. If we premultiply both sides of (7.54) by $M$, we have

$$M\mathbf Y = M\mathbf Z\gamma_0 + M\mathbf m + M\boldsymbol\varepsilon. \qquad (7.56)$$

Subtracting (7.56) from (7.54) gives

$$\tilde{\mathbf Y} = \tilde{\mathbf Z}\gamma_0 + (\mathbf m - M\mathbf m) + \tilde{\boldsymbol\varepsilon}. \qquad (7.57)$$

So we can estimate $\gamma_0$ by regressing $\tilde{\mathbf Y}$ on $\tilde{\mathbf Z}$ to obtain

$$\hat\gamma = (\tilde{\mathbf Z}'\tilde{\mathbf Z})^{-1}\tilde{\mathbf Z}'\tilde{\mathbf Y}. \qquad (7.58)$$

After obtaining $\hat\gamma$, we can estimate $m(x) = \sum_{j=1}^d m_j(x_j)$ by $\hat m(x) = p^K(x)'\hat\beta$, where

$$\hat\beta = (P'P)^-P'(\mathbf Y - \mathbf Z\hat\gamma). \qquad (7.59)$$

The individual components $m_j(x_j)$ can also be estimated easily from $\hat\beta$. For example, $\hat m_1(x_1) = p^{k_1}(x_1)'\hat\beta_1$, where $\hat\beta_1$ collects the first $k_1$ elements of $\hat\beta$. A code sketch of this procedure follows.
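The sketch below implements (7.55)-(7.59) with a piecewise linear spline basis; the data-generating process, basis sizes, and knot placement are illustrative assumptions.

```python
# Series estimation of the additive partially linear model (7.52).
import numpy as np

rng = np.random.default_rng(2)
n = 800
X = rng.uniform(0, 1, size=(n, 2))
Z = (rng.uniform(size=(n, 1)) + X[:, :1]) / 2          # Z correlated with X_1
gamma0 = np.array([1.5])
Y = (Z @ gamma0 + np.sin(2 * np.pi * X[:, 0]) + X[:, 1]**3
     + 0.3 * rng.standard_normal(n))

def basis_1d(x, n_knots=5):
    """Linear spline basis [x, (x - t_k)_+]; every column vanishes at 0,
    matching the identification condition (7.53)."""
    t = np.quantile(x, np.linspace(0.1, 0.9, n_knots))
    return np.column_stack([x] + [np.maximum(x - tk, 0.0) for tk in t])

# A global constant column stands in for the free intercept absorbed by m_1.
P = np.hstack([np.ones((n, 1))] + [basis_1d(X[:, j]) for j in range(2)])  # (7.55)
M = P @ np.linalg.pinv(P.T @ P) @ P.T                  # projection onto series space
Zt, Yt = Z - M @ Z, Y - M @ Y                          # tilde variables of (7.57)
gamma_hat = np.linalg.solve(Zt.T @ Zt, Zt.T @ Yt)      # (7.58)
beta_hat = np.linalg.pinv(P.T @ P) @ P.T @ (Y - Z @ gamma_hat)   # (7.59)
m_hat_fitted = P @ beta_hat                            # fitted additive part
```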

Define $\theta(x) \equiv E(Z_i \mid X_i = x)$ and $\sigma^2(z, x) \equiv \mathrm{Var}(Y_i \mid Z_i = z, X_i = x)$. We use $h(x)$ to denote the projection of $\theta(x)$ onto $\mathcal G$; that is, $h(x) = E_{\mathcal G}[\theta(x)]$, where by the definition of $E_{\mathcal G}(\cdot)$ we know that $h(x)$ is an additive function, i.e., $h(x) = \sum_{j=1}^d h_j(x_j) \in \mathcal G$, and $h(\cdot)$ solves the minimization problem

$$E\{[\theta(X_i) - h(X_i)][\theta(X_i) - h(X_i)]'\} = \inf_{g\in\mathcal G} E\{[\theta(X_i) - g(X_i)][\theta(X_i) - g(X_i)]'\}.$$

Noting that $\theta(x)$ is of dimension $p \times 1$, we can write $h(x) = (h^{(1)}(x), \ldots, h^{(p)}(x))'$.

To state the main result, we need some assumptions.

Assumptions.

A1. (i) $\{(Y_i, Z_i, X_i)\}$ are IID, and the support of $(Z_i, X_i)$ is a compact subset of $\mathbb R^{p+d}$; (ii) both $\theta(x) = E(Z_i \mid X_i = x)$ and $\sigma^2(z, x) = \mathrm{Var}(Y_i \mid Z_i = z, X_i = x)$ are bounded functions on the support of $(Z_i, X_i)$.

A2. (i) For every $K$ there is a nonsingular matrix $B$ such that, for $q^K(x) = Bp^K(x)$, the smallest eigenvalue of $E[q^K(X_i)q^K(X_i)']$ is bounded away from zero uniformly in $K$; (ii) there is a sequence of constants $\zeta_0(K)$ satisfying $\sup_{x\in\mathcal Z}\|q^K(x)\| \le \zeta_0(K)$ and $K = K(n)$ such that $\zeta_0(K)^2K/n \to 0$ as $n \to \infty$, where $\mathcal Z$ is the support of $X_i$.

A3. (i) For each $f \in \{m, h^{(1)}, \ldots, h^{(p)}\}$, there exist some $\delta > 0$ and $\beta_f$ such that $\sup_{x\in\mathcal Z}|f(x) - p^K(x)'\beta_f| = O\big(\sum_{j=1}^d k_j^{-\delta}\big)$ as $\min\{k_1, \ldots, k_d\} \to \infty$; (ii) $\sqrt n\,\sum_{j=1}^d k_j^{-\delta} \to 0$ as $n \to \infty$.

A4. $\Phi = E\{[Z_i - h(X_i)][Z_i - h(X_i)]'\}$ is positive definite.

Assumption A1 is quite standard in the literature on estimating additive models. Assumption A2 ensures that $P'P$ is asymptotically nonsingular. While Assumptions A2-A3 are not primitive conditions, it is known that many series functions satisfy them. Newey (1997) gives primitive conditions for power series and splines such that Assumptions A2-A3 hold: for power series, $\zeta_0(K) = O(K)$; for B-splines, $\zeta_0(K) = O(\sqrt K)$. For $d_j = 1$, if $f$ is continuously differentiable of order $c$, then one can take $\delta = c$.

The following theorem states the asymptotic property of $\hat\gamma$.

Theorem 7.6 Under Assumptions A1-A4, we have

$$\sqrt n(\hat\gamma - \gamma_0) \to N\big(0, \Phi^{-1}\Psi\Phi^{-1}\big),$$

where $\Psi = E\{\sigma^2(Z_i, X_i)\,u_iu_i'\}$ and $u_i = Z_i - h(X_i)$.

For statistical inference, we need consistent estimates of $\Phi$ and $\Psi$. Li (2000) shows that we can estimate them consistently by

$$\hat\Phi = \frac 1n\sum_{i=1}^n \tilde Z_i\tilde Z_i' \quad\text{and}\quad \hat\Psi = \frac 1n\sum_{i=1}^n \hat\varepsilon_i^2\,\tilde Z_i\tilde Z_i',$$

where $\tilde Z_i'$ is the $i$th row of $\tilde{\mathbf Z} = \mathbf Z - M\mathbf Z$ and $\hat\varepsilon_i = Y_i - Z_i'\hat\gamma - \hat m(X_i)$. Li (2000) also gives the convergence rate of $\hat m(x) = p^K(x)'\hat\beta$ to $m(x) = \sum_{j=1}^d m_j(x_j)$:

Theorem 7.7 Under Assumptions A1-A4, we have

(i) $\sup_{x\in\mathcal Z}|\hat m(x) - m(x)| = O_p\big(\zeta_0(K)\big(\sqrt{K/n} + \sum_{j=1}^d k_j^{-\delta}\big)\big)$;

(ii) $\frac 1n\sum_{i=1}^n[\hat m(X_i) - m(X_i)]^2 = O_p\big(K/n + \sum_{j=1}^d k_j^{-2\delta}\big)$;

(iii) $\int[\hat m(x) - m(x)]^2\,dF(x) = O_p\big(K/n + \sum_{j=1}^d k_j^{-2\delta}\big)$, where $F(\cdot)$ is the CDF of $X_i$.

The properties of the component estimators $\hat m_j(x_j)$ are similar and thus omitted for brevity.

7.3.2 Kernel Method

We now study the kernel method of estimating additive partially linear models. Like Fan and Li (2003), we consider the additive partially linear model

$$Y_i = \mu_0 + Z_i'\gamma_0 + m_1(X_{i1}) + \cdots + m_d(X_{id}) + \varepsilon_i, \qquad (7.60)$$

where $E(\varepsilon_i \mid Z_i, X_i) = 0$, $Z_i$ is a $p\times 1$ vector of random variables that does not contain a constant term, $\mu_0$ is a scalar parameter, $\gamma_0 = (\gamma_{01}, \ldots, \gamma_{0p})'$ is a $p\times 1$ vector of unknown parameters, the $X_{ij}$'s are univariate continuous random variables, and the $m_j(\cdot)$ are unknown smooth functions.

For $j = 1, \ldots, d$, let $X_{i,\underline j} = (X_{i1}, \ldots, X_{i,j-1}, X_{i,j+1}, \ldots, X_{id})'$, i.e., $X_{ij}$ is removed from $X_i = (X_{i1}, \ldots, X_{id})'$. Define $m_{\underline j}(X_{i,\underline j}) = \sum_{k\ne j} m_k(X_{ik})$. Then we can rewrite (7.60) as

$$Y_i = \mu_0 + Z_i'\gamma_0 + m_j(X_{ij}) + m_{\underline j}(X_{i,\underline j}) + \varepsilon_i. \qquad (7.61)$$

Fan, Härdle, and Mammen (1998) consider the case where $Z_i$ is a $p\times 1$ vector of discrete variables and suggest two ways of estimating model (7.61). Neither method makes full use of the information that $Z_i$ enters the regression function linearly. Motivated by this observation, Fan and Li (2003) consider a two-stage estimation procedure that applies to the case where $Z_i$ contains both discrete and continuous elements and makes full use of the information that $Z_i$ enters the regression function linearly.

Write $m_Y(x) = E(Y_i \mid X_i = x)$ and $m_Z(x) = E(Z_i \mid X_i = x)$, and for $j = 1, \ldots, d$ define the marginally integrated functions

$$m_{Y,j}(x_j) = \int m_Y(x_j, x_{\underline j})\,f_{\underline j}(x_{\underline j})\,dx_{\underline j} \quad\text{and}\quad m_{Z,j}(x_j) = \int m_Z(x_j, x_{\underline j})\,f_{\underline j}(x_{\underline j})\,dx_{\underline j},$$

where $f_{\underline j}$ denotes the density of $X_{i,\underline j}$. Taking conditional expectations given $X_i = x$ on both sides of (7.61) gives

$$m_Y(x) = \mu_0 + m_Z(x)'\gamma_0 + m_j(x_j) + m_{\underline j}(x_{\underline j}). \qquad (7.62)$$

Integrating both sides of (7.62) over $x_{\underline j}$ with weight $f_{\underline j}(x_{\underline j})$ leads to

$$m_{Y,j}(x_j) = \mu_0 + m_{Z,j}(x_j)'\gamma_0 + m_j(x_j), \qquad (7.63)$$

where we have used the identification condition that $E[m_k(X_{ik})] = 0$ for each $k$. Evaluating (7.63) at $x_j = X_{ij}$ and then summing over $j = 1, \ldots, d$ gives

$$\sum_{j=1}^d m_{Y,j}(X_{ij}) = d\mu_0 + \Big(\sum_{j=1}^d m_{Z,j}(X_{ij})\Big)'\gamma_0 + \sum_{j=1}^d m_j(X_{ij}). \qquad (7.64)$$

Subtracting (7.64) from (7.60), we can eliminate $\sum_{j=1}^d m_j(X_{ij})$ and get

$$Y_i - \sum_{j=1}^d m_{Y,j}(X_{ij}) = (1 - d)\mu_0 + \Big(Z_i - \sum_{j=1}^d m_{Z,j}(X_{ij})\Big)'\gamma_0 + \varepsilon_i. \qquad (7.65)$$

Let $\mathcal Y_i = Y_i - \sum_{j=1}^d m_{Y,j}(X_{ij})$ and $\mathcal X_i = \big(1, (Z_i - \sum_{j=1}^d m_{Z,j}(X_{ij}))'\big)'$. Then in vector notation we can write (7.65) as

$$\mathcal Y = \mathcal X\delta_0 + \boldsymbol\varepsilon, \qquad (7.66)$$

where $\mathcal Y$ and $\mathcal X$ are $n\times 1$ and $n\times(p+1)$ matrices whose $i$th rows are given by $\mathcal Y_i$ and $\mathcal X_i'$, respectively, $\boldsymbol\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$, and $\delta_0 = ((1-d)\mu_0, \gamma_0')'$. We can apply OLS to (7.66) to obtain

$$\bar\delta = (\mathcal X'\mathcal X)^{-1}\mathcal X'\mathcal Y = \delta_0 + (\mathcal X'\mathcal X)^{-1}\mathcal X'\boldsymbol\varepsilon. \qquad (7.67)$$

Under standard conditions, we can show that $\bar\delta$ converges to $\delta_0$ at the parametric $\sqrt n$-rate. Nevertheless, $\bar\delta$ is an infeasible estimator because it depends on the unknown quantities $\sum_j m_{Y,j}(X_{ij})$ and $\sum_j m_{Z,j}(X_{ij})$. To obtain a feasible estimator, we replace these unknown quantities by consistent estimates. A consistent estimator of $m_{Y,j}(X_{ij})$ is given by

$$\hat m_{Y,j}(X_{ij}) = \frac 1n\sum_{l=1}^n\frac{\sum_{k\ne i}K_h(X_{kj} - X_{ij})\,L_g(X_{k,\underline j} - X_{l,\underline j})\,Y_k}{\sum_{k\ne i}K_h(X_{kj} - X_{ij})\,L_g(X_{k,\underline j} - X_{l,\underline j})} \equiv \sum_{k=1}^n w_{ik}Y_k, \qquad (7.68)$$

where the definition of the weights $w_{ik}$ is clear from the first equality, $K_h(u) = h^{-1}K(u/h)$ and $L_g(v) = g^{-(d-1)}L(v/g)$, $K$ and $L$ are kernel functions, and $h$ and $g$ are smoothing parameters. Fan and Li (2003) use the leave-one-out method, which simplifies the proofs but does not change the asymptotic results. Note that $\hat m_{Y,j}(X_{ij}) = \sum_{k=1}^n w_{ik}Y_k$ is a weighted average of the $Y_k$'s. Similarly, a consistent estimator of $m_{Z,j}(X_{ij})$ is given by

$$\hat m_{Z,j}(X_{ij}) = \sum_{k=1}^n w_{ik}Z_k. \qquad (7.69)$$

Let $\hat{\mathcal Y}_i = Y_i - \sum_{j=1}^d\hat m_{Y,j}(X_{ij})$ and $\hat{\mathcal X}_i = \big(1, (Z_i - \sum_{j=1}^d\hat m_{Z,j}(X_{ij}))'\big)'$. We could obtain a feasible estimator of $\delta_0$ by replacing $\mathcal Y$ and $\mathcal X$ in (7.67) by $\hat{\mathcal Y}$ and $\hat{\mathcal X}$. Nevertheless, near the boundary the density $f_{\underline j}(\cdot)$ of $X_{i,\underline j}$ cannot be estimated well, so Fan and Li (2003) trim observations near the boundary. Assume that $X_{ij} \in [a_j, b_j]$, where $a_j < b_j$ are finite constants, $j = 1, \ldots, d$. Define the trimming set $S_n = \prod_{j=1}^d[a_j + c_n, b_j - c_n]$, where $c_n = \kappa b_n$ for some $\kappa > 0$ and $b_n = \max\{h, g\}$, and let $\mathbf 1_i = \mathbf 1(X_i \in S_n)$. Fan and Li (2003) estimate $\delta_0$ by

$$\hat\delta = \begin{pmatrix}(1-d)\hat\mu_0\\ \hat\gamma\end{pmatrix} = \Big(\sum_{i=1}^n\hat{\mathcal X}_i\hat{\mathcal X}_i'\,\mathbf 1_i\Big)^{-1}\sum_{i=1}^n\hat{\mathcal X}_i\hat{\mathcal Y}_i\,\mathbf 1_i = (\hat{\mathcal X}'\hat{\mathcal X})^{-1}\hat{\mathcal X}'\hat{\mathcal Y}, \qquad (7.70)$$

where $\hat{\mathcal Y}$ and $\hat{\mathcal X}$ are the $n\times 1$ and $n\times(p+1)$ matrices with $i$th rows given by $\hat{\mathcal Y}_i\mathbf 1_i$ and $\hat{\mathcal X}_i'\mathbf 1_i$, respectively. A code sketch of this two-stage procedure (without trimming) follows.
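The following hedged sketch computes marginal integration estimates of $m_{Y,j}$ and $m_{Z,j}$ as in (7.68)-(7.69), then runs OLS on the transformed regression (7.65)-(7.67). Leave-one-out and boundary trimming are omitted, the Gaussian kernels and single bandwidth are ad hoc choices, and $d \ge 2$ is assumed.

```python
# Fan-Li two-stage estimator for the additive partially linear model (7.60).
import numpy as np

def mi_fit(j, X, V, h):
    """Marginal integration estimate of m_{V,j} at every sample point, cf. (7.68)."""
    X_rest = np.delete(X, j, axis=1)
    L = np.exp(-0.5 * np.sum(((X_rest[:, None, :] - X_rest[None, :, :]) / h)**2, axis=2))
    Kj = np.exp(-0.5 * ((X[:, j][:, None] - X[:, j][None, :]) / h)**2)
    out = np.empty(len(V))
    for i in range(len(V)):
        w = Kj[:, [i]] * L      # w[k, l]: weight of obs k at the point (X_ij, X_{l,-j})
        out[i] = np.mean((w * V[:, None]).sum(axis=0) / w.sum(axis=0))
    return out

def fan_li(X, Z, Y, h=0.15):
    """OLS on the transformed regression (7.65); returns (mu0_hat, gamma_hat)."""
    n, d = X.shape
    mY = sum(mi_fit(j, X, Y, h) for j in range(d))
    mZ = sum(np.column_stack([mi_fit(j, X, Z[:, c], h) for c in range(Z.shape[1])])
             for j in range(d))
    Ycal = Y - mY                                        # the variable cal-Y_i
    Xcal = np.column_stack([np.ones(n), Z - mZ])         # the variable cal-X_i
    delta = np.linalg.lstsq(Xcal, Ycal, rcond=None)[0]   # (7.67)
    return delta[0] / (1 - d), delta[1:]                 # undo the (1 - d) factor
```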

To state the main result, we make the following assumptions.

Assumptions.

A1. $\{(Y_i, Z_i, X_i)\}$ are IID; $(Z_i, X_i)$ has finite support, with the support of $X_i$ being the product set $\prod_{j=1}^d[a_j, b_j]$; the density function of $X_i$ is bounded from below by a positive constant on its support.

A2. $m_Y(\cdot)$, $m_Z(\cdot)$, $m(\cdot)$, and $f(\cdot)$ all belong to $\mathcal G_\nu^4$, where $\nu \ge 2$ is an integer and $\mathcal G_\nu^4$ is a smooth-function analogue of the additive class $\mathcal G$ defined in Section 7.3.1 (see Fan and Li, 2003, for the precise definition). Let $Z_i = (Z_i^c, Z_i^d)$, where $Z_i^c$ and $Z_i^d$ denote the continuous and discrete components of $Z_i$, respectively; then for all values $z^d$ of $Z_i^d$, $E(Z_i^c \mid Z_i^d = z^d, X_i = \cdot) \in \mathcal G_1^4$.

A3. $\Phi = E(\mathcal X_i\mathcal X_i')$ is positive definite.

A4. The kernel functions $K$ and $L$ are bounded and symmetric, and both are of order $\nu$.

A5. As $n \to \infty$, $nhg^{d-1} \to \infty$ and $n(h^{2\nu} + g^{2\nu}) \to 0$. Note that when $h = g$, Assumption A5 allows the use of a second-order kernel if $d \le 5$; it also implies that the data need to be undersmoothed.

The following theorem states the asymptotic property of $\hat\delta$.

Theorem 7.8 Under Assumptions A1-A5, we have

$$\sqrt n(\hat\delta - \delta_0) \to N\big(0, \Phi^{-1}\Psi\Phi^{-1}\big),$$

where $\Psi = E(\varepsilon_i^2\,\eta_i\eta_i')$, with $\eta_i = \mathcal X_i + (\mathcal X_i - E(\mathcal X_i))\big(1 - \sum_{j=1}^d\omega_j(X_i)\big)$ and $\omega_j(X_i) = f_j(X_{ij})f_{\underline j}(X_{i,\underline j})/f(X_i)$. Here $f(\cdot)$, $f_j(\cdot)$, and $f_{\underline j}(\cdot)$ are the density functions of $X_i$, $X_{ij}$, and $X_{i,\underline j}$, respectively.

Given the $\sqrt n$-consistent estimator $\hat\gamma$, we can rewrite (7.60) as

$$Y_i - Z_i'\hat\gamma = \mu_0 + m_1(X_{i1}) + \cdots + m_d(X_{id}) + \varepsilon_i + Z_i'(\gamma_0 - \hat\gamma). \qquad (7.71)$$

The intercept $\mu_0$ can be $\sqrt n$-consistently estimated by

$$\hat\mu_0 = \bar Y - \bar Z'\hat\gamma,$$

where $\bar Y = n^{-1}\sum_{i=1}^n Y_i$ and $\bar Z = n^{-1}\sum_{i=1}^n Z_i$. Note that (7.71) is essentially an additive regression model with $Y_i - Z_i'\hat\gamma$ as the new dependent variable and $\varepsilon_i + Z_i'(\gamma_0 - \hat\gamma)$ as the new error term. Since $\hat\gamma - \gamma_0 = O_p(n^{-1/2})$, a rate faster than any nonparametric convergence rate, the asymptotic distribution of any nonparametric estimator of $m_j(\cdot)$ based on (7.71) remains the same as if $\hat\gamma$ were replaced by $\gamma_0$.

To make statistical inference on $\delta_0$, we need to estimate $\Phi$ and $\Psi$ consistently. Let $\hat f(\cdot)$, $\hat f_j(\cdot)$, and $\hat f_{\underline j}(\cdot)$ denote the kernel estimators of $f(\cdot)$, $f_j(\cdot)$, and $f_{\underline j}(\cdot)$, respectively. That is,

$$\hat f_j(X_{ij}) = \frac 1n\sum_{l=1}^n K_h(X_{lj} - X_{ij}), \qquad \hat f_{\underline j}(X_{i,\underline j}) = \frac 1n\sum_{l=1}^n L_g(X_{l,\underline j} - X_{i,\underline j}),$$

and analogously for $\hat f(X_i)$ with a product kernel in all $d$ directions.

Define $\hat\omega_j(X_i) = \hat f_j(X_{ij})\hat f_{\underline j}(X_{i,\underline j})/\hat f(X_i)$, $\hat\eta_i = \hat{\mathcal X}_i + (\hat{\mathcal X}_i - \bar{\mathcal X})\big(1 - \sum_{j=1}^d\hat\omega_j(X_i)\big)$ with $\bar{\mathcal X} = n^{-1}\sum_{i=1}^n\hat{\mathcal X}_i$, and $\hat\varepsilon_i = \hat{\mathcal Y}_i - \hat{\mathcal X}_i'\hat\delta$. Then we can estimate $\Phi$ and $\Psi$ consistently by

$$\hat\Phi = \frac 1n\sum_{i=1}^n\hat{\mathcal X}_i\hat{\mathcal X}_i'\,\mathbf 1_i \quad\text{and}\quad \hat\Psi = \frac 1n\sum_{i=1}^n\hat\varepsilon_i^2\,\hat\eta_i\hat\eta_i'\,\mathbf 1_i.$$

7.4 Specification Test for Additive Models

7.4.1 Test for Additive Partially Linear Models via the Series Method

Li, Hsiao, and Zinn (2003, Journal of Econometrics) consider consistent specification tests for semiparametric/nonparametric models where the null models contain nonparametric components, based on series estimation methods. A leading case is testing for an additive partially linear model. The null hypothesis is

$$H_0:\ E(Y_i \mid Z_i, X_i) = Z_i'\gamma + \sum_{j=1}^d m_j(X_{ij})\ \text{a.s. for some}\ \gamma\in\mathcal B\ \text{and some}\ \textstyle\sum_{j=1}^d m_j(\cdot)\in\mathcal G, \qquad (7.72)$$

where $Z_i$ is a $p\times 1$ vector of regressors, $\gamma$ is a $p\times 1$ unknown parameter, the $X_{ij}$'s are the $d$ non-overlapping scalar components of $X_i$, $\mathcal B$ is a compact subset of $\mathbb R^p$, and $\mathcal G$ is the class of additive functions defined in the last section. In particular, the identification condition is $m_j(0) = 0$ for $j = 2, \ldots, d$. The alternative hypothesis is

$$H_1:\ E(Y_i \mid Z_i, X_i) \ne Z_i'\gamma + \sum_{j=1}^d m_j(X_{ij}) \qquad (7.73)$$

on a set with positive measure, for any $\gamma\in\mathcal B$ and any $\sum_{j=1}^d m_j(\cdot)\in\mathcal G$.

Let $\varepsilon_i = Y_i - Z_i'\gamma - \sum_{j=1}^d m_j(X_{ij})$ and $W_i = (Z_i', X_i')'$. The null hypothesis $H_0$ is equivalent to

$$H_0:\ E(\varepsilon_i \mid W_i) = 0\ \text{a.s.} \qquad (7.74)$$

Noting that $E(\varepsilon_i \mid W_i) = 0$ a.s. if and only if $E[\varepsilon_i\,a(W_i)] = 0$ for all $a(\cdot)\in\mathcal A$, the class of bounded measurable functions of $W_i$, Li, Hsiao, and Zinn (2003) follow Bierens and Ploberger (1997), Stute (1997), and Stinchcombe and White (1998) and consider the unconditional moment test

$$E[\varepsilon_i H(W_i, w)] = 0 \quad\text{for almost all}\ w \in \mathcal W \subset \mathbb R^{p+d}, \qquad (7.75)$$

where $H(\cdot, \cdot)$ is a proper choice of weight function such that (7.75) is equivalent to (7.74). Stinchcombe and White (1998) show that there exists a wide class of weight functions $H(\cdot, \cdot)$ that makes (7.75) equivalent to (7.74).

Choices of weight functions include the exponential function $H(W_i, w) = \exp(W_i'w)$, the logistic function $H(W_i, w) = 1/(1 + \exp(c - W_i'w))$ with $c \ne 0$, the trigonometric function $H(W_i, w) = \cos(W_i'w) + \sin(W_i'w)$, and the usual indicator function $H(W_i, w) = \mathbf 1(W_i \le w)$. See Stinchcombe and White (1998) and Bierens and Ploberger (1997) for more discussion. The advantage of switching from the conditional moment test of (7.74) to the unconditional moment test of (7.75) is that it avoids estimating the alternative model nonparametrically, as in Chen and Fan (1999) and Delgado and González Manteiga (2001).

If $\varepsilon_i$ were observable, one could construct a test based upon the sample analogue of $E[\varepsilon_i H(W_i, w)] = 0$:

$$J_{0n}(w) = \frac{1}{\sqrt n}\sum_{i=1}^n\varepsilon_i H(W_i, w). \qquad (7.76)$$

$J_{0n}(\cdot)$ can be viewed as a random element taking values in the separable space $\mathcal L^2(\mathcal W, \mu)$ of all real, Borel measurable functions on $\mathcal W$ such that $\int_{\mathcal W}g(w)^2\,d\mu(w) < \infty$ (a separable space is a topological space that has a countable dense subset; an example is the Euclidean space $\mathbb R^d$), endowed with the $L^2$ norm

$$\|g\| = \Big\{\int_{\mathcal W}g(w)^2\,d\mu(w)\Big\}^{1/2}.$$

Chen and White (1997) show that, for a sequence of IID $\mathcal L^2(\mathcal W, \mu)$-valued elements $\{Z_i(\cdot)\}$, $n^{-1/2}\sum_{i=1}^n Z_i(\cdot)$ converges weakly to $Z(\cdot)$ in the topology of $(\mathcal L^2(\mathcal W, \mu), \|\cdot\|)$ if and only if $\int_{\mathcal W}E[Z_i(w)^2]\,d\mu(w) < \infty$, where $Z$ is a Gaussian element with the same covariance function $\Omega(w, w') = E[Z_i(w)Z_i(w')]$.

Let $\sigma^2(W_i) = E(\varepsilon_i^2 \mid W_i)$. It is easy to check that for $Z_i(w) = \varepsilon_i H(W_i, w)$ we have

$$\int_{\mathcal W}E[Z_i(w)^2]\,d\mu(w) = \int_{\mathcal W}E[\sigma^2(W_i)H(W_i, w)^2]\,d\mu(w) \le \bar C^2\,E[\sigma^2(W_i)]\,\mu(\mathcal W) < \infty,$$

provided the weight function $H(\cdot, \cdot)$ is bounded above by $\bar C$ on $\mathcal W$ and $\mu(\mathcal W) < \infty$. This implies that $J_{0n}(\cdot)$ converges weakly to $J_0(\cdot)$ in $\mathcal L^2(\mathcal W, \mu)$, where $J_0(\cdot)$ is a Gaussian process centered at zero with covariance function

$$\Omega(w, w') = E[Z_i(w)Z_i(w')] = E[\sigma^2(W_i)H(W_i, w)H(W_i, w')]. \qquad (7.77)$$

Since $\varepsilon_i$ is unobservable, we replace it by a consistent estimate $\hat\varepsilon_i$ and construct the feasible test statistic

$$\hat J_n(w) = \frac{1}{\sqrt n}\sum_{i=1}^n\hat\varepsilon_i H(W_i, w). \qquad (7.78)$$

There are several ways to obtain $\hat\varepsilon_i$. One is to apply the series method of Li (2000) to obtain consistent estimates of the parameters of the additive partially linear model; another is to apply the kernel method of Fan and Li (2003). In either case, let $\hat\gamma$ be the consistent estimator of $\gamma$ and $\hat m_j(\cdot)$ the consistent estimator of $m_j(\cdot)$. Then we estimate $\varepsilon_i$ consistently by $\hat\varepsilon_i = Y_i - Z_i'\hat\gamma - \sum_{j=1}^d\hat m_j(X_{ij})$.

Now one can construct a Cramér-von Mises (CM) statistic for testing $H_0$:

$$CM_n = \int \hat J_n(w)^2\,dF_n(w) = \frac 1n\sum_{l=1}^n\hat J_n(W_l)^2,$$

where $F_n(\cdot)$ is the empirical distribution of $W_1, \ldots, W_n$. Alternatively, one can construct the Kolmogorov-Smirnov (KS) statistic

$$KS_n = \sup_{w\in\mathcal W}|\hat J_n(w)| \quad\text{or}\quad KS_n = \max_{1\le l\le n}|\hat J_n(W_l)|.$$

The following theorem states the asymptotic distributions of the test statistics.

Theorem 7.9 Under some regularity conditions and under $H_0$:

(i) $\hat J_n(\cdot)$ converges weakly to $J(\cdot)$ in $\mathcal L^2(\mathcal W, \mu)$, where $J(\cdot)$ is a Gaussian process with zero mean and covariance function $\Omega_1(w, w') = E[\sigma^2(W_i)\,\nu_i(w)\nu_i(w')]$, in which

$$\nu_i(w) = H(W_i, w) - E_{\mathcal G}[H(\cdot, w)](W_i) - \Delta(w)'u_i,$$

with $\Delta(w) = [E(u_iu_i')]^{-1}E[u_iH(W_i, w)]$, $u_i = Z_i - E_{\mathcal G}[Z_i]$, and $E_{\mathcal G}[\cdot]$ denoting the projection onto the space $\mathcal G$ of additive functions;

(ii) $CM_n \Rightarrow \int J(w)^2\,dF(w)$ and $KS_n \Rightarrow \sup_{w\in\mathcal W}|J(w)|$, where $F(\cdot)$ is the CDF of $W_i$.

Li, Hsiao, and Zinn (2003) also consider a special case of the additive partially linear model in which the nonparametric part is a vector of known functions of $X_i$; the covariance function then takes the same form, with the projection onto $\mathcal G$ replaced by the corresponding parametric projection.

The asymptotic null distribution is not pivotal. Li, Hsiao, and Zinn (2003) suggest using a residual-based wild bootstrap method to approximate the critical values of the null limiting distributions of $CM_n$ and $KS_n$. The procedure is standard, and we discuss it only briefly here, based on the method of sieves. Let $\varepsilon_i^*$ denote the wild bootstrap error generated via the two-point distribution: $\varepsilon_i^* = [(1 - \sqrt 5)/2]\hat\varepsilon_i$ with probability $(1 + \sqrt 5)/(2\sqrt 5)$ and $\varepsilon_i^* = [(1 + \sqrt 5)/2]\hat\varepsilon_i$ with probability $(\sqrt 5 - 1)/(2\sqrt 5)$. We generate $Y_i^*$ according to the null model: $Y_i^* = Z_i'\hat\gamma + \sum_{j=1}^d\hat m_j(X_{ij}) + \varepsilon_i^*$. Based on the wild bootstrap sample $\{(Y_i^*, Z_i, X_i)\}$, we re-estimate the model under the null to obtain estimates $\hat\gamma^*$ and $\hat m_j^*(\cdot)$. Let $\hat\varepsilon_i^* = Y_i^* - Z_i'\hat\gamma^* - \sum_{j=1}^d\hat m_j^*(X_{ij})$ denote the bootstrap residuals. Then the bootstrap test statistic is given by

$$\hat J_n^*(w) = \frac{1}{\sqrt n}\sum_{i=1}^n\hat\varepsilon_i^* H(W_i, w).$$

Using $\hat J_n^*(\cdot)$ we can compute a bootstrap version of the $CM_n$ statistic, i.e., $CM_n^* = \frac 1n\sum_{l=1}^n[\hat J_n^*(W_l)]^2$. The bootstrap version of the $KS_n$ statistic is defined analogously. A code sketch of this bootstrap procedure follows.
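The sketch assumes the exponential weight $H(W, w) = \exp(W'w)$ (with $W$ rescaled in practice to avoid overflow). The helper fit_null(), which re-estimates the null model and returns fitted values and residuals, is hypothetical and would wrap, e.g., the series estimator of Section 7.3.1.

```python
# Wild bootstrap for the Cramer-von Mises statistic CM_n.
import numpy as np

def cm_stat(resid, W):
    """CM_n = n^{-1} sum_l [ n^{-1/2} sum_i e_i H(W_i, W_l) ]^2 with H = exp."""
    Hmat = np.exp(W @ W.T)                       # H(W_i, W_l); rescale W first if large
    J = (resid @ Hmat) / np.sqrt(len(resid))     # J_n evaluated at each W_l
    return np.mean(J**2)

def wild_bootstrap_pvalue(W, fitted, resid, fit_null, n_boot=399, seed=0):
    rng = np.random.default_rng(seed)
    stat = cm_stat(resid, W)
    # Two-point distribution: values a, b taken with probabilities p_a, 1 - p_a,
    # so that the bootstrap error has mean 0 and variance resid_i^2.
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    p_a = (1 + np.sqrt(5)) / (2 * np.sqrt(5))
    boot = np.empty(n_boot)
    for r in range(n_boot):
        eps = resid * np.where(rng.uniform(size=len(resid)) < p_a, a, b)
        Ystar = fitted + eps                      # generate Y* under the null model
        _, resid_star = fit_null(Ystar)           # re-estimate under the null
        boot[r] = cm_stat(resid_star, W)
    return np.mean(boot >= stat)                  # bootstrap p-value
```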

7.5 Generalized Additive Partially Linear Models

In this section, we introduce Gozalo and Linton's (2001) estimation and testing of generalized additive partially linear models. Let $(Y, X, Z)$ be a random vector with $X$ of dimension $d$, $Z$ of dimension $p$, and $Y$ a scalar. Let $m(x, z) = E(Y \mid X = x, Z = z)$, and suppose $m(\cdot)$ has a generalized additive partially linear structure:

$$G_\lambda(m(x, z)) = z'\gamma + c_0 + \sum_{j=1}^d m_j(x_j), \qquad (7.79)$$

where $\{G_\lambda(\cdot):\lambda\in\Lambda\}$, $\Lambda\subset\mathbb R^k$, is a parametric family of transformations (a link function known up to the finite-dimensional parameter $\lambda$), $\gamma$ is a $p\times 1$ vector of unknown parameters, $x = (x_1, \ldots, x_d)$ collects the $d$ one-dimensional components of $X$, and the $m_j(\cdot)$, $j = 1, \ldots, d$, are one-dimensional unknown smooth functions. For identification purposes, it is convenient to assume that $E[m_j(X_j)] = 0$. For future use, let $\theta_0 = (\lambda_0, \gamma_0)$ denote the true parameter value.

There are many cases where the specification in (7.79) can arise. It allows us to nest the renowned standard logit and probit models within more flexible structures. It also nests the Box-Cox type of transformation as a special case, with $G_\lambda(m) = (m^\lambda - 1)/\lambda$.

Example 7.5 (Misclassification of a binary dependent variable) The specification in (7.79) can arise from misclassification of a binary dependent variable, as in Copas (1988) and Hausman, Abrevaya, and Scott-Morton (1998). Suppose that

$$P(Y^* = 1 \mid X = x, Z = z) = m^*(x, z) = F\Big(z'\gamma + c_0 + \sum_{j=1}^d m_j(x_j)\Big) \qquad (7.80)$$

for some known link function $F$, but that when $Y^* = 1$ we erroneously observe $Y = 0$ with probability $\pi_1$, and when $Y^* = 0$ we erroneously observe $Y = 1$ with probability $\pi_2$. Then

$$P(Y = 1 \mid X = x, Z = z) = P(Y^* = 1 \mid x, z)(1 - \pi_1) + P(Y^* = 0 \mid x, z)\,\pi_2 = \pi_2 + (1 - \pi_1 - \pi_2)\,F\Big(z'\gamma + c_0 + \sum_{j=1}^d m_j(x_j)\Big),$$

which is of the form (7.79) with $G_\lambda^{-1}(v) = \pi_2 + (1 - \pi_1 - \pi_2)F(v)$ and $\lambda = (\pi_1, \pi_2)$.

7.5.1 Estimation

To simplify the presentation, we follow Gozalo and Linton (2001) and assume that $X$ is continuous while $Z$ is discretely valued, specifically $Z \in \{z_1, \ldots, z_M\}$. Let $W = (X, Z)$, and write the values that $X$ and $W$ take as $x = (x_1, \ldots, x_d)$ and $w = (x, z)$. As before, for any $j = 1, \ldots, d$ we partition $x = (x_j, x_{\underline j})$ and $X = (X_j, X_{\underline j})$. Let $f(x\mid z)$, $f_j(x_j\mid z)$, and $f_{\underline j}(x_{\underline j}\mid z)$ denote the conditional density functions of $X$, $X_j$, and $X_{\underline j}$ given $Z = z$, respectively. Define

$$\rho_j(x\mid z) = \frac{f(x\mid z)}{f_j(x_j\mid z)\,f_{\underline j}(x_{\underline j}\mid z)}, \qquad (7.81)$$

which is a measure of the dependence between $X_j$ and $X_{\underline j}$ in the conditional distribution. Obviously, $\rho_j(x\mid z)$ lies between zero and infinity, and it equals one when $X_j$ and $X_{\underline j}$ are independent given $Z = z$.

Let $\mathcal X$ be the support of $X$ and let $\mathcal X^0$ be a rectangular strict subset of $\mathcal X$, i.e., $\mathcal X^0 = \prod_{j=1}^d\mathcal X_j^0$, where the $\mathcal X_j^0$ ($j = 1, \ldots, d$) are intervals. Let $\mathcal X_{\underline j}^0 = \prod_{k\ne j}\mathcal X_k^0$. For any $\theta = (\lambda, \gamma)$, let

$$\alpha_j^0(x_j;\theta) = \int\big\{G_\lambda(m(x_j, x_{\underline j}, z)) - z'\gamma\big\}\,dQ(x_{\underline j}, z), \qquad (7.82)$$

$$m^0(w;\theta) = G_\lambda^{-1}\Big(z'\gamma + c^0 + \sum_{j=1}^d\alpha_j^0(x_j;\theta)\Big), \qquad (7.83)$$

where $Q$ is a marginal integration measure on $\mathcal X_{\underline j}^0\times\{z_1, \ldots, z_M\}$ and $c^0 = (1 - d)\,c_Q$, with $c_Q$ the $Q$-mean of $G_\lambda(m(W)) - Z'\gamma$; the constant $c^0$ corrects for the fact that each $\alpha_j^0$ contains the intercept. When the additive structure in (7.79) is true, $\alpha_j^0(x_j;\theta_0) = c_0 + m_j(x_j)$ and $m^0(w;\theta_0) = m(w)$ for all $w$. Equations (7.82) and (7.83) form the basis of the marginal integration method of estimating additive nonparametric models. As we shall see, the estimation of the model (7.79) is similar in spirit to profile least squares: one first pretends that $\theta = (\lambda, \gamma)$ is known and estimates the additive index using the integration method, and then estimates $\theta$ by the generalized method of moments.

Estimation of $m$ for given $\theta$. We can estimate $m(w)$ in two ways. One is consistent when (7.79) holds, and the other is consistent more generally. First, we estimate $m(w)$ by the NW method

$$\hat m(w) = \frac{\sum_{i=1}^n K_h(X_i - x)\,\mathbf 1(Z_i = z)\,Y_i}{\sum_{i=1}^n K_h(X_i - x)\,\mathbf 1(Z_i = z)}, \qquad (7.84)$$

where $K_h(X_i - x) = \prod_{j=1}^d k_h(X_{ij} - x_j)$, $k_h(u) = h^{-1}k(u/h)$, $k(\cdot)$ is a one-dimensional kernel of order $q$, and $h = h(n)$ is a bandwidth sequence. The properties of $\hat m(w)$ are standard: at any interior point,

$$\sqrt{nh^d}\Big(\hat m(w) - m(w) - h^q\,b(w)\Big) \to N\Big(0,\ \kappa_{02}^d\,\frac{\sigma^2(w)}{f(w)}\Big), \qquad (7.85)$$

where $\kappa_{02} = \int k^2(u)\,du$, $\sigma^2(w) = \mathrm{Var}(Y \mid W = w)$ is the conditional variance function, and $f(w) = f(x\mid z)P(Z = z)$. The bias function $b(w)$ is the probability limit of $b_n(w)$, where

$$b_n(w) = \frac{1}{nh^q\,\hat f(w)}\sum_{i=1}^n K_h(X_i - x)\,\mathbf 1(Z_i = z)\,\{m(W_i) - m(w)\}.$$

When $m(\cdot)$ satisfies the generalized additive model structure (7.79), we can estimate $m(w)$ with a better rate of convergence by imposing the additive restrictions. The empirical versions of $\alpha_j^0(x_j;\theta)$ and $m^0(w;\theta)$ are given by

$$\tilde\alpha_j(x_j;\theta) = \frac 1n\sum_{i=1}^n\big\{G_\lambda(\hat m(x_j, X_{i,\underline j}, Z_i)) - Z_i'\gamma\big\}\,\mathbf 1(X_{i,\underline j}\in\mathcal X_{\underline j}^0), \qquad (7.86)$$

$$\tilde m(w;\theta) = G_\lambda^{-1}\Big(z'\gamma + \tilde c + \sum_{j=1}^d\tilde\alpha_j(x_j;\theta)\Big), \qquad (7.87)$$

where $\tilde c$ is the empirical counterpart of $c^0$ and a bandwidth $h_0 = h_0(n)$ is used throughout this estimation. The asymptotic properties of these estimators can be studied using methods similar to those of Linton and Härdle (1996), who derive the pointwise asymptotics of $\tilde\alpha_j(x_j;\theta)$ and $\tilde m(w;\theta)$ in the absence of discrete variables. Specifically,

$$\sqrt{nh_0}\Big(\tilde\alpha_j(x_j;\theta_0) - \alpha_j^0(x_j;\theta_0) - h_0^q\,b_j^0(x_j;\theta_0)\Big) \to N\big(0,\ v_j^0(x_j;\theta_0)\big), \qquad (7.88)$$

where $\kappa_q = \int u^qk(u)\,du$, and the bias and variance functions $b_j^0(x_j;\theta)$ and $v_j^0(x_j;\theta)$ are $Q$-weighted averages over $(x_{\underline j}, z)$ of terms involving $G_\lambda'(m(w))$ and of $[G_\lambda'(m(w))]^2\,\sigma^2(w)\,\rho_j(x\mid z)$, respectively; see Gozalo and Linton (2001) for the exact expressions.

Estimation of $\theta$. Gozalo and Linton (2001) suggest estimating the parameters $\theta = (\lambda, \gamma)$ by the generalized method of moments. That is, they choose $\theta \in \Psi$ to minimize the criterion function

$$Q_n(\theta) = \frac 1n\sum_{i=1}^n\big\|\mathbf m\big(\widetilde{\mathbf m}(W_i;\theta), Y_i\big)\big\|_A^2, \qquad (7.89)$$

where $\Psi$ is a compact parameter space in $\mathbb R^{k+p+1}$, $\mathbf m(\cdot)$ is a given moment function, $A$ is a symmetric and positive definite weighting matrix, $\widetilde{\mathbf m}(W_i;\theta) = \big(\tilde m(W_i;\theta), \tilde\alpha_1(X_{i1};\theta), \ldots, \tilde\alpha_d(X_{id};\theta)\big)$, and $\|v\|_A^2 = v'Av$ for any positive definite matrix $A$ and conformable vector $v$.

To study the asymptotic properties of the estimator $\hat\theta$ obtained by minimizing the above criterion function, we suppose that the $s\times 1$ vector ($s \ge k + p + 1$) of moment functions satisfies

$$E\big[\mathbf m\big(\mathbf m^0(W_i;\theta), Y_i\big)\big] = 0 \qquad (7.90)$$

if and only if $\theta = \theta_0$, where $\mathbf m^0(W_i;\theta) = \big(m^0(W_i;\theta), \alpha_1^0(X_{i1};\theta), \ldots, \alpha_d^0(X_{id};\theta)\big)$. For example, from a Gaussian likelihood for homoskedastic regression we obtain

$$\mathbf m\big(\mathbf m^0(W_i;\theta), Y_i\big) = \big(Y_i - m^0(W_i;\theta)\big)\,\frac{\partial m^0(W_i;\theta)}{\partial\theta}, \qquad (7.91)$$

while from a binary choice likelihood we obtain

$$\mathbf m\big(\mathbf m^0(W_i;\theta), Y_i\big) = \frac{Y_i - m^0(W_i;\theta)}{m^0(W_i;\theta)\,\big(1 - m^0(W_i;\theta)\big)}\,\frac{\partial m^0(W_i;\theta)}{\partial\theta}. \qquad (7.92)$$

Given the estimate $\hat\theta$, we can finally estimate $m(w)$ by $\tilde m(w;\hat\theta)$.

Let $G(\theta) = E\big[\partial\mathbf m(\mathbf m^0(W_i;\theta), Y_i)/\partial\theta'\big]$ and write $G = G(\theta_0)$. Under weak conditions, the sample moment $n^{-1/2}\sum_{i=1}^n\mathbf m(\widetilde{\mathbf m}(W_i;\theta_0), Y_i)$, in which $\mathbf m^0$ is replaced by its marginal integration estimate $\widetilde{\mathbf m}$, is asymptotically normal with mean zero and some finite positive definite variance matrix $V$. The following theorem states the asymptotic property of the GMM estimator $\hat\theta$.

Theorem 7.10 Under certain conditions,

$$\sqrt n\big(\hat\theta - \theta_0\big) \to N\Big(0,\ (G'AG)^{-1}G'AVAG\,(G'AG)^{-1}\Big).$$

The above theorem implies that the optimal choice of the weighting matrix is given by $A = \hat V^{-1}$, where $\hat V$ is a consistent estimate of $V$. The asymptotic variance then becomes $(G'V^{-1}G)^{-1}$.

7.5.2 Specification Test

We now test the validity of the additive specification (7.79) of the regression function $m(w)$ over a subset of interest $\mathcal J_0 \subset \mathbb R^{d+p}$ of the support of $W$. The null hypothesis is

$$H_0:\ m(w) = m^0(w;\theta_0)\ \text{for some}\ \theta_0\in\Psi\ \text{and all}\ w\in\mathcal J_0. \qquad (7.93)$$

The alternative hypothesis $H_1$ is the negation of $H_0$. Gozalo and Linton (2001) replace $\theta_0$ by the estimator $\hat\theta$, which is $\sqrt n$-consistent under the null hypothesis. Let $\Gamma$ be a family of monotonic transformations. They consider the following test statistics:

$$\hat T_{0n} = \frac 1n\sum_{i=1}^n\big[\Gamma(\hat m(W_i)) - \Gamma(\tilde m(W_i;\hat\theta))\big]^2\,\pi(W_i), \qquad (7.94)$$

$$\hat T_{1n} = \frac 1n\sum_{i=1}^n\tilde\varepsilon_i\,\big[\hat m(W_i) - \tilde m(W_i;\hat\theta)\big]\,\pi(W_i), \qquad (7.95)$$

$$\hat T_{2n} = \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{l\ne i}\tilde\varepsilon_i\,\tilde\varepsilon_l\,K_{il}\,\pi(W_i), \qquad (7.96)$$

$$\hat T_{3n} = \frac 1n\sum_{i=1}^n(\tilde\varepsilon_i - \hat\varepsilon_i)^2\,\pi(W_i), \qquad (7.97)$$

where $K_{il} = K\big((X_i - X_l)/h\big)\,\mathbf 1(Z_i = Z_l)$, $\hat\varepsilon_i = Y_i - \hat m(W_i)$ and $\tilde\varepsilon_i = Y_i - \tilde m(W_i;\hat\theta)$ are the unrestricted and restricted (additive) residuals, respectively, and $\pi(\cdot)$ is a pre-specified nonnegative weighting function that is continuous and strictly positive on $\mathcal J_0$. In (7.94), likely candidates for the function $\Gamma$ are the identity $\Gamma = I$ and the link $\Gamma = G_{\hat\lambda}$. One rejects the null for large values of $\hat T_{jn}$, $j = 0, 1, 2, 3$.

Under certain conditions, the test statistics $\hat T_{jn}$, $j = 0, 1, 2, 3$, are, after location and scale adjustments, asymptotically standard normal under the null hypothesis. That is, for each test statistic there exist random sequences $\{C_{jn}\}$ and $\{V_{jn}\}$ such that under $H_0$,

$$t_{jn} = \frac{nh^{d/2}\,\hat T_{jn} - C_{jn}}{\sqrt{V_{jn}}} \to N(0, 1), \quad j = 0, \ldots, 3. \qquad (7.98)$$

The expressions for $C_{jn}$ and $V_{jn}$ are given in Gozalo and Linton (2001). As usual, the asymptotic normal approximations here can work poorly. To implement the test, one needs to rely on the bootstrap to compute bootstrapped p-values or critical values. The procedure is standard; for more details, see Gozalo and Linton (2001).

7.6 Exercises

1. Generate $n = 100$ binary observations according to the model

$$\mathrm{logit}\{p(x)\} \equiv \log\Big\{\frac{p(x)}{1 - p(x)}\Big\} = m(x),$$

where $p(x) = P(Y = 1 \mid X = x)$, $m(\cdot)$ is a smooth nonlinear function, and $X$ is uniformly distributed on $[-1, 1]$. Fit the model $\mathrm{logit}\{p(x)\} = c_0 + m(x)$ as described in this chapter. Plot both the estimated logits and probabilities together with their true values.

2. Derive approximate formulae for the standard errors of the estimates in the previous exercise and add these to the plots above.


More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E.

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E. Forecasting Lecture 3 Structural Breaks Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, 2013 1 / 91 Bruce E. Hansen Organization Detection

More information

ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY

ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY Rosa L. Matzkin Department of Economics University of California, Los Angeles First version: May 200 This version: August 204 Abstract We introduce

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Threshold Autoregressions and NonLinear Autoregressions

Threshold Autoregressions and NonLinear Autoregressions Threshold Autoregressions and NonLinear Autoregressions Original Presentation: Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Threshold Regression 1 / 47 Threshold Models

More information

Adaptive Nonparametric Density Estimators

Adaptive Nonparametric Density Estimators Adaptive Nonparametric Density Estimators by Alan J. Izenman Introduction Theoretical results and practical application of histograms as density estimators usually assume a fixed-partition approach, where

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

IDENTIFICATION OF THE BINARY CHOICE MODEL WITH MISCLASSIFICATION

IDENTIFICATION OF THE BINARY CHOICE MODEL WITH MISCLASSIFICATION IDENTIFICATION OF THE BINARY CHOICE MODEL WITH MISCLASSIFICATION Arthur Lewbel Boston College December 19, 2000 Abstract MisclassiÞcation in binary choice (binomial response) models occurs when the dependent

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Multivariate Distributions

Multivariate Distributions Copyright Cosma Rohilla Shalizi; do not distribute without permission updates at http://www.stat.cmu.edu/~cshalizi/adafaepov/ Appendix E Multivariate Distributions E.1 Review of Definitions Let s review

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers

A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers 6th St.Petersburg Workshop on Simulation (2009) 1-3 A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers Ansgar Steland 1 Abstract Sequential kernel smoothers form a class of procedures

More information

Regression models for multivariate ordered responses via the Plackett distribution

Regression models for multivariate ordered responses via the Plackett distribution Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

On the econometrics of the Koyck model

On the econometrics of the Koyck model On the econometrics of the Koyck model Philip Hans Franses and Rutger van Oest Econometric Institute, Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR, Rotterdam, The Netherlands Econometric Institute

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Asymptotic distribution of GMM Estimator

Asymptotic distribution of GMM Estimator Asymptotic distribution of GMM Estimator Eduardo Rossi University of Pavia Econometria finanziaria 2010 Rossi (2010) GMM 2010 1 / 45 Outline 1 Asymptotic Normality of the GMM Estimator 2 Long Run Covariance

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

A Goodness-of-fit Test for Copulas

A Goodness-of-fit Test for Copulas A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

A comparison of different nonparametric methods for inference on additive models

A comparison of different nonparametric methods for inference on additive models A comparison of different nonparametric methods for inference on additive models Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik D - 44780 Bochum, Germany Carsten von Lieres und Wilkau Ruhr-Universität

More information

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006 Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club 36-825 1 Introduction Jisu Kim and Veeranjaneyulu Sadhanala In this report

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006 Analogy Principle Asymptotic Theory Part II James J. Heckman University of Chicago Econ 312 This draft, April 5, 2006 Consider four methods: 1. Maximum Likelihood Estimation (MLE) 2. (Nonlinear) Least

More information

13 Endogeneity and Nonparametric IV

13 Endogeneity and Nonparametric IV 13 Endogeneity and Nonparametric IV 13.1 Nonparametric Endogeneity A nonparametric IV equation is Y i = g (X i ) + e i (1) E (e i j i ) = 0 In this model, some elements of X i are potentially endogenous,

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

University of Pavia. M Estimators. Eduardo Rossi

University of Pavia. M Estimators. Eduardo Rossi University of Pavia M Estimators Eduardo Rossi Criterion Function A basic unifying notion is that most econometric estimators are defined as the minimizers of certain functions constructed from the sample

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Identification and Estimation of Regression Models with Misclassification

Identification and Estimation of Regression Models with Misclassification Identification and Estimation of Regression Models with Misclassification Aprajit Mahajan 1 First Version: October 1, 2002 This Version: December 1, 2005 1 I would like to thank Han Hong, Bo Honoré and

More information