Nonparametric Additive Models

Joel L. Horowitz
The Institute for Fiscal Studies, Department of Economics, UCL
cemmap working paper CWP20/12

1. INTRODUCTION

Much applied research in statistics, economics, and other fields is concerned with estimation of a conditional mean or quantile function. Specifically, let $(Y, X)$ be a random pair, where $Y$ is a scalar random variable and $X$ is a $d$-dimensional random vector that is continuously distributed. Suppose we have data consisting of the random sample $\{Y_i, X_i : i = 1, \ldots, n\}$. Then the problem is to use the data to estimate the conditional mean function $g(x) \equiv E(Y \mid X = x)$ or the conditional $\alpha$ quantile function $Q_\alpha(x)$. The latter is defined by $P[Y \le Q_\alpha(x) \mid X = x] = \alpha$ for some $\alpha$ satisfying $0 < \alpha < 1$. For example, the conditional median function is obtained if $\alpha = 0.50$.

One way to proceed is to assume that $g$ or $Q_\alpha$ is known up to a finite-dimensional parameter $\theta$, thereby obtaining a parametric model of the conditional mean or quantile function. For example, if $g$ is assumed to be linear, then $g(x) = \theta_0 + \theta_1' x$, where $\theta_0$ is a scalar constant and $\theta_1$ is a vector that is conformable with $x$. Similarly, if $Q_\alpha$ is assumed to be linear, then $Q_\alpha(x) = \theta_0 + \theta_1' x$. Given a finite-dimensional parametric model, the parameter $\theta$ can be estimated consistently by least squares in the case of the conditional mean function and by least absolute deviations in the case of the conditional median function $Q_{0.5}$. Similar methods are available for other quantiles. However, a parametric model is usually arbitrary. For example, economic theory rarely if ever provides one, and a misspecified parametric model can be seriously misleading. Therefore, it is useful to seek estimation methods that do not require assuming a parametric model for $g$ or $Q_\alpha$.

Many investigators attempt to minimize the risk of specification error by carrying out a specification search. In a specification search, several different parametric models are estimated, and conclusions are based on the one that appears to fit the data best. However, there is no guarantee that a specification search will include the correct model or a good approximation to it, and there is no guarantee that the correct model will be selected if it happens to be included in the search. Therefore, the use of specification searches should be minimized.

The possibility of specification error can be essentially eliminated through the use of nonparametric estimation methods. Nonparametric methods assume that $g$ or $Q_\alpha$ satisfies certain smoothness conditions, but no assumptions are made about the shape or functional form of $g$ or $Q_\alpha$. See, for example, Fan and Gijbels (1996), Härdle (1990), Pagan and Ullah (1999), Li and Racine (2007), and Horowitz (2009), among many other references. However, the precision of a nonparametric estimator decreases rapidly as the dimension of $X$ increases. This is called the curse of dimensionality. As a consequence of it, impracticably large samples are usually needed to obtain useful estimation precision if $X$ is multi-dimensional.

The curse of dimensionality can be avoided through the use of dimension-reduction techniques. These reduce the effective dimension of the estimation problem by making assumptions about the form of $g$ or $Q_\alpha$ that are stronger than those made by fully nonparametric estimation but weaker than those made in parametric modeling. Single-index and partially linear models (Härdle, Gao, and Liang 2000, Horowitz 2009) and nonparametric additive models, the subject of this chapter, are examples of ways of doing this. These models achieve greater estimation precision than do fully nonparametric models, and they reduce (but do not eliminate) the risk of specification error relative to parametric models.

In a nonparametric additive model, $g$ or $Q_\alpha$ is assumed to have the form

$$g(x) \ \text{or} \ Q_\alpha(x) = \mu + f_1(x^1) + f_2(x^2) + \cdots + f_d(x^d), \tag{1}$$

where $\mu$ is a constant, $x^j$ ($j = 1, \ldots, d$) is the $j$th component of the $d$-dimensional vector $x$, and $f_1, \ldots, f_d$ are functions that are assumed to be smooth but are otherwise unknown and are estimated nonparametrically. Model (1) can be extended to

$$g(x) \ \text{or} \ Q_\alpha(x) = F[\mu + f_1(x^1) + f_2(x^2) + \cdots + f_d(x^d)], \tag{2}$$

where $F$ is a strictly increasing function that may be known or unknown.

It turns out that under mild smoothness conditions, the additive components $f_1, \ldots, f_d$ can be estimated with the same precision that would be possible if $X$ were a scalar. Indeed, each additive component can be estimated as well as it could be if all the other additive components were known. This chapter reviews methods for achieving these results. Section 2 describes methods for estimating model (1). Methods for estimating model (2) with a known or unknown link function $F$ are described in Section 3. Section 4 discusses tests of additivity. Section 5 presents an empirical example that illustrates the use of model (1), and Section 6 presents conclusions.

Estimation of derivatives of the functions $f_1, \ldots, f_d$ is important in some applications. Estimation of derivatives is not discussed in this chapter but is discussed by Severance-Lossin and Sperlich (1999) and Yang, Sperlich, and Härdle (2003). The discussion in this chapter is informal. Regularity conditions and proofs of results are available in the references that are cited in the chapter. The details of the methods described here are lengthy, so most methods are presented in outline form. Details are available in the cited references.

2. METHODS FOR ESTIMATING MODEL (1)

We begin with the conditional mean version of model (1), which can be written as

$$E(Y \mid X = x) = \mu + f_1(x^1) + f_2(x^2) + \cdots + f_d(x^d). \tag{3}$$

The conditional quantile version of (1) is discussed in Section 2.1.

Equation (3) remains unchanged if a constant, say $\gamma_j$, is added to $f_j$ ($j = 1, \ldots, d$) and $\mu$ is replaced by $\mu - \sum_{j=1}^{d} \gamma_j$. Therefore, a location normalization is needed to identify $\mu$ and the additive components. Let $X^j$ denote the $j$th component of the random vector $X$. Depending on the method that is used to estimate the $f_j$'s, location normalization consists of assuming that $E f_j(X^j) = 0$ or that

$$\int f_j(v)\, dv = 0 \tag{4}$$

for each $j = 1, \ldots, d$.

Stone (1985) was the first to give conditions under which the additive components can be estimated with a one-dimensional nonparametric rate of convergence and to propose an estimator that achieves this rate. Stone (1985) assumed that the support of $X$ is $[0,1]^d$, that the probability density function of $X$ is bounded away from 0 on $[0,1]^d$, and that $\mathrm{Var}(Y \mid X = x)$ is bounded on $[0,1]^d$. He proposed using least squares to obtain spline estimators of the $f_j$'s under the location normalization $E f_j(X^j) = 0$. Let $\hat f_j$ denote the resulting estimator of $f_j$. For any function $h$ on $[0,1]$, define

$$\|h\|^2 = \int_0^1 h(v)^2\, dv.$$

Stone (1985) showed that if each $f_j$ is $p$ times differentiable on $[0,1]$, then

$$E\left(\|\hat f_j - f_j\|^2 \mid X_1, \ldots, X_n\right) = O_p\left[n^{-2p/(2p+1)}\right].$$

This is the fastest possible rate of convergence. However, Stone's result does not establish pointwise convergence of $\hat f_j$ to $f_j$ or the asymptotic distribution of $n^{p/(2p+1)}[\hat f_j(x) - f_j(x)]$.

Since the work of Stone (1985), there have been many attempts to develop estimators of the $f_j$'s that are pointwise consistent with the optimal rate of convergence and are asymptotically normally distributed. Oracle efficiency is another desirable property of such estimators. Oracle efficiency means that the asymptotic distribution of the estimator of any additive component $f_j$ is the same as it would be if the other components were known.

Buja, Hastie, and Tibshirani (1989) and Hastie and Tibshirani (1990) proposed an estimation method called backfitting. This method is based on the observation that

$$f_k(x^k) = E\Big[Y - \mu - \sum_{j \ne k} f_j(x^j) \,\Big|\, X = (x^1, \ldots, x^d)\Big].$$

If $\mu$ and the $f_j$'s for $j \ne k$ were known, then $f_k$ could be estimated by applying nonparametric regression to $Y - \mu - \sum_{j \ne k} f_j(X^j)$. Backfitting replaces the unknown quantities by preliminary estimates. Then each additive component is estimated by nonparametric regression, and the preliminary estimates are updated as each additive component is estimated. In principle, this process continues until convergence is achieved. Backfitting is implemented in many statistical software packages, but theoretical investigation of the statistical properties of backfitting estimators is difficult. This is because these estimators are outcomes of an iterative process, not the solutions to optimization problems or systems of equations. Opsomer and Ruppert (1997) and Opsomer (2000) investigated the properties of a version of backfitting and found, among other things, that strong restrictions on the distribution of $X$ were necessary to achieve results and that the estimators are not oracle efficient. Other methods described below are oracle efficient and have additional desirable properties. Compared to these estimators, backfitting is not a desirable approach, despite its intuitive appeal and availability in statistical software packages.

The first estimator of the $f_j$'s that was proved to be pointwise consistent and asymptotically normally distributed was developed by Linton and Nielsen (1995) and extended by Linton and Härdle (1996). Tjøstheim and Auestad (1994) and Newey (1994) present similar ideas. The method is called marginal integration and is based on the observation that under the location normalization $E f_j(X^j) = 0$, $\mu = E(Y)$ and

$$f_j(x^j) = \int E(Y \mid X = x)\, p_{-j}(x^{(-j)})\, dx^{(-j)} - \mu, \tag{5}$$

where $x^{(-j)}$ is the vector consisting of all components of $x$ except $x^j$ and $p_{-j}$ is the probability density function of $X^{(-j)}$. The constant $\mu$ is estimated consistently by the sample analog

$$\hat\mu = n^{-1} \sum_{i=1}^{n} Y_i.$$

To estimate, say, $f_1(x^1)$, let $\hat g(x^1, x^{(-1)})$ be the following kernel estimator of $E(Y \mid X^1 = x^1, X^{(-1)} = x^{(-1)})$:

$$\hat g(x^1, x^{(-1)}) = \hat P(x^1, x^{(-1)})^{-1} \sum_{i=1}^{n} Y_i\, K_1\!\left(\frac{x^1 - X_i^1}{h_1}\right) K_2\!\left(\frac{x^{(-1)} - X_i^{(-1)}}{h_2}\right), \tag{6}$$

where

$$\hat P(x^1, x^{(-1)}) = \sum_{i=1}^{n} K_1\!\left(\frac{x^1 - X_i^1}{h_1}\right) K_2\!\left(\frac{x^{(-1)} - X_i^{(-1)}}{h_2}\right), \tag{7}$$

$K_1$ is a kernel function of a scalar argument, $K_2$ is a kernel function of a $(d-1)$-dimensional argument, $X_i^{(-1)}$ is the $i$th observation of $X^{(-1)}$, and $h_1$ and $h_2$ are bandwidths. The integral on the right-hand side of (5) is the average of $E(Y \mid X^1 = x^1, X^{(-1)} = x^{(-1)})$ over $X^{(-1)}$ and can be estimated by the sample average of $\hat g(x^1, X^{(-1)})$. The resulting marginal integration estimator of $f_1$ is

$$\hat f_1(x^1) = n^{-1} \sum_{i=1}^{n} \hat g(x^1, X_i^{(-1)}) - \hat\mu.$$

Linton and Härdle (1996) give conditions under which

$$n^{2/5}[\hat f_1(x^1) - f_1(x^1)] \xrightarrow{d} N[\beta_{1,MI}(x^1), V_{1,MI}(x^1)]$$

for suitable functions $\beta_{1,MI}$ and $V_{1,MI}$. Similar results hold for the marginal integration estimators of the other additive components. The most important condition is that each additive component is at least $d$ times continuously differentiable. This condition implies that the marginal integration estimator has a form of the curse of dimensionality, because maintaining an $n^{-2/5}$ rate of convergence in probability requires the smoothness of the additive components to increase as $d$ increases. In addition, the marginal integration estimator is not oracle efficient and can be hard to compute.

There have been several refinements of the marginal integration estimator that attempt to overcome these difficulties. See, for example, Linton (1997), Kim, Linton, and Hengartner (1999), and Hengartner and Sperlich (2005). Some of these refinements overcome the curse of dimensionality, and others achieve oracle efficiency. However, none of the refinements is both free of the curse of dimensionality and oracle efficient.

The marginal integration estimator has a curse of dimensionality because, as can be seen from (6) and (7), it requires full-dimensional nonparametric estimation of $E(Y \mid X = x)$ and the probability density function of $X$. The curse of dimensionality can be avoided by imposing additivity at the outset of estimation, thereby avoiding the need for full-dimensional nonparametric estimation.
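
The marginal integration estimator of $f_1$ built from (5), (6), and (7) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the cited papers: the Gaussian product kernel, the single bandwidth `h` shared by $K_1$ and $K_2$, the function names, and the simulated design are all assumptions made for the example.

```python
import numpy as np

def product_kernel_weights(x, X, h):
    """Gaussian product-kernel weights over all d coordinates.

    The normalizing constant is omitted because it cancels in the ratio (6)/(7).
    """
    u = (x - X) / h
    return np.exp(-0.5 * np.sum(u ** 2, axis=1))

def g_hat(x, X, Y, h):
    """Full-dimensional kernel estimate of E(Y | X = x), in the spirit of (6)-(7)."""
    w = product_kernel_weights(x, X, h)
    return np.sum(w * Y) / np.sum(w)

def marginal_integration_f1(x1, X, Y, h):
    """Average g_hat(x1, X_i^(-1)) over the sample, then subtract the mean of Y."""
    points = X.copy()
    points[:, 0] = x1                      # hold the first coordinate fixed at x1
    g_values = np.array([g_hat(p, X, Y, h) for p in points])
    return g_values.mean() - Y.mean()

# Simulated data (hypothetical design): mu = 1, f_1(x) = x, f_2(x) = x^2 - 1/3,
# with X uniform on [-1, 1]^2 so that E f_j(X^j) = 0.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 2))
Y = 1.0 + X[:, 0] + (X[:, 1] ** 2 - 1.0 / 3.0) + 0.1 * rng.standard_normal(n)

est_pos = marginal_integration_f1(0.5, X, Y, h=0.2)
est_neg = marginal_integration_f1(-0.5, X, Y, h=0.2)
print(est_pos, est_neg)   # the true values are f_1(0.5) = 0.5 and f_1(-0.5) = -0.5
```

Every evaluation of `g_hat` is a $d$-dimensional kernel regression, which is where the full-dimensional smoothing discussed above enters the computation.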
Imposing additivity at the outset cannot be done with kernel-based estimators, such as those used in marginal integration, but it can be done easily with series estimators. However, it is hard to establish the asymptotic distributional properties of series estimators. Horowitz and Mammen (2004) proposed a two-step estimation procedure that overcomes this problem. The first step of the procedure is series estimation of the $f_j$'s. This is followed by a backfitting step that turns the series estimates into kernel estimates that are both oracle efficient and free of the curse of dimensionality.

Horowitz and Mammen (2004) use the location normalization (4) and assume that the support of $X$ is $[-1,1]^d$. Let $\{\psi_k : k = 1, 2, \ldots\}$ be an orthonormal basis for smooth functions on $[-1,1]$ that satisfies (4). The first step of the Horowitz-Mammen (2004) procedure consists of using least squares to estimate $\mu$ and the generalized Fourier coefficients $\{\theta_{jk}\}$ in the series approximation

$$E(Y \mid X = x) \approx \mu + \sum_{j=1}^{d} \sum_{k=1}^{\kappa} \theta_{jk}\, \psi_k(x^j), \tag{8}$$

where $\kappa$ is the length of the series approximations to the additive components. In this approximation, $f_j$ is approximated by

$$f_j(x^j) \approx \sum_{k=1}^{\kappa} \theta_{jk}\, \psi_k(x^j).$$

Thus, the estimators of $\mu$ and the $\theta_{jk}$'s are given by

$$\{\tilde\mu, \tilde\theta_{jk} : j = 1, \ldots, d;\ k = 1, \ldots, \kappa\} = \arg\min_{\mu, \theta_{jk}} \sum_{i=1}^{n} \Big[Y_i - \mu - \sum_{j=1}^{d} \sum_{k=1}^{\kappa} \theta_{jk}\, \psi_k(X_i^j)\Big]^2,$$

where $X_i^j$ is the $j$th component of the vector $X_i$. Let $\tilde\mu$ and $\tilde f_j$ denote the resulting estimators of $\mu$ and $f_j$ ($j = 1, \ldots, d$). That is,

$$\tilde f_j(x^j) = \sum_{k=1}^{\kappa} \tilde\theta_{jk}\, \psi_k(x^j).$$
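
The first-step least squares problem above is an ordinary linear regression on basis functions, as the following sketch shows. It assumes one particular basis that meets the stated requirements, namely Legendre polynomials of degree $k \ge 1$ scaled by $\sqrt{(2k+1)/2}$, which are orthonormal on $[-1,1]$ and integrate to zero. The function names and the simulated data are hypothetical and are not taken from Horowitz and Mammen (2004).

```python
import numpy as np
from numpy.polynomial import legendre

def psi(x, kappa):
    """Return the matrix with columns psi_1(x), ..., psi_kappa(x).

    psi_k is the degree-k Legendre polynomial times sqrt((2k + 1) / 2), so the
    functions are orthonormal on [-1, 1] and satisfy normalization (4).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    columns = []
    for k in range(1, kappa + 1):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0                    # select the degree-k Legendre polynomial
        columns.append(np.sqrt((2 * k + 1) / 2.0) * legendre.legval(x, coeffs))
    return np.column_stack(columns)

def first_step(X, Y, kappa):
    """Least squares estimates of mu and the d-by-kappa coefficient array theta."""
    n, d = X.shape
    Z = np.column_stack([np.ones(n)] + [psi(X[:, j], kappa) for j in range(d)])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef[0], coef[1:].reshape(d, kappa)

def f_tilde(x, theta_j):
    """Series estimate of one additive component at the points x."""
    return psi(x, theta_j.shape[0]) @ theta_j

# Simulated data (hypothetical): mu = 1, f_1(x) = x, f_2(x) = x^2 - 1/3.
rng = np.random.default_rng(1)
n = 500
X = rng.uniform(-1.0, 1.0, size=(n, 2))
Y = 1.0 + X[:, 0] + (X[:, 1] ** 2 - 1.0 / 3.0) + 0.1 * rng.standard_normal(n)

mu_tilde, theta = first_step(X, Y, kappa=3)
print(mu_tilde, f_tilde(0.5, theta[0]))   # compare with mu = 1 and f_1(0.5) = 0.5
```

In this design $f_1 = \sqrt{2/3}\,\psi_1$ and $f_2 = (2/3)\sqrt{2/5}\,\psi_2$, so the fitted coefficients can be compared directly with known values.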

Now let $K$ and $h$, respectively, denote a kernel function and a bandwidth. The second-step estimator of, say, $f_1$ is

$$\hat f_1(x^1) = \left[\sum_{i=1}^{n} K\!\left(\frac{x^1 - X_i^1}{h}\right)\right]^{-1} \sum_{i=1}^{n} [Y_i - \tilde f_{-1}(X_i^{(-1)})]\, K\!\left(\frac{x^1 - X_i^1}{h}\right), \tag{9}$$

where $X_i^{(-1)}$ is the vector consisting of the $i$th observations of all components of $X$ except the first and $\tilde f_{-1} = \tilde f_2 + \cdots + \tilde f_d$. In other words, $\hat f_1$ is the kernel nonparametric regression of $Y - \tilde f_{-1}(X^{(-1)})$ on $X^1$. Horowitz and Mammen (2004) give conditions under which

$$n^{2/5}[\hat f_1(x^1) - f_1(x^1)] \xrightarrow{d} N[\beta_{1,HM}(x^1), V_{1,HM}(x^1)]$$

for suitable functions $\beta_{1,HM}$ and $V_{1,HM}$. Horowitz and Mammen (2004) also show that the second-step estimator is free of the curse of dimensionality and oracle efficient. Freedom from the curse of dimensionality means that the $f_j$'s need to have only two continuous derivatives, regardless of $d$. Oracle efficiency means that the asymptotic distribution of $n^{2/5}[\hat f_1(x^1) - f_1(x^1)]$ is the same as it would be if the estimator $\tilde f_{-1}$ in (9) were replaced with the true (but unknown) sum of additive components, $f_{-1}$. Similar results apply to the second-step estimators of the other additive components. Thus, asymptotically, each additive component $f_j$ can be estimated as well as it could be if the other components were known. Intuitively, the method works because the bias due to truncating the series approximations to the $f_j$'s in the first estimation step can be made negligibly small by making $\kappa$ increase at a sufficiently rapid rate as $n$ increases. This increases the variance of the $\tilde f_j$'s, but the variance is reduced in the second estimation step because this step includes averaging over the $\tilde f_j$'s. Averaging reduces the variance enough to enable the second-step estimates to have an $n^{-2/5}$ rate of convergence in probability.

There is also a local linear version of the second-step estimator. For estimating $f_1$, this consists of choosing $b_0$ and $b_1$ to minimize

$$S_n(b_0, b_1) = (nh)^{-1} \sum_{i=1}^{n} [Y_i - \tilde\mu - b_0 - b_1(X_i^1 - x^1) - \tilde f_{-1}(X_i^{(-1)})]^2\, K\!\left(\frac{X_i^1 - x^1}{h}\right).$$

Let $(\hat b_0, \hat b_1)$ denote the resulting value of $(b_0, b_1)$. The local linear second-step estimator of $f_1(x^1)$ is $\hat f_1(x^1) = \hat b_0$. The local linear estimator is pointwise consistent, asymptotically normal, oracle efficient, and free of the curse of dimensionality. However, the mean and variance of the asymptotic distribution of the local linear estimator are different from those of the Nadaraya-Watson (or local constant) estimator (9). Fan and Gijbels (1996) discuss the relative merits of local linear and Nadaraya-Watson estimators.

Mammen, Linton, and Nielsen (1999) developed an asymptotically normal, oracle-efficient estimation procedure for model (1) that consists of solving a certain set of integral equations. Wang and Yang (2007) generalized the two-step method of Horowitz and Mammen (2004) to autoregressive time-series models. Their model is

$$Y_t = \mu + f_1(X_t^1) + \cdots + f_d(X_t^d) + \sigma(X_t^1, \ldots, X_t^d)\, \varepsilon_t; \quad t = 1, 2, \ldots,$$

where $X_t^j$ is the $j$th component of the $d$-vector $X_t$, $E(\varepsilon_t \mid X_t) = 0$, and $E(\varepsilon_t^2 \mid X_t) = 1$. The explanatory variables $\{X_t^j : j = 1, \ldots, d\}$ may include lagged values of the dependent variable $Y_t$. The random vector $(X_t, \varepsilon_t)$ is required to satisfy a strong mixing condition, and the additive components have two derivatives. Wang and Yang (2007) propose an estimator that is like that of Horowitz and Mammen (2004), except the first step uses a spline basis that is not necessarily orthogonal. Wang and Yang (2007) show that their estimator of each additive component is pointwise asymptotically normal with an $n^{-2/5}$ rate of convergence in probability. Thus, the estimator is free of the curse of dimensionality. It is also oracle efficient. Nielsen and Sperlich (2005) and Wang and Yang (2007) discuss computation of some of the foregoing estimators.

Song and Yang (2010) describe a different two-step procedure for obtaining oracle efficient estimators with time-series data. Like Wang and Yang (2007), Song and Yang (2010) consider a nonparametric, additive, autoregressive model in which the covariates and random noise component satisfy a strong mixing condition. The first estimation step consists of using least squares to make a constant-spline approximation to the additive components. The second step is like that of Horowitz and Mammen (2004) and Wang and Yang (2007), except a linear spline estimator replaces the kernel estimator of those papers. Most importantly, Song and Yang (2010) obtain asymptotic uniform confidence bands for the additive components. They also report that their two-stage spline estimator can be computed much more rapidly than procedures that use kernel-based estimation in the second step. Horowitz and Mammen (2004) and Wang and Yang (2007) obtained pointwise asymptotic normality for their estimators but did not obtain uniform confidence bands for the additive components. However, the estimators of Horowitz and Mammen (2004) and Wang and Yang (2007) are, essentially, kernel estimators. Therefore, these estimators are multivariate normally distributed over a grid of points that are sufficiently far apart. It is likely that uniform confidence bands based on the kernel-type estimators can be obtained by taking advantage of this multivariate normality and letting the spacing of the grid points decrease slowly as $n$ increases.
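
The two-step idea for the conditional mean version of model (1) can be illustrated with a short sketch that compares a feasible second-step estimate with its oracle counterpart, which uses the true sum of the other components. Several choices here are assumptions made for illustration only: the Gaussian kernel, unnormalized Legendre terms in the first step (rescaling the basis does not change the fitted components), the simulated design, and the function names. The sketch also subtracts the first-step constant $\tilde\mu$ from the partial residual, as in the local linear criterion.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def series_first_step(X, Y, kappa):
    """Least squares on a constant plus Legendre terms of degree 1..kappa per covariate.

    Returns mu_tilde and an n-by-d array whose column j holds f_tilde_j(X_i^j).
    """
    n, d = X.shape
    blocks = [legvander(X[:, j], kappa)[:, 1:] for j in range(d)]   # drop constant
    Z = np.column_stack([np.ones(n)] + blocks)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    fitted = [blocks[j] @ coef[1 + j * kappa: 1 + (j + 1) * kappa] for j in range(d)]
    return coef[0], np.column_stack(fitted)

def second_step_nw(x1, X1, partial_resid, h):
    """Nadaraya-Watson regression of the partial residual on X^1, evaluated at x1."""
    w = np.exp(-0.5 * ((x1 - X1) / h) ** 2)
    return np.sum(w * partial_resid) / np.sum(w)

# Simulated data (hypothetical): mu = 0 and three additive components.
rng = np.random.default_rng(2)
n, kappa, h = 1000, 4, 0.15
X = rng.uniform(-1.0, 1.0, size=(n, 3))
f1 = np.sin(np.pi * X[:, 0] / 2.0)
f_rest = (X[:, 1] ** 2 - 1.0 / 3.0) + X[:, 2]
Y = f1 + f_rest + 0.2 * rng.standard_normal(n)

mu_tilde, comps = series_first_step(X, Y, kappa)

# Feasible estimator: remove the first-step estimates of the other components.
feasible = second_step_nw(0.5, X[:, 0], Y - mu_tilde - comps[:, 1:].sum(axis=1), h)
# Oracle estimator: remove the true (normally unknown) other components.
oracle = second_step_nw(0.5, X[:, 0], Y - f_rest, h)

print(feasible, oracle, np.sin(np.pi / 4.0))
```

In this simulated design the feasible and oracle estimates should be close to each other, which is the finite-sample counterpart of the oracle efficiency property described above.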

2.1 Estimating a Conditional Quantile Function

This section describes estimation of the conditional quantile version of (1). The discussion concentrates on estimation of the conditional median function, but the methods and results also apply to other quantiles. Model (1) for the conditional median function can be estimated using series methods or backfitting, but the rates of convergence and other asymptotic distributional properties of these estimators are unknown. De Gooijer and Zerom (2003) proposed a marginal integration estimator. Like the marginal integration estimator for a conditional mean function, the marginal integration estimator for a conditional median or other conditional quantile function is asymptotically normally distributed but suffers from the curse of dimensionality.

Horowitz and Lee (2005) proposed a two-step estimation procedure that is similar to that of Horowitz and Mammen (2004) for conditional mean functions. The two-step method is oracle efficient and has no curse of dimensionality. The first step of the method of Horowitz and Lee (2005) consists of using least absolute deviations (LAD) to estimate $\mu$ and the $\theta_{jk}$'s in the series approximation (8). That is,

$$\{\tilde\mu, \tilde\theta_{jk} : j = 1, \ldots, d;\ k = 1, \ldots, \kappa\} = \arg\min_{\mu, \theta_{jk}} \sum_{i=1}^{n} \Big|Y_i - \mu - \sum_{j=1}^{d} \sum_{k=1}^{\kappa} \theta_{jk}\, \psi_k(X_i^j)\Big|.$$

As before, let $\tilde f_j$ denote the first-step estimator of $f_j$. The second step of the method of Horowitz and Lee (2005) is a form of local-linear LAD estimation that is analogous to the second step of the method of Horowitz and Mammen (2004). For estimating $f_1$, this step consists of choosing $b_0$ and $b_1$ to minimize

$$S_n(b_0, b_1) = (nh)^{-1} \sum_{i=1}^{n} \big|Y_i - \tilde\mu - b_0 - b_1(X_i^1 - x^1) - \tilde f_{-1}(X_i^{(-1)})\big|\, K\!\left(\frac{X_i^1 - x^1}{h}\right),$$

where $h$ is a bandwidth, $K$ is a kernel function, and $\tilde f_{-1} = \tilde f_2 + \cdots + \tilde f_d$. Let $(\hat b_0, \hat b_1)$ denote the resulting value of $(b_0, b_1)$. The estimator of $f_1(x^1)$ is $\hat f_1(x^1) = \hat b_0$. Thus, the second-step estimator of any additive component is a local linear conditional median estimator. Horowitz and Lee (2005) give conditions under which

$$n^{2/5}[\hat f_1(x^1) - f_1(x^1)] \xrightarrow{d} N[\beta_{1,HL}(x^1), V_{1,HL}(x^1)]$$

for suitable functions $\beta_{1,HL}$ and $V_{1,HL}$. Horowitz and Lee (2005) also show that $\hat f_1$ is free of the curse of dimensionality and oracle efficient. Similar results apply to the estimators of the other $f_j$'s.

3. METHODS FOR ESTIMATING MODEL (2)

This section describes methods for estimating model (2) when the link function $F$ is not the identity function. Among other applications, this permits extension of methods for nonparametric additive modeling to settings in which $Y$ is binary. For example, an additive binary probit model is obtained by setting

$$P(Y = 1 \mid X = x) = \Phi[\mu + f_1(x^1) + \cdots + f_d(x^d)], \tag{10}$$

where $\Phi$ is the standard normal distribution function. In this case, the link function is $F = \Phi$. A binary logit model is obtained by replacing $\Phi$ in (10) with the logistic distribution function.

Section 3.1 treats the case in which $F$ is known. Section 3.2 treats bandwidth selection for one of the methods discussed in Section 3.1. Section 3.3 discusses estimation when $F$ is unknown.
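
To make the binary-response setting in (10) concrete, the following sketch evaluates the probit and logit links on an additive index and draws binary outcomes from the probit version. The component functions, the value of $\mu$, the sample size, and all names are hypothetical choices made for illustration.

```python
import math
import numpy as np

def probit_link(v):
    """Standard normal distribution function Phi(v), computed from the error function."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(v) / math.sqrt(2.0)))

def logit_link(v):
    """Logistic distribution function, the link of the additive logit model."""
    return 1.0 / (1.0 + np.exp(-np.asarray(v)))

def additive_index(X, mu, components):
    """mu + f_1(x^1) + ... + f_d(x^d) for each row of X."""
    return mu + sum(f(X[:, j]) for j, f in enumerate(components))

# Hypothetical additive components on [-1, 1].
components = [lambda x: x, lambda x: x ** 2 - 1.0 / 3.0]

rng = np.random.default_rng(3)
n = 20000
X = rng.uniform(-1.0, 1.0, size=(n, 2))
index = additive_index(X, 0.2, components)

p = probit_link(index)                         # P(Y = 1 | X) as in (10)
Y = (rng.uniform(size=n) < p).astype(int)      # binary outcomes drawn from (10)

print(Y.mean(), p.mean())                      # the two averages should be close
```

Replacing `probit_link` with `logit_link` in the line that defines `p` gives the additive logit model instead.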

3.1 Estimation with a Known Link Function

In this section, it is assumed that the link function $F$ is known. A necessary condition for point identification of $\mu$ and the $f_j$'s is that $F$ is strictly monotonic. Given this requirement, it can be assumed without loss of generality that $F$ is strictly increasing. Consequently, $F^{-1}[Q_\alpha(x)]$ is the $\alpha$ conditional quantile of $F^{-1}(Y)$ and has a nonparametric additive form. Therefore, quantile estimation of the additive components of model (2) can be carried out by applying the methods of Section 2.1 to $F^{-1}(Y)$. Accordingly, the remainder of this section is concerned with estimating the conditional mean version of model (2).

Linton and Härdle (1996) describe a marginal integration estimator of the additive components in model (2). As in the case of model (1), the marginal integration estimator has a curse of dimensionality and is not oracle efficient. The two-step method of Horowitz and Mammen (2004) is also applicable to model (2). When $F$ has a Lipschitz continuous second derivative and the additive components are twice continuously differentiable, it yields asymptotically normal, oracle efficient estimators of the additive components. The estimators have an $n^{-2/5}$ rate of convergence in probability and no curse of dimensionality.

The first step of the method of Horowitz and Mammen (2004) is nonlinear least squares estimation of truncated series approximations to the additive components. That is, the generalized Fourier coefficients of the approximations are estimated by solving

$$\{\tilde\mu, \tilde\theta_{jk} : j = 1, \ldots, d;\ k = 1, \ldots, \kappa\} = \arg\min_{\mu, \theta_{jk}} \sum_{i=1}^{n} \Big\{Y_i - F\Big[\mu + \sum_{j=1}^{d} \sum_{k=1}^{\kappa} \theta_{jk}\, \psi_k(X_i^j)\Big]\Big\}^2.$$

Now set

$$\tilde f_j(x^j) = \sum_{k=1}^{\kappa} \tilde\theta_{jk}\, \psi_k(x^j).$$

A second-step estimator of $f_1(x^1)$, say $\bar f_1(x^1)$, can be obtained by setting

$$\bar f_1(x^1) = \arg\min_{b} \sum_{i=1}^{n} \Big\{Y_i - F\Big[\tilde\mu + b + \sum_{j=2}^{d} \tilde f_j(X_i^j)\Big]\Big\}^2 K\!\left(\frac{x^1 - X_i^1}{h}\right),$$

where, as before, $K$ is a kernel function and $h$ is a bandwidth. However, this requires solving a difficult nonlinear optimization problem. An asymptotically equivalent estimator can be obtained by taking one Newton step from $b_0 = \tilde f_1(x^1)$ toward $\bar f_1(x^1)$. To do this, define

$$S'_{n1}(x^1, \tilde f) = -2 \sum_{i=1}^{n} \{Y_i - F[\tilde\mu + \tilde f_1(x^1) + \tilde f_2(X_i^2) + \cdots + \tilde f_d(X_i^d)]\}\, F'[\tilde\mu + \tilde f_1(x^1) + \tilde f_2(X_i^2) + \cdots + \tilde f_d(X_i^d)]\, K\!\left(\frac{x^1 - X_i^1}{h}\right)$$

and

$$S''_{n1}(x^1, \tilde f) = 2 \sum_{i=1}^{n} F'[\tilde\mu + \tilde f_1(x^1) + \tilde f_2(X_i^2) + \cdots + \tilde f_d(X_i^d)]^2\, K\!\left(\frac{x^1 - X_i^1}{h}\right) - 2 \sum_{i=1}^{n} \{Y_i - F[\tilde\mu + \tilde f_1(x^1) + \tilde f_2(X_i^2) + \cdots + \tilde f_d(X_i^d)]\}\, F''[\tilde\mu + \tilde f_1(x^1) + \tilde f_2(X_i^2) + \cdots + \tilde f_d(X_i^d)]\, K\!\left(\frac{x^1 - X_i^1}{h}\right).$$

These are the first and second derivatives with respect to $b$ of the second-step objective function, evaluated at $b = \tilde f_1(x^1)$. The second-step estimator is

$$\hat f_1(x^1) = \tilde f_1(x^1) - S'_{n1}(x^1, \tilde f) / S''_{n1}(x^1, \tilde f).$$

Horowitz and Mammen (2004) also describe a local-linear version of this estimator.
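
The one-step Newton update can be sketched as follows for a logistic link, whose first and second derivatives have closed forms. To keep the example short, the first-step quantities are replaced by stand-ins: the true remaining component and a deliberately perturbed starting value for $f_1(x^1)$. These stand-ins, the Gaussian kernel, the simulated design, and the function names are assumptions for illustration and are not part of the procedure in Horowitz and Mammen (2004).

```python
import numpy as np

def F(v):
    """Logistic link, assumed known."""
    return 1.0 / (1.0 + np.exp(-v))

def F_prime(v):
    p = F(v)
    return p * (1.0 - p)

def F_second(v):
    p = F(v)
    return p * (1.0 - p) * (1.0 - 2.0 * p)

def newton_second_step(x1, X1, Y, mu_tilde, f1_start, f_rest_tilde, h):
    """One Newton step from f1_start using the first and second derivatives in b."""
    k = np.exp(-0.5 * ((x1 - X1) / h) ** 2)          # Gaussian kernel weights
    index = mu_tilde + f1_start + f_rest_tilde       # link argument at b = f1_start
    resid = Y - F(index)
    s_first = -2.0 * np.sum(resid * F_prime(index) * k)
    s_second = (2.0 * np.sum(F_prime(index) ** 2 * k)
                - 2.0 * np.sum(resid * F_second(index) * k))
    return f1_start - s_first / s_second

# Simulated data (hypothetical): mu = 0, f_1(x) = x, f_2(x) = x^2 - 1/3.
rng = np.random.default_rng(4)
n, h, x1 = 2000, 0.2, 0.3
X = rng.uniform(-1.0, 1.0, size=(n, 2))
f2 = X[:, 1] ** 2 - 1.0 / 3.0
Y = F(X[:, 0] + f2) + 0.1 * rng.standard_normal(n)

start = x1 + 0.2     # stand-in for a first-step estimate of f_1(x1), off by 0.2
f1_hat = newton_second_step(x1, X[:, 0], Y, 0.0, start, f2, h)
print(start, f1_hat)   # the Newton step should move the estimate toward f_1(0.3) = 0.3
```

Because the update needs only weighted sums over the sample, it avoids the iterative nonlinear optimization that the exact minimizer would require.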

Liu, Yang, an Härle (20) escribe a two-step estimation metho for moel (2) that is analogous to the metho of Wang an Yang (2007) but uses a local pseuo log-likelihoo obective function base on the exponential family at each estimation stage instea of a local least squares obective function. As in Wang an Yang (2007), the metho of Liu, Yang, an Härle (20) applies to an autoregressive moel in which the covariates an ranom noise satisfy a strong mixing conition. Yu, Park, an Mammen (2008) propose an estimation metho for moel (2) that is base on numerically solving a system of nonlinear integral equations. The metho is more complicate than that of Horowitz an Mammen (2004), but the results of Monte Carlo experiments suggest that the estimator of Yu, Park, an Mammen (2008) has better finite-sample properties than that of Horowitz an Mammen (2004), especially when the covariates are highly correlate. 3.2 Banwith Selection for the Two-Step Estimatorof Horowitz an Mammen (2004) This section escribes a penalize least squares (PLS) metho for choosing the banwith h in the secon step of the proceure of Horowitz an Mammen (2004). The metho is escribe here for the local-linear version of the metho, but similar results apply to the local constant version. The metho escribe in this section can be use with moel () by setting F equal to the ientity function. The PLS metho simultaneously estimates the banwiths for secon-step estimation of all the aitive components f ( =,..., ). Let h /5 = Cn be the banwith for ˆ f. The PLS metho selects the C s that minimize an estimate of the average square error (ASE): 6

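$$\mathrm{ASE}(h) = n^{-1} \sum_{i=1}^{n} \{F[\tilde\mu + \hat f(X_i)] - F[\mu + f(X_i)]\}^2,$$

where $\hat f = \hat f_1 + \dots + \hat f_d$ and $h = (C_1 n^{-1/5}, \dots, C_d n^{-1/5})$. Specifically, the PLS method selects the $C_j$'s to

$$\underset{C_1, \dots, C_d}{\text{minimize}}: \ \mathrm{PLS}(h) = n^{-1} \sum_{i=1}^{n} \{Y_i - F[\tilde\mu + \hat f(X_i)]\}^2 + 2K(0)\, n^{-1} \sum_{i=1}^{n} \left\{ F'[\tilde\mu + \hat f(X_i)]^2\, \hat V(X_i) \right\} \sum_{j=1}^{d} [n^{4/5} C_j \hat D_j(X_i^j)]^{-1}, \tag{11}$$

where the $C_j$'s are restricted to a compact, positive interval that excludes 0,

$$\hat D_j(x^j) = (n h_j)^{-1} \sum_{i=1}^{n} K\!\left(\frac{X_i^j - x^j}{h_j}\right) F'[\tilde\mu + \hat f(X_i)]^2,$$

and

$$\hat V(x) = \left[ \sum_{i=1}^{n} K\!\left(\frac{X_i^1 - x^1}{h_1}\right) \cdots K\!\left(\frac{X_i^d - x^d}{h_d}\right) \right]^{-1} \sum_{i=1}^{n} K\!\left(\frac{X_i^1 - x^1}{h_1}\right) \cdots K\!\left(\frac{X_i^d - x^d}{h_d}\right) \{Y_i - F[\tilde\mu + \hat f(X_i)]\}^2.$$

The bandwidths for $\hat V$ may be different from those used for $\hat f$, because $\hat V$ is a full-dimensional nonparametric estimator. Horowitz and Mammen (2004) present arguments showing that the solution to (11) estimates the bandwidths that minimize ASE.

3.3 Estimation with an Unknown Link Function

This section is concerned with estimating model (2) when the link function $F$ is unknown. When $F$ is unknown, model (2) contains semiparametric single-index models as a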
special case. This is important, because semiparametric single-index models and nonparametric additive models with known link functions are non-nested. In a semiparametric single-index model, $E(Y \mid X = x) = G(\theta' x)$ for some unknown function $G$ and parameter vector $\theta$. This model coincides with the nonparametric additive model with link function $F$ only if the additive components are linear and $F = G$. An applied researcher must choose between the two models and may obtain highly misleading results if an incorrect choice is made. A nonparametric additive model with an unknown link function makes this choice unnecessary, because the model nests semiparametric single-index models and nonparametric additive models with known link functions. A nonparametric additive model with an unknown link function also nests the multiplicative specification

$$E(Y \mid X = x) = F[f_1(x^1) f_2(x^2) \cdots f_d(x^d)].$$

A further attraction of model (2) with an unknown link function is that it provides an informal, graphical method for checking the additive and single-index specifications. One can plot the estimates of $F$ and the $f_j$'s. Approximate linearity of the estimate of $F$ favors the additive specification (1), whereas approximate linearity of the $f_j$'s favors the single-index specification. Linearity of $F$ and the $f_j$'s favors the linear model $E(Y \mid X) = \theta' X$.

Identification of the $f_j$'s in model (2) requires more normalizations and restrictions when $F$ is unknown than when $F$ is known. First, observe that $\mu$ is not identified when $F$ is unknown, because

$$F[\mu + f_1(x^1) + \dots + f_d(x^d)] = F^*[f_1(x^1) + \dots + f_d(x^d)],$$

where the function $F^*$ is defined by $F^*(v) = F(\mu + v)$ for any real $v$. Therefore, we can set $\mu = 0$ without loss of generality. Similarly, a location normalization is needed because model (2) remains unchanged
if each $f_j$ is replaced by $f_j + \gamma_j$, where $\gamma_j$ is a constant, and $F(v)$ is replaced by $F^*(v) = F(v - \gamma_1 - \dots - \gamma_d)$. In addition, a scale normalization is needed because model (2) is unchanged if each $f_j$ is replaced by $c f_j$ for any constant $c \neq 0$ and $F(v)$ is replaced by $F^*(v) = F(v/c)$. Under the additional assumption that $F$ is monotonic, model (2) with $F$ unknown is identified if at least two additive components are not constant. To see why this assumption is necessary, suppose that only $f_1$ is not constant. Then the conditional mean function is of the form $F[f_1(x^1) + \text{constant}]$. It is clear that this function does not identify $F$ and $f_1$. The methods presented in this discussion use a slightly stronger assumption for identification. We assume that the derivatives of two additive components are bounded away from 0. The indices $j$ and $k$ of these components do not need to be known. It can be assumed without loss of generality that $j = d$ and $k = d - 1$.

Under the foregoing identifying assumptions, oracle-efficient, pointwise asymptotically normal estimators of the $f_j$'s can be obtained by replacing $F$ in the procedure of Horowitz and Mammen (2004) for model (2) with a kernel estimator. As in the case of model (2) with $F$ known, estimation takes place in two steps. In the first step, a modified version of Ichimura's (1993) estimator for a semiparametric single-index model is used to obtain a series approximation to each $f_j$ and a kernel estimator of $F$. The first-step procedure imposes the additive structure of model (2), thereby avoiding the curse of dimensionality. The first-step estimates are inputs to the second step. The second-step estimator of, say, $f_1$ is obtained by taking one Newton step from the first-step estimate toward a local nonlinear least-squares estimate. In large samples, the second-step estimator has a structure similar to that of a kernel nonparametric regression estimator, so deriving its pointwise rate of convergence and asymptotic
distribution is relatively easy. The details of the two-step procedure are lengthy. They are presented in Horowitz and Mammen (2011). The oracle-efficiency property of the two-step estimator implies that asymptotically, there is no penalty for not knowing $F$ in a nonparametric additive model. Each $f_j$ can be estimated as well as it would be if $F$ and the other $f_j$'s were known.

Horowitz and Mammen (2007) present a penalized least squares (PLS) estimation procedure that applies to model (2) with an unknown $F$ and also applies to a larger class of models that includes quantile regressions and neural networks. The procedure uses the location and scale normalizations $\mu = 0$, (4), and

$$\sum_{j=1}^{d} \int f_j^2(v)\, dv = 1. \tag{12}$$

The PLS estimator of Horowitz and Mammen (2007) chooses the estimators of $F$ and the additive components to solve

$$\underset{F, f_1, \dots, f_d}{\text{minimize}}: \ \frac{1}{n} \sum_{i=1}^{n} \{Y_i - F[f_1(X_i^1) + \dots + f_d(X_i^d)]\}^2 + \lambda_n J(F, f_1, \dots, f_d) \quad \text{subject to: (4), (12),} \tag{13}$$

where $\{\lambda_n\}$ is a sequence of constants and $J$ is a penalty term that penalizes roughness of the estimated functions. If $F$ and the $f_j$'s are $k$ times differentiable, the penalty term is

$$J(F, f_1, \dots, f_d) = J_1^{\nu_1}(F, f_1, \dots, f_d) + J_2^{\nu_2}(F, f_1, \dots, f_d),$$

where $\nu_1$ and $\nu_2$ are constants satisfying $\nu_2 \geq \nu_1 > 0$,

$$J_1(F, f_1, \dots, f_d) = T_k(F) \left\{ \sum_{j=1}^{d} [T_1^2(f_j) + T_k^2(f_j)] \right\}^{(2k-1)/4},$$
$$J_2(F, f_1, \dots, f_d) = T_1(F) \left\{ \sum_{j=1}^{d} [T_1^2(f_j) + T_k^2(f_j)] \right\}^{1/4},$$

and

$$T_\ell^2(f) = \int f^{(\ell)}(v)^2\, dv$$

for $0 \leq \ell \leq k$ and any function $f$ whose $\ell$th derivative is square integrable. The PLS estimator can be computed by approximating $F$ and the $f_j$'s by B-splines and minimizing (13) over the coefficients of the spline approximation. Denote the estimator by $\hat F, \hat f_1, \dots, \hat f_d$. Assume without loss of generality that $X$ is supported on $[0,1]^d$. Horowitz and Mammen (2007) give conditions under which the following result holds:

$$\int_0^1 [\hat f_j(v) - f_j(v)]^2\, dv = O_p(n^{-2k/(2k+1)})$$

for each $j = 1, \dots, d$ and

$$\int \left\{ \hat F\!\left[\sum_{j=1}^{d} f_j(x^j)\right] - F\!\left[\sum_{j=1}^{d} f_j(x^j)\right] \right\}^2 dx^1 \cdots dx^d = O_p(n^{-2k/(2k+1)}).$$

In other words, the integrated squared errors of the PLS estimates of the link function and additive components converge in probability to 0 at the fastest possible rate under the assumptions. There is no curse of dimensionality. The available results do not provide an asymptotic distribution for the PLS estimator. Therefore, it is not yet possible to carry out statistical inference with this estimator.

4. TESTS OF ADDITIVITY

Models (1) and (2) are misspecified and can give misleading results if the conditional mean or quantile of $Y$ is not additive. Therefore, it is useful to be able to test additivity. Several
tests of additivity have been proposed for models of conditional mean functions. These tests undoubtedly can be modified for use with conditional quantile functions, but this modification has not yet been carried out. Accordingly, the remainder of this section is concerned with testing additivity in the conditional mean versions of models (1) and (2). Bearing in mind that model (1) can be obtained from model (2) by letting $F$ be the identity function, the null hypothesis to be tested is

$$H_0: E(Y \mid X = x) = F[\mu + f_1(x^1) + \dots + f_d(x^d)].$$

The alternative hypothesis is

$$H_1: E(Y \mid X = x) = F[\mu + f(x)],$$

where there are no functions $f_1, \dots, f_d$ such that $P[f(X) = f_1(X^1) + \dots + f_d(X^d)] = 1$.

Gozalo and Linton (2001) have proposed a general class of tests. Their tests are applicable regardless of whether $F$ is the identity function. Wang and Carriere (2011) and Dette and von Lieres und Wilkau (2001) proposed similar tests for the case of an identity link function. These tests are based on comparing a fully nonparametric estimator of $f$ with an estimator that imposes additivity. Eubank, Hart, Simpson, and Stefanski (1995) also proposed tests for the case in which $F$ is the identity function. These tests look for interactions among the components of $X$ and are based on Tukey's (1949) test for additivity in analysis of variance. Sperlich, Tjøstheim, and Yang (2002) also proposed a test for the presence of interactions among components of $X$. Other tests have been proposed by Abramovich, De Fesis, and Sapatinas (2009) and Derbort, Dette, and Munk (2002).

The remainder of this section outlines a test that Gozalo and Linton (2001) found through Monte Carlo simulation to have satisfactory finite-sample performance. The test statistic has the form
$$\hat\tau_n = \sum_{i=1}^{n} \{F[\hat f(X_i)] - F[\hat\mu + \hat f_1(X_i^1) + \dots + \hat f_d(X_i^d)]\}^2\, \pi(X_i),$$

where $\hat f(x)$ is a full-dimensional nonparametric estimator of $E(Y \mid X = x)$, $\hat\mu$ and the $\hat f_j$'s are estimators of $\mu$ and $f_j$ under $H_0$, and $\pi$ is a weight function. Gozalo and Linton (2001) use a Nadaraya-Watson kernel estimator for $\hat f$ and a marginal integration estimator for $\hat\mu$ and the $\hat f_j$'s. Dette and von Lieres und Wilkau (2001) also use these marginal integration estimators in their version of the test. However, other estimators can be used. Doing so might increase the power of the test or enable some of the regularity conditions of Gozalo and Linton (2001) to be relaxed. In addition, it is clear that $\hat\tau_n$ can be applied to conditional quantile models, though the details of the statistic's asymptotic distribution would be different from those with conditional mean models. If $F$ is unknown, then $F[f(x)]$ is not identified, but a test of additivity can be based on the following modified version of $\hat\tau_n$:

$$\hat\tau_n = \sum_{i=1}^{n} \{\hat f(X_i) - \hat F[\hat\mu + \hat f_1(X_i^1) + \dots + \hat f_d(X_i^d)]\}^2\, \pi(X_i),$$

where $\hat f$ is a full-dimensional nonparametric estimator of the conditional mean function, $\hat F$ is a nonparametric estimator of $F$, and the $\hat f_j$'s are estimators of the additive components.

Gozalo and Linton (2001) give conditions under which a centered, scaled version of $\hat\tau_n$ is asymptotically normally distributed as $N(0,1)$. Dette and von Lieres und Wilkau (2001) provide similar results for the case in which $F$ is the identity function. Gozalo and Linton (2001) and Dette and von Lieres und Wilkau (2001) also provide formulae for estimating the centering and scaling parameters. Simulation results reported by Gozalo and Linton (2001) indicate that using
the wild bootstrap to find critical values produces smaller errors in rejection probabilities under $H_0$ than using critical values based on the asymptotic normal distribution. Dette and von Lieres und Wilkau (2001) also used the wild bootstrap to estimate critical values.

5. AN EMPIRICAL APPLICATION

This section illustrates the application of the estimator of Horowitz and Mammen (2004) by using it to estimate a model of the rate of growth of gross domestic product (GDP) among countries. The model is

$$G = f_T(T) + f_S(S) + U,$$

where $G$ is the average annual percentage rate of growth of a country's GDP from 1960 to 1965, $T$ is the average share of trade in the country's economy from 1960 to 1965 measured as exports plus imports divided by GDP, and $S$ is the average number of years of schooling of adult residents of the country in 1960. $U$ is an unobserved random variable satisfying $E(U \mid T, S) = 0$. The functions $f_T$ and $f_S$ are unknown and are estimated by the method of Horowitz and Mammen (2004). The data are taken from the dataset Growth in Stock and Watson (2011). They comprise values of $G$, $T$, and $S$ for 60 countries.

Estimation was carried out using a cubic B-spline basis in the first step. The second step consisted of Nadaraya-Watson (local constant) kernel estimation with the biweight kernel. Bandwidths of 0.5 and 0.8 were used for estimating $f_T$ and $f_S$, respectively. The estimation results are shown in Figures 1-2. The estimates of $f_T$ and $f_S$ are nonlinear and differently shaped. The dip in $f_S$ near $S = 7$ is almost certainly an artifact of random sampling errors. The estimated additive components are not well-approximated by
simple parametric functions such as quadratic or cubic functions. A lengthy specification search might be needed to find a parametric model that produces shapes like those in Figures 1-2. If such a search were successful, the resulting parametric models might provide useful compact representations of $f_T$ and $f_S$ but could not be used for valid inference.

6. CONCLUSIONS

Nonparametric additive modeling with a link function that may or may not be known is an attractive way to achieve dimension reduction in nonparametric models. It greatly eases the restrictions of parametric modeling without suffering from the lack of precision that the curse of dimensionality imposes on fully nonparametric modeling. This chapter has reviewed a variety of methods for estimating nonparametric additive models. An empirical example has illustrated the usefulness of the nonparametric additive approach. Several issues about the approach remain unresolved. One of these is to find ways to carry out inference about additive components based on the estimation method of Horowitz and Mammen (2007) that is described in Section 3.3. This is the most general and flexible method that has been developed to date. Another issue is the extension of the tests of additivity described in Section 4 to estimators other than partial integration and models of conditional quantiles. Finally, finding data-based methods for choosing tuning parameters for the various estimation and testing procedures remains an open issue.

Figure 1: Additive component $f_T$ in the growth model. (Horizontal axis: Trade Share; vertical axis: Relative Growth Rate.)

Figure 2: Additive component $f_S$ in the growth model. (Horizontal axis: Average Years of Schooling; vertical axis: Relative Growth Rate.)
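
The two-step computation described in Section 5 can be illustrated with a short Python sketch for the identity-link model $G = f_T(T) + f_S(S) + U$. It is a simplified illustration under stated assumptions, not the computation that produced Figures 1-2: it uses synthetic data rather than the Growth dataset, it replaces the cubic B-spline basis of the first step with a centered cubic power basis, and it treats the second step as Nadaraya-Watson smoothing of partial residuals with the biweight kernel. All function names, the sample size, and the bandwidth are hypothetical choices made for the example, and the component is recovered only up to an additive constant.

```python
import numpy as np


def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.9375 * (1.0 - u**2) ** 2, 0.0)


def nadaraya_watson(grid, x, r, h):
    """Local-constant kernel regression of r on x, evaluated at each grid point.

    Assumes every grid point has at least one observation within one bandwidth.
    """
    w = biweight((x[None, :] - grid[:, None]) / h)  # shape (len(grid), n)
    return (w @ r) / w.sum(axis=1)


def power_basis(x, degree=3):
    """Centered powers (x - mean)^p, p = 1..degree (stand-in for a B-spline basis)."""
    xc = x - x.mean()
    return np.column_stack([xc**p for p in range(1, degree + 1)])


def two_step_component(x_target, x_other, y, grid, h, degree=3):
    """Two-step estimate of the component of x_target, up to an additive constant."""
    n = y.shape[0]
    b_other = power_basis(x_other, degree)
    design = np.column_stack([np.ones(n), power_basis(x_target, degree), b_other])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Step 1: series least-squares estimates of the intercept and the other component.
    mu_tilde = coef[0]
    f_other_tilde = b_other @ coef[1 + degree:]
    # Step 2: kernel-smooth the partial residuals against the target covariate.
    partial_resid = y - mu_tilde - f_other_tilde
    return nadaraya_watson(grid, x_target, partial_resid, h)


# Synthetic illustration only (hypothetical component functions and noise level).
rng = np.random.default_rng(0)
n = 500
t = rng.uniform(0.0, 1.0, n)
s = rng.uniform(0.0, 1.0, n)


def f_t(v):
    return np.sin(2.0 * np.pi * v)


def f_s(v):
    return (v - 0.5) ** 2


g = f_t(t) + f_s(s) + 0.1 * rng.standard_normal(n)

grid = np.linspace(0.15, 0.85, 15)
f_t_hat = two_step_component(t, s, g, grid, h=0.1)
```

Because the additive components are identified only up to location, a comparison of `f_t_hat` with the true component is meaningful after both are centered on the evaluation grid. A local-linear smoother could be substituted for `nadaraya_watson` without changing the structure of the sketch.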

REFERENCES

Abramovich, F., I. De Fesis, and T. Sapatinas. 2009. Optimal Testing for Additivity in Multiple Nonparametric Regression, Annals of the Institute of Statistical Mathematics, 61, pp. 691-714.

Buja, A., T. Hastie, and R. Tibshirani. 1989. Linear Smoothers and Additive Models, Annals of Statistics, 17, pp. 453-555.

De Gooijer, J.G. and D. Zerom. 2003. On Additive Conditional Quantiles with High Dimensional Covariates, Journal of the American Statistical Association, 98, pp. 135-146.

Dette, H. and C. von Lieres und Wilkau. 2001. Testing Additivity by Kernel-Based Methods: What Is a Reasonable Test? Bernoulli, 7, pp. 669-697.

Derbort, S., H. Dette, and A. Munk. 2002. A Test for Additivity in Nonparametric Regression, Annals of the Institute of Statistical Mathematics, 54, pp. 60-82.

Eubank, R.L., J.D. Hart, D.G. Simpson, and L.A. Stefanski. 1995. Testing for Additivity in Nonparametric Regression, Annals of Statistics, 23, pp. 1896-1920.

Fan, J. and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. London: Chapman and Hall.

Gozalo, P.L. and O.B. Linton. 2001. Testing Additivity in Generalized Nonparametric Regression Models with Estimated Parameters, Journal of Econometrics, 104, pp. 1-48.

Härdle, W. 1990. Applied Nonparametric Regression. Cambridge: Cambridge University Press.

Härdle, W., H. Liang, and J. Gao. 2000. Partially Linear Models. New York: Springer.

Hastie, T.J. and R.J. Tibshirani. 1990. Generalized Additive Models. London: Chapman and Hall.

Hengartner, N.W. and S. Sperlich. 2005. Rate Optimal Estimation with the Integration Method in the Presence of Many Covariates, Journal of Multivariate Analysis, 95, pp. 246-272.

Horowitz, J.L. 2009. Semiparametric and Nonparametric Methods in Econometrics. New York: Springer.

Horowitz, J.L. and S. Lee. 2005. Nonparametric Estimation of an Additive Quantile Regression Model, Journal of the American Statistical Association, 100, pp. 1238-1249.

Horowitz, J.L. and E. Mammen. 2004. Nonparametric Estimation of an Additive Model with a Link Function, Annals of Statistics, 32, pp. 2412-2443.

Horowitz, J.L. and E. Mammen. 2007. Rate-Optimal Estimation for a General Class of Nonparametric Regression Models with Unknown Link Functions, Annals of Statistics, 35, pp. 2589-2619.

Horowitz, J.L. and E. Mammen. 2011. Oracle-Efficient Nonparametric Estimation of an Additive Model with an Unknown Link Function, Econometric Theory, 27, pp. 582-608.

Ichimura, H. 1993. Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single-Index Models, Journal of Econometrics, 58, pp. 71-120.

Kim, W., O.B. Linton, and N.W. Hengartner. 1999. A Computationally Efficient Oracle Estimator for Additive Nonparametric Regression with Bootstrap Confidence Intervals, Journal of Computational and Graphical Statistics, 8, pp. 278-297.

Li, Q. and J.S. Racine. 2007. Nonparametric Econometrics. Princeton: Princeton University Press.

Linton, O.B. 1997. Efficient Estimation of Additive Nonparametric Regression Models, Biometrika, 84, pp. 469-473.

Linton, O.B. and W. Härdle. 1996. Estimating Additive Regression Models with Known Links, Biometrika, 83, pp. 529-540.

Linton, O.B. and J.B. Nielsen. 1995. A Kernel Method of Estimating Structured Nonparametric Regression Based on Marginal Integration, Biometrika, 82, pp. 93-100.

Liu, R., L. Yang, and W.K. Härdle. 2011. Oracally Efficient Two-Step Estimation of Generalized Additive Model, SFB 649 discussion paper 2011-016, Humboldt-Universität zu Berlin, Germany.

Mammen, E., O. Linton, and J. Nielsen. 1999. The Existence and Asymptotic Properties of a Backfitting Projection Algorithm under Weak Conditions, Annals of Statistics, 27, pp. 1443-1490.

Newey, W.K. 1994. Kernel Estimation of Partial Means and a General Variance Estimator, Econometric Theory, 10, pp. 233-253.

Nielsen, J.P. and S. Sperlich. 2005. Smooth Backfitting in Practice, Journal of the Royal Statistical Society, Series B, 67, pp. 43-61.

Pagan, A. and A. Ullah. 1999. Nonparametric Econometrics. Cambridge: Cambridge University Press.

Opsomer, J.D. 2000. Asymptotic Properties of Backfitting Estimators, Journal of Multivariate Analysis, 73, pp. 166-179.

Opsomer, J.D. and D. Ruppert. 1997. Fitting a Bivariate Additive Model by Local Polynomial Regression, Annals of Statistics, 25, pp. 186-211.

Severance-Lossin, E. and S. Sperlich. 1999. Estimation of Derivatives for Additive Separable Models, Statistics, 33, pp. 241-265.

Song, Q. and L. Yang. 2010. Oracally Efficient Spline Smoothing of Nonlinear Additive Autoregression Models with Simultaneous Confidence Band, Journal of Multivariate Analysis, 101, pp. 2008-2025.

Sperlich, S., D. Tjøstheim, and L. Yang. 2002. Nonparametric Estimation and Testing of Interaction in Additive Models, Econometric Theory, 18, pp. 197-251.

Stone, C.J. 1985. Additive Regression and Other Nonparametric Models, Annals of Statistics, 13, pp. 689-705.

Stock, J.H. and M.W. Watson. 2011. Introduction to Econometrics, 3rd edition. Boston: Pearson/Addison Wesley.

Tukey, J. 1949. One Degree of Freedom Test for Non-Additivity, Biometrics, 5, pp. 232-242.

Wang, L. and L. Yang. 2007. Spline-Backfitted Kernel Smoothing of Nonlinear Additive Autoregression Model, Annals of Statistics, 35, pp. 2474-2503.

Wang, X. and K.C. Carriere. 2011. Assessing Additivity in Nonparametric Models: A Kernel-Based Method, Canadian Journal of Statistics, 39, pp. 632-655.

Yang, L., S. Sperlich, and W. Härdle. 2003. Derivative Estimation and Testing in Generalized Additive Models, Journal of Statistical Planning and Inference, 115, pp. 521-542.

Yu, K., B.U. Park, and E. Mammen. 2008. Smooth Backfitting in Generalized Additive Models, Annals of Statistics, 36, pp. 228-260.