SEMIPARAMETRIC SINGLE-INDEX MODELS. Joel L. Horowitz Department of Economics Northwestern University



INTRODUCTION

Much of applied econometrics and statistics involves estimating a conditional mean function: $E(Y \mid X = x)$.
Y may be continuous or binary.
If binary, then $E(Y \mid X = x)$ is $P(Y = 1 \mid X = x)$.
In a binary response model, Y may indicate an individual's choice between two alternatives, occurrence or non-occurrence of an event, etc.
Possible approaches: fully parametric, fully nonparametric, semiparametric.

FULLY PARAMETRIC MODELING

In a fully parametric model, $E(Y \mid X = x)$ is known up to a finite-dimensional parameter:
$$E(Y \mid X = x) = F(x, \theta)$$
F is a known function; $\theta$ is an unknown, finite-dimensional parameter.
Example: binary probit or logit model.
Advantages, if F is correctly specified: maximizes estimation efficiency; permits extrapolation of x beyond the range of the data; often has a natural behavioral interpretation.
Disadvantages: F is rarely known in applications; results can be highly misleading if F is misspecified.

FULLY NONPARAMETRIC MODELING

$E(Y \mid X = x) = G(x)$, assumed to be a smooth function of x.
Nothing is assumed about the shape of G.
G is estimated by nonparametric mean regression of Y on X.
This minimizes a priori assumptions and the likelihood of specification error.
Disadvantages: hard to incorporate behavioral hypotheses drawn from economic or other theory models; estimation precision is an exponentially decreasing function of the dimension of X; extrapolation is not possible.

SEMIPARAMETRIC MODELING

Achieves greater precision than nonparametric models but with weaker assumptions than parametric models.
Does this by restricting $G(x)$ so as to reduce the effective dimension of x.
Risk of specification error is greater than with a fully nonparametric model but less than with a parametric one.
Examples:
Single-index model: $G(x) = F(x'\beta)$, where F is unknown.
Additive model: $G(x) = H[f_1(x_1) + \dots + f_d(x_d)]$, where H is a known or unknown function and the $f_i$'s are unknown.

IDENTIFICATION OF SINGLE-INDEX MODELS

$E(Y \mid X = x) = G(x'\beta)$
$\beta$ is not identified if G is a constant function.
Sign, scale, and location normalizations are needed to identify $\beta$.
To implement, assume X has no intercept and $\beta_1 = 1$.
$X_1$ must be continuously distributed conditional on the other components of X.
To see why, let $X = (X_1, X_2)$ take only the four values in the table, with $X'\beta = X_1 + \beta_2 X_2$. Then G and $\beta_2$ can be anything that satisfy:

(X_1, X_2)    G(X_1 + β_2 X_2)    E(Y | X)
(0, 0)        G(0)                0
(1, 0)        G(1)                1
(0, 1)        G(β_2)              3
(1, 1)        G(1 + β_2)          4
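The table can be checked mechanically. The sketch below is only an illustration: the two candidate pairs $(\beta_2, G)$ are hypothetical choices, not taken from the slides. It shows that two different values of $\beta_2$, each paired with its own G, reproduce the same four conditional means, so the data cannot distinguish between them.

```python
# Support points of (X1, X2) and the conditional means E(Y | X) from the table.
cond_mean = {(0, 0): 0, (1, 0): 1, (0, 1): 3, (1, 1): 4}

# Two hypothetical candidates: beta2 and G given as a lookup on index values.
candidates = [
    (3.0, {0.0: 0, 1.0: 1, 3.0: 3, 4.0: 4}),   # G is the identity on these points
    (0.5, {0.0: 0, 1.0: 1, 0.5: 3, 1.5: 4}),   # a different, non-monotone G
]

def fits(beta2, G):
    """True if G(x1 + beta2 * x2) equals E(Y | X) at every support point."""
    return all(G[x1 + beta2 * x2] == m for (x1, x2), m in cond_mean.items())

results = [fits(b, G) for b, G in candidates]
print(results)  # both candidates fit the table
```

With a continuously distributed $X_1$ the index takes a continuum of values, and this kind of relabeling of G is no longer possible.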

OPTIMIZATION ESTIMATORS

If G is known, $\beta$ can be estimated by nonlinear least squares:
$$\min_b \; \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,[Y_i - G(X_i'b)]^2$$
where $w(\cdot)$ is a weight function.
When G is unknown, replace $G(X_i'b)$ with a nonparametric estimator $G_n(X_i'b)$ of $E(Y \mid X_i'b)$ (e.g., kernel). The estimator now solves
$$\min_b \; \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,[Y_i - G_n(X_i'b)]^2$$
w may be chosen to keep the denominator of $G_n$ away from 0 and to achieve asymptotic efficiency.
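A minimal sketch of the second minimization problem, under assumptions that are not in the slides: two covariates with the normalization $\beta_1 = 1$, a leave-one-out Nadaraya-Watson estimator with a Gaussian kernel for $G_n$, a fixed bandwidth `h`, a grid search over `b2`, simulated data, and a weight that simply drops observations outside a central region. All function names are hypothetical.

```python
import numpy as np

def loo_kernel_regression(v, y, h):
    """Leave-one-out Nadaraya-Watson estimate of E(Y | index = v_i), Gaussian kernel."""
    u = (v[:, None] - v[None, :]) / h
    k = np.exp(-0.5 * u**2)
    np.fill_diagonal(k, 0.0)            # drop observation i from its own fit
    return (k @ y) / k.sum(axis=1)

def sls_objective(b2, X, Y, h, w):
    """Weighted mean of squared residuals with G replaced by its kernel estimate."""
    v = X[:, 0] + b2 * X[:, 1]          # normalization: coefficient of X1 is 1
    g_hat = loo_kernel_regression(v, Y, h)
    return np.mean(w * (Y - g_hat) ** 2)

# Simulated data (assumption): true beta2 = 1, G(v) = 2 tanh(v), small noise.
rng = np.random.default_rng(0)
n, beta2 = 400, 1.0
X = rng.normal(size=(n, 2))
Y = 2.0 * np.tanh(X[:, 0] + beta2 * X[:, 1]) + 0.1 * rng.normal(size=n)

# w(X) = 1 on a central region, so the kernel denominator stays away from 0.
w = (np.abs(X).max(axis=1) <= 2.0).astype(float)

grid = np.linspace(0.0, 2.0, 41)
values = [sls_objective(b, X, Y, h=0.3, w=w) for b in grid]
b2_hat = grid[int(np.argmin(values))]
print(b2_hat)
```

A grid search is used only because there is a single free coefficient here; with more covariates a numerical optimizer would take its place, which is where the computational difficulty of this class of estimators comes from.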

ASYMPTOTIC NORMALITY

Ichimura (1993) gives conditions under which
$$n^{1/2}(b_n - \beta) \xrightarrow{d} N(0, V)$$
where $b_n$ is the weighted NLS estimator.
Proof is based on standard Taylor series methods of asymptotic distribution theory.
Estimator has $n^{-1/2}$ rate of convergence.
Hall and Ichimura (1991) derived the asymptotic efficiency bound for $\beta$ in
$$Y_i = G(X_i'\beta) + \sigma(X_i'\beta)\,U_i$$
where the $U_i$ are iid with mean 0.
Hall and Ichimura also derived an asymptotically efficient estimator. It uses an estimate of $\sigma(X_i'\beta)^{-1}$ as the weight function in the NLS objective function and a kernel estimator of G.

MLE FOR BINARY RESPONSE MODEL

If Y = 0 or 1, $G(x'\beta) = P(Y = 1 \mid X = x)$.
If G is known, the log likelihood is
$$\log L(b) = \sum_{i=1}^{n} \{Y_i \log G(X_i'b) + (1 - Y_i)\log[1 - G(X_i'b)]\}$$
If G is unknown, replace it with an estimator $G_n$:
$$\log L(b) = \sum_{i=1}^{n} \tau_i \{Y_i \log G_n(X_i'b) + (1 - Y_i)\log[1 - G_n(X_i'b)]\}$$
$\tau_i$ trims away observations for which $G_n(X_i'b)$ is too close to 0 or 1.
Klein and Spady (1993) gave conditions under which the semiparametric ML estimator is $n^{1/2}$-consistent and asymptotically normal.
Chamberlain (1986) and Cosslett (1987) derived the asymptotic efficiency bound for the case in which G is a CDF.
Semiparametric MLE achieves the bound.
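The trimmed quasi-log-likelihood can be sketched as follows. This is an illustration under stated assumptions, not the Klein and Spady implementation: $G_n$ is a leave-one-out Gaussian-kernel regression of Y on the index, the bandwidth `h` and the trimming level `eps` are arbitrary, the data are simulated from a logit model, and the maximization is a grid search over the single free coefficient.

```python
import numpy as np

def loo_prob(v, y, h):
    """Leave-one-out kernel estimate of P(Y = 1 | index = v_i)."""
    u = (v[:, None] - v[None, :]) / h
    k = np.exp(-0.5 * u**2)
    np.fill_diagonal(k, 0.0)
    return (k @ y) / k.sum(axis=1)

def quasi_loglik(b2, X, Y, h, eps=0.01):
    """Trimmed log likelihood with G replaced by its kernel estimate."""
    v = X[:, 0] + b2 * X[:, 1]               # normalization: coefficient of X1 is 1
    g = loo_prob(v, Y, h)
    tau = (g > eps) & (g < 1.0 - eps)        # trimming indicator tau_i
    g_safe = np.clip(g, eps, 1.0 - eps)      # avoids log(0) on trimmed terms
    terms = Y * np.log(g_safe) + (1.0 - Y) * np.log(1.0 - g_safe)
    return np.sum(tau * terms)

# Simulated binary data (assumption): logit link with index 2 * (X1 + X2).
rng = np.random.default_rng(1)
n, beta2 = 1000, 1.0
X = rng.normal(size=(n, 2))
p = 1.0 / (1.0 + np.exp(-2.0 * (X[:, 0] + beta2 * X[:, 1])))
Y = (rng.uniform(size=n) < p).astype(float)

grid = np.linspace(0.0, 2.0, 41)
values = [quasi_loglik(b, X, Y, h=0.4) for b in grid]
b2_hat = grid[int(np.argmax(values))]
print(b2_hat)
```

Only the ratio of coefficients is recovered: the scale factor 2 in the simulated index is absorbed into the unknown G.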

DIRECT ESTIMATORS

NLS and ML estimators are hard to compute.
Direct estimators avoid the need to solve an optimization problem.
Direct estimators are not asymptotically efficient.
An efficient estimator can be obtained easily by a one-step method.
If X is a continuous random vector, $\beta$ is proportional to a weighted average derivative of G:
$$\beta \propto E\left[w(X)\,\frac{\partial G(X'\beta)}{\partial X}\right]$$
where w is a weight function.
Only a weighted average derivative is needed because $\beta$ is identified only up to scale.
If $w(x) = 1$ for all x, get the average derivative estimator of $\beta$ (Härdle and Stoker 1989).
This estimator is hard to analyze because of its random denominator.

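DENSITY WEIGHTED AVERAGE DERIVATIVE ESTIMATORS

The random denominator problem can be overcome by setting $w(x) = f(x)$, the density of X.
Integration by parts gives
$$\delta \equiv E\left[f(X)\,\frac{\partial G(X'\beta)}{\partial X}\right] = -2E\left[G(X'\beta)\,\frac{\partial f(X)}{\partial X}\right] = -2E\left[Y\,\frac{\partial f(X)}{\partial X}\right]$$
Estimate $\delta$ by replacing E with the sample average and f with a kernel estimator to get
$$\delta_n = -\frac{2}{n}\sum_{i=1}^{n} Y_i\,\frac{\partial f_{ni}(X_i)}{\partial x}$$
where $f_{ni}$ is the leave-one-out kernel estimator of $f(x)$.
Powell, Stock, and Stoker (1989) gave conditions under which
$$n^{1/2}(\delta_n - \delta) \xrightarrow{d} N(0, V)$$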
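The estimator $\delta_n$ needs no optimization, only a double sum. A minimal sketch follows, assuming a Gaussian product kernel and a fixed bandwidth (a simplification: it ignores the higher-order kernel and undersmoothing that the asymptotic theory calls for). The function name `dwade` and the simulated design are hypothetical.

```python
import numpy as np

def dwade(X, Y, h):
    """Density-weighted average derivative: -(2/n) * sum_i Y_i * d f_ni(X_i) / dx."""
    n, k = X.shape
    U = (X[:, None, :] - X[None, :, :]) / h                  # U[i, j] = (X_i - X_j) / h
    Kval = np.exp(-0.5 * np.sum(U**2, axis=2)) / (2 * np.pi) ** (k / 2)
    np.fill_diagonal(Kval, 0.0)                              # leave-one-out
    gradK = -U * Kval[:, :, None]                            # Gaussian kernel: K'(u) = -u K(u)
    df = gradK.sum(axis=1) / ((n - 1) * h ** (k + 1))        # gradient of f_ni at X_i
    return -2.0 * np.mean(Y[:, None] * df, axis=0)

# Simulated single-index data (assumption): G = tanh, beta = (1, 2).
rng = np.random.default_rng(2)
n = 1000
beta = np.array([1.0, 2.0])
X = rng.normal(size=(n, 2))
Y = np.tanh(X @ beta) + 0.1 * rng.normal(size=n)

delta = dwade(X, Y, h=0.5)
cosine = delta @ beta / (np.linalg.norm(delta) * np.linalg.norm(beta))
print(delta, cosine)
```

Because $\beta$ is identified only up to scale, the comparison is made through the angle between `delta` and `beta` rather than their lengths.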

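METHOD OF PROOF

Write $\delta_n$ as a U statistic of order 2 with a bandwidth-dependent kernel.
The U statistic is asymptotically equivalent to its projection, which gives
$$\delta_n - E(\delta_n) = \frac{2}{n}\sum_{i=1}^{n}\left[r_n(Y_i, X_i) - E\,r_n(Y_i, X_i)\right] + o_p(n^{-1/2}),$$
where
$$r_n(Y_i, X_i) = -\frac{1}{h_n^{k+1}}\int K'\!\left(\frac{X_i - x}{h_n}\right)[Y_i - E(Y \mid X = x)]\,f(x)\,dx$$
and k is the dimension of X.
Changing variables in the integral shows that the leading term of $r_n$ does not depend on $h_n$ or n.
So $\delta_n$ is asymptotically equivalent to a sum of iid random variables.
$n^{1/2}$-consistency and asymptotic normality follow from the Lindeberg-Levy theorem.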
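The first step of the argument, that $\delta_n$ is a U statistic of order 2, can be verified numerically. The sketch assumes a Gaussian product kernel, whose gradient is an odd function, and uses the symmetric kernel $p(z_i, z_j) = -h^{-(k+1)} K'((X_i - X_j)/h)(Y_i - Y_j)$; the data and variable names are illustrative only.

```python
import numpy as np

def grad_gauss(u):
    """Gradient of the standard Gaussian product kernel, K'(u) = -u K(u)."""
    k = u.shape[-1]
    dens = np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi) ** (k / 2)
    return -u * dens[..., None]

rng = np.random.default_rng(3)
n, k, h = 50, 2, 0.7
X = rng.normal(size=(n, k))
Y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

U = (X[:, None, :] - X[None, :, :]) / h           # U[i, j] = (X_i - X_j) / h
dK = grad_gauss(U)                                 # shape (n, n, k); dK[i, i] = 0

# Leave-one-out form: delta_n = -(2/n) * sum_i Y_i * d f_ni(X_i) / dx
df = dK.sum(axis=1) / ((n - 1) * h ** (k + 1))
delta_loo = -2.0 * np.mean(Y[:, None] * df, axis=0)

# U-statistic form: average of p(z_i, z_j) over all pairs i < j
p = -dK * (Y[:, None] - Y[None, :])[..., None] / h ** (k + 1)
iu = np.triu_indices(n, 1)
delta_u = p[iu].mean(axis=0)

print(np.allclose(delta_loo, delta_u))
```

The two forms agree because $K'$ is odd, so the terms $K'((X_i - X_j)/h)\,Y_i$ and $K'((X_j - X_i)/h)\,Y_j$ combine into a single symmetric pairwise kernel.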

TECHNICAL DETAILS

Must use a higher-order K with undersmoothing to ensure that the asymptotic distribution of $n^{1/2}(\delta_n - \delta)$ is centered at 0.
Härdle and Tsybakov (1993) and Powell and Stoker (1996) describe methods for selecting $h_n$ in applications.
Horowitz and Härdle (1996) show how to include discrete components of X in a direct estimator.

ESTIMATOR WITH DISCRETE COVARIATES

Write the model as $E(Y \mid X = x, Z = z) = G(x'\beta + z'\alpha)$, where X is continuous and Z is discrete with M points of support.
Identification requires a continuous covariate.
Assume an estimator $b_n$ of $\beta$ is available, possibly an average of average derivative estimates computed at each point in the support of Z.
Suppose there are finite numbers $c_0$, $c_1$, $v_0$, $v_1$ such that $v_0 < v_1$, $c_0 < c_1$, and $G(v + z'\alpha)$ is bounded for all $v \in [v_0, v_1]$ and $z \in \mathrm{supp}(Z)$, with
$$v < v_0 \;\Rightarrow\; G(v + z'\alpha) < c_0 \quad \text{for each } z \in \mathrm{supp}(Z)$$
$$v > v_1 \;\Rightarrow\; G(v + z'\alpha) > c_1 \quad \text{for each } z \in \mathrm{supp}(Z)$$
Define
$$J(z) = \int_{v_0}^{v_1} \left\{ c_0\, I[G(v + z'\alpha) < c_0] + c_1\, I[G(v + z'\alpha) > c_1] + G(v + z'\alpha)\, I[c_0 \le G(v + z'\alpha) \le c_1] \right\} dv$$

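DISCRETE COVARIATES (cont.)

Then for $i = 2, \dots, M$,
$$J[z^{(i)}] - J[z^{(1)}] = (c_1 - c_0)\,[z^{(i)} - z^{(1)}]'\alpha.$$
This is M - 1 linear equations in the components of $\alpha$. To solve, write
$$\Delta J = \begin{pmatrix} J[z^{(2)}] - J[z^{(1)}] \\ \vdots \\ J[z^{(M)}] - J[z^{(1)}] \end{pmatrix}; \qquad W = \begin{pmatrix} [z^{(2)} - z^{(1)}]' \\ \vdots \\ [z^{(M)} - z^{(1)}]' \end{pmatrix}.$$
Then
$$\alpha = (c_1 - c_0)^{-1}(W'W)^{-1}W'\Delta J.$$
Obtain the estimator by replacing G with a nonparametric regression estimate of $E(Y \mid X'b_n = v, Z = z)$. Let $\Delta J_n$ be the resulting estimator of $\Delta J$. The estimator of $\alpha$ is
$$\alpha_n = (c_1 - c_0)^{-1}(W'W)^{-1}W'\Delta J_n.$$
Horowitz and Härdle (1996) give conditions under which $n^{1/2}(\alpha_n - \alpha) \xrightarrow{d} N(0, V_\alpha)$.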

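[Figure: $G(V)$ and $G(V + 2)$ plotted against $V$, with the regions that make up $J$ marked by the points A through K. Horizontal axis marks at -2.85, -1.15, -0.85, and 0.85; the levels 0.2 and 0.8 are marked on the vertical axis.]

$(c_0, c_1) = (0.2, 0.8)$, $(v_0, v_1) = (-2.85, 0.85)$
$J[z^{(1)}] = ACGE + CDHG + GHK = 2c_0 + 1.7c_0 + GHK$
$J[z^{(2)}] = ABFE + BDKJ + EFJ = 1.7c_0 + 2c_1 + EFJ$
$J[z^{(2)}] - J[z^{(1)}] = 2(c_1 - c_0) = (c_1 - c_0)[z^{(2)} - z^{(1)}]'\alpha$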
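The identity behind the estimator of $\alpha$ can be checked numerically when G is known. The sketch below is an oracle check only, not the estimator itself (which replaces G with a nonparametric regression estimate). The logistic G, the value of `alpha`, the support of Z, and the constants $c_0, c_1, v_0, v_1$ are all assumptions chosen so that the boundedness conditions hold.

```python
import numpy as np

def G(v):
    """Hypothetical known link (logistic CDF), used only to check the identity."""
    return 1.0 / (1.0 + np.exp(-v))

def J(z, alpha, c0, c1, v0, v1, m=20001):
    """Trapezoid-rule integral over [v0, v1] of G(v + z'alpha) censored to [c0, c1]."""
    v = np.linspace(v0, v1, m)
    y = np.clip(G(v + z @ alpha), c0, c1)
    dv = v[1] - v[0]
    return dv * (y.sum() - 0.5 * (y[0] + y[-1]))

alpha = np.array([2.0, -1.0])
support = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)   # z^(1), ..., z^(M)
c0, c1, v0, v1 = 0.2, 0.8, -4.0, 3.0

Jvals = np.array([J(z, alpha, c0, c1, v0, v1) for z in support])
dJ = Jvals[1:] - Jvals[0]                      # J[z^(i)] - J[z^(1)], i = 2, ..., M
W = support[1:] - support[0]                   # rows are [z^(i) - z^(1)]'
alpha_solved = np.linalg.solve(W.T @ W, W.T @ dJ) / (c1 - c0)
print(alpha_solved)
```

Censoring G to $[c_0, c_1]$ with `np.clip` is the same as the three-indicator integrand in the definition of $J(z)$; shifting the index by $z'\alpha$ then changes the integral only through the flat regions at the two ends.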

HIGH-DIMENSIONAL X

Average derivative estimators require G and f to have many derivatives if X is high dimensional.
This is a form of the curse of dimensionality.
Implies that finite-sample precision of average derivatives may be low if dim(X) is large.
Hristache, Juditsky, and Spokoiny (2001) proposed a method for iteratively improving an average derivative estimator.
The method uses two bandwidths: a large one in the direction orthogonal to the current estimate and a small one in the parallel direction.
Calculate a new estimate of $\beta$ using average derivatives with the two bandwidths.
This procedure yields an estimator that is $n^{1/2}$-consistent and asymptotically normal regardless of the dimension of X when G is twice differentiable.
Monte Carlo evidence indicates that the iterated estimator has smaller finite-sample errors than the noniterated one.

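OUTLINE OF ITERATIVE METHOD

1. Initialization: specify parameters $\rho_1$, $\rho_{\min}$, $a_\rho$, $h_1$, $h_{\max}$, $a_h$; set $k = 1$; choose $\hat\beta_0$ (initial estimate of $\beta$).
2. Compute $S_k = (I + \rho_k^{-2}\hat\beta_{k-1}\hat\beta_{k-1}')^{1/2}$.
3. For every $i = 1, \dots, n$, compute $\nabla\hat f_k(X_i)$ from
$$\begin{pmatrix} \hat f_k(X_i) \\ \nabla\hat f_k(X_i) \end{pmatrix} = \left[\sum_{j=1}^{n} \begin{pmatrix} 1 \\ X_{ij} \end{pmatrix}\begin{pmatrix} 1 \\ X_{ij} \end{pmatrix}' K\!\left(\frac{|S_k X_{ij}|^2}{h_k^2}\right)\right]^{-1} \sum_{j=1}^{n} \begin{pmatrix} 1 \\ X_{ij} \end{pmatrix} Y_j\, K\!\left(\frac{|S_k X_{ij}|^2}{h_k^2}\right)$$
where $X_{ij} = X_j - X_i$. Here $\hat f_k(X_i)$ and $\nabla\hat f_k(X_i)$ are local linear estimates of the regression function and its gradient at $X_i$.
4. Compute $\hat\beta_k = n^{-1}\sum_{i=1}^{n} \nabla\hat f_k(X_i)$.
5. Set $h_{k+1} = a_h h_k$, $\rho_{k+1} = a_\rho \rho_k$. If $\rho_{k+1} > \rho_{\min}$, set $k = k + 1$ and return to step 2. Otherwise, stop.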
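The five steps can be sketched in a few lines. This is an illustration under assumptions that are not in the slides: the kernel $K(t) = (1 - t)_+^2$, the default tuning values, the use of a pseudo-inverse when a local design matrix is singular, and the function names are all hypothetical, and $h_{\max}$ is not used because the stopping rule above depends only on $\rho$. The demonstration uses a noiseless linear index purely as a sanity check where the answer is known.

```python
import numpy as np

def sym_sqrt(A):
    """Square root of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(vals)) @ vecs.T

def iterate_average_derivative(X, Y, beta0, rho1=1.0, rho_min=0.1, a_rho=0.8,
                               h1=1.0, a_h=1.1):
    n, d = X.shape
    beta, rho, h = np.asarray(beta0, dtype=float), rho1, h1
    while True:
        # Step 2: S_k stretches distances along the current direction estimate.
        S = sym_sqrt(np.eye(d) + np.outer(beta, beta) / rho**2)
        grads = np.empty((n, d))
        for i in range(n):
            D = X - X[i]                                   # rows are X_ij = X_j - X_i
            u = np.sum((D @ S) ** 2, axis=1) / h**2        # |S_k X_ij|^2 / h_k^2
            w = np.clip(1.0 - u, 0.0, None) ** 2           # assumed kernel K
            Z = np.hstack([np.ones((n, 1)), D])            # regressors (1, X_ij)
            A = Z.T @ (Z * w[:, None])
            c = Z.T @ (w * Y)
            grads[i] = (np.linalg.pinv(A) @ c)[1:]         # Step 3: local gradient
        beta = grads.mean(axis=0)                          # Step 4: average gradient
        h, rho = a_h * h, a_rho * rho                      # Step 5: update bandwidths
        if rho <= rho_min:
            return beta

# Sanity check on simulated data (assumption): noiseless linear index.
rng = np.random.default_rng(4)
beta_true = np.array([1.0, 0.5, -0.5])
X = rng.normal(size=(200, 3))
Y = X @ beta_true + 0.5

beta_hat = iterate_average_derivative(X, Y, beta0=[1.0, 0.0, 0.0])
cosine = beta_hat @ beta_true / (np.linalg.norm(beta_hat) * np.linalg.norm(beta_true))
print(beta_hat, cosine)
```

As $\rho_k$ shrinks and $h_k$ grows, the weighting window becomes narrow along the estimated index direction and wide in the orthogonal directions, which is the two-bandwidth idea described for high-dimensional X.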

AN APPLICATION

Model of product innovation by German manufacturers of investment goods.
Data assembled by the IFO Institute in Munich.
Consist of observations on 1100 manufacturers.
Model: $P(Y = 1 \mid X = x) = G(x'\beta)$, where Y = 1 if the manufacturer realized an innovation in a specific product category in 1989 and 0 otherwise.
Variables: no. of employees in product category (EMPLP), no. of employees in entire firm (EMPLF), indicator of firm's production capacity utilization (CAP), DEM = 1 if firm expected increasing demand for product and 0 otherwise.

ESTIMATED COEFFICIENTS FOR MODEL OF PRODUCT INNOVATION

                        EMPLP    EMPLF      CAP        DEM
Semiparametric Model    1        0.032      0.346      1.732
                                 (0.028)    (0.078)    (0.509)
Probit Model            1        0.516      0.520      1.895
                                 (0.242)    (0.163)    (0.387)

[Figure: ESTIMATE OF G(V). Two panels plotted against V over the range -4 to 12: the estimated G(V) and its derivative dG/dV.]

CONCLUSIONS

Single-index models:
Provide a compromise between the restrictions of parametric models and the imprecision of fully nonparametric models.
May be structural (e.g., random utility binary-response model).
Asymptotic efficiency bounds are available in some cases.
Two classes of estimators:
Nonlinear optimization: provides an asymptotically efficient estimator in some cases.
Direct: non-iterative, does not require solving a nonlinear optimization problem.
One-step estimation from a direct estimate yields asymptotic efficiency when an efficient estimator is available.
Example based on real data illustrates usefulness.