Efficient Estimation in Convex Single Index Models 1

Size: px

Start display at page:

Download "Efficient Estimation in Convex Single Index Models 1"

Kelly Stokes
5 years ago
Views:

1 1/28 Efficient Estimation in Convex Single Index Models 1 Rohit Patra University of Florida 1 Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)

2 2/28 Overview 1 Introduction 2 Estimation 3 Asymptotics 4 Simulation study

3 Introduction 3/28

4 4/28 A semiparametric model Convex single index model Y = m 0 (θ 0 X ) + ɛ, E(ɛ X ) = 0. (Y, X ) R R d P θ0,m 0. θ 0 R d is the unknown coefficient vector. m 0 : R R is an unknown convex link function with no parametric restriction. Offers a balance between flexibility of nonparametric models and interpretability of parametric models.

5 5/28 Goal and Applicability Model Problem Y = m 0 (θ 0 X ) + ɛ, E(ɛ X ) = 0. Estimate θ 0 and m 0 simultaneously, when we have i.i.d. data {(x i, y i )} n i=1 from the above model. Why convex link Convex/concave SIMs are widely used in economics, operations research, financial engineering among other fields. Production functions, utility functions, and call option prices are known to be concave.

6 6/28 Restrictions Identifiability: If m 1 (t) = m 0 ( t/2) and θ 1 = 2θ 0, then m 0 (θ0 x) = m 1(θ1 x). Thus we need to assume θ 0 Θ := {β R d : β = 1, β 1 > 0}. We need some regularity assumptions on the class of link functions. Only Shape constraints Shape and Smoothness constraints Some relevant works in the shape constrained single index model include: Murphy et al. (1999), Chen and Samworth (2015), Groeneboom and Hendrickx (2016), and Balabdaoui et al. (2016).

7 Y = 2(X θ 0 ) 2 + N(0, 5), X U[0, 1] 3, m(t) = 2t 2. Y: Output for 555 Belgian Firms X: Labour, Capital, Wage Index y Convex LSE Smoothing Splines Truth Index log(output) Splines Concave LSE Kong and Xia (2007) Labour Data for Belgian Firms (1996) Figure: Estimated link functions: stability due to convex constraint. Index := ˆθx. 7/28

8 Estimation 8/28

9 9/28 Estimation in Convex SIM [Kuchibhotla, Patra, Sen, 2017] Lipschitz LSE where C L = ( ˆm, ˆθ) 1 = argmin m C L,θ Θ n n {y i m(θ x i )} 2, i=1 { m m is convex and m(t 1 ) m(t 2 ) L t 1 t 2 } t 1, t 2 R. Penalized LSE ( ˇm, ˇθ) 1 := argmin (m,θ) R Θ n n {y i m(θ x i )} 2 + ˇλ 2 n {m (t)} 2 dt, i=1 where R denotes the class of all convex functions that have absolutely continuous first derivative.

10 10/28 Computation: Alternating Scheme Recall: ( ˆm, ˆθ) = argmin m CL,θ Θ Q n (m, θ), where Q n (m, θ) := 1 n n {y i m(θ x i )} 2. i=1 Minimization for fixed θ For a fixed θ, define ˆm θ := argmin Q n (m, θ). m C L The minimization is a convex optimization problem. ˆm θ can computed efficiently using the nnls package in R. ˇm θ can be computed via a damped newton type algorithm. Our R package simest has a implementation of this computation.

11 11/28 Computation contd. Minimization for fixed θ We can now define the profiled loss Q n : Θ R, Q n (θ) := Q n ( ˆm θ, θ) Minimization over θ To find ˆθ, we now minimize Q n (θ) over θ Θ. The loss function is not convex. Simulations suggest a large domain of attraction for moderate d. We use a gradient step on the unit sphere based on the right derivative of ˆm θ. Some initial work has suggested that the convergence of this alternating scheme is linear, i.e., θ (k+1) ˆθ (1 ρ) θ (k) ˆθ.

12 Asymptotics 12/28

13 13/28 Theoretical properties of LLSE Rates of convergence of the estimators [Kuchibhotla, Patra, Sen, 2017] Under some regularity conditions on m 0 and distribution of X (bounded support), sub-gaussian errors and L L 0 the Lipschitz LSE satisfies ˆm ˆθ m 0 θ 0 = O p (n 2/5 ), [Estimation error] ˆm θ 0 m 0 θ 0 = O p (n 2/5 ), [Estimation error of ˆm] ˆm θ 0 m 0 θ 0 = O p (n 2/15 ), [Estimation error of ˆm ] ˆθ θ 0 = O p (n 2/5 ). [Estimation error of ˆθ] For a convex Lipschitz function g : R R let g define the right derivative of g that satisfies g(b) = g(a) + b a g (t)dt. Here for any f : R R, and θ R d, we define f θ 2 := f (θ x) 2 dp X (x).

14 14/28 Asymptotic normality: Homoscedastic model Recall: 1 ( ˆm, ˆθ) = argmin (m,θ) C L Θ n n {y i m(θ x i )} 2. i=1 Semiparametric efficiency of ˆθ [Kuchibhotla, Patra, and Sen, 2017] Assume that E(ɛ 2 X ) σ 2. Let l θ,m : R d R R d 1 be the efficient score and let us define the efficient information matrix I θ0,m 0 := E(l θ0,m 0 l θ 0,m 0 ) R (d 1) (d 1). If m 0 is twice differentiable, then under some regularity conditions we can conclude n( ˆθ θ 0 ) d N(0, H θ0 I 1 θ 0,m 0 H θ 0 ).

15 15/28 Inference Our result readily yields confidence sets for θ 0. We can use the following plug-in estimator for the covariance estimator: ˆΣ := ˆσ 4 Hˆθ Pˆθ, ˆm [lˆθ, ˆm (Y, X )l ˆθ, ˆm (Y, X )] 1 H ˆθ, where n ˆσ 2 := [y i ˆm(ˆθ x i )] 2 /n i=1 l θ,m (y, x) := ( y m(θ x) ) m (θ x)h θ { x hθ (θ x) }. Asymptotic 1 2α confidence interval [ ˆθ i z ) 1/2 α (ˆΣi,i, ˆθi + z ) 1/2 α (ˆΣi,i ], n n

16 16/28 Asymptotic properties of the PLSE ˇλ 1 n If = O p (n 2/5 ) and ˇλ n = o p (n 1/4 ), then under some similar regularity assumptions the PLSE satisfies ˇm ˇθ m 0 θ 0 = O p (ˇλ n ), ˇm θ 0 m 0 θ 0 = O p (ˇλ n ), [Estimation error] [Estimation error for ˇm] ˇm θ 0 m 0 θ 0 = O p (ˇλ 1/2 n ), [Estimation error for ˇm ] and n( ˇθ θ0 ) d N(0, H θ0 I 1 θ 0,m 0 H θ 0 ). An example choice of ˇλ n := C n 2/5. Proof borrows ideas from the empirical process theory; see e.g., Mammen and van de Geer (1997) and van de Geer (2000).

17 17/28 Difficulty in proving efficiency The LLSE ˆm is a piecewise affine function and lies on the boundary of C L. Nuisance tangent space Consider the following model: Y = m(θ X ) + ɛ, where m C L, and θ Θ. A linear perturbation/submodel around m: m s,a (t) = m(t) s a(t), where s R. The score for the single index model along this submodel is proportional to a( ). lin{a : D R m s,a R for small enough s} L 2 (Λ). The set inclusion is strict when m is not strongly convex.

18 18/28 Efficient score Let S θ denote the parametric score of the model and Λ m denote the nuisance tangent space. When m is strongly convex, the efficient score is known to be Π(S θ Λ m) := l θ,m (y, x) = ( y m(θ x) ) m (θ x)h θ { x hθ (θ x) }. Since ˆm is not strongly convex, it is not clear if one can show that?? Π(Sˆθ Λ ˆm) = lˆθ, ˆm.

19 19/28 Efficient score Since ˆm lies on the boundary of C L, the least favorable path does not exist, i.e., we can not find (θ t, m t ) (centered at (ˆθ, ˆm)) such that?? lˆθ, ˆm = ( y mt (θt x) ) 2 t. t=0 Since LLSE is the minimizer of the least squares loss, this would mean P n lˆθ, ˆm = 0. Since ˆm is piecewise affine, it is not clear if one can show that P n lˆθ, ˆm?? = o p (n 1/2 ).

20 20/28 Approximations We next try to construct some paths around (ˆθ, ˆm) that will have score close to lˆθ, ˆm. We find a path that has the following score: S θ,m = { ( y m(θ x) ) H θ [ θ x m (θ x)x + m (u)k (u)du m (θ x)k(θ x) s 0 ] + m 0 (s 0)k(s 0 ) m 0 (s 0)h θ0 (s 0 ). Note l θ0,m 0 = S θ0,m 0 = ( y m(θ 0 x)) H θ 0 m 0 (θ 0 x) { x h θ0 (θ 0 x)}. van der Vaart (2002) calls such a path approximately least favorable.

21 21/28 Approximation, Part 2 S θ,m is not very tractable, so we further approximate this by ψ θ,m (x, y) := (y m(θ x))h θ [m (θ x)x h θ0 (θ x)m 0(θ x)]. Compare ψ θ,m to l θ,m = ( y m(θ x) ) Hθ m (θ x) { x h θ (θ x) }. Also note ψ θ0,m 0 = l θ0,m 0. We show that P n ψˆθ, ˆm = o p(n 1/2 ).

22 Simulation study 22/28

23 23/28 Another Estimator Convex LSE ( m, θ) := argmin Q n (m, θ). (m,θ) C Θ We can compute the LSE via the alternating minimization algorithm. Simulations suggest θ is n consistent. The behavior of m at the boundary is not well-understood. In fact, in univariate regression, m is unbounded in probability at the boundary.

24 Choice of L Y = (θ 0 X ) 2 + N(0,.1 2 ), where X Uniform[ 1, 1] 4 and θ 0 = 1 4 /2. Mean Absolute Deviation CvxLSE L Figure: Box plots of i=1 θ i θ 0,i (over 1000 replications, n = 500) from the following model as the tuning parameter varies over {3, 4, 5, 7, 10} and CvxLSE. Here L 0 = 4. 24/28

25 25/28 Confidence Interval Y = (θ 0 X ) 2 + N(0,.3 2 ), X Uniform[ 1, 1] 3 and θ 0 = 1 3 / 3. (1) Table: The estimated coverage probabilities and average lengths (obtained from 800 replicates) of nominal 95% confidence intervals for the first coordinate of θ 0 for the model (1). n CvxLip CvxPen Coverage Avg Length Coverage Avg Length

26 0.005 d = d = CvxPen CvxLip Smooth d = EFM EDR CvxPen CvxLip Smooth d = 100 EFM EDR CvxPen CvxLip Smooth EFM EDR Figure: Boxplots of d i=1 ˆθ i θ 0,i /d (over 500 replications) based on 200 observations for dimensions 10, 25, 50, and 100. CvxPen CvxLip Smooth EFM 26/28

27 27/28 Summary First work providing efficient estimator in shape-constrained LSE (in a bundled parameter problem) when ˆm is piecewise affine. Our estimators readily lead to asymptotic confidence sets for θ 0. Our methods are robust towards the choice of the tuning parameter. The proposed estimators are implemented in the R package simest.

28 28/28 References [1] Kuchibhotla, A. K. and Patra, R. K. (2016). simest: Single Index Model Estimation with Constraints on Link Function. R package version 0.2. [2] Kuchibhotla, A. K., Patra, R. K., and Sen, B. (2017). Efficient Estimation in Convex Single Index Models. arxiv.org/abs/ [3] Kuchibhotla, A. K., and Patra, R. K. (2017). Efficient estimation in single index models through smoothing splines. arxiv.org/abs/ [4] Balabdaoui, F., Durot, C. and Jankowski, H. (2016). Least squares estimation in the monotone single index model. arxiv.org/abs/ [5] Groeneboom, P. and Hendrickx, K. (2016). Current status linear regression. Annals of Statistics (Forthcoming). [6] Chen, Y. and Samworth, R. J. (2014). Generalised additive and index models with shape constraints. JRSSB. 78(4), [7] Murphy, S. A., van der Vaart, A. W. and Wellner, J. A. (1999). Current status regression. Math. Methods Statist. 8(3), Thank You! Questions?

Efficient Estimation in Single Index Models through Smoothing splines

Efficient Estimation in Single Index Models through Smoothing splines Arun K. Kuchibhotla and Rohit K. Patra University of Pennsylvania and University of Florida, e-mail: arunku@upenn.edu; rohitpatra@ufl.edu