High-dimensional regression:

Size: px

Start display at page:

Download "High-dimensional regression:"

Loreen Mason
5 years ago
Views:

1 High-dimensional regression: How to pick the objective function in high-dimension UC Berkeley March 11, 2013 Joint work with Noureddine El Karoui, Peter Bickel, Chingwhay Lim, and Bin Yu 1 / 12

2 Notation. Standard linear model. Observe n pairs (X i, Y i ): Y i = X T i β 0 + ɛ i. Errors ɛ i iid fɛ dim(x i ) = dim(β 0 ) = p M-estimates. β ρ = argmin β i ρ(y i X T i β) ρ - objective function, loss function 2 / 12

3 Classical theory: low-dimension. Relles (1968); Huber (1973); Portnoy (1985) Behavior of β β 0 : v R p set of p weights v T β unbiased for v T β 0, asym. normal Variance: [ ( ) ] 1 v T X T X v r 2 (ρ, f ɛ ) Key: p grows slowly with n p/n 0. Given f ɛ, compute r 2 possible to compare estimates Best estimate: minimize r 2 over ρ. 3 / 12

4 Best objective function in low-dimension. Given f ɛ, r 2 (ρ, f ɛ ) minimized by: ρ opt = log f ɛ Well known: maximum likelihood estimate (MLE). Example MLEs: 1 Normal errors: Least squares (LS): ρ opt (x) = x 2. 2 Double exponential errors: Least absolute deviations (LAD): ρ opt (x) = x. (Robust) 4 / 12

5 Surprising simulations! LAD ^2/ OLS ^2 D.E. errors: E β LAD β /E β LS β 0 2 Ratio Mean Square LAD to OLS samples, 1000 simulations. p/n 5 / 12

6 M-estimates in high-dimension. PNAS: El Karoui et. al. 2012, to appear Assume: p/n κ (0, 1) X i iid N (0, Ip ) Then: for set of p weights v, v 2 = 1: 1 v T β unbiased for v T β 0, asym. normal. 2 Variance: p 1 r 2 (ρ, f ɛ ; κ) Can characterize r 2 (complicated!) Best estimate: given f ɛ AND κ, minimize r 2 across ρ 6 / 12

7 Optimal M-estimates in high-dimension PNAS: Bean et. al. 2012, to appear Key results: given error density f ɛ, 1 For each dimension p/n κ there exists r opt (κ) such that r(ρ, f ɛ ; κ) r opt (κ) for all ρ Can characterize r opt (κ) 2 When f ɛ is log-concave, r opt is achieved by an optimal loss function ρ opt. 7 / 12

8 Details of ρ opt. Let f r,ɛ = N (0, r 2 ) f ɛ. Write r opt = r opt (κ). Optimal loss: ρ opt (x) = ( P 2 + ropt 2 log f ropt,ɛ) (x) P2 (x), optimal objective adaptive to dimension! P 2 (x) = x 2 /2 g is the conjugate dual of generic convex g. 8 / 12

9 has a d that formed Fig. 2. Ratio r 2 opt (κ)/r2 l 2 (κ) for double exponential errors: the ratio is always Optimal loss vs. LAD, D.E. errors less than 1, showing the superiority of the objective we propose over l 2. jective, resent of our distri- ]) r 2 r r 2 opt (κ)/r2 L1 (κ) Ratio r 2 opt (κ)/r2 L1 (κ) ponenely the κ = lim p/n 9 / 12

10 Optimal loss vs. LS, D.E. errors ictors, egresɛ is a a mulments posiering. m we are in r 2 opt (κ)/r2 L2 (κ) Ratio r 2 opt (κ)/r2 L2 (κ) in low show not at has a κ=lim p/n 10 / 12

11 Example: behavior of optimal loss function IN HIGH 4 Optimal loss, p/n = 0.5, D.E. errors 2 1 Optimal loss x^2 x 0 ρ(x) x 11 / 12

12 Takeaway messages. 1 Low-dimensional intuition upended in high-dimensional setting 12 / 12

13 Takeaway messages. 1 Low-dimensional intuition upended in high-dimensional setting 2 Can get precise distributional behavior in high-dimensions Random vs. fixed design / 12

14 Takeaway messages. 1 Low-dimensional intuition upended in high-dimensional setting 2 Can get precise distributional behavior in high-dimensions Random vs. fixed design... 3 Can optimize the loss in high-dimensions A new family of dimension-adaptive loss functions 12 / 12

15 Takeaway messages. 1 Low-dimensional intuition upended in high-dimensional setting 2 Can get precise distributional behavior in high-dimensions Random vs. fixed design... 3 Can optimize the loss in high-dimensions A new family of dimension-adaptive loss functions 4 (Not presented) Extensions to penalized estimates E.g. LASSO, ridge-type estimates 12 / 12

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49