Inference For High Dimensional M-estimates: Fixed Design Results


Inference For High Dimensional M-estimates: Fixed Design Results
Lihua Lei, Peter Bickel and Noureddine El Karoui
Department of Statistics, UC Berkeley
Berkeley-Stanford Econometrics Jamboree

Table of Contents: Background; Main Results; Heuristics and Proof Techniques; Numerical Results

Setup
Consider a linear model: y = Xβ* + ε, where
y = (y_1, ..., y_n)^T ∈ R^n is the response vector;
X = (x_1^T, ..., x_n^T)^T ∈ R^{n×p} is the design matrix;
β* = (β*_1, ..., β*_p)^T ∈ R^p is the coefficient vector;
ε = (ε_1, ..., ε_n)^T ∈ R^n is the unobserved random error, with independent entries.

M-Estimator
Given a convex loss function ρ(·): R → [0, ∞),
    β̂ = argmin_{β ∈ R^p} (1/n) Σ_{i=1}^n ρ(y_i − x_i^T β).
When ρ is differentiable with ψ = ρ′, β̂ can be characterized as the solution of
    (1/n) Σ_{i=1}^n ψ(y_i − x_i^T β̂) = 0.
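To make the definition concrete, here is a minimal numerical sketch (not code from the talk; the names `m_estimate` and `huber` and the simulated data are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def huber(x, k=1.345):
    # Huber loss: quadratic near 0, linear in the tails
    return np.where(np.abs(x) <= k, x**2 / 2, k * (np.abs(x) - k / 2))

def m_estimate(X, y, rho=huber):
    """M-estimator: minimize (1/n) * sum_i rho(y_i - x_i^T beta)."""
    n, p = X.shape
    objective = lambda beta: np.mean(rho(y - X @ beta))
    return minimize(objective, np.zeros(p), method="BFGS").x

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_star = np.arange(p, dtype=float)
y = X @ beta_star + rng.standard_normal(n)
beta_hat = m_estimate(X, y)   # close to beta_star in this low-dimensional case
```

With ρ(x) = x²/2 the same routine reduces to least squares.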

M-Estimator: Examples
ρ(x) = x²/2 gives the Least-Squares estimator;
ρ(x) = |x| gives the Least-Absolute-Deviation estimator;
ρ(x) = x²/2 for |x| ≤ k, and k(|x| − k/2) for |x| > k, gives the Huber estimator.
[Figure: ρ(x) and ψ(x) for the L2, L1 and Huber losses.]

Goals (Informal)
Goal (informal): make inference on the coordinates of β* when
X is treated as fixed;
no assumption is imposed on β*; and
the dimension p is comparable to the sample size n.
Why coordinates? Why fixed designs? Why assumption-free β*? Why p ≍ n?

Asymptotic Arguments: Motivation
Consider β*_1 WLOG.
Ideally, we would construct a 95% confidence interval for β*_1 as
    [q_{0.025}(L(β̂_1)), q_{0.975}(L(β̂_1))],
where q_α denotes the α-th quantile.
Unfortunately, L(β̂_1) is unknown. This motivates the asymptotic argument: find a distribution F such that L(β̂_1) ≈ F.

Asymptotic Arguments: Textbook Version
When p is fixed and n → ∞, the limiting behavior of β̂ is
    L(β̂) ≈ N( β*, (X^T X)^{-1} · E[ψ²(ε_1)] / (E[ψ′(ε_1)])² ).
As a consequence, we obtain an approximate 95% confidence interval for β*_1,
    [β̂_1 − 1.96 sd(β̂_1), β̂_1 + 1.96 sd(β̂_1)],
where sd(β̂_1) can be any consistent estimator of the standard deviation.
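The textbook variance formula can be turned into a plug-in confidence interval; a hedged sketch, with the expectations E[ψ²(ε)] and E[ψ′(ε)] estimated from residuals (the helper `classical_ci` is a hypothetical name; ψ(x) = x recovers the OLS case):

```python
import numpy as np

def classical_ci(X, y, beta_hat, psi, dpsi, z=1.96):
    """Fixed-p plug-in CI: Var(beta_hat) ~ (X^T X)^{-1} E[psi^2(eps)] / (E[psi'(eps)])^2,
    with both expectations estimated from the residuals."""
    r = y - X @ beta_hat
    scale = np.mean(psi(r) ** 2) / np.mean(dpsi(r)) ** 2
    sd = np.sqrt(scale * np.diag(np.linalg.inv(X.T @ X)))
    return beta_hat - z * sd, beta_hat + z * sd

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
beta_star = np.ones(p)
y = X @ beta_star + rng.standard_normal(n)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
# psi(x) = x, psi'(x) = 1 recovers the usual OLS standard errors
lo, hi = classical_ci(X, y, beta_ols, lambda x: x, lambda x: np.ones_like(x))
```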

Asymptotic Arguments: Hypothetical Problems
Embed the original problem (n = 100, p = 30), with data (y, X), X ∈ R^{n×p}, and estimate β̂_1, into a sequence of hypothetical problems of growing size but fixed dimension:
hypothetical problem 1 (n_1 = 200, p_1 = 30): data (y_1, X_1), estimate β̂_1^(1);
hypothetical problem 2 (n_2 = 500, p_2 = 30): data (y_2, X_2), estimate β̂_1^(2);
hypothetical problem 3 (n_3 = 2000, p_3 = 30): data (y_3, X_3), estimate β̂_1^(3); ...
Asymptotic argument: use lim_{j→∞} L(β̂_1^(j)) to approximate L(β̂_1).

Asymptotic Arguments
Huber [1973] raised the question of understanding the behavior of β̂ when both n and p tend to infinity.
Huber [1973] showed the L_2 consistency of β̂, i.e. ‖β̂ − β*‖²_2 → 0, when p = o(n^{1/3});
Portnoy [1984] proved the L_2 consistency of β̂ when p = o(n / log n).

Asymptotic Arguments
Portnoy [1985] and Mammen [1989] showed that β̂ is jointly asymptotically normal when p ≪ n^{2/3}, in the sense that for any sequence of vectors a_n ∈ R^p,
    L( a_n^T (β̂ − β*) / √Var(a_n^T β̂) ) → N(0, 1).

p/n: A Measure of Difficulty
All of the above works require p/n → 0, i.e. n/p → ∞.
n/p is the number of samples per parameter;
the classical rule of thumb is n/p ≥ 5-10;
heuristically, a larger n/p gives an easier problem;
hypothetical problems with n_j/p_j → ∞ are therefore not appropriate: they are increasingly easier than the original problem.

Moderate p/n Regime
Formally, we define the moderate p/n regime by p/n → κ > 0. The original problem (n = 100, p = 30) is now embedded into hypothetical problems with the same aspect ratio:
hypothetical problem 1 (n_1 = 200, p_1 = 60): estimate β̂_1^(1);
hypothetical problem 2 (n_2 = 500, p_2 = 150): estimate β̂_1^(2);
hypothetical problem 3 (n_3 = 2000, p_3 = 600): estimate β̂_1^(3); ...

Moderate p/n Regime: More Informative Asymptotics
A simulation to compare the fixed-p regime and the moderate-p/n regime.
Original problem: n = 50, p = 50κ, Huber loss, i.i.d. ε_i's. Fix X and β*; draw independent error vectors ε_1, ε_2, ..., ε_r; form the responses y_j = Xβ* + ε_j and the M-estimates β̂_1^(1), β̂_1^(2), ..., β̂_1^(r). Then set
    L̂(β̂_1; X) = ecdf({β̂_1^(1), ..., β̂_1^(r)}).
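The replication scheme above can be sketched as follows (illustrative sizes and replication count; as in the talk, X and β* stay fixed and only the errors are redrawn):

```python
import numpy as np
from scipy.optimize import minimize

k = 1.345
huber = lambda x: np.where(np.abs(x) <= k, x**2 / 2, k * (np.abs(x) - k / 2))
psi = lambda x: np.clip(x, -k, k)          # psi = rho' for the Huber loss

def m_estimate(X, y):
    n, p = X.shape
    obj = lambda b: np.mean(huber(y - X @ b))
    grad = lambda b: -X.T @ psi(y - X @ b) / n
    return minimize(obj, np.zeros(p), jac=grad, method="BFGS").x

rng = np.random.default_rng(0)
n, kappa, r = 50, 0.5, 100                 # illustrative replication count
p = int(n * kappa)
X = rng.standard_normal((n, p))            # X (and beta* = 0) fixed across draws
b1 = np.array([m_estimate(X, rng.standard_normal(n))[0] for _ in range(r)])
ecdf = lambda t: np.mean(b1 <= t)          # estimate of L(beta_hat_1; X)
```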

Moderate p/n Regime: More Informative Asymptotics
The same scheme applied to two larger hypothetical problems:
Fixed-p approximation: n = 1000, p = 50κ. M-estimates β̂_1^(F,1), ..., β̂_1^(F,r) give L̂(β̂_1^F; X) = ecdf({β̂_1^(F,1), ..., β̂_1^(F,r)}).
Moderate-p/n approximation: n = 1000, p = 1000κ. M-estimates β̂_1^(M,1), ..., β̂_1^(M,r) give L̂(β̂_1^M; X) = ecdf({β̂_1^(M,1), ..., β̂_1^(M,r)}).

Moderate p/n Regime: More Informative Asymptotics
Measure the accuracy of the two approximations by the Kolmogorov-Smirnov statistics
    d_KS(L̂(β̂_1), L̂(β̂_1^F))  and  d_KS(L̂(β̂_1), L̂(β̂_1^M)).
[Figure: distance between the small-sample and large-sample distributions for normal and t(2) errors, plotted against κ, for the p-fixed and p/n-fixed asymptotic regimes.]
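The Kolmogorov-Smirnov distance between two empirical laws is available off the shelf; a toy sketch with stand-in samples in place of the simulated ecdfs:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
sample_orig = rng.standard_normal(500)          # stand-in for the ecdf of beta_hat_1
sample_approx = 1.2 * rng.standard_normal(500)  # stand-in for an approximating ecdf
d_ks = ks_2samp(sample_orig, sample_approx).statistic   # sup_t |F_1(t) - F_2(t)|
```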

Moderate p/n Regime: Negative Results
The moderate p/n regime in statistics:
Huber [1973] showed that for least-squares estimators there always exists a sequence of vectors a_n ∈ R^p such that
    L( a_n^T (β̂_LS − β*) / √Var(a_n^T β̂_LS) ) ↛ N(0, 1);
Bickel and Freedman [1982] showed that the bootstrap fails in the least-squares case, and that the usual rescaling does not help;
El Karoui et al. [2011] showed that for general loss functions, ‖β̂ − β*‖_2 ↛ 0;
El Karoui and Purdom [2015] showed that most widely used resampling schemes give poor inference on β*_1.

Moderate p/n Regime: Reason of Failure
Qualitatively:
an influential observation always exists [Huber, 1973]: letting H = X(X^T X)^{-1} X^T be the hat matrix,
    max_{1≤i≤n} H_{i,i} ≥ tr(H)/n = p/n ≫ 0;
the regression residuals fail to mimic the true errors: R_i ≡ y_i − x_i^T β̂ ≉ ε_i.
Technically: the Taylor expansion / Bahadur-type representation fails!
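Both failure modes are easy to check numerically for OLS; a small sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X^T X)^{-1} X^T
lev = np.diag(H)
max_lev = lev.max()                      # >= tr(H)/n = p/n: an influential point

eps = rng.standard_normal(n)
resid = eps - H @ eps                    # OLS residuals when beta* = 0: R = (I - H) eps
shrink = resid.var() / eps.var()         # roughly (n - p)/n, so residuals look too small
```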

Moderate p/n Regime: Positive Results (Random Designs)
Bean et al. [2013] showed that when X has i.i.d. Gaussian entries, for any sequence of a_n ∈ R^p,
    L_{X,ε}( a_n^T (β̂ − β*) / √Var_{X,ε}(a_n^T β̂) ) → N(0, 1);
El Karoui [2015] extended this to general random designs.
This does not contradict Huber [1973], in that the randomness now comes from both X and ε; still, El Karoui et al. [2011] showed that for general loss functions, ‖β̂ − β*‖_2 ↛ 0.

Moderate p/n Regime: Summary
It provides a more accurate approximation of L(β̂_1).
It is qualitatively different from the classical regimes where p/n → 0:
the L_2-consistency of β̂ no longer holds;
the residual R_i behaves differently from ε_i;
fixed-design results differ from random-design results.
Inference on the vector β̂ is hard, but inference on a coordinate / low-dimensional linear contrasts of β̂ is still possible.

Goals (Formal)
Our goal (formal): under the linear model y = Xβ* + ε, derive the asymptotic distribution of the coordinates β̂_j,
under the moderate p/n regime, i.e. p/n → κ ∈ (0, 1);
with a fixed design matrix X;
without assumptions on β*.


Main Result (Informal)
Definition 1. Let P and Q be two distributions on R^p; then
    d_TV(P, Q) = sup_{A ⊂ R^p} |P(A) − Q(A)|.
Theorem. Under appropriate conditions on the design matrix X, the distribution of ε and the loss function ρ, as p/n → κ ∈ (0, 1) while n → ∞,
    max_j d_TV( L( (β̂_j − E β̂_j) / √Var(β̂_j) ), N(0, 1) ) = o(1).

Main Result (Informal)
If ρ is an even function and ε =_d −ε, then β̂ − β* =_d β* − β̂, and hence E β̂ = β*.
Theorem. Under appropriate conditions on the design matrix X, the distribution of ε and the loss function ρ, as p/n → κ ∈ (0, 1) while n → ∞,
    max_j d_TV( L( (β̂_j − β*_j) / √Var(β̂_j) ), N(0, 1) ) = o(1).

Why Surprising?
Classical approaches rely heavily on
the L_2 consistency of β̂, which only holds when p = o(n);
a Bahadur-type representation for β̂,
    √n (β̂ − β*) = (1/√n) Σ_{i=1}^n Z_i + o_p(1),
for some i.i.d. random variables Z_i, which can be proved only when p = o(n^{2/3}).
Question: what happens when p lies between O(n^{2/3}) and O(n)?

Our Contributions and Limitations
Instead, we develop a novel strategy built on
the leave-one-out method [El Karoui et al., 2011]; and
the second-order Poincaré inequality [Chatterjee, 2009].
We prove that β̂_1 is asymptotically normal for all p between O(1) and O(n) for fixed designs under regularity conditions, and that these conditions are satisfied by most design matrices.
Limitations:
we impose strong conditions on ρ and L(ε);
we do not know how to estimate Var_ε(β̂_1).

Examples: Realization of i.i.d. Designs
We consider the case where X is a realization of a random design Z. The examples below are proved to satisfy the technical assumptions with high probability over Z.
Example 1: Z has i.i.d. mean-zero sub-Gaussian entries with Var(Z_{ij}) = τ² > 0.
Example 2: Z contains an intercept term, i.e. Z = (1, Z̃), and Z̃ ∈ R^{n×(p−1)} has independent sub-Gaussian entries with Z̃_{ij} − μ_j =_d μ_j − Z̃_{ij} and Var(Z̃_{ij}) > τ², for arbitrary μ_j's.

A Counter-Example
Consider a one-way ANOVA situation: each observation i is associated with a label k_i ∈ {1, ..., p}, and X_{i,j} = I(j = k_i). This is equivalent to y_i = β*_{k_i} + ε_i.
It is easy to see that β̂_j = argmin_{β_j ∈ R} Σ_{i: k_i = j} ρ(y_i − β_j): a standard location problem.
Let n_j = |{i : k_i = j}|. In the least-squares case, i.e. ρ(x) = x²/2,
    β̂_j = β*_j + (1/n_j) Σ_{i: k_i = j} ε_i.
Assume a balanced design, i.e. n_j ≈ n/p. Then n_j ↛ ∞, and none of the β̂_j is asymptotically normal (unless the ε_i are normal); the same holds for general loss functions ρ.
Conclusion: some non-standard assumptions on X are required.
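A quick numerical illustration of the counter-example, assuming skewed (centered exponential) errors: the group average over n_j = 2 observations stays visibly non-normal, while averaging many observations does not:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
reps = 5000
# beta_hat_j - beta*_j is the average of only n_j errors; with centered
# exponential errors and n_j = 2 fixed, this average stays non-normal.
small = rng.exponential(size=(reps, 2)).mean(axis=1) - 1.0
d_small = kstest((small - small.mean()) / small.std(), "norm").statistic

# averaging n_j = 200 errors instead: the CLT makes the law nearly normal
large = rng.exponential(size=(reps, 200)).mean(axis=1) - 1.0
d_large = kstest((large - large.mean()) / large.std(), "norm").statistic
```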

Table of Contents: Background; Main Results; Heuristics and Proof Techniques (Least-Squares Estimator: A Motivating Example; Second-Order Poincaré Inequality; Assumptions; Main Results); Numerical Results

Least-Squares Estimator
The L_2 loss ρ(x) = x²/2 gives the least-squares estimator
    β̂_LS = (X^T X)^{-1} X^T y = β* + (X^T X)^{-1} X^T ε.
Let e_j denote the j-th canonical basis vector of R^p; then
    β̂_{LS,j} − β*_j = e_j^T (X^T X)^{-1} X^T ε ≡ α_j^T ε.

Least-Squares Estimator
The Lindeberg-Feller CLT implies that, in order for
    (β̂_{LS,j} − β*_j) / √Var(β̂_{LS,j}) →_L N(0, 1),
it is sufficient, and almost necessary, that
    ‖α_j‖_∞ / ‖α_j‖_2 → 0.  (1)

Least-Squares Estimator
To see the necessity of the condition, recall the one-way ANOVA case. Let n_j = |{i : k_i = j}|; then X^T X = diag(n_j)_{j=1}^p. Recalling that α_j^T = e_j^T (X^T X)^{-1} X^T, this gives
    α_{j,i} = 1/n_j if k_i = j, and 0 if k_i ≠ j.
As a result, ‖α_j‖_∞ = 1/n_j and ‖α_j‖_2 = 1/√n_j, hence
    ‖α_j‖_∞ / ‖α_j‖_2 = 1/√n_j.
However, in the moderate p/n regime there exists j with n_j ≈ 1/κ bounded, and thus β̂_{LS,j} is not asymptotically normal.
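The ratio ‖α_j‖_∞/‖α_j‖_2 in condition (1) can be computed directly; a sketch contrasting a Gaussian design with the balanced one-way ANOVA design (illustrative sizes, `lindeberg_ratio` a hypothetical helper):

```python
import numpy as np

def lindeberg_ratio(X, j):
    """||alpha_j||_inf / ||alpha_j||_2 for alpha_j^T = e_j^T (X^T X)^{-1} X^T."""
    alpha = np.linalg.solve(X.T @ X, X.T)[j]
    return np.abs(alpha).max() / np.linalg.norm(alpha)

rng = np.random.default_rng(0)
n, p = 300, 100
X_gauss = rng.standard_normal((n, p))
# balanced one-way ANOVA design: n_j = 3 observations per group
X_anova = np.kron(np.eye(p), np.ones((n // p, 1)))

r_gauss = lindeberg_ratio(X_gauss, 0)   # small: each coordinate averages many errors
r_anova = lindeberg_ratio(X_anova, 0)   # exactly 1/sqrt(n_j) = 1/sqrt(3)
```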

M-Estimator
The result for the LSE was derived from the analytical form of β̂_LS. By contrast, no analytical form is available for general ρ.
With ψ = ρ′, β̂ is the solution of
    (1/n) Σ_{i=1}^n ψ(y_i − x_i^T β̂) = 0, i.e. (1/n) Σ_{i=1}^n ψ(ε_i − x_i^T (β̂ − β*)) = 0.
We show that
β̂_j is a smooth function of ε;
∂β̂_j/∂ε and ∂²β̂_j/∂ε∂ε^T are computable.

Second-Order Poincaré Inequality
β̂_j is a smooth transform of a random vector ε with independent entries. A powerful CLT for this type of statistic is the second-order Poincaré inequality [Chatterjee, 2009].
Definition 2. For c_1, c_2 > 0, let L(c_1, c_2) be the class of probability measures on R that arise as laws of random variables u(W), where W ∼ N(0, 1) and u ∈ C²(R) with |u′(x)| ≤ c_1 and |u″(x)| ≤ c_2. For example, u = Id gives N(0, 1) and u = Φ gives U([0, 1]).

Second-Order Poincaré Inequality
Proposition 1 (SOPI; Chatterjee [2009]). Let W = (W_1, ..., W_n) have independent entries with laws in L(c_1, c_2). Take any g ∈ C²(R^n), let U = g(W), and set
    κ_1 = (E‖∇g(W)‖_2^4)^{1/4};  κ_2 = (E‖∇²g(W)‖_op^4)^{1/4};  κ_0 = ( Σ_{i=1}^n E(∂_i g(W))^4 )^{1/2}.
If EU^4 < ∞, then
    d_TV( L( (U − EU)/√Var(U) ), N(0, 1) ) ≲ (κ_0 + κ_1 κ_2) / Var(U).

Assumptions
A1: ρ(0) = ψ(0) = 0, and for any x ∈ R, 0 < K_0 ≤ ψ′(x) ≤ K_1 and |ψ″(x)| ≤ K_2;
A2: ε has independent entries with ε_i ∼ L(c_1, c_2);
A3: the largest and smallest eigenvalues λ_+ and λ_− of X^T X / n satisfy λ_+ = O(1) and λ_− = Ω(1);
A4: similar to the condition for OLS,
    max_j ‖e_j^T (X^T X)^{-1} X^T‖_∞ / ‖e_j^T (X^T X)^{-1} X^T‖_2 = o(1);
A5: similar to the condition that
    min_j Var(β̂_j) = Ω(1/n).

Main Results
Theorem 3. Under assumptions A1-A5, as p/n → κ for some κ ∈ (0, 1) while n → ∞,
    max_j d_TV( L( (β̂_j − E β̂_j)/√Var(β̂_j) ), N(0, 1) ) = o(1).


Setup
Design matrix X:
(i.i.d. design): X_{ij} i.i.d. ∼ F;
(partial Hadamard design): a matrix formed by a random set of p columns of an n × n Hadamard matrix.
Entry distribution F: F = N(0, 1); F = t_2.
Error distribution L(ε): the ε_i are i.i.d., with ε_i ∼ N(0, 1) or ε_i ∼ t_2.

Setup
Sample size n: {100, 200, 400, 800};
κ = p/n: {0.5, 0.8};
Loss function ρ: Huber loss with k = 1.345,
    ρ(x) = x²/2 for |x| ≤ k, and k|x| − k²/2 for |x| > k;
Coefficients: β* = 0.

Asymptotic Normality of a Single Coordinate
Fix X and β*; draw independent error vectors ε_1, ε_2, ..., ε_r; form the responses y_j = Xβ* + ε_j and the M-estimates β̂_1^(1), β̂_1^(2), ..., β̂_1^(r). Then:
let ŝd = se({β̂_1^(1), ..., β̂_1^(r)});
we want to compare L(β̂_1/ŝd) with N(0, 1);
as a proxy, count the fraction of β̂_1^(j) ∈ [−1.96 ŝd, 1.96 ŝd];
this fraction should ideally be close to 0.95.
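The coverage proxy can be sketched as follows (illustrative sizes and replication count, β* = 0 as in the talk's setup; `m_estimate` is a hypothetical helper):

```python
import numpy as np
from scipy.optimize import minimize

k = 1.345
huber = lambda x: np.where(np.abs(x) <= k, x**2 / 2, k * (np.abs(x) - k / 2))
psi = lambda x: np.clip(x, -k, k)

def m_estimate(X, y):
    n, p = X.shape
    obj = lambda b: np.mean(huber(y - X @ b))
    grad = lambda b: -X.T @ psi(y - X @ b) / n
    return minimize(obj, np.zeros(p), jac=grad, method="BFGS").x

rng = np.random.default_rng(0)
n, kappa, r = 100, 0.5, 200          # illustrative sizes; the talk uses n up to 800
p = int(n * kappa)
X = rng.standard_normal((n, p))      # i.i.d. N(0, 1) design, beta* = 0
b1 = np.array([m_estimate(X, rng.standard_normal(n))[0] for _ in range(r)])
sd_hat = b1.std(ddof=1)
coverage = np.mean(np.abs(b1) <= 1.96 * sd_hat)   # proxy; ideally close to 0.95
```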

Asymptotic Normality of a Single Coordinate
[Figure: empirical coverage of β*_1 for κ = 0.5 and κ = 0.8, across sample sizes, for i.i.d. and partial Hadamard designs with normal and t(2) entry and error distributions.]

Conclusion
We establish the coordinate-wise asymptotic normality of M-estimators for certain fixed design matrices in the moderate p/n regime, under regularity conditions on X, L(ε) and ρ but with no condition on β*;
We prove the result using the novel approach of the second-order Poincaré inequality [Chatterjee, 2009];
We show that the regularity conditions are satisfied by a broad class of designs.

Discussion
Inference = asymptotic normality + asymptotic bias + asymptotic variance:
Var(β̂_1 | X) vs. Var(β̂_1) when X is indeed a realization of a random design?
Resampling methods to give conservative variance estimates?
More advanced bootstrap?
Relax the regularity conditions:
generalize to non-strongly-convex and non-smooth loss functions?
generalize to general error distributions?
Get rid of asymptotics:
yes, exact finite-sample guarantees if n/p > 20;
no assumption on X or β*;
only an exchangeability assumption on ε.

Thank You!

References
Derek Bean, Peter J. Bickel, Noureddine El Karoui, and Bin Yu. Optimal M-estimation in high-dimensional regression. Proceedings of the National Academy of Sciences, 110(36), 2013.
Peter J. Bickel and David A. Freedman. Bootstrapping regression models with many parameters. In A Festschrift for Erich L. Lehmann, pages 28-48, 1982.
Sourav Chatterjee. Fluctuations of eigenvalues and second order Poincaré inequalities. Probability Theory and Related Fields, 143(1-2):1-40, 2009.
Noureddine El Karoui. On the impact of predictor geometry on the performance of high-dimensional ridge-regularized generalized robust regression estimators. 2015.
Noureddine El Karoui and Elizabeth Purdom. Can we trust the bootstrap in high-dimension? UC Berkeley Statistics Department Technical Report, 2015.
Noureddine El Karoui, Derek Bean, Peter J. Bickel, Chinghway Lim, and Bin Yu. On robust regression with high-dimensional predictors. Proceedings of the National Academy of Sciences, 110(36), 2013.
Peter J. Huber. Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 1(5):799-821, 1973.
Enno Mammen. Asymptotics with increasing dimension for robust regression with applications to the bootstrap. The Annals of Statistics, 1989.
Stephen Portnoy. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. The Annals of Statistics, 1984.
Stephen Portnoy. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. II. Normal approximation. The Annals of Statistics, 1985.


More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

F9 F10: Autocorrelation

F9 F10: Autocorrelation F9 F10: Autocorrelation Feng Li Department of Statistics, Stockholm University Introduction In the classic regression model we assume cov(u i, u j x i, x k ) = E(u i, u j ) = 0 What if we break the assumption?

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

Robust estimation, efficiency, and Lasso debiasing

Robust estimation, efficiency, and Lasso debiasing Robust estimation, efficiency, and Lasso debiasing Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics WHOA-PSI workshop Washington University in St. Louis Aug 12, 2017 Po-Ling

More information

A Resampling Method on Pivotal Estimating Functions

A Resampling Method on Pivotal Estimating Functions A Resampling Method on Pivotal Estimating Functions Kun Nie Biostat 277,Winter 2004 March 17, 2004 Outline Introduction A General Resampling Method Examples - Quantile Regression -Rank Regression -Simulation

More information

AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

More information

M-Estimation under High-Dimensional Asymptotics

M-Estimation under High-Dimensional Asymptotics M-Estimation under High-Dimensional Asymptotics 2014-05-01 Classical M-estimation Big Data M-estimation An out-of-the-park grand-slam home run Annals of Mathematical Statistics 1964 Richard Olshen Classical

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017 Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.

More information

Estimation of large dimensional sparse covariance matrices

Estimation of large dimensional sparse covariance matrices Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

MA Advanced Econometrics: Applying Least Squares to Time Series

MA Advanced Econometrics: Applying Least Squares to Time Series MA Advanced Econometrics: Applying Least Squares to Time Series Karl Whelan School of Economics, UCD February 15, 2011 Karl Whelan (UCD) Time Series February 15, 2011 1 / 24 Part I Time Series: Standard

More information

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Lecture 14 October 13

Lecture 14 October 13 STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

Multivariate Regression Analysis

Multivariate Regression Analysis Matrices and vectors The model from the sample is: Y = Xβ +u with n individuals, l response variable, k regressors Y is a n 1 vector or a n l matrix with the notation Y T = (y 1,y 2,...,y n ) 1 x 11 x

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

Robustní monitorování stability v modelu CAPM

Robustní monitorování stability v modelu CAPM Robustní monitorování stability v modelu CAPM Ondřej Chochola, Marie Hušková, Zuzana Prášková (MFF UK) Josef Steinebach (University of Cologne) ROBUST 2012, Němčičky, 10.-14.9. 2012 Contents Introduction

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of

More information

Robust high-dimensional linear regression: A statistical perspective

Robust high-dimensional linear regression: A statistical perspective Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,

More information

M-estimation in high-dimensional linear model

M-estimation in high-dimensional linear model Wang and Zhu Journal of Inequalities and Applications 208 208:225 https://doi.org/0.86/s3660-08-89-3 R E S E A R C H Open Access M-estimation in high-dimensional linear model Kai Wang and Yanling Zhu *

More information

Indian Statistical Institute

Indian Statistical Institute Indian Statistical Institute Introductory Computer programming Robust Regression methods with high breakdown point Author: Roll No: MD1701 February 24, 2018 Contents 1 Introduction 2 2 Criteria for evaluating

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Lecture 3: Central Limit Theorem

Lecture 3: Central Limit Theorem Lecture 3: Central Limit Theorem Scribe: Jacy Bird (Division of Engineering and Applied Sciences, Harvard) February 8, 003 The goal of today s lecture is to investigate the asymptotic behavior of P N (

More information

Sliced Inverse Regression

Sliced Inverse Regression Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed

More information

Quantile Processes for Semi and Nonparametric Regression

Quantile Processes for Semi and Nonparametric Regression Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response

More information

Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory

Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Andreas Buja joint with the PoSI Group: Richard Berk, Lawrence Brown, Linda Zhao, Kai Zhang Ed George, Mikhail Traskin, Emil Pitkin,

More information

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based

More information

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Matrix Factorizations

Matrix Factorizations 1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Prediction Intervals For Lasso and Relaxed Lasso Using D Variables

Prediction Intervals For Lasso and Relaxed Lasso Using D Variables Southern Illinois University Carbondale OpenSIUC Research Papers Graduate School 2017 Prediction Intervals For Lasso and Relaxed Lasso Using D Variables Craig J. Bartelsmeyer Southern Illinois University

More information

Inference for Identifiable Parameters in Partially Identified Econometric Models

Inference for Identifiable Parameters in Partially Identified Econometric Models Inference for Identifiable Parameters in Partially Identified Econometric Models Joseph P. Romano Department of Statistics Stanford University romano@stat.stanford.edu Azeem M. Shaikh Department of Economics

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Program Evaluation with High-Dimensional Data

Program Evaluation with High-Dimensional Data Program Evaluation with High-Dimensional Data Alexandre Belloni Duke Victor Chernozhukov MIT Iván Fernández-Val BU Christian Hansen Booth ESWC 215 August 17, 215 Introduction Goal is to perform inference

More information

Nonparametric Inference via Bootstrapping the Debiased Estimator

Nonparametric Inference via Bootstrapping the Debiased Estimator Nonparametric Inference via Bootstrapping the Debiased Estimator Yen-Chi Chen Department of Statistics, University of Washington ICSA-Canada Chapter Symposium 2017 1 / 21 Problem Setup Let X 1,, X n be

More information

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Confidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean

Confidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean Confidence Intervals Confidence interval for sample mean The CLT tells us: as the sample size n increases, the sample mean is approximately Normal with mean and standard deviation Thus, we have a standard

More information

Fluctuations from the Semicircle Law Lecture 4

Fluctuations from the Semicircle Law Lecture 4 Fluctuations from the Semicircle Law Lecture 4 Ioana Dumitriu University of Washington Women and Math, IAS 2014 May 23, 2014 Ioana Dumitriu (UW) Fluctuations from the Semicircle Law Lecture 4 May 23, 2014

More information

Quantile Regression for Extraordinarily Large Data

Quantile Regression for Extraordinarily Large Data Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step

More information

Inference on distributions and quantiles using a finite-sample Dirichlet process

Inference on distributions and quantiles using a finite-sample Dirichlet process Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics

More information

Large sample distribution for fully functional periodicity tests

Large sample distribution for fully functional periodicity tests Large sample distribution for fully functional periodicity tests Siegfried Hörmann Institute for Statistics Graz University of Technology Based on joint work with Piotr Kokoszka (Colorado State) and Gilles

More information

Model Mis-specification

Model Mis-specification Model Mis-specification Carlo Favero Favero () Model Mis-specification 1 / 28 Model Mis-specification Each specification can be interpreted of the result of a reduction process, what happens if the reduction

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODEL

JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODEL 1 / 25 JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODELS DEPT. OF STATISTICS, UNIV. WISCONSIN, MADISON BIOMEDICAL STATISTICAL MODELING. CELEBRATION OF JEREMY TAYLOR S OF 60TH BIRTHDAY. UNIVERSITY

More information

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2 EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME Xavier Mestre, Pascal Vallet 2 Centre Tecnològic de Telecomunicacions de Catalunya, Castelldefels, Barcelona (Spain) 2 Institut

More information

the error term could vary over the observations, in ways that are related

the error term could vary over the observations, in ways that are related Heteroskedasticity We now consider the implications of relaxing the assumption that the conditional variance Var(u i x i ) = σ 2 is common to all observations i = 1,..., n In many applications, we may

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

A Comparison of Robust Estimators Based on Two Types of Trimming

A Comparison of Robust Estimators Based on Two Types of Trimming Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical

More information

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

δ -method and M-estimation

δ -method and M-estimation Econ 2110, fall 2016, Part IVb Asymptotic Theory: δ -method and M-estimation Maximilian Kasy Department of Economics, Harvard University 1 / 40 Example Suppose we estimate the average effect of class size

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR J. Japan Statist. Soc. Vol. 3 No. 200 39 5 CALCULAION MEHOD FOR NONLINEAR DYNAMIC LEAS-ABSOLUE DEVIAIONS ESIMAOR Kohtaro Hitomi * and Masato Kagihara ** In a nonlinear dynamic model, the consistency and

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information

STAT 4385 Topic 06: Model Diagnostics

STAT 4385 Topic 06: Model Diagnostics STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information