Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results

Lihua Lei
Advisors: Peter J. Bickel, Michael I. Jordan
Joint work with Peter J. Bickel and Noureddine El Karoui
Dec. 8

Table of Contents

1 Background
2 Main Results and Examples
3 Assumptions and Proof Sketch
4 Numerical Results

Setup

Observe $\{x_1, y_1\}, \{x_2, y_2\}, \ldots, \{x_n, y_n\}$:
- response vector $Y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$;
- design matrix $X = (x_1^T, \ldots, x_n^T)^T \in \mathbb{R}^{n \times p}$.

Model: the linear model $Y = X\beta^* + \epsilon$, with $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T \in \mathbb{R}^n$ a random vector.

M-Estimator

Given a convex loss function $\rho(\cdot): \mathbb{R} \to [0, \infty)$,
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n \rho(y_i - x_i^T \beta).$$
When $\rho$ is differentiable with $\psi = \rho'$, $\hat\beta$ can be written as the solution of the estimating equation
$$\frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \hat\beta)\, x_i = 0.$$
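To make the definition concrete, here is a minimal Python sketch (not from the talk) that computes an M-estimator by directly minimizing the empirical risk; `rho` is any smooth convex loss, such as the Huber loss defined on the next slide.

```python
# A minimal sketch of computing an M-estimator by minimizing the empirical
# risk (1/n) * sum_i rho(y_i - x_i^T beta); assumes rho is convex and smooth.
import numpy as np
from scipy.optimize import minimize

def m_estimate(X, Y, rho):
    """Return argmin over beta of mean(rho(Y - X @ beta))."""
    obj = lambda beta: np.mean(rho(Y - X @ beta))
    beta0 = np.linalg.lstsq(X, Y, rcond=None)[0]  # least-squares warm start
    # tol is set tightly so the first-order condition holds to high accuracy
    return minimize(obj, beta0, method="BFGS", tol=1e-10).x
```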

M-Estimator: Examples

- $\rho(x) = x^2/2$ gives the Least-Squares estimator;
- $\rho(x) = |x|$ gives the Least-Absolute-Deviation estimator;
- $\rho(x) = \begin{cases} x^2/2 & |x| \le k \\ k(|x| - k/2) & |x| > k \end{cases}$ gives the Huber estimator.

[Figure: $\rho(x)$ and $\psi(x)$ for the L2, L1 and Huber losses.]
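For reference, a sketch of these three losses in Python, together with the score $\psi = \rho'$ of the Huber loss (the tuning constant $k = 1.345$ is the value used later in the numerical section):

```python
# The three example losses; psi_huber is the derivative of rho_huber.
import numpy as np

def rho_l2(x):
    return x**2 / 2

def rho_l1(x):
    return np.abs(x)

def rho_huber(x, k=1.345):
    return np.where(np.abs(x) <= k, x**2 / 2, k * (np.abs(x) - k / 2))

def psi_huber(x, k=1.345):
    return np.clip(x, -k, k)              # equals x on [-k, k], +/- k outside

def psi_prime_huber(x, k=1.345):
    return (np.abs(x) <= k).astype(float)  # second derivative of rho_huber a.e.
```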

Goals (Informal)

Goal (informal): make inference on the coordinates of $\beta^*$ when
- the dimension $p$ is comparable to the sample size $n$;
- $X$ is treated as fixed;
- no assumptions are imposed on $\beta^*$.

Consider $\beta_1^*$ WLOG. Given $X$ and $\mathcal{L}(\epsilon)$, $\mathcal{L}(\hat\beta_1)$ is uniquely determined. Ideally, we would construct a 95% confidence interval for $\beta_1^*$ as
$$\left[ q_{0.025}\big(\mathcal{L}(\hat\beta_1)\big),\; q_{0.975}\big(\mathcal{L}(\hat\beta_1)\big) \right],$$
where $q_\alpha$ denotes the $\alpha$-th quantile. Unfortunately, $\mathcal{L}(\hat\beta_1)$ is complicated.

Asymptotic Arguments

Exact finite-sample inference is hard. This motivates statisticians to resort to asymptotic arguments, i.e. to find a distribution $F$ such that $\mathcal{L}(\hat\beta_1) \approx F$.

The limiting behavior of $\hat\beta$ when $p$ is fixed and $n \to \infty$ is
$$\mathcal{L}(\hat\beta) \approx N\left( \beta^*,\; (X^T X)^{-1} \frac{E\psi^2(\epsilon_1)}{[E\psi'(\epsilon_1)]^2} \right).$$
As a consequence, we obtain an approximate 95% confidence interval for $\beta_1^*$:
$$\left[ \hat\beta_1 - 1.96\,\widehat{\mathrm{sd}}(\hat\beta_1),\; \hat\beta_1 + 1.96\,\widehat{\mathrm{sd}}(\hat\beta_1) \right],$$
where $\widehat{\mathrm{sd}}(\hat\beta_1)$ could be any consistent estimator of the standard deviation.
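As an illustration, here is a sketch (under the fixed-p asymptotics above, not the authors' code) of the plug-in version of this interval, estimating $E\psi^2(\epsilon_1)$ and $E\psi'(\epsilon_1)$ from the residuals:

```python
# Classical fixed-p 95% interval for beta*_j with a plug-in variance estimate.
import numpy as np
from scipy.stats import norm

def classical_ci(X, Y, beta_hat, psi, psi_prime, j=0, level=0.95):
    R = Y - X @ beta_hat                       # residuals
    a = np.mean(psi(R) ** 2)                   # estimates E psi^2(eps_1)
    b = np.mean(psi_prime(R))                  # estimates E psi'(eps_1)
    cov = np.linalg.inv(X.T @ X) * a / b**2    # plug-in covariance of beta_hat
    z = norm.ppf(0.5 + level / 2)              # 1.96 for level = 0.95
    half = z * np.sqrt(cov[j, j])
    return beta_hat[j] - half, beta_hat[j] + half
```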

Asymptotic Arguments

In other words, to approximate $\mathcal{L}(\hat\beta_1)$ we consider a sequence of hypothetical problems, indexed by $j$, where the $j$-th problem has sample size $n_j$ and dimension $p_j = p$.

For the $j$-th problem, denote by $\hat\beta^{(j)}$ the corresponding M-estimator; the previous slide then uses $\lim_{j\to\infty} \mathcal{L}(\hat\beta_1^{(j)})$ to approximate $\mathcal{L}(\hat\beta_1)$.

In general, $p_j$ is not necessarily fixed and can grow to infinity.

Asymptotic Arguments

- Huber (1973) raised the question of understanding the behavior of $\hat\beta$ when both $n$ and $p$ tend to infinity;
- Huber (1973) showed the $L_2$-consistency of $\hat\beta$, i.e. $\|\hat\beta - \beta^*\|_2 \to 0$, under the regime $p^3/n \to 0$;
- Portnoy (1984) proved the $L_2$-consistency of $\hat\beta$ under the regime $p\log p/n \to 0$.

Asymptotic Arguments

Portnoy (1985) showed that $\hat\beta$ is jointly asymptotically normal under the regime $(p\log n)^{3/2}/n \to 0$, in the sense that for any sequence of vectors $a_n \in \mathbb{R}^p$,
$$\mathcal{L}\left( \frac{a_n^T (\hat\beta - \beta^*)}{\sqrt{\mathrm{Var}(a_n^T \hat\beta)}} \right) \to N(0, 1).$$

p/n: A Measure of Difficulty

All of the above works require $p/n \to 0$, or equivalently $n/p \to \infty$. Note that $n/p$ is the number of samples per parameter; heuristically, a larger $n/p$ gives an easier problem.

p/n: A Measure of Difficulty

Recall that the approximation can be seen as a sequence of hypothetical problems with sample sizes $n_j$ and dimensions $p_j$. If $n_j/p_j \to \infty$, the problems become increasingly easier as $j$ grows. In other words, the hypothetical problems used for the approximation are much easier than the original problem, and the approximation accuracy might then be compromised.

Moderate p/n Regime

Instead, we can consider a sequence of hypothetical problems with $p_j/n_j$ fixed at the value of the original problem, i.e. $p_j/n_j \equiv p/n$. In this case, the difficulty of the problem is fixed.

Moderate p/n Regime

Formally, we define the moderate p/n regime by $p_j/n_j \to \kappa > 0$. A typical value for $\kappa$ is $p/n$ in the original problem.

Moderate p/n Regime: More Informative Asymptotics

Consider a set of small-sample problems where $n = 50$ and $p = n\kappa$ for $\kappa \in \{0.1, \ldots, 0.9\}$. For each pair $(n, p)$ (see the sketch below):
Step 1: generate $X \in \mathbb{R}^{n \times p}$ with i.i.d. $N(0,1)$ entries;
Step 2: fix $\beta^* = 0$ and sample $Y = \epsilon$ with $\epsilon_i$ i.i.d. $N(0,1)$ or i.i.d. $t_2$;
Step 3: estimate $\beta_1^*$ by $\hat\beta_1$ with the Huber loss;
Step 4: repeat Steps 2-3 100 times and estimate $\mathcal{L}(\hat\beta_1)$.
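A compact re-implementation of Steps 1-4 (a sketch; it reuses `m_estimate` and `rho_huber` from the earlier sketches, which are assumptions of this illustration, not the authors' code):

```python
# Steps 1-4: one fixed design, repeated error draws, samples of beta_hat_1.
import numpy as np

rng = np.random.default_rng(0)

def sample_beta1(n, p, err="normal", reps=100):
    X = rng.standard_normal((n, p))        # Step 1: i.i.d. N(0,1) design
    draws = []
    for _ in range(reps):                  # Steps 2-4
        eps = rng.standard_normal(n) if err == "normal" else rng.standard_t(2, n)
        draws.append(m_estimate(X, eps, rho_huber)[0])  # Y = eps since beta* = 0
    return np.asarray(draws)               # empirical draws of beta_hat_1
```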

Moderate p/n Regime: More Informative Asymptotics

Now consider two types of approximations:
- Fixed-p approximation: $N = 1000$, $P = p$;
- Moderate-p/n approximation: $N = 1000$, $P = 1000\kappa$.

Repeat Steps 1-4 for the new pairs $(N, P)$ and estimate $\mathcal{L}(\hat\beta_1^F)$ (fixed p) and $\mathcal{L}(\hat\beta_1^M)$ (moderate p/n). Measure the accuracy of the two approximations by the Kolmogorov-Smirnov statistics
$$d_{KS}\big(\mathcal{L}(\hat\beta_1), \mathcal{L}(\hat\beta_1^F)\big) \quad \text{and} \quad d_{KS}\big(\mathcal{L}(\hat\beta_1), \mathcal{L}(\hat\beta_1^M)\big).$$
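And a sketch of the comparison itself, using scipy's two-sample Kolmogorov-Smirnov statistic as a stand-in for $d_{KS}$; the standardization step is an assumption of this sketch, since the raw scales of $\hat\beta_1$ differ between $n = 50$ and $N = 1000$ (it reuses `sample_beta1` from the previous sketch):

```python
# Compare fixed-p vs. moderate-p/n approximations of L(beta_hat_1) via KS.
from scipy.stats import ks_2samp

def standardized(s):
    return (s - s.mean()) / s.std()

n, kappa = 50, 0.5
target  = standardized(sample_beta1(n, int(n * kappa)))        # small-sample law
fixed_p = standardized(sample_beta1(1000, int(n * kappa)))     # P = p
mod_pn  = standardized(sample_beta1(1000, int(1000 * kappa)))  # P = 1000 * kappa
print("fixed-p KS:     ", ks_2samp(target, fixed_p).statistic)
print("moderate-p/n KS:", ks_2samp(target, mod_pn).statistic)
```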

Moderate p/n Regime: More Informative Asymptotics

[Figure: Kolmogorov-Smirnov distance between the small-sample and large-sample distributions of $\hat\beta_1$, as a function of $\kappa$, for normal and $t_2$ errors, comparing the p-fixed and p/n-fixed asymptotic regimes.]

Moderate p/n Regime: Negative Results

The moderate p/n regime has been widely studied in random matrix theory. In statistics:
- Huber (1973) showed that for least-squares estimators there always exists a sequence of vectors $a_n \in \mathbb{R}^p$ such that
$$\mathcal{L}\left( \frac{a_n^T (\hat\beta_{LS} - \beta^*)}{\sqrt{\mathrm{Var}(a_n^T \hat\beta_{LS})}} \right) \not\to N(0, 1);$$
- Bickel and Freedman (1982) showed that the bootstrap fails in the least-squares case and that the usual rescaling does not help;
- El Karoui et al. (2011) showed that for general loss functions, $\|\hat\beta - \beta^*\|_2 \not\to 0$.

Main reason: $\hat F_n$, the empirical distribution of the residuals $R_i \triangleq y_i - x_i^T \hat\beta$, does not converge to $\mathcal{L}(\epsilon_i)$.

Moderate p/n Regime: Positive Results

If $X$ is assumed to be a random matrix satisfying regularity conditions:
- Bean et al. (2013) showed that when $X$ has i.i.d. Gaussian entries, for any sequence of $a_n \in \mathbb{R}^p$,
$$\mathcal{L}_{X,\epsilon}\left( \frac{a_n^T (\hat\beta - \beta^*)}{\sqrt{\mathrm{Var}_{X,\epsilon}(a_n^T \hat\beta)}} \right) \to N(0, 1);$$
this does not contradict Huber (1973), in that the randomness comes from both $X$ and $\epsilon$;
- El Karoui et al. (2011) characterized the limiting behavior of $\|\hat\beta - \beta^*\|_2$ for general loss functions;
- under weaker assumptions on $X$, El Karoui (2015) showed
$$\mathcal{L}_{X,\epsilon}\left( \frac{\hat\beta_1(\tau) - \beta_1^* - \mathrm{bias}(\hat\beta_1(\tau))}{\sqrt{\mathrm{Var}_{X,\epsilon}(\hat\beta_1(\tau))}} \right) \to N(0, 1),$$
where $\hat\beta_1(\tau)$ is the ridge-penalized M-estimator.

Moderate p/n Regime: Summary

- Provides a more accurate approximation of $\mathcal{L}(\hat\beta_1)$;
- Qualitatively different from the classical regimes where $p/n \to 0$:
  - the $L_2$-consistency of $\hat\beta$ no longer holds;
  - the residuals $R_i$ behave differently from the $\epsilon_i$;
  - fixed-design results are different from random-design results;
- Inference on the vector $\hat\beta$ is hard, but inference on coordinates / low-dimensional linear contrasts of $\hat\beta$ is still possible.

Goals (Formal)

Our goal (formal): under the linear model $Y = X\beta^* + \epsilon$, derive the asymptotic distribution of the coordinates $\hat\beta_j$:
- under the moderate p/n regime, i.e. $p/n \to \kappa \in (0, 1)$;
- with a fixed design matrix $X$;
- without assumptions on $\beta^*$.

Table of Contents

1 Background
2 Main Results and Examples
3 Assumptions and Proof Sketch
4 Numerical Results

Main Result (Informal)

Definition 1. Let $P$ and $Q$ be two distributions on $\mathbb{R}^p$. The total-variation distance between them is
$$d_{TV}(P, Q) = \sup_{A \subset \mathbb{R}^p} |P(A) - Q(A)|.$$

Theorem. Under appropriate conditions on the design matrix $X$, the distribution of $\epsilon$ and the loss function $\rho$, as $p/n \to \kappa \in (0, 1)$ while $n \to \infty$,
$$\max_j d_{TV}\left( \mathcal{L}\left( \frac{\hat\beta_j - E\hat\beta_j}{\sqrt{\mathrm{Var}(\hat\beta_j)}} \right),\; N(0, 1) \right) = o(1).$$

Examples: Realizations of i.i.d. Designs

We consider the case where $X$ is a realization of a random design $Z$. The examples below are proved to satisfy the technical assumptions with high probability over $Z$.

Example 1: $Z$ has i.i.d. mean-zero sub-Gaussian entries with $\mathrm{Var}(Z_{ij}) = \tau^2 > 0$;
Example 2: $Z$ contains an intercept term, i.e. $Z = (\mathbf{1}, \tilde Z)$, and $\tilde Z \in \mathbb{R}^{n \times (p-1)}$ has independent sub-Gaussian entries, symmetric about some arbitrary $\mu_j$ in the sense that $\tilde Z_{ij} - \mu_j \stackrel{d}{=} \mu_j - \tilde Z_{ij}$, with $\mathrm{Var}(\tilde Z_{ij}) > \tau^2$.

Examples: Realizations of Dependent Gaussian Designs

Example 3: $Z$ is matrix-normal with $\mathrm{vec}(Z) \sim N(0, \Lambda \otimes \Sigma)$ and
$$\lambda_{\max}(\Lambda), \lambda_{\max}(\Sigma) = O(1), \qquad \lambda_{\min}(\Lambda), \lambda_{\min}(\Sigma) = \Omega(1);$$
Example 4: $Z$ contains an intercept term, i.e. $Z = (\mathbf{1}, \tilde Z)$, and $\mathrm{vec}(\tilde Z) \sim N(0, \Lambda \otimes \Sigma)$ with $\Lambda$ and $\Sigma$ satisfying the above condition and
$$\frac{\max_i (\Lambda^{1/2} \mathbf{1})_i}{\min_i (\Lambda^{1/2} \mathbf{1})_i} = O(1).$$

A Counter-Example

Consider a one-way ANOVA situation: each observation $i$ is associated with a label $k_i \in \{1, \ldots, p\}$, and $X_{i,j} = I(j = k_i)$. This is equivalent to $Y_i = \beta^*_{k_i} + \epsilon_i$.

It is easy to see that
$$\hat\beta_j = \arg\min_{\beta_j \in \mathbb{R}} \sum_{i: k_i = j} \rho(Y_i - \beta_j).$$
This is a standard location problem.

A Counter-Example

Let $n_j = |\{i : k_i = j\}|$. In the least-squares case, i.e. $\rho(x) = x^2/2$,
$$\hat\beta_j = \beta_j^* + \frac{1}{n_j} \sum_{i: k_i = j} \epsilon_i.$$
Assume a balanced design, i.e. $n_j \approx n/p$. Then $n_j \not\to \infty$ and
- none of the $\hat\beta_j$ is asymptotically normal (unless the $\epsilon_i$ are normal);
- the same holds for general loss functions $\rho$.

Conclusion: some non-standard assumptions on $X$ are required.

Table of Contents

1 Background
2 Main Results and Examples
3 Assumptions and Proof Sketch
   - Least-Squares Estimator: A Motivating Example
   - Second-Order Poincaré Inequality
   - Assumptions
   - Main Results
4 Numerical Results

Least-Squares Estimator

The $L_2$ loss $\rho(x) = x^2/2$ gives the least-squares estimator
$$\hat\beta_{LS} = (X^T X)^{-1} X^T Y = \beta^* + (X^T X)^{-1} X^T \epsilon.$$
Let $e_j$ denote the $j$-th canonical basis vector in $\mathbb{R}^p$; then
$$\hat\beta_{LS,j} - \beta_j^* = e_j^T (X^T X)^{-1} X^T \epsilon.$$
Writing $e_j^T (X^T X)^{-1} X^T$ as $\alpha_j^T$, we get
$$\hat\beta_{LS,j} - \beta_j^* = \sum_{i=1}^n \alpha_{j,i}\, \epsilon_i.$$

Least-Squares Estimator

The Lindeberg-Feller CLT implies that in order for
$$\mathcal{L}\left( \frac{\hat\beta_{LS,j} - \beta_j^*}{\sqrt{\mathrm{Var}(\hat\beta_{LS,j})}} \right) \to N(0, 1),$$
it is sufficient, and almost necessary, that
$$\frac{\|\alpha_j\|_\infty}{\|\alpha_j\|_2} \to 0. \tag{1}$$
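Condition (1) is easy to inspect numerically; a sketch:

```python
# The Lindeberg-type ratio ||alpha_j||_inf / ||alpha_j||_2 from condition (1).
import numpy as np

def lindeberg_ratio(X, j):
    A = np.linalg.solve(X.T @ X, X.T)   # rows are alpha_j^T = e_j^T (X^T X)^{-1} X^T
    alpha_j = A[j]
    return np.abs(alpha_j).max() / np.linalg.norm(alpha_j)

# Example: for an i.i.d. Gaussian design the ratio is small even when p/n = 0.5.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 500))
print(lindeberg_ratio(X, 0))
```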

Least-Squares Estimator

To see the necessity of the condition, recall the one-way ANOVA case. With $n_j = |\{i : k_i = j\}|$ we have $X^T X = \mathrm{diag}(n_j)_{j=1}^p$, which gives
$$\alpha_{j,i} = \begin{cases} 1/n_j & k_i = j \\ 0 & k_i \neq j. \end{cases}$$
As a result, $\|\alpha_j\|_\infty = 1/n_j$ and $\|\alpha_j\|_2 = 1/\sqrt{n_j}$, and hence
$$\frac{\|\alpha_j\|_\infty}{\|\alpha_j\|_2} = \frac{1}{\sqrt{n_j}}.$$
However, in the moderate p/n regime there exists $j$ such that $n_j \approx 1/\kappa$ stays bounded, and thus $\hat\beta_{LS,j}$ is not asymptotically normal.

M-Estimator

The result for the LSE is derived from the analytical form of $\hat\beta_{LS}$. In contrast, an analytical form is not available for general $\rho$. With $\psi = \rho'$, $\hat\beta$ is the solution of
$$\frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \hat\beta)\, x_i = 0.$$
WLOG, assume $\beta^* = 0$; then
$$\frac{1}{n} \sum_{i=1}^n \psi(\epsilon_i - x_i^T \hat\beta)\, x_i = 0.$$

M-Estimator

Write $R_i$ for $\epsilon_i - x_i^T \hat\beta$ and define $D$, $\tilde D$ and $G$ as
$$D = \mathrm{diag}(\psi'(R_i)), \qquad \tilde D = \mathrm{diag}(\psi''(R_i)), \qquad G = I - X (X^T D X)^{-1} X^T D.$$

Lemma 2. Suppose $\psi \in C^2(\mathbb{R})$. Then
$$\frac{\partial \hat\beta_j}{\partial \epsilon^T} = e_j^T (X^T D X)^{-1} X^T D, \tag{2}$$
$$\frac{\partial^2 \hat\beta_j}{\partial \epsilon\, \partial \epsilon^T} = G^T \mathrm{diag}\left( e_j^T (X^T D X)^{-1} X^T \tilde D \right) G. \tag{3}$$
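Formula (2) can be checked numerically by finite differences. Below is such a sanity check, a sketch reusing `m_estimate`, `rho_huber` and `psi_prime_huber` from the earlier sketches (it requires the solver to be tightly converged, since the finite-difference step is small):

```python
# Numerical check of (2): perturb eps_i and compare d(beta_hat_j)/d(eps_i)
# with the i-th entry of e_j^T (X^T D X)^{-1} X^T D.
import numpy as np

def gradient_formula(X, eps, j=0):
    beta_hat = m_estimate(X, eps, rho_huber)
    D = np.diag(psi_prime_huber(eps - X @ beta_hat))  # D = diag(psi'(R_i))
    return np.linalg.solve(X.T @ D @ X, X.T @ D)[j]

def gradient_fd(X, eps, j=0, h=1e-6):
    base = m_estimate(X, eps, rho_huber)[j]
    grad = np.zeros(len(eps))
    for i in range(len(eps)):
        pert = eps.copy()
        pert[i] += h
        grad[i] = (m_estimate(X, pert, rho_huber)[j] - base) / h
    return grad

rng = np.random.default_rng(0)
X, eps = rng.standard_normal((40, 8)), rng.standard_normal(40)
print(np.max(np.abs(gradient_formula(X, eps) - gradient_fd(X, eps))))
```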

Second-Order Poincaré Inequality

$\hat\beta_j$ is a smooth transform of a random vector $\epsilon$ with independent entries. A powerful CLT for this type of statistic is the second-order Poincaré inequality (Chatterjee, 2009).

Definition 3. For $c_1, c_2 > 0$, let $\mathcal{L}(c_1, c_2)$ be the class of probability measures on $\mathbb{R}$ that arise as laws of random variables of the form $u(W)$, where $W \sim N(0, 1)$ and $u \in C^2(\mathbb{R})$ with $|u'(x)| \le c_1$ and $|u''(x)| \le c_2$. For example, $u = \mathrm{Id}$ gives $N(0, 1)$ and $u = \Phi$ gives $U([0, 1])$.

Second-Order Poincaré Inequality

Proposition 1 (SOPI; Chatterjee, 2009). Let $W = (W_1, \ldots, W_n)$ have independent entries with laws in $\mathcal{L}(c_1, c_2)$. Take any $g \in C^2(\mathbb{R}^n)$, let $U = g(W)$, and set
$$\kappa_0 = \left( E \sum_{i=1}^n |\partial_i g(W)|^4 \right)^{1/4}, \quad \kappa_1 = \left( E \|\nabla g(W)\|_2^4 \right)^{1/4}, \quad \kappa_2 = \left( E \|\nabla^2 g(W)\|_{op}^4 \right)^{1/4}.$$
If $U$ has a finite fourth moment, then
$$d_{TV}\left( \mathcal{L}\left( \frac{U - EU}{\sqrt{\mathrm{Var}(U)}} \right),\; N(0, 1) \right) \lesssim \frac{\kappa_0 + \kappa_1 \kappa_2}{\mathrm{Var}(U)}.$$

Assumptions

A1: $\rho(0) = \psi(0) = 0$ and for any $x \in \mathbb{R}$, $0 < K_0 \le \psi'(x) \le K_1$ and $|\psi''(x)| \le K_2$;
A2: $\epsilon$ has independent entries with $\epsilon_i \sim \mathcal{L}(c_1, c_2)$;
A3: the largest and smallest eigenvalues $\lambda_+$ and $\lambda_-$ of $X^T X / n$ satisfy $\lambda_+ = O(1)$ and $\lambda_- = \Omega(1)$.

Second-Order Poincaré Inequality on $\hat\beta_j$

Applying the second-order Poincaré inequality to $\hat\beta_j$, we obtain:

Lemma 4. Let $D = \mathrm{diag}(\psi'(\epsilon_i - x_i^T \hat\beta))_{i=1}^n$ and
$$M_j = E\, \big\| e_j^T (X^T D X)^{-1} X^T D^{1/2} \big\|_\infty^2.$$
Then under assumptions A1-A3,
$$\max_j d_{TV}\left( \mathcal{L}\left( \frac{\hat\beta_j - E\hat\beta_j}{\sqrt{\mathrm{Var}(\hat\beta_j)}} \right),\; N(0, 1) \right) = O\left( \frac{\max_j (n M_j)^{1/8} \cdot \frac{p}{n}}{n \min_j \mathrm{Var}(\hat\beta_j)} \right).$$
The main result is obtained if we prove
$$M_j = o\left( \frac{1}{n} \right), \qquad \mathrm{Var}(\hat\beta_j) = \Omega\left( \frac{1}{n} \right).$$

Further Assumptions

Define the following quantities:
- the leave-one-predictor-out estimate $\hat\beta_{[j]}$: the M-estimator obtained by removing the $j$-th column of $X$ (El Karoui, 2013);
- the leave-one-predictor-out residuals $r_{i,[j]} = \epsilon_i - x_{i,[j]}^T \hat\beta_{[j]}$, where $x_{i,[j]}^T$ is the $i$-th row of $X$ after removing the $j$-th entry;
- $h_{j,0} = (\psi(r_{1,[j]}), \ldots, \psi(r_{n,[j]}))^T$;
- $Q_j = \mathrm{Cov}(h_{j,0})$, the covariance matrix of the $\psi(r_{i,[j]})$.

Further Assumptions

Besides assumptions A1-A3, we assume that
A4: $\min_j \dfrac{X_j^T Q_j X_j}{\mathrm{tr}(Q_j)} = \Omega(1)$.

Note that $Q_j$ does not involve $X_j$. Assumption A4 guarantees $\mathrm{Var}(\hat\beta_j) = \Omega(1/n)$.

Further Assumptions

If $X_j$ is a realization of a random vector $Z_j$ with i.i.d. entries, then
$$E Z_j^T Q_j Z_j = \mathrm{tr}(E Z_j Z_j^T Q_j) = E Z_{1,j}^2 \, \mathrm{tr}(Q_j).$$
If $Z_j^T Q_j Z_j$ concentrates around its mean, then
$$\frac{Z_j^T Q_j Z_j}{\mathrm{tr}(Q_j)} \approx E Z_{1,j}^2 > 0.$$
For example, when $Z_j$ has i.i.d. sub-Gaussian entries, the Hanson-Wright inequality implies the concentration:
$$P\left( |Z_j^T Q_j Z_j - E Z_j^T Q_j Z_j| \ge t \right) \le 2 \exp\left\{ -c \min\left\{ \frac{t^2}{\|Q_j\|_F^2}, \frac{t}{\|Q_j\|_{op}} \right\} \right\}.$$

Further Assumptions

To describe the last assumption, we define the following quantities:
- $D_{[j]} = \mathrm{diag}(\psi'(r_{i,[j]}))$: the leave-one-predictor-out version of $D$;
- $G_{[j]} = I - X_{[j]} (X_{[j]}^T D_{[j]} X_{[j]})^{-1} X_{[j]}^T D_{[j]}$;
- $h_{j,1,i}^T = e_i^T G_{[j]}$: the $i$-th row of $G_{[j]}$;
- $$C = \max\left\{ \max_j \frac{|h_{j,0}^T X_j|}{\|h_{j,0}\|_2},\; \max_{i,j} \frac{|h_{j,1,i}^T X_j|}{\|h_{j,1,i}\|_2} \right\}.$$

Further Assumptions

The last assumption:
A5: $(E C^8)^{1/8} = O(\mathrm{polyLog}(n))$.

It turns out that when $\rho(x) = x^2/2$,
$$C \asymp \max_j \frac{\|e_j^T (X^T X)^{-1} X^T\|_\infty}{\|e_j^T (X^T X)^{-1} X^T\|_2}.$$
Recall that for least squares, the $\hat\beta_j$ are all asymptotically normal iff the right-hand side tends to 0. This indicates that assumption A5 is not just an artifact of the proof.

Further Assumptions

Let $\alpha_{j,0} = h_{j,0} / \|h_{j,0}\|_2$ and $\alpha_{j,1,i} = h_{j,1,i} / \|h_{j,1,i}\|_2$. Again, if $X_j$ is a realization of a random vector $Z_j$ with i.i.d. $\sigma^2$-sub-Gaussian entries, then the $\alpha_{j,0}^T Z_j$ and $\alpha_{j,1,i}^T Z_j$ are all $\sigma^2$-sub-Gaussian. Then $C$ is the maximum of $np + p$ sub-Gaussian random variables, and hence $(E C^8)^{1/8} = O(\mathrm{polyLog}(n))$.

Review of All Assumptions

A1: $\rho(0) = \psi(0) = 0$ and for any $x \in \mathbb{R}$, $0 < K_0 \le \psi'(x) \le K_1$ and $|\psi''(x)| \le K_2$;
A2: $\epsilon$ has independent entries with $\epsilon_i \sim \mathcal{L}(c_1, c_2)$;
A3: the largest and smallest eigenvalues $\lambda_+$ and $\lambda_-$ of $X^T X / n$ satisfy $\lambda_+ = O(1)$ and $\lambda_- = \Omega(1)$;
A4: $\min_j \dfrac{X_j^T Q_j X_j}{\mathrm{tr}(Q_j)} = \Omega(1)$;
A5: $(E C^8)^{1/8} = O(\mathrm{polyLog}(n))$.

Main Results

Theorem 5. Under assumptions A1-A5, as $p/n \to \kappa$ for some $\kappa \in (0, 1)$ while $n \to \infty$,
$$\max_j d_{TV}\left( \mathcal{L}\left( \frac{\hat\beta_j - E\hat\beta_j}{\sqrt{\mathrm{Var}(\hat\beta_j)}} \right),\; N(0, 1) \right) = o(1).$$

A Corollary

If we further assume that
A6: $\rho$ is an even function and $\epsilon_i \stackrel{d}{=} -\epsilon_i$,
then one can show that $\hat\beta$ is unbiased. As a consequence:

Theorem 6. Under assumptions A1-A6, as $p/n \to \kappa$ for some $\kappa \in (0, 1)$ while $n \to \infty$,
$$\max_j d_{TV}\left( \mathcal{L}\left( \frac{\hat\beta_j - \beta_j^*}{\sqrt{\mathrm{Var}(\hat\beta_j)}} \right),\; N(0, 1) \right) = o(1).$$

Table of Contents

1 Background
2 Main Results and Examples
3 Assumptions and Proof Sketch
4 Numerical Results

Setup

Design matrix X:
- (i.i.d. design) $X_{ij}$ i.i.d. $\sim F$;
- (partial Hadamard design) a matrix formed by a random set of $p$ columns of an $n \times n$ Hadamard matrix.

Entry distribution F: $F = N(0, 1)$ or $F = t_2$.
Error distribution $\mathcal{L}(\epsilon)$: the $\epsilon_i$ are i.i.d. with $\epsilon_i \sim N(0, 1)$ or $\epsilon_i \sim t_2$.

Setup

Sample size n: $\{100, 200, 400, 800\}$; $\kappa = p/n$: $\{0.5, 0.8\}$.
Loss function $\rho$: the Huber loss with $k = 1.345$,
$$\rho(x) = \begin{cases} \frac{1}{2} x^2 & |x| \le k \\ k|x| - \frac{k^2}{2} & |x| > k. \end{cases}$$

Asymptotic Normality of a Single Coordinate

For each set of parameters, we run 50 simulations, each consisting of the following steps (see the sketch below):
Step 1: generate one design matrix $X$;
Step 2: generate 300 error vectors $\epsilon$;
Step 3: regress each $Y = \epsilon$ on the design matrix $X$, which yields 300 random samples of $\hat\beta_1$, denoted by $\hat\beta_1^{(1)}, \ldots, \hat\beta_1^{(300)}$;
Step 4: estimate the standard deviation of $\hat\beta_1$ by the sample standard error $\widehat{\mathrm{sd}}$;
Step 5: construct a confidence interval $I^{(k)} = [\hat\beta_1^{(k)} - 1.96\,\widehat{\mathrm{sd}},\; \hat\beta_1^{(k)} + 1.96\,\widehat{\mathrm{sd}}]$ for each $k = 1, \ldots, 300$;
Step 6: calculate the empirical 95% coverage as the proportion of confidence intervals that cover the true $\beta_1^* = 0$.
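A sketch of one such simulation (Steps 1-6) for the i.i.d. Gaussian design, again reusing `m_estimate` and `rho_huber` from the earlier sketches:

```python
# Steps 1-6: empirical 95% coverage of the +/- 1.96 * sd-hat interval.
import numpy as np

rng = np.random.default_rng(1)
n, kappa, reps = 100, 0.5, 300
X = rng.standard_normal((n, int(n * kappa)))               # Step 1
b1 = np.array([m_estimate(X, rng.standard_normal(n), rho_huber)[0]
               for _ in range(reps)])                      # Steps 2-3: Y = eps
sd_hat = b1.std(ddof=1)                                    # Step 4
coverage = np.mean(np.abs(b1) <= 1.96 * sd_hat)            # Steps 5-6: beta*_1 = 0
print("empirical 95% coverage:", coverage)
```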

Asymptotic Normality of a Single Coordinate

[Figure: empirical coverage of $\beta_1^*$ for $\kappa = 0.5$ and $\kappa = 0.8$ as a function of sample size, for the i.i.d. and Hadamard designs, with normal and $t_2$ entry and error distributions.]

Conclusion

- We establish the coordinate-wise asymptotic normality of M-estimators for certain fixed design matrices under the moderate p/n regime, under regularity conditions on $X$, $\mathcal{L}(\epsilon)$ and $\rho$ but no condition on $\beta^*$;
- We prove the result via a novel application of the second-order Poincaré inequality (Chatterjee, 2009);
- We show that the regularity conditions are satisfied by a broad class of designs.

Future Works

Future work for this project:
- estimate $\mathrm{Var}(\hat\beta_j)$;
- relax the assumptions on $\mathcal{L}(\epsilon)$;
- relax the strong convexity of $\rho$;
- extend the results to GLMs.

Future work for my dissertation:
- distributional properties in high dimensions;
- resampling methods in high dimensions.

Thank You!

References

Bean, D., Bickel, P. J., El Karoui, N., & Yu, B. (2013). Optimal M-estimation in high-dimensional regression. Proceedings of the National Academy of Sciences, 110(36).
Bickel, P. J., & Freedman, D. A. (1982). Bootstrapping regression models with many parameters. Festschrift for Erich L. Lehmann.
Chatterjee, S. (2009). Fluctuations of eigenvalues and second order Poincaré inequalities. Probability Theory and Related Fields, 143(1-2).
El Karoui, N. (2013). Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results. arXiv preprint.
El Karoui, N. (2015). On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators.
El Karoui, N., Bean, D., Bickel, P. J., Lim, C., & Yu, B. (2011). On robust regression with high-dimensional predictors. Proceedings of the National Academy of Sciences, 110(36).
Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics.
Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. The Annals of Statistics.
Portnoy, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large; II. Normal approximation. The Annals of Statistics.
