Inference for High-Dimensional M-Estimates: Fixed Design Results
Lihua Lei
Advisors: Peter J. Bickel, Michael I. Jordan
Joint work with Peter J. Bickel and Noureddine El Karoui
Dec. 8
Table of Contents
1 Background
2 Main Results and Examples
3 Assumptions and Proof Sketch
4 Numerical Results
Setup
Observe {x_1, y_1}, {x_2, y_2}, ..., {x_n, y_n}: response vector Y = (y_1, ..., y_n)^T ∈ R^n; design matrix X = (x_1^T, ..., x_n^T)^T ∈ R^{n×p}.
Model: linear model Y = Xβ^* + ɛ, with ɛ = (ɛ_1, ..., ɛ_n)^T ∈ R^n a random vector.
M-Estimator
Given a convex loss function ρ(·): R → [0, ∞),
β̂ = argmin_{β ∈ R^p} (1/n) Σ_{i=1}^n ρ(y_i − x_i^T β).
When ρ is differentiable with ψ = ρ′, β̂ can be written as the solution of
(1/n) Σ_{i=1}^n ψ(y_i − x_i^T β̂) x_i = 0.
M-Estimator: Examples
ρ(x) = x²/2 gives the least-squares estimator;
ρ(x) = |x| gives the least-absolute-deviation estimator;
ρ(x) = x²/2 for |x| ≤ k, and ρ(x) = k(|x| − k/2) for |x| > k, gives the Huber estimator.
[Figure: ρ(x) and ψ(x) for the L2, L1 and Huber losses.]
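Where the loss is differentiable, as for the Huber case, the estimating equation can be solved by iteratively reweighted least squares (IRLS). The sketch below is illustrative only; the function names and synthetic data are assumptions, not the talk's code:

```python
import numpy as np

def huber_psi(r, k=1.345):
    """psi = rho' for the Huber loss: identity on [-k, k], clipped outside."""
    return np.clip(r, -k, k)

def huber_m_estimate(X, y, k=1.345, n_iter=200, tol=1e-10):
    """Solve (1/n) * sum_i psi(y_i - x_i^T beta) x_i = 0 by IRLS with
    weights w_i = psi(r_i) / r_i; a sketch, not the talk's implementation."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from least squares
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.ones_like(r)
        large = np.abs(r) > k
        w[large] = k / np.abs(r[large])          # downweight large residuals
        XW = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ XW, XW.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy usage: heavy-tailed noise around beta* = (1, ..., 1).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.ones(5) + rng.standard_t(df=2, size=200)
beta_hat = huber_m_estimate(X, y)
```

At the solution, the empirical score (1/n) Σ_i ψ(y_i − x_i^T β̂) x_i vanishes, which gives a convenient numerical check.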
Goals (Informal)
Make inference on the coordinates of β^* when:
the dimension p is comparable to the sample size n;
X is treated as fixed;
without assumptions on β^*.
Consider β^*_1 WLOG. Given X and L(ɛ), L(β̂_1) is uniquely determined. Ideally, we would construct a 95% confidence interval for β^*_1 as
[ q_{0.025}(L(β̂_1)), q_{0.975}(L(β̂_1)) ],
where q_α denotes the α-th quantile. Unfortunately, L(β̂_1) is complicated.
Asymptotic Arguments
Exact finite-sample inference is hard. This motivates statisticians to resort to asymptotic arguments, i.e. to find a distribution F s.t. L(β̂_1) ≈ F.
The limiting behavior of β̂ when p is fixed and n → ∞:
L(β̂) ≈ N( β^*, (X^T X)^{-1} · E ψ²(ɛ_1) / [E ψ′(ɛ_1)]² ).
As a consequence, we obtain an approximate 95% confidence interval for β^*_1:
[ β̂_1 − 1.96 ŝd(β̂_1), β̂_1 + 1.96 ŝd(β̂_1) ],
where ŝd(β̂_1) can be any consistent estimator of the standard deviation.
Asymptotic Arguments
In other words, to approximate L(β̂_1), we consider a sequence of hypothetical problems, indexed by j, where the j-th problem has a sample size n_j and a dimension p_j = p.
For the j-th problem, denote by β̂^{(j)} the corresponding M-estimator; the previous slide then uses lim_{j→∞} L(β̂_1^{(j)}) to approximate L(β̂_1).
In general, p_j is not necessarily fixed and can grow to infinity.
Asymptotic Arguments
Huber (1973) raised the question of understanding the behavior of β̂ when both n and p tend to infinity;
Huber (1973) showed the L₂-consistency of β̂, i.e. ‖β̂ − β^*‖₂ → 0, under the regime p³/n → 0;
Portnoy (1984) proved the L₂-consistency of β̂ under the regime (p log p)/n → 0.
Asymptotic Arguments
Portnoy (1985) showed that β̂ is jointly asymptotically normal under the regime (p log n)^{3/2}/n → 0, in the sense that for any sequence of vectors a_n ∈ R^p,
L( a_n^T(β̂ − β^*) / √Var(a_n^T β̂) ) → N(0, 1).
p/n: A Measure of Difficulty
All of the above works require p/n → 0, i.e. n/p → ∞.
n/p is the number of samples per parameter. Heuristically, a larger n/p gives an easier problem.
p/n: A Measure of Difficulty
Recall that the approximation can be seen as a sequence of hypothetical problems with sample size n_j and dimension p_j. If n_j/p_j → ∞, the problems become increasingly easier as j grows.
In other words, the hypothetical problem used for the approximation is much easier than the original problem, so the approximation accuracy might be compromised.
Moderate p/n Regime
Instead, we can consider a sequence of hypothetical problems with p_j/n_j fixed at the value of the original problem, i.e. p_j/n_j ≡ p/n. In this case, the difficulty of the problem is fixed.
Moderate p/n Regime
Formally, we define the moderate p/n regime as p_j/n_j → κ > 0. A typical value for κ is p/n in the original problem.
Moderate p/n Regime: More Informative Asymptotics
Consider a set of small-sample problems where n = 50 and p = nκ for κ ∈ {0.1, ..., 0.9}. For each pair (n, p):
Step 1: generate X ∈ R^{n×p} with i.i.d. N(0, 1) entries;
Step 2: fix β^* = 0 and sample Y = ɛ with ɛ_i i.i.d. N(0, 1) or ɛ_i i.i.d. t₂;
Step 3: estimate β^*_1 by β̂_1 with a Huber loss;
Step 4: repeat Steps 2-3 100 times and estimate L(β̂_1).
Moderate p/n Regime: More Informative Asymptotics
Now consider two types of approximations:
Fixed-p approx.: N = 1000, P = p;
Moderate-p/n approx.: N = 1000, P = 1000κ.
Repeat Steps 1-4 for the new pairs (N, P) and estimate L(β̂_1^F) (fixed p) and L(β̂_1^M) (moderate p/n).
Measure the accuracy of the two approximations by the Kolmogorov-Smirnov statistics
d_KS( L(β̂_1), L(β̂_1^F) ) and d_KS( L(β̂_1), L(β̂_1^M) ).
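A minimal sketch of this comparison; this is an assumption-laden illustration rather than the talk's code (it uses least squares instead of the Huber loss, normal errors only, N = 300 in place of 1000, and a single κ = 0.8):

```python
import numpy as np
from scipy import stats

def sample_scaled_beta1(n, p, reps, rng):
    """Monte Carlo draws of sqrt(n) * beta_hat_1 under beta* = 0 (Y = eps),
    with a fresh i.i.d. N(0, 1) design drawn each replication."""
    draws = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        eps = rng.standard_normal(n)
        draws[r] = np.sqrt(n) * np.linalg.solve(X.T @ X, X.T @ eps)[0]
    return draws

rng = np.random.default_rng(1)
kappa, reps = 0.8, 500
small = sample_scaled_beta1(50, int(50 * kappa), reps, rng)       # original problem
fixed_p = sample_scaled_beta1(300, int(50 * kappa), reps, rng)    # fixed-p approx.
moderate = sample_scaled_beta1(300, int(300 * kappa), reps, rng)  # moderate-p/n approx.

# Kolmogorov-Smirnov distances to the small-sample law.
d_fixed = stats.ks_2samp(small, fixed_p).statistic
d_moderate = stats.ks_2samp(small, moderate).statistic
```

Keeping P/N at κ matches the spread of √n·β̂_1 (for Gaussian designs its variance is inflated by roughly 1/(1 − κ)), so d_moderate comes out much smaller than d_fixed.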
Moderate p/n Regime: More Informative Asymptotics
[Figure: distance between the small-sample and large-sample distributions. Kolmogorov-Smirnov statistics against κ, in two panels for normal and t(2) errors, comparing the asymptotic regimes "p fixed" and "p/n fixed".]
Moderate p/n Regime: Negative Results
The moderate p/n regime has been widely studied in random matrix theory. In statistics:
Huber (1973) showed that for least-squares estimators there always exists a sequence of vectors a_n ∈ R^p such that
L( a_n^T(β̂^LS − β^*) / √Var(a_n^T β̂^LS) ) ↛ N(0, 1);
Bickel and Freedman (1982) showed that the bootstrap fails in the least-squares case and the usual rescaling does not help;
El Karoui et al. (2011) showed that for general loss functions, ‖β̂ − β^*‖₂ ↛ 0.
Main reason: F̂_n, the empirical distribution of the residuals R_i = y_i − x_i^T β̂, does not converge to L(ɛ_i).
Moderate p/n Regime: Positive Results
If X is assumed to be a random matrix satisfying regularity conditions:
Bean et al. (2013) showed that when X has i.i.d. Gaussian entries, for any sequence of a_n ∈ R^p,
L_{X,ɛ}( a_n^T(β̂ − β^*) / √Var_{X,ɛ}(a_n^T β̂) ) → N(0, 1);
the above result does not contradict Huber (1973), in that the randomness comes from both X and ɛ;
El Karoui et al. (2011) showed that for general loss functions, ‖β̂ − β^*‖₂ converges to a deterministic limit;
under weaker assumptions on X, El Karoui (2015) showed
L_{X,ɛ}( (β̂_1(τ) − β^*_1 − bias(β̂_1(τ))) / √Var_{X,ɛ}(β̂_1(τ)) ) → N(0, 1),
where β̂_1(τ) is the ridge-penalized M-estimator.
Moderate p/n Regime: Summary
Provides a more accurate approximation of L(β̂_1);
qualitatively different from the classical regimes where p/n → 0:
L₂-consistency of β̂ no longer holds;
the residuals R_i behave differently from the ɛ_i;
fixed-design results differ from random-design results.
Inference on the vector β̂ is hard, but inference on coordinates / low-dimensional linear contrasts of β̂ is still possible.
Goals (Formal)
Our goal: under the linear model Y = Xβ^* + ɛ, derive the asymptotic distribution of the coordinates β̂_j:
under the moderate p/n regime, i.e. p/n → κ ∈ (0, 1);
with a fixed design matrix X;
without assumptions on β^*.
2 Main Results and Examples
Main Result (Informal)
Definition 1. Let P and Q be two distributions on R^p. Then
d_TV(P, Q) = sup_{A ⊆ R^p} |P(A) − Q(A)|.
Theorem. Under appropriate conditions on the design matrix X, the distribution of ɛ and the loss function ρ, as p/n → κ ∈ (0, 1) while n → ∞,
max_j d_TV( L( (β̂_j − E β̂_j) / √Var(β̂_j) ), N(0, 1) ) = o(1).
Examples: Realizations of i.i.d. Designs
We consider the case where X is a realization of a random design Z. The examples below are proved to satisfy the technical assumptions with high probability over Z.
Example 1: Z has i.i.d. mean-zero sub-Gaussian entries with Var(Z_ij) = τ² > 0.
Example 2: Z contains an intercept term, i.e. Z = (1, Z̃), and Z̃ ∈ R^{n×(p−1)} has independent sub-Gaussian entries with Z̃_ij − μ_j =d μ_j − Z̃_ij and Var(Z̃_ij) > τ², for arbitrary μ_j.
Examples: Realizations of Dependent Gaussian Designs
Example 3: Z is matrix-normal with vec(Z) ~ N(0, Λ ⊗ Σ), and
λ_max(Λ), λ_max(Σ) = O(1), λ_min(Λ), λ_min(Σ) = Ω(1).
Example 4: Z contains an intercept term, i.e. Z = (1, Z̃), and vec(Z̃) ~ N(0, Λ ⊗ Σ) with Λ and Σ satisfying the above condition and
max_i (Λ^{1/2} 1)_i / min_i (Λ^{1/2} 1)_i = O(1).
A Counter-Example
Consider a one-way ANOVA situation. Each observation i is associated with a label k_i ∈ {1, ..., p}, and X_{i,j} = I(j = k_i). This is equivalent to y_i = β^*_{k_i} + ɛ_i.
It is easy to see that
β̂_j = argmin_{β_j ∈ R} Σ_{i: k_i = j} ρ(y_i − β_j).
This is a standard location problem.
A Counter-Example
Let n_j = |{i : k_i = j}|. In the least-squares case, i.e. ρ(x) = x²/2,
β̂_j = β^*_j + (1/n_j) Σ_{i: k_i = j} ɛ_i.
Assume a balanced design, i.e. n_j ≈ n/p. Then n_j ≈ n/p → 1/κ stays bounded, and:
none of the β̂_j is asymptotically normal (unless the ɛ_i are normal);
the same holds for general loss functions ρ.
Conclusion: some non-standard assumptions on X are required.
3 Assumptions and Proof Sketch (Least-Squares Estimator: A Motivating Example · Second-Order Poincaré Inequality · Assumptions · Main Results)
Least-Squares Estimator
The L₂ loss ρ(x) = x²/2 gives the least-squares estimator
β̂^LS = (X^T X)^{-1} X^T Y = β^* + (X^T X)^{-1} X^T ɛ.
Let e_j denote the j-th canonical basis vector of R^p; then
β̂^LS_j − β^*_j = e_j^T (X^T X)^{-1} X^T ɛ.
Writing α_j^T = e_j^T (X^T X)^{-1} X^T, we get
β̂^LS_j − β^*_j = Σ_{i=1}^n α_{j,i} ɛ_i.
Least-Squares Estimator
The Lindeberg-Feller CLT implies that, in order for
L( (β̂^LS_j − β^*_j) / √Var(β̂^LS_j) ) → N(0, 1),
it is sufficient and almost necessary that
‖α_j‖_∞ / ‖α_j‖₂ → 0. (1)
Least-Squares Estimator
To see the necessity of the condition, recall the one-way ANOVA case. Let n_j = |{i : k_i = j}|; then X^T X = diag(n_j)_{j=1}^p. This gives
α_{j,i} = 1/n_j if k_i = j, and α_{j,i} = 0 if k_i ≠ j.
As a result, ‖α_j‖_∞ = 1/n_j and ‖α_j‖₂ = 1/√n_j, and hence
‖α_j‖_∞ / ‖α_j‖₂ = 1/√n_j.
However, in the moderate p/n regime there exists j such that n_j ≤ n/p → 1/κ < ∞, and thus β̂^LS_j is not asymptotically normal.
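The ratio ‖α_j‖_∞/‖α_j‖₂ is easy to check numerically. A small illustration (dimensions chosen arbitrarily, not from the talk) contrasting a balanced one-way ANOVA design with an i.i.d. Gaussian design:

```python
import numpy as np

def max_lindeberg_ratio(X):
    """max_j ||alpha_j||_inf / ||alpha_j||_2, where alpha_j^T = e_j^T (X^T X)^{-1} X^T."""
    A = np.linalg.solve(X.T @ X, X.T)        # the j-th row of A is alpha_j^T
    return float(np.max(np.abs(A).max(axis=1) / np.linalg.norm(A, axis=1)))

n, p = 1000, 500                             # moderate regime, kappa = 1/2
labels = np.repeat(np.arange(p), n // p)     # balanced one-way ANOVA, n_j = 2
X_anova = np.eye(p)[labels]                  # X[i, j] = 1{k_i = j}
X_gauss = np.random.default_rng(0).standard_normal((n, p))

ratio_anova = max_lindeberg_ratio(X_anova)   # equals 1/sqrt(n_j) = 1/sqrt(2)
ratio_gauss = max_lindeberg_ratio(X_gauss)   # much smaller for this design
```

The ANOVA ratio stays bounded away from zero, matching the failure of coordinate-wise normality, while the Gaussian design's ratio is small.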
M-Estimator
The result for the LSE is derived from the analytical form of β̂^LS. In contrast, no analytical form is available for general ρ.
With ψ = ρ′, β̂ is the solution of
(1/n) Σ_{i=1}^n ψ(y_i − x_i^T β̂) x_i = 0.
WLOG assume β^* = 0; then
(1/n) Σ_{i=1}^n ψ(ɛ_i − x_i^T β̂) x_i = 0.
M-Estimator
Write R_i = ɛ_i − x_i^T β̂ and define D, D̃ and G as
D = diag(ψ′(R_i)), D̃ = diag(ψ″(R_i)), G = I − X (X^T D X)^{-1} X^T D.
Lemma 2. Suppose ψ ∈ C²(R); then
∂β̂_j / ∂ɛ^T = e_j^T (X^T D X)^{-1} X^T D, (2)
∂²β̂_j / ∂ɛ ∂ɛ^T = G^T diag( e_j^T (X^T D X)^{-1} X^T D̃ ) G. (3)
Second-Order Poincaré Inequality
β̂_j is a smooth transform of a random vector ɛ with independent entries. A powerful CLT for this type of statistic is the second-order Poincaré inequality (Chatterjee, 2009).
Definition 3. For c₁, c₂ > 0, let L(c₁, c₂) be the class of probability measures on R that arise as laws of random variables u(W), where W ~ N(0, 1) and u ∈ C²(R) with |u′| ≤ c₁ and |u″| ≤ c₂.
For example, u = Id gives N(0, 1) and u = Φ gives U([0, 1]).
Second-Order Poincaré Inequality
Proposition 1 (SOPI; Chatterjee, 2009). Let W = (W₁, ..., W_n) have independent entries with W_i ~ L(c₁, c₂). Take any g ∈ C²(R^n), let U = g(W), and set
κ₀ = ( Σ_{i=1}^n E |∂_i g(W)|⁴ )^{1/2},
κ₁ = ( E ‖∇g(W)‖₂⁴ )^{1/4}, κ₂ = ( E ‖∇²g(W)‖_op⁴ )^{1/4}.
If U has a finite fourth moment, then
d_TV( L( (U − EU) / √Var(U) ), N(0, 1) ) ≲_{c₁,c₂} (κ₀ + κ₁ κ₂) / Var(U),
where ≲_{c₁,c₂} hides a constant depending only on c₁ and c₂.
Assumptions
Assume that:
A1: ρ(0) = ψ(0) = 0 and, for any x ∈ R, 0 < K₀ ≤ ψ′(x) ≤ K₁ and |ψ″(x)| ≤ K₂;
A2: ɛ has independent entries with ɛ_i ~ L(c₁, c₂);
A3: the largest and smallest eigenvalues λ₊ and λ₋ of X^T X / n satisfy λ₊ = O(1), λ₋ = Ω(1).
Second-Order Poincaré Inequality on β̂_j
Applying the second-order Poincaré inequality to β̂_j, we obtain:
Lemma 4. Let D = diag(ψ′(ɛ_i − x_i^T β̂))_{i=1}^n and
M_j = E ‖ e_j^T (X^T D X)^{-1} X^T D ‖²_∞.
Then under assumptions A1-A3,
max_j d_TV( L( (β̂_j − E β̂_j) / √Var(β̂_j) ), N(0, 1) ) = O( (max_j n M_j)^{1/8} · √(p/n) / (n min_j Var(β̂_j)) ).
The main result follows if we prove
M_j = o(1/n), Var(β̂_j) = Ω(1/n).
Further Assumptions
Define the following quantities:
the leave-one-predictor-out estimate β̂_[j]: the M-estimator obtained by removing the j-th column of X (El Karoui, 2013);
the leave-one-predictor-out residuals r_{i,[j]} = ɛ_i − x_{i,[j]}^T β̂_[j], where x_{i,[j]}^T is the i-th row of X after removing the j-th entry;
h_{j,0} = (ψ(r_{1,[j]}), ..., ψ(r_{n,[j]}))^T;
Q_j = Cov(h_{j,0}), the covariance matrix of the ψ(r_{i,[j]}).
Further Assumptions
Besides assumptions A1-A3, we assume:
A4: min_j X_j^T Q_j X_j / tr(Q_j) = Ω(1).
Note that Q_j does not involve X_j. Assumption A4 guarantees Var(β̂_j) = Ω(1/n).
Further Assumptions
If X_j is a realization of a random vector Z_j with i.i.d. entries, then
E Z_j^T Q_j Z_j = tr( E Z_j Z_j^T Q_j ) = E Z²_{1,j} · tr(Q_j).
If Z_j^T Q_j Z_j concentrates around its mean, then
Z_j^T Q_j Z_j / tr(Q_j) ≈ E Z²_{1,j} > 0.
For example, when Z_j has i.i.d. sub-Gaussian entries, the Hanson-Wright inequality implies the concentration:
P( |Z_j^T Q_j Z_j − E Z_j^T Q_j Z_j| ≥ t ) ≤ 2 exp( −c min{ t² / ‖Q_j‖²_F, t / ‖Q_j‖_op } ).
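This concentration is easy to see numerically. A toy check, with an arbitrary PSD matrix standing in for Q_j (purely illustrative; the talk's Q_j is built from the leave-one-predictor-out residuals):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 1000
M = rng.standard_normal((p, p))
Q = M @ M.T / p                       # an arbitrary PSD stand-in for Q_j
Z = rng.standard_normal(p)            # i.i.d. sub-Gaussian entries, E Z_1^2 = 1
ratio = float(Z @ Q @ Z / np.trace(Q))
# Hanson-Wright predicts ratio close to E Z_1^2 = 1, with fluctuations
# shrinking as p grows (here on the order of 1/sqrt(p)).
```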
Further Assumptions
To describe the last assumption, we define:
D_[j] = diag(ψ′(r_{i,[j]})): the leave-one-predictor-out version of D;
G_[j] = I − X_[j] (X_[j]^T D_[j] X_[j])^{-1} X_[j]^T D_[j];
h_{j,1,i}^T = e_i^T G_[j]: the i-th row of G_[j];
C = max{ max_j |h_{j,0}^T X_j| / ‖h_{j,0}‖₂, max_{i,j} |h_{j,1,i}^T X_j| / ‖h_{j,1,i}‖₂ }.
Further Assumptions
The last assumption:
A5: (E C⁸)^{1/8} = O(polyLog(n)).
It turns out that when ρ(x) = x²/2,
C ≍ max_j ‖e_j^T (X^T X)^{-1} X^T‖_∞ / ‖e_j^T (X^T X)^{-1} X^T‖₂.
Recall that for least squares the β̂_j are all asymptotically normal iff the right-hand side tends to 0. This indicates that assumption A5 is not just an artifact of the proof.
Further Assumptions
Let α_{j,0} = h_{j,0} / ‖h_{j,0}‖₂ and α_{j,1,i} = h_{j,1,i} / ‖h_{j,1,i}‖₂. Again, if X_j is a realization of a random vector Z_j with i.i.d. σ²-sub-Gaussian entries, then the α_{j,0}^T Z_j and α_{j,1,i}^T Z_j are all σ²-sub-Gaussian.
Then C is the maximum of np + p sub-Gaussian random variables, and hence (E C⁸)^{1/8} = O(polyLog(n)).
Review of All Assumptions
A1: ρ(0) = ψ(0) = 0 and, for any x ∈ R, 0 < K₀ ≤ ψ′(x) ≤ K₁ and |ψ″(x)| ≤ K₂;
A2: ɛ has independent entries with ɛ_i ~ L(c₁, c₂);
A3: the largest and smallest eigenvalues λ₊ and λ₋ of X^T X / n satisfy λ₊ = O(1), λ₋ = Ω(1);
A4: min_j Z_j^T Q_j Z_j / tr(Q_j) = Ω(1);
A5: (E C⁸)^{1/8} = O(polyLog(n)).
Main Results
Theorem 5. Under assumptions A1-A5, as p/n → κ for some κ ∈ (0, 1) while n → ∞,
max_j d_TV( L( (β̂_j − E β̂_j) / √Var(β̂_j) ), N(0, 1) ) = o(1).
A Corollary
If we further assume
A6: ρ is an even function and ɛ_i =d −ɛ_i,
then one can show that β̂ is unbiased. As a consequence:
Theorem 6. Under assumptions A1-A6, as p/n → κ for some κ ∈ (0, 1) while n → ∞,
max_j d_TV( L( (β̂_j − β^*_j) / √Var(β̂_j) ), N(0, 1) ) = o(1).
4 Numerical Results
Setup
Design matrix X:
i.i.d. design: X_ij i.i.d. ~ F;
partial Hadamard design: a matrix formed by a random set of p columns of an n × n Hadamard matrix.
Entry distribution F: F = N(0, 1) or F = t₂.
Error distribution L(ɛ): the ɛ_i are i.i.d. with ɛ_i ~ N(0, 1) or ɛ_i ~ t₂.
Setup
Sample size n: {100, 200, 400, 800};
κ = p/n: {0.5, 0.8};
Loss function ρ: Huber loss with k = 1.345,
ρ(x) = x²/2 if |x| ≤ k, and ρ(x) = k|x| − k²/2 if |x| > k.
Asymptotic Normality of a Single Coordinate
For each set of parameters, we run 50 simulations, each consisting of the following steps:
Step 1: generate one design matrix X;
Step 2: generate 300 error vectors ɛ;
Step 3: regress each Y = ɛ on the design matrix X, yielding 300 random samples of β̂_1, denoted by β̂_1^(1), ..., β̂_1^(300);
Step 4: estimate the standard deviation of β̂_1 by the sample standard error ŝd;
Step 5: construct a confidence interval I^(k) = [β̂_1^(k) − 1.96 ŝd, β̂_1^(k) + 1.96 ŝd] for each k = 1, ..., 300;
Step 6: calculate the empirical 95% coverage as the proportion of confidence intervals that cover the true β^*_1 = 0.
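Steps 1-6 can be sketched as below. Everything here is an illustrative assumption rather than the talk's code: an IRLS Huber solver, n = 100, κ = 0.5, an i.i.d. normal design, normal errors, and the usual 1.96 z-interval:

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=100, tol=1e-8):
    """Huber M-estimate via iteratively reweighted least squares (a sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.ones_like(r)
        large = np.abs(r) > k
        w[large] = k / np.abs(r[large])
        XW = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ XW, XW.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(2)
n, p, reps = 100, 50, 300
X = rng.standard_normal((n, p))                          # Step 1: one fixed design
b1 = np.array([huber_irls(X, rng.standard_normal(n))[0]  # Steps 2-3: Y = eps, beta* = 0
               for _ in range(reps)])
sd_hat = b1.std(ddof=1)                                  # Step 4: sample sd across reps
coverage = float(np.mean(np.abs(b1) <= 1.96 * sd_hat))   # Steps 5-6: cover beta*_1 = 0?
```

When the coordinate is asymptotically normal, coverage should land near the nominal 0.95.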
Asymptotic Normality of a Single Coordinate
[Figure: empirical coverage of the nominal-95% confidence intervals for β^*_1 at κ = 0.5 and κ = 0.8, in panels by error distribution (normal, t(2)) and design type (i.i.d., Hadamard), plotted against sample size and entry distribution (normal, t(2), Hadamard).]
Conclusion
We establish the coordinate-wise asymptotic normality of M-estimators for certain fixed design matrices in the moderate p/n regime, under regularity conditions on X, L(ɛ) and ρ but no conditions on β^*;
we prove the result via the novel approach of the second-order Poincaré inequality (Chatterjee, 2009);
we show that the regularity conditions are satisfied by a broad class of designs.
Future Works
For this project:
estimate Var(β̂_j);
relax the assumptions on L(ɛ);
relax the strong convexity of ρ;
extend the results to GLMs.
For my dissertation:
distributional properties in high dimensions;
resampling methods in high dimensions.
Thank You!
References
Bean, D., Bickel, P. J., El Karoui, N., & Yu, B. (2013). Optimal M-estimation in high-dimensional regression. Proceedings of the National Academy of Sciences, 110(36).
Bickel, P. J., & Freedman, D. A. (1982). Bootstrapping regression models with many parameters. In Festschrift for Erich L. Lehmann.
Chatterjee, S. (2009). Fluctuations of eigenvalues and second order Poincaré inequalities. Probability Theory and Related Fields, 143(1-2).
El Karoui, N. (2013). Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results. arXiv preprint.
El Karoui, N. (2015). On the impact of predictor geometry on the performance of high-dimensional ridge-regularized generalized robust regression estimators.
El Karoui, N., Bean, D., Bickel, P. J., Lim, C., & Yu, B. (2011). On robust regression with high-dimensional predictors. Proceedings of the National Academy of Sciences, 110(36).
Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics.
Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. The Annals of Statistics.
Portnoy, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. II. Normal approximation. The Annals of Statistics.
Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step
More informationSupplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data
Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department
More informationAssessing the dependence of high-dimensional time series via sample autocovariances and correlations
Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Johannes Heiny University of Aarhus Joint work with Thomas Mikosch (Copenhagen), Richard Davis (Columbia),
More informationRobust high-dimensional linear regression: A statistical perspective
Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,
More informationLecture 20: Linear model, the LSE, and UMVUE
Lecture 20: Linear model, the LSE, and UMVUE Linear Models One of the most useful statistical models is X i = β τ Z i + ε i, i = 1,...,n, where X i is the ith observation and is often called the ith response;
More informationA Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models
A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationQuantile Processes for Semi and Nonparametric Regression
Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationUniversity of California San Diego and Stanford University and
First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford
More informationLecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf
Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under
More informationDISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania
Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the
More informationEstimation of large dimensional sparse covariance matrices
Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)
More informationFluctuations from the Semicircle Law Lecture 4
Fluctuations from the Semicircle Law Lecture 4 Ioana Dumitriu University of Washington Women and Math, IAS 2014 May 23, 2014 Ioana Dumitriu (UW) Fluctuations from the Semicircle Law Lecture 4 May 23, 2014
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationLarge sample distribution for fully functional periodicity tests
Large sample distribution for fully functional periodicity tests Siegfried Hörmann Institute for Statistics Graz University of Technology Based on joint work with Piotr Kokoszka (Colorado State) and Gilles
More informationBootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.
Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling
More informationA Resampling Method on Pivotal Estimating Functions
A Resampling Method on Pivotal Estimating Functions Kun Nie Biostat 277,Winter 2004 March 17, 2004 Outline Introduction A General Resampling Method Examples - Quantile Regression -Rank Regression -Simulation
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationWhy Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory
Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Andreas Buja joint with the PoSI Group: Richard Berk, Lawrence Brown, Linda Zhao, Kai Zhang Ed George, Mikhail Traskin, Emil Pitkin,
More informationSOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS
SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary
More informationThe largest eigenvalues of the sample covariance matrix. in the heavy-tail case
The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)
More informationSub-Gaussian estimators under heavy tails
Sub-Gaussian estimators under heavy tails Roberto Imbuzeiro Oliveira XIX Escola Brasileira de Probabilidade Maresias, August 6th 2015 Joint with Luc Devroye (McGill) Matthieu Lerasle (CNRS/Nice) Gábor
More informationSTAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
More informationLearning gradients: prescriptive models
Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan
More informationQuantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation
Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based
More informationBTRY 4090: Spring 2009 Theory of Statistics
BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)
More informationRegularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008
Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)
More informationUncertainty Quantification for Inverse Problems. November 7, 2011
Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance
More informationA Note on Auxiliary Particle Filters
A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,
More informationThe Central Limit Theorem: More of the Story
The Central Limit Theorem: More of the Story Steven Janke November 2015 Steven Janke (Seminar) The Central Limit Theorem:More of the Story November 2015 1 / 33 Central Limit Theorem Theorem (Central Limit
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationLecture I: Asymptotics for large GUE random matrices
Lecture I: Asymptotics for large GUE random matrices Steen Thorbjørnsen, University of Aarhus andom Matrices Definition. Let (Ω, F, P) be a probability space and let n be a positive integer. Then a random
More informationBootstrapping high dimensional vector: interplay between dependence and dimensionality
Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang
More informationLawrence D. Brown* and Daniel McCarthy*
Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals
More informationA Comparison of Robust Estimators Based on Two Types of Trimming
Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationBootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator
Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos
More informationLinear models. Linear models are computationally convenient and remain widely used in. applied econometric research
Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y
More informationSample Size Requirement For Some Low-Dimensional Estimation Problems
Sample Size Requirement For Some Low-Dimensional Estimation Problems Cun-Hui Zhang, Rutgers University September 10, 2013 SAMSI Thanks for the invitation! Acknowledgements/References Sun, T. and Zhang,
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationProgram Evaluation with High-Dimensional Data
Program Evaluation with High-Dimensional Data Alexandre Belloni Duke Victor Chernozhukov MIT Iván Fernández-Val BU Christian Hansen Booth ESWC 215 August 17, 215 Introduction Goal is to perform inference
More information36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1
36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationOptimization Problems with Probabilistic Constraints
Optimization Problems with Probabilistic Constraints R. Henrion Weierstrass Institute Berlin 10 th International Conference on Stochastic Programming University of Arizona, Tucson Recommended Reading A.
More informationComposite Loss Functions and Multivariate Regression; Sparse PCA
Composite Loss Functions and Multivariate Regression; Sparse PCA G. Obozinski, B. Taskar, and M. I. Jordan (2009). Joint covariate selection and joint subspace selection for multiple classification problems.
More informationModel Mis-specification
Model Mis-specification Carlo Favero Favero () Model Mis-specification 1 / 28 Model Mis-specification Each specification can be interpreted of the result of a reduction process, what happens if the reduction
More informationIf g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get
18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationA Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices
A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices Natalia Bailey 1 M. Hashem Pesaran 2 L. Vanessa Smith 3 1 Department of Econometrics & Business Statistics, Monash
More informationA Conditional Approach to Modeling Multivariate Extremes
A Approach to ing Multivariate Extremes By Heffernan & Tawn Department of Statistics Purdue University s April 30, 2014 Outline s s Multivariate Extremes s A central aim of multivariate extremes is trying
More informationRandom Matrix Theory and its Applications to Econometrics
Random Matrix Theory and its Applications to Econometrics Hyungsik Roger Moon University of Southern California Conference to Celebrate Peter Phillips 40 Years at Yale, October 2018 Spectral Analysis of
More informationIV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade
IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September
More informationRobustní monitorování stability v modelu CAPM
Robustní monitorování stability v modelu CAPM Ondřej Chochola, Marie Hušková, Zuzana Prášková (MFF UK) Josef Steinebach (University of Cologne) ROBUST 2012, Němčičky, 10.-14.9. 2012 Contents Introduction
More informationEXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2
EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME Xavier Mestre, Pascal Vallet 2 Centre Tecnològic de Telecomunicacions de Catalunya, Castelldefels, Barcelona (Spain) 2 Institut
More informationTHE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES. By Sara van de Geer and Johannes Lederer. ETH Zürich
Submitted to the Annals of Applied Statistics arxiv: math.pr/0000000 THE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES By Sara van de Geer and Johannes Lederer ETH Zürich We study high-dimensional
More informationBickel Rosenblatt test
University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance
More informationBayesian spatial quantile regression
Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse
More informationStochastic process for macro
Stochastic process for macro Tianxiao Zheng SAIF 1. Stochastic process The state of a system {X t } evolves probabilistically in time. The joint probability distribution is given by Pr(X t1, t 1 ; X t2,
More informationConcentration Inequalities for Random Matrices
Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationThe deterministic Lasso
The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality
More informationConcentration behavior of the penalized least squares estimator
Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar
More informationHard-Core Model on Random Graphs
Hard-Core Model on Random Graphs Antar Bandyopadhyay Theoretical Statistics and Mathematics Unit Seminar Theoretical Statistics and Mathematics Unit Indian Statistical Institute, New Delhi Centre New Delhi,
More information