Inference For High Dimensional M-estimates: Fixed Design Results
Lihua Lei, Peter Bickel and Noureddine El Karoui
Department of Statistics, UC Berkeley
Berkeley-Stanford Econometrics Jamboree
Table of Contents: Background; Main Results; Heuristics and Proof Techniques; Numerical Results
Setup
Consider a linear model: $Y = X\beta^* + \epsilon$.
  $y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$: response vector;
  $X = (x_1^T, \ldots, x_n^T)^T \in \mathbb{R}^{n \times p}$: design matrix;
  $\beta^* = (\beta_1^*, \ldots, \beta_p^*)^T \in \mathbb{R}^p$: coefficient vector;
  $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T \in \mathbb{R}^n$: random unobserved error with independent entries.
M-Estimator
Given a convex loss function $\rho(\cdot): \mathbb{R} \to [0, \infty)$,
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n \rho(y_i - x_i^T \beta).$$
When $\rho$ is differentiable with $\psi = \rho'$, $\hat\beta$ can be written as the solution of
$$\frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \hat\beta) = 0.$$
M-Estimator: Examples
  $\rho(x) = x^2/2$ gives the least-squares estimator;
  $\rho(x) = |x|$ gives the least-absolute-deviation estimator;
  $\rho(x) = x^2/2$ for $|x| \le k$ and $\rho(x) = k(|x| - k/2)$ for $|x| > k$ gives the Huber estimator.
[Figure: $\rho(x)$ and $\psi(x)$ for the $L_2$, $L_1$ and Huber losses.]
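As a concrete illustration (not part of the original slides), the M-estimate for the Huber loss can be computed numerically. This is a minimal sketch assuming numpy and scipy; the helper names `huber_loss` and `m_estimate` are ours:

```python
import numpy as np
from scipy.optimize import minimize

def huber_loss(r, k=1.345):
    """Huber's rho: quadratic near zero, linear in the tails."""
    return np.where(np.abs(r) <= k, 0.5 * r**2, k * (np.abs(r) - 0.5 * k))

def m_estimate(X, y, k=1.345):
    """Minimize (1/n) * sum_i rho(y_i - x_i^T beta) over beta."""
    obj = lambda b: huber_loss(y - X @ b, k).mean()
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares warm start
    return minimize(obj, b0, method="BFGS").x

rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
y = X @ beta_star + rng.standard_t(df=2, size=n)  # heavy-tailed errors
beta_hat = m_estimate(X, y)
```

For the smooth, strongly convex losses considered later in the talk, any off-the-shelf convex optimizer finds the same minimizer.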
Goals (Informal)
Make inference on the coordinates of $\beta^*$ when
  $X$ is treated as fixed;
  no assumption is imposed on $\beta^*$; and
  the dimension $p$ is comparable to the sample size $n$.
Why coordinates? Why fixed designs? Why assumption-free $\beta^*$? Why $p \asymp n$?
Asymptotic Arguments: Motivation
Consider $\beta_1^*$ WLOG.
Ideally, we would construct a 95% confidence interval for $\beta_1^*$ as $[q_{0.025}(\mathcal{L}(\hat\beta_1)),\, q_{0.975}(\mathcal{L}(\hat\beta_1))]$, where $q_\alpha$ denotes the $\alpha$-th quantile.
Unfortunately, $\mathcal{L}(\hat\beta_1)$ is unknown.
This motivates the asymptotic argument: find a distribution $F$ s.t. $\mathcal{L}(\hat\beta_1) \approx F$.
Asymptotic Arguments: Textbook Version
The limiting behavior of $\hat\beta$ when $p$ is fixed, as $n \to \infty$:
$$\mathcal{L}(\hat\beta) \approx N\left(\beta^*,\; (X^T X)^{-1} \frac{E\psi^2(\epsilon_1)}{[E\psi'(\epsilon_1)]^2}\right).$$
As a consequence, we obtain an approximate 95% confidence interval for $\beta_1^*$,
$$[\hat\beta_1 - 1.96\,\widehat{sd}(\hat\beta_1),\; \hat\beta_1 + 1.96\,\widehat{sd}(\hat\beta_1)],$$
where $\widehat{sd}(\hat\beta_1)$ could be any consistent estimator of the standard deviation.
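The textbook recipe above can be sketched in code. This is an illustrative plug-in version (our own helper names, not from the talk), estimating $E\psi^2$ and $E\psi'$ from residuals of an IRLS Huber fit:

```python
import numpy as np

def huber_psi(r, k=1.345):
    return np.clip(r, -k, k)                  # psi = rho' for the Huber loss

def huber_fit(X, y, k=1.345, iters=50):
    # IRLS: iteratively reweighted least squares with weights psi(r)/r
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        w = np.where(np.abs(r) <= k, 1.0, k / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def classical_ci(X, y, beta_hat, j=0, k=1.345, z=1.96):
    # Plug-in version of the fixed-p formula:
    # Var(beta_hat) ~ (X^T X)^{-1} * E psi^2(eps) / [E psi'(eps)]^2
    r = y - X @ beta_hat
    num = np.mean(huber_psi(r, k) ** 2)
    den = np.mean((np.abs(r) <= k).astype(float)) ** 2   # psi'(r) is 0 or 1
    sd = np.sqrt(num / den * np.linalg.inv(X.T @ X)[j, j])
    return beta_hat[j] - z * sd, beta_hat[j] + z * sd

rng = np.random.default_rng(0)
n, p = 200, 10                      # classical regime: p << n
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)          # beta_star = 0
beta_hat = huber_fit(X, y)
lo, hi = classical_ci(X, y, beta_hat)
```

The point of the talk is precisely that this plug-in recipe is justified only when $p/n \to 0$.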
Asymptotic Arguments: Hypothetical Problems
Original problem ($n = 100$, $p = 30$): $y = X\beta^* + \epsilon$, estimate $\hat\beta_1$.
Hypothetical problems with $p$ fixed and growing sample sizes: $(n_1, p_1) = (200, 30)$ giving $\hat\beta_1^{(1)}$; $(n_2, p_2) = (500, 30)$ giving $\hat\beta_1^{(2)}$; $(n_3, p_3) = (2000, 30)$ giving $\hat\beta_1^{(3)}$; and so on.
Asymptotic argument: use $\lim_{j \to \infty} \mathcal{L}(\hat\beta_1^{(j)})$ to approximate $\mathcal{L}(\hat\beta_1)$.
Asymptotic Arguments
Huber [1973] raised the question of understanding the behavior of $\hat\beta$ when both $n$ and $p$ tend to infinity;
Huber [1973] showed the $L_2$ consistency of $\hat\beta$, i.e. $\|\hat\beta - \beta^*\|_2^2 \to 0$, when $p = o(n^{1/3})$;
Portnoy [1984] proved the $L_2$ consistency of $\hat\beta$ when $p = o(n / \log n)$.
Asymptotic Arguments
Portnoy [1985] and Mammen [1989] showed that $\hat\beta$ is jointly asymptotically normal when $p \ll n^{2/3}$, in the sense that for any sequence of vectors $a_n \in \mathbb{R}^p$,
$$\mathcal{L}\left(\frac{a_n^T(\hat\beta - \beta^*)}{\sqrt{\mathrm{Var}(a_n^T \hat\beta)}}\right) \to N(0, 1).$$
p/n: A Measure of Difficulty
All of the above works require $p/n \to 0$, or equivalently $n/p \to \infty$.
  $n/p$ is the number of samples per parameter;
  classical rule of thumb: $n/p \ge 5$-$10$;
  heuristically, a larger $n/p$ gives an easier problem;
  hypothetical problems with $n_j / p_j \to \infty$ are not appropriate, because they are increasingly easier than the original problem.
Moderate p/n Regime
Formally, we define the moderate p/n regime as $p/n \to \kappa > 0$.
Original problem ($n = 100$, $p = 30$): $y = X\beta^* + \epsilon$, estimate $\hat\beta_1$.
Hypothetical problems with $p_j / n_j$ fixed: $(n_1, p_1) = (200, 60)$ giving $\hat\beta_1^{(1)}$; $(n_2, p_2) = (500, 150)$ giving $\hat\beta_1^{(2)}$; $(n_3, p_3) = (2000, 600)$ giving $\hat\beta_1^{(3)}$; and so on.
Moderate p/n Regime: More Informative Asymptotics
A simulation to compare the fixed-p regime and the moderate p/n regime.
Original problem: $n = 50$, $p = 50\kappa$, Huber loss, i.i.d. $\epsilon_i$'s.
Holding $X$ fixed, generate replicated responses $y_j = X\beta^* + \epsilon_j$ for $j = 1, \ldots, r$ and compute the M-estimates $\hat\beta_1^{(1)}, \hat\beta_1^{(2)}, \ldots, \hat\beta_1^{(r)}$.
$\Longrightarrow \hat{\mathcal{L}}(\hat\beta_1; X) = \mathrm{ecdf}(\{\hat\beta_1^{(1)}, \ldots, \hat\beta_1^{(r)}\})$.
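The replication scheme above is straightforward to emulate. A minimal sketch (our own code, assuming numpy; `huber_fit` is a standard IRLS routine, not the authors' implementation):

```python
import numpy as np

def huber_fit(X, y, k=1.345, iters=25):
    # IRLS for the Huber M-estimator: weighted LS with weights psi(r)/r
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        w = np.where(np.abs(r) <= k, 1.0, k / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(1)
n, kappa, reps = 50, 0.5, 200
p = int(n * kappa)
X = rng.standard_normal((n, p))        # one fixed design, reused below
beta_star = np.zeros(p)
draws = np.array([huber_fit(X, X @ beta_star + rng.standard_t(2, n))[0]
                  for _ in range(reps)])   # replicated beta_hat_1 values

def ecdf(samples):
    xs = np.sort(samples)
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)

L_hat = ecdf(draws)    # stands in for the conditional law of beta_hat_1 given X
```

The same loop with larger $(n, p)$ produces the fixed-p and moderate-p/n approximations of the next slides.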
Moderate p/n Regime: More Informative Asymptotics
Fixed-p approximation: $n = 1000$, $p = 50\kappa$.
Same scheme: $y_j = X\beta^* + \epsilon_j$, M-estimates $\hat\beta_1^{(F,1)}, \ldots, \hat\beta_1^{(F,r)}$.
$\Longrightarrow \hat{\mathcal{L}}(\hat\beta_1^F; X) = \mathrm{ecdf}(\{\hat\beta_1^{(F,1)}, \ldots, \hat\beta_1^{(F,r)}\})$.
Moderate p/n Regime: More Informative Asymptotics
Moderate-p/n approximation: $n = 1000$, $p = 1000\kappa$.
Same scheme: $y_j = X\beta^* + \epsilon_j$, M-estimates $\hat\beta_1^{(M,1)}, \ldots, \hat\beta_1^{(M,r)}$.
$\Longrightarrow \hat{\mathcal{L}}(\hat\beta_1^M; X) = \mathrm{ecdf}(\{\hat\beta_1^{(M,1)}, \ldots, \hat\beta_1^{(M,r)}\})$.
Moderate p/n Regime: More Informative Asymptotics
Measure the accuracy of the two approximations by the Kolmogorov-Smirnov statistics
$$d_{KS}\left(\hat{\mathcal{L}}(\hat\beta_1), \hat{\mathcal{L}}(\hat\beta_1^F)\right) \quad \text{and} \quad d_{KS}\left(\hat{\mathcal{L}}(\hat\beta_1), \hat{\mathcal{L}}(\hat\beta_1^M)\right).$$
[Figure: KS distance between the small-sample and large-sample distributions as a function of $\kappa$, for normal and $t(2)$ errors, comparing the p-fixed and p/n-fixed asymptotic regimes.]
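The KS statistic between two empirical distributions is easy to compute directly. A small sketch (our own helper; `scipy.stats.ks_2samp` computes the same statistic):

```python
import numpy as np

def ks_distance(a, b):
    """Kolmogorov-Smirnov distance between the ecdfs of two samples."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(Fa - Fb).max()

rng = np.random.default_rng(2)
normal_draws = rng.standard_normal(500)
heavy_draws = rng.standard_t(2, 500)
d = ks_distance(normal_draws, heavy_draws)
```

In the simulation of the slides, `a` and `b` would be the two sets of replicated $\hat\beta_1$ values rather than raw error draws.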
Moderate p/n Regime: Negative Results
The moderate p/n regime in statistics:
  Huber [1973] showed that for least-squares estimators there always exists a sequence of vectors $a_n \in \mathbb{R}^p$ such that
$$\mathcal{L}\left(\frac{a_n^T(\hat\beta_{LS} - \beta^*)}{\sqrt{\mathrm{Var}(a_n^T \hat\beta_{LS})}}\right) \not\to N(0, 1);$$
  Bickel and Freedman [1982] showed that the bootstrap fails in the least-squares case and the usual rescaling does not help;
  El Karoui et al. [2011] showed that for general loss functions, $\|\hat\beta - \beta^*\|_2 \not\to 0$;
  El Karoui and Purdom [2015] showed that most widely used resampling schemes give poor inference on $\beta_1^*$.
Moderate p/n Regime: Reason of Failure
Qualitatively,
  an influential observation always exists [Huber, 1973]: letting $H = X(X^T X)^{-1} X^T$ be the hat matrix,
$$\max_{1 \le i \le n} H_{i,i} \ge \frac{\mathrm{tr}(H)}{n} = \frac{p}{n} \gg 0;$$
  regression residuals fail to mimic the true errors: $R_i \equiv y_i - x_i^T \hat\beta \not\approx \epsilon_i$.
Technically, the Taylor expansion / Bahadur-type representation fails!
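The leverage bound above is elementary to check numerically: the diagonal of $H$ sums to $p$, so its maximum is at least the average $p/n$. A quick illustration (our own code, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 30
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
lev = np.diag(H)                         # leverage scores H_ii
# trace(H) = p for a full-rank design, so mean leverage is exactly p/n
# and the maximum leverage can never fall below p/n.
max_lev, avg_lev = lev.max(), lev.mean()
```

In the classical regime $p/n \to 0$ this lower bound vanishes; in the moderate regime it stays bounded away from zero, so some observation is always influential.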
Moderate p/n Regime: Positive Results (Random Designs)
Bean et al. [2013] showed that when $X$ has i.i.d. Gaussian entries, for any sequence of $a_n \in \mathbb{R}^p$,
$$\mathcal{L}_{X,\epsilon}\left(\frac{a_n^T(\hat\beta - \beta^*)}{\sqrt{\mathrm{Var}_{X,\epsilon}(a_n^T \hat\beta)}}\right) \to N(0, 1);$$
El Karoui [2015] extended it to general random designs.
The above result does not contradict Huber [1973], in that the randomness comes from both $X$ and $\epsilon$; recall also that El Karoui et al. [2011] showed that for general loss functions, $\|\hat\beta - \beta^*\|_2 \not\to 0$.
Moderate p/n Regime: Summary
Provides a more accurate approximation of $\mathcal{L}(\hat\beta_1)$;
Qualitatively different from the classical regimes where $p/n \to 0$:
  $L_2$-consistency of $\hat\beta$ no longer holds;
  the residual $R_i$ behaves differently from $\epsilon_i$;
  fixed design results are different from random design results.
Inference on the vector $\hat\beta$ is hard, but inference on the coordinates / low-dimensional linear contrasts of $\hat\beta$ is still possible.
Goals (Formal)
Our goal (formal): under the linear model $Y = X\beta^* + \epsilon$, derive the asymptotic distribution of the coordinates $\hat\beta_j$:
  under the moderate p/n regime, i.e. $p/n \to \kappa \in (0, 1)$;
  with a fixed design matrix $X$;
  without assumptions on $\beta^*$.
Main Result (Informal)
Definition 1. Let $P$ and $Q$ be two distributions on $\mathbb{R}^p$; then $d_{TV}(P, Q) = \sup_{A \subset \mathbb{R}^p} |P(A) - Q(A)|$.
Theorem. Under appropriate conditions on the design matrix $X$, the distribution of $\epsilon$ and the loss function $\rho$, as $p/n \to \kappa \in (0, 1)$ while $n \to \infty$,
$$\max_j d_{TV}\left(\mathcal{L}\left(\frac{\hat\beta_j - E\hat\beta_j}{\sqrt{\mathrm{Var}(\hat\beta_j)}}\right), N(0, 1)\right) = o(1).$$
Main Result (Informal)
If $\rho$ is an even function and $\epsilon \stackrel{d}{=} -\epsilon$, then $\hat\beta - \beta^* \stackrel{d}{=} \beta^* - \hat\beta$, and hence $E\hat\beta = \beta^*$.
Theorem. Under appropriate conditions on the design matrix $X$, the distribution of $\epsilon$ and the loss function $\rho$, as $p/n \to \kappa \in (0, 1)$ while $n \to \infty$,
$$\max_j d_{TV}\left(\mathcal{L}\left(\frac{\hat\beta_j - \beta_j^*}{\sqrt{\mathrm{Var}(\hat\beta_j)}}\right), N(0, 1)\right) = o(1).$$
Why Surprising?
Classical approaches heavily rely on
  the $L_2$ consistency of $\hat\beta$, which only holds when $p = o(n)$;
  a Bahadur-type representation for $\hat\beta$,
$$\sqrt{n}(\hat\beta - \beta^*) = \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i + o_p(1)$$
for some i.i.d. random variables $Z_i$'s, which can be proved only when $p = o(n^{2/3})$.
Question: what happens when $p \in [O(n^{2/3}), O(n)]$?
Our Contributions and Limitations
Instead, we develop a novel strategy built on
  the leave-one-out method [El Karoui et al., 2011]; and
  the Second-Order Poincaré Inequality [Chatterjee, 2009].
We prove that
  $\hat\beta_1$ is asymptotically normal for all $p \in [O(1), O(n)]$ for fixed designs under regularity conditions;
  the conditions are satisfied by most design matrices.
Limitations:
  we impose strong conditions on $\rho$ and $\mathcal{L}(\epsilon)$;
  we do not know how to estimate $\mathrm{Var}_\epsilon(\hat\beta_1)$.
Examples: Realization of i.i.d. Designs
We consider the case where $X$ is a realization of a random design $Z$. The examples below are proved to satisfy the technical assumptions with high probability over $Z$.
Example 1: $Z$ has i.i.d. mean-zero sub-Gaussian entries with $\mathrm{Var}(Z_{ij}) = \tau^2 > 0$.
Example 2: $Z$ contains an intercept term, i.e. $Z = (1, \tilde{Z})$, and $\tilde{Z} \in \mathbb{R}^{n \times (p-1)}$ has independent sub-Gaussian entries with $\tilde{Z}_{ij} - \mu_j \stackrel{d}{=} \mu_j - \tilde{Z}_{ij}$ and $\mathrm{Var}(\tilde{Z}_{ij}) > \tau^2$, for some arbitrary $\mu_j$'s.
A Counter-Example
Consider a one-way ANOVA situation: each observation $i$ is associated with a label $k_i \in \{1, \ldots, p\}$ and $X_{i,j} = I(j = k_i)$. This is equivalent to $Y_i = \beta^*_{k_i} + \epsilon_i$.
It is easy to see that
$$\hat\beta_j = \arg\min_{\beta_j \in \mathbb{R}} \sum_{i: k_i = j} \rho(Y_i - \beta_j).$$
This is a standard location problem.
A Counter-Example
Let $n_j = |\{i : k_i = j\}|$. In the least-squares case, i.e. $\rho(x) = x^2/2$,
$$\hat\beta_j = \beta_j^* + \frac{1}{n_j} \sum_{i: k_i = j} \epsilon_i.$$
Assume a balanced design, i.e. $n_j \approx n/p$. Then in the moderate p/n regime $n_j$ stays bounded, so
  none of the $\hat\beta_j$ is normal (unless the $\epsilon_i$ are normal);
  the same holds for general loss functions $\rho$.
Conclusion: some non-standard assumptions on $X$ are required.
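The counter-example is easy to see in a simulation (our own sketch, assuming numpy): with a balanced one-way ANOVA design and $n/p$ fixed, each $\hat\beta_j$ is an average of a bounded number of errors, so non-normal features of the errors persist no matter how large $p$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n_per_group = 500, 2            # balanced design with n/p = 2 fixed
# Centered but skewed errors (exponential minus its mean).
eps = rng.exponential(size=(p, n_per_group)) - 1.0
# In the ANOVA design, least squares reduces to a per-group location
# problem: beta_hat_j - beta_j^* is the average of the n_j group errors.
dev = eps.mean(axis=1)             # beta_hat_j - beta_star_j for each group j
# With n_j fixed at 2, no CLT applies: the average of two exponentials is
# still visibly skewed, so beta_hat_j cannot be asymptotically normal.
skew = ((dev - dev.mean()) ** 3).mean() / dev.std() ** 3
```

For a two-point average of centered exponentials the population skewness is $\sqrt{2}$, and the sample skewness across the $p$ groups stays near that value however large $p$ is.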
Table of Contents: Background; Main Results; Heuristics and Proof Techniques (Least-Squares Estimator: A Motivating Example; Second-Order Poincaré Inequality; Assumptions; Main Results); Numerical Results
Least-Squares Estimator
The $L_2$ loss, $\rho(x) = x^2/2$, gives the least-squares estimator
$$\hat\beta_{LS} = (X^T X)^{-1} X^T Y = \beta^* + (X^T X)^{-1} X^T \epsilon.$$
Let $e_j$ denote the $j$-th canonical basis vector in $\mathbb{R}^p$; then
$$\hat\beta_{LS,j} - \beta_j^* = e_j^T (X^T X)^{-1} X^T \epsilon \triangleq \alpha_j^T \epsilon.$$
Least-Squares Estimator
The Lindeberg-Feller CLT implies that in order for
$$\frac{\hat\beta_{LS,j} - \beta_j^*}{\sqrt{\mathrm{Var}(\hat\beta_{LS,j})}} \stackrel{\mathcal{L}}{\to} N(0, 1),$$
it is sufficient and almost necessary that
$$\frac{\|\alpha_j\|_\infty}{\|\alpha_j\|_2} \to 0. \qquad (1)$$
Least-Squares Estimator
To see the necessity of the condition, recall the one-way ANOVA case. Let $n_j = |\{i : k_i = j\}|$; then $X^T X = \mathrm{diag}(n_j)_{j=1}^p$. Recall that $\alpha_j^T = e_j^T (X^T X)^{-1} X^T$. This gives $\alpha_{j,i} = 1/n_j$ if $k_i = j$ and $\alpha_{j,i} = 0$ if $k_i \ne j$.
As a result, $\|\alpha_j\|_\infty = 1/n_j$ and $\|\alpha_j\|_2 = 1/\sqrt{n_j}$, and hence
$$\frac{\|\alpha_j\|_\infty}{\|\alpha_j\|_2} = \frac{1}{\sqrt{n_j}}.$$
However, in the moderate p/n regime there exists $j$ such that $n_j \le n/p \to 1/\kappa < \infty$, and thus $\hat\beta_{LS,j}$ is not asymptotically normal.
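Condition (1) can be checked numerically for any fixed design. The sketch below (our own helper) contrasts a Gaussian design, where the ratio is small, with the balanced ANOVA design, where it equals $1/\sqrt{n_j}$ exactly:

```python
import numpy as np

def max_inf_to_l2_ratio(X):
    """max_j ||alpha_j||_inf / ||alpha_j||_2, alpha_j^T = e_j^T (X^T X)^{-1} X^T."""
    A = np.linalg.inv(X.T @ X) @ X.T        # row j of A is alpha_j^T
    return (np.abs(A).max(axis=1) / np.linalg.norm(A, axis=1)).max()

rng = np.random.default_rng(5)
n, p = 400, 200
X_gauss = rng.standard_normal((n, p))
# Balanced one-way ANOVA design: n/p = 2 observations per group.
X_anova = np.kron(np.eye(p), np.ones((n // p, 1)))
ratio_gauss = max_inf_to_l2_ratio(X_gauss)   # small: CLT condition plausible
ratio_anova = max_inf_to_l2_ratio(X_anova)   # exactly 1/sqrt(2): CLT fails
```

This is exactly the kind of condition that reappears as assumption A4 for general M-estimators.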
M-Estimator
The result for the LSE is derived from the analytical form of $\hat\beta_{LS}$. By contrast, an analytical form is not available for general $\rho$.
Let $\psi = \rho'$; then $\hat\beta$ is the solution of
$$\frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \hat\beta) = 0 \iff \frac{1}{n} \sum_{i=1}^n \psi(\epsilon_i - x_i^T(\hat\beta - \beta^*)) = 0.$$
We show that
  $\hat\beta_j$ is a smooth function of $\epsilon$;
  $\partial \hat\beta_j / \partial \epsilon$ and $\partial^2 \hat\beta_j / \partial \epsilon \partial \epsilon^T$ are computable.
Second-Order Poincaré Inequality
$\hat\beta_j$ is a smooth transform of a random vector $\epsilon$ with independent entries. A powerful CLT for this type of statistic is the Second-Order Poincaré Inequality [Chatterjee, 2009].
Definition 2. For each $c_1, c_2 > 0$, let $\mathcal{L}(c_1, c_2)$ be the class of probability measures on $\mathbb{R}$ that arise as laws of random variables of the form $u(W)$, where $W \sim N(0, 1)$ and $u \in C^2(\mathbb{R})$ with $|u'(x)| \le c_1$ and $|u''(x)| \le c_2$.
For example, $u = \mathrm{Id}$ gives $N(0, 1)$ and $u = \Phi$ gives $U([0, 1])$.
Second-Order Poincaré Inequality
Proposition 1 (SOPI; Chatterjee [2009]). Let $W = (W_1, \ldots, W_n)$ have independent entries with $W_i \sim \mathcal{L}(c_1, c_2)$. Take any $g \in C^2(\mathbb{R}^n)$ and let $U = g(W)$,
$$\kappa_1 = \left(E\|\nabla g(W)\|_2^4\right)^{1/4}, \quad \kappa_2 = \left(E\|\nabla^2 g(W)\|_{op}^4\right)^{1/4}, \quad \kappa_0 = \left(\sum_{i=1}^n E|\partial_i g(W)|^4\right)^{1/2}.$$
If $EU^4 < \infty$, then
$$d_{TV}\left(\mathcal{L}\left(\frac{U - EU}{\sqrt{\mathrm{Var}(U)}}\right), N(0, 1)\right) \lesssim \frac{\kappa_0 + \kappa_1 \kappa_2}{\mathrm{Var}(U)},$$
where the suppressed constant depends only on $c_1$ and $c_2$.
Assumptions
A1: $\rho(0) = \psi(0) = 0$, and for any $x \in \mathbb{R}$, $0 < K_0 \le \psi'(x) \le K_1$ and $|\psi''(x)| \le K_2$;
A2: $\epsilon$ has independent entries with $\epsilon_i \sim \mathcal{L}(c_1, c_2)$;
A3: letting $\lambda_+$ and $\lambda_-$ be the largest and smallest eigenvalues of $X^T X / n$, $\lambda_+ = O(1)$ and $\lambda_- = \Omega(1)$;
A4: similar to the condition for OLS,
$$\max_j \frac{\|e_j^T (X^T X)^{-1} X^T\|_\infty}{\|e_j^T (X^T X)^{-1} X^T\|_2} = o(1);$$
A5: similar to the condition that $\min_j \mathrm{Var}(\hat\beta_j) = \Omega(1/n)$.
Main Results
Theorem 3. Under assumptions A1-A5, as $p/n \to \kappa$ for some $\kappa \in (0, 1)$ while $n \to \infty$,
$$\max_j d_{TV}\left(\mathcal{L}\left(\frac{\hat\beta_j - E\hat\beta_j}{\sqrt{\mathrm{Var}(\hat\beta_j)}}\right), N(0, 1)\right) = o(1).$$
Setup
Design matrix $X$:
  (i.i.d. design) $X_{ij} \stackrel{i.i.d.}{\sim} F$;
  (partial Hadamard design) a matrix formed by a random set of $p$ columns of an $n \times n$ Hadamard matrix.
Entry distribution $F$: $F = N(0, 1)$ or $F = t_2$.
Error distribution $\mathcal{L}(\epsilon)$: the $\epsilon_i$ are i.i.d. with $\epsilon_i \sim N(0, 1)$ or $\epsilon_i \sim t_2$.
Setup
Sample size $n$: $\{100, 200, 400, 800\}$; $\kappa = p/n$: $\{0.5, 0.8\}$.
Loss function $\rho$: Huber loss with $k = 1.345$,
$$\rho(x) = \begin{cases} \frac{1}{2} x^2 & |x| \le k \\ k|x| - \frac{k^2}{2} & |x| > k. \end{cases}$$
Coefficients: $\beta^* = 0$.
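A partial Hadamard design of the kind described above can be generated with `scipy.linalg.hadamard`. A small sketch (our own code; we use $n = 128$ for illustration since this construction requires $n$ to be a power of 2, whereas the deck's sample sizes are 100-800):

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(6)
n, kappa = 128, 0.5
p = int(n * kappa)
Hn = hadamard(n)                   # n x n matrix with +/-1 entries
cols = rng.choice(n, size=p, replace=False)
X = Hn[:, cols]                    # partial Hadamard design
# Distinct Hadamard columns are orthogonal, so X^T X = n * I_p exactly:
# the eigenvalue conditions (A3) hold trivially for this design.
```

This orthogonality makes the partial Hadamard design a natural stress test alongside the i.i.d. designs.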
Asymptotic Normality of a Single Coordinate
Generate replicated responses $y_j = X\beta^* + \epsilon_j$, $j = 1, \ldots, r$, and compute the M-estimates $\hat\beta_1^{(1)}, \hat\beta_1^{(2)}, \ldots, \hat\beta_1^{(r)}$.
  $\widehat{sd} \equiv \mathrm{se}(\{\hat\beta_1^{(1)}, \ldots, \hat\beta_1^{(r)}\})$;
  we want to compare $\mathcal{L}(\hat\beta_1 / \widehat{sd})$ with $N(0, 1)$;
  we count the fraction of $\hat\beta_1^{(j)} \in [-1.96\,\widehat{sd}, 1.96\,\widehat{sd}]$ as a proxy;
  it should be close to 0.95 ideally.
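The coverage proxy above can be sketched directly (our own code, assuming numpy; `huber_fit` is a standard IRLS routine, not the authors' implementation):

```python
import numpy as np

def huber_fit(X, y, k=1.345, iters=25):
    # IRLS for the Huber M-estimator: weighted LS with weights psi(r)/r
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        w = np.where(np.abs(r) <= k, 1.0, k / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(7)
n, kappa, reps = 100, 0.5, 200
p = int(n * kappa)
X = rng.standard_normal((n, p))        # one fixed design for all replications
beta_star = np.zeros(p)
draws = np.array([huber_fit(X, X @ beta_star + rng.standard_normal(n))[0]
                  for _ in range(reps)])
sd_hat = draws.std(ddof=1)             # Monte Carlo sd of beta_hat_1
# Coverage proxy: fraction of replicated beta_hat_1 within +/- 1.96 sd_hat.
coverage = np.mean(np.abs(draws) <= 1.96 * sd_hat)
```

If the coordinate is asymptotically normal, `coverage` should sit near 0.95 up to Monte Carlo error.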
Asymptotic Normality of a Single Coordinate
[Figure: empirical coverage of $\beta_1^*$ for $\kappa = 0.5$ and $\kappa = 0.8$, across sample sizes, for i.i.d. and partial Hadamard designs with normal and $t(2)$ entry and error distributions.]
Conclusion
We establish the coordinate-wise asymptotic normality of M-estimators for fixed design matrices in the moderate p/n regime, under regularity conditions on $X$, $\mathcal{L}(\epsilon)$ and $\rho$ but no condition on $\beta^*$;
We prove the result via a novel approach based on the Second-Order Poincaré Inequality [Chatterjee, 2009];
We show that the regularity conditions are satisfied by a broad class of designs.
Discussion
Inference = asym. normality + asym. bias + asym. variance:
  is $\mathrm{Var}(\hat\beta_1 \mid X) \approx \mathrm{Var}(\hat\beta_1)$ when $X$ is indeed a realization of a random design?
  a resampling method to give conservative variance estimates?
  a more advanced bootstrap?
Relax the regularity conditions:
  generalize to non-strongly-convex and non-smooth loss functions?
  generalize to general error distributions?
Get rid of asymptotics:
  yes, an exact finite-sample guarantee if $n/p > 20$;
  no assumption on $X$ or $\beta^*$;
  only an exchangeability assumption on $\epsilon$.
Thank You!
References
Derek Bean, Peter J. Bickel, Noureddine El Karoui, and Bin Yu. Optimal M-estimation in high-dimensional regression. Proceedings of the National Academy of Sciences, 110(36), 2013.
Peter J. Bickel and David A. Freedman. Bootstrapping regression models with many parameters. Festschrift for Erich L. Lehmann, pages 28-48, 1982.
Sourav Chatterjee. Fluctuations of eigenvalues and second order Poincaré inequalities. Probability Theory and Related Fields, 143(1-2):1-40, 2009.
Noureddine El Karoui. On the impact of predictor geometry on the performance of high-dimensional ridge-regularized generalized robust regression estimators. 2015.
Noureddine El Karoui and Elizabeth Purdom. Can we trust the bootstrap in high-dimension? UC Berkeley Statistics Department Technical Report, 2015.
Noureddine El Karoui, Derek Bean, Peter J. Bickel, Chinghway Lim, and Bin Yu. On robust regression with high-dimensional predictors. Proceedings of the National Academy of Sciences, 110(36), 2011.
Peter J. Huber. Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 1973.
Enno Mammen. Asymptotics with increasing dimension for robust regression with applications to the bootstrap. The Annals of Statistics, 1989.
Stephen Portnoy. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. The Annals of Statistics, 1984.
Stephen Portnoy. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large; II. Normal approximation. The Annals of Statistics, 1985.
More informationEconometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague
Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage
More informationAdvanced Statistics II: Non Parametric Tests
Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationRobust estimation, efficiency, and Lasso debiasing
Robust estimation, efficiency, and Lasso debiasing Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics WHOA-PSI workshop Washington University in St. Louis Aug 12, 2017 Po-Ling
More informationA Resampling Method on Pivotal Estimating Functions
A Resampling Method on Pivotal Estimating Functions Kun Nie Biostat 277,Winter 2004 March 17, 2004 Outline Introduction A General Resampling Method Examples - Quantile Regression -Rank Regression -Simulation
More informationAFT Models and Empirical Likelihood
AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t
More informationM-Estimation under High-Dimensional Asymptotics
M-Estimation under High-Dimensional Asymptotics 2014-05-01 Classical M-estimation Big Data M-estimation An out-of-the-park grand-slam home run Annals of Mathematical Statistics 1964 Richard Olshen Classical
More informationThe Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University
The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor
More informationSupplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017
Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.
More informationEstimation of large dimensional sparse covariance matrices
Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationlarge number of i.i.d. observations from P. For concreteness, suppose
1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate
More informationMA Advanced Econometrics: Applying Least Squares to Time Series
MA Advanced Econometrics: Applying Least Squares to Time Series Karl Whelan School of Economics, UCD February 15, 2011 Karl Whelan (UCD) Time Series February 15, 2011 1 / 24 Part I Time Series: Standard
More informationIntroduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data
Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation
More informationBootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator
Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationUNIVERSITÄT POTSDAM Institut für Mathematik
UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam
More informationLecture 14 October 13
STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:
More informationRegression Diagnostics for Survey Data
Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics
More informationMultivariate Regression Analysis
Matrices and vectors The model from the sample is: Y = Xβ +u with n individuals, l response variable, k regressors Y is a n 1 vector or a n l matrix with the notation Y T = (y 1,y 2,...,y n ) 1 x 11 x
More informationSTAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
More informationStatistics 910, #5 1. Regression Methods
Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known
More informationLinear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationA Primer on Asymptotics
A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these
More informationRobustní monitorování stability v modelu CAPM
Robustní monitorování stability v modelu CAPM Ondřej Chochola, Marie Hušková, Zuzana Prášková (MFF UK) Josef Steinebach (University of Cologne) ROBUST 2012, Němčičky, 10.-14.9. 2012 Contents Introduction
More informationUnderstanding Regressions with Observations Collected at High Frequency over Long Span
Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More informationProblem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function
Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of
More informationRobust high-dimensional linear regression: A statistical perspective
Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,
More informationM-estimation in high-dimensional linear model
Wang and Zhu Journal of Inequalities and Applications 208 208:225 https://doi.org/0.86/s3660-08-89-3 R E S E A R C H Open Access M-estimation in high-dimensional linear model Kai Wang and Yanling Zhu *
More informationIndian Statistical Institute
Indian Statistical Institute Introductory Computer programming Robust Regression methods with high breakdown point Author: Roll No: MD1701 February 24, 2018 Contents 1 Introduction 2 2 Criteria for evaluating
More informationPanel Data Models. James L. Powell Department of Economics University of California, Berkeley
Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationLecture 3: Central Limit Theorem
Lecture 3: Central Limit Theorem Scribe: Jacy Bird (Division of Engineering and Applied Sciences, Harvard) February 8, 003 The goal of today s lecture is to investigate the asymptotic behavior of P N (
More informationSliced Inverse Regression
Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed
More informationQuantile Processes for Semi and Nonparametric Regression
Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response
More informationWhy Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory
Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Andreas Buja joint with the PoSI Group: Richard Berk, Lawrence Brown, Linda Zhao, Kai Zhang Ed George, Mikhail Traskin, Emil Pitkin,
More informationQuantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation
Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based
More informationLecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf
Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationECO Class 6 Nonparametric Econometrics
ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................
More informationMatrix Factorizations
1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015
More informationPrediction Intervals For Lasso and Relaxed Lasso Using D Variables
Southern Illinois University Carbondale OpenSIUC Research Papers Graduate School 2017 Prediction Intervals For Lasso and Relaxed Lasso Using D Variables Craig J. Bartelsmeyer Southern Illinois University
More informationInference for Identifiable Parameters in Partially Identified Econometric Models
Inference for Identifiable Parameters in Partially Identified Econometric Models Joseph P. Romano Department of Statistics Stanford University romano@stat.stanford.edu Azeem M. Shaikh Department of Economics
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationProgram Evaluation with High-Dimensional Data
Program Evaluation with High-Dimensional Data Alexandre Belloni Duke Victor Chernozhukov MIT Iván Fernández-Val BU Christian Hansen Booth ESWC 215 August 17, 215 Introduction Goal is to perform inference
More informationNonparametric Inference via Bootstrapping the Debiased Estimator
Nonparametric Inference via Bootstrapping the Debiased Estimator Yen-Chi Chen Department of Statistics, University of Washington ICSA-Canada Chapter Symposium 2017 1 / 21 Problem Setup Let X 1,, X n be
More informationOn Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness
Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu
More information1 Motivation for Instrumental Variable (IV) Regression
ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data
More informationConfidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean
Confidence Intervals Confidence interval for sample mean The CLT tells us: as the sample size n increases, the sample mean is approximately Normal with mean and standard deviation Thus, we have a standard
More informationFluctuations from the Semicircle Law Lecture 4
Fluctuations from the Semicircle Law Lecture 4 Ioana Dumitriu University of Washington Women and Math, IAS 2014 May 23, 2014 Ioana Dumitriu (UW) Fluctuations from the Semicircle Law Lecture 4 May 23, 2014
More informationQuantile Regression for Extraordinarily Large Data
Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step
More informationInference on distributions and quantiles using a finite-sample Dirichlet process
Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics
More informationLarge sample distribution for fully functional periodicity tests
Large sample distribution for fully functional periodicity tests Siegfried Hörmann Institute for Statistics Graz University of Technology Based on joint work with Piotr Kokoszka (Colorado State) and Gilles
More informationModel Mis-specification
Model Mis-specification Carlo Favero Favero () Model Mis-specification 1 / 28 Model Mis-specification Each specification can be interpreted of the result of a reduction process, what happens if the reduction
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationJEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODEL
1 / 25 JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODELS DEPT. OF STATISTICS, UNIV. WISCONSIN, MADISON BIOMEDICAL STATISTICAL MODELING. CELEBRATION OF JEREMY TAYLOR S OF 60TH BIRTHDAY. UNIVERSITY
More informationEXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2
EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME Xavier Mestre, Pascal Vallet 2 Centre Tecnològic de Telecomunicacions de Catalunya, Castelldefels, Barcelona (Spain) 2 Institut
More informationthe error term could vary over the observations, in ways that are related
Heteroskedasticity We now consider the implications of relaxing the assumption that the conditional variance Var(u i x i ) = σ 2 is common to all observations i = 1,..., n In many applications, we may
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationA Comparison of Robust Estimators Based on Two Types of Trimming
Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical
More informationHierarchical Modeling for Univariate Spatial Data
Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This
More informationReconstruction from Anisotropic Random Measurements
Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013
More information36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1
36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations
More informationδ -method and M-estimation
Econ 2110, fall 2016, Part IVb Asymptotic Theory: δ -method and M-estimation Maximilian Kasy Department of Economics, Harvard University 1 / 40 Example Suppose we estimate the average effect of class size
More informationContents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects
Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR
More informationStat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)
Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationEconometrics Summary Algebraic and Statistical Preliminaries
Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L
More informationCALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR
J. Japan Statist. Soc. Vol. 3 No. 200 39 5 CALCULAION MEHOD FOR NONLINEAR DYNAMIC LEAS-ABSOLUE DEVIAIONS ESIMAOR Kohtaro Hitomi * and Masato Kagihara ** In a nonlinear dynamic model, the consistency and
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationMultiple Linear Regression
Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from
More informationProbability and Statistics Notes
Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline
More informationSpatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions
Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64
More informationSTAT 4385 Topic 06: Model Diagnostics
STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized
More informationLawrence D. Brown* and Daniel McCarthy*
Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals
More information