M-Estimation under High-Dimensional Asymptotics


1 M-Estimation under High-Dimensional Asymptotics

2 "An out-of-the-park grand-slam home run" — Richard Olshen, on the 1964 Annals of Mathematical Statistics paper (Huber, "Robust Estimation of a Location Parameter").

3 Outline: Classical M-estimation · Big Data M-estimation

4 M-estimation Basics

Location model: $Y_i = \theta + Z_i$, $i = 1, \dots, n$; errors $Z_i \sim F$, not necessarily Gaussian.

Loss function $\rho(t)$, e.g. $t^2$, $|t|$, $-\log f(t)$, ...

(M)  $\min_\theta \sum_{i=1}^n \rho(Y_i - \theta)$

Asymptotic distribution: $\sqrt{n}(\hat\theta_n - \theta) \to_D N(0, V)$, $n \to \infty$.

Asymptotic variance, with $\psi = \rho'$:
$V(\psi, F) = \dfrac{\int \psi^2 \, dF}{\left(\int \psi' \, dF\right)^2}$

Information bound: $V(\psi, F) \ge \dfrac{1}{I(F)}$.
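
To make the (M) problem concrete, here is a minimal numerical sketch (not from the talk): it computes a Huber M-estimate of location for contaminated-normal data and compares it to the mean and median. The contamination mixture, the choice λ = 1.345, and the use of numpy/scipy are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, theta_true, lam = 1000, 1.0, 1.345
# Contaminated-normal errors: mostly N(0,1), occasionally N(0,10^2).
Z = np.where(rng.random(n) < 0.95, rng.normal(0, 1, n), rng.normal(0, 10, n))
Y = theta_true + Z

def rho_huber(t):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(t)
    return np.where(a <= lam, 0.5 * t**2, lam * a - 0.5 * lam**2)

# (M): minimize the empirical risk over theta.
theta_hat = minimize_scalar(lambda th: rho_huber(Y - th).sum()).x
print("Huber M-estimate:", theta_hat)
print("sample mean     :", Y.mean())
print("sample median   :", np.median(Y))
```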

5 The One-Step Viewpoint

[Facsimile of the first page of: P. J. Bickel, "One-Step Huber Estimates in the Linear Model," JASA 1975. Simple one-step versions of Huber's (M) estimates for the linear model are introduced; under mild regularity conditions they are consistent and asymptotically normal with variance $K(\psi, F)/n$, where $K(\psi, F) = \int \psi(t)^2 f(t)\, dt \,\big/\, \left(\int \psi'(t) f(t)\, dt\right)^2$.]

6 Regression M-estimation: the One-Step Viewpoint

Regression model: $Y_i = X_i^T \theta + Z_i$, $Z_i$ iid $F$, $i = 1, \dots, n$.

Objective function of (M): $R(\vartheta) = \sum_{i=1}^n \rho(Y_i - X_i^T \vartheta)$;  (M)  $\min_\vartheta R(\vartheta)$.

One-step estimate: with $\tilde\theta_n$ any $\sqrt{n}$-consistent estimate of $\theta$,
$\hat\theta_1 = \tilde\theta_n - [\operatorname{Hess} R|_{\tilde\theta_n}]^{-1} \nabla R|_{\tilde\theta_n}$.

Effectiveness: with $\hat\theta$ the true solution of the M-equation,
$E(\hat\theta_1 - \hat\theta)(\hat\theta_1 - \hat\theta)^T = o(n^{-1})$.
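
A minimal sketch of the one-step estimate above, assuming a least-squares pilot as the $\sqrt{n}$-consistent starting point and a Huber $\psi$; the sizes, λ, and error law are illustrative choices, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 500, 5, 1.345
X = rng.normal(0, 1 / np.sqrt(n), size=(n, p))      # design with N(0, 1/n) entries
theta_true = rng.normal(size=p)
Y = X @ theta_true + rng.standard_t(df=3, size=n)   # heavy-tailed errors

psi  = lambda r: np.clip(r, -lam, lam)               # psi = rho' (Huber)
dpsi = lambda r: (np.abs(r) <= lam).astype(float)    # psi'

theta_pilot = np.linalg.lstsq(X, Y, rcond=None)[0]   # root-n-consistent pilot (least squares)
r = Y - X @ theta_pilot
grad = -X.T @ psi(r)                                 # gradient of R at the pilot
hess = X.T @ (dpsi(r)[:, None] * X)                  # Hessian of R at the pilot
theta_onestep = theta_pilot - np.linalg.solve(hess, grad)   # one Newton step
print(theta_onestep - theta_true)
```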

7 Driving Idea of Classical Asymptotics

The M-estimate is asymptotically equivalent to a single step of Newton's method for finding a zero of $\nabla R$, starting at the true underlying parameter. Goes back to Fisher's Method of Scoring for the MLE.

8 Derivation of Asymptotic Variance Formula

Approximation to the one-step:
$\hat\theta_1 = \theta + \dfrac{1}{B(\psi, F)} (X^T X)^{-1} X^T (\psi(Z_i)) + o_p(n^{-1/2})$, where $B(\psi, F) = \int \psi' \, dF$.

Observe that $\operatorname{Var}\left((X^T X)^{-1} X^T (\psi(Z_i))\right) \approx (X^T X)^{-1} A(\psi, F)$, where $A(\psi, F) = \int \psi^2 \, dF$.

Hence if $X_{i,j} \sim N(0, \tfrac{1}{n})$,
$\operatorname{Var}(\hat\theta_i - \theta_i) \approx \dfrac{A(\psi, F)}{B(\psi, F)^2} = V(\psi, F)$.

9 Asymptotics for Regression, I

10 Asymptotics for Regression, II — P. J. Huber, Annals of Statistics 1973

11 (figure slide)

12 (figure slide)

13 [Facsimile of the first page of: N. El Karoui, D. Bean, P. J. Bickel, C. Lim, and B. Yu, "On robust regression with high-dimensional predictors," PNAS 2013. The paper studies regression M-estimates when $p$ and $n$ are both large with $p < n$; it gives an exact stochastic representation for the distribution of $\hat\beta = \arg\min_\beta \sum_{i=1}^n \rho(Y_i - X_i^T\beta)$, characterized by a nonlinear system of two deterministic equations when $p/n \to \kappa > 0$. One surprising consequence: when $p/n$ is large enough, least squares becomes preferable to least absolute deviations for double-exponential errors.]

14 Bickel, Yu, El Karoui

15 High-Dimensional Asymptotics (HDA)

$n, p_n \to \infty$; $X_{i,j}$ iid $N(0, \tfrac{1}{n})$; $p_n^{-1}\|\theta_{0,n}\|_2^2 \to \tau_0^2$; $Y = X\theta_0 + Z$; $n/p_n \to \delta \in (1, \infty)$.

Meaning of $\delta$: number of observations per parameter to be estimated.

16 Emergent Phenomena under HDA

Classical setting (random design, $p$ fixed, $n \to \infty$):
$\operatorname{var}(\hat\theta_i) \to V(\psi, F)$, $n \to \infty$.

HDA setting, for $n/p_n \to \delta \in (1, \infty)$:
- Effective score $\Psi^* = \Psi_{\delta,\psi,F}$ (to be described ...)
- Effective error distribution $F^* = F \star N(0, \tau_*^2)$
- Extra Gaussian noise: $\tau_* = \tau_*(\delta, \psi, F)$.

Asymptotic variance under HDA:
$\operatorname{var}(\hat\theta_i) \to V(\Psi^*, F^*)$, $n, p_n \to \infty$.

Classical correspondence: $\Psi_{\delta,\psi,F} \to \psi$, $F^*_{\delta,\psi,F} \to F$, as $\delta \to \infty$.

17 Immediate Implications

1. Classical formulas for confidence statements about M-estimates are overoptimistic under high-dimensional asymptotics (even dramatically so).
2. Maximum likelihood estimates are inefficient under high-dimensional asymptotics: the score $\psi = -(\log f)'$ does not yield an efficient estimator.
3. The usual Fisher information bound is not attainable, as $I(F^*) < I(F)$.

18 Sample implication of DLD & Montanari (2013)

Corollary. For an M-estimator under HDA with errors $Z_i$ iid $F$:
$\lim_{n\to\infty} \operatorname{Ave}_i \operatorname{var}(\hat\theta_i) \ge \dfrac{1}{1 - 1/\delta} \cdot \dfrac{1}{I(F)}$

The effect of the HDA parameter $\delta \in (1, \infty)$ is evident.

19 (figure slide)

20 Regularized ρ

Definition. $\rho: \mathbb{R} \to \mathbb{R}$ is smooth if it is $C^1$ with absolutely continuous derivative $\psi = \rho'$ having a bounded a.e. derivative $\psi'$.

Excludes $\ell_1$: $\rho_{\ell_1}(z) = |z|$. Allows Huber: $\rho_H(z) = \min(z^2/2,\; \lambda|z| - \lambda^2/2)$.

Regularized $\rho$:
$\rho_b(z) \equiv \min_{x \in \mathbb{R}} \left\{ b\rho(x) + \tfrac{1}{2}(x - z)^2 \right\}$,  (1)

the min-convolution of $\rho$ with a square loss.

21 Regularized Score Function

Regularized score function: $\Psi(z; b) = \rho_b'(z)$.

Example: Huber loss $\rho_H(z; \lambda)$, with score function $\psi(z; \lambda) = \min(\max(-\lambda, z), \lambda)$:
$\Psi(z; b) = b\, \psi\!\left( \dfrac{z}{1 + b};\, \lambda \right)$.

$\Psi$ is like $\psi$, but with central slope $\Psi'(\cdot\,; b) = \dfrac{b}{1+b} < 1$.

Effective score: for a special choice $b_*$, $\Psi^* = \Psi(z; b_*)$.
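
A small sketch checking the Huber formula for $\Psi(z;b)$ above against a direct numerical evaluation of the min-convolution (1) (the derivative of the Moreau envelope is $z - \operatorname{prox}_{b\rho}(z)$); the specific λ and b values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam, b = 3.0, 0.7

def rho_huber(t):
    a = abs(t)
    return 0.5 * t * t if a <= lam else lam * a - 0.5 * lam * lam

def psi_huber(t):
    return min(max(t, -lam), lam)

def Psi_via_prox(z):
    # rho_b is the min-convolution (1); its derivative equals z - prox_{b*rho}(z).
    prox = minimize_scalar(lambda x: b * rho_huber(x) + 0.5 * (x - z) ** 2).x
    return z - prox

def Psi_closed_form(z):
    return b * psi_huber(z / (1 + b))

for z in (-10.0, -1.0, 0.3, 2.5, 8.0):
    print(f"{z:6.1f}  {Psi_via_prox(z):.6f}  {Psi_closed_form(z):.6f}")
```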

22 Approximate Message Passing Algorithm

Initialization: $\theta^0 = 0$.

Adjusted residuals:
$R^t = Y - X\theta^t + \Psi(R^{t-1}; b_{t-1})$;  (2)

Effective score: choose $b_t > 0$ achieving empirical average slope $p/n \in (0, 1)$:
$\dfrac{p}{n} = \dfrac{1}{n} \sum_{i=1}^n \Psi'(R_i^t; b)$.  (3)

Scoring: apply the effective score function $\Psi(R^t; b_t)$:
$\theta^{t+1} = \theta^t + \delta\, X^T \Psi(R^t; b_t)$.  (4)
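
Below is a minimal Python sketch of steps (2)-(4) for the Huber loss, using the closed form $\Psi(z;b) = b\,\psi(z/(1+b);\lambda)$ from the previous slide and a root-finder for $b_t$ in (3). The data-generating choices (n, p, λ, the contaminated-normal errors) roughly follow the running example but are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
n, p, lam = 1000, 200, 3.0
delta = n / p
X = rng.normal(0, 1 / np.sqrt(n), size=(n, p))
theta0 = rng.normal(size=p)
Z = np.where(rng.random(n) < 0.95, rng.normal(0, 1, n), 10.0)   # 0.95 N(0,1) + 0.05 mass at 10
Y = X @ theta0 + Z

def Psi(r, b):                       # regularized Huber score, Psi(z;b) = b psi(z/(1+b); lam)
    return b * np.clip(r / (1 + b), -lam, lam)

def dPsi(r, b):                      # its derivative in r
    return (b / (1 + b)) * (np.abs(r) <= (1 + b) * lam)

def choose_b(r):                     # b_t with empirical average slope p/n = 1/delta, eq. (3)
    return brentq(lambda b: dPsi(r, b).mean() - 1.0 / delta, 1e-6, 1e6)

theta = np.zeros(p)                  # theta^0 = 0
R_prev, b_prev = np.zeros(n), 0.0    # R^{-1} = 0
for t in range(20):
    R = Y - X @ theta + Psi(R_prev, b_prev)      # adjusted residuals, eq. (2)
    b = choose_b(R)
    theta = theta + delta * (X.T @ Psi(R, b))    # effective-score step, eq. (4)
    R_prev, b_prev = R, b
print("RMSE(theta^t, theta_0):", np.sqrt(np.mean((theta - theta0) ** 2)))
```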

23 Determining the Regularization Parameter b_t

[Figure, two panels: determination of $b_t$ (average slope vs. $b$) and the history of $b_t$ over AMP iterations $t$, for the running example $\rho = \rho_H(\cdot\,; 3)$, $F = 0.95\,N(0,1) + 0.05\,H_{10}$.]

24 Connection of AMP to M-estimation

Lemma. Let $(\theta^*, R^*, b_*)$ be a fixed point of the AMP iteration (2), (3), (4) having $b_* > 0$. Then $\theta^*$ is a minimizer of the problem (M). Conversely, any minimizer $\hat\theta$ of the problem (M) corresponds to an AMP fixed point of the form $(\hat\theta, R^*, b_*)$.

25 Example

Size: $n = 1000$, $p = 200$, so $\delta = 5$.
Truth: $\theta_0$ random, $\|\theta_0\|_2^2 = 6p$.
Errors: $F = CN(0.05, 10)$, i.e. $F = 0.95\,\Phi + 0.05\,H_{10}$, where $H_x$ denotes unit mass at $x$.
Loss: $\rho = \rho_H(z; \lambda)$ with $\lambda = 3$.
Iterations: run AMP for 20 iterations.
Comparison: use CVX to obtain the Huber estimator directly.

26 Convergence of AMP in Example

[Figure, two panels: left, RMSE$(\hat\theta^t, \theta_0)$ vs. AMP iteration $t$ (convergence of AMP $\hat\theta^t$ to $\theta_0$); right, RMSE$(\hat\theta^t, \hat\theta)$ vs. AMP iteration $t$ (convergence of AMP $\hat\theta^t$ to the M-estimate $\hat\theta$).]

27 Contrast AMP with Traditional Scoring

Traditional residual: $Z = Y - X\theta$.

Traditional scoring:
$\theta^1 = \theta + \left[ \dfrac{1}{n} \sum_{i=1}^n \psi'(Z_i) \right]^{-1} (X^T X)^{-1} X^T (\psi(Z_i))$,  (5)

AMP:
$\theta^{t+1} = \theta^t + \delta\, X^T \Psi(R^t; b_t)$.

Object        | Scoring                              | AMP
Average slope | $\sum_{i=1}^n \psi'(Z_i)/n$          | $\sum_{i=1}^n \Psi'(R_i^t; b_t)/n = \delta^{-1}$
Gram          | $(X^T X)^{-1}$                       | $1$
Residual      | Traditional $(Z_i)$                  | Adjusted $(R_i^t)$
Score         | $\psi(\cdot)$                        | $\Psi(\cdot\,; b_t)$
Basepoint     | True $\theta$                        | Current iterate
Iterations    | $1$                                  | $t \gg 1$

28 Basic contrasts between HDA and Classical Asymptotics

Under the high-dimensional asymptotic, $\delta \in (1, \infty)$:
- The heuristic "expand around $\theta$" is incorrect; it understates the true uncertainty.
- The heuristic "errors equal the residuals" is incorrect; the residuals are noisier.
- The heuristic "take one step" is incorrect; one must take many steps.

29 Extra Variance at Initialization of AMP

Initialize with $\theta^0 = 0$, $R^{-1} = 0$.

Initial residual: $R^0 = Y - X\theta^0 = Z + X(\theta_0 - \theta^0)$.

The terms $Z$ and $X(\theta_0 - \theta^0)$ are independent. $X(\theta_0 - \theta^0)$ is Gaussian with variance
$\tau_0^2 = \|\theta_0 - \theta^0\|^2/n = \mathrm{MSE}(\theta^0, \theta_0)/\delta$.

$\mathrm{Var}(R_i^0) = \mathrm{Var}(Z) + \mathrm{Var}(X(\theta_0 - \theta^0)) = \mathrm{Var}(Z) + \mathrm{MSE}(\theta^0, \theta_0)/\delta$.

Extra Gaussian noise: $\tau_0^2 = \mathrm{MSE}(\theta^0, \theta_0)/\delta$.

30 Evolution of Extra Variance at later AMP iterations

Pretend $R^t \approx Z + \tau_t W$, with $W$ an independent standard normal.

Define the variance map
$\mathcal{V}(\tau^2, b; \delta, F) = \delta\, \mathbb{E}\left\{ \Psi(Z + \tau W; b)^2 \right\}$,

and the slope parameter $b = b(\tau; \delta, F)$: the smallest solution $b \ge 0$ of
$\dfrac{1}{\delta} = \mathbb{E}\left\{ \Psi'(Z + \tau W; b) \right\}$.  (6)

Definition: the dynamical system $\{\tau_t^2\}_{t \ge 0}$, starting at $\tau_0^2 \ge 0$, is given by
$\tau_{t+1}^2 = \mathcal{V}(\tau_t^2, b(\tau_t)) = \mathcal{V}(\tau_t^2, b(\tau_t; \delta, F); \delta, F)$.  (7)
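
A Monte Carlo sketch of the recursion (6)-(7), assuming the running example's contaminated normal ($0.95\,N(0,1) + 0.05$ point mass at 10), Huber λ = 3, δ = 5; the Monte Carlo sample size and starting τ₀² are illustrative. The iterates should approach the fixed point of the variance map (compare the next slide).

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
delta, lam, m = 5.0, 3.0, 200_000
Z = np.where(rng.random(m) < 0.95, rng.normal(0, 1, m), 10.0)   # F = 0.95 N(0,1) + 0.05 H_10
W = rng.normal(size=m)

def Psi(r, b):
    return b * np.clip(r / (1 + b), -lam, lam)

def dPsi(r, b):
    return (b / (1 + b)) * (np.abs(r) <= (1 + b) * lam)

def b_of_tau(tau):                    # smallest b solving the slope equation (6)
    r = Z + tau * W
    return brentq(lambda b: dPsi(r, b).mean() - 1.0 / delta, 1e-6, 1e6)

def variance_map(tau2):               # V(tau^2, b(tau); delta, F) = delta E{Psi(Z + tau W; b)^2}
    tau = np.sqrt(tau2)
    b = b_of_tau(tau)
    return delta * np.mean(Psi(Z + tau * W, b) ** 2)

tau2 = 1.2                            # illustrative starting value tau_0^2
for t in range(15):
    tau2 = variance_map(tau2)         # recursion (7)
    print(t, round(tau2, 4))
```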

31 Variance Map, in running example

[Figure: variance map $\mathcal{V}(\tau^2)$ (output variance vs. input variance) together with the line $y = x$; fixed point at $(0.472, 0.472)$. Contaminated normal $F = 0.95\,N(0,1) + 0.05\,H_{10}$, $\rho_H$ with $\lambda = 3$.]

32 Dynamics of $\tau_t$, in running example

[Figure: dynamics of $\tau_t^2$ under the variance map $\mathcal{V}(\tau^2)$, together with the line $y = x$. Contaminated normal $F = 0.95\,N(0,1) + 0.05\,H_{10}$, $\rho_H$ with $\lambda = 3$.]

33 Operating Characteristics from State Evolution

Definition: the state evolution formalism assigns predictions $\mathbb{E}\{\cdot \mid S\}$ under states $S = (\tau, b, \delta, F)$.

Functions $\xi$ of $\hat\theta - \theta_0$: predict $p^{-1} \sum_{i \le p} \xi(\hat\theta_i - \theta_{0,i})$ by
$\mathbb{E}(\xi(\hat\vartheta - \vartheta) \mid S) \equiv \mathbb{E}\left\{ \xi\!\left(\sqrt{\delta}\, \tau\, W\right) \right\}$,

Functions $\xi_2$ of (residual, error): predict $n^{-1} \sum_{i=1}^n \xi_2(R_i, Z_i)$ by
$\mathbb{E}(\xi_2(R, Z) \mid S) \equiv \mathbb{E}\, \xi_2(Z + \tau W, Z)$,

where $W \sim N(0, 1)$ and $Z \sim F$ is independent of $W$.

34 Example of Predictions

MSE at iteration $t$. Let $S_t = (\tau_t, b(\tau_t), \delta, F)$ denote the state of AMP at iteration $t$; predict
$\mathrm{MSE}(\theta^t, \theta_0) \approx \mathbb{E}((\hat\vartheta - \vartheta)^2 \mid S_t) = \mathbb{E}\left\{ (\sqrt{\delta}\, \tau_t W)^2 \right\} = \delta \tau_t^2$.

MSE at convergence. With $\tau_* > 0$ the limit of $\tau_t$, let $S_* = (\tau_*, b(\tau_*), \delta, F)$ denote the equilibrium state of AMP:
$\mathrm{MSE}(\hat\theta, \theta_0) \approx \mathbb{E}((\hat\vartheta - \vartheta)^2 \mid S_*) = \mathbb{E}\left\{ (\sqrt{\delta}\, \tau_* W)^2 \right\} = \delta \tau_*^2$.

Ordinary residuals $Y - X\hat\theta$ at AMP convergence. Setting $\eta(z; b) = z - \Psi(z; b)$:
$Y - X\hat\theta \approx_D \eta(Z + \tau_* W; b_*)$.

35 [Figure: experimental means from 10 simulations compared with state-evolution predictions under CN(0.05, 10), with Huber $\psi$, $\lambda = 3$. Upper left: $\hat\tau_t = \|\theta^t - \theta_0\|_2/\sqrt{n}$. Upper right: $\hat b_t$. Lower left: MSE, mean squared error. Lower right: MAE, mean absolute error. Blue + symbols: empirical means of AMP observables. Green curve: theoretical predictions by SE.]

36 Lower Bounds on $\tau_t^2$

Lemma. Suppose that $F$ has a well-defined Fisher information $I(F)$. Then for any $t > 0$,
$\tau_t^2 \ge \dfrac{1}{\delta\, I(F)}$.

Lemma (uses Barron-Madiman, 2007).
$I\!\left(F \star N(0, \tau^2)\right) \le \dfrac{I(F)}{1 + \tau^2 I(F)}$.

Corollary. For $t > k$,
$\tau_t^2 \ge \dfrac{1 + \delta^{-1} + \dots + \delta^{-k}}{\delta\, I(F)}$.

Corollary. For every accumulation point $\tau_\infty$ of $\tau_t$,
$\tau_\infty^2 \ge \dfrac{1}{\delta - 1} \cdot \dfrac{1}{I(F)}$.

Corollary. For an M-estimator under HDA with errors $Z_i$ iid $F$:
$\lim_{n} \operatorname{Ave}_i \operatorname{Var}(\hat\theta_i) \ge \dfrac{1}{1 - 1/\delta} \cdot \dfrac{1}{I(F)}$.
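
A brief sketch (not verbatim from the slides) of how these statements chain together, assuming the scalar information bound for $\Psi$ and the slope calibration (6):

```latex
% The slope calibration gives E\,\Psi'(Z+\tau_t W;b_t) = 1/\delta, and the scalar
% information bound E\,\Psi^2 \ge (E\,\Psi')^2 / I(\tilde F_t), with \tilde F_t = F \star N(0,\tau_t^2), yields
\[
  \tau_{t+1}^2 \;=\; \delta\,\mathbb{E}\,\Psi(Z+\tau_t W;b_t)^2
  \;\ge\; \frac{1}{\delta\, I\!\left(F \star N(0,\tau_t^2)\right)}
  \;\ge\; \frac{1}{\delta}\left(\frac{1}{I(F)} + \tau_t^2\right),
\]
% the last step by the Barron--Madiman convolution inequality of the second lemma.
% Iterating $k$ times from $\tau_{t-k}^2 \ge 1/(\delta I(F))$ gives the geometric sum
\[
  \tau_t^2 \;\ge\; \frac{1 + \delta^{-1} + \cdots + \delta^{-k}}{\delta\, I(F)}
  \;\longrightarrow\; \frac{1}{(\delta-1)\, I(F)} \qquad (k \to \infty),
\]
% and since \mathrm{Ave}_i \mathrm{Var}(\hat\theta_i) = \delta\,\tau_\infty^2, the final corollary follows.
```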

37 Correctness of State Evolution, 1

Basic Assumptions:
A1 The discrepancy function $\rho$ is convex and smooth;
A2 The matrices $\{X(n)\}_n$ have iid $N(0, \tfrac{1}{n})$ entries;
A3 $\theta_0$, $\theta^0 = 0$ are deterministic sequences such that $\mathrm{AMSE}(\theta^0, \theta_0) = \delta\tau_0^2$;
A4 $F$ has finite second moment.

Terminology: let $\{\tau_t^2\}_{t \ge 0}$ denote the state evolution sequence with initial condition $\tau_0^2$. Let $\{\theta^t, R^t\}_{t \ge 0}$ be the AMP trajectory with parameters $b_t$.

Definition: a function $\xi: \mathbb{R}^k \to \mathbb{R}$ is pseudo-Lipschitz if there exists $L < \infty$ such that, for all $x, y \in \mathbb{R}^k$,
$|\xi(x) - \xi(y)| \le L(1 + \|x\|_2 + \|y\|_2)\, \|x - y\|_2$.

38 Correctness of State Evolution, 2

Theorem. Assume A1-A4. Let $\xi: \mathbb{R} \to \mathbb{R}$ and $\xi_2: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be pseudo-Lipschitz functions. Then, for any $t > 0$, we have, for $W \sim N(0,1)$ independent of $Z \sim F$:
$\lim_{n \to \infty} \dfrac{1}{p} \sum_{i=1}^p \xi(\theta_i^t - \theta_{0,i}) =_{a.s.} \mathbb{E}\left\{ \xi\!\left(\sqrt{\delta}\, \tau_t W\right) \right\}$,  (8)
$\lim_{n \to \infty} \dfrac{1}{n} \sum_{i=1}^n \xi_2(R_i^t, Z_i) =_{a.s.} \mathbb{E}\left\{ \xi_2(Z + \tau_t W, Z) \right\}$.  (9)

39 Corollary of Correctness

Under HDA, we can define
$\mathrm{AMSE}(\hat\theta^t, \theta_0) =_{a.s.} \lim_{n,p \to \infty} \|\theta^t - \theta_0\|_2^2 / p$,
and then
$\mathrm{AMSE}(\hat\theta^t, \theta_0) = \delta \tau_t^2$.

40 Convergence of AMP to M-Estimator, 2

Theorem. Assume A1-A4, that $\rho$ is strongly convex, and $\delta > 1$. Let $(\tau_*, b_*)$ be a solution of the two equations
$\tau^2 = \delta\, \mathbb{E}\left\{ \Psi(Z + \tau W; b)^2 \right\}$,  (10)
$\dfrac{1}{\delta} = \mathbb{E}\left\{ \Psi'(Z + \tau W; b) \right\}$.  (11)
Assume that $\mathrm{AMSE}(\theta^0, \theta_0) = \delta\tau_*^2$. Then
$\lim_{t \to \infty} \mathrm{AMSE}(\theta^t, \hat\theta) = 0$.  (12)

41 Main Result: Asymptotic Variance Formula under High-Dimensional Asymptotics

Corollary. Assume the setting of the previous theorem. Then
$\lim_{n,p \to \infty} \operatorname{Ave}_{i \in [p]} \operatorname{Var}(\hat\theta_i) =_{a.s.} V(\Psi^*, F^*)$,  (13)
where the effective score is $\Psi^*(\cdot) = \Psi(\cdot\,; b_*)$, while the effective noise distribution is $F^* = F \star N(0, \tau_*^2)$. Here $(\tau_*, b_*)$ is the unique solution of the equations (10)-(11).

42 Driving Idea of High-Dimensional Asymptotics, 1

- The M-estimate is asymptotically equivalent to the limit of many steps of AMP.
- The first step of AMP contains extra Gaussian noise.
- The extra Gaussian noise is caused by errors in the initial parameter estimates.
- The extra Gaussian noise declines with AMP iterations but does not go away completely; it declines to the level defined by the fixed-point equations of state evolution.
- The extra Gaussian noise in the M-estimate is characterized by the fixed-point equations of state evolution.

43 Driving Idea of High-Dimensional Asymptotics, 2

Classically, the M-estimate propagates the true errors through the score:
$\hat\theta = \theta + \dfrac{1}{B(\psi, F)} (X^T X)^{-1} X^T (\psi(Z_i)) + o_p(n^{-1/2})$

Under HDA, it propagates noisier effective errors through the effective score:
$\hat\theta = \theta + \dfrac{1}{B(\Psi^*, F^*)} (X^T X)^{-1} X^T \left( \Psi^*(Z_i + \tau_* W_i) \right) + o_p(n^{-1/2})$

where $W_i \perp Z_i$, both are iid, and $\Psi^*$ is the effective score.

44 How is it Proved, 1?

Simply apply an existing paper: Bayati, Mohsen, and Andrea Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Transactions on Information Theory 57(2), 2011.

Bayati-Montanari 2011 was designed for a seemingly different problem: asymptotics of the Lasso in the $p > n$ case,
$\min_\beta \; \|\tilde Y - X\beta\|_2^2 / 2 + \lambda \|\beta\|_1$.
The setting was compressed sensing, where $X_{i,j}$ iid $N(0, n^{-1})$ and $n/p_n \to \delta \in (0, 1)$.

The formalism of AMP and state evolution was introduced and developed in:
- DLD, Arian Maleki, and Andrea Montanari, "Message-passing algorithms for compressed sensing," PNAS, 2009.
- DLD, Arian Maleki, and Andrea Montanari, "The noise-sensitivity phase transition in compressed sensing," IEEE Transactions on Information Theory, 2011.

These papers systematically understood and used the Extra Gaussian Noise property of high-dimensional asymptotics. The generality of the Bayati-Montanari treatment easily accommodated M-estimation.

45 Iterative Thresholding

Soft threshold nonlinearity: $\eta(x; \lambda) = (|x| - \lambda)_+ \operatorname{sgn}(x)$.

Iterative solution $\beta^t$, $t = 0, 1, 2, 3, \dots$:
$\beta^0 = 0$
$z^t = \tilde Y - X\beta^t$
$\beta^{t+1} = \eta(\beta^t + X^T z^t; \lambda_t)$

A heuristic to solve the LASSO: $\min_\beta \|\tilde Y - X\beta\|_2^2/2 + \lambda \|\beta\|_1$.
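
A minimal ISTA sketch of this heuristic. One assumption beyond the slide: a step size $1/\|X\|_2^2$ is included (proximal-gradient view) so the iteration converges for a general design; the slide's recursion corresponds to a unit step.

```python
import numpy as np

def soft_threshold(x, lam):
    """eta(x; lam) = (|x| - lam)_+ sgn(x)"""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista(X, Y, lam, n_iter=500):
    """Iterative soft thresholding for min 0.5 ||Y - X beta||^2 + lam ||beta||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # step size 1/L (assumption; the slide uses a unit step)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = Y - X @ beta                     # residual
        beta = soft_threshold(beta + step * (X.T @ z), step * lam)
    return beta
```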

46 Approximate Message Passing (AMP) vs. Iterative Thresholding

First-order Approximate Message Passing (AMP) algorithm:
$z^t = \tilde Y - X\beta^t + \dfrac{1}{\delta}\, z^{t-1} \left\langle \eta_{t-1}'(X^T z^{t-1} + \beta^{t-1}) \right\rangle$,
$\beta^{t+1} = \eta_t(\beta^t + X^T z^t)$,    $\eta_t'(s) = \dfrac{\partial}{\partial s} \eta_t(s)$,
where $\langle \cdot \rangle$ denotes the average of the entries.

Feature: essentially the same cost as iterative soft thresholding.

If $\eta$ = soft thresholding,
$\dfrac{1}{\delta} \left\langle \eta_{t-1}'(X^T z^{t-1} + \beta^{t-1}) \right\rangle = \dfrac{\|\beta^t\|_0}{n}$.
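
A minimal sketch of the first-order AMP recursion above, with soft thresholding; the Onsager term uses the identity $\|\beta^t\|_0/n$ from the slide. The threshold policy $\lambda_t = \alpha\,\|z^t\|/\sqrt{n}$ (with α an assumed tuning constant) is an illustrative choice, not the talk's.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def amp_lasso(X, Y, alpha=2.0, n_iter=30):
    n, p = X.shape
    beta, z = np.zeros(p), Y.copy()
    for _ in range(n_iter):
        tau = np.linalg.norm(z) / np.sqrt(n)            # noise-level estimate from the residual
        beta_new = soft_threshold(beta + X.T @ z, alpha * tau)
        onsager = (np.count_nonzero(beta_new) / n) * z  # (1/delta) <eta'> z = (||beta||_0 / n) z
        z = Y - X @ beta_new + onsager                  # corrected residual
        beta = beta_new
    return beta
```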

47 How is it Proved, 2?

Central recursion in Bayati-Montanari 2011: $h^t, q^t \in \mathbb{R}^N$, $z^t, m^t \in \mathbb{R}^n$; initial condition $q^0$; $m^{-1} = 0$.

$h^{t+1} = A^* m^t - \xi_t q^t$,    $m^t = g_t(b^t, w)$,
$b^t = A q^t - \lambda_t m^{t-1}$,    $q^t = f_t(h^t, x_0)$.

Reaction coefficients: $\xi_t = \langle g_t'(b^t, w) \rangle$;  $\lambda_t = \dfrac{1}{\delta} \langle f_t'(h^t, x_0) \rangle$.

State evolution: $\tau_t^2 = \mathbb{E}\{g_t(\sigma_t Z, W)^2\}$;  $\sigma_t^2 = \dfrac{1}{\delta}\, \mathbb{E}\{f_t(\tau_{t-1} Z, X_0)^2\}$,

where $W \sim F_W$, $X_0 \sim F_{X_0}$.

48 ϑ t+1 = δx T Ψ(W + S t ; b t ) + q t ϑ t ϑ t+1 X T δψ(w + S t ; b t ) q t ϑ t h t+1 = A m t ξ t q t h t+1 A m t ξ t q t S t = Xϑ t + Ψ(W + S t 1 ; b t 1 ) S t X ϑ t 1 Ψ(W + S t 1 ; b t 1 ) b t = Aq t λ t m t 1 b t A q t λ t m t 1 Table : Correspondences between terms in the centered recursions of DLD-Montanari 2013, and recursions analyzed in Bayati-Montanari We get exact correspondence between the two systems, provided we identify δψ(w + S t ; b t ) with m t = g t (b t ; w) and δh t with f t (h t ). One has, in particular, that λ t = 1 δ f t (ht ) = 1, and that ξ t = g t (bt, w) = δψ (W + S t ; b t ) = q t.

49 Full Circle, 1

Why did work on sparse signal recovery solve a problem in robust regression?

Duality of sparsity and robustness:
- An explicit link between solutions of the Lasso with $p > n$ and Huber regression with $p < n$.
- An identity between estimating a sparsely nonzero vector and uncovering outliers in a linear relation.

50 Full Circle, 2

$(\mathrm{Lasso}_\lambda)$   $\min_{\beta \in \mathbb{R}^n} \; \dfrac{1}{2} \|\tilde Y - \tilde X\beta\|_2^2 + \lambda \sum_i |\beta_i|$,  (14)

$(\mathrm{Huber}_\lambda)$   $\min_{\vartheta \in \mathbb{R}^p} \; \sum_{i=1}^n \rho_H(Y_i - \langle X_i, \vartheta\rangle; \lambda)$.  (15)

Let $\tilde X$ be a matrix with orthonormal rows such that $\tilde X X = 0$, i.e.
$\mathrm{null}(\tilde X) = \mathrm{image}(X)$,  (16)
and finally set $\tilde Y = \tilde X Y$.

Proposition. With problem instances $(Y, X)$ and $(\tilde Y, \tilde X)$ related as above, the optimal values of the Lasso problem $(\mathrm{Lasso}_\lambda)$ and the Huber problem $(\mathrm{Huber}_\lambda)$ are identical. The solutions of the two problems are in one-to-one relation. In particular, we have
$\hat\theta = (X^T X)^{-1} X^T (Y - \hat\beta)$.  (17)

(Numerous references: e.g. Art Owen/IPOD & earlier.)
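
A small numerical check of the Proposition (a sketch under illustrative sizes, with scipy.linalg.null_space used to build $\tilde X$ and a plain ISTA loop for the Lasso): the two optimal values should agree and (17) should hold to numerical accuracy.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, p, lam = 60, 10, 1.0
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=p) + rng.standard_t(3, size=n)

# Lasso instance: rows of Xt are an orthonormal basis of null(X^T), and Yt = Xt Y.
Xt = null_space(X.T).T                  # shape (n-p, n); Xt @ X == 0
Yt = Xt @ Y

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

beta = np.zeros(n)                      # ISTA with unit step (valid: ||Xt||_2 = 1)
for _ in range(5000):
    beta = soft_threshold(beta + Xt.T @ (Yt - Xt @ beta), lam)
lasso_val = 0.5 * np.sum((Yt - Xt @ beta) ** 2) + lam * np.sum(np.abs(beta))

def huber_obj(theta):
    r = Y - X @ theta
    a = np.abs(r)
    return np.where(a <= lam, 0.5 * r**2, lam * a - 0.5 * lam**2).sum()

res = minimize(huber_obj, np.zeros(p), method="BFGS")
print("Huber optimal value:", res.fun)
print("Lasso optimal value:", lasso_val)
print("max |theta_hat - (X^T X)^{-1} X^T (Y - beta_hat)| =",
      np.max(np.abs(res.x - np.linalg.solve(X.T @ X, X.T @ (Y - beta)))))
```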

51 Full Circle, 3

Isometry between the performance of:
- the Lasso in an $\varepsilon$-sparse regression problem, $p > n$;
- Huber (M)-estimation with $\varepsilon$-contaminated data, $p < n$.

52 Two Scalar Minimax Problems

Huber (1964) minimax problem:
$v(\varepsilon) = \min_\lambda \sup_H \left\{ V(\psi_\lambda, F) : F = (1 - \varepsilon)\Phi + \varepsilon H \right\}$
Here $\psi_\lambda$ is the Huber $\psi$ capped at $\lambda$, and $\Phi$ is $N(0, 1)$.

Minimax MSE under sparse means (DLD & Johnstone, 1992):
$m(\varepsilon) = \inf_\lambda \sup_H \left\{ \mathbb{E}_F \left(\eta_\lambda(X + Z) - X\right)^2 : X \sim (1 - \varepsilon)\nu_0 + \varepsilon H \right\}$
Here $\eta_\lambda$ is soft thresholding at $\lambda$, $\nu_0$ is a point mass at zero, and $Z \sim N(0,1)$ is independent of $X$.

53 Full Circle, 4 (w/ Montanari, Johnstone)

HDA regression with a fraction $\varepsilon$ of contaminated errors, $\delta = n/p$:
$V^*(\varepsilon, \delta) \equiv \min_\lambda \max_H \lim_n \operatorname{Ave}_i \operatorname{Var}(\hat\theta_i)$
$V^*(\varepsilon, \delta) = \begin{cases} +\infty & v(\varepsilon) > \delta \\[4pt] \dfrac{v(\varepsilon)}{1 - v(\varepsilon)/\delta} & v(\varepsilon) < \delta \end{cases}$

HDA sparse regression, $\delta = n/p < 1$: $\tilde Y = X\beta_0 + \sigma Z$, $Z$ iid $N(0,1)$, $X_{i,j}$ iid $N(0, n^{-1})$, $\beta_0$ is $\varepsilon$-sparse:
$M^*(\varepsilon, \delta) \equiv \sup_{\sigma > 0} \min_\lambda \max_{\|\beta_0\|_0/n \le \varepsilon} \lim_n \mathrm{MSE}(\hat\beta_\lambda, \beta_0)/\sigma^2$
$M^*(\varepsilon, \delta) = \begin{cases} +\infty & m(\varepsilon) > \delta \\[4pt] \dfrac{m(\varepsilon)}{1 - m(\varepsilon)/\delta} & m(\varepsilon) < \delta \end{cases}$

54 Conclusions

- High-dimensional asymptotics imposes extra Gaussian noise in estimation, not seen classically.
- Approximate Message Passing: a new algorithm to understand and analyze M-estimates, and a new type of analysis to obtain properties of estimates.
- New phenomena become visible, e.g. phase transitions in (M)-estimation, previously unknown.
- Alternate approach: N. El Karoui (arXiv).
