M-Estimation under High-Dimensional Asymptotics
1 M-Estimation under High-Dimensional Asymptotics
2 Classical M-estimation | Big Data M-estimation

"An out-of-the-park grand-slam home run" (Richard Olshen, on Huber's paper, Annals of Mathematical Statistics, 1964)
4 M-estimation Basics

Location model: Y_i = θ + Z_i, i = 1, ..., n.
Errors: Z_i ~ F, not necessarily Gaussian.
Loss function: ρ(t), e.g. t², |t|, −log f(t), ...

(M)   min_θ Σ_{i=1}^n ρ(Y_i − θ).

Asymptotic distribution: √n (θ̂_n − θ) →_D N(0, V), n → ∞.
Asymptotic variance, with ψ = ρ′:
V(ψ, F) = ∫ ψ² dF / (∫ ψ′ dF)².
Information bound: V(ψ, F) ≥ 1 / I(F).
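The location M-estimate defined by (M) can be computed by iteratively reweighted averaging, which for the Huber loss is a majorize-minimize scheme. A minimal numpy sketch; the tuning constant λ = 1.345, the median starting point, and the function name are illustrative choices, not from the slides:

```python
import numpy as np

def m_estimate_location(y, lam=1.345, iters=200):
    """Huber location M-estimate: solve sum_i psi(y_i - theta) = 0 with
    psi(z) = min(max(z, -lam), lam), via IRLS with Huber weights."""
    theta = np.median(y)  # robust starting point
    for _ in range(iters):
        r = y - theta
        # Huber weights w_i = psi(r_i)/r_i = min(1, lam/|r_i|)
        w = np.minimum(1.0, lam / np.maximum(np.abs(r), 1e-12))
        theta = np.sum(w * y) / np.sum(w)  # weighted-average update
    return theta
```

Each update is an exact minimization of a quadratic majorant of the Huber objective, so the objective decreases monotonically.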
5 The One-Step Viewpoint

[Facsimile of the first page of: P. J. Bickel, "One-Step Huber Estimates in the Linear Model", JASA 1975. The paper introduces simple "one-step" versions of Huber's (M) estimates for the linear model, discusses scale-equivariant variants obtained with an auxiliary scale estimate (Huber's Proposal 2, the normalized and symmetrized interquartile ranges), and examines large-sample behavior under very mild regularity conditions.]
6 Regression M-estimation: the One-Step Viewpoint

Regression model: Y_i = X_i′θ + Z_i, with Z_i iid F, i = 1, ..., n.
Objective function of (M): R(ϑ) = Σ_{i=1}^n ρ(Y_i − X_i′ϑ);  (M) min_ϑ R(ϑ).
One-step estimate: with θ̃_n any √n-consistent estimate of θ,
θ̂¹ = θ̃_n − [Hess R(θ̃_n)]⁻¹ ∇R(θ̃_n).
Effectiveness: with θ̂ the true solution of the M-equation,
E (θ̂¹ − θ̂)(θ̂¹ − θ̂)′ = o(n⁻¹).
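The one-step estimate above is a single Newton step on R at a √n-consistent pilot. A sketch for the Huber loss, taking least squares as an (assumed) pilot estimator; the function name is mine:

```python
import numpy as np

def one_step_huber(Y, X, theta_init, lam=1.345):
    """One Newton step on R(theta) = sum_i rho_H(Y_i - X_i'theta; lam):
    theta1 = theta_init - [Hess R]^{-1} grad R, evaluated at theta_init."""
    r = Y - X @ theta_init
    psi = np.clip(r, -lam, lam)              # psi = rho', the Huber score
    w = (np.abs(r) < lam).astype(float)      # psi'(r) for the Huber loss
    grad = -X.T @ psi                        # grad R(theta_init)
    hess = X.T @ (w[:, None] * X)            # Hess R(theta_init)
    return theta_init - np.linalg.solve(hess, grad)
```

With contaminated errors, the single Newton step already captures most of the efficiency gain of the full M-estimate over the least-squares pilot.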
7 Driving Idea of Classical Asymptotics

The M-estimate is asymptotically equivalent to a single step of Newton's method for finding a zero of ∇R, starting at the true underlying parameter. This goes back to Fisher's method of scoring for the MLE.
8 Derivation of Asymptotic Variance Formula

Approximation to the one-step:
θ̂¹ = θ + (1 / B(ψ, F)) (X′X)⁻¹ X′(ψ(Z_i)) + o_p(n^{−1/2}),
where B(ψ, F) = ∫ ψ′ dF.
Observe that
Var((X′X)⁻¹ X′(ψ(Z_i))) ≈ (X′X)⁻¹ A(ψ, F), where A(ψ, F) = ∫ ψ² dF.
Hence if X_{i,j} ~ N(0, 1/n),
Var(θ̂_i − θ_i) ≈ A(ψ, F) / B(ψ, F)² = V(ψ, F).
9 Asymptotics for Regression, I
10 Asymptotics for Regression, II. PJ Huber, Annals of Statistics 1973
13 [Facsimile of the first page of: Noureddine El Karoui, Derek Bean, Peter J. Bickel, Chinghway Lim, and Bin Yu, "On robust regression with high-dimensional predictors", PNAS 2013 (contributed by Peter J. Bickel). The paper studies regression M-estimates when p, the number of covariates, and n, the number of observations, are both large but p < n. It finds an exact stochastic representation for β̂ = argmin_β Σ_{i=1}^n ρ(Y_i − X_i′β) at fixed p and n; a scalar random variable with deterministic limit r_ρ(κ) as p/n → κ > 0 plays a central role, characterized by a nonlinear system of two deterministic equations involving proximal mappings of ρ. Results are given for Gaussian or Gaussian-derived covariates, with leave-one-out perturbation arguments; coordinates and contrasts with coefficients independent of the covariates remain unbiased and asymptotically normal at scale n^{−1/2}. Surprisingly, the findings go against the likelihood principle: when p/n is large enough, least squares becomes preferable to least absolute deviations for double-exponential errors.]
14 Bickel, Yu, El Karoui
15 High-Dimensional Asymptotics (HDA)

n, p_n → ∞;  X_{i,j} iid N(0, 1/n);  p_n⁻¹ ‖θ_{0,n}‖₂² → τ₀²;
Y = X θ₀ + Z;  n/p_n → δ ∈ (1, ∞).
Meaning of δ: the number of observations per parameter to be estimated.
16 Emergent Phenomena under HDA

Classical setting (random design, p fixed, n → ∞):
Var(θ̂_i) ≈ V(ψ, F), n → ∞.
HDA setting, for n/p_n → δ ∈ (1, ∞):
- Effective score Ψ_eff = Ψ_{δ,ψ,F} (to be described...)
- Effective error distribution F_eff = F ⋆ N(0, τ_*²): extra Gaussian noise, τ_* = τ_*(δ, ψ, F).
Asymptotic variance under HDA:
Var(θ̂_i) ≈ V(Ψ_eff, F_eff), n, p_n → ∞.
Classical correspondence: Ψ_{δ,ψ,F} → ψ and F_{δ,ψ,F} → F as δ → ∞.
17 Immediate Implications

1. Classical formulas for confidence statements about M-estimates are overoptimistic under high-dimensional asymptotics (even dramatically so).
2. Maximum likelihood estimates are inefficient under high-dimensional asymptotics: the score ψ = −(log f)′ does not yield an efficient estimator.
3. The usual Fisher information bound is not attainable, as I(F_eff) < I(F).
18 Sample Implication of DLD & Montanari (2013)

Corollary. For an M-estimator under HDA with errors Z_i iid F:
lim_n Ave_i Var(θ̂_i) ≥ (1 / (1 − 1/δ)) · (1 / I(F)).
The effect of the HDA parameter δ ∈ (1, ∞) is evident.
20 Regularized ρ

Definition: ρ: ℝ → ℝ is smooth if it is C¹ with absolutely continuous derivative ψ = ρ′ having a bounded a.e. derivative ψ′.
Excludes ℓ₁: ρ_{ℓ₁}(z) = |z|. Allows Huber: ρ_H(z) = min(z²/2, λ|z| − λ²/2).
Regularized ρ:
ρ_b(z) ≡ min_{x ∈ ℝ} { b ρ(x) + (1/2)(x − z)² },   (1)
the min-convolution of ρ with a squared loss.
21 Regularized Score Function

Regularized score function: Ψ(z; b) = ρ_b′(z).
Example: Huber loss ρ_H(z; λ), with score function ψ(z; λ) = min(max(−λ, z), λ):
Ψ(z; b) = b ψ(z / (1 + b); λ).
Ψ looks like ψ, but with central slope Ψ′(·; b) = b/(1 + b) < 1.
Effective score: for a special choice b_*, Ψ_eff = Ψ(z; b_*).
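The closed form above can be checked against the definition: Ψ(·; b) is the derivative of the Moreau envelope ρ_b in (1). A brute-force numpy sketch, minimizing over a grid; the grid range and step are arbitrary choices of mine:

```python
import numpy as np

def huber_rho(z, lam=3.0):
    # rho_H(z) = z^2/2 for |z| <= lam, lam*|z| - lam^2/2 otherwise
    return np.where(np.abs(z) <= lam, 0.5 * z**2,
                    lam * np.abs(z) - 0.5 * lam**2)

def rho_b(z, b, lam=3.0, grid=np.linspace(-60, 60, 240001)):
    # Moreau envelope (min-convolution), eq. (1): min_x { b*rho(x) + (x-z)^2/2 }
    return np.min(b * huber_rho(grid, lam) + 0.5 * (grid - z)**2)

def Psi(z, b, lam=3.0):
    # closed form for the Huber loss: Psi(z; b) = b * psi(z/(1+b); lam)
    return b * np.clip(z / (1.0 + b), -lam, lam)
```

A central difference of the envelope reproduces the closed-form score to within the grid resolution.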
22 Approximate Message Passing Algorithm

Initialization: θ⁰ = 0.
Adjusted residuals:
R^t = Y − X θ^t + Ψ(R^{t−1}; b_{t−1});   (2)
Effective score: choose b_t > 0 achieving empirical average slope p/n ∈ (0, 1):
p/n = (1/n) Σ_{i=1}^n Ψ′(R_i^t; b).   (3)
Scoring: apply the effective score function Ψ(R^t; b_t):
θ^{t+1} = θ^t + δ Xᵀ Ψ(R^t; b_t).   (4)
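Steps (2)-(4) can be implemented directly for the Huber loss, using the closed-form Ψ(z; b) = b ψ(z/(1+b); λ) from the previous slide and a bisection search for b_t in (3). A sketch under the slides' assumptions (X with iid N(0, 1/n) entries, δ = n/p > 1); the function names and the bisection bracket are my choices:

```python
import numpy as np

def Psi(z, b, lam):
    # regularized Huber score: Psi(z; b) = b * psi(z/(1+b); lam)
    return b * np.clip(z / (1.0 + b), -lam, lam)

def Psi_prime(z, b, lam):
    # Psi'(z; b) = b/(1+b) on |z| < (1+b)*lam, 0 outside
    return np.where(np.abs(z) < (1.0 + b) * lam, b / (1.0 + b), 0.0)

def choose_b(r, lam, target):
    # smallest b > 0 with empirical average slope (1/n) sum_i Psi'(r_i; b)
    # equal to target = p/n, eq. (3); the slope is increasing in b, so bisect
    lo, hi = 1e-9, 1e9
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if Psi_prime(r, mid, lam).mean() < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def amp_huber(Y, X, lam, iters=100):
    n, p = X.shape
    delta = n / p
    theta = np.zeros(p)          # initialization theta^0 = 0
    correction = np.zeros(n)     # Psi(R^{t-1}; b_{t-1}), zero at t = 0
    for _ in range(iters):
        R = Y - X @ theta + correction          # adjusted residuals, eq. (2)
        b = choose_b(R, lam, p / n)             # average-slope condition, eq. (3)
        score = Psi(R, b, lam)
        theta = theta + delta * (X.T @ score)   # scoring step, eq. (4)
        correction = score
    return theta
```

At a fixed point, XᵀΨ(R; b) = 0 and Y − Xθ = R − Ψ(R; b) = prox_{bρ}(R), so ψ(Y − Xθ) = Ψ(R; b)/b and the M-estimating equation Xᵀψ(Y − Xθ) = 0 holds.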
23 Determining the Regularization Parameter b_t

[Figure: example with ρ = ρ_H(·; 3), F = 0.95 N(0,1) + 0.05 H_10, δ = 5. Left: determination of b from the average-slope condition. Right: history of b_t over AMP iterations t, settling at b_*.]
24 Connection of AMP to M-estimation

Lemma. Let (θ^∞, R^∞, b^∞) be a fixed point of the AMP iteration (2), (3), (4) having b^∞ > 0. Then θ^∞ is a minimizer of problem (M). Vice versa, any minimizer θ̂ of problem (M) corresponds to an AMP fixed point of the form (θ̂, R^∞, b^∞).
25 Example

Size: n = 1000, p = 200, so δ = 5.
Truth: θ₀ random, with ‖θ₀‖₂² = 6p.
Errors: F = CN(0.05, 10), i.e. F = 0.95 Φ + 0.05 H_10, where H_x denotes a unit mass at x.
Loss: ρ = ρ_H(z; λ) with λ = 3.
Iterations: run AMP for 20 iterations.
Comparison: use CVX to obtain the Huber estimator directly.
26 Convergence of AMP in Example

[Figure: two panels against AMP iteration t. Left: RMSE(θ^t, θ₀), convergence of the AMP iterate θ^t to θ₀, alongside the M-estimate. Right: RMSE(θ^t, θ̂), convergence of θ^t to the directly computed Huber estimator θ̂.]
27 Contrast AMP with Traditional Scoring

Traditional residual: Z̃ = Y − X θ̃.
Traditional scoring:
θ̂¹ = θ̃ + [n⁻¹ Σ_{i=1}^n ψ′(Z̃_i)]⁻¹ (X′X)⁻¹ X′(ψ(Z̃_i));   (5)
AMP:
θ^{t+1} = θ^t + δ Xᵀ Ψ(R^t; b_t).

Object         | Scoring                | AMP
Average slope  | Σ_i ψ′(Z̃_i)/n         | Σ_i Ψ′(R_i^t; b_t)/n = δ⁻¹
Gram           | (X′X)⁻¹                | 1 (identity)
Residual       | traditional (Z̃_i)     | adjusted (R_i^t)
Score          | ψ(·)                   | Ψ(·; b_t)
Basepoint      | true θ                 | current iterate
Iterations     | 1                      | t ≥ 1
28 Basic Contrasts between HDA and Classical Asymptotics

Under the high-dimensional asymptotic, δ ∈ (1, ∞):
- The heuristic "expand around the true θ" is incorrect; it understates the true uncertainty.
- The heuristic "errors equal the residuals" is incorrect; the residuals are noisier.
- The heuristic "take one step" is incorrect; one must take many steps.
29 Extra Variance at Initialization of AMP

Initialize with θ⁰ = 0 and correction term Ψ(R^{−1}; ·) = 0.
Initial residual: R⁰ = Y − X θ⁰ = Z + X(θ₀ − θ⁰).
The terms Z and X(θ₀ − θ⁰) are independent.
X(θ₀ − θ⁰) is Gaussian with variance τ₀² = ‖θ₀ − θ⁰‖₂²/n = MSE(θ⁰, θ₀)/δ.
Var(R_i⁰) = Var(Z) + Var(X(θ₀ − θ⁰)) = Var(Z) + MSE(θ⁰, θ₀)/δ.
Extra Gaussian noise: τ₀² = MSE(θ⁰, θ₀)/δ.
30 Evolution of Extra Variance at Later AMP Iterations

Pretend R^t ≈ Z + τ_t W, with W an independent standard normal.
Define the variance map
V(τ², b; δ, F) = δ E{ Ψ(Z + τW; b)² },
and the slope parameter b = b(τ; δ, F), the smallest solution b ≥ 0 of
1/δ = E{ Ψ′(Z + τW; b) }.   (6)
Definition: the dynamical system {τ_t²}_{t≥0}, starting at τ₀² ≥ 0, is
τ_{t+1}² = V(τ_t², b(τ_t)) = V(τ_t², b(τ_t; δ, F); δ, F).   (7)
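The dynamical system (6)-(7) can be iterated by Monte Carlo for the running example (F = 0.95 N(0,1) + 0.05 H_10, Huber loss with λ = 3, δ = 5); the sample size, iteration counts, and bisection bracket below are arbitrary choices of mine:

```python
import numpy as np

def state_evolution(tau2, delta=5.0, lam=3.0, eps=0.05, n_mc=100000,
                    iters=20, seed=3):
    """Iterate tau^2_{t+1} = V(tau^2_t, b(tau_t)), eqs. (6)-(7), by Monte
    Carlo, for F = (1-eps) N(0,1) + eps * (unit mass at 10), Huber loss."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=n_mc)
    Z[rng.random(n_mc) < eps] = 10.0          # errors Z ~ F
    W = rng.normal(size=n_mc)                 # extra Gaussian noise
    for _ in range(iters):
        eff = Z + np.sqrt(tau2) * W
        # b(tau): smallest b with E Psi'(Z + tau W; b) = 1/delta, eq. (6)
        lo, hi = 1e-9, 1e9
        for _ in range(60):
            b = 0.5 * (lo + hi)
            slope = np.mean(np.where(np.abs(eff) < (1 + b) * lam,
                                     b / (1 + b), 0.0))
            lo, hi = (b, hi) if slope < 1.0 / delta else (lo, b)
        b = 0.5 * (lo + hi)
        psi_eff = b * np.clip(eff / (1 + b), -lam, lam)
        tau2 = delta * np.mean(psi_eff ** 2)  # variance map V
    return tau2
```

Starting from τ₀² = MSE(0, θ₀)/δ = 6/5 in the running example, the iteration settles near a fixed point, independently of the starting value.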
31 Variance Map, in Running Example

[Figure: the variance map V(τ²) plotted against the line y = x, output variance versus input variance, for the contaminated normal F = 0.95 N(0,1) + 0.05 H_10 and ρ_H with λ = 3; fixed point at (0.472, 0.472).]
32 Dynamics of τ_t², in Running Example

[Figure: cobweb dynamics of τ_t² under the variance map V(τ²) against y = x, same setting: F = 0.95 N(0,1) + 0.05 H_10, ρ_H with λ = 3.]
33 Operating Characteristics from State Evolution

The state evolution formalism assigns predictions E{· | S} under states S = (τ, b, δ, F).
Functions ξ of θ̂ − θ₀: predict p⁻¹ Σ_{i≤p} ξ(θ̂_i − θ_{0,i}) by
E(ξ(θ̂ − ϑ) | S) ≡ E{ ξ(√δ τ W) }.
Functions ξ₂ of residual and error: predict n⁻¹ Σ_{i=1}^n ξ₂(R_i, Z_i) by
E(ξ₂(R, Z) | S) ≡ E{ ξ₂(Z + τW, Z) },
where W ~ N(0, 1) and Z ~ F is independent of W.
34 Example of Predictions

MSE at iteration t: with S_t = (τ_t, b(τ_t), δ, F) the state of AMP at iteration t, predict
MSE(θ^t, θ₀) ≈ E((ϑ̂ − ϑ)² | S_t) = E{ (√δ τ_t W)² } = δ τ_t².
MSE at convergence: with τ_* > 0 the limit of τ_t and S_* = (τ_*, b(τ_*), δ, F) the equilibrium state of AMP,
MSE(θ̂, θ₀) ≈ E((ϑ̂ − ϑ)² | S_*) = E{ (√δ τ_* W)² } = δ τ_*².
Ordinary residuals Y − X θ̂ at AMP convergence: setting η(z; b) = z − Ψ(z; b),
Y − X θ̂ →_D η(Z + τ_* W; b_*).
35 [Figure: experimental means from 10 simulations compared with state-evolution predictions, under CN(0.05, 10) with Huber ψ, λ = 3. Upper left: τ̂_t = ‖θ^t − θ₀‖₂/√n. Upper right: b̂_t. Lower left: MSE, mean squared error. Lower right: MAE, mean absolute error. Blue + symbols: empirical means of AMP observables. Green curve: theoretical predictions by SE.]
36 Lower Bounds on τ_t²

Lemma. Suppose that F has a well-defined Fisher information I(F). Then for any t > 0,
τ_t² ≥ 1 / (δ I(F)).
Lemma (uses Barron-Madiman, 2007).
I(F ⋆ N(0, τ²)) ≤ I(F) / (1 + τ² I(F)).
Corollary. For t > k,
τ_t² ≥ (1 + δ⁻¹ + ⋯ + δ⁻ᵏ) / (δ I(F)).
Corollary. For every accumulation point τ_*² of {τ_t²},
τ_*² ≥ (1 / (δ − 1)) · (1 / I(F)).
Corollary. For an M-estimator under HDA with errors Z_i iid F:
lim_n Ave_i Var(θ̂_i) ≥ (1 / (1 − 1/δ)) · (1 / I(F)).
37 Basic Assumptions

A1 The discrepancy function ρ is convex and smooth;
A2 the matrices {X(n)}_n have iid N(0, 1/n) entries;
A3 θ₀ and θ⁰ = 0 are deterministic sequences such that AMSE(θ⁰, θ₀) = δτ₀²;
A4 F has a finite second moment.
Terminology: let {τ_t²}_{t≥0} denote the state evolution sequence with initial condition τ₀², and let {θ^t, R^t}_{t≥0} be the AMP trajectory with parameters b_t.
Definition: a function ξ: ℝᵏ → ℝ is pseudo-Lipschitz if there exists L < ∞ such that, for all x, y ∈ ℝᵏ,
|ξ(x) − ξ(y)| ≤ L (1 + ‖x‖₂ + ‖y‖₂) ‖x − y‖₂.
38 Theorem

Assume A1-A4. Let ξ: ℝ → ℝ and ξ₂: ℝ × ℝ → ℝ be pseudo-Lipschitz functions. Then, for any t > 0, we have, for W ~ N(0, 1) independent of Z ~ F:
lim_n p⁻¹ Σ_{i=1}^p ξ(θ_i^t − θ_{0,i}) =_a.s. E{ ξ(√δ τ_t W) },   (8)
lim_n n⁻¹ Σ_{i=1}^n ξ₂(R_i^t, Z_i) =_a.s. E{ ξ₂(Z + τ_t W, Z) }.   (9)
39 Corollary of Correctness

Under HDA, we can define
AMSE(θ^t, θ₀) = a.s.-lim_{n,p} ‖θ^t − θ₀‖₂² / p,
and then
AMSE(θ^t, θ₀) = δ τ_t².
40 Convergence of AMP to M-Estimator, 2

Theorem. Assume A1-A4, that ρ is strongly convex, and that δ > 1. Let (τ_*, b_*) be a solution of the two equations
τ² = δ E{ Ψ(Z + τW; b)² },   (10)
1/δ = E{ Ψ′(Z + τW; b) }.   (11)
Assume that AMSE(θ⁰, θ₀) = δτ₀². Then
lim_t AMSE(θ^t, θ̂) = 0.   (12)
41 Main Result: Asymptotic Variance Formula under High-Dimensional Asymptotics

Corollary. Assume the setting of the previous theorem. Then
lim_{n,p} Ave_{i∈[p]} Var(θ̂_i) =_a.s. V(Ψ_eff, F_eff),   (13)
where the effective score is Ψ_eff(·) = Ψ(·; b_*), and the effective noise distribution is F_eff = F ⋆ N(0, τ_*²). Here (τ_*, b_*) is the unique solution of equations (10)-(11).
42 Driving Idea of High-Dimensional Asymptotics, 1

- The M-estimate is asymptotically equivalent to the limit of many steps of AMP.
- The first step of AMP contains extra Gaussian noise.
- The extra Gaussian noise is caused by errors in the initial parameter estimates.
- The extra Gaussian noise declines with AMP iterations but does not go away completely; it declines to the level defined by the fixed-point equations of state evolution.
- The extra Gaussian noise in the M-estimate is characterized by the fixed-point equations of state evolution.
43 Driving Idea of High-Dimensional Asymptotics, 2

Classically, the M-estimate propagates the true errors through the score:
θ̂ = θ + (1 / B(ψ, F)) (X′X)⁻¹ X′(ψ(Z_i)) + o_p(n^{−1/2}).
Under HDA, it propagates noisier effective errors through the effective score:
θ̂ = θ + (1 / B(Ψ_eff, F_eff)) n⁻¹ X′(Ψ_eff(Z_i + τ_* W_i)) + o_p(n^{−1/2}),
where W_i is independent of Z_i, both are iid, and Ψ_eff is the effective score.
44 How Is It Proved, 1?

Simply apply an existing paper:
Bayati, Mohsen, and Andrea Montanari. "The dynamics of message passing on dense graphs, with applications to compressed sensing." IEEE Trans. Inform. Theory 57.2 (2011).
Bayati-Montanari 2011 was designed for a seemingly different problem: asymptotics of the Lasso in the p > n case,
min_β ‖Ỹ − X̃β‖₂² / 2 + λ‖β‖₁.
The setting was compressed sensing, where X̃_{i,j} iid N(0, 1/n) and n/p_n → δ ∈ (0, 1).
The formalism of AMP and state evolution was introduced and developed in:
DLD, Arian Maleki, and Andrea Montanari. "Message-passing algorithms for compressed sensing." PNAS (2009).
DLD, Arian Maleki, and Andrea Montanari. "The noise-sensitivity phase transition in compressed sensing." IEEE Trans. Inform. Theory (2011).
These papers systematically understood and used the extra-Gaussian-noise property of high-dimensional asymptotics. The generality of the Bayati-Montanari treatment easily accommodated M-estimation.
45 Iterative Thresholding

Soft-threshold nonlinearity: η(x; λ) = (|x| − λ)₊ sgn(x).
Iterative solution β^t, t = 0, 1, 2, 3, ...:
β⁰ = 0,
z^t = Ỹ − X̃ β^t,
β^{t+1} = η(β^t + X̃′ z^t; λ_t).
A heuristic to solve the LASSO: min_β ‖Ỹ − X̃β‖₂² / 2 + λ‖β‖₁.
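The iteration above uses a unit gradient step, which only converges when ‖X̃‖ ≤ 1; a standard convergent variant scales the step and threshold by the Lipschitz constant L = ‖X̃‖². A sketch (the step-size handling is my addition, not on the slide):

```python
import numpy as np

def soft(x, lam):
    # soft-threshold nonlinearity eta(x; lam) = (|x| - lam)_+ sgn(x)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista(Y, X, lam, iters=3000):
    # iterative soft thresholding for min_b 0.5*||Y - X b||^2 + lam*||b||_1
    L = np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        b = soft(b + X.T @ (Y - X @ b) / L, lam / L)
    return b
```

At convergence, the output is a fixed point of the thresholded gradient step, which is exactly the optimality condition of the LASSO.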
46 Approximate Message Passing (AMP) ≈ Iterative Thresholding

First-order Approximate Message Passing (AMP) algorithm:
z^t = Ỹ − X̃ β^t + (1/δ) z^{t−1} ⟨η′_{t−1}(X̃′ z^{t−1} + β^{t−1})⟩,
β^{t+1} = η_t(β^t + X̃′ z^t),
where η′_t(s) = ∂η_t(s)/∂s and ⟨·⟩ averages over coordinates.
Feature: essentially the same cost per iteration as iterative soft thresholding.
If η = soft thresholding, then (1/δ) ⟨η′_{t−1}(X̃′ z^{t−1} + β^{t−1})⟩ = ‖β^t‖₀ / n.
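A sketch of the first-order AMP iteration with soft thresholding. The threshold policy λ_t = α ‖z^t‖₂/√n (threshold proportional to the estimated noise level of the adjusted residual) is an assumed choice in the spirit of the compressed-sensing papers cited on the previous slide, not something specified here:

```python
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def amp_lasso(Y, X, alpha=2.0, iters=50):
    """First-order AMP: iterative soft thresholding plus the Onsager
    correction (1/delta) * z^{t-1} * <eta'>. For soft thresholding the
    average slope <eta'> times 1/delta equals ||beta^t||_0 / n."""
    n, p = X.shape
    beta = np.zeros(p)
    z = Y.copy()
    for _ in range(iters):
        lam_t = alpha * np.linalg.norm(z) / np.sqrt(n)   # assumed policy
        beta_new = soft(beta + X.T @ z, lam_t)
        z = Y - X @ beta_new + z * (np.count_nonzero(beta_new) / n)
        beta = beta_new
    return beta
```

In a sparse, noiseless compressed-sensing instance well inside the recovery region, the iteration reconstructs the signal essentially exactly.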
47 How Is It Proved, 2?

Central recursion in Bayati-Montanari 2011: h^t, q^t ∈ ℝᴺ; z^t, m^t ∈ ℝⁿ; initial condition q⁰, m^{−1} = 0.
h^{t+1} = A* m^t − ξ_t q^t,   m^t = g_t(b^t, w),
b^t = A q^t − λ_t m^{t−1},   q^t = f_t(h^t, x₀).
Reaction coefficients: ξ_t = ⟨g′_t(b^t, w)⟩;  λ_t = (1/δ) ⟨f′_t(h^t, x₀)⟩.
State evolution:
τ_t² = E{ g_t(σ_t Z, W)² };   σ_t² = (1/δ) E{ f_t(τ_{t−1} Z, X₀)² },
where W ~ F_W and X₀ ~ F_{X₀}.
48

Correspondences between terms in the centered recursions of DLD-Montanari 2013 (left) and the recursions analyzed in Bayati-Montanari 2011 (right):

ϑ^{t+1} = δ Xᵀ Ψ(W + S^t; b_t) + q_t ϑ^t   ↔   h^{t+1} = A* m^t − ξ_t q^t
S^t = X ϑ^t + Ψ(W + S^{t−1}; b_{t−1})   ↔   b^t = A q^t − λ_t m^{t−1}

We get an exact correspondence between the two systems, provided we identify δ Ψ(W + S^t; b_t) with m^t = g_t(b^t; w) and δ h^t with f_t(h^t). One has, in particular, that λ_t = (1/δ) ⟨f′_t(h^t)⟩ = 1, and that ξ_t = ⟨g′_t(b^t, w)⟩ = δ ⟨Ψ′(W + S^t; b_t)⟩ = 1.
49 Full Circle, 1

Why did work on sparse signal recovery solve a problem in robust regression?
Duality of sparsity and robustness:
- an explicit link between solutions of the Lasso with p > n and Huber regression with p < n;
- an identity between estimating a sparsely nonzero vector and uncovering outliers in a linear relation.
50 Full Circle, 2

(Lasso_λ)   min_{β ∈ ℝⁿ} (1/2) ‖Ỹ − X̃β‖₂² + λ Σ_{i=1}^n |β_i|,   (14)
(Huber_λ)   min_{ϑ ∈ ℝᵖ} Σ_{i=1}^n ρ_H(Y_i − ⟨X_i, ϑ⟩; λ).   (15)
Let X̃ be a matrix with orthonormal rows such that X̃X = 0, i.e.
null(X̃) = image(X),   (16)
and finally set Ỹ = X̃Y.
Proposition. With problem instances (Y, X) and (Ỹ, X̃) related as above, the optimal values of the Lasso problem (Lasso_λ) and the Huber problem (Huber_λ) are identical, and the solutions of the two problems are in one-to-one relation. In particular, we have
θ̂ = (XᵀX)⁻¹ Xᵀ (Y − β̂).   (17)
(Numerous references: e.g. Art Owen/IPOD and earlier.)
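The proposition can be verified numerically: writing ρ_H as the inf-convolution of z²/2 with λ|z| turns (Huber_λ) into a joint minimization over (ϑ, β) of ½‖Y − Xϑ − β‖² + λ‖β‖₁, and eliminating ϑ leaves exactly (Lasso_λ) on (Ỹ, X̃). A sketch; the solvers (alternating minimization and ISTA) and problem sizes are my choices:

```python
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def huber_min(Y, X, lam, iters=3000):
    # min over (theta, beta) of 0.5*||Y - X theta - beta||^2 + lam*||beta||_1,
    # by alternating exact minimization; its optimal value equals (Huber_lam)
    beta = np.zeros(len(Y))
    for _ in range(iters):
        theta, *_ = np.linalg.lstsq(X, Y - beta, rcond=None)
        beta = soft(Y - X @ theta, lam)
    val = 0.5 * np.sum((Y - X @ theta - beta) ** 2) + lam * np.sum(np.abs(beta))
    return theta, beta, val

def lasso_min(Yt, Xt, lam, iters=3000):
    # ISTA for (Lasso_lam); Xt has orthonormal rows, so a unit step is valid
    b = np.zeros(Xt.shape[1])
    for _ in range(iters):
        b = soft(b + Xt.T @ (Yt - Xt @ b), lam)
    return b, 0.5 * np.sum((Yt - Xt @ b) ** 2) + lam * np.sum(np.abs(b))
```

Building X̃ from the left singular vectors of X orthogonal to its column space gives orthonormal rows with X̃X = 0, as in (16); the optimal values then agree and (17) recovers θ̂ from the Lasso solution.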
51 Full Circle, 3

Isometry between the performance of:
- the Lasso in the ε-sparse regression problem, p > n;
- Huber (M)-estimation with ε-contaminated data, p < n.
52 Two Scalar Minimax Problems

Huber (1964) minimax problem:
v(ε) = min_λ sup_H { V(ψ_λ, F) : F = (1 − ε)Φ + εH }.
Here ψ_λ is the Huber ψ capped at λ, and Φ is N(0, 1).
Minimax MSE under sparse means (DLD & Johnstone, 1992):
m(ε) = inf_λ sup_H { E_F (η_λ(X + Z) − X)² : X ~ (1 − ε)ν₀ + εH },
with Z ~ N(0, 1). Here η_λ is soft thresholding at λ, and ν₀ is the point mass at zero.
53 Full Circle, 4 (w/ Montanari, Johnstone)

HDA regression with a fraction ε of contaminated errors, δ = n/p:
V(ε, δ) ≡ min_λ max_H lim_n Ave_i Var(θ̂_i),
V(ε, δ) = +∞ if v(ε) > δ;  V(ε, δ) = v(ε) / (1 − v(ε)/δ) if v(ε) < δ.
HDA sparse regression, δ = n/p < 1:
Ỹ = X̃β₀ + σZ, Z iid N(0, 1), X̃_{i,j} iid N(0, 1/n), β₀ ε-sparse.
M(ε, δ) ≡ sup_{σ>0} min_λ max_{‖β₀‖₀/n ≤ ε} lim_n MSE(β̂_λ, β₀) / σ²,
M(ε, δ) = +∞ if m(ε) > δ;  M(ε, δ) = m(ε) / (1 − m(ε)/δ) if m(ε) < δ.
54 Conclusions

- High-dimensional asymptotics imposes extra Gaussian noise in estimation, not seen classically.
- Approximate Message Passing: a new algorithm with which to understand and analyze estimates, and a new type of analysis for obtaining their properties.
- New phenomena become visible, e.g. previously unknown phase transitions in (M)-estimation.
- Alternate approach: N. El Karoui, arXiv:
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationTheoretical results for lasso, MCP, and SCAD
Theoretical results for lasso, MCP, and SCAD Patrick Breheny March 2 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/23 Introduction There is an enormous body of literature concerning theoretical
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationHigh-dimensional Covariance Estimation Based On Gaussian Graphical Models
High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou, Philipp Rutimann, Min Xu and Peter Buhlmann February 3, 2012 Problem definition Want to estimate the covariance matrix
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationBayesian Models for Regularization in Optimization
Bayesian Models for Regularization in Optimization Aleksandr Aravkin, UBC Bradley Bell, UW Alessandro Chiuso, Padova Michael Friedlander, UBC Gianluigi Pilloneto, Padova Jim Burke, UW MOPTA, Lehigh University,
More informationClosest Moment Estimation under General Conditions
Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids
More informationLECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)
LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered
More informationStat 710: Mathematical Statistics Lecture 12
Stat 710: Mathematical Statistics Lecture 12 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 12 Feb 18, 2009 1 / 11 Lecture 12:
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationEstimation of large dimensional sparse covariance matrices
Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)
More informationLinear Regression. Junhui Qian. October 27, 2014
Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency
More informationReconstruction from Anisotropic Random Measurements
Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationThe Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression
The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression Cun-hui Zhang and Jian Huang Presenter: Quefeng Li Feb. 26, 2010 un-hui Zhang and Jian Huang Presenter: Quefeng The Sparsity
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationTractable Upper Bounds on the Restricted Isometry Constant
Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationLocal Polynomial Regression
VI Local Polynomial Regression (1) Global polynomial regression We observe random pairs (X 1, Y 1 ),, (X n, Y n ) where (X 1, Y 1 ),, (X n, Y n ) iid (X, Y ). We want to estimate m(x) = E(Y X = x) based
More informationCircle the single best answer for each multiple choice question. Your choice should be made clearly.
TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice
More informationIntroduction Robust regression Examples Conclusion. Robust regression. Jiří Franc
Robust regression Robust estimation of regression coefficients in linear regression model Jiří Franc Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Department of Mathematics
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationGraphlet Screening (GS)
Graphlet Screening (GS) Jiashun Jin Carnegie Mellon University April 11, 2014 Jiashun Jin Graphlet Screening (GS) 1 / 36 Collaborators Alphabetically: Zheng (Tracy) Ke Cun-Hui Zhang Qi Zhang Princeton
More informationGeneralized Concomitant Multi-Task Lasso for sparse multimodal regression
Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Mathurin Massias https://mathurinm.github.io INRIA Saclay Joint work with: Olivier Fercoq (Télécom ParisTech) Alexandre Gramfort
More informationhigh-dimensional inference robust to the lack of model sparsity
high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates
More informationInformation-theoretically Optimal Sparse PCA
Information-theoretically Optimal Sparse PCA Yash Deshpande Department of Electrical Engineering Stanford, CA. Andrea Montanari Departments of Electrical Engineering and Statistics Stanford, CA. Abstract
More informationThe Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility
The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationLarge deviations and averaging for systems of slow fast stochastic reaction diffusion equations.
Large deviations and averaging for systems of slow fast stochastic reaction diffusion equations. Wenqing Hu. 1 (Joint work with Michael Salins 2, Konstantinos Spiliopoulos 3.) 1. Department of Mathematics
More informationRobust estimation, efficiency, and Lasso debiasing
Robust estimation, efficiency, and Lasso debiasing Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics WHOA-PSI workshop Washington University in St. Louis Aug 12, 2017 Po-Ling
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationDivide-and-combine Strategies in Statistical Modeling for Massive Data
Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017
More informationApproximate Message Passing
Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters
More informationLeast squares problems
Least squares problems How to state and solve them, then evaluate their solutions Stéphane Mottelet Université de Technologie de Compiègne 30 septembre 2016 Stéphane Mottelet (UTC) Least squares 1 / 55
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationCOMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION
(REFEREED RESEARCH) COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION Hakan S. Sazak 1, *, Hülya Yılmaz 2 1 Ege University, Department
More informationSteven Cook University of Wales Swansea. Abstract
On the finite sample power of modified Dickey Fuller tests: The role of the initial condition Steven Cook University of Wales Swansea Abstract The relationship between the initial condition of time series
More informationBayesian Sparse Linear Regression with Unknown Symmetric Error
Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of
More informationSome Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model
Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;
More informationAsymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University
More informationSTAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song
STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationLecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011
Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector
More informationHow to Design Message Passing Algorithms for Compressed Sensing
How to Design Message Passing Algorithms for Compressed Sensing David L. Donoho, Arian Maleki and Andrea Montanari, February 17, 2011 Abstract Finding fast first order methods for recovering signals from
More informationLecture 32: Asymptotic confidence sets and likelihoods
Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence
More informationLecture 14 October 13
STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:
More informationSTA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources
STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationLearning discrete graphical models via generalized inverse covariance matrices
Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,
More informationROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY
ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY G.L. Shevlyakov, P.O. Smirnov St. Petersburg State Polytechnic University St.Petersburg, RUSSIA E-mail: Georgy.Shevlyakov@gmail.com
More information