Robust estimation, efficiency, and Lasso debiasing
1 Robust estimation, efficiency, and Lasso debiasing
Po-Ling Loh
University of Wisconsin - Madison, Departments of ECE & Statistics
WHOA-PSI workshop, Washington University in St. Louis
Aug 12, 2017
Po-Ling Loh (UW-Madison) Robust estimation, efficiency, and debiasing Aug 12, 2017
3 High-dimensional linear regression
[Figure: the model in matrix form, $y = X\beta^* + \epsilon$, with $y$ of size $n \times 1$, $X$ of size $n \times p$, $\beta^*$ of size $p \times 1$, and $\epsilon$ of size $n \times 1$.]
Linear model:
$$y_i = x_i^T \beta^* + \epsilon_i, \qquad i = 1, \dots, n$$
When $p \gg n$, assume sparsity: $\|\beta^*\|_0 \le k$
5 Robust M-estimators
Generalization of OLS suitable for heavy-tailed/contaminated errors:
$$\hat\beta \in \arg\min_{\beta} \left\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) \right\}$$
Extensive theory (consistency, asymptotic normality) for $p$ fixed, $n \to \infty$
[Figure: loss functions (least squares, absolute value, Huber, Tukey) plotted against the residual.]
[Figure: millions of calls by year, with least squares, Huber, and Tukey fits.]
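The losses named on this slide can be written down directly. The sketch below is an illustration rather than code from the talk: the function names and the tuning constants 1.345 and 4.685 (common textbook defaults) are assumptions. It shows that the Huber derivative is bounded and monotone, while the Tukey derivative redescends to zero for large residuals.

```python
import numpy as np

def huber_loss(u, delta=1.345):
    """Huber loss: quadratic for |u| <= delta, linear beyond."""
    u = np.asarray(u, dtype=float)
    a = np.abs(u)
    return np.where(a <= delta, 0.5 * u**2, delta * (a - 0.5 * delta))

def huber_psi(u, delta=1.345):
    """Derivative of the Huber loss: the residual clipped to [-delta, delta]."""
    return np.clip(np.asarray(u, dtype=float), -delta, delta)

def tukey_loss(u, c=4.685):
    """Tukey biweight loss: nonconvex and bounded (constant c^2/6 for |u| >= c)."""
    u = np.asarray(u, dtype=float)
    inside = (c**2 / 6.0) * (1.0 - (1.0 - (u / c) ** 2) ** 3)
    return np.where(np.abs(u) <= c, inside, c**2 / 6.0)

def tukey_psi(u, c=4.685):
    """Derivative of the Tukey loss: equals 0 for |u| >= c."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, u * (1.0 - (u / c) ** 2) ** 2, 0.0)

# Gross outliers (+-100) get bounded influence under Huber and none under Tukey.
residuals = np.array([-100.0, -1.0, 0.0, 0.5, 100.0])
print(huber_psi(residuals))
print(tukey_psi(residuals))
```

Least squares corresponds to $\ell'(u) = u$, which is unbounded; that is the property the robust losses are designed to avoid.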
7 High-dimensional M-estimators
Natural idea: For $p > n$, use regularized version:
$$\hat\beta \in \arg\min_{\beta} \left\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) + \lambda \|\beta\|_1 \right\}$$
Complications:
Optimization for nonconvex $\ell$?
Statistical theory?
Are certain losses provably better than others?
10 Some statistical theory
When $\|\ell'\|_\infty < C$, global optima of high-dimensional M-estimator satisfy
$$\|\hat\beta - \beta^*\|_2 \le C \sqrt{\frac{k \log p}{n}},$$
regardless of distribution of $\epsilon_i$
Compare to Lasso theory: Requires sub-Gaussian $\epsilon_i$'s
If $\ell(u)$ is locally convex/smooth for $|u| \le r$, any local optima within radius $cr$ of $\beta^*$ satisfy
$$\|\tilde\beta - \beta^*\|_2 \le C \sqrt{\frac{k \log p}{n}}$$
12 Some optimization theory
[Figure: ball of radius $r$ around $\beta^*$, with local optima lying within $O\left(\sqrt{k \log p / n}\right)$ of $\beta^*$.]
Local optima may be obtained via two-step algorithm
Algorithm
1. Run composite gradient descent on convex, robust loss + $\ell_1$-penalty until convergence, output $\hat\beta_H$
2. Run composite gradient descent on nonconvex, robust loss + $\mu$-amenable penalty, input $\beta^0 = \hat\beta_H$
15 Motivating calculation
Lasso analysis (e.g., van de Geer '07, Bickel et al. '08):
$$\hat\beta \in \arg\min_{\beta} \Big\{ \underbrace{\frac{1}{n} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1}_{L_n(\beta)} \Big\}$$
Rearranging basic inequality $L_n(\hat\beta) \le L_n(\beta^*)$ and assuming $\lambda \ge 2 \left\| \frac{X^T \epsilon}{n} \right\|_\infty$, obtain
$$\|\hat\beta - \beta^*\|_2 \le c \lambda \sqrt{k}$$
Sub-Gaussian assumptions on $x_i$'s and $\epsilon_i$'s provide $O\left(\sqrt{\frac{k \log p}{n}}\right)$ bounds, minimax optimal
18 Motivating calculation
Key observation: For general loss function, if
$$\lambda \ge 2 \left\| \frac{X^T \ell'(\epsilon)}{n} \right\|_\infty,$$
obtain
$$\|\hat\beta - \beta^*\|_2 \le c \lambda \sqrt{k}$$
$\ell'(\epsilon)$ sub-Gaussian whenever $\ell'$ bounded
$\implies$ can achieve estimation error
$$\|\hat\beta - \beta^*\|_2 \le c \sqrt{\frac{k \log p}{n}},$$
without assuming $\epsilon_i$ is sub-Gaussian
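The key observation can be illustrated numerically. The sketch below is not from the talk; the sample sizes, the Cauchy noise, and the Huber cutoff 1.345 are all assumed for illustration. It computes the quantity $\|X^T \ell'(\epsilon)/n\|_\infty$ that $\lambda$ must dominate, once with the least squares derivative $\ell'(u) = u$ and once with the bounded Huber derivative, and compares both with $\sqrt{\log p / n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 1000
X = rng.standard_normal((n, p))
eps = rng.standard_cauchy(n)   # heavy-tailed errors with no finite mean or variance

def grad_sup_norm(X, psi_eps):
    """Sup-norm of X^T psi(eps) / n, the quantity lambda must dominate."""
    return np.max(np.abs(X.T @ psi_eps)) / X.shape[0]

ls = grad_sup_norm(X, eps)                            # least squares: l'(u) = u
hub = grad_sup_norm(X, np.clip(eps, -1.345, 1.345))   # Huber: l' bounded by 1.345
rate = np.sqrt(np.log(p) / n)
print(ls, hub, rate)
```

Because the clipped errors are bounded, the Huber quantity can never exceed 1.345 times the largest column mean of $|X_{ij}|$, whatever the error distribution; with Cauchy errors the least squares quantity has no such control and is typically much larger.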
22 Technical challenges
Lasso analysis also requires verifying restricted eigenvalue (RE) condition on design matrix, more complicated for general $\ell$
Addressed by local curvature of robust losses around origin
[Figure: loss functions (least squares, absolute value, Huber, Tukey) plotted against the residual.]
When $\ell$ is nonconvex, local optima $\tilde\beta$ may exist that are not global optima
Addressed by theoretical analysis of $\|\tilde\beta - \beta^*\|_2$ and derivation of suitable optimization algorithms
24 Related work: Nonconvex regularized M-estimators
Composite objective function
$$\hat\beta \in \arg\min_{\|\beta\|_1 \le R} \left\{ L_n(\beta) + \sum_{j=1}^p \rho_\lambda(\beta_j) \right\}$$
Assumptions:
$L_n$ satisfies restricted strong convexity with curvature $\alpha$ (Negahban et al. '12)
$\rho_\lambda$ has bounded subgradient at 0, and $\rho_\lambda(t) + \mu t^2$ convex
$\alpha > \mu$
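As a concrete instance of the penalty assumption, the sketch below checks numerically when $\rho_\lambda(t) + \mu t^2$ is convex for the minimax concave penalty (MCP). The MCP is used here only as a familiar example of a nonconvex penalty; it is not named on the slide, and the parameter `b`, the grid, and the helper names are assumptions. With the convention written on the slide, the convexity threshold for this penalty works out to $\mu \ge 1/(2b)$.

```python
import numpy as np

def mcp(t, lam=1.0, b=2.0):
    """Minimax concave penalty: lam*|t| - t^2/(2b) for |t| <= b*lam, else b*lam^2/2."""
    a = np.abs(t)
    return np.where(a <= b * lam, lam * a - a**2 / (2.0 * b), 0.5 * b * lam**2)

def is_convex_on_grid(f, grid, tol=1e-9):
    """Check convexity via nonnegative second differences on an equispaced grid."""
    v = f(grid)
    return bool(np.all(v[:-2] - 2.0 * v[1:-1] + v[2:] >= -tol))

grid = np.linspace(-5.0, 5.0, 2001)
for mu in (0.1, 0.3):
    # With b = 2 the threshold is mu = 0.25, so 0.1 fails and 0.3 passes.
    print(mu, is_convex_on_grid(lambda t: mcp(t) + mu * t**2, grid))
```

The $\ell_1$ penalty is already convex, so it satisfies the same condition with $\mu = 0$.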
26 Stationary points (L. & Wainwright '15)
[Figure: stationary points $\tilde\beta$ lying within $O\left(\sqrt{k \log p / n}\right)$ of $\beta^*$.]
Stationary points statistically indistinguishable from global optima
$$\langle \nabla L_n(\tilde\beta) + \nabla \rho_\lambda(\tilde\beta), \; \beta - \tilde\beta \rangle \ge 0, \qquad \forall \beta \text{ feasible}$$
Under suitable distributional assumptions, for $\lambda \asymp \sqrt{\frac{\log p}{n}}$ and $R \asymp \frac{1}{\lambda}$,
$$\|\tilde\beta - \beta^*\|_2 \le c \sqrt{\frac{k \log p}{n}} \asymp \text{statistical error}$$
28 Mathematical statement
Theorem (L. & Wainwright '15)
Suppose $R$ is chosen s.t. $\beta^*$ is feasible, and $\lambda$ satisfies
$$\max\left\{ \|\nabla L_n(\beta^*)\|_\infty, \; \alpha \sqrt{\frac{\log p}{n}} \right\} \lesssim \lambda \lesssim \frac{\alpha}{R}.$$
For $n \ge \frac{C \tau^2}{\alpha^2} R^2 \log p$, any stationary point $\tilde\beta$ satisfies
$$\|\tilde\beta - \beta^*\|_2 \lesssim \frac{\lambda \sqrt{k}}{\alpha - \mu}, \qquad \text{where } k = \|\beta^*\|_0.$$
New ingredient for robust setting: $\ell$ convex only in local region $\implies$ need for local consistency results
30 Local RSC condition
Local RSC condition: For $\Delta := \beta_1 - \beta_2$,
$$\langle \nabla L_n(\beta_1) - \nabla L_n(\beta_2), \; \Delta \rangle \ge \alpha \|\Delta\|_2^2 - \tau \frac{\log p}{n} \|\Delta\|_1^2, \qquad \forall \|\beta_j - \beta^*\|_2 \le r$$
[Figure: Loss function has directions of both positive and negative curvature. Negative directions are forbidden by regularizer.]
Only requires restricted curvature within constant-radius region around $\beta^*$
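31 Consistency of local stationary points
[Figure: ball of radius $r$ around $\beta^*$, with stationary points $\tilde\beta$ lying within $O\left(\sqrt{k \log p / n}\right)$ of $\beta^*$.]
Theorem (L. '17)
Suppose $L_n$ satisfies $\alpha$-local RSC and $\rho_\lambda$ is $\mu$-amenable, with $\alpha > \mu$. Suppose $\|\ell'\|_\infty \le C$ and $\lambda \asymp \sqrt{\frac{\log p}{n}}$. For $n \gtrsim \frac{\tau}{\alpha - \mu} k \log p$, any stationary point $\tilde\beta$ s.t. $\|\tilde\beta - \beta^*\|_2 \le r$ satisfies
$$\|\tilde\beta - \beta^*\|_2 \lesssim \frac{\lambda \sqrt{k}}{\alpha - \mu}.$$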
33 Optimization theory
Question: How to obtain sufficiently close local solutions?
Goal: For regularized M-estimator
$$\hat\beta \in \arg\min_{\|\beta\|_1 \le R} \left\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) + \rho_\lambda(\beta) \right\},$$
where $\ell$ satisfies $\alpha$-local RSC, find stationary point such that $\|\tilde\beta - \beta^*\|_2 \le r$
34 Wisdom from Huber
"Descending ψ-functions are tricky, especially when the starting values for the iterations are non-robust.... It is therefore preferable to start with a monotone ψ, iterate to death, and then append a few (1 or 2) iterations with the nonmonotone ψ."
Huber 1981, pp
36 Two-step algorithm (L. '17)
Two-step M-estimator: Finds local stationary points of nonconvex, robust loss + $\mu$-amenable penalty
$$\hat\beta \in \arg\min_{\|\beta\|_1 \le R} \left\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) + \rho_\lambda(\beta) \right\}$$
Algorithm
1. Run composite gradient descent on convex, robust loss + $\ell_1$-penalty until convergence, output $\hat\beta_H$
2. Run composite gradient descent on nonconvex, robust loss + $\mu$-amenable penalty, input $\beta^0 = \hat\beta_H$
Note: Want to optimize original nonconvex objective, since it leads to more efficient (lower-variance) estimators
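The two steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation behind the talk: composite gradient descent is taken to be proximal gradient with soft-thresholding, the convex loss is Huber and the nonconvex loss is Tukey (with assumed cutoffs 1.345 and 4.685), the $\ell_1$ penalty is used in both steps (it is $\mu$-amenable with $\mu = 0$), the side constraint $\|\beta\|_1 \le R$ is dropped, and the helper names and simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def huber_psi(u, delta=1.345):
    """Derivative of the (convex) Huber loss."""
    return np.clip(u, -delta, delta)

def tukey_psi(u, c=4.685):
    """Derivative of the (nonconvex) Tukey biweight loss."""
    return np.where(np.abs(u) <= c, u * (1.0 - (u / c) ** 2) ** 2, 0.0)

def composite_gd(X, y, psi, lam, beta0, step, n_iter=500):
    """Proximal gradient on (1/n) sum_i l(x_i^T beta - y_i) + lam * ||beta||_1,
    where psi = l'."""
    n = X.shape[0]
    beta = beta0.copy()
    for _ in range(n_iter):
        grad = X.T @ psi(X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def two_step(X, y, lam, n_iter=500):
    n, p = X.shape
    # Step size 1/L with L = ||X||_2^2 / n, valid because |psi'| <= 1 for both losses.
    step = n / np.linalg.norm(X, 2) ** 2
    beta_h = composite_gd(X, y, huber_psi, lam, np.zeros(p), step, n_iter)  # step 1
    beta_t = composite_gd(X, y, tukey_psi, lam, beta_h, step, n_iter)       # step 2
    return beta_h, beta_t

# Simulated sparse regression with heavy-tailed (Student t, 2 d.f.) noise.
rng = np.random.default_rng(0)
n, p, k = 300, 500, 3
beta_star = np.zeros(p)
beta_star[:k] = 5.0
X = rng.standard_normal((n, p))
y = X @ beta_star + rng.standard_t(2, size=n)
beta_h, beta_t = two_step(X, y, lam=0.2)
print(np.linalg.norm(beta_h - beta_star), np.linalg.norm(beta_t - beta_star))
```

Starting the nonconvex step from $\hat\beta_H$ rather than from zero is the point of the design: it follows Huber's advice to iterate a monotone ψ first, so that the second stage begins inside the region where the local theory applies.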
42 Scale calibration
Closer look: Loss function $\ell$ in some sense calibrated to scale of $\epsilon_i$
If Huber parameter too large, estimation error bound based on $\|\ell'\|_\infty$ becomes suboptimal
If Huber parameter too small, RSC no longer satisfied w.h.p.
For Lasso, optimal $\lambda$ known to depend on $\sigma_\epsilon$, but loss function does not require calibration
Better objective (low-dimensional version proposed by Huber):
$$(\hat\beta, \hat\sigma) \in \arg\min_{\beta, \sigma} \Big\{ \underbrace{\frac{1}{n} \sum_{i=1}^n \ell\left( \frac{y_i - x_i^T \beta}{\sigma} \right) \sigma + a\sigma}_{L_n(\beta, \sigma)} + \lambda \|\beta\|_1 \Big\}$$
However, joint location/scale estimation notoriously difficult even in low dimensions
44 Scale calibration
Another idea: MM-estimator
$$\hat\beta \in \arg\min_{\beta} \left\{ \frac{1}{n} \sum_{i=1}^n \ell\left( \frac{y_i - x_i^T \beta}{\hat\sigma_0} \right) + \lambda \|\beta\|_1 \right\},$$
using robust estimate of scale $\hat\sigma_0$ based on preliminary estimate $\hat\beta_0$
How to obtain $(\hat\beta_0, \hat\sigma_0)$?
S-estimators/LMS:
$$\hat\beta_0 \in \arg\min_{\beta} \{ \hat\sigma(r(\beta)) \}, \qquad \text{where } \hat\sigma(r) = |r|_{(n - \lfloor n\delta \rfloor)}$$
LTS:
$$\hat\beta_0 \in \arg\min_{\beta} \left\{ \frac{1}{n} \sum_{i=1}^{n - \lfloor n\alpha \rfloor} (y_i - x_i^T \beta)^2_{(i)} + \lambda \|\beta\|_1 \right\}$$
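Both preliminary criteria are built from order statistics of the residuals, which a short sketch makes concrete. The code assumes that the order-statistic indices are $n - \lfloor n\delta \rfloor$ and $n - \lfloor n\alpha \rfloor$ (the rounding is a reading of the slide) and evaluates the criteria only at a given $\beta$; minimizing them over $\beta$ is a separate, harder problem. The function names and defaults are hypothetical.

```python
import numpy as np

def lms_scale(r, delta=0.5):
    """|r|_(n - floor(n*delta)): an order statistic of the absolute residuals."""
    a = np.sort(np.abs(r))
    n = a.shape[0]
    return a[n - int(np.floor(n * delta)) - 1]   # convert 1-indexed rank to 0-indexed

def lts_objective(beta, X, y, lam, alpha=0.25):
    """Trimmed sum of the n - floor(n*alpha) smallest squared residuals, divided
    by n, plus the l1 penalty."""
    n = X.shape[0]
    sq = np.sort((y - X @ beta) ** 2)
    h = n - int(np.floor(n * alpha))
    return np.sum(sq[:h]) / n + lam * np.sum(np.abs(beta))

# One gross outlier: it is ignored by both criteria.
r = np.array([1.0, -2.0, 3.0, -4.0, 100.0])
print(lms_scale(r, delta=0.4))

X = np.ones((4, 1))
y = np.array([0.0, 0.0, 0.0, 10.0])
print(lts_objective(np.zeros(1), X, y, lam=0.0, alpha=0.25))
```

Setting `alpha=0` in `lts_objective` recovers the untrimmed least squares criterion, which the single outlier then dominates.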
46 Our approach
Lepski's method originally proposed for adaptive bandwidth selection in nonparametric regression
Can be used to select $\sigma$ in location/scale problem:
$$\hat\beta_\sigma \in \arg\min_{\beta} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_\sigma(y_i - x_i^T \beta) + \lambda \sigma \|\beta\|_1 \right\},$$
where $\ell_\sigma$ is Huber loss parametrized by $\sigma$
52 Lepski's method
Preceding theory implies
$$\|\hat\beta_\sigma - \beta^*\|_2 \le C \sigma \sqrt{\frac{k \log p}{n}},$$
w.h.p., assuming $\sigma \ge \sqrt{\operatorname{Var}(\epsilon_i)} =: \sigma^*$
Basic idea of Lepski's method:
Compute $\hat\beta_\sigma$ on gridding $\{\sigma_1, \dots, \sigma_M\}$ of interval $[\sigma_{\min}, \sigma_{\max}]$
For each $\sigma_j$, check if
$$\|\hat\beta_{\sigma_j} - \hat\beta_{\sigma_l}\|_2 \le 2 C \sigma_l \sqrt{\frac{k \log p}{n}} \qquad \text{for all } l > j,$$
and let $\hat\sigma$ be argmin in this set
[Figure: the interval $[\sigma_{\min}, \sigma_{\max}]$ on a number line, with $\sigma^*$, the grid points $\sigma_j$ and $\sigma_l$, and the selected $\hat\sigma$ marked.]
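The selection rule translates into a short loop over the grid. In the sketch below the estimates $\hat\beta_{\sigma_j}$ are taken as given (in practice each would come from the Huber-loss program with penalty $\lambda\sigma_j$), and the constant `C` and sparsity `k` are treated as known inputs, which is an assumption made purely for illustration. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def lepski_select(sigmas, betas, C, k, n, p):
    """Return the index of the smallest sigma_j whose estimate is within
    2*C*sigma_l*sqrt(k log p / n) of the estimate at every larger sigma_l."""
    rate = np.sqrt(k * np.log(p) / n)
    order = np.argsort(sigmas)
    sig = np.asarray(sigmas, dtype=float)[order]
    B = np.asarray(betas, dtype=float)[order]
    M = len(sig)
    for j in range(M):
        ok = all(
            np.linalg.norm(B[j] - B[l]) <= 2.0 * C * sig[l] * rate
            for l in range(j + 1, M)
        )
        if ok:
            return int(order[j])
    return int(order[-1])  # not reached: the largest sigma passes vacuously

# Toy grid: the estimate at the smallest scale is far from all the others.
sigmas = [0.5, 1.0, 2.0, 4.0]
betas = [[10.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.4, 0.0]]
j_hat = lepski_select(sigmas, betas, C=1.0, k=1, n=100, p=100)
print(sigmas[j_hat])
```

In the toy grid the estimate at $\sigma = 0.5$ disagrees with the larger scales by far more than the allowed tolerance, so the rule moves up to $\sigma = 1$, whose estimate agrees with those at $\sigma = 2$ and $\sigma = 4$.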
53 Statistical guarantee
Theorem (L. '17)
With high probability, output of Lepski's method satisfies
$$\|\hat\beta_{\hat\sigma} - \beta^*\|_2 \le C \sigma^* \sqrt{\frac{k \log p}{n}}$$
Method does not require prior knowledge of scale $\sigma^*$
56 Efficiency
Although $\hat\beta_{\hat\sigma}$ guaranteed to be $\ell_2$-consistent, estimator may have relatively high variance
One-step estimation proposed for obtaining better efficiency (Bickel '75):
$$\hat b_\psi = \hat\beta + \frac{1}{\hat A(\psi)} \left( \frac{X^T X}{n} \right)^{-1} \frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \hat\beta) x_i,$$
where $\hat A(\psi)$ is estimate of $E[\psi'(\epsilon_i)]$
Low-dimensional result:
$$\sqrt{n} (\hat b_\psi - \beta^*) \xrightarrow{d} N\left( 0, \; \frac{E[\psi^2(\epsilon_i)]}{(E[\psi'(\epsilon_i)])^2} \Theta \right),$$
so asymptotic variance for $\psi = \frac{f'}{f}$ matches variance of MLE
58 Efficiency
In high dimensions, define
$$\hat b_\psi = \hat\beta_{\hat\sigma} + \frac{\hat\Theta}{\hat A(\psi)} \cdot \frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \hat\beta_{\hat\sigma}) x_i,$$
where $\hat A(\psi) = \frac{1}{n} \sum_{i=1}^n \psi'(y_i - x_i^T \hat\beta_{\hat\sigma})$ and $\hat\Theta$ is high-dimensional estimate of $\Theta$ (e.g., graphical Lasso estimator)
Resembles Lasso debiasing procedure (Zhang & Zhang '14, van de Geer et al. '14, Javanmard & Montanari '14)
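The one-step update is a single matrix-vector computation once $\hat\Theta$ is available. The sketch below takes $\hat\Theta$ as an input (how it is estimated, e.g. by the graphical Lasso, is outside the snippet) and uses hypothetical function names. As a sanity check, with $\psi(u) = u$ and $\hat\Theta = (X^T X / n)^{-1}$ in a low-dimensional problem, one step from any starting point lands exactly on the least squares solution.

```python
import numpy as np

def huber_psi(u, delta=1.345):
    """Huber score function."""
    return np.clip(u, -delta, delta)

def huber_psi_prime(u, delta=1.345):
    """Derivative of the Huber score: 1 inside [-delta, delta], 0 outside."""
    return (np.abs(u) <= delta).astype(float)

def one_step(X, y, beta_hat, Theta_hat, psi, psi_prime):
    """b_psi = beta_hat + Theta_hat / A_hat * (1/n) sum_i psi(r_i) x_i."""
    n = X.shape[0]
    r = y - X @ beta_hat              # residuals y_i - x_i^T beta_hat
    A_hat = np.mean(psi_prime(r))     # estimate of E[psi'(eps_i)]
    score = X.T @ psi(r) / n
    return beta_hat + Theta_hat @ score / A_hat

# Sanity check in low dimensions with the least squares score psi(u) = u.
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)
Theta_ls = np.linalg.inv(X.T @ X / n)
b_ls = one_step(X, y, np.zeros(p), Theta_ls, lambda u: u, np.ones_like)
b_hub = one_step(X, y, b_ls, Theta_ls, huber_psi, huber_psi_prime)
print(b_ls, b_hub)
```

The correction has the same form as a Newton step on the estimating equation, which is also why it resembles the debiased Lasso.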
62 Efficiency
Theorem (L. '17)
Let $J \subseteq \{1, \dots, p\}$ denote a subset of coordinates of constant dimension. Then
$$\sqrt{n} (\hat b_\psi - \beta^*)_J \xrightarrow{d} N\left( 0, \; \frac{E[\psi^2(\epsilon_i)]}{(E[\psi'(\epsilon_i)])^2} \Theta_{JJ} \right)$$
Implies semiparametric efficiency of one-step estimator when $\psi = \frac{f'}{f}$
Can derive asymptotic confidence intervals/regions for subsets of coefficients
Important: Allows statistical inference for high-dimensional regression in cases when $x_i$'s, $\epsilon_i$'s are heavy-tailed
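The limiting distribution suggests coordinate-wise confidence intervals of the usual normal form. The sketch below is one plausible plug-in construction, not a procedure stated in the talk: the variance factor $E[\psi^2]/(E[\psi'])^2$ is replaced by sample averages over residuals, $\Theta_{jj}$ by the corresponding entry of $\hat\Theta$, and the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def one_step_ci(b_psi, residuals, Theta_hat, psi, psi_prime, j, level=0.95):
    """Interval b_j +/- z * sqrt(V * Theta_jj / n), with
    V = mean(psi(r)^2) / mean(psi'(r))^2 a plug-in for E[psi^2] / E[psi']^2."""
    n = residuals.shape[0]
    V = np.mean(psi(residuals) ** 2) / np.mean(psi_prime(residuals)) ** 2
    se = np.sqrt(V * Theta_hat[j, j] / n)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return b_psi[j] - z * se, b_psi[j] + z * se

# Toy check with psi(u) = u, where V is the residual second moment (here 1).
r = np.array([1.0, -1.0, 1.0, -1.0])
lo, hi = one_step_ci(np.zeros(2), r, np.eye(2), lambda u: u, np.ones_like, j=0)
print(lo, hi)
```

In the toy check the standard error is $\sqrt{1 \cdot 1 / 4} = 0.5$, so the 95% interval is symmetric around zero with half-width about $1.96 \times 0.5$.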
64 Summary
New theory for robust high-dimensional M-estimators implies $O\left(\sqrt{\frac{k \log p}{n}}\right)$ error rates when $\|\ell'\|_\infty \le C$, based on local RSC
Lepski's method proposed to avoid joint scale parameter estimation
Derived properties of one-step estimator for semiparametric efficiency and high-dimensional inference
Loh (2017). Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Annals of Statistics.
Thank you!
Accelerated Stochastic Block Coordinate Gradient Descent for Sparsity Constrained Nonconvex Optimization Jinghui Chen Department of Systems and Information Engineering University of Virginia Quanquan Gu
More information9. Robust regression
9. Robust regression Least squares regression........................................................ 2 Problems with LS regression..................................................... 3 Robust regression............................................................
More informationMultivariate Calibration with Robust Signal Regression
Multivariate Calibration with Robust Signal Regression Bin Li and Brian Marx from Louisiana State University Somsubhra Chakraborty from Indian Institute of Technology Kharagpur David C Weindorf from Texas
More informationRestricted Strong Convexity Implies Weak Submodularity
Restricted Strong Convexity Implies Weak Submodularity Ethan R. Elenberg Rajiv Khanna Alexandros G. Dimakis Department of Electrical and Computer Engineering The University of Texas at Austin {elenberg,rajivak}@utexas.edu
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationBayesian Models for Regularization in Optimization
Bayesian Models for Regularization in Optimization Aleksandr Aravkin, UBC Bradley Bell, UW Alessandro Chiuso, Padova Michael Friedlander, UBC Gianluigi Pilloneto, Padova Jim Burke, UW MOPTA, Lehigh University,
More informationOptimal Value Function Methods in Numerical Optimization Level Set Methods
Optimal Value Function Methods in Numerical Optimization Level Set Methods James V Burke Mathematics, University of Washington, (jvburke@uw.edu) Joint work with Aravkin (UW), Drusvyatskiy (UW), Friedlander
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationarxiv: v3 [stat.me] 14 Nov 2016
Approximate Residual Balancing: De-Biased Inference of Average Treatment Effects in High Dimensions Susan Athey Guido W. Imbens Stefan Wager arxiv:1604.07125v3 [stat.me] 14 Nov 2016 Current version November
More informationOptimization for Compressed Sensing
Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve
More informationIEOR 265 Lecture 3 Sparse Linear Regression
IOR 65 Lecture 3 Sparse Linear Regression 1 M Bound Recall from last lecture that the reason we are interested in complexity measures of sets is because of the following result, which is known as the M
More informationAccelerate Subgradient Methods
Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationAdaptive estimation of the copula correlation matrix for semiparametric elliptical copulas
Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas Department of Mathematics Department of Statistical Science Cornell University London, January 7, 2016 Joint work
More informationM-Estimation under High-Dimensional Asymptotics
M-Estimation under High-Dimensional Asymptotics 2014-05-01 Classical M-estimation Big Data M-estimation An out-of-the-park grand-slam home run Annals of Mathematical Statistics 1964 Richard Olshen Classical
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationSome Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model
Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationOptimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method
Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationOutlier detection and variable selection via difference based regression model and penalized regression
Journal of the Korean Data & Information Science Society 2018, 29(3), 815 825 http://dx.doi.org/10.7465/jkdi.2018.29.3.815 한국데이터정보과학회지 Outlier detection and variable selection via difference based regression
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable
More informationAn Introduction to Bayesian Linear Regression
An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,
More informationStat 710: Mathematical Statistics Lecture 40
Stat 710: Mathematical Statistics Lecture 40 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 40 May 6, 2009 1 / 11 Lecture 40: Simultaneous
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University October 29, 2016 David Rosenberg (New York University) DS-GA 1003 October 29, 2016 1 / 42 K-Means Clustering K-Means Clustering David
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More information19.1 Maximum Likelihood estimator and risk upper bound
ECE598: Information-theoretic methods in high-dimensional statistics Spring 016 Lecture 19: Denoising sparse vectors - Ris upper bound Lecturer: Yihong Wu Scribe: Ravi Kiran Raman, Apr 1, 016 This lecture
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationarxiv: v1 [math.st] 13 Feb 2012
Sparse Matrix Inversion with Scaled Lasso Tingni Sun and Cun-Hui Zhang Rutgers University arxiv:1202.2723v1 [math.st] 13 Feb 2012 Address: Department of Statistics and Biostatistics, Hill Center, Busch
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More informationOptimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes
Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding
More informationStat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)
Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really
More informationLearning 2 -Continuous Regression Functionals via Regularized Riesz Representers
Learning 2 -Continuous Regression Functionals via Regularized Riesz Representers Victor Chernozhukov MIT Whitney K. Newey MIT Rahul Singh MIT September 10, 2018 Abstract Many objects of interest can be
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationarxiv: v2 [stat.me] 3 Jan 2017
Linear Hypothesis Testing in Dense High-Dimensional Linear Models Yinchu Zhu and Jelena Bradic Rady School of Management and Department of Mathematics University of California at San Diego arxiv:161.987v
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationEfficient Estimation in Convex Single Index Models 1
1/28 Efficient Estimation in Convex Single Index Models 1 Rohit Patra University of Florida http://arxiv.org/abs/1708.00145 1 Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)
More informationNon-Convex Projected Gradient Descent for Generalized Low-Rank Tensor Regression
Journal of Machine Learning Research 20 (2019) 1-37 Submitted 12/16; Revised 1/18; Published 2/19 Non-Convex Projected Gradient Descent for Generalized Low-Rank Tensor Regression Han Chen Garvesh Raskutti
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationVarious types of likelihood
Various types of likelihood 1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood 2. semi-parametric likelihood, partial likelihood 3. empirical likelihood,
More informationAsymmetric least squares estimation and testing
Asymmetric least squares estimation and testing Whitney Newey and James Powell Princeton University and University of Wisconsin-Madison January 27, 2012 Outline ALS estimators Large sample properties Asymptotic
More informationDistribution-Free Predictive Inference for Regression
Distribution-Free Predictive Inference for Regression Jing Lei, Max G Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman Department of Statistics, Carnegie Mellon University Abstract We
More informationHigh Dimensional Inverse Covariate Matrix Estimation via Linear Programming
High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω
More informationAn Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization
An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationA Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose
More informationA Significance Test for the Lasso
A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical
More informationarxiv: v2 [math.st] 12 Feb 2008
arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationKriging models with Gaussian processes - covariance function estimation and impact of spatial sampling
Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in
More information