$\ell_1$-Regularized Linear Regression: Persistence and Oracle Inequalities
1 $\ell_1$-Regularized Linear Regression: Persistence and Oracle Inequalities. Peter Bartlett, EECS and Statistics, UC Berkeley; slides at bartlett. Joint work with Shahar Mendelson and Joe Neeman.
2 $\ell_1$-regularized linear regression. Random pair: $(X, Y) \sim P$, in $\mathbb{R}^d \times \mathbb{R}$. We see $n$ independent samples drawn from $P$: $(X_1, Y_1), \ldots, (X_n, Y_n)$. Find $\beta$ so that the linear function $\langle X, \beta \rangle$ has small risk, $P\ell_\beta = P(\langle X, \beta \rangle - Y)^2$. Here, $\ell_\beta(X, Y) = (\langle X, \beta \rangle - Y)^2$ is the quadratic loss of the linear prediction.
3 $\ell_1$-regularized linear regression. Random pair: $(X, Y) \sim P$, in $\mathbb{R}^d \times \mathbb{R}$; $n$ independent samples drawn from $P$: $(X_1, Y_1), \ldots, (X_n, Y_n)$. Find $\beta$ so that the linear function $\langle X, \beta \rangle$ has small risk, $P\ell_\beta = P(\langle X, \beta \rangle - Y)^2$. Example: $\ell_1$-regularized least squares,
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} \left( P_n \ell_\beta + \rho_n \|\beta\|_{\ell_1^d} \right),$$
where $P_n \ell_\beta = \frac{1}{n} \sum_{i=1}^n (\langle X_i, \beta \rangle - Y_i)^2$ and $\|\beta\|_{\ell_1^d} = \sum_{j=1}^d |\beta_j|$.
4 $\ell_1$-regularized linear regression. Example: $\ell_1$-regularized least squares,
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} \left( P_n \ell_\beta + \rho_n \|\beta\|_{\ell_1^d} \right),$$
where $P_n \ell_\beta = \frac{1}{n} \sum_{i=1}^n (\langle X_i, \beta \rangle - Y_i)^2$ and $\|\beta\|_{\ell_1^d} = \sum_{j=1}^d |\beta_j|$. Tends to select sparse solutions (few non-zero components $\beta_j$). Useful, for example, if $d \gg n$.
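As a concrete illustration (not part of the talk), the $\ell_1$-regularized least-squares estimator defined above can be computed by proximal gradient descent (ISTA); a minimal NumPy sketch, where the iteration count and the choice of $\rho_n$ are hypothetical inputs:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: coordinatewise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_regularized_ls(X, y, rho, n_iter=1000):
    """ISTA sketch for beta_hat = argmin_beta P_n l_beta + rho * ||beta||_1,
    with P_n l_beta = (1/n) sum_i (<X_i, beta> - y_i)^2 as on the slide."""
    n, d = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the empirical-risk gradient.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y) / n  # gradient of P_n l_beta
        beta = soft_threshold(beta - grad / L, rho / L)
    return beta
```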
5 $\ell_1$-regularized linear regression. Example: $\ell_1$-regularized least squares,
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} \left( P_n \ell_\beta + \rho_n \|\beta\|_{\ell_1^d} \right).$$
Example: $\ell_1$-constrained least squares,
$$\hat\beta = \arg\min_{\|\beta\|_{\ell_1^d} \le b_n} P_n \ell_\beta.$$
[Recall: $\ell_\beta(X, Y) = (\langle X, \beta \rangle - Y)^2$.]
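The constrained estimator admits an analogous sketch: projected gradient descent, with Euclidean projection onto the $\ell_1$ ball of radius $b_n$ computed by the standard sorting-based algorithm. Again an illustration under assumed inputs, not the talk's method:

```python
import numpy as np

def project_l1(v, b):
    """Euclidean projection onto {x : ||x||_1 <= b}, via the sorting-based algorithm."""
    if np.abs(v).sum() <= b:
        return v
    u = np.sort(np.abs(v))[::-1]               # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - b) / k > 0)[0][-1]
    theta = (css[rho] - b) / (rho + 1.0)       # optimal shrinkage threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def l1_constrained_ls(X, y, b, n_iter=1000):
    """Projected gradient sketch for argmin_{||beta||_1 <= b} P_n l_beta."""
    n, d = X.shape
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n    # gradient Lipschitz constant
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y) / n
        beta = project_l1(beta - grad / L, b)
    return beta
```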
6 $\ell_1$-regularized linear regression. Example: $\ell_1$-regularized least squares, $\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} (P_n \ell_\beta + \rho_n \|\beta\|_{\ell_1^d})$. Example: $\ell_1$-constrained least squares, $\hat\beta = \arg\min_{\|\beta\|_{\ell_1^d} \le b_n} P_n \ell_\beta$. Some questions. Prediction: Does $\hat\beta$ give accurate forecasts? e.g., How does $P\ell_{\hat\beta}$ compare with $P\ell_{\beta^*}$? Here, $\beta^* = \arg\min \{ P\ell_\beta : \|\beta\|_{\ell_1^d} \le b_n \}$.
7 $\ell_1$-regularized linear regression. Example: $\ell_1$-regularized least squares, $\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} (P_n \ell_\beta + \rho_n \|\beta\|_{\ell_1^d})$. Example: $\ell_1$-constrained least squares, $\hat\beta = \arg\min_{\|\beta\|_{\ell_1^d} \le b_n} P_n \ell_\beta$. Some questions. Prediction: Does $\hat\beta$ give accurate forecasts? e.g., $P\ell_{\hat\beta}$ versus $P\ell_{\beta^*} = \min \{ P\ell_\beta : \|\beta\|_{\ell_1^d} \le b_n \}$? Estimation: Under assumptions on $P$, is $\hat\beta$ correct? Sparsity pattern estimation: Under assumptions on $P$, are the non-zeros of $\hat\beta$ correct?
8 Outline of talk. 1. For $\ell_1$-constrained least squares, bounds on $P\ell_{\hat\beta} - P\ell_{\beta^*}$. Persistence (Greenshtein and Ritov, 2004): for what $d_n, b_n$ does $P\ell_{\hat\beta} - P\ell_{\beta^*} \to 0$? Convex aggregation (Tsybakov, 2003): for $b = 1$ (convex combinations of dictionary functions), what is the rate of $P\ell_{\hat\beta} - P\ell_{\beta^*}$? 2. For $\ell_1$-regularized least squares, oracle inequalities. 3. Proof ideas.
9 $\ell_1$-regularized linear regression. Key issue: $\ell_\beta$ is unbounded, so some key tools (e.g., concentration inequalities) cannot immediately be applied. For $(X, Y)$ bounded, $\ell_\beta$ can be bounded using $\|\beta\|_{\ell_1^d}$, but this gives loose prediction bounds. We use chaining to show that the metric structures of $\ell_1$-constrained linear functions under $P_n$ and $P$ are similar.
10 Main results: excess risk. For $\ell_1$-constrained least squares, $\hat\beta = \arg\min_{\|\beta\|_{\ell_1^d} \le b} P_n \ell_\beta$, if $X$ and $Y$ have suitable tail behaviour then, with probability $1 - \delta$,
$$P\ell_{\hat\beta} - P\ell_{\beta^*} \le \frac{c \log^\alpha(nd)}{\delta^2} \min\left( \frac{b^2}{\sqrt{n}} + \frac{d}{n},\ \frac{b}{\sqrt{n}} \left( 1 + \frac{b}{\sqrt{n}} \right) \right).$$
Small-$d$ regime: $d/n$. Large-$d$ regime: $b/\sqrt{n}$.
11 Main results: excess risk. For $\ell_1$-constrained least squares, with probability $1 - \delta$,
$$P\ell_{\hat\beta} - P\ell_{\beta^*} \le \frac{c \log^\alpha(nd)}{\delta^2} \min\left( \frac{b^2}{\sqrt{n}} + \frac{d}{n},\ \frac{b}{\sqrt{n}} \left( 1 + \frac{b}{\sqrt{n}} \right) \right).$$
Conditions: 1. $PY^2$ is bounded by a constant. 2. Either $X$ is bounded a.s., or $X$ is log-concave with $\max_j \|\langle X, e_j \rangle\|_{L_2} \le c$, or $X$ is log-concave and isotropic.
12 Application: persistence. Consider $\ell_1$-constrained least squares, $\hat\beta = \arg\min_{\|\beta\|_{\ell_1^d} \le b_n} P_n \ell_\beta$. Suppose that $PY^2$ is bounded by a constant and the tails of $X$ decay nicely (e.g., $X$ bounded a.s., or $X$ log-concave and isotropic). Then for increasing $d_n$ and
$$b_n = o\left( \frac{\sqrt{n}}{\log^{3/2} n \, \log^{3/2}(n d_n)} \right),$$
$\ell_1$-constrained least squares is persistent (i.e., $P\ell_{\hat\beta} - P\ell_{\beta^*} \to 0$).
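As a heuristic check (using the excess-risk bound as reconstructed above, with $\delta$ fixed and $b_n \le \sqrt{n}$, so the second argument of the minimum is of order $b_n/\sqrt{n}$):
$$P\ell_{\hat\beta} - P\ell_{\beta^*} \lesssim \log^\alpha(n d_n) \, \frac{b_n}{\sqrt{n}} \to 0 \quad \text{whenever} \quad b_n = o\left( \frac{\sqrt{n}}{\log^\alpha(n d_n)} \right),$$
which has the shape of the stated condition, with the polylogarithmic factor split as $\log^{3/2} n \cdot \log^{3/2}(n d_n)$.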
13 Application: persistence. If $PY^2$ is bounded and the tails of $X$ decay nicely, then $\ell_1$-constrained least squares is persistent provided that $d_n$ is increasing and
$$b_n = o\left( \frac{\sqrt{n}}{\log^{3/2} n \, \log^{3/2}(n d_n)} \right).$$
Previous results (Greenshtein and Ritov, 2004): 1. $b_n = \omega(n^{1/2} / \log^{1/2} n)$ implies empirical minimization is not persistent for Gaussian $(X, Y)$. 2. $b_n = o(n^{1/2} / \log^{1/2} n)$ implies empirical minimization is persistent for Gaussian $(X, Y)$. 3. $b_n = o(n^{1/4} / \log^{1/4} n)$ implies empirical minimization is persistent under tail conditions on $(X, Y)$.
14 Application: convex aggregation. Consider $b = 1$, so that the $\ell_1$-ball of radius $b$ is the convex hull of a dictionary of $d$ functions (the components of $X$). Tsybakov (2003) showed that, for any aggregation scheme $\hat\beta$, the rate of convex aggregation satisfies
$$P\ell_{\hat\beta} - P\ell_{\beta^*} = \Omega\left( \min\left( \frac{d}{n}, \sqrt{\frac{\log d}{n}} \right) \right).$$
For bounded, isotropic distributions, our result implies that this rate can be achieved, up to log factors, by least squares over the convex hull of the dictionary. Previous positive results (Tsybakov, 2003; Bunea, Tsybakov and Wegkamp, 2006) involved complicated estimators.
15 Main results: oracle inequality. For $\ell_1$-regularized least squares,
$$\hat\beta = \arg\min_\beta \left( P_n \ell_\beta + \rho_n \|\beta\|_{\ell_1^{d_n}} \right),$$
if $d_n$ increases but $\log d_n = o(n)$, and $Y$ and $X$ are bounded a.s., then with probability at least $1 - o(1)$,
$$P\ell_{\hat\beta} \le \inf_\beta \left( P\ell_\beta + c \rho_n \left( 1 + \|\beta\|_{\ell_1^{d_n}} \right) \right),$$
where $\rho_n = \frac{c \log^{3/2} n \, \log^{1/2}(d_n n)}{\sqrt{n}}$.
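In practice, this $\rho_n$ would be plugged into a solver such as the ISTA sketch earlier; a small helper, with the absolute constant $c$ (which the theorem does not pin down) treated as a hypothetical tuning parameter:

```python
import numpy as np

def rho_n(n, d, c=1.0):
    """Theoretical regularization level from the slide:
    rho_n = c * log^{3/2}(n) * log^{1/2}(d * n) / sqrt(n).
    The constant c is unspecified by the theory; treat it as a tuning knob."""
    return c * np.log(n) ** 1.5 * np.log(d * n) ** 0.5 / np.sqrt(n)

# e.g., beta_hat = l1_regularized_ls(X, y, rho=rho_n(*X.shape))
```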
16 Outline of talk. 1. For $\ell_1$-constrained least squares, bounds on $P\ell_{\hat\beta} - P\ell_{\beta^*}$. Persistence: for what $d_n, b_n$ does $P\ell_{\hat\beta} - P\ell_{\beta^*} \to 0$? Convex aggregation: for $b = 1$ (convex combinations of dictionary functions), what is the rate of $P\ell_{\hat\beta} - P\ell_{\beta^*}$? 2. For $\ell_1$-regularized least squares, oracle inequalities. 3. Proof ideas.
17 Proof ideas: 1. $\epsilon$-equivalence of $P$ and $P_n$ structures. Define
$$G_\lambda = \left\{ \frac{\lambda}{P(\ell_\beta - \ell_{\beta^*})} (\ell_\beta - \ell_{\beta^*}) : P(\ell_\beta - \ell_{\beta^*}) \ge \lambda \right\}.$$
Then: if $\mathbb{E} \sup_{g \in G_\lambda} |P_n g - P g|$ is small, then with high probability, for all $\beta$ with $P(\ell_\beta - \ell_{\beta^*}) \ge \lambda$,
$$(1 - \epsilon) P(\ell_\beta - \ell_{\beta^*}) \le P_n(\ell_\beta - \ell_{\beta^*}) \le (1 + \epsilon) P(\ell_\beta - \ell_{\beta^*}),$$
and hence $P(\ell_{\hat\beta} - \ell_{\beta^*}) \le \lambda$, where $\hat\beta = \arg\min_\beta P_n \ell_\beta$.
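The last implication deserves one extra line (a step left implicit on the slide). Since $\hat\beta$ minimizes the empirical risk over a set containing $\beta^*$, we have $P_n(\ell_{\hat\beta} - \ell_{\beta^*}) \le 0$. If $P(\ell_{\hat\beta} - \ell_{\beta^*}) > \lambda$ held, the left-hand inequality (with $\epsilon < 1$) would give
$$0 < (1 - \epsilon) P(\ell_{\hat\beta} - \ell_{\beta^*}) \le P_n(\ell_{\hat\beta} - \ell_{\beta^*}) \le 0,$$
a contradiction; hence $P(\ell_{\hat\beta} - \ell_{\beta^*}) \le \lambda$.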
18 Proof ideas: 2. Symmetrization, subgaussian tails. For a class of functions $F$,
$$\mathbb{E} \sup_{f \in F} |P_n f - P f| \le \frac{2}{n} \mathbb{E} \sup_{f \in F} \left| \sum_i \epsilon_i f(X_i) \right|,$$
where the $\epsilon_i$ are independent Rademacher ($\pm 1$) random variables (Giné and Zinn, 1984). The Rademacher process
$$Z_\beta = \sum_i \epsilon_i \left( \ell_\beta(X_i, Y_i) - \ell_{\beta^*}(X_i, Y_i) \right),$$
indexed by $\beta$ in $T_\lambda = \{ \beta \in \mathbb{R}^d : \|\beta\|_{\ell_1^d} \le b,\ P\ell_\beta - P\ell_{\beta^*} \le \lambda \}$, is subgaussian with respect to a certain metric $d$. That is, for every $\beta_1, \beta_2 \in T_\lambda$ and $t \ge 1$,
$$\Pr\left( |Z_{\beta_1} - Z_{\beta_2}| \ge t \, d(\beta_1, \beta_2) \right) \le 2 \exp(-t^2 / 2).$$
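As a sanity check on this step (not from the talk), the Rademacher average of the $\ell_1$-constrained linear class can be estimated by simulation, using the exact duality $\sup_{\|\beta\|_1 \le b} \langle v, \beta \rangle = b \|v\|_\infty$; all parameters below are hypothetical:

```python
import numpy as np

def rademacher_complexity_l1(n=200, d=1000, b=1.0, trials=200, seed=0):
    """Monte Carlo estimate of E sup_{||beta||_1 <= b} (1/n) sum_i eps_i <X_i, beta>.

    For a linear class the supremum has a closed form:
    sup_{||beta||_1 <= b} <v, beta> = b * ||v||_inf, where v = (1/n) sum_i eps_i X_i.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))            # isotropic design, fixed across trials
    vals = []
    for _ in range(trials):
        eps = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        v = eps @ X / n
        vals.append(b * np.max(np.abs(v)))     # exact supremum over the l1 ball
    return float(np.mean(vals))

# Should scale roughly like b * sqrt(2 * log(2 * d) / n), as the theory predicts.
print(rademacher_complexity_l1())
```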
19 Proof ideas: 2. Symmetrization, subgaussian tails. The Rademacher process $Z_\beta = \sum_i \epsilon_i (\ell_\beta(X_i, Y_i) - \ell_{\beta^*}(X_i, Y_i))$, indexed by $\beta$ in $T_\lambda = \{ \beta \in \mathbb{R}^d : \|\beta\|_{\ell_1^d} \le b,\ P\ell_\beta - P\ell_{\beta^*} \le \lambda \}$, is subgaussian with respect to the metric
$$d(\beta_1, \beta_2) = 4 \max_i |\langle X_i, \beta_1 - \beta_2 \rangle| \left( \sup_{v \in \sqrt{\lambda} D \cap 2T} \sum_i \langle X_i, v \rangle^2 + \sum_i \ell_{\beta^*}(X_i, Y_i) \right)^{1/2}.$$
This is a random $\ell_\infty$ distance, scaled by the random $\ell_2$ diameter of $\sqrt{\lambda} D \cap 2T$. Here, $D = \{ x \in \mathbb{R}^d : P\langle X, x \rangle^2 \le 1 \}$.
20 Proof ideas: 3. Chaining. For a subgaussian process $\{Z_t\}$ indexed by a metric space $(T, d)$, and for $t_0 \in T$,
$$\mathbb{E} \sup_{t \in T} |Z_t - Z_{t_0}| \le c \, D(T, d) = c \int_0^{\mathrm{diam}(T, d)} \sqrt{\log N(\epsilon, T, d)} \, d\epsilon,$$
where $N(\epsilon, T, d)$ is the $\epsilon$-covering number of $T$.
21 Proof ideas: 4. Bounding the entropy integral. It suffices to calculate the entropy integral $D(\sqrt{\lambda} D \cap 2b B_1^d, d)$. We can approximate this by
$$D(\sqrt{\lambda} D \cap 2b B_1^d, d) \le \min\left( D(\sqrt{\lambda} D, d),\ D(2b B_1^d, d) \right).$$
This leads to:
$$P\ell_{\hat\beta} - P\ell_{\beta^*} \le \frac{c \log^\alpha(nd)}{\delta^2} \min\left( \frac{b^2}{\sqrt{n}} + \frac{d}{n},\ \frac{b}{\sqrt{n}} \left( 1 + \frac{b}{\sqrt{n}} \right) \right).$$
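For intuition on why only logarithmic factors in $d$ survive, here is one standard way such entropy integrals are bounded (Maurey's empirical method); a schematic computation under boundedness assumptions, not the paper's exact argument:
$$\log N\left( \epsilon, \, b B_1^d, \, L_2(P_n) \right) \le c \, \frac{b^2 R^2}{\epsilon^2} \log d, \qquad R = \max_j \left( \frac{1}{n} \sum_i X_{ij}^2 \right)^{1/2},$$
so the Dudley integrand is of order $b R \sqrt{\log d} / \epsilon$, and integrating between a cutoff $\delta_0$ and the diameter $\sigma$ costs only a logarithm:
$$\int_{\delta_0}^{\sigma} \sqrt{\log N(\epsilon)} \, d\epsilon \le c \, b R \sqrt{\log d} \, \log(\sigma / \delta_0).$$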
22 Proof ideas: 5. Oracle inequalities. We get an isomorphic condition on $\{\ell_\beta - \ell_{\beta^*}\}$:
$$\frac{1}{2} P_n(\ell_\beta - \ell_{\beta^*}) - \epsilon_n \le P(\ell_\beta - \ell_{\beta^*}) \le 2 P_n(\ell_\beta - \ell_{\beta^*}) + \epsilon_n,$$
and this implies that $\hat\beta = \arg\min_\beta (P_n \ell_\beta + c \epsilon_n)$ satisfies
$$P\ell_{\hat\beta} \le \inf_\beta \left( P\ell_\beta + c' \epsilon_n \right).$$
This leads to the oracle inequality: for $\ell_1$-regularized least squares, $\hat\beta = \arg\min_\beta (P_n \ell_\beta + \rho_n \|\beta\|_{\ell_1^{d_n}})$, with probability at least $1 - o(1)$,
$$P\ell_{\hat\beta} \le \inf_\beta \left( P\ell_\beta + c \rho_n \left( 1 + \|\beta\|_{\ell_1^{d_n}} \right) \right).$$
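Schematically (suppressing the dependence of $\epsilon_n$ on $\|\beta\|_{\ell_1}$, which is what produces the $(1 + \|\beta\|_{\ell_1^{d_n}})$ factor), the upper half of the isomorphic condition plus empirical minimality yields the risk bound in two steps:
$$P(\ell_{\hat\beta} - \ell_{\beta^*}) \le 2 P_n(\ell_{\hat\beta} - \ell_{\beta^*}) + \epsilon_n \le \epsilon_n,$$
since $P_n(\ell_{\hat\beta} - \ell_{\beta^*}) \le 0$ for the empirical minimizer; hence $P\ell_{\hat\beta} \le \inf_\beta (P\ell_\beta + c \epsilon_n)$ up to constants.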
23 Outline of talk. 1. For $\ell_1$-constrained least squares,
$$P\ell_{\hat\beta} - P\ell_{\beta^*} \le \frac{c \log^\alpha(nd)}{\delta^2} \min\left( \frac{b^2}{\sqrt{n}} + \frac{d}{n},\ \frac{b}{\sqrt{n}} \left( 1 + \frac{b}{\sqrt{n}} \right) \right).$$
Persistence: if $b_n = \tilde{o}(\sqrt{n})$, then $P\ell_{\hat\beta} - P\ell_{\beta^*} \to 0$. Convex aggregation: empirical risk minimization gives the optimal rate (up to log factors), $\tilde{O}(\min(d/n, \sqrt{\log d / n}))$. 2. For $\ell_1$-regularized least squares, oracle inequalities. 3. Proof ideas: subgaussian Rademacher process.