New ways of dimension reduction? Cutting data sets into small pieces


1 New ways of dimension reduction? Cutting data sets into small pieces. Roman Vershynin, University of Michigan, Department of Mathematics. Statistical Machine Learning, Ann Arbor, June 5, 2012. Joint work with Yaniv Plan, University of Michigan, Mathematics.

2 Plan:
1. Dimension reduction: projecting vs. cutting data sets
2. Data recovery
3. Applications in sparse recovery
4. Applications in sparse binomial regression

3 Dimension reduction. Data set: $K \subset \mathbb{R}^n$. Dimension reduction is a map $\Phi : \mathbb{R}^n \to \mathbb{R}^m$ such that (1) $m \ll n$; (2) $\Phi$ preserves the structure (geometry) of $K$. Classical way: $\Phi$ linear; an orthogonal projection; random. Johnson-Lindenstrauss Lemma. Let $K \subset \mathbb{R}^n$ be a finite set. Consider an orthogonal projection $\Phi$ onto a random $m$-dimensional subspace of $\mathbb{R}^n$ (suitably rescaled). If $m \gtrsim \delta^{-2} \log|K|$, then with high probability, $\|\Phi(x) - \Phi(y)\|_2 = (1 \pm \delta)\,\|x - y\|_2$ for all $x, y \in K$. Thus $\Phi$ is an almost isometric embedding $K \to \mathbb{R}^m$.
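
A minimal numerical sketch of the JL statement (my own illustration, not from the slides: it uses the standard surrogate of a rescaled i.i.d. Gaussian matrix in place of a random orthogonal projection, and the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, delta = 1000, 50, 0.2               # ambient dimension, |K|, target distortion
m = int(np.ceil(np.log(N) / delta**2))    # m ~ delta^{-2} log|K|

K = rng.standard_normal((N, n))           # a finite data set K in R^n
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian surrogate for a random projection

worst = 0.0                               # compare all pairwise distances
for i in range(N):
    for j in range(i + 1, N):
        d_orig = np.linalg.norm(K[i] - K[j])
        d_emb = np.linalg.norm(Phi @ (K[i] - K[j]))
        worst = max(worst, abs(d_emb / d_orig - 1.0))
print(f"m = {m}, worst relative distortion = {worst:.3f}")   # on the order of delta
```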

4 An alternative way of dimension reduction: cut rather than project, using $m$ hyperplanes. For a pair $x, y \in K$, count $d_H(x,y) :=$ the number of hyperplanes separating $x$ and $y$. For simplicity, $K \subseteq S^{n-1}$. Lemma (Cutting finite sets). Let $K \subseteq S^{n-1}$ be a finite set, and consider $m \gtrsim \delta^{-2} \log|K|$ independent random hyperplanes. Then with high probability, $\frac{1}{m} d_H(x,y) = \frac{1}{\pi}\|x-y\|_2 \pm \delta$ for all $x, y \in K$. A similar result holds for $K \subset \mathbb{R}^n$. Does this give a dimension reduction of $K$?

5 $d_H(x,y) =$ number of separating hyperplanes $\approx \frac{m}{\pi}\|x-y\|_2$. Dimension reduction of $K$: for $x \in K$, record the orientations of $x$ with respect to the $m$ hyperplanes: $\Phi(x) :=$ the vector of orientations $\in \{-1,1\}^m$. Here $\{-1,1\}^m$ is the Hamming cube, and the Hamming distance satisfies $\mathrm{dist}(\Phi(x), \Phi(y)) = d_H(x,y)$. Conclusion of the Cutting Lemma: the map $\Phi : K \to \{-1,1\}^m$, $m \sim \log|K|$, is an almost isometric embedding.
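
A quick simulation of the cut embedding (a sketch under my own parameter choices; note that the exact mean of $\frac{1}{m} d_H(x,y)$ is the normalized angle $\frac{1}{\pi}\arccos\langle x,y\rangle$, which the slides approximate by $\frac{1}{\pi}\|x-y\|_2$ within the additive $\delta$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 20000                          # ambient dimension, number of hyperplanes

x = rng.standard_normal(n); x /= np.linalg.norm(x)   # two points on the sphere
y = rng.standard_normal(n); y /= np.linalg.norm(y)
A = rng.standard_normal((m, n))           # rows = normals of random hyperplanes through 0

# fraction of hyperplanes separating x and y
d_H = np.mean(np.sign(A @ x) != np.sign(A @ y))
angle = np.arccos(np.clip(x @ y, -1.0, 1.0))
print(f"d_H / m        = {d_H:.4f}")
print(f"angle / pi     = {angle / np.pi:.4f}")                  # exact mean of d_H / m
print(f"||x - y|| / pi = {np.linalg.norm(x - y) / np.pi:.4f}")  # the slides' proxy
```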

6 $\Phi : K \to \{-1,1\}^m$, $m \sim \log|K|$, is an almost isometric embedding. The target dimension $m \sim \log|K|$ is the same as binary encoding requires, hence optimal. Moreover, $\Phi$ is simpler than binary encoding: it is non-iterative and close to linear.

7 What is the cut embedding good for? First extend it to general sets $K \subset \mathbb{R}^n$, usually infinite. (The JL lemma is available for infinite sets [Klartag-Mendelson 05, Schechtman 06].) What measures the size of $K$? The width of $K$ in the direction of $\eta \in \mathbb{R}^n$ is $\sup_{u \in K} \langle \eta, u \rangle - \inf_{u \in K} \langle \eta, u \rangle = \sup_{x \in K-K} \langle \eta, x \rangle$. The mean width of $K$ averages over all directions: $w(K) = \mathbb{E} \sup_{x \in K-K} \langle g, x \rangle$, where $g \sim N(0, I_n)$.
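
Mean width is easy to estimate by Monte Carlo. A sketch for a finite $K \subseteq S^{n-1}$ (my own example; it uses the identity $\sup_{x \in K-K}\langle g, x\rangle = \max_{u \in K}\langle g, u\rangle - \min_{u \in K}\langle g, u\rangle$ from this slide):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, trials = 200, 1000, 500

K = rng.standard_normal((N, n))
K /= np.linalg.norm(K, axis=1, keepdims=True)     # finite K on the sphere S^{n-1}

widths = []
for _ in range(trials):
    g = rng.standard_normal(n)
    proj = K @ g                                   # <g, u> for every u in K
    widths.append(proj.max() - proj.min())         # sup over K-K of <g, .>
w = float(np.mean(widths))
print(f"w(K) ~ {w:.2f}   vs   sqrt(log|K|) = {np.sqrt(np.log(N)):.2f}")
```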

8 Mean width: w(k) = E sup x K K g, x where g N(0, I n). Observation. Let K S n If K is finite, then w(k) log K. 2. If dim K = k, then w(k) k. So w(k) 2 = effective dimension of K. Theorem (Cutting general sets). Let K S n 1, and consider m δ w(k) 2 independent random hyperplanes. Then with high probability, 1 m d H(x, y) = 1 π x y 2 ± δ for all x, y K. Conclusion: Φ : K { 1, 1} m is an almost isometric embedding. Corollary. If K is finite, then m log K, just like in JL.

9 Theorem (Cutting general sets). Let $K \subseteq S^{n-1}$, and consider $m \ge C(\delta)\, w(K)^2$ independent random hyperplanes. Then with high probability, $\frac{1}{m} d_H(x,y) = \frac{1}{\pi}\|x-y\|_2 \pm \delta$ for all $x, y \in K$. An immediate geometric consequence: Corollary (Cutting data sets into small pieces). These hyperplanes cut $K$ into pieces of diameter $\lesssim \delta$. Proof. If $x, y \in K$ lie in the same cell, then $d_H(x,y) = 0$, thus by the Theorem $\frac{1}{\pi}\|x-y\|_2 \le \delta$. Q.E.D.

10 $\Phi : K \to \{-1,1\}^m$, $m \sim w(K)^2$, is an almost isometric embedding. Data recovery. Recovery Problem: estimate $x \in K$ from $y = \Phi(x) \in \{-1,1\}^m$. Suppose $K$ is convex. (If not, pass to $\mathrm{conv}(K)$; the mean width won't change.) Corollary (Recovery). One can accurately estimate $x \in K$ from $y = \Phi(x)$ by solving the convex feasibility program: find $x' \in K$ subject to $y = \Phi(x')$. Indeed, the solution satisfies $\|x - x'\|_2 \lesssim \delta$. Proof. The feasible set of this program is some cell of $K$. Both $x$ and $x'$ belong to this cell. But as we know, all cells have diameter $\lesssim \delta$. Q.E.D.

11 Corollary (Recovery). One can estimate $x \in K$ from $y = \Phi(x) \in \{-1,1\}^m$ by solving the convex feasibility program: find $x' \in K$ subject to $y = \Phi(x')$. Indeed, $\|x - x'\|_2 \lesssim \delta$. Is this robust? No: flip a few bits of $y$ and the program becomes infeasible. Is robust recovery possible? Yes, after changing the viewpoint a little:

12 Linear algebraic view of cutting. An oriented hyperplane is given by its normal $a \in \mathbb{R}^n$; a random hyperplane by a random normal $a \sim N(0, I_n)$. Thus $m$ random hyperplanes are encoded by the $m \times n$ Gaussian random matrix of normals $A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}$. The vector of orientations of $x \in \mathbb{R}^n$ is $\Phi(x) = (\operatorname{sign}\langle a_i, x \rangle)_{i=1}^m = \operatorname{sign}(Ax)$.

13 $A$ is an $m \times n$ Gaussian matrix. Robust Recovery Problem: estimate $x \in K$ from $y = \operatorname{sign}(Ax) \in \{-1,1\}^m$ after some proportion of the bits of $y$ are corrupted (flipped). Answer: maximize the correlation with the data: $\max \langle Ax', y \rangle$ subject to $x' \in K$. Theorem (Robust recovery). Let $x \in K$ and $y = \operatorname{sign}(Ax)$. We corrupt $\tau m$ bits of $y$, getting $\tilde y$. One can still accurately estimate $x$ from $\tilde y$ by solving the convex program above (with $\tilde y$). Indeed, the solution satisfies $\|x - x'\|_2 \lesssim \delta + \tau \sqrt{\log(1/\tau)}$. The proof is based on the full power of the Cutting Theorem.
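
A sketch of the robust estimator in the simplest case $K = \{\|x'\|_2 \le 1\}$ (my simplification for illustration; the theorem is stated for a general convex $K$). For the unit ball, $\max \langle Ax', \tilde y \rangle$ has the closed-form maximizer $\hat x = A^T \tilde y / \|A^T \tilde y\|_2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, tau = 100, 5000, 0.05               # dimension, one-bit measurements, flip fraction

x = rng.standard_normal(n); x /= np.linalg.norm(x)    # signal on the sphere
A = rng.standard_normal((m, n))
y = np.sign(A @ x)

y_tilde = y.copy()                        # adversary flips tau*m bits
flip = rng.choice(m, size=int(tau * m), replace=False)
y_tilde[flip] *= -1

# max <Ax', y~> over the unit ball K = {||x'||_2 <= 1}: closed-form maximizer
x_hat = A.T @ y_tilde
x_hat /= np.linalg.norm(x_hat)
print(f"||x - x_hat||_2 = {np.linalg.norm(x - x_hat):.3f}")
```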

14 Applications in sparse recovery. Specialize to sparse $x$: few non-zeros, $\|x\|_0 \le s$. Let $S_{n,s} = \{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_0 \le s\}$. But $S_{n,s}$ is not convex. Convexify: $\mathrm{conv}(S_{n,s}) \subseteq K_{n,s} = \{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le \sqrt{s}\}$, the set of approximately sparse vectors. One has $w(K_{n,s}) \sim \sqrt{s \log(n/s)}$. Thus we have dimension reduction $K \to \{-1,1\}^m$ with $m \sim w(K)^2 \sim s \log(n/s)$. Note: $m$ is linear in the sparsity $s$.
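
Maximizing a linear functional over $K_{n,s}$ reduces to normalized soft-thresholding, with the threshold chosen by bisection so that the $\ell_1$ constraint holds. The helper below (`max_linear_over_K`, my own hypothetical implementation, not from the talk) is the only optimization oracle the later programs need; it also doubles as a Monte Carlo check that $w(K_{n,s})^2$ scales like $s \log(n/s)$, up to an absolute constant:

```python
import numpy as np

def max_linear_over_K(c, s, iters=60):
    """argmax of <c, x> over K_{n,s} = {||x||_2 <= 1, ||x||_1 <= sqrt(s)}.
    The maximizer is a normalized soft-thresholding of c; bisect on the
    threshold t until the l1 constraint is met."""
    budget = np.sqrt(s)
    x = c / np.linalg.norm(c)
    if np.abs(x).sum() <= budget:                 # l1 constraint inactive
        return x
    lo, hi = 0.0, np.abs(c).max()
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        z = np.sign(c) * np.maximum(np.abs(c) - t, 0.0)
        nz = np.linalg.norm(z)
        if nz == 0.0 or np.abs(z).sum() / nz > budget:
            lo = t                                # still violating: raise threshold
        else:
            hi = t                                # feasible: try a smaller threshold
    z = np.sign(c) * np.maximum(np.abs(c) - hi, 0.0)
    return z / np.linalg.norm(z)

# sanity check: w(K_{n,s})^2 grows like s log(n/s), up to an absolute constant
rng = np.random.default_rng(4)
n, s, trials = 1000, 10, 200
w = np.mean([2.0 * (g @ max_linear_over_K(g, s))  # K symmetric: sup over K-K = 2 sup over K
             for g in rng.standard_normal((trials, n))])
print(f"w(K_ns)^2 ~ {w**2:.0f}   vs   s log(n/s) = {s * np.log(n / s):.0f}")
```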

15 Approximately sparse: $K_{n,s} = \{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le \sqrt{s}\}$; $m \sim w(K)^2 \sim s \log(n/s)$. Specialize the Robust Recovery Theorem to the sparse case: Corollary (Sparse recovery). Let $x$ be approximately sparse, $x \in K_{n,s}$, and let $y = \operatorname{sign}(Ax) \in \{-1,1\}^m$, where $m \ge C(\delta)\, s \log(n/s)$. We can estimate $x$ from $y$ by solving the convex program $\max \langle Ax', y \rangle$ subject to $x' \in K_{n,s}$. Indeed, the solution satisfies $\|x - x'\|_2 \le \delta$. The recovery is robust as before (i.e., one can flip bits of $y$).
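
A sparse one-bit recovery sketch using the `max_linear_over_K` oracle from the previous sketch (parameter values are my own choices; by linearity, $\max \langle Ax', y \rangle$ over $K_{n,s}$ equals $\max \langle A^T y, x' \rangle$ over $K_{n,s}$):

```python
import numpy as np
# reuses max_linear_over_K from the sketch above

rng = np.random.default_rng(5)
n, s = 500, 5
m = 40 * int(s * np.log(n / s))           # m ~ s log(n/s), with a generous constant

x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
x /= np.linalg.norm(x)                    # exactly s-sparse unit vector, so x in K_{n,s}

A = rng.standard_normal((m, n))
y = np.sign(A @ x)                        # one-bit measurements

# max <Ax', y> over K_{n,s}  ==  max <A^T y, x'> over K_{n,s}
x_hat = max_linear_over_K(A.T @ y, s)
print(f"||x - x_hat||_2 = {np.linalg.norm(x - x_hat):.3f}")
```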

16 Single-bit compressed sensing. Traditional compressed sensing: recover an $s$-sparse signal $x \in \mathbb{R}^n$ from $m$ linear measurements given by $y = Ax \in \mathbb{R}^m$. Available results: recovery by convex programming with $m \sim s \log(n/s)$. Single-bit compressed sensing: recover an $s$-sparse signal $x \in \mathbb{R}^n$ from $m$ single-bit measurements given by $y = \operatorname{sign}(Ax) \in \{-1,1\}^m$. This is an extreme form of measurement quantization (A/D conversion). [Boufounos-Baraniuk 08] formulated single-bit CS, with connections to embeddings into the Hamming cube and algorithms; see also [Gupta-Nowak-Recht 10, Jacques-Laska-Boufounos-Baraniuk 11]. No tractable algorithms had been known (unless $x$ has constant dynamic range, or for adaptive measurements). Present work: robust sparse recovery via convex programming.

17 Applications in binomial regression. Our model of $m$ one-bit measurements was $y_i = \operatorname{sign}\langle a_i, x \rangle$, $i = 1, \dots, m$, with $a_i \sim N(0, I_n)$. A more general stochastic model: the $y_i = \pm 1$ are random variables, independent given $\{a_i\}$, with $\mathbb{E}[y_i \mid a_i] = \theta(\langle a_i, x \rangle)$, $i = 1, \dots, m$. Here $\theta(u)$ is some function satisfying the correlation assumption: $\mathbb{E}[\theta(g) g] =: \lambda > 0$ for $g \sim N(0,1)$. Reason: this ensures positive correlation with the data. Since $\langle a_i, x \rangle \sim N(0,1)$ (for $\|x\|_2 = 1$), we have $\mathbb{E}[y_i \langle a_i, x \rangle] = \mathbb{E}[\theta(g) g] = \lambda$.

18 Model: $\mathbb{E}[y_i \mid a_i] = \theta(\langle a_i, x \rangle)$, $i = 1, \dots, m$. Correlation assumption: $\mathbb{E}[\theta(g) g] =: \lambda > 0$. This is the generalized linear model (GLM) with link function $\theta^{-1}$. Example. $\theta(z) = \tanh(z/2)$ gives logistic regression: $\mathbb{P}\{y_i = 1\} = f(\langle a_i, x \rangle)$, where $f(z) = \frac{e^z}{e^z + 1}$. Statistical notation: $x$ = the unknown coefficient vector ($\beta$), $y_i$ = binary response variables, $a_i$ = independent variables ($x_i$); $\mathbb{P}\{y_i = 1\} = f\big(\sum_{j=1}^n \beta_j x_{ij}\big)$, $i = 1, \dots, m$. Recent work on sparse logistic regression: [Negahban-Ravikumar-Wainwright-Yu 11, Bunea 08, Van De Geer 08, Bach 10, Ravikumar-Wainwright-Lafferty 10, Meier-Van De Geer-Bühlmann 08, Kakade-Shamir-Sridharan-Tewari 11].
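
A small check of the logistic example (my own sketch): if $\mathbb{P}\{y = 1 \mid z\} = e^z/(e^z+1)$, then $\mathbb{E}[y \mid z] = 2f(z) - 1 = \tanh(z/2)$, and the correlation constant $\lambda = \mathbb{E}[\theta(g) g]$ is strictly positive:

```python
import numpy as np

rng = np.random.default_rng(6)

# link check: P{y=1 | z} = e^z/(e^z+1)  =>  E[y | z] = 2 f(z) - 1 = tanh(z/2)
z = 1.3
f = np.exp(z) / (np.exp(z) + 1.0)
print(2.0 * f - 1.0, np.tanh(z / 2.0))    # identical values

# correlation assumption: lambda = E[theta(g) g] > 0 for g ~ N(0,1)
g = rng.standard_normal(1_000_000)
lam = np.mean(np.tanh(g / 2.0) * g)
print(f"lambda = {lam:.3f}")              # strictly positive
```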

19 GLM: $\mathbb{E}[y_i \mid a_i] = \theta(\langle a_i, x \rangle)$, $i = 1, \dots, m$. Correlation assumption: $\mathbb{E}[\theta(g) g] =: \lambda > 0$. Theorem (Sparse binomial regression). Suppose we have a GLM whose coefficient vector $x \in \mathbb{R}^n$, $\|x\|_2 = 1$, is approximately $s$-sparse: $x \in K_{n,s}$. If the sample size satisfies $m \ge C(\delta)\, s \log(n/s)$, then we can estimate $x$ by solving the convex program $\max \sum_{i=1}^m y_i \langle a_i, x' \rangle$ subject to $x' \in K_{n,s}$. Indeed, the solution satisfies $\|x - x'\|_2 \le \delta/\lambda$ w.h.p. In statistics notation, the sample size is $n \sim s \log(p/s)$, so $n \ll p$ suffices. A new, unusual feature: knowledge of the link function $\theta$ is not needed (unlike in maximum-likelihood approaches). Here even the form of the GLM may be unknown; the solution is non-parametric.
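
A sketch of the link-free estimator on simulated logistic data (again reusing the hypothetical `max_linear_over_K` oracle, with my own parameter choices). Note that the estimator never evaluates the link $\theta$:

```python
import numpy as np
# reuses max_linear_over_K from the sparse-recovery sketch

rng = np.random.default_rng(7)
n, s = 500, 5
m = 80 * int(s * np.log(n / s))           # a larger sample to offset lambda < 1

x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
x /= np.linalg.norm(x)

A = rng.standard_normal((m, n))
p = 1.0 / (1.0 + np.exp(-(A @ x)))        # logistic model: P{y_i = 1}
y = np.where(rng.random(m) < p, 1.0, -1.0)

x_hat = max_linear_over_K(A.T @ y, s)     # max sum_i y_i <a_i, x'> over K_{n,s}
print(f"||x - x_hat||_2 = {np.linalg.norm(x - x_hat):.3f}")
```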

20 Summary. JL lemma: dimension reduction $K \to \mathbb{R}^m$ by projecting $K$ onto an $m$-dimensional subspace. Alternative way: dimension reduction $K \to \{-1,1\}^m$ by cutting $K$ into small pieces with $m$ hyperplanes. Dimension reduction map: $y = \operatorname{sign}(Ax)$, where $A$ is an $m \times n$ random Gaussian matrix. Target dimension: $m \sim w(K)^2$, the effective dimension of $K$. If $K$ = {approximately $s$-sparse vectors}, then $m \sim s \log(n/s)$. One can accurately and robustly estimate $x$ from $y$ by a convex program. More generally, one can accurately estimate a sparse solution to the GLM $\mathbb{E}[y] = \theta(Ax)$, even without knowing the link function $\theta$.
