Robust Statistics, Revisited
Ankur Moitra (MIT), joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart
CLASSIC PARAMETER ESTIMATION
Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian, can we accurately estimate its parameters? Yes!
empirical mean: $\hat\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$; empirical variance: $\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \hat\mu)^2$
R. A. Fisher: The maximum likelihood estimator is asymptotically efficient.
J. W. Tukey: What about errors in the model itself? (1960)
ROBUST STATISTICS
What estimators behave well in a neighborhood around the model? Let's study a simple one-dimensional example.
ROBUST PARAMETER ESTIMATION
Given corrupted samples from a 1-D Gaussian (ideal model + noise = observed model), can we accurately estimate its parameters?
How do we constrain the noise? Equivalently: the $L_1$-norm of the noise is at most $O(\epsilon)$, i.e. an adversary can arbitrarily corrupt an $O(\epsilon)$-fraction of the samples (in expectation). This generalizes Huber's Contamination Model, in which the adversary can only add an ε-fraction of samples. Outliers: points the adversary has corrupted. Inliers: points he hasn't.
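To make the contamination model concrete, here is a small simulation sketch (Python with NumPy; the outlier location and scale are illustrative assumptions, not from the talk):

```python
import numpy as np

def huber_contaminated(n, d, eps, rng):
    """Draw n samples from N(0, I_d), then let an adversary arbitrarily
    corrupt an eps-fraction of them (in expectation), here by replacing
    them with a far-away cluster."""
    x = rng.standard_normal((n, d))            # inliers from the ideal model
    mask = rng.random(n) < eps                 # ~eps-fraction corrupted
    x[mask] = 10.0 + 0.1 * rng.standard_normal((mask.sum(), d))
    return x, ~mask                            # samples and inlier indicator

rng = np.random.default_rng(0)
samples, inliers = huber_contaminated(n=1000, d=2, eps=0.1, rng=rng)
print(f"{(~inliers).sum()} of {len(samples)} samples were corrupted")
```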
In what norm do we want the parameters to be close?
Definition: The total variation distance between two distributions with pdfs $f(x)$ and $g(x)$ is $d_{\mathrm{TV}}(f, g) = \frac{1}{2}\int |f(x) - g(x)|\,dx$.
From the bound on the $L_1$-norm of the noise, we have $d_{\mathrm{TV}}(\text{ideal}, \text{observed}) \le O(\epsilon)$.
Goal: Find a 1-D Gaussian that satisfies $d_{\mathrm{TV}}(\text{estimate}, \text{ideal}) \le O(\epsilon)$. Equivalently, find a 1-D Gaussian that satisfies $d_{\mathrm{TV}}(\text{estimate}, \text{observed}) \le O(\epsilon)$.
Do the empirical mean and empirical variance work? No! A single corrupted sample can arbitrarily corrupt the estimates. But the median and the median absolute deviation (MAD) do work.
Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian $N(\mu, \sigma^2)$, the median and MAD recover estimates $\hat\mu$ and $\hat\sigma$ that satisfy $|\hat\mu - \mu| \le O(\epsilon)\,\sigma$ and $|\hat\sigma - \sigma| \le O(\epsilon)\,\sigma$. This is also called (properly) agnostically learning a 1-D Gaussian.
What about robust estimation in high dimensions? E.g. microarrays with 10k genes.
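A quick numeric check of this fact (a sketch; 1.4826 ≈ 1/Φ⁻¹(3/4) is the usual consistency factor that makes the MAD estimate σ for a Gaussian):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, eps, n = 3.0, 2.0, 0.05, 100_000

x = mu + sigma * rng.standard_normal(n)
x[: int(eps * n)] = 1e6                  # an eps-fraction of wild outliers

print("empirical mean:", x.mean())       # destroyed by the outliers
print("empirical std: ", x.std())        # destroyed by the outliers

mu_hat = np.median(x)                    # robust location estimate
sigma_hat = 1.4826 * np.median(np.abs(x - mu_hat))  # scaled MAD
print("median:    ", mu_hat)             # close to mu = 3
print("scaled MAD:", sigma_hat)          # close to sigma = 2
```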
OUTLINE
Part I: Introduction. Robust Estimation in One Dimension; Robustness vs. Hardness in High Dimensions; Our Results.
Part II: Agnostically Learning a Gaussian. Parameter Distance; Detecting When an Estimator is Compromised; Filtering and Convex Programming; Unknown Covariance.
Part III: Experiments and Extensions.
Main Problem: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian $N(\mu, \Sigma)$, give an efficient algorithm to find parameters $\hat\mu, \hat\Sigma$ that satisfy $d_{\mathrm{TV}}(N(\hat\mu, \hat\Sigma), N(\mu, \Sigma)) \le \tilde O(\epsilon)$.
Special Cases: (1) unknown mean, $N(\mu, I)$; (2) unknown covariance, $N(0, \Sigma)$.
A COMPENDIUM OF APPROACHES

Unknown Mean       | Error Guarantee | Running Time
Tukey Median       | O(ε)            | NP-Hard
Geometric Median   | O(ε√d)          | poly(d, N)
Tournament         | O(ε)            | N^O(d)
Pruning            | O(ε√d)          | O(dN)
The Price of Robustness? All known estimators are either hard to compute or lose polynomial factors in the dimension. Equivalently: computationally efficient estimators can only handle an ε ≤ O(1/√d) fraction of errors and still get non-trivial ($d_{\mathrm{TV}} < 1$) guarantees. Is robust estimation algorithmically possible in high dimensions?
OUR RESULTS
Robust estimation in high dimensions is algorithmically possible!
Theorem [Diakonikolas, Li, Kamath, Kane, Moitra, Stewart '16]: There is an algorithm that, given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian $N(\mu, \Sigma)$, finds parameters $\hat\mu, \hat\Sigma$ that satisfy $d_{\mathrm{TV}}(N(\hat\mu, \hat\Sigma), N(\mu, \Sigma)) \le \tilde O(\epsilon)$. Moreover, the algorithm runs in time poly(N, d).
Alternatively: can approximate the Tukey median, etc., in interesting semi-random models.
Simultaneously, [Lai, Rao, Vempala '16] gave agnostic algorithms that achieve somewhat weaker error guarantees but also work for non-Gaussian distributions. Many other applications across both papers: product distributions, mixtures of spherical Gaussians, SVD, ICA.
A GENERAL RECIPE
Robust estimation in high dimensions:
Step #1: Find an appropriate parameter distance.
Step #2: Detect when the naïve estimator has been compromised.
Step #3: Find good parameters, or make progress. (Filtering: fast and practical. Convex programming: better sample complexity.)
Let's see how this works for unknown mean.
PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians.
A Basic Fact: (1) $d_{\mathrm{TV}}(N(\mu_1, I), N(\mu_2, I)) \le O(\|\mu_1 - \mu_2\|_2)$. This can be proven using Pinsker's Inequality and the well-known formula for the KL-divergence between Gaussians.
Corollary: If our estimate (in the unknown mean case) satisfies $\|\hat\mu - \mu\|_2 \le O(\epsilon)$, then $d_{\mathrm{TV}}(N(\hat\mu, I), N(\mu, I)) \le O(\epsilon)$. Our new goal is to be close in Euclidean distance.
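For reference, the two standard identities behind fact (1), written out (the slide asserts the fact; this is the implied derivation):

```latex
% KL divergence between identity-covariance Gaussians:
\mathrm{KL}\big(\mathcal{N}(\mu_1, I)\,\|\,\mathcal{N}(\mu_2, I)\big)
  = \tfrac{1}{2}\,\|\mu_1 - \mu_2\|_2^2
% Pinsker's inequality:
d_{\mathrm{TV}}(P, Q) \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}(P \,\|\, Q)}
% Combining the two:
d_{\mathrm{TV}}\big(\mathcal{N}(\mu_1, I), \mathcal{N}(\mu_2, I)\big)
  \le \tfrac{1}{2}\,\|\mu_1 - \mu_2\|_2
```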
DETECTING CORRUPTIONS
Step #2: Detect when the naïve estimator has been compromised. [Figure: scatter plot of uncorrupted vs. corrupted points] When the corruptions shift the empirical mean, there is a direction of large (> 1) variance.
Key Lemma: If $X_1, X_2, \dots, X_N$ come from a distribution that is ε-close to $N(\mu, I)$ and $N = \tilde\Omega(d/\epsilon^2)$, then with probability at least $1-\delta$: (1) the empirical mean satisfies $\|\hat\mu - \mu\|_2 \le O(\epsilon\sqrt{\log(1/\epsilon)})$ whenever (2) the empirical covariance satisfies $\|\hat\Sigma - I\|_2 \le O(\epsilon\log(1/\epsilon))$.
Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment.
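A sketch of what this detection step looks like in code (Python/NumPy; the threshold constant is an illustrative assumption, not the one from the paper):

```python
import numpy as np

def looks_compromised(x, eps):
    """Flag corruption via the spectral norm of the empirical covariance:
    for clean N(mu, I) data the top eigenvalue concentrates near 1."""
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / len(x)
    top = float(np.linalg.eigvalsh(cov)[-1])     # largest eigenvalue
    threshold = 1 + 10 * eps * np.log(1 / eps)   # illustrative constant
    return top > threshold, top

rng = np.random.default_rng(2)
clean = rng.standard_normal((20_000, 50))
dirty = clean.copy()
dirty[:2_000] += 2.0                  # shift 10% of the points
print(looks_compromised(clean, 0.1))  # (False, ~1.1)
print(looks_compromised(dirty, 0.1))  # (True, much larger)
```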
OUR ALGORITHM(S)
Step #3: Either find good parameters, or remove many outliers.
Filtering Approach: Suppose that $\|\hat\Sigma - I\|_2$ is too large. Then we can throw out more corrupted than uncorrupted points: project onto $v$, the direction of largest variance, and remove the points whose projections exceed a threshold $T$ (which has an explicit formula). If we continued like this for too long, we'd have no corrupted points left! So eventually we find (certifiably) good parameters.
Running Time: poly(N, d). Sample Complexity: $\tilde O(d/\epsilon^2)$, via concentration of LTFs (linear threshold functions).
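A minimal sketch of the filter for unknown mean (the certification threshold and the removal rule are simplified stand-ins; in particular, the real algorithm computes the threshold T from an explicit tail-bound formula rather than a fixed quantile):

```python
import numpy as np

def filter_mean(x, eps, max_iters=50):
    """Iteratively project onto the top-variance direction and remove the
    most extreme points until the empirical covariance looks Gaussian."""
    x = x.copy()
    for _ in range(max_iters):
        mu_hat = x.mean(axis=0)
        centered = x - mu_hat
        cov = centered.T @ centered / len(x)
        vals, vecs = np.linalg.eigh(cov)
        if vals[-1] <= 1 + 10 * eps * np.log(1 / eps):
            return mu_hat                      # certifiably good parameters
        v = vecs[:, -1]                        # direction of largest variance
        scores = np.abs(centered @ v)          # deviation along v
        T = np.quantile(scores, 1 - 2 * eps)   # crude stand-in for the real T
        x = x[scores <= T]                     # throw out the extreme points
    return x.mean(axis=0)

rng = np.random.default_rng(3)
data = rng.standard_normal((20_000, 50))
data[:2_000] += 2.0                            # 10% corrupted
print(np.linalg.norm(data.mean(axis=0)))       # naive mean: error ~1.4
print(np.linalg.norm(filter_mean(data, 0.1)))  # filtered mean: much smaller
```

Because each pass removes more corrupted than uncorrupted points, the loop must terminate with a certified estimate; the fixed quantile above is the simplification, not the termination argument.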
A GENERAL RECIPE (recap)
Step #1: Find an appropriate parameter distance. Step #2: Detect when the naïve estimator has been compromised. Step #3: Find good parameters, or make progress.
How about for unknown covariance?
PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians.
Another Basic Fact: (2) $d_{\mathrm{TV}}(N(0, \Sigma_1), N(0, \Sigma_2)) \le O(\|\Sigma_1^{-1/2}\,\Sigma_2\,\Sigma_1^{-1/2} - I\|_F)$. Again, proven using Pinsker's Inequality.
Our new goal is to find an estimate that satisfies $\|\Sigma^{-1/2}\,\hat\Sigma\,\Sigma^{-1/2} - I\|_F \le O(\epsilon)$. This distance seems strange, but it's the right one to use to bound TV.
UNKNOWN COVARIANCE
What if we are given samples from $N(0, \Sigma)$? How do we detect if the naïve estimator is compromised?
Key Fact: Let $X \sim N(0, \Sigma)$ and let $Z = X \otimes X$ be the flattening of $X X^T$. Then $\mathrm{Cov}(Z) = 2\,\Sigma \otimes \Sigma$, restricted to flattenings of d × d symmetric matrices. The proof uses Isserlis's Theorem. (The mean of $Z$, which is the flattening of $\Sigma$, needs to be projected out.)
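Here is the computation behind the Key Fact, sketched via Isserlis's theorem (standard moment identities; the restriction to symmetric test matrices is where the factor of 2 appears):

```latex
% Isserlis's theorem for X ~ N(0, \Sigma):
\mathbb{E}[X_i X_j X_k X_l]
  = \Sigma_{ij}\Sigma_{kl} + \Sigma_{ik}\Sigma_{jl} + \Sigma_{il}\Sigma_{jk}
% Hence for the flattening Z = X \otimes X, with E[Z] the flattening of \Sigma:
\mathrm{Cov}(Z)_{(ij),(kl)} = \Sigma_{ik}\Sigma_{jl} + \Sigma_{il}\Sigma_{jk}
% Tested against a symmetric matrix A = A^T:
\langle A, \mathrm{Cov}(Z)\,A\rangle = 2\,\mathrm{tr}(A\Sigma A\Sigma)
  = 2\,\langle A, (\Sigma\otimes\Sigma)\,A\rangle
```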
Key Idea: Transform the data and look for restricted large eigenvalues. If $\hat\Sigma$ were the true covariance, we would have $Y_i = \hat\Sigma^{-1/2} X_i \sim N(0, I)$ for the inliers, in which case $\mathrm{Cov}(Y_i \otimes Y_i)$ would have small restricted eigenvalues.
Take-away: An adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment.
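A sketch of this detection step for unknown covariance (Python/NumPy; the certification threshold is an illustrative assumption, and centering the flattened vectors plays the role of projecting out the mean direction):

```python
import numpy as np

def covariance_looks_compromised(x, slack=1.0):
    """Transform by the empirical covariance, then look for large
    eigenvalues of the covariance of the flattened second moments:
    for N(0, I) inliers the restricted eigenvalues are at most 2."""
    n, d = x.shape
    sigma_hat = x.T @ x / n
    w, u = np.linalg.eigh(sigma_hat)
    y = x @ u @ np.diag(w ** -0.5) @ u.T        # Y_i = Sigma_hat^{-1/2} X_i
    z = np.einsum("ni,nj->nij", y, y).reshape(n, d * d)
    z = z - z.mean(axis=0)                      # project out the mean direction
    cov_z = z.T @ z / n
    top = float(np.linalg.eigvalsh(cov_z)[-1])
    return top > 2 + slack, top

rng = np.random.default_rng(4)
clean = rng.standard_normal((50_000, 10))
dirty = clean.copy()
dirty[:5_000] *= 4.0                 # 10% of points get a blown-up covariance
print(covariance_looks_compromised(clean))  # (False, ~2)
print(covariance_looks_compromised(dirty))  # (True, much larger)
```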
ASSEMBLING THE ALGORITHM
Given samples that are ε-close in total variation distance to a d-dimensional Gaussian $N(\mu, \Sigma)$:
Step #1: Doubling trick: take differences of pairs of samples, so that $X_i - X_i' \sim N(0, 2\Sigma)$; now use the algorithm for unknown covariance.
Step #2: (Agnostic) isotropic position: rescale the samples by $\hat\Sigma^{-1/2}$ (this yields the right distance in the general case); now use the algorithm for unknown mean.
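A sketch of this assembly (the two subroutine names are hypothetical placeholders for the unknown-covariance and unknown-mean algorithms described above):

```python
import numpy as np

def robust_gaussian(x, unknown_covariance_algorithm, unknown_mean_algorithm):
    """Reduce the general case to the two special cases, as on the slides."""
    n, d = x.shape
    # Step #1 (doubling trick): differences of paired samples are mean-free,
    # X_i - X_i' ~ N(0, 2 Sigma); dividing by sqrt(2) restores covariance Sigma.
    half = n // 2
    diffs = (x[:half] - x[half : 2 * half]) / np.sqrt(2)
    sigma_hat = unknown_covariance_algorithm(diffs)
    # Step #2 (agnostic isotropic position): rescale so inliers are ~N(mu', I),
    # then estimate the mean in the rescaled coordinates.
    w, u = np.linalg.eigh(sigma_hat)
    whiten = u @ np.diag(w ** -0.5) @ u.T
    mu_whitened = unknown_mean_algorithm(x @ whiten)
    # Undo the rescaling to express the mean in the original coordinates.
    unwhiten = u @ np.diag(w ** 0.5) @ u.T
    return unwhiten @ mu_whitened, sigma_hat
```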
FURTHER RESULTS
Use restricted eigenvalue problems to detect outliers in: binary product distributions; mixtures of two c-balanced binary product distributions; mixtures of k spherical Gaussians.
SYNTHETIC EXPERIMENTS
Error rates on synthetic data (unknown mean), with 10% noise added.
[Plot: excess ℓ2 error vs. dimension, comparing Filtering, sample mean w/ noise, RANSAC, LRVMean, Pruning, and Geometric Median]
Error rates on synthetic data (unknown covariance, isotropic), with 10% noise close to the identity.
[Plot: excess ℓ2 error vs. dimension, comparing Filtering, sample covariance w/ noise, RANSAC, LRVCov, and Pruning]
Error rates on synthetic data (unknown covariance, anisotropic), with 10% noise far from the identity.
[Plot: excess ℓ2 error vs. dimension, comparing Filtering, sample covariance w/ noise, RANSAC, LRVCov, and Pruning]
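A tiny harness in the spirit of these plots (a sketch, not the paper's experimental setup: the noise model and the pruning rule are simplified illustrations):

```python
import numpy as np

def corrupted_gaussian(n, d, eps, rng):
    """n samples from N(0, I_d) with an eps-fraction replaced by a far cluster."""
    x = rng.standard_normal((n, d))
    x[: int(eps * n)] = 5.0            # adversarial cluster
    return x

def prune_mean(x, eps):
    """Naive pruning: drop the points farthest from the coordinate-wise median."""
    dist = np.linalg.norm(x - np.median(x, axis=0), axis=1)
    return x[dist <= np.quantile(dist, 1 - 2 * eps)].mean(axis=0)

rng = np.random.default_rng(5)
for d in [10, 50, 100, 200]:
    x = corrupted_gaussian(n=10 * d, d=d, eps=0.1, rng=rng)
    naive = np.linalg.norm(x.mean(axis=0))      # excess l2 error of sample mean
    pruned = np.linalg.norm(prune_mean(x, 0.1))
    print(f"d={d:4d}  sample mean error={naive:6.2f}  pruned error={pruned:6.2f}")
```

Even this crude setup shows the sample mean's excess error growing like √d with the dimension, which is the qualitative shape of the plots above.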
REAL DATA EXPERIMENTS
Famous study of [Novembre et al. '08]: take the top two singular vectors of the people × SNP matrix (POPRES). [Figure: Original Data] Genes mirror geography in Europe.
Can we find such patterns in the presence of noise (10% noise added)?
[Figure: Pruning projection, what PCA finds]
[Figure: RANSAC projection, what RANSAC finds]
[Figure: XCS projection, what robust PCA (via SDPs) finds]
[Figure: Filter projection, what our methods find]
The power of provably robust estimation: with 10% noise, the filter projection recovers what the original (no noise) data shows.
LOOKING FORWARD
Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high dimensions? Isn't this what we would have been doing with robust statistical estimators, if we had them all along?
Summary:
- Nearly optimal algorithm for agnostically learning a high-dimensional Gaussian
- General recipe using restricted eigenvalue problems
- Further applications to other mixture models
Is practical, robust statistics within reach? Thanks! Any questions?