Robustness Meets Algorithms
Robustness Meets Algorithms. Ankur Moitra (MIT). ICML 2017 Tutorial, August 6th.
CLASSIC PARAMETER ESTIMATION

Given samples X₁, ..., X_N from an unknown distribution in some class, e.g. a 1-D Gaussian N(μ, σ²), can we accurately estimate its parameters? Yes! Use the
empirical mean: μ̂ = (1/N) Σᵢ Xᵢ
empirical variance: σ̂² = (1/N) Σᵢ (Xᵢ − μ̂)²
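As a concrete baseline, the classical estimates are one-liners. A minimal sketch (the sample size and the parameters N(5, 4) are illustrative choices, not from the talk):

```python
import random

def empirical_mean_and_variance(samples):
    """Maximum-likelihood estimates for a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

# Draw samples from N(5, 2^2) and recover the parameters.
rng = random.Random(0)
xs = [rng.gauss(5.0, 2.0) for _ in range(100_000)]
mu_hat, var_hat = empirical_mean_and_variance(xs)
```

With clean samples both estimates converge at the usual 1/√N rate; the rest of the talk is about what happens when the samples are not clean.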
R. A. Fisher: the maximum likelihood estimator is asymptotically efficient. J. W. Tukey: but what about errors in the model itself? (1960)
ROBUST STATISTICS

What estimators behave well in a neighborhood around the model? Let's study a simple one-dimensional example.
ROBUST PARAMETER ESTIMATION

Given corrupted samples from a 1-D Gaussian (ideal model + noise = observed model), can we accurately estimate its parameters?
How do we constrain the noise? Equivalently: the L1-norm of the noise is at most O(ε), i.e. an adversary can arbitrarily corrupt an O(ε)-fraction of the samples (in expectation). This generalizes Huber's Contamination Model, in which an adversary can add an ε-fraction of samples. Outliers: points the adversary has corrupted. Inliers: points he hasn't.
In what norm do we want the parameters to be close?

Definition: the total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx.

From the bound on the L1-norm of the noise, we have d_TV(ideal, observed) ≤ O(ε).

Goal: find a 1-D Gaussian that satisfies d_TV(estimate, ideal) ≤ O(ε). Equivalently (by the triangle inequality), find a 1-D Gaussian that satisfies d_TV(estimate, observed) ≤ O(ε).
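The definition is easy to check numerically against the known closed form for equal-variance 1-D Gaussians, d_TV = erf(|μ₁ − μ₂| / (2√2 σ)). A small sketch (the integration range and step count are arbitrary choices):

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def tv_distance_numeric(mu1, mu2, sigma=1.0, lo=-20.0, hi=20.0, steps=200_000):
    """d_TV = (1/2) * integral of |f - g|, via a midpoint Riemann sum."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += abs(gauss_pdf(x, mu1, sigma) - gauss_pdf(x, mu2, sigma)) * dx
    return 0.5 * total

def tv_distance_exact(mu1, mu2, sigma=1.0):
    """Closed form for two Gaussians with the same variance."""
    return math.erf(abs(mu1 - mu2) / (2 * math.sqrt(2) * sigma))

numeric = tv_distance_numeric(0.0, 1.0)
exact = tv_distance_exact(0.0, 1.0)
```

For N(0, 1) versus N(1, 1) both give about 0.38, comfortably below 1: the two models overlap substantially, which is what makes estimation from corrupted samples possible at all.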
Do the empirical mean and empirical variance work? No! A single corrupted sample can arbitrarily corrupt the estimates. But the median and the median absolute deviation (MAD) do work.
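A quick simulation illustrates the contrast (the 5% corruption level and the MAD rescaling constant 1.4826 ≈ 1/Φ⁻¹(3/4) are standard choices, not from the slides):

```python
import random
import statistics

def robust_estimates(samples):
    """Median and MAD-based estimates for a 1-D Gaussian.
    The MAD is rescaled by ~1.4826 so it is consistent for sigma."""
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples)
    return med, 1.4826 * mad

rng = random.Random(1)
inliers = [rng.gauss(0.0, 1.0) for _ in range(9_500)]
outliers = [1e6] * 500                    # a 5% fraction, arbitrarily corrupted
data = inliers + outliers

mu_naive = statistics.mean(data)          # dragged to ~50,000 by the outliers
mu_robust, sigma_robust = robust_estimates(data)
```

The empirical mean is destroyed, while the median and rescaled MAD land within O(ε) of the true parameters (0 and 1).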
Fact [Folklore]: given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε), where μ̂ is the median of the samples and σ̂ is the MAD rescaled by 1/Φ⁻¹(3/4). This is also called (properly) agnostically learning a 1-D Gaussian.

What about robust estimation in high dimensions? E.g. microarrays with 10k genes.
OUTLINE

Part I: Introduction
- Robust Estimation in One Dimension
- Robustness vs. Hardness in High Dimensions
- Recent Results

Part II: Agnostically Learning a Gaussian
- Parameter Distance
- Detecting When an Estimator is Compromised
- A Win-Win Algorithm
- Unknown Covariance

Part III: Experiments

Part IV: Extensions
Main Problem: given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε).

Special cases: (1) unknown mean, N(μ, I); (2) unknown covariance, N(0, Σ).
A COMPENDIUM OF APPROACHES

Unknown Mean     | Error Guarantee | Running Time
Tukey Median     | O(ε)            | NP-hard
Geometric Median | O(ε√d)          | poly(N, d)
Tournament       | O(ε)            | N^O(d)
Pruning          | O(ε√d)          | O(dN)
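Of the rows above, pruning is the simplest to sketch: discard points far from a cheap robust center, then average what is left. A minimal illustration (the radius rule 3√d and all the numbers here are illustrative choices, not the talk's exact procedure; the dimension-dependent radius is exactly where the O(ε√d) loss creeps in):

```python
import random

def prune(points, radius):
    """Keep points within `radius` (Euclidean) of the coordinate-wise median."""
    d = len(points[0])
    med = [sorted(p[j] for p in points)[len(points) // 2] for j in range(d)]
    kept = [p for p in points
            if sum((p[j] - med[j]) ** 2 for j in range(d)) <= radius ** 2]
    return kept, med

rng = random.Random(2)
d = 10
inliers = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(950)]
outliers = [[100.0] * d for _ in range(50)]       # a 5% fraction, far away
kept, _ = prune(inliers + outliers, radius=3.0 * d ** 0.5)

mean_pruned = [sum(p[j] for p in kept) / len(kept) for j in range(d)]
```

Gross outliers are removed and the pruned mean is accurate, but an adversary who hides corruptions just inside the radius can still shift the mean by about ε√d per direction, matching the table's error guarantee.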
The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension. Equivalently: computationally efficient estimators can only handle an ε ~ 1/√d fraction of errors and still get non-trivial (d_TV < 1) guarantees. Is robust estimation algorithmically possible in high dimensions?
RECENT RESULTS

Robust estimation in high dimensions is algorithmically possible!

Theorem [Diakonikolas, Kamath, Kane, Li, Moitra, Stewart '16]: there is an algorithm which, given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), finds parameters that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε). Moreover the algorithm runs in time poly(N, d).

Alternatively: this lets one approximate the Tukey median, etc., in interesting semi-random models.
Independently and concurrently:

Theorem [Lai, Rao, Vempala '16]: there is an algorithm which, given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters with error O(ε√(log d)). Moreover the algorithm runs in time poly(N, d). When the covariance is bounded, this translates to d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ O(ε√(log d)).
A GENERAL RECIPE

Robust estimation in high dimensions:
Step #1: Find an appropriate parameter distance.
Step #2: Detect when the naive estimator has been compromised.
Step #3: Find good parameters, or make progress.
Two instantiations of Step #3: filtering (fast and practical) and convex programming (better sample complexity).

Let's see how this works for the unknown mean case.
PARAMETER DISTANCE

Step #1: find an appropriate parameter distance for Gaussians.

A Basic Fact: (1) d_TV(N(μ₁, I), N(μ₂, I)) ≤ (1/2) ‖μ₁ − μ₂‖₂. This can be proven using Pinsker's inequality and the well-known formula for the KL-divergence between Gaussians.

Corollary: if our estimate (in the unknown mean case) satisfies ‖μ̂ − μ‖₂ ≤ O(ε), then d_TV(N(μ̂, I), N(μ, I)) ≤ O(ε). Our new goal is to be close in Euclidean distance.
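The two ingredients combine as follows (a standard derivation, with KL measured in nats):

```latex
% KL divergence between identity-covariance Gaussians (standard formula):
\[
\mathrm{KL}\left(\mathcal{N}(\mu_1, I)\,\middle\|\,\mathcal{N}(\mu_2, I)\right)
  = \tfrac{1}{2}\,\|\mu_1 - \mu_2\|_2^2
\]
% Pinsker's inequality bounds total variation by KL:
\[
d_{\mathrm{TV}}\left(\mathcal{N}(\mu_1, I), \mathcal{N}(\mu_2, I)\right)
  \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
  = \sqrt{\tfrac{1}{2}\cdot\tfrac{1}{2}\,\|\mu_1 - \mu_2\|_2^2}
  = \tfrac{1}{2}\,\|\mu_1 - \mu_2\|_2
\]
```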
DETECTING CORRUPTIONS

Step #2: detect when the naive estimator has been compromised. Picture a cloud of uncorrupted points with a clump of corrupted points off to one side: whenever the corruptions move the empirical mean, they create a direction of large (> 1) variance.
Key Lemma: if X₁, X₂, ..., X_N come from a distribution that is ε-close to N(μ, I), then for N = Ω̃(d/ε²), with probability at least 1 − δ: (1) if the empirical mean μ̂ has been moved far from μ, then (2) the empirical covariance has an eigenvalue larger than 1 + Ω(‖μ̂ − μ‖₂² / ε).

Take-away: an adversary needs to mess up the second moment in order to corrupt the first moment.
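The take-away is easy to see in simulation: shifting a small fraction of the points drags the mean and simultaneously inflates the top eigenvalue of the empirical covariance. A sketch (the dimension, sample size, and shift magnitude 5 are arbitrary illustrative choices):

```python
import numpy as np

def top_eigenvalue_of_empirical_covariance(X, mu_hat):
    """Largest eigenvalue of the empirical second-moment matrix about mu_hat."""
    Y = X - mu_hat
    cov = Y.T @ Y / len(X)
    return float(np.linalg.eigvalsh(cov)[-1])

rng = np.random.default_rng(0)
d, n, eps = 20, 20_000, 0.1
clean = rng.standard_normal((n, d))            # inliers ~ N(0, I)
shifted = clean.copy()
shifted[: int(eps * n)] += 5.0 * np.eye(d)[0]  # move a 10% fraction along e_1

lam_clean = top_eigenvalue_of_empirical_covariance(clean, clean.mean(axis=0))
lam_shifted = top_eigenvalue_of_empirical_covariance(shifted, shifted.mean(axis=0))
```

With clean data the top eigenvalue hovers near 1; the shifted data shows an eigenvalue near 3, a loud certificate that the empirical mean cannot be trusted.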
A WIN-WIN ALGORITHM

Step #3: either find good parameters, or remove many outliers.

Filtering approach: suppose that the empirical covariance has an eigenvalue noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points: project onto the direction v of largest variance and discard the points whose projections exceed a threshold T, where T has an explicit formula. If we continued like this for too long, we'd have no corrupted points left! So eventually the test passes and we find (certifiably) good parameters.

Running Time: poly(N, d). Sample Complexity: Õ(d/ε²), via concentration of LTFs (linear threshold functions).
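A toy version of the filtering loop, assuming identity-covariance inliers. The fixed 5% trimming rule and the 1.3 stopping threshold below are crude stand-ins for the paper's data-driven threshold T; this illustrates the win-win structure, not the actual algorithm:

```python
import numpy as np

def filter_mean(X, threshold=1.3, max_rounds=50):
    """Iteratively trim along the top-variance direction until the empirical
    covariance has no eigenvalue much larger than 1 (inliers ~ N(mu, I))."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_rounds):
        mu = X.mean(axis=0)
        centered = X - mu
        cov = centered.T @ centered / len(X)
        vals, vecs = np.linalg.eigh(cov)
        if vals[-1] <= threshold:
            return mu                     # certifiably fine: no large-variance direction
        v = vecs[:, -1]                   # direction of largest variance
        scores = np.abs(centered @ v)
        # Crude rule: drop the most extreme 5% of projections. The real
        # algorithm instead picks a threshold T with an explicit formula.
        cutoff = np.quantile(scores, 0.95)
        X = X[scores <= cutoff]
    return X.mean(axis=0)

rng = np.random.default_rng(1)
d, n = 20, 20_000
data = rng.standard_normal((n, d))
data[: n // 10, 0] += 5.0                 # corrupt 10% of the samples along e_1

mu_naive = data.mean(axis=0)              # dragged ~0.5 toward the outliers
mu_filtered = filter_mean(data)
```

Each round removes mostly corrupted points (they dominate the extreme projections), so after a few rounds the variance test passes and the returned mean is close to the truth.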
A GENERAL RECIPE

Recall the recipe: Step #1, find an appropriate parameter distance; Step #2, detect when the naive estimator has been compromised; Step #3, find good parameters or make progress (filtering: fast and practical; convex programming: better sample complexity). How about for unknown covariance?
PARAMETER DISTANCE

Step #1: find an appropriate parameter distance for Gaussians.

Another Basic Fact: (2) d_TV(N(0, Σ₁), N(0, Σ₂)) ≤ O(‖Σ₁^(−1/2) Σ₂ Σ₁^(−1/2) − I‖_F). Again, proven using Pinsker's inequality.

Our new goal is to find an estimate Σ̂ that satisfies ‖Σ^(−1/2) Σ̂ Σ^(−1/2) − I‖_F ≤ O(ε). The distance seems strange, but it's the right one to use to bound total variation.
UNKNOWN COVARIANCE

What if we are given samples from N(0, Σ)? How do we detect if the naive estimator (the empirical covariance) is compromised?

Key Fact: let X ~ N(0, Σ) and let Z = X ⊗ X be the flattening of XXᵀ. Then, restricted to flattenings of d × d symmetric matrices, Cov(Z) = 2 Σ ⊗ Σ. The proof uses Isserlis's theorem. (The mean E[Z], the flattening of Σ, needs to be projected out.)
Key Idea: transform the data, and look for restricted large eigenvalues.

Transform each sample to Yᵢ = Σ̂^(−1/2) Xᵢ. If Σ̂ were the true covariance, we would have Yᵢ ~ N(0, I) for the inliers, in which case the empirical covariance of the flattenings Yᵢ ⊗ Yᵢ would have small restricted eigenvalues.

Take-away: an adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment.
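The take-away can be sanity-checked numerically: for correctly transformed inliers y ~ N(0, I) and a unit vector v, the restricted fourth-moment statistic Var((v·y)²) is about 2, and corrupting the variance in direction v inflates it. A sketch with illustrative numbers:

```python
import numpy as np

def restricted_fourth_moment_stat(Y, v):
    """Variance of (v . y)^2 over the data; ~= 2 when y ~ N(0, I), |v| = 1."""
    p = (Y @ v) ** 2
    return float(p.var())

rng = np.random.default_rng(2)
d, n = 10, 50_000
v = np.eye(d)[0]

inliers = rng.standard_normal((n, d))     # correctly transformed: N(0, I)
corrupted = inliers.copy()
corrupted[: n // 10, 0] *= 3.0            # 10% of points get variance 9 along e_1

stat_clean = restricted_fourth_moment_stat(inliers, v)
stat_corrupted = restricted_fourth_moment_stat(corrupted, v)
```

The clean statistic sits near 2 while the corrupted one jumps past 20, so a large restricted eigenvalue of the flattened second moment flags a compromised covariance estimate.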
ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ):

Step #1: Doubling trick. Pair up samples and take differences (X₂ᵢ − X₂ᵢ₊₁)/√2 ~ N(0, Σ), eliminating the unknown mean; now use the algorithm for unknown covariance.

Step #2: (Agnostic) isotropic position. Apply Σ̂^(−1/2) to the samples, which makes Euclidean distance the right distance in the general case; now use the algorithm for unknown mean.
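Step #1 can be sketched in a few lines: pairing samples and differencing cancels the unknown mean while preserving the covariance (the specific parameters below are illustrative):

```python
import numpy as np

def doubling_trick(X):
    """Pair consecutive samples; (X_{2i} - X_{2i+1}) / sqrt(2) ~ N(0, Sigma),
    so a covariance routine can run before the mean is known."""
    m = (len(X) // 2) * 2                 # drop a trailing unpaired sample
    return (X[0:m:2] - X[1:m:2]) / np.sqrt(2)

rng = np.random.default_rng(3)
n = 100_000
mu = np.array([10.0, -4.0])
X = rng.standard_normal((n, 2)) * np.array([1.0, 2.0]) + mu   # N(mu, diag(1, 4))

Z = doubling_trick(X)
cov_Z = Z.T @ Z / len(Z)                  # ~ diag(1, 4), mean of Z ~ 0
```

Note that differencing pairs at most doubles the fraction of corrupted points, which the downstream robust routine absorbs into its constants.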
SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown mean), with 10% noise added. The plots show excess ℓ₂ error vs. dimension for: Filtering, Sample mean w/ noise, RANSAC, LRVMean, Pruning, Geometric Median.
SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown covariance, isotropic: Σ close to the identity), with 10% noise added. The plots show excess ℓ₂ error vs. dimension for: Filtering, Sample covariance w/ noise, RANSAC, LRVCov, Pruning.
SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown covariance, anisotropic: Σ far from the identity), with 10% noise added. The plots show excess ℓ₂ error vs. dimension for the same methods: Filtering, Sample covariance w/ noise, RANSAC, LRVCov, Pruning.
REAL DATA EXPERIMENTS

Famous study of [Novembre et al. '08]: take the top two singular vectors of the (people x SNP) matrix (POPRES). In the original data, genes mirror geography in Europe.
REAL DATA EXPERIMENTS

Can we find such patterns in the presence of noise? With 10% noise added:
- Pruning projection: what PCA finds.
- RANSAC projection: what RANSAC finds.
- XCS projection: what robust PCA (via SDPs) finds.
- Filter projection: what our methods find.

The power of provably robust estimation: with 10% noise, the filter projection recovers what the original (no noise) data shows.
LOOKING FORWARD

Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high dimensions? Isn't this what we would have been doing with robust statistical estimators, if we had them all along?
LIMITATIONS TO ROBUST ESTIMATION

Theorem [Diakonikolas, Kane, Stewart '16]: any statistical query learning* algorithm in the strong corruption model (insertions and deletions) that makes error o(ε√(log 1/ε)) must make a super-polynomial number of queries.

* Instead of seeing samples directly, the algorithm queries a function and gets its expectation, up to sampling noise. This is a powerful but restricted class of algorithms.
HANDLING MORE CORRUPTIONS

What if an adversary is allowed to corrupt more than half of the samples? Then no single answer can be correct, but list-decoding is possible.

Theorem [Charikar, Steinhardt, Valiant '17]: given samples from a distribution with mean μ and covariance Σ, where all but an α-fraction have been corrupted, there is an algorithm that outputs a list of O(1/α) candidate means, one of which is close to μ. This extends to mixtures straightforwardly.
SPARSE ROBUST ESTIMATION

Can we improve the sample complexity with sparsity assumptions?

Theorem [Li '17] [Du, Balakrishnan, Singh '17]: there is an algorithm that, in the unknown k-sparse mean case, achieves error Õ(ε) with Õ(k² log d / ε²) samples. [Li '17] also studied robust sparse PCA.

Is it possible to improve the sample complexity to Õ(k log d / ε²), or are there computational vs. statistical tradeoffs?
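To make the setting concrete, here is a much simpler baseline than the cited algorithms: a coordinate-wise median, hard-thresholded to its k largest entries (all parameters are illustrative, and this sketch does not achieve the theorem's guarantees):

```python
import numpy as np

def robust_sparse_mean(X, k):
    """Coordinate-wise median, hard-thresholded to its k largest entries."""
    med = np.median(X, axis=0)
    est = np.zeros_like(med)
    top = np.argsort(np.abs(med))[-k:]    # indices of the k largest magnitudes
    est[top] = med[top]
    return est

rng = np.random.default_rng(4)
d, n, k = 200, 5_000, 5
mu = np.zeros(d)
mu[:k] = 3.0                              # a k-sparse mean
X = rng.standard_normal((n, d)) + mu
X[: n // 10] = 50.0                       # corrupt 10% of the samples

est = robust_sparse_mean(X, k)
```

The medians survive the corruption, so thresholding recovers the support; the point of the cited work is to get dimension-independent error with far fewer than d samples' worth of information.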
LOOKING FORWARD

What other fundamental tasks in high-dimensional statistics can be solved provably and robustly?
Summary:
- Dimension-independent error bounds for robustly learning a Gaussian
- A general recipe using restricted eigenvalue problems
- SQ lower bounds, handling more corruptions, and sparse robust estimation
- Is practical, robust statistics within reach?

Thanks! Any questions?
More informationDisentangling Gaussians
Disentangling Gaussians Ankur Moitra, MIT November 6th, 2014 Dean s Breakfast Algorithmic Aspects of Machine Learning 2015 by Ankur Moitra. Note: These are unpolished, incomplete course notes. Developed
More informationThe Informativeness of k-means for Learning Mixture Models
The Informativeness of k-means for Learning Mixture Models Vincent Y. F. Tan (Joint work with Zhaoqiang Liu) National University of Singapore June 18, 2018 1/35 Gaussian distribution For F dimensions,
More informationReconstruction from Anisotropic Random Measurements
Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013
More informationThe Pros and Cons of Compressive Sensing
The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal
More informationOn Learning Mixtures of Well-Separated Gaussians
On Learning Mixtures of Well-Separated Gaussians Oded Regev Aravindan Viayaraghavan Abstract We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationIgnoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization
Proceedings of Machine Learning Research vol 65:1 33, 017 Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization Daniel Vainsencher Voleon Shie Mannor Faculty of Electrical Engineering,
More informationImage Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang
Image Noise: Detection, Measurement and Removal Techniques Zhifei Zhang Outline Noise measurement Filter-based Block-based Wavelet-based Noise removal Spatial domain Transform domain Non-local methods
More informationA Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k
A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k Jerry Li MIT jerryzli@mit.edu Ludwig Schmidt MIT ludwigs@mit.edu June 27, 205 Abstract Learning
More informationLearning SVM Classifiers with Indefinite Kernels
Learning SVM Classifiers with Indefinite Kernels Suicheng Gu and Yuhong Guo Dept. of Computer and Information Sciences Temple University Support Vector Machines (SVMs) (Kernel) SVMs are widely used in
More informationSparse Gaussian Markov Random Field Mixtures for Anomaly Detection
Sparse Gaussian Markov Random Field Mixtures for Anomaly Detection Tsuyoshi Idé ( Ide-san ), Ankush Khandelwal*, Jayant Kalagnanam IBM Research, T. J. Watson Research Center (*Currently with University
More informationSparse and Robust Optimization and Applications
Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability
More informationImproved Concentration Bounds for Count-Sketch
Improved Concentration Bounds for Count-Sketch Gregory T. Minton 1 Eric Price 2 1 MIT MSR New England 2 MIT IBM Almaden UT Austin 2014-01-06 Gregory T. Minton, Eric Price (IBM) Improved Concentration Bounds
More informationProvable Alternating Minimization Methods for Non-convex Optimization
Provable Alternating Minimization Methods for Non-convex Optimization Prateek Jain Microsoft Research, India Joint work with Praneeth Netrapalli, Sujay Sanghavi, Alekh Agarwal, Animashree Anandkumar, Rashish
More informationThe Pros and Cons of Compressive Sensing
The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal
More informationSparse Additive Functional and kernel CCA
Sparse Additive Functional and kernel CCA Sivaraman Balakrishnan* Kriti Puniyani* John Lafferty *Carnegie Mellon University University of Chicago Presented by Miao Liu 5/3/2013 Canonical correlation analysis
More informationRandom Matrices: Invertibility, Structure, and Applications
Random Matrices: Invertibility, Structure, and Applications Roman Vershynin University of Michigan Colloquium, October 11, 2011 Roman Vershynin (University of Michigan) Random Matrices Colloquium 1 / 37
More informationDiffeomorphic Warping. Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel)
Diffeomorphic Warping Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel) What Manifold Learning Isn t Common features of Manifold Learning Algorithms: 1-1 charting Dense sampling Geometric Assumptions
More informationNonlinear Signal Processing ELEG 833
Nonlinear Signal Processing ELEG 833 Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware arce@ee.udel.edu May 5, 2005 8 MYRIAD SMOOTHERS 8 Myriad Smoothers 8.1 FLOM
More informationData Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection
More informationInvertibility of random matrices
University of Michigan February 2011, Princeton University Origins of Random Matrix Theory Statistics (Wishart matrices) PCA of a multivariate Gaussian distribution. [Gaël Varoquaux s blog gael-varoquaux.info]
More informationRobust estimation of scale and covariance with P n and its application to precision matrix estimation
Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY
More informationSharp Generalization Error Bounds for Randomly-projected Classifiers
Sharp Generalization Error Bounds for Randomly-projected Classifiers R.J. Durrant and A. Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/ axk
More informationCompressive Sensing and Beyond
Compressive Sensing and Beyond Sohail Bahmani Gerorgia Tech. Signal Processing Compressed Sensing Signal Models Classics: bandlimited The Sampling Theorem Any signal with bandwidth B can be recovered
More informationTight Bounds for Learning a Mixture of Two Gaussians
Tight Bounds for Learning a Mixture of Two Gaussians Moritz Hardt m@mrtz.org IBM Research Almaden Eric Price ecprice@cs.utexas.edu The University of Texas at Austin May 16, 2015 Abstract We consider the
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationExploring Granger Causality for Time series via Wald Test on Estimated Models with Guaranteed Stability
Exploring Granger Causality for Time series via Wald Test on Estimated Models with Guaranteed Stability Nuntanut Raksasri Jitkomut Songsiri Department of Electrical Engineering, Faculty of Engineering,
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationSub-Gaussian Estimators of the Mean of a Random Matrix with Entries Possessing Only Two Moments
Sub-Gaussian Estimators of the Mean of a Random Matrix with Entries Possessing Only Two Moments Stas Minsker University of Southern California July 21, 2016 ICERM Workshop Simple question: how to estimate
More informationGatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts I-II
Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts I-II Gatsby Unit University College London 27 Feb 2017 Outline Part I: Theory of ICA Definition and difference
More informationSparse Subspace Clustering
Sparse Subspace Clustering Based on Sparse Subspace Clustering: Algorithm, Theory, and Applications by Elhamifar and Vidal (2013) Alex Gutierrez CSCI 8314 March 2, 2017 Outline 1 Motivation and Background
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationStatistical Geometry Processing Winter Semester 2011/2012
Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian
More informationRelational Nonlinear FIR Filters. Ronald K. Pearson
Relational Nonlinear FIR Filters Ronald K. Pearson Daniel Baugh Institute for Functional Genomics and Computational Biology Thomas Jefferson University Philadelphia, PA Moncef Gabbouj Institute of Signal
More informationNonrobust and Robust Objective Functions
Nonrobust and Robust Objective Functions The objective function of the estimators in the input space is built from the sum of squared Mahalanobis distances (residuals) d 2 i = 1 σ 2(y i y io ) C + y i
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationA Novel M-Estimator for Robust PCA
Journal of Machine Learning Research 15 (2014) 749-808 Submitted 12/11; Revised 6/13; Published 2/14 A Novel M-Estimator for Robust PCA Teng Zhang Institute for Mathematics and its Applications University
More informationCIFAR Lectures: Non-Gaussian statistics and natural images
CIFAR Lectures: Non-Gaussian statistics and natural images Dept of Computer Science University of Helsinki, Finland Outline Part I: Theory of ICA Definition and difference to PCA Importance of non-gaussianity
More informationClassifier Complexity and Support Vector Classifiers
Classifier Complexity and Support Vector Classifiers Feature 2 6 4 2 0 2 4 6 8 RBF kernel 10 10 8 6 4 2 0 2 4 6 Feature 1 David M.J. Tax Pattern Recognition Laboratory Delft University of Technology D.M.J.Tax@tudelft.nl
More informationFoundations For Learning in the Age of Big Data. Maria-Florina Balcan
Foundations For Learning in the Age of Big Data Maria-Florina Balcan Modern Machine Learning New applications Explosion of data Classic Paradigm Insufficient Nowadays Modern applications: massive amounts
More informationNoisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get
Supplementary Material A. Auxillary Lemmas Lemma A. Lemma. Shalev-Shwartz & Ben-David,. Any update of the form P t+ = Π C P t ηg t, 3 for an arbitrary sequence of matrices g, g,..., g, projection Π C onto
More informationDimension Reduction Techniques. Presented by Jie (Jerry) Yu
Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage
More informationECE 598: Representation Learning: Algorithms and Models Fall 2017
ECE 598: Representation Learning: Algorithms and Models Fall 2017 Lecture 1: Tensor Methods in Machine Learning Lecturer: Pramod Viswanathan Scribe: Bharath V Raghavan, Oct 3, 2017 11 Introduction Tensors
More informationGlobal Analysis of Expectation Maximization for Mixtures of Two Gaussians
Global Analysis of Expectation Maximization for Mixtures of Two Gaussians Ji Xu Columbia University jixu@cs.columbia.edu Daniel Hsu Columbia University djhsu@cs.columbia.edu Arian Maleki Columbia University
More informationGreedy Signal Recovery and Uniform Uncertainty Principles
Greedy Signal Recovery and Uniform Uncertainty Principles SPIE - IE 2008 Deanna Needell Joint work with Roman Vershynin UC Davis, January 2008 Greedy Signal Recovery and Uniform Uncertainty Principles
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationSum-of-Squares and Spectral Algorithms
Sum-of-Squares and Spectral Algorithms Tselil Schramm June 23, 2017 Workshop on SoS @ STOC 2017 Spectral algorithms as a tool for analyzing SoS. SoS Semidefinite Programs Spectral Algorithms SoS suggests
More informationBindel, Fall 2016 Matrix Computations (CS 6210) Notes for
1 A cautionary tale Notes for 2016-10-05 You have been dropped on a desert island with a laptop with a magic battery of infinite life, a MATLAB license, and a complete lack of knowledge of basic geometry.
More informationOptimization for Compressed Sensing
Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve
More informationEffective computation of biased quantiles over data streams
Effective computation of biased quantiles over data streams Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationMaximum Likelihood Estimation for Mixtures of Spherical Gaussians is NP-hard
Journal of Machine Learning Research 18 018 1-11 Submitted 1/16; Revised 1/16; Published 4/18 Maximum Likelihood Estimation for Mixtures of Spherical Gaussians is NP-hard Christopher Tosh Sanjoy Dasgupta
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction
More informationProvable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow
Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow Huishuai Zhang Department of EECS, Syracuse University, Syracuse, NY 3244 USA Yuejie Chi Department of ECE, The Ohio State
More informationIrr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland
Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS
More informationThe properties of L p -GMM estimators
The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion
More informationAugmented Reality VU numerical optimization algorithms. Prof. Vincent Lepetit
Augmented Reality VU numerical optimization algorithms Prof. Vincent Lepetit P3P: can exploit only 3 correspondences; DLT: difficult to exploit the knowledge of the internal parameters correctly, and the
More informationGraph Partitioning Using Random Walks
Graph Partitioning Using Random Walks A Convex Optimization Perspective Lorenzo Orecchia Computer Science Why Spectral Algorithms for Graph Problems in practice? Simple to implement Can exploit very efficient
More informationWhy is the field of statistics still an active one?
Why is the field of statistics still an active one? It s obvious that one needs statistics: to describe experimental data in a compact way, to compare datasets, to ask whether data are consistent with
More informationCSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization
CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization April 6, 2018 1 / 34 This material is covered in the textbook, Chapters 9 and 10. Some of the materials are taken from it. Some of
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationAlgorithms Reading Group Notes: Provable Bounds for Learning Deep Representations
Algorithms Reading Group Notes: Provable Bounds for Learning Deep Representations Joshua R. Wang November 1, 2016 1 Model and Results Continuing from last week, we again examine provable algorithms for
More informationMULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES
MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES S. Visuri 1 H. Oja V. Koivunen 1 1 Signal Processing Lab. Dept. of Statistics Tampere Univ. of Technology University of Jyväskylä P.O.
More informationBetter Algorithms for Selective Sampling
Francesco Orabona Nicolò Cesa-Bianchi DSI, Università degli Studi di Milano, Italy francesco@orabonacom nicolocesa-bianchi@unimiit Abstract We study online algorithms for selective sampling that use regularized
More informationIndependent Component Analysis. PhD Seminar Jörgen Ungh
Independent Component Analysis PhD Seminar Jörgen Ungh Agenda Background a motivater Independence ICA vs. PCA Gaussian data ICA theory Examples Background & motivation The cocktail party problem Bla bla
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationEE16B Designing Information Devices and Systems II
EE6B Designing Information Devices and Systems II Lecture 9B Geometry of SVD, PCA Uniqueness of the SVD Find SVD of A 0 A 0 AA T 0 ) ) 0 0 ~u ~u 0 ~u ~u ~u ~u Uniqueness of the SVD Find SVD of A 0 A 0
More informationSettling the Polynomial Learnability of Mixtures of Gaussians
Settling the Polynomial Learnability of Mixtures of Gaussians Ankur Moitra Electrical Engineering and Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 moitra@mit.edu Gregory Valiant
More informationConvex and Semidefinite Programming for Approximation
Convex and Semidefinite Programming for Approximation We have seen linear programming based methods to solve NP-hard problems. One perspective on this is that linear programming is a meta-method since
More informationCPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018
CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2018 Last Time: PCA with Orthogonal/Sequential Basis When k = 1, PCA has a scaling problem. When k > 1, have scaling, rotation,
More informationIndependent Component (IC) Models: New Extensions of the Multinormal Model
Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research
More informationLQR, Kalman Filter, and LQG. Postgraduate Course, M.Sc. Electrical Engineering Department College of Engineering University of Salahaddin
LQR, Kalman Filter, and LQG Postgraduate Course, M.Sc. Electrical Engineering Department College of Engineering University of Salahaddin May 2015 Linear Quadratic Regulator (LQR) Consider a linear system
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality
More information