Robustness Meets Algorithms


1 Robustness Meets Algorithms Ankur Moitra (MIT) ICML 2017 Tutorial, August 6th

2 CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately estimate its parameters?

3 CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately estimate its parameters? Yes!

4 CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian, can we accurately estimate its parameters? Yes! empirical mean: μ̂ = (1/N) Σ_i X_i, empirical variance: σ̂² = (1/N) Σ_i (X_i − μ̂)²

5 R. A. Fisher The maximum likelihood estimator is asymptotically efficient ( )

6 R. A. Fisher J. W. Tukey The maximum likelihood estimator is asymptotically efficient ( ) What about errors in the model itself? (1960)

7 ROBUST STATISTICS What estimators behave well in a neighborhood around the model?

8 ROBUST STATISTICS What estimators behave well in a neighborhood around the model? Let's study a simple one-dimensional example.

9 ROBUST PARAMETER ESTIMATION Given corrupted samples from a 1-D Gaussian (ideal model + noise = observed model), can we accurately estimate its parameters?

10 How do we constrain the noise?

11 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε)

12 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε) Arbitrarily corrupt O(ε)-fraction of samples (in expectation)

13 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε) Arbitrarily corrupt O(ε)-fraction of samples (in expectation) This generalizes Huber's Contamination Model: An adversary can add an ε-fraction of samples

14 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε) Arbitrarily corrupt O(ε)-fraction of samples (in expectation) This generalizes Huber's Contamination Model: An adversary can add an ε-fraction of samples Outliers: Points the adversary has corrupted, Inliers: Points he hasn't
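To make the contamination model concrete, here is a minimal Python sketch (assuming NumPy; the far-away outlier distribution and ε = 0.1 are arbitrary illustrative choices, not part of the tutorial) that draws samples where an O(ε)-fraction is corrupted in expectation, in the spirit of Huber's model.

```python
import numpy as np

def sample_contaminated_gaussian(n, mu=0.0, sigma=1.0, eps=0.1, seed=None):
    """Each sample is an inlier from N(mu, sigma^2) w.p. 1-eps,
    and an adversarially chosen outlier w.p. eps (so ~eps-fraction corrupted in expectation)."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(mu, sigma, size=n)
    # The outlier distribution is the adversary's choice; a far-away blob is one simple example.
    outliers = rng.normal(mu + 10.0 * sigma, 0.1 * sigma, size=n)
    corrupted = rng.random(n) < eps
    return np.where(corrupted, outliers, inliers), corrupted

samples, mask = sample_contaminated_gaussian(10_000, eps=0.1, seed=0)
print("fraction of outliers:", mask.mean())
```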

15 In what norm do we want the parameters to be close?

16 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx

17 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx From the bound on the L1-norm of the noise, we have: d_TV(ideal, observed) ≤ O(ε)

18 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx Goal: Find a 1-D Gaussian that satisfies d_TV(estimate, ideal) ≤ O(ε)

19 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx Equivalently, find a 1-D Gaussian that satisfies d_TV(estimate, observed) ≤ O(ε)
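As a quick sanity check on this definition, a small Python sketch (assuming NumPy and SciPy) that numerically evaluates d_TV(f, g) = (1/2) ∫ |f − g| for two 1-D Gaussians:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def tv_distance_gaussians(mu1, s1, mu2, s2):
    """Numerically compute (1/2) * integral of |f - g| for two 1-D Gaussian pdfs."""
    f = lambda x: abs(norm.pdf(x, mu1, s1) - norm.pdf(x, mu2, s2))
    val, _ = quad(f, -np.inf, np.inf)
    return 0.5 * val

print(tv_distance_gaussians(0.0, 1.0, 0.1, 1.0))   # small shift -> small TV distance
print(tv_distance_gaussians(0.0, 1.0, 5.0, 1.0))   # far apart -> TV distance close to 1
```

The first call returns a small value and the second a value close to 1, matching the intuition that TV distance measures how distinguishable the two models are.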

20 Do the empirical mean and empirical variance work?

21 Do the empirical mean and empirical variance work? No!

22 Do the empirical mean and empirical variance work? No! ideal model + noise = observed model

23 Do the empirical mean and empirical variance work? No! ideal model + noise = observed model A single corrupted sample can arbitrarily corrupt the estimates

24 Do the empirical mean and empirical variance work? No! ideal model + noise = observed model A single corrupted sample can arbitrarily corrupt the estimates But the median and median absolute deviation do work

26 Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε), where MAD = median_i |X_i − median(X)|

27 Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε), where MAD = median_i |X_i − median(X)| Also called (properly) agnostically learning a 1-D Gaussian
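A minimal Python sketch of this folklore estimator (assuming NumPy; the constant 1.4826 ≈ 1/Φ⁻¹(3/4) is the standard rescaling that makes the MAD consistent for the standard deviation of a Gaussian), contrasted with the empirical mean and standard deviation on contaminated data:

```python
import numpy as np

def robust_1d_gaussian(x):
    """Median / MAD estimates of (mu, sigma) for an eps-contaminated 1-D Gaussian."""
    mu_hat = np.median(x)
    mad = np.median(np.abs(x - mu_hat))
    sigma_hat = 1.4826 * mad   # 1 / Phi^{-1}(3/4): makes MAD consistent for sigma
    return mu_hat, sigma_hat

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 10_000)
x[:1_000] = 1e6                       # corrupt a 10% fraction with wild outliers
print("empirical mean/std:", x.mean(), x.std())       # destroyed by the outliers
print("median/MAD        :", robust_1d_gaussian(x))   # still close to (0, 1)
```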

28 Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε), where MAD = median_i |X_i − median(X)| What about robust estimation in high-dimensions?

29 Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε), where MAD = median_i |X_i − median(X)| What about robust estimation in high-dimensions? e.g. microarrays with 10k genes

30 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

32 Main Problem: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε)

33 Main Problem: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε) Special Cases: (1) Unknown mean (2) Unknown covariance

34–43 A COMPENDIUM OF APPROACHES

Unknown Mean         Error Guarantee    Running Time
Tukey Median         O(ε)               NP-Hard
Geometric Median     O(ε√d)             poly(d, N)
Tournament           O(ε)               N^O(d)
Pruning              O(ε√d)             O(dN)

44 The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension

45 The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension Equivalently: Computationally efficient estimators can only handle an ε ≈ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees

47 The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension Equivalently: Computationally efficient estimators can only handle an ε ≈ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees Is robust estimation algorithmically possible in high-dimensions?

48 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

50 RECENT RESULTS Robust estimation in high-dimensions is algorithmically possible! Theorem [Diakonikolas, Li, Kamath, Kane, Moitra, Stewart 16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), finds parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε) Moreover the algorithm runs in time poly(N, d)

51 RECENT RESULTS Robust estimation in high-dimensions is algorithmically possible! Theorem [Diakonikolas, Li, Kamath, Kane, Moitra, Stewart 16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), finds parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε) Moreover the algorithm runs in time poly(N, d) Alternatively: Can approximate the Tukey median, etc., in interesting semi-random models

52 Independently and concurrently: Theorem [Lai, Rao, Vempala 16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters that satisfy a similar guarantee, losing polylogarithmic factors in the dimension d Moreover the algorithm runs in time poly(N, d)

53 Independently and concurrently: Theorem [Lai, Rao, Vempala 16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters that satisfy a similar guarantee, losing polylogarithmic factors in the dimension d Moreover the algorithm runs in time poly(N, d) When the covariance is bounded, this translates to a corresponding bound on the Euclidean error of the estimated mean

54 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity

55 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity Let's see how this works for unknown mean

56 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

58 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians

59 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1) d_TV(N(μ_1, I), N(μ_2, I)) ≤ O(‖μ_1 − μ_2‖_2)

60 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1) d_TV(N(μ_1, I), N(μ_2, I)) ≤ O(‖μ_1 − μ_2‖_2) This can be proven using Pinsker's Inequality and the well-known formula for KL-divergence between Gaussians

62 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1) d_TV(N(μ_1, I), N(μ_2, I)) ≤ O(‖μ_1 − μ_2‖_2) Corollary: If our estimate (in the unknown mean case) satisfies ‖μ̂ − μ‖_2 ≤ O(ε) then d_TV(N(μ̂, I), N(μ, I)) ≤ O(ε)

63 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1) d_TV(N(μ_1, I), N(μ_2, I)) ≤ O(‖μ_1 − μ_2‖_2) Corollary: If our estimate (in the unknown mean case) satisfies ‖μ̂ − μ‖_2 ≤ O(ε) then d_TV(N(μ̂, I), N(μ, I)) ≤ O(ε) Our new goal is to be close in Euclidean distance
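For completeness, a short sketch of where fact (1) comes from, along the lines suggested above (the Gaussian KL formula plus Pinsker's inequality):

```latex
\mathrm{KL}\!\left(\mathcal{N}(\mu_1, I)\,\middle\|\,\mathcal{N}(\mu_2, I)\right) = \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2^2,
\qquad
d_{\mathrm{TV}}\!\left(\mathcal{N}(\mu_1, I), \mathcal{N}(\mu_2, I)\right)
\;\le\; \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
\;=\; \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2 .
```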

64 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

66 DETECTING CORRUPTIONS Step #2: Detect when the naïve estimator has been compromised

67 DETECTING CORRUPTIONS Step #2: Detect when the naïve estimator has been compromised [scatter plot: uncorrupted vs. corrupted samples]

68 DETECTING CORRUPTIONS Step #2: Detect when the naïve estimator has been compromised [scatter plot: uncorrupted vs. corrupted samples] There is a direction of large (> 1) variance

69 Key Lemma: If X_1, X_2, …, X_N come from a distribution that is ε-close to N(μ, I) and N is large enough, then guarantees (1) and (2) hold with probability at least 1-δ

70 Key Lemma: If X_1, X_2, …, X_N come from a distribution that is ε-close to N(μ, I) and N is large enough, then guarantees (1) and (2) hold with probability at least 1-δ Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment
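A minimal Python sketch of this detection step in the unknown-mean case (assuming NumPy; the threshold constant C below is an illustrative placeholder, not the constant from the analysis): if the empirical covariance has an eigenvalue noticeably larger than 1, the empirical mean should not be trusted.

```python
import numpy as np

def mean_is_suspect(X, eps, C=10.0):
    """Flag a possibly-compromised empirical mean for data that should be ~ N(mu, I).

    X : (N, d) array of samples.  Returns (suspect?, top eigenvalue, top eigenvector).
    """
    mu_hat = X.mean(axis=0)
    cov_hat = np.cov(X, rowvar=False)           # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(cov_hat)  # symmetric eigendecomposition, ascending
    lam, v = eigvals[-1], eigvecs[:, -1]        # largest eigenvalue / eigenvector
    threshold = 1.0 + C * eps * np.log(1.0 / eps)   # placeholder threshold
    return lam > threshold, lam, v
```

On clean N(μ, I) data the top eigenvalue concentrates near 1; moving the empirical mean by much forces the corruptions to line up in one direction, which is exactly what the top eigenvector exposes.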

71 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

73 A WIN-WIN ALGORITHM Step #3: Either find good parameters, or remove many outliers

74–77 A WIN-WIN ALGORITHM Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that there is a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points by thresholding the projections onto v, the direction of largest variance, where the threshold T has an explicit formula

78 A WIN-WIN ALGORITHM Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that there is a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points

79 A WIN-WIN ALGORITHM Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that there is a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left!

80 A WIN-WIN ALGORITHM Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that there is a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left! Eventually we find (certifiably) good parameters

81 A WIN-WIN ALGORITHM Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that there is a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left! Eventually we find (certifiably) good parameters Running Time: Sample Complexity:

82 A WIN-WIN ALGORITHM Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that there is a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left! Eventually we find (certifiably) good parameters Running Time: Sample Complexity: Concentration of LTFs
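Putting steps #2 and #3 together, a heavily simplified Python sketch of such a filtering loop for the unknown-mean case (assuming NumPy; the stopping threshold and the quantile-based cutoff are crude stand-ins for the exact formula for T referenced above):

```python
import numpy as np

def filter_mean(X, eps, C=10.0, max_iter=100):
    """Rough filtering sketch for robust mean estimation when inliers ~ N(mu, I)."""
    X = np.array(X, dtype=float)
    for _ in range(max_iter):
        mu_hat = X.mean(axis=0)
        cov_hat = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov_hat)
        lam, v = eigvals[-1], eigvecs[:, -1]
        if lam <= 1.0 + C * eps * np.log(1.0 / eps):   # no large direction: mean is certified
            return mu_hat
        # Otherwise project onto the worst direction and drop the most extreme points.
        scores = np.abs((X - mu_hat) @ v)
        cutoff = np.quantile(scores, 1.0 - eps)        # crude stand-in for the threshold T
        X = X[scores <= cutoff]
    return X.mean(axis=0)
```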

83 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

85 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity

86 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity How about for unknown covariance?

87 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians

88 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2) d_TV(N(0, Σ_1), N(0, Σ_2)) ≤ O(‖Σ_1^{-1/2} Σ_2 Σ_1^{-1/2} − I‖_F)

89 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2) d_TV(N(0, Σ_1), N(0, Σ_2)) ≤ O(‖Σ_1^{-1/2} Σ_2 Σ_1^{-1/2} − I‖_F) Again, proven using Pinsker's Inequality

90 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2) d_TV(N(0, Σ_1), N(0, Σ_2)) ≤ O(‖Σ_1^{-1/2} Σ_2 Σ_1^{-1/2} − I‖_F) Again, proven using Pinsker's Inequality Our new goal is to find an estimate Σ̂ that satisfies: ‖Σ^{-1/2} Σ̂ Σ^{-1/2} − I‖_F ≤ Õ(ε)

91 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2) d_TV(N(0, Σ_1), N(0, Σ_2)) ≤ O(‖Σ_1^{-1/2} Σ_2 Σ_1^{-1/2} − I‖_F) Again, proven using Pinsker's Inequality Our new goal is to find an estimate Σ̂ that satisfies: ‖Σ^{-1/2} Σ̂ Σ^{-1/2} − I‖_F ≤ Õ(ε) Distance seems strange, but it's the right one to use to bound TV
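A rough sketch of why this relative Frobenius distance is the natural one (a second-order expansion of the Gaussian KL formula, valid when Σ̂ is close to Σ, followed by Pinsker's inequality):

```latex
\mathrm{KL}\!\left(\mathcal{N}(0,\widehat{\Sigma})\,\middle\|\,\mathcal{N}(0,\Sigma)\right)
= \tfrac{1}{2}\!\left(\operatorname{tr}\!\big(\Sigma^{-1}\widehat{\Sigma}\big) - d - \ln\det\!\big(\Sigma^{-1}\widehat{\Sigma}\big)\right)
= \tfrac{1}{2}\sum_i \big(\lambda_i - \ln(1+\lambda_i)\big)
\;\approx\; \tfrac{1}{4}\,\big\lVert \Sigma^{-1/2}\widehat{\Sigma}\,\Sigma^{-1/2} - I \big\rVert_F^2
```

where λ_i are the eigenvalues of Σ^{-1/2} Σ̂ Σ^{-1/2} − I, so in this regime Pinsker's inequality bounds d_TV by a constant times ‖Σ^{-1/2} Σ̂ Σ^{-1/2} − I‖_F.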

92 UNKNOWN COVARIANCE What if we are given samples from N(0, Σ)?

93 UNKNOWN COVARIANCE What if we are given samples from N(0, Σ)? How do we detect if the naïve estimator is compromised?

94 UNKNOWN COVARIANCE What if we are given samples from N(0, Σ)? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d × d symmetric matrices

95 UNKNOWN COVARIANCE What if we are given samples from N(0, Σ)? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d × d symmetric matrices Proof uses Isserlis's Theorem

96 UNKNOWN COVARIANCE What if we are given samples from N(0, Σ)? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d × d symmetric matrices need to project out

97 Key Idea: Transform the data, look for restricted large eigenvalues

99 Key Idea: Transform the data, look for restricted large eigenvalues If Σ̂ were the true covariance, we would have Σ̂^{-1/2} X_i ~ N(0, I) for inliers

100 Key Idea: Transform the data, look for restricted large eigenvalues If Σ̂ were the true covariance, we would have Σ̂^{-1/2} X_i ~ N(0, I) for inliers, in which case: the fourth moment matrix of the transformed samples would have small restricted eigenvalues

101 Key Idea: Transform the data, look for restricted large eigenvalues If Σ̂ were the true covariance, we would have Σ̂^{-1/2} X_i ~ N(0, I) for inliers, in which case: the fourth moment matrix of the transformed samples would have small restricted eigenvalues Take-away: An adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment
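A very rough Python sketch of this transform-and-look idea (assuming NumPy; it only illustrates the flattened outer-product construction and omits both the restriction to symmetric flattenings / projecting out the identity direction and the exact thresholds, so it is a caricature of the actual test):

```python
import numpy as np

def covariance_is_suspect(X, Sigma_hat, threshold):
    """If Sigma_hat were the true covariance, Y_i = Sigma_hat^{-1/2} X_i ~ N(0, I) for inliers,
    and the covariance of the flattened outer products Y_i Y_i^T would have no unusually
    large eigenvalue.  A large eigenvalue therefore flags a compromised estimate."""
    w, V = np.linalg.eigh(Sigma_hat)
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T              # Sigma_hat^{-1/2}
    Y = X @ inv_sqrt
    d = Y.shape[1]
    Z = np.einsum("ni,nj->nij", Y, Y).reshape(len(Y), d * d)    # flattened Y_i Y_i^T
    top = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[-1]
    return top > threshold, top
```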

102 ASSEMBLING THE ALGORITHM Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian

103 ASSEMBLING THE ALGORITHM Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick

104 ASSEMBLING THE ALGORITHM Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance

105 ASSEMBLING THE ALGORITHM Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance Step #2: (Agnostic) isotropic position

106 ASSEMBLING THE ALGORITHM Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance Step #2: (Agnostic) isotropic position (the right distance, in the general case)

107 ASSEMBLING THE ALGORITHM Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance Step #2: (Agnostic) isotropic position (the right distance, in the general case) Now use algorithm for unknown mean
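To make the assembly concrete, a hedged Python sketch (assuming NumPy) of the two reductions: pairing samples cancels the unknown mean (the doubling trick), and whitening by a robust covariance estimate puts the data in approximately isotropic position so the unknown-mean machinery applies. The routines `robust_covariance` and `robust_mean` are stand-ins for the algorithms sketched earlier, not spelled out here.

```python
import numpy as np

def doubling_trick(X):
    """Pair up samples: (X_{2i} - X_{2i+1}) / sqrt(2) is ~ N(0, Sigma) for inlier pairs,
    with the unknown mean cancelled out."""
    n = (len(X) // 2) * 2
    return (X[0:n:2] - X[1:n:2]) / np.sqrt(2.0)

def learn_gaussian(X, robust_covariance, robust_mean, eps):
    """Sketch of the assembled algorithm: estimate Sigma from mean-free pairs,
    whiten, robustly estimate the mean of the whitened data, then map back."""
    Sigma_hat = robust_covariance(doubling_trick(X), eps)
    w, V = np.linalg.eigh(Sigma_hat)
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    mu_white = robust_mean(X @ inv_sqrt, eps)       # inliers are now ~ isotropic
    sqrt_Sigma = V @ np.diag(np.sqrt(w)) @ V.T
    return sqrt_Sigma @ mu_white, Sigma_hat
```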

108 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

110 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown mean, +10% noise):

111 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown mean): [plots of excess ℓ2 error vs. dimension for Filtering, Sample mean w/ noise, RANSAC, LRVMean, Pruning, Geometric Median]

112 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, isotropic; covariance close to the identity, +10% noise):

113 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, isotropic): [plots of excess ℓ2 error vs. dimension for Filtering, Sample covariance w/ noise, RANSAC, LRVCov, Pruning]

114 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, anisotropic; covariance far from the identity, +10% noise):

115 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, anisotropic): [plots of excess ℓ2 error vs. dimension for Filtering, Sample covariance w/ noise, RANSAC, LRVCov, Pruning]

116 REAL DATA EXPERIMENTS Famous study of [Novembre et al. 08]: Take top two singular vectors of people × SNP matrix (POPRES)

117 REAL DATA EXPERIMENTS Famous study of [Novembre et al. 08]: Take top two singular vectors of people × SNP matrix (POPRES) Original Data

119 REAL DATA EXPERIMENTS Famous study of [Novembre et al. 08]: Take top two singular vectors of people × SNP matrix (POPRES) Original Data: Genes Mirror Geography in Europe

120 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise?

121 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? [10% noise; Pruning Projection: what PCA finds]

123 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? [10% noise; RANSAC Projection: what RANSAC finds]

124 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? [10% noise: what robust PCA (via SDPs) finds]

125 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? [10% noise; Filter Projection: what our methods find]

126 REAL DATA EXPERIMENTS The power of provably robust estimation: [10% noise; Filter Projection: what our methods find] vs. [no noise; Original Data]

127 LOOKING FORWARD Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions?

128 LOOKING FORWARD Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions? Isn't this what we would have been doing with robust statistical estimators, if we had them all along?

129 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Recent Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised A Win-Win Algorithm Unknown Covariance Part III: Experiments Part IV: Extensions

131 LIMITATIONS TO ROBUST ESTIMATION Theorem [Diakonikolas, Kane, Stewart 16]: Any statistical query learning* algorithm in the strong corruption model (insertions and deletions) that makes error must make at least queries

132 LIMITATIONS TO ROBUST ESTIMATION Theorem [Diakonikolas, Kane, Stewart 16]: Any statistical query learning* algorithm in the strong corruption model (insertions and deletions) that makes error must make at least queries * Instead of seeing samples directly, an algorithm queries a function and gets its expectation, up to sampling noise

133 LIMITATIONS TO ROBUST ESTIMATION Theorem [Diakonikolas, Kane, Stewart 16]: Any statistical query learning* algorithm in the strong corruption model (insertions and deletions) that makes error must make at least queries * Instead of seeing samples directly, an algorithm queries a function and gets its expectation, up to sampling noise This is a powerful but restricted class of algorithms
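A toy Python sketch of the statistical query access model described in the footnote (assuming NumPy; the tolerance tau is how sampling noise is modeled here, and the empirical average over a held sample set stands in for the population expectation):

```python
import numpy as np

class StatisticalQueryOracle:
    """Answers E[f(X)] up to an additive tolerance tau, instead of revealing samples."""
    def __init__(self, samples, tau, seed=None):
        self.samples = samples
        self.tau = tau
        self.rng = np.random.default_rng(seed)

    def query(self, f):
        true_expectation = np.mean([f(x) for x in self.samples])
        return true_expectation + self.rng.uniform(-self.tau, self.tau)

# Example: estimate the first coordinate of the mean through SQ access only.
oracle = StatisticalQueryOracle(np.random.default_rng(0).normal(size=(1000, 5)), tau=0.01)
print(oracle.query(lambda x: x[0]))
```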

134 HANDLING MORE CORRUPTIONS What if an adversary is allowed to corrupt more than half of the samples?

135 HANDLING MORE CORRUPTIONS What if an adversary is allowed to corrupt more than half of the samples? Theorem [Charikar, Steinhardt, Valiant 17]: Given samples from a distribution with mean μ and bounded covariance, where all but an α-fraction have been corrupted, there is an algorithm that outputs a list of O(1/α) candidate means with the guarantee that at least one of them is close to μ

136 HANDLING MORE CORRUPTIONS What if an adversary is allowed to corrupt more than half of the samples? Theorem [Charikar, Steinhardt, Valiant 17]: Given samples from a distribution with mean μ and bounded covariance, where all but an α-fraction have been corrupted, there is an algorithm that outputs a list of O(1/α) candidate means with the guarantee that at least one of them is close to μ This extends to mixtures straightforwardly

137 SPARSE ROBUST ESTIMATION Can we improve the sample complexity with sparsity assumptions? Theorem [Li 17] [Du, Balakrishnan, Singh 17]: There is an algorithm that, in the unknown k-sparse mean case, achieves dimension-independent error with a number of samples polynomial in k and only logarithmic in d

138 SPARSE ROBUST ESTIMATION Can we improve the sample complexity with sparsity assumptions? Theorem [Li 17] [Du, Balakrishnan, Singh 17]: There is an algorithm that, in the unknown k-sparse mean case, achieves dimension-independent error with a number of samples polynomial in k and only logarithmic in d [Li 17] also studied robust sparse PCA

139 SPARSE ROBUST ESTIMATION Can we improve the sample complexity with sparsity assumptions? Theorem [Li 17] [Du, Balakrishnan, Singh 17]: There is an algorithm that, in the unknown k-sparse mean case, achieves dimension-independent error with a number of samples polynomial in k and only logarithmic in d [Li 17] also studied robust sparse PCA Is it possible to improve the sample complexity further, or are there computational vs. statistical tradeoffs?

140 LOOKING FORWARD Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions? Isn't this what we would have been doing with robust statistical estimators, if we had them all along?

141 LOOKING FORWARD Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions? Isn't this what we would have been doing with robust statistical estimators, if we had them all along? What other fundamental tasks in high-dimensional statistics can be solved provably and robustly?

142 Summary: Dimension-independent error bounds for robustly learning a Gaussian General recipe using restricted eigenvalue problems SQ lower bounds, handling more corruptions and sparse robust estimation Is practical, robust statistics within reach?

143 Summary: Dimension-independent error bounds for robustly learning a Gaussian General recipe using restricted eigenvalue problems SQ lower bounds, handling more corruptions and sparse robust estimation Is practical, robust statistics within reach? Thanks! Any Questions?
