Robust Statistics, Revisited

1 Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart

2 CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately estimate its parameters?

3 CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately estimate its parameters? Yes!

4 CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately estimate its parameters? Yes! empirical mean: μ̂ = (1/N) Σ_i X_i empirical variance: σ̂² = (1/N) Σ_i (X_i − μ̂)²

5 R. A. Fisher The maximum likelihood estimator is asymptotically efficient ( )

6 R. A. Fisher J. W. Tukey The maximum likelihood estimator is asymptotically efficient ( ) What about errors in the model itself? (1960)

7 ROBUST STATISTICS What estimators behave well in a neighborhood around the model?

8 ROBUST STATISTICS What estimators behave well in a neighborhood around the model? Let's study a simple one-dimensional example.

9 ROBUST PARAMETER ESTIMATION Given corrupted samples from a 1-D Gaussian: + = ideal model noise observed model can we accurately estimate its parameters?

10 How do we constrain the noise?

11 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε)

12 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε) Arbitrarily corrupt O(ε)-fraction of samples (in expectation)

13 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε) Arbitrarily corrupt O(ε)-fraction of samples (in expectation) This generalizes Huber's Contamination Model: An adversary can add an ε-fraction of samples

14 How do we constrain the noise? Equivalently: L1-norm of noise at most O(ε) Arbitrarily corrupt O(ε)-fraction of samples (in expectation) This generalizes Huber's Contamination Model: An adversary can add an ε-fraction of samples Outliers: Points the adversary has corrupted, Inliers: Points he hasn't
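
To make the contamination model concrete, here is a minimal sketch (my illustration, not code from the talk) of drawing samples in Huber's model: each point is an inlier from the Gaussian with probability 1−ε and is otherwise replaced by an arbitrary adversarial value (a hypothetical far-away point mass).

```python
import numpy as np

def huber_contaminated_samples(n, eps, mu=0.0, sigma=1.0, seed=None):
    """Draw n samples: inliers from N(mu, sigma^2), an eps-fraction (in
    expectation) replaced by arbitrary adversarial values."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=n)   # inliers
    corrupted = rng.random(n) < eps           # which points the adversary controls
    samples[corrupted] = 1e6                  # outliers: any values the adversary likes
    return samples, corrupted
```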

15 In what norm do we want the parameters to be close?

16 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is

17 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is From the bound on the L1-norm of the noise, we have: d_TV(ideal, observed) ≤ O(ε)

18 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is Goal: Find a 1-D Gaussian that satisfies d_TV(estimate, ideal) ≤ O(ε)

19 In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is Equivalently, find a 1-D Gaussian that satisfies d_TV(estimate, observed) ≤ O(ε)
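
For reference, the definition on these slides written out as the standard formula:

```latex
d_{\mathrm{TV}}(f,g) \;=\; \tfrac{1}{2}\int \lvert f(x)-g(x)\rvert \,dx .
```

The two goals are equivalent up to constants by the triangle inequality, since the L1 bound on the noise gives d_TV(ideal, observed) ≤ O(ε).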

20 Do the empirical mean and empirical variance work?

21 Do the empirical mean and empirical variance work? No!

22 Do the empirical mean and empirical variance work? No! + = ideal model noise observed model

23 Do the empirical mean and empirical variance work? No! + = ideal model noise observed model A single corrupted sample can arbitrarily corrupt the estimates

24 Do the empirical mean and empirical variance work? No! + = ideal model noise observed model A single corrupted sample can arbitrarily corrupt the estimates But the median and median absolute deviation do work

26 Fact [Folklore]: Given samples from a distribution that are ε-close in total variation distance to a 1-D Gaussian the median and MAD recover estimates that satisfy where

27 Fact [Folklore]: Given samples from a distribution that are ε-close in total variation distance to a 1-D Gaussian the median and MAD recover estimates that satisfy where Also called (properly) agnostically learning a 1-D Gaussian
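
A minimal sketch of the folklore estimator in code (standard formulas, not from the talk; the constant 1.4826 ≈ 1/Φ⁻¹(3/4) rescales the MAD to the Gaussian standard deviation):

```python
import numpy as np

def robust_1d_gaussian(samples):
    """Agnostically learn a 1-D Gaussian: median for the mean,
    rescaled median absolute deviation (MAD) for the standard deviation."""
    mu_hat = np.median(samples)
    sigma_hat = 1.4826 * np.median(np.abs(samples - mu_hat))
    return mu_hat, sigma_hat

# A single wild sample ruins the empirical mean and variance,
# but barely moves the median and MAD:
x = np.concatenate([np.random.normal(0.0, 1.0, 999), [1e6]])
print(np.mean(x), np.std(x))       # badly corrupted
print(robust_1d_gaussian(x))       # close to (0, 1)
```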

28 Fact [Folklore]: Given samples from a distribution that are ε-close in total variation distance to a 1-D Gaussian the median and MAD recover estimates that satisfy where What about robust estimation in high-dimensions?

29 Fact [Folklore]: Given samples from a distribution that are ε-close in total variation distance to a 1-D Gaussian the median and MAD recover estimates that satisfy where What about robust estimation in high-dimensions? e.g. microarrays with 10k genes

30 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Our Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised Filtering and Convex Programming Unknown Covariance Part III: Experiments and Extensions

32 Main Problem: Given samples from a distribution that are ε-close in total variation distance to a d-dimensional Gaussian give an efficient algorithm to find parameters that satisfy

33 Main Problem: Given samples from a distribution that are ε-close in total variation distance to a d-dimensional Gaussian give an efficient algorithm to find parameters that satisfy Special Cases: (1) Unknown mean (2) Unknown covariance

34 A COMPENDIUM OF APPROACHES Unknown Mean Error Guarantee Running Time

35 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee Running Time

36 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee O(ε) Running Time

37 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee O(ε) Running Time NP-Hard

38 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee O(ε) Running Time NP-Hard Geometric Median

39 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee O(ε) Running Time NP-Hard Geometric Median poly(d,n)

40 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee O(ε) Running Time NP-Hard Geometric Median O(ε√d) poly(d,n)

41 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee O(ε) Running Time NP-Hard Geometric Median O(ε√d) poly(d,n) Tournament O(ε) N^O(d)

42 A COMPENDIUM OF APPROACHES Unknown Mean Tukey Median Error Guarantee O(ε) Running Time NP-Hard Geometric Median O(ε√d) poly(d,n) Tournament O(ε) N^O(d) Pruning O(ε√d) O(dN)

43 A COMPENDIUM OF APPROACHES
Unknown Mean        Error Guarantee    Running Time
Tukey Median        O(ε)               NP-Hard
Geometric Median    O(ε√d)             poly(d,n)
Tournament          O(ε)               N^O(d)
Pruning             O(ε√d)             O(dN)

44 The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension

45 The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension Equivalently: Computationally efficient estimators can only handle an ε ≲ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees

47 The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension Equivalently: Computationally efficient estimators can only handle an ε ≲ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees Is robust estimation algorithmically possible in high-dimensions?

48 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Our Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised Filtering and Convex Programming Unknown Covariance Part III: Experiments and Extensions

50 OUR RESULTS Robust estimation in high-dimensions is algorithmically possible! Theorem [Diakonikolas, Li, Kamath, Kane, Moitra, Stewart 16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters that satisfy Moreover, the algorithm runs in time poly(n, d)

51 OUR RESULTS Robust estimation in high-dimensions is algorithmically possible! Theorem [Diakonikolas, Li, Kamath, Kane, Moitra, Stewart 16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters that satisfy Moreover, the algorithm runs in time poly(n, d) Alternatively: Can approximate the Tukey median, etc., in interesting semi-random models

52 Simultaneously [Lai, Rao, Vempala 16] gave agnostic algorithms that achieve: and work for non-Gaussian distributions too

53 Simultaneously [Lai, Rao, Vempala 16] gave agnostic algorithms that achieve: and work for non-Gaussian distributions too Many other applications across both papers: product distributions, mixtures of spherical Gaussians, SVD, ICA

54 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity

55 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity Let's see how this works for unknown mean

56 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Our Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised Filtering and Convex Programming Unknown Covariance Part III: Experiments and Extensions

58 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians

59 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1)

60 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1) This can be proven using Pinsker's Inequality and the well-known formula for KL-divergence between Gaussians
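
Spelled out (standard facts; my reconstruction of what the displayed bound (1) abbreviates): Pinsker's inequality plus the KL formula for identity-covariance Gaussians gives

```latex
\mathrm{KL}\!\left(\mathcal{N}(\mu_1, I)\,\middle\|\,\mathcal{N}(\mu_2, I)\right)
  = \tfrac{1}{2}\,\lVert \mu_1-\mu_2\rVert_2^2,
\qquad
d_{\mathrm{TV}} \;\le\; \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
  \;=\; \tfrac{1}{2}\,\lVert \mu_1-\mu_2\rVert_2 .
```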

61 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1)

62 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1) Corollary: If our estimate (in the unknown mean case) satisfies then

63 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians A Basic Fact: (1) Corollary: If our estimate (in the unknown mean case) satisfies then Our new goal is to be close in Euclidean distance

64 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Our Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised Filtering and Convex Programming Unknown Covariance Part III: Experiments and Extensions

66 DETECTING CORRUPTIONS Step #2: Detect when the naïve estimator has been compromised

67 DETECTING CORRUPTIONS Step #2: Detect when the naïve estimator has been compromised [scatter plot: uncorrupted vs. corrupted points]

68 DETECTING CORRUPTIONS Step #2: Detect when the naïve estimator has been compromised [scatter plot: uncorrupted vs. corrupted points] There is a direction of large (> 1) variance

69 Key Lemma: If X1, X2, ..., XN come from a distribution that is ε-close to and then for (1) (2) with probability at least 1-δ

70 Key Lemma: If X1, X2, ..., XN come from a distribution that is ε-close to and then for (1) (2) with probability at least 1-δ Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment
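
A minimal sketch of the detection step for the unknown-mean case (illustrative only; the threshold constant below is hypothetical, not the paper's exact certificate):

```python
import numpy as np

def certify_naive_mean(X, eps):
    """X: (N, d) array of eps-corrupted samples from N(mu, I).
    The empirical mean is trustworthy when no direction has variance much
    larger than 1; otherwise the second moment betrays the corruption."""
    mu_hat = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top_val, top_dir = eigvals[-1], eigvecs[:, -1]
    ok = top_val <= 1 + 10 * eps * np.log(1 / eps)   # illustrative threshold
    return mu_hat, top_val, top_dir, ok
```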

71 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Our Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised Filtering and Convex Programming Unknown Covariance Part III: Experiments and Extensions

73 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers

74 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that:

75 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points by thresholding projections onto v, where v is the direction of largest variance

76 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points by thresholding projections onto v, where v is the direction of largest variance, and the threshold T has an explicit formula

77 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points by thresholding projections onto v, where v is the direction of largest variance, and the threshold T has an explicit formula

78 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points

79 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left!

80 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left! Eventually we find (certifiably) good parameters
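
A simplified sketch of the filtering loop for the unknown-mean case (illustrative: the real algorithm's threshold T comes from a Gaussian tail-bound formula, replaced here by a crude quantile rule):

```python
import numpy as np

def filter_mean(X, eps, max_iter=100):
    """Repeatedly remove points projecting far along the direction of
    largest variance, until the empirical covariance looks Gaussian."""
    X = X.copy()
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        top_val, v = eigvals[-1], eigvecs[:, -1]
        if top_val <= 1 + 10 * eps * np.log(1 / eps):   # certifiably good
            return mu
        proj = np.abs((X - mu) @ v)                     # projections onto bad direction
        T = np.quantile(proj, 1 - eps)                  # crude stand-in for the paper's T
        X = X[proj <= T]                                # tail is mostly outliers
    return X.mean(axis=0)
```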

81 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left! Eventually we find (certifiably) good parameters Running Time: Sample Complexity:

82 OUR ALGORITHM(S) Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we'd have no corrupted points left! Eventually we find (certifiably) good parameters Running Time: Sample Complexity: Concentration of LTFs

83 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Our Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised Filtering and Convex Programming Unknown Covariance Part III: Experiments and Extensions

85 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity

86 A GENERAL RECIPE Robust estimation in high-dimensions: Step #1: Find an appropriate parameter distance Step #2: Detect when the naïve estimator has been compromised Step #3: Find good parameters, or make progress Filtering: Fast and practical Convex Programming: Better sample complexity How about for unknown covariance?

87 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians

88 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2)

89 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2) Again, proven using Pinsker's Inequality

90 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2) Again, proven using Pinsker's Inequality Our new goal is to find an estimate that satisfies:

91 PARAMETER DISTANCE Step #1: Find an appropriate parameter distance for Gaussians Another Basic Fact: (2) Again, proven using Pinsker's Inequality Our new goal is to find an estimate that satisfies: Distance seems strange, but it's the right one to use to bound TV
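
Written out (my reading of what the displayed bound (2) and the goal abbreviate; standard up to constants): for mean-zero Gaussians, the right parameter distance is the Frobenius distance after normalizing by one of the covariances,

```latex
d_{\mathrm{TV}}\!\left(\mathcal{N}(0,\Sigma_1),\,\mathcal{N}(0,\Sigma_2)\right)
  \;\le\; O\!\left(\lVert \Sigma_1^{-1/2}\,\Sigma_2\,\Sigma_1^{-1/2} - I \rVert_F\right),
\qquad\text{so we want}\quad
\lVert \Sigma^{-1/2}\,\widehat{\Sigma}\,\Sigma^{-1/2} - I \rVert_F \;\le\; \widetilde{O}(\varepsilon).
```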

92 UNKNOWN COVARIANCE What if we are given samples from?

93 UNKNOWN COVARIANCE What if we are given samples from? How do we detect if the naïve estimator is compromised?

94 UNKNOWN COVARIANCE What if we are given samples from? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d x d symmetric matrices

95 UNKNOWN COVARIANCE What if we are given samples from? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d x d symmetric matrices Proof uses Isserlis's Theorem
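
One consequence of Isserlis's theorem that powers this step (a standard identity, stated as my gloss on the Key Fact): for Y ~ N(0, I) and any symmetric matrix A,

```latex
\mathbb{E}\!\left[\,Y^{\top} A\, Y\,\right] = \operatorname{tr}(A),
\qquad
\operatorname{Var}\!\left(\,Y^{\top} A\, Y\,\right) = 2\,\lVert A\rVert_F^2 ,
```

so the covariance of the flattened matrices Y Y^T, restricted to flattenings of d x d symmetric matrices, is an explicit known operator; a restricted eigenvalue far above this baseline certifies corruption.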

96 UNKNOWN COVARIANCE What if we are given samples from? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d x d symmetric matrices need to project out

97 Key Idea: Transform the data, look for restricted large eigenvalues

99 Key Idea: Transform the data, look for restricted large eigenvalues If were the true covariance, we would have for inliers

100 Key Idea: Transform the data, look for restricted large eigenvalues If were the true covariance, we would have for inliers, in which case: would have small restricted eigenvalues

101 Key Idea: Transform the data, look for restricted large eigenvalues If were the true covariance, we would have for inliers, in which case: would have small restricted eigenvalues Take-away: An adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment
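
A rough sketch of the unknown-covariance detection step (my illustration, not the paper's exact statistic): transform by the current covariance estimate, flatten the centered second-moment matrices, and check whether any direction has variance far above the Gaussian baseline of roughly 2.

```python
import numpy as np

def covariance_certificate(X, Sigma_hat):
    """If Sigma_hat were the true covariance, Y = Sigma_hat^{-1/2} X would be
    ~N(0, I) for inliers, and the flattened matrices Y Y^T - I would have
    variance at most ~2 in every (symmetric) direction."""
    evals, evecs = np.linalg.eigh(Sigma_hat)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T        # Sigma_hat^{-1/2}
    Y = X @ W.T
    d = Y.shape[1]
    Z = np.stack([np.outer(y, y).flatten() - np.eye(d).flatten() for y in Y])
    top = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[-1]
    return top                                          # compare against the ~2 baseline
```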

102 ASSEMBLING THE ALGORITHM Given samples that are ε-close in total variation distance to a d-dimensional Gaussian

103 ASSEMBLING THE ALGORITHM Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick

104 ASSEMBLING THE ALGORITHM Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance

105 ASSEMBLING THE ALGORITHM Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance Step #2: (Agnostic) isotropic position

106 ASSEMBLING THE ALGORITHM Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance Step #2: (Agnostic) isotropic position right distance, in general case

107 ASSEMBLING THE ALGORITHM Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance Step #2: (Agnostic) isotropic position right distance, in general case Now use algorithm for unknown mean
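
A sketch of the assembly (my paraphrase; robust_cov and robust_mean stand for the unknown-covariance and unknown-mean routines sketched above, and a corrupted point contaminates the pair it lands in, so the effective corruption rate roughly doubles):

```python
import numpy as np

def robust_gaussian(X, eps, robust_cov, robust_mean):
    """X: (N, d) eps-corrupted samples from N(mu, Sigma)."""
    # Step 1: doubling trick -- differences of disjoint pairs are mean-free:
    # (X_{2i} - X_{2i+1}) / sqrt(2) ~ N(0, Sigma) for uncorrupted pairs.
    n_pairs = len(X) // 2
    D = (X[0:2 * n_pairs:2] - X[1:2 * n_pairs:2]) / np.sqrt(2)
    Sigma_hat = robust_cov(D, 2 * eps)
    # Step 2: (agnostic) isotropic position -- whiten with Sigma_hat^{-1/2},
    # then run the unknown-mean algorithm and undo the whitening.
    evals, evecs = np.linalg.eigh(Sigma_hat)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    mu_white = robust_mean(X @ W.T, eps)
    mu_hat = np.linalg.solve(W, mu_white)
    return mu_hat, Sigma_hat
```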

108 OUTLINE Part I: Introduction Robust Estimation in One-dimension Robustness vs. Hardness in High-dimensions Our Results Part II: Agnostically Learning a Gaussian Parameter Distance Detecting When an Estimator is Compromised Filtering and Convex Programming Unknown Covariance Part III: Experiments and Extensions

110 FURTHER RESULTS Use restricted eigenvalue problems to detect outliers

111 FURTHER RESULTS Use restricted eigenvalue problems to detect outliers Binary Product Distributions:

112 FURTHER RESULTS Use restricted eigenvalue problems to detect outliers Binary Product Distributions: Mixtures of Two c-balanced Binary Product Distributions:

113 FURTHER RESULTS Use restricted eigenvalue problems to detect outliers Binary Product Distributions: Mixtures of Two c-balanced Binary Product Distributions: Mixtures of k Spherical Gaussians:

114 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown mean): + 10% noise

115 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown mean): [plots: excess ℓ2 error vs. dimension for Filtering, Sample mean w/ noise, RANSAC, LRVMean, Pruning, Geometric Median]

116 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, isotropic): + 10% noise close to identity

117 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, isotropic): [plots: excess ℓ2 error vs. dimension for Filtering, Sample covariance w/ noise, RANSAC, LRVCov, Pruning]

118 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, anisotropic): + 10% noise far from identity

119 SYNTHETIC EXPERIMENTS Error rates on synthetic data (unknown covariance, anisotropic): [plots: excess ℓ2 error vs. dimension for Filtering, Sample covariance w/ noise, RANSAC, LRVCov, Pruning]

120 REAL DATA EXPERIMENTS Famous study of [Novembre et al. 08]: Take top two singular vectors of people x SNP matrix (POPRES)

121 REAL DATA EXPERIMENTS Famous study of [Novembre et al. 08]: Take top two singular vectors of people x SNP matrix (POPRES) Original Data

123 REAL DATA EXPERIMENTS Famous study of [Novembre et al. 08]: Take top two singular vectors of people x SNP matrix (POPRES) Original Data Genes Mirror Geography in Europe

124 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise?

125 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? 10% noise Pruning Projection What PCA finds

127 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? 10% noise RANSAC Projection What RANSAC finds

128 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? 10% noise XCS Projection What robust PCA (via SDPs) finds

129 REAL DATA EXPERIMENTS Can we find such patterns in the presence of noise? 10% noise Filter Projection What our methods find

130 REAL DATA EXPERIMENTS The power of provably robust estimation: 10% noise Filter Projection no noise Original Data What our methods find

131 LOOKING FORWARD Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions?

132 LOOKING FORWARD Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions? Isn't this what we would have been doing with robust statistical estimators, if we had them all along?

133 Summary: Nearly optimal algorithm for agnostically learning a high-dimensional Gaussian General recipe using restricted eigenvalue problems Further applications to other mixture models Is practical, robust statistics within reach? Thanks! Any Questions?
