Sharp bounds for learning a mixture of two Gaussians


1 Sharp bounds for learning a mixture of two Gaussians
Moritz Hardt, Eric Price (IBM Almaden)

3 Problem
(Figure: height distribution of American 20-year-olds, height in cm.)
Male and female heights are each very close to a Gaussian distribution. Can we learn the average male and female heights from unlabeled population data?
How many samples do we need to learn µ_1, µ_2 to ±εσ?

5 Gaussian Mixtures: Origins
Contributions to the Mathematical Theory of Evolution, Karl Pearson, 1894.
Pearson's naturalist buddy measured lots of crab body parts. Most lengths seemed to follow the normal distribution (a recently coined name), but the forehead size wasn't symmetric. Maybe there were actually two species of crabs?

6 More previous work
Pearson 1894: proposed a method for 2 Gaussians (method of moments).
Other empirical papers over the years: Royce 58, Gridgeman 70, Gupta-Huang 80.
Provable results assuming the components are well separated:
Clustering: Dasgupta 99, DA 00.
Spectral methods: VW 04, AK 05, KSV 05, AM 05, VW 05.
Kalai-Moitra-Valiant 2010: first general polynomial bound. Extended to mixtures of general k: Moitra-Valiant 10, Belkin-Sinha 10. The KMV polynomial is very large.
Our result: tight upper and lower bounds on the sample complexity, for k = 2 mixtures in arbitrary d dimensions.

9 Learning the components vs. learning the sum
It's important that we want to learn the individual components: the male/female average heights and standard deviations.
Getting an ε-approximation in TV distance to the overall distribution takes only Θ(1/ε^2) samples via black-box techniques. That is quite general: it holds for any mixture of known unimodal distributions. [Chan, Diakonikolas, Servedio, Sun 13]

10 We show Pearson's 1894 method can be extended to be optimal!
Suppose we want the means and variances to ε accuracy: µ_i to ±εσ and σ_i^2 to ±ε^2 σ^2.
In one dimension: Θ(1/ε^12) samples are necessary and sufficient. Previously: O(1/ε^300). Moreover, the algorithm is almost the same as Pearson's (1894).
In d dimensions: Θ((1/ε^12) log d) samples are necessary and sufficient. Here σ^2 is the max variance in any coordinate, and we get each entry of the covariance matrix to ±ε^2 σ^2. Previously: O((d/ε)^300,000).
Caveat: we assume p_1, p_2 are bounded away from zero.

11 Outline
1 Algorithm in One Dimension
2 Algorithm in d Dimensions
3 Lower Bound

13 Method of Moments
We want to learn five parameters: µ_1, µ_2, σ_1, σ_2, p_1, p_2 with p_1 + p_2 = 1.
Moments give polynomial equations in the parameters:
M_1 := E[x] = p_1 µ_1 + p_2 µ_2
M_2 := E[x^2] = p_1 µ_1^2 + p_2 µ_2^2 + p_1 σ_1^2 + p_2 σ_2^2
M_3, M_4, M_5 = [...]
Use our samples to estimate the moments, then solve the system of equations to find the parameters.
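To make the moment-estimation step concrete, here is a minimal sketch (not from the slides) of estimating the raw moments from samples of a two-Gaussian mixture; all parameter values are made up for illustration.

```python
import numpy as np

def empirical_moments(x, k=6):
    """Estimate the raw moments E[x^j], j = 1..k, from samples."""
    return np.array([np.mean(x ** j) for j in range(1, k + 1)])

# Made-up mixture parameters, purely for illustration.
p1, mu1, s1 = 0.5, -1.0, 1.0          # weight, mean, std of component 1
p2, mu2, s2 = 0.5, 1.0, np.sqrt(2.0)  # weight, mean, std of component 2

rng = np.random.default_rng(0)
n = 200_000
from_first = rng.random(n) < p1
x = np.where(from_first, rng.normal(mu1, s1, n), rng.normal(mu2, s2, n))

M = empirical_moments(x)
print("M_1..M_6 ~", np.round(M, 3))
# Sanity check against the closed-form M_1 and M_2 from the slide:
print("p1*mu1 + p2*mu2 =", p1 * mu1 + p2 * mu2)
print("p1*(mu1^2+s1^2) + p2*(mu2^2+s2^2) =",
      p1 * (mu1 ** 2 + s1 ** 2) + p2 * (mu2 ** 2 + s2 ** 2))
```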

14 Method of Moments: solving the system
Start with five parameters.
First, we can assume mean zero by converting to central moments: the central moment M_2 - M_1^2 is independent of translation.
Analogously, we can assume min(σ_1, σ_2) = 0 by converting to excess moments: X_4 = M_4 - 3 M_2^2 is independent of adding N(0, σ^2). ("Excess kurtosis" was coined by Pearson and appears in every Wikipedia probability-distribution infobox.)
This leaves three free parameters.
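The slide spells out only X_4. As a sketch, one noise-invariant choice for the higher orders is the cumulant combinations below, computed from empirical central moments; treating the paper's X_5 and X_6 as exactly these combinations is an assumption of this sketch.

```python
import numpy as np

def central_moments(x, k=6):
    """Empirical central moments m_2..m_k."""
    xc = x - np.mean(x)
    return {j: np.mean(xc ** j) for j in range(2, k + 1)}

def excess_moments(x):
    """Combinations of central moments that are invariant to adding
    independent N(0, tau^2) noise (the cumulants kappa_3..kappa_6).
    The slide's X_4 = M_4 - 3 M_2^2 is exactly kappa_4; treating the
    other X_k the same way is an assumption of this sketch."""
    m = central_moments(x)
    return {3: m[3],
            4: m[4] - 3 * m[2] ** 2,
            5: m[5] - 10 * m[3] * m[2],
            6: m[6] - 15 * m[4] * m[2] - 10 * m[3] ** 2 + 30 * m[2] ** 3}

# Adding N(0, tau^2) noise should leave these roughly unchanged; the 6th
# is by far the noisiest to estimate, which is exactly why it drives the
# sample complexity.
rng = np.random.default_rng(1)
n = 1_000_000
x = np.concatenate([rng.normal(-1, 1, n // 2), rng.normal(1, np.sqrt(2), n // 2)])
noisy = x + rng.normal(0, 1.0, x.size)
print({k: round(v, 2) for k, v in excess_moments(x).items()})
print({k: round(v, 2) for k, v in excess_moments(noisy).items()})
```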

15 Method of Moments: system of equations
It is convenient to reparameterize by
α = -µ_1 µ_2,  β = µ_1 + µ_2,  γ = (σ_2^2 - σ_1^2) / (µ_2 - µ_1).
This gives
X_3 = α(β + 3γ)
X_4 = α(-2α + β^2 + 6βγ + 3γ^2)
X_5 = α(β^3 - 8αβ + 10β^2 γ + 15βγ^2 - 20αγ)
X_6 = α( ... )   (a still longer polynomial in α, β, γ)
"All my attempts to obtain a simpler set have failed... It is possible, however, that some other... equations of a less complex kind may ultimately be found." (Karl Pearson)
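A quick numerical check of the three displayed identities, under the same assumption as above that X_3, X_4, X_5 are the cumulant combinations; the Gaussian moments are computed in closed form, so all three checks should print True.

```python
import numpy as np

def excess_from_params(p1, mu1, v1, mu2, v2):
    """Exact X_3, X_4, X_5 of p1*N(mu1,v1) + p2*N(mu2,v2), taken to be the
    zero-mean cumulant combinations kappa_3, kappa_4, kappa_5 (assumption)."""
    p2 = 1 - p1
    m = p1 * mu1 + p2 * mu2          # shift to mean zero
    mu1, mu2 = mu1 - m, mu2 - m
    def gk(mu, v, k):
        # E[(mu + Z)^k] for Z ~ N(0, v)
        return {2: mu**2 + v,
                3: mu**3 + 3*mu*v,
                4: mu**4 + 6*mu**2*v + 3*v**2,
                5: mu**5 + 10*mu**3*v + 15*mu*v**2}[k]
    M = {k: p1 * gk(mu1, v1, k) + p2 * gk(mu2, v2, k) for k in (2, 3, 4, 5)}
    return M[3], M[4] - 3*M[2]**2, M[5] - 10*M[3]*M[2]

# Arbitrary test mixture (made-up numbers).
p1, mu1, v1, mu2, v2 = 0.3, -2.0, 0.5, 1.2, 1.7
X3, X4, X5 = excess_from_params(p1, mu1, v1, mu2, v2)

# Same quantities via the reparameterization on this slide.
p2 = 1 - p1
m = p1*mu1 + p2*mu2
a = -(mu1 - m) * (mu2 - m)
b = (mu1 - m) + (mu2 - m)
g = (v2 - v1) / ((mu2 - m) - (mu1 - m))
print(np.isclose(X3, a*(b + 3*g)))
print(np.isclose(X4, a*(-2*a + b**2 + 6*b*g + 3*g**2)))
print(np.isclose(X5, a*(b**3 - 8*a*b + 10*b**2*g + 15*b*g**2 - 20*a*g)))
```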

16 Pearson's Polynomial
Chug, chug, chug... we get a 9th-degree polynomial in α whose coefficients are polynomials in the excess moments X_3, X_4, X_5:
p(α) = 8α^9 + ... + 8X_3^6 = 0.
It is easy to go from the solutions α back to mixtures µ_i, σ_i, p_i.

18 Pearson's Polynomial
We get a 9th-degree polynomial in the excess moments X_3, X_4, X_5. Positive roots correspond to mixtures that match on five moments; usually there are two such roots.
Pearson's proposal: choose the candidate whose 6th moment is closer. This works because six moments uniquely identify the mixture [KMV].
How robust is this to moment-estimation error? Usually it works well, but not when there is a double root.
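Pearson's selection rule as a sketch: given candidate mixtures (here just made-up tuples standing in for the ones recovered from the two positive roots), keep the one whose theoretical sixth moment is closest to the empirical sixth moment.

```python
import numpy as np

def sixth_moment(p1, mu1, v1, mu2, v2):
    """Exact E[x^6] of p1*N(mu1,v1) + (1-p1)*N(mu2,v2).
    For N(mu, v): E[x^6] = mu^6 + 15 mu^4 v + 45 mu^2 v^2 + 15 v^3."""
    def g6(mu, v):
        return mu**6 + 15*mu**4*v + 45*mu**2*v**2 + 15*v**3
    return p1 * g6(mu1, v1) + (1 - p1) * g6(mu2, v2)

def pick_by_sixth_moment(candidates, samples):
    """candidates: list of (p1, mu1, v1, mu2, v2) tuples, e.g. one per
    positive root of Pearson's polynomial (hypothetical input here)."""
    m6 = np.mean(np.asarray(samples) ** 6)
    return min(candidates, key=lambda c: abs(sixth_moment(*c) - m6))

# Toy usage with made-up candidates and data.
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-1, 1, 50_000), rng.normal(1, 1.5, 50_000)])
cands = [(0.5, -1.0, 1.0, 1.0, 2.25), (0.4, -1.5, 0.5, 0.8, 3.0)]
print(pick_by_sixth_moment(cands, data))
```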

19 Making it robust in all cases
We can create another ninth-degree polynomial p_6 from X_3, X_4, X_5, X_6. Then α is the unique positive root of r(x) := p_5(x)^2 + p_6(x)^2, so q(x) := r(x)/(x - α)^2 has no positive roots.
We would like q(x) ≥ c > 0 for all x and all mixtures (α, β, γ): then for estimates p̂_5, p̂_6 within ε of p_5, p_6, the minimizer of r̂(x) := p̂_5(x)^2 + p̂_6(x)^2 is within O(ε/√c) of α.
Compactness: such a c exists on any closed and bounded region.
Bounded: for unbounded variables, the dominating terms show q → ∞.
Closed: the issue is that x > 0 isn't closed. We can use X_3, X_4 to get an O(1)-factor approximation α̃ to α, and the interval [α̃/10, 10α̃] is closed.
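A minimal sketch of the robust selection step: minimize r̂ over a compact interval around a rough estimate α̃. The coefficient arrays and α̃ below are placeholders; in the actual algorithm they would come from the estimated excess moments.

```python
import numpy as np

def robust_root(c5, c6, alpha_tilde, grid=10_000):
    """Minimize r(x) = p5(x)^2 + p6(x)^2 over [alpha_tilde/10, 10*alpha_tilde].
    c5, c6 are polynomial coefficient arrays (highest degree first),
    as used by np.polyval."""
    xs = np.linspace(alpha_tilde / 10, 10 * alpha_tilde, grid)
    r = np.polyval(c5, xs) ** 2 + np.polyval(c6, xs) ** 2
    return xs[np.argmin(r)]

# Placeholder inputs, purely illustrative: two polynomials with a
# near-common positive root around 2.0.
c5 = np.poly([2.0, -1.0, -3.0])           # roots 2, -1, -3
c6 = np.poly([2.05, 0.5, -0.7, -2.0])     # nearby root at 2.05
print(robust_root(c5, c6, alpha_tilde=1.8))
```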

20 Result
(Figures: large vs. small separation.)
Suppose the two components have means ∆σ apart. Then if we know M_i to ±ε(∆σ)^i, the algorithm recovers the means to ±ε∆σ. Therefore O(1/(∆^12 ε^2)) samples give an ε∆σ approximation.
If the components are Ω(1) standard deviations apart, O(1/ε^2) samples suffice.
In general, O(1/ε^12) samples suffice to get εσ accuracy.
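A back-of-the-envelope version of the sample-count arithmetic (my gloss, not from the slides):

```latex
% Heuristic moment-concentration bound (constants suppressed): for a
% distribution of scale sigma, the empirical i-th moment from n samples
% satisfies
\operatorname{Var}\!\Big(\tfrac{1}{n}\sum_{j} x_j^{\,i}\Big) \;\lesssim\; \frac{\sigma^{2i}}{n},
% so requiring additive error at most eps * (Delta * sigma)^i gives
\frac{\sigma^{i}}{\sqrt{n}} \;\le\; \varepsilon\,(\Delta\sigma)^{i}
\;\;\Longleftrightarrow\;\;
n \;\gtrsim\; \frac{1}{\varepsilon^{2}\,\Delta^{2i}}.
% The largest moment used is i = 6, giving n = O(1/(Delta^12 * eps^2)).
```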

21 Outline
1 Algorithm in One Dimension
2 Algorithm in d Dimensions
3 Lower Bound

22 Algorithm in d dimensions
Idea: project to lower dimensions.
Look at individual coordinates: get {µ_{1,i}, µ_{2,i}} to ±εσ. How do we piece them together?
Suppose we could solve d = 2: then we can match up {µ_{1,i}, µ_{2,i}} with {µ_{1,j}, µ_{2,j}}.
To solve d = 2: project x ↦ ⟨v, x⟩ for many random v. For µ_1 ≠ µ_2 we will have ⟨µ_1, v⟩ ≠ ⟨µ_2, v⟩ with constant probability.
So we solve the d-dimensional case with poly(d) calls to the 1-dimensional case. The only loss is log(1/δ) → log(d/δ): Θ((1/ε^12) log(d/δ)) samples.
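A small numerical illustration (not from the slides) of the key claim that a random projection preserves the separation between the two means with constant probability.

```python
import numpy as np

# For a random unit vector v, the gap between the projected means <mu1, v>
# and <mu2, v> is at least a constant fraction of ||mu1 - mu2|| / sqrt(d)
# with constant probability, so a few random projections suffice to "see"
# the separation in one dimension.
rng = np.random.default_rng(4)
d = 50
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)   # made-up means
gap = np.linalg.norm(mu1 - mu2)

trials = 10_000
v = rng.normal(size=(trials, d))
v /= np.linalg.norm(v, axis=1, keepdims=True)
projected_gap = np.abs(v @ (mu1 - mu2))

frac = np.mean(projected_gap >= 0.5 * gap / np.sqrt(d))
print(f"P[projected gap >= 0.5 * ||mu1 - mu2|| / sqrt(d)] ~ {frac:.2f}")
```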

23 Outline
1 Algorithm in One Dimension
2 Algorithm in d Dimensions
3 Lower Bound

31 Lower bound in one dimension
The algorithm takes O(1/ε^12) samples because it uses six moments: it is necessary to estimate the sixth moment to ±(εσ)^6.
Let F, F' be any two mixtures with five matching moments and constant means and variances, and add N(0, σ^2) to each mixture as σ grows.
Claim: Ω(σ^12) samples are necessary to distinguish the two distributions.

32 Lower bound in one dimension
Take two mixtures F ≠ F' as above. We have TV(F, F') ≈ 1/σ^6, which shows that Ω(σ^6) samples are necessary and O(σ^12) suffice to distinguish them.
We improve this using the squared Hellinger distance
H^2(P, Q) := (1/2) ∫ (√p(x) - √q(x))^2 dx.
H^2 is subadditive on product measures, so the sample complexity of distinguishing is Ω(1/H^2(F, F')).
H^2 ≤ TV ≲ H, but often H ≈ TV.
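For concreteness, a small numerical sketch (not from the slides) computing H^2 and TV between two one-dimensional mixture densities by quadrature; the mixture parameters are made up and are not a moment-matched pair.

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, params):
    """params: list of (weight, mean, std)."""
    return sum(w * norm.pdf(x, m, s) for w, m, s in params)

def hellinger_sq_and_tv(p_params, q_params, lo=-60.0, hi=60.0, n=200_000):
    x, dx = np.linspace(lo, hi, n, retstep=True)
    p, q = mixture_pdf(x, p_params), mixture_pdf(x, q_params)
    h2 = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx
    tv = 0.5 * np.sum(np.abs(p - q)) * dx
    return h2, tv

# Widening both mixtures by a common sigma shrinks both distances.
for sigma in (1.0, 2.0, 4.0):
    P = [(0.5, -1.0, np.sqrt(1 + sigma**2)), (0.5, 1.0, np.sqrt(2 + sigma**2))]
    Q = [(0.5, -1.1, np.sqrt(1 + sigma**2)), (0.5, 0.9, np.sqrt(2 + sigma**2))]
    h2, tv = hellinger_sq_and_tv(P, Q)
    print(f"sigma={sigma}: H^2={h2:.2e}, TV={tv:.2e}")
```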

37 Bounding the Hellinger distance: general idea
Definition: H^2(P, Q) = (1/2) ∫ (√p(x) - √q(x))^2 dx = 1 - ∫ √(p(x) q(x)) dx.
If q(x) = (1 + ∆(x)) p(x) for some small ∆, then [Pollard 00]
H^2(p, q) = 1 - ∫ √(1 + ∆(x)) p(x) dx
          = 1 - E_{x∼p}[√(1 + ∆(x))]
          = 1 - E_{x∼p}[1 + ∆(x)/2 - O(∆^2(x))]
          ≲ E_{x∼p}[∆^2(x)],
using that E_{x∼p}[∆(x)] = ∫ (q(x) - p(x)) dx = 0.
Compare to TV(p, q) = (1/2) E_{x∼p}[|∆(x)|].

38 Bounding the Hellinger distance: our setting
Lemma. Let F, F' be two subgaussian distributions with k matching moments and constant parameters. Then for G = F + N(0, σ^2) and G' = F' + N(0, σ^2) we have H^2(G, G') ≲ 1/σ^{2k+2}.
One can show that both G and G' are within an O(1) factor of ν := N(0, σ^2) over [-σ^2, σ^2]. We have
∆(x) ≈ (G'(x) - G(x)) / ν(x) = (1/ν(x)) ∫ ν(x - t) (F'(t) - F(t)) dt.
Expanding ν(x - t) in powers of t, the terms of degree d ≤ k vanish because the first k moments of F and F' agree, so
|∆(x)| ≲ Σ_{d ≥ k+1} ((1 + |x|/σ)/σ)^d |∫ t^d (F'(t) - F(t)) dt| ≲ ((1 + |x|/σ)/σ)^{k+1},
and therefore H^2(G, G') ≲ E_{x∼G}[∆(x)^2] ≲ 1/σ^{2k+2}.
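For completeness, the step from the Hellinger bound back to the sample-complexity claim (my gloss, using the subadditivity fact from the earlier slide):

```latex
% Subadditivity of squared Hellinger distance on product measures:
%   H^2(G^n, G'^n) = 1 - (1 - H^2(G, G'))^n <= n * H^2(G, G').
% So if H^2(G, G') <~ 1/sigma^{2k+2}, any test needs
%   n >~ 1 / H^2(G, G') >~ sigma^{2k+2}
% samples to distinguish them; with k = 5 matching moments this gives
% n = Omega(sigma^{12}).
H^2\big(G^{\otimes n}, G'^{\otimes n}\big) \;\le\; n\, H^2(G, G')
\quad\Longrightarrow\quad
n \;\gtrsim\; \frac{1}{H^2(G, G')} \;\gtrsim\; \sigma^{2k+2}.
```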

48 Lower bound in one dimension
Add N(0, σ^2) to two mixtures with five matching moments. For
G = (1/2) N(-1, 1 + σ^2) + (1/2) N(1, 2 + σ^2) and
G' ≈ 0.297 N(-1.226, ...) + 0.703 N(0.517, ...)
we have H^2(G, G') ≲ 1/σ^12. Therefore distinguishing G from G' takes Ω(σ^12) samples.
So we cannot learn either the means to ±εσ or the variances to ±ε^2 σ^2 with o(1/ε^12) samples.

49 Recap and open questions
Our result: Θ((1/ε^12) log d) samples are necessary and sufficient to estimate µ_i to ±εσ and σ_i^2 to ±ε^2 σ^2. If the means have ∆σ separation, just O(1/(∆^12 ε^2)) samples suffice for ε∆σ accuracy.
Extend to k > 2? The lower bound extends, giving Ω(1/ε^{6k}). Do we really care about finding an O(1/ε^18) algorithm for k = 3? Solving the system of equations gets nasty.
Is there an automated way of figuring out whether the solution to a system of polynomial equations is robust?

