ECE 6980 An Algorithmic and Information-Theoretic Toolbox for Massive Data


Instructor: Jayadev Acharya
Lecture #2
Scribe: Huayu Zhang
28th August, 2017

1 Recap

$|\mathcal{X}| = k$; $\varepsilon$ is an accuracy parameter, and $\delta$ is an error parameter.

2 Learning discrete distributions

TV-Estimation Problem: Given $X_1, X_2, \ldots, X_n$ independent samples drawn from an unknown distribution $p$ over $[k]$, we need to output $\hat{p}$ such that, with probability at least $1 - \delta$, $d_{TV}(p, \hat{p}) < \varepsilon$. Here we assume $\delta = 0.1$ (for now).

Suppose we observe $X_1^n \stackrel{\mathrm{def}}{=} X_1, X_2, \ldots, X_n$ from a distribution $p$ over $\mathcal{X}$. Let $N_x \stackrel{\mathrm{def}}{=} \#\{\text{times symbol } x \text{ appears in } X_1^n\}$. We define the empirical estimator $\hat{p}(x) = N_x / n$.

Theorem 1. The empirical estimator satisfies
$$\mathbb{E}_{X_1^n}\big[\ell_1(p, \hat{p})\big] \le \sqrt{\frac{k}{n}}.$$

Lemma 2 (Cauchy-Schwarz Inequality). Let $a_1, \ldots, a_m, b_1, \ldots, b_m \in \mathbb{R}$. Then
$$\Big(\sum_{i=1}^m a_i b_i\Big)^2 \le \Big(\sum_{i=1}^m a_i^2\Big)\Big(\sum_{i=1}^m b_i^2\Big).$$
The two sides are equal if and only if $a_i / b_i = c$ for all $i$.

Proof (of Theorem 1). Using CSI with $a_x = |p(x) - \hat{p}(x)|$ and $b_x = 1$,
$$\ell_1(p, \hat{p})^2 \le k \sum_x \big(p(x) - \hat{p}(x)\big)^2.$$
If we take expectations of both sides, we have
$$\mathbb{E}\big[\ell_1(p, \hat{p})^2\big] \le k \sum_x \mathbb{E}\Big[\Big(\frac{N_x}{n} - p(x)\Big)^2\Big] \quad (1)$$
$$= \frac{k}{n^2} \sum_x \mathbb{E}\big[(N_x - n\,p(x))^2\big] \quad (2)$$
$$= \frac{k}{n^2} \sum_x n\,p(x)(1 - p(x)) \quad (3)$$
$$\le \frac{k}{n}. \quad (4)$$
The last two lines come from the fact that $N_x \sim \mathrm{Bin}(n, p(x))$, so $\mathbb{E}[N_x] = n\,p(x)$ and $\mathrm{Var}(N_x) = n\,p(x)(1 - p(x))$. Because $f(x) = x^2$ is a convex function, according to Jensen's inequality we get
$$\mathbb{E}\big[\ell_1(p, \hat{p})\big] \le \sqrt{\mathbb{E}\big[\ell_1(p, \hat{p})^2\big]} \le \sqrt{\frac{k}{n}}.$$
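To see the bound concretely, here is a minimal simulation sketch (not part of the original notes), assuming numpy is available; the helper names empirical_estimator and l1_distance are our own. It draws $n$ samples from a distribution over $[k]$, forms $\hat{p}(x) = N_x/n$, and compares the average $\ell_1$ error to the $\sqrt{k/n}$ bound of Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_estimator(samples, k):
    """The empirical distribution: p_hat(x) = N_x / n."""
    return np.bincount(samples, minlength=k) / len(samples)

def l1_distance(p, q):
    """l1 distance between two distributions; note d_TV(p, q) = l1(p, q) / 2."""
    return np.abs(p - q).sum()

k, n, trials = 100, 10_000, 200
p = rng.dirichlet(np.ones(k))                # an arbitrary distribution over [k]
errs = [l1_distance(p, empirical_estimator(rng.choice(k, size=n, p=p), k))
        for _ in range(trials)]
# Theorem 1: the mean l1 error is at most sqrt(k/n) = 0.1 for these parameters.
print(f"mean l1 error: {np.mean(errs):.4f}, bound sqrt(k/n): {np.sqrt(k / n):.4f}")
```

The observed mean error typically sits well below the bound, which is worst-case over $p$ (roughly speaking, it is tightest for near-uniform distributions).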

Lemma 3 (Markov's Inequality). If $X$ is a nonnegative random variable and $a > 0$, then
$$\Pr(X \ge a) \le \frac{\mathbb{E}[X]}{a}.$$

Using Markov's inequality,
$$\Pr\big(\ell_1(p, \hat{p}) > \varepsilon\big) \le \frac{1}{\varepsilon}\sqrt{\frac{k}{n}}.$$
Setting $\frac{1}{\varepsilon}\sqrt{\frac{k}{n}} \le 0.1$, we get $n \ge \frac{100 k}{\varepsilon^2}$. So if we use the empirical estimator, we get an upper bound of $O(k/\varepsilon^2)$ samples.

3 Poisson Sampling

Poisson sampling is a sampling method that produces independent $N_x$'s without too much loss.

3.1 Properties of the Poisson Distribution

If $X \sim \mathrm{Poi}(\lambda_1)$ and $Y \sim \mathrm{Poi}(\lambda_2)$:

1. PMF: $\Pr(X = i) = e^{-\lambda_1} \frac{\lambda_1^i}{i!}$.
2. Mean and variance: $\mathbb{E}[X] = \mathrm{Var}(X) = \lambda_1$.
3. When $np$ is fixed and $p \to 0$, $\mathrm{Bin}(n, p)$ goes to $\mathrm{Poi}(np)$. To be specific, when $np = \lambda$,
$$\lim_{p \to 0} \binom{n}{i} p^i (1 - p)^{n-i} = e^{-\lambda} \frac{\lambda^i}{i!}.$$
4. $X + Y \sim \mathrm{Poi}(\lambda_1 + \lambda_2)$.

3.2 Procedure for Poisson Sampling

Fixed-length sampling: We have a fixed sample size $n$ and we draw $X_1, X_2, \ldots, X_n$ i.i.d. samples from the distribution $p$; here $N_x \sim \mathrm{Bin}(n, p(x))$.

Poisson-length sampling: (1) Draw $n' \sim \mathrm{Poi}(n)$. (2) Generate $n'$ independent samples from $p$. (A simulation sketch of both procedures follows.)
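The following sketch (ours, not the notes', assuming numpy) implements both procedures and checks the property proved in Section 3.2.1 below: under Poisson-length sampling the count of a fixed symbol has mean and variance $n\,p(x)$, whereas under fixed-length sampling the variance is the smaller binomial value $n\,p(x)(1 - p(x))$.

```python
import numpy as np

rng = np.random.default_rng(1)

def fixed_length_counts(p, n):
    """Fixed-length sampling: exactly n i.i.d. draws, so N_x ~ Bin(n, p(x))."""
    return np.bincount(rng.choice(len(p), size=n, p=p), minlength=len(p))

def poisson_length_counts(p, n):
    """Poisson-length sampling: n' ~ Poi(n) draws, so N_x ~ Poi(n * p(x))."""
    n_prime = rng.poisson(n)
    return np.bincount(rng.choice(len(p), size=n_prime, p=p), minlength=len(p))

k, n, trials = 5, 100, 20_000
p = np.full(k, 1.0 / k)                      # uniform, so n * p(x) = 20
poi = np.array([poisson_length_counts(p, n)[0] for _ in range(trials)])
fix = np.array([fixed_length_counts(p, n)[0] for _ in range(trials)])
# Expected: mean 20 and variance 20 for Poisson-length sampling,
# mean 20 and variance n*p(x)*(1-p(x)) = 16 for fixed-length sampling.
print(f"Poisson-length: mean {poi.mean():.1f}, var {poi.var():.1f}")
print(f"fixed-length:   mean {fix.mean():.1f}, var {fix.var():.1f}")
```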

3.2.1 Properties of Poisson Sampling

1. $N_x \sim \mathrm{Poi}(n\,p(x))$.

Proof.
$$\Pr(N_x = j) = \sum_{n' \ge j} \Pr(n' \text{ samples})\,\Pr(N_x = j \mid n')$$
$$= \sum_{n' \ge j} e^{-n} \frac{n^{n'}}{n'!} \binom{n'}{j} p(x)^j (1 - p(x))^{n'-j}$$
$$= e^{-n} \frac{(n\,p(x))^j}{j!} \sum_{n' \ge j} \frac{\big(n(1 - p(x))\big)^{n'-j}}{(n'-j)!}$$
$$= e^{-n} \frac{(n\,p(x))^j}{j!}\, e^{n(1 - p(x))} = e^{-n\,p(x)} \frac{(n\,p(x))^j}{j!}.$$

2. Conditioned on $n'$, the distribution becomes fixed-length sampling with respect to the parameter $n'$.

3. $\Pr(N_x = n_x, N_y = n_y) = \Pr(N_x = n_x)\,\Pr(N_y = n_y)$; that is, the counts are mutually independent.

4 Testing Problem

Given the description of a probability distribution $q$ over $[k]$, a parameter $\varepsilon$, and $n$ independent samples from an unknown distribution $p$, we want to know whether $p = q$ or $d_{TV}(p, q) > \varepsilon$. The picture drawn in lecture illustrates the case when $q = u[k]$: we need to distinguish between $p$ being at the origin and $p$ lying outside the square.

Now we consider the special case when $q$ is uniform. Given $\varepsilon > 0$ and $n$ independent samples from $p$, we want to figure out, with probability at least $0.9$, whether $p = q$ or $d_{TV}(p, q) > \varepsilon$. (A concrete $\varepsilon$-far instance is sketched below.)
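As a concrete example of the second case, the sketch below (our construction, not from the notes; it is a standard hard instance) builds a distribution at TV distance exactly $\varepsilon$ from $u[k]$ by moving $2\varepsilon/k$ of probability mass between the two halves of the support. A tester must tell samples from such a $p$ apart from samples from $u[k]$.

```python
import numpy as np

def far_from_uniform(k, eps):
    """A distribution p over [k] (k even, eps <= 1/2) with d_TV(p, u[k]) = eps."""
    p = np.full(k, 1.0 / k)
    p[: k // 2] += 2 * eps / k    # raise half of the symbols...
    p[k // 2 :] -= 2 * eps / k    # ...and lower the other half by the same amount
    return p

k, eps = 100, 0.1
p = far_from_uniform(k, eps)
u = np.full(k, 1.0 / k)
print(f"d_TV(p, u[k]) = {0.5 * np.abs(p - u).sum():.3f}")   # prints 0.100
```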

Theorem 4. Testing uniformity requires $\Omega(\sqrt{k})$ samples for any fixed $\varepsilon$.

Before we look at the argument for this theorem, let us see the following lemma first.

Lemma 5 (Birthday Paradox). At least $\Omega(\sqrt{k})$ samples from $u[k]$ are needed before you can find a repeated symbol with some constant probability.

You can prove this lemma by showing $\mathbb{E}[\#\text{symbols that appear more than once}] < n^2/k$. Don't forget that under Poisson sampling, for every $x$, $N_x \sim \mathrm{Poi}(n/k)$. You can also try to prove the following result: at least $\Omega(k^{1 - 1/\alpha})$ samples from $u[k]$ are needed before you can find a symbol that appears $\alpha$ times with some constant probability.

Now let us go back to the theorem. Recall that $P = u[k]$ is the uniform distribution on $[k]$. Let $u[k/2]$ be the collection of all distributions that are uniform over a subset of $k/2$ elements of $[k]$; there are $\binom{k}{k/2}$ such distributions. Then note that:

- For any $q \in u[k/2]$, $d_{TV}(q, u[k]) = 0.5$.
- Let $Q$ be a distribution drawn uniformly from $u[k/2]$. If we take $\sqrt{k}/10$ samples from $P = u[k]$, then with constant probability all symbols are distinct. The same is true for $Q$, and a sequence of distinct symbols looks the same under both. Hence we can't distinguish between $P$ and $Q$ with constant probability.

4.1 Goldreich-Ron Algorithm

The algorithm is as follows. Let
$$T \stackrel{\mathrm{def}}{=} \sum_{i < j} \mathbb{I}\{X_i = X_j\}.$$
If $T \ge \binom{n}{2} \frac{1 + 2\varepsilon^2}{k}$, we output $d_{TV}(p, q) > \varepsilon$; else we output $p = q$. (A sketch of this test in code appears at the end of the section.)

Theorem 6. The coincidence-based test solves the uniformity testing problem with $O(\sqrt{k}/\varepsilon^4)$ samples.

Proof. When $p$ is the uniform distribution, the expectation of the statistic $T$ is
$$\mathbb{E}[T \mid p = u] = \binom{n}{2} \sum_x p(x)^2 \quad (5)$$
$$= \binom{n}{2} \frac{1}{k}. \quad (6)$$
When $d_{TV}(p, u) > \varepsilon$, by using Jensen's inequality and the Cauchy-Schwarz inequality,
$$\sum_x \Big(p(x) - \frac{1}{k}\Big)^2 \ge \frac{1}{k} \Big(\sum_x \Big|p(x) - \frac{1}{k}\Big|\Big)^2 \ge \frac{(2\varepsilon)^2}{k}. \quad (7)$$
Besides,
$$\sum_x \Big(p(x) - \frac{1}{k}\Big)^2 = \sum_x p(x)^2 - \frac{2}{k} \sum_x p(x) + \frac{1}{k} = \sum_x p(x)^2 - \frac{1}{k}. \quad (8)$$
Then we have
$$\sum_x p(x)^2 \ge \frac{1 + 4\varepsilon^2}{k}.$$
So the expectation of the statistic is
$$\mathbb{E}[T \mid d_{TV}(p, u) > \varepsilon] = \binom{n}{2} \sum_x p(x)^2 \quad (9)$$
$$\ge \binom{n}{2} \frac{1 + 4\varepsilon^2}{k}. \quad (10)$$
The remaining part of the proof, bounding the variance of $T$ and applying Chebyshev's inequality, will be covered in the next lecture. In the next lecture we will also look at a statistic that gives an upper bound of $O(\sqrt{k}/\varepsilon^2)$ samples.
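Here is the promised sketch of the coincidence-based test (our code; the notes contain none). It uses the identity $T = \sum_x \binom{N_x}{2}$ to compute the statistic from counts rather than a quadratic pairwise loop, and the threshold $\binom{n}{2}(1 + 2\varepsilon^2)/k$ as reconstructed above; numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)

def coincidence_test(samples, k, eps):
    """Return True to declare d_TV(p, u[k]) > eps, False to declare p = u[k]."""
    n = len(samples)
    counts = np.bincount(samples, minlength=k)
    t = (counts * (counts - 1) // 2).sum()          # T = sum_x C(N_x, 2)
    threshold = (n * (n - 1) / 2) * (1 + 2 * eps**2) / k
    return t >= threshold

k, eps, n = 1_000, 0.25, 4_000
far = np.full(k, 1.0 / k)              # the eps-far instance from Section 4
far[: k // 2] += 2 * eps / k
far[k // 2 :] -= 2 * eps / k
print(coincidence_test(rng.choice(k, size=n), k, eps))         # u[k]: False
print(coincidence_test(rng.choice(k, size=n, p=far), k, eps))  # far p: True
```

With these parameters, $\mathbb{E}[T] \approx 8.0 \times 10^3$ under uniform and $\approx 1.0 \times 10^4$ under the far instance, so the threshold of roughly $9.0 \times 10^3$ separates the two cases comfortably.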

5 References

Mitzenmacher, Michael, and Eli Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis.

Paninski 08: http://www.stat.columbia.edu/~liam/research/pubs/sparse-unif-test.pdf