CSCI-B609: A Theorist's Toolkit, Fall 2016                                Aug 23

Lecture 01: the Central Limit Theorem

Lecturer: Yuan Zhou                              Scribe: Yuan Xie & Yuan Zhou

1 Central Limit Theorem for i.i.d. random variables

Let us say that we want to analyze the total sum of a certain kind of result in a series of repeated independent random experiments, each of which has a well-defined expected value and finite variance. In other words, a certain kind of result (e.g. whether the experiment is a "success") has some probability of being produced in each experiment. We would like to repeat the experiment many times independently and understand the total sum of the results.

1.1 Bernoulli variables

We first consider the sum of a bunch of Bernoulli variables. Specifically, let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1 - p$. Let $S = S_n = X_1 + X_2 + \cdots + X_n$; we want to understand $S$. According to the linearity of expectation, we have

$$\mathbb{E}[S] = \mathbb{E}[X_1] + \mathbb{E}[X_2] + \cdots + \mathbb{E}[X_n] = pn.$$

Since $X_1, X_2, \dots, X_n$ are independent, we have $\operatorname{Var}[S] = np(1-p)$. Now let us use a linear transformation to make $S$ have mean 0 and variance 1, i.e., let us introduce $Z_n$, a linear function of $S$:

$$Z_n = \frac{S - pn}{\sqrt{np(1-p)}}.$$

Using $\mu = pn$ and $\sigma = \sqrt{np(1-p)}$, we have $Z_n = \frac{S - \mu}{\sigma}$.
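To make the standardization concrete, here is a minimal simulation sketch (not part of the original notes; the values of $n$, $p$, and the number of trials are arbitrary illustrative choices). It samples many copies of $S$ and checks that $\mathbb{E}[S] = pn$, $\operatorname{Var}[S] = np(1-p)$, and that $Z_n$ has mean about 0 and variance about 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 1000, 0.3, 100_000  # illustrative parameters, not from the notes

# Each Binomial(n, p) draw is one copy of S = X_1 + ... + X_n.
S = rng.binomial(n, p, size=trials)

mu = n * p                        # E[S] = pn
sigma = np.sqrt(n * p * (1 - p))  # sqrt(Var[S]) = sqrt(np(1-p))
Z = (S - mu) / sigma              # the standardization Z_n = (S - mu) / sigma

print(S.mean(), "vs", mu)         # empirical mean of S vs pn
print(S.var(), "vs", sigma**2)    # empirical variance of S vs np(1-p)
print(Z.mean(), Z.std())          # should be close to 0 and 1
```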

Via this transformation, we do not lose any information about $S = S_n$. Specifically, for any $u$, we have

$$\Pr[S \le u] = \Pr[\sigma Z_n + \mu \le u] = \Pr\left[Z_n \le \frac{u - \mu}{\sigma}\right].$$

Therefore, we proceed to study the distribution of $Z_n$. As a special instance, let us temporarily set $p = \frac{1}{2}$ so that the $X_i$'s become unbiased coin flips. In this case, we have

$$Z_n = \frac{X_1 + X_2 + \cdots + X_n - \frac{n}{2}}{\sqrt{n/4}} = \frac{2}{\sqrt{n}}\left(\left(X_1 - \tfrac12\right) + \left(X_2 - \tfrac12\right) + \cdots + \left(X_n - \tfrac12\right)\right).$$

For each integer $a \in [0, n]$, we have

$$\Pr\left[Z_n = \frac{2a - n}{\sqrt{n}}\right] = \binom{n}{a} 2^{-n}.$$

Therefore, we can easily plot the probability density curve of $Z_n$. In Figure 1, we plot the density curve for a few values of $n$.

[Figure 1: Probability density curves of $Z_n$ for a few values of $n$: (a) $n = 5$, (b) $n = 10$, (c) $n = 20$, (d) $n = 40$.]
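The density-curve plots can be reproduced from the formula above. The following sketch (an illustration, not from the original notes) tabulates the exact point probabilities of $Z_n$; dividing each probability by the spacing $2/\sqrt{n}$ between adjacent atoms turns the histogram into a density, which can then be compared against the Gaussian pdf $\phi$.

```python
import math

def pmf_Z(n):
    """Exact law of Z_n for unbiased coin flips:
    Pr[Z_n = (2a - n)/sqrt(n)] = C(n, a) / 2^n, for a = 0, ..., n."""
    pts = [(2 * a - n) / math.sqrt(n) for a in range(n + 1)]
    probs = [math.comb(n, a) / 2**n for a in range(n + 1)]
    return pts, probs

def phi(z):
    """Standard Gaussian pdf."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

n = 40  # the largest value shown in Figure 1
for z, q in zip(*pmf_Z(n)):
    if abs(z) <= 2:
        # Adjacent atoms are 2/sqrt(n) apart, so q / (2/sqrt(n)) approximates a density.
        print(f"z = {z:+.2f}  scaled pmf = {q * math.sqrt(n) / 2:.4f}  phi(z) = {phi(z):.4f}")
```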

We can see that as $n \to \infty$, the probability density curve converges to a fixed continuous curve, as illustrated in Figure 2.

[Figure 2: The famous bell curve: the probability density function of a standard Gaussian variable.]

Indeed, even when $p = \Pr[X_i = 1]$ is a constant in $(0, 1)$ other than $\frac12$, the probability density curve of $Z_n$ still converges to the same curve as $n \to \infty$. We call the probability distribution with this curve as its pdf the Gaussian distribution (or normal distribution).

1.2 The Central Limit Theorem

The Central Limit Theorem (CLT) for i.i.d. random variables can be stated as follows.

Theorem 1 (the Central Limit Theorem). Let $Z$ be a standard Gaussian. For any i.i.d. $X_1, X_2, \dots, X_n$ (not necessarily binary valued), as $n \to \infty$ we have $Z_n \to Z$ in the sense that

$$\forall u \in \mathbb{R}, \qquad \Pr[Z_n \le u] \to \Pr[Z \le u].$$

More specifically, for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that for every $n > N$ and every $u \in \mathbb{R}$, we have $\left|\Pr[Z_n \le u] - \Pr[Z \le u]\right| < \epsilon$.

Definition 2. We use $Z \sim \mathcal{N}(0, 1)$ to denote that $Z$ is a standard Gaussian variable. More specifically, $Z$ is a continuous random variable with probability density function

$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}.$$

We also use $Y \sim \mathcal{N}(\mu, \sigma^2)$ to denote that $Y$ is a Gaussian variable with mean $\mu$ and variance $\sigma^2$, i.e., $Y = \sigma Z + \mu$ where $Z$ is a standard Gaussian.

Now we introduce a few facts about Gaussian variables.
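The convergence in Theorem 1 is exactly convergence of CDFs, so it can be checked numerically. Below is a small sketch (an illustration under assumed parameters, not part of the original notes) that computes the exact Kolmogorov distance $\sup_u \left|\Pr[Z_n \le u] - \Pr[Z \le u]\right|$ for standardized Binomial$(n, p)$ sums; the supremum is attained at the jump points of the discrete CDF.

```python
import math

def Phi(u):
    """Standard Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def kolmogorov_distance(n, p):
    """sup_u |Pr[Z_n <= u] - Phi(u)| for the standardized Binomial(n, p) sum.
    It suffices to check just before and just after each jump of the discrete CDF."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    cdf, dist = 0.0, 0.0
    for k in range(n + 1):
        pk = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        u = (k - mu) / sigma
        dist = max(dist, abs(cdf - Phi(u)))  # left limit at the jump
        cdf += pk
        dist = max(dist, abs(cdf - Phi(u)))  # value at the jump
    return dist

for n in (10, 100, 1000):  # illustrative sizes; p = 0.3 is an arbitrary choice
    print(n, kolmogorov_distance(n, 0.3))
```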

Theorem 3. Let $Z = (Z_1, Z_2, \dots, Z_d) \in \mathbb{R}^d$, where $Z_1, Z_2, \dots, Z_d$ are i.i.d. standard Gaussians. Then the distribution of $Z$ is rotationally symmetric, i.e., the probability density is the same for $z$ and $z'$ whenever $\|z\| = \|z'\|$.

Proof. The probability density function of $Z$ at $z = (z_1, z_2, \dots, z_d)$ is

$$\phi(z_1)\phi(z_2)\cdots\phi(z_d) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} e^{-(z_1^2 + z_2^2 + \cdots + z_d^2)/2} = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} e^{-\|z\|^2/2},$$

which only depends on $\|z\|$.

The following corollary says that the function $\phi(\cdot)$ is indeed a probability density function.

Corollary 4. $\displaystyle\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = 1.$

Corollary 5. A linear combination of independent Gaussians is still Gaussian.

1.3 The Berry-Esseen Theorem (CLT with error bounds)

When designing and analyzing algorithms, we usually need to know the convergence rate in order to derive a guarantee on the performance (e.g. time/space complexity) of the algorithm. In this sense, the Central Limit Theorem (Theorem 1) may not be practically useful. The following Berry-Esseen theorem strengthens the CLT with concrete error bounds.

Theorem 6 (the Berry-Esseen Theorem). Let $X_1, X_2, \dots, X_n$ be independent. Assume w.l.o.g. that $\mathbb{E}[X_i] = 0$, $\operatorname{Var}[X_i] = \sigma_i^2$, and $\sum_{i=1}^{n} \sigma_i^2 = 1$. Let $S_n = X_1 + X_2 + \cdots + X_n$. (Note that $\mathbb{E}[S_n] = 0$ and $\operatorname{Var}[S_n] = 1$.) Then for every $u \in \mathbb{R}$ we have

$$\left|\Pr[S_n \le u] - \Pr_{Z \sim \mathcal{N}(0,1)}[Z \le u]\right| \le O(1) \cdot \beta, \qquad \text{where } \beta = \sum_{i=1}^{n} \mathbb{E}|X_i|^3.$$

Remark. The hidden constant in the upper bound of the theorem can be as good as $.5514$ by [She13].

Remark. The Berry-Esseen theorem does not need the $X_i$'s to be identically distributed; independence among the variables, however, is still essential.

We still use the unbiased coin flips example to see how this bound works.
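Theorem 3 and Corollary 5 are easy to sanity-check by simulation. The sketch below (illustrative, not from the original notes; the rotation angle and coefficients are arbitrary) rotates i.i.d. Gaussian samples and checks that each coordinate stays standard Gaussian, then checks that $aZ_1 + bZ_2$ has standard deviation $\sqrt{a^2 + b^2}$, as Corollary 5 predicts for mean-zero Gaussians.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((100_000, 2))  # rows are i.i.d. samples of (Z_1, Z_2)

# Theorem 3: rotate the samples; by rotational symmetry each coordinate
# of the rotated vector should still look like a standard Gaussian.
theta = 0.7  # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W = Z @ R.T
print(W[:, 0].mean(), W[:, 0].std())  # ~0 and ~1

# Corollary 5 (a special case): a*Z_1 + b*Z_2 is Gaussian with
# mean 0 and variance a^2 + b^2.
a, b = 2.0, -1.5  # arbitrary coefficients
Y = a * Z[:, 0] + b * Z[:, 1]
print(Y.std(), "vs", np.sqrt(a**2 + b**2))
```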

Let $X_1, X_2, \dots, X_n$ be independent random variables with

$$X_i = \begin{cases} +\frac{1}{\sqrt{n}}, & \text{w.p. } \frac12, \\ -\frac{1}{\sqrt{n}}, & \text{w.p. } \frac12. \end{cases}$$

We can check that $\mathbb{E}[X_i] = 0$ and $\operatorname{Var}(X_i) = \sigma_i^2 = \frac{1}{n}$ satisfy the requirements in the Berry-Esseen theorem. We can also compute $\mathbb{E}|X_i|^3 = n^{-3/2}$, and therefore $\beta = n \cdot n^{-3/2} = \frac{1}{\sqrt{n}}$. According to the Berry-Esseen theorem, we have

$$\forall u \in \mathbb{R}, \qquad \left|\Pr[S_n \le u] - \Pr_{Z \sim \mathcal{N}(0,1)}[Z \le u]\right| \le \frac{.56}{\sqrt{n}}. \tag{1}$$

The right-hand side ($\frac{.56}{\sqrt{n}}$) gives a concrete convergence rate.

Now let us investigate whether the $O\!\left(\frac{1}{\sqrt{n}}\right)$ upper bound can be improved. Say $n$ is even; then $S_n = \frac{\#\text{Heads} - \#\text{Tails}}{\sqrt{n}}$, and

$$S_n = 0 \iff \#H = \#T = \frac{n}{2}.$$

Now let us estimate this probability using (1). For sufficiently small $\epsilon > 0$ (smaller than the spacing $\frac{2}{\sqrt{n}}$ between the atoms of $S_n$), we have

$$\Pr[\#H = \#T] = \Pr[S_n = 0] = \Pr[S_n \le 0] - \Pr[S_n \le -\epsilon]$$
$$= \left(\Pr[S_n \le 0] - \Pr[Z \le 0]\right) - \left(\Pr[S_n \le -\epsilon] - \Pr[Z \le -\epsilon]\right) + \left(\Pr[Z \le 0] - \Pr[Z \le -\epsilon]\right).$$

Taking $\epsilon \to 0^+$, the last term $\Pr[-\epsilon < Z \le 0]$ vanishes, so

$$\Pr[\#H = \#T] \le \left|\Pr[S_n \le 0] - \Pr[Z \le 0]\right| + \left|\Pr[S_n \le -\epsilon] - \Pr[Z \le -\epsilon]\right| \le \frac{.56}{\sqrt{n}} + \frac{.56}{\sqrt{n}} = \frac{1.12}{\sqrt{n}}, \tag{2}$$

where the last inequality is because of (1).

On the other hand, it is easy to see that $\Pr[\#H = \#T] = \binom{n}{n/2} \cdot 2^{-n}$. Using Stirling's approximation $m! \approx \sqrt{2\pi m}\,(m/e)^m$, when $n \to \infty$ we have

$$\Pr[\#H = \#T] \approx \frac{\sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n}}{\pi n \left(\frac{n}{2e}\right)^{n}} \cdot 2^{-n} = \sqrt{\frac{2}{\pi n}} \approx \frac{.798}{\sqrt{n}}. \tag{3}$$

If we had an essentially better upper bound (say $o\!\left(\frac{1}{\sqrt{n}}\right)$) in (1), we would get an upper bound of $o\!\left(\frac{1}{\sqrt{n}}\right)$ in (2). This would contradict (3). Therefore the upper bound in (1) given by the Berry-Esseen theorem is asymptotically tight.
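The tightness calculation can be confirmed numerically. This short sketch (an illustration, not part of the original notes) compares the exact value of $\Pr[\#H = \#T] = \binom{n}{n/2} 2^{-n}$ with the Stirling estimate $\frac{.798}{\sqrt{n}}$ from (3) and the Berry-Esseen ceiling $\frac{1.12}{\sqrt{n}}$ from (2).

```python
import math

# Exact Pr[#H = #T] vs the Stirling estimate (3) and the Berry-Esseen ceiling (2).
for n in (10, 100, 1000, 10_000):  # even n only
    exact = math.comb(n, n // 2) / 2**n       # C(n, n/2) / 2^n
    stirling = math.sqrt(2 / (math.pi * n))   # ~ .798 / sqrt(n)
    ceiling = 1.12 / math.sqrt(n)             # upper bound from (2)
    print(f"n={n:6d}  exact={exact:.6f}  stirling={stirling:.6f}  bound={ceiling:.6f}")
```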

References

[She13] I. G. Shevtsova. On the absolute constants in the Berry-Esseen inequality and its structural and nonuniform improvements. Inform. Primen., 7(1):124-125, 2013.