Lecture 02: Bounding tail distributions of a random variable


CSCI-B609: A Theorist's Toolkit (Fall 2016), Aug 25
Lecturer: Yuan Zhou    Scribe: Yuan Xie & Yuan Zhou

Let us consider the unbiased coin flips again. That is, let the outcome of the i-th coin toss be a random variable

    X_i = +1 with probability 1/2,   X_i = -1 with probability 1/2.

We assume all coin tosses are independent, and we would like to study the sum S_n of the first n coin tosses,

    S_n = Σ_{i=1}^n X_i.

In this lecture, we study the probability that S_n greatly deviates from its mean E[S_n] = 0. Specifically, for a parameter t, we would like to estimate the probability Pr[S_n > t]. Intuitively, this probability should be small for large enough t. The goal of this lecture (and part of the next one) is to derive quantitative upper bounds on the tail probability mass, parameterized by t.

As we did in the previous lecture, using the Berry-Esseen theorem, we know that Pr[S_n ≥ t] is within O(1/√n) of Pr[G ≥ t/√n], where G ~ N(0, 1) is a standard Gaussian. For convenience, we may also use the following informal notation:

    Pr[S_n ≥ t] = Pr[G ≥ t/√n] ± O(1/√n).    (1)

Using basic calculus, we can estimate that

    Pr[G ≥ t] = ∫_t^∞ (1/√(2π)) e^{-u²/2} du ≤ O(1) · e^{-t²/2}.    (2)

Now let us fix the parameter t = 10√(n ln n). Combining (1) and (2), we have

    Pr[S_n ≥ t] ≤ Pr[G ≥ 10√(ln n)] + O(1/√n) = O(exp(-(10√(ln n))²/2)) + O(1/√n) = O(1/n^50) + O(1/√n) = O(1/√n).
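As a quick numerical sanity check of the Gaussian tail estimate (2) (an illustration added here, not part of the original notes), the following Python sketch compares the exact tail Pr[G ≥ t] with e^{-t²/2}; the hidden constant in the O(1) is taken to be 1/2, an assumption that the check itself confirms for the sampled values of t.

import math

def gaussian_tail(t: float) -> float:
    """Exact upper tail Pr[G >= t] of a standard Gaussian, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def tail_bound(t: float) -> float:
    """The bound from (2), with the O(1) constant taken to be 1/2 (an assumption)."""
    return 0.5 * math.exp(-t * t / 2)

for t in (0.5, 1.0, 2.0, 4.0, 6.0):
    exact, bound = gaussian_tail(t), tail_bound(t)
    print(f"t={t:4.1f}  Pr[G>=t]={exact:.3e}  (1/2)e^(-t^2/2)={bound:.3e}  bound holds: {exact <= bound}")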

We see that the tail mass of the standard Gaussian is only O(1/n^50). However, the error term O(1/√n) introduced by the Berry-Esseen theorem is much greater, and this error term is the main reason we cannot get better results this way. In the remainder of this lecture, we will try several other methods to improve the upper bound.

1  Markov inequality

When we only know the mean of a nonnegative random variable, Markov's inequality gives a simple upper bound on the probability that it deviates from its mean.

Theorem 1 (Markov inequality). Let X be a random variable with X ≥ 0. For every parameter t > 0, we have Pr[X ≥ t · E[X]] ≤ 1/t.

Proof. For each α > 0, we have

    E[X] = Pr[X ≥ α] · E[X | X ≥ α] + Pr[X < α] · E[X | X < α]
         ≥ Pr[X ≥ α] · α + Pr[X < α] · 0
         = Pr[X ≥ α] · α.

Dividing both sides of the inequality by α > 0 gives

    E[X]/α ≥ Pr[X ≥ α].

Taking α = t · E[X], we get the desired bound.

Now let us try to apply Markov's inequality to bound the tail mass of S_n. Since S_n is not a nonnegative random variable, we cannot apply the inequality directly. However, note that S_n ≥ -n always holds, so we apply the inequality to T = S_n + n, where E[T] = E[S_n] + n = n. Let t = 10√(n ln n). We have

    Pr[S_n ≥ t] = Pr[T ≥ t + n] = Pr[T ≥ ((t + n)/n) · E[T]] ≤ n/(t + n) = 1/(1 + 10√(ln n / n)).

This is a very bad bound: it does not even converge to 0 as n grows!
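To see concretely how weak this is, here is a minimal Python sketch (added for illustration, not part of the original notes) that evaluates the Markov bound n/(t + n) = 1/(1 + 10√(ln n / n)) for increasing n; the values approach 1 rather than 0.

import math

def markov_tail_bound(n: int) -> float:
    """Markov bound on Pr[S_n >= 10*sqrt(n ln n)], obtained via the shifted variable T = S_n + n."""
    t = 10 * math.sqrt(n * math.log(n))
    return n / (t + n)  # equals 1 / (1 + 10*sqrt(ln(n)/n))

for n in (10**2, 10**4, 10**6, 10**8):
    print(f"n={n:>9}  Markov bound = {markov_tail_bound(n):.4f}")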

2  Chebyshev inequality

The Chebyshev inequality uses not only the mean of the random variable but also its variance (equivalently, its second moment). Since we have more information about the random variable, we may potentially get better bounds.

Theorem 2 (Chebyshev inequality). Assume that E[X] = µ and Var[X] = σ² > 0. For every parameter t > 0, we have

    Pr[|X - µ| ≥ t · σ] ≤ 1/t².

Proof. Let Y = (X - µ)². We can check that E[Y] = σ² and Y ≥ 0. Applying Markov's inequality, we have

    Pr[|X - µ| ≥ t · σ] = Pr[(X - µ)² ≥ t² σ²] = Pr[Y ≥ t² · E[Y]] ≤ 1/t².

Now let us go back to the scenario discussed at the beginning of this lecture. We compute that µ = E[S_n] = 0 and σ = √(Var[S_n]) = √(E[S_n²]) = √n. Therefore

    Pr[S_n ≥ 10√(n ln n)] ≤ Pr[|S_n| ≥ 10√(n ln n)] = Pr[|S_n - µ| ≥ σ · (10√(n ln n)/σ)] ≤ σ²/(10√(n ln n))² = n/(100 n ln n) = 1/(100 ln n).

This bound is still not as good as we would like. However, at least it converges to 0 as n → ∞.

Remark 1. Note that the Chebyshev inequality only needs pairwise independence among the X_i's. Specifically, when computing the variance of S_n, we have

    Var[S_n] = Var[X_1 + X_2 + ... + X_n]
             = E[(X_1 + X_2 + ... + X_n)²] - E[X_1 + X_2 + ... + X_n]²
             = E[(X_1 + X_2 + ... + X_n)²]
             = Σ_i E[X_i²] + Σ_{i ≠ j} E[X_i X_j]
             = Σ_i E[X_i²]
             = n.

In the penultimate equality, we used the fact that X_i is independent of X_j for i ≠ j (so that E[X_i X_j] = E[X_i] E[X_j] = 0).
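Both the variance computation in Remark 1 and the bound 1/(100 ln n) are easy to check numerically. The Python sketch below (added for illustration, not part of the original notes) estimates Var[S_n] by Monte Carlo simulation and prints the Chebyshev bound for a few values of n; the trial count of 5000 is an arbitrary choice.

import math
import random

def estimate_variance(n: int, trials: int = 5000) -> float:
    """Monte Carlo estimate of Var[S_n], where S_n is the sum of n independent +/-1 coin flips."""
    samples = [sum(random.choice((-1, 1)) for _ in range(n)) for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials

for n in (100, 400, 1000):
    var_hat = estimate_variance(n)
    chebyshev_bound = 1 / (100 * math.log(n))
    print(f"n={n:5}  estimated Var[S_n] = {var_hat:8.1f} (exact: {n})  Chebyshev bound = {chebyshev_bound:.4f}")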

3  The fourth moment method

Using the first two moments, we obtained a better bound than using only the mean of the random variable. Now let us extend this method to the fourth moment. Consider S_n^4 ≥ 0. By Markov's inequality, we have

    Pr[S_n ≥ 10√(n ln n)] ≤ Pr[S_n^4 ≥ (10√(n ln n))^4] ≤ E[S_n^4] / (10000 n² ln² n).    (3)

Now let us estimate

    E[S_n^4] = E[(Σ_{i=1}^n X_i)^4]
             = Σ_i E[X_i^4] + 3 Σ_{i ≠ j} E[X_i² X_j²] + 4 Σ_{i ≠ j} E[X_i X_j³]
               + 6 Σ_{i, j, k distinct} E[X_i X_j X_k²] + Σ_{i, j, k, q distinct} E[X_i X_j X_k X_q].    (4)

Fortunately, because of independence and E[X_i] = 0, we have

    E[X_i X_j³] = E[X_i X_j X_k²] = E[X_i X_j X_k X_q] = 0

whenever the indices shown are distinct. Therefore we can simplify (4) to

    E[S_n^4] = Σ_i E[X_i^4] + 3 Σ_{i ≠ j} E[X_i² X_j²] = n + 3n(n - 1) ≤ 3n².    (5)

Combining (3) and (5), we get

    Pr[S_n ≥ 10√(n ln n)] ≤ 3n² / (10000 n² ln² n) = 3 / (10000 ln² n).

This is a better bound than what we got from Chebyshev.

Remark 2. The fourth moment method only uses independence within every quadruple of random variables, so the bound also holds for 4-wise independent random variables.

Remark 3. We can extend this method by considering S_n^{2k} for positive integers k and picking the k that optimizes the upper bound. However, this plan would lead to a painful estimation of E[S_n^{2k}]. We will instead use a slightly different method to get better upper bounds.
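The identity E[S_n^4] = n + 3n(n - 1) from (5) can be verified exactly for small n by enumerating all 2^n equally likely outcomes; the short Python sketch below (added for illustration, not part of the original notes) does exactly that.

import itertools

def exact_fourth_moment(n: int) -> float:
    """E[S_n^4] computed by enumerating all 2^n equally likely +/-1 outcomes."""
    total = sum(sum(signs) ** 4 for signs in itertools.product((-1, 1), repeat=n))
    return total / 2 ** n

for n in range(1, 13):
    formula = n + 3 * n * (n - 1)  # the closed form derived in (5)
    print(f"n={n:2}  enumeration = {exact_fourth_moment(n):7.1f}   n + 3n(n-1) = {formula}")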

4  The Chernoff method

Instead of S_n^{2k}, let us consider the function e^{λ S_n} for some positive parameter λ. Since e^x is a monotonically increasing function, we have

    Pr[S_n ≥ 10√(n ln n)] = Pr[λ S_n ≥ 10λ√(n ln n)] = Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}].

By Markov's inequality (also checking that e^{λ S_n} > 0), we have

    Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}] ≤ E[e^{λ S_n}] / e^{10λ√(n ln n)}.    (6)

Now it remains to upper bound E[e^{λ S_n}]. We have

    E[e^{λ S_n}] = E[e^{λ Σ_i X_i}] = E[Π_i e^{λ X_i}] = Π_i E[e^{λ X_i}].    (7)

Note that in the last equality we used the full independence among all the X_i's. On the other hand, by the distribution of X_i, we have

    E[e^{λ X_i}] = (1/2) e^λ + (1/2) e^{-λ}
                 = (1/2)(1 + λ + λ²/2! + λ³/3! + λ⁴/4! + ...) + (1/2)(1 - λ + λ²/2! - λ³/3! + λ⁴/4! - ...)    (Taylor expansion)
                 = 1 + λ²/2! + λ⁴/4! + λ⁶/6! + ...
                 ≤ e^{λ²/2}.

Getting back to (7), we have

    E[e^{λ S_n}] ≤ (e^{λ²/2})^n = e^{λ² n / 2}.

Combining this with (6), we have

    Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}] ≤ e^{λ² n / 2 - 10λ√(n ln n)}.    (8)

Picking λ = 10√(ln n / n), we minimize the right-hand side of (8) and get our desired upper bound

    Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}] ≤ e^{50 ln n - 100 ln n} = 1/n^50.

At the beginning of the next lecture, we are going to extend this method to more general random variables and more general thresholds, and go through the proof of the famous Chernoff bound.
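As a final sanity check (added for illustration, not part of the original notes), the Python sketch below minimizes the exponent in (8) over a grid of λ values and confirms that the choice λ = 10√(ln n / n) attains exponent -50 ln n, i.e. the bound 1/n^50; the value n = 10^6 is an arbitrary example.

import math

def exponent(lmbda: float, n: int, t: float) -> float:
    """Exponent of the bound in (8): lambda^2 * n / 2 - lambda * t."""
    return lmbda ** 2 * n / 2 - lmbda * t

n = 10 ** 6
t = 10 * math.sqrt(n * math.log(n))

# Setting the derivative lambda * n - t to zero gives the optimal lambda = t / n = 10 * sqrt(ln(n) / n).
lmbda_star = t / n

# Crude grid search around lmbda_star to confirm it is (close to) the minimizer.
grid = [lmbda_star * (0.5 + 0.01 * k) for k in range(101)]
best = min(grid, key=lambda l: exponent(l, n, t))

print("optimal lambda        =", lmbda_star)
print("grid-search minimizer =", best)
print("exponent at optimum   =", exponent(lmbda_star, n, t), "   -50 ln n =", -50 * math.log(n))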