
Probability inequalities [HMC 1.10]

There is an adage in probability that says that behind every limit theorem lies a probability inequality (i.e., a bound on the probability of some undesired event happening). Since a large part of probability theory is about proving limit theorems, people have developed a bewildering number of inequalities. Luckily, we'll only need a few key inequalities. Even better, three of them are really just versions of one another.

Exercise 29: Can you think of example distributions for which each of the following inequalities is tight (that is, the inequality may be replaced by an equality)? Are these extremal distributions unique?

Markov's inequality

[Figure 5: Markov.]

For a nonnegative r.v. $X$,
$$P(X > u) \leq \frac{E(X)}{u}.$$
So if $E(X)$ is small and we know $X \geq 0$, then $X$ must be near zero with high probability. (Note that the inequality is not true if $X$ can be negative.)

The proof is really simple:
$$u\, P(X > u) \leq \int_u^\infty t\, p_X(t)\, dt \leq \int_0^\infty t\, p_X(t)\, dt = E(X).$$
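As a quick sanity check, here is a minimal simulation sketch of Markov's inequality (not from the original notes; numpy, the exponential distribution, and the thresholds are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # nonnegative r.v. with E(X) = 2

for u in [1.0, 2.0, 5.0, 10.0]:
    empirical = np.mean(x > u)   # Monte Carlo estimate of P(X > u)
    markov = x.mean() / u        # Markov bound E(X)/u (can exceed 1 for small u)
    print(f"u = {u:5.1f}   P(X > u) ~ {empirical:.4f}   <=   E(X)/u ~ {markov:.4f}")
```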

Chebyshev's inequality

[Figure 6: Chebyshev.]

$$P(|X - E(X)| > u) \leq \frac{V(X)}{u^2},$$
aka
$$P\left(\frac{|X - E(X)|}{\sigma(X)} > u\right) \leq \frac{1}{u^2}.$$
Proof: just look at the (nonnegative) r.v. $(X - E(X))^2$, and apply Markov. So if the variance of $X$ is really small, $X$ is close to its mean with high probability.
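Similarly, a small simulation sketch of Chebyshev's inequality (again not from the notes; the uniform distribution on $[0, 1]$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=200_000)  # E(X) = 1/2, V(X) = 1/12
mu, var = x.mean(), x.var()

for u in [0.1, 0.2, 0.4]:
    empirical = np.mean(np.abs(x - mu) > u)  # estimate of P(|X - E(X)| > u)
    chebyshev = var / u**2                   # bound V(X)/u^2 (may exceed 1, i.e., be vacuous)
    print(f"u = {u:.1f}   P(|X - E(X)| > u) ~ {empirical:.4f}   <=   V(X)/u^2 ~ {chebyshev:.4f}")
```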

Chernoff's inequality

$$P(X > u) = P(e^{sX} > e^{su}) \leq e^{-su} M(s).$$
So the mgf controls the size of the tail of the distribution: yet another surprising application of the mgf idea.

[Figure 7: Chernoff.]

The really nice thing about this bound is that it is easy to deal with sums of independent r.v.'s (recall our discussion above of mgf's for sums of independent r.v.'s). Exercise 30: Derive Chernoff's bound for sums of independent r.v.'s (i.e., derive an upper bound for the probability that $\sum_i X_i$ is greater than $u$, in terms of the mgf's $M_{X_i}$).

The other nice thing is that the bound is exponentially decreasing in $u$, which is much stronger than Chebyshev. (On the other hand, since not all r.v.'s have mgf's, Chernoff's bound can be applied less generally than Chebyshev.) The other other nice thing is that the bound holds for all $s$ simultaneously, so if we need as tight a bound as possible, we can use
$$P(X > u) \leq \inf_s e^{-su} M(s),$$
i.e., we can minimize over $s$.
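To see the exponential decrease concretely, here is a sketch (not from the notes) comparing the optimized Chernoff bound with the Chebyshev bound for a sum of i.i.d. Bernoulli(1/2) r.v.'s; the Bernoulli choice and the threshold are arbitrary, and the mgf of the sum, $((1-p) + p e^s)^n$, follows from independence:

```python
import numpy as np

# Chernoff vs. Chebyshev for S = sum of n i.i.d. Bernoulli(p) r.v.'s.
n, p = 100, 0.5
mean, var = n * p, n * p * (1 - p)
u = 70.0                                       # threshold well above the mean

s_grid = np.linspace(1e-3, 5.0, 2000)
mgf = ((1 - p) + p * np.exp(s_grid)) ** n      # M_S(s), using independence
chernoff = np.min(np.exp(-s_grid * u) * mgf)   # inf_s e^{-su} M_S(s), approximated on a grid
chebyshev = var / (u - mean) ** 2              # bounds P(|S - E(S)| > u - E(S)), hence P(S > u)

print(f"Chernoff bound  : {chernoff:.3e}")
print(f"Chebyshev bound : {chebyshev:.3e}")
```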

Jensen's inequality

This one is more geometric. Think about a function $g(u)$ which is curved upward, that is, $g''(u) \geq 0$ for all $u$. Such a $g(u)$ is called convex. (Downward-curving functions are called concave. More generally, a convex function is bounded above by its chords: $g(tx + (1-t)y) \leq t\, g(x) + (1-t)\, g(y)$, while a concave function is bounded below.)

[Figure 8: Jensen.]

Then if you draw yourself a picture, it's easy to see that
$$E(g(X)) \geq g(E(X)).$$
That is, the average of $g(X)$ is always greater than or equal to $g$ evaluated at the average of $X$.

Exercise 31: Prove this. (Hint: try subtracting off $f(X)$, where $f$ is a linear function of $X$ such that $g(X) - f(X)$ reaches a minimum at $E(X)$.)

Exercise 32: What does this inequality tell you about the means of $1/X$? Of $X \log X$? About $E_i(X)$ vs. $E_j(X)$, where $i > j$?
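A quick numerical illustration (not from the notes) of Jensen's inequality for two convex functions related to Exercise 32, $1/x$ and $x \log x$; the Gamma distribution below is just an arbitrary positive example:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=1.0, size=500_000)  # a positive r.v., E(X) = 3

# g(x) = 1/x and g(x) = x*log(x) are both convex on (0, inf),
# so Jensen gives E(g(X)) >= g(E(X)) in each case.
print("E(1/X)     =", np.mean(1.0 / x),          ">=  1/E(X)        =", 1.0 / x.mean())
print("E(X log X) =", np.mean(x * np.log(x)),    ">=  E(X) log E(X) =", x.mean() * np.log(x.mean()))
```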

Cauchy-Schwarz inequality

$$|C(X, Y)| \leq \sigma(X)\,\sigma(Y),$$
that is, the correlation coefficient is bounded between $-1$ ($X$ and $Y$ are anti-correlated) and $1$ (correlated).

The proof of this one is based on our rules for adding variance:
$$C(X, Y) = \frac{1}{2}\left[V(X + Y) - V(X) - V(Y)\right]$$
(assuming $E(X) = E(Y) = 0$). Exercise 33: Complete the proof. (Hint: try looking at $X$ and $-X$, using the fact that $C(-X, Y) = -C(X, Y)$.)

Exercise for the people who have taken linear algebra: interpret the Cauchy-Schwarz inequality in terms of the angle between the vectors $X$ and $Y$ (where we think of functions, that is, r.v.'s, as vectors, and define the dot product as $E(XY)$ and the length of a vector as $\sqrt{E(X^2)}$). Thus this inequality is really geometric in nature.
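A small numerical check (not from the notes) that the correlation coefficient built from the covariance and standard deviations indeed lies in $[-1, 1]$; the linear-plus-noise construction of $Y$ is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)  # Y correlated with X by construction

cov = np.cov(x, y)[0, 1]                # sample covariance C(X, Y)
bound = x.std(ddof=1) * y.std(ddof=1)   # sigma(X) * sigma(Y)
print(f"|C(X, Y)| = {abs(cov):.4f}  <=  sigma(X)*sigma(Y) = {bound:.4f}")
print(f"correlation coefficient = {cov / bound:+.4f}   (always in [-1, 1])")
```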

Limit theorems [HMC chapter 4]

"An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question."
(The first golden rule of applied mathematics, sometimes attributed to John Tukey.)

(Weak) law of large numbers

Chebyshev's simple inequality is enough to prove perhaps the fundamental result in probability theory: the law of averages. This says that if we take the sample average of a bunch of i.i.d. r.v.'s, the sample average will be close to the true average. More precisely, under the assumption that $V(X) < \infty$,
$$P\left(\left|E(X) - \frac{1}{N}\sum_{i=1}^N X_i\right| > \epsilon\right) \to 0$$
as $N \to \infty$, no matter how small $\epsilon$ is.

The proof:
$$E\left(\frac{1}{N}\sum_{i=1}^N X_i\right) = E(X),$$
by the linearity of the expectation, and
$$V\left(\frac{1}{N}\sum_{i=1}^N X_i\right) = \frac{V(X)}{N},$$
by the rules for adding variance and the fact that the $X_i$ are independent. Now just look at Chebyshev.

Remember, the LLN does not hold for all r.v.'s: remember what happened when you took averages of i.i.d. Cauchy r.v.'s? Exercise 34: What goes wrong in the Cauchy case?

Stochastic convergence concepts

In the above, we say that the sample mean $\frac{1}{N}\sum_{i=1}^N X_i$ converges in probability to the true mean. More generally, we say r.v.'s $Z_N$ converge to $Z$ in probability if $P(|Z_N - Z| > \epsilon) \to 0$ as $N \to \infty$, for every $\epsilon > 0$.
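Finally, a simulation sketch (not from the notes) of the law of large numbers, contrasting a finite-variance r.v. with the Cauchy case from Exercise 34; the sample sizes and distributions are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
for N in [10, 100, 10_000, 1_000_000]:
    unif_avg = rng.uniform(0.0, 1.0, size=N).mean()   # finite variance: approaches E(X) = 0.5
    cauchy_avg = rng.standard_cauchy(size=N).mean()   # no mean: the average never settles down
    print(f"N = {N:>9}   uniform average = {unif_avg:+.4f}   Cauchy average = {cauchy_avg:+.4f}")
```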