CS 70 Discrete Mathematics and Probability Theory, Summer 2014, James Cook: Note 17

I.I.D. Random Variables

Estimating the bias of a coin

Question: We want to estimate the proportion p of Democrats in the US population by taking a small random sample. How large does our sample have to be to guarantee that our estimate will be within (say) an additive factor of 0.1 of the true value with probability at least 0.95?

This is perhaps the most basic statistical estimation problem, and it shows up everywhere. We will develop a simple solution that uses only Chebyshev's inequality. More refined methods can be used to get sharper results.

Let's denote the size of our sample by n (to be determined), and the number of Democrats in it by the random variable S_n. (The subscript n just reminds us that the random variable depends on the size of the sample.) Then our estimate will be the value A_n = (1/n)S_n.

Now, as has often been the case, we will find it helpful to write S_n = X_1 + X_2 + ... + X_n, where

    X_i = 1 if person i in the sample is a Democrat, and X_i = 0 otherwise.

Note that each X_i can be viewed as a coin toss, with Heads probability p (though of course we do not know the value of p!). And the coin tosses are independent.¹ We call such a family of random variables independent, identically distributed, or i.i.d. for short.

What is the expectation of our estimate?

    E(A_n) = E((1/n)S_n) = (1/n)E(X_1 + X_2 + ... + X_n) = (1/n)(np) = p.

So for any value of n, our estimate will always have the correct expectation p. (Such an r.v. is often called an unbiased estimator of p.) Now, presumably, as we increase our sample size n, our estimate should get more and more accurate. This will show up in the fact that the variance decreases as n grows: i.e., the probability that we are far from the mean p will get smaller.

To see this, we need to compute Var(A_n). But A_n = (1/n)∑_{i=1}^n X_i, which is just a multiple of a sum of independent random variables.

Theorem 17.1: For any random variable X and constant c, we have Var(cX) = c²Var(X). And for independent random variables X and Y, we have Var(X + Y) = Var(X) + Var(Y).

¹We are assuming here that the sampling is done "with replacement"; i.e., we select each person in the sample from the entire population, including those we have already picked. So there is a small chance that we will pick the same person twice.
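To make the setup concrete, here is a small Python sketch (our illustration, not part of the original note) that simulates this poll: each X_i is an independent Bernoulli(p) toss for an assumed true bias p, and A_n = S_n/n is the estimate. The values p = 0.45 and the trial counts are arbitrary choices.

```python
import random

def poll(p, n):
    """One poll of size n: S_n = X_1 + ... + X_n with X_i ~ Bernoulli(p).
    Returns the estimate A_n = S_n / n."""
    s_n = sum(1 for _ in range(n) if random.random() < p)
    return s_n / n

if __name__ == "__main__":
    p = 0.45          # assumed true bias (unknown in a real poll)
    trials = 10_000   # number of independent simulated polls
    for n in (10, 100, 1000):
        estimates = [poll(p, n) for _ in range(trials)]
        mean = sum(estimates) / trials
        var = sum((a - mean) ** 2 for a in estimates) / trials
        print(f"n={n:5d}  mean(A_n)={mean:.4f}  var(A_n)={var:.6f}")
```

The empirical mean of A_n stays near p for every n (unbiasedness), while the empirical variance shrinks roughly like 1/n, which is exactly the behavior derived below.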

Before we prove this theorem, let us look more carefully at something we have been using implicitly for some time:

Joint Distributions

Consider two random variables X and Y defined on the same probability space. By linearity of expectation, we know that E(X + Y) = E(X) + E(Y). Since E(X) can be calculated if we know the distribution of X, and E(Y) can be calculated if we know the distribution of Y, this means that E(X + Y) can be computed knowing only the two individual distributions. No information is needed about the relationship between X and Y.

This is not true if we need to compute, say, E((X + Y)²), e.g. as when we computed the variance of a binomial r.v. This is because E((X + Y)²) = E(X²) + 2E(XY) + E(Y²), and E(XY) depends on the relationship between X and Y. How can we capture such a relationship?

Recall that the distribution of a single random variable X is the collection of the probabilities of all events X = a, for all possible values a that X can take on. When we have two random variables X and Y, we can think of (X, Y) as a "two-dimensional" random variable, in which case the events of interest are X = a ∧ Y = b for all possible values (a, b) that (X, Y) can take on. Thus, a natural generalization of the notion of distribution to multiple random variables is the following.

Definition 17.1 (joint distribution): The joint distribution of two discrete random variables X and Y is the collection of values {(a, b, Pr[X = a ∧ Y = b]) : (a, b) ∈ A × B}, where A and B are the sets of all possible values taken by X and Y respectively.

This notion obviously generalizes to three or more random variables. Since we will write Pr[X = a ∧ Y = b] quite often, we will abbreviate it to Pr[X = a, Y = b]. Just like the distribution of a single random variable, the joint distribution is normalized, i.e.

    ∑_{a∈A, b∈B} Pr[X = a, Y = b] = 1.

This follows from noticing that the events X = a ∧ Y = b, for a ∈ A and b ∈ B, partition the sample space.

The joint distribution of two random variables fully describes their statistical relationship, and provides enough information to compute any probability or expectation involving the two random variables. For example,

    E(XY) = ∑_c c · Pr[XY = c] = ∑_{a,b} ab · Pr[X = a, Y = b].

More generally, if f is any function on R × R,

    E(f(X, Y)) = ∑_c c · Pr[f(X, Y) = c] = ∑_{a,b} f(a, b) · Pr[X = a, Y = b].

Moreover, the individual distributions of X and Y can be recovered from the joint distribution as follows:

    Pr[X = a] = ∑_{b∈B} Pr[X = a, Y = b]    (1)

    Pr[Y = b] = ∑_{a∈A} Pr[X = a, Y = b]    (2)

The first follows from the fact that the events Y = b, for b ∈ B, form a partition of the sample space Ω, so the events X = a ∧ Y = b, for b ∈ B, are disjoint and their union yields the event X = a. Similar logic applies to the second fact.

Pictorially, one can think of the joint distribution values as entries filling a table, with the columns indexed by the values that X can take on and the rows indexed by the values that Y can take on (Figure 1).

[Figure 1: A tabular representation of a joint distribution.]

To get the distribution of X, all one needs to do is sum the entries in each of the columns. To get the distribution of Y, just sum the entries in each of the rows. This process is sometimes called "marginalization," and the individual distributions are sometimes called "marginal distributions" to differentiate them from the joint distribution.
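The table picture translates directly into code. The following sketch (with a made-up joint distribution; the names are ours) stores Pr[X = a, Y = b] in a dictionary and recovers the marginals by summing over the other variable, exactly as in equations (1) and (2):

```python
# Hypothetical joint distribution: keys are (a, b), values are Pr[X=a, Y=b].
joint = {
    (0, 0): 0.125, (0, 1): 0.25,
    (1, 0): 0.375, (1, 1): 0.25,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12  # normalization check

def marginal_X(joint):
    """Pr[X = a] = sum over b of Pr[X = a, Y = b], as in equation (1)."""
    px = {}
    for (a, b), pr in joint.items():
        px[a] = px.get(a, 0.0) + pr
    return px

def marginal_Y(joint):
    """Pr[Y = b] = sum over a of Pr[X = a, Y = b], as in equation (2)."""
    py = {}
    for (a, b), pr in joint.items():
        py[b] = py.get(b, 0.0) + pr
    return py

print(marginal_X(joint))  # {0: 0.375, 1: 0.625}  (column sums)
print(marginal_Y(joint))  # {0: 0.5, 1: 0.5}      (row sums)
```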

Independent Random Variables

Independence of random variables is defined in an analogous fashion to independence for events:

Definition 17.2 (independent r.v.'s): Random variables X and Y on the same probability space are said to be independent if the events X = a and Y = b are independent for all values a, b. Equivalently, the joint distribution of independent r.v.'s decomposes as

    Pr[X = a, Y = b] = Pr[X = a] · Pr[Y = b]   for all a, b.

Note that for independent r.v.'s, the joint distribution is fully specified by the marginal distributions. Mutual independence of more than two r.v.'s is defined similarly. A very important example of independent r.v.'s is indicator r.v.'s for independent events. Thus, for example, if {X_i} are indicator r.v.'s for the event that the ith toss of a coin is Heads, then the X_i are mutually independent r.v.'s.

We saw that the expectation of a sum of r.v.'s is the sum of the expectations of the individual r.v.'s. This is not true in general for variance. However, as the above theorem states, it is true if the random variables are independent. To see this, we first look at the expectation of a product of independent r.v.'s (a quantity that frequently shows up in variance calculations, as we have seen).

Theorem 17.2: For independent random variables X and Y, we have E(XY) = E(X)E(Y).

Proof: We have

    E(XY) = ∑_{a,b} ab · Pr[X = a, Y = b]
          = ∑_{a,b} ab · Pr[X = a] · Pr[Y = b]
          = (∑_a a · Pr[X = a]) · (∑_b b · Pr[Y = b])
          = E(X) · E(Y),

as required. In the second line here we made crucial use of independence.

For example, this theorem would have allowed us to conclude immediately, in our random walk example at the beginning of Lecture Note 16, that E(X_i X_j) = E(X_i)E(X_j) = 0, without the need for a calculation.
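As a numerical sanity check (again our own sketch, with arbitrary marginals), we can build the joint distribution of an independent pair as the product of its marginals and verify that E(XY) = E(X)E(Y):

```python
# Arbitrary marginal distributions for X and Y (illustrative values).
px = {0: 0.25, 1: 0.5, 2: 0.25}
py = {-1: 0.5, 1: 0.5}

# Independence: Pr[X=a, Y=b] = Pr[X=a] * Pr[Y=b] for all a, b.
joint = {(a, b): pa * pb for a, pa in px.items() for b, pb in py.items()}

E_X = sum(a * pa for a, pa in px.items())
E_Y = sum(b * pb for b, pb in py.items())
E_XY = sum(a * b * pr for (a, b), pr in joint.items())

print(E_XY, E_X * E_Y)          # equal, as Theorem 17.2 asserts
assert abs(E_XY - E_X * E_Y) < 1e-12
```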

We now use the above theorem to prove the nice property of the variance of independent random variables stated in Theorem 17.1, namely that for independent random variables X and Y, Var(X + Y) = Var(X) + Var(Y):

Proof: From the alternative formula for variance in Theorem 16.1, we have, using linearity of expectation extensively,

    Var(X + Y) = E((X + Y)²) − E(X + Y)²
               = E(X²) + E(Y²) + 2E(XY) − (E(X) + E(Y))²
               = (E(X²) − E(X)²) + (E(Y²) − E(Y)²) + 2(E(XY) − E(X)E(Y))
               = Var(X) + Var(Y) + 2(E(XY) − E(X)E(Y)).

Now, because X and Y are independent, by Theorem 17.2 the final term in this expression is zero. Hence we get our result.

Note: The expression E(XY) − E(X)E(Y) appearing in the above proof is called the covariance of X and Y, and is a measure of the dependence between X and Y. It is zero when X and Y are independent.

It is very important to remember that neither of these two results is true in general, without the assumption that X and Y are independent. As a simple example, note that even for a 0/1-valued r.v. X with Pr[X = 1] = p, E(X²) = p is not equal to E(X)² = p² (because of course X and X are not independent!). Note also that the theorem does not quite say that variance is linear for independent random variables: it says only that variances sum. It is not true that Var(cX) = c·Var(X) for a constant c; rather, Var(cX) = c²·Var(X). The proof is left as a straightforward exercise.

We now return to our example of estimating the proportion of Democrats, where we were about to compute Var(A_n):

    Var(A_n) = Var((1/n)∑_{i=1}^n X_i) = (1/n)² Var(∑_{i=1}^n X_i) = (1/n)² ∑_{i=1}^n Var(X_i) = σ²/n,

where we have written σ² for the variance of each of the X_i. So we see that the variance of A_n decreases linearly with n. This fact ensures that, as we take larger and larger sample sizes n, the probability that we deviate much from the expectation p gets smaller and smaller.

Let's now use Chebyshev's inequality to figure out how large n has to be to ensure a specified accuracy in our estimate of the proportion of Democrats p. A natural way to measure this is to specify two parameters, ε and δ, both in the range (0, 1). The parameter ε controls the error we are prepared to tolerate in our estimate, and δ controls the confidence we want to have in our estimate. A more precise version of our original question is then the following:

Question: For the Democrat-estimation problem above, how large does the sample size n have to be in order to ensure that Pr[|A_n − p| ≥ ε] ≤ δ?

In our original question, we had ε = 0.1 and δ = 0.05.

Let's apply Chebyshev's inequality to answer this more precise question. Since we know Var(A_n), this will be quite simple. From Chebyshev's inequality, we have

    Pr[|A_n − p| ≥ ε] ≤ Var(A_n)/ε² = σ²/(nε²).
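It is instructive to compare Chebyshev's guarantee with what actually happens in simulation. Here is a rough sketch (the parameters p = 0.5 and ε = 0.1 are assumed for illustration) that estimates the deviation probability empirically and prints it next to the bound σ²/(nε²):

```python
import random

def deviation_prob(p, n, eps, trials=20_000):
    """Empirical estimate of Pr[|A_n - p| >= eps] over many simulated polls."""
    bad = 0
    for _ in range(trials):
        a_n = sum(random.random() < p for _ in range(n)) / n
        if abs(a_n - p) >= eps:
            bad += 1
    return bad / trials

p, eps = 0.5, 0.1
sigma2 = p * (1 - p)  # variance of a single 0/1-valued sample
for n in (50, 100, 500):
    bound = sigma2 / (n * eps**2)  # Chebyshev tail bound
    print(f"n={n:4d}  empirical={deviation_prob(p, n, eps):.4f}  bound={bound:.4f}")
```

The empirical probability sits well below the bound: Chebyshev's inequality is loose, which is why the more refined methods mentioned at the start give sharper sample-size guarantees.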

To make this less than the desired value δ, we need to set

    n ≥ σ²/(ε²δ).    (3)

Now recall that σ² = Var(X_i) is the variance of a single sample X_i. Since X_i is a 0/1-valued r.v., we have σ² = p(1 − p), and inequality (3) becomes

    n ≥ p(1 − p)/(ε²δ).    (4)

Since p(1 − p) takes on its maximum value at p = 1/2, we can conclude that it is sufficient to choose n such that

    n ≥ 1/(4ε²δ).    (5)

Plugging in ε = 0.1 and δ = 0.05, we see that a sample size of n = 500 is sufficient.

Notice that the size of the sample is independent of the total size of the population! This is how polls can accurately estimate quantities of interest for a population of several hundred million while sampling only a very small number of people.

Estimating a general expectation

What if we wanted to estimate something a little more complex than the proportion of Democrats in the population, such as the average wealth of people in the US? Then we could use exactly the same scheme as above, except that now the r.v. X_i is the wealth of the ith person in our sample. Clearly E(X_i) = µ, the average wealth (which is what we are trying to estimate). And our estimate will again be A_n = (1/n)∑_{i=1}^n X_i, for a suitably chosen sample size n. Once again the X_i are i.i.d. random variables, so we again have E(A_n) = µ and Var(A_n) = σ²/n, where σ² = Var(X_i) is the variance of the X_i. (Recall that the only facts we used about the X_i were that they were independent and had the same distribution; actually, the same expectation and variance would be enough: why?)

This time, however, since we do not have any a priori bound on the mean µ, it makes more sense to let ε be the relative error; i.e., we wish to find an estimate that is within an additive error of εµ. Using equation (3), but substituting εµ in place of ε, it is enough for the sample size n to satisfy

    n ≥ (σ²/µ²) · 1/(ε²δ).    (6)

Here ε and δ are the desired relative error and confidence respectively. Now of course we don't know the other two quantities, µ and σ², appearing in equation (6). In practice, we would use a lower bound on µ and an upper bound on σ² (just as we used a lower bound on p in the Democrats problem). Plugging these bounds into equation (6) will ensure that our sample size is large enough.

For example, in the average wealth problem we could probably safely take µ to be at least (say) $20k (probably more). However, the existence of people such as Bill Gates means that we would need to take a very high value for the variance σ². Indeed, if there is at least one individual with wealth $50 billion, then even assuming a relatively small value of µ, the variance must be at least about (50 × 10⁹)²/(250 × 10⁶) = 10¹³. (Check this.) There is really no way around this problem with simple uniform sampling: the uneven distribution of wealth means that the variance is inherently very large, and we will need a huge number of samples before we are likely to find anybody who is immensely wealthy. But if we don't include such people in our sample, then our estimate will be way too low.
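Inequalities (5) and (6) amount to a one-line sample-size calculator. Here is a minimal sketch (the function names are ours, not the note's):

```python
import math

def sample_size_proportion(eps, delta):
    """Smallest n satisfying n >= 1/(4 eps^2 delta), inequality (5)."""
    return math.ceil(1 / (4 * eps**2 * delta))

def sample_size_mean(mu_lower, sigma2_upper, eps, delta):
    """Smallest n satisfying n >= (sigma^2/mu^2) / (eps^2 delta), inequality (6),
    given a lower bound on mu and an upper bound on sigma^2."""
    return math.ceil((sigma2_upper / mu_lower**2) / (eps**2 * delta))

print(sample_size_proportion(0.1, 0.05))          # 500, as computed in the text
# Wealth example: mu >= $20,000 and sigma^2 ~ 10^13, as discussed above.
print(sample_size_mean(20_000, 1e13, 0.1, 0.05))  # 50,000,000 samples!
```

With the wealth numbers from the text, the bound demands fifty million samples, which makes concrete why heavy-tailed quantities are so hard to estimate by uniform sampling.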

As a further example, suppose we are trying to estimate the average rate of emission from a radioactive source, and we are willing to assume that the emissions follow a Poisson distribution with some unknown parameter λ; of course, this λ is precisely the expectation we are trying to estimate. Now in this case we have µ = λ and also σ² = λ (see the previous lecture note). So σ²/µ² = 1/λ. Thus in this case a sample size of n = 1/(λε²δ) suffices. (Again, in practice we would use a lower bound on λ.)

The Law of Large Numbers

The estimation method we used in the previous two sections is based on a principle that we accept as part of everyday life: namely, the Law of Large Numbers (LLN). This asserts that, if we observe some random variable many times, and take the average of the observations, then this average will converge to a single value, which is of course the expectation of the random variable. In other words, averaging tends to smooth out any large fluctuations, and the more averaging we do, the better the smoothing.

Theorem 17.3 (Law of Large Numbers): Let X_1, X_2, ..., X_n be i.i.d. random variables with common expectation µ = E(X_i). Define A_n = (1/n)∑_{i=1}^n X_i. Then for any α > 0, we have

    Pr[|A_n − µ| ≥ α] → 0   as n → ∞.

Proof: Let Var(X_i) = σ² be the common variance of the r.v.'s; we assume that σ² is finite.² With this (relatively mild) assumption, the LLN is an immediate consequence of Chebyshev's Inequality. For, as we have seen above, E(A_n) = µ and Var(A_n) = σ²/n, so by Chebyshev we have

    Pr[|A_n − µ| ≥ α] ≤ Var(A_n)/α² = σ²/(nα²) → 0   as n → ∞.

This completes the proof.

Notice that the LLN says that the probability of any deviation α from the mean, however small, tends to zero as the number of observations n in our average tends to infinity. Thus, by taking n large enough, we can make the probability of any given deviation as small as we like.

²If σ² is not finite, the LLN still holds, but the proof is much trickier.
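Finally, the LLN is easy to watch in action. The sketch below (our own illustration; a fair six-sided die with µ = 3.5 and an arbitrary deviation α = 0.1) estimates Pr[|A_n − µ| ≥ α] for growing n:

```python
import random

def lln_deviation_prob(n, alpha=0.1, trials=2_000):
    """Empirical Pr[|A_n - mu| >= alpha], where A_n averages n fair die rolls."""
    mu = 3.5  # expectation of one roll of a fair six-sided die
    bad = 0
    for _ in range(trials):
        a_n = sum(random.randint(1, 6) for _ in range(n)) / n
        if abs(a_n - mu) >= alpha:
            bad += 1
    return bad / trials

for n in (10, 100, 1000, 10_000):
    print(f"n={n:6d}  Pr[|A_n - 3.5| >= 0.1] ~ {lln_deviation_prob(n):.3f}")
```

The printed probabilities fall toward zero as n grows, exactly as Theorem 17.3 predicts; Chebyshev gives the rate σ²/(nα²), with σ² = 35/12 for a fair die.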