Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17


EECS 70 Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

I.I.D. Random Variables

Estimating the bias of a coin

Question: We want to estimate the proportion p of Democrats in the US population, by taking a small random sample. How large does our sample have to be to guarantee that our estimate will be within (say) an additive factor of 0.1 of the true value with probability at least 0.95?

This is perhaps the most basic statistical estimation problem, and shows up everywhere. We will develop a simple solution that uses only Chebyshev's inequality. More refined methods can be used to get sharper results.

Let's denote the size of our sample by n (to be determined), and the number of Democrats in it by the random variable S_n. (The subscript n just reminds us that the r.v. depends on the size of the sample.) Then our estimate will be the value A_n = (1/n)S_n.

Now, as has often been the case, we will find it helpful to write S_n = X_1 + X_2 + ... + X_n, where

X_i = 1 if person i in the sample is a Democrat, and X_i = 0 otherwise.

Note that each X_i can be viewed as a coin toss, with Heads probability p (though of course we do not know the value of p!). And the coin tosses are independent.¹ We call such a family of random variables independent, identically distributed, or i.i.d. for short. (For a precise definition of independent random variables, see the next lecture note; for now we work with the intuitive meaning that knowing the value of any subset of the r.v.'s does not change the distribution of the others.)

What is the expectation of our estimate?

E(A_n) = E((1/n)S_n) = (1/n)E(X_1 + X_2 + ... + X_n) = (1/n)(np) = p.

So for any value of n, our estimate will always have the correct expectation p. [Such an r.v. is often called an unbiased estimator of p.] Now presumably, as we increase our sample size n, our estimate should get more and more accurate. This will show up in the fact that the variance decreases with n: i.e., as n increases, the probability that we are far from the mean p will get smaller.

To see this, we need to compute Var(A_n). But A_n = (1/n) ∑_{i=1}^{n} X_i, which is just a multiple of a sum of independent random variables.
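Both facts can be seen in a quick simulation sketch before we do any algebra. The true value p = 0.45 below is an assumed placeholder, purely for illustration (in practice p is unknown): the estimate A_n stays centered at p for every n, while its spread shrinks as n grows.

```python
import random

def estimate(p, n, trials=2000, seed=1):
    """Simulate A_n = S_n / n over many trials; return the empirical mean and variance of A_n."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        s_n = sum(1 for _ in range(n) if rng.random() < p)  # S_n = X_1 + ... + X_n
        samples.append(s_n / n)                             # A_n = S_n / n
    mean = sum(samples) / trials
    var = sum((a - mean) ** 2 for a in samples) / trials
    return mean, var

p = 0.45  # assumed true proportion (unknown in practice)
for n in (10, 100, 1000):
    mean, var = estimate(p, n)
    print(f"n={n:5d}  E(A_n) ~ {mean:.3f}  Var(A_n) ~ {var:.5f}  p(1-p)/n = {p*(1-p)/n:.5f}")
```

The empirical mean column stays near p at every sample size, while the empirical variance tracks p(1 − p)/n, the value we are about to derive.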
Theorem 17.1: For any random variable X and constant c, we have Var(cX) = c²Var(X).

¹ We are assuming here that the sampling is done "with replacement"; i.e., we select each person in the sample from the entire population, including those we have already picked. So there is a small chance that we will pick the same person twice.

And for independent random variables X, Y, we have Var(X + Y) = Var(X) + Var(Y).

Before we prove this theorem, let us formalize something we have been assuming implicitly for some time:

Joint Distributions

Consider two random variables X and Y defined on the same probability space. By linearity of expectation, we know that E(X + Y) = E(X) + E(Y). Since E(X) can be calculated if we know the distribution of X, and E(Y) can be calculated if we know the distribution of Y, this means that E(X + Y) can be computed knowing only the two individual distributions. No information is needed about the relationship between X and Y. This is not true if we need to compute, say, E((X + Y)²), e.g. as when we computed the variance of a binomial r.v. This is because E((X + Y)²) = E(X²) + 2E(XY) + E(Y²), and E(XY) depends on the relationship between X and Y. How can we capture such a relationship?

Recall that the distribution of a single random variable X is the collection of the probabilities of all events X = a, for all possible values of a that X can take on. When we have two random variables X and Y, we can think of (X, Y) as a "two-dimensional" random variable, in which case the events of interest are X = a ∧ Y = b for all possible values of (a, b) that (X, Y) can take on. Thus, a natural generalization of the notion of distribution to multiple random variables is the following.

Definition 17.1 (joint distribution): The joint distribution of two discrete random variables X and Y is the collection of values {(a, b, Pr[X = a ∧ Y = b]) : (a, b) ∈ A × B}, where A and B are the sets of all possible values taken by X and Y respectively.

This notion obviously generalizes to three or more random variables. Since we will write Pr[X = a ∧ Y = b] quite often, we will abbreviate it to Pr[X = a, Y = b]. Just like the distribution of a single random variable, the joint distribution is normalized, i.e.

∑_{a ∈ A} ∑_{b ∈ B} Pr[X = a, Y = b] = 1.

This follows from noticing that the events X = a ∧ Y = b, for a ∈ A and b ∈ B, partition the sample space.

Independent Random Variables

Independence for random variables is defined in an analogous fashion to independence for events:

Definition 17.2 (independent r.v.'s): Random variables X and Y on the same probability space are said to be independent if the events X = a and Y = b are independent for all values a, b. Equivalently, the joint distribution of independent r.v.'s decomposes as

Pr[X = a, Y = b] = Pr[X = a] Pr[Y = b]  for all a, b.

Note that for independent r.v.'s, the joint distribution is fully specified by the marginal distributions. Mutual independence of more than two r.v.'s is defined similarly.

A very important example of independent r.v.'s is indicator r.v.'s for independent events. Thus, for example, if {X_i} are indicator r.v.'s for the ith toss of a coin being Heads, then the X_i are mutually independent r.v.'s.

We saw that the expectation of a sum of r.v.'s is the sum of the expectations of the individual r.v.'s. This is not true in general for variance. However, as the above theorem states, this is true if the random variables are independent. To see this, first we look at the expectation of a product of independent r.v.'s (which is a quantity that frequently shows up in variance calculations, as we have seen).

Theorem 17.2: For independent random variables X, Y, we have E(XY) = E(X)E(Y).

Proof: We have

E(XY) = ∑_a ∑_b ab · Pr[X = a, Y = b]
      = ∑_a ∑_b ab · Pr[X = a] Pr[Y = b]
      = (∑_a a · Pr[X = a]) · (∑_b b · Pr[Y = b])
      = E(X) E(Y),

as required. In the second line here we made crucial use of independence. For example, this theorem would have allowed us to conclude immediately in our random walk example at the beginning of Lecture Note 16 that E(X_i X_j) = E(X_i)E(X_j) = 0, without the need for a calculation.

We now use the above theorem to conclude the nice property of the variance of independent random variables stated in the theorem above, namely that for independent random variables X and Y, Var(X + Y) = Var(X) + Var(Y):

Proof: From the alternative formula for variance in Theorem 16.1, we have, using linearity of expectation extensively,

Var(X + Y) = E((X + Y)²) − E(X + Y)²
           = E(X²) + E(Y²) + 2E(XY) − (E(X) + E(Y))²
           = (E(X²) − E(X)²) + (E(Y²) − E(Y)²) + 2(E(XY) − E(X)E(Y))
           = Var(X) + Var(Y) + 2(E(XY) − E(X)E(Y)).

Now because X, Y are independent, by Theorem 17.2 the final term in this expression is zero. Hence we get our result.

Note: The expression E(XY) − E(X)E(Y) appearing in the above proof is called the covariance of X and Y, and is a measure of the dependence between X and Y. It is zero when X, Y are independent.

It is very important to remember that neither of these two results is true in general, without the assumption that X, Y are independent. As a simple example, note that even for a 0-1 r.v. X with Pr[X = 1] = p, E(X²) = p is not equal to E(X)² = p² (because of course X and X are not independent!). Note also that the theorem does not quite say that variance is linear for independent random variables: it says only that variances sum. It is not true that Var(cX) = c·Var(X) for a constant c; rather, Var(cX) = c²Var(X). The proof is left as a straightforward exercise.

We now return to our example of estimating the proportion of Democrats, where we were about to compute Var(A_n):

Var(A_n) = Var((1/n) ∑_{i=1}^{n} X_i) = (1/n)² Var(∑_{i=1}^{n} X_i) = (1/n)² ∑_{i=1}^{n} Var(X_i) = (1/n)² · nσ² = σ²/n,

where we have written σ² for the variance of each of the X_i. So we see that the variance of A_n decreases linearly with n.
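All three facts above, namely E(XY) = E(X)E(Y), Var(X + Y) = Var(X) + Var(Y) for independent X, Y, and Var(cX) = c²Var(X), can be checked exactly (no sampling) by enumerating a small joint distribution built as a product of marginals. The particular marginals below are illustrative choices, not from the note.

```python
from fractions import Fraction as F
from itertools import product

# Two independent r.v.'s, given by their marginal distributions (illustrative choices).
X = {0: F(7, 10), 1: F(3, 10)}                # Pr[X = x]
Y = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}      # Pr[Y = y]

# Joint distribution of independent r.v.'s: Pr[X = x, Y = y] = Pr[X = x] * Pr[Y = y].
joint = {(x, y): px * py for (x, px), (y, py) in product(X.items(), Y.items())}

def E(f):
    """Exact expectation of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def Var(f):
    return E(lambda x, y: f(x, y) ** 2) - E(f) ** 2

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
assert E(lambda x, y: x * y) == EX * EY                                       # Theorem 17.2
assert Var(lambda x, y: x + y) == Var(lambda x, y: x) + Var(lambda x, y: y)   # variances sum
c = 5
assert Var(lambda x, y: c * x) == c ** 2 * Var(lambda x, y: x)                # Theorem 17.1
print("E(XY) =", E(lambda x, y: x * y), "and E(X)E(Y) =", EX * EY)
```

Using exact rational arithmetic (Fraction) makes the equalities hold on the nose; replacing the product joint distribution with a dependent one would break the first two assertions, which is exactly the covariance term at work.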
This fact ensures that, as we take larger and larger sample sizes n, the probability that we deviate much from the expectation p gets smaller and smaller.

Let's now use Chebyshev's inequality to figure out how large n has to be to ensure a specified accuracy in our estimate of the proportion of Democrats p. A natural way to measure this is for us to specify two parameters, ε and δ, both in the range (0, 1). The parameter ε controls the error we are prepared to tolerate in our estimate, and δ controls the confidence we want to have in our estimate. A more precise version of our original question is then the following:

Question: For the Democrat-estimation problem above, how large does the sample size n have to be in order to ensure that Pr[|A_n − p| ≥ ε] ≤ δ?

In our original question, we had ε = 0.1 and δ = 0.05.

Let's apply Chebyshev's inequality to answer our more precise question above. Since we know Var(A_n), this will be quite simple. From Chebyshev's inequality, we have

Pr[|A_n − p| ≥ ε] ≤ Var(A_n)/ε² = σ²/(nε²).

To make this less than the desired value δ, we need to set

n ≥ σ²/(ε²δ). (1)

Now recall that σ² = Var(X_i) is the variance of a single sample X_i. So, since X_i is a 0/1-valued r.v., we have σ² = p(1 − p), and inequality (1) becomes

n ≥ p(1 − p)/(ε²δ). (2)

Since p(1 − p) takes on its maximum value 1/4 at p = 1/2, we can conclude that it is sufficient to choose n such that

n ≥ 1/(4ε²δ). (3)

Plugging in ε = 0.1 and δ = 0.05, we see that a sample size of n = 500 is sufficient. Notice that the size of the sample is independent of the total size of the population! This is how polls can accurately estimate quantities of interest for a population of several hundred million while sampling only a very small number of people.

Estimating a general expectation

What if we wanted to estimate something a little more complex than the proportion of Democrats in the population, such as the average wealth of people in the US? Then we could use exactly the same scheme as above, except that now the r.v. X_i is the wealth of the ith person in our sample. Clearly E(X_i) = µ, the average wealth (which is what we are trying to estimate). And our estimate will again be A_n = (1/n) ∑_{i=1}^{n} X_i, for a suitably chosen sample size n. Once again the X_i are i.i.d. random variables, so we again have E(A_n) = µ and Var(A_n) = σ²/n, where σ² = Var(X_i) is the variance of the X_i.
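The Democrat-estimation bound (3) and its guarantee can be sketched numerically: compute n = ⌈1/(4ε²δ)⌉, then estimate by simulation (with an assumed true p, here the worst case p = 0.5) how often |A_n − p| ≥ ε actually occurs. Chebyshev promises at most δ; in practice the observed frequency is far smaller, since Chebyshev is a conservative bound.

```python
import math
import random

def sample_size(eps, delta):
    """Smallest integer n with n >= 1 / (4 * eps^2 * delta), from bound (3)."""
    return math.ceil(1 / (4 * eps ** 2 * delta))

def deviation_rate(p, n, eps, trials=5000, seed=2):
    """Empirical fraction of trials in which |A_n - p| >= eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        a_n = sum(1 for _ in range(n) if rng.random() < p) / n
        if abs(a_n - p) >= eps:
            bad += 1
    return bad / trials

eps, delta = 0.1, 0.05
n = sample_size(eps, delta)
print("required n:", n)                        # 500, matching the note
rate = deviation_rate(0.5, n, eps)             # p = 0.5 maximizes the variance
print("observed Pr[|A_n - p| >= eps]:", rate)  # guaranteed to be at most delta = 0.05
```

With n = 500 and ε = 0.1, the deviation is 4.5 standard deviations of A_n, so the observed rate is essentially zero; the Chebyshev guarantee of δ = 0.05 is comfortably met.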
(Recall that the only facts we used about the X_i were that they were independent and had the same distribution; actually, the same expectation and variance would be enough: why?)

This time, however, since we do not have any a priori bound on the mean µ, it makes more sense to let ε be the relative error; i.e., we wish to find an estimate that is within an additive error of εµ. Using equation (1), but substituting εµ in place of ε, it is enough for the sample size n to satisfy

n ≥ (σ²/µ²) · 1/(ε²δ). (4)
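A small helper turns bound (4) into a concrete number, given whatever bounds on µ and σ² one is prepared to assume. The figures below are illustrative placeholders, not data from the note.

```python
import math

def sample_size_relative(mu_lower, var_upper, eps, delta):
    """Sample size from bound (4): n >= (sigma^2 / mu^2) * 1 / (eps^2 * delta).

    mu_lower:  assumed lower bound on the mean mu
    var_upper: assumed upper bound on the variance sigma^2
    eps:       relative error; delta: failure probability
    """
    return math.ceil((var_upper / mu_lower ** 2) / (eps ** 2 * delta))

# Illustrative: mean at least 20000, variance at most 1e13 (a heavy-tailed
# quantity like wealth), 10% relative error with 95% confidence.
n = sample_size_relative(20_000, 1e13, 0.1, 0.05)
print(n)
```

With these assumed bounds the formula demands n = 5 × 10⁷ samples: heavy-tailed quantities force enormous sample sizes under simple uniform sampling, which is exactly the difficulty discussed next.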

In equation (4), ε and δ are the desired relative error and confidence respectively. Now of course we don't know the other two quantities, µ and σ², appearing in equation (4). In practice, we would use a lower bound on µ and an upper bound on σ² (just as we used a lower bound on p in the Democrats problem). Plugging these bounds into equation (4) will ensure that our sample size is large enough.

For example, in the average wealth problem we could probably safely take µ to be at least (say) $20k (probably more). However, the existence of people such as Bill Gates means that we would need to take a very high value for the variance σ². Indeed, if there is at least one individual with wealth $50 billion, then assuming a relatively small value of µ means that the variance must be at least about (50 × 10⁹)²/(250 × 10⁶) = 10¹³. (Check this.) There is really no way around this problem with simple uniform sampling: the uneven distribution of wealth means that the variance is inherently very large, and we will need a huge number of samples before we are likely to find anybody who is immensely wealthy. But if we don't include such people in our sample, then our estimate will be way too low.

As a further example, suppose we are trying to estimate the average rate of emission from a radioactive source, and we are willing to assume that the emissions follow a Poisson distribution with some unknown parameter λ; of course, this λ is precisely the expectation we are trying to estimate. Now in this case we have µ = λ and also σ² = λ (see the previous lecture note). So σ²/µ² = 1/λ, and thus a sample size of n = 1/(λε²δ) suffices. (Again, in practice we would use a lower bound on λ.)

The Law of Large Numbers

The estimation method we used in the previous two sections is based on a principle that we accept as part of everyday life: namely, the Law of Large Numbers (LLN). This asserts that, if we observe some random variable many times, and take the average of the observations, then this average will converge to a single value, which is of course the expectation of the random variable.
In other words, averaging tends to smooth out any large fluctuations, and the more averaging we do the better the smoothing.

Theorem 17.3 (Law of Large Numbers): Let X_1, X_2, ..., X_n be i.i.d. random variables with common expectation µ = E(X_i). Define A_n = (1/n) ∑_{i=1}^{n} X_i. Then for any α > 0, we have

Pr[|A_n − µ| ≥ α] → 0 as n → ∞.

Proof: Let Var(X_i) = σ² be the common variance of the r.v.'s; we assume that σ² is finite.² With this (relatively mild) assumption, the LLN is an immediate consequence of Chebyshev's inequality. For, as we have seen above, E(A_n) = µ and Var(A_n) = σ²/n, so by Chebyshev we have

Pr[|A_n − µ| ≥ α] ≤ Var(A_n)/α² = σ²/(nα²) → 0 as n → ∞.

This completes the proof.

Notice that the LLN says that the probability of any deviation α from the mean, however small, tends to zero as the number of observations n in our average tends to infinity. Thus by taking n large enough, we can make the probability of any given deviation as small as we like.

² If σ² is not finite, the LLN still holds but the proof is much trickier.
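The statement of the LLN can be watched happening in a simulation sketch: fix a deviation α and observe the empirical value of Pr[|A_n − µ| ≥ α] fall as n grows. The coin flips below use an assumed bias of 0.5 (so µ = 0.5 and σ² = 1/4), purely for illustration.

```python
import random

def deviation_prob(n, alpha, mu=0.5, trials=4000, seed=3):
    """Empirical Pr[|A_n - mu| >= alpha] where A_n averages n Bernoulli(mu) flips."""
    rng = random.Random(seed)
    bad = sum(
        1
        for _ in range(trials)
        if abs(sum(rng.random() < mu for _ in range(n)) / n - mu) >= alpha
    )
    return bad / trials

alpha = 0.05
for n in (10, 100, 1000):
    p_dev = deviation_prob(n, alpha)
    bound = 0.25 / (n * alpha ** 2)  # Chebyshev: sigma^2 / (n alpha^2), sigma^2 = 1/4
    print(f"n={n:5d}  Pr[|A_n - mu| >= alpha] ~ {p_dev:.3f}  Chebyshev bound = {bound:.3f}")
```

The empirical deviation probability drops steadily with n and always sits below the Chebyshev bound (which is vacuous, i.e. above 1, for small n); this is exactly the convergence the theorem asserts.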