Recursive Computations for Discrete Random Variables

Oftentimes, one sees a problem stated like this:

A machine is shut down for repairs if a random sample of 100 items selected from the daily output of the machine reveals at least 15% defectives. If on a given day the machine is producing only 10% defectives, what is the probability that it will be shut down?

Although there is at least some level of approximation involved, it would be appropriate to consider this an instance of the binomial distribution with $n = 100$ and $p = 0.1$, and we are asked for the probability that $Y \ge 15$, which is a cumulative (or reverse cumulative) probability question.

In fact, I found this problem in a later section of our textbook, a section devoted to approximating the binomial distribution by the normal distribution. We're expected to work out how many standard deviations away from the mean that is and then look up the approximate cumulative probability on a table of the normal distribution.

Another possible way to approach this is to find a table of cumulative probabilities for binomial distributions. As it happens, the endpapers of our textbook include a normal probability table and a number of tables of binomial distributions.

But that isn't really the best way to do this. We can't hope to have enough tables of binomial distributions to cover all of the situations that we might encounter, and we often don't want to tie ourselves down to a bulky collection of tables. And while the normal approximation can be quite useful, it is still an approximation, and it becomes increasingly troublesome as an approximation the further we move away from the mean.

For modest values of $n$, and with modest computational means available to ourselves, we should be answering such questions directly by computing the probability function $p(y) = P(Y = y)$ for all of the values of $y$ from 0 to $n$, and then adding up the ones that we need. The most efficient computation will be recursive, and to compute the binomial probabilities, we don't ever have to directly compute a binomial coefficient.
Once we get started, we can write $P(Y = y)$ in terms of $P(Y = y - 1)$ by multiplying by two numbers and dividing by two numbers. Given that, we can quickly and efficiently compute the numbers we need.

The recursive procedures I'm describing can be used with a simple hand calculator and a piece of paper on which we write down our results. But a spreadsheet program (for instance, Excel) has a particular affinity for recursive computations, and this description has been written with a spreadsheet in mind, and is accompanied by an Excel file that executes these ideas. Of course, the same ideas can be executed in a programming language or in an environment such as MATLAB. I have not pursued those lines, but the reader is certainly free to do so.

Also, once we have found a way to compute the binomial distribution, we see that we can use the exact same ideas for the other named distributions in the current chapter of our textbook: the Poisson distribution, the hypergeometric distribution, the geometric distribution, and the negative binomial distribution.

Postscript to the problem stated in the first paragraph: using the normal approximation to the binomial and a table of normal probabilities gives us a probability that the machine will be shut down of about 0.0668. The tables of binomial probabilities in our book don't include any $n = 100$ cases, so we can't use those. But we can use the recursive computation given in these notes (using the accompanying Excel file) to find that for $n = 100$, $p = 0.1$, we get $P(Y \le 14) \approx 0.927427$, which by complementation gives us $P(Y \ge 15) \approx 0.072573$. This is the accurate computation, and the normal approximation is less accurate.

1. The binomial distribution

The binomial distribution with $n$ trials and with $p$ the probability of success on any one trial has probability function
$$P(Y = y) = \binom{n}{y} p^y (1 - p)^{n - y}.$$
We will write $q = 1 - p$ and shorten that to
$$P(Y = y) = \binom{n}{y} p^y q^{n - y}.$$

We need a computation concerning binomial coefficients that we will use both here and in later sections:
$$\binom{n}{y} = \frac{n - y + 1}{y} \binom{n}{y - 1} \tag{1}$$
We can write
$$\binom{n}{y} = \frac{n (n - 1)(n - 2) \cdots (n - y + 1)}{1 \cdot 2 \cdots y}.$$
Leaving off the last factor in both numerator and denominator will give us $\binom{n}{y - 1}$, hence equation (1).

That gives us our recursive step. For $y \ge 1$,
$$\frac{P(Y = y)}{P(Y = y - 1)} = \frac{\binom{n}{y} p^y q^{n - y}}{\binom{n}{y - 1} p^{y - 1} q^{n - y + 1}} = \frac{n - y + 1}{y} \cdot \frac{p}{q}.$$
That's enough for the recursive scheme:

Binomial distribution initiation:
$$P(Y = 0) = q^n \tag{2}$$
Binomial distribution recursion: for $y \ge 1$,
$$P(Y = y) = P(Y = y - 1) \cdot \frac{n - y + 1}{y} \cdot \frac{p}{q} \tag{3}$$

If we continue this scheme too far, which is to say for $y > n$, no harm comes of it. Note the numerator of the fraction in the recursive step, which gives $P(Y = n + 1) = 0$ and thus $P(Y = y) = 0$ by recursion for all $y > n$. Wasteful, perhaps, but it doesn't cause any trouble.

We can compute the cumulative distribution function $P(Y \le y)$ just by keeping a running sum of the probabilities we are computing.

Are there limitations to this? Yes, there are some practical limitations. One is simply imposed by storage space. If we try to do these computations in a spreadsheet in the particular way I set it up, then we need a spreadsheet file with at least $n$ rows. We can probably manage a few thousand rows, but beyond that we're straining practical file sizes.

A more serious limitation is numerical, and arises in the initiation step. For large $n$, $q^n$ can be a very small number. The danger is that $q^n$ could underflow, which means that it could become too small to be represented in the floating point number system implemented in the particular calculator, spreadsheet, or programming language that we are using. For the numbers as implemented in Excel (and in many other places: look up IEEE 754 standard, binary64, often called double precision, if you want more information), if $p = q = \frac{1}{2}$, then $q^n$ won't underflow until $n$ gets to be a little greater than 1000. If $p > \frac{1}{2}$, then $q^n$ would underflow for smaller values of $n$. But we can get around that by interchanging the roles of $p$ and $q$, redefining success as failure and failure as success. So in most cases, we can make this work for $n < 1000$.
We could use larger $n$ with small enough $p$, but eventually, as $p$ gets very small, a different numerical problem arises: lack of accuracy (loss of significant digits) in computing $q = 1 - p$, leading to lack of accuracy in computing $q^n$. If that should happen to us, we would best be advised to approximate the resulting binomial random variable by a Poisson random variable with the same mean.
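As an illustration outside the spreadsheet, the binomial initiation and recursion can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names `binomial_pmf` and `running_sum` are made up for this example and are not part of the accompanying Excel file, and the code assumes $0 \le p < 1$ so that dividing by $q$ is safe.

```python
def binomial_pmf(n, p):
    """Return [P(Y=0), ..., P(Y=n)] for a binomial(n, p) variable.

    Hypothetical helper: built from the initiation q**n and the
    recursive step, never computing a binomial coefficient directly.
    Assumes 0 <= p < 1, so q = 1 - p is nonzero.
    """
    q = 1.0 - p
    probs = [q ** n]  # initiation: P(Y = 0) = q^n
    for y in range(1, n + 1):
        # recursion: multiply by (n - y + 1) and p, divide by y and q
        probs.append(probs[-1] * (n - y + 1) * p / (y * q))
    return probs


def running_sum(probs):
    """Return the cumulative sums P(Y <= y) of a list of probabilities."""
    total = 0.0
    cdf = []
    for value in probs:
        total += value
        cdf.append(total)
    return cdf


# The machine example from the opening paragraph: n = 100, p = 0.1.
pmf = binomial_pmf(100, 0.1)
cdf = running_sum(pmf)
print(f"P(Y <= 14) = {cdf[14]:.6f}")
print(f"P(Y >= 15) = {1 - cdf[14]:.6f}")
```

With $n = 100$ and $p = 0.1$ this should agree with the figures quoted in the postscript, about 0.927427 and 0.072573, up to rounding.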

2. The Poisson distribution

A Poisson random variable with mean $\lambda$ has probability function
$$P(Y = y) = \frac{e^{-\lambda} \lambda^y}{y!}.$$
This leads to a particularly simple form for the recursion: for $y \ge 1$,
$$\frac{P(Y = y)}{P(Y = y - 1)} = \frac{e^{-\lambda} \lambda^y / y!}{e^{-\lambda} \lambda^{y - 1} / (y - 1)!} = \frac{\lambda}{y}.$$
We take advantage of this as follows:

Poisson distribution initiation:
$$P(Y = 0) = e^{-\lambda} \tag{4}$$
Poisson distribution recursion: for $y \ge 1$,
$$P(Y = y) = P(Y = y - 1) \cdot \frac{\lambda}{y} \tag{5}$$

One of the paths to the Poisson distribution is as the limit of a family of binomial distributions as $n \to \infty$ and $p \to 0$ in such a way that $np = \lambda$ remains constant. For the binomial distribution, we can write the quotient of $P(Y = y)$ and $P(Y = y - 1)$ as follows:
$$\frac{n - y + 1}{y} \cdot \frac{p}{q} = \frac{np - p(y - 1)}{y (1 - p)}.$$
But as $p \to 0$, that tends to $\frac{\lambda}{y}$.

This scheme can also suffer from underflow in the initiation step, but $e^{-\lambda}$ doesn't underflow until $\lambda > 700$, and for such a large $\lambda$, we'd need a few thousand rows of the spreadsheet anyway.

3. The hypergeometric distribution

The setup for the hypergeometric distribution is the most complicated of our common discrete distributions. We have a pool of $N$ objects, of which $r$ have a certain property and $N - r$ do not have that property. We choose, without replacement, a sample of $n$ of these objects, and $y$ is the count of those objects in the sample with the specified property. $P(Y = y)$ is positive when $y$ is an integer such that $\max(0, r + n - N) \le y \le \min(n, r)$.

We can draw parallels between the hypergeometric distribution and the binomial distribution. $n$ plays the same role in both places. $p$ corresponds to $\frac{r}{N}$, the proportion of the pool with the specified property, and $q$ corresponds to $\frac{N - r}{N}$, the proportion of the pool without the specified property. If we keep $n$ constant and let $N \to \infty$ while keeping $r = pN$, then the hypergeometric would tend to the binomial.

For $\max(0, r + n - N) \le y \le \min(n, r)$, the probability function is
$$P(Y = y) = \frac{\binom{r}{y} \binom{N - r}{n - y}}{\binom{N}{n}}.$$
We can work out the appropriate quotient with two uses of equation (1):
$$\frac{P(Y = y)}{P(Y = y - 1)} = \frac{\binom{r}{y} \binom{N - r}{n - y}}{\binom{r}{y - 1} \binom{N - r}{n - y + 1}} = \frac{r - y + 1}{y} \cdot \frac{n - y + 1}{N - r - n + y} = \frac{n - y + 1}{y} \cdot \frac{r - y + 1}{N - r - n + y}.$$
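The second of those two uses of equation (1) is the easier one to get backwards, so here is a sketch of that step. It applies the identity $\binom{m}{k} = \frac{m - k + 1}{k} \binom{m}{k - 1}$ with $m = N - r$ and $k = n - y + 1$ (the letters $m$ and $k$ are placeholders introduced only for this check):

```latex
\begin{align*}
\binom{N-r}{n-y+1}
  &= \frac{(N-r)-(n-y+1)+1}{n-y+1}\binom{N-r}{n-y}
   = \frac{N-r-n+y}{n-y+1}\binom{N-r}{n-y}, \\[4pt]
\frac{\binom{N-r}{n-y}}{\binom{N-r}{n-y+1}}
  &= \frac{n-y+1}{N-r-n+y}.
\end{align*}
```

The first use is the direct one: $\binom{r}{y} \big/ \binom{r}{y - 1} = \frac{r - y + 1}{y}$.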

We recognize the factor of $\frac{n - y + 1}{y}$ as being the same as one which appeared in the recursive formula for the binomial distribution. I will leave it to the reader to show that the factor of $\frac{r - y + 1}{N - r - n + y}$ tends to $\frac{p}{q}$ as $N \to \infty$ with $r = pN$.

Our recursive formula for the hypergeometric distribution has the most complicated initiation of all of these. It is also the only case in which, in building the spreadsheet, I cheated and used the built-in formula for binomial coefficients. The Excel syntax for $\binom{n}{k}$ is COMBIN(n, k).

Hypergeometric distribution initiation: if $r + n - N \le 0$, then
$$P(Y = 0) = \frac{\binom{N - r}{n}}{\binom{N}{n}} \tag{6}$$
else if $r + n - N > 0$, then
$$P(Y = r + n - N) = \frac{\binom{r}{N - n}}{\binom{N}{n}} \tag{7}$$
Hypergeometric distribution recursion: for $y > \max(0, r + n - N)$,
$$P(Y = y) = P(Y = y - 1) \cdot \frac{n - y + 1}{y} \cdot \frac{r - y + 1}{N - r - n + y} \tag{8}$$

There are limitations on this, in that the binomial coefficients used in the initiation will overflow for large enough values of the parameters. I assume that this is somewhat more sensitive to overflow or underflow than any of the other distributions in this note, but I have not systematically explored how hard we can push it.

4. The geometric distribution

Including the geometric distribution in this note is a little bit silly, as we can compute the probability function at a point and the cumulative probability at a point directly, with no need for recursion. We have a trial for which the probability of success (defined the same way as in the binomial distribution) is $p$. We repeat independent trials until success occurs, and the random variable is defined as the number of turns it took for that to happen. We must have $y \ge 1$, since we must take at least one turn. The probability function is $P(Y = y) = p q^{y - 1}$, and as for cumulative probability, we have $P(Y > y) = q^y$, so that $P(Y \le y) = 1 - q^y$. So we don't need recursive computations, but just for completeness, I'll include them anyway:

Geometric distribution initiation:
$$P(Y = 1) = p \tag{9}$$
Geometric distribution recursion: for $y \ge 2$,
$$P(Y = y) = P(Y = y - 1) \cdot q \tag{10}$$

5. The negative binomial distribution

The negative binomial distribution has the same setup as the geometric distribution, only this time, we count the number of trials needed to get $r$ successes, for some $r \ge 1$. If $r = 1$, then this is precisely the geometric distribution. Note that we must have $y \ge r$ to have a positive probability. The probability function for the negative binomial is
$$P(Y = y) = \binom{y - 1}{r - 1} p^r q^{y - r}.$$
I will leave the details of determining the recursive formula to the reader, and will quote the results:

Negative binomial distribution initiation:
$$P(Y = r) = p^r \tag{11}$$
Negative binomial distribution recursion: for $y \ge r + 1$,
$$P(Y = y) = P(Y = y - 1) \cdot \frac{y - 1}{y - r} \cdot q \tag{12}$$

6. Implementation in Excel

I have included an Excel spreadsheet implementing these ideas. It's very much a bare-bones file, with very little in the way of formatting and not many features.

There are five sheets to this file, one for each of our named distributions. You should be able to navigate via the tabs at the bottom. Each sheet has one to three cells set aside for you to enter the parameters of the particular distribution. Those cells have been outlined, and are the only cells outlined. Everything else is given by formulas that depend on those cells in some way. (I have pre-populated each sheet with parameters; they all happen to have mean 40. But the idea is that you should replace those values with other values that you happen to be interested in.)

Then, the probability function $P(Y = y)$ and the cumulative distribution function $P(Y \le y)$ are tabulated in a block of cells below. Possible values of $y$ are in column A, the probability function is in column B, and the cumulative distribution function is in column C. I've given you about 200 rows of that; if you need more rows, make a selection that includes the bottom of the table and extends downwards as far as you need it to, and then issue a Fill Down command.

The formulas in those cells are precisely the formulas in the numbered equations above. To understand them, you only need to know the distinction between a relative address and a fixed address in Excel.

I didn't include any graphs or charts. Feel free to add those yourself, as the means to do that are available in Excel. And you can make other changes. Do you want more digits of accuracy? Then increase the width of the appropriate columns and change the number format.
Do you want reversed cumulative probabilities? Then add a column subtracting the cumulative probabilities from 1. And so on.

Also, as I mentioned, all of this could be done in a programming language or other procedure-based environment (including Excel macros). If that is the world you are familiar with, you should be able to make the translation for yourself. For my own part, I want to show you how we can implement all of this in a spreadsheet without macros.
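For readers who do want to make that translation, here is one possible Python sketch of the Poisson, hypergeometric, geometric, and negative binomial schemes. It is an illustration, not a transcription of the Excel file: the function names, argument order, and return conventions are assumptions made for this example, and `math.comb` stands in for Excel's COMBIN in the hypergeometric initiation.

```python
from math import comb, exp


def poisson_pmf(lam, y_max):
    """List of P(Y = y) for y = 0..y_max, Poisson with mean lam."""
    probs = [exp(-lam)]  # initiation: P(Y = 0) = e^(-lambda)
    for y in range(1, y_max + 1):
        probs.append(probs[-1] * lam / y)  # recursion: multiply by lambda / y
    return probs


def hypergeometric_pmf(N, r, n):
    """Dict {y: P(Y = y)} over max(0, r + n - N) <= y <= min(n, r)."""
    y_min = max(0, r + n - N)
    # Initiation: the one place a binomial coefficient is computed directly.
    if y_min == 0:
        start = comb(N - r, n) / comb(N, n)
    else:
        start = comb(r, N - n) / comb(N, n)
    probs = {y_min: start}
    for y in range(y_min + 1, min(n, r) + 1):
        # recursion: the binomial-like factor times the pool-dependent factor
        probs[y] = probs[y - 1] * (n - y + 1) / y * (r - y + 1) / (N - r - n + y)
    return probs


def geometric_pmf(p, y_max):
    """List whose entry y - 1 is P(Y = y), for y = 1..y_max."""
    q = 1.0 - p
    probs = [p]  # initiation: P(Y = 1) = p
    for _ in range(2, y_max + 1):
        probs.append(probs[-1] * q)  # recursion: multiply by q
    return probs


def negative_binomial_pmf(r, p, y_max):
    """Dict {y: P(Y = y)} for y = r..y_max (trials needed for r successes)."""
    q = 1.0 - p
    probs = {r: p ** r}  # initiation: P(Y = r) = p^r
    for y in range(r + 1, y_max + 1):
        probs[y] = probs[y - 1] * (y - 1) / (y - r) * q  # recursion
    return probs


# Illustrative parameters only; each total should come out close to 1.
print(sum(poisson_pmf(40.0, 200)))
print(sum(hypergeometric_pmf(1000, 200, 200).values()))
print(sum(geometric_pmf(0.025, 2000)))
print(sum(negative_binomial_pmf(10, 0.25, 400).values()))
```

A cumulative column is then a running sum over each result, exactly as in column C of the spreadsheet. One design note: Python integers do not overflow, so the binomial coefficients in the hypergeometric initiation are exact here, which sidesteps the overflow concern that applies to the spreadsheet version.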