The Near-miss Birthday Problem
Gregory Quenell
Plattsburgh State
The Classic Birthday Problem

Assuming birthdays are uniformly distributed over 365 days, find $P(\text{at least one shared birthday})$ in a random sample of $n$ people.

Solution: Use complementation.

\[
P(\text{at least one shared birthday}) = 1 - P(\text{no shared birthday}) = 1 - P(n \text{ different birthdays})
\]
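Finding P(n different birthdays)

[Figure: the birthday of person 1 marked on the calendar, with the remaining 364 days bracketed.]

Fix the birthday of person 1 and place the other $n-1$ birthdays among the remaining days:

\[
P(n \text{ different birthdays})
= \frac{\text{number of ways to place } n-1 \text{ birthdays in 364 days with no collision}}{\text{total number of ways to place } n-1 \text{ birthdays in 365 days}}
\]
\[
= \frac{\binom{364}{n-1}\,(n-1)!}{365^{\,n-1}}
= \frac{364 \cdot 363 \cdots (364-(n-2))}{365 \cdot 365 \cdots 365}
= \frac{(364)_{n-1}}{365^{\,n-1}}
\]

Here $(364)_{n-1}$ denotes the falling factorial $364 \cdot 363 \cdots (364-(n-2))$, a product of $n-1$ factors.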
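The formula above is easy to evaluate exactly. The following is a minimal Python sketch, not part of the original slides; the function names `p_all_different` and `p_shared` and the `days` parameter are illustrative choices. It multiplies the $n-1$ factors $(365-j)/365$ using exact rational arithmetic.

```python
from fractions import Fraction


def p_all_different(n: int, days: int = 365) -> Fraction:
    """P(n different birthdays) = (days-1)_{n-1} / days^(n-1), computed exactly."""
    prob = Fraction(1)
    # Person 1 is fixed; the j-th additional person must avoid the j days already taken.
    for j in range(1, n):
        prob *= Fraction(days - j, days)
    return prob


def p_shared(n: int, days: int = 365) -> Fraction:
    """P(at least one shared birthday), by complementation."""
    return 1 - p_all_different(n, days)


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(float(p_shared(n)), 3))
```

Exact fractions are used only to avoid any question of rounding; ordinary floats would serve equally well at this scale.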
Some numbers

\[
P(\text{at least one shared birthday}) = 1 - \frac{(364)_{n-1}}{365^{\,n-1}}
\]

   n    P(shared birthday)
  10    0.117
  15    0.253
  20    0.411
  23    0.507
  25    0.569
  30    0.706
  35    0.814
  40    0.891
  45    0.941
  50    0.970

[Figure: plot of P(at least one shared birthday) against n, for n from 0 to 50.]
The Near-miss Birthday Problem

Assuming birthdays are uniformly distributed over 365 days, find

\[
P(\text{at least one pair of birthdays that are either coincident or adjacent})
\]

in a random sample of $n$ people.

Solution: Use complementation.

\[
P(\text{at least one near miss}) = 1 - P(\text{no near misses}) = 1 - P(n \text{ isolated birthdays})
\]
Finding P(n isolated birthdays)

[Figure: the birthday of person 1 marked on the calendar, with the remaining 364 days bracketed.]

\[
P(n \text{ isolated birthdays})
= \frac{\text{number of ways to place } n-1 \text{ birthdays in 364 days with no collision and no two birthdays adjacent}}{\text{total number of ways to place } n-1 \text{ birthdays in 365 days}}
\]

The denominator is still $365^{\,n-1}$.

How do we count the placements in the numerator?
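Counting isolated birthdays

[Figure: $n-1$ isolated birthdays in 364 days, separated by gaps of lengths $a_1, a_2, a_3, a_4, \ldots, a_{n-1}, a_n$.]

Every arrangement of $n-1$ isolated birthdays corresponds to a gap sequence $a_1, a_2, \ldots, a_n$ in which

\[
\begin{cases}
a_1 + a_2 + \cdots + a_n = 364 - (n-1) \\
a_i \ge 1 \text{ for all } i
\end{cases}
\]

The end gaps $a_1$ and $a_n$ separate the outermost birthdays from the birthday of person 1, so the calendar is treated as circular: December 31 and January 1 count as adjacent.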
Aside on counting

A sequence $a_1, a_2, \ldots, a_n$ of positive integers such that $a_1 + a_2 + \cdots + a_n = S$ is called an $n$-part composition of $S$.

Theorem: The number of $n$-part compositions of $S$ is $\binom{S-1}{n-1}$.

Proof: Write down a string of $S$ dots. Then there are $S-1$ inter-dot spaces.

[Figure: a string of dots divided by bars into groups, illustrating a composition with parts 3, 2, 4, 3.]

Placing bars in $n-1$ of these $S-1$ spaces determines an $n$-part composition of $S$, and conversely.
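The theorem can be checked by brute force for small values. The sketch below is an illustration added here, not taken from the slides; the function names are hypothetical. It enumerates every $n$-tuple of positive integers, keeps those that sum to $S$, and compares the count with $\binom{S-1}{n-1}$.

```python
from itertools import product
from math import comb


def count_compositions_brute(S: int, n: int) -> int:
    """Count n-tuples of positive integers summing to S by direct enumeration."""
    return sum(
        1
        for parts in product(range(1, S + 1), repeat=n)
        if sum(parts) == S
    )


def count_compositions(S: int, n: int) -> int:
    """Bars-in-gaps count: choose n-1 of the S-1 inter-dot spaces (requires n >= 1)."""
    return comb(S - 1, n - 1)


if __name__ == "__main__":
    for S in range(1, 9):
        for n in range(1, 5):
            assert count_compositions_brute(S, n) == count_compositions(S, n)
    print("brute-force counts agree with C(S-1, n-1)")
```

The enumeration grows like $S^n$, so it is only practical as a sanity check on small cases.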
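Application to birthdays

[Figure: $n-1$ isolated birthdays in 364 days, separated by gaps of lengths $a_1, a_2, a_3, a_4, \ldots, a_{n-1}, a_n$.]

The gap sequences are the $n$-part compositions of $364-(n-1)$, so there are

\[
\binom{[364-(n-1)]-1}{n-1} = \binom{364-n}{n-1}
\]

possible gap sequences.

Result:

\[
\left(\text{number of ways to place } n-1 \text{ isolated birthdays in 364 days}\right)
= \binom{364-n}{n-1}\,(n-1)! = (364-n)_{n-1}
\]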
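The count can be sanity-checked on a small calendar. The following Python sketch is an added illustration, not from the slides. It assumes a circular calendar of `d` days (a hypothetical parameter standing in for 365), fixes person 1 on day 0, and enumerates ordered placements of the other $n-1$ people. On a `d`-day calendar the same argument gives $\binom{d-1-n}{n-1}(n-1)!$, which reduces to the slide's formula when `d` is 365.

```python
from itertools import combinations, permutations
from math import comb, factorial


def is_isolated(days_taken, d: int) -> bool:
    """True if no two days coincide or are adjacent on a circular d-day calendar."""
    return all((a - b) % d not in (0, 1, d - 1) for a, b in combinations(days_taken, 2))


def count_isolated_brute(n: int, d: int) -> int:
    """Ordered placements of persons 2..n on days 1..d-1, with person 1 on day 0."""
    return sum(
        1
        for rest in permutations(range(1, d), n - 1)
        if is_isolated((0,) + rest, d)
    )


def count_isolated_formula(n: int, d: int) -> int:
    """Gap-sequence count C(d-1-n, n-1) times (n-1)! orderings (requires d-1-n >= 0)."""
    return comb(d - 1 - n, n - 1) * factorial(n - 1)


if __name__ == "__main__":
    print(count_isolated_brute(3, 12), count_isolated_formula(3, 12))
```

Using `permutations` skips placements in which two of the later people collide; those would be rejected by `is_isolated` anyway, so the count is unchanged.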
The near-miss birthday formula

\[
P(\text{no near miss}) = \frac{(364-n)_{n-1}}{365^{\,n-1}}
\]

The probability of at least one near miss in a random sample of $n$ people is

\[
1 - \frac{(364-n)_{n-1}}{365^{\,n-1}}.
\]

The least $n$ for which this probability exceeds 0.5 is $n = 14$:

\[
P(\text{at least one pair of coincident or adjacent birthdays in a random sample of 14 people}) \approx 0.537.
\]
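As a check on the threshold, here is a short Python sketch of the near-miss formula. It is an added illustration with hypothetical function names. The falling factorial $(364-n)_{n-1}$ is built one factor at a time; the `max(..., 0)` guard is an assumption added here so that the probability of no near miss is reported as zero once $n$ is too large for isolated birthdays to fit.

```python
from fractions import Fraction


def p_no_near_miss(n: int, days: int = 365) -> Fraction:
    """(days-1-n)_{n-1} / days^(n-1): probability that all n birthdays are isolated."""
    prob = Fraction(1)
    for j in range(n - 1):
        # Factors (days-1-n), (days-2-n), ...; clamp at 0 when isolation is impossible.
        prob *= Fraction(max(days - 1 - n - j, 0), days)
    return prob


def p_near_miss(n: int, days: int = 365) -> Fraction:
    """P(at least one pair of coincident or adjacent birthdays)."""
    return 1 - p_no_near_miss(n, days)


def least_n_exceeding(threshold: Fraction, days: int = 365) -> int:
    """Smallest n whose near-miss probability exceeds the threshold (threshold < 1)."""
    n = 1
    while p_near_miss(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    n = least_n_exceeding(Fraction(1, 2))
    print(n, float(p_near_miss(n)))
```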
More numbers

\[
P(\text{shared birthday}) = 1 - \frac{(364)_{n-1}}{365^{\,n-1}}
\qquad
P(\text{near miss}) = 1 - \frac{(364-n)_{n-1}}{365^{\,n-1}}
\]

   n    P(shared)    P(near miss)
  10    0.117        0.314
  14    0.223        0.537
  15    0.253        0.590
  20    0.411        0.804
  23    0.507        0.888
  25    0.569        0.926
  30    0.706        0.978
  35    0.814        0.995
  40    0.891        0.999
  45    0.941        1.000
  50    0.970        1.000

[Figure: plot of P(shared birthday) and P(near miss) against n, for n from 10 to 50.]
Birthdays shared by k or more people

[Figure: a strip of calendar days, each labelled with the number of people in the sample born on that day: 0 0 1 0 2 0 1 3 0 1 0 0 0 1 1 0 2 0 0 0 0 1 0 4 0 0 ...]

Let $(X_1, X_2, \ldots, X_{365})$ be a random vector in which $X_i$ is the number of people in a random sample of size $n$ who were born on day $i$. Then $(X_1, X_2, \ldots, X_{365})$ follows a multinomial distribution with $n$ trials, 365 bins, and equal probability $1/365$ for each bin. Thus

\[
P\big((X_1, X_2, \ldots, X_{365}) = (x_1, x_2, \ldots, x_{365})\big)
= \binom{n}{x_1\; x_2\; \cdots\; x_{365}} \left(\frac{1}{365}\right)^{n}
= \frac{1}{365^{n}} \cdot \frac{n!}{x_1!\, x_2! \cdots x_{365}!}
\]

We want $P(\max(X_1, X_2, \ldots, X_{365}) \ge k)$, the probability that some date is the birthday of $k$ or more people in the sample. Again, we use complementation:

\[
P(\max(X_1, X_2, \ldots, X_{365}) \ge k)
= 1 - P(\max(X_1, X_2, \ldots, X_{365}) \le k-1)
= 1 - P(X_i \le k-1 \text{ for all } i).
\]
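Finding P(X_i ≤ k−1) for i = 1, 2, ..., 365

We need

\[
\frac{n!}{365^{n}}
\underset{x_1 + x_2 + \cdots + x_{365} = n}{\sum_{x_1=0}^{k-1} \sum_{x_2=0}^{k-1} \cdots \sum_{x_{365}=0}^{k-1}}
\left( \frac{1}{x_1!} \cdot \frac{1}{x_2!} \cdots \frac{1}{x_{365}!} \right)
\]

Consider the product

\[
\underbrace{\left( \frac{1}{0!} + \frac{1}{1!} + \cdots + \frac{1}{(k-1)!} \right)}_{\text{factor for } x_1}
\underbrace{\left( \frac{1}{0!} + \frac{1}{1!} + \cdots + \frac{1}{(k-1)!} \right)}_{\text{factor for } x_2}
\cdots
\underbrace{\left( \frac{1}{0!} + \frac{1}{1!} + \cdots + \frac{1}{(k-1)!} \right)}_{\text{factor for } x_{365}}
\]

To pick out the terms with $x_1 + x_2 + \cdots + x_{365} = n$, introduce a tracer variable $\tau$:

\[
\left( \frac{\tau^0}{0!} + \frac{\tau^1}{1!} + \cdots + \frac{\tau^{k-1}}{(k-1)!} \right)
\left( \frac{\tau^0}{0!} + \frac{\tau^1}{1!} + \cdots + \frac{\tau^{k-1}}{(k-1)!} \right)
\cdots
\left( \frac{\tau^0}{0!} + \frac{\tau^1}{1!} + \cdots + \frac{\tau^{k-1}}{(k-1)!} \right)
\]

The coefficient of $\tau^n$ in this product of 365 factors is exactly the sum that we want.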
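The multiple-birthday formula

We have

\[
P(X_i \le k-1 \;\; \forall i)
= \frac{n!}{365^{n}} \cdot \text{coefficient of } \tau^n \text{ in }
\left( \frac{\tau^0}{0!} + \frac{\tau^1}{1!} + \cdots + \frac{\tau^{k-1}}{(k-1)!} \right)^{365}
\]

And so

\[
P(\max(X_i) \ge k)
= 1 - \frac{n!}{365^{n}} \, [\tau^n]
\left( \frac{\tau^0}{0!} + \frac{\tau^1}{1!} + \cdots + \frac{\tau^{k-1}}{(k-1)!} \right)^{365}
\]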
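The coefficient extraction can be carried out without a computer algebra system. The Python sketch below is an added illustration, not the method used in the slides; the function name and the `days` parameter are hypothetical. It tracks $w_d[m] = m!\,[\tau^m]\big(\sum_{j<k} \tau^j/j!\big)^d$, which is an integer: the number of ways to assign $m$ labelled people to $d$ days with at most $k-1$ per day. Multiplying in one more factor gives the recurrence $w_d[m] = \sum_{j=0}^{\min(k-1,\,m)} \binom{m}{j}\, w_{d-1}[m-j]$.

```python
from fractions import Fraction
from math import comb


def p_some_day_has_k(n: int, k: int, days: int = 365) -> Fraction:
    """P(max X_i >= k): some date is the birthday of k or more of the n people."""
    # ways[m] = m! * [tau^m] (sum_{j<k} tau^j / j!)^d, starting from d = 0.
    ways = [1] + [0] * n
    for _ in range(days):
        # Multiply by one more truncated-exponential factor (one more calendar day).
        ways = [
            sum(comb(m, j) * ways[m - j] for j in range(min(k - 1, m) + 1))
            for m in range(n + 1)
        ]
    # ways[n] / days^n is P(X_i <= k-1 for all i); take the complement.
    return 1 - Fraction(ways[n], days ** n)


if __name__ == "__main__":
    print(float(p_some_day_has_k(23, 2)))   # k = 2 is the classic birthday problem
    print(float(p_some_day_has_k(100, 6)))
```

Working with integers keeps the result exact; with `k = 2` the recurrence reproduces the classic shared-birthday probability.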
An example

In a random sample of 100 people, what's the probability that there are six (or more) who share a birthday?

It's

\[
1 - \frac{100!}{365^{100}} \, [\tau^{100}]
\left( \frac{\tau^0}{0!} + \frac{\tau^1}{1!} + \frac{\tau^2}{2!} + \frac{\tau^3}{3!} + \frac{\tau^4}{4!} + \frac{\tau^5}{5!} \right)^{365}
\]

Mathematica says the answer is

150 259 132 496 532 424 666 066 218 503 258 905 869 817 110 259 709 611 434 657 314 752 093 849 611 634 141 921 578 743 930 338 023 787 684 475 659 966 265 486 068 519 258 590 304 556 011 085 660 183 037 201 911 170 138 053 982 437 173 570 483 591 520 443 688 927 322 874 822 292 982 338 348 606 684 459 507 966 128 295 833 309
/
1 018 100 624 231 385 241 853 189 999 481 940 942 382 873 878 399 046 008 966 742 039 665 259 133 127 558 338 726 075 853 312 698 838 815 389 196 105 495 212 915 667 272 376 736 512 436 519 973 194 623 721 779 480 597 820 765 897 548 554 160 854 805 712 082 157 001 360 774 761 962 446 621 765 820 964 355 953 037 738 800 048 828 125

(This is about 0.0001476.)
Multiple birthday probabilities

[Figure: probabilities of at least one date in the calendar being the shared birthday of k people, for k = 3, 4, and 5, plotted against sample size n from 100 to 500.]