
So far we have discussed random number generators that must have the maximum-length period. But a maximum-length period is not enough. The increment sequence $X_{n+1} = (X_n + 1) \bmod m$ also has the maximum-length period, and it is by no means random. How, then, do we decide whether a generator is adequate as a random number generator?

Our eyes are a poor judge. We tend to reject sequences that seem non-random, such as those containing pairs of equal adjacent digits, yet even a truly random sequence starts to show apparent patterns after a while. Consider the digits of π, 3.14159265358979323846264338327950. The pair 26 repeats after a while, and its second appearance is in the middle of another pattern. In fact, Dr. Matrix found dozens of such properties in the digits of π. We also notice patterns in the numbers we use in everyday life, because patterns aid our memory. Human judgment is therefore not a good way to analyse random number generators, so we rely instead on a number of statistical tests. A generator that passes a number of tests might still fail the next one. Yet each success gives us more confidence in its randomness.

Joseph Cordina
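The claim that a full period says nothing about randomness can be checked directly. The Python sketch below uses hypothetical helper names (`period`, `sequence`). It measures the period of two maps on the states 0..15. The first is the increment sequence. The second is, for comparison, the linear congruential map $X_{n+1} = (5X_n + 3) \bmod 16$, whose parameters are chosen here purely for illustration. Both reach the maximum period of 16, so period length alone cannot tell them apart.

```python
from typing import Callable

M = 16  # modulus, i.e. size of the state space 0..M-1


def period(next_value: Callable[[int], int], seed: int) -> int:
    """Count steps until the sequence first returns to `seed`.

    Assumes the map is a permutation of 0..M-1 (true for both maps below),
    so the sequence is purely periodic and the loop terminates.
    """
    x = next_value(seed)
    steps = 1
    while x != seed:
        x = next_value(x)
        steps += 1
    return steps


def sequence(next_value: Callable[[int], int], seed: int, count: int) -> list[int]:
    """Return the first `count` values of the sequence starting at `seed`."""
    values = []
    x = seed
    for _ in range(count):
        values.append(x)
        x = next_value(x)
    return values


def increment(x: int) -> int:
    """X_{n+1} = (X_n + 1) mod m: full period, obviously not random."""
    return (x + 1) % M


def lcg(x: int) -> int:
    """Illustrative full-period LCG X_{n+1} = (5 X_n + 3) mod 16."""
    return (5 * x + 3) % M


print(period(increment, 0), period(lcg, 0))  # 16 16
print(sequence(increment, 0, 8))             # [0, 1, 2, 3, 4, 5, 6, 7]
print(sequence(lcg, 0, 8))                   # [0, 3, 2, 13, 4, 7, 6, 1]
```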

There are two kinds of tests. In empirical tests, a generated sequence is analysed and a statistical score is computed for it. In theoretical tests, properties of the sequence are derived from the recurrence rule used to form it.

Let us examine the most basic statistical test, the chi-square test (χ² test). When two fair dice are thrown, each total s occurs with the following probability:

value of s        2     3     4     5     6     7     8     9     10    11    12
probability p_s   1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

If we throw the dice n times, we should get the value s approximately $np_s$ times on average. For example, in 144 throws we expect the value 4 about $144 \times \tfrac{1}{12} = 12$ times. Suppose we throw the dice 144 times and record the outcomes:

value of s        2   3   4   5   6   7   8   9   10  11  12
observed Y_s      2   4   10  12  22  29  21  15  14  9   6
expected np_s     4   8   12  16  20  24  20  16  12  8   4

As expected, the observed counts differ from the expected ones. If all 144 throws came up s = 2, we would be convinced that the dice are not fair, even though such an outcome is still possible. Using these values, we can devise a probabilistic test that asks how probable a given set of throws is.
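As a quick check of the probability table, the following Python sketch enumerates all 36 equally likely outcomes of two fair dice. It is our own illustration, and the function name `two_dice_probabilities` is hypothetical. It uses exact fractions and then derives the expected counts $np_s$ for $n = 144$.

```python
from fractions import Fraction
from itertools import product


def two_dice_probabilities() -> dict[int, Fraction]:
    """Exact probability p_s of each total s = 2..12 for two fair dice."""
    probs: dict[int, Fraction] = {}
    for a, b in product(range(1, 7), repeat=2):   # all 36 equally likely pairs
        probs[a + b] = probs.get(a + b, Fraction(0)) + Fraction(1, 36)
    return dict(sorted(probs.items()))


p = two_dice_probabilities()
n = 144
expected = {s: n * ps for s, ps in p.items()}     # n * p_s for each total s

print([str(ps) for ps in p.values()])
# ['1/36', '1/18', '1/12', '1/9', '5/36', '1/6', '5/36', '1/9', '1/12', '1/18', '1/36']
print([int(e) for e in expected.values()])
# [4, 8, 12, 16, 20, 24, 20, 16, 12, 8, 4]
```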

One can look at the differences between the expected and the actual numbers of throws:

$$V = (Y_2 - np_2)^2 + (Y_3 - np_3)^2 + \cdots + (Y_{12} - np_{12})^2 \qquad (1)$$

A high value of V would indicate that something is wrong. However, V can still be high for a fair pair of dice, so we ask how probable a given value of V is. Equation (1) gives equal weight to every term, even though $(Y_7 - np_7)^2$ is likely to be larger than $(Y_2 - np_2)^2$, because s = 7 occurs far more often than s = 2. We therefore weight each term by its expected count:

$$V = \frac{(Y_2 - np_2)^2}{np_2} + \cdots + \frac{(Y_{12} - np_{12})^2}{np_{12}} \qquad (2)$$

This is known as the chi-square (χ²) statistic. For the experiment shown above, we find $V = 7\tfrac{7}{48} \approx 7.146$. Is this value high or low?
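Here is a minimal Python sketch of Eqs. (1) and (2), using the 144-throw data from the experiment above. The function names are hypothetical. `fractions.Fraction` keeps the arithmetic exact, so the weighted result can be compared with $7\tfrac{7}{48}$ directly.

```python
from fractions import Fraction

# Expected counts n*p_s for n = 144 throws (s = 2..12) and the observed
# counts Y_s from the experiment in the text.
expected = [4, 8, 12, 16, 20, 24, 20, 16, 12, 8, 4]
observed = [2, 4, 10, 12, 22, 29, 21, 15, 14, 9, 6]


def unweighted_v(obs, exp):
    """Eq. (1): plain sum of squared deviations, every category weighted equally."""
    return sum((y - e) ** 2 for y, e in zip(obs, exp))


def chi_square_v(obs, exp):
    """Eq. (2): each squared deviation divided by its expected count n*p_s."""
    return sum(Fraction((y - e) ** 2, e) for y, e in zip(obs, exp))


v = chi_square_v(observed, expected)
print(unweighted_v(observed, expected))  # 80
print(v, float(v))                       # prints 343/48 and its decimal value, about 7.146
```

The unweighted sum is dominated by the busy middle categories. For instance, the deviation of 5 at s = 7 contributes 25 out of 80. After weighting, the same deviation contributes only 25/24.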

In general, we take n independent observations. Let $p_s$ be the probability that each observation falls into category s, and let $Y_s$ be the number of observations that actually do fall into category s. Then

$$V = \sum_{s=1}^{k} \frac{(Y_s - np_s)^2}{np_s} \qquad (3)$$

In our previous example there were 11 possible outcomes, so k = 11. We expand $(Y_s - np_s)^2 = Y_s^2 - 2np_sY_s + n^2p_s^2$ and use

$$Y_1 + Y_2 + \cdots + Y_k = n, \qquad p_1 + p_2 + \cdots + p_k = 1.$$

This gives

$$V = \frac{1}{n}\sum_{s=1}^{k} \frac{Y_s^2}{p_s} - n \qquad (4)$$

But what is a reasonable value for V? We use tables of the chi-square distribution with ν degrees of freedom. These tables give, for various probabilities p, the value that V does not exceed with probability p. The value $\nu = k - 1$ should be used. As seen above, $Y_k$ can be calculated once all the other $Y_s$ are known, so only $k - 1$ of the values are independent.
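The algebra leading from Eq. (3) to Eq. (4) can be checked numerically. This sketch evaluates both forms on the dice data with exact fractions. The helper names `v_eq3` and `v_eq4` are hypothetical. The two forms agree, and the sketch also reports the degrees of freedom $k - 1$.

```python
from fractions import Fraction
from itertools import product

# Category probabilities p_s for the totals 2..12 of two fair dice, and the
# observed counts Y_s from the 144-throw experiment in the text.
probs = [Fraction(sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == s), 36)
         for s in range(2, 13)]
observed = [2, 4, 10, 12, 22, 29, 21, 15, 14, 9, 6]


def v_eq3(counts, ps):
    """Eq. (3): sum over categories of (Y_s - n p_s)^2 / (n p_s)."""
    n = sum(counts)
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, ps))


def v_eq4(counts, ps):
    """Eq. (4): (1/n) * sum of Y_s^2 / p_s, minus n."""
    n = sum(counts)
    return sum(Fraction(y * y) / p for y, p in zip(counts, ps)) / n - n


k = len(observed)
print(v_eq3(observed, probs), v_eq4(observed, probs))  # 343/48 343/48
print("degrees of freedom:", k - 1)                     # degrees of freedom: 10
```

Eq. (4) is convenient in practice because it needs only the running sum of $Y_s^2 / p_s$ and no per-category subtraction.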

If the table entry in row ν under column p is x, then the quantity V in Eq. (4) will be less than or equal to x with approximate probability p, provided n is large enough. For example, the entry in row 10 under 95 percent is 18.31, so we will have V > 18.31 only about 5 percent of the time.

Suppose two random number generators, tested with 10 degrees of freedom, give $V_1 = 29.5$ and $V_2 = 1.2$. The table shows that $V_1$ is far too high, since V exceeds 25.2 only about 0.5 percent of the time. On the other hand, $V_2$ is far too low, which means the observed counts are too close to the expected ones. Values that small occur less than 1 percent of the time, so these numbers cannot be considered random either. For the dice experiment above, $V = 7\tfrac{7}{48}$ falls between the 25 percent and 50 percent entries. This is a mid-range value and therefore satisfactory.

It is perhaps surprising that the same table is used regardless of the number of observations n and regardless of the probabilities $p_s$. Only ν matters. The table is, however, valid only for a large number of observations. A common rule of thumb is to take enough observations that each expected count $np_s$ is 5 or more.
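Without a printed table, the relevant probabilities can be computed directly in this case. For an even number of degrees of freedom $\nu = 2m$, the chi-square distribution function has the standard closed form $1 - e^{-x/2}\sum_{i=0}^{m-1} (x/2)^i / i!$. The sketch below relies on that identity. The name `chi_square_cdf_even` is hypothetical, and odd ν would need the incomplete gamma function, e.g. SciPy's `scipy.stats.chi2.cdf`. The sketch evaluates the 10-degree-of-freedom values discussed in the text. The value 1.2 is included as an illustrative very small V.

```python
import math


def chi_square_cdf_even(x: float, dof: int) -> float:
    """P(V <= x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form for dof = 2m:
        1 - exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive, even dof")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(dof // 2):
        total += term               # adds (x/2)^i / i!
        term *= half / (i + 1)      # next term (x/2)^(i+1) / (i+1)!
    return 1.0 - math.exp(-half) * total


for v in (1.2, 7 + 7 / 48, 18.31, 25.2, 29.5):
    print(f"V = {v:7.3f}   P(V <= x) = {chi_square_cdf_even(v, 10):.4f}")
```

The 95 percent and 99.5 percent points for ν = 10 (18.31 and 25.2) come out at roughly 0.95 and 0.995. The dice value $7\tfrac{7}{48}$ lands between 0.25 and 0.5.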

In fact, the proper choice of n, the number of observations, is somewhat obscure. Larger values of n make a global bias easier to detect. Yet they also smooth out locally non-random behaviour, such as a run of numbers biased one way followed by a run biased the opposite way. The best way to use the chi-square test is to run it at least three times, for example on different parts of the sequence. If at least two of the results are suspect, the generator is considered not sufficiently random.

Exercise: Apply the chi-square test to the random number generators mentioned previously.

The chi-square test applies when the observations fall into a finite number of categories. Yet there can be infinitely many categories, as when the observations are random real fractions. A general way of specifying a probability distribution is the distribution function F(x) of a random quantity X, where

$$F(x) = \Pr(X \le x) = \text{probability that } X \le x.$$

Figure 3 shows the distribution function for (a) a random bit, (b) a uniformly distributed random real number, and (c) the limiting distribution of the value V in the chi-square test.
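The advice to run the test at least three times can be sketched as follows. The choices below are ours, not the text's:

- The test is the two-dice chi-square test from the text (11 categories, so ν = 10).
- Each run uses a fresh stretch of 1440 simulated throws, so that every $np_s \ge 5$.
- A run counts as "suspect" when its V falls in either 5 percent tail.

The thresholds, the helper names, and `CyclingDie` (a deliberately bad generator) are all hypothetical illustrations.

```python
import math
import random
from itertools import product

# p_s for the totals s = 2..12 of two fair dice: 11 categories, so dof = 10.
DICE_P = [sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == s) / 36
          for s in range(2, 13)]


def chi_square_cdf_even(x, dof):
    """Closed-form chi-square CDF; valid only for a positive, even dof."""
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(dof // 2):
        total += term
        term *= half / (i + 1)
    return 1.0 - math.exp(-half) * total


def dice_v(rng, throws):
    """Throw two dice `throws` times via rng.randint and return V from Eq. (3)."""
    counts = [0] * 11
    for _ in range(throws):
        counts[rng.randint(1, 6) + rng.randint(1, 6) - 2] += 1
    return sum((y - throws * p) ** 2 / (throws * p) for y, p in zip(counts, DICE_P))


def is_suspect(v, dof=10, low=0.05, high=0.95):
    """Assumed rule: V is suspect if it lies in either 5 percent tail."""
    prob = chi_square_cdf_even(v, dof)
    return prob < low or prob > high


def generator_rejected(rng, throws=1440, runs=3):
    """Test `runs` consecutive stretches of output; reject if 2 or more are suspect."""
    suspects = sum(is_suspect(dice_v(rng, throws)) for _ in range(runs))
    return suspects >= 2


class CyclingDie:
    """Hypothetical bad generator: randint(a, b) cycles a, a+1, ..., b, a, ..."""

    def __init__(self):
        self.counter = 0

    def randint(self, a, b):
        value = a + self.counter % (b - a + 1)
        self.counter += 1
        return value


print(generator_rejected(random.Random(2024)))  # a decent PRNG is rejected only rarely
print(generator_rejected(CyclingDie()))          # True: only totals 3, 7, 11 occur
```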

If we make n independent observations of the random quantity X, obtaining $X_1, X_2, \ldots, X_n$, we can form the empirical distribution function

$$F_n(x) = \frac{\text{number of } X_1, X_2, \ldots, X_n \text{ that are } \le x}{n} \qquad (5)$$

Figure 4 shows examples of empirical distributions. With a finite number of samples, the empirical curve necessarily rises in jumps. The smooth curve is the true distribution function F(x). The Kolmogorov-Smirnov test (KS test) measures the difference between F(x) and $F_n(x)$. A bad random number generator gives an empirical distribution function that does not approximate F(x) sufficiently well. The KS test uses the two statistics

$$K_n^+ = \sqrt{n}\max_{-\infty<x<+\infty}\bigl(F_n(x) - F(x)\bigr) \qquad (6)$$

$$K_n^- = \sqrt{n}\max_{-\infty<x<+\infty}\bigl(F(x) - F_n(x)\bigr) \qquad (7)$$

$K_n^+$ measures the greatest deviation when $F_n$ is greater than F. $K_n^-$ measures the greatest deviation when $F_n$ is less than F.
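Eq. (5) translates into a few lines of Python. Sorting the samples once lets `bisect_right` count how many are ≤ x. The sketch also includes a deliberately crude grid approximation of Eqs. (6) and (7) to show the problem with taking a maximum over all real x. It assumes F is supported on [0, 1]. The names `empirical_cdf` and `ks_on_grid` are hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F_n as a function, following Eq. (5)."""
    xs = sorted(samples)
    n = len(xs)

    def f_n(x):
        # bisect_right gives the number of sorted samples that are <= x
        return bisect_right(xs, x) / n

    return f_n


def ks_on_grid(samples, cdf, steps=10_000):
    """Crude approximation of Eqs. (6) and (7) on a finite grid over [0, 1].

    Assumes F is supported on [0, 1]. A grid can only approach the true
    maxima, which occur at or just before the jumps of F_n.
    """
    f_n = empirical_cdf(samples)
    grid = [i / steps for i in range(steps + 1)]
    root_n = math.sqrt(len(samples))
    k_plus = root_n * max(f_n(x) - cdf(x) for x in grid)
    k_minus = root_n * max(cdf(x) - f_n(x) for x in grid)
    return k_plus, k_minus


f4 = empirical_cdf([0.25, 0.5, 0.5, 0.75])
print([f4(x) for x in (0.0, 0.25, 0.6, 1.0)])  # [0.0, 0.25, 0.75, 1.0]
print(ks_on_grid([0.1, 0.5, 0.9], lambda x: x))
```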

The statistics for Fig. 4 are:

                Fig. 4(a)   Fig. 4(b)   Fig. 4(c)
K_20^+          0.492       0.34        0.33
K_20^-          0.536       1.027       2.0

As with the chi-square test, we use tables to determine the probability of obtaining given values of K. For example, $K_{20}^+$ (or $K_{20}^-$) is 0.7975 or less with probability 75 percent. Unlike the chi-square table, which is only an approximation valid for large n, the KS table is exact for each number of observations n.

Equations (6) and (7) are not suitable for direct computer calculation, because the maximum is taken over infinitely many values of x. However, F(x) is increasing and $F_n(x)$ increases only in finite steps. We can therefore obtain the KS statistics with the following procedure:

1. Obtain independent observations $X_1, \ldots, X_n$.
2. Sort the observations into ascending order, so that $X_1 \le X_2 \le \cdots \le X_n$.
3. The desired statistics are then given by

$$K_n^+ = \sqrt{n}\max_{1 \le j \le n}\Bigl(\frac{j}{n} - F(X_j)\Bigr) \qquad (8)$$

$$K_n^- = \sqrt{n}\max_{1 \le j \le n}\Bigl(F(X_j) - \frac{j-1}{n}\Bigr) \qquad (9)$$
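The three-step procedure maps directly onto Python: sort, then take the two maxima of Eqs. (8) and (9) over the sorted sample. In this sketch `ks_statistics` and `uniform_cdf` are hypothetical names, and the hypothesised F is assumed to be continuous and non-decreasing. Unlike a grid search, the result is exact.

```python
import math
import random


def ks_statistics(samples, cdf):
    """Exact K_n^+ and K_n^- via Eqs. (8) and (9).

    `cdf` is the hypothesised distribution function F, assumed continuous
    and non-decreasing.
    """
    xs = sorted(samples)                      # step 2: X_1 <= X_2 <= ... <= X_n
    n = len(xs)
    root_n = math.sqrt(n)
    k_plus = root_n * max(j / n - cdf(x) for j, x in enumerate(xs, start=1))
    k_minus = root_n * max(cdf(x) - (j - 1) / n for j, x in enumerate(xs, start=1))
    return k_plus, k_minus


def uniform_cdf(x):
    """F(x) for a uniformly distributed random real number in [0, 1]."""
    return min(max(x, 0.0), 1.0)


rng = random.Random(7)                        # step 1: n = 20 observations
print(ks_statistics([rng.random() for _ in range(20)], uniform_cdf))
```

Sorting dominates the cost, so the test runs in O(n log n) time. The resulting $K_{20}^\pm$ values can then be looked up in the KS table described above.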

Exercise: Perform the KS test on several of the random number generators mentioned previously, and compare the results with the corresponding chi-square tests.

Exercise: Using the random number generator of your preferred programming language, choose a finite number of degrees of freedom and apply both the chi-square test and the KS test to its output.

Exercise: Some dice were loaded so that on one die the value 1 appears twice as often as the other values, and the other die is similarly biased towards 6. The following values were obtained:

value of s        2   3   4   5   6   7   8   9   10  11  12
observed Y_s      2   6   10  16  18  32  20  13  16  9   2

Apply the chi-square test to determine whether it can detect the bad dice. If it cannot, explain why not.

Exercise: Let F(x) be the uniform distribution shown in Fig. 3(b). Find $K_{20}^+$ and $K_{20}^-$ for the following 20 observations:

0.414, 0.732, 0.236, 0.162, 0.259, 0.442, 0.189, 0.693, 0.098, 0.302, 0.442, 0.434, 0.141, 0.017, 0.318, 0.869, 0.772, 0.678, 0.354, 0.718

State whether these values are to be expected.

The KS test and the chi-square test are not normally used in isolation. Instead, they underlie a number of empirical tests for evaluating random number sequences. These tests are listed below, and more details can be found on pages 61 to 75.

Equidistribution test: checks that a sequence of integers from 0 to d − 1 is uniformly distributed.

Serial test: checks that pairs of successive numbers are uniformly distributed in an independent manner. The test can also be applied to triples, quadruples, etc.

Gap test: examines the lengths of the gaps between successive occurrences of numbers falling in a given range.

Poker test: examines groups of five successive integers and observes the pattern they form.

Coupon collector's test: counts how many numbers must be generated before every value in a given range has appeared.

Permutation test: examines the relative ordering within groups of successive numbers.

Run test: partitions the sequence into increasing or decreasing runs and examines the lengths of these runs.

Maximum-of-t test: splits the sequence into groups of t, takes the maximum of each group, and checks (for example with the KS test) that these maxima follow the expected distribution, which is $F(x) = x^t$ for uniform numbers in [0, 1).
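As one example from the list, here is a sketch of the serial test on pairs, with the hypothetical name `serial_test_pairs`. It assumes the inputs are already integers in 0..d − 1. It uses non-overlapping pairs $(Y_0, Y_1), (Y_2, Y_3), \ldots$ so that the observations fed to the chi-square test are independent. Overlapping pairs would need a different analysis. The input here is the increment sequence 0, 1, 2, 3, 0, 1, .... That sequence passes the equidistribution test perfectly, since every value appears equally often, yet it produces only two of the sixteen possible pairs.

```python
def serial_test_pairs(ys, d):
    """Serial test on non-overlapping pairs of integers in 0..d-1.

    Counts the d*d possible pairs and returns (V, dof), where V is the
    chi-square statistic of Eq. (3) with every p_s = 1/d^2 and dof = d^2 - 1.
    """
    pairs = len(ys) // 2
    counts = [[0] * d for _ in range(d)]
    for j in range(pairs):
        counts[ys[2 * j]][ys[2 * j + 1]] += 1
    expected = pairs / (d * d)               # n * p_s for each of the d^2 categories
    v = sum((c - expected) ** 2 / expected for row in counts for c in row)
    return v, d * d - 1


increment = [i % 4 for i in range(160)]       # 0, 1, 2, 3, 0, 1, 2, 3, ...
print(serial_test_pairs(increment, d=4))      # (560.0, 15)
```

With 80 pairs, each of the 16 categories expects 5 occurrences, so the rule of thumb $np_s \ge 5$ is just met. A V of 560 on 15 degrees of freedom is far beyond any tabulated percentage point.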

Collision test: similar to bucketing in hashing techniques. It counts how many numbers fall into an already occupied bucket.

Birthday spacings test: similar to the collision test, but the numbers (the "birthdays") are first sorted, and the spacings between consecutive values are examined.

Serial correlation test: calculates the degree of dependence between each number and its predecessor.

Exercise: Read about these tests and experiment with implementing them for your preferred random number generator.