Generating Random Numbers

Size: px

Start display at page:

Download "Generating Random Numbers"

Osborne Baldwin
5 years ago
Views:

1 Generating Random Numbers Seungchul Baek Department of Mathematics and Statistics, UMBC STAT 633: Statistical Computing 1 / 67

2 Introduction Simulation is a very powerful tool for statisticians. It allows us to investigate the performance of statistical methods before delving deep into difficult theoretical work. At a more practical level, integrals themselves are important for statisticians: - p-values are nothing but integrals; - Bayesians need to evaluate integrals to produce posterior probabilities, point estimates, and model selection criteria. Therefore, there is a need to understand simulation techniques and how they can be used for integral approximations. 2 / 67

3 Built-in functions in R R has many built-in random number generators. For each distribution, R has the pdf/pmf, quantile function, cdf, and an independent random number generator. distribution generator parameters binomial rbinom size, prob geometric rgeom prob negative binomial rnbinom size, prob Poisson rpois lambda uniform runif min, max normal rnorm mean, sd gamma rgamma shape, rate, scale χ 2 rchisq df exponential rexp rate F rf df1, df2 t rt df beta rbeta shape1, shape2 3 / 67

4 Uniform random variable Generating uniform random numbers is the main block for all that follows. In R, we use the generator, runif(n,min=0,max=1). The random numbers generated are said to be pseudo random they satisfy certain statistical tests we would expect independent uniforms to pass. In other words, runif() provides a deterministic sequence based on a random starting point. set.seed() controls a starting point. If it is not specified, R sets the seed by the current system time. 4 / 67

5 R does produce uniform random numbers? In order to make sure R produces uniform random numbers, we take a look at a histogram and scatter plot for (X i, X i+1 ). n < x <- runif(n) x1 <- x[-n] x2 <- x[-1] par(mfrow=c(1,2)) hist(x) plot(x1,x2) 5 / 67

6 R does produce uniform random numbers? How does it look like? 6 / 67

7 Random numbers for an f (x) We would like to generate random numbers from the pdf How should we do? f (x) = 3x 2, 0 < x < 1. 7 / 67

8 The inverse transform Theorem: Probability Integral Transformation Let X have continuous cdf F X (x) and define the random variable U as U = F X (X ). Then U U(0, 1). Proof. 8 / 67

9 The inverse transform Algorithm: 1. Derive the inverse function F 1 X (u). 2. Generate a random u from U(0, 1), then calculate x = F 1 X (u). This algorithm is straightforward to apply if F 1 compute. X is easy to Example 1. The pdf f (x) = 3x 2. 0 < x < 1. We let U be F X (X ). F X (x) = F 1 X (u) = 9 / 67

10 Random numbers for an f (x) We compare the empirical distribution and theoretical distribution in Example 1. n < u <- runif(n) x <- u^(1/3) hist(x, freq = FALSE) x <- seq(0,1,0.01) lines(x, 3*x^2, col="red") 10 / 67

11 Random numbers for an f (x) 11 / 67

12 Example: Exponential distribution If a random variable has a exponential distribution X Expon(λ), it has the pdf, f (x) = λe λx, x > 0. We want to generate random numbers from this distribution. Solution: 12 / 67

13 Example: Exponential distribution w/ λ = 1 n < u <- runif(n) x <- -log(u) y <- rexp(n) par(mfrow=c(1,2)) hist(x,freq=false,main="inverse transform",breaks=100) z <- seq(0,10,0.01) lines(z, exp(-z), col="red") hist(y,freq=false,main="in-built function",breaks=100) lines(z, exp(-z), col="red") 13 / 67

14 Example: Exponential distribution w/ λ = 1 14 / 67

15 Example: Cauchy distribution The standard Cauchy distribution has pdf and cdf f (x) = 1 π(1 + x 2 ), F (x) = This distribution has shape similar to normal, but tails are much heavier Cauchy has no finite moments. But its cdf can be inverted: F 1 (u) = To generate X from standard Cauchy... Non-standard Cauchy (location µ and scale σ): Use rt(n,df=1) in R. 15 / 67

16 Practice: Pareto distribution The Pareto(η, θ) distribution has the cdf, ( η ) θ F (x) = 1, x η > 0, θ > 0, x where η is the scale parameter and θ is the shape parameter. To generate 1000 random numbers from Pareto(2, 3), use the inverse transform method. Graph the density histogram of the sample and superimpose the density Pareto(2, 3). The pdf of Pareto(η, θ) is f (x) = θηθ, x η. x θ+1 16 / 67

17 Example: Discrete uniform distribution Suppose we want X to be sampled uniformly from 1,..., N. Here is an example where the cdf is neither continuous nor strictly increasing. The idea is as follows: 1. Divide up the interval [0, 1] into N equal subintervals; i.e., [0, 1/N), [1/N, 2/N),..., [(N 1)/N, 1]. 2. Sample U U(0, 1). 3. If i/n < U (i + 1)/N, then X = i + 1. More simply, set X = NU + 1. This is equivalent to sample(n, size=1) in R. 17 / 67

18 The inverse transform: Discrete case For a discrete random variable, the cdf F X (x) is discontinuous at countable distinct values. Algorithm: 1. Generate a random u from U(0, 1). 2. Take x i, where F X (x i 1 ) < u F X (x i ) and x i supp(x ). Step 2 may be hard for some distributions. 18 / 67

19 Example: Bernoulli distribution Let s consider a random variable from Bernoulli trial with the probability of success p. We know that F X (0) = 1 p and F X (1) = 1, and then we generate random numbers 1. Generate a random number from U(0, 1). 2. Take 1 if 1 p < u 1, otherwise take 0. Example 3. X Bernoulli(0.2). > n < > p <- 0.2 > u <- runif(n) > x <- as.integer(u>1-p) # if u>1-p is true, it returns 1. 0, o. > mean(x) [1] > var(x) [1] / 67

20 Example: Geometric distribution Suppose X has a geometric distribution X Geom(p). The pmf is f (x) = (1 p) x p, x = 0, 1, 2,.... In order to generate random numbers, we follow 1. Derive the cdf F X (x) =, where q = 1 p. 2. Generate a random number u from U(0, 1). 3. Solve F X (x i 1 ) < u F X (x i ) < u The last inequality simplifies to x < log(1 u)/logq x + 1. The solution is x + 1 = log(1 u)/logq, where x is the ceiling function. 20 / 67

21 Example: Geometric distribution X Geom(0.4), then we generate 1000 random numbers. > n < > p <- 0.4 > u <- runif(n) > x <- ceiling(log(u)/log(1-p))-1 # U and 1-U have same distribu > mean(x) [1] > var(x) [1] / 67

22 Example: Binomial distribution We want to generate random numbers from B(10, 0.25). F X (0) = 0.056, F X (1) = 0.244, F X (10) = 1, which can be calculated in R using pbinom(0:10,10,0.25). Generate random number from U(0, 1), and take x i, where F X (x i 1 ) < u F X (x i ).. 22 / 67

23 Example: Poisson distribution We want to generate random numbers from Poisson(λ = 50). Note that there is no explicit formula for x such that F X (x 1) < u F X (x). We expect that most of observations should be in λ ± 3 λ, which is (28.7, 71.2) for λ = 50. Note that P(X > 71) = and P(X < 28) = , which may be ignorable. For any generated random number u, we should test at least a certain amount. In this example, P(X 80) = , hence we test 80 times! Is it efficient? A way to handle this is to ignore what is outside of a highly likely interval. 23 / 67

24 Example: Poisson distribution Let s remove the outside of λ ± 3 λ. n < lam <- 50 dist <- 3*sqrt(lam) # adjustable trun <- round(seq(lam-dist,lam+dist,1)) cdfp <- ppois(trun,lam) pois.rand <- NULL for ( i in 1:n){ u <- runif(1) pois.rand[i] <- trun[1] + sum(cdfp<u) } > mean(pois.rand) [1] > var(pois.rand) [1] / 67

25 Practice: Revisit Binomial Example We want to generate random numbers from B(10, 0.25). F X (0) = 0.056, F X (1) = 0.244, F X (10) = 1, which can be calculated in R using pbinom(0:10,10,0.25). Generate random number from U(0, 1), and take x i, where F X (x i 1 ) < u F X (x i ). Write an R code to implement it. Generate 1000 random numbers and find the mean and variance.. 25 / 67

26 Transformation methods Many types of transformation methods are available. Z N(0, 1), then Y = X 2 χ 2 1. U χ 2 m and V χ 2 n are independent, F = (U/m)/(V /n) F m,n. U Gamma(r, β) and V Gamma(s, β) are independent, Y = U/(U + V ) Beta(r, s). U U(0, 1) and V U(0, 1) are independent, Z 1 = 2logU cos(2πv ), Z 2 = 2logV sin(2πu) are independent standard normal random variables. This is called the Box-Muller algorithm. 26 / 67

27 Example: Beta distribution We want to generate 1000 random numbers from Beta(2, 2). The Beta(2, 2) density is f (x) = 6x(1 x), where x (0, 1). 1. Generate a random number u from Gamma(2, 1). 2. Generate a random number v from Gamma(2, 1). 3. Take x = u/(u + v) * The shape parameter β can be any number, only if it is same for U and V and β > / 67

28 Example: Beta distribution #ex. Beta distribution n < a <- 2 b <- 2 u <- rgamma(n,a,1) v <- rgamma(n,b,1) x <- u/(u+v) par(mfrow=c(1,2)) hist(x,breaks=25,prob=t) y <- seq(0,1,0.01) lines(y,6*y*(1-y),col= red ) # QQ plot - to compare sample data to theoretical values quan <- qbeta(y,2,2) qqplot(quan,x) abline(0,1,col="red") 28 / 67

29 Example: Beta distribution 29 / 67

30 Example: χ 2 distribution The χ 2 random variable X (with n degrees of freedom) is defined as follows: Z 1,..., Z n N (0, 1), which are iid. Take X = Z Z 2 n. Therefore, to sample X χ 2 n take the sum of squares of n independent standard normal random variables. Independent normals can be sampled using Box-Muller. 30 / 67

31 Example: t distribution A t random variable X (with ν degrees of freedom) is defined as the ratio of a standard normal and the square root of an independent (normalized) χ 2 random variable. More formally, let Z N (0, 1) and Y χ 2 (ν), then X = Z Y /ν is a t ν random variable. In R, use rt(n,df=nu). 31 / 67

32 Fundamental theorem of simulation If f is the density of interest, on an arbitrary space, we can write f (x) = f (x) 0 Hence, f appears as the marginal density of the joint distribution, (X, U) Unif {(x, u) : 0 < u < f (x)}. du. Theorem: Fundamental Theorem of Simulation Simulating X f (x) is equivalent to simulating (X, U) Unif {(x, u) : 0 < u < f (x)}. proof. 32 / 67

33 Example Suppose that b a f (x)dx = 1, and that f is bounded by m. We can simulate (Y, U) U(0, m) by simulating Y U(a, b) and U Y = y U(0, m), and take the pair only if the further constraint 0 < u < f (y) is satisfied. This leads to the correct distribution of the accepted value of Y, because / 67

34 Example: Beta distribution Simulate from a Beta(2.7, 6.3) distribution. Start with uniforms in a box but keep only those that satisfy the constraint. Simulated 2000 uniforms, kept only 744 betas. 34 / 67

35 Accept-Reject method For many distributions, the inverse transformation method fails to generate the required random variables. Accept-Reject method requires us to know the functional form of the density f of interest (target density) up to multiplicative constant. We use a simpler density g (candidate or instrumental density). We impose constraints on g 1. f and g have compatible supports. 2. There is a constant M s.t. f (x)/g(x) M for all x. Algorithm: 1. Generate Y g, U U(0, 1). 2. Accept X = Y if U f (Y ) Mg(Y ). Otherwise, return to Step / 67

36 Accept-Reject method We now show that the cdf of the accepted random variable is the cdf of X. Proof. P(Y x U f (Y )/{Mg(Y )}) =... = P(X x). 36 / 67

37 Accept-Reject method Let s think more about this method. P(accept Y ) =? What is the total probability of acceptance for any iteration? What distribution does the number of iterations until acceptance follow? We expect that on average each sample value requires iterations. 37 / 67

38 Example: Accept-Reject method We want to generate 1000 random numbers from Beta(2, 2). The Beta(2, 2) density is f (x) = 6x(1 x), where x (0, 1). Take g(y) from U(0, 1). Is it okay? Find M s.t. f (y)/g(y) M. Take x = y if f (y) Mg(y) > u. How many iterations do we need for generating 1000 random numbers? 38 / 67

39 Example: Accept-Reject method #ex. AR n < k <- 0 # accepted i <- 0 # iterations x <- NULL while(k<n){ u <- runif(1) i <- i+1 y <- runif(1) if (6*y*(1-y)/1.5 > u){ k <- k+1 x[k] <- y } } > i [1] 1512 How about setting M = 6? Is it also okay? 39 / 67

40 Properties of Accept-Reject method The bound f Mg need NOT be tight. M should be as small as possible because the probability of acceptance is 1/M. The algorithm does not depend on the normalizing constant. The criticism of the Accept-Reject algorithm:. Importance sampling method can avoid this problem. 40 / 67

41 Adaptive Rejection Sampling (ARS) on the board. 41 / 67

42 Multivariate normal distribution A p-dimensional random vector x has a multivariate normal distribution N (µ, Σ) if x has the density f (x) = 1 (2π) p/2 Σ 1/2 exp{ 0.5(x µ)t Σ 1 (x Σ)}, where Σ is a p p positive definite matrix with entries σ ij = cov(x i, X j ). We generate multivariate normal random vectors 1. Generate standard normal random vector Z N (0, I p ). 2. Transform Z in order to obtain a desired random vector. 42 / 67

43 Multivariate normal distribution Suppose X N (µ, Σ) and Z N (0, I p ), then for any conformable constant matrix A and constant vector b, Y = AX + b N (Aµ + b, AΣA T ), Y = AZ + b N (b, AA T ). Using this property, if Σ is factorized into AA T, we can set Y = AZ + µ, then Y = AZ + µ N (, ). Can we decompose Σ into AA T? 43 / 67

44 Spectral decomposition A p p symmetric matrix Σ is decomposed as Σ = CΛC T, where Λ is the diagonal matrix with the eigenvalues of Σ along the diagonal and C is the matrix whose columns are the eigenvectors of Σ. As C is an orthogonal matrix, C T C = CC T = I p and C 1 = C T. In addition, when Σ is positive definite, we produce the square root matrix as Σ 1/2 = CΛ 1/2 C T, where Σ = Σ 1/2 Σ 1/2. Hence we can set A = Σ 1/2, so it results in AA T = Σ. 44 / 67

45 Example: Multivariate normal We want to generate 1000 random vectors from N (µ, Σ) µ = (1, 2) T, ( ) Σ = The eigen function in R returns eigenvalues and eigenvectors. 45 / 67

46 Example: Multivariate normal mu <- c(1,2) Sig <- matrix(c(1,0.8,0.8,1),2,2) mvn.spec <- function(n,mu,sig){ ei <- eigen(sig) Lam <- diag(ei$values) ei.mat <- ei$vectors A <- ei.mat %*% sqrt(lam) %*% t(ei.mat) x <- matrix(0,n,length(mu)) for (i in 1:n){ x[i,] <- A %*% rnorm(length(mu)) + mu } return(x) } x <- mvn.spec(1000,mu,sig) Run plot(x[,1],x[,2]);colmeans(x);cov(x). Does this sample seem multivariate normal random vectors? 46 / 67

47 Can we avoid using for loop in previous Example? We generated n random vectors by calculating x 1 = Az 1 + µ,..., x n = Az n + µ. In a compact way, it can be represented as z T 1 A µ 1... µ p z T 2 X n p = A µ 1... µ p z T n A µ 1... µ p n p = ZA + Jµ T, where Z = (z 1,..., z n ) T and J = (1,..., 1) T. n p 47 / 67

48 Example: Multivariate normal, alternative mu <- c(1,2) Sig <- matrix(c(1,0.8,0.8,1),2,2) mvn.spec2 <- function(n,mu,sig){ ei <- eigen(sig) Lam <- diag(ei$values) ei.mat <- ei$vectors A <- ei.mat %*% sqrt(lam) %*% t(ei.mat) x <- matrix(0,n,length(mu)) Z <- matrix(rnorm(n*length(mu)),n,length(mu)) x <- Z %*% A + rep(1,n)%*%t(mu) return(x) } y <- mvn.spec2(1000,mu,sig) 48 / 67

49 Cholesky factorization A real symmetric positive definite matrix Σ can be factorized as Σ = Q T Q, where Q is an upper triangular matrix. In R, chol function returns an upper triangular matrix Q such that Q T Q = Σ. We will generate 1000 random vectors using Cholesky decomposition. 49 / 67

50 Example: Multivariate normal mu <- c(1,2) Sig <- matrix(c(1,0.8,0.8,1),2,2) mvn.chol <- function(n,mu,sig){ Q <- chol(sig) x <- matrix(0,n,length(mu)) for (i in 1:n){ x[i,] <- t(q) %*% rnorm(length(mu)) + mu # Q must be transpo } return(x) } z <- mvn.chol(1000,mu,sig) colmeans(z) cov(z) 50 / 67

51 Singular Value Decomposition (SVD) Let A be an n p matrix of rank k. The singular value decomposition of a matrix A can be expressed as A = UDV T, where U is n k, D is k k, and V is p k. The diagonal elements of the nonsingular diagonal matrix D = diag(λ 1,..., λ k ) are the positive square roots of λ 2 1,..., λ2 k, which are the nonzero eigenvalues of A T A or of AA T. The values λ 1,..., λ k are called the singular values of A. The k columns of U are the normalized eigenvectors of AA T corresponding to the eigenvalues λ 2 1,..., λ2 k. The k columns of V are the normalized eigenvectors of A T A corresponding to the eigenvalues λ 2 1,..., λ2 k. UT U = V T V = I. 51 / 67

52 Singular Value Decomposition (SVD) As Σ is a symmetric positive definite matrix, the svd provides U = V = C such that Σ = CΛC T. We obtain square root matrix Σ 1/2 = UD 1/2 V T. Hence, svd is same as the spectral decomposition. Which one is better? Why? In R, svd function do the singular value decomposition. 52 / 67

53 Example: Multivariate normal mu <- c(1,2) Sig <- matrix(c(1,0.8,0.8,1),2,2) mvn.svd <- function(n,mu,sig){ s <- svd(sig) A <- s$u %*% diag(sqrt(s$d)) %*% t(s$v) x <- matrix(0,n,length(mu)) for (i in 1:n){ x[i,] <- A %*% rnorm(length(mu)) + mu } return(x) } w <- mvn.svd(1000,mu,sig) colmeans(w) cov(w) 53 / 67

54 Comparisons library(mvtnorm) n <- 100; p <- 50; iter <- 1000; mu <- rep(0,p); set.seed(10) system.time(for (i in 1:iter){ mvn.spec(n,mu,cov(matrix(rnorm(n*p),n,p)))})[3] set.seed(10) system.time(for (i in 1:iter){ mvn.chol(n,mu,cov(matrix(rnorm(n*p),n,p)))})[3] set.seed(10) system.time(for (i in 1:iter){ mvn.svd(n,mu,cov(matrix(rnorm(n*p),n,p)))})[3] set.seed(10) system.time(for (i in 1:iter){ rmvnorm(n,mu,cov(matrix(rnorm(n*p),n,p)))})[3] set.seed(10) system.time(for (i in 1:iter){ mvn.spec2(n,mu,cov(matrix(rnorm(n*p),n,p)))})[3] 54 / 67

55 Comparisons We compare methods generating multivariate normal vectors including built-in function in R, rmvnorm in mvtnorm package. method elapsed time Spectral d Cholesky d Singular value d built-in function Spectral d. (no loop) / 67

56 Finite mixture Given a finite set of probability density functions f 1 (x),..., f n (x) and weights w 1,..., w n such that w i 0 and i w i = 1, the mixture distribution can be represented by f (x) = n w i f i (x). i=1 It can be extended to n = for a countable infinite set. 56 / 67

57 Example: Finite mixture Suppose X 1 N (0, 1) and X 2 N (3, 1) are independent. Let s think of the following two problems: 1. We define Y = X 1 + X 2, and want to generate random numbers from the distribution of Y. 2. We define a 50% mixture, f (x) = 0.5f X1 (x) + 0.5f X2 (x). These distributions are same? How do we generate random numbers from 1 and 2? 57 / 67

58 Example: Finite mixture 1. We define Y = X 1 + X 2, and want to generate random numbers from the distribution of Y. In fact, Y has a distribution of. There are two ways to generate random numbers: 1.1 Generate Y. 1.2 Generate x 1. and x 2., then take y =.. 2. We define a 50% mixture, f (x) = 0.5f X1 (x) + 0.5f X2 (x). Generate U U(0, 1). If, take x N (0, 1). Otherwise, take x N (3, 1). 58 / 67

59 Example: Finite mixture n < x1 <- rnorm(n, 0, 1) x2 <- rnorm(n, 3, 1) y <- x1 + x2 #the convolution u <- runif(n) k <- as.integer(u > 0.5) x <- k*x1+(1-k)*x2 par(mfrow=c(1,2)) hist(y,freq=f,breaks=100) hist(x,freq=f,breaks=100) mean(y) var(y) > mean(y) [1] > var(y) [1] / 67

60 Example: Finite mixture 60 / 67

61 Example: Mixture of several distributions We define the mixture as f (x) = 5 w j f j (x), j=1 where X j Gamma(α = 2, β j = j) are independent and w j = j/15. To generate random number from this mixture, 1. Generate an integer from j {1,..., 5}, where P(j) = w j, j = 1,..., 5. (In R, we use sample function.) 2. Generate a random number from the corresponding Gamma(α, β j ). 61 / 67

62 Example: Mixture of several distributions # code 1 pt <- proc.time() n < j <- sample(1:5,size=n,replace=true,prob=(1:5)/15) x <- NULL for (i in 1:n){ x[i] <- rgamma(1,shape=2,scale=j[i]) } proc.time()-pt # code 2, A vectorized approach pt <- proc.time() n < j <- sample(1:5,size=n,replace=true,prob=(1:5)/15) y <- rgamma(n,shape=2,scale=j) proc.time()-pt 62 / 67

63 Practice: Mixture of several distributions This is also the mixture of five gamma distributions, i.e., f (x) = 5 w j f j (x), j=1 where X j Gamma(α = j + 1, β j ) are independent and β j = (1, 1.6, 2.2, 2.7, 3), w j = (0.1, 0.3, 0.4, 0.1, 0.1). Generate random numbers and provide sample mean and sample variance. 63 / 67

64 Continuous mixture In case that the set of component distributions is uncountable, the construction of such distributions has a similarity to that of mixture distributions, replacing the finite summations with integrals. Consider a probability density function f (x; a) for a random variable X, parameterized by a. That is, for each value of a in some set A, f (x; a) is a probability density function with respect to X. Given a probability density function w, the function f (x) = w(a)f (x; a)da, A is a probability density function for X, and A w(a)da = / 67

65 Poisson-Gamma mixture If X NB(r, p), then X has the mixture representation where β = (1 p)/p. X Y = y Poisson(y), Y Gamma(r, β), FYI... noting that E(X ) = E{(E(X Y )}, we have E(X ) = E{(E(X Y )} = E(Y ) = rβ = r 1 p p. 65 / 67

66 Example: Poisson-Gamma mixture Suppose X NB(7, 0.4). We generate random numbers from X and compare the results with those generated by Poisson-Gamma mixture. n < r <- 7 p <- 0.4 y <- rgamma(n,r,scale=(1-p)/p) x <- rpois(n,y) hist(x,prob=true,breaks=50) lines(1:50,dnbinom(1:50,r,p),lwd=2,col="red") > mean(x) [1] > var(x) [1] / 67

Stat 451 Lecture Notes Simulating Random Variables

Stat 451 Lecture Notes Simulating Random Variables Stat 451 Lecture Notes 05 12 Simulating Random Variables Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 22 in Lange, and Chapter 2 in Robert & Casella 2 Updated: