Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama


Instructions

This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours and 30 minutes in length. This is a closed-book, closed-note exam, and no electronic devices (e.g., cellphone, computer) are allowed. The last pages of the exam have some possibly useful reference material.

For all problems, follow these instructions:

- Show your work and give reasons.
- For any problem that asks you to provide an algorithm, be sure to give a step-by-step algorithm. Do not explain how to implement your algorithm in Arena or Excel; rather, provide an algorithm (i.e., pseudo-code) that can be implemented in any language. Also, for any random variates needed by your algorithm, be sure to explicitly provide the steps needed to generate them. You may assume that you have at your disposal a good unif[0,1] random number generator.
- For all constants or parameter values that can be computed, be sure to explicitly give their values.
- Be sure to explicitly define any estimators for unknown quantities.
- In any problem for which you use an approximation based on the central limit theorem (CLT), you can use a quantile point from the normal distribution.
- Any hypothesis test you perform should be at level $\alpha = 0.05$. Any confidence interval (CI) you provide should be at a 95% confidence level.

1. [10 points] We collected 252 daily closing prices of Microsoft (MSFT) stock over a one-year time period. Let $P_i$ denote the closing price on the $i$th day, $i = 0, 1, \ldots, 251$, and define $Y_i = \ln(P_i/P_{i-1})$, $i = 1, 2, \ldots, 251$. We fit a normal distribution to the sample $Y_1, Y_2, \ldots, Y_{251}$ using Arena's Input Analyzer, which gave the following output. [The Input Analyzer's histogram and fit statistics are not reproduced in this transcription.] Comment on the appropriateness of fitting a normal distribution to the data $Y_1, \ldots, Y_{251}$. Be sure to discuss the output from the Input Analyzer.
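For context only (editorial, not part of the original exam): a minimal sketch of how such a fit could be checked outside of Arena. The file name is an assumption, and the K-S test with estimated parameters is only an approximation of what the Input Analyzer reports.

```python
import numpy as np
from scipy import stats

# Hypothetical input: a text file of 252 daily closing prices (assumption).
prices = np.loadtxt("msft_closes.txt")
y = np.log(prices[1:] / prices[:-1])      # log returns Y_i = ln(P_i / P_{i-1})

# Fit N(mu, sigma^2) by matching sample moments, then check the fit.
mu_hat, sigma_hat = y.mean(), y.std(ddof=1)
ks_stat, p_val = stats.kstest(y, "norm", args=(mu_hat, sigma_hat))
print(f"fit: N({mu_hat:.5f}, {sigma_hat:.5f}^2), K-S stat {ks_stat:.4f}, p ~ {p_val:.3f}")
```

Because $\mu$ and $\sigma$ are estimated from the same data, the standard K-S critical values are only approximate here, consistent with the exam's note that the centering and scaling of $D_n$ depend on the hypothesized $F$.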

2. [30 points] Let $X$ be a random variable with CDF $F$, and let $\gamma = P(X \in A)$ for some given set $A$. Assume that $0 < \gamma < 1$. Let $X_1, X_2, \ldots, X_n$ be IID samples from $F$.

(a) [10 points] Give an estimator $\hat{\gamma}_n$ of $\gamma$ that uses all of the $n$ samples $X_1, X_2, \ldots, X_n$. Show that your estimator $\hat{\gamma}_n$ is an unbiased estimator of $\gamma$.

(b) [10 points] Show that your estimator $\hat{\gamma}_n$ satisfies a law of large numbers and satisfies the following central limit theorem (CLT): $\frac{\sqrt{n}}{\beta}(\hat{\gamma}_n - \alpha) \stackrel{D}{\approx} N(0, 1)$ for large $n$, for some constants $\alpha$ and $\beta$. Explicitly specify formulas for $\alpha$ and $\beta$.

(c) [10 points] Derive an approximate 95% (two-sided) confidence interval for $\gamma$.

3. [60 points] A software company produces different software titles. Customers having problems with any of the company's products call the technical-support center, which is open 10 hours each day. Each day, calls arrive to technical support according to a non-homogeneous Poisson process with rate function $\lambda(t) = 12t - t^2 + 15$, where $t$ denotes the number of hours since the technical-support center opened for the day. (All times given are in hours.) The technical-support center has enough phone lines and enough staff to answer the phones so that every call received is immediately answered; i.e., no caller ever gets a busy signal or has to wait on hold.

A phone call by a customer to technical support consists of two parts. First, the customer explains the problem he is having, and the amount of time this takes has the following density function:

$$f(x) = \begin{cases} 40x & \text{if } 0 \le x \le 0.1, \\ 5 - 10x & \text{if } 0.1 < x \le 0.5, \\ 0 & \text{otherwise.} \end{cases}$$

This is followed by the technical-support person providing the solution. In 70% of the cases, the technical-support person knows the answer immediately, and the amount of time it takes to explain it has a uniform[0.1, 0.3] distribution. In the other 30% of the cases, the technical-support person needs to research the question and then give the answer; the total time to research and provide the answer has a Weibull distribution with shape parameter $\alpha = 1/3$ and scale parameter $\beta = 1/12$. Once the answer is provided, the caller immediately hangs up. No new calls are accepted after $t = 10$, but any call that arrived before $t = 10$ and is still in process at $t = 10$ continues until the solution is provided by the technical-support person.

Assume that callers are independent of one another and that the time for a caller to explain his problem is independent of the amount of time it takes for the technical-support person to provide the answer. For the following questions, you may assume that you have available a good uniform[0,1] random number generator. Also, if one part uses results from previous parts, you do not need to give all of the details from the previous parts.
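An editorial illustration of the kind of estimator Problem 2 is after (one natural choice, not necessarily the intended solution): the sample proportion of observations falling in $A$, with a CLT-based 95% CI. The choice $F = N(0,1)$ and $A = (1.5, \infty)$ is purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustration only: F = N(0,1), A = (1.5, inf), so gamma = P(X > 1.5).
n = 10_000
x = rng.standard_normal(n)
hits = (x > 1.5).astype(float)        # indicator 1{X_i in A}

gamma_hat = hits.mean()               # sample proportion; unbiased for gamma
s = hits.std(ddof=1)                  # estimates beta = sqrt(gamma * (1 - gamma))
half_width = 1.96 * s / np.sqrt(n)    # z_{0.975} = 1.96 from the exam's table
print(f"95% CI for gamma: {gamma_hat:.4f} +/- {half_width:.4f}")
```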

(a) [10 points] Let $X$ be a random variable having density function $f$. Give an algorithm to generate a sample of $X$.

(b) [10 points] Define $Y$ as the amount of time the technical-support person spends providing the solution. Give an algorithm to generate a sample of $Y$.

(c) [10 points] Define $Z$ as the total amount of time a customer phone call takes. Give an algorithm to generate a sample of $Z$.

(d) [10 points] Give an algorithm to simulate the times of the phone calls that arrive during the time interval [0, 10] and the total number of calls arriving during [0, 10].

(e) [10 points] Define $\theta$ to be the probability that in a day, the number of calls exceeding 1 hour in length is at least 15. More precisely, let $R$ be the number of calls that exceeded 1 hour in length in a day, so $\theta = P(R \ge 15)$. Using the previous parts of this problem, describe an algorithm to estimate $\theta$ using simulation. Your algorithm should also produce a confidence interval for $\theta$.

(f) [10 points] Let $M$ denote the total number of calls received during the time interval [0, 10]. Give an algorithm using $M$ as a control variate to estimate and construct a confidence interval for $\theta$.
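As an editorial sketch of one way part (a) can go (the CDF derivation below is ours, not the exam's): integrating $f$ gives $F(x) = 20x^2$ on $[0, 0.1]$ and $F(x) = 5x - 5x^2 - 0.25$ on $(0.1, 0.5]$, with $F(0.1) = 0.2$, so the inverse-transform method applies, using the exam's quadratic formula on the second piece.

```python
import math
import random

def sample_explain_time(u: float | None = None) -> float:
    """Inverse-transform sample from f(x) = 40x on [0, 0.1], 5 - 10x on (0.1, 0.5].

    CDF: F(x) = 20x^2 on [0, 0.1]; F(x) = 5x - 5x^2 - 0.25 on (0.1, 0.5].
    """
    if u is None:
        u = random.random()                      # U ~ unif[0, 1]
    if u <= 0.2:
        return math.sqrt(u / 20.0)               # invert 20x^2 = u
    # Invert 5x - 5x^2 - 0.25 = u; the smaller root lies in (0.1, 0.5].
    return 0.5 - math.sqrt(20.0 * (1.0 - u)) / 10.0
```

A quick sanity check: `sample_explain_time(0.2)` returns 0.1 and `sample_explain_time(1.0)` returns 0.5, matching the breakpoints of $f$.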

Some possibly useful facts

- The CDF $F$ of a random variable $X$ is defined as $F(z) = P(X \le z)$. If $X$ is a continuous random variable with density function $f$, then $F(z) = \int_{-\infty}^{z} f(x)\,dx$. If $X$ is a discrete random variable with probability mass function $p$ and support $\{x_1, x_2, x_3, \ldots\}$, then $F(z) = \sum_{x_i \le z} p(x_i)$.
- The expectation or mean $\mu$ of a random variable $X$ is $\mu = E[X] = \sum_i x_i\, p(x_i)$ when $X$ is a discrete random variable with probability mass function $p(\cdot)$ and support $\{x_1, x_2, \ldots\}$, and $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$ when $X$ is a continuous random variable with density function $f$.
- The variance of a random variable $X$ is $V(X) = E[(X - \mu)^2] = E[X^2] - \mu^2$.
- The covariance of random variables $X$ and $Y$ with respective means $\mu_X$ and $\mu_Y$ is $C(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$.
- For IID data $X_1, X_2, \ldots, X_n$, the sample mean is $\bar{X}(n) = \frac{1}{n}\sum_{k=1}^{n} X_k$, and the sample variance is $S^2(n) = \frac{1}{n-1}\sum_{k=1}^{n} (X_k - \bar{X}(n))^2$.
- The normal distribution with mean $\mu$ and variance $\sigma^2 > 0$ is denoted by $N(\mu, \sigma^2)$ and has density function $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$. Its CDF cannot be evaluated in closed form.
- If $Z_1, Z_2, \ldots, Z_n$, $n \ge 1$, are IID $N(0,1)$ random variables, then $X = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ has a chi-squared distribution with $n$ degrees of freedom, which is denoted by $\chi^2_n$.
- If $Z \sim N(0,1)$ and $U \sim \chi^2_n$ with $Z$ and $U$ independent, then $T = Z/\sqrt{U/n}$ has a Student-$t$ distribution with $n$ degrees of freedom, which is denoted by $t_n$.
- If $X_1, X_2, \ldots, X_n$ are IID $N(\mu, \sigma^2)$, then $(n-1)S^2(n)/\sigma^2 \sim \chi^2_{n-1}$ and $(\bar{X}(n) - \mu)/\sqrt{S^2(n)/n} \sim t_{n-1}$.
- The uniform distribution over the interval $[a, b]$ has density function $f(x) = 1/(b-a)$ for $a \le x \le b$, and $f(x) = 0$ otherwise. Its CDF is $F(x) = (x-a)/(b-a)$ for $a \le x \le b$, with $F(x) = 0$ for $x < a$ and $F(x) = 1$ for $x > b$. Its mean is $(a+b)/2$ and its variance is $(b-a)^2/12$.
- The exponential distribution with parameter $\lambda > 0$ has density function $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$. Its CDF is $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, and $F(x) = 0$ for $x < 0$. Its mean is $1/\lambda$ and its variance is $1/\lambda^2$.
- The gamma distribution with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$ has density function $f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$, where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt$. In general, its CDF cannot be evaluated in closed form. Its mean is $\alpha\beta$ and its variance is $\alpha\beta^2$.
- The Weibull distribution with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$ has density function $f(x) = \alpha \beta^{-\alpha} x^{\alpha-1} \exp\{-(x/\beta)^{\alpha}\}$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$. Its CDF is $F(x) = 1 - \exp\{-(x/\beta)^{\alpha}\}$ for $x \ge 0$, and $F(x) = 0$ for $x < 0$. Its mean is $(\beta/\alpha)\Gamma(1/\alpha)$ and its variance is $(\beta^2/\alpha)\left[2\Gamma(2/\alpha) - \frac{1}{\alpha}(\Gamma(1/\alpha))^2\right]$.
- If $Z \sim N(\mu, \sigma^2)$, then $X = e^Z$ has a lognormal$(\mu, \sigma^2)$ distribution. The lognormal$(\mu, \sigma^2)$ distribution has density function $f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$ for $x > 0$, and $f(x) = 0$ for $x \le 0$. Its CDF cannot be evaluated in closed form. Its mean is $\exp(\mu + \sigma^2/2)$ and its variance is $e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$.
- A Poisson distribution has a given parameter $\lambda > 0$, which is also its mean, and has support $\{0, 1, 2, \ldots\}$. The probability mass function is $p(k) = e^{-\lambda}\lambda^k/k!$ for $k = 0, 1, 2, \ldots$.
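Since the Weibull CDF above is available in closed form, the inverse-transform method gives a one-line generator. A minimal editorial sketch, using exactly the parameterization stated above:

```python
import math
import random

def weibull_sample(alpha: float, beta: float) -> float:
    """Inverse transform of the Weibull CDF F(x) = 1 - exp{-(x/beta)^alpha}:
    setting F(x) = u gives x = beta * (-ln(1 - u))^(1/alpha)."""
    u = random.random()                                  # U ~ unif[0, 1]
    return beta * (-math.log(1.0 - u)) ** (1.0 / alpha)

# E.g., the research-time distribution in Problem 3: shape 1/3, scale 1/12.
x = weibull_sample(1.0 / 3.0, 1.0 / 12.0)
```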

- Law of large numbers (LLN): if $X_1, X_2, \ldots, X_n$ is a sequence of IID random variables with finite mean $\mu$, then $\bar{X}(n) \approx \mu$ for large $n$.
- Central limit theorem (CLT): if $X_1, X_2, \ldots, X_n$ is a sequence of IID random variables with finite mean $\mu$ and positive, finite variance $\sigma^2$, then $\frac{\sqrt{n}}{\sigma}(\bar{X}(n) - \mu) \stackrel{D}{\approx} N(0, 1)$ for large $n$, where $\stackrel{D}{\approx}$ means "approximately equal to in distribution." Also, $\frac{\sqrt{n}}{S(n)}(\bar{X}(n) - \mu) \stackrel{D}{\approx} N(0, 1)$ for large $n$.
- Confidence intervals (CIs): if $X_1, X_2, \ldots, X_n$ is a sequence of IID random variables with finite mean $\mu$ and positive, finite variance $\sigma^2$, then an approximate $100(1-\alpha)\%$ confidence interval for $\mu$ is $[\bar{X}(n) \pm z_{1-\alpha/2}\, S(n)/\sqrt{n}]$, where $z_{1-\alpha/2}$ is the constant such that $P(N(0,1) > z_{1-\alpha/2}) = \alpha/2$. For example, $P(N(0,1) > 1.65) = 0.05$, $P(N(0,1) > 1.96) = 0.025$, and $P(N(0,1) > 2.60) = 0.005$.
- $\ln e = 1$, $\ln(ab) = \ln a + \ln b$, $\ln(a^b) = b \ln a$, and $e^{a \ln b} = b^a$.
- Quadratic formula: the roots of $ax^2 + bx + c = 0$ are $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
- Parameter estimation: Suppose $X_1, \ldots, X_n$ are IID with distribution $F_{\theta_1,\ldots,\theta_d}$, where $\theta_1, \ldots, \theta_d$ are unknown and to be estimated.
  - MLE: If $F_{\theta_1,\ldots,\theta_d}$ is a continuous distribution with density function $f_{\theta_1,\ldots,\theta_d}$, then compute the likelihood $L(X_1, \ldots, X_n) = \prod_{i=1}^{n} f_{\theta_1,\ldots,\theta_d}(X_i)$. If $F_{\theta_1,\ldots,\theta_d}$ is a discrete distribution with probability mass function $p_{\theta_1,\ldots,\theta_d}$, then compute the likelihood $L(X_1, \ldots, X_n) = \prod_{i=1}^{n} p_{\theta_1,\ldots,\theta_d}(X_i)$. Then choose the values of $\theta_1, \ldots, \theta_d$ that maximize $L(X_1, \ldots, X_n)$, or equivalently that maximize $\ln L(X_1, \ldots, X_n)$.
  - Method of moments: Let $\mu_j = E[X^j]$ be the $j$th (theoretical) moment of $F_{\theta_1,\ldots,\theta_d}$, and let $M_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j$ be the $j$th sample moment of the data. Then set $\mu_j = M_j$ for $j = 1, 2, \ldots, d$, and solve for $\theta_1, \ldots, \theta_d$.
- Goodness-of-fit tests: Suppose $X_1, \ldots, X_n$ are IID from some hypothesized distribution $F$.
  - Chi-squared test: Divide the samples into $k$ bins. Let $O_i$ be the number of observations in the $i$th bin, and let $E_i$ be the expected number of observations in the $i$th bin. Compute the test statistic $\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2/E_i$, which has approximately a chi-squared distribution with $\nu$ degrees of freedom, where $\nu = (\text{number of bins}) - (1 + \text{number of estimated parameters})$.
  - Kolmogorov-Smirnov (K-S) test: Compute the empirical CDF $\hat{F}_n(x) = (\#\text{ observations} \le x)/n$. Then compute $D_n = \max_{x \in \mathbb{R}} |\hat{F}_n(x) - F(x)|$. Obtain the test statistic by centering and scaling $D_n$, where the centering and scaling depend on the hypothesized $F$.
- LCG: $Z_i = (aZ_{i-1} + c) \bmod m$ and $U_i = Z_i/m$, for constants $a$, $c$, and $m$.
- Inverse-transform method to generate a sample $X$ having CDF $F$: generate $U \sim \text{unif}[0,1]$, and return $X = F^{-1}(U)$.
- Convolution method: Suppose $X = Y_1 + Y_2 + \cdots + Y_n$, where $Y_1, Y_2, \ldots, Y_n$ are independent and $Y_i \sim F_i$. We generate a sample of $X$ by generating $Y_1 \sim F_1, Y_2 \sim F_2, \ldots, Y_n \sim F_n$ independently, and returning $X = Y_1 + \cdots + Y_n$.
- Composition method: Suppose we can write the CDF $F$ as $F(x) = p_1 F_1(x) + p_2 F_2(x) + \cdots + p_d F_d(x)$ for all $x$, where $p_i > 0$ for each $i$ and $p_1 + \cdots + p_d = 1$. We then generate a sample $X \sim F$ as follows. First generate $I$ having pmf $(p_1, p_2, \ldots, p_d)$. If $I = j$, then independently generate $Y_j \sim F_j$, and return $X = Y_j$.
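A minimal editorial sketch of the composition method just described, applied to a two-component mixture like the solution time $Y$ in Problem 3 (the Weibull branch is sampled by the inverse transform derived from the CDF given earlier):

```python
import math
import random

def sample_solution_time() -> float:
    """Composition: with prob 0.7 the answer is immediate (unif[0.1, 0.3]);
    with prob 0.3 it requires research (Weibull, shape 1/3, scale 1/12)."""
    if random.random() < 0.7:                          # I = 1 with p_1 = 0.7
        return random.uniform(0.1, 0.3)                # Y_1 ~ unif[0.1, 0.3]
    # Y_2 via inverse transform: beta * (-ln(1 - U))^(1/alpha), 1/alpha = 3.
    u = random.random()
    return (1.0 / 12.0) * (-math.log(1.0 - u)) ** 3.0
```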

- Acceptance-rejection (AR) method: To generate $X$ having density $f$, suppose $Y$ is a random variable with density $g$ such that there exists a constant $c$ for which $f(x) \le c\,g(x)$ for all $x$. Then we can generate a sample of $X$ using the AR method as follows. (i) Generate $Y$ with density $g$. (ii) Generate $U \sim \text{unif}[0,1]$, independent of $Y$. (iii) If $c\,g(Y)\,U \le f(Y)$, then return $X = Y$ and stop; else, reject $(Y, U)$ and return to step (i).
- Box-Muller method for generating standard normals: Generate independent $U_1, U_2 \sim \text{unif}[0,1]$, and return $X_1 = \sqrt{-2\ln U_1}\,\cos(2\pi U_2)$ and $X_2 = \sqrt{-2\ln U_1}\,\sin(2\pi U_2)$ as IID $N(0,1)$.
- If $Z = (Z_1, \ldots, Z_d)$ is a vector of IID standard normals, then $AZ + \mu \sim N_d(\mu, \Sigma)$ for a constant vector $\mu$ and matrix $A$, where $\Sigma = AA'$ and prime denotes transpose.
- A Poisson process $[N(t) : t \ge 0]$ with parameter $\lambda > 0$ is a counting process, where $N(t)$ is the number of events in the time interval $[0, t]$, and $N(t)$ has a Poisson distribution with parameter $\lambda t$. The interevent times are IID exponential with mean $1/\lambda$.
- A nonhomogeneous Poisson process $[N^*(t) : t \ge 0]$ with rate function $\lambda(t)$, $t \ge 0$, can be generated by generating a (homogeneous) Poisson process $[N(t) : t \ge 0]$ with constant rate $\lambda^* \ge \lambda(t)$ for all $t \ge 0$, and then accepting event time $V_i$ of the (homogeneous) Poisson process with probability $\lambda(V_i)/\lambda^*$, independent of all other events. For each $t$, $E[N^*(t)] = \int_0^t \lambda(s)\,ds$.
- Common random numbers (CRN): Suppose that $X$ and $Y$ are outputs from different simulation models, and let $\mu_x = E[X]$ and $\mu_y = E[Y]$. Assume that we can express $X = h_x(U_1, U_2, \ldots, U_d)$ and $Y = h_y(U_1, U_2, \ldots, U_d)$, where $h_x$ and $h_y$ are given functions, and $U_1, U_2, \ldots, U_d$ are IID unif[0,1]. Then to estimate and construct a CI for $\alpha = \mu_x - \mu_y$ using CRN, generate $n$ IID replications of $D = h_x(U_1, U_2, \ldots, U_d) - h_y(U_1, U_2, \ldots, U_d)$, and form a confidence interval for $\alpha$ by computing the sample mean and sample variance of the IID replicates of $D$. A variance reduction is guaranteed when $h_x$ and $h_y$ are monotone in the same direction in each argument.
- Antithetic variates (AV): Suppose that $X$ is an output from a simulation model, and let $\alpha = E[X]$. Assume that we can express $X = h_x(U_1, U_2, \ldots, U_d)$, where $h_x$ is a given function, and $U_1, U_2, \ldots, U_d$ are IID unif[0,1]. Then to estimate and construct a CI for $\alpha$ using AV, we generate $n$ IID replications of $D = [h_x(U_1, U_2, \ldots, U_d) + h_x(1-U_1, 1-U_2, \ldots, 1-U_d)]/2$, and form a confidence interval for $\alpha$ by computing the sample mean and sample variance of the IID replicates of $D$. A variance reduction is guaranteed (compared to $2n$ IID replications of just $X$) when $h_x$ is monotone in each argument.
- Control variates (CV): Let $(X, Y)$ be outputs from a simulation, where $E[Y]$ is known. To estimate $\alpha = E[X]$ using CV, generate $n$ IID replications $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, of $(X, Y)$. Let $\tilde{X}_i = X_i - \hat{\lambda}_n (Y_i - E[Y])$, where $\hat{\lambda}_n$ is an estimate of $\text{Cov}[X, Y]/V[Y]$ from the $n$ replications. Construct a CI for $\alpha$ by taking the sample mean and sample variance of the $\tilde{X}_i$.
- Importance sampling (IS): Let $X$ be a random variable with density $f$, and let $\alpha = E_f[h(X)]$, where $h$ is a function and $E_f$ denotes expectation when $X$ has density $f$. Let $g$ be another density function such that $h(x)f(x) > 0$ implies $g(x) > 0$. Then $\alpha = E_g[h(X)L(X)]$, where $L(x) = f(x)/g(x)$ is the likelihood ratio and $E_g$ denotes expectation when $X$ has density $g$. To estimate and construct a CI for $\alpha$ using IS, use $g$ to generate IID samples of $h(X)L(X)$, and compute the sample mean and sample variance of the $h(X)L(X)$.
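The thinning recipe for a nonhomogeneous Poisson process above is concrete enough to code directly. An editorial sketch for the rate function in Problem 3, taking $\lambda^* = 51$ (the maximum of $\lambda(t) = 12t - t^2 + 15$ on $[0, 10]$, attained at $t = 6$):

```python
import random

def nhpp_arrival_times(horizon: float = 10.0) -> list[float]:
    """Thinning: generate a homogeneous Poisson process with rate lam_star on
    [0, horizon] and keep each event time V with probability lam(V)/lam_star."""
    def lam(t: float) -> float:
        return 12.0 * t - t * t + 15.0        # rate function from Problem 3

    lam_star = 51.0                           # lam(6) = 51 majorizes lam on [0, 10]
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam_star)     # exponential interevent time
        if t > horizon:
            return times
        if random.random() <= lam(t) / lam_star:
            times.append(t)                   # accept with prob lam(t)/lam_star
```

Calling `len(nhpp_arrival_times())` then gives the day's call count, the quantity $M$ used as a control variate in part (f).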