Asymptotically Optimal Simulation Allocation under Dependent Sampling


Xiaoping Xiong
The Robert H. Smith School of Business, University of Maryland, College Park, MD, USA

Sandeep Juneja
School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, India

Michael C. Fu
The Robert H. Smith School of Business, Institute for Systems Research, University of Maryland, College Park, MD, USA

We consider the problem of optimal allocation of a simulation computing budget to maximize the probability of correct selection in the ordinal optimization setting where the output samples from different designs have a general distribution and may be mutually dependent, e.g., due to the use of common random numbers to generate them. Asymptotically, in terms of an increasing computational budget, this problem may be viewed in the large deviations framework, so that the rate function of the probability of false (incorrect) selection provides a good surrogate measure for this probability. We evaluate this rate function as a function of the computation allocation to different designs and identify the equations satisfied by the optimal allocation in this asymptotic limit. By establishing several theoretical properties that the allocation must satisfy, we reduce the problem to a single-variable nonlinear optimization problem that can be solved by numerical methods. We also characterize the convergence rate of the optimality gap (bias) between the asymptotically optimal solution and the true optimal solution.

1. Introduction

In many simulation applications, the objective is to select a best design from among a finite set of competing designs that are evaluated using stochastic system models, where the decision is based on comparing performance measures that are expectations of random variables associated with each design model. In our setting, these expectations are unknown, and simulation is used to generate samples of the associated random variables. One then computes a sample mean for each design and selects the design with the best (w.l.o.g., the highest) sample mean. Under sufficient independence in the generated samples, the law of large numbers guarantees the convergence of the sample mean to the corresponding expectation, and the central limit theorem implies that the rate of convergence is proportional to the inverse of the square root of the number of samples generated.

The traditional approach to these types of problems is based on the classical statistical ranking & selection procedures, for which there is an enormous body of literature, nearly all of it assuming independent Gaussian sampling distributions. In the simulation context considered here, significant research progress has been made in the last decade or so in developing procedures for the correlated setting (e.g., Yang and Nelson 1991; Nelson and Matejcik 1995; Goldsman and Nelson 1998; see also Chick and Inoue 2001a,b for a Bayesian approach), but to the best of our knowledge, non-Gaussian distributions have not been addressed.

The approach of ordinal optimization (cf. Ho, Srinivas, and Vakili 1992; Dai and Chen 1997) is based on the observation that the probability of false (incorrect) selection decreases at an exponential rate as a function of the total number of generated samples (see Dai 1996; Dai and Chen 1997). Thus, while the number of samples needed to accurately estimate expectations may be large (as implied by the inverse square root convergence rate), far fewer samples may be needed to correctly select the best design. The optimal computing budget allocation (OCBA) framework introduced by Chen et al. (1997, 2000, 2005) exploits this observation by formulating and approximately solving an optimization problem that maximizes the probability of correct selection for a given total number of simulation replications. Thus, under the optimal allocation, little effort may be expended on obviously inferior designs, while more effort is focused on distinguishing between the top few competing designs.
The formulation assumes that each sample from each design has a Gaussian distribution and is independent of all the other generated samples. Glynn and Juneja (2004) use the large deviations framework to rigorously analyze this optimization problem when the generated samples are allowed to have non-Gaussian distributions; however, they still assume independence between generated samples from different designs. In practice, common random numbers may be beneficially used to induce positive dependence between outputs from different designs to further improve the probability of correct selection (see Dai and Chen 1997; Deng, Ho, and Hu 1992). In many applications, dependence between the designs may be endogenous; e.g., consider the case where the performance of different financial portfolios is based on the expected utility of each portfolio. Here the utility of any given portfolio is a function of underlying assets, and these in turn may depend upon a few common factors.

Motivated by such applications, in this paper we consider the problem of optimal allocation of a computing budget when the outputs from different designs are dependent. Fu et al. (2004, 2006) consider this problem under the assumption that outputs from all the designs follow a multivariate Gaussian distribution. Based on Xiong (2005), we generalize their analysis to allow for non-Gaussian distributions. As noted by Glynn and Juneja (2004), this setting also has important practical consequences, because assuming Gaussian distributions for non-Gaussian sampling can lead to faulty allocations and significant performance degradation, even after batching.

A summary of our specific research contributions is as follows:

We use the large deviations framework to identify the large deviations rate function associated with the probability of false selection as a function of the computational allocations to various designs. Thus, asymptotically, the problem of finding an optimal allocation to minimize the probability of false selection reduces to a deterministic optimization problem involving the corresponding rate function.

We establish structural properties of the solution to the asymptotic optimization problem, which lead to a simplified, single-variable nonlinear optimization problem, for which we propose a numerical solution procedure.

We explicitly characterize the convergence rate for the optimality gap (bias) between the optimal asymptotic allocations and the true (finite sample) optimal allocations. As far as we are aware, these are the first such results in ordinal optimization.

As is typical in the OCBA literature (cf. Chen et al. 1997, 2000; Fu et al. 2004, 2006), the analysis assumes that the large deviations rate function of the probability of false selection is known (e.g., if the joint distribution of samples from different designs is known).
Since this is not the case in practice (even the expected value of the samples is unknown), implementation involves a pilot phase in which this rate function is estimated and the corresponding optimal allocations are then determined, leading to some performance degradation due to estimation errors. However, analysis of this performance degradation is beyond the scope of this paper.

The rest of the paper is organized as follows. In Section 2, we formulate the problem and identify the large deviations rate function associated with the probability of incorrect selection. In Section 3, we develop a numerical solution methodology to solve the nonlinear optimization problem that determines asymptotically optimal allocations. The convergence rate of these allocations to the true (finite sample) optimal allocations is characterized in Section 4. In Section 5, we provide a brief conclusion to the paper. Some of the lengthy proofs are relegated to Appendices A and B.

2. Problem Formulation & Rate Function Identification

Suppose that we have $k$ designs, with $\mu_i$ denoting the mean of design $i$, $i = 1, \dots, k$, and the goal is to find the design with maximum mean. Without loss of generality, assume that $\mu_1 > \mu_2 \ge \cdots \ge \mu_k$, so the best design is design 1. In the setting of this paper, for each design $i$, $\mu_i$ is estimated from i.i.d. samples $J_{ij}$, $j = 1, 2, \dots$, where $E[J_{ij}] = \mu_i$. Samples across designs, however, may be dependent for a fixed replication number $j$, but are otherwise independent. Denote the sample mean for design $i$ based on $m$ replications by

$$\bar J_i(m) = \frac{1}{m} \sum_{j=1}^{m} J_{ij}.$$

The design with the largest sample mean is chosen as the best design; thus, correct selection occurs if design 1 has the largest sample mean. The problem of optimal allocation of a computing budget is to decide the number of samples for each design in order to maximize the probability of correct selection, or equivalently to minimize the probability of false selection.

Let $p_i(n)$ denote the proportion of simulation replications (samples) allocated to design $i$ out of the computing budget, expressed as the total number of replications $n$, where $p_i(n) \ge 0$, $\sum_{i=1}^{k} p_i(n) = 1$, and each $p_i(n) n$ is an integer. Thus, $(J_{i,1}, J_{i,2}, \dots, J_{i, p_i(n) n})$ denotes the samples for design $i$. False selection occurs if $\bar J_1(p_1(n) n)$ is not the largest sample mean, so we define the probability of false selection by

$$P(FS) \equiv P\Big( \bar J_1(p_1(n) n) \le \max_{2 \le i \le k} \bar J_i(p_i(n) n) \Big). \qquad (1)$$

The problem of interest is to find a probability vector $p(n) = (p_1(n), \dots, p_k(n))$ that minimizes (1).
As mentioned in the introduction, we first use the large deviations framework to identify the rate function of $P(FS)$ as $n \to \infty$. We then find a probability vector $p^* \in [0, 1]^k$ that optimizes this rate function, which acts as a surrogate for $P(FS)$. After providing a numerical procedure for finding $p^*$, we prove convergence of the finite-budget optimal allocation $p^*(n)$ to $p^*$ and characterize the rate of convergence.
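To make the quantity in (1) concrete, the following minimal sketch estimates $P(FS)$ by brute-force Monte Carlo for a fixed allocation. It is an illustration only and not part of the analysis: the sampling model (unit-variance Gaussian outputs that share one common factor per replication, giving pairwise correlation `rho` within a replication), the function name `estimate_pfs`, and all parameter values are assumptions made for this example.

```python
import random


def estimate_pfs(means, alloc, n, rho, reps=500, seed=0):
    """Monte Carlo estimate of the false-selection probability in (1).

    Assumed toy model (not from the paper): J_ij = mu_i + sqrt(rho) * C_j
    + sqrt(1 - rho) * e_ij, with C_j and e_ij independent standard normals.
    Samples sharing a replication index j are correlated through C_j;
    samples from different replications are independent.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [round(a * n) for a in alloc]  # p_i(n) * n, assumed >= 1
    n_max = max(counts)
    w_common = rho ** 0.5
    w_noise = (1.0 - rho) ** 0.5
    false_selections = 0
    for _ in range(reps):
        sums = [0.0] * k
        for j in range(n_max):
            common = rng.gauss(0.0, 1.0)  # shared by all designs at replication j
            for i in range(k):
                if j < counts[i]:
                    noise = rng.gauss(0.0, 1.0)
                    sums[i] += means[i] + w_common * common + w_noise * noise
        averages = [sums[i] / counts[i] for i in range(k)]
        # Design 0 plays the role of design 1 in the text (largest true mean).
        if averages[0] <= max(averages[1:]):
            false_selections += 1
    return false_selections / reps
```

For example, `estimate_pfs([0.1, 0.0, 0.0], [1/3, 1/3, 1/3], 30, rho=0.0)` and the same call with `rho=1.0` can be compared to see how strongly positive dependence can reduce the false-selection probability under an equal allocation.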

Note that (1) is lower bounded by

$$\max_{2 \le i \le k} P\big( \bar J_1(p_1(n) n) \le \bar J_i(p_i(n) n) \big)$$

and is upper bounded by

$$(k - 1) \max_{2 \le i \le k} P\big( \bar J_1(p_1(n) n) \le \bar J_i(p_i(n) n) \big).$$

Therefore, if for $2 \le i \le k$,

$$\lim_{n \to \infty} \frac{1}{n} \log P\big( \bar J_1(p_1(n) n) \le \bar J_i(p_i(n) n) \big) = -R_i(p_1, p_i) \qquad (2)$$

for some rate function $R_i(\cdot, \cdot)$, then

$$\lim_{n \to \infty} \frac{1}{n} \log P(FS) = -\min_{2 \le i \le k} R_i(p_1, p_i). \qquad (3)$$

Thus, asymptotically our optimization problem reduces to finding a probability vector $p^* = (p_1^*, \dots, p_k^*)$ that solves the optimization problem

$$\max_{(p_1, \dots, p_k)} \ \min_{2 \le i \le k} R_i(p_1, p_i). \qquad (4)$$

We determine $R_i(p_1, p_i)$ using the Gartner-Ellis Theorem (see Dembo and Zeitouni 1998). Some notation and assumptions are needed before we do this.

2.1 Notation

Throughout, we use $J_i$ to denote the generic design $i$ random variable for $J_{ij}$, $j = 1, 2, \dots$ Let $\Lambda_i : \mathbb{R}^2 \to \mathbb{R}$ denote the log-moment generating function of $(J_1, J_i)$, i.e.,

$$\Lambda_i(x, y) = \log E\big[ e^{x J_1 + y J_i} \big],$$

and let $I_i : \mathbb{R}^2 \to \mathbb{R}_+$ denote the Fenchel-Legendre transform of $\Lambda_i$, i.e.,

$$I_i(x_1, x_i) = \sup_{\lambda_1, \lambda_i} \big( \lambda_1 x_1 + \lambda_i x_i - \Lambda_i(\lambda_1, \lambda_i) \big).$$

Let $D_{\Lambda_i} \equiv \{ \lambda \in \mathbb{R}^2 : \Lambda_i(\lambda) < \infty \}$. Let $A^o$ denote the interior of a set $A$. It is well known that $\Lambda_i$ is convex and $C^\infty$ throughout $D^o_{\Lambda_i}$ (see Dembo and Zeitouni 1998). Let $F_i \equiv \{ \nabla \Lambda_i(\lambda) : \lambda \in D^o_{\Lambda_i} \}$. For $x \in F_i \subset \mathbb{R}^2$, let $\lambda_x$ denote the solution to $\nabla \Lambda_i(\lambda) = x$.

Then, $I_i(x) = \lambda_x^T x - \Lambda_i(\lambda_x)$, so that $I_i(x) < \infty$. It is also well known that $I_i$ is convex and differentiable along the interior of $F_i$, $I_i(\mu_1, \mu_i) = 0$, and $I_i(x) \ge 0$ for all $x \in \mathbb{R}^2$ for $2 \le i \le k$ (see Dembo and Zeitouni 1998). For notational convenience, we introduce the following convention to denote partial derivatives of any function $f : \mathbb{R}^2 \to \mathbb{R}$:

$$f^{(j,k)}(x, y) \equiv \frac{\partial^{j+k} f(x, y)}{\partial x^j \, \partial y^k}.$$

We make the following key assumption:

Assumption 1. For each $i = 2, 3, \dots, k$,
(i) $(0, 0) \in D^o_{\Lambda_i}$;
(ii) there exist $\gamma_i > 0$, $\beta_i > 0$ such that $(-\gamma_i, 0), (0, \beta_i) \in D^o_{\Lambda_i}$, $\Lambda_i^{(1,0)}(-\gamma_i, 0) = \mu_i$, $\Lambda_i^{(0,1)}(0, \beta_i) = \mu_1$; and
(iii) $J_1 - J_i$ is a non-degenerate random variable.

Condition (i) ensures that each $J_i$ is light tailed, i.e., both its left and right tails decay at least exponentially fast. Condition (ii) implies $P(J_1 < \mu_i) > 0$, since if $P(J_1 \ge \mu_i) = 1$, no such $\gamma_i$ exists; similarly, it also implies $P(J_i > \mu_1) > 0$. Condition (iii) is true in practically all cases of interest; even in those cases where it does not hold, it is easy to modify the analysis and the proposed algorithm to handle the situation.

2.2 Determining the Rate Function $R_i(p_1, p_i)$

Let

$$Z^{(n)}_{(i, p_1(n), p_i(n))} = \bar J_i(n p_i(n)) - \bar J_1(n p_1(n)),$$

and let

$$\Lambda^{(n)}_{(i, p_1(n), p_i(n))}(x) = \log E\Big[ e^{x Z^{(n)}_{(i, p_1(n), p_i(n))}} \Big]$$

denote its log-moment generating function. Since the vectors $(J_{ij} : i = 1, \dots, k)$ are i.i.d. across replications $j$, we have

$$\Lambda^{(n)}_{(i, p_1(n), p_i(n))}(\lambda) = \begin{cases} n p_i(n) \Lambda_i\big( -\tfrac{\lambda}{n p_1(n)}, \tfrac{\lambda}{n p_i(n)} \big) + n (p_1(n) - p_i(n)) \Lambda_i\big( -\tfrac{\lambda}{n p_1(n)}, 0 \big), & p_1(n) \ge p_i(n), \\ n p_1(n) \Lambda_i\big( -\tfrac{\lambda}{n p_1(n)}, \tfrac{\lambda}{n p_i(n)} \big) + n (p_i(n) - p_1(n)) \Lambda_i\big( 0, \tfrac{\lambda}{n p_i(n)} \big), & p_i(n) \ge p_1(n). \end{cases}$$

Note that $Z^{(n)}_{(i, p_1(n), p_i(n))}$ is only well defined when $\{ n p_i(n) \}$ are integers, and $\Lambda^{(n)}_{(i, p_1(n), p_i(n))}(\lambda)$ is well defined when $\{ n p_i(n) \}$ are positive. We further assume that $p_i(n) \to p_i > 0$ as $n \to \infty$ for some probability vector $\{ p_i \}$. We take this assumption as given for now, and we will show in Section 4 that it always holds. It follows from this assumption that

$$\lim_{n \to \infty} \frac{\Lambda^{(n)}_{(i, p_1(n), p_i(n))}(n \lambda)}{n} = \Lambda_{(i, p_1, p_i)}(\lambda),$$

where

$$\Lambda_{(i, p_1, p_i)}(\lambda) = \begin{cases} p_i \Lambda_i\big( -\tfrac{\lambda}{p_1}, \tfrac{\lambda}{p_i} \big) + (p_1 - p_i) \Lambda_i\big( -\tfrac{\lambda}{p_1}, 0 \big), & p_i \le p_1, \\ p_1 \Lambda_i\big( -\tfrac{\lambda}{p_1}, \tfrac{\lambda}{p_i} \big) + (p_i - p_1) \Lambda_i\big( 0, \tfrac{\lambda}{p_i} \big), & p_i \ge p_1. \end{cases} \qquad (5)$$

Since $D^o_{\Lambda_i}$ is non-empty, $D^o_{\Lambda_{(i, p_1, p_i)}}$ is non-empty. Let $F_{(i, p_1, p_i)} \equiv \{ \Lambda'_{(i, p_1, p_i)}(\lambda) : \lambda \in D^o_{\Lambda_{(i, p_1, p_i)}} \}$. To characterize the rate function, we need the following additional assumption:

Assumption 2. For each $i = 2, 3, \dots, k$, $0 \in F^o_{(i, p_1, p_i)}$ for all $p_1, p_i > 0$.

In other words, $\Lambda'_{(i, p_1, p_i)}$ has a zero in $D^o_{\Lambda_{(i, p_1, p_i)}}$, which we denote by $\lambda_i(p_1, p_i)$. We now show that it is unique and satisfies some other properties useful for our analysis.

Definition 1. A function $f$ is said to be homogeneous if $f(\alpha x) = \alpha f(x)$ for $\alpha > 0$.

Lemma 1. Under Assumptions 1 and 2, $\lambda_i(p_1, p_i)$ is a unique homogeneous function. Furthermore, it is positive for $p_1, p_i > 0$.

Proof: Assumption 2 implies the existence of $\lambda_i(p_1, p_i)$ such that $\Lambda'_{(i, p_1, p_i)}(\lambda_i(p_1, p_i)) = 0$, and its uniqueness follows from the strict convexity of $\Lambda_{(i, p_1, p_i)}$, which we now establish. $\Lambda_i(-\lambda / p_1, \lambda / p_i)$ is a convex function of $\lambda$, as it is the log-moment generating function of $J_i / p_i - J_1 / p_1$ evaluated at $\lambda$. Also, $\Lambda_i(0, x)$ and $\Lambda_i(x, 0)$ are strictly convex functions of $x$ in the interior of their domain of finiteness (see Dembo and Zeitouni 1993). Therefore, for $p_1 \ne p_i$, $\Lambda_{(i, p_1, p_i)}$ is strictly convex throughout $D^o_{\Lambda_{(i, p_1, p_i)}}$; for $p_1 = p_i$, strict convexity follows from Assumption 1, which ensures that $J_1 - J_i$ is non-degenerate. It is easy to check that $\Lambda'_{(i, p_1, p_i)}(0) = \mu_i - \mu_1 < 0$. Hence, $\lambda_i(p_1, p_i) > 0$. By differentiating Equation (5) and setting the derivative equal to zero, we have, writing $\lambda_i$ for $\lambda_i(p_1, p_i)$,

$$\frac{p_i}{p_1} \Lambda_i^{(1,0)}\Big( -\frac{\lambda_i}{p_1}, \frac{\lambda_i}{p_i} \Big) + \frac{p_1 - p_i}{p_1} \Lambda_i^{(1,0)}\Big( -\frac{\lambda_i}{p_1}, 0 \Big) = \Lambda_i^{(0,1)}\Big( -\frac{\lambda_i}{p_1}, \frac{\lambda_i}{p_i} \Big), \quad p_i \le p_1,$$

$$\Lambda_i^{(1,0)}\Big( -\frac{\lambda_i}{p_1}, \frac{\lambda_i}{p_i} \Big) = \frac{p_1}{p_i} \Lambda_i^{(0,1)}\Big( -\frac{\lambda_i}{p_1}, \frac{\lambda_i}{p_i} \Big) + \frac{p_i - p_1}{p_i} \Lambda_i^{(0,1)}\Big( 0, \frac{\lambda_i}{p_i} \Big), \quad p_1 \le p_i.$$

It follows that for any $\alpha > 0$, $\lambda_i(\alpha p_1, \alpha p_i) = \alpha \lambda_i(p_1, p_i)$.

The lemma means that Assumption 2 is essentially equivalent to requiring the existence of $\lambda_i(1, p)$ satisfying $\Lambda'_{(i, 1, p)}(\lambda_i(1, p)) = 0$ for each $p > 0$. We can now state the main result characterizing the rate function.

Theorem 1. Under Assumptions 1 and 2,

$$R_i(p_1, p_i) = I_{(i, p_1, p_i)}(0) = -\Lambda_{(i, p_1, p_i)}(\lambda_i(p_1, p_i))$$

satisfies (2), so that (3) holds with this rate function.

Proof: Under Assumptions 1 and 2, using the Gartner-Ellis Theorem (see Dembo and Zeitouni 1998, Section 2.3, including Lemma 2.3.9), it can be seen that for each $i \ge 2$, $(Z^{(n)}_{(i, p_1(n), p_i(n))} : n \ge 0)$ satisfies the large deviations principle, so that

$$\lim_{n \to \infty} \frac{1}{n} \log P\big( Z^{(n)}_{(i, p_1(n), p_i(n))} \ge 0 \big) = -\inf_{x \ge 0} I_{(i, p_1, p_i)}(x), \qquad (6)$$

where $I_{(i, p_1, p_i)}(x) = \sup_{\lambda \in \mathbb{R}} \big( \lambda x - \Lambda_{(i, p_1, p_i)}(\lambda) \big)$. Now, $I_{(i, p_1, p_i)}$ is strictly convex and $C^\infty$ in the set $F^o_{(i, p_1, p_i)}$ (see Dembo and Zeitouni 1998), and equal to zero at $x = \mu_i - \mu_1 < 0$. Therefore,

$$R_i(p_1, p_i) = \inf_{x \ge 0} I_{(i, p_1, p_i)}(x) = I_{(i, p_1, p_i)}(0) = -\inf_{\lambda} \Lambda_{(i, p_1, p_i)}(\lambda) = -\Lambda_{(i, p_1, p_i)}(\lambda_i(p_1, p_i)).$$

Recall our optimization problem (4). If each $R_i(\cdot, \cdot)$ were a concave function, our optimization problem would be a concave optimization problem and first-order conditions would be necessary and sufficient for optimality. As noted in Glynn and Juneja (2004) and Xiong (2005), this is true when the samples of different designs are independently generated. However, when there is dependence among samples, $R_i(\cdot, \cdot)$ need not be a concave function, making the optimization problem much harder to solve. In Section 3.2, this lack of concavity is illustrated in Example 3.

Note that if for each $i \ge 2$, $J_1$ and $J_i$ have positive dependence, i.e.,

$$P(J_1 \le x, J_i \le y) \ge P(J_1 \le x) P(J_i \le y), \quad \forall x, y, \qquad (7)$$

then the optimization problem (4) gives a better solution compared to the case where $J_1$ and $J_i$ are independent or negatively dependent, where the latter corresponds to the reversal of the inequality in (7). To see this, note that under positive dependence, for any $\lambda, \mu > 0$, $e^{-\lambda J_1}$ and $e^{\mu J_i}$ are negatively dependent, i.e.,

$$P(e^{-\lambda J_1} \le x, e^{\mu J_i} \le y) \le P(e^{-\lambda J_1} \le x) P(e^{\mu J_i} \le y), \quad \forall x, y.$$

This implies that (Lemma 2.1 in Jin et al. 2003)

$$E e^{-\lambda J_1 + \mu J_i} \le E e^{-\lambda J_1} \, E e^{\mu J_i}.$$

With simple algebra, it can be seen that $R_i(p_1, p_i)$ dominates the corresponding $R_i(p_1, p_i)$ under independence for any non-negative $p_1$ and $p_i$, and the claim follows.

We now illustrate some properties of $R_i(\cdot, \cdot)$ that are useful in designing algorithms to numerically solve (4).

Lemma 2. For all $i \ge 2$, $R_i(p_1, p_i)$ is increasing in $p_i$ when $p_i \le p_1$, and increasing in $p_1$ when $p_i \ge p_1$.

Proof: Recall that $R_i(p_1, p_i) = -\Lambda_{(i, p_1, p_i)}(\lambda_i(p_1, p_i))$ and

$$\frac{\partial \Lambda_{(i, p_1, p_i)}(\lambda)}{\partial \lambda} \Big|_{\lambda = \lambda_i(p_1, p_i)} = 0.$$

If we fix $p_1 > 0$ and let $p_i < p_1$, we have

$$R_i^{(0,1)}(p_1, p_i) = -\frac{\partial \Lambda_{(i, p_1, p_i)}(\lambda)}{\partial p_i} \Big|_{\lambda = \lambda_i(p_1, p_i)}.$$

For notational convenience, we denote $\lambda_i(p_1, p_i)$ as $\lambda_i$ in this proof. Then

$$R_i^{(0,1)}(p_1, p_i) = -\Lambda_i\Big( -\frac{\lambda_i}{p_1}, \frac{\lambda_i}{p_i} \Big) + \frac{\lambda_i}{p_i} \Lambda_i^{(0,1)}\Big( -\frac{\lambda_i}{p_1}, \frac{\lambda_i}{p_i} \Big) + \Lambda_i\Big( -\frac{\lambda_i}{p_1}, 0 \Big). \qquad (8)$$

Consider the function $G(x) \equiv \Lambda_i(-\lambda_i / p_1, x)$. It is easily seen that $G''(x) > 0$ for any $x$. We may rewrite the right-hand side of (8) as

$$G(0) - G\Big( \frac{\lambda_i}{p_i} \Big) + \frac{\lambda_i}{p_i} G'\Big( \frac{\lambda_i}{p_i} \Big) = \frac{\lambda_i^2}{2 p_i^2} G''\Big( \theta \frac{\lambda_i}{p_i} \Big) \ge 0,$$

where the equality is by a Taylor expansion of $G(0)$ around $\lambda_i / p_i$ and $\theta \in [0, 1]$. This establishes the first part. Applying similar arguments to $p_i > p_1$, the second part can be established similarly.

Recall that $R_i(p_1, p_i)$ denotes the rate function associated with $P(FS)$ when only designs 1 and $i$ are considered, $p_1 n$ samples are allocated to design 1, and $p_i n$ samples are allocated to design $i$ (here we assume that $p_1 n$ and $p_i n$ are integers to facilitate the discussion). Then $R_i(p_1, p_i)$ is a surrogate measure for $P(FS)$, in that the larger its value, the smaller $P(FS)$ is asymptotically. Lemma 2 is therefore intuitively plausible: increasing the number of samples of the design that has been assigned fewer samples lowers $P(FS)$ asymptotically. However, as our analysis indicates, this monotonicity may break down when the samples of the design assigned more samples are increased further. For instance, $R_i(p_1, p_i)$ may decrease as $p_i$ is increased when $p_i > p_1$. This may happen when $J_1$ and $J_i$ are highly positively dependent, so that $J_1 - J_i$ has very little variability. Under such dependence, consider the case where $p_1 = p_i$. Then the probability

$$P\Big( \frac{1}{p_1 n} \sum_{k=1}^{p_1 n} J_{1k} \le \frac{1}{p_i n} \sum_{k=1}^{p_i n} J_{ik} \Big) = P\Big( \frac{1}{p_1 n} \sum_{k=1}^{p_1 n} (J_{1k} - J_{ik}) \le 0 \Big)$$

may be quite small. However, if $p_i$ is increased further, then this probability becomes

$$P\Big( \frac{1}{p_1 n} \sum_{k=1}^{p_1 n} J_{1k} \le \frac{p_1}{p_i} \cdot \frac{1}{p_1 n} \sum_{k=1}^{p_1 n} J_{ik} + \frac{p_i - p_1}{p_i} \cdot \frac{1}{(p_i - p_1) n} \sum_{k = p_1 n + 1}^{p_i n} J_{ik} \Big). \qquad (9)$$

Thus, as $p_i$ increases, the weight given to the dependent samples decreases while that given to the independent samples increases. This suggests that in certain cases, $R_i(p_1, p_i)$ may decrease as $p_i$ is increased from $p_1$, due to added noise from the $(p_i - p_1) n$ independent samples. Equation (9) suggests that as $p_i \to \infty$, we may expect $R_i(p_1, p_i)$ to converge to the rate function of

$$P\Big( \frac{1}{p_1(n) n} \sum_{k=1}^{p_1(n) n} J_{1k} \le \mu_i \Big).$$

We show this in Lemma 4, and establish some other properties of $R_i(p_1, p_i)$ useful to our analysis in Lemmas 3 and 4.

Lemma 3. Under Assumptions 1 and 2, $R_i(\cdot, \cdot)$ is a homogeneous function.

Proof: Follows from Lemma 1 and Theorem 1, applying the definition of $\Lambda_{(i, p_1, p_i)}(\lambda_i(p_1, p_i))$ in (5).

Thus, it suffices to focus on $R_i(1, p)$ for $p > 0$ to understand the behavior of $R_i(p_1, p_i)$. For notational simplicity, let $\lambda_i(p)$ denote $\lambda_i(1, p)$ and $R_i(p)$ denote $R_i(1, p)$.
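The characterization in Theorem 1 and the homogeneity in Lemma 3 can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes a bivariate Gaussian pair $(J_1, J_i)$ (so the log-moment generating function is finite everywhere), evaluates (5), and minimizes it over $\lambda$ with a simple ternary search. All function names and parameter values are hypothetical. The closed form used for comparison follows by substituting the Gaussian log-moment generating function into (5) and minimizing the resulting quadratic in $\lambda$.

```python
def gaussian_log_mgf(mu1, mui, s1, si, rho):
    """Log-MGF of a bivariate Gaussian (J_1, J_i) (assumed example model)."""
    def log_mgf(x, y):
        quad = s1 * s1 * x * x + 2.0 * rho * s1 * si * x * y + si * si * y * y
        return mu1 * x + mui * y + 0.5 * quad
    return log_mgf


def limit_log_mgf(log_mgf, p1, pi, lam):
    """Limiting log-MGF of Equation (5)."""
    a, b = -lam / p1, lam / pi
    if pi <= p1:
        return pi * log_mgf(a, b) + (p1 - pi) * log_mgf(a, 0.0)
    return p1 * log_mgf(a, b) + (pi - p1) * log_mgf(0.0, b)


def rate(log_mgf, p1, pi, tol=1e-10):
    """R_i(p1, pi) = -min over lambda of (5), via ternary search.

    Relies on (5) being strictly convex with negative slope at 0,
    so the minimizer is positive.
    """
    g = lambda lam: limit_log_mgf(log_mgf, p1, pi, lam)
    hi = 1.0
    while g(2.0 * hi) < g(hi):  # grow until the minimizer is bracketed
        hi *= 2.0
    hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return -g(0.5 * (lo + hi))


def gaussian_rate_closed_form(mu1, mui, s1, si, rho, p1, pi):
    """Closed form for the Gaussian case, used only as a cross-check."""
    v = s1 * s1 / p1 + si * si / pi - 2.0 * rho * s1 * si / max(p1, pi)
    return (mu1 - mui) ** 2 / (2.0 * v)
```

For instance, with `lm = gaussian_log_mgf(1.0, 0.0, 1.0, 2.0, 0.5)`, the value `rate(lm, 0.6, 1.0)` should be twice `rate(lm, 0.3, 0.5)`, in line with Lemma 3.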

Let, for $p \ge 1$,

$$f_i(\lambda, p) = \Lambda_{(i, 1, p)}(\lambda) = \Lambda_i\Big( -\lambda, \frac{\lambda}{p} \Big) + (p - 1) \Lambda_i\Big( 0, \frac{\lambda}{p} \Big). \qquad (10)$$

Then, $R_i(p) = -f_i(\lambda_i(p), p)$. Also, as discussed before, $\lambda_i(p)$ is the unique solution to $f_i^{(1,0)}(\lambda_i(p), p) = 0$. Therefore, from the implicit function theorem, it follows that

$$\lambda_i'(p) = -\frac{f_i^{(1,1)}(\lambda_i(p), p)}{f_i^{(2,0)}(\lambda_i(p), p)}.$$

It is also easy to see that $R_i'(p) = -f_i^{(0,1)}(\lambda_i(p), p)$. Furthermore,

$$R_i''(p) = -\frac{f_i^{(2,0)}(\lambda_i(p), p) \, f_i^{(0,2)}(\lambda_i(p), p) - f_i^{(1,1)}(\lambda_i(p), p)^2}{f_i^{(2,0)}(\lambda_i(p), p)}.$$

Lemma 2 and Lemma 3 imply that $R_i(p)$ is increasing when $p < 1$. To determine the monotonicity of $R_i(p)$ on $(1, \infty)$, we need to analyze the first and second derivatives of $R_i(p)$ and its behavior as $p \to \infty$, which is why both derivatives are recorded above. The following lemma gives the asymptotic behavior of $R_i(p)$ and $\lambda_i(p)$.

Lemma 4. Under Assumptions 1 and 2,

$$\lambda_i(p) \to \gamma_i \qquad (11)$$

as $p \to \infty$, and

$$\frac{\lambda_i(p)}{p} \to \beta_i \qquad (12)$$

as $p \to 0$. Furthermore,

$$R_i(\infty) = \lim_{p \to \infty} R_i(p) = -\Lambda_i(-\gamma_i, 0) - \gamma_i \mu_i = \inf_x I_i(\mu_i, x), \qquad (13)$$

$$\lim_{p \to \infty} R_i'(p) = 0, \qquad (14)$$

and $R_i(0+) = 0$.

The proof of Lemma 4 is given in the Appendix. These properties will be used in the three examples presented in the next section. Since under Assumptions 1 and 2, $R_i(0+) = 0$, we set $R_i(0) = 0$ henceforth.
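As a sanity check on Lemma 4, the limits can be worked out in closed form in the bivariate Gaussian case. This is an illustrative special case only; the symbols $\sigma_1$, $\sigma_i$, $\rho$, $d_i$ and $v_i(p)$ are introduced here for the example and do not appear in the general analysis.

```latex
% Assumed example: (J_1, J_i) bivariate Gaussian with variances \sigma_1^2, \sigma_i^2,
% correlation \rho, and gap d_i = \mu_1 - \mu_i > 0.
\begin{align*}
\Lambda_i(x, y) &= \mu_1 x + \mu_i y
   + \tfrac12\left(\sigma_1^2 x^2 + 2\rho\sigma_1\sigma_i\, x y + \sigma_i^2 y^2\right), \\
\Lambda_{(i,1,p)}(\lambda) &= -d_i \lambda + \tfrac12\, v_i(p)\, \lambda^2,
   \qquad v_i(p) = \sigma_1^2 + \frac{\sigma_i^2}{p} - \frac{2\rho\sigma_1\sigma_i}{\max(1, p)}, \\
\lambda_i(p) &= \frac{d_i}{v_i(p)}, \qquad R_i(p) = \frac{d_i^2}{2\, v_i(p)}. \\[4pt]
p \to \infty:&\quad \lambda_i(p) \to \frac{d_i}{\sigma_1^2} = \gamma_i, \qquad
   R_i(p) \to \frac{d_i^2}{2\sigma_1^2} = -\Lambda_i(-\gamma_i, 0) - \gamma_i \mu_i, \qquad
   R_i'(p) \to 0, \\
p \to 0:&\quad \frac{\lambda_i(p)}{p} \to \frac{d_i}{\sigma_i^2} = \beta_i, \qquad
   R_i(p) \to 0 .
\end{align*}
```

Here $\gamma_i$ and $\beta_i$ solve $\Lambda_i^{(1,0)}(-\gamma_i, 0) = \mu_1 - \sigma_1^2 \gamma_i = \mu_i$ and $\Lambda_i^{(0,1)}(0, \beta_i) = \mu_i + \sigma_i^2 \beta_i = \mu_1$, as required in Assumption 1(ii). The same formula shows that, for fixed $p$, a larger $\rho$ reduces $v_i(p)$ and so increases $R_i(p)$.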

3. Solving the Asymptotic Optimization Problem

In this section, we first reduce the asymptotic optimization problem (4) to a single-variable optimization problem. We then develop a numerical procedure to solve this problem.

3.1 Transformation to a Single Variable Problem

Let $\Omega = \{ 2, 3, \dots, k \}$. Setting $x_i = p_i / p_1$ for $i \in \Omega$, we can rewrite the budget constraint as $p_1 (1 + \sum_{i \in \Omega} x_i) = 1$, and, by the homogeneity of $R_i$ (Lemma 3), problem (4) as

$$\max_{(x_j \ge 0 : j \in \Omega)} \ \min_{i \in \Omega} \frac{R_i(x_i)}{1 + \sum_{i \in \Omega} x_i}. \qquad (15)$$

Recall that $R_i(\cdot)$ need not be a monotone function. Set $R_i^{-1}(z) \equiv \inf\{ x > 0 \mid R_i(x) = z \}$. If $(x_i^* : i \in \Omega)$ solves (15), then the solution to (4) corresponds to

$$p_1^* = \frac{1}{1 + \sum_{i \in \Omega} x_i^*}, \qquad p_i^* = p_1^* x_i^*.$$

The following theorem characterizes such a solution.

Theorem 2. Under Assumptions 1 and 2, if $(x_i^* : i \in \Omega)$ denotes an optimal solution to the optimization problem (15), then for each $i \in \Omega$, the following hold:
1. $0 < x_i^* < \infty$.
2. There exists $z^* > 0$ such that $R_i(x_i^*) = z^*$ for all $i \in \Omega$. Furthermore, $x_i^* = R_i^{-1}(z^*)$.
3. If $x_i^* \ne 1$, then $R_i'(x_i^*) \ge 0$.
4. If $z^* \le R_i(1)$, then $x_i^* \le 1$.

Proof: Under Assumptions 1 and 2, $R_i(0) = 0$, so it is clear that each $x_i^* > 0$. Similarly, since for any $x_i = \infty$ the objective function value equals zero, it follows that at an optimal point, each $x_i^* < \infty$.

To see conclusion 2, first note that $R_i(x_i^*)$ must be equal for each $i \in \Omega$. Otherwise, suppose that $R_i(x_i^*) < R_j(x_j^*)$ for some $i, j \in \Omega$. Then a small reduction in $x_j^*$ does not change the numerator of the objective function, but it reduces the denominator, contradicting the optimality of $(x_i^* : i \in \Omega)$. Let $z^* = R_i(x_i^*)$. It is easy to see that $x_i^* = R_i^{-1}(z^*)$. Otherwise, $x_i^* > R_i^{-1}(z^*)$, and a better solution to (15) is obtained by replacing $x_i^*$ with $R_i^{-1}(z^*)$.

To see conclusion 3, note that if $R_i'(x_i^*) < 0$, then there exists an $x < x_i^*$ such that $R_i(x) > R_i(x_i^*)$. Since $R_i(0) = 0$ and $R_i(\cdot)$ is a continuous function, it follows that there exists an $\tilde x$ with $0 < \tilde x < x < x_i^*$ such that $R_i(\tilde x) = R_i(x_i^*)$. This contradicts the fact that $x_i^* = R_i^{-1}(z^*)$.

Conclusion 4 now follows from Lemma 2.

It can be seen that $R_i^{-1}$ is continuous for Gaussian distributions (see Fu et al. 2006), but for non-Gaussian distributions $R_i^{-1}$ can be discontinuous when $R_i$ is not monotone. This is illustrated in Example 3 (Figure 1: $a = 0.95$, $b = 1.05$ case) in the next section. The next lemma states some features of the function $R_i^{-1}$.

Lemma 5. Under Assumptions 1 and 2,
1. $R_i^{-1}$ is strictly increasing.
2. $R_i^{-1}$ is left continuous.
3. If $R_i^{-1}$ is discontinuous at $z_0$ with $x_0 = R_i^{-1}(z_0)$, then $R_i'(x_0) = 0$.

Proof.
1. Suppose $x_0 = R_i^{-1}(z_0) > 0$. Then for arbitrary $z_1 \in (0, z_0)$, there exists $x_1 \in (0, x_0)$ such that $R_i(x_1) = z_1$ by continuity of $R_i$. Hence, $R_i^{-1}(z_1) \le x_1 < x_0$.
2. Let $R_i^{-1}(z_0) = x_0$ and $z_n \uparrow z_0$. Then there exists $x_1 = \lim_{n \to \infty} R_i^{-1}(z_n)$. By continuity of $R_i$, we have $z_0 = \lim_{n \to \infty} z_n = \lim_{n \to \infty} R_i(R_i^{-1}(z_n)) = R_i(x_1)$. By the monotonicity above, we have $x_1 \le x_0$. Then the definition of $R_i^{-1}$ implies $x_1 = x_0$.
3. Discontinuity at $z_0$ implies that there exists $x_1 > x_0$ such that $R_i^{-1}(z) \ge x_1$ whenever $z > z_0$. The above two features imply $R_i'(x_0) \ge 0$. Suppose $R_i'(x_0) > 0$. Then there exists $x_2 \in (x_0, x_1)$ such that $z_2 = R_i(x_2) > z_0$. Now we have $R_i^{-1}(z_2) \le x_2 < x_1$, a contradiction.

Let $R_i^b \equiv \sup_{x \ge 0} R_i(x)$. Obviously, $z^* \le R^b \equiv \min_{i \in \Omega} R_i^b$. Now we can rewrite problem (15) as a single-variable nonlinear programming (NLP) problem

$$\min_{0 \le z \le R^b} F(z), \qquad (16)$$

where

$$F(z) = \frac{1 + \sum_{i=2}^{k} R_i^{-1}(z)}{z}. \qquad (17)$$

Once we find an optimizer $z^*$ of this problem, we can easily obtain $\{ p_i^* \}$ via

$$p_1^* = \frac{1}{1 + \sum_{i \in \Omega} R_i^{-1}(z^*)}, \qquad p_i^* = p_1^* R_i^{-1}(z^*). \qquad (18)$$
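For intuition, the objects in (16)-(18) take an explicit form when the designs are sampled independently from Gaussian distributions. This is offered as an illustrative special case only; the symbols $\sigma_1$, $\sigma_i$ and $d_i = \mu_1 - \mu_i$ are introduced for the example.

```latex
% Assumed example: independent Gaussian designs, d_i = \mu_1 - \mu_i > 0.
\begin{align*}
R_i(x) &= \frac{d_i^2}{2\left(\sigma_1^2 + \sigma_i^2 / x\right)}
   \quad \text{(strictly increasing in } x\text{)}, \qquad
   R_i^b = R_i(\infty) = \frac{d_i^2}{2\sigma_1^2}, \\
R_i^{-1}(z) &= \frac{\sigma_i^2}{d_i^2 / (2z) - \sigma_1^2},
   \qquad 0 < z < \frac{d_i^2}{2\sigma_1^2}, \\
F(z) &= \frac{1}{z}\left( 1 + \sum_{i=2}^{k} \frac{\sigma_i^2}{d_i^2/(2z) - \sigma_1^2} \right),
   \qquad 0 < z < R^b = \min_{i \in \Omega} \frac{d_i^2}{2\sigma_1^2}.
\end{align*}
```

In this case $R_i^{-1}$ is continuous, $F(z) \to \infty$ both as $z \to 0$ and as $z \uparrow R^b$, and any minimizer $z^*$ of $F$ yields the allocation through (18).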

Let $\tilde R_i^b = \max(R_i(1), R_i(\infty))$. Then, clearly $\tilde R_i^b \le R_i^b$. It is intuitively plausible that, at least in typical cases, $\tilde R_i^b = R_i^b$. To see this, recall that $R_i(x)$ serves as a surrogate for the probability

$$P\Big( \frac{1}{n} \sum_{k=1}^{n} J_{1k} \le \frac{1}{x n} \sum_{k=1}^{x n} J_{ik} \Big)$$

for large values of $n$. As proved in Lemma 2, $R_i(x)$ increases with $x$ for $x < 1$. However, it may decrease with $x$ for $x > 1$. It is intuitively plausible that either $R_i(x)$ increases with $x$ for all $x$, in which case its highest value is $R_i^b = R_i(\infty)$, computed in Lemma 4; or $R_i'(1+) < 0$, in which case it may either decrease with $x$ for all $x > 1$, or first decrease and then increase with $x$. Thus, in all such cases, $\tilde R_i^b = R_i^b$. In particular, in these cases, $R_i(\cdot)$ is quasi-convex in the region $(1, \infty)$. Next, we illustrate this with some examples.

3.2 Examples

We present some illustrative examples where $\tilde R_i^b = R_i^b$. This subsection can be skipped without loss of continuity.

Example 1. Suppose that

$$\Lambda_i^{(0,1)}(-\mu, \lambda) \ge \Lambda_i^{(0,1)}(0, \lambda)$$

for all $\mu \ge \lambda > 0$ with $(-\mu, \lambda) \in D_{\Lambda_i}$. In the case of Gaussian and Bernoulli distributions, this condition corresponds to negative correlation between $J_1$ and $J_i$. We know that $R_i(p) = -f_i(\lambda_i(p), p)$, where $f_i$ is given by Equation (10). Then we have

$$R_i'(p) = -f_i^{(0,1)}(\lambda_i(p), p) = \frac{\lambda_i(p)}{p^2} \Big[ \Lambda_i^{(0,1)}\Big( -\lambda_i(p), \frac{\lambda_i(p)}{p} \Big) + (p - 1) \Lambda_i^{(0,1)}\Big( 0, \frac{\lambda_i(p)}{p} \Big) \Big] - \Lambda_i\Big( 0, \frac{\lambda_i(p)}{p} \Big)$$

$$\ge \frac{\lambda_i(p)}{p} \Lambda_i^{(0,1)}\Big( 0, \frac{\lambda_i(p)}{p} \Big) - \Lambda_i\Big( 0, \frac{\lambda_i(p)}{p} \Big) > 0,$$

where the last step uses convexity of $\Lambda_i(0, \lambda)$. Hence $R_i(p)$ is always increasing. In particular, $\tilde R_i^b = R_i^b$.

Example 2. Suppose that

$$\Lambda_i^{(0,2)}(-\mu, \lambda) \le \Lambda_i^{(0,2)}(0, \lambda) \quad \text{and} \quad \Lambda_i^{(0,3)}(0, \lambda) \le 0$$

for all $\mu \ge \lambda > 0$ with $(-\mu, \lambda) \in D_{\Lambda_i}$. It can be seen that these conditions hold for a Gaussian distribution. In the appendix, we prove that $R_i$ is quasi-convex, so $\tilde R_i^b = R_i^b$. A non-Gaussian distribution where both conditions hold (proof in the appendix) is the following: $J_1 \sim U(-1, 1)$, $J_i = c J_1 - a + b U_i$, $c \in [0, 0.5]$, where the $U_i \sim U(-1, 1)$ are i.i.d. and independent of $J_1$.

Figure 1: Function $R_i(x_i)$, plotted against $x_i$ in three panels ($a = 0.8$, $b = 0.9$; $a = 0.8$, $b = 2$; $a = 0.8$, $b = $ ...).

Example 3. Consider $J_1$ and $J_i$ as in Example 2, with $c = 1$, $0 < a < 1$ and $a < b$. In this case, it is difficult to establish quasi-convexity of $R_i$ on $(1, \infty)$ analytically. However, numerically we observe this to be true for three sets of parameters (refer to Figure 1; see also the appendix for more details):

$1 > b > a$: $x_i^* \in (0, 1]$;
$b \gg 1$: Since $R_i$ is increasing to the right of 1, $x_i^* \in (0, \infty)$;
$b > 1 + o(1)$: Since $R_i^{-1}(R_i(x)) \ne x$ for any $x \in (1, 4.7]$, $x_i^* \notin (1, 4.7]$.

3.3 Numerical Procedure for solving the NLP

We now develop an efficient numerical procedure to solve the one-dimensional NLP

$$\min_{0 \le z \le \tilde R^b} F(z), \qquad (19)$$

where $\tilde R^b = \min_{i \in \Omega} \tilde R_i^b$ and $F$ is given by (17). Note that even if $R^b > \tilde R^b$, the solution to (19) still provides a good solution to (16). Straightforward algebra yields

$$M(z) \equiv z^2 F'(z) = \sum_{i \in \Omega} \frac{z}{R_i'(R_i^{-1}(z))} - 1 - \sum_{i \in \Omega} R_i^{-1}(z). \qquad (20)$$

If $F$ is smooth, we only need to locate all the roots of $M(z) = 0$ in the interval $[0, \tilde R^b]$. But $F$ is not differentiable at $R_i(1)$ unless $J_i$ and $J_1$ are independent. Also note that $F$ may not even be continuous when some $R_i$ is not monotone, as then $R_i^{-1}$ is discontinuous. So $M(z^*) = 0$ may not be satisfied, or even well defined, for each optimizer $z^*$ of problem (19). Now we define two sets as follows:

$$S_1 = \{ R_i(1) \mid i \in \Omega \} \cap [0, \tilde R^b), \qquad S_2 = \{ z \in (0, \tilde R^b) \mid F(\cdot) \text{ discontinuous at } z \}.$$

If we let $k = 4$ and set the functions $\{ R_i(\cdot), i = 2, 3, 4 \}$ to be the three cases from left to right in Figure 1, then we can show that $\tilde R^b = R_2(1)$, $S_1 = \{ R_3(1), R_4(1) \}$, and $S_2 = \{ R_4(1) \}$. Theorem 3 presents some necessary conditions that an optimizer $z^*$ satisfies.

Theorem 3. If $z^*$ is an optimizer of problem (19), one of the following three conditions has to be satisfied:
(i) $F'(z^*) = 0$, $z^* \notin S_1$, $0 < z^* < \tilde R^b$;
(ii) $F'(z^*-) \le 0$, $F'(z^*+) \ge 0$, $z^* \in S_1 \setminus (S_2 \cup \{ \tilde R^b \})$;
(iii) $F'(z^*-) \le 0$, $z^* \in S_1 \cap (S_2 \cup \{ \tilde R^b \})$.

Proof: Note that $F(0+) = \infty$, so 0 cannot be an optimizer. Now consider the point $\tilde R^b$, and assume $\tilde R^b \notin S_1$. If $R_i^{-1}(\tilde R^b) = \infty$ for some $i \in \Omega$, then the objective function $F(\tilde R^b) = \infty$, and hence $\tilde R^b$ cannot be optimal. Now assume $R_i^{-1}(\tilde R^b)$ is finite for all $i \in \Omega$. Recall that $\tilde R^b = \min_{i \in \Omega} \tilde R_i^b$. Thus, $\tilde R^b = \tilde R_i^b$ for some $i$, and since $\tilde R^b \notin S_1$, it follows that $\tilde R^b = R_i(\infty) > R_i(1)$. Therefore, $R_i^{-1}(\tilde R^b) = +\infty$. From (14), it follows that $R_i'(R_i^{-1}(\tilde R^b)) = 0$. Then we have $F'(\tilde R^b-) = \infty$, because

$$\frac{z}{R_i'(R_i^{-1}(z))} - R_i^{-1}(z) \to \infty \quad \text{as } z \uparrow \tilde R^b.$$

Therefore, $\tilde R^b$ cannot be optimal as long as $\tilde R^b \notin S_1$.

Let $\omega \in S_2 \setminus S_1$. Lemma 5 implies that $F$ is left continuous at $\omega$, and $R_i'(R_i^{-1}(\omega)) = 0$ for some $i \in \Omega$. So we have $F'(\omega-) = \infty$, and thus $\omega$ cannot be optimal.

Now if we assume $z^* \in (0, \tilde R^b) \setminus S_1$, we have $z^* \notin S_2$, i.e., $F$ is continuous and differentiable at $z^*$. So $F'(z^*) = 0$ has to be true. This is the first situation in the theorem.

For the case $z^* \in S_1 \setminus (S_2 \cup \{ \tilde R^b \})$, we know $F$ has both left and right derivatives at $z^*$, so $F'(z^*-) \le 0$ and $F'(z^*+) \ge 0$ have to hold. This is the second situation in the theorem. For the case $z^* \in S_1 \cap (S_2 \cup \{ \tilde R^b \})$, we know $F(\cdot)$ only has a left limit, so $F'(z^*-) \le 0$, which is the third condition in the theorem.

The following establishes a useful property of the optimal solution $\{ p_i^*, 1 \le i \le k \}$.

Lemma 6. Without loss of generality, assume $R_2(1) \le R_3(1) \le \cdots \le R_k(1)$. Then there exist $2 \le i_1 \le i_2 \le k$ such that

$$p_i^* > p_1^*, \quad 2 \le i < i_1; \qquad p_i^* = p_1^*, \quad i_1 \le i < i_2; \qquad p_i^* < p_1^*, \quad i_2 \le i \le k,$$

where $p_i^* = p_j^* = p_1^*$ for some $i_1 \le i, j < i_2$ only if $R_i(1) = R_j(1)$.

Proof: Suppose $R_i(1) < z^*$ for some $i$. Then $p_i^* / p_1^* = R_i^{-1}(z^*) > 1$, which implies $p_i^* > p_1^*$. The reverse inequality holds if $R_i(1) > z^*$. Since $p_i^* = p_1^*$ if and only if $R_i(1) = z^*$, we can prove the second part of the claim.

This feature is consistent with intuition. The order of the $R_i(1)$ reflects the order of simulation efficiency. A small (large) value implies low (high) efficiency, which then calls for more (less) allocation.

In order to locate the global optimizer of problem (16), we need to solve the following two problems:

P1. Calculate $F(z)$, or more precisely, $R_i^{-1}(z)$, for any given $R_i(1) < z \le \tilde R_i^b$.
P2. Find all $z$ satisfying the conditions in Theorem 3.

If we assume that $R_i(x)$ on $(1, \infty)$ only takes the three possible shapes illustrated in Example 3 of the previous section, P1 is straightforward to solve by calculating $R_i^{-1}(z)$ via numerical procedures such as the bisection method. P2 depends on our knowledge of the convexity of the function $F$. If all the samples follow Gaussian distributions, Fu et al. (2006) show that $F$ is piecewise convex. However, this may not be true in general settings.

Let $V$ denote the optimal value of problem (16). For an arbitrary $\epsilon > 0$, the following result provides a means to locate a solution with objective value in the interval $[V, V + \epsilon]$. Let $F_0 = F(\min_{i \in \Omega} R_i(1))$.

Theorem 4. Consider a sequence $\{ z_n \}$ where

$$z_1 = \frac{1}{F_0 - \epsilon} \quad \text{and} \quad z_{n+1} = \frac{F(z_n) z_n}{\min_{1 \le j \le n} F(z_j) \wedge F_0 - \epsilon},$$

where $a \wedge b = \min(a, b)$. For $\epsilon < V$, there exists an $m$ such that $z_m \ge \tilde R^b$ and $z_{m-1} < \tilde R^b$. In addition,

$$\min_{1 \le n \le m} F(z_n) \wedge F_0 \le V + \epsilon.$$
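The sequence in Theorem 4 can be sketched in a few lines of code. The example below is a hedged illustration, not the authors' implementation: to keep $R_i^{-1}$ in closed form it assumes independent Gaussian designs, for which $R_i(x) = d_i^2 / (2(\sigma_1^2 + \sigma_i^2 / x))$ with $d_i = \mu_1 - \mu_i$, and all function names and parameter values are hypothetical. In a general setting, `r_inverse` would be replaced by a numerical inversion (for example, bisection) of the estimated rate function.

```python
import math


def r_inverse(z, gap, sigma1, sigma_i):
    """R_i^{-1}(z) for independent Gaussian designs (assumed example model)."""
    cap = gap * gap / (2.0 * sigma1 * sigma1)  # R_i(infinity)
    if z >= cap:
        return math.inf
    return sigma_i * sigma_i / (gap * gap / (2.0 * z) - sigma1 * sigma1)


def make_objective(designs, sigma1):
    """F(z) of (17); designs is a list of (gap d_i, sigma_i) pairs for i >= 2."""
    def objective(z):
        total = sum(r_inverse(z, g, sigma1, s) for g, s in designs)
        return (1.0 + total) / z
    return objective


def theorem4_search(objective, z_start, z_cap, eps):
    """Iterate z_{n+1} = F(z_n) z_n / (best value so far - eps) until z_n >= z_cap.

    z_start plays the role of min_i R_i(1); eps must be smaller than the
    optimal value of F. Returns the best point found and its value.
    """
    best_z, best_f = z_start, objective(z_start)  # F_0
    z = 1.0 / (best_f - eps)                      # z_1
    while z < z_cap:
        fz = objective(z)
        if fz < best_f:
            best_z, best_f = z, fz
        z = fz * z / (best_f - eps)
    return best_z, best_f


def allocation_from_z(z, designs, sigma1):
    """Recover (p_1, ..., p_k) from z via (18)."""
    ratios = [r_inverse(z, g, sigma1, s) for g, s in designs]
    p1 = 1.0 / (1.0 + sum(ratios))
    return [p1] + [p1 * x for x in ratios]
```

With `designs = [(1.0, 1.0), (2.0, 1.0)]` and `sigma1 = 1.0`, one would take `z_start = 0.25` (the smaller of the two values $R_i(1)$) and `z_cap = 0.5`; the returned value is then within `eps` of the minimum of $F$, and `allocation_from_z` converts the returned `z` into an allocation. The step size shrinks with `eps`, so smaller tolerances cost more evaluations of $F$.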

18 Proof. Since < V, z n+1 = F (z n )z n min 1 j n F (z j ) F 0 F (z n)z n F (z n ) > z n. (21) Suose the sequence never exceeds R b, then it has to converge to some value, say r R b. Since F (z) is left continuous from Lemma 5, we have F (z n ) F (r). Letting n in (21), we have r F (r)r > r, a contradiction. F (r) Note that G(z) zf (z) = 1 + k R 1 i (z) is an increasing function of z. Let z 0 = 0. Then, for any 0 n < m and any z (z n, z n+1 ], we have F (z) = G(z) z { G(z n) F0, n = 0 = z n+1 min F (z j) F 0, n > 0 1 j n min F (z n) F 0. 1 n m We comlete the roof by noticing that (0, R b ] m 1 n=0 (z n, z n+1 ]. Based on Theorem 4, we roose the following exlicit algorithm: Algorithm for Solving the Aroximation Problem. Ste 1. Let z = min i {2,...,k} R i (1, 1) and F = F (z ), where F is given by (17) and R i is the rate function. Secify a tolerance level (0, F ). Set n = 1 and z 1 = 1 F. Ste 2. If z n > R b min i {2,...,k} Rb i, go to Ste 4; otherwise, go to Ste 3. Ste 3. If F (z n ) < F, let z = z n and F = F (z n ). Set z n+1 = F (zn)zn and n = n + 1, then go to ste 2. F Ste 4. Calculate 1 from z via (18) and i = 1R 1 i (z ) for i = 2,..., k. Return { i }. 4. Asymtotic Analysis Since the rocedure described in the revious section gives a solution = ( 1,..., k ) to the aroximate roblem, instead of the solution (n) = ( 1 (n),..., k (n)) to the original roblem of minimizing the robability of false selection given by (1), we rove that (n) and characterize the rate of convergence under additional assumtions. Lemma 7 roves that dominates any solution (n) = ( 1 (n),..., k (n)), where (n) converges to some limit. Any vector R k is referred to as an allocation vector if i 0 for each i, i k i = 1 and 18

$n p_i$ is an integer for each $i$. Let $P_p(FS)$ denote the probability of false selection under the allocation vector $p$. The following result establishes that a finite budget allocation based on $p^*$ dominates all other linear allocations in a certain sense asymptotically.

Lemma 7. Consider a sequence of allocation vectors $p(n) = (p_1(n), \ldots, p_k(n))$ for $n \ge 1$, and suppose that there exists a vector $p$ such that $p(n) \to p$. Under Assumptions 1 and 2, if $p \ne p^*$, then

$$\lim_{n \to \infty} \frac{1}{n} \log \frac{P_{p(n)}(FS)}{P_{p^*(n)}(FS)} > 0, \quad (22)$$

where $p^*(n)$ is an allocation vector such that $p^*(n) \to p^*$.

Proof: We split the proof into three cases: (i) $p_1 > 0$ and $p_i > 0$ for all $i \in \Omega$; (ii) $p_1 = 0$; (iii) $p_1 > 0$ and $p_i = 0$ for some $i \in \Omega$. From Theorem 1, (i) follows. Now consider (ii). Without loss of generality, we further assume $p_2 > 0$. Assumption 1 implies that there exists $\epsilon > 0$ such that $P(J_1 \le \mu_2 - \epsilon) > 0$. Letting $\alpha_2(n) = p_2(n)/p_1(n)$, for $n \ge n_0$ sufficiently large so that $p_2(n) > p_1(n)$ for all $n \ge n_0$,

$$\begin{aligned}
\log P\big(\bar J_2(p_2(n)n) \ge \bar J_1(p_1(n)n)\big)
&= \log P\Big(\sum_{m=1}^{p_1(n)n} \big(J_{2m}/\alpha_2(n) - J_{1m}\big) + \sum_{m=p_1(n)n+1}^{p_2(n)n} J_{2m}/\alpha_2(n) \ge 0\Big) \\
&\ge \log P\Big(\sum_{m=p_1(n)n+1}^{p_2(n)n} J_{2m}/\alpha_2(n) \ge p_1(n)n(\mu_2 - \epsilon)\Big) + \log P\Big(\sum_{m=1}^{p_1(n)n} \big(J_{2m}/\alpha_2(n) - J_{1m}\big) \ge -p_1(n)n(\mu_2 - \epsilon)\Big) \\
&\ge \log P\Big(\frac{1}{p_2(n)n} \sum_{m=p_1(n)n+1}^{p_2(n)n} J_{2m} \ge \mu_2 - \epsilon\Big) + p_1(n)n \log P\big(J_2/\alpha_2(n) - J_1 \ge -(\mu_2 - \epsilon)\big).
\end{aligned}$$

By the strong law of large numbers, the first term converges to zero, and

$$P\big(J_2/\alpha_2(n) - J_1 \ge -(\mu_2 - \epsilon)\big) \to P\big(-J_1 \ge -(\mu_2 - \epsilon)\big).$$

Hence, we have

$$\liminf_{n \to \infty} \frac{1}{n} \log P\big(\bar J_2(p_2(n)n) \ge \bar J_1(p_1(n)n)\big) \ge p_1 \log P(J_1 \le \mu_2 - \epsilon) = 0,$$

which then implies that

$$\lim_{n \to \infty} \frac{1}{n} \log P_{p(n)}(FS) = 0 > \lim_{n \to \infty} \frac{1}{n} \log P_{p^*(n)}(FS).$$

Case (iii) can be treated by switching the roles of $p_2(n)$ and $p_1(n)$ in the arguments for case (ii).

The following convergence result follows from Lemma 7:

Theorem 5. Under Assumptions 1 and 2, $\lim_{n \to \infty} \tilde p(n) = p^*$.

Proof. Suppose that $\tilde p(n)$ does not converge to $p^*$. Then we can find an increasing and divergent sequence $\{n_m, m = 1, 2, \ldots\}$ and a vector $p \ne p^*$ such that $\lim_{m \to \infty} \tilde p(n_m) = p$. Then from Lemma 7, we have that

$$\lim_{m \to \infty} \frac{1}{n_m} \log \frac{P_{\tilde p(n_m)}(FS)}{P_{p^*(n_m)}(FS)} > 0,$$

implying $P_{\tilde p(n_m)}(FS) > P_{p^*(n_m)}(FS)$ for $m$ sufficiently large, which contradicts the optimality of $\tilde p(n_m)$.

Theorem 5 implies that $\tilde p(n)$ differs from $p^*$ by an $o(1)$ amount. In order to get a better idea of the rate of convergence of $\tilde p(n)$ to $p^*$, we need an additional assumption.

Assumption 3. $J_i$, $J_1$, and $J_i - J_1$ are non-lattice random variables.

Recall that a function $f(n)$ is said to be $\Theta(g(n))$ if there exist positive constants $K_1 < K_2$ such that $K_1 g(n) \le f(n) \le K_2 g(n)$ for all $n$ sufficiently large. We have the following two lemmas, whose proofs are given in the Appendix.

Lemma 8. Under Assumptions 1, 2, and 3,

$$P_{\alpha(n)}(FS) = \Theta\Big(\frac{1}{\sqrt{n}} \exp\big[-n \min_{2 \le i \le k} R_i(\alpha_1(n), \alpha_i(n))\big]\Big),$$

for any allocation vector $\alpha(n)$ that converges to a vector $\alpha$ as $n \to \infty$.
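The $1/\sqrt{n}$ prefactor in Lemma 8 can be checked numerically in the simplest special case. The sketch below is an illustration under assumptions that are not part of the paper's general setting: two independent Gaussian designs, for which the false selection probability has a closed form and the rate function is the familiar $(\mu_1 - \mu_2)^2 / (2(\sigma_1^2/p_1 + \sigma_2^2/p_2))$. The function names are hypothetical. In this case $P(FS)\,\sqrt{n}\,e^{nR}$ tends to $1/\sqrt{4\pi R}$, so the scaled probability stays bounded away from zero and infinity, as the $\Theta(\cdot)$ statement requires.

```python
import math


def gaussian_rate(p1, p2, mu1, mu2, s1, s2):
    """Rate function for two independent normal designs (illustrative assumption)."""
    return (mu1 - mu2) ** 2 / (2.0 * (s1 ** 2 / p1 + s2 ** 2 / p2))


def false_selection_prob(n, p1, p2, mu1, mu2, s1, s2):
    """Exact P(sample mean of design 2 >= sample mean of design 1)."""
    sd = math.sqrt(s1 ** 2 / (n * p1) + s2 ** 2 / (n * p2))
    return 0.5 * math.erfc((mu1 - mu2) / (sd * math.sqrt(2.0)))


def scaled_probability(n, p1, p2, mu1, mu2, s1, s2):
    """P(FS) * sqrt(n) * exp(n R); bounded above and below if P(FS) has the stated order."""
    r = gaussian_rate(p1, p2, mu1, mu2, s1, s2)
    prob = false_selection_prob(n, p1, p2, mu1, mu2, s1, s2)
    return prob * math.sqrt(n) * math.exp(n * r)


def limiting_constant(p1, p2, mu1, mu2, s1, s2):
    """Limit of the scaled probability in the Gaussian case: 1 / sqrt(4 pi R)."""
    return 1.0 / math.sqrt(4.0 * math.pi * gaussian_rate(p1, p2, mu1, mu2, s1, s2))
```

With an equal split, means 1 and 0, and unit variances, the scaled probability at $n = 400$ is already within about one percent of the limiting constant, and the gap shrinks further as $n$ grows.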

Lemma 9. Under Assumptions 1, 2, and 3,

$$\sup_{n > 0} \big| n R_i(\tilde p_1(n), \tilde p_i(n)) - n R_i(p_1^*, p_i^*) \big| < \infty, \quad \forall i \in \Omega.$$

Recall that the optimizer $z^*$ to problem (16) has to satisfy one of three conditions in Theorem 3. Based on these conditions, we introduce the following definition.

Definition 2. A solution $z^*$ to problem (16) is called Zero Fit if any of the following conditions hold:

(1) $F'(z^*) = 0$ and $z^* \in (0, R^b) \setminus S_1$;

(2) $F'(z^*-)F'(z^*+) = 0$ and $z^* \in S_1 \setminus (S_2 \cup \{R^b\})$;

(3) $F'(z^*-) = 0$ and $z^* \in S_1 \cap (S_2 \cup \{R^b\})$.

The following theorem gives a more precise order of $|\tilde p_i(n) - p_i^*|$ in terms of the budget $n$. Its proof is given in the Appendix.

Theorem 6. Under Assumptions 1, 2, and 3, suppose there do not exist $1 < i < j \le k$ such that $z^* = R_i(1) = R_j(1)$. Then

(i) If $z^*$ is not Zero Fit, $|\tilde p_i(n) - p_i^*| = O(\frac{1}{n})$ for any $1 \le i \le k$;

(ii) If $z^*$ is Zero Fit and $F''(z^*\pm) \ne 0$, $|\tilde p_i(n) - p_i^*| = O(\frac{1}{\sqrt{n}})$ for any $1 \le i \le k$.

5. Conclusions

In this paper, we considered the ordinal optimization problem when the output from different populations is dependent and has a general (not necessarily Gaussian) distribution. By using the large deviations framework to identify the large deviations rate function of the probability of false selection, we obtain an optimization problem that is far more complex than the one obtained in the independent and/or Gaussian setting. However, by analyzing the structure of this problem, we are able to reduce it to a single-variable nonlinear optimization problem and develop a numerical algorithm to approximately solve it. In addition to proving that the solution to this approximate optimization problem is in fact the asymptotic limit of the solution to the original optimal budget allocation problem, we are also able to characterize the convergence rate of the approximate solution under additional conditions. Both the analysis used and the results obtained are novel to this general ordinal optimization setting.

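Acknowledgments

Michael Fu and Xiaoping Xiong were supported in part by the National Science Foundation under Grants DMI and DMI , and by the Air Force Office of Scientific Research under Grant FA

Appendix

In this appendix, we provide the proofs of Lemmas 4, 8, 9, Theorem 6, and further details of the analysis for Examples 2 and 3 from Section 3.2.

Proof of Lemma 4: First consider $p > 1$. For $\epsilon > 0$ and sufficiently small,

$$\Lambda'_{(i,1,p)}(\gamma_i + \epsilon) = \frac{1}{p}\Lambda_i^{(0,1)}\Big(-(\gamma_i + \epsilon), \frac{\gamma_i + \epsilon}{p}\Big) - \Lambda_i^{(1,0)}\Big(-(\gamma_i + \epsilon), \frac{\gamma_i + \epsilon}{p}\Big) + \Big(1 - \frac{1}{p}\Big)\Lambda_i^{(0,1)}\Big(0, \frac{\gamma_i + \epsilon}{p}\Big).$$

Note that $\Lambda_i^{(0,1)}\big(0, \frac{\gamma_i + \epsilon}{p}\big) \to \mu_i$ as $p \to \infty$, and

$$\Lambda_i^{(1,0)}\Big(-(\gamma_i + \epsilon), \frac{\gamma_i + \epsilon}{p}\Big) \to \Lambda_i^{(1,0)}\big(-(\gamma_i + \epsilon), 0\big) < \mu_i.$$

Therefore, $\Lambda'_{(i,1,p)}(\gamma_i + \epsilon) > 0$ for all $p$ sufficiently large. Similarly it follows that $\Lambda'_{(i,1,p)}(\gamma_i - \epsilon) < 0$ for all $p$ sufficiently large. Since $\epsilon$ is arbitrary, (11) follows.

To see (12), consider $p < 1$. For $\epsilon > 0$ and sufficiently small,

$$\Lambda'_{(i,1,p)}\big(p(\beta_i + \epsilon)\big) = \Lambda_i^{(0,1)}\big(-p(\beta_i + \epsilon), \beta_i + \epsilon\big) - p\,\Lambda_i^{(1,0)}\big(-p(\beta_i + \epsilon), \beta_i + \epsilon\big) - (1 - p)\,\Lambda_i^{(1,0)}\big(-p(\beta_i + \epsilon), 0\big).$$

Note that as $p \to 0$, $\Lambda_i^{(0,1)}(-p(\beta_i + \epsilon), \beta_i + \epsilon) \to \Lambda_i^{(0,1)}(0, \beta_i + \epsilon) > \mu_1$, and $\Lambda_i^{(1,0)}(-p(\beta_i + \epsilon), 0) \to \mu_1$. Hence, $\Lambda'_{(i,1,p)}(p(\beta_i + \epsilon)) > 0$ for $p$ sufficiently small. Similarly, it can be seen that $\Lambda'_{(i,1,p)}(p(\beta_i - \epsilon)) < 0$ for $p$ sufficiently small, and (12) follows.

To see (13), note that for $p > 1$,

$$R_i(p) = -\Lambda_i\Big(-\lambda_i(p), \frac{\lambda_i(p)}{p}\Big) - (p - 1)\,\Lambda_i\Big(0, \frac{\lambda_i(p)}{p}\Big).$$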

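Now as $p \to \infty$,

$$\Lambda_i\Big(-\lambda_i(p), \frac{\lambda_i(p)}{p}\Big) \to \Lambda_i(-\gamma_i, 0).$$

Using Taylor's Theorem,

$$\Lambda_i\Big(0, \frac{\lambda_i(p)}{p}\Big) = \frac{\lambda_i(p)}{p}\,\Lambda_i^{(0,1)}(0, 0) + o(1/p).$$

Therefore,

$$R_i(\infty) = -\Lambda_i(-\gamma_i, 0) - \gamma_i \mu_i.$$

To see that the RHS equals $\inf_x I_i(\mu_i, x)$, note that the latter equals

$$\inf_x \sup_{\lambda_1, \lambda_2} \big(\lambda_1 \mu_i + \lambda_2 x - \Lambda_i(\lambda_1, \lambda_2)\big).$$

Using the min-max theorem, this in turn equals

$$\sup_{\lambda_1, \lambda_2} \inf_x \big(\lambda_1 \mu_i + \lambda_2 x - \Lambda_i(\lambda_1, \lambda_2)\big) = \sup_{\lambda_1} \big(\lambda_1 \mu_i - \Lambda_i(\lambda_1, 0)\big).$$

The result now follows since $\Lambda_i^{(1,0)}(-\gamma_i, 0) = \mu_i$.

To see (14), note that $R_i'(p) = -f_i^{(0,1)}(\lambda_i(p), p)$ equals

$$\frac{\lambda_i(p)}{p^2}\,\Lambda_i^{(0,1)}\Big(-\lambda_i(p), \frac{\lambda_i(p)}{p}\Big) - \Lambda_i\Big(0, \frac{\lambda_i(p)}{p}\Big) + \frac{(p - 1)\lambda_i(p)}{p^2}\,\Lambda_i^{(0,1)}\Big(0, \frac{\lambda_i(p)}{p}\Big). \quad (23)$$

From the fact that $f_i^{(1,0)}(\lambda_i(p), p) = 0$, it follows that

$$\frac{1}{p}\,\Lambda_i^{(0,1)}\Big(-\lambda_i(p), \frac{\lambda_i(p)}{p}\Big) = \Lambda_i^{(1,0)}\Big(-\lambda_i(p), \frac{\lambda_i(p)}{p}\Big) - \frac{p - 1}{p}\,\Lambda_i^{(0,1)}\Big(0, \frac{\lambda_i(p)}{p}\Big).$$

Plugging this in (23),

$$R_i'(p) = \frac{\lambda_i(p)}{p}\,\Lambda_i^{(1,0)}\Big(-\lambda_i(p), \frac{\lambda_i(p)}{p}\Big) - \Lambda_i\Big(0, \frac{\lambda_i(p)}{p}\Big).$$

Equation (14) follows from this and (11).

To see that $R_i(0+) = 0$, note that for $p < 1$,

$$R_i(p) = -p\,\Lambda_i\Big(-\lambda_i(p), \frac{\lambda_i(p)}{p}\Big) - (1 - p)\,\Lambda_i(-\lambda_i(p), 0).$$

The result now follows from (12).

Theorem 3.3 in Chaganty and Sethuraman (1993) is useful in proving Lemma 8. We reproduce its statement here to help the reader follow the proof of Lemma 8.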

Theorem 7 (Chaganty and Sethuraman 1993). Let $T_n$ be a sequence of non-lattice valued random variables with m.g.f. $\phi_n(z) = E \exp(zT_n)$, which is nonvanishing and analytic in the region $\Psi = \{z \in \mathbb{C} : |z| < a\}$, where $a > 0$ and $\mathbb{C}$ is the set of all complex numbers. Let $\{a_n\}$ be a sequence of real numbers. Let $\psi_n(z) = \frac{\log \phi_n(z)}{a_n}$ and $\gamma_n(u) = \sup_{|s| < a,\, s \in \mathbb{R}} [us - \psi_n(s)]$, for $u \in \mathbb{R}$. Let $\{m_n\}$ be a bounded sequence of real numbers and $\{\tau_n\}$ be a sequence satisfying $\psi_n'(\tau_n) = m_n$ and $0 < \tau_n < a_0 < a$, for all $n$. Suppose that $a_n \to \infty$ such that $\tau_n \sqrt{a_n} \to \infty$. Further, assume that $T_n$ satisfies the following three conditions:

(a) There exists $\beta < \infty$ such that $|\psi_n(z)| < \beta$ for all $n$ and $z \in \Psi$;

(b) There exists $\gamma > 0$ such that $\psi_n''(\tau_n) \ge \gamma$ for all $n$;

(c) There exists $\delta_0 > 0$ such that, for any given $\delta$ and $\delta'$ such that $0 < \delta < \delta_0 < \delta'$,

$$\sup_{\delta < |t| \le \delta' \tau_n} \Big| \frac{\phi_n(\tau_n + t\sqrt{-1})}{\phi_n(\tau_n)} \Big| = o\Big(\frac{1}{\sqrt{n}}\Big). \quad (24)$$

Then

$$P\Big(\frac{T_n}{a_n} \ge m_n\Big) \sim \frac{1}{\tau_n \sqrt{2\pi a_n \psi_n''(\tau_n)}} \exp\big(-a_n \gamma_n(m_n)\big).$$

Proof of Lemma 8: In Theorem 7, set $T_n = n\bar J_i(\alpha_i(n)) - n\bar J_1(\alpha_1(n))$, $m_n = 0$, $a_n = n$, $\psi_n(\lambda) = \Lambda_{(i, \alpha_1(n), \alpha_i(n))}(\lambda)$, $\phi_n(\lambda) = \exp\{n\psi_n(\lambda)\}$, and let $\tau_n$ be the solution to $\psi_n'(\lambda) = 0$. We also let $\psi(\lambda) = \Lambda_{(i, \alpha_1, \alpha_i)}(\lambda)$ and $\tau$ be the solution to $\psi'(\lambda) = 0$. Then there exists $a > \tau$ such that $\psi_n(\lambda)$ is well defined on $[0, a]$ for any sufficiently large $n$. We denote by $\Psi$ the set $\{z \in \mathbb{C} : |z| \le a, \mathrm{Re}(z) \ge 0\}$. Now we only need to verify conditions (a), (b), (c) in Theorem 7 above.

Obviously we have $\tau_n \to \tau$ and $\psi_n(z) \to \psi(z)$, $\forall z \in \Psi$. Since $\psi_n$ and $\psi$ are uniformly continuous on the compact set $\Psi$, $\psi_n(z)$ converges uniformly to $\psi(z)$ and thus (a) holds. The convexity of $\psi(\lambda)$ and $\psi_n''(\tau_n) \to \psi''(\tau)$ imply condition (b).

To proceed, we denote by $\chi(\lambda; \xi)$ the moment-generating function of $\xi$, i.e., $\chi(\lambda; \xi) = E e^{\lambda \xi}$. Obviously, $\frac{\chi(\lambda + t\sqrt{-1};\, \xi)}{\chi(\lambda;\, \xi)}$ is the characteristic function of a random variable, say $\eta$; hence, we have

$$\Big| \frac{\chi(\lambda + t\sqrt{-1};\, \xi)}{\chi(\lambda;\, \xi)} \Big| = \big| \chi(t\sqrt{-1};\, \eta) \big| \le 1.$$

In particular, if $\xi$ is non-lattice, so is $\eta$. Now we suppose $\alpha_1 > \alpha_i$,

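$$\begin{aligned}
\Big| \frac{\phi_n(\tau_n + t\sqrt{-1})}{\phi_n(\tau_n)} \Big|
&= \Big| \frac{\chi\big(\tau_n + t\sqrt{-1};\ \frac{J_i}{\alpha_i(n)} - \frac{J_1}{\alpha_1(n)}\big)}{\chi\big(\tau_n;\ \frac{J_i}{\alpha_i(n)} - \frac{J_1}{\alpha_1(n)}\big)} \Big|^{n\alpha_i(n)} \Big| \frac{\chi\big(\tau_n + t\sqrt{-1};\ -\frac{J_1}{\alpha_1(n)}\big)}{\chi\big(\tau_n;\ -\frac{J_1}{\alpha_1(n)}\big)} \Big|^{n(\alpha_1(n) - \alpha_i(n))} \\
&\le \Big| \frac{\chi\big(\tau_n + t\sqrt{-1};\ -\frac{J_1}{\alpha_1(n)}\big)}{\chi\big(\tau_n;\ -\frac{J_1}{\alpha_1(n)}\big)} \Big|^{n(\alpha_1(n) - \alpha_i(n))}.
\end{aligned}$$

Assumption 3 implies that the ratio on the RHS can be taken as the c.f. of a non-lattice random variable. To simplify, we denote by $\chi_n(t)$ the ratio inside the absolute value on the RHS of the above formula. Since $\tau_n \to \tau$, we have $\bar\tau = \sup_n \tau_n < \infty$. We choose $\delta_0 = 1$ and fix $\delta$ and $\delta'$; we have

$$\sup_{\delta \le t \le \delta' \bar\tau} |\chi_n(t)| < 1.$$

Noticing that $\chi_n(t)$ converges uniformly to $\frac{\chi(\tau + t\sqrt{-1};\ -J_1/\alpha_1)}{\chi(\tau;\ -J_1/\alpha_1)}$ on the compact set $[\delta, \delta' \bar\tau]$, we also have

$$x = \sup_n \sup_{\delta \le t \le \delta' \bar\tau} |\chi_n(t)| < 1,$$

which further implies that

$$\sup_{\delta \le |t| \le \delta' \tau_n} \Big| \frac{\phi_n(\tau_n + t\sqrt{-1})}{\phi_n(\tau_n)} \Big| \le x^{n(\alpha_1(n) - \alpha_i(n))} = O\big(\exp[n \log(x)(\alpha_1 - \alpha_i)]\big) = o\Big(\frac{1}{\sqrt{n}}\Big).$$

Similarly we can show (24) also holds when $\alpha_i \ge \alpha_1$, which completes the proof.

Proof of Lemma 9: To simplify notation, we let

$$\tilde R(n) = \min_{2 \le i \le k} R_i(\tilde p_1(n), \tilde p_i(n)); \qquad \tilde R_i(n) = R_i(\tilde p_1(n), \tilde p_i(n));$$

$$R^*(n) = \min_{2 \le i \le k} R_i(p_1^*(n), p_i^*(n)); \qquad R_i^*(n) = R_i(p_1^*(n), p_i^*(n));$$

where $(p_i^*(n) : i \le k)$ is obtained by suitably rounding each $p_i^*$ to a multiple of $1/n$ so that $|p_i^*(n) - p_i^*| = O(1/n)$, each $p_i^*(n) > 0$ and $\sum_{i=1}^{k} p_i^*(n) = 1$. If $R^* = \min_{2 \le i \le k} R_i(p_1^*, p_i^*)$ and $R_i^* = R_i(p_1^*, p_i^*)$, we have

$$|R^*(n) - R^*| = O(1/n), \quad (25)$$

$$|R_i^*(n) - R_i^*| = O(1/n). \quad (26)$$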
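The proof only requires that each allocation be rounded "suitably" to a multiple of $1/n$. One concrete way to do this, shown here as an assumption rather than as the authors' choice, is largest-remainder rounding: take the floor of each $n p_i$ and hand the leftover samples to the designs with the largest fractional parts. The function name is hypothetical. Every rounded share then differs from the original by less than $1/n$, the counts sum to $n$, and positivity holds once $n$ is large enough that every $n p_i \ge 1$.

```python
import math


def round_allocation(p, n):
    """Round allocation fractions p (summing to 1) to integer counts summing to n.

    Largest-remainder rounding: each count is floor(n * p_i) or floor(n * p_i) + 1,
    so |count_i / n - p_i| < 1 / n. Raises ValueError if some design would receive
    no samples, which can only happen when n is too small for the given p.
    """
    raw = [n * pi for pi in p]
    counts = [math.floor(x) for x in raw]
    shortfall = n - sum(counts)               # samples still to hand out
    # Designs ordered by fractional part, largest first.
    order = sorted(range(len(p)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:shortfall]:
        counts[i] += 1
    if any(c == 0 for c in counts):
        raise ValueError("n is too small to give every design a positive share")
    return counts
```

For instance, `round_allocation([0.5, 0.3, 0.2], 7)` starts from floors 3, 2, 1 and gives the one leftover sample to the first design, whose fractional part 0.5 is the largest.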

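To show (26), we represent partial derivatives with superscripts and write

$$\begin{aligned}
|R_i^*(n) - R_i^*| &= |R_i(p_1^*(n), p_i^*(n)) - R_i(p_1^*, p_i^*)| \\
&\le |R_i(p_1^*(n), p_i^*(n)) - R_i(p_1^*, p_i^*(n))| + |R_i(p_1^*, p_i^*(n)) - R_i(p_1^*, p_i^*)| \\
&= |R_i^{(1,0)}(\theta_1, p_i^*(n))(p_1^*(n) - p_1^*)| + |R_i^{(0,1)}(p_1^*, \theta_i)(p_i^*(n) - p_i^*)| \\
&\le K|p_1^*(n) - p_1^*| + K|p_i^*(n) - p_i^*| = O\Big(\frac{1}{n}\Big),
\end{aligned}$$

where the third step uses the mean value theorem with $\theta_1$ between $p_1^*(n)$ and $p_1^*$ and $\theta_i$ between $p_i^*(n)$ and $p_i^*$, the fourth step uses boundedness of the partial derivatives, and the last uses $|p_i^*(n) - p_i^*| = O(\frac{1}{n})$. To show (25), we notice that $R_i^* = R^*$ and thus,

$$|R^*(n) - R^*| = \Big|\min_{i \in \Omega} [R_i^*(n) - R^*]\Big| = \Big|\min_{i \in \Omega} [R_i^*(n) - R_i^*]\Big| \le \max_{i \in \Omega} |R_i^*(n) - R_i^*| = O\Big(\frac{1}{n}\Big).$$

From Lemma 8 it follows that

$$P_{\tilde p(n)}(FS) = \Theta\Big(\frac{1}{\sqrt{n}} \exp(-n\tilde R(n))\Big), \qquad P_{p^*(n)}(FS) = \Theta\Big(\frac{1}{\sqrt{n}} \exp(-nR^*(n))\Big).$$

Hence, we have

$$C_1 - \tfrac{1}{2}\log n - nR^*(n) \ \ge\ \log P_{p^*(n)}(FS) \ \ge\ \log P_{\tilde p(n)}(FS) \ \ge\ C_2 - \tfrac{1}{2}\log n - n\tilde R(n), \quad (27)$$

where $C_1$ and $C_2$ are constants independent of $n$, and the middle inequality holds because $\tilde p(n)$ minimizes the probability of false selection for budget $n$. We can replace $R^*(n)$ with $R^*$ on the left hand side by using (25). The fact that $R^*$ is the value of problem (4) implies $R^* \ge \tilde R(n)$. Combining this with the above inequality and (25), we see that

$$\sup_{n > 0} |n\tilde R(n) - nR^*| < C_1 - C_2 + O(1) = K < \infty, \quad (28)$$

where $K$ is another constant. Obviously $\tilde R_i(n) \ge \tilde R(n)$ for all $i \in \Omega$. In addition, Part 2 of Theorem 2 implies $R_i^* = R^*$ for all $i \in \Omega$. Then $n\tilde R_i(n) - nR_i^* \ge n\tilde R(n) - nR^* \ge -K$ by (28).

We define a new vector $\hat p(n)$ with:

$$\hat p_i(n) = \begin{cases} \tilde p_i(n) + \dfrac{2K}{nR^*}\,\tilde p_i(n), & i \ne 2, \\[2mm] \tilde p_i(n) - \dfrac{2K}{nR^*}\,\big(1 - \tilde p_i(n)\big), & i = 2. \end{cases} \quad (29)$$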

Obviously, $\hat p(n)$ satisfies the budget constraint. Given sufficiently large $n$, each $\hat p_i(n)$ is positive and $\tilde R_i(n) \ge \tilde R(n) > R^* - \frac{K}{n} > R^*/2$ for all $i \in \Omega$. We recall that the optimality of $\{p_i^*\}$ for problem (4) is independent of the value of $n$. So we always have, for arbitrary $n$,

$$\min_{i \in \Omega} R_i(\hat p_1(n), \hat p_i(n)) \le \min_{i \in \Omega} R_i(p_1^*, p_i^*) = R^*. \quad (30)$$

Note that $\frac{\hat p_i(n)}{\hat p_1(n)} = \frac{\tilde p_i(n)}{\tilde p_1(n)}$ for $i > 2$. Combining this with the homogeneity of $R_i$, and noting (29), we have, for $i > 2$,

$$nR_i(\hat p_1(n), \hat p_i(n)) = \frac{\hat p_1(n)}{\tilde p_1(n)}\, nR_i(\tilde p_1(n), \tilde p_i(n)) = n\tilde R_i(n) + \frac{2K \tilde R_i(n)}{R^*} > n\tilde R_i(n) + K \ge n\tilde R(n) + K \ge nR^*, \quad (31)$$

where the last step uses (28).

Now $nR_i(\hat p_1(n), \hat p_i(n)) > nR^*$ for all $i > 2$. If this inequality also held for $i = 2$, we would arrive at a contradiction to (30). Hence, $nR_2(\hat p_1(n), \hat p_2(n)) \le nR^* = nR_2^*$. Using superscripts to denote partial derivatives of $R_2(\cdot, \cdot)$, we have

$$\begin{aligned}
n\tilde R_2(n) - nR_2^* &= nR_2(\tilde p_1(n), \tilde p_2(n)) - nR_2^* \\
&\le nR_2(\tilde p_1(n), \tilde p_2(n)) - nR_2(\hat p_1(n), \hat p_2(n)) \\
&= [nR_2(\tilde p_1(n), \tilde p_2(n)) - nR_2(\hat p_1(n), \tilde p_2(n))] + [nR_2(\hat p_1(n), \tilde p_2(n)) - nR_2(\hat p_1(n), \hat p_2(n))] \\
&= nR_2^{(1,0)}(\theta_1, \tilde p_2(n))[\tilde p_1(n) - \hat p_1(n)] + nR_2^{(0,1)}(\hat p_1(n), \theta_2)[\tilde p_2(n) - \hat p_2(n)] \\
&= -\frac{2K \tilde p_1(n)}{R^*}\, R_2^{(1,0)}(\theta_1, \tilde p_2(n)) + \frac{2K (1 - \tilde p_2(n))}{R^*}\, R_2^{(0,1)}(\hat p_1(n), \theta_2),
\end{aligned}$$

where the second step uses $nR_2(\hat p_1(n), \hat p_2(n)) \le nR_2^*$, and the fourth step uses the mean value theorem with $\theta_1$ falling between $\tilde p_1(n)$ and $\hat p_1(n)$ and $\theta_2$ falling between $\tilde p_2(n)$ and $\hat p_2(n)$. Since $\tilde p_i(n)$ and $\hat p_i(n)$ all converge to $p_i^*$, it is easy to verify boundedness of each term on the RHS. So $n\tilde R_2(n) - nR_2^*$ is also bounded from above, and thus the bound claimed in Lemma 9 holds when $i = 2$. Obviously, we can extend the arguments to all $i \in \Omega$ and complete the proof.

Additional notation is needed to help in proving Theorem 6. We write $n\tilde p_i(n) - np_i^*$ as $\Delta_i(n)$, $\frac{\tilde p_i(n)}{\tilde p_1(n)}$ as $\tilde q_i(n)$, and $\frac{p_i^*}{p_1^*}$ as $q_i^*$. We will also write $X(n) \in \mathrm{UBD}$ if some series $X(n)$, depending on $n$, is uniformly bounded from above and below for any large $n$. We first show the following lemma:

Lemma 10. Under Assumptions 1, 2, and 3,

$$\Delta_1(n) \in \mathrm{UBD} \implies \Delta_i(n) \in \mathrm{UBD}, \quad \forall i \in \Omega.$$

Proof: Recall that for $(p_1, p_i)$ and $\alpha$, $R_i(\alpha p_1, \alpha p_i) = \alpha R_i(p_1, p_i)$, i.e., $R_i$ is a homogeneous function. Furthermore $R_i(\alpha) = R_i(1, \alpha)$. Therefore,

$$nR_i(\tilde p_1(n), \tilde p_i(n)) - nR_i(p_1^*, p_i^*) = n\tilde p_1(n) R_i(\tilde q_i(n)) - np_1^* R_i(q_i^*). \quad (32)$$

Through a Taylor series expansion, we have

$$R_i(\tilde q_i(n)) = R_i(q_i^*) + (\tilde q_i(n) - q_i^*)\, R_i'(V_i(n)),$$

where $V_i(n)$ lies between $\tilde q_i(n)$ and $q_i^*$ and $R_i'$ represents the right (left) derivative if $\tilde q_i(n)$ is greater (less) than $q_i^*$. Then, the RHS in (32) may be re-expressed as

$$R_i(q_i^*)\Delta_1(n) + nR_i'(V_i(n))\big(\tilde p_i(n) - q_i^* \tilde p_1(n)\big) = [R_i(q_i^*) - q_i^* R_i'(V_i(n))]\,\Delta_1(n) + R_i'(V_i(n))\,\Delta_i(n). \quad (33)$$

We observe that the coefficients of $\Delta_1(n)$ and $\Delta_i(n)$ converge to $R_i(q_i^*) - q_i^* R_i'(q_i^*)$ and $R_i'(q_i^*)$, respectively, as $n \to \infty$. Note that (33) equals $nR_i(\tilde p_1(n), \tilde p_i(n)) - nR_i(p_1^*, p_i^*)$, and hence the right hand side of (33) is bounded by Lemma 9.

Obviously Lemma 10 is true if $R_i'(q_i^*) \ne 0$. Now suppose $R_i'(q_i^*) = 0$ for some $i \in \Omega$; then $q_i^* \ge 1$, because $R_i'(\cdot)$ is positive on $(0, 1]$ by Lemma 2. If we further suppose $q_i^* > 1$, then $R_i'(q_i^*) = 0$ implies $F'(z^*-) = \infty$, where $F'$ can be inferred from (20). But since $z^*$ is optimal for (19), Theorem 3 requires $F'(z^*-) \le 0$, a contradiction. So we may only have $q_i^* = 1$, i.e., $p_i^* = p_1^*$.

We recall the following: $V_i(n)$ falls between $\tilde q_i(n)$ and $q_i^* = 1$. $R_i'(V_i(n)) \to R_i'(1+) = 0$ when $\tilde q_i(n) > q_i^* = 1$, whereas $R_i'(V_i(n)) \to R_i'(1-) > 0$ when $\tilde q_i(n) < q_i^* = 1$. Therefore, assuming $R_i'(q_i^*) = 0$ implies $V_i(n)$ has to converge to $q_i^* = 1$ from the right, and thus $\tilde q_i(n) \ge V_i(n) \ge q_i^* = 1$. Now we have

$$\Delta_i(n) = n\tilde p_i(n) - np_i^* \ge n\tilde p_1(n) - np_i^* = n\tilde p_1(n) - np_1^* = \Delta_1(n).$$

We can complete the proof by using $\sum_{i=1}^{k} \Delta_i(n) = 0$. Thus, $\Delta_1(n) \in \mathrm{UBD} \implies \Delta_i(n) \in \mathrm{UBD}$.

Proof of Theorem 6: We show part (i) by considering two cases:

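I. $R_i'(q_i^*) = 0$ for some $i \in \Omega$;

II. $R_i'(q_i^*) \ne 0$ for all $i \in \Omega$.

As argued in the proof of Lemma 10, $R_i'(q_i^*) = 0$ for some $i$ only if $q_i^* = 1$. The condition of this theorem then implies there is at most one $i$, say $i = 2$, such that $R_2'(q_2^*) = 0$ in case I. Recall that both sides of (33) are uniformly bounded for all $i$, i.e.,

$$[R_i(q_i^*) - q_i^* R_i'(V_i(n))]\,\Delta_1(n) + R_i'(V_i(n))\,\Delta_i(n) \in \mathrm{UBD}. \quad (34)$$

Since $R_i'(V_i(n))$ converges to a non-zero value for $i > 2$, we have, for $i > 2$,

$$\Big[\frac{R_i(q_i^*)}{R_i'(V_i(n))} - q_i^*\Big]\Delta_1(n) + \Delta_i(n) \in \mathrm{UBD}.$$

Hence,

$$\sum_{i=3}^{k} \Big[\frac{R_i(q_i^*)}{R_i'(V_i(n))} - q_i^*\Big]\Delta_1(n) + \sum_{i=3}^{k} \Delta_i(n) \in \mathrm{UBD}.$$

Since $R_2'(V_2(n)) \to 0$, $R_2'(V_2(n)) \in \mathrm{UBD}$ and

$$\sum_{i=3}^{k} \Big[\frac{R_i(q_i^*)}{R_i'(V_i(n))} - q_i^*\Big] R_2'(V_2(n))\,\Delta_1(n) + \sum_{i=3}^{k} R_2'(V_2(n))\,\Delta_i(n) \in \mathrm{UBD}.$$

Combining this with (34) for $i = 2$, we have

$$\Big[R_2(q_2^*) + R_2'(V_2(n))\Big(\sum_{i=3}^{k} \Big[\frac{R_i(q_i^*)}{R_i'(V_i(n))} - q_i^*\Big] - q_2^*\Big)\Big]\Delta_1(n) + R_2'(V_2(n)) \sum_{i=2}^{k} \Delta_i(n) \in \mathrm{UBD}.$$

Since $\sum_{i=1}^{k} \Delta_i(n) = 0$ and the coefficient of $\Delta_1(n)$ in the above expression converges to $R_2(q_2^*) > 0$, $\Delta_1(n)$ is bounded. Part (i) for case I of the theorem follows from Lemma 10.

For case II, dividing the RHS of (33) by $R_i'(V_i(n))$ (this is positive for $n$ sufficiently large) implies, for $i \in \Omega$,

$$\Big[\frac{R_i(q_i^*)}{R_i'(V_i(n))} - q_i^*\Big]\Delta_1(n) + \Delta_i(n) \in \mathrm{UBD}. \quad (35)$$

Therefore, their sum over all $i \in \Omega$,

$$\sum_{i \in \Omega} \Big[\frac{R_i(q_i^*)}{R_i'(V_i(n))} - q_i^*\Big]\Delta_1(n) + \sum_{i \in \Omega} \Delta_i(n) = \Big(\sum_{i \in \Omega} \Big[\frac{R_i(q_i^*)}{R_i'(V_i(n))} - q_i^*\Big] - 1\Big)\Delta_1(n) \in \mathrm{UBD}.$$

If we denote by $B(n)$ the coefficient of $\Delta_1(n)$ in the RHS above, we have

$$B(n)\,\Delta_1(n) \in \mathrm{UBD}. \quad (36)$$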
