Sample Average Approximation with Adaptive Importance Sampling


Andreas Wächter · Jeremy Staum · Alvaro Maggiar · Mingbin Feng

October 9, 2017

Abstract We study sample average approximations under adaptive importance sampling in which the sample densities may depend on previous random samples. Based on a generic uniform law of large numbers, we establish uniform convergence of the sample average approximation to the true function. We obtain convergence of the optimal value and optimal solutions of the sample average approximation. The relevance of this result is demonstrated in the context of the convergence analysis of a randomized optimization algorithm.

Keywords sample average approximation · adaptive importance sampling · likelihood ratio · parametric integration · uniform convergence

1 Introduction

We are interested in minimizing a function $g : X \to \mathbb{R}$ given by

$$g(x) = \int_\Xi F(x, \xi)\, h(x, \xi)\, d\xi \qquad (1)$$

where $F(x, \cdot)$ is measurable for all $x$, and $h(x, \cdot)$ is a probability density function that might depend on $x$. We assume that $X$ is a compact subset of $\mathbb{R}^n$. The integral $g(x)$ can be interpreted as an expectation $\mathbb{E}_x[F(x, \xi)]$ taken under the assumption that $\xi$ is a random vector with density $h(x, \cdot)$. When the integral (1) cannot be computed or is too expensive to evaluate, sample average approximation (SAA) provides a way to obtain an approximation of the minimizer of $g(x)$. In the simplest setting, when the probability distribution does not depend on $x$, that is, $h(x, \xi) = h(\xi)$, this approach consists of minimizing the sample average approximation $\hat g_N(x) = \frac{1}{N}\sum_{i=1}^N F(x, \xi_i)$, where the realizations $\xi_1, \dots, \xi_N$ of the random variable

are drawn from $h(\xi)$. In this case, the set of minimizers of $\hat g_N$ converges to the set of minimizers of $g(x)$ as $N \to \infty$, if $\hat g_N$ converges uniformly to $g$ [19]. To extend this approach, consider the parametric integral

$$g(x) = \int_\Xi G(x, \xi)\, d\xi. \qquad (2)$$

Let $\phi$ be a sampling distribution so that $\phi(\xi) > 0$ for any $\xi$ such that there exists an $x \in X$ with $G(x, \xi) > 0$. Then, when $\{\xi_i\}$ is sampled i.i.d. from $\phi$, the Monte Carlo estimator $\frac{1}{N}\sum_{i=1}^N G(x, \xi_i)/\phi(\xi_i)$ converges a.s. to $g(x)$ for all $x \in X$. In the context of problem (1), define $G(x, \xi) = F(x, \xi)\, h(x, \xi)$. Then the estimator has the form

$$\hat g_N(x) = \frac{1}{N}\sum_{i=1}^N F(x, \xi_i)\, \frac{h(x, \xi_i)}{\phi(\xi_i)}.$$

This approach is known as importance sampling [18]. The sampling density $\phi$ may be different from the target density $h$. Usually, $\phi$ is chosen to reduce the variance of estimating the expectation of $F$.

The key contribution of this paper is that we provide convergence results without assuming that the samples $\xi_i$ are independent and identically distributed. Instead, we study the convergence of the sample average approximation given by

$$\hat g_N(x) = \frac{1}{N}\sum_{i=1}^N \frac{G(x, \xi_i)}{\phi_i(\xi_i)}, \qquad (3)$$

where, for each $i = 1, \dots, N$, $\xi_i$ is sampled from a different importance sampling density $\phi_i$. A sampling density $\phi_i$ might even depend on the previous samples $\xi_1, \dots, \xi_{i-1}$ and is therefore by itself a random variable. This setting is similar to that of adaptive multiple importance sampling [5, 16]. There, however, the estimator uses mixture distributions, a case not considered here.

The pointwise convergence of $\hat g_N(x)$ to $g(x)$ for a single fixed $x$ as the sample size $N$ goes to infinity is by itself of interest and, depending on the choice of $\phi_i$, might be relatively elementary (see Section 4 for two examples). In Section 2, we give conditions under which pointwise convergence leads to uniform convergence of the functions $\hat g_N$ to $g$. This in turn allows us to establish the convergence of the optimal solutions of the sample average approximation

$$\min_{x \in X} \hat g_N(x) \qquad (4)$$

to the optimal solutions of the original optimization problem

$$\min_{x \in X} g(x). \qquad (5)$$

In Section 3 we extend this to the case when $\hat g_N$ depends on additional random nuisance parameters $z_N$ that converge to a random limit point $z^*$.
Section 5 gives simplified conditions for uniform convergence for the case that all probability distributions are normal. Finally, in Section 6 we apply our results to

prove convergence of the parameters in a quadratic regression model that approximates a stochastic function in the context of a randomized optimization algorithm.

In stochastic optimization, importance sampling has been used, for example, in the context of Benders decomposition [7, 10, 12]. Royset and Polak [17] presented a result on uniform convergence of the sample average approximation when $\xi_1, \dots, \xi_N$ are independently sampled from an identical importance sampling distribution. In their work, both the target and the sampling distributions are assumed to be normal. The convergence of the sample average approximation under non-i.i.d. sampling has been addressed, for example, by Dai et al. [6]. They proved results about convergence of solutions to SAA problems when $\xi_1, \dots, \xi_N$ are neither identically distributed nor independent, but did not discuss uniform convergence of $\hat g_N$ to $g$. Dupačová and Wets [9] proved epi-convergence of $\hat g_N$ to $g$, from which convergence of solutions to SAA problems follows. Their analysis assumes that $\{\phi_i\}$ converges in distribution. A similar result was obtained by Korf and Wets [14]. One of their assumptions is that $\{\xi_i\}$ forms an ergodic process, which may not be easy to verify in many applications. Homem-de-Mello [11] established results on uniform convergence of $\hat g_N$ to $g$, and of solutions to SAA problems, under non-i.i.d. sampling. His results were generalized by Xu [21]. While these papers consider non-i.i.d. sampling, our results are more general since they permit distributions that are adaptively chosen based on the previous samples.

2 Uniform Convergence

To recapitulate with more mathematical detail: let $X$ be a compact subset of $\mathbb{R}^n$, $\Xi$ be a subset of $\mathbb{R}^d$, and $G$ be a function from $X \times \mathbb{R}^d$ to $\mathbb{R}$ whose support is contained in $X \times \Xi$. Let $(\Omega, \mathcal{G}, Q)$ be a probability space on which there is an infinite sequence of random vectors $\{\xi_i\}$, each $\xi_i$ being a $\mathcal{G}$-measurable function from $\Omega$ to $\mathbb{R}^d$. Define $\{\mathcal{F}_i\}$ as the natural filtration of this sequence, i.e., $\mathcal{F}_i$ contains the information in $\xi_1, \dots, \xi_i$.
Suppose that under $Q$, for every $i$, the conditional distribution of $\xi_i$ given $\mathcal{F}_{i-1}$ has a density $\phi_i$. Let $\Xi_i$ represent the support of $\phi_i$; this subset of $\mathbb{R}^d$ can be random. Suppose that $G : X \times \Xi \to \mathbb{R}$ is a real-valued function such that, for all $x \in X$, (2) exists and is finite. We are concerned with uniform convergence, as $N \to \infty$, of the sample average approximation $\hat g_N$ defined by (3) to the function $g$ defined by (2). The following assumption ensures that the ratios in (3) are finite.

Assumption 1 With probability one, for every $i$, $\Xi \subseteq \Xi_i$.

Our strategy is to assume that a pointwise strong law of large numbers applies (Assumption 2), and then to specify a Lipschitz-type condition (Assumption 3) that guarantees that the convergence is uniform.

Assumption 2 For all $x \in X$, w.p.1, $\lim_{N\to\infty} |\hat g_N(x) - g(x)| = 0$.

In Section 4 we discuss two pointwise laws of large numbers, including one in which $\{\xi_i\}$ is neither independently nor identically distributed. The following Lipschitz assumption corresponds to Assumption S-LIP in [11].

Assumption 3 There exists a function $\gamma : \mathbb{R}_+ \to \mathbb{R}$ such that $\lim_{\delta \to 0} \gamma(\delta) = 0$ and, for every $i$, there exists a (random) measurable function $\gamma_i : \Xi_i \to \mathbb{R}$, such that

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{i=1}^N \mathbb{E}[\gamma_i(\xi_i)] < \infty, \qquad (6)$$

and, with probability one,

$$\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^N \big( \gamma_i(\xi_i) - \mathbb{E}[\gamma_i(\xi_i)] \big) = 0, \qquad (7)$$

and, for all $x, x' \in X$ and $i$, with probability one,

$$\left| \frac{G(x, \xi_i)}{\phi_i(\xi_i)} - \frac{G(x', \xi_i)}{\phi_i(\xi_i)} \right| \le \gamma_i(\xi_i)\, \gamma(\|x - x'\|). \qquad (8)$$

Lipschitz-type conditions similar to (8) are common in uniform convergence results (see, for example, [8, 13, 20]). Together with the compactness of the parameter set, this allows for the extension of pointwise results to uniform ones. The Lipschitz constants are allowed to vary from sample to sample to accommodate a greater variety of sampling distributions, so long as they satisfy the regularity conditions given by (6) and (7). For the case of normal distributions, Section 5 presents conditions that are easier to verify than those above.

The next theorem follows from Theorem 3(b) in [1]. It establishes uniform convergence of the estimator $\hat g_N$ to $g$.

Theorem 1 If Assumptions 1, 2, and 3 hold, then, with probability one, $\lim_{N\to\infty} \|\hat g_N - g\|_\infty = 0$.

Next we consider the convergence of the optimal solutions of the sample average approximation (4) to the optimal solutions of the original problem (5). Let $\hat\vartheta_N$ and $\vartheta^*$ denote the optimal objective values of (4) and (5), respectively. Similarly, let $\hat S_N$ and $S^*$ denote the sets of optimal solutions of (4) and (5), respectively. Finally, we define the distance of a point $x \in X$ to a set $B \subseteq X$ as $\mathrm{dist}(x, B) = \inf_{x' \in B} \|x - x'\|$ and the deviation of a set $A \subseteq X$ from the set $B$ as $\mathbb{D}(A, B) = \sup_{x \in A} \mathrm{dist}(x, B)$.

Theorem 2 Suppose that Assumptions 1, 2, and 3 hold, that (i) $G(\cdot, \xi)$ is lower semi-continuous for all $\xi \in \mathbb{R}^d$, and (ii) that there exists an integrable function $Z(\xi)$ such that $|G(x, \xi)| \le Z(\xi)$ for all $x \in X$ and almost all $\xi \in \Xi$.
Further assume that there exists a compact set $C \subseteq X$ such that $S^*$ is nonempty and contained in $C$, and, with probability one, for $N$ large enough, $\hat S_N$ is non-empty and contained in $C$. Then, with probability one, $\lim_{N\to\infty} \hat\vartheta_N = \vartheta^*$ and $\lim_{N\to\infty} \mathbb{D}(\hat S_N, S^*) = 0$.

Having established the uniform convergence in Theorem 1, the proof of Theorem 2 follows closely the proof of Theorem 5.3 in [19]. (See Appendix A.)
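The sup-norm convergence asserted by Theorem 1 can be observed numerically. The sketch below is our own toy example (i.i.d. sampling with $\phi_i = \phi = h$, so all likelihood ratios equal one): for $F(x, \xi) = (x - \xi)^2$ with $\xi \sim N(0, 1)$, the true function is $g(x) = x^2 + 1$, and the maximal error over a grid on the compact set $X = [-2, 2]$ shrinks as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(42)
grid = np.linspace(-2.0, 2.0, 201)       # discretization of the compact set X

def sup_error(N):
    """max over the grid of |g_hat_N(x) - g(x)| for one sample of size N."""
    xi = rng.standard_normal(N)          # iid draws, phi = h = N(0, 1)
    g_hat = np.array([np.mean((x - xi) ** 2) for x in grid])
    g_true = grid ** 2 + 1.0
    return float(np.max(np.abs(g_hat - g_true)))

errors = {N: sup_error(N) for N in (1_000, 100_000)}
```

With a fixed seed the sup-norm error at $N = 100{,}000$ is roughly an order of magnitude smaller than at $N = 1{,}000$, consistent with the usual Monte Carlo rate.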

3 Results When Some Parameters Converge

In this section we consider the situation in which the vector $x$ in the parametric integral (2) is partitioned into optimization variables $y$ and nuisance parameters $z$, writing $x = (y, z)$. We provide results relevant to sample average approximation and optimization over $y$ alone, where the sample average approximations are constructed using a convergent sequence of random values of the $z$ parameters. For example, $z$ may represent estimators of statistical parameters, decisions that are updated and converge over time, etc. Section 6 describes an example in which $z$ corresponds to the iterates of a randomized optimization algorithm.

To be mathematically precise, let us assume that, in the framework established in Section 2, $X = Y \times Z$, where $Y \subseteq \mathbb{R}^{n_y}$ and $Z \subseteq \mathbb{R}^{n_z}$ for some $n_y$ and $n_z$ that sum to $n$. Further suppose there is a sequence of random vectors $\{Z_N\}_{N=1}^\infty$, each $Z_N$ being a $\mathcal{G}$-measurable function from $\Omega$ to $\mathbb{R}^{n_z}$. This sequence need not be adapted to the filtration $\{\mathcal{F}_i\}$. We analyze problems in which this sequence converges to a limiting random variable $Z^*$.

Assumption 4 There exists a random variable $Z^*$ such that $\lim_{N\to\infty} \|Z_N - Z^*\| = 0$ with probability one.

We study the convergence of the sample average approximations $\hat g_N^Z : \Omega \to L^\infty(Y)$ given by $\hat g_N^Z(y) = \frac{1}{N}\sum_{i=1}^N \frac{G(y, Z_N, \xi_i)}{\phi_i(\xi_i)}$ to the function $g^Z : \Omega \to L^\infty(Y)$ given by $g^Z(y) = g(y, Z^*)$. The following result is a generalization of Theorem 1 in this context. Here, Assumptions 1, 2, and 3 refer to $G : X \times \Xi \to \mathbb{R}$ with $X = Y \times Z$ and $x = (y, z) \in Y \times Z$.

Theorem 3 If Assumptions 1, 2, 3, and 4 hold, then, with probability one, $\lim_{N\to\infty} \|\hat g_N^Z - g^Z\|_\infty = 0$.

Proof We have

$$\|\hat g_N^Z - g^Z\|_\infty = \sup_{y \in Y} \left| \frac{1}{N}\sum_{i=1}^N \frac{G(y, Z_N, \xi_i)}{\phi_i(\xi_i)} - g(y, Z^*) \right|$$

$$\le \sup_{y \in Y} \frac{1}{N}\sum_{i=1}^N \left| \frac{G(y, Z_N, \xi_i) - G(y, Z^*, \xi_i)}{\phi_i(\xi_i)} \right| + \sup_{y \in Y} \left| \frac{1}{N}\sum_{i=1}^N \frac{G(y, Z^*, \xi_i)}{\phi_i(\xi_i)} - g(y, Z^*) \right|$$

$$\overset{(8)}{\le} \left( \frac{1}{N}\sum_{i=1}^N \gamma_i(\xi_i) \right) \gamma(\|Z_N - Z^*\|) + \sup_{y \in Y} \left| \frac{1}{N}\sum_{i=1}^N \frac{G(y, Z^*, \xi_i)}{\phi_i(\xi_i)} - g(y, Z^*) \right|. \qquad (9)$$

By Theorem 1, the second term converges to zero. For the first term, we see that

$$\frac{1}{N}\sum_{i=1}^N \gamma_i(\xi_i) = \frac{1}{N}\sum_{i=1}^N \big( \gamma_i(\xi_i) - \mathbb{E}[\gamma_i(\xi_i)] \big) + \frac{1}{N}\sum_{i=1}^N \mathbb{E}[\gamma_i(\xi_i)]$$

where, by Assumption 3, the first term converges to zero and the second term is bounded. Since $Z_N$ converges to $Z^*$, we have from the continuity of $\gamma$ at $0$ that $\gamma(\|Z_N - Z^*\|) \to 0$. Hence, also the first term in (9) converges to zero.

Finally, in analogy to (5) and (4), we consider the optimization problem $\vartheta^Z := \min_{y \in Y} g^Z(y)$ and its sample average approximation $\hat\vartheta_N^Z := \min_{y \in Y} \hat g_N^Z(y)$. Let $S^Z$ and $\hat S_N^Z$ denote the sets of minimizers of $g^Z$ and $\hat g_N^Z$, respectively. Theorem 4 follows from Theorem 3 in the same way that Theorem 2 follows from Theorem 1.

Theorem 4 Suppose that Assumptions 1, 2, 3, and 4 hold, that (i) $G(\cdot, \xi)$ is lower semi-continuous for all $\xi \in \mathbb{R}^d$, and (ii) that there exists an integrable function $Z(\xi)$ such that $|G(y, z, \xi)| \le Z(\xi)$ for all $(y, z) \in Y \times Z$ and almost all $\xi \in \Xi$. Further assume that there exists a compact set $C \subseteq Y$ such that, with probability one, $S^Z$ is non-empty and contained in $C$ and, for $N$ large enough, $\hat S_N^Z$ is non-empty and contained in $C$. Then, with probability one, $\lim_{N\to\infty} \hat\vartheta_N^Z = \vartheta^Z$ and $\lim_{N\to\infty} \mathbb{D}(\hat S_N^Z, S^Z) = 0$.

4 Pointwise Strong Laws of Large Numbers

In this section, we give two examples of theorems that imply the pointwise convergence required in Assumption 2. The first is the well-known strong law of large numbers for independent and identically distributed random variables. It follows, for example, from Theorem 6.1 in [2], using the fact that $\phi$ is the density for $\xi_i$, and therefore $\mathbb{E}\left[ \frac{G(x,\xi_i)}{\phi(\xi_i)} \right] = g(x)$. We however need the following assumption on the measurability of $G(x, \cdot)$.

Assumption 5 For all $x \in X$, $G(x, \cdot)$ is a measurable function on $\mathbb{R}^d$ and $g(x) < \infty$.

Theorem 5 Suppose Assumptions 1 and 5 hold. If $\{\xi_i\}$ are independent and identically distributed (i.e., $\phi_i = \phi$ for all $i$), then for all $x \in X$, with probability one, $\lim_{N\to\infty} |\hat g_N(x) - g(x)| = 0$.

Next we establish a pointwise strong law of large numbers for the case in which $\{\xi_i\}$ are neither independently nor identically distributed.

Assumption 6 There exist non-negative constants $k$ and $b$ such that, with probability one, for all $i$, $x \in X$, and $\xi \in \Xi_i$, $\left| \frac{G(x,\xi)}{\phi_i(\xi)} \right| \le k \exp(b\|\xi\|)$.
Assumptions on the unconditional moment generating function of $F(x, \xi)$ in (1), for each $x \in X$, are common in this type of analysis [6, 11, 21]. In Assumption 7, we focus instead on the moment generating function $M_i$ of the conditional distribution of $\xi_i$ given $\mathcal{F}_{i-1}$, defined as

$$M_i(s) = \mathbb{E}[\exp(s\|\xi_i\|) \mid \mathcal{F}_{i-1}] = \int_{\Xi_i} \exp(s\|\xi\|)\, \phi_i(\xi)\, d\xi.$$

Note that $M_i$ is a random function.

Assumption 7 There exists $\alpha \ge 1$ such that $\sum_{i=1}^\infty i^{-(1+\alpha)}\, \mathbb{E}[M_i(2\alpha b)] < \infty$, where $b$ is as in Assumption 6.

In Section 5 we show that Assumption 7 is satisfied when the densities $\phi_i$ are normal distributions with bounded means.

Theorem 6 Suppose Assumptions 1, 5, 6, and 7 hold. Then for all $x \in X$, with probability one, $\lim_{N\to\infty} |\hat g_N(x) - g(x)| = 0$.

The proof requires a simple relationship that is easy to show.

Lemma 1 Given $a, c \in \mathbb{R}$ and $r \ge 1$, we have $|a + c|^r \le 2^{r-1}\big( |c|^r + |a|^r \big)$.

Proof (Proof of Theorem 6) For a given fixed $x \in X$ and all $i$, define $U_i = \frac{G(x,\xi_i)}{\phi_i(\xi_i)} - g(x)$ and $V_N = \sum_{i=1}^N U_i$, so that $\hat g_N(x) - g(x) = V_N / N$. The claim of the theorem follows from Chow's strong law of large numbers for martingales (see [3]), which states that $V_N / N \to 0$ with probability one. The remainder of this proof verifies that our setting satisfies the conditions for the theorem in [3]. The conditions are that $\{V_N\}$ be a martingale whose increments satisfy Chung's condition (Equation (3.1) in [4]); that is, that there exists $\alpha \ge 1$ such that $\sum_{i=1}^\infty i^{-(1+\alpha)}\, \mathbb{E}[|U_i|^{2\alpha}] < \infty$.

To see that $\{V_N\}$ is a martingale, recall that $\phi_i$ is the conditional density of $\xi_i$, and therefore $\mathbb{E}[U_i \mid \mathcal{F}_{i-1}] = 0$ for all $i$ with probability one. Letting $a = U_i + g(x) = \frac{G(x,\xi_i)}{\phi_i(\xi_i)}$, $c = -g(x)$, and $r = 2\alpha$ in Lemma 1, we find

$$\mathbb{E}\big[ |U_i|^{2\alpha} \big] \le C \left( 1 + \mathbb{E}\left[ \left( \frac{G(x,\xi_i)}{\phi_i(\xi_i)} \right)^{2\alpha} \right] \right),$$

where $C = 2^{2\alpha-1}\big( 1 + |g(x)|^{2\alpha} \big)$. Assumption 6 then yields

$$\mathbb{E}\left[ \left| \frac{G(x, \xi_i)}{\phi_i(\xi_i)} \right|^{2\alpha} \right] = \mathbb{E}\left[ \mathbb{E}\left[ \left| \frac{G(x, \xi_i)}{\phi_i(\xi_i)} \right|^{2\alpha} \,\middle|\, \mathcal{F}_{i-1} \right] \right] \le \mathbb{E}\big[ k^{2\alpha}\, \mathbb{E}[\exp(2\alpha b \|\xi_i\|) \mid \mathcal{F}_{i-1}] \big] = k^{2\alpha}\, \mathbb{E}[M_i(2\alpha b)].$$

Since $1 + \alpha > 1$, we have $\sum_{i=1}^\infty i^{-(1+\alpha)} < \infty$, and with Assumption 7,

$$\sum_{i=1}^\infty i^{-(1+\alpha)}\, \mathbb{E}[|U_i|^{2\alpha}] \le C \sum_{i=1}^\infty i^{-(1+\alpha)} + k^{2\alpha} \sum_{i=1}^\infty i^{-(1+\alpha)}\, \mathbb{E}[M_i(2\alpha b)] < \infty.$$

Hence, Chung's condition holds.

5 Normal Distributions and Smooth Functions

Assumption 3 is stated in very general terms. Now we present specific conditions that are easier to verify. We consider the case in which all density functions correspond to normal distributions with different means $\mu$ and variances $\sigma^2$, so they are of the form

$$\varphi(\mu, \sigma, \xi) = \frac{1}{(2\pi\sigma^2)^{d/2}}\, \exp\left( -\frac{\|\xi - \mu\|^2}{2\sigma^2} \right). \qquad (10)$$
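For densities of the form (10), the likelihood ratio $h(x, \xi)/\phi_i(\xi)$ grows at most subexponentially in $\|\xi\|$ when $\sigma_i \ge \sigma$ and the means are bounded (this is the content of the lemma established in this section). The following sanity check is our own: the constants $\sigma = 1$, $\sigma_i = 1.5$, the bound $M = 2$ on $|x|$ and $|\mu_i|$, and the univariate setting ($d = 1$) are hypothetical choices, and the envelope constants follow the quadratic-expansion argument of the proof.

```python
import itertools
import math

def log_ratio(x, mu, sigma, sigma_i, xi):
    """log( h(x, xi) / phi_i(xi) ) for univariate normal densities of the form (10)."""
    log_h = -0.5 * ((xi - x) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    log_phi = -0.5 * ((xi - mu) / sigma_i) ** 2 - math.log(sigma_i * math.sqrt(2.0 * math.pi))
    return log_h - log_phi

sigma, sigma_i, M = 1.0, 1.5, 2.0   # sigma_i >= sigma; |x|, |mu_i| <= M (hypothetical)
k_bar = M ** 2 / (2.0 * sigma_i ** 2) + math.log(sigma_i / sigma)  # constant of the envelope
b_h = M / sigma ** 2 + M / sigma_i ** 2                            # linear growth rate

centers = [0.1 * k for k in range(-20, 21)]   # x and mu_i ranging over [-M, M]
xis = [0.5 * k for k in range(-100, 101)]     # xi ranging over [-50, 50]
violation = max(
    log_ratio(x, mu, sigma, sigma_i, xi) - (k_bar + b_h * abs(xi))
    for x, mu, xi in itertools.product(centers, centers, xis)
)
# violation <= 0: the log likelihood ratio stays below the linear envelope k_bar + b_h*|xi|
```

The check works because, for $\sigma_i \ge \sigma$, the quadratic term of the log ratio has a non-positive coefficient, so only the linear and constant terms survive in the upper bound.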

Assumption 8 Let $\Xi = \mathbb{R}^d$, and for all $x \in X$ and $\xi \in \mathbb{R}^d$ we have $h(x, \xi) = \varphi(x, \sigma, \xi)$ for some $\sigma > 0$. Furthermore, for all $i$ and $\xi \in \mathbb{R}^d$, we have $\Xi_i = \mathbb{R}^d$ and $\phi_i(\xi) = \varphi(\mu_i, \sigma_i, \xi)$ for some random variables $\mu_i \in \mathbb{R}^d$ and $\sigma_i \ge \sigma$. The sequences $\{\mu_i\}$ and $\{\sigma_i\}$ are uniformly bounded with probability one.

Under this assumption, the moment generating functions

$$M_i(s) = \int \exp(s\|\xi\|)\, \phi_i(\xi)\, d\xi = \int \frac{1}{(2\pi\sigma_i^2)^{d/2}} \exp\left( s\|\xi\| - \frac{\|\xi - \mu_i\|^2}{2\sigma_i^2} \right) d\xi$$

are uniformly bounded for fixed $s$, and Assumption 7 holds (for any values of $\alpha \ge 1$ and $b > 0$). Furthermore, the following lemma establishes that the likelihood ratio has subexponential growth.

Lemma 2 Suppose Assumption 8 holds. Then there exist constants $k_h, b_h \ge 0$ so that $\frac{h(x,\xi)}{\phi_i(\xi)} \le k_h \exp(b_h \|\xi\|)$ for all $i$, $x \in X$, and $\xi \in \Xi$.

Proof Choose any $i$, $x \in X$, and $\xi \in \Xi$. Then

$$\log\left( \frac{h(x, \xi)}{\phi_i(\xi)} \right) \overset{(10)}{=} \frac{d}{2}\log\frac{\sigma_i^2}{\sigma^2} - \frac{\|\xi - x\|^2}{2\sigma^2} + \frac{\|\xi - \mu_i\|^2}{2\sigma_i^2}$$

$$= \frac{d}{2}\log\frac{\sigma_i^2}{\sigma^2} + \left[ \frac{1}{2\sigma_i^2} - \frac{1}{2\sigma^2} \right] \|\xi\|^2 + \left\langle \frac{x}{\sigma^2} - \frac{\mu_i}{\sigma_i^2},\, \xi \right\rangle + \frac{\|\mu_i\|^2}{2\sigma_i^2} - \frac{\|x\|^2}{2\sigma^2}. \qquad (11)$$

By Assumption 8, $\sigma_i \ge \sigma$, and the term in the square brackets is non-positive. Because $X$ is compact and $\mu_i$ is bounded by Assumption 8, there exist positive constants $\bar k$ and $b_h$ so that for all $i$, $x \in X$, and $\xi \in \Xi$, we have $\log\left( \frac{h(x,\xi)}{\phi_i(\xi)} \right) \le \bar k + b_h \|\xi\|$. The claim of Lemma 2 follows with $k_h = \exp(\bar k)$.

We also require some differentiability properties of $F$.

Assumption 9 Suppose that $F$ in (1) is continuously differentiable in $x$ for any $\xi \in \Xi$, and that there exist $k_F, b_F > 0$ so that for any $x \in X$ and $\xi \in \Xi \subseteq \mathbb{R}^d$,

$$|F(x, \xi)| \le k_F \exp(b_F \|\xi\|) \qquad (12a)$$

and

$$\|\nabla_x F(x, \xi)\| \le k_F \exp(b_F \|\xi\|). \qquad (12b)$$

Here, $\nabla_x$ denotes the gradient with respect to $x$.

A consequence of the final proposition is that the claims of Theorems 1, 2, 3, and 4 hold under Assumptions 8 and 9.

Proposition 1 If Assumptions 8 and 9 hold, then Assumptions 1, 2, and 3 hold for $G(x, \xi) = F(x, \xi)\, h(x, \xi)$.

Proof Suppose the assumptions of Proposition 1 hold. Assumption 8 implies Assumption 1, and Assumption 9 implies Assumption 5. We already argued above that Assumption 7 holds because of Assumption 8. Assumption 6 holds, since for any $i$, $x \in X$, and $\xi \in \Xi$,

$$\left| \frac{G(x, \xi)}{\phi_i(\xi)} \right| = |F(x, \xi)|\, \frac{h(x, \xi)}{\phi_i(\xi)} \overset{(12a)}{\le} k_F \exp(b_F\|\xi\|)\, k_h \exp(b_h\|\xi\|),$$

where we used Lemma 2. Therefore, Theorem 6 implies that Assumption 2 holds.

It remains to prove that Assumption 3 is also implied. Note that $\nabla_x h(x, \xi) = \frac{1}{\sigma^2}\, h(x, \xi)\, (\xi - x)$ for all $x, \xi \in \mathbb{R}^d$. Using this and the mean value theorem, we have for all $i$ and $x, x' \in X$ that

$$\frac{G(x, \xi_i)}{\phi_i(\xi_i)} - \frac{G(x', \xi_i)}{\phi_i(\xi_i)} = \frac{1}{\phi_i(\xi_i)}\, \langle \nabla_x G(\tilde x, \xi_i),\, x - x' \rangle = \frac{1}{\phi_i(\xi_i)}\, \langle \nabla_x F(\tilde x, \xi_i)\, h(\tilde x, \xi_i) + F(\tilde x, \xi_i)\, \nabla_x h(\tilde x, \xi_i),\, x - x' \rangle$$

$$= \frac{h(\tilde x, \xi_i)}{\phi_i(\xi_i)}\, \left\langle \nabla_x F(\tilde x, \xi_i) + \tfrac{1}{\sigma^2} F(\tilde x, \xi_i)\, (\xi_i - \tilde x),\, x - x' \right\rangle \qquad (13)$$

for some $\tilde x \in \{\lambda x + (1 - \lambda) x' : \lambda \in (0, 1)\}$. With $M_x = \max\{\|x\| : x \in X\} < \infty$, we find

$$\left\| \nabla_x F(\tilde x, \xi_i) + \tfrac{1}{\sigma^2} F(\tilde x, \xi_i)\, (\xi_i - \tilde x) \right\| \le \frac{\sigma^2 + M_x + 1}{\sigma^2}\, k_F \exp\big( (b_F + 1)\, \|\xi_i\| \big) \qquad (14)$$

where we used Assumption 9 and $\|\xi_i\| \le \exp(\|\xi_i\|)$. Using similar arguments as in (11), we have, with an arbitrary but fixed $\hat x \in X$ and all $i$, that

$$\log\left( \frac{h(\tilde x, \xi_i)}{h(\hat x, \xi_i)} \right) = \frac{1}{2\sigma^2}\left( \|\hat x\|^2 - \|\tilde x\|^2 + 2\, \langle \tilde x - \hat x,\, \xi_i \rangle \right) \le \frac{M_x^2}{2\sigma^2} + \frac{2 M_x}{\sigma^2}\, \|\xi_i\|,$$

so

$$\frac{h(\tilde x, \xi_i)}{h(\hat x, \xi_i)} \le \exp\left( \frac{M_x^2}{2\sigma^2} \right) \exp\left( \frac{2 M_x\, \|\xi_i\|}{\sigma^2} \right).$$

Combining this with (13) and (14), we have

$$\left| \frac{G(x, \xi_i)}{\phi_i(\xi_i)} - \frac{G(x', \xi_i)}{\phi_i(\xi_i)} \right| \le \frac{h(\hat x, \xi_i)}{\phi_i(\xi_i)}\, k_G \exp(b_G \|\xi_i\|)\, \|x - x'\|$$

with $k_G = \frac{(\sigma^2 + M_x + 1)\, k_F}{\sigma^2} \exp\left( \frac{M_x^2}{2\sigma^2} \right)$ and $b_G = b_F + 1 + \frac{2 M_x}{\sigma^2}$. Defining $\gamma_i(\xi_i) = k_G \exp(b_G \|\xi_i\|)\, \frac{h(\hat x, \xi_i)}{\phi_i(\xi_i)}$, it remains to show that (6) and (7) hold.

We are now going to apply Theorem 6 to the function $G_\gamma(\hat x, \xi) = k_G \exp(b_G \|\xi\|)\, h(\hat x, \xi)$ with $X_\gamma = \{\hat x\}$. For this, note that $g_\gamma(\hat x)$, defined as

$$g_\gamma(\hat x) := \int_\Xi G_\gamma(\hat x, \xi)\, d\xi = \int_\Xi \frac{G_\gamma(\hat x, \xi)}{\phi_i(\xi)}\, \phi_i(\xi)\, d\xi = \int_\Xi \gamma_i(\xi)\, \phi_i(\xi)\, d\xi = \mathbb{E}[\gamma_i(\xi_i)],$$

is finite. The last equality follows because $\xi_i$ is sampled from density $\phi_i$. Therefore, (6) holds, and Assumption 5 holds for $G = G_\gamma$. Further consider $\hat g_{\gamma,N}(\hat x) := \frac{1}{N}\sum_{i=1}^N \frac{G_\gamma(\hat x, \xi_i)}{\phi_i(\xi_i)} = \frac{1}{N}\sum_{i=1}^N \gamma_i(\xi_i)$. From the definition

of $G_\gamma$ and Lemma 2, we have for any $\xi \in \Xi$ that $\frac{G_\gamma(\hat x, \xi)}{\phi_i(\xi)} = \frac{h(\hat x, \xi)}{\phi_i(\xi)}\, k_G \exp(b_G\|\xi\|) \le k_h k_G \exp((b_h + b_G)\|\xi\|)$. Therefore, Assumption 6 holds for $G = G_\gamma$, and using Theorem 6 we obtain

$$0 = \lim_{N\to\infty} |\hat g_{\gamma,N}(\hat x) - g_\gamma(\hat x)| = \lim_{N\to\infty} \left| \frac{1}{N}\sum_{i=1}^N \big( \gamma_i(\xi_i) - \mathbb{E}[\gamma_i(\xi_i)] \big) \right|,$$

which is (7).

6 Example: Regression Models for Step Computation in an Optimization Algorithm

As an illustration in which the importance sampling is adaptive and nuisance parameters are present, we consider the randomized optimization algorithm proposed by Maggiar et al. [15], in which a local model of the objective is constructed via an SAA regression problem in every iteration. The algorithm in [15] addresses the minimization of the function $\mathcal{L} : Z \to \mathbb{R}$ given by $\mathcal{L}(z) = \int_\Xi L(\xi)\, h(z, \xi)\, d\xi$, where $Z \subseteq \mathbb{R}^d$ is a compact set, $\Xi = \mathbb{R}^d$, and $h(y, \xi) = \varphi(y, \sigma, \xi)$ is the normal density with mean $y$ and variance $\sigma^2$. The integral is finite because $L : \mathbb{R}^d \to \mathbb{R}$ is assumed to exhibit subexponential growth. $L(\xi)$ is the output of a deterministic computer simulation with input $\xi$ and is the original objective function one would like to minimize. However, since $L$ is subject to numerical noise and therefore discontinuous, the task of minimizing $L$ is ill-defined. To overcome this difficulty, [15] proposes to minimize the convolution $\mathcal{L}$ as a smooth approximation of $L$. The derivative-free trust-region optimization algorithm proposed in [15] utilizes an SAA approximation

$$\mathcal{L}_N(z) = \frac{1}{N}\sum_{i=1}^N L(\xi_i)\, \frac{\varphi(z, \sigma, \xi_i)}{\varphi(T_i, \sigma, \xi_i)}$$

of $\mathcal{L}$. The points $\xi_i$ are sampled randomly according to the normal pdf $\varphi(T_i, \sigma, \cdot)$, where the mean $T_i$ is either an iterate or a trial point encountered by the algorithm up to iteration $i$. Note that the likelihood ratio in the definition of $\mathcal{L}_N(z)$ has the form of that in (3) and therefore falls into our framework.

Given an iterate $z \in Z$, the optimization algorithm generates a trial point as the minimizer of a quadratic model within a ball around $z$. The model has the form $q(\xi; z) = b + \langle g, \xi - z \rangle + \tfrac{1}{2}\langle \xi - z, Q(\xi - z) \rangle$, with coefficients $b \in \mathbb{R}$ and $g \in \mathbb{R}^d$. The matrix $Q \in \mathbb{R}^{d \times d}$ is symmetric, and $q(\xi; z)$ should approximate the simulation output $L(\xi)$ for $\xi$ close to $z$.
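The weighted regression problem that determines the model coefficients is a linear least-squares problem in $(b, g, Q)$ once the likelihood ratios are treated as observation weights. The sketch below is our own one-dimensional toy, not code from [15]: the stand-in simulation output $L$, the past points $T_i$, and all constants are hypothetical, and with $d = 1$ the Hessian coefficient $Q$ is a scalar. It solves the weighted least-squares problem via `np.linalg.lstsq` applied to square-root-weighted data.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, z = 0.5, 1.0                          # sampling std dev and current iterate
L = lambda xi: np.sin(xi) + xi ** 2          # hypothetical stand-in "simulation output"

def npdf(xi, mu, s):
    """Univariate normal density."""
    return np.exp(-0.5 * ((xi - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

N = 20_000
t = rng.uniform(0.5, 1.5, size=N)            # past iterates / trial points T_i
xi = rng.normal(t, sigma)                    # xi_i ~ phi(T_i, sigma, .)
w = npdf(xi, z, sigma) / npdf(xi, t, sigma)  # likelihood ratios phi(z,.)/phi(T_i,.)

# design matrix for the model q(xi; z) = b + g*(xi - z) + 0.5*Q*(xi - z)^2
d = xi - z
A = np.column_stack([np.ones_like(d), d, 0.5 * d ** 2])
sw = np.sqrt(w)                              # weighted least squares via sqrt-weights
(b, g, Q), *_ = np.linalg.lstsq(A * sw[:, None], L(xi) * sw, rcond=None)
# (b, g, Q) approximate the Gaussian-weighted quadratic fit of L around z
```

As the sample size grows, the fitted coefficients approach the minimizers of the Gaussian-weighted regression objective, i.e., roughly a local quadratic approximation of $L$ near $z$; the quadratic part of this toy $L$ is recovered exactly, while the $\sin$ part contributes its Gaussian-smoothed projection.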
Convergence of the optimization algorithm would follow if the model parameters are computed by a weighted local regression of $L$; that is, if $y^* = (b^*, g^*, Q^*)$ are the minimizers of

$$\min_{y \in Y} \int_\Xi F(y, z, \xi)\, h(z, \xi)\, d\xi, \qquad (15)$$

where $F(y, z, \xi) = \big( b + \langle g, \xi - z \rangle + \tfrac{1}{2}\langle \xi - z, Q(\xi - z) \rangle - L(\xi) \big)^2$. This objective function has the form of (1). (In abuse of notation, we collect the model parameters $b$, $g$, and $Q$ in the vector $y$.) To get an approximate solution of (15) at an iterate $Z_N$ (using an upper-case letter to emphasize its stochastic nature), the optimization algorithm computes the quadratic model from the stochastic average approximation of

(15); that is,

$$\min_{y \in Y}\; \frac{1}{N}\sum_{i=1}^N \frac{\varphi(Z_N, \sigma, \xi_i)}{\varphi(T_i, \sigma, \xi_i)}\, \big( b + \langle g, \xi_i - Z_N \rangle + \tfrac{1}{2}\langle \xi_i - Z_N, Q(\xi_i - Z_N) \rangle - L(\xi_i) \big)^2. \qquad (16)$$

The analysis of the algorithm in [15] requires that the model $q(\xi; Z_N)$ converges to the optimal solution of (15) at any limit point $Z^*$ of the iterates $Z_N$. This can be proved using the results in Section 3. For any $\omega \in \Omega$, let $\{Z_{N_j}(\omega)\}_{j=1}^\infty$ be a subsequence of iterates such that $\{Z_{N_j}(\omega)\}_{j=1}^\infty$ converges to a limit point $Z^*(\omega)$. Such a subsequence exists due to compactness of $Z$; thus Assumption 4 holds. Furthermore, since $F(y, z, \xi)$ is a polynomial in $(y, z)$ and $L$ exhibits subexponential growth, Assumption 9 holds. Also, because all iterates and trial points are contained in $Z$, the sequence $\{T_i\}$, consisting of such points, is uniformly bounded. Finally, the algorithm in [15] ensures that the optimal solutions of (15) and (16) are unique and uniformly bounded, by monitoring the condition number of matrices involved in the computation of the optimal solution of (16). In summary, Assumptions 4 and 9 hold, and Proposition 1 together with Theorem 4 yields $\lim_{N\to\infty} \mathbb{D}(\hat S_N^Z, S^Z) = 0$. So, the approximate model parameters in $\hat S_N^Z$ in iteration $N$ converge to the optimal parameters in $S^Z$.

7 Conclusion

We considered the sample average approximation of stochastic optimization problems whose objective function is expressed as a parametric integral. The key contribution is that we permit non-independent, non-identical, and adaptive sampling, where the importance sampling distribution may depend on previous samples. Under the assumption of pointwise convergence and a stochastic Lipschitz condition, we proved uniform convergence of the sample average approximation of the parametric integral over a compact set, as well as convergence of the optimal values and optimal solution sets of the sample average approximation problems as the number of samples goes to infinity.

Acknowledgments We thank Tito Homem-de-Mello, David Morton, Imry Rosenbaum, and Johannes Royset for discussions.

References

1. Andrews, D.W.: Generic uniform convergence. Econometric Theory 8(2) (1992)
2. Billingsley, P.: Probability and Measure, 3rd edn.
John Wiley & Sons (1995)
3. Chow, Y.S.: On a strong law of large numbers for martingales. The Annals of Mathematical Statistics 38 (1967)

4. Chung, K.L.: The strong law of large numbers. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950. University of California Press, Berkeley and Los Angeles (1951)
5. Cornuet, J.M., Marin, J.M., Mira, A., Robert, C.P.: Adaptive multiple importance sampling. Scandinavian Journal of Statistics 39 (2012)
6. Dai, L., Chen, C.H., Birge, J.R.: Convergence properties of two-stage stochastic programming. Journal of Optimization Theory and Applications 106(3) (2000)
7. Dantzig, G.B., Glynn, P.W.: Parallel processors for planning under uncertainty. Annals of Operations Research 22 (1990)
8. Duffie, D., Singleton, K.J.: Simulated moments estimation of Markov models of asset prices (1990)
9. Dupačová, J., Wets, R.: Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. The Annals of Statistics 16 (1988)
10. Glynn, P.W., Infanger, G.: Simulation-based confidence bounds for two-stage stochastic programs. Mathematical Programming 138(1) (2013)
11. Homem-de-Mello, T.: On rates of convergence for stochastic optimization problems under non-independent and identically distributed sampling. SIAM Journal on Optimization 19(2) (2008)
12. Infanger, G.: Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs. Annals of Operations Research 39(1) (1992)
13. Jenish, N., Prucha, I.R.: Central limit theorems and uniform laws of large numbers for arrays of random fields. Journal of Econometrics 150(1) (2009)
14. Korf, L., Wets, R.J.B.: Random LSC functions: An ergodic theorem. Mathematics of Operations Research 26(2) (2001)
15. Maggiar, A., Wächter, A., Dolinskaya, I.S., Staum, J.: A derivative-free trust-region algorithm for the optimization of functions smoothed via Gaussian convolution using multiple importance sampling (2015). Optimization Online preprint, HTML/05/07/507.html
16. Marin, J.M., Pudlo, P., Sedki, M.: Consistency of the adaptive multiple importance sampling (2014). arXiv:1211.2548
17. Royset, J.O., Polak, E.: Implementable algorithm for stochastic optimization using sample average approximations.
Journal of Optimization Theory and Applications 122(1) (2004)
18. Rubinstein, R.Y., Kroese, D.P.: Simulation and the Monte Carlo Method, 3rd edn. John Wiley & Sons (2017)
19. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)
20. Shapiro, A., Xu, H.: Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. Journal of Mathematical Analysis and Applications 325 (2007)
21. Xu, H.: Uniform exponential convergence of sample average random functions under general sampling with applications in stochastic programming. Journal of Mathematical Analysis and Applications 368 (2010)

A Proof of Theorem 2

We establish the result in two lemmas.

Lemma 3 Suppose Assumptions 1, 2, and 3 hold. Further assume that $S^*$ is not empty and that, with probability one, $\hat S_N$ is non-empty for all sufficiently large $N$. Then $\lim_{N\to\infty} \hat\vartheta_N = \vartheta^*$ with probability one.

Proof We prove $\lim_{N\to\infty} \hat\vartheta_N = \vartheta^*$ in the event that $\hat S_N$ is non-empty for all sufficiently large $N$ and that $\lim_{N\to\infty} \|\hat g_N - g\|_\infty = 0$. This event has probability one by assumption and by Theorem 1.

Let $x^*$ be an optimal solution of (5). Because $\lim_{N\to\infty} \|\hat g_N - g\|_\infty = 0$, $\lim_{N\to\infty} \hat g_N(x^*) = g(x^*) = \vartheta^*$. Since $\hat\vartheta_N$ is the optimal value of (4), $\hat\vartheta_N \le \hat g_N(x^*)$ for all $N$. As a consequence, $\limsup_{N\to\infty} \hat\vartheta_N \le \vartheta^*$. Define $\hat\vartheta_{\inf} = \liminf_{N\to\infty} \hat\vartheta_N$. There exist a subsequence $\{N_j\}$ of the natural numbers and a sequence $\{x_j\}_{j=1}^\infty$ of points in $X$ such that, for every $j = 1, 2, \dots$, $x_j \in \hat S_{N_j}$, and $\lim_{j\to\infty} \hat g_{N_j}(x_j) = \hat\vartheta_{\inf}$. Because $\lim_{N\to\infty} \|\hat g_N - g\|_\infty = 0$, we also have $\lim_{j\to\infty} g(x_j) = \hat\vartheta_{\inf}$. Since $\vartheta^*$ is the optimal value of (5), $\vartheta^* \le g(x_j)$ for all $j$. Therefore $\vartheta^* \le \hat\vartheta_{\inf}$. Overall, we have obtained $\limsup_{N\to\infty} \hat\vartheta_N \le \vartheta^* \le \liminf_{N\to\infty} \hat\vartheta_N$.

Lemma 4 Suppose the assumptions of Theorem 2 hold. Then, w.p.1, $\lim_{N\to\infty} \mathbb{D}(\hat S_N, S^*) = 0$.

Proof We prove $\lim_{N\to\infty} \mathbb{D}(\hat S_N, S^*) = 0$ in the event that $\lim_{N\to\infty} \|\hat g_N - g\|_\infty = 0$, $\lim_{N\to\infty} \hat\vartheta_N = \vartheta^*$, and $\hat S_N$ is non-empty and contained in $C$ for all sufficiently large $N$. This event has probability one by Theorem 1, by Lemma 3, and by assumption.

Consider any subsequence $\{N_j\}$ of the natural numbers and sequence $\{x_j\}_{j=1}^\infty$ of points in $X$ such that, for every $j = 1, 2, \dots$, $x_j \in \hat S_{N_j}$. Because $C$ is compact, the sequence $\{x_j\}$ has a limit point. Consider any such limit point, and denote it as $\bar x$. Consider any subsequence $\{N_{j_\ell}\}$ of $\{N_j\}$ such that $\lim_{\ell\to\infty} x_{j_\ell} = \bar x$. For any $\ell$,

$$\hat\vartheta_{N_{j_\ell}} - g(\bar x) = \hat g_{N_{j_\ell}}(x_{j_\ell}) - g(\bar x) = \big( \hat g_{N_{j_\ell}}(x_{j_\ell}) - g(x_{j_\ell}) \big) + \big( g(x_{j_\ell}) - g(\bar x) \big).$$

It follows from assumptions (i) and (ii) in Theorem 2 and Theorem 7.47 in [19] that $g$ is lower semi-continuous, which in turn implies that $\liminf_{\ell\to\infty} \big( g(x_{j_\ell}) - g(\bar x) \big) \ge 0$. We also have $\lim_{\ell\to\infty} \big( \hat g_{N_{j_\ell}}(x_{j_\ell}) - g(x_{j_\ell}) \big) = 0$ since $\lim_{N\to\infty} \|\hat g_N - g\|_\infty = 0$. Therefore $\liminf_{\ell\to\infty} \hat\vartheta_{N_{j_\ell}} \ge g(\bar x)$. We also have $\lim_{N\to\infty} \hat\vartheta_N = \vartheta^*$. Thus, $g(\bar x) \le \vartheta^*$, which implies $\bar x \in S^*$. In words: if $\bar x$ is a limit point of a sequence $\{x_j\}$ of points that are optimal solutions of a sequence of sample average approximation problems given by (4), then $\bar x$ is in $S^*$. Therefore,

$$\limsup_{N\to\infty} \mathbb{D}(\hat S_N, S^*) = \limsup_{N\to\infty} \sup_{x \in \hat S_N} \mathrm{dist}(x, S^*) = 0.$$


More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1 On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool

More information

Strong Markov property: Same assertion holds for stopping times τ.

Strong Markov property: Same assertion holds for stopping times τ. Brownan moton Let X ={X t : t R + } be a real-valued stochastc process: a famlty of real random varables all defned on the same probablty space. Defne F t = nformaton avalable by observng the process up

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Lecture 4 Hypothesis Testing

Lecture 4 Hypothesis Testing Lecture 4 Hypothess Testng We may wsh to test pror hypotheses about the coeffcents we estmate. We can use the estmates to test whether the data rejects our hypothess. An example mght be that we wsh to

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Basic Statistical Analysis and Yield Calculations

Basic Statistical Analysis and Yield Calculations October 17, 007 Basc Statstcal Analyss and Yeld Calculatons Dr. José Ernesto Rayas Sánchez 1 Outlne Sources of desgn-performance uncertanty Desgn and development processes Desgn for manufacturablty A general

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

A note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights

A note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 7, Number 2, December 203 Avalable onlne at http://acutm.math.ut.ee A note on almost sure behavor of randomly weghted sums of φ-mxng

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Finding Dense Subgraphs in G(n, 1/2)

Finding Dense Subgraphs in G(n, 1/2) Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng

More information

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Appendix B. Criterion of Riemann-Stieltjes Integrability

Appendix B. Criterion of Riemann-Stieltjes Integrability Appendx B. Crteron of Remann-Steltes Integrablty Ths note s complementary to [R, Ch. 6] and [T, Sec. 3.5]. The man result of ths note s Theorem B.3, whch provdes the necessary and suffcent condtons for

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

Convergence of random processes

Convergence of random processes DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large

More information

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14 APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Supplement to Clustering with Statistical Error Control

Supplement to Clustering with Statistical Error Control Supplement to Clusterng wth Statstcal Error Control Mchael Vogt Unversty of Bonn Matthas Schmd Unversty of Bonn In ths supplement, we provde the proofs that are omtted n the paper. In partcular, we derve

More information

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution.

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution. Solutons HW #2 Dual of general LP. Fnd the dual functon of the LP mnmze subject to c T x Gx h Ax = b. Gve the dual problem, and make the mplct equalty constrants explct. Soluton. 1. The Lagrangan s L(x,

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

Exercises of Chapter 2

Exercises of Chapter 2 Exercses of Chapter Chuang-Cheh Ln Department of Computer Scence and Informaton Engneerng, Natonal Chung Cheng Unversty, Mng-Hsung, Chay 61, Tawan. Exercse.6. Suppose that we ndependently roll two standard

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Mamum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models for

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Exercise Solutions to Real Analysis

Exercise Solutions to Real Analysis xercse Solutons to Real Analyss Note: References refer to H. L. Royden, Real Analyss xersze 1. Gven any set A any ɛ > 0, there s an open set O such that A O m O m A + ɛ. Soluton 1. If m A =, then there

More information

Supplementary material: Margin based PU Learning. Matrix Concentration Inequalities

Supplementary material: Margin based PU Learning. Matrix Concentration Inequalities Supplementary materal: Margn based PU Learnng We gve the complete proofs of Theorem and n Secton We frst ntroduce the well-known concentraton nequalty, so the covarance estmator can be bounded Then we

More information

Lossy Compression. Compromise accuracy of reconstruction for increased compression.

Lossy Compression. Compromise accuracy of reconstruction for increased compression. Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Lecture 4: Universal Hash Functions/Streaming Cont d

Lecture 4: Universal Hash Functions/Streaming Cont d CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

Hidden Markov Models

Hidden Markov Models Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Gaussian Mixture Models

Gaussian Mixture Models Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Inductance Calculation for Conductors of Arbitrary Shape

Inductance Calculation for Conductors of Arbitrary Shape CRYO/02/028 Aprl 5, 2002 Inductance Calculaton for Conductors of Arbtrary Shape L. Bottura Dstrbuton: Internal Summary In ths note we descrbe a method for the numercal calculaton of nductances among conductors

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

Classification as a Regression Problem

Classification as a Regression Problem Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

Goodness of fit and Wilks theorem

Goodness of fit and Wilks theorem DRAFT 0.0 Glen Cowan 3 June, 2013 Goodness of ft and Wlks theorem Suppose we model data y wth a lkelhood L(µ) that depends on a set of N parameters µ = (µ 1,...,µ N ). Defne the statstc t µ ln L(µ) L(ˆµ),

More information

Error Probability for M Signals

Error Probability for M Signals Chapter 3 rror Probablty for M Sgnals In ths chapter we dscuss the error probablty n decdng whch of M sgnals was transmtted over an arbtrary channel. We assume the sgnals are represented by a set of orthonormal

More information

Perfect Competition and the Nash Bargaining Solution

Perfect Competition and the Nash Bargaining Solution Perfect Competton and the Nash Barganng Soluton Renhard John Department of Economcs Unversty of Bonn Adenauerallee 24-42 53113 Bonn, Germany emal: rohn@un-bonn.de May 2005 Abstract For a lnear exchange

More information

An Analysis of a Least Squares Regression Method for American Option Pricing

An Analysis of a Least Squares Regression Method for American Option Pricing An Analyss of a Least Squares Regresson Method for Amercan Opton Prcng Emmanuelle Clément Damen Lamberton Phlp Protter Revsed verson, December 200 Abstract Recently, varous authors proposed Monte-Carlo

More information

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise. Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where y + = β + β e for =,..., y and are observable varables e s a random error How can an estmaton rule be constructed for the

More information

REAL ANALYSIS I HOMEWORK 1

REAL ANALYSIS I HOMEWORK 1 REAL ANALYSIS I HOMEWORK CİHAN BAHRAN The questons are from Tao s text. Exercse 0.0.. If (x α ) α A s a collecton of numbers x α [0, + ] such that x α

More information

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning Journal of Machne Learnng Research 00-9 Submtted /0; Publshed 7/ Erratum: A Generalzed Path Integral Control Approach to Renforcement Learnng Evangelos ATheodorou Jonas Buchl Stefan Schaal Department of

More information

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Research Article Green s Theorem for Sign Data

Research Article Green s Theorem for Sign Data Internatonal Scholarly Research Network ISRN Appled Mathematcs Volume 2012, Artcle ID 539359, 10 pages do:10.5402/2012/539359 Research Artcle Green s Theorem for Sgn Data Lous M. Houston The Unversty of

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference

More information

Research Article. Almost Sure Convergence of Random Projected Proximal and Subgradient Algorithms for Distributed Nonsmooth Convex Optimization

Research Article. Almost Sure Convergence of Random Projected Proximal and Subgradient Algorithms for Distributed Nonsmooth Convex Optimization To appear n Optmzaton Vol. 00, No. 00, Month 20XX, 1 27 Research Artcle Almost Sure Convergence of Random Projected Proxmal and Subgradent Algorthms for Dstrbuted Nonsmooth Convex Optmzaton Hdea Idua a

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before

More information

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y

More information

e i is a random error

e i is a random error Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown

More information

Why Monte Carlo Integration? Introduction to Monte Carlo Method. Continuous Probability. Continuous Probability

Why Monte Carlo Integration? Introduction to Monte Carlo Method. Continuous Probability. Continuous Probability Introducton to Monte Carlo Method Kad Bouatouch IRISA Emal: kad@rsa.fr Wh Monte Carlo Integraton? To generate realstc lookng mages, we need to solve ntegrals of or hgher dmenson Pel flterng and lens smulaton

More information

Lecture 4: November 17, Part 1 Single Buffer Management

Lecture 4: November 17, Part 1 Single Buffer Management Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input

More information

6) Derivatives, gradients and Hessian matrices

6) Derivatives, gradients and Hessian matrices 30C00300 Mathematcal Methods for Economsts (6 cr) 6) Dervatves, gradents and Hessan matrces Smon & Blume chapters: 14, 15 Sldes by: Tmo Kuosmanen 1 Outlne Defnton of dervatve functon Dervatve notatons

More information

Probability Theory (revisited)

Probability Theory (revisited) Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted

More information

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS BOUNDEDNESS OF THE IESZ TANSFOM WITH MATIX A WEIGHTS Introducton Let L = L ( n, be the functon space wth norm (ˆ f L = f(x C dx d < For a d d matrx valued functon W : wth W (x postve sem-defnte for all

More information