Confidence regions for stochastic variational inequalities

Shu Lu, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, 355 Hanes Building, CB#326, Chapel Hill, NC 27599-326, email: shulu@email.unc.edu, http://www.unc.edu/~shulu/

Amarjit Budhiraja, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, 357 Hanes Building, CB#326, Chapel Hill, NC 27599-326, email: budhiraj@email.unc.edu, http://www.unc.edu/~budhiraj/

The sample average approximation (SAA) method is a basic approach for solving stochastic variational inequalities (SVI). It is well known that under appropriate conditions the SAA solutions provide asymptotically consistent point estimators for the true solution to an SVI. It is of fundamental interest to use such point estimators along with suitable central limit results to develop confidence regions of prescribed level of significance for the true solution. However, standard procedures are not applicable, since the central limit theorem that governs the asymptotic behavior of SAA solutions involves a discontinuous function evaluated at the true solution of the SVI. This paper overcomes this difficulty by exploiting the precise geometric structure of the variational inequalities and by appealing to certain large deviations probability estimates, and proposes a method to build asymptotically exact confidence regions for the true solution that are computable from the SAA solutions. We justify this method theoretically by establishing a precise limit theorem, apply it to complementarity problems, and test it with a linear complementarity problem.
Key words: variational inequalities; stochastic variational inequalities; central limit theorems in Banach spaces; large deviations; confidence regions; statistical inference

MSC2000 Subject Classification: Primary: 90C33; Secondary: 90C15

OR/MS subject classification: Primary: Programming: complementarity, Programming: stochastic; Secondary: Programming: nonlinear: theory

1. Introduction. This paper studies the problem of constructing confidence regions for the solution to a stochastic variational inequality defined over a polyhedral convex set. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $\xi$ be a random vector that is defined on $\Omega$ and supported on a closed subset $\Xi$ of $\mathbb{R}^d$. Let $O$ be an open subset of $\mathbb{R}^n$, and $F$ be a measurable function from $O \times \Xi$ to $\mathbb{R}^n$, such that for each $x \in O$ the expectation $E\|F(x,\xi)\| < \infty$. Let $S$ be a polyhedral convex set in $\mathbb{R}^n$ defined by
$$S = \{x \in \mathbb{R}^n : Ax \le b\} = \{x \in \mathbb{R}^n : \langle a_i, x \rangle \le b_i,\ i = 1, \dots, m\}, \qquad (1)$$
where $A$ is an $m \times n$ matrix whose rows are given by $a_1^T, \dots, a_m^T$, and $b = (b_1, \dots, b_m)$ is a column vector in $\mathbb{R}^m$. Suppose that $S \subset O$, and let $f_0 : O \to \mathbb{R}^n$ be defined as
$$f_0(x) = E[F(x, \xi)]. \qquad (2)$$
The stochastic variational inequality (SVI) problem is to find
$$x \in S \cap O, \text{ such that } -f_0(x) \in N_S(x), \qquad (3)$$
where $N_S(x) \subset \mathbb{R}^n$ denotes the normal cone to $S$ at $x$:
$$N_S(x) = \{v \in \mathbb{R}^n : \langle v, s - x \rangle \le 0 \text{ for each } s \in S\}.$$
The normal cone $N_S(x)$ is the polar cone of the tangent cone to $S$ at $x$, which is given by
$$T_S(x) = \{u \in \mathbb{R}^n : \langle a_i, u \rangle \le 0,\ i \in I(x)\},$$
where $I(x)$ denotes the active index set at $x$, i.e., the set of all indices $i \in \{1, \dots, m\}$ such that $\langle a_i, x \rangle = b_i$. Indeed, the function $f_0$ defined in (2) is deterministic, and the problem (3) is essentially a deterministic variational inequality. However, evaluating $f_0(x)$ for each given value of $x$ requires finding the expected value of a random vector, which in most problems of interest does not have a closed form expression and in general requires a numerical approximation. Such approximations are usually provided by sampling.
Depending on how sampling is incorporated into the algorithm, solution methods for SVIs can be classified into two basic categories. The first category consists of the stochastic approximation (SA) methods, which perform sampling in an interior manner, by applying an algorithm for deterministic variational inequalities and resorting to sampling whenever the algorithm requires values or gradients of $f_0$ at given points. The second category corresponds to the sample average approximation (SAA) methods, which sample in an exterior manner. These methods replace $f_0$ in (3) by a sample average function to obtain the SAA problem, and then use a solution to the SAA problem as an estimate of a solution to the true problem. In this paper, we will focus on solutions obtained by SAA methods, and will use these solutions to develop asymptotically exact confidence regions for the true solutions.

To formally define the SAA problem, let $\xi_1, \dots, \xi_N$ be independent and identically distributed (i.i.d.) random variables with distribution same as that of $\xi$. Define the sample average function $f_N : O \times \Omega \to \mathbb{R}^n$ by
$$f_N(x, \omega) = N^{-1} \sum_{i=1}^N F(x, \xi_i(\omega)). \qquad (4)$$
The SAA problem is then to find $x \in S \cap O$ such that
$$-f_N(x, \omega) \in N_S(x). \qquad (5)$$
For brevity, we will write $f_N(x, \omega)$ as $f_N(x)$ when clear from the context. Also, we refer to a solution to (5) as an SAA solution, and a solution to (3) as a true solution.

SAA methods are known to be consistent. That is, under certain regularity conditions, SAA solutions will almost surely converge to a true solution as the sample size $N$ goes to $\infty$; see Gürkan, Özge and Robinson [4], King and Rockafellar [5], and Shapiro, Dentcheva and Ruszczyński [18, Section 5.2.1]. Moreover, [5, Theorem 2.7] and [18, Section 5.2.2] obtained the asymptotic distribution of SAA solutions. These papers showed that the difference between the SAA solution and the true solution, normalized by $N^{1/2}$, weakly converges to a random vector, which is the image of a normally distributed random vector under a certain function. From a different viewpoint, Xu [2] showed that the probability for the distance between the SAA solution and the set of true solutions to exceed any given number $\epsilon$ is no more than $c e^{-N\beta}$, where $c$ and $\beta$ are parameters depending on $\epsilon$.
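The SAA construction (4)–(5) is easy to illustrate on a one-dimensional example. The following Python sketch uses our own toy problem, not one from the paper: $S = [0, \infty)$, $F(x, \xi) = x - \xi$ with $\xi \sim N(1, 1)$, so the true solution is $x_0 = \max(E\xi, 0) = 1$ and the SAA solution has the closed form $x_N = \max(\bar\xi_N, 0)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = 1.0                       # E[xi]; the true solution is max(mu_true, 0) = 1.0

def saa_solution(samples):
    """SAA solution of the 1-D stochastic LCP: x >= 0, f_N(x) >= 0, x * f_N(x) = 0,
    with F(x, xi) = x - xi, so f_N(x) = x - mean(samples)."""
    return max(samples.mean(), 0.0)

x_true = max(mu_true, 0.0)
for N in [10, 1000, 100000]:
    xi = rng.normal(mu_true, 1.0, size=N)
    x_N = saa_solution(xi)
    print(N, x_N, abs(x_N - x_true))    # error shrinks as N grows (consistency)
```

The errors printed for increasing $N$ illustrate the almost sure convergence of SAA solutions discussed above.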
In other words, SAA solutions converge to the true solution in probability at an exponential rate.

The above results show that SAA solutions provide good point estimators for the true solutions when $N$ is large. It is of fundamental interest to use such point estimators along with the associated central limit theory to develop confidence regions of prescribed level of significance for the true solution. One can obtain an expression for confidence regions of the true solution based on the asymptotic distribution of SAA solutions (see Demir [2]). However, such an expression is not directly usable for specifying confidence regions, because it contains a function that depends discontinuously on the true solution. Due to this discontinuity, it is problematic to replace the true solution in that expression by the SAA solution, as such a replacement may result in a region with probability quite different from the desired confidence level. To overcome this difficulty, we design a sequence of functions that depend on the $N$-sample SAA solution. Using certain large deviations probability estimates (see Theorem 4.2 and equation (24)), these functions evaluated at the SAA solutions are shown to converge, as $N \to \infty$, to the above discontinuous function evaluated at the true solution (see Corollary 5.1). This enables us to build confidence regions for the true solution that are computable from the SAA solutions. Development of this method is based on a close examination of the geometric structure of variational inequalities and the limiting behavior of SAA solutions. We justify this method theoretically by establishing a precise limit theorem (Theorem 3.1), apply it to complementarity problems, and test it with a linear complementarity problem.

The paper is organized as follows. Section 2 below introduces some background about variational inequalities. Next, Section 3 presents the main result of this paper.
Following that, Section 4 summarizes probability results used in this paper, and Section 5 develops and justifies the main method. Section 6 then specializes the method to complementarity problems and implements it in a numerical example. The appendix contains proofs of two theorems.

The method developed in this paper deals with situations in which the true problem (3) has a locally unique solution. If the true solution is not locally unique, then the current method will not work and new ideas will be needed.

Throughout this paper, we use $\operatorname{ri} C$ to denote the relative interior of a convex set $C$. For a convex and closed set $C \subset \mathbb{R}^n$ and a point $z \in \mathbb{R}^n$, $\Pi_C(z)$ denotes the Euclidean projection of $z$ onto $C$, namely the point in $C$ nearest to $z$ in Euclidean norm. We use $\|\cdot\|$ to denote the norm of an element in a normed space; unless explicitly stated otherwise, it can be any norm, as long as the same norm is used in all related contexts. We use $N(0, \Sigma)$ to denote a normal random vector with mean zero and covariance matrix $\Sigma$. Weak convergence of random variables $Y_n$ to $Y$ will be denoted as $Y_n \Rightarrow Y$.

2. Preliminaries. This section contains preliminary results that form the foundation for subsequent developments.

2.1 The normal map and normal manifold. Throughout this paper, we will use the normal map formulation of a variational inequality. This subsection introduces this formulation and related concepts. Given the function $f_0$ and the set $S$ as defined in the introduction, the normal map induced by $f_0$ and $S$ is a function $(f_0)_S : \Pi_S^{-1}(O) \to \mathbb{R}^n$, defined as
$$(f_0)_S(z) = f_0(\Pi_S(z)) + (z - \Pi_S(z)) \qquad (6)$$
for each $z \in \Pi_S^{-1}(O)$, where we recall that $\Pi_S(z)$ denotes the Euclidean projection of a point $z$ on $S$, and $\Pi_S^{-1}(O)$ is the set of points $z \in \mathbb{R}^n$ such that $\Pi_S(z) \in O$. Suppose for now that $x_0$ is a solution of (3), and define $z_0 = x_0 - f_0(x_0)$. It follows that $\Pi_S(z_0) = x_0$ and that
$$z_0 \in \Pi_S^{-1}(O), \quad (f_0)_S(z_0) = 0. \qquad (7)$$
Conversely, if $z_0$ satisfies (7), then $x_0 = \Pi_S(z_0)$ satisfies $x_0 - f_0(x_0) = z_0$ and solves (3). Thus, equation (7) is an equivalent formulation of (3).

In the following, we introduce the concept of the normal manifold. For detailed discussion on this, see [8, 12, 16]. The set $S$ is polyhedral convex by hypothesis, so it has finitely many faces. Let $\mathcal{F}$ be the collection of all of its nonempty faces. On the relative interior of each nonempty face $F$, the normal cone to $S$ is a constant cone, which we denote by $N_S(\operatorname{ri} F)$. For each $F \in \mathcal{F}$, define
$$C_S(F) = F + N_S(\operatorname{ri} F). \qquad (8)$$
The sets $C_S(F)$ satisfy the following properties.

(i) For each $F \in \mathcal{F}$, the set $C_S(F)$ is a polyhedral convex set of full dimension (i.e., of dimension $n$).
(ii) For each $F_1 \in \mathcal{F}$ and $F_2 \in \mathcal{F}$, the sets $C_S(F_1)$ and $C_S(F_2)$ intersect in a common face (possibly empty).

(iii) $\bigcup_{F \in \mathcal{F}} C_S(F) = \mathbb{R}^n$.

The collection of all of these sets $C_S(F)$ is called the normal manifold of $S$, and each $C_S(F)$ is called an $n$-cell in this normal manifold [12]. (The symbol $n$ here refers to the dimension of these sets.) A $k$-dimensional face of an $n$-cell is called a $k$-cell in the normal manifold. Any $k$-cell, $k = 0, \dots, n$, is called a cell in the normal manifold. On each $n$-cell $C_S(F)$, the Euclidean projector $\Pi_S$ coincides with the Euclidean projector onto the affine hull of $F$ (the affine hull of $F$ is the smallest affine set containing $F$). The latter projector is an affine function. Consequently, $\Pi_S$ is a piecewise affine function on $\mathbb{R}^n$. The following example illustrates the above concepts with a set in $\mathbb{R}^2$.

Example 2.1 Let $n = 2$, $m = 3$, and $S = \{x \in \mathbb{R}^2 : x_1 + x_2 \le 1,\ x_1 \ge 0,\ x_2 \ge 0\}$. The set $S$ has 7 nonempty faces, including 3 vertices, 3 edges and itself. Its normal manifold contains seven 2-cells. The 2-cell corresponding to the edge joining $(0,0)$ and $(0,1)$ is the set $\{x \in \mathbb{R}^2 : x_1 \le 0,\ 0 \le x_2 \le 1\}$. On this cell, $\Pi_S$ coincides with the Euclidean projector onto the $x_2$ axis.

2.2 Sensitivity of solutions to variational inequalities. Sensitivity analysis techniques for variational inequalities play a key role in understanding the behavior of SAA solutions. Theorem 2.1 below deals with sensitivity of solutions of a parametric variational inequality. Before stating this theorem, we provide definitions of some related concepts. For more details and background we refer the reader to [1, 11].

Let $X$, $Y$ and $Z$ be normed linear spaces, and let $U$ and $V$ be open subsets of $X$ and $Y$ respectively. The injectivity modulus of a function $g : U \to Z$ on $U$ is defined to be
$$\inf \left\{ \frac{\|g(x_1) - g(x_2)\|}{\|x_1 - x_2\|} : x_1 \ne x_2,\ x_1, x_2 \in U \right\}.$$
We say that $g$ is B-differentiable at a point $x_0 \in U$ if there is a positively homogeneous function $dg(x_0) : X \to Z$ such that
$$g(x_0 + v) = g(x_0) + dg(x_0)v + o(v).$$
Recall that a function $dg(x_0)$ is positively homogeneous if $dg(x_0)(\lambda v) = \lambda\, dg(x_0)(v)$ for each nonnegative real number $\lambda$ and each $v \in X$. If $dg(x_0)$ is a bounded linear function, then $g$ is Fréchet differentiable at $x_0$ and $dg(x_0)$ is the Fréchet derivative of $g$ at $x_0$. A function $h : U \times V \to Z$ is partially B-differentiable in $x$ at $(x_0, y_0) \in U \times V$ if the function $h(\cdot, y_0)$ is B-differentiable at $x_0$. The partial B-derivative is denoted by $d_x h(x_0, y_0)$. We say the partial B-derivative $d_x h(x_0, y_0)$ is strong if for each $\epsilon > 0$ there exist neighborhoods $U'$ of $x_0$ in $X$ and $V'$ of $y_0$ in $Y$ such that
$$\|h(x, y) - h(x', y) - d_x h(x_0, y_0)(x - x')\| \le \epsilon \|x - x'\|$$
whenever $x$ and $x'$ belong to $U'$ and $y$ belongs to $V'$. A partial Fréchet derivative is strong if it satisfies the same condition as above.

Theorem 2.1 below is adapted from Theorems 3 and 4 of [13]. Sets $O$ and $S$ in this theorem are as defined at the beginning of this paper, and the normal map $L_K$ induced by a function $L$ and a set $K$ is defined in the same way as $(f_0)_S$ is in (6), with $L$ and $K$ in place of $f_0$ and $S$ respectively. The neighborhoods $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ constructed in this theorem may depend on $\lambda$, with larger values of $\lambda$ allowing for larger sizes of these neighborhoods.

Theorem 2.1 Let $\Theta$ be an open subset of a normed linear space $P$, and $h$ be a function from $O \times \Theta$ to $\mathbb{R}^n$. Let $y_0 \in \Theta$ and $z_0 \in \mathbb{R}^n$, and define $x_0 = \Pi_S(z_0)$. Suppose that $x_0 \in O$ and that $h(\cdot, y_0)_S(z_0) = 0$, where $h(\cdot, y_0)_S$ denotes the normal map induced by $h(\cdot, y_0)$ and $S$.
Assume that:

(i) For some positive number $\theta$ and each $x \in O$, $h(x, \cdot)$ is Lipschitz on $\Theta$ with modulus $\theta$.

(ii) $h$ has a strong partial Fréchet derivative in $x$ at $(x_0, y_0)$, denoted by $L = d_x h(x_0, y_0)$.

(iii) The normal map $L_K$, induced by $L$ and the critical cone $K = T_S(x_0) \cap \{z_0 - x_0\}^\perp$, is a homeomorphism from $\mathbb{R}^n$ to $\mathbb{R}^n$.

Then the normal map $L_K$ has a positive injectivity modulus $\delta$ on $\mathbb{R}^n$. Moreover, for each $\lambda > \delta^{-1}\theta$ there exist neighborhoods $\mathcal{X}$ of $x_0$ in $O$, $\mathcal{Y}$ of $y_0$ in $\Theta$ and $\mathcal{Z}$ of $z_0$ in $\mathbb{R}^n$, and a function $z : \mathcal{Y} \to \mathbb{R}^n$, such that:

(i) $z(y_0) = z_0$.

(ii) For each $y \in \mathcal{Y}$, $z(y)$ is the unique point in $\mathcal{Z}$ satisfying $h(\cdot, y)_S(z(y)) = 0$, and $x(y) = \Pi_S(z(y))$ is the unique point in $S \cap \mathcal{X}$ satisfying $-h(x(y), y) \in N_S(x(y))$.

(iii) $z$ is Lipschitz on $\mathcal{Y}$ with modulus $\lambda$.

Moreover, if $h$ has a partial B-derivative $d_y h(x_0, y_0)$ in $y$ at $(x_0, y_0)$, then the functions $z(y)$ and $x(y)$ are B-differentiable at $y_0$ with
$$dz(y_0) = (L_K)^{-1} \circ [-d_y h(x_0, y_0)] \quad \text{and} \quad dx(y_0) = \Pi_K \circ dz(y_0).$$

3. Main result. In this section we present the main result of this work. We begin by introducing the assumptions we make. In the rest of this paper, let $X$ be a nonempty compact subset of $O$. We use $C^1(X, \mathbb{R}^n)$ to denote the Banach space of continuously differentiable mappings $f : X \to \mathbb{R}^n$, equipped with the norm
$$\|f\|_{1,X} = \sup_{x \in X} \|f(x)\| + \sup_{x \in X} \|df(x)\|. \qquad (9)$$
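Returning to the geometry of Section 2, the piecewise-affine projector $\Pi_S$ of Example 2.1 and the normal map (6) can be checked numerically. Below is a minimal Python sketch; the brute-force edge-enumeration projector, valid only for this particular triangle, is our own illustrative implementation, not part of the paper's method.

```python
import numpy as np

# Triangle S = {x in R^2 : x1 + x2 <= 1, x1 >= 0, x2 >= 0} from Example 2.1.
V = [np.array([0., 0.]), np.array([1., 0.]), np.array([0., 1.])]

def proj_segment(z, a, b):
    """Euclidean projection of z onto the segment [a, b]."""
    t = np.clip(np.dot(z - a, b - a) / np.dot(b - a, b - a), 0.0, 1.0)
    return a + t * (b - a)

def proj_S(z):
    """Euclidean projection onto the triangle S, by enumerating its edges."""
    z = np.asarray(z, dtype=float)
    if z[0] >= 0 and z[1] >= 0 and z[0] + z[1] <= 1:
        return z                     # z already lies in S
    # Otherwise the projection lies on the boundary: nearest among the 3 edges.
    cands = [proj_segment(z, V[i], V[(i + 1) % 3]) for i in range(3)]
    return min(cands, key=lambda p: np.linalg.norm(z - p))

def normal_map(f, z):
    """Normal map f_S(z) = f(Pi_S(z)) + z - Pi_S(z), as in equation (6)."""
    z = np.asarray(z, dtype=float)
    x = proj_S(z)
    return f(x) + (z - x)

# On the 2-cell {x1 <= 0, 0 <= x2 <= 1}, Pi_S is projection onto the x2-axis:
print(proj_S([-0.5, 0.3]))           # -> [0.  0.3]
```

The test point $(-0.5, 0.3)$ lies in the 2-cell named in Example 2.1, and its projection indeed lands on the $x_2$ axis, illustrating the piecewise-affine structure of $\Pi_S$.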

Our first set of assumptions is Assumption 3.1 below. It is used to guarantee (see Theorem 4.1) that the function $x \mapsto f_0(x) = E[F(x,\xi)]$ belongs to $C^1(X, \mathbb{R}^n)$, that the function $x \mapsto f_N(x, \omega)$ defined in (4) belongs to $C^1(X, \mathbb{R}^n)$ for almost every $\omega$, and that $f_N \to f_0$ almost surely in $C^1(X, \mathbb{R}^n)$.

Assumption 3.1 (a) $E\|F(x,\xi)\|^2 < \infty$ for all $x \in O$.

(b) The map $x \mapsto F(x, \xi(\omega))$ is continuously differentiable on $O$ for a.e. $\omega \in \Omega$, and $E\|d_x F(x,\xi)\|^2 < \infty$ for all $x \in O$.

(c) There exists a square integrable random variable $C$ such that
$$\|F(x, \xi(\omega)) - F(x', \xi(\omega))\| + \|d_x F(x, \xi(\omega)) - d_x F(x', \xi(\omega))\| \le C(\omega)\|x - x'\|$$
for all $x, x' \in O$ and a.e. $\omega \in \Omega$.

The next assumption is crucial for establishing the solution uniqueness and stability needed for our development. There is a precise characterization, called coherent orientation, for the normal map $L_K$ under consideration to be a global homeomorphism; see [9, 12, 16]. A sufficient condition for this to hold is that the restriction of $L$ to the linear span of $K$ is positive definite. For example, if $f_0$ is strongly monotone then $L_K$ is always a global homeomorphism.

Assumption 3.2 Suppose that $x_0$ solves the variational inequality (3) and that $x_0$ belongs to the interior of $X$. Let $z_0 = x_0 - f_0(x_0)$, $L = df_0(x_0)$, $K = T_S(x_0) \cap \{z_0 - x_0\}^\perp$, and assume that the normal map $L_K$ induced by $L$ and $K$ is a homeomorphism from $\mathbb{R}^n$ to $\mathbb{R}^n$.

Lemma 3.1 below is an important consequence of Assumptions 3.1 and 3.2. It shows that the variational inequality (3) has a locally unique solution that is stable with respect to perturbations of $f_0$. Its proof is based on Theorem 2.1 and is given in Appendix A.

Lemma 3.1 Under Assumptions 3.1 and 3.2 the normal map $L_K$ has a positive injectivity modulus $\delta$ on $\mathbb{R}^n$. Moreover, for each $\lambda > \delta^{-1}$ there exist neighborhoods $\mathcal{X}$ of $x_0$ in $X$, $\mathcal{Z}$ of $z_0$ in $\mathbb{R}^n$ and $\Gamma$ of $f_0$ in $C^1(X, \mathbb{R}^n)$, and a function $z : \Gamma \to \mathbb{R}^n$, such that:

(i) $z(f_0) = z_0$.
(ii) For each $f \in \Gamma$, $z(f)$ is the unique point in $\mathcal{Z}$ satisfying $f_S(z(f)) = 0$, and $x(f) = \Pi_S(z(f))$ is the unique point in $\mathcal{X}$ satisfying $-f(x(f)) \in N_S(x(f))$.

(iii) $z$ is Lipschitz on $\Gamma$ with modulus $\lambda$.

Finally, the functions $z$ and $x$ are B-differentiable at $f_0$ with
$$dz(f_0)(g) = (L_K)^{-1}[-g(x_0)] \qquad (10)$$
and
$$dx(f_0) = \Pi_K \circ dz(f_0).$$

We will make the following non-degeneracy assumption.

Assumption 3.3 Let $\Sigma_0$ denote the covariance matrix of $F(x_0, \xi)$. Suppose that the determinant of $\Sigma_0$ is strictly positive.

Under the above assumptions, $f_N$ converges to $f_0$ almost surely in $C^1(X, \mathbb{R}^n)$ as $N \to \infty$, so $f_N$ belongs to $\Gamma$ for $N$ large enough almost surely. It follows that $z_N = z(f_N)$ is almost surely well defined for sufficiently large $N$. Using Assumptions 3.1–3.3 it can be shown (see Theorem 5.1) that $\sqrt{N}\,\Sigma_0^{-1/2} L_K(z_N - z_0) \Rightarrow N(0, I_n)$. It then follows that, for large $N$,
$$N \left[ L_K(z_N - z_0) \right]^T \Sigma_0^{-1} \left[ L_K(z_N - z_0) \right] \qquad (11)$$
approximately follows the $\chi^2$ distribution with $n$ degrees of freedom, and consequently the set
$$\left\{ z \in \mathbb{R}^n : N \left[ L_K(z_N - z) \right]^T \Sigma_0^{-1} \left[ L_K(z_N - z) \right] \le \chi^2_n(\alpha) \right\} \qquad (12)$$

defines an approximate $(1-\alpha)100\%$ confidence region for $z_0$, where $\chi^2_n(\alpha)$ is defined to be the number that satisfies $P(U > \chi^2_n(\alpha)) = \alpha$ for a $\chi^2$ random variable $U$ with $n$ degrees of freedom.

However, the expression in (12) is not directly computable. In order for it to be useful, we need to find good approximations for the matrix $\Sigma_0$ and for the map $L_K$. It is natural to approximate $\Sigma_0$ by the sample covariance matrix of $\{F(x_N, \xi_i)\}_{i=1}^N$. Also, it is well known that the map $L_K$ is the same as the B-derivative of the normal map $(f_0)_S$, denoted $d(f_0)_S$, evaluated at $z_0$. It is thus tempting to use $d(f_N)_S(z_N)$ as an estimate for $L_K$. However, this is problematic since the function $d(f_0)_S(\cdot)$ is not continuous. The main objective of this paper is to demonstrate how one can overcome this difficulty and develop a method for computing asymptotically exact confidence regions for $z_0$. Towards that we now introduce an additional assumption.

Assumption 3.4 (a) For each $t \in \mathbb{R}^n$ and $x \in X$, let $M_x(t) = E\left[\exp\langle t, F(x,\xi) - f_0(x)\rangle\right]$ be the moment generating function of the random variable $F(x,\xi) - f_0(x)$. Assume:

(i) There exists $\zeta > 0$ such that $M_x(t) \le \exp\{\zeta^2 \|t\|^2/2\}$ for every $x \in X$ and every $t \in \mathbb{R}^n$.

(ii) There exists a nonnegative random variable $\kappa$ such that
$$\|F(x, \xi(\omega)) - F(x', \xi(\omega))\| \le \kappa(\omega)\|x - x'\| \qquad (13)$$
for all $x, x' \in O$ and almost every $\omega \in \Omega$.

(iii) The moment generating function of $\kappa$ is finite valued in a neighborhood of zero.

(b) For each $T \in \mathbb{R}^{n \times n}$ and $x \in X$, let $M_x(T) = E\left[\exp\langle T, d_x F(x,\xi) - df_0(x)\rangle\right]$ be the moment generating function of the random variable $d_x F(x,\xi) - df_0(x)$. Assume:

(i) There exists $\varsigma > 0$ such that $M_x(T) \le \exp\{\varsigma^2 \|T\|^2/2\}$ for every $x \in X$ and every $T \in \mathbb{R}^{n \times n}$.

(ii) There exists a nonnegative random variable $\nu$ such that
$$\|d_x F(x, \xi(\omega)) - d_x F(x', \xi(\omega))\| \le \nu(\omega)\|x - x'\|$$
for all $x, x' \in O$ and almost every $\omega \in \Omega$.

(iii) The moment generating function of $\nu$ is finite valued in a neighborhood of zero.
Assumption 3.4(a)(i) is satisfied, for example, if one of the following conditions holds.

(i) $\sup_{x \in X,\, \xi \in \Xi} \|F(x, \xi)\| < \infty$. That is, $F(x,\xi)$ is a bounded random variable uniformly in $x$.

(ii) $E[F_i(x,\xi) - (f_0(x))_i]^k = 0$ for each odd integer $k$ and each $i = 1, \dots, n$, and there exists a normal random variable $Y$ with mean zero such that $\sup_{x \in X} \|F(x, \xi(\omega)) - f_0(x)\| \le_D Y$, where we say $U \le_D V$ if $V$ stochastically dominates $U$.

For a more general discussion of situations under which Assumption 3.4(a)(i) is satisfied, see [6, Chapter 5].

The following is the main result of this work. The proof is given in Section 5.3. Let $\Sigma_N$ be the sample covariance matrix of $\{F(x_N, \xi_i)\}_{i=1}^N$.

Theorem 3.1 Suppose that Assumptions 3.1–3.4 hold. Then, for every $N \in \mathbb{N}$, there is a function $\Phi_N : \Pi_S^{-1}(O) \times \mathbb{R}^n \times \Omega \to \mathbb{R}^n$, such that, writing $\Phi_N(z_N(\omega), \cdot, \omega)$ as $\Phi_N(z_N)(\cdot)$,
$$\sqrt{N}\,\Sigma_N^{-1/2}\, \Phi_N(z_N)(z_N - z_0) \Rightarrow N(0, I_n).$$
The function $\Phi_N(z_N)(\cdot)$ can be explicitly evaluated given $z_N(\omega)$ and $df_N(\omega)$, and its precise definition is given in (44). As a result of this theorem, for each $\alpha \in (0, 1)$ the set
$$\left\{ z \in \mathbb{R}^n : N \left[ \Phi_N(z_N)(z_N - z) \right]^T \Sigma_N^{-1} \left[ \Phi_N(z_N)(z_N - z) \right] \le \chi^2_n(\alpha) \right\} \qquad (14)$$
defines an asymptotically exact $(1-\alpha)100\%$ confidence region for $z_0$.
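The region (14) is an ellipsoid, and membership of a candidate point can be tested directly once $z_N$, $\Sigma_N$ and an evaluation of $\Phi_N(z_N)$ are available. A sketch of the test in Python; the matrix `B` is a user-supplied placeholder standing in for the (generally piecewise linear) map $\Phi_N(z_N)$, whose actual construction is given in (44):

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_region(z, z_N, B, Sigma_N, N, alpha=0.05):
    """Membership test for the region (14):
    N * [B (z_N - z)]' Sigma_N^{-1} [B (z_N - z)] <= chi2_n(alpha).
    B stands in for the map Phi_N(z_N); here it is simply a matrix."""
    d = B @ (np.asarray(z_N, float) - np.asarray(z, float))
    stat = N * d @ np.linalg.solve(Sigma_N, d)
    return stat <= chi2.ppf(1 - alpha, df=len(d))
```

For instance, with `B = np.eye(2)` and `Sigma_N = np.eye(2)`, the point $z = z_N$ always passes the test (statistic 0), while points far from $z_N$ fail it for large $N$.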

4. Probability background. This section summarizes some basic probability results that will be used in this paper. The first result is quite standard and we refer the reader to Theorems 7.44, 7.48 and 7.52 of [18] for its proof.

Theorem 4.1 Let $G : O \times \Xi \to \mathbb{R}^k$ be a measurable map such that $G(x,\xi)$ is integrable for every $x \in O$. Define $g(x) = E[G(x,\xi)]$ for $x \in O$.

(a) Let $V$ be a nonempty compact subset of $O$. Suppose that $x \mapsto G(x, \xi(\omega))$ is continuous on $V$ for a.e. $\omega$ and $\sup_{x \in V} \|G(x, \xi(\omega))\| \le \Psi_1(\omega)$, where $\Psi_1$ is an integrable random variable. Then $g$ is continuous on $V$. Let
$$g_N(x, \omega) = \frac{1}{N} \sum_{i=1}^N G(x, \xi_i(\omega)), \quad x \in V,\ \omega \in \Omega.$$
Then for a.e. $\omega$, $g_N(\cdot, \omega)$ converges to $g(\cdot)$ uniformly on $V$.

(b) Suppose that, for some integrable random variable $\Psi_2$,
$$\|G(x, \xi(\omega)) - G(x', \xi(\omega))\| \le \Psi_2(\omega)\|x - x'\|, \text{ for all } x, x' \in O, \text{ and a.e. } \omega \in \Omega. \qquad (15)$$
Then $g$ is Lipschitz on $O$ with modulus bounded by $E\Psi_2$.

(c) Suppose that $G(\cdot, \xi(\omega))$, in addition to satisfying the property in (b), is continuously differentiable on $O$ for almost every $\omega \in \Omega$. Then $g$ is continuously differentiable on $O$, with
$$dg(x) = E[d_x G(x, \xi)]. \qquad (16)$$
Furthermore, for every nonempty compact subset $V$ of $O$, $g_N$ converges to $g$ a.s. in $C^1(V, \mathbb{R}^k)$.

We will apply parts of the above theorem with $G$ replaced by $F$ and $d_x F$, where $F$ is defined at the beginning of this paper. In particular the above theorem provides conditions that ensure almost sure uniform convergence of $f_N$. The theorem below is about the rate of such convergence.

Theorem 4.2 Suppose that $F$ satisfies Assumption 3.1.

(a) Suppose that $F$ also satisfies Assumption 3.4(a). Then there exist positive real numbers $\beta$, $\mu$, $M$ and $\sigma$, such that the following holds for each $\epsilon > 0$ and each $N$:
$$\operatorname{Prob}\left\{ \sup_{x \in X} \|f_N(x) - f_0(x)\| \ge \epsilon \right\} \le \beta \exp\{-N\mu\} + \left(\frac{M}{\epsilon}\right)^n \exp\left\{-\frac{N\epsilon^2}{\sigma}\right\}. \qquad (17)$$

(b) Suppose additionally that $F$ satisfies Assumption 3.4(b). Then there exist positive real numbers $\beta_1$, $\mu_1$, $M_1$ and $\sigma_1$, such that the following holds for each $\epsilon > 0$ and each $N$:
$$\operatorname{Prob}\left\{ \|f_N - f_0\|_{1,X} \ge \epsilon \right\} \le \beta_1 \exp\{-N\mu_1\} + \left(\frac{M_1}{\epsilon}\right)^n \exp\left\{-\frac{N\epsilon^2}{\sigma_1}\right\}. \qquad (18)$$

Proof.
(a) Applying [18, Theorem 7.67] (see also [19, Theorem 5.1]) to $F_i(x,\xi)$ for each $i = 1, \dots, n$, we can obtain constants $\mu > 0$ and $M > 0$, such that the following holds for each $i = 1, \dots, n$, each $\epsilon > 0$ and each $N$:
$$\operatorname{Prob}\left\{ \sup_{x \in X} |(f_N(x) - f_0(x))_i| \ge \epsilon \right\} \le \exp\{-N\mu\} + \left(\frac{M}{\epsilon}\right)^n \exp\left\{-\frac{N\epsilon^2}{32\zeta^2}\right\}, \qquad (19)$$
where $(f_N(x) - f_0(x))_i$ denotes the $i$th component of $f_N(x) - f_0(x)$. Now for each $\epsilon > 0$ and each $N$, we have
$$\operatorname{Prob}\left\{ \sup_{x \in X} \|f_N(x) - f_0(x)\|_1 \ge \epsilon \right\} \le \sum_{i=1}^n \operatorname{Prob}\left\{ \sup_{x \in X} |(f_N(x) - f_0(x))_i| \ge \frac{\epsilon}{n} \right\} \le n \exp\{-N\mu\} + n\left(\frac{Mn}{\epsilon}\right)^n \exp\left\{-\frac{N\epsilon^2}{32 n^2 \zeta^2}\right\},$$

where $\|f_N(x) - f_0(x)\|_1$ denotes the 1-norm of $f_N(x) - f_0(x)$. For any other norm on $\mathbb{R}^n$, the inequality (17) still holds with proper choices of $M$, $\sigma$ and $\beta$, since all norms are equivalent on $\mathbb{R}^n$.

(b) By part (a), there exist positive real numbers $\beta$, $\mu$, $M$ and $\sigma$ such that (17) holds for each $\epsilon > 0$ and each $N$. By Theorem 4.1, $f_0$ is continuously differentiable on $O$ with $df_0(x) = E[d_x F(x,\xi)]$ for each $x \in O$, and $f_N$ converges to $f_0$ w.p. 1 as an element of $C^1(X, \mathbb{R}^n)$. Now using Assumption 3.4(b) and applying arguments as in part (a) to $d_x F(x,\xi)$, we can obtain positive real numbers $\beta_2$, $\mu_2$, $M_2$ and $\sigma_2$ such that
$$\operatorname{Prob}\left\{ \sup_{x \in X} \|df_N(x) - df_0(x)\| \ge \epsilon \right\} \le \beta_2 \exp\{-N\mu_2\} + \left(\frac{M_2}{\epsilon}\right)^n \exp\left\{-\frac{N\epsilon^2}{\sigma_2}\right\} \qquad (20)$$
for each $\epsilon > 0$ and each $N$. The result follows on noting that
$$\operatorname{Prob}\left\{ \|f_N - f_0\|_{1,X} \ge \epsilon \right\} \le \operatorname{Prob}\left\{ \sup_{x \in X} \|f_N(x) - f_0(x)\| \ge \epsilon/2 \right\} + \operatorname{Prob}\left\{ \sup_{x \in X} \|df_N(x) - df_0(x)\| \ge \epsilon/2 \right\}.$$

The next key result is the following functional central limit theorem in the space $C^1(X, \mathbb{R}^n)$. The proof is very similar to Corollary 7.17 of [1], which is for a setting where $C^1(X, \mathbb{R}^n)$ is replaced by $C(X, \mathbb{R}^n)$ (the space of continuous functions from $X$ to $\mathbb{R}^n$ with the uniform metric). For completeness we sketch the proof in Appendix B.

Theorem 4.3 (Functional central limit theorem) Let $F$ satisfy Assumption 3.1. Then there exists a $C^1(X, \mathbb{R}^n)$ valued random variable $Y_0$ such that for each finite subset $\{x_1, x_2, \dots, x_m\}$ of $X$, the random vector $(Y_0(x_1), \dots, Y_0(x_m))$ has a multivariate normal distribution with zero mean and the same covariance matrix as that of $(F(x_1,\xi), \dots, F(x_m,\xi))$ and, as $N \to \infty$, $\sqrt{N}(f_N - f_0)$ converges in distribution, in $C^1(X, \mathbb{R}^n)$, to $Y_0$.

The following theorem is taken from [18, Theorem 7.59]; see a more general version in [15]. We say a map $G$ from a Banach space $B_1$ to $B_2$ is Hadamard directionally differentiable at a point $\mu \in B_1$ if for all directions $v \in B_1$ the directional derivative $dG(\mu)(v)$ exists and satisfies the following equality:
$$dG(\mu)(v) = \lim_{t \downarrow 0,\ v' \to v} \frac{G(\mu + t v') - G(\mu)}{t}.$$
When $G$ is Lipschitz continuous in a neighborhood of $\mu$, Hadamard directional differentiability is equivalent to B-differentiability [17].

Theorem 4.4 (Functional delta theorem) Let $B_1$ and $B_2$ be Banach spaces. Let $Y_N$ be a sequence of random elements in $B_1$, $\tau_N$ be a sequence of positive numbers converging to $\infty$ as $N \to \infty$, and $G$ be a map from $B_1$ to $B_2$. Suppose that the space $B_1$ is separable, the map $G$ is Hadamard directionally differentiable at a point $\mu \in B_1$, and the sequence $\tau_N(Y_N - \mu)$ converges in distribution to a random element $Y$ of $B_1$. Then $\tau_N[G(Y_N) - G(\mu)]$ converges in distribution to $dG(\mu)(Y)$, where $dG(\mu)$ denotes the directional derivative of $G$ at $\mu$, and
$$\tau_N[G(Y_N) - G(\mu)] = dG(\mu)(\tau_N[Y_N - \mu]) + o_p(1).$$
In the above result $o_p(1)$ denotes a $B_2$ valued random variable $Z_N$ such that $Z_N$ converges to zero in probability as $N \to \infty$.

5. Methodology. This section discusses the asymptotic distribution of SAA solutions, and develops a method to build confidence regions for the true solution to an SVI. Instead of dealing with the SAA problem in its original form (5), we work with its normal map formulation
$$(f_N)_S(z) = 0. \qquad (21)$$
Recall that, under Assumption 3.2, $x_0$ solves the variational inequality (3), with $z_0 = x_0 - f_0(x_0)$. It follows that $(f_0)_S(z_0) = 0$.
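The role of Hadamard directional (rather than Fréchet) differentiability in Theorem 4.4 can be seen in a scalar simulation with $G(y) = \max(y, 0)$ and $\mu = 0$: here $dG(0)(v) = \max(v, 0)$, so the limit $dG(0)(Z)$ with $Z \sim N(0,1)$ is non-normal, carrying an atom of mass $1/2$ at zero. The setup below is our own illustration, with $Y_N$ the mean of $N$ i.i.d. standard normals and $\tau_N = \sqrt{N}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 2000, 4000
G = lambda y: max(y, 0.0)          # Hadamard directionally differentiable at 0, not Frechet

# tau_N * (G(Y_N) - G(0)) with Y_N the mean of N mean-zero normals
vals = np.array([np.sqrt(N) * G(rng.normal(0.0, 1.0, N).mean()) for _ in range(reps)])

# Limit dG(0)(Z) = max(Z, 0), Z ~ N(0,1): atom of mass 1/2 at zero
print((vals == 0.0).mean())        # close to 0.5
print(vals.mean())                 # close to E[max(Z,0)] = 1/sqrt(2*pi) ~ 0.399
```

The atom at zero is exactly the kind of non-normal limiting behavior that makes plug-in confidence regions delicate for solutions on the boundary of $S$.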

5.1 Asymptotic distribution of SAA solutions. The following theorem describes the probabilistic behavior of SAA solutions.

Theorem 5.1 Suppose that Assumptions 3.1 and 3.2 hold. Let δ be as in the statement of Lemma 3.1. Choose λ > δ⁻¹, and let the neighborhoods X, Z and Γ and the function z(·): Γ → Rⁿ be as given in Lemma 3.1. For each ω ∈ Ω and integer N satisfying f_N ∈ Γ, let z_N = z(f_N) and x_N = Π_S(z_N), so that z_N is the unique solution of (21) in Z, and x_N is the unique solution of the variational inequality 0 ∈ f_N(x) + N_S(x) in X. For almost every ω ∈ Ω, there exists an integer N_ω such that z_N and x_N are well defined for each N ≥ N_ω. Moreover, lim_{N→∞} z_N = z_0 almost surely,

√N(z_N − z_0) converges in distribution to (L_K)⁻¹[−Y_0(x_0)],  (22)

where Y_0 is as in the statement of Theorem 4.3, and

√N d(f_0)_S(z_0)(z_N − z_0) converges in distribution to −Y_0(x_0).  (23)

Choose a positive real number ϵ_0 such that each f ∈ C¹(X, Rⁿ) satisfying ‖f − f_0‖_{1,X} < ϵ_0/λ belongs to Γ. Suppose in addition that Assumption 3.4 holds. Then there exist positive real numbers β, µ, M and σ such that the following holds for each ϵ ∈ (0, ϵ_0] and each N:

Prob{ ‖x_N − x_0‖ < ϵ } ≥ Prob{ ‖z_N − z_0‖ < ϵ } ≥ 1 − β exp(−Nµ) − M ϵ⁻ⁿ exp(−Nϵ²/σ).  (24)

Proof. According to Lemma 3.1, for each f belonging to the neighborhood Γ, z(f) is the unique point in Z satisfying (f)_S(z(f)) = 0, and x(f) = Π_S(z(f)) is the unique point in X satisfying 0 ∈ f(x) + N_S(x). Moreover, the function z is B-differentiable at f_0 with its B-derivative given by (10), and is Lipschitz on Γ with modulus λ. According to Theorem 4.1, f_N converges to f_0 w.p. 1 as an element of C¹(X, Rⁿ). Consequently, for almost every ω ∈ Ω, there exists an integer N_ω such that for each N ≥ N_ω the function f_N belongs to Γ. In addition, the Lipschitz continuity of z ensures that lim_{N→∞} z_N = z_0 w.p. 1. From Theorem 4.3, √N(f_N − f_0) converges in distribution in C¹(X, Rⁿ) to Y_0. Let Γ₁ be a closed subset of Γ such that f_0 belongs to the interior of Γ₁. Then z: Γ₁ → Rⁿ is a Lipschitz map which is B-differentiable at f_0.
From the remark above Theorem 4.4, z is Hadamard directionally differentiable at f_0. Define

f̃_N(ω) = f_N(ω) 1_{{f_N(ω) ∈ Γ₁}} + f_0 1_{{f_N(ω) ∉ Γ₁}},  ω ∈ Ω.

Then √N(f̃_N − f_0) converges in distribution in C¹(X, Rⁿ) to Y_0. By Theorem 4.4,

√N(z(f̃_N) − z(f_0)) converges in distribution to dz(f_0)(Y_0) = (L_K)⁻¹[−Y_0(x_0)].

Now, since z(f̃_N) = z_N for all N large enough a.s., we have (22). It was shown in [13] that L_K is exactly the B-derivative of the normal map (f_0)_S at z_0, denoted by d(f_0)_S(z_0). Applying the operator d(f_0)_S(z_0) to both sides of (22) yields (23).

Finally, from Theorem 4.2, there exist positive constants β₁, µ₁, M₁ and σ₁ such that (18) holds for each ϵ > 0 and each N. The way we chose ϵ_0 implies that, for each f ∈ C¹(X, Rⁿ) satisfying ‖f − f_0‖_{1,X} < ϵ_0/λ, z(f) is well defined and satisfies

‖Π_S(z(f)) − x_0‖ ≤ ‖z(f) − z_0‖ ≤ λ ‖f − f_0‖_{1,X},  (25)

where the first inequality follows from the fact that Π_S is a non-expansive mapping, and the second follows from the Lipschitz continuity of z. Accordingly, for each real number ϵ ∈ (0, ϵ_0] and each integer N, we have

Prob{ ‖x_N − x_0‖ < ϵ } ≥ Prob{ ‖z_N − z_0‖ < ϵ } ≥ Prob{ ‖f_N − f_0‖_{1,X} < ϵ/λ } ≥ 1 − β₁ exp(−Nµ₁) − M₁ λⁿ ϵ⁻ⁿ exp(−Nϵ²/(σ₁λ²)),

where the first and second inequalities follow from (25), and the third follows from (18). The inequality (24) follows by letting β = β₁, µ = µ₁, M = M₁λⁿ and σ = σ₁λ².
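The convergence in (22) can be checked by simulation. Continuing the same hypothetical scalar model used above (F(x, ξ) = x − ξ, ξ ~ N(1, 0.25), S = [0, ∞), an illustrative assumption, not the paper's example), z_N is simply the sample mean of the ξ_i, L_K is the identity, and the limit law of √N(z_N − z_0) is N(0, 0.25):

```python
import random
import statistics

random.seed(1)

# Hypothetical model: F(x, xi) = x - xi, xi ~ N(1, 0.25), S = [0, inf).
# Then f_N(x) = x - mean(xi); the SAA normal-map equation has root
# z_N = mean(xi) (whenever mean(xi) > 0), and z0 = 1.
N, reps = 400, 2000
scaled = []
for _ in range(reps):
    xi = [random.gauss(1.0, 0.5) for _ in range(N)]
    z_N = sum(xi) / N
    scaled.append(N ** 0.5 * (z_N - 1.0))   # sqrt(N)(z_N - z0)

# The limit in (22) for this model is N(0, 0.25), standard deviation 0.5.
emp_mean = statistics.fmean(scaled)
emp_sd = statistics.pstdev(scaled)
```

The empirical mean and standard deviation of the scaled errors should be close to 0 and 0.5 respectively, matching the Gaussian limit.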

Note that the definition of the function z in Lemma 3.1 ensures that Π_S(z(f)) ∈ X whenever z(f) is defined. Thus, x_N belongs to X whenever it is well defined. For a given N, z_N and x_N may not be well defined at some ω ∈ Ω. The expression Prob{‖x_N − x_0‖ < ϵ} in (24) is a short version of Prob{‖x_N − x_0‖ < ϵ and x_N is well defined}, and similarly for Prob{‖z_N − z_0‖ < ϵ}. We use similar conventions in the rest of this paper for brevity.

5.2 B-derivatives of the Euclidean projector and normal maps. As noted in Section 3, in order to obtain computable confidence regions for z_0 we need to suitably approximate d(f_0)_S(z_0). To understand properties of B-derivatives of the normal maps (f_0)_S and (f_N)_S, we first need to investigate properties of B-derivatives of the Euclidean projector Π_S. As discussed in Section 2.1, Π_S is a piecewise affine function, so it is B-differentiable on Rⁿ [16, Chapter 2.2]. Moreover, if we define the critical cone to S at a point z ∈ Rⁿ as

K(z) = T_S(Π_S(z)) ∩ {z − Π_S(z)}⊥,  (26)

then we have

dΠ_S(z)(h) = Π_{K(z)}(h)  (27)

for each h ∈ Rⁿ; see [11, Corollary 4.5]. Theorem 5.2 below will show that K(z) is the same cone for all points z belonging to the relative interior of a given face of a given n-cell in the normal manifold of S. To develop this theorem we need the following lemma, which provides a characterization of faces of n-cells.

Lemma 5.1 Let F be a nonempty face of S, and let N = N_S(ri F) and C = C_S(F). Then a nonempty set B in Rⁿ is a face of C if and only if there is a nonempty face E of F and a nonempty face M of N such that B = E + M.

Proof. It was shown in [12, Proposition 2.1] that par F = (aff N)⊥, where aff N is the affine hull of N, and par F is the subspace parallel to aff F. Since C = F + N, for any point z in C there is a unique decomposition z = z^F + z^N with z^F ∈ F and z^N ∈ N. The points z^F and z^N are affine functions of z. Now, let E be a nonempty face of F and M be a nonempty face of N.
We need to prove that E + M is a face of C. To this end, let c ∈ E + M, and suppose that c₁, c₂ ∈ C satisfy c₁ + c₂ = 2c. It follows that c₁^F + c₂^F = 2c^F and c₁^N + c₂^N = 2c^N. The facts that c^F ∈ E and that E is a face of F imply c₁^F, c₂^F ∈ E. Similarly, c₁^N, c₂^N ∈ M. Consequently, c₁, c₂ ∈ E + M. This proves that E + M is a face of C.

Conversely, let B be a face of C, and define E = {z^F : z ∈ B} and M = {z^N : z ∈ B}. We now prove that E is a face of F. Let e ∈ E, and choose z ∈ B such that e = z^F. Suppose f₁, f₂ ∈ F satisfy f₁ + f₂ = 2e. Define c₁ = f₁ + z^N and c₂ = f₂ + z^N. We have c₁, c₂ ∈ C and c₁ + c₂ = 2z. The fact that B is a face of C implies c₁, c₂ ∈ B. But f₁ = c₁^F and f₂ = c₂^F, so f₁, f₂ ∈ E. This proves that E is a face of F. A similar argument proves that M is a face of N. It remains to prove B = E + M. It is clear that B ⊂ E + M. For the other direction, let z₁, z₂ ∈ B and define z = z₁^F + z₂^N. It suffices to show z ∈ B. Write ẑ = z₁^N + z₂^F, and note that z₁ + z₂ = z + ẑ. Because (z₁ + z₂)/2 ∈ B, z, ẑ ∈ C, and B is a face of C, we have z ∈ B. This proves B = E + M.

Theorem 5.2 below gives an explicit expression for K(z) for points z belonging to the relative interior of a face B of some n-cell C. Recall that F denotes the collection of all nonempty faces of S. Let I = {I(x) : x ∈ S} be the family of all active index sets. There is a well-known one-to-one correspondence between F and I, given as follows. First, for each I ∈ I consider the set

F(I) = { x ∈ Rⁿ : ⟨a_i, x⟩ = b_i, i ∈ I;  ⟨a_i, x⟩ ≤ b_i, i ∈ {1, …, m} \ I },  (28)

which is a nonempty face of S. The active index set for any point belonging to ri F(I) is exactly I. Conversely, for each nonempty face F of S there is a unique I ∈ I such that F = F(I). For a detailed discussion, see [7, Section 1] or [16, Chapter 2].

Let I ∈ I. It was shown in [16, Lemma 2.4.2] (see also [7, Proposition 1]) that each nonempty face of F(I) is of the form F(J), with J ∈ I and J ⊃ I. The normal cone to S on ri F(I) is given by

N_S(ri F(I)) = pos{a_i, i ∈ I} = { Σ_{i∈I} τ_i a_i : τ_i ≥ 0, i ∈ I },

and each nonempty face of N_S(ri F(I)) is of the form N_S(ri F(J)), with J ∈ I and J ⊂ I.

Theorem 5.2 Let I ∈ I, F = F(I), N = N_S(ri F) and C = C_S(F). Let E be a nonempty face of F, M be a nonempty face of N, and B = E + M. Let I′, I″ ∈ I be such that E = F(I′) and M = N_S(ri F(I″)). For each z ∈ ri B,

K(z) = { u ∈ Rⁿ : ⟨a_i, u⟩ = 0, i ∈ I″;  ⟨a_i, u⟩ ≤ 0, i ∈ I′ \ I″ }.  (29)

Proof. By [14, Corollary 6.6.2], we have ri B = ri E + ri M. Let x ∈ ri E and y ∈ ri M satisfy z = x + y. Then y ∈ N_S(x), x = Π_S(z) and K(z) = T_S(x) ∩ {y}⊥. We have

ri E = { x ∈ Rⁿ : ⟨a_i, x⟩ = b_i, i ∈ I′;  ⟨a_i, x⟩ < b_i, i ∈ {1, …, m} \ I′ },

and

ri M = ri pos{a_i, i ∈ I″} = { Σ_{i∈I″} τ_i a_i : τ_i > 0, i ∈ I″ } if I″ ≠ ∅, and ri M = {0} if I″ = ∅.

The fact x ∈ ri E implies I(x) = I′, so

T_S(x) = { u ∈ Rⁿ : ⟨a_i, u⟩ ≤ 0, i ∈ I′ }.

Because y ∈ ri M, there exists τ_i > 0 for each i ∈ I″ such that y = Σ_{i∈I″} τ_i a_i. Note that I″ ⊂ I ⊂ I′, so for each i ∈ I″ the vector a_i belongs to the polar cone of T_S(x). Thus, a vector u ∈ T_S(x) satisfies ⟨y, u⟩ = 0 if and only if ⟨a_i, u⟩ = 0 for each i ∈ I″. Equation (29) follows.

The right-hand side of (29) depends on I′ and I″, but not on z. Thus, K(z) is the same for all z ∈ ri B. It then follows from (27) that dΠ_S(z)(·) is the same function for all such z. Because K(z) is a polyhedral convex cone, dΠ_S(z)(·) is a piecewise linear function. The way B is defined in Theorem 5.2 implies that it can be any k-cell of the normal manifold of S, for k = 0, …, n. For each k = 0, …, n, we write the k-cells of the normal manifold of S as C_i^k, i = 1, …, j(k), where j(k) denotes the number of k-cells. For all z belonging to the relative interior of a given C_i^k, dΠ_S(z) is the same function, which we denote by Ψ_i^k.
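As a concrete instance of (27) and (29), take S to be the nonnegative orthant [0, ∞)ⁿ, an illustrative choice made here (it is not the set from Example 2.1). There the critical cone K(z) is a product of one-dimensional cones, so the projection Π_{K(z)} acts coordinatewise:

```python
def d_proj_orthant(z, h):
    """dPi_S(z)(h) = Pi_{K(z)}(h) for S = [0, inf)^n, per (27).

    With x = Pi_S(z), the critical cone K(z) is T_S(x) intersected with the
    hyperplane orthogonal to z - x, and it factors coordinatewise:
      z_i > 0  -> coordinate unconstrained          -> h_i
      z_i < 0  -> coordinate forced to 0            -> 0
      z_i = 0  -> coordinate restricted to [0, inf) -> max(h_i, 0)
    """
    out = []
    for zi, hi in zip(z, h):
        if zi > 0:
            out.append(hi)
        elif zi < 0:
            out.append(0.0)
        else:
            out.append(max(hi, 0.0))
    return out
```

The third branch is what makes dΠ_S(z)(·) merely piecewise linear, rather than linear, at points z on the boundaries of the n-cells.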
Thus, for each k = 0, …, n and i = 1, …, j(k), Ψ_i^k is a function from Rⁿ to Rⁿ, with

Ψ_i^k(·) = dΠ_S(z)(·) if z belongs to the relative interior of C_i^k.  (30)

The next proposition shows that, for each point z ∈ Rⁿ, there is exactly one cell in the normal manifold of S that contains z in its relative interior. This cell is also the smallest among all cells containing z.

Proposition 5.1 Let z ∈ Rⁿ. There is a unique cell, C_j^q, in the normal manifold of S such that z ∈ ri C_j^q. Moreover, for any other cell C_r^k that contains z, one has C_j^q ⊂ C_r^k and q < k.

Proof. According to item (iii) below (8), the union of all n-cells in the normal manifold of S is Rⁿ. Thus, z belongs to at least one n-cell. Without loss of generality, suppose that z belongs to C_1^n, …, C_l^n. The intersection of these n-cells, ∩_{i=1}^l C_i^n, is a common face of theirs. Let F be the unique face of ∩_{i=1}^l C_i^n such that z ∈ ri F. The set F is also a common face of C_1^n, …, C_l^n, and is therefore the unique face of each C_i^n, i = 1, …, l, that contains z in its relative interior. Let C_j^q = F. This proves the existence and uniqueness of a cell C_j^q such that z ∈ ri C_j^q. Let C_r^k be another cell containing z. Then C_r^k is a face of one of the n-cells C_1^n, …, C_l^n; suppose it is a face of C_1^n. Because C_j^q is a face of C_1^n with z ∈ ri C_j^q, we have C_j^q ⊂ C_r^k and q < k.

Example 2.1 continued. The set S in Example 2.1 has seven 2-cells, nine 1-cells (edges of the 2-cells), and three 0-cells (vertices), so j(2) = 7, j(1) = 9, j(0) = 3. When z belongs to the interior of any of the 2-cells, dΠ_S(z)(·) is a linear function. For example, when z belongs to the interior of the 2-cell {x ∈ R² : x₁ ≥ 0, 0 ≤ x₂ ≤ 1}, we have dΠ_S(z)(h) = (0, h₂) for any h ∈ R². When z belongs to the relative interior of any of the 1-cells, dΠ_S(z)(·) is a piecewise linear function with two pieces. For example, when z lies in the relative interior of the edge connecting (0, 0) and (0, 1), we have

dΠ_S(z)(h) = h if h₁ ≤ 0, and dΠ_S(z)(h) = (0, h₂) if h₁ ≥ 0.

When z is any of the three vertices, dΠ_S(z)(·) is a piecewise linear function with four pieces. As is shown by the example, dΠ_S(z)(h) is continuous with respect to h for a fixed z. However, on the boundaries of the n-cells, dΠ_S(z) is discontinuous with respect to z, due to the dramatic change of the structure of K(z).

Now, suppose that Assumption 3.1 holds, so that f_0 and almost every f_N are differentiable on O. By the definition of the normal map (f_0)_S in (6) and the chain rule of B-differentiability (see, e.g., [16, Theorem 3.1.1]), the function (f_0)_S(·) is B-differentiable on Π_S⁻¹(O), with

d(f_0)_S(z)(h) = df_0(Π_S(z))(dΠ_S(z)(h)) + h − dΠ_S(z)(h)  (31)
              = df_0(Π_S(z))(Π_{K(z)}(h)) + h − Π_{K(z)}(h)

for each z ∈ Π_S⁻¹(O) and each h ∈ Rⁿ. Similarly, for almost every ω ∈ Ω, the function (f_N)_S(·) is B-differentiable on Π_S⁻¹(O), with

d(f_N)_S(z)(h) = df_N(Π_S(z))(dΠ_S(z)(h)) + h − dΠ_S(z)(h)  (32)
              = df_N(Π_S(z))(Π_{K(z)}(h)) + h − Π_{K(z)}(h)

for each z ∈ Π_S⁻¹(O) and each h ∈ Rⁿ. It follows that the functions d(f_0)_S(z)(h) and d(f_N)_S(z)(h) are both continuous with respect to h for fixed z, but they are discontinuous with respect to z on the boundaries of the n-cells.

5.3 Approximations of the B-derivatives. This subsection constructs functions that are computable from the SAA solutions, and shows how to use these functions to approximate dΠ_S(z_0) and d(f_0)_S(z_0). First, for each k = 0, …, n and i = 1, …, j(k), define a function d_i^k: Rⁿ → R by

d_i^k(z) = d(z, C_i^k) = min_{x ∈ C_i^k} ‖z − x‖.  (33)

Next, let g be a map from N to R such that

(i) g(N) > 0 for each N ∈ N;
(ii) lim_{N→∞} g(N) = ∞;
(iii) lim_{N→∞} g(N)/√N = 0;
(iv) lim_{N→∞} g(N)ⁿ exp(−θ²N/g(N)²) = 0 for θ² = min{ 1/(4σ), 1/(4σ₁), 1/(4σ(E[C])²) }, where σ₁ and σ are as in the statements of Theorems 4.2 and 5.1 respectively and C is as in Assumption 3.1;
(v) lim_{N→∞} N^{n/2} g(N)⁻ⁿ exp(−θ g(N)²) = 0 for each positive real number θ.

Note that g(N) = N^p for any p ∈ (0, 1/2) satisfies (i)–(v) above. Clearly, positive linear combinations of such functions also satisfy all the above conditions. Another example is g(N) = (N/(m log N))^{1/2}, where m > n/(2θ²).

Finally, for each N ∈ N, define a function Λ_N: Rⁿ × Rⁿ → Rⁿ by

Λ_N(z)(h) = ( Σ_{k=0}^{n} Σ_{i=1}^{j(k)} [1/g(N) − min(d_i^k(z), 1/g(N))]^k Ψ_i^k(h) ) / ( Σ_{k=0}^{n} Σ_{i=1}^{j(k)} [1/g(N) − min(d_i^k(z), 1/g(N))]^k )  for each z ∈ Rⁿ and h ∈ Rⁿ,  (34)

where Ψ_i^k(h) is as in (30). Lemma 5.2 below shows that Λ_N is jointly continuous with respect to (z, h). Following that, Theorem 5.3 shows that Λ_N(z_N) provides a nice approximation for dΠ_S(z_0). We briefly explain the motivation for defining Λ_N(z_N) as above. Recall that dΠ_S(z_0) is represented by one of the functions Ψ_i^k. The cell corresponding to this particular Ψ_i^k has the smallest dimension among all cells containing z_0. We define Λ_N(z_N) as a weighted sum of all the Ψ_i^k, in such a way that the weight of the one that represents dΠ_S(z_0) becomes dominant as z_N approaches z_0 at an exponential rate. The exponential convergence rate is crucial here. Since dΠ_S(·) is discontinuous and each Λ_N(·) is continuous, one does not expect the convergence of Λ_N(z_N) to dΠ_S(z_0) for an arbitrary sequence z_N converging to z_0.
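The following minimal sketch implements (34) for n = 1 and S = [0, ∞), whose normal manifold has two 1-cells and one 0-cell. One reading assumption is made explicit in the code: a cell whose distance from z reaches 1/g(N) receives weight exactly 0, including the 0-dimensional cell (we do not evaluate 0⁰ as 1):

```python
def g(N):
    """One valid choice: g(N) = N**p with 0 < p < 1/2 satisfies (i)-(v)."""
    return N ** 0.25

# Cells of the normal manifold of S = [0, inf): dimension k, distance d_i^k
# from (33), and the derivative selection Psi_i^k from (30).
CELLS = [
    (1, lambda z: max(0.0, -z), lambda h: h),            # [0, inf): dPi = identity
    (1, lambda z: max(0.0, z),  lambda h: 0.0),          # (-inf, 0]: dPi = 0
    (0, lambda z: abs(z),       lambda h: max(h, 0.0)),  # {0}: dPi = Pi_{[0, inf)}
]

def Lambda(N, z, h):
    """Weighted sum (34); a cell at distance >= 1/g(N) from z gets weight 0."""
    r = 1.0 / g(N)
    num = den = 0.0
    for k, dist, psi in CELLS:
        base = r - min(dist(z), r)
        w = base ** k if base > 0.0 else 0.0   # assumed convention for base 0
        num += w * psi(h)
        den += w
    return num / den
```

For z held at the vertex 0, the 0-cell's weight dominates the 1-cell weights (which are of order 1/g(N)), so Λ_N(z)(h) approaches dΠ_S(0)(h) = max(h, 0) as N grows, while for z bounded away from 0 only the cell containing z eventually keeps positive weight.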

Lemma 5.2 For each N ∈ N, Λ_N is well defined and continuous on Rⁿ × Rⁿ.

Proof. Each z ∈ Rⁿ belongs to at least one cell C_i^k in the normal manifold, and for such k and i, d_i^k(z) = 0. Thus, the denominator in (34) is positive for each N, z and h, and Λ_N is well defined everywhere. For each k = 0, …, n and each i = 1, …, j(k), the functions h ↦ Ψ_i^k(h) and z ↦ d_i^k(z) are continuous on Rⁿ, so Λ_N is continuous on Rⁿ × Rⁿ.

Theorem 5.3 Suppose that Assumptions 3.1 and 3.2 hold. For each N ∈ N, let Λ_N be as defined in (34). Let γ_0 > 0 be the minimum of d_i^k(z_0) among all of the cells C_i^k such that z_0 ∉ C_i^k. Then there exist a positive real number κ and an integer N_0 such that for each N ≥ N_0,

Prob{ sup_{h ∈ Rⁿ, h ≠ 0} ‖Λ_N(z_N)(h) − dΠ_S(z_0)(h)‖ / ‖h‖ < κ/g(N) } ≥ Prob{ ‖z_N − z_0‖ < γ_0/2 } + Prob{ ‖z_N − z_0‖ < 1/(2g(N)) } − 1.  (35)

If Assumption 3.4 holds additionally, then

lim_{N→∞} Prob{ sup_{h ∈ Rⁿ, h ≠ 0} ‖Λ_N(z_N)(h) − dΠ_S(z_0)(h)‖ / ‖h‖ < κ/g(N) } = 1.  (36)

Proof. Suppose that z_0 belongs to the cells C_{i(1)}^{k(1)}, …, C_{i(q)}^{k(q)} in the normal manifold of S, and that z_0 ∈ ri C_{i(q)}^{k(q)}. It follows from Proposition 5.1 that C_{i(q)}^{k(q)} ⊂ C_{i(j)}^{k(j)} and k(q) < k(j) for each j = 1, …, q − 1, and from the definition of Ψ_i^k in (30) that

dΠ_S(z_0)(h) = Ψ_{i(q)}^{k(q)}(h) for each h ∈ Rⁿ.

The definition of g ensures that lim_{N→∞} g(N) = ∞, so there exists an integer N_0 such that g(N) ≥ max(2/γ_0, 1) for each N ≥ N_0. For each cell C_i^k with z_0 ∉ C_i^k, we have

d_i^k(z_N) ≥ d_i^k(z_0) − ‖z_N − z_0‖ ≥ γ_0 − ‖z_N − z_0‖.

Consequently, for each N ≥ N_0,

Prob{ 1/g(N) − min(d_i^k(z_N), 1/g(N)) = 0 for all C_i^k not containing z_0 } ≥ Prob{ ‖z_N − z_0‖ < γ_0/2 }.  (37)

Next, consider a cell that contains z_0, that is, one of the cells C_{i(j)}^{k(j)} for j = 1, …, q. The following inequality holds for each N:

d_{i(j)}^{k(j)}(z_N) = d(z_N, C_{i(j)}^{k(j)}) ≤ ‖z_N − z_0‖,

which implies, for each N ≥ N_0,

Prob{ 1/g(N) − min(d_{i(j)}^{k(j)}(z_N), 1/g(N)) > 1/(2g(N)) } ≥ Prob{ 1/g(N) − min(‖z_N − z_0‖, 1/g(N)) > 1/(2g(N)) } = Prob{ ‖z_N − z_0‖ < 1/(2g(N)) }.  (38)

We then have

Prob{ Σ_{j=1}^{q} [1/g(N) − min(d_{i(j)}^{k(j)}(z_N), 1/g(N))]^{k(j)} > [1/(2g(N))]^{k(q)} } ≥ Prob{ [1/g(N) − min(d_{i(q)}^{k(q)}(z_N), 1/g(N))]^{k(q)} > [1/(2g(N))]^{k(q)} } = Prob{ 1/g(N) − min(d_{i(q)}^{k(q)}(z_N), 1/g(N)) > 1/(2g(N)) } ≥ Prob{ ‖z_N − z_0‖ < 1/(2g(N)) }.  (39)

It is clear that k(q) ≤ n. Proposition 5.1 implies that k(q) ≤ k(j) − 1 for each j = 1, …, q − 1. Hence, for each j = 1, …, q − 1 and each N ≥ N_0, we have

[1/g(N)]^{k(j)} / [1/(2g(N))]^{k(q)} ≤ 2ⁿ g(N)⁻¹.

To lighten the notation, write w_i^k = [1/g(N) − min(d_i^k(z_N), 1/g(N))]^k. Consequently, for each N ≥ N_0,

Prob{ w_{i(j)}^{k(j)} / Σ_{p=1}^{q} w_{i(p)}^{k(p)} < 2ⁿ g(N)⁻¹ for all j = 1, …, q − 1 } ≥ Prob{ Σ_{p=1}^{q} w_{i(p)}^{k(p)} > [1/(2g(N))]^{k(q)} } ≥ Prob{ ‖z_N − z_0‖ < 1/(2g(N)) },  (40)

where the second inequality follows from (39).

We are now ready to prove (35). For each k = 0, …, n and i = 1, …, j(k), Ψ_i^k defined in (30) is a piecewise linear function, so its norm, defined as

‖Ψ_i^k‖ = sup_{h ∈ Rⁿ, h ≠ 0} ‖Ψ_i^k(h)‖ / ‖h‖,  (41)

is a finite number. Let η be the maximum of ‖Ψ_i^k‖ among all k and i. For each h ∈ Rⁿ, we have

Λ_N(z_N)(h) − dΠ_S(z_0)(h) = Λ_N(z_N)(h) − Ψ_{i(q)}^{k(q)}(h) = ( Σ_{k=0}^{n} Σ_{i=1}^{j(k)} w_i^k Ψ_i^k(h) ) / ( Σ_{k=0}^{n} Σ_{i=1}^{j(k)} w_i^k ) − Ψ_{i(q)}^{k(q)}(h).

For each N ≥ N_0, it follows from (37) that

Prob{ Λ_N(z_N)(h) − dΠ_S(z_0)(h) = ( Σ_{j=1}^{q} w_{i(j)}^{k(j)} Ψ_{i(j)}^{k(j)}(h) ) / ( Σ_{j=1}^{q} w_{i(j)}^{k(j)} ) − Ψ_{i(q)}^{k(q)}(h)  [term (a)]  for all h ∈ Rⁿ } ≥ Prob{ ‖z_N − z_0‖ < γ_0/2 }.  (42)

For each h ∈ Rⁿ, the norm of term (a) above is bounded from above by

( Σ_{j=1}^{q−1} w_{i(j)}^{k(j)} ‖Ψ_{i(j)}^{k(j)}(h)‖ ) / ( Σ_{j=1}^{q} w_{i(j)}^{k(j)} ) + ( 1 − w_{i(q)}^{k(q)} / Σ_{j=1}^{q} w_{i(j)}^{k(j)} ) ‖Ψ_{i(q)}^{k(q)}(h)‖,

which is in turn bounded from above by

[term (b)]  ( Σ_{j=1}^{q−1} w_{i(j)}^{k(j)} ) / ( Σ_{j=1}^{q} w_{i(j)}^{k(j)} ) · 2η ‖h‖.

From (40) we have, for each N ≥ N_0,

Prob{ term (b) < 2^{n+1} η (q − 1) g(N)⁻¹ ‖h‖ for all h ∈ Rⁿ } ≥ Prob{ ‖z_N − z_0‖ < 1/(2g(N)) }.  (43)

The two inequalities (42) and (43), and the fact that the norm of term (a) is always less than or equal to term (b), imply

Prob{ ‖Λ_N(z_N)(h) − dΠ_S(z_0)(h)‖ < 2^{n+1} η (q − 1) g(N)⁻¹ ‖h‖ for all h ∈ Rⁿ } ≥ Prob{ ‖z_N − z_0‖ < γ_0/2 } + Prob{ ‖z_N − z_0‖ < 1/(2g(N)) } − 1

for each N ≥ N_0. Letting κ = 2^{n+1} η (q − 1) proves (35) for each N ≥ N_0.

If Assumption 3.4 holds additionally, then by Theorem 5.1 there exist positive constants ϵ_0, β, µ, M and σ such that (24) holds for each ϵ ∈ (0, ϵ_0] and each N. For N large enough to satisfy 1/(2g(N)) ≤ ϵ_0, we have

Prob{ ‖z_N − z_0‖ < 1/(2g(N)) } ≥ 1 − β exp(−Nµ) − 2ⁿ M g(N)ⁿ exp(−N/(4σ g(N)²)).

The right-hand side of the above inequality converges to 1 as N → ∞ by the definition of g(N). Clearly, Prob{ ‖z_N − z_0‖ < γ_0/2 } also converges to 1 as N → ∞. This proves (36) from (35).

Theorem 5.3 above proves that the probability that the norm of the piecewise linear function Λ_N(z_N) − dΠ_S(z_0) is below κ/g(N) converges to 1 as N goes to infinity. Since g(N) → ∞ as N → ∞, this in particular says that the norm of Λ_N(z_N) − dΠ_S(z_0) converges to 0 in probability; consequently, Λ_N(z_N) is a good approximation for dΠ_S(z_0) for N large. Next, we define for each N ∈ N a function Φ_N: Π_S⁻¹(O) × Rⁿ × Ω → Rⁿ by

Φ_N(z, h, ω) = df_N(Π_S(z))(Λ_N(z)(h)) + h − Λ_N(z)(h).  (44)

For convenience we will write Φ_N(z_N(ω), h, ω) as Φ_N(z_N)(h). The corollary below shows that Φ_N(z_N) provides a good approximation for d(f_0)_S(z_0).

Corollary 5.1 Suppose that Assumptions 3.1, 3.2 and 3.4 hold. For each N ∈ N, let Φ_N be as defined in (44). Then there exists a positive real number ϕ such that

lim_{N→∞} Prob{ sup_{h ∈ Rⁿ, h ≠ 0} ‖Φ_N(z_N)(h) − d(f_0)_S(z_0)(h)‖ / ‖h‖ < ϕ/g(N) } = 1.  (45)

Proof. From Theorem 4.1, f_0 belongs to C¹(X, Rⁿ) and df_0 is Lipschitz with modulus bounded by E[C]. Also, for almost every ω, the function f_N belongs to C¹(X, Rⁿ) for each N ∈ N. Next, by Theorem 4.2, there exist positive real numbers β₁, µ₁, M₁ and σ₁ such that (18) holds for each ϵ > 0 and each N.
Now let positive constants λ, ϵ_0, β, µ, M and σ, neighborhoods X, Z and Γ, and the function z be as defined in Theorem 5.1; recall that (24) holds for each ϵ ∈ (0, ϵ_0] and each N. Note also that from Theorem 5.3 there exists κ > 0 such that (36) holds. For each h ∈ Rⁿ, it follows from (31) and (44) that

Φ_N(z_N)(h) − d(f_0)_S(z_0)(h) = [df_N(Π_S(z_N))(Λ_N(z_N)(h)) − Λ_N(z_N)(h)] − [df_0(Π_S(z_0))(dΠ_S(z_0)(h)) − dΠ_S(z_0)(h)].

Recall the notation x_N = Π_S(z_N) and x_0 = Π_S(z_0). Then

‖Φ_N(z_N)(h) − d(f_0)_S(z_0)(h)‖ ≤ ‖df_N(x_N)(Λ_N(z_N)(h)) − df_0(x_0)(dΠ_S(z_0)(h))‖  [term (a)]  + ‖Λ_N(z_N)(h) − dΠ_S(z_0)(h)‖  [term (b)].  (46)

Term (a) above is bounded from above by

‖df_N(x_N)(Λ_N(z_N)(h) − dΠ_S(z_0)(h))‖  [term (c)]  + ‖df_N(x_N)(dΠ_S(z_0)(h)) − df_0(x_0)(dΠ_S(z_0)(h))‖  [term (d)].

First, we examine term (c). For each h ∈ Rⁿ, we have

term (c) ≤ ‖df_N(x_N)‖ ‖Λ_N(z_N)(h) − dΠ_S(z_0)(h)‖.  (47)

Define ϵ₁ = min(2E[C]ϵ_0, 2ϵ_0/λ). For each ϵ ∈ (0, ϵ₁], we have

Prob{ ‖df_N(x_N) − df_0(x_0)‖ < ϵ } ≥ Prob{ ‖df_N(x_N) − df_0(x_N)‖ < ϵ/2 } + Prob{ ‖df_0(x_N) − df_0(x_0)‖ < ϵ/2 } − 1 ≥ Prob{ ‖f_N − f_0‖_{1,X} < ϵ/2 } + Prob{ ‖x_N − x_0‖ < ϵ/(2E[C]) } − 1.

Replacing ϵ with 1/g(N) in the above inequality, applying (18) and (24), and taking limits, we have

lim_{N→∞} Prob{ ‖df_N(x_N) − df_0(x_0)‖ < 1/g(N) } = 1.  (48)

It follows from (48) and (36) that

lim_{N→∞} Prob{ sup_{h ≠ 0} term (c)/‖h‖ < κ(‖df_0(x_0)‖ + ϵ)/g(N) } = 1  (49)

for each ϵ > 0. Next, we examine term (d). For each h ∈ Rⁿ, we have

term (d) ≤ ‖df_N(x_N) − df_0(x_0)‖ ‖dΠ_S(z_0)(h)‖ ≤ ‖df_N(x_N) − df_0(x_0)‖ ‖dΠ_S(z_0)‖ ‖h‖,

where ‖dΠ_S(z_0)‖ is defined in the same way as ‖Ψ_i^k‖ is in (41). The inequality above and (48) imply

lim_{N→∞} Prob{ sup_{h ≠ 0} term (d)/‖h‖ < ‖dΠ_S(z_0)‖/g(N) } = 1.  (50)

Putting (36), (49) and (50) together proves (45).

Corollary 5.2 Suppose that Assumptions 3.1, 3.2 and 3.4 hold. Then

√N Φ_N(z_N)(z_N − z_0) converges in distribution to N(0, Σ_0).  (51)

If Assumption 3.3 holds additionally, then

√N Σ_0^{−1/2} Φ_N(z_N)(z_N − z_0) converges in distribution to N(0, I_n).  (52)

Proof. Since (52) follows from (51) easily when Assumption 3.3 holds, it suffices to prove (51) under Assumptions 3.1, 3.2 and 3.4. Under these assumptions, we have by (23) that √N d(f_0)_S(z_0)(z_N − z_0) converges in distribution to N(0, Σ_0). To prove (51) it suffices to prove

lim_{N→∞} Prob{ √N ‖d(f_0)_S(z_0)(z_N − z_0) − Φ_N(z_N)(z_N − z_0)‖ > ϵ } = 0  (53)

for each ϵ > 0. For brevity, we write

V_N = sup_{h ∈ Rⁿ, h ≠ 0} ‖Φ_N(z_N)(h) − d(f_0)_S(z_0)(h)‖ / ‖h‖.

By Corollary 5.1, there exists a positive real number ϕ such that

lim_{N→∞} Prob{ V_N ≥ ϕ/g(N) } = 0.  (54)

We have

Prob{ √N ‖d(f_0)_S(z_0)(z_N − z_0) − Φ_N(z_N)(z_N − z_0)‖ > ϵ } ≤ Prob{ √N V_N ‖z_N − z_0‖ > ϵ } ≤ Prob{ V_N ≥ ϕ/g(N) } + Prob{ ‖z_N − z_0‖ > ϵ g(N)/(ϕ√N) }.  (55)

It follows from (24) that there exist positive real numbers ϵ_0, β, µ, M and σ such that

Prob{ ‖z_N − z_0‖ > ϵ g(N)/(ϕ√N) } ≤ β exp(−Nµ) + M ϕⁿ N^{n/2} ϵ⁻ⁿ g(N)⁻ⁿ exp(−ϵ² g(N)²/(σϕ²))  (56)

as long as

ϵ g(N)/(ϕ√N) ≤ ϵ_0.  (57)

From property (iii) of g given in Section 5.3, it follows that the inequality (57) holds for N sufficiently large, and from property (v) the right-hand side of (56) converges to zero as N goes to ∞. Thus we have

lim_{N→∞} Prob{ ‖z_N − z_0‖ > ϵ g(N)/(ϕ√N) } = 0.

The above equality, combined with (54) and (55), implies (53).

We can now complete the proof of our main result, Theorem 3.1.

Proof of Theorem 3.1. In view of Corollary 5.2, it suffices to prove

lim_{N→∞} Prob{ ‖Σ_N − Σ_0‖ > ϵ } = 0 for each ϵ > 0.  (58)

Define a function Θ: O × Ξ → R^{n×n} by

Θ(x, ξ) = F(x, ξ) F(x, ξ)ᵀ,

let θ_0(x) = E[Θ(x, ξ)], and for each N ∈ N define the sample average function as

θ_N(x, ω) = (1/N) Σ_{i=1}^{N} F(x, ξ_i(ω)) F(x, ξ_i(ω))ᵀ = (1/N) Σ_{i=1}^{N} Θ(x, ξ_i(ω)).

Let x_N = Π_S(z_N). The definitions of Σ_N and Σ_0 imply

Σ_N = (1/(N−1)) Σ_{i=1}^{N} (F(x_N, ξ_i) − f_N(x_N))(F(x_N, ξ_i) − f_N(x_N))ᵀ
    = (1/(N−1)) Σ_{i=1}^{N} F(x_N, ξ_i) F(x_N, ξ_i)ᵀ − (N/(N−1)) f_N(x_N) f_N(x_N)ᵀ
    = (N/(N−1)) ( θ_N(x_N) − f_N(x_N) f_N(x_N)ᵀ )

and

Σ_0 = E[ (F(x_0, ξ) − f_0(x_0))(F(x_0, ξ) − f_0(x_0))ᵀ ] = E[ F(x_0, ξ) F(x_0, ξ)ᵀ ] − f_0(x_0) f_0(x_0)ᵀ = θ_0(x_0) − f_0(x_0) f_0(x_0)ᵀ.

From (24), x_N converges to x_0 in probability. Applying Theorem 4.1(a) with G replaced by F Fᵀ, we see that θ_N converges w.p. 1, as an element of C(X, R^{n×n}), to θ_0. Consequently, θ_N(x_N) converges in probability to θ_0(x_0). Similarly, applying Theorem 4.1(a) to F we see that, as N → ∞, f_N(x_N) → f_0(x_0) in probability. Combining the above results, we have (58) and the result follows.

6. Application to complementarity problems. This section applies the techniques developed in Section 5 to build confidence regions for stochastic complementarity problems, an important class of stochastic variational inequalities in which the set S in (3) equals R₊^p × R^{n−p}. The general formulation of a stochastic complementarity problem is

0 ∈ f_0(x) + N_{R₊^p × R^{n−p}}(x),  (59)

where f_0 is as defined at the beginning of this paper.
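In the hypothetical scalar model used in the earlier sketches (F(x, ξ) = x − ξ with ξ ~ N(1, 0.25), an illustrative assumption), the estimator Σ_N from the proof above reduces to the sample variance of the F(x_N, ξ_i) about f_N(x_N); its target is Σ_0 = Var F(x_0, ξ) = 0.25:

```python
import random

random.seed(2)

def F(x, xi):
    """Hypothetical model: F(x, xi) = x - xi."""
    return x - xi

N = 5000
xi = [random.gauss(1.0, 0.5) for _ in range(N)]
x_N = max(sum(xi) / N, 0.0)   # SAA solution for this model

vals = [F(x_N, s) for s in xi]                 # F(x_N, xi_i), i = 1..N
f_N_at_xN = sum(vals) / N                      # f_N(x_N)
Sigma_N = sum((v - f_N_at_xN) ** 2 for v in vals) / (N - 1)
```

Sigma_N is the one-dimensional case of the matrix estimator (1/(N−1)) Σᵢ (F(x_N, ξᵢ) − f_N(x_N))(F(x_N, ξᵢ) − f_N(x_N))ᵀ, and by the consistency argument above it converges in probability to Σ_0.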
Section 6.1 below utilizes the special structure of R₊^p × R^{n−p} to simplify the formulas for computing Λ_N and Φ_N. Following that, Section 6.2 presents a numerical example to demonstrate the methodology and illustrate the results.

6.1 Specialization of general formulas. Given the special structure of R₊^p × R^{n−p}, its normal manifold consists of a total of 3^p n-cells. Each n-cell is characterized by a unique partition of the index set {1, …, p} as the union of three disjoint subsets I₁, I₂ and I₃. More specifically, we define the following family:

P = { (I₁, I₂, I₃) : I₁ ∪ I₂ ∪ I₃ = {1, …, p}, I₁ ∩ I₂ = I₁ ∩ I₃ = I₂ ∩ I₃ = ∅ }.