Large Deviations for Stochastic Processes. Jin Feng, Thomas G. Kurtz


Large Deviations for Stochastic Processes

Jin Feng
Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003-4515
E-mail address: feng@math.umass.edu

Thomas G. Kurtz
Departments of Mathematics and Statistics, University of Wisconsin at Madison, 480 Lincoln Drive, Madison, WI 53706-1388
E-mail address: kurtz@math.wisc.edu

2000 Mathematics Subject Classification. Primary 60F10, 47H20; Secondary 60J05, 60J25, 60J35, 49L25.

Key words and phrases. Large deviations, exponential tightness, approximation of nonlinear semigroups, Skorohod topology, Markov processes, random evolutions, occupation measures, weakly interacting particles, viscosity solutions, comparison principle, mass transport techniques.

Work supported in part by NSF Grants DMS 98-486 and DMS 99-757.

Abstract. General functional large deviation results for cadlag processes under the Skorohod topology are developed. For Markov processes, criteria for large deviation results are given in terms of convergence of infinitesimal generators. Following an introduction and overview, the material is presented in three parts. Part 1 gives necessary and sufficient conditions for exponential tightness that are analogous to conditions for tightness in the theory of weak convergence. Analogs of the Prohorov theorem given by Puhalskii, O'Brien and Vervaat, and de Acosta then imply the large deviation principle, at least along subsequences. Representations of the rate function in terms of rate functions for the finite dimensional distributions are extended to include situations in which the rate function may be finite for sample paths with discontinuities. Part 2 focuses on Markov processes in metric spaces. For a sequence of such processes, convergence of Fleming's logarithmically transformed nonlinear semigroups is shown to imply the large deviation principle, in a manner analogous to the use of convergence of linear semigroups in weak convergence. In particular cases, this convergence can be verified using the theory of nonlinear contraction semigroups. The theory of viscosity solutions of nonlinear equations is used to generalize earlier results on semigroup convergence, enabling the approach to cover a wide variety of situations. The key requirement is that a comparison principle hold. Control methods are used to give representations of the rate functions. Part 3 discusses methods for verifying the comparison principle and applies the general theory to obtain a variety of new and known results on large deviations for Markov processes. Applications include Freidlin-Wentzell theory for nearly deterministic processes, random walks and Markov chains, Donsker-Varadhan theory for occupation measures, random evolutions and averaging problems for stochastic systems, rescaled time-space lattice equations, and weakly interacting particle systems. The latter results include new comparison principles for a class of Hamilton-Jacobi equations in Hilbert spaces and spaces of probability measures.

July 29, 2005

Contents

Preface
Notation

Introduction

Chapter 1. Introduction
   1.1. Basic methodology
   1.2. The basic setting for Markov processes
   1.3. Related approaches
   1.4. Examples
   1.5. An outline of the major obstacles

Chapter 2. An overview
   2.1. Basic setup
   2.2. Compact state spaces
   2.3. General state spaces

Part 1. The general theory of large deviations

Chapter 3. Large deviations and exponential tightness
   3.1. Basic definitions and results
   3.2. Identifying a rate function
   3.3. Rate functions in product spaces

Chapter 4. Large deviations for stochastic processes
   4.1. Exponential tightness for processes
   4.2. Large deviations under changes of time-scale
   4.3. Compactification
   4.4. Large deviations in the compact uniform topology
   4.5. Exponential tightness for solutions of martingale problems
   4.6. Verifying compact containment
   4.7. Finite dimensional determination of the process rate function

Part 2. Large deviations for Markov processes and semigroup convergence

Chapter 5. Large deviations for Markov processes and nonlinear semigroup convergence
   5.1. Convergence of sequences of operator semigroups
   5.2. Applications to large deviations

Chapter 6. Large deviations and nonlinear semigroup convergence using viscosity solutions
   6.1. Viscosity solutions, definition and convergence
   6.2. Large deviations using viscosity semigroup convergence

Chapter 7. Extensions of viscosity solution methods
   7.1. Viscosity solutions, definition and convergence
   7.2. Large deviation applications
   7.3. Convergence using projected operators

Chapter 8. The Nisio semigroup and a control representation of the rate function
   8.1. Formulation of the control problem
   8.2. The Nisio semigroup
   8.3. Control representation of the rate function
   8.4. Properties of the control semigroup V
   8.5. Verification of semigroup representation
   8.6. Verifying the assumptions

Part 3. Examples of large deviations and the comparison principle

Chapter 9. The comparison principle
   9.1. General estimates
   9.2. General conditions in R^d
   9.3. Bounded smooth domains in R^d with (possibly oblique) reflection
   9.4. Conditions for infinite dimensional state space

Chapter 10. Nearly deterministic processes in R^d
   10.1. Processes with independent increments
   10.2. Random walks
   10.3. Markov processes
   10.4. Nearly deterministic Markov chains
   10.5. Diffusion processes with reflecting boundaries

Chapter 11. Random evolutions
   11.1. Discrete time, law of large numbers scaling
   11.2. Continuous time, law of large numbers scaling
   11.3. Continuous time, central limit scaling
   11.4. Discrete time, central limit scaling
   11.5. Diffusions with periodic coefficients
   11.6. Systems with small diffusion and averaging

Chapter 12. Occupation measures
   12.1. Occupation measures of a Markov process - Discrete time
   12.2. Occupation measures of a Markov process - Continuous time

Chapter 13. Stochastic equations in infinite dimensions
   13.1. Stochastic reaction-diffusion equations on a rescaled lattice
   13.2. Stochastic Cahn-Hilliard equations on rescaled lattice
   13.3. Weakly interacting stochastic particles

Appendix

Appendix A. Operators and convergence in function spaces
   A.1. Semicontinuity
   A.2. General notions of convergence
   A.3. Dissipativity of operators

Appendix B. Variational constants, rate of growth and spectral theory for the semigroup of positive linear operators
   B.1. Relationship to the spectral theory of positive linear operators
   B.2. Relationship to some variational constants

Appendix C. Spectral properties for discrete and continuous Laplacians
   C.1. The case of d = 1
   C.2. The case of d > 1
   C.3. E = L²(O) ∩ {ρ : ∫ ρ dx = 0}
   C.4. Other useful approximations

Appendix D. Results from mass transport theory
   D.1. Distributional derivatives
   D.2. Convex functions
   D.3. The p-Wasserstein metric space
   D.4. The Monge-Kantorovich problem
   D.5. Weighted Sobolev spaces H¹_µ(R^d) and H⁻¹_µ(R^d)
   D.6. Fisher information and its properties
   D.7. Mass transport inequalities
   D.8. Miscellaneous

Bibliography

Preface

This work began as a research paper intended to show how the convergence of nonlinear semigroups associated with a sequence of Markov processes implied the large deviation principle for the sequence. We expected the result to be of little utility for specific applications, since classical convergence results for nonlinear semigroups involve hypotheses that are very difficult to verify, at least using classical methods. We should have recognized at the beginning that the modern theory of viscosity solutions provides the tools needed to overcome the classical difficulties. Once we did recognize that convergence of the nonlinear semigroups could be verified, the method evolved into a unified treatment of large deviation results for Markov processes, and the research paper steadily grew into the current volume.

There are many approaches to large deviations for Markov processes, but this book focuses on just one. Our general title reflects both the presentation in Part 1 of the theory of large deviations based on the large deviation analogue of the compactness theory for weak convergence, material that is the foundation of several of the approaches, and the generality of the semigroup methods for Markov processes. The goal of Part 2 is to develop an approach for proving large deviations, in the context of metric-space-valued Markov processes, using convergence of generators in much the same spirit as for weak convergence (e.g. Ethier and Kurtz [36]). This approach complements the usual method that relies on asymptotic estimates obtained through Girsanov transformations. The usefulness of the method is best illustrated through examples, and Part 3 contains a range of concrete examples.

We would like to thank Alex de Acosta, Paul Dupuis, Richard Ellis, Wendell Fleming, Jorge Garcia, Markos Katsoulakis, Jim Kuelbs, Peter Ney, Anatolii Puhalskii and Takis Souganidis for a number of helpful conversations, and Peter Ney and Jim Kuelbs for organizing a long-term seminar at the University of Wisconsin-Madison on large deviations that provided much information and insight. In particular, the authors' first introduction to the close relationship between the theory of large deviations and that of weak convergence came through a series of lectures that Alex de Acosta presented in that seminar.

Notation

(1) (E, r). A complete, separable metric space.
(2) B(E). The σ-algebra of all Borel subsets of E.
(3) A ⊂ M(E) × M(E). An operator identified with its graph as a subset of M(E) × M(E).
(4) B(E). The space of bounded, Borel measurable functions. Endowed with the norm ‖f‖ = sup_{x∈E} |f(x)|, (B(E), ‖·‖) is a Banach space.
(5) B_loc(E). The space of locally bounded, Borel measurable functions, that is, functions in M(E) that are bounded on each compact set.
(6) B_ɛ(x) = {y ∈ E : r(x, y) < ɛ}. The ball of radius ɛ > 0 and center x ∈ E.
(7) buc-convergence, buc-approximable, buc-closure, buc-closed and buc-dense, and buc-lim. See Definition A.6.
(8) C(E). The space of continuous functions on E.
(9) C_b(E) = C(E) ∩ B(E).
(10) C(E, R̄). The collection of functions that are continuous as mappings from E into R̄ with the natural topology on R̄.
(11) C_c(E). For E locally compact, the functions that are continuous and have compact support.
(12) C^k(O), for O ⊂ R^d open and k = 1, 2, ..., ∞. The space of functions whose derivatives up to kth order are continuous in O.
(13) C_c^k(O) = C^k(O) ∩ C_c(O).
(14) C^{k,α}(O), for O ⊂ R^d open, k = 0, 1, 2, ..., and α ∈ (0, 1]. The space of functions f ∈ C^k(O) satisfying
      ‖f‖_{k,α} = sup_{|β|≤k} sup_O |∂^β f| + sup_{|β|=k} [∂^β f]_α < ∞,
    where
      [f]_α = sup_{x≠y} |f(x) − f(y)| / |x − y|^α,  α ∈ (0, 1].
(15) C^{k,α}_loc(O). The space of functions f ∈ C(O) such that f|_D ∈ C^{k,α}(D) for every bounded open subset D ⊂ O.
(16) C_E[0, ∞). The space of E-valued, continuous functions on [0, ∞).
(17) Ĉ(U), for U locally compact. The space of continuous functions vanishing at infinity.
(18) D(A) = {f : (f, g) ∈ A}. The domain of an operator A.
(19) D_+(A) = {f ∈ D(A) : f > 0}.
(20) D_{++}(A) = {f ∈ D(A) : inf_{y∈E} f(y) > 0}.
(21) D_E[0, ∞). The space of E-valued, cadlag (right continuous with left limits) functions on [0, ∞) with the Skorohod topology, unless another topology is specified. (See Ethier and Kurtz [36], Chapter 3.)
(22) D(O), for O ⊂ R^d open. The space C_c^∞(O) with the topology giving the space of Schwartz test functions. (See D.1.)
(23) D′(O). The space of continuous linear functionals on D(O), that is, the space of Schwartz distributions.
(24) g* (respectively g_*). The upper semicontinuous (resp. lower semicontinuous) regularization of a function g on a metric space (E, r). The definitions are given by (6.2) (resp. (6.3)).
(25) lim sup_n G_n and lim inf_n G_n for a sequence of sets G_n. See Definition 2.4 in Section 2.3.
(26) H^k_ρ(R^d), for ρ ∈ P(R^d). A weighted Sobolev space. See Appendix D.5.
(27) K(E) ⊂ C_b(E). The collection of nonnegative, bounded, continuous functions.
(28) K_0(E) ⊂ K(E). The collection of strictly positive, bounded, continuous functions.
(29) K_1(E) ⊂ K_0(E). The collection of bounded continuous functions satisfying inf_{x∈E} f(x) > 0.
(30) M(E). The R-valued, Borel measurable functions on E.
(31) M^u(E). The space of f ∈ M(E) that are bounded above.
(32) M^l(E). The space of f ∈ M(E) that are bounded below.
(33) M(E, R̄). The space of Borel measurable functions with values in R̄ such that f(x) ∈ R for at least one x ∈ E.
(34) M_E[0, ∞). The space of E-valued measurable functions on [0, ∞).
(35) M^{d×d}. The space of d × d matrices.
(36) M^u(E, R̄) ⊂ M(E, R̄) (respectively, C^u(E, R̄) ⊂ C(E, R̄)). The collection of Borel measurable (respectively, continuous) functions that are bounded above (that is, f ∈ M^u(E, R̄) implies sup_{x∈E} f(x) < ∞).
(37) M^l(E, R̄) ⊂ M(E, R̄) (respectively, C^l(E, R̄) ⊂ C(E, R̄)). The collection of Borel measurable (respectively, continuous) functions that are bounded below.
(38) M(E). The space of (positive) Borel measures on E.
(39) M_f(E). The space of finite (positive) Borel measures on E.
(40) M_m(U), U a metric space. The collection of µ ∈ M(U × [0, ∞)) satisfying µ(U × [0, t]) = t for all t ≥ 0.
(41) M_m^T(U) (T > 0). The collection of µ ∈ M(U × [0, T]) satisfying µ(U × [0, t]) = t for all t ≤ T.
(42) P(E) ⊂ M_f(E). The space of probability measures on E.
(43) R̄ = [−∞, ∞].
(44) R(A) = {g : (f, g) ∈ A}. The range of an operator A.
(45) T#ρ = γ. γ ∈ P(E) is the push-forward (Definition D.1) of ρ ∈ P(E) by the map T.

Introduction

CHAPTER 1

Introduction

The theory of large deviations is concerned with the asymptotic estimation of probabilities of rare events. In its basic form, the theory considers the limit of normalizations of log P(A_n) for a sequence of events with asymptotically vanishing probability. To be precise, for a sequence of random variables {X_n} with values in a metric space (S, d), we are interested in the large deviation principle as formulated by Varadhan [8].

Definition 1.1 (Large Deviation Principle). {X_n} satisfies a large deviation principle (LDP) if there exists a lower semicontinuous function I : S → [0, ∞] such that for each open set A,

(1.1)   lim inf_{n→∞} (1/n) log P{X_n ∈ A} ≥ − inf_{x∈A} I(x),

and for each closed set B,

(1.2)   lim sup_{n→∞} (1/n) log P{X_n ∈ B} ≤ − inf_{x∈B} I(x).

I is called the rate function for the large deviation principle. A rate function is good if for each a ∈ [0, ∞), {x : I(x) ≤ a} is compact.

Beginning with the work of Cramér [6] and including the fundamental work on large deviations for stochastic processes by Freidlin and Wentzell [52] and Donsker and Varadhan [33], much of the analysis has been based on change of measure techniques. In this approach, a reference measure is identified under which the events of interest have high probability, and the probability of the event under the original measure is estimated in terms of the Radon-Nikodym derivative relating the two measures. More recently, Puhalskii [97], O'Brien and Vervaat [9], de Acosta [24] and others have developed an approach to large deviations analogous to the Prohorov compactness approach to weak convergence of probability measures.

Definition 1.2. {X_n} converges in distribution to X (that is, the distributions P{X_n ∈ ·} converge weakly to P{X ∈ ·}) if and only if lim_{n→∞} E[f(X_n)] = E[f(X)] for each f ∈ C_b(S).

The analogy between the two theories becomes much clearer if we recall the following equivalent formulation of convergence in distribution.

Proposition 1.3. {X_n} converges in distribution to X if and only if for each open set A,

lim inf_{n→∞} P{X_n ∈ A} ≥ P{X ∈ A},

or equivalently, for each closed set B,

lim sup_{n→∞} P{X_n ∈ B} ≤ P{X ∈ B}.

Our main theme is the development of this approach to large deviation theory as it applies to sequences of cadlag stochastic processes. The proof of weak convergence results typically involves verification of relative compactness or tightness for the sequence and the unique characterization of the possible limit distribution. The analogous approach to large deviations involves verification of exponential tightness (Definition 3.2) and unique characterization of the possible rate function. In Part 1, we present results on exponential tightness and give Puhalskii's analogue of the Prohorov compactness theorem. We also give complete exponential tightness analogues of standard tightness criteria for cadlag stochastic processes and show that the rate function for a sequence of processes is determined by the rate functions for their finite dimensional distributions, just as the limit distribution for a weakly convergent sequence of processes is determined by the limits of the finite dimensional distributions. In Part 2, we focus on Markov processes and give general large deviation results based on the convergence of corresponding nonlinear semigroups. Again, we have an analogy with the use of the convergence of linear semigroups in proofs of weak convergence of Markov processes. The success of this approach depends heavily on the idea of a viscosity solution of a nonlinear equation. We also exploit control theoretic representations of the limiting semigroups to obtain useful representations of the rate function. In Part 3, we demonstrate the effectiveness of these methods in a wide range of large deviation results for Markov processes, including classical Freidlin-Wentzell theory, random evolutions, and infinite dimensional diffusions.

1.1. Basic methodology

1.1.1. Analogy with the theory of weak convergence.
In both the weak convergence and large deviation settings, proofs consist of first verifying a compactness condition and then showing that there is only one possible limit. For weak convergence, the first step is usually accomplished by verifying tightness which, by Prohorov's theorem, implies relative compactness. The corresponding condition for large deviations is exponential tightness (Definition 3.2). Puhalskii [97] (and, in more general settings, O'Brien and Vervaat [9] and de Acosta [24]) has shown that exponential tightness implies the existence of a subsequence along which the large deviation principle holds. (See Theorem 3.7.) For stochastic processes, the second step of these arguments can be accomplished by verifying weak convergence (or the large deviation principle) for the finite dimensional distributions and showing that the limiting finite dimensional distributions (finite dimensional rate functions) determine a unique limiting distribution (rate function) for the process distributions. We extend this analogous development in a number of ways. For the Skorohod topology on the space of cadlag sample paths in a complete, separable metric space, we give a complete exponential tightness analogue of the tightness conditions of Kurtz [7] and Aldous [2] (Theorem 4.1). We then extend the characterization of the rate function in terms of the finite dimensional rate functions (Puhalskii [97], Theorem 4.5, and de Acosta [2], Lemma 3.2) to allow the rate function to be finite on paths with discontinuities (Section 4.7). Finally, we apply these results to Markov processes, using the asymptotic behavior of generators and semigroups associated with the Markov processes to verify exponential tightness for the processes (Section 4.5) and the large deviation principle for the finite dimensional distributions (Chapters 5, 6, 7). These arguments are again analogous to the use of convergence of generators and semigroups to verify tightness and convergence of finite dimensional distributions in the weak convergence setting.

1.1.2. Nonlinear semigroups and viscosity methods. Results of Varadhan and Bryc (Proposition 3.8) relate large deviations for sequences of random variables to the asymptotic behavior of functionals of the form (1/n) log E[e^{nf(X_n)}]. For Markov processes,

(1.3)   V_n(t)f(x) = (1/n) log E[e^{nf(X_n(t))} | X_n(0) = x]

defines a nonlinear semigroup, and large deviations for sequences of Markov processes can be studied using the asymptotic behavior of the corresponding sequence of nonlinear semigroups. Viscosity methods for nonlinear equations play a central role. Fleming and others (cf. [44, 45, 4, 48, 46]) have used this approach to prove large deviation results for X_n at single time points and exit times. We develop these ideas further, showing how convergence of the semigroups and their generators H_n can be used to obtain both exponential tightness and the large deviation principle for the finite dimensional distributions. These results then imply the pathwise large deviation principle.

1.1.3. Control theory. The limiting semigroup usually admits a variational form known as the Nisio semigroup in control theory. Connections between control problems and large deviation results were first made by Fleming [44, 45] and developed further by Sheu [8, 9].
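The variational form can be made concrete in the simplest setting. A minimal numerical sketch (illustrative, not the book's computation): for small-noise Brownian motion the limit operator acts on smooth f as H f(x) = (1/2)|f′(x)|², the Legendre transform of the running cost L(u) = u²/2, and the identity H f(x) = sup_u [u f′(x) − u²/2] can be checked by a brute-force grid search over the control u.

```python
import numpy as np

# Illustrative sketch (not from the book): verify the Legendre-transform
# identity  sup_u [ u*p - u^2/2 ] = p^2/2,  with p standing in for f'(x).
p = 1.7
u_grid = np.linspace(-10.0, 10.0, 200001)   # candidate controls u
sup_val = np.max(u_grid * p - u_grid**2 / 2.0)

assert abs(sup_val - p**2 / 2.0) < 1e-6     # supremum equals p^2/2
```

The maximizing control is u = p = f′(x), which is the usual first-order condition L′(u) = p for a Legendre transform.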
Dupuis and Ellis [35] systematically develop these connections, showing that, in many situations, one can represent a large class of functionals of the processes as the minimal cost functions of stochastic control problems and then verify convergence of the functionals to the minimal cost functions of limiting deterministic control problems. This convergence can then be used to obtain the desired large deviation result. Variational representations for the sequence of functionals can be difficult to obtain, and in the present work we first verify convergence of the semigroups by methods that only require convergence of the corresponding generators and conditions on the limiting generator. Working only with the sequence of generators frequently provides conditions that are easier to verify than conditions that give a convergent sequence of variational representations. Variational representations for the limit are still important, however, as they provide methods for obtaining simple representations of the large deviation rate function. In Chapter 8, we discuss methods for obtaining such representations. In particular, we formulate a set of conditions at the infinitesimal level, so that by verifying a variational representation for the limit generator, we obtain the variational structure of the semigroup that gives the large deviation rate function. A generator convergence approach directly based on the sequence of control problems and Girsanov transformations was discussed in Feng [4] in a more restrictive setting. The present work avoids explicit use of a Girsanov transformation and greatly generalizes [4], making the approach much more applicable.
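As a concrete numerical instance of the large deviation principle in Definition 1.1 (the classical Cramér setup, a sketch not taken from the book): for X_n the mean of n i.i.d. standard normals, the rate function is I(x) = x²/2, so for a > 0 the normalized log tail (1/n) log P{X_n ≥ a} tends to −a²/2. Since X_n is exactly N(0, 1/n), the tail probability can be computed from the Gaussian tail with no simulation.

```python
from math import erfc, log, sqrt

# Illustrative sketch (Cramér's theorem for Gaussian sample means, not from
# the book): (1/n) log P{X_n >= a} -> -I(a) = -a^2/2.
def normalized_log_tail(n, a):
    """(1/n) log P{X_n >= a} with X_n the mean of n standard normals."""
    # P{N(0,1) >= z} = erfc(z / sqrt(2)) / 2, here z = a * sqrt(n)
    return log(erfc(a * sqrt(n) / sqrt(2.0)) / 2.0) / n

a = 0.5
vals = [normalized_log_tail(n, a) for n in (10, 100, 1000)]

# The normalized log tails approach -I(a) = -0.125 as n grows.
assert abs(vals[-1] - (-a**2 / 2.0)) < 0.01
assert abs(vals[-1] - (-a**2 / 2.0)) < abs(vals[0] - (-a**2 / 2.0))
```

The convergence is from below, reflecting the polynomial prefactor in the Gaussian tail that the 1/n normalization kills only in the limit.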

1.2. The basic setting for Markov processes

1.2.1. Notation. Throughout, (E, r) will be a complete, separable metric space, M(E) will denote the space of real-valued, Borel measurable functions on E, B(E) ⊂ M(E) the space of bounded, Borel measurable functions, C_b(E) ⊂ B(E) the space of bounded continuous functions, and P(E) the space of probability measures on E. We identify an operator A with its graph and, for example, write A ⊂ C_b(E) × C_b(E) if the domain D(A) and range R(A) are contained in C_b(E). An operator can be multi-valued and nonlinear. The space of E-valued, cadlag functions on [0, ∞) with the Skorohod topology will be denoted by D_E[0, ∞).

Define q(x, y) = r(x, y) ∧ 1, and note that q is a metric on E that is equivalent to r. The following metric gives the Skorohod topology. (See Ethier and Kurtz [36], Chapter 3.) Let Λ be the collection of strictly increasing functions mapping [0, ∞) onto [0, ∞) and satisfying

(1.4)   γ(λ) ≡ sup_{s<t} | log [(λ(t) − λ(s)) / (t − s)] | < ∞.

For x, y ∈ D_E[0, ∞), define

(1.5)   d(x, y) = inf_{λ∈Λ} { γ(λ) ∨ ∫_0^∞ e^{−u} sup_{t≥0} q(x(λ(t) ∧ u), y(t ∧ u)) du }.

1.2.2. Continuous-time Markov processes. An E-valued Markov process X is usually characterized in terms of its generator, a linear operator A ⊂ B(E) × B(E). One approach to the characterization of X is to require that all processes of the form

(1.6)   g(X(t)) − g(X(0)) − ∫_0^t Ag(X(s)) ds

be martingales with respect to some filtration {F_t} independent of g. This requirement defines the (linear) martingale problem of Stroock and Varadhan; that is, a process X satisfying this condition is said to be a solution of the martingale problem for A. If ν ∈ P(E) and X(0) has distribution ν, X is a solution of the martingale problem for (A, ν). We say that uniqueness holds for the martingale problem for (A, ν) if any two solutions have the same finite dimensional distributions. The martingale problem for A is well-posed if for each ν ∈ P(E), there exists a unique solution of the martingale problem for (A, ν).
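The martingale property in (1.6) can be checked numerically in the simplest finite-state case. A minimal sketch (the two-state generator Q, the function g, and the helper `expm` are our illustrative choices, not the book's): taking expectations in (1.6) gives E_x[g(X(t))] − g(x) = ∫_0^t E_x[Ag(X(s))] ds, which for a finite chain with generator matrix Q reads (e^{tQ}g)(x) − g(x) = ∫_0^t (e^{sQ}Qg)(x) ds.

```python
import numpy as np

# Illustrative sketch (not from the book): verify the expectation form of the
# martingale property for a two-state Markov chain with generator matrix Q.
def expm(M, terms=60):
    """Matrix exponential by truncated power series (fine for small t*Q)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

a, b = 2.0, 3.0
Q = np.array([[-a, a], [b, -b]])   # generator of a two-state Markov chain
g = np.array([1.0, 5.0])           # a function on the state space {0, 1}
t = 0.7

lhs = expm(t * Q) @ g - g          # E_x[g(X(t))] - g(x), indexed by x

# Trapezoid rule for the integral of s -> (e^{sQ} Q g) over [0, t].
s = np.linspace(0.0, t, 5001)
vals = np.array([expm(si * Q) @ (Q @ g) for si in s])
h = s[1] - s[0]
rhs = h * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)

assert np.allclose(lhs, rhs, atol=1e-5)
```

The two sides agree for every starting state x, which is exactly the statement that (1.6) has mean zero under each P_x.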
If g is bounded away from zero, it is easy to see that (1.6) is a martingale if and only if

g(X(t)) exp{ − ∫_0^t [Ag(X(s)) / g(X(s))] ds }

is a martingale. Consequently, if we define D(H) = {f ∈ B(E) : e^f ∈ D(A)} and set

(1.7)   Hf = e^{−f} A e^f,

then we can define the exponential martingale problem by requiring

(1.8)   exp{ f(X(t)) − f(X(0)) − ∫_0^t Hf(X(s)) ds }

to be a martingale with respect to a filtration independent of f. One can always assume that the domain D(A) contains constants, and hence if g ∈ D(A), then

there exists a constant c such that g + c ∈ D(A) is positive and bounded away from zero. Setting f = log(g + c), (1.8) is a martingale if and only if (1.6) is a martingale. It follows that X is a solution of the linear martingale problem for A if and only if it is a solution of the exponential martingale problem for H.

Weak convergence results for a sequence {X_n} of Markov processes can be based on convergence of the corresponding operator semigroups {T_n(t)}, where

T_n(t)g(x) = E[g(X_n(t)) | X_n(0) = x].

At least formally, T_n satisfies

(d/dt) T_n(t)g = A_n T_n(t)g,   T_n(0)g = g,

where A_n is the generator for X_n. An analogous approach to large deviation results is suggested by Fleming [45] using the nonlinear contraction (in the sup norm) semigroup V_n defined in (1.3). Again, at least formally, V_n should satisfy

(d/dt) V_n(t)f = (1/n) e^{−nV_n(t)f} A_n e^{nV_n(t)f},

which leads us to define

(1.9)   H_n f = (1/n) e^{−nf} A_n e^{nf}.

Note that

(1.10)   exp{ nf(X_n(t)) − nf(X_n(0)) − ∫_0^t nH_n f(X_n(s)) ds }

is a martingale.

1.2.3. Discrete-time Markov processes. Consideration of discrete-time Markov processes leads to slightly different formulations of the two martingale problems. Let {Y_k, k ≥ 0} be a time homogeneous Markov chain with state space E and transition operator

T f(x) = E[f(Y_{k+1}) | Y_k = x].

For ɛ > 0, define X(t) = Y_{[t/ɛ]}. Then, setting F_t^X = σ(X(s) : s ≤ t), for g ∈ B(E),

g(X(t)) − g(X(0)) − Σ_{k=0}^{[t/ɛ]−1} (T g(Y_k) − g(Y_k)) = g(X(t)) − g(X(0)) − ∫_0^{[t/ɛ]ɛ} ɛ^{−1}(T − I)g(X(s)) ds

is an {F_t^X}-martingale, and for f ∈ B(E),

exp{ f(X(t)) − f(X(0)) − ∫_0^{[t/ɛ]ɛ} ɛ^{−1} log(e^{−f} T e^f)(X(s)) ds }

is an {F_t^X}-martingale. For a sequence X_n(t) = Y^n_{[t/ɛ_n]}, we define

(1.11)   A_n g = ɛ_n^{−1}(T_n − I)g

and

(1.12)   H_n f = (nɛ_n)^{−1} log(e^{−nf} T_n e^{nf}) = (nɛ_n)^{−1} log( e^{−nf}(T_n − I)e^{nf} + 1 ),

so that
g(X_n(t)) - g(X_n(0)) - \int_0^{[t/\epsilon_n]\epsilon_n} A_n g(X_n(s)) ds
and
\exp\{n f(X_n(t)) - n f(X_n(0)) - \int_0^{[t/\epsilon_n]\epsilon_n} n H_n f(X_n(s)) ds\}
are martingales. In Chapter 3, we give conditions for exponential tightness that are typically easy to check in the Markov process setting using the convergence (actually, only the boundedness) of H_n. In particular, \sup_n \|H_n f\| < \infty for a sufficiently large class of f implies exponential tightness, at least if the state space is compact. In Section 4.7, we give conditions under which exponential tightness and the large deviation principle for finite dimensional distributions imply the large deviation principle in the Skorohod topology. In Chapters 5, 6 and 7, we give conditions under which convergence of the sequence of semigroups \{V_n\} implies the large deviation principle for the finite dimensional distributions and conditions under which convergence of \{H_n\} implies convergence of \{V_n\}. Consequently, convergence of \{H_n\} is an essential ingredient in the results.

1.3. Related approaches

There are several approaches in the literature that are closely related to the methods we develop here. We have already mentioned the control theoretic methods developed in detail by Dupuis and Ellis [35] that play a key role in the rate function representation results of Chapter 8. Two other approaches share important characteristics with our methods.

1.3.1. Exponential martingale method of de Acosta. de Acosta [26, 27] develops a general approach to large deviation results that, in the case of Markov processes, exploits the exponential martingale (1.10). Let \lambda be a measure on [0, 1], and assume that e^{af} \in D(A_n) for all a \in R. Then the fact that (1.10) is a martingale implies that
\exp\{\int_0^1 f(X_n(s))\lambda(ds) - \lambda[0,1] f(X_n(0)) - \int_0^1 \mathbf{H}_n e^{\lambda(s,1]f}(X_n(s)) ds\}
has expectation 1. Setting
\Phi_n(x, \lambda, f) = \lambda[0,1] f(x(0)) + \int_0^1 \mathbf{H}_n e^{\lambda(s,1]f}(x(s)) ds,
convergence of H_n implies (at least formally) that
\frac{1}{n} \Phi_n(x, n\lambda, f) \to \Phi(x, \lambda, f) = \lambda[0,1] f(x(0)) + \int_0^1 H e^{\lambda(s,1]f}(x(s)) ds.
Comparing this convergence to the conditions of Theorems 2.1 and 3.1 of [26] indicates a close relationship between the methods developed here and those developed by de Acosta, at least at the formal computational level.

1.3.2. The maxingale method of Puhalskii. Assume that the semigroups V_n converge to V. Then V is nonlinear in the sense that, in general,
V(t)(af + bg) \neq aV(t)f + bV(t)g.
However, defining
a \oplus b = \max\{a, b\}, \quad a \otimes b = a + b,
and using (\oplus, \otimes) as operations on R in place of the usual (+, \times), V becomes linear:
V(t)((a \otimes f) \oplus (b \otimes g)) = (a \otimes V(t)f) \oplus (b \otimes V(t)g).
Similarly, defining
a \oplus_n b = \frac{1}{n}\log(e^{na} + e^{nb}), \quad a \otimes_n b = a + b,
V_n is linear under (\oplus_n, \otimes_n):
V_n(t)((a \otimes_n f) \oplus_n (b \otimes_n g)) = (a \otimes_n V_n(t)f) \oplus_n (b \otimes_n V_n(t)g).
Furthermore,
\lim_{n\to\infty} a \oplus_n b = a \oplus b, \quad a \otimes_n b = a \otimes b.
The change of algebra (semi-ring, to be precise) on R produces a linear structure for V (or the V_n). Results analogous to those of linear analysis, such as the Riesz representation theorem, hold and can be used to study the properties of V. The counterpart of a measure is an idempotent measure. (See Puhalskii [].) Taking this viewpoint and mimicking the martingale problem approach in weak convergence theory for stochastic processes, Puhalskii [99, ] defines a maxingale problem and develops large deviation theory for semimartingales. The main result of [], Theorem 5.4.1, can be stated roughly as follows: let \{X_n\} be a sequence of Markov processes and H_n be the operators such that
M^f_{s,n}(t) = e^{n f(X_n(t)) - n f(X_n(s)) - n\int_s^t H_n f(X_n(r)) dr}
is a mean one martingale for each f \in D(H_n) and each t \geq s \geq 0, as in (1.10). Assuming that H = \lim H_n in an appropriate sense, under additional conditions on H and the convergence, \{X_n\} is exponentially tight, and along any subsequence of \{X_n\}, there exists a further subsequence such that the large deviation principle holds with some rate function I(x). Then
\Pi(A) = \sup_{x \in A} e^{-I(x)}
defines an idempotent probability measure on the trajectory space of the processes. (See Chapter 1 of []. Note that an idempotent measure is not a measure in the usual sense.)
\Pi then solves a maxingale problem for H:
e^{f(x(t)) - f(x(0)) - \int_0^t Hf(x(s)) ds}
is a maxingale under \Pi(dx) for every f \in D(H), in the sense that
(1.13) \frac{\sup_{\{x : x(r) = z(r), r \leq s\}} \exp\{f(x(t)) - f(x(s)) - \int_s^t Hf(x(r)) dr - I(x(\cdot))\}}{\sup_{\{x : x(r) = z(r), r \leq s\}} \exp\{-I(x(\cdot))\}} = 1,
for every s \leq t and every trajectory z. If the maxingale problem for H has a unique idempotent probability measure \bar{\Pi} as solution, then \Pi = \bar{\Pi} and the large deviation principle holds for \{X_n\} with rate function I.
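The scaled operations (\oplus_n, \otimes_n) above are easy to examine numerically. The following sketch (all numerical values are illustrative choices, not from the text) checks that a scaled log-expectation of the form (1/n) log E[e^{nf}] is exactly \oplus_n-linear for every n, and that \oplus_n converges to the idempotent operation \oplus = max as n \to \infty.

```python
import math

def oplus_n(a, b, n):
    """a (+)_n b = (1/n) log(e^{na} + e^{nb}), a smoothed maximum."""
    m = max(a, b)                      # subtract the max for numerical stability
    return m + math.log(math.exp(n*(a - m)) + math.exp(n*(b - m))) / n

def V(f, p, n):
    """(1/n) log E[e^{n f}] for a finite distribution p: a toy stand-in
    for the scaled log-expectation underlying the semigroup V_n."""
    return math.log(sum(pi*math.exp(n*fi) for pi, fi in zip(p, f))) / n

p = [0.2, 0.5, 0.3]                    # illustrative distribution
f = [0.1, -0.4, 0.7]
g = [0.5, 0.2, -0.1]
a, b, n = 0.3, 1.2, 50.0

# (+)_n-linearity: V((a (x)_n f) (+)_n (b (x)_n g)) =
#                  (a (x)_n V(f)) (+)_n (b (x)_n V(g)),
# where (x)_n is ordinary addition; the identity is exact for every n.
lhs = V([oplus_n(a + fi, b + gi, n) for fi, gi in zip(f, g)], p, n)
rhs = oplus_n(a + V(f, p, n), b + V(g, p, n), n)
assert abs(lhs - rhs) < 1e-9

# and (+)_n converges to the idempotent operation max as n grows
assert abs(oplus_n(a, b, 1000) - max(a, b)) < 1e-3
```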

1.4. Examples

As explained in Section 1.2, an essential ingredient of our results is the convergence of \{H_n\}. As the following examples show, at least formally, the basic calculations for verifying this convergence may be quite simple. The calculations given here are frequently heuristic; however, we give rigorous results for these and other examples in Part 3.

Example 1.4. [Freidlin-Wentzell theory - I] The best-known example of large deviations for Markov processes comes from the work of Freidlin and Wentzell on diffusions with small diffusion coefficient. (See Chapters 3 and 4 in [52].) Consider a sequence of d-dimensional diffusion processes with X_n satisfying the Itô equation
X_n(t) = x + \frac{1}{\sqrt{n}} \int_0^t \sigma(X_n(s)) dW(s) + \int_0^t b(X_n(s)) ds.
Let a(x) = \sigma(x)\sigma^T(x). Then the (linear) generator is
A_n g(x) = \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i \partial_j g(x) + \sum_i b_i(x) \partial_i g(x),
where we can take D(A_n) to be the collection of functions of the form c + f, where c \in R and f \in C^2_c(R^d), the space of twice continuously differentiable functions with compact support in R^d. Then
(1.14) \mathbf{H}_n f(x) = \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i \partial_j f(x) + \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i f(x) \partial_j f(x) + \sum_i b_i(x) \partial_i f(x)
and
H_n f(x) = \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i \partial_j f(x) + \frac{1}{2} \sum_{ij} a_{ij}(x) \partial_i f(x) \partial_j f(x) + \sum_i b_i(x) \partial_i f(x).
Consequently, Hf = \lim H_n f is
(1.15) Hf(x) = \frac{1}{2} (\nabla f(x))^T a(x) \nabla f(x) + b(x) \cdot \nabla f(x).
We can identify the rate function I in a simple form by finding a variational representation of H. First, we introduce a pair of functions on R^d \times R^d:
(1.16) H(x, p) \equiv \frac{1}{2} p^T a(x) p + b(x) \cdot p = \frac{1}{2} |\sigma^T(x) p|^2 + b(x) \cdot p,
L(x, q) \equiv \sup_{p \in R^d} \{p \cdot q - H(x, p)\}.
H(x, p) is convex in p, and H and L are dual Fenchel-Legendre transforms. In particular,
H(x, p) = \sup_{q \in R^d} \{p \cdot q - L(x, q)\}.
Therefore
Hf(x) = H(x, \nabla f(x)) = \sup_{u \in R^d} \{Af(x, u) - L(x, u)\},

where Af(x, u) = u \cdot \nabla f(x), for f \in C^2_c(R^d). Applying Corollary 8.28 or Corollary 8.29, the rate function for \{X_n\} is
I(x) = I_0(x(0)) + \inf_{\{u : (x,u) \in \mathcal{J}\}} \int_0^\infty L(x(s), u(s)) ds,
where I_0 is the rate function for \{X_n(0)\} and \mathcal{J} is the collection of solutions of the functional equation
f(x(t)) - f(x(0)) = \int_0^t Af(x(s), u(s)) ds, \quad f \in C^2_c(R^d).
In the present setting, this equation reduces to \dot{x}(t) = u(t), so
(1.17) I(x) = I_0(x(0)) + \int_0^\infty L(x(s), \dot{x}(s)) ds if x is absolutely continuous, and I(x) = \infty otherwise.
One can use other variational representations of the operator H and arrive at different expressions for the rate function I. For example, if we choose
Af(x, u) = u \cdot (\sigma^T(x) \nabla f(x)) + b(x) \cdot \nabla f(x), \quad f \in C^2_c(R^d), \quad L(x, u) = \frac{1}{2}|u|^2,
and define
\mathbf{H}f(x) = \sup_{u \in R^d} \{Af(x, u) - L(x, u)\},
then \mathbf{H} = H and the rate function can be expressed as
I(x) = I_0(x(0)) + \frac{1}{2} \inf\{\int_0^\infty |u(s)|^2 ds : u \in L^2[0, \infty) and x(t) = x(0) + \int_0^t b(x(s)) ds + \int_0^t \sigma(x(s)) u(s) ds\}.

Example 1.5. [Freidlin-Wentzell theory - II] Wentzell [27, 28] has studied the jump process analogue of the small diffusion problem. Again, let E = R^d, and consider, for example,
(1.18) A_n g(x) = n \int_{R^d} (g(x + \frac{z}{n}) - g(x)) \eta(x, dz),
where for each x \in R^d, \eta(x, \cdot) is a measure on \mathcal{B}(R^d) and for each B \in \mathcal{B}(R^d), \eta(\cdot, B) is a Borel-measurable function. (For the moment, ignore the problem of defining an appropriate domain.) Then
\mathbf{H}_n f(x) = n \int_{R^d} (e^{f(x + z/n) - f(x)} - 1) \eta(x, dz)
and
H_n f(x) = \int_{R^d} (e^{n(f(x + z/n) - f(x))} - 1) \eta(x, dz).
Assuming \int_{R^d} e^{\alpha \cdot z} \eta(x, dz) < \infty for each \alpha \in R^d, Hf = \lim H_n f is given by
Hf(x) = \int_{R^d} (e^{\nabla f(x) \cdot z} - 1) \eta(x, dz).
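The convergence H_n f \to Hf is easy to check numerically in the simplest case. The sketch below uses the illustrative choice \eta(x, dz) = \delta_1(dz) (unit jumps at rate one; our assumption, not from the text), for which H_n f(x) = e^{n(f(x + 1/n) - f(x))} - 1 and Hf(x) = e^{f'(x)} - 1.

```python
import math

def f(x):  return math.sin(x)          # a smooth test function
def fp(x): return math.cos(x)          # its derivative

def Hn(x, n):
    # H_n f(x) = e^{n(f(x + 1/n) - f(x))} - 1 for eta = delta_1
    return math.exp(n * (f(x + 1.0/n) - f(x))) - 1.0

def H(x):
    # the limit Hf(x) = e^{f'(x)} - 1
    return math.exp(fp(x)) - 1.0

x = 0.7
errs = [abs(Hn(x, n) - H(x)) for n in (10, 100, 1000, 10000)]
# the error decays like O(1/n), consistent with the formal expansion
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))
assert errs[-1] < 1e-3
```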

More generally, one can consider generators of the form
A_n g(x) = n \int_{R^d} (g(x + \frac{z}{n}) - g(x) - \frac{z}{n} \cdot \nabla g(x)) \eta(x, dz) + \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i \partial_j g(x) + b(x) \cdot \nabla g(x).
Then
\mathbf{H}_n f(x) = n \int_{R^d} (e^{f(x + z/n) - f(x)} - 1 - \frac{z}{n} \cdot \nabla f(x)) \eta(x, dz) + b(x) \cdot \nabla f(x) + \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i f(x) \partial_j f(x) + \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i \partial_j f(x)
and
H_n f(x) = \int_{R^d} (e^{n(f(x + z/n) - f(x))} - 1 - z \cdot \nabla f(x)) \eta(x, dz) + b(x) \cdot \nabla f(x) + \frac{1}{2} \sum_{ij} a_{ij}(x) \partial_i f(x) \partial_j f(x) + \frac{1}{2n} \sum_{ij} a_{ij}(x) \partial_i \partial_j f(x).
Now, assuming \int_{R^d} (e^{\alpha \cdot z} - 1 - \alpha \cdot z) \eta(x, dz) < \infty for each \alpha \in R^d, Hf = \lim H_n f is given by
Hf(x) = \int_{R^d} (e^{\nabla f(x) \cdot z} - 1 - z \cdot \nabla f(x)) \eta(x, dz) + \frac{1}{2} \sum_{ij} a_{ij}(x) \partial_i f(x) \partial_j f(x) + b(x) \cdot \nabla f(x).
A variational representation of H can be constructed as in Example 1.4. Define
H(x, p) = \int_{R^d} (e^{p \cdot z} - 1 - z \cdot p) \eta(x, dz) + \frac{1}{2} |\sigma^T(x) p|^2 + b(x) \cdot p
and
L(x, q) = \sup_{p \in R^d} \{p \cdot q - H(x, p)\}.
Then
Hf(x) = H(x, \nabla f(x)) = \sup_{u \in R^d} \{Af(x, u) - L(x, u)\},
where Af(x, u) = u \cdot \nabla f(x), f \in C^2_c(R^d), and the rate function is given by (1.17).

Example 1.6. [Partial sum processes] Functional versions of the classical large deviation results of Cramér [6] have been considered by a number of authors, including Borovkov [] and Mogulskii [87]. Let \xi_1, \xi_2, \ldots be independent and identically distributed R^d-valued random variables with distribution \nu. Define
X_n(t) = \frac{1}{\beta_n} \sum_{k=1}^{[t/\epsilon_n]} \xi_k.
Then, as in (1.11),
A_n f(x) = \epsilon_n^{-1} (\int_{R^d} f(x + \frac{z}{\beta_n}) \nu(dz) - f(x))

and
H_n f(x) = \frac{1}{n\epsilon_n} \log \int_{R^d} e^{n(f(x + z/\beta_n) - f(x))} \nu(dz).
If \int_{R^d} e^{\alpha|z|} \nu(dz) < \infty for each \alpha > 0, \epsilon_n = \frac{1}{n}, and \beta_n = n, then H_n f \to Hf given by
Hf(x) = \log \int_{R^d} e^{\nabla f(x) \cdot z} \nu(dz).
If \int_{R^d} z \nu(dz) = 0, \int_{R^d} e^{\alpha|z|} \nu(dz) < \infty for some \alpha > 0, \epsilon_n \beta_n^2 = n, and n\epsilon_n \to 0, then H_n f \to Hf given by
Hf(x) = \frac{1}{2} \int_{R^d} (\nabla f(x) \cdot z)^2 \nu(dz).
Once again, a variational representation of H and the rate function I can be constructed using Fenchel-Legendre transforms. Define
H(p) = \frac{1}{2} \int_{R^d} (z \cdot p)^2 \nu(dz)
and
L(q) = \sup_{p \in R^d} \{p \cdot q - H(p)\}.
Then
I(x) = \int_0^\infty L(\dot{x}(s)) ds if x is absolutely continuous, and I(x) = \infty otherwise.

Example 1.7. [Lévy processes] If \eta in Example 1.5 does not depend on x, then the process is a Lévy process, that is, a process with stationary, independent increments. Lynch and Sethuraman [82] and Mogulskii [88] consider the real-valued case. de Acosta [25] considers Banach space-valued processes. In de Acosta's setting, let E be a Banach space and E^* its dual. Recall that for f \in C^1(E), the gradient \nabla f(x) = \nabla_x f(x) is the unique element in E^* satisfying
f(x + z) - f(x) = \langle z, \nabla f(x) \rangle + o(|z|), \quad z \in E.
Let F \subset E^* separate points, and let D(A) be the collection of functions of the form f(x) = g(\langle x, \xi_1 \rangle, \ldots, \langle x, \xi_m \rangle), where \xi_1, \ldots, \xi_m \in F and g \in C^2_c(R^m). Then
\nabla f(x) = \sum_{k=1}^m \partial_k g(\langle x, \xi_1 \rangle, \ldots, \langle x, \xi_m \rangle) \xi_k \in E^*.

Assuming \int_E \langle z, \xi \rangle^2 \eta(dz) < \infty for every \xi \in F, we have
A_n f(x) = n \int_E (g(\langle x + \frac{z}{n}, \xi_1 \rangle, \ldots, \langle x + \frac{z}{n}, \xi_m \rangle) - g(\langle x, \xi_1 \rangle, \ldots, \langle x, \xi_m \rangle) - \frac{1}{n} I_{\{|z| \leq 1\}} \sum_{k=1}^m \langle z, \xi_k \rangle \partial_k g(\langle x, \xi_1 \rangle, \ldots, \langle x, \xi_m \rangle)) \eta(dz) + \sum_{k=1}^m \langle b, \xi_k \rangle \partial_k g(\langle x, \xi_1 \rangle, \ldots, \langle x, \xi_m \rangle)
= n \int_E (f(x + \frac{z}{n}) - f(x) - \frac{1}{n} I_{\{|z| \leq 1\}} \langle z, \nabla f(x) \rangle) \eta(dz) + \langle b, \nabla f(x) \rangle,
\mathbf{H}_n f(x) = n \int_E (e^{f(x + z/n) - f(x)} - 1 - \frac{1}{n} I_{\{|z| \leq 1\}} \langle z, \nabla f(x) \rangle) \eta(dz) + \langle b, \nabla f(x) \rangle,
and
H_n f(x) = \int_E (e^{n(f(x + z/n) - f(x))} - 1 - I_{\{|z| \leq 1\}} \langle z, \nabla f(x) \rangle) \eta(dz) + \langle b, \nabla f(x) \rangle.
Now, assuming \int_E (e^{\langle \alpha, z \rangle} - 1 - I_{\{|z| \leq 1\}} \langle \alpha, z \rangle) \eta(dz) < \infty for each \alpha \in E^*, Hf = \lim H_n f is given by
Hf(x) = \int_E (e^{\langle z, \nabla f(x) \rangle} - 1 - I_{\{|z| \leq 1\}} \langle z, \nabla f(x) \rangle) \eta(dz) + \langle b, \nabla f(x) \rangle.
Define
H(p) = \int_E (e^{\langle z, p \rangle} - 1 - I_{\{|z| \leq 1\}} \langle z, p \rangle) \eta(dz) + \langle b, p \rangle, \quad p \in E^*,
and
L(q) = \sup_{p \in E^*} \{\langle q, p \rangle - H(p)\}, \quad q \in E.
H is a convex function and L is its Fenchel-Legendre transform, and hence
H(p) = \sup_{q \in E} \{\langle q, p \rangle - L(q)\}.
Consequently,
Hf(x) = \sup_{u \in E} \{Af(x, u) - L(u)\},
where Af(x, u) = \langle u, \nabla f(x) \rangle, and the rate function is again given by
I(x) = I_0(x(0)) + \int_0^\infty L(\dot{x}(s)) ds if x is absolutely continuous, and I(x) = \infty otherwise.

Example 1.8. [Random evolutions - I] Let B be the generator for a Markov process Y with state space E_0, and let Y_n(t) = Y(nt). Let X_n satisfy
\dot{X}_n(t) = F(X_n(t), Y_n(t)),
where F : R^d \times E_0 \to R^d. If Y is ergodic with stationary distribution \pi, then \lim X_n = \overline{X},

where \overline{X} satisfies \dot{\overline{X}} = \overline{F}(\overline{X}), with \overline{F}(x) = \int F(x, y) \pi(dy). Averaging results of this type date back at least to Hasminskii [66]. Corresponding large deviation theorems have been considered by Freidlin [49, 5]. (See Freidlin and Wentzell [52], Chapter 7.) For simplicity, assume B is a bounded operator,
(1.19) Bg(y) = \lambda(y) \int (g(z) - g(y)) \mu(y, dz).
Then
A_n g(x, y) = F(x, y) \cdot \nabla_x g(x, y) + n\lambda(y) \int (g(x, z) - g(x, y)) \mu(y, dz),
\mathbf{H}_n f(x, y) = F(x, y) \cdot \nabla_x f(x, y) + n\lambda(y) \int (e^{f(x, z) - f(x, y)} - 1) \mu(y, dz),
and
H_n f(x, y) = F(x, y) \cdot \nabla_x f(x, y) + \lambda(y) \int (e^{n(f(x, z) - f(x, y))} - 1) \mu(y, dz).
Let f_n(x, y) = f(x) + \frac{1}{n} h(x, y). Then
\lim_{n\to\infty} H_n f_n(x, y) = F(x, y) \cdot \nabla_x f(x) + \lambda(y) \int (e^{h(x, z) - h(x, y)} - 1) \mu(y, dz).
Consequently, H is multivalued. One approach to dealing with this limit is to select h so that the limit is independent of y, that is, to find functions h(x, y) and g(x) such that
(1.20) F(x, y) \cdot \nabla_x f(x) + \lambda(y) \int (e^{h(x, z) - h(x, y)} - 1) \mu(y, dz) = g(x).
Multiplying both sides of the equation by e^{h(x, y)} and fixing x, one needs to solve an equation of the form
(1.21) \alpha(y) h(y) + \lambda(y) \int (h(z) - h(y)) \mu(y, dz) = \gamma h(y),
where h plays the role of e^{h(x, y)} and hence must be positive. Of course, (1.21) is just an eigenvalue problem for the operator
(1.22) Q = \alpha I + B.
If E_0 is finite, that is, Y is a finite Markov chain, and B is irreducible, then the Perron-Frobenius theorem implies the existence of a unique (up to multiplication by a constant) strictly positive h which is the eigenfunction corresponding to the largest eigenvalue for Q. Note that in (1.20), it is only the ratio h(z)/h(y) that is relevant, so the arbitrary constant cancels. For linear semigroups, this approach to convergence of generators was introduced in Kurtz [7] and has been developed extensively by Papanicolaou and Varadhan [93] and Kushner (for example, [79]). For nonlinear semigroups, Kurtz [72] and Evans [37] have used similar arguments.
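For a finite state space, the eigenvalue problem (1.21)-(1.22) can be solved directly. The sketch below (the rate matrix B and the function \alpha are illustrative choices, not from the text) computes the principal eigenvalue \gamma(\alpha) of Q = \alpha I + B, checks the positivity of the Perron-Frobenius eigenfunction, and checks \gamma(\alpha) \geq \int \alpha\, d\pi, the lower bound implicit in the Donsker-Varadhan representation below.

```python
import numpy as np

B = np.array([[-2.0,  1.0,  1.0],
              [ 0.5, -1.0,  0.5],
              [ 1.0,  2.0, -3.0]])      # irreducible generator: rows sum to 0
alpha = np.array([0.4, -0.2, 0.9])      # illustrative multiplication operator

def gamma(alpha):
    """Principal eigenvalue and eigenfunction of Q = diag(alpha) + B."""
    vals, vecs = np.linalg.eig(np.diag(alpha) + B)
    k = np.argmax(vals.real)            # Perron-Frobenius eigenvalue
    h = vecs[:, k].real
    h = h / h[np.argmax(np.abs(h))]     # fix the sign; entries should be > 0
    return vals[k].real, h

g, h = gamma(alpha)
g0, _ = gamma(np.zeros(3))              # alpha = 0: Q = B, so gamma = 0

vals, vecs = np.linalg.eig(B.T)
w = vecs[:, np.argmin(np.abs(vals))].real
pi = w / w.sum()                        # stationary distribution of B

assert abs(g0) < 1e-10                  # gamma(0) = 0
assert (h > 0).all()                    # strictly positive eigenfunction
assert g >= pi @ alpha - 1e-10          # gamma(alpha) >= integral of alpha d(pi)
```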
If we use \gamma(\alpha(\cdot)) to denote the largest eigenvalue for the Q in (1.22), then we arrive at a single-valued limit operator
Hf(x) = \gamma(F(x, \cdot) \cdot \nabla f(x)).
Define
H(x, p) = \gamma(F(x, \cdot) \cdot p)

and
L(x, q) = \sup_{p \in R^d} \{p \cdot q - H(x, p)\},
for x, p, q \in R^d. There are various probabilistic representations for the principal eigenvalue \gamma. (See Appendix B.) It follows from these representations that H is convex in p, so
Hf(x) = H(x, \nabla f(x)) = \sup_{u \in R^d} \{Af(x, u) - L(x, u)\},
where Af(x, u) = u \cdot \nabla f(x), for f \in C^1_c(R^d). By the results in Chapter 8, (1.17) gives a representation of the rate function.
Another representation of the rate function can be of special interest. Donsker and Varadhan [32] generalize the classical Rayleigh-Ritz formula and obtain the following variational representation of the principal eigenvalue for Q when B is a general operator satisfying the maximum principle:
\gamma(\alpha) = \sup_{\mu \in \mathcal{P}(E_0)} \{\int_{E_0} \alpha \, d\mu - I_B(\mu)\},
where
(1.23) I_B(\mu) = -\inf_{f \in D(B), \inf_y f(y) > 0} \int_{E_0} \frac{Bf}{f} d\mu.
Therefore, setting
Af(x, \mu) = \int_{E_0} F(x, y) \mu(dy) \cdot \nabla f(x),
H satisfies
Hf(x) = \sup_{\mu \in \mathcal{P}(E_0)} \{Af(x, \mu) - I_B(\mu)\}.
By the methods in Chapter 8, the rate function can also be represented as
I(x) = I_0(x(0)) + \inf\{\int_0^\infty I_B(\mu(s)) ds : \dot{x}(t) = \int_{E_0} F(x(t), y) \mu(t, dy), \ \mu(t) \in \mathcal{P}(E_0), \ t \geq 0\}
if x is absolutely continuous, and I(x) = \infty otherwise.

Example 1.9. [Random evolutions - II] Let B be as in (1.19), let Y be the corresponding Markov process, and let Y_n(t) = Y(n^3 t). Let X_n satisfy
\dot{X}_n(t) = n F(X_n(t), Y_n(t)).
Suppose that Y is ergodic with stationary distribution \pi and that
(1.24) \int_{E_0} F(x, y) \pi(dy) = 0.
Then
A_n g(x, y) = n F(x, y) \cdot \nabla_x g(x, y) + n^3 \lambda(y) \int (g(x, z) - g(x, y)) \mu(y, dz),
\mathbf{H}_n f(x, y) = n F(x, y) \cdot \nabla_x f(x, y) + n^3 \lambda(y) \int (e^{f(x, z) - f(x, y)} - 1) \mu(y, dz),

and
H_n f(x, y) = n F(x, y) \cdot \nabla_x f(x, y) + n^2 \lambda(y) \int (e^{n(f(x, z) - f(x, y))} - 1) \mu(y, dz).
Let f_n(x, y) = f(x) + \frac{1}{n^2} h_1(x, y) + \frac{1}{n^3} h_2(x, y), and assume that
Bh_1(x, \cdot)(y) = -F(x, y) \cdot \nabla_x f(x).
If E_0 is finite and B is irreducible, then (1.24) ensures the existence of h_1. Note that h_1(x, y) will be of the form \alpha(x, y) \cdot \nabla_x f(x), where \alpha is a vector function satisfying B\alpha(x, \cdot)(y) = -F(x, y). It follows that
\lim_{n\to\infty} H_n f_n(x, y) = \frac{1}{2} \lambda(y) \int_{E_0} (h_1(x, z) - h_1(x, y))^2 \mu(y, dz) + \lambda(y) \int_{E_0} (h_2(x, z) - h_2(x, y)) \mu(y, dz),
and again we have a multivalued limit. As before, one approach is to select h_2 so that the limit is independent of y. If E_0 is finite, (1.24) ensures that h_2 exists. In any case, if h_2 exists, the limit will be
Hf(x) = \frac{1}{2} \int_{E_0} \int_{E_0} \lambda(y) ((\alpha(x, z) - \alpha(x, y)) \cdot \nabla_x f(x))^2 \mu(y, dz) \pi(dy).
Note that H is of the same form as H in Example 1.4, indicating that this random evolution behaves like the small diffusion process. If Y_n(t) = Y(n^3 t) is replaced by Y_n(t) = Y(n^2 t), then X_n will converge in distribution to a diffusion. (See, for example, Ethier and Kurtz [36], Chapter 12.) The results in Example 1.4 give variational representations of the rate function.

Example 1.10. [Periodic diffusions] Baldi [6] has considered large deviations for models of the following type. Let \sigma be a periodic d \times d-matrix-valued function (for each 1 \leq i \leq d, there is a period p_i > 0 such that \sigma(y) = \sigma(y + p_i e_i) for all y \in R^d), and let X_n satisfy the Itô equation
dX_n(t) = \frac{1}{\sqrt{n}} \sigma(\alpha_n X_n(t)) dW(t),
where \alpha_n > 0 and \lim_{n\to\infty} \alpha_n = \infty. Let a = \sigma\sigma^T. Then
A_n f(x) = \frac{1}{2n} \sum_{ij} a_{ij}(\alpha_n x) \partial_i \partial_j f(x),
\mathbf{H}_n f(x) = \frac{1}{2n} \sum_{ij} a_{ij}(\alpha_n x) \partial_i \partial_j f(x) + \frac{1}{2n} \sum_{ij} a_{ij}(\alpha_n x) \partial_i f(x) \partial_j f(x),
and
H_n f(x) = \frac{1}{2n} \sum_{ij} a_{ij}(\alpha_n x) \partial_i \partial_j f(x) + \frac{1}{2} \sum_{ij} a_{ij}(\alpha_n x) \partial_i f(x) \partial_j f(x).
Let f_n(x) = f(x) + \epsilon_n h(x, \alpha_n x), where \epsilon_n = \frac{1}{n\alpha_n^2}. Noting that, by assumption, \epsilon_n \alpha_n = \frac{1}{n\alpha_n} \to 0, if we select h with the same periods in y as the a_{ij} so that
(1.25) \frac{1}{2} \sum_{ij} a_{ij}(y) (\frac{\partial^2}{\partial y_i \partial y_j} h(x, y) + \partial_i f(x) \partial_j f(x)) = g(x),

for some g independent of y, then \lim H_n f_n(x, y) = g(x). For each x, the desired h(x, \cdot) is the solution of the linear partial differential equation
(1.26) \frac{1}{2} \sum_{ij} a_{ij}(y) \frac{\partial^2}{\partial y_i \partial y_j} h(x, y) = g(x) - \frac{1}{2} \sum_{ij} a_{ij}(y) \partial_i f(x) \partial_j f(x)
on [0, p_1] \times \cdots \times [0, p_d] with periodic boundary conditions, extended periodically to all of R^d. For h to exist, we must have
(1.27) g(x) = \frac{1}{2} \sum_{ij} \overline{a}_{ij} \partial_i f(x) \partial_j f(x),
where \overline{a}_{ij} is the average of a_{ij} with respect to the stationary distribution for the diffusion on [0, p_1] \times \cdots \times [0, p_d] whose generator is
A_0 f(y) = \frac{1}{2} \sum_{i,j} a_{ij}(y) \frac{\partial^2}{\partial y_i \partial y_j} f(y),
with periodic boundary conditions. To see that (1.27) must hold, simply integrate both sides of (1.26) against the stationary distribution. In particular,
h(x, y) = \frac{1}{2} \sum_{ij} h_{ij}(y) \partial_i f(x) \partial_j f(x),
where h_{ij} satisfies
A_0 h_{ij}(y) = \overline{a}_{ij} - a_{ij}(y).
If A_0 is uniformly elliptic and f is C^3, then, with sufficient regularity on the a_{ij}, h \in C^2 will exist. By (1.27), the limit Hf is a special case of Hf in Example 1.4. Consequently, we can identify the rate function as in that example.

Example 1.11. [Donsker-Varadhan theory for occupation measures] Large deviation theory for occupation measures (see Donsker and Varadhan [33]) is closely related to Example 1.8. Let Y be a Markov process with generator B and state space E_0, and for n = 1, 2, \ldots, define
\Gamma_n(C, t) = \frac{1}{n} \int_0^{nt} I_C(Y(s)) ds.
Then Z_n defined by Z_n(t) = (Y(nt), \Gamma_n(\cdot, t)) is a Markov process with state space E = E_0 \times \mathcal{M}_F(E_0), where \mathcal{M}_F(E_0) is the space of finite measures on E_0. Let h \in C_b(E_0 \times R^m) be differentiable in the real variables and satisfy h(\cdot, x) \in D(B) for x \in R^m. For \alpha_i \in C_b(E_0), i = 1, \ldots, m, let
g(y, z) = h(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle), \quad (y, z) \in E_0 \times \mathcal{M}_F(E_0),
where \langle \alpha, z \rangle = \int_{E_0} \alpha \, dz. The generator for Z_n satisfies
A_n g(y, z) = n Bh(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle) + \sum_{i=1}^m \alpha_i(y) \partial_i h(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle),

where \partial_i h(y, x_1, \ldots, x_m) = \frac{\partial}{\partial x_i} h(y, x_1, \ldots, x_m). For definiteness, let B be a diffusion operator
Bg(y) = \frac{1}{2} \sum_{i,j} a_{ij}(y) \frac{\partial^2}{\partial y_i \partial y_j} g(y) + \sum_i b_i(y) \frac{\partial}{\partial y_i} g(y).
Then for f(y, z) = h(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle),
\mathbf{H}_n f(y, z) = A_n f(y, z) + \frac{n}{2} \sum_{i,j} a_{ij}(y) \frac{\partial}{\partial y_i} f(y, z) \frac{\partial}{\partial y_j} f(y, z)
and
H_n f(y, z) = A_n f(y, z) + \frac{n^2}{2} \sum_{i,j} a_{ij}(y) \frac{\partial}{\partial y_i} f(y, z) \frac{\partial}{\partial y_j} f(y, z).
If we let f(z) = h(\langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle) and
f_n(y, z) = h(\langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle) + \frac{1}{n} h_1(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle),
then
(1.28) \lim_{n\to\infty} H_n f_n(y, z) = \sum_{i=1}^m \alpha_i(y) \partial_i h(\langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle) + Bh_1(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle) + \frac{1}{2} \sum_{i,j} a_{ij}(y) \frac{\partial}{\partial y_i} h_1(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle) \frac{\partial}{\partial y_j} h_1(y, \langle \alpha_1, z \rangle, \ldots, \langle \alpha_m, z \rangle).
As in Example 1.8, we have a multivalued limit, and the rate function representation is also similar to the representation in Example 1.8.

Example 1.12. [Stochastic reaction-diffusion equations] We consider a stochastic reaction-diffusion equation on a rescaled lattice, which is a Ginzburg-Landau model of non-conservative type. Let F \in C^2(R), \sup_r F''(r) < \infty, and O = [0, 1)^d with periodic boundary. We discretize O into
\Lambda_m \equiv \{0, \frac{1}{m}, \ldots, \frac{k}{m}, \ldots, \frac{m-1}{m}\}^d \subset O,
also with periodic boundary conditions (that is, 1 \equiv 0). Let m = m(n) depend on n so that \lim_{n\to\infty} m(n) = \infty. When there is no possibility of confusion, we simply write m in place of m(n). For \rho \in R^{\Lambda_m}, define
(1.29) \nabla^k_m \rho(x) = \frac{m}{2}(\rho(x + \frac{1}{m} e_k) - \rho(x - \frac{1}{m} e_k)), \quad \nabla_m \rho(x) = (\nabla^1_m \rho(x), \ldots, \nabla^d_m \rho(x)),
\Delta_m \rho(x) = \nabla_m \cdot \nabla_m \rho(x) = \sum_{k=1}^d (\frac{m}{2})^2 (\rho(x + \frac{2}{m} e_k) - 2\rho(x) + \rho(x - \frac{2}{m} e_k)),
and for a vector-valued function \xi(x) = (\xi_1(x), \ldots, \xi_d(x)),

define
\mathrm{div}_m \xi(x) = \nabla_m \cdot \xi(x) = \sum_{k=1}^d \nabla^k_m \xi_k(x).
We consider a finite system of stochastic differential equations
(1.30) dY_n(t, x) = (\Delta_m Y_n)(t, x) dt - F'(Y_n(t, x)) dt + \frac{m^{d/2}}{\sqrt{n}} dB(t, x),
where x \in \Lambda_m and \{B(t, x) : x \in \Lambda_m\} are independent standard Brownian motions. Let E_n = R^{\Lambda_m}. For p \in C(O) and \rho \in E_n, define
\langle \rho, p \rangle_m \equiv \sum_{x \in \Lambda_m} p(x)\rho(x) m^{-d},
and
\langle \rho, p \rangle \equiv \int_O p(x)\rho(x) dx, \quad \rho \in L^2(O).
For \gamma_1, \ldots, \gamma_l \in C^\infty(O) and \varphi \in C^\infty_c(R^l), let \gamma = (\gamma_1, \ldots, \gamma_l) and
(1.31) f_n(\rho) = \varphi(\langle \rho, \gamma_1 \rangle_m, \ldots, \langle \rho, \gamma_l \rangle_m) = \varphi(\langle \rho, \gamma \rangle_m), \quad f(\rho) = \varphi(\langle \rho, \gamma \rangle),
and
A_n f_n(\rho) = \sum_{i=1}^l \partial_i \varphi(\langle \rho, \gamma \rangle_m) \sum_{x \in \Lambda_m} \gamma_i(x)(\Delta_m \rho(x) - F'(\rho(x))) m^{-d} + \frac{1}{2n} \sum_{i,j=1}^l \partial^2_{ij} \varphi(\langle \rho, \gamma \rangle_m) \sum_{x \in \Lambda_m} \gamma_i(x)\gamma_j(x) m^{-d}.
Then, by Itô's formula, Y_n is an E_n-valued solution to the martingale problem for A_n, and
H_n f_n(\rho) \equiv \frac{1}{n} e^{-nf_n} A_n e^{nf_n}(\rho) = \sum_{i=1}^l \partial_i \varphi(\langle \rho, \gamma \rangle_m) \sum_{x \in \Lambda_m} \gamma_i(x)(\Delta_m \rho(x) - F'(\rho(x))) m^{-d} + \frac{1}{2} \sum_{i,j=1}^l \partial_i \varphi(\langle \rho, \gamma \rangle_m) \partial_j \varphi(\langle \rho, \gamma \rangle_m) \sum_{x \in \Lambda_m} \gamma_i(x)\gamma_j(x) m^{-d} + \frac{1}{2n} \sum_{i,j=1}^l \partial^2_{ij} \varphi(\langle \rho, \gamma \rangle_m) \sum_{x \in \Lambda_m} \gamma_i(x)\gamma_j(x) m^{-d}.
Define \eta_n : E_n \to E = L^2(O) by
(1.32) \eta_n(\rho_n)(x_1, \ldots, x_d) \equiv \sum_{i_1, \ldots, i_d = 0}^{m-1} \rho_n(\frac{i_1}{m}, \ldots, \frac{i_d}{m}) \prod_{j=1}^d \mathbf{1}_{[i_j/m, (i_j+1)/m)}(x_j).
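The lattice operators in (1.29) are straightforward to implement. Below is a sketch in d = 2 (the grid size and test fields are illustrative), with numpy.roll supplying the periodic boundary; it verifies that \Delta_m = \nabla_m \cdot \nabla_m reduces to the explicit (m/2)^2 stencil of spacing 2/m, and that \mathrm{div}_m of any vector field sums to zero over \Lambda_m, which is why a divergence-form equation conserves \sum_x \rho(x).

```python
import numpy as np

m = 8                                    # illustrative grid size
def dk(f, k):
    # grad_m^k f(x) = (m/2)(f(x + e_k/m) - f(x - e_k/m)), periodic
    return (m / 2.0) * (np.roll(f, -1, axis=k) - np.roll(f, 1, axis=k))

def grad_m(rho):  return np.stack([dk(rho, k) for k in range(rho.ndim)])
def div_m(xi):    return sum(dk(xi[k], k) for k in range(xi.shape[0]))
def lap_m(rho):   return div_m(grad_m(rho))     # Delta_m = grad_m . grad_m

rng = np.random.default_rng(0)
rho = rng.standard_normal((m, m))

# Delta_m agrees with the explicit (m/2)^2 stencil of spacing 2/m in (1.29):
stencil = sum((m / 2.0)**2 * (np.roll(rho, -2, axis=k) - 2*rho + np.roll(rho, 2, axis=k))
              for k in range(2))
assert np.allclose(lap_m(rho), stencil)

# div_m of any vector field sums to zero over the periodic lattice, so a
# divergence-form (conservative) equation preserves the total mass:
xi = rng.standard_normal((2, m, m))
assert abs(div_m(xi).sum()) < 1e-9
```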

For \rho_n \in E_n and \rho \in E satisfying \|\eta_n(\rho_n) - \rho\|_{L^2(O)} \to 0, we have
(1.33) H_n f_n(\rho_n) \to Hf(\rho) \equiv \sum_{i=1}^l \partial_i \varphi(\langle \rho, \gamma \rangle)(\langle \rho, \Delta \gamma_i \rangle - \langle F'(\rho), \gamma_i \rangle) + \frac{1}{2} \sum_{i,j=1}^l \partial_i \varphi(\langle \rho, \gamma \rangle) \partial_j \varphi(\langle \rho, \gamma \rangle) \int_O \gamma_i(x)\gamma_j(x) dx
= \langle \rho, \Delta \frac{\delta f}{\delta \rho} \rangle - \langle F'(\rho), \frac{\delta f}{\delta \rho} \rangle + \frac{1}{2} \|\frac{\delta f}{\delta \rho}\|^2_{L^2(O)} = H(\rho, \frac{\delta f}{\delta \rho}),
where
(1.34) H(\rho, p) = \langle \Delta\rho - F'(\rho), p \rangle + \frac{1}{2} \|p\|^2_{L^2(O)}, \quad p \in C^\infty(O),
and
\frac{\delta f}{\delta \rho} = \sum_{i=1}^l \partial_i \varphi(\langle \rho, \gamma \rangle) \gamma_i.
By analogy with Example 1.4, we expect \{\eta_n(Y_n)\} to satisfy a large deviation principle in L^2(O). Let
L(\rho, u) = \sup_{p \in C^\infty(O)} \{\langle u, p \rangle - H(\rho, p)\}.
The rate function is analogous to that of (1.17) and should be given by
I(\rho) = I_0(\rho(0)) + \int_0^\infty L(\rho(s), \dot{\rho}(s)) ds.
A rigorous proof of this statement is given in Chapter 13, Theorem 13.7.
The equation for Y_n is a discretized version of the stochastic partial differential equation given in weak form by
(1.35) \int_O \varphi(x) X_n(t, x) dx = \int_O \varphi(x) X_n(0, x) dx + \int_{O \times [0,t]} \Delta\varphi(x) X_n(s, x) dx\, ds - \int_{O \times [0,t]} \varphi(x) F'(X_n(s, x)) dx\, ds + \frac{1}{\sqrt{n}} \int_{O \times [0,t]} \varphi(x) W(dx \times ds),
where W(dx \times ds) is a space-time Gaussian white noise. Unfortunately, (1.35) does not have a function-valued solution when d \geq 2. To study the large deviation problem in L^2(O), as described above, even for the discretized equation (1.30), requires that m go to infinity slowly enough with n. (See Theorem 13.7.)
Equation (1.35) is also known as a stochastically perturbed Allen-Cahn equation. It has been used in material science to study the time evolution of material density when the total mass is not conserved. More generally, in statistical physics, when F is non-convex, the equation is used to model phase transition (Spohn [3]). When d = 1, (1.35) is well-defined and Sowers [2] gives the large deviation principle.
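A hedged Euler-Maruyama sketch of the lattice equation (1.30) in d = 1 can illustrate the small-noise scaling m^{d/2}/\sqrt{n}: for large n the trajectory tracks the deterministic reaction-diffusion dynamics. The time step, grid size, and the choice F(r) = \cos(r) (which has bounded F'') are our illustrative assumptions, not from the text.

```python
import numpy as np

m, dt, T = 8, 0.005, 1.0
def Fp(r): return -np.sin(r)                 # F'(r) for F(r) = cos(r)
def lap_m(rho):                              # Delta_m from (1.29), periodic, d = 1
    return (m / 2.0)**2 * (np.roll(rho, -2) - 2*rho + np.roll(rho, 2))

def run(n, seed=1):
    """Euler-Maruyama for dY = (lap_m Y - F'(Y))dt + (m^{1/2}/sqrt(n)) dB."""
    rng = np.random.default_rng(seed)
    y = np.cos(2*np.pi*np.arange(m)/m)       # illustrative initial profile
    sig = 0.0 if n is None else np.sqrt(m)/np.sqrt(n)   # m^{d/2}/sqrt(n), d = 1
    for _ in range(int(T/dt)):
        y = y + (lap_m(y) - Fp(y))*dt + sig*np.sqrt(dt)*rng.standard_normal(m)
    return y

y_det = run(None)                            # zero-noise (n = infinity) dynamics
y_big = run(10**6)                           # large n: very small noise
y_small = run(10**2)                         # moderate n: visible fluctuations

assert np.max(np.abs(y_big - y_det)) < 0.05
assert np.max(np.abs(y_big - y_det)) < np.max(np.abs(y_small - y_det))
```

The two assertions express the large deviation heuristic qualitatively: deviations from the averaged dynamics shrink as n grows.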

Example 1.13. [Stochastic Cahn-Hilliard equations] Spohn [3] introduces a type of Ginzburg-Landau model with a conserved quantity motivated by the study of phase transitions. Let O and F be as in Example 1.12. Formally, the equation can be written
(1.36) \partial_t X_n(t, x) = \Delta(-\Delta X_n(t, x) + F'(X_n(t, x))) + \frac{1}{\sqrt{n}} \partial_t \nabla_x \cdot \mathcal{W}(t, x),
or in a more easily interpretable, but still formal, weak form
(1.37) \int_O \varphi(x) X_n(t, x) dx = \int_O \varphi(x) X_n(0, x) dx - \int_{O \times [0,t]} \Delta^2 \varphi(x) X_n(s, x) dx\, ds + \int_{O \times [0,t]} \Delta\varphi(x) F'(X_n(s, x)) dx\, ds + \frac{1}{\sqrt{n}} \sum_{k=1}^d \int_{O \times [0,t]} \partial_{x_k} \varphi(x) W_k(dx \times ds),
where the W_k are independent Gaussian white noises and \mathcal{W} = (W_1, \ldots, W_d). Without the stochastic term, (1.36) is known as the Cahn-Hilliard equation and originally arose in material science, where the solution gives the evolution of a material density. In these applications, (1.36) is derived as a phenomenological model using rough and heuristic arguments. Frequently, only asymptotic properties are of practical interest. Indeed, as in the previous example, (1.36) does not have a function-valued solution. Therefore, for the large deviation problem here, we consider a discretized version. Let \nabla_m and \Delta_m be defined according to (1.29), and consider the system
(1.38) dY_n(t, x) = \mathrm{div}_m(\nabla_m(-\Delta_m Y_n(t, x) + F'(Y_n(t, x))) dt + \frac{m^{d/2}}{\sqrt{n}} dB(t, x)), \quad x \in \Lambda_m,
where B(t, x) = (B_1(t, x), \ldots, B_d(t, x)), x \in \Lambda_m, with the B(\cdot, x) independent, R^d-valued standard Brownian motions. Note that the solution satisfies
\sum_{x \in \Lambda_m} Y_n(t, x) = \sum_{x \in \Lambda_m} Y_n(0, x), \quad t \geq 0,
so we may as well choose the state spaces
(1.39) E_n \equiv \{\rho \in R^{\Lambda_m} : \sum_{x \in \Lambda_m} \rho(x) = 0\}, \quad E \equiv \{\rho \in L^2(O) : \int_O \rho(x) dx = 0\}.
The map \eta_n in (1.32) embeds E_n into E. For functions of the form (1.31), by Itô's formula,
(1.40) A_n f(\rho) = \sum_{i=1}^l \partial_i \varphi(\langle \rho, \gamma \rangle_m) \langle \gamma_i, \mathrm{div}_m \nabla_m(-\Delta_m \rho + F'(\rho)) \rangle_m + \frac{1}{2n} \sum_{i,j=1}^l \partial^2_{ij} \varphi(\langle \rho, \gamma \rangle_m) \sum_{z \in \Lambda_m} \nabla_m \gamma_i(z) \cdot \nabla_m \gamma_j(z) m^{-d}.