Optimal Stopping of Partially Observable Markov Processes: A Filtering-Based Duality Approach

Fan Ye and Enlu Zhou, Member, IEEE

Abstract: In this note we develop a numerical approach to the problem of optimal stopping of discrete-time continuous-state partially observable Markov processes (POMPs). Our motivation is to find approximate solutions that provide lower and upper bounds on the value function such that the gap between the bounds can provide a practical measure of the quality of the solutions. To this end, we develop a filtering-based duality approach, which relies on the martingale duality formulation of the optimal stopping problem and the particle filtering technique. We show that this approach complements an asymptotic lower bound derived from a suboptimal stopping time with an asymptotic upper bound on the value function. We carry out error analysis and illustrate the effectiveness of our method on an example of pricing American options under partial observation of stochastic volatility.

Index Terms: Partially observable, optimal stopping, particle filtering, martingale duality, American option pricing, stochastic volatility.

I. INTRODUCTION

Optimal stopping of a partially observable Markov process (POMP) is a sequential decision making problem under partial observation of the underlying state. This type of problem arises in a number of applications, including change point detection in a production line, launching of a new technology under incomplete information of the market, and selling of an asset or a financial derivative. Optimal stopping of a POMP is more challenging than its counterpart for a fully observable process, since the inference of the hidden state and the choice of an optimal action should be accomplished at the same time. As a special class of the partially observable Markov decision processes (POMDPs), optimal stopping of a POMP can be transformed to a fully observable optimal stopping problem by introducing a new state variable, often referred to as the filtering distribution.
However, this concise representation does not reduce the complexity of the problem, because the filtering distribution is usually infinite dimensional when the unobserved state takes values in a continuous space. In addition, the problem also suffers from the so-called curse of dimensionality of dynamic programming that is common in solving continuous-state Markov decision processes.

Numerical solutions to optimal stopping of POMPs have been studied by [4], [8], [10], [9], mostly in the setting of pricing American options under partial observation of stochastic volatility. These methods can be viewed as a combination of dimension reduction on the filtering distribution and approximate dynamic programming, whereas [14] avoids the filtering step to approximate the value function. Some of the aforementioned approaches are proven to converge asymptotically to the true value function. However, in practice with a finite amount of computation resource, the difference between their approximate solutions and the true value function is usually unknown and hard to quantify.

In view of the lack of performance guarantee and computational complexity of the aforementioned methods, in this note we focus on developing a lower-and-upper-bound approach with moderate computational cost. The motivation is that the gap between the lower and upper bounds gives an indication of the quality of the approximate solutions. To guarantee a high-quality approximate solution, we can increase the computation effort until the gap between the two bounds decreases to a desirable tolerance level. To this end, we propose a filtering-based duality approach that complements a suboptimal stopping time (hence an asymptotic lower bound) with an asymptotic upper bound on the value function.

F. Ye and E. Zhou are with the Department of Industrial & Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801 USA (e-mail: {fanye2, enluzhou}@illinois.edu). This work was supported by the National Science Foundation under Grants ECCS-0901543 and CMMI-1130273, and by the Air Force Office of Scientific Research under YIP Grant FA-9550-12-1-0250.
Since our approach does not tie to a particular model and only involves Monte Carlo simulation, it can be generalized to any POMP as long as the particle filtering technique can be applied. Our method relies on the martingale duality formulation of the fully observable optimal stopping problem, which is proposed by [11] and [5] in the setting of pricing American options under constant volatility.

From the perspective of modeling fidelity versus computational complexity, it is not trivial to compare optimal stopping of POMPs with its counterpart in fully observable Markov processes. In particular, the difference of their value functions cannot be quantified in general and is problem dependent, so we are also interested in learning the features that influence this difference in the underlying probabilistic model. Indeed, as an example, our numerical experiments on pricing American options under partially observable stochastic volatility show that our asymptotic upper bound is strictly less than the option price of the model where the volatility is treated as directly observable, and the difference is especially obvious when the effect of the volatility is dominant. This in turn shows that our method provides a better criterion to evaluate the performance of a suboptimal policy in the partially observable model.

The rest of the note is organized as follows. In Section II, we describe the general problem formulation of optimal stopping of POMPs and the transformation to an equivalent fully observable optimal stopping problem. In Section III, we develop the filtering-based duality approach, and its error analysis and convergence result are presented in Section IV. We present some numerical examples in Section V, and finally conclude in Section VI. All the proofs are contained in the Appendix.

II. PROBLEM FORMULATION

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Consider a hidden Markov model $\{(X_t, Y_t),\ t = 0, 1, \ldots, T\}$ satisfying the following equations

\[ X_{t+1} = f(X_t, Z^1_{t+1}), \quad t = 0, 1, \ldots, T-1; \tag{1a} \]
\[ Y_0 = h_0(X_0, Z^2_0); \tag{1b} \]
\[ Y_{t+1} = h(X_{t+1}, Y_t, Z^2_{t+1}), \quad t = 0, 1, \ldots, T-1; \tag{1c} \]

where the unobserved state $X_t$ is in a continuous state space $\mathcal{X} \subseteq \mathbb{R}^{n_x}$, and the observation $Y_t$ is in a continuous observation space $\mathcal{Y} \subseteq \mathbb{R}^{n_y}$.
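The noises $\{(Z^1_t, Z^2_t),\ t = 1, \ldots, T\}$, which are independent of the initial state $X_0$ and the initial observation $Y_0$, are independent random vectors with known distributions, but the components of each vector can be correlated. Equations (1a) and (1b)-(1c) are often referred to as the state equation and the observation equation respectively. Note that $\{(X_t, Y_t)\}$ is a bivariate Markov process adapted to the filtration $\mathcal{F}_t \triangleq \sigma\{(X_i, Y_i);\ i = 0, \ldots, t\}$.

Let $\mathcal{J} \triangleq \{1, \ldots, T\}$. Denote by $\mathcal{F}^Y_t \triangleq \sigma\{Y_0, \ldots, Y_t\}$ the filtration generated by the processes (1b)-(1c). A random variable $\tau : \Omega \to \mathcal{J}$ is an $\mathcal{F}^Y$-stopping time if $\{\tau \le t\} \in \mathcal{F}^Y_t$ for every $t \in \mathcal{J}$. We define $\mathcal{T}^Y$ as the set of $\mathcal{F}^Y$-stopping times that take values in $\mathcal{J}$. Assume that the initial $Y_0$ is a known constant, and the initial $X_0$ follows a known distribution $\pi_0$, which is derived from the historical data (including $Y_0$). We consider the finite-horizon partially observable optimal stopping problem

\[ V_0(\pi_0, y_0) = \sup_{\tau \in \mathcal{T}^Y} E[g(\tau, X_\tau, Y_\tau) \mid X_0 \sim \pi_0, Y_0 = y_0], \tag{2} \]

where $g : \mathcal{J} \times \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is the reward function. In this setting the decision maker has access only to the state $Y$, so that her decision at time $t$ depends purely on the observation history up to time $t$, i.e., $Y_0, \ldots, Y_t$. For convenience, in the following we use $g(X_t, Y_t)$ and $g(X_\tau, Y_\tau)$ as shorthand for $g(t, X_t, Y_t)$ and $g(\tau, X_\tau, Y_\tau)$ respectively.

The optimal stopping problem of a POMP can be transformed to an equivalent fully observable optimal stopping problem by introducing a new state variable $\Pi_t$, often referred to as the filtering distribution, which is the conditional distribution of $X_t$ given the observations $Y_{0:t} \triangleq \{Y_0, \ldots, Y_t\}$. More specifically, given a set $A$ in the Borel $\sigma$-algebra over $\mathcal{X}$, define

\[ \Pi_t(A) \triangleq \mathrm{Prob}(X_t \in A \mid Y_0, \ldots, Y_t), \quad t = 0, \ldots, T. \]

Given a realization of the observations $y_{0:t} \triangleq \{y_0, \ldots, y_t\}$, the probability density $\pi_t$ of the filtering distribution $\Pi_t$ evolves as follows:

\[ \pi_t(x_t) = \frac{\int p(x_t, y_t \mid x_{t-1}, y_{t-1})\, \pi_{t-1}(x_{t-1})\, dx_{t-1}}{\int p(y_t \mid x_{t-1}, y_{t-1})\, \pi_{t-1}(x_{t-1})\, dx_{t-1}}, \quad t = 1, \ldots, T, \tag{3} \]

where the conditional probability density functions $p(x_t, y_t \mid x_{t-1}, y_{t-1})$ and $p(y_t \mid x_{t-1}, y_{t-1})$ are induced by (1a), (1c), and the distributions of $Z^1_t$ and $Z^2_t$. Noticing that $\pi_t$ only depends on $\pi_{t-1}$, $y_{t-1}$, and $y_t$, and letting the realization $y_{0:t}$ be replaced by the random variables $Y_{0:t}$, we can abstractly rewrite the filtering recursion (3) as

\[ \Pi_t = \Phi(\Pi_{t-1}, Y_{t-1}, Y_t), \quad t = 1, 2, \ldots, T. \]

Then problem (2) can be transformed to an equivalent optimal stopping problem (see, e.g., Chapter 5 in [3]) with fully observable state $(\Pi_t, Y_t)$:

\[ V_0(\pi_0, y_0) = \sup_{\tau \in \mathcal{T}^Y} E[\tilde{g}(\Pi_\tau, Y_\tau) \mid X_0 \sim \pi_0, Y_0 = y_0], \]

where

\[ \tilde{g}(\Pi_t, Y_t) \triangleq E[g(X_t, Y_t) \mid \mathcal{F}^Y_t] = \int g(x_t, Y_t)\, \Pi_t(x_t)\, dx_t. \]

Theoretically, we can solve (2) following the dynamic programming recursion:

\[ V_t(\Pi_t, Y_t) = \max\big(\tilde{g}(\Pi_t, Y_t),\ C_t(\Pi_t, Y_t)\big), \quad t = T, \ldots, 1, \tag{4} \]

where $C_t(\Pi_t, Y_t)$ is the continuation value at time $t$ defined as

\[ C_T(\Pi_T, Y_T) \triangleq \tilde{g}(\Pi_T, Y_T); \qquad C_t(\Pi_t, Y_t) \triangleq E[V_{t+1}(\Pi_{t+1}, Y_{t+1}) \mid \Pi_t, Y_t], \quad t = T-1, \ldots, 0. \]

Here $E[\,\cdot \mid \Pi_t, Y_t]$ denotes the expectation conditional on the current filtering distribution and observation; at $t = 0$ it is $E[\,\cdot \mid X_0 \sim \pi_0, Y_0 = y_0]$. Then $V_0 = C_0$ and the optimal stopping time is

\[ \tau^* = \min\{t \in \mathcal{J} : \tilde{g}(\Pi_t, Y_t) \ge C_t(\Pi_t, Y_t)\}. \]

We also define its associated $t$-indexed stopping time $\tau^*_t$ for each $t \in \mathcal{J}$:

\[ \tau^*_t \triangleq \min\{i \in \mathcal{J}_t : \tilde{g}(\Pi_i, Y_i) \ge C_i(\Pi_i, Y_i)\} \tag{5} \]

with $\mathcal{J}_t \triangleq \{t, t+1, \ldots, T\}$. The above recursion also shows that $(\Pi_t, Y_t)$ are the sufficient statistics that determine the optimal stopping time.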
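To make the filtering recursion (3) concrete, the following minimal sketch applies one step of it on a finite hidden-state space, where the integrals become sums. It is an illustration only and not part of the note's method: the function name `filter_step`, the two-state numbers, and the factorization $p(x_t, y_t \mid x_{t-1}, y_{t-1}) = p(x_t \mid x_{t-1})\, p(y_t \mid x_t, y_{t-1})$ are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def filter_step(
    prior: Sequence[float],
    transition: Sequence[Sequence[float]],
    likelihood: Callable[[int], float],
) -> List[float]:
    """One step of the filtering recursion on a finite hidden-state space.

    prior[j]         : pi_{t-1}(j)
    transition[j][i] : P(X_t = i | X_{t-1} = j)
    likelihood(i)    : p(y_t | X_t = i, y_{t-1}) for the observed y_t
    (all names are hypothetical; the factorized joint density is an assumption)
    """
    n = len(prior)
    # Numerator: sum over x_{t-1} of p(x_t, y_t | x_{t-1}, y_{t-1}) * pi_{t-1}(x_{t-1})
    joint = [
        likelihood(i) * sum(prior[j] * transition[j][i] for j in range(n))
        for i in range(n)
    ]
    # Denominator: the same quantity summed over x_t as well
    evidence = sum(joint)
    if evidence <= 0.0:
        raise ValueError("observation has zero likelihood under the model")
    return [q / evidence for q in joint]


# Toy two-state example (numbers chosen only for illustration).
prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
lik = [0.7, 0.1]  # likelihood of the observed y_t in each hidden state
posterior = filter_step(prior, transition, lambda i: lik[i])
```

In the continuous-state setting of this note the sums are integrals that generally cannot be evaluated in closed form, which is why a particle approximation is used instead.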
The process $V_t \triangleq V_t(\Pi_t, Y_t)$ defined in (4) is called the Snell envelope process (see, e.g., Chapter 2 in [6]) of the process $\{\tilde{g}(\Pi_t, Y_t)\}$, which is the smallest $\mathcal{F}^Y$-supermartingale that dominates $\tilde{g}$ in the sense that $V_t(\Pi_t, Y_t) \ge \tilde{g}(\Pi_t, Y_t)$. In particular, by shifting the time index in (2) we can interpret $V_t$ as

\[ V_t(\pi_t, y_t) = \sup_{\tau \in \mathcal{T}^Y,\ \tau \ge t} E[g(X_\tau, Y_\tau) \mid X_t \sim \pi_t, Y_t = y_t] = E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid X_t \sim \pi_t, Y_t = y_t], \quad t = 1, \ldots, T. \tag{6} \]

However, it is often impossible to solve the problem exactly following (4) due to two main difficulties. One is that in general the filtering distribution $\Pi_t$ is infinite dimensional and the filtering recursion (3) cannot be computed exactly. The other difficulty lies in the accurate estimation of the continuation value $C_t(\Pi_t, Y_t)$ that leads to the optimal stopping time $\tau^*$. So we develop an approximation method in the next section.

III. FILTERING-BASED MARTINGALE DUALITY APPROACH

In this section, we construct a dual problem to the original optimal stopping of POMPs, and develop a numerical method that yields an asymptotic upper bound on the value function. Our dual formulation is a straightforward extension of the dual formulation for the optimal stopping problem proposed in [11], [5], and [1], by replacing the filtration with $\mathcal{F}^Y$.

Theorem 1 (cf. (5) in [1]). Let $\mathcal{M}$ represent the space of $\mathcal{F}^Y$-adapted martingales $\{M_t\}$ with $M_0 = 0$ and $\sup_{t \in \mathcal{J}} E|M_t| < \infty$. Then

\[ V_0(\pi_0, y_0) = \min_{M \in \mathcal{M}} E\Big[\max_{t \in \mathcal{J}} \big(\tilde{g}(\Pi_t, Y_t) - M_t\big) \,\Big|\, X_0 \sim \pi_0, Y_0 = y_0\Big]. \tag{7} \]

The optimal martingale $M^*$ that achieves the minimum on the right hand side of (7) is of the form

\[ M^*_t = \sum_{i=1}^{t} \Delta^*_i, \tag{8} \]

where $\{\Delta^*_t\}$ is the martingale difference sequence defined as

\[ \Delta^*_t \triangleq E[V_t \mid \mathcal{F}^Y_t] - E[V_t \mid \mathcal{F}^Y_{t-1}], \quad t \in \mathcal{J}. \tag{9} \]

In addition, the following equality holds pathwise in the almost sure sense, i.e.,

\[ V_0(\pi_0, y_0) = \max_{t \in \mathcal{J}} \big(\tilde{g}(\Pi_t, Y_t) - M^*_t\big) \quad \text{a.s.} \]

The proof of Theorem 1 follows the same line as in [1] and hence is omitted here. Theorem 1 characterizes a strong duality relation between the primal problem (2) and its dual problem on the right side of (7); this duality suggests that any $\mathcal{F}^Y$-adapted martingale $M$ can lead to an upper bound on $V_0(\pi_0, y_0)$ and that the optimal martingale (8) is derived from the Doob-Meyer decomposition of the supermartingale $\{V_t\}$.
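The pathwise equality in Theorem 1 can be checked numerically on a small, fully observable toy problem. The sketch below is such a check under stated assumptions and is not the note's algorithm: it uses a symmetric random walk with steps of plus or minus one, a discounted put-style reward, and the hypothetical names `reward`, `dual_value`, `T`, `K`, `X0`, and `DISCOUNT`. It computes the Snell envelope by backward induction as in (4), builds the optimal martingale from (8)-(9), and evaluates $\max_t (g_t - M^*_t)$ on every path.

```python
from itertools import product
from typing import Dict, Tuple

T = 4            # exercise dates are t = 1, ..., T (toy horizon, an assumption)
K = 10.0         # strike of the toy reward
X0 = 10          # known initial state
DISCOUNT = 0.9   # per-period discount, so that early stopping can be optimal


def reward(t: int, x: int) -> float:
    """Toy discounted put-style reward g(t, x); not the payoff used in the note."""
    return DISCOUNT ** t * max(K - x, 0.0)


# Backward induction: V_t = max(g_t, C_t) for t >= 1, with C_T = g_T and V_0 = C_0.
V: Dict[Tuple[int, int], float] = {}
C: Dict[Tuple[int, int], float] = {}
for t in range(T, -1, -1):
    for x in range(X0 - t, X0 + t + 1):
        if t == T:
            C[t, x] = reward(t, x)
        else:
            C[t, x] = 0.5 * V[t + 1, x + 1] + 0.5 * V[t + 1, x - 1]
        # t = 0 is not an exercise date, so the value there is the continuation value.
        V[t, x] = C[t, x] if t == 0 else max(reward(t, x), C[t, x])


def dual_value(steps: Tuple[int, ...]) -> float:
    """Evaluate max_t (g_t - M*_t) along one path of the random walk."""
    x, m, best = X0, 0.0, float("-inf")
    for t, step in enumerate(steps, start=1):
        prev = x
        x += step
        # Delta*_t = V_t - E[V_t | F_{t-1}] = V_t(X_t) - C_{t-1}(X_{t-1})
        m += V[t, x] - C[t - 1, prev]
        best = max(best, reward(t, x) - m)
    return best


# Enumerate all 2^T paths; each one should reproduce V_0 up to rounding.
path_values = [dual_value(s) for s in product((-1, 1), repeat=T)]
```

Replacing the optimal martingale by any other martingale (for instance the zero martingale) only yields an upper bound in expectation, which is the property the filtering-based approach relies on.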
In particular, we can rewrite (9) as

\[ \Delta^*_t = E[V_t \mid \Pi_t, Y_t] - E[V_t \mid \Pi_{t-1}, Y_{t-1}] \tag{10a} \]
\[ \phantom{\Delta^*_t} = E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \Pi_t, Y_t] - E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \Pi_{t-1}, Y_{t-1}]. \tag{10b} \]

Note that it is impossible to compute the optimal martingale $M^*$, since the martingale difference term (10a) (or (10b)) involves the intractable filtering distribution $\Pi_t$ and the Snell envelope process $V_t$ (or the optimal stopping time $\tau^*$). Therefore, we need to introduce approximation schemes to address both aspects. On the one hand, the intractable filtering distribution $\Pi_t$ can be approximated by a discrete distribution using particle filtering, which will be stated in Section III-A. On the other hand, (10a) and (10b) suggest that we approximate $\Delta^*_t$ using either approximate value functions of $V_t$ or suboptimal $\mathcal{F}^Y$-stopping times that approximate $\tau^*$.

In addition, some other heuristic constructions can be considered. For example, we can take

\[ \Delta_t = E[U_t(X_t, Y_t) \mid \mathcal{F}^Y_t] - E[U_t(X_t, Y_t) \mid \mathcal{F}^Y_{t-1}], \]

where $U_t(X_t, Y_t)$ is the value function of the corresponding optimal stopping problem with fully observable state $(X_t, Y_t)$:

\[ U_t(x_t, y_t) = \sup_{\kappa \in \mathcal{T}_t} E[g(X_\kappa, Y_\kappa) \mid X_t = x_t, Y_t = y_t], \tag{11} \]

where $\mathcal{T}_t$ is the set of $\mathcal{F}$-stopping times $\kappa$ that take values in $\mathcal{J}_t$; or equivalently we can take

\[ \Delta_t = E[g(X_{\kappa^*_t}, Y_{\kappa^*_t}) \mid \Pi_t, Y_t] - E[g(X_{\kappa^*_t}, Y_{\kappa^*_t}) \mid \Pi_{t-1}, Y_{t-1}], \]

where $\kappa^*_t$ is the optimal $\mathcal{F}$-stopping time of problem (11). Even if the explicit forms of $U_t$ and $\kappa^*_t$ are not known, their approximations can be used in $\Delta_t$ and its martingale difference property can still be preserved. The advantage of approximating $U_t$ or $\kappa^*_t$ is their simple structure as functions of only $(X_t, Y_t)$, whereas either $V_t$ or $\tau^*_t$ is a function of $(Y_0, \ldots, Y_t)$. Thus, it may be easier to generate martingale difference terms based on approximate $U_t$ or $\kappa^*_t$, even though they may yield less optimal values.

In the rest of this section we focus on approximating $\Delta^*_t$ in (10b) by the following $\tilde{\Delta}_t$ based on a fixed stopping time $\tau$ (see, e.g., (16) in Section III-B), which is either $\mathcal{F}^Y$- or $\mathcal{F}$-adapted:

\[ \tilde{\Delta}_t \triangleq E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi^m_t, Y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi^m_{t-1}, Y_{t-1}], \tag{12} \]

where $\tau_t$ is the $t$-indexed stopping time associated with $\tau$, and $\Pi^m_t$ (see details in Section III-A) is the approximate filtering distribution at time $t$ obtained by particle filtering (the superscript $m$ in $\Pi^m_t$ denotes the number of particles), which will be elaborated in the next section. A lower-case notation $\pi^m_t$ denotes the corresponding approximate filtering distribution based on a realization of the observations $y_{0:t}$. Then we define $\tilde{M}$ as

\[ \tilde{M}_0 = 0; \qquad \tilde{M}_t = \tilde{\Delta}_1 + \cdots + \tilde{\Delta}_t, \quad t \in \mathcal{J}. \tag{13} \]

Incorporating the above ideas, we propose the following algorithm that yields an asymptotic upper bound on $V_0$.

Algorithm 1. Filtering-Based Martingale Duality Approach

Step 1. For $k = 1, 2, \ldots, N$, do
- Generate a path of observations $y^{(k)}_{1:T}$ according to the processes (1a)-(1c) with initial condition $Y_0 = y_0$ and $X_0 \sim \pi_0$, and then follow Algorithm 2 (particle filtering) to generate the approximate filtering distributions $\pi^{m,(k)}_1, \ldots, \pi^{m,(k)}_T$.
- For $t = 1, \ldots, T$, use Algorithm 3 to compute $\hat{\Delta}^{(k)}_t$, which is an approximation for
\[ \tilde{\Delta}^{(k)}_t = E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^{m,(k)}_t, y^{(k)}_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^{m,(k)}_{t-1}, y^{(k)}_{t-1}]. \tag{14} \]
- Sum the approximate martingale differences to obtain $\hat{M}^{(k)}_t = \hat{\Delta}^{(k)}_1 + \cdots + \hat{\Delta}^{(k)}_t$, $t = 1, \ldots, T$.
- Evaluate $V^{(k)} = \max_{t \in \mathcal{J}} \big(\tilde{g}(\pi^{m,(k)}_t, y^{(k)}_t) - \hat{M}^{(k)}_t\big)$.
end

Step 2. Set $\bar{V}^\tau_N = \frac{1}{N} \sum_{k=1}^{N} V^{(k)}$.

$\bar{V}^\tau_N$ is an asymptotic upper bound on the value function $V_0(\pi_0, y_0)$. In the next two subsections, we will discuss how to generate the approximate filtering distribution using particle filtering via Algorithm 2 and how to compute the approximate martingale difference via Algorithm 3.

A. Particle Filtering

We approximate $\pi_t$ using particle filtering, which is a successful and versatile numerical method for solving nonlinear filtering problems. A good introduction to particle filtering can be found in the book [2].
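The particle filtering method approximates $\pi_t$ by a finite number (say $m$) of particles $\{x^{(1)}_t, \ldots, x^{(m)}_t\}$, i.e., a discrete distribution $\pi^m_t$ written as follows

\[ \pi^m_t = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_t}, \tag{15} \]

where $\delta$ is the Dirac measure. As the number of particles $m$ goes to infinity, it can be ensured that $\pi^m_t$ converges to $\pi_t$ in a certain sense.

Algorithm 2. Particle Filtering

Input: $X_0 \sim \pi_0$ and a sequence of observations $y_{0:T}$.
Output: The approximate filtering distributions $\pi^m_0, \ldots, \pi^m_T$.

Step 1. Initialization: Set $t = 0$. Draw i.i.d. samples $x^{(1)}_0, \ldots, x^{(m)}_0$ from the distribution $\pi_0$. Set $\pi^m_0 = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_0}$.

Step 2. For $t = 1, \ldots, T$, do
- Prediction: For each $i = 1, \ldots, m$, draw one sample $\tilde{x}^{(i)}_t$ from $P(X_t \mid X_{t-1} = x^{(i)}_{t-1})$.
- Bayes Updating: Compute
\[ w^{(i)}_t = \frac{p(y_t \mid \tilde{x}^{(i)}_t, y_{t-1})}{\sum_{j=1}^{m} p(y_t \mid \tilde{x}^{(j)}_t, y_{t-1})}, \quad i = 1, \ldots, m. \]
- Resampling: Draw i.i.d. samples $x^{(1)}_t, \ldots, x^{(m)}_t$ from the discrete distribution $\mathrm{Prob}(X_t = \tilde{x}^{(i)}_t) = w^{(i)}_t$, $i = 1, \ldots, m$. Set $\pi^m_t = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_t}$.
end

B. Approximate Martingale Difference

The remaining issue is how to compute the martingale difference (14). Throughout this subsection we assume a suboptimal stopping time $\tau$ of the form

\[ \tau = \min\{t \in \mathcal{J} : g(X_t, Y_t) \ge \tilde{C}_t(X_t, Y_t)\}, \tag{16} \]

where $\{\tilde{C}_t,\ t \in \mathcal{J}\}$ is a sequence of approximate continuation functions of $U$. The approximate continuation functions $\tilde{C}_t$ can be derived, for example, by regression on some basis functions as suggested by [7] and [13]. We choose an $\mathcal{F}$-stopping time $\tau$ of the form (16) only for ease of exposition, though Algorithm 3 can be adjusted using any other $\mathcal{F}$ (or $\mathcal{F}^Y$)-stopping time with the same principle.

Given a realization of observations $y_{0:T}$, we employ nested simulation to obtain the estimate of $\tilde{\Delta}_t$ in (14). Note that $\pi^m_t$ in Algorithm 1 is of the form (15). Therefore,

\[ \tilde{\Delta}_t = \frac{1}{m} \sum_{i=1}^{m} E[g(X_{\tau_t}, Y_{\tau_t}) \mid X_t = x^{(i)}_t, Y_t = y_t] - \frac{1}{m} \sum_{i=1}^{m} E[g(X_{\tau_t}, Y_{\tau_t}) \mid X_{t-1} = x^{(i)}_{t-1}, Y_{t-1} = y_{t-1}], \]

where $\tau_t$ is the $t$-indexed stopping time associated with $\tau$ defined as

\[ \tau_t = \min\{i \in \mathcal{J}_t : g(X_i, Y_i) \ge \tilde{C}_i(X_i, Y_i)\}. \]

To estimate $E[g(X_{\tau_t}, Y_{\tau_t}) \mid x^{(i)}_t, y_t]$ (resp., $E[g(X_{\tau_t}, Y_{\tau_t}) \mid x^{(i)}_{t-1}, y_{t-1}]$), we generate $l$ subpaths that are stopped according to $\tau_t$ with initial condition $X_t = x^{(i)}_t$, $Y_t = y_t$ (resp., $X_{t-1} = x^{(i)}_{t-1}$, $Y_{t-1} = y_{t-1}$) for each $i$ and $t$, and we average $g(X_{\tau_t}, Y_{\tau_t})$ over these subpaths. So there are a total number of $ml$ subpaths generated to estimate each expectation term in (14).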
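The prediction, Bayes updating, and resampling steps of the particle filter described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function `particle_filter`, its callable arguments, and the scalar toy model (a Gaussian random-walk state observed with Gaussian noise) are all hypothetical choices made for the example.

```python
import math
import random
from typing import Callable, List, Sequence


def particle_filter(
    sample_prior: Callable[[random.Random], float],
    sample_transition: Callable[[float, random.Random], float],
    likelihood: Callable[[float, float, float], float],
    observations: Sequence[float],
    m: int,
    rng: random.Random,
) -> List[List[float]]:
    """Return the particle sets for t = 0, ..., T.

    observations[t] is y_t (observations[0] is y_0).
    likelihood(y_t, x_t, y_prev) plays the role of p(y_t | x_t, y_{t-1}).
    """
    particles = [sample_prior(rng) for _ in range(m)]  # initialization from pi_0
    history = [particles]
    for t in range(1, len(observations)):
        y_prev, y = observations[t - 1], observations[t]
        # Prediction: propagate each particle through the state equation.
        proposed = [sample_transition(x, rng) for x in particles]
        # Bayes updating: weights proportional to the observation likelihood.
        weights = [likelihood(y, x, y_prev) for x in proposed]
        total = sum(weights)
        if total <= 0.0:
            raise ValueError("all particle weights are zero")
        weights = [w / total for w in weights]
        # Resampling: draw m i.i.d. particles from the weighted discrete distribution.
        particles = rng.choices(proposed, weights=weights, k=m)
        history.append(particles)
    return history


def gaussian_pdf(z: float, mean: float, std: float) -> float:
    """Density of N(mean, std^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


# Toy model (an assumption): X_t = X_{t-1} + N(0, 1), Y_t = X_t + N(0, 0.5^2).
rng = random.Random(0)
ys = [0.0] + [5.0] * 10
history = particle_filter(
    sample_prior=lambda r: r.gauss(0.0, 1.0),
    sample_transition=lambda x, r: x + r.gauss(0.0, 1.0),
    likelihood=lambda y, x, y_prev: gaussian_pdf(y, x, 0.5),
    observations=ys,
    m=500,
    rng=rng,
)
# Mean of the equally weighted particles approximates the filtered state estimate.
filtered_means = [sum(p) / len(p) for p in history]
```

Because the particles are equally weighted after resampling, an expectation under the approximate filtering distribution reduces to a plain average over particles, which is how the averages over $i$ in the martingale-difference estimate arise.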
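The details of the nested simulation are presented below.

Algorithm 3. Estimation of $\tilde{\Delta}_t$ Using Nested Simulation

Input: $y_{t-1}$, $y_t$, $\pi^m_{t-1} = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_{t-1}}$ and $\pi^m_t = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_t}$ from Algorithm 1 and Algorithm 2.

(Step 1 - Step 2 are used to estimate $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}]$.)

Step 1. For $i = 1, \ldots, m$, do
- Simulate $\{(x^{(ij)}_t, y^{(ij)}_t), \ldots, (x^{(ij)}_T, y^{(ij)}_T)\}_{j=1}^{l}$ from the processes (1a)-(1c) with the initial condition $X_{t-1} = x^{(i)}_{t-1}$ and $Y_{t-1} = y_{t-1}$.
- To apply $\tau_t$ on these sample paths, find $t_{ij} = \min\{s \in \mathcal{J}_t : g(x^{(ij)}_s, y^{(ij)}_s) \ge \tilde{C}_s(x^{(ij)}_s, y^{(ij)}_s)\}$.
- Set $b_i = \frac{1}{l} \sum_{j=1}^{l} g(x^{(ij)}_{t_{ij}}, y^{(ij)}_{t_{ij}})$.
end

Step 2. Set $G^{m,l}_{t-1,t} \triangleq \frac{1}{m} \sum_{i=1}^{m} b_i$, which is an unbiased estimator of $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}]$.

(Step 3 - Step 4 are used to estimate $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t]$.)

Step 3. For $i = 1, \ldots, m$, do
If $g(x^{(i)}_t, y_t) \ge \tilde{C}_t(x^{(i)}_t, y_t)$, i.e., $(x^{(i)}_t, y_t)$ is in the stopping region, set $b_i = g(x^{(i)}_t, y_t)$. Otherwise, repeat Step 1 with the initial condition $X_t = x^{(i)}_t$ and $Y_t = y_t$ to obtain $b_i$.
end

Step 4. Set $G^{m,l}_{t,t} \triangleq \frac{1}{m} \sum_{i=1}^{m} b_i$, which is an unbiased estimator of $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t]$.

Step 5. Set $\hat{\Delta}_t = G^{m,l}_{t,t} - G^{m,l}_{t-1,t}$.

IV. ERROR ANALYSIS

In this section, we analyze the error bound and asymptotic convergence of our algorithm. To lighten the notation, we use $E_0[\,\cdot\,]$ to denote $E[\,\cdot \mid X_0 \sim \pi_0, Y_0 = y_0]$ in the rest of the note. The following assumption is used throughout our analysis.

Assumption 1.
i. $\|g\|_\infty \triangleq \max_{t \in \mathcal{J}} \|g(t, \cdot, \cdot)\|_\infty < \infty$.
ii. For any observation sequence $y_{0:T}$, $\sup_{x_t} p(y_t \mid x_t, y_{t-1}) < \infty$, $t \in \mathcal{J}$.

We first introduce an $\mathcal{F}^Y$-adapted martingale difference sequence $\{\Delta^\tau_t\}$ and martingale $\{M^\tau_t\}$ induced by an $\mathcal{F}$ (or $\mathcal{F}^Y$)-stopping time $\tau$:

\[ \Delta^\tau_t = E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi_t, Y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi_{t-1}, Y_{t-1}], \]
\[ M^\tau_0 \triangleq 0; \qquad M^\tau_t \triangleq \Delta^\tau_1 + \cdots + \Delta^\tau_t, \quad t \in \mathcal{J}. \]

Since $M^\tau$ is an $\mathcal{F}^Y$-adapted martingale, $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^\tau_t)]$ is an upper bound on $V_0(\pi_0, y_0)$ by Theorem 1. Recall that the approximate martingale difference based on a realization of observations $y_{0:t}$ is

\[ \tilde{\Delta}_t = E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}]. \]

In Algorithm 3 the empirical estimates of $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t]$ and $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}]$ are denoted by $G^{m,l}_{t,t}$ and $G^{m,l}_{t-1,t}$, respectively. Therefore, we use $\hat{\Delta}_t = G^{m,l}_{t,t} - G^{m,l}_{t-1,t}$ and $\hat{M}_t = \sum_{i=1}^{t} \hat{\Delta}_i$ to approximate $\Delta^\tau_t$ and $M^\tau_t$. Instead of obtaining $\max_{t \in \mathcal{J}} (\tilde{g}(\pi_t, y_t) - M^\tau_t)$ exactly along each path of the observations $y_{0:T}$, we compute $\max_{t \in \mathcal{J}} (\tilde{g}(\pi^m_t, y_t) - \hat{M}_t)$. Note that conditional on a fixed observation sequence, the former term is a constant, while the latter one is a random term due to sampling. The difference between these two terms is due to two sources of noise: one is from the difference between the deterministic density $\pi_t$ and the random measure $\pi^m_t$, and this gap will go to zero (in expectation) by increasing the number of particles $m$ under Assumption 1; the other is from the variability of the nested (Monte Carlo) simulation, which can be eliminated by increasing the number of sample paths $l$.

We will show in the next theorem (with proof in the Appendix) that $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi^m_t, Y_t) - \hat{M}_t)]$ converges to $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^\tau_t)]$ when the particle number $m$ increases to infinity. Hence, $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi^m_t, Y_t) - \hat{M}_t)]$ is an asymptotic (as $m \to \infty$) upper bound on $V_0(\pi_0, y_0)$. Moreover, the gap between $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^\tau_t)]$ and $V_0(\pi_0, y_0)$ is purely due to the suboptimal stopping time $\tau$.

Theorem 2. Suppose $\tau$ is an $\mathcal{F}$ (or $\mathcal{F}^Y$)-stopping time. Then

\[ \lim_{m \to \infty} E_0\Big[\max_{t \in \mathcal{J}} \big(\tilde{g}(\Pi^m_t, Y_t) - \hat{M}_t\big)\Big] = E_0\Big[\max_{t \in \mathcal{J}} \big(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\big)\Big]. \tag{17} \]

Moreover, we have the following inequalities:

\[ E_0\Big[\max_{t \in \mathcal{J}} \big(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\big)\Big] - V_0(\pi_0, y_0) \le 2 \sqrt{\sum_{t=1}^{T} E_0\big[(\Delta^\tau_t - \Delta^*_t)^2\big]} \le 2 \sqrt{\sum_{t=1}^{T} E_0\Big[\big(E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi_t, Y_t] - E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \Pi_t, Y_t]\big)^2\Big]}. \tag{18} \]

From (17), the output $\bar{V}^\tau_N$ in Algorithm 1 is an asymptotic (as the sample path number $N \to \infty$ and the particle number $m \to \infty$) upper bound on the true value function $V_0$.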
According to (18), a large $m$ will lead to a tight upper bound provided that the martingale $M^\tau$ induced by the stopping time $\tau$ does not differ too much from the optimal $M^*$, or more intuitively, the suboptimal stopping time $\tau$ does not differ too much from the optimal $\tau^*$.

V. NUMERICAL EXAMPLES

We apply our method to price American put options under stochastic volatility. Following the model in [10] we consider a $d_S$-dimensional process of asset prices $\{S_t,\ t = 0 : T\}$:

\[ S^i_{t+1} = S^i_t \exp\Big(\Big(r - \frac{(\sigma^i_{t+1})^2}{2}\Big)\delta + \sigma^i_{t+1} \sqrt{\delta}\, Z^{i,1}_{t+1}\Big), \quad i = 1, \ldots, d_S, \tag{19} \]

where $r$ is the constant interest rate, $\delta$ is the time period between the equally-spaced time points, $\{Z^{i,1}_t,\ t = 1 : T,\ i = 1, \ldots, d_S\}$ are independent sequences of Gaussian random variables with $Z^{i,1}_t \sim N(0, 1)$, and the volatility $\sigma^i_t \triangleq \exp(X^i_t)$ is a deterministic function of a $d_X\,(= d_S)$-dimensional process $\{X_t,\ t = 0 : T\}$ that evolves as a discretized Ornstein-Uhlenbeck process:

\[ X^i_{t+1} = X^i_t e^{-\lambda_i \delta} + \theta_i \big(1 - e^{-\lambda_i \delta}\big) + \gamma_i \sqrt{\frac{1 - e^{-2\lambda_i \delta}}{2\lambda_i}}\; Z^{i,2}_{t+1}, \quad i = 1, \ldots, d_X, \tag{20} \]

where the positive constant $\theta_i$ is the mean reversion value, the constant $\lambda_i$ is the mean reversion rate, the constant $\gamma_i$ is a measure of the process volatility, and $\{Z^{i,2}_t,\ t = 1 : T,\ i = 1, \ldots, d_X\}$ are independent sequences of Gaussian random variables with $Z^{i,2}_t \sim N(0, \mu_i^2)$, which are also independent of $Z^{i,1}$. Here $\mu_i$ is used to control the observation noise. For simplicity, in our numerical experiments we use $\lambda_i = \lambda$, $\theta_i = \theta$, $\gamma_i = \gamma$, $\mu_i = \mu$ for all $i = 1, \ldots, d_X$.

Assume that only the asset price is observed, and exercise opportunities take place at $t = 1, \ldots, T$. We consider the put option on the minimum of $d_S$ assets, i.e., the payoff function is of the form

\[ g(t, S_t) = \max\Big(e^{-r\delta t}\big(K - \min\{S^1_t, \ldots, S^{d_S}_t\}\big),\ 0\Big). \]

In the rest of this section, exercise policy simply means stopping time in the general optimal stopping problem.

Remark 1. In this example, the conditional probability density function is

\[ p(S_t \mid x_t, S_{t-1}) = \prod_{i=1}^{d_S} p(S^i_t \mid x^i_t, S^i_{t-1}), \]

where

\[ p(S^i_t \mid x^i_t, S^i_{t-1}) = \frac{1}{S^i_t \sqrt{2\pi \exp^2(x^i_t)\, \delta \mu^2}} \exp\Big(-\frac{\big(\ln(S^i_t / S^i_{t-1}) - (r - \exp^2(x^i_t)/2)\,\delta\big)^2}{2 \exp^2(x^i_t)\, \delta \mu^2}\Big). \]

It can be shown that $p(S_t \mid x_t, S_{t-1})$ satisfies Assumption 1(ii) and that Assumption 1(i) is also trivially satisfied.
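As a concrete reading of the model in this section, the sketch below implements one step of the log-volatility recursion (20), one step of the price recursion (19), and the discounted payoff of the put on the minimum of the assets. The function names are hypothetical, the square-root terms follow the standard exact discretization of an Ornstein-Uhlenbeck process (an assumption where the source equations are hard to read), and the Gaussian draws `z1` and `z2` are supplied by the caller, so the sketch makes no assumption about their variances.

```python
import math
from typing import Sequence


def ou_step(x: float, z2: float, theta: float, lam: float, gamma: float, delta: float) -> float:
    """One step of the discretized Ornstein-Uhlenbeck recursion for the log-volatility."""
    decay = math.exp(-lam * delta)
    scale = gamma * math.sqrt((1.0 - math.exp(-2.0 * lam * delta)) / (2.0 * lam))
    return x * decay + theta * (1.0 - decay) + scale * z2


def price_step(s: float, x_next: float, z1: float, r: float, delta: float) -> float:
    """One step of the price recursion with volatility sigma = exp(x_next)."""
    sigma = math.exp(x_next)
    return s * math.exp((r - 0.5 * sigma ** 2) * delta + sigma * math.sqrt(delta) * z1)


def min_put_payoff(t: int, prices: Sequence[float], r: float, delta: float, strike: float) -> float:
    """Discounted payoff of the put on the minimum of the assets at exercise date t."""
    return max(math.exp(-r * delta * t) * (strike - min(prices)), 0.0)


# Illustrative values taken from the tables' settings: r = 0.05, K = 40, delta = 0.1, S_0 = 36.
theta = math.log(0.2)
x1 = ou_step(theta, 0.0, theta=theta, lam=0.5, gamma=1.0, delta=0.1)   # no shock: stays at theta
s1 = price_step(36.0, x1, 0.0, r=0.05, delta=0.1)                      # no shock: pure drift
payoff_in = min_put_payoff(1, [36.0], r=0.05, delta=0.1, strike=40.0)
payoff_out = min_put_payoff(1, [45.0, 50.0], r=0.05, delta=0.1, strike=40.0)
```

Starting the log-volatility at its mean-reversion value and feeding a zero shock leaves it unchanged, which gives a quick sanity check of the recursion.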
Since the stochastic volatility cannot be directly observed in reality but can be partially observed through inference from the observed asset price, pricing American options under the above model (19)-(20) falls into the framework of optimal stopping of POMPs. We illustrate our algorithm through a series of numerical experiments with $d_S = 1$ (one asset) and $d_S = 2$ (two assets). In particular, we are interested in how the variance of the volatility (corresponding to the parameters $(\theta, \lambda, \gamma)$) and the observation noise (corresponding to the parameter $\mu$) influence the price difference due to the difference between the fully observable and partially observable volatilities. We list the parameter sets in Table I.

To compute option prices under both full and partial observations, we implement our algorithm as well as the Least-Squares Monte Carlo (LSMC) method of [7], which provides suboptimal exercise policies, and the primal-dual (PD) method of [1], which parallels our method in the fully observable models. The numerical results of the option prices under different parameter sets are listed in Table II (for one asset) and Table III (for two assets), where LB represents the lower bound obtained by the LSMC method for the fully/partially observable model with the following two sets of basis functions for the one-asset and two-asset problems respectively:

\[ H_1 = \{L_0(S^1),\ L_0^2(S^1),\ L_1(S^1),\ L_1^2(S^1),\ L_0(S^1) L_1(S^1),\ 1\}, \]
\[ H_2 = \{L_0(S^1),\ L_0^2(S^1),\ L_0(S^2),\ L_0^2(S^2),\ L_0(S^1) L_0(S^2),\ L_2(S^1, S^2),\ L_2^2(S^1, S^2),\ 1\}, \]

where $L_0(x) = x$, $L_1(x) = \max\{K - x, 0\}$ and $L_2(x, y) = \max\{K - \min\{x, y\}, 0\}$. Please note that the basis functions only depend on the asset price $S$, not the volatility $\exp(X)$, so the suboptimal policy is $\mathcal{F}^Y$-adapted and the results are guaranteed to be lower bounds for the partially observable model. In the tables, UB represents the corresponding upper bound yielded by our filtering-based duality method for the partially observable model, and Full.ŨB represents the corresponding upper bound yielded by the PD method for the fully observable model.

It is clear that we can improve the exercise policy for the fully observable model by employing more basis functions that use the information of the volatility $\exp(X)$: Full.LB and Full.UB are the lower bound and upper bound for the fully observable model, still obtained by the LSMC method and PD method with additional basis functions for each problem:

\[ H^{\mathrm{add}}_1 = \{L_0(e^{X^1}),\ L_0(e^{X^1}) L_1(S^1)\}, \]
\[ H^{\mathrm{add}}_2 = \{L_0(e^{X^1}),\ L_0^2(e^{X^1}),\ L_0(e^{X^2}),\ L_0^2(e^{X^2}),\ L_0(e^{X^1}) L_2(S^1, S^2),\ L_0(e^{X^2}) L_2(S^1, S^2)\}. \]

Each entry in Table II and Table III shows the sample average and the standard error (in parentheses) of the numerical results of 20 independent runs using the following procedure:
- We implement the LSMC method with 50000 sample paths to obtain a suboptimal policy $\tau$, and then apply this policy on another independent set of 50000 paths to get the lower bound LB.
- The dual upper bound UB is obtained by implementing Algorithm 1 using the suboptimal policy $\tau$ with the number of sample paths $N = 500$, number of particles $m = 500$, and number of subpaths $l = 10$.
- To investigate the option prices under the fully observable stochastic volatility, we use the PD method with 500 sample paths and 5000 subpaths in nested simulation (which is equal to $ml$) to obtain an upper bound Full.ŨB, since the policy $\tau$ obtained before is also a suboptimal policy for the fully observable model.

Except for the new sets of basis functions, the LSMC and PD methods are implemented in exactly the same way as before to generate another set of lower bound Full.LB and upper bound Full.UB for the fully observable model. In practice we often use the average of LB and UB, and the average of Full.LB and Full.UB, as estimates of the option prices for the partially observable and fully observable problems, respectively.
TABLE I
PARAMETER SETS

 #    (θ,λ,γ)               µ
 1    (log(0.1),1.0,1.0)    0.3
 2    (log(0.1),1.0,1.0)    1.0
 3    (log(0.2),0.5,1.0)    0.3
 4    (log(0.2),0.5,1.0)    1.0
 5    (log(0.2),1.5,1.0)    0.3
 6    (log(0.2),1.5,1.0)    1.0
 7    (log(0.2),1.0,0.5)    0.3
 8    (log(0.2),1.0,0.5)    1.0
 9    (log(0.3),2.0,0.3)    0.3
 10   (log(0.3),2.0,0.3)    1.0

TABLE II
AMERICAN PUT OPTION PRICES ON ONE ASSET ($r = 0.05$, $K = 40$, $\delta = 0.1$, $T = 10$, $S_0 = 36$, $X_0 = \theta$)

      Volatility not observable       Volatility directly observable
 #    LB             UB               Full.ŨB        Full.LB        Full.UB
 1    3.820(0.000)   3.820(0.000)     3.825(0.001)   3.820(0.000)   3.821(0.000)
 2    3.853(0.001)   3.887(0.001)     3.954(0.003)   3.905(0.002)   3.912(0.001)
 3    3.892(0.001)   4.019(0.003)     4.321(0.005)   4.197(0.003)   4.209(0.001)
 4    5.009(0.006)   5.216(0.005)     5.368(0.009)   5.297(0.005)   5.328(0.001)
 5    3.881(0.001)   3.898(0.001)     3.995(0.004)   3.928(0.002)   3.938(0.001)
 6    4.842(0.003)   4.935(0.002)     5.028(0.003)   4.973(0.004)   4.997(0.001)
 7    3.869(0.001)   3.870(0.000)     3.876(0.001)   3.871(0.001)   3.872(0.000)
 8    4.632(0.002)   4.653(0.001)     4.704(0.002)   4.679(0.003)   4.689(0.001)
 9    4.010(0.001)   4.022(0.001)     4.049(0.001)   4.030(0.001)   4.044(0.001)
 10   5.881(0.003)   5.902(0.001)     5.907(0.001)   5.896(0.005)   5.904(0.001)

The numerical results are divided into two categories: the first six rows report the results under dominant volatility effects, i.e., $\gamma$ is comparatively large and $\lambda$ is comparatively small; the last four rows report the results under moderate/weak volatility effects.
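The practice of averaging the bounds to obtain a price estimate can be illustrated on a few rows of Table II. The sketch below is only an arithmetic illustration; the dictionary layout and the helper name `midpoint_and_gap` are assumptions, while the numbers are copied from Table II (standard errors omitted).

```python
# Selected rows of Table II (one asset): set number -> (LB, UB, Full.LB, Full.UB)
table2 = {
    1: (3.820, 3.820, 3.820, 3.821),
    3: (3.892, 4.019, 4.197, 4.209),
    4: (5.009, 5.216, 5.297, 5.328),
    9: (4.010, 4.022, 4.030, 4.044),
}


def midpoint_and_gap(lower, upper):
    """Return the midpoint price estimate and the width of the bound interval."""
    return 0.5 * (lower + upper), upper - lower


summary = {}
for idx, (lb, ub, full_lb, full_ub) in table2.items():
    partial_mid, partial_gap = midpoint_and_gap(lb, ub)
    full_mid, full_gap = midpoint_and_gap(full_lb, full_ub)
    summary[idx] = {
        "partial_mid": partial_mid,   # estimate under partial observation
        "partial_gap": partial_gap,   # UB - LB
        "full_mid": full_mid,         # estimate under full observation
        "full_gap": full_gap,         # Full.UB - Full.LB
        "price_difference": full_mid - partial_mid,
    }

for idx, row in summary.items():
    print(idx, {name: round(value, 4) for name, value in row.items()})
```

For parameter set 3, for instance, the midpoints are 3.9555 (partial observation) and 4.203 (full observation), so the estimated value of observing the volatility directly is about 0.25.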
TABLE III
AMERICAN PUT OPTION PRICES ON THE MINIMUM OF TWO ASSETS ($r = 0.05$, $K = 40$, $\delta = 0.1$, $T = 10$, $S_0 = (36,36)$, $X_0 = (\theta,\theta)$)

      Volatility not observable       Volatility directly observable
 #    LB             UB               Full.ŨB        Full.LB        Full.UB
 1    4.027(0.002)   4.032(0.001)     4.068(0.002)   4.039(0.001)   4.043(0.001)
 2    5.004(0.006)   5.147(0.004)     5.256(0.006)   5.143(0.005)   5.222(0.003)
 3    5.274(0.005)   5.378(0.002)     5.565(0.004)   5.467(0.004)   5.489(0.001)
 4    8.045(0.006)   8.171(0.004)     8.289(0.006)   8.188(0.010)   8.268(0.003)
 5    4.641(0.002)   4.782(0.001)     4.918(0.005)   4.833(0.006)   4.870(0.001)
 6    7.531(0.006)   7.638(0.002)     7.723(0.007)   7.606(0.007)   7.704(0.002)
 7    4.429(0.002)   4.456(0.001)     4.514(0.001)   4.477(0.002)   4.500(0.001)
 8    6.984(0.004)   7.042(0.003)     7.074(0.004)   6.997(0.007)   7.080(0.001)
 9    5.417(0.002)   5.428(0.001)     5.449(0.001)   5.431(0.003)   5.447(0.001)
 10   9.084(0.006)   9.130(0.002)     9.138(0.002)   9.071(0.009)   9.133(0.002)

It can be seen from the tables that [Full.LB, Full.UB] is usually a tighter interval than [LB, Full.ŨB] for the fully observable option price, since more information is used to determine a better exercise policy. To differentiate the option prices under full and partial observations of stochastic volatility, [10] pointed out that the partial observation of stochastic volatility has an impact especially when the effect of the volatility (i.e., $\gamma^2/(2\lambda)$) is high. Our numerical results also support their viewpoint in terms of the differences between UB and Full.ŨB, which demonstrate the effectiveness of introducing the filtering step. In particular, it can be observed that we can reduce relatively more overpricing for problems with dominant volatility (i.e., the first category). Considering the differences between LB and Full.UB, the partially observable and fully observable option prices have relatively small gaps under moderate/weak volatility effects compared with the gaps in the first category.
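The volatility-effect measure $\gamma^2/(2\lambda)$ mentioned above can be computed directly for the parameter sets of Table I. The following sketch is a simple arithmetic check; the variable names are hypothetical, and the $(\lambda,\gamma)$ values are those listed in Table I.

```python
# (lambda, gamma) for each parameter set in Table I; theta and mu do not enter the measure
params = {
    1: (1.0, 1.0), 2: (1.0, 1.0),
    3: (0.5, 1.0), 4: (0.5, 1.0),
    5: (1.5, 1.0), 6: (1.5, 1.0),
    7: (1.0, 0.5), 8: (1.0, 0.5),
    9: (2.0, 0.3), 10: (2.0, 0.3),
}

# Volatility effect gamma^2 / (2 * lambda) for every parameter set
effect = {idx: gamma ** 2 / (2.0 * lam) for idx, (lam, gamma) in params.items()}

first_category = [effect[i] for i in range(1, 7)]    # sets 1-6: dominant volatility
second_category = [effect[i] for i in range(7, 11)]  # sets 7-10: moderate/weak volatility

for idx in sorted(effect):
    print(idx, round(effect[idx], 4))
```

Under this measure every set in the first category (values between roughly 0.33 and 1.0) has a larger volatility effect than every set in the second category (0.125 and 0.0225), which matches the grouping of the rows used in the discussion.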
Larger observation noise $\mu$ challenges the performance of the suboptimal exercise policy and also deteriorates the performance of particle filtering, so it generally increases the gap between Full.LB and Full.UB and the gap between LB and UB. Compared with [10] and [8], whose approaches provide asymptotic lower bounds on the option prices, our main contribution is to provide an asymptotic upper bound on the option price, which is less than or similar to the lower bound (Full.LB) of the corresponding fully observable option price in the first category. Hence, our method provides a better criterion to evaluate the performance of LB: the smaller the gap between UB and LB, the better the bounds. If the gap between UB and LB is small enough, they can both be regarded as approximate option prices under partial observation. Otherwise, improvement of the exercise policy should be considered.

VI. CONCLUSION

In this note we propose a numerical approach to solve for the value function of the partially observable optimal stopping problem. We represent the value function as a solution of a dual minimization problem, based on which we develop an algorithm that complements a suboptimal stopping time with an asymptotic upper bound on the value function. Our approach provides a practical way to judge whether more computational effort is needed to improve the quality of the approximate solution. We apply our approach to price American put options in stochastic volatility models, with the realistic assumption that the volatility cannot be directly observed but can be inferred from the asset prices. The numerical results confirm a higher price of the option if we alternatively assume that the volatility is directly observable. The price difference is more significant when the effect of volatility is high, indicating the importance of taking the partial observability into account.

APPENDIX
PROOF OF THEOREM 2

We need the following proposition for the proof of the theorem.

Proposition 1 (Corollary 10.28, [2]). Let $\hat\pi_0,\ldots,\hat\pi_T$ be the random measures generated by Algorithm 2 for the observation sequence $y_{0:T}$. Suppose that the following assumption holds:

$$\|f\| < \infty \quad \text{and} \quad \sup_x p(y_t \mid x, y_{t-1}) < \infty, \quad t = 1,\ldots,T.$$

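Then

$$E\left[\left(\int f(x_t)\hat\pi_t(x_t)\,dx_t - \int f(x_t)\pi_t(x_t)\,dx_t\right)^2\right] \le \frac{c_t\|f\|^2}{m}, \quad t = 0,\ldots,T,$$

where the constant $c_t$ does not depend on $m$ (but it does depend on $t$ and $y_{0:t}$). In particular, $c_0 = 1$.

Proof of Theorem 2: We first prove (17). Given a sample path of the observations $y_0,\ldots,y_T$, the difference between $g(\hat\pi_t,y_t)$ and $g(\pi_t,y_t)$ is

$$\vartheta_t \triangleq \int g(x_t,y_t)\hat\pi_t(x_t)\,dx_t - \int g(x_t,y_t)\pi_t(x_t)\,dx_t.$$

By Proposition 1, $E[|\vartheta_t|] \le \sqrt{E[(\vartheta_t)^2]} \le a_t\|g\|/\sqrt{m}$ for some constant $a_t$. The difference between $M^\tau$ and $\hat M$ is the sum of the differences between $\Delta^\tau_t$ and $\hat\Delta_t$, where

$$\Delta^\tau_t - \hat\Delta_t = \chi^m_{t,t} - \chi^m_{t-1,t} + \varepsilon^{m,l}_{t,t} - \varepsilon^{m,l}_{t-1,t},$$

$$\chi^m_{t,t} \triangleq E[g(X_\tau,y_\tau) \mid \pi_t,y_t] - E[g(X_\tau,y_\tau) \mid \hat\pi_t,y_t],$$

$$\chi^m_{t-1,t} \triangleq E[g(X_\tau,y_\tau) \mid \pi_{t-1},y_{t-1}] - E[g(X_\tau,y_\tau) \mid \hat\pi_{t-1},y_{t-1}],$$

$$\varepsilon^{m,l}_{t,t} \triangleq E[g(X_\tau,y_\tau) \mid \hat\pi_t,y_t] - G^{m,l}_{t,t},$$

$$\varepsilon^{m,l}_{t-1,t} \triangleq E[g(X_\tau,y_\tau) \mid \hat\pi_{t-1},y_{t-1}] - G^{m,l}_{t-1,t}.$$

The first two errors are filtering errors, since we can rewrite $\chi^m_{t,t}$ as

$$\chi^m_{t,t} = E\Big[\sum_{j=t}^{T} g(X_j,y_j)1_{\{\tau=j\}} \,\Big|\, \pi_t,y_t\Big] - E\Big[\sum_{j=t}^{T} g(X_j,y_j)1_{\{\tau=j\}} \,\Big|\, \hat\pi_t,y_t\Big] = \int I_t(x_t,y_t)\pi_t(x_t)\,dx_t - \int I_t(x_t,y_t)\hat\pi_t(x_t)\,dx_t. \quad (21)$$

Here $I_t(x_t,y_t)$ is defined as the integrand of $E[\sum_{j=t}^{T} g(X_j,y_j)1_{\{\tau=j\}} \mid \pi_t,y_t]$, i.e.,

$$I_t(x_t,y_t) \triangleq g(x_t,y_t)1_{\{\tau=t\}} + \sum_{j=t+1}^{T}\int g(x_j,y_j)1_{\{\tau=j\}}\, p(dx_{t+1}dy_{t+1}\cdots dx_j dy_j \mid x_t,y_t),$$

where $p(dx_{t+1}dy_{t+1}\cdots dx_j dy_j \mid x_t,y_t)$ denotes the joint probability distribution of $(x_{t+1},y_{t+1},\ldots,x_j,y_j)$ conditional on $(x_t,y_t)$. As $\{\tau = j\}$ are disjoint sets for each $j$, it follows that $\|I_t\| \le \|g\|$. Based on (21) and using Proposition 1 with $f = I_t$, it is ensured that $E[|\chi^m_{t,t}|] \le b_t\|g\|/\sqrt{m}$ for some constant $b_t$. Similarly, $E[|\chi^m_{t-1,t}|] \le b_{t-1}\|g\|/\sqrt{m}$ for some constant $b_{t-1}$. The latter two errors are from the sampling variability of Monte Carlo simulation (as in step 1 of Algorithm 2); the error bounds are guaranteed by Proposition 1 with $t = 0$, i.e.,

$$E[|\varepsilon^{m,l}_{t,t}|] \le \frac{\|g\|}{\sqrt{ml}} \quad \text{and} \quad E[|\varepsilon^{m,l}_{t-1,t}|] \le \frac{\|g\|}{\sqrt{ml}}.$$

So given a sample path of the observations $y_{0:T}$ we have, for each $t \in J$,

$$\lim_{m\to\infty} E\Big[\big|\big(g(\pi_t,y_t) - M^\tau_t\big) - \big(g(\hat\pi_t,y_t) - \hat M_t\big)\big|\Big] = \lim_{m\to\infty} E\Big[\Big|-\vartheta_t + \sum_{i=1}^{t}\big(\hat\Delta_i - \Delta^\tau_i\big)\Big|\Big] = 0. \quad (22)$$

Since

$$\Big|\max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big) - \max_{t\in J}\big(g(\hat\pi_t,y_t) - \hat M_t\big)\Big| \le \max_{t\in J}\Big|\big(g(\pi_t,y_t) - M^\tau_t\big) - \big(g(\hat\pi_t,y_t) - \hat M_t\big)\Big| \le \sum_{t=1}^{T}\Big|\big(g(\pi_t,y_t) - M^\tau_t\big) - \big(g(\hat\pi_t,y_t) - \hat M_t\big)\Big|,$$

by taking expectation and letting $m$ go to infinity we have

$$\lim_{m\to\infty} E\Big[\Big|\max_{t\in J}\big(g(\hat\pi_t,y_t) - \hat M_t\big) - \max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)\Big|\Big] = 0.$$

Note that $\hat\Delta_t$ is bounded by $2\|g\|$ for each $t \in J$, and therefore $g(\hat\pi_t,y_t) - \hat M_t$ is bounded by $(2t+1)\|g\|$ and $\max_{t\in J}\big(g(\hat\pi_t,y_t) - \hat M_t\big)$ is bounded by $(2T+1)\|g\|$.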
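The same conclusions are also valid for $\Delta^\tau_t$, $g(\pi_t,y_t) - M^\tau_t$ and $\max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)$. Then

$$\begin{aligned}
&\lim_{m\to\infty} E_0\Big[\Big|\max_{t\in J}\big(g(\hat\pi_t,y_t) - \hat M_t\big) - \max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)\Big|\Big] \\
&\quad= \lim_{m\to\infty} E_0\Big[E\Big[\Big|\max_{t\in J}\big(g(\hat\pi_t,y_t) - \hat M_t\big) - \max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)\Big| \,\Big|\, \mathcal{F}^Y_T\Big]\Big] \\
&\quad= E_0\Big[\lim_{m\to\infty} E\Big[\Big|\max_{t\in J}\big(g(\hat\pi_t,y_t) - \hat M_t\big) - \max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)\Big| \,\Big|\, \mathcal{F}^Y_T\Big]\Big] \\
&\quad= 0,
\end{aligned}$$

where the second equality follows from the boundedness of the integrand and the dominated convergence theorem. Hence,

$$\lim_{m\to\infty} E_0\Big[\max_{t\in J}\big(g(\hat\pi_t,y_t) - \hat M_t\big)\Big] = E_0\Big[\max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)\Big].$$

Now we prove (18). First we have

$$E_0\Big[\max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)\Big] - V_0 = E_0\Big[\max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big)\Big] - E_0\Big[\max_{t\in J}\big(g(\pi_t,y_t) - M^*_t\big)\Big] \le E_0\Big[\max_{t\in J}\big|M^*_t - M^\tau_t\big|\Big],$$

following the fact that

$$\max_{t\in J}\big(g(\pi_t,y_t) - M^\tau_t\big) - \max_{t\in J}\big(g(\pi_t,y_t) - M^*_t\big) \le \max_{t\in J}\big|M^*_t - M^\tau_t\big|.$$

Then (18) follows from

$$\begin{aligned}
E_0\Big[\max_{t\in J}\big|M^*_t - M^\tau_t\big|\Big]
&\le 2\sqrt{E_0\big[(M^*_T - M^\tau_T)^2\big]} \\
&= 2\sqrt{E_0\Big[\sum_{t=1}^{T}\big((M^*_t - M^\tau_t) - (M^*_{t-1} - M^\tau_{t-1})\big)^2\Big]} \\
&= 2\sqrt{\sum_{t=1}^{T} E_0\big[(\Delta^*_t - \Delta^\tau_t)^2\big]} \\
&\le 2\sqrt{\sum_{t=1}^{T} E_0\Big[\big(E[g(X_{\tau^*},Y_{\tau^*}) \mid \Pi_t,Y_t] - E[g(X_\tau,Y_\tau) \mid \Pi_t,Y_t]\big)^2\Big]},
\end{aligned}$$

where the first inequality follows from the fact that $M^* - M^\tau$ is a martingale and from Doob's martingale inequality, and the first equality uses the orthogonality property of martingale differences (see p. 331 in [12]). To show the last inequality, recall that

$$\Delta^*_t - \Delta^\tau_t = \big(E[g(X_{\tau^*},Y_{\tau^*}) \mid \mathcal{F}^Y_t] - E[g(X_\tau,Y_\tau) \mid \mathcal{F}^Y_t]\big) - \big(E[g(X_{\tau^*},Y_{\tau^*}) \mid \mathcal{F}^Y_{t-1}] - E[g(X_\tau,Y_\tau) \mid \mathcal{F}^Y_{t-1}]\big);$$

then the last inequality can be shown by simple algebra and iterated expectation on $\mathcal{F}^Y_{t-1}$.

REFERENCES

[1] L. Andersen and M. Broadie. Primal-dual simulation algorithm for pricing multidimensional American options. Management Science, 50(9):1222-1234, 2004.
[2] A. Bain and D. Crisan. Fundamentals of Stochastic Filtering. Springer, 2008.
[3] D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 3rd edition, 2007.
[4] I. Florescu and F. Viens. Stochastic volatility: Option pricing using a multinomial recombining tree. Applied Mathematical Finance, 15(2):151-181, 2008.
[5] M. B. Haugh and L. Kogan. Pricing American options: A duality approach. Operations Research, 52(2):258-270, 2004.
[6] D. Lamberton and B. Lapeyre. Introduction to Stochastic Calculus Applied to Finance. Chapman & Hall/CRC, 2007.
[7] F. A. Longstaff and E. S. Schwartz. Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14(1):113-147, 2001.
[8] M. Ludkovski. A simulation approach to optimal stopping under partial information. Stochastic Processes and Applications, 119(12):2071-2087, 2009.
[9] H. Pham, W. Runggaldier, and A. Sellami. Approximation by quantization of the filter process and applications to optimal stopping problems under partial observation. Monte Carlo Methods and Applications, 11(1):57-81, 2005.
[10] B. R. Rambharat and A. E. Brockwell. Sequential Monte Carlo pricing of American-style options under stochastic volatility models. The Annals of Applied Statistics, 4(1):222-265, 2010.
[11] L. C. G. Rogers. Monte Carlo valuation of American options. Mathematical Finance, 12(3):271-286, 2002.
[12] S. Karlin and H. Taylor. A First Course in Stochastic Processes, 2nd edn. Academic Press, San Diego, 1975.
[13] J. Tsitsiklis and B. Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12(4):694-703, 2001.
[14] E. Zhou. Optimal stopping under partial observation: Near-value iteration. 2011. Forthcoming in IEEE Transactions on Automatic Control.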