Optimal Stopping of Partially Observable Markov Processes: A Filtering-Based Duality Approach

Fan Ye and Enlu Zhou, Member, IEEE

F. Ye and E. Zhou are with the Department of Industrial & Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801 USA (e-mail: {fanye2, enluzhou}@illinois.edu). This work was supported by the National Science Foundation under Grants ECCS-0901543 and CMMI-1130273, and by the Air Force Office of Scientific Research under YIP Grant FA-9550-12-1-0250.

Abstract: In this note we develop a numerical approach to the problem of optimal stopping of discrete-time continuous-state partially observable Markov processes (POMPs). Our motivation is to find approximate solutions that provide lower and upper bounds on the value function, such that the gap between the bounds provides a practical measure of the quality of the solutions. To this end, we develop a filtering-based duality approach, which relies on the martingale duality formulation of the optimal stopping problem and the particle filtering technique. We show that this approach complements an asymptotic lower bound derived from a suboptimal stopping time with an asymptotic upper bound on the value function. We carry out error analysis and illustrate the effectiveness of our method on an example of pricing American options under partial observation of stochastic volatility.

Index Terms: Partially observable, optimal stopping, particle filtering, martingale duality, American option pricing, stochastic volatility.

I. INTRODUCTION

Optimal stopping of a partially observable Markov process (POMP) is a sequential decision making problem under partial observation of the underlying state. This type of problem arises in a number of applications, including change point detection in a production line, launching of a new technology under incomplete information of the market, and selling of an asset or a financial derivative. Optimal stopping of a POMP is more challenging than its counterpart for a fully observable process, since the inference of the hidden state and the choice of an optimal action must be accomplished at the same time. As a special class of the partially observable Markov decision processes (POMDPs), optimal stopping of a POMP can be transformed to a fully observable optimal stopping problem by introducing a new state variable, often referred to as the filtering distribution. However, this concise representation does not reduce the complexity of the problem, because the filtering distribution is usually infinite dimensional when the unobserved state takes values in a continuous space. In addition, the problem also suffers from the so-called curse of dimensionality of dynamic programming that is common in solving continuous-state Markov decision processes.

Numerical solutions to optimal stopping of POMPs have been studied by [4], [8], [10], [9], mostly in the setting of pricing American options under partial observation of stochastic volatility. These methods can be viewed as a combination of dimension reduction on the filtering distribution and approximate dynamic programming, whereas [14] avoids the filtering step to approximate the value function. Some of the aforementioned approaches are proven to converge asymptotically to the true value function. However, in practice with a finite amount of computation resource, the difference between their approximate solutions and the true value function is usually unknown and hard to quantify.

In view of the lack of performance guarantee and the computational complexity of the aforementioned methods, in this note we focus on developing a lower-and-upper-bound approach with moderate computational cost. The motivation is that the gap between the lower and upper bounds gives an indication of the quality of the approximate solutions. To guarantee a high-quality approximate solution, we can increase the computation effort until the gap between the two bounds decreases to a desirable tolerance level. To this end, we propose a filtering-based duality approach that complements a suboptimal stopping time (hence an asymptotic lower bound) with an asymptotic upper bound on the value function.
Since our approach is not tied to a particular model and only involves Monte Carlo simulation, it can be generalized to any POMP as long as the particle filtering technique can be applied. Our method relies on the martingale duality formulation of the fully observable optimal stopping problem, which is proposed by [11] and [5] in the setting of pricing American options under constant volatility.

From the perspective of modeling fidelity versus computational complexity, it is not trivial to compare optimal stopping of POMPs with its counterpart in fully observable Markov processes. In particular, the difference of their value functions cannot be quantified in general and is problem dependent, so we are also interested in learning the features that influence this difference in the underlying probabilistic model. Indeed, as an example, our numerical experiments on pricing American options under partially observable stochastic volatility show that our asymptotic upper bound is strictly less than the option price of the model where the volatility is treated as directly observable, and the difference is especially obvious when the effect of the volatility is dominant. This in turn shows that our method provides a better criterion to evaluate the performance of a suboptimal policy in the partially observable model.

The rest of the note is organized as follows. In Section II, we describe the general problem formulation of optimal stopping of POMPs and the transformation to an equivalent fully observable optimal stopping problem. In Section III, we develop the filtering-based duality approach, and its error analysis and convergence result are presented in Section IV. We present some numerical examples in Section V, and finally conclude in Section VI. All the proofs are contained in the Appendix.

II. PROBLEM FORMULATION

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Consider a hidden Markov model $\{(X_t, Y_t), t = 0, 1, \dots, T\}$ satisfying the following equations

$$X_{t+1} = f(X_t, Z^1_{t+1}), \quad t = 0, 1, \dots, T-1; \tag{1a}$$
$$Y_0 = h_0(X_0, Z^2_0); \tag{1b}$$
$$Y_{t+1} = h(X_{t+1}, Y_t, Z^2_{t+1}), \quad t = 0, 1, \dots, T-1; \tag{1c}$$

where the unobserved state $X_t$ is in a continuous state space $\mathcal{X} \subseteq \mathbb{R}^{n_x}$, and the observation $Y_t$ is in a continuous observation space $\mathcal{Y} \subseteq \mathbb{R}^{n_y}$.
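The noises $\{(Z^1_t, Z^2_t), t = 1, \dots, T\}$, which are independent of the initial state $X_0$ and the initial observation $Y_0$, are independent random vectors with known distributions, but the components of each vector can be correlated. Equations (1a) and (1b)-(1c) are often referred to as the state equation and the observation equation, respectively. Note that $\{(X_t, Y_t)\}$ is a bivariate Markov process adapted to the filtration $\mathcal{F}_t \triangleq \sigma\{(X_i, Y_i);\, i = 0, \dots, t\}$. Let $\mathcal{J} \triangleq \{1, \dots, T\}$. Denote by $\mathcal{F}^Y_t \triangleq \sigma\{Y_0, \dots, Y_t\}$ the filtration generated by the processes (1b)-(1c). A random variable $\tau : \Omega \to \mathcal{J}$ is an $\mathcal{F}^Y$-stopping time if $\{\tau \le t\} \in \mathcal{F}^Y_t$ for every $t \in \mathcal{J}$. We define $\mathcal{T}^Y$ as the set of $\mathcal{F}^Y$-stopping times that take values in $\mathcal{J}$.

Assume that the initial $Y_0$ is a known constant, and the initial $X_0$ follows a known distribution $\pi_0$, which is derived from the historical data (including $Y_0$). We consider the finite-horizon partially observable optimal stopping problem

$$V_0(\pi_0, y_0) = \sup_{\tau \in \mathcal{T}^Y} E\left[g(\tau, X_\tau, Y_\tau) \mid X_0 \sim \pi_0, Y_0 = y_0\right], \tag{2}$$

where $g : \mathcal{J} \times \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is the reward function. In this setting the decision maker has access only to the state $Y_t$, so that her decision at time $t$ is made purely depending on the observation history up to time $t$, i.e., $Y_0, \dots, Y_t$. For convenience, in the following we use $g(X_t, Y_t)$ and $g(X_\tau, Y_\tau)$ in short for $g(t, X_t, Y_t)$ and $g(\tau, X_\tau, Y_\tau)$, respectively.

The optimal stopping problem of a POMP can be transformed to an equivalent fully observable optimal stopping problem by introducing a new state variable $\Pi_t$, often referred to as the filtering distribution, which is the conditional distribution of $X_t$ given the observations $Y_{0:t} \triangleq \{Y_0, \dots, Y_t\}$. More specifically, given a set $A$ in the Borel $\sigma$-algebra over $\mathcal{X}$, define

$$\Pi_t(A) \triangleq \mathrm{Prob}(X_t \in A \mid Y_0, \dots, Y_t), \quad t = 0, \dots, T.$$

Given a realization of the observations $y_{0:t} \triangleq \{y_0, \dots, y_t\}$, the probability density $\pi_t$ of the filtering distribution $\Pi_t$ evolves as follows:

$$\pi_t(x_t) = \frac{\int p(x_t, y_t \mid x_{t-1}, y_{t-1})\, \pi_{t-1}(x_{t-1})\, dx_{t-1}}{\int p(y_t \mid x_{t-1}, y_{t-1})\, \pi_{t-1}(x_{t-1})\, dx_{t-1}}, \quad t = 1, \dots, T, \tag{3}$$

where the conditional probability density functions $p(x_t, y_t \mid x_{t-1}, y_{t-1})$ and $p(y_t \mid x_{t-1}, y_{t-1})$ are induced by (1a), (1c), and the distributions of $Z^1_t$ and $Z^2_t$. Noticing that $\pi_t$ only depends on $\pi_{t-1}$, $y_{t-1}$, and $y_t$, and letting the realization $y_{0:t}$ be replaced by the random variables $Y_{0:t}$, we can abstractly rewrite the filtering recursion (3) as

$$\Pi_t = \Phi(\Pi_{t-1}, Y_{t-1}, Y_t), \quad t = 1, 2, \dots, T.$$

Then problem (2) can be transformed to an equivalent optimal stopping problem (see, e.g., Chapter 5 in [3]) with fully observable state $(\Pi_t, Y_t)$:

$$V_0(\pi_0, y_0) = \sup_{\tau \in \mathcal{T}^Y} E\left[\tilde{g}(\Pi_\tau, Y_\tau) \mid X_0 \sim \pi_0, Y_0 = y_0\right],$$

where

$$\tilde{g}(\Pi_t, Y_t) \triangleq E[g(X_t, Y_t) \mid \mathcal{F}^Y_t] = \int g(x_t, Y_t)\, \Pi_t(x_t)\, dx_t.$$

Theoretically, we can solve (2) following the dynamic programming recursion:

$$V_t(\Pi_t, Y_t) = \max\left(\tilde{g}(\Pi_t, Y_t),\, C_t(\Pi_t, Y_t)\right), \quad t = T, \dots, 1, \tag{4}$$

where $C_t(\Pi_t, Y_t)$ is the continuation value at time $t$ defined as

$$C_T(\Pi_T, Y_T) \triangleq \tilde{g}(\Pi_T, Y_T); \qquad C_t(\Pi_t, Y_t) \triangleq E[V_{t+1}(\Pi_{t+1}, Y_{t+1}) \mid \Pi_t, Y_t], \quad t = T-1, \dots, 0.$$

Here $E[\,\cdot \mid \Pi_t, Y_t]$ denotes the conditional expectation given the current information state $(\Pi_t, Y_t)$. Then $V_0 = C_0$ and the optimal stopping time is $\tau^* = \min\{t \in \mathcal{J} \mid \tilde{g}(\Pi_t, Y_t) \ge C_t(\Pi_t, Y_t)\}$. We also define its associated $t$-indexed stopping time $\tau^*_t$ for each $t \in \mathcal{J}$:

$$\tau^*_t \triangleq \min\{i \in \mathcal{J}_t \mid \tilde{g}(\Pi_i, Y_i) \ge C_i(\Pi_i, Y_i)\} \tag{5}$$

with $\mathcal{J}_t \triangleq \{t, t+1, \dots, T\}$. The above recursion also shows that $(\Pi_t, Y_t)$ are the sufficient statistics that determine the optimal stopping time.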
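The filtering recursion (3) is easiest to see when the hidden state takes finitely many values, so that the integrals become sums. The following Python sketch illustrates that special case only; it is not the continuous-state setting of this note. The function name `filter_step` and its arguments are hypothetical, and the factorization $p(x_t, y_t \mid x_{t-1}, y_{t-1}) = p(y_t \mid x_t, y_{t-1})\, p(x_t \mid x_{t-1})$ used in the code is the one implied by (1a) and (1c).

```python
def filter_step(prior, trans, lik):
    """One step of recursion (3) for a finite hidden state space.

    prior[i]    = pi_{t-1}(i), the filtering distribution at time t-1
    trans[i][j] = P(X_t = j | X_{t-1} = i), induced by the state equation
    lik[j]      = p(y_t | X_t = j, y_{t-1}), evaluated at the observed y_t
    Returns pi_t as a list of probabilities.
    """
    n = len(prior)
    # Numerator of (3): predict with the transition, then weight by the likelihood.
    unnormalized = [
        lik[j] * sum(prior[i] * trans[i][j] for i in range(n))
        for j in range(n)
    ]
    # Denominator of (3): the numerator summed over all values of x_t.
    evidence = sum(unnormalized)
    if evidence <= 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / evidence for u in unnormalized]


# Two hidden states, a sticky transition matrix, and an observation
# that is twice as likely under state 1 as under state 0.
posterior = filter_step(
    prior=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.2, 0.8]],
    lik=[0.3, 0.6],
)
```

In the continuous-state case the sums above become the integrals in (3), which is why the recursion generally cannot be evaluated exactly.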
The process $\{V_t \triangleq V_t(\Pi_t, Y_t)\}$ defined in (4) is called the Snell envelope process (see, e.g., Chapter 2 in [6]) of the process $\{\tilde{g}(\Pi_t, Y_t)\}$, which is the smallest $\mathcal{F}^Y$-supermartingale that dominates $\tilde{g}$ in the sense that $V_t(\Pi_t, Y_t) \ge \tilde{g}(\Pi_t, Y_t)$. In particular, by shifting the time index in (2) we can interpret $V_t$ as

$$V_t(\pi_t, y_t) = \sup_{\tau \in \mathcal{T}^Y,\, \tau \ge t} E[g(X_\tau, Y_\tau) \mid X_t \sim \pi_t, Y_t = y_t] = E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid X_t \sim \pi_t, Y_t = y_t], \quad t = 1, \dots, T. \tag{6}$$

However, it is often impossible to solve the problem exactly following (4) due to two main difficulties. One is that in general the filtering distribution $\Pi_t$ is infinite dimensional and the filtering recursion (3) cannot be computed exactly. The other difficulty lies in the accurate estimation of the continuation value $C_t(\Pi_t, Y_t)$ that leads to the optimal stopping time $\tau^*$. So we develop an approximation method in the next section.

III. FILTERING-BASED MARTINGALE DUALITY APPROACH

In this section, we construct a dual problem to the original optimal stopping of POMPs, and develop a numerical method that yields an asymptotic upper bound on the value function. Our dual formulation is a straightforward extension of the dual formulation for the optimal stopping problem proposed in [11], [5], and [1], obtained by replacing the filtration with $\mathcal{F}^Y$.

Theorem 1 (cf. (5) in [1]). Let $\mathcal{M}$ represent the space of $\mathcal{F}^Y$-adapted martingales $\{M_t\}$ with $M_0 = 0$ and $\sup_{t \in \mathcal{J}} E|M_t| < \infty$. Then

$$V_0(\pi_0, y_0) = \min_{M \in \mathcal{M}} E\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M_t\right) \,\Big|\, X_0 \sim \pi_0, Y_0 = y_0\right]. \tag{7}$$

The optimal martingale $M^*$ that achieves the minimum on the right hand side of (7) is of the form

$$M^*_t = \sum_{i=1}^{t} \Delta^*_i, \tag{8}$$

where $\{\Delta^*_t\}$ is the martingale difference sequence defined as

$$\Delta^*_t \triangleq E[V_t \mid \mathcal{F}^Y_t] - E[V_t \mid \mathcal{F}^Y_{t-1}], \quad t \in \mathcal{J}. \tag{9}$$

In addition, the following equality holds pathwise in the almost sure sense, i.e.,

$$V_0(\pi_0, y_0) = \max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^*_t\right) \quad \text{a.s.}$$

The proof of Theorem 1 follows the same line as in [1] and hence is omitted here. Theorem 1 characterizes a strong duality relation between the primal problem (2) and its dual problem on the right side of (7); this duality suggests that any $\mathcal{F}^Y$-adapted martingale $M$ can lead to an upper bound on $V_0(\pi_0, y_0)$ and that the optimal martingale (8) is derived from the Doob-Meyer decomposition of the supermartingale $V$.
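The pathwise equality in Theorem 1 can be checked by brute force on a problem small enough to solve exactly. The Python sketch below uses a fully observed binomial tree, which is an assumption made purely for illustration: there is no hidden state and no filtering here. The function name `snell_envelope_check` and all parameter values are hypothetical. It computes the Snell envelope by the recursion (4) with $V_0 = C_0$, builds the martingale (8)-(9), and evaluates $\max_t (g_t - M^*_t)$ on every path.

```python
from itertools import product


def snell_envelope_check(T=4, s0=36.0, K=40.0, u=1.2, d=0.9, p=0.5):
    """Exact check of the pathwise duality on a fully observed binomial tree.

    Returns (V0, dual_values), where dual_values[k] is max_t (g_t - M*_t)
    along the k-th of the 2**T paths. Exercise is allowed at t = 1, ..., T.
    """
    def g(t, n_up):
        # Undiscounted put payoff after n_up up-moves out of t moves.
        s = s0 * (u ** n_up) * (d ** (t - n_up))
        return max(K - s, 0.0)

    # Backward induction for the value V and the continuation value C.
    V = [[0.0] * (t + 1) for t in range(T + 1)]
    C = [[0.0] * (t + 1) for t in range(T + 1)]
    for n in range(T + 1):
        C[T][n] = g(T, n)
        V[T][n] = g(T, n)
    for t in range(T - 1, -1, -1):
        for n in range(t + 1):
            C[t][n] = p * V[t + 1][n + 1] + (1.0 - p) * V[t + 1][n]
            # No exercise at t = 0, so V_0 = C_0.
            V[t][n] = max(g(t, n), C[t][n]) if t >= 1 else C[t][n]

    dual_values = []
    for path in product((0, 1), repeat=T):
        n, m_star, best = 0, 0.0, float("-inf")
        for t in range(1, T + 1):
            prev_n = n
            n += path[t - 1]
            # Martingale increment V_t - E[V_t | F_{t-1}] = V_t - C_{t-1}.
            m_star += V[t][n] - C[t - 1][prev_n]
            best = max(best, g(t, n) - m_star)
        dual_values.append(best)
    return V[0][0], dual_values


v0, duals = snell_envelope_check()
```

Every entry of `duals` agrees with `v0` up to floating-point error, which is the almost-sure equality of Theorem 1 in this toy setting. Replacing `m_star` by any other martingale started at zero gives path values whose average is at least `v0`.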
In particular, we can rewrite (9) as

$$\Delta^*_t = E[V_t \mid \Pi_t, Y_t] - E[V_t \mid \Pi_{t-1}, Y_{t-1}] \tag{10a}$$
$$\phantom{\Delta^*_t} = E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \Pi_t, Y_t] - E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \Pi_{t-1}, Y_{t-1}]. \tag{10b}$$

Note that it is impossible to compute the optimal martingale $M^*$, since the martingale difference term (10a) (or (10b)) involves the intractable filtering distribution $\Pi_t$ and the Snell envelope process $V_t$ (or the optimal stopping time $\tau^*_t$). Therefore, we need to introduce approximation schemes to address both aspects. On the one hand, the intractable filtering distribution $\Pi_t$ can be approximated by a discrete distribution using particle filtering, which will be stated in Section III-A. On the other hand, (10a) and (10b) suggest that we approximate $\Delta^*_t$ using either approximate value functions of $V_t$ or suboptimal $\mathcal{F}^Y$-stopping times that approximate $\tau^*_t$.

In addition, some other heuristic constructions can be considered. For example, we can take $\Delta_t = E[U_t(X_t, Y_t) \mid \mathcal{F}^Y_t] - E[U_t(X_t, Y_t) \mid \mathcal{F}^Y_{t-1}]$, where $U_t(X_t, Y_t)$ is the value function of the corresponding optimal stopping problem with fully observable state $(X_t, Y_t)$:

$$U_t(x_t, y_t) = \sup_{\kappa \in \mathcal{T}_t} E[g(X_\kappa, Y_\kappa) \mid X_t = x_t, Y_t = y_t], \tag{11}$$

where $\mathcal{T}_t$ is the set of $\mathcal{F}$-stopping times $\kappa$ that take values in $\mathcal{J}_t$; or equivalently we can take $\Delta_t = E[g(X_{\kappa^*_t}, Y_{\kappa^*_t}) \mid \Pi_t, Y_t] - E[g(X_{\kappa^*_t}, Y_{\kappa^*_t}) \mid \Pi_{t-1}, Y_{t-1}]$, where $\kappa^*_t$ is the optimal $\mathcal{F}$-stopping time for problem (11). Even if the explicit forms of $U_t$ and $\kappa^*_t$ are not known, their approximations can be used in $\Delta_t$ and its martingale difference property can still be preserved. The advantage of approximating $U_t$ or $\kappa^*_t$ is their simple structure as functions of only $(X_t, Y_t)$, whereas either $V_t$ or $\tau^*_t$ is a function of $(Y_0, \dots, Y_t)$. Thus, it may be easier to generate martingale difference terms based on approximate $U_t$ or $\kappa^*_t$, even though they may yield less optimal values.

In the rest of this section we focus on approximating $\Delta^*_t$ in (10b) by the following $\hat{\Delta}_t$ based on a fixed stopping time $\tau$ (see, e.g., (16) in Section III-B), which is either $\mathcal{F}^Y$- or $\mathcal{F}$-adapted:

$$\hat{\Delta}_t \triangleq E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi^m_t, Y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi^m_{t-1}, Y_{t-1}], \tag{12}$$

where $\tau_t$ is the $t$-indexed stopping time associated with $\tau$, and $\Pi^m_t$ (see details in Section III-A) is the approximate filtering distribution at time $t$ obtained by particle filtering (the superscript $m$ in $\Pi^m_t$ denotes the number of particles). A lower-case notation $\pi^m_t$ denotes the corresponding approximate filtering distribution based on a realization of the observations $y_{0:t}$. Then we define $\hat{M}$ as

$$\hat{M}_0 = 0; \qquad \hat{M}_t = \hat{\Delta}_1 + \dots + \hat{\Delta}_t, \quad t \in \mathcal{J}. \tag{13}$$

Incorporating the above ideas, we propose the following algorithm that yields an asymptotic upper bound on $V_0$.

Algorithm 1. Filtering-Based Martingale Duality Approach

Step 1. For $k = 1, 2, \dots, N$, do
- Generate a path of observations $y^{(k)}_{1:T}$ according to the processes (1a)-(1c) with initial condition $Y_0 = y_0$ and $X_0 \sim \pi_0$, and then follow Algorithm 2 (particle filtering) to generate the approximate filtering distributions $\pi^{m,(k)}_1, \dots, \pi^{m,(k)}_T$.
- For $t = 1, \dots, T$, use Algorithm 3 to compute $\tilde{\Delta}^{(k)}_t$, which is an approximation for
$$\hat{\Delta}^{(k)}_t = E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^{m,(k)}_t, y^{(k)}_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^{m,(k)}_{t-1}, y^{(k)}_{t-1}]. \tag{14}$$
- Sum the approximate martingale differences to obtain $\tilde{M}^{(k)}_t = \tilde{\Delta}^{(k)}_1 + \dots + \tilde{\Delta}^{(k)}_t$, $t = 1, \dots, T$.
- Evaluate $V^{(k)} = \max_{t \in \mathcal{J}} \left(\tilde{g}(\pi^{m,(k)}_t, y^{(k)}_t) - \tilde{M}^{(k)}_t\right)$.
end

Step 2. Set $\bar{V}^\tau_N = N^{-1} \sum_{k=1}^{N} V^{(k)}$.

$\bar{V}^\tau_N$ is an asymptotic upper bound on the value function $V_0(\pi_0, y_0)$. In the next two subsections, we will discuss how to generate the approximate filtering distribution using particle filtering via Algorithm 2 and how to compute the approximate martingale difference via Algorithm 3.

A. Particle Filtering

We approximate $\pi_t$ using particle filtering, which is a successful and versatile numerical method for solving nonlinear filtering problems. A good introduction to particle filtering can be found in the book [2].
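The particle filtering method approximates $\pi_t$ by a finite number (say $m$) of particles $\{x^{(1)}_t, \dots, x^{(m)}_t\}$, i.e., a discrete distribution $\pi^m_t$ written as follows

$$\pi^m_t = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_t}, \tag{15}$$

where $\delta$ is the Dirac measure. As the number of particles $m$ goes to infinity, it can be ensured that $\pi^m_t$ converges to $\pi_t$ in a certain sense.

Algorithm 2. Particle Filtering

Input: $X_0 \sim \pi_0$ and a sequence of observations $y_{0:T}$.
Output: The approximate filtering distributions $\pi^m_0, \dots, \pi^m_T$.

Step 1. Initialization: Set $t = 0$. Draw i.i.d. samples $x^{(1)}_0, \dots, x^{(m)}_0$ from the distribution $\pi_0$. Set $\pi^m_0 = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_0}$.

Step 2. For $t = 1, \dots, T$, do
- Prediction: For each $i = 1, \dots, m$, draw one sample $\tilde{x}^{(i)}_t$ from $P(X_t \mid X_{t-1} = x^{(i)}_{t-1})$.
- Bayes Updating: Compute
$$w^{(i)}_t = \frac{p(y_t \mid \tilde{x}^{(i)}_t, y_{t-1})}{\sum_{j=1}^{m} p(y_t \mid \tilde{x}^{(j)}_t, y_{t-1})}, \quad i = 1, \dots, m.$$
- Resampling: Draw i.i.d. samples $x^{(1)}_t, \dots, x^{(m)}_t$ from the discrete distribution $\mathrm{Prob}(\tilde{x}^{(i)}_t) = w^{(i)}_t$, $i = 1, \dots, m$. Set $\pi^m_t = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_t}$.
end

B. Approximate Martingale Difference

The remaining issue is how to compute the martingale difference (14). Throughout this subsection we assume a suboptimal stopping time $\tau$ of the form

$$\tau = \min\{t \in \mathcal{J} \mid g(X_t, Y_t) \ge \hat{C}_t(X_t, Y_t)\}, \tag{16}$$

where $\{\hat{C}_t, t \in \mathcal{J}\}$ is a sequence of approximate continuation functions of $U_t$. The approximate continuation functions $\hat{C}_t$ can be derived, for example, by regression on some basis functions as suggested by [7] and [13]. We choose an $\mathcal{F}$-stopping time $\tau$ of the form (16) only for ease of exposition, though Algorithm 3 can be adjusted using any other $\mathcal{F}$ (or $\mathcal{F}^Y$)-stopping time with the same principle.

Given a realization of observations $y_{0:T}$, we employ nested simulation to obtain the estimate of $\hat{\Delta}_t$ in (14). Note that $\pi^m_t$ in Algorithm 1 is of the form (15). Therefore,

$$\hat{\Delta}_t = \frac{1}{m} \sum_{i=1}^{m} E[g(X_{\tau_t}, Y_{\tau_t}) \mid X_t = x^{(i)}_t, Y_t = y_t] - \frac{1}{m} \sum_{i=1}^{m} E[g(X_{\tau_t}, Y_{\tau_t}) \mid X_{t-1} = x^{(i)}_{t-1}, Y_{t-1} = y_{t-1}],$$

where $\tau_t$ is the $t$-indexed stopping time associated with $\tau$, defined as $\tau_t = \min\{i \in \mathcal{J}_t \mid g(X_i, Y_i) \ge \hat{C}_i(X_i, Y_i)\}$. To estimate $E[g(X_{\tau_t}, Y_{\tau_t}) \mid x^{(i)}_t, y_t]$ (resp., $E[g(X_{\tau_t}, Y_{\tau_t}) \mid x^{(i)}_{t-1}, y_{t-1}]$), we generate $l$ subpaths that are stopped according to $\tau_t$ with initial condition $X_t = x^{(i)}_t, Y_t = y_t$ (resp., $X_{t-1} = x^{(i)}_{t-1}, Y_{t-1} = y_{t-1}$) for each $i$ and $t$, and we average $g(X_{\tau_t}, Y_{\tau_t})$ over these subpaths. So there are a total number of $ml$ subpaths generated to estimate each expectation term in (14).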
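The two building blocks described above, one pass of the particle filter and the nested estimate of a conditional expectation under a threshold rule of the form (16), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the callables `propagate`, `likelihood`, `step`, `reward`, and `cont` are hypothetical placeholders that the user supplies for a specific model, and multinomial resampling via `random.Random.choices` is one common choice.

```python
import math
import random


def particle_filter_step(particles, y_prev, y_new, propagate, likelihood, rng):
    """Prediction, Bayes updating and resampling for one time step.

    propagate(x, rng) draws X_t given X_{t-1} = x;
    likelihood(y_new, x, y_prev) evaluates p(y_t | x_t, y_{t-1}).
    Returns the m resampled particles that represent pi^m_t.
    """
    predicted = [propagate(x, rng) for x in particles]
    weights = [likelihood(y_new, x, y_prev) for x in predicted]
    if sum(weights) <= 0.0:
        raise ValueError("all particle weights are zero")
    # Multinomial resampling; choices() normalizes the weights internally.
    return rng.choices(predicted, weights=weights, k=len(particles))


def nested_estimate(particles, y, start, t, T, step, reward, cont, l, rng):
    """Average stopped reward over l subpaths per particle (m*l in total).

    step(x, y, rng) -> (x_next, y_next) simulates one transition;
    reward(s, x, y) plays the role of g and cont(s, x, y) of C_hat in (16).
    start is t - 1 or t, the time at which the particles and y live.
    """
    total = 0.0
    for x0 in particles:
        for _ in range(l):
            x, obs, s = x0, y, start
            while s < t:  # advance from time t-1 to time t if needed
                x, obs = step(x, obs, rng)
                s += 1
            # tau_t: first s >= t with g >= C_hat, forced to stop at T.
            while s < T and reward(s, x, obs) < cont(s, x, obs):
                x, obs = step(x, obs, rng)
                s += 1
            total += reward(s, x, obs)
    return total / (len(particles) * l)


# Toy usage: random-walk state observed with unit Gaussian noise.
rng = random.Random(0)
new_particles = particle_filter_step(
    particles=[0.0] * 200, y_prev=0.0, y_new=2.0,
    propagate=lambda x, r: x + r.gauss(0.0, 1.0),
    likelihood=lambda y, x, y_prev: math.exp(-0.5 * (y - x) ** 2),
    rng=rng,
)
```

The difference `nested_estimate(..., start=t, ...) - nested_estimate(..., start=t-1, ...)`, evaluated on the particles at times $t$ and $t-1$ respectively, is then an estimate of the martingale difference in (14).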
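The details of the nested simulation are presented below.

Algorithm 3. Estimation of $\hat{\Delta}_t$ Using Nested Simulation

Input: $y_{t-1}$, $y_t$, $\pi^m_{t-1} = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_{t-1}}$ and $\pi^m_t = \frac{1}{m} \sum_{i=1}^{m} \delta_{x^{(i)}_t}$ from Algorithm 1 and Algorithm 2.

(Step 1 and Step 2 are used to estimate $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}]$.)

Step 1. For $i = 1, \dots, m$, do
- Simulate $\{(x^{(ij)}_t, y^{(ij)}_t), \dots, (x^{(ij)}_T, y^{(ij)}_T)\}_{j=1}^{l}$ from the processes (1a)-(1c) with the initial condition $X_{t-1} = x^{(i)}_{t-1}$ and $Y_{t-1} = y_{t-1}$.
- To apply $\tau_t$ on these sample paths, find $t_{ij} = \min\{s \in \mathcal{J}_t : g(x^{(ij)}_s, y^{(ij)}_s) \ge \hat{C}_s(x^{(ij)}_s, y^{(ij)}_s)\}$.
- Set $b_i = \frac{1}{l} \sum_{j=1}^{l} g(x^{(ij)}_{t_{ij}}, y^{(ij)}_{t_{ij}})$.
end

Step 2. Set $G^{m,l}_{t-1,t} \triangleq \frac{1}{m} \sum_{i=1}^{m} b_i$, which is an unbiased estimator of $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}]$.

(Step 3 and Step 4 are used to estimate $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t]$.)

Step 3. For $i = 1, \dots, m$, do
If $g(x^{(i)}_t, y_t) \ge \hat{C}_t(x^{(i)}_t, y_t)$, i.e., $(x^{(i)}_t, y_t)$ is in the stopping region, set $b_i = g(x^{(i)}_t, y_t)$. Otherwise, repeat Step 1 with the initial condition $X_t = x^{(i)}_t$ and $Y_t = y_t$ to obtain $b_i$.
end

Step 4. Set $G^{m,l}_{t,t} \triangleq \frac{1}{m} \sum_{i=1}^{m} b_i$, which is an unbiased estimator of $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t]$.

Step 5. Set $\tilde{\Delta}_t = G^{m,l}_{t,t} - G^{m,l}_{t-1,t}$.

IV. ERROR ANALYSIS

In this section, we analyze the error bound and asymptotic convergence of our algorithm. To lighten the notation, we use $E_0[\,\cdot\,]$ to denote $E[\,\cdot \mid X_0 \sim \pi_0, Y_0 = y_0]$ in the rest of the note. The following assumption is used throughout our analysis.

Assumption 1.
i. $\|g\| \triangleq \max_{t \in \mathcal{J}} \|g(t, \cdot, \cdot)\|_\infty < \infty$.
ii. For any observation sequence $y_{0:T}$, $\sup_{x_t} p(y_t \mid x_t, y_{t-1}) < \infty$, $t \in \mathcal{J}$.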

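We first introduce an $\mathcal{F}^Y$-adapted martingale difference sequence $\{\Delta^\tau_t\}$ and martingale $\{M^\tau_t\}$ induced by an $\mathcal{F}$ (or $\mathcal{F}^Y$)-stopping time $\tau$:

$$\Delta^\tau_t = E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi_t, Y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi_{t-1}, Y_{t-1}],$$
$$M^\tau_0 \triangleq 0; \qquad M^\tau_t \triangleq \Delta^\tau_1 + \dots + \Delta^\tau_t, \quad t \in \mathcal{J}.$$

Since $M^\tau$ is an $\mathcal{F}^Y$-adapted martingale, $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^\tau_t)]$ is an upper bound on $V_0(\pi_0, y_0)$ by Theorem 1. Recall that the approximate martingale difference based on a realization of observations $y_{0:t}$ is

$$\hat{\Delta}_t = E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}].$$

In Algorithm 3 the empirical estimates of $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t]$ and $E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}]$ are denoted by $G^{m,l}_{t,t}$ and $G^{m,l}_{t-1,t}$, respectively. Therefore, we use $\tilde{\Delta}_t = G^{m,l}_{t,t} - G^{m,l}_{t-1,t}$ and $\tilde{M}_t = \sum_{i=1}^{t} \tilde{\Delta}_i$ to approximate $\hat{\Delta}_t$ and $\hat{M}_t$. Instead of obtaining $\max_{t \in \mathcal{J}} (\tilde{g}(\pi_t, y_t) - M^\tau_t)$ exactly along each path of the observations $y_{0:T}$, we compute $\max_{t \in \mathcal{J}} (\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t)$. Note that conditional on a fixed observation sequence, the former term is a constant, while the latter one is a random term due to sampling. The difference between these two terms is due to two sources of noise. One is the difference between the deterministic density $\pi_t$ and the random measure $\pi^m_t$, and this gap will go to zero (in expectation) by increasing the number of particles $m$ under Assumption 1. The other is the variability of the nested (Monte Carlo) simulation, which can be eliminated by increasing the number of sample paths $l$.

We will show in the next theorem (with proof in the Appendix) that $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi^m_t, Y_t) - \tilde{M}_t)]$ converges to $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^\tau_t)]$ when the particle number $m$ increases to infinity. Hence, $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi^m_t, Y_t) - \tilde{M}_t)]$ is an asymptotic (as $m \to \infty$) upper bound on $V_0(\pi_0, y_0)$. Moreover, the gap between $E_0[\max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^\tau_t)]$ and $V_0(\pi_0, y_0)$ is purely due to the suboptimal stopping time $\tau$.

Theorem 2. Suppose $\tau$ is an $\mathcal{F}$ (or $\mathcal{F}^Y$)-stopping time. Then

$$\lim_{m \to \infty} E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi^m_t, Y_t) - \tilde{M}_t\right)\right] = E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right]. \tag{17}$$

Moreover, we have the following inequalities:

$$E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right] - V_0(\pi_0, y_0) \le 2 \sqrt{\sum_{t=1}^{T} E_0\left[(\Delta^*_t - \Delta^\tau_t)^2\right]} \le 2 \sqrt{\sum_{t=1}^{T} E_0\left[\left(E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \Pi_t, Y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi_t, Y_t]\right)^2\right]}. \tag{18}$$

From (17), the output $\bar{V}^\tau_N$ in Algorithm 1 is an asymptotic (as the sample path number $N \to \infty$ and the particle number $m \to \infty$) upper bound on the true value function $V_0$.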
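Once the approximate martingale differences and the filtered rewards are available along each observation path, the last two bullets of Step 1 and Step 2 of Algorithm 1 reduce to a running sum and a maximum. The Python sketch below shows only this aggregation; the function name `dual_upper_bound` and its list-of-lists inputs are hypothetical, and the standard error is the usual sample standard error across the $N$ paths.

```python
def dual_upper_bound(g_tilde_paths, delta_paths):
    """Aggregate pathwise dual values into the estimate of Algorithm 1.

    g_tilde_paths[k][t-1] = filtered reward at time t on path k
    delta_paths[k][t-1]   = approximate martingale difference at time t
    Returns (mean, standard_error) of max_t (g_tilde_t - M_t) over paths.
    """
    values = []
    for g_path, d_path in zip(g_tilde_paths, delta_paths):
        martingale, best = 0.0, float("-inf")
        for g_t, d_t in zip(g_path, d_path):
            martingale += d_t  # M_t = Delta_1 + ... + Delta_t
            best = max(best, g_t - martingale)
        values.append(best)
    n = len(values)
    mean = sum(values) / n
    if n > 1:
        var = sum((v - mean) ** 2 for v in values) / (n - 1)
        std_err = (var / n) ** 0.5
    else:
        std_err = float("nan")
    return mean, std_err


# Two toy paths with three exercise dates each.
estimate, std_err = dual_upper_bound(
    g_tilde_paths=[[1.0, 2.0, 3.0], [3.0, 1.0, 0.0]],
    delta_paths=[[0.0, 0.0, 0.0], [1.0, -1.0, 0.0]],
)
```

On the first toy path the martingale is identically zero, so the path value is the largest reward; on the second, the positive first increment lowers the value attained at the first date.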
According to (18), a large $m$ will lead to a tight upper bound provided that the martingale $M^\tau$ induced by the stopping time $\tau$ does not differ too much from the optimal $M^*$, or more intuitively, the suboptimal stopping time $\tau$ does not differ too much from the optimal $\tau^*$.

V. NUMERICAL EXAMPLES

We apply our method to price American put options under stochastic volatility. Following the model in [10], we consider a $d_S$-dimensional process of asset prices $\{S_t, t = 0{:}T\}$:

$$S^i_{t+1} = S^i_t \exp\left(\left(r - \frac{(\sigma^i_{t+1})^2}{2}\right)\delta + \sigma^i_{t+1} \sqrt{\delta}\, Z^{i,1}_{t+1}\right), \quad i = 1, \dots, d_S, \tag{19}$$

where $r$ is the constant interest rate, $\delta$ is the time period between the equally-spaced time points, $\{Z^{i,1}_t, t = 1{:}T, i = 1, \dots, d_S\}$ are independent sequences of Gaussian random variables with $Z^{i,1}_t \sim N(0, 1)$, and the volatility $\sigma^i_t \triangleq \exp(X^i_t)$ is a deterministic function of a $d_X$ $(= d_S)$-dimensional process $\{X_t, t = 0{:}T\}$ that evolves as a discretized Ornstein-Uhlenbeck process:

$$X^i_{t+1} = X^i_t e^{-\lambda_i \delta} + \theta_i \left(1 - e^{-\lambda_i \delta}\right) + \gamma_i \sqrt{\frac{1 - e^{-2\lambda_i \delta}}{2\lambda_i}}\, Z^{i,2}_{t+1}, \quad i = 1, \dots, d_X, \tag{20}$$

where the constant $\theta_i$ is the mean reversion value, the positive constant $\lambda_i$ is the mean reversion rate, the constant $\gamma_i$ is a measure of the process volatility, and $\{Z^{i,2}_t, t = 1{:}T, i = 1, \dots, d_X\}$ are independent sequences of Gaussian random variables with $Z^{i,2}_t \sim N(0, \mu_i^2)$, which are also independent of $Z^{i,1}$. Here $\mu_i$ is used to control the observation noise. For simplicity, in our numerical experiments we use $\lambda_i = \lambda$, $\theta_i = \theta$, $\gamma_i = \gamma$, $\mu_i = \mu$ for all $i = 1, \dots, d_X$. Assume that only the asset price is observed, and exercise opportunities take place at $t = 1, \dots, T$. We consider the put option on the minimum of $d_S$ assets, i.e., the payoff function is of the form

$$g(t, S_t) = \max\left(e^{-r \delta t} \left(K - \min\{S^1_t, \dots, S^{d_S}_t\}\right),\, 0\right).$$

In the rest of this section, exercise policy simply means stopping time in the general optimal stopping problem.

Remark 1. In this example, the conditional probability density function is

$$p(S_t \mid X_t, S_{t-1}) = \prod_{i=1}^{d_S} p(S^i_t \mid X^i_t, S^i_{t-1}),$$

where

$$p(S^i_t \mid X^i_t, S^i_{t-1}) = \frac{1}{S^i_t \sqrt{2\pi \exp(2X^i_t)\, \delta\, \mu^2}} \exp\left(-\frac{\left(\ln(S^i_t / S^i_{t-1}) - \left(r - \exp(2X^i_t)/2\right)\delta\right)^2}{2 \exp(2X^i_t)\, \delta\, \mu^2}\right).$$

It can be shown that $p(S_t \mid X_t, S_{t-1})$ satisfies Assumption 1(ii) and that Assumption 1(i) is also trivially satisfied.
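For concreteness, one path of the one-asset version of the model (19)-(20) and the discounted put payoff can be simulated with the Python standard library as sketched below. The function names `simulate_sv_path` and `put_payoff` are hypothetical; the noise distributions follow the text literally ($Z^{1,1}_t \sim N(0, 1)$ and $Z^{1,2}_t \sim N(0, \mu^2)$), and the parameter values in the usage lines are taken from parameter set 3 and the caption of Table II.

```python
import math
import random


def simulate_sv_path(T, s0, x0, r, delta, theta, lam, gamma, mu, rng):
    """Simulate one path of (19)-(20) for a single asset.

    Returns (S, X): price and log-volatility lists indexed by t = 0, ..., T.
    """
    decay = math.exp(-lam * delta)
    scale = gamma * math.sqrt((1.0 - math.exp(-2.0 * lam * delta)) / (2.0 * lam))
    S, X = [s0], [x0]
    for _ in range(T):
        # Log-volatility update (20) with Z^2 ~ N(0, mu^2).
        x_next = X[-1] * decay + theta * (1.0 - decay) + scale * rng.gauss(0.0, mu)
        sigma = math.exp(x_next)
        # Price update (19) with Z^1 ~ N(0, 1).
        z1 = rng.gauss(0.0, 1.0)
        drift = (r - 0.5 * sigma ** 2) * delta
        s_next = S[-1] * math.exp(drift + sigma * math.sqrt(delta) * z1)
        X.append(x_next)
        S.append(s_next)
    return S, X


def put_payoff(t, prices, K, r, delta):
    """Discounted payoff of the put on the minimum of the given prices."""
    return max(math.exp(-r * delta * t) * (K - min(prices)), 0.0)


# Parameter set 3 of Table I with the settings of Table II.
rng = random.Random(2013)
S, X = simulate_sv_path(T=10, s0=36.0, x0=math.log(0.2), r=0.05, delta=0.1,
                        theta=math.log(0.2), lam=0.5, gamma=1.0, mu=0.3, rng=rng)
payoffs = [put_payoff(t, [S[t]], K=40.0, r=0.05, delta=0.1) for t in range(1, 11)]
```

Only the list `S` would be visible to the option holder; the list `X` is the hidden state that the particle filter has to infer from the observed prices.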
Since the stochastic volatility cannot be directly observed in reality but can be partially observed through inference from the observed asset price, pricing American options under the above model (19)-(20) falls into the framework of optimal stopping of POMPs. We illustrate our algorithm through a series of numerical experiments with $d_S = 1$ (one asset) and $d_S = 2$ (two assets). In particular, we are interested in how the variance of the volatility (corresponding to the parameters $(\theta, \lambda, \gamma)$) and the observation noise (corresponding to the parameter $\mu$) influence the price difference due to the difference between the fully observable and partially observable volatilities. We list the parameter sets in Table I.

To compute option prices under both full and partial observations, we implement our algorithm as well as the Least-Squares Monte Carlo (LSMC) method of [7], which provides suboptimal exercise policies, and the primal-dual (PD) method of [1], which parallels our method in the fully observable models. The numerical results of the option prices under different parameter sets are listed in Table II (for one asset) and Table III (for two assets), where LB represents the lower bound obtained by the LSMC method for the fully/partially observable model with the following two sets of basis functions for the one-asset and two-asset problems, respectively:

$$H_1 = \{L_0(S^1),\, L_0^2(S^1),\, L_1(S^1),\, L_1^2(S^1),\, L_0(S^1) L_1(S^1),\, 1\},$$
$$H_2 = \{L_0(S^1),\, L_0^2(S^1),\, L_0(S^2),\, L_0^2(S^2),\, L_0(S^1) L_0(S^2),\, L_2(S^1, S^2),\, L_2^2(S^1, S^2),\, 1\},$$

where $L_0(x) = x$, $L_1(x) = \max\{K - x, 0\}$ and $L_2(x, y) = \max\{K - \min\{x, y\}, 0\}$. Please note that the basis functions only depend on the asset price $S$, not the volatility $\exp(X)$, so the suboptimal policy is $\mathcal{F}^Y$-adapted and the results are guaranteed to be lower bounds for the partially observable model. In the tables, UB represents the corresponding upper bound yielded by our filtering-based duality method for the partially observable model, and Full.ŨB represents the corresponding upper bound yielded by the PD method for the fully observable model. It is clear that we can improve the exercise policy for the fully observable model by employing more basis functions that use the information of the volatility $\exp(X)$: Full.LB and Full.UB are the lower bound and upper bound for the fully observable model, still obtained by the LSMC method and PD method with additional basis functions for each problem:

$$H^{add}_1 = \{L_0(e^{X^1}),\, L_0(e^{X^1}) L_1(S^1)\},$$
$$H^{add}_2 = \{L_0(e^{X^1}),\, L_0^2(e^{X^1}),\, L_0(e^{X^2}),\, L_0^2(e^{X^2}),\, L_0(e^{X^1}) L_2(S^1, S^2),\, L_0(e^{X^2}) L_2(S^1, S^2)\}.$$

Each entry in Table II and Table III shows the sample average and the standard error (in parentheses) of the numerical results of 20 independent runs using the following procedure:
- We implement the LSMC method with 50000 sample paths to obtain a suboptimal policy $\tau$, and then apply this policy on another independent set of 50000 paths to get the lower bound LB.
- The dual upper bound UB is obtained by implementing Algorithm 1 using the suboptimal policy $\tau$ with the number of sample paths $N = 500$, number of particles $m = 500$, and number of subpaths $l = 10$.
- To investigate the option prices under the fully observable stochastic volatility, we use the PD method with 500 sample paths and 5000 subpaths in nested simulation (which is equal to $ml$) to obtain an upper bound Full.ŨB, since the policy $\tau$ obtained before is also a suboptimal policy for the fully observable model.
- Except for the new sets of basis functions, the LSMC and PD methods are implemented exactly the same way as before to generate another set of lower bound Full.LB and upper bound Full.UB for the fully observable model.

In practice we often use the average of LB and UB, and the average of Full.LB and Full.UB, as estimates of the option prices for the partially observable and fully observable problems, respectively.
TABLE I
PARAMETER SETS

| # | $(\theta, \lambda, \gamma)$ | $\mu$ |
|---|---|---|
| 1 | (log(0.1), 1.0, 1.0) | 0.3 |
| 2 | (log(0.1), 1.0, 1.0) | 1.0 |
| 3 | (log(0.2), 0.5, 1.0) | 0.3 |
| 4 | (log(0.2), 0.5, 1.0) | 1.0 |
| 5 | (log(0.2), 1.5, 1.0) | 0.3 |
| 6 | (log(0.2), 1.5, 1.0) | 1.0 |
| 7 | (log(0.2), 1.0, 0.5) | 0.3 |
| 8 | (log(0.2), 1.0, 0.5) | 1.0 |
| 9 | (log(0.3), 2.0, 0.3) | 0.3 |
| 10 | (log(0.3), 2.0, 0.3) | 1.0 |

TABLE II
AMERICAN PUT OPTION PRICES ON ONE ASSET ($r = 0.05$, $K = 40$, $\delta = 0.1$, $T = 10$, $S_0 = 36$, $X_0 = \theta$)
Columns LB and UB: volatility not observable. Columns Full.ŨB, Full.LB and Full.UB: volatility directly observable.

| # | LB | UB | Full.ŨB | Full.LB | Full.UB |
|---|---|---|---|---|---|
| 1 | 3.820(0.000) | 3.820(0.000) | 3.825(0.001) | 3.820(0.000) | 3.821(0.000) |
| 2 | 3.853(0.001) | 3.887(0.001) | 3.954(0.003) | 3.905(0.002) | 3.912(0.001) |
| 3 | 3.892(0.001) | 4.019(0.003) | 4.321(0.005) | 4.197(0.003) | 4.209(0.001) |
| 4 | 5.009(0.006) | 5.216(0.005) | 5.368(0.009) | 5.297(0.005) | 5.328(0.001) |
| 5 | 3.881(0.001) | 3.898(0.001) | 3.995(0.004) | 3.928(0.002) | 3.938(0.001) |
| 6 | 4.842(0.003) | 4.935(0.002) | 5.028(0.003) | 4.973(0.004) | 4.997(0.001) |
| 7 | 3.869(0.001) | 3.870(0.000) | 3.876(0.001) | 3.871(0.001) | 3.872(0.000) |
| 8 | 4.632(0.002) | 4.653(0.001) | 4.704(0.002) | 4.679(0.003) | 4.689(0.001) |
| 9 | 4.010(0.001) | 4.022(0.001) | 4.049(0.001) | 4.030(0.001) | 4.044(0.001) |
| 10 | 5.881(0.003) | 5.902(0.001) | 5.907(0.001) | 5.896(0.005) | 5.904(0.001) |

TABLE III
AMERICAN PUT OPTION PRICES ON THE MINIMUM OF TWO ASSETS ($r = 0.05$, $K = 40$, $\delta = 0.1$, $T = 10$, $S_0 = (36, 36)$, $X_0 = (\theta, \theta)$)
Columns LB and UB: volatility not observable. Columns Full.ŨB, Full.LB and Full.UB: volatility directly observable.

| # | LB | UB | Full.ŨB | Full.LB | Full.UB |
|---|---|---|---|---|---|
| 1 | 4.027(0.002) | 4.032(0.001) | 4.068(0.002) | 4.039(0.001) | 4.043(0.001) |
| 2 | 5.004(0.006) | 5.147(0.004) | 5.256(0.006) | 5.143(0.005) | 5.222(0.003) |
| 3 | 5.274(0.005) | 5.378(0.002) | 5.565(0.004) | 5.467(0.004) | 5.489(0.001) |
| 4 | 8.045(0.006) | 8.171(0.004) | 8.289(0.006) | 8.188(0.010) | 8.268(0.003) |
| 5 | 4.641(0.002) | 4.782(0.001) | 4.918(0.005) | 4.833(0.006) | 4.870(0.001) |
| 6 | 7.531(0.006) | 7.638(0.002) | 7.723(0.007) | 7.606(0.007) | 7.704(0.002) |
| 7 | 4.429(0.002) | 4.456(0.001) | 4.514(0.001) | 4.477(0.002) | 4.500(0.001) |
| 8 | 6.984(0.004) | 7.042(0.003) | 7.074(0.004) | 6.997(0.007) | 7.080(0.001) |
| 9 | 5.417(0.002) | 5.428(0.001) | 5.449(0.001) | 5.431(0.003) | 5.447(0.001) |
| 10 | 9.084(0.006) | 9.130(0.002) | 9.138(0.002) | 9.071(0.009) | 9.133(0.002) |

The numerical results are divided into two categories: the first six rows report the numerical results under dominant volatility effects, i.e., $\gamma$ is comparatively large and $\lambda$ is comparatively small; the last four rows report the results under moderate/weak volatility effects. It can be seen from the tables that [Full.LB, Full.UB] is usually a tighter interval than [LB, Full.ŨB] for the fully observable option price, since more information is used to determine a better exercise policy.

To differentiate the option prices under full and partial observations of stochastic volatility, [10] pointed out that the partial observation of stochastic volatility has an impact especially when the effect of the volatility (i.e., $\frac{\gamma^2}{2\lambda}$) is high. Our numerical results also support their viewpoint in terms of the differences between UB and Full.ŨB, which demonstrate the effectiveness of introducing the filtering step. In particular, it can be observed that we can reduce relatively more overpricing for problems with dominant volatility (i.e., the first category). Considering the differences between LB and Full.UB, partially observable and fully observable option prices have relatively small gaps under moderate/weak volatility effects compared with the gaps in the first category.
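The claim about reduced overpricing can be checked directly from the one-asset results. The short Python sketch below copies the UB and Full.ŨB columns of Table II and averages their difference within each category; the variable and function names are hypothetical, and standard errors are ignored for simplicity.

```python
# (UB, fully observable PD bound with the price-only basis) from Table II.
table2 = [
    (3.820, 3.825), (3.887, 3.954), (4.019, 4.321), (5.216, 5.368),
    (3.898, 3.995), (4.935, 5.028),                  # rows 1-6: dominant
    (3.870, 3.876), (4.653, 4.704), (4.022, 4.049), (5.902, 5.907),
]                                                    # rows 7-10: moderate/weak


def mean_gap(rows):
    """Average difference between the fully observable bound and UB."""
    return sum(full_ub - ub for ub, full_ub in rows) / len(rows)


dominant_gap = mean_gap(table2[:6])
moderate_gap = mean_gap(table2[6:])
```

With these numbers the average difference in the first category is several times larger than in the second, which is the pattern described in the text.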
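Larger observation noise $\mu$ challenges the performance of the suboptimal exercise policy and also deteriorates the performance of particle filtering, so it generally increases the gap between Full.LB and Full.UB and the gap between LB and UB.

Compared with [10] and [8], whose approaches provide asymptotic lower bounds on the option prices, our main contribution is to provide an asymptotic upper bound on the option price, which is less than or similar to the lower bound (Full.LB) of the corresponding fully observable option price in the first category. Hence, our method provides a better criterion to evaluate the performance of LB: the smaller the gap between UB and LB, the better the bounds. If the gap between UB and LB is small enough, they can both be regarded as approximate option prices under partial observation. Otherwise, improvement of the exercise policy should be considered.

VI. CONCLUSION

In this note we propose a numerical approach to solve for the value function of the partially observable optimal stopping problem. We represent the value function as a solution of a dual minimization problem, based on which we develop an algorithm that complements a suboptimal stopping time with an asymptotic upper bound on the value function. Our approach provides a practical way to judge whether more computational effort is needed to improve the quality of the approximate solution. We apply our approach to price American put options in stochastic volatility models, with the realistic assumption that the volatility cannot be directly observed but can be inferred from the asset prices. The numerical results confirm a higher price of the option if we alternatively assume that the volatility is directly observable. The price difference is more significant when the effect of volatility is high, indicating the importance of taking the partial observability into account.

APPENDIX
PROOF OF THEOREM 2

We need the following proposition for the proof of the theorem.

Proposition 1 (Corollary 10.28, [2]). Let $\pi^m_0, \dots, \pi^m_T$ be the random measures generated by Algorithm 2 for the observation sequence $y_{0:T}$. Suppose that the following assumption holds: $\|f\|_\infty < \infty$ and $\sup_{x_t} p(y_t \mid x_t, y_{t-1}) < \infty$, $t = 1, \dots, T$. Then

$$E\left[\left(\int f(x_t)\, \pi^m_t(x_t)\, dx_t - \int f(x_t)\, \pi_t(x_t)\, dx_t\right)^2\right] \le \frac{c_t \|f\|^2_\infty}{m}, \quad t = 0, \dots, T,$$

where the constant $c_t$ does not depend on $m$ (but it does depend on $t$ and $y_{0:t}$). In particular, $c_0 = 1$.

Proof of Theorem 2: We first prove (17). Given a sample path of the observations $y_0, \dots, y_T$, the difference of $\tilde{g}(\pi_t, y_t)$ and $\tilde{g}(\pi^m_t, y_t)$ is

$$\vartheta_t \triangleq \int g(x_t, y_t)\, \pi_t(x_t)\, dx_t - \int g(x_t, y_t)\, \pi^m_t(x_t)\, dx_t.$$

Guaranteed by Proposition 1, $E[|\vartheta_t|] \le \sqrt{E[(\vartheta_t)^2]} \le \sqrt{c_t / m}\, \|g\|$ for some constant $c_t$. The difference between $M^\tau_t$ and $\tilde{M}_t$ is the sum of the differences between $\Delta^\tau_t$ and $\tilde{\Delta}_t$:

$$\Delta^\tau_t - \tilde{\Delta}_t = \chi_{t,t} - \chi_{t-1,t} + \varepsilon^{m,l}_{t,t} - \varepsilon^{m,l}_{t-1,t},$$

where

$$\chi_{t,t} \triangleq E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi_t, y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t],$$
$$\chi_{t-1,t} \triangleq E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi_{t-1}, y_{t-1}] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}],$$
$$\varepsilon^{m,l}_{t,t} \triangleq E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_t, y_t] - G^{m,l}_{t,t},$$
$$\varepsilon^{m,l}_{t-1,t} \triangleq E[g(X_{\tau_t}, Y_{\tau_t}) \mid \pi^m_{t-1}, y_{t-1}] - G^{m,l}_{t-1,t}.$$

The first two errors are filtering errors, since we can rewrite $\chi_{t,t}$ as

$$\chi_{t,t} = E\left[\sum_{j=t}^{T} g(X_j, Y_j)\, 1_{\{\tau_t = j\}} \,\Big|\, \pi_t, y_t\right] - E\left[\sum_{j=t}^{T} g(X_j, Y_j)\, 1_{\{\tau_t = j\}} \,\Big|\, \pi^m_t, y_t\right] = \int I_t(x_t, y_t)\, \pi_t(x_t)\, dx_t - \int I_t(x_t, y_t)\, \pi^m_t(x_t)\, dx_t. \tag{21}$$

$I_t(x_t, y_t)$ is defined as the integrand of $E[\sum_{j=t}^{T} g(X_j, Y_j)\, 1_{\{\tau_t = j\}} \mid \pi_t, y_t]$, i.e.,

$$I_t(x_t, y_t) \triangleq g(x_t, y_t)\, 1_{\{\tau_t = t\}} + \sum_{j=t+1}^{T} \int g(x_j, y_j)\, 1_{\{\tau_t = j\}}\, p(dx_{t+1}\, dy_{t+1} \dots dx_j\, dy_j \mid x_t, y_t),$$

where $p(dx_{t+1}\, dy_{t+1} \dots dx_j\, dy_j \mid x_t, y_t)$ denotes the joint probability distribution of $(x_{t+1}, y_{t+1}, \dots, x_j, y_j)$ conditional on $(x_t, y_t)$. As $\{\tau_t = j\}$ are disjoint sets for each $j$, it follows that $\|I_t\|_\infty \le \|g\|$. Based on (21) and using Proposition 1 with $f = I_t$, it is ensured that $E[|\chi_{t,t}|] \le \sqrt{c'_t / m}\, \|g\|$ for some constant $c'_t$. Similarly, $E[|\chi_{t-1,t}|] \le \sqrt{b_{t-1} / m}\, \|g\|$ for some constant $b_{t-1}$. The latter two errors are from the sampling variability of Monte Carlo simulation (as in Step 1 of Algorithm 2); the error bounds are guaranteed by Proposition 1 with $t = 0$, i.e., $E[|\varepsilon^{m,l}_{t,t}|] \le \frac{\|g\|}{\sqrt{ml}}$ and $E[|\varepsilon^{m,l}_{t-1,t}|] \le \frac{\|g\|}{\sqrt{ml}}$. So given a sample path of the observations $y_{0:T}$ we have for each $t \in \mathcal{J}$,

$$\lim_{m \to \infty} E\left[\left|\left(\tilde{g}(\pi_t, y_t) - M^\tau_t\right) - \left(\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t\right)\right|\right] = \lim_{m \to \infty} E\left[\left|\vartheta_t + \sum_{i=1}^{t} \left(\tilde{\Delta}_i - \Delta^\tau_i\right)\right|\right] = 0. \tag{22}$$

Since

$$\left|\max_{t \in \mathcal{J}} \left(\tilde{g}(\pi_t, y_t) - M^\tau_t\right) - \max_{t \in \mathcal{J}} \left(\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t\right)\right| \le \max_{t \in \mathcal{J}} \left|\left(\tilde{g}(\pi_t, y_t) - M^\tau_t\right) - \left(\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t\right)\right| \le \sum_{t=1}^{T} \left|\left(\tilde{g}(\pi_t, y_t) - M^\tau_t\right) - \left(\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t\right)\right|,$$

by taking expectation and letting $m$ go to infinity we have

$$\lim_{m \to \infty} E\left[\left|\max_{t \in \mathcal{J}} \left(\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t\right) - \max_{t \in \mathcal{J}} \left(\tilde{g}(\pi_t, y_t) - M^\tau_t\right)\right|\right] = 0.$$

Note that $\tilde{\Delta}_t$ is bounded by $2\|g\|$ for each $t \in \mathcal{J}$, and therefore $\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t$ is bounded by $(2t + 1)\|g\|$ and $\max_{t \in \mathcal{J}} (\tilde{g}(\pi^m_t, y_t) - \tilde{M}_t)$ is bounded by $(2T + 1)\|g\|$.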
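The same conclusions are also valid for $\Delta^\tau_t$, $\tilde{g}(\pi_t, y_t) - M^\tau_t$ and $\max_{t \in \mathcal{J}} (\tilde{g}(\pi_t, y_t) - M^\tau_t)$. Then

$$\lim_{m \to \infty} E_0\left[\left|\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi^m_t, Y_t) - \tilde{M}_t\right) - \max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right|\right]$$
$$= \lim_{m \to \infty} E_0\left[E\left[\left|\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi^m_t, Y_t) - \tilde{M}_t\right) - \max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right| \,\Big|\, \mathcal{F}^Y_T\right]\right]$$
$$= E_0\left[\lim_{m \to \infty} E\left[\left|\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi^m_t, Y_t) - \tilde{M}_t\right) - \max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right| \,\Big|\, \mathcal{F}^Y_T\right]\right] = 0,$$

where the second equality follows from the boundedness of the integrand and the dominated convergence theorem. Hence,

$$\lim_{m \to \infty} E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi^m_t, Y_t) - \tilde{M}_t\right)\right] = E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right].$$

Now we prove (18). First we have

$$E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right] - V_0 = E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^\tau_t\right)\right] - E_0\left[\max_{t \in \mathcal{J}} \left(\tilde{g}(\Pi_t, Y_t) - M^*_t\right)\right] \le E_0\left[\max_{t \in \mathcal{J}} \left|M^*_t - M^\tau_t\right|\right],$$

following the fact that $\max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^\tau_t) - \max_{t \in \mathcal{J}} (\tilde{g}(\Pi_t, Y_t) - M^*_t) \le \max_{t \in \mathcal{J}} |M^*_t - M^\tau_t|$. Then (18) follows from

$$E_0\left[\max_{t \in \mathcal{J}} \left|M^*_t - M^\tau_t\right|\right] \le 2 \sqrt{E_0\left[(M^*_T - M^\tau_T)^2\right]} = 2 \sqrt{\sum_{t=1}^{T} E_0\left[\left((M^*_t - M^\tau_t) - (M^*_{t-1} - M^\tau_{t-1})\right)^2\right]}$$
$$= 2 \sqrt{\sum_{t=1}^{T} E_0\left[(\Delta^*_t - \Delta^\tau_t)^2\right]} \le 2 \sqrt{\sum_{t=1}^{T} E_0\left[\left(E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \Pi_t, Y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \Pi_t, Y_t]\right)^2\right]},$$

where the first inequality follows from the fact that $M^*_t - M^\tau_t$ is a martingale and applying Doob's martingale inequality, and the first equality uses the orthogonality property of martingale differences (see p. 331 in [12]). To show the last inequality, recall that

$$\Delta^*_t - \Delta^\tau_t = \left(E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \mathcal{F}^Y_t] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \mathcal{F}^Y_t]\right) - \left(E[g(X_{\tau^*_t}, Y_{\tau^*_t}) \mid \mathcal{F}^Y_{t-1}] - E[g(X_{\tau_t}, Y_{\tau_t}) \mid \mathcal{F}^Y_{t-1}]\right);$$

then the last inequality can be shown by simple algebra and iterated expectation on $\mathcal{F}^Y_{t-1}$.

REFERENCES

[1] L. Andersen and M. Broadie. Primal-dual simulation algorithm for pricing multidimensional American options. Management Science, 50(9):1222-1234, 2004.
[2] A. Bain and D. Crisan. Fundamentals of Stochastic Filtering. Springer, 2008.
[3] D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 3rd edition, 2007.
[4] I. Florescu and F. Viens. Stochastic volatility: Option pricing using a multinomial recombining tree. Applied Mathematical Finance, 15(2):151-181, 2008.
[5] M. B. Haugh and L. Kogan. Pricing American options: A duality approach. Operations Research, 52(2):258-270, 2004.
[6] D. Lamberton and B. Lapeyre. Introduction to Stochastic Calculus Applied to Finance. Chapman & Hall/CRC, 2007.
[7] F. A. Longstaff and E. S. Schwartz. Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14(1):113-147, 2001.
[8] M. Ludkovski. A simulation approach to optimal stopping under partial information. Stochastic Processes and Applications, 119(12):2071-2087, 2009.
[9] H. Pham, W. Runggaldier, and A. Sellami. Approximation by quantization of the filter process and applications to optimal stopping problems under partial observation. Monte Carlo Methods and Applications, 11(1):57-81, 2005.
[10] B. R. Rambharat and A. E. Brockwell. Sequential Monte Carlo pricing of American-style options under stochastic volatility models. The Annals of Applied Statistics, 4(1):222-265, 2010.
[11] L. C. G. Rogers. Monte Carlo valuation of American options. Mathematical Finance, 12(3):271-286, 2002.
[12] S. Karlin and H. Taylor. A First Course in Stochastic Processes, 2nd edn. Academic Press, San Diego, 1975.
[13] J. Tsitsiklis and B. Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12(4):694-703, 2001.
[14] E. Zhou. Optimal stopping under partial observation: Near-value iteration. 2011. Forthcoming in IEEE Transactions on Automatic Control.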