Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment


Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Haitham Bou Ammar, Univ. of Pennsylvania, haithamb@seas.upenn.edu
Eric Eaton, Univ. of Pennsylvania, eeaton@cis.upenn.edu
Paul Ruvolo, Olin College of Engineering, paul.ruvolo@olin.edu
Matthew E. Taylor, Washington State Univ., taylorm@eecs.wsu.edu

Abstract

The success of applying policy gradient reinforcement learning (RL) to difficult control tasks hinges crucially on the ability to determine a sensible initialization for the policy. Transfer learning methods tackle this problem by reusing knowledge gleaned from solving other related tasks. In the case of multiple task domains, these algorithms require an inter-task mapping to facilitate knowledge transfer across domains. However, there are currently no general methods to learn an inter-task mapping without requiring either background knowledge that is not typically present in RL settings, or an expensive analysis of an exponential number of inter-task mappings in the size of the state and action spaces. This paper introduces an autonomous framework that uses unsupervised manifold alignment to learn inter-task mappings and effectively transfer samples between different task domains. Empirical results on diverse dynamical systems, including an application to quadrotor control, demonstrate its effectiveness for cross-domain transfer in the context of policy gradient RL.

Introduction

Policy gradient reinforcement learning (RL) algorithms have been applied with considerable success to solve high-dimensional control problems, such as those arising in robotic control and coordination (Peters & Schaal 2008). These algorithms use gradient ascent to tune the parameters of a policy to maximize its expected performance. Unfortunately, this gradient ascent procedure is prone to becoming trapped in local maxima, and thus it has been widely recognized that initializing the policy in a sensible manner is crucial for achieving optimal performance. For instance, one typical strategy is to initialize the policy using human demonstrations (Peters & Schaal 2006), which may be infeasible when the task cannot be easily solved by a human.
This paper explores a different approach: instead of initializing the policy at random (i.e., tabula rasa) or via human demonstrations, we use transfer learning (TL) to initialize the policy for a new target domain based on knowledge from one or more source tasks. In RL transfer, the source and target tasks may differ in their formulations (Taylor & Stone 2009). In particular, when the source and target tasks have different state and/or action spaces, an inter-task mapping (Taylor et al. 2007a) that describes the relationship between the two tasks is typically needed.

This paper introduces a framework for autonomously learning an inter-task mapping for cross-domain transfer in policy gradient RL. First, we learn an inter-state mapping (i.e., a mapping between states in two tasks) using unsupervised manifold alignment. Manifold alignment provides a powerful and general framework that can discover a shared latent representation to capture intrinsic relations between different tasks, irrespective of their dimensionality. The alignment also yields an implicit inter-action mapping that is generated by mapping tracking states from the source to the target. Given the mapping between task domains, source task trajectories are then used to initialize a policy in the target task, significantly improving the speed of subsequent learning over an uninformed initialization.

This paper provides the following contributions. First, we introduce a novel unsupervised method for learning inter-state mappings using manifold alignment. Second, we show that the discovered subspace can be used to initialize the target policy. Third, our empirical validation, conducted on four dissimilar and dynamically chaotic task domains (e.g., controlling a three-link cart-pole and a quadrotor aerial vehicle), shows that our approach can (a) automatically learn an inter-state mapping across MDPs from the same domain, (b) automatically learn an inter-state mapping across MDPs from very different domains, and (c) transfer informative initial policies to achieve higher initial performance and reduce the time needed for convergence to near-optimal behavior.

Copyright © 2015, Association for the Advancement of Artificial Intelligence. All rights reserved.
Related Work

Learning an inter-task mapping has been of major interest in the transfer learning community because of its promise of autonomous transfer between very different tasks (Taylor & Stone 2009). However, the majority of existing work assumes that (a) the source task and target task are similar enough that no mapping is needed (Banerjee & Stone 2007; Konidaris & Barto 2007), or (b) an inter-task mapping is provided to the agent (Taylor et al. 2007a; Torrey et al. 2008). The main difference between these methods and this paper is that we are interested in learning a mapping between tasks.

There has been some recent work on learning such mappings. For example, mappings may be based on semantic knowledge about state features between two tasks (Liu & Stone 2006), background knowledge about the range or type of state variables (Taylor et al. 2007b), or transition models for each possible mapping could be generated and tested (Taylor et al. 2008). However, there are currently no general methods to learn an inter-task mapping without requiring either background knowledge that is not typically present in RL settings, or an expensive analysis of an exponential number (in the size of the action and state variable sets) of inter-task mappings. We overcome these issues by automatically discovering high-level features and using them to transfer knowledge between agents without suffering from an exponential explosion.

In previous work, we used sparse coding, sparse projection, and sparse Gaussian processes to learn an inter-task mapping between MDPs with arbitrary variations (Bou Ammar et al. 2012). However, this previous work relied on a Euclidean distance correlation between source and target task triples, which may fail for highly dissimilar tasks. Additionally, it placed restrictions on the inter-task mapping that reduced the flexibility of the learned mapping.

In other related work, Bócsi et al. (2013) use manifold alignment to assist in transfer. The primary differences with our work are that the authors (a) focus on transferring models between different robots, rather than policies/samples, and (b) rely on source and target robots that are qualitatively similar.

Background

Reinforcement Learning problems involve an agent choosing sequential actions to maximize its expected return. Such problems are typically formalized as a Markov decision process (MDP) $T = \langle S, A, P_0, P, r \rangle$, where $S$ is the (potentially infinite) set of states, $A$ is the set of actions that the agent may execute, $P_0 : S \to [0, 1]$ is a probability distribution over the initial state, $P : S \times A \times S \to [0, 1]$ is a state transition probability function describing the task dynamics, and $r : S \times A \times S \to \mathbb{R}$ is the reward function measuring the performance of the agent. A policy $\pi : S \times A \to [0, 1]$ is defined as a conditional probability distribution over actions given the current state.
The agent's goal is to find a policy $\pi^\star$ which maximizes the average expected reward:

$$\pi^\star = \arg\max_{\pi} \mathbb{E}\left[\frac{1}{H}\sum_{t=1}^{H} r(s_t, a_t, s_{t+1}) \,\middle|\, \pi\right] = \arg\max_{\pi} \int_{\mathbb{T}} p_\pi(\tau)\, R(\tau)\, d\tau, \qquad (1)$$

where $\mathbb{T}$ is the set of all possible trajectories with horizon $H$,

$$R(\tau) = \frac{1}{H}\sum_{t=1}^{H} r(s_t, a_t, s_{t+1}), \quad \text{and} \qquad (2)$$

$$p_\pi(\tau) = P_0(s_1) \prod_{t=1}^{H} P(s_{t+1} \mid s_t, a_t)\, \pi(a_t \mid s_t). \qquad (3)$$

Policy Gradient methods (Sutton et al. 1999; Peters et al. 2005) represent the agent's policy $\pi$ as a function defined over a vector $\theta \in \mathbb{R}^d$ of control parameters and a vector of state features given by the transformation $\Phi : S \to \mathbb{R}^m$. By substituting this parameterization of the control policy into Eqn. (2), we can compute the parameters of the optimal policy as $\theta^\star = \arg\max_\theta J(\theta)$, where $J(\theta) = \int_{\mathbb{T}} p_{\pi_\theta}(\tau)\, R(\tau)\, d\tau$. To maximize $J$, many policy gradient methods employ standard supervised function approximation to learn $\theta$ by following an estimated gradient of a lower bound on the expected return of $J(\theta)$.

Policy gradient algorithms have gained attention in the RL community in part due to their successful applications on real-world robotics (Peters et al. 2005). While such algorithms have a low computational cost per update, high-dimensional problems require many updates (by acquiring new rollouts) to achieve good performance. Transfer learning can reduce this data requirement and accelerate learning.

Since policy gradient methods are prone to becoming stuck in local maxima, it is crucial that the policy be initialized in a sensible fashion. A common technique (Peters & Schaal 2006; Argall et al. 2009) for policy initialization is to first collect demonstrations from a human controlling the system, then use supervised learning to fit policy parameters that maximize the likelihood of the human-demonstrated actions, and finally use the fitted parameters as the initial policy parameters for a policy gradient algorithm. While this approach works well in some settings, it is inapplicable in several common cases: (a) when it is difficult to instrument the system in question so that a human can successfully perform a demonstration, (b) when an agent is constantly faced with new tasks, making gathering human demonstrations for each new task impractical, or (c) when the tasks in question cannot be intuitively solved by a human demonstrator.
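As a rough illustration of the gradient that policy gradient methods follow, the sketch below estimates the gradient of J(θ) with a likelihood-ratio (REINFORCE-style) estimator. It assumes a scalar-action policy that is linear in the state features with fixed Gaussian noise of standard deviation `sigma`, and it subtracts a mean-return baseline; the function names, array shapes, and the baseline are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def average_return(rewards):
    """R(tau): average of the per-step rewards of one trajectory."""
    return float(np.mean(rewards))


def reinforce_gradient(features, actions, returns, theta, sigma):
    """Likelihood-ratio estimate of the gradient of J(theta).

    Assumed (hypothetical) shapes:
      features: (N, H, d) state features Phi(s_t) for N trajectories
      actions:  (N, H)    scalar actions taken
      returns:  (N,)      R(tau) for each trajectory
      theta:    (d,)      policy parameters, mean action = Phi(s)^T theta
      sigma:    float     fixed exploration noise (assumption)
    """
    mean_actions = features @ theta                      # (N, H)
    # Gradient of log N(a; Phi^T theta, sigma^2) with respect to theta.
    score = ((actions - mean_actions) / sigma**2)[..., None] * features
    traj_score = score.sum(axis=1)                       # (N, d)
    baseline = returns.mean()                            # variance reduction
    return (traj_score * (returns - baseline)[:, None]).mean(axis=0)
```

A gradient ascent step would then be `theta = theta + step_size * reinforce_gradient(...)`, repeated with freshly sampled rollouts at every iteration.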
The next section introduces a method for using transfer learning to initialize the parameters of a policy in a way that is not susceptible to these limitations. Our experimental results show that this method of policy initialization, when compared to random policy initialization, is able to not only achieve better initial performance, but also obtain a higher performing policy when run until convergence.

Policy Gradient Transfer Learning

Transfer learning aims to improve learning times and/or behavior of an agent on a new target task $T_T$ by reusing knowledge from a solved source task $T_S$. In RL settings, each task is described by an MDP: task $T_S = \langle S_S, A_S, P_{0,S}, P_S, r_S \rangle$ and $T_T = \langle S_T, A_T, P_{0,T}, P_T, r_T \rangle$. One way in which knowledge from a solved source task can be leveraged to solve the target task is by mapping optimal (state, action, next state) triples from the source task into the state and action spaces of the target task. Transferring optimal triples in this way allows us to provide both a better jumpstart and learning ability to the target agent, based on the source agent's ability.

While the preceding idea is attractive, complexities arise when the source and target tasks have different state and/or action spaces. In this case, one must define an inter-task mapping $\chi$ in order to translate optimal triples from the source to the target task. Typically (Taylor & Stone 2009), $\chi$ is defined by two sub-mappings: (1) an inter-state mapping $\chi_S$ and (2) an inter-action mapping $\chi_A$.
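To make the role of the two sub-mappings concrete, here is a minimal sketch of translating one source triple into the target task. The callables `chi_S` and `chi_A` and the toy mappings in the usage note are hypothetical placeholders; the paper's own approach learns the inter-state mapping and avoids constructing an explicit inter-action mapping.

```python
import numpy as np


def transfer_triple(triple, chi_S, chi_A):
    """Translate a source (state, action, next state) triple to the target.

    chi_S and chi_A are assumed to be callables implementing the
    inter-state and inter-action mappings (hypothetical interfaces).
    """
    state, action, next_state = triple
    return chi_S(state), chi_A(action), chi_S(next_state)
```

For example, with a toy `chi_S` that pads a two-dimensional source state with zeros to reach a four-dimensional target state and an identity `chi_A`, the triple `(s, a, s')` becomes `(pad(s), a, pad(s'))`.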

[Figure 1: Transfer is split into two phases: (I) learning the inter-state mapping $\chi_S$ via manifold alignment, and (II) initializing the target policy via mapping the source task policy. Phase I aligns traces from $\pi_S^\star$ in the source domain and traces from the target domain in a shared representation. Phase II proceeds in four steps: 1. sample initial states, 2. reflect them to the source, 3. execute $\pi_S^\star$, 4. transfer the tracking signal to the target domain.]

By adopting an RL framework where policies are state-feedback controllers, we show that we can use optimal state trajectories from the source task to intelligently initialize a control policy in the target task, without needing to explicitly construct an inter-action mapping. We accomplish this by learning a (pseudo-invertible) inter-state mapping between the state spaces of a pair of tasks using manifold alignment, which can then be used to transfer optimal sequences of states to the target. The fact that our algorithm does not require learning an explicit inter-action mapping significantly reduces its computational complexity.

Our approach consists of two phases (Figure 1). First, using traces gathered in the source and target tasks, we learn an inter-state mapping $\chi_S$ using manifold alignment ("Phase I" in Figure 1). To perform this step, we adapt the Unsupervised Manifold Alignment (UMA) algorithm (Wang & Mahadevan 2009), as detailed in the next section. Second, we use $\chi_S$ to project state trajectories from the source to the target task ("Phase II" in Figure 1). These projected state trajectories define a set of tracking trajectories for the target task that allow us to perform one step of policy gradient improvement in the target task. This policy improvement step intelligently initializes the target policy, which results in better learning performance than starting from a randomly initialized policy, as shown in our experiments. Although we focus on policy gradient methods, our approach could easily be adapted to other policy search methods (e.g., PoWER, REPS, etc.; see Kober et al. 2013).

Learning an Inter-State Mapping

Unsupervised Manifold Alignment (UMA) is a technique that efficiently discovers an alignment between two datasets (Wang & Mahadevan 2009). UMA was developed to align datasets for knowledge transfer between two supervised learning tasks.
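Here, we adapt UMA to an RL setting by aligning source and target task state spaces with potentially different dimensions $m_S$ and $m_T$. To learn $\chi_S$ relating $S_S$ and $S_T$, trajectories of states in the source task, $\tau_S = \{ s_1^{(i),S}, \dots, s_{H_S}^{(i),S} \}_{i=1}^{n_S}$, are obtained by following $\pi_S^\star$, and trajectories of states in the target task, $\tau_T = \{ s_1^{(j),T}, \dots, s_{H_T}^{(j),T} \}_{j=1}^{n_T}$, are obtained by utilizing $\pi_T$, which is initialized using randomly selected policy parameters. For simplicity of exposition, we assume that trajectories in the source domain have length $H_S$ and those in the target domain have length $H_T$; however, our algorithm is capable of handling variable-length trajectories. We are interested in the setting where data is scarcer in the target task than in the source task (i.e., $n_T \ll n_S$).

Given trajectories from both the source and target tasks, we flatten the trajectories (i.e., we treat the states as unordered) and then apply the task-specific state transformation to obtain two sets of state feature vectors, one for the source task and one for the target task. Specifically, we create the following sets of points:

$$X_S = \left[ \Phi_S\big(s_1^{(1),S}\big), \dots, \Phi_S\big(s_{H_S}^{(1),S}\big), \dots, \Phi_S\big(s_1^{(n_S),S}\big), \dots, \Phi_S\big(s_{H_S}^{(n_S),S}\big) \right]$$

$$X_T = \left[ \Phi_T\big(s_1^{(1),T}\big), \dots, \Phi_T\big(s_{H_T}^{(1),T}\big), \dots, \Phi_T\big(s_1^{(n_T),T}\big), \dots, \Phi_T\big(s_{H_T}^{(n_T),T}\big) \right].$$

Given $X_S \in \mathbb{R}^{m_S \times H_S n_S}$ and $X_T \in \mathbb{R}^{m_T \times H_T n_T}$, we can apply the UMA algorithm (Wang & Mahadevan 2009) with minimal modification, as described next.

Unsupervised Manifold Alignment (UMA)

The first step of applying UMA to learn the inter-state mapping is to represent each transformed state in both the source and target tasks in terms of its local geometry. We use the notation $R_{x_S^{(i)}} \in \mathbb{R}^{(k+1) \times (k+1)}$ to refer to the matrix of pairwise Euclidean distances among the $k$-nearest neighbors of $x_S^{(i)} \in X_S$. Similarly, $R_{x_T^{(j)}}$ refers to the equivalent matrix of distances for the $k$-nearest neighbors of $x_T^{(j)} \in X_T$. The relations between local geometries in $X_S$ and $X_T$ are represented by the matrix $W \in \mathbb{R}^{(n_S H_S) \times (n_T H_T)}$ with

$$w_{i,j} = \exp\left( -\mathrm{dist}\big( R_{x_S^{(i)}}, R_{x_T^{(j)}} \big) \right)$$

and distance metric

$$\mathrm{dist}\big( R_{x_S^{(i)}}, R_{x_T^{(j)}} \big) = \min_{1 \le h \le k!} \min \left\{ \left\| \big\{R_{x_T^{(j)}}\big\}_h - \gamma_1 R_{x_S^{(i)}} \right\|_F, \; \left\| R_{x_S^{(i)}} - \gamma_2 \big\{R_{x_T^{(j)}}\big\}_h \right\|_F \right\}. \qquad (4)$$

We use the notation $\{\cdot\}_h$ to denote the $h$-th variant of the $k!$ permutations of the rows and columns of the input matrix, $\|\cdot\|_F$ is the Frobenius norm, and $\gamma_1$ and $\gamma_2$ are defined as:

$$\gamma_1 = \frac{\mathrm{tr}\left( R_{x_S^{(i)}}^{\mathsf{T}} \big\{R_{x_T^{(j)}}\big\}_h \right)}{\mathrm{tr}\left( R_{x_S^{(i)}}^{\mathsf{T}} R_{x_S^{(i)}} \right)}, \qquad \gamma_2 = \frac{\mathrm{tr}\left( \big\{R_{x_T^{(j)}}\big\}_h^{\mathsf{T}} R_{x_S^{(i)}} \right)}{\mathrm{tr}\left( \big\{R_{x_T^{(j)}}\big\}_h^{\mathsf{T}} \big\{R_{x_T^{(j)}}\big\}_h \right)}.$$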
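The local-geometry comparison of Eqn. (4) can be sketched in a few lines of NumPy. This is an illustrative brute-force version under stated assumptions: points are stored as columns, each point is kept at index 0 of its own distance matrix so that only its k neighbors are permuted (giving k! variants), and all points are distinct so the traces in the denominators are non-zero. The function names are hypothetical, and enumerating k! permutations is only practical for small k.

```python
import itertools

import numpy as np


def local_geometry(X, idx, k):
    """(k+1) x (k+1) pairwise distances among point idx and its k nearest
    neighbors. X holds one point per column, shape (m, N)."""
    dists = np.linalg.norm(X - X[:, [idx]], axis=0)
    order = np.argsort(dists, kind="stable")[: k + 1]   # idx itself comes first
    P = X[:, order]
    return np.linalg.norm(P[:, :, None] - P[:, None, :], axis=0)


def uma_distance(R_s, R_t):
    """Brute-force evaluation of Eqn. (4) over neighbor permutations."""
    n = R_s.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(1, n)):
        p = (0,) + perm                                  # keep the point itself fixed
        R_th = R_t[np.ix_(p, p)]                         # permute rows and columns
        gamma1 = np.trace(R_s.T @ R_th) / np.trace(R_s.T @ R_s)
        gamma2 = np.trace(R_th.T @ R_s) / np.trace(R_th.T @ R_th)
        d1 = np.linalg.norm(R_th - gamma1 * R_s)         # Frobenius norm
        d2 = np.linalg.norm(R_s - gamma2 * R_th)
        best = min(best, d1, d2)
    return best


def similarity(R_s, R_t):
    """Entry w_ij of the cross-manifold similarity matrix W."""
    return float(np.exp(-uma_distance(R_s, R_t)))
```

Because of the rescaling by the two gamma terms, two neighborhoods that differ only by a uniform scale receive distance 0 and similarity 1, even when they live in spaces of different dimension.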

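To align the manifolds, UMA computes the joint Laplacian

$$L = \begin{bmatrix} L_{X_S} + \mu \Gamma_1 & -\mu \Gamma_2 \\ -\mu \Gamma_3 & L_{X_T} + \mu \Gamma_4 \end{bmatrix} \qquad (5)$$

with diagonal matrices $\Gamma_1 \in \mathbb{R}^{n_S H_S \times n_S H_S}$ and $\Gamma_4 \in \mathbb{R}^{n_T H_T \times n_T H_T}$, where $\Gamma_1^{(i,i)} = \sum_j w_{i,j}$ and $\Gamma_4^{(j,j)} = \sum_i w_{i,j}$. The matrices $\Gamma_2 \in \mathbb{R}^{n_S H_S \times n_T H_T}$ and $\Gamma_3 \in \mathbb{R}^{n_T H_T \times n_S H_S}$ join the two manifolds with $\Gamma_2^{(i,j)} = w_{i,j}$ and $\Gamma_3^{(j,i)} = w_{i,j}$. Additionally, the non-normalized Laplacians $L_{X_S}$ and $L_{X_T}$ are defined as $L_{X_S} = D_{X_S} - W_{X_S}$ and $L_{X_T} = D_{X_T} - W_{X_T}$, where $D_{X_S} \in \mathbb{R}^{n_S H_S \times n_S H_S}$ is a diagonal matrix with $D_{X_S}^{(i,i)} = \sum_j w_{X_S}^{(i,j)}$ and, similarly, $D_{X_T}^{(i,i)} = \sum_j w_{X_T}^{(i,j)}$. The matrices $W_{X_S}$ and $W_{X_T}$ represent the similarity in the source and target task state spaces respectively and can be computed similarly to $W$.

To join the manifolds, UMA first defines two matrices:

$$Z = \begin{bmatrix} X_S & 0 \\ 0 & X_T \end{bmatrix}, \qquad D = \begin{bmatrix} D_{X_S} & 0 \\ 0 & D_{X_T} \end{bmatrix}. \qquad (6)$$

Given $Z$ and $D$, UMA computes optimal projections to reduce the dimensionality of the joint structure by taking the $d$ minimum eigenvectors $\zeta_1, \dots, \zeta_d$ of the generalized eigenvalue decomposition $Z L Z^{\mathsf{T}} \zeta = \lambda Z D Z^{\mathsf{T}} \zeta$. The optimal projections $\alpha_S$ and $\alpha_T$ are then given as the first $d_1$ and last $d_2$ rows of $[\zeta_1, \dots, \zeta_d]$, respectively, where $d_1 = m_S$ and $d_2 = m_T$ are the numbers of rows that $X_S$ and $X_T$ contribute to $Z$.

Given the embedding discovered by UMA, we can then define the inter-state mapping as:

$$\chi_S[\cdot] = \alpha_T^{\mathsf{T}+} \alpha_S^{\mathsf{T}} [\cdot]. \qquad (7)$$

The inverse of the inter-state mapping (to project target states to the source task) can be determined by taking the pseudo-inverse of Eqn. (7), yielding $\chi_S^{+}[\cdot] = \alpha_S^{\mathsf{T}+} \alpha_T^{\mathsf{T}} [\cdot]$. Intuitively, this approach aligns the important regions of the source task's state space (sampled based on optimal source trajectories) with the state space explored so far in the target task. Although actions were ignored in constructing the manifolds, the aligned representation implicitly captures local state transition dynamics within each task (since the states came from trajectories), providing a mechanism to transfer trajectories between tasks, as we describe next.

Policy Transfer and Improvement

Next, we discuss the procedure for initializing the target policy, $\pi_T$. We consider a model-free setting in which the policy is linear over a set of (potentially non-linear) state feature functions modulated by Gaussian noise, where the magnitude of the noise balances exploration and exploitation. Specifically, we can write the source and target policies as:

$$\pi_S\big(s^{(S)}\big) = \Phi_S\big(s^{(S)}\big)^{\mathsf{T}} \theta_S + \epsilon_S, \qquad \pi_T\big(s^{(T)}\big) = \Phi_T\big(s^{(T)}\big)^{\mathsf{T}} \theta_T + \epsilon_T,$$

where $\epsilon_S \sim \mathcal{N}(0, \Sigma_S)$ and $\epsilon_T \sim \mathcal{N}(0, \Sigma_T)$.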
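The alignment step of Eqns. (5) to (7) can be sketched with NumPy alone. The sketch below assembles the joint Laplacian, solves the generalized eigenproblem by Cholesky whitening, and builds the inter-state mapping and its pseudo-inverse as matrices. Several choices are assumptions made for illustration rather than details from the paper: Z is taken to be block-diagonal in the feature matrices X_S and X_T, a small `ridge` term keeps Z D Z^T positive definite, the d eigenvectors with the smallest eigenvalues are kept without filtering out near-zero eigenvalues, and all function names are hypothetical.

```python
import numpy as np


def graph_laplacian(W):
    """Non-normalized Laplacian D - W of a similarity matrix."""
    return np.diag(W.sum(axis=1)) - W


def joint_laplacian(W_s, W_t, W, mu=1.0):
    """Joint Laplacian of Eqn. (5), with Gamma_2 = W and Gamma_3 = W^T."""
    gamma1 = np.diag(W.sum(axis=1))
    gamma4 = np.diag(W.sum(axis=0))
    top = np.hstack([graph_laplacian(W_s) + mu * gamma1, -mu * W])
    bottom = np.hstack([-mu * W.T, graph_laplacian(W_t) + mu * gamma4])
    return np.vstack([top, bottom])


def learn_projections(X_s, X_t, W_s, W_t, W, d, mu=1.0, ridge=1e-8):
    """Return alpha_S (m_S x d) and alpha_T (m_T x d)."""
    m_s, n_s = X_s.shape
    m_t, n_t = X_t.shape
    Z = np.zeros((m_s + m_t, n_s + n_t))
    Z[:m_s, :n_s] = X_s
    Z[m_s:, n_s:] = X_t
    D = np.diag(np.concatenate([W_s.sum(axis=1), W_t.sum(axis=1)]))
    A = Z @ joint_laplacian(W_s, W_t, W, mu) @ Z.T
    B = Z @ D @ Z.T + ridge * np.eye(m_s + m_t)      # ridge is an assumption
    # Solve A zeta = lambda B zeta via whitening with B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)        # ascending eigenvalues
    zeta = C_inv.T @ vecs[:, :d]
    return zeta[:m_s], zeta[m_s:]


def inter_state_mapping(alpha_s, alpha_t):
    """Matrices for chi_S (source to target) and chi_S^+ (target to source)."""
    chi = np.linalg.pinv(alpha_t.T) @ alpha_s.T      # shape (m_T, m_S)
    chi_inv = np.linalg.pinv(alpha_s.T) @ alpha_t.T  # shape (m_S, m_T)
    return chi, chi_inv
```

A source feature vector `x_s` is then mapped to the target with `chi @ x_s`, and a target vector `x_t` is reflected to the source with `chi_inv @ x_t`. Where SciPy is available, `scipy.linalg.eigh(A, B)` solves the same generalized problem directly.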
To initialize $\pi_T$, we first sample $m$ initial target trajectories $D_T = \{\tau_T^{(i)}\}_{i=1}^{m}$ from the target task using a randomly initialized policy (these can be newly sampled trajectories or simply the ones used to do the initial manifold alignment step). Next, we map the set of initial states in $D_T$ to the source task using $\chi_S^{+}$. We then run the optimal source policy starting from each of these mapped initial states to produce a set of $m$ optimal state trajectories in the source task. Finally, the resulting state trajectories are mapped back to the target task using $\chi_S$ to generate a set of reflected state trajectories in the target task, $\tilde{D}_T = \{\tilde{\tau}_T^{(i)}\}_{i=1}^{m}$. For clarity, we assume that all trajectories are of length $H$; however, this is not a fundamental limitation of our algorithm.

We define the following transfer cost function:

$$J_{T_S \to T_T}(\theta_T) = \sum_{i=1}^{m} p_{\theta_T}\big(\tau_T^{(i)}\big)\, \hat{R}\big(\tau_T^{(i)}, \tilde{\tau}_T^{(i)}\big), \qquad (8)$$

where $\hat{R}$ is a cost function that penalizes deviations between the initial sampled trajectories in the target task and the reflected optimal trajectories:

$$\hat{R}\big(\tau_T^{(i)}, \tilde{\tau}_T^{(i)}\big) = \frac{1}{H} \sum_{t=1}^{H} \left\| s_t^{(i)} - \tilde{s}_t^{(i)} \right\|_2^2. \qquad (9)$$

Minimizing Eqn. (8) is equivalent to attaining a target policy parameterization $\theta_T$ such that $\pi_T$ follows the reflected trajectories $\tilde{D}_T$. Further, Eqn. (8) is in exactly the form required to apply standard off-the-shelf policy gradient algorithms to minimize the transfer cost. The Manifold Alignment Cross-Domain Transfer for Policy Gradients (MAXDT-PG) framework is detailed in Algorithm 1 (see footnote 1).

Algorithm 1: Manifold Alignment Cross-Domain Transfer for Policy Gradients (MAXDT-PG)
Inputs: Source and target tasks $T_S$ and $T_T$, optimal source policy $\pi_S^\star$, # source and target traces $n_S$ and $n_T$, # nearest neighbors $k$, # target rollouts $z_T$, initial # of target states $m$.
Learn $\chi_S$:
1: Sample $n_S$ optimal source traces, $\tau_S$, and $n_T$ random target traces, $\tau_T$
2: Using the modified UMA approach, learn $\alpha_S$ and $\alpha_T$ to produce $\chi_S = \alpha_T^{\mathsf{T}+} \alpha_S^{\mathsf{T}} [\cdot]$
Transfer & Initialize Policy:
3: Collect $m$ initial target states $s_1^{(T)} \sim P_{0,T}$
4: Project these $m$ states to the source by applying $\chi_S^{+}[\cdot]$
5: Apply the optimal source policy $\pi_S^\star$ on these projected states to collect $D_S = \{\tau_S^{(i)}\}_{i=1}^{m}$
6: Project the samples in $D_S$ to the target using $\chi_S[\cdot]$ to produce tracking target traces $\tilde{D}_T$
7: Compute tracking rewards using Eqn. (9)
8: Use policy gradients to minimize Eqn. (8), yielding $\theta_T^{(0)}$
Improve Policy:
9: Start with $\theta_T^{(0)}$ and sample $z_T$ target rollouts
10: Follow policy gradients (e.g., episodic REINFORCE) but using target rewards $R_T$
11: Return optimal target policy parameters $\theta_T^\star$

Special Cases

Our work can be seen as an extension of the simpler model-based case with a linear-quadratic regulator (LQR) (Bemporad et al. 2002) policy, which is derived and explained in the online appendix (see footnote 2) accompanying this paper. Although the assumptions made by the model-based case seem restrictive, the analysis in the appendix covers a wide range of applications. These, for example, include: (a) the case in which a dynamical model is provided beforehand, or (b) the case in which model-based RL algorithms are adopted (see Buşoniu et al. 2010). In the main paper, however, we consider the more general model-free case.

Experiments and Results

To assess MAXDT-PG's performance, we conducted experiments on transfer both between tasks in the same domain and between tasks in different domains. We also studied the robustness of the learned mapping by varying the number of source and target samples used for transfer and measuring the resultant target task performance. In all cases we compared the performance of MAXDT-PG to standard policy gradient learners. Our results show that MAXDT-PG was able to: (a) learn a valid inter-state mapping with relatively little data from the target task, and (b) effectively transfer between tasks from either the same or different domains.

Dynamical System Domains

We tested MAXDT-PG and standard policy gradient learning on four dynamical systems (Figure 2). On all systems, the reward function was based on two factors: (a) penalizing states far from the goal state, and (b) penalizing high forces (actions) to encourage smooth, low-energy movements.

[Figure 2: Dynamical systems used in the experiments: (a) Simple Mass, (b) Cart Pole, (c) Three-Link Cart Pole, (d) Quadrotor.]

Simple Mass Spring Damper (SM): The goal with the SM is to control the mass at a specified position with zero velocity. The system dynamics are described by two state variables that represent the mass position and velocity, and a single force that acts on the cart in the x direction.

Cart Pole (CP): The goal is to swing up and then balance the pole vertically. The system dynamics are described via a four-dimensional state vector $\langle x, \dot{x}, \theta, \dot{\theta} \rangle$, representing the position and velocity of the cart and the angle and angular velocity of the pole, respectively. The actions consist of a force that acts on the cart in the x direction.

Three-Link Cart Pole (3CP): The 3CP dynamics are described via an eight-dimensional state vector $\langle x, \dot{x}, \theta_1, \dot{\theta}_1, \theta_2, \dot{\theta}_2, \theta_3, \dot{\theta}_3 \rangle$, where $x$ and $\dot{x}$ describe the position and velocity of the cart and $\theta_j$ and $\dot{\theta}_j$ represent the angle and angular velocity of the $j$-th link. The system is controlled by applying a force to the cart in the x direction, with the goal of balancing the three poles upright.

Quadrotor (QR): The system dynamics were adopted from a simulator validated on real quadrotors (Bouabdallah 2007; Voos & Bou Ammar 2010), and are described via three angles and three angular velocities in the body frame (i.e., $e_{1B}$, $e_{2B}$, and $e_{3B}$). The actions consist of four rotor torques $\{F_1, F_2, F_3, F_4\}$. Each task corresponds to a different quadrotor configuration (e.g., different armature lengths, etc.), and the goal is to stabilize the different quadrotors.

Same-Domain Transfer

We first evaluate MAXDT-PG on same-domain transfer. Within each domain, we can obtain different tasks by varying the system parameters (e.g., for the SM system we varied mass $M$, spring constant $K$, and damping constant $b$) as well as the reward functions. We assessed the performance of using the transferred policy from MAXDT-PG versus standard policy gradients by measuring the average reward on the target task vs. the number of learning iterations in the target. We also examined the robustness of MAXDT-PG's performance based on the number of source and target samples used to learn $\chi_S$. Rewards were averaged over 5 traces collected from 15 initial states. Due to space constraints, we report same-domain transfer results here; details of the tasks and experimental procedure can be found in the appendix (see footnote 2).

Figure 3 shows MAXDT-PG's performance using varying numbers of source and target samples to learn $\chi_S$. These results reveal that transfer-initialized policies outperform standard policy gradient initialization. Further, as the number of samples used to learn $\chi_S$ increases, so do both the initial and final performance in all domains. All initializations result in equal per-iteration computational cost. Therefore, MAXDT-PG both improves sample complexity and reduces wall-clock learning time.

Footnote 1: Lines 9-11 of Algorithm 1 require interaction with the target domain (or a simulator) for acquiring the optimal policy. Such an assumption is common to policy gradient methods, where at each iteration, data is gathered and used to iteratively improve the policy.

Footnote 2: The online appendix is available on the authors' websites.
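The transfer step of Algorithm 1 (lines 3 to 7) can be illustrated with a small sketch: target initial states are reflected to the source, the source policy is rolled out, the resulting state trajectories are mapped back to the target, and the tracking cost of Eqn. (9) is computed against them. The sketch assumes that the inter-state mapping and its pseudo-inverse are available as matrices, that states are already in feature form, and that the deviation in Eqn. (9) is the mean squared Euclidean distance; `rollout_source` is a hypothetical callable standing in for executing the optimal source policy.

```python
import numpy as np


def tracking_cost(trajectory, reflected):
    """Mean squared Euclidean deviation between a sampled target state
    trajectory and its reflected counterpart, both of shape (H, m_T)."""
    trajectory = np.asarray(trajectory, dtype=float)
    reflected = np.asarray(reflected, dtype=float)
    return float(np.mean(np.sum((trajectory - reflected) ** 2, axis=1)))


def reflect_trajectories(initial_states, chi, chi_inv, rollout_source, horizon):
    """Build tracking trajectories in the target task.

    initial_states: iterable of target states, each of shape (m_T,)
    chi:            (m_T, m_S) matrix mapping source states to the target
    chi_inv:        (m_S, m_T) matrix mapping target states to the source
    rollout_source: hypothetical callable (s0, horizon) -> (horizon, m_S)
                    array of states visited by the optimal source policy
    """
    reflected = []
    for s0 in initial_states:
        s0_source = chi_inv @ s0                          # line 4: to the source
        source_traj = rollout_source(s0_source, horizon)  # line 5: run source policy
        reflected.append(source_traj @ chi.T)             # line 6: back to the target
    return reflected
```

The resulting costs play the role of (negated) rewards for one round of policy gradient updates on the target policy, after which learning continues with the target task's own reward.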

[Figure 3: Same-domain transfer results. Panels: (a) Simple Mass, (b) Cart Pole, (c) Three-Link Cart Pole, (d) Quadrotor. Horizontal axis: Iterations; vertical axis: Average Reward. The curves compare MAXDT-PG using different numbers of source and target samples against standard policy gradients. All plots share the same legend and vertical axis label.]

[Figure 4: Cross-domain transfer results. Panels: (a) Simple Mass to Cart Pole, (b) Cart Pole to Three-Link CP, (c) Cart Pole to Quadrotor, (d) Alignment Quality vs Transfer. Plots (a)-(c) depict target task performance (Average Reward vs. Iterations), and share the same legend and axis labels. Plot (d) shows the correlation between manifold alignment quality (Procrustes metric) and quality of the transferred knowledge (distance between $\theta_{tr}$ and $\theta^\star$).]

Cross-Domain Transfer

Next, we consider the more difficult problem of cross-domain transfer. The experimental setup is identical to the same-domain case with the crucial difference that the state and/or action spaces were different for the source and the target task (since the tasks were from different domains). We tested three cross-domain transfer scenarios: simple mass to cart pole, cart pole to three-link cart pole, and cart pole to quadrotor. In each case, the source and target task have different numbers of state variables and system dynamics. Details of these experiments are available in the appendix (see footnote 2).

Figure 4 shows the results of cross-domain transfer, demonstrating that MAXDT-PG can achieve successful transfer between different task domains. These results reinforce the conclusions of the same-domain transfer experiments, showing that (a) transfer-initialized policies outperform standard policy gradients, even between different task domains, and (b) initial and final performance improves as more samples are used to learn $\chi_S$.
We also examined the correlation between the quality of the manifold alignment, as assessed by the Procrustes metric (Goldberg & Ritov 2009), and the quality of the transferred knowledge, as measured by the distance between the transferred ($\theta_{tr}$) and the optimal ($\theta^\star$) parameters (Figure 4(d)). On both measures, smaller values indicate better quality. Each data point represents a transfer scenario between two different tasks, from either SM, CP, or 3CP; we did not consider quadrotor tasks due to the required simulator time. Although we show that the Procrustes measure is positively correlated with transfer quality, we hesitate to recommend it as a predictive measure of transfer performance. In our approach, the cross-domain mapping is not guaranteed to be orthogonal, and therefore the Procrustes measure is not theoretically guaranteed to accurately measure the quality of the global embedding (i.e., Goldberg and Ritov's (2009) Corollary 1 is not guaranteed to hold), but the Procrustes measure still appears correlated with transfer quality in practice.

We can conclude that MAXDT-PG is capable of: (a) automatically learning an inter-state mapping, and (b) effectively transferring between different domain systems. Even when the source and target tasks are highly dissimilar (e.g., cart pole to quadrotor), MAXDT-PG is capable of successfully providing target policy initializations that outperform state-of-the-art policy gradient techniques.

Conclusion

We introduced MAXDT-PG, a technique for autonomous transfer between policy gradient RL algorithms. MAXDT-PG employs unsupervised manifold alignment to learn an inter-state mapping, which is then used to transfer samples and initialize the target task policy. MAXDT-PG's performance was evaluated on four dynamical systems, demonstrating that MAXDT-PG is capable of improving both an agent's initial and final performance relative to using policy gradient algorithms without transfer, even across different domains.

Acknowledgements

This research was supported by ONR grant #N[...], AFOSR grant #FA[...], and NSF grant IIS-[...]. We thank the anonymous reviewers for their helpful feedback.

References

Argall, B. D.; Chernova, S.; Veloso, M.; and Browning, B. 2009. A survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5).
Banerjee, B., and Stone, P. 2007. General game learning using knowledge transfer. In Proceedings of the 20th International Joint Conference on Artificial Intelligence.
Bemporad, A.; Morari, M.; Dua, V.; and Pistikopoulos, E. 2002. The explicit linear quadratic regulator for constrained systems. Automatica 38(1):3-20.
Bócsi, B.; Csató, L.; and Peters, J. 2013. Alignment-based transfer learning for robot models. In Proceedings of the International Joint Conference on Neural Networks (IJCNN).
Bou Ammar, H.; Taylor, M.; Tuyls, K.; Driessens, K.; and Weiss, G. 2012. Reinforcement learning transfer via sparse coding. In Proceedings of the 11th Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Bouabdallah, S. 2007. Design and Control of Quadrotors with Application to Autonomous Flying. Ph.D. Dissertation, École polytechnique fédérale de Lausanne.
Buşoniu, L.; Babuška, R.; De Schutter, B.; and Ernst, D. 2010. Reinforcement Learning and Dynamic Programming Using Function Approximators. Boca Raton, Florida: CRC Press.
Goldberg, Y., and Ritov, Y. 2009. Local Procrustes for manifold embedding: a measure of embedding quality and embedding algorithms. Machine Learning 77(1).
Kober, J.; Bagnell, A.; and Peters, J. 2013. Reinforcement learning in robotics: a survey. International Journal of Robotics Research 32(11).
Konidaris, G., and Barto, A. 2007. Building portable options: Skill transfer in reinforcement learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence.
Liu, Y., and Stone, P. 2006. Value-function-based transfer for reinforcement learning using structure mapping. In Proceedings of the 21st National Conference on Artificial Intelligence.
Peters, J., and Schaal, S. 2006. Policy gradient methods for robotics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.
Peters, J., and Schaal, S. 2008. Natural actor-critic. Neurocomputing 71(7-9).
Peters, J.; Vijayakumar, S.; and Schaal, S. 2005. Natural actor-critic. In Proceedings of the 16th European Conference on Machine Learning (ECML). Springer.
Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 1999. Policy gradient methods for reinforcement learning with function approximation. Neural Information Processing Systems.
Taylor, M. E., and Stone, P. 2009. Transfer learning for reinforcement learning domains: a survey. Journal of Machine Learning Research 10.
Taylor, M. E.; Kuhlmann, G.; and Stone, P. 2008. Autonomous transfer for reinforcement learning. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Taylor, M. E.; Stone, P.; and Liu, Y. 2007a. Transfer learning via inter-task mappings for temporal difference learning. Journal of Machine Learning Research 8(1).
Taylor, M. E.; Whiteson, S.; and Stone, P. 2007b. Transfer via inter-task mappings in policy search reinforcement learning. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems.
Torrey, L.; Shavlik, J.; Walker, T.; and Maclin, R. 2008. Relational macros for transfer in reinforcement learning. In Blockeel, H.; Ramon, J.; Shavlik, J.; and Tadepalli, P., eds., Inductive Logic Programming, volume 4894 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Voos, H., and Bou Ammar, H. 2010. Nonlinear tracking and landing controller for quadrotor aerial robots. In Proceedings of the IEEE International Conference on Control Applications (CCA).
Wang, C., and Mahadevan, S. 2009. Manifold alignment without correspondence. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI). Morgan Kaufmann.


More information

John Geweke a and Gianni Amisano b a Departments of Economics and Statistics, University of Iowa, USA b European Central Bank, Frankfurt, Germany

John Geweke a and Gianni Amisano b a Departments of Economics and Statistics, University of Iowa, USA b European Central Bank, Frankfurt, Germany Herarchcal Markov Normal Mxure models wh Applcaons o Fnancal Asse Reurns Appendx: Proofs of Theorems and Condonal Poseror Dsrbuons John Geweke a and Gann Amsano b a Deparmens of Economcs and Sascs, Unversy

More information

On One Analytic Method of. Constructing Program Controls

On One Analytic Method of. Constructing Program Controls Appled Mahemacal Scences, Vol. 9, 05, no. 8, 409-407 HIKARI Ld, www.m-hkar.com hp://dx.do.org/0.988/ams.05.54349 On One Analyc Mehod of Consrucng Program Conrols A. N. Kvko, S. V. Chsyakov and Yu. E. Balyna

More information

Solution in semi infinite diffusion couples (error function analysis)

Solution in semi infinite diffusion couples (error function analysis) Soluon n sem nfne dffuson couples (error funcon analyss) Le us consder now he sem nfne dffuson couple of wo blocks wh concenraon of and I means ha, n a A- bnary sysem, s bondng beween wo blocks made of

More information

Cubic Bezier Homotopy Function for Solving Exponential Equations

Cubic Bezier Homotopy Function for Solving Exponential Equations Penerb Journal of Advanced Research n Compung and Applcaons ISSN (onlne: 46-97 Vol. 4, No.. Pages -8, 6 omoopy Funcon for Solvng Eponenal Equaons S. S. Raml *,,. Mohamad Nor,a, N. S. Saharzan,b and M.

More information

Clustering (Bishop ch 9)

Clustering (Bishop ch 9) Cluserng (Bshop ch 9) Reference: Daa Mnng by Margare Dunham (a slde source) 1 Cluserng Cluserng s unsupervsed learnng, here are no class labels Wan o fnd groups of smlar nsances Ofen use a dsance measure

More information

Computing Relevance, Similarity: The Vector Space Model

Computing Relevance, Similarity: The Vector Space Model Compung Relevance, Smlary: The Vecor Space Model Based on Larson and Hears s sldes a UC-Bereley hp://.sms.bereley.edu/courses/s0/f00/ aabase Managemen Sysems, R. Ramarshnan ocumen Vecors v ocumens are

More information

DEEP UNFOLDING FOR MULTICHANNEL SOURCE SEPARATION SUPPLEMENTARY MATERIAL

DEEP UNFOLDING FOR MULTICHANNEL SOURCE SEPARATION SUPPLEMENTARY MATERIAL DEEP UNFOLDING FOR MULTICHANNEL SOURCE SEPARATION SUPPLEMENTARY MATERIAL Sco Wsdom, John Hershey 2, Jonahan Le Roux 2, and Shnj Waanabe 2 Deparmen o Elecrcal Engneerng, Unversy o Washngon, Seale, WA, USA

More information

Bayes rule for a classification problem INF Discriminant functions for the normal density. Euclidean distance. Mahalanobis distance

Bayes rule for a classification problem INF Discriminant functions for the normal density. Euclidean distance. Mahalanobis distance INF 43 3.. Repeon Anne Solberg (anne@f.uo.no Bayes rule for a classfcaon problem Suppose we have J, =,...J classes. s he class label for a pxel, and x s he observed feaure vecor. We can use Bayes rule

More information

Relative controllability of nonlinear systems with delays in control

Relative controllability of nonlinear systems with delays in control Relave conrollably o nonlnear sysems wh delays n conrol Jerzy Klamka Insue o Conrol Engneerng, Slesan Techncal Unversy, 44- Glwce, Poland. phone/ax : 48 32 37227, {jklamka}@a.polsl.glwce.pl Keywor: Conrollably.

More information

Comb Filters. Comb Filters

Comb Filters. Comb Filters The smple flers dscussed so far are characered eher by a sngle passband and/or a sngle sopband There are applcaons where flers wh mulple passbands and sopbands are requred Thecomb fler s an example of

More information

( ) [ ] MAP Decision Rule

( ) [ ] MAP Decision Rule Announcemens Bayes Decson Theory wh Normal Dsrbuons HW0 due oday HW o be assgned soon Proec descrpon posed Bomercs CSE 90 Lecure 4 CSE90, Sprng 04 CSE90, Sprng 04 Key Probables 4 ω class label X feaure

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecure Sldes for Machne Learnng nd Edon ETHEM ALPAYDIN, modfed by Leonardo Bobadlla and some pars from hp://www.cs.au.ac.l/~aparzn/machnelearnng/ The MIT Press, 00 alpaydn@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/mle

More information

Chapter 6: AC Circuits

Chapter 6: AC Circuits Chaper 6: AC Crcus Chaper 6: Oulne Phasors and he AC Seady Sae AC Crcus A sable, lnear crcu operang n he seady sae wh snusodal excaon (.e., snusodal seady sae. Complee response forced response naural response.

More information

HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD

HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD Journal of Appled Mahemacs and Compuaonal Mechancs 3, (), 45-5 HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD Sansław Kukla, Urszula Sedlecka Insue of Mahemacs,

More information

Single-loop System Reliability-Based Design & Topology Optimization (SRBDO/SRBTO): A Matrix-based System Reliability (MSR) Method

Single-loop System Reliability-Based Design & Topology Optimization (SRBDO/SRBTO): A Matrix-based System Reliability (MSR) Method 10 h US Naonal Congress on Compuaonal Mechancs Columbus, Oho 16-19, 2009 Sngle-loop Sysem Relably-Based Desgn & Topology Opmzaon (SRBDO/SRBTO): A Marx-based Sysem Relably (MSR) Mehod Tam Nguyen, Junho

More information

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Ths documen s downloaded from DR-NTU, Nanyang Technologcal Unversy Lbrary, Sngapore. Tle A smplfed verb machng algorhm for word paron n vsual speech processng( Acceped verson ) Auhor(s) Foo, Say We; Yong,

More information

FTCS Solution to the Heat Equation

FTCS Solution to the Heat Equation FTCS Soluon o he Hea Equaon ME 448/548 Noes Gerald Reckenwald Porland Sae Unversy Deparmen of Mechancal Engneerng gerry@pdxedu ME 448/548: FTCS Soluon o he Hea Equaon Overvew Use he forward fne d erence

More information

[ ] 2. [ ]3 + (Δx i + Δx i 1 ) / 2. Δx i-1 Δx i Δx i+1. TPG4160 Reservoir Simulation 2018 Lecture note 3. page 1 of 5

[ ] 2. [ ]3 + (Δx i + Δx i 1 ) / 2. Δx i-1 Δx i Δx i+1. TPG4160 Reservoir Simulation 2018 Lecture note 3. page 1 of 5 TPG460 Reservor Smulaon 08 page of 5 DISCRETIZATIO OF THE FOW EQUATIOS As we already have seen, fne dfference appromaons of he paral dervaves appearng n he flow equaons may be obaned from Taylor seres

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 9 Hamlonan Equaons of Moon (Chaper 8) Wha We Dd Las Tme Consruced Hamlonan formalsm H ( q, p, ) = q p L( q, q, ) H p = q H q = p H = L Equvalen o Lagrangan formalsm Smpler, bu

More information

Volatility Interpolation

Volatility Interpolation Volaly Inerpolaon Prelmnary Verson March 00 Jesper Andreasen and Bran Huge Danse Mares, Copenhagen wan.daddy@danseban.com brno@danseban.com Elecronc copy avalable a: hp://ssrn.com/absrac=69497 Inro Local

More information

Let s treat the problem of the response of a system to an applied external force. Again,

Let s treat the problem of the response of a system to an applied external force. Again, Page 33 QUANTUM LNEAR RESPONSE FUNCTON Le s rea he problem of he response of a sysem o an appled exernal force. Agan, H() H f () A H + V () Exernal agen acng on nernal varable Hamlonan for equlbrum sysem

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 9 Hamlonan Equaons of Moon (Chaper 8) Wha We Dd Las Tme Consruced Hamlonan formalsm Hqp (,,) = qp Lqq (,,) H p = q H q = p H L = Equvalen o Lagrangan formalsm Smpler, bu wce as

More information

Machine Learning Linear Regression

Machine Learning Linear Regression Machne Learnng Lnear Regresson Lesson 3 Lnear Regresson Bascs of Regresson Leas Squares esmaon Polynomal Regresson Bass funcons Regresson model Regularzed Regresson Sascal Regresson Mamum Lkelhood (ML)

More information

. The geometric multiplicity is dim[ker( λi. number of linearly independent eigenvectors associated with this eigenvalue.

. The geometric multiplicity is dim[ker( λi. number of linearly independent eigenvectors associated with this eigenvalue. Lnear Algebra Lecure # Noes We connue wh he dscusson of egenvalues, egenvecors, and dagonalzably of marces We wan o know, n parcular wha condons wll assure ha a marx can be dagonalzed and wha he obsrucons

More information

Introduction to Boosting

Introduction to Boosting Inroducon o Boosng Cynha Rudn PACM, Prnceon Unversy Advsors Ingrd Daubeches and Rober Schapre Say you have a daabase of news arcles, +, +, -, -, +, +, -, -, +, +, -, -, +, +, -, + where arcles are labeled

More information

Time-interval analysis of β decay. V. Horvat and J. C. Hardy

Time-interval analysis of β decay. V. Horvat and J. C. Hardy Tme-nerval analyss of β decay V. Horva and J. C. Hardy Work on he even analyss of β decay [1] connued and resuled n he developmen of a novel mehod of bea-decay me-nerval analyss ha produces hghly accurae

More information

Advanced Machine Learning & Perception

Advanced Machine Learning & Perception Advanced Machne Learnng & Percepon Insrucor: Tony Jebara SVM Feaure & Kernel Selecon SVM Eensons Feaure Selecon (Flerng and Wrappng) SVM Feaure Selecon SVM Kernel Selecon SVM Eensons Classfcaon Feaure/Kernel

More information

Ordinary Differential Equations in Neuroscience with Matlab examples. Aim 1- Gain understanding of how to set up and solve ODE s

Ordinary Differential Equations in Neuroscience with Matlab examples. Aim 1- Gain understanding of how to set up and solve ODE s Ordnary Dfferenal Equaons n Neuroscence wh Malab eamples. Am - Gan undersandng of how o se up and solve ODE s Am Undersand how o se up an solve a smple eample of he Hebb rule n D Our goal a end of class

More information

TSS = SST + SSE An orthogonal partition of the total SS

TSS = SST + SSE An orthogonal partition of the total SS ANOVA: Topc 4. Orhogonal conrass [ST&D p. 183] H 0 : µ 1 = µ =... = µ H 1 : The mean of a leas one reamen group s dfferen To es hs hypohess, a basc ANOVA allocaes he varaon among reamen means (SST) equally

More information

Exact Dynamic Programming for Decentralized POMDPs with Lossless Policy Compression

Exact Dynamic Programming for Decentralized POMDPs with Lossless Policy Compression Proceedngs of he Egheenh Inernaonal Conference on Auomaed Plannng and Schedulng (ICAPS 2008) Exac Dynamc Programmng for Decenralzed POMDPs wh Lossless Polcy Compresson Abdeslam Boularas and Brahm Chab-draa

More information

Reactive Methods to Solve the Berth AllocationProblem with Stochastic Arrival and Handling Times

Reactive Methods to Solve the Berth AllocationProblem with Stochastic Arrival and Handling Times Reacve Mehods o Solve he Berh AllocaonProblem wh Sochasc Arrval and Handlng Tmes Nsh Umang* Mchel Berlare* * TRANSP-OR, Ecole Polyechnque Fédérale de Lausanne Frs Workshop on Large Scale Opmzaon November

More information

Lecture 2 L n i e n a e r a M od o e d l e s

Lecture 2 L n i e n a e r a M od o e d l e s Lecure Lnear Models Las lecure You have learned abou ha s machne learnng Supervsed learnng Unsupervsed learnng Renforcemen learnng You have seen an eample learnng problem and he general process ha one

More information

M. Y. Adamu Mathematical Sciences Programme, AbubakarTafawaBalewa University, Bauchi, Nigeria

M. Y. Adamu Mathematical Sciences Programme, AbubakarTafawaBalewa University, Bauchi, Nigeria IOSR Journal of Mahemacs (IOSR-JM e-issn: 78-578, p-issn: 9-765X. Volume 0, Issue 4 Ver. IV (Jul-Aug. 04, PP 40-44 Mulple SolonSoluons for a (+-dmensonalhroa-sasuma shallow waer wave equaon UsngPanlevé-Bӓclund

More information

Dual Approximate Dynamic Programming for Large Scale Hydro Valleys

Dual Approximate Dynamic Programming for Large Scale Hydro Valleys Dual Approxmae Dynamc Programmng for Large Scale Hydro Valleys Perre Carpener and Jean-Phlppe Chanceler 1 ENSTA ParsTech and ENPC ParsTech CMM Workshop, January 2016 1 Jon work wh J.-C. Alas, suppored

More information

WiH Wei He

WiH Wei He Sysem Idenfcaon of onlnear Sae-Space Space Baery odels WH We He wehe@calce.umd.edu Advsor: Dr. Chaochao Chen Deparmen of echancal Engneerng Unversy of aryland, College Par 1 Unversy of aryland Bacground

More information

P R = P 0. The system is shown on the next figure:

P R = P 0. The system is shown on the next figure: TPG460 Reservor Smulaon 08 page of INTRODUCTION TO RESERVOIR SIMULATION Analycal and numercal soluons of smple one-dmensonal, one-phase flow equaons As an nroducon o reservor smulaon, we wll revew he smples

More information

Lecture 11 SVM cont

Lecture 11 SVM cont Lecure SVM con. 0 008 Wha we have done so far We have esalshed ha we wan o fnd a lnear decson oundary whose margn s he larges We know how o measure he margn of a lnear decson oundary Tha s: he mnmum geomerc

More information

Li An-Ping. Beijing , P.R.China

Li An-Ping. Beijing , P.R.China A New Type of Cpher: DICING_csb L An-Png Bejng 100085, P.R.Chna apl0001@sna.com Absrac: In hs paper, we wll propose a new ype of cpher named DICING_csb, whch s derved from our prevous sream cpher DICING.

More information

J i-1 i. J i i+1. Numerical integration of the diffusion equation (I) Finite difference method. Spatial Discretization. Internal nodes.

J i-1 i. J i i+1. Numerical integration of the diffusion equation (I) Finite difference method. Spatial Discretization. Internal nodes. umercal negraon of he dffuson equaon (I) Fne dfference mehod. Spaal screaon. Inernal nodes. R L V For hermal conducon le s dscree he spaal doman no small fne spans, =,,: Balance of parcles for an nernal

More information

CS286.2 Lecture 14: Quantum de Finetti Theorems II

CS286.2 Lecture 14: Quantum de Finetti Theorems II CS286.2 Lecure 14: Quanum de Fne Theorems II Scrbe: Mara Okounkova 1 Saemen of he heorem Recall he las saemen of he quanum de Fne heorem from he prevous lecure. Theorem 1 Quanum de Fne). Le ρ Dens C 2

More information

Performance Analysis for a Network having Standby Redundant Unit with Waiting in Repair

Performance Analysis for a Network having Standby Redundant Unit with Waiting in Repair TECHNI Inernaonal Journal of Compung Scence Communcaon Technologes VOL.5 NO. July 22 (ISSN 974-3375 erformance nalyss for a Nework havng Sby edundan Un wh ang n epar Jendra Sngh 2 abns orwal 2 Deparmen

More information

Math 128b Project. Jude Yuen

Math 128b Project. Jude Yuen Mah 8b Proec Jude Yuen . Inroducon Le { Z } be a sequence of observed ndependen vecor varables. If he elemens of Z have a on normal dsrbuon hen { Z } has a mean vecor Z and a varancecovarance marx z. Geomercally

More information

. The geometric multiplicity is dim[ker( λi. A )], i.e. the number of linearly independent eigenvectors associated with this eigenvalue.

. The geometric multiplicity is dim[ker( λi. A )], i.e. the number of linearly independent eigenvectors associated with this eigenvalue. Mah E-b Lecure #0 Noes We connue wh he dscusson of egenvalues, egenvecors, and dagonalzably of marces We wan o know, n parcular wha condons wll assure ha a marx can be dagonalzed and wha he obsrucons are

More information

Density Matrix Description of NMR BCMB/CHEM 8190

Density Matrix Description of NMR BCMB/CHEM 8190 Densy Marx Descrpon of NMR BCMBCHEM 89 Operaors n Marx Noaon Alernae approach o second order specra: ask abou x magnezaon nsead of energes and ranson probables. If we say wh one bass se, properes vary

More information

Graduate Macroeconomics 2 Problem set 5. - Solutions

Graduate Macroeconomics 2 Problem set 5. - Solutions Graduae Macroeconomcs 2 Problem se. - Soluons Queson 1 To answer hs queson we need he frms frs order condons and he equaon ha deermnes he number of frms n equlbrum. The frms frs order condons are: F K

More information

Notes on the stability of dynamic systems and the use of Eigen Values.

Notes on the stability of dynamic systems and the use of Eigen Values. Noes on he sabl of dnamc ssems and he use of Egen Values. Source: Macro II course noes, Dr. Davd Bessler s Tme Seres course noes, zarads (999) Ineremporal Macroeconomcs chaper 4 & Techncal ppend, and Hamlon

More information

Approximate Analytic Solution of (2+1) - Dimensional Zakharov-Kuznetsov(Zk) Equations Using Homotopy

Approximate Analytic Solution of (2+1) - Dimensional Zakharov-Kuznetsov(Zk) Equations Using Homotopy Arcle Inernaonal Journal of Modern Mahemacal Scences, 4, (): - Inernaonal Journal of Modern Mahemacal Scences Journal homepage: www.modernscenfcpress.com/journals/jmms.aspx ISSN: 66-86X Florda, USA Approxmae

More information

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6)

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6) Econ7 Appled Economercs Topc 5: Specfcaon: Choosng Independen Varables (Sudenmund, Chaper 6 Specfcaon errors ha we wll deal wh: wrong ndependen varable; wrong funconal form. Ths lecure deals wh wrong ndependen

More information

THE PREDICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS

THE PREDICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS THE PREICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS INTROUCTION The wo dmensonal paral dfferenal equaons of second order can be used for he smulaon of compeve envronmen n busness The arcle presens he

More information

Learning Objectives. Self Organization Map. Hamming Distance(1/5) Introduction. Hamming Distance(3/5) Hamming Distance(2/5) 15/04/2015

Learning Objectives. Self Organization Map. Hamming Distance(1/5) Introduction. Hamming Distance(3/5) Hamming Distance(2/5) 15/04/2015 /4/ Learnng Objecves Self Organzaon Map Learnng whou Exaples. Inroducon. MAXNET 3. Cluserng 4. Feaure Map. Self-organzng Feaure Map 6. Concluson 38 Inroducon. Learnng whou exaples. Daa are npu o he syse

More information

Appendix H: Rarefaction and extrapolation of Hill numbers for incidence data

Appendix H: Rarefaction and extrapolation of Hill numbers for incidence data Anne Chao Ncholas J Goell C seh lzabeh L ander K Ma Rober K Colwell and Aaron M llson 03 Rarefacon and erapolaon wh ll numbers: a framewor for samplng and esmaon n speces dversy sudes cology Monographs

More information

EEL 6266 Power System Operation and Control. Chapter 5 Unit Commitment

EEL 6266 Power System Operation and Control. Chapter 5 Unit Commitment EEL 6266 Power Sysem Operaon and Conrol Chaper 5 Un Commmen Dynamc programmng chef advanage over enumeraon schemes s he reducon n he dmensonaly of he problem n a src prory order scheme, here are only N

More information

Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms

Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms Course organzaon Inroducon Wee -2) Course nroducon A bref nroducon o molecular bology A bref nroducon o sequence comparson Par I: Algorhms for Sequence Analyss Wee 3-8) Chaper -3, Models and heores» Probably

More information

On computing differential transform of nonlinear non-autonomous functions and its applications

On computing differential transform of nonlinear non-autonomous functions and its applications On compung dfferenal ransform of nonlnear non-auonomous funcons and s applcaons Essam. R. El-Zahar, and Abdelhalm Ebad Deparmen of Mahemacs, Faculy of Scences and Humanes, Prnce Saam Bn Abdulazz Unversy,

More information

3. OVERVIEW OF NUMERICAL METHODS

3. OVERVIEW OF NUMERICAL METHODS 3 OVERVIEW OF NUMERICAL METHODS 3 Inroducory remarks Ths chaper summarzes hose numercal echnques whose knowledge s ndspensable for he undersandng of he dfferen dscree elemen mehods: he Newon-Raphson-mehod,

More information

Fall 2010 Graduate Course on Dynamic Learning

Fall 2010 Graduate Course on Dynamic Learning Fall 200 Graduae Course on Dynamc Learnng Chaper 4: Parcle Flers Sepember 27, 200 Byoung-Tak Zhang School of Compuer Scence and Engneerng & Cognve Scence and Bran Scence Programs Seoul aonal Unversy hp://b.snu.ac.kr/~bzhang/

More information

ON THE WEAK LIMITS OF SMOOTH MAPS FOR THE DIRICHLET ENERGY BETWEEN MANIFOLDS

ON THE WEAK LIMITS OF SMOOTH MAPS FOR THE DIRICHLET ENERGY BETWEEN MANIFOLDS ON THE WEA LIMITS OF SMOOTH MAPS FOR THE DIRICHLET ENERGY BETWEEN MANIFOLDS FENGBO HANG Absrac. We denfy all he weak sequenal lms of smooh maps n W (M N). In parcular, hs mples a necessary su cen opologcal

More information

Existence and Uniqueness Results for Random Impulsive Integro-Differential Equation

Existence and Uniqueness Results for Random Impulsive Integro-Differential Equation Global Journal of Pure and Appled Mahemacs. ISSN 973-768 Volume 4, Number 6 (8), pp. 89-87 Research Inda Publcaons hp://www.rpublcaon.com Exsence and Unqueness Resuls for Random Impulsve Inegro-Dfferenal

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 0 Canoncal Transformaons (Chaper 9) Wha We Dd Las Tme Hamlon s Prncple n he Hamlonan formalsm Dervaon was smple δi δ Addonal end-pon consrans pq H( q, p, ) d 0 δ q ( ) δq ( ) δ

More information

Epistemic Game Theory: Online Appendix

Epistemic Game Theory: Online Appendix Epsemc Game Theory: Onlne Appendx Edde Dekel Lucano Pomao Marcano Snscalch July 18, 2014 Prelmnares Fx a fne ype srucure T I, S, T, β I and a probably µ S T. Le T µ I, S, T µ, βµ I be a ype srucure ha

More information

Advanced Macroeconomics II: Exchange economy

Advanced Macroeconomics II: Exchange economy Advanced Macroeconomcs II: Exchange economy Krzyszof Makarsk 1 Smple deermnsc dynamc model. 1.1 Inroducon Inroducon Smple deermnsc dynamc model. Defnons of equlbrum: Arrow-Debreu Sequenal Recursve Equvalence

More information

Genetic Algorithm in Parameter Estimation of Nonlinear Dynamic Systems

Genetic Algorithm in Parameter Estimation of Nonlinear Dynamic Systems Genec Algorhm n Parameer Esmaon of Nonlnear Dynamc Sysems E. Paeraks manos@egnaa.ee.auh.gr V. Perds perds@vergna.eng.auh.gr Ah. ehagas kehagas@egnaa.ee.auh.gr hp://skron.conrol.ee.auh.gr/kehagas/ndex.hm

More information

12d Model. Civil and Surveying Software. Drainage Analysis Module Detention/Retention Basins. Owen Thornton BE (Mech), 12d Model Programmer

12d Model. Civil and Surveying Software. Drainage Analysis Module Detention/Retention Basins. Owen Thornton BE (Mech), 12d Model Programmer d Model Cvl and Surveyng Soware Dranage Analyss Module Deenon/Reenon Basns Owen Thornon BE (Mech), d Model Programmer owen.hornon@d.com 4 January 007 Revsed: 04 Aprl 007 9 February 008 (8Cp) Ths documen

More information

Part II CONTINUOUS TIME STOCHASTIC PROCESSES

Part II CONTINUOUS TIME STOCHASTIC PROCESSES Par II CONTINUOUS TIME STOCHASTIC PROCESSES 4 Chaper 4 For an advanced analyss of he properes of he Wener process, see: Revus D and Yor M: Connuous marngales and Brownan Moon Karazas I and Shreve S E:

More information

A NEW TECHNIQUE FOR SOLVING THE 1-D BURGERS EQUATION

A NEW TECHNIQUE FOR SOLVING THE 1-D BURGERS EQUATION S19 A NEW TECHNIQUE FOR SOLVING THE 1-D BURGERS EQUATION by Xaojun YANG a,b, Yugu YANG a*, Carlo CATTANI c, and Mngzheng ZHU b a Sae Key Laboraory for Geomechancs and Deep Underground Engneerng, Chna Unversy

More information

The Analysis of the Thickness-predictive Model Based on the SVM Xiu-ming Zhao1,a,Yan Wang2,band Zhimin Bi3,c

The Analysis of the Thickness-predictive Model Based on the SVM Xiu-ming Zhao1,a,Yan Wang2,band Zhimin Bi3,c h Naonal Conference on Elecrcal, Elecroncs and Compuer Engneerng (NCEECE The Analyss of he Thcknesspredcve Model Based on he SVM Xumng Zhao,a,Yan Wang,band Zhmn B,c School of Conrol Scence and Engneerng,

More information

Comparison of Differences between Power Means 1

Comparison of Differences between Power Means 1 In. Journal of Mah. Analyss, Vol. 7, 203, no., 5-55 Comparson of Dfferences beween Power Means Chang-An Tan, Guanghua Sh and Fe Zuo College of Mahemacs and Informaon Scence Henan Normal Unversy, 453007,

More information

Attribute Reduction Algorithm Based on Discernibility Matrix with Algebraic Method GAO Jing1,a, Ma Hui1, Han Zhidong2,b

Attribute Reduction Algorithm Based on Discernibility Matrix with Algebraic Method GAO Jing1,a, Ma Hui1, Han Zhidong2,b Inernaonal Indusral Informacs and Compuer Engneerng Conference (IIICEC 05) Arbue educon Algorhm Based on Dscernbly Marx wh Algebrac Mehod GAO Jng,a, Ma Hu, Han Zhdong,b Informaon School, Capal Unversy

More information

Single and Multiple Object Tracking Using a Multi-Feature Joint Sparse Representation

Single and Multiple Object Tracking Using a Multi-Feature Joint Sparse Representation Sngle and Mulple Objec Trackng Usng a Mul-Feaure Jon Sparse Represenaon Wemng Hu, We L, and Xaoqn Zhang (Naonal Laboraory of Paern Recognon, Insue of Auomaon, Chnese Academy of Scences, Bejng 100190) {wmhu,

More information

Lecture 18: The Laplace Transform (See Sections and 14.7 in Boas)

Lecture 18: The Laplace Transform (See Sections and 14.7 in Boas) Lecure 8: The Lalace Transform (See Secons 88- and 47 n Boas) Recall ha our bg-cure goal s he analyss of he dfferenal equaon, ax bx cx F, where we emloy varous exansons for he drvng funcon F deendng on

More information

Large-Scale Optimistic Adaptive Submodularity

Large-Scale Optimistic Adaptive Submodularity Vcor Gabllon INRIA Llle - eam SequeL Vlleneuve d Ascq, France vcor.gabllon@nra.fr Large-Scale Opmsc Adapve Submodulary Bran Erksson Techncolor Labs Palo Alo, CA bran.erksson@echncolor.com Absrac Maxmzaon

More information

CS 536: Machine Learning. Nonparametric Density Estimation Unsupervised Learning - Clustering

CS 536: Machine Learning. Nonparametric Density Estimation Unsupervised Learning - Clustering CS 536: Machne Learnng Nonparamerc Densy Esmaon Unsupervsed Learnng - Cluserng Fall 2005 Ahmed Elgammal Dep of Compuer Scence Rugers Unversy CS 536 Densy Esmaon - Cluserng - 1 Oulnes Densy esmaon Nonparamerc

More information

GENERATING CERTAIN QUINTIC IRREDUCIBLE POLYNOMIALS OVER FINITE FIELDS. Youngwoo Ahn and Kitae Kim

GENERATING CERTAIN QUINTIC IRREDUCIBLE POLYNOMIALS OVER FINITE FIELDS. Youngwoo Ahn and Kitae Kim Korean J. Mah. 19 (2011), No. 3, pp. 263 272 GENERATING CERTAIN QUINTIC IRREDUCIBLE POLYNOMIALS OVER FINITE FIELDS Youngwoo Ahn and Kae Km Absrac. In he paper [1], an explc correspondence beween ceran

More information

Dynamic Team Decision Theory

Dynamic Team Decision Theory Dynamc Team Decson Theory EECS 558 Proec Repor Shruvandana Sharma and Davd Shuman December, 005 I. Inroducon Whle he sochasc conrol problem feaures one decson maker acng over me, many complex conrolled

More information

Density Matrix Description of NMR BCMB/CHEM 8190

Density Matrix Description of NMR BCMB/CHEM 8190 Densy Marx Descrpon of NMR BCMBCHEM 89 Operaors n Marx Noaon If we say wh one bass se, properes vary only because of changes n he coeffcens weghng each bass se funcon x = h< Ix > - hs s how we calculae

More information

FI 3103 Quantum Physics

FI 3103 Quantum Physics /9/4 FI 33 Quanum Physcs Aleander A. Iskandar Physcs of Magnesm and Phooncs Research Grou Insu Teknolog Bandung Basc Conces n Quanum Physcs Probably and Eecaon Value Hesenberg Uncerany Prncle Wave Funcon

More information

Including the ordinary differential of distance with time as velocity makes a system of ordinary differential equations.

Including the ordinary differential of distance with time as velocity makes a system of ordinary differential equations. Soluons o Ordnary Derenal Equaons An ordnary derenal equaon has only one ndependen varable. A sysem o ordnary derenal equaons consss o several derenal equaons each wh he same ndependen varable. An eample

More information

New M-Estimator Objective Function. in Simultaneous Equations Model. (A Comparative Study)

New M-Estimator Objective Function. in Simultaneous Equations Model. (A Comparative Study) Inernaonal Mahemacal Forum, Vol. 8, 3, no., 7 - HIKARI Ld, www.m-hkar.com hp://dx.do.org/.988/mf.3.3488 New M-Esmaor Objecve Funcon n Smulaneous Equaons Model (A Comparave Sudy) Ahmed H. Youssef Professor

More information

2. SPATIALLY LAGGED DEPENDENT VARIABLES

2. SPATIALLY LAGGED DEPENDENT VARIABLES 2. SPATIALLY LAGGED DEPENDENT VARIABLES In hs chaper, we descrbe a sascal model ha ncorporaes spaal dependence explcly by addng a spaally lagged dependen varable y on he rgh-hand sde of he regresson equaon.

More information

10. A.C CIRCUITS. Theoretically current grows to maximum value after infinite time. But practically it grows to maximum after 5τ. Decay of current :

10. A.C CIRCUITS. Theoretically current grows to maximum value after infinite time. But practically it grows to maximum after 5τ. Decay of current : . A. IUITS Synopss : GOWTH OF UNT IN IUIT : d. When swch S s closed a =; = d. A me, curren = e 3. The consan / has dmensons of me and s called he nducve me consan ( τ ) of he crcu. 4. = τ; =.63, n one

More information

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov June 7 e-ournal Relably: Theory& Applcaons No (Vol. CONFIDENCE INTERVALS ASSOCIATED WITH PERFORMANCE ANALYSIS OF SYMMETRIC LARGE CLOSED CLIENT/SERVER COMPUTER NETWORKS Absrac Vyacheslav Abramov School

More information

Hidden Markov Models Following a lecture by Andrew W. Moore Carnegie Mellon University

Hidden Markov Models Following a lecture by Andrew W. Moore Carnegie Mellon University Hdden Markov Models Followng a lecure by Andrew W. Moore Carnege Mellon Unversy www.cs.cmu.edu/~awm/uorals A Markov Sysem Has N saes, called s, s 2.. s N s 2 There are dscree meseps, 0,, s s 3 N 3 0 Hdden

More information

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC CH.3. COMPATIBILITY EQUATIONS Connuum Mechancs Course (MMC) - ETSECCPB - UPC Overvew Compably Condons Compably Equaons of a Poenal Vecor Feld Compably Condons for Infnesmal Srans Inegraon of he Infnesmal

More information

Supplementary Material to: IMU Preintegration on Manifold for E cient Visual-Inertial Maximum-a-Posteriori Estimation

Supplementary Material to: IMU Preintegration on Manifold for E cient Visual-Inertial Maximum-a-Posteriori Estimation Supplemenary Maeral o: IMU Prenegraon on Manfold for E cen Vsual-Ineral Maxmum-a-Poseror Esmaon echncal Repor G-IRIM-CP&R-05-00 Chrsan Forser, Luca Carlone, Fran Dellaer, and Davde Scaramuzza May 0, 05

More information

Comparison of Supervised & Unsupervised Learning in βs Estimation between Stocks and the S&P500

Comparison of Supervised & Unsupervised Learning in βs Estimation between Stocks and the S&P500 Comparson of Supervsed & Unsupervsed Learnng n βs Esmaon beween Socks and he S&P500 J. We, Y. Hassd, J. Edery, A. Becker, Sanford Unversy T I. INTRODUCTION HE goal of our proec s o analyze he relaonshps

More information

The Dynamic Programming Models for Inventory Control System with Time-varying Demand

The Dynamic Programming Models for Inventory Control System with Time-varying Demand The Dynamc Programmng Models for Invenory Conrol Sysem wh Tme-varyng Demand Truong Hong Trnh (Correspondng auhor) The Unversy of Danang, Unversy of Economcs, Venam Tel: 84-236-352-5459 E-mal: rnh.h@due.edu.vn

More information