A Reinforcement Learning Approach for Collaborative Filtering

Size: px
Start display at page:

Download "A Reinforcement Learning Approach for Collaborative Filtering"

Transcription

1 A Reinforcemen Learning Approach for Collaboraive Filering Jungkyu Lee, Byonghwa Oh 2, Jihoon Yang 2, and Sungyong Park 2 Cyram Inc, Seoul, Korea jklee@cyram.com 2 Sogang Universiy, Seoul, Korea {mrfive,yangjh,parksy}@sogang.ac.kr Absrac. In recen years, here has been much ineres in recommender sysems ha provide users wih personalized suggesions for producs or services. In his paper we presens a reinforcemen learning approach for collaboraive filering. Firs, we formalize he collaboraive filering problem as a Markov Decision Process. Then, we learn he connecion beween he ime sequence of user raings using Q-learning. Experimens demonsrae he feasibiliy of our approach and a igh relaionship beween he pas and curren raings. Keywords: recommender sysems, markov decision process, Q-learning, singular value decomposiion Inroducion Recommender sysems provide users wih personalized suggesions for producs or services. Recommender sysems can be combined wih various echnologies including auonomous driving and make appropriae suggesions in differen spaioemporal siuaions. Collaboraive Filering (CF) is one of he mos acive domain in recommender sysems since i avoids using informaion abou he conen, bu raher i uses hisorical daa (e.g. user raings). The daa of CF can be represened as a large sparse marix wih (user, iem, raing) enries, and he goal is o predic he missing enries of users s raings for iems hey have no ye considered. Many researchers have developed CF algorihms including Resriced Bolzmann Machine (RBM)[8], K-Neares Neighbor(KNN)[] and Singular Value Decomposiion (SVD)[5]. This paper presens a new CF algorihm based on reinforcemen learning, RLCF. Reinforcemen learning has been successful in many applicaions [3]. However, o he bes of our knowledge, here has been no aemp o apply reinforcemen learning o CF. The moivaion for applying reinforcemen learning o CF comes from he conras effec, which is he enhancemen or diminishmen of a percepion and relaed performance as a resul of immediae previous or simulaneous exposure o he simulus in he same dimension, wih a smaller or greaer value [0]. The conras effec can ake place in CF. For insance, a user can evaluae he same movie differenly depending on wha he wached

2 previously. Thus, he predicion of a user s raings for an unseen movie can be opimized by aking ino accoun he sequence of movies he has seen. Agains his background, we aemp o formulae he conras effec on CF as a Markov Decision Process (MDP) and apply one of he popular reinforcemen learning algorihms, Q-learning [9, ] o i. 2 Preliminaries 2. Problem Definiion Suppose he user-movie daase consis of raings (of 5 sars) for N users and M movies. Le X R N M be he marix of he raings. Le A R N M be a marix including enry (user, movie, raing) X as real (correc) raings. The performance of a CF algorihm can be measured by he error beween he predicion values of he algorihm and A. Le I {0, } N M is an indicaor funcion such ha: I ij = { if movie j is raed by user i in A 0 if he raings is missing in A Le P R N M be he marix of predicions made by he CF algorihm. The goal is o esimae he missing enry of X such ha he roo mean squared error (RMSE) is minimal. RMSE is defined as follows: n m i= j= RMSE(P, A) = I ij(a ij P ij ) 2 n m i= j= I (2) ij () 2.2 Problem Fomulaion as MDP MDP provides a mahemaical framework for a sequenial decision problem [2, 6], and has been used for a wide range of opimizaion problems. MDP consiss of a uple of 5 elemens (S, A, {P sa }, γ, R) where Sae S: The sae S is defined as he raings of movies. If he range of raings is ineger o K, he size of sae S is K. s (i) S refers o he raing of he -h movie ha user i wached. Acion A: As defined above, s (i), and s (i) + refer o he raings of -h movie and ( + )-h ha user i wached, respecively. Sae s (i) by aking acion a (i) A. We represen his process as follows: s (i) ransiions o s (i) + a (i) s (i) + (3) Noe ha A = S. Transiion Probabiliy P sa : We assume ha he MDP for CF is deerminisic. Tha is, if a user akes acion a (i) a sae s (i), ransiion o he nex sae s (i) + is no random. Since A = S and he MDP is deerminisic, we can see ha a (i) = s (i) + in our problem.

3 Table : Example of (user, movie, raing, order) daa movie movie 2 movie 3 movie 4 movie 5 user (2,s) missing (5,2nd) missing (2,3h) user 2 (,s) (,2nd) missing (3,3h) missing user 3 (5,s) missing missing (3,2nd) (5,3h) user 4 (,s) missing missing missing (4,2nd) user 5 (2,s) (4,2nd) (5,3h) (5,4h) missing Discoun facor γ: 0 < γ < is called he discoun facor. Reward r: When user i akes acion a (i) a sae s (i), user i receives a reward or penaly signals defined as follows: r(s (i), a (i) ) = s (i) +2 predicor(i, ) (4) predicor(i, ) is he predicion ha a CF algorihm esimaes and s (i) +2 is he nex, nex sae. The raionale behind (4) will be described in Secion Generaing Episodes As menioned earlier, X R N M consiss of (user, movie, raing) paired enries. If we use he CF daa wih (user, movie, raing, order) paired enries, we can generae he episodes as follows: s (i) j s () s (2) s (N) a () a () 2 a () 3 s 2 s 3 a() T s () a (2) T a (2) 2 a (2) 3 s 2 s 3 a(2) T s (2) a (N) s 2. a (N) 2 s 3 a (N) 3 T a(n) T s (N) is he raing ha user i scores for j-h wached movie. For insance, if he (user, movie, raing, order) enries of user-movie daa are like Table (The enry (,, 2, s) means ha user wached movie firs and gave 2 sars), he episodes are generaed as follows: T

4 3 RLCF RLCF consiss of wo phases: raining and predicion. The former is o compue he Q-able of he MDP, and he laer is o predic he rae for a user-movie pair using boh he oupu of a CF algorihm and he value of he Q-able. 3. Training Based on he MDP formulaion and preliminaries, we learn he Q funcion using Q-learning which can be summarized as (see [9, ] for deailed derivaion): Q(s, a) = Q(s, a) + α[r(s, a) + max a Q(δ(s, a), a ) Q(s, a)] (5) where Q(s, a) is he old esimaed sum of reward, r(s, a) + max a Q(δ(s, a), a ) is new esimaed sum of reward afer a sep forward, and α is he learning rae. Since he number of possible acions in sae s is S, Q(s, a) is a S S able. The episodes generaed as described in Secion 2.3 are used as raining daa for Q-learning. Here is he raining algorihm: Algorihm Training Algorihm in RLCF : Inpu: α : learning rae γ : discoun facor T i : he number of movies user i has raed 2: Oupu: Q(s, a) : he esimaed sum of reward 3: Iniialize s S, a A Q(s, a) = 0; 4: Conver X R N M o raining episode se as in Secion 2.3; 5: for each user i = : N do 6: for each movie j = : T i do 7: Calculae he reward: r(s (i) j, a(i) j ) = s(i) j+2 predicor(i, j); 8: Updae he Q-funcion Q(s, a) using eq (5); 9: end for 0: end for 3.2 Predicion The predicion p (i) j of he enry x ij X ha user i gives for movie j can be calculaed by as follows: p (i) j = predicor(i, j) + Q(s i j 2, a i j 2) (6) In eq (4), he reason why we used he nex nex sae s (i) +2, no he nex sae, in he reward funcion r(s(i), a (i) ) is relaed o eq (6). If we used he nex s (i) +

5 Table 2: The RMSE of RLCF Algorihm Base RLCF improvemen γ α MM SVD SVD sae s (i) +, predicion p(i) j mus be defined as p (i) j = predicor(i, j) + Q(s i j, a i j ) (7) However, i is impossible o know Q(s i j, ai j ) = Q(si j, si j ) when we ry o predic p (i) j. This is because he parameer si j is exacly he raing we wan o predic. 4 Experimen 4. Daa se We adoped MovieLens daa se which conains 0,000,054 raings applied o 0,68 movies by 7,567 users of he online movie recommender service [7]. Users who raed a leas 20 movies were chosen and heir raings were sored chronologically. 20% he mos recen movie raing of he enire daa is used for esing, and he remaining 80% is used for raining. 4.2 Base Predicor As a base predicor for CF, we adoped SVD and SVD++ which is an improvemen over sandard SVD by incorporaing implici feedback, implemened by sochasic gradien descen. (See [4] for deailed descripions on he algorihm.) In addiion, we added a simple movie means (MM) algorihm ha oupus he mean of movie raings. 4.3 Resuls The sae S is defined wih raings in [0.5, 5.0] wih 0.5 inervals, making S = 0 and Q(s, a) is a 0 0 able. We compared he RMSE of RLCF combined wih base predicors wih ha of pure base predicors. We rained SVD, SVD++ predicors wih 0 laen feaures. There are wo parameers in RLCF o be uned: he learning rae α and he discoun facor γ of Q-learning. Tables 2 shows experimenal resuls for each predicor wih various parameer seings. As expeced, SVD++ produced he bes performance while MM produced he wors. RLCF improved he performance regardless of he base predicor. Tha is, MM, SVD, and SVD++ wih RLCF predicor reduced RMSE by , , respecively han is pure predicor. This verifies our idea of incorporaing he conras effec via reinforcemen learning.

6 5 Conclusion We presened a new reinforcemen learning based algorihm, RLCF, for CF. We firs formalize he CF as MDP o model he effec ha he sequences of a users previous raings influence curren raings. Then, he effec is learned and recorded in a Q-able via Q-learning. The experimenal resuls show ha he order of pas raings affec he curren raings significanly, and is hus useful o elici more accurae predicions. There are some ineresing direcions for fuure work. When we formalize he CF as MDP, saes can be defined by oher facors as well as raings. For example, he genre, season, and diverse ags can be defined as saes in MDP. If he movie genre is used, RLCF can discover he effec ha influences raings for acion movies when a user waches an acion movie afer having wached a comedy. In addiion, RLCF can be combined wih oher emerging echnologies o provide relevan informaion o he user. For insance, i can be implemened in auonomous vehicles o make suggesions based on he driver s spaioemporal conex. Acknowledgmens This work was suppored by he Sogang Universiy Research Gran of 202 ( ). References. Bell, R., Koren, Y.: Improved neighborhood-based collaboraive filering. In: Proc. KDD-Cup and Workshop a he 3h ACM SIGKDD Inernaional Conference on Knowledge Discovery and Daa Mining. Cieseer (2007) 2. Bellman, R.: A Markovian decision process. (957) 3. Kaelbling, L., Liman, M., Moore, A.: Reinforcemen learning: A survey. Arxiv preprin cs/ (996) 4. Koren, Y.: Facorizaion mees he neighborhood: a mulifaceed collaboraive filering model. In: Proceeding of he 4h ACM SIGKDD inernaional conference on Knowledge discovery and daa mining. pp ACM (2008) 5. Paerek, A.: Improving regularized singular value decomposiion for collaboraive filering. In: Proceedings of KDD Cup and Workshop. vol Cieseer (2007) 6. Puerman, M.: Markov decision processes. Handbooks in Operaions Research and Managemen Science 2, (990) 7. Riedl, J., Konsan, J.: Movielens daase (998) 8. Salakhudinov, R., Mnih, A., Hinon, G.: Resriced Bolzmann machines for collaboraive filering. In: Proceedings of he 24h inernaional conference on Machine learning. p ACM (2007) 9. Suon, R., Baro, A.: Reinforcemen learning: An inroducion. The MIT press (998) 0. Thursone, L.: A law of comparaive judgmen. Psychological review 34(4), (927). Wakins, C., Dayan, P.: Q-learning. Machine learning 8(3), (992)

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updaes Michael Kearns AT&T Labs mkearns@research.a.com Sainder Singh AT&T Labs baveja@research.a.com Absrac We give he firs rigorous upper bounds on he error

More information

Presentation Overview

Presentation Overview Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

CSE/NB 528 Lecture 14: From Supervised to Reinforcement Learning (Chapter 9) R. Rao, 528: Lecture 14

CSE/NB 528 Lecture 14: From Supervised to Reinforcement Learning (Chapter 9) R. Rao, 528: Lecture 14 CSE/NB 58 Lecure 14: From Supervised o Reinforcemen Learning Chaper 9 1 Recall from las ime: Sigmoid Neworks Oupu v T g w u g wiui w Inpu nodes u = u 1 u u 3 T i Sigmoid oupu funcion: 1 g a 1 a e 1 ga

More information

CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9)

CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9) CSE/NB 528 Lecure 14: Reinforcemen Learning Chaper 9 Image from hp://clasdean.la.asu.edu/news/images/ubep2001/neuron3.jpg Lecure figures are from Dayan & Abbo s book hp://people.brandeis.edu/~abbo/book/index.hml

More information

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks -

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks - Deep Learning: Theory, Techniques & Applicaions - Recurren Neural Neworks - Prof. Maeo Maeucci maeo.maeucci@polimi.i Deparmen of Elecronics, Informaion and Bioengineering Arificial Inelligence and Roboics

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and

More information

Energy Storage Benchmark Problems

Energy Storage Benchmark Problems Energy Sorage Benchmark Problems Daniel F. Salas 1,3, Warren B. Powell 2,3 1 Deparmen of Chemical & Biological Engineering 2 Deparmen of Operaions Research & Financial Engineering 3 Princeon Laboraory

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes WHAT IS A KALMAN FILTER An recursive analyical echnique o esimae ime dependen physical parameers in he presence of noise processes Example of a ime and frequency applicaion: Offse beween wo clocks PREDICTORS,

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Learning Objectives: Practice designing and simulating digital circuits including flip flops Experience state machine design procedure

Learning Objectives: Practice designing and simulating digital circuits including flip flops Experience state machine design procedure Lab 4: Synchronous Sae Machine Design Summary: Design and implemen synchronous sae machine circuis and es hem wih simulaions in Cadence Viruoso. Learning Objecives: Pracice designing and simulaing digial

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Probabilisic reasoning over ime So far, we ve mosly deal wih episodic environmens Excepions: games wih muliple moves, planning In paricular, he Bayesian neworks we ve seen so far describe

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H. ACE 564 Spring 2006 Lecure 7 Exensions of The Muliple Regression Model: Dumm Independen Variables b Professor Sco H. Irwin Readings: Griffihs, Hill and Judge. "Dumm Variables and Varing Coefficien Models

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

Chapter 21. Reinforcement Learning. The Reinforcement Learning Agent

Chapter 21. Reinforcement Learning. The Reinforcement Learning Agent CSE 47 Chaper Reinforcemen Learning The Reinforcemen Learning Agen Agen Sae u Reward r Acion a Enironmen CSE AI Faculy Why reinforcemen learning Programming an agen o drie a car or fly a helicoper is ery

More information

Errata (1 st Edition)

Errata (1 st Edition) P Sandborn, os Analysis of Elecronic Sysems, s Ediion, orld Scienific, Singapore, 03 Erraa ( s Ediion) S K 05D Page 8 Equaion (7) should be, E 05D E Nu e S K he L appearing in he equaion in he book does

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Inventory Control of Perishable Items in a Two-Echelon Supply Chain

Inventory Control of Perishable Items in a Two-Echelon Supply Chain Journal of Indusrial Engineering, Universiy of ehran, Special Issue,, PP. 69-77 69 Invenory Conrol of Perishable Iems in a wo-echelon Supply Chain Fariborz Jolai *, Elmira Gheisariha and Farnaz Nojavan

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information

The expectation value of the field operator.

The expectation value of the field operator. The expecaion value of he field operaor. Dan Solomon Universiy of Illinois Chicago, IL dsolom@uic.edu June, 04 Absrac. Much of he mahemaical developmen of quanum field heory has been in suppor of deermining

More information

Air Traffic Forecast Empirical Research Based on the MCMC Method

Air Traffic Forecast Empirical Research Based on the MCMC Method Compuer and Informaion Science; Vol. 5, No. 5; 0 ISSN 93-8989 E-ISSN 93-8997 Published by Canadian Cener of Science and Educaion Air Traffic Forecas Empirical Research Based on he MCMC Mehod Jian-bo Wang,

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

Machine Learning 4771

Machine Learning 4771 ony Jebara, Columbia Universiy achine Learning 4771 Insrucor: ony Jebara ony Jebara, Columbia Universiy opic 20 Hs wih Evidence H Collec H Evaluae H Disribue H Decode H Parameer Learning via JA & E ony

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON4325 Moneary Policy Dae of exam: Tuesday, May 24, 206 Grades are given: June 4, 206 Time for exam: 2.30 p.m. 5.30 p.m. The problem se covers 5 pages

More information

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H.

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H. ACE 56 Fall 5 Lecure 8: The Simple Linear Regression Model: R, Reporing he Resuls and Predicion by Professor Sco H. Irwin Required Readings: Griffihs, Hill and Judge. "Explaining Variaion in he Dependen

More information

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems Single-Pass-Based Heurisic Algorihms for Group Flexible Flow-shop Scheduling Problems PEI-YING HUANG, TZUNG-PEI HONG 2 and CHENG-YAN KAO, 3 Deparmen of Compuer Science and Informaion Engineering Naional

More information

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims Problem Se 5 Graduae Macro II, Spring 2017 The Universiy of Nore Dame Professor Sims Insrucions: You may consul wih oher members of he class, bu please make sure o urn in your own work. Where applicable,

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

Comments on Window-Constrained Scheduling

Comments on Window-Constrained Scheduling Commens on Window-Consrained Scheduling Richard Wes Member, IEEE and Yuing Zhang Absrac This shor repor clarifies he behavior of DWCS wih respec o Theorem 3 in our previously published paper [1], and describes

More information

Linear Gaussian State Space Models

Linear Gaussian State Space Models Linear Gaussian Sae Space Models Srucural Time Series Models Level and Trend Models Basic Srucural Model (BSM Dynamic Linear Models Sae Space Model Represenaion Level, Trend, and Seasonal Models Time Varying

More information

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models.

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models. Technical Repor Doc ID: TR--203 06-March-203 (Las revision: 23-Februar-206) On formulaing quadraic funcions in opimizaion models. Auhor: Erling D. Andersen Convex quadraic consrains quie frequenl appear

More information

EECS 141: FALL 00 MIDTERM 2

EECS 141: FALL 00 MIDTERM 2 Universiy of California College of Engineering Deparmen of Elecrical Engineering and Compuer Science J. M. Rabaey TuTh9:30-11am ee141@eecs EECS 141: FALL 00 MIDTERM 2 For all problems, you can assume he

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

A Study of Inventory System with Ramp Type Demand Rate and Shortage in The Light Of Inflation I

A Study of Inventory System with Ramp Type Demand Rate and Shortage in The Light Of Inflation I Inernaional Journal of Mahemaics rends and echnology Volume 7 Number Jan 5 A Sudy of Invenory Sysem wih Ramp ype emand Rae and Shorage in he Ligh Of Inflaion I Sangeea Gupa, R.K. Srivasava, A.K. Singh

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Lecture 5. Time series: ECM. Bernardina Algieri Department Economics, Statistics and Finance

Lecture 5. Time series: ECM. Bernardina Algieri Department Economics, Statistics and Finance Lecure 5 Time series: ECM Bernardina Algieri Deparmen Economics, Saisics and Finance Conens Time Series Modelling Coinegraion Error Correcion Model Two Seps, Engle-Granger procedure Error Correcion Model

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

Solutions for Assignment 2

Solutions for Assignment 2 Faculy of rs and Science Universiy of Torono CSC 358 - Inroducion o Compuer Neworks, Winer 218 Soluions for ssignmen 2 Quesion 1 (2 Poins): Go-ack n RQ In his quesion, we review how Go-ack n RQ can be

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time. Supplemenary Figure 1 Spike-coun auocorrelaions in ime. Normalized auocorrelaion marices are shown for each area in a daase. The marix shows he mean correlaion of he spike coun in each ime bin wih he spike

More information

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

Chapter 7: Solving Trig Equations

Chapter 7: Solving Trig Equations Haberman MTH Secion I: The Trigonomeric Funcions Chaper 7: Solving Trig Equaions Le s sar by solving a couple of equaions ha involve he sine funcion EXAMPLE a: Solve he equaion sin( ) The inverse funcions

More information

Logic in computer science

Logic in computer science Logic in compuer science Logic plays an imporan role in compuer science Logic is ofen called he calculus of compuer science Logic plays a similar role in compuer science o ha played by calculus in he physical

More information

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems Paricle Swarm Opimizaion Combining Diversificaion and Inensificaion for Nonlinear Ineger Programming Problems Takeshi Masui, Masaoshi Sakawa, Kosuke Kao and Koichi Masumoo Hiroshima Universiy 1-4-1, Kagamiyama,

More information

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011 Mainenance Models Prof Rober C Leachman IEOR 3, Mehods of Manufacuring Improvemen Spring, Inroducion The mainenance of complex equipmen ofen accouns for a large porion of he coss associaed wih ha equipmen

More information

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS CENRALIZED VERSUS DECENRALIZED PRODUCION PLANNING IN SUPPLY CHAINS Georges SAHARIDIS* a, Yves DALLERY* a, Fikri KARAESMEN* b * a Ecole Cenrale Paris Deparmen of Indusial Engineering (LGI), +3343388, saharidis,dallery@lgi.ecp.fr

More information

An Analysis of Actor/Critic Algorithms using Eligibility Traces: Reinforcement Learning with Imperfect Value Functions

An Analysis of Actor/Critic Algorithms using Eligibility Traces: Reinforcement Learning with Imperfect Value Functions An Analysis of Acor/Criic Algorihms using Eligibiliy Traces: Reinforcemen Learning wih Imperfec Value Funcions Hajime Kimura 3 Tokyo Insiue of Technology gen@fe.dis.iech.ac.jp Shigenobu Kobayashi Tokyo

More information

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course OMP: Arificial Inelligence Fundamenals Lecure 0 Very Brief Overview Lecurer: Email: Xiao-Jun Zeng x.zeng@mancheser.ac.uk Overview This course will focus mainly on probabilisic mehods in AI We shall presen

More information

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006 2.160 Sysem Idenificaion, Esimaion, and Learning Lecure Noes No. 8 March 6, 2006 4.9 Eended Kalman Filer In many pracical problems, he process dynamics are nonlinear. w Process Dynamics v y u Model (Linearized)

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

1. VELOCITY AND ACCELERATION

1. VELOCITY AND ACCELERATION 1. VELOCITY AND ACCELERATION 1.1 Kinemaics Equaions s = u + 1 a and s = v 1 a s = 1 (u + v) v = u + as 1. Displacemen-Time Graph Gradien = speed 1.3 Velociy-Time Graph Gradien = acceleraion Area under

More information

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Kriging Models Predicing Arazine Concenraions in Surface Waer Draining Agriculural Waersheds Paul L. Mosquin, Jeremy Aldworh, Wenlin Chen Supplemenal Maerial Number

More information

Isolated-word speech recognition using hidden Markov models

Isolated-word speech recognition using hidden Markov models Isolaed-word speech recogniion using hidden Markov models Håkon Sandsmark December 18, 21 1 Inroducion Speech recogniion is a challenging problem on which much work has been done he las decades. Some of

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model 1 Boolean and Vecor Space Rerieval Models Many slides in his secion are adaped from Prof. Joydeep Ghosh (UT ECE) who in urn adaped hem from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) Rerieval

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Forecasting optimally

Forecasting optimally I) ile: Forecas Evaluaion II) Conens: Evaluaing forecass, properies of opimal forecass, esing properies of opimal forecass, saisical comparison of forecas accuracy III) Documenaion: - Diebold, Francis

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

PROC NLP Approach for Optimal Exponential Smoothing Srihari Jaganathan, Cognizant Technology Solutions, Newbury Park, CA.

PROC NLP Approach for Optimal Exponential Smoothing Srihari Jaganathan, Cognizant Technology Solutions, Newbury Park, CA. PROC NLP Approach for Opimal Exponenial Smoohing Srihari Jaganahan, Cognizan Technology Soluions, Newbury Park, CA. ABSTRACT Esimaion of smoohing parameers and iniial values are some of he basic requiremens

More information

Augmented Reality II - Kalman Filters - Gudrun Klinker May 25, 2004

Augmented Reality II - Kalman Filters - Gudrun Klinker May 25, 2004 Augmened Realiy II Kalman Filers Gudrun Klinker May 25, 2004 Ouline Moivaion Discree Kalman Filer Modeled Process Compuing Model Parameers Algorihm Exended Kalman Filer Kalman Filer for Sensor Fusion Lieraure

More information

Spring Ammar Abu-Hudrouss Islamic University Gaza

Spring Ammar Abu-Hudrouss Islamic University Gaza Chaper 7 Reed-Solomon Code Spring 9 Ammar Abu-Hudrouss Islamic Universiy Gaza ١ Inroducion A Reed Solomon code is a special case of a BCH code in which he lengh of he code is one less han he size of he

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

arxiv: v2 [math.oc] 19 Jun 2016

arxiv: v2 [math.oc] 19 Jun 2016 Using Deep Q-Learning o Conrol Opimizaion Hyperparameers Samanha Hansen IBM T.J. Wason Research Cener arxiv:16.6v [mah.oc] 19 Jun 16 Absrac We presen a novel definiion of he reinforcemen learning sae,

More information

Reinforcement Learning: A Tutorial. Scope of Tutorial. 1 Introduction

Reinforcement Learning: A Tutorial. Scope of Tutorial. 1 Introduction Reinforcemen Learning: A Tuorial Mance E. Harmon WL/AACF 224 Avionics Circle Wrigh Laboraory Wrigh-Paerson AFB, OH 45433 mharmon@acm.org Sephanie S. Harmon Wrigh Sae Universiy 56-8 Mallard Glen Drive Cenerville,

More information

BU Macro BU Macro Fall 2008, Lecture 4

BU Macro BU Macro Fall 2008, Lecture 4 Dynamic Programming BU Macro 2008 Lecure 4 1 Ouline 1. Cerainy opimizaion problem used o illusrae: a. Resricions on exogenous variables b. Value funcion c. Policy funcion d. The Bellman equaion and an

More information

Solutions to Odd Number Exercises in Chapter 6

Solutions to Odd Number Exercises in Chapter 6 1 Soluions o Odd Number Exercises in 6.1 R y eˆ 1.7151 y 6.3 From eˆ ( T K) ˆ R 1 1 SST SST SST (1 R ) 55.36(1.7911) we have, ˆ 6.414 T K ( ) 6.5 y ye ye y e 1 1 Consider he erms e and xe b b x e y e b

More information

A unit root test based on smooth transitions and nonlinear adjustment

A unit root test based on smooth transitions and nonlinear adjustment MPRA Munich Personal RePEc Archive A uni roo es based on smooh ransiions and nonlinear adjusmen Aycan Hepsag Isanbul Universiy 5 Ocober 2017 Online a hps://mpra.ub.uni-muenchen.de/81788/ MPRA Paper No.

More information

A First Course on Kinetics and Reaction Engineering. Class 19 on Unit 18

A First Course on Kinetics and Reaction Engineering. Class 19 on Unit 18 A Firs ourse on Kineics and Reacion Engineering lass 19 on Uni 18 Par I - hemical Reacions Par II - hemical Reacion Kineics Where We re Going Par III - hemical Reacion Engineering A. Ideal Reacors B. Perfecly

More information

Adaptive Noise Estimation Based on Non-negative Matrix Factorization

Adaptive Noise Estimation Based on Non-negative Matrix Factorization dvanced cience and Technology Leers Vol.3 (ICC 213), pp.159-163 hp://dx.doi.org/1.14257/asl.213 dapive Noise Esimaion ased on Non-negaive Marix Facorizaion Kwang Myung Jeon and Hong Kook Kim chool of Informaion

More information

Off-policy TD(λ) with a true online equivalence

Off-policy TD(λ) with a true online equivalence Off-policy TD(λ) wih a rue online equivalence Hado van Hassel A Rupam Mahmood Richard S Suon Reinforcemen Learning and Arificial Inelligence Laboraory Universiy of Albera, Edmonon, AB T6G 2E8 Canada Absrac

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach 1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model

More information

Classical Conditioning IV: TD learning in the brain

Classical Conditioning IV: TD learning in the brain Classical Condiioning IV: TD learning in he brain PSY/NEU338: Animal learning and decision making: Psychological, compuaional and neural perspecives recap: Marr s levels of analysis David Marr (1945-1980)

More information