Nitzan Carmeli. MSc Advisors: Prof. Haya Kaspi, Prof. Avishai Mandelbaum Empirical Support: Dr. Galit Yom-Tov


1 Nitzan Carmeli MSc Advisors: Prof. Haya Kaspi, Prof. Avishai Mandelbaum Empirical Support: Dr. Galit Yom-Tov Industrial Engineering and Management Technion

2 Introduction o Interactive Voice Response (Voice Response Unit) 2

3 Introduction
o Agents' salaries are typically 60%-80% of the overall operating budget in call centers
o A properly designed IVR system lets customers self-serve, which increases customer satisfaction and loyalty, decreases staffing costs and increases profit

4 Introduction
o Call center with an IVR system (flow diagram: arriving calls enter the IVR and either complete service there successfully, abandon, or opt out to the agent servers; some customers return later)

5 Literature Review
o Methodologies for evaluating IVRs
o Quantifying IVR usability and cost-effectiveness: agent time saved by handling the call, or part of the call, in the IVR; original reasons for calling vs. actual experience (by analyzing end-to-end calls). Suhm B. and Peterson P. (2002, 2009): 30% completed service in the IVR, but only 1.6% completed it successfully
o Designing and optimizing IVRs: Human-Factor-Engineering, mainly comparing broad (shallow) designs and narrow (deep) designs. Examples: Schumacher R. et al. (1995), Commarford P. et al. (2008) and many more.

6 Literature Review Broad (shallow) design: Narrow (deep) design:

7 Literature Review
o Reducing IVR service times: the IVR menu as a service tree; reducing the average time to reach a desired service, i.e. minimize f(T) = Σ_{i=1}^{M} p_i·t_i. Salcedo-Sanz S. et al. (2010)
o Stochastic search in a forest: models of stochastic search in the context of R&D projects; optimality of an index policy. Denardo E.V., Rothblum U.G., Van der Heyden L. (2004); Granot D. and Zuckerman D. (1991)

8 Research Goals
o Modeling customer flow within an IVR system
o Using EDA to estimate model parameters and identify usability problems (the EDA actually inspired our theoretical model)
o Using optimal search solutions and empirical analysis to compare alternative designs and infer design implications

9 Search Model Adapted from an actual IVR of a commercial enterprise 9

10 Search Model
o Notation
G = (V, E) - rooted tree representing the IVR system
M ⊂ V - set of tree leaves, representing IVR services
E - tree edges, representing IVR menu options
s - root vertex, representing the IVR main menu
A(i) - set of immediate successors of vertex i
pre(i) - immediate predecessor of vertex i
(figure: example tree on vertices s, a, b, c, d, e, f, g, with pre(c) = a)
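
A minimal sketch of this notation in Python; the example tree below is an assumption for illustration (the slides' exact tree appears only as a figure), but A(i), pre(i) and the leaf set M follow the definitions above.

```python
# A(i): immediate successors of each vertex; the tree shape itself is illustrative.
IVR_TREE = {
    's': ['a', 'b'],            # s = root = main menu
    'a': ['c', 'd'],
    'b': ['e'],
    'c': [], 'd': [], 'e': [],  # leaves = IVR services (the set M)
}
ROOT = 's'
LEAVES = {v for v, succ in IVR_TREE.items() if not succ}

def pre(i, tree=IVR_TREE):
    """Immediate predecessor pre(i) of vertex i (None for the root s)."""
    for parent, succ in tree.items():
        if i in succ:
            return parent
    return None
```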

11 Search Model o Customer search o Customers pay for each unit of time they spend in the IVR o Customers receive reward when reaching a desired service o Customers may look for more than one service o Customers have finite patience 11

12 Search Model
o Customer search - at each stage the customer can choose to:
1. Terminate the search and leave the system
2. Terminate the search and opt out to an agent
3. Move forward to one of i's immediate successors, j ∈ A(i)
4. Move backward to i's immediate predecessor, pre(i)
5. Move backward to the root vertex, s
(figure: the example tree with opt-out arcs from its vertices)

13 Search Model o IVR flow animation 13

14 Search Model
o States: S_n = (i, T, N), where
i = current vertex
T = total time spent in the search
N = |V|-dimensional vector counting repeated visits to each vertex

15 Search Model
o Model parameters - edges
t_{i,j}^{(N(i))} - exploration time of edge e = (i, j), given that vertex i was visited N(i) times before, where N(i) is the number of previous visits to vertex i
Transition: from S_n = (i, T, N), moving to j gives S_{n+1} = (j, T + t_{i,j}^{(N(i))}, N + δ(i))

16 Search Model
(figure: histogram of the ID time before 'Play Account Balance', November 2008 to June 2009, all days; relative frequencies at 1-second resolution over roughly 00:05-01:17 (mm:ss), plotted in total and broken down by the groups [5-10), [10-15), [15-20), [20-30), [30-40), [40-50), [50-100))

17 Search Model
o Model parameters - edges
τ - customer patience, assumed τ ~ exp(θ)
P_{i,j}^{(N(i))} = P(τ > T + t_{i,j}^{(N(i))} | τ > T) = P(τ > t_{i,j}^{(N(i))})   (memoryless)
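
A small sketch of how the memoryless property is used in computation; the θ value below is the one quoted later in the case study, and the second helper covers the case where the exploration time is itself exponential, in which case the survival probability is its Laplace transform at θ.

```python
import math

THETA = 0.004   # patience rate used later in the case study (mean patience 1/THETA = 250 sec)

def p_survive(t_edge, theta=THETA):
    """P(tau > t_edge) for exponential patience and a deterministic exploration time.
    By memorylessness this does not depend on the time T already spent in the IVR."""
    return math.exp(-theta * t_edge)

def p_survive_exp_edge(mean_edge, theta=THETA):
    """If the exploration time is Exp(1/mean_edge), E[exp(-theta*t)] = mu / (mu + theta)."""
    mu = 1.0 / mean_edge
    return mu / (mu + theta)
```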

18 Search Model
o Model parameters - leaves
P_i - success probability of vertex i
r_i - reward earned from successful completion of vertex i
t_ser(i) - time spent in vertex i given that the visit was successful
t_F(i) - time spent in vertex i given that the visit was not successful
T_i = t_ser(i) w.p. P_i, and T_i = t_F(i) w.p. 1 - P_i
o For non-leaf vertices, i ∈ V \ M: P_i = 1, r_i = 0, t_ser(i) = 0, t_F(i) = 0
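
A hedged container for these per-vertex parameters; times are treated as deterministic means here, whereas the case study later assumes they are exponentially distributed.

```python
import random
from dataclasses import dataclass

@dataclass
class VertexParams:
    """Leaf parameters; non-leaf vertices keep the defaults P=1, r=0, t_ser=t_F=0."""
    P: float = 1.0       # success probability P_i
    r: float = 0.0       # reward r_i for successful completion
    t_ser: float = 0.0   # time in the vertex given success
    t_F: float = 0.0     # time in the vertex given failure

    def sample_visit(self, rng=random):
        """Draw (success, T_i): T_i = t_ser w.p. P and t_F w.p. 1 - P."""
        success = rng.random() < self.P
        return success, (self.t_ser if success else self.t_F)
```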

19 Search Model
o Search candidates
A sequence U = (i_1, ..., i_n) of vertices in V will be called a candidate.
Definition 1. A candidate U will be called proper if it starts with s, and if every two sequential vertices in U, i_k, i_{k+1}, satisfy one of the following conditions:
i_{k+1} ∈ A(i_k), or i_{k+1} = pre(i_k), or i_k ≠ s and i_{k+1} = s
Example: U = s → a → c → f → c → a → d → g

20 Search Model
o Search candidates
A sequence U = (i_1, ..., i_n) of vertices in V will be called a candidate.
Definition 1. A candidate U will be called proper if it starts with s, and if every two sequential vertices in U, i_k, i_{k+1}, satisfy one of the following conditions:
i_{k+1} ∈ A(i_k), or i_{k+1} = pre(i_k), or i_k ≠ s and i_{k+1} = s
Λ - the set of all proper candidates

21 Search Model
o Search candidates
X(U) - net revenue earned by fathoming candidate U
R(U) = E[X(U)]   (expectation over: patience, success probabilities, times in vertices and edges)
Our goal is to find a candidate U* that maximizes R(U) over U ∈ Λ
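
A simplified Monte-Carlo sketch of R(U) = E[X(U)] for a given candidate: pay c per second in the IVR, collect the reward when a leaf service succeeds, and stop when patience runs out. It is undiscounted (the MDP section below adds discounting at rate α), and edge_mean / leaf_params are hypothetical containers, the latter in the style of the VertexParams sketch above.

```python
import random

def simulate_X(U, edge_mean, leaf_params, c=0.05, theta=0.004, rng=random):
    """One draw of the net revenue X(U) along the candidate U = [s, ..., leaf, ...]."""
    tau = rng.expovariate(theta)          # customer patience
    T, revenue = 0.0, 0.0
    for i, j in zip(U, U[1:]):
        t_edge = rng.expovariate(1.0 / edge_mean[(i, j)])
        if T + t_edge > tau:              # abandons while navigating the menu
            return revenue - c * (tau - T)
        T += t_edge
        revenue -= c * t_edge
        if j in leaf_params:              # reached a service leaf
            success, t_in = leaf_params[j].sample_visit(rng)
            if T + t_in > tau:            # abandons during the service itself
                return revenue - c * (tau - T)
            T += t_in
            revenue -= c * t_in
            if success:
                revenue += leaf_params[j].r
    return revenue

def estimate_R(U, edge_mean, leaf_params, n=50_000):
    """Monte-Carlo estimate of R(U) = E[X(U)]."""
    return sum(simulate_X(U, edge_mean, leaf_params) for _ in range(n)) / n
```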

22 Search Model
o Search candidates
Proposition 1. A proper candidate U is dominated by another candidate (i.e., cannot be optimal) if it has at least one of the following properties:
1. There is at least one leaf i ∈ M which appears more than once in U
2. The candidate U has a subsequence pre(i) → i → pre(i), where i ∉ M
3. The candidate U has a subsequence i → s, where i ∉ M
4. The candidate U ends with a vertex i ∉ M
5. The candidate U has a subsequence i → pre(i) → i
6. The candidate U has a subsequence i → s → (direct sequence to j), where L(i, j) = pre(i)
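
A sketch of screening candidates against properties 1-5 (property 6, which needs least common ancestors, is omitted); the ∈ M / ∉ M reading of properties 2-4 follows the reconstruction above, so treat this as illustrative rather than definitive.

```python
def obviously_dominated(U, leaves, pre, root='s'):
    """Return True if candidate U exhibits one of properties 1-5 of Proposition 1."""
    visited_leaves = [v for v in U if v in leaves]
    if len(visited_leaves) != len(set(visited_leaves)):      # 1: some leaf visited twice
        return True
    if U and U[-1] not in leaves:                            # 4: ends at a menu vertex
        return True
    for a, b in zip(U, U[1:]):
        if b == root and a not in leaves:                    # 3: a menu vertex jumps to the root
            return True
    for a, b, c in zip(U, U[1:], U[2:]):
        if a == c and b not in leaves and pre(b) == a:       # 2: pre(i) -> i -> pre(i), i a menu
            return True
        if a == c and b == pre(a):                           # 5: i -> pre(i) -> i
            return True
    return False
```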

23 Search Model
o Search candidates
D - the set of all proper candidates with at least one of the properties of Proposition 1
Λ' = Λ \ D - the set of all admissible candidates
Proposition 2. The set Λ' is finite and each of its candidates is a finite path

24 Search Model
o Maximal Tree Algorithm
0. Initialization: T = (V^T, E^T) = G
1. Span the tree with backward transitions
2. Span the tree with forward transitions
3. Update T with the set of new vertices, M^T; if M^T is empty, stop; else return to step 1

25 Search Model
o Maximal Tree Algorithm
Example: U_T = s → a → c → f
Recall: S_n = (i, T, N)
For a vertex u of the extended tree:
base(u) = current vertex
index(u) = ordered vector representing former backward transitions
(figure: extended-tree vertices labeled by their base and index, e.g. c with index (f) and a, d with index (f, c))

26 Search Model
Theorem. A proper candidate U is admissible (in Λ') iff it is represented by a path starting from the root of T = (V^T, E^T).
The set Λ' = Λ \ D and the output of the Maximal Tree Algorithm are equivalent.

27 MDP
o Backward induction - index calculation
r(v) - discounted revenue earned from vertex v ∈ V^T:
r(v) = r_v·e^{-α·t_ser(v)} - ∫_0^{t_ser(v)} c·e^{-α·s} ds   w.p. P_v   (discounted reward from the vertex minus payment for the time spent in it)
r(v) = -∫_0^{t_F(v)} c·e^{-α·s} ds   w.p. 1 - P_v   (payment for the time spent in the vertex)

28 MDP
o Backward induction - index calculation
r*(v) - discounted revenue earned from the subtree whose root vertex is v ∈ V^T
R*(v) = E[r*(v)]
In the special case where v ∈ M^T: r*(v) = r(v)

29 MDP
o Backward induction - index calculation
r_{u,v} - discounted revenue earned from edge (u, v) ∈ E^T, given that we successfully reached vertex u after t_u units of time
R_{u,v} = E[r_{u,v}] = E[ 1{τ ≥ t_u + t_{u,v}}·( e^{-α·(t_u + t_{u,v})}·R*(v) - ∫_{t_u}^{t_u + t_{u,v}} c·e^{-α·s} ds ) + 1{t_u ≤ τ < t_u + t_{u,v}}·( -∫_{t_u}^{τ} c·e^{-α·s} ds ) ]

30 MDP
o Backward induction - index calculation
We get:
R_{u,v} = e^{-(α+θ)·t_u}·[ L_{t_{u,v}}(α+θ)·( R*(v) + (c/α)·(1 - θ/(α+θ)) ) - (c/α)·(1 - θ/(α+θ)) ]
where L_{t_{u,v}} denotes the Laplace transform of t_{u,v}. While in u, it is enough to calculate:
R*_{u,v} = L_{t_{u,v}}(α+θ)·( R*(v) + (c/α)·(1 - θ/(α+θ)) ) - (c/α)·(1 - θ/(α+θ))
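
A short sketch evaluating this index when the exploration time t_{u,v} is exponential, so that L_{t_{u,v}}(s) = μ/(μ+s); note that (c/α)·(1 - θ/(α+θ)) simplifies to c/(α+θ). It follows the reconstructed formula above, so treat it as a sketch rather than the thesis' exact code.

```python
def laplace_exp(s, mean_t):
    """Laplace transform E[exp(-s*t)] of an exponential time with the given mean."""
    mu = 1.0 / mean_t
    return mu / (mu + s)

def edge_index(R_star_v, mean_t_uv, c=0.05, alpha=0.05, theta=0.004):
    """R*_{u,v} = L(alpha+theta) * (R*(v) + k) - k, with k = c / (alpha + theta)."""
    L = laplace_exp(alpha + theta, mean_t_uv)
    k = c / (alpha + theta)
    return L * (R_star_v + k) - k
```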

31 MDP
o Backward induction - index calculation
If max_{v ∈ A(u)} R*_{u,v} ≤ 0: set σ(u) = Stop and R*(u) = E[r(u)]
If max_{v ∈ A(u)} R*_{u,v} > 0: set
r*(u) = ( r_u + max_{v ∈ A(u)} R*_{u,v} )·e^{-α·t_ser(u)} - ∫_0^{t_ser(u)} c·e^{-α·s} ds   w.p. P_u
r*(u) = ( max_{v ∈ A(u)} R*_{u,v} )·e^{-α·t_F(u)} - ∫_0^{t_F(u)} c·e^{-α·s} ds   w.p. 1 - P_u
Set R*(u) = E[r*(u)] and σ(u) = argmax_{v ∈ A(u)} R*_{u,v}
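
A recursive sketch of this backward induction, assuming deterministic vertex times (their means) and exponential edge times via the edge_index helper above; children maps a vertex to A(u) and params holds VertexParams-style objects, both hypothetical names. It returns R*(u) together with σ, the best successor of each vertex (None meaning "stop").

```python
import math

def discounted_cost(t, c=0.05, alpha=0.05):
    """Integral of c*exp(-alpha*s) ds from 0 to t."""
    return (c / alpha) * (1.0 - math.exp(-alpha * t))

def backward_induction(u, children, params, mean_edge, c=0.05, alpha=0.05, theta=0.004):
    """Compute (R*(u), sigma) for the subtree rooted at u."""
    p = params[u]
    stop_success = p.r * math.exp(-alpha * p.t_ser) - discounted_cost(p.t_ser, c, alpha)
    stop_failure = -discounted_cost(p.t_F, c, alpha)
    if not children[u]:                              # leaf: R*(u) = E[r(u)]
        return p.P * stop_success + (1 - p.P) * stop_failure, {u: None}
    sigma, best_v, best_R = {}, None, float('-inf')
    for v in children[u]:
        R_star_v, sigma_v = backward_induction(v, children, params, mean_edge, c, alpha, theta)
        sigma.update(sigma_v)
        R_uv = edge_index(R_star_v, mean_edge[(u, v)], c, alpha, theta)
        if R_uv > best_R:
            best_v, best_R = v, R_uv
    if best_R <= 0:                                  # not worth continuing below u
        sigma[u] = None
        return p.P * stop_success + (1 - p.P) * stop_failure, sigma
    sigma[u] = best_v
    go_success = (p.r + best_R) * math.exp(-alpha * p.t_ser) - discounted_cost(p.t_ser, c, alpha)
    go_failure = best_R * math.exp(-alpha * p.t_F) - discounted_cost(p.t_F, c, alpha)
    return p.P * go_success + (1 - p.P) * go_failure, sigma
```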

32 MDP
G = (V, E) - original IVR tree
T = (V^T, E^T) - extended tree representing the possible customer actions within the IVR system
Calculating indices using backward induction yields the optimal search policy

33 Case Study - System Description
o A medium bank
o About 450 working agents per day (~200 during peak hours)
o Call-by-call data from May 1st, 2008 to June 30th, 2009
o Overall summary of calls:
# Served only by IVR: total 27,709,… ; average per weekday 50,937
# Requesting also agent service: total 17,280,… ; average per weekday 31,765
10% of IVR calls: identification only!

34 Case Study Requested by less than 2% of the calls 34

35 Case Study
o Estimating abandonments (figures: duration histograms, each with a threshold time marked)

36 Case Study
o Estimating abandonments - fitting mixtures of distributions
(figure: IVR subevent duration, Query, Current Account - Recent Account Activity, May 2008 to March 2009, all days; the empirical relative-frequency histogram at 1-second resolution is fitted with a mixture of six lognormal components, together with a table of mixing proportions and scale/shape/mean/standard-deviation estimates per component)
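
One way to reproduce such a fit, assuming a vector of durations in seconds: fitting a Gaussian mixture to log-durations is equivalent to fitting a lognormal mixture, and the component means and standard deviations on the log scale map to the usual lognormal scale/shape parameters. The function name and the choice of scikit-learn are assumptions, not the thesis' actual tooling.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_lognormal_mixture(durations_sec, n_components=6, seed=0):
    """Fit a lognormal mixture by fitting a Gaussian mixture to log-durations.
    Returns (mixing proportions, scale = exp(mu), shape = sigma) per component."""
    x = np.log(np.asarray(durations_sec, dtype=float)).reshape(-1, 1)
    gm = GaussianMixture(n_components=n_components, random_state=seed).fit(x)
    mu = gm.means_.ravel()
    sigma = np.sqrt(gm.covariances_.ravel())
    return gm.weights_, np.exp(mu), sigma
```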

37 Model Implications
o Example - comparing different designs (based on frequent services in our database)

38 Model Implications
o Example - comparing different designs
c = 0.05 NIS/sec, α = 0.05, θ = 0.004 (average customer patience = 250 sec)
t_ser, t_F exponentially distributed; t_{i,j} exponentially distributed for every i, j
Service table (columns: Service Type, r, t_ser (mean, sec), t_F (mean, sec), P) over the services: Play Account Balance (200c), Account Summary, Recent Account Activity (400c), Account Activity Today, Selected Securities, Stock Exchange Israel, Transfer between Accounts, Credit Card Vouchers (600c)
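
A quick numeric check of these parameters; reading the 200c/400c/600c entries as the rewards r of the three services that appear in the optimal candidates below is an assumption.

```python
c = 0.05        # NIS per second spent in the IVR
alpha = 0.05    # discount rate
theta = 0.004   # patience rate

print(1 / theta)                     # 250.0 seconds of average patience, as stated
print(200 * c, 400 * c, 600 * c)     # 10.0 20.0 30.0 NIS, if read as the rewards r
```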

39 Model Implications
o Example - comparing different designs
o Design A - original. Optimal candidate: U = Main Menu → Play Account Balance → Main Menu → Checking Account → Recent Account Activity → Main Menu → General Activities → Credit Card Vouchers
Customer revenue: R(U) =

40 Model Implications
o Example - comparing different designs
o Design B - deep. Optimal candidate: U = Main Menu → Play Account Balance → Main Menu → M1 → M2 → Recent Account Activity → M2 → M3 → M4 → M5 → M6 → Credit Card Vouchers
Customer revenue: R(U) =

41 Model Implications
o Example - comparing different designs
o Design C - shallow. Optimal candidate: U = Main Menu → Play Account Balance → Main Menu → Credit Card Vouchers → Main Menu → Recent Account Activity
Customer revenue: R(U) =

42 Model Implications
o Comparing different designs - organization point of view
Example, assume:
o 10,000 calls per day
o Cost per unit of time of communication = NIS/sec
o Cost per second of agent call = 0.03 NIS/sec (based on a UG project)
Design A - Original / Design B - Deep / Design C - Shallow:
Customer revenue:
Cost of communication: 4,736 / 5,606 / 4,826
Revenue from saved agent time: 30,564 / 30,564 / 30,564
Total organization profit: 25,828 / 24,958 / 25,738
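
The totals in this table are consistent with profit = revenue from saved agent time minus cost of communication; a quick check of that relation (the customer-revenue row is left out because its values are not reproduced above):

```python
saved_agent_revenue = 30_564                      # identical for the three designs
communication_cost = {'A': 4_736, 'B': 5_606, 'C': 4_826}

for design, cost in communication_cost.items():
    print(design, saved_agent_revenue - cost)     # A 25828, B 24958, C 25738 -- matches the table
```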

43 Model Implications
o How to design an IVR tree to encourage the use of certain services?
o Customer learning?
o Do customers navigate optimally?
o Services with low demand - lack of interest or lack of patience?
o Sensitivity to customer patience?
o Anticipating long waiting in the agent queue - does it affect customer behavior?
o Using IVR and state information to control customer behavior - examples: return later, use the IVR for service

44 Summary and Future Research
o Stochastic models of customer flow within the IVR
o Comparing IVR designs - insights for better designs
o Supplementing previous knowledge in the HFE and CS fields
o Data analysis as a basis for theoretical models
o Can be easily modified to other self-service systems; example: web search engines, Hassan A. et al. (2010)
o Interesting open questions:
o How does one recognize an abandonment from self-service when seeing one?
o Interrelation between customer-IVR interaction and customer-agent interaction?

45
