Nitzan Carmeli. MSc Advisors: Prof. Haya Kaspi, Prof. Avishai Mandelbaum. Empirical Support: Dr. Galit Yom-Tov
1 Nitzan Carmeli. MSc Advisors: Prof. Haya Kaspi, Prof. Avishai Mandelbaum. Empirical Support: Dr. Galit Yom-Tov. Industrial Engineering and Management, Technion
2 Introduction
o Interactive Voice Response (Voice Response Unit)
3 Introduction
o Agents' salaries are typically 60%-80% of the overall operating budget in call centers
o A properly designed IVR system increases customer satisfaction, increases loyalty, and decreases staffing costs (customers self-serve), thereby increasing profit
4 Introduction
o Call Center with an IVR system
[Figure: call flow through the IVR - success, abandonment, opt-out to a pool of servers, and returns]
5 Literature Review
o Methodologies for evaluating IVRs
  o Quantifying IVR usability and cost-effectiveness
  o Agent time saved by handling the call, or part of it, in the IVR
  o Original reasons for calling vs. experience (by analyzing end-to-end calls): 30% completed service in the IVR, but only 1.6% completed it successfully. Suhm B. and Peterson P. (2002, 2009)
o Designing and optimizing IVRs
  o Human-Factors Engineering
  o Mainly comparing broad (shallow) designs and narrow (deep) designs. Examples: Schumacher R. et al. (1995), Commarford P. et al. (2008), and many more.
6 Literature Review
[Figure: example menu trees - a broad (shallow) design vs. a narrow (deep) design]
7 Literature Review
o Reducing IVR service times
  o The IVR menu as a service tree: reducing the average time to reach a desired service, i.e., minimize $f(T) = \sum_{i=1}^{M} t_i p_i$. Salcedo-Sanz S. et al. (2010)
o Stochastic search in a forest
  o Models of stochastic search in the context of R&D projects
  o Optimality of an index policy. Denardo E.V., Rothblum U.G., Van der Heyden L. (2004); Granot D. and Zuckerman D. (1991)
8 Research Goals
o Modeling customer flow within an IVR system
o Using EDA to estimate model parameters and identify usability problems; the EDA has actually inspired our theoretical model
o Using optimal search solutions and empirical analysis to compare alternative designs and infer design implications
9 Search Model
[Figure: the IVR menu tree, adapted from an actual IVR of a commercial enterprise]
10 Search Model
o Notation
  o $G = (V, E)$ - rooted tree representing the IVR system
  o $M \subseteq V$ - set of tree leaves, representing IVR services
  o $E$ - tree edges, representing IVR menu options
  o $s$ - root vertex, representing the IVR main menu
  o $A(i)$ - set of immediate successors of vertex $i$
  o $pre(i)$ - immediate predecessor of vertex $i$
[Figure: example tree with root $s$ and vertices $a, b, c, d, e, f, g$; e.g., $pre(c) = a$]
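A minimal sketch (an editor's illustration, not from the thesis) of this notation in Python; the parent assignments below are an assumption, since the slide's tree figure did not survive transcription:

```python
# Hypothetical rooted tree G = (V, E) stored as a parent map; A(i) and pre(i)
# are derived from it. Vertex names follow the slide's example tree, but the
# exact parent assignments are assumed.
PARENT = {
    "a": "s", "b": "s",          # s is the root (the IVR main menu)
    "c": "a", "d": "a",
    "e": "c", "f": "c", "g": "d",
}
ROOT = "s"
V = {ROOT, *PARENT}                                 # vertex set V
M = {v for v in V if v not in PARENT.values()}      # leaves = IVR services

def A(i):
    """Set of immediate successors of vertex i (the menu options under i)."""
    return {v for v, p in PARENT.items() if p == i}

def pre(i):
    """Immediate predecessor of vertex i (undefined for the root)."""
    return PARENT[i]

assert pre("c") == "a"   # the slide's example: pre(c) = a
```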
11 Search Model
o Customer search
  o Customers pay for each unit of time they spend in the IVR
  o Customers receive a reward when reaching a desired service
  o Customers may look for more than one service
  o Customers have finite patience
12 Search Model
o Customer search
o At each stage, standing at vertex $i$, the customer can choose to:
  1. Terminate the search and leave the system
  2. Terminate the search and opt out to an agent
  3. Move forward to one of $i$'s immediate successors, $j \in A(i)$
  4. Move backward to $i$'s immediate predecessor, $pre(i)$
  5. Move backward to the root vertex, $s$
[Figure: the example tree with opt-out arcs from each vertex]
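Continuing the sketch above (same assumed tree), the possible moves can be enumerated mechanically; choosing among them is the search problem the MDP solves later:

```python
# Enumerate the actions available at vertex i: leave, opt out to an agent,
# move forward to any j in A(i), move backward to pre(i), or restart at s.
def feasible_actions(i):
    actions = [("leave", None), ("opt_out", None)]
    actions += [("forward", j) for j in A(i)]
    if i != ROOT:
        actions += [("backward", pre(i)), ("restart", ROOT)]
    return actions

print(feasible_actions("c"))   # forward to e or f, backward to a, restart, ...
```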
13 Search Model
o IVR flow animation
14 Search Model
o States: $S_n = (i, T, N)$, where
  o $i$ = current vertex
  o $T$ = total time spent in the search
  o $N$ = $|V| \times 1$ vector representing repeated visits to each vertex
15 Search Model
o Model Parameters - Edges
  o $t_{i,j}^{(N(i))}$ - exploration time of edge $e = (i, j)$, given that vertex $i$ was visited $N(i)$ times before, where $N(i)$ is the number of previous visits to vertex $i$
  o Transition: $S_n = (i, T, N) \;\rightarrow\; S_{n+1} = \left(j,\; T + t_{i,j}^{(N(i))},\; N + \delta_{(i)}\right)$, where $\delta_{(i)}$ is the unit vector for coordinate $i$
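A small sketch (editor's illustration) of this state update for a forward move; `Counter` stands in for the visit-count vector $N$:

```python
from collections import Counter

def forward_move(state, j, t_ij):
    """S_n = (i, T, N)  ->  S_{n+1} = (j, T + t_ij^{(N(i))}, N + delta_(i))."""
    i, T, N = state
    N = Counter(N)
    N[i] += 1                  # record one more visit to vertex i
    return (j, T + t_ij, N)

print(forward_move(("s", 0.0, Counter()), "a", 6.5))   # t_ij = 6.5 sec, illustrative
```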
16 Search Model
[Figure: histogram of ID time before "Play Account Balance", total for November 2008 to June 2009, all days; relative frequencies (%) vs. time (mm:ss, 1-sec resolution), broken down by duration bins [5-10), [10-15), [15-20), [20-30), [30-40), [40-50), [50-100)]
17 Search Model
o Model Parameters - Edges
  o $\tau$ - customer patience; $\tau \sim \exp(\theta)$ assumed
  o $P_{i,j}^{(N(i))} = P\left(\tau > T + t_{i,j}^{(N(i))} \,\middle|\, \tau > T\right) = P\left(\tau > t_{i,j}^{(N(i))}\right)$ (memorylessness)
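The memorylessness step in numbers, as a sketch: only $\theta = 0.004$ is taken from the case study (slide 38; mean patience 250 sec), the other values are illustrative. For exponential edge times, as assumed later, the unconditional survival probability is the Laplace transform of the edge time at $\theta$:

```python
import math

theta = 0.004                       # patience rate from the case study

def survive(t):
    """P(tau > T + t | tau > T) = P(tau > t) = e^{-theta t} for tau ~ exp(theta)."""
    return math.exp(-theta * t)

print(survive(10))                  # chance of outlasting a 10-sec menu prompt
# If t_{i,j} ~ exp(mu) (slide 38's assumption), E[e^{-theta t}] = mu / (mu + theta):
mu = 1 / 10.0                       # illustrative mean edge time of 10 sec
print(mu / (mu + theta))
```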
18 Search Model
o Model Parameters - Leaves
  o $P_i$ - success probability of vertex $i$
  o $r_i$ - reward earned from successful completion of vertex $i$
  o $t_{ser}(i)$ - time spent in vertex $i$ given that the visit was successful
  o $t_F(i)$ - time spent in vertex $i$ given that the visit was not successful
  o $T_i = t_{ser}(i)$ w.p. $P_i$; $T_i = t_F(i)$ w.p. $1 - P_i$
o For non-leaf vertices, $i \in V \setminus M$: $P_i = 1$, $r_i = 0$, $t_{ser}(i) = 0$, $t_F(i) = 0$
19 Search Model
o Search candidates
  o A sequence $U = (i_1, \ldots, i_n)$ of vertices in $V$ will be called a candidate
  o Definition 1. A candidate $U$ will be called proper if it starts with $s$, and if every two sequential vertices in $U$, $i_k, i_{k+1}$, satisfy one of the following conditions: $i_{k+1} \in A(i_k)$; $i_{k+1} = pre(i_k)$; or $i_k \neq s$ and $i_{k+1} = s$
  o Example: $U = (s, a, c, f, c, a, d, g)$
20 Search Model
o Search candidates (Definition 1, repeated)
o $\Lambda$ - the set of all proper candidates
21 Search Model
o Search candidates
  o $X(U)$ - net revenue earned by fathoming candidate $U$
  o $R(U) = E[X(U)]$ (expectation over patience, success probabilities, and times in vertices and edges)
  o Our goal is to find a candidate $U^*$ that maximizes $R(U)$, $U \in \Lambda$
22 Search Model
o Search candidates
o Proposition 1. A proper candidate $U$ is dominated by another candidate (cannot be optimal) if it has at least one of the following properties:
  1. There is at least one leaf $i \in M$ which appears more than once in $U$
  2. $U$ has a subsequence $pre(i) \to i \to pre(i)$, where $i \notin M$
  3. $U$ has a subsequence $i \to s$, where $i \notin M$
  4. $U$ ends with $i \notin M$
  5. $U$ has a subsequence $i \to pre(i) \to i$
  6. $U$ has a subsequence $i \to s$ followed by the direct path to $j$, where $L(i, j) = pre(i)$
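A sketch of a dominance filter for Proposition 1 (editor's illustration; properties 1-5 only, since property 6 needs the $L(i, j)$ test, and the $\in/\notin$ qualifiers above are reconstructed as discussed):

```python
def is_dominated(U, leaves, parent, root="s"):
    """Return True if candidate U (a vertex list) has one of properties 1-5."""
    leaf_visits = [v for v in U if v in leaves]
    if len(leaf_visits) > len(set(leaf_visits)):        # 1. a leaf repeated
        return True
    for k in range(len(U) - 2):
        a, b, c = U[k], U[k + 1], U[k + 2]
        if a == c and b not in leaves and parent.get(b) == a:
            return True    # 2. pre(i) -> i -> pre(i): submenu entered and abandoned
        if a == c and parent.get(a) == b:
            return True    # 5. i -> pre(i) -> i: stepping back and forth
    for k in range(len(U) - 1):
        if U[k + 1] == root and U[k] != root and U[k] not in leaves:
            return True    # 3. restart from a non-leaf vertex
    return U[-1] not in leaves                          # 4. ends at a non-leaf

PARENT = {"a": "s", "b": "s", "c": "a", "d": "a", "e": "c", "f": "c", "g": "d"}
LEAVES = {"b", "e", "f", "g"}
print(is_dominated(["s", "a", "c", "a"], LEAVES, PARENT))   # True  (property 2)
print(is_dominated(["s", "a", "c", "f"], LEAVES, PARENT))   # False
```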
23 Search Model
o Search candidates
  o $D$ - the set of all proper candidates with at least one of the properties of Proposition 1
  o $\Lambda' = \Lambda \setminus D$ - the set of all admissible candidates
o Proposition 2. The set $\Lambda'$ is finite and each of its candidates is a finite path
24 Search Model
o Maximal Tree Algorithm
  0. Initialization: $T = (V^T, E^T) = G$
  1. Span the tree with backward transitions
  2. Span the tree with forward transitions
  3. Update $T$ with the set of new vertices, $M^T$; if $M^T = \emptyset$, stop; else, return to step 1
25 Search Model
o Maximal Tree Algorithm
  o Example: $U^T: s \to a \to c \to f$
  o Recall: $S_n = (i, T, N)$
  o For $u \in U^T$: $base(u)$ = current vertex; $index(u)$ = ordered vector representing former backward transitions
[Figure: extended-tree vertices labeled by their history of backward transitions, e.g., $(f)$, $(f, c)$]
26 Search Model
o Theorem. A proper candidate $U$ is admissible (in $\Lambda'$) iff it is represented by a path starting from the root of the maximal tree $T = (V^T, E^T)$
  o The set $\Lambda' = \Lambda \setminus D$ and the output of the Maximal Tree Algorithm are equivalent
27 MDP
o Backward induction - index calculation
  o $r(v)$ - discounted revenue earned from vertex $v \in V^T$:
$$r(v) = \begin{cases} r_v\, e^{-\alpha t_{ser}(v)} - \int_0^{t_{ser}(v)} c e^{-\alpha s}\, ds & \text{w.p. } P_v \\[4pt] -\int_0^{t_F(v)} c e^{-\alpha s}\, ds & \text{w.p. } 1 - P_v \end{cases}$$
  (the first term is the discounted reward from the vertex; the integrals are the payment for time spent in the vertex)
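Taking expectations (a derived step added here for readability, using $\int_0^{t} c e^{-\alpha s}\, ds = \frac{c}{\alpha}\left(1 - e^{-\alpha t}\right)$):

$$E[r(v)] = P_v \left( r_v\, E\!\left[e^{-\alpha t_{ser}(v)}\right] - \frac{c}{\alpha}\left(1 - E\!\left[e^{-\alpha t_{ser}(v)}\right]\right) \right) - (1 - P_v)\, \frac{c}{\alpha}\left(1 - E\!\left[e^{-\alpha t_F(v)}\right]\right)$$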
28 MDP
o Backward induction - index calculation
  o $r^*(v)$ - discounted revenue earned from the tree with root vertex $v \in V^T$
  o $R^*(v) = E[r^*(v)]$
  o In the special case where $v \in M$: $r^*(v) = r(v)$
29 MDP
o Backward induction - index calculation
  o $r_{u,v}$ - discounted revenue earned from edge $(u, v) \in E^T$, given that we successfully reached vertex $u$ after $t_u$ units of time:
$$R_{u,v} = E[r_{u,v}] = E\!\left[\, \mathbb{1}_{\{\tau > t_u + t_{u,v}\}} \left( e^{-\alpha (t_u + t_{u,v})} R^*(v) - \int_{t_u}^{t_u + t_{u,v}} c e^{-\alpha s}\, ds \right) - \mathbb{1}_{\{t_u < \tau \le t_u + t_{u,v}\}} \int_{t_u}^{\tau} c e^{-\alpha s}\, ds \;\middle|\; \tau > t_u \right]$$
30 MDP
o Backward induction - index calculation
o We get:
$$R_{u,v} = e^{-\alpha t_u} \left[ L_{t_{u,v}}(\alpha + \theta) \left( R^*(v) + \frac{c}{\alpha}\left(1 - \frac{\theta}{\alpha + \theta}\right) \right) - \frac{c}{\alpha}\left(1 - \frac{\theta}{\alpha + \theta}\right) \right]$$
o While at $u$, it is enough to calculate:
$$R^*_{u,v} = L_{t_{u,v}}(\alpha + \theta) \left( R^*(v) + \frac{c}{\alpha}\left(1 - \frac{\theta}{\alpha + \theta}\right) \right) - \frac{c}{\alpha}\left(1 - \frac{\theta}{\alpha + \theta}\right)$$
  (here $L_{t_{u,v}}(s) = E\!\left[e^{-s\, t_{u,v}}\right]$ is the Laplace transform of $t_{u,v}$; the prefactor $e^{-\alpha t_u}$ is common to all successors of $u$, so it does not affect the comparison)
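A sketch of this index in Python, under slide 38's assumption that $t_{u,v}$ is exponential, so $L_{t_{u,v}}(s) = \mu/(\mu + s)$; the parameters $c, \alpha, \theta$ are slide 38's, while `mean_edge` and `R_star_v` are illustrative:

```python
alpha, theta, c = 0.05, 0.004, 0.05    # discount rate, patience rate, cost/sec

def edge_index(R_star_v, mean_edge):
    """R*_{u,v} of slide 30, for t_{u,v} ~ exp(1/mean_edge)."""
    L = (1 / mean_edge) / (1 / mean_edge + alpha + theta)   # Laplace transform
    k = (c / alpha) * (1 - theta / (alpha + theta))         # = c / (alpha + theta)
    return L * (R_star_v + k) - k

print(edge_index(R_star_v=8.0, mean_edge=6.0))   # e.g. a 6-sec menu option
```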
31 MDP
o Backward induction - index calculation
  o If $\max_{v \in A(u)} R^*_{u,v} > 0$: set
$$r^*(u) = \begin{cases} \left( r_u + \max_{v \in A(u)} R^*_{u,v} \right) e^{-\alpha t_{ser}(u)} - \int_0^{t_{ser}(u)} c e^{-\alpha s}\, ds & \text{w.p. } P_u \\[4pt] \left( \max_{v \in A(u)} R^*_{u,v} \right) e^{-\alpha t_F(u)} - \int_0^{t_F(u)} c e^{-\alpha s}\, ds & \text{w.p. } 1 - P_u \end{cases}$$
  and set $\sigma(u) = \arg\max_{v \in A(u)} R^*_{u,v}$
  o If $\max_{v \in A(u)} R^*_{u,v} \le 0$: set $r^*(u) = r(u)$ and $\sigma(u) = \text{Stop}$
  o Set $R^*(u) = E[r^*(u)]$
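Putting the pieces together, a sketch of the backward induction on a plain tree (it ignores the extended-tree bookkeeping for revisits and multiple services; all leaf data and edge times below are illustrative placeholders, not the thesis calibration):

```python
alpha, theta, c = 0.05, 0.004, 0.05    # slide-38 parameters

def disc(mean_t):
    """E[e^{-alpha t}] for t ~ exp(1/mean_t); equals 1 when mean_t = 0."""
    return 1.0 if mean_t == 0 else (1 / mean_t) / (1 / mean_t + alpha)

def wait_cost(mean_t):
    """E[ int_0^t c e^{-alpha s} ds ] = (c / alpha) * (1 - E[e^{-alpha t}])."""
    return (c / alpha) * (1.0 - disc(mean_t))

def edge_index(R_star_v, mean_edge):
    """R*_{u,v} of slide 30, for t_{u,v} ~ exp(1/mean_edge)."""
    L = (1 / mean_edge) / (1 / mean_edge + alpha + theta)
    k = (c / alpha) * (1 - theta / (alpha + theta))
    return L * (R_star_v + k) - k

def R_star(u, children, leaf_data, edge_mean):
    """Backward induction of slide 31; sigma(u) is implicit in the max."""
    if u not in children:                         # leaf: R*(u) = E[r(u)]
        P, r, t_ser, t_F = leaf_data[u]
        return (P * (r * disc(t_ser) - wait_cost(t_ser))
                - (1 - P) * wait_cost(t_F))
    best = max(edge_index(R_star(v, children, leaf_data, edge_mean),
                          edge_mean[(u, v)]) for v in children[u])
    # internal vertices have P_u = 1, r_u = 0, t_ser(u) = 0 (slide 18),
    # so the recursion reduces to: continue only if the best index is positive
    return max(best, 0.0)

children = {"s": ["balance", "m1"], "m1": ["recent"]}
leaf_data = {"balance": (0.9, 10.0, 20, 10),      # (P, r, mean t_ser, mean t_F)
             "recent":  (0.8, 20.0, 30, 15)}
edge_mean = {("s", "balance"): 8, ("s", "m1"): 6, ("m1", "recent"): 10}
print(R_star("s", children, leaf_data, edge_mean))
```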
32 MDP
o $G = (V, E)$ - the original IVR tree
o $T = (V^T, E^T)$ - extended tree representing possible customer actions within the IVR system
o Calculating indices using backward induction to find the optimal search policy
33 Case Study
o System description
  o A medium bank
  o About 450 working agents per day (~200 during peak hours)
  o Call-by-call data from May 1st, 2008 to June 30th, 2009
o Overall summary of calls:
  o # Served only by IVR: total 27,709,... (...% of total); average per weekday 50,937
  o # Requesting also agent service: total 17,280,... (...% of total); average per weekday 31,765
  o 10% of IVR calls are identification only!
34 Case Study
[Figure: frequency of IVR service requests; annotation: each of the remaining services is requested by less than 2% of the calls]
35 Case Study
o Estimating abandonments
[Figure: three duration histograms, each annotated with a threshold time]
36 Case Study
o Estimating abandonments - fitting mixtures of distributions
[Figure: histogram of the IVR subevent duration "Query, Current Account - Recent Account Activity" (May 2008 to March 2009, all days); relative frequencies (%) vs. time (mm:ss, 1-sec resolution), with the empirical total overlaid by a fitted mixture of six lognormal components and a table of mixing proportions and parameter estimates (scale, shape, mean, standard deviation)]
37 Model Implications
o Example - comparing different designs (based on frequent services in our database)
38 Model Implications
o Example - comparing different designs
  o $c_{i,j} = 0.05$ NIS/sec, $\alpha = 0.05$, $\theta = 0.004$ (average customer patience = 250 sec)
  o $t_{ser}$, $t_F$ exponentially distributed; $t_{i,j}$ exponentially distributed for every $i, j$
o Service types (table columns: reward $r$, mean $t_{ser}$ in sec, mean $t_F$ in sec, success probability $P$): Play Account Balance ($r = 200c$), Account Summary, Recent Account Activity ($r = 400c$), Account Activity Today, Selected Securities, Stock Exchange Israel, Transfer between Accounts, Credit Card Vouchers ($r = 600c$)
39 Model Implications
o Example - comparing different designs
o Design A - original. Optimal candidate:
  U: Main Menu -> Play Account Balance -> Main Menu -> Checking Account -> Recent Account Activity -> Main Menu -> General Activities -> Credit Card Vouchers
o Customer revenue: $R(U) = \ldots$
40 Model Implications
o Example - comparing different designs
o Design B - deep. Optimal candidate:
  U: Main Menu -> Play Account Balance -> Main Menu -> M1 -> M2 -> Recent Account Activity -> M2 -> M3 -> M4 -> M5 -> M6 -> Credit Card Vouchers
o Customer revenue: $R(U) = \ldots$
41 Model Implications
o Example - comparing different designs
o Design C - shallow. Optimal candidate:
  U: Main Menu -> Play Account Balance -> Main Menu -> Credit Card Vouchers -> Main Menu -> Recent Account Activity
o Customer revenue: $R(U) = \ldots$
42 Model Implications
o Comparing different designs - the organization's point of view
o Example, assume:
  o 10,000 calls per day
  o Cost per unit of time of communication = ... NIS/sec
  o Cost per second of an agent call = 0.03 NIS/sec (based on a UG project)
o Results (NIS per day), for Design A (original) / Design B (deep) / Design C (shallow):
  o Customer revenue: ... / ... / ...
  o Cost of communication: 4,736 / 5,606 / 4,826
  o Revenue from saved agent time: 30,564 / 30,564 / 30,564
  o Total organization profit: 25,828 / 24,958 / 25,738
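The table's bottom line can be reproduced from the figures that did survive: total organization profit = revenue from saved agent time minus cost of communication (a sketch; the customer-revenue entries are the lost values):

```python
saved_agent_time = 30_564                   # NIS/day, identical across designs
comm_cost = {"A (original)": 4_736, "B (deep)": 5_606, "C (shallow)": 4_826}
for design, cost in comm_cost.items():
    print(design, saved_agent_time - cost)  # A: 25,828  B: 24,958  C: 25,738
```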
43 Model Implications
o How should an IVR tree be designed to encourage the use of certain services?
o Customer learning?
o Do customers navigate optimally?
o Services with low demand: lack of interest or lack of patience?
o Sensitivity to customer patience?
o Does anticipating a long wait in the agent queue affect customer behavior?
o Using the IVR and state information to control customer behavior; examples: return later, use the IVR for service
44 Summary and Future Research
o Stochastic models of customer flow within the IVR
o Comparing IVR designs
  o Insights for better designs
  o Supplementing previous knowledge in the HFE and CS fields
o Data analysis as a basis for theoretical models
o Can easily be modified for other self-service systems; example: Web search engines, Hassan A. et al. (2010)
o Interesting open questions:
  o How does one recognize an abandonment from self-service when seeing one?
  o Interrelation between customer-IVR interaction and customer-agent interaction?