A Probabilistic Mental Model for Estimating Disruption


1 A Probabilistic Mental Model for Estimating Disruption
Bowen Hui (1), Grant Partridge (2), Craig Boutilier (1)
(1) Dept. of Computer Science, University of Toronto, Canada
(2) Dept. of Computer Science, University of Manitoba, Canada
Intelligent User Interfaces (IUI 09), Feb 8-11, 2009

2 Need for Software Customization
Varying user needs and preferences
Industry state of practice: one-size-fits-all, cluttered, bloated interfaces; users become lost and unsatisfied
Most affected users: people with cognitive, sensory, or motor impairments; elderly people; children; novices

3 Our Approach
Decision-theoretic (DT) adaptive systems: incorporate explicit user feedback (adaptable), learn user preferences (adaptive), trade off benefits and costs, perform long-term sequential reasoning, and model individual differences over time
Demonstrated success of DT systems: health care assistance [BPH+ 05], intelligent assistance/tutoring [HBH+ 98, FNJ+ 07, MV 00], interface customization [BJ 01, GW 04, HB 06]

4-5 Disruption
Major bottleneck in adaptive systems: the user's mental model of the software application (location, procedure, execution time, etc.)
Benefits of an adaptive action: speed gains, reduced bloat
Costs of an adaptive action: induced disruption, interruption
(Diagram: the adaptive action takes Mental Model t-1 and Loc t-1 to Loc t, incurring costs and yielding benefits.)

6 A Probabilistic Representation
Focus: mental model of function locations
k ∈ 1..K functions, l ∈ 1..L locations
θ_1..θ_K: (independent) multinomial distributions over the L locations
θ_k(l) is the probability of (first) accessing function k at location l
(Figure: two example distributions θ_k plotted over the L locations.)

7 Model Strength
The user's degree of uncertainty in the model distribution
M_k = strength of θ_k, defined via relative entropy: M_k = 1 - H(θ_k) / H_L+, where H(X) = -Σ_x P(x) log P(x) and H_L+ is the maximum entropy over L locations
Notions of strength: weak mental model when M_k is close to 0; strong mental model when M_k is close to 1
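
Slide 7's strength is just one minus normalized entropy. Below is a minimal sketch, assuming natural logarithms (any base works as long as H and H_L+ share it):

import numpy as np

def model_strength(theta):
    # M_k = 1 - H(theta_k) / H_L+, where H_L+ = log(L) is the maximum entropy
    # over L locations; theta is the multinomial distribution theta_k.
    theta = np.asarray(theta, dtype=float)
    L = theta.size
    nz = theta[theta > 0]              # treat 0 * log 0 as 0
    H = -np.sum(nz * np.log(nz))
    return 1.0 - H / np.log(L)

print(model_strength([0.25, 0.25, 0.25, 0.25]))   # uniform: weak model, M_k = 0.0
print(model_strength([0.97, 0.01, 0.01, 0.01]))   # peaked: strong model, M_k near 1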

8 Mental Processes
Common to all systems: 1. Learning, 2. Forgetting
Specific to adaptive systems: 3. Undergoing disruption

9-10 1. Learning
Context: visual-spatial cues, function usage; C_k = log(freq_k, nb_k)
Learning rate λ ∈ [0,1]
Updating strength (memory recall): M_k^t = (1-λ) M_k^{t-1} + λ C_k^t
(Figure: example distributions θ_k and θ_j.)

11-12 2. Forgetting
Adopt exponential forgetting
Forgetting rate β ∈ [0,1]
Updating strength: M_k^t = β M_k^{t-1}
(Figure: example distribution θ_k.)
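
The learning and forgetting updates above are one-line recurrences on the strength M_k. A minimal sketch, assuming the context term C_k^t is supplied by the caller, since the exact form of C_k = log(freq_k, nb_k) is ambiguous in the transcript:

def learn(M_prev, C_t, lam):
    # Memory recall (slides 9-10): M_k^t = (1 - lambda) * M_k^{t-1} + lambda * C_k^t
    return (1.0 - lam) * M_prev + lam * C_t

def forget(M_prev, beta):
    # Exponential forgetting (slides 11-12): M_k^t = beta * M_k^{t-1}
    return beta * M_prev

M = 0.2
M = learn(M, C_t=0.9, lam=0.3)   # using function k strengthens the model
M = forget(M, beta=0.95)         # an idle step weakens it slightly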

13 Tracking Model Strength
(Plot: model strength over time for several functions, annotated with access frequencies Freq=3, 7, 4, 3, 28, and 5.)

14-17 3. Modeling Disruption
Aspects of disruption: disruption time (objective), annoyance factor (subjective)
Mixing rate α ∈ [0,1]
Updating strength: M_k^t = (1-α) M_k^{t-1} + α M(φ_k^{t-1}), where φ_k is a hypothetical model
Decay: M_k^t = δ M_k^{t-1}
(Figure: current model θ_k and hypothetical model φ_k.)
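
Disruption updates the strength by mixing in the strength of a hypothetical post-adaptation model φ_k, and slide 17 also lists a δ-scaled decay. A minimal sketch of both recurrences:

def disrupt(M_prev, M_phi_prev, alpha):
    # Mixing update (slides 14-16): M_k^t = (1 - alpha) * M_k^{t-1} + alpha * M(phi_k^{t-1})
    return (1.0 - alpha) * M_prev + alpha * M_phi_prev

def decay(M_prev, delta):
    # Additional update listed on slide 17: M_k^t = delta * M_k^{t-1}
    return delta * M_prev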

18 Decision-Theoretic Adaptive System
Savings vs. disruption; individual preferences; sequential decision making
(Diagram: the adaptive action takes Mental Model t-1 and Loc t-1 to Loc t, incurring costs and yielding benefits.)

19 Decision-Theoretic Adaptive System
(Influence diagram: action A; locations L t-1 and L t; model strengths M t-1 for functions 1..K; Disruption and Savings nodes.)

20 Decision-Theoretic Adaptive System: Joint Expected Savings
S(l_k^{t-1}, l_k^t) = fitts(l_k^{t-1}) - fitts(l_k^t)
JES(A | l_{1:K}^{t-1}) = Σ_{k=1}^K p_k S(l_k^{t-1}, l_k^t)

21 Decision-Theoretic Adaptive System: Joint Expected Disruption
D_k^t = g(M_k^t | A) is linear in M_k^t
JED(A | M_{1:K}^t) = Σ_{k=1}^K p_k g(M_k^t | A)   [adaptive selection]

22 Decision-Theoretic Adaptive System: trading off savings and disruption
w_s JES(A | l_{1:K}^{t-1}) - w_d JED(A | M_{1:K}^{t-1})   s.t. w_d + w_s = 1.0
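
A sketch of the savings/disruption tradeoff on slides 20-22. The Fitts'-law coefficients a and b and the linear disruption function g are placeholders, and p_k stands for the probability that the user needs function k:

import math

def fitts(distance, width, a=0.1, b=0.15):
    # Placeholder Fitts'-law pointing time for a target at a given distance and width.
    return a + b * math.log2(distance / width + 1.0)

def jes(p, time_old, time_new):
    # JES(A | l_{1:K}^{t-1}) = sum_k p_k * (fitts(l_k^{t-1}) - fitts(l_k^t))
    return sum(pk * (to - tn) for pk, to, tn in zip(p, time_old, time_new))

def jed(p, M, g=lambda m: m):
    # JED(A | M_{1:K}) = sum_k p_k * g(M_k | A), with g linear in M_k.
    return sum(pk * g(mk) for pk, mk in zip(p, M))

def tradeoff(p, time_old, time_new, M, w_s):
    # Slide 22 objective: w_s * JES - w_d * JED, with w_s + w_d = 1.
    w_d = 1.0 - w_s
    return w_s * jes(p, time_old, time_new) - w_d * jed(p, M)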

23 Decision-Theoretic Adaptive System: WER policy (approximate long-term reasoning)
Σ_{h=1}^H [ γ^h w_s JES - γ^h (1-α)^h w_d JED ]
γ = discount factor, H = look-ahead horizon; the disruption term deteriorates over the horizon

24 WER Policy
Select the best action: A* = argmax_A WER(A | M_{1:K}^t, l_{1:K}^{t-1}, w_s, H, α, γ)
Greedy approximation
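
The WER score discounts the savings and deteriorates the disruption cost over an H-step look-ahead, and the policy greedily picks the best action. A sketch, assuming per-action JES and JED values are computed elsewhere:

def wer(jes_A, jed_A, w_s, H, alpha, gamma):
    # WER = sum_{h=1..H} gamma^h * w_s * JES(A) - gamma^h * (1-alpha)^h * w_d * JED(A)
    w_d = 1.0 - w_s
    return sum(gamma**h * (w_s * jes_A - (1.0 - alpha)**h * w_d * jed_A)
               for h in range(1, H + 1))

def wer_policy(actions, jes_of, jed_of, w_s, H, alpha, gamma):
    # Greedy approximation: A* = argmax_A WER(A | M_{1:K}, l_{1:K}, w_s, H, alpha, gamma)
    return max(actions, key=lambda A: wer(jes_of(A), jed_of(A), w_s, H, alpha, gamma))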

25 Evaluation: Menu Selection
Frequency distributions: Zipf, Uniform
Metrics: selection time; disruption time; total number of strong models (M_k > 0.4); percentage of strong models moved
Comparison policies: static: Best-Static; adaptive (TOP action): Random-4, Split-4 (move), WER-4

26 Usability Experiment
2 distributions x 4 policies (different labels, rotated); 8 participants
"Would you like adaptive systems if they were designed to SPEED UP the tasks?"
Yes: w_s = 0.9 (most similar to Split-4); Maybe: w_s = 0.5; No: w_s = 0.1

27-32 Quantitative Results (Zipf)
Table: Policy vs. Task Time (ms), Estimated Disruption Time (ms), Total Strong Models, Strong Models Moved (%)
Policies: Best-Static, Random-4, Split-4 (move), WER-4 (all), WER-4 (w_s=0.1), WER-4 (w_s=0.5), WER-4 (w_s=0.9)
Highlights: Best-Static is fastest and Random-4 is worst on task time; Split-4 (move) is competitive while WER-4 is faster; WER-4 gives faster recovery on disruption time; more learning in total strong models (p<0.05 for WER-4 with w_s=0.1); WER-4 prefers to disrupt weak mental models

33 Quantitative Results (Uniform)
Same table layout; similar patterns across all metrics (p<0.01 for WER-4 with w_s=0.1, p<0.05 for WER-4 with w_s=0.5)

34 Post-Questionnaire
Less frustrating? Easy to use? Efficient?
(Chart: participant ratings, from Less to More, for each question.)

35 Summary and Future Work
DT approach to adaptive systems: cost of disruption; probabilistic representation of mental models; principled tradeoffs; sequential reasoning; individual preferences
Usability feedback suggests value in our approach
Future work: other kinds of approximations, parameter learning experiments, more system comparisons
