CS343 Artificial Intelligence
|
|
- Amelia Fleming
- 5 years ago
- Views:
Transcription
1 CS343 Artificial Intelligence Prof: Department of Computer Science The University of Texas at Austin
2 Good Afternoon, Colleagues
3 Good Afternoon, Colleagues Are there any questions?
4 Logistics Problems with the assignment?
5 Logistics Problems with the assignment? 3.1: Goal formulation vs. problem formulation
6 Logistics Problems with the assignment? 3.1: Goal formulation vs. problem formulation Mailing list
7 Logistics Problems with the assignment? 3.1: Goal formulation vs. problem formulation Mailing list CC Daniel and me on everything
8 Logistics Problems with the assignment? 3.1: Goal formulation vs. problem formulation Mailing list CC Daniel and me on everything Use correct subject line
9 Logistics Problems with the assignment? 3.1: Goal formulation vs. problem formulation Mailing list CC Daniel and me on everything Use correct subject line Next week s assignments up
10 Logistics Problems with the assignment? 3.1: Goal formulation vs. problem formulation Mailing list CC Daniel and me on everything Use correct subject line Next week s assignments up Comments on Forum for AI talk last Friday?
11 Logistics Problems with the assignment? 3.1: Goal formulation vs. problem formulation Mailing list CC Daniel and me on everything Use correct subject line Next week s assignments up Comments on Forum for AI talk last Friday? Reading responses good overall
12 An Example
13 An Example You, as a class, act as a learning agent Actions: Wave, Stand, Clap Observations: colors, reward Goal: Find an optimal policy Way of selecting actions that gets you the most reward
14 How did you do it?
15 How did you do it? What is your policy? What does the world look like?
16 Formalizing the Activity Knowns:
17 Formalizing the Activity Knowns: O = {Blue, Red, Green, Black,...} Rewards in IR A = {W ave, Clap, Stand} o 0, a 0, r 0, o 1, a 1, r 1, o 2,...
18 Formalizing the Activity Knowns: O = {Blue, Red, Green, Black,...} Rewards in IR A = {W ave, Clap, Stand} o 0, a 0, r 0, o 1, a 1, r 1, o 2,... Unknowns:
19 Formalizing the Activity Knowns: O = {Blue, Red, Green, Black,...} Rewards in IR A = {W ave, Clap, Stand} o 0, a 0, r 0, o 1, a 1, r 1, o 2,... Unknowns: S = 4x3 grid R : S A IR P = S O T : S A S
20 Formalizing the Activity Knowns: O = {Blue, Red, Green, Black,...} Rewards in IR A = {W ave, Clap, Stand} o 0, a 0, r 0, o 1, a 1, r 1, o 2,... Unknowns: S = 4x3 grid R : S A IR P = S O T : S A S o i = P(s i )
21 Formalizing the Activity Knowns: O = {Blue, Red, Green, Black,...} Rewards in IR A = {W ave, Clap, Stand} o 0, a 0, r 0, o 1, a 1, r 1, o 2,... Unknowns: S = 4x3 grid R : S A IR P = S O T : S A S o i = P(s i ) r i = R(s i, a i )
22 Formalizing the Activity Knowns: O = {Blue, Red, Green, Black,...} Rewards in IR A = {W ave, Clap, Stand} o 0, a 0, r 0, o 1, a 1, r 1, o 2,... Unknowns: S = 4x3 grid R : S A IR P = S O T : S A S o i = P(s i ) r i = R(s i, a i ) s i+1 = T (s i, a i )
23 Formalizing the Activity Knowns: O = {Blue, Red, Green, Black,...} Rewards in IR A = {W ave, Clap, Stand} o 0, a 0, r 0, o 1, a 1, r 1, o 2,... Unknowns: S = 4x3 grid R : S A IR P = S O T : S A S o i = P(s i ) r i = R(s i, a i ) s i+1 = T (s i, a i )
24 PEAS description Performance measure: Environment: Actuators: Sensors:
25 PEAS description Performance measure: Environment: Actuators: Sensors: fully observable vs. partially observable (accessible)
26 PEAS description Performance measure: Environment: Actuators: Sensors: fully observable vs. partially observable (accessible) single-agent vs. multiagent
27 PEAS description Performance measure: Environment: Actuators: Sensors: fully observable vs. partially observable (accessible) single-agent vs. multiagent deterministic vs. non-deterministic
28 PEAS description Performance measure: Environment: Actuators: Sensors: fully observable vs. partially observable (accessible) single-agent vs. multiagent deterministic vs. non-deterministic episodic vs. sequential
29 PEAS description Performance measure: Environment: Actuators: Sensors: fully observable vs. partially observable (accessible) single-agent vs. multiagent deterministic vs. non-deterministic episodic vs. sequential static vs. dynamic
30 PEAS description Performance measure: Environment: Actuators: Sensors: fully observable vs. partially observable (accessible) single-agent vs. multiagent deterministic vs. non-deterministic episodic vs. sequential static vs. dynamic discrete vs. continuous
31 PEAS description Performance measure: Environment: Actuators: Sensors: fully observable vs. partially observable (accessible) single-agent vs. multiagent deterministic vs. non-deterministic episodic vs. sequential static vs. dynamic discrete vs. continuous known vs. unknown
32 Next week: Relaxing the Assumptions Textbook readings: continuous spaces, nondeterminism, partial observations, unknown environments,... Responses both Monday and Wednesday Search programming assignment
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationTemporal Difference Learning & Policy Iteration
Temporal Difference Learning & Policy Iteration Advanced Topics in Reinforcement Learning Seminar WS 15/16 ±0 ±0 +1 by Tobias Joppen 03.11.2015 Fachbereich Informatik Knowledge Engineering Group Prof.
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Decision Networks and Value of Perfect Information Prof. Scott Niekum The niversity of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188
More informationCS788 Dialogue Management Systems Lecture #2: Markov Decision Processes
CS788 Dialogue Management Systems Lecture #2: Markov Decision Processes Kee-Eung Kim KAIST EECS Department Computer Science Division Markov Decision Processes (MDPs) A popular model for sequential decision
More informationCS 4700: Foundations of Artificial Intelligence Ungraded Homework Solutions
CS 4700: Foundations of Artificial Intelligence Ungraded Homework Solutions 1. Neural Networks: a. There are 2 2n distinct Boolean functions over n inputs. Thus there are 16 distinct Boolean functions
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of
More informationKnowledge-based Agents. CS 331: Artificial Intelligence Propositional Logic I. Knowledge-based Agents. Outline. Knowledge-based Agents
Knowledge-based Agents CS 331: Artificial Intelligence Propositional Logic I Can represent knowledge And reason with this knowledge How is this different from the knowledge used by problem-specific agents?
More informationCS 331: Artificial Intelligence Propositional Logic I. Knowledge-based Agents
CS 331: Artificial Intelligence Propositional Logic I 1 Knowledge-based Agents Can represent knowledge And reason with this knowledge How is this different from the knowledge used by problem-specific agents?
More informationMarkov Decision Processes
Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour
More informationA Polynomial-time Nash Equilibrium Algorithm for Repeated Games
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result
More informationSeminar in Artificial Intelligence Near-Bayesian Exploration in Polynomial Time
Seminar in Artificial Intelligence Near-Bayesian Exploration in Polynomial Time 26.11.2015 Fachbereich Informatik Knowledge Engineering Group David Fischer 1 Table of Contents Problem and Motivation Algorithm
More informationMS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction
MS&E338 Reinforcement Learning Lecture 1 - April 2, 2018 Introduction Lecturer: Ben Van Roy Scribe: Gabriel Maher 1 Reinforcement Learning Introduction In reinforcement learning (RL) we consider an agent
More informationCS 570: Machine Learning Seminar. Fall 2016
CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or
More informationMarkov decision processes
CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only
More informationMultiagent Value Iteration in Markov Games
Multiagent Value Iteration in Markov Games Amy Greenwald Brown University with Michael Littman and Martin Zinkevich Stony Brook Game Theory Festival July 21, 2005 Agenda Theorem Value iteration converges
More informationCoarticulation in Markov Decision Processes
Coarticulation in Markov Decision Processes Khashayar Rohanimanesh Department of Computer Science University of Massachusetts Amherst, MA 01003 khash@cs.umass.edu Sridhar Mahadevan Department of Computer
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out
More informationOptimally Solving Dec-POMDPs as Continuous-State MDPs
Optimally Solving Dec-POMDPs as Continuous-State MDPs Jilles Dibangoye (1), Chris Amato (2), Olivier Buffet (1) and François Charpillet (1) (1) Inria, Université de Lorraine France (2) MIT, CSAIL USA IJCAI
More informationCS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability
CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More informationCSL302/612 Artificial Intelligence End-Semester Exam 120 Minutes
CSL302/612 Artificial Intelligence End-Semester Exam 120 Minutes Name: Roll Number: Please read the following instructions carefully Ø Calculators are allowed. However, laptops or mobile phones are not
More informationOutline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012
CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline
More informationMarkov Decision Processes Chapter 17. Mausam
Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationShort Course: Multiagent Systems. Multiagent Systems. Lecture 1: Basics Agents Environments. Reinforcement Learning. This course is about:
Short Course: Multiagent Systems Lecture 1: Basics Agents Environments Reinforcement Learning Multiagent Systems This course is about: Agents: Sensing, reasoning, acting Multiagent Systems: Distributed
More informationExercises, II part Exercises, II part
Inference: 12 Jul 2012 Consider the following Joint Probability Table for the three binary random variables A, B, C. Compute the following queries: 1 P(C A=T,B=T) 2 P(C A=T) P(A, B, C) A B C 0.108 T T
More informationCS 4649/7649 Robot Intelligence: Planning
CS 4649/7649 Robot Intelligence: Planning Probability Primer Sungmoon Joo School of Interactive Computing College of Computing Georgia Institute of Technology S. Joo (sungmoon.joo@cc.gatech.edu) 1 *Slides
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Probability Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188
More informationIntroduction to Reinforcement Learning. CMPT 882 Mar. 18
Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and
More informationRecap from Last Time
NP-Completeness Recap from Last Time Analyzing NTMs When discussing deterministic TMs, the notion of time complexity is (reasonably) straightforward. Recall: One way of thinking about nondeterminism is
More informationMARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti
1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early
More informationIntroduction to Artificial Intelligence. Logical Agents
Introduction to Artificial Intelligence Logical Agents (Logic, Deduction, Knowledge Representation) Bernhard Beckert UNIVERSITÄT KOBLENZ-LANDAU Winter Term 2004/2005 B. Beckert: KI für IM p.1 Outline Knowledge-based
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationCoarticulation in Markov Decision Processes Khashayar Rohanimanesh Robert Platt Sridhar Mahadevan Roderic Grupen CMPSCI Technical Report 04-33
Coarticulation in Markov Decision Processes Khashayar Rohanimanesh Robert Platt Sridhar Mahadevan Roderic Grupen CMPSCI Technical Report 04- June, 2004 Department of Computer Science University of Massachusetts
More informationDecision Theory: Q-Learning
Decision Theory: Q-Learning CPSC 322 Decision Theory 5 Textbook 12.5 Decision Theory: Q-Learning CPSC 322 Decision Theory 5, Slide 1 Lecture Overview 1 Recap 2 Asynchronous Value Iteration 3 Q-Learning
More informationInternet Monetization
Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition
More informationCSE250A Fall 12: Discussion Week 9
CSE250A Fall 12: Discussion Week 9 Aditya Menon (akmenon@ucsd.edu) December 4, 2012 1 Schedule for today Recap of Markov Decision Processes. Examples: slot machines and maze traversal. Planning and learning.
More informationDecision Theory: Markov Decision Processes
Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies
More informationThis question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.
This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you
More informationReinforcement Learning. George Konidaris
Reinforcement Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom
More informationLearning Dexterity Matthias Plappert SEPTEMBER 6, 2018
Learning Dexterity Matthias Plappert SEPTEMBER 6, 2018 OpenAI OpenAI is a non-profit AI research company, discovering and enacting the path to safe artificial general intelligence. OpenAI OpenAI is a non-profit
More informationProgress in Learning 3 vs. 2 Keepaway
Progress in Learning 3 vs. 2 Keepaway Gregory Kuhlmann Department of Computer Sciences University of Texas at Austin Joint work with Peter Stone RoboCup and Reinforcement Learning Reinforcement Learning
More informationCS221 Practice Midterm
CS221 Practice Midterm Autumn 2012 1 ther Midterms The following pages are excerpts from similar classes midterms. The content is similar to what we ve been covering this quarter, so that it should be
More informationCS 7180: Behavioral Modeling and Decisionmaking
CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and
More informationMenu. Lecture 2: Orders of Growth. Predicting Running Time. Order Notation. Predicting Program Properties
CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans Lecture 2: Orders of Growth Menu Predicting program properties Orders of Growth: O, Ω Course Survey
More informationDeep Reinforcement Learning
Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 50 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015
More informationCS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability
CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)
More informationAn Adaptive Clustering Method for Model-free Reinforcement Learning
An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at
More informationReinforcement Learning. Yishay Mansour Tel-Aviv University
Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak
More informationHuman-level control through deep reinforcement. Liia Butler
Humanlevel control through deep reinforcement Liia Butler But first... A quote "The question of whether machines can think... is about as relevant as the question of whether submarines can swim" Edsger
More informationEE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 10, 5/9/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Logical Agents Chapter 7
More informationNeural Map. Structured Memory for Deep RL. Emilio Parisotto
Neural Map Structured Memory for Deep RL Emilio Parisotto eparisot@andrew.cmu.edu PhD Student Machine Learning Department Carnegie Mellon University Supervised Learning Most deep learning problems are
More informationThe Reinforcement Learning Problem
The Reinforcement Learning Problem Slides based on the book Reinforcement Learning by Sutton and Barto Formalizing Reinforcement Learning Formally, the agent and environment interact at each of a sequence
More informationMarkov Decision Processes Chapter 17. Mausam
Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.
More informationA Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty
2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of
More informationMarkov decision processes (MDP) CS 416 Artificial Intelligence. Iterative solution of Bellman equations. Building an optimal policy.
Page 1 Markov decision processes (MDP) CS 416 Artificial Intelligence Lecture 21 Making Complex Decisions Chapter 17 Initial State S 0 Transition Model T (s, a, s ) How does Markov apply here? Uncertainty
More informationZero-Sum Stochastic Games An algorithmic review
Zero-Sum Stochastic Games An algorithmic review Emmanuel Hyon LIP6/Paris Nanterre with N Yemele and L Perrotin Rosario November 2017 Final Meeting Dygame Dygame Project Amstic Outline 1 Introduction Static
More informationLogical Agent & Propositional Logic
Logical Agent & Propositional Logic Berlin Chen 2005 References: 1. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Chapter 7 2. S. Russell s teaching materials Introduction The representation
More informationReinforcement Learning: An Introduction
Introduction Betreuer: Freek Stulp Hauptseminar Intelligente Autonome Systeme (WiSe 04/05) Forschungs- und Lehreinheit Informatik IX Technische Universität München November 24, 2004 Introduction What is
More informationKecerdasan Buatan M. Ali Fauzi
Kecerdasan Buatan M. Ali Fauzi Artificial Intelligence M. Ali Fauzi Logical Agents M. Ali Fauzi In which we design agents that can form representations of the would, use a process of inference to derive
More informationIntroduction to Arti Intelligence
Introduction to Arti Intelligence cial Lecture 4: Constraint satisfaction problems 1 / 48 Constraint satisfaction problems: Today Exploiting the representation of a state to accelerate search. Backtracking.
More informationReinforcement Learning. Machine Learning, Fall 2010
Reinforcement Learning Machine Learning, Fall 2010 1 Administrativia This week: finish RL, most likely start graphical models LA2: due on Thursday LA3: comes out on Thursday TA Office hours: Today 1:30-2:30
More informationMAT01A1. Numbers, Inequalities and Absolute Values. (Appendix A)
MAT01A1 Numbers, Inequalities and Absolute Values (Appendix A) Dr Craig 8 February 2017 Leftovers from yesterday: lim n i=1 3 = lim n n 3 = lim n n n 3 i ) 2 ] + 1 n[( n ( n i 2 n n + 2 i=1 i=1 3 = lim
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188
More informationDon t Plan for the Unexpected: Planning Based on Plausibility Models
Don t Plan for the Unexpected: Planning Based on Plausibility Models Thomas Bolander, DTU Informatics, Technical University of Denmark Joint work with Mikkel Birkegaard Andersen and Martin Holm Jensen
More information1 Introduction 2. 4 Q-Learning The Q-value The Temporal Difference The whole Q-Learning process... 5
Table of contents 1 Introduction 2 2 Markov Decision Processes 2 3 Future Cumulative Reward 3 4 Q-Learning 4 4.1 The Q-value.............................................. 4 4.2 The Temporal Difference.......................................
More informationDate Morning Length Afternoon Length. WBI03 Biology Unit 3: Practical Biology and Research Skills 1h 30m
Week 1 Date Morning Length Afternoon Length Monday 9 May Tuesday 10 May Wednesday 11 May WBI03 Biology Unit 3: Practical Biology and Research Skills WCH03 Chemistry Unit 3: Chemistry Laboratory Skills
More informationMAT01A1. Numbers, Inequalities and Absolute Values. (Appendix A)
MAT01A1 Numbers, Inequalities and Absolute Values (Appendix A) Dr Craig 7 February 2018 Leftovers from yesterday: lim n i=1 3 = lim n n 3 = lim n n n 3 i ) 2 ] + 1 n[( n ( n i 2 n n + 2 i=1 i=1 3 = lim
More informationPlanning by Probabilistic Inference
Planning by Probabilistic Inference Hagai Attias Microsoft Research 1 Microsoft Way Redmond, WA 98052 Abstract This paper presents and demonstrates a new approach to the problem of planning under uncertainty.
More informationMarkov Decision Processes and Solving Finite Problems. February 8, 2017
Markov Decision Processes and Solving Finite Problems February 8, 2017 Overview of Upcoming Lectures Feb 8: Markov decision processes, value iteration, policy iteration Feb 13: Policy gradients Feb 15:
More informationRecap from Last Time
P and NP Recap from Last Time The Limits of Decidability In computability theory, we ask the question What problems can be solved by a computer? In complexity theory, we ask the question What problems
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationArtificial Intelligence & Sequential Decision Problems
Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet
More informationFinal. Introduction to Artificial Intelligence. CS 188 Spring You have approximately 2 hours and 50 minutes.
CS 188 Spring 2014 Introduction to Artificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed notes except your two-page crib sheet. Mark your answers
More informationCS311H. Prof: Peter Stone. Department of Computer Science The University of Texas at Austin
CS311H Prof: Department of Computer Science The University of Texas at Austin Recurrences In how many ways can a 2 n rectangular checkerboard be tiled using 1 2 and 2 2 pieces? Good Morning, Colleagues
More informationReview: TD-Learning. TD (SARSA) Learning for Q-values. Bellman Equations for Q-values. P (s, a, s )[R(s, a, s )+ Q (s, (s ))]
Review: TD-Learning function TD-Learning(mdp) returns a policy Class #: Reinforcement Learning, II 8s S, U(s) =0 set start-state s s 0 choose action a, using -greedy policy based on U(s) U(s) U(s)+ [r
More informationTentamen TDDC17 Artificial Intelligence 20 August 2012 kl
Linköpings Universitet Institutionen för Datavetenskap Patrick Doherty Tentamen TDDC17 Artificial Intelligence 20 August 2012 kl. 08-12 Points: The exam consists of exercises worth 32 points. To pass the
More informationSection Handout 5 Solutions
CS 188 Spring 2019 Section Handout 5 Solutions Approximate Q-Learning With feature vectors, we can treat values of states and q-states as linear value functions: V (s) = w 1 f 1 (s) + w 2 f 2 (s) +...
More informationFinal exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007.
Fall 2007 / Page 1 Final exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007. Don t panic. Be sure to write your name and student ID number on every page of the exam. The only materials
More informationCS 380: ARTIFICIAL INTELLIGENCE LOGICAL AGENTS. Santiago Ontañón
CS 380: RTIFICIL INTELLIGENCE LOGICL GENTS Santiago Ontañón so367@dreel.edu Summary so far: What is I? Rational gents Problem solving: Systematic search Uninformed search: DFS, BFS, ID Informed search:
More informationSequential Decision Problems
Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted
More informationReminders : Sign up to make up Labs (Tues/Thurs from 3 4 pm)
Monday, 5.11.15 Learning Target : I can identify nuclear reactions based on the characteristics of their chemical equations. Homework: Bingo packet 2 due Thursday What is a fission reaction? What is a
More informationIntroduction to Quantum Computing
Introduction to Quantum Computing Einar Pius University of Edinburgh Why this Course To raise interest in quantum computing To show how quantum computers could be useful To talk about concepts not found
More informationConnecting the Demons
Connecting the Demons How connection choices of a Horde implementation affect Demon prediction capabilities. Author: Jasper van der Waa jaspervanderwaa@gmail.com Supervisors: Dr. ir. Martijn van Otterlo
More information6 Reinforcement Learning
6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,
More information15-780: ReinforcementLearning
15-780: ReinforcementLearning J. Zico Kolter March 2, 2016 1 Outline Challenge of RL Model-based methods Model-free methods Exploration and exploitation 2 Outline Challenge of RL Model-based methods Model-free
More informationArtificial Intelligence Chapter 7: Logical Agents
Artificial Intelligence Chapter 7: Logical Agents Michael Scherger Department of Computer Science Kent State University February 20, 2006 AI: Chapter 7: Logical Agents 1 Contents Knowledge Based Agents
More informationCS 598 Statistical Reinforcement Learning. Nan Jiang
CS 598 Statistical Reinforcement Learning Nan Jiang Overview What s this course about? A grad-level seminar course on theory of RL 3 What s this course about? A grad-level seminar course on theory of RL
More informationProbabilistic Planning. George Konidaris
Probabilistic Planning George Konidaris gdk@cs.brown.edu Fall 2017 The Planning Problem Finding a sequence of actions to achieve some goal. Plans It s great when a plan just works but the world doesn t
More informationReinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationCOMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati
COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning
More informationCS:4420 Artificial Intelligence
CS:4420 rtificial Intelligence Spring 2018 Logical gents Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart Russell
More informationQ-learning Tutorial. CSC411 Geoffrey Roeder. Slides Adapted from lecture: Rich Zemel, Raquel Urtasun, Sanja Fidler, Nitish Srivastava
Q-learning Tutorial CSC411 Geoffrey Roeder Slides Adapted from lecture: Rich Zemel, Raquel Urtasun, Sanja Fidler, Nitish Srivastava Tutorial Agenda Refresh RL terminology through Tic Tac Toe Deterministic
More informationLinear Least-squares Dyna-style Planning
Linear Least-squares Dyna-style Planning Hengshuai Yao Department of Computing Science University of Alberta Edmonton, AB, Canada T6G2E8 hengshua@cs.ualberta.ca Abstract World model is very important for
More informationFinite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs
Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Christopher Amato Department of Computer Science University of Massachusetts Amherst, MA 01003 USA camato@cs.umass.edu
More informationReading Response: Due Wednesday. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
Reading Response: Due Wednesday R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Another Example Get to the top of the hill as quickly as possible. reward = 1 for each step where
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationFinite Automata Part One
Finite Automata Part One Computability Theory What problems can we solve with a computer? What kind of computer? Computers are Messy http://www.intel.com/design/intarch/prodbref/27273.htm We need a simpler
More informationCS188 Outline. CS 188: Artificial Intelligence. Today. Inference in Ghostbusters. Probability. We re done with Part I: Search and Planning!
CS188 Outline We re done with art I: Search and lanning! CS 188: Artificial Intelligence robability art II: robabilistic Reasoning Diagnosis Speech recognition Tracking objects Robot mapping Genetics Error
More information