CSL302/612 Artificial Intelligence End-Semester Exam (120 Minutes)

Name: ____________________    Roll Number: ____________________

Please read the following instructions carefully:
- Calculators are allowed. However, laptops or mobile phones are not allowed.
- You can bring one A4-size cheat sheet. Please attach the cheat sheet along with this booklet.
- Use the space provided after every question for writing your answer. You will be given additional sheets for rough work. Please attach the additional sheet(s) along with this booklet.
- Be precise and concise in your answers.
- Include explanations, derivations, and examples when appropriate. This can fetch partial scores even if the final answer is incorrect.
- Please write legibly.
- There are 5 questions worth a total of 50 points.
- Work efficiently. Some questions are easier than others; try to answer the easier ones before you get bogged down by the harder ones.
- Keep calm and good luck.

#   Question                     Max. Score   Score
1   Minesweeper                      8
2   Planners                        10
3   Bayesian Networks               12
4   Markov Decision Processes       12
5   Reinforcement Learning           8
    Total                           50

1. Minesweeper (8 points)

Minesweeper is a single-player puzzle. The objective of the game is to clear a rectangular board containing hidden mines without detonating any of them, using clues about the number of neighboring mines in each field. Each square in the rectangular board can be cleared by clicking on it. If a square that contains a mine is clicked, the game is over. If the square does not contain a mine, one of two things can happen:
- A number between 1 and 8 appears, indicating the number of adjacent (including diagonally adjacent) squares containing mines.
- No number appears, in which case there are no mines in the adjacent cells.

The figure below is an example of the game.

[Figure: an example Minesweeper board; the cell at position (4,2) is referred to in part b.]

a. Define a first-order language (functions, objects, relations) that allows one to formalize the knowledge of a player in the game. Represent the following knowledge using the defined language: (5 points)
- There are exactly n mines in the minefield.
- If a cell contains the number 1, then there is exactly one mine in the adjacent cells.

b. Prove by resolution that there must be a mine at position (4,2) in the figure above. (3 points)
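Purely as an illustration (not part of the exam), the clue rule described above can be written as a short sketch. The board representation (a 2-D list of booleans, True meaning a mine) is an assumption made for this sketch.

```python
# Illustrative only: the clue shown when a mine-free square (r, c) is clicked,
# assuming `board` is a list of lists of booleans with True marking a mine.
def clue(board, r, c):
    """Count of adjacent (including diagonally adjacent) squares with mines."""
    rows, cols = len(board), len(board[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and board[rr][cc]:
                count += 1
    return count  # 0 means no number is displayed
```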

2. Planners (10 points)

Consider the following artificial planning problem:

Initial State: X
Goal State: Y, Z
Actions:
  A1   Prec: none   Effect: Y, X
  A2   Prec: X      Effect: Z

a. Construct the tree resulting from performing one level of progression search. Complete the branch of the tree that will result in the solution. (2 points)
b. Construct the tree resulting from performing one level of regression search. Complete the branch of the tree that will result in the solution. (2 points)
c. Construct the planning graph until the goals are satisfied. (2 points)
d. Identify all the mutex relationships that exist in the graph. (2 points)
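Purely as an illustration (not part of the exam), the progression step referred to in part a can be sketched as follows. The dictionary-based encoding of the actions, and the assumption that the effects listed above only add propositions (no delete effects), are choices made for this sketch.

```python
# Illustrative sketch of STRIPS-style progression for the toy problem above.
INIT = frozenset({"X"})
GOAL = frozenset({"Y", "Z"})
ACTIONS = {
    "A1": {"prec": set(),  "add": {"Y", "X"}},
    "A2": {"prec": {"X"},  "add": {"Z"}},
}

def successors(state):
    """One level of progression: apply every applicable action to `state`."""
    for name, a in ACTIONS.items():
        if a["prec"] <= state:                  # preconditions satisfied
            yield name, frozenset(state | a["add"])

for name, nxt in successors(INIT):
    print(name, "->", sorted(nxt))
```

Progression searches forward from the initial state in this way; regression (part b) instead works backward from the goal set.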

e. In general, suppose the progression search is conducted using A* search with a heuristic h that is inadmissible but overestimates the cost by k units (if the true cost is c, h might give an estimate of c + k). Can we give a guarantee on how far the plan found by A* will be from the optimum? (2 points)

3. Bayesian Networks (12 points)

3.1 Consider the Bayes network shown below.

[Figure: a Bayes network over the nodes A, B, C, D, E, and F.]

a. Is A conditionally independent of E given F? Explain. (1 point)
b. Given the CPTs for A, B, and C and the full joint distribution table, compute the CPTs for nodes D, E, and F. (4 points)

c. Suppose that the variables A, B, C, and F have been observed. Variables D and E are unobserved. Prove from first principles that removing node D from the network will not affect the posterior distribution for E. (3 points)

d. Under the same assumptions as part c, can we remove node D if we are planning to use rejection sampling and likelihood weighting for obtaining the posterior distribution for E? Explain. (4 points)
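Purely for reference (not part of the exam), part d refers to likelihood weighting; a minimal sketch of that sampler for a generic Bayes network over binary variables is given below. The network representation (topological order, parent lists, CPT dictionaries) is an assumed placeholder for this sketch, not the network in the figure.

```python
import random

# Illustrative sketch of likelihood weighting for a generic binary Bayes net.
# parents[X]: list of X's parents; cpt[X]: dict mapping a tuple of parent
# values to P(X = True | parents); `order` is a topological ordering.
def weighted_sample(order, parents, cpt, evidence):
    """Return (sample, weight): evidence variables are fixed and contribute
    their likelihood to the weight; all other variables are sampled."""
    sample, weight = {}, 1.0
    for X in order:
        p_true = cpt[X][tuple(sample[p] for p in parents[X])]
        if X in evidence:
            sample[X] = evidence[X]
            weight *= p_true if evidence[X] else 1.0 - p_true
        else:
            sample[X] = random.random() < p_true
    return sample, weight

def likelihood_weighting(query, order, parents, cpt, evidence, n=10000):
    """Estimate P(query = True | evidence) from n weighted samples."""
    num = den = 0.0
    for _ in range(n):
        s, w = weighted_sample(order, parents, cpt, evidence)
        den += w
        num += w * s[query]
    return num / den
```

Rejection sampling, by contrast, samples every variable from the prior and discards samples that disagree with the evidence.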

4. Markov Decision Processes (12 points)

4.1 An agent would like to use standard search techniques for solving an MDP. What should be the conditions on the MDP for standard search to be applicable? (2 points)

4.2 Given a fixed policy π, where π(s) is the deterministic action to be taken in state s, the value of the policy satisfies the following equation:

    V^{\pi}(s) = \sum_{s'} T(s, \pi(s), s') \left[ R(s, \pi(s), s') + \gamma V^{\pi}(s') \right]

On the other hand, a stochastic policy does not recommend a single, deterministic action for each state. Instead, it gives, for each possible action a in a state s, a probability π(a | s) = P(a | s). Modify the above equation to compute the value of a stochastic policy π. (3 points)

4.3 Consider the grid world illustrated in the figure below, where A is the start state and the squares with double rectangles are the exit states. For an exit state, the only action available is Exit, which results in the listed reward and ends the game. For the non-exit states, the agent can choose the East, West, North, or South actions, which move the agent in the corresponding direction; i.e., the actions are deterministic. There are no living rewards. Assume that V_0(s) = 0 for all s, and γ = 1.

[Figure: a 3x3 grid world with rows labeled Z, Y, X from top to bottom and columns 1-3 from left to right. The exit rewards shown are +5 (row Z) and +10, +15 (row X); the start state A is in row Y.]

a. What is the optimal value V*(A)? (1 point)

b. When running value iteration, what is the non-zero value of V_k(A)? What is the value of k when V_k(A) first takes this non-zero value? (2 points)
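Parts c and d below continue with value iteration. Purely for reference (not part of the exam), one sweep of the value-iteration update used throughout 4.3 can be sketched as follows; the dictionary-based MDP encoding and the convention that action-less states keep value 0 are assumptions made for this sketch, not the grid in the figure.

```python
# Illustrative sketch of one value-iteration sweep,
#   V_{k+1}(s) = max_a sum_{s'} T(s, a, s') [ R(s, a, s') + gamma * V_k(s') ].
def value_iteration_sweep(V, actions, T, R, gamma):
    """actions[s]: available actions in s; T[(s, a)]: list of (s_next, prob);
    R(s, a, s_next): reward. V must contain every reachable state."""
    V_new = {}
    for s in actions:
        if not actions[s]:          # e.g. an absorbing post-exit state
            V_new[s] = 0.0
            continue
        V_new[s] = max(
            sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[(s, a)])
            for a in actions[s]
        )
    return V_new
```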

c. After how many value iterations will V_k(A) = V*(A)? (Write "never" if they will never become equal.) (2 points)

d. If γ = 0.5, what is the optimal value V*(A)? (2 points)

5. Reinforcement Learning (8 points)

Consider the grid world illustrated below. The agent is trying to learn the optimal policy. At any square the agent can move North (N), South (S), East (E), or West (W). The terminal states (marked using double squares) also have an Exit action; performing it terminates the MDP. There are no living rewards: the agent receives rewards only when exiting from the terminal states. Let us assume that γ = 1 and α = 0.5. Rows are numbered 1-3 from bottom to top and columns 1-3 from left to right, and states are written as (row, column).

  3 | -10 | -10 | -10 |
  2 |  A  |     |     |
  1 | +10 |     | +15 |
        1     2     3

The agent starts exploring the grid from (2,1), resulting in the following set of episodes. Each entry in an episode is a tuple of the form (s, a, s', r): the agent was in state s, performed action a, ended up in state s', and received a reward of r.

Episode 1:
  (2,1), E, (2,2), 0
  (2,2), S, (1,2), 0
  (1,2), E, (1,3), 0
  (1,3), Exit, -, +15

Episode 2:
  (2,1), E, (2,2), 0
  (2,2), S, (1,2), 0
  (1,2), N, (2,2), 0
  (2,2), N, (3,2), 0
  (3,2), Exit, -, -10

Episode 3:
  (2,1), E, (2,2), 0
  (2,2), E, (2,3), 0
  (2,3), N, (3,3), 0
  (3,3), Exit, -, -10

Episode 4:
  (2,1), S, (1,1), 0
  (1,1), Exit, -, +10

Episode 5:
  (2,1), E, (2,2), 0
  (2,2), S, (1,2), 0
  (1,2), E, (1,3), 0
  (1,3), Exit, -, +15

a. If the agent were to employ direct utility estimation, what would be the q-value estimates for ((2,2), S), ((1,2), E), ((2,3), E), and ((2,3), N)? (2 points)

b. If the agent were to employ Q-learning, what would be the q-value estimates for ((2,2), S), ((1,2), E), ((2,3), E), and ((2,3), N)? Also indicate the episode and iteration number at which the q-value estimate for each of these q-states first becomes non-zero. If a q-value never becomes non-zero, write "never". (4 points)

c. In general, for a deterministic MDP, the Q-learning update with a learning rate of α = 1 will correctly learn the optimal q-values. True or False? Explain. (2 points)
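Purely for reference (not part of the exam), a minimal sketch of the tabular Q-learning update assumed in parts b and c is given below, replayed over (s, a, s', r) tuples written as in the episodes above. The action set and the use of None for the post-exit state are assumptions made for this sketch.

```python
from collections import defaultdict

# Illustrative sketch of the tabular Q-learning update
#   Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a')).
ACTIONS = ["N", "S", "E", "W", "Exit"]

def q_learning(episodes, alpha=0.5, gamma=1.0):
    Q = defaultdict(float)  # all q-values start at zero
    for episode in episodes:
        for s, a, s_next, r in episode:
            # Post-exit "next state" (written "-" above) is None here, value 0.
            best_next = 0.0 if s_next is None else max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q
```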