
Probabilistic Artificial Intelligence
Final Exam, Feb 2, 2016

Time limit: 120 minutes. Number of pages: 19. Total points: 100. You can use the back of the pages if you run out of space. Collaboration on the exam is strictly forbidden.

(1 point) Please fill in your student ID and full name (LASTNAME, FIRSTNAME) in all capital letters.

1. Propositional and First-order Logic (12 points)

A minesweeper world is a rectangular grid of m squares with n invisible mines scattered among them. Each square contains at most 1 mine. Any square may be probed by the minesweeper; if the probed square does not contain a mine, then a number from 0-8 is revealed, indicating how many mines are directly or diagonally adjacent. The minesweeper has to find all the mines in a given minefield without detonating any of them.

(4 points) (i) We want to provide a first-order knowledge base that formalizes the knowledge of a player in a game state. We consider the following first-order vocabulary:

Mine(x), a unary predicate, which denotes that the square x contains a mine
Adj(x, y), a binary predicate, which means that the square x is adjacent to the square y
Contains(x, n), a binary predicate, which denotes that the square x contains the number n

Using first-order logic, formalize the following knowledge: If a square contains the number 1, then there cannot be a mine in that square, and there is exactly one mine in the adjacent squares.
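For orientation, one way such a sentence might be written (a sketch only, assuming the language includes equality so that "exactly one" can be expressed) is:

\forall x\,\Bigl(\mathit{Contains}(x,1) \rightarrow \neg\mathit{Mine}(x) \land \exists y\,\bigl(\mathit{Adj}(x,y) \land \mathit{Mine}(y) \land \forall z\,(\mathit{Adj}(x,z) \land \mathit{Mine}(z) \rightarrow z = y)\bigr)\Bigr)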

(8 points) (ii) Consider the game state provided in the figure below. The shaded squares represent locations that have not yet been probed. We use the propositional symbols A_I, A_II, B_I, B_II, C_I, C_II to denote that the corresponding square contains a mine. For example, A_I denotes that square (A, I) contains a mine, and ¬A_I denotes that there is no mine in square (A, I).

[Figure: a partially probed grid with columns A, B, C and rows I, II; two probed squares show the numbers 1 and 2, and the shaded squares have not yet been probed.]

We have already established the following facts in propositional logic: B_I C_I B_II B_I A_II B_II B_II A_II B_I

Using propositional resolution and the three facts provided above, prove that there is no mine in square (A, II).
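Recall that each step of such a proof is an application of the propositional resolution rule: from two clauses containing a complementary pair of literals, their resolvent follows:

\frac{\alpha \lor p \qquad \beta \lor \neg p}{\alpha \lor \beta}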


2. Information Cascade (20 points)

As part of developing an early warning system against earthquakes, the government of a city has installed a grid of sound alarms throughout the city to notify the people in case an earthquake is imminent. The nodes in the grid are arranged in the form of a binary tree. The root of the tree receives the noise-free signal about the earthquake, and the information is transmitted by each node to its left and right child nodes, thus creating a downward information cascade from the root to the leaves. Let there be a total of N nodes in the grid. The scenario is modeled by the Bayesian network shown below for the case of N = 7 nodes.

[Figure: Bayesian network for N = 7 nodes, drawn as a complete binary tree over X_1, ..., X_7.]

Each node X_i represents a random variable corresponding to the binary signal {0, 1} observed at that node. For the root node X_1, where the information cascade begins, the probability of observing a signal of 1 is 2/3. As the cascade unfolds, each node transmits the signal it received to its left and right child nodes. However, there is a transmission error of 1/4 at the receiving end, i.e. the probability that a node X_i observes the same signal as its parent π(X_i) is 3/4. The conditional probability tables of this information cascade process are:

P(X_1 = 1) = 2/3, P(X_1 = 0) = 1/3

P(X = 1 | π(X) = 1) = 3/4
P(X = 0 | π(X) = 1) = 1/4
P(X = 1 | π(X) = 0) = 1/4
P(X = 0 | π(X) = 0) = 3/4

(4 points) (i) You are given evidence E that node X_2 observed signal 0 and node X_3 observed signal 1. Given this evidence, compute the conditional probability that the signal observed by the root X_1 is 1.
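Since the network has only seven binary variables, any of the queries in this problem can be checked by brute-force enumeration of the joint distribution. A minimal sketch (illustrative only, assuming the level-order numbering from the figure: X_2 and X_3 are the children of X_1, X_4 and X_5 of X_2, and X_6 and X_7 of X_3):

from itertools import product

# Parent of each non-root node under the assumed level-order numbering.
parent = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}

def joint(x):
    # Joint probability of a full assignment x = {node index: 0 or 1}.
    p = 2/3 if x[1] == 1 else 1/3            # root prior P(X_1)
    for i, pa in parent.items():
        p *= 3/4 if x[i] == x[pa] else 1/4   # copy the parent's signal w.p. 3/4
    return p

def conditional(query, evidence):
    # P(query | evidence); both given as dictionaries {node index: value}.
    num = den = 0.0
    for bits in product([0, 1], repeat=7):
        x = dict(zip(range(1, 8), bits))
        if any(x[k] != v for k, v in evidence.items()):
            continue
        p = joint(x)
        den += p
        if all(x[k] == v for k, v in query.items()):
            num += p
    return num / den

# For example, the query in part (i):
print(conditional({1: 1}, {2: 0, 3: 1}))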

(4 points) (ii) You are given evidence E that node X_2 observed signal 0, and all other nodes X_3, X_4, ..., X_7 observed signal 1. Given this evidence, compute the conditional probability that the signal observed by the root X_1 is 1.

(2 points) (iii) You are given evidence E that node X_1 observed signal 1. Given this evidence, compute the conditional probability that the signal observed by X_3 is 1.

(4 points) (iv) You are given evidence E that node X_1 observed signal 1. Given this evidence, compute the conditional probability that the signal observed by X_7 is 1.

(6 points) (v) Given evidence E that node X_1 observed signal 1, define γ(N) to be the expected fraction of nodes in the network that observed signal 1 (i.e. X_i = 1). Formally,

\gamma(N) = \frac{1}{N} \sum_{i=1}^{N} P(X_i = 1 \mid X_1 = 1)

Compute γ(N) for the Bayesian network shown above, i.e. for N = 7.

3. Variable Elimination (15 points)

Consider the Bayesian network shown below and solve the following questions. Throughout, we use d-sep(X; Y | Z) to denote that X is d-separated from Y given the set of observed variables Z.

[Figure: Bayesian network over the variables A, B, C, D, E, F, G, H, I.]

(4 points) (i) Find a minimal set S_d of variables that must be observed so that d-sep(G; E | S_d) holds. Minimal means that removing any element from S_d makes the statement false. Note that S_d can be the empty set if the property holds without any additional observations. Briefly explain your answer.

(2 points) (ii) Determine a minimal set of edges that should be removed from the network to convert it into a polytree. Briefly explain why the resulting network is a polytree.

(4 points) (iii) Consider the polytree that results from removing the edges in your previous answer. Suggest an ordering for variable elimination that produces factors of the minimum possible size. Briefly explain how you selected that ordering.

(5 points) (iv) Using the ordering from (iii), determine which factors are removed and which are introduced during the first four iterations of the variable elimination algorithm.

4. Belief Propagation (14 points)

Consider the factor graph shown below and its associated factor tables, then answer the following questions.

[Figure: factor graph with variable nodes A, B, C, D, E and factor nodes φ_1 (connected to A, B, E) and φ_2 (connected to B, C, D).]

A B E | φ_1(A, B, E)
0 0 0 | 0.25
0 0 1 | 0.00
0 1 0 | 0.75
0 1 1 | 1.00
1 0 0 | 0.00
1 0 1 | 0.75
1 1 0 | 1.00
1 1 1 | 0.25

B C D | φ_2(B, C, D)
0 0 0 | 0.9
0 0 1 | 0.7
0 1 0 | 0.1
0 1 1 | 0.3
1 0 0 | 1.0
1 0 1 | 0.5
1 1 0 | 0.0
1 1 1 | 0.5

(6 points) (i) Compute the first message μ^(1)_{φ_1→B} from factor node φ_1 to variable node B. Remember that the messages from variable nodes v to factor nodes u are initialized as μ^(0)_{v→u} = 1.
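For reference, the factor-to-variable message has the sum-product form μ^(1)_{φ_1→B}(b) = Σ_{a,e} φ_1(a, b, e) · μ^(0)_{A→φ_1}(a) · μ^(0)_{E→φ_1}(e). A minimal sketch of this computation (function and variable names are illustrative, not part of the exam statement):

from itertools import product

def factor_to_variable_message(factor, variables, target, incoming):
    # factor: dict mapping assignments (tuples ordered as `variables`) to values.
    # incoming: dict variable -> incoming message, indexed by the variable's value.
    msg = [0.0, 0.0]                                   # all variables are binary
    for assignment in product([0, 1], repeat=len(variables)):
        x = dict(zip(variables, assignment))
        weight = factor[assignment]
        for v in variables:
            if v != target:
                weight *= incoming[v][x[v]]            # product of incoming messages
        msg[x[target]] += weight                       # sum out the other variables
    return msg

# phi_1(A, B, E) from the table above; keys are (a, b, e).
phi1 = {(0, 0, 0): 0.25, (0, 0, 1): 0.00, (0, 1, 0): 0.75, (0, 1, 1): 1.00,
        (1, 0, 0): 0.00, (1, 0, 1): 0.75, (1, 1, 0): 1.00, (1, 1, 1): 0.25}
ones = {v: [1.0, 1.0] for v in "ABE"}                  # mu^(0)_{v -> phi} = 1
print(factor_to_variable_message(phi1, ("A", "B", "E"), "B", ones))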

(8 points) (ii) Compute the first message μ^(1)_{φ_2→B} from factor node φ_2 to variable node B, and use the result to compute the estimated marginal distribution \hat{P}^{(2)}(B).


5. Inside an RL Robot (18 points)

Imagine that you are a robot. Your task is to press one of two big red buttons, labeled Left and Right. Whenever you press a button, a big capital letter (A, B, or C) flashes up in front of you and, depending on which letter is shown, you receive a certain amount of reward. After pressing the buttons at random for a while, you realize that there is an underlying pattern that determines the amount of reward you obtain at each time step. You want to figure out which buttons you should press in order to maximize future rewards.

(9 points) (i) As a first step, you want to model the system based on your past experience. Assume you have recorded the past data as tuples (s, a, r, s'), containing the state s before the transition, the action a taken (button pressed), the reward r received, and the state s' after the transition. Given the following observations, estimate the transition probabilities and the state-dependent rewards, and draw the corresponding MDP.

(A, Left, 5, A)
(A, Right, 10, B)
(B, Right, 0, C)
(C, Right, 5, A)
(A, Right, 5, A)
(A, Right, 5, A)
(A, Right, 5, A)
(A, Right, 10, B)
(B, Left, 10, B)
(B, Left, 5, A)
(A, Right, 10, B)
(B, Right, 0, C)
(C, Left, 5, A)
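The transition probabilities asked for in (i) are just normalized counts over the recorded tuples; a minimal bookkeeping sketch (illustrative only, names are not from the exam) is:

from collections import Counter

# Recorded tuples in (s, a, r, s') order, copied from the list above.
data = [("A", "Left", 5, "A"), ("A", "Right", 10, "B"), ("B", "Right", 0, "C"),
        ("C", "Right", 5, "A"), ("A", "Right", 5, "A"), ("A", "Right", 5, "A"),
        ("A", "Right", 5, "A"), ("A", "Right", 10, "B"), ("B", "Left", 10, "B"),
        ("B", "Left", 5, "A"), ("A", "Right", 10, "B"), ("B", "Right", 0, "C"),
        ("C", "Left", 5, "A")]

transitions = Counter((s, a, s2) for s, a, r, s2 in data)   # counts of (s, a, s')
visits = Counter((s, a) for s, a, r, s2 in data)            # counts of (s, a)
P_hat = {key: count / visits[key[:2]] for key, count in transitions.items()}
for (s, a, s2), p in sorted(P_hat.items()):
    print(f"P({s2} | {s}, {a}) = {p:.2f}")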

(9 points) (ii) Given the model of the system, it is now time to find the optimal policy. You decide to discount future rewards with a factor of γ = 0.5 and to use policy iteration to find the optimal button to press. Recall that policy iteration starts with an arbitrary initial policy π. Until convergence, it iteratively computes the value function V^π(x) for the current policy and then updates the current policy to be the greedy policy π_g with respect to the computed V^π(x). The greedy policy for a value function is given by

\pi_g(x) = \arg\max_a \sum_{x'} P(x' \mid x, a)\,\bigl[r(x, a, x') + \gamma V^\pi(x')\bigr],

and, given a fixed policy π, the value function V^π(x) satisfies the condition

V^\pi(x) = \sum_{x'} P(x' \mid x, \pi(x))\,\bigl[r(x, \pi(x), x') + \gamma V^\pi(x')\bigr].

Compute the optimal policy and its value function for the above MDP. [Hint: To save time, start with an initial guess for the optimal policy and prove that it is greedy w.r.t. the corresponding value function, i.e. that policy iteration terminates.]
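The loop described above can also be written out mechanically; a minimal sketch (assuming the model from part (i) is supplied as dictionaries P[s][a][s'] and r[s][a][s'], which are illustrative names):

import numpy as np

def policy_iteration(states, actions, P, r, gamma=0.5):
    # P[s][a][s2]: transition probability; r[s][a][s2]: reward (both dictionaries).
    idx = {s: i for i, s in enumerate(states)}
    policy = {s: actions[0] for s in states}           # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        A = np.eye(len(states))
        b = np.zeros(len(states))
        for s in states:
            for s2, p in P[s][policy[s]].items():
                A[idx[s], idx[s2]] -= gamma * p
                b[idx[s]] += p * r[s][policy[s]][s2]
        V = np.linalg.solve(A, b)
        # Policy improvement: act greedily with respect to V^pi.
        new_policy = {}
        for s in states:
            q = {a: sum(p * (r[s][a][s2] + gamma * V[idx[s2]])
                        for s2, p in P[s][a].items()) for a in actions}
            new_policy[s] = max(q, key=q.get)
        if new_policy == policy:                       # greedy w.r.t. its own value: done
            return policy, dict(zip(states, V))
        policy = new_policy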


6. Qmax (8 points)

Consider a grid world in which we want to learn a policy that maximizes future rewards. The world is modeled as a deterministic MDP that is initially unknown to the learner. At each state there are two possible actions, Left and Right, which deterministically move the learner to the corresponding neighboring state (or leave the learner in place if the action would move it out of the grid). In each state, every action receives the state-dependent reward shown in the grid world below, irrespective of the new state reached after taking the action.

Field:   1   2
Reward: 20  30

(8 points) (i) The learner uses optimistic Q-learning with a discount factor γ = 0.5 and learning rate α = 0.5 in order to learn the optimal policy. Recall that optimistic Q-learning initializes the Q-value estimate for each state-action pair to a high value (Q_max = 100). At each iteration, it picks the action with the highest estimated Q-value among all actions available in the current state. Ties are broken by picking actions in the order (Right, Left). Write down the first 5 state-action pairs taken by the learner while exploring the environment, together with the corresponding Q-value updates. The learner starts in field 1.
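The five steps asked for in (i) can be traced by hand or with a short simulation of the update rule Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max_{a'} Q(s', a')). A minimal sketch, assuming the field layout and rewards as read from the grid above (20 in field 1, 30 in field 2):

def step(state, action):
    # Deterministic dynamics of the two-field world; the reward depends only on
    # the current field (assumed: 20 in field 1, 30 in field 2).
    reward = {1: 20, 2: 30}[state]
    next_state = min(state + 1, 2) if action == "Right" else max(state - 1, 1)
    return reward, next_state

gamma, alpha, q_max = 0.5, 0.5, 100.0
Q = {(s, a): q_max for s in (1, 2) for a in ("Right", "Left")}   # optimistic init
state = 1                                                         # start in field 1
for t in range(5):
    # Highest estimated Q-value; the tuple order (Right, Left) breaks ties.
    action = max(("Right", "Left"), key=lambda a: Q[(state, a)])
    reward, next_state = step(state, action)
    target = reward + gamma * max(Q[(next_state, a)] for a in ("Right", "Left"))
    Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * target
    print(t + 1, state, action, round(Q[(state, action)], 2))
    state = next_state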

7. Hidden Markov Models (12 points)

Consider the following Hidden Markov Model. In each time step, a coin is flipped, resulting in heads (h) or tails (t). There are two coins: one is biased (b) and the other one is fair (f). After each coin flip (indexed by i ∈ ℕ), the coin used in the next time step changes with probability 3/4. The prior for the two coins at i = 1 is P(X_1 = b) = 3/5 and P(X_1 = f) = 2/5. The fair coin results in heads and tails with probability 1/2 each, whereas the biased coin results in heads with probability 4/5 and in tails with probability 1/5.

[Figure: state-transition diagram of the two-coin HMM, with self-transition probability 1/4 and switching probability 3/4 for each coin, and the emission probabilities given above.]

(6 points) (i) Derive the probability lim_{i→∞} P(X_i = b), i.e. the probability of flipping the biased coin as i → ∞.
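Numerically, the limit in (i) is the stationary distribution of the hidden two-state chain over {biased, fair}; a minimal sketch that simply iterates the transition matrix (row and column order: biased, fair; an illustrative check, not a derivation):

import numpy as np

# Transition matrix of the hidden chain: stay with probability 1/4, switch with 3/4.
T = np.array([[1/4, 3/4],
              [3/4, 1/4]])
p = np.array([3/5, 2/5])          # prior (P(X_1 = b), P(X_1 = f))
for _ in range(50):
    p = p @ T                     # P(X_{i+1}) = P(X_i) T
print(p)                          # converges to the stationary distribution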

(6 points) (ii) Derive the probability lim_{i→∞} P(X_i = b | Y_1 = t), i.e. the probability of flipping the biased coin as i → ∞, given that you observe tails at i = 1.