CS 188 Introduction to Spring 2009 Artificial Intelligence Midterm Solutions
1. (16 points) True/False

For the following questions, a correct answer is worth 2 points, no answer is worth 1 point, and an incorrect answer is worth 0 points. Circle true or false to indicate your answer.

a) (true or false) If g(s) and h(s) are two admissible heuristics, then their average f(s) = (g(s) + h(s))/2 must also be admissible.

True. Let h*(s) be the true distance from s. We know that g(s) ≤ h*(s) and h(s) ≤ h*(s), thus f(s) = (g(s) + h(s))/2 ≤ (h*(s) + h*(s))/2 = h*(s).

b) (true or false) For a search problem, the path returned by uniform cost search may change if we add a positive constant to every step cost.

True. Consider two paths from the start state (S) to the goal (G): one through an intermediate state A and one direct, with cost(S, A) = 1, cost(A, G) = 1, and cost(S, G) = 3. So the optimal path is through A. Now, if we add 2 to each of the costs, the path through A costs 6 while the direct path costs 5, so the optimal path is directly from S to G. Since uniform cost search finds the optimal path, its path will change.

c) (true or false) The running-time of an efficient solver for tree-structured constraint satisfaction problems is linear in the number of variables.

True. The running time of the algorithm for tree-structured CSPs is O(n d^2), where n is the number of variables and d is the maximum size of any variable's domain.

d) (true or false) If h1(s) is a consistent heuristic and h2(s) is an admissible heuristic, then min(h1(s), h2(s)) must be consistent.

False. For instance, let h2(s) be admissible but inconsistent, and let h1(s) dominate h2(s). Then min(h1(s), h2(s)) = h2(s), which is inconsistent.

e) (true or false) The amount of memory required to run minimax with alpha-beta pruning is O(b^d) for branching factor b and depth limit d.

True and False (everyone wins). The memory required is only O(bd), so we accepted False.
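The counterexample in (b) can be checked mechanically. A minimal sketch, assuming the reconstructed costs 1, 1 and 3 and a shift of 2:

```python
import heapq

def ucs_path(graph, start, goal):
    """Uniform cost search: return the cheapest path from start to goal."""
    frontier = [(0, [start])]
    seen = set()
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, cost
        if node in seen:
            continue
        seen.add(node)
        for nxt, step in graph.get(node, []):
            heapq.heappush(frontier, (cost + step, path + [nxt]))
    return None, float("inf")

# The counterexample from part (b): S->A->G (cost 2) beats S->G (cost 3)...
graph = {"S": [("A", 1), ("G", 3)], "A": [("G", 1)]}
path1, _ = ucs_path(graph, "S", "G")

# ...but adding 2 to every step cost makes the direct path optimal (5 < 6).
shifted = {u: [(v, w + 2) for v, w in edges] for u, edges in graph.items()}
path2, _ = ucs_path(shifted, "S", "G")
```
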
However, by definition an algorithm that is O(bd) is also O(b^d), because O denotes upper bounds that may or may not be tight, so technically this statement is True (but not very useful).

f) (true or false) In a Markov decision process with discount γ = 1, the difference in values for two adjacent states is bounded by the reward between them: |V(s) − V(s')| ≤ max_a R(s, a, s').

False. Let V(s') = 0, and R(s, a, s') = 0 for all a, but suppose there is another action a' which takes s to the terminal state T and gives a reward of 100. Thus V(s) ≥ 100, but the inequality above says that V(s) − 0 ≤ 0.

g) (true or false) Value iteration and policy iteration must always converge to the same policy.

True and False (everyone wins). Both algorithms are guaranteed to converge to an optimal policy, so we accepted True. If there are multiple policies that are optimal (meaning they yield the same maximal values for every state), then the algorithms might diverge. Both value iteration and policy iteration will always lead to the same optimal values.

h) (true or false) In a Bayes net, if A ⊥ B, then A ⊥ B | C for some variable C other than A or B.

False. Consider the Bayes net A → C ← B. Clearly A ⊥ B, but A ⊥̸ B | C.
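The counterexample in (h) can be verified by enumeration. This sketch uses a hypothetical concrete instance of A → C ← B: fair coins A and B with C = A xor B.

```python
from itertools import product

# Common-effect network A -> C <- B with A, B fair coins and C = A xor B.
def joint(a, b, c):
    return 0.25 if c == (a ^ b) else 0.0

def prob(pred):
    """Probability of an event over the joint distribution."""
    return sum(joint(a, b, c)
               for a, b, c in product([0, 1], repeat=3) if pred(a, b, c))

# Marginally, A and B are independent...
pa = prob(lambda a, b, c: a == 1)
pb = prob(lambda a, b, c: b == 1)
pab = prob(lambda a, b, c: a == 1 and b == 1)

# ...but conditioned on C = 1 they become perfectly (anti-)correlated:
pc = prob(lambda a, b, c: c == 1)
pab_c = prob(lambda a, b, c: a == 1 and b == 1 and c == 1) / pc
pa_c = prob(lambda a, b, c: a == 1 and c == 1) / pc
pb_c = prob(lambda a, b, c: b == 1 and c == 1) / pc
# P(a, b | c) = 0 while P(a | c) P(b | c) = 0.25, so A depends on B given C.
```
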
2. (15 points) Search: Crawler's Escape

Whilst Pacman was Q-learning, Crawler snuck into mediumClassic and stole all the dots. Now, it's trying to escape as quickly as possible. At each time step, Crawler can either move its shoulder or its elbow one position up or down. Both joints s and e have five total positions each (1 through 5) and both begin in position 3. Upon changing arm positions from (s, e) to (s', e'), Crawler's body moves some distance r(s, e, s', e'), where |r(s, e, s', e')| ≤ 2 meters (negative distance means the crawler moves backwards). Crawler must travel 10 meters to reach the exit.

In this problem, you will design a search problem for which the optimal solution will allow Crawler to escape in as few time steps as possible.

(a) Define a state space, start state and goal test to represent this problem.

A state is a 3-tuple consisting of distance to exit, shoulder position, and elbow position. The start state is (10, 3, 3). Goal test: distance to exit is less than or equal to zero.

(b) Define the successor function for the start state by listing its (successor state, action, step cost) triples. Use the actions s+ and s− for increasing and decreasing the shoulder position and e+ and e− for the elbow.

{ ((10 − r(3, 3, 4, 3), 4, 3), s+, 1),
((10 − r(3, 3, 2, 3), 2, 3), s−, 1),
((10 − r(3, 3, 3, 4), 3, 4), e+, 1),
((10 − r(3, 3, 3, 2), 3, 2), e−, 1) }
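The formulation in (a) and (b) can be sketched in code. The displacement function r below is a made-up placeholder, since the real r comes from the crawler's physics; only its bound |r| ≤ 2 matters here.

```python
# Hypothetical displacement in meters; any implementation with |r| <= 2 works.
def r(s, e, s2, e2):
    return 0.5 * ((s2 - s) + (e2 - e))

def successors(state):
    """Yield (successor state, action, step cost) triples for (dist, s, e)."""
    dist, s, e = state
    moves = [(s + 1, e, "s+"), (s - 1, e, "s-"),
             (s, e + 1, "e+"), (s, e - 1, "e-")]
    for s2, e2, action in moves:
        if 1 <= s2 <= 5 and 1 <= e2 <= 5:   # both joints have positions 1-5
            yield (dist - r(s, e, s2, e2), s2, e2), action, 1

start = (10, 3, 3)

def is_goal(state):
    return state[0] <= 0
```

From the start state, all four actions are legal because both joints begin in an interior position.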
(c) Would depth-first graph search be complete for the search problem you defined? Explain.

No, depth-first search would not be complete because the state space is infinite. Depth-first search could forever keep expanding states with increasing distance to the goal.

(d) Design a non-trivial, admissible heuristic for this problem.

Distance to exit divided by 2 is admissible because each step can move Crawler at most 2 meters closer to the goal. A slightly better heuristic would be to round this heuristic up to the nearest integer.

(e) Crawler's shoulder overheats if it switches direction more than once in any three consecutive time steps, making progress impossible. Describe how you would change your search problem so that an optimal search algorithm would only return solutions that avoid overheating.

Consider a reversal to be s+ followed by s−, or s− followed by s+. To avoid these, we can extend the state space to include the last two actions. We then remove from the output of the successor function any triple that would complete the action sequence (s+, s−, s+) or (s−, s+, s−). We also accepted solutions that gave a very high step cost to these successors.
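Part (d)'s heuristic is a one-liner. A sketch, using the (dist, s, e) state encoding from part (a):

```python
import math

# Each time step moves the crawler at most 2 meters, so distance/2 (or its
# ceiling, since step counts are integers) never overestimates the number
# of remaining steps; both heuristics are therefore admissible.
def h(state):
    dist, s, e = state
    return max(0.0, dist / 2.0)

def h_rounded(state):
    dist, s, e = state
    return max(0, math.ceil(dist / 2))
```
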
3. (20 points) CSPs: Constraining Graph Structure

Hired as a consultant, you built a Bayes net model of Cheeseboard Pizza customers. Last night, a corporate spy from Zachary's Pizza deleted the directions from all your arcs. You still have the undirected graph of your model, a star with center C and leaves A, B, F and G, and a list of dependencies. You now have to recover the direction of the arrows to build a graph that allows for these dependencies. Note: X ⊥̸ Y means X is not independent of Y.

Distribution properties (dependencies): A ⊥̸ G, B ⊥̸ F, B ⊥̸ G | C

a) Given the first constraint only, circle all the topologies that are allowed for the triple (A, C, G).

Allowed: A → C → G, A ← C ← G, and A ← C → G. The common-effect topology A → C ← G is not allowed, because it would make A and G independent.

b) Formulate direction-recovery for this graph as a CSP with explicit binary constraints only. The variables and values are provided for you. The variable E_A stands for the direction of the arc between A and the center, where out means the arrow points outward (toward A), and in means the arrow points inward (toward C).

Variables: E_A, E_B, E_F, E_G
Values: out, in
Constraints: An explicit constraint lists all legal tuples of values for a tuple of variables.
(E_A, E_G) ∈ {(in, out), (out, out), (out, in)}
(E_B, E_F) ∈ {(in, out), (out, out), (out, in)}
(E_B, E_G) ∈ {(in, in)}

c) Draw the constraint graph for this CSP.

The constraint graph has an edge for each binary constraint: E_A — E_G, E_B — E_F, and E_B — E_G.

d) After selecting E_A = in, cross out all values eliminated from E_B, E_F and E_G by forward checking.

Forward checking removes the values of variables that directly contradict the assignment E_A = in: only E_G = in is eliminated.

e) Cross out all values eliminated by arc consistency applied before any backtracking search.

We first remove E_B = out because it is incompatible with any value for E_G. Likewise, we remove E_G = out. Now, we can remove E_A = in based on E_G, and E_F = in based on E_B.
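Part (e) can be reproduced by enforcing arc consistency (an AC-3 style loop) on the CSP from part (b). The names EA, EB, EF, EG below are shorthand for E_A and so on:

```python
from collections import deque

domains = {v: {"in", "out"} for v in ("EA", "EB", "EF", "EG")}
allowed = {
    ("EA", "EG"): {("in", "out"), ("out", "out"), ("out", "in")},
    ("EB", "EF"): {("in", "out"), ("out", "out"), ("out", "in")},
    ("EB", "EG"): {("in", "in")},
}
# Constraints are symmetric: also index them in the reverse direction.
for (x, y), pairs in list(allowed.items()):
    allowed[(y, x)] = {(b, a) for a, b in pairs}

def revise(x, y):
    """Drop values of x with no supporting value of y; return True if changed."""
    bad = {vx for vx in domains[x]
           if not any((vx, vy) in allowed[(x, y)] for vy in domains[y])}
    domains[x] -= bad
    return bool(bad)

queue = deque(allowed)                    # every directed arc
while queue:
    x, y = queue.popleft()
    if revise(x, y):
        queue.extend((z, x) for (z, w) in allowed if w == x and z != y)

# Arc consistency alone solves this CSP: EA=out, EB=in, EF=out, EG=in.
```
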
f) Solve this CSP, then add the correct directions to the arcs of the graph at the top of the page.

E_A = out, E_B = in, E_F = out, and E_G = in.
Now consider a chain-structured graph over variables X1, ..., Xn. Edges E1, ..., E(n-1) connect adjacent variables, but their directions are again unknown.

g) Using only binary constraints and two-valued variables, formulate a CSP that is satisfied by only and all networks that can represent a distribution where X1 ⊥̸ Xn. Describe what your variables mean in terms of direction of the arrows in the network.

Variables: E1, ..., E(n-1) correspond to the directions of these edges and take values {left, right}.

Constraints: (E_i, E_(i+1)) ≠ (right, left), for i ∈ {1, ..., n−2}.

h) (6 pt) Using only unary, binary and ternary (3-variable) constraints and two-valued variables, formulate a CSP that is satisfied by only and all networks that enforce X1 ⊥ Xn. Describe what your variables mean in terms of direction of the arrows in the network. Hint: You will have to introduce additional variables.

Variables: E1, ..., E(n-1) correspond to the directions of these edges and take values {left, right}. I1, ..., I(n-2) can take values {T, F}. A1, ..., A(n-2) can take values {T, F}.

Constraints: Intuitively, we want I_i to be T if X_i → X_(i+1) ← X_(i+2), i.e. the triple X_i, X_(i+1), X_(i+2) is inactive. Also, we want A_i to be T if either I_i or A_(i-1) is T, i.e. if the current triple is inactive or if any of the prior triples was inactive. Finally, we want A_(n-2) to be T, which implies that there is some inactive triple.

(I_i, E_i, E_(i+1)) ∈ {(T, right, left), (F, right, right), (F, left, right), (F, left, left)} for i ∈ {1, ..., n−2}.
(A1, I1) ∈ {(T, T), (F, F)}
(A_(i-1), I_i, A_i) ∈ {(F, F, F), (F, T, T), (T, F, T), (T, T, T)} for i ∈ {2, ..., n−2}.
A_(n-2) = T
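Part (g) can be cross-checked by brute force for a small n: assignments avoiding the (right, left) pattern are exactly the ones whose chain keeps X1 and Xn connected by an active path.

```python
from itertools import product

# On a chain X1 - ... - Xn, the single path between the endpoints is active
# (so X1 and Xn can be dependent) exactly when no adjacent pair of edges
# forms a collider  ->  <-  , i.e. no (right, left) pair.
def path_active(dirs):
    return all(pair != ("right", "left") for pair in zip(dirs, dirs[1:]))

n = 5                                     # 4 edges E1..E4
sat = [d for d in product(["left", "right"], repeat=n - 1) if path_active(d)]
# The satisfying assignments are precisely the "left^a right^b" chains.
```
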
4. (10 points) Multi-agent Search: Connect-3

In the game of Connect-3, players X and O alternate moves, dropping their symbols into columns 1, 2, 3, or 4. Three-in-a-row wins the game, horizontally, vertically or diagonally. X plays first.

(a) What is the maximum branching factor of minimax search for Connect-3? 4.

(b) What is the maximum tree depth in plies? 6. According to our terminology, a search ply in a multi-agent search problem includes one move for every agent. Because the textbook differs in its definition, we also accepted 12.

(c) Give a reasonably tight upper bound on the number of terminal states.

Several answers were accepted: 4^12 is a reasonable bound on the number of leaves of the search tree. 3^12 is a reasonable bound on the number of total states in the game, because each square can be empty, contain an X or contain an O. A tighter bound is 15^4, which follows from the observation that there are only 15 ways to legally fill a column from the bottom up with X's and O's. Bounds based on the assumption that the entire board would be filled with X's and O's were not accepted, even if they in fact overestimated the true number of terminal states.

(d) Draw the game tree starting from the board shown above right (with X to play next). You may abbreviate states by drawing only the upper-right region circled. The root node is drawn for you. [Game-tree figure not reproduced.]

(e) X is the maximizer, while O is the minimizer. X's utility for terminal states is k when X wins and −k when O wins, where k is 1 for a horizontal 3-in-a-row win (as in above left), 2 for a vertical win, and 3 for a diagonal win. A tie has value 0. Label each node of your tree with its minimax value.

(f) Circle all nodes of your tree that will not be explored when using alpha-beta pruning and a move ordering that maximizes the number of nodes pruned.
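Part (f) relies on alpha-beta pruning. A generic sketch over explicit game trees; the tree and utilities below are illustrative stand-ins, not the actual Connect-3 tree:

```python
import math

# Minimax with alpha-beta pruning over a game tree given as nested lists,
# where leaves are X's utilities (e.g. +3 for a diagonal X win, -2 for a
# vertical O win under part (e)'s scoring).
def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):        # terminal state
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:             # remaining children are pruned
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# Example: the minimizer's second node is cut off after seeing the leaf 2,
# so the leaf 9 is never explored; the root value is 3.
root_value = alphabeta([[3, 5], [2, 9]], True)
```
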
5. MDPs: Robot Soccer

A soccer robot A is on a fast break toward the goal, starting in position 1. From positions 1 through 3, it can either shoot (S) or dribble the ball forward (D); from position 4 it can only shoot. If it shoots, it either scores a goal (state G) or misses (state M). If it dribbles, it either advances a square or loses the ball, ending up in M.

In this MDP, the states are 1, 2, 3, 4, G and M, where G and M are terminal states. The transition model depends on the parameter y, which is the probability of dribbling success. Assume a discount of γ = 1.

T(k, S, G) = k/6 and T(k, S, M) = 1 − k/6 for k ∈ {1, 2, 3, 4}
T(k, D, k+1) = y and T(k, D, M) = 1 − y for k ∈ {1, 2, 3}
R(k, S, G) = 1 for k ∈ {1, 2, 3, 4}, and rewards are 0 for all other transitions

(a) What is V^π(1) for the policy π that always shoots?

V^π(1) = T(1, S, G) R(1, S, G) + T(1, S, M) R(1, S, M) = 1/6

(b) What is Q*(3, D) in terms of y?

Q*(3, D) = T(3, D, 4)(R(3, D, 4) + V*(4)) + T(3, D, M) R(3, D, M)
= T(3, D, 4) V*(4)
= T(3, D, 4) Q*(4, S)
= T(3, D, 4)(T(4, S, G) R(4, S, G) + T(4, S, M) R(4, S, M))
= T(3, D, 4) T(4, S, G) R(4, S, G)
= 2y/3

(c) Using y = 3/4, complete the first two iterations of value iteration.

i | V_i(1) | V_i(2) | V_i(3) | V_i(4)
0 |   0    |   0    |   0    |   0
1 |  1/6   |  2/6   |  3/6   |  4/6
2 |  1/4   |  3/8   |  1/2   |  2/3

(d) After how many iterations will value iteration compute the optimal values for all states?

After 3 iterations, the values will have converged when y = 3/4. Above, only V_2(1) has not yet converged. We note that for y > 3/4, a fourth iteration would be required because a fast break has up to four transitions.

(e) For what range of values of y is Q*(3, S) ≥ Q*(3, D)?

Q*(3, S) ≥ Q*(3, D)
T(3, S, G) ≥ T(3, D, 4) T(4, S, G)
1/2 ≥ (2/3) y
0 ≤ y ≤ 3/4
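The table in (c) can be recomputed with a few lines of value iteration specialized to this MDP:

```python
# Value iteration for the robot-soccer MDP (gamma = 1). Shooting from k
# yields expected reward k/6; dribbling from k < 4 yields y * V(k+1).
def value_iteration(y, iters):
    V = {k: 0.0 for k in (1, 2, 3, 4)}
    for _ in range(iters):
        new = {}
        for k in (1, 2, 3, 4):
            shoot = k / 6.0                   # T(k,S,G) * R(k,S,G)
            if k < 4:
                new[k] = max(shoot, y * V[k + 1])
            else:
                new[k] = shoot                # from 4 the robot can only shoot
        V = new
    return V

V2 = value_iteration(y=0.75, iters=2)
# Matches the table in part (c): V2 = {1: 1/4, 2: 3/8, 3: 1/2, 4: 2/3}.
```
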
The dribble success probability y in fact depends on the presence or absence of a defending robot, D. A has no way of detecting whether D is present, but does know some statistical properties of its environment. D is present 2/3 of the time. When D is absent, y = 3/4. When D is present, y = 1/4.

(f) What is the posterior probability that D is present, given that A dribbles twice successfully from 1 to 3, then shoots from state 3 and scores?

We can use Bayes' rule, where D is a random variable denoting the presence of D, and e is the evidence that A dribbled twice and scored.

P(d | e) = P(e | d) P(d) / P(e)
P(e) = P(e | d) P(d) + P(e | ¬d) P(¬d)
P(e | d) = (1/4)^2 (1/2) = 1/32
P(e | ¬d) = (3/4)^2 (1/2) = 9/32
P(e) = (1/32)(2/3) + (9/32)(1/3) = 2/96 + 9/96 = 11/96
P(d | e) = (2/96) / (11/96) = 2/11

(g) What transition model should A use in order to correctly compute its maximum expected reward when it doesn't know whether or not D is present?

To maximize expected total reward, the agent should model the situation as accurately as possible. We accepted two answers for T(k, D, k+1). One answer is to claim that the agent should use its marginal belief that each dribble will be successful, summing over the two possibilities of D's presence or absence. In this case, T(k, D, k+1) = (2/3)(1/4) + (1/3)(3/4) = 5/12, and T(k, D, M) = 7/12.

Another acceptable answer would be to update the beliefs of the agent after every successful dribble. Hence, while T(1, D, 2) = 5/12 as above, reaching state 2 implies a successful previous dribble, and so another successful dribble is more likely. In particular, let x1, x2 and x3 indicate successful first, second and third dribbles, while D is the presence of the defender. Since x1 ⊥ x2 | D, we have P(x2 | x1) = Σ_d P(d | x1) P(x2 | d). Computing similarly to part (f), P(d | x1) = 2/5, so T(2, D, 3) = (2/5)(1/4) + (3/5)(3/4) = 11/20. We can compute T(3, D, 4) similarly. x3 is conditionally independent of both x1 and x2 given D, so P(x3 | x1, x2) = Σ_d P(d | x1, x2) P(x3 | d). Using our result from (f), P(d | x1, x2) = 2/11, and so T(3, D, 4) = (2/11)(1/4) + (9/11)(3/4) = 29/44. Since a dribble either succeeds or fails, T(k, D, M) = 1 − T(k, D, k+1) for all k.
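The arithmetic in (f) is easy to check numerically, using the parameters above (defender present 2/3 of the time, y = 3/4 when absent and y = 1/4 when present):

```python
# Posterior that the defender is present given two successful dribbles
# followed by a made shot from state 3 (Bayes' rule).
p_d = 2 / 3                            # prior: defender present
y_present, y_absent = 1 / 4, 3 / 4
shot_from_3 = 3 / 6                    # T(3, S, G)

like_present = y_present ** 2 * shot_from_3     # P(e | d)  = 1/32
like_absent = y_absent ** 2 * shot_from_3       # P(e | ~d) = 9/32
p_e = like_present * p_d + like_absent * (1 - p_d)   # P(e) = 11/96
posterior = like_present * p_d / p_e                 # P(d | e) = 2/11
```

Note that the shot factor T(3, S, G) multiplies both likelihoods equally, so it cancels in the posterior; this is why P(d | x1, x2) in part (g) comes out to the same 2/11.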
Shooting probabilities are unchanged by the presence of the defender. The two accepted transition models are:

Marginal model:
T(k, S, G) = k/6 and T(k, S, M) = 1 − k/6 for k ∈ {1, 2, 3, 4}
T(k, D, k+1) = 5/12 and T(k, D, M) = 7/12 for k ∈ {1, 2, 3}

Updating model:
T(k, S, G) = k/6 and T(k, S, M) = 1 − k/6 for k ∈ {1, 2, 3, 4}
T(1, D, 2) = 5/12, T(2, D, 3) = 11/20, T(3, D, 4) = 29/44
T(1, D, M) = 7/12, T(2, D, M) = 9/20, T(3, D, M) = 15/44
(h) What is the optimal policy π* when A doesn't know whether or not D is present?

Using T(k, D, k+1) = 5/12, we find that π* = {1: S, 2: S, 3: S, 4: S}. We showed in part (e) that for any y < 3/4, shooting is preferable to dribbling from state 3, so V(3) = Q(3, S). Starting from V(4) = Q(4, S) = 2/3 with π(4) = S, we can perform similar computations for states 3, 2 and 1:

Q(3, S) = 1/2, Q(3, D) = (5/12)(2/3) = 5/18, so V(3) = max(Q(3, S), Q(3, D)) = 1/2 and π(3) = S.
Q(2, S) = 1/3, Q(2, D) = (5/12)(1/2) = 5/24, so V(2) = 1/3 and π(2) = S.
Q(1, S) = 1/6, Q(1, D) = (5/12)(1/3) = 5/36, so V(1) = 1/6 and π(1) = S.

Under the second answer for part (g), similar computations give π* = {1: S, 2: S, 3: S, 4: S}:

Q(3, S) = 1/2, Q(3, D) = (29/44)(2/3) = 29/66, so V(3) = 1/2 and π(3) = S.
Q(2, S) = 1/3, Q(2, D) = (11/20)(1/2) = 11/40, so V(2) = 1/3 and π(2) = S.
Q(1, S) = 1/6, Q(1, D) = (5/12)(1/3) = 5/36, so V(1) = 1/6 and π(1) = S.
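The backward pass in (h) can be written out directly under the marginal model:

```python
# Compute V* and pi* for the robot-soccer MDP with T(k, D, k+1) = 5/12.
y = 5 / 12
V = {5: 0.0}                         # dummy "state 5" past the last square
V[4] = 4 / 6                         # from 4 the robot can only shoot
pi = {4: "S"}
for k in (3, 2, 1):
    q_shoot, q_dribble = k / 6, y * V[k + 1]
    V[k] = max(q_shoot, q_dribble)
    pi[k] = "S" if q_shoot >= q_dribble else "D"

# pi == {4: 'S', 3: 'S', 2: 'S', 1: 'S'};  V(1) = 1/6, V(2) = 1/3, V(3) = 1/2.
```
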
6. (18 points) Bayes Nets: The Mind of Pacman

Pacman doesn't just eat dots. He also enjoys pears, apples, carrots, nuts, eggs and toast. Each morning, he chooses to eat some subset of these foods. His preferences are modeled by a Bayes net with this structure: A → P, A → C, P → N, C → N, C → T, N → E and T → E.

a) Factor the probability that Pacman chooses to eat only apples and nuts in terms of conditional probabilities from this Bayes net.

P(¬p, a, ¬c, n, ¬e, ¬t) = P(a) P(¬p | a) P(¬c | a) P(n | ¬p, ¬c) P(¬t | ¬c) P(¬e | n, ¬t)

b) For each of the following properties, circle whether they are true, false or unknown of the distribution P(P, A, C, N, E, T) for this Bayes net. Without the conditional probability tables, we do not know if any two variables are dependent. From the network topology, we can only conclude true or unknown for the questions below.

N ⊥ T: unknown. T ← C → N is an active path.
P ⊥ E | N: unknown. P ← A → C → T → E is an active path.
P ⊥ N | A, C: unknown. P and N are directly connected.
A ⊥ E | C, N: true. All three paths from E to A are inactive.

c) If the arrow from T to E were reversed, list two conditional independence properties that would be true under the new graph that were not guaranteed under the original graph.

The triple C → T ← E is no longer active when T is unobserved, therefore P ⊥ E | N, and A ⊥ E | N, and C ⊥ E | N.
The triple N → E → T is no longer active when E is known, therefore T ⊥ N | C, E, and T ⊥ P | C, E, and T ⊥ A | C, E.
There are further independence properties based on these six that condition on more variables, such as A ⊥ E | N, P.

d) You discover that P(a | c, t) > P(a | c). Suggest how you might change the network in response.

This inequality implies that A ⊥̸ T | C by definition, but in the network above, A and T are conditionally independent given C. So, the network must be changed to allow for this dependence. Adding an arc from A to T would fix the problem. Note: adding an arc from T to A creates a cycle, which is not allowed in a Bayes net.
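The factorization in (a) can be demonstrated concretely. The CPT numbers below are made up purely to illustrate the structure of the product; only the factorization itself comes from the network.

```python
# Hypothetical CPTs for the network A->P, A->C, P->N, C->N, C->T, N->E, T->E.
cpt = {
    "A": 0.6,                                  # P(a)
    "P|A": {True: 0.7, False: 0.2},            # P(p | A)
    "C|A": {True: 0.5, False: 0.1},            # P(c | A)
    "N|PC": {(False, False): 0.4, (False, True): 0.6,
             (True, False): 0.5, (True, True): 0.9},   # P(n | P, C)
    "T|C": {True: 0.8, False: 0.3},            # P(t | C)
    "E|NT": {(False, False): 0.1, (False, True): 0.5,
             (True, False): 0.2, (True, True): 0.7},   # P(e | N, T)
}

def p_only_apples_and_nuts(cpt):
    """P(~p, a, ~c, n, ~e, ~t) via the Bayes-net factorization of part (a)."""
    a, p, c, n, t = True, False, False, True, False
    return (cpt["A"]
            * (1 - cpt["P|A"][a])              # P(~p | a)
            * (1 - cpt["C|A"][a])              # P(~c | a)
            * cpt["N|PC"][(p, c)]              # P(n | ~p, ~c)
            * (1 - cpt["T|C"][c])              # P(~t | ~c)
            * (1 - cpt["E|NT"][(n, t)]))       # P(~e | n, ~t)
```
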
Other acceptable solutions exist, like reversing the direction of the arc between C and T.
e) You now want to compute the distribution P(C) using variable elimination. List the factors that remain before and after eliminating the variable N.

Before: The initial factors are just the conditional probability tables of the Bayes net: P(A), P(P | A), P(C | A), P(T | C), P(N | P, C), P(E | T, N).

After: First, all factors that include the variable N are joined together, yielding P(E, N | P, C, T). Next, the variable N is summed out of this new factor, yielding P(E | P, C, T). The remaining factors include this new factor and the unused original factors: P(A), P(P | A), P(C | A), P(T | C), P(E | P, C, T). Referring to the final factor as m(E, P, C, T) (like in the textbook) was also accepted.

f) Pacman's new diet allows only fruit (P and A) to be eaten, but Pacman only follows the diet occasionally. Add the new variable D (for whether he follows the diet) to the network below by adding arcs. Briefly justify your answer.

The best answer is the network on the left [D with an arc to every food variable], which allows us to express changes to all food preferences based on the diet. Other answers were accepted with appropriate justification, such as the network on the right [D with arcs only to C, N, E and T], which specifies that the diet specifically disallows C, N, E and T. Connecting D only to P and/or A is problematic: for instance, observing P and A would make D independent of C, but the diet should still affect whether carrots (C) are allowed even when A and P are known.

g) Given the Bayes net and the factors below, fill in the table for P(D | ¬a), or state that there is not enough information.

From the tables P(D) and P(A | D), the entries of the factor P(D | A = ¬a) can be computed from Bayes' rule (which always holds): no additional information about the properties of the distributions is required. The trick is to observe that while P(¬a) is not explicitly given, it can be computed from P(D) and P(A | D) using the product rule: for all a, d: P(a, d) = P(d) P(a | d). Computing P(¬a) in this way is equivalent to applying the normalization trick.
D | P(D)
d | 0.4
¬d | 0.6

D, A | P(A | D)
d, a | 0.8
d, ¬a | 0.2
¬d, a | 0.5
¬d, ¬a | 0.5

D | P(D | ¬a)
d | P(¬a | d) P(d) / (P(¬a | d) P(d) + P(¬a | ¬d) P(¬d)) = 0.08 / 0.38 = 4/19
¬d | P(¬a | ¬d) P(¬d) / (P(¬a | d) P(d) + P(¬a | ¬d) P(¬d)) = 0.30 / 0.38 = 15/19
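Part (g)'s table reduces to a two-line Bayes' rule computation with normalization:

```python
# P(D | ~a) from P(D) and P(A | D) via Bayes' rule and normalization.
p_d = {True: 0.4, False: 0.6}                  # P(D)
p_a_given_d = {True: 0.8, False: 0.5}          # P(a | D)

unnorm = {d: (1 - p_a_given_d[d]) * p_d[d] for d in (True, False)}
z = sum(unnorm.values())                       # P(~a), via the product rule
posterior = {d: unnorm[d] / z for d in unnorm}
# posterior[True] = 0.08/0.38 = 4/19;  posterior[False] = 0.30/0.38 = 15/19.
```
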
CS440/ECE448 Section Q Fall 2017 Final Review Be able to define the following terms and answer basic questions about them: Probability o Random variables, axioms of probability o Joint, marginal, conditional
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More informationCS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability
CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)
More informationBayes Nets III: Inference
1 Hal Daumé III (me@hal3.name) Bayes Nets III: Inference Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 10 Apr 2012 Many slides courtesy
More informationCS 570: Machine Learning Seminar. Fall 2016
CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or
More informationFinal Examination CS 540-2: Introduction to Artificial Intelligence
Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11
More informationCSE 546 Final Exam, Autumn 2013
CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,
More informationIntroduction to Reinforcement Learning
CSCI-699: Advanced Topics in Deep Learning 01/16/2019 Nitin Kamra Spring 2019 Introduction to Reinforcement Learning 1 What is Reinforcement Learning? So far we have seen unsupervised and supervised learning.
More informationQ-Learning in Continuous State Action Spaces
Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental
More informationAlgorithms for Playing and Solving games*
Algorithms for Playing and Solving games* Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 412-268-7599 * Two Player Zero-sum Discrete
More informationCS181 Midterm 2 Practice Solutions
CS181 Midterm 2 Practice Solutions 1. Convergence of -Means Consider Lloyd s algorithm for finding a -Means clustering of N data, i.e., minimizing the distortion measure objective function J({r n } N n=1,
More informationIntroduction to Arti Intelligence
Introduction to Arti Intelligence cial Lecture 4: Constraint satisfaction problems 1 / 48 Constraint satisfaction problems: Today Exploiting the representation of a state to accelerate search. Backtracking.
More informationCS 188: Artificial Intelligence
CS 188: Artificial Intelligence Adversarial Search II Instructor: Anca Dragan University of California, Berkeley [These slides adapted from Dan Klein and Pieter Abbeel] Minimax Example 3 12 8 2 4 6 14
More informationDiscrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 25. Minesweeper. Optimal play in Minesweeper
CS 70 Discrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 25 Minesweeper Our final application of probability is to Minesweeper. We begin by discussing how to play the game optimally; this is probably
More informationCSE 150. Assignment 6 Summer Maximum likelihood estimation. Out: Thu Jul 14 Due: Tue Jul 19
SE 150. Assignment 6 Summer 2016 Out: Thu Jul 14 ue: Tue Jul 19 6.1 Maximum likelihood estimation A (a) omplete data onsider a complete data set of i.i.d. examples {a t, b t, c t, d t } T t=1 drawn from
More informationLearning in State-Space Reinforcement Learning CIS 32
Learning in State-Space Reinforcement Learning CIS 32 Functionalia Syllabus Updated: MIDTERM and REVIEW moved up one day. MIDTERM: Everything through Evolutionary Agents. HW 2 Out - DUE Sunday before the
More informationLearning in Zero-Sum Team Markov Games using Factored Value Functions
Learning in Zero-Sum Team Markov Games using Factored Value Functions Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 27708 mgl@cs.duke.edu Ronald Parr Department of Computer
More informationCITS4211 Mid-semester test 2011
CITS4211 Mid-semester test 2011 Fifty minutes, answer all four questions, total marks 60 Question 1. (12 marks) Briefly describe the principles, operation, and performance issues of iterative deepening.
More informationDecision making, Markov decision processes
Decision making, Markov decision processes Solved tasks Collected by: Jiří Kléma, klema@fel.cvut.cz Spring 2017 The main goal: The text presents solved tasks to support labs in the A4B33ZUI course. 1 Simple
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More information4. What is the probability that the two values differ by 4 or more in absolute value? There are only six
1. Short Questions: 2/2/2/2/2 Provide a clear and concise justification of your answer. In this problem, you roll two balanced six-sided dice. Hint: Draw a picture. 1. What is the probability that the
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationQ1. [24 pts] The OMNIBUS
CS 188 Spring 2011 Introduction to Artificial Intelligence Final Exam Solutions Q1. [24 pts] The OMNIBUS Each question is worth 1 point. Leaving a question blank is worth 0 points. Answering a multiple
More informationMinimax strategies, alpha beta pruning. Lirong Xia
Minimax strategies, alpha beta pruning Lirong Xia Reminder ØProject 1 due tonight Makes sure you DO NOT SEE ERROR: Summation of parsed points does not match ØProject 2 due in two weeks 2 How to find good
More informationSummary. Agenda. Games. Intelligence opponents in game. Expected Value Expected Max Algorithm Minimax Algorithm Alpha-beta Pruning Simultaneous Game
Summary rtificial Intelligence and its applications Lecture 4 Game Playing Search onstraint Satisfaction Problems From start state to goal state onsider constraints Professor Daniel Yeung danyeung@ieee.org
More informationMidterm II. Introduction to Artificial Intelligence. CS 188 Fall You have approximately 3 hours.
CS 188 Fall 2012 Introduction to Artificial Intelligence Midterm II You have approximately 3 hours. The exam is closed book, closed notes except a one-page crib sheet. Please use non-programmable calculators
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationMachine Learning, Midterm Exam: Spring 2009 SOLUTION
10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of
More informationOutline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012
CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline
More informationAnnouncements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic
CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return
More informationMidterm II. Introduction to Artificial Intelligence. CS 188 Fall ˆ You have approximately 3 hours.
CS 188 Fall 2012 Introduction to Artificial Intelligence Midterm II ˆ You have approximately 3 hours. ˆ The exam is closed book, closed notes except a one-page crib sheet. ˆ Please use non-programmable
More informationFinal Exam. CS 188 Fall Introduction to Artificial Intelligence
CS 188 Fall 2018 Introduction to rtificial Intelligence Final Exam You have 180 minutes. The time will be projected at the front of the room. You may not leave during the last 10 minutes of the exam. Do
More informationMachine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.
10-601 Machine Learning, Midterm Exam: Spring 2008 Please put your name on this cover sheet If you need more room to work out your answer to a question, use the back of the page and clearly mark on the
More informationReinforcement Learning Wrap-up
Reinforcement Learning Wrap-up Slides courtesy of Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationSection Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018
Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections
More informationFinal. Introduction to Artificial Intelligence. CS 188 Summer 2014
S 188 Summer 2014 Introduction to rtificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed notes except your two-page crib sheet. Mark your answers ON
More informationCS 188 Fall Introduction to Artificial Intelligence Midterm 2
CS 188 Fall 2013 Introduction to rtificial Intelligence Midterm 2 ˆ You have approximately 2 hours and 50 minutes. ˆ The exam is closed book, closed notes except your one-page crib sheet. ˆ Please use
More informationThis question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.
This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you
More informationPrioritized Sweeping Converges to the Optimal Value Function
Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science
More informationPartially Observable Markov Decision Processes (POMDPs)
Partially Observable Markov Decision Processes (POMDPs) Geoff Hollinger Sequential Decision Making in Robotics Spring, 2011 *Some media from Reid Simmons, Trey Smith, Tony Cassandra, Michael Littman, and
More informationTEMPERATURE THEORY AND THE THERMOSTATIC STRATEGY
TEMPERATURE THEORY AND THE THERMOSTATIC STRATEGY KAREN YE Abstract. In this paper, we differentiate between cold games, which are easier to analyze and play, and hot games, much more difficult in terms
More informationAn Algorithm better than AO*?
An Algorithm better than? Blai Bonet Universidad Simón Boĺıvar Caracas, Venezuela Héctor Geffner ICREA and Universitat Pompeu Fabra Barcelona, Spain 7/2005 An Algorithm Better than? B. Bonet and H. Geffner;
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationExamination Artificial Intelligence Module Intelligent Interaction Design December 2014
Examination Artificial Intelligence Module Intelligent Interaction Design December 2014 Introduction This exam is closed book, you may only use a simple calculator (addition, substraction, multiplication
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationAnnouncements. CS 188: Artificial Intelligence Fall Adversarial Games. Computing Minimax Values. Evaluation Functions. Recap: Resource Limits
CS 188: Artificial Intelligence Fall 2009 Lecture 7: Expectimax Search 9/17/2008 Announcements Written 1: Search and CSPs is up Project 2: Multi-agent Search is up Want a partner? Come to the front after
More informationFinal Exam, Fall 2002
15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationCS 188: Artificial Intelligence Fall Announcements
CS 188: Artificial Intelligence Fall 2009 Lecture 7: Expectimax Search 9/17/2008 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 1 Announcements Written
More informationCS 662 Sample Midterm
Name: 1 True/False, plus corrections CS 662 Sample Midterm 35 pts, 5 pts each Each of the following statements is either true or false. If it is true, mark it true. If it is false, correct the statement
More informationComplexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Christos Dimitrakakis Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
More informationIntroduction to Fall 2008 Artificial Intelligence Final Exam
CS 188 Introduction to Fall 2008 Artificial Intelligence Final Exam INSTRUCTIONS You have 180 minutes. 100 points total. Don t panic! The exam is closed book, closed notes except a two-page crib sheet,
More information