CS 4700: Foundations of Artificial Intelligence Ungraded Homework Solutions

Size: px

Start display at page:

Download "CS 4700: Foundations of Artificial Intelligence Ungraded Homework Solutions"

Madeleine Scott
5 years ago
Views:

1 CS 4700: Foundations of Artificial Intelligence Ungraded Homework Solutions 1. Neural Networks: a. There are 2 2n distinct Boolean functions over n inputs. Thus there are 16 distinct Boolean functions over 2 inputs. How many of these are representable by a perceptron? There are 14. There are four combinations of inputs, {(0,0), (0,1), (1,0), (1,1)}. For each of the four there is a perceptron that labels only that point as 1 and labels the others 0. There is similarly one that labels just that item as 0 and the others 1. This totals 8. Next, there s a perceptron that ignores input 2 and labels as 1 both points for which input 1 equals 1 and labels the other two points as 0, and there is also one that labels those points in reverse. That s 2 more. Finally there s an identical pair of perceptrons that ignore input 1 and does the same as the previous case, only this time using input 2. Finally, there are also the trivial cases of all data points being labeled 1, or all being labeled 0, which are also representable as linear thresholds for which all points are on the same desired side of a linear surface. b. Imagine you have a single-node neural network with two inputs whose activation function is the logistic function, and with only two data points ((0,0),1) and ((1,1),0). If all weights are initially set to 0, the learning rate is set to 1, and you make two passes through the data, give the values of the weights after each update. c. Textbook, problem 18.16a Example w 0 w 1 w 2 ((0,0),1) ((1,1),0) ((0,0),1) ((1,1),0) The points inside the circle are those for which (x 1 a) 2 +(x 2 b) 2 r 2, or equivalently r 2 (x 1 a) 2 (x 2 b) 2 0. If you expand out (x 1 a) 2 and (x 2 b) 2 and reorder the terms this becomes x 1 2 x ax 1 + 2bx 2 + (r 2 a 2 b 2 ) 0. If we let y 1 = x 1 2, y 2 = x 2 2, y 3 = x 1, and y 4 = x 2 this becomes y 1 y 2 + 2ay 3 + 2by 4 + (r 2 a 2 b 2 ) 0. Thus if each two-dimensional data point (x 1, x 2 ) were transformed into a new four-dimensional data point whose values are (x 1 2, x 2 2, x 1, x 2 ) you could classify the points using a perceptron whose weights are ( 1, 1,2a, 2b) and whose weight on x 0 is (r 2 a 2 b 2 ). d. Textbook, problem 18.22a If neuron k is in the output layer, its output is g( j connected to k w j,k a j ) = c j w j,k a j. (Henceforth to simplify notation the summations will always be over all nodes that feed into the node in question. The right-hand side of the preceding equation, for example, sums over all j feeding into k.) Each a j in that equation is itself a j = g( i w i,j x i ) =

2 2. Clustering: c i w i,j x i. Plugging this in for each that each a j in the formula for output-layer neuron k gives c j w j,k (c i w i,j x i ). Rearranging the terms gives us c i c( i w i,j w j,k )x i. This is itself representable with a single neuron using the given activation function, but where the the weights woud be w i = c i w i,j w j,k (where w i,j and w j,k are the weights in the original network). e. Textbook, problem The error function on a set of data D is err(d) = m i=1 (y i a i ) 2. Here all the inputs are identical, so all the a i are identical we ll use the letter a to represent this value. Further, m = 100, where in 80 cases y i = 1 and in 20 cases y i = 0, meaning err(d) = 80(1 a) (0 a) 2. This function is minimized when a = 0.8, and thus backprop will attempt to find weights that yield this output. a. Give an example where k-means clustering forms different clusters depending on which data points are randomly chosen as the initial centroids for each cluster. Your example should not rely on how you break ties if you have multiple points that are the same distant from multiple centroids. (Or, stated differently, you can make this happen even if the distance between every pair of points is distinct.) Consider the following points: A: (0,2) B: (2,0) C: (3,0) D: (4,2) If you do k-means clustering with k=2 and start with B and D as the centroids you form a cluster containing A, B, and C and a cluster containing D. If you start with A and C one contains A and the other B, C, and D. 3. MDPs and reinforcement learning: a. There are N cities along a major highway that forms a big loop. The cities are numbered 1 through N. If you go clockwise from city i you get to city i+1 and if you go counterclockwise you get to city i-1, except that counter-clockwise from city 1 puts you in city N and clockwise from city N puts you in city 1. You start in city 1. Each day you can either - do the STAY action, in which case you ll be in that city the following day - do the CLOCKWISE or COUNTER-CLOCKWISE actions, in which case with probability pi you ll wind up in the next city in the selected direction the next day, but with probability (1-pi) your car won t start in which case the city you re in will be unchanged the next day. All even numbered cities give reward ri = 1, and all odd numbered cities give reward ri=0, if you are in that city at the start of a given day. i. If for all cities pi = 1 and the discount factor γ = 0.5: 1. What is the value of U π (1) for the policy π that says to execute STAY in all states?

3 [ t R(1)] = [½ t 0] = 0 2. What is the value of U π (N) for this policy? If N is odd, 0, the same as for state 1. If N is even, the calculation is as follows. [ t R(N)] = [½ t 1] = 2 3. What is the optimal value U(1) for city 1? What policy does it imply? 1.0. The policy would say to go CLOCKWISE, putting you in state 2, with value U(2)=2. (The optimal policy for all even-numbered states is to STAY, giving the U value of 2, as in part 2 above. You then get U(1) = R(1) + γ P(s 1, CLOCKWISE)U(s ) s [1:N] = U(2) = 1.0 What is the optimal value U(N) for city N? What policy does it imply? If N is even, 2.0. The corresponding policy is to always do the STAY action. If N is odd, 1.0. The policy would say to go COUNTER-CLOCKWISE to get to N-1, which is even and has value 2.0. The rest of the analysis is the same as the case for state 1 that immediately precedes this. ii. If N=3: 1. Show the values of U for the first two iterations of value iteration. Here are the values for the first 10 iterations. (After 16 iterations the values are 1.000, 2.000, and ) Iteration U(1) U(2) U(3)

4 2. Show the values of π for the first two iterations of policy iteration, assuming that for each state π is initially set to CLOCKWISE. Iteration π(1) π(2) π(3) CLOCKWISE CLOCKWISE CLOCKWISE 1 CLOCKWISE STAY COUNTER 2 CLOCKWISE STAY COUNTER The corresponding values for U are: U(1) U(2) U(3) b. Consider a problem with 3 states A, B, and C, where A is the initial state, and where you have two actions, X and Y, that you can do in each state. Imagine we use Q-learning on this problem with N=6, learning rate α = 0.5, and R MAX = 10 (using the f function given in class). Because, for some initial period of time, n<n each time f is called, at the last step of each Q-Learning call many different actions will be tied with value R MAX (where we assume ties are broken at random). Imagine that as a result of executing the Q-Learning procedure you proceed through the following sets of states/actions/rewards: State Reward Action Resulting State A 3 X B B 2 X C C 0 Y B B 2 X C C 1 X B i. What are the final values of Q(s,a) after executing the updates at each step (there should be 5 updates)? State Reward Action Result Q(A,X) Q(A,Y) Q(B,X) Q(B,Y) Q(C,X) Q(C,Y) A 3 X B B 2 X C C 0 Y B B 2 X C C 1 X B

5 ii. Given these Q values, what policy would they yield for the 3 states? π(a) = X π(b)=x π(c)=x 4. Search: a. Agents A and B are locations on an NxN grid. They both know where each of them is on the grid. They each have at most five actions they can take, UP, DOWN, LEFT, RIGHT, and STOP, where each action does what you would expect it to do. If any of these actions would cause them to hit a wall then that action is not permissible in that location. They need to wind up in the same location, they don t care where. They both take actions simultaneously they do not alternate turns. The problem is to come up with a sequence of actions for the two agents that is as short as possible. i. Formally state this as a (single agent) state space search problem. Each state is a four-tuple. Two of the elements are the x and y coordinates for one agent, the other two are the x and y coordinates for the other agent. Actions are all pairs of original actions, one to be applied to one agent, the other to the second agent. The final state has the two agents at the same coordinates. ii. Give a (non-trivial) admissible heuristic for this problem. h((x 1, x 2, y 1, y 2 )) = (distance between x and y)/2. iii. Which of the following are guaranteed to find a shortest solution? 1. Depth-first search 2. Breadth-first search 3. Hillclimbing using an admissible heuristic 4. A* with a heuristic that returns 0 for all states

CS221 Practice Midterm

CS221 Practice Midterm Autumn 2012 1 ther Midterms The following pages are excerpts from similar classes midterms. The content is similar to what we ve been covering this quarter, so that it should be