CS540 ANSWER SHEET

Name:
Email:

 1.    2.    3.    4.    5.    6.    7.    8.    9.   10.
11.   12.   13.   14.   15.   16.   17.   18.   19.   20.


Final Examination
CS540-1: Introduction to Artificial Intelligence
Fall 2016

20 questions, 5 points each.

INSTRUCTIONS: Choose ONE answer per question. WRITE YOUR ANSWERS ON THE ANSWER SHEET. WE WILL NOT GRADE ANSWERS ON OTHER PAGES. ON THE ANSWER SHEET, WRITE DOWN THE ANSWERS ONLY; DO NOT INCLUDE INTERMEDIATE STEPS OR DERIVATIONS. BE SURE TO INCLUDE YOUR NAME AND EMAIL ON THE ANSWER SHEET, TOO.

1. Consider uninformed search over a tree, where the root is at level 0, each internal node has k children, and all leaves are at level D. There is a single goal state somewhere at level d, where 0 ≤ d ≤ D. When is DFS definitely going to goal-check more nodes than BFS?
(A) always (B) never (C) when k > d (D) when k > D (E) none of the above

Answer: due to the ambiguity of natural language, we will accept both B and E: it depends on where the goal is on level d.

2. Recall that in uniform-cost search, each node has a path cost from the initial node (the sum of edge costs along the path), and the search expands the least path-cost node first. Consider a search graph with n > 1 nodes (n is an even number): 1, 2, ..., n. For all 1 ≤ i < j ≤ n there is a directed edge from node i to node j with edge cost j. The initial node is 1, and the goal node is n. How many goal-checks will uniform-cost search perform?
(A) 2 (B) n/2 (C) n (D) n(n+1)/2 (E) none of the above

Answer: C. All nodes will be expanded.

3. X, Y are both Boolean random variables taking value 0 or 1. If P(X=1) = 1/2, P(Y=1) = 1/3, and P(X=1 | Y=0) = 1/4, what is P(X=1 | Y=1)?
(A) 1/6 (B) 1/4 (C) 3/4 (D) 1 (E) undetermined, because P(X=1 | Y=0) and P(X=1 | Y=1) do not sum to one

Answer: D. The joint table is

           X=0    X=1    total
    Y=0    1/2    1/6     2/3
    Y=1     0     1/3     1/3
    total  1/2    1/2

Equivalently, by Bayes' rule:

    P(X=1 | Y=1) = P(Y=1 | X=1) P(X=1) / P(Y=1)
                 = (1 - P(Y=0 | X=1)) P(X=1) / P(Y=1)
                 = (1 - P(X=1 | Y=0) P(Y=0) / P(X=1)) P(X=1) / P(Y=1)
                 = (1 - (1/4)(2/3)(2)) * (1/2) * 3
                 = (1 - 1/3) * 3/2 = (2/3)(3/2) = 1.
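As a sanity check on question 3, here is a minimal Python sketch (not part of the exam) that rebuilds the joint distribution from the three given probabilities and reads off P(X=1 | Y=1):

    # Reconstruct the joint table of question 3 and verify P(X=1 | Y=1) = 1.
    p_x1 = 1 / 2             # P(X=1)
    p_y1 = 1 / 3             # P(Y=1)
    p_x1_given_y0 = 1 / 4    # P(X=1 | Y=0)

    p_y0 = 1 - p_y1
    p_x1_y0 = p_x1_given_y0 * p_y0   # P(X=1, Y=0) = 1/6
    p_x1_y1 = p_x1 - p_x1_y0         # P(X=1, Y=1) = 1/3

    print(p_x1_y1 / p_y1)            # P(X=1 | Y=1) -> 1.0, answer D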

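Going back to question 2, the claim that uniform-cost search goal-checks every node can also be simulated. This is a small Python sketch of my own (the graph encoding and function name are not from the exam), assuming the usual convention that the goal test is applied when a node is popped for expansion:

    import heapq

    def ucs_goal_checks(n):
        """Uniform-cost search on the graph of question 2: nodes 1..n,
        a directed edge i -> j of cost j for every i < j, start 1, goal n.
        Returns the number of nodes that get goal-checked (expanded)."""
        frontier = [(0, 1)]          # (path cost, node)
        expanded = set()
        checks = 0
        while frontier:
            g, node = heapq.heappop(frontier)
            if node in expanded:
                continue
            expanded.add(node)
            checks += 1              # goal-check at expansion time
            if node == n:
                return checks
            for j in range(node + 1, n + 1):
                heapq.heappush(frontier, (g + j, j))
        return checks

    print(ucs_goal_checks(6))        # -> 6, i.e. all n nodes (answer C)
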
4. Mary rolled a loaded six-sided die many times and recorded the outcomes: face 1 came up zero times, 2 one time, 3 two times, 4 three times, 5 four times, and 6 five times. Mary estimates the probability of each face with add-1 smoothing. What is her estimated probability for face 3?
(A) 2/15 (B) 1/5 (C) 3/16 (D) 1/7 (E) none of the above

Answer: D. (2 + 1) / (15 + 6) = 3/21 = 1/7.

5. Consider a state space where each state is represented by a triple of integers (x1, x2, x3), with x1, x2, x3 ∈ {0, 1, 2, ..., 10}. The neighbors of a state (x1, x2, x3) are constructed as follows: pick an i from 1, 2, 3, then change xi by one (add or subtract, as long as the resulting number is within the range). For example, (x1, x2 - 1, x3) is a neighbor if x2 - 1 ≥ 0. Therefore, most states will have 6 neighbors. The score of a state is defined as f(x1, x2, x3) = x1 - 2x2 - 4x3. Starting from the state (5, 5, 5), perform simple hill climbing to maximize the score. Which state will hill climbing end up with?
(A) (5,5,5) (B) (10,0,0) (C) (5,4,5) (D) (5,0,5) (E) none of the above

Answer: B. This is the global maximum, and the 3D grid allows any starting point to greedily reach it.

6. Consider a board with 4 positions in a row: o _ _ G. There is a piece o at position 1, and the goal is at position 4 (indicated by G). Two players play in turns: A goes first, B goes second. Each player must move the piece one position to the left or right on the board. The player who moves the piece to G wins. Who will win, and after how many total moves from both players?
(A) A, 3 (B) A, 5 (C) B, 4 (D) B, 7 (E) none of the above

Answer: E. The game is infinite: A must move the piece to position 2, and B can always move it back to position 1 (moving it to position 3 would let A win on the next move).

7. Perform iterated elimination of strictly dominated strategies on the game below. Player A's strategies are the rows; the two numbers in each cell are (A, B)'s payoffs, respectively. Recall that each player wants to maximize their own payoff.

    0,2   3,1   2,3
    1,4   2,1   4,1
    2,1   4,4   3,2

What is left over?
(A) (2,3) (B) (3,1) (C) (4,4) (D) multiple cells (E) none of the above

Answer: D. After removing the first row (strictly dominated by the third row), there is no further strict domination.
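The elimination in question 7 can be replayed mechanically. Here is a short Python sketch (my own encoding of the payoff table, not course code) that repeatedly removes strictly dominated rows for A and columns for B:

    # Payoff table of question 7: payoffs[r][c] = (A's payoff, B's payoff).
    payoffs = [[(0, 2), (3, 1), (2, 3)],
               [(1, 4), (2, 1), (4, 1)],
               [(2, 1), (4, 4), (3, 2)]]

    rows, cols = list(range(3)), list(range(3))

    def strictly_dominated(strategies, others, value):
        """Strategies s for which some s2 scores strictly higher than s
        against every remaining strategy of the opponent."""
        dead = set()
        for s in strategies:
            for s2 in strategies:
                if s2 != s and all(value(s2, o) > value(s, o) for o in others):
                    dead.add(s)
                    break
        return dead

    changed = True
    while changed:
        dead_rows = strictly_dominated(rows, cols, lambda r, c: payoffs[r][c][0])
        dead_cols = strictly_dominated(cols, rows, lambda c, r: payoffs[r][c][1])
        changed = bool(dead_rows or dead_cols)
        rows = [r for r in rows if r not in dead_rows]
        cols = [c for c in cols if c not in dead_cols]

    print(len(rows) * len(cols))   # -> 6 cells remain, so answer D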

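Returning to question 5, the hill-climbing claim is easy to check by simulation. The sketch below uses steepest-ascent hill climbing over the grid; for this particular score, first-improvement ("simple") hill climbing ends in the same state:

    # Hill climbing for question 5: maximize f(x1,x2,x3) = x1 - 2*x2 - 4*x3
    # over the grid {0,...,10}^3, starting from (5, 5, 5).
    def f(s):
        x1, x2, x3 = s
        return x1 - 2 * x2 - 4 * x3

    def neighbors(s):
        for i in range(3):
            for delta in (-1, 1):
                t = list(s)
                t[i] += delta
                if 0 <= t[i] <= 10:
                    yield tuple(t)

    state = (5, 5, 5)
    while True:
        best = max(neighbors(state), key=f)
        if f(best) <= f(state):      # no improving neighbor: stop
            break
        state = best

    print(state)                     # -> (10, 0, 0), answer B
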
8. Consider Euclidean distance in R. There are three clusters: A = (0, 2, 6), B = (3, 9), C = (11). Which two clusters will single linkage and complete linkage merge next, respectively?
(A) AB and AB (B) AB and AC (C) AB and BC (D) AC and AC (E) none of the above

Answer: C. Single linkage: AB = 1, AC = 5, BC = 2, so merge A and B. Complete linkage: AB = 9, AC = 11, BC = 8, so merge B and C.

9. Perform k-means clustering on six points xi = i for i = 1, ..., 6 with two clusters. Initially the cluster centers are at c1 = 1 and c2 = 6. Run k-means until convergence. What is the reduction in distortion?
(A) 2 (B) 4 (C) 6 (D) 8 (E) none of the above

Answer: C. The clustering (1 2 3)(4 5 6) stays the same, but the centers move to 2 and 5. The distortion drops from (0 + 1 + 4) * 2 = 10 to (1 + 0 + 1) * 2 = 4, a reduction of 6.

10. List the English letters from A to Z. Define the distance between two letters in the natural way, that is, d(A,A) = 0, d(A,B) = 1, d(A,C) = 2, and so on. Each letter has a label: vowel or consonant. For our purpose, A, E, I, O, U are vowels and the others are consonants. This is your training data. Now classify each letter using kNN for odd k = 1, 3, 5, 7, .... What is the smallest k for which all letters are classified the same?
(A) 1 (B) 3 (C) 5 (D) 25 (E) none of the above

Answer: B. No two vowels are adjacent (-BCD-FGH-JKLMN-PQRST-VWXYZ, vowels shown as dashes), so at k = 3 every letter's neighborhood already has a consonant majority.

11. Consider a training set with n items. Each item is represented by a d-dimensional feature vector (x1, ..., xd) and a class label y. However, each feature dimension is continuous (i.e., it is a real number). To build a decision tree, one may ask questions of the form "Is x1 ≥ θ?", where θ is a threshold value. Ideally, how many different θ values should we consider for the first dimension x1?
(A) Infinite, because x1 is continuous (B) As many as there are floating-point numbers representable by a computer (C) nd, all possible numbers in the training set (D) One, at the middle of x1's range (E) none of the above

Answer: E. There are at most n distinct values that x1 takes in the training set. These divide the infinitely many θ's into at most n + 1 equivalence classes (each training item answers all questions in an equivalence class in the same way), so we only need to consider at most n + 1 values.
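The distortion arithmetic in question 9 can be reproduced with a tiny k-means run. This Python sketch (mine, not from the exam) assumes squared Euclidean distortion, which is what the numbers above use:

    # k-means for question 9: points 1..6, initial centers 1 and 6.
    points = [1, 2, 3, 4, 5, 6]
    centers = [1.0, 6.0]

    def assign(points, centers):
        return [min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
                for p in points]

    def distortion(points, centers, labels):
        return sum((p - centers[j]) ** 2 for p, j in zip(points, labels))

    labels = assign(points, centers)
    initial = distortion(points, centers, labels)      # 10.0
    while True:
        centers = [sum(p for p, j in zip(points, labels) if j == c) /
                   labels.count(c) for c in range(2)]
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels

    print(initial - distortion(points, centers, labels))   # -> 6.0, answer C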

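Returning to question 10, a brute-force check: classify every letter by the majority label among its k nearest letters (distance = difference in alphabet position) and report the smallest odd k for which all 26 predictions agree. A short sketch of my own:

    # kNN over the alphabet for question 10.
    letters = [chr(ord('A') + i) for i in range(26)]
    label = {c: 'vowel' if c in 'AEIOU' else 'consonant' for c in letters}

    def knn_predict(c, k):
        ranked = sorted(letters, key=lambda d: abs(ord(d) - ord(c)))
        votes = [label[d] for d in ranked[:k]]
        return max(set(votes), key=votes.count)       # majority label

    for k in range(1, 27, 2):
        predictions = {knn_predict(c, k) for c in letters}
        if len(predictions) == 1:
            print(k)                                  # -> 3, answer B
            break
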
12. Consider three Boolean variables where R = (P ∧ ¬Q) ∨ (¬P ∧ Q). Suppose P has probability 1/2 of being true, and independently Q also has probability 1/2 of being true. What is the mutual information I(P; R)?
(A) 0 (B) 1/4 (C) 1/2 (D) 1 (E) none of the above

Answer: A. R = P XOR Q, which is independent of P when Q is an independent fair coin.

13. For simple linear regression with training data (x1 = 5, y1 = -3), (x2 = 1, y2 = -3), what is the coefficient of determination?
(A) 0 (B) 1 (C) -3 (D) undefined (E) none of the above

Answer: D. The least-squares fit is the constant function y = -3 itself, so the coefficient of determination is 0/0.

14. Learning the w's in which of the following functions can be posed as linear regression?
(A) f(x) = w0 + w1 x + w2 x^2 + w3 x^3
(B) f(x) = w0 + w1 sin(x) + w2 cos(x)
(C) f(x) = w0 x^w1 (1 - x)^w2 for x ∈ (0, 1)
(D) all of A, B, C
(E) none of A, B, C

Answer: D. For (C), take the log of both sides so that the model is linear in w1 and w2; a change of variable to log(w0) handles the remaining coefficient.

15. Let f = 1 / (1 + e^(-Σ_{i=1}^d wi xi)) be a sigmoid perceptron with inputs x1 = ... = xd = 0 and weights w1 = ... = wd = 1. There is no constant bias input of one. If the desired output is y = 1/4 and the sigmoid perceptron update rule has a learning rate of α = 1, what will happen after one step of update?
(A) x1, ..., xd will decrease (B) x1, ..., xd will increase (C) w1, ..., wd will decrease (D) w1, ..., wd will increase (E) none of the above

Answer: E. The weights stay the same, because the gradient is zero when all the x's are zero.

16. What is the minimum number of linear threshold units needed to implement the NOR function? Recall P NOR Q = ¬(P ∨ Q) for Boolean inputs P, Q.
(A) 1 (B) 2 (C) 3 (D) 4 (E) none of the above

Answer: A. The decision boundary is linear in the (P, Q) space.

17. You have a joint probability table over k random variables X1, ..., Xk, where each variable takes m possible values: 1, ..., m. To compute P(X1 = m), how many cells in the table do you need to access?
(A) mk (B) m^k (C) k^m (D) m! / (k! (k - m)!) (E) none of the above

Answer: E. m^(k-1).
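For question 17, the count m^(k-1) can be confirmed by brute force: summing a full joint table over everything except X1 = m touches exactly the cells whose first coordinate is m. A tiny sketch; the sizes k and m below are arbitrary example values:

    # Question 17: cells of the joint table touched when computing P(X1 = m).
    from itertools import product

    k, m = 4, 3                      # example sizes, any k and m work
    touched = [cell for cell in product(range(1, m + 1), repeat=k)
               if cell[0] == m]      # P(X1 = m) sums exactly these cells

    print(len(touched), m ** (k - 1))    # -> 27 27, i.e. m^(k-1) cells (answer E)
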

18. Consider a classification problem with 10 classes y {1, 2,..., 10}, and two binary features x 1, x 2 {0, 1}. Suppose p(y = y) = 1/10, p(x 1 = 1 Y = y) = y/10, p(x 2 = 1 Y = y) = y/540. Which class will naive Bayes classifier produce on a test item with (x 1 = 0, x 2 = 1)? (A) 1 (B) 3 (C) 5 (D) 8 (E) 10 C argmax y P (y x 1 = 0, x 2 = 1) (1) = argmax y P (x 1 = 0, x 2 = 1 y)p(y) (2) = argmax y P (x 1 = 0 y)p (x 2 = 1 y)p(y) (3) = argmax y (1 P (x 1 = 1 y))p (x 2 = 1 y)p(y) (4) = argmax y (1 y/10)y/540 (5) = argmax y (10 y)y (6) Now you could enumerate y to find the maximum. Or pretend y is continuous, taking derivative and set to zero: 10-2y=0 leads to y=5. 19. With three Boolean variables P, Q, R, under how many interpretations is (P Q R) (P P ) true? (A) 0 (B) 3 (C) 7 (D) 8 (E) none of the above D. Note P P is valid. 20. (α β) entails which sentence? (A) β α (B) α β (C) α β (D) all of A,B,C (E) none of A,B,C D. The interpretation is when α, β are both false, then all implications are true. 7