
MIDTERM: CS 6375
INSTRUCTOR: VIBHAV GOGATE
October 23, 2013

The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an answer, use an additional sheet (available from the instructor) and staple it to your exam.

NAME:
UTD-ID (if known):

SECTION 1:
SECTION 2:
SECTION 3:
SECTION 4:
SECTION 5:
Out of 90:


SECTION 1: SHORT QUESTIONS (15 points)

1. (3 points) The Naive Bayes classifier uses the maximum a posteriori (MAP) decision rule for classification. True or False. Explain.

2. (6 points) Let θ be the probability that Thumbtack 1 (we will abbreviate it as T1) shows heads and 2θ be the probability that Thumbtack 2 (we will abbreviate it as T2) shows heads. You are given the following dataset (6 examples):

    Thumbtack:  T1     T2     T1     T1     T2     T2
    Outcome:    Tails  Heads  Tails  Tails  Heads  Heads

What is the likelihood of the data given θ? What is the maximum likelihood estimate of θ? (Hint: the derivative of log(x) is 1/x.)
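As a reminder, a likelihood of this form is typically set up by assuming the tosses are independent. This is a generic sketch, not part of the original exam text; the counts h_1, t_1 (heads and tails seen on T1) and h_2, t_2 (heads and tails seen on T2) are notation introduced here:

    L(\theta) = \theta^{h_1} (1 - \theta)^{t_1} \, (2\theta)^{h_2} (1 - 2\theta)^{t_2}

    \log L(\theta) = h_1 \log \theta + t_1 \log(1 - \theta) + h_2 \log(2\theta) + t_2 \log(1 - 2\theta)

Setting the derivative of the log-likelihood to zero then yields the maximum likelihood estimate.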

3. (2 points) If the data is not linearly separable, then the gradient descent algorithm for training a logistic regression classifier will never converge. True or False. Explain your answer.

4. (2 points) If the data is linearly separable, then the 3-nearest neighbors algorithm will always have 100% accuracy on the training set. True or False. Explain your answer.

5. (2 points) The decision tree classifier has 100% accuracy on the training set (namely, the data is noise-free). Will logistic regression have the same accuracy (100%) on the training set?

SECTION 2: Decision Trees (20 points)

1. (4 points) Let x be a vector of n Boolean variables and let k be an integer less than n. Let f_k be a target concept which is a disjunction consisting of k literals. (Let the n variables be denoted by the set {X_1, ..., X_n}. Examples of f_2: X_1 ∨ X_2, X_1 ∨ X_4, etc. Examples of f_3: X_1 ∨ X_2 ∨ X_10, X_1 ∨ X_4 ∨ X_7, etc.) State the size of the smallest possible consistent decision tree (namely, a decision tree that correctly classifies all possible examples) for f_k in terms of n and k, and describe its shape.

We wish to learn a decision tree to help students pick restaurants using three aspects: the price, the location of the restaurant, and the speed of service. The data for training the tree is given below, where the target concept is the column labeled "Like?". [dataset table not reproduced in this transcription]

2. (4 points) What is the entropy of the collection of examples with respect to the target label (Like?)?
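As a reminder, the quantities asked for in questions 2-4 are the usual textbook definitions, stated here in general form rather than taken from the exam. S denotes the set of examples, p_+ and p_- the fractions of positive and negative examples in S, and S_v the subset of S for which attribute A takes value v:

    H(S) = -p_+ \log_2 p_+ \; - \; p_- \log_2 p_-

    Gain(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, H(S_v)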

SECTION 2: Decision Trees (20 points), Continued

I have copied the dataset from the previous page to this page for convenience. [dataset table not reproduced in this transcription]

3. (4 points) Compute the information gain of the attribute Price.

4. (4 points) Compute the information gain of the attribute Fast?

SECTION 2: Decision Trees (20 points), Continued

I have copied the dataset from the previous page to this page for convenience. [dataset table not reproduced in this transcription]

5. (4 points) Choose Price as the root node for the decision tree. With this root node, write down a decision tree that is consistent with the data (namely, a decision tree that correctly classifies all the training examples). You do not need to learn the decision tree; just make sure it is consistent with the data.

SECTION 3: GRADIENT DESCENT (20 points)

Suppose that the CEO of some startup company calls you and tells you that he sells d items on his web site. He has access to ratings given by n users, each of whom has rated a subset of the d items. Assume that each rating is a real number. Your task is to estimate the missing ratings. (Think of the data as an n-by-d matrix in which some entries are missing, and your task is to estimate the missing entries.) Being a good machine learner, you have come up with the following model:

    x_{i,j} = a_i + b_j + u_i v_j

where
    x_{i,j} is the rating of the j-th product by user i
    a_i is the base rating for the user
    b_j is the base rating for each item
    u_i and v_j are coefficients for each user and each item, respectively

Notice that your model has 2(n + d) parameters.

1. (5 points) Set up the machine learning problem of estimating the 2(n + d) parameters as an optimization task of minimizing the squared error. Write down the formula for the squared error and the optimization task.

2. (5 points) Compute the gradient of the error function with respect to a_i, b_j, u_i, and v_j.
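As a reminder of how such an objective is usually written (a sketch only; R, the set of (i, j) pairs for which a rating r_{i,j} was actually observed, is notation introduced here and not part of the exam):

    E = \frac{1}{2} \sum_{(i,j) \in R} \left( a_i + b_j + u_i v_j - r_{i,j} \right)^2

Differentiating E term by term with respect to a_i, b_j, u_i, and v_j gives the gradients asked for in question 2.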

SECTION 3: GRADIENT DESCENT (20 points), Continued

3. (5 points) Give the pseudo-code for batch gradient descent for this problem.

4. (5 points) Would you use batch gradient descent or incremental (stochastic or online) gradient descent for this task? How will the data size, measured in terms of n, d, and the number of missing entries (let us denote these by m), affect your choice?
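A minimal sketch of what batch gradient descent for this model can look like, for reference only. The dictionary-of-observed-ratings representation, the variable names, and the fixed step size and iteration count are assumptions introduced here, not the exam's intended answer:

    import random

    def batch_gradient_descent(ratings, n, d, alpha=0.01, iters=500):
        """ratings: dict mapping (i, j) -> observed rating of item j by user i."""
        # One a_i and u_i per user, one b_j and v_j per item: 2(n + d) parameters.
        a = [0.0] * n
        b = [0.0] * d
        u = [random.uniform(-0.1, 0.1) for _ in range(n)]
        v = [random.uniform(-0.1, 0.1) for _ in range(d)]

        for _ in range(iters):
            # Accumulate the full-batch gradient over all observed entries.
            ga, gb = [0.0] * n, [0.0] * d
            gu, gv = [0.0] * n, [0.0] * d
            for (i, j), r in ratings.items():
                err = a[i] + b[j] + u[i] * v[j] - r   # prediction error on entry (i, j)
                ga[i] += err                          # dE/da_i
                gb[j] += err                          # dE/db_j
                gu[i] += err * v[j]                   # dE/du_i
                gv[j] += err * u[i]                   # dE/dv_j
            # One update of every parameter per pass over the data (batch update).
            for i in range(n):
                a[i] -= alpha * ga[i]
                u[i] -= alpha * gu[i]
            for j in range(d):
                b[j] -= alpha * gb[j]
                v[j] -= alpha * gv[j]

        # Predict every entry, filling in the missing ratings.
        return [[a[i] + b[j] + u[i] * v[j] for j in range(d)] for i in range(n)]

Stochastic (online) gradient descent would instead update the parameters after each observed rating; with n*d - m observed entries, that distinction mostly matters when the dataset is large.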

SECTION 4: NEURAL NETWORKS AND PERCEPTRONS (25 POINTS)

Consider the neural network given below: [network diagram not reproduced in this transcription]

Assume that all internal nodes compute the sigmoid function. Write an explicit expression for how backpropagation (applied to minimize the least-squares error function) changes the values of w_1, w_2, w_3, w_4, and w_5 when the algorithm is given the example x_1 = 0, x_2 = 1, with the desired response y = 0 (notice that x_0 = 1 is the bias term). Assume that the learning rate is α and that the current values of the weights are: w_1 = 3, w_2 = 2, w_3 = 2, w_4 = 3, and w_5 = 2. Let o_j be the output of the hidden and output units, indexed by j.

1. (3 points) Forward propagation. Write equations for o_1, o_2, and o_3 in terms of the given weights and example.

2. (6 points) Backward propagation. Write equations for δ_1, δ_2, and δ_3 in terms of the given weights and example.
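As a reminder, one standard formulation of the backpropagation rules for sigmoid units trained with squared error is given below; the exact indexing of units and weights depends on the network diagram, which is not reproduced here:

    o_j = \sigma(net_j) = \frac{1}{1 + e^{-net_j}}

    \delta_k = o_k (1 - o_k)(y - o_k)                                  (output unit k)

    \delta_h = o_h (1 - o_h) \sum_{k \in downstream(h)} w_{h,k} \delta_k     (hidden unit h)

    w_{i,j} \leftarrow w_{i,j} + \alpha \, \delta_j \, x_{i,j}

where x_{i,j} denotes the input that unit j receives from unit i.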

3. (5 points) Give an explicit expression for the new weights.

4. (5 points) Draw a neural network that represents the following Boolean function:

    (X_1 ∧ ¬X_2) ∨ (¬X_1 ∧ X_2)
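For intuition, one classic way to wire such a function by hand is sketched below, under the assumption that the reconstructed formula above is exclusive-or. Threshold (step) units are used for readability; a sigmoid version would use suitably large weights with the same signs:

    def step(z):
        """Threshold unit: outputs 1 iff its net input is positive."""
        return 1 if z > 0 else 0

    def xor_network(x1, x2):
        # Hidden unit h1 fires for X1 AND (NOT X2); h2 fires for (NOT X1) AND X2.
        h1 = step(x1 - x2 - 0.5)
        h2 = step(x2 - x1 - 0.5)
        # The output unit computes h1 OR h2.
        return step(h1 + h2 - 0.5)

    # All four input combinations; prints [0, 1, 1, 0].
    print([xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])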

SECTION 4: NEURAL NETWORKS AND PERCEPTRONS (25 POINTS), Continued

Consider the neural network given below. [network diagram not reproduced in this transcription] Assume that each unit j is a linear unit whose output is of the form o_j = C \sum_{i=1}^{n} w_i x_i, where the x_i are the inputs to the unit and w_i is the weight on the corresponding edge. For example, o_2 = C(w_1 x_1 + w_2 x_2).

5. (6 points) Can any function that is represented by the above network also be represented by a Perceptron having a single linear unit of the form o = P \sum_i w_i x_i, where the x_i are the inputs, the w_i are the weights attached to the edges, and P is a constant? If your answer is yes, then draw the Perceptron, detailing the weights and the value of P (i.e., express the weights, let us say v_1 and v_2, as well as P of the Perceptron in terms of w_1, w_2, w_3, w_4, w_5, w_6, and C). If your answer is no, explain why not.
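The key identity behind this question is that a cascade of linear units is itself linear. As an illustration only, assuming a wiring in which two hidden units are each fed by x_1 and x_2 and one output unit is fed by the hidden units (a guess, since the diagram is not reproduced):

    o_2 = C(w_1 x_1 + w_2 x_2), \quad o_3 = C(w_3 x_1 + w_4 x_2), \quad o_4 = C(w_5 o_2 + w_6 o_3)

    o_4 = C^2 (w_5 w_1 + w_6 w_3)\, x_1 + C^2 (w_5 w_2 + w_6 w_4)\, x_2

which again has the single-linear-unit form for an appropriate choice of weights and constant.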

SECTION 5: K-NEAREST NEIGHBORS (10 points)

Let us employ the leave-one-out cross-validation approach to choose k in k-nearest neighbors. As the name suggests, it involves using a single example from the training data as the test (or validation) data, and the remaining examples as the training data. This is repeated such that each example in the training data is used once as the test data. Formally, given a classifier C, the approach computes the error of C as follows. Let d be the number of training examples and assume that the examples are indexed from 1 to d.

    Error = 0
    For i = 1 to d:
        Remove the i-th example from the training set and use it as your test set
        Train the classifier C on the new training set
        If the test example is incorrectly classified by C, then Error = Error + 1
        Put back the i-th example in the training set
    Return Error

To select the best classifier among a group of classifiers, you apply the leave-one-out cross-validation approach to all the classifiers and choose the one that has the smallest error.

Dataset:

    X:     -0.1  0.7  1.0  1.6  2.0  2.5  3.2  3.5  4.1  4.9
    Class:  -    +    +    -    +    +    -    -    +    +

1. (4 points) What is the leave-one-out cross-validation error of the 1-nearest neighbor algorithm on the dataset given above (use Euclidean distance)? Explain your answer.

2. (4 points) What is the leave-one-out cross-validation error of the 3-nearest neighbors algorithm on the dataset given above (use Euclidean distance)? Explain your answer.

3. (2 points) Will you choose k = 1 or k = 3 for this dataset?
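A minimal sketch of the leave-one-out procedure described above, applied to a 1-D k-nearest-neighbor classifier. The list literals simply transcribe the dataset; the function names and everything else are an illustrative implementation introduced here, not part of the exam:

    X = [-0.1, 0.7, 1.0, 1.6, 2.0, 2.5, 3.2, 3.5, 4.1, 4.9]
    y = ['-', '+', '+', '-', '+', '+', '-', '-', '+', '+']

    def knn_predict(train_x, train_y, query, k):
        """Majority label among the k training points closest to the query."""
        neighbors = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
        labels = [train_y[i] for i in neighbors]
        return max(set(labels), key=labels.count)

    def loocv_error(xs, ys, k):
        """Count how many examples are misclassified when held out one at a time."""
        error = 0
        for i in range(len(xs)):
            train_x = xs[:i] + xs[i + 1:]     # remove the i-th example
            train_y = ys[:i] + ys[i + 1:]
            if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
                error += 1
        return error

    print(loocv_error(X, y, k=1), loocv_error(X, y, k=3))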