Qualifier: CS 6375 Machine Learning Spring 2015

The exam is closed book. You are allowed to use two double-sided cheat sheets and a calculator. If you run out of room for an answer, use an additional sheet (available from the instructor) and staple it to your exam. Time: 2 hours 30 minutes.

Question                              Points   Score
Support Vector Machines                  15
Bayesian networks: Representation        10
Bayesian networks: Inference             16
Naive Bayes                              15
Learning Theory                          12
AdaBoost                                 12
Short Questions                          20
Total:                                  100


Question 1: Support Vector Machines (15 points)

(a) (5 points) Consider a 2-D dataset that is separable by an axis-aligned ellipse (namely, there exists an axis-aligned ellipse that has 100% accuracy on the dataset). Show that this dataset is linearly separable in the 6-D feature space (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2), where x_1, x_2 denote the attributes in the 2-D space. Hint: Recall that the equation of an axis-aligned ellipse is c(x_1 - a)^2 + d(x_2 - b)^2 = 1, where a, b, c, d are real numbers.
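One way to start (a sketch of the key algebraic step, not the full required argument, and assuming c, d > 0 so that the interior of the ellipse corresponds to values below 1): expanding the ellipse equation shows that the decision rule is linear in the lifted feature vector, written here as phi(x).

```latex
% Expanding c(x_1 - a)^2 + d(x_2 - b)^2 - 1 gives an expression that is linear
% in \phi(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2):
\[
c(x_1 - a)^2 + d(x_2 - b)^2 - 1
  = (ca^2 + db^2 - 1)\cdot 1 \;-\; 2ac\,x_1 \;-\; 2bd\,x_2 \;+\; c\,x_1^2 \;+\; d\,x_2^2 \;+\; 0\cdot x_1 x_2 ,
\]
% so the weight vector w = (ca^2 + db^2 - 1, -2ac, -2bd, c, d, 0) defines a
% hyperplane w \cdot \phi(x) = 0 that is negative inside the ellipse and
% positive outside it.
```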

Consider the dataset given below (x_1, x_2, x_3 are the attributes and y is the class variable):

x_1  x_2  x_3    y
 1    0    0    +1
 1    2    1    -1
 1    2    1    -1

(b) (10 points) Find the linear SVM classifier for the dataset given above. Do your optimization either using the primal problem or the dual problem. Provide a precise setting of the weights w and the bias term b. What is the size of the margin?
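For checking an answer numerically, one might approximate the hard-margin SVM with a large-C soft-margin solver; a sketch is below. The data rows mirror the table above, and since the signs in this transcription may be imperfect, verify them against the original exam sheet before relying on the output.

```python
# Sketch: numerically checking a hard-margin linear SVM on the tiny dataset
# above. A very large C makes the soft-margin solver behave like hard margin.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 0, 0],
              [1, 2, 1],
              [1, 2, 1]], dtype=float)
y = np.array([+1, -1, -1])

clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, " b =", b)
# Geometric margin (distance from the hyperplane to the closest point):
print("margin =", 1.0 / np.linalg.norm(w))
```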

Question 2: Bayesian networks: Representation (10 points)

(a) (10 points) Consider the formula F = (X_1 ∨ X_2) ∧ (¬X_3 ∨ X_2). Construct a Bayesian network over the Boolean variables X_1, X_2, X_3 that represents the following function. If the assignment (X_1 = x_1, X_2 = x_2, X_3 = x_3) evaluates the formula F to True, then P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = 1/n, where n is the number of solutions of F. Notice that n = 5. If the assignment (X_1 = x_1, X_2 = x_2, X_3 = x_3) evaluates F to False, then P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = 0. Your Bayesian network should have the minimal number of edges to get full credit. Moreover, to get full credit, you should provide the precise structure of the network (namely the directed acyclic graph) and the parameters (the CPTs).
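A quick enumeration (a minimal sketch, assuming F exactly as stated above) confirms that F has five satisfying assignments, matching n = 5:

```python
# Count the satisfying assignments of F = (X1 or X2) and ((not X3) or X2).
from itertools import product

def F(x1, x2, x3):
    return (x1 or x2) and ((not x3) or x2)

solutions = [xs for xs in product([False, True], repeat=3) if F(*xs)]
print(len(solutions))      # expected: 5
for xs in solutions:
    print(xs)              # each solution gets probability 1/5 under the BN
```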

Question 3: Bayesian networks: Inference (16 points)

(a) (3 points) Consider a Bayesian network defined over a set of Boolean variables {X_1, ..., X_n}. Assume that the number of parents of each node is bounded by a constant. Then, the probability P(X_i = True) for any arbitrary variable X_i ∈ {X_1, ..., X_n} can be computed in polynomial time (namely, in O(n^k) time, where k is a constant). True or False. Explain your answer. No credit without correct explanation.

(b) (3 points) Let {X_1, ..., X_r} be a subset of root nodes of a Bayesian network (the nodes having no parents). Then, P(X_1 = x_1, ..., X_r = x_r) = ∏_{i=1}^{r} P(X_i = x_i), where the notation X_j = x_j denotes an assignment of value x_j to the variable X_j, 1 ≤ j ≤ r. True or False. Explain your answer. No credit without correct explanation.

Consider the Bayesian network given below.

[Figure: a Bayesian network over X_1, ..., X_9, drawn as a 3 x 3 grid with columns (X_1, X_2, X_3), (X_4, X_5, X_6) and (X_7, X_8, X_9); the edges are as shown in the original figure.]

For the following two questions, assume that all variables in the network are binary. Namely, they take values from the set {True, False}. Further assume that all conditional probability tables are such that P(X_i = True | parents(X_i)) = 0.8 for 1 ≤ i ≤ 9, where parents(X_i) is any truth assignment to all parents of X_i (note that for the root nodes, the parent set is empty and therefore P(X_1 = True) = P(X_4 = True) = P(X_7 = True) = 0.8).

(c) (5 points) What is the probability that X_1 is True given that X_3 and X_6 are True? Explain your answer.

(d) (5 points) What is the probability that X_3 is True given that X_4 and X_6 are True? Explain your answer.
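The conditional queries in (c) and (d) can be checked by brute-force enumeration over the 2^9 joint assignments. The sketch below only shows the mechanics: the parents dictionary is a placeholder (the actual edge set must be read off the figure), and every CPT entry gives probability 0.8 to True, as specified above.

```python
# Sketch: brute-force inference in a small all-binary Bayesian network whose
# CPTs all satisfy P(Xi = True | parents) = 0.8, as stated in the question.
# NOTE: `parents` below is a placeholder structure (three chains); replace it
# with the edge set read off the exam's figure.
from itertools import product

parents = {1: [], 2: [1], 3: [2],
           4: [], 5: [4], 6: [5],
           7: [], 8: [7], 9: [8]}

def cpt(value, parent_values):
    # Every CPT entry is 0.8 for True, regardless of the parent assignment.
    return 0.8 if value else 0.2

def joint(assign):
    """P(x1,...,x9) = prod_i P(xi | parents(xi))."""
    p = 1.0
    for i in range(1, 10):
        p *= cpt(assign[i], [assign[j] for j in parents[i]])
    return p

def conditional(query, evidence):
    """P(query | evidence); both are dicts mapping variable index -> bool."""
    num = den = 0.0
    for values in product([False, True], repeat=9):
        assign = dict(zip(range(1, 10), values))
        if any(assign[v] != b for v, b in evidence.items()):
            continue
        p = joint(assign)
        den += p
        if all(assign[v] == b for v, b in query.items()):
            num += p
    return num / den

# Part (c): P(X1 = True | X3 = True, X6 = True)
print(conditional({1: True}, {3: True, 6: True}))
# Part (d): P(X3 = True | X4 = True, X6 = True)
print(conditional({3: True}, {4: True, 6: True}))
```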

Question 4: Naive Bayes (15 points)

In this problem, we will train a probabilistic model to classify a document written in a simplified language. However, instead of training the naive Bayes model with multivariate Bernoulli distributions, we will train it with a model that uses multinomial distributions. Assume that all the documents are written in a language which has only three words: a, b and c. All the documents have exactly n words (each word can be either a, b or c). We are given a labeled document collection {D_1, D_2, ..., D_m}. The label y_i of document D_i is 1 or 0, indicating whether D_i is good or bad.

This model uses the multinomial distributions in the following way. Given the i-th document D_i, we denote by a_i (respectively b_i, c_i) the number of times that word a (respectively b, c) appears in D_i. Therefore, a_i + b_i + c_i = |D_i| = n. In this model, we define

P(y_i = 1) = η

P(D_i | y_i = 1) = (n! / (a_i! b_i! c_i!)) · α_1^{a_i} · β_1^{b_i} · γ_1^{c_i}

where α_1 (respectively β_1, γ_1) is the probability that word a (respectively b, c) appears in a good document and α_1 + β_1 + γ_1 = 1. Similarly,

P(D_i | y_i = 0) = (n! / (a_i! b_i! c_i!)) · α_0^{a_i} · β_0^{b_i} · γ_0^{c_i}

where α_0 (respectively β_0, γ_0) is the probability that word a (respectively b, c) appears in a bad document and α_0 + β_0 + γ_0 = 1.

(a) (2 points) Given a document D_i, we want to classify it using P(y_i | D_i). Write down an expression for P(y_i | D_i) and explain what parameters are to be estimated from data in order to calculate P(y_i | D_i).

(b) (13 points) Derive the most likely value of the parameters. Namely, write the expression for the likelihood of the data and derive a closed-form expression for the parameters. [Hint: You may have to use Lagrange multipliers because you have to maximize the log-likelihood under the constraint that α_1, β_1 and γ_1 (similarly α_0, β_0 and γ_0) sum to one.] Derive your answer on the next page.
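To make the model concrete, here is a minimal sketch of how the posterior P(y_i = 1 | D_i) in part (a) would be computed once the parameters are known. The parameter values used are hypothetical placeholders, not estimates from any actual data.

```python
# Sketch: posterior P(y = 1 | D) under the multinomial naive Bayes model,
# with hypothetical placeholder parameters (eta, alpha, beta, gamma).
from math import factorial

def multinomial_likelihood(a, b, c, alpha, beta, gamma):
    """P(D | y) for a document with counts (a, b, c) of words a, b, c."""
    n = a + b + c
    coeff = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return coeff * alpha**a * beta**b * gamma**c

def posterior_good(a, b, c, eta, params_good, params_bad):
    """P(y = 1 | D) by Bayes' rule; params_* are (alpha, beta, gamma) triples."""
    p_good = eta * multinomial_likelihood(a, b, c, *params_good)
    p_bad = (1 - eta) * multinomial_likelihood(a, b, c, *params_bad)
    return p_good / (p_good + p_bad)

# Hypothetical parameters and a document with counts a=3, b=1, c=1 (so n=5):
print(posterior_good(3, 1, 1, eta=0.5,
                     params_good=(0.6, 0.3, 0.1),
                     params_bad=(0.2, 0.3, 0.5)))
```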

(Write your answer here.)

Question 5: Learning Theory (12 points)

(a) (6 points) Consider a concept space H of axis-parallel, origin-centered, embedded rectangles (see the figure below). Formally, a concept h ∈ H is defined by four non-negative real parameters a, b, c, d ∈ R+, such that a < b and c < d. An example (x, y) is labeled positive if and only if (x, y) is in the rectangle -b < x < b and -d < y < d, but not in the rectangle -a < x < a and -c < y < c. Is the VC-dimension of H greater than or equal to 4? Explain your answer. No credit without correct explanation.

[Figure: two nested origin-centered axis-parallel rectangles; the inner rectangle has half-width a and half-height c, the outer rectangle has half-width b and half-height d.]

(b) (6 points) What is the VC-dimension of the following concept class defined over two dimensions x_1, x_2: if (α x_1^2 + α x_2^2 + β) > 0 then the class is positive; otherwise, the class is negative. Here, α, β ∈ R are real numbers. Explain your answer. No credit without correct explanation.

Question 6: AdaBoost (12 points)

Consult the AdaBoost algorithm given on the last page for this question. Suppose you have two weak learners, h_1 and h_2, and a set of 17 points.

(a) (2 points) You find that h_1 makes one mistake and h_2 makes four mistakes on the dataset. Which learner will AdaBoost choose in the first iteration (namely m = 1)? Justify your answer.

(b) (2 points) What is α_1?

(c) (2 points) Calculate the data weighting coefficients w_n^(2) for the following two cases: (1) the points on which the chosen learner made a mistake and (2) the points on which the chosen learner did not make a mistake.
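As a rough numeric check for parts (a) through (c), the sketch below uses the AdaBoost quantities in their common Bishop/PRML form (weighted error ε_m, α_m = ln((1 - ε_m)/ε_m), weights multiplied by exp(α_m) on mistakes). The exam's own algorithm sheet on the last page is not reproduced here, so verify the convention against it.

```python
# Sketch: numeric check for AdaBoost parts (a)-(c) under the Bishop/PRML
# convention; the exam's algorithm sheet should be consulted for the exact form.
import math

N = 17
errors = {"h1": 1, "h2": 4}     # number of mistakes made by each weak learner

# The learner with the smaller weighted error is chosen first.
chosen = min(errors, key=lambda h: errors[h])
eps1 = errors[chosen] / N       # uniform initial weights w_n^(1) = 1/N
alpha1 = math.log((1 - eps1) / eps1)
print("chosen:", chosen, " eps_1 =", eps1, " alpha_1 =", alpha1)

# Updated (unnormalized) weights w_n^(2):
w_mistake = (1.0 / N) * math.exp(alpha1)   # points the chosen learner got wrong
w_correct = (1.0 / N)                      # points it got right
print("w^(2) on mistakes:", w_mistake, " on correct points:", w_correct)
```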

(d) (6 points) Consider a simple modification to the AdaBoost algorithm in which we normalize the data weighting coefficients. Namely, we replace w_n^(m+1) by w_n^(m+1) / Z^(m+1), where Z^(m+1) = Σ_{n=1}^{N} w_n^(m+1). Prove that Z^(m+1) = 2(1 - ε_m). Hint: Notice that if the weights are normalized, then ε_m = Σ_{n=1}^{N} w_n^(m) I(y_m(x_n) ≠ t_n).

Question 7: Short Questions (20 points)

(a) (4 points) Consider a 2-D dataset that is linearly separable. Your boss tells you to use ID3 (a decision tree classifier) instead of a linear classifier (e.g., SVMs). Make an argument, based on the representation size, that this is not a good idea.

(b) (5 points) Let p be the probability of a coin landing heads up when tossed. You flip the coin 3 times and observe 2 tails and 1 head. Suppose p can only take two values: 0.3 or 0.6. Find the maximum likelihood estimate of p over the set of possible values {0.3, 0.6}.

(c) (5 points) Suppose that you have the following prior on the parameter p: P(p = 0.3) = 0.3 and P(p = 0.6) = 0.7. Given that you flipped the coin 3 times with the observations described above, find the MAP estimate of p over the set {0.3, 0.6}, using the prior.
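Parts (b) and (c) amount to comparing a handful of numbers; a minimal sketch of that comparison, using the likelihood of 1 head and 2 tails under each candidate p, is:

```python
# Sketch: MLE and MAP over the two candidate values of p for 1 head, 2 tails.
candidates = [0.3, 0.6]
prior = {0.3: 0.3, 0.6: 0.7}

def likelihood(p, heads=1, tails=2):
    # The binomial coefficient is the same for both candidates, so it does not
    # affect which one wins and is omitted here.
    return p**heads * (1 - p)**tails

mle = max(candidates, key=likelihood)
map_est = max(candidates, key=lambda p: prior[p] * likelihood(p))
print("MLE:", mle, " MAP:", map_est)
```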

(d) (6 points) Draw a neural network having the minimum number of nodes that represents the following function. Please provide a precise structure as well as a setting of the weights. You can only use simple threshold units (namely, o = +1 if Σ_i w_i x_i > 0 and o = -1 otherwise) as hidden units and output units. X_1, X_2 and X_3 are attributes and Y is the class variable.

X_1  X_2  X_3    Y
 0    0    0    +1
 0    0    1    -1
 0    1    0    +1
 0    1    1    +1
 1    0    0    +1
 1    0    1    -1
 1    1    0    +1
 1    1    1    +1
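If it helps, a small checker like the sketch below can verify a proposed threshold unit against the table. The weight values shown are placeholders rather than the intended answer, a bias can be encoded as a weight on a constant +1 input, and a network with hidden units would need its hidden layer wired in as well.

```python
# Sketch: check a candidate threshold unit against the truth table above.
table = [
    ((0, 0, 0), +1), ((0, 0, 1), -1), ((0, 1, 0), +1), ((0, 1, 1), +1),
    ((1, 0, 0), +1), ((1, 0, 1), -1), ((1, 1, 0), +1), ((1, 1, 1), +1),
]

def threshold_unit(weights, bias, x):
    """o = +1 if bias + sum_i w_i * x_i > 0, and o = -1 otherwise."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return +1 if s > 0 else -1

candidate_weights, candidate_bias = (0.0, 0.0, 0.0), 0.0   # placeholders
for x, y in table:
    o = threshold_unit(candidate_weights, candidate_bias, x)
    print(x, "target:", y, "output:", o, "OK" if o == y else "MISMATCH")
```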
