CS534 Machine Learning - Spring 2013 Final Exam


CS534 Machine Learning - Spring 2013 Final Exam

Name:

You have 110 minutes. There are 6 questions (8 pages including the cover page). If you get stuck on one question, move on to the others and come back to the difficult one later. If a question asks for an explanation, you must provide one; otherwise no points will be given. The exam is open book and notes, but no computers, cell phones, or internet access. Good luck!

1. (24 points) True/False questions. No need to explain your answers, but each incorrect T/F answer will get -1 point. Providing no answer will get zero points.

a. Both PCA and Spectral Clustering perform an eigen-decomposition of different matrices; however, the sizes of these two matrices are the same.

b. When the data is sparse, ISOMAP may fail to find the underlying manifold.

c. The dimensionality of the feature map generated by a polynomial kernel (e.g., K(x, y) = (1 + x^T y)^d) is polynomial with respect to the power d of the polynomial kernel.

d. There is at least one set of 4 points in R^3 that can be shattered by the hypothesis space of all linear classifiers in R^3.

e. The log-likelihood of the data will always increase through successive iterations of the Expectation Maximization algorithm.

f. In AdaBoost, the weights of the misclassified examples go up by the same multiplicative factor in one iteration.

g. The more hidden layers a neural network has, the better it predicts desired outputs for new inputs that it was not trained with.

h. When clustering with Gaussian Mixture Models with K Gaussians, we can select the best K as follows: run GMM with different values of K and pick the one that leads to the highest data likelihood p(x | θ).
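Editor's note (not part of the exam), as a refresher relevant to item (c): the feature map of the polynomial kernel (1 + x^T y)^d over R^n consists of all monomials in n variables of total degree at most d, so its dimension equals the binomial coefficient C(n + d, d). A minimal sketch that tabulates this count (the choices of n and d below are arbitrary):

# Dimension of the feature space of the polynomial kernel (1 + x^T y)^d over R^n:
# all monomials in n variables of total degree <= d, i.e. C(n + d, d) of them.
from math import comb

for n in (2, 10):
    for d in (2, 3, 5):
        print(f"n={n:2d}, d={d}: feature dimension = {comb(n + d, d)}")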

2. (20 pts) Learning theory

a. (4 pts) Consider the hypothesis space of decision stumps with binary splits. What is its VC-dimension? Briefly justify your answer.

b. (4 pts) Consider the hypothesis space of unlimited-depth decision trees. What is its VC-dimension? Briefly justify your answer.

Consider linear regression using polynomial basis functions of order M. The expected loss can be decomposed into three parts: the bias, the variance, and the noise.

c. (6 pts) If we increase the order of the polynomial basis function (e.g., from quadratic to cubic, or higher-order polynomials), will it increase, decrease, or have no impact on each of the three terms? Briefly explain.

Bias:
Variance:
Noise:

d. (6 pts) If we increase the training set size, will it increase, decrease, or have no impact on each of the three terms? Briefly explain.

Bias:
Variance:
Noise:
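Editor's note (not part of the exam): for intuition on parts (c) and (d), the bias and variance terms can be estimated empirically by refitting a polynomial on many freshly drawn training sets. This is a minimal simulation sketch; the target function sin(2πx), the noise level, and the sample sizes are arbitrary illustrative choices.

# Empirical bias/variance of polynomial regression (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)    # true function (arbitrary choice)
sigma = 0.3                            # noise standard deviation (arbitrary choice)
x_test = np.linspace(0, 1, 50)

def bias_variance(M, n_train=25, n_repeats=500):
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = f(x) + sigma * rng.normal(size=n_train)
        coefs = np.polyfit(x, y, deg=M)           # least-squares polynomial fit of order M
        preds[r] = np.polyval(coefs, x_test)
    avg_pred = preds.mean(axis=0)
    bias2 = np.mean((avg_pred - f(x_test)) ** 2)  # squared bias, averaged over test inputs
    var = np.mean(preds.var(axis=0))              # variance across training sets
    return bias2, var

for M in (1, 3, 9):
    b2, v = bias_variance(M)
    print(f"order M={M}: bias^2={b2:.3f}, variance={v:.3f}, noise={sigma**2:.3f}")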

3. [12 points]dimension Reduction. In a supervised classification task we have a total of 10 features, among which only three are useful for predicting the target variable, the other features are pure random noise with very high variance. What complicates matters even worse is that the three features when considered individually show no predictive power, and only work when considered together. Consider each of the following dimension reduction techniques and indicate whether it can successfully identify the relevant dimensions. Briefly explain why. Feature selection with greedy forward search Feature selection with greedy backward search Linear discriminant analysis Principle Component Analysis 4

4. [14 points] Ensemble methods.

a. Consider a data set D = {x_1, x_2, ..., x_10}. We apply AdaBoost with decision stumps. Let w_1, ..., w_10 be the weights of the ten training examples, respectively.

i. (4 pts) In the first iteration, x_1, x_2 and x_3 are misclassified. Please provide the weights after the update and normalization in the first iteration.

ii. (6 pts) In the second iteration, x_3 and x_4 are misclassified. Please provide the weights after the update and normalization at the end of this iteration.

b. (4 pts) If the training set contains noise in the class labels, which ensemble learning method do you expect to be hurt more by the label noise, boosting or bagging? Why?
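Editor's note (not part of the exam): a minimal sketch of one AdaBoost reweighting round. It assumes the common convention w_i <- w_i * exp(±α) with α = ½ ln((1−ε)/ε), uniform initial weights, and renormalization, which may differ in detail from the exact convention used in class.

# One AdaBoost reweighting round under the usual exponential-update convention.
import numpy as np

def adaboost_reweight(weights, misclassified):
    """weights: current normalized example weights; misclassified: boolean mask."""
    eps = weights[misclassified].sum()              # weighted training error of the stump
    alpha = 0.5 * np.log((1 - eps) / eps)           # the stump's vote weight
    new_w = weights * np.exp(np.where(misclassified, alpha, -alpha))
    return new_w / new_w.sum()                      # renormalize to sum to 1

w = np.full(10, 0.1)                                # uniform initial weights over 10 examples
mis1 = np.zeros(10, dtype=bool)
mis1[[0, 1, 2]] = True                              # example: three examples misclassified
w = adaboost_reweight(w, mis1)
print(np.round(w, 4))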

5. [15 pts] Clustering: K-Means, GMM and Spectral

a. [3 points] Why is it important to randomly restart K-means multiple times?

b. [4 points] Apply Gaussian Mixture Models to cluster the following data into two clusters. Consider two different GMM variants. The first variant restricts the covariance matrix Σ of both clusters to be diagonal, whereas the second variant makes no such restriction. Please mark in the figure what you think the final clusters (Gaussians) would look like for each variant.

[Figure: the data set shown twice for marking, labeled (a) Diagonal Σ and (b) Unrestricted Σ]

c. [4 pts] Consider the following clustering problem, where we want to partition the data into two clusters. We consider clustering with both the min-cut and normalized-cut objectives as defined in class. Please mark the clustering results for both objectives.

[Figure: the data set shown twice for marking, labeled (a) Min-cut and (b) Normalized cut]
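Editor's note (not part of the exam): part (b)'s contrast between diagonal and unrestricted covariance can be seen numerically with scikit-learn's covariance_type option. The tilted synthetic clusters below are stand-ins, since the exam's figure is not reproduced in this transcription.

# Diagonal vs. full covariance GMMs on synthetic correlated data (illustrative sketch).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated, tilted clusters (placeholders for the exam's figure).
cov = np.array([[2.0, 1.8], [1.8, 2.0]])
X = np.vstack([rng.multivariate_normal([0, 0], cov, 200),
               rng.multivariate_normal([6, 0], cov, 200)])

for cov_type in ("diag", "full"):
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type,
                          n_init=5, random_state=0).fit(X)
    print(cov_type, "average log-likelihood per point:", round(gmm.score(X), 3))
# The diagonal model cannot represent the tilt (off-diagonal covariance), so its fitted
# Gaussians are axis-aligned ellipses; the full-covariance model can rotate with the data.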

d. [4 points] Apply K-means to the following 2-d data set with k = 2. Start with µ_1^(0) = (0, 0) and µ_2^(0) = (4, 3). Use the figures provided below to mark the means and the cluster memberships (1) after the first iteration and (2) after the algorithm converges. Use as many figures as needed to work through the iterations; simply mark the first iteration and the final result.

[Figure: copies of the 2-d data set for marking the K-means iterations]
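Editor's note (not part of the exam): a minimal numpy sketch of the K-means assign/update loop, starting from the initial means given in part (d). The data points are made-up placeholders, since the exam's scatter plot is not reproduced in this transcription.

# K-means from the given initial means (illustrative sketch; placeholder data points).
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 2.0],   # hypothetical data points
              [3.0, 2.0], [4.0, 2.0], [4.0, 4.0]])
mu = np.array([[0.0, 0.0], [4.0, 3.0]])              # initial means from part (d)

for it in range(100):
    # Assignment step: each point goes to its nearest mean (Euclidean distance).
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    z = d.argmin(axis=1)
    # Update step: each mean moves to the centroid of its assigned points.
    new_mu = np.array([X[z == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_mu, mu):                      # converged: means stopped moving
        break
    mu = new_mu

print("assignments:", z)
print("final means:\n", mu)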

6. (10 pts) Expectation maximization. Imagine a machine learning class where the probability that a student gets an A grade is P(A) = 1/2, a B grade P(B) = µ, a C grade P(C) = 2µ, and a D grade P(D) = 1/2 − 3µ. Given a set of students, we observe that there are c students who got a C and d students who got a D. We also know that a total of h students got either an A or a B, but we do not observe a, the total number of A students, or b, the total number of B students; we only observe their sum, h. The goal is to apply EM to obtain a maximum likelihood estimate of µ.

(5 pts) Expectation. Which of the following formulas computes the expected values of a and b given µ?

$\hat{a} = \frac{1/2}{1/2 + h}\,\mu, \qquad \hat{b} = \frac{\mu}{1/2 + h}\,\mu$

$\hat{a} = \frac{1/2}{1/2 + \mu}\,h, \qquad \hat{b} = \frac{\mu}{1/2 + \mu}\,h$

$\hat{a} = \frac{\mu}{1/2 + \mu}\,h, \qquad \hat{b} = \frac{1/2}{1/2 + \mu}\,h$

$\hat{a} = \frac{1/2}{1 + \mu^2}\,h, \qquad \hat{b} = \frac{\mu}{1 + \mu^2}\,h$

(5 pts) Maximization. Given the expected values of a and b, which of the following formulas computes the maximum likelihood estimate of µ? Hint: compute the MLE of µ assuming that the unobserved variables are replaced by their expectations.

$\hat{\mu} = \frac{h - a + c}{6(h - a + c + d)}$

$\hat{\mu} = \frac{h - a + d}{6(h - 2a - d)}$

$\hat{\mu} = \frac{h - a}{6(h - 2a + c)}$

$\hat{\mu} = \frac{2(h - a)}{3(h - a + c + d)}$
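Editor's note (not part of the exam): a small numerical sketch of how an EM loop for a model of this kind would run. It assumes the usual treatment of merged observations: the E-step splits the h A-or-B students in proportion to P(A) and P(B), and the M-step maximizes the expected complete-data log-likelihood in closed form. The observed counts and starting value of µ are made-up numbers.

# EM for the grade model P(A)=1/2, P(B)=mu, P(C)=2*mu, P(D)=1/2-3*mu,
# where only h = a + b, c, and d are observed (illustrative sketch; made-up counts).
h, c, d = 20, 10, 10
mu = 0.05                                   # initial guess; must lie in (0, 1/6)

for step in range(20):
    # E-step: expected A and B counts, splitting h in proportion to P(A) and P(B).
    a_hat = h * 0.5 / (0.5 + mu)
    b_hat = h * mu / (0.5 + mu)
    # M-step: maximize b_hat*log(mu) + c*log(2*mu) + d*log(1/2 - 3*mu) over mu,
    # which has the closed-form solution below.
    mu = (b_hat + c) / (6 * (b_hat + c + d))
    print(f"step {step}: mu = {mu:.4f}")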