ECE 592 Topics in Data Science

Final Exam, Fall 2017
December 11, 2017

Please remember to justify your answers carefully, and to staple your test sheet and answers together before submitting.

Name:
Student ID:

Question 1 (Warm-up questions. [6 pts]) Below are a few quick questions that you should be able to answer quickly.

(a) [2 pts] Consider a graph G(V, E), where V are the vertices and E are the edges. Given that G is a forest comprised of k trees, how many edges does it have? (Hint: your answer should relate $|E|$ and $|V|$, the cardinalities of the sets E and V.)

Solution: The graph can be partitioned into k sub-graphs, $G_1(V_1, E_1), \ldots, G_k(V_k, E_k)$, where each sub-graph is a tree. Note that $\sum_{i=1}^{k} |V_i| = |V|$, because every vertex belongs to exactly one sub-graph. Moreover, because a tree with $|V_i|$ vertices has $|V_i| - 1$ edges, the number of edges is
$$|E| = \sum_{i=1}^{k} |E_i| = \sum_{i=1}^{k} \left( |V_i| - 1 \right) = |V| - k.$$

(b) [2 pts] Describe an application where hardware can acquire linear measurements in a compressive way.

Solution: Many students described a single-pixel camera, where light from the scene is reflected at the focal plane off an array of small mirrors. Each mirror either sends photons corresponding to that pixel toward the sensor, or deflects them away from the sensor. Therefore, each measurement sums over the pixels whose corresponding mirrors are directed at the sensor, and it approximates an inner product between a binary measurement vector (a row of the measurement matrix) and a vectorized version of the image. (A small simulation of this measurement model appears at the end of this question.)

(c) [2 pts] Describe a type of signal for which linear approximation is much worse than non-linear approximation. Make sure to justify your answer.

Solution: Non-linear approximation selects the largest transform coefficients, while linear approximation selects the same coefficients every time. For some types of signals linear approximation makes sense; for example, audio signals are often roughly low-pass. In contrast, for signals such as 2D images represented by 2D wavelets, the locations of edges determine where the significant wavelet coefficients are located, so non-linear approximation often works much better.
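The contrast in part (c) can also be seen numerically. The sketch below uses a 1-D DCT as the transform and a synthetic signal that is sparse in that transform, with its few large coefficients at arbitrary frequencies (a stand-in for the wavelet coefficients of an image whose edge locations are not known in advance). The signal length, the sparsity level, and the choice of k = 25 retained coefficients are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(1)

# Signal that is sparse in the DCT domain, with its few large coefficients at
# arbitrary (not just low) frequencies.
n, s = 256, 10
coeffs = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
coeffs[support] = rng.normal(0.0, 5.0, size=s)
x = idct(coeffs, norm="ortho")

k = 25
c = dct(x, norm="ortho")  # recovers the sparse coefficient vector

# Linear approximation: always keep the first k (low-frequency) coefficients.
c_lin = np.zeros(n)
c_lin[:k] = c[:k]

# Non-linear approximation: keep the k largest-magnitude coefficients, wherever they fall.
idx = np.argsort(np.abs(c))[-k:]
c_nl = np.zeros(n)
c_nl[idx] = c[idx]

print(np.linalg.norm(x - idct(c_lin, norm="ortho")))  # typically large
print(np.linalg.norm(x - idct(c_nl, norm="ortho")))   # ~0, since all s active coefficients are kept
```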
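Similarly, the single-pixel measurement model from part (b) can be sketched in a few lines. The 32x32 scene, the number of measurements m = 100, and the random mirror patterns below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 32x32 scene, flattened into a vector of n = 1024 pixel values.
n = 32 * 32
image = rng.random(n)

# Each row of A is one binary mirror pattern: 1 means the mirror is aimed at the
# sensor, 0 means it deflects the light away.  Taking m << n rows makes the
# acquisition compressive.
m = 100
A = rng.integers(0, 2, size=(m, n)).astype(float)

# Each measurement sums the pixels whose mirrors face the sensor, i.e. it is an
# inner product between a binary row of A and the vectorized image.
y = A @ image
print(y.shape)  # (100,): 100 linear measurements of a 1024-pixel scene
```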

Question 2 (Bayes classification. [6 pts]) You are given training vectors labeled into two classes, where each vector is in p = 2 dimensions. The training set is comprised of the following points, which are plotted below.

Class 1: {(11, 11), (13, 11), (8, 10), (9, 9), (7, 7), (7, 5), (15, 3)}
Class 2: {(7, 11), (15, 9), (15, 7), (13, 5), (14, 4), (9, 3), (11, 3)}

These vectors follow Gaussian distributions, where Class 1 has the following mean vector and covariance matrix,
$$\mu_1 = \begin{bmatrix} 10 \\ 8 \end{bmatrix}, \qquad \Sigma_1 = \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix},$$
and Class 2 has the following mean vector and covariance matrix,
$$\mu_2 = \begin{bmatrix} 12 \\ 6 \end{bmatrix}, \qquad \Sigma_2 = \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix}.$$

For ease of visualization, we include a scatter plot of the feature vectors, with Classes 1 and 2 denoted by crosses and squares, respectively.

(a) [1 pt] Are the two classes linearly separable? Please explain.

Solution: Two classes are linearly separable if some hyperplane (in our case, a line) can perfectly separate them. Here we have outliers in the top left and bottom right, so no line can perfectly separate the crosses from the squares, and the two classes are not linearly separable.

(b) [2 pts] In the scatter plot, indicate the decision boundary that you would get using a Bayesian classifier; note that both classes use the same diagonal covariance matrix.

Solution: There were many ways to solve this question. Because the classes are Gaussian and the covariance matrices are both diagonal with the same variance for each component, the Bayesian classifier is linear. (Some students confused the linearity of the Bayesian classifier with part (a), which referred to the specific data at hand.) Because the covariance matrices are scaled identity matrices, the classifier selects the closer class mean between $\mu_1$ and $\mu_2$. It can be shown formally (and quite a few students did so) that the decision boundary is the line of slope one passing through the midpoint between the two means, but a sketch based on the intuition described here was fine.
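As a sanity check on part (b), here is a short sketch of the nearest-mean rule that the Bayes classifier reduces to under the shared covariance $\Sigma = 10I$; equal class priors are assumed, since the exam does not state them. It also prints the line implied by the equidistance condition.

```python
import numpy as np

# Class means from the problem; the shared covariance is 10*I, and (assuming
# equal priors) the Bayes rule reduces to picking the nearer class mean.
mu1 = np.array([10.0, 8.0])
mu2 = np.array([12.0, 6.0])

def bayes_classify(x):
    """Return 1 or 2 according to which class mean is closer in Euclidean distance."""
    x = np.asarray(x, dtype=float)
    return 1 if np.sum((x - mu1) ** 2) <= np.sum((x - mu2) ** 2) else 2

# The decision boundary is the set of points equidistant from the two means:
# ||x - mu1||^2 = ||x - mu2||^2  <=>  2 (mu2 - mu1) . x = ||mu2||^2 - ||mu1||^2.
w = 2 * (mu2 - mu1)                   # normal vector of the boundary, here (4, -4)
b = np.sum(mu2**2) - np.sum(mu1**2)   # here 16, so 4x - 4y = 16, i.e. y = x - 4

print(w, b)                           # slope-one line through the midpoint (11, 7)
print(bayes_classify((6, 11)), bayes_classify((14, 3)))  # the test samples from part (d): 1, 2
```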

(c) [1 pt] In the same scatter plot, sketch the decision boundary that you would get using a nearest neighbor (NN) classifier with k = 1.

Solution: The points are arranged in such a way that the NN decision boundary is mostly identical to the Bayes boundary, except that the two outlier points (see part (a) above) carve out regions around themselves.

(d) [2 pts] Classify the test samples (6, 11) and (14, 3) using the Gaussian Bayes classifier. (Hint: you need not go into detailed derivations if you can explain the output of the classifier.)

Solution: For the first sample, (6, 11), its squared distance from $\mu_1$ is $(10-6)^2 + (8-11)^2 = 16 + 9 = 25$, and its squared distance from $\mu_2$ is $(12-6)^2 + (6-11)^2 = 36 + 25 = 61$. The first sample is closer to $\mu_1$ and so belongs to Class 1. The second test sample, (14, 3), can similarly be shown to be closer to $\mu_2$, so it belongs to Class 2.

Question 3 (Dynamic programming (DP). [3 pts]) A criminal approaches you and wants your advice about producing fake coins. The criminal can produce three types of coins. The values of these coins are $v_1 = 1$, $v_2 = 4$, and $v_3 = 11$, and the numbers of units of metal needed to produce these coins are $w_1 = 2$, $w_2 = 5$, and $w_3 = 13$, respectively. The criminal wants to produce N value in coins using as little metal as possible. Use dynamic programming to compute Ψ(N), the minimal number of units of metal needed to produce N value in coins.

Solution: DP involves two parts. First, the base case covers situations where N is small. (Some students just showed that Ψ(1) = 2, while others listed values up to N = 11.) Second, when we move beyond the base case to larger N, we minimize between (i) producing a single coin whose value is N (only feasible for $N \in \{1, 4, 11\}$); and (ii) partitioning into sub-problems of size k and N − k and plugging in the previously computed solutions Ψ(k) and Ψ(N − k), where we minimize the amount of metal over the possible values of k. Different students solved this in various ways, but these themes repeated themselves in many solutions.
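One common way to organize the recursion in Question 3 is the coin-based recurrence $\Psi(n) = \min_{i:\, v_i \le n} \{ w_i + \Psi(n - v_i) \}$ with $\Psi(0) = 0$, which is a special case of the k / (N − k) split described above (take k equal to a coin value). The bottom-up sketch below implements that recurrence; it is one of several formulations students used, not the unique intended solution.

```python
def min_metal(N, values=(1, 4, 11), weights=(2, 5, 13)):
    """psi[n]: minimal units of metal needed to produce exactly n in coin value."""
    INF = float("inf")
    psi = [0] + [INF] * N                # base case: value 0 costs no metal
    for n in range(1, N + 1):
        for v, w in zip(values, weights):
            if v <= n and psi[n - v] + w < psi[n]:
                psi[n] = psi[n - v] + w  # last coin minted has value v, costing w units of metal
    return psi[N]

print(min_metal(1), min_metal(4), min_metal(11))  # 2, 5, 13: single coins are optimal here
print(min_metal(15))                              # 15 = 11 + 4, so 13 + 5 = 18 units of metal
```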

Question 4 (Principal component analysis (PCA). [6 pts]) In a sample of 36 flea beetles, 6 different physical characteristics are measured for each beetle: head length, body length, length of one joint, length of a second joint, total weight, and body temperature. The first four are measured in micrometers, the fifth is measured in milligrams, and the sixth is measured in degrees Celsius. The outcome of a principal component analysis (PCA) and a related plot are given in two figures below. Based on these results, answer the following questions:

(a) [2 pts] The correlation matrix is used here instead of the covariance matrix. Explain why the correlation matrix is a more sensible choice than the covariance matrix for this analysis.

Solution: The covariance matrix would rely on the raw numbers, and because different units are used, the scales can be grossly different. In contrast, correlations normalize everything to [−1, +1], which aids interpretation. For example, in the second table the PC1 values associated with the beetle's size variables (0.489, 0.495, 0.437, 0.372, 0.496) are all similar in the correlation domain, implying that PC1 can be interpreted as the size of a beetle.

(b) [2 pts] How much of the variation in this dataset is explained by the first principal component? How much is explained by the first and the second components together?

Solution: In the first table, we can see that PC1 accounts for 0.674 of the variation while PC2 accounts for 0.174; together they account for 0.847.

(c) [2 pts] A plot of the principal component scores for the 36 beetles is shown in the figure below. Imagine that the horizontal and vertical axes are drawn on the plot, dividing it into 4 quadrants: UR (upper right), UL (upper left), LR (lower right), and LL (lower left). Suppose that two new beetles, Big Bob and Tiny Tim, are measured. Bob is much larger than any of the other beetles and is also slightly warmer. Tim, on the other hand, is very small compared with the other beetles, but like Bob, Tim is warmer than the others. For both Bob and Tim, which quadrant of the above graph would each lie in? How do you know?

Solution: As discussed earlier, PC1 corresponds to the size of a beetle. Similarly, in PC2 the weight on temperature is 0.824 while the weights on the other (size) variables are modest, so PC2 corresponds to the temperature. Big Bob is larger than usual (PC1 is positive) and warm (PC2 is positive), placing him in the upper right (UR) quadrant. Tiny Tim is smaller than usual (PC1 is negative) and warm (PC2 is positive), placing him in the upper left (UL) quadrant.
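For completeness, here is a short sketch of correlation-matrix PCA of the kind described in Question 4. The beetle measurements themselves are not reproduced in this transcription, so the sketch uses a random placeholder matrix X of shape (36, 6); with the real data, `explained[:2]` would correspond to the 0.674 and 0.174 proportions quoted in part (b), and the first column of `eigvecs` to the PC1 loadings quoted in part (a).

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 36 beetles x 6 measurements.  (The actual measurements are
# in the exam's tables, which are not reproduced here.)
X = rng.normal(size=(36, 6))

# PCA on the correlation matrix is PCA on standardized columns, which removes
# the effect of the very different units (micrometers, milligrams, degrees C).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)           # 6 x 6 correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print(explained[0], explained[:2].sum())   # proportions of variation for PC1 and PC1+PC2

scores = Z @ eigvecs                       # principal component scores (36 x 6)
print(scores[:5, :2])                      # first few coordinates used in the score plot
```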

Question 5 (Individual projects. [4 pts]) From the individual project presentations held in class, select any two (excluding your own). For each one, briefly describe:

(i) the motivation for the project, i.e., what problem was solved and why; and
(ii) which data science techniques were used in the project.

Solution: The objective of this question was to ensure that students viewed their own project and the other students' projects as an interesting learning experience worth hearing about. Most students received 3.5-4 points here; those who described their own project lost 2 points.