
Fundamentals of Big Data Analytics
Univ.-Prof. Dr. rer. nat. Rudolf Mathar

Problem 1. Probability theory: The outcome of an experiment is described by three events $A$, $B$ and $C$. The probabilities $\Pr(A)$, $\Pr(B)$, $\Pr(A \cap B)$ and $\Pr(A \cup B \cup C)$ are given.

a) Are $A$ and $B$ disjoint events? Give a short reason for your answer. (1P)
b) Are $A$ and $B$ independent events? Give a short reason for your answer. (1P)
c) Determine the probability $\Pr(A \cup B)$. (1P)
d) Assuming that $C$ is disjoint from both $A$ and $B$, determine the probability $\Pr(C)$. (1P)

The figure below shows the support of the underlying density $f(x, y)$ of the experiment, divided into four areas corresponding to the subsets $A \setminus (A \cap B)$, $A \cap B$, $B \setminus (A \cap B)$ and $C$. Outside of the colored region the density $f(x, y)$ is zero.

[Figure: support of $f(x, y)$ in the $(x, y)$-plane, divided into the areas $A \setminus (A \cap B)$, $A \cap B$, $B \setminus (A \cap B)$ and $C$.]

e) Assuming that $f(x, y)$ is constant on each area, i.e.,
$$
f(x, y) = \begin{cases}
c_1, & \ldots < y \le \ldots,\ \ldots < x \le \ldots, \\
c_2, & \ldots < y \le \ldots,\ \ldots < x \le \ldots, \\
c_3, & \ldots < y \le \ldots,\ \ldots < x \le \ldots, \\
c_4, & \ldots < y \le \ldots,\ \ldots < x \le \ldots, \\
0, & \text{otherwise},
\end{cases}
$$
determine the corresponding constants $c_1$, $c_2$, $c_3$ and $c_4$ such that $f(x, y)$ becomes a valid density function. (4P)

f) Assume that $X$ and $Y$ are random variables with joint density $f(x, y)$. Determine the mean value $E(X)$. (1P)
g) Determine the marginal density $f_Y(y)$ of the random variable $Y$. (3P)
h) Are $X$ and $Y$ independent random variables? Give a short reason for your answer. (1P)
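The parts of Problem 1 reduce to a few standard identities: inclusion-exclusion for $\Pr(A \cup B)$, the product rule for independence, and, for a density that is constant on each area, $c_i = \Pr(\text{area } i)/|\text{area } i|$, from which $E(X)$ and $f_Y(y)$ follow by summing over areas. The Python sketch below illustrates these steps under loudly stated assumptions: the four areas are taken to be axis-aligned rectangles, all numbers in the usage and test are hypothetical stand-ins rather than the exam's values, and the helper names (`event_facts`, `constants`, `mean_x`, `marginal_y`) are illustrative.

```python
from fractions import Fraction as F


def event_facts(p_a, p_b, p_ab, p_total):
    """Parts a)-d) from Pr(A), Pr(B), Pr(A ∩ B) and p_total = Pr(A ∪ B ∪ C).

    Assumes C is disjoint from A and B, as in part d).
    """
    p_union = p_a + p_b - p_ab               # inclusion-exclusion
    return {
        "not_disjoint": p_ab > 0,            # Pr(A ∩ B) > 0 rules out A ∩ B = ∅
        "independent": p_ab == p_a * p_b,    # product rule for independence
        "p_union": p_union,
        "p_c": p_total - p_union,            # Pr(A ∪ B ∪ C) = Pr(A ∪ B) + Pr(C)
    }


# ASSUMPTION: each area is an axis-aligned rectangle (x0, x1, y0, y1).
def constants(rects, probs):
    """Part e): a density constant on area i has c_i = Pr(area i) / |area i|."""
    return [p / ((x1 - x0) * (y1 - y0)) for (x0, x1, y0, y1), p in zip(rects, probs)]


def mean_x(rects, cs):
    """Part f): E(X) = sum_i c_i * |area i| * (x-midpoint of area i)."""
    return sum(c * (x1 - x0) * (y1 - y0) * (x0 + x1) / 2
               for (x0, x1, y0, y1), c in zip(rects, cs))


def marginal_y(rects, cs, y):
    """Part g): f_Y(y) = sum over areas whose y-range contains y of c_i * width_i."""
    return sum(c * (x1 - x0) for (x0, x1, y0, y1), c in zip(rects, cs) if y0 < y <= y1)


if __name__ == "__main__":
    # Hypothetical inputs only; they are NOT the exam's values.
    print(event_facts(F(1, 2), F(1, 2), F(1, 4), F(1)))
    rects = [(0, 1, 0, 1), (1, 2, 0, 1), (2, 3, 0, 1), (0, 3, 1, 2)]
    cs = constants(rects, [F(1, 4)] * 4)
    print(cs, mean_x(rects, cs), marginal_y(rects, cs, F(1, 2)))
```

Exact fractions are used so that the independence test $\Pr(A \cap B) = \Pr(A)\Pr(B)$ is not disturbed by floating-point rounding.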

Problem 2. Principal Component Analysis: The sample covariance matrix $S$ of $n$ samples $x_i$ with sample mean $\bar{x}$ is given by $S = (\ \ldots\ )$.

a) Find the spectral decomposition $V \Lambda V^T$ of $S$ by determining the matrices $\Lambda$ and $V$. (2+4P)
b) Determine the best projection matrix $Q$ to transform the two-dimensional samples to a one-dimensional subspace. (1P)
c) Determine the residuum $\max_{Q} \frac{1}{n} \sum_{i=1}^{n} \|Q x_i - Q \bar{x}\|_2^2$ for the above choice of $Q$. (1P)
d) Calculate the one-dimensional projections $\hat{l}_1(s)$ and $\hat{l}_2(t)$ of the lines $l_1(s) = s\,2\,(\,3\,)$ and $l_2(t) = t\,2\,(\,3\,)$ for $s, t \in \mathbb{R}$, and discuss the solution. (2+2P)
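For a symmetric 2×2 covariance matrix the spectral decomposition has a closed form, and the best one-dimensional projection keeps the eigenvector $v_1$ of the largest eigenvalue. The sketch below assumes $Q = v_1^T$ (a $1 \times 2$ row) and that $S$ is normalized by $\frac{1}{n}$; under those assumptions $\frac{1}{n}\sum_i \|Q x_i - Q\bar{x}\|_2^2 = v_1^T S v_1 = \lambda_1$. The matrix used in the usage and test is hypothetical, and the function names are illustrative.

```python
import math


def eig_sym_2x2(s11, s12, s22):
    """Spectral decomposition S = V Λ V^T of S = [[s11, s12], [s12, s22]].

    Returns (λ1, λ2) with λ1 >= λ2 and the matching unit eigenvectors (v1, v2).
    """
    mid = (s11 + s22) / 2
    rad = math.hypot((s11 - s22) / 2, s12)
    lam1, lam2 = mid + rad, mid - rad
    if s12 != 0:
        v1 = (lam1 - s22, s12)        # nonzero solution of (S - λ1 I) v = 0
    elif s11 >= s22:
        v1 = (1.0, 0.0)               # S is already diagonal
    else:
        v1 = (0.0, 1.0)
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])              # orthogonal unit vector
    return (lam1, lam2), (v1, v2)


def project(q, x):
    """One-dimensional projection Q x for Q = q^T (a 1 x 2 row)."""
    return q[0] * x[0] + q[1] * x[1]


def projected_variance(s11, s12, s22, q):
    """q^T S q, which equals (1/n) sum_i ||Q x_i - Q x̄||^2 if S uses the 1/n factor."""
    return s11 * q[0] ** 2 + 2 * s12 * q[0] * q[1] + s22 * q[1] ** 2


if __name__ == "__main__":
    # Hypothetical S; the exam's entries are not available.
    (lam1, lam2), (v1, v2) = eig_sym_2x2(2.0, 1.0, 2.0)
    print(lam1, lam2, v1, projected_variance(2.0, 1.0, 2.0, v1))
```

A line $l(s) = s\,u$ through the origin projects to $s\,(Qu)$, so it collapses to the single point 0 exactly when $u$ is orthogonal to $v_1$.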

Problem 3. Diffusion Map: Suppose that a dataset is composed of $n$ real vectors $x_i$ for $i = 1, 2, \ldots, n$, of dimension $m$ ($x_i \in \mathbb{R}^m$).

a) In a diffusion map, which properties must be satisfied by kernel functions? (3P)
b) Could the following functions be used as valid kernel functions for diffusion maps? Please give a reason for your answer (one phrase per function is enough). (4P)
$K_1(x_i, x_j) = \|x_j - x_i\|_2^2$,
$K_2(x_i, x_j) = \|x_j - x_i\|_2$,
$K_3(x_i, x_j) = \cos\left(\frac{\pi}{2} \|x_j - x_i\|_2\right)$ for $\|x_j - x_i\|_2 \le 1$, and zero elsewhere,
$K_4(x_i, x_j) = \max\{\ldots(\|x_j\|_2^2 \ldots x_j^T x_i), 0\}$.

Let the dataset be composed of the following 3 vectors ($n = 3$) of dimension 3 ($m = 3$): $x_1^T = (\ldots)$, $x_2^T = (\ldots)$, $x_3^T = (\ldots)$, and let the kernel function be given by $K(x_i, x_j) = \max\{\ldots 6 \ldots \|x_j - x_i\|_2^2, 0\}$.

c) For the random walk of the diffusion map, a weight matrix $W$ is needed. Calculate the remaining weights of the following weight matrix $W \in \mathbb{R}^{3 \times 3}$: (2P)
$$
W = \begin{pmatrix} \cdot & w_{12} & 0 \\ w_{21} & \cdot & w_{23} \\ 0 & w_{32} & \cdot \end{pmatrix}
$$
d) In another application with $n = 3$, the values of $x_1, x_2, x_3$ lead to the following decomposition of the transition matrix $M$: $M = \ldots$. The left and right eigenvectors of $M$ are denoted as $\varphi_i$ and $\psi_i$ for $i = 1, 2, 3$. The transition matrix $M$ can be expressed as $M = \sum_{k=1}^{3} \lambda_k \varphi_k \psi_k^T$. What are the values of $\lambda_k$ for $k = 1, 2, 3$? (3P)
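The weight matrix and the random-walk transition matrix of a diffusion map can be built directly from a kernel: $w_{ij} = K(x_i, x_j)$ and $M = D^{-1} W$ with $D = \mathrm{diag}(\sum_j w_{ij})$. The sketch below uses a hypothetical compactly supported kernel $\max\{1 - \|x_j - x_i\|_2^2/\varepsilon, 0\}$ and made-up points; it is an assumed stand-in, not the kernel or data of part c). The names `kernel`, `weight_matrix` and `transition_matrix` are illustrative.

```python
def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||_2^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kernel(x, y, eps=1.0):
    """HYPOTHETICAL kernel max{1 - ||x - y||^2 / eps, 0}: symmetric and non-negative."""
    return max(1.0 - sq_dist(x, y) / eps, 0.0)


def weight_matrix(xs, k=kernel):
    """W with entries w_ij = K(x_i, x_j)."""
    return [[k(xi, xj) for xj in xs] for xi in xs]


def transition_matrix(w):
    """Random-walk matrix M = D^{-1} W, where D holds the row sums of W."""
    m = []
    for row in w:
        d = sum(row)                  # degree d_i = sum_j w_ij
        m.append([wij / d for wij in row])
    return m


if __name__ == "__main__":
    # Made-up points in R^3 and eps = 2; not the exam's data.
    xs = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
    W = weight_matrix(xs, k=lambda a, b: kernel(a, b, eps=2.0))
    for row in transition_matrix(W):
        print([round(v, 3) for v in row])
```

Because every row of $M$ sums to 1, $M\mathbf{1} = \mathbf{1}$, so 1 is always an eigenvalue of $M$; and since $M$ is similar to the symmetric matrix $D^{-1/2} W D^{-1/2}$, all of its eigenvalues are real.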

Problem 4. Discriminant Analysis: A training dataset consists of three-dimensional vectors belonging to two classes (also known as groups), denoted by the labels $y_i \in \{1, 2\}$. The dataset is given below.

Data Label Data Label x = y = x 4 = y 4 = 2 0 x 2 = 2 y 2 = x 5 = 2 y 5 = x 3 = 0 y 3 = x 6 = y 6 = 2

a) Find the centering matrices, namely $E_1$ and $E_2$. (1P)
b) Find the average of the dataset, namely $\bar{x}$. (1P)
c) Find the averages over groups 1 and 2, namely $\bar{x}_1$ and $\bar{x}_2$. (2P)
d) Find the matrix $B$ corresponding to the sum of squares between groups. (4P)

Now consider a different dataset where the inverse of the matrix $W$ corresponding to the sum of squares within groups, and the matrix $B$ corresponding to the sum of squares between groups, are given by
$W^{-1} = [\ \ldots\ ]$, $B = [\ \ldots\ ]$.

e) In Fisher discriminant analysis the maximum value of $\frac{a^T B a}{a^T W a}$ over all $a \in \mathbb{R}^2$ is needed. Calculate the value of $\max_{a \in \mathbb{R}^2} \frac{a^T B a}{a^T W a}$. Hint: there is no need to calculate the vector $a$ that maximizes $\frac{a^T B a}{a^T W a}$. (5P)
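The quantities in Problem 4 can be sketched as follows, assuming the usual definitions $E_n = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ and $B = \sum_g n_g (\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})^T$. For part e) the sketch relies on the fact that the maximum of $\frac{a^T B a}{a^T W a}$ over $a \neq 0$ is the largest eigenvalue of $W^{-1}B$ (for positive definite $W$), which for 2×2 matrices follows from the trace and determinant. The dataset in the test is hypothetical and the function names are illustrative.

```python
import math
from fractions import Fraction as F


def centering_matrix(n):
    """Assumed definition E_n = I_n - (1/n) 1 1^T."""
    return [[F(int(i == j)) - F(1, n) for j in range(n)] for i in range(n)]


def mean(vectors):
    """Exact componentwise average (entries assumed to be ints or Fractions)."""
    n = len(vectors)
    return [F(sum(v[k] for v in vectors), n) for k in range(len(vectors[0]))]


def between_groups(data, labels):
    """B = sum_g n_g (x̄_g - x̄)(x̄_g - x̄)^T."""
    xbar = mean(data)
    p = len(xbar)
    B = [[F(0)] * p for _ in range(p)]
    for g in sorted(set(labels)):
        members = [x for x, y in zip(data, labels) if y == g]
        diff = [a - b for a, b in zip(mean(members), xbar)]
        for i in range(p):
            for j in range(p):
                B[i][j] += len(members) * diff[i] * diff[j]
    return B


def max_fisher_ratio_2x2(w_inv, b):
    """max over a != 0 of (a^T B a)/(a^T W a), i.e. the largest eigenvalue of W^{-1} B.

    The eigenvalues are real when W is positive definite and B is symmetric.
    """
    a = [[sum(w_inv[i][k] * b[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # Larger root of λ^2 - tr λ + det = 0; max(..., 0) guards against rounding.
    return (tr + math.sqrt(max(tr * tr - 4 * det, 0))) / 2
```

Only the 2×2 eigenvalue is needed for part e), which matches the hint that the maximizing vector $a$ itself is not required.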

Problem 5. Support Vector Machines: Suppose that a training dataset is composed of vectors $x_i \in \mathbb{R}^3$, $i = 1, \ldots, 6$, belonging to two classes. The class membership is indicated by the labels $y_i \in \{-1, +1\}$. Suppose that the dataset is non-separable. A support vector machine is used to find the maximum-margin hyperplane by solving the following dual problem:
$$
\max_{\lambda} \sum_{i=1}^{6} \lambda_i - \frac{1}{2} \sum_{i=1}^{6} \sum_{j=1}^{6} y_i y_j \lambda_i \lambda_j x_i^T x_j \quad \text{s.t.} \quad 0 \le \lambda_i \le 3 \ \text{ and } \ \sum_{i=1}^{6} \lambda_i y_i = 0.
$$
The dataset and the outputs of the optimization problem are given in the following table.

Data Label Solution Data Label Solution 0 x = y 0 = λ = 3 x 4 = y 4 = λ 4 = 3 x 2 = y 0 2 = λ 2 = x 5 = y 5 = λ 5 = x 3 = y 3 3 = λ 3 = 0 x 6 = y 2 6 = λ 6 =

a) Determine the support vectors. (4P)
b) Find the maximum-margin hyperplane $a^T x + b$ by finding $a$ and $b$. (6P)
c) For the above $a$ and $b$, suppose that a soft classifier is used for the support vector machine based classification, given by $d(x) = h(a^T x + b)$ with
$$
h(t) = \begin{cases} -1, & t < -1, \\ t, & -1 \le t \le 1, \\ 1, & t > 1. \end{cases}
$$
Determine the value of $d(x_3)$. (2P)
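Given a dual solution, the primal quantities follow from the KKT conditions: the support vectors are the $x_i$ with $\lambda_i > 0$, $a = \sum_i \lambda_i y_i x_i$, and $b = y_i - a^T x_i$ for any support vector with $0 < \lambda_i < C$, where $C$ denotes the upper bound of the box constraint. The sketch below averages $b$ over all such vectors for numerical robustness, which is one common choice rather than the only one. The toy data in the test is hypothetical, and `svm_from_dual` and `d` are illustrative names.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def svm_from_dual(xs, ys, lams, c, tol=1e-9):
    """Recover support vectors, a and b of a soft-margin SVM from a dual solution.

    xs: training vectors, ys: labels in {-1, +1}, lams: dual variables,
    c: upper bound of the box constraint 0 <= λ_i <= c.
    """
    support = [i for i, lam in enumerate(lams) if lam > tol]            # λ_i > 0
    a = [sum(lams[i] * ys[i] * xs[i][k] for i in support) for k in range(len(xs[0]))]
    margin = [i for i in support if lams[i] < c - tol]                  # 0 < λ_i < c
    if not margin:
        raise ValueError("no support vector strictly inside the box; b is not pinned down")
    # On the margin y_i (a^T x_i + b) = 1, i.e. b = y_i - a^T x_i since y_i^2 = 1.
    b = sum(ys[i] - dot(a, xs[i]) for i in margin) / len(margin)
    return support, a, b


def h(t):
    """Soft output: -1 for t < -1, t for -1 <= t <= 1, and 1 for t > 1."""
    return max(-1.0, min(1.0, t))


def d(x, a, b):
    """Soft classifier d(x) = h(a^T x + b)."""
    return h(dot(a, x) + b)
```

Support vectors with $\lambda_i = C$ are margin violators in the non-separable case, which is why they are excluded when computing $b$.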

Problem 6. Kernels for SVM:

a) Suppose that the mapping $K : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}$ is a valid kernel function for support vector machines. Prove that for any $x_1, \ldots, x_n \in \mathbb{R}^p$, the matrix $K = (K(x_i, x_j))_{i,j=1,\ldots,n}$, defined below, is non-negative definite. (3P)
$$
K = \begin{pmatrix}
K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_n) \\
K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_n) \\
\vdots & \vdots & \ddots & \vdots \\
K(x_n, x_1) & K(x_n, x_2) & \cdots & K(x_n, x_n)
\end{pmatrix}
$$
b) Suppose that a kernel is given by $K(x, y) = (x^T y)^2 + (x^T y)$ for $x, y \in \mathbb{R}^p$. Find the feature function for this kernel. Determine the dimension of the feature space. (6P)
c) Specify the optimization problem for a kernel-based support vector machine using the kernel $K(x, y) = (2 x^T y + 1)^3$. (4P)
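For part b), expanding $(x^T y)^2 = \sum_{i,j} x_i x_j y_i y_j$ gives an explicit feature map: all $p^2$ products $x_i x_j$ followed by the $p$ coordinates $x_i$, so $\langle \phi(x), \phi(y) \rangle = K(x, y)$ in a space of dimension $p^2 + p$. Merging each symmetric pair $x_i x_j$, $x_j x_i$ ($i \neq j$) into one coordinate scaled by $\sqrt{2}$ gives an equivalent map of dimension $p(p+1)/2 + p$. The sketch below checks the expansion numerically and evaluates $c^T K c = \|\sum_i c_i \phi(x_i)\|^2 \ge 0$, the identity behind part a). The function names and test vectors are illustrative assumptions.

```python
from itertools import product


def phi(x):
    """Explicit feature map: all p^2 products x_i x_j, then the p coordinates x_i."""
    return [xi * xj for xi, xj in product(x, x)] + list(x)


def kernel(x, y):
    """K(x, y) = (x^T y)^2 + x^T y."""
    s = sum(a * b for a, b in zip(x, y))
    return s * s + s


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def gram_quadratic_form(xs, c):
    """c^T K c for the Gram matrix K = (K(x_i, x_j)); equals ||sum_i c_i phi(x_i)||^2."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(c, xs) for cj, xj in zip(c, xs))


if __name__ == "__main__":
    x, y = (1, 2, 3), (4, -1, 2)       # arbitrary test vectors
    print(len(phi(x)), dot(phi(x), phi(y)), kernel(x, y))
```

For part c), the dual problem keeps the same form as in Problem 5, with every inner product $x_i^T x_j$ replaced by $K(x_i, x_j)$.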

Problem 7. Clustering: The set $\Phi = \{x_i \mid i = 1, \ldots, 6\}$ contains 2-dimensional data which belong to 2 clusters $C \in \{1, 2\}$, with $x_1 = \ldots$, $x_2 = \ldots$, $x_3 = \ldots$, $x_4 = \ldots$, $x_5 = \ldots$, $x_6 = \ldots$. The k-means clustering algorithm is used to cluster the samples in $\Phi$.

a) At a certain iteration, $\ldots$ and $x_3$ are the centers of cluster 1 and cluster 2, respectively. Assign each data sample in $\Phi$ to the appropriate cluster. (4P)
b) Update the centers of the clusters according to the assignment in (a). (1P)
c) Suppose that the Euclidean distance in the k-means clustering algorithm is replaced by the following distances for any $x, y \in \Phi$, with $x = (x_1, x_2)^T$ and $y = (y_1, y_2)^T$:
$$
d_1(x, y) = \|x - y\|_1 = |x_1 - y_1| + |x_2 - y_2|, \qquad d_\infty(x, y) = \|x - y\|_\infty = \max(|x_1 - y_1|, |x_2 - y_2|).
$$
Assign the data samples in $\Phi$ to the appropriate cluster, assuming $x_1$ and $x_3$ are the centers of cluster 1 and cluster 2, respectively. (8P)
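One k-means iteration alternates an assignment step (nearest center under the chosen distance) and an update step (mean of each cluster). The sketch below swaps in $d_1$ or $d_\infty$ only in the assignment step, as part c) describes; the points in the test are hypothetical and chosen so that the distances disagree on one sample. Clusters are indexed from 0 in the code, ties go to the lower index, and the names `assign`, `update` and `METRICS` are illustrative; all of these are assumptions of the sketch.

```python
import math

# Distances from the problem: Euclidean, d_1 (Manhattan) and d_inf (maximum).
METRICS = {
    "l2": lambda x, y: math.dist(x, y),
    "l1": lambda x, y: sum(abs(a - b) for a, b in zip(x, y)),
    "linf": lambda x, y: max(abs(a - b) for a, b in zip(x, y)),
}


def assign(points, centers, metric="l2"):
    """Assignment step: index of the nearest center (ties go to the lower index)."""
    dist = METRICS[metric]
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k])) for p in points]


def update(points, labels, k):
    """Update step: each center becomes the mean of its assigned points.

    Assumes no cluster is empty.
    """
    centers = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return centers
```

Strictly, the mean minimizes squared Euclidean distance; with $d_1$ the coordinate-wise median would be the matching center update, but the sketch keeps the mean, following part b).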

Problem 8. Regression: The table below presents the total car sales (in thousand euros) of a certain car dealer for January, February, March, May and June 2017. It also shows whether the car dealer sold at least one car of type XXX in a given month, indicated by $b = 1$, or no car of type XXX, indicated by $b = 0$.

Months (m) | Total car sales (s) | Indicator (b) for car type XXX

a) Find a linear model representing the sales $s$ as a function of $m$ using the linear regression algorithm. (7P)
b) Use the linear regression model obtained in a) to estimate the amount of sales in October 2017. (1P)

Suppose that the logistic regression algorithm is used to find the logistic regression coefficients for car type XXX, indicated by $b$, as a function of $m$. Assuming the learning parameter $\alpha = 0.\ldots$, the logistic regression coefficients at a certain iteration $k$ are given by $\nu^{(k)} = (\nu_0^{(k)}, \nu_1^{(k)})$. The probabilities of selling at least one XXX car in the recorded months for $(\nu_0^{(k)}, \nu_1^{(k)}) = ( 0.02, 0.04)$ are given by:
$h_{\nu^{(k)}}(1) = \ldots$, $h_{\nu^{(k)}}(2) = 0.55$, $h_{\nu^{(k)}}(3) = \ldots$, $h_{\nu^{(k)}}(5) = \ldots$, $h_{\nu^{(k)}}(6) = \ldots$

c) Find the new update $\nu^{(k+1)} = (\nu_0^{(k+1)}, \nu_1^{(k+1)})$. (3P)
d) For the given $(\nu_0^{(k)}, \nu_1^{(k)}) = ( 0.02, 0.04)$, find the probability of selling at least one XXX car in October 2017, i.e., determine $h_{\nu^{(k)}}(10)$. (1P)
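Part a) is ordinary least squares with one regressor, $\hat\beta_1 = \sum_i (m_i - \bar m)(s_i - \bar s) / \sum_i (m_i - \bar m)^2$ and $\hat\beta_0 = \bar s - \hat\beta_1 \bar m$, and parts c) and d) use the logistic model $h_\nu(m) = 1/(1 + e^{-(\nu_0 + \nu_1 m)})$. The update in the sketch assumes batch gradient ascent on the log-likelihood without a $1/n$ factor, $\nu^{(k+1)} = \nu^{(k)} + \alpha \sum_i (b_i - h_{\nu^{(k)}}(m_i))(1, m_i)^T$; the course may normalize differently. The sales figures in the usage and test are hypothetical, and the function names are illustrative.

```python
import math


def linear_fit(ms, ss):
    """Ordinary least squares for s ≈ β0 + β1 m with a single regressor m."""
    n = len(ms)
    m_bar, s_bar = sum(ms) / n, sum(ss) / n
    sxy = sum((m - m_bar) * (s - s_bar) for m, s in zip(ms, ss))
    sxx = sum((m - m_bar) ** 2 for m in ms)
    beta1 = sxy / sxx
    return s_bar - beta1 * m_bar, beta1


def h(nu, m):
    """Logistic model h_ν(m) = 1 / (1 + exp(-(ν0 + ν1 m)))."""
    return 1.0 / (1.0 + math.exp(-(nu[0] + nu[1] * m)))


def logistic_step(nu, ms, bs, alpha):
    """One batch gradient-ascent step on the log-likelihood (no 1/n factor, an assumption):
    ν ← ν + α Σ_i (b_i − h_ν(m_i)) (1, m_i)."""
    r = [b - h(nu, m) for m, b in zip(ms, bs)]     # residuals b_i - h_ν(m_i)
    return (nu[0] + alpha * sum(r),
            nu[1] + alpha * sum(ri * m for ri, m in zip(r, ms)))


if __name__ == "__main__":
    months = [1, 2, 3, 5, 6]                 # Jan, Feb, Mar, May, Jun
    sales = [10, 12, 14, 18, 20]             # HYPOTHETICAL sales figures
    b0, b1 = linear_fit(months, sales)
    print("October estimate:", b0 + b1 * 10)
```

October corresponds to $m = 10$ in this month numbering, so both the sales estimate in b) and the probability in d) evaluate the fitted models at 10.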
