Clustering VS Classification


MCQ Clustering vs Classification

1. What is the relation between the distance between clusters and the corresponding class discriminability?
   a. proportional  b. inversely proportional  c. no relation
   Ans: (d)

2. To measure the density at a point, consider a
   a. sphere of any size  b. sphere of unit volume  c. hyper-cube of unit volume  d. both (b) and (c)

3. Agglomerative clustering falls under which type of clustering method?
   a. partition  b. hierarchical  c. none of the above
   Ans: (c)

4. Indicate which is/are a method of clustering.
   a. linkage method  b. split and merge  c. both a and b  d. neither a nor b

5. K-means and K-medoids are examples of which type of clustering method?
   a. hierarchical  b. partition  c. probabilistic  d. none of the above

6. Unsupervised classification can be termed as
   a. distance measurement  b. dimensionality reduction  c. clustering  d. none of the above
   Ans: (d)  Ans: (c)

7. Indicate which one is a method of density estimation.
   a. histogram based  b. branch and bound procedure  c. neighborhood distance  d. all of the above
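
The clustering questions above contrast partitional methods (k-means, k-medoids) with hierarchical ones (agglomerative, linkage-based). As a small illustration, here is a hedged NumPy sketch of plain k-means, the partitional method referenced in question 5; the data, the value of k, and the seed are made up for the example.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # partition step: assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# two well-separated synthetic blobs; k-means should recover the two groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```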

MCQ Linear Algebra

1. Which of the properties are true for matrix multiplication?
   a. distributive  b. commutative  c. both a and b  d. neither a nor b
   Ans: (c)

2. Which of the operations can be valid with two matrices of different sizes?
   a. addition  b. subtraction  c. multiplication  d. division
   Ans: (c)

3. Which of the following statements are true?
   a. trace(A) = trace(A^T)  b. det(A) = det(A^T)  c. both a and b  d. neither a nor b

4. Which property ensures that the inverse of a matrix exists?
   a. determinant is non-zero  b. determinant is zero  c. matrix is square  d. trace of the matrix is a positive value

5. Identify the correct order from general to specific matrix.
   a. square -> identity -> symmetric -> diagonal
   b. symmetric -> diagonal -> square -> identity
   c. square -> diagonal -> identity -> symmetric

   d. square -> symmetric -> diagonal -> identity
   Ans: (d)  Ans: (d)

6. Which of the statements are true?
   a. If A is a symmetric matrix, inv(A) is also symmetric
   b. det(inv(A)) = 1/det(A)
   c. If A and B are invertible matrices, AB is an invertible matrix too
   d. all of the above
   Ans: (d)

7. Which of the following options hold true?
   a. inv(inv(A)) = A  b. inv(kA) = inv(A)/k  c. inv(A^T) = inv(A)^T  d. all of the above
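
Several of the matrix identities quizzed above (trace and determinant under transposition, properties of inverses) are easy to check numerically. The following NumPy snippet is an illustrative verification on a random matrix; the matrix and the scalar k are arbitrary choices, not values from the quiz.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))   # a generic (almost surely invertible) matrix
k = 3.0

assert np.isclose(np.trace(A), np.trace(A.T))                      # trace(A) = trace(A^T)
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))            # det(A) = det(A^T)
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))
assert np.allclose(np.linalg.inv(np.linalg.inv(A)), A)             # inv(inv(A)) = A
assert np.allclose(np.linalg.inv(k * A), np.linalg.inv(A) / k)     # inv(kA) = inv(A)/k
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)         # inv(A^T) = inv(A)^T
print("all identities hold numerically")
```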

MCQ Eigenvalues and Eigenvectors

1. The eigenvalues of the matrix [2 7; 1 6] are
   a. 3 and 0  b. -2 and 7  c. -5 and 1  d. 3 and -5
   Ans: (c)

2. The eigenvalues of [matrix not shown] are
   a. -1, 1 and 2  b. 1, 1 and -2  c. -1, -1 and 2  d. 1, 1 and 2
   Ans: (c)

3. The eigenvectors of [matrix not shown] are
   a. (1 1 1), (1 0 1) and (1 1 0)
   b. (1 1 -1), (1 0 -1) and (1 1 0)
   c. (-1 1 -1), (1 0 1) and (1 1 0)
   d. (1 1 1), (-1 0 1) and (-1 1 0)
   Ans: (d)  Ans: (c)

4. Indicate which of the statements are true.
   a. A and A*A have the same eigenvectors
   b. If m is an eigenvalue of A, then m^2 is an eigenvalue of A*A
   c. both a and b  d. neither a nor b

5. Indicate which of the statements are true.
   a. If m is an eigenvalue of A, then m is an eigenvalue of A^T

   b. If m is an eigenvalue of A, then 1/m is an eigenvalue of inv(A)
   c. both a and b  d. neither a nor b
   Ans: (c)

6. Indicate which of the statements are true.
   a. A singular matrix must have a zero eigenvalue
   b. A singular matrix must have a negative eigenvalue
   c. A singular matrix must have a complex eigenvalue
   d. all of the above
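
The eigenvalue facts asked about in questions 4-6 can likewise be sanity-checked numerically. This is a small illustrative sketch with arbitrarily chosen matrices: eigenvalues square under A*A, invert under inv(A), and a singular matrix has a zero eigenvalue.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
evals = np.linalg.eigvals(A)

# if m is an eigenvalue of A, then m^2 is an eigenvalue of A @ A
assert np.allclose(sorted(evals**2), sorted(np.linalg.eigvals(A @ A)))
# if m is an eigenvalue of A, then 1/m is an eigenvalue of inv(A)
assert np.allclose(sorted(1.0 / evals), sorted(np.linalg.eigvals(np.linalg.inv(A))))

S = np.array([[1.0, 2.0], [2.0, 4.0]])        # rank 1, hence singular
assert np.isclose(np.linalg.eigvals(S).min(), 0.0)   # singular => a zero eigenvalue
print("eigenvalue properties verified")
```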

MCQ Vector Spaces

1. Which of these is a vector space?
   a. {(x, y, z, w) in R^4 : x + y - z + w = 0}
   b. {(x, y, z) in R^3 : x + y + z = 0}
   c. {(x, y, z) in R^3 : x^2 + y^2 + z^2 = 1}
   d. the set of 2x2 matrices [a 1; b c] with a, b, c in R
   Ans: (d)  Ans: (d)

2. Under which of the following operations is {(x, y) : x, y in R} a vector space?
   a. (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2) and r.(x, y) = (rx, y)
   b. (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2) and r.(x, y) = (rx, 0)
   c. both a and b  d. neither a nor b

3. Which of the following statements are true?
   a. r.v = 0 if and only if r = 0
   b. r1.v = r2.v if and only if r1 = r2
   c. the set of all matrices under the usual operations is not a vector space
   d. all of the above

4. What is the dimension of the subspace H = {(a - 3b + 6c, 5a + 4d, b - 2c - d, 5d) : a, b, c, d in R}?
   a. 1  b. 2  c. 3  d. 4
   Ans: (c)

5. What is the rank of the matrix [matrix not shown]?

   a. 2  b. 3  c. 4  d. [option missing]

6. If v1, v2, v3, v4 are in R^4 and v3 is not a linear combination of v1, v2, v4, then {v1, v2, v3, v4} must be linearly independent.
   a. True
   b. False. For example, if v4 = v1 + v2, then 1*v1 + 1*v2 + 0*v3 - 1*v4 = 0.

7. The vectors x1, x2, x3 [components not shown] are:
   a. linearly dependent  b. linearly independent
   (Because 2*x1 + x2 - x3 = 0.)

8. The vectors x1 = (1, 2) and x2 = (5, 3) are:
   a. linearly dependent  b. linearly independent
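
For the linear-independence questions above, a quick numerical check is to stack the vectors as columns and compare the matrix rank with the number of vectors. The sketch below is illustrative only; the vectors v1..v4 are made-up stand-ins echoing question 6, not the (partly missing) vectors of questions 7-8.

```python
import numpy as np

def independent(*vectors):
    """Vectors are linearly independent iff the rank equals the number of vectors."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

# echoing question 6: if v4 = v1 + v2, the set is dependent even though
# v3 is not a combination of the others
v1 = np.array([1.0, 0.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0, 0.0])
v3 = np.array([0.0, 0.0, 1.0, 0.0])
v4 = v1 + v2

print(independent(v1, v2, v3, v4))   # False -> the statement in question 6 is false
print(independent(v1, v2, v3))       # True
```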

Rank and SVD MCQ

1. The number of non-zero rows in an echelon form is called the
   a. reduced echelon form  b. rank of the matrix  c. conjugate of the matrix  d. cofactor of the matrix

2. Let A and B be arbitrary m x n matrices. Which one of the following statements is true?
   a. rank(A + B) <= rank(A) + rank(B)
   b. rank(A + B) < rank(A) + rank(B)
   c. rank(A + B) >= rank(A) + rank(B)
   d. rank(A + B) > rank(A) + rank(B)

3. The rank of the matrix [matrix not shown] is
   a. 0  b. 2  c. 1  d. [option missing]

4. The rank of [matrix not shown] is
   a. 3  b. 2  c. 1  d. 0
   Ans: (c)

5. Consider the following two statements:
   I.  The maximum number of linearly independent column vectors of a matrix A is called the rank of A.
   II. If A is an n x n square matrix, it will be nonsingular if rank(A) = n.
   With reference to the above statements, which of the following applies?
   a. Both statements are false  b. Both statements are true  c. I is true but II is false  d. I is false but II is true

6. The rank of a 3 x 3 matrix C (= AB), found by multiplying a non-zero column matrix A of size 3 x 1 and a non-zero row matrix B of size 1 x 3, is
   a. 0  b. 1  c. 2  d. [option missing]

7. Find the singular values of the matrix B = [matrix not shown].
   a. 2 and 4  b. 3 and 4  c. 2 and 3  d. 3 and 1
   Ans: (d)

8. The Gram-Schmidt process involves factorizing a matrix as a product of two matrices, where
   a. one is orthogonal and the other is upper-triangular
   b. both are symmetric
   c. one is symmetric and the other is anti-symmetric
   d. one is diagonal and the other is symmetric

9. SVD is defined as A = U Σ V^T, where U consists of the eigenvectors of
   a. A A^T  b. A^T A  c. A A^-1  d. A*A

10. SVD is defined as A = U Σ V^T, where Σ is
   a. a diagonal matrix having singular values
   b. a diagonal matrix having arbitrary values
   c. an identity matrix
   d. a non-diagonal matrix
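
Questions 7-10 concern the SVD A = U Σ V^T and the Gram-Schmidt (QR) factorization. The NumPy sketch below illustrates both on an arbitrary matrix (not the matrix B of question 7, which is missing from the source): the squared singular values equal the eigenvalues of A A^T, the columns of U and V are eigenvectors of A A^T and A^T A respectively, and QR yields an orthogonal factor times an upper-triangular one.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])     # an arbitrary 2x3 example matrix

U, s, Vt = np.linalg.svd(A)
# squared singular values are the (non-zero) eigenvalues of A A^T
evals = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
assert np.allclose(s**2, evals)

Q, R = np.linalg.qr(A.T)             # Gram-Schmidt style factorization
assert np.allclose(Q.T @ Q, np.eye(Q.shape[1]))   # Q has orthonormal columns
assert np.allclose(np.triu(R), R)                  # R is upper triangular
print("singular values:", s)
```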

MCQ Normal Distribution and Decision Boundary I

1. The three components of the Bayes decision rule are the class prior, the likelihood and the
   a. evidence  b. instance  c. confidence  d. salience

2. The Gaussian function is also called the ______ function.
   a. bell  b. signum  c. fixed point  d. quintic

3. The span of the Gaussian curve is determined by the ______ of the distribution.
   a. mean  b. mode  c. median  d. variance
   Ans: (d)

4. When the value of the data equals the mean of the distribution to which it belongs, the Gaussian function attains its ______ value.
   a. minimum  b. maximum  c. zero  d. none of the above

5. The full width of the Gaussian function at half the maximum is
   a. 2.35σ  b. 1.5σ  c. 0.5σ  d. [value missing]σ

6. A property of the correlation coefficient is
   a. -1 <= ρ_xy <= 1  b. 0.5 <= ρ_xy <= 1  c. 1 <= ρ_xy <= 1.5  d. 0.5 <= ρ_xy <= [bound missing]

7. The correlation coefficient can be viewed as the ______ of the angle between two vectors in R^D.
   a. sin  b. cos  c. tan  d. sec

8. For n-dimensional data, the number of correlation coefficients is equal to
   a. nC2  b. n-1  c. n^2  d. log(n)

9. Iso-contour lines of smaller radius depict ______ values of the density function.
   a. higher  b. lower  c. equal  d. none of the above
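
Two of the facts above, the full width at half maximum of a Gaussian (about 2.35σ, question 5) and the correlation coefficient as the cosine of the angle between centered data vectors (questions 6-7), can be illustrated with a short sketch; the sample data below are synthetic.

```python
import numpy as np

sigma = 2.0
fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma
print(fwhm / sigma)                       # ~2.3548, i.e. FWHM ~ 2.35 * sigma

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.7 * x + 0.3 * rng.standard_normal(500)
xc, yc = x - x.mean(), y - y.mean()       # centered data vectors
rho = np.corrcoef(x, y)[0, 1]
cos_angle = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
print(np.isclose(rho, cos_angle))         # True: rho = cos(theta), hence -1 <= rho <= 1
```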

MCQ Normal Distribution and Decision Boundary II

1. If the covariance matrix is strictly diagonal with equal variances, the iso-contour lines (data scatter) of the data resemble
   a. concentric circles  b. an ellipse  c. an oriented ellipse  d. none of the above

2. The nature of the decision boundary is determined by the
   a. decision rule  b. decision boundary  c. discriminant function  d. none of the above
   Ans: (c)

3. In supervised learning, the class labels of the training samples are
   a. known  b. unknown  c. doesn't matter  d. partially known

4. If learning is online, then it is called
   a. supervised  b. unsupervised  c. semi-supervised  d. none of the above

5. In supervised learning, the process of learning is
   a. online  b. offline  c. partially online and offline  d. doesn't matter

6. For spiral data, the decision boundary will be
   a. linear  b. non-linear  c. does not exist

7. In a 2-class problem, if the discriminant functions satisfy g1(x) = g2(x), then the data point lies
   a. on the decision boundary  b. on class 1's side  c. on class 2's side  d. none of the above

Bayes Theorem MCQ

1. P(X) P(w_i | X) =
   a. P^-1(X) P(w_i | X)  b. P(X) P^-1(w_i | X)  c. P(X | w_i) P(w_i)  d. P(X | w_i) P(w_i | X)
   Ans: (c)

2. In Bayes theorem, the unconditional probability is called the
   a. evidence  b. likelihood  c. prior  d. posterior

3. In Bayes theorem, the class-conditional probability is called the
   a. evidence  b. likelihood  c. prior  d. posterior

4. When the covariance term in the Mahalanobis distance becomes the identity, the distance reduces to the
   a. Euclidean distance  b. Manhattan distance  c. city block distance  d. geodesic distance

5. The decision boundary for N-dimensional (N > 3) data will be a
   a. point  b. line  c. plane  d. hyperplane
   Ans: (d)

6. The Bayes error is the ______ bound on the probability of classification error.
   a. lower  b. upper

7. The Bayes decision rule is theoretically the ______ classifier, in that it minimizes the probability of classification error.
   a. best  b. worst  c. average
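
The Bayes-rule relation of question 1, P(w_i | X) = P(X | w_i) P(w_i) / P(X), and the Mahalanobis-distance fact of question 4 are summarized in the sketch below. The priors, likelihoods, and vectors are made-up illustrative values, not quantities from the quiz.

```python
import numpy as np

priors = np.array([0.6, 0.4])             # P(w_i): class priors (made-up values)
likelihoods = np.array([0.2, 0.5])        # P(x | w_i) at some observed x
evidence = np.sum(likelihoods * priors)   # P(x), the unconditional probability
posteriors = likelihoods * priors / evidence
print(posteriors, posteriors.sum())       # posteriors sum to 1

x = np.array([1.0, 2.0, 3.0])
mu = np.array([0.0, 1.0, 1.0])
cov = np.eye(3)                           # identity covariance
mahalanobis = np.sqrt((x - mu) @ np.linalg.inv(cov) @ (x - mu))
print(np.isclose(mahalanobis, np.linalg.norm(x - mu)))   # True: reduces to Euclidean
```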

MCQ Linear Discriminant Function and Perceptron Learning

1. A perceptron is
   a. a single McCulloch-Pitts neuron
   b. an autoassociative neural network
   c. a double-layer autoassociative neural network
   d. all of the above

2. A perceptron is used as a classifier for
   a. linearly separable data  b. non-linearly separable data  c. linearly non-separable data  d. any data

3. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear, with the constant of proportionality equal to 2. The inputs are 4, 10, 5 and 20 respectively. The output will be
   a. 238  b. 76  c. 119  d. [option missing]

4. Consider a perceptron whose training samples uu are in R^2, with transfer function
       f(a) = +1 for a > 0,  0 for a = 0,  -1 for a < 0.
   Let the desired output y be +1 when an element of class A = {(1,2), (2,4), (3,3), (4,4)} is applied as input, and -1 for class B = {(0,0), (2,3), (3,0), (4,2)}. Let the initial connection weights be w0(0) = +1, w1(0) = -2, w2(0) = +1, and let the learning rate be η = 0.5.

   This perceptron is to be trained by the perceptron convergence procedure, for which the weight update formula is w(t+1) = w(t) + η (y - f(a)) uu, where f(a) is the actual output.

   A. If u = (4,4) is applied as input, then w(1) = ?
      a. [2,2,5]^T  b. [2,1,5]^T  c. [2,1,1]^T  d. [2,0,5]^T

   B. If (4,2) is then applied, what will w(2) be?
      a. [1,-2,3]^T  b. [-1,-2,3]^T  c. [1,-2,-3]^T  d. [1,2,3]^T

5. The perceptron training rule converges if the data is
   a. linearly separable  b. non-linearly separable  c. linearly non-separable  d. any data

6. Is the XOR problem solvable using a single perceptron?
   a. yes  b. no  c. can't say

7. Consider a perceptron whose training samples uu are in R^2 and whose actual output x is in {0,1}. Let the desired output be 0 when an element of class A = {(2,4), (3,2), (3,4)} is applied as input, and 1 for class B = {(1,0), (1,2), (2,1)}. Let the learning rate be η = 0.5 and the initial connection weights be w0 = 0, w1 = 1, w2 = 1. Answer the following questions:

   A. Will the perceptron convergence procedure terminate if the input patterns from classes A and B are repeatedly applied with a very small learning rate?

      a. yes  b. no  c. can't say
      (Since the classes are linearly separable.)

   B. Now add the sample (5,2) to class B. What is your answer now, i.e. will it converge or not?
      a. yes  b. no  c. can't say
      (After adding the above sample, the classes become non-linearly separable.)
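
The weight updates asked for in question 4 (parts A and B) can be traced in a few lines of code. The sketch below assumes the augmented input uu = (1, u1, u2) and the stated update rule w(t+1) = w(t) + η (y - f(a)) uu; it reproduces option (a) in both parts.

```python
import numpy as np

def f(a):                        # signum-like transfer function from the question
    return np.sign(a)            # +1, 0 or -1

w = np.array([1.0, -2.0, 1.0])   # initial weights w0(0), w1(0), w2(0)
eta = 0.5

u = np.array([1.0, 4.0, 4.0])    # sample (4,4) from class A, desired output y = +1
w = w + eta * (1 - f(w @ u)) * u
print(w)                         # [2. 2. 5.]  -> option (a) in part A

u = np.array([1.0, 4.0, 2.0])    # sample (4,2) from class B, desired output y = -1
w = w + eta * (-1 - f(w @ u)) * u
print(w)                         # [ 1. -2.  3.] -> option (a) in part B
```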

MCQ Linear and Non-Linear Decision Boundaries

1. The decision boundary in the case of the same covariance matrix for all classes, with identical diagonal elements, is
   a. linear  b. non-linear  c. none of the above

2. The decision boundary in the case of a diagonal covariance matrix with identical diagonal elements is given by W^T (X - X_0) = 0, where W is given by
   a. (μ_k - μ_l) / σ^2  b. (μ_k + μ_l) / σ^2  c. (μ_k^2 + μ_l^2) / σ^2  d. (μ_k + μ_l) / σ

3. The decision boundary in the case of an arbitrary covariance matrix that is identical for all classes is
   a. linear  b. non-linear  c. none of the above

4. The decision boundary in the case of an arbitrary covariance matrix that is identical for all classes is given by W^T (X - X_0) = 0, where W is given by
   a. (μ_k - μ_l) / σ^2  b. Σ^-1 (μ_k - μ_l)  c. (μ_k^2 + μ_l^2) / σ  d. Σ (μ_k - μ_l)

5. The decision boundary in the case of arbitrary and unequal covariance matrices is
   a. linear  b. non-linear  c. none of the above

6. The discriminant function in the case of an arbitrary covariance matrix, with all parameters class-dependent, is given by X^T W_i X + w_i^T X + w_i0 = 0, where W_i is given by
   a. -(1/2) Σ_i^-1  b. Σ_i^-1  c. -(1/2) μ_i^T Σ_i^-1 μ_i  d. -(1/4) Σ_i^-1
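
The linear boundary direction quizzed in questions 2 and 4 is W = Σ^-1 (μ_k - μ_l), which reduces to (μ_k - μ_l) / σ^2 when Σ = σ^2 I. A minimal illustrative check, with made-up means and variance:

```python
import numpy as np

mu_k = np.array([2.0, 0.0])      # class-k mean (illustrative values)
mu_l = np.array([0.0, 1.0])      # class-l mean
sigma2 = 0.5
Sigma = sigma2 * np.eye(2)       # shared isotropic covariance

W = np.linalg.inv(Sigma) @ (mu_k - mu_l)
print(np.allclose(W, (mu_k - mu_l) / sigma2))   # True: the two forms coincide
```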

MCQ PCA
Ans: (c)

1. The tool used to obtain a PCA is
   a. LU decomposition  b. QR decomposition  c. SVD  d. Cholesky decomposition

2. PCA is used for
   a. dimensionality enhancement  b. dimensionality reduction  c. both  d. none

3. The scatter matrix of the transformed feature vectors is given by
   a. sum_{k=1}^{N} (x_k - μ)(x_k - μ)^T
   b. sum_{k=1}^{N} (x_k - μ)^T (x_k - μ)
   c. sum_{k=1}^{N} (μ - x_k)(μ - x_k)^T
   d. sum_{k=1}^{N} (μ - x_k)^T (μ - x_k)

4. PCA is used for
   a. supervised classification  b. unsupervised classification  c. semi-supervised classification  d. cannot be used for classification

5. The vectors corresponding to the vanishing singular values of a matrix, which span the null space of the matrix, are the
   a. right singular vectors  b. left singular vectors  c. all of the singular vectors  d. none
   Ans: (d)

6. If S is the scatter of the data in the original domain, then the scatter of the transformed feature vectors is given by
   a. S^T  b. S  c. W S W^T  d. W^T S W

7. The largest eigenvector gives the direction of the
   a. maximum scatter of the data
   b. minimum scatter of the data
   c. no such information can be interpreted
   d. second largest eigenvector, which is in the same direction
   Ans: (d)

8. The following linear transform does not have a fixed set of basis vectors:
   a. DCT  b. DFT  c. DWT  d. PCA

9. The within-class scatter matrix is given by
   a. sum_{i=1}^{C} sum_{k=1}^{N} (x_k - μ_i)(x_k - μ_i)^T
   b. sum_{i=1}^{C} sum_{k=1}^{N} (x_k - μ_i)^T (x_k - μ_i)
   c. sum_{i=1}^{C} sum_{k=1}^{N} (x_i - μ_k)(x_i - μ_k)^T
   d. sum_{i=1}^{C} sum_{k=1}^{N} (x_i - μ_k)^T (x_i - μ_k)

10. The between-class scatter matrix is given by
   a. sum_{i=1}^{C} N_i (μ_i - μ)(μ_i - μ)^T
   b. sum_{i=1}^{C} N_i (μ_i - μ)^T (μ_i - μ)
   c. sum_{i=1}^{C} N_i (μ - μ_i)(μ - μ_i)^T
   d. sum_{i=1}^{C} N_i (μ - μ_i)^T (μ - μ_i)

11. Which of the following is an unsupervised technique?
   a. PCA  b. LDA  c. Bayes  d. none of the above
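
The scatter-matrix formulas above can be made concrete with a short PCA sketch: form S = sum_k (x_k - μ)(x_k - μ)^T, take its leading eigenvector as the principal direction, and note that the scatter of the projected data is W^T S W. The data below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])  # elongated cloud
mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu)            # scatter matrix (sum of outer products)

evals, evecs = np.linalg.eigh(S)     # eigenvalues in ascending order
W = evecs[:, [-1]]                   # principal direction = largest eigenvector
projected_scatter = W.T @ S @ W      # scatter of the transformed feature
print(W.ravel(), projected_scatter.item())
```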

MCQ Linear Discriminant Analysis

1. Linear discriminant analysis is
   a. unsupervised learning  b. supervised learning  c. semi-supervised learning  d. none of the above

2. The following property of the within-class scatter matrix is a must for LDA:
   a. singular  b. non-singular  c. does not matter  d. problem-specific

3. In supervised learning, the class labels of the training samples are
   a. known  b. unknown  c. doesn't matter  d. partially known

4. The upper bound on the number of non-zero eigenvalues of S_w^-1 S_B is (C = number of classes)
   a. C - 1  b. C + 1  c. C  d. none of the above

5. If S_w is singular and N < D, its rank is at most (N = total number of samples, D = dimension of the data, C = number of classes)
   a. N + C  b. N  c. C  d. N - C
   Ans: (d)

6. If S_w is singular and N < D, the alternative solution is to use (N = total number of samples, D = dimension of the data)
   a. EM  b. PCA  c. ML  d. any one of the above
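
For the LDA questions above, the key fact behind question 4 is that S_B has rank at most C - 1, so S_w^-1 S_B has at most C - 1 non-zero eigenvalues. A small illustrative check on synthetic three-class data (all values made up):

```python
import numpy as np

rng = np.random.default_rng(0)
C, n, D = 3, 50, 4                                  # classes, samples per class, dimension
X = [rng.standard_normal((n, D)) + 4 * i for i in range(C)]   # three synthetic classes
mu = np.vstack(X).mean(axis=0)                      # overall mean

# within-class and between-class scatter matrices
Sw = sum((Xi - Xi.mean(axis=0)).T @ (Xi - Xi.mean(axis=0)) for Xi in X)
Sb = sum(n * np.outer(Xi.mean(axis=0) - mu, Xi.mean(axis=0) - mu) for Xi in X)

evals = np.linalg.eigvals(np.linalg.inv(Sw) @ Sb)
print(np.sort(np.abs(evals))[::-1])   # only the first C - 1 = 2 values are non-negligible
```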

MCQ GMM

1. A method to estimate the parameters of a distribution is
   a. maximum likelihood  b. linear programming  c. dynamic programming  d. convex optimization

2. Gaussian mixtures are also known as
   a. Gaussian multiplication  b. non-linear superposition of Gaussians  c. linear superposition of Gaussians  d. none of the above
   Ans: (c)

3. The mixture coefficients of a GMM add up to
   a. 1  b. 0  c. any value greater than 0  d. any value less than 0

4. The mixture coefficients are
   a. strictly positive  b. positive  c. strictly negative  d. negative

5. The mixture coefficients can take a value
   a. greater than zero  b. greater than 1  c. less than zero  d. between zero and 1
   Ans: (d)

6. For Gaussian mixture models, parameters are estimated using a closed-form solution by

   a. expectation minimization  b. expectation maximization  c. maximum likelihood  d. none of the above

7. The latent variable in a GMM is also known as the
   a. prior probability  b. posterior probability  c. responsibility  d. none of the above
   Ans: (b, c)

8. A GMM with K Gaussian mixture components has K covariance matrices, each of dimension
   a. arbitrary  b. K x K  c. D x D (dimension of the data)  d. N x N (number of samples in the dataset)
   Ans: (c)
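
The GMM facts above, mixture coefficients lying in [0, 1] and summing to 1, and the "responsibilities" being component posteriors, are illustrated by this minimal one-dimensional sketch. It shows only the E-step computation, not the full EM algorithm, and the parameter values are made up.

```python
import numpy as np

def gaussian(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

pi = np.array([0.3, 0.7])                 # mixture coefficients: in [0, 1], sum to 1
mus = np.array([-2.0, 3.0])
vars_ = np.array([1.0, 2.0])
assert np.isclose(pi.sum(), 1.0)

x = 0.5                                   # one observed 1-D data point
weighted = pi * gaussian(x, mus, vars_)   # pi_k * N(x | mu_k, var_k)
responsibilities = weighted / weighted.sum()
print(responsibilities, responsibilities.sum())   # posterior over components, sums to 1
```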

References:
1. Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer.
2. Linear Algebra and Its Applications, David C. Lay, Pearson.
3. Pattern Classification, Richard O. Duda, Peter E. Hart, David G. Stork, Wiley.
