6-1. Canonical Correlation Analysis


Canonical correlation analysis focuses on the correlation between a linear combination of the variables in one set and a linear combination of the variables in another set. The key concepts are canonical variables and canonical correlations. Examples: relating arithmetic speed and arithmetic power to reading speed and reading power; governmental policy variables with economic goal variables; college performance variables with precollege achievement variables.

Canonical Variates and Canonical Correlations

Let $X^{(1)}$ be a $p \times 1$ random vector and $X^{(2)}$ a $q \times 1$ random vector with $p \le q$, and set $E(X^{(1)}) = \mu^{(1)}$, $\mathrm{Cov}(X^{(1)}) = \Sigma_{11}$, $E(X^{(2)}) = \mu^{(2)}$, $\mathrm{Cov}(X^{(2)}) = \Sigma_{22}$, and $\mathrm{Cov}(X^{(1)}, X^{(2)}) = \Sigma_{12} = \Sigma_{21}'$. For the linear combinations $U = a'X^{(1)}$ and $V = b'X^{(2)}$, we seek coefficient vectors $a$ and $b$ such that
$$\mathrm{Corr}(U, V) = \frac{a'\Sigma_{12} b}{\sqrt{a'\Sigma_{11} a}\,\sqrt{b'\Sigma_{22} b}}$$
is as large as possible.

The first pair of canonical variables, or first canonical variate pair, is the pair of linear combinations $U_1$ and $V_1$ having unit variances that maximizes the correlation $\mathrm{Corr}(U, V)$. The second pair of canonical variables, or second canonical variate pair, is the pair of linear combinations $U_2$ and $V_2$ having unit variances that maximizes the correlation $\mathrm{Corr}(U, V)$ among all choices that are uncorrelated with the first pair of canonical variables. In general, the $k$th pair of canonical variables, or $k$th canonical variate pair, is the pair of linear combinations $U_k$, $V_k$ having unit variances that maximizes the correlation $\mathrm{Corr}(U, V)$ among all choices uncorrelated with the previous $k-1$ canonical variable pairs. The correlation between the $k$th pair of canonical variables is called the $k$th canonical correlation.

Result 6-1.1: Suppose $p \le q$ and let the $p$-dimensional random vector $X^{(1)}$ and the $q$-dimensional random vector $X^{(2)}$ have $\mathrm{Cov}(X^{(1)}) = \Sigma_{11}$, $\mathrm{Cov}(X^{(2)}) = \Sigma_{22}$, and $\mathrm{Cov}(X^{(1)}, X^{(2)}) = \Sigma_{12}$, where $\Sigma_{12}$ has full rank. For coefficient vectors $a$ ($p \times 1$) and $b$ ($q \times 1$), form the linear combinations $U = a'X^{(1)}$ and $V = b'X^{(2)}$. Then
$$\max_{a,b} \mathrm{Corr}(U, V) = \rho_1,$$
attained by the linear combinations (the first canonical variate pair)
$$U_1 = e_1'\Sigma_{11}^{-1/2} X^{(1)} \quad \text{and} \quad V_1 = f_1'\Sigma_{22}^{-1/2} X^{(2)}.$$
The $k$th pair of canonical variates, $k = 2, 3, \ldots, p$,
$$U_k = e_k'\Sigma_{11}^{-1/2} X^{(1)} \quad \text{and} \quad V_k = f_k'\Sigma_{22}^{-1/2} X^{(2)},$$
maximizes $\mathrm{Corr}(U_k, V_k) = \rho_k$ among those linear combinations uncorrelated with the preceding $1, 2, \ldots, k-1$ canonical variate pairs.

Here $\rho_1^2 \ge \rho_2^2 \ge \cdots \ge \rho_p^2$ are the eigenvalues of $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}$, and $e_1, e_2, \ldots, e_p$ are the associated $p \times 1$ eigenvectors. The quantities $\rho_1^2 \ge \rho_2^2 \ge \cdots \ge \rho_p^2$ are also the $p$ largest eigenvalues of the matrix $\Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1/2}$, with corresponding $q \times 1$ eigenvectors $f_1, f_2, \ldots, f_p$. Each $f_i$ is proportional to $\Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1/2} e_i$. The canonical variates have the properties $\mathrm{Var}(U_k) = \mathrm{Var}(V_k) = 1$ for $k = 1, \ldots, p$, and $\mathrm{Corr}(U_k, U_\ell) = \mathrm{Corr}(V_k, V_\ell) = \mathrm{Corr}(U_k, V_\ell) = 0$ for $k, \ell = 1, 2, \ldots, p$ and $k \ne \ell$.
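As an illustration (not part of the original notes), the following Python sketch computes the canonical correlations and coefficient vectors from given covariance blocks $\Sigma_{11}, \Sigma_{12}, \Sigma_{22}$ (or their sample counterparts), following Result 6-1.1; the function and argument names are hypothetical.

```python
# A sketch (not from the notes): canonical correlations and coefficient vectors
# from the covariance blocks, following Result 6-1.1.  Names are hypothetical.
import numpy as np
from scipy.linalg import fractional_matrix_power, eigh

def canonical_correlations(S11, S12, S22):
    """Return rho (descending) and matrices A, B whose rows are a_k', b_k'."""
    S11_inv_sqrt = fractional_matrix_power(S11, -0.5)           # Sigma11^{-1/2}
    # Sigma11^{-1/2} Sigma12 Sigma22^{-1} Sigma21 Sigma11^{-1/2}
    M = S11_inv_sqrt @ S12 @ np.linalg.solve(S22, S12.T) @ S11_inv_sqrt
    vals, vecs = eigh(M)                                        # ascending eigenvalues
    order = np.argsort(vals)[::-1]                              # rho_1^2 >= ... >= rho_p^2
    rho = np.sqrt(np.clip(vals[order], 0.0, None))
    A = vecs[:, order].T @ S11_inv_sqrt                         # rows a_k' = e_k' Sigma11^{-1/2}
    # b_k is proportional to Sigma22^{-1} Sigma21 a_k, rescaled so b_k' Sigma22 b_k = 1
    B_raw = (np.linalg.solve(S22, S12.T) @ A.T).T
    scale = np.sqrt(np.einsum('ij,jk,ik->i', B_raw, S22, B_raw))
    B = B_raw / np.where(scale > 0, scale, 1.0)[:, None]
    return rho, A, B
```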

Example: Suppose $Z^{(1)} = [Z_1^{(1)}, Z_2^{(1)}]'$ and $Z^{(2)} = [Z_1^{(2)}, Z_2^{(2)}]'$ are standardized variables. Let $Z = [Z^{(1)'}, Z^{(2)'}]'$ with covariance (correlation) matrix $\mathrm{Cov}(Z)$ as given. Calculate the canonical variates and canonical correlations for the standardized variables $Z^{(1)}$ and $Z^{(2)}$.

Example: Compute the correlations between the first pair of canonical variates and their component variables for the situation considered in the preceding example.

Example: Consider the covariance matrix of $[X_1^{(1)}, X_2^{(1)}, X_1^{(2)}, X_2^{(2)}]'$ as given. Calculate the canonical correlations between $[X_1^{(1)}, X_2^{(1)}]'$ and $[X_1^{(2)}, X_2^{(2)}]'$.

The Sample Canonical Variates and Sample Canonical Correlations

Let $\hat\rho_1^2 \ge \hat\rho_2^2 \ge \cdots \ge \hat\rho_p^2$ be the $p$ ordered eigenvalues of $S_{11}^{-1/2} S_{12} S_{22}^{-1} S_{21} S_{11}^{-1/2}$ with corresponding eigenvectors $\hat e_1, \hat e_2, \ldots, \hat e_p$, where $p \le q$, and let $\hat f_1, \ldots, \hat f_p$ be the eigenvectors of $S_{22}^{-1/2} S_{21} S_{11}^{-1} S_{12} S_{22}^{-1/2}$. Then the $k$th sample canonical variate pair is
$$\hat U_k = \hat e_k' S_{11}^{-1/2} x^{(1)}, \qquad \hat V_k = \hat f_k' S_{22}^{-1/2} x^{(2)},$$
where $x^{(1)}$ and $x^{(2)}$ are the values of the variables $X^{(1)}$ and $X^{(2)}$ for a particular experimental unit. Also, for the $k$th pair, $k = 1, \ldots, p$, $r_{\hat U_k, \hat V_k} = \hat\rho_k$. The quantities $\hat\rho_1, \ldots, \hat\rho_p$ are the sample canonical correlations.
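A companion sketch, assuming data matrices X1 ($n \times p$) and X2 ($n \times q$) with $p \le q$, forms the sample covariance blocks and reuses the hypothetical canonical_correlations() function above to obtain the sample canonical correlations and the scores of the sample canonical variates.

```python
# A sketch: sample canonical variates from data matrices X1 (n x p) and X2 (n x q),
# p <= q, reusing canonical_correlations() from the previous sketch.
import numpy as np

def sample_canonical_variates(X1, X2):
    p = X1.shape[1]
    S = np.cov(np.hstack([X1, X2]), rowvar=False)      # unbiased sample covariance
    S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]
    rho_hat, A, B = canonical_correlations(S11, S12, S22)
    U = (X1 - X1.mean(axis=0)) @ A.T                   # columns U_hat_1, ..., U_hat_p
    V = (X2 - X2.mean(axis=0)) @ B.T                   # columns V_hat_1, ..., V_hat_p
    return rho_hat, U, V                               # corr(U[:, k], V[:, k]) ~ rho_hat[k]
```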

Large Sample Inference

Let
$$X_j = \begin{bmatrix} X_j^{(1)} \\ X_j^{(2)} \end{bmatrix}, \qquad j = 1, 2, \ldots, n,$$
be a random sample from an $N_{p+q}(\mu, \Sigma)$ population with
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
Then the likelihood ratio test of $H_0: \Sigma_{12} = 0$ versus $H_1: \Sigma_{12} \ne 0$ rejects $H_0$ for large values of
$$-2\ln\Lambda = n \ln\frac{|S_{11}|\,|S_{22}|}{|S|} = -n \ln \prod_{i=1}^{p} (1 - \hat\rho_i^2),$$
where
$$S = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}$$
is the unbiased estimator of $\Sigma$. For large $n$, the test statistic $-2\ln\Lambda$ is approximately distributed as a chi-square random variable with $pq$ degrees of freedom.
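A minimal sketch of this large-sample test, assuming the sample canonical correlations rho_hat from the previous sketch; the helper name is hypothetical.

```python
# A sketch of the likelihood ratio test of H0: Sigma12 = 0 based on the sample
# canonical correlations; the chi-square approximation holds for large n.
import numpy as np
from scipy.stats import chi2

def test_no_association(rho_hat, n, p, q):
    stat = -n * np.sum(np.log(1.0 - rho_hat**2))   # -2 ln(Lambda)
    p_value = chi2.sf(stat, df=p * q)              # approximate chi-square with pq d.f.
    return stat, p_value
```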


6-2. Discrimination and Classification

Discrimination and classification are multivariate techniques concerned with separating distinct sets of objects (or observations) and with allocating new objects (observations) to previously defined groups.

Goal 1. To describe, either graphically (in three or fewer dimensions) or algebraically, the differential features of objects (observations) from several known collections (populations). We try to find discriminants whose numerical values are such that the collections are separated as much as possible.

Goal 2. To sort objects (observations) into two or more labeled classes. The emphasis is on deriving a rule that can be used to optimally assign new objects to the labeled classes.

Separation and Classification for Two Populations

Allocation or classification rules are usually developed from learning samples. Measured characteristics of randomly selected objects known to come from each of the two populations are examined for differences. Why might we know that some observations belong to a particular population but be unsure about others? Incomplete knowledge of future performance; perfect information may require destroying the object; information may be unavailable or expensive to obtain.


Let $f_1(x)$ and $f_2(x)$ be the probability density functions associated with the $p \times 1$ random vector $X$ for the populations $\pi_1$ and $\pi_2$, respectively. An object with associated measurements $x$ must be assigned to either $\pi_1$ or $\pi_2$. Let $\Omega$ be the complete sample space, let $R_1$ be the set of $x$ values for which we classify objects as $\pi_1$, and let $R_2 = \Omega - R_1$ be the remaining $x$ values, for which we classify objects as $\pi_2$. Thus $R_1 \cup R_2 = \Omega$ and $R_1 \cap R_2 = \emptyset$.

The conditional probability $P(2\,|\,1)$ of classifying an object as $\pi_2$ when, in fact, it is from $\pi_1$ is
$$P(2\,|\,1) = P(X \in R_2 \mid \pi_1) = \int_{R_2 = \Omega - R_1} f_1(x)\,dx.$$
Similarly, the conditional probability $P(1\,|\,2)$ of classifying an object as $\pi_1$ when, in fact, it is from $\pi_2$ is
$$P(1\,|\,2) = P(X \in R_1 \mid \pi_2) = \int_{R_1} f_2(x)\,dx.$$

Let $p_1$ be the prior probability of $\pi_1$ and $p_2$ the prior probability of $\pi_2$, where $p_1 + p_2 = 1$. Then

P(observation is correctly classified as $\pi_1$) $= P(X \in R_1 \mid \pi_1)P(\pi_1) = P(1\,|\,1)p_1$,
P(observation is misclassified as $\pi_1$) $= P(X \in R_1 \mid \pi_2)P(\pi_2) = P(1\,|\,2)p_2$,
P(observation is correctly classified as $\pi_2$) $= P(X \in R_2 \mid \pi_2)P(\pi_2) = P(2\,|\,2)p_2$,
P(observation is misclassified as $\pi_2$) $= P(X \in R_2 \mid \pi_1)P(\pi_1) = P(2\,|\,1)p_1$.

The costs of misclassification are defined by a cost matrix whose rows index the true population and whose columns index the population an observation is classified into:

true $\pi_1$: classify as $\pi_1$ costs $0$, classify as $\pi_2$ costs $c(2\,|\,1)$;
true $\pi_2$: classify as $\pi_1$ costs $c(1\,|\,2)$, classify as $\pi_2$ costs $0$.

The expected cost of misclassification (ECM) is
$$\mathrm{ECM} = c(2\,|\,1)P(2\,|\,1)p_1 + c(1\,|\,2)P(1\,|\,2)p_2.$$
A reasonable classification rule should have an ECM as small, or nearly as small, as possible.

Result: The regions $R_1$ and $R_2$ that minimize the ECM are defined by the values $x$ for which the following inequalities hold:
$$R_1: \quad \frac{f_1(x)}{f_2(x)} \;\ge\; \left(\frac{c(1\,|\,2)}{c(2\,|\,1)}\right)\left(\frac{p_2}{p_1}\right), \qquad \text{(density ratio)} \ge \text{(cost ratio)}\times\text{(prior probability ratio)},$$
$$R_2: \quad \frac{f_1(x)}{f_2(x)} \;<\; \left(\frac{c(1\,|\,2)}{c(2\,|\,1)}\right)\left(\frac{p_2}{p_1}\right), \qquad \text{(density ratio)} < \text{(cost ratio)}\times\text{(prior probability ratio)}.$$
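A sketch of this rule (all names hypothetical, with the densities supplied as callables f1 and f2): the density ratio is compared with the product of the cost ratio and the prior ratio.

```python
# A sketch of the minimum-ECM rule; f1 and f2 are density functions passed as
# callables, c12 = c(1|2) and c21 = c(2|1).
def classify_min_ecm(x, f1, f2, p1, p2, c12, c21):
    """Return 1 if x is allocated to pi_1, otherwise 2."""
    threshold = (c12 / c21) * (p2 / p1)
    return 1 if f1(x) / f2(x) >= threshold else 2
```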


Other classification procedures:

Choose $R_1$ and $R_2$ to minimize the total probability of misclassification (TPM):
$$\mathrm{TPM} = p_1 \int_{R_2} f_1(x)\,dx + p_2 \int_{R_1} f_2(x)\,dx.$$

Allocate a new observation $x_0$ to the population with the largest posterior probability $P(\pi_i \mid x_0)$:
$$P(\pi_1 \mid x_0) = \frac{p_1 f_1(x_0)}{p_1 f_1(x_0) + p_2 f_2(x_0)}, \qquad P(\pi_2 \mid x_0) = 1 - P(\pi_1 \mid x_0) = \frac{p_2 f_2(x_0)}{p_1 f_1(x_0) + p_2 f_2(x_0)}.$$

Classifying an observation $x_0$ as $\pi_1$ when $P(\pi_1 \mid x_0) > P(\pi_2 \mid x_0)$ is equivalent to using the rule above that minimizes the total probability of misclassification.

Classification with Two Multivariate Normal Populations

Assume $f_1(x)$ and $f_2(x)$ are multivariate normal densities, the first with mean vector $\mu_1$ and covariance matrix $\Sigma_1$ and the second with mean vector $\mu_2$ and covariance matrix $\Sigma_2$. Suppose first that $\Sigma_1 = \Sigma_2 = \Sigma$, so that the joint densities of $X = [X_1, X_2, \ldots, X_p]'$ for populations $\pi_1$ and $\pi_2$ are given by
$$f_i(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}(x - \mu_i)'\Sigma^{-1}(x - \mu_i)\right], \qquad i = 1, 2.$$

Result: Let the populations $\pi_1$ and $\pi_2$ be described by multivariate normal densities of the form above. Then the allocation rule that minimizes the ECM is as follows. Allocate $x_0$ to $\pi_1$ if
$$(\mu_1 - \mu_2)'\Sigma^{-1} x_0 - \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \;\ge\; \ln\!\left[\left(\frac{c(1\,|\,2)}{c(2\,|\,1)}\right)\left(\frac{p_2}{p_1}\right)\right];$$
allocate $x_0$ to $\pi_2$ otherwise.

The Estimated Minimum ECM Rule for Two Normal Populations: Allocate $x_0$ to $\pi_1$ if
$$(\bar x_1 - \bar x_2)' S_{\mathrm{pooled}}^{-1} x_0 - \tfrac{1}{2}(\bar x_1 - \bar x_2)' S_{\mathrm{pooled}}^{-1}(\bar x_1 + \bar x_2) \;\ge\; \ln\!\left[\left(\frac{c(1\,|\,2)}{c(2\,|\,1)}\right)\left(\frac{p_2}{p_1}\right)\right];$$
allocate $x_0$ to $\pi_2$ otherwise.
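A sketch of the estimated minimum-ECM rule, assuming sample means xbar1, xbar2 and a pooled covariance matrix S_pooled computed from training data (names hypothetical):

```python
# A sketch of the estimated minimum-ECM rule for two normal populations with a
# common covariance matrix; xbar1, xbar2, S_pooled come from training data.
import numpy as np

def classify_linear(x0, xbar1, xbar2, S_pooled, p1, p2, c12, c21):
    a = np.linalg.solve(S_pooled, xbar1 - xbar2)     # S_pooled^{-1}(xbar1 - xbar2)
    lhs = a @ x0 - 0.5 * a @ (xbar1 + xbar2)
    rhs = np.log((c12 / c21) * (p2 / p1))
    return 1 if lhs >= rhs else 2
```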


Scaling

The coefficient vector $\hat a = S_{\mathrm{pooled}}^{-1}(\bar x_1 - \bar x_2)$ is unique only up to a multiplicative constant, so for $c \ne 0$ any vector $c\hat a$ will also serve as discriminant coefficients. The vector $\hat a$ is frequently scaled or normalized to ease the interpretation of its elements. Two common choices: (1) set $\hat a^* = \hat a / \sqrt{\hat a'\hat a}$; (2) set $\hat a^* = \hat a / \hat a_1$.

Fisher's Approach to Classification with Two Populations

Fisher's idea was to transform the multivariate observations $x$ to univariate observations $y$ such that the $y$'s derived from populations $\pi_1$ and $\pi_2$ are separated as much as possible. Fisher suggested taking linear combinations of $x$ to create the $y$'s because they are simple enough functions of $x$ to be handled easily. Fisher's approach does not assume that the populations are normal, but it does implicitly assume that the population covariance matrices are equal.

Result: The linear combination $\hat y = \hat a' x = (\bar x_1 - \bar x_2)' S_{\mathrm{pooled}}^{-1} x$ maximizes the ratio
$$\frac{(\text{squared distance between sample means of } y)}{(\text{sample variance of } y)} = \frac{(\bar y_1 - \bar y_2)^2}{s_y^2} = \frac{(\hat a'\bar x_1 - \hat a'\bar x_2)^2}{\hat a' S_{\mathrm{pooled}}\,\hat a} = \frac{(\hat a' d)^2}{\hat a' S_{\mathrm{pooled}}\,\hat a}$$
over all possible coefficient vectors $\hat a$, where $d = \bar x_1 - \bar x_2$. The maximum of the ratio is $D^2 = (\bar x_1 - \bar x_2)' S_{\mathrm{pooled}}^{-1}(\bar x_1 - \bar x_2)$.


An Allocation Rule Based on Fisher's Discriminant Function: Allocate $x_0$ to $\pi_1$ if
$$\hat y_0 = (\bar x_1 - \bar x_2)' S_{\mathrm{pooled}}^{-1} x_0 \;\ge\; \hat m = \tfrac{1}{2}(\bar x_1 - \bar x_2)' S_{\mathrm{pooled}}^{-1}(\bar x_1 + \bar x_2),$$
that is, if $\hat y_0 - \hat m \ge 0$. Allocate $x_0$ to $\pi_2$ if $\hat y_0 < \hat m$, that is, if $\hat y_0 - \hat m < 0$.
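With equal costs and equal priors this is Fisher's midpoint rule; a minimal sketch, assuming the same training summaries as in the earlier sketch:

```python
# A sketch of Fisher's midpoint rule (equal costs and priors), using the same
# training summaries xbar1, xbar2, S_pooled as above.
import numpy as np

def fisher_classify(x0, xbar1, xbar2, S_pooled):
    a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)   # discriminant coefficients
    y0 = a_hat @ x0
    m_hat = 0.5 * a_hat @ (xbar1 + xbar2)              # midpoint of ybar_1 and ybar_2
    return 1 if y0 >= m_hat else 2
```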


Classification of Normal Populations When $\Sigma_1 \ne \Sigma_2$

Result: Let the populations $\pi_1$ and $\pi_2$ be described by multivariate normal densities with mean vectors and covariance matrices $\mu_1, \Sigma_1$ and $\mu_2, \Sigma_2$, respectively. The allocation rule that minimizes the expected cost of misclassification is: Allocate $x_0$ to $\pi_1$ if
$$-\tfrac{1}{2} x_0'(\Sigma_1^{-1} - \Sigma_2^{-1})x_0 + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x_0 - k \;\ge\; \ln\!\left[\left(\frac{c(1\,|\,2)}{c(2\,|\,1)}\right)\left(\frac{p_2}{p_1}\right)\right],$$
where
$$k = \tfrac{1}{2}\ln\!\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \tfrac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right).$$
Allocate $x_0$ to $\pi_2$ otherwise.

Quadratic Classification Rule (Normal Populations with Unequal Covariance Matrices): Allocate $x_0$ to $\pi_1$ if
$$-\tfrac{1}{2} x_0'(S_1^{-1} - S_2^{-1})x_0 + (\bar x_1' S_1^{-1} - \bar x_2' S_2^{-1})x_0 - \hat k \;\ge\; \ln\!\left[\left(\frac{c(1\,|\,2)}{c(2\,|\,1)}\right)\left(\frac{p_2}{p_1}\right)\right],$$
where $\hat k$ is obtained from $k$ above by substituting $\bar x_i$ for $\mu_i$ and $S_i$ for $\Sigma_i$. Allocate $x_0$ to $\pi_2$ otherwise.
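A sketch of this quadratic rule, assuming sample means xbar1, xbar2 and covariance matrices S1, S2 from training data (names hypothetical):

```python
# A sketch of the quadratic rule; xbar1, xbar2, S1, S2 are training-sample
# estimates, and c12 = c(1|2), c21 = c(2|1).
import numpy as np

def classify_quadratic(x0, xbar1, xbar2, S1, S2, p1, p2, c12, c21):
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    k_hat = 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2)) \
            + 0.5 * (xbar1 @ S1i @ xbar1 - xbar2 @ S2i @ xbar2)
    lhs = -0.5 * x0 @ (S1i - S2i) @ x0 + (xbar1 @ S1i - xbar2 @ S2i) @ x0 - k_hat
    rhs = np.log((c12 / c21) * (p2 / p1))
    return 1 if lhs >= rhs else 2
```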


Evaluating Classification Functions

The total probability of misclassification is
$$\mathrm{TPM} = p_1 \int_{R_2} f_1(x)\,dx + p_2 \int_{R_1} f_2(x)\,dx.$$
The optimum error rate (OER) is the smallest value of the TPM, obtained by a judicious choice of $R_1$ and $R_2$. The regions $R_1$ and $R_2$ attaining the OER are determined by the minimum expected cost rule with equal misclassification costs.


Actual error rate (AER):
$$\mathrm{AER} = p_1 \int_{\hat R_2} f_1(x)\,dx + p_2 \int_{\hat R_1} f_2(x)\,dx,$$
where $\hat R_1$ and $\hat R_2$ are the classification regions determined by the sample classification function. Apparent error rate (APER): the fraction of observations in the training sample that are misclassified by the sample classification function.
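A minimal sketch of the APER computation, assuming a hypothetical classify(x) rule (for example, one of the rules sketched earlier) applied back to the training observations it was built from:

```python
# A sketch of the apparent error rate: the fitted rule is evaluated on its own
# training observations, grouped by true population.
def apparent_error_rate(classify, X1_train, X2_train):
    n1_miss = sum(classify(x) != 1 for x in X1_train)
    n2_miss = sum(classify(x) != 2 for x in X2_train)
    return (n1_miss + n2_miss) / (len(X1_train) + len(X2_train))
```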


The APER tends to underestimate the AER, and the problem does not disappear unless the sample sizes $n_1$ and $n_2$ are very large. Essentially, this optimistic estimate occurs because the data used to build the classification function are also used to evaluate it. Error-rate estimates can be constructed that are better than the apparent error rate, remain relatively easy to calculate, and do not require distributional assumptions. One approach is to split the total sample into a training sample and a validation sample. Shortcomings: (1) it requires large samples; (2) valuable information may be lost. An alternative is Lachenbruch's holdout procedure.

Lachenbruch's holdout procedure:
1. Start with the $\pi_1$ group of observations. Omit one observation from this group, and develop a classification function based on the remaining $n_1 - 1$ and $n_2$ observations.
2. Classify the holdout observation using the function constructed in Step 1.
3. Repeat Steps 1 and 2 until all of the $\pi_1$ observations are classified. Let $n_{1M}^{(H)}$ be the number of holdout ($H$) observations misclassified in this group.
4. Repeat Steps 1 through 3 for the $\pi_2$ observations. Let $n_{2M}^{(H)}$ be the number of holdout observations misclassified in this group.

Then
$$\hat P(2\,|\,1) = \frac{n_{1M}^{(H)}}{n_1}, \qquad \hat P(1\,|\,2) = \frac{n_{2M}^{(H)}}{n_2}, \qquad \hat E(\mathrm{AER}) = \frac{n_{1M}^{(H)} + n_{2M}^{(H)}}{n_1 + n_2}.$$
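A sketch of the holdout estimate for the linear rule with equal costs and priors, assuming training matrices X1 ($n_1 \times p$) and X2 ($n_2 \times p$); the helper names are hypothetical and the rule is refit with one observation omitted at a time.

```python
# A sketch of Lachenbruch's holdout estimate for the linear rule with equal costs
# and priors; X1 and X2 are the training data for pi_1 and pi_2.
import numpy as np

def pooled_cov(X1, X2):
    n1, n2 = len(X1), len(X2)
    return ((n1 - 1) * np.cov(X1, rowvar=False) +
            (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

def linear_rule(x0, X1, X2):
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    a = np.linalg.solve(pooled_cov(X1, X2), xbar1 - xbar2)
    return 1 if a @ x0 - 0.5 * a @ (xbar1 + xbar2) >= 0 else 2

def lachenbruch_holdout(X1, X2):
    n1_miss = sum(linear_rule(X1[j], np.delete(X1, j, axis=0), X2) != 1
                  for j in range(len(X1)))
    n2_miss = sum(linear_rule(X2[j], X1, np.delete(X2, j, axis=0)) != 2
                  for j in range(len(X2)))
    return (n1_miss + n2_miss) / (len(X1) + len(X2))   # E_hat(AER)
```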


Classification with Several Populations: The Minimum Expected Cost of Misclassification Method

Let $f_i(x)$ be the density associated with population $\pi_i$, $i = 1, 2, \ldots, g$. Let $p_i$ be the prior probability of population $\pi_i$, $i = 1, \ldots, g$, and let $c(k\,|\,i)$ be the cost of allocating an item to $\pi_k$ when, in fact, it belongs to $\pi_i$, for $k, i = 1, \ldots, g$. For $k = i$, $c(i\,|\,i) = 0$. Finally, let $R_k$ be the set of $x$'s classified as $\pi_k$ and
$$P(k\,|\,i) = \int_{R_k} f_i(x)\,dx \quad \text{for } k, i = 1, 2, \ldots, g, \qquad \text{with } P(i\,|\,i) = 1 - \sum_{k=1,\,k\ne i}^{g} P(k\,|\,i).$$
The conditional expected cost of misclassifying an $x$ from $\pi_1$ into $\pi_2, \pi_3, \ldots, \pi_g$ is
$$\mathrm{ECM}(1) = P(2\,|\,1)c(2\,|\,1) + \cdots + P(g\,|\,1)c(g\,|\,1) = \sum_{k=2}^{g} P(k\,|\,1)c(k\,|\,1).$$

Multiplying each conditional ECM by its prior probability and summing gives the overall ECM:
$$\mathrm{ECM} = p_1\,\mathrm{ECM}(1) + \cdots + p_g\,\mathrm{ECM}(g) = \sum_{i=1}^{g} p_i \sum_{k=1,\,k\ne i}^{g} P(k\,|\,i)c(k\,|\,i).$$
Result: The classification regions that minimize the overall ECM are defined by allocating $x$ to the population $\pi_k$, $k = 1, \ldots, g$, for which
$$\sum_{i=1,\,i\ne k}^{g} p_i f_i(x)\,c(k\,|\,i)$$
is smallest. If a tie occurs, $x$ can be assigned to any of the tied populations. If the misclassification costs $c(i\,|\,k)$, $i \ne k$, are all the same, allocate $x$ to $\pi_k$ if $p_k f_k(x) > p_i f_i(x)$ for all $i \ne k$.

Classification with Normal Populations

When
$$f_i(x) = \frac{1}{(2\pi)^{p/2}|\Sigma_i|^{1/2}} \exp\!\left[-\tfrac{1}{2}(x - \mu_i)'\Sigma_i^{-1}(x - \mu_i)\right], \qquad i = 1, 2, \ldots, g,$$
and, further, the misclassification costs are all equal, $c(k\,|\,i) = 1$ for $k \ne i$, then allocate $x$ to $\pi_k$ if
$$\ln p_k f_k(x) = \ln p_k - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_k| - \frac{1}{2}(x - \mu_k)'\Sigma_k^{-1}(x - \mu_k) = \max_i \ln p_i f_i(x).$$

Define the sample quadratic discriminant score $\hat d_i^Q(x)$ as
$$\hat d_i^Q(x) = -\tfrac{1}{2}\ln|S_i| - \tfrac{1}{2}(x - \bar x_i)' S_i^{-1}(x - \bar x_i) + \ln p_i, \qquad i = 1, 2, \ldots, g.$$
Then allocate $x$ to $\pi_k$ if the quadratic score $\hat d_k^Q(x)$ is the largest of $\hat d_1^Q(x), \ldots, \hat d_g^Q(x)$.

If the population covariance matrices $\Sigma_i$ are equal, define the sample linear discriminant score as
$$\hat d_i(x) = \bar x_i' S_{\mathrm{pooled}}^{-1} x - \tfrac{1}{2}\bar x_i' S_{\mathrm{pooled}}^{-1}\bar x_i + \ln p_i, \qquad i = 1, 2, \ldots, g,$$
where
$$S_{\mathrm{pooled}} = \frac{1}{n_1 + n_2 + \cdots + n_g - g}\left[(n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g\right].$$
Then allocate $x$ to $\pi_k$ if the linear discriminant score $\hat d_k(x)$ is the largest of $\hat d_1(x), \ldots, \hat d_g(x)$.
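A sketch of allocation by the largest sample linear discriminant score, assuming a list of sample mean vectors xbars, a pooled covariance matrix S_pooled, and prior probabilities priors (names hypothetical):

```python
# A sketch of allocation by the largest sample linear discriminant score for
# g populations with a common covariance matrix.
import numpy as np

def classify_linear_score(x, xbars, S_pooled, priors):
    scores = [xbar @ np.linalg.solve(S_pooled, x)
              - 0.5 * xbar @ np.linalg.solve(S_pooled, xbar)
              + np.log(p_i)
              for xbar, p_i in zip(xbars, priors)]
    return int(np.argmax(scores)) + 1      # 1-based population label k
```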

Fisher's Method for Discriminating among Several Populations

The motivation behind Fisher's discriminant analysis is the need to obtain a reasonable representation of the populations that involves only a few linear combinations of the observations, such as $a_1'x$, $a_2'x$, and $a_3'x$. The approach has several advantages when one is interested in separating several populations for (1) visual inspection or (2) graphical descriptive purposes. Assume that the $p \times p$ population covariance matrices are equal and of full rank; that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_g = \Sigma$.

Define
$$\bar x_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \quad i = 1, \ldots, g, \qquad \bar x = \frac{1}{g}\sum_{i=1}^{g} \bar x_i,$$
$$B = \sum_{i=1}^{g} (\bar x_i - \bar x)(\bar x_i - \bar x)',$$
and
$$W = \sum_{i=1}^{g} (n_i - 1)S_i = \sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ij} - \bar x_i)(x_{ij} - \bar x_i)'.$$

Fisher's Sample Linear Discriminants

Let $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_s > 0$ denote the $s \le \min(g-1, p)$ nonzero eigenvalues of $W^{-1}B$ and let $\hat e_1, \ldots, \hat e_s$ be the corresponding eigenvectors (scaled so that $\hat e' S_{\mathrm{pooled}}\,\hat e = 1$). Then the vector of coefficients $\hat a$ that maximizes the ratio
$$\frac{\hat a' B \hat a}{\hat a' W \hat a} = \frac{\hat a'\left[\sum_{i=1}^{g} (\bar x_i - \bar x)(\bar x_i - \bar x)'\right]\hat a}{\hat a'\left[\sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ij} - \bar x_i)(x_{ij} - \bar x_i)'\right]\hat a}$$
is given by $\hat a_1 = \hat e_1$. The linear combination $\hat a_1' x$ is called the sample first discriminant. The choice $\hat a_2 = \hat e_2$ produces the sample second discriminant, $\hat a_2' x$, and, continuing, we obtain $\hat a_k' x = \hat e_k' x$, the sample $k$th discriminant, $k \le s$.
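A sketch of Fisher's sample discriminants from the eigenvectors of $W^{-1}B$, assuming the input is a list of per-group data matrices; the scaling $\hat e' S_{\mathrm{pooled}}\,\hat e = 1$ is omitted here, so the returned coefficients are determined only up to a multiplicative constant.

```python
# A sketch of Fisher's sample discriminants; `groups` is a list of per-group
# data matrices, each n_i x p.
import numpy as np

def fisher_discriminants(groups):
    xbars = [X.mean(axis=0) for X in groups]
    xbar = np.mean(xbars, axis=0)                        # mean of the group means
    B = sum(np.outer(m - xbar, m - xbar) for m in xbars)
    W = sum((X - m).T @ (X - m) for X, m in zip(groups, xbars))
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))    # W^{-1}B is not symmetric
    order = np.argsort(vals.real)[::-1]
    s = min(len(groups) - 1, groups[0].shape[1])         # at most min(g-1, p) nonzero
    return vals.real[order][:s], vecs.real[:, order][:, :s]   # columns a_hat_1..a_hat_s
```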

Let
$$d_i(x) = \mu_i'\Sigma^{-1}x - \tfrac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln p_i,$$
or, equivalently,
$$d_i(x) - \tfrac{1}{2}x'\Sigma^{-1}x = -\tfrac{1}{2}(x - \mu_i)'\Sigma^{-1}(x - \mu_i) + \ln p_i.$$

Result: Let $y_j = a_j'x$, where $a_j = \Sigma^{-1/2}e_j$ and $e_j$ is an eigenvector of $\Sigma^{-1/2}B_{\mu}\Sigma^{-1/2}$. Then
$$\sum_{j=1}^{p}(y_j - \mu_{iY_j})^2 = \sum_{j=1}^{p}\left[a_j'(x - \mu_i)\right]^2 = (x - \mu_i)'\Sigma^{-1}(x - \mu_i) = -2d_i(x) + x'\Sigma^{-1}x + 2\ln p_i.$$
If $\lambda_1 \ge \cdots \ge \lambda_s > 0 = \lambda_{s+1} = \cdots = \lambda_p$, then $\sum_{j=s+1}^{p}(y_j - \mu_{iY_j})^2$ is constant for all populations $i = 1, 2, \ldots, g$, so only the first $s$ discriminants $y_j$, or
$$\sum_{j=1}^{s}(y_j - \mu_{iY_j})^2,$$
contribute to the classification.

Fisher's Classification Procedure Based on Sample Discriminants: Allocate $x$ to $\pi_k$ if
$$\sum_{j=1}^{r}(\hat y_j - \bar y_{kj})^2 = \sum_{j=1}^{r}\left[\hat a_j'(x - \bar x_k)\right]^2 \;\le\; \sum_{j=1}^{r}\left[\hat a_j'(x - \bar x_i)\right]^2 \quad \text{for all } i \ne k,$$
where $\hat a_j$ is the $j$th eigenvector of $W^{-1}B$, $\bar y_{kj} = \hat a_j'\bar x_k$, and $r \le s$.
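A sketch of this allocation rule, assuming a matrix A whose columns are the discriminant coefficient vectors $\hat a_j$ (for example, from the previous sketch) and a list of group mean vectors xbars:

```python
# A sketch of Fisher's allocation rule for several populations using the first
# r <= s sample discriminants.
import numpy as np

def fisher_classify_several(x, A, xbars, r):
    Ar = A[:, :r]                                    # first r discriminant vectors
    dists = [np.sum((Ar.T @ (x - m))**2) for m in xbars]
    return int(np.argmin(dists)) + 1                 # 1-based population label
```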
