Factor Analysis and Indian Buffet Process

1 Factor Analysis and Indian Buffet Process. Lecture 4. Peixian Chen (pchenac@cse.ust.hk), Department of Computer Science and Engineering, Hong Kong University of Science and Technology. March 25, 2013.

2 Factor Analysis. A statistical method for dimensionality reduction. It represents correlated observed variables with a smaller number of latent variables referred to as factors, and is used either to explore the underlying dimensions of the data (Exploratory Factor Analysis) or to test specific hypotheses (Confirmatory Factor Analysis).

3 Outline. 1 Factor Models: Two Linear Models. 2 Exploratory Factor Analysis and Confirmatory Factor Analysis: Exploratory Factor Analysis. 3 Indian Buffet Process: Finite to infinite binary matrices; Equivalence Class; The Indian Buffet Process; Applications.

4 Two Linear Models. For component analysis: x_i = a_{i1} z_1 + a_{i2} z_2 + ... + a_{in} z_n, (i = 1, 2, ..., n), where x_i is the ith observed variable, z_1, ..., z_n are n uncorrelated components, and a_{i1}, ..., a_{in} are the correlations of the n components with the ith variable. Each component makes a maximum contribution to the sum of the variances of the n variables. All the components are required to reproduce the correlations among the variables, but only a few components may be retained in a practical problem if they account for a large percentage of the total variance.
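
The component-analysis model above is what principal component analysis computes: each observed variable is an exact linear combination of n uncorrelated components obtained from the eigendecomposition of the correlation matrix. A minimal numpy sketch, for illustration only; the toy data and variable names are my own assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))  # toy correlated data
X = (X - X.mean(0)) / X.std(0)                                   # standardize the variables

R = np.corrcoef(X, rowvar=False)          # correlation matrix of the observed variables
eigval, eigvec = np.linalg.eigh(R)        # eigendecomposition (ascending order)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

Z = X @ eigvec / np.sqrt(eigval)          # uncorrelated, unit-variance components
A = eigvec * np.sqrt(eigval)              # loadings a_ik = Corr(x_i, z_k)

print(np.allclose(X, Z @ A.T))            # all n components reproduce x_i exactly
print(np.cumsum(eigval) / eigval.sum())   # variance explained: keep only the first few
```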

5 Two Linear Models. For (classical) factor analysis: x_i = a_{i1} z_1 + a_{i2} z_2 + ... + a_{im} z_m + u_i, (i = 1, 2, ..., n), where x_i is the ith observed variable, z_1, ..., z_m are m common factors (normally m < n), u_i is a unique factor, and a_{i1}, ..., a_{im} are factor loadings. The classical factor analysis model is designed to maximally reproduce the correlations. Each of the n observed variables is described linearly in terms of the m common factors and one unique factor. The common factors account for the correlations among the variables, while each unique factor accounts for the remaining variance (including error).

6 Difference between Factors and Components. Components are real: in PCA the components are actual linear combinations of the variables, and the loadings are the correlations of these combinations with the observed variables. Common factors are hypothetical: they have to be estimated from the actual variables, and to obtain them mathematical procedures must be used that specify the factors in terms of common variance. Factor analysis separates common and unique variance; excluding the latter is beneficial because the unique variance is generally of no scientific interest. PCA tries to account for both.

7 Factor Model. FA assumes that there is a set of latent factors z_j which, acting in combination, generate the observed variables x. The goal of FA is to characterize the dependency among the observed variables by means of a smaller number of factors. We simplify the model as x_i − u_i = a_{i1} z_1 + a_{i2} z_2 + ... + a_{im} z_m + ε_i, (i = 1, 2, ..., n), or in vector-matrix form x − u = Az + ε, where x_i is the ith observed variable, z_1, ..., z_m are m common factors (normally m < n), ε_i is a noise/error term playing the role of the unique factor of the ith observed variable, a_{i1}, ..., a_{im} are factor loadings, and u_i = E[x_i].

8 Factor Model Assumptions. E[x] = u, Cov(x) = Σ. Without loss of generality, we assume u = 0; E[z_i] = 0, Var(z_i) = 1, and Cov(z_i, z_j) = 0 for i ≠ j (standardized, mutually uncorrelated factors); E[ε_i] = 0, Var(ε_i) = φ_i, Cov(ε_i, ε_j) = 0 for i ≠ j; Cov(ε, z) = 0.

9 Covariance Matrix. Given that Var(z_i) = 1 and Var(ε_i) = φ_i, Σ = Cov(x) = Cov(Az + ε) = Cov(Az) + Cov(ε) = A Cov(z) A^T + Φ = AA^T + Φ, where Φ = diag(φ_1, ..., φ_n).
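
A quick way to see this identity is to simulate from the model and compare the empirical covariance with AA^T + Φ. A minimal sketch; the dimensions, loadings, and noise variances below are arbitrary assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 6, 2, 200_000                 # observed variables, factors, samples

A = rng.standard_normal((n, m))         # factor loadings
phi = rng.uniform(0.2, 1.0, size=n)     # unique (noise) variances

Z = rng.standard_normal((N, m))         # standardized, uncorrelated factors
E = rng.standard_normal((N, n)) * np.sqrt(phi)
X = Z @ A.T + E                         # x = A z + eps  (with u = 0)

Sigma_model = A @ A.T + np.diag(phi)
Sigma_hat = np.cov(X, rowvar=False)
print(np.max(np.abs(Sigma_hat - Sigma_model)))   # small, and shrinks as N grows
```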

10 A Single Common Factor Example. This is a two-variable, one-common-factor model: x_1 = a_1 z + ε_1, x_2 = a_2 z + ε_2. Then Cov(z, x_1) = E[(z − E[z])(x_1 − E[x_1])] = E[z x_1] = E[z(a_1 z + ε_1)] = a_1 E[z^2] + E[z ε_1] = a_1 Var(z) + Cov(z, ε_1) = a_1 Var(z) = a_1. Similarly, Cov(z, x_2) = a_2. Extending this conclusion to the multi-factor model, the factor loadings A represent the covariances between the variables and the factors. Note that if all variables are standardized to have unit variance, the factor loadings equal the correlations between factors and variables when only a single common factor is involved, or when multiple common factors are orthogonal to each other.

11 A Single Common Factor Example. So Cov(x_1, x_2) = a_1 a_2: Cov(x_1, x_2) = E[(x_1 − E[x_1])(x_2 − E[x_2])] = E[(a_1 z + ε_1)(a_2 z + ε_2)] = E[a_1 a_2 z^2 + a_1 z ε_2 + a_2 z ε_1 + ε_1 ε_2] = a_1 a_2 Var(z) = a_1 a_2. Note that in models with more common factors, the covariance between two observed variables is more complex; for example, Cov(x_1, x_2) = a_{11} a_{21} + a_{12} a_{22} in a two-factor model.

12 Outline. 1 Factor Models: Two Linear Models. 2 Exploratory Factor Analysis and Confirmatory Factor Analysis: Exploratory Factor Analysis. 3 Indian Buffet Process: Finite to infinite binary matrices; Equivalence Class; The Indian Buffet Process; Applications.

13 Brief Introduction to EFA and CFA. Exploratory Factor Analysis (EFA): used to explore the dimensionality of a measurement instrument by finding the smallest number of interpretable factors needed to explain the correlations among a set of variables. It is exploratory in the sense that it places no structure on the linear relationships between the observed variables and the factors, but only specifies the number of latent variables. Confirmatory Factor Analysis (CFA): used to study how well a hypothesized factor model fits a new sample from the same population or a sample from a different population, and characterized by allowing restrictions on the parameters of the model.

14 Main Steps in Exploratory Factor Analysis. (1) Collect and explore data: choose relevant variables. (2) Extract initial factors (via principal components). (3) Rotate and interpret. (4) (a) Decide if changes need to be made; (b) if so, repeat (3). (5) Construct scales and use them in further analysis.

15 (1) Data Matrix. Factor analysis is totally dependent on correlations between variables, so a covariance matrix or correlation matrix should first be prepared if it is not already available. Kim and Mueller suggest that one may rely on a correlation matrix in EFA, because (1) many existing computer programs do not accept the covariance matrix as basic input data, and (2) almost all of the examples in the literature are based on correlation matrices.

16 (2) Extracting Initial Factors. The goal is to find the number of factors that can adequately explain the observed correlations among the observed variables. Typical approaches: the maximum likelihood method, the least-squares method, alpha factoring, image factoring, and principal components analysis. At this stage of the analysis one should not be concerned with whether the underlying factors are orthogonal or oblique; all the initial solutions are based on the orthogonal solution. Nor should one be too concerned with whether the extracted factors are interpretable or meaningful. The chief concern is whether a smaller number of factors can account for the covariation among a much larger number of variables.
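
As an illustration of the extraction step, scikit-learn's FactorAnalysis estimator fits the factor model by maximum likelihood. This is a hedged sketch rather than the procedure used in the lecture, and the toy data are invented:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
# Toy data: 8 observed variables driven by 2 common factors plus unique noise.
A_true = rng.standard_normal((8, 2))
Z = rng.standard_normal((1000, 2))
X = Z @ A_true.T + 0.5 * rng.standard_normal((1000, 8))

fa = FactorAnalysis(n_components=2)     # maximum-likelihood extraction
fa.fit(X)

print(fa.components_.T)                 # estimated loadings A (variables x factors)
print(fa.noise_variance_)               # estimated unique variances phi_i
```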

17 (2) Extracting Initial Factors. An initial solution must provide either the number of common factors to be extracted or an objective criterion for choosing the number of factors.

18 (3) Rotation to a Terminal Solution. On the initial solution, certain restrictions are imposed: there are k common factors, the underlying factors are orthogonal to each other, the first factor accounts for as much variance as possible, the second factor accounts for as much of the residual variance left unexplained by the first factor as possible, and so on. After choosing the number of factors to retain, we want to spread variability more evenly among the factors, so we rotate the factors: redefine them such that loadings on the various factors tend to be either very high (near −1 or 1) or very low (near 0). Intuitively, this makes sharper distinctions in the meanings of the factors. Note that we rotate factor-analysis solutions, NOT principal components!
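
Varimax is the most widely used orthogonal rotation that pushes loadings toward 0 or ±1. The lecture does not name a specific criterion, so the sketch below is one standard implementation, not necessarily the one intended:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a (variables x factors) loading matrix with the varimax criterion."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                        # accumulated rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax objective, solved via SVD (Kaiser's algorithm).
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1 + tol):    # converged: objective stopped increasing
            break
        d_old = d_new
    return L @ R

# Example: rotating the loadings estimated in the previous sketch would be
# rotated = varimax(fa.components_.T)
```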

19 (3) Rotation (continued). The unrotated solution is based on the idea that each factor tries to maximize the variance explained, conditional on the previous factors. What if we take that away? Then there is not one best solution: all rotated solutions fit equally well. The goal is simple structure, and most construct validation assumes a simple (typically rotated) structure. Rotation does NOT improve fit!

20 (3) Rotation (continued). [Figure-only slide; not transcribed.]

21 Outline. 1 Factor Models: Two Linear Models. 2 Exploratory Factor Analysis and Confirmatory Factor Analysis: Exploratory Factor Analysis. 3 Indian Buffet Process: Finite to infinite binary matrices; Equivalence Class; The Indian Buffet Process; Applications.

22 Finite to Infinite Binary Matrices. Clustering algorithms (e.g. using mixture models) represent data in terms of which cluster each data point belongs to. But clustering models are restrictive. Consider modelling people's movie preferences (the Netflix problem). A movie might be described using features such as "is science fiction", "has Charlton Heston", "was made in the US", "was made in the 1970s", "has apes in it", and so on; these features may be unobserved (latent). The number of potential latent features for describing a movie (or person, news story, image, gene, speech waveform, etc.) is unlimited.

23 Finite to Infinite Binary Matrices. We derive a distribution on infinite binary matrices by starting with a simple model that assumes K features, and then taking the limit as K → ∞. The resulting distribution corresponds to a simple generative process, which we term the Indian buffet process.

24 Infinite Binary Matrices. Let F = [f_1^T f_2^T ... f_N^T]^T be the matrix of latent feature values for all N objects. A prior on F can be defined by specifying priors for Z and V separately, with p(F) = P(Z) p(V): a binary matrix Z indicates which features are possessed by each object, with z_ik = 1 if object i has feature k and 0 otherwise, and a second matrix V indicates the value of each feature for each object. F can then be expressed as the elementwise (Hadamard) product of Z and V, F = Z ∘ V.

25 A Finite Feature Model. Probability model: π_k | α ~ Beta(α/K, 1), z_ik | π_k ~ Bernoulli(π_k). The z_ik form a binary N × K feature matrix Z. Each object possesses feature k with probability π_k, the features are generated independently, and each π_k can take any value in [0, 1].
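
A minimal sketch of this finite beta-Bernoulli model, sampling a feature matrix Z for N objects and K features; the parameter values are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, alpha = 10, 20, 2.0

pi = rng.beta(alpha / K, 1.0, size=K)          # pi_k | alpha ~ Beta(alpha/K, 1)
Z = (rng.random((N, K)) < pi).astype(int)      # z_ik | pi_k ~ Bernoulli(pi_k)

m = Z.sum(axis=0)                              # m_k: number of objects possessing feature k
print(Z.sum(), "non-zero entries; expected about", N * alpha / (1 + alpha / K))
```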

26 A Finite Feature Model. P(Z | π) = ∏_{k=1}^{K} ∏_{i=1}^{N} P(z_ik | π_k) = ∏_{k=1}^{K} π_k^{m_k} (1 − π_k)^{N − m_k}, where m_k = ∑_{i=1}^{N} z_ik is the number of objects possessing feature k.

27 A Finite Feature Model. Integrating out π, P(Z) = ∏_{k=1}^{K} ∫ (∏_{i=1}^{N} P(z_ik | π_k)) p(π_k) dπ_k = ∏_{k=1}^{K} B(m_k + α/K, N − m_k + 1) / B(α/K, 1) = ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K).   (1) The result follows from conjugacy between the binomial and beta distributions. This distribution is exchangeable, depending only on the counts m_k = ∑_{i=1}^{N} z_ik.
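
Equation (1) is easy to evaluate numerically in log space; a small sketch (the helper name is my own):

```python
import numpy as np
from scipy.special import gammaln

def log_prob_Z_finite(Z, alpha):
    """log P(Z) for the finite beta-Bernoulli model, Equation (1)."""
    N, K = Z.shape
    a = alpha / K
    m = Z.sum(axis=0)
    return np.sum(
        np.log(a)
        + gammaln(m + a)
        + gammaln(N - m + 1)
        - gammaln(N + 1 + a)
    )

# Exchangeability check: log_prob_Z_finite(Z, alpha) equals
# log_prob_Z_finite(Z[::-1], alpha) for any binary Z, since only the m_k matter.
```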

28 A Finite Feature Model. The expected number of non-zero entries in the matrix Z, E[1^T Z 1] = E[∑_{ik} z_ik], has an upper bound that is independent of K. Each column of Z is independent, so first compute E[1^T z_k] = ∑_{i=1}^{N} E(z_ik) = ∑_{i=1}^{N} ∫_0^1 π_k p(π_k) dπ_k = N (α/K) / (1 + α/K). Then E[1^T Z 1] = K E[1^T z_k] = Nα / (1 + α/K), with upper bound Nα. Even in the K → ∞ limit, the matrix is expected to have a finite number of non-zero entries.

29 Equivalence Class. Define the function lof(·) that maps binary matrices to left-ordered binary matrices: lof(Z) is obtained by ordering the columns of the binary matrix Z from left to right by the magnitude of the binary number each column expresses, taking the first row as the most significant bit.
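
The lof(·) map is straightforward to implement; a sketch (the helper and the example matrix are my own, not from the lecture):

```python
import numpy as np

def lof(Z):
    """Left-ordered form: sort columns by the binary number they encode,
    with the first row as the most significant bit, largest first."""
    Z = np.asarray(Z)
    N = Z.shape[0]
    weights = 2 ** np.arange(N - 1, -1, -1)     # row 0 is the most significant bit
    keys = weights @ Z                           # binary value of each column
    order = np.argsort(-keys, kind="stable")     # descending, stable for ties
    return Z[:, order]

Z = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1]])
print(lof(Z))
```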

30 History. The history of feature k is the full column (z_1k, z_2k, ..., z_Nk). K_h is the number of features possessing history h, K_0 is the number of features for which m_k = 0, and K_+ = ∑_{h=1}^{2^N − 1} K_h is the number of features for which m_k > 0, so K = K_0 + K_+.

31 Cardinality of [Z]. The cardinality of [Z] is the number of matrices that map to the same left-ordered form; it is reduced when Z contains identical columns. The cardinality of [Z] is K! / ∏_{h=0}^{2^N − 1} K_h!.

32 Taking the Infinite Limit. From Equation (1), P([Z]) = ∑_{Z ∈ [Z]} P(Z) = (K! / ∏_{h=0}^{2^N − 1} K_h!) ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K).   (2) We then divide the columns of Z into two subsets: m_k > 0 for k ≤ K_+ and m_k = 0 otherwise.

33 Taking the Infinite Limit. Then
∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K)
= [ (α/K) Γ(α/K) Γ(N + 1) / Γ(N + 1 + α/K) ]^{K − K_+} ∏_{k=1}^{K_+} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K)
= [ (α/K) Γ(α/K) Γ(N + 1) / Γ(N + 1 + α/K) ]^{K} ∏_{k=1}^{K_+} Γ(m_k + α/K) Γ(N − m_k + 1) / ( Γ(α/K) Γ(N + 1) )
= [ N! / ∏_{j=1}^{N} (j + α/K) ]^{K} (α/K)^{K_+} ∏_{k=1}^{K_+} (N − m_k)! ∏_{j=1}^{m_k − 1} (j + α/K) / N!   (3)

34 Taking the Infinite Limit. Substituting Equation (3) into Equation (2) and rearranging terms, we take the limit K → ∞:
P([Z]) = lim_{K→∞} [ α^{K_+} / ∏_{h=1}^{2^N − 1} K_h! ] · [ K! / (K_0! K^{K_+}) ] · [ N! / ∏_{j=1}^{N} (j + α/K) ]^{K} · ∏_{k=1}^{K_+} (N − m_k)! ∏_{j=1}^{m_k − 1} (j + α/K) / N!
= [ α^{K_+} / ∏_{h=1}^{2^N − 1} K_h! ] exp{−α H_N} ∏_{k=1}^{K_+} (N − m_k)! (m_k − 1)! / N!   (4)
where H_N is the Nth harmonic number, H_N = ∑_{j=1}^{N} 1/j. Again, this distribution is exchangeable: neither the number of identical columns nor the column sums are affected by the ordering of the objects.
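
Equation (4) can be evaluated directly for any binary matrix; the sketch below computes log P([Z]) under this limiting form (the helper name is my own):

```python
import numpy as np
from scipy.special import gammaln
from collections import Counter

def log_prob_lof_ibp(Z, alpha):
    """log P([Z]) under the IBP, Equation (4)."""
    N = Z.shape[0]
    m = Z.sum(axis=0)
    m = m[m > 0]                                   # keep only non-empty columns
    K_plus = len(m)
    H_N = np.sum(1.0 / np.arange(1, N + 1))        # harmonic number H_N
    # K_h! terms: count how many non-empty columns share each history.
    hist_counts = Counter(tuple(col) for col in Z.T if col.any())
    log_Kh_fact = sum(gammaln(c + 1) for c in hist_counts.values())
    return (K_plus * np.log(alpha) - log_Kh_fact - alpha * H_N
            + np.sum(gammaln(N - m + 1) + gammaln(m) - gammaln(N + 1)))
```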

35 The Indian Buffet Process. N customers enter a restaurant one after another. Each customer encounters a buffet consisting of infinitely many dishes arranged in a line. The first customer starts at the left of the buffet and takes a serving from each dish, stopping after a Poisson(α) number of dishes. The ith customer moves along the buffet, sampling dishes in proportion to their popularity, serving himself dish k with probability m_k / i, where m_k is the number of previous customers who have sampled that dish. Having reached the end of all previously sampled dishes, the ith customer then tries a Poisson(α/i) number of new dishes.
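
The generative process translates directly into code. A minimal simulation sketch; the function and variable names are my own:

```python
import numpy as np

def sample_ibp(N, alpha, rng=None):
    """Simulate the Indian buffet process for N customers; returns a binary Z (N x K+)."""
    if rng is None:
        rng = np.random.default_rng()
    dishes = []                                   # m_k counts for each sampled dish
    rows = []
    for i in range(1, N + 1):
        row = []
        for k, m_k in enumerate(dishes):
            take = rng.random() < m_k / i         # existing dish k with prob m_k / i
            row.append(int(take))
            dishes[k] += int(take)
        k_new = rng.poisson(alpha / i)            # customer i tries Poisson(alpha/i) new dishes
        row.extend([1] * k_new)
        dishes.extend([1] * k_new)
        rows.append(row)
    Z = np.zeros((N, len(dishes)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(N=10, alpha=2.0, rng=np.random.default_rng(4))
print(Z.shape, Z.sum(axis=0))                     # about alpha * H_N columns on average
```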

36 The Indian Buffet Process. Using K_1^{(i)} to denote the number of new dishes sampled by the ith customer, the probability of any particular matrix being produced by this process is P(Z) = [ α^{K_+} / ∏_{i=1}^{N} K_1^{(i)}! ] exp{−α H_N} ∏_{k=1}^{K_+} (N − m_k)! (m_k − 1)! / N!. This matrix is not in left-ordered form, and customers are not exchangeable under this distribution. ∏_{i=1}^{N} K_1^{(i)}! / ∏_{h=1}^{2^N − 1} K_h! matrices generated via this process map to the same left-ordered form; P([Z]) is obtained by multiplying P(Z) by this quantity.

37 Inference by Gibbs Sampling. Conditional distribution: P(z_ik = 1 | z_{−i,k}) = ∫_0^1 P(z_ik | π_k) p(π_k | z_{−i,k}) dπ_k = (m_{−i,k} + α/K) / (N + α/K). When K → ∞, this becomes P(z_ik = 1 | z_{−i,k}) = m_{−i,k} / N. Similarly, the number of new features associated with object i should be drawn from a Poisson(α/N) distribution.

38 Applications: Modelling Data. Latent variable model: let X be the N × D matrix of observed data and Z be the N × K matrix of binary latent features, with P(X, Z | α) = P(X | Z) P(Z | α). By combining the IBP with different likelihood functions we can get different kinds of models: models for graph structures (w/ Wood, Griffiths, 2006); models for protein complexes (w/ Chu, Wild, 2006); models for overlapping clusters (w/ Heller, 2007); models for choice behaviour (Görür, Jäkel & Rasmussen, 2006); models for users in collaborative filtering (w/ Meeds, Roweis, Neal, 2006); sparse latent factor models (w/ Knowles, 2007).

39 Applications: Posterior Inference in IBPs. Gibbs sampling: P(Z, α | X) ∝ P(X | Z) P(Z | α) P(α), and P(z_nk = 1 | Z_{−(nk)}, X, α) ∝ P(z_nk = 1 | Z_{−(nk)}, α) P(X | Z). If m_{−n,k} > 0, then P(z_nk = 1 | z_{−n,k}) = m_{−n,k} / N. For the infinitely many k such that m_{−n,k} = 0, use Metropolis steps with truncation to sample the number of new features for each object. If α has a Gamma prior then its posterior is also Gamma, so α can be Gibbs sampled. Conjugate sampler: assumes that P(X | Z) can be computed. Non-conjugate sampler: P(X | Z) = ∫ P(X | Z, θ) P(θ) dθ cannot be computed, which requires sampling the latent θ as well (cf. Neal 2000, non-conjugate DPM samplers). Slice sampler: handles the non-conjugate case, is not approximate, and has an adaptive truncation level using a stick-breaking construction of the IBP (Teh et al., 2007). Particle filter: (Wood & Griffiths, 2007). Accelerated Gibbs sampling: maintains a probability distribution over some of the variables (Doshi-Velez & Ghahramani, 2009). Variational inference: (Doshi-Velez, Miller, Van Gael & Teh, 2009).
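
A sketch of one Gibbs sweep over the existing features, combining the m_{−n,k}/N prior with a user-supplied log-likelihood log P(X | Z); the likelihood interface is my own assumption, and the new-feature (Poisson) step and removal of emptied columns are omitted for brevity:

```python
import numpy as np

def gibbs_sweep_existing(Z, log_lik, rng):
    """One Gibbs sweep over existing features of an IBP latent feature model.
    log_lik(Z) must return log P(X | Z) for the current binary matrix Z."""
    N, K = Z.shape
    for i in range(N):
        for k in range(K):
            m = Z[:, k].sum() - Z[i, k]          # m_{-i,k}: others possessing feature k
            if m == 0:
                continue                         # handled by the new-feature step (not shown)
            log_p = np.empty(2)
            for v in (0, 1):                     # score both settings of z_ik
                Z[i, k] = v
                prior = m / N if v == 1 else 1.0 - m / N
                log_p[v] = np.log(prior) + log_lik(Z)
            p1 = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))   # normalized P(z_ik = 1 | rest)
            Z[i, k] = int(rng.random() < p1)
    return Z
```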

40 Applications: An Application of IBPs. A Non-Parametric Bayesian Method for Inferring Hidden Causes (Wood, Griffiths, Ghahramani, 2006): inferring stroke localization from patient symptoms (50 stroke patients, 56 symptoms/signs). The IBP models the graph structure connecting hidden causes to symptoms.

41 Applications: Infinite Sparse Latent Factor Models. Model: Y = G(Z ∘ X) + E, where Y is the data matrix, G is the factor loading matrix, Z ~ IBP(α, β) is a binary mask matrix, X contains heavy-tailed factors, ∘ denotes the elementwise product, and E is Gaussian noise. The IBP models the sparsity structure in the latent variables (w/ Knowles, 2007).
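
For intuition, here is a sketch that generates synthetic data from a model of this kind. All dimensions, the Student-t choice of heavy-tailed factors, and the finite beta-Bernoulli mask (standing in for the two-parameter IBP(α, β) prior) are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
N, D, K, alpha = 100, 12, 30, 3.0

# Sparse binary mask; a finite beta-Bernoulli prior stands in for the IBP here.
pi = rng.beta(alpha / K, 1.0, size=K)
Z = (rng.random((N, K)) < pi).astype(float)

X = rng.standard_t(df=3, size=(N, K))         # heavy-tailed latent factors
G = rng.standard_normal((D, K))               # factor loading matrix
E = 0.1 * rng.standard_normal((N, D))         # Gaussian noise

Y = (Z * X) @ G.T + E                         # row-wise Y = G (Z o X) + E
print(Y.shape, "average active features per object:", Z.sum(axis=1).mean())
```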
