LECTURE: FACTOR ANALYSIS


Rita Osadchy. Based on Lecture Notes by A. Ng.

Motivation. Suppose the distribution comes from a Mixture of Gaussians (MoG). If we have a sufficient amount of data, m >> n (m = number of training points, n = dimension), we can use EM to fit the Mixture of Gaussians. If m << n, it is difficult to model even a single Gaussian, much less a mixture of Gaussians.

Motivation (cont.). The data points span only a low-dimensional subspace of R^n. The ML estimator of the Gaussian parameters is μ = (1/m) Σ_i x^(i), Σ = (1/m) Σ_i (x^(i) - μ)(x^(i) - μ)^T. More generally, unless m exceeds n by some reasonable amount, the maximum likelihood estimates of the mean and covariance may be quite poor. In particular, for m < n the estimate Σ is singular, so we cannot compute the Gaussian density.
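As a quick numeric illustration of the singularity problem (a minimal NumPy sketch; the sizes m = 10, n = 50 are arbitrary choices for the example):

```python
import numpy as np

# Minimal sketch: with m < n the ML covariance estimate
#   Sigma = (1/m) * sum_i (x^(i) - mu)(x^(i) - mu)^T
# has rank at most m, so it is singular and the Gaussian density is undefined.
rng = np.random.default_rng(0)
m, n = 10, 50                       # far fewer points than dimensions
X = rng.normal(size=(m, n))         # rows are training points x^(i)

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / m   # n x n ML covariance estimate

print(np.linalg.matrix_rank(Sigma)) # <= m, i.e. well below n = 50
print(np.linalg.det(Sigma))         # ~0: Sigma is singular, density cannot be computed
```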

Restrictions on Σ. Goal: fit a reasonable Gaussian model to the data when m << n. Possible solutions: (1) limit the number of parameters by assuming Σ is diagonal; (2) limit Σ = σ²I, where σ² is the parameter under our control.
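Both restricted fits have simple closed forms; a small sketch under the same m << n setup (variable names are illustrative):

```python
import numpy as np

# Sketch of the two restricted covariance estimates mentioned above.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 50))              # m = 10 points in n = 50 dimensions
mu = X.mean(axis=0)

Sigma_diag = np.diag(((X - mu) ** 2).mean(axis=0))   # diagonal Sigma: per-coordinate variances
sigma2 = ((X - mu) ** 2).mean()                      # single shared variance parameter
Sigma_sph = sigma2 * np.eye(X.shape[1])              # Sigma = sigma^2 * I
```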

Contours of a Gaussian Density. [Figure: contours for a general Σ; for a diagonal Σ, where the contours are axis-aligned; and for Σ = σ²I.]

Correlation in the data. Restricting Σ to be diagonal means modelling the different coordinates of the data as being uncorrelated and independent. Often we would like to capture some interesting correlation structure in the data.

Modeling Correlation. The factor analysis model is the model we will see today.

Factor Analysis Model. Assume a latent random variable z ∈ R^k, k ≤ n, with z ~ N(0, I). The observed variable is x = μ + Λz + ε, where ε ~ N(0, Ψ) and z and ε are independent. The parameters of the model are μ ∈ R^n, Λ ∈ R^{n×k}, and Ψ ∈ R^{n×n}, which is diagonal. Equivalently, x | z ~ N(μ + Λz, Ψ).

Example of the generative model of p(x, z). [Figure: samples z^(i) ~ N(0, 1) are mapped to μ + Λz^(i) and then perturbed by noise ε^(i) ~ N(0, Ψ).]

Generative process in higher dimensions. We assume that each data point x^(i) is generated by sampling a k-dimensional multivariate Gaussian z^(i). It is then mapped to a k-dimensional affine subspace of R^n by computing μ + Λz^(i). Lastly, x^(i) is generated by adding covariance-Ψ noise ε^(i) to μ + Λz^(i).
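A sketch of this generative process in NumPy; the sizes n, k, m and the way μ, Λ, Ψ are drawn are arbitrary illustration choices:

```python
import numpy as np

# Generative model: z ~ N(0, I_k),  x = mu + Lambda z + eps,  eps ~ N(0, Psi), Psi diagonal.
rng = np.random.default_rng(2)
n, k, m = 5, 2, 1000

mu = rng.normal(size=n)
Lambda = rng.normal(size=(n, k))
Psi = np.diag(rng.uniform(0.1, 0.5, size=n))            # diagonal noise covariance

Z = rng.normal(size=(m, k))                              # latent factors z^(i)
eps = rng.multivariate_normal(np.zeros(n), Psi, size=m)  # coordinate-wise noise eps^(i)
X = mu + Z @ Lambda.T + eps                              # observed data, one row per x^(i)
```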

Definitions. Suppose x = [x_1; x_2] is a partitioned vector random variable with x_1 ∈ R^r, x_2 ∈ R^s, and x ~ N(μ, Σ), where μ = [μ_1; μ_2] and Σ = [[Σ_11, Σ_12], [Σ_21, Σ_22]]. Here μ_1 ∈ R^r, μ_2 ∈ R^s, Σ_11 ∈ R^{r×r}, Σ_12 ∈ R^{r×s}, and so on. Under our assumptions, z and x are jointly multivariate Gaussian.

Marginal distribution of x_1. By definition of the joint covariance of x_1 and x_2, Cov(x) = Σ = [[Σ_11, Σ_12], [Σ_21, Σ_22]] = E[(x - μ)(x - μ)^T]. Marginal distributions of Gaussians are themselves Gaussian, hence x_1 ~ N(μ_1, Σ_11), with p(x_1) = ∫ p(x_1, x_2) dx_2.

Conditional distribution of x_1 given x_2. p(x_1 | x_2) = p(x_1, x_2) / p(x_2). Referring to the definition of the multivariate Gaussian distribution, it can be shown that x_1 | x_2 ~ N(μ_{1|2}, Σ_{1|2}), where μ_{1|2} = μ_1 + Σ_12 Σ_22^{-1} (x_2 - μ_2) and Σ_{1|2} = Σ_11 - Σ_12 Σ_22^{-1} Σ_21.
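These two formulas translate directly into code; a minimal sketch (gaussian_conditional is an illustrative helper name, not from the lecture):

```python
import numpy as np

# Conditional of a partitioned Gaussian:
#   mu_{1|2}    = mu1 + S12 S22^{-1} (x2 - mu2)
#   Sigma_{1|2} = S11 - S12 S22^{-1} S21,  with S21 = S12^T
def gaussian_conditional(mu1, mu2, S11, S12, S22, x2):
    S22_inv = np.linalg.inv(S22)
    mu_cond = mu1 + S12 @ S22_inv @ (x2 - mu2)
    Sigma_cond = S11 - S12 @ S22_inv @ S12.T
    return mu_cond, Sigma_cond
```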

Finding the Parameters of the FA Model. Assume z and x have a joint Gaussian distribution: [z; x] ~ N(μ_zx, Σ). We want to find μ_zx and Σ. E[z] = 0 since z ~ N(0, I), and E[x] = E[μ + Λz + ε] = μ + Λ E[z] + E[ε] = μ. Hence μ_zx = [0; μ], a vector of dimension k + n.

Finding Σ. We need to calculate the upper-left block Σ_zz = E[(z - E[z])(z - E[z])^T], the upper-right block Σ_zx = E[(z - E[z])(x - E[x])^T], and the lower-right block Σ_xx = E[(x - E[x])(x - E[x])^T]. Since z ~ N(0, I), the upper-left block is Σ_zz = Cov(z) = I.

Finding Σ_zx. Σ_zx = E[(z - E[z])(x - E[x])^T] = E[z (μ + Λz + ε - μ)^T] = E[z z^T] Λ^T + E[z ε^T]. Since z and ε are independent, E[z ε^T] = E[z] E[ε]^T = 0, and E[z z^T] = Cov(z) = I. Hence Σ_zx = Λ^T.

Finding Σ_xx. Similarly, Σ_xx = E[(x - E[x])(x - E[x])^T] = E[(Λz + ε)(Λz + ε)^T] = Λ E[z z^T] Λ^T + E[ε ε^T] = Λ Λ^T + Ψ.
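A quick Monte Carlo sanity check of these blocks under the generative model above (sample size and seed are arbitrary):

```python
import numpy as np

# Empirically, Cov([z; x]) should approach [[I, Lambda^T], [Lambda, Lambda Lambda^T + Psi]].
rng = np.random.default_rng(3)
n, k, m = 4, 2, 200_000
mu = rng.normal(size=n)
Lambda = rng.normal(size=(n, k))
Psi = np.diag(rng.uniform(0.1, 0.5, size=n))

Z = rng.normal(size=(m, k))
X = mu + Z @ Lambda.T + rng.multivariate_normal(np.zeros(n), Psi, size=m)

C = np.cov(np.hstack([Z, X]).T)   # empirical (k + n) x (k + n) covariance
print(np.abs(C[:k, k:] - Lambda.T).max())                   # ~0: upper-right block is Lambda^T
print(np.abs(C[k:, k:] - (Lambda @ Lambda.T + Psi)).max())  # ~0: lower-right block is Lambda Lambda^T + Psi
```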

Finding the parameters (cont.). Putting everything together, we have [z; x] ~ N([0; μ], [[I, Λ^T], [Λ, ΛΛ^T + Ψ]]). We also see that the marginal distribution of x is x ~ N(μ, ΛΛ^T + Ψ). Thus, given a training set {x^(i) : i = 1, ..., m}, the log likelihood of the parameters is
l(μ, Λ, Ψ) = log Π_{i=1}^m (1 / ((2π)^{n/2} |ΛΛ^T + Ψ|^{1/2})) exp( -(1/2) (x^(i) - μ)^T (ΛΛ^T + Ψ)^{-1} (x^(i) - μ) ).
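The marginal likelihood can be evaluated directly; a sketch assuming SciPy is available (fa_log_likelihood is an illustrative helper, not part of the lecture):

```python
import numpy as np
from scipy.stats import multivariate_normal

# l(mu, Lambda, Psi) = sum_i log N(x^(i); mu, Lambda Lambda^T + Psi)
def fa_log_likelihood(X, mu, Lambda, Psi):
    Sigma = Lambda @ Lambda.T + Psi                 # marginal covariance of x
    return multivariate_normal.logpdf(X, mean=mu, cov=Sigma).sum()
```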

Finding the parameters (cont.). To perform maximum likelihood estimation, we would like to maximize this quantity with respect to the parameters. But maximizing this formula explicitly is hard, and we are aware of no algorithm that does so in closed form. So we will instead use the EM algorithm.

EM for Factor Analysis. E-step: compute Q_i(z^(i)) = p(z^(i) | x^(i); μ, Λ, Ψ). M-step: set the parameters to arg max of Σ_i ∫ Q_i(z^(i)) log [ p(x^(i), z^(i); μ, Λ, Ψ) / Q_i(z^(i)) ] dz^(i).

E-step (EM for FA). We need to compute Q_i(z^(i)) = p(z^(i) | x^(i); μ, Λ, Ψ). Using the conditional distribution of a Gaussian, we find that z^(i) | x^(i) ~ N(μ_{z^(i)|x^(i)}, Σ_{z^(i)|x^(i)}), where μ_{z^(i)|x^(i)} = Λ^T (ΛΛ^T + Ψ)^{-1} (x^(i) - μ) and Σ_{z^(i)|x^(i)} = I - Λ^T (ΛΛ^T + Ψ)^{-1} Λ. So Q_i(z^(i)) = (1 / ((2π)^{k/2} |Σ_{z^(i)|x^(i)}|^{1/2})) exp( -(1/2) (z^(i) - μ_{z^(i)|x^(i)})^T Σ_{z^(i)|x^(i)}^{-1} (z^(i) - μ_{z^(i)|x^(i)}) ).
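A sketch of this E-step (e_step is an illustrative helper; note that the posterior covariance Σ_{z^(i)|x^(i)} is the same for every data point, so it is computed once):

```python
import numpy as np

# E-step:
#   mu_{z|x}    = Lambda^T (Lambda Lambda^T + Psi)^{-1} (x - mu)
#   Sigma_{z|x} = I - Lambda^T (Lambda Lambda^T + Psi)^{-1} Lambda
def e_step(X, mu, Lambda, Psi):
    k = Lambda.shape[1]
    G = np.linalg.inv(Lambda @ Lambda.T + Psi)      # (Lambda Lambda^T + Psi)^{-1}
    Mu_z = (X - mu) @ G @ Lambda                    # m x k; row i is mu_{z^(i)|x^(i)}
    Sigma_z = np.eye(k) - Lambda.T @ G @ Lambda     # shared k x k posterior covariance
    return Mu_z, Sigma_z
```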

M-step (EM for FA). Maximize Σ_{i=1}^m ∫ Q_i(z^(i)) log [ p(x^(i), z^(i); μ, Λ, Ψ) / Q_i(z^(i)) ] dz^(i) with respect to the parameters μ, Λ, Ψ. We will work out the optimization with respect to Λ. Derivation of the updates for μ and Ψ is left as an exercise (do it!).

Update for Λ. Σ_i ∫ Q_i(z^(i)) log [ p(x^(i), z^(i); μ, Λ, Ψ) / Q_i(z^(i)) ] dz^(i) = Σ_i ∫ Q_i(z^(i)) [ log p(x^(i) | z^(i); μ, Λ, Ψ) + log p(z^(i)) - log Q_i(z^(i)) ] dz^(i) = Σ_i E_{z^(i) ~ Q_i} [ log p(x^(i) | z^(i); μ, Λ, Ψ) + log p(z^(i)) - log Q_i(z^(i)) ], i.e., an expectation with respect to z^(i) drawn from Q_i.

Update for Λ (cont.). We want to maximize Σ_i E_{z^(i) ~ Q_i} [ log p(x^(i) | z^(i); μ, Λ, Ψ) + log p(z^(i)) - log Q_i(z^(i)) ] with respect to Λ; the last two terms do not depend on Λ. Remember that x | z ~ N(μ + Λz, Ψ), so
Σ_i E[ log p(x^(i) | z^(i); μ, Λ, Ψ) ] = Σ_i E[ log (1 / ((2π)^{n/2} |Ψ|^{1/2})) exp( -(1/2) (x^(i) - μ - Λz^(i))^T Ψ^{-1} (x^(i) - μ - Λz^(i)) ) ] = Σ_i E[ -(1/2) log |Ψ| - (n/2) log(2π) - (1/2) (x^(i) - μ - Λz^(i))^T Ψ^{-1} (x^(i) - μ - Λz^(i)) ].

Update for Λ (cont.). Take the derivative with respect to Λ; only the last term depends on Λ: ∇_Λ Σ_i E[ -(1/2) (x^(i) - μ - Λz^(i))^T Ψ^{-1} (x^(i) - μ - Λz^(i)) ]. Since a scalar equals its own trace (a = tr a for a ∈ R), we can simplify this to
∇_Λ Σ_i E[ -tr( (1/2) z^(i)T Λ^T Ψ^{-1} Λ z^(i) ) + tr( z^(i)T Λ^T Ψ^{-1} (x^(i) - μ) ) ].

Update for Λ (cont.). Using the trace identities tr(AB) = tr(BA), ∇_A tr(AB) = B^T, and ∇_A tr(A B A^T C) = C A B + C^T A B^T, the derivative becomes
Σ_i [ -Ψ^{-1} Λ E_{z^(i)~Q_i}[ z^(i) z^(i)T ] + Ψ^{-1} (x^(i) - μ) E_{z^(i)~Q_i}[ z^(i)T ] ].

Update for Λ (cont.). Setting this to zero and simplifying, we get Σ_i Λ E_{z^(i)~Q_i}[ z^(i) z^(i)T ] = Σ_i (x^(i) - μ) E_{z^(i)~Q_i}[ z^(i)T ]. Solving for Λ, we obtain
Λ = ( Σ_i (x^(i) - μ) E_{z^(i)~Q_i}[ z^(i)T ] ) ( Σ_i E_{z^(i)~Q_i}[ z^(i) z^(i)T ] )^{-1}.
Since z^(i) | x^(i) is Gaussian with mean μ_{z^(i)|x^(i)} and covariance Σ_{z^(i)|x^(i)}, and for any random vector Y we have Cov(Y) = E[Y Y^T] - E[Y] E[Y]^T, i.e., E[Y Y^T] = E[Y] E[Y]^T + Cov(Y), it follows that
E_{z^(i)~Q_i}[ z^(i)T ] = μ_{z^(i)|x^(i)}^T and E_{z^(i)~Q_i}[ z^(i) z^(i)T ] = μ_{z^(i)|x^(i)} μ_{z^(i)|x^(i)}^T + Σ_{z^(i)|x^(i)}.

Update for Λ (cont.). Substituting, we obtain
Λ = ( Σ_i (x^(i) - μ) μ_{z^(i)|x^(i)}^T ) ( Σ_i ( μ_{z^(i)|x^(i)} μ_{z^(i)|x^(i)}^T + Σ_{z^(i)|x^(i)} ) )^{-1}.

M-step updates for μ and Ψ. μ = (1/m) Σ_i x^(i); this doesn't depend on Q_i(z^(i)) = p(z^(i) | x^(i); μ, Λ, Ψ), hence it can be computed once, for all the iterations. For Ψ, compute
Φ = (1/m) Σ_i [ (x^(i) - μ)(x^(i) - μ)^T - (x^(i) - μ) μ_{z^(i)|x^(i)}^T Λ^T - Λ μ_{z^(i)|x^(i)} (x^(i) - μ)^T + Λ ( μ_{z^(i)|x^(i)} μ_{z^(i)|x^(i)}^T + Σ_{z^(i)|x^(i)} ) Λ^T ]
and set Ψ to the diagonal matrix containing only the diagonal entries of Φ.
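Putting the M-step updates together with the E-step sketch above (m_step is an illustrative helper; μ is the sample mean, computed once outside the loop):

```python
import numpy as np

# M-step, using the quantities returned by e_step:
#   Lambda = (sum_i (x^(i) - mu) mu_{z|x}^T) (sum_i (mu_{z|x} mu_{z|x}^T + Sigma_{z|x}))^{-1}
#   Psi    = diagonal entries of Phi
def m_step(X, mu, Mu_z, Sigma_z):
    m = X.shape[0]
    Xc = X - mu
    Ezz = Mu_z.T @ Mu_z + m * Sigma_z               # sum_i E[z^(i) z^(i)^T]
    Lambda = (Xc.T @ Mu_z) @ np.linalg.inv(Ezz)     # update for Lambda
    Phi = (Xc.T @ Xc
           - Xc.T @ Mu_z @ Lambda.T
           - Lambda @ Mu_z.T @ Xc
           + Lambda @ Ezz @ Lambda.T) / m
    Psi = np.diag(np.diag(Phi))                     # keep only the diagonal entries
    return Lambda, Psi
```

A full EM iteration then alternates Mu_z, Sigma_z = e_step(X, mu, Lambda, Psi) and Lambda, Psi = m_step(X, mu, Mu_z, Sigma_z) until the log likelihood stops improving.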

Probabilistic PCA. A probabilistic generative view of the data: a D-dimensional observation x is generated from an M-dimensional latent variable z as x = Wz + μ + ε, with z ~ N(0, I) and ε ~ N(0, σ²I).

Compare: in Probabilistic PCA the noise covariance is spherical (σ²I); in FA it is diagonal (axis-aligned).

Probabilistic PCA. The columns of W are the principal components. They can be found using ML in closed form. EM is more efficient when only a few eigenvectors are required, and it avoids evaluation of the data covariance matrix. For other advantages, see the corresponding chapter in Bishop.
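A sketch of that closed-form ML solution (the standard probabilistic PCA result; ppca_ml is an illustrative helper, it assumes k < n, and the returned W is determined only up to a rotation):

```python
import numpy as np

# Closed-form ML for probabilistic PCA:
#   sigma^2 = average of the n - k smallest eigenvalues of the data covariance S
#   W       = U_k (L_k - sigma^2 I)^{1/2}, with U_k the top-k eigenvectors of S
def ppca_ml(X, k):
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]                       # data covariance matrix
    eigval, eigvec = np.linalg.eigh(S)               # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # sort in descending order
    sigma2 = eigval[k:].mean()                       # noise variance estimate
    W = eigvec[:, :k] * np.sqrt(np.maximum(eigval[:k] - sigma2, 0.0))
    return W, sigma2
```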