MODEL BASED CLUSTERING FOR COUNT DATA
1 MODEL BASED CLUSTERING FOR COUNT DATA
Dimitris Karlis
Department of Statistics, Athens University of Economics and Business, Athens
April
2 OUTLINE
- Clustering methods
- Model based clustering
  - the general model
  - algorithmic issues
  - inference
  - multivariate Normal clustering
- Multivariate Poisson distributions
- Model based clustering via the multivariate Poisson distribution
  - the models
  - estimation
- Application to marketing data
  - the data
  - the model
  - results
- Generalizing the model
- Future research and open problems
3 CLUSTERING
Purpose: to find observations with similar characteristics. Other names: taxonomy, segmentation, etc.
Methods used:
- Hierarchical clustering
  - large storage demands
  - depends on the distance measure and the linkage method
  - no solid theoretical background
- K-means
  - heuristic algorithm
  - computationally feasible
  - depends on the initial solution
  - no solid theoretical background
- Model based clustering
  - probabilistic method
  - strong theoretical background
  - inferential procedures available
4 MODEL BASED CLUSTERING
The population consists of k subpopulations. For a q-dimensional observation y_i from the j-th subpopulation,
  y_i ~ f(y_i | θ_j),   with θ_j an unknown vector of parameters.
Unconditional density:
  f(y_i) = Σ_{j=1}^{k} p_j f(y_i | θ_j).
Problem: find the values of the non-observable vector φ = (φ_1, φ_2, ..., φ_n), where φ_i = j if the i-th observation belongs to the j-th subpopulation. Define the indicators
  z_ij = 1 if φ_i = j, and z_ij = 0 otherwise.
5 ESTIMATION
The purpose of model based clustering is to estimate the parameters
  Θ = (p_1, ..., p_k, θ_1, ..., θ_k).
Loglikelihood:
  L(y; θ, p) = Σ_{i=1}^{n} ln Σ_{j=1}^{k} p_j f(y_i | θ_j).
Solution: the EM algorithm.
6 EM ALGORITHM FOR MODEL BASED CLUSTERING
Observed data: y_i. Complete data: (y_i, z_i).
E-step: estimate w_ij = E(z_ij | y_i, Θ).
M-step: update Θ by solving appropriate equations (usually weighted likelihood equations with weights w_ij).
Variants: Classification EM (CEM).
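As an illustration, here is a minimal sketch of these E and M steps for a k-component univariate Poisson mixture (plain numpy/scipy; the function name, defaults and starting values are ours, not from the talk):

```python
import numpy as np
from scipy.stats import poisson

def em_poisson_mixture(y, k, n_iter=200, tol=1e-8, seed=0):
    """EM for a k-component mixture of univariate Poissons (sketch)."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    p = np.full(k, 1.0 / k)                     # mixing proportions p_j
    lam = rng.uniform(0.5, 1.5, k) * y.mean()   # random starting means
    loglik_old = -np.inf
    for _ in range(n_iter):
        # E-step: w_ij = E(z_ij | y_i, Theta), the posterior membership weights
        dens = poisson.pmf(y[:, None], lam[None, :]) * p      # n x k
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted likelihood updates
        p = w.mean(axis=0)
        lam = (w * y[:, None]).sum(axis=0) / w.sum(axis=0)
        loglik = np.log(dens.sum(axis=1)).sum()
        if loglik - loglik_old < tol * abs(loglik):
            break
        loglik_old = loglik
    return p, lam, w, loglik
```

Replacing the Poisson density with another family (e.g. the multivariate normal of the next slides) changes only the density evaluation and the M-step equations.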
7 THE EM ALGORITHM (Dempster et al., 1975; Meng and Van Dyk, 1997; McLachlan and Krishnan, 1997)
Complete data Z_i = (X_i, Y_i), where X_i are the observed data and Y_i the missing data (or data that can be treated as missing).
E-step: compute Q(φ | φ^(k)) = E( log p(Z | φ) | X, φ^(k) ).
M-step: update φ^(k+1) by maximizing Q(φ | φ^(k)) with respect to φ.
8 MODEL BASED CLUSTERING VIA MULTIVARIATE NORMAL MIXTURES (Banfield and Raftery, 1993)
Assume y_i ~ f(y | θ_j) is multivariate normal MN(μ_j, Σ_j), with μ_j the mean vector and Σ_j the covariance matrix.
Wide range of applications (see McLachlan and Basford, 1988; McLachlan and Peel, 2000; etc.).
9 INTERESTING FEATURES
Decompose the covariance matrix of the k-th component as
  Σ_k = λ_k D_k A_k D_k',
where λ_k controls the volume, A_k the shape and D_k the orientation.

Model | Σ_k             | Shape     | Orientation | Volume
1     | λI              | Spherical | None        | Same
2     | λ_k I           | Spherical | None        | Different
3     | Σ               | Same      | Same        | Same
4     | λ_k Σ           | Same      | Same        | Different
5     | D_k A D_k'      | Same      | Different   | Same
6     | λ_k D_k A D_k'  | Same      | Different   | Different
7     | λ_k D A_k D'    | Different | Same        | Different
8     | Σ_k             | Different | Different   | Different
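As a concrete illustration of the decomposition, here is a small sketch for the 2-dimensional case (parameterizing the orthogonal matrix D by a rotation angle is our choice for the example):

```python
import numpy as np

def sigma_from_volume_shape_orientation(lam, shape, angle):
    """Sigma = lam * D A D', with det(A) = 1 and D an orthogonal rotation."""
    A = np.diag(shape) / np.prod(shape) ** (1 / len(shape))  # normalise: det(A) = 1
    c, s = np.cos(angle), np.sin(angle)
    D = np.array([[c, -s], [s, c]])
    return lam * D @ A @ D.T

# e.g. model 6 (lambda_k D_k A D_k'): common shape, cluster-specific volume/orientation
Sigma_1 = sigma_from_volume_shape_orientation(1.0, [4.0, 1.0], 0.0)
Sigma_2 = sigma_from_volume_shape_orientation(2.5, [4.0, 1.0], np.pi / 4)
```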
10 NUMBER OF COMPONENTS
Criteria:
- Information criteria
  - AIC: AIC = -2 L_k + 2 d_k
  - BIC: BIC = -2 L_k + d_k ln(n)
  - AIC3, CAIC, JAAIC
  - information complexity criteria
  - minimum information ratio criterion
- Cross-validated criteria
- Classification based criteria
  - normalized entropy criterion
- Bootstrap based criteria
  - bootstrap LRT
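A minimal sketch of the two main criteria, given a fitted model's maximized loglikelihood L_k and free-parameter count d_k:

```python
import numpy as np

def aic(loglik, d):
    """AIC = -2 L_k + 2 d_k."""
    return -2.0 * loglik + 2.0 * d

def bic(loglik, d, n):
    """BIC = -2 L_k + d_k ln(n)."""
    return -2.0 * loglik + d * np.log(n)

# Model selection: fit for k = 1, 2, ... and keep the k with the smallest criterion.
```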
11 INFERENTIAL PROBLEMS
General ML theorem: the number of components that can be fitted to any dataset is finite and depends on the data. Usually we are only able to estimate a small number of components.
Other topics:
- goodness of fit
- standard errors
- classifying new observations
12 MULTIVARIATE COUNT DATA
Examples:
- number of different crimes in different areas/times
- number of occurrences of different diseases in different areas/times
- number of different transactions for a bank account
- number of purchases of different products
Common elements:
- correlation between the variables
- the data are counts, usually with a large number of zeros, so the multivariate normal approach is inappropriate
Proposed approach: model based clustering based on the multivariate Poisson distribution.
13 MULTIVARIATE POISSON DISTRIBUTION
X_i ~ Poisson(λ_i), i = 1, ..., m, with the X_i independent.
Multivariate Poisson distributions (MP) are defined as Y = AX, where A is a q x m matrix of zeros and ones.
14 Example: m = 3, q = 2
X = (X_1, X_2, X_3),
  A = [ 1 0 1 ]
      [ 0 1 1 ]
Y = (Y_1, Y_2) with
  Y_1 = X_1 + X_3
  Y_2 = X_2 + X_3
(trivariate reduction technique).
Important result: the marginal distributions are Poisson distributions.
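The construction is easy to check by simulation; a short sketch (the rate values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([1.0, 2.0, 0.5])   # (lambda_1, lambda_2, lambda_3)
A = np.array([[1, 0, 1],
              [0, 1, 1]])          # Y_1 = X_1 + X_3, Y_2 = X_2 + X_3

X = rng.poisson(lam, size=(100_000, 3))
Y = X @ A.T

print(Y.mean(axis=0))   # approx A @ lam = (1.5, 2.5): Poisson marginals
print(np.cov(Y.T))      # off-diagonal approx lambda_3 = 0.5: the shared term
```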
15 GENERAL MODEL
Suppose that the matrix A has the form A = [A_1 A_2 ... A_q], where A_i is a matrix of dimensions q x (q choose i) whose columns are all the 0/1 vectors containing exactly i ones and q - i zeros.
Then E(Y) = AM and Var(Y) = A Σ A^T, where M = E(X) = (λ_1, λ_2, ..., λ_m) and Σ is the variance/covariance matrix of X, given by
  Σ = Var(X) = diag(λ_1, λ_2, ..., λ_m).
16 Example
For instance, for q = 3 we need
  A_1 = [ 1 0 0 ]    A_2 = [ 1 1 0 ]    A_3 = [ 1 ]
        [ 0 1 0 ]          [ 1 0 1 ]          [ 1 ]
        [ 0 0 1 ]          [ 0 1 1 ]          [ 1 ]
so that
  Y_1 = X_1 + X_12 + X_13 + X_123
  Y_2 = X_2 + X_12 + X_23 + X_123
  Y_3 = X_3 + X_13 + X_23 + X_123,
where the form of the matrix A is now
  A = [ 1 0 0 1 1 0 1 ]
      [ 0 1 0 1 0 1 1 ]
      [ 0 0 1 0 1 1 1 ]
and X = (X_1, X_2, X_3, X_12, X_13, X_23, X_123).
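The block structure of A can be built mechanically for any q (a sketch; the helper name is ours):

```python
import numpy as np
from itertools import combinations

def build_A(q):
    """A = [A_1 A_2 ... A_q]: one column per non-empty subset of {1, ..., q},
    with ones in the rows the subset contains (subsets of size i form A_i)."""
    cols = []
    for i in range(1, q + 1):
        for subset in combinations(range(q), i):
            col = np.zeros(q, dtype=int)
            col[list(subset)] = 1
            cols.append(col)
    return np.column_stack(cols)

print(build_A(3))   # the 3 x 7 matrix shown above
```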
17 SOME INTERESTING THINGS
- Calculation of the probability function is computationally demanding.
- Estimation is possible via an EM algorithm.
- Use of the entire covariance structure is of little practical interest.
- Very few applications exist, usually assuming just a common covariance term.
- Even if we start with conditionally independent Poisson distributions, the resulting mixture has a covariance structure.
18 COVARIANCE STRUCTURE
X_i | λ_i ~ Poisson(λ_i), i = 1, ..., m, with the X_i independent. Define Y = AX, where A is a q x m matrix of zeros and ones, and let θ = (λ_1, λ_2, ..., λ_m) be the vector of parameters. Then, for
  Y | θ ~ MultPoisson(θ),   θ ~ G(θ),
the unconditional variance of Y is given by
  Var(Y) = A D A',
where D has diagonal elements Var(λ_i) + E(λ_i) and off-diagonal elements Cov(λ_i, λ_j), that is
  D = Var(θ) + B,   B = diag( E(λ_1), E(λ_2), ..., E(λ_m) ).
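A simulation sketch of Var(Y) = ADA' for the bivariate example of slide 14, assuming (purely for illustration) independent gamma mixing distributions for the λ_i, so that D is diagonal with entries Var(λ_i) + E(λ_i):

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1, 0, 1],
              [0, 1, 1]])

# lambda_i ~ Gamma(a_i, scale) independently; X_i | lambda ~ Poisson(lambda_i)
a, scale = np.array([2.0, 3.0, 1.5]), 0.7
theta = rng.gamma(a, scale, size=(200_000, 3))
X = rng.poisson(theta)
Y = X @ A.T

D = np.diag(a * scale**2 + a * scale)   # Var(lambda_i) + E(lambda_i); Cov terms are 0
print(np.cov(Y.T))    # empirical covariance of Y
print(A @ D @ A.T)    # A D A' -- should agree up to simulation error
```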
19 Important things
- The unconditional covariance can be decomposed into two parts: the intrinsic covariance and the covariance due to mixing.
- Even if the variables are conditionally uncorrelated, unconditionally there is covariance.
- The resulting covariance can also be negative (the multivariate Poisson model necessarily has positive covariance).
20 APPLICATION (Brijs, Karlis, et al.)
The data: 55 customers of a large supermarket chain in Belgium. Purchases of 4 products were of interest:
- cakemix
- frosting
- softener
- detergent
[Table: mean, variance and variance/mean for cakemix, frosting, detergent and softener; numeric values not recoverable]
[Table: univariate clustering, number of components per product: detergent 3, cakemix 3; softener and frosting entries not recoverable]
21 [Figure: frequency histograms of cakemix, frosting, detergent and softener purchases]
22 THE MODEL
- Not all the covariances were used.
- Restricted covariance structure.
Important covariances:
- frosting and cakemix (r = .66)
- detergent and softener (r = .48)
The model: latent variables X = (X_C, X_F, X_D, X_S, X_CF, X_DS). The form of the matrix A is now
  A = [ 1 0 0 0 1 0 ]
      [ 0 1 0 0 1 0 ]
      [ 0 0 1 0 0 1 ]
      [ 0 0 0 1 0 1 ]
and the vector of parameters is θ = (λ_C, λ_F, λ_D, λ_S, λ_CF, λ_DS). Thus we have
  Y_C = X_C + X_CF
  Y_F = X_F + X_CF
  Y_D = X_D + X_DS
  Y_S = X_S + X_DS.
23 Conditional probability function
  P(y | θ) = P(y_C, y_F, y_D, y_S | θ)
           = BP(y_C, y_F; λ_C, λ_F, λ_CF) · BP(y_D, y_S; λ_D, λ_S, λ_DS),
where BP denotes the bivariate Poisson probability function
  BP(y_1, y_2; λ_1, λ_2, λ_3) = e^{-(λ_1 + λ_2 + λ_3)} Σ_{i=0}^{min(y_1, y_2)} λ_1^{y_1 - i} λ_2^{y_2 - i} λ_3^{i} / [ (y_1 - i)! (y_2 - i)! i! ],
with y_1, y_2 = 0, 1, 2, ...
The unconditional probability mass function under a mixture model with k components is
  P(y) = Σ_{j=1}^{k} p_j P(y_C, y_F, y_D, y_S | θ_j).
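A direct implementation sketch of the BP probability above (assumes λ_1, λ_2, λ_3 > 0; the inner sum is evaluated on the log scale for numerical stability):

```python
import numpy as np
from scipy.special import gammaln

def bp_pmf(y1, y2, l1, l2, l3):
    """Bivariate Poisson BP(y1, y2; l1, l2, l3):
    P(Y1 = y1, Y2 = y2) with Y1 = X1 + X3, Y2 = X2 + X3."""
    i = np.arange(min(y1, y2) + 1)
    log_terms = ((y1 - i) * np.log(l1) - gammaln(y1 - i + 1)
                 + (y2 - i) * np.log(l2) - gammaln(y2 - i + 1)
                 + i * np.log(l3) - gammaln(i + 1))
    return np.exp(-(l1 + l2 + l3)) * np.exp(log_terms).sum()
```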
24 Estimation for the mixture model
- For a model with k components the number of parameters is 7k - 1.
- The likelihood function is quite complicated for direct maximization.
- An EM type of algorithm is used.
25 THE EM ALGORITHM
Observed data: Y_i = (Y_Ci, Y_Fi, Y_Di, Y_Si).
Unobserved data: X_i = (X_Ci, X_Fi, X_Di, X_Si, X_CFi, X_DSi) and Z_i = (Z_1i, Z_2i, ..., Z_ki).
Complete data: the vectors (Y_i, X_i, Z_i).
Θ: the vector of parameters.
26 E-step: using the current values of the parameters, calculate
  w_ij = E(Z_ji | Y_i, Θ) = p_j P(y_i | θ_j) / P(y_i),   i = 1, ..., n, j = 1, ..., k,
  x_CFi = E(X_CF,i | Y_i, Θ) = Σ_{j=1}^{k} p_j BP(y_Di, y_Si; λ_Dj, λ_Sj, λ_DSj) Σ_{r=0}^{min(y_C, y_F)} r Po(y_Ci - r | λ_Cj) Po(y_Fi - r | λ_Fj) Po(r | λ_CFj) / P(y_i),
  x_DSi = E(X_DS,i | Y_i, Θ) = Σ_{j=1}^{k} p_j BP(y_Ci, y_Fi; λ_Cj, λ_Fj, λ_CFj) Σ_{r=0}^{min(y_D, y_S)} r Po(y_Di - r | λ_Dj) Po(y_Si - r | λ_Sj) Po(r | λ_DSj) / P(y_i),
  x_Ci = y_Ci - x_CFi,   x_Fi = y_Fi - x_CFi,   x_Di = y_Di - x_DSi,   x_Si = y_Si - x_DSi.
M-step: update the parameters
  p_j = Σ_{i=1}^{n} w_ij / n,   j = 1, ..., k,
  λ_tj = Σ_{i=1}^{n} w_ij x_ti / Σ_{i=1}^{n} w_ij,   j = 1, ..., k,   t ∈ {C, F, D, S, CF, DS}.
If some convergence criterion is satisfied, stop iterating; otherwise go back to the E-step.
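The less standard E-step quantity is the conditional expectation of the latent common term; a sketch for a single bivariate Poisson component (in the mixture, these expectations are additionally weighted across components by p_j and divided by P(y_i), as in the formulas above):

```python
import numpy as np
from scipy.stats import poisson

def e_common_term(y1, y2, l1, l2, l3):
    """E[X3 | Y1 = y1, Y2 = y2] for one bivariate Poisson component:
    the quantity behind x_CF and x_DS in the E-step above."""
    r = np.arange(min(y1, y2) + 1)
    terms = (poisson.pmf(y1 - r, l1) * poisson.pmf(y2 - r, l2)
             * poisson.pmf(r, l3))
    return (r * terms).sum() / terms.sum()

# the remaining latent terms follow by subtraction, e.g. x_C = y_C - x_CF
```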
27 Issues related to the EM
- Stopping criterion: ( L^(k+1) - L^(k) ) / L^(k) < ε, for a small tolerance ε.
- Multiple maxima:
  Step 1: several sets of initial values are chosen randomly,
  Step 2: each set is run for a few iterations, and
  Step 3: we keep iterating from the solution with the largest likelihood after the initial iterations.
The above procedure was run several times; a sketch of the strategy follows below. Check via the gradient function.
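A sketch of this short-run/long-run multi-start strategy, where fit_once(seed, n_iter) is a hypothetical wrapper around a single EM run that returns (parameters, loglikelihood):

```python
def multistart_em(fit_once, n_starts=20, burn_in_iters=10):
    """Run EM briefly from several random starts, then iterate only the best
    start to convergence (a sketch of the scheme described above)."""
    short_runs = [(s, fit_once(seed=s, n_iter=burn_in_iters))
                  for s in range(n_starts)]
    best_seed = max(short_runs, key=lambda run: run[1][1])[0]  # largest loglik
    # the same seed reproduces the same start, now iterated much longer
    return fit_once(seed=best_seed, n_iter=10_000)
```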
28 Scalability
- Working with frequency tables instead of the original data speeds up the process considerably (see the sketch below).
- Appropriate for huge databases with millions of customers.
- Appropriate for data mining.
- Speed depends on:
  - the number of components,
  - the separation between the components, and
  - the relative size of the parameters.
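A sketch of the frequency-table device: collapse the raw records to unique count patterns once, so every likelihood evaluation is per pattern rather than per customer (log_density here stands for any per-pattern log density, e.g. the mixture above):

```python
import numpy as np

# toy stand-in for the purchase records: n x 4 matrix of counts
Y = np.random.default_rng(3).poisson(1.0, size=(100_000, 4))
patterns, freq = np.unique(Y, axis=0, return_counts=True)

def weighted_loglik(log_density):
    """log L = sum over patterns of freq * log f(pattern)."""
    return (freq * log_density(patterns)).sum()
```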
29 [Table: number of components, number of parameters, loglikelihood, AIC and BIC for the fitted models; numeric values not recoverable]
Other criteria used: NEC, MIRC; similar results.
30 [Table: estimated parameters (λ_1, ..., λ_4, λ_CF, λ_DS) and mixing proportions p for each component of the K = 5 solution; numeric values not recoverable]
31 [Figure: bivariate Poisson probability function; parameter vector values partially not recoverable]
32 Probability function of a trivariate Poisson distribution with full covariance structure:
  P(y_1, y_2, y_3) = exp( - Σ_{j ∈ A} λ_j ) Σ_{x_12=0}^{min(y_1, y_2)} Σ_{x_13=0}^{min(y_1 - x_12, y_3)} Σ_{x_23=0}^{min(y_2 - x_12, y_3 - x_13)}
      λ_1^{y_1 - x_12 - x_13} λ_2^{y_2 - x_12 - x_23} λ_3^{y_3 - x_13 - x_23} λ_12^{x_12} λ_13^{x_13} λ_23^{x_23}
      / [ (y_1 - x_12 - x_13)! (y_2 - x_12 - x_23)! (y_3 - x_13 - x_23)! x_12! x_13! x_23! ],
with A = {1, 2, 3, 12, 13, 23}.
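Since the probability is a finite sum over the latent variables, it can be evaluated by brute force in low dimensions; a sketch following the formula above:

```python
import numpy as np
from scipy.stats import poisson

def trivariate_poisson_pmf(y, lam):
    """P(Y1, Y2, Y3) for Y1 = X1+X12+X13, Y2 = X2+X12+X23, Y3 = X3+X13+X23,
    lam = (l1, l2, l3, l12, l13, l23): sum over the admissible latent X."""
    y1, y2, y3 = y
    l1, l2, l3, l12, l13, l23 = lam
    total = 0.0
    for x12 in range(min(y1, y2) + 1):
        for x13 in range(min(y1 - x12, y3) + 1):
            for x23 in range(min(y2 - x12, y3 - x13) + 1):
                total += (poisson.pmf(y1 - x12 - x13, l1)
                          * poisson.pmf(y2 - x12 - x23, l2)
                          * poisson.pmf(y3 - x13 - x23, l3)
                          * poisson.pmf(x12, l12)
                          * poisson.pmf(x13, l13)
                          * poisson.pmf(x23, l23))
    return total
```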
33 [Figure: the loglikelihood and the AIC and BIC criteria for our dataset, plotted against the number of components k]
34 [Figure: the mixing proportions for a wide range of models with varying number of clusters]
35 [Table: cluster centers (cakemix, frosting, detergent, softener) and number of customers in each cluster; numeric values not recoverable]
36 [Figure: loglikelihood against the number of components for four models]
1 - model described
2 - model with covariance between cakemix - detergent and softener - frosting
3 - common covariance
4 - no covariance model
37 [Figure: frosting against cakemix]
38 [Figure: pairwise plots of the estimated lambda parameters; numeric values not recoverable]
39 QUALITY OF THE FITTED MODEL
Entropy criterion:
  I(k) = 1 - [ Σ_{i=1}^{n} Σ_{j=1}^{k} w_ij ln(w_ij) ] / [ n ln(1/k) ],
with the convention that w_ij ln(w_ij) = 0 if w_ij = 0.
Perfect classification gives values near 1; our solution gives .83.
The criterion can be used to select the number of components.
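A sketch of the criterion as written above, with w the n x k matrix of posterior weights w_ij from the E-step:

```python
import numpy as np

def entropy_criterion(w):
    """I(k) = 1 - sum_ij w_ij ln(w_ij) / (n ln(1/k)); 0 ln 0 taken as 0.
    Equals 1 for a perfect (0/1) classification, 0 for uniform weights."""
    n, k = w.shape
    wlogw = np.where(w > 0, w * np.log(np.where(w > 0, w, 1.0)), 0.0)
    return 1.0 - wlogw.sum() / (n * np.log(1.0 / k))
```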
40 AN INTERESTING CASE
Full covariance structure with A = [A_1, A_2]: a simple multivariate Poisson distribution (no mixture). Parameters to be estimated: λ_1, λ_2, ..., λ_m and the pairwise terms λ_12, λ_13, ..., λ_{m-1,m}.
- The off-diagonal elements are covariances.
- The probability function is too complicated to compute directly.
- Recursive relationships can be used (problematic for 5 or more dimensions).
- The latent structure can be used to derive an EM algorithm.
- The EM does not need the calculation of the probability function.
41 ONGOING RESEARCH
- EM algorithm for the general model
- Extension to the finite mixture model for clustering purposes
- Bayesian approach: the same data augmentation helps
- Generalization of the results in order to be applicable (and computationally feasible) for large dimensions