BU Statistics Seminar Series Fall 2006
1 Modal EM for Mixtures and its Application in Clustering
Surajit Ray, Boston University, September 2006 (BU: Sep, slide #1)
2 Why Study Modality
Inference about the actual data-generating process.
Potentially symptomatic of underlying population structure or substructure.
Modes are stable and more elemental than the components of a mixture model. For example, to model a single t-distribution we may require a mixture of many normals with the same mean but different variances.
3 Mixture of Distributions
K-component mixture: f(x) = ∑_{j=1}^{K} π_j f_j(x; λ_j, θ), x in R^D.
f_j(x; λ_j, θ): component densities.
π_j: mixing proportions, with 0 ≤ π_j ≤ 1 for all j and ∑_{j=1}^{K} π_j = 1, so π lives in a (K-1)-simplex.
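As a quick illustration (not from the slides), the mixture density above can be evaluated directly; the weights, means, and covariances below are toy values chosen for the example.

```python
import numpy as np

def gauss_pdf(x, mu, cov):
    """Multivariate normal density phi(x; mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_pdf(x, pis, mus, covs):
    """f(x) = sum_j pi_j f_j(x; lambda_j): K-component Gaussian mixture."""
    return sum(p * gauss_pdf(x, m, c) for p, m, c in zip(pis, mus, covs))

# Two equally weighted bivariate components (toy values)
pis = [0.5, 0.5]
mus = [np.zeros(2), np.array([3.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
print(mixture_pdf(np.zeros(2), pis, mus, covs))
```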
4 Mixture of Distributions
A flexible way of modeling a heterogeneous population.
A data reduction technique: visualization.
Directly applicable to model-based clustering [Ray 2003, Ray and Lindsay 2006].
5 Contents
Modal EM and modal clusters of parametric mixtures
Modes and clusters from kernel density estimates / nonparametric modes
Multi-scale modal clustering, or hierarchical modal clustering
Further pruning of modes: modal ridges, gap test, hilltop test, coverage
Visualization of modes and dimension reduction: direction of maximum homogeneity
Applications: parametric mixture modal clustering; multi-scale modal clustering
Conclusion
6-7 Components vs Modes
A 5-component bivariate normal mixture with only 3 modes. [Scatter plot with the five component means µ marked.]
8-9 Number of Components vs Number of Modes
[Density plots: a mixture of two normals 4 standard deviations apart is bimodal; 2 standard deviations apart, unimodal.]
For univariate normal mixtures: number of components ≥ number of modes.
Condition for bimodality of a mixture of two univariate normals (Helguero, 1904): X ~ πN(µ_1, σ²) + (1-π)N(µ_2, σ²) is bimodal iff |µ_2 - µ_1| > 2σ (for π = 1/2).
Conditions for arbitrary univariate mixtures: Eisenberger (1964), Behboodian (1970), Robertson and Fryer (1969).
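A quick numerical check of the 2σ rule (my sketch, not from the slides): count the local maxima of the equal-weight two-normal mixture on a fine grid.

```python
import numpy as np

def n_modes(mu_diff, sigma=1.0, pi=0.5, grid=None):
    """Count local maxima of pi*N(0, sigma^2) + (1-pi)*N(mu_diff, sigma^2)."""
    if grid is None:
        grid = np.linspace(-5 * sigma, mu_diff + 5 * sigma, 20001)
    def phi(x, mu):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    f = pi * phi(grid, 0.0) + (1 - pi) * phi(grid, mu_diff)
    # strict interior local maxima of the density on the grid
    interior = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(interior.sum())

print(n_modes(4.0))  # separation 4 sigma: bimodal
print(n_modes(1.5))  # separation 1.5 sigma < 2 sigma: unimodal
```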
10-11 Parametric Modes: Multivariate Data, Equal Variance
Not much previous work in this area except Ray and Lindsay (2005).
A mixture of multivariate normals is bimodal iff the univariate distribution along some line is bimodal: find the axis of maximum separation and apply the univariate bimodality condition along it. For a mixture of two multivariate normals with equal covariance, this is the line joining the two means.
Theorem 1 (Multivariate modality condition, equal-variance case): for X ~ πN(µ_1, Σ) + (1-π)N(µ_2, Σ) with π = 1/2, the distribution of X is bimodal iff (µ_1 - µ_2)' Σ^{-1} (µ_1 - µ_2) > 4.
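Theorem 1 gives an easily computable test: compare the squared Mahalanobis separation of the two means with 4. A minimal sketch with toy inputs:

```python
import numpy as np

def is_bimodal_equal_cov(mu1, mu2, cov):
    """Theorem 1 (equal covariance, pi = 1/2): the mixture is bimodal iff
    the Mahalanobis separation (mu1-mu2)' Sigma^{-1} (mu1-mu2) exceeds 4."""
    d = np.asarray(mu1, float) - np.asarray(mu2, float)
    return float(d @ np.linalg.solve(cov, d)) > 4.0

cov = np.array([[1.0, 0.5], [0.5, 1.0]])
print(is_bimodal_equal_cov([0, 0], [3, 0], cov))  # well-separated means
print(is_bimodal_equal_cov([0, 0], [1, 0], cov))  # close means
```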
12 The Ridgeline Manifold
g(x) = π_1 φ(x; µ_1, Σ_1) + π_2 φ(x; µ_2, Σ_2) + ... + π_K φ(x; µ_K, Σ_K)
Mapping x*: R^{K-1} -> R^D:
x*(α) = [α_1 Σ_1^{-1} + α_2 Σ_2^{-1} + ... + α_K Σ_K^{-1}]^{-1} [α_1 Σ_1^{-1} µ_1 + α_2 Σ_2^{-1} µ_2 + ... + α_K Σ_K^{-1} µ_K]
The image of this map will be denoted by M and called the ridgeline surface or manifold. If K = 2, it will be called the ridgeline, as it is a one-dimensional curve.
Theorem 2: All the critical values of the D-dimensional multivariate mixture, and hence its modes, antimodes, and saddle points, lie in the manifold M given by x*(α).
Enormous dimension reduction for D >> K: explore only K - 1 dimensions.
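For K = 2 the ridgeline map above is a one-line computation. A sketch with toy parameters (my parameterization puts the weight α on component 1, so the endpoints of the curve are the two means):

```python
import numpy as np

def ridgeline(alpha, mu1, mu2, cov1, cov2):
    """Ridgeline point x*(alpha) of a 2-component normal mixture:
    x*(a) = [a P1 + (1-a) P2]^{-1} [a P1 mu1 + (1-a) P2 mu2],
    where Pj is the precision of component j; x*(1) = mu1, x*(0) = mu2."""
    i1, i2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    A = alpha * i1 + (1 - alpha) * i2
    b = alpha * (i1 @ np.asarray(mu1, float)) + (1 - alpha) * (i2 @ np.asarray(mu2, float))
    return np.linalg.solve(A, b)

# Trace the one-dimensional curve on a grid of alpha values (toy parameters)
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
curve = np.array([ridgeline(a, mu1, mu2, np.eye(2), 2 * np.eye(2))
                  for a in np.linspace(0, 1, 101)])
```

By Theorem 2, searching this curve suffices to locate every mode, antimode, and saddle point of the bivariate mixture.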
13 Modality and Mixing Proportion: Π-plots
Ridgeline elevation/contour plots: full information (location and height) of the modes and saddle points; no dependence on the mixing proportions π.
Π-plots and analytical solutions: only the location of critical points, but dependence on the mixing parameter π.
14 The Π-equation
Two-component case: x*(α) is a critical value of the mixture if it satisfies h'(α) = π φ_1'(α) + (1 - π) φ_2'(α) = 0, where h(α) = g(x*(α)).
Solving for π gives Π(α) = φ_2'(α) / (φ_2'(α) - φ_1'(α)), so if α is a critical value then it solves the Π-equation: Π(α) = π.
1 solution => 1 mode; 3 solutions => 2 modes; 5 solutions => 3 modes.
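Rather than solving Π(α) = π directly, a numerical sketch (my shortcut, not the slides' method) can count the local maxima of h(α) = g(x*(α)) along the ridgeline; this matches the solution-count correspondence above, since modes alternate with antimodes along the curve. Toy parameters throughout:

```python
import numpy as np

def count_modes(pi1, mu1, mu2, cov1, cov2, n=4001):
    """Count modes of a 2-component normal mixture by counting local maxima
    of h(alpha) = g(x*(alpha)) along the ridgeline (all critical points lie on it)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    i1, i2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    d = len(mu1)
    def phi(x, mu, inv, cov):
        r = x - mu
        return np.exp(-0.5 * r @ inv @ r) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    alphas = np.linspace(0.0, 1.0, n)
    h = np.empty(n)
    for k, a in enumerate(alphas):
        # ridgeline point x*(a), then the mixture density there
        x = np.linalg.solve(a * i1 + (1 - a) * i2,
                            a * (i1 @ mu1) + (1 - a) * (i2 @ mu2))
        h[k] = pi1 * phi(x, mu1, i1, cov1) + (1 - pi1) * phi(x, mu2, i2, cov2)
    interior = (h[1:-1] > h[:-2]) & (h[1:-1] > h[2:])
    return int(interior.sum()) + int(h[0] > h[1]) + int(h[-1] > h[-2])
```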
15-16 Analytical Solution for the Number of Modes
Counting the solutions of the Π-equation reduces to finding the roots of an equation in the quantities p(α) and κ(α), built from the mean difference µ_2 - µ_1, the component precisions Σ_1^{-1} and Σ_2^{-1}, and S_α = [α_1 Σ_1^{-1} + α_2 Σ_2^{-1}]^{-1} [α_1 Σ_1^{-1} µ_1 + α_2 Σ_2^{-1} µ_2]; closed-form expressions are given in Ray and Lindsay (2005).
Solution to: a quadratic for equal variances; a cubic for proportional variances.
Open question: an upper bound for the number of modes in the unequal-variance case.
17 Modal EM and Modal Clusters of Parametric Mixtures
[Illustration: points x_1, x_2, x_3, x_4 ascending to their associated modes.]
Any point x can move along its path of steepest ascent with respect to the density. The associated mode of x is the point beyond which no further ascent is possible.
18-19 MEM: The Modal EM
The Modal EM (MEM) algorithm solves for a local maximum of a mixture density by an ascending iterative procedure starting from any initial point.
1. Expectation step: p_k = π_k f_k(x^{(r)}) / f(x^{(r)}), k = 1, ..., K.
2. Maximization step: x^{(r+1)} = argmax_x ∑_{k=1}^{K} p_k log f_k(x).
MM [Hunter and Lange] is not an algorithm but a prescription for constructing EM-like algorithms: build a surrogate by majorizing or minorizing the objective function. A likelihood or missing-data framework is not necessary; convexity of the relevant functions and the associated inequalities are what matter.
20 Modal EM: Normal Mixture
For a normal mixture with common covariance Σ, the M-step reduces to x^{(r+1)} = ∑_j Pr(J = j | X = x^{(r)}) µ_j, with posterior probability Pr(J = j | X = x^{(r)}) = π_j f_j(x^{(r)}) / ∑_k π_k f_k(x^{(r)}).
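A minimal sketch of this fixed-point iteration (toy parameters, not the authors' code): the E-step computes the posterior weights at the current point and the M-step averages the component means under those weights.

```python
import numpy as np

def modal_em_normal(x0, pis, mus, cov, tol=1e-10, max_iter=500):
    """Modal EM for a normal mixture with common covariance Sigma:
    the M-step reduces to x <- sum_j P(J=j | x) mu_j."""
    inv = np.linalg.inv(cov)
    mus = np.asarray(mus, float)
    x = np.asarray(x0, float)
    for _ in range(max_iter):
        # E-step: posterior component probabilities (log scale for stability)
        logw = np.array([np.log(p) - 0.5 * (x - m) @ inv @ (x - m)
                         for p, m in zip(pis, mus)])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # M-step: posterior-weighted average of the component means
        x_new = w @ mus
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Two well-separated components: ascents from either side find different modes
m1 = modal_em_normal([0.5, 0.0], [0.5, 0.5], [[0, 0], [4, 0]], np.eye(2))
m2 = modal_em_normal([3.5, 0.0], [0.5, 0.5], [[0, 0], [4, 0]], np.eye(2))
```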
21 Properties of Modal EM
The iteration monotonically increases the density: g(x^{(r+1)}) ≥ g(x^{(r)}).
It is a discrete approximation to the path of steepest ascent, followed until no further ascent is possible.
The steps are computationally simple, even in high dimensions.
22 Modal EM for Non-normal Distributions
t-distribution M-step:
f_k(x; µ_k, Σ, ν) = Γ[(ν + D)/2] / { (πν)^{D/2} |Σ|^{1/2} Γ[ν/2] } · [1 + (x - µ_k)^T Σ^{-1} (x - µ_k)/ν]^{-(ν+D)/2}   (1)
Poisson distribution: modal merging is even more important for this one-parameter distribution. Work with integer optimization, using the fact that f(x+1)/f(x) = λ/(x+1), or with a smooth surrogate that replicates the ease of solution via Stirling's approximation.
23 Defining Modal Clusters Using Modal EM
For each µ_j, run the modal EM with it as a starting value.
Group together all µ_j that have the same associated mode, and call the group a modal cluster.
Example: modal clusters of a parametric mixture, n = 80, with four estimated means. [Plot.]
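The component-grouping rule above can be sketched as follows (toy weights and means, assuming the common-covariance normal M-step): run the ascent from each component mean and merge components whose ascents land on the same mode.

```python
import numpy as np

def modal_clusters(pis, mus, cov, tol=1e-8, max_iter=1000, digits=3):
    """Group components of a common-covariance normal mixture into modal
    clusters: ascend from each component mean, then merge components
    whose ascents reach the same mode."""
    inv = np.linalg.inv(cov)
    mus = np.asarray(mus, float)
    def ascend(x):
        for _ in range(max_iter):
            logw = np.array([np.log(p) - 0.5 * (x - m) @ inv @ (x - m)
                             for p, m in zip(pis, mus)])
            w = np.exp(logw - logw.max())
            w /= w.sum()
            x_new = w @ mus
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x_new
    modes, labels = {}, []
    for m in mus:
        # round the limit point so numerically identical modes share one key
        key = tuple(np.round(ascend(m), digits))
        labels.append(modes.setdefault(key, len(modes)))
    return labels, [np.array(k) for k in modes]

# Toy mixture (assumed values): three overlapping components and one distant one
labels, modes = modal_clusters([0.3, 0.3, 0.2, 0.2],
                               [[0, 0], [0.5, 0], [0, 0.5], [5, 5]],
                               np.eye(2))
print(labels)  # first three components merge into one modal cluster
```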
24 Example: Modal Clusters of Parametric Mixtures
[Plot: estimated modes and component means in the (x, y) plane.]
25 Motivation Behind Kernel-density-based Clustering
No parametric assumptions necessary.
The kernel density is obtained by putting a normal kernel on each data point: K(x, x_i; h).
It is exactly a finite mixture model (with n components), so this is a natural extension of parametric modal clustering.
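Because the kernel density estimate is an n-component mixture with equal weights, the same Modal EM update applies with the data points as component means; the resulting iteration is the familiar mean-shift step. A sketch on assumed toy data:

```python
import numpy as np

def kde_modal_labels(data, h, tol=1e-9, max_iter=2000, digits=2):
    """Modal clustering of a Gaussian KDE: for an n-component mixture with
    weights 1/n, the Modal EM update is the mean-shift step
    x <- sum_i w_i(x) x_i with Gaussian kernel weights w_i(x)."""
    data = np.asarray(data, float)
    def ascend(x):
        for _ in range(max_iter):
            w = np.exp(-0.5 * np.sum((data - x) ** 2, axis=1) / h ** 2)
            w /= w.sum()
            x_new = w @ data
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x_new
    modes = {}
    # each point is labeled by the mode its ascent converges to
    return [modes.setdefault(tuple(np.round(ascend(p), digits)), len(modes))
            for p in data]

# Two synthetic groups (assumed toy data)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
labels = kde_modal_labels(data, h=0.5)
```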
26 History of Nonparametric Modality Tests
Hartigan and Hartigan (1985) and Cheng and Hall (1998): the DIP test; univariate, using local minima and maxima.
SPAN test and RUNT test: Hartigan (1988) and Hartigan and Mohanty (1992).
Cheng et al. (2004): bivariate, path of steepest ascent.
Wegman and Luo (2002): mode forest.
27 Example: Model-based Nonparametric Clustering
Selection of the smoothing parameter h is crucial. [Clustering results at several bandwidths.]
28 Multi-scale Modal Clustering
Based on the kernel density approach: points belonging to a common mode of the density estimator define a cluster, and varying the smoothing parameter provides a multi-scale analysis.
Research questions:
What is the interesting range of the kernel smoothing parameter h to explore, and in what incremental steps?
Are the modes significant?
Are the modes connected? (Ridgeline elevation plot.)
What are the starting values at a particular level?
Flat clustering vs Hierarchical Modal Clustering (HMC).
29 Diffusion Process Motivation
When a diffusion kernel K(x, µ; h) is used for the smoothing then, for a continuous-time Markov chain, it represents the probability distribution of the location x of a particle that started at µ, after h² time units. If the process is reversible, the same holds going backward in time.
If we have n points x_1, ..., x_n and ask for the distribution of a randomly selected x_j h² time units in the past, the empirical kernel represents its likely distribution back then. That is, the modes are the most likely evolution points of the past.
30 Choice of h: Spectral Degrees of Freedom
sdof is analogous to the degrees of freedom in a chi-squared goodness-of-fit test [details in Ray 2003, Ray and Lindsay 2006].
Hypothesis testing H_0: G = F_τ, with fit statistic
d_K(f, g) = ∫ (f_h(w) - g_h(w))² dw, where f_h(w) = ∫ K^{(h/2)}(x, w) dF(x) and g_h(w) = ∫ K^{(h/2)}(x, w) dG(x).
Choose h such that 2 < sdof < n/5.
31 Spectral Decomposition
If the kernel satisfies ∫∫_S K(x, y)² dM(x) dM(y) < ∞, then K(x, y) generates a Hilbert-Schmidt operator on L²(M) through the operation (Kg)(x) = ∫ K(x, y) g(y) dM(y).
Theorem (Yosida, 1980): K(x, y) = ∑_{j=1}^{∞} λ_j φ_j(x) φ_j(y), where the λ_j are the eigenvalues and the φ_j(x) are the corresponding normalized eigenfunctions of K under the baseline measure M.
32 Satterthwaite Approximation and SDOF
Asymptotically, d_K(f, g) is distributed as ∑_j λ_j χ²_j(1), which can be further approximated by a single chi-squared with df = spectral degrees of freedom: χ²_{sdof}.
The empirical estimate of sdof is
sdof-hat = (trace F̂(K_{(h)c}))² / trace F̂(K²_{(h)c}) = [ (1/n) ∑_{i=1}^{n} K_{(h)c}(x_i, x_i) ]² / [ (2/(n(n-1))) ∑_{i=1}^{n} ∑_{j<i} K_{(h)c}(x_i, x_j)² ],
where the centered kernel based on the empirical distribution is
K_{(h)c}(x_i, x_j) = K_h(x_i, x_j) - (1/n) ∑_{i'} K_h(x_{i'}, x_j) - (1/n) ∑_{j'} K_h(x_i, x_{j'}) + (1/n²) ∑_{i'} ∑_{j'} K_h(x_{i'}, x_{j'}).
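The empirical estimate above amounts to double-centering the kernel matrix and combining its trace with an off-diagonal U-statistic. A sketch for a Gaussian kernel (the bandwidths in the example are arbitrary; smaller bandwidths give more effective degrees of freedom):

```python
import numpy as np

def sdof(data, h):
    """Empirical spectral degrees of freedom of a Gaussian kernel K_h:
    (trace of the doubly centered kernel / n)^2 divided by the off-diagonal
    U-statistic estimate of the trace of the squared centered kernel."""
    data = np.asarray(data, float)
    n = len(data)
    sq = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=2)
    K = np.exp(-0.5 * sq / h ** 2)
    # doubly centered kernel: K_c = K - row means - column means + grand mean
    Kc = K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    num = (np.trace(Kc) / n) ** 2
    iu = np.triu_indices(n, k=1)
    den = 2.0 / (n * (n - 1)) * np.sum(Kc[iu] ** 2)
    return num / den

data = np.linspace(0, 5, 50)[:, None]
print(sdof(data, 0.05), sdof(data, 2.0))
```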
33 Practical Issues Regarding the Choice of h
Meaningful clusters: the values of h should make a difference in the cluster allocation of sample points.
Select a range and intermediate points.
Number of clusters and sdof: do they have a connection?
34 Example: Hierarchical Modal Clustering
35 Modal Ridges: Gap Test
Purpose: to measure the separateness of each pair of modes.
Evaluate the height along the ridgeline connecting the two modes; the modal ridge is determined by the height of the saddle between the two modes.
Use a paired t-test of the kernel density estimates, with multiple-testing corrections.
36 Modal Ridges: Gap Test
[Significant ridgeline plots at different levels of h.]
37 Dimension Reduction Using the Ridge Direction
Visualization of modes: in high dimensions, how do we visualize modes?
Using the ridgelines, can we find interesting linear or non-linear directions onto which to project the data?
Direction of maximum heterogeneity.
38 Rees Data Analysis (used by Hartigan)
Rees (1993) determined the proper motions of 515 stars in the region of the globular cluster M5.
39 Rees Data: parametric modal clustering starting with 9 clusters. [Scatter plot with component means marked.]
40 Rees Data: parametric modal clustering starting with 10 clusters. [Scatter plot.]
41 Rees Data: parametric modal clustering starting with 14 clusters. [Scatter plot.]
42-44 Rees Data: Nonparametric Clustering
Nonparametric modal clustering, shown at three values of the bandwidth h.
45 Rees Data: Nonparametric Hierarchical Modal Clustering
46 Image Segmentation
Cluster the pixel color components and label pixels in the same cluster as one region: this partitions an image into relatively homogeneous regions.
Comparison with k-means clustering with respect to computational cost, prior specification, and interpretation of results.
47 Image Segmentation: Impressionist Paintings
[Segmentations: HMAC modes vs k-means means.]
48 Image Segmentation: Conclusion
K-means clusters the pixels and computes the centroid vector for each cluster; HMAC finds the modal vectors, which are the bump peaks of the density.
The representative colors generated by k-means tend to be muddier due to averaging. HMAC retains the true colors better: the white of the stars in the first picture, the purplish pink of the vase in the second.
HMAC may ignore colors that are not distinct enough from others or do not occupy enough pixels. For finding the main palette of a painting, HMC may be preferable.
49-50 Medical Image Segmentation
[Example segmentations of medical images.]
51 Conclusion
Combines the best of two worlds: model-based and distance-based/partition clustering.
Multivariate in nature.
Unlike hierarchical clustering, each step is a global analysis (for kernels with appropriate support).
We use knowledge of the topography of mixtures to merge components whose combined contribution is unimodal (pruning).
Robust to specification of the underlying probability model; can be completely nonparametric.
Underlying theme: computational simplicity, even in higher dimensions.
52 Extensions of Modal Clustering
Determining marginal modes.
Determining conditional modes.
Missing-data problems.
Modal parametric distributions.
Modal nonparametric distributions.
53 Collaborators
Mixture models, quadratic distance, modal clusters: Bruce G. Lindsay, Marianthi Markatou, Shu-Chuan (Grace) Chen, Jia Li.
HDLSS geometry, bi-clustering, analysis of microarray data: Steve Marron.
Proteomics (classification of MHC binders, vaccine design, HMMs): Thomas B. Kepler, Andrew Nobel, Scott Schmidler.
Medical imaging, non-linear shape statistics, imaging using tissue mixtures: Keith E. Muller, Stephen M. Pizer, Joshua Stough (PhD student), Ja Yeon Jeong (PhD student).
Bayesian model selection, generalized BIC, structural equation models: Jim Berger, Kenneth Bollen, Jane Zavisca, M. J. (Susie) Bayarri.
54 References
Topography of mixtures and modal clustering:
Ray, S. and Lindsay, B. G. (2005). The topography of multivariate normal mixtures. The Annals of Statistics, 33(5), 2042-2065.
Ray, S. (2003). Distance-based model-selection with application to the analysis of gene expression data. Electronic thesis.
HDLSS and microarray analysis:
Ray, S. and Marron, J. S. Feature selection in high dimensional low sample size data using two-way mixtures.
Model selection in high dimensions:
Ray, S. and Lindsay, B. G. Model selection in high-dimensions: a quadratic-risk based approach. [Submitted]
Quadratic distances:
Lindsay, B. G., Ray, S., Chen, S. C., Yang, K. and Markatou, M. Quadratic distances on probabilities: the foundations. [Submitted]
Lindsay, B. G., Ray, S., Chen, S. C., Yang, K. and Markatou, M. Quadratic distances on probabilities: multivariate construction using tuneable diffusion kernels. [Tech report]
More informationJ. Cwik and J. Koronacki. Institute of Computer Science, Polish Academy of Sciences. to appear in. Computational Statistics and Data Analysis
A Combined Adaptive-Mixtures/Plug-In Estimator of Multivariate Probability Densities 1 J. Cwik and J. Koronacki Institute of Computer Science, Polish Academy of Sciences Ordona 21, 01-237 Warsaw, Poland
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationExtreme Value Analysis and Spatial Extremes
Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models
More informationA Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait
A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationForecasting Wind Ramps
Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators
More informationA Fully Nonparametric Modeling Approach to. BNP Binary Regression
A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationDEGREES OF FREEDOM IN QUADRATIC GOODNESS OF FIT. Pennsylvania State University, Columbia University, Boston University
Submitted to the Annals of Statistics DEGREES OF FREEDOM IN QUADRATIC GOODNESS OF FIT By Bruce G. Lindsay, Marianthi Markatou and Surajit Ray Pennsylvania State University, Columbia University, Boston
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationReview and Motivation
Review and Motivation We can model and visualize multimodal datasets by using multiple unimodal (Gaussian-like) clusters. K-means gives us a way of partitioning points into N clusters. Once we know which
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationParametric Empirical Bayes Methods for Microarrays
Parametric Empirical Bayes Methods for Microarrays Ming Yuan, Deepayan Sarkar, Michael Newton and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 General Model Structure: Two Conditions
More informationClustering and Model Integration under the Wasserstein Metric. Jia Li Department of Statistics Penn State University
Clustering and Model Integration under the Wasserstein Metric Jia Li Department of Statistics Penn State University Clustering Data represented by vectors or pairwise distances. Methods Top- down approaches
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationLecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1
Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring
More informationSpectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures
Spectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures STIAN NORMANN ANFINSEN ROBERT JENSSEN TORBJØRN ELTOFT COMPUTATIONAL EARTH OBSERVATION AND MACHINE LEARNING LABORATORY
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationEstimation of cumulative distribution function with spline functions
INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationHybrid Dirichlet processes for functional data
Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationKernel Methods. Lecture 4: Maximum Mean Discrepancy Thanks to Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, Jiayuan Huang, Arthur Gretton
Kernel Methods Lecture 4: Maximum Mean Discrepancy Thanks to Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, Jiayuan Huang, Arthur Gretton Alexander J. Smola Statistical Machine Learning Program Canberra,
More informationBayesian Nonparametrics
Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent
More informationOptimal Estimation of a Nonsmooth Functional
Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationSome Curiosities Arising in Objective Bayesian Analysis
. Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationSemi-Nonparametric Inferences for Massive Data
Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationSimultaneous inference for multiple testing and clustering via a Dirichlet process mixture model
Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model David B Dahl 1, Qianxing Mo 2 and Marina Vannucci 3 1 Texas A&M University, US 2 Memorial Sloan-Kettering
More informationGaussian processes for inference in stochastic differential equations
Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More information