BU Statistics Seminar Series Fall 2006


1 Modal EM for Mixtures and its Application in Clustering. BU Statistics Seminar Series, Fall 2006. Surajit Ray, Boston University.

2 Why Study Modality? Inference about the actual data-generating process. Potentially symptomatic of underlying population structure or substructure. Modes are stable and more elemental than the components of a mixture model: for example, to model a single t-distribution we may require a mixture of many normals with the same mean but different variances.

3 Mixture of Distributions. K-component mixture: $f(x) = \sum_{j=1}^{K} \pi_j f_j(x; \lambda_j, \theta)$, $x \in \mathbb{R}^D$, where the $f_j(x; \lambda_j, \theta)$ are the component densities and the $\pi_j$ are mixing proportions with $0 \le \pi_j \le 1$ for all $j$ and $\sum_{j=1}^{K} \pi_j = 1$; $\pi$ lives in a $(K-1)$-simplex.
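
As a quick illustration of this definition, here is a minimal Python sketch (NumPy/SciPy; the two-component parameter values are made up for the example) that evaluates a K-component normal mixture density at a point.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """f(x) = sum_j pi_j * f_j(x; mu_j, Sigma_j) for a normal mixture."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=S)
               for w, m, S in zip(weights, means, covs))

# Toy two-component bivariate mixture (illustrative values only).
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
covs = [np.eye(2), np.eye(2)]
print(mixture_density(np.array([1.0, 0.5]), weights, means, covs))
```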

4 Mixture of Distributions. A flexible way of modeling a heterogeneous population. Data reduction technique: visualization. Directly applicable to model-based clustering [Ray 2003, Ray and Lindsay 2006].

5 Contents. Modal EM and modal clusters of parametric mixtures; modes and clusters from kernel density estimates (nonparametric modes); multi-scale modal clustering, or hierarchical modal clustering; further pruning of modes: modal ridges, gap test, hilltop test, coverage; visualization of modes and dimension reduction: direction of maximum homogeneity; applications: parametric mixture modal clustering, multi-scale modal clustering; conclusion.

6 Components vs Modes. 5-component bivariate normal mixture. [Figure: contour plot of the mixture with the five component means marked.]

7 Components vs Modes. 5-component bivariate normal mixture: only 3 modes. [Figure: the same mixture; the five component means give rise to only three modes.]

8 Number of Components ≠ Number of Modes. [Figures: a mixture of normals 4 standard deviations apart is bimodal; a mixture of normals 2 standard deviations apart is unimodal.] For univariate normals: number of components ≥ number of modes.

9 Number of Components ≠ Number of Modes. [Figures: mixture of normals 4 standard deviations apart, bimodal; mixture of normals 2 standard deviations apart, unimodal.] Condition for bimodality of a mixture of two univariate normals (Helguero, 1904): for $X \sim \pi N(\mu_1, \sigma) + (1-\pi) N(\mu_2, \sigma)$ with $\pi = 1/2$, the mixture is bimodal iff $|\mu_2 - \mu_1| > 2\sigma$. Conditions for arbitrary univariate mixtures are given in Eisenberger (1964), Behboodian (1970), and Robertson and Fryer (1969).

10 Parametric Modes: Multivariate Data, Equal Variance. Not much previous work in this area except Ray and Lindsay (2005). A mixture of multivariate normals is bimodal iff the univariate distribution along some line is bimodal. Get the axis of maximum separation and use the univariate bimodality condition to determine modality. For a mixture of two multivariate normals this is the line joining the two means.

11 Parametric Modes: Multivariate Data, Equal Variance. Not much previous work in this area except Ray and Lindsay (2005). A mixture of multivariate normals is bimodal iff the univariate distribution along some line is bimodal. Get the axis of maximum separation and use the univariate bimodality condition to determine modality. For a mixture of two multivariate normals this is the line joining the two means. Theorem 1 (Multivariate modality condition, equal-variance case): for $X \sim \pi N(\mu_1, \Sigma) + (1-\pi) N(\mu_2, \Sigma)$, the distribution of X is bimodal iff $(\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2) > 4$.
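
A minimal numerical check of this condition, sketched below assuming the equal-weight (π = 1/2) case as in the univariate condition above, with made-up parameter values:

```python
import numpy as np

def equal_cov_bimodal(mu1, mu2, Sigma):
    """Theorem 1 check: bimodal iff (mu1 - mu2)' Sigma^{-1} (mu1 - mu2) > 4."""
    d = np.asarray(mu1, float) - np.asarray(mu2, float)
    maha_sq = float(d @ np.linalg.solve(Sigma, d))
    return maha_sq > 4.0, maha_sq

# Separation of sqrt(3) Mahalanobis units: below the threshold, so unimodal.
print(equal_cov_bimodal(np.zeros(3), np.ones(3), np.eye(3)))   # (False, 3.0)
```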

12 The Ridgeline Manifold. $g(x) = \pi_1 \phi(x; \mu_1, \Sigma_1) + \pi_2 \phi(x; \mu_2, \Sigma_2) + \cdots + \pi_K \phi(x; \mu_K, \Sigma_K)$. Define the mapping $x^*: \mathbb{R}^{K-1} \to \mathbb{R}^{D}$ by $x^*(\alpha) = \left[\alpha_1 \Sigma_1^{-1} + \alpha_2 \Sigma_2^{-1} + \cdots + \alpha_K \Sigma_K^{-1}\right]^{-1} \left[\alpha_1 \Sigma_1^{-1} \mu_1 + \alpha_2 \Sigma_2^{-1} \mu_2 + \cdots + \alpha_K \Sigma_K^{-1} \mu_K\right]$. The image of this map will be denoted by M and called the ridgeline surface or manifold; if K = 2 it will be called the ridgeline, as it is a one-dimensional curve. Theorem 2: All the critical values of the D-dimensional multivariate mixture, and hence its modes, antimodes and saddle points, lie in the manifold M given by $x^*(\alpha)$. Enormous dimension reduction for $D \gg K$: explore only $K-1$ dimensions.
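
A sketch of the K = 2 ridgeline in Python (NumPy/SciPy; the parameter values are made up): it traces $x^*(\alpha)$ on a grid and reads off the modes as interior local maxima of the ridgeline elevation $h(\alpha) = g(x^*(\alpha))$, in the spirit of the elevation plots discussed on the next slide.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ridgeline(alpha, mu1, mu2, S1, S2):
    """x*(a) = [a S1^-1 + (1-a) S2^-1]^-1 [a S1^-1 mu1 + (1-a) S2^-1 mu2]."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    return np.linalg.solve(alpha * P1 + (1 - alpha) * P2,
                           alpha * P1 @ mu1 + (1 - alpha) * P2 @ mu2)

def ridgeline_modes(pi1, mu1, mu2, S1, S2, grid=2001):
    """Approximate mode locations as interior local maxima of h(alpha) = g(x*(alpha))."""
    alphas = np.linspace(0.0, 1.0, grid)
    xs = np.array([ridgeline(a, mu1, mu2, S1, S2) for a in alphas])
    h = (pi1 * multivariate_normal.pdf(xs, mean=mu1, cov=S1)
         + (1 - pi1) * multivariate_normal.pdf(xs, mean=mu2, cov=S2))
    peaks = (h[1:-1] > h[:-2]) & (h[1:-1] > h[2:])
    return xs[1:-1][peaks]

modes = ridgeline_modes(0.5, np.zeros(2), np.array([3.0, 1.0]),
                        np.eye(2), np.diag([0.5, 2.0]))
print(len(modes), modes)
```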

13 Modality and Mixing Proportion: Π-plots. Ridgeline elevation/contour plots: full information (location and height) of the modes and saddle points; no dependence on π (the mixing proportions). Π-plots and analytical solutions: only the location of the critical points, but with dependence on the mixing parameter π.

14 The Π-equation. Two-component case: $x^*(\alpha)$ is a critical value of g if it satisfies $h'(\alpha) = \pi\, \tfrac{d}{d\alpha}\phi_1(x^*(\alpha)) + (1 - \pi)\, \tfrac{d}{d\alpha}\phi_2(x^*(\alpha)) = 0$, where $h(\alpha) = g(x^*(\alpha))$. Writing $\phi_j'(\alpha)$ for $\tfrac{d}{d\alpha}\phi_j(x^*(\alpha))$, solving gives $\Pi(\alpha) = \phi_2'(\alpha) / \big(\phi_2'(\alpha) - \phi_1'(\alpha)\big)$, so if $\alpha$ is a critical value then it solves the Π-equation $\Pi(\alpha) = \pi$. 1 solution ⇒ 1 mode; 3 solutions ⇒ 2 modes; 5 solutions ⇒ 3 modes.

15 Analytical Solution for the Number of Modes. The key quantity is $\kappa(\alpha) = [p(\alpha)]^2\, [1 - \alpha\bar{\alpha}\, p(\alpha)]$, where $p(\alpha)$ is a quadratic form $(\mu_2 - \mu_1)^\top [\,\cdot\,] (\mu_2 - \mu_1)$ built from $\Sigma_1^{-1}$, $\Sigma_2^{-1}$ and $S_\alpha$, with $S_\alpha = \big[\alpha_1 \Sigma_1^{-1} + \alpha_2 \Sigma_2^{-1}\big]^{-1}$ and $x^*(\alpha) = S_\alpha \big[\alpha_1 \Sigma_1^{-1}\mu_1 + \alpha_2 \Sigma_2^{-1}\mu_2\big]$.

16 Analytical Solution for the Number of Modes. The key quantity is $\kappa(\alpha) = [p(\alpha)]^2\, [1 - \alpha\bar{\alpha}\, p(\alpha)]$, where $p(\alpha)$ is a quadratic form $(\mu_2 - \mu_1)^\top [\,\cdot\,] (\mu_2 - \mu_1)$ built from $\Sigma_1^{-1}$, $\Sigma_2^{-1}$ and $S_\alpha$, with $S_\alpha = \big[\alpha_1 \Sigma_1^{-1} + \alpha_2 \Sigma_2^{-1}\big]^{-1}$ and $x^*(\alpha) = S_\alpha \big[\alpha_1 \Sigma_1^{-1}\mu_1 + \alpha_2 \Sigma_2^{-1}\mu_2\big]$. Finding the critical points reduces to solving a quadratic in the equal-variance case and a cubic in the proportional-variance case. Open question: an upper bound on the number of modes in the unequal-variance case.

17 Modal EM and Modal Clusters of Parametric Mixtures. [Figure: points x1, x2, x3, x4 and their paths of steepest ascent to the associated modes.] Any point x can move along its path of steepest ascent with respect to the density. The associated mode of x is the point beyond which no further ascent is possible.

18 MEM: The Modal EM. The Modal EM (MEM) algorithm solves for a local maximum of a mixture density by an ascending iterative procedure starting from any initial point. 1. Expectation step: $p_k = \pi_k f_k(x^{(r)}) / f(x^{(r)})$, $k = 1, \ldots, K$. 2. Maximization step: $x^{(r+1)} = \arg\max_x \sum_{k=1}^{K} p_k \log f_k(x)$.

19 MEM: The Modal EM. The Modal EM (MEM) algorithm solves for a local maximum of a mixture density by an ascending iterative procedure starting from any initial point. 1. Expectation step: $p_k = \pi_k f_k(x^{(r)}) / f(x^{(r)})$, $k = 1, \ldots, K$. 2. Maximization step: $x^{(r+1)} = \arg\max_x \sum_{k=1}^{K} p_k \log f_k(x)$. MM [Hunter and Lange] is not a single algorithm but a prescription for constructing EM-like algorithms: build a surrogate function by majorizing or minorizing the objective. A likelihood or missing-data framework is not necessary; convexity of the relevant functions and the associated inequalities are what matter.

20 Modal EM: Normal Mixture. For a normal mixture with common covariance Σ, the M-step reduces to $x^{(r+1)} = \sum_j \Pr(J = j \mid X = x^{(r)})\, \mu_j$, with posterior probability $\Pr(J = j \mid X = x^{(r)}) = \pi_j f_j(x^{(r)}) / \sum_k \pi_k f_k(x^{(r)})$.
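
A minimal NumPy/SciPy sketch of this fixed-point iteration for a common-covariance normal mixture (the tolerances and parameter names are my own choices, not from the talk):

```python
import numpy as np
from scipy.stats import multivariate_normal

def modal_em_normal(x0, weights, means, cov, tol=1e-10, max_iter=500):
    """Ascend from x0 to a local mode of sum_j pi_j N(mu_j, Sigma), common Sigma."""
    x = np.asarray(x0, dtype=float)
    means = np.asarray(means, dtype=float)
    for _ in range(max_iter):
        # E-step: posterior probabilities Pr(J = j | X = x)
        dens = np.array([w * multivariate_normal.pdf(x, mean=m, cov=cov)
                         for w, m in zip(weights, means)])
        p = dens / dens.sum()
        # M-step: closed form for common covariance -- posterior-weighted mean
        x_new = p @ means
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```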

21 Properties of Modal EM. The iteration monotonically increases the density, i.e. $g(x^{(r+1)}) \ge g(x^{(r)})$. It is a discrete approximation to the path of steepest ascent, followed until no further ascent is possible. The steps are computationally simple even in high dimensions.

22 Modal EM for Non-normal Distributions. t-distribution M-step: $f_k(x; \mu_k, \Sigma, \nu) = \dfrac{\Gamma[(\nu + D)/2]}{(\pi\nu)^{D/2}\, |\Sigma|^{1/2}\, \Gamma[\nu/2]} \left(1 + \dfrac{(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)}{\nu}\right)^{-(\nu + D)/2}$. Poisson distribution: modal merging is even more important for this one-parameter distribution. Use integer optimization and the fact that $f(x+1)/f(x) = \lambda/(x+1)$, or a smooth function (via Stirling's approximation) that replicates the ease of solution.

23 Defining Modal Clusters Using Modal EM. For each µ_j in the support, run Modal EM with µ_j as the starting value. Group together all µ_j that share the same associated mode and call the group a modal cluster. EXAMPLE: modal clusters of a parametric mixture, n = 80; four estimated component means (values omitted here).
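
Continuing the sketch from the previous slide, grouping the component means by the mode each one ascends to might look like this (the rounding tolerance used to match modes is an arbitrary choice for the illustration):

```python
import numpy as np

def modal_clusters(weights, means, cov, decimals=4):
    """Run Modal EM from each component mean; means sharing a limit form one modal cluster."""
    mode_index, modes = {}, []
    labels = np.empty(len(means), dtype=int)
    for j, mu in enumerate(means):
        mode = modal_em_normal(mu, weights, means, cov)   # sketch from the previous slide
        key = tuple(np.round(mode, decimals))
        if key not in mode_index:
            mode_index[key] = len(modes)
            modes.append(mode)
        labels[j] = mode_index[key]
    return labels, modes
```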

24 EXAMPLE: modal clusters of a parametric mixture. Two estimated modes (values omitted here). [Figure: scatter plot with the four component means and the two modes marked.]

25 Motivation Behind Kernel-Density-Based Clustering. No parametric assumptions are necessary. The kernel density is obtained by placing a normal kernel on each data point: K(x, x_i; h). It is exactly a finite mixture model with n components, so this is a natural extension of parametric modal clustering.
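
Because the Gaussian kernel density estimate is itself an n-component mixture with equal weights 1/n and common covariance h²I, the same Modal EM update applies with the data points playing the role of component means. A minimal sketch (the bandwidth, tolerances, and rounding are illustrative choices):

```python
import numpy as np

def kde_modal_em(x0, data, h, tol=1e-8, max_iter=1000):
    """Ascend the Gaussian KDE, an n-component mixture with common covariance h^2 I."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        w = np.exp(-0.5 * np.sum((data - x) ** 2, axis=1) / h ** 2)
        w /= w.sum()                      # E-step: posterior over the n kernels
        x_new = w @ data                  # M-step: posterior-weighted mean of the data
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

def nonparametric_modal_clustering(data, h, decimals=3):
    """Label each point by the KDE mode it ascends to."""
    keys = [tuple(np.round(kde_modal_em(x, data, h), decimals)) for x in data]
    index = {k: i for i, k in enumerate(dict.fromkeys(keys))}
    return np.array([index[k] for k in keys])
```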

26 History of Nonparametric Modality Tests. Hartigan and Hartigan (1985) and Cheng and Hall (1998): the DIP test (univariate, using local minima and maxima). The SPAN test and RUNT test: Hartigan (1998) and Hartigan and Mohanty (1992). Cheng et al. (2004): bivariate, path of steepest ascent. Wegman and Luo (2002): the mode forest.

27 Example: Model-Based Nonparametric Clustering. Selection of the smoothing parameter h is extremely crucial.

28 Multi-scale Modal Clustering. Based on the kernel density approach: points belonging to a common mode of the density estimator define a cluster, and varying the smoothing parameter provides a multi-scale analysis. Research questions: the interesting range of h (kernel smoothing) to be explored; the incremental steps for h; whether the modes are significant; whether the modes are connected (ridgeline elevation plot); the starting value for a particular level. Flat clustering vs. Hierarchical Modal Clustering (HMC).
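
A rough sketch of the multi-scale idea, reusing the nonparametric clustering sketch above with an arbitrary grid of bandwidths; a full hierarchical (HMC) implementation would additionally nest the partitions across levels.

```python
import numpy as np

def multiscale_modal_clustering(data, bandwidths):
    """Cluster at each smoothing level h; increasing h merges modes and clusters."""
    levels = {}
    for h in sorted(bandwidths):
        labels = nonparametric_modal_clustering(data, h)   # sketch from the earlier slide
        levels[h] = labels
        print(f"h = {h:.3f}: {labels.max() + 1} modal clusters")
    return levels

# Example usage with an illustrative bandwidth grid:
# levels = multiscale_modal_clustering(data, np.linspace(0.1, 1.0, 10))
```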

29 Diffusion Process Motivation. When a diffusion kernel K(x, µ; h) is used for the smoothing then, for a continuous-time Markov chain, it represents the probability distribution of the location x of a particle that started at µ, after h² time units. If the process is reversible, this also works going backward in time h units. If we have n points x_1, …, x_n and ask for the distribution of a randomly selected x_j h units in the past, then the empirical kernel represents its likely distribution h units ago. That is, the modes are the most likely evolution points of the past.

30 Choice of h: Spectral Degrees of Freedom. The sdof is analogous to the degrees of freedom in chi-squared goodness of fit [details in Ray 2003, Ray and Lindsay 2006]. Hypothesis testing $H_0: G = F_\tau$; fit statistic $d_K(f, g) = \int \big(f_h(w) - g_h(w)\big)^2\, dw$, with $f_h(w) = \int K_{(h/2)}(x, w)\, dF(x)$ and $g_h(w) = \int K_{(h/2)}(x, w)\, dG(x)$. Choose h such that $2 < \mathrm{sdof} < n/5$.

31 Spectral Decomposition. If $\int_S \int_S K(x, y)^2\, dM(x)\, dM(y) < \infty$, then K(x, y) generates a Hilbert-Schmidt operator on $L^2(M)$ through the operation $(Kg)(x) = \int K(x, y)\, g(y)\, dM(y)$. Theorem (Yosida, 1980): $K(x, y) = \sum_{j=1}^{\infty} \lambda_j \phi_j(x) \phi_j(y)$, where the $\lambda_j$ are eigenvalues and the $\phi_j(x)$ are the corresponding normalized eigenfunctions of K under the baseline measure M.

32 Satterthwaite Approximation and SDOF. Asymptotically $d_K(f, g) \overset{d}{=} \sum_{j \ge 1} \lambda_j \chi^2_j(1)$, which can be further approximated by a single chi-squared with df equal to the spectral degrees of freedom, $\chi^2_{\mathrm{sdof}}$. The empirical estimate of sdof is given by $\widehat{\mathrm{sdof}} = \dfrac{\big(\mathrm{trace}_{\hat F}(K_{(h)c})\big)^2}{\mathrm{trace}_{\hat F}(K_{(h)c}^2)} = \dfrac{\big[\tfrac{1}{n}\sum_{i=1}^{n} K_{(h)c}(x_i, x_i)\big]^2}{\tfrac{2}{n(n-1)} \sum_{i=1}^{n}\sum_{j<i} K_{(h)c}(x_i, x_j)^2}$, where the centered kernel based on the empirical distribution is $K_{(h)c}(x_i, x_j) = K_h(x_i, x_j) - \tfrac{1}{n}\sum_{i'} K_h(x_{i'}, x_j) - \tfrac{1}{n}\sum_{j'} K_h(x_i, x_{j'}) + \tfrac{1}{n^2}\sum_{i'}\sum_{j'} K_h(x_{i'}, x_{j'})$.
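
A sketch of the empirical sdof computation from a Gaussian kernel matrix, following the centering and trace formulas above (the Gaussian kernel and the bandwidth are illustrative choices):

```python
import numpy as np

def empirical_sdof(data, h):
    """Spectral degrees of freedom estimated from the empirically centered kernel matrix."""
    n = len(data)
    sq = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq / h ** 2)                  # kernel matrix K_h(x_i, x_j)
    row = K.mean(axis=1, keepdims=True)
    Kc = K - row - row.T + K.mean()                 # double-centered kernel K_(h)c
    numerator = (np.trace(Kc) / n) ** 2
    iu = np.triu_indices(n, k=1)
    denominator = (2.0 / (n * (n - 1))) * np.sum(Kc[iu] ** 2)
    return numerator / denominator
```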

33 Practical Issues Regarding the Choice of h. Meaningful clusters: the values of h should make a difference in the cluster allocation of the sample points. Select a range and intermediate points. Number of clusters and sdof: do they have a connection?

34 Example: Hierarchical Modal Clustering.

35 Modal Ridges: Gap Test. Purpose: to find out how separated each pair of modes is. Evaluate the height along the ridgeline connecting the two modes; the modal ridge is determined by the height of the saddle between the two modes. Use a paired t-test of the kernel density estimates and account for multiple testing.

36 Modal Ridges: Gap Test. Significant ridgeline plots at different levels of h.

37 Dimension Reduction Using the Ridge Direction. Visualization of modes: in high dimensions, how do we visualize modes? Using the ridgelines, can we find interesting linear or nonlinear directions onto which to project the data for visualization? Direction of maximum heterogeneity.
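
One simple linear version of this idea, sketched under the assumption that two modes m1 and m2 have already been located (for instance with the Modal EM sketches above): project the data onto the direction joining the modes and inspect the one-dimensional result.

```python
import numpy as np

def project_on_mode_direction(data, m1, m2):
    """Return 1-D scores of the observations along the unit vector from mode m1 to mode m2."""
    direction = np.asarray(m2, dtype=float) - np.asarray(m1, dtype=float)
    direction /= np.linalg.norm(direction)
    return (np.asarray(data, dtype=float) - m1) @ direction
```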

38 Rees Data Analysis used by Hartigan. Rees (1993) determined the proper motions of 515 stars in the region of the globular cluster M5.

39 Rees Data Analysis used by Hartigan. Rees (1993) determined the proper motions of 515 stars in the region of the globular cluster M5. [Figure: parametric modal clustering starting with 9 clusters; the fitted component means are marked.]

40 Rees Data Analysis used by Hartigan. Rees (1993) determined the proper motions of 515 stars in the region of the globular cluster M5. [Figure: parametric modal clustering starting with 10 clusters.]

41 Rees Data Analysis used by Hartigan. Rees (1993) determined the proper motions of 515 stars in the region of the globular cluster M5. [Figure: parametric modal clustering starting with 14 clusters.]

42 Rees Data: Non-parametric Clustering. [Figure: nonparametric modal clustering at bandwidth h (value shown on slide).]

43 Rees Data: Non-parametric Clustering. [Figure: nonparametric modal clustering at bandwidth h (value shown on slide).]

44 Rees Data: Non-parametric Clustering. [Figure: nonparametric modal clustering at bandwidth h (value shown on slide).]

45 Rees Data: Non-parametric Clustering. Nonparametric hierarchical modal clustering.

46 Image Segmentation. Cluster the pixel color components and label pixels in the same cluster as one region; this partitions the image into relatively homogeneous regions. Comparison with k-means clustering with respect to computational cost, prior specification, and interpretation of the result.
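
A rough sketch of the segmentation setup, assuming an RGB image stored as an (H, W, 3) array and reusing the nonparametric modal clustering sketch from earlier (the bandwidth is an arbitrary choice, and a practical implementation would subsample or bin the colors for speed):

```python
import numpy as np

def segment_image(image, h=0.1):
    """Cluster pixel colors by the KDE mode each one ascends to; relabel pixels by cluster."""
    height, width, _ = image.shape
    colors = image.reshape(-1, 3).astype(float) / 255.0     # pixels as points in RGB space
    labels = nonparametric_modal_clustering(colors, h)       # sketch from the earlier slide
    return labels.reshape(height, width)
```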

47 Image Segmentation: Impressionist. [Figures: segmentation results; panels labeled "HMAC modes" and "K-means means".]

48 Image Segmentation: Conclusion. K-means clusters the pixels and computes the centroid vector for each cluster; HMAC finds the modal vectors, which are the bump peaks of the density. The representative colors generated by k-means tend to be muddier due to averaging, whereas HMAC retains the true colors better: the white of the stars in the first picture, the purplish pink of the vase in the second. HMAC may ignore certain colors that either are not distinct enough from the others or do not occupy enough pixels. For the purpose of finding the main palette of a painting, HMC may be preferable.

49 Medical Image Segmentation.

50 Medical Image Segmentation.

51 Conclusion. Combines the best of both worlds: model-based and distance-based/partitional clustering. Multivariate in nature. Unlike hierarchical clustering, at each step it is a global analysis (for kernels with appropriate support). We use the knowledge of the topography of mixtures to merge components whose combined contribution is unimodal (pruning). Robust to specification of the underlying probability model, and can be completely nonparametric. The underlying theme is computational simplicity, even in higher dimensions.

52 Extensions of Modal Clustering. Determining marginal modes. Determining conditional modes. Missing data problems. Modal parametric distributions. Modal nonparametric distributions.

53 Collaborators. Mixture models, quadratic distance, modal clusters: Bruce G. Lindsay, Marianthi Markatou, Shu-Chuan (Grace) Chen, Jia Li. HDLSS geometry, bi-clustering, analysis of microarray data: Steve Marron. Proteomics (classification of MHC binders, vaccine design, HMM): Thomas B. Kepler, Andrew Nobel, Scott Schmidler. Medical imaging, non-linear shape statistics, imaging using tissue mixtures: Keith E. Muller, Stephen M. Pizer, Joshua Stough (PhD student), Ja Yeon Jeong (PhD student). Bayesian model selection, generalized BIC, structural equation models: Jim Berger, Kenneth Bollen, Jane Zavisca, M. J. (Susie) Bayarri.

54 References. Topography of mixtures and modal clustering: Ray, S. and Lindsay, B. (2005). The topography of multivariate normal mixtures. The Annals of Statistics 33(5), 2042-2065. Ray, S. (2003). Distance-based model selection with application to the analysis of gene expression data. Electronic thesis. HDLSS and microarray analysis: Ray, S. and Marron, J.S. Feature selection in high dimensional low sample size data using two-way mixtures. Model selection in high dimensions: Ray, S. and Lindsay, B.G. Model selection in high dimensions: a quadratic-risk based approach. [Submitted] Quadratic distances: Lindsay, B.G., Ray, S., Chen, S.C., Yang, K. and Markatou, M. Quadratic distances on probabilities: the foundations. [Submitted] Lindsay, B.G., Ray, S., Chen, S.C., Yang, K. and Markatou, M. Quadratic distances on probabilities: multivariate construction using tuneable diffusion kernels. [Tech report]
