Estimation for nonparametric mixture models

1 Estimation for nonparametric mixture models
David Hunter, Penn State University
Research supported by NSF Grant SES
Joint work with Didier Chauveau (University of Orléans, France) and Tatiana Benaglia (Cambridge University)
Purdue University, November 12, 2009

2 Finite mixture estimation problem
Goal: Estimate $\lambda_j$ and $f_j$ (or $f_{jk}$) given an i.i.d. sample from:
Univariate case: $g(x) = \sum_{j=1}^m \lambda_j f_j(x)$

3 Finite mixture estimation problem
Goal: Estimate $\lambda_j$ and $f_j$ (or $f_{jk}$) given an i.i.d. sample from:
Univariate case: $g(x) = \sum_{j=1}^m \lambda_j f_j(x)$
Multivariate case: $g(\mathbf{x}) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$
N.B.: Assume conditional independence of $x_1, \ldots, x_r$.

4 Finite mixture estimation problem
Goal: Estimate $\lambda_j$ and $f_j$ (or $f_{jk}$) given an i.i.d. sample from:
Univariate case: $g(x) = \sum_{j=1}^m \lambda_j f_j(x)$
Multivariate case: $g(\mathbf{x}) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$
N.B.: Assume conditional independence of $x_1, \ldots, x_r$.
Motivation for this talk: Do not assume any more than necessary about the parametric form of $f_j$ or $f_{jk}$.

5 Advertisement
All computational techniques in this talk are implemented in a package called mixtools for R.
mixtools is on CRAN.
See the recent J. Stat. Soft. article (Benaglia et al., 2009).

6 Outline of talk
1 Motivating examples and notation
2 Review of EM algorithm-ology
3 Identifiability: The univariate case
4 The multivariate non-parametric EM algorithm

7 Outline: Next up...
1 Motivating examples and notation
2 Review of EM algorithm-ology
3 Identifiability: The univariate case
4 The multivariate non-parametric EM algorithm

8 Univariate example: Old Faithful wait times (min.)
[Figure: histogram of the time between Old Faithful eruptions, in minutes]
Obvious bimodality
Normal-looking components (?)
More on this later!

9 Multivariate example: Water-level angles
This example is due to Thomas, Lohaus, and Brainerd (1993).
The task: Subjects are shown 8 vessels, pointing at 1:00, 2:00, 4:00, 5:00, 7:00, 8:00, 10:00, and 11:00.
They draw the water surface for each.
Measure: (signed) angle with the horizontal formed by the drawn surface.
[Figure: vessel tilted to point at 1:00]

10 Notational nightmare
We have:
n = # of individuals in the sample
m = # of Mixture components
r = # of Repeated measurements (coordinates per observation)

11 Notational nightmare
We have:
n = # of individuals in the sample
m = # of Mixture components
r = # of Repeated measurements (coordinates per observation)
Thus, the log-likelihood given data $x_1, \ldots, x_n$ is
$L(\theta) = \sum_{i=1}^n \log \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_{ik})$
Note the subscripts: Throughout, we use $1 \le i \le n$, $1 \le j \le m$, $1 \le k \le r$.
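
To make the notation concrete, here is a minimal base-R sketch (not from the talk) that evaluates this log-likelihood for an n x r data matrix, assuming the component densities are supplied as a list of lists of R functions; the two-component normal example at the end is purely hypothetical.

# Evaluate L(theta) = sum_i log sum_j lambda_j prod_k f_jk(x_ik)
loglik <- function(x, lambda, f) {
  n <- nrow(x); m <- length(lambda); r <- ncol(x)
  total <- 0
  for (i in 1:n) {
    mix <- 0
    for (j in 1:m) {
      prod_k <- 1
      for (k in 1:r) prod_k <- prod_k * f[[j]][[k]](x[i, k])
      mix <- mix + lambda[j] * prod_k
    }
    total <- total + log(mix)
  }
  total
}

# Hypothetical example: m = 2 normal components, r = 2 coordinates
f <- list(list(function(u) dnorm(u, 0, 1), function(u) dnorm(u, 0, 1)),
          list(function(u) dnorm(u, 3, 1), function(u) dnorm(u, 4, 1)))
x <- matrix(rnorm(20), ncol = 2)
loglik(x, lambda = c(0.4, 0.6), f)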

12 Water-level dataset
After loading mixtools:
R> data(Waterdata)
R> Waterdata
[Output: a data frame with columns Trial.1 through Trial.8]
For these data,
Number of subjects: n = 405
Number of coordinates (repeated measures): r = 8
What should m (the number of mixture components) be?

13 Outline: Next up...
1 Motivating examples and notation
2 Review of EM algorithm-ology
3 Identifiability: The univariate case
4 The multivariate non-parametric EM algorithm

14 Review of standard EM for mixtures
For MLE in finite mixtures, EM algorithms are standard.
A complete observation $(X, Z)$ consists of:
The observed data $X$.
The unobserved vector $Z$, defined for $1 \le j \le m$ by
$Z_j = \begin{cases} 1 & \text{if } X \text{ comes from component } j \\ 0 & \text{otherwise} \end{cases}$

15 Review of standard EM for mixtures
For MLE in finite mixtures, EM algorithms are standard.
A complete observation $(X, Z)$ consists of:
The observed data $X$.
The unobserved vector $Z$, defined for $1 \le j \le m$ by
$Z_j = \begin{cases} 1 & \text{if } X \text{ comes from component } j \\ 0 & \text{otherwise} \end{cases}$
What does this mean?
In simulations: Generate $Z$ first, then $X \mid Z \sim \prod_j [f_j(\cdot)]^{Z_j}$.
In real data, $Z$ is a latent variable whose interpretation depends on context.
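
As an aside (not from the talk), the "generate Z first, then X given Z" recipe is easy to carry out in R; the sample size, mixing proportions, means, and standard deviations below are purely illustrative.

# Simulate n draws from a two-component normal mixture via the latent Z
n <- 300
lambda <- c(0.3, 0.7)
z <- sample(1:2, n, replace = TRUE, prob = lambda)   # latent component labels
mu <- c(0, 4); sigma <- c(1, 1.5)
x <- rnorm(n, mean = mu[z], sd = sigma[z])           # X given Z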

16 Parametric EM algorithm for mixtures
E-step: Amounts to finding the conditional expectation of each $Z_i$:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid x_i] = \frac{\hat\lambda_j \hat f_j(x_i)}{\sum_{j'} \hat\lambda_{j'} \hat f_{j'}(x_i)}$
M-step: Amounts to maximizing the expected complete-data loglikelihood
$L_c(\theta) = \sum_{i=1}^n \sum_{j=1}^m \hat{Z}_{ij} \log\left[\lambda_j f_j(x_i)\right]$
Iterate: Let $\hat\theta = \arg\max_\theta L_c(\theta)$ and repeat.
N.B.: Usually, $f_j(x) \equiv f(x; \phi_j)$. We let $\theta$ denote $(\lambda, \phi)$.

17 Parametric EM algorithm for mixtures
E-step: Amounts to finding the conditional expectation of each $Z_i$:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid x_i] = \frac{\hat\lambda_j \hat f_j(x_i)}{\sum_{j'} \hat\lambda_{j'} \hat f_{j'}(x_i)}$
M-step: Amounts to maximizing the expected complete-data loglikelihood
$L_c(\theta) = \sum_{i=1}^n \sum_{j=1}^m \hat{Z}_{ij} \log\left[\lambda_j f_j(x_i)\right] \;\Longrightarrow\; \hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat{Z}_i$
Iterate: Let $\hat\theta = \arg\max_\theta L_c(\theta)$ and repeat.
N.B.: Usually, $f_j(x) \equiv f(x; \phi_j)$. We let $\theta$ denote $(\lambda, \phi)$.
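
Putting the E- and M-steps together for the usual Gaussian case, here is a minimal base-R sketch (not the mixtools implementation); the function name normal_em and the simple convergence rule are my own choices.

# EM for a univariate m-component normal mixture; theta = (lambda, mu, sigma)
normal_em <- function(x, lambda, mu, sigma, maxit = 500, tol = 1e-8) {
  n <- length(x); m <- length(lambda)
  for (it in 1:maxit) {
    # E-step: posterior component probabilities Z-hat (an n x m matrix)
    dens <- sapply(1:m, function(j) lambda[j] * dnorm(x, mu[j], sigma[j]))
    z <- dens / rowSums(dens)
    # M-step: weighted maximum likelihood updates
    lambda_new <- colMeans(z)
    mu_new <- colSums(z * x) / colSums(z)
    sigma_new <- sqrt(colSums(z * (outer(x, mu_new, "-"))^2) / colSums(z))
    if (max(abs(lambda_new - lambda)) < tol) break
    lambda <- lambda_new; mu <- mu_new; sigma <- sigma_new
  }
  list(lambda = lambda, mu = mu, sigma = sigma, posterior = z)
}

For example, normal_em(faithful$waiting, lambda = c(0.5, 0.5), mu = c(55, 80), sigma = c(5, 5)) should behave much like the normalmixEM call on the next slide, up to differences in the stopping rule.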

18 Old Faithful data with parametric normal EM
[Figure: histogram of the time between Old Faithful eruptions (minutes) with fitted density]
In R with mixtools, type
R> data(faithful)
R> attach(faithful)
R> ans = normalmixEM(waiting, mu = c(55, 80), sigma = 5, fast = TRUE)
number of iterations = 24

19 Old Faithful data with parametric normal EM
[Figure: histogram with the fitted two-component normal mixture; $\hat\lambda_1$ is shown on the plot]
In R with mixtools, type
R> data(faithful)
R> attach(faithful)
R> ans = normalmixEM(waiting, mu = c(55, 80), sigma = 5, fast = TRUE)
number of iterations = 24
Gaussian EM result: $\hat\mu = (54.6, 80.1)$

20 Outline: Next up...
1 Motivating examples and notation
2 Review of EM algorithm-ology
3 Identifiability: The univariate case
4 The multivariate non-parametric EM algorithm

21 Identifiability
Univariate case: $g(x) = \sum_{j=1}^m \lambda_j f_j(x)$
Identifiability means: $g(x)$ uniquely determines all $\lambda_j$ and $f_j$ (up to permuting the subscripts).
Parametric case: When $f_j(x) = f(x; \phi_j)$, generally no problem.
Nonparametric case: Big problem! We need some restrictions on $f_j$.

22 How to restrict $f_j$ in the univariate (r = 1) case?
Bordes, Mottelet, and Vandekerkhove (2006) and Hunter, Wang, and Hettmansperger (2007) both showed that
$g(x) = \sum_{j=1}^2 \lambda_j f_j(x)$
is identifiable, at least when $\lambda_1 \neq 1/2$, if $f_j(x) \equiv f(x - \mu_j)$ for some density $f(\cdot)$ that is symmetric about the origin.

23 Exploiting identifiability: An EM algorithm
Assume that $g(x) = \sum_{j=1}^2 \lambda_j f(x - \mu_j)$, where $f(\cdot)$ is a symmetric density.
Bordes, Chauveau, and Vandekerkhove (2007) introduce an EM-like algorithm that includes a kernel density estimation step.
It is much simpler than the algorithms of Bordes et al. (2006) or Hunter et al. (2007).

24 An EM algorithm for m = 2, r = 1:
E-step: Same as usual:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid x_i] = \frac{\hat\lambda_j \hat f(x_i - \hat\mu_j)}{\hat\lambda_1 \hat f(x_i - \hat\mu_1) + \hat\lambda_2 \hat f(x_i - \hat\mu_2)}$

25 An EM algorithm for m = 2, r = 1:
E-step: Same as usual:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid x_i] = \frac{\hat\lambda_j \hat f(x_i - \hat\mu_j)}{\hat\lambda_1 \hat f(x_i - \hat\mu_1) + \hat\lambda_2 \hat f(x_i - \hat\mu_2)}$
M-step: Maximize the complete-data loglikelihood for $\lambda$ and $\mu$:
$\hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat{Z}_i \qquad \hat\mu_j^{\text{next}} \stackrel{?}{=} \frac{1}{n \hat\lambda_j^{\text{next}}} \sum_{i=1}^n \hat{Z}_{ij} x_i$

26 An EM algorithm for m = 2, r = 1:
E-step: Same as usual:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid x_i] = \frac{\hat\lambda_j \hat f(x_i - \hat\mu_j)}{\hat\lambda_1 \hat f(x_i - \hat\mu_1) + \hat\lambda_2 \hat f(x_i - \hat\mu_2)}$
M-step: Maximize the complete-data loglikelihood for $\lambda$ and $\mu$:
$\hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat{Z}_i \qquad \hat\mu_j^{\text{next}} \stackrel{?}{=} \frac{1}{n \hat\lambda_j^{\text{next}}} \sum_{i=1}^n \hat{Z}_{ij} x_i$
KDE-step: Update the estimate of $f$ (for some bandwidth $h$) by
$\hat f^{\text{next}}(u) = \frac{1}{nh} \sum_{i=1}^n \sum_{j=1}^2 \hat{Z}_{ij} K\!\left(\frac{u - x_i + \hat\mu_j}{h}\right)$, then symmetrize.
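
Here is a minimal base-R sketch (not the mixtools implementation, which provides this algorithm as spEMsymloc) of one pass of these three steps; the Gaussian kernel, the symmetrization by averaging $\hat f$ with its reflection about zero, and the function name sp_em_step are my own choices.

# One pass of the EM-like algorithm for a 2-component location mixture of a
# common symmetric density f; fhat is the current density estimate (a function)
sp_em_step <- function(x, lambda, mu, fhat, h) {
  n <- length(x)
  # E-step: posterior probabilities using the current fhat
  num <- cbind(lambda[1] * fhat(x - mu[1]), lambda[2] * fhat(x - mu[2]))
  z <- num / rowSums(num)
  # M-step: update lambda and mu
  lambda_new <- colMeans(z)
  mu_new <- colSums(z * x) / colSums(z)
  # KDE-step: weighted kernel density estimate of f, then symmetrize about 0
  f_raw <- function(u) sapply(u, function(u0)
    sum(z * dnorm((u0 - outer(x, mu_new, "-")) / h)) / (n * h))
  fhat_next <- function(u) (f_raw(u) + f_raw(-u)) / 2
  list(lambda = lambda_new, mu = mu_new, fhat = fhat_next, posterior = z)
}

# Illustrative single pass on the Old Faithful waiting times with h = 4
x <- faithful$waiting
step1 <- sp_em_step(x, lambda = c(0.5, 0.5), mu = c(55, 80),
                    fhat = function(u) dnorm(u, sd = 10), h = 4)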

27 Old Faithful data again
[Figure: histogram of the time between Old Faithful eruptions (minutes)]

28 Old Faithful data again
[Figure: histogram with the Gaussian EM fit; $\hat\lambda_1$ is shown on the plot]
Gaussian EM: $\hat\mu = (54.6, 80.1)$

29 Old Faithful data again
[Figure: histogram with the Gaussian EM and semiparametric EM fits; $\hat\lambda_1$ values are shown on the plot]
Gaussian EM: $\hat\mu = (54.6, 80.1)$
Semiparametric EM with bandwidth = 4: $\hat\mu = (54.7, 79.8)$

30 Outline: Next up...
1 Motivating examples and notation
2 Review of EM algorithm-ology
3 Identifiability: The univariate case
4 The multivariate non-parametric EM algorithm

31 The blessing of dimensionality (!)
Recall the model in the multivariate case, r > 1:
$g(\mathbf{x}) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$
N.B.: Assume conditional independence of $x_1, \ldots, x_r$.
Hall and Zhou (2003) show that when m = 2 and r = 3, the model is identifiable without restrictions on the $f_{jk}(\cdot)$!

32 The blessing of dimensionality (!)
Recall the model in the multivariate case, r > 1:
$g(\mathbf{x}) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$
N.B.: Assume conditional independence of $x_1, \ldots, x_r$.
Hall and Zhou (2003) show that when m = 2 and r = 3, the model is identifiable without restrictions on the $f_{jk}(\cdot)$!
Hall et al. (2005) remark that this is a case in which "... from at least one point of view, the curse of dimensionality works in reverse."

33 An amazing result
Recall the model in the multivariate case, r > 1:
$g(\mathbf{x}) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$
N.B.: Assume conditional independence of $x_1, \ldots, x_r$.
Allman et al. (2009) use a theorem by Kruskal (1976) to show that if:
$f_{1k}, \ldots, f_{mk}$ are linearly independent for each $k$;
$r \ge 3$;
... then $g(\mathbf{x})$ uniquely determines all the $\lambda_j$ and $f_{jk}$ (up to label-switching).

34 The nonparametric EM, generalized
E-step: Same as usual:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid \mathbf{x}_i] = \frac{\hat\lambda_j \prod_{k=1}^r \hat f_{jk}(x_{ik})}{\sum_{j'} \hat\lambda_{j'} \prod_{k=1}^r \hat f_{j'k}(x_{ik})}$

35 The nonparametric EM, generalized
E-step: Same as usual:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid \mathbf{x}_i] = \frac{\hat\lambda_j \prod_{k=1}^r \hat f_{jk}(x_{ik})}{\sum_{j'} \hat\lambda_{j'} \prod_{k=1}^r \hat f_{j'k}(x_{ik})}$
M-step: Maximize the complete-data loglikelihood for $\lambda$:
$\hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat{Z}_i$

36 The nonparametric EM, generalized
E-step: Same as usual:
$\hat{Z}_{ij} \stackrel{\text{def}}{=} E_{\hat\theta}[Z_{ij} \mid \mathbf{x}_i] = \frac{\hat\lambda_j \prod_{k=1}^r \hat f_{jk}(x_{ik})}{\sum_{j'} \hat\lambda_{j'} \prod_{k=1}^r \hat f_{j'k}(x_{ik})}$
M-step: Maximize the complete-data loglikelihood for $\lambda$:
$\hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat{Z}_i$
KDE-step: Update the estimate of each $f_{jk}$ (for some bandwidth $h$) by
$\hat f_{jk}^{\text{next}}(u) = \frac{1}{n h \hat\lambda_j} \sum_{i=1}^n \hat{Z}_{ij} K\!\left(\frac{u - x_{ik}}{h}\right)$.
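
The loop below is a minimal base-R sketch of one such pass (not the mixtools npEM implementation, which adds refinements such as bandwidth selection); the representation of the density estimates as a list of lists of functions and the name np_em_step are my own choices.

# One pass of the multivariate nonparametric EM for an n x r data matrix x,
# mixing proportions lambda (length m), density estimates f[[j]][[k]], bandwidth h
np_em_step <- function(x, lambda, f, h) {
  n <- nrow(x); r <- ncol(x); m <- length(lambda)
  # E-step: lambda_j times the product over coordinates of the current f_jk
  num <- sapply(1:m, function(j)
    lambda[j] * apply(sapply(1:r, function(k) f[[j]][[k]](x[, k])), 1, prod))
  z <- num / rowSums(num)
  # M-step: update the mixing proportions
  lambda_new <- colMeans(z)
  # KDE-step: weighted kernel density estimate for each component and coordinate
  f_new <- lapply(1:m, function(j) lapply(1:r, function(k) {
    xk <- x[, k]; w <- z[, j]
    function(u) sapply(u, function(u0)
      sum(w * dnorm((u0 - xk) / h)) / (n * h * lambda_new[j]))
  }))
  list(lambda = lambda_new, f = f_new, posterior = z)
}

Iterating np_em_step until the $\hat\lambda_j$ stabilize, starting from (say) a rough k-means split of the rows, gives a crude version of what the npEM calls on the next two slides do.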

37 Simulated example: bandwidth chosen by default
Using mixtools in R, type
mu = rbind(0, 1:4)                        # means (m=2, r=4)
sigma = matrix(1, 2, 4)
lambda = c(0.4, 0.6)
x = rmvnormmix(500, lambda, mu, sigma)    # simulated data
a = npEM(x, 2)                            # fit the model (2 comps)
par(mfrow=c(2,2))
plot(a)
[Output: estimated means and variances by component and coordinate; plots of the fitted densities for each coordinate. bandwidth = 0.23]

38 Simulated example: bandwidth = 1
Use the same simulated x data from the previous slide, then type
b = npEM(x, 2, bw=1)    # fit the model (2 comps)
plot(b)
[Output: estimated means and variances by component and coordinate; plots of the fitted densities for each coordinate]

39 Different bandwidths
From earlier...
KDE-step: Update the estimate of each $f_{jk}$ (for some bandwidth $h$) by
$\hat f_{jk}^{\text{next}}(u) = \frac{1}{n h \hat\lambda_j} \sum_{i=1}^n \hat{Z}_{ij} K\!\left(\frac{u - x_{ik}}{h}\right)$.
No reason $h$ can't be $h_{jk}$.
Algorithms for choosing $h_{jk}$ are discussed in Benaglia et al. (2008) and implemented in mixtools.
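
One simple way to obtain component- and coordinate-specific bandwidths is a Silverman-type rule of thumb computed with the posterior weights; the sketch below only illustrates that idea and is not necessarily the rule used in Benaglia et al. (2008) or in mixtools.

# Adaptive bandwidths h[j, k] from posterior weights z (n x m) and data x (n x r)
adapt_bw <- function(x, z) {
  n <- nrow(x); r <- ncol(x); m <- ncol(z)
  h <- matrix(NA, m, r)
  for (j in 1:m) for (k in 1:r) {
    w <- z[, j] / sum(z[, j])                 # normalized weights for component j
    mu_w <- sum(w * x[, k])
    sd_w <- sqrt(sum(w * (x[, k] - mu_w)^2))  # weighted standard deviation
    h[j, k] <- 0.9 * sd_w * (n * mean(z[, j]))^(-1/5)   # Silverman-style factor
  }
  h
}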

40 The notation gets even worse...
Suppose some of the r coordinates are identically distributed.
Let the r coordinates be grouped into B i.d. blocks.
Denote the block of the kth coordinate by $b_k$, $1 \le b_k \le B$.
The model becomes
$g(\mathbf{x}) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{j b_k}(x_k)$

41 The notation gets even worse...
Suppose some of the r coordinates are identically distributed.
Let the r coordinates be grouped into B i.d. blocks.
Denote the block of the kth coordinate by $b_k$, $1 \le b_k \le B$.
The model becomes
$g(\mathbf{x}) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{j b_k}(x_k)$
Special cases:
$b_k = k$ for each k: Fully general model, seen earlier (Hall et al. 2005; Qin and Leung 2006)
$b_k = 1$ for each k: Conditionally i.i.d. assumption (Elmore et al. 2004)

42 The Water-level data, three components
Fitted component (mean, std dev) pairs by block; the mixing proportions appear on the original figures.
Block 1 (1:00 and 7:00 orientations): (32.1, 19.4), (3.9, 23.3), (1.4, 6.0)
Block 2 (2:00 and 8:00 orientations): (31.4, 55.4), (11.7, 27.0), (2.7, 4.6)
Block 3 (4:00 and 10:00 orientations): (43.6, 39.7), (11.4, 27.5), (1.0, 5.3)
Block 4 (5:00 and 11:00 orientations): (27.5, 19.3), (2.0, 22.1), (0.1, 6.1)

43 The Water-level data, four components
Fitted component (mean, std dev) pairs by block; the mixing proportions appear on the original figures.
Block 1 (1:00 and 7:00 orientations): (31.0, 10.2), (22.9, 35.2), (0.5, 16.4), (1.7, 5.1)
Block 2 (2:00 and 8:00 orientations): (48.2, 36.2), (0.3, 51.9), (14.5, 18.0), (2.7, 4.3)
Block 3 (4:00 and 10:00 orientations): (58.2, 16.3), (0.5, 49.0), (15.6, 16.9), (0.9, 5.2)
Block 4 (5:00 and 11:00 orientations): (28.2, 12.0), (18.0, 34.6), (1.9, 14.8), (0.3, 5.3)

44 Old slide: Pros and cons of npEM
Compared with the mixture model inversion method (Hall et al. 2005):
Pro: Easily generalizes beyond m = 2, r = 3.
"Con": Easily generalizes beyond m = 2, r = 3.
Pro: Much lower MISE for similar test problems.
Pro: Computationally simple.
Compared with the exponential tilting method (Qin and Leung, 2006):
Pro and "Con": Easily generalizable, as above.
Pro: Computationally simple.
Pro: Less restrictive model; does not assume the different components are related in any way.
Compared with the cutpoint method (Elmore et al. 2004):
Pro: No need to assume conditionally i.i.d. coordinates.
Pro: No loss of information from categorizing data.
Con: Identifiability of the model not settled.
Update: Now the "cons" are fixed; identifiability is settled.

45 Open questions
Are the estimators consistent, and if so at what rate?
Empirical evidence: Rates of convergence similar to those in the non-mixture setting.
Does this algorithm maximize any sort of log-likelihood? Is it truly an EM algorithm?
With small adjustments to the E-step, the answer is yes. This is ongoing work with Michael Levine.

46 References
Allman, E. S., Matias, C., and Rhodes, J. A. (2009), Identifiability of parameters in latent structure models with many observed variables, Annals of Statistics, 37.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2008), Bandwidth selection in an EM-like algorithm for nonparametric multivariate mixtures, Technical Report, version 1, HAL.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009a), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. S. (2009b), mixtools: An R package for analyzing finite mixture models, Journal of Statistical Software, 32(6).
Bordes, L., Mottelet, S., and Vandekerkhove, P. (2006), Semiparametric estimation of a two-component mixture model, Annals of Statistics, 34.
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51.
Elmore, R. T., Hettmansperger, T. P., and Thomas, H. (2004), Estimating component cumulative distribution functions in finite mixture models, Communications in Statistics: Theory and Methods, 33.
Hall, P. and Zhou, X. H. (2003), Nonparametric estimation of component distributions in a multivariate mixture, Annals of Statistics, 31.
Hall, P., Neeman, A., Pakyari, R., and Elmore, R. (2005), Nonparametric inference in multivariate mixtures, Biometrika, 92.
Hunter, D. R., Wang, S., and Hettmansperger, T. P. (2007), Inference for mixtures of symmetric distributions, Annals of Statistics, 35.
Kruskal, J. B. (1976), More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling, Psychometrika, 41.
Qin, J. and Leung, D. H.-Y. (2006), Semiparametric analysis in conditionally independent multivariate mixture models, unpublished manuscript.
Thomas, H., Lohaus, A., and Brainerd, C. J. (1993), Modeling growth and individual differences in spatial tasks, Monographs of the Society for Research in Child Development, 58(9).
