Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures


1 Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove, Laurent Bordes, Tatiana Benaglia, Michael Levine, Xiaotian Zhu Research partially supported by NSF Grants SES , DMS JdS Toulouse, May 30, 2013

2 Outline 1 Nonparametric mixtures and parameter identifiability 2 Motivating example; extension to multivariate case 3 Smoothed maximum likelihood

3 Outline 1 Nonparametric mixtures and parameter identifiability 2 Motivating example; extension to multivariate case 3 Smoothed maximum likelihood

4 We first introduce nonparametric finite mixtures. Suppose X ∼ g(x), where
g(x) = ∫ f_φ(x) dQ(φ),    (1)
with g(·) the mixture density, f_φ(·) the component densities, and Q(·) the mixing distribution.
Sometimes, Q(·) is the nonparametric part; e.g., work by Bruce Lindsay assumes Q(·) is unrestricted.
However, in this talk we assume that f_φ(·) is (mostly) unrestricted and Q(·) has finite support.
So (1) becomes
g(x) = Σ_{j=1}^m λ_j f_j(x)
... and we assume m is known.

5 Living histogram at Penn State (Brian Joiner, 1975) Can we learn about male / female subpopulations (i.e., parameters)? What can we say about individuals?

6 Old Faithful Geyser waiting times provide another simple univariate example.
[Histogram: time between Old Faithful eruptions, in minutes]
Let m = 2, so assume we have a sample from λ_1 f_1(x) + λ_2 f_2(x).
Why do we need any assumptions on f_j?

7 With no assumptions, parameters are not identifiable.
Multiple different parameter combinations (λ_1, λ_2, f_1, f_2) give the same mixture density.
Thus, some constraints on f_j are necessary.
NB: Sometimes, there is no obvious multi-modality.
[Figures: two different decompositions of the same density into λ_1 f_1 + λ_2 f_2, plus a unimodal example]

8 The univariate case is identifiable under some assumptions. It is possible to show¹ that if
g(x) = Σ_{j=1}^2 λ_j f_j(x),
the λ_j and f_j are uniquely identifiable from g if λ_1 ≠ 1/2 and f_j(x) = f(x - µ_j) for some density f(·) that is symmetric about the origin.
¹ cf. Bordes, Mottelet, and Vandekerkhove (2006); Hunter, Wang, and Hettmansperger (2007)

9 A modified EM algorithm may be used for estimation. EM preliminaries: A complete observation (X, Z) consists of:
The observed data X
The unobserved vector Z, defined by Z_j = 1 if X comes from component j and Z_j = 0 otherwise, for 1 ≤ j ≤ m.
What does this mean? In simulations: Generate Z first, then X | Z ∼ ∏_j [f_j(·)]^{Z_j}.
In real data, Z is a latent variable whose interpretation depends on context.
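As a concrete illustration of this complete-data representation, here is a minimal R sketch (not from the talk) that generates (Z, X) for a two-component normal mixture; the mixing proportions, means, and standard deviations are arbitrary illustrative choices.

```r
## Simulate complete data (Z, X) from a 2-component normal mixture:
## draw the latent label Z first, then X | Z from the chosen component.
set.seed(1)
n      <- 500
lambda <- c(0.35, 0.65)   # mixing proportions (illustrative)
mu     <- c(55, 80)       # component means (illustrative)
sigma  <- c(6, 6)         # component standard deviations (illustrative)

z <- sample(1:2, n, replace = TRUE, prob = lambda)   # latent component labels
x <- rnorm(n, mean = mu[z], sd = sigma[z])           # observed data given Z
```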

10 Standard EM for finite mixtures looks like this:
E-step: Amounts to finding the conditional expectation of each Z_ij:
Ẑ_ij := E_θ̂[Z_ij] = λ̂_j f̂_j(x_i) / Σ_{j'} λ̂_{j'} f̂_{j'}(x_i)
M-step: Amounts to maximizing the expected complete-data loglikelihood
L_c(θ) = Σ_{i=1}^n Σ_{j=1}^m Ẑ_ij log[λ_j f_j(x_i)],   which gives   λ̂_j^next = (1/n) Σ_{i=1}^n Ẑ_ij
Iterate: Let θ̂^next = arg max_θ L_c(θ) and repeat.
N.B.: Usually, f_j(x) = f(x; φ_j). We let θ denote (λ, φ).
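A minimal R sketch (my own, not the talk's code) of one E-step and M-step update for the two-component normal mixture simulated above, with dnorm playing the role of f_j(·; φ_j); it reuses x, lambda, mu, and sigma from the previous sketch.

```r
## E-step: posterior membership probabilities Z-hat_ij for each observation.
post <- sapply(1:2, function(j) lambda[j] * dnorm(x, mu[j], sigma[j]))
post <- post / rowSums(post)           # normalize rows so they sum to one

## M-step: update mixing proportions and (for the Gaussian case) the means.
lambda_next <- colMeans(post)
mu_next     <- colSums(post * x) / colSums(post)
```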

11 A modified EM algorithm may be used for estimation.
E-step and M-step: Same as usual:
Ẑ_ij := E_θ̂[Z_ij] = λ̂_j f̂_j(x_i) / Σ_{j'} λ̂_{j'} f̂_{j'}(x_i)   and   λ̂_j^next = (1/n) Σ_{i=1}^n Ẑ_ij
KDE-step: Update f̂_j using a weighted kernel density estimate: weight each x_i by the corresponding Ẑ_ij.
Alternatively, select randomly Z_i ∼ Mult(1, Ẑ_i) and use a standard KDE for the x_i selected into each component (cf. Bordes, Chauveau, & Vandekerkhove, 2007).
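The KDE-step can be written with base R's density(), which accepts observation weights; a short sketch continuing the example above (bandwidth h = 4 is an arbitrary illustrative choice).

```r
## KDE-step: weighted kernel density estimate of each component density,
## weighting observation x_i by its posterior probability Z-hat_ij.
h <- 4
f_hat <- lapply(1:2, function(j) {
  density(x, weights = post[, j] / sum(post[, j]), bw = h)
})
## f_hat[[j]]$x and f_hat[[j]]$y hold the grid and the estimated density values.
```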

12 For the symmetric location family assumption, the modified EM looks like this:
E-step: Same as usual:
Ẑ_ij = E_θ̂[Z_ij] = λ̂_j f̂(x_i - µ̂_j) / [λ̂_1 f̂(x_i - µ̂_1) + λ̂_2 f̂(x_i - µ̂_2)]
M-step: Maximize the complete-data loglikelihood for λ and µ:
λ̂_j^next = (1/n) Σ_{i=1}^n Ẑ_ij   and   µ̂_j^next = (n λ̂_j^next)^{-1} Σ_{i=1}^n Ẑ_ij x_i
KDE-step: Update the estimate of f (for some bandwidth h) by
f̂^next(u) = (nh)^{-1} Σ_{i=1}^n Σ_{j=1}^2 Ẑ_ij K((u - x_i + µ̂_j)/h), then symmetrize.

13 Compare two solutions for Old Faithful data.
[Histogram: time between Old Faithful eruptions, in minutes]

14 Compare two solutions for Old Faithful data.
[Histogram with one fitted mixture density overlaid]
Gaussian EM: µ̂ = (54.6, 80.1)

15 Compare two solutions for Old Faithful data.
[Histogram with both fitted mixture densities overlaid, with the estimated λ_1 shown for each fit]
Gaussian EM: µ̂ = (54.6, 80.1)
Semiparametric EM with bandwidth = 4: µ̂ = (54.7, 79.8)
Both algorithms are implemented in the mixtools package for R (Benaglia et al., 2009).
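Both fits can be reproduced approximately in R; the sketch below assumes the mixtools functions normalmixEM and spEMsymloc with the argument names shown, so check the package documentation before relying on them.

```r
library(mixtools)

wait <- faithful$waiting            # Old Faithful waiting times (base R dataset)

## Parametric (Gaussian) EM with two components.
fit_gauss <- normalmixEM(wait, k = 2)
fit_gauss$mu                        # roughly (54.6, 80.1), as on the slide

## Semiparametric EM under the symmetric location-family assumption,
## bandwidth 4 as on the slide.
fit_semi <- spEMsymloc(wait, mu0 = c(55, 80), bw = 4)
fit_semi$mu                         # roughly (54.7, 79.8)
```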

16 Outline 1 Nonparametric mixtures and parameter identifiability 2 Motivating example; extension to multivariate case 3 Smoothed maximum likelihood

17 Multivariate example: Water-level angles. This example is due to Thomas, Lohaus, & Brainerd (1993). The task:
Subjects are shown 8 vessels, pointing at 1:00, 2:00, 4:00, 5:00, 7:00, 8:00, 10:00, and 11:00.
They draw the water surface on each vessel.
Measure: (signed) angle with the horizontal formed by the line drawn.
[Figure: vessel tilted to point at 1:00]

18 Water-level dataset. After loading mixtools,
R> data(waterdata)
R> Waterdata
[printout of the data frame, columns Trial.1 through Trial.8]
For these data:
Number of subjects: n = 405
Number of coordinates (repeated measures): r = 8
What should m (number of mixture components) be?

19 We will assume conditional independence. Each f_j(x), a density on R^r, is assumed to be the product of its marginals:
g(x) = Σ_{j=1}^m λ_j ∏_{k=1}^r f_jk(x_k)
We call this assumption conditional independence (cf. Hall and Zhou, 2003; Qin and Leung, 2006).
Very similar to repeated measures models: in RM models, we often assume measurements are independent conditional on the individual. Here, we have component-specific effects instead of individual-specific effects.
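A small R sketch (illustrative, not from the talk) of how this density factorizes: given mixing weights and a list of component-by-coordinate marginal density functions, g(x) is a weighted sum of products of marginals.

```r
## Evaluate g(x) = sum_j lambda_j * prod_k f_jk(x_k) at a single point x,
## where f[[j]][[k]] is the marginal density of coordinate k in component j.
mixture_density <- function(x, lambda, f) {
  sum(sapply(seq_along(lambda), function(j) {
    lambda[j] * prod(mapply(function(fjk, xk) fjk(xk), f[[j]], x))
  }))
}

## Toy example with m = 2 components and r = 2 coordinates.
f <- list(list(function(u) dnorm(u, 0, 1), function(u) dnorm(u, 0, 2)),
          list(function(u) dnorm(u, 3, 1), function(u) dnorm(u, 3, 2)))
mixture_density(c(0.5, 1.0), lambda = c(0.4, 0.6), f = f)
```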

20 There exists a nice identifiability result for conditional independence when r ≥ 3. Recall the conditional independence finite mixture model:
g(x) = Σ_{j=1}^m λ_j ∏_{k=1}^r f_jk(x_k)
Allman, Matias, & Rhodes (2009) use a theorem by Kruskal (1976) to show that if:
f_1k, ..., f_mk are linearly independent for each k;
r ≥ 3
... then g(x) uniquely determines all the λ_j and f_jk (up to label-switching).

21 Some of the marginals may be assumed identical. Let the r coordinates be grouped into B i.d. blocks. Denote the block of the kth coordinate by b_k, 1 ≤ b_k ≤ B. The model becomes
g(x) = Σ_{j=1}^m λ_j ∏_{k=1}^r f_{j b_k}(x_k)
Special cases:
b_k = k for each k: Fully general model, seen earlier (Hall, Neeman, Pakyari, & Elmore 2005; Qin & Leung 2006)
b_k = 1 for each k: Conditionally i.i.d. assumption (Elmore, Hettmansperger, & Thomas 2004)

22 The water-level data may be blocked. The 8 vessels are presented in the order 11, 4, 2, 7, 10, 5, 1, 8 o'clock. Assume that opposite clock-face orientations lead to conditionally i.i.d. responses (same behavior).
B = 4 blocks defined by b = (4, 3, 2, 1, 3, 4, 1, 2)
e.g., b_4 = b_7 = 1, i.e., block 1 relates to coordinates 4 and 7, corresponding to clock orientations 1:00 and 7:00.
[Figure: the 8 vessels at orientations 11:00, 4:00, 2:00, 7:00, 10:00, 5:00, 1:00, 8:00]

23 The nonparametric EM algorithm is easily extended to the multivariate conditional independence case.
E-step and M-step: Same as usual:
Ẑ_ij = λ̂_j f̂_j(x_i) / Σ_{j'} λ̂_{j'} f̂_{j'}(x_i) = λ̂_j ∏_{k=1}^r f̂_jk(x_ik) / Σ_{j'} λ̂_{j'} ∏_{k=1}^r f̂_{j'k}(x_ik)   and   λ̂_j^next = (1/n) Σ_{i=1}^n Ẑ_ij
KDE-step: Update the estimate of each f_jk (for some bandwidth h) by
f̂_jk^next(u) = (n h λ̂_j^next)^{-1} Σ_{i=1}^n Ẑ_ij K((u - x_ik)/h).
(Benaglia, Chauveau, Hunter, 2009)
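This algorithm is implemented in mixtools as npEM; below is a hedged sketch of a call for the blocked water-level model. The argument names mu0, blockid, and bw reflect my recollection of the interface, so verify them against the package documentation.

```r
library(mixtools)
data(Waterdata)                          # water-level angle data
## (If the data frame contains non-angle variables, select the 8 Trial.* columns first.)

blocks <- c(4, 3, 2, 1, 3, 4, 1, 2)      # block labels b_1, ..., b_8 from the previous slide

## Nonparametric EM-like algorithm (Benaglia, Chauveau & Hunter, 2009):
## m = 3 components, coordinates grouped into B = 4 i.d. blocks.
fit <- npEM(Waterdata, mu0 = 3, blockid = blocks, bw = 4)

summary(fit)                             # mixing proportions and component summaries
plot(fit)                                # fitted marginal densities by block
```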

24 The Water-level data, three components.
[Figure: fitted marginal densities by block, with mixing proportions; the component (mean, std dev) estimates are:]
Block 1 (1:00 and 7:00 orientations): (32.1, 19.4), (3.9, 23.3), (1.4, 6.0)
Block 2 (2:00 and 8:00 orientations): (31.4, 55.4), (11.7, 27.0), (2.7, 4.6)
Block 3 (4:00 and 10:00 orientations): (43.6, 39.7), (11.4, 27.5), (1.0, 5.3)
Block 4 (5:00 and 11:00 orientations): (27.5, 19.3), (2.0, 22.1), (0.1, 6.1)

25 The Water-level data, four components.
[Figure: fitted marginal densities by block, with mixing proportions; the component (mean, std dev) estimates are:]
Block 1 (1:00 and 7:00 orientations): (31.0, 10.2), (22.9, 35.2), (0.5, 16.4), (1.7, 5.1)
Block 2 (2:00 and 8:00 orientations): (48.2, 36.2), (0.3, 51.9), (14.5, 18.0), (2.7, 4.3)
Block 3 (4:00 and 10:00 orientations): (58.2, 16.3), (0.5, 49.0), (15.6, 16.9), (0.9, 5.2)
Block 4 (5:00 and 11:00 orientations): (28.2, 12.0), (18.0, 34.6), (1.9, 14.8), (0.3, 5.3)

26 Outline 1 Nonparametric mixtures and parameter identifiability 2 Motivating example; extension to multivariate case 3 Smoothed maximum likelihood

27 The previous algorithm is not truly EM.
Does this algorithm maximize any sort of log-likelihood?
Does it have an EM-like ascent property?
Are the estimators consistent and, if so, at what rate?
Empirical evidence: Rates of convergence similar to those in the non-mixture setting.
We might hope that the algorithm has an ascent property for
ℓ(λ, f) = Σ_{i=1}^n log Σ_{j=1}^m λ_j f_j(x_i) = Σ_{i=1}^n log Σ_{j=1}^m λ_j ∏_{k=1}^r f_jk(x_ik).
Unfortunately, ℓ(λ^next, f^next) ≥ ℓ(λ, f) is not guaranteed.

28 Smoothing the likelihood. There is no ascent property for
ℓ(λ, f) = Σ_{i=1}^n log Σ_{j=1}^m λ_j f_j(x_i).
However, we borrow an idea from Eggermont and LaRiccia (1995) and introduce a nonlinearly smoothed version:
ℓ_S^n(λ, f) = Σ_{i=1}^n log Σ_{j=1}^m λ_j [N f_j](x_i),
where
[N f_j](x) = exp { ∫ h^{-r} K^r((x - u)/h) log f_j(u) du }
and K^r denotes an r-dimensional (product) kernel.
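To make the operator N concrete, here is a minimal univariate (r = 1) R sketch (my own, not the authors' code) that approximates [N f](x) = exp{∫ K_h(x - u) log f(u) du} on a grid by a Riemann sum with a Gaussian kernel; it assumes the grid covers essentially all of the kernel's and the density's mass.

```r
## Nonlinear smoothing operator: N f(x) = exp( integral of K_h(x - u) log f(u) du ),
## approximated on an equally spaced grid.
nonlinear_smooth <- function(f, grid, h) {
  du    <- grid[2] - grid[1]                    # grid spacing
  log_f <- log(f(grid))                         # log-density on the grid
  sapply(grid, function(x) {
    kern <- dnorm((x - grid) / h) / h           # K_h(x - u), Gaussian kernel
    exp(sum(kern * log_f) * du)                 # exp of the smoothed log-density
  })
}

grid <- seq(-4, 4, length.out = 401)
Nf   <- nonlinear_smooth(dnorm, grid, h = 0.5)  # N applied to the standard normal
```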

29 The infinite sample case: Minimum K-L divergence.
Whereas ℓ_S^n(λ, f) = Σ_{i=1}^n log Σ_{j=1}^m λ_j [N f_j](x_i), we may define e_j = λ_j f_j and write
ℓ(e) = ∫ g(x) log [ g(x) / Σ_j [N e_j](x) ] dx + Σ_j ∫ e_j(x) dx.
We wish to minimize ℓ(e) over vectors e of functions not necessarily integrating to unity.
The added term ensures that the solution satisfies the usual constraints, i.e., Σ_j λ_j = 1 and ∫ f_j(x) dx = 1.
This may be viewed as minimizing a (penalized) K-L divergence between g(·) and Σ_j [N e_j](·).

30 The finite- and infinite-sample problems lead to EM-like (actually MM) algorithms.
E-step: Old: Ẑ_ij = λ̂_j f̂_j(x_i) / Σ_{j'} λ̂_{j'} f̂_{j'}(x_i)
M-step: λ̂_j^next = (1/n) Σ_{i=1}^n Ẑ_ij
KDE-step (Part 2 of M-step): f̂_jk^next(u) = (n h λ̂_j^next)^{-1} Σ_{i=1}^n Ẑ_ij K((u - x_ik)/h).

31 The finite- and infinite-sample problems lead to EM-like (actually MM) algorithms.
E-step: (Technically an M-step, for minorization.)
Old: Ẑ_ij = λ̂_j f̂_j(x_i) / Σ_{j'} λ̂_{j'} f̂_{j'}(x_i)
New: Ẑ_ij = λ̂_j [N f̂_j](x_i) / Σ_{j'} λ̂_{j'} [N f̂_{j'}](x_i)
M-step: λ̂_j^next = (1/n) Σ_{i=1}^n Ẑ_ij
KDE-step (Part 2 of M-step): f̂_jk^next(u) = (n h λ̂_j^next)^{-1} Σ_{i=1}^n Ẑ_ij K((u - x_ik)/h).

32 First M in MM is for minorization. Recall the smoothed loglikelihood
ℓ_S^n(λ, f) = Σ_{i=1}^n log Σ_{j=1}^m λ_j [N f_j](x_i).
Define
Q(λ, f | λ̂, f̂) = Σ_{i=1}^n Σ_{j=1}^m Ẑ_ij log { λ_j [N f_j](x_i) }.
We can prove
ℓ_S^n(λ, f) - ℓ_S^n(λ̂, f̂) ≥ Q(λ, f | λ̂, f̂) - Q(λ̂, f̂ | λ̂, f̂).
We say that Q(· | λ̂, f̂) is a minorizer of ℓ_S^n(·) at (λ̂, f̂).
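The inequality follows from the standard MM argument via Jensen's inequality; a sketch (my own rendering of the usual argument), using the fact that the smoothed E-step weights Ẑ_ij sum to one over j:

```latex
\begin{align*}
\ell_S^n(\lambda,\mathbf{f}) - \ell_S^n(\hat\lambda,\hat{\mathbf{f}})
 &= \sum_{i=1}^n \log \sum_{j=1}^m \hat Z_{ij}\,
      \frac{\lambda_j\,[N f_j](x_i)}{\hat\lambda_j\,[N \hat f_j](x_i)}
    \quad\text{since } \hat Z_{ij}
      = \frac{\hat\lambda_j [N \hat f_j](x_i)}{\sum_{j'} \hat\lambda_{j'} [N \hat f_{j'}](x_i)} \\
 &\ge \sum_{i=1}^n \sum_{j=1}^m \hat Z_{ij}
      \log \frac{\lambda_j\,[N f_j](x_i)}{\hat\lambda_j\,[N \hat f_j](x_i)}
    \quad\text{(Jensen, by concavity of log)} \\
 &= Q(\lambda,\mathbf{f} \mid \hat\lambda,\hat{\mathbf{f}})
    - Q(\hat\lambda,\hat{\mathbf{f}} \mid \hat\lambda,\hat{\mathbf{f}}).
\end{align*}
```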

33 MM is a generalization of EM. The minorizing inequality is
ℓ_S^n(λ, f) - ℓ_S^n(λ̂, f̂) ≥ Q(λ, f | λ̂, f̂) - Q(λ̂, f̂ | λ̂, f̂).    (2)
By (2), increasing Q(· | λ̂, f̂) leads to an increase in ℓ_S^n(·).
Q(λ, f | λ̂, f̂) is maximized at (λ̂^next, f̂^next).
Thus, the MM algorithm guarantees that ℓ_S^n(λ̂^next, f̂^next) ≥ ℓ_S^n(λ̂, f̂).
[Figure: Top curve: ℓ_S^n(λ, f). Bottom curve: Q(λ, f | λ̂, f̂) - Q(λ̂, f̂ | λ̂, f̂) + ℓ_S^n(λ̂, f̂). Equality occurs at (λ̂, f̂).]

34 The Water-level data revisited.
[Figure: fitted marginal densities for Blocks 1 to 4 (1:00/7:00, 2:00/8:00, 4:00/10:00, 5:00/11:00 orientations), comparing NEMS with the original algorithm]
NEMS = nonlinearly smoothed EM-like algorithm (Levine, Chauveau, Hunter 2011)
Colored lines = original (non-smoothed) algorithm
Very little difference in any example we've seen.

35 Next step: Use the ascent property to establish theory. Empirical results suggest consistency and convergence rate results are possible. Eggermont and LaRiccia (1995) prove asymptotic results for a related problem; however, it appears that their methods do not apply directly here.
[Figures: log(MISE) versus log(n), with 90% empirical CIs and estimated slopes, for f_21 and for λ_2]

36 References
Allman, E. S., Matias, C., and Rhodes, J. A. (2009), Identifiability of parameters in latent structure models with many observed variables, Annals of Statistics, 37.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. S. (2009), mixtools: An R Package for Analyzing Finite Mixture Models, Journal of Statistical Software, 32(6).
Bordes, L., Mottelet, S., and Vandekerkhove, P. (2006), Semiparametric estimation of a two-component mixture model, Annals of Statistics, 34.
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51.
Eggermont, P. P. B. and LaRiccia, V. N. (1995), Maximum Smoothed Likelihood Density Estimation for Inverse Problems, Annals of Statistics, 23.
Elmore, R. T., Hettmansperger, T. P., and Thomas, H. (2004), Estimating component cumulative distribution functions in finite mixture models, Communications in Statistics: Theory and Methods, 33.
Hall, P., Neeman, A., Pakyari, R., and Elmore, R. (2005), Nonparametric inference in multivariate mixtures, Biometrika, 92.
Hunter, D. R., Wang, S., and Hettmansperger, T. P. (2007), Inference for mixtures of symmetric distributions, Annals of Statistics, 35.
Kruskal, J. B. (1976), More Factors Than Subjects, Tests and Treatments: An Indeterminacy Theorem for Canonical Decomposition and Individual Differences Scaling, Psychometrika, 41.
Levine, M., Hunter, D. R., and Chauveau, D. (2011), Maximum Smoothed Likelihood for Multivariate Mixtures, Biometrika, 98(2).
Qin, J. and Leung, D. H.-Y. (2006), Semiparametric analysis in conditionally independent multivariate mixture models, unpublished manuscript.
Thomas, H., Lohaus, A., and Brainerd, C. J. (1993), Modeling Growth and Individual Differences in Spatial Tasks, Monographs of the Society for Research in Child Development, 58(9).

37 EXTRA SLIDES

38 Identifiability for mixtures of regressions: Intuition. Consider the simplest case: univariate x_i and J ∈ {1, 2}:
Y = X β_J + ε, where ε ∼ f.
Fix X = x_0.
[Figure: two regression lines through the origin, with design points x_0 and x_1 marked]

39 Identifiability for mixtures of regressions: Intuition. Consider the simplest case: univariate x_i and J ∈ {1, 2}:
Y = X β_J + ε, where ε ∼ f.
Fix X = x_0. The conditional distribution of Y when X = x_0 is not necessarily identifiable as a mixture of shifted versions of f, even if f is assumed (say) symmetric.
[Figure: as above, with the conditional density of Y | X = x_0 shown]

40 Identifiability for mixtures of regressions: Intuition. Consider the simplest case: univariate x_i and J ∈ {1, 2}:
Y = X β_J + ε, where ε ∼ f.
Fix X = x_0. The conditional distribution of Y when X = x_0 is not necessarily identifiable as a mixture of shifted versions of f, even if f is assumed (say) symmetric. Identifiability depends on using additional X values that change the relative locations of the mixture components.
[Figure: as above, with the conditional densities of Y | X = x_0 and Y | X = x_1 shown]

41 Mixtures of simple linear regressions. Next allow an intercept: Y = β_{J1} + X β_{J2} + ε, with ε ∼ f.
Even if f is assumed (say) symmetric about zero, identifiability can fail: additional X values give no new information if the regression lines are parallel.
[Figure: two parallel regression lines, with design points x_0 and x_1 marked]

42 Mixtures of simple linear regressions. Next allow an intercept: Y = β_{J1} + X β_{J2} + ε, with ε ∼ f.
Even if f is assumed (say) symmetric about zero, identifiability can fail: additional X values give no new information if the regression lines are parallel.
[Figure: as above, with the conditional densities of Y | X = x_0 and Y | X = x_1 shown]

43 Theorem (Hunter and Young 2012): If the support of the density h(x) contains an open set in R^p, then the parameters in
ψ(x, y) = h(x) Σ_{j=1}^m λ_j f(y - x^t β_j)
are uniquely identifiable from ψ(x, y).
When X is one-dimensional, there is a nice way to see this based on an idea of Laurent Bordes and Pierre Vandekerkhove.

44 Single predictor case. Let
g_x(y) = Σ_{j=1}^m λ_j f(y - µ_j - β_j x)
denote the conditional density of y given x. Next, define
R_k(a, b) = ∫ x^k g_x(a + bx) dx.
A bit of algebra gives:
R_0(a, b) = Σ_{j=1}^m λ_j / (b - β_j), which identifies the β_j and λ_j;
R_1(a, b) = Σ_{j=1}^m λ_j (µ_j - a) / (b - β_j)^2, which identifies the µ_j.
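A sketch of the algebra behind R_0 (my own filling-in, assuming ∫ f = 1 and b > β_j for every j so the substitution preserves orientation; the R_1 identity follows the same way when f additionally has mean zero):

```latex
R_0(a,b) = \int \sum_{j=1}^m \lambda_j\, f\bigl((b-\beta_j)x + (a-\mu_j)\bigr)\,dx
         = \sum_{j=1}^m \frac{\lambda_j}{b-\beta_j}\int f(u)\,du
         = \sum_{j=1}^m \frac{\lambda_j}{b-\beta_j},
\qquad u = (b-\beta_j)x + (a-\mu_j).
```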
