Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures
1 Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures
David Hunter, Pennsylvania State University, USA
Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove, Laurent Bordes, Tatiana Benaglia, Michael Levine, Xiaotian Zhu
Research partially supported by NSF Grants SES , DMS
JdS Toulouse, May 30, 2013
2 Outline 1 Nonparametric mixtures and parameter identifiability 2 Motivating example; extension to multivariate case 3 Smoothed maximum likelihood
4 We first introduce nonparametric finite mixtures:
$$g(x) = \int f_\phi(x) \, dQ(\phi) \qquad (1)$$
where $g$ is the mixture density, $f_\phi$ the component density, and $Q$ the mixing distribution.
Sometimes $Q(\cdot)$ is the nonparametric part; e.g., work by Bruce Lindsay assumes $Q(\cdot)$ is unrestricted. However, in this talk we assume that:
- $f_\phi(\cdot)$ is (mostly) unrestricted
- $Q(\cdot)$ has finite support
So (1) becomes $g(x) = \sum_{j=1}^m \lambda_j f_j(x)$ ... and we assume $m$ is known.
5 Living histogram at Penn State (Brian Joiner, 1975) Can we learn about male / female subpopulations (i.e., parameters)? What can we say about individuals?
6 Old Faithful Geyser waiting times provide another simple univariate example. [Histogram: time in minutes between Old Faithful eruptions.] Let $m = 2$, so assume we have a sample from $\lambda_1 f_1(x) + \lambda_2 f_2(x)$. Why do we need any assumptions on the $f_j$?
7 With no assumptions, parameters are not identifiable. [Figure: three different decompositions of the same density into $\lambda_1 f_1$ and $\lambda_2 f_2$.] Multiple different parameter combinations $(\lambda_1, \lambda_2, f_1, f_2)$ give the same mixture density. Thus, some constraints on the $f_j$ are necessary. NB: Sometimes there is no obvious multi-modality.
8 The univariate case is identifiable under some assumptions. It is possible to show¹ that if $g(x) = \sum_{j=1}^2 \lambda_j f_j(x)$, the $\lambda_j$ and $f_j$ are uniquely identifiable from $g$ if $\lambda_1 \neq 1/2$ and $f_j(x) = f(x - \mu_j)$ for some density $f(\cdot)$ that is symmetric about the origin.
¹ cf. Bordes, Mottelet, and Vandekerkhove (2006); Hunter, Wang, and Hettmansperger (2007)
9 A modified EM algorithm may be used for estimation. EM preliminaries: a complete observation $(X, Z)$ consists of:
- The observed data $X$
- The unobserved vector $Z$, defined for $1 \le j \le m$ by $Z_j = 1$ if $X$ comes from component $j$, and $Z_j = 0$ otherwise.
What does this mean? In simulations: generate $Z$ first, then $X \mid Z \sim \prod_j f_j(\cdot)^{Z_j}$. In real data, $Z$ is a latent variable whose interpretation depends on context.
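The simulation recipe just described (generate $Z$ first, then $X$ given $Z$) takes only a few lines. The talk's own software is the mixtools package for R; the Python function below is an illustrative stand-in with hypothetical names, not code from the talk.

```python
import numpy as np

def simulate_mixture(n, lam, samplers, seed=0):
    """Generate complete data (X, Z) for a finite mixture:
    draw Z ~ Multinomial(1, lam), then X | Z from the chosen component.
    `samplers` holds one function per component, mapping a Generator to a draw."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(lam), size=n, p=lam)          # component labels
    x = np.array([samplers[j](rng) for j in z])      # X | Z
    return x, z
```

For instance, two normal components $N(0,1)$ and $N(5,1)$ with $\lambda = (0.3, 0.7)$ can be simulated by passing two `lambda r: r.normal(...)` samplers.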
10 Standard EM for finite mixtures looks like this:
E-step: Amounts to finding the conditional expectation of each $Z_i$:
$$\hat Z_{ij} \stackrel{\text{def}}{=} E_{\hat\theta} Z_{ij} = \frac{\hat\lambda_j \hat f_j(x_i)}{\sum_{j'} \hat\lambda_{j'} \hat f_{j'}(x_i)}$$
M-step: Amounts to maximizing the expected complete-data loglikelihood
$$L_c(\theta) = \sum_{i=1}^n \sum_{j=1}^m \hat Z_{ij} \log\left[\lambda_j f_j(x_i)\right] \implies \hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat Z_i$$
Iterate: Let $\hat\theta^{\text{next}} = \arg\max_\theta L_c(\theta)$ and repeat.
N.B.: Usually $f_j(x) = f(x; \phi_j)$. We let $\theta$ denote $(\lambda, \phi)$.
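As a concrete instance of these E- and M-steps, here is a minimal sketch for the usual parametric case $f_j(x) = f(x; \phi_j)$ with Gaussian components. The function name and fixed iteration count are choices of this sketch, not of the talk.

```python
import numpy as np

def em_gaussian_mixture(x, lam, mu, sigma, n_iter=200):
    """Standard EM for a univariate m-component Gaussian mixture.
    x: data (n,); lam, mu, sigma: initial parameter vectors (m,)."""
    x = np.asarray(x, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities Z_hat[i, j]
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        num = lam * dens
        z = num / num.sum(axis=1, keepdims=True)
        # M-step: weighted updates of lambda, mu, sigma
        nk = z.sum(axis=0)
        lam = nk / len(x)
        mu = (z * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((z * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return lam, mu, sigma
```

With well-separated components and sensible starting values, the iterates settle quickly near the true parameters.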
11 A modified EM algorithm may be used for estimation.
E-step and M-step: Same as usual:
$$\hat Z_{ij} \stackrel{\text{def}}{=} E_{\hat\theta} Z_{ij} = \frac{\hat\lambda_j \hat f_j(x_i)}{\sum_{j'} \hat\lambda_{j'} \hat f_{j'}(x_i)} \quad\text{and}\quad \hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat Z_i$$
KDE-step: Update $\hat f_j$ using a weighted kernel density estimate, weighting each $x_i$ by the corresponding $\hat Z_{ij}$. Alternatively, randomly select $Z_i \sim \text{Mult}(1, \hat Z_i)$ and use a standard KDE for the $x_i$ selected into each component (cf. Bordes, Chauveau, & Vandekerkhove, 2007).
12 For the symmetric location family assumption, the modified EM looks like this:
E-step: Same as usual:
$$\hat Z_{ij} = \frac{\hat\lambda_j \hat f(x_i - \hat\mu_j)}{\hat\lambda_1 \hat f(x_i - \hat\mu_1) + \hat\lambda_2 \hat f(x_i - \hat\mu_2)}$$
M-step: Maximize the complete-data loglikelihood for $\lambda$ and $\mu$:
$$\hat\lambda_j^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat Z_{ij}, \qquad \hat\mu_j^{\text{next}} = (n\hat\lambda_j)^{-1} \sum_{i=1}^n \hat Z_{ij} x_i$$
KDE-step: Update the estimate of $f$ (for some bandwidth $h$) by
$$\hat f^{\text{next}}(u) = (nh)^{-1} \sum_{i=1}^n \sum_{j=1}^2 \hat Z_{ij} K\!\left(\frac{u - x_i + \hat\mu_j}{h}\right),$$
then symmetrize.
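The KDE-step, including the final symmetrization, can be sketched as follows. This is an illustrative Python sketch with a Gaussian kernel, not the talk's R implementation; the averaging of $\hat f(u)$ and $\hat f(-u)$ is one simple way to symmetrize.

```python
import numpy as np

def fhat_symmetric(u, x, z, mu, h):
    """Weighted KDE of the common symmetric density f, evaluated at points u.
    x: data (n,); z: posterior weights (n, m); mu: component centers (m,);
    h: bandwidth. Gaussian kernel; estimate is symmetrized about the origin."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    resid = x[:, None] - mu[None, :]                  # x_i - mu_j, shape (n, m)

    def raw(v):
        # (nh)^{-1} * sum_ij z_ij K((v - x_i + mu_j) / h)
        arg = (v[:, None, None] - resid[None, :, :]) / h
        k = np.exp(-0.5 * arg ** 2) / np.sqrt(2 * np.pi)
        return (z[None, :, :] * k).sum(axis=(1, 2)) / (len(x) * h)

    return 0.5 * (raw(u) + raw(-u))                   # symmetrize
```

By construction the returned estimate satisfies $\hat f(u) = \hat f(-u)$ and integrates to one.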
15 Compare two solutions for Old Faithful data. [Figure: histogram of times (in minutes) between eruptions, with two fitted solutions overlaid.]
Gaussian EM: $\hat\mu = (54.6, 80.1)$.
Semiparametric EM with bandwidth = 4: $\hat\mu = (54.7, 79.8)$.
Both algorithms are implemented in the mixtools package for R (Benaglia et al., 2009).
16 Outline 1 Nonparametric mixtures and parameter identifiability 2 Motivating example; extension to multivariate case 3 Smoothed maximum likelihood
17 Multivariate example: Water-level angles. This example is due to Thomas, Lohaus, & Brainerd (1993). The task: Subjects are shown 8 vessels, pointing at 1:00, 2:00, 4:00, 5:00, 7:00, 8:00, 10:00, and 11:00. They draw the water surface on each vessel. Measure: the (signed) angle with the horizontal formed by the line drawn. [Figure: vessel tilted to point at 1:00.]
18 Water-level dataset. After loading mixtools:
R> data(Waterdata)
R> Waterdata
[Data frame with columns Trial.1 through Trial.8; rows of measured angles omitted.]
For these data: number of subjects $n = 405$; number of coordinates (repeated measures) $r = 8$. What should $m$ (the number of mixture components) be?
19 We will assume conditional independence. Each component density $f_j(x)$ on $\mathbb{R}^r$ is assumed to be the product of its marginals:
$$g(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$$
We call this assumption conditional independence (cf. Hall and Zhou, 2003; Qin and Leung, 2006). It is very similar to repeated measures models: in RM models, we often assume measurements are independent conditional on the individual. Here, we have component-specific effects instead of individual-specific effects.
20 There exists a nice identifiability result for conditional independence when $r \ge 3$. Recall the conditional independence finite mixture model:
$$g(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$$
Allman, Matias, & Rhodes (2009) use a theorem by Kruskal (1976) to show that if:
- $f_{1k}, \ldots, f_{mk}$ are linearly independent for each $k$;
- $r \ge 3$;
... then $g(x)$ uniquely determines all the $\lambda_j$ and $f_{jk}$ (up to label-switching).
21 Some of the marginals may be assumed identical. Let the $r$ coordinates be grouped into $B$ i.d. blocks, and denote the block of the $k$th coordinate by $b_k$, $1 \le b_k \le B$. The model becomes
$$g(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{j b_k}(x_k)$$
Special cases:
- $b_k = k$ for each $k$: fully general model, seen earlier (Hall, Neeman, Pakyari, & Elmore 2005; Qin & Leung 2006)
- $b_k = 1$ for each $k$: conditionally i.i.d. assumption (Elmore, Hettmansperger, & Thomas 2004)
22 The water-level data may be blocked. The 8 vessels are presented in the order 11, 4, 2, 7, 10, 5, 1, 8 o'clock. Assume that opposite clock-face orientations lead to conditionally iid responses (same behavior). Then $B = 4$ blocks are defined by $b = (4, 3, 2, 1, 3, 4, 1, 2)$; e.g., $b_4 = b_7 = 1$, i.e., block 1 relates to coordinates 4 and 7, corresponding to clock orientations 1:00 and 7:00.
23 The nonparametric EM algorithm is easily extended to the multivariate conditional independence case.
E-step and M-step: Same as usual:
$$\hat Z_{ij} = \frac{\hat\lambda_j \hat f_j(x_i)}{\sum_{j'} \hat\lambda_{j'} \hat f_{j'}(x_i)} = \frac{\hat\lambda_j \prod_{k=1}^r \hat f_{jk}(x_{ik})}{\sum_{j'} \hat\lambda_{j'} \prod_{k=1}^r \hat f_{j'k}(x_{ik})} \quad\text{and}\quad \hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat Z_i$$
KDE-step: Update the estimate of each $f_{jk}$ (for some bandwidth $h$) by
$$\hat f_{jk}^{\text{next}}(u) = \frac{1}{n h \hat\lambda_j^{\text{next}}} \sum_{i=1}^n \hat Z_{ij} K\!\left(\frac{u - x_{ik}}{h}\right).$$
(Benaglia, Chauveau, Hunter, 2009)
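This algorithm is implemented as npEM in the mixtools package. The core of one iteration can be sketched in Python as below; the sketch assumes a single common bandwidth $h$ for all components and coordinates, which is a simplification of this sketch rather than a requirement of the method.

```python
import numpy as np

def npem_iteration(X, z, h):
    """One EM-like iteration under conditional independence.
    X: (n, r) data; z: (n, m) current posteriors; h: kernel bandwidth.
    Returns updated posteriors and mixing proportions."""
    n, r = X.shape
    # M-step: mixing proportions
    lam = z.mean(axis=0)                                   # (m,)
    # KDE-step: weighted per-coordinate density estimates f_jk at the data
    diff = (X[:, None, :] - X[None, :, :]) / h             # (n_eval, n_data, r)
    K = np.exp(-0.5 * diff ** 2) / np.sqrt(2 * np.pi)
    f = np.einsum('pj,ipk->ijk', z, K) / (n * h * lam[None, :, None])
    # E-step: posteriors from the product of the estimated marginals
    num = lam[None, :] * np.prod(f, axis=2)                # (n, m)
    z_new = num / num.sum(axis=1, keepdims=True)
    return z_new, lam
```

Iterating this map from a rough initial posterior matrix typically separates well-spread components within a couple of dozen passes.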
24 The Water-level data, three components. Estimated component (mean, std dev) pairs by block:
Block 1 (1:00 and 7:00 orientations): (32.1, 19.4), (3.9, 23.3), (1.4, 6.0)
Block 2 (2:00 and 8:00 orientations): (31.4, 55.4), (11.7, 27.0), (2.7, 4.6)
Block 3 (4:00 and 10:00 orientations): (43.6, 39.7), (11.4, 27.5), (1.0, 5.3)
Block 4 (5:00 and 11:00 orientations): (27.5, 19.3), (2.0, 22.1), (0.1, 6.1)
25 The Water-level data, four components. Estimated component (mean, std dev) pairs by block:
Block 1 (1:00 and 7:00 orientations): (31.0, 10.2), (22.9, 35.2), (0.5, 16.4), (1.7, 5.1)
Block 2 (2:00 and 8:00 orientations): (48.2, 36.2), (0.3, 51.9), (14.5, 18.0), (2.7, 4.3)
Block 3 (4:00 and 10:00 orientations): (58.2, 16.3), (0.5, 49.0), (15.6, 16.9), (0.9, 5.2)
Block 4 (5:00 and 11:00 orientations): (28.2, 12.0), (18.0, 34.6), (1.9, 14.8), (0.3, 5.3)
26 Outline 1 Nonparametric mixtures and parameter identifiability 2 Motivating example; extension to multivariate case 3 Smoothed maximum likelihood
27 The previous algorithm is not truly EM.
- Does this algorithm maximize any sort of log-likelihood?
- Does it have an EM-like ascent property?
- Are the estimators consistent and, if so, at what rate?
Empirical evidence: rates of convergence are similar to those in the non-mixture setting. We might hope that the algorithm has an ascent property for
$$\ell(\lambda, f) = \sum_{i=1}^n \log \sum_{j=1}^m \lambda_j f_j(x_i) = \sum_{i=1}^n \log \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_{ik}).$$
Unfortunately, $\ell(\lambda^{\text{next}}, f^{\text{next}}) \not\ge \ell(\lambda, f)$ in general.
28 Smoothing the likelihood. There is no ascent property for $\ell(\lambda, f) = \sum_{i=1}^n \log \sum_{j=1}^m \lambda_j f_j(x_i)$. However, we borrow an idea from Eggermont and LaRiccia (1995) and introduce a nonlinearly smoothed version:
$$\ell_S^n(\lambda, f) = \sum_{i=1}^n \log \sum_{j=1}^m \lambda_j [N f_j](x_i),$$
where
$$[N f_j](x) = \exp \int \frac{1}{h^r} K_r\!\left(\frac{x - u}{h}\right) \log f_j(u) \, du.$$
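In one dimension, the operator $N$ can be approximated on a grid. The sketch below uses a Gaussian kernel and row-normalized weights to handle the grid boundary; both are implementation choices of this sketch, not of the paper. It also exploits a handy sanity check: for a Gaussian $f$, the smoothed $N f$ is proportional to $f$ itself.

```python
import numpy as np

def nonlinear_smooth(f_vals, grid, h):
    """Grid approximation of the nonlinear smoother
    (N f)(x) = exp( integral of h^{-1} K((x - u)/h) log f(u) du ),
    with a Gaussian kernel K. f_vals holds f on a uniform grid."""
    du = grid[1] - grid[0]
    arg = (grid[:, None] - grid[None, :]) / h
    K = np.exp(-0.5 * arg ** 2) / (np.sqrt(2.0 * np.pi) * h)
    W = K * du
    W /= W.sum(axis=1, keepdims=True)   # normalize rows (boundary correction)
    return np.exp(W @ np.log(f_vals))   # exp of the smoothed log-density
```

For a standard normal $f$, $(N f)(x) = e^{-h^2/2} f(x)$ away from the grid boundary, so the ratio $N f / f$ is nearly constant in the center of the grid.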
29 The infinite-sample case: minimum K-L divergence. Whereas $\ell_S^n(\lambda, f) = \sum_{i=1}^n \log \sum_{j=1}^m \lambda_j [N f_j](x_i)$, we may define $e_j = \lambda_j f_j$ and write
$$\ell^\infty(e) = \int g(x) \log \frac{g(x)}{\sum_j [N e_j](x)} \, dx + \sum_j \int e_j(x) \, dx.$$
We wish to minimize $\ell^\infty(e)$ over vectors $e$ of functions not necessarily integrating to unity. The added term ensures that the solution satisfies the usual constraints, i.e., $\sum_j \lambda_j = 1$ and $\int f_j(x)\,dx = 1$. This may be viewed as minimizing a (penalized) K-L divergence between $g(\cdot)$ and $\sum_j [N e_j](\cdot)$.
31 The finite- and infinite-sample problems lead to EM-like (actually MM) algorithms.
E-step (technically an M-step, for minorization):
Old: $\hat Z_{ij} = \dfrac{\hat\lambda_j \hat f_j(x_i)}{\sum_{j'} \hat\lambda_{j'} \hat f_{j'}(x_i)}$. New: $\hat Z_{ij} = \dfrac{\hat\lambda_j [N\hat f_j](x_i)}{\sum_{j'} \hat\lambda_{j'} [N\hat f_{j'}](x_i)}$.
M-step: $\hat\lambda^{\text{next}} = \frac{1}{n} \sum_{i=1}^n \hat Z_i$
KDE-step (part 2 of the M-step):
$$\hat f_{jk}^{\text{next}}(u) = \frac{1}{n h \hat\lambda_j^{\text{next}}} \sum_{i=1}^n \hat Z_{ij} K\!\left(\frac{u - x_{ik}}{h}\right).$$
32 The first M in MM is for minorization. Recall the smoothed loglikelihood:
$$\ell_S^n(\lambda, f) = \sum_{i=1}^n \log \sum_{j=1}^m \lambda_j [N f_j](x_i)$$
Define
$$Q(\lambda, f \mid \hat\lambda, \hat f) = \sum_{i=1}^n \sum_{j=1}^m \hat Z_{ij} \log\{\lambda_j [N f_j](x_i)\}$$
We can prove
$$\ell_S^n(\lambda, f) - \ell_S^n(\hat\lambda, \hat f) \ge Q(\lambda, f \mid \hat\lambda, \hat f) - Q(\hat\lambda, \hat f \mid \hat\lambda, \hat f).$$
We say that $Q(\cdot \mid \hat\lambda, \hat f)$ is a minorizer of $\ell_S^n(\cdot)$ at $(\hat\lambda, \hat f)$.
33 MM is a generalization of EM. The minorizing inequality is
$$\ell_S^n(\lambda, f) - \ell_S^n(\hat\lambda, \hat f) \ge Q(\lambda, f \mid \hat\lambda, \hat f) - Q(\hat\lambda, \hat f \mid \hat\lambda, \hat f). \qquad (2)$$
By (2), increasing $Q(\cdot \mid \hat\lambda, \hat f)$ leads to an increase in $\ell_S^n(\cdot)$, and $Q(\lambda, f \mid \hat\lambda, \hat f)$ is maximized at $(\hat\lambda^{\text{next}}, \hat f^{\text{next}})$. Thus, the MM algorithm guarantees that $\ell_S^n(\hat\lambda^{\text{next}}, \hat f^{\text{next}}) \ge \ell_S^n(\hat\lambda, \hat f)$.
[Figure: top curve $\ell_S^n(\lambda, f)$; bottom curve $Q(\lambda, f \mid \hat\lambda, \hat f) - Q(\hat\lambda, \hat f \mid \hat\lambda, \hat f) + \ell_S^n(\hat\lambda, \hat f)$; equality occurs at $(\hat\lambda, \hat f)$.]
34 The Water-level data revisited. [Figure: density estimates for blocks 1 through 4, NEMS versus the original algorithm.]
NEMS = nonlinearly smoothed EM-like algorithm (Levine, Chauveau, Hunter 2011). Colored lines = original (non-smoothed) algorithm. There is very little difference in any example we've seen.
35 Next step: use the ascent property to establish theory. Empirical results suggest consistency and convergence-rate results are possible. Eggermont and LaRiccia (1995) prove asymptotic results for a related problem; however, it appears that their methods do not apply directly here.
[Figure: $\log(\mathrm{MISE})$ versus $\log(n)$ with fitted slopes and 90% empirical confidence intervals, for $\hat f_{21}$ and $\hat\lambda_2$.]
36 References
- Allman, E. S., Matias, C., and Rhodes, J. A. (2009), Identifiability of parameters in latent structure models with many observed variables, Annals of Statistics, 37.
- Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18.
- Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. S. (2009), mixtools: An R Package for Analyzing Finite Mixture Models, Journal of Statistical Software, 32(6).
- Bordes, L., Mottelet, S., and Vandekerkhove, P. (2006), Semiparametric estimation of a two-component mixture model, Annals of Statistics, 34.
- Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51.
- Eggermont, P. P. B. and LaRiccia, V. N. (1995), Maximum Smoothed Density Estimation for Inverse Problems, Annals of Statistics, 23.
- Elmore, R. T., Hettmansperger, T. P., and Thomas, H. (2004), Estimating component cumulative distribution functions in finite mixture models, Communications in Statistics: Theory and Methods, 33.
- Hall, P., Neeman, A., Pakyari, R., and Elmore, R. (2005), Nonparametric inference in multivariate mixtures, Biometrika, 92.
- Hunter, D. R., Wang, S., and Hettmansperger, T. P. (2007), Inference for mixtures of symmetric distributions, Annals of Statistics, 35.
- Kruskal, J. B. (1976), More Factors Than Subjects, Tests and Treatments: An Indeterminacy Theorem for Canonical Decomposition and Individual Differences Scaling, Psychometrika, 41.
- Levine, M., Hunter, D. R., and Chauveau, D. (2011), Maximum Smoothed Likelihood for Multivariate Mixtures, Biometrika, 98(2).
- Qin, J. and Leung, D. H.-Y. (2006), Semiparametric analysis in conditionally independent multivariate mixture models, unpublished manuscript.
- Thomas, H., Lohaus, A., and Brainerd, C. J. (1993), Modeling Growth and Individual Differences in Spatial Tasks, Monographs of the Society for Research in Child Development, 58(9).
37 EXTRA SLIDES
40 Identifiability for mixtures of regressions: intuition. Consider the simplest case: univariate $x_i$ and $J \in \{1, 2\}$:
$$Y = X\beta_J + \epsilon, \qquad \epsilon \sim f.$$
Fix $X = x_0$. The conditional distribution of $Y$ when $X = x_0$ is not necessarily identifiable as a mixture of shifted versions of $f$, even if $f$ is assumed (say) symmetric. Identifiability depends on using additional $X$ values that change the relative locations of the mixture components. [Figure: two regression lines through the origin, with the conditional densities $Y \mid X = x_0$ and $Y \mid X = x_1$.]
41 Mixtures of simple linear regressions. Next allow an intercept: $Y = \beta_{J1} + X\beta_{J2} + \epsilon$, with $\epsilon \sim f$. Even if $f$ is assumed (say) symmetric about zero, identifiability can fail: additional $X$ values give no new information if the regression lines are parallel. [Figure: two parallel regression lines, with the conditional densities $Y \mid X = x_0$ and $Y \mid X = x_1$.]
43 Theorem (Hunter and Young 2012). If the support of the density $h(x)$ contains an open set in $\mathbb{R}^p$, then the parameters in
$$\psi(x, y) = h(x) \sum_{j=1}^m \lambda_j f(y - x^t \beta_j)$$
are uniquely identifiable from $\psi(x, y)$. When $X$ is one-dimensional, there is a nice way to see this based on an idea of Laurent Bordes and Pierre Vandekerkhove.
44 Single predictor case. Let
$$g_x(y) = \sum_{j=1}^m \lambda_j f(y - \mu_j - \beta_j x)$$
denote the conditional density of $y$ given $x$. Next, define
$$R_k(a, b) = \int x^k g_x(a + bx) \, dx.$$
A bit of algebra (using $\int f = 1$ and, for $R_1$, a mean-zero $f$) gives:
$$R_0(a, b) = \sum_{j=1}^m \frac{\lambda_j}{|b - \beta_j|}, \quad\text{which identifies the } \beta_j, \lambda_j;$$
$$R_1(a, b) = \sum_{j=1}^m \frac{\lambda_j (\mu_j - a)}{(b - \beta_j)^2}, \quad\text{which identifies the } \mu_j.$$
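The $R_0$ identity is easy to verify numerically. The parameter values below are illustrative (a standard normal error density $f$ and made-up $\lambda$, $\mu$, $\beta$), not taken from the talk.

```python
import numpy as np

# Two-component check of R_0(a, b) = sum_j lambda_j / |b - beta_j|,
# with f standard normal and illustrative (made-up) parameter values.
lam = np.array([0.3, 0.7])
mu = np.array([-1.0, 2.0])
beta = np.array([0.5, -1.5])
a, b = 0.7, 1.2

xs = np.linspace(-60.0, 60.0, 120001)            # wide grid for the x-integral
ys = a + b * xs                                   # evaluate g_x at y = a + b*x
# g_x(a + b*x) = sum_j lam_j * f(a + b*x - mu_j - beta_j * x)
args = ys[:, None] - mu[None, :] - beta[None, :] * xs[:, None]
gvals = (lam * np.exp(-0.5 * args ** 2) / np.sqrt(2.0 * np.pi)).sum(axis=1)
r0_numeric = gvals.sum() * (xs[1] - xs[0])        # Riemann sum for R_0(a, b)
r0_formula = (lam / np.abs(b - beta)).sum()
```

The Riemann sum and the closed form agree to high precision because the integrand is smooth and decays quickly in both tails.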
Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta November 29, 2007 Outline Overview of Kernel Density
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 203 http://ce.sharif.edu/courses/9-92/2/ce725-/ Agenda Expectation-maximization
More informationStatistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart
Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationThe University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.
The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment 1 Caramanis/Sanghavi Due: Thursday, Feb. 7, 2013. (Problems 1 and
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationMultivariate Survival Analysis
Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in
More informationQuantile Regression for Residual Life and Empirical Likelihood
Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationLog-Linear Models, MEMMs, and CRFs
Log-Linear Models, MEMMs, and CRFs Michael Collins 1 Notation Throughout this note I ll use underline to denote vectors. For example, w R d will be a vector with components w 1, w 2,... w d. We use expx
More informationBayesian inference for factor scores
Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor
More informationAdditive Isotonic Regression
Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation Guy Lebanon February 19, 2011 Maximum likelihood estimation is the most popular general purpose method for obtaining estimating a distribution from a finite sample. It was
More informationA Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints
Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note
More informationSmooth simultaneous confidence bands for cumulative distribution functions
Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationCollaborative Topic Modeling for Recommending Scientific Articles
Collaborative Topic Modeling for Recommending Scientific Articles Chong Wang and David M. Blei Best student paper award at KDD 2011 Computer Science Department, Princeton University Presented by Tian Cao
More informationFactor Analysis and Kalman Filtering (11/2/04)
CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationA Novel Nonparametric Density Estimator
A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with
More informationLinear regression COMS 4771
Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between
More informationFactor Analysis (10/2/13)
STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationAccelerating the EM Algorithm for Mixture Density Estimation
Accelerating the EM Algorithm ICERM Workshop September 4, 2015 Slide 1/18 Accelerating the EM Algorithm for Mixture Density Estimation Homer Walker Mathematical Sciences Department Worcester Polytechnic
More informationA tailor made nonparametric density estimate
A tailor made nonparametric density estimate Daniel Carando 1, Ricardo Fraiman 2 and Pablo Groisman 1 1 Universidad de Buenos Aires 2 Universidad de San Andrés School and Workshop on Probability Theory
More informationBayesian Aggregation for Extraordinarily Large Dataset
Bayesian Aggregation for Extraordinarily Large Dataset Guang Cheng 1 Department of Statistics Purdue University www.science.purdue.edu/bigdata Department Seminar Statistics@LSE May 19, 2017 1 A Joint Work
More informationCalibration Estimation of Semiparametric Copula Models with Data Missing at Random
Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More informationLearning gradients: prescriptive models
Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan
More informationEstimation of a Two-component Mixture Model
Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint
More informationUniversity of Chicago; Sciences Po, Paris; and Sciences Po, Paris and University College, London
ESTIMATING MULTIVARIATE LATENT-STRUCTURE MODELS By Stéphane Bonhomme, Koen Jochmans and Jean-Marc Robin University of Chicago; Sciences Po, Paris; and Sciences Po, Paris and University College, London
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html
More informationA Derivation of the EM Updates for Finding the Maximum Likelihood Parameter Estimates of the Student s t Distribution
A Derivation of the EM Updates for Finding the Maximum Likelihood Parameter Estimates of the Student s t Distribution Carl Scheffler First draft: September 008 Contents The Student s t Distribution The
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information