New mixture models and algorithms in the mixtools package

1 New mixture models and algorithms in the mixtools package. Didier Chauveau, MAPMO - UMR, Université d'Orléans. 1ères Rencontres R, Bordeaux, July 2012.

2 Outline
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

3 Univariate mixture example 1: Old Faithful wait times. [Figure: histogram of the time between Old Faithful eruptions, in minutes.] Obvious bimodality. Normal-looking components? Classification of individuals?

5 Univariate mixture example 2: lifetime data, mixture of exponentials. [Figure: true component densities (component 1, component 2) and mixture density against lifetime.] No obvious multimodality, but 2 failure sources suspected? True component and mixture densities for these simulated data: differences in the tails only!

6 Finite mixture model and missing data setup. A complete $i$th observation $Y_i = (X_i, Z_i)$ consists of:
- The missing 0/1 latent variable $Z_i = (Z_{i1}, \ldots, Z_{im})$, where $Z_{ij} = 1$ if $X_i$ comes from component $j$ and $0$ otherwise, with $P(Z_{ij} = 1) = \lambda_j$.
- The observed, incomplete data of interest $X_i$, with $j$th component density $(X_i \mid Z_{ij} = 1) \sim f_j$.
Most models in mixtools share in common a finite mixture pdf
$g_\theta(x) = \sum_{j=1}^m \lambda_j f_j(x)$
with the model parameters $\theta = (\lambda, \mathbf{f}) = (\lambda_1, \ldots, \lambda_m, f_1, \ldots, f_m)$.
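A minimal R sketch of this data-generating mechanism (the names and the two Gaussian components are illustrative assumptions, not from the talk): the latent label $Z_i$ is drawn first, then $X_i$ from the selected component.

# Hypothetical illustration: n draws from a 2-component Gaussian mixture
set.seed(1)
n      <- 500
lambda <- c(0.3, 0.7)              # mixing proportions
mu     <- c(0, 4); sd <- c(1, 1)   # component parameters
z <- sample(1:2, n, replace = TRUE, prob = lambda)  # latent labels Z_i
x <- rnorm(n, mean = mu[z], sd = sd[z])             # observed data X_i
hist(x, breaks = 30)  # bimodality reflects the two components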

8 Some mixture estimation problems in mixtools. Goal: estimate the parameter $\theta$ given an iid sample $\mathbf{x}$ from $g_\theta$.
Univariate cases, $x \in \mathbb{R}$:
- some parametric families: $g_\theta(x) = \sum_{j=1}^m \lambda_j f(x \mid \phi_j)$
- semi-parametric location-shift mixture: $g_\theta(x) = \sum_{j=1}^m \lambda_j f(x - \mu_j)$
- mixtures of regressions, ...
Multivariate cases, $x \in \mathbb{R}^r$:
- parametric (Gaussian, ...): $g_\theta(x) = \sum_{j=1}^m \lambda_j f(x \mid \phi_j)$
- fully nonparametric, e.g. $g_\theta(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$, assuming conditional independence of $x_1, \ldots, x_r$ given $z$.

10 Multivariate example: Water-level data. Example from psychometrics, Thomas, Lohaus, Brainerd (1993).
- Subjects are shown $r = 8$ vessel orientations, presented in the order 11:00, 4:00, 2:00, 7:00, 10:00, 5:00, 1:00, 8:00.
- They draw the water surface for each.
- Measure = signed angle formed by the surface with the horizontal.
- Assume that opposite clock-face orientations lead to the same behavior, i.e. conditionally iid responses: (1 & 7), (2 & 8), (4 & 10), (5 & 11).

11 Water-level data: histograms by blocks of iid coordinates. Opposite clock-face orientations = conditionally iid responses. [Figure: four histogram panels, one per block: 1:00 & 7:00, 2:00 & 8:00, 4:00 & 10:00, and 5:00 & 11:00 orientations.]

13 EM Algorithm (1/3): Dempster, Laird, Rubin (1977). General context of missing data: a fraction $\mathbf{x}$ of $\mathbf{y}$ is observed. MLEs on the observed data ($\mathbf{x}$) and on the complete data ($\mathbf{y}$):
$\hat\theta_x = \arg\max_{\theta \in \Theta} L_x(\theta)$, $\hat\theta_y = \arg\max_{\theta \in \Theta} L^c_y(\theta)$,
where $L_x(\theta) = \sum_{i=1}^n \log g_\theta(x_i)$ and $L^c_y(\theta) = \ldots$
Intuition: often the MLE on the complete data, $\hat\theta_y$, is available, while $\hat\theta_x$ is not. Try to maximize, instead of $L^c_y(\theta)$, its expectation given $\mathbf{x}$ for some $\theta'$:
$Q(\theta \mid \theta') := E[L^c_Y(\theta) \mid \mathbf{x}; \theta']$

14 EM Algorithm (2/3): general principle. $\theta^0$ = some arbitrary initialization, $\theta^t$ = current value; EM iteration $\theta^t \to \theta^{t+1}$:
- Expectation step: compute $\theta \mapsto Q(\theta \mid \theta^t)$
- Maximization step: set $\theta^{t+1} = \arg\max_{\theta \in \Theta} Q(\theta \mid \theta^t)$.
Why does it work under mild conditions? Ascent property for the observed log-likelihood: $L_x(\theta^{t+1}) \ge L_x(\theta^t)$.
Alternatives: GEM, ECM, MM, Stochastic-EM, ...

15 EM Algorithm (3/3): parametric mixture, $f_j = f(\cdot \mid \phi_j)$.
E-step: amounts to finding the posterior probabilities
$Z^t_{ij} := E_{\theta^t}[Z_{ij} \mid x_i] = P_{\theta^t}[Z_{ij} = 1 \mid x_i] = \dfrac{\lambda^t_j f(x_i \mid \phi^t_j)}{\sum_{j'} \lambda^t_{j'} f(x_i \mid \phi^t_{j'})}$
M-step: the maximization looks like a weighted MLE. The update
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n Z^t_{ij}$
is generic for EM algorithms for mixtures, and, e.g., for the Gaussian case $f(x \mid \phi_j)$ = the pdf of $\mathcal{N}(\mu_j, v_j)$:
$\mu^{t+1}_j = \dfrac{\sum_{i=1}^n Z^t_{ij} x_i}{\sum_{i=1}^n Z^t_{ij}}$, $v^{t+1}_j = \dfrac{\sum_{i=1}^n Z^t_{ij} (x_i - \mu^{t+1}_j)^2}{\sum_{i=1}^n Z^t_{ij}}$
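As a sketch (not mixtools' internal code), the E- and M-steps above translate almost line for line into R for a univariate Gaussian mixture; all names here are hypothetical.

# Minimal univariate Gaussian-mixture EM implementing the updates above
em_gauss <- function(x, mu, v, lambda, maxit = 500, tol = 1e-8) {
  m <- length(mu)
  for (t in 1:maxit) {
    # E-step: posterior probabilities Z[i, j]
    dens <- sapply(1:m, function(j) lambda[j] * dnorm(x, mu[j], sqrt(v[j])))
    Z <- dens / rowSums(dens)
    # M-step: weighted MLE updates
    lambda <- colMeans(Z)
    mu.new <- colSums(Z * x) / colSums(Z)
    v <- sapply(1:m, function(j) sum(Z[, j] * (x - mu.new[j])^2) / sum(Z[, j]))
    if (max(abs(mu.new - mu)) < tol) { mu <- mu.new; break }
    mu <- mu.new
  }
  list(lambda = lambda, mu = mu, v = v, posterior = Z)
}
# e.g. em_gauss(faithful$waiting, mu = c(55, 80), v = c(25, 25),
#               lambda = c(0.5, 0.5)) should get close to normalmixEM's fit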

17 About Stochastic EM versions. In some (mixture) setups it may be useful to simulate the latent data from the posterior probabilities:
$\hat Z^t_i \sim \mathrm{Mult}(1; Z^t_{i1}, \ldots, Z^t_{im})$, $i = 1, \ldots, n$.
Then:
- The complete data $(\mathbf{x}, \hat{\mathbf{Z}}^t)$ allow direct computation of the MLE.
- The sequence $(\theta^t)_{t \ge 1}$ becomes a Markov chain.
Historically, parametric Stochastic EM was introduced by Celeux, Diebolt (1985, 1986, ...). General convergence properties: Nielsen (2000). In the non-parametric framework: Stochastic EM for reliability mixture models, Bordes, C. (2010, 2012); more on this later.
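A short sketch of this simulation step, assuming a matrix Z of current posterior probabilities from the E-step (hypothetical names):

# S-step: draw one label per observation from its posterior Z[i, ]
zhat <- apply(Z, 1, function(p) sample(1:ncol(Z), 1, prob = p))
# the complete-data MLE then uses (x, zhat); e.g. for the weights:
lambda <- tabulate(zhat, nbins = ncol(Z)) / nrow(Z)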

18 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

20 Old Faithful data with parametric Gaussian EM. [Figure: histogram of the time between Old Faithful eruptions (minutes) with the fitted Gaussian mixture density.]
In R with mixtools, type
R> data(faithful)
R> attach(faithful)
R> normalmixEM(waiting, mu=c(55,80), sigma=5)
number of iterations= 24
Gaussian EM result: $\hat\mu = (54.6, 80.1)$.

22 Univariate identifiable semiparametric mixtures. One originality of mixtools: its capability to handle various nonparametric components $f_j$ in $g_\theta(x) = \sum_{j=1}^m \lambda_j f_j(x)$, for several models; Benaglia, C., Hunter, Young (J. Stat. Soft. 2009).
Location-shift semi-parametric mixture model:
$g_\theta(x) = \lambda_1 f(x - \mu_1) + (1 - \lambda_1) f(x - \mu_2)$
This model is identifiable when $f(\cdot)$ is even: Bordes, Mottelet, Vandekerkhove (2006); Hunter, Wang, Hettmansperger (2007). Bordes, C., Vandekerkhove (2007) introduced an EM-like algorithm that includes a Kernel Density Estimation (KDE) step.

23 mixtools semi-parametric EM algorithm.
E-step: same as usual:
$Z^t_{ij} = E_{\theta^t}[Z_{ij} \mid x_i] = \dfrac{\lambda^t_j f^t(x_i - \mu^t_j)}{\lambda^t_1 f^t(x_i - \mu^t_1) + \lambda^t_2 f^t(x_i - \mu^t_2)}$
M-step: maximize the complete-data log-likelihood for $\lambda$ and $\mu$:
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n Z^t_{ij}$, $\mu^{t+1}_j = (n \lambda^{t+1}_j)^{-1} \sum_{i=1}^n Z^t_{ij}\, x_i$
Weighted KDE-step: update $f^t$ (for some bandwidth $h$) by
$f^{t+1}(u) = \dfrac{1}{nh} \sum_{i=1}^n \sum_{j=1}^2 Z^t_{ij}\, K\!\left(\dfrac{u - (x_i - \mu^{t+1}_j)}{h}\right)$, then symmetrize.
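A sketch of the weighted KDE step under the assumptions above (Gaussian kernel $K$; hypothetical names), evaluating $f^{t+1}$ on a grid; the symmetrization averages $f(u)$ and $f(-u)$, which is valid when the grid is symmetric about 0:

# WKDE-step: Z is the n x 2 posterior matrix, mu the current means
wkde_sym <- function(x, Z, mu, h, grid) {
  n <- length(x)
  f <- sapply(grid, function(u)
    sum(Z * dnorm((u - outer(x, mu, "-")) / h)) / (n * h))
  (f + rev(f)) / 2  # symmetrize: f is assumed even
}
# fhat <- wkde_sym(waiting, Z, mu, h = 4, grid = seq(-30, 30, length.out = 201))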

25 Location-shift semi-parametric mixture model: $g_\theta(x) = \lambda_1 f(x - \mu_1) + (1 - \lambda_1) f(x - \mu_2)$. [Figure: Old Faithful waiting-times histogram with both fitted densities.]
Gaussian EM: $\hat\mu = (54.6, 80.1)$.
Semiparametric EM:
R> spEMsymloc(waiting, mu=c(55,80), h=4) # bandwidth
$\hat\mu = (54.7, 79.8)$.

26 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

27 Identifiability: the blessing of dimensionality (!). Recall the model in the multivariate case, $r > 1$:
$g_\theta(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$
N.B.: assume conditional independence of $x_1, \ldots, x_r$.
- Hall and Zhou (2003) show that when $m = 2$ and $r \ge 3$, the model is identifiable under mild restrictions on the $f_{jk}(\cdot)$.
- Hall et al. (2005): "... from at least one point of view, the curse of dimensionality works in reverse."
- Allman et al. (2008) give mild sufficient conditions for identifiability whenever $r \ge 3$.

28 The notation gets even worse... Motivation: the Water-level data with 4 blocks of 2 similar responses. Let the $r$ coordinates be grouped into $B \le r$ blocks of conditionally iid coordinates, where $b_k \in \{1, \ldots, B\}$ is the block index of the $k$th coordinate. The model becomes
$g(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{j b_k}(x_k)$
Special cases:
- $b_k = k$ for $k = 1, \ldots, r$: general model of conditional independence, as in Hall et al.
- $b_k = 1$ for all $k$: conditionally i.i.d. assumption (Elmore et al. 2004).

29 Nonparametric EM, Benaglia, C., Hunter (2009, 2011).
E-step: same as for a genuine parametric EM:
$Z^t_{ij} = \dfrac{\lambda^t_j \prod_{k=1}^r f^t_{j b_k}(x_{ik})}{\sum_{j'} \lambda^t_{j'} \prod_{k=1}^r f^t_{j' b_k}(x_{ik})}$
M-step: maximize the complete-data log-likelihood for $\lambda$:
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n Z^t_{ij}$
WKDE-step: update the estimate of $f_{jl}$ (component $j$, block $l$) by
$f^{t+1}_{jl}(u) = \dfrac{1}{n h^{t+1}_{jl} C_l \lambda^{t+1}_j} \sum_{i=1}^n \sum_{k=1}^r Z^t_{ij}\, \mathbb{I}\{b_k = l\}\, K\!\left(\dfrac{u - x_{ik}}{h^{t+1}_{jl}}\right)$
where $C_l$ = number of coordinates in block $l$, and $h^{t+1}_{jl}$ = an iterative, per-component and per-block adaptive bandwidth.
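The E-step above is a straightforward array computation; a sketch assuming the current density values are pre-evaluated in an n x r x m array (hypothetical names):

# npEM E-step: dens[i, k, j] = f^t_{j, b_k}(x_ik); lambda = current weights
npem_estep <- function(dens, lambda) {
  prod_ij <- apply(dens, c(1, 3), prod)  # product over the r coordinates
  num <- sweep(prod_ij, 2, lambda, "*")  # lambda_j * prod_k f_{j b_k}(x_ik)
  num / rowSums(num)                     # n x m matrix of posteriors Z^t_{ij}
}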

30 The Water-level data. Dataset previously analysed under the conditional i.i.d. assumption: Hettmansperger, Thomas (2000); Elmore et al. (2004). The inappropriate conditional i.i.d. assumption masks interesting features that our model reveals.
In mixtools: npEM algorithm
R> library(mixtools)
R> data(Waterdata)
R> a <- npEM(Waterdata, 3, blockid=c(1,2,3,4,2,1,4,3), h=4) # user-fixed bandwidth here
R> par(mfrow=c(2,2))
R> plot(a)
R> summary(a) # some statistics per component...

31 The Water-level data, $m = 3$ components, 4 blocks. [Figure: fitted component densities for each block; legends give each component's mixing proportion and (mean, std dev). The (mean, std dev) pairs read:
Block 1, 1:00 and 7:00 orientations: (32.1, 19.4), (3.9, 23.3), (1.4, 6.0);
Block 2, 2:00 and 8:00 orientations: (31.4, 55.4), (11.7, 27.0), (2.7, 4.6);
Block 3, 4:00 and 10:00 orientations: (43.6, 39.7), (11.4, 27.5), (1.0, 5.3);
Block 4, 5:00 and 11:00 orientations: (27.5, 19.3), (2.0, 22.1), (0.1, 6.1).]

32 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

33 From npEM to NEMS for nonparametric mixtures.
Motivation: our npEM algorithms are not true EM algorithms. Idea: combine regularization with the npEM approach to define an algorithm with a provable ascent property; Levine, Hunter, C. (Biometrika 2011).
"Nonparametric" in the smooth EM literature refers to a continuous mixing distribution:
- true EM, but ill-posed difficulties: Vardi et al. (1985)
- Smoothed EM (EMS): Silverman et al. (1990)
- Nonlinear EMS (NEMS), a regularization approach: Eggermont, LaRiccia (1995); Eggermont (1992, 1999)

35 Smoothing the mixture. Applying the Eggermont (1992, 1999) idea to the component pdfs. Nonlinear smoothing:
$Nf(x) = \exp \int_\Omega K_h(x - u) \log f(u)\, du$, where $K_h(u) = h^{-r} \prod_{k=1}^r K(h^{-1} u_k)$ is a product kernel.
For $\mathbf{f} = (f_1, \ldots, f_m)$, define
$\mathcal{M}_\lambda N\mathbf{f}(x) := \sum_{j=1}^m \lambda_j N f_j(x)$
Goal: minimize the $\infty$-sample objective function (Kullback divergence)
$l(\theta) = l(\mathbf{f}, \lambda) := \int_\Omega g(x) \log \dfrac{g(x)}{[\mathcal{M}_\lambda N\mathbf{f}](x)}\, dx$
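In one dimension, $Nf$ can be approximated pointwise by numerical integration; a sketch with a Gaussian kernel (hypothetical names; the integration range should be the support $\Omega$ of $f$):

# Nonlinear smoothing: (Nf)(x) = exp( integral of K_h(x - u) log f(u) du )
Nf <- function(f, x, h, lower = -Inf, upper = Inf) {
  sapply(x, function(xi)
    exp(integrate(function(u) dnorm((xi - u) / h) / h * log(f(u)),
                  lower, upper)$value))
}
# e.g. Nf(dnorm, seq(-3, 3, by = 0.5), h = 0.5); note Nf need not integrate to 1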

36 Majorization-Minimization (MM) trick.
Principle: let $\theta^t$ be the current value of $\theta$ in an algorithm. A function $B(\theta \mid \theta^t)$ is said to majorize $l(\theta)$ at $\theta^t$ provided
- $B(\theta \mid \theta^t) \ge l(\theta)$ for all $\theta$,
- $B(\theta^t \mid \theta^t) = l(\theta^t)$.
MM minimization algorithm: set $\theta^{t+1} = \arg\min_\theta B(\theta \mid \theta^t)$. The MM algorithm satisfies the descent property
$l(\theta^{t+1}) \le B(\theta^{t+1} \mid \theta^t) \le B(\theta^t \mid \theta^t) = l(\theta^t)$,
and we can define such a majorizing function $B(\theta \mid \theta^t)$ here!

37 MM algorithm with a descent property. Smoothed log-likelihood version for finite sample size: given the sample $x_1, \ldots, x_n$ iid $\sim g$,
$l_n(\mathbf{f}, \lambda) := -\sum_{i=1}^n \log [\mathcal{M}_\lambda N\mathbf{f}](x_i)$
The following MM algorithm satisfies the descent property
$l_n(\mathbf{f}^{t+1}, \lambda^{t+1}) \le l_n(\mathbf{f}^t, \lambda^t)$

38 nonparametric Maximum Smoothed Likelihood (npMSL) algorithm.
E-step (requires additional univariate numerical integrations):
$w^t_{ij} = \dfrac{\lambda^t_j N f^t_j(x_i)}{\mathcal{M}_{\lambda^t} N \mathbf{f}^t(x_i)} = \dfrac{\lambda^t_j \prod_{k=1}^r N f^t_{jk}(x_{ik})}{\sum_{j'=1}^m \lambda^t_{j'} \prod_{k=1}^r N f^t_{j'k}(x_{ik})}$
M-step: for $j = 1, \ldots, m$,
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n w^t_{ij}$
WKDE-step: for each component $j$ and coordinate $k$ (or blocks),
$f^{t+1}_{jk}(u) = \dfrac{1}{n h \lambda^{t+1}_j} \sum_{i=1}^n w^t_{ij}\, K\!\left(\dfrac{u - x_{ik}}{h}\right)$

39 The Water-level data again, npEM vs. npMSL (dotted). [Figure: fitted densities for Blocks 1-4 (1:00 & 7:00, 2:00 & 8:00, 4:00 & 10:00, 5:00 & 11:00 orientations), npEM solid vs. npMSL dotted.]
npMSL in mixtools:
R> b <- npMSL(Waterdata, 3, blockid=c(1,2,3,4,2,1,4,3), h=4) # bandwidth
R> plot(b)

40 Further extensions: semiparametric models. Component or block densities may differ only in location and/or scale parameters, e.g.
$f_{jl}(x) = \dfrac{1}{\sigma_{jl}} f_j\!\left(\dfrac{x - \mu_{jl}}{\sigma_{jl}}\right)$ or $f_{jl}(x) = \dfrac{1}{\sigma_{jl}} f_l\!\left(\dfrac{x - \mu_{jl}}{\sigma_{jl}}\right)$ or $f_{jl}(x) = \dfrac{1}{\sigma_{jl}} f\!\left(\dfrac{x - \mu_{jl}}{\sigma_{jl}}\right)$,
where $f_j$, $f_l$, $f$ remain fully unspecified. For all these situations, special cases of the npEM/npMSL algorithms can be designed (some are already in mixtools).

41 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

43 Bordes, C. (2010, 2012). Typical data from reliability and life testing + mixture model:
- lifetime data on $\mathbb{R}^+$: $(X_1, \ldots, X_n)$ iid $\sim g_\theta = \sum_j \lambda_j f_j$
- censoring times $(C_1, \ldots, C_n)$ iid $\sim q$ (unknown)
- random (right) censoring: $T_i = \min(X_i, C_i)$, $D_i = \mathbb{I}\{X_i \le C_i\}$
- observed (incomplete) data $(\mathbf{t}, \mathbf{d}) = ((t_i, d_i),\ i = 1, \ldots, n)$
A semiparametric example: the semiparametric accelerated lifetime mixture model
$g_\theta(x) = \lambda f(x) + (1 - \lambda)\, \xi f(\xi x)$
Scaling model: $(X \mid Z_1 = 1)$ has the distribution of $U \sim f$, and $(X \mid Z_2 = 1) \sim U/\xi$; identifiable under some restrictions.
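A sketch of this censored data-generating mechanism in R (the lognormal f and the exponential censoring law are illustrative assumptions, not the talk's choices):

# Simulate right-censored data from g = lambda f + (1 - lambda) xi f(xi .)
set.seed(2)
n <- 300; lambda <- 0.4; xi <- 0.1
z <- rbinom(n, 1, lambda)                 # component-1 indicators
u <- rlnorm(n, meanlog = 0, sdlog = 0.5)  # U ~ f
x <- ifelse(z == 1, u, u / xi)            # scaling model: X = U or U / xi
cens <- rexp(n, rate = 1 / (2 * mean(x))) # censoring law q (tune the rate)
t <- pmin(x, cens); d <- as.numeric(x <= cens)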

48 Semi-parametric Stochastic-EM for censored data (1). Denote by $F$ the cdf of $f$, $S = 1 - F$ the survival function, and $\alpha = f/S$ the hazard rate.
Building blocks:
- For a single censored sample $(\mathbf{t}, \mathbf{d})$, $S$ and $\alpha$ can be estimated nonparametrically using the Kaplan-Meier and Nelson-Aalen estimators.
- $E(X \mid Z_j = 1) = \int_0^{+\infty} S_j(s)\, ds$, where $S_j(s) = P(X > s \mid Z_j = 1)$ is the $j$th component survival function.
- In this scaling model, $\xi = \dfrac{E(X \mid Z_1 = 1)}{E(X \mid Z_2 = 1)}$.
- If $X$ comes from component 2, then $\xi X \sim f$, so that $\{X_i : Z_{i1} = 1\} \cup \{\xi X_i : Z_{i2} = 1\}$ iid $\sim f$.
- Estimates $\hat\xi$ and $\hat f = \hat\alpha \hat S$ would thus be available in the mixture setup if $\mathbf{Z}$ were observed.
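The first two building blocks are directly available in R; a sketch using the survival package (loaded later in the talk as well), where the mean lifetime is estimated by the area under the Kaplan-Meier curve:

library(survival)
# Kaplan-Meier estimate of S from a censored sample (t, d)
km <- survfit(Surv(t, d) ~ 1)
# E(X) estimated by the area under the KM step function
Ehat <- sum(diff(c(0, km$time)) * c(1, head(km$surv, -1)))
# applied separately to each subsample, xi_hat = Ehat_1 / Ehat_2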

49 Semi-parametric St-EM for censored data (2). A stochastic version is required here!
E-step: if $d_i = 0$ (censored lifetime),
$Z^t_{i1} = \dfrac{\lambda^t S^t(t_i)}{\lambda^t S^t(t_i) + (1 - \lambda^t) S^t(\xi^t t_i)}$
else ($d_i = 1$, observed lifetime),
$Z^t_{i1} = \dfrac{\lambda^t \alpha^t(t_i) S^t(t_i)}{\lambda^t \alpha^t(t_i) S^t(t_i) + (1 - \lambda^t)\, \xi^t \alpha^t(\xi^t t_i) S^t(\xi^t t_i)}$
S-step: simulate $\hat Z^t_{i1} \sim \mathcal{B}(Z^t_{i1})$, $i = 1, \ldots, n$, and set $\chi^t_j = \{i \in \{1, \ldots, n\} : \hat Z^t_{ij} = 1\}$, $j = 1, 2$.

50 Semi-parametric St-EM for censored data (3).
M-step for $\lambda, \xi$:
$\lambda^{t+1} = \dfrac{\mathrm{Card}(\chi^t_1)}{n}$, $\xi^{t+1} = \dfrac{\int_0^{\tau^t_1} \hat S^t_1(s)\, ds}{\int_0^{\tau^t_2} \hat S^t_2(s)\, ds}$, where $\tau^t_j = \max_{i \in \chi^t_j} t_i$
and $\hat S^t_j$ is the Kaplan-Meier estimate based on $\{(t_i, d_i); i \in \chi^t_j\}$.
Nonparametric step for $\alpha, S$: let $\mathbf{t}^t$ be the order statistics of $\{t_i; i \in \chi^t_1\} \cup \{\xi^t t_i; i \in \chi^t_2\}$ and $\mathbf{d}^t$ the associated censoring indicators. Then
$\alpha^{t+1}(s) = \dfrac{1}{h} \sum_{i=1}^n K\!\left(\dfrac{s - t^t_i}{h}\right) \dfrac{d^t_i}{n - i + 1}$, $S^{t+1}(s) = \prod_{i : t^t_i \le s} \left(1 - \dfrac{d^t_i}{n - i + 1}\right)$
for a kernel $K$ and bandwidth $h$.

51 Simulated example: scale mixture of Lognormals; $n = 300$, 15% censored. [Figure: St-EM iterations for the scaling $\xi$ and the weight of component 1; density estimate $f(x)$ and survival-function estimate vs. the truth, against time.]
In mixtools: semiparametric Reliability Mixture Model Stochastic EM
R> library(mixtools)
R> library(survival) # for KM estimation
# simulating data...
R> s <- spRMM_SEM(t, d, scaling=0.1)
R> plot(s)

52 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

55 Other new models/algorithms available (1).
- Parametric (Exponential, Weibull, lognormal, ...) reliability mixture models for censored data.
- Gaussian mixtures with linear constraints on the parameters, e.g., for $p \le m$ and known $M$ and $C$,
$\mu = (\mu_1, \ldots, \mu_m)^\top = M\beta + C$, where $\beta = (\beta_1, \ldots, \beta_p)^\top$ and $C = (C_1, \ldots, C_m)^\top$,
with similar constraints on the variances; this requires ECM and/or MM algorithms; C., Hunter (2011). A toy illustration follows this list.
- Advances in mixtures of regressions: predictor-dependent mixing proportions; Young, Hunter (CSDA 2010).
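A toy illustration of such a linear constraint (M, C, and beta are hypothetical): three component means forced onto a line, leaving only p = 2 free parameters.

# Example constraint: mu_j = beta_1 + j * beta_2, j = 1, 2, 3  (p = 2 < m = 3)
M    <- cbind(1, 1:3)     # known m x p matrix
C    <- rep(0, 3)         # known offset
beta <- c(50, 10)
mu   <- as.vector(M %*% beta + C)  # constrained means: 60, 70, 80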

59 Other new models (2): mixtures in FDR estimation.
In multiple testing (e.g., micro-arrays) we consider $n$ multiple tests with p-values $\mathbf{p} = (p_1, \ldots, p_n)$. Question: the False Discovery Rate (FDR) = the expected proportion of falsely rejected $H_0$'s.
Mixture modelling: latent variable $Z_i = 1$ if $H_0$ is true and $Z_i = 0$ if $H_0$ is rejected (= the interesting case):
$g_\theta(p) = \lambda_0 f_0(p) + (1 - \lambda_0) f_1(p)$
Theoretically $(p_i \mid H_0 \text{ true}) \sim \mathcal{U}_{[0,1]}$, so $f_0$ is known! Nonparametric assumption for $f_1$, as e.g. in Robin et al. or the fdrtool package, Strimmer (2008).
In mixtools: a semiparametric EM with one component known, for FDR estimation; C., Saby (2011, 2012).
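A sketch of an EM of this flavor, with the uniform null component held fixed and a parametric Beta(a, 1) alternative standing in for the nonparametric f_1 (a deliberate simplification of the semiparametric version; names hypothetical; assumes p-values in (0, 1)):

# EM for g(p) = lambda0 * 1 + (1 - lambda0) * f1(p), with f0 = U[0,1] known
fdr_em <- function(p, lambda0 = 0.8, a = 0.5, maxit = 100) {
  for (t in 1:maxit) {
    f1 <- dbeta(p, a, 1)                            # alternative density
    z0 <- lambda0 / (lambda0 + (1 - lambda0) * f1)  # P(H0 true | p_i)
    lambda0 <- mean(z0)
    w <- 1 - z0                                     # weights for f1's MLE
    a <- -sum(w) / sum(w * log(p))                  # weighted Beta(a, 1) MLE
  }
  list(lambda0 = lambda0, a = a, localfdr = z0)
}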

60 The end. Thank you for your attention!

61 For Further Reading (1/3)
- Benaglia, T., Chauveau, D., and Hunter, D. R. An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures. J. Comput. Graph. Statist., 18(2), 2009.
- Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. S. mixtools: An R Package for Analyzing Mixture Models. Journal of Statistical Software, 32: 1-29, 2009.
- Bordes, L., Chauveau, D., and Vandekerkhove, P. A stochastic EM algorithm for a semiparametric mixture model. Comput. Statist. & Data Anal., 51(11), 2007.

62 For Further Reading (2/3)
- Bordes, L. and Chauveau, D. EM and Stochastic EM algorithms for reliability mixture models under random censoring. Preprint HAL, 2012.
- Levine, M., Hunter, D. R., and Chauveau, D. Maximum Smoothed Likelihood for Multivariate Mixtures. Biometrika, 98(2), 2011.
- Allman, E. S., Matias, C., and Rhodes, J. A. Identifiability of parameters in latent structure models with many observed variables. Ann. Statist., 37(6A), 2009.

63 For Further Reading (3/3)
- Nielsen, S. F. The stochastic EM algorithm: Estimation and asymptotic results. Bernoulli, 6(3), 2000.
- Young, D. and Hunter, D. Mixtures of regressions with predictor-dependent mixing proportions. Computational Statistics and Data Analysis, 54, 2010.
