New mixture models and algorithms in the mixtools package

1 New mixture models and algorithms in the mixtools package. Didier Chauveau, MAPMO - UMR, Université d'Orléans. 1ères Rencontres R, Bordeaux, July 2012.

2 Outline
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

3 Univariate mixture example 1: Old Faithful wait times. [Figure: histogram of the time between Old Faithful eruptions, in minutes.] Obvious bimodality. Normal-looking components? Classification of individuals?

5 Univariate mixture example 2: lifetime data, mixture of exponentials. [Figure: true component densities (component 1, component 2) and mixture density against lifetime.] No obvious multimodality, but 2 failure sources suspected? True component and mixture densities for these simulated data: differences in the tails only!

6 Finite mixture model and missing data setup. A complete $i$th observation $Y_i = (X_i, Z_i)$ consists of:
- The missing 0/1 latent variable $Z_i = (Z_{i1}, \ldots, Z_{im})$, where $Z_{ij} = 1$ if $X_i$ comes from component $j$ and $0$ otherwise, with $P(Z_{ij} = 1) = \lambda_j$.
- The observed, incomplete data of interest $X_i$, with $j$th component density $(X_i \mid Z_{ij} = 1) \sim f_j$.
Most models in mixtools share in common a finite mixture pdf
$g_\theta(x) = \sum_{j=1}^m \lambda_j f_j(x)$
with the model parameters $\theta = (\lambda, \mathbf{f}) = (\lambda_1, \ldots, \lambda_m, f_1, \ldots, f_m)$.
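A minimal R sketch of this data-generating mechanism (the names and the two Gaussian components are illustrative assumptions, not from the talk): the latent label $Z_i$ is drawn first, then $X_i$ from the selected component.

# Hypothetical illustration: n draws from a 2-component Gaussian mixture
set.seed(1)
n      <- 500
lambda <- c(0.3, 0.7)              # mixing proportions
mu     <- c(0, 4); sd <- c(1, 1)   # component parameters
z <- sample(1:2, n, replace = TRUE, prob = lambda)  # latent labels Z_i
x <- rnorm(n, mean = mu[z], sd = sd[z])             # observed data X_i
hist(x, breaks = 30)  # bimodality reflects the two components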

8 Some mixture estimation problems in mixtools. Goal: estimate the parameter $\theta$ given an iid sample $\mathbf{x}$ from $g_\theta$.
Univariate cases, $x \in \mathbb{R}$:
- some parametric families: $g_\theta(x) = \sum_{j=1}^m \lambda_j f(x \mid \phi_j)$
- semi-parametric location-shift mixture: $g_\theta(x) = \sum_{j=1}^m \lambda_j f(x - \mu_j)$
- mixtures of regressions, ...
Multivariate cases, $x \in \mathbb{R}^r$:
- parametric (Gaussian, ...): $g_\theta(x) = \sum_{j=1}^m \lambda_j f(x \mid \phi_j)$
- fully nonparametric, e.g. $g_\theta(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$, assuming conditional independence of $x_1, \ldots, x_r$ given $z$.

10 Multivariate example: Water-level data. Example from psychometrics, Thomas, Lohaus, Brainerd (1993).
- Subjects are shown $r = 8$ vessel orientations, presented in the order 11:00, 4:00, 2:00, 7:00, 10:00, 5:00, 1:00, 8:00.
- They draw the water surface for each.
- Measure = signed angle formed by the surface with the horizontal.
- Assume that opposite clock-face orientations lead to the same behavior, i.e. conditionally iid responses: (1 & 7), (2 & 8), (4 & 10), (5 & 11).

11 Water-level data: histograms by blocks of iid coordinates. Opposite clock-face orientations = conditionally iid responses. [Figure: four histogram panels, one per block: 1:00 & 7:00, 2:00 & 8:00, 4:00 & 10:00, and 5:00 & 11:00 orientations.]

13 EM Algorithm (1/3): Dempster, Laird, Rubin (1977). General context of missing data: a fraction $\mathbf{x}$ of $\mathbf{y}$ is observed. MLEs on the observed data ($\mathbf{x}$) and on the complete data ($\mathbf{y}$):
$\hat\theta_x = \arg\max_{\theta \in \Theta} L_x(\theta)$, $\hat\theta_y = \arg\max_{\theta \in \Theta} L^c_y(\theta)$,
where $L_x(\theta) = \sum_{i=1}^n \log g_\theta(x_i)$ and $L^c_y(\theta) = \ldots$
Intuition: often the MLE on the complete data, $\hat\theta_y$, is available, while $\hat\theta_x$ is not. Try to maximize, instead of $L^c_y(\theta)$, its expectation given $\mathbf{x}$ for some $\theta'$:
$Q(\theta \mid \theta') := E[L^c_Y(\theta) \mid \mathbf{x}; \theta']$

14 EM Algorithm (2/3): general principle. $\theta^0$ = some arbitrary initialization, $\theta^t$ = current value; EM iteration $\theta^t \to \theta^{t+1}$:
- Expectation step: compute $\theta \mapsto Q(\theta \mid \theta^t)$
- Maximization step: set $\theta^{t+1} = \arg\max_{\theta \in \Theta} Q(\theta \mid \theta^t)$.
Why does it work under mild conditions? Ascent property for the observed log-likelihood: $L_x(\theta^{t+1}) \ge L_x(\theta^t)$.
Alternatives: GEM, ECM, MM, Stochastic-EM, ...

15 EM Algorithm (3/3): parametric mixture, $f_j = f(\cdot \mid \phi_j)$.
E-step: amounts to finding the posterior probabilities
$Z^t_{ij} := E_{\theta^t}[Z_{ij} \mid x_i] = P_{\theta^t}[Z_{ij} = 1 \mid x_i] = \dfrac{\lambda^t_j f(x_i \mid \phi^t_j)}{\sum_{j'} \lambda^t_{j'} f(x_i \mid \phi^t_{j'})}$
M-step: the maximization looks like a weighted MLE. The update
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n Z^t_{ij}$
is generic for EM algorithms for mixtures, and, e.g., for the Gaussian case $f(x \mid \phi_j)$ = the pdf of $\mathcal{N}(\mu_j, v_j)$:
$\mu^{t+1}_j = \dfrac{\sum_{i=1}^n Z^t_{ij} x_i}{\sum_{i=1}^n Z^t_{ij}}$, $v^{t+1}_j = \dfrac{\sum_{i=1}^n Z^t_{ij} (x_i - \mu^{t+1}_j)^2}{\sum_{i=1}^n Z^t_{ij}}$
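As a sketch (not mixtools' internal code), the E- and M-steps above translate almost line for line into R for a univariate Gaussian mixture; all names here are hypothetical.

# Minimal univariate Gaussian-mixture EM implementing the updates above
em_gauss <- function(x, mu, v, lambda, maxit = 500, tol = 1e-8) {
  m <- length(mu)
  for (t in 1:maxit) {
    # E-step: posterior probabilities Z[i, j]
    dens <- sapply(1:m, function(j) lambda[j] * dnorm(x, mu[j], sqrt(v[j])))
    Z <- dens / rowSums(dens)
    # M-step: weighted MLE updates
    lambda <- colMeans(Z)
    mu.new <- colSums(Z * x) / colSums(Z)
    v <- sapply(1:m, function(j) sum(Z[, j] * (x - mu.new[j])^2) / sum(Z[, j]))
    if (max(abs(mu.new - mu)) < tol) { mu <- mu.new; break }
    mu <- mu.new
  }
  list(lambda = lambda, mu = mu, v = v, posterior = Z)
}
# e.g. em_gauss(faithful$waiting, mu = c(55, 80), v = c(25, 25),
#               lambda = c(0.5, 0.5)) should get close to normalmixEM's fit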

17 About Stochastic EM versions. In some (mixture) setups it may be useful to simulate the latent data from the posterior probabilities:
$\hat Z^t_i \sim \mathrm{Mult}(1; Z^t_{i1}, \ldots, Z^t_{im})$, $i = 1, \ldots, n$.
Then:
- The complete data $(\mathbf{x}, \hat{\mathbf{Z}}^t)$ allow direct computation of the MLE.
- The sequence $(\theta^t)_{t \ge 1}$ becomes a Markov chain.
Historically, parametric Stochastic EM was introduced by Celeux, Diebolt (1985, 1986, ...). General convergence properties: Nielsen (2000). In the non-parametric framework: Stochastic EM for reliability mixture models, Bordes, C. (2010, 2012); more on this later.
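A short sketch of this simulation step, assuming a matrix Z of current posterior probabilities from the E-step (hypothetical names):

# S-step: draw one label per observation from its posterior Z[i, ]
zhat <- apply(Z, 1, function(p) sample(1:ncol(Z), 1, prob = p))
# the complete-data MLE then uses (x, zhat); e.g. for the weights:
lambda <- tabulate(zhat, nbins = ncol(Z)) / nrow(Z)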

18 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

20 Old Faithful data with parametric Gaussian EM. [Figure: histogram of the time between Old Faithful eruptions (minutes) with the fitted Gaussian mixture density.]
In R with mixtools, type
R> data(faithful)
R> attach(faithful)
R> normalmixEM(waiting, mu=c(55,80), sigma=5)
number of iterations= 24
Gaussian EM result: $\hat\mu = (54.6, 80.1)$.

22 Univariate identifiable semiparametric mixtures. One originality of mixtools: its capability to handle various nonparametric components $f_j$ in $g_\theta(x) = \sum_{j=1}^m \lambda_j f_j(x)$, for several models; Benaglia, C., Hunter, Young (J. Stat. Soft. 2009).
Location-shift semi-parametric mixture model:
$g_\theta(x) = \lambda_1 f(x - \mu_1) + (1 - \lambda_1) f(x - \mu_2)$
This model is identifiable when $f(\cdot)$ is even: Bordes, Mottelet, Vandekerkhove (2006); Hunter, Wang, Hettmansperger (2007). Bordes, C., Vandekerkhove (2007) introduced an EM-like algorithm that includes a Kernel Density Estimation (KDE) step.

23 mixtools semi-parametric EM algorithm.
E-step: same as usual:
$Z^t_{ij} = E_{\theta^t}[Z_{ij} \mid x_i] = \dfrac{\lambda^t_j f^t(x_i - \mu^t_j)}{\lambda^t_1 f^t(x_i - \mu^t_1) + \lambda^t_2 f^t(x_i - \mu^t_2)}$
M-step: maximize the complete-data log-likelihood for $\lambda$ and $\mu$:
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n Z^t_{ij}$, $\mu^{t+1}_j = (n \lambda^{t+1}_j)^{-1} \sum_{i=1}^n Z^t_{ij}\, x_i$
Weighted KDE-step: update $f^t$ (for some bandwidth $h$) by
$f^{t+1}(u) = \dfrac{1}{nh} \sum_{i=1}^n \sum_{j=1}^2 Z^t_{ij}\, K\!\left(\dfrac{u - (x_i - \mu^{t+1}_j)}{h}\right)$, then symmetrize.
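A sketch of the weighted KDE step under the assumptions above (Gaussian kernel $K$; hypothetical names), evaluating $f^{t+1}$ on a grid; the symmetrization averages $f(u)$ and $f(-u)$, which is valid when the grid is symmetric about 0:

# WKDE-step: Z is the n x 2 posterior matrix, mu the current means
wkde_sym <- function(x, Z, mu, h, grid) {
  n <- length(x)
  f <- sapply(grid, function(u)
    sum(Z * dnorm((u - outer(x, mu, "-")) / h)) / (n * h))
  (f + rev(f)) / 2  # symmetrize: f is assumed even
}
# fhat <- wkde_sym(waiting, Z, mu, h = 4, grid = seq(-30, 30, length.out = 201))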

25 Location-shift semi-parametric mixture model: $g_\theta(x) = \lambda_1 f(x - \mu_1) + (1 - \lambda_1) f(x - \mu_2)$. [Figure: Old Faithful waiting-times histogram with both fitted densities.]
Gaussian EM: $\hat\mu = (54.6, 80.1)$.
Semiparametric EM:
R> spEMsymloc(waiting, mu=c(55,80), h=4) # bandwidth
$\hat\mu = (54.7, 79.8)$.

26 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

27 Identifiability: the blessing of dimensionality (!). Recall the model in the multivariate case, $r > 1$:
$g_\theta(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{jk}(x_k)$
N.B.: assume conditional independence of $x_1, \ldots, x_r$.
- Hall and Zhou (2003) show that when $m = 2$ and $r \ge 3$, the model is identifiable under mild restrictions on the $f_{jk}(\cdot)$.
- Hall et al. (2005): "... from at least one point of view, the curse of dimensionality works in reverse."
- Allman et al. (2008) give mild sufficient conditions for identifiability whenever $r \ge 3$.

28 The notation gets even worse... Motivation: the Water-level data with 4 blocks of 2 similar responses. Let the $r$ coordinates be grouped into $B \le r$ blocks of conditionally iid coordinates, where $b_k \in \{1, \ldots, B\}$ is the block index of the $k$th coordinate. The model becomes
$g(x) = \sum_{j=1}^m \lambda_j \prod_{k=1}^r f_{j b_k}(x_k)$
Special cases:
- $b_k = k$ for $k = 1, \ldots, r$: general model of conditional independence, as in Hall et al.
- $b_k = 1$ for all $k$: conditionally i.i.d. assumption (Elmore et al. 2004).

29 Nonparametric EM, Benaglia, C., Hunter (2009, 2011).
E-step: same as for a genuine parametric EM:
$Z^t_{ij} = \dfrac{\lambda^t_j \prod_{k=1}^r f^t_{j b_k}(x_{ik})}{\sum_{j'} \lambda^t_{j'} \prod_{k=1}^r f^t_{j' b_k}(x_{ik})}$
M-step: maximize the complete-data log-likelihood for $\lambda$:
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n Z^t_{ij}$
WKDE-step: update the estimate of $f_{jl}$ (component $j$, block $l$) by
$f^{t+1}_{jl}(u) = \dfrac{1}{n h^{t+1}_{jl} C_l \lambda^{t+1}_j} \sum_{i=1}^n \sum_{k=1}^r Z^t_{ij}\, \mathbb{I}\{b_k = l\}\, K\!\left(\dfrac{u - x_{ik}}{h^{t+1}_{jl}}\right)$
where $C_l$ = number of coordinates in block $l$, and $h^{t+1}_{jl}$ = an iterative, per-component and per-block adaptive bandwidth.
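The E-step above is a straightforward array computation; a sketch assuming the current density values are pre-evaluated in an n x r x m array (hypothetical names):

# npEM E-step: dens[i, k, j] = f^t_{j, b_k}(x_ik); lambda = current weights
npem_estep <- function(dens, lambda) {
  prod_ij <- apply(dens, c(1, 3), prod)  # product over the r coordinates
  num <- sweep(prod_ij, 2, lambda, "*")  # lambda_j * prod_k f_{j b_k}(x_ik)
  num / rowSums(num)                     # n x m matrix of posteriors Z^t_{ij}
}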

30 The Water-level data. Dataset previously analysed under the conditional i.i.d. assumption: Hettmansperger, Thomas (2000); Elmore et al. (2004). The inappropriate conditional i.i.d. assumption masks interesting features that our model reveals.
In mixtools: npEM algorithm
R> library(mixtools)
R> data(Waterdata)
R> a <- npEM(Waterdata, 3, blockid=c(1,2,3,4,2,1,4,3), h=4) # user-fixed bandwidth here
R> par(mfrow=c(2,2))
R> plot(a)
R> summary(a) # some statistics per component...

31 The Water-level data, $m = 3$ components, 4 blocks. [Figure: fitted component densities for each block; legends give each component's mixing proportion and (mean, std dev). The (mean, std dev) pairs read:
Block 1, 1:00 and 7:00 orientations: (32.1, 19.4), (3.9, 23.3), (1.4, 6.0);
Block 2, 2:00 and 8:00 orientations: (31.4, 55.4), (11.7, 27.0), (2.7, 4.6);
Block 3, 4:00 and 10:00 orientations: (43.6, 39.7), (11.4, 27.5), (1.0, 5.3);
Block 4, 5:00 and 11:00 orientations: (27.5, 19.3), (2.0, 22.1), (0.1, 6.1).]

32 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

33 From npEM to NEMS for nonparametric mixtures.
Motivation: our npEM algorithms are not true EM algorithms. Idea: combine regularization with the npEM approach to define an algorithm with a provable ascent property; Levine, Hunter, C. (Biometrika 2011).
"Nonparametric" in the smooth EM literature refers to a continuous mixing distribution:
- true EM, but ill-posed difficulties: Vardi et al. (1985)
- Smoothed EM (EMS): Silverman et al. (1990)
- Nonlinear EMS (NEMS), a regularization approach: Eggermont, LaRiccia (1995); Eggermont (1992, 1999)

35 Smoothing the mixture. Applying the Eggermont (1992, 1999) idea to the component pdfs. Nonlinear smoothing:
$Nf(x) = \exp \int_\Omega K_h(x - u) \log f(u)\, du$, where $K_h(u) = h^{-r} \prod_{k=1}^r K(h^{-1} u_k)$ is a product kernel.
For $\mathbf{f} = (f_1, \ldots, f_m)$, define
$\mathcal{M}_\lambda N\mathbf{f}(x) := \sum_{j=1}^m \lambda_j N f_j(x)$
Goal: minimize the $\infty$-sample objective function (Kullback divergence)
$l(\theta) = l(\mathbf{f}, \lambda) := \int_\Omega g(x) \log \dfrac{g(x)}{[\mathcal{M}_\lambda N\mathbf{f}](x)}\, dx$
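In one dimension, $Nf$ can be approximated pointwise by numerical integration; a sketch with a Gaussian kernel (hypothetical names; the integration range should be the support $\Omega$ of $f$):

# Nonlinear smoothing: (Nf)(x) = exp( integral of K_h(x - u) log f(u) du )
Nf <- function(f, x, h, lower = -Inf, upper = Inf) {
  sapply(x, function(xi)
    exp(integrate(function(u) dnorm((xi - u) / h) / h * log(f(u)),
                  lower, upper)$value))
}
# e.g. Nf(dnorm, seq(-3, 3, by = 0.5), h = 0.5); note Nf need not integrate to 1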

36 Majorization-Minimization (MM) trick.
Principle: let $\theta^t$ be the current value of $\theta$ in an algorithm. A function $B(\theta \mid \theta^t)$ is said to majorize $l(\theta)$ at $\theta^t$ provided
- $B(\theta \mid \theta^t) \ge l(\theta)$ for all $\theta$,
- $B(\theta^t \mid \theta^t) = l(\theta^t)$.
MM minimization algorithm: set $\theta^{t+1} = \arg\min_\theta B(\theta \mid \theta^t)$. The MM algorithm satisfies the descent property
$l(\theta^{t+1}) \le B(\theta^{t+1} \mid \theta^t) \le B(\theta^t \mid \theta^t) = l(\theta^t)$,
and we can define such a majorizing function $B(\theta \mid \theta^t)$ here!

37 MM algorithm with a descent property. Smoothed log-likelihood version for finite sample size: given the sample $x_1, \ldots, x_n$ iid $\sim g$,
$l_n(\mathbf{f}, \lambda) := -\sum_{i=1}^n \log [\mathcal{M}_\lambda N\mathbf{f}](x_i)$
The following MM algorithm satisfies the descent property
$l_n(\mathbf{f}^{t+1}, \lambda^{t+1}) \le l_n(\mathbf{f}^t, \lambda^t)$

38 nonparametric Maximum Smoothed Likelihood (npMSL) algorithm.
E-step (requires additional univariate numerical integrations):
$w^t_{ij} = \dfrac{\lambda^t_j N f^t_j(x_i)}{\mathcal{M}_{\lambda^t} N \mathbf{f}^t(x_i)} = \dfrac{\lambda^t_j \prod_{k=1}^r N f^t_{jk}(x_{ik})}{\sum_{j'=1}^m \lambda^t_{j'} \prod_{k=1}^r N f^t_{j'k}(x_{ik})}$
M-step: for $j = 1, \ldots, m$,
$\lambda^{t+1}_j = \frac{1}{n} \sum_{i=1}^n w^t_{ij}$
WKDE-step: for each component $j$ and coordinate $k$ (or blocks),
$f^{t+1}_{jk}(u) = \dfrac{1}{n h \lambda^{t+1}_j} \sum_{i=1}^n w^t_{ij}\, K\!\left(\dfrac{u - x_{ik}}{h}\right)$

39 The Water-level data again, npEM vs. npMSL (dotted). [Figure: fitted densities for Blocks 1-4 (1:00 & 7:00, 2:00 & 8:00, 4:00 & 10:00, 5:00 & 11:00 orientations), npEM solid vs. npMSL dotted.]
npMSL in mixtools:
R> b <- npMSL(Waterdata, 3, blockid=c(1,2,3,4,2,1,4,3), h=4) # bandwidth
R> plot(b)

40 Further extensions: semiparametric models. Component or block densities may differ only in location and/or scale parameters, e.g.
$f_{jl}(x) = \dfrac{1}{\sigma_{jl}} f_j\!\left(\dfrac{x - \mu_{jl}}{\sigma_{jl}}\right)$ or $f_{jl}(x) = \dfrac{1}{\sigma_{jl}} f_l\!\left(\dfrac{x - \mu_{jl}}{\sigma_{jl}}\right)$ or $f_{jl}(x) = \dfrac{1}{\sigma_{jl}} f\!\left(\dfrac{x - \mu_{jl}}{\sigma_{jl}}\right)$,
where $f_j$, $f_l$, $f$ remain fully unspecified. For all these situations, special cases of the npEM/npMSL algorithms can be designed (some are already in mixtools).

41 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

43 Bordes, C. (2010, 2012). Typical data from reliability and life testing + mixture model:
- lifetime data on $\mathbb{R}^+$: $(X_1, \ldots, X_n)$ iid $\sim g_\theta = \sum_j \lambda_j f_j$
- censoring times $(C_1, \ldots, C_n)$ iid $\sim q$ (unknown)
- random (right) censoring: $T_i = \min(X_i, C_i)$, $D_i = \mathbb{I}\{X_i \le C_i\}$
- observed (incomplete) data $(\mathbf{t}, \mathbf{d}) = ((t_i, d_i),\ i = 1, \ldots, n)$
A semiparametric example: the semiparametric accelerated lifetime mixture model
$g_\theta(x) = \lambda f(x) + (1 - \lambda)\, \xi f(\xi x)$
Scaling model: $(X \mid Z_1 = 1)$ has the distribution of $U \sim f$, and $(X \mid Z_2 = 1) \sim U/\xi$; identifiable under some restrictions.
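A sketch of this censored data-generating mechanism in R (the lognormal f and the exponential censoring law are illustrative assumptions, not the talk's choices):

# Simulate right-censored data from g = lambda f + (1 - lambda) xi f(xi .)
set.seed(2)
n <- 300; lambda <- 0.4; xi <- 0.1
z <- rbinom(n, 1, lambda)                 # component-1 indicators
u <- rlnorm(n, meanlog = 0, sdlog = 0.5)  # U ~ f
x <- ifelse(z == 1, u, u / xi)            # scaling model: X = U or U / xi
cens <- rexp(n, rate = 1 / (2 * mean(x))) # censoring law q (tune the rate)
t <- pmin(x, cens); d <- as.numeric(x <= cens)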

48 Semi-parametric Stochastic-EM for censored data (1). Denote by $F$ the cdf of $f$, $S = 1 - F$ the survival function, and $\alpha = f/S$ the hazard rate.
Building blocks:
- For a single censored sample $(\mathbf{t}, \mathbf{d})$, $S$ and $\alpha$ can be estimated nonparametrically using the Kaplan-Meier and Nelson-Aalen estimators.
- $E(X \mid Z_j = 1) = \int_0^{+\infty} S_j(s)\, ds$, where $S_j(s) = P(X > s \mid Z_j = 1)$ is the $j$th component survival function.
- In this scaling model, $\xi = \dfrac{E(X \mid Z_1 = 1)}{E(X \mid Z_2 = 1)}$.
- If $X$ comes from component 2, then $\xi X \sim f$, so that $\{X_i : Z_{i1} = 1\} \cup \{\xi X_i : Z_{i2} = 1\}$ iid $\sim f$.
- Estimates $\hat\xi$ and $\hat f = \hat\alpha \hat S$ would thus be available in the mixture setup if $\mathbf{Z}$ were observed.
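The first two building blocks are directly available in R; a sketch using the survival package (loaded later in the talk as well), where the mean lifetime is estimated by the area under the Kaplan-Meier curve:

library(survival)
# Kaplan-Meier estimate of S from a censored sample (t, d)
km <- survfit(Surv(t, d) ~ 1)
# E(X) estimated by the area under the KM step function
Ehat <- sum(diff(c(0, km$time)) * c(1, head(km$surv, -1)))
# applied separately to each subsample, xi_hat = Ehat_1 / Ehat_2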

49 Semi-parametric St-EM for censored data (2). A stochastic version is required here!
E-step: if $d_i = 0$ (censored lifetime),
$Z^t_{i1} = \dfrac{\lambda^t S^t(t_i)}{\lambda^t S^t(t_i) + (1 - \lambda^t) S^t(\xi^t t_i)}$
else ($d_i = 1$, observed lifetime),
$Z^t_{i1} = \dfrac{\lambda^t \alpha^t(t_i) S^t(t_i)}{\lambda^t \alpha^t(t_i) S^t(t_i) + (1 - \lambda^t)\, \xi^t \alpha^t(\xi^t t_i) S^t(\xi^t t_i)}$
S-step: simulate $\hat Z^t_{i1} \sim \mathcal{B}(Z^t_{i1})$, $i = 1, \ldots, n$, and set $\chi^t_j = \{i \in \{1, \ldots, n\} : \hat Z^t_{ij} = 1\}$, $j = 1, 2$.

50 Semi-parametric St-EM for censored data (3).
M-step for $\lambda, \xi$:
$\lambda^{t+1} = \dfrac{\mathrm{Card}(\chi^t_1)}{n}$, $\xi^{t+1} = \dfrac{\int_0^{\tau^t_1} \hat S^t_1(s)\, ds}{\int_0^{\tau^t_2} \hat S^t_2(s)\, ds}$, where $\tau^t_j = \max_{i \in \chi^t_j} t_i$
and $\hat S^t_j$ is the Kaplan-Meier estimate based on $\{(t_i, d_i); i \in \chi^t_j\}$.
Nonparametric step for $\alpha, S$: let $\mathbf{t}^t$ be the order statistics of $\{t_i; i \in \chi^t_1\} \cup \{\xi^t t_i; i \in \chi^t_2\}$ and $\mathbf{d}^t$ the associated censoring indicators. Then
$\alpha^{t+1}(s) = \dfrac{1}{h} \sum_{i=1}^n K\!\left(\dfrac{s - t^t_i}{h}\right) \dfrac{d^t_i}{n - i + 1}$, $S^{t+1}(s) = \prod_{i : t^t_i \le s} \left(1 - \dfrac{d^t_i}{n - i + 1}\right)$
for a kernel $K$ and bandwidth $h$.

51 Simulated example: scale mixture of Lognormals; $n = 300$, 15% censored. [Figure: St-EM iterations for the scaling $\xi$ and the weight of component 1; density estimate $f(x)$ and survival-function estimate vs. the truth, against time.]
In mixtools: semiparametric Reliability Mixture Model Stochastic EM
R> library(mixtools)
R> library(survival) # for KM estimation
# simulating data...
R> s <- spRMM_SEM(t, d, scaling=0.1)
R> plot(s)

52 Outline: Next up...
1 Mixture models and EM algorithm-ology
2 Univariate (semiparametric) mixtures; Nonparametric multivariate EM algorithms
3

55 Other new models/algorithms available (1).
- Parametric (Exponential, Weibull, lognormal, ...) reliability mixture models for censored data.
- Gaussian mixtures with linear constraints on the parameters, e.g., for $p \le m$ and known $M$ and $C$,
$\mu = (\mu_1, \ldots, \mu_m)^\top = M\beta + C$, where $\beta = (\beta_1, \ldots, \beta_p)^\top$ and $C = (C_1, \ldots, C_m)^\top$,
with similar constraints on the variances; this requires ECM and/or MM algorithms; C., Hunter (2011). A toy illustration follows this list.
- Advances in mixtures of regressions: predictor-dependent mixing proportions; Young, Hunter (CSDA 2010).
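A toy illustration of such a linear constraint (M, C, and beta are hypothetical): three component means forced onto a line, leaving only p = 2 free parameters.

# Example constraint: mu_j = beta_1 + j * beta_2, j = 1, 2, 3  (p = 2 < m = 3)
M    <- cbind(1, 1:3)     # known m x p matrix
C    <- rep(0, 3)         # known offset
beta <- c(50, 10)
mu   <- as.vector(M %*% beta + C)  # constrained means: 60, 70, 80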

59 Other new models (2): mixtures in FDR estimation.
In multiple testing (e.g., micro-arrays) we consider $n$ multiple tests with p-values $\mathbf{p} = (p_1, \ldots, p_n)$. Question: the False Discovery Rate (FDR) = the expected proportion of falsely rejected $H_0$'s.
Mixture modelling: latent variable $Z_i = 1$ if $H_0$ is true and $Z_i = 0$ if $H_0$ is rejected (= the interesting case):
$g_\theta(p) = \lambda_0 f_0(p) + (1 - \lambda_0) f_1(p)$
Theoretically $(p_i \mid H_0 \text{ true}) \sim \mathcal{U}_{[0,1]}$, so $f_0$ is known! Nonparametric assumption for $f_1$, as e.g. in Robin et al. or the fdrtool package, Strimmer (2008).
In mixtools: a semiparametric EM with one component known, for FDR estimation; C., Saby (2011, 2012).
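A sketch of an EM of this flavor, with the uniform null component held fixed and a parametric Beta(a, 1) alternative standing in for the nonparametric f_1 (a deliberate simplification of the semiparametric version; names hypothetical; assumes p-values in (0, 1)):

# EM for g(p) = lambda0 * 1 + (1 - lambda0) * f1(p), with f0 = U[0,1] known
fdr_em <- function(p, lambda0 = 0.8, a = 0.5, maxit = 100) {
  for (t in 1:maxit) {
    f1 <- dbeta(p, a, 1)                            # alternative density
    z0 <- lambda0 / (lambda0 + (1 - lambda0) * f1)  # P(H0 true | p_i)
    lambda0 <- mean(z0)
    w <- 1 - z0                                     # weights for f1's MLE
    a <- -sum(w) / sum(w * log(p))                  # weighted Beta(a, 1) MLE
  }
  list(lambda0 = lambda0, a = a, localfdr = z0)
}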

60 The end. Thank you for your attention!

61 For Further Reading (1/3)
- Benaglia, T., Chauveau, D., and Hunter, D. R. An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures. J. Comput. Graph. Statist., 18(2), 2009.
- Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. S. mixtools: An R Package for Analyzing Mixture Models. Journal of Statistical Software, 32: 1-29, 2009.
- Bordes, L., Chauveau, D., and Vandekerkhove, P. A stochastic EM algorithm for a semiparametric mixture model. Comput. Statist. & Data Anal., 51(11), 2007.

62 For Further Reading (2/3)
- Bordes, L. and Chauveau, D. EM and Stochastic EM algorithms for reliability mixture models under random censoring. Preprint HAL, 2012.
- Levine, M., Hunter, D. R., and Chauveau, D. Maximum Smoothed Likelihood for Multivariate Mixtures. Biometrika, 98(2), 2011.
- Allman, E. S., Matias, C., and Rhodes, J. A. Identifiability of parameters in latent structure models with many observed variables. Ann. Statist., 37(6A), 2009.

63 For Further Reading (3/3)
- Nielsen, S. F. The stochastic EM algorithm: Estimation and asymptotic results. Bernoulli, 6(3), 2000.
- Young, D. and Hunter, D. Mixtures of regressions with predictor-dependent mixing proportions. Computational Statistics and Data Analysis, 54, 2010.
