Maximum likelihood estimation of a log-concave density based on censored data
1 Maximum likelihood estimation of a log-concave density based on censored data Dominic Schuhmacher Institute of Mathematical Statistics and Actuarial Science University of Bern Joint work with Lutz Dümbgen and Kaspar Rufibach Swiss Statistics Seminar, 25 October 2011
2 Overview
1. A review of log-concave density estimation for exact data
2. Log-concave density estimation for censored data: problem formulation, theoretical results, an EM algorithm, simulated and real data examples
3 Part 1: Log-concave density estimation for exact data
4 Part 1: log-concave densities and exact data. Problem
Problem: Nonparametric estimation of the density f of a distribution P on $\mathbb{R}$, based on independent observations $X_1, X_2, \ldots, X_n$, using maximum likelihood.
Log-likelihood: $\ell(f) := \sum_{i=1}^n \log f(X_i)$, which is not bounded from above on the space of all densities f.
Our approach: impose a shape constraint. Unimodality is not enough.
7 Unimodality is not enough!
Distribute mass 1/2 uniformly over $[x_{(1)}, x_{(n)}]$ (blue) and concentrate mass 1/2 more and more closely around one individual data point (red): the densities stay unimodal while the likelihood grows without bound.
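This blow-up can be checked numerically. A minimal sketch in Python (hypothetical sample values): the mixture of a uniform on $[x_{(1)}, x_{(n)}]$ and a narrowing uniform spike around one data point stays unimodal, yet its log-likelihood diverges as the spike width shrinks.

```python
import math

# Hypothetical sample; any finite sample works for this argument.
x = [0.3, 1.1, 1.8, 2.4, 3.7]
spike_at = x[0]            # concentrate mass 1/2 around one data point
a, b = min(x), max(x)      # uniform part over [x_(1), x_(n)]

def mixture_loglik(eps):
    """Log-likelihood of: 1/2 uniform on [a, b] + 1/2 uniform spike of width eps."""
    ll = 0.0
    for xi in x:
        dens = 0.5 / (b - a)
        if abs(xi - spike_at) <= eps / 2:
            dens += 0.5 / eps
        ll += math.log(dens)
    return ll

# The likelihood diverges as the spike narrows:
for eps in [1.0, 0.1, 0.01]:
    print(eps, mixture_loglik(eps))
```

The printed log-likelihoods increase without bound as eps shrinks, even though every density in the sequence is unimodal.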
8 Log-concavity
A density f is called log-concave if $\varphi := \log f \colon \mathbb{R} \to [-\infty, \infty)$ is a concave function.
It turns out that the class $\mathcal{F}_{lc}$ of log-concave densities is a large, rather flexible class with many nice properties.
10 Properties of $\mathcal{F}_{lc}$
Every log-concave density is unimodal.
Ibragimov (1956): f is log-concave if and only if $f * g$ is unimodal for every unimodal g (log-concavity has been called "strong unimodality").
$\mathcal{F}_{lc}$ is closed under convolution and weak limits.
S, Hüsler and Dümbgen (2009/11), strong continuity property: weak convergence implies pointwise convergence of the densities, and convergence in an exponentially weighted total variation distance: $\int \exp(\delta |x|)\, |f_n(x) - f(x)|\, dx \to 0$ for some $\delta > 0$.
Every log-concave density has subexponential tails and a non-decreasing hazard rate.
15 Parametric families of log-concave distributions
The class of log-concave densities contains the following parametric families: normal, logistic, exponential, Laplace, uniform, Gumbel; Gamma and Weibull with shape parameter $\geq 1$; Beta with both shape parameters $\geq 1$.
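The shape-parameter conditions are easy to verify numerically via second differences of the log-density on a grid; a small sketch (grid bounds and tolerance chosen arbitrarily, densities taken up to normalizing constants):

```python
import math

def is_log_concave(logf, xs, tol=1e-9):
    """Check concavity of log f on an equispaced grid via second differences."""
    vals = [logf(x) for x in xs]
    return all(vals[i - 1] - 2 * vals[i] + vals[i + 1] <= tol
               for i in range(1, len(vals) - 1))

log_normal = lambda x: -0.5 * x * x                  # N(0,1), up to a constant
def log_gamma(shape):
    return lambda x: (shape - 1) * math.log(x) - x   # Gamma(shape, 1), up to a constant

grid_r = [-5 + 0.01 * i for i in range(1001)]        # grid on the real line
grid_rp = [0.01 * (i + 1) for i in range(1000)]      # grid on the positive half-line

print(is_log_concave(log_normal, grid_r))            # True
print(is_log_concave(log_gamma(3.0), grid_rp))       # True: shape >= 1
print(is_log_concave(log_gamma(0.5), grid_rp))       # False: shape < 1
```

For shape parameter below 1 the Gamma log-density is convex near 0, so the check fails, matching the condition on the slide.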
16 Log-concave density estimation
Nonparametric maximum likelihood estimation under the log-concavity constraint has been studied independently in Rufibach (2006) and Dümbgen and Rufibach (2009), and in Pal, Woodroofe, and Meyer (2007).
Task: for $n \geq 2$ find $\hat f_n \in \arg\max_{f \in \mathcal{F}_{lc}} \sum_{i=1}^n \log f(X_i)$, where $\mathcal{F}_{lc} := \{f \colon \mathbb{R} \to \mathbb{R}_+ \mid f \text{ log-concave density}\}$.
18 First results
Equivalent problem (by "Silverman's trick"): find $\hat\varphi_n \in \arg\max_{\varphi \in \Phi} L(\varphi)$, where
$L(\varphi) := \frac{1}{n} \sum_{i=1}^n \varphi(X_i) - \int \exp \varphi(x)\, dx$
and $\Phi$ is the set of all concave (and upper semicontinuous, say) functions $\mathbb{R} \to [-\infty, \infty)$.
Shape: $\hat\varphi_n$ is piecewise linear, changes of slope are only possible at data points $X_i$, and $\mathrm{dom}(\hat\varphi_n) = [X_{(1)}, X_{(n)}]$.
L depends on $\varphi$ only via $\varphi(X_1), \ldots, \varphi(X_n)$; it is strictly concave and coercive, defined on a closed, convex cone in $\mathbb{R}^n$. Hence existence and uniqueness of the NPMLE $\hat f_n$.
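Why the normalization constraint can be dropped in Silverman's trick can be sketched in two lines: shift $\varphi$ by a constant c and optimize over c.

```latex
L(\varphi + c)
  \;=\; \frac{1}{n}\sum_{i=1}^n \varphi(X_i) + c - e^{c}\!\int \exp\varphi(x)\,dx ,
\qquad
\frac{d}{dc}\,L(\varphi + c) \;=\; 1 - e^{c}\!\int \exp\varphi(x)\,dx .
```

The optimal shift makes $e^{c}\int \exp\varphi(x)\,dx = 1$, so any maximizer of L automatically satisfies $\int \exp\varphi^*(x)\,dx = 1$, and on that constraint set maximizing L is the same as maximizing the log-likelihood.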
21 Further properties of $\hat f_n$ (simplified)
(Dümbgen & Rufibach, 2009, and Balabdaoui, Rufibach & Wellner, 2009)
Mean($\hat f_n$) = empirical mean; Variance($\hat f_n$) $\leq$ empirical variance.
Uniform consistency: $\|\hat f_n - f\|_\infty \to 0$ and $\|\hat F_n - F\|_\infty \to 0$.
Rate-optimality and adaptivity: for any compact interval $T \subset \mathrm{int}\{f > 0\}$ and a true f that is Hölder continuous with exponent $\beta \in [1, 2]$, we have $\max_{x \in T} |\hat f_n(x) - f(x)| = O_p\bigl( (\log(n)/n)^{\beta/(2\beta+1)} \bigr)$.
Pointwise limit distributions: if $f(x_0) > 0$, $\varphi \in C^2$ around $x_0$, and $\varphi''(x_0) \neq 0$, then $n^{2/5}\bigl( \hat f_n(x_0) - f(x_0) \bigr) \to_D c(x_0, f)\, H''(0)$, where H is the so-called lower invelope of $Y(t) = \int_0^t W(s)\, ds - t^4$ and W is a two-sided Brownian motion starting in 0.
Also results for derived quantities: mode, distance between knots.
26 Computation
We have a concave maximization problem over a closed convex cone in $\mathbb{R}^n$, with n potentially large.
There is a fast active set algorithm by Dümbgen and Rufibach (2011), available in the R package logcondens.
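Because the estimator is piecewise linear, $\int \exp\varphi$ has a closed form on each linear piece, which is what makes the objective cheap to evaluate inside such algorithms. A sketch of this evaluation (Python; `exp_integral` and `objective` are hypothetical helper names, not the logcondens API):

```python
import math

def exp_integral(a, b, ya, yb):
    """Closed form for the integral of exp(l(x)) over [a, b], l linear, l(a)=ya, l(b)=yb."""
    if a == b:
        return 0.0
    s = (yb - ya) / (b - a)              # slope of the linear piece
    if abs(s) < 1e-12:                   # (near-)constant piece
        return (b - a) * math.exp(ya)
    return (math.exp(yb) - math.exp(ya)) / s

def objective(knots, phi, data):
    """L(phi) = (1/n) * sum phi(X_i) - integral of exp(phi); phi piecewise linear."""
    pieces = list(zip(zip(knots, knots[1:]), zip(phi, phi[1:])))
    def interp(x):
        for (a, b), (ya, yb) in pieces:
            if a <= x <= b:
                return ya + (yb - ya) * (x - a) / (b - a)
        return float("-inf")             # outside dom(phi)
    integral = sum(exp_integral(a, b, ya, yb) for (a, b), (ya, yb) in pieces)
    return sum(interp(x) for x in data) / len(data) - integral

# Tiny demo: constant log-density log(1/2) on [0, 2] gives L = log(1/2) - 1.
phi = [math.log(0.5)] * 2
print(objective([0.0, 2.0], phi, [0.5, 1.5]))
```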
28 Simulation example
Sample 100 points from $\Gamma(3, 1)$.
[Figure: log-concave density estimation from i.i.d. data; density estimate showing the log-concave $\hat f_n$ and the smoothed $\hat f_n^*$; dashed vertical lines indicate knots of the log-density.]
31 Related work
Multivariate case (Cule, Samworth and Stewart, 2010, JRSS B); R package LogConcDEAD (Cule, Gramacy and Samworth, 2009, J. Stat. Soft.).
General approximation theory with applications to additive regression models with log-concave error distribution (Dümbgen, Samworth and S, 2010, Ann. Stat.).
Discrete case (Balabdaoui and Rufibach, preprint 2011).
34 Part 2: Log-concave density estimation for censored data
35 Censored data
For simplicity of notation we estimate densities on $\mathbb{R}_+$; think of the distribution of the time to a certain event. Instead of exact values $X_i$ we observe intervals $\mathcal{X}_i$ containing $X_i$. The following cases are possible:
$\mathcal{X}_i = \{X_i\}$;
$\mathcal{X}_i = (L_i, R_i]$ where $0 \leq L_i < R_i < \infty$;
$\mathcal{X}_i = (L_i, \infty)$ where $L_i \geq 0$.
Write generally $L_i := \inf \mathcal{X}_i$, $R_i := \sup \mathcal{X}_i$.
We would like to estimate the underlying distribution of the $X_i$'s. (It is not clear what this means without further specification.)
39 A concrete censoring model
Assume that the i-th individual is inspected at times $T_{ij}$, $1 \leq j \leq K_i$, where $0 =: T_{i0} < T_{i1} < T_{i2} < \ldots < T_{i,K_i}$. If $K_i = \infty$, assume that $T_{i,\infty} := \sup_j T_{ij} = \infty$.
Let furthermore $Y_{ij}$ be an indicator for exact observability in the j-th inter-inspection interval
$I_{ij} := (T_{ij}, T_{i,j+1}]$ if $0 \leq j < K_i$; $I_{ij} := (T_{ij}, \infty)$ if $j = K_i$.
Then the $X_i$ translate into observations $\mathcal{X}_i$ as follows:
$\mathcal{X}_i := \{X_i\}$ if $X_i \in I_{ij}$ and $Y_{ij} = 1$; $\mathcal{X}_i := I_{ij}$ if $X_i \in I_{ij}$ and $Y_{ij} = 0$.
Note: the $K_i$, $T_{ij}$, and $Y_{ij}$ may be arbitrarily dependent random variables.
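The translation from latent times to observed sets can be sketched as follows (Python; `censor` is a hypothetical helper, and the demo encodes right-censoring with an arbitrary single inspection time of 2.0):

```python
import random

def censor(x, inspection_times, observable):
    """Map a latent event time x to the observed set.

    inspection_times: 0 = T_0 < T_1 < ... < T_K (finite K here).
    observable: flags Y_0, ..., Y_K, one per inter-inspection interval.
    Returns ('exact', x) or ('interval', (L, R)) with R possibly infinite.
    """
    T = list(inspection_times) + [float("inf")]
    for j in range(len(T) - 1):
        if T[j] < x <= T[j + 1]:              # x lies in I_j = (T_j, T_{j+1}]
            if observable[j]:
                return ("exact", x)
            return ("interval", (T[j], T[j + 1]))
    raise ValueError("x must be positive")

random.seed(1)
# Right-censoring: K_i = 1, Y_0 = 1, Y_1 = 0, single inspection at time 2.0.
obs = [censor(random.expovariate(1.0), [0.0, 2.0], [True, False]) for _ in range(5)]
print(obs)
```

Swapping the flags to `[False, False]` yields the current status model, and longer inspection lists give mixed-case interval-censoring, matching the special cases on the next slide.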
43 Special cases
Right-censoring: if the event happens prior to a single inspection time, it is observed exactly, otherwise it is censored. $K_i \equiv 1$, $Y_{i0} \equiv 1$, $Y_{i1} \equiv 0$.
Current status model: it is only known whether the event happened before or after a single inspection time. $K_i \equiv 1$, $Y_{i0} \equiv Y_{i1} \equiv 0$.
Mixed case interval-censoring: for each individual it is known in which of a number of contiguous intervals the event happened. $K_i$ finite-valued, $Y_{ij} \equiv 0$ for all j.
Rounding and binning: the $X_i$'s are known up to rounding to the closest integer (say). $K_i \equiv \infty$, $Y_{ij} \equiv 0$, $T_{ij} \equiv j + 1/2$ for all i, j.
47 Conditional log-likelihood
Assume that conditionally on all $K_i$, $T_{ij}$, and $Y_{ij}$ the random variables $X_1, \ldots, X_n$ are i.i.d. with density $f = \exp \varphi$.
The normalized log-likelihood for $\varphi$ is then
$\ell(\varphi) := \ell(\varphi; \mathcal{X}_1, \ldots, \mathcal{X}_n) := \frac{1}{n} \sum_{i=1}^n \Bigl[ 1\{L_i = R_i\}\, \varphi(X_i) + 1\{L_i < R_i\} \log \Bigl( \int_{L_i}^{R_i} \exp \varphi(x)\, dx \Bigr) \Bigr]$,
where $L_i = \inf \mathcal{X}_i$, $R_i = \sup \mathcal{X}_i$.
49 Likelihood maximization
Maximize $\ell(\varphi)$ over all $\varphi \in \Phi$ that satisfy $\int \exp \varphi(x)\, dx = 1$.
Equivalently: maximize
$L(\varphi) = \frac{1}{n} \sum_{i=1}^n \Bigl[ 1\{L_i = R_i\}\, \varphi(X_i) + 1\{L_i < R_i\} \log \Bigl( \int_{L_i}^{R_i} \exp \varphi(x)\, dx \Bigr) \Bigr] - \int_0^\infty \exp \varphi(x)\, dx$
over $\varphi \in \Phi$.
51 Shape requirements for a maximizer $\varphi^*$
Let $0 \leq \tau_1 < \tau_2 < \ldots < \tau_q$ be the endpoints of the data intervals, i.e. $\{\tau_1, \ldots, \tau_q\} = \{L_i : 1 \leq i \leq n\} \cup \{R_i : 1 \leq i \leq n\}$.
Then any $\varphi^* \in \arg\max_{\varphi \in \Phi} L(\varphi)$ satisfies:
$[\min_i R_i, \max_i L_i] \subset \mathrm{dom}(\varphi^*) \subset [\tau_1, \tau_q]$;
$\varphi^*$ is linear on every interval $(\tau_j, \tau_{j+1}) \subset \mathrm{dom}(\varphi^*) \setminus \bigcup_{i=1}^n [L_i, R_i]$.
54 Shape simplifications for a maximizer
Theorem (shape). Suppose that $\arg\max_{\varphi \in \Phi} L(\varphi) \neq \emptyset$. Then there exists a maximizer $\varphi^*$ with $\mathrm{dom}(\varphi^*) = [\tau_{j_1}, \tau_{j_2}] \subset \mathbb{R}_+$ for some $j_1, j_2$ that is piecewise linear on $\mathrm{dom}(\varphi^*)$, with at most one change of slope in any interval $(\tau_j, \tau_{j+1})$.
There is no change of slope in:
the two extreme intervals $(\tau_{j_1}, \tau_{j_1+1})$ and $(\tau_{j_2-1}, \tau_{j_2})$;
the interval $(\tau_{j_1+1}, \tau_{j_1+2})$ unless there is an i such that $L_i = R_i = \tau_{j_1}$, and the interval $(\tau_{j_2-2}, \tau_{j_2-1})$ unless there is an i such that $L_i = R_i = \tau_{j_2}$;
any interval $(\tau_j, \tau_{j+1}) \subset \mathrm{dom}(\varphi^*) \setminus \bigcup_{i=1}^n [L_i, R_i]$.
57 Existence of a maximizer
Theorem (existence). Assume that there is no $i_0 \in \{1, \ldots, n\}$ with $L_{i_0} = R_{i_0} = X_{i_0}$ and $\bigcap_{i=1}^n [L_i, R_i] = \{X_{i_0}\}$. Then $\arg\max_{\varphi \in \mathcal{G}_{conc}} L(\varphi) \neq \emptyset$, where $\mathcal{G}_{conc} \subset \Phi$ consists of all piecewise linear functions with domain and changes of slope according to the shape theorem. We only allow slope changes on a fine grid $t_1 < t_2 < \ldots < t_m$ containing the finite $\tau_j$.
58 Counter-example
Suppose that there is an $i_0 \in \{1, \ldots, n\}$ with $L_{i_0} = R_{i_0} = X_{i_0}$ and $\bigcap_{i=1}^n [L_i, R_i] = \{X_{i_0}\}$.
Assuming that there are intervals to the left and to the right of $X_{i_0} =: \tau_{j_0}$ (otherwise the idea is easily adapted), we can obtain an arbitrarily large log-likelihood by defining $\varphi$ as a triangle function with domain $[\tau_{j_0-1}, \tau_{j_0+1}]$ and peak in $\tau_{j_0}$: the integral of $\exp \varphi$ over $[\tau_{j_0-1}, \tau_{j_0}]$ and $[\tau_{j_0}, \tau_{j_0+1}]$ can be kept constant, while $\varphi(\tau_{j_0})$ and hence $L(\varphi)$ go towards $\infty$.
61 Proof idea of the existence theorem
Consider this as a problem of maximizing $L(\varphi)$ over a convex cone in $[-\infty, \infty)^m \times [-\infty, 0)$; $\varphi = (\varphi_1, \ldots, \varphi_m, \varphi_{m+1})$, where $\varphi_i := \varphi(t_i)$ and $\varphi_{m+1} := \varphi'(t_m+)$.
1. The log-likelihood is continuous on $\mathcal{G}_{conc} \cap \bigl( [-\infty, k]^m \times [-\infty, -1/k] \bigr)$ for every $k \in \mathbb{N}$, which is compact in the extended Euclidean topology; hence the maximum over this set exists.
2. Show that $L(\varphi^{(k)}) \to -\infty$ for every sequence $(\varphi^{(k)})_k \subset \mathcal{G}_{conc}$ with $\max_{1 \leq i \leq m+1} |\varphi_i^{(k)}| \to \infty$, by somewhat tedious calculations, where in the maximum $\varphi_{m+1}^{(k)}$ is replaced by $\log(-\varphi_{m+1}^{(k)})$.
Assuming now that there is no maximizer in $\mathcal{G}_{conc} \cap \bigl( [-\infty, \infty)^m \times [-\infty, 0) \bigr)$ leads to a contradiction.
65 Open problems
Uniqueness; consistency and rates; everything else...
68 Computation
Still a maximization over a cone in typically very high dimension (now m). But our log-likelihood function is not concave anymore! We try an EM algorithm.
Remember:
$\ell(\varphi) := \frac{1}{n} \sum_{i=1}^n \Bigl[ 1\{L_i = R_i\}\, \varphi(X_i) + 1\{L_i < R_i\} \log \Bigl( \int_{L_i}^{R_i} \exp \varphi(x)\, dx \Bigr) \Bigr]$,
the incomplete data log-likelihood (normalized);
$\ell^*(\varphi) := \frac{1}{n} \sum_{i=1}^n \varphi(X_i)$,
the complete data log-likelihood (normalized).
72 EM algorithm
E step: given $\varphi_r \in \mathcal{G}_{conc}$ (log-densities in $\mathcal{G}_{conc}$), compute
$\ell^*(\varphi; \varphi_r) := E_{\varphi_r}\bigl( \ell^*(\varphi) \mid X_i \in \mathcal{X}_i \text{ for all } i \bigr) = \frac{1}{n} \sum_{i=1}^n E_{\varphi_r}\bigl( \varphi(X_i) \mid X_i \in \mathcal{X}_i \bigr) = \frac{1}{n} \sum_{i=1}^n \Bigl[ 1\{L_i = R_i\}\, \varphi(X_i) + 1\{L_i < R_i\}\, \frac{E_{\varphi_r}\bigl( \varphi(X_i)\, 1\{X_i \in \mathcal{X}_i\} \bigr)}{P_{\varphi_r}\bigl( X_i \in \mathcal{X}_i \bigr)} \Bigr]$.
This is easy, since $\varphi_r$, $\varphi$ are piecewise linear.
M step: maximize $\ell^*(\varphi; \varphi_r)$ over all $\varphi \in \mathcal{G}_{conc}$ with $\mathrm{dom}(\varphi) \subset \mathrm{dom}(\varphi_r)$. The domain will not get smaller. Call the maximizer $\varphi_{r+1}$.
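The structure of one EM sweep can be illustrated on a deliberately simplified model: a histogram on a fixed grid instead of a log-concave log-density, so the M step is trivial. This is Turnbull-style self-consistency with hypothetical toy data, not the actual algorithm, but the E step's redistribution of each censored observation's mass is the same idea.

```python
# Simplified illustration of the E step: redistribute each censored
# observation's mass across grid cells in proportion to the current density.
cells = [(0, 1), (1, 2), (2, 3)]                 # hypothetical grid
obs = [("exact", 0.5), ("interval", (0, 2)), ("interval", (1, 3))]

p = [1 / 3, 1 / 3, 1 / 3]                        # current cell probabilities
for _ in range(200):                             # EM iterations
    counts = [0.0] * len(cells)
    for kind, v in obs:
        if kind == "exact":
            for k, (a, b) in enumerate(cells):
                if a <= v < b:
                    counts[k] += 1.0
        else:
            L, R = v
            w = [p[k] if L <= a and b <= R else 0.0
                 for k, (a, b) in enumerate(cells)]
            tot = sum(w)
            for k in range(len(cells)):
                counts[k] += w[k] / tot          # E step: conditional mass
    p = [c / len(obs) for c in counts]           # M step: MLE of the histogram
print(p)
```

For this toy data the iteration converges to cell probabilities (1/2, 1/2, 0); in the actual algorithm the M step instead fits a log-concave log-density to the redistributed mass.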
74 Reduction to the active set algorithm
$E_{\varphi_r}\bigl( \varphi(X_i)\, 1\{X_i \in [a, b]\} \bigr)$ is a linear combination of $\varphi(a)$ and $\varphi(b)$ if $\varphi_r$, $\varphi$ are linear on $[a, b]$, $0 \leq a \leq b < \infty$.
$E_{\varphi_r}\bigl( \varphi(X_i)\, 1\{X_i \in [a, \infty)\} \bigr)$ is a linear combination of $\varphi(a)$ and $\varphi'(a+)$ if $\varphi_r$, $\varphi$ are linear on $[a, \infty)$, $0 \leq a < \infty$.
Therefore
$\ell^*(\varphi; \varphi_r) = \sum_{j=1}^m w_j\, \varphi(t_j) + w_{m+1}\, \varphi'(t_m+)$
with certain (computable) weights $w_1, \ldots, w_m > 0$; $w_{m+1} > 0$ if the right endpoint $\infty$ appears in the data intervals, $= 0$ otherwise.
This is a weighted version of our exact data log-likelihood. We may use the original active set algorithm for m data points (with an extension for the right-hand slope)!
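The first linear-combination claim can be checked directly: writing $\varphi$ on $[a, b]$ through its endpoint values,

```latex
\varphi(x) \;=\; \frac{b-x}{b-a}\,\varphi(a) + \frac{x-a}{b-a}\,\varphi(b),
\qquad x \in [a,b],
```

so that

```latex
E_{\varphi_r}\bigl(\varphi(X)\,1\{X \in [a,b]\}\bigr)
  \;=\; \varphi(a)\int_a^b \frac{b-x}{b-a}\,e^{\varphi_r(x)}\,dx
  \;+\; \varphi(b)\int_a^b \frac{x-a}{b-a}\,e^{\varphi_r(x)}\,dx ,
```

and both integrals have closed forms since $\varphi_r$ is linear on $[a, b]$; such coefficients are what accumulate into the weights $w_1, \ldots, w_m$.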
78 Domain reduction
The EM algorithm starts on the maximal domain $[\min_i L_i, \max_i R_i]$. Problem: the domain is never reduced, but we may have started on a too large domain. The algorithm then forces $\varphi_r$ towards $-\infty$ on the extra domain (leading to convergence / numerical problems).
Solution: if $\int_{\tau_j}^{\tau_{j+1}} \exp \varphi(x)\, dx$ gets too small for some j (only possible at the very left or very right of the current domain), the corresponding interval is removed from the domain and the EM algorithm is restarted on the reduced domain.
80 Three examples
(simulated) right-censored $\Gamma(3, 1)$ data;
(simulated) mixed-case interval-censored Gumbel(2, 1) data;
(real data) ovarian cancer cases.
83 Example 1: Right-censoring
Consider the same 100 data points from $\Gamma(3, 1)$ as before, but now introduce censoring after individual $\Gamma(2, 1/2)$-distributed inspection times. In the resulting data, 37 of the 100 observations are censored.
84 Example 1: Log-density estimates
[Figure: NPMLE for censored data, true log-density, NPMLE for exact data.]
85 Example 1: Density estimates
[Figure: NPMLE for censored data, true density, NPMLE for exact data.]
86 Example 2: Interval-censoring
100 data points from a Gumbel(2, 1) distribution. Each individual is inspected according to a Poisson(1) process, i.e. each inspection takes place an Exp(1)-distributed time after the last (all times independent).
87 Example 2: Survival function estimates
[Figure: survival function of the log-concave NPMLE, true survival function, and Turnbull estimator of the survival function.]
88 Example 3: Real data
We consider the dataset ovarian from the R package survival: survival times in days of 26 women with ovarian cancer (14 censored).
89 Example 3: Survival function estimates
[Figure: log-concave NPMLE and Kaplan-Meier estimator.]
90 Cure probability
What if we allow for the possibility that patients are cured, i.e. with a certain probability $p_0$ a woman will not die from cancer? We model this by allowing subprobability densities of total mass $1 - p_0$ and adding a point mass of $p_0$ at time $\infty$.
New log-likelihood:
$\ell(\varphi, p_0) := \frac{1}{n} \sum_{i=1}^n \Bigl[ 1\{L_i = R_i\}\, \varphi(X_i) + 1\{L_i < R_i\} \log \Bigl( \int_{L_i}^{R_i} \exp \varphi(x)\, dx + p_0\, 1\{R_i = \infty\} \Bigr) \Bigr]$.
For known $p_0$, nothing essential changes in our algorithm!
94 Profile likelihood maximization in $p_0$
Maximize $\ell_{\mathrm{profile}}(p_0) := \max_{\varphi \in \mathcal{G}_{conc}(p_0)} \ell(\varphi, p_0)$ over $p_0$.
[Figure: profile log-likelihood as a function of $p_0$; in our example $\hat p_0 \approx$ ...]
Then compute $\varphi^* \in \arg\max_{\varphi \in \mathcal{G}_{conc}(\hat p_0)} \ell(\varphi, \hat p_0)$.
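The profile step can be illustrated with a toy stand-in: replace the log-concave part by an Exp(rate) density, so the inner maximization becomes a one-dimensional grid search, and profile out $p_0$. All data values and grids below are hypothetical; this is only the structure of the computation, not the actual estimator.

```python
import math

# Toy profile likelihood in the cure probability p0: the event-time part is
# an Exp(rate) density, a parametric stand-in for the log-concave class.
exact = [0.8, 1.2, 0.5, 2.0]           # exactly observed event times
censored_at = [3.0, 4.0, 5.0]          # right-censored: only X_i > L_i is known

RATES = [0.05 * k for k in range(1, 80)]

def loglik(rate, p0):
    ll = sum(math.log((1 - p0) * rate) - rate * x for x in exact)
    # P(X > L) = (1 - p0) * exp(-rate * L) + p0   (point mass p0 at infinity)
    ll += sum(math.log((1 - p0) * math.exp(-rate * L) + p0) for L in censored_at)
    return ll / (len(exact) + len(censored_at))

def profile(p0):
    return max(loglik(r, p0) for r in RATES)   # inner maximization over the density

p0_grid = [0.01 * k for k in range(100)]
p0_hat = max(p0_grid, key=profile)
print(p0_hat, profile(p0_hat))
```

In the actual method the inner maximization over RATES is replaced by the EM/active-set fit of $\varphi$ for fixed $p_0$.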
96 Example 3: Survival function estimates
[Figure: log-concave NPMLE and Kaplan-Meier estimator, revisited with the cure probability included.]
97 R package
The algorithm is implemented in the R package logconcens. Give it a try!
More informationMultistate Modeling and Applications
Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)
More informationSurvival Analysis for Interval Censored Data - Nonparametric Estimation
Survival Analysis for Interval Censored Data - Nonparametric Estimation Seminar of Statistics, ETHZ Group 8, 02. May 2011 Martina Albers, Nanina Anderegg, Urs Müller Overview Examples Nonparametric MLE
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationJournal of Statistical Software
JSS Journal of Statistical Software March 2011, Volume 39, Issue 6. http://www.jstatsoft.org/ logcondens: Computations Related to Univariate Log-Concave Density Estimation Lutz Dümbgen University of Bern
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and
More informationNonparametric estimation under shape constraints, part 2
Nonparametric estimation under shape constraints, part 2 Piet Groeneboom, Delft University August 7, 2013 What to expect? Theory and open problems for interval censoring, case 2. Same for the bivariate
More informationSTAT 6350 Analysis of Lifetime Data. Probability Plotting
STAT 6350 Analysis of Lifetime Data Probability Plotting Purpose of Probability Plots Probability plots are an important tool for analyzing data and have been particular popular in the analysis of life
More informationChernoff s distribution is log-concave. But why? (And why does it matter?)
Chernoff s distribution is log-concave But why? (And why does it matter?) Jon A. Wellner University of Washington, Seattle University of Michigan April 1, 2011 Woodroofe Seminar Based on joint work with:
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationSurvival Analysis. Lu Tian and Richard Olshen Stanford University
1 Survival Analysis Lu Tian and Richard Olshen Stanford University 2 Survival Time/ Failure Time/Event Time We will introduce various statistical methods for analyzing survival outcomes What is the survival
More informationProblem set 4, Real Analysis I, Spring, 2015.
Problem set 4, Real Analysis I, Spring, 215. (18) Let f be a measurable finite-valued function on [, 1], and suppose f(x) f(y) is integrable on [, 1] [, 1]. Show that f is integrable on [, 1]. [Hint: Show
More informationLecture 3. Truncation, length-bias and prevalence sampling
Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in
More informationLearning and Testing Structured Distributions
Learning and Testing Structured Distributions Ilias Diakonikolas University of Edinburgh Simons Institute March 2015 This Talk Algorithmic Framework for Distribution Estimation: Leads to fast & robust
More informationSurvival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University
Survival Analysis: Weeks 2-3 Lu Tian and Richard Olshen Stanford University 2 Kaplan-Meier(KM) Estimator Nonparametric estimation of the survival function S(t) = pr(t > t) The nonparametric estimation
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationA tailor made nonparametric density estimate
A tailor made nonparametric density estimate Daniel Carando 1, Ricardo Fraiman 2 and Pablo Groisman 1 1 Universidad de Buenos Aires 2 Universidad de San Andrés School and Workshop on Probability Theory
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview
Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationSurvival Analysis. Stat 526. April 13, 2018
Survival Analysis Stat 526 April 13, 2018 1 Functions of Survival Time Let T be the survival time for a subject Then P [T < 0] = 0 and T is a continuous random variable The Survival function is defined
More informationExercises. (a) Prove that m(t) =
Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for
More informationLog-concave distributions: definitions, properties, and consequences
Log-concave distributions: definitions, properties, and consequences Jon A. Wellner University of Washington, Seattle; visiting Heidelberg Seminaire, Institut de Mathématiques de Toulouse 28 February 202
More informationApproximate Self Consistency for Middle-Censored Data
Approximate Self Consistency for Middle-Censored Data by S. Rao Jammalamadaka Department of Statistics & Applied Probability, University of California, Santa Barbara, CA 93106, USA. and Srikanth K. Iyer
More informationST495: Survival Analysis: Maximum likelihood
ST495: Survival Analysis: Maximum likelihood Eric B. Laber Department of Statistics, North Carolina State University February 11, 2014 Everything is deception: seeking the minimum of illusion, keeping
More informationNotes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed
18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationChapter 3. Chord Length Estimation. 3.1 Introduction
Chapter 3 Chord Length Estimation 3.1 Introduction Consider a random closed set W R 2 which we observe through a bounded window B. Important characteristics of the probability distribution of a random
More informationMaster s Written Examination - Solution
Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2
More informationThe high order moments method in endpoint estimation: an overview
1/ 33 The high order moments method in endpoint estimation: an overview Gilles STUPFLER (Aix Marseille Université) Joint work with Stéphane GIRARD (INRIA Rhône-Alpes) and Armelle GUILLOU (Université de
More informationEmpirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm
Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm Mai Zhou 1 University of Kentucky, Lexington, KY 40506 USA Summary. Empirical likelihood ratio method (Thomas and Grunkmier
More informationOverview of Extreme Value Theory. Dr. Sawsan Hilal space
Overview of Extreme Value Theory Dr. Sawsan Hilal space Maths Department - University of Bahrain space November 2010 Outline Part-1: Univariate Extremes Motivation Threshold Exceedances Part-2: Bivariate
More informationCox s proportional hazards model and Cox s partial likelihood
Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.
More informationLikelihood Based Inference for Monotone Response Models
Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 5, 25 Abstract The behavior of maximum likelihood estimates (MLE s) the likelihood ratio statistic
More informationProbability and Statistics
Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be Chapter 3: Parametric families of univariate distributions CHAPTER 3: PARAMETRIC
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationA Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints
Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note
More informationA SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa
A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR Agnieszka Rossa Dept. of Stat. Methods, University of Lódź, Poland Rewolucji 1905, 41, Lódź e-mail: agrossa@krysia.uni.lodz.pl and Ryszard Zieliński Inst.
More information1 Glivenko-Cantelli type theorems
STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then
More informationCTDL-Positive Stable Frailty Model
CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland
More informationA COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky
A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),
More informationGaussian discriminant analysis Naive Bayes
DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate
More information(Y; I[X Y ]), where I[A] is the indicator function of the set A. Examples of the current status data are mentioned in Ayer et al. (1955), Keiding (199
CONSISTENCY OF THE GMLE WITH MIXED CASE INTERVAL-CENSORED DATA By Anton Schick and Qiqing Yu Binghamton University April 1997. Revised December 1997, Revised July 1998 Abstract. In this paper we consider
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationEfficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models
NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationAnalysing geoadditive regression data: a mixed model approach
Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression
More informationLegendre-Fenchel transforms in a nutshell
1 2 3 Legendre-Fenchel transforms in a nutshell Hugo Touchette School of Mathematical Sciences, Queen Mary, University of London, London E1 4NS, UK Started: July 11, 2005; last compiled: August 14, 2007
More informationStatistical Estimation: Data & Non-data Information
Statistical Estimation: Data & Non-data Information Roger J-B Wets University of California, Davis & M.Casey @ Raytheon G.Pflug @ U. Vienna, X. Dong @ EpiRisk, G-M You @ EpiRisk. a little background Decision
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationClass 2 & 3 Overfitting & Regularization
Class 2 & 3 Overfitting & Regularization Carlo Ciliberto Department of Computer Science, UCL October 18, 2017 Last Class The goal of Statistical Learning Theory is to find a good estimator f n : X Y, approximating
More informationNew mixture models and algorithms in the mixtools package
New mixture models and algorithms in the mixtools package Didier Chauveau MAPMO - UMR 6628 - Université d Orléans 1ères Rencontres BoRdeaux Juillet 2012 Outline 1 Mixture models and EM algorithm-ology
More information4 Testing Hypotheses. 4.1 Tests in the regression setting. 4.2 Non-parametric testing of survival between groups
4 Testing Hypotheses The next lectures will look at tests, some in an actuarial setting, and in the last subsection we will also consider tests applied to graduation 4 Tests in the regression setting )
More informationBayesian estimation of the discrepancy with misspecified parametric models
Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationSurvival Distributions, Hazard Functions, Cumulative Hazards
BIO 244: Unit 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationNonparametric Model Construction
Nonparametric Model Construction Chapters 4 and 12 Stat 477 - Loss Models Chapters 4 and 12 (Stat 477) Nonparametric Model Construction Brian Hartman - BYU 1 / 28 Types of data Types of data For non-life
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationδ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by
. Sanov s Theorem Here we consider a sequence of i.i.d. random variables with values in some complete separable metric space X with a common distribution α. Then the sample distribution β n = n maps X
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationSOLUTION FOR HOMEWORK 8, STAT 4372
SOLUTION FOR HOMEWORK 8, STAT 4372 Welcome to your 8th homework. Here you have an opportunity to solve classical estimation problems which are the must to solve on the exam due to their simplicity. 1.
More informationGaussian Mixture Models
Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro
More informationLikelihood Based Inference for Monotone Response Models
Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 11, 2006 Abstract The behavior of maximum likelihood estimates (MLEs) and the likelihood ratio
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationShape constrained kernel density estimation
Shape constrained kernel density estimation Melanie Birke Ruhr-Universität Bochum Fakultät für Mathematik 4478 Bochum, Germany e-mail: melanie.birke@ruhr-uni-bochum.de March 27, 28 Abstract In this paper,
More informationParameter Estimation
Parameter Estimation Chapters 13-15 Stat 477 - Loss Models Chapters 13-15 (Stat 477) Parameter Estimation Brian Hartman - BYU 1 / 23 Methods for parameter estimation Methods for parameter estimation Methods
More informationMaximum Likelihood Estimation under Shape Constraints
Maximum Likelihood Estimation under Shape Constraints Hanna K. Jankowski June 2 3, 29 Contents 1 Introduction 2 2 The MLE of a decreasing density 3 2.1 Finding the Estimator..............................
More informationIntroduction: exponential family, conjugacy, and sufficiency (9/2/13)
STA56: Probabilistic machine learning Introduction: exponential family, conjugacy, and sufficiency 9/2/3 Lecturer: Barbara Engelhardt Scribes: Melissa Dalis, Abhinandan Nath, Abhishek Dubey, Xin Zhou Review
More informationA Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks
A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing
More informationMaximum likelihood: counterexamples, examples, and open problems. Jon A. Wellner. University of Washington. Maximum likelihood: p.
Maximum likelihood: counterexamples, examples, and open problems Jon A. Wellner University of Washington Maximum likelihood: p. 1/7 Talk at University of Idaho, Department of Mathematics, September 15,
More informationSTAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring
STAT 6385 Survey of Nonparametric Statistics Order Statistics, EDF and Censoring Quantile Function A quantile (or a percentile) of a distribution is that value of X such that a specific percentage of the
More informationLegendre-Fenchel transforms in a nutshell
1 2 3 Legendre-Fenchel transforms in a nutshell Hugo Touchette School of Mathematical Sciences, Queen Mary, University of London, London E1 4NS, UK Started: July 11, 2005; last compiled: October 16, 2014
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner
More informationKey Words: survival analysis; bathtub hazard; accelerated failure time (AFT) regression; power-law distribution.
POWER-LAW ADJUSTED SURVIVAL MODELS William J. Reed Department of Mathematics & Statistics University of Victoria PO Box 3060 STN CSC Victoria, B.C. Canada V8W 3R4 reed@math.uvic.ca Key Words: survival
More information2. Variance and Higher Moments
1 of 16 7/16/2009 5:45 AM Virtual Laboratories > 4. Expected Value > 1 2 3 4 5 6 2. Variance and Higher Moments Recall that by taking the expected value of various transformations of a random variable,
More informationPDEs in Image Processing, Tutorials
PDEs in Image Processing, Tutorials Markus Grasmair Vienna, Winter Term 2010 2011 Direct Methods Let X be a topological space and R: X R {+ } some functional. following definitions: The mapping R is lower
More informationProduct-limit estimators of the survival function with left or right censored data
Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut
More informationStat 451 Lecture Notes Simulating Random Variables
Stat 451 Lecture Notes 05 12 Simulating Random Variables Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 22 in Lange, and Chapter 2 in Robert & Casella 2 Updated:
More information