Active Set Methods for Log-Concave Densities and Nonparametric Tail Inflation

1 Active Set Methods for Log-Concave Densities and Nonparametric Tail Inflation

Lutz Dümbgen (Bern)
Alexandre Mösching and Christof Strähl (Bern)
Peter McCullagh and Nicholas G. Polson (Chicago)

January 2018

2-3 The general setting

Data. Summarized as a distribution
    P̂ = Σ_{i=1}^n w_i δ_{x_i}
with weights w_1, w_2, ..., w_n > 0 and support points x_1 < x_2 < ... < x_n in an open interval X ⊆ R.

Model. P̂ estimates a distribution P(dx) = e^{θ(x)} M(dx) with a given measure M on X and an unknown function θ in a given family Θ_1.

4-5 Goal. Estimate θ ∈ Θ_1 via the MLE
    θ̂ ∈ arg max_{θ ∈ Θ_1} ∫ θ dP̂.

Modification. Suppose that for a larger function family Θ,
    Θ_1 = { θ ∈ Θ : ∫ e^θ dM = 1 }  and  θ + c ∈ Θ for all θ ∈ Θ, c ∈ R.
Then
    θ̂ ∈ arg max_{θ ∈ Θ} ( ∫ θ dP̂ − ∫ e^θ dM ).
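The modification works because the penalty term makes normalization automatic: maximizing over additive shifts c of any fixed θ forces ∫ e^{θ+c} dM = 1. A minimal numeric sketch of this (pure Python; the grid, the M-masses, and the log-density are invented for illustration):

```python
import math

# Illustrative setup: a grid x_j with M-masses m_j approximating Lebesgue
# measure, and an arbitrary (unnormalized) concave log-density theta.
xs = [0.5 * j for j in range(-8, 9)]
m = [0.5] * len(xs)
theta = [-0.5 * x * x for x in xs]

def penalized(c):
    # c-dependent part of L(theta + c) = ∫(theta + c) dP_hat - ∫ e^{theta+c} dM:
    # the data term contributes c (P_hat has total mass 1), the penalty the rest.
    return c - sum(mj * math.exp(t + c) for mj, t in zip(m, theta))

# Stationarity in c gives e^c ∫ e^theta dM = 1, i.e. c* = -log ∫ e^theta dM.
c_star = -math.log(sum(mj * math.exp(t) for mj, t in zip(m, theta)))
mass = sum(mj * math.exp(t + c_star) for mj, t in zip(m, theta))
print(round(mass, 9))                                  # the shifted density integrates to 1
print(penalized(c_star) >= penalized(c_star + 0.1))    # c* is indeed optimal
```

So the normalization constraint can be dropped from the optimization and reappears automatically at the maximizer.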

6 Setting 1: Log-concave densities. Assume P has a log-concave density on X, i.e.
    M = Lebesgue measure on X,
    Θ = { θ : X → [−∞, ∞) concave and u.s.c. }.
[Figure: density f(x) and log-density log(f(x))]

7 Setting 2A: Tail inflation. Assume P has a log-convex density w.r.t. a given distribution P_0 on X, i.e.
    M = P_0,
    Θ = { θ : X → R convex }.
[Figure: densities p_0(x), p(x) and log-ratio θ(x)]

8-9 Example. Observe X_i = μ_i + σ_i ε_i, 1 ≤ i ≤ n, with unknown parameters μ_i ∈ R, σ_i ≥ 1 and independent r.v.s ε_i ~ P_0 := N(0, 1).

Marginal dist. P := n^{−1} Σ_{i=1}^n L(X_i):
    θ(x) := log (dP/dP_0)(x) = log( n^{−1} Σ_{i=1}^n e^{θ_i(x)} ),
    θ_i(x) := −log(σ_i) − (x − μ_i)²/(2σ_i²) + x²/2.
Each θ_i is convex, hence θ is convex (Artin's theorem).
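This convexity claim can be probed numerically. In the sketch below (pure Python), the formula for θ_i is the reconstructed log density ratio of N(μ_i, σ_i²) to N(0, 1), the parameter values are invented, and θ is evaluated with a stable log-sum-exp before checking second differences on a grid:

```python
import math

def log_mean_exp(vals):
    # numerically stable log of the mean of exponentials
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals) / len(vals))

def theta(x, mus, sigmas):
    # theta_i(x) = -log(sigma_i) - (x - mu_i)^2 / (2 sigma_i^2) + x^2 / 2
    comps = [-math.log(s) - (x - mu) ** 2 / (2 * s * s) + x * x / 2
             for mu, s in zip(mus, sigmas)]
    return log_mean_exp(comps)

mus, sigmas = [-1.0, 0.0, 2.0], [1.0, 1.5, 3.0]   # illustrative, with sigma_i >= 1
xs = [0.05 * k for k in range(-200, 201)]
vals = [theta(x, mus, sigmas) for x in xs]
# convexity <=> nonnegative second differences (up to rounding)
second_diffs = [vals[k - 1] - 2 * vals[k] + vals[k + 1] for k in range(1, len(vals) - 1)]
print(min(second_diffs) >= -1e-9)   # True
```

Note that σ_i ≥ 1 is essential: it makes every θ_i convex, and log-sum-exp of convex functions stays convex.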

10 Setting 2B: Tail inflation (McCullagh and Polson 2012). Assume that P has a log-convex and isotonic density w.r.t. a given distribution P_0 on X, i.e.
    M = P_0,
    Θ = { θ : X → R convex and isotonic }.
[Figure: densities p_0(x), p(x) and log-ratio θ(x)]

11-13 Example. Observe X_i = S_i ε_i, 1 ≤ i ≤ n, with independent r.v.s ε_i ~ N(0, 1) and S_i ≥ 1.

Then Y_i := X_i² satisfies
    Y_i ~ P_0 = χ²_1 = Gamma(1/2, 2)   if S_i ≡ 1,
    Y_i | S_i ~ Gamma(1/2, 2S_i²)      in general.

Marginal dist. P := n^{−1} Σ_{i=1}^n L(Y_i):
    θ := log dP/dP_0 is convex and isotonic.
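A quick Monte Carlo check of the two displayed distributional claims (pure Python; the sample size and the law of the S_i are invented; here S_i ~ Unif(1, 2)): with S_i ≡ 1 the squares have the χ²_1 mean E[Y] = 1, while scales S_i ≥ 1 inflate the distribution.

```python
import random, statistics

random.seed(0)
n = 200_000
# S_i == 1: Y = eps^2 ~ chi^2_1 = Gamma(1/2, 2), so E[Y] = 1
y0 = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
# S_i >= 1 (invented choice S_i ~ Unif(1, 2)): Y | S ~ Gamma(1/2, 2 S^2)
y1 = [(random.uniform(1.0, 2.0) * random.gauss(0.0, 1.0)) ** 2 for _ in range(n)]

print(abs(statistics.mean(y0) - 1.0) < 0.02)     # chi^2_1 mean is 1
print(statistics.mean(y1) > statistics.mean(y0)) # scale mixing inflates Y
```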

14-15 Existence and uniqueness of θ̂

Lemma 1 (Log-concavity). In Setting 1, there exists a unique
    θ̂ ∈ arg max_{θ ∈ Θ} ( ∫ θ dP̂ − ∫_X e^{θ(x)} dx ).

Additional properties:
    θ̂ is piecewise linear on [x_1, x_n],
    changes of slope occur only in {x_2, ..., x_{n−1}},
    θ̂ ≡ −∞ on R \ [x_1, x_n].

16-17 Lemma 2. In Settings 2A and 2B, there exists a unique
    θ̂ ∈ arg max_{θ ∈ Θ} ( ∫ θ dP̂ − ∫ e^θ dP_0 ),
provided that supp(P_0) = X.

Additional properties:
    θ̂ is piecewise linear on X,
    changes of slope occur only in ∪_{i=1}^{n−1} (x_i, x_{i+1}),
    at most one change of slope in (x_i, x_{i+1}), 1 ≤ i < n.

18 Active set algorithm

In Settings 1 and 2A, θ̂ ∈ V ∩ Θ with
    V  := { linear splines on X° with kinks in D },
    X° := [x_1, x_n] in Setting 1,  X in Setting 2A,
    D  := {x_2, ..., x_{n−1}} in Setting 1,  X in Setting 2A.

19-20 For such a spline v ∈ V set
    D(v) := { τ ∈ D : v′(τ−) ≠ v′(τ+) }
(deactivated (equality) constraints).

For a finite set D_o ⊆ D define
    V_{D_o} := { v ∈ V : D(v) ⊆ D_o },
a linear space with dim(V_{D_o}) = 2 + #D_o.

21-22 Target functional
    L(θ) := ∫ θ dP̂ − ∫ e^θ dM
with directional derivatives
    DL(θ, v) := lim_{t→0+} ( L(θ + tv) − L(θ) ) / t.

Characterization of θ̂. A point θ ∈ V ∩ Θ equals θ̂ if, and only if,
    DL(θ, v) ≤ 0 whenever θ + tv ∈ Θ for some t > 0.
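With discrete P̂ and a discretized M, both L and DL are finite sums, and the directional derivative has the closed form DL(θ, v) = ∫ v dP̂ − ∫ v e^θ dM. A small sketch under invented data (weights, grid, and functions all illustrative), checking the closed form against a finite difference:

```python
import math

# Illustrative discrete setup: data weights w_i at x_i; measure M on a grid.
data = [(-1.0, 0.3), (0.2, 0.4), (1.5, 0.3)]        # (x_i, w_i)
grid = [(0.1 * j, 0.1) for j in range(-40, 41)]     # (x_j, M-mass)

def L(theta):
    # L(theta) = ∫ theta dP_hat - ∫ e^theta dM
    return (sum(w * theta(x) for x, w in data)
            - sum(m * math.exp(theta(x)) for x, m in grid))

def DL(theta, v):
    # closed-form directional derivative: ∫ v dP_hat - ∫ v e^theta dM
    return (sum(w * v(x) for x, w in data)
            - sum(m * v(x) * math.exp(theta(x)) for x, m in grid))

theta = lambda x: -abs(x) - 0.5          # a concave piecewise-linear log-density
v = lambda x: max(x - 0.3, 0.0)          # a kink ("hinge") direction at tau = 0.3
t = 1e-6
fd = (L(lambda x: theta(x) + t * v(x)) - L(theta)) / t
print(abs(fd - DL(theta, v)) < 1e-4)     # finite difference matches closed form
```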

23 Local Search (shape-constrained Newton). Let θ ∈ V ∩ Θ.

    θ_new ← Newton(θ, V_{D(θ)})
    δ ← DL(θ, θ_new − θ)
    while δ > δ_o do
        while L(θ_new) < L(θ) + δ/3 do
            θ_new ← (θ + θ_new)/2
            δ ← δ/2
        end while
        if θ_new ∉ Θ then
            t_o ← max{ t ∈ (0, 1] : (1 − t)θ + t·θ_new ∈ Θ }
            θ_new ← (1 − t_o)θ + t_o·θ_new
        end if
        θ ← θ_new
        θ_new ← Newton(θ, V_{D(θ)})
        δ ← DL(θ, θ_new − θ)
    end while
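A toy one-dimensional sketch of the loop above (assuming a smooth concave objective; this is not the spline-space algorithm, and the objective L(t) = t − e^t is invented for illustration). Starting far from the maximum makes the first Newton proposal overshoot badly, so the δ/3 acceptance test and the step-halving inner loop kick in:

```python
import math

def local_search(L, dL, d2L, theta, delta_o=1e-12):
    # Newton proposal and its directional derivative DL(theta, theta_new - theta)
    theta_new = theta - dL(theta) / d2L(theta)
    delta = dL(theta) * (theta_new - theta)
    while delta > delta_o:
        while L(theta_new) < L(theta) + delta / 3:   # step-halving until acceptable
            theta_new = (theta + theta_new) / 2
            delta = delta / 2
        theta = theta_new                            # accept (no shape constraint here)
        theta_new = theta - dL(theta) / d2L(theta)
        delta = dL(theta) * (theta_new - theta)
    return theta

# Concave toy objective with maximum at t = 0; starting far out in the left
# tail makes the first Newton proposal overshoot wildly, exercising the halving.
opt = local_search(lambda t: t - math.exp(t),
                   lambda t: 1.0 - math.exp(t),
                   lambda t: -math.exp(t),
                   theta=-5.0)
print(abs(opt) < 1e-8)   # True
```

The δ/3 test is an Armijo-type sufficient-increase condition: once a proposal clears it, full Newton steps near the optimum give the usual fast local convergence.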

24-27 Essential properties of local search:
    L(θ) increases,
    D(θ) decreases,
    eventually, θ is locally optimal (approx.): θ = arg max_{η ∈ V_{D(θ)}} L(η).

28-29 Checking optimality. Let θ ∈ V ∩ Θ be locally optimal (approx.). Then θ = θ̂ if, and only if,
    DL(θ, V_τ) ≤ 0 for all τ ∈ D \ D(θ),
where
    V_τ(x) := −(x − τ)_+ in Setting 1,  +(x − τ)_+ in Setting 2A.

If not, determine τ(θ) ∈ D \ D(θ) such that
    0 < DL(θ, V_{τ(θ)}) ≈ max_{τ ∈ D \ D(θ)} DL(θ, V_τ)
and run a modified local search.
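The selection rule can be sketched as a scan over inactive candidate knots, scoring each hinge direction V_τ(x) = (x − τ)_+ (the Setting 2A sign) by its directional derivative; the data, grid, current fit, and candidate knots below are all invented:

```python
import math

# Illustrative data: weights w_i at points x_i, a grid carrying the M-masses,
# and a current piecewise-linear fit theta.
data = [(-1.2, 0.25), (-0.1, 0.25), (0.4, 0.25), (2.0, 0.25)]
grid = [(0.1 * j, 0.1) for j in range(-40, 41)]
theta = lambda x: -abs(x) - 0.7

def DL_hinge(tau):
    # DL(theta, V_tau) = ∫ V_tau dP_hat - ∫ V_tau e^theta dM,  V_tau(x) = (x - tau)_+
    v = lambda x: max(x - tau, 0.0)
    return (sum(w * v(x) for x, w in data)
            - sum(mass * v(x) * math.exp(theta(x)) for x, mass in grid))

candidates = [-0.5, 0.0, 0.5, 1.0, 1.5]   # inactive candidate knots (illustrative)
best = max(candidates, key=DL_hinge)
print(best, DL_hinge(best) > 0)           # a knot worth activating exists
```

If the best score is ≤ 0, no admissible kink direction improves L, and the current θ is already the global maximizer.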

30 Modified local search.

    θ_new ← Newton(θ, V_{D(θ) ∪ {τ(θ)}})
    δ ← DL(θ, θ_new − θ)
    while δ > δ_o do
        while L(θ_new) < L(θ) + δ/3 do
            θ_new ← (θ + θ_new)/2
            δ ← δ/2
        end while
        if θ_new ∉ Θ then
            t_o ← max{ t ∈ (0, 1] : (1 − t)θ + t·θ_new ∈ Θ }
            θ_new ← (1 − t_o)θ + t_o·θ_new
        end if
        θ ← θ_new
        θ_new ← Newton(θ, V_{D(θ)})
        δ ← DL(θ, θ_new − θ)
    end while

31-33 Remarks.
    After finitely many (modified) local searches, the algorithm stops at θ = θ̂ (approx.).
    Replace the simple kink functions V_τ = ±(x − τ)_+ with localized versions to gain numerical precision.
    In Settings 2A and 2B, the function τ ↦ DL(θ, V_τ) is strictly concave on any interval (x_i, x_{i+1})...

34-59 Example for Setting 1. n = 800 observations from Gamma(3, 1).

[Slides 34-59: iteration-by-iteration plots of the current estimate phi(x) and the Newton proposal phi.new(x), annotated with the log-likelihood LL and the directional derivative at each step; several local optima are reached and escaped by activating extra knots. The interspersed status lines read:
    Newton proposals in dimensions 7, 7, 7, 7, 7, 7; local optimum
    Newton proposals in dimensions 8, 7, 6, 6; local optimum
    Newton proposals in dimensions 7, 7, 7, 7; local optimum
    Newton proposals in dimensions 8, 7, 6, 6, 6; local optimum]

60-61 After 14 (modified) local searches and 45 Newton proposals:

Global optimum. [Figure: final estimate φ(x)]

62 Directional derivatives for an extra knot at τ. [Figure: τ ↦ DL(φ̂, V_τ)]

63 Log-densities. [Figure: φ(x)]

64 Densities. [Figure: f(x)]

65 Example for Setting 2B. n = 1000 observations from P; P_0 = Gamma(1.5, 1.2).
[Figure: p_0(x), p(x), p̂(x)]

66 [Figure: θ(x) and θ̂(x)]

67-70 Open questions for Settings 2A-B

Suppose we replace the discrete P̂ with an arbitrary distribution P.

Under which conditions on P and P_0 does there exist a unique
    θ̂ ∈ arg max_{θ ∈ Θ} ( ∫ θ dP − ∫ e^θ dP_0 )?

What continuity properties does the mapping P ↦ θ̂ have?

(The analogous questions for Setting 1 are well understood.)

71 References

Groeneboom, Jongbloed, Wellner (2008). The support reduction algorithm for computing nonparametric function estimates in mixture models. Scand. J. Statist. 35.

Dümbgen, Hüsler, Rufibach (2007/2011). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report, IMSV, University of Bern (arXiv preprint).

Dümbgen, Mösching, Strähl (2018). Active set algorithms for density estimators under shape constraints. In preparation.

McCullagh, Polson (2012). Tail inflation. Biometrika 99.

Dümbgen, Samworth, Schuhmacher (2011). Approximation by log-concave distributions, with applications to regression. Ann. Statist. 39.


More information

Information in Data. Sufficiency, Ancillarity, Minimality, and Completeness

Information in Data. Sufficiency, Ancillarity, Minimality, and Completeness Information in Data Sufficiency, Ancillarity, Minimality, and Completeness Important properties of statistics that determine the usefulness of those statistics in statistical inference. These general properties

More information

The Scalar Conservation Law

The Scalar Conservation Law The Scalar Conservation Law t + f() = 0 = conserved qantity, f() =fl d dt Z b a (t, ) d = Z b a t (t, ) d = Z b a f (t, ) d = f (t, a) f (t, b) = [inflow at a] [otflow at b] f((a)) f((b)) a b Alberto Bressan

More information

Limits and the derivative function. Limits and the derivative function

Limits and the derivative function. Limits and the derivative function The Velocity Problem A particle is moving in a straight line. t is the time that has passed from the start of motion (which corresponds to t = 0) s(t) is the distance from the particle to the initial position

More information

Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model

Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model Outline for today Computation of the likelihood function for GLMMs asmus Waagepetersen Department of Mathematics Aalborg University Denmark Computation of likelihood function. Gaussian quadrature 2. Monte

More information

Spectral Theory of Orthogonal Polynomials

Spectral Theory of Orthogonal Polynomials Spectral Theory of Orthogonal Barry Simon IBM Professor of Mathematics and Theoretical Physics California Institute of Technology Pasadena, CA, U.S.A. Lecture 2: Szegö Theorem for OPUC for S Spectral Theory

More information

ST495: Survival Analysis: Hypothesis testing and confidence intervals

ST495: Survival Analysis: Hypothesis testing and confidence intervals ST495: Survival Analysis: Hypothesis testing and confidence intervals Eric B. Laber Department of Statistics, North Carolina State University April 3, 2014 I remember that one fateful day when Coach took

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

ST495: Survival Analysis: Maximum likelihood

ST495: Survival Analysis: Maximum likelihood ST495: Survival Analysis: Maximum likelihood Eric B. Laber Department of Statistics, North Carolina State University February 11, 2014 Everything is deception: seeking the minimum of illusion, keeping

More information

Concentration behavior of the penalized least squares estimator

Concentration behavior of the penalized least squares estimator Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar

More information

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test. Economics 52 Econometrics Professor N.M. Kiefer LECTURE 1: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING NEYMAN-PEARSON LEMMA: Lesson: Good tests are based on the likelihood ratio. The proof is easy in the

More information

Economics 205 Exercises

Economics 205 Exercises Economics 05 Eercises Prof. Watson, Fall 006 (Includes eaminations through Fall 003) Part 1: Basic Analysis 1. Using ε and δ, write in formal terms the meaning of lim a f() = c, where f : R R.. Write the

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

VARIABLE SELECTION AND INDEPENDENT COMPONENT

VARIABLE SELECTION AND INDEPENDENT COMPONENT VARIABLE SELECTION AND INDEPENDENT COMPONENT ANALYSIS, PLUS TWO ADVERTS Richard Samworth University of Cambridge Joint work with Rajen Shah and Ming Yuan My core research interests A broad range of methodological

More information

Measuring nuisance parameter effects in Bayesian inference

Measuring nuisance parameter effects in Bayesian inference Measuring nuisance parameter effects in Bayesian inference Alastair Young Imperial College London WHOA-PSI-2017 1 / 31 Acknowledgements: Tom DiCiccio, Cornell University; Daniel Garcia Rasines, Imperial

More information

Grade 12 (MCV4UE) AP Calculus Page 1 of 5 Derivative of a Function & Differentiability

Grade 12 (MCV4UE) AP Calculus Page 1 of 5 Derivative of a Function & Differentiability Grade 2 (MCV4UE) AP Calculus Page of 5 The Derivative at a Point f ( a h) f ( a) Recall, lim provides the slope of h0 h the tangent to the graph y f ( at the point, f ( a), and the instantaneous rate of

More information

Answer Key 1973 BC 1969 BC 24. A 14. A 24. C 25. A 26. C 27. C 28. D 29. C 30. D 31. C 13. C 12. D 12. E 3. A 32. B 27. E 34. C 14. D 25. B 26.

Answer Key 1973 BC 1969 BC 24. A 14. A 24. C 25. A 26. C 27. C 28. D 29. C 30. D 31. C 13. C 12. D 12. E 3. A 32. B 27. E 34. C 14. D 25. B 26. Answer Key 969 BC 97 BC. C. E. B. D 5. E 6. B 7. D 8. C 9. D. A. B. E. C. D 5. B 6. B 7. B 8. E 9. C. A. B. E. D. C 5. A 6. C 7. C 8. D 9. C. D. C. B. A. D 5. A 6. B 7. D 8. A 9. D. E. D. B. E. E 5. E.

More information

Differential Calculus

Differential Calculus Differential Calculus. Compute the derivatives of the following functions a() = 4 3 7 + 4 + 5 b() = 3 + + c() = 3 + d() = sin cos e() = sin f() = log g() = tan h() = 3 6e 5 4 i() = + tan 3 j() = e k()

More information

Advanced Quantitative Methods: maximum likelihood

Advanced Quantitative Methods: maximum likelihood Advanced Quantitative Methods: Maximum Likelihood University College Dublin 4 March 2014 1 2 3 4 5 6 Outline 1 2 3 4 5 6 of straight lines y = 1 2 x + 2 dy dx = 1 2 of curves y = x 2 4x + 5 of curves y

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018 Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian

More information

Constrained Gaussian processes: methodology, theory and applications

Constrained Gaussian processes: methodology, theory and applications Constrained Gaussian processes: methodology, theory and applications Hassan Maatouk hassan.maatouk@univ-rennes2.fr Workshop on Gaussian Processes, November 6-7, 2017, St-Etienne (France) Hassan Maatouk

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Learning MN Parameters with Approximation. Sargur Srihari

Learning MN Parameters with Approximation. Sargur Srihari Learning MN Parameters with Approximation Sargur srihari@cedar.buffalo.edu 1 Topics Iterative exact learning of MN parameters Difficulty with exact methods Approximate methods Approximate Inference Belief

More information

Maximum likelihood: counterexamples, examples, and open problems

Maximum likelihood: counterexamples, examples, and open problems Maximum likelihood: counterexamples, examples, and open problems Jon A. Wellner University of Washington visiting Vrije Universiteit, Amsterdam Talk at BeNeLuxFra Mathematics Meeting 21 May, 2005 Email:

More information

Lecture 2: Conjugate priors

Lecture 2: Conjugate priors (Spring ʼ) Lecture : Conjugate priors Julia Hockenmaier juliahmr@illinois.edu Siebel Center http://www.cs.uiuc.edu/class/sp/cs98jhm The binomial distribution If p is the probability of heads, the probability

More information

Cubic Splines; Bézier Curves

Cubic Splines; Bézier Curves Cubic Splines; Bézier Curves 1 Cubic Splines piecewise approximation with cubic polynomials conditions on the coefficients of the splines 2 Bézier Curves computer-aided design and manufacturing MCS 471

More information

Bootstrap and Parametric Inference: Successes and Challenges

Bootstrap and Parametric Inference: Successes and Challenges Bootstrap and Parametric Inference: Successes and Challenges G. Alastair Young Department of Mathematics Imperial College London Newton Institute, January 2008 Overview Overview Review key aspects of frequentist

More information