New Dirichlet Mean Identities
1 Hong Kong University of Science and Technology Isaac Newton Institute, August 10, 2007
2 Origins CIFARELLI, D. M. and REGAZZINI, E. (1979). Considerazioni generali sull'impostazione bayesiana di problemi non parametrici. Le medie associative nel contesto del processo aleatorio di Dirichlet I, II. Riv. Mat. Sci. Econom. Social. 2. Later there is the important 1990 Annals of Statistics paper.
3 1 It is a cornerstone work in Bayesian NP and, I believe, one of the most important contributions to the theory, and indeed application, of random processes. 2 Why? Dirichlet means arise everywhere.
4 Some things to keep in mind. Hard and very technical stuff. But Cifarelli and Regazzini already did the hard work, so we do not have to. Our task is really to figure out how to use their results, not so much how to re-prove them. For some recent non-trivial applications look me up on the math ArXiv. I never really do hard things. This is not a joke.
5 Theme: Prendere due piccioni con una fava ("to take two pigeons with one bean", i.e. to kill two birds with one stone).
6 Dirichlet process. Let X be a non-negative random variable with cumulative distribution function F_X. Furthermore, for a measurable set C, we use the notation F_X(C) for the probability that X lies in C. One may define a Dirichlet process random probability measure, say P_θ, on [0, ∞) with total mass parameter θ and prior parameter F_X via its finite-dimensional distributions as follows: for any disjoint partition (C_1, ..., C_k) of [0, ∞), the random vector (P_θ(C_1), ..., P_θ(C_k)) has a k-variate Dirichlet distribution with parameters (θF_X(C_1), ..., θF_X(C_k)).
7 Hence for each C, P_θ(C) = ∫_0^∞ I(x ∈ C) P_θ(dx) has a beta distribution with parameters (θF_X(C), θ(1 − F_X(C))).
8 Equivalently, setting θF_X(C_i) = θ_i for i = 1, ..., k, (P_θ(C_1), ..., P_θ(C_k)) =_d (G_{θ_i}/G_θ ; i = 1, ..., k), where the (G_{θ_i}) are independent random variables with gamma(θ_i, 1) distributions and G_θ = G_{θ_1} + ··· + G_{θ_k} has a gamma(θ, 1) distribution.
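This normalized-gamma construction is easy to check numerically. Below is a minimal sketch (Python with numpy; the toy partition probabilities, θ, and sample sizes are illustrative choices, not from the slides) that builds the Dirichlet vector by normalizing independent gamma variables and verifies the stated marginals by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 4.0
probs = np.array([0.1, 0.3, 0.6])   # F_X(C_1), F_X(C_2), F_X(C_3): a toy partition
theta_i = theta * probs             # gamma shape parameters theta * F_X(C_i)

n = 200_000
G = rng.gamma(shape=theta_i, scale=1.0, size=(n, 3))   # independent G_{theta_i}
P = G / G.sum(axis=1, keepdims=True)                   # the Dirichlet vector
G_total = G.sum(axis=1)                                # gamma(theta, 1) total mass

# Each P[:, i] is beta(theta_i, theta - theta_i), so its mean is F_X(C_i).
print(P.mean(axis=0))   # close to [0.1, 0.3, 0.6]
print(G_total.mean())   # close to theta = 4
```

Each coordinate of `P` then has exactly the beta marginal stated on the previous slide.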
9 This means that one can define the Dirichlet process via the normalization of an independent increment gamma process on [0, ∞), say γ_θ(·), as P_θ(·) = γ_θ(·)/γ_θ([0, ∞)), where γ_θ(C_i) =_d G_{θ_i} and whose almost surely finite total random mass is γ_θ([0, ∞)) =_d G_θ.
10 A very important aspect of this construction is the fact that G_θ is independent of P_θ, and hence of any functional P_θ(g) = ∫_0^∞ g(x) P_θ(dx).
11 Furthermore, for every θ > 0, G_θ P_θ(g) =_d ∫_0^∞ g(x) γ_θ(dx) = γ_θ(g). See for instance Lijoi and Regazzini (AOP).
12 Cifarelli and Regazzini / Markov–Krein ID. Now, E[e^{−λ G_θ P_θ(g)}] = E[(1 + λP_θ(g))^{−θ}]. The Laplace transform of a random variable representable as γ_θ(g) is equivalent to the Cauchy–Stieltjes transform of order θ of a random variable representable as P_θ(g).
13 Something to keep in mind. Let's think about random variables on the real line rather than random processes in some Banach space.
14 A more general statement: psychologically easier to prove???!!! Let M be a positive random variable independent of G_θ, and define the random variable R_θ =_d G_θ M. Then, from your first course in probability, E[e^{−λR_θ}] = E[(1 + λM)^{−θ}].
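The identity follows by conditioning on M: E[e^{−λ G_θ M} | M] = (1 + λM)^{−θ} is just the gamma Laplace transform. A quick Monte Carlo sketch (Python/numpy; taking M ~ uniform[0,1] and the values of θ and λ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of E[exp(-lam * G_theta * M)] = E[(1 + lam * M)^(-theta)]
# for M independent of G_theta; here M ~ uniform[0,1] is an arbitrary choice.
theta, lam, n = 2.5, 1.7, 500_000
M = rng.uniform(size=n)
G = rng.gamma(theta, 1.0, size=n)

lhs = np.exp(-lam * G * M).mean()            # Laplace transform side
rhs = ((1.0 + lam * M) ** (-theta)).mean()   # Cauchy-Stieltjes side
print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```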
15 Question Are the results on the last two slides equivalent?
16 Answer-Questions. γ_θ(g) is always infinitely divisible. Scale mixtures of gamma random variables are not always infinitely divisible. Scale mixtures of gamma random variables of index 0 < θ ≤ 1 are always infinitely divisible. If R_θ is infinitely divisible, does the equivalence then hold?
17 Classical Question. Recall that E[e^{−λR_θ}] = E[(1 + λM)^{−θ}]. In general: how to find the density of R_θ? In general: how to find the density of M?
18 Classical but painful answer. Utilize the classical inversion formula. You have to deal with the analysis of complex functions. No simpler answer in general.
19 Cifarelli and Regazzini provide answers for M =_d P_θ(g).
20 My Solutions Call up my Italian friends!! (Lijoi and Prünster) Build on existing results
21 Important? γ_θ(g) constitute a large class of infinitely divisible random variables. ID random variables generate EVERY positive Lévy process. Lévy processes are now the building blocks in many diverse fields: physics, finance, genetics, machine learning. All these models can be treated from a Bayesian NP point of view. Lévy processes are connected with special functions. One wants to get the laws of certain functionals, BUT this is hard. Simple example (finance): one may want to use a Barndorff-Nielsen OU SV process for option pricing without resorting to a series representation.
22 Some more. 1 Many important quantities can be represented as P_θ(g). 2 Curiously, James, Lijoi and Prünster [JLP] show that for θ > 0 every linear functional of a two-parameter (α, θ) Poisson–Dirichlet random probability measure can be represented as a Dirichlet mean of order θ.
23 Define a Dirichlet mean of order θ indexed by F_X as M_θ(F_X) = ∫_0^∞ x P_θ(dx).
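Dirichlet means can be simulated approximately via a truncated Sethuraman stick-breaking representation of P_θ (the representation is standard; the truncation level and the uniform choice of F_X below are our assumptions, not from the slides). Two standard moment facts give a check: E[M_θ(F_X)] = E[X] and Var(M_θ(F_X)) = Var(X)/(θ + 1).

```python
import numpy as np

rng = np.random.default_rng(2)

def dirichlet_mean_samples(theta, sample_x, n_draws, trunc=150):
    """Approximate draws of M_theta(F_X) via a truncated Sethuraman
    stick-breaking representation of the Dirichlet process P_theta."""
    V = rng.beta(1.0, theta, size=(n_draws, trunc))   # stick-breaking fractions
    logs = np.cumsum(np.log1p(-V), axis=1)
    W = V * np.exp(np.hstack([np.zeros((n_draws, 1)), logs[:, :-1]]))
    X = sample_x((n_draws, trunc))                    # iid atoms from F_X
    return (W * X).sum(axis=1)

theta = 3.0
M = dirichlet_mean_samples(theta, lambda s: rng.uniform(size=s), 30_000)
# Standard moment facts: E[M_theta(F_X)] = E[X], Var = Var(X)/(theta + 1).
print(M.mean())   # about 0.5
print(M.var())    # about (1/12)/4 = 0.0208...
```

With θ = 3 the discarded stick mass after 150 breaks is negligible, so the truncation bias is far below the Monte Carlo noise.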
24 Say that T_θ =_d G_θ M_θ(F_X) is a GGC(θ, F_X) random variable, which satisfies E[e^{−λT_θ}] = E[(1 + λM_θ(F_X))^{−θ}] = e^{−θψ_{F_X}(λ)}
25 where ψ_{F_X}(λ) = ∫_0^∞ log(1 + λx) F_X(dx) = E[log(1 + λX)].
26 Key expressions for the density, Cifarelli and Regazzini (1990). 1 Φ_{F_X}(t) = ∫_0^∞ log|t − x| F_X(dx) = E[log|t − X|]. 2 Furthermore, define Δ_θ(t | F_X) = (1/π) sin(πθF_X(t)) e^{−θΦ_{F_X}(t)}.
27 Then from [CR]: 1 The cdf of M_θ(F_X) for all θ > 0 is expressible as ∫_0^x (x − t)^{θ−1} Δ_θ(t | F_X) dt. 2 When θ = 1, the density is ξ_{F_X}(x) = Δ_1(x | F_X) = (1/π) sin(πF_X(x)) e^{−Φ_{F_X}(x)}. 3 Density formulae for θ > 1 are described as ξ_{θF_X}(x) = (θ − 1) ∫_0^x (x − t)^{θ−2} Δ_θ(t | F_X) dt.
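As a worked example (our illustrative choice of base, not one made on the slides), take F_X uniform on [0,1]. Then Φ_{F_X}(x) = ∫_0^1 log|x − u| du = x log x + (1 − x) log(1 − x) − 1, and the θ = 1 density becomes the well-known ξ(x) = (e/π) sin(πx) x^{−x} (1 − x)^{−(1−x)} on (0,1). The sketch below checks numerically that this integrates to 1 and that its order-1 Cauchy–Stieltjes transform matches e^{−ψ_{F_X}(λ)}, where ψ_{F_X}(λ) = E[log(1 + λU)] = ((1 + λ) log(1 + λ) − λ)/λ.

```python
import numpy as np

# Worked example: F_X uniform on [0,1] (our choice).  Then
#   Phi(x) = int_0^1 log|x - u| du = x*log(x) + (1-x)*log(1-x) - 1,
# so the theta = 1 density of the Dirichlet mean is
#   xi(x) = (e/pi) * sin(pi*x) * x**(-x) * (1-x)**(-(1-x)),  0 < x < 1.
x = np.linspace(1e-9, 1.0 - 1e-9, 1_000_001)
dx = x[1] - x[0]
xi = (np.e / np.pi) * np.sin(np.pi * x) * x ** (-x) * (1.0 - x) ** (-(1.0 - x))

total = (xi * dx).sum()          # Riemann sum of xi over (0, 1)
print(total)                     # a probability density: the integral is 1

# Markov-Krein check at theta = 1:
#   int (1 + lam*x)^(-1) xi(x) dx = exp(-psi(lam)),
#   psi(lam) = E[log(1 + lam*U)] = ((1+lam)*log(1+lam) - lam)/lam.
lam = 2.0
cst = (xi / (1.0 + lam * x) * dx).sum()
psi = ((1.0 + lam) * np.log1p(lam) - lam) / lam
print(cst, np.exp(-psi))         # the two numbers agree
```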
28 Result, James, Lijoi and Prünster (2006). Recognizing a connection of [CR]'s result for the cdf to Abel transforms, [JLP] reasoned that it is rather straightforward to establish the following: for all θ > 0, the density ξ_{θF_X}(x) is the derivative of ∫_0^x (x − t)^{θ−1} Δ_θ(t | F_X) dt.
29 In general the only really nice expression for the density is when θ = 1: ξ_{F_X}(x) = Δ_1(x | F_X) = (1/π) sin(πF_X(x)) e^{−Φ_{F_X}(x)}.
30 But notice that Δ_θ(t | F_X) = (1/π) sin(πθF_X(t)) e^{−θΦ_{F_X}(t)} > 0 for 0 < θ ≤ 1 and all t > 0.
31 Comments on possible random variables that are Dirichlet means. 1 The formulae we just saw are general expressions and don't look too familiar. 2 However, we (James, Roynette and Yor — coming soon) show that there are many, many familiar random variables that are Dirichlet means. 3 This includes generalized inverse Gaussian, positive stable, Pareto, uniform, and at the very least a large class of random variables that are generalized gamma convolutions. 4 Note, for instance, that if U is uniform[0,1] then for 0 < θ ≤ 2 there exists F_{X_θ} such that U =_d M_θ(F_{X_θ}).
32 A useful identity: Hjort and Ongaro (2005). Let D = (D_1, ..., D_k) denote a Dirichlet(θ_1, ..., θ_k) random vector, and set θ = Σ_{i=1}^k θ_i. Then M_θ(F_X) =_d Σ_{i=1}^k D_i M_{θ_i}(F_X), where the M_{θ_i}(F_X) are independent and independent of D.
33 Goal. Use the analytic work used to obtain results for M_θ(F_X) to obtain results for an entire family of Dirichlet mean random variables {M_θ(F_{Z_c}) : c > 0}. As a byproduct we get results for T_θ and Lévy processes derived from T_θ.
34 Simple examples are of course the choices Z_c = X + c and Z_c = cX, which, due to the linearity properties of mean functionals, easily yield the identities in law M_θ(F_{X+c}) =_d c + M_θ(F_X) and M_θ(F_{cX}) =_d c M_θ(F_X).
35 General DP Mean Identities. I introduced two simple ideas to address some of these points. The idea to construct such things was in part influenced by some results of Pitman and Yor (1997), on lengths of excursions of Bessel processes, and others, concerning multiplication by beta random variables and so on. The two results can be described as: 1 Beta scaling of DP mean functionals. 2 Gamma tilting. 3 Actually there are two more.
36 Beta scaling of DP mean functionals. Let 0 < σ ≤ 1 and θ > 0. Let β_{θσ,θ(1−σ)} denote a beta random variable. Then consider the following simple random variable: β_{θσ,θ(1−σ)} M_{θσ}(F_X). Remember the density of M_{θσ}(F_X) is not NICE except for θσ = 1.
37 Let Y_σ denote a Bernoulli random variable with success probability σ. Consider the independent product XY_σ, with cdf F_{XY_σ}(x) = σF_X(x) + (1 − σ)I(x ≥ 0).
38 Result 1: change of total mass. Then β_{θσ,θ(1−σ)} M_{θσ}(F_X) =_d M_θ(F_{XY_σ}).
39 Why? ψ_{F_{XY_σ}}(λ) = ∫_0^∞ log(1 + λw) F_{XY_σ}(dw) = E[log(1 + λXY_σ)], but this is equal to σψ_{F_X}(λ) = σ ∫_0^∞ log(1 + λx) F_X(dx) = σE[log(1 + λX)].
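The scaling of ψ is immediate, since log(1 + λXY_σ) vanishes on the event {Y_σ = 0}. A Monte Carlo sketch (Python/numpy; X ~ uniform[0,1] and the values of σ and λ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check that E[log(1 + lam*X*Y_sigma)] = sigma * E[log(1 + lam*X)],
# i.e. psi_{F_{XY_sigma}} = sigma * psi_{F_X}.
sigma, lam, n = 0.3, 2.0, 1_000_000
X = rng.uniform(size=n)
Y = (rng.uniform(size=n) < sigma).astype(float)   # Bernoulli(sigma), indep. of X

lhs = np.log1p(lam * X * Y).mean()       # psi for the mixed base F_{XY_sigma}
rhs = sigma * np.log1p(lam * X).mean()   # sigma * psi for the base F_X
print(lhs, rhs)   # agree up to Monte Carlo error
```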
40 One to many. For every fixed θ, how many mean functionals of order θ did we just create? Answer: an uncountable number, indexed by 0 < σ ≤ 1: (M_θ(F_{XY_σ}) : 0 < σ ≤ 1).
41 One to many explicit densities. How many mean functionals of order θ = 1 did we just create? Answer: an uncountable number, indexed by 0 < σ ≤ 1: (M_1(F_{XY_σ}) : 0 < σ ≤ 1).
42 Φ_{F_{XY_σ}}(x) = E[log|x − XY_σ|] = σΦ_{F_X}(x) + (1 − σ) log(x)
43 Result 2. The density of β_{σ,1−σ} M_σ(F_X) =_d M_1(F_{XY_σ}) is ξ_{F_{XY_σ}}(x) = (x^{σ−1}/π) sin(πF_{XY_σ}(x)) e^{−σΦ_{F_X}(x)}. Note you do not need to know the density of M_σ(F_X).
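Continuing the uniform worked example (X ~ uniform[0,1] with σ = 1/2: both our illustrative choices), one has Φ_{F_X}(x) = x log x + (1 − x) log(1 − x) − 1 and F_{XY_σ}(x) = σF_X(x) + (1 − σ) on (0,1), so the Result 2 density is fully explicit and can be checked to integrate to 1; the substitution x = u² tames the x^{σ−1} singularity at 0.

```python
import numpy as np

# Check that the Result 2 density integrates to 1 for the illustrative choice
# X ~ uniform[0,1] and sigma = 1/2.  Here, for 0 < x < 1,
#   Phi_{F_X}(x) = x*log(x) + (1-x)*log(1-x) - 1,
#   F_{XY_sigma}(x) = sigma*x + (1 - sigma).
# The substitution x = u^2 removes the x**(sigma - 1) singularity at 0.
sigma = 0.5
u = np.linspace(1e-9, 1.0 - 1e-9, 1_000_001)
du = u[1] - u[0]
x = u ** 2
Phi = x * np.log(x) + (1.0 - x) * np.log1p(-x) - 1.0
F_mix = sigma * x + (1.0 - sigma)
xi = (x ** (sigma - 1.0) / np.pi) * np.sin(np.pi * F_mix) * np.exp(-sigma * Phi)

total = (xi * 2.0 * u * du).sum()   # int_0^1 xi(x) dx after x = u^2
print(total)                        # should be close to 1
```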
44 Back to GGC. Recall T_{θσ} = G_{θσ} M_{θσ}(F_X), but G_{θσ} =_d G_θ β_{θσ,θ(1−σ)}. Hence T_{θσ} = G_{θσ} M_{θσ}(F_X) =_d G_θ [β_{θσ,θ(1−σ)} M_{θσ}(F_X)].
45 Hence T_{θσ} = G_{θσ} M_{θσ}(F_X) =_d G_θ M_θ(F_{XY_σ}), i.e. it is a GGC(θ, F_{XY_σ}).
46 Hence T_σ = G_σ M_σ(F_X) =_d G_1 M_1(F_{XY_σ}), i.e. it is a GGC(1, F_{XY_σ}).
47 Exploiting infinite divisibility. Let 0 < σ_k with Σ_{k=1}^∞ σ_k = 1. Then T_1 =_d Σ_{k=1}^∞ T_{σ_k}, where the T_{σ_k} are independent and each has distribution T_{σ_k} =_d G_1 M_1(F_{XY_{σ_k}}).
48 GGC species sampling models. Let (Z_k) denote iid random elements with some common distribution H. Then set P_k = T_{σ_k}/T_1 and define a random probability measure as P(dx) = Σ_{k=1}^∞ P_k δ_{Z_k}(dx). THESE ARE NOT NRMI.
49 That is, for any finite k, (P_1, ..., P_k) =_d (G_{1,i} M_1(F_{XY_{σ_i}})/[T + S] ; i = 1, ..., k)
50 where T = Σ_{i=1}^k G_{1,i} M_1(F_{XY_{σ_i}}) and, for σ̄_k = 1 − Σ_{i=1}^k σ_i, S =_d G_1 M_1(F_{XY_{σ̄_k}}) independent of T.
51 For fun one could choose, for i = 1, 2, ..., the Poisson weights σ_i = λ^{i−1} e^{−λ}/(i − 1)!
52 Fidi of GGC NRM. Let μ(·) denote a completely random measure derived from T_1, such that E[μ(A)] = H(A) and μ(X) = T_1. Then for any A, setting H(A) = σ, one has that μ(A) =_d G_1 M_1(F_{XY_σ}). Moreover, for a disjoint partition (C_1, ..., C_N) of some space X, the fidi of μ(·) is composed of independent components, so that μ(C_i) =_d G_1 M_1(F_{XY_{σ_i}}) with σ_i = H(C_i).
53 But writing T_1 = Σ_{k=1}^N μ(C_k), this gives the fidi of the random probability measure P(·) = μ(·)/T_1, i.e. of (P(C_1), ..., P(C_N)). These NRM have appeared in James, Lijoi and Prünster (2005), but not the fidi part.
54 Now conditioning on T_1 = t gives the fidi of the corresponding conditional Poisson–Kingman class, L((P(C_1), ..., P(C_N)) | T_1 = t). Now mix over an arbitrary density γ(t). Hence I just created an uncountable number of random probability measures with explicit fidis.
55 Approximating GGC NRMs. Now, for general θ, let μ_θ(·) denote a CRM derived from T_θ. Then form the NRM P_θ(·) = μ_θ(·)/T_θ. Now write T_θ = Σ_{k=1}^N T_{θ/N,k}, where the T_{θ/N,k} are iid, equal in distribution to T_{θ/N}.
56 Now we are interested in choosing N > θ; in this case the joint distribution of (T_{θ/N,k}/T_θ ; k ≤ N) can be computed and certainly easily simulated, since T_{θ/N} =_d G_1 M_1(F_{XY_{θ/N}}). Furthermore, as N → ∞, Σ_{k=1}^N (T_{θ/N,k}/T_θ) δ_{Z_k}(·) → P_θ(·). We can also use the left-hand side as a sieve by letting N grow with the data.
57 Special known case. With G_{θ/N,k} iid gamma(θ/N, 1), Σ_{k=1}^N (G_{θ/N,k}/G_θ) δ_{Z_k}(·) → D_θ(·), where D_θ is a Dirichlet process.
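The special known case is easy to simulate directly (Python/numpy sketch; the values of θ and N, the functional g(z) = z, and H = uniform[0,1] are illustrative choices, not from the slides). For a Dirichlet process D_θ with base H, the mean functional satisfies E[D_θ(g)] = H(g) and Var(D_θ(g)) = Var_H(g)/(θ + 1), which the finite-N normalized-gamma weights reproduce up to O(1/N).

```python
import numpy as np

rng = np.random.default_rng(4)

# Normalized gamma(theta/N, 1) weights approximating Dirichlet process weights.
theta, N, n = 5.0, 1_000, 5_000
G = rng.gamma(theta / N, 1.0, size=(n, N))
W = G / G.sum(axis=1, keepdims=True)   # (G_{theta/N,k}/G_theta ; k <= N)
Z = rng.uniform(size=(n, N))           # iid atoms Z_k from H = uniform[0,1]
means = (W * Z).sum(axis=1)            # draws of the approximate D_theta(g), g(z)=z

# Dirichlet mean facts: E = 1/2, Var = (1/12)/(theta + 1), up to O(1/N).
print(means.mean())   # about 0.5
print(means.var())    # about (1/12)/6 = 0.0139
```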
More informationConditional distributions
Conditional distributions Will Monroe July 6, 017 with materials by Mehran Sahami and Chris Piech Independence of discrete random variables Two random variables are independent if knowing the value of
More informationREAL ANALYSIS I Spring 2016 Product Measures
REAL ANALSIS I Spring 216 Product Measures We assume that (, M, µ), (, N, ν) are σ- finite measure spaces. We want to provide the Cartesian product with a measure space structure in which all sets of the
More information4. CONTINUOUS RANDOM VARIABLES
IA Probability Lent Term 4 CONTINUOUS RANDOM VARIABLES 4 Introduction Up to now we have restricted consideration to sample spaces Ω which are finite, or countable; we will now relax that assumption We
More informationWeighted Exponential Distribution and Process
Weighted Exponential Distribution and Process Jilesh V Some generalizations of exponential distribution and related time series models Thesis. Department of Statistics, University of Calicut, 200 Chapter
More informationReal Analysis Problems
Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.
More informationMeixner matrix ensembles
Meixner matrix ensembles W lodek Bryc 1 Cincinnati April 12, 2011 1 Based on joint work with Gerard Letac W lodek Bryc (Cincinnati) Meixner matrix ensembles April 12, 2011 1 / 29 Outline of talk Random
More informationMAS113 Introduction to Probability and Statistics. Proofs of theorems
MAS113 Introduction to Probability and Statistics Proofs of theorems Theorem 1 De Morgan s Laws) See MAS110 Theorem 2 M1 By definition, B and A \ B are disjoint, and their union is A So, because m is a
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationMath 40510, Algebraic Geometry
Math 40510, Algebraic Geometry Problem Set 1, due February 10, 2016 1. Let k = Z p, the field with p elements, where p is a prime. Find a polynomial f k[x, y] that vanishes at every point of k 2. [Hint:
More informationSimulating Random Variables
Simulating Random Variables Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 23 R has many built-in random number generators... Beta, gamma (also
More informationSTA205 Probability: Week 8 R. Wolpert
INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and
More informationCS 361: Probability & Statistics
October 17, 2017 CS 361: Probability & Statistics Inference Maximum likelihood: drawbacks A couple of things might trip up max likelihood estimation: 1) Finding the maximum of some functions can be quite
More informationRandom variables. DS GA 1002 Probability and Statistics for Data Science.
Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities
More informationMultivariate Distributions (Hogg Chapter Two)
Multivariate Distributions (Hogg Chapter Two) STAT 45-1: Mathematical Statistics I Fall Semester 15 Contents 1 Multivariate Distributions 1 11 Random Vectors 111 Two Discrete Random Variables 11 Two Continuous
More informationStatistics 3657 : Moment Generating Functions
Statistics 3657 : Moment Generating Functions A useful tool for studying sums of independent random variables is generating functions. course we consider moment generating functions. In this Definition
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationBayesian Interpretations of Regularization
Bayesian Interpretations of Regularization Charlie Frogner 9.50 Class 15 April 1, 009 The Plan Regularized least squares maps {(x i, y i )} n i=1 to a function that minimizes the regularized loss: f S
More informationSTAT 830 Bayesian Estimation
STAT 830 Bayesian Estimation Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Bayesian Estimation STAT 830 Fall 2011 1 / 23 Purposes of These
More informationChapter 4: CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Chapter 4: CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS Part 4: Gamma Distribution Weibull Distribution Lognormal Distribution Sections 4-9 through 4-11 Another exponential distribution example
More information