The Poisson-Dirichlet Distribution: Constructions, Stochastic Dynamics and Asymptotic Behavior


Shui Feng, McMaster University
The 8th Workshop on Bayesian Nonparametrics, Veracruz, Mexico, June 26-30, 2011.
Typeset by FoilTEX 1

Part I: Definitions and Models
1. GEM Distribution, Poisson-Dirichlet Distribution and Dirichlet Process
2. An Urn Model
3. Dirichlet Distribution Derivation
4. Subordinator Representation
5. Gamma-Dirichlet Algebra
Part II: Stochastic Dynamics
1. Wright-Fisher Model
2. Infinitely-Many-Alleles Model
3. Fleming-Viot Process
4. Dynamical Analogue of the Gamma-Dirichlet Algebra
Part III: Asymptotics
1. Sampling Formula and Large Sample Approximation
2. Large θ Approximation

Part I: Definitions and Models

1. GEM Distribution, Poisson-Dirichlet Distribution and Dirichlet Process

Definition 1.1 For $0 \le \alpha < 1$ and $\theta > -\alpha$, let $U_1, U_2, \ldots$ be independent with $U_i \sim \mathrm{Beta}(1-\alpha, \theta + i\alpha)$. Set
$$V_1 = U_1, \qquad V_n = (1-U_1)\cdots(1-U_{n-1})\,U_n, \quad n \ge 2.$$
The law of $(V_1, V_2, \ldots)$ is called the GEM distribution, denoted by $\mathrm{GEM}(\alpha, \theta)$.

Definition 1.2 The law of the descending order statistics of $V_1, V_2, \ldots$ is called the two-parameter Poisson-Dirichlet distribution, denoted by $PD(\alpha, \theta)$. The case $\alpha = 0$ corresponds to Kingman's Poisson-Dirichlet distribution.
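Definitions 1.1 and 1.2 translate directly into a stick-breaking sampler. The sketch below uses only Python's standard `random` module; the helper names `gem_weights` and `pd_approx` are our own. Because only finitely many sticks can be broken, sorting the first `n_sticks` GEM weights gives an approximation to $PD(\alpha,\theta)$ whose leading entries are reliable only once the unbroken remainder $\prod_i (1-U_i)$ is smaller than them.

```python
import random

def gem_weights(alpha, theta, n_sticks, seed=None):
    """First n_sticks weights (V_1, ..., V_n) of GEM(alpha, theta) by stick-breaking.

    Assumes 0 <= alpha < 1 and theta > -alpha, as in Definition 1.1.
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # current value of (1 - U_1) ... (1 - U_{i-1})
    for i in range(1, n_sticks + 1):
        u = rng.betavariate(1 - alpha, theta + i * alpha)  # U_i ~ Beta(1 - alpha, theta + i*alpha)
        weights.append(remaining * u)                      # V_i
        remaining *= 1 - u
    return weights

def pd_approx(alpha, theta, n_sticks, seed=None):
    """Truncated approximation of a PD(alpha, theta) sample: GEM weights in descending order."""
    return sorted(gem_weights(alpha, theta, n_sticks, seed), reverse=True)
```

For example, `pd_approx(0.5, 1.0, 500, seed=1)[:5]` returns the five largest of the first 500 stick-breaking weights.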

Definition 1.3 Let $S$ be a Polish space, let $\xi_1, \xi_2, \ldots$ be i.i.d. with common diffuse law $\nu_0$ on $S$, and, independently, let $(P_1(\alpha,\theta), P_2(\alpha,\theta), \ldots)$ follow the two-parameter Poisson-Dirichlet distribution. The random measure on $S$
$$\Xi_{\alpha,\theta,\nu_0}(dx) = \sum_{i=1}^{\infty} P_i(\alpha,\theta)\,\delta_{\xi_i}(dx)$$
is called the two-parameter Dirichlet process. Let
$$\Delta_\infty = \Big\{x = (x_1, x_2, \ldots) \in [0,1]^{\infty} : \sum_{i=1}^{\infty} x_i \le 1\Big\}, \qquad \nabla_\infty = \{(x_1, x_2, \ldots) \in \Delta_\infty : x_1 \ge x_2 \ge \cdots \ge 0\},$$
and let $M_1(S)$ be the set of all probabilities on $S$. Then the GEM distribution is a probability on $\Delta_\infty$, $PD(\alpha,\theta)$ is a probability on $\nabla_\infty$, and the law of the Dirichlet process is a probability on $M_1(S)$.

2. An Urn Model
Consider an urn that initially contains a black ball of mass $\theta$. Balls are drawn from the urn successively with probabilities proportional to their masses. When a black ball is drawn, it is returned to the urn together with a black ball of mass $\alpha$ and a ball of a new color with mass $1-\alpha$. If a non-black ball is drawn, it is returned to the urn together with one additional ball of mass one of the same color. Colors are labelled $1, 2, 3, \ldots$ in the order of appearance.

For each $n \ge 1$, let $C_i(n)$ denote the number of non-black balls with label $i$, $1 \le i \le n$, after $n$ draws. Then, as $n \to \infty$,
$$\left(\frac{C_1(n)}{n}, \frac{C_2(n)}{n}, \ldots, \frac{C_n(n)}{n}, 0, 0, \ldots\right) \to (V_1, V_2, \ldots) \quad \text{in distribution}.$$
Similarly, let $C_{[1]}(n) \ge C_{[2]}(n) \ge \cdots$ denote the descending order statistics of $C_1(n), C_2(n), \ldots$. Then
$$\left(\frac{C_{[1]}(n)}{n}, \frac{C_{[2]}(n)}{n}, \ldots, \frac{C_{[n]}(n)}{n}, 0, 0, \ldots\right) \to (P_1(\alpha,\theta), P_2(\alpha,\theta), \ldots) \quad \text{in distribution}.$$
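The urn can be simulated by tracking masses exactly as described: after $k$ colors have appeared the black balls weigh $\theta + k\alpha$ in total, and color $i$ weighs $C_i(n) - \alpha$. The sketch below (function name `urn_counts` is ours; it assumes $\theta > 0$) returns $C_1(n), C_2(n), \ldots$ in order of appearance, so dividing by $n$ gives the left-hand sides of the two limits above.

```python
import random

def urn_counts(alpha, theta, n_draws, seed=None):
    """Simulate the urn and return [C_1(n), C_2(n), ...] in order of appearance."""
    rng = random.Random(seed)
    black_mass = theta   # total mass of black balls: theta + k*alpha after k colors
    color_mass = []      # total mass of each color: C_i(n) - alpha
    counts = []          # C_i(n): number of balls of each color
    for _ in range(n_draws):
        r = rng.random() * (black_mass + sum(color_mass))
        if r < black_mass or not color_mass:
            # a black ball is drawn: add a black ball of mass alpha and a new color of mass 1 - alpha
            black_mass += alpha
            color_mass.append(1 - alpha)
            counts.append(1)
        else:
            r -= black_mass
            chosen = len(color_mass) - 1  # fallback in case of floating-point round-off
            for i, m in enumerate(color_mass):
                if r < m:
                    chosen = i
                    break
                r -= m
            # a non-black ball is drawn: add one ball of mass one with the same color
            color_mass[chosen] += 1
            counts[chosen] += 1
    return counts
```

Sorting `[c / n for c in urn_counts(alpha, theta, n)]` in descending order then gives an approximate draw from $PD(\alpha,\theta)$ for large `n`.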

3. Dirichlet Distribution Derivation
For any $n \ge 2$, let $(X^n_1, \ldots, X^n_n)$ be a $\mathrm{Dirichlet}(a_1, \ldots, a_n)$ random vector with order statistics $(X^n_{[1]}, \ldots, X^n_{[n]})$. Assume that
$$\max\{a_1, \ldots, a_n\} \to 0, \qquad \sum_{i=1}^{n} a_i \to \theta, \qquad n \to \infty.$$
Then
$$(X^n_{[1]}, \ldots, X^n_{[n]}, 0, 0, \ldots) \to (P_1(0,\theta), P_2(0,\theta), \ldots) \quad \text{in distribution}.$$
This derivation explains the name Poisson-Dirichlet and works for the case $\alpha = 0$.
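A concrete instance of this limit is the symmetric choice $a_i = \theta/n$, which satisfies both assumptions. The sketch below (our own helper, standard library only) samples the Dirichlet vector by normalizing independent $\mathrm{Gamma}(\theta/n, 1)$ variables and sorts it; for large $n$ its leading entries approximate those of $PD(0,\theta)$.

```python
import random

def sorted_symmetric_dirichlet(theta, n, seed=None):
    """Descending order statistics of a Dirichlet(theta/n, ..., theta/n) vector.

    Uses the Gamma representation: independent Gamma(theta/n, 1) variables,
    divided by their sum, form a Dirichlet(theta/n, ..., theta/n) vector.
    """
    rng = random.Random(seed)
    g = [rng.gammavariate(theta / n, 1.0) for _ in range(n)]
    total = sum(g)
    return sorted((x / total for x in g), reverse=True)
```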

4. Subordinator Representation
Definition 4.1 A process $\{\tau_s : s \ge 0\}$ is called a subordinator if it has stationary, independent and non-negative increments with $\tau_0 = 0$.
Definition 4.2 A subordinator $\{\tau_s : s \ge 0\}$ has no drift if for any $\lambda \ge 0$, $s \ge 0$,
$$E[e^{-\lambda \tau_s}] = \exp\left\{-s \int_0^{\infty} (1 - e^{-\lambda x})\,\Lambda(dx)\right\},$$
where $\Lambda$ is the Lévy measure on $[0,\infty)$.
Example 4.1 For the Poisson process $\{N_s : s \ge 0\}$ with parameter $c > 0$,
$$E[e^{-\lambda N_s}] = \exp\{-cs(1 - e^{-\lambda})\}.$$
The corresponding Lévy measure is $\Lambda(dx) = c\,\delta_1(dx)$.

Example 4.2 The subordinator $\{\gamma_s : s \ge 0\}$ is called a Gamma subordinator if its Lévy measure is $\Lambda(dx) = x^{-1}e^{-x}\,dx$, $x > 0$. In this case,
$$E[e^{-\lambda \gamma_s}] = \exp\left\{-s\int_0^{\infty}(1 - e^{-\lambda x})\,x^{-1}e^{-x}\,dx\right\} = \frac{1}{(1+\lambda)^{s}}.$$
Example 4.3 The stable subordinator $\{\rho_s : s \ge 0\}$ with index $\alpha \in (0,1)$ has Lévy measure $\Lambda(dx) = \alpha x^{-(1+\alpha)}\,dx$, $x > 0$. In this case,
$$E[e^{-\lambda \rho_s}] = \exp\{-s\,\Gamma(1-\alpha)\,\lambda^{\alpha}\}.$$
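The last equality in Example 4.2 rests on the identity $\int_0^\infty (1-e^{-\lambda x})\,x^{-1}e^{-x}\,dx = \log(1+\lambda)$. As a quick numerical sanity check (our own sketch, not part of the slides), Simpson's rule on a truncated interval reproduces it; the cutoff `upper=60` and grid size are assumptions chosen so that the neglected tail, of order $e^{-60}$, is negligible.

```python
import math

def gamma_laplace_exponent(lam, upper=60.0, n=20_000):
    """Simpson's-rule value of int_0^inf (1 - exp(-lam*x)) x^{-1} e^{-x} dx.

    The integrand extends continuously to x = 0 with value lam; the tail
    beyond `upper` is of order exp(-upper) and is ignored.
    """
    def f(x):
        if x == 0.0:
            return lam
        return -math.expm1(-lam * x) * math.exp(-x) / x  # (1 - e^{-lam x}) e^{-x} / x

    h = upper / n  # n must be even for Simpson's rule
    s = f(0.0) + f(upper)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * h)
    return s * h / 3
```

Exponentiating, `math.exp(-s * gamma_laplace_exponent(lam))` should agree with `(1 + lam) ** (-s)` to many digits.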

Example 4.4 The subordinator $\{\varrho_s : s \ge 0\}$ is a generalized Gamma process with scale parameter one (Brix (99), Lijoi, Mena and Prünster (07)) if its Lévy measure is $\Lambda(dx) = \Gamma(1-\alpha)^{-1}x^{-(1+\alpha)}e^{-x}\,dx$, $x > 0$. In this case,
$$E[e^{-\lambda \varrho_s}] = \exp\left\{-\frac{s}{\alpha}\left((\lambda+1)^{\alpha} - 1\right)\right\}.$$
In general, let $\{\tau_s : s \ge 0\}$ be a drift-free subordinator with Lévy measure $\Lambda$ satisfying
(i) $\Lambda(0,\infty) = +\infty$,
(ii) $\int_0^{\infty} (x \wedge 1)\,\Lambda(dx) < +\infty$.
Remark: It follows from Campbell's theorem that condition (ii) guarantees that for every $t > 0$, $\tau_t < \infty$ almost surely.

Let $V_1(\tau_t) \ge V_2(\tau_t) \ge \cdots$ denote the jump sizes of $\{\tau_s : s \ge 0\}$ up to time $t$, ranked by size. Then the sequence is infinite due to (i), and
$$\tau_t = \sum_{i=1}^{\infty} V_i(\tau_t) < \infty$$
due to (ii). Set
$$U_i(\tau_t) = \frac{V_i(\tau_t)}{\tau_t}.$$
Then $(U_1(\tau_t), U_2(\tau_t), \ldots)$ forms a random discrete probability.

It follows from direct calculation that the Lévy measures of the Gamma subordinator, the stable subordinator and the standard generalized Gamma process satisfy both (i) and (ii).
Theorem 1. (Kingman (75), Pitman and Yor (97))
(1) For $\theta > 0$, the law of $(U_1(\gamma_\theta), U_2(\gamma_\theta), \ldots)$ is $PD(0,\theta)$;
(2) For $0 < \alpha < 1$, the law of $(U_1(\rho_s), U_2(\rho_s), \ldots)$ is the same for all positive $s$ and is $PD(\alpha, 0)$;
(3) For $0 < \alpha < 1$, $\theta > 0$, the law of $(U_1(\sigma_{\alpha,\theta}), U_2(\sigma_{\alpha,\theta}), \ldots)$ is $PD(\alpha,\theta)$, where
$$\sigma_{\alpha,\theta} = \varrho_{\gamma(\theta/\alpha)/\Gamma(1-\alpha)} =: \varrho\left(\frac{\gamma(\theta/\alpha)}{\Gamma(1-\alpha)}\right).$$

Example 4.5 (Brownian Motion). Let $B_t$ be the standard Brownian motion. Define $Z = \{t \ge 0 : B_t = 0\}$ and let $L_t$ denote the local time of $B_t$ at zero. Set $\tau_s = \inf\{t \ge 0 : L_t > s\}$. Then $\{\tau_s : s \ge 0\}$ is a stable subordinator with index $1/2$, and $Z$ is the closure of the range of $\{\tau_s : s \ge 0\}$. Thus the lengths of the excursion intervals of $B_t$ correspond to the jump sizes of the stable subordinator. For any $t > 0$, let $V_1(t) > V_2(t) > \cdots$ be the ranked sequence of the excursion lengths up to time $t$, including the length of the meander (the last interval). Then the law of
$$\left(\frac{V_1(t)}{t}, \frac{V_2(t)}{t}, \ldots\right)$$
is $PD(1/2, 0)$.
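A discrete analogue of Example 4.5 can be computed from a simple symmetric random walk: cut the time interval at the returns to zero, keep the final incomplete piece as the meander, rank the lengths and divide by the number of steps. This is only a heuristic illustration (our own sketch, relying on the walk approximating Brownian motion for many steps); it is not an exact sampler for $PD(1/2, 0)$.

```python
import random

def ranked_excursion_fractions(n_steps, seed=None):
    """Ranked lengths of the zero-free intervals of a simple random walk, divided by n_steps.

    The walk starts at 0; each return to 0 closes an excursion, and the final
    incomplete interval (the meander) is included, so the lengths sum to n_steps.
    """
    rng = random.Random(seed)
    position, last_zero, lengths = 0, 0, []
    for step in range(1, n_steps + 1):
        position += 1 if rng.random() < 0.5 else -1
        if position == 0:
            lengths.append(step - last_zero)  # completed excursion
            last_zero = step
    if last_zero < n_steps:
        lengths.append(n_steps - last_zero)   # meander
    return sorted((length / n_steps for length in lengths), reverse=True)
```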

5. Gamma-Dirichlet Algebra
The focus here will be on the case $\alpha = 0$. A more thorough treatment of the general case can be found in James (03, 05a, 05b). Let $S$ be a Polish space, $\nu_0$ a probability on $S$, and $\theta$ and $\beta$ any two positive numbers.
Definition: The Gamma process with shape parameter $\theta\nu_0$ and scale parameter $\beta$ is the random measure on $S$ given by
$$\Upsilon^{\beta}_{\theta,\nu_0}(\cdot) = \beta \sum_{i=1}^{\infty} \gamma_i\,\delta_{\xi_i}(\cdot),$$
where $\gamma_1 > \gamma_2 > \cdots$ are the points of the inhomogeneous Poisson point process on $(0,\infty)$ with mean measure $\theta x^{-1}e^{-x}\,dx$, and, independently, $\xi_1, \xi_2, \ldots$ are i.i.d. with common distribution $\nu_0$.

Denote the law of $\Upsilon^{\beta}_{\theta,\nu_0}$ by $\Gamma^{\beta}_{\theta,\nu_0}$. The corresponding Laplace functional has the form
$$\int_{M(S)} e^{-\langle \mu, g\rangle}\,\Gamma^{\beta}_{\theta,\nu_0}(d\mu) = \exp\{-\theta\langle \nu_0, \log(1 + \beta g)\rangle\},$$
where $M(S)$ is the set of all non-negative finite measures on $S$, and $g(s) > -1/\beta$ for all $s \in S$.

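Set
$$\sigma = \sum_{i=1}^{\infty}\gamma_i, \qquad P_i = \frac{\gamma_i}{\sigma}, \quad i = 1, 2, \ldots, \qquad X_{\theta,\nu_0}(\cdot) = \sum_{i=1}^{\infty} P_i\,\delta_{\xi_i}(\cdot).$$
The law of $(P_1, P_2, \ldots)$ is the Poisson-Dirichlet distribution with parameter $\theta$, $X_{\theta,\nu_0}$ equals in distribution the Dirichlet process $\Xi_{0,\theta,\nu_0}$, whose law is denoted by $\Pi_{\theta,\nu_0}$, and
$$X_{\theta,\nu_0}(\cdot) = \frac{\Upsilon^{\beta}_{\theta,\nu_0}(\cdot)}{\Upsilon^{\beta}_{\theta,\nu_0}(S)}. \qquad (1)$$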

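Algebraic Relations
1. Additive property: for independent $\Upsilon^{\beta}_{\theta_1,\nu_1}$ and $\Upsilon^{\beta}_{\theta_2,\nu_2}$,
$$\Upsilon^{\beta}_{\theta_1,\nu_1} + \Upsilon^{\beta}_{\theta_2,\nu_2} \overset{d}{=} \Upsilon^{\beta}_{\theta_1+\theta_2,\ \frac{\theta_1}{\theta_1+\theta_2}\nu_1 + \frac{\theta_2}{\theta_1+\theta_2}\nu_2},$$
where $\overset{d}{=}$ denotes equality in distribution.
2. Mixing: for independent $\eta \sim \mathrm{Beta}(\theta_1,\theta_2)$, $X_{\theta_1,\nu_1}$ and $X_{\theta_2,\nu_2}$,
$$\eta X_{\theta_1,\nu_1} + (1-\eta) X_{\theta_2,\nu_2} \overset{d}{=} X_{\theta_1+\theta_2,\ \frac{\theta_1}{\theta_1+\theta_2}\nu_1 + \frac{\theta_2}{\theta_1+\theta_2}\nu_2}.$$
3. Markov-Krein identity:
$$\int_{M_1(S)} (1 + \lambda\langle \nu, f\rangle)^{-\theta}\,\Pi_{\theta,\nu_0}(d\nu) = \exp\{-\theta\langle \nu_0, \log(1+\lambda f)\rangle\}.$$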
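The Markov-Krein identity can be checked by simulation when $S = \{s_1, \ldots, s_K\}$ is finite. In that case the normalized Gamma process in (1) reduces to a Dirichlet vector with parameters $\theta\nu_0(\{s_j\})$, so the left side is an ordinary expectation. The sketch below (our own helper; it assumes $f \ge 0$ and $\lambda \ge 0$) compares a Monte Carlo estimate with the closed form on the right.

```python
import math
import random

def dirichlet(alphas, rng):
    """Sample a Dirichlet(alphas) vector by normalizing independent Gamma(alpha_j, 1) variables."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def markov_krein_check(theta, weights, f, lam, n_samples=100_000, seed=0):
    """Return (Monte Carlo estimate of E[(1 + lam <X, f>)^(-theta)],
               exp(-theta <nu0, log(1 + lam f)>)) for a finite S.

    weights[j] = nu0({s_j}) and f[j] = f(s_j); X ~ Dirichlet(theta * weights).
    """
    rng = random.Random(seed)
    alphas = [theta * w for w in weights]
    acc = 0.0
    for _ in range(n_samples):
        x = dirichlet(alphas, rng)
        integral = sum(xi * fi for xi, fi in zip(x, f))  # <X, f>
        acc += (1.0 + lam * integral) ** (-theta)
    lhs = acc / n_samples
    rhs = math.exp(-theta * sum(w * math.log(1.0 + lam * fi) for w, fi in zip(weights, f)))
    return lhs, rhs
```

With `theta=2.0`, `weights=[1/3, 1/3, 1/3]`, `f=[0.5, 1.0, 2.0]` and `lam=1.0`, the right side equals $9^{-2/3}$ and the estimate should agree to about three decimal places.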

Formal Hamiltonian
Consider an abstract space $\Omega$ with a formal reference probability measure $P$ (uniform or invariant in some sense). The formal Hamiltonian $H(\omega)$ is a function associated with another probability $Q$ such that
$$Q(d\omega) = Z^{-1}\exp\{-H(\omega)\}\,P(d\omega).$$
For each $\mu \in M(S)$, set $\hat{\mu}(\cdot) = \mu(\cdot)/\mu(S) \in M_1(S)$. Let
$$\varphi(x) = x\log x - (x-1), \quad x \ge 0,$$
and, for any $\nu_1, \nu_2 \in M_1(S)$,
$$\mathrm{Ent}(\nu_1\|\nu_2) = \begin{cases} \int_{S} \log\frac{d\nu_1}{d\nu_2}\,d\nu_1, & \nu_1 \ll \nu_2,\\ +\infty, & \text{else}. \end{cases}$$

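Hamiltonian for the Gamma process $\Gamma^{\beta}_{\theta,\nu_0}$ (Handa (01)):
$$H_g(\mu) = \theta\,\mathrm{Ent}(\nu_0\|\hat{\mu}) + \frac{\mu(S)}{\beta}\,\varphi\left(\frac{\beta\theta}{\mu(S)}\right) = \text{angular component} + \text{radial component}.$$
Hamiltonian for the Dirichlet process $\Pi_{\theta,\nu_0}$ (Dawson and F (98), Handa (01)):
$$H_d(\nu) = \theta\,\mathrm{Ent}(\nu_0\|\nu), \quad \nu \in M_1(S).$$
For $\beta\theta = 1$, we have $H_g(\mu)\big|_{\mu(S)=1} = H_d(\hat{\mu})$.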
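As a formal consistency check of the radial component $\frac{t}{\beta}\varphi\left(\frac{\beta\theta}{t}\right)$ with $t = \mu(S)$ (a short calculation added here, not from the slides), expanding $\varphi(x) = x\log x - (x-1)$ with $x = \beta\theta/t$ gives
$$\frac{t}{\beta}\,\varphi\left(\frac{\beta\theta}{t}\right) = \theta\log\frac{\beta\theta}{t} - \theta + \frac{t}{\beta} = \frac{t}{\beta} - \theta\log t + \text{const},$$
so that, formally, $\exp\{-\text{radial component}\} \propto t^{\theta}e^{-t/\beta}$, i.e. $t^{\theta-1}e^{-t/\beta}$ relative to the scale-invariant measure $dt/t$. This is the Gamma law with shape $\theta$ and scale $\beta$, which is indeed the law of the total mass $\Upsilon^{\beta}_{\theta,\nu_0}(S) = \beta\sum_i \gamma_i$.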

Quasi-invariance of the Gamma process $\Gamma^{\beta}_{\theta,\nu_0}$ (Tsilevich, Vershik and Yor (01)): Let $B_+(S)$ be the collection of positive Borel measurable functions on $S$ with strictly positive lower bound. For each $f \in B_+(S)$, set
$$T_f(\mu)(dx) = f(x)\,\mu(dx), \quad \mu \in M(S),$$
and let $T_f(\Gamma^{\beta}_{\theta,\nu_0})$ denote the image law of $\Gamma^{\beta}_{\theta,\nu_0}$ under $T_f$. Then $T_f(\Gamma^{\beta}_{\theta,\nu_0})$ and $\Gamma^{\beta}_{\theta,\nu_0}$ are mutually absolutely continuous and
$$\frac{dT_f(\Gamma^{\beta}_{\theta,\nu_0})}{d\Gamma^{\beta}_{\theta,\nu_0}}(\mu) = \exp\left\{-\left[\theta\langle \nu_0, \log f\rangle + \langle \mu, \beta^{-1}(f^{-1} - 1)\rangle\right]\right\}.$$

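Quasi-invariance (cont.) of the Dirichlet process $\Pi_{\theta,\nu_0}$ (Handa (01)): For each $f \in B_+(S)$, set
$$T_f(\nu)(dx) = \frac{f(x)\,\nu(dx)}{\langle \nu, f\rangle}, \quad \nu \in M_1(S),$$
where $\langle \nu, f\rangle$ denotes the integral of $f$ with respect to $\nu$. Let $T_f(\Pi_{\theta,\nu_0})$ denote the image law of $\Pi_{\theta,\nu_0}$ under $T_f$. Then $T_f(\Pi_{\theta,\nu_0})$ and $\Pi_{\theta,\nu_0}$ are mutually absolutely continuous and
$$\frac{dT_f(\Pi_{\theta,\nu_0})}{d\Pi_{\theta,\nu_0}}(\nu) = \exp\left\{-\theta\left[\langle \nu_0, \log f\rangle + \log\langle \nu, f^{-1}\rangle\right]\right\}.$$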

Part II: Stochastic Dynamics

1. Wright-Fisher Model
Consider a population of $2N$ individuals. Each individual is one of two types: $A$ or $a$. The population evolves under the influence of mutation and random sampling.
Mutation: the two types mutate to each other at rate $u$.
Random sampling: each individual in the next generation is sampled randomly, with replacement, from the current population.

Wright-Fisher Model (cont.)
The number of type $A$ individuals in the new generation is thus binomial with parameters $2N$ and $p$, where $p$ is the proportion of type $A$ individuals in the population after mutation. Let $X_N(t)$ denote the proportion of type $A$ individuals in the population at time (generation) $t$. Then $X_N(t)$ is the two-type Wright-Fisher Markov chain.
Diffusion approximation: count time in units of $2N$ generations and scale the mutation rate by a factor of $1/(2N)$. The scaling limit of the proportion of type $A$ individuals then follows the SDE
$$dx_t = \frac{\theta}{4}(1 - 2x_t)\,dt + \sqrt{x_t(1-x_t)}\,dB_t, \qquad \theta = 4Nu.$$
The unique stationary distribution of the SDE is $\mathrm{Beta}(\theta/2, \theta/2)$.
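The two-type chain $X_N(t)$ is easy to simulate. The sketch below is our own and assumes that, in each generation, every individual independently switches type with probability `u` before the binomial sampling step; the binomial draw is written as a sum of Bernoulli trials so that only the standard library is needed. With `two_n=100` and `u=0.005` the value $\theta = 4Nu$ equals 1.

```python
import random

def wright_fisher_chain(two_n, u, x0, generations, seed=None):
    """Two-type Wright-Fisher chain: returns [X_N(0), X_N(1), ..., X_N(generations)].

    two_n is the population size 2N; u is the per-generation probability that an
    individual switches type (A <-> a) before sampling.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(generations):
        p = x * (1 - u) + (1 - x) * u  # proportion of type A after mutation
        # Binomial(2N, p) number of type A individuals in the next generation
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        x = count / two_n
        path.append(x)
    return path
```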

Finite Type Wright-Fisher Model
Finite-type models: the Wright-Fisher model allows the number of types to be any finite number $K$.
Mutation: symmetric and parent independent, i.e., type $i$ mutates to type $j$ at rate $u/K$.
Sampling: multinomial sampling.
Finite-type diffusion approximation:
$$dx_i(t) = b_i(x(t))\,dt + \sum_{j=1}^{K-1}\sigma_{ij}(x(t))\,dB_j(t), \quad \text{with } b_i(x(t)) = \frac{\theta}{2}\left(\frac{1}{K} - x_i(t)\right),$$

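and
$$\sum_{l=1}^{K-1}\sigma_{il}(x(t))\,\sigma_{jl}(x(t)) = x_i(t)\,(\delta_{ij} - x_j(t)).$$
The unique stationary distribution is $\mathrm{Dirichlet}(\theta/K, \ldots, \theta/K)$. The generator of the diffusion has the form
$$\frac{1}{2}\left[\sum_{i,j=1}^{K-1} x_i(\delta_{ij} - x_j)\frac{\partial^2}{\partial x_i\,\partial x_j} + \theta\sum_{i=1}^{K-1}\left(\frac{1}{K} - x_i\right)\frac{\partial}{\partial x_i}\right].$$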

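2. Infinitely-Many-Alleles Model
For any $n \ge 1$, let
$$\varphi_1(x) = 1, \qquad \varphi_n(x) = \sum_{i=1}^{\infty} x_i^n, \quad n \ge 2, \ x \in \nabla_\infty,$$
and
$$D_0 = \text{the algebra generated by } \{\varphi_n : n \ge 1\}.$$
For $f \in D_0$, set
$$L_{\alpha,\theta} f(x) = \frac{1}{2}\left[\sum_{i,j=1}^{\infty} x_i(\delta_{ij} - x_j)\frac{\partial^2 f}{\partial x_i\,\partial x_j} - \sum_{i=1}^{\infty}(\theta x_i + \alpha)\frac{\partial f}{\partial x_i}\right].$$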

Theorem 2. (Petrov (09)) (1) The generator $L_{\alpha,\theta}$ defined on $D_0$ is closable in $C(\nabla_\infty)$. The closure, also denoted by $L_{\alpha,\theta}$ for notational simplicity, generates a unique $\nabla_\infty$-valued diffusion process $X_{\alpha,\theta}(t)$, the two-parameter infinite-allele diffusion process; (2) The process $X_{\alpha,\theta}(t)$ is reversible with respect to $PD(\alpha,\theta)$.
Remarks: (a) Theorem 2 generalizes the results in Ethier and Kurtz (81) and Ethier (92), where $\alpha = 0$. Ruggiero and Walker (09) provide an alternative derivation of the infinite-allele diffusion process. The same model is studied in F and Sun (10) using techniques from the theory of Dirichlet forms. (b) Mimicking the derivation of the Poisson-Dirichlet distribution from Dirichlet distributions, the case $\alpha = 0$ can be derived from the finite-type Wright-Fisher diffusions by letting $K$ tend to infinity.
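A small worked consequence of Theorem 2 (our own calculation, using $\sum_i x_i = 1$ on the support of $PD(\alpha,\theta)$): for $\varphi_2(x) = \sum_i x_i^2$ we have $\partial^2\varphi_2/\partial x_i\partial x_j = 2\delta_{ij}$ and $\partial\varphi_2/\partial x_i = 2x_i$, hence
$$L_{\alpha,\theta}\varphi_2(x) = \sum_i x_i(1-x_i) - \sum_i(\theta x_i + \alpha)x_i = (1-\alpha) - (1+\theta)\varphi_2(x).$$
Since $PD(\alpha,\theta)$ is stationary for the process, $E[L_{\alpha,\theta}\varphi_2] = 0$, which gives $E\left[\sum_i P_i(\alpha,\theta)^2\right] = (1-\alpha)/(1+\theta)$. The sketch below checks this by Monte Carlo using GEM stick-breaking (the sum of squares does not depend on the ordering). The stopping rule `tol` is an assumption: each sample stops once the unbroken stick is below `tol`, which changes the sum of squares by at most `tol**2`.

```python
import random

def mean_sum_of_squares(alpha, theta, n_samples=5000, tol=1e-2, seed=None):
    """Monte Carlo estimate of E[sum_i P_i^2] under PD(alpha, theta) via GEM stick-breaking."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        remaining, s, i = 1.0, 0.0, 1
        while remaining > tol:
            u = rng.betavariate(1 - alpha, theta + i * alpha)  # U_i
            s += (remaining * u) ** 2                          # V_i^2
            remaining *= 1 - u
            i += 1
        total += s
    return total / n_samples
```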

3. Fleming-Viot Process
Let $S$ be a compact metric space, $C(S)$ the set of continuous functions on $S$, and $\nu_0$ a diffuse probability in $M_1(S)$. Consider the operator $A$ of the form
$$Af(x) = \frac{\theta}{2}\int_S (f(y) - f(x))\,\nu_0(dy), \quad f \in C(S).$$
Define
$$\mathcal{D} = \{u : u(\mu) = f(\langle \mu, \phi\rangle),\ f \in C_b^{\infty}(\mathbb{R}),\ \phi \in C(S),\ \mu \in M_1(S)\},$$
where $\langle \mu, \phi\rangle$ is the integral of $\phi$ with respect to $\mu$ and $C_b^{\infty}(\mathbb{R})$ denotes the set of all bounded, infinitely differentiable functions on $\mathbb{R}$.

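The Fleming-Viot process with neutral parent-independent mutation (FV process) is a pure atomic measure-valued Markov process with generator
$$\mathcal{A}u(\mu) = \left\langle \mu(\cdot), A\,\frac{\delta u(\mu)}{\delta\mu(\cdot)}\right\rangle + \frac{f''(\langle \mu, \phi\rangle)}{2}\,\langle \phi, \phi\rangle_{\mu}, \quad u \in \mathcal{D},$$
whose first term describes mutation and whose second term describes sampling. Here
$$\frac{\delta u(\mu)}{\delta\mu(x)} = \lim_{\varepsilon\to 0+}\varepsilon^{-1}\{u((1-\varepsilon)\mu + \varepsilon\delta_x) - u(\mu)\}, \qquad \langle \phi, \psi\rangle_{\mu} = \langle \mu, \phi\psi\rangle - \langle \mu, \phi\rangle\langle \mu, \psi\rangle,$$
and $\delta_x$ stands for the Dirac measure at $x \in S$. The FV process is reversible, with the Dirichlet process $\Pi_{\theta,\nu_0}$ as the reversible measure (Ethier (90)).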

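4. Dynamical Analogue of the Gamma-Dirichlet Algebra
The measure-valued branching diffusion with immigration $\{Y_t\}$ has generator
$$L = \frac{1}{2}\left\{\left\langle \theta\nu_0 - \lambda\mu, \frac{\delta}{\delta\mu(x)}\right\rangle + \left\langle \mu, \frac{\delta^2}{\delta\mu(x)^2}\right\rangle\right\} = \text{immigration and drift} + \text{branching},$$
where $\theta > 0$, $\lambda = 1/\beta > 0$ and $\nu_0 \in M_1(S)$. Let $N_t$ be the standard Poisson process with rate one and set
$$C(\lambda,t) = \lambda^{-1}(e^{\lambda t/2} - 1), \qquad q^{a,\lambda}_n(t) = P\{N_{a/C(\lambda,t)} = n\}, \quad a > 0,\ n = 0, 1, 2, \ldots.$$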

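Transition function (Ethier and Griffiths (93)):
$$P_1(t,\mu,\cdot) = q^{\mu(S),\lambda}_0(t)\,\Gamma^{C(-\lambda,t)}_{\theta,\nu_0}(\cdot) + \sum_{n=1}^{\infty} q^{\mu(S),\lambda}_n(t)\int_{S^n}\left(\frac{\mu}{\mu(S)}\right)^{\otimes n}(dx_1\cdots dx_n)\,\Gamma^{C(-\lambda,t)}_{n+\theta,\ \frac{n}{\theta+n}\eta_n + \frac{\theta}{\theta+n}\nu_0}(\cdot),$$
where $\eta_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{x_i}$. The process is reversible with reversible measure $\Gamma^{\beta}_{\theta,\nu_0}$.
Marginal distribution: given $Y_0 = \mu$ and $N_{\mu(S)/C(\lambda,t)} = n$, it follows that for any $t > 0$
$$Y_t \overset{d}{=} \Upsilon^{C(-\lambda,t)}_{n,\eta_n} + \Upsilon^{C(-\lambda,t)}_{\theta,\nu_0}.$$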

Transition function of the FV process (Ethier and Griffiths (93)):
$$P_2(t,\nu,\cdot) = d^{\theta}_0(t)\,\Pi_{\theta,\nu_0}(\cdot) + \sum_{n=1}^{\infty} d^{\theta}_n(t)\int_{S^n}\nu^{\otimes n}(dx_1\cdots dx_n)\,\Pi_{n+\theta,\ \frac{n}{\theta+n}\eta_n + \frac{\theta}{\theta+n}\nu_0}(\cdot),$$
where $\{d^{\theta}_n(t) : t > 0,\ n = 0, 1, \ldots\}$ is the marginal distribution of a pure death process $\{D^{\theta}_t : t \ge 0\}$ taking values in $\{\infty, 0, 1, \ldots\}$ with death rates $\{n(n+\theta-1)/2 : n = 0, 1, \ldots\}$ and entrance boundary $\infty$.
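The coefficients $d^{\theta}_n(t)$ can be estimated by simulating the pure death process with rates $n(n+\theta-1)/2$. Since a simulation cannot start at $\infty$, the sketch below (our own helpers, assuming $\theta > 0$) starts from a large finite state `n0` as an approximation of the entrance boundary; because these rates grow quadratically the process comes down quickly, so a moderate `n0` should suffice unless $t$ is very small.

```python
import random

def death_process_state(theta, t, n0, rng):
    """State at time t of the pure death process with rates n(n + theta - 1)/2, started at n0."""
    n, clock = n0, 0.0
    while n > 0:
        rate = n * (n + theta - 1) / 2
        clock += rng.expovariate(rate)  # exponential holding time in state n
        if clock > t:
            break
        n -= 1
    return n

def estimate_d(theta, t, n_runs=2000, n0=1000, seed=0):
    """Monte Carlo estimate of {n: d_n^theta(t)} from n_runs independent paths."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_runs):
        n = death_process_state(theta, t, n0, rng)
        counts[n] = counts.get(n, 0) + 1
    return {n: c / n_runs for n, c in sorted(counts.items())}
```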

Coefficients: The pure death process $\{D^{\theta}_t : t > 0\}$ is the embedded chain of Kingman's coalescent, and $d^{\theta}_n(t)$ is the probability of having $n$ different families at time $t$, or $n$ lines of descent beginning at generation zero. The time-changed Poisson process $N_{\mu(S)/C(\lambda,t)}$ is a time-inhomogeneous pure death process with death rate $n/(2C(-\lambda,t))$ from state $n \ge 0$ at time $t > 0$. It represents the number of non-immigrant individuals in the population at time $t$.
Comparison between $P_1(t,\mu,\cdot)$ and $P_2(t,\nu,\cdot)$: there is a termwise Gamma-Dirichlet structure between
$$\Gamma^{C(-\lambda,t)}_{n+\theta,\ \frac{n}{\theta+n}\eta_n + \frac{\theta}{\theta+n}\nu_0}(\cdot) \quad \text{and} \quad \Pi_{n+\theta,\ \frac{n}{\theta+n}\eta_n + \frac{\theta}{\theta+n}\nu_0}(\cdot).$$
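The weights $q^{a,\lambda}_n(t)$ are Poisson probabilities with mean $a/C(\lambda,t)$, and the per-individual death rate of $N_{a/C(\lambda,t)}$ is $-\frac{d}{dt}\log\frac{a}{C(\lambda,t)} = \frac{C'(\lambda,t)}{C(\lambda,t)}$, which simplifies to $\frac{1}{2C(-\lambda,t)}$. The sketch below (our own helpers; it assumes $\lambda \ne 0$) computes both and checks the rate identity with a central finite difference.

```python
import math

def C(lam, t):
    """C(lambda, t) = (exp(lambda*t/2) - 1)/lambda, as defined in the text (lambda != 0)."""
    return math.expm1(lam * t / 2) / lam

def q(n, a, lam, t):
    """q_n^{a,lambda}(t) = P{N_{a/C(lambda,t)} = n} for a rate-one Poisson process N."""
    mean = a / C(lam, t)
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))

def per_individual_death_rate(lam, t, h=1e-6):
    """-(d/dt) log(a / C(lambda, t)) = (d/dt) log C(lambda, t), by a central difference."""
    return (math.log(C(lam, t + h)) - math.log(C(lam, t - h))) / (2 * h)
```

For example, `per_individual_death_rate(0.5, 1.0)` should match `1 / (2 * C(-0.5, 1.0))` up to finite-difference error.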

Part III: Asymptotics

1. Sampling Formula and Large Sample Approximation
For any $n \ge 1$, a random sample of size $n$ from the two-parameter Poisson-Dirichlet population consists of $n$ positive integer-valued random variables $X_1, \ldots, X_n$ which, given $(P_1(\alpha,\theta), P_2(\alpha,\theta), \ldots) = (x_1, x_2, \ldots)$, are i.i.d. with common distribution $(x_1, x_2, \ldots)$. Set
$$\mathcal{A}_n = \left\{(a_1, \ldots, a_n) : a_k \text{ is a nonnegative integer},\ k = 1, \ldots, n;\ \sum_{i=1}^{n} i a_i = n\right\},$$
$$A_k = \text{the number of values appearing in the sample exactly } k \text{ times}.$$
An example: $n = 3$. If one value appears in the sample once and another appears twice, then $A_1 = 1$, $A_2 = 1$, $A_3 = 0$. If three distinct values appear in the sample, then $A_1 = 3$, $A_2 = 0$, $A_3 = 0$. If only one value appears in the sample, then $A_1 = 0$, $A_2 = 0$, $A_3 = 1$.

An equivalent way is to partition $[0,1]$ into a random countable union of disjoint subintervals with lengths $(P_1(\alpha,\theta), P_2(\alpha,\theta), \ldots)$ and pick $n$ points independently and uniformly from the unit interval. Then $A_k$ is the number of intervals that contain exactly $k$ points.
Theorem 3. (Ewens (72), Pitman (92)) The random vector $A_n = (A_1, \ldots, A_n)$ is an $\mathcal{A}_n$-valued random variable with distribution given by the well-known Pitman sampling formula:
$$P\{A_i = a_i,\ i = 1, \ldots, n\} = \frac{n!\,\prod_{l=0}^{k-1}(\theta + l\alpha)}{\theta_{(n)}}\prod_{j=1}^{n}\frac{\left((1-\alpha)_{(j-1)}\right)^{a_j}}{(j!)^{a_j}\,a_j!},$$
where $(a_1, \ldots, a_n) \in \mathcal{A}_n$ and $k = \sum_{j=1}^{n} a_j$. The notation $x_{(m)}$ is the ascending factorial $x(x+1)\cdots(x+m-1)$, with $x_{(0)} = 1$. The case $\alpha = 0$ is the well-known Ewens sampling formula.
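The sampling formula can be evaluated exactly with rational arithmetic, which gives a direct check that it defines a probability on $\mathcal{A}_n$. The sketch below is our own: it enumerates $\mathcal{A}_n$ recursively and evaluates the formula with `fractions.Fraction` (pass $\alpha$ and $\theta$ as `Fraction`s or integers for exact results).

```python
from fractions import Fraction
from math import factorial

def rising(x, m):
    """Ascending factorial x_(m) = x (x+1) ... (x+m-1), with x_(0) = 1."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i
    return out

def pitman_probability(a, alpha, theta):
    """P{A_1 = a[0], ..., A_n = a[n-1]} from the Pitman sampling formula."""
    n = sum((j + 1) * aj for j, aj in enumerate(a))
    k = sum(a)
    p = Fraction(factorial(n)) / rising(theta, n)
    for i in range(k):
        p *= theta + i * alpha
    for j, aj in enumerate(a, start=1):
        p *= (rising(1 - alpha, j - 1) / factorial(j)) ** aj / factorial(aj)
    return p

def allele_vectors(n):
    """All (a_1, ..., a_n) of nonnegative integers with sum_j j*a_j = n, i.e. the set A_n."""
    def rec(j, remaining):
        if j > n:
            if remaining == 0:
                yield ()
            return
        for aj in range(remaining // j + 1):
            for rest in rec(j + 1, remaining - j * aj):
                yield (aj,) + rest
    return list(rec(1, n))
```

For $n = 3$, $\alpha = 0$, $\theta = 1$ (a uniform random permutation), the three configurations of the example above get probabilities $1/2$, $1/6$ and $1/3$.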

Large Sample Approximation
An important random variable is
$$K_n(\alpha,\theta) = \text{the total number of different values in the sample},$$
where the special case $K_n(0,\theta)$ is a sufficient statistic for $\theta$.
Theorem 4. (Korwar and Hollander (73))
$$K_n(0,\theta) \overset{d}{=} \eta_1 + \cdots + \eta_n,$$
where $\eta_1, \eta_2, \ldots, \eta_n$ are independent and $\eta_i$ has a Bernoulli distribution with success probability $\frac{\theta}{\theta+i-1}$, and
$$\lim_{n\to\infty}\frac{K_n(0,\theta)}{\log n} = \theta \quad \text{almost surely}.$$
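Theorem 4 gives both a one-line sampler for $K_n(0,\theta)$ and its exact mean $E[K_n(0,\theta)] = \sum_{i=1}^{n}\frac{\theta}{\theta+i-1}$, which grows like $\theta\log n$. A minimal sketch (helper names are ours):

```python
import math
import random

def sample_k_n(theta, n, rng):
    """One draw of K_n(0, theta) via the Bernoulli representation of Theorem 4."""
    return sum(1 for i in range(1, n + 1) if rng.random() < theta / (theta + i - 1))

def mean_k_n(theta, n):
    """Exact E[K_n(0, theta)] = sum_{i=1}^n theta / (theta + i - 1)."""
    return sum(theta / (theta + i - 1) for i in range(1, n + 1))
```

The ratio `mean_k_n(theta, n) / math.log(n)` approaches `theta` only slowly, since the difference $E[K_n(0,\theta)] - \theta\log n$ stays bounded rather than vanishing.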

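Theorem 5. (Fluctuation Theorem)
(1) (Hansen (90) (fixed $\theta$), and Goncharov (44) ($\theta = 1$))
$$\frac{K_n(0,\theta) - \theta\log n}{\sqrt{\theta\log n}} \Rightarrow N(0,1), \quad n \to \infty.$$
(2) (Pitman (06)) For $0 < \alpha < 1$,
$$\frac{K_n(\alpha,\theta)}{n^{\alpha}} \to S_{\alpha,\theta} \quad \text{almost surely},$$
where $S_{\alpha,\theta}$ is related to the Mittag-Leffler distribution.
(3) (Arratia, Barbour and Tavaré (92)) When $\alpha = 0$,
$$(A_1, \ldots, A_n, 0, \ldots) \Rightarrow (Y_1, Y_2, \ldots), \quad n \to \infty,$$
where $Y_1, Y_2, \ldots$ are independent Poisson random variables and $Y_i$ has mean $\theta/i$.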
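For $\alpha > 0$ there is no Bernoulli representation, but $K_n(\alpha,\theta)$ can be generated sequentially from the urn of Part I by counting only new colors: after $m$ draws with $k$ colors the black mass is $\theta + k\alpha$ out of a total mass $\theta + m$. The sketch below (our own helper) assumes, as is standard, that the number of colors after $n$ draws has the law of $K_n(\alpha,\theta)$; tracking color sizes is unnecessary because the new-color probability depends only on $k$ and $m$. Printing `crp_table_count(alpha, theta, n) / n ** alpha` for increasing `n` illustrates part (2).

```python
import random

def crp_table_count(alpha, theta, n, seed=None):
    """Number of distinct colors after n draws of the urn, i.e. a draw of K_n(alpha, theta).

    Assumes 0 <= alpha < 1 and theta > -alpha. With m draws made and k colors seen,
    the next draw is black (a new color) with probability (theta + k*alpha)/(theta + m).
    """
    if n <= 0:
        return 0
    rng = random.Random(seed)
    k = 1  # the first draw always produces a new color
    for m in range(1, n):
        if rng.random() < (theta + k * alpha) / (theta + m):
            k += 1
    return k
```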

Large Deviations
Large deviations for a family of probability measures $\{P_\lambda : \lambda \in \text{index set}\}$ on a space $E$ are estimates of the following type:
$$P_\lambda\{G\} \approx \exp\left\{-a(\lambda)\inf_{x\in G} I(x)\right\},$$
where $a(\lambda)$ is called the large deviation speed and the nonnegative function $I(\cdot)$ is the rate function. Such estimates describe the most likely event among all unlikely events.
Theorem 6. (F and Hoppe (98)) (a) For appropriate subsets $A$ of $[0,\infty)$, we have
$$P\{K_n(0,\theta)/\log n \in A\} \approx \exp\left\{-\log n\,\inf_{x\in A} I(x)\right\},$$
where $I(x) = x\log\frac{x}{\theta} - x + \theta$.

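(b) For $0 < \alpha < 1$ and appropriate subsets $A$ of $[0,\infty)$,
$$P\{K_n(\alpha,\theta)/n \in A\} \approx \exp\left\{-n\inf_{x\in A} I(x)\right\},$$
where
$$I(x) = \sup_{\lambda}\{\lambda x - \Lambda_\alpha(\lambda)\}$$
and
$$\Lambda_\alpha(\lambda) = \begin{cases} \log\left[1 - (1 - e^{-\lambda})^{1/\alpha}\right]^{-1}, & \lambda > 0,\\ 0, & \text{else}. \end{cases}$$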
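Both rate functions of Theorem 6 are easy to evaluate numerically. The sketch below is ours: case (a) uses the closed form (with the convention $0\log 0 = 0$), while case (b) approximates the supremum by a grid search over $\lambda \in [0, 60]$, which suffices for $x \ge 0$ because $\Lambda_\alpha = 0$ for $\lambda \le 0$. The grid bounds are assumptions and the result is only as accurate as the grid; $\Lambda_\alpha$ is evaluated through `log1p`/`expm1` to avoid catastrophic cancellation.

```python
import math

def rate_a(x, theta):
    """I(x) = x log(x/theta) - x + theta for K_n(0, theta)/log n (0 log 0 = 0)."""
    if x < 0:
        return math.inf
    if x == 0:
        return theta
    return x * math.log(x / theta) - x + theta

def lambda_alpha(lam, alpha):
    """Lambda_alpha(lambda) = -log[1 - (1 - e^{-lambda})^{1/alpha}] for lambda > 0, else 0."""
    if lam <= 0:
        return 0.0
    log_base = math.log1p(-math.exp(-lam))           # log(1 - e^{-lambda})
    return -math.log(-math.expm1(log_base / alpha))  # -log(1 - (1 - e^{-lambda})^{1/alpha})

def rate_b(x, alpha, lam_max=60.0, steps=60_000):
    """Grid-search approximation of I(x) = sup_lambda {lambda x - Lambda_alpha(lambda)}, x >= 0."""
    best = 0.0  # value at lambda = 0
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        best = max(best, lam * x - lambda_alpha(lam, alpha))
    return best
```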

2. Large $\theta$ Approximation
Law of Large Numbers
Note that the parameter $\theta$ is the scaled population mutation rate in the case $\alpha = 0$. In general, it is related in a certain way to the size of a population. The limiting procedure of $\theta$ tending to infinity is thus associated with the behavior of a population as the population size tends to infinity.
WLLN:
$$\lim_{\theta\to\infty}(P_1(\alpha,\theta), P_2(\alpha,\theta), \ldots) = (0, 0, \ldots) \quad \text{in probability}.$$

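Fluctuations
Consider a sequence of random variables $\zeta_1 \ge \zeta_2 \ge \cdots$ such that, for each $r \ge 1$ and $1 \le k \le r$, the densities are
$$\zeta_1 \sim e^{-u}e^{-e^{-u}}, \ \text{i.e., } \zeta_1 \text{ has the Gumbel distribution},$$
$$\zeta_k \sim \frac{1}{(k-1)!}\exp\{-(ku + e^{-u})\},$$
$$(\zeta_1, \ldots, \zeta_r) \sim \exp\{-(u_1 + \cdots + u_r) - e^{-u_r}\}, \quad u_1 \ge u_2 \ge \cdots \ge u_r.$$
Set
$$\beta(\alpha,\theta) = \log\theta - (1+\alpha)\log\log\theta - \log\Gamma(1-\alpha).$$
Theorem 7. (Handa (09)) For each $r \ge 1$,
$$(\theta P_1(\alpha,\theta) - \beta(\alpha,\theta), \ldots, \theta P_r(\alpha,\theta) - \beta(\alpha,\theta)) \Rightarrow (\zeta_1, \ldots, \zeta_r)$$
as $\theta$ tends to infinity.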

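Large Deviations
Theorem 8. (F (07), F and Gao (10)) For any appropriate subset $B$ of $\nabla_\infty$, we have
$$PD(\alpha,\theta)\{B\} \approx \exp\left\{-\theta\inf_{x\in B} I(x)\right\},$$
where
$$I(x) = \begin{cases} \log\dfrac{1}{1 - \sum_{i=1}^{\infty} x_i}, & \sum_{i=1}^{\infty} x_i < 1,\\ \infty, & \text{else}. \end{cases}$$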

Dirichlet Process
Theorem 9. (LLN) As $\theta$ tends to infinity, the two-parameter Dirichlet process
$$\Xi_{\alpha,\theta,\nu_0}(dx) = \sum_{i=1}^{\infty} P_i(\alpha,\theta)\,\delta_{\xi_i}(dx)$$
converges in probability to $\nu_0$ in the space $M_1(S)$.
For $S = [0,1]$, treat $\Xi_{\alpha,\theta,\nu_0}([0,t])$ as a random process in time $t$ and $\nu_0([0,t])$ as a function of $t$. Let $\hat{B}(t)$ denote the Brownian bridge over $[0,1]$.

Theorem 10. (CLT, James (08)) The random process
$$\sqrt{\theta}\,\left[\Xi_{\alpha,\theta,\nu_0}([0,t]) - \nu_0([0,t])\right]$$
converges in distribution to $\sqrt{1-\alpha}\,\hat{B}(\nu_0([0,t]))$ as $\theta$ tends to infinity.

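Theorem 11. (Large Deviations, F (07)) For appropriate subsets $C$ of $M_1(S)$,
$$\Pi_{\alpha,\theta,\nu_0}\{C\} \approx \exp\left\{-\theta\inf_{\mu\in C} I(\mu)\right\},$$
where
$$I(\mu) = \sup_{f>0,\ f\in C(S)}\left\{\frac{1}{\alpha}\log\langle \nu_0, f^{\alpha}\rangle + 1 - \langle \mu, f\rangle\right\},$$
and when $\alpha = 0$, $I(\mu)$ becomes the relative entropy $\mathrm{Ent}(\nu_0\|\mu)$ of $\nu_0$ with respect to $\mu$.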

References
R. Arratia, A.D. Barbour and S. Tavaré (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2, 519-535.
A. Brix (1999). Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Probab. 31, 929-953.
D.A. Dawson and S. Feng (1998). Large deviations for the Fleming-Viot process with neutral mutation and selection. Stoch. Proc. Appl. 77, 207-232.
D.A. Dawson and S. Feng (2006). Asymptotic behavior of Poisson-Dirichlet distribution for large mutation rate. Ann. Appl. Probab. 16, No. 2, 562-582.
S.N. Ethier (1990). The infinitely-many-neutral-alleles diffusion model with ages. Adv. Appl. Probab. 22, 1-24.
S.N. Ethier (1992). Eigenstructure of the infinitely-many-neutral-alleles diffusion model. J. Appl. Probab. 29, 487-498.
S.N. Ethier and R.C. Griffiths (1993). The transition function of a Fleming-Viot process. Ann. Probab. 21, No. 3, 1571-1590.
S.N. Ethier and T.G. Kurtz (1981). The infinitely-many-neutral-alleles diffusion model. Adv. Appl. Probab. 13, 429-452.
W.J. Ewens (1972). The sampling theory of selectively neutral alleles. Theor. Pop. Biol. 3, 87-112.
S. Feng (2007). Large deviations associated with Poisson-Dirichlet distribution and Ewens sampling formula. Ann. Appl. Probab. 17, Nos. 5/6, 1570-1595.
S. Feng (2009). Poisson-Dirichlet distribution with small mutation rate. Stoch. Proc. Appl. 119, 2082-2094.
S. Feng and F. Gao (2010). Asymptotic results for the two-parameter Poisson-Dirichlet distribution. Stoch. Proc. Appl. 120, 1159-1177.
S. Feng and F.M. Hoppe (1998). Large deviation principles for some random combinatorial structures in population genetics and Brownian motion. Ann. Appl. Probab. 8, No. 4, 975-994.
S. Feng and W. Sun (2010). Some diffusion processes associated with two parameter Poisson-Dirichlet distribution and Dirichlet process. Probab. Theory Relat. Fields 148, No. 3-4, 501-525.
V.L. Goncharov (1944). Some facts from combinatorics. Izvestia Akad. Nauk. SSSR, Ser. Mat. 8, 3-48.
R.C. Griffiths (1979). On the distribution of allele frequencies in a diffusion model. Theor. Pop. Biol. 15, 140-158.
K. Handa (2001). Quasi-invariant measures and their characterization by conditional probabilities. Bull. Sci. Math. 125, No. 6-7, 583-604.
K. Handa (2009). The two-parameter Poisson-Dirichlet point process. Bernoulli 15, No. 4, 1082-1116.
J.C. Hansen (1990). A functional central limit theorem for the Ewens sampling formula. J. Appl. Probab. 27, 28-43.
L.F. James (2003). Bayesian calculus for gamma processes with applications to semiparametric intensity models. Sankhyā 65, No. 1, 179-206.
L.F. James (2005a). Bayesian Poisson process partition with an application to Bayesian Lévy moving averages. Ann. Statist. 33, No. 4, 1771-1799.
L.F. James (2005b). Functionals of Dirichlet processes, the Cifarelli-Regazzini identity and beta-gamma processes. Ann. Statist. 33, No. 2, 647-660.
L.F. James (2008). Large sample asymptotics for the two-parameter Poisson-Dirichlet processes. In Bertrand Clarke and Subhashis Ghosal, eds, Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, 187-199, Institute of Mathematical Statistics Collections, Vol. 3, Beachwood, Ohio.
P. Joyce, S.M. Krone, and T.G. Kurtz (2002). Gaussian limits associated with the Poisson-Dirichlet distribution and the Ewens sampling formula. Ann. Appl. Probab. 12, No. 1, 101-124.
J.F.C. Kingman (1975). Random discrete distributions. J. Roy. Statist. Soc. B 37, 1-22.
R.M. Korwar and M. Hollander (1973). Contributions to the theory of Dirichlet processes. Ann. Probab. 1, 705-711.
A. Lijoi, R.H. Mena, and I. Prünster (2007). Controlling the reinforcement in Bayesian non-parametric mixture models. J. Roy. Statist. Soc. B 69, 715-740.
L.A. Petrov (2009). Two-parameter family of infinite-dimensional diffusions on the Kingman simplex. Funct. Anal. Appl. 43, No. 4, 279-296.
J. Pitman (1992). The two-parameter generalization of Ewens' random partition structure. Technical Report 345, Dept. Statistics, University of California, Berkeley.
J. Pitman (2006). Combinatorial Stochastic Processes. Ecole d'Été de Probabilités de Saint-Flour, Lecture Notes in Math., Vol. 1875, Springer-Verlag, Berlin.
J. Pitman and M. Yor (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25, No. 2, 855-900.
M. Ruggiero and S.G. Walker (2009). Countable representation for infinite-dimensional diffusions derived from the two-parameter Poisson-Dirichlet process. Electron. Comm. Probab. 14, 501-517.
N.V. Tsilevich, A. Vershik, and M. Yor (2001). An infinite-dimensional analogue of the Lebesgue measure and distinguished properties of the gamma process. J. Funct. Anal. 185, No. 1, 274-296.

THANK YOU!