Minimum Message Length Analysis of the Behrens-Fisher Problem


1 Minimum Message Length Analysis of the Behrens-Fisher Problem. Enes Makalic and Daniel F. Schmidt, Centre for MEGA Epidemiology, The University of Melbourne. Solomonoff 85th Memorial Conference, 2011.

2 Outline. 1. Introduction: Problem Description. 2. The Wallace-Freeman Approximation. 3. MML Solution to the Behrens-Fisher Problem. 4. Results.

4 Problem Description (1). We have two mutually independent sequences of i.i.d. data, $y_1 = (y_{11}, \ldots, y_{1 n_1})$ and $y_2 = (y_{21}, \ldots, y_{2 n_2})$. The data are assumed to be generated by the Gaussian model $y_{ij} \sim N(\mu_i, \tau_i)$, $i = 1, 2$, $j = 1, \ldots, n_i$, where $\tau_i$ denotes the variance. The sequence means and variances, $\mu = (\mu_1, \mu_2)$ and $\tau = (\tau_1, \tau_2)$, are unknown.
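As a concrete illustration of this setup, the following minimal sketch (Python with NumPy; the sample sizes and parameter values are our own illustrative choices, not from the talk) draws two such sequences:

```python
import numpy as np

rng = np.random.default_rng(0)

# y_ij ~ N(mu_i, tau_i), where tau_i denotes the VARIANCE of group i
n1, n2 = 20, 35               # group sample sizes (illustrative choices)
mu1, mu2 = 0.0, 0.5           # unknown population means
tau1, tau2 = 1.0, 2.5         # unknown population variances

y1 = rng.normal(mu1, np.sqrt(tau1), size=n1)
y2 = rng.normal(mu2, np.sqrt(tau2), size=n2)
```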

5 Problem Description (2). Task: is there a difference between the two population means, i.e. is $\mu_1 = \mu_2$? Existing solutions include frequentist tests (based on the Student-$t$ pivot) and Bayes factors. We will instead use minimum message length (MML).

8 Outline. Section 2: The Wallace-Freeman Approximation.

9 Minimum Message Length (1). A practical implementation of the theory of inductive inference initially proposed by Solomonoff. The model that yields the briefest encoding of the data in a hypothetical message is optimal. The message comprises: the assertion, a statement describing a particular model $\theta \in \Theta \subseteq \mathbb{R}^k$; and the detail, an encoding of the data $y$ using the asserted model $\theta$.

10 Minimum Message Length (2). The total length of the two-part message, $I(\theta, y)$, is the sum of the lengths of the assertion and the detail: $I(\theta, y) = I(\theta) + I(y \mid \theta)$. MML advocates choosing the model $\theta$ that minimises the codelength of the two-part message.

11 MML87 (1). The Wallace-Freeman, or MML87, codelength for a model $\theta \in \Theta \subseteq \mathbb{R}^k$ and data $y$ is
$$I_{87}(y, \theta) = \underbrace{-\log \pi(\theta) + \frac{1}{2} \log |J_{\theta}(\theta)| + \frac{k}{2} \log \kappa_k}_{I_{87}(\theta)} + \underbrace{\frac{k}{2} - \log p(y \mid \theta)}_{I_{87}(y \mid \theta)}$$
where $p(y \mid \theta)$ denotes the likelihood function, $\pi(\cdot)$ is a prior distribution over the parameter space $\Theta \subseteq \mathbb{R}^k$, $J_{\theta}(\theta)$ is the Fisher information matrix, and $\kappa_k$ is the normalised second moment of an optimal quantising lattice in $k$ dimensions.

12 MML87 (2). The model that minimises $I_{87}(y, \theta)$ is (a posteriori) most plausible:
$$\hat{\theta}_{87}(y) = \arg\min_{\theta \in \Theta} \{ I_{87}(y, \theta) \}.$$
MML treats parameter estimation and model class selection on the same footing. Wallace-Freeman codelengths are invariant under a smooth, one-to-one reparameterisation of the parameters.
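To make the MML87 formula concrete, here is a sketch that evaluates $I_{87}(y, \theta)$ for a single Gaussian $N(\mu, \tau)$ with $k = 2$; this is an illustrative toy, not the Behrens-Fisher construction of the later slides, and the user-supplied log-prior is an assumption of the sketch:

```python
import numpy as np

KAPPA_2 = 5.0 / (36.0 * np.sqrt(3.0))   # optimal quantising lattice constant for k = 2

def mml87_gaussian(y, mu, tau, log_prior):
    """I_87(y, theta) in nits for i.i.d. y_i ~ N(mu, tau), with tau the variance.

    log_prior is log pi(mu, tau); its choice is left to the user of this sketch.
    """
    n, k = len(y), 2
    neg_log_lik = (0.5 * n * np.log(2.0 * np.pi * tau)
                   + np.sum((y - mu) ** 2) / (2.0 * tau))
    # Fisher information is diag(n / tau, n / (2 tau^2)); determinant n^2 / (2 tau^3)
    log_det_fisher = np.log(n ** 2 / (2.0 * tau ** 3))
    assertion = -log_prior + 0.5 * log_det_fisher + 0.5 * k * np.log(KAPPA_2)
    detail = neg_log_lik + 0.5 * k
    return assertion + detail
```

Minimising this over $(\mu, \tau)$ would give the MML87 estimates for the toy model.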

13 Outline. Section 3: MML Solution to the Behrens-Fisher Problem.

14 MML Solution. The MML solution to the Behrens-Fisher problem requires the codelength of the data under both the shared-mean model ($\mu_1 = \mu_2$) and the different-means model ($\mu_1 \neq \mu_2$); the model resulting in the shorter codelength is preferred. Let
$$\delta = I_{87}(y, \hat{\mu}, \hat{\tau} \mid \mu_1 = \mu_2) - I_{87}(y, \hat{\mu}, \hat{\tau} \mid \mu_1 \neq \mu_2),$$
the difference between the minimised codelengths of the two models. If $\delta < 0$, a single population mean is preferred. The term $\exp(-\delta)$ is the posterior odds in favour of the model with a common population mean.

15 Shared Mean Model (1). Let $y = (y_1, y_2)$ denote the observed data. The parameters to be estimated are $\theta = (\mu, \tau) \in \mathbb{R}^3$, with $\tau = (\tau_1, \tau_2)$.

16 Shared Mean Model (2). The negative log-likelihood function is
$$-\log p(y \mid \theta) = \frac{n}{2} \log 2\pi + \frac{1}{2} \sum_{i=1}^{2} n_i \log \tau_i + \frac{1}{2} \sum_{i=1}^{2} \frac{1}{\tau_i} \sum_{j=1}^{n_i} (y_{ij} - \mu)^2,$$
with $n = n_1 + n_2$. The determinant of the Fisher information matrix is
$$|J(\theta)| = \left( \frac{n_1}{\tau_1} + \frac{n_2}{\tau_2} \right) \prod_{i=1}^{2} \frac{n_i}{2\tau_i^2}.$$

17 Shared Mean Model (3). The prior density over the parameters $\theta$ factorises as $\pi(\theta) = \pi_\mu(\mu) \pi_\tau(\tau)$. For the population variances, $\pi_\tau(\tau) = (\Omega \tau_1 \tau_2)^{-1}$ for $\tau_1, \tau_2 \in \Xi$, where $\Omega$ normalises the prior over the set $\Xi$. For the population mean, a uniform prior over $\Lambda_1 = \{ \mu : n\mu^2 \leq y'y \}$:
$$\pi_\mu(\mu) = \frac{1}{\mathrm{vol}(\Lambda_1)} = \left( \frac{n}{4\, y'y} \right)^{1/2}, \quad \mu \in \Lambda_1.$$
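A small check of this density: $\Lambda_1$ is the interval $[-\sqrt{y'y/n}, +\sqrt{y'y/n}]$, so its volume is $2\sqrt{y'y/n}$ and the stated density is its reciprocal. A sketch (the function name is ours):

```python
import numpy as np

def prior_mu_shared(y1, y2):
    """Uniform prior density 1 / vol(Lambda_1) for the common mean."""
    n = len(y1) + len(y2)
    yty = np.sum(y1 ** 2) + np.sum(y2 ** 2)   # y'y
    # vol(Lambda_1) = 2 * sqrt(y'y / n), hence 1 / vol = sqrt(n / (4 y'y))
    return np.sqrt(n / (4.0 * yty))
```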

18 Shared Mean Model (4). Justifying the prior density for the population mean: the observed data are modelled as $y = \bar{y} + \varepsilon$, where $\bar{y}$ denotes the noiseless mean vector and $\varepsilon \sim N_n(0, \Sigma_n)$. One can show that $\mathbb{E}(y'y) = \bar{y}'\bar{y} + \mathrm{tr}(\Sigma_n) \geq \bar{y}'\bar{y}$, since $\mathbb{E}(\varepsilon) = 0$ and $\mathbb{E}(\varepsilon'\varepsilon) = \mathrm{tr}(\Sigma_n)$. An estimate $1_n \hat{\mu}$ of $\bar{y}$ should therefore satisfy $y'y \geq (1_n \hat{\mu})'(1_n \hat{\mu}) = n \hat{\mu}^2$, which motivates restricting $\mu$ to $\Lambda_1$.

19 Shared Mean Model (5). The total Wallace-Freeman codelength is, up to a constant $c_3$ collecting the $\kappa_3$ lattice terms,
$$I_{87}(y, \mu, \tau) = \frac{n}{2}\log 2\pi + \frac{1}{2}\sum_{i=1}^{2}\left( n_i \log \tau_i + \frac{1}{\tau_i}\sum_{j=1}^{n_i} (y_{ij} - \mu)^2 \right) + \frac{1}{2}\log\left( \frac{n_1}{\tau_1} + \frac{n_2}{\tau_2} \right) + \log \Omega + \frac{1}{2}\log\frac{(y'y)\, n_1 n_2}{n} + c_3$$
(the $\log \tau_i$ contributions of the prior and of the Fisher determinant cancel). The Wallace-Freeman parameter estimates are
$$(\hat{\mu}, \hat{\tau}) = \arg\min_{\mu, \tau} \{ I_{87}(y, \mu, \tau) \}.$$
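A sketch of this codelength and its numerical minimisation (Python with NumPy/SciPy; the constant $c_3$ is dropped since it does not affect the minimiser, $\Omega$ is left as a user-supplied normaliser, and all names are ours):

```python
import numpy as np
from scipy.optimize import minimize

def I87_shared(params, y1, y2, omega=1.0):
    """Shared-mean Wallace-Freeman codelength, up to the constant c_3."""
    mu, log_t1, log_t2 = params                 # optimise log-variances for positivity
    t1, t2 = np.exp(log_t1), np.exp(log_t2)
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    nll = (0.5 * n * np.log(2.0 * np.pi)
           + 0.5 * (n1 * np.log(t1) + n2 * np.log(t2))
           + np.sum((y1 - mu) ** 2) / (2.0 * t1)
           + np.sum((y2 - mu) ** 2) / (2.0 * t2))
    yty = np.sum(y1 ** 2) + np.sum(y2 ** 2)
    # The log(tau_i) terms of the prior and the Fisher determinant cancel, leaving:
    return (nll
            + 0.5 * np.log(n1 / t1 + n2 / t2)
            + np.log(omega)
            + 0.5 * np.log(yty * n1 * n2 / n))

def fit_shared(y1, y2):
    """Minimise I87_shared numerically; returns (mu, tau1, tau2) and the codelength."""
    x0 = np.array([np.mean(np.concatenate([y1, y2])),
                   np.log(np.var(y1)), np.log(np.var(y2))])
    res = minimize(I87_shared, x0, args=(y1, y2))
    return (res.x[0], np.exp(res.x[1]), np.exp(res.x[2])), res.fun
```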

20 Different Means Model (1). The parameters to be estimated are $\theta = (\mu, \tau) \in \mathbb{R}^4$, where $\mu = (\mu_1, \mu_2)$ and $\tau = (\tau_1, \tau_2)$. The negative log-likelihood function is
$$-\log p(y \mid \theta) = \frac{n_1}{2}\log 2\pi\tau_1 + \frac{n_2}{2}\log 2\pi\tau_2 + \sum_{i=1}^{2} \frac{1}{2\tau_i} \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2.$$
The determinant of the Fisher information matrix is
$$|J(\theta)| = \prod_{i=1}^{2} \frac{n_i^2}{2\tau_i^3}.$$

21 Different Means Model (2). The prior density over the parameters again factorises as $\pi(\theta) = \pi_\mu(\mu) \pi_\tau(\tau)$, with $\pi_\tau(\tau) = (\Omega \tau_1 \tau_2)^{-1}$ for $\tau_1, \tau_2 \in \Xi$ as before. The population means are given a uniform prior over the ellipse $\Lambda_2 = \left\{ (\mu_1, \mu_2) : \sum_{i=1}^{2} n_i \mu_i^2 \leq y'y \right\}$:
$$\pi_\mu(\mu) = \frac{1}{\mathrm{vol}(\Lambda_2)} = \frac{\sqrt{n_1 n_2}}{\pi\, y'y}, \quad \mu \in \Lambda_2.$$

22 Different Means Model (3). The total Wallace-Freeman codelength evaluated at the estimates is, up to a constant $c_4$ collecting the $\kappa_4$ lattice terms,
$$I_{87}(y, \hat{\mu}, \hat{\tau}) = \frac{n}{2}\log 2\pi + \frac{1}{2}\sum_{i=1}^{2}(n_i - 1)\log\hat{\tau}_i + \frac{n}{2} + \log\left( \frac{(y'y)\sqrt{n_1 n_2}\, \Omega\, \pi}{2} \right) + c_4.$$
The Wallace-Freeman parameter estimates are available in closed form:
$$\hat{\mu}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \qquad \hat{\tau}_i = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \hat{\mu}_i)^2, \qquad i = 1, 2.$$
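Because these estimates are closed-form, the model comparison of the "MML Solution" slide reduces to a few lines. A sketch continuing from the data-generation and fit_shared sketches above; note that these sketches drop the differing lattice constants $c_3$ and $c_4$, so a faithful $\delta$ must add them back in:

```python
import numpy as np

def I87_different(y1, y2, omega=1.0):
    """Different-means Wallace-Freeman codelength at the closed-form
    estimates, up to the constant c_4."""
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    mu1_hat, mu2_hat = y1.mean(), y2.mean()
    tau1_hat = np.sum((y1 - mu1_hat) ** 2) / (n1 - 1)   # note the n_i - 1 divisor
    tau2_hat = np.sum((y2 - mu2_hat) ** 2) / (n2 - 1)
    yty = np.sum(y1 ** 2) + np.sum(y2 ** 2)
    return (0.5 * n * np.log(2.0 * np.pi)
            + 0.5 * ((n1 - 1) * np.log(tau1_hat) + (n2 - 1) * np.log(tau2_hat))
            + 0.5 * n
            + np.log(yty * np.sqrt(n1 * n2) * omega * np.pi / 2.0))

# Model comparison (ignoring c_3 - c_4): delta < 0 favours a single mean
_, I_shared = fit_shared(y1, y2)
delta = I_shared - I87_different(y1, y2)
posterior_odds = np.exp(-delta)   # odds in favour of the common-mean model
```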

23 Outline. Section 4: Results.

24 Results (1). The MML approach is empirically compared to standard procedures on artificial data, for both hypothesis testing and parameter estimation.

25 Results (2). [Table: hypothesis-testing performance of the MML, Student-$t$ and Bayesian criteria for various sample sizes $n_1$ and $n$; the numerical entries did not survive transcription.]

26 Results (3). Median Kullback-Leibler divergence, computed over $10^5$ iterations, between the data-generating distribution and the MML and ML estimators. [Table: median KL divergence of the MML and ML estimators for various sample sizes $n_1$ and $n$; the numerical entries did not survive transcription.]
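This evaluation criterion has a closed form for Gaussians: $\mathrm{KL}\big(N(\mu_0, \tau_0) \,\|\, N(\mu_1, \tau_1)\big) = \tfrac{1}{2}\left( \tau_0/\tau_1 + (\mu_1 - \mu_0)^2/\tau_1 - 1 + \log(\tau_1/\tau_0) \right)$. A sketch, assuming (as seems natural here) that the divergence is taken from the generating Gaussian to the fitted one in each group:

```python
import numpy as np

def kl_gaussian(mu0, tau0, mu1, tau1):
    """KL( N(mu0, tau0) || N(mu1, tau1) ) in nits; tau0, tau1 are variances."""
    return 0.5 * (tau0 / tau1 + (mu1 - mu0) ** 2 / tau1
                  - 1.0 + np.log(tau1 / tau0))
```

The slide's criterion would then be the median of such divergences over $10^5$ simulated data sets.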
