Stat 8931 (Aster Models) Lecture Slides Deck 8


Charles J. Geyer
School of Statistics
University of Minnesota
June 7, 2015

Conditional Aster Models

A conditional aster model is a submodel parameterized

θ = a + Mβ

An unconditional aster model is a submodel parameterized

ϕ = a + Mβ

There is a subtle but profound difference.

Both are exponential families, but an unconditional aster model is a regular full exponential family, while a conditional aster model is a curved exponential family. Curved exponential families have some nice properties (the usual asymptotics always work for sufficiently large sample sizes), but none of the nice properties we talked about for unconditional aster models.

Review. Unconditional aster models have
- concave log likelihood,
- MLE unique if it exists,
- MLE characterized by observed = expected,
- observed and expected Fisher information the same,
- submodel canonical statistic sufficient,
- the maximum entropy property,
- a multivariate monotone relationship between canonical and mean value parameters.
Curved exponential families don't, in general, have any of these properties.

Conditional aster models have two of these: concave log likelihood, and MLE unique if it exists.

The log likelihood is (from deck 2)

l(θ) = Σ_{j∈J} [y_j θ_j − y_{p(j)} c_j(θ_j)] = ⟨y, θ⟩ − Σ_{j∈J} y_{p(j)} c_j(θ_j)

and for the conditional canonical affine submodel θ = a + Mβ it is

l(β) = ⟨M^T y, β⟩ − Σ_{j∈J} y_{p(j)} c_j(θ_j)

(dropping the term ⟨y, a⟩, which does not contain β).

l(β) = ⟨M^T y, β⟩ − Σ_{j∈J} y_{p(j)} c_j(θ_j)

We see we get almost no sufficient dimension reduction. The likelihood is a function of M^T y and the set of all predecessor values y_{p(j)}, j ∈ J. That typically is not a dimension reduction at all (when the dimension of M^T y is more than the number of terminal nodes).
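To make the formula concrete, here is a minimal numerical sketch in Python (not from the slides; the actual aster software is an R package) for a hypothetical two-node graph per individual: an initial node with value 1, a Bernoulli survival node Y_1, and a Poisson fecundity node Y_2 whose predecessor is Y_1. The data, offset a, and model matrix M are made up for illustration.

import numpy as np

# Hypothetical graph: initial (value 1) -> Y_1 Bernoulli -> Y_2 Poisson,
# so J = {1, 2} with p(1) = initial node and p(2) = 1.
c = [lambda t: np.log1p(np.exp(t)),  # Bernoulli cumulant c_1(theta)
     lambda t: np.exp(t)]            # Poisson cumulant c_2(theta)

y = np.array([1.0, 3.0])        # observed: survived, then 3 offspring
ypred = np.array([1.0, y[0]])   # predecessor values y_p(j)

a = np.zeros(2)                 # offset vector
M = np.array([[1.0],            # one-column model matrix:
              [2.0]])           # a single beta drives both thetas

def cond_loglik(beta):
    # l(beta) = <M^T y, beta> - sum_j y_p(j) c_j(theta_j),  theta = a + M beta
    theta = a + M @ beta
    return (M.T @ y) @ beta - sum(ypred[j] * c[j](theta[j]) for j in range(2))

print(cond_loglik(np.array([0.5])))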

l(θ) = Σ_{j∈J} [y_j θ_j − y_{p(j)} c_j(θ_j)]

Each term in square brackets is strictly concave. The sum of strictly concave functions is strictly concave. The composition of a strictly concave function and an affine function is strictly concave. Hence the log likelihood for a conditional canonical affine submodel is strictly concave. Hence the MLE is unique if it exists.
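As a quick numerical sanity check of this concavity argument (continuing the hypothetical sketch above), a finite-difference second derivative of cond_loglik is negative wherever we evaluate it; the step size and test points below are arbitrary.

# Finite-difference second derivative of the one-parameter submodel
# log likelihood; strict concavity means it should always be negative.
def second_deriv(f, b, h=1e-5):
    return (f(b + h) - 2.0 * f(b) + f(b - h)) / h ** 2

for b0 in (-1.0, 0.0, 2.0):
    print(b0, second_deriv(cond_loglik, np.array([b0])))  # all negative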

l(θ) = Σ_{j∈J} [y_j θ_j − y_{p(j)} c_j(θ_j)]

The observed Fisher information matrix for θ for a saturated aster model, J_sat(θ) = −∇²l(θ), is a diagonal matrix whose j, j component is

y_{p(j)} c_j″(θ_j)

where the double prime indicates an ordinary second derivative.

The expected Fisher information matrix for θ, denoted I_sat(θ), is the expectation of the observed Fisher information matrix. So it too is diagonal, and its j, j component is

µ_{p(j)} c_j″(θ_j)

Then the conditional canonical affine submodel observed and expected Fisher information matrices are

J(β) = M^T J_sat(a + Mβ) M
I(β) = M^T I_sat(a + Mβ) M
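Continuing the same hypothetical sketch, these two formulas can be evaluated directly. The second derivatives c_j″ below are the standard Bernoulli and Poisson ones, and the mean value µ_1 = E(Y_1) = c_1′(θ_1) is specific to this toy graph.

# Second derivatives of the cumulant functions from the sketch above.
c2 = [lambda t: np.exp(t) / (1.0 + np.exp(t)) ** 2,  # Bernoulli: p(1 - p)
      lambda t: np.exp(t)]                           # Poisson

def submodel_info(beta, w):
    # M^T diag(w_j c_j''(theta_j)) M with theta = a + M beta, where
    # w_j = y_p(j) gives J(beta) and w_j = mu_p(j) gives I(beta).
    theta = a + M @ beta
    D = np.diag([w[j] * c2[j](theta[j]) for j in range(2)])
    return M.T @ D @ M

beta = np.array([0.5])
theta = a + M @ beta
mu1 = np.exp(theta[0]) / (1.0 + np.exp(theta[0]))  # E(Y_1) = c_1'(theta_1)
mupred = np.array([1.0, mu1])                      # mu_p(j) for j = 1, 2

print(submodel_info(beta, ypred))   # observed information J(beta)
print(submodel_info(beta, mupred))  # expected information I(beta)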

The maximum entropy argument only works for full exponential families, not for curved exponential families.

We do have the saturated model multivariate monotone relationships µ ↔ ϕ and ξ ↔ θ. But that doesn't tell us anything about canonical affine submodels.

Unconditional canonical affine submodels have the property that changing ϕ_j changes θ_k for all k ⪯ j (for j itself and every node that precedes it in the graph). Conditional canonical affine submodels do not have this property. Changing θ_j only changes θ_j. Thus conditional canonical affine submodels tend to need many more parameters to fit adequately.

So if conditional canonical affine submodels don't have any nice properties, why do they even exist? One reason is just because they do exist as abstract mathematical objects, and they weren't that much extra code to implement, and who knows? Maybe they will find an important use someday. Just because they exist does not mean we actually recommend them for anything.

One issue that would be a good reason to use conditional aster models is if you do not want the property that changing ϕ_j changes θ_k for all k ⪯ j. For example, you might require that the conditional expectation of survival given survival to the previous year be the same for all years. An unconditional aster model wouldn't do that, but a conditional aster model could.
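For instance, here is a hypothetical sketch (same toy Python setup as above, not code from the slides) of a pure survival graph, initial → Y_1 → Y_2 with both nodes Bernoulli: a conditional submodel whose model matrix gives both nodes the same θ forces the same conditional survival probability every year.

# Two-year survival graph: initial -> Y_1 -> Y_2, both Bernoulli.
# With theta_1 = theta_2 = beta the model says
#   Pr(Y_1 = 1) = Pr(Y_2 = 1 | Y_1 = 1) = logistic(beta).
M_surv = np.array([[1.0],
                   [1.0]])            # shared single parameter
theta = M_surv @ np.array([0.3])
print(1.0 / (1.0 + np.exp(-theta)))   # two identical survival probabilities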