Maximum Likelihood Estimation in Loglinear Models


1 Maximum Likelihood Estimation in Loglinear Models. Alessandro Rinaldo, Carnegie Mellon University. Joint work with Steve Fienberg. April 18, 2012. Workshop on Graphical Models: Mathematics, Statistics and Computer Sciences, Fields Institute.

4 Frequentist inference (estimation, goodness-of-fit testing, model selection) in log-linear models relies on the maximum likelihood estimator (MLE). The MLE may not exist due to sampling zeros. Nonexistence of the MLE is largely ignored in practice, yet it is an important issue, particularly in large and sparse tables.

6 Motivating Pathological Example. Consider a $2^3$ table and the model [12][13][23]. If a margin is zero, the MLE does not exist!

9 Motivating Pathological Example. Haberman's example (1974): positive margins and a nonexistent MLE.
- loglin routine in R: Warning message: Algorithm did not converge ... stop("this should not happen")
- glm routine in R: fitted rates numerically 0
- PROC CATMOD routine in SAS: "If you want zero frequencies that PROC CATMOD would normally treat as structural zeros to be interpreted as sampling zeros, simply insert a one-line statement into the data step that changes each zero to a very small number (such as 1E-20)."
In every case: always a perfect fit to the data, and the p-value is always 1.
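As a concrete illustration, here is a minimal numpy sketch (not the authors' code) of iterative proportional fitting for [12][13][23] on a Haberman-style $2^3$ table: the two zero cells sit on a main diagonal and the positive counts are made-up values for illustration. The fitted values at the zero cells shrink toward 0 while the deviance shrinks toward 0, so the algorithm cannot converge even though the fit looks perfect.

```python
import numpy as np

# Haberman-style 2x2x2 table: all [12], [13], [23] margins are positive,
# yet the MLE for the no-three-way-interaction model does not exist.
# The zero cells (0,0,0) and (1,1,1) follow Haberman's pattern; the
# positive counts are arbitrary choices for illustration.
n = np.array([[[0., 2.], [2., 2.]],
              [[2., 2.], [2., 0.]]])

m = np.ones_like(n)  # IPF starting values
for _ in range(5000):
    m *= (n.sum(axis=2) / m.sum(axis=2))[:, :, None]  # match the [12] margin
    m *= (n.sum(axis=1) / m.sum(axis=1))[:, None, :]  # match the [13] margin
    m *= (n.sum(axis=0) / m.sum(axis=0))[None, :, :]  # match the [23] margin

print(m.min())  # keeps shrinking toward 0: the MLE does not exist
pos = n > 0     # G^2 = 2 * sum n log(n/m); zero cells contribute 0 in the limit
print(2 * (n[pos] * np.log(n[pos] / m[pos])).sum())  # tends to 0: "perfect fit"
```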

10 Outline.
- Background on log-linear models and exponential families.
- Existence of the MLE.
- Parameter estimability under a nonexistent MLE.
- Computation of the extended MLE.

12 Log-Linear Models. Consider the exponential family $\{P_\theta,\ \theta \in \mathbb{R}^d\}$ over a finite set of cells $\mathcal{I}$:
$$P_\theta(\{i\}) = \exp\{(\theta, a_i) - \phi(\theta)\}, \quad \theta \in \mathbb{R}^d,\ i \in \mathcal{I},$$
with $a_i \in \mathbb{N}^d \setminus \{0\}$ and $\phi(\theta) = \log \sum_i \exp\{(\theta, a_i)\}$. The model is specified by an $|\mathcal{I}| \times d$ design matrix $A$ whose $i$-th row is $a_i^\top$.
Observe a number $N$ (possibly random) of cells $\{L_1, \dots, L_N\}$, with $L_j \in \mathcal{I}$ for all $j$. The corresponding contingency table is the random vector $n \in \mathbb{N}^{\mathcal{I}}$ with $n(i) = |\{j \colon L_j = i\}|$, $i \in \mathcal{I}$.

13 Log-Linear Models. Log-linear model analysis (see, e.g., Haberman, 1974, and Bishop et al., 2007) is concerned with modeling the distribution of $n$ by assuming that
$$m := \mathbb{E}(n) > 0, \qquad \mu := \log(m) \in \mathcal{M} \subset \mathbb{R}^{\mathcal{I}},$$
where $\mathcal{M} = \mathcal{R}(A)$, the column span of $A$, is the log-linear subspace.
Sampling constraints: let $\mathcal{N} \subset \mathcal{M}$ be a linear subspace of $\mathcal{M}$ of dimension $m \leq d$, the sampling subspace.
Conditional Poisson sampling: $\{n(i),\ i \in \mathcal{I}\}$ has the conditional distribution of $|\mathcal{I}|$ independent Poisson random variables with means $\{\exp(\mu(i)),\ i \in \mathcal{I}\}$, given that $\Pi_{\mathcal{N}} n = c$ for some known $c \in \mathbb{R}^{\mathcal{I}}$, where $\Pi_{\mathcal{N}}$ denotes the orthogonal projection onto $\mathcal{N}$.
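For concreteness, a small sketch of this setup: a design matrix $A$ for the independence model [1][2] on a $2 \times 2$ table, using dummy coding (one of several equally valid codings; only the span $\mathcal{R}(A) = \mathcal{M}$ matters), together with the resulting cell means $m = \exp(A\theta)$.

```python
import numpy as np
from itertools import product

# Cells I of a 2x2 table and a design matrix A for the independence
# model [1][2]: intercept plus dummy-coded main effects.
cells = list(product([0, 1], [0, 1]))
A = np.array([[1, i, j] for (i, j) in cells])  # |I| x d, rows a_i

theta = np.array([0.5, -1.0, 2.0])  # an arbitrary natural parameter
mu = A @ theta                      # log cell means, a point of M = R(A)
m = np.exp(mu)                      # expected cell counts, all positive
print(m)
```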

17 Conditional Poisson Sampling Schemes: Examples.
Poisson likelihood: $\mathcal{N} = \{0\}$. The log-likelihood is
$$(n, \mu) - \sum_i \exp(\mu(i)) - \sum_i \log(n(i)!), \quad \mu \in \mathcal{M}.$$
Product multinomial likelihood: $\mathcal{N} = \mathrm{span}(\chi_1, \dots, \chi_m)$, where the $\chi_j$'s are the indicator functions of a partition $B_1, \dots, B_m$ of $\mathcal{I}$. The sampling constraints are $(n, \chi_j) = N_j > 0$ for all $j$. The log-likelihood is
$$\sum_{j=1}^m \sum_{i \in B_j} n(i) \log \frac{m(i)}{(m, \chi_j)} + \sum_j \log N_j! - \sum_i \log n(i)!,$$
well defined only on $\{\mu \in \mathcal{M} \colon (\chi_j, \exp(\mu)) = N_j,\ j = 1, \dots, m\} \subset \mathcal{M}$.
Poisson-multinomial likelihood (Lang, 2005): a combination of the two.
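The Poisson log-likelihood above is straightforward to evaluate; a short sketch, using `gammaln` for the $\log n(i)!$ terms:

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(mu, n):
    """Poisson log-likelihood (n, mu) - sum_i exp(mu_i) - sum_i log n_i!,
    for mu a point of the log-linear subspace M (i.e. mu = A @ theta)."""
    return n @ mu - np.exp(mu).sum() - gammaln(n + 1).sum()
```

For example, `poisson_loglik(A @ theta, n)` with $A$ and $\theta$ as in the earlier sketch.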

19 Example: Hierarchical Log-Linear Models. Let $X_1, \dots, X_K$ be discrete random variables, each $X_k$ supported on a finite set $\mathcal{I}_k$ of labels. Then $\mathcal{I} = \prod_{k=1}^K \mathcal{I}_k$. A hierarchical log-linear model is specified by a simplicial complex $\Delta$: a class of subsets of $\{1, \dots, K\}$ such that $S \in \Delta$ and $T \subset S$ imply $T \in \Delta$. Graphical models are special cases. The corresponding log-linear subspaces are of ANOVA type, and there are canonical ways of constructing $A$ (see Lauritzen, 1996, Appendix B); a sketch of one such construction follows below. Inferential tasks: estimation, goodness-of-fit testing and model selection.
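A sketch of one such construction, using indicator coding (this yields an overcomplete $A$, but its column span is the model subspace $\mathcal{M}$, which is all that matters here):

```python
import numpy as np
from itertools import product

def hierarchical_design(levels, generators):
    """Indicator-coded design matrix for a hierarchical log-linear model.
    levels: the sizes |I_k|; generators: the maximal faces of the
    simplicial complex, as tuples of variable indices. The matrix is
    overcomplete (not of full rank), but its column span is M."""
    cells = list(product(*[range(l) for l in levels]))
    cols = [np.ones(len(cells))]  # intercept
    for S in generators:
        for combo in product(*[range(levels[k]) for k in S]):
            cols.append(np.array(
                [float(all(cell[k] == c for k, c in zip(S, combo)))
                 for cell in cells]))
    return np.column_stack(cols)

# The no-three-way-interaction model [12][13][23] on a 2x2x2 table:
A = hierarchical_design([2, 2, 2], [(0, 1), (0, 2), (1, 2)])
```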

20 Example: Hierarchical Log-Linear Models. Mildew fungus example: a sparse $2^6$ table. Source: Edwards (2000). [Figure: graphical model over the six binary variables A, B, C, D, E, F.]

24 Example: Network Models, the β-model. Let $\mathcal{G}_v$ be the set of simple graphs on $v$ nodes. β-model: edges are independent and occur with probabilities
$$p_{ij} = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}}, \quad i \neq j, \qquad \beta = (\beta_1, \dots, \beta_v) \in \mathbb{R}^v.$$
The probability of a graph $x \in \mathcal{G}_v$ is
$$\exp\left\{\sum_{i=1}^v d_i \beta_i - \psi(\beta)\right\}, \quad \beta \in \mathbb{R}^v,$$
where $d(x) = d = (d_1, \dots, d_v)$ is the (ordered) degree sequence of $x$. It is a log-linear model under product multinomial sampling.
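A quick sketch of the β-model log-probability of a given graph, read directly off the display above (the adjacency matrix is assumed symmetric with 0/1 entries):

```python
import numpy as np

def beta_model_logprob(adj, beta):
    """log P(x) = sum_i d_i * beta_i - psi(beta), where
    psi(beta) = sum_{i<j} log(1 + exp(beta_i + beta_j))."""
    d = adj.sum(axis=1)                     # degree sequence d(x)
    r, c = np.triu_indices(len(beta), k=1)  # all pairs i < j
    psi = np.log1p(np.exp(beta[r] + beta[c])).sum()
    return d @ beta - psi
```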

27 Exponential Family Representation. It is more convenient to represent the log-linear model likelihood in exponential form, with densities
$$p_\theta(n) = \exp\{(A^\top n, \theta) - \psi(\theta)\}\,\nu(n), \quad \theta \in \mathbb{R}^d,$$
where $n \in S(\mathcal{N}, c) := \{x \in \mathbb{N}^{\mathcal{I}} \colon \Pi_{\mathcal{N}} x = c\}$ and the base measure is
$$\nu(x) = 1_{x \in S(\mathcal{N}, c)} \prod_{i \in \mathcal{I}} \frac{1}{x(i)!}, \quad x \in \mathbb{N}^{\mathcal{I}}.$$
It is a family of order $d - m$, where $m = \dim(\mathcal{N}) \geq 0$. Minimality: replace $A$ with a full-rank $A'$ such that $\mathcal{R}(A') = \mathcal{M} \ominus \mathcal{N} := \mathcal{M} \cap \mathcal{N}^{\perp}$. A better log-likelihood model parametrization for product-multinomial sampling:
$$(n, \beta) - \sum_{j=1}^m N_j \log\left(e^{\beta}, \chi_j\right) - \sum_{i \in \mathcal{I}} \log n(i)!, \quad \beta \in \mathcal{M} \ominus \mathcal{N}.$$

29 Three Views of Log-Linear Models.
1. The exponential family natural parametrization: $\mathbb{R}^{d-m}$.
2. The log-linear model parametrization: $\mathcal{M} \ominus \mathcal{N} \subset \mathbb{R}^{\mathcal{I}}$.
3. Connection with algebraic geometry (see, e.g., Drton et al., 2008). Let $\mathcal{M}^{*} = \{\exp\mu,\ \mu \in \mathcal{M}\}$: the intersection of a real toric variety in $\mathbb{R}^{\mathcal{I}}$ with the positive orthant. The model is parametrized by $\mathcal{V} = \mathcal{M}^{*} \cap \{x \in \mathbb{R}^{\mathcal{I}} \colon \Pi_{\mathcal{N}} x = c\}$. Interpretable parametrization: (conditional) expected cell counts.

33 In log-linear models inference relies on the MLE:
$$\hat\theta = \arg\sup_{\theta \in \mathbb{R}^{d-m}} p_\theta(n).$$
The MLEs $\hat\mu \in \mathcal{M} \ominus \mathcal{N}$ and $\hat m \in \mathcal{V}$ are similarly defined. If the supremum is not achieved, the MLE does not exist; instead the supremum is realized in the limit along sequences
$$\{\theta_n\} \subset \mathbb{R}^{d-m} \colon \|\theta_n\| \to \infty, \qquad \{\mu_n\} \subset \mathcal{M} \ominus \mathcal{N} \colon \|\mu_n\| \to \infty, \qquad \{m_n\} \subset \mathcal{V} \colon m_n(i) \to 0 \text{ for some } i.$$
This is far from being just a numerical issue. An existent MLE is needed to:
- get correct asymptotic approximations to various goodness-of-fit testing statistics in the regular and double-asymptotic settings;
- obtain standard errors for the model parameters;
- obtain proper posteriors under improper priors, and adequate Bayesian inference;
- carry out conditional inference.
Nonexistence of the MLE leads to issues of estimability and of assessing model complexity.

36 A well-known issue in theory: Haberman (1974), Aickin (1979), Glonek et al. (1988), Verbeek (1992), Glonek, Lauritzen (1996).
Haberman's pathological example (1974): a $2^3$ table with two zero cells on a main diagonal, under the model [12][13][23].
In a $2^k$ table with positive entries, set 2 cells to zero at random. Under the model of no $k$-way interaction, a zero margin occurs with probability $\frac{k}{2^k - 1} \to 0$ for large $k$, while a nonexistent MLE leaving all margins positive occurs with probability $\frac{2^{k-1} - k}{2^k - 1} \to \frac{1}{2}$ for large $k$.
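Under this reading, a zero margin arises exactly when the two zero cells differ in a single coordinate, which makes the first probability easy to check by Monte Carlo (a quick sketch, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
k, trials = 10, 200_000
hits = 0
for _ in range(trials):
    a, b = rng.choice(2 ** k, size=2, replace=False)  # two distinct cells
    hits += bin(int(a) ^ int(b)).count("1") == 1      # differ in one coordinate
print(hits / trials, k / (2 ** k - 1))  # both close to 10/1023, about 0.0098
```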

37 Maximum Likelihood Estimation: Goals.
1. Characterize the patterns of sampling zeros leading to nonexistence of the MLE.
2. Statistical consequences of a nonexistent MLE.
3. Algorithms.

42 Basics of Discrete Exponential Families. See Barndorff-Nielsen (1974), Brown (1986), Jensen (1989), Letac (1992), Csiszár and Matúš (2001, 2003, 2005, 2008). The distribution of $T = A^\top n$ belongs to the family $\mathcal{E} = \{P_\theta,\ \theta \in \Theta\}$, with
$$P_\theta(t) = \exp\{\langle\theta, t\rangle - \psi(\theta)\}\,\mu(t), \quad \theta \in \Theta = \mathbb{R}^{d-m},$$
whose support $\mathcal{T} \subset \mathbb{R}^d$ and base measure $\mu$ depend on the sampling scheme. The set $P = \mathrm{convhull}(\mathcal{T})$ is called the convex support of $\mathcal{E}$; it is a polyhedron. The set $\mathrm{int}(P) = \{\mathbb{E}_\theta[T],\ \theta \in \Theta\}$ is the mean value space; $\mathrm{int}(P)$ and $\Theta$ are homeomorphic: the mean value parametrization.
Existence of the MLE: the MLE exists if and only if $t \in \mathrm{int}(P)$.

46 Extended Exponential Families: Geometric Construction. For every face $F$ of $P$, construct the exponential family of distributions $\mathcal{E}_F$ for the sample points in $F$, with convex support $F$. Note that $\mathcal{E}_F$ depends on $\dim(F) < d$ parameters only. The extended exponential family is
$$\bar{\mathcal{E}} = \mathcal{E} \cup \{\mathcal{E}_F \colon F \text{ is a face of } P\}.$$
Within $\bar{\mathcal{E}}$, the MLE always exists. The extended exponential family is the closure of the original family. Geometrically, this corresponds to taking the closure of the mean value space, i.e., including the boundary of $P$.

47 Convex Supports (Mean Value Parametrization).
- Poisson sampling scheme: the marginal cone $C_A = \mathrm{cone}(A)$, the cone spanned by the rows of $A$ (Eriksson et al., 2006).
- Multinomial sampling scheme: the marginal polytope $\mathrm{convhull}(A)$.
- Product multinomial sampling scheme: a Minkowski sum of convex polytopes.
- Poisson-multinomial: a Minkowski sum of a polyhedral cone and convex polytopes.

49 $\mathrm{int}(P)$ is homeomorphic to $\mathcal{V}$, with the homeomorphism given by
$$x \in \mathcal{V} \mapsto A^\top x \in P,$$
known as the moment map. The homeomorphism extends to the boundaries: $\mathrm{cl}(\mathcal{V})$ is also a mean value parametrization of $\bar{\mathcal{E}}$.

51 Maximum Likelihood Estimation: Existence. Existence of the MLE (assume minimality): the MLE of $\theta$ (or of $\mu$ or of $m$) exists (and is unique) if and only if $A^\top n \in \mathrm{ri}(C_A)$, and it satisfies the moment equations
$$\nabla\psi(\hat\theta) = A^\top n$$
or, equivalently, $\hat m = \hat m(\hat\theta)$ is the unique element of
$$\mathcal{V} \cap \{x \geq 0 \colon A^\top x = A^\top n\}.$$
This is an application of the standard theory of exponential families; Haberman (1974) was the first to derive it. The results apply to more general conditional Poisson sampling under additional conditions.

55 Maximum Likelihood Estimation: Existence. We need a more explicit result to characterize the problematic zero entries in the table.
Facial sets (Geiger et al., 2006): a set $\mathcal{F} \subset \mathcal{I}$ for which, for some $c \in \mathbb{R}^d$,
$$(a_i, c) = 0 \ \text{ for } i \in \mathcal{F}, \qquad (a_i, c) < 0 \ \text{ for } i \notin \mathcal{F},$$
where $a_i$ is the $i$-th row of $A$, is called a facial set of $C_A$. Facial sets capture the combinatorial structure of $C_A$ and of $\mathrm{cl}(\mathcal{V})$. The following are equivalent, for a face $F$ of $C_A$ with facial set $\mathcal{F}$:
- $t \in \mathrm{ri}(F)$;
- $t \in \mathrm{cone}(\{a_i,\ i \in \mathcal{F}\})$;
- $t = A^\top m$ for one $m \in \mathrm{cl}(\mathcal{V})$ with $\mathrm{supp}(m) = \mathcal{F}$.
Existence of the MLE: the MLE does not exist if and only if $\{i \colon n(i) = 0\} \supseteq \mathcal{F}^c$ for some proper facial set $\mathcal{F}$.
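The facial-set characterization translates directly into a linear-programming check of existence: the MLE fails exactly when some vector $c$ has $(a_i, c) = 0$ on the support of $n$ and $(a_i, c) < 0$ on at least one zero cell. A sketch assuming Poisson sampling:

```python
import numpy as np
from scipy.optimize import linprog

def mle_exists(A, n, tol=1e-9):
    """True iff A^T n lies in ri(C_A), i.e. there is no facial set F with
    F^c contained in the zero cells of n (Poisson sampling assumed)."""
    A_pos, A_zero = A[n > 0], A[n == 0]
    if A_zero.shape[0] == 0:
        return True
    res = linprog(c=A_zero.sum(axis=0),  # push (a_i, c) negative on zero cells
                  A_ub=A_zero, b_ub=np.zeros(A_zero.shape[0]),
                  A_eq=A_pos, b_eq=np.zeros(A_pos.shape[0]),
                  bounds=[(-1, 1)] * A.shape[1], method="highs")
    return res.fun > -tol  # strictly negative optimum certifies nonexistence
```

Applied to Haberman's $2^3$ example with the indicator design matrix sketched earlier, this should return False even though all margins are positive.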

57 Examples: Likelihood Zeros (facets computed with polymake). $2^3$ table and the model [12][13][23]: $C_A$ has 16 facets, 12 of which correspond to null margins. [Slide shows example tables of likelihood zeros.]

59 Examples: Likelihood Zeros. $3^3$ table and the model [12][13][23]: $C_A$ has 207 facets, 27 of which correspond to null margins.

60 Examples: Likelihood Zeros. A $3 \times 4 \times 6$ table and the model [12][13][23]: $C_A$ has 153,858 facets, 54 of which correspond to null margins.

61 Examples: Likelihood Zeros. $2^4$ table and the non-graphical model of all two-way interactions [12][13][14][23][24][34]: $C_A$ has 56 facets, 24 of which correspond to zero margins.

62 Examples: Likelihood Zeros. $2^4$ table and the 4-cycle model [12][14][23][34]: $C_A$ has 1,116 facets, 16 of which correspond to zero margins.

66 Example: The β-model. For the β-model, the convex support is the polytope of degree sequences: $P_v \subset \mathbb{R}^v$. The MLE is not defined whenever $d_i = 0$ or $d_i = v - 1$ for some $i$. These are $2v$ (easy) cases in total; there are many more...
The combinatorial complexity of $P_v$ is large. For example, the $f$-vector of $P_8$ is (Stanley, 1991) $(334982, \dots, 81144, 3322)$. The numbers of facets of $P_4$, $P_5$, $P_6$ and $P_7$ are 22, 60, 224 and 882, and the numbers of vertices are 46, 332, 2874 and 29874, respectively.
Cayley trick: represent the β-model as a log-linear model under product-multinomial constraints, and use a higher-dimensional marginal cone in $\mathbb{R}^{\binom{v}{2} + v}$ with smaller combinatorial complexity.

67 Example: The β-model. When $v = 4$, there are 14 facial sets corresponding to the facets of $P_4$, 8 of which are associated to a degree of 0 or 3. For $v = 5$, there is an example of a facial set for which the degrees are bounded away from 0 and 4; for $v = 6$, an example for which the degrees are bounded away from 0 and 5. [Slides show the corresponding graphs.]

69 Parameter Estimability. Assume the Poisson scheme and suppose that the MLE does not exist: $A^\top n \in \mathrm{ri}(F)$ for some random face $F$ of $C_A$ of random dimension $d_F$. Maximum likelihood estimation is well defined in $\mathcal{E}_F$. What can we do (estimate)?
Let $L_F$ be the subspace generated by the normal cone to $F$. Define an equivalence relation on $\mathbb{R}^d$: $\theta_1 \overset{L_F}{\sim} \theta_2$ if and only if $\theta_1 - \theta_2 \in L_F$. Write $\theta_{L_F}$ for the equivalence class containing $\theta$, and $\Theta_{L_F} = \{\theta_{L_F},\ \theta \in \mathbb{R}^d\}$. Let $\mathcal{F}$ be the facial set associated to $F$ and let $\pi_{\mathcal{F}} \colon \mathbb{R}^{\mathcal{I}} \to \mathbb{R}^{\mathcal{F}}$ be the coordinate projection $x \mapsto \{x(i) \colon i \in \mathcal{F}\}$.

70 Parameter Estimability.
(i) The family $\mathcal{E}_F$ is non-identifiable: any two points $\theta_1 \overset{L_F}{\sim} \theta_2$ specify the same distribution.
(ii) The family $\mathcal{E}_F$ is parametrized by $\Theta_{L_F}$ or, equivalently, by $\pi_{\mathcal{F}}(\mathcal{M})$, and is of order $d_F$.
(iii) The set $\Theta_{L_F}$ is a $d_F$-dimensional vector space comprised of parallel affine subspaces of $\mathbb{R}^d$ of dimension $\dim(L_F) = d - d_F$. It is isomorphic to $\pi_{\mathcal{F}}(\mathcal{M})$.
For product multinomial sampling schemes, replace $L_F$ with $L_F + \{\zeta \colon A\zeta \in \mathcal{N}\}$ and $\pi_{\mathcal{F}}(\mathcal{M})$ with $\pi_{\mathcal{F}}(\mathcal{M} \ominus \mathcal{N})$.

72 Parameter Estimability. For the extended family $\mathcal{E}_F$ with corresponding facial set $\mathcal{F}$:
- $\mathcal{M} \cap (\mathcal{N} + L_F)^{\perp}$ is estimable;
- $\pi_{\mathcal{F}}(\mathcal{V})$ is estimable;
where $L_F$, $\mathcal{F}$ and their dimension $d_F$ are random.
Extended MLEs:
- Natural parametrization: reparametrize $\mathcal{E}_F$ using a new design matrix $A_F$ with $\mathcal{R}(A_F) = \mathcal{M} \cap (\mathcal{N} + L_F)^{\perp}$. The extended MLE of the natural parameter is the MLE (which exists and is unique) of the corresponding family.
- Mean value parametrization: the extended MLE is the unique point
$$\hat m \in \mathrm{bd}(\mathcal{V}) \cap \{x \geq 0 \colon A^\top x = A^\top n\},$$
where $\mathrm{supp}(\hat m) = \mathcal{F}$.
Correct model complexity: the adjusted number of degrees of freedom is $|\mathcal{F}| - d_F$.
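Given the facial set, the adjusted degrees of freedom $|\mathcal{F}| - d_F$ are a one-liner, with $d_F$ computed as the rank of the rows of $A$ indexed by $\mathcal{F}$ (a sketch for the Poisson case):

```python
import numpy as np

def adjusted_df(A, facial_mask):
    """Adjusted degrees of freedom |F| - d_F, where d_F is the dimension
    of the face, i.e. the rank of the rows of A indexed by the facial set."""
    A_F = A[facial_mask]
    return A_F.shape[0] - np.linalg.matrix_rank(A_F)
```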

76 Parameter Estimability: Examples.
- $2^3$ table and the model [12][13][23], with two likelihood zeros: $d_F = |\mathcal{F}| = 6$, a saturated model for $\mathcal{E}_F$!
- $3^3$ table and the model [12][13][23]: $d_F = |\mathcal{F}| = 18$, a saturated model for $\mathcal{E}_F$!
- $3^3$ table and the model [12][13][23], where some zeros (shown in red on the slide) are not likelihood zeros: $d_F = 18$, $|\mathcal{F}| = 21$, so the number of adjusted degrees of freedom is 3.
- $3^3$ table and the model [12][13][23], with a different zero pattern: the MLE exists!
[Slides show the corresponding tables.]

77 Parameter Estimability: Mildew Fungus Example. [Figure: the graphical model over the variables A, B, C, D, E, F, as on slide 20.]

78 Parameter Estimability: Mildew Fungus Example. MIM (Edwards, 2000) selected the optimal model using a greedy stepwise backward model-selection procedure based on testing individual edges. The final model is biologically plausible.

79 Parameter Estimability: Mildew Fungus Example. Sequence of models found by MIM; red boxes on the slide indicate zero margins (the MLE does not exist).

Model                      Unadjusted d.f.   Adjusted d.f.
[ABCDEF]                   0                 0
[ABCEF][ABCDE]             16                3
[BCEF][ABCDE]              24                6
[BCEF][ABCE][ABCD]
[BCEF][ABCE][ABD]
[BCEF][AD][ABCE]
[CEF][AD][ABCE]
[CEF][AD][BCE][ABE]
[CEF][AD][ABE]
[CEF][AD][BE][AB]
[CF][CE][AD][BE][AB]

[The d.f. values for the later models were not recovered.]

83 Extended MLE: Numerical Procedures. How to compute the extended MLE? If $A^\top n \in \mathrm{ri}(F)$ for some face $F$ of $C_A$, then:
- (non-zero) points in the normal cone to $F$ are directions of recession of the negative log-likelihood;
- the Fisher information matrices $I(\theta)$ for $\mathcal{E}_F$ are singular: $\zeta^\top I(\theta)\,\zeta = 0$ for $\zeta \in L_F \subset \mathbb{R}^d$.
Two-step procedure:
Step 1. Identify the facial set $\mathcal{F}$ and compute a basis for $\mathcal{M} \cap (\mathcal{N} + L_F)^{\perp}$. To compute $\mathcal{F}$: given $t = A^\top n$, determine the facial set $\mathcal{F}$ of rows of $A$ which span the face $F$ of $C_A$ such that $t \in \mathrm{ri}(F)$.
Step 2. Optimize the restricted likelihood function for the extended family. When $\mathcal{F}$ is available, this is equivalent to maximizing the likelihood of a log-linear model with structural zeros in $\mathcal{F}^c$. (Easy; see the sketch below.)
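A sketch of Step 2 under Poisson sampling: Newton iterations over the cells of the facial set only, with the remaining cells treated as structural zeros (fitted value 0). The least-squares solve tolerates a rank-deficient information matrix, since only the fitted means, not $\theta$ itself, are identifiable.

```python
import numpy as np

def extended_mle_fit(A, n, facial_mask, iters=50):
    """Step 2 sketch: Poisson ML fit restricted to the facial cells;
    cells outside the facial set are treated as structural zeros."""
    A_F, n_F = A[facial_mask], n[facial_mask]
    theta = np.zeros(A.shape[1])
    for _ in range(iters):
        m = np.exp(A_F @ theta)            # current fitted means on F
        score = A_F.T @ (n_F - m)          # gradient of the log-likelihood
        info = A_F.T @ (m[:, None] * A_F)  # Fisher information (may be singular)
        theta += np.linalg.lstsq(info, score, rcond=None)[0]
    fitted = np.zeros(len(n))
    fitted[facial_mask] = np.exp(A_F @ theta)
    return fitted                          # the extended MLE of m
```

This is a plain Newton sketch; in practice a damped or safeguarded step would be used to guarantee convergence.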

86 Algorithms for the Extended MLE. Step 1 (finding $\mathcal{F}$) is the important one. Let $A_+$ and $A_0$ be the sub-matrices of $A$ whose rows correspond to the positive and zero entries of $n$, respectively. Identifying $\mathcal{F}^c$ is equivalent to finding a vector $c \in \mathbb{R}^d$ such that:
- $A_+ c = 0$;
- $A_0 c \leq 0$;
- the set $\mathrm{supp}(Ac)$ has maximal cardinality.
This can be done with repeated iterations of linear programming (see also Geyer, 2009), as sketched below, or with non-linear methods. For large problems, these can be computationally intensive. Relevance to exact inference: knowledge of $\mathcal{F}$ can help in finding Markov bases.
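A sketch of the repeated-LP idea (Poisson sampling assumed; `linprog` from scipy). Each round tries to push the still-undecided zero cells strictly negative; certified cells move to $\mathcal{F}^c$, and the loop stops when no further cell can be separated, at which point the remaining cells are facial.

```python
import numpy as np
from scipy.optimize import linprog

def facial_set(A, n, tol=1e-9):
    """Boolean mask of the facial set F with A^T n in ri(F),
    found by repeated linear programs (sketch, Poisson sampling)."""
    support = n > 0
    A_pos, A_zero = A[support], A[~support]
    undecided = np.ones(A_zero.shape[0], dtype=bool)
    while undecided.any():
        res = linprog(c=A_zero[undecided].sum(axis=0),
                      A_ub=A_zero, b_ub=np.zeros(A_zero.shape[0]),
                      A_eq=A_pos, b_eq=np.zeros(A_pos.shape[0]),
                      bounds=[(-1, 1)] * A.shape[1], method="highs")
        strict = A_zero[undecided] @ res.x < -tol
        if not strict.any():
            break                       # remaining zero cells lie on the face
        idx = np.where(undecided)[0]
        undecided[idx[strict]] = False  # certified: these cells are in F^c
    face = support.copy()
    idx = np.where(~support)[0]
    face[idx[undecided]] = True         # undecided zero cells are facial
    return face
```

On Haberman's example this should return the six positive cells as the facial set, matching the "saturated model for $\mathcal{E}_F$" example above.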

91 Reducible Hierarchical Log-Linear Models. For reducible hierarchical log-linear models, both tasks can be carried out in parallel over appropriate sub-models. A hierarchical log-linear model (simplicial complex) is reducible if it can be obtained as the direct join of two simplicial subcomplexes (see Lauritzen, 1996); apply the definition recursively. Theoretical justification: reducible models are defined by cuts (Barndorff-Nielsen, 1974). An old idea: Hara et al. (2011), Engström et al. (2011), Sullivant (2007), Eriksson et al. (2006), Dobra and Sullivant (2004), Badsberg and Malvestuto (2001), Frydenberg (1990), Leimer (1993), Tarjan (1985).

92 Still lots of work ahead... The validity and applicability of model selection based on adjusted degrees of freedom have to be investigated, especially in the double-asymptotic framework. Computationally efficient methods for model selection in large tables are still lacking.

93 Thank you.
- Fienberg, S. E. and Rinaldo, A. (2011). Maximum Likelihood Estimation in Log-linear Models.
- Rinaldo, A., Petrović, S. and Fienberg, S. E. (2011). Maximum Likelihood Estimation in Network Models.
- Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009). On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models. Electronic Journal of Statistics, 3.

94-98 Appendix: the edge-triangle exponential random graph model on 9 nodes; the sufficient statistics are the number of edges and the number of triangles. [Figures: the convex support, plotted as number of triangles against number of edges; entropy plots over the mean value space and over the natural parameter space, the latter with the normal fan superimposed.]


More information

Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data

Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data Stephen E. Fienberg Department of Statistics, Machine Learning Department and Cylab Carnegie Mellon University Pittsburgh,

More information

ALGEBRAIC STATISTICAL MODELS

ALGEBRAIC STATISTICAL MODELS Statistica Sinica 7(007), 73-97 ALGEBRAIC STATISTICAL MODELS Mathias Drton and Seth Sullivant University of Chicago and Harvard University Abstract: Many statistical models are algebraic in that they are

More information

Introduction to Probabilistic Graphical Models: Exercises

Introduction to Probabilistic Graphical Models: Exercises Introduction to Probabilistic Graphical Models: Exercises Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Exercise 1: basics

More information

Mixtures of Gaussians. Sargur Srihari

Mixtures of Gaussians. Sargur Srihari Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Parameter Estimation December 14, 2015 Overview 1 Motivation 2 3 4 What did we have so far? 1 Representations: how do we model the problem? (directed/undirected). 2 Inference: given a model and partially

More information

Linear Algebra 1 Exam 2 Solutions 7/14/3

Linear Algebra 1 Exam 2 Solutions 7/14/3 Linear Algebra 1 Exam Solutions 7/14/3 Question 1 The line L has the symmetric equation: x 1 = y + 3 The line M has the parametric equation: = z 4. [x, y, z] = [ 4, 10, 5] + s[10, 7, ]. The line N is perpendicular

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information

The problematic art of counting

The problematic art of counting The problematic art of counting Ragni Piene SMC Stockholm November 16, 2011 Mikael Passare A (very) short history of counting: tokens and so on... Partitions Let n be a positive integer. In how many ways

More information

Maximum likelihood in log-linear models

Maximum likelihood in log-linear models Graphical Models, Lecture 4, Michaelmas Term 2010 October 22, 2010 Generating class Dependence graph of log-linear model Conformal graphical models Factor graphs Let A denote an arbitrary set of subsets

More information

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008 Readings: K&F: 16.3, 16.4, 17.3 Bayesian Param. Learning Bayesian Structure Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University October 6 th, 2008 10-708 Carlos Guestrin 2006-2008

More information

Lecture 8: Graphical models for Text

Lecture 8: Graphical models for Text Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/

More information

Intractable Likelihood Functions

Intractable Likelihood Functions Intractable Likelihood Functions Michael Gutmann Probabilistic Modelling and Reasoning (INFR11134) School of Informatics, University of Edinburgh Spring semester 2018 Recap p(x y o ) = z p(x,y o,z) x,z

More information

Learning with Submodular Functions: A Convex Optimization Perspective

Learning with Submodular Functions: A Convex Optimization Perspective Foundations and Trends R in Machine Learning Vol. 6, No. 2-3 (2013) 145 373 c 2013 F. Bach DOI: 10.1561/2200000039 Learning with Submodular Functions: A Convex Optimization Perspective Francis Bach INRIA

More information

Identification of discrete concentration graph models with one hidden binary variable

Identification of discrete concentration graph models with one hidden binary variable Bernoulli 19(5A), 2013, 1920 1937 DOI: 10.3150/12-BEJ435 Identification of discrete concentration graph models with one hidden binary variable ELENA STANGHELLINI 1 and BARBARA VANTAGGI 2 1 D.E.F.S., Università

More information

COUNTING INTEGER POINTS IN POLYHEDRA. Alexander Barvinok

COUNTING INTEGER POINTS IN POLYHEDRA. Alexander Barvinok COUNTING INTEGER POINTS IN POLYHEDRA Alexander Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html Let P R d be a polytope. We want to compute (exactly or approximately)

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information