Maximum Likelihood Estimation in Loglinear Models


1 Maximum Likelihood Estimation in Loglinear Models. Alessandro Rinaldo, Carnegie Mellon University. Joint work with Steve Fienberg. April 18, 2012. Workshop on Graphical Models: Mathematics, Statistics and Computer Sciences, Fields Institute.

4 Frequentist inference (estimation, goodness-of-fit testing, model selection) in log-linear models relies on the maximum likelihood estimator (MLE). The MLE may not exist due to sampling zeros. Nonexistence of the MLE is largely ignored in practice, yet it is an important issue, particularly in large and sparse tables.

6 Motivating Pathological Example. Consider a $2^3$ table and the model [12][13][23]. If a margin is zero, the MLE does not exist!

9 Motivating Pathological Example. Haberman's example (1974): positive margins and a nonexistent MLE.
- loglin routine in R: Warning message: Algorithm did not converge ... stop("this should not happen")
- glm routine in R: fitted rates numerically 0
- PROC CATMOD routine in SAS: "If you want zero frequencies that PROC CATMOD would normally treat as structural zeros to be interpreted as sampling zeros, simply insert a one-line statement into the data step that changes each zero to a very small number (such as 1E-20)."
In every case: always a perfect fit to the data, and the p-value is always 1.
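As a concrete illustration, here is a minimal numpy sketch (not the authors' code) of iterative proportional fitting for [12][13][23] on a Haberman-style $2^3$ table: the two zero cells sit on a main diagonal and the positive counts are made-up values for illustration. The fitted values at the zero cells shrink toward 0 while the deviance shrinks toward 0, so the algorithm cannot converge even though the fit looks perfect.

```python
import numpy as np

# Haberman-style 2x2x2 table: all [12], [13], [23] margins are positive,
# yet the MLE for the no-three-way-interaction model does not exist.
# The zero cells (0,0,0) and (1,1,1) follow Haberman's pattern; the
# positive counts are arbitrary choices for illustration.
n = np.array([[[0., 2.], [2., 2.]],
              [[2., 2.], [2., 0.]]])

m = np.ones_like(n)  # IPF starting values
for _ in range(5000):
    m *= (n.sum(axis=2) / m.sum(axis=2))[:, :, None]  # match the [12] margin
    m *= (n.sum(axis=1) / m.sum(axis=1))[:, None, :]  # match the [13] margin
    m *= (n.sum(axis=0) / m.sum(axis=0))[None, :, :]  # match the [23] margin

print(m.min())  # keeps shrinking toward 0: the MLE does not exist
pos = n > 0     # G^2 = 2 * sum n log(n/m); zero cells contribute 0 in the limit
print(2 * (n[pos] * np.log(n[pos] / m[pos])).sum())  # tends to 0: "perfect fit"
```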

10 Outline.
- Background on log-linear models and exponential families.
- Existence of the MLE.
- Parameter estimability under a nonexistent MLE.
- Computation of the extended MLE.

12 Log-Linear Models. Consider the exponential family $\{P_\theta,\ \theta \in \mathbb{R}^d\}$ over a finite set of cells $\mathcal{I}$:
$$P_\theta(\{i\}) = \exp\{(\theta, a_i) - \phi(\theta)\}, \quad \theta \in \mathbb{R}^d,\ i \in \mathcal{I},$$
with $a_i \in \mathbb{N}^d \setminus \{0\}$ and $\phi(\theta) = \log \sum_i \exp\{(\theta, a_i)\}$. The model is specified by an $|\mathcal{I}| \times d$ design matrix $A$ whose $i$-th row is $a_i^\top$.
Observe a number $N$ (possibly random) of cells $\{L_1, \dots, L_N\}$, with $L_j \in \mathcal{I}$ for all $j$. The corresponding contingency table is the random vector $n \in \mathbb{N}^{\mathcal{I}}$ with $n(i) = |\{j \colon L_j = i\}|$, $i \in \mathcal{I}$.

13 Log-Linear Models. Log-linear model analysis (see, e.g., Haberman, 1974, and Bishop et al., 2007) is concerned with modeling the distribution of $n$ by assuming that
$$m := \mathbb{E}(n) > 0, \qquad \mu := \log(m) \in \mathcal{M} \subset \mathbb{R}^{\mathcal{I}},$$
where $\mathcal{M} = \mathcal{R}(A)$, the column span of $A$, is the log-linear subspace.
Sampling constraints: let $\mathcal{N} \subset \mathcal{M}$ be a linear subspace of $\mathcal{M}$ of dimension $m \leq d$, the sampling subspace.
Conditional Poisson sampling: $\{n(i),\ i \in \mathcal{I}\}$ has the conditional distribution of $|\mathcal{I}|$ independent Poisson random variables with means $\{\exp(\mu(i)),\ i \in \mathcal{I}\}$, given that $\Pi_{\mathcal{N}} n = c$ for some known $c \in \mathbb{R}^{\mathcal{I}}$, where $\Pi_{\mathcal{N}}$ denotes the orthogonal projection onto $\mathcal{N}$.
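For concreteness, a small sketch of this setup: a design matrix $A$ for the independence model [1][2] on a $2 \times 2$ table, using dummy coding (one of several equally valid codings; only the span $\mathcal{R}(A) = \mathcal{M}$ matters), together with the resulting cell means $m = \exp(A\theta)$.

```python
import numpy as np
from itertools import product

# Cells I of a 2x2 table and a design matrix A for the independence
# model [1][2]: intercept plus dummy-coded main effects.
cells = list(product([0, 1], [0, 1]))
A = np.array([[1, i, j] for (i, j) in cells])  # |I| x d, rows a_i

theta = np.array([0.5, -1.0, 2.0])  # an arbitrary natural parameter
mu = A @ theta                      # log cell means, a point of M = R(A)
m = np.exp(mu)                      # expected cell counts, all positive
print(m)
```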

17 Conditional Poisson Sampling Schemes: Examples.
Poisson likelihood: $\mathcal{N} = \{0\}$. The log-likelihood is
$$(n, \mu) - \sum_i \exp(\mu(i)) - \sum_i \log(n(i)!), \quad \mu \in \mathcal{M}.$$
Product multinomial likelihood: $\mathcal{N} = \mathrm{span}(\chi_1, \dots, \chi_m)$, where the $\chi_j$'s are the indicator functions of a partition $B_1, \dots, B_m$ of $\mathcal{I}$. The sampling constraints are $(n, \chi_j) = N_j > 0$ for all $j$. The log-likelihood is
$$\sum_{j=1}^m \sum_{i \in B_j} n(i) \log \frac{m(i)}{(m, \chi_j)} + \sum_j \log N_j! - \sum_i \log n(i)!,$$
well defined only on $\{\mu \in \mathcal{M} \colon (\chi_j, \exp(\mu)) = N_j,\ j = 1, \dots, m\} \subset \mathcal{M}$.
Poisson-multinomial likelihood (Lang, 2005): a combination of the two.
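The Poisson log-likelihood above is straightforward to evaluate; a short sketch, using `gammaln` for the $\log n(i)!$ terms:

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(mu, n):
    """Poisson log-likelihood (n, mu) - sum_i exp(mu_i) - sum_i log n_i!,
    for mu a point of the log-linear subspace M (i.e. mu = A @ theta)."""
    return n @ mu - np.exp(mu).sum() - gammaln(n + 1).sum()
```

For example, `poisson_loglik(A @ theta, n)` with $A$ and $\theta$ as in the earlier sketch.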

19 Example: Hierarchical Log-Linear Models. Let $X_1, \dots, X_K$ be discrete random variables, each $X_k$ supported on a finite set $\mathcal{I}_k$ of labels. Then $\mathcal{I} = \prod_{k=1}^K \mathcal{I}_k$. A hierarchical log-linear model is specified by a simplicial complex $\Delta$: a class of subsets of $\{1, \dots, K\}$ such that $S \in \Delta$ and $T \subset S$ imply $T \in \Delta$. Graphical models are special cases. The corresponding log-linear subspaces are of ANOVA type, and there are canonical ways of constructing $A$ (see Lauritzen, 1996, Appendix B); a sketch of one such construction follows below. Inferential tasks: estimation, goodness-of-fit testing and model selection.
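A sketch of one such construction, using indicator coding (this yields an overcomplete $A$, but its column span is the model subspace $\mathcal{M}$, which is all that matters here):

```python
import numpy as np
from itertools import product

def hierarchical_design(levels, generators):
    """Indicator-coded design matrix for a hierarchical log-linear model.
    levels: the sizes |I_k|; generators: the maximal faces of the
    simplicial complex, as tuples of variable indices. The matrix is
    overcomplete (not of full rank), but its column span is M."""
    cells = list(product(*[range(l) for l in levels]))
    cols = [np.ones(len(cells))]  # intercept
    for S in generators:
        for combo in product(*[range(levels[k]) for k in S]):
            cols.append(np.array(
                [float(all(cell[k] == c for k, c in zip(S, combo)))
                 for cell in cells]))
    return np.column_stack(cols)

# The no-three-way-interaction model [12][13][23] on a 2x2x2 table:
A = hierarchical_design([2, 2, 2], [(0, 1), (0, 2), (1, 2)])
```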

20 Example: Hierarchical Log-Linear Models. Mildew fungus example: a sparse $2^6$ table. Source: Edwards (2000). [Figure: graphical model over the six binary variables A, B, C, D, E, F.]

24 Example: Network Models, the β-model. Let $\mathcal{G}_v$ be the set of simple graphs on $v$ nodes. β-model: edges are independent and occur with probabilities
$$p_{ij} = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}}, \quad i \neq j, \qquad \beta = (\beta_1, \dots, \beta_v) \in \mathbb{R}^v.$$
The probability of a graph $x \in \mathcal{G}_v$ is
$$\exp\left\{\sum_{i=1}^v d_i \beta_i - \psi(\beta)\right\}, \quad \beta \in \mathbb{R}^v,$$
where $d(x) = d = (d_1, \dots, d_v)$ is the (ordered) degree sequence of $x$. It is a log-linear model under product multinomial sampling.
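A quick sketch of the β-model log-probability of a given graph, read directly off the display above (the adjacency matrix is assumed symmetric with 0/1 entries):

```python
import numpy as np

def beta_model_logprob(adj, beta):
    """log P(x) = sum_i d_i * beta_i - psi(beta), where
    psi(beta) = sum_{i<j} log(1 + exp(beta_i + beta_j))."""
    d = adj.sum(axis=1)                     # degree sequence d(x)
    r, c = np.triu_indices(len(beta), k=1)  # all pairs i < j
    psi = np.log1p(np.exp(beta[r] + beta[c])).sum()
    return d @ beta - psi
```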

27 Exponential Family Representation. It is more convenient to represent the log-linear model likelihood in exponential form, with densities
$$p_\theta(n) = \exp\{(A^\top n, \theta) - \psi(\theta)\}\,\nu(n), \quad \theta \in \mathbb{R}^d,$$
where $n \in S(\mathcal{N}, c) := \{x \in \mathbb{N}^{\mathcal{I}} \colon \Pi_{\mathcal{N}} x = c\}$ and the base measure is
$$\nu(x) = 1_{x \in S(\mathcal{N}, c)} \prod_{i \in \mathcal{I}} \frac{1}{x(i)!}, \quad x \in \mathbb{N}^{\mathcal{I}}.$$
It is a family of order $d - m$, where $m = \dim(\mathcal{N}) \geq 0$. Minimality: replace $A$ with a full-rank $A'$ such that $\mathcal{R}(A') = \mathcal{M} \ominus \mathcal{N} := \mathcal{M} \cap \mathcal{N}^{\perp}$. A better log-likelihood model parametrization for product-multinomial sampling:
$$(n, \beta) - \sum_{j=1}^m N_j \log\left(e^{\beta}, \chi_j\right) - \sum_{i \in \mathcal{I}} \log n(i)!, \quad \beta \in \mathcal{M} \ominus \mathcal{N}.$$

29 Three Views of Log-Linear Models.
1. The exponential family natural parametrization: $\mathbb{R}^{d-m}$.
2. The log-linear model parametrization: $\mathcal{M} \ominus \mathcal{N} \subset \mathbb{R}^{\mathcal{I}}$.
3. Connection with algebraic geometry (see, e.g., Drton et al., 2008). Let $\mathcal{M}^{*} = \{\exp\mu,\ \mu \in \mathcal{M}\}$: the intersection of a real toric variety in $\mathbb{R}^{\mathcal{I}}$ with the positive orthant. The model is parametrized by $\mathcal{V} = \mathcal{M}^{*} \cap \{x \in \mathbb{R}^{\mathcal{I}} \colon \Pi_{\mathcal{N}} x = c\}$. Interpretable parametrization: (conditional) expected cell counts.

33 In log-linear models inference relies on the MLE:
$$\hat\theta = \arg\sup_{\theta \in \mathbb{R}^{d-m}} p_\theta(n).$$
The MLEs $\hat\mu \in \mathcal{M} \ominus \mathcal{N}$ and $\hat m \in \mathcal{V}$ are similarly defined. If the supremum is not achieved, the MLE does not exist; instead the supremum is realized in the limit along sequences
$$\{\theta_n\} \subset \mathbb{R}^{d-m} \colon \|\theta_n\| \to \infty, \qquad \{\mu_n\} \subset \mathcal{M} \ominus \mathcal{N} \colon \|\mu_n\| \to \infty, \qquad \{m_n\} \subset \mathcal{V} \colon m_n(i) \to 0 \text{ for some } i.$$
This is far from being just a numerical issue. An existent MLE is needed to:
- get correct asymptotic approximations to various goodness-of-fit testing statistics in the regular and double-asymptotic settings;
- obtain standard errors for the model parameters;
- obtain proper posteriors under improper priors, and adequate Bayesian inference;
- carry out conditional inference.
Nonexistence of the MLE leads to issues of estimability and of assessing model complexity.

36 A well-known issue in theory: Haberman (1974), Aickin (1979), Glonek et al. (1988), Verbeek (1992), Glonek, Lauritzen (1996).
Haberman's pathological example (1974): a $2^3$ table with two zero cells on a main diagonal, under the model [12][13][23].
In a $2^k$ table with positive entries, set 2 cells to zero at random. Under the model of no $k$-way interaction, a zero margin occurs with probability $\frac{k}{2^k - 1} \to 0$ for large $k$, while a nonexistent MLE leaving all margins positive occurs with probability $\frac{2^{k-1} - k}{2^k - 1} \to \frac{1}{2}$ for large $k$.
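Under this reading, a zero margin arises exactly when the two zero cells differ in a single coordinate, which makes the first probability easy to check by Monte Carlo (a quick sketch, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
k, trials = 10, 200_000
hits = 0
for _ in range(trials):
    a, b = rng.choice(2 ** k, size=2, replace=False)  # two distinct cells
    hits += bin(int(a) ^ int(b)).count("1") == 1      # differ in one coordinate
print(hits / trials, k / (2 ** k - 1))  # both close to 10/1023, about 0.0098
```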

37 Maximum Likelihood Estimation: Goals.
1. Characterize the patterns of sampling zeros leading to nonexistence of the MLE.
2. Statistical consequences of a nonexistent MLE.
3. Algorithms.

42 Basics of Discrete Exponential Families. See Barndorff-Nielsen (1974), Brown (1986), Jensen (1989), Letac (1992), Csiszár and Matúš (2001, 2003, 2005, 2008). The distribution of $T = A^\top n$ belongs to the family $\mathcal{E} = \{P_\theta,\ \theta \in \Theta\}$, with
$$P_\theta(t) = \exp\{\langle\theta, t\rangle - \psi(\theta)\}\,\mu(t), \quad \theta \in \Theta = \mathbb{R}^{d-m},$$
whose support $\mathcal{T} \subset \mathbb{R}^d$ and base measure $\mu$ depend on the sampling scheme. The set $P = \mathrm{convhull}(\mathcal{T})$ is called the convex support of $\mathcal{E}$; it is a polyhedron. The set $\mathrm{int}(P) = \{\mathbb{E}_\theta[T],\ \theta \in \Theta\}$ is the mean value space; $\mathrm{int}(P)$ and $\Theta$ are homeomorphic: the mean value parametrization.
Existence of the MLE: the MLE exists if and only if $t \in \mathrm{int}(P)$.

46 Extended Exponential Families: Geometric Construction. For every face $F$ of $P$, construct the exponential family of distributions $\mathcal{E}_F$ for the sample points in $F$, with convex support $F$. Note that $\mathcal{E}_F$ depends on $\dim(F) < d$ parameters only. The extended exponential family is
$$\bar{\mathcal{E}} = \mathcal{E} \cup \{\mathcal{E}_F \colon F \text{ is a face of } P\}.$$
Within $\bar{\mathcal{E}}$, the MLE always exists. The extended exponential family is the closure of the original family. Geometrically, this corresponds to taking the closure of the mean value space, i.e., including the boundary of $P$.

47 Convex Supports (Mean Value Parametrization).
- Poisson sampling scheme: the marginal cone $C_A = \mathrm{cone}(A)$, the cone spanned by the rows of $A$ (Eriksson et al., 2006).
- Multinomial sampling scheme: the marginal polytope $\mathrm{convhull}(A)$.
- Product multinomial sampling scheme: a Minkowski sum of convex polytopes.
- Poisson-multinomial: a Minkowski sum of a polyhedral cone and convex polytopes.

49 $\mathrm{int}(P)$ is homeomorphic to $\mathcal{V}$, with the homeomorphism given by
$$x \in \mathcal{V} \mapsto A^\top x \in P,$$
known as the moment map. The homeomorphism extends to the boundaries: $\mathrm{cl}(\mathcal{V})$ is also a mean value parametrization of $\bar{\mathcal{E}}$.

51 Maximum Likelihood Estimation: Existence. Existence of the MLE (assume minimality): the MLE of $\theta$ (or of $\mu$ or of $m$) exists (and is unique) if and only if $A^\top n \in \mathrm{ri}(C_A)$, and it satisfies the moment equations
$$\nabla\psi(\hat\theta) = A^\top n$$
or, equivalently, $\hat m = \hat m(\hat\theta)$ is the unique element of
$$\mathcal{V} \cap \{x \geq 0 \colon A^\top x = A^\top n\}.$$
This is an application of the standard theory of exponential families; Haberman (1974) was the first to derive it. The results apply to more general conditional Poisson sampling under additional conditions.

55 Maximum Likelihood Estimation: Existence. We need a more explicit result to characterize the problematic zero entries in the table.
Facial sets (Geiger et al., 2006): a set $\mathcal{F} \subset \mathcal{I}$ for which, for some $c \in \mathbb{R}^d$,
$$(a_i, c) = 0 \ \text{ for } i \in \mathcal{F}, \qquad (a_i, c) < 0 \ \text{ for } i \notin \mathcal{F},$$
where $a_i$ is the $i$-th row of $A$, is called a facial set of $C_A$. Facial sets capture the combinatorial structure of $C_A$ and of $\mathrm{cl}(\mathcal{V})$. The following are equivalent, for a face $F$ of $C_A$ with facial set $\mathcal{F}$:
- $t \in \mathrm{ri}(F)$;
- $t \in \mathrm{cone}(\{a_i,\ i \in \mathcal{F}\})$;
- $t = A^\top m$ for one $m \in \mathrm{cl}(\mathcal{V})$ with $\mathrm{supp}(m) = \mathcal{F}$.
Existence of the MLE: the MLE does not exist if and only if $\{i \colon n(i) = 0\} \supseteq \mathcal{F}^c$ for some proper facial set $\mathcal{F}$.
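The facial-set characterization translates directly into a linear-programming check of existence: the MLE fails exactly when some vector $c$ has $(a_i, c) = 0$ on the support of $n$ and $(a_i, c) < 0$ on at least one zero cell. A sketch assuming Poisson sampling:

```python
import numpy as np
from scipy.optimize import linprog

def mle_exists(A, n, tol=1e-9):
    """True iff A^T n lies in ri(C_A), i.e. there is no facial set F with
    F^c contained in the zero cells of n (Poisson sampling assumed)."""
    A_pos, A_zero = A[n > 0], A[n == 0]
    if A_zero.shape[0] == 0:
        return True
    res = linprog(c=A_zero.sum(axis=0),  # push (a_i, c) negative on zero cells
                  A_ub=A_zero, b_ub=np.zeros(A_zero.shape[0]),
                  A_eq=A_pos, b_eq=np.zeros(A_pos.shape[0]),
                  bounds=[(-1, 1)] * A.shape[1], method="highs")
    return res.fun > -tol  # strictly negative optimum certifies nonexistence
```

Applied to Haberman's $2^3$ example with the indicator design matrix sketched earlier, this should return False even though all margins are positive.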

57 Examples: Likelihood Zeros (facets computed with polymake). $2^3$ table and the model [12][13][23]: $C_A$ has 16 facets, 12 of which correspond to null margins. [Slide shows example tables of likelihood zeros.]

59 Examples: Likelihood Zeros. $3^3$ table and the model [12][13][23]: $C_A$ has 207 facets, 27 of which correspond to null margins.

60 Examples: Likelihood Zeros. A $3 \times 4 \times 6$ table and the model [12][13][23]: $C_A$ has 153,858 facets, 54 of which correspond to null margins.

61 Examples: Likelihood Zeros. $2^4$ table and the non-graphical model of all two-way interactions [12][13][14][23][24][34]: $C_A$ has 56 facets, 24 of which correspond to zero margins.

62 Examples: Likelihood Zeros. $2^4$ table and the 4-cycle model [12][14][23][34]: $C_A$ has 1,116 facets, 16 of which correspond to zero margins.

66 Example: The β-model. For the β-model, the convex support is the polytope of degree sequences: $P_v \subset \mathbb{R}^v$. The MLE is not defined whenever $d_i = 0$ or $d_i = v - 1$ for some $i$. These are $2v$ (easy) cases in total; there are many more...
The combinatorial complexity of $P_v$ is large. For example, the $f$-vector of $P_8$ is (Stanley, 1991) $(334982, \dots, 81144, 3322)$. The numbers of facets of $P_4$, $P_5$, $P_6$ and $P_7$ are 22, 60, 224 and 882, and the numbers of vertices are 46, 332, 2874 and 29874, respectively.
Cayley trick: represent the β-model as a log-linear model under product-multinomial constraints, and use a higher-dimensional marginal cone in $\mathbb{R}^{\binom{v}{2} + v}$ with smaller combinatorial complexity.

67 Example: The β-model. When $v = 4$, there are 14 facial sets corresponding to the facets of $P_4$, 8 of which are associated to a degree of 0 or 3. For $v = 5$, there is an example of a facial set for which the degrees are bounded away from 0 and 4; for $v = 6$, an example for which the degrees are bounded away from 0 and 5. [Slides show the corresponding graphs.]

69 Parameter Estimability. Assume the Poisson scheme and suppose that the MLE does not exist: $A^\top n \in \mathrm{ri}(F)$ for some random face $F$ of $C_A$ of random dimension $d_F$. Maximum likelihood estimation is well defined in $\mathcal{E}_F$. What can we do (estimate)?
Let $L_F$ be the subspace generated by the normal cone to $F$. Define an equivalence relation on $\mathbb{R}^d$: $\theta_1 \overset{L_F}{\sim} \theta_2$ if and only if $\theta_1 - \theta_2 \in L_F$. Write $\theta_{L_F}$ for the equivalence class containing $\theta$, and $\Theta_{L_F} = \{\theta_{L_F},\ \theta \in \mathbb{R}^d\}$. Let $\mathcal{F}$ be the facial set associated to $F$ and let $\pi_{\mathcal{F}} \colon \mathbb{R}^{\mathcal{I}} \to \mathbb{R}^{\mathcal{F}}$ be the coordinate projection $x \mapsto \{x(i) \colon i \in \mathcal{F}\}$.

70 Parameter Estimability.
(i) The family $\mathcal{E}_F$ is non-identifiable: any two points $\theta_1 \overset{L_F}{\sim} \theta_2$ specify the same distribution.
(ii) The family $\mathcal{E}_F$ is parametrized by $\Theta_{L_F}$ or, equivalently, by $\pi_{\mathcal{F}}(\mathcal{M})$, and is of order $d_F$.
(iii) The set $\Theta_{L_F}$ is a $d_F$-dimensional vector space comprised of parallel affine subspaces of $\mathbb{R}^d$ of dimension $\dim(L_F) = d - d_F$. It is isomorphic to $\pi_{\mathcal{F}}(\mathcal{M})$.
For product multinomial sampling schemes, replace $L_F$ with $L_F + \{\zeta \colon A\zeta \in \mathcal{N}\}$ and $\pi_{\mathcal{F}}(\mathcal{M})$ with $\pi_{\mathcal{F}}(\mathcal{M} \ominus \mathcal{N})$.

72 Parameter Estimability. For the extended family $\mathcal{E}_F$ with corresponding facial set $\mathcal{F}$:
- $\mathcal{M} \cap (\mathcal{N} + L_F)^{\perp}$ is estimable;
- $\pi_{\mathcal{F}}(\mathcal{V})$ is estimable;
where $L_F$, $\mathcal{F}$ and their dimension $d_F$ are random.
Extended MLEs:
- Natural parametrization: reparametrize $\mathcal{E}_F$ using a new design matrix $A_F$ with $\mathcal{R}(A_F) = \mathcal{M} \cap (\mathcal{N} + L_F)^{\perp}$. The extended MLE of the natural parameter is the MLE (which exists and is unique) of the corresponding family.
- Mean value parametrization: the extended MLE is the unique point
$$\hat m \in \mathrm{bd}(\mathcal{V}) \cap \{x \geq 0 \colon A^\top x = A^\top n\},$$
where $\mathrm{supp}(\hat m) = \mathcal{F}$.
Correct model complexity: the adjusted number of degrees of freedom is $|\mathcal{F}| - d_F$.
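Given the facial set, the adjusted degrees of freedom $|\mathcal{F}| - d_F$ are a one-liner, with $d_F$ computed as the rank of the rows of $A$ indexed by $\mathcal{F}$ (a sketch for the Poisson case):

```python
import numpy as np

def adjusted_df(A, facial_mask):
    """Adjusted degrees of freedom |F| - d_F, where d_F is the dimension
    of the face, i.e. the rank of the rows of A indexed by the facial set."""
    A_F = A[facial_mask]
    return A_F.shape[0] - np.linalg.matrix_rank(A_F)
```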

76 Parameter Estimability: Examples.
- $2^3$ table and the model [12][13][23], with two likelihood zeros: $d_F = |\mathcal{F}| = 6$, a saturated model for $\mathcal{E}_F$!
- $3^3$ table and the model [12][13][23]: $d_F = |\mathcal{F}| = 18$, a saturated model for $\mathcal{E}_F$!
- $3^3$ table and the model [12][13][23], where some zeros (shown in red on the slide) are not likelihood zeros: $d_F = 18$, $|\mathcal{F}| = 21$, so the number of adjusted degrees of freedom is 3.
- $3^3$ table and the model [12][13][23], with a different zero pattern: the MLE exists!
[Slides show the corresponding tables.]

77 Parameter Estimability: Mildew Fungus Example. [Figure: the graphical model over the variables A, B, C, D, E, F, as on slide 20.]

78 Parameter Estimability: Mildew Fungus Example. MIM (Edwards, 2000) selected the optimal model using a greedy stepwise backward model-selection procedure based on testing individual edges. The final model is biologically plausible.

79 Parameter Estimability: Mildew Fungus Example. Sequence of models found by MIM; red boxes on the slide indicate zero margins (the MLE does not exist).

Model                      Unadjusted d.f.   Adjusted d.f.
[ABCDEF]                   0                 0
[ABCEF][ABCDE]             16                3
[BCEF][ABCDE]              24                6
[BCEF][ABCE][ABCD]
[BCEF][ABCE][ABD]
[BCEF][AD][ABCE]
[CEF][AD][ABCE]
[CEF][AD][BCE][ABE]
[CEF][AD][ABE]
[CEF][AD][BE][AB]
[CF][CE][AD][BE][AB]

[The d.f. values for the later models were not recovered.]

83 Extended MLE: Numerical Procedures. How to compute the extended MLE? If $A^\top n \in \mathrm{ri}(F)$ for some face $F$ of $C_A$, then:
- (non-zero) points in the normal cone to $F$ are directions of recession of the negative log-likelihood;
- the Fisher information matrices $I(\theta)$ for $\mathcal{E}_F$ are singular: $\zeta^\top I(\theta)\,\zeta = 0$ for $\zeta \in L_F \subset \mathbb{R}^d$.
Two-step procedure:
Step 1. Identify the facial set $\mathcal{F}$ and compute a basis for $\mathcal{M} \cap (\mathcal{N} + L_F)^{\perp}$. To compute $\mathcal{F}$: given $t = A^\top n$, determine the facial set $\mathcal{F}$ of rows of $A$ which span the face $F$ of $C_A$ such that $t \in \mathrm{ri}(F)$.
Step 2. Optimize the restricted likelihood function for the extended family. When $\mathcal{F}$ is available, this is equivalent to maximizing the likelihood of a log-linear model with structural zeros in $\mathcal{F}^c$. (Easy; see the sketch below.)
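A sketch of Step 2 under Poisson sampling: Newton iterations over the cells of the facial set only, with the remaining cells treated as structural zeros (fitted value 0). The least-squares solve tolerates a rank-deficient information matrix, since only the fitted means, not $\theta$ itself, are identifiable.

```python
import numpy as np

def extended_mle_fit(A, n, facial_mask, iters=50):
    """Step 2 sketch: Poisson ML fit restricted to the facial cells;
    cells outside the facial set are treated as structural zeros."""
    A_F, n_F = A[facial_mask], n[facial_mask]
    theta = np.zeros(A.shape[1])
    for _ in range(iters):
        m = np.exp(A_F @ theta)            # current fitted means on F
        score = A_F.T @ (n_F - m)          # gradient of the log-likelihood
        info = A_F.T @ (m[:, None] * A_F)  # Fisher information (may be singular)
        theta += np.linalg.lstsq(info, score, rcond=None)[0]
    fitted = np.zeros(len(n))
    fitted[facial_mask] = np.exp(A_F @ theta)
    return fitted                          # the extended MLE of m
```

This is a plain Newton sketch; in practice a damped or safeguarded step would be used to guarantee convergence.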

86 Algorithms for the Extended MLE. Step 1 (finding $\mathcal{F}$) is the important one. Let $A_+$ and $A_0$ be the sub-matrices of $A$ whose rows correspond to the positive and zero entries of $n$, respectively. Identifying $\mathcal{F}^c$ is equivalent to finding a vector $c \in \mathbb{R}^d$ such that:
- $A_+ c = 0$;
- $A_0 c \leq 0$;
- the set $\mathrm{supp}(Ac)$ has maximal cardinality.
This can be done with repeated iterations of linear programming (see also Geyer, 2009), as sketched below, or with non-linear methods. For large problems, these can be computationally intensive. Relevance to exact inference: knowledge of $\mathcal{F}$ can help in finding Markov bases.
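A sketch of the repeated-LP idea (Poisson sampling assumed; `linprog` from scipy). Each round tries to push the still-undecided zero cells strictly negative; certified cells move to $\mathcal{F}^c$, and the loop stops when no further cell can be separated, at which point the remaining cells are facial.

```python
import numpy as np
from scipy.optimize import linprog

def facial_set(A, n, tol=1e-9):
    """Boolean mask of the facial set F with A^T n in ri(F),
    found by repeated linear programs (sketch, Poisson sampling)."""
    support = n > 0
    A_pos, A_zero = A[support], A[~support]
    undecided = np.ones(A_zero.shape[0], dtype=bool)
    while undecided.any():
        res = linprog(c=A_zero[undecided].sum(axis=0),
                      A_ub=A_zero, b_ub=np.zeros(A_zero.shape[0]),
                      A_eq=A_pos, b_eq=np.zeros(A_pos.shape[0]),
                      bounds=[(-1, 1)] * A.shape[1], method="highs")
        strict = A_zero[undecided] @ res.x < -tol
        if not strict.any():
            break                       # remaining zero cells lie on the face
        idx = np.where(undecided)[0]
        undecided[idx[strict]] = False  # certified: these cells are in F^c
    face = support.copy()
    idx = np.where(~support)[0]
    face[idx[undecided]] = True         # undecided zero cells are facial
    return face
```

On Haberman's example this should return the six positive cells as the facial set, matching the "saturated model for $\mathcal{E}_F$" example above.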

91 Reducible Hierarchical Log-Linear Models. For reducible hierarchical log-linear models, both tasks can be carried out in parallel over appropriate sub-models. A hierarchical log-linear model (simplicial complex) is reducible if it can be obtained as the direct join of two simplicial subcomplexes (see Lauritzen, 1996); apply the definition recursively. Theoretical justification: reducible models are defined by cuts (Barndorff-Nielsen, 1974). An old idea: Hara et al. (2011), Engström et al. (2011), Sullivant (2007), Eriksson et al. (2006), Dobra and Sullivant (2004), Badsberg and Malvestuto (2001), Frydenberg (1990), Leimer (1993), Tarjan (1985).

92 Still lots of work ahead... The validity and applicability of model selection based on adjusted degrees of freedom have to be investigated, especially in the double-asymptotic framework. Computationally efficient methods for model selection in large tables are still lacking.

93 Thank you.
- Fienberg, S. E. and Rinaldo, A. (2011). Maximum Likelihood Estimation in Log-linear Models.
- Rinaldo, A., Petrović, S. and Fienberg, S. E. (2011). Maximum Likelihood Estimation in Network Models.
- Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009). On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models. Electronic Journal of Statistics, 3.

94-98 Appendix: the edge-triangle exponential random graph model on 9 nodes; the sufficient statistics are the number of edges and the number of triangles. [Figures: the convex support, plotted as number of triangles against number of edges; entropy plots over the mean value space and over the natural parameter space, the latter with the normal fan superimposed.]


More information

Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data

Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data Stephen E. Fienberg Department of Statistics, Machine Learning Department and Cylab Carnegie Mellon University Pittsburgh,

More information

ALGEBRAIC STATISTICAL MODELS

ALGEBRAIC STATISTICAL MODELS Statistica Sinica 7(007), 73-97 ALGEBRAIC STATISTICAL MODELS Mathias Drton and Seth Sullivant University of Chicago and Harvard University Abstract: Many statistical models are algebraic in that they are

More information

Introduction to Probabilistic Graphical Models: Exercises

Introduction to Probabilistic Graphical Models: Exercises Introduction to Probabilistic Graphical Models: Exercises Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Exercise 1: basics

More information

Mixtures of Gaussians. Sargur Srihari

Mixtures of Gaussians. Sargur Srihari Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Parameter Estimation December 14, 2015 Overview 1 Motivation 2 3 4 What did we have so far? 1 Representations: how do we model the problem? (directed/undirected). 2 Inference: given a model and partially

More information

Linear Algebra 1 Exam 2 Solutions 7/14/3

Linear Algebra 1 Exam 2 Solutions 7/14/3 Linear Algebra 1 Exam Solutions 7/14/3 Question 1 The line L has the symmetric equation: x 1 = y + 3 The line M has the parametric equation: = z 4. [x, y, z] = [ 4, 10, 5] + s[10, 7, ]. The line N is perpendicular

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information

The problematic art of counting

The problematic art of counting The problematic art of counting Ragni Piene SMC Stockholm November 16, 2011 Mikael Passare A (very) short history of counting: tokens and so on... Partitions Let n be a positive integer. In how many ways

More information

Maximum likelihood in log-linear models

Maximum likelihood in log-linear models Graphical Models, Lecture 4, Michaelmas Term 2010 October 22, 2010 Generating class Dependence graph of log-linear model Conformal graphical models Factor graphs Let A denote an arbitrary set of subsets

More information

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008 Readings: K&F: 16.3, 16.4, 17.3 Bayesian Param. Learning Bayesian Structure Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University October 6 th, 2008 10-708 Carlos Guestrin 2006-2008

More information

Lecture 8: Graphical models for Text

Lecture 8: Graphical models for Text Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/

More information

Intractable Likelihood Functions

Intractable Likelihood Functions Intractable Likelihood Functions Michael Gutmann Probabilistic Modelling and Reasoning (INFR11134) School of Informatics, University of Edinburgh Spring semester 2018 Recap p(x y o ) = z p(x,y o,z) x,z

More information

Learning with Submodular Functions: A Convex Optimization Perspective

Learning with Submodular Functions: A Convex Optimization Perspective Foundations and Trends R in Machine Learning Vol. 6, No. 2-3 (2013) 145 373 c 2013 F. Bach DOI: 10.1561/2200000039 Learning with Submodular Functions: A Convex Optimization Perspective Francis Bach INRIA

More information

Identification of discrete concentration graph models with one hidden binary variable

Identification of discrete concentration graph models with one hidden binary variable Bernoulli 19(5A), 2013, 1920 1937 DOI: 10.3150/12-BEJ435 Identification of discrete concentration graph models with one hidden binary variable ELENA STANGHELLINI 1 and BARBARA VANTAGGI 2 1 D.E.F.S., Università

More information

COUNTING INTEGER POINTS IN POLYHEDRA. Alexander Barvinok

COUNTING INTEGER POINTS IN POLYHEDRA. Alexander Barvinok COUNTING INTEGER POINTS IN POLYHEDRA Alexander Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html Let P R d be a polytope. We want to compute (exactly or approximately)

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information