An Algebraic and Geometric Perspective on Exponential Families

Caroline Uhler (IST Austria)
Based on two papers: with Mateusz Michałek, Bernd Sturmfels, and Piotr Zwiernik; and with Liam Solus and Ruriko Yoshida
Current Trends on Gröbner Bases, July 10, 2015
Caroline Uhler (IST Austria), Exponential Varieties, Osaka, July 2015

Gaussian Graphical Models
A random vector $X \in \mathbb{R}^m$ follows a multivariate Gaussian distribution with concentration matrix $\theta \in S^m_{\succ 0}$ if it has density
$p_\theta(x) = (2\pi)^{-m/2} \det(\theta)^{1/2} \exp\left(-\tfrac{1}{2} x^T \theta x\right)$.
$\theta_{ij} = 0$ if and only if $X_i \perp\!\!\!\perp X_j \mid X_{\{1,\dots,m\}\setminus\{i,j\}}$.
These conditional independence relations are represented by an undirected graph.
Figures: (a) gene interactome (Novarino et al., Science 343, 2014); (b) stock market (Garos & Panos, Physica A 380, 2007).
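A minimal numerical sketch of the statement that $\theta_{ij} = 0$ encodes conditional independence, assuming NumPy; the 3 × 3 matrix is a hypothetical example, not data from the talk. For the path graph 1 – 2 – 3, the conditional covariance of $(X_1, X_3)$ given $X_2$, computed as a Schur complement of $\Sigma = \theta^{-1}$, has a vanishing off-diagonal entry, even though $X_1$ and $X_3$ are marginally correlated.

```python
import numpy as np

# Hypothetical concentration matrix for the path graph 1 - 2 - 3 (theta[0, 2] = 0).
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
sigma = np.linalg.inv(theta)  # covariance matrix Sigma = theta^{-1}


def conditional_covariance(sigma, idx, rest):
    """Cov(X_idx | X_rest) for a Gaussian vector, via the Schur complement of Sigma."""
    s_aa = sigma[np.ix_(idx, idx)]
    s_ab = sigma[np.ix_(idx, rest)]
    s_bb = sigma[np.ix_(rest, rest)]
    return s_aa - s_ab @ np.linalg.solve(s_bb, s_ab.T)


cond = conditional_covariance(sigma, [0, 2], [1])
print("marginal Cov(X1, X3):        ", sigma[0, 2])  # nonzero
print("conditional Cov(X1, X3 | X2):", cond[0, 1])   # zero up to rounding
```

The conditional covariance also equals the inverse of the corresponding 2 × 2 block of $\theta$, which is diagonal here.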

Exponential Families
An exponential family is a parametric statistical model
$p_\theta(x) = \exp\left(\langle \theta, T(x)\rangle - A(\theta)\right)$
with sample space $\mathcal{X}$, base measure $\nu$ on $\mathcal{X}$, and measurable sufficient statistics $T : \mathcal{X} \to \mathbb{R}^d$. Here
$A(\theta) = \log \int_{\mathcal{X}} \exp(\langle \theta, T(x)\rangle)\,\nu(dx)$
is the log-partition function.
Theorem. The following sets are convex:
Space of canonical parameters: $C = \{\theta \in \mathbb{R}^d : A(\theta) < +\infty\}$
Space of sufficient statistics: $K = \mathrm{conv}(T(\mathcal{X})) \subseteq \mathbb{R}^d$
Suppose $C$ is open and $K$ spans $\mathbb{R}^d$. Then the gradient map $F : \mathbb{R}^d \to \mathbb{R}^d,\ \theta \mapsto \nabla A(\theta)$ defines an analytic bijection between $C$ and $\mathrm{int}(K)$.
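As a hedged illustration of the theorem (the Bernoulli family is our choice of example, not one from the slides): take $\mathcal{X} = \{0, 1\}$, counting measure $\nu$, and $T(x) = x$. Then $A(\theta) = \log(1 + e^\theta)$, $C = \mathbb{R}$ is open, $K = [0, 1]$, and the gradient map is the logistic function, an analytic bijection from $C$ onto $\mathrm{int}(K) = (0, 1)$. Inverting it at an observed sample mean $t$ gives the maximum likelihood estimate, which exists only when $t$ lies in the interior of $K$. The function names below are hypothetical.

```python
import math


def log_partition(theta):
    """A(theta) = log(sum_x exp(theta * x)) for X = {0, 1} with counting measure."""
    return math.log1p(math.exp(theta))


def gradient_map(theta):
    """F(theta) = A'(theta) = e^theta / (1 + e^theta), the mean of T(X) = X."""
    return 1.0 / (1.0 + math.exp(-theta))


def inverse_gradient_map(t):
    """Inverse of F on int(K) = (0, 1); this is the MLE for an observed sample mean t."""
    if not 0.0 < t < 1.0:
        raise ValueError("the MLE exists only for t in int(K) = (0, 1)")
    return math.log(t / (1.0 - t))


# Central-difference check that F really is the derivative of A.
h = 1e-6
theta = 0.7
numeric = (log_partition(theta + h) - log_partition(theta - h)) / (2 * h)
print(numeric, gradient_map(theta))                # should agree closely
print(inverse_gradient_map(gradient_map(theta)))   # recovers theta up to rounding
```

A sample mean of exactly 0 or 1 lies on the boundary of $K$, and the sketch refuses it, mirroring the non-existence of the MLE there.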

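From Analysis to Algebra
Our exponential families satisfy $A(\theta) = -\alpha \log f(\theta)$, where $f(\theta)$ is a homogeneous polynomial and $\alpha > 0$.
The gradient of the log-partition function is then the rational function
$F : \mathbb{R}^d \dashrightarrow \mathbb{R}^d,\quad \theta \mapsto -\frac{\alpha}{f(\theta)} \left(\frac{\partial f}{\partial \theta_1}, \frac{\partial f}{\partial \theta_2}, \dots, \frac{\partial f}{\partial \theta_d}\right)$.
Algebraic geometers prefer the projective version
$F : \mathbb{CP}^{d-1} \dashrightarrow \mathbb{CP}^{d-1},\quad \theta \mapsto \left(\frac{\partial f}{\partial \theta_1} : \frac{\partial f}{\partial \theta_2} : \cdots : \frac{\partial f}{\partial \theta_d}\right)$.
The partition function $f(\theta)^{-\alpha} = \int_K \exp(\langle \theta, x\rangle)\,\nu(dx)$ admits a nice integral representation.
Which polynomials $f(\theta)$, exponents $\alpha > 0$, and convex sets $C, K \subseteq \mathbb{R}^d$ are possible?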
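One consequence worth checking numerically: with $F(\theta) = -\alpha\,\nabla f(\theta)/f(\theta)$, the gradient of $A(\theta) = -\alpha \log f(\theta)$, and $f$ homogeneous of degree $k$, Euler's relation $\sum_i \theta_i\, \partial f/\partial \theta_i = k f$ gives $\langle \theta, F(\theta)\rangle = -\alpha k$. Moreover $F(\lambda\theta) = \lambda^{-1} F(\theta)$, which is why $F$ descends to a map on projective space. The sketch below approximates $F$ by central differences; the polynomial (the elementary symmetric cubic in four variables) and the value $\alpha = 1.5$ are hypothetical choices for illustration.

```python
def e3(th):
    """Elementary symmetric polynomial of degree 3 in four variables."""
    a, b, c, d = th
    return a * b * c + a * b * d + a * c * d + b * c * d


def gradient_map(f, theta, alpha, h=1e-6):
    """F(theta) = -alpha * grad f(theta) / f(theta), with grad f by central differences."""
    f_theta = f(theta)
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return [-alpha * g / f_theta for g in grad]


alpha, degree = 1.5, 3
theta = [1.0, 2.0, 3.0, 4.0]
F = gradient_map(e3, theta, alpha)
# Euler's relation: <theta, F(theta)> = -alpha * degree
print(sum(t * x for t, x in zip(theta, F)), -alpha * degree)
# Homogeneity of degree -1: F(2 theta) = F(theta) / 2
F2 = gradient_map(e3, [2 * t for t in theta], alpha)
print(max(abs(x2 - x / 2) for x, x2 in zip(F, F2)))
```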

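Multivariate Gaussian Distribution
The multivariate Gaussian distribution is the exponential family
$p_\theta(x) = (2\pi)^{-m/2} \det(\theta)^{1/2} \exp\left(-\tfrac{1}{2} x^T\theta x\right) = \exp\left(\left\langle \theta, -\tfrac{1}{2} xx^T \right\rangle - \left(-\tfrac{1}{2}\log\det(\theta) + \tfrac{m}{2}\log(2\pi)\right)\right)$
with
$\mathcal{X} = \mathbb{R}^m$, $\nu$ = Lebesgue measure on $\mathcal{X}$, $d = \binom{m+1}{2}$, $\langle A, B\rangle = \operatorname{tr}(AB)$,
$T(x) = -\tfrac{1}{2} xx^T$, $A(\theta) = -\tfrac{1}{2}\log\det(\theta) + \tfrac{m}{2}\log(2\pi)$,
$C = S^m_{\succ 0}$, $K = -S^m_{\succeq 0}$, $F(\theta) = -\tfrac{1}{2}\theta^{-1}$.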
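A quick numerical sanity check of $F(\theta) = -\tfrac12 \theta^{-1}$ for $A(\theta) = -\tfrac12 \log\det(\theta) + \tfrac{m}{2}\log(2\pi)$, as a sketch assuming NumPy; the random matrices are illustrative only. The directional derivative of $A$ in a symmetric direction $H$ should equal $\langle F(\theta), H\rangle = \operatorname{tr}(F(\theta) H)$.

```python
import numpy as np


def log_partition(theta):
    """A(theta) = -1/2 log det(theta) + m/2 log(2 pi) for the Gaussian family."""
    m = theta.shape[0]
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        raise ValueError("theta must be positive definite")
    return -0.5 * logdet + 0.5 * m * np.log(2 * np.pi)


def gradient_map(theta):
    """F(theta) = -1/2 theta^{-1}, the mean of T(X) = -1/2 X X^T."""
    return -0.5 * np.linalg.inv(theta)


rng = np.random.default_rng(0)
m = 3
B = rng.standard_normal((m, m))
theta = B @ B.T + m * np.eye(m)               # random positive definite concentration matrix
H = rng.standard_normal((m, m))
H = (H + H.T) / 2                             # random symmetric direction
h = 1e-6
numeric = (log_partition(theta + h * H) - log_partition(theta - h * H)) / (2 * h)
analytic = np.trace(gradient_map(theta) @ H)  # <F(theta), H> = tr(F(theta) H)
print(numeric, analytic)
```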

Duality of Polytopes
Example (How to morph a cube into an octahedron?) [Sturmfels & Uhler 2010, Example 3.5]

Exponential Family for the Cube and the Octahedron
Fix the product of linear forms $f(\theta) = (\theta_1^2 - \theta_4^2)(\theta_2^2 - \theta_4^2)(\theta_3^2 - \theta_4^2)$.
The space of canonical parameters is $C$ = the cone over the 3-cube $\{|\theta_i| < 1 : i = 1, 2, 3\}$ (in the affine chart $\theta_4 = 1$).
The space of sufficient statistics is $K$ = the cone over the octahedron $\mathrm{conv}\{\pm e_1, \pm e_2, \pm e_3\}$.
The gradient map $\nabla f : \mathbb{P}^3 \dashrightarrow \mathbb{P}^3$ gives a bijection between $C$ and $\mathrm{int}(K)$.
Question: What is $(\mathcal{X}, \nu, T)$ in this case?
Answer: $\mathcal{X} = K$, $T = \mathrm{id}$, and $\nu$ is constructed via hypergeometric functions.
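To see the cube-to-octahedron duality numerically, the sketch below (plain Python; the function name is ours) evaluates the gradient of $\log f$, for $f = (\theta_1^2 - \theta_4^2)(\theta_2^2 - \theta_4^2)(\theta_3^2 - \theta_4^2)$, at points $(\theta_1, \theta_2, \theta_3, 1)$ of the cube chart and rescales the result so that its fourth coordinate is 1. Since $|2\theta_i/(\theta_i^2 - 1)| < 2/(1 - \theta_i^2)$ inside the cube, the image always satisfies $|\sigma_1| + |\sigma_2| + |\sigma_3| < 1$, i.e. it lands in the open octahedron, and points near a facet $\theta_i = \pm 1$ of the cube are sent near the vertex $\mp e_i$ of the octahedron. The check only confirms where the image lies; bijectivity is the slide's claim, not something this code proves.

```python
import random


def gradient_map_affine(t1, t2, t3):
    """Image of (t1, t2, t3, 1) under grad log f, f = prod_i (t_i^2 - t4^2),
    rescaled so that the fourth coordinate equals 1."""
    ts = (t1, t2, t3)
    # d/dt_i log f = 2 t_i / (t_i^2 - t4^2), evaluated at t4 = 1
    g = [2.0 * t / (t * t - 1.0) for t in ts]
    # d/dt4 log f = sum_i -2 t4 / (t_i^2 - t4^2) at t4 = 1; positive inside the cube
    g4 = sum(-2.0 / (t * t - 1.0) for t in ts)
    return [gi / g4 for gi in g]


random.seed(1)
for _ in range(5):
    point = [random.uniform(-0.999, 0.999) for _ in range(3)]
    sigma = gradient_map_affine(*point)
    print(point, sigma, sum(abs(s) for s in sigma))  # l1-norm stays below 1

print(gradient_map_affine(0.999, 0.0, 0.0))  # near the facet theta_1 = 1: close to -e_1
```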

Hyperbolic Polynomials
A homogeneous polynomial $f \in \mathbb{R}[\theta_1, \dots, \theta_d]$ of degree $k$ is hyperbolic if, for some $t \in \mathbb{R}^d$, every line through $t$ intersects the complex hypersurface $\{f = 0\}$ in $k$ real points (counted with multiplicity). The connected component $C$ of $t$ in $\mathbb{R}^d \setminus \{f = 0\}$ is the hyperbolicity cone; it is an open convex cone.
Theorem (Scott & Sokal, 2015). Let $f$ be a homogeneous polynomial in $\mathbb{R}[\theta_1, \dots, \theta_d]$ that is strictly positive on an open convex cone $C$. If there exist $\alpha > 0$ and a measure $\nu$ such that
$f(\theta)^{-\alpha} = \int_{C^*} \exp(-\langle \theta, \sigma\rangle)\,\nu(d\sigma)$ for all $\theta \in C$,
then $f$ is hyperbolic with respect to each point in $C$.
The resulting statistical models are hyperbolic exponential families.
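The definition suggests a simple randomized test, sketched here with NumPy as a heuristic of our own (it can only suggest hyperbolicity, never prove it): sample points $x$, recover the univariate polynomial $s \mapsto f(x + s\,t)$ by interpolation, and check that all of its roots are real. With the hypothetical helper names below, the elementary symmetric cubic $\theta_1\theta_2\theta_3 + \theta_1\theta_2\theta_4 + \theta_1\theta_3\theta_4 + \theta_2\theta_3\theta_4$ passes with $t = (1, 1, 1, 1)$, the Lorentz form $\theta_1^2 - \theta_2^2 - \theta_3^2$ passes with $t = (1, 0, 0)$, and $\theta_1^2 + \theta_2^2$ fails with $t = (1, 0)$.

```python
import numpy as np


def restricted_roots(f, x, t, degree):
    """Roots in s of the univariate polynomial s -> f(x + s * t), recovered by interpolation."""
    x, t = np.asarray(x, dtype=float), np.asarray(t, dtype=float)
    nodes = np.arange(degree + 1, dtype=float)  # degree + 1 nodes determine the polynomial
    values = [f(x + s * t) for s in nodes]
    coeffs = np.polyfit(nodes, values, degree)
    return np.roots(coeffs)


def looks_hyperbolic(f, t, degree, trials=200, seed=0, tol=1e-6):
    """Randomized heuristic: do sampled lines {x + s t} meet {f = 0} only in real points?"""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_normal(len(t))
        roots = restricted_roots(f, x, t, degree)
        if np.max(np.abs(roots.imag)) > tol * max(1.0, np.max(np.abs(roots))):
            return False
    return True


def e3(th):
    a, b, c, d = th
    return a * b * c + a * b * d + a * c * d + b * c * d


print(looks_hyperbolic(e3, [1, 1, 1, 1], 3))                                       # True
print(looks_hyperbolic(lambda th: th[0] ** 2 - th[1] ** 2 - th[2] ** 2, [1, 0, 0], 2))  # True
print(looks_hyperbolic(lambda th: th[0] ** 2 + th[1] ** 2, [1, 0], 2))             # False
```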

Riesz Kernel
Theorem (Gårding 1951). Let $f \in \mathbb{R}[\theta_1, \dots, \theta_d]$ be hyperbolic with hyperbolicity cone $C$. If $\alpha > d$, then the following integral converges for any $\theta \in C$, is independent of $\theta$, and is supported on $K = C^*$:
$q_\alpha(\sigma) = (2\pi)^{-d} \int_{\mathbb{R}^d} f(\theta + i\eta)^{-\alpha} \exp(\langle \theta + i\eta, \sigma\rangle)\,d\eta$.
The polynomial $f$ can be recovered from the Riesz kernel $q_\alpha$ via
$f(\theta)^{-\alpha} = \int_K \exp(-\langle \theta, \sigma\rangle)\, q_\alpha(\sigma)\,d\sigma$ for all $\theta \in C$.
Given a hyperbolic polynomial, what is the annihilator of the Riesz kernel?
If $f$ is a product of linear forms, what is the GKZ system / D-ideal?
What if $f$ is an elementary symmetric polynomial?
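In the simplest case $d = 1$, $f(\theta) = \theta$, $C = (0, \infty)$ and $K = [0, \infty)$, the Riesz kernel is $q_\alpha(\sigma) = \sigma^{\alpha - 1}/\Gamma(\alpha)$ for $\sigma \ge 0$ and $0$ for $\sigma < 0$, since $\int_0^\infty e^{-\theta\sigma}\sigma^{\alpha-1}\,d\sigma = \Gamma(\alpha)\,\theta^{-\alpha}$. The NumPy sketch below evaluates the defining integral with a midpoint rule truncated to $|\eta| \le 2000$ (truncation and step size are our choices, as is $\alpha = 2.5 > d$) and compares it with this closed form, illustrating both the formula and the support on $K$.

```python
import math

import numpy as np


def riesz_kernel_1d(sigma, alpha, theta=1.0, half_width=2000.0, step=0.01):
    """Approximate q_alpha(sigma) = (2 pi)^{-1} int f(theta + i eta)^{-alpha} e^{(theta + i eta) sigma} d eta
    for f(theta) = theta, using a midpoint rule on [-half_width, half_width]."""
    eta = np.arange(-half_width, half_width, step) + step / 2
    s = theta + 1j * eta
    integrand = s ** (-alpha) * np.exp(s * sigma)  # principal branch, valid for Re(s) > 0
    return float((integrand.sum() * step / (2 * np.pi)).real)


def riesz_kernel_exact(sigma, alpha):
    """Closed form for d = 1: sigma^(alpha - 1) / Gamma(alpha) on K = [0, inf), zero outside."""
    return sigma ** (alpha - 1) / math.gamma(alpha) if sigma >= 0 else 0.0


for sig in (-1.0, 0.5, 1.0, 2.0):
    print(sig, riesz_kernel_1d(sig, alpha=2.5), riesz_kernel_exact(sig, alpha=2.5))
```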

Symmetric Determinant
$f(\theta) = \det(\theta)$ is a hyperbolic polynomial in $d = \binom{m+1}{2}$ unknowns.
Its hyperbolicity cone is $C = S^m_{\succ 0}$; its dual is $K = C^* = S^m_{\succeq 0}$.
$f(\theta)$ has the integral representation $f(\theta)^{-\alpha} = \int_K \exp(-\langle \theta, \sigma\rangle)\,\nu(d\sigma)$ for all $\theta \in C$ if and only if $\alpha \in \{0, \tfrac{1}{2}, 1, \dots, \tfrac{m-1}{2}\}$ or $\alpha > \tfrac{m-1}{2}$.
The measure $\nu(d\sigma) = q_\alpha(\sigma)\,d\sigma$ is given by the Wishart density (the measure induced on $S^m_{\succeq 0}$ by the multivariate Gaussian distribution on $\mathbb{R}^m$).
Riesz kernel (for $\alpha > \tfrac{m-1}{2}$): $q_\alpha(\sigma) = \frac{1}{\Gamma_m(\alpha)} \det(\sigma)^{\alpha - \frac{m+1}{2}}$.
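The half-integers $\alpha = n/2$ can be observed by simulation. If $S = \sum_{k=1}^n X_k X_k^T$ with $X_k$ independent $N(0, I_m)$, then $\mathbb{E}[\exp(-\langle \theta, S\rangle)] = \det(I + 2\theta)^{-n/2} = 2^{-mn/2} \det(\theta + \tfrac12 I)^{-n/2}$. This is a power $-\alpha$ of the determinant with $\alpha = n/2$, where the shift by $\tfrac12 I$ comes from the Gaussian factor in the Wishart density. The Monte Carlo sketch below assumes NumPy; the sample size, seed, and $\theta$ are arbitrary choices. It compares the two sides.

```python
import numpy as np


def wishart_laplace_mc(theta, n, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[exp(-<theta, S>)], S = sum_{k=1}^n X_k X_k^T, X_k ~ N(0, I_m)."""
    m = theta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_samples, n, m))
    S = np.einsum("bki,bkj->bij", X, X)       # one m x m Wishart matrix per sample
    inner = np.einsum("ij,bij->b", theta, S)  # <theta, S> = tr(theta S) for symmetric matrices
    return float(np.exp(-inner).mean())


theta = np.array([[1.0, 0.3], [0.3, 0.5]])
n = 3                                          # alpha = n / 2 = 1.5
mc = wishart_laplace_mc(theta, n)
exact = np.linalg.det(np.eye(2) + 2 * theta) ** (-n / 2)
print(mc, exact)                               # should agree to a few decimal places
```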

Hyperbolic Exponential Families: Another Example
The space of canonical parameters $C$ is the hyperbolicity cone of $f = \theta_1\theta_2\theta_3 + \theta_1\theta_2\theta_4 + \theta_1\theta_3\theta_4 + \theta_2\theta_3\theta_4$.

Hyperbolic Exponential Families: Another Example
The space of sufficient statistics $K = C^*$ is defined by the Steiner surface
$\sum_{i} \sigma_i^4 - 4\sum_{i \neq j} \sigma_i^3\sigma_j + 6\sum_{i<j} \sigma_i^2\sigma_j^2 + 4\sum_{i}\sum_{j<k;\ j,k \neq i} \sigma_i^2\sigma_j\sigma_k - 40\,\sigma_1\sigma_2\sigma_3\sigma_4 = 0$.
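The Steiner quartic can be checked numerically as the image of the cubic surface $\{f = 0\}$, for $f = \theta_1\theta_2\theta_3 + \theta_1\theta_2\theta_4 + \theta_1\theta_3\theta_4 + \theta_2\theta_3\theta_4$, under $\nabla f$. This is our own sanity check in plain Python. At a point with all $\theta_i \neq 0$ and $f(\theta) = 0$ we have $\sum_i 1/\theta_i = 0$ and $\partial f/\partial \theta_i = -\theta_1\theta_2\theta_3\theta_4/\theta_i^2$, so the gradient should make the quartic vanish. The sums in the quartic are read here as running over distinct indices, with $i < j$ in the third sum and $j < k$ in the fourth. That reading is an interpretation, and this check supports it.

```python
from itertools import combinations, permutations


def grad_e3(th):
    """Gradient of f = e3(theta_1, ..., theta_4): entry i is e2 of the other three variables."""
    return [sum(th[j] * th[k] for j, k in combinations([x for x in range(4) if x != i], 2))
            for i in range(4)]


def steiner(s):
    """Steiner quartic, with all sums over distinct indices."""
    idx = range(4)
    q1 = sum(x ** 4 for x in s)
    q2 = sum(s[i] ** 3 * s[j] for i, j in permutations(idx, 2))
    q3 = sum(s[i] ** 2 * s[j] ** 2 for i, j in combinations(idx, 2))
    q4 = sum(s[i] ** 2 * s[j] * s[k]
             for i in idx for j, k in combinations([x for x in idx if x != i], 2))
    return q1 - 4 * q2 + 6 * q3 + 4 * q4 - 40 * s[0] * s[1] * s[2] * s[3]


def point_on_cubic(a, b, c):
    """A point of {e3 = 0}: solve 1/a + 1/b + 1/c + 1/theta_4 = 0 for theta_4."""
    return [a, b, c, -1.0 / (1 / a + 1 / b + 1 / c)]


sigma = grad_e3(point_on_cubic(1.0, 2.0, 3.0))
scale = sum(x * x for x in sigma) ** 2  # |sigma|^4, since the quartic is homogeneous of degree 4
print(steiner(sigma) / scale)           # zero up to rounding
print(steiner([1.0, 2.0, 3.0, 4.0]))    # a generic point is not on the surface
```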

Duality
The gradient map $\nabla f : \mathbb{P}^3 \dashrightarrow \mathbb{P}^3$ gives a bijection between $C$ and $K$.
Open problem: What is the Riesz kernel?

Intersecting with a Subspace
Fix an exponential family with rational gradient map $F : C \to K$.
Main case: $F = \nabla f$ (up to scaling), where $f$ is hyperbolic.
Consider a linear subspace $L \subseteq \mathbb{R}^d$ such that $C_L := L \cap C$ is nonempty.

Exponential Varieties
The exponential variety is the image of $L$ under the gradient map: $L^F := F(L) \subseteq \mathbb{P}^{d-1}$. Its positive part $L^F_{\geq 0}$ lives in $K$.

Convexity and Positivity
Theorem. Let $(\mathcal{X}, \nu, T)$ be an exponential family with rational gradient map $F : \mathbb{R}^d \dashrightarrow \mathbb{R}^d$, and let $L \subseteq \mathbb{R}^d$ be a linear subspace. The restricted gradient map $F_L$ is the composition
$C_L \hookrightarrow C \xrightarrow{F} K \xrightarrow{\pi_L} K_L$.
The convex set $C_L$ of canonical parameters maps bijectively to the positive exponential variety $L^F_{\geq 0}$, and $L^F_{\geq 0}$ maps bijectively to the interior of the convex set $K_L$ of sufficient statistics.
Maximum likelihood estimation for an exponential variety means inverting these two bijections (by solving polynomial equations).
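A toy instance of this recipe, using our own hypothetical model rather than one from the talk: three independent exponential random variables with rates $\theta = (\theta_1, \theta_2, \theta_3)$, so $T(x) = -x$, $A(\theta) = -\log(\theta_1\theta_2\theta_3)$ and $F(\theta) = -(1/\theta_1, 1/\theta_2, 1/\theta_3)$. Take $L = \mathrm{span}\{(1,1,0), (0,1,1)\}$, i.e. $\theta = (a, a+b, b)$. With sample means $\bar x$, the likelihood equations $\pi_L(F(\theta)) = \pi_L(-\bar x)$ read $1/a + 1/(a+b) = u$ and $1/b + 1/(a+b) = v$, where $u = \bar x_1 + \bar x_2$ and $v = \bar x_2 + \bar x_3$. Substituting $a = s/(us - 1)$ and $b = s/(vs - 1)$ into $s = a + b$ leaves the quadratic $uv\,s^2 - 2(u+v)\,s + 3 = 0$. The algebraic degree is therefore 2, and at most one root should yield a point of $C_L$.

```python
import math


def mle_tied_exponentials(u, v):
    """Solve 1/a + 1/(a+b) = u and 1/b + 1/(a+b) = v for a, b > 0 (toy model theta = (a, a+b, b))."""
    # With s = a + b: a = s / (u s - 1), b = s / (v s - 1), and u v s^2 - 2 (u + v) s + 3 = 0.
    disc = (u + v) ** 2 - 3 * u * v           # equals u^2 - u v + v^2 >= 0
    solutions = []
    for sign in (1.0, -1.0):
        s = ((u + v) + sign * math.sqrt(disc)) / (u * v)
        if u * s > 1 and v * s > 1:           # keeps a > 0 and b > 0, i.e. theta in C_L
            solutions.append((s / (u * s - 1), s / (v * s - 1)))
    return solutions


# Sufficient statistics matching a = 1, b = 2: u = 1 + 1/3, v = 1/2 + 1/3.
print(mle_tied_exponentials(4 / 3, 5 / 6))   # one admissible solution, close to (1.0, 2.0)
```

The second root of the quadratic ($s = 0.9$ in this example) gives a negative $b$ and is discarded, matching the bijection between $C_L$ and $\mathrm{int}(K_L)$.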

Bijections in Pictures
Figure: green maps to blue maps to green; inverting this map is maximum likelihood estimation.

Maximum Likelihood Estimation
Questions:
Algebraic degree of the inversion of $F_L$? [MSUZ, 2015]
When does the MLE exist? That is, characterize $\mathrm{int}(K_L)$.
Study $K_L$ and its defining polynomial. [SU, 2010]
Study the extremal rays of $C_L$.
Study Gaussian graphical models on an undirected graph $G = (\{1, \dots, m\}, E)$:
$C_G = \{\theta \in S^m_{\succeq 0} \mid \theta_{ij} = 0 \text{ for all } (i,j) \notin E\}$, $K_G = C_G^\vee = \pi_G(S^m_{\succeq 0})$.
Characterize the ranks of the extremal rays of $C_G$.
The maximal rank is 1 if and only if $G$ is chordal (Agler et al., 1988).
All graphs of maximal rank 2 have been characterized (Laurent, 2001).
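For Gaussian graphical models, inverting the gradient map amounts to finding $\hat\theta \in C_G$ whose inverse agrees with the sample covariance $S$ on the diagonal and on the edges of $G$; the MLE, when it exists, is the unique such matrix. The sketch below uses the modified regression algorithm described in Hastie, Tibshirani and Friedman's The Elements of Statistical Learning, a swapped-in standard method rather than anything from the talk. The function name, the iteration cap, and the tolerance are our choices, and NumPy is assumed.

```python
import numpy as np


def ggm_mle(S, edges, max_iter=1000, tol=1e-12):
    """MLE of the concentration matrix of a Gaussian graphical model with known graph.

    Returns theta with theta[i, j] = 0 for non-edges and inv(theta) equal to S
    on the diagonal and on every edge (assuming the MLE exists).
    """
    m = S.shape[0]
    adj = np.zeros((m, m), dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    W = S.copy()                                   # current estimate of Sigma = inv(theta)
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(m):
            rest = [k for k in range(m) if k != j]
            nbrs = [p for p, k in enumerate(rest) if adj[j, k]]  # positions of neighbours in rest
            W11 = W[np.ix_(rest, rest)]
            beta = np.zeros(m - 1)
            if nbrs:
                # regress on the neighbours only; non-neighbours get coefficient 0
                rows = np.array(rest)[nbrs]
                beta[nbrs] = np.linalg.solve(W11[np.ix_(nbrs, nbrs)], S[rows, j])
            W[rest, j] = W11 @ beta
            W[j, rest] = W[rest, j]
        if np.max(np.abs(W - W_old)) < tol:
            break
    return np.linalg.inv(W)


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
S = X.T @ X / 50                                   # sample covariance (mean assumed zero)
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]           # the 4-cycle
theta_hat = ggm_mle(S, cycle)
print(np.round(theta_hat, 4))                      # zeros at the non-edges (0, 2) and (1, 3)
```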

Elliptopes and Spectrahedral Shadows
Without loss of generality we study the following convex bodies instead of the corresponding convex cones:
Elliptope of $G$: $K_G = \pi_G(\{\sigma \in S^m_{\succeq 0} \mid \mathrm{diag}(\sigma) = (1, \dots, 1)\})$
Spectrahedral shadow of $G$: $C_G = \pi_G(\{\theta \in S^m_{\succeq 0} \mid \theta_{ij} = 0 \text{ for all } (i,j) \notin E \text{ and } \mathrm{tr}(\theta) = 2\})$
These convex bodies are dual to each other.
Problem: Characterize the ranks of the extremal points of $C_G$.

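Example: 4-cycle
Vertices $1, 2, 3, 4$; edges $\{1,2\}, \{2,3\}, \{3,4\}, \{1,4\}$ carry the coordinates $\sigma_1, \dots, \sigma_4$ (respectively $\theta_1, \dots, \theta_4$).
$K_G = \left\{(\sigma_1, \sigma_2, \sigma_3, \sigma_4) \in \mathbb{R}^4 : \exists\, u, v \text{ such that } \begin{pmatrix} 1 & \sigma_1 & u & \sigma_4 \\ \sigma_1 & 1 & \sigma_2 & v \\ u & \sigma_2 & 1 & \sigma_3 \\ \sigma_4 & v & \sigma_3 & 1 \end{pmatrix} \succeq 0\right\}$
$C_G = \left\{(\theta_1, \theta_2, \theta_3, \theta_4) \in \mathbb{R}^4 : \exists\, a, b, c \text{ such that } \begin{pmatrix} a & \theta_1 & 0 & \theta_4 \\ \theta_1 & b & \theta_2 & 0 \\ 0 & \theta_2 & c & \theta_3 \\ \theta_4 & 0 & \theta_3 & 2-a-b-c \end{pmatrix} \succeq 0\right\}$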
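Membership in $K_G$ for the 4-cycle asks whether the two missing entries $u, v$ can be filled in to give a positive semidefinite matrix. A proper test would solve a small semidefinite program. The NumPy sketch below is our own and deliberately crude: it searches a grid of $(u, v) \in [-1, 1]^2$, which is enough in principle because every entry of a correlation matrix lies in $[-1, 1]$, though points very close to the boundary may be misclassified as outside. Rejections are always correct. For example, $(1, 1, 1, -1)$ is rejected, because $\sigma_1 = \sigma_2 = \sigma_3 = 1$ forces all four variables to coincide, which contradicts $\sigma_4 = -1$. The point $(0.5, 0.5, 0.5, 0.5)$ is accepted with the completion $u = v = 0.5$.

```python
import numpy as np


def in_elliptope_4cycle(s, grid=81, tol=1e-9):
    """Crude test whether s = (s1, s2, s3, s4) lies in K_G for the 4-cycle:
    search a grid of (u, v) in [-1, 1]^2 for a PSD completion.
    Returns (is_member, best smallest eigenvalue found)."""
    s1, s2, s3, s4 = s
    best = -np.inf
    for u in np.linspace(-1.0, 1.0, grid):
        for v in np.linspace(-1.0, 1.0, grid):
            M = np.array([[1.0, s1, u, s4],
                          [s1, 1.0, s2, v],
                          [u, s2, 1.0, s3],
                          [s4, v, s3, 1.0]])
            best = max(best, np.linalg.eigvalsh(M)[0])  # eigenvalues in ascending order
    return best >= -tol, best


print(in_elliptope_4cycle((0.5, 0.5, 0.5, 0.5)))
print(in_elliptope_4cycle((1.0, 1.0, 1.0, -1.0)))
```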

Cut Polytopes, Elliptopes and Spectrahedral Shadows
Consider another convex body, the cut polytope of $G$:
Let $U \subseteq V$; the corresponding cutset is the collection of edges $\delta(U) \subseteq E$ with one endpoint in $U$ and the other endpoint in $U^c$.
Assign to each cutset $\delta(U)$ a $(\pm 1)$-vector $v \in \mathbb{R}^E$ with $v_e = -1$ if and only if $e \in \delta(U)$.
The convex hull of all such vectors is the cut polytope $\mathrm{CUT}^{\pm 1}(G)$.
Note: $\mathrm{CUT}^{\pm 1}(G) \subseteq K_G$.
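Enumerating the vertices of $\mathrm{CUT}^{\pm 1}(G)$ is straightforward for small graphs. The plain-Python sketch below (the function name is ours) runs over all sign vectors $x \in \{\pm 1\}^V$, with $x_i = -1$ marking $i \in U$. Each cut vector is then the list of edge entries $x_i x_j$ of the rank-one correlation matrix $xx^T$, which is why $\mathrm{CUT}^{\pm 1}(G) \subseteq K_G$.

```python
from itertools import product


def cut_vectors(m, edges):
    """Vertices of CUT^{+-1}(G) for G on vertices 0..m-1: v_e = -1 iff edge e crosses the cut."""
    vectors = set()
    for x in product((1, -1), repeat=m):  # x_i = -1 marks vertex i as a member of U
        # edge {i, j} crosses the cut iff x_i != x_j, so v_e = x_i * x_j
        vectors.add(tuple(x[i] * x[j] for i, j in edges))
    return sorted(vectors)


triangle = [(0, 1), (1, 2), (2, 0)]
print(cut_vectors(3, triangle))  # [(-1, -1, 1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
```

Every cut meets a cycle in an even number of edges, so each cut vector of a cycle has an even number of $-1$ entries.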

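Example: 3-cycle
Figure panels: (a) $\mathrm{CUT}^{\pm 1}(G)$, (b) $K_G$, (c) $C_G$.
$\mathrm{CUT}^{\pm 1}(G) = \mathrm{conv}((1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1))$
$K_G = \left\{(\sigma_1, \sigma_2, \sigma_3) : \begin{pmatrix} 1 & \sigma_1 & \sigma_3 \\ \sigma_1 & 1 & \sigma_2 \\ \sigma_3 & \sigma_2 & 1 \end{pmatrix} \succeq 0\right\}$, $C_G = \left\{(\theta_1, \theta_2, \theta_3) : \exists\, a, b \text{ such that } \begin{pmatrix} a & \theta_1 & \theta_3 \\ \theta_1 & b & \theta_2 \\ \theta_3 & \theta_2 & 2-a-b \end{pmatrix} \succeq 0\right\}$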
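For the triangle no completion is needed, so membership in $K_G$ is a single eigenvalue check. The cut polytope is the tetrahedron cut out by the four inequalities $\sigma_1 + \sigma_2 + \sigma_3 \ge -1$, $\sigma_1 - \sigma_2 - \sigma_3 \ge -1$, $-\sigma_1 + \sigma_2 - \sigma_3 \ge -1$ and $-\sigma_1 - \sigma_2 + \sigma_3 \ge -1$. The NumPy sketch below (function names ours) shows that the point $(-0.45, -0.45, -0.45)$ lies in $K_G$ but not in $\mathrm{CUT}^{\pm 1}(G)$, so the inclusion $\mathrm{CUT}^{\pm 1}(G) \subseteq K_G$ is strict.

```python
import numpy as np


def in_elliptope_3cycle(s, tol=1e-12):
    """For the triangle, K_G is the set of 3 x 3 correlation matrices: check PSD directly."""
    s1, s2, s3 = s
    M = np.array([[1.0, s1, s3], [s1, 1.0, s2], [s3, s2, 1.0]])
    return bool(np.linalg.eigvalsh(M)[0] >= -tol)


def in_cut_polytope_3cycle(s, tol=1e-12):
    """Facets of the tetrahedron conv{(1,1,1), (-1,-1,1), (-1,1,-1), (1,-1,-1)}."""
    s1, s2, s3 = s
    return (s1 + s2 + s3 >= -1 - tol and s1 - s2 - s3 >= -1 - tol
            and -s1 + s2 - s3 >= -1 - tol and -s1 - s2 + s3 >= -1 - tol)


p = (-0.45, -0.45, -0.45)
print(in_elliptope_3cycle(p), in_cut_polytope_3cycle(p))  # True False
```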

Graphs with no $K_4$ / $K_5$ Minors
Theorem. Graphs with no $K_5$-minor have the facet-ray identification property, i.e., the normal vectors to the facets of $\mathrm{CUT}^{\pm 1}(G)$ identify extremal points in $C_G$: if $v^T x = b$ is a supporting hyperplane of a facet of $\mathrm{CUT}^{\pm 1}(G)$, then the extremal ray given by the normal vector $v$ has rank $b$.
Theorem. For graphs with no $K_4$-minor, the facets of the cut polytope identify all extremal ranks. The extremal ranks are $\{1\} \cup \{m - 2 : C_m \text{ is an induced cycle of } G\}$.

References
Michałek, Sturmfels, Uhler, and Zwiernik: Exponential varieties, arXiv:1412.6185 (2014).
Solus, Uhler, and Yoshida: Extremal positive semidefinite matrices for weakly bipartite graphs, arXiv:1506.06702 (2015).
Sturmfels and Uhler: Multivariate Gaussians, semidefinite matrix completion, and convex algebraic geometry, Ann. Inst. Stat. Math. 62 (2010).