Testing Algebraic Hypotheses
Mathias Drton
Department of Statistics, University of Chicago
Example: Factor analysis
Multivariate normal model based on conditional independence given a hidden variable $H$:
$$X_i = \gamma_i H + \varepsilon_i, \qquad i = 1, \dots, 4.$$
[Figure: graph with hidden node H and observed nodes X1, X2, X3, X4]
Software (e.g. factanal in R) tests goodness of fit using the LRT and a $\chi^2_2$-approximation.
Example: Factor analysis
Histograms of 20,000 simulated p-values for sample size $n = 1000$:
[Figure: four histograms of p-values, one panel each for $\Gamma = (1,1,1,1)^t$, $\Gamma = (1,1,1,0)^t$, $\Gamma = (1,1,0,0)^t$ and $\Gamma = (1,0,0,0)^t$; horizontal axis: p-value]
(Conditional/error variances $= 1/3$ $\Rightarrow$ correlations $= 0$ or $3/4$)
Three types of limiting distributions?
Algebraic models
Asymptotic behavior of the LRT in hidden variable models?
Hidden variable models: the parameter space is in general not a smooth manifold.
Classical hidden variable models: the parameter space is semi-algebraic.
Definition
A semi-algebraic set is a finite union of the form
$$\Theta_0 = \bigcup_{i=1}^{m} \{\theta \in \mathbb{R}^k : f(\theta) = 0 \text{ for } f \in F_i \text{ and } h(\theta) > 0 \text{ for } h \in H_i\},$$
where $F_i$, $H_i$ are finite collections of polynomials with real coefficients.
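Example: Factor analysis
Polynomially parametrized covariance matrix (symmetric, upper triangle shown; assumed $\mathrm{Var}[H] = 1$):
$$\Sigma = \begin{pmatrix}
\omega_1 + \gamma_1^2 & \gamma_1\gamma_2 & \gamma_1\gamma_3 & \gamma_1\gamma_4 \\
 & \omega_2 + \gamma_2^2 & \gamma_2\gamma_3 & \gamma_2\gamma_4 \\
 & & \omega_3 + \gamma_3^2 & \gamma_3\gamma_4 \\
 & & & \omega_4 + \gamma_4^2
\end{pmatrix}$$
Theorem (Tarski-Seidenberg)
If $g : \mathbb{R}^d \to \mathbb{R}^k$ is a polynomial map and $\Gamma \subseteq \mathbb{R}^d$ is a semi-algebraic set, then $\Theta_0 = g(\Gamma)$ is semi-algebraic.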
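The polynomial parametrization of the one-factor covariance matrix can be sketched numerically as follows. This is only an illustrative sketch: the function names `one_factor_covariance` and `correlation_matrix` are hypothetical helpers, and the parameter values are the ones quoted for the simulation (loadings 0 or 1, error variances 1/3), with $\mathrm{Var}[H] = 1$ assumed.

```python
import numpy as np

def one_factor_covariance(gamma, omega):
    """Sigma = diag(omega) + gamma gamma^T, assuming Var[H] = 1."""
    gamma = np.asarray(gamma, dtype=float)
    omega = np.asarray(omega, dtype=float)
    return np.diag(omega) + np.outer(gamma, gamma)

def correlation_matrix(sigma):
    """Rescale a covariance matrix to unit diagonal."""
    sd = np.sqrt(np.diag(sigma))
    return sigma / np.outer(sd, sd)

# Loadings Gamma = (1, 1, 1, 0)^t and error variances 1/3 (values from the simulation slide).
sigma = one_factor_covariance([1, 1, 1, 0], [1 / 3] * 4)
corr = correlation_matrix(sigma)
print(np.round(corr, 3))
```

With these values, variables that both load on $H$ have correlation $1 / (1 + 1/3) = 3/4$, and any pair involving a zero loading is uncorrelated.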
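Likelihood ratio test
Observations $X^{(1)}, \dots, X^{(n)}$ i.i.d. $P_\theta$, $\theta \in \Theta \subseteq \mathbb{R}^k$.
Likelihood function
$$L_n : \Theta \to \mathbb{R}, \qquad \theta \mapsto \prod_{i=1}^{n} p_\theta(X^{(i)}),$$
where $p_\theta(x)$ is the density of $P_\theta$.
Test
$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \in \Theta_1 \setminus \Theta_0$$
for some $\Theta_0 \subseteq \Theta_1 \subseteq \Theta$.
The likelihood ratio test rejects $H_0$ for large values of the LR statistic
$$\lambda_n = 2 \log \frac{\sup_{\theta \in \Theta_1} L_n(\theta)}{\sup_{\theta \in \Theta_0} L_n(\theta)}.$$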
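Normal means
Given observations
$$X^{(1)}, \dots, X^{(n)} \text{ i.i.d. } N(\theta, \mathrm{Id}_k), \qquad \theta \in \mathbb{R}^k,$$
and a set $\Theta_0 \subseteq \mathbb{R}^k$, test
$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \notin \Theta_0.$$
LR statistic:
$$\lambda_n = n \inf_{\theta \in \Theta_0} \|\bar X_n - \theta\|_2^2 = \inf_{\theta \in \Theta_0} \|\sqrt{n}(\bar X_n - \theta_0) - \sqrt{n}(\theta - \theta_0)\|_2^2,$$
where $\theta_0 \in \Theta_0$ is the true parameter and $\bar X_n$ is the sample mean.
Large sample distribution
Squared distance between $Z \sim N(0, \mathrm{Id}_k)$ and the limit of $\sqrt{n}(\Theta_0 - \theta_0)$.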
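Normal means: Cuspidal cubic
Cuspidal cubic $\Theta_0 = \{(\theta_1, \theta_2) : \theta_1^3 = \theta_2^2\}$
[Figure: plot of the cuspidal cubic]
Tangent cone at $\theta_0 = 0$ is a half-ray:
$$TC_0(\Theta_0) = \{\theta : \theta_1 \ge 0,\ \theta_2 = 0\}$$
Mixture of chi-squares:
$$\lambda_n \to_D \tfrac{1}{2}\chi^2_1 + \tfrac{1}{2}\chi^2_2.$$
Definition (Tangent cone)
$$TC_{\theta_0}(\Theta_0) = \left\{ \lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} : \beta_n > 0,\ \theta_n \in \Theta_0,\ \theta_n \to \theta_0 \right\}$$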
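The chi-square mixture for the cuspidal cubic can be checked by simulating the squared distance between $Z \sim N(0, \mathrm{Id}_2)$ and the half-ray tangent cone. The sketch below is a Monte Carlo illustration only; the sample size, seed, and variable names are arbitrary choices, and it simulates the limiting distribution rather than $\lambda_n$ at a finite $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000
z = rng.standard_normal((n_sim, 2))

# Projection of z onto the half-ray {theta1 >= 0, theta2 = 0} is (max(z1, 0), 0).
proj1 = np.maximum(z[:, 0], 0.0)
dist2 = (z[:, 0] - proj1) ** 2 + z[:, 1] ** 2

# Reference draws from the mixture (1/2) chi2_1 + (1/2) chi2_2.
pick = rng.random(n_sim) < 0.5
mix = np.where(pick, rng.chisquare(1, n_sim), rng.chisquare(2, n_sim))

qs = [0.5, 0.9, 0.95, 0.99]
print("distance to half-ray:", np.round(np.quantile(dist2, qs), 3))
print("chi-square mixture:  ", np.round(np.quantile(mix, qs), 3))
```

The two rows of quantiles should agree up to Monte Carlo error: when $Z_1 \ge 0$ the squared distance is $Z_2^2$, and when $Z_1 < 0$ it is $Z_1^2 + Z_2^2$, each case having probability $1/2$.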
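Chernoff's theorem
Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Let $\theta_0 \in \Theta_0 \subseteq \Theta$ be the true parameter point, with Fisher information $I(\theta_0)$.
If $\Theta_0$ is Chernoff-regular at $\theta_0$ and $n \to \infty$, then the LR statistic $\lambda_n$ for
$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \notin \Theta_0$$
converges in distribution to
$$\min_{\tau \in TC_{\theta_0}(\Theta_0)} \|Z - I(\theta_0)^{1/2}\tau\|_2^2,$$
where $Z \sim N(0, \mathrm{Id}_k)$ and $I(\theta_0)^{1/2}$ is any matrix square root of $I(\theta_0)$.
Chi-square distributions
Chi-square distribution: distance from a linear space
Chi-square mixtures: distances from convex cones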
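As a rough numerical illustration of the last two points, the following sketch compares the squared distance from $Z$ to a linear space with the squared distance to a convex cone. The specific sets (the line $\{\tau_2 = 0\}$ and the nonnegative quadrant in $\mathbb{R}^2$) and the choice $I(\theta_0) = \mathrm{Id}_2$ are assumptions made for this example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal((200_000, 2))

# Linear space {tau : tau2 = 0}: the squared distance is z2^2, a chi2_1 variable.
d_line = z[:, 1] ** 2

# Convex cone {tau : tau1 >= 0, tau2 >= 0}: the projection is the componentwise
# positive part, so the squared distance is the sum of squared negative parts.
d_cone = np.sum(np.minimum(z, 0.0) ** 2, axis=1)

# Fraction of draws already inside the cone (distance exactly zero).
frac_zero = np.mean(d_cone == 0.0)

print("mean squared distance to line:", d_line.mean())
print("mean squared distance to cone:", d_cone.mean())
print("fraction with zero distance to cone:", frac_zero)
```

For the quadrant, the distance is zero with probability $1/4$, which corresponds to the point mass at zero in the mixture $\tfrac14\chi^2_0 + \tfrac12\chi^2_1 + \tfrac14\chi^2_2$. The line gives a single $\chi^2_1$ component.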
What is Chernoff-regularity?
Condition on how the tangent cone locally approximates a set.
Definition
A set $\Theta_0 \subseteq \mathbb{R}^k$ is Chernoff-regular at $\theta_0$ if for all $\tau \in TC_{\theta_0}(\Theta_0)$ and $\beta_n \to 0$ there exists a sequence $\theta_n \to \theta_0$ in $\Theta_0$ such that
$$\lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} = \tau.$$
Lemma
Semi-algebraic sets are everywhere Chernoff-regular.
Algebra
The geometry of a semi-algebraic set $\Theta_0 \subseteq \mathbb{R}^k$ expresses itself algebraically in the vanishing ideal
$$\mathcal{I}(\Theta_0) = \{f \in \mathbb{R}[t_1, \dots, t_k] : f(\theta) = 0 \text{ for all } \theta \in \Theta_0\}.$$
Computation with suitable finite generating sets
$$\langle f_1, \dots, f_s \rangle = \mathcal{I}(\Theta_0), \qquad f_1, \dots, f_s \in \mathbb{R}[t_1, \dots, t_k],$$
reveals singularities and provides information about tangent cones.
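Cuspidal cubic
Cuspidal cubic $\Theta_0 = \{(\theta_1, \theta_2) : \theta_1^3 - \theta_2^2 = 0\}$
[Figure: plot of the cuspidal cubic]
Singularity at zero: $\nabla(\theta_1^3 - \theta_2^2) = 0$ at $\theta = 0$.
The algebraic tangent cone
$$\{\theta : \theta_2^2 = 0\} = \{\theta : \theta_2 = 0\}$$
contains the tangent cone as a full-dimensional subset:
$$TC_0(\Theta_0) = \{(\theta_1, \theta_2) : \theta_1 \ge 0,\ \theta_2 = 0\}$$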
An incorrect personal view of the one-factor model
Singularities: [Figure: schematic, entries marked 0]
Bootstrapping/Subsampling: 4000 simulations
[Figure: three histograms of simulated p-values; horizontal axis: p-value]
($\Sigma_0 = I_{4 \times 4}$, $n = 2000$, $m = 100$)
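Wald test
Model $\{P_\theta : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}^k$, with asymptotically normal estimator
$$\sqrt{n}(\hat\theta_n - \theta_0) \to_d N(0, \Sigma(\theta_0)),$$
where $\Sigma(\theta_0)$ is positive definite and depends continuously on $\theta_0$.
Test the polynomial constraint $f(\theta) = 0$ using the Wald statistic
$$W_n = \frac{f(\hat\theta_n)^2}{\widehat{\mathrm{Var}}[f(\hat\theta_n)]} = \frac{n\, f(\hat\theta_n)^2}{\nabla f(\hat\theta_n)^t\, \Sigma(\hat\theta_n)\, \nabla f(\hat\theta_n)}.$$
Lemma (Smooth case)
If $f(\theta_0) = 0$ and $\nabla f(\theta_0) \ne 0$, then $W_n \to_d \chi^2_1$.
Equivalence of LRT and Wald test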
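The Wald statistic can be written as a small function and checked against the smooth-case lemma by simulation. The sketch below is illustrative only: `wald_statistic` is a hypothetical helper, and the constraint $f(\theta) = \theta_1 - \theta_2^2$, the true point $\theta_0 = (1, 1)$, and the normal means setting with $\Sigma = \mathrm{Id}_2$ are assumptions chosen so that $\nabla f(\theta_0) \ne 0$.

```python
import numpy as np

def wald_statistic(f, grad_f, theta_hat, sigma_hat, n):
    """W_n = n f(theta_hat)^2 / (grad_f^t Sigma grad_f), all evaluated at theta_hat."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    g = np.asarray(grad_f(theta_hat), dtype=float)
    return n * f(theta_hat) ** 2 / (g @ np.asarray(sigma_hat, dtype=float) @ g)

# Hypothetical smooth constraint: f(theta) = theta1 - theta2^2, zero at theta0 = (1, 1).
f = lambda th: th[0] - th[1] ** 2
grad_f = lambda th: np.array([1.0, -2.0 * th[1]])

rng = np.random.default_rng(2)
n, n_sim = 1000, 20_000
theta0 = np.array([1.0, 1.0])
sigma = np.eye(2)  # assumption: N(theta, Id_2) data, estimator is the sample mean

# The sample mean of n such observations is exactly N(theta0, Id_2 / n).
theta_hats = theta0 + rng.standard_normal((n_sim, 2)) / np.sqrt(n)
w = np.array([wald_statistic(f, grad_f, th, sigma, n) for th in theta_hats])

print("mean of W_n:", w.mean())
print("P(W_n > 3.84):", np.mean(w > 3.84))
```

Under the lemma, the simulated mean should be close to 1 and the exceedance frequency close to 0.05, since 3.84 is approximately the 0.95 quantile of $\chi^2_1$.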
General asymptotics
Write
$$f(t) = \sum_{h=l}^{L} f_h(t - \theta_0),$$
where the $f_h$ are homogeneous polynomials, $\deg(f_h) = h$ and $f_l \ne 0$.
Since $f(\theta_0) = 0$, the minimal degree satisfies $l \ge 1$, and we define $f_{\theta_0,\min} = f_l$.
For any polynomial $h$, define
$$W(h, \Sigma) = \frac{h(Z)^2}{\nabla h(Z)^T\, \Sigma\, \nabla h(Z)}, \qquad Z \sim N(0, \Sigma).$$
Lemma
If $f(\theta_0) = 0$, then $W_n \to_d W(f_{\theta_0,\min}, \Sigma(\theta_0))$.
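As a worked illustration of the lemma (an example added here, not one from the slides), take the cuspidal cubic constraint $f(t) = t_1^3 - t_2^2$ at the singular point $\theta_0 = 0$, and write $\sigma_{22}$ for the $(2,2)$ entry of $\Sigma$. The expansion into homogeneous parts is

$$f(t) = \underbrace{-t_2^2}_{f_2} + \underbrace{t_1^3}_{f_3}, \qquad l = 2, \qquad f_{0,\min}(t) = -t_2^2 .$$

With $h(z) = -z_2^2$ we have $\nabla h(z) = (0, -2z_2)^T$, so

$$W(f_{0,\min}, \Sigma) = \frac{Z_2^4}{4 Z_2^2\, \sigma_{22}} = \frac{Z_2^2}{4\,\sigma_{22}} \sim \tfrac{1}{4}\chi^2_1,$$

because $Z_2 \sim N(0, \sigma_{22})$. For comparison, the LR statistic at the same point in the normal means setting has the limit $\tfrac12\chi^2_1 + \tfrac12\chi^2_2$, so the two tests behave differently at this singularity.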
Theorem (work in progress, with H. Xiao)
(a) If $h(Z) = Z_1^u Z_2^v$ and $\Sigma$ is any positive definite matrix, then
$$W(h, \Sigma) \sim \frac{1}{(u+v)^2}\chi^2_1.$$
(b) Let $h(Z) = aZ_1^2 + 2bZ_1Z_2 + cZ_2^2$ and $\Sigma = I$. Always,
$$\tfrac{1}{4}\chi^2_1 \le_d W(h, I) \le_d \tfrac{1}{4}\chi^2_2.$$
If $b^2 - ac \ge 0$, then $W(h, I) \sim \tfrac{1}{4}\chi^2_1$.
If $b^2 - ac < 0$, then
$$W(h, I) \sim \frac{1}{4}\left[\frac{4(ac - b^2)}{(a+c)^2} Z_1^2 + Z_2^2\right].$$
Note: The behavior of the Wald test is very different from that of the LRT at singularities.
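The distributions stated in the theorem can be probed by direct simulation of $W(h, I)$. The sketch below is a Monte Carlo check under $\Sigma = I$ only; the helper names `w_monomial` and `w_quadratic`, the seed, and the particular exponents and coefficients are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sim = 200_000
z = rng.standard_normal((n_sim, 2))
z1, z2 = z[:, 0], z[:, 1]

def w_monomial(u, v):
    """W(h, I) for h(z) = z1^u z2^v with integers u, v >= 1."""
    h = z1**u * z2**v
    g1 = u * z1**(u - 1) * z2**v   # dh/dz1
    g2 = v * z1**u * z2**(v - 1)   # dh/dz2
    return h**2 / (g1**2 + g2**2)

def w_quadratic(a, b, c):
    """W(h, I) for h(z) = a z1^2 + 2 b z1 z2 + c z2^2."""
    h = a * z1**2 + 2 * b * z1 * z2 + c * z2**2
    g1 = 2 * (a * z1 + b * z2)
    g2 = 2 * (b * z1 + c * z2)
    return h**2 / (g1**2 + g2**2)

w11 = w_monomial(1, 1)           # part (a) with u = v = 1: chi2_1 / 4
w21 = w_monomial(2, 1)           # part (a) with u = 2, v = 1: chi2_1 / 9
wq = w_quadratic(1.0, 0.0, 2.0)  # part (b) with b^2 - ac < 0

# Independent reference draws from the stated limits.
chi2_1 = rng.chisquare(1, n_sim)
ref_q = 0.25 * ((8 / 9) * rng.standard_normal(n_sim) ** 2
                + rng.standard_normal(n_sim) ** 2)

qs = [0.5, 0.9, 0.99]
print(np.round(np.quantile(w11, qs), 3), np.round(np.quantile(chi2_1 / 4, qs), 3))
print(np.round(np.quantile(w21, qs), 3), np.round(np.quantile(chi2_1 / 9, qs), 3))
print(np.round(np.quantile(wq, qs), 3), np.round(np.quantile(ref_q, qs), 3))
```

Each printed line pairs simulated quantiles of $W(h, I)$ with quantiles of the corresponding reference distribution. For $a = 1$, $b = 0$, $c = 2$ the coefficient $4(ac - b^2)/(a+c)^2$ equals $8/9$.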
Take home
Hidden variables $\Rightarrow$ algebraic models with (arbitrarily) complicated singularities.
Algebraic models: the LR statistic always converges to the distance from the tangent cone.
The standard bootstrap (n-out-of-n) is inconsistent at singularities (e.g., see forthcoming paper with B. Williams).
Subsampling/m-out-of-n bootstrap is OK in a pointwise asymptotic sense.
In singular models, Wald $\ne$ LRT.
References: see e.g. [Drton, Sturmfels and Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars Series, Vol. 39, Birkhäuser, Basel, 2009]
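Regular exponential family
Let $P_\Theta = \{P_\theta : \theta \in \Theta\}$ be a family of probability distributions on $\mathcal{X} \subseteq \mathbb{R}^m$ that have densities $p_\theta$ with respect to a measure $\nu$.
We call $P_\Theta$ an exponential family if there is a statistic $T : \mathcal{X} \to \mathbb{R}^k$ and functions $h : \Theta \to \mathbb{R}^k$ and $Z : \Theta \to \mathbb{R}$ such that
$$p_\theta(x) = \frac{1}{Z(\theta)} \exp\{\langle h(\theta), T(x) \rangle\}, \qquad x \in \mathcal{X}.$$
We say that $P_\Theta$ is a regular exponential family (of order $k$) if
$$H = \left\{ \eta \in \mathbb{R}^k : \int_{\mathcal{X}} \exp\{\langle \eta, T(x) \rangle\}\, d\nu(x) < \infty \right\}$$
is an open subset of $\mathbb{R}^k$ and $h$ is a diffeomorphism between $\Theta$ and $H$.
Fisher-information matrix
Positive (semi-)definite matrix $I(\theta)$ with entries
$$I(\theta)_{ij} = E_\theta\left[ \left( \frac{\partial}{\partial\theta_i} \log p_\theta(X) \right) \left( \frac{\partial}{\partial\theta_j} \log p_\theta(X) \right) \right], \qquad i, j \in [k].$$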