Testing Algebraic Hypotheses


Mathias Drton, Department of Statistics, University of Chicago

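Example: Factor analysis
Multivariate normal model based on conditional independence given a hidden variable $H$:
$$X_1 = \gamma_1 H + \epsilon_1, \quad X_2 = \gamma_2 H + \epsilon_2, \quad X_3 = \gamma_3 H + \epsilon_3, \quad X_4 = \gamma_4 H + \epsilon_4.$$
[Figure: graph on the hidden variable $H$ and the observed variables $X_1, X_2, X_3, X_4$.]
Software (e.g. factanal in R) tests goodness-of-fit using the LRT and a $\chi^2_2$-approximation.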

Example: Factor analysis
Histograms of 20,000 simulated p-values for sample size $n = 1000$.
[Figure: four histograms of p-values, one panel each for $\Gamma = (1, 1, 1, 1)^t$, $\Gamma = (1, 1, 1, 0)^t$, $\Gamma = (1, 1, 0, 0)^t$ and $\Gamma = (1, 0, 0, 0)^t$; x-axis: p value.]
(Conditional/error variances $= 1/3$ $\Rightarrow$ correlations $= 0$ or $3/4$.)
Three types of limiting distributions?

Algebraic models
Asymptotic behavior of the LRT in hidden variable models?
Hidden variable models: parameter space $\neq$ smooth manifold.
Classical hidden variable models: parameter space is semi-algebraic.
Definition. A semi-algebraic set is a finite union of the form
$$\Theta_0 = \bigcup_{i=1}^{m} \{\theta \in \mathbb{R}^k : f(\theta) = 0 \text{ for } f \in F_i \text{ and } h(\theta) > 0 \text{ for } h \in H_i\},$$
where $F_i$, $H_i$ are finite collections of polynomials with real coefficients.

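Example: Factor analysis
Polynomially parametrized covariance matrix (symmetric, upper triangle shown; assumed $\operatorname{Var}[H] = 1$):
$$\Sigma = \begin{pmatrix} \omega_1 + \gamma_1^2 & \gamma_1\gamma_2 & \gamma_1\gamma_3 & \gamma_1\gamma_4 \\ & \omega_2 + \gamma_2^2 & \gamma_2\gamma_3 & \gamma_2\gamma_4 \\ & & \omega_3 + \gamma_3^2 & \gamma_3\gamma_4 \\ & & & \omega_4 + \gamma_4^2 \end{pmatrix}$$
Theorem (Tarski-Seidenberg). If $g : \mathbb{R}^d \to \mathbb{R}^k$ is a polynomial map and $\Gamma \subseteq \mathbb{R}^d$ is a semi-algebraic set, then $\Theta_0 = g(\Gamma)$ is semi-algebraic.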
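As a small illustration of the polynomial parametrization, the following sketch builds $\Sigma = \operatorname{diag}(\omega) + \gamma\gamma^t$ and checks two polynomial equations that the parametrization forces on the off-diagonal entries (the 2×2 off-diagonal minors, often called tetrads). The function names and the numerical values of the loadings are hypothetical choices made for this example only.

```python
def one_factor_cov(gamma, omega):
    """Sigma = diag(omega) + gamma gamma^t as a list of lists (Var[H] = 1)."""
    p = len(gamma)
    return [[gamma[i] * gamma[j] + (omega[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def tetrad(S, i, j, k, l):
    """Off-diagonal minor S[i][j]*S[k][l] - S[i][l]*S[k][j] for distinct i, j, k, l."""
    return S[i][j] * S[k][l] - S[i][l] * S[k][j]


# Hypothetical loadings; error variances 1/3 as in the simulation slide.
gamma = [1.0, 0.5, -2.0, 3.0]
omega = [1 / 3, 1 / 3, 1 / 3, 1 / 3]
Sigma = one_factor_cov(gamma, omega)

# Both minors reduce to g1*g2*g3*g4 - g1*g2*g3*g4 under the parametrization.
t1 = tetrad(Sigma, 0, 1, 2, 3)  # s12*s34 - s14*s32
t2 = tetrad(Sigma, 0, 2, 1, 3)  # s13*s24 - s14*s23
print(t1, t2)
```

These equalities are only part of a semi-algebraic description of the model; the sketch does not attempt to list the inequality constraints.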

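Likelihood ratio test
Observations $X^{(1)}, \dots, X^{(n)}$ i.i.d. $\sim P_\theta$, $\theta \in \Theta \subseteq \mathbb{R}^k$.
Likelihood function
$$L_n : \Theta \to \mathbb{R}, \qquad \theta \mapsto \prod_{i=1}^{n} p_\theta(X^{(i)}),$$
where $p_\theta(x)$ is the density of $P_\theta$.
Test, for some $\Theta_0 \subseteq \Theta_1 \subseteq \Theta$,
$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \in \Theta_1 \setminus \Theta_0.$$
The likelihood ratio test rejects $H_0$ for large values of the LR statistic
$$\lambda_n = 2 \log \frac{\sup_{\theta \in \Theta_1} L_n(\theta)}{\sup_{\theta \in \Theta_0} L_n(\theta)}.$$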

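Normal means
Given observations
$$X^{(1)}, \dots, X^{(n)} \text{ i.i.d. } \sim N(\theta, \mathrm{Id}_k), \qquad \theta \in \mathbb{R}^k,$$
and a set $\Theta_0 \subseteq \mathbb{R}^k$, test
$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \notin \Theta_0.$$
LR statistic:
$$\lambda_n = n \inf_{\theta \in \Theta_0} \|\bar X_n - \theta\|_2^2 = \inf_{\theta \in \Theta_0} \big\|\sqrt{n}(\bar X_n - \theta_0) - \sqrt{n}(\theta - \theta_0)\big\|_2^2,$$
where $\theta_0 \in \Theta_0$ is the true parameter and $\bar X_n$ is the sample mean.
Large sample distribution: squared distance between $Z \sim N(0, \mathrm{Id}_k)$ and the limit of $\sqrt{n}(\Theta_0 - \theta_0)$.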
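The formula for the LR statistic in the normal means problem can be derived in a few lines from the definition of $\lambda_n$ with $\Theta_1 = \mathbb{R}^k$. The derivation below is a filled-in sketch and does not appear on the slides; the constant $c$ collects the terms of the log-likelihood that do not depend on $\theta$.

$$
\begin{aligned}
-2 \log L_n(\theta) &= c + \sum_{i=1}^{n} \|X^{(i)} - \theta\|_2^2
 = c + \sum_{i=1}^{n} \|X^{(i)} - \bar X_n\|_2^2 + n \|\bar X_n - \theta\|_2^2, \\
\sup_{\theta \in \mathbb{R}^k} 2 \log L_n(\theta) &= -c - \sum_{i=1}^{n} \|X^{(i)} - \bar X_n\|_2^2 \quad (\text{attained at } \theta = \bar X_n), \\
\sup_{\theta \in \Theta_0} 2 \log L_n(\theta) &= -c - \sum_{i=1}^{n} \|X^{(i)} - \bar X_n\|_2^2 - n \inf_{\theta \in \Theta_0} \|\bar X_n - \theta\|_2^2, \\
\lambda_n &= n \inf_{\theta \in \Theta_0} \|\bar X_n - \theta\|_2^2 .
\end{aligned}
$$

The cross term in the first line vanishes because $\sum_i (X^{(i)} - \bar X_n) = 0$.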

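Normal means: Cuspidal cubic
Cuspidal cubic: $\Theta_0 = \{(\theta_1, \theta_2) : \theta_1^3 = \theta_2^2\}$.
The tangent cone at $\theta_0 = 0$ is a half-ray: $TC_0(\Theta_0) = \{\theta : \theta_1 \ge 0,\ \theta_2 = 0\}$.
[Figure: plot of the cuspidal cubic.]
Mixture of chi-squares:
$$\lambda_n \xrightarrow{D} \tfrac{1}{2}\chi^2_1 + \tfrac{1}{2}\chi^2_2.$$
Definition (Tangent cone).
$$TC_{\theta_0}(\Theta_0) = \Big\{ \lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} : \beta_n > 0,\ \theta_n \in \Theta_0,\ \theta_n \to \theta_0 \Big\}.$$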
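The chi-square mixture can be checked by simulation: the limit is the squared distance from $Z \sim N(0, \mathrm{Id}_2)$ to the half-ray, which equals $Z_2^2$ when $Z_1 \ge 0$ and $Z_1^2 + Z_2^2$ otherwise. The sketch below (function names, seed and number of draws are illustrative assumptions) compares the simulated mean with the mixture mean $\tfrac12 \cdot 1 + \tfrac12 \cdot 2 = 1.5$.

```python
import random


def sq_dist_to_half_ray(z1, z2):
    """Squared Euclidean distance from (z1, z2) to {t : t1 >= 0, t2 = 0}."""
    if z1 >= 0:
        return z2 * z2            # nearest point is (z1, 0)
    return z1 * z1 + z2 * z2      # nearest point is the origin


def simulate_mean(num_draws=200_000, seed=0):
    """Monte Carlo mean of the squared distance for Z ~ N(0, Id_2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        total += sq_dist_to_half_ray(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    return total / num_draws


mean_est = simulate_mean()
print(mean_est)  # should be close to the mixture mean 1.5
```

Matching the mean is only a sanity check, not a proof of the limiting distribution; a fuller check would compare empirical quantiles with those of the mixture.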

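Chernoff's theorem
Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Let $\theta_0 \in \Theta_0 \subseteq \Theta$ be the true parameter point, with Fisher information $I(\theta_0)$. If $\Theta_0$ is Chernoff-regular at $\theta_0$ and $n \to \infty$, then the LR statistic $\lambda_n$ for
$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \notin \Theta_0$$
converges to
$$\min_{\tau \in TC_{\theta_0}(\Theta_0)} \big\| Z - I(\theta_0)^{1/2} \tau \big\|_2^2,$$
where $Z \sim N(0, \mathrm{Id}_k)$ and $I(\theta_0)^{1/2}$ is any matrix square root of $I(\theta_0)$.
Chi-square distributions
Chi-square distribution: distance from a linear space.
Chi-square mixtures: distances from convex cones.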

What is Chernoff-regularity?
Condition on how the tangent cone locally approximates a set.
Definition. A set $\Theta_0 \subseteq \mathbb{R}^k$ is Chernoff-regular at $\theta_0$ if for all $\tau \in TC_{\theta_0}(\Theta_0)$ and $\beta_n \to 0$ there exists a sequence $\theta_n \to \theta_0$ in $\Theta_0$ such that
$$\lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} = \tau.$$
Lemma. Semi-algebraic sets are everywhere Chernoff-regular.

Algebra
The geometry of a semi-algebraic set $\Theta_0 \subseteq \mathbb{R}^k$ expresses itself algebraically in the vanishing ideal
$$I(\Theta_0) = \{ f \in \mathbb{R}[t_1, \dots, t_k] : f(\theta) = 0 \text{ for all } \theta \in \Theta_0 \}.$$
Computation with suitable finite generating sets
$$\langle f_1, \dots, f_s \rangle = I(\Theta_0), \qquad f_1, \dots, f_s \in \mathbb{R}[t_1, \dots, t_k],$$
reveals singularities and provides information about tangent cones.

Cuspidal cubic
Cuspidal cubic: $\Theta_0 = \{(\theta_1, \theta_2) : \theta_1^3 - \theta_2^2 = 0\}$.
Singularity at zero: $\nabla(\theta_1^3 - \theta_2^2) = 0$.
[Figure: plot of the cuspidal cubic.]
The algebraic tangent cone
$$\{\theta : \theta_2^2 = 0\} = \{\theta : \theta_2 = 0\}$$
contains as a full-dimensional subset the tangent cone
$$TC_0(\Theta_0) = \{(\theta_1, \theta_2) : \theta_1 \ge 0,\ \theta_2 = 0\}.$$
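The two tangent cones of the cuspidal cubic can be computed by hand. The following worked calculation is a sketch added for illustration; the parametrization $s \mapsto (s^2, s^3)$ and the choice $\beta_n = s_n^2$ are assumptions made here, not taken from the slides.

$$
\begin{aligned}
f(\theta) &= \theta_1^3 - \theta_2^2, \qquad \nabla f(\theta) = (3\theta_1^2,\ -2\theta_2), \qquad \nabla f(\theta) = 0 \iff \theta = 0, \\
\text{lowest-degree form of } f \text{ at } 0 &: \ -\theta_2^2 \quad \Rightarrow \quad \text{algebraic tangent cone } \{\theta : \theta_2 = 0\}, \\
\theta \in \Theta_0 &\Rightarrow \theta_1^3 = \theta_2^2 \ge 0 \Rightarrow \theta_1 \ge 0, \\
\theta_n = (s_n^2, s_n^3),\ s_n \to 0,\ \beta_n = s_n^2 &: \quad \frac{\theta_n - 0}{\beta_n} = (1, s_n) \to (1, 0).
\end{aligned}
$$

Since every point of $\Theta_0$ has $\theta_1 \ge 0$ and $|\theta_2| = \theta_1^{3/2}$, limits of rescaled points can only lie on the non-negative $\theta_1$-axis, which gives the half-ray rather than the full line.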

An incorrect personal view of the one-factor model
[Figure: schematic picture of the model with its singularities.]

Bootstrapping/Subsampling: 4000 simulations
[Figure: three histograms of simulated p-values; x-axis: p value.]
($\Sigma_0 = I_{4 \times 4}$, $n = 2000$, $m = 100$)

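Wald test
Model $\{P_\theta : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}^k$, with asymptotically normal estimator
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, \Sigma(\theta_0)),$$
where $\Sigma(\theta_0)$ is positive definite and depends continuously on $\theta_0$.
Test the polynomial constraint $f(\theta) = 0$ using the Wald statistic
$$W_n = \frac{f(\hat\theta_n)^2}{\operatorname{Var}[f(\hat\theta_n)]} = \frac{n\, f(\hat\theta_n)^2}{\nabla f(\hat\theta_n)^t\, \Sigma(\hat\theta_n)\, \nabla f(\hat\theta_n)}.$$
Lemma (Smooth case). If $f(\theta_0) = 0$ and $\nabla f(\theta_0) \neq 0$, then $W_n \xrightarrow{d} \chi^2_1$.
Equivalence of LRT and Wald test.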
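The Wald statistic is straightforward to evaluate once the estimate, its asymptotic covariance and the gradient of the constraint are available. A minimal sketch follows; the function `wald_statistic` and the numbers in the usage example are hypothetical and serve only to show how the formula is assembled.

```python
def wald_statistic(f, grad_f, theta_hat, sigma_hat, n):
    """W_n = n f(theta_hat)^2 / (grad^t Sigma grad), evaluated at theta_hat."""
    g = grad_f(theta_hat)
    k = len(g)
    quad_form = sum(g[i] * sigma_hat[i][j] * g[j]
                    for i in range(k) for j in range(k))
    return n * f(theta_hat) ** 2 / quad_form


# Hypothetical example: linear constraint theta_1 - theta_2 = 0 in R^2,
# identity covariance, n = 100 observations.
def f(t):
    return t[0] - t[1]


def grad_f(t):
    return [1.0, -1.0]


W = wald_statistic(f, grad_f, [0.3, 0.1], [[1.0, 0.0], [0.0, 1.0]], n=100)
print(W)  # approximately 100 * 0.2**2 / 2 = 2
```

The quadratic form in the denominator can be zero when the gradient vanishes at the estimate; the sketch does not guard against that case, which is exactly the singular situation treated in the slides that follow.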

General asymptotics
Write
$$f(t) = \sum_{h=l}^{L} f_h(t - \theta_0),$$
where the $f_h$ are homogeneous polynomials, $\deg(f_h) = h$ and $f_l \neq 0$. Since $f(\theta_0) = 0$, the minimal degree satisfies $l \ge 1$, and we define $f_{\theta_0,\min} = f_l$.
For any polynomial $h$, define
$$W(h, \Sigma) = \frac{h(Z)^2}{\nabla h(Z)^T\, \Sigma\, \nabla h(Z)}, \qquad Z \sim N(0, \Sigma).$$
Lemma. If $f(\theta_0) = 0$, then $W_n \xrightarrow{d} W(f_{\theta_0,\min}, \Sigma(\theta_0))$.

Theorem (work in progress, with H. Xiao)
(a) If $h(Z) = Z_1^u Z_2^v$ and $\Sigma$ is any positive definite matrix, then
$$W(h, \Sigma) \sim \frac{1}{(u + v)^2}\, \chi^2_1.$$
(b) Let $h(Z) = a Z_1^2 + 2b Z_1 Z_2 + c Z_2^2$ and $\Sigma = I$. Always,
$$\tfrac{1}{4} \chi^2_1 \le_d W(h, I) \le_d \tfrac{1}{4} \chi^2_2.$$
If $b^2 - ac \ge 0$, then $W(h, I) \sim \tfrac{1}{4} \chi^2_1$.
If $b^2 - ac < 0$, then
$$W(h, I) \sim \frac{1}{4} \left[ \frac{4(ac - b^2)}{(a + c)^2}\, Z_1^2 + Z_2^2 \right].$$
Note: Behavior of the Wald test is very different from the LRT at singularities.
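Part (a) can be sanity-checked by simulation in the simplest case $u = v = 1$, $\Sigma = I$, where the claim reads $W(h, I) \sim \tfrac14 \chi^2_1$ for $h(Z) = Z_1 Z_2$. The sketch below (function names, seed and number of draws are assumptions of this example) compares the first two simulated moments with those of $\tfrac14 \chi^2_1$, namely $1/4$ and $3/16$.

```python
import random


def wald_limit_product(z1, z2):
    """W(h, I) for h(z) = z1*z2, whose gradient is (z2, z1)."""
    denom = z1 * z1 + z2 * z2            # squared norm of the gradient
    return (z1 * z2) ** 2 / denom if denom > 0 else 0.0


def simulate_moments(num_draws=200_000, seed=1):
    """Monte Carlo estimates of E[W] and E[W^2] for Z ~ N(0, Id_2)."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(num_draws):
        w = wald_limit_product(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        s1 += w
        s2 += w * w
    return s1 / num_draws, s2 / num_draws


m1, m2 = simulate_moments()
print(m1, m2)  # compare with 1/4 and 3/16 for (1/4) * chi-square_1
```

Agreement of two moments is consistent with the theorem but does not verify the full distributional statement, and the sketch covers only identity covariance.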

Take home
Hidden variables $\Rightarrow$ algebraic models with (arbitrarily) complicated singularities.
Algebraic models: the LR statistic always converges to the distance from the tangent cone.
Standard bootstrap (n-out-of-n) is inconsistent at singularities (e.g., see forthcoming paper with B. Williams).
Subsampling/m-out-of-n bootstrap is ok in a pointwise asymptotic sense.
In singular models, Wald $\neq$ LRT.
References: see e.g. [Drton, Sturmfels and Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars Series, Vol. 39, Birkhäuser, Basel, 2009].

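Regular exponential family
Let $P_\Theta = \{P_\theta : \theta \in \Theta\}$ be a family of probability distributions on $\mathcal{X} \subseteq \mathbb{R}^m$ that have densities $p_\theta$ with respect to a measure $\nu$. We call $P_\Theta$ an exponential family if there is a statistic $T : \mathcal{X} \to \mathbb{R}^k$ and functions $h : \Theta \to \mathbb{R}^k$ and $Z : \Theta \to \mathbb{R}$ such that
$$p_\theta(x) = \frac{1}{Z(\theta)} \exp\{\langle h(\theta), T(x) \rangle\}, \qquad x \in \mathcal{X}.$$
We say that $P_\Theta$ is a regular exponential family (of order $k$) if
$$H = \Big\{ \eta \in \mathbb{R}^k : \int_{\mathcal{X}} \exp\{\langle \eta, T(x) \rangle\}\, d\nu(x) < \infty \Big\}$$
is an open subset of $\mathbb{R}^k$ and $h$ is a diffeomorphism between $\Theta$ and $H$.
Fisher-information matrix
Positive (semi-)definite matrix $I(\theta)$ with entries
$$I(\theta)_{ij} = E_\theta\left[ \Big( \frac{\partial}{\partial \theta_i} \log p_\theta(X) \Big) \Big( \frac{\partial}{\partial \theta_j} \log p_\theta(X) \Big) \right], \qquad i, j \in [k].$$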
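As a worked instance of these definitions, the normal means model $N(\theta, \mathrm{Id}_k)$ can be written as a regular exponential family. The choice of the standard Gaussian distribution as the dominating measure $\nu$ is an assumption made here for convenience.

$$
\begin{aligned}
p_\theta(x) &= \frac{1}{Z(\theta)} \exp\{\langle \theta, x \rangle\}, \qquad h(\theta) = \theta, \quad T(x) = x, \quad Z(\theta) = \exp\{\tfrac{1}{2}\|\theta\|_2^2\}, \\
H &= \Big\{ \eta \in \mathbb{R}^k : \int \exp\{\langle \eta, x \rangle\}\, d\nu(x) < \infty \Big\} = \mathbb{R}^k, \\
\frac{\partial}{\partial \theta_i} \log p_\theta(x) &= x_i - \theta_i, \qquad I(\theta)_{ij} = E_\theta[(X_i - \theta_i)(X_j - \theta_j)] = \delta_{ij}.
\end{aligned}
$$

With $I(\theta_0) = \mathrm{Id}_k$, the limit in Chernoff's theorem becomes $\min_{\tau \in TC_{\theta_0}(\Theta_0)} \|Z - \tau\|_2^2$, in agreement with the squared-distance description given for normal means.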