
KYBERNETIKA - VOLUME 22 (1986), NUMBER 2

ON THE UNIQUENESS OF THE M. L. ESTIMATES IN CURVED EXPONENTIAL FAMILIES

ANDREJ PÁZMAN

Curved families (cf. Efron [3]) imbedded in exponential families having full-rank differentiable sufficient statistics are considered. It is proved that if the rank is less than or equal to the dimension of the sample space, then the maximum likelihood estimate is unique. Examples: the Gaussian nonlinear regression, the Gaussian family with unknown mean and variance, the beta-distribution. Generalized curved exponential families are considered as well.

1. INTRODUCTION AND EXAMPLE

Curved exponential families, i.e. statistical models which can be imbedded in the well-known exponential families of probability distributions, have been studied extensively in recent years; probably the best known papers are [3] and [1]. The importance of such families lies in the possibility of obtaining very general results for a large class of different statistical problems with the aid of geometrical considerations (see the examples given below for some special cases). In curved exponential families the importance of the maximum likelihood estimates is emphasized once more; however, the question of the uniqueness of such estimates seems to be still open.

Let $\mathscr{P} = \{P_\theta\colon \theta \in \Theta\}$ be an exponential family of probability distributions given by the densities

(1)  $\dfrac{\mathrm{d}P_\theta}{\mathrm{d}\nu}(x) = \exp\{\theta'\,t(x) - \varkappa(\theta)\}\,;\quad (\theta \in \Theta)$

with respect to a carrier measure $\nu$ on the sample space $\mathscr{X}$. Denote by $k$ the dimension of the vector $\theta$, and suppose that $\Theta$ has a nonvoid interior in $R^k$, $\operatorname{int}\Theta \neq \emptyset$. The function

$\varkappa(\theta) = \ln \int_{\mathscr{X}} \exp\{\theta'\,t(x)\}\,\nu(\mathrm{d}x)$

is differentiable on $\operatorname{int}\Theta$, and the mean and the covariance of the statistic $t(x)$ defined by Eq. (1) are equal to

$E_\theta(t) = \nabla\varkappa(\theta) := \left(\dfrac{\partial\varkappa(\theta)}{\partial\theta_1}, \ldots, \dfrac{\partial\varkappa(\theta)}{\partial\theta_k}\right)'$

and

$D_\theta(t) = \nabla\nabla'\varkappa(\theta) := \left\{\dfrac{\partial^2\varkappa(\theta)}{\partial\theta_i\,\partial\theta_j}\right\}_{i,j=1}^{k}$

(cf. [2], Chapter 8). In the case that $D_\theta(t)$ is nonsingular, from $\nabla_\theta E_\theta(t') = D_\theta(t)$ we obtain by the inverse function theorem that the mapping $\theta \mapsto E_\theta(t)$ is one-to-one on $\operatorname{int}\Theta$.
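As a quick numerical sanity check of the identity $E_\theta(t) = \nabla\varkappa(\theta)$, the following minimal Python sketch evaluates both sides for a hypothetical exponential family on a three-point sample space with the counting measure as carrier; the statistic $t$ and the point $\theta$ are illustrative choices, not taken from the paper.

import numpy as np

# values t(x_j) of the statistic, one row per sample point x_0, x_1, x_2
t = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def kappa(theta):
    # kappa(theta) = ln sum_x exp(theta' t(x))  (nu = counting measure)
    return np.log(np.exp(t @ theta).sum())

def probs(theta):
    # dP_theta/dnu (x_j) = exp(theta' t(x_j) - kappa(theta))
    return np.exp(t @ theta - kappa(theta))

theta = np.array([0.3, -0.7])
mean_t = probs(theta) @ t            # E_theta(t), computed directly
eps = 1e-6                           # central-difference gradient of kappa
grad_kappa = np.array([(kappa(theta + eps * e) - kappa(theta - eps * e)) / (2 * eps)
                       for e in np.eye(2)])
print(mean_t, grad_kappa)            # the two vectors agree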

Let $\Gamma$ be an open subset of $R^m$ (with $m < k$) and let $\eta\colon \gamma \in \Gamma \mapsto \eta(\gamma) \in \operatorname{int}\Theta$ be a mapping with continuous second order derivatives $\partial^2\eta/\partial\gamma_i\,\partial\gamma_j$ and with linearly independent vectors of first order derivatives $\partial\eta/\partial\gamma_1, \ldots, \partial\eta/\partial\gamma_m$. The family

(2)  $\{P_{\eta(\gamma)}\colon \gamma \in \Gamma\}$

is called a curved exponential family of dimension $m$ (cf. [3] and [1]). The log-likelihood function of the family (2) is equal to

$l_\gamma(t) = \eta'(\gamma)\,t - \varkappa[\eta(\gamma)]\,.$

Given the data point $x \in \mathscr{X}$, the M. L. estimate (if it exists) is defined by

$\hat\gamma := \hat\gamma(x) := \operatorname{Arg}\max_\gamma\, l_\gamma[t(x)]\,,$

and it is a solution of the normal equations

(3)  $0 = \dfrac{\partial l_\gamma[t(x)]}{\partial\gamma_i} = \dfrac{\partial\eta'(\gamma)}{\partial\gamma_i}\,[t(x) - \beta(\gamma)]\,;\quad (i = 1, \ldots, m)\,,$

where we denoted $\beta(\gamma) := E_{\eta(\gamma)}(t)$.
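The second equality in Eq. (3) is worth spelling out: by the chain rule and the identity $\nabla\varkappa(\theta) = E_\theta(t)$,

$\dfrac{\partial l_\gamma(t)}{\partial\gamma_i} = \dfrac{\partial\eta'(\gamma)}{\partial\gamma_i}\,t - \dfrac{\partial\eta'(\gamma)}{\partial\gamma_i}\,\nabla\varkappa(\theta)\Big|_{\theta = \eta(\gamma)} = \dfrac{\partial\eta'(\gamma)}{\partial\gamma_i}\,[t - \beta(\gamma)]\,.$

Geometrically, at a stationary point the residual $t(x) - \beta(\gamma)$ is orthogonal to the derivative vectors $\partial\eta(\gamma)/\partial\gamma_i$.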

Let

(4)  $\{\{P_{\eta^i(\gamma^i)}\colon \gamma^i \in \Gamma_i\}\,;\ (i \in J)\}$

be a finite or countable set of curved exponential families of (generally different) dimensions $m(i)$; $(i \in J)$, which are imbedded in the same exponential family $\mathscr{P}$. The union of these families will be called the generalized curved exponential family (briefly: GCEF). That means it is the family $\mathscr{P}_S = \{P_\theta\colon P_\theta \in \mathscr{P},\ \theta \in S\}$, where

(5)  $S = \bigcup_{i \in J} S_i$

and where $S_i = \{\eta^i(\gamma^i)\colon \gamma^i \in \Gamma_i\}$; $(i \in J)$. The M. L. estimate in this family is

(6)  $\hat\theta = \hat\theta(x) = \operatorname{Arg}\max_{\theta \in S}\,[\theta'\,t(x) - \varkappa(\theta)]\,.$

In contrast to the family (2), in a GCEF we can ensure the existence of the M. L. estimate, e.g. by supposing that $S$ is compact (i.e. closed and bounded). Examples of compact sets given by Eq. (5) are: finite or bounded countable subsets of $\operatorname{int}\Theta$, closed intervals, closed spheres contained in $\operatorname{int}\Theta$, etc.

Example 1. Take $\mathscr{X} = \{x_0, x_1, x_2\}$ and let $\mathscr{P}$ be the set of all strictly positive probability measures on $\mathscr{X}$. $\mathscr{P}$ is an exponential family, which can be verified by setting

$\theta_i = \ln\,[P_\theta(x_i)/P_\theta(x_0)]\,,\quad t_i(x_j) = \delta_{ij}\,;\quad (i = 1, 2,\ j = 0, 1, 2)\,,$

$\varkappa(\theta) = \ln\,[1 + \mathrm{e}^{\theta_1} + \mathrm{e}^{\theta_2}]\,,\quad \Theta = R^2$

into Eq. (1). Obviously, $\mathrm{d}P_\theta(x_j)/\mathrm{d}\nu = P_\theta(x_j)$ and $E_\theta(t_i) = P_\theta(x_i)$. Consider a curve $\eta\colon \gamma \in (0, \frac{1}{2}) \mapsto \eta(\gamma) \in \Theta$ chosen so that $P_{\eta(\gamma)}(x_1) = \frac{1}{2}$ for every $\gamma$. If the data point is equal to $x_1$, then every $\gamma \in (0, \frac{1}{2})$ is a M. L. estimate. Indeed,

$l_\gamma[t(x_1)] = \ln P_{\eta(\gamma)}(x_1) = \ln\left(\tfrac{1}{2}\right)$

does not depend on $\gamma$, so every $\gamma$ maximizes the likelihood. Hence

$P_{\eta(\gamma)}\{x\colon \hat\gamma(x) \text{ is not unique}\} \geq P_{\eta(\gamma)}(x_1) = \tfrac{1}{2}\,.$
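A minimal Python sketch of Example 1, completing the curve with one concrete choice (hypothetical, used only for illustration): $P_{\eta(\gamma)}(x_0) = \gamma$, $P_{\eta(\gamma)}(x_1) = \frac{1}{2}$, $P_{\eta(\gamma)}(x_2) = \frac{1}{2} - \gamma$. It shows the flat likelihood at the data point $x_1$.

import numpy as np

def theta(gamma):
    # theta_i = ln [P(x_i)/P(x_0)] along the curve P = (gamma, 1/2, 1/2 - gamma)
    p0, p1, p2 = gamma, 0.5, 0.5 - gamma
    return np.array([np.log(p1 / p0), np.log(p2 / p0)])

def loglik(gamma, j):
    # l_gamma[t(x_j)] = theta' t(x_j) - kappa(theta), with t_i(x_j) = delta_ij
    th = theta(gamma)
    kappa = np.log(1.0 + np.exp(th[0]) + np.exp(th[1]))
    t = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])[j]
    return th @ t - kappa

for g in (0.1, 0.2, 0.3, 0.4):
    print(loglik(g, 1))   # prints ln(1/2) = -0.693... for every gamma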

Example 2. (The nonlinear regression.) Take in Eq. (1): $\mathscr{X} = R^N$, $\Theta = R^N$, $t(x) = K^{-1}x$, where $K$ is a positive definite $N \times N$ matrix, $\nu(\mathrm{d}x) = (2\pi)^{-N/2}\,\det^{-1/2}(K)\exp\{(-\frac{1}{2})\,x'K^{-1}x\}\,\lambda(\mathrm{d}x)$, $\varkappa(\theta) = (\frac{1}{2})\,\theta'K^{-1}\theta$. Then the curved family given by Eq. (2) has the density

$\dfrac{\mathrm{d}P_{\eta(\gamma)}}{\mathrm{d}\lambda}(x) = \dfrac{1}{(2\pi)^{N/2}\,\det^{1/2}(K)}\exp\left\{-\tfrac{1}{2}\,(x - \eta(\gamma))'\,K^{-1}\,(x - \eta(\gamma))\right\}$

with respect to the Lebesgue measure $\lambda$. The M. L. estimate coincides here with the least-squares estimate

$\hat\gamma = \operatorname{Arg}\min_\gamma\,(x - \eta(\gamma))'\,K^{-1}\,(x - \eta(\gamma))\,.$

The uniqueness of this estimate (with probability one) has been proved by the author in [5].
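A minimal computational sketch of Example 2: maximizing the likelihood means minimizing the weighted squared distance from the data point to the curve $\eta(\Gamma)$. The one-parameter response curve, the data point and the choice $K = I$ below are hypothetical illustrations, not taken from the paper.

import numpy as np
from scipy.optimize import minimize_scalar

def eta(gamma):
    # a hypothetical smooth curve in R^3
    return np.array([gamma, gamma**2, np.sin(gamma)])

x = np.array([0.9, 1.1, 0.7])    # one observed data point in R^N, N = 3
K_inv = np.eye(3)                # K = I, so K^{-1} = I

def weighted_sq_dist(gamma):
    r = x - eta(gamma)
    return r @ K_inv @ r         # (x - eta(gamma))' K^{-1} (x - eta(gamma))

res = minimize_scalar(weighted_sq_dist, bounds=(-3.0, 3.0), method="bounded")
print(res.x)                     # gamma-hat: least-squares = M. L. estimate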

In this paper we shall prove that the M. L. estimate in a GCEF is unique with probability one, provided that the embedding family $\mathscr{P}$ has the following properties:

A) $\mathscr{X}$ is an open subset of $R^N$, $N \geq k$.

B) The statistic $t\colon \mathscr{X} \to R^k$ defined by Eq. (1) has continuous first order derivatives, and the rank of the matrix $\nabla'_x t(x)$ (with entries $\partial t_i(x)/\partial x_j$; $(i = 1, \ldots, k,\ j = 1, \ldots, N)$) is equal to $k$.

C) The family $\mathscr{P}$ is dominated by the Lebesgue measure $\lambda$ (i.e. we can suppose that $\nu \ll \lambda$).

We note that B) implies that $t_1(x) - E_\theta[t_1(x)], \ldots, t_k(x) - E_\theta[t_k(x)]$ are linearly independent functions on $\mathscr{X}$; hence $D_\theta(t)$ is a nonsingular matrix. The family in Example 2 has the properties A)-C). Other examples are:

Example 3. (The Gaussian family with unknown mean and variance.) Take $\mathscr{X} = R^N$, $N \geq 2$, $\Theta = (-\infty, \infty) \times (-\infty, 0)$, $t(x) = \left(\sum_{i=1}^N x_i,\ \sum_{i=1}^N x_i^2\right)'$, $\nu(\mathrm{d}x) = (2\pi)^{-N/2}\,\lambda(\mathrm{d}x)$, $\varkappa(\theta_1, \theta_2) = -(N/2)\ln(-2\theta_2) - N\theta_1^2/(4\theta_2)$. Then the density (1), but with respect to $\lambda$, is equal to

$\prod_{i=1}^N \dfrac{1}{\sqrt{(2\pi)}\,\sigma}\,\mathrm{e}^{-\frac{(x_i - \mu)^2}{2\sigma^2}}\,,$

i.e. $\theta_1 = \mu/\sigma^2$, $\theta_2 = -1/(2\sigma^2)$. Evidently

$\nabla_x t'(x) = \begin{pmatrix} 1 & 2x_1 \\ \vdots & \vdots \\ 1 & 2x_N \end{pmatrix}.$

Hence $\operatorname{rank}\,[\nabla_x t'(x)] = k = 2$ if we modify $\mathscr{X}$, taking for $\mathscr{X}$ the set $R^N - \{x \in R^N\colon x_i = x_j \text{ for some } i \neq j\}$, which is evidently open and has probability one for every $P \in \mathscr{P}$.

Example 4. (The beta-distribution.) Take $N \geq 2$, $\mathscr{X} = \{x\colon x \in (0, 1)^N,\ x_i \neq x_j\ (i \neq j)\}$, $\Theta = (-1, \infty) \times (-1, \infty)$, $t(x) = \left(\sum_{i=1}^N \ln x_i,\ \sum_{i=1}^N \ln(1 - x_i)\right)'$, $\nu(\mathrm{d}x) = \lambda(\mathrm{d}x)$,

$\varkappa(\theta_1, \theta_2) = N\{\ln\Gamma(\theta_1 + 1) + \ln\Gamma(\theta_2 + 1) - \ln\Gamma(\theta_1 + \theta_2 + 2)\}\,,$

where $\Gamma(\cdot)$ is the gamma function. Then the density (1) corresponds to the product of beta-densities

$\prod_{i=1}^N \dfrac{\Gamma(\mu_1 + \mu_2)}{\Gamma(\mu_1)\,\Gamma(\mu_2)}\,x_i^{\mu_1 - 1}(1 - x_i)^{\mu_2 - 1} \qquad (\mu_1 = \theta_1 + 1,\ \mu_2 = \theta_2 + 1)\,.$

Obviously,

$\nabla_x t'(x) = \begin{pmatrix} x_1^{-1} & -(1 - x_1)^{-1} \\ \vdots & \vdots \\ x_N^{-1} & -(1 - x_N)^{-1} \end{pmatrix}$

and $\operatorname{rank}\,[\nabla_x t'(x)] = 2$ on $\mathscr{X}$.
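The rank condition of assumption B) is easy to check directly for Example 3: the matrix $\nabla'_x t(x)$ has rows $(1, \ldots, 1)$ and $(2x_1, \ldots, 2x_N)$, which are linearly dependent only if all coordinates of $x$ coincide. A small Python sketch (the sample points are arbitrary illustrative values):

import numpy as np

def jacobian_t(x):
    # rows: dt_1/dx_j = 1 and dt_2/dx_j = 2 x_j, for t(x) = (sum x_i, sum x_i^2)'
    return np.vstack([np.ones_like(x), 2.0 * x])

print(np.linalg.matrix_rank(jacobian_t(np.array([1.0, 2.0, 3.0]))))  # 2: full rank
print(np.linalg.matrix_rank(jacobian_t(np.array([1.0, 1.0, 1.0]))))  # 1: rank drops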

2. THE UNIQUENESS

Let us denote $\mathscr{T} := \{t(x)\colon x \in \mathscr{X}\}$.

Proposition 1. Under the assumptions A) and B) the set $\mathscr{T}$ is an open subset of $R^k$ and

$F \subset \mathscr{T}\,,\ \lambda(F) = 0\ \Rightarrow\ \lambda[t^{-1}(F)] = 0\,.$

The proof follows from Proposition A3. □

Denote by $P^t$ the measure induced from the measure $P$ by the mapping $t$.

Corollary. Under the assumptions A)-C) we have $P^t \ll \lambda$ for every $P \in \mathscr{P}$.

Denote the vector of first order derivatives and the matrix of second order derivatives of the log-likelihood function $l_\gamma(t)$ by $\nabla_\gamma l_\gamma(t)$, resp. $\nabla_\gamma\nabla'_\gamma l_\gamma(t)$. If $\theta \in S$, i.e. $\theta \in S_i$ for some $i$, then by $l_\gamma(t)\big|_\theta$, resp. $\nabla_\gamma l_\gamma(t)\big|_\theta$, we denote the log-likelihood function at $\theta = \eta^i(\gamma^i)$, $\gamma^i = \gamma$, resp. the vector of first order derivatives with respect to $\gamma^i_1, \ldots, \gamma^i_{m(i)}$ (i.e. we omit the superscript $i$).

Proposition 2. Under the assumptions A) and B) we have

$\lambda\{t\colon t \in \mathscr{T}\,,\ \exists\,\theta \in S\colon \nabla_\gamma l_\gamma(t)\big|_\theta = 0\ \text{and}\ \det\,[\nabla_\gamma\nabla'_\gamma l_\gamma(t)]\big|_\theta = 0\} = 0\,.$

Proof. From the second equality in Eq. (3) we obtain

(7)  $\det\,[\nabla_\gamma\nabla'_\gamma l_\gamma(t)] = \det\left\{[t - \beta(\gamma)]'\,\dfrac{\partial^2\eta(\gamma)}{\partial\gamma_i\,\partial\gamma_j} - \dfrac{\partial\eta'(\gamma)}{\partial\gamma_i}\,D_{\eta(\gamma)}(t)\,\dfrac{\partial\eta(\gamma)}{\partial\gamma_j}\right\}_{i,j=1}^{m}\,.$

Denote

$\mathscr{V}_\gamma := \left\{z\colon z \in R^k\,,\ z'\,\dfrac{\partial\eta(\gamma)}{\partial\gamma_i} = 0\,;\ (i = 1, \ldots, m)\right\}\,.$

Any orthonormal basis $w_1(\gamma), \ldots, w_{k-m}(\gamma)$ of $\mathscr{V}_\gamma$ is defined by the equations

$w'_l(\gamma)\,w_{l'}(\gamma) = \delta_{ll'}\,,\qquad w'_l(\gamma)\,\dfrac{\partial\eta(\gamma)}{\partial\gamma_j} = 0\,;\qquad (l, l' = 1, \ldots, k - m,\ j = 1, \ldots, m)\,.$

Let $U_1, \ldots, U_S$ be subsets of $\Gamma$ such that on each $U_i$ a fixed $m \times m$ submatrix of $\nabla_\gamma\eta'(\gamma)$ is nonsingular. Evidently, the vectors $w_1(\gamma), \ldots, w_{k-m}(\gamma)$ can be chosen as differentiable functions of $\gamma$ on each $U_i$. By differentiating the identity $w'_l(\gamma)\,\partial\eta(\gamma)/\partial\gamma_j = 0$ with respect to $\gamma_i$ we obtain

(8)  $w'_l(\gamma)\,\dfrac{\partial^2\eta(\gamma)}{\partial\gamma_i\,\partial\gamma_j} = -\dfrac{\partial w'_l(\gamma)}{\partial\gamma_i}\,\dfrac{\partial\eta(\gamma)}{\partial\gamma_j}\,.$

Take $t \in \mathscr{T}$, $\gamma \in \Gamma$ so that $[t - \beta(\gamma)]'\,\partial\eta(\gamma)/\partial\gamma_j = 0$; $(j = 1, \ldots, m)$. Then

$t - \beta(\gamma) = \sum_{l=1}^{k-m} c_l\,w_l(\gamma)$

for some $c_1, \ldots, c_{k-m}$. Fix $c_1, \ldots, c_{k-m}$ and define $t(\bar\gamma)$ by

$t(\bar\gamma) := \sum_l c_l\,w_l(\bar\gamma) + \beta(\bar\gamma)\,.$

For $\|\bar\gamma - \gamma\|$ sufficiently small we have $t(\bar\gamma) \in \mathscr{T}$. From Eq. (8) we obtain

$[t(\bar\gamma) - \beta(\bar\gamma)]'\,\dfrac{\partial^2\eta(\bar\gamma)}{\partial\gamma_i\,\partial\gamma_j} = -\sum_l c_l\,\dfrac{\partial w'_l(\bar\gamma)}{\partial\gamma_i}\,\dfrac{\partial\eta(\bar\gamma)}{\partial\gamma_j}\,.$

Denote

$W := (w_1(\bar\gamma), \ldots, w_{k-m}(\bar\gamma))\,,\quad D := \nabla_\gamma\eta'(\bar\gamma)\,,\quad G := \nabla_\gamma\left[\sum_l c_l\,w'_l(\bar\gamma) + \beta'(\bar\gamma)\right]\,.$

Using $D_{\eta(\bar\gamma)}(t)\,\partial\eta(\bar\gamma)/\partial\gamma_j = \partial\beta(\bar\gamma)/\partial\gamma_j$ and the symmetry of $D_{\eta(\bar\gamma)}(t)$, Eq. (7) yields

$\det\,[\nabla_\gamma\nabla'_\gamma l_{\bar\gamma}(t(\bar\gamma))] = \det(-G D') = \det\begin{pmatrix} -G D' & -G W \\ W' D' & W' W \end{pmatrix} = \det\left[\begin{pmatrix} -G \\ W' \end{pmatrix}(D', W)\right]$

since $W'W = I$ and $W'D' = 0$. Since the matrix $(D', W)$ has full rank, we conclude that, under the assumption $\nabla_\gamma l_{\bar\gamma}(t(\bar\gamma)) = 0$,

$\det\,[\nabla_\gamma\nabla'_\gamma l_{\bar\gamma}(t(\bar\gamma))] = 0\ \Leftrightarrow\ \det\begin{pmatrix} -G \\ W' \end{pmatrix} = 0\,.$

Consider the mappings

$h_i\colon (\gamma_1, \ldots, \gamma_m, c_1, \ldots, c_{k-m}) \in U_i \times R^{k-m} \mapsto \sum_{l=1}^{k-m} c_l\,w_l(\gamma) + \beta(\gamma)\,.$

Every $t \in \mathscr{T}$ at which the normal equations (3) are satisfied for some $\gamma$ belongs to $\bigcup_{i=1}^S h_i(U_i \times R^{k-m})$, since to every such $t$ there are $\gamma \in \Gamma$, $i \in \{1, \ldots, S\}$ and $c \in R^{k-m}$ such that $t = \sum_l c_l\,w_l(\gamma) + \beta(\gamma)$. The set $h_i^{-1}(\mathscr{T})$ is open, since $\mathscr{T}$ is open and $h_i$ is continuous. Denote by $\bar h_i\colon h_i^{-1}(\mathscr{T}) \mapsto \mathscr{T}$ the restriction of $h_i$ to the set $h_i^{-1}(\mathscr{T})$. We have

$\nabla_{(\gamma, c)}\bar h'_i(\gamma, c) = \nabla_{(\gamma, c)} h'_i(\gamma, c) = \begin{pmatrix} G \\ W' \end{pmatrix}\,.$

Hence

$\det\,[\nabla_{(\gamma, c)} h'_i(\gamma, c)] = 0\ \Leftrightarrow\ \det\,[\nabla_\gamma\nabla'_\gamma l_\gamma(t)] = 0\,.$

Thus the needed statement follows from Proposition A4. □
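The orthogonality identity (8), which drives the proof above, can also be checked numerically for a concrete curve. A Python sketch with a hypothetical ellipse arc in $R^2$ (so $k = 2$, $m = 1$), using finite differences throughout:

import numpy as np

def eta(g):
    # hypothetical curve in R^2: an ellipse arc
    return np.array([np.cos(g), 2.0 * np.sin(g)])

h = 1e-4

def d_eta(g):
    # first derivative by central differences
    return (eta(g + h) - eta(g - h)) / (2 * h)

def w(g):
    # unit normal w(g): rotate the tangent by 90 degrees and normalize
    d = d_eta(g)
    n = np.array([-d[1], d[0]])
    return n / np.linalg.norm(n)

g = 0.7
d2_eta = (eta(g + h) - 2 * eta(g) + eta(g - h)) / h**2   # second derivative
dw = (w(g + h) - w(g - h)) / (2 * h)                     # derivative of w
print(w(g) @ d2_eta)    # left-hand side of (8)
print(-dw @ d_eta(g))   # right-hand side of (8); agrees up to the difference error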

Proposition 3. Under the assumptions A) and B) we obtain

$\lambda\{x\colon x \in \mathscr{X}\,,\ \exists\,\bar\theta, \tilde\theta \in S\,,\ \bar\theta \neq \tilde\theta\colon \nabla_\gamma l_\gamma[t(x)]\big|_{\bar\theta} = 0\,,\ \nabla_\gamma l_\gamma[t(x)]\big|_{\tilde\theta} = 0\,,\ l_\gamma[t(x)]\big|_{\bar\theta} = l_\gamma[t(x)]\big|_{\tilde\theta}\} = 0\,.$

Proof. Denote

$\mathscr{G} := \{t\colon t \in \mathscr{T}\,,\ \exists\,\theta \in S\colon \nabla_\gamma l_\gamma(t)\big|_\theta = 0\ \text{and}\ \det\,[\nabla_\gamma\nabla'_\gamma l_\gamma(t)]\big|_\theta = 0\}\,.$

Further denote

$\Gamma_{rs} := \{(\bar\gamma, \tilde\gamma)\colon \bar\gamma \in \Gamma_r\,,\ \tilde\gamma \in \Gamma_s\,,\ \eta^r(\bar\gamma) \neq \eta^s(\tilde\gamma)\}\,.$

According to Propositions 1 and 2, it is sufficient to prove that for every $r, s \in J$

$\lambda\{t\colon t \in \mathscr{T} - \mathscr{G}\,,\ \exists\,(\bar\gamma, \tilde\gamma) \in \Gamma_{rs}\colon \nabla_\gamma l_{\bar\gamma}(t) = 0\,,\ \nabla_\gamma l_{\tilde\gamma}(t) = 0\,,\ l_{\bar\gamma}(t) = l_{\tilde\gamma}(t)\} = 0\,.$

Denote by $m$, $n$ the dimensions of $\Gamma_s$ and $\Gamma_r$. Define a mapping $\Phi\colon \Gamma_{rs} \times (\mathscr{T} - \mathscr{G}) \mapsto R^{m+n+k}$ by

$\Phi(\bar\gamma, \tilde\gamma, t) := \left(\nabla'_\gamma l_{\bar\gamma}(t)\,,\ \nabla'_\gamma l_{\tilde\gamma}(t)\,,\ l_{\bar\gamma}(t) - l_{\tilde\gamma}(t)\,,\ t^{(i)\prime}\right)\,,$

where $t^{(i)\prime} := (t_1, \ldots, t_{i-1}, t_{i+1}, \ldots, t_k)$ and where $i$ is chosen so that $\{\eta^r(\bar\gamma)\}_i \neq \{\eta^s(\tilde\gamma)\}_i$. (There is such a subscript $i$, since $\eta^r(\bar\gamma) \neq \eta^s(\tilde\gamma)$.) The Jacobian of $\Phi$ is

$\nabla_{(\bar\gamma, \tilde\gamma, t)}\Phi'(\bar\gamma, \tilde\gamma, t) = \begin{pmatrix} \nabla_\gamma\nabla'_\gamma l_{\bar\gamma}(t) & 0 & \nabla_\gamma l_{\bar\gamma}(t) & 0 \\ 0 & \nabla_\gamma\nabla'_\gamma l_{\tilde\gamma}(t) & -\nabla_\gamma l_{\tilde\gamma}(t) & 0 \\ \nabla_t\nabla'_\gamma l_{\bar\gamma}(t) & \nabla_t\nabla'_\gamma l_{\tilde\gamma}(t) & \eta^r(\bar\gamma) - \eta^s(\tilde\gamma) & I^{(i)} \end{pmatrix}\,,$

where $I^{(i)}$ is the $k \times (k - 1)$ matrix obtained from the identity matrix by removing the $i$th column. On the set

$Z := \{(\bar\gamma, \tilde\gamma, t)\colon \Phi(\bar\gamma, \tilde\gamma, t) = (0, 0, 0, t^{(i)\prime})\,,\ t \in \mathscr{T} - \mathscr{G}\}$

we have

$\det\,[\nabla_{(\bar\gamma, \tilde\gamma, t)}\Phi'(\bar\gamma, \tilde\gamma, t)] = \det\,[\nabla_\gamma\nabla'_\gamma l_{\bar\gamma}(t)]\ \det\,[\nabla_\gamma\nabla'_\gamma l_{\tilde\gamma}(t)]\ \det\,[\eta^r(\bar\gamma) - \eta^s(\tilde\gamma)\,,\ I^{(i)}] \neq 0\,,$

since $\det\,[\eta^r(\bar\gamma) - \eta^s(\tilde\gamma)\,,\ I^{(i)}] = \pm\left(\{\eta^r(\bar\gamma)\}_i - \{\eta^s(\tilde\gamma)\}_i\right) \neq 0$. Thus $\Phi$ is a diffeomorphism on a neighbourhood of each point of $Z$, and

$\dim Z = \dim\{y \in R^{k+m+n}\colon y = (0, 0, 0, t^{(i)\prime})\,;\ t \in \mathscr{T} - \mathscr{G}\} \leq k - 1\,.$

It follows that

$\dim\{t\colon \exists\,(\bar\gamma, \tilde\gamma) \in \Gamma_{rs}\colon (\bar\gamma, \tilde\gamma, t) \in Z\} \leq k - 1\,,$

hence

$\lambda\{t\colon \exists\,(\bar\gamma, \tilde\gamma) \in \Gamma_{rs}\colon (\bar\gamma, \tilde\gamma, t) \in Z\} = 0\,.$ □

A direct consequence of Proposition 3 is the following theorem.

Theorem. Under the assumptions A)-C), for any GCEF imbedded in $\mathscr{P}$ and for any $P \in \mathscr{P}$ we have

$P\{x\colon \text{the M. L. estimate } \hat\theta(x) \text{ is not unique}\} = 0\,.$

APPENDIX

Let $\psi\colon X \mapsto R^s$ be a mapping defined on an open set $X \subset R^r$ ($r \geq s$). Suppose that $\psi$ has continuous first order derivatives (the elements of the $r \times s$ matrix $\nabla_x\psi'(x)$).

Proposition A1. (The inverse function theorem.) If $r = s$ and $\det\,[\nabla_x\psi'(x)] \neq 0$ on $X$, then $\psi$ is a diffeomorphism (i.e. it is one-to-one and $\psi^{-1}\colon \psi(X) \mapsto R^r$ is continuously differentiable).

For the proof cf. [4] or [7].

Proposition A2. If $\psi$ is a diffeomorphism, then

$F \subset X\,,\ \lambda(F) = 0\ \Rightarrow\ \lambda[\psi(F)] = 0\,.$

Proof. $F$ is either bounded, or it is a countable union of bounded sets. For $F$ bounded we have

$\lambda[\psi(F)] = \int_{\psi(F)}\lambda(\mathrm{d}y) = \int_F \left|\det\,[\nabla_x\psi'(x)]\right|\,\lambda(\mathrm{d}x) = 0\,.$ □

Proposition A3. If $r \geq s$ and $\operatorname{rank}\,[\nabla_x\psi'(x)] = s$, then the set $\psi(X)$ is an open subset of $R^s$, and

$F \subset \psi(X)\,,\ \lambda(F) = 0\ \Rightarrow\ \lambda[\psi^{-1}(F)] = 0\,.$

Proof. The set $X$ is a finite union of sets $U_1, \ldots, U_S$ such that on each $U_i$ a fixed $s \times s$ submatrix of $\nabla_x\psi'(x)$ is nonsingular (i.e. its determinant is nonzero). Denote by $p\colon R^r \mapsto R^s$ and by $q\colon R^r \mapsto R^{r-s}$ the projections $p(x) := (x_1, \ldots, x_s)$, $q(x) := (x_{s+1}, \ldots, x_r)$, and suppose (renumbering the coordinates if necessary) that $\nabla_{p(x)}\psi'(x)$ is nonsingular on $U_1$. Define $\Psi\colon U_1 \mapsto R^r$ by

$\Psi(x) := (\psi(x), q(x))\,.$

$\Psi$ is a diffeomorphism on $U_1$, since the matrix

$\nabla_x\Psi'(x) = \begin{pmatrix} \nabla_{p(x)}\psi'(x) & 0 \\ \nabla_{q(x)}\psi'(x) & I \end{pmatrix}$

is nonsingular (Proposition A1). Obviously, $\psi(x) = p \circ \Psi(x)$; hence $V_1 := p \circ \Psi(U_1)$ is an open set, and $\psi(X) = \bigcup_i V_i$ is open as well. Further,

$\lambda[p^{-1}(F \cap V_1)] = \lambda[(F \cap V_1) \times R^{r-s}] = 0\,;$

hence, according to Proposition A2 (applied to the diffeomorphism $\Psi^{-1}$),

$\lambda[\psi^{-1}(F \cap V_1) \cap U_1] = \lambda[\Psi^{-1}(p^{-1}(F \cap V_1) \cap \Psi(U_1))] = 0\,.$

Consequently, $\lambda[\psi^{-1}(F)] \leq \sum_{i=1}^S \lambda[\psi^{-1}(F \cap V_i) \cap U_i] = 0\,.$ □

Proposition A4. If $r = s$ and $X$ is bounded, then

$\lambda\{\psi(x)\colon \det\,[\nabla_x\psi'(x)] = 0\} = 0\,.$

The proof is given in [7], Lemma 3.2.

(Received January 24, 1985.)

REFERENCES

[1] S. Amari: Differential geometry of curved exponential families - curvatures and information loss. Ann. Statist. 10 (1982), 357-387.
[2] O. Barndorff-Nielsen: Information and Exponential Families in Statistical Theory. Wiley, Chichester 1979.
[3] B. Efron: The geometry of exponential families. Ann. Statist. 6 (1978), 362-376.
[4] V. Jarník: Diferenciální počet II (Differential Calculus). NČSAV, Praha 1956.
[5] A. Pázman: Nonlinear least squares - uniqueness versus ambiguity. Math. Operationsforsch. Statist. Ser. Statist. 15 (1984), 323-336.
[6] A. Pázman: Probability distribution of the multivariate nonlinear least squares estimates. Kybernetika 20 (1984), 209-230.
[7] S. Sternberg: Lectures on Differential Geometry. Second printing. Prentice-Hall, Englewood Cliffs 1965.

RNDr. Andrej Pázman, DrSc., Matematický ústav SAV (Mathematical Institute - Slovak Academy of Sciences), ul. Obrancov mieru 49, 814 73 Bratislava, Czechoslovakia.