1 PIOTR GRACZYK, LAREMA, Université d'Angers, France
Covariance estimators for graphical models
Coauthors: H. Ishi (Nagoya), B. Kołodziejek (Warsaw), S. Mamane (Johannesburg), H. Ochiai (Fukuoka)
XLV Conference on Applications of Mathematics, Zakopane-Kościelisko, 9/9/2016
2 References:
[GI] P. Graczyk, H. Ishi, Riesz measures and Wishart laws associated to quadratic maps, J. Math. Soc. Japan 66 (2014).
[GIM] P. Graczyk, H. Ishi, S. Mamane, Wishart exponential families on cones related to A_n graphs, submitted.
[GIK] P. Graczyk, H. Ishi, B. Kołodziejek, Wishart exponential families and variance function on homogeneous cones, submitted.
[GIMO] P. Graczyk, H. Ishi, S. Mamane, H. Ochiai, On the Letac-Massam conjecture on cones Q_{A_n}, submitted.
3 Objective: modern statistics and analysis of χ² laws on matrix cones. χ² laws on matrices are called Wishart laws.
Let X be a Gaussian random vector N(m, Σ) on R^r, with m ∈ R^r and Σ ∈ Sym⁺_r:
f_X(x) = (det 2πΣ)^{−1/2} exp(−(1/2) (x−m)^T Σ^{−1} (x−m)).
X has unknown mean m and covariance Σ.
4 We have at our disposal an observed sample X^(1), X^(2), ..., X^(s). We want to estimate m and Σ using the observed results X^(1), X^(2), ..., X^(s).
5 The sample mean
X̄_s = (X^(1) + X^(2) + ... + X^(s)) / s
is an estimator m̂ of m = ∫ X dP, and the sample covariance
Σ̂ = (1/s) Σ_{i=1}^s (X^(i) − X̄_s)(X^(i) − X̄_s)^T
is an estimator of Σ = ∫ (X−m)(X−m)^T dP.
Σ̂ is a quadratic map R^r × ... × R^r → Sym⁺(r, R).
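As a quick numerical illustration of the two estimators, a minimal sketch with NumPy; the dimension r = 3, the sample size s = 500 and the covariance matrix are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: r = 3, s = 500 observations of N(m, Sigma)
m = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3.0 * np.eye(3)          # a positive definite covariance

X = rng.multivariate_normal(m, Sigma, size=500)

m_hat = X.mean(axis=0)                     # sample mean \bar X_s
centered = X - m_hat
Sigma_hat = centered.T @ centered / X.shape[0]   # MLE normalisation 1/s, not 1/(s-1)

print(m_hat)
print(Sigma_hat)
```

Note the 1/s normalisation: the MLE Σ̂ is the biased sample covariance, not the 1/(s−1) unbiased version.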
6 Proposition. The estimators m̂ and Σ̂ are maximum likelihood estimators (MLE), i.e. they maximize the sample density treated as a function of the parameters (the likelihood function):
(m, Σ) ↦ Π_{i=1}^s f_X(x^(i)) = l(x^(1), ..., x^(s); m, Σ).
Proof: maximize ln l(x^(1), ..., x^(s); m, Σ) in (m, Σ).
7 Modern research in this area? Does everything seem known and done? NO!
Problems. Study the MLE of Σ when:
1. Some entries of the vector X are known to be conditionally independent with respect to the other ones. It follows that Σ is subject to restrictive conditions: the range of Σ is no longer the whole cone Sym⁺(r, R) but a subcone P ⊂ Sym⁺(r, R) of matrices with obligatory zeros, or its dual cone Q (explanation follows).
8 2. Some data are missing in the observations X^(i).
The simplest example: we have s_1 full observations of X and s_2 observations of the first k entries of X. How do we estimate Σ from such a sample?
9 Conditional independence. Example: Simpson's paradox.
A university has 48 000 students: half boys (24 000), half girls (24 000). At the final exams, boys and girls fail in very different proportions. Feminist organizations threaten to close the university; girl students want to lynch the president! However, the president of the university proves that the results R of the exams are independent of the sex S of a student, knowing the department D.
10 Three departments: A (sciences), B (literature, history, languages), C (law), with the same number of students each.
[Success/Fail counts by sex and department lost in transcription.]
Actually (R ⊥ S) | (D = d) for d = A, B, C.
11 Conditional independence in the absolutely continuous case.
X = (X_1, X_2, X_3): random vector; f_{X_1,X_2,X_3}(x_1, x_2, x_3): density function.
X_1 and X_3 are conditionally independent knowing X_2
⇔ f_{X_1,X_3 | X_2=x_2} = f_{X_1 | X_2=x_2} f_{X_3 | X_2=x_2}
⇔ f_{X_1,X_2,X_3}(x_1, x_2, x_3) = F(x_1, x_2) G(x_2, x_3).
12 Example 1. X ~ N(0, Σ), Σ ∈ Sym⁺_3,
f_{X_1,X_2,X_3}(x_1, x_2, x_3) = (det 2πΣ)^{−1/2} exp(−x^T Σ^{−1} x / 2).
Put σ := Σ^{−1}. Mixing of x_1 and x_3 is avoided exactly when σ_13 = 0:
f_{X_1,X_2,X_3}(x_1, x_2, x_3) = (2π)^{−3/2} (det σ)^{1/2} exp(−(σ_11 x_1² + 2σ_12 x_1 x_2 + σ_22 x_2²)/2) · exp(−(2σ_23 x_2 x_3 + σ_33 x_3²)/2).
Therefore (X_1 ⊥ X_3) | X_2 ⇔ σ_13 = 0.
The matrix σ = Σ^{−1} has obligatory zeros σ_13 = σ_31 = 0.
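The equivalence (X_1 ⊥ X_3) | X_2 ⇔ σ_13 = 0 can also be seen through the Schur complement: the conditional covariance of (X_1, X_3) given X_2 is diagonal exactly when the precision entry σ_13 vanishes. A sketch with a hypothetical precision matrix:

```python
import numpy as np

# Hypothetical precision matrix sigma with the obligatory zero sigma_13 = 0
sig = np.array([[2.0, 0.7, 0.0],
                [0.7, 3.0, 0.5],
                [0.0, 0.5, 1.5]])
Sigma = np.linalg.inv(sig)

# Conditional covariance of (X1, X3) given X2: Schur complement in Sigma
idx, cond = [0, 2], [1]
S_aa = Sigma[np.ix_(idx, idx)]
S_ab = Sigma[np.ix_(idx, cond)]
S_bb = Sigma[np.ix_(cond, cond)]
cond_cov = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ab.T

print(cond_cov)   # the off-diagonal entry vanishes: X1 and X3 decouple given X2
```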
13 The positions of zeros in Σ^{−1} are encoded by a graph.
G = (V, E): undirected graph; V = {1, ..., r}: the set of vertices; E ⊂ V × V: the set of edges; i ∼ j ⇔ (i, j) ∈ E.
Z_G := { x ∈ Sym(r, R) | x_ij = 0 if i ≠ j and i ≁ j },
P_G := Z_G ∩ Sym⁺_r, a subcone of Sym⁺_r.
If X ~ N(0, Σ) with Σ^{−1} ∈ P_G, then X_i and X_j are conditionally independent knowing all other components whenever i ≠ j and i ≁ j.
Example 1: (X_1 ⊥ X_3) | X_2 corresponds to the graph G: 1 − 2 − 3.
14 Example 1. Graph G = A_3: 1 − 2 − 3.
Z_G := { [[x_11, x_12, 0], [x_12, x_22, x_23], [0, x_23, x_33]] : x_ij ∈ R },
P_G := Z_G ∩ Sym⁺_3.
This cone is homogeneous (GL(P_G) acts transitively on P_G).
15 On Z_G we use the dual coordinates ξ = [[ξ_11, ξ_12, ·], [ξ_12, ξ_22, ξ_23], [·, ξ_23, ξ_33]], ξ_ij ∈ R.
The dual cone of P_G is
P_G* = Q_G := { ξ ∈ Z_G | tr(xξ) > 0 for all x ∈ P̄_G \ {0} }
= { ξ ∈ Z_G | ξ_11 ξ_22 − ξ_12² > 0, ξ_22 ξ_33 − ξ_23² > 0, ξ_33 > 0 }.
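The two descriptions of Q_{A_3} can be compared numerically: for any x ∈ P_G, the projection ξ = π(x^{−1}) satisfies the three inequalities. A sketch with a hypothetical x:

```python
import numpy as np

# Hypothetical x in P_{A_3}: positive definite with x_13 = 0
x = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.8, 0.4],
              [0.0, 0.4, 1.2]])

xi = np.linalg.inv(x)
xi[0, 2] = xi[2, 0] = 0.0      # the projection pi onto Z_G drops the (1,3) entry

# Q_{A_3} membership: the two banded 2x2 minors and the last diagonal entry
m1 = xi[0, 0] * xi[1, 1] - xi[0, 1] ** 2
m2 = xi[1, 1] * xi[2, 2] - xi[1, 2] ** 2
print(m1 > 0, m2 > 0, xi[2, 2] > 0)
```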
16 Example 2. Graph G = A_4: 1 − 2 − 3 − 4.
Z_G := { [[x_11, x_12, 0, 0], [x_12, x_22, x_23, 0], [0, x_23, x_33, x_34], [0, 0, x_34, x_44]] : x_11, ..., x_44 ∈ R },
P_G := Z_G ∩ S⁺_4. This cone is non-homogeneous.
P_G* = Q_G := { ξ ∈ Z_G | tr(xξ) > 0 for all x ∈ P̄_G \ {0} }
= { ξ ∈ Z_G | ξ_11 ξ_22 − ξ_12² > 0, ξ_22 ξ_33 − ξ_23² > 0, ξ_33 ξ_44 − ξ_34² > 0, ξ_44 > 0 }.
17 The theory of graphical models, started in 1976 by Lauritzen and Speed, is developed for decomposable graphs G:
G is decomposable ⇔ G has no cycle of length ≥ 4 as an induced subgraph.
Example: A_4 = 1 − 2 − 3 − 4 from Example 2.
Ω_G ⊂ Z_G is homogeneous if and only if G is decomposable and A_4-free (Letac-Massam, Ishi).
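Decomposability (chordality) is easy to test on small graphs with maximum cardinality search; the helper below is a generic sketch (not from the talk) and confirms that the path A_4 is decomposable while the induced 4-cycle is not:

```python
def is_chordal(n, edges):
    """Maximum cardinality search; the graph is chordal iff, in the MCS
    order, every vertex's already-visited neighbours form a clique."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    weight = {v: 0 for v in range(n)}
    order, visited = [], set()
    for _ in range(n):
        v = max((u for u in range(n) if u not in visited), key=lambda u: weight[u])
        order.append(v)
        visited.add(v)
        for u in adj[v] - visited:
            weight[u] += 1

    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        for i, a in enumerate(earlier):
            for b in earlier[i + 1:]:
                if b not in adj[a]:
                    return False
    return True

path_A4 = [(0, 1), (1, 2), (2, 3)]            # the graph A_4
cycle_C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]   # induced 4-cycle
print(is_chordal(4, path_A4), is_chordal(4, cycle_C4))
```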
18 Back to the estimation of the covariance matrix of a normal vector: the conditional independence case.
Simplification: X = (X_1, ..., X_n)^T is a random vector obeying N(0, Σ), with known mean 0 and unknown covariance matrix Σ ∈ S⁺_n such that Σ^{−1} ∈ P_G = Z_G ∩ S⁺_n.
Sample X^(1), ..., X^(s) ∈ R^n. The MLE of Σ on the whole cone S⁺_n is (when s ≥ n)
Σ̃ = (1/s) { X^(1) X^(1)T + X^(2) X^(2)T + ... + X^(s) X^(s)T } ∈ Sym(n, R).
Let us look for an estimator when Σ^{−1} ∈ P_G.
19 Example 1: G = 1 − 2 − 3, σ_13 = 0. We shall estimate σ = Σ^{−1} ∈ P_G and Σ = σ^{−1}.
The density function of the sample X^(1), ..., X^(s) ∈ R³ is
f(x^(1), ..., x^(s); σ) = Π_{k=1}^s { (2π)^{−3/2} (det σ)^{1/2} exp(−x^(k)T σ x^(k) / 2) }
= (2π)^{−3s/2} (det σ)^{s/2} exp(−Σ_{k=1}^s x^(k)T σ x^(k) / 2).
Note that Σ_{k=1}^s x^(k)T σ x^(k) = tr( (Σ_{k=1}^s x^(k) x^(k)T) σ ) = ⟨sΣ̃, σ⟩ = ⟨π(sΣ̃), σ⟩,
where π is the projection on Z_G (since σ_13 = 0).
20 f(x^(1), ..., x^(s); σ) = (2π)^{−3s/2} (det σ)^{s/2} exp(−(1/2) ⟨π(sΣ̃), σ⟩).
Which σ ∈ P_G is most likely? Maximum likelihood estimation: it is σ = σ̂ for which f(x^(1), ..., x^(s); σ̂) is maximal.
We study log f(x^(1), ..., x^(s); σ) as a function of σ ∈ Z_G. The variable σ_13 does not appear; using grad log det x = x^{−1},
grad log f(x^(1), ..., x^(s); σ) = (s/2) (π(σ^{−1}) − π(Σ̃)).
If s ≥ 3, then π(Σ̃) belongs to Q_G.
21 We must find σ̂ ∈ P_G such that π(σ̂^{−1}) = π(Σ̃) ∈ Q_G.
The inverse map Ψ of x ∈ P_G ↦ π(x^{−1}) ∈ Q_G is needed. Only when π = Id is this trivial (Ψ(y) = y^{−1}).
Then σ̂ = Ψ(π(Σ̃)) ∈ P_G is the MLE of σ and, consequently, Σ̂ = σ̂^{−1} = (Ψ(π(Σ̃)))^{−1} is the MLE of Σ.
π(Σ̃) is the MLE of π(Σ), which determines Σ.
The MLE argument is valid for any graphical cone.
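For decomposable G the map Ψ is given by Lauritzen's clique-separator formula; for G = A_3 (cliques {1,2} and {2,3}, separator {2}) the MLE can be written down directly. A numerical sketch with a hypothetical sample, checking both σ̂ ∈ P_G and π(σ̂^{−1}) = π(Σ̃):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))      # hypothetical sample, mean 0
S = X.T @ X / X.shape[0]              # full-cone MLE Sigma_tilde

# Lauritzen formula for G = A_3: sum of [inverse of clique block]^0
# minus [inverse of separator block]^0
K = np.zeros((3, 3))
K[:2, :2] += np.linalg.inv(S[:2, :2])     # clique {1, 2}
K[1:, 1:] += np.linalg.inv(S[1:, 1:])     # clique {2, 3}
K[1, 1] -= 1.0 / S[1, 1]                  # separator {2}

Sigma_hat = np.linalg.inv(K)              # MLE of Sigma under the A_3 model

print(K[0, 2])                            # structural zero: sigma_hat lies in P_G
print(np.allclose(Sigma_hat[:2, :2], S[:2, :2]),
      np.allclose(Sigma_hat[1:, 1:], S[1:, 1:]))
```

σ̂ = K agrees with Σ̃ on all entries kept by π (the clique blocks), which is exactly the equation π(σ̂^{−1}) = π(Σ̃).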
22 CRUCIAL POINTS:
1. the law of π(Σ̃) on Q_G;
2. knowledge of the inverse map Ψ.
23 The law of π(Σ̃) on Q_G.
π(Σ̃) is a quadratic map of the normal sample, with values in Q_G:
Σ̂ = π(Σ̃) = (1/s) π( X^(1) X^(1)T + X^(2) X^(2)T + ... + X^(s) X^(s)T ).
π(Σ̃) ∈ Q_G obeys a Wishart law on Q_G.
24 Missing data. The simplest example: X = (X_1, X_2)^T ∈ R², X ~ N(µ, Σ).
We have at our disposal s_1 full observations X^(1), ..., X^(s_1) and s_2 incomplete observations X̃^(1), ..., X̃^(s_2), with X̃ = (X̃_1, ·)^T: the second entry X̃_2 is not observed.
It is natural to model X̃_1 by the conditional law of X_1 given (X_2 = µ_2), which is N(µ_1, 1/σ_11), where σ = Σ^{−1}.
25 f(x^(1), ..., x^(s_1), x̃^(1), ..., x̃^(s_2); σ) = (2π)^{−(2s_1+s_2)/2} (det σ)^{s_1/2} σ_11^{s_2/2} exp{ −(1/2) [ Σ_i x^(i)T σ x^(i) + Σ_j σ_11 (x̃^(j))² ] },
where x and x̃ now denote the centred observations x − µ and x̃_1 − µ_1.
The power function Δ_{((s_1+s_2)/2, s_1/2)}(σ) = σ_11^{s_2/2} (det σ)^{s_1/2} appears.
The MLE equation grad_σ log f = 0 is
grad log Δ_{((s_1+s_2)/2, s_1/2)}(σ) = π( q̄(x^(1), ..., x^(s_1), x̃^(1), ..., x̃^(s_2)) ),
with the quadratic form q̄ equal on the sample to
(1/2) ( Σ_{i=1}^{s_1} x^(i) x^(i)T + Σ_{j=1}^{s_2} [[(x̃^(j))², 0], [0, 0]] ) = q^{1,2}_{s_1}(x) + q^{1}_{s_2}(x̃).
26 The inverse map Ψ_s of σ ∈ P_G ↦ grad log Δ_{((s_1+s_2)/2, s_1/2)}(σ) ∈ Q_G is needed. We show its existence and write it explicitly ([GIM]).
Thus σ̂ = Ψ_s(π(q̄(x))) ∈ P_G is the MLE of σ and, consequently, Σ̂ = σ̂^{−1} is the MLE of Σ.
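In the bivariate example the inverse map can be bypassed: factorizing the likelihood as f(x_1) f(x_2 | x_1) splits the maximization into closed-form pieces (the classical monotone missing-data computation). A sketch with hypothetical data and, for simplicity, known mean µ = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
full = rng.multivariate_normal(np.zeros(2), Sigma, size=200)         # s1 full obs
extra = rng.multivariate_normal(np.zeros(2), Sigma, size=100)[:, 0]  # s2 obs of X1 only

# f(x1, x2) = f(x1) f(x2 | x1): each factor is maximized separately.
x1_all = np.concatenate([full[:, 0], extra])
s11 = np.mean(x1_all ** 2)                                    # MLE of Sigma_11
beta = (full[:, 0] @ full[:, 1]) / (full[:, 0] @ full[:, 0])  # regression slope
tau = np.mean((full[:, 1] - beta * full[:, 0]) ** 2)          # residual variance

Sigma_hat = np.array([[s11, beta * s11],
                      [beta * s11, tau + beta ** 2 * s11]])
print(Sigma_hat)
```

Since det Σ̂ = s11 · tau > 0, the estimate is always positive definite.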
27 The equation π(y^{−1}) = ξ ∈ Q_G, y ∈ P_G, was solved by Lauritzen (1991) for decomposable graphs G, in terms of graph theory (cliques, separators, ...); the inverse map is called the Lauritzen map.
We give such maps in the monotonous missing-data set-up:
Ψ_s = (π ∘ grad log Δ_{(s_1,...,s_n)})^{−1} = grad log δ_{(s_1,...,s_n)},
where δ_{(s_1,...,s_n)} is another power function (definitions later).
28 From now on, G = A_n: 1 − 2 − ... − n.
Q_{A_n} and P_{A_n} are important non-homogeneous (for n ≥ 4) cones appearing in the statistical theory of graphical models.
They correspond to the practical model of nearest-neighbour interactions: in the Gaussian vector (X_1, X_2, ..., X_n), non-neighbours X_i, X_j, |i − j| > 1, are conditionally independent with respect to the other variables.
29 The Wishart laws on Q_G and P_G were first studied by Letac and Massam:
G. Letac and H. Massam, Wishart distributions for decomposable graphs, Ann. Statist. 35 (2007).
Motivations:
- MLE theory: S. L. Lauritzen (1996), Graphical Models, Oxford University Press.
- Bayesian statistics, searching conjugate prior distributions for π(Σ) ∈ Q_G and σ ∈ P_G: P. Diaconis and D. Ylvisaker (1979), Conjugate priors for exponential families, Ann. Statist. 7(2).
30 Letac and Massam, following the Lauritzen graph-theoretic methods, define power functions H on Q_G and P_G. For G = A_n: 1 − ... − n they are defined on Q_{A_n} by
H(α, β, η) = Π_{i=1}^{n−1} |η_{i:i+1}|^{α_i} / Π_{i=2}^{n−1} η_ii^{β_i}
(|η_{i:i+1}| denotes the determinant of the 2 × 2 submatrix on rows and columns i, i+1), and on P_{A_n} by H(α, β, π(y^{−1})).
31 Our results:
1. We introduce natural power functions δ_s^{(M)}(η) on Q_{A_n} and Δ_s^{(M)}(y) on P_{A_n}, M = 1, ..., n, which contain (strictly) the Letac-Massam functions H. These functions are densities of Riesz measures on the cones Q_{A_n} and P_{A_n} respectively.
2. We construct all Wishart families on Q_{A_n} and P_{A_n} generated by δ_s^{(M)}(η) and Δ_s^{(M)}(y). They contain (strictly) the Letac-Massam Wishart families γ^Q_{(α,β,y)} and γ^P_{(α,β,η)}. We give their properties (density, moments).
32 3. We give an essential extension and simplification of the Letac-Massam theory.
4. We prove the Letac-Massam conjecture on Q_{A_n}, on the Laplace transform property of H(α, β, η).
5. We find the MLE of σ (and hence of Σ) in the monotonous missing-data case: we construct an infinity of Lauritzen-type inverse maps Ψ_s : Q_{A_n} → P_{A_n}.
33 6. We determine the variance function V(m) for the Wishart families on Q_{A_n}, where the mean m = m_s(y) is the expectation of γ_{s,y}. In [GIK], V(m) is determined for homogeneous cones.
This is done thanks to an explicit form of the inverse mean map ψ_s. The maps Ψ_s and ψ_s are closely related.
34 Density of q(X), X = a normal sample.
In order to find MLEs, one needs quadratic forms q(X^(1), ..., X^(s)) and the Wishart law γ_{q,σ}.
The simplest case: q(x) = x x^T; µ_q = q(Leb_{R^r}) is a Riesz measure,
L_{µ_q}(η) = ∫_{R^r} e^{−⟨η, x x^T⟩} dx = ∫_{R^r} e^{−x^T η x} dx = π^{r/2} (det η)^{−1/2}.
For an s-sample, L_{µ_q^{*s}}(η) = π^{rs/2} (det η)^{−s/2}.
For more general q, L_{µ_q}(η) = c · Δ_s(η), where
Δ_s(x) := Π_{k=1}^r (det x_{{1,...,k}})^{s_k − s_{k+1}} = x_11^{s_1} (det x_{{1,2}} / x_11)^{s_2} ⋯ (det x / det x_{{1,...,r−1}})^{s_r}
(convention s_{r+1} = 0).
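The power function Δ_s is a short computation with leading principal minors; a sketch with a hypothetical matrix, together with the sanity check that a constant parameter s = (c, ..., c) gives (det x)^c:

```python
import numpy as np

def Delta_s(x, s):
    """Delta_s(x) = prod_k det(x_{1..k})^(s_k - s_{k+1}), with s_{r+1} = 0."""
    r = x.shape[0]
    val = 1.0
    for k in range(1, r + 1):
        minor = np.linalg.det(x[:k, :k])   # leading principal minor of order k
        s_next = s[k] if k < r else 0.0
        val *= minor ** (s[k - 1] - s_next)
    return val

# Hypothetical positive definite matrix
x = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.1]])
c = 0.7
print(Delta_s(x, [c, c, c]), np.linalg.det(x) ** c)
```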
35 If µ is a measure on a cone Ω ⊂ V = R^n, then the family of probability measures
γ_y(dx) = e^{−⟨x,y⟩} µ(dx) / L(µ)(y)
is called the exponential family generated by µ. The density of γ_{s,σ} is e^{−⟨y,σ⟩} µ_s(dy) / L_{µ_s}(σ).
Thus the density of the Riesz measure µ_s is crucial.
36 When Ω = Sym⁺_n = S⁺_n, the density of the Riesz measure is given by the multiparameter Siegel integral: for s ∈ C^n with Re s_k > (k−1)/2 (k = 1, ..., n),
∫_{S⁺_n} e^{−tr(xξ)} Δ_s(x) (det x)^{−(n+1)/2} dx = Γ_{S⁺_n}(s) δ_{−s}(ξ) (ξ ∈ S⁺_n),
where
δ_s(x) = (det x / det x_{{2,...,n}})^{s_1} ⋯ (det x_{{n−1,n}} / x_nn)^{s_{n−1}} x_nn^{s_n},
Γ_{S⁺_n}(s) := π^{n(n−1)/4} Π_{k=1}^n Γ(s_k − (k−1)/2).
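For n = 1 the Siegel integral reduces to the Gamma integral ∫_0^∞ e^{−xξ} x^{s−1} dx = Γ(s) ξ^{−s}; a quick quadrature check with hypothetical values s = 1.5, ξ = 2:

```python
import math
import numpy as np

s, xi = 1.5, 2.0
x = np.linspace(1e-9, 40.0, 400_001)        # the tail beyond 40 is negligible
f = np.exp(-xi * x) * x ** (s - 1.0)

h = x[1] - x[0]
lhs = h * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule
rhs = math.gamma(s) * xi ** (-s)
print(lhs, rhs)
```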
37 POWER FUNCTIONS AND THEIR LAPLACE TRANSFORMS for G = A_n.
Eliminating orders of vertices.
There are many (but not all) orders of the vertices 1, 2, ..., n that we should consider in order to have a harmonious theory of Riesz and Wishart distributions on the cones related to A_n graphs. These orders are called eliminating orders of vertices.
Let v⁺ be the set of future (with respect to the order) neighbours (with respect to the graph) of v.
38 An eliminating order of the vertices of G is a permutation {v_1, ..., v_n} of V such that for all v, the set v⁺ is complete.
Example. For the graph A_3: 1 − 2 − 3, the orders (1, 2, 3), (1, 3, 2), (3, 1, 2) and (3, 2, 1) are eliminating orders; (2, 1, 3) and (2, 3, 1) are not eliminating.
Proposition. All eliminating orders on A_n are obtained by an intertwining of the two sequences 1 < 2 < ... < M−1 < M and n > n−1 > ... > M+1 > M, for some M ∈ V.
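The definition and the proposition are easy to check by brute force for A_3: enumerating all six permutations and testing whether each vertex's future neighbours form a complete set leaves exactly the four intertwinings. A sketch:

```python
from itertools import permutations

# A_3 path graph 1 - 2 - 3
adj = {1: {2}, 2: {1, 3}, 3: {2}}

def is_eliminating(order):
    """An order is eliminating if every vertex's future neighbours are complete."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        future = [u for u in adj[v] if pos[u] > pos[v]]
        for i, a in enumerate(future):
            for b in future[i + 1:]:
                if b not in adj[a]:
                    return False
    return True

elim = [p for p in permutations([1, 2, 3]) if is_eliminating(p)]
print(elim)   # the four orders that do not start at the middle vertex 2
```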
39 Definition 1. For s ∈ C^n, we define functions Δ_s : P_G → C and δ_s : Q_G → C by
Δ_s(y) := Π_{v∈V} ( det y_{{v}∪v⁻} / det y_{v⁻} )^{s_v} (y ∈ P_G),
δ_s(η) := Π_{v∈V} ( det η_{{v}∪v⁺} / det η_{v⁺} )^{s_v} (η ∈ Q_G),
where v⁻ is the set of predecessors of v in the order, v⁺ its set of future neighbours, and det y_∅ = 1 = det η_∅, V = {1, ..., n}.
Property. On the cones Q_{A_n} and P_{A_n}, the power functions Δ_s(y) and δ_s(η) for eliminating orders depend only on the maximal element M of the order. We write Δ_s(y) = Δ_s^{(M)}(y), δ_s(η) = δ_s^{(M)}(η).
40 Proposition 2. Let M ∈ {1, ..., n}. The M-power functions Δ_s^{(M)}(y) on P_G and δ_s^{(M)}(η) on Q_G are given by
Δ_s^{(M)}(y) = y_11^{s_1−s_2} ⋯ |y_{1:M−1}|^{s_{M−1}−s_M} · |y|^{s_M} · |y_{M+1:n}|^{s_{M+1}−s_M} ⋯ y_nn^{s_n−s_{n−1}},
δ_s^{(M)}(η) = [ Π_{i=1}^{M−1} |η_{i:i+1}|^{s_i} · Π_{i=M+1}^{n} |η_{i−1:i}|^{s_i} ] / [ Π_{i=2}^{M−1} η_ii^{s_{i−1}} · η_MM^{s_{M−1}−s_M+s_{M+1}} · Π_{i=M+1}^{n−1} η_ii^{s_{i+1}} ]
(|y_{a:b}| is the determinant of the principal submatrix on rows and columns a, ..., b; conventions s_0 = s_{n+1} = 0, empty products equal 1).
Theorem 3. For all y ∈ P_{A_n}, δ_s^{(M)}(π(y^{−1})) = Δ_{−s}^{(M)}(y).
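Theorem 3 can be spot-checked numerically for n = 3, M = 2 from the explicit formulas of Proposition 2 (in the sign convention δ_s^{(M)}(π(y^{−1})) = Δ_{−s}^{(M)}(y)); the matrix y and the parameter s below are hypothetical:

```python
import numpy as np

# Hypothetical y in P_{A_3}: positive definite with y_13 = 0
y = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.5, 0.3],
              [0.0, 0.3, 1.2]])
s = np.array([0.9, 1.4, 0.7])
D = np.linalg.det(y)

eta = np.linalg.inv(y)   # pi drops eta_13, which the formulas below never use

# delta_s^{(2)}(eta) = |eta_{1:2}|^{s1} |eta_{2:3}|^{s3} / eta_22^{s1 - s2 + s3}
m12 = eta[0, 0] * eta[1, 1] - eta[0, 1] ** 2
m23 = eta[1, 1] * eta[2, 2] - eta[1, 2] ** 2
delta = m12 ** s[0] * m23 ** s[2] / eta[1, 1] ** (s[0] - s[1] + s[2])

# Delta_{-s}^{(2)}(y) = y_11^{s2 - s1} (det y)^{-s2} y_33^{s2 - s3}
Delta_neg = y[0, 0] ** (s[1] - s[0]) * D ** (-s[1]) * y[2, 2] ** (s[1] - s[2])

print(delta, Delta_neg)
```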
41 Characteristic function of a cone:
φ_Ω(x) = ∫_{Ω*} e^{−⟨x,y⟩} dy = L(Leb_{Ω*})(x).
φ_Ω(x) dx is the invariant measure of the cone Ω: ∫_Ω f(gx) φ_Ω(x) dx = ∫_Ω f(x) φ_Ω(x) dx for g ∈ GL(Ω).
For n ≥ 2 define φ_n : Q_{A_n} → R₊ by
φ_n(η) = Π_{i=1}^{n−1} |η_{i,i+1}|^{−3/2} · Π_{i≠1,n} η_ii.
We will see that φ_n is the characteristic function of the cone Q_{A_n}.
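A consistency check on φ_n: the characteristic function of a cone is homogeneous of degree −dim Ω, and dim Z_{A_n} = 2n − 1. For n = 3, with a hypothetical point of Q_{A_3}:

```python
import numpy as np

def phi3(eta):
    """phi_3(eta) = |eta_{1:2}|^{-3/2} |eta_{2:3}|^{-3/2} eta_22."""
    m12 = eta[0, 0] * eta[1, 1] - eta[0, 1] ** 2
    m23 = eta[1, 1] * eta[2, 2] - eta[1, 2] ** 2
    return m12 ** -1.5 * m23 ** -1.5 * eta[1, 1]

eta = np.array([[1.0, 0.4, 0.0],
                [0.4, 2.0, 0.3],
                [0.0, 0.3, 1.5]])
t = 2.7
print(phi3(t * eta) / phi3(eta), t ** -5.0)   # degree -(2*3 - 1) = -5
```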
42 Theorem 4. For all n ≥ 1, all 1 ≤ M ≤ n and all y ∈ P_{A_n}, the integral
∫_{Q_{A_n}} e^{−tr(yη)} δ_s^{(M)}(η) φ_n(η) dη
converges if and only if s_i > 1/2 for all i ≠ M and s_M > 0. In this case, we have
∫_{Q_{A_n}} e^{−tr(yη)} δ_s^{(M)}(η) φ_n(η) dη = π^{(n−1)/2} { Π_{i≠M} Γ(s_i − 1/2) } Γ(s_M) Δ_{−s}^{(M)}(y).
43 Theorem 5. For all n ≥ 1, all 1 ≤ M ≤ n and all η ∈ Q_{A_n}, the integral
∫_{P_{A_n}} e^{−tr(yη)} Δ_s^{(M)}(y) dy
converges if and only if s_i > −3/2 for all i ≠ M and s_M > −1. And in this case, we have
∫_{P_{A_n}} e^{−tr(yη)} Δ_s^{(M)}(y) dy = π^{(n−1)/2} { Π_{i≠M} Γ(s_i + 3/2) } Γ(s_M + 1) δ_{−s}^{(M)}(η) φ_n(η).
44 LETAC-MASSAM LAPLACE INTEGRALS.
Recall the Letac-Massam power functions on Q_{A_n}:
H(α, β, η) = Π_{i=1}^{n−1} |η_{i:i+1}|^{α_i} / Π_{i=2}^{n−1} η_ii^{β_i}.
The Laplace transform formula (y ∈ P_{A_n})
∫_{Q_{A_n}} e^{−tr(yη)} H(α, β, η) φ_{Q_{A_n}}(η) dη = C_{α,β} H(α, β, π(y^{−1}))
will be referred to as the Letac-Massam formula on Q_{A_n}.
45 Define r_i = α_i − β_{i+1} for all 1 ≤ i ≤ n−3, and p_i = α_i − β_i for all 3 ≤ i ≤ n−1. We have
H(α, β, η) = δ_s^{(M)}(η) · Π_{i=2}^{M−1} η_ii^{r_{i−1}} · Π_{i=M+1}^{n−1} η_ii^{p_i},
where s_i = α_i for all 1 ≤ i ≤ M−1, s_i = α_{i−1} for all M+1 ≤ i ≤ n, and β_M = s_{M−1} − s_M + s_{M+1}.
We have proved:
Theorem. The Letac-Massam formula on Q_{A_n} holds if and only if H(α, β, η) = δ_s^{(M)}(η) for some M = 2, ..., n−1
(a new formulation of the Letac-Massam conjecture).
46 Methods. Change of variables.
Let Φ_n : R₊ × R × P_{A_{n−1}} → P_{A_n}, (a, b, z) ↦ y, with
y = [[1, 0], [b e_1, I_{n−1}]] · [[a, 0], [0, z]] · [[1, b e_1^T], [0, I_{n−1}]].
Let Ψ_n : R₊ × R × Q_{A_{n−1}} → Q_{A_n}, (α, β, x) ↦ η, with
η = π( [[1, β e_1^T], [0, I_{n−1}]] · [[α, 0], [0, x]] · [[1, 0], [β e_1, I_{n−1}]] ).
The maps Φ_n and Ψ_n are bijections.
47 Back to the MLE of σ and Σ.
Theorem. The map grad log Δ_s^{(M)} : P_{A_n} → Q_{A_n} has the inverse map grad log δ_s^{(M)} : Q_{A_n} → P_{A_n}
(proved on homogeneous cones by Kai-Nomura (2005)).
This gives the solution of the MLE equation grad log Δ_s(σ̂) = π( q^{1,2}_{s_1}(x) + q^{1}_{s_2}(x̃) ).
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationSolution to Assignment 3
The Chinese University of Hong Kong ENGG3D: Probability and Statistics for Engineers 5-6 Term Solution to Assignment 3 Hongyang Li, Francis Due: 3:pm, March Release Date: March 8, 6 Dear students, The
More informationAverage theorem, Restriction theorem and Strichartz estimates
Average theorem, Restriction theorem and trichartz estimates 2 August 27 Abstract We provide the details of the proof of the average theorem and the restriction theorem. Emphasis has been placed on the
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationMultivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density
More informationMath Exam 2, October 14, 2008
Math 96 - Exam 2, October 4, 28 Name: Problem (5 points Find all solutions to the following system of linear equations, check your work: x + x 2 x 3 2x 2 2x 3 2 x x 2 + x 3 2 Solution Let s perform Gaussian
More informationLearning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University
Learning from Sensor Data: Set II Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University 1 6. Data Representation The approach for learning from data Probabilistic
More informationGraphical Models with Symmetry
Wald Lecture, World Meeting on Probability and Statistics Istanbul 2012 Sparse graphical models with few parameters can describe complex phenomena. Introduce symmetry to obtain further parsimony so models
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationMath 172 Problem Set 5 Solutions
Math 172 Problem Set 5 Solutions 2.4 Let E = {(t, x : < x b, x t b}. To prove integrability of g, first observe that b b b f(t b b g(x dx = dt t dx f(t t dtdx. x Next note that f(t/t χ E is a measurable
More informationAn Algebraic and Geometric Perspective on Exponential Families
An Algebraic and Geometric Perspective on Exponential Families Caroline Uhler (IST Austria) Based on two papers: with Mateusz Micha lek, Bernd Sturmfels, and Piotr Zwiernik, and with Liam Solus and Ruriko
More informationMarginal density. If the unknown is of the form x = (x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density
Marginal density If the unknown is of the form x = x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density πx 1 y) = πx 1, x 2 y)dx 2 = πx 2 )πx 1 y, x 2 )dx 2 needs to be
More informationErrata for Robot Vision
Errata for Robot Vision This is a list of known nontrivial bugs in Robot Vision 1986) by B.K.P. Horn, MIT Press, Cambridge, MA ISBN 0-6-08159-8 and McGraw-Hill, New York, NY ISBN 0-07-030349-5. Thanks
More informationSUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)
SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems
More information4. Distributions of Functions of Random Variables
4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n
More informationNAVARRO VERTICES AND NORMAL SUBGROUPS IN GROUPS OF ODD ORDER
NAVARRO VERTICES AND NORMAL SUBGROUPS IN GROUPS OF ODD ORDER JAMES P. COSSEY Abstract. Let p be a prime and suppose G is a finite solvable group and χ is an ordinary irreducible character of G. Navarro
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationRandom Variables and Their Distributions
Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital
More informationGaussian vectors and central limit theorem
Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables
More informationPCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities
PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets
More informationMeixner matrix ensembles
Meixner matrix ensembles W lodek Bryc 1 Cincinnati April 12, 2011 1 Based on joint work with Gerard Letac W lodek Bryc (Cincinnati) Meixner matrix ensembles April 12, 2011 1 / 29 Outline of talk Random
More informationEcon 508B: Lecture 5
Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline
More informationA Probability Review
A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in
More informationA Bivariate Distribution whose Marginal Laws are Gamma and Macdonald
International Journal of Mathematical Analysis Vol. 1, 216, no. 1, 455-467 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/ijma.216.6219 A Bivariate Distribution whose Marginal Laws are Gamma Macdonald
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationTail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection
Tail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection Satoshi Kuriki and Xiaoling Dou (Inst. Statist. Math., Tokyo) ISM Cooperative
More informationSTA205 Probability: Week 8 R. Wolpert
INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and
More informationProblems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B
Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2
More informationLagrangian submanifolds and generating functions
Chapter 4 Lagrangian submanifolds and generating functions Motivated by theorem 3.9 we will now study properties of the manifold Λ φ X (R n \{0}) for a clean phase function φ. As shown in section 3.3 Λ
More informationA6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Reading Chapter 5 (continued) Lecture 8 Key points in probability CLT CLT examples Prior vs Likelihood Box & Tiao
More informationVariational Mixture of Gaussians. Sargur Srihari
Variational Mixture of Gaussians Sargur srihari@cedar.buffalo.edu 1 Objective Apply variational inference machinery to Gaussian Mixture Models Demonstrates how Bayesian treatment elegantly resolves difficulties
More informationRuijsenaars type deformation of hyperbolic BC n Sutherland m
Ruijsenaars type deformation of hyperbolic BC n Sutherland model March 2015 History 1. Olshanetsky and Perelomov discovered the hyperbolic BC n Sutherland model by a reduction/projection procedure, but
More information